Posts Tagged ‘Results’

Search Engines vs Directories

Tuesday, July 15th, 2008

Search engines such as Google create their listings automatically. They crawl the web, eventually find your site, and index the pages they find. Page titles, body text (i.e., great content), META tags and other elements all play a role in what gets indexed. People then see results drawn from that index, based on the keywords they type into the search engine.
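To make the idea of "indexing" a little more concrete, here is a minimal sketch of an inverted index: a map from each word to the pages that contain it. This is only an illustration under my own assumptions, not how Google or any real engine is built. The function name build_index and the example.com URLs are made up for the example.

```php
<?php
// Hypothetical helper: builds a map from each lowercase word to the list of
// page URLs whose text contains that word.
function build_index(array $pages): array
{
    $index = [];
    foreach ($pages as $url => $text) {
        // str_word_count(..., 1) returns the words of the string as an array.
        $words = array_unique(str_word_count(strtolower($text), 1));
        foreach ($words as $word) {
            $index[$word][] = $url;
        }
    }
    return $index;
}

// Made-up pages standing in for what a crawler might have fetched.
$pages = [
    'http://example.com/a' => 'Graphic design studio',
    'http://example.com/b' => 'Web design tips',
];

$index = build_index($pages);

// Looking up a keyword is now a single array access.
print_r($index['design'] ?? []);
```

A keyword search then becomes a lookup in $index rather than a scan of every page, which is roughly why the crawl-then-index step happens ahead of time.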

A directory such as Yahoo! Directory depends on human editors to create its listings. You submit a description of your site to the directory for editors to review. A good site, with good content, will be more likely to get reviewed than a poor site. A search of a directory looks for matches only in that directory’s index.

Yahoo! also has a search engine that includes spidered websites along with its directory listings and “Sponsor Results,” which are pay-per-click ads similar to Google’s AdWords. Originally Yahoo! displayed only listings from its directory. Then in 2002 it added search engine listings from Google. In 2003 it acquired Overture (formerly GoTo), which ran the first pay-per-click program and owned AltaVista. In 2004 it started using its own search engine based on AltaVista’s technology.

Search Engine Optimization for Graphic Designers

Tuesday, July 15th, 2008

Here is a quick look at a few of the most important factors that influence how your studio’s website ranks in Google search results. With a few strategic changes, and an awareness of how Google’s search algorithms work, you can make a big difference in how Google-friendly your website becomes.

NON-BROWSER GOOGLE VIA CELLPHONE AND E-MAIL

Tuesday, July 15th, 2008

VIA CELLPHONE <google.com/sms>: Send a text message to GOOGL (46645) to receive Google results on your phone.

VIA E-MAIL: Send an e-mail to google@6url.com with a search term in the subject line - get top 10 Google results via e-mail. Useful from a Blackberry or similar device.

Swiss Army Google

Tuesday, July 15th, 2008

Google has a number of services that can help you accomplish tasks you may never have thought to use Google for. For example, the new calculator feature (www.google.com/help/features.html#calculator) lets you do both math and a variety of conversions from the search box. For extra fun, try the query “Answer to life the universe and everything.”

Let Google help you figure out whether you’ve got the right spelling—and the right word—for your search. Enter a misspelled word or phrase into the query box (try “thre blund mise”) and Google may suggest a proper spelling. This doesn’t always succeed; it works best when the word you’re searching for can be found in a dictionary. Once you search for a properly spelled word, look at the results page, which repeats your query. (If you’re searching for “three blind mice,” underneath the search window will appear a statement such as Searched the web for “three blind mice.”) You’ll discover that you can click on each word in your search phrase and get a definition from a dictionary.

Suppose you want to contact someone and don’t have his phone number handy. Google can help you with that, too. Just enter a name, city, and state. (The city is optional, but you must enter a state.) If a phone number matches the listing, you’ll see it at the top of the search results along with a map link to the address. If you’d rather restrict your results, use rphonebook: for residential listings or bphonebook: for business listings.

Syntax Search Tricks

Tuesday, July 15th, 2008

Using a special syntax is a way to tell Google that you want to restrict your searches to certain elements or characteristics of Web pages. Google has a fairly complete list of its syntax elements at www.google.com/help/operators.html. Here are some advanced operators that can help narrow down your search results.

Intitle: at the beginning of a query word or phrase (intitle:"Three Blind Mice") restricts your search results to just the titles of Web pages.

Intext: does the opposite of intitle:, searching only the body text, ignoring titles, links, and so forth. Intext: is perfect when what you’re searching for might commonly appear in URLs. If you’re looking for the term HTML, for example, and you don’t want to get results such as www.mysite.com/index.html, you can enter intext:html.

Link: lets you see which pages are linking to your Web page or to another page you’re interested in.

Try combining site: (which restricts results to a particular domain or top-level domain) with intitle: to find certain types of pages. For example, get scholarly pages about Mark Twain by searching for intitle:"Mark Twain" site:edu. Experiment with mixing various elements; you’ll develop several strategies for finding the stuff you want more effectively. The site: command is also very helpful as an alternative to the mediocre search engines built into many sites.
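If you want to generate searches like these from a script, the operators can be assembled into a query string and URL-encoded. The sketch below is one way to do it; build_google_query is a hypothetical helper of my own, and the https://www.google.com/search?q= address is assumed here as the usual search URL rather than anything documented in this post.

```php
<?php
// Hypothetical helper: joins operator:value pairs and optional plain terms
// into one query string. Multi-word values are wrapped in double quotes.
function build_google_query(array $operators, string $terms = ''): string
{
    $parts = [];
    foreach ($operators as $name => $value) {
        // Quote phrases so the operator applies to the whole phrase.
        if (strpos($value, ' ') !== false) {
            $value = '"' . $value . '"';
        }
        $parts[] = $name . ':' . $value;
    }
    if ($terms !== '') {
        $parts[] = $terms;
    }
    return implode(' ', $parts);
}

$query = build_google_query(['intitle' => 'Mark Twain', 'site' => 'edu']);

// http_build_query() URL-encodes the colon, quotes and spaces for us.
$url = 'https://www.google.com/search?' . http_build_query(['q' => $query]);

echo $query, "\n";
echo $url, "\n";
```

Keeping the operators in an array makes it easy to mix and match them, which fits the "experiment with various elements" advice above.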

Google Shares Three Ranking Philosophies

Monday, July 14th, 2008

The Google Search Quality Team is keeping its promise to explain more about how they conduct their work. As usual and expected, it’s fantastically vague, but since a chunk of our readers at any given time are new to search, it’s worth going over.

Writing on the Official Google blog, Amit Singhal, a Google Fellow on the Core Ranking Team, defines Google ranking:

“Google ranking is a collection of algorithms used to find the most relevant documents for a user query. We do this for hundreds of millions of queries a day, from a collection of billions and billions of pages. These algorithms are run for every query entered into most of Google’s search services. While our web search is the most used Google search service and the most widely known, the same ranking algorithms are also used - with some modifications - for other Google search services, including Images, News, YouTube, Maps, Product Search, Book Search, and more.”

Then he gave three philosophies that the Core Ranking Team follows:

1) Best locally relevant results served globally.
2) Keep it simple.
3) No manual intervention.

Singhal says that the team strives for simplicity in their architecture, something that Twitter has been struggling with lately. Obviously, with all the queries conducted and the massive amount of content to be indexed, it could be easy to piece together a very complex architecture (similar to Google’s woes with their ad products). With approximately 10 ranking updates per week, Singhal says the team takes simplicity in architecture into consideration in every single update.

Singhal also emphasized philosophy #3 - that Google does not hand edit results.

“You are the ones creating pages and linking to pages. We are using all this human contribution through our algorithms. The final ordering of the results is decided by our algorithms using the contributions of the greater Internet community, not manually by us.”

SEO: Relevancy Rankings

Monday, June 30th, 2008

Most search engines return results with confidence or relevancy rankings. In other words, they list the hits according to how closely they think the results match the query. However, these lists often leave users shaking their heads in confusion, since, to the user, the results may seem completely irrelevant.

Why does this happen?  Basically it’s because search engine technology has not yet reached the point where humans and computers understand each other well enough to communicate clearly.

Most search engines use search term frequency as a primary way of determining whether a document is relevant.  If you’re researching diabetes and the word “diabetes” appears multiple times in a Web document, it’s reasonable to assume that the document will contain useful information.  Therefore, a document that repeats the word “diabetes” over and over is likely to turn up near the top of your list.

If your keyword is a common one, or if it has multiple other meanings, you could end up with a lot of irrelevant hits.  And if your keyword is a subject about which you desire information, you don’t need to see it repeated over and over–it’s the information about that word that you’re interested in, not the word itself.

Some search engines consider both the frequency and the positioning of keywords to determine relevancy, reasoning that if the keywords appear early in the document, or in the headers, this increases the likelihood that the document is on target. For example, one method is to rank hits according to how many times your keywords appear and in which fields they appear (i.e., in headers, titles or plain text). Another method is to determine which documents are most frequently linked to by other documents on the Web. The reasoning here is that if other folks consider certain pages important, you should, too.
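As a rough illustration of the "how many times and in which fields" method, here is a minimal scoring sketch. The field names, the weights (3, 2, 1) and the sample documents are all assumptions of mine, picked only to show the idea; real engines do not publish their formulas.

```php
<?php
// Hypothetical field weights: a hit in the title counts more than a hit in
// a header, which counts more than a hit in plain body text.
const FIELD_WEIGHTS = ['title' => 3.0, 'headers' => 2.0, 'body' => 1.0];

// Score = sum over fields of (keyword occurrences in field) * (field weight).
// substr_count() matches substrings, so "diabetes" inside a longer word
// would also be counted.
function relevance_score(array $doc, string $keyword): float
{
    $score = 0.0;
    foreach (FIELD_WEIGHTS as $field => $weight) {
        $text = strtolower($doc[$field] ?? '');
        $score += substr_count($text, strtolower($keyword)) * $weight;
    }
    return $score;
}

// Made-up documents for the example.
$docs = [
    'a' => ['title' => 'Diabetes basics', 'headers' => 'Symptoms', 'body' => 'Diabetes affects blood sugar.'],
    'b' => ['title' => 'Healthy eating', 'headers' => 'Sugar', 'body' => 'Diet matters for diabetes and diabetes care.'],
];

$scores = [];
foreach ($docs as $id => $doc) {
    $scores[$id] = relevance_score($doc, 'diabetes');
}
arsort($scores); // highest score first, keys preserved
print_r($scores);
```

Document "a" mentions the keyword fewer times than "b" but ranks higher because one of its hits is in the title, which is the positioning effect described above.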

If you use the advanced query form on AltaVista, you can assign relevance weights to your query terms before conducting a search.  Although this takes some practice, it essentially allows you to have a stronger say in what results you will get back.

As far as the user is concerned, relevancy ranking is critical, and becomes more so as the sheer volume of information on the Web grows.  Most of us don’t have the time to sift through scores of hits to determine which hyperlinks we should actually explore. The more clearly relevant the results are, the more we’re likely to value the search engine.

Regular Expressions for input validation?

Saturday, June 28th, 2008

It is usually a good idea to avoid regular expressions where a built-in function will do the job. PHP has functions that do exactly what some regular expressions do, but faster. Take this example:

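// ereg() was removed in PHP 7; preg_match() is the PCRE equivalent.
// The anchors make the pattern match only strings made up entirely of digits.
if (preg_match('/^[0-9]+\z/', $number)) {
    // Is integer
} else {
    // Is not integer
}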

It is much faster to do this instead:

// ctype_digit() is true only when every character of the string is a digit.
if (ctype_digit($number)) {
    // Is integer
} else {
    // Is not integer
}

To test this, I ran ereg('[0123456789]', $number) 1,000,000 times, followed by ctype_digit($number) 1,000,000 times. Here are the results:

Regular Expressions: 2.401 seconds
ctype_digit: 0.985 seconds
Time saved: 1.416 seconds; 58.98%
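If you want to repeat a comparison like this yourself, a timing loop along the following lines should work. This is a sketch, not the script that produced the numbers above: it uses preg_match() because ereg() no longer exists in current PHP, the time_check helper is my own invention, and your timings will differ with PHP version and hardware.

```php
<?php
// Hypothetical helper: calls $check on $input $iterations times and returns
// the elapsed wall-clock time in seconds.
function time_check(callable $check, string $input, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $check($input);
    }
    return microtime(true) - $start;
}

// Both checks answer the same question: is the string all digits?
$viaRegex = function (string $s): bool {
    return preg_match('/^[0-9]+\z/', $s) === 1;
};
$viaCtype = function (string $s): bool {
    return ctype_digit($s);
};

$iterations = 1000000;
$regexSeconds = time_check($viaRegex, '12345', $iterations);
$ctypeSeconds = time_check($viaCtype, '12345', $iterations);

printf("preg_match:  %.3f seconds\n", $regexSeconds);
printf("ctype_digit: %.3f seconds\n", $ctypeSeconds);
```

Both checks go through a closure, so the call overhead is included in each figure; that affects the absolute numbers but applies equally to both sides.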

Getting results from a select multiple HTML tag.

Saturday, June 28th, 2008

The select multiple tag in an HTML form allows users to select multiple items from a list. These items are then passed to the action handler for the form. The problem is that they are all passed with the same field name, e.g.:
<select name="var" multiple="yes">
Each selected option will arrive at the action handler as var=option1, var=option2, var=option3, and each one will overwrite the previous contents of the $var variable. The solution is to use PHP’s “array from form element” feature by adding square brackets to the field name:
<select name="var[]" multiple="yes">
Now the first selected item becomes $var[0], the next $var[1], and so on.
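On the receiving side, the selections show up as an array under the field name in $_POST (or $_GET). Here is a small sketch of reading them safely; selected_options is a hypothetical helper, and the $request array below only simulates what PHP would put in $_POST for a form using name="var[]".

```php
<?php
// Hypothetical helper: returns the selected options from a request array
// such as $_POST, always as a list of strings, whether zero, one, or
// several options were chosen.
function selected_options(array $request, string $field): array
{
    if (!isset($request[$field])) {
        return []; // nothing selected: the field is absent from the request
    }
    // The (array) cast also covers a field that arrived as a single value.
    return array_values(array_map('strval', (array) $request[$field]));
}

// Simulated request data; in a real handler you would pass $_POST.
$request = ['var' => ['option1', 'option3']];

foreach (selected_options($request, 'var') as $index => $option) {
    printf("var[%d] = %s\n", $index, $option);
}
```

Checking isset() first matters because a select multiple with nothing chosen sends no value at all, so the key may simply be missing.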

How a Search Engine Might Use a Searcher’s Knowledge, Interests, and Education to Rerank and Validate Search Results

Thursday, June 26th, 2008

The number of pages on the Web that a search engine could try to index is extremely large, and the approaches that search engines use to index and rank those pages are mostly automated. That doesn’t mean that the search engines don’t have people look at search results and try to gauge how relevant their automated results might be.

A search engine typically locates web pages that contain the keywords entered by a searcher within a search box. The order in which those results appear is based upon a number of algorithms that look at various factors, such as the frequency and number of entered keywords within each page and the position of those keywords within each page.

For example, a page that has a keyword in its title or near the top might rank higher than a second page that has the keyword in a footer or near the bottom. The first page might be presented to a searcher before the second because of the location of the keyword.
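The position idea can be sketched in a few lines: sort pages by how early the keyword first appears. This is a toy illustration under my own assumptions (the first_position helper and the sample page text are made up), and a real engine would weigh position alongside many other factors.

```php
<?php
// Hypothetical helper: returns the offset of the keyword's first occurrence
// (case-insensitive), or PHP_INT_MAX when the page does not contain it, so
// that such pages sort last.
function first_position(string $pageText, string $keyword): int
{
    $pos = stripos($pageText, $keyword);
    return $pos === false ? PHP_INT_MAX : $pos;
}

// Made-up page text: one page has the keyword at the top, one in the footer.
$pages = [
    'second' => 'Welcome to our site. Contact us. Footer: widgets for sale.',
    'first'  => 'Widgets: a buying guide. Everything you need to know.',
];

// Sort so that pages with the earliest keyword occurrence come first.
uasort($pages, function (string $a, string $b): int {
    return first_position($a, 'widgets') <=> first_position($b, 'widgets');
});

print_r(array_keys($pages));
```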

While this automated approach might be satisfactory to some searchers, other searchers might find rankings of pages to be inadequate or irrelevant to their needs.

How might a search engine verify page ranking results of a search algorithm with respect to the specific needs or characteristics of specific groups of users?

A recent patent application from Yahoo explores the topic, and it wouldn’t be too much of a surprise if the other major search engines employed some processes of their own to do something similar. In fact, a set of Quality Guidelines (pdf) from Google was uncovered, which provides instructions to people who manually review the pages that appear in Google’s search results.