Posts Tagged ‘Query’

MySQL specific shortcuts

Thursday, July 24th, 2008

MySQL provides many extensions to SQL which help performance in many common use scenarios. Among these are INSERT … SELECT, INSERT … ON DUPLICATE KEY UPDATE, and REPLACE.

I rarely hesitate to use the above, since they are convenient and provide real performance benefits in many situations. MySQL has other keywords that are more dangerous, however, and should be used sparingly. These include INSERT DELAYED, which tells MySQL that it is not important to insert the data immediately (in a logging situation, for example). The problem is that under high load the insert might be delayed indefinitely, causing the insert queue to balloon. You can also give MySQL index hints about which indices to use, but these are rarely needed: MySQL gets it right most of the time, and when it doesn’t, it is usually because of a bad schema or a poorly written query.

Search Engine Elements

Tuesday, July 15th, 2008

The three major elements of a search engine are: the spider, also called the crawler; the index or catalog; and the search engine software, which displays the results of your query in your browser.

The spider visits your web page, indexes it, and then follows links to other pages within the site. This is sometimes referred to as being “spidered” or “crawled.” The spider returns to the site every so often looking for changes.

The index is a giant database that contains a copy of every web page that the spider finds. When a web page is changed, then this database is updated with the new information.

Sometimes it takes a while for pages or changes to be added to the index. Therefore, a web page may have been “spidered” but not yet “indexed.” Until it is added to the index, it is not available to searches by the search engine.

Search engine software sifts through the millions of pages recorded in the index to find matches to a query and ranks them in the order of what it believes is most relevant. Different search engines often produce very different results.
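To make the index idea concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration under simplifying assumptions (whitespace tokenization, no ranking, made-up page URLs); real search engines are far more elaborate, and the function names are my own.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercased word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages, standing in for what a spider would have collected.
pages = {
    "example.com/a": "golf putters and golf balls",
    "example.com/b": "tennis balls",
}
index = build_index(pages)
```

A page that the spider has fetched but that has not yet been passed through something like `build_index` would not show up in `search`, which is the “spidered but not yet indexed” situation described above.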

Search Engine Optimization-Keyword Search

Tuesday, July 15th, 2008

Most search engines handle words and simple phrases.  In its simplest form, text search looks for pages with lots of occurrences of each of the words in a query, stopwords aside.  The more common a word is on a page, compared with its frequency in the overall language, the more likely that page will appear among the search results.  Hitting all the words in a query is a lot better than missing some.
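The frequency idea above can be sketched roughly as follows. This is a toy scoring function, not any engine’s actual formula: the stopword list, the `background_freq` table of language-wide word frequencies, and the coverage multiplier are all assumptions made for illustration.

```python
from collections import Counter

# A tiny, assumed stopword list; real lists are much longer.
STOPWORDS = frozenset({"a", "an", "the", "of", "for"})

def keyword_score(page_text, query, background_freq):
    """Score a page by how over-represented the query words are on it."""
    words = page_text.lower().split()
    terms = [t for t in query.lower().split() if t not in STOPWORDS]
    if not words or not terms:
        return 0.0
    counts = Counter(words)
    score = 0.0
    matched = 0
    for term in terms:
        page_freq = counts[term] / len(words)
        if page_freq > 0:
            matched += 1
        # Compare frequency on the page with frequency in the language;
        # unknown words get a small assumed default frequency.
        score += page_freq / background_freq.get(term, 1e-6)
    # Pages that hit every query word keep their full score.
    return score * matched / len(terms)
```

With this sketch, a page that mentions both words of a two-word query outranks a page that mentions only one of them, all else being equal.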

Search engines also make some efforts to “understand” what is meant by the query words. For example, most search engines now offer optional spelling correction. And increasingly they search not just on the words and phrases actually entered, but they also use stemming to search for alternate forms of the words (e.g., speak, speaker, speaking, spoke). Teoma-based engines are also offering refinement by category, à la the now-defunct Northern Light. However, Excite-like concept search has otherwise not made a comeback yet, since the concept categories are too unstable.

When ranking results, search engines give special weight to keywords that appear:

* High up on the page
* In headings
* In BOLDFACE (at least in Inktomi)
* In the URL
* In the title (important)
* In the description
* In the ALT tags for graphics
* In the generic keywords metatags (only for Inktomi, and only a little bit even for them)
* In the link text for inbound links.

More weight is put on the factors that a site owner would find awkward to fake, such as inbound link text, page title (which shows up on the SERP, or Search Engine Results Page), and description.
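As a rough sketch of how such weighting might work, the snippet below counts keyword occurrences per page field and multiplies each count by a weight. The field names and the weights are entirely hypothetical; no search engine publishes its real values.

```python
import re

# Hypothetical weights: hard-to-fake fields count for more.
FIELD_WEIGHTS = {
    "title": 5.0,
    "inbound_link_text": 4.0,
    "description": 3.0,
    "headings": 2.0,
    "url": 2.0,
    "body": 1.0,
}

def weighted_score(page_fields, keyword):
    """Sum keyword occurrences across fields, scaled by each field's weight."""
    keyword = keyword.lower()
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = page_fields.get(field, "").lower()
        tokens = re.findall(r"[a-z0-9]+", text)
        score += weight * tokens.count(keyword)
    return score
```

Under these made-up weights, one mention in the title is worth as much as five mentions in the body text.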

Search engine optimization-Page Rank

Tuesday, July 15th, 2008

Search engine ranking algorithms are closely guarded secrets, for at least two reasons: search engine companies want to protect their methods from their competitors, and they also want to make it difficult for web site owners to manipulate their rankings.

That said, a specific page’s relevance ranking for a specific query currently depends on three factors:

* Its relevance to the words and concepts in the query
* Its overall link popularity
* Whether or not it is being penalized for excessive search engine optimization (SEO).

Examples of SEO abuse would be a lot of sites linked to each other in a circular scam, or excessive and highly ungrammatical stuffing with keywords.

The second factor, link popularity, was pioneered by Google with PageRank. Essentially, the more incoming links your page has, the better. But it is more complicated than that; PageRank is a tricky concept because it is circular. Every page on the Internet has a minimum PageRank score just for existing. About 85% of a page’s PageRank (that’s the best-known estimate, based on an early paper) is passed along to the pages it links to, divided more or less equally among its outgoing links. A page’s PageRank is the sum of the minimum value plus all the PageRank passed to it via incoming links.

Although this is circular, mathematical algorithms exist for calculating it iteratively.
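Here is a minimal sketch of that iterative calculation. It assumes the 85% figure as the damping value, uses the common normalized form in which all scores sum to 1, and spreads the score of pages with no outgoing links evenly over every page; those choices are mine, not a description of Google’s production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a dict of page -> list of linked pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with its minimum score "just for existing".
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass 85% of this page's score on, split equally among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links shares with everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# A hypothetical three-page web.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
```

In this tiny example, page “c” ends up with a higher score than page “b”, because it receives links from both “a” and “b”.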

In one final complication, what I just said applies to “raw PageRank.”   Google actually reports PageRank scores of 0 to 10 that are believed to be based on the logarithm of raw PageRank (they’re reported as whole numbers).   And the base of that logarithm is believed to be approximately 6.
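If that belief is right, the conversion might look something like the sketch below. The base of 6 is the estimate mentioned above; the scaling of raw PageRank (here, a raw score of 1 maps to 0) is purely an assumption, since Google has not published the details.

```python
import math

def toolbar_pagerank(raw, base=6.0, max_score=10):
    """Convert a raw PageRank value to a whole-number 0-10 score (assumed scaling)."""
    if raw < 1:
        return 0
    # Take the logarithm in the assumed base, drop the fraction, and cap at 10.
    return min(max_score, int(math.log(raw, base)))
```

The practical consequence is that each step up the reported scale would require roughly six times as much raw PageRank as the step before it.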

Anyhow, about 30 sites on the Web have PageRank 10, including Yahoo, Google, Microsoft, Intel, and NASA. IBM, AOL, and CNN, by way of contrast, were only at PageRank 9 as of early 2004.

Further refinements in link popularity rankings are under development.  Notably, link popularity can be made specific to a subject or category; i.e., pages can have different PageRanks for health vs. sports vs. computers vs. whatever.  Supposedly, AskJeeves/Teoma already works that way.

It is believed that Inktomi, Altavista, et al. use link popularity in their ranking algorithms, but to a much lesser extent than Google.  Yahoo, owner of Inktomi, Altavista, Alltheweb, is rolling out a new search engine, which reportedly includes a feature called Web Rank.  More on how that works soon.

SEO:Keyword Searching

Monday, June 30th, 2008

This is the most common form of text search on the Web.  Most search engines do their text query and retrieval using keywords.

What is a keyword, exactly? It can simply be any word on a webpage. For example, I used the word “simply” in the previous sentence, making it one of the keywords for this particular webpage in some search engine’s index. However, since the word “simply” has nothing to do with the subject of this webpage (i.e., how search engines work), it is not a very useful keyword. Useful keywords and key phrases for this page would be “search,” “search engines,” “search engine methods,” “how search engines work,” “ranking,” “relevancy,” “search engine tutorials,” etc. Those keywords would actually tell a user something about the subject and content of this page.

Unless the author of the Web document specifies the keywords for her document (this is possible by using meta tags), it’s up to the search engine to determine them. Essentially, this means that search engines pull out and index words that appear to be significant. Since search engines are software programs, not rational human beings, they work according to rules established by their creators for what words are usually important in a broad range of documents. The title of a page, for example, usually gives useful information about the subject of the page (if it doesn’t, it should!). Words that are mentioned towards the beginning of a document (think of the “topic sentence” in a high school essay, where you lay out the subject you intend to discuss) are given more weight by most search engines. The same goes for words that are repeated several times throughout the document.

Some search engines index every word on every page. Others index only part of the document.

Full-text indexing systems generally pick up every word in the text except commonly occurring stop words such as “a,” “an,” “the,” “is,” “and,” “or,” and “www.”  Some of the search engines discriminate upper case from lower case; others store all words without reference to capitalization.
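A simple sketch of that kind of word extraction is shown below. The stop word list is the one quoted above, the regular-expression tokenizer is an assumption, and the `case_sensitive` flag is a hypothetical switch standing in for the two capitalization behaviors.

```python
import re

# The stop words named in the text above.
STOP_WORDS = frozenset({"a", "an", "the", "is", "and", "or", "www"})

def index_terms(text, case_sensitive=False):
    """Return the words a full-text indexer might keep, minus stop words."""
    if not case_sensitive:
        text = text.lower()
    words = re.findall(r"[A-Za-z0-9]+", text)
    # Stop words are dropped regardless of how they are capitalized.
    return [word for word in words if word.lower() not in STOP_WORDS]
```

For instance, `index_terms("The spider is at www.example.com")` keeps “spider”, “at”, “example”, and “com” and drops the rest.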

SEO : Nomenclatures

Thursday, June 26th, 2008

Whenever possible, you should save your images, media, and web pages with the keywords in the file names. For example, if your keyword phrase is “golf putters” you’ll want to save the images used on that page as golf-putters-01.jpg or golf_putters_01.jpg (either will work). It’s not confirmed, but many SEOs have seen improvement in ranking after renaming images and media.

More important is your web page’s filename, since many search engines now allow users to query using “inurl:” searches. Your filename for the golf putters page could be golf-putters.html or golf_putters.html. Anytime there is an opportunity to display or present content, do your best to ensure the content has the keywords in the filename (as well as a Title or ALT attribute).
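If you have a lot of files to name, a small helper can keep the names consistent. The function below is a hypothetical convenience of my own, not part of any SEO tool; it simply turns a keyword phrase into the kind of filename described above.

```python
import re

def keyword_filename(phrase, extension, separator="-", sequence=None):
    """Build a filename such as golf-putters-01.jpg from a keyword phrase."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    parts = list(words)
    if sequence is not None:
        # Zero-pad the sequence number to two digits, e.g. 1 -> "01".
        parts.append(f"{sequence:02d}")
    return separator.join(parts) + "." + extension
```

Calling `keyword_filename("golf putters", "jpg", sequence=1)` gives `golf-putters-01.jpg`, and passing `separator="_"` produces the underscore style instead.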

Blogs and SEO - Keywords

Thursday, June 26th, 2008

You have a choice. You can target a general high traffic keyword you have little chance of ranking well for and get barely any traffic. Or you can shoot for a keyword that gets a moderate level of targeted traffic resulting in more subscribers and sales. I like to call this a “lucrative keyword”. Whatever you call them, here’s the most important thing: They may not get you the most traffic, but they often bring the most profit.

More Web Site Traffic and More Sales? Not Always

You may be surprised to learn that there isn’t always a correlation between high traffic and high sales. Many of the most profitable sites in the world get moderate traffic because their lucrative keywords result in a much higher ratio of visitors to buyers.

Length of Search Query is a Factor

A recent article in Information Week stated that the highest conversion rates from search engine traffic come from people who do four-word queries. The great thing about your blog is that it can get so well-indexed that you have the potential to show up for any number of four-word phrases that are relevant to your industry.

Target Your Blog for More Traffic and Sales

It isn’t just the four-word phrases that get converting traffic; there are two- and three-word phrases that can bring you traffic and sales. Targeting your blog discussion to a two- or three-word phrase that has a high yield of traffic, and yet has little competition, is not a dream of past Internet days. Another recent study revealed that surprisingly high percentages of search engine queries debuted as late as 2004. As long as there are new developments, new products, services and trends, you’ll never have a shortage of these terms if you learn how to discover them.