Posts Tagged ‘methods’

SEO: Concept-based searching

Monday, June 30th, 2008

Excite used to be the best-known general-purpose search engine site on the Web that relied on concept-based searching. It is now effectively extinct.

Unlike keyword search systems, concept-based search systems try to determine what you mean, not just what you say.  In the best circumstances, a concept-based search returns hits on documents that are “about” the subject/theme you’re exploring, even if the words in the document don’t precisely match the words you enter into the query.

How did this method work? There are various methods of building clustering systems. Some of them are highly complex and rely on sophisticated linguistic and artificial intelligence theory that we won’t attempt to go into here. Excite used a numerical approach: its software determined meaning by calculating how frequently certain important words appeared. When several words or phrases tagged as signalling a particular concept appeared close to each other in a text, the search engine concluded, by statistical analysis, that the piece was “about” that subject.
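As a rough illustration of the numerical approach described above, the sketch below counts how often concept-tagged words occur and checks whether enough of them fall close together. The concept vocabulary, the window size, and the threshold are hypothetical choices made for this example; Excite’s actual software and concept tags are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical vocabulary of words "tagged" as signalling one concept.
# Excite's real concept tags are not known; these are illustrative only.
CONCEPT_TERMS = {
    "medicine": {"heart", "coronary", "artery", "blood", "cholesterol", "stroke"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def concept_frequency(tokens: list[str], terms: set[str]) -> Counter:
    """Count how often each tagged term occurs in the token list."""
    return Counter(t for t in tokens if t in terms)


def is_about(text: str, concept: str, window: int = 10, min_terms: int = 3) -> bool:
    """Hypothetical rule: the text is "about" the concept if at least
    `min_terms` distinct tagged terms occur within `window` consecutive words."""
    terms = CONCEPT_TERMS[concept]
    tokens = tokenize(text)
    for start in range(len(tokens)):
        nearby = set(tokens[start:start + window]) & terms
        if len(nearby) >= min_terms:
            return True
    return False


text = "High cholesterol can narrow a coronary artery and starve the heart of blood."
print(concept_frequency(tokenize(text), CONCEPT_TERMS["medicine"]))
print(is_about(text, "medicine"))  # True: three tagged terms within ten words
```

Requiring the terms to appear near each other, rather than anywhere in the document, is what separates a page that is “about” medicine from one that mentions a single medical word in passing.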

For example, the word heart, when used in a medical/health context, is likely to appear with such words as coronary, artery, lung, stroke, cholesterol, pump, blood, attack, and arteriosclerosis. If the word heart appears in a document with other words such as flowers, candy, love, passion, and valentine, a very different context is established, and a concept-oriented search engine returns hits on the subject of romance.
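The heart example can be sketched as a simple context vote: count how many words from each candidate context appear in the document and pick the context with the most matches. The two word lists come straight from the paragraph above; the function name and the tie-breaking rule are hypothetical, and a real concept engine would weight terms statistically rather than simply counting them.

```python
import re
from typing import Optional

# Context vocabularies taken from the heart example in the text.
CONTEXTS = {
    "medicine/health": {"coronary", "artery", "lung", "stroke", "cholesterol",
                        "pump", "blood", "attack", "arteriosclerosis"},
    "romance": {"flowers", "candy", "love", "passion", "valentine"},
}


def classify_context(text: str) -> Optional[str]:
    """Return the context whose words appear most often in the text,
    or None when no context word appears at all.
    Ties go to the context listed first (a hypothetical choice)."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {name: sum(w in vocab for w in words) for name, vocab in CONTEXTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify_context("A heart attack is often caused by a blocked coronary artery."))  # medicine/health
print(classify_context("Flowers, candy and a valentine card to win her heart."))  # romance
```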

How Does PageRank Work in Google?

Thursday, June 19th, 2008

1. PageRank is only one of numerous methods Google uses to determine a page’s relevance or importance.
2. Google interprets a link from page A to page B as a vote, by page A, for page B. Google looks at more than the sheer volume of votes: among some 100 other aspects, it also analyzes the page that casts the vote. However, those other aspects don’t count when PageRank is calculated.
3. PageRank is based on incoming links, but not just on their number: relevance and quality also matter, measured by the PageRank of the sites that link to a given site.
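4. PR(A) = (1 - d) + d (PR(t1)/C(t1) + … + PR(tn)/C(tn)) is the equation that calculates a page’s PageRank. Here t1 … tn are the pages that link to page A, C(t) is the number of outbound links on page t, and d is a damping factor, usually set to 0.85.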
5. Not all links carry the same weight when it comes to PR.
6. If a page with a PR of 8 has a single outbound link, the page it links to receives a fair amount of PR value. If the same page has 100 outbound links, each link passes on only a fraction of that value.
7. Bad incoming links don’t have an impact on PageRank.
8. Google’s ranking of a page’s popularity considers site age, backlink relevancy, and backlink duration. PageRank doesn’t.
9. Content is not taken into account when PageRank is calculated.
10. PageRank does not rank web sites as a whole, but is determined for each page individually.
11. Each inbound link contributes to the overall total, except links from banned sites, which don’t count.
12. Actual PageRank values don’t range from 0 to 10; PageRank is a floating-point number. The 0 to 10 scale is what the Google Toolbar displays.
13. Each PageRank level is progressively harder to reach. The displayed PageRank is believed to be on a logarithmic scale.
14. Google recalculates pages’ PageRank continuously, but the values shown in the Google Toolbar are updated only once every few months.
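The equation in item 4 can be tried out on a toy link graph. The sketch below is a minimal Python version of that textbook formula, not Google’s production system; it assumes the damping factor d = 0.85 suggested in Brin and Page’s original paper, a fixed number of iterations instead of a convergence test, and hypothetical page names. It also reflects items 6, 10 and 12: every page gets its own floating-point value, and a page’s vote is divided among its outbound links.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Repeatedly apply PR(A) = (1 - d) + d * (PR(t1)/C(t1) + ... + PR(tn)/C(tn)),
    where t1..tn are the pages linking to A and C(t) is t's number of outbound links.

    `links` maps each page to the list of distinct pages it links to.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    pr = {p: 1.0 for p in pages}  # every page starts with the same guess
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each linking page t splits its vote evenly over its C(t) outbound links.
            incoming = sum(pr[t] / len(targets)
                           for t, targets in links.items() if page in targets)
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# A tiny hypothetical three-page web; with d = 0.5 the values settle at 14/13, 10/13 and 15/13.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}, d=0.5)
print({p: round(v, 4) for p, v in sorted(ranks.items())})

# Item 6: the same voting page passes far less to each target when it has 100 links.
one_link = pagerank({"hub": ["x"]})["x"]                          # about 0.2775
hundred = pagerank({"hub": [f"p{i}" for i in range(100)]})["p0"]  # about 0.1513
print(one_link, hundred)
```

Recomputing every page from the previous round’s values keeps the code close to the equation; a large-scale implementation would more likely use sparse matrix operations and stop once the values change by less than some tolerance.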