Posts Tagged ‘URLs’

Search Engine Optimization: Searching by Means of Search Engines

Tuesday, July 15th, 2008

This is where things start to get complicated.
Search engines are trickier than they look! You’ll discover this the first time you enter a query on C++, the programming language. At least some of the Web search engines will essentially say, “Huh?”

C++ is not a word. It’s a letter followed by two characters that might, depending on the index, be regarded merely as punctuation. Many text search engines have trouble handling input of this type. Many don’t deal too well with numbers, either. So much for “007,” “R2D2,” or “Catch-22.”
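To see why this happens, here is a minimal sketch in Python of two ways an index might split a query into terms. The function names and the character rules are my own assumptions for illustration; no real search engine is claimed to work exactly this way.

```python
import re


def naive_tokens(query):
    """Keep only runs of letters; digits and punctuation are thrown away."""
    return re.findall(r"[a-z]+", query.lower())


def lenient_tokens(query):
    """Also keep digits and a few symbols (+, #, -) as part of a term."""
    return re.findall(r"[a-z0-9+#-]+", query.lower())


if __name__ == "__main__":
    for q in ["C++", "007", "R2D2", "Catch-22"]:
        print(q, naive_tokens(q), lenient_tokens(q))
```

With the letters-only rule, “C++” shrinks to the single letter “c” and “007” disappears entirely, so there is nothing useful left to look up. The lenient rule keeps each of these queries as one term.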

Important Note: This problem is no longer as bad as it used to be. I’m now finding relevant hits for C++ on a majority of search engine sites.

Here’s another example of a text string search engines hate: To be or not to be. Just about anyone who finished junior high school will be able to tell you where the phrase comes from and (possibly!) what it means. But some search engines choke because all the words in the phrase are stop words, i.e., unimportant words too short and too common to be considered relevant strings on which to search. However, if you enclose the query in quotation marks, forcing the search engine to find the words “to be or not to be” in that precise order, most search engines can recognize the phrase as a famous quotation from Hamlet.
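The stop-word problem is easy to reproduce. The sketch below is only an illustration: the stop-word list and the index_terms function are made up for this example, and real engines use much longer lists and more elaborate query parsing.

```python
# A tiny, hypothetical stop-word list; real lists are far longer.
STOP_WORDS = {"to", "be", "or", "not", "the", "a", "of", "and"}


def index_terms(query):
    """Return the terms a simple engine would actually search for.

    A query wrapped in double quotes is kept whole as one exact phrase.
    Otherwise the query is split into words and stop words are dropped.
    """
    query = query.strip()
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        return [query[1:-1].lower()]
    return [w for w in query.lower().split() if w not in STOP_WORDS]


if __name__ == "__main__":
    print(index_terms("To be or not to be"))
    print(index_terms('"To be or not to be"'))
```

Without quotation marks every word is filtered out and the engine has nothing left to search on. With them, the whole phrase survives as a single term.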

Let’s take a less obvious example. Suppose you’re a fan of murder mysteries and you want to search the Web for the home pages of all your favorite authors in that genre. If you simply enter the words “mystery” and “writer,” most search engines will return hyperlinks to all Web documents that contain the word “mystery” or the word “writer.” This will probably include hundreds, or even thousands, of URLs, most of which will have no relevance to your search. If you enter the words as a phrase, however, you stand a better chance of getting some good hits.

However, as search technology advances, this is not as much of a problem as it was a couple of years ago. Many search engines will now automatically apply the “adjacency” operator when responding to a two-word query. This means that they will indeed look for documents in which your two words appear next to each other.
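The difference between the two behaviors can be sketched in a few lines of Python. The sample documents and the function names here are invented for illustration, and the matching is deliberately simplistic (lowercase words split on spaces).

```python
def words(text):
    """Split text into lowercase words on whitespace."""
    return text.lower().split()


def matches_any(doc, terms):
    """OR search: the document contains at least one of the terms."""
    doc_words = set(words(doc))
    return any(term in doc_words for term in terms)


def matches_adjacent(doc, terms):
    """Adjacency search: the terms appear next to each other, in order."""
    doc_words = words(doc)
    n = len(terms)
    return any(doc_words[i:i + n] == terms
               for i in range(len(doc_words) - n + 1))


# Made-up sample documents.
DOCS = [
    "The mystery writer Agatha Christie",
    "A writer solves the mystery of the missing cat",
    "Unsolved mystery of the pyramids",
]

if __name__ == "__main__":
    terms = ["mystery", "writer"]
    print([d for d in DOCS if matches_any(d, terms)])
    print([d for d in DOCS if matches_adjacent(d, terms)])
```

The OR search returns all three sample documents, while the adjacency search keeps only the one where “mystery” is immediately followed by “writer.”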

If you understand how search engines organize information and run queries, you can maximize your chances of getting hits on URLs that matter.

Long URLs Break Layout

Friday, June 20th, 2008

While setting up a site to display a news RSS feed, I found that tables don’t handle extra long URLs very well. They stretch the TD cell and break your design. So much for doing markup with tables.
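One possible workaround, which is not something the feed or the table does for you, is to shorten the visible link text before it goes into the cell while leaving the real address in the href. Here is a rough Python sketch; the function names and the 40-character limit are arbitrary choices for this example.

```python
import html


def shorten_for_display(url, max_len=40):
    """Trim the middle of a long URL so the visible text fits in max_len."""
    if max_len < 5:
        raise ValueError("max_len must be at least 5")
    if len(url) <= max_len:
        return url
    keep = max_len - 3          # leave room for the "..." marker
    head = keep - keep // 2     # first part gets the extra character
    tail = keep // 2
    return url[:head] + "..." + url[-tail:]


def link_html(url, max_len=40):
    """Build an anchor tag: full URL in href, shortened URL as the text."""
    return '<a href="{}">{}</a>'.format(
        html.escape(url, quote=True),
        html.escape(shorten_for_display(url, max_len)),
    )


if __name__ == "__main__":
    print(link_html("http://example.com/" + "a" * 100))
```

The link still points at the full address, but the text the browser has to lay out never exceeds the chosen width.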

Implement Smart URLs

Thursday, June 19th, 2008

The best URL structure for blogs is, in my opinion, as short as possible while still containing enough information to make an educated guess about the content you’ll find on the page. I don’t like the lengthy, 10-hyphen blog titles that are the byproduct of many CMS plugins, but they are certainly better than any dynamic parameters in the URL. Yes - I know I’m not walking the talk here, and hopefully it’s something we can fix in the near future. To those who say that one dynamic parameter in the URL doesn’t hurt, I’d take issue - just rewriting a ?ID=450 to /450 has improved search traffic considerably on several blogs we’ve worked with.
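As a rough illustration of the ?ID=450 to /450 rewrite, here is a small Python sketch. The clean_url function and the example.com addresses are hypothetical, and this only shows the URL transformation itself; the server still has to be configured to answer the clean address, which is not shown.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def clean_url(url, param="ID"):
    """Turn .../?ID=450 into .../450 when ID is the only query parameter.

    Anything else (extra parameters, a non-numeric ID, no ID at all)
    is returned unchanged.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    values = query.get(param)
    if (not values or len(query) != 1 or len(values) != 1
            or not values[0].isdigit()):
        return url
    path = parts.path.rstrip("/") + "/" + values[0]
    return urlunsplit((parts.scheme, parts.netloc, path, "", parts.fragment))


if __name__ == "__main__":
    print(clean_url("http://example.com/?ID=450"))
    print(clean_url("http://example.com/blog/?ID=450"))
```

I’ve kept the rule conservative on purpose: a URL with more than one parameter is left alone rather than guessed at.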

Most spam comes from just 6 botnets

Monday, June 16th, 2008

With the impact on Mega-D’s operations, Srizbi has now taken over as the leader of the spam pack, responsible for nearly 40% of spam. Srizbi is well known as a spamming Trojan, and an advanced one at that. As we reported here, lately Srizbi has been particularly active in distributing spam with URLs that link to websites hosting more copies of the spambot. Analysis of Srizbi indicates it is extremely stealthy, operating in full kernel mode, which, among other things, allows it to hide its network activities and bypass sniffer tools. One interesting thing we noticed about Srizbi is that it provides continuous feedback and statistics to control servers about which email addresses were good and which were bad.

How to Make a WordPress Blog Duplicate Content Safe

Monday, June 16th, 2008

In one of my recent posts I wrote about the duplicate content issue. This topic is especially important to me since my blog uses the WordPress content management system which, when used with the default configuration, is not duplicate content proof. In fact, this CMS is capable of rendering almost 100% of your content duplicate. As usual, the fault of the system has its roots in its advantages. WordPress has many features facilitating blogging and linking, such as RSS feeds to posts and comments, trackback URLs, monthly archives and so on. At the same time, this variety of URLs returning similar or identical pages represents a clear case of duplicate content.