As the number of internet users and available web pages worldwide continues to grow exponentially, ranking highly on search engines becomes ever more important. Consequently, many spammers, and special-interest groups wishing to spread a particular message, have developed more sophisticated techniques for "cheating" their way to the top of search results. One of the newer methods exploits sites that return a 200 status code for their error pages, getting the spammer's unsolicited content into the index of Google (or another search engine) so that it generates traffic the next time a user searches for that term. For more details about this technique, see this previous blog post on the subject.
However, a more interesting phenomenon is the recent adoption of this method by political organizations and other non-commercial action groups. For example, searching Google for the phrase "Save Us From Berlusconi" (see Image 1) returns countless results of this kind, apparently the work of individuals and organizations opposed to the Italian Prime Minister trying to get their message across.
This came to our attention when the messages appeared in the search engine indices for the site Transposh. These indexed pages can also show up even when no one has searched for them specifically (see Image 2), a trend noticed by the Colnect administrator who originally reported the problem.
This relatively new spamming method could undermine the legitimacy of search engine results and make some users think twice before clicking even the top link in their results list. Google and the other major search engines need to stop it before it becomes more prevalent and compromises the integrity of their search results entirely.
Colnect, Connecting Collectors. Colnect offers revolutionary services to collectors the world over. Colnect is available in 63 languages and offers extensive collectible catalogs, the easiest personal collection management, and Auto-Matching for deals. Join us today :)
Wednesday, December 23, 2009
Saturday, December 5, 2009
New SPAM technique? "warning_this_is_english_domain_to_solve_this_problem_submit_site_in_atoall.com.html"
It's not uncommon to see weird requests coming to my server at Colnect, but I found this one interesting because it came from Googlebot, the crawler Google uses to index the web for its search engine.
The request made by the bot was for the URL:
/warning_this_is_english_domain_to_solve_this_problem_submit_site_in_atoall.com.html
Needless to say, this URL never existed on my domain. The actual atoall . com page, titled "Hot girls pictures free games boys images local news all", made me suspect spam.
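Requests like this one are easy to miss in a busy access log. As a rough sketch, assuming the server writes the common Apache/Nginx "combined" log format, a few lines of Python can list the URLs that a client calling itself Googlebot asked for and that the server answered with 404. The function name googlebot_misses is made up for illustration.

```python
import re

# Assumes the Apache/Nginx "combined" log format:
# IP ident user [time] "METHOD PATH PROTOCOL" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_misses(lines):
    """Yield the paths that a client calling itself Googlebot got a 404 for."""
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines written in some other format
        # The user-agent can be forged; confirming real Googlebot traffic
        # would also need a reverse DNS lookup of m["ip"].
        if "Googlebot" in m["agent"] and m["status"] == "404":
            yield m["path"]
```

Passing it an open log file (for example a hypothetical access.log) yields one path per suspicious request, which is how a URL like the atoall one would stand out.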
Searching Google for this URL (warning_this_is_english_domain_to_solve_this_problem_submit_site_in_atoall.com.html) currently returns 106,000 results, which means Google has indexed that many pages that don't actually exist on other domains. Some very well-known domains have this URL indexed on Google.
How does it happen?
Well, some sites are configured never to return a proper 404 status code, which would let bots and people know that the page was not found on the server. Instead, they return a 200 code, which tells bots and browsers that the page was found, while the page content shown to the user says that what they were looking for could not be found. Such pages are often called "soft 404s". Most users would never notice the difference between a 404 and a 200 code.
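A site can still show a friendly "not found" message without misleading crawlers: it only has to send that same page with a 404 status. Below is a minimal sketch using Python's standard http.server; the PAGES dictionary and the respond/serve names are hypothetical stand-ins for a real site's routing.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical content: the only pages that really exist on this site.
PAGES = {
    "/": b"<h1>Welcome</h1>",
    "/about.html": b"<h1>About us</h1>",
}

# A friendly page for humans; what matters to bots is the status code.
NOT_FOUND_BODY = b"<h1>Sorry, the page you requested was not found.</h1>"


def respond(path):
    """Return (status, body) for a request path, ignoring any query string."""
    body = PAGES.get(urlsplit(path).path)
    if body is None:
        # Same friendly message a soft-404 site would show, but with a
        # real 404 status so crawlers know the URL doesn't exist.
        return 404, NOT_FOUND_BODY
    return 200, body


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = respond(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    # Blocks forever; call this to try the handler locally.
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

After calling serve(), a request for any made-up path (checked with, say, curl -i) returns the friendly message together with a 404 status, so a crawler has no reason to index it.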
So why do they generate a 200 code?
Well, it makes search bots, like Google's, index a page whose content includes what a user searched for. The next time someone searches for the same term on a search engine, there is a chance they'll land on that site's page. Also, since some browser plug-ins can "steal" 404 pages by replacing them with their own custom results, returning a 200 code prevents that.
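To see why this is attractive to spammers, here is a hypothetical soft-404 page that turns the requested file name into a readable "search term" and echoes it back with a 200 status. The post doesn't say exactly how the affected sites build their error pages, so soft_404 is an assumption for illustration only (str.removesuffix needs Python 3.9 or later).

```python
import html
from urllib.parse import unquote, urlsplit


def soft_404(path):
    """Anti-pattern (hypothetical): a "not found" page served with status 200."""
    # Take the last path segment of the requested URL, e.g. "foo_bar.html".
    name = unquote(urlsplit(path).path).rsplit("/", 1)[-1]
    # Turn it into a readable "search term": "foo bar".
    term = name.removesuffix(".html").replace("_", " ")
    # The page says nothing was found, yet the status says it was, so a
    # crawler may index it, and the spammer's text along with it.
    body = f"<h1>No results for {html.escape(term)}</h1>"
    return 200, body
```

Fed the URL Googlebot requested from Colnect, this page reads "No results for warning this is english domain to solve this problem submit site in atoall.com" and is served as an ordinary 200 page, which is exactly the kind of content that ends up in the index.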
Why shouldn't they generate a 200 code?
The downside of returning such pages is that they are obviously exploited by spammers such as atoall . com and others seeking illegitimate sources of traffic. According to Alexa, the site has been gaining traffic since August, and it would come as no surprise if this unusual form of spamming Google's search engine had a lot to do with it.
Another issue is that search engines may choose to penalize sites that return the wrong status codes. A search engine can easily detect this by requesting randomly generated URLs that shouldn't exist and checking whether the site answers with 200 instead of 404.
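The random-URL check is simple to reproduce. This sketch makes up a path that shouldn't exist on any real site and reports whether the server answers it with 200. The function names are hypothetical, and this is not a description of how Google actually detects soft 404s. The fetcher is passed in as a parameter so the logic can be tried without touching the network.

```python
import secrets
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Fetch url and return the final HTTP status code (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is still available.
        return err.code


def random_probe_url(base_url):
    # 32 random hex characters: a path no real site should have.
    return f"{base_url.rstrip('/')}/{secrets.token_hex(16)}.html"


def returns_soft_404(base_url, fetch_status=http_status):
    """True if the site answers a made-up URL with 200 instead of an error."""
    return fetch_status(random_probe_url(base_url)) == 200
```

Calling returns_soft_404("http://example.com") makes one real request. Because urlopen follows redirects, a site that redirects unknown URLs to a page returning 200 is flagged as well.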
So now my only question is: why hasn't Google already penalized atoall . com and removed it from its search results?