It's not uncommon to see weird requests coming to my server at Colnect, but I found this one interesting since it came from GoogleBot, the bot used by Google to index the web for its search engine.
The request made by the bot was for the URL:
/warning_this_is_english_domain_to_solve_this_problem_submit_site_in_atoall.com.html
Needless to say, this URL never existed on my domain. Seeing the actual page of atoall . com, with the title "Hot girls pictures free games boys images local news all", made me suspect spamming.
Searching for this URL on Google currently gets 106,000 results for warning_this_is_english_domain_to_solve_this_problem_submit_site_in_atoall.com.html, which means that Google has indexed that many pages that don't really exist on other domains. Some very well-known domains have this page URL indexed on Google.
How does it happen?
Well, some sites are configured to never return a proper 404 code, which would let bots and people know the page is not found on their server. They prefer returning a 200 code, which tells bots and browsers the page was found, while the page's content, displayed to the user, says that what the user was looking for was not found. Most users would never know the difference between getting a 404 and a 200 code.
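To make the difference concrete, here is a minimal Python sketch. It is not any particular site's code: the `respond` function, the `PAGES` dictionary and the `soft_404` flag are hypothetical names made up for illustration. The point is only that the body the user sees can be identical while the status code the bot sees differs.

```python
from http import HTTPStatus

# Hypothetical site content (an assumption for this sketch): path -> HTML body.
PAGES = {"/": "<h1>Home</h1>", "/about.html": "<h1>About</h1>"}

NOT_FOUND_BODY = "<h1>Sorry, we could not find that page</h1>"


def respond(path, soft_404=False):
    """Return (status_code, body) for a requested path.

    With soft_404=False a missing page gets a real 404.
    With soft_404=True the same "not found" text is sent with a 200,
    which is the misconfiguration described above.
    """
    if path in PAGES:
        return HTTPStatus.OK.value, PAGES[path]
    status = HTTPStatus.OK if soft_404 else HTTPStatus.NOT_FOUND
    return status.value, NOT_FOUND_BODY


if __name__ == "__main__":
    bogus = "/some_page_that_never_existed.html"
    print(respond(bogus))                 # proper behaviour
    print(respond(bogus, soft_404=True))  # what a bot would happily index
```

A human reading the page sees the same apology in both cases; only the second one invites a search bot to index the made-up URL.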
So why do they generate a 200 code?
Well, it makes search bots, like Google's, index a page whose content matches what a user searched for. The next time a user searches for the same term on a search engine, there is a chance they'll land on that page. Also, since some browser plug-ins can "steal" 404 pages by replacing them with their own custom results, returning a 200 code prevents that.
Why shouldn't they generate a 200 code?
The downside of returning such pages is the obvious spamming by sites such as atoall . com and others which seek illegitimate sources of traffic. According to Alexa, the site has been gaining traffic since August, and it wouldn't come as a surprise if this unique form of spamming Google's search engine has a lot to do with it.
Another issue is that the search engine may choose to penalize sites which return the wrong results. The search engine can easily know if that is the case by requesting randomly generated page URLs.
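The random-URL check is easy to sketch in Python. This is only an illustration of the idea, not how any search engine actually does it: the function names, the number of probes and the `.html` suffix are all my own assumptions. The `fetch` parameter exists so the check can be tried without touching the network.

```python
import secrets
import urllib.error
import urllib.request


def random_path():
    """Build a path that almost certainly does not exist on any site."""
    return "/" + secrets.token_hex(16) + ".html"


def fetch_status(url):
    """Return the HTTP status code for url, including error codes like 404."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is what we are after.
        return err.code


def returns_soft_404(base_url, fetch=fetch_status, probes=3):
    """True if every random, nonexistent URL on the site comes back as 200."""
    base = base_url.rstrip("/")
    return all(fetch(base + random_path()) == 200 for _ in range(probes))
```

For example, `returns_soft_404("http://example.com")` would probe three random URLs on that host. Note that `urlopen` follows redirects, so a site that redirects every missing page to its home page would also be flagged.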
So now my only question is: how come Google hasn't already penalized atoall . com and removed it from their search results?