Saturday, December 5, 2009

New SPAM technique? "warning_this_is_english_domain_to_solve_this_problem_submit_site_in_atoall.com.html"

It's not uncommon to see weird requests hitting my server at Colnect, but this one caught my attention because it came from Googlebot, the crawler Google uses to index the web for its search engine.

The request made by the bot was for the URL:
/warning_this_is_english_domain_to_solve_this_problem_submit_site_in_atoall.com.html

Needless to say, this URL never existed on my domain. A look at the actual atoall . com site, titled "Hot girls pictures free games boys images local news all", made me suspect spam.

A Google search for warning_this_is_english_domain_to_solve_this_problem_submit_site_in_atoall.com.html currently returns 106,000 results, which means Google has indexed that many pages that don't actually exist on the other domains. Some very well-known domains have this URL indexed on Google.


How does it happen?



Well, some sites are configured never to return a proper 404 (Not Found) status code, which is how bots and people learn that a page doesn't exist on the server. Instead, they return a 200 (OK) code, which tells bots and browsers that the page was found, while the page content shown to the user says that what they were looking for could not be found. Most users would never notice the difference between a 404 and a 200 response.
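To make the difference concrete, here is a minimal sketch of a Python WSGI app. The page list, the messages, and the `soft_404` switch are hypothetical; this is not how Colnect or any particular site is built. In both modes the visitor sees the same "not found" page; only the status code sent to the client changes, and that code is the signal a bot like Googlebot is meant to rely on to decide whether a page exists.

```python
# Hypothetical content for the pages that really exist on the site.
KNOWN_PAGES = {
    "/": b"<h1>Home</h1>",
    "/coins": b"<h1>Coins</h1>",
}

NOT_FOUND_BODY = b"<h1>Sorry, the page you were looking for was not found.</h1>"


def make_app(soft_404: bool):
    """Build a WSGI app. With soft_404=True, missing pages get "200 OK"
    (the behaviour described above); otherwise they get "404 Not Found"."""

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        body = KNOWN_PAGES.get(path)
        if body is not None:
            status = "200 OK"
        else:
            # The visitor sees the same "not found" page in both modes;
            # only the status code sent to the client differs.
            body = NOT_FOUND_BODY
            status = "200 OK" if soft_404 else "404 Not Found"
        start_response(status, [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app


def status_for(app, path: str) -> str:
    """Call the app directly (no network) and return the status line it sends."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    b"".join(app(environ, start_response))
    return captured["status"]


spam_path = "/warning_this_is_english_domain_to_solve_this_problem_submit_site_in_atoall.com.html"
print(status_for(make_app(soft_404=True), spam_path))   # 200 OK
print(status_for(make_app(soft_404=False), spam_path))  # 404 Not Found
```

To try it in a browser, either app can be served with the standard library's `wsgiref.simple_server.make_server`. With `soft_404=True`, the atoall URL above comes back as a normal 200 page, which is exactly what lets a crawler index it.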

So why do they generate a 200 code?



Well, it makes search bots such as Google's index a page whose content contains what a user was searching for. The next time someone searches for the same term on a search engine, there is a chance they'll land on that site. Also, since some browser plug-ins can "steal" 404 pages by replacing them with their own custom results, returning a 200 code prevents that.

Why shouldn't they generate a 200 code?



The downside of returning such pages is that they are easily abused by sites such as atoall . com and others that seek illegitimate sources of traffic. According to Alexa, the site has been gaining traffic since August, and it wouldn't come as a surprise if this unusual form of spamming Google's search engine has a lot to do with it.

Another issue is that search engines may choose to penalize sites that return the wrong status code. A search engine can easily detect this by requesting randomly generated page URLs: if a URL that could not possibly exist comes back with a 200 code, the site is not returning proper 404s.
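The random-URL check is simple to sketch. The following Python illustrates the idea only; it is not how Google actually does it, and the function names, the number of probes, and the `.html` suffix are arbitrary assumptions. It requests a few long random paths that almost certainly don't exist and flags the site if any of them comes back with a 200.

```python
import secrets
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str) -> int:
    """Return the HTTP status code of a GET request to url."""
    try:
        # urlopen follows redirects, so a redirect to a page that returns
        # 200 (e.g. the home page) is also reported as 200 here.
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is what we want.
        return err.code


def looks_like_soft_404(base_url: str,
                        fetch: Callable[[str], int] = fetch_status,
                        probes: int = 3) -> bool:
    """Request a few random paths that should not exist on base_url.
    A 200 for any of them suggests the site never returns a real 404."""
    for _ in range(probes):
        # 32 random hex characters: practically guaranteed not to be a real page.
        path = "/" + secrets.token_hex(16) + ".html"
        if fetch(base_url.rstrip("/") + path) == 200:
            return True
    return False
```

Calling `looks_like_soft_404("http://www.example.com")` makes real requests; passing a stub function as `fetch` lets the logic be checked without touching the network.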

So now my only question is: why hasn't Google already penalized atoall . com and removed it from its search results?

Thursday, December 3, 2009

Google Analytics Asynchronous Tracking

Since Colnect uses Google Analytics to measure our traffic, we were happy to learn about the change to its tracking script. Announced two days ago and now implemented on Colnect, the new script loads asynchronously and therefore doesn't block other page elements from loading. This should result in slightly faster load times and a better user experience on the site.

So now the question is: when will such asynchronous code be available for AdSense? I see no reason why the ads shouldn't load only after the page has been rendered.

Tuesday, December 1, 2009

Note to Facebook Collectors of Coins, Banknotes, Stamps and other Collectibles

Stamps? Coins? Banknotes? Phone Cards? Bottle Caps? Tea Bags?

Although we only recently publicized our Facebook fan page for collectors, we already have to change it due to limitations on Facebook's side. The former page name was "colnect.com", which is not a bad name, but the page never appeared in Facebook's search results when searching for "colnect". Contacting Facebook's support got no answer, and trying to rename the page also failed, so we've now opened a new page "" and welcome all collectors to join us there.


Did you like reading it? Stay in the loop via RSS. Thanks :)