Wednesday, April 15, 2009

Invalid URL Requests From Legitimate Bots

In an earlier post I mentioned that I had no idea why legitimate bots such as GoogleBot request invalid URLs that no link on the site (or in the sitemap) points to.

Now I have a partial answer for the nonexistent URLs presented in that post. Some time ago, a Twitter account for Colnect editors was opened: @ColnectEdits. It automatically tweets about edits made to Colnect's catalogs so that other collectors can track them.



An interesting thing you can see in the attached picture is that the links generated by the tweets are displayed as http://colnect.com/en/phone... but actually link to the correct full URLs, such as http://colnect.com/en/phonecards/item/id/9212. So it seems that web crawlers read both the displayed text and the link target as legitimate URLs and try to fetch each of them. Since GoogleBot does not seem to want to learn that /en/phone returns a 404 from Colnect, I am now forced to add these as legitimate URLs on my site to avoid seeing more 404s in my logs. Oh well...
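To illustrate the guess above, here is a minimal Python sketch of how a naive crawler could end up with both URLs from a single tweet link. This is only an assumption about crawler behavior, not how GoogleBot actually works; the `LinkCollector` class, the `collect_candidates` function and the `TWEET_HTML` sample are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical naive crawler: collects href targets and also any
    anchor text that itself looks like a URL."""

    def __init__(self):
        super().__init__()
        self.candidates = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            href = dict(attrs).get("href")
            if href:
                # The real link target.
                self.candidates.append(href)

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_anchor and text.startswith("http://"):
            # The displayed text looks like a URL too. Dropping the
            # trailing "..." leaves a truncated path.
            self.candidates.append(text.rstrip("."))


def collect_candidates(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.candidates


# Assumed shape of a tweet link: full URL in href, shortened text shown.
TWEET_HTML = (
    '<a href="http://colnect.com/en/phonecards/item/id/9212">'
    "http://colnect.com/en/phone...</a>"
)

for url in collect_candidates(TWEET_HTML):
    print(url)
```

A crawler built like this would queue both the full item URL and the truncated /en/phone URL, which would explain the 404s in the logs.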
