In a former post I've mentioned that I have no idea how come invalid URLs for which no link on the site (nor sitemap) exists are being tried by legitimate bots such as GoogleBot.
Now I have a partial answer for the non existing URLs presented in the post. Some time ago, a twitter account for Colnect editors has been opened @ColnectEdits. It automatically twits about edits done on Colnect's catalogs so that other collectors may track it.
An interesting thing that you can see in the attached picture is the the links generated by the tweets are shown as http://colnect.com/en/phone... but actually do link to the correct full URLs, such as http://colnect.com/en/phonecards/item/id/9212. So it seems that the web crawlers read both as legitimate URLs and try to fetch them. Since it seems GoogleBot does not want to learn that /en/phone returns 404 from Colnect, I am now forced to add these as legitimate URLs to my site to avoid seeing more 404s in my logs. Oh well...
Colnect, Connecting Collectors. Colnect offers revolutionizing services to Collectors the world over. Colnect is available in 63 languages and offers extensive collectible catalogs and the easiest personal collection management and Auto-Matching for deals. Join us today :)
Wednesday, April 15, 2009
Subscribe to:
Post Comments (Atom)
Link and Search
Did you like reading it? Stay in the loop via RSS. Thanks :)
There's nothing wrong with a 404
ReplyDelete