Sunday, April 12, 2009

When Web Crawlers Attack

Web crawlers, or search bots, are very popular beasts of the Internet. They allow your site to be automatically scanned and indexed. The main advantage is that people may find your site through these indexes and visit it. The main disadvantages are that your content is copied somewhere else (where you have no control over it) and that the bots take up your server resources and bandwidth.

On my site for collectors, I have created a pretty extensive robots.txt file to keep the nicer bots from scanning parts of the site they shouldn't and to block semi-nice bots entirely. In addition, I added server rules to block some of the less-than-nice bots out there.
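To check what a robots.txt file actually permits before relying on it, Python's standard urllib.robotparser can evaluate the rules offline. This is only a sketch: the rules and paths below are made up for illustration and are not Colnect's real robots.txt, and real crawlers may interpret edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only (not the site's actual file).
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)
```

With these sample rules, ia_archiver is refused everywhere, while other bots are only kept out of the hypothetical /private/ area.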

The biggest problem left unanswered is what to do when the supposedly nice bots attack your site. The web's most popular bot is probably GoogleBot, created and operated by Google. Obviously, it brings traffic and is a good bot that should be allowed to scan the site. However, more and more frequently I see the bot requesting URLs that NEVER existed on the site. On top of that, since the site supports 35 languages, the bot even made up language-specific URLs. For some reason, it decided I should have a /en/phone page and so it also tries to fetch /es/phone, /de/phone and so on.

So why is that so annoying? Two main reasons:

1/ It appears in my logs. I check these for errors and end up spending time on it.
2/ The bot is not giving up on these URLs although a proper 404 code is returned. It tries them over and over and over and over again.
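To see how bad the repeated requests are without reading the logs by hand, a small script can count them. This is a minimal sketch that assumes the server writes the common Apache/Nginx "combined" log format (the post does not say which format is in use); the function name and parameters are made up for illustration.

```python
import re
from collections import Counter

# Assumption: "combined" log format, e.g.
# 192.0.2.1 - - [12/Apr/2009:10:00:00 +0000] "GET /es/phone HTTP/1.1" 404 512 "-" "agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def repeated_404s(lines, agent_substring="Googlebot", min_hits=2):
    """Map each path to its 404 count for one bot, keeping only repeat offenders."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        if match["status"] == "404" and agent_substring in match["agent"]:
            hits[match["path"]] += 1
    return {path: count for path, count in hits.items() if count >= min_hits}
```

Passing an open log file as `lines` gives a short list of made-up URLs to deal with, instead of pages of 404 entries.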

Any suggestions? Seems to me that modifying robots.txt with 35 new URLs each time GoogleBot makes up a URL isn't the easiest solution.
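If robots.txt does end up being the answer, the 35 lines per made-up URL could at least be generated rather than typed. The sketch below assumes the /<language>/<page> layout seen in the /en/phone example; the function name and the short language list are illustrative only.

```python
def disallow_rules(pages, languages, user_agent="*"):
    """Build a robots.txt group with one Disallow line per language and page."""
    lines = [f"User-agent: {user_agent}"]
    for page in pages:
        for lang in languages:
            lines.append(f"Disallow: /{lang}/{page}")
    return "\n".join(lines)


# Illustrative subset; the real site has 35 language codes.
print(disallow_rules(["phone"], ["en", "es", "de"]))
```

One caveat: Disallow rules match by prefix, so a rule for /en/phone would also block any real page whose path starts with the same characters (a hypothetical /en/phonecards, for example). That alone may make this approach riskier than it looks.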

The problem is not unique to GoogleBot. I have completely blocked Alexa's ia_archiver, which is making up URLs like crazy.

Are there any reasons for inventing NEVER-existing URLs? Probably broken HTML files or invalid links from somewhere. Sometimes, wrong interpretation of JavaScript code (do they really HAVE TO follow every nofollow link as well???) seems to be the reason.

2009/04/15 - Read the update

Tuesday, April 7, 2009

Colnect Rising on Compete


Though I post updates about trends in site metrics for Colnect, I'm not really sure what they mean, as they don't always coincide with my Analytics results. You're welcome to check Colnect's rankings on Compete. It has risen 34% in the last month. Pretty nice :)

Sunday, April 5, 2009

Gmail turns 5 - still BETA??? Colnect will not follow.

Gmail's official blog announced that Gmail is celebrating its 5th birthday. 5 years is not a short amount of time. However, Gmail is still in BETA. It seems that Google has changed the common meaning of "BETA" from "publicly available product about to go fully public when final fixes and additions are made" into "fully fledged public product that is expected to sometimes fail and we won't take responsibility for it when it does".
Google even created the 'beta' mark trend in logos of companies and services.

I personally find it ridiculous and unfair to the customers. Of course products sometimes fail, but we cannot abuse the term "BETA" for 5 (FIVE!!!) years.

Colnect has been marked as beta for less than 6 months since it went public before all key features were ready and prior to proper testing. Raising a site from grass-roots up is not a simple task. However, as of today, since Colnect is relatively stable and many of its key features (a lot more is to come but I'll elaborate on that another time) are ready and publicly available, the BETA mark will be removed.
Yes, my system may sometimes fail. Yes, it's not as perfect as I'd like it to be. However, it's public, it's working, and it makes many of the people using it happy, so it's not a beta anymore.

Thursday, April 2, 2009

Buying and selling collectibles

A recent addition for premium members of Colnect is the buy and sell lists. You can read all about them here.

Buy List / Sell List

These lists are available with Premium Membership. Unlike Custom Personal Lists, collectibles added to these lists appear on the Collectors inventory information section of each single collectible item page.

When adding collectibles to these lists, it is best to put the relevant price in the public note box. We suggest using world-popular currencies and writing their 3-letter code rather than their symbol. Example: USD is always the US dollar, but the $ sign has different meanings in different countries.

NOTE! Prices you quote must be valid. You may add details regarding trades on your personal page under My Account. Complaints received regarding invalid prices (for example: you offered to sell an item for a certain price but later asked for a higher price) will be investigated. If you are found dishonest, your Colnect account may be deactivated without any refunds.

Japanese and Lithuanian languages added

Colnect is now available in 35 languages. The latest two additions are Japanese and Lithuanian.

Translations on Colnect are performed manually by volunteering translators who are members of the site. Whenever a phrase is not properly translated they can translate it easily. It's all explained here.

A recent addition is the use of automated suggestions. When a phrase has not yet been translated, it is first shown using an automated suggestion. An icon still tells the translator that the phrase needs to be translated (or confirmed). The suggestions are intended to cover the period between the publication of new content on Colnect (which happens quite often) and the moment a translator actually gets to translate it.
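The fallback order described above can be sketched roughly as follows. This is not Colnect's actual code; the names (Phrase, lookup, needs_review) and the dictionary-based storage are assumptions made purely to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    text: str
    needs_review: bool  # True means the translator icon should be shown


def lookup(key, manual, suggestions, source):
    """Prefer a manual translation, then an automated suggestion, then the source text."""
    if key in manual:
        return Phrase(manual[key], needs_review=False)
    if key in suggestions:
        # Automated suggestion: usable for now, but still flagged for a human.
        return Phrase(suggestions[key], needs_review=True)
    return Phrase(source[key], needs_review=True)
```

Keeping the "needs review" flag next to the text means a phrase can be displayed right away while still showing up in the translator's to-do list.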

Yes, automated translations sometimes suck really badly. For example, "FREE trial - 1 month" had a Hebrew suggestion that can be translated back to English as "Free trial - 1 year". What?!?! How did a month become a year? That is quite dangerous and I hope these mistakes are not too frequent. I hope that the automated suggestions often "get over the net", meaning the reader understands them even while recognizing them as improper language use.

Japanese is currently the only language for which Colnect has no translator yet, and so we rely on the automatic suggestions. It's a sort of pilot to see if it can attract Japanese collectors, and hopefully one of them will agree to become a translator. If this experiment succeeds, other languages may be added this way. A warning message will be displayed for languages that are not completely manually translated.

You're welcome to check Transposh for translation solutions.

Tuesday, March 31, 2009

Twitter fails as a promotional tool?

Colnect joined Twitter less than a week ago using a few Twitter profiles:
* Colnect news Twitter @colnect
* A personal Twitter for Colnect's manager @AmirWald
* Automated feed reporting new collectibles in Colnect's catalogs @ColnectCatalogs
* Automated feed on Colnect's catalog edits @ColnectEdits

During these days, 28 visits came to Colnect from Twitter, a meager amount in comparison to the number of "followers" and the energy invested. The bounce rate (visitors seeing a single page and leaving the site) was incredibly high as well. In comparison, a few posts on a relevant forum resulted in hundreds of relevant visits (with a much lower bounce rate).

It seems a lot of people use Twitter to self-promote, and so it's more of a bubble where the "followers" count is only a rough indication of the number of people who will actually read anything you write. My guess is that for most people, only a small percentage of their "followers" actually read more than 5% of their tweets. Though some people think of it as a useful personal tool, it doesn't seem like they dominate Twitter.

Though less than a week may be too short a time for a verdict, results so far are very unsatisfying. In the future, the automated feeds may be of use to some of the addicted collectors on Colnect. Let's see what the future brings.

Wednesday, March 25, 2009

Colnect on Twitter

Giving in to the fad? Possibly...
Easier than blog posts? Obviously...
Useful? hmmm.....

Anyway, you're welcome to follow the official Colnect on Twitter.
All public updates regarding Colnect may be there before anywhere else.
