Saturday, September 20, 2008

Performance: MySQL, APC, memcached.

A highly important issue of any notable website is performance. You may have created the best website in the world but if it dies under load, you're gonna lose customers. User experience is very important today and having a slow website doesn't help at all.

Optimization is, however, not a trivial issue and requires expertise in different fields. There are so many different places where you can optimize that it's not always that easy to know what to focus on. Though this post will adhere to its title I'll still list here where optimization can occur in a website.

* Correct usage of HTTP headers to make client browsers request less information.
* Smaller responses (gzip / more CSS, less HTML / use of Ajax to return partial content instead of reloading complete pages).
* Optimization of your server machine(s) hardware AKA "I need more CPU, I need more memory" and "how much is another 1U?".
* Server software optimizations: the webserver (such as Apache), the scripting engine (such as PHP), the DBMS (such as MySQL) and cache engines (such as memcached, APC) could and should be tweaked heavily. Failing to define an appropriate index in your DBMS, or making a wrong choice about where and when the webserver saves user sessions, for example, could take a heavy toll.
* Network optimizations: anyone said CDNs?

The fun part is that all these parts are well entangled.

I've read an interesting post about preferring the MySQL cache over the popular memcached in some situations. Though it was pretty much one-sided (ignoring the overhead of a database connection), it raised some interesting points and is well worth reading.

An advantage of the DBMS that I consider relevant is greater flexibility. For example, suppose you allow outdated information (such as statistics) to persist and want it updated about every 5 minutes. If you cache it for 5 minutes, it'll expire, and then several threads may query the database at the same time to regenerate it. If you use a MEMORY table for this information, you can read it and, if it has expired, set a write lock that causes other threads to keep reading the expired information until it's fully updated.
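
The "serve the expired value while one thread refreshes it" idea can be sketched roughly as follows. This is only an illustration under assumptions: the class and method names are made up, a plain PHP array stands in for the MEMORY table, and a boolean flag stands in for a real lock (in MySQL that role could be played by something like GET_LOCK()). It needs PHP 5.5+ for try/finally.

```php
<?php
// Hypothetical in-memory stand-in for a MEMORY table plus a per-key lock.
class StaleWhileRefreshing
{
    private $rows = array();  // key => array('value' => ..., 'expires' => timestamp)
    private $locks = array(); // key => true while some worker is refreshing

    // Returns true if the caller obtained the refresh lock for $key.
    public function tryLock($key)
    {
        if (!empty($this->locks[$key])) {
            return false;
        }
        $this->locks[$key] = true;
        return true;
    }

    public function unlock($key)
    {
        unset($this->locks[$key]);
    }

    // $now is passed in explicitly so the behavior is easy to follow and test.
    public function get($key, $ttl, $now, $recompute)
    {
        $row = isset($this->rows[$key]) ? $this->rows[$key] : null;
        if ($row !== null && $row['expires'] > $now) {
            return $row['value']; // still fresh
        }
        $gotLock = $this->tryLock($key);
        if (!$gotLock && $row !== null) {
            return $row['value']; // someone else is refreshing: serve the stale value
        }
        // Either we hold the lock, or there is no value at all to serve yet.
        try {
            $value = call_user_func($recompute);
            $this->rows[$key] = array('value' => $value, 'expires' => $now + $ttl);
        } finally {
            if ($gotLock) {
                $this->unlock($key);
            }
        }
        return $value;
    }
}
```

With a 300-second TTL, a reader that arrives after expiry while another worker holds the lock gets the old statistics back immediately instead of hitting the database again.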

Another, older post about performance showed some interesting benchmarks. The biggest problem with relying on others' benchmarks is that a single parameter that differs on your system can make your results totally different. For example:
* A new version of a product has just changed everything about it.
* A configuration option made a product completely flunk its benchmark tests.
* Your queries may not be similar at all to what is tested (though you may think it is).

So these were my 2c about performance for now. The bottom line is simple: there's always a part of your system that's not properly optimized. The best approach is to find the painful spots and remedy them while keeping an overall view of what your system has to provide.

Friday, September 12, 2008

Colnect V2 alpha site is up for the Prague Fair

During 12-14/9/2008 a big international collectors fair is being held in Prague. To allow collectors to preview the new version of Colnect, which includes a vast database of stamps, the alpha site has been opened and is available here.

At the moment it is NOT yet considered stable and is meant only to give a taste of what Colnect will be. Hopefully, it'll be ready by the end of the month and the current Colnect will be replaced by the new, improved one.

There are many new things in Colnect V2, but perhaps the most important one for current Colnect members is the addition of versatile filters, which allow collectors to easily find the items they're looking for and match them with the collections of other collectors.

Updates to follow...

Thursday, September 11, 2008

The "Language Icon" initiative

Colnect is currently available in 25 languages and so there should be an easy way to let users choose their preferred language. To facilitate this, there's currently a big part of the welcome screen that shows the names of the different languages. The reason is that it's highly important that users see their language available when first visiting the site, since for many people using their native tongue greatly improves usability.

Having a big box with all language names is something I can get away with on the main page but not on every page of Colnect. The problem is not with registered members (who will have their preferred language loaded as they log in) but with new visitors. For this reason there's currently a selection box on the top and side menu which allows changing the language quickly on every page.

A small issue remains: what do you write in this selection box? Currently, the English word 'Language' appears there. The word itself could have been translated to every language, but seeing this word in a language you probably don't understand (if you understand it, why would you change your language?) won't be very helpful. This is not ideal but I have to assume every Internet user knows at least a tiny bit of English (sorry all, but English is the web's most international language).

I've considered the option of using flags but have ruled it out because:

* Flags represent countries, not languages. Consider English, which is widely spoken in the US, UK and Canada. On the other hand, consider Canada, which has both English and French as official languages.
* Adding 25 flag icons to every page is an extra communication load with no good justification.

A solution?

An interesting project I've come across is the 'Language Icon'. They've decided to create an international icon to mean the word "language". Here it is:

[image: Classic Icon 32 x 32]

It's supposed to look like a tongue [UPDATE: it has radically changed since this post was made!], though personally I don't find it resembling a tongue. If it catches on, however, it could be of great use to websites and applications around the world. Kudos for the idea! I've already added this icon to Colnect V2, about to be released to the public soon, where you can find it on the side menu on internal pages.

Thursday, September 4, 2008

How traffic changed from PR0 to PR4

More than a month ago, Colnect's PageRank changed to PR4. Now's the time for some statistics taken directly from Google Analytics deployed on Colnect.

Comparing the last 2 weeks with the 2 weeks before the change shows 25% more traffic from Google. But what's more interesting is that there's 68% more traffic from Live and 58% more from Yahoo. So the PageRank probably did make a difference, but are Yahoo and Live taking their information from Google? Perhaps it was vice versa and I just never stumbled upon tools to test my ranking with these search engines due to the lesser amount of traffic they bring.

Doctrine v1.0 is finally out

Colnect V2 (including stamps and more collectibles) is now almost ready to be shown in alpha and that's why it's such good news that Doctrine v1.0 has been released.

Doctrine is a PHP ORM that is nicely integrated with Symfony. It allows defining your database schema easily with YAML files. The database and PHP classes can then be automatically generated to provide you with all the needed functionality of database interaction.

Although IMO some rough edges remain in Doctrine (most importantly the i18n support), I hope it'll be able to work properly on the new Colnect. Code developed with an ORM is surely much easier to maintain than raw SQL. I expect Doctrine to keep growing stronger and more stable in the near future as the ideas behind it are very useful and needed.

Monday, August 25, 2008

Adjusting CSS to RTL languages

Writing the new version of Colnect from scratch, I've decided to start using more CSS and fewer HTML table tags where possible. Truth is I'm still not sure that this decision will hold as CSS still seems immature to me. Yes, it's been around for years and it has many proponents, but sometimes you really have to work hard to do something which should have been easy to express in any design language. One such issue is RTL (right-to-left) languages, such as Hebrew and Arabic.

HTML supports the dir attribute to allow one to easily change from a left-oriented design to a right-oriented one. In CSS, however, it seems the matter has not been taken into serious consideration. When you have a CSS float, for example, you can choose whether it floats left or right, but there's no way for you to say something simple like left and left-fixed. IMHO, left should have changed to right on RTL languages while left-fixed would have always kept left. The same goes for specifying the 4 dimensions like in 'padding: 1px 2px 3px 4px;'. The second (right) and fourth (left) values should be switched unless the directive fixed is added.
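
Since CSS has no such switch built in, the swap for the four-value shorthand has to be done on the stylesheet text itself. Here is a minimal sketch of that idea in PHP; the function name is made up for illustration, it only looks at padding and margin, and it only understands simple numeric values and auto, so treat it as a starting point rather than a complete solution.

```php
<?php
/**
 * Swap the right and left values of four-value padding/margin shorthands,
 * e.g. "padding: 1px 2px 3px 4px" becomes "padding: 1px 4px 3px 2px".
 * Shorthands with fewer than four values are symmetric and are left alone.
 * (Hypothetical helper, not part of any existing script.)
 */
function CssSwitchFourPartShorthand($css)
{
    // A single value: a number with an optional unit, or "auto".
    $value = '(-?[\d.]+(?:[a-z%]+)?|auto)';
    $pattern = '/\b(padding|margin)\s*:\s*'
        . $value . '\s+' . $value . '\s+' . $value . '\s+' . $value
        . '(?=\s*(?:;|\}|!|$))/i';
    // Order is top, right, bottom, left: keep $2 and $4, swap $3 and $5.
    return preg_replace($pattern, '$1: $2 $5 $4 $3', $css);
}

echo CssSwitchFourPartShorthand('div { padding: 1px 2px 3px 4px; }');
// prints: div { padding: 1px 4px 3px 2px; }
```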

But since CSS doesn't do that well, a developer from Google has created a Python script called CSSJanus which tries to address many issues relevant for converting a CSS file from a left-oriented one to a right-oriented one. Its code is available here.

Since Colnect is built using PHP, I've decided to only use a few ideas from the CSSJanus code and integrate them into the JS/CSS combinator already in use. The idea is quite simple: when the application is right-to-left (RTL) oriented, it asks for a different CSS file by prefixing a directive to the requested CSS file name, which tells the combinator to apply the conversion.

You can start with the combinator script code here.

These two lines at the top of the script detect the RTL directive and strip it from the requested file list:
$bRTL = (substr($_GET['files'], 0, 4) == 'rtl_');
if ($bRTL) $_GET['files'] = substr($_GET['files'], 4);


Now the cache hash should be different for the RTL version, so there's a slight modification here:
$hash = $lastmodified . '-' . md5($_GET['files'].($bRTL ? 'RTL' : ''));
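
To see how these two changes fit together, here is a small self-contained sketch. The two function names are hypothetical wrappers around the lines above (the real combinator works directly on $_GET['files']), and the file names and timestamp are made-up sample values.

```php
<?php
// Hypothetical helper: split an optional "rtl_" prefix off the requested file list.
function ParseCombinatorRequest($files)
{
    $bRTL = (substr($files, 0, 4) == 'rtl_');
    if ($bRTL) {
        $files = substr($files, 4);
    }
    return array($bRTL, $files);
}

// Hypothetical helper: build the cache hash the same way as the line above.
function CombinatorCacheHash($lastmodified, $files, $bRTL)
{
    return $lastmodified . '-' . md5($files . ($bRTL ? 'RTL' : ''));
}

list($bRTL, $files) = ParseCombinatorRequest('rtl_main.css,menu.css');
// $bRTL is now true and $files is 'main.css,menu.css'.

$hashRtl = CombinatorCacheHash(1219622400, $files, true);
$hashLtr = CombinatorCacheHash(1219622400, $files, false);
// The two hashes differ, so the converted and the original CSS are cached separately.
```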


And the last thing to do is to call the left-to-right conversion function just after the CSS comments are stripped. Add this:
if ($bRTL) $contents = CssSwitchLeftToRight($contents);


And here's my simple conversion function (that does NOT cover many cases covered by CSSJanus):
/**
 * Switch left to right and vice versa for a few of the cases relevant for CSS.
 *
 * @param string $str CSS source
 * @return string CSS with left and right swapped
 */
function CssSwitchLeftToRight($str) {
    // Order matters: one side is parked on a temporary token (TOK1, TOK2)
    // so that the second replacement does not undo the first.
    $arConversionSeq = array(
        '/-left/' => 'TOK1',
        '/-right/' => '-left',
        '/TOK1/' => '-right',
        '/float\s*:\s*left/i' => 'TOK2',
        '/float\s*:\s*right/i' => 'float:left',
        '/TOK2/' => 'float:right',
    );
    foreach ($arConversionSeq as $pattern => $replacement) {
        $str = preg_replace($pattern, $replacement, $str);
    }
    return $str;
}
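
For comparison, here is a sketch of the same swap done in a single pass with preg_replace_callback, which avoids the TOK placeholders. This is an alternative I'm only suggesting, not the code running on the site: the function name is made up, it needs PHP 5.3+ for the closure, and it behaves slightly differently from the function above (it keeps the original whitespace after "float:", matches case-insensitively, and skips words such as "-leftover").

```php
<?php
/**
 * Swap left/right in "-left", "-right", "float: left" and "float: right"
 * in one pass. (Hypothetical variant of the conversion function.)
 */
function CssSwitchLeftToRightSinglePass($str)
{
    return preg_replace_callback(
        '/(-|float\s*:\s*)(left|right)\b/i',
        function ($m) {
            // $m[1] is the "-" or "float:" part, $m[2] is the side to flip.
            $swapped = (strtolower($m[2]) == 'left') ? 'right' : 'left';
            return $m[1] . $swapped;
        },
        $str
    );
}

echo CssSwitchLeftToRightSinglePass('#menu { float: left; margin-right: 4px; }');
// prints: #menu { float: right; margin-left: 4px; }
```

Because every match is replaced exactly once, text that has already been swapped is never matched again, which is what the tokens take care of in the version above.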


I have not posted the entire script here since it has site-specific modifications on my site. You're welcome to comment here if further clarifications are needed.

Saturday, August 9, 2008

How many collectible phone cards are there?

I'm sorry but this blog post is not going to answer that question.

When I took over Colnect (previously known as Islands Phonecards Database) we had ~30,000 collectible phone cards in our database. Less than a month ago I wrote that "Collectible phone cards catalog has passed 100,000 items" but as of today I'm happy to announce that Colnect's catalog has just passed the 110,000 mark.

It seems that not only is the database growing, but that its growth rate is on the rise. So when will it stop? Obviously it'll start slowing down when most of the collectible phone cards in the world are already listed on Colnect. Just how many are there? As I know we're still missing some tens of thousands of Brazilian and Chinese cards, my estimate ranges somewhere between 200,000 and 1,000,000 different collectible phone cards. The range is wide due to the unpredictable nature of variants. A card may be listed once, but then an expert collector notes that there were small variations between the different prints, and one card becomes 20 different variants, all with different collectible value.

So when will this race stop? Let's wait and see...
