Archive for the 'Technical' Category

OpenSocial for Fab Swingers

Wednesday, October 31st, 2007

There’s a fascinating article over at TechCrunch about Google OpenSocial. The basic idea seems to be that Google will provide a standard API for social networks so that developers only need to support one API.

This is fantastic news for us. A site like Fab Swingers is obviously much too small to attract a development community, but through OpenSocial all we’ll need to do is implement a standard API and then our users will have access to a huge range of products.

Overall I suspect that OpenSocial will be a big win for the smaller social network sites.

Performance to the next level: memcached

Sunday, October 28th, 2007

I think that I’ve more or less reached the limit of my MySQL optimisation. There are always ways that I can speed up queries here and there, but I’m not going to get a 10x speed-up through this path. So I am now looking at reducing the number of database queries through a cache of some sort.

As suggested by a comment on this blog I checked out memcached. I have to say I was very impressed. I love the fact that it’s used by Slashdot (see their write up) and LiveJournal amongst others. It’s also great that it’s so simple and doesn’t even have a configuration file!

I had been thinking about doing the cache at the level of my Python application. However the problem with that is that I run multiple processes for the application and in the future I may well want to run the application on more than one physical server.

memcached comes with a pure Python interface; however, I shall be using the C Python interface, which is reportedly 3 times faster, because I’m in this for the speed!

I’ve not decided how to use memcached though: I could cache data structures (Python dictionaries probably) or I could cache whole pages. Following my recent MySQL optimisations and the removal of awstats I’ve bought myself quite a bit of time, so I won’t need to roll out memcached until traffic doubles, which should give me another couple of months.
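A minimal sketch of the "cache data structures" option, using the usual read-through pattern: look in the cache first and only go to the database on a miss. Everything named here is an assumption for illustration: DictCache is an in-process stand-in that copies only the get/set calls of a memcached client so the example runs without a server, and load_member_from_db, the key format and the 300 second expiry are hypothetical.

```python
class DictCache(object):
    """In-process stand-in exposing only the get/set subset of a memcached client."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Like a memcached client, return None on a miss.
        return self._store.get(key)

    def set(self, key, value, time=0):
        # The expiry is ignored in this stand-in; a real memcached honours it.
        self._store[key] = value
        return True


def load_member_from_db(member_id):
    # Hypothetical placeholder for the real MySQL query.
    return {"id": member_id, "name": "member-%d" % member_id}


def get_member(cache, member_id, loader=load_member_from_db):
    """Return the member dictionary, touching the database only on a cache miss."""
    key = "member:%d" % member_id
    member = cache.get(key)
    if member is None:
        member = loader(member_id)
        cache.set(key, member, time=300)
    return member
```

Because the cache lives outside the application processes, every process (and later every server) would share the same entries, which is the thing an in-application cache can't do.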

PS: I think that I may disable the MySQL query cache. I’m seeing 500,000 hits versus 14,000,000 inserts, so I suspect that the cache is making performance worse. The problem is that my member table changes so frequently that the cache probably ends up being invalidated before it can be used.
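To put those two query cache figures side by side, here is the arithmetic as a quick sketch. The function name is made up, and I'm assuming the two numbers are the cache's hit and insert counters.

```python
def hits_per_insert(hits, inserts):
    """How many times, on average, each cached result was reused."""
    return hits / float(inserts)


ratio = hits_per_insert(500000, 14000000)
inserts_per_hit = 1 / ratio

print("hits per insert: %.4f" % ratio)
print("inserts per hit: %.0f" % inserts_per_hit)
```

That works out at roughly one hit for every 28 results stored, so most entries are written and thrown away without ever being read.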

MyISAM performance with text columns

Monday, October 15th, 2007

I’ve now deployed with the new table structure. It turns out that one of the big things that was dragging down performance was tables with ‘text’ columns. I’ve read around the subject and the general consensus seems to be that text and blob columns will both cause severe loss of performance.

As an example, a count(*) on the message table went from 400 ms to 0 ms once I removed the text column. I’ve now shifted the text column into a separate message_detail table and am only querying it when absolutely necessary.
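The shape of that split can be sketched as follows. This uses SQLite in memory purely so the example runs anywhere, so it shows the layout and not the MyISAM speed-up. Apart from the message and message_detail table names, every column name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Narrow table: only the fixed-size columns needed for counts and listings.
    CREATE TABLE message (
        id INTEGER PRIMARY KEY,
        sender_id INTEGER NOT NULL,
        recipient_id INTEGER NOT NULL,
        sent_at INTEGER NOT NULL
    );
    -- Detail table: the bulky text lives here, one row per message.
    CREATE TABLE message_detail (
        message_id INTEGER PRIMARY KEY REFERENCES message(id),
        body TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO message VALUES (1, 10, 20, 1192438800)")
conn.execute("INSERT INTO message_detail VALUES (1, 'Hello there')")

# Counting and listing only touch the narrow table.
message_count = conn.execute("SELECT COUNT(*) FROM message").fetchone()[0]

# The text is fetched by key, and only when a message is actually opened.
body = conn.execute(
    "SELECT body FROM message_detail WHERE message_id = ?", (1,)
).fetchone()[0]
```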

Here’s what ‘show status’ was showing before I made the changes:

Created_tmp_disk_tables 1839073
Created_tmp_tables 3880113

And here’s what the same command is showing now:

Created_tmp_disk_tables 1231
Created_tmp_tables 303910

So there has been an absolutely huge reduction in the proportion of temporary tables which are now being written to disk.
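Working out the proportions from the two sets of counters makes the change concrete. This is just the arithmetic on the figures above; the function name is made up.

```python
def disk_share(disk_tables, all_tables):
    """Percentage of temporary tables that ended up on disk."""
    return 100.0 * disk_tables / all_tables


before = disk_share(1839073, 3880113)
after = disk_share(1231, 303910)

print("before: %.1f%% of temporary tables on disk" % before)
print("after:  %.2f%% of temporary tables on disk" % after)
```

So roughly 47% of temporary tables were going to disk before the change, against well under 1% afterwards.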

This has been a good start to the MySQL optimisation but there’s still more to do.

Statistics snafu

Friday, October 12th, 2007

Today I installed YSlow, a pretty cool tool for spotting speed problems with webpages. It’s based on the 14 Rules for High Performance Websites.

Anyway, the results showed that I need to get the cache expire headers running on lighttpd and that I wasn’t gzipping my content on lighttpd either. Then I went onto the Swinging Forums and it popped up with a warning about a JavaScript error. It turns out that I hadn’t installed the Google Analytics code properly.

I’ve been a bit surprised by how poorly the forum has been performing: it seems fairly busy but it never shows up on the stats. Of course, now I know that’s because it was being ignored! I’ve probably been underestimating my traffic by a fairly significant amount.

UPDATE: Just noticed that I’ve also missed the Google Analytics code from the chatroom. Some people spend hours and hours in the chatroom so I suspect that we’ll see a huge increase in the reported length of time people are staying on the site.

Performance revisited

Tuesday, October 9th, 2007

The good news is that traffic to Fab Swingers has continued to rocket. Following my recent work the Python side is fixed and the bottleneck has now moved to MySQL.

Today I improved the performance of the homepage. The homepage shows the 10 most recent couples, 5 most recent women and 5 most recent men. I was just using a LIMIT but I’ve now improved the WHERE clause so that the limit is performed on far fewer records which has sped everything up pretty nicely.

However the more I think about it, the more I think that the main issue I have is the structure of some of the tables. For example the main member table should be split vertically so that a lot of the detail (particularly the profile text column) gets shifted into a detail table. There are always going to be some scans and sorts that won’t use indexes, so if the rows in the table are much narrower (and in MyISAM’s static format) then those would be much faster.

I also need to revisit my indexing strategy; much more sorting needs to be done through the indexes. It is a challenge on the search page because there are so many possibilities, however I’m sure the 80/20 rule applies so I should be able to knock off the 80%.

Finally, I think that I need to do a bit of caching in the application. Profiles and the homepage are all moderately static so don’t really need to be created dynamically for every view.

I’m not beating myself up over the performance; I’m managing 15 hits a second to CherryPy/MySQL on a low power server which is also handling many more hits to the static content. I think that with a bit more work I can extend the life of this hardware by another few months but it’s clear that I will need a hardware investment this calendar year (probably buying a second cheap server rather than upgrading the current server to something more powerful).

Looking at the recent numbers I think that the challenge for year two is not going to be marketing but getting the technical infrastructure to keep pace with the traffic growth whilst keeping within the financial constraints of the advertising-funded model.

XHTML Friends Network

Friday, September 14th, 2007

I just added support to Fab Swingers for XHTML Friends Network or XFN. It’s a microformat that uses the rel attribute on link tags to add some extra information so that you can map human relationships.

I’ve added it to all the profiles so that the friends links use the XFN tags.
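As a rough illustration, a friend link carrying XFN values could be generated like this. The helper name, URL and label are hypothetical, and this is a sketch of the idea, not the site's actual template code. The set of values is the vocabulary from the XFN 1.1 profile.

```python
from html import escape

# Relationship values defined by XFN 1.1.
XFN_VALUES = {
    "contact", "acquaintance", "friend", "met",
    "co-worker", "colleague", "co-resident", "neighbor",
    "child", "parent", "sibling", "spouse", "kin",
    "muse", "crush", "date", "sweetheart", "me",
}


def xfn_link(url, label, relationships):
    """Build an anchor tag whose rel attribute lists the XFN relationships."""
    unknown = set(relationships) - XFN_VALUES
    if unknown:
        raise ValueError("not XFN values: %s" % ", ".join(sorted(unknown)))
    rel = " ".join(relationships)
    return '<a href="%s" rel="%s">%s</a>' % (
        escape(url, quote=True), rel, escape(label))


print(xfn_link("/profile/alice", "Alice", ["friend", "met"]))
```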

There’s more about XFN over at http://gmpg.org/xfn/.

I’m not sure how much practical use this is but I like experimenting with new technologies and it’s all about trying to make Fab Swingers the best adult social networking site. I think the next technology I should look at is OpenID.

mod_wsgi deployment up and running!

Saturday, September 8th, 2007

It took a lot longer than planned but I now have the mod_wsgi deployment up and running and the performance is fantastic. Even during the busy evening the site was just lightning fast.

The server load is a little bit high so I guess that I’ll need to do something when the traffic doubles but that will be a few months away at least.

I’m still not even thinking about a new server though; the next step will be to work out some sort of cache strategy. But for now I can forget about performance and start thinking about features and marketing again.

mod_wsgi Problems Continued

Friday, September 7th, 2007

The last deployment of the mod_wsgi version of the site failed again. This time I finally learnt the lesson and went to the bother of setting up a better test environment. Rather than simply using my Mac OS X laptop to test prior to deployment on Debian I am now testing on Debian too. I’m also making liberal use of Apache benchmark to at least do some basic testing under load.

The first problem I had was that I was ending up with an enormous number of database connections again, which eventually crashed the server. This turned out to be a problem with my code. I am now just using a global variable to store the MySQL connection object, which seems to be working fine.

The second problem was that the server failed around half the time under load with simultaneous connections. No error messages that I could find were generated. This took about half a day to track down and eventually I found out that the problem was actually down to Python 2.4’s randint method from the random standard library. This was quite a surprise but when I Googled it, it turned out that other people have had similar problems.
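The global-variable approach amounts to something like the sketch below: open the connection the first time it is needed and hand the same object back afterwards, so each process holds at most one. The function name is made up, and sqlite3 stands in for the MySQL driver here only so that the example runs without a database server.

```python
import sqlite3

# Module-level variable: one connection per process, created on first use.
_connection = None


def get_connection():
    """Return the process-wide connection, opening it only once."""
    global _connection
    if _connection is None:
        # Stand-in for the real MySQL connect call.
        _connection = sqlite3.connect(":memory:")
    return _connection
```

With this in place the number of open connections is capped by the number of application processes, however many requests each one serves. It does assume that requests within a process don't use the connection at the same time.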

In my case I was just trying to generate a random number so that I could choose between one of three adverts to display, so I’m now just doing datetime.datetime.now().second % 3, which is absolutely fine under load, and I don’t really care that it isn’t very random.
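The clock-based choice can be sketched as below. It assumes the intent is the current wall-clock second, read through datetime.datetime.now() (a bare datetime.time() constructs midnight, whose second is always 0). The advert names and the function name are hypothetical.

```python
import datetime

ADVERTS = ["advert-a", "advert-b", "advert-c"]  # hypothetical placeholders


def pick_advert(now=None):
    """Rotate through the adverts using the current second of the clock."""
    if now is None:
        now = datetime.datetime.now()
    return ADVERTS[now.second % len(ADVERTS)]
```

Every request in the same second gets the same advert, which is fine for spreading impressions roughly evenly over time.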

The performance numbers I’m seeing from Apache benchmark are now in line with the numbers that Plenty of Fish have published, although I’m very aware that there’s a huge difference between performance under test conditions and in real life.

Fingers crossed that the deployment tomorrow morning goes to plan!

Problems with mod_python deployment

Tuesday, September 4th, 2007

I went ahead with the mod_python deployment at 6.30 this morning as planned; however it didn’t work out so I had to roll back.

The problem was that I ran out of MySQL database connections. I’ve got 120 processes and many of these processes will need MySQL connections. However only 1 in 10 requests are for CherryPy; the others are for static content, so I could easily get by with only 12 processes if I was serving my static content on another server.

It’s also quite a waste of resources to be loading up lots of heavy Apache processes with both mod_php and mod_python when the vast majority of the time they are just doing images.

So, the new plan is to use lighttpd for the static content and then an Apache with ServerLimit and MaxClients set to 12.

I’ll try again tomorrow.

Switch to new code base tomorrow!

Monday, September 3rd, 2007

I’ve decided to switch the site over to the new code base tomorrow. This means that we’ll have moved away from SQLObject and will be deploying under mod_python.

I am slightly concerned that there are still issues with the new code; ideally it would have been tested more thoroughly. However it is so staggeringly improved that I think it is worth moving even if there are some problems. Page generation time has dropped from anything up to 8000 ms down to 60 ms!

It looks like it’s the removal of the CherryPy WSGI middleware which is responsible for most of this speed-up. I started by deploying under mod_wsgi, which initially looked great, but under load it failed to show any improvement over my original setup. I think that the fault lies with CherryPy’s WSGI so the solution is to bypass it completely with mod_python.

I’ll get up early and do it first thing, which is the quietest time. If there are any problems I will just revert.