Thursday, February 14, 2013

Server Load Balancing Simulation

I read an interesting (to me) post about scaling a web application on the Heroku stack, and how the authors ended up with capacity issues that came not from the servers, but from the load balancing router.

Heroku recently changed the router: instead of sending each request to the next available server, it now just picks a server from the pool at random and hands the request over without blocking. In this case each server has its own blocking queue, so a request can wait behind a busy server while other servers sit idle. The justification from Heroku was that for a non-blocking server (like Node.js) this wouldn't be an issue, and should improve utilization.

But the folks at RapGenius use Ruby on Rails, which is blocking, meaning each of their servers can process just one request at a time.
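To make the difference between the two routing policies concrete, here is a minimal sketch in Python. It is not the RapGenius model. The function name, the parameter names, and every number in it are my own assumptions: 10 servers, Poisson arrivals, and exponentially distributed processing times, chosen only because they are easy to write down. Each server handles one request at a time, as in the blocking case described above.

```python
import random

def simulate(policy, n_servers=10, n_requests=20000,
             arrival_rate=8.0, mean_service=1.0, seed=1):
    """Return the average time a request waits before a server starts on it.

    All defaults are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)            # arrivals and service times
    route_rng = random.Random(seed + 1)  # separate stream for random routing
    free_at = [0.0] * n_servers          # time at which each server is next idle
    clock = 0.0
    total_wait = 0.0

    for _ in range(n_requests):
        clock += rng.expovariate(arrival_rate)         # next arrival time
        service = rng.expovariate(1.0 / mean_service)  # processing time

        if policy == "random":
            # Pick any server; the request queues behind whatever is there.
            i = route_rng.randrange(n_servers)
        elif policy == "next_available":
            # Pick the server that will be idle soonest (one shared queue).
            i = min(range(n_servers), key=free_at.__getitem__)
        else:
            raise ValueError("unknown policy: " + policy)

        start = max(clock, free_at[i])   # wait if the chosen server is busy
        free_at[i] = start + service
        total_wait += start - clock

    return total_wait / n_requests

if __name__ == "__main__":
    for policy in ("next_available", "random"):
        print(policy, round(simulate(policy), 3))
```

Both policies see exactly the same stream of requests, so any difference in the average wait comes from the routing alone. With these made-up parameters the random policy should show a much longer average wait, which is the effect the RapGenius folks were complaining about. Their actual numbers depend on their own request time distribution, which this sketch does not try to reproduce.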

So I was reading the article, and they were complaining that they now needed to increase the number of servers in order to meet the demand on their website. I was starting to get antsy when they seemed to be speculating, but then they showed results from a simulation they ran in R, and that made me really happy, especially since they were able to put together statistics for their request processing times and identify a distribution.

I can't help but think how easy this model would be to build, run, and analyze in any of the discrete event simulation tools we use, rather than in R. Then again, R is free to use and our favorite discrete event simulators are not so much. Still, I wonder how many of these startups would ever consider using something like Simio to model their website infrastructure, and ultimately help themselves scale up.

Here's the post.