The post Script for launching load balanced EC2 auto scaling group appeared first on Mr Kirkland.
]]>The auto scaling group is launched and shut down with the scripts attached below. In our implementation the instances were a simple phpthumb installation that we processed images via http request. Whilst our solution worked, it wasn’t as robust as blitline, partly because it made no use of queuing (as I mentioned in my scalability post), plus blitline is still pretty cost effective.
However I still have these two handy scripts for spinning up a load balanced group of servers (and shutting them down) which I think are rather handy, so today I’m releasing them on to wilds of the world wide internetz.
Download: http://yumiko.theartistsweb.net/data/asg-scripts.tgz
INSTALLATION
0) you need installed the AWS command line tools http://aws.amazon.com/developertools/2535 docs http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/UsingTheCommandLineTools.html
1) chmod 755 the scripts
# chmod 755 aws-start-as-group.sh
# chmod 755 aws-shut-down-as-group.sh
2) edit the config at the top of each script
USAGE
1) edit config at the top of the aws-start-as-group.sh script
* set your AMI and AVAILABILTY_ZONES
* tweak INSTANCE_TYPE, MAX + MIN INSTANCES depending on your estimated needs (NB number of MIN_INSTANCES will constantly be running)
* for finer tuning experiment with the COOLDOWN and CPU values
* if you are going to run multiple autoscaling groups then you might need to edit the BASE_NAME for each time you run the script (though it auto creates a name based on the date and hour)
2) execute with
# aws-start-as-group.sh
3) terminate all resources with
# aws-shut-down-as-group.sh $BASE_NAME
NB $BASE_NAME is set in the config and when you run this script it is echoed at the end, it's a prefix and todays date
The post Script for launching load balanced EC2 auto scaling group appeared first on Mr Kirkland.
]]>The post Preparing Your Website/Web App For Scalability appeared first on Mr Kirkland.
]]>This is a good problem to have (server heat is proportional to # of users). And it’s not so difficult to deal with if make a few preparations in advance.
After launching a number of web services and viral social media apps, some of which grew to hundreds of concurrent users and zillions of hits within days, I’ve had to learn “on the job” about scaling websites. Here I present to you my thoughts and some simple suggestions on how you can prepare ahead of time.
Really the most important principal to take on board is modular design. If you look at any high volume platform you’ll quickly see evidence of how the components of the service are split up into separate independent units. Wikipedia is classic example of this, cheek out the architecture behind wikipedia.
If you can separate all the different components of your website/web app, then you can easily apply more resources where they are needed. Just upgrading to a bigger server won’t get you far, long term you need to be able to identify where the bottle necks are and apply resources efficiently. In fact even identifying the bottlenecks can be surprisingly hard, modular design makes this much easier.
So some examples of the typical components your website/web app will be using:
Separating these components needn’t be rocket science, here’s some simple examples of how you could apply modularity to the components above:
In addition to thinking modular, always be monitoring and profiling your systems then you know exactly where the bottlenecks are and can deal with them more effectively.
Typically there may be parts of your application that involve heavy processing. If these don’t need to be real time, or can tolerate slight delays then separate the processes and make use of queuing/batching so your web application isn’t held up.
Google Analytics is a good example of this, generally there’s no need to have real time webstats, so the collected stats are batch processed at regular intervals. Another example could be image processing, I’ve written about this in more detail here.
Taking modularization a step further and we have partitioning i.e. splitting data up in to smaller manageable chunks which can be stored/operated on separately. If you have a multi user system you could split users across servers, so for example your could have 1000 users per server, or split users by user id odd/even, divisible by X etc. Again you could do this ahead of time with subdomains:
odd user id: controlpanel-a.mydomain.com
even user id: controlpanel-b.mydomain.com
Another example of partitioning is database partitioning, here’s a quick introduction, but this would come way later, having a dedicated db server or db cluster will scale you a long way.
Slightly contrary to the idea of scalability, but definitely on topic: don’t start your code design with optimization, this is a waste of time. Not even the mighty He-man could predict where all your bottle necks will be. Always focus on writing easily maintainable code, and then in the later stages of development you can profile your application to find out where you need to optimize your code.
Caching is probably the simplest and cheapest way to dramatically improving performance. There are a variety of stages at which you can use a cache and whilst some require a little care at least a couple are almost trivially easy to implement. It’s good to be aware of all the options even if you don’t need to implement them yet.
Within 10 mins his app was performing about 10 – 50 times faster. All I did was add a few basic indexes.
This isn’t strictly about scaling, but it’s such a good performance tip: Learn how to do good DB Indexes, they are really simple and without them you’re hemorrhaging your DB’s potential performance away. here’s a quick tutorial.
I was having a geek cafe session with my good friend Craig Mod one afternoon and he asked me to look at his application as it was a bit slow. Within 10 mins his app was performing about 10 – 50 times faster. All I did was add a few basic indexes.
As mentioned above, don’t forget to turn on the DB query cache.
There are a number of reason why at some point you’ll need a DB master and slave(s) (load balancing, redundancy, zero downtime backups), you can prepare for this now by having two DB objects e.g $db_reader and $db_writer
$db_reader = new db_object(SLAVE_DSN);
$db_writer = new db_object(MASTER_DSN);
//reads go to the slave (or master)
$db_reader->query("select * from user where id = 1");
//writes only go to the master
$db_writer->query("update user set status = 'kool kat' where id = 1");
With a simple change to your DB config this allows you to instantly use a master/slave(s) setup to spread the load on your database.
Allow for the DB reader to be slightly behind the master – replicated slaves are likely to periodically lag behind the master even if just for a second or two. If you are using the tip above and sending reads to a slave then this could cause problems in your application. For example say a user is logged into a control panel making changes, if the DB reader is behind the DB writer, then trying to fetch the new changes from the database will potentially show the old values if the DB reader is behind.
I’ve mentioned this already above, but it’s worth mentioning again. Set up your site to host all images, js and css on a domain different from your main website. I’d recommend you have some sort of central configuration so the static server url is automatically included in all image urls e.g.
//hand coded (not recommended)
<img src="http://images.mywebsite.com/images/1234_an_image.jpg" / >
//simple variable
<img src=”<?php echo $image_server; ?>/images/1234_an_image.jpg” / >
//or full managed image urls using a special class to generate urls
<img src=”<?php echo $this->images->get_image_url($image_id); ?>” />
If your application sends emails, make sure the mail config can be changed easily and can use SMTP. Part of modularizing + queuing would involve having a separate server for sending, it’s most likely you’ll connect to this from your app via SMTP. Particularly now as there are a number of 3rd party SMTP services that can reduce the increasing headache of reliably sending mail.
If you are using a standard mail library like phpmailer, or as part of a framework, then this is probably set up already – see the docs. if you have your own DIY mail functions, it’s probably best to swap to some established library (e.g. phpmailer) anyway, no need to re-invent the wheel.
Separating storage of data from your application is an essential part of being modular. Probably most of your data is in a database already so without too much sweating and grunting this is most of the battle won. However you may be storing uploaded files, images etc. which you definitely don’t want to store in a database. But you shouldn’t just write these to the local file system either, if you use a cluster of application servers then you’ll need some sort of shorted storage system. A good example of this would be Amazon S3, there are plenty of client libraries and example code that means it should only take a matter of hours to integrate. An NFS server could be an option, but NFS can be nasty!.
Large backups can really slow down or even stop a server (particularly large DB backups), but chances are you could significantly reduce this load by
Using virtual servers/cloud computing will give you a certain amount of scalability for free. Most virtual server providers make it very easy to up the power on your machines, whilst this may not be the most elegant solution, it saves a massive amount of work compared to the task of migrating dedicated servers.
However that’s not the only scalability benefit. You get many of the suggestions I’ve made here for free with Cloud/Virtual computing, e.g.
And last but not least I should mention there are now a number of higher level services, RightScale.com, Scalr.net, Google App Engine to name a few that offer automatic scalability. If you are able to work within their constraints (for example App Engine only supports Python + Java) then they can be a very attractive offering. Google App Engine:
Automatic scaling is built in with App Engine, all you have to do is write your application code and we’ll do the rest.
As technology advances, a lot of scalability issues become easier to solve or disappear – we have cloud computing, easy to use CDN’s and services like Google App Engine. However until we have the Holodeck, I’m pretty confident most of the principles I’ve raised today will still be important in your design considerations.
…oh and hopefully your server has stopped melting now.
Here’s a few books that have helped me:
Check out a real world example, wikimedia:
http://meta.wikimedia.org/wiki/Wikimedia_servers
highscalability.com has some good resources:
http://highscalability.com/blog/category/blog
The post Preparing Your Website/Web App For Scalability appeared first on Mr Kirkland.
]]>