Recipe To 150 Thousand Monthly Uniques
My fellow Tokyo tech cheapo Greg Lane and I co-founded TokyoCheapo.com three years ago. From our humble beginnings as a little side project, our baby has really grown up.
Three years on, the traffic has really jumped, especially in the last year – from around 50K monthly UV in April 2014 to 150K UV this month. And now we're launching a much-desired Tokyo Guide Book – click here and buy this awesome book, it will save you so much money!
Since the lion's share of our traffic is from organic search, you probably all want to hear about our SEO strategy. Whilst we have put work into our SEO (mostly under the expert guidance of pagespeed.io), and perhaps even a little into content marketing, I would attribute the main reasons for traffic growth to the factors below.
The bad news first (if you're trying to boost an existing content site): we had it easy. No one was really writing about Tokyo for smaller budgets, but this was exactly what the world wanted. Tokyo had (still has?) a reputation for being very expensive, which frightens off a lot of potential visitors. In reality Tokyo really isn't that expensive, yet most of the popular media covering Tokyo tends to focus on the high end and extravagant (perhaps in chasing the advertising $$$), thus perpetuating the false reputation.
This is one of the main ingredients of our success: lots of people looking for the information we were putting out, and no one else really writing it.
Though a lot of our content is fairly timeless, we’ve still managed to massively boost the audience to some of our articles by dropping them at precisely the right time. For the most part this is fairly obvious – ice cream tends to sell better in the summer, but it also includes reacting quickly when an opportunity arises.
We plan ahead, and having run the site for more than a year we already know a lot of seasonal topics and opportunities. We’re also pretty good at stopping the press, photoshopping some animals into photos and pushing out a topical article should the chance suddenly arise.
Unlike a lot of the other clickbaity media covering Japan, our articles are detailed and practical (if still a little clickbaity). Giving useful and actionable advice has been a key pillar in building the TokyoCheapo brand.
In the 3 nights article mentioned above, we spent a lot of time researching and putting together a very detailed itinerary, complete with cost breakdown and maps. Going by the 80/20 principle it would seem counterintuitive to spend all that extra time getting the last details right; however, for our most popular articles the 80/20 flips round and the more detail you put in, the bigger the payoff. It really sets you apart when you're the most detailed and best on a given topic.
We've also decked out the site with custom location and mapping metadata sections at the end of posts, so it's easy for our authors to input the pertinent details and readers get a consistent UX.
We've kept up a steady flow of articles since the beginning – at first two or three a week, now more than one a day – so we have a huge archive of content. However, I would say quality trumps quantity, especially given that the 80/20 rule applies to our traffic: (approximately) the top 20% of our articles receive 80% of our traffic.
Though we've kept up a fairly steady rhythm, we don't force ourselves to publish every day; if we've got nothing good to say, we shut up. Our focus is to put out good articles as often as we can.
Sometimes our most useful articles have the most boring photos, but with a little help from doggie or kitteh, things start to look a little better.
Kittens and puppies are the rocket fuel of the internet, use them properly and with care.
In case you didn't get the memo: you need a good UX for mobile users or they won't be coming to your site any more. We'd had a "responsive" theme since near the beginning, but it looked a little shabby for mobile users, so we threw it out and started again, spending time on making sure the mobile views worked well.
We also have user agent detection both in our varnish setup and within WordPress. We use the device detection to serve mobile ads correctly, and we use our responsive framework's show/hide classes to hide certain elements that clutter or slow the mobile UX.
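As a rough illustration of the WordPress half of that (a sketch rather than our actual code – the X-Device header and the ad slot names are made up for the example, with varnish assumed to set the header after sniffing the User-Agent):

<?php
// Hypothetical helper: varnish is assumed to pass an X-Device header
// ("mobile" or "desktop"); fall back to WordPress' own UA check.
function tc_is_mobile() {
    if ( isset( $_SERVER['HTTP_X_DEVICE'] ) ) {
        return $_SERVER['HTTP_X_DEVICE'] === 'mobile';
    }
    return wp_is_mobile();
}

function tc_render_ad() {
    // Serve a smaller ad unit to mobile visitors (slot names are placeholders).
    echo tc_is_mobile()
        ? '<div class="ad ad-mobile" data-slot="mobile-300x250"></div>'
        : '<div class="ad ad-desktop" data-slot="desktop-728x90"></div>';
}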
Uncle Google tells us webmasters that we should optimize our webpages' speed, and I agree – it's better use of technology and better UX. Moreover, as you can see from the graph above, we experienced a drop in our organic traffic after we dropped the page speed ball (a cache misconfiguration that took me a month to spot).
I gave a talk about page speed and learning from running Tokyo Cheapo at UX Talk Tokyo last year, here’s a video of the slides (skip to 9:38 to jump straight to the tips):
And here are the TL;DR takeaways from the video:
Naturally we host TokyoCheapo on the cheapest servers possible – in fact the main instance that handles all the traffic is probably about as powerful as an iPhone 6. WordPress is not optimized to run quickly out of the box, so if our caching and CDN set-up weren't there, our puny little servers would be destroyed instantly. Even powerful servers will buckle under pressure if they get a lot of traffic on an unoptimized vanilla WordPress set-up. The main points to mitigate these issues I covered above – caching and a CDN.
One of pagespeed.io's first fixes was to jettison our entire tag archive, that is all the thousands of low-quality pages under tokyocheapo.com/tags/. The idea is to keep googlebot crawling only your articles and save him/her the trouble of having to crawl auto-generated and duplicate content. We'd been running the site for two years before this change and actually had a bit of traffic to some of our tag archive pages, so we could not just 404 them all.
However, we had over 1000 tags to trawl through and 301 redirect to appropriate articles. I whipped up a quick script using our search engine (we use Relevanssi, which is way better than the default WP search) to find the closest matching article for each tag and spit this out into a massive .htaccess file of redirects. I then went through the list and manually changed a few of the redirects to better-suited pages.
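I no longer have the exact script, but the gist was something like the sketch below. Relevanssi hooks into the normal WordPress search, so a plain 's' query stands in for it here; the /tags/ base comes from our old URLs, while the output filename is just an example:

<?php
// Rough reconstruction, not the original script - run it inside WordPress.
require_once 'wp-load.php';

$lines = array();
foreach ( get_tags( array( 'hide_empty' => false ) ) as $tag ) {
    // Find the article that best matches the tag name.
    $q = new WP_Query( array( 's' => $tag->name, 'posts_per_page' => 1 ) );
    $target = $q->have_posts() ? get_permalink( $q->posts[0]->ID ) : home_url( '/' );
    $lines[] = 'Redirect 301 /tags/' . $tag->slug . '/ ' . $target;
}
// Dump the rules for pasting into .htaccess (and manual tweaking).
file_put_contents( 'tag-redirects.htaccess', implode( "\n", $lines ) . "\n" );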
Analytics data showed that the tag pages didn't actually get a lot of clicks. We had also accidentally created a lot of internal near-duplicate content on some keywords via the WordPress tag feature. So we removed all tags from the blog and set a 301 redirect from each tag page to a page that formerly had that tag applied.
We had a humorous reference and link to an article about cheap orange juice in our "about us" box in the footer. Whilst slightly funny, it wasn't that useful to visitors and we couldn't care less about ranking for "orange juice", so we swapped the orange juice link juice for a more useful and popular article (h/t to Chris Dietrich for pointing this out).
We have an editorial policy of writing informative and hopefully catchy meta descriptions for our posts. It's not about stuffing keywords; it's about taking advantage of those few precious lines of copy Google (usually) shows in the search results. You could think of it as copy in an AdWords advert. Would you leave advert copy to some auto-generated summary by WordPress?
The Yoast SEO plugin takes care of a lot of SEO basics out of the box – sitemaps, canonical URLs – as well as giving you handy tools for making custom meta descriptions and choosing good keywords to focus on. Even a vanilla install is already quite helpful, but as mentioned above we make use of it for manually improving titles and meta descriptions.
No-brainer: interlinking to useful, related articles on our site and other sites is helpful to our readers and something uncle Google listens to. Our writers and editor do this as they add articles, plus we use this plugin to set some rules for automatically linking certain text to specific URLs.
This section assumes knowledge of varnish and HTTP caching.
Varnish is a whole subject in itself, but I leave some brief notes here for my own future reference (since I keep forgetting how it works) and those of you brave enough to stray away from the comfort of cloudflare and the like.
I started with the WordPress VCL example from the varnish wiki and butchered it from there. This example VCL takes care of some basics, like handling PURGE requests and skipping the cache when it sees the WP login cookie:
# wordpress logged in users must always pass
if( req.url ~ "^/wp-(login|admin)" || req.http.Cookie ~ "wordpress_logged_in_" ){
    set req.backend = apache;
    return (pass);
}
I simply blocked the wp-cron.php since we don’t use it:
# block the wp-cron.php
if( req.url ~ "^/wp-cron" ){
    error 403 "Not permitted";
    return(error);
}
It also drops all apache-set cookies (apart from for logged-in users).
# remove any remaining cookies
unset req.http.Cookie;
Varnish is all about taming the cookie monster.
Since we use the W3 Total Cache plugin, our .htaccess now has a bucketload of cache rules, particularly for static content (scripts etc.), set by W3 Total Cache. One of the rules messes with the cache efficiency, so it's been commented out:
# Header append Vary User-Agent env=!dont-vary
If you have the header set to vary per user-agent, then varnish will cache a separate copy of a page for every unique browser string. This renders the cache very inefficient, since most requests will have slightly different browser identification strings, so varnish has to pass the request to apache for pretty much every new user.
Because we need granular control over the expiry times of our content (e.g. home page for 1 hour, category pages 6 hours, individual posts 2 days, etc.), the W3 Total Cache HTTP cache expiry settings aren't adequate; my workaround is explained below:
I spent a long time wrestling with WordPress to try and get it to spit out good cache headers so it plays nicely with our varnish instance. In the end I opted for hacking around in .htaccess instead. The following code uses rewrite rules to set an apache environment variable (for cache age), and then below I manually set a 'Cache-Control "max-age=xxx"' header – first setting a default value, which then gets overwritten if my "cache_age" environment variable has been set.
# -------- Chris's cache control hacks -----------#
# we specify resources that have special cache times - default time is below

# half a day or so
RewriteRule ^events/$ - [E=cache_age:43200]
RewriteRule ^questions/.*$ - [E=cache_age:43200]
RewriteRule ^sitemap/.*$ - [E=cache_age:43200]
RewriteRule ^accommodationcat/$ - [E=cache_age:43200]
RewriteRule ^business/$ - [E=cache_age:43200]
RewriteRule ^business/financial/$ - [E=cache_age:43200]
RewriteRule ^business/internet/$ - [E=cache_age:43200]
RewriteRule ^entertainment/$ - [E=cache_age:43200]
RewriteRule ^entertainment/art/$ - [E=cache_age:43200]
RewriteRule ^entertainment/event-posts/$ - [E=cache_age:43200]
RewriteRule ^food-and-drink/$ - [E=cache_age:43200]
RewriteRule ^food-and-drink/cafe/$ - [E=cache_age:43200]
RewriteRule ^food-and-drink/drinking/$ - [E=cache_age:43200]
RewriteRule ^lifestyle/$ - [E=cache_age:43200]
RewriteRule ^lifestyle/outdoors/$ - [E=cache_age:43200]
RewriteRule ^living/$ - [E=cache_age:43200]
RewriteRule ^living/household/$ - [E=cache_age:43200]
RewriteRule ^podcast/$ - [E=cache_age:43200]
RewriteRule ^shopping-2/$ - [E=cache_age:43200]
RewriteRule ^shopping-2/fashion/$ - [E=cache_age:43200]
RewriteRule ^travel/$ - [E=cache_age:43200]
RewriteRule ^travel/holidays/$ - [E=cache_age:43200]
RewriteRule ^travel/transport/$ - [E=cache_age:43200]

# an hour or so
RewriteRule ^feed/.*$ - [E=cache_age:3600]
RewriteRule ^$ - [E=cache_age:3600]
RewriteCond %{QUERY_STRING} ^s=(.*)$
RewriteRule . - [E=cache_age:3600]

# very long
RewriteRule ^wp-content/.*$ - [E=cache_age:6048000]

# first a catch all default - one week
Header set Cache-Control "max-age=604800"
Header set Cache-Control "max-age=%{REDIRECT_cache_age}e" env=REDIRECT_cache_age
# for some reason the query string condition based rule above doesn't get changed to REDIRECT_cache_age
Header set Cache-Control "max-age=%{cache_age}e" env=cache_age
# -------- End of Chris's cache control hacks -----------#
If you change a URL, make sure you leave a 301 redirect at the old URL; ultimately no one (especially Google) likes to land on a 404 page, no matter how many funnies it has. We brief our team not to edit the "slug" of a page once it is published, and we also check our logs and webmaster tools for 404s that have slipped through the net.
Redirecting is simple enough – just a case of slipping one line into your .htaccess file for each URL, e.g.
Redirect 301 /old/url/ http://domain.com/new/url/
And here’s an analytics graph chronicling our growth thus far:
We're running a crowdfunding campaign for our Guide Book "A Cheapo's Guide To Tokyo" – please support us! You can help us with just three clicks.
And if you're ever likely to visit Tokyo, order a copy – it'll save you a ton of money and time!
Script for launching load balanced EC2 auto scaling group
The auto scaling group is launched and shut down with the scripts attached below. In our implementation the instances ran a simple phpthumb installation that processed images via HTTP requests. Whilst our solution worked, it wasn't as robust as blitline, partly because it made no use of queuing (as I mentioned in my scalability post), plus blitline is still pretty cost effective.
However, I still have these two scripts for spinning up a load balanced group of servers (and shutting them down), which I think are rather handy, so today I'm releasing them into the wilds of the world wide internetz.
Download: http://yumiko.theartistsweb.net/data/asg-scripts.tgz
INSTALLATION
0) You need the AWS command line tools installed: http://aws.amazon.com/developertools/2535 (docs: http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/UsingTheCommandLineTools.html)
1) chmod 755 the scripts
# chmod 755 aws-start-as-group.sh
# chmod 755 aws-shut-down-as-group.sh
2) edit the config at the top of each script
USAGE
1) edit config at the top of the aws-start-as-group.sh script
* set your AMI and AVAILABILTY_ZONES
* tweak INSTANCE_TYPE, MAX + MIN INSTANCES depending on your estimated needs (NB number of MIN_INSTANCES will constantly be running)
* for finer tuning experiment with the COOLDOWN and CPU values
* if you are going to run multiple autoscaling groups then you might need to edit the BASE_NAME for each time you run the script (though it auto creates a name based on the date and hour)
2) execute with
# aws-start-as-group.sh
3) terminate all resources with
# aws-shut-down-as-group.sh $BASE_NAME
NB: $BASE_NAME is set in the config and is echoed at the end when you run the start script; it's a prefix plus today's date.
Increase Facebook Fans – Case Study: Sponsored Story vs Advert
I recently decided to test out facebook's advertising platform for artweb.com, with the goal of increasing the number of fans of our FB page.
Since we launched our FB page last year, after an initial influx of fans the trend has been just over one new fan a day on average. There were no special activities going on during this period, so I'm confident this serves as a good 'control case' for the number of new fans per day.
At the start of these tests we had around 500 fans (all organic).
Our first tactic was to create a highly targeted ad shown to potential customers with a 'like us on facebook' incentive:
This advert went to a landing page (a custom FBML page) with a big graphic saying 'like us for a 20% discount code'. Nice and simple.
Using the advert settings we narrowed the audience down to a few very specific art-related groups, close to the demographic and interests we see in our typical users. Here's one group:
who live in the United Kingdom, age 21 and older,
who like art, art design, artwork, contemporary art, fine art, illustration, modern art, painting, sculpting or sculpture, who have graduated from university, who studied art design, fine art or fine arts
We ran the ad for a few days with the following results:
And here’s the cost breakdown across 4 different target groups:
So with this short 3-day test we picked up 60 new fans at a cost of just under 48GBP, which works out at around 0.80GBP per new fan.
So next up we wanted to see how the 'social effect' of sponsored stories would work for us. Sponsored stories seemed quite attractive because they almost don't seem like adverts: they only go out to friends (so it's kind of like the stories on a user's wall) and it's just a little box with no advertising copy:
And just like normal facebook ads we could be very targeted:
who live in one of the countries: United States, New Zealand, Canada, United Kingdom or Australia, age 18 and older, who like art, art design, art history, artwork, contemporary art, digital art, fine art, fine arts, illustration, modern art, painting or sculpting, whose friends are already connected to ArtWeb.com
So we ran this campaign for 7 days with the following results:
And a break down of cost:
We gained 351 new fans for just over 63GBP, so that's about 0.18GBP per new fan – over 4 times more cost-efficient than the advert case.
Our results show sponsored stories to have a much better CTR and cost performance for the goal of increasing fans. It's no surprise to me that the sponsored stories attracted more fans – they have the crucial endorsement of a friend (or friends), plus no advert copy, so they hardly even look like ads!
I should point out that from a business perspective the advertising copy makes an important distinction: it gave us a chance to succinctly mention our commercial services, whereas the sponsored story gave less of a chance – someone liking via the sponsored story might not immediately understand what we offer as a service, even after a quick look at our wall. So one should bear this in mind when doing this type of comparison: we may have gained more fans with the sponsored story, but do they really know what we are about?
As a general marketing observation, 0.18GBP for a 'fan' seems like incredibly good value – with similar interests/keywords on CPC advertising we'll often bid many times more than this for a single click – but we don't yet have enough solid conversion data about traffic from our facebook page to draw a real comparison. However, even without direct conversions, the long-term value of a fan (brand awareness, user interaction, friend referral, kudos from having a million fans, potential SEO implications, to mention a few) makes this good value.
Also, I note the Wall Street Journal reports $1.07 as a typical cost per fan.
This 'good value' won't last though – it's still early days. Like with CPC, you can bet yo ass it'll get more expensive.
moving/migrating svn repositories
If the repo moves to a new server or changes protocol (ssh -> https) etc., then there's a simple command for seamlessly updating your local checked-out copy:
svn switch --relocate
To figure out what the URLs should be, look in the .svn/entries file in the top dir of your local copy.
You’ll see something like (for svn+ssh)
svn+ssh://username@svnserver.com/var/lib/svn/project_name/trunk
svn+ssh://username@svnserver.com/var/lib/svn/project_name
so if you are moving to new-svnserver.com then you’d use:
svn switch --relocate svn+ssh://username@svnserver.com/var/lib/svn/project_name/trunk svn+ssh://username@new-svnserver.com/var/lib/svn/project_name/trunk
or perhaps you’re moving to a hostedservice.com and accessing over https:
svn switch --relocate svn+ssh://username@svnserver.com/var/lib/svn/project_name/trunk https://hostedservice.com/myaccount/project_name/trunk
If you happen to have a number of projects in the same dir all needing the same migration, here’s a quick shell script to loop through them all:
#some config
DATA_PATH=/path/to/projects
OLD_SERVER=svn+ssh://username@oldserver/var/lib/svn/
NEW_SERVER=https://hostedrepo.com/accountname/

cd $DATA_PATH
for i in *
do
  # only touch directories that are svn working copies
  if [ -f $DATA_PATH/$i/.svn/entries ]
  then
    cd $DATA_PATH/$i
    # grab the old URL from the entries file and swap in the new server
    line=$(grep -m1 $OLD_SERVER .svn/entries)
    new_line=$(echo "$line" | sed "s|$OLD_SERVER|$NEW_SERVER|g")
    svn switch --relocate $line $new_line
  fi
  cd $DATA_PATH
done
It’s pretty simple to migrate repositories with the command
svnadmin dump
1. On the old server, dump all the individual projects in /var/lib/svn (or wherever your repository is located)
cd /var/lib/svn
for i in *;do echo $i; svnadmin dump $i > /path/to/dump/$i.dump;done
scp /path/to/dump/*.dump newserver:/tmp/
2. assuming you have already installed svn on the new server with a svn user account (assumed to be svn below), load the dumped data:
#load the dumps we just copied across
cd /tmp/
for i in *.dump;do REPO=$(echo $i | sed s/.dump//g); svnadmin create /var/lib/svn/$REPO; svnadmin load /var/lib/svn/$REPO < $i;done
#now set permissions
cd /var/lib/svn/
for i in *;do echo $i; chown -R svn:svn $i; chmod -R g+w $i/db;done
3. Then you'll probably want to follow the tip above about updating clients with the new server URL.
references
https://wiki.archlinux.org/index.php/Subversion_backup_and_restore
http://svnbook.red-bean.com/en/1.1/ch05s03.html#svn-ch-5-sect-3.5
Preparing Your Website/Web App For Scalability
This is a good problem to have (server heat is proportional to # of users). And it's not so difficult to deal with if you make a few preparations in advance.
After launching a number of web services and viral social media apps, some of which grew to hundreds of concurrent users and zillions of hits within days, I’ve had to learn “on the job” about scaling websites. Here I present to you my thoughts and some simple suggestions on how you can prepare ahead of time.
Really the most important principle to take on board is modular design. If you look at any high-volume platform you'll quickly see evidence of how the components of the service are split up into separate, independent units. Wikipedia is a classic example of this – check out the architecture behind Wikipedia.
If you can separate all the different components of your website/web app, then you can easily apply more resources where they are needed. Just upgrading to a bigger server won't get you far; long term you need to be able to identify where the bottlenecks are and apply resources efficiently. In fact, even identifying the bottlenecks can be surprisingly hard – modular design makes this much easier.
So some examples of the typical components your website/web app will be using:
Separating these components needn't be rocket science; here are some simple examples of how you could apply modularity to the components above:
In addition to thinking modular, always be monitoring and profiling your systems; then you know exactly where the bottlenecks are and can deal with them more effectively.
Typically there may be parts of your application that involve heavy processing. If these don’t need to be real time, or can tolerate slight delays then separate the processes and make use of queuing/batching so your web application isn’t held up.
Google Analytics is a good example of this: generally there's no need for real-time web stats, so the collected stats are batch processed at regular intervals. Another example could be image processing – I've written about this in more detail here.
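As a bare-bones sketch of the pattern (the image_jobs table, its columns and the resize_image() helper are all made up for the example): the web request just records the work, and a separate cron-driven worker does the heavy lifting later.

<?php
$db = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');

// In the web request: queue the job instead of processing the image inline.
$db->prepare("INSERT INTO image_jobs (image_path, status, created_at)
              VALUES (?, 'pending', NOW())")
   ->execute(array('/uploads/1234_original.jpg'));

// In a worker script run from cron: process a batch of pending jobs.
$jobs = $db->query("SELECT id, image_path FROM image_jobs
                    WHERE status = 'pending' LIMIT 50");
foreach ($jobs as $job) {
    resize_image($job['image_path']);   // stand-in for your own heavy processing
    $db->prepare("UPDATE image_jobs SET status = 'done' WHERE id = ?")
       ->execute(array($job['id']));
}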
Taking modularization a step further, we have partitioning, i.e. splitting data up into smaller, manageable chunks which can be stored/operated on separately. If you have a multi-user system you could split users across servers – for example 1000 users per server, or users split by user id (odd/even, divisible by X, etc.). Again, you could do this ahead of time with subdomains (there's a quick sketch of the routing after the example below):
odd user id: controlpanel-a.mydomain.com
even user id: controlpanel-b.mydomain.com
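A minimal sketch of how that routing might look in code (the host names mirror the example above; a real setup would read its shard list from config):

<?php
// Pick the control panel host for a user based on user id parity.
function control_panel_host($user_id) {
    return ($user_id % 2 == 1)
        ? 'controlpanel-a.mydomain.com'   // odd ids
        : 'controlpanel-b.mydomain.com';  // even ids
}

// e.g. send a user to "their" shard after login
$user_id = 12345;
header('Location: https://' . control_panel_host($user_id) . '/dashboard');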
Another example of partitioning is database partitioning (here's a quick introduction), but this would come way later – having a dedicated DB server or DB cluster will scale you a long way.
Slightly contrary to the idea of scalability, but definitely on topic: don't start your code design with optimization – this is a waste of time. Not even the mighty He-Man could predict where all your bottlenecks will be. Always focus on writing easily maintainable code, and then in the later stages of development you can profile your application to find out where you need to optimize.
Caching is probably the simplest and cheapest way to dramatically improve performance. There are a variety of stages at which you can use a cache, and whilst some require a little care, at least a couple are almost trivially easy to implement. It's good to be aware of all the options even if you don't need to implement them yet.
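One of the trivially easy ones is wrapping an expensive lookup in a key/value cache. A rough sketch using PHP's Memcached extension (the key, the 10-minute TTL and the fetch_popular_articles_from_db() helper are all placeholders):

<?php
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$key = 'popular_articles';
$articles = $cache->get($key);
if ($articles === false) {
    // Cache miss: do the slow work once, then store the result for 10 minutes.
    $articles = fetch_popular_articles_from_db();
    $cache->set($key, $articles, 600);
}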
This isn't strictly about scaling, but it's such a good performance tip: learn how to do good DB indexes. They are really simple, and without them you're hemorrhaging your DB's potential performance away. Here's a quick tutorial.
I was having a geek cafe session with my good friend Craig Mod one afternoon and he asked me to look at his application as it was a bit slow. Within 10 mins his app was performing about 10 – 50 times faster. All I did was add a few basic indexes.
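For instance, if your app keeps running a query like the one below, an index on the column being searched is usually the single biggest win (table and column names are just illustrative):

<?php
$db = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');

// Without an index this is a full table scan on a large table.
$db->query("SELECT * FROM user WHERE email = 'bob@example.com'");

// One-off fix: add an index on the column you search by.
$db->exec("ALTER TABLE user ADD INDEX idx_user_email (email)");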
As mentioned above, don’t forget to turn on the DB query cache.
There are a number of reasons why at some point you'll need a DB master and slave(s) – load balancing, redundancy, zero-downtime backups – and you can prepare for this now by having two DB objects, e.g. $db_reader and $db_writer:
$db_reader = new db_object(SLAVE_DSN);
$db_writer = new db_object(MASTER_DSN);
//reads go to the slave (or master)
$db_reader->query("select * from user where id = 1");
//writes only go to the master
$db_writer->query("update user set status = 'kool kat' where id = 1");
With a simple change to your DB config this allows you to instantly use a master/slave(s) setup to spread the load on your database.
Allow for the DB reader to be slightly behind the master – replicated slaves are likely to periodically lag behind the master, even if just for a second or two. If you are using the tip above and sending reads to a slave, this could cause problems in your application. For example, say a user is logged into a control panel making changes: if the DB reader is behind the DB writer, then fetching the new changes from the database will potentially return the old values.
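One simple way to cope – a sketch of the "read your own writes" pattern, not a library – is to send a user's reads to the master for a short while after they write, so they always see their own changes (assumes session_start() has been called and the $db_reader/$db_writer objects from above):

<?php
// Call this after any write on behalf of the current user.
function mark_recent_write() {
    $_SESSION['last_write'] = time();
}

// Decide which connection to read from.
function db_for_reads($db_reader, $db_writer) {
    $recently_wrote = isset($_SESSION['last_write'])
        && (time() - $_SESSION['last_write']) < 60;
    // Right after a write, read from the master; otherwise use the slave.
    return $recently_wrote ? $db_writer : $db_reader;
}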
I've mentioned this already above, but it's worth mentioning again: set up your site to host all images, JS and CSS on a domain different from your main website. I'd recommend some sort of central configuration so the static server URL is automatically included in all image URLs, e.g.
//hand coded (not recommended)
<img src="http://images.mywebsite.com/images/1234_an_image.jpg" />

//simple variable
<img src="<?php echo $image_server; ?>/images/1234_an_image.jpg" />

//or fully managed image urls using a special class to generate urls
<img src="<?php echo $this->images->get_image_url($image_id); ?>" />
If your application sends emails, make sure the mail config can be changed easily and can use SMTP. Part of modularizing + queuing would involve having a separate server for sending, and it's most likely you'll connect to this from your app via SMTP – particularly now that there are a number of 3rd party SMTP services that can reduce the increasing headache of reliably sending mail.
If you are using a standard mail library like phpmailer, or one that comes as part of a framework, then this is probably set up already – see the docs. If you have your own DIY mail functions, it's probably best to swap to an established library (e.g. phpmailer) anyway; no need to re-invent the wheel.
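For reference, switching phpmailer over to SMTP is only a handful of lines – something along these lines (host, credentials and addresses are obviously placeholders):

<?php
require 'class.phpmailer.php';

$mail = new PHPMailer();
$mail->IsSMTP();                        // send via SMTP instead of the local mail()
$mail->Host     = 'smtp.example.com';   // your mail server or 3rd party SMTP service
$mail->SMTPAuth = true;
$mail->Username = 'smtp-user';
$mail->Password = 'smtp-password';

$mail->SetFrom('noreply@mywebsite.com', 'My Website');
$mail->AddAddress('user@example.com');
$mail->Subject = 'Welcome!';
$mail->Body    = 'Thanks for signing up.';
$mail->Send();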
Separating storage of data from your application is an essential part of being modular. Probably most of your data is in a database already, so without too much sweating and grunting that's most of the battle won. However, you may be storing uploaded files, images etc., which you definitely don't want to store in a database. But you shouldn't just write these to the local file system either – if you use a cluster of application servers then you'll need some sort of shared storage system. A good example of this would be Amazon S3; there are plenty of client libraries and example code, so it should only take a matter of hours to integrate. An NFS server could be an option, but NFS can be nasty!
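To give a flavour, with the official AWS SDK for PHP an upload is roughly this (a sketch – bucket name and file paths are placeholders, and credentials are assumed to come from the environment):

<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client(array(
    'version' => 'latest',
    'region'  => 'us-east-1',
));

// Store an uploaded image in S3 instead of on the local filesystem.
$s3->putObject(array(
    'Bucket'     => 'mywebsite-uploads',
    'Key'        => 'images/1234_an_image.jpg',
    'SourceFile' => '/tmp/php_upload_tmpfile.jpg',
));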
Large backups can really slow down or even stop a server (particularly large DB backups), but chances are you could significantly reduce this load by
Using virtual servers/cloud computing will give you a certain amount of scalability for free. Most virtual server providers make it very easy to up the power of your machines; whilst this may not be the most elegant solution, it saves a massive amount of work compared to migrating dedicated servers.
However that’s not the only scalability benefit. You get many of the suggestions I’ve made here for free with Cloud/Virtual computing, e.g.
And last but not least, I should mention there are now a number of higher-level services – RightScale.com, Scalr.net, Google App Engine, to name a few – that offer automatic scalability. If you are able to work within their constraints (for example, App Engine only supports Python + Java) then they can be a very attractive offering. From Google App Engine:
Automatic scaling is built in with App Engine, all you have to do is write your application code and we’ll do the rest.
As technology advances, a lot of scalability issues become easier to solve or disappear – we have cloud computing, easy-to-use CDNs and services like Google App Engine. However, until we have the Holodeck, I'm pretty confident most of the principles I've raised today will still be important in your design considerations.
…oh and hopefully your server has stopped melting now.
Here’s a few books that have helped me:
Check out a real world example, wikimedia:
http://meta.wikimedia.org/wiki/Wikimedia_servers
highscalability.com has some good resources:
http://highscalability.com/blog/category/blog
Amazon S3 versioning
The Amazon S3 announcement which just popped into my inbox:
We are pleased to announce the availability of the Versioning feature for beta use across all of our Amazon S3 Regions. Versioning allows you to preserve, retrieve, and restore every version of every object in an Amazon S3 bucket. Once you enable Versioning for a bucket, Amazon S3 preserves existing objects any time you perform a PUT, POST, COPY, or DELETE operation on them. By default, GET requests will retrieve the most recently written version. Older versions of an overwritten or deleted object can be retrieved by specifying a version in the request.
You can read more about how to use versioning here.
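For what it's worth, with the current AWS SDK for PHP (which postdates this announcement) enabling versioning on a bucket and pulling back an old version looks roughly like this – bucket, key and version id are placeholders:

<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client(array('version' => 'latest', 'region' => 'us-east-1'));

// Turn versioning on for a bucket.
$s3->putBucketVersioning(array(
    'Bucket'                  => 'my-image-archive',
    'VersioningConfiguration' => array('Status' => 'Enabled'),
));

// Fetch an older version of an object by its version id.
$old = $s3->getObject(array(
    'Bucket'    => 'my-image-archive',
    'Key'       => 'photos/cat.jpg',
    'VersionId' => 'example-version-id',
));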
Obviously you’ll have to pay for the extra space taken up by versions, but this looks like a really top class option for storing data that’s not regularly updated e.g. an image archive.
I'd also be interested to see if this spawns any new uses for Amazon Web Services…
Cloud Computing Price Comparison
I've just been researching the estimated cost of cloud computing with some of the various providers out there who want to absorb your servers into their clouds.
Doing this research has been like comparing apples to lizards. The providers all have their own ways of billing you, and there's a whole host of extra features offered, some free, some not. To help you on your quest, I present some nice tables with the results of my investigations below.
I tried taking a fixed monthly budget and seeing the best value I could get for the money, as I think this is a fairer and more realistic comparison. I have, as far as possible, searched for the best deals (e.g. using Amazon's reserved instances and GoGrid's pre-pay plans).
| Price | Resource | EC2 | GoGrid | RackSpace | VPS.net |
| $100/Month | CPU | 3x1GHz | 1Xeon | 2GHz* | 2.8GHz |
| | Ram (GB) | 4.2 | 1 | 2 | 1.7 |
| | Storage (GB) | 480 | 60 | 80 | 70 |
| | B/W (GB) | 130 | 0 | 80 | 1750 |
| $200/Month | CPU | 2xDual2Ghz | 1Xeon | 4Ghz** | 6GHz |
| | Ram (GB) | 15 | 1.5 | 4 | 3.8 |
| | Storage (GB) | 1700 | 90 | 160 | 150 |
| | B/W (GB) | 100 | 206 | 80 | 3750 |
| $500/Month | CPU | 4xDual2Ghz | 3x1Xeon | 4x2Ghz* | 3×4.8Ghz |
| | Ram (GB) | 30 | 9 | 8 | 9 |
| | Storage (GB) | 3400 | 420 | 320 | 360 |
| | B/W (GB) | 550 | 0 | 500 | 9000 |
| $1000/Month | CPU | 8xDual2Ghz | 3x3Xeon | 2xquadx2Ghz | 6×7.2Ghz |
| | Ram (GB) | 60 | 12 | 16 | 28 |
| | Storage (GB) | 6800 | 520 | 620 | 1080 |
| | B/W (GB) | 1200 | 1000 | 1000 | 27000 |
* The 2Ghz is actually 1/8th of a 2xquad core 2Ghz machine
** The 4Ghz is actually 1/4th of a 2xquad core 2Ghz machine
Alas, GoGrid only displays "Xeon" for the CPU; no further info seems to be divulged.
It's become evident to me that you have to find the best fit for your needs (and future needs) in terms of price and features. It seems Amazon is the cheapest in terms of memory and CPU, VPS.net is by far the best for B/W, while GoGrid throws in great freebies such as a 100% SLA + load balancing.
Here’s some quick facts to throw into the mix:
Anyway, I’d be interested to hear anyone’s real experiences with hosting services in the cloud.
Cyber Mercenaries For Hire
Tired of a competitor’s site? Hinder the enemy? Fed pioneers or copywriters?
Kill their sites! How? We will help you in this!
Obstructions of any site, portal, shop!Different types of attacks: Date-attack, Trash, Attack, Attack, etc. Intellectual
You can work on schedule, as well as the simultaneous attack of several sites.On average the data, ordered the site falls within 5 minutes after the start. As a demonstration of our capabilities, allows screening.
Our prices
24 hours of attack – $ 70
12 hours of the attack – $ 50
1 hour attack – $ 25
I note that perhaps a copywriter/proof reader might benefit this particular ‘Cyber Mercenary’, or perhaps there is a subtle difference in the third line between ‘Attack’ and ‘Attack’ to which I am not yet attuned.
Oh, and here’s an example attack that was “clearly ordered by someone”.
Tate Modern, Brick Lane and negative utilitarians
My Secret Voodoo SEO Technique
And after experimenting, researching and listening to White Hats, Black Hats and an entourage of other SEO gurus, I've come to the conclusion that nowadays effective "SEO" has become pretty simple. Not necessarily easy, but it is simple. The 3 steps below are my 'secret' formula, which has worked well – for instance, most of our business leads for The Artists Web come from natural search and our advertising budget is practically zero.
This is more important than anything else, and also significantly more difficult than the following steps. I'm sure you already know what PageRank is – and that getting a high PR basically comes down to how many quality incoming links you have. In essence, what the rest of the internet is doing is more important than what you do on your own site.
In my opinion, the most effective long term ways of getting incoming links are
Your website has a theme, and you have a target audience you wish to attract. Obviously the theme and the search terms your target audience use must align. Moreover, use keyword tools to find out the specific language most commonly used – for example, which term is searched for more: "sell artwork" or "sell paintings"?
Finally, you create a page on your site, give it as much PageRank as you can and make sure you are using the keywords appropriately. These techniques will help, but they are no silver bullet – without the first step (getting the high PageRank) any SEO 'technique' is going to be of limited use.
Okay: Selling paintings online is easy with our service.
Better: Our service helps you easily sell paintings.

Okay: /page.php?id=1232
Good: /how-to-sell-paintings
Well, actually I think it is. Basically Google (for now, English-language search is pretty much all about Google) has some of the best brains and technology continually working to ensure it has the most relevant, useful and authoritative results. It's therefore simple enough to presume that long term the most relevant, useful and authoritative results will tend to feature first, so really all you have to do is be relevant, useful and authoritative – simple, but not necessarily easy. Yes, there are plenty of other techniques and factors (HTML validation, link anchor text, page cacheability), but none of them will make a significant difference unless you have an interesting and respected website.
And don’t be tempted to go for any ‘black hat‘ SEO technique – do not run the risk of being penalised. Think long term and focus on quality.