Amazon’s S3 outage yesterday lasted over 7 hours in total. It brought down SmugMug, Twitter’s avatar hosting, and Youniverse’s image hosting, and crashed Twinkle, a Twitter application for the iPhone. Youniverse uses Akamai as a CDN with S3 as the ‘origin server’, with a TTL of 5 days. After yesterday’s outage I think I’ll be changing Akamai to cache the objects permanently and refresh them manually (we have never needed to refresh them, and it’s unlikely we ever will). Akamai is a more distributed platform and is, hopefully, less likely to have a similar incident.
This is Amazon’s second major outage this year, and given how many sites rely on the service, it’s a real nightmare when it goes down. It’s a little concerning to think about the other third-party providers that could fail in a similar way. For example, sites using Yahoo’s YUI library might well be loading the files from Yahoo’s JavaScript site rather than hosting them themselves. In the event that these go down, those sites will likely be rendered useless, in a rather ungraceful manner. Fortunately, JavaScript, CSS, and the like can all be easily hosted locally. Hundreds of gigabytes of photos are not so easy to back up locally.
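As a rough illustration of the CDN-in-front-of-an-origin setup described above, here is a minimal Python sketch of a cache that keeps serving its last good copy when the origin fails, and never stores an error response. It is a toy in-memory model, not Akamai’s configuration or API; the class name `StaleOnErrorCache`, the `fetch` callable, and the `clock` parameter are all hypothetical names made up for this example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Entry:
    body: bytes
    fetched_at: float


class StaleOnErrorCache:
    """Toy edge cache (hypothetical, not a real CDN API).

    Fresh entries are served from memory. Once an entry is older than the
    TTL the origin is asked again, but a failed or non-200 response is
    never stored: the last good copy is served instead.
    """

    def __init__(
        self,
        fetch: Callable[[str], Tuple[int, bytes]],  # assumed: key -> (status, body)
        ttl_seconds: float,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.fetch = fetch
        self.ttl = ttl_seconds
        self.clock = clock
        self.store: Dict[str, Entry] = {}

    def get(self, key: str) -> Tuple[int, bytes]:
        now = self.clock()
        entry = self.store.get(key)

        # Fresh hit: no need to contact the origin at all.
        if entry is not None and now - entry.fetched_at < self.ttl:
            return 200, entry.body

        try:
            status, body = self.fetch(key)
        except OSError:
            # Treat a network failure like an unavailable origin.
            status, body = 503, b""

        if status == 200:
            # Only successful responses are ever written to the cache.
            self.store[key] = Entry(body, now)
            return 200, body

        if entry is not None:
            # Origin is failing: fall back to the stale copy.
            return 200, entry.body

        # Nothing cached: pass the error through without storing it.
        return status, body
```

With `ttl_seconds=5 * 24 * 3600` this mirrors a 5-day TTL. The check on the status code is the important part of the sketch: an origin that answers with an error page is treated the same as one that cannot be reached, so the error page never replaces a good object in the cache.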

Maya Ed says:
Akamai can easily be set up to serve “stale” content by default. This means that if you can’t reach an origin server, Akamai will serve whatever is already in cache. This may be a better option than permanently caching objects and having to purge the full site. Just a suggestion.
Jul 21, 2008, 22:28
Dave says:
That’s a nice idea. Unfortunately, in this instance S3 was actually returning content, albeit an HTML page saying “Service Unavailable”. Akamai ended up caching that.
Jul 21, 2008, 06:02