Simple Storage Service (You’ve Come a Long Way, Baby)

It’s taken a few weeks of cooking, but my Simple Storage Service is ready to come out of the oven to be eaten (and possibly spat out) by the world at large. There are still a couple of things on my TODO list, but nothing massive: URL authentication of requests (needs some thought), postObject (I need to read the docs), virtual hosting of buckets (a lot of thought) and some tiny changes and bugs that I’ll fix over the next few days. So what has changed since my last post:

  • Anonymous requests can now be made where permission to do so has been granted.
  • The AuthenticatedUsers and AllUsers groups have been implemented, as has getting and setting ACLs.
  • All REST calls have been implemented (except postObject)*
  • Exception handling matches the S3 documentation (with some guesswork)
  • The REST layer was completely rewritten using test-driven development
  • phpDocumentor comments are being added to the code, so docs can be generated
  • I’ve created a web form to help you create new users for the service
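
For the curious, here is a minimal sketch (in Python rather than the project’s PHP, and not the actual project code) of the kind of check that lets an anonymous request through when an ACL allows it. The group URIs and permission names are the ones S3 documents; `Grant` and `is_allowed` are hypothetical names, and the sketch ignores the implicit rights a bucket or object owner always has.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Grantee URIs S3 uses for its two built-in groups.
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"

# FULL_CONTROL implies READ, WRITE, READ_ACP and WRITE_ACP.
FULL_CONTROL = "FULL_CONTROL"


@dataclass(frozen=True)
class Grant:
    grantee: str     # a canonical user ID or one of the group URIs above
    permission: str  # READ, WRITE, READ_ACP, WRITE_ACP or FULL_CONTROL


def is_allowed(grants: list[Grant], requester: Optional[str], permission: str) -> bool:
    """Return True if the requester (None = anonymous) holds `permission`."""
    # Every request belongs to AllUsers; a signed request also belongs to
    # AuthenticatedUsers and is identified by its own canonical user ID.
    identities = {ALL_USERS}
    if requester is not None:
        identities |= {AUTHENTICATED_USERS, requester}
    return any(
        g.grantee in identities and g.permission in (permission, FULL_CONTROL)
        for g in grants
    )
```

With `acl = [Grant("owner-id", FULL_CONTROL), Grant(ALL_USERS, "READ")]`, an anonymous caller may READ but not WRITE, while the owner may do anything.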

So what’s next? I guess I’ll polish what’s been completed so far and add some documentation to make it simpler to deploy. As I’m off snowboarding from Saturday, I’ll wait to see what sort of feedback I get before starting on the SOAP section, which should be easier now that I’ve got a good testing setup. I’ll also need to look for a new job, as I’ll be leaving mine soon! Finally, I’ve found that the most popular PHP client for S3 (judging by a Google search) is missing some useful functionality, so I’m pondering rewriting it and making several optimizations, such as letting it stream downloads from S3…

The best use for this software, apart from academic curiosity and mocking, is probably as a failover/backup service in case S3 goes down (which it has done). This works best if you use a CNAME record to map your own host name to s3.amazonaws.com: because that record is under your DNS control, I believe it is fairly trivial to point it at another host.
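
With S3, a CNAME only works if the bucket is named after the host name, so a clone that wants to act as a drop-in failover has to work out the bucket from the Host header (the "virtual hosting of buckets" item on my TODO list). Here is a minimal Python sketch of that lookup; `BASE_HOST` and `resolve_bucket` are hypothetical names, and the base host is an assumption you would change to wherever the clone is deployed.

```python
from __future__ import annotations

from typing import Optional

# Assumption: the host name the service answers on for path-style requests.
BASE_HOST = "s3.amazonaws.com"


def resolve_bucket(host: str, path: str) -> tuple[Optional[str], str]:
    """Split a request into (bucket, key) using S3-style virtual hosting."""
    host = host.split(":", 1)[0].lower()  # drop any port number
    if host == BASE_HOST:
        # Path style: http://s3.amazonaws.com/bucket/key
        parts = path.lstrip("/").split("/", 1)
        bucket = parts[0] or None  # no bucket means "list my buckets"
        key = parts[1] if len(parts) > 1 else ""
        return bucket, key
    if host.endswith("." + BASE_HOST):
        # Virtual-hosted style: http://bucket.s3.amazonaws.com/key
        return host[: -(len(BASE_HOST) + 1)], path.lstrip("/")
    # Anything else is assumed to be a CNAME: the whole host name is the bucket.
    return host, path.lstrip("/")
```

So `files.example.com/a/b.txt` resolves to bucket `files.example.com` and key `a/b.txt`, which is why repointing the CNAME at a clone keeps existing URLs working.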

Other than that, I’ll write a blog post on how to set it up using XAMPP on Windows and MacPorts on a Mac (when my MacBook Pro arrives)…

You can check out the latest code from here: http://svn.magudia.com/s3server

UPDATE: SVN has been broken since I moved to Slicehost; you can download the code here: http://projects.magudia.com/s3server.zip

* As this service hasn’t been developed to meet Amazon’s data consistency model, I implemented getBucketLocation, but essentially it does nothing. In theory I could use MySQL clustering to implement it, but I’m not going to unless someone wants to pay me, and I also don’t have a global server network to play with 😉
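
A getBucketLocation that "does nothing" can simply report the default location, which S3 expresses as an empty LocationConstraint element. A minimal Python sketch of such a stub (the function name is hypothetical; the namespace is the 2006-03-01 S3 API namespace):

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def bucket_location_response(location: str = "") -> bytes:
    """Body for GET /bucket?location; an empty constraint means the default location."""
    root = ET.Element("LocationConstraint", xmlns=S3_NS)
    root.text = location or None  # None serialises as an empty element
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```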

Specifications

I recently read one of Joel’s blog posts on how difficult it still is to reverse engineer a Microsoft Office document, even though Microsoft have now released their specifications for the formats. The problems I’ve been facing are nowhere near the order of magnitude faced by a developer attempting to reverse engineer one of Microsoft’s Office documents, but as some of you may know I’ve been attempting (mostly with success; more on that tomorrow) to create a clone of the Amazon S3 service from its freely published documentation.

The problem is that it’s quite easy to replicate the ‘happy path’ of the specification, as that’s been clearly documented, but when you try to recreate how and when different errors are thrown from the documentation alone, things become a little murkier. Say the documentation states that the service throws different errors depending on whether the Content-MD5 or the Content-Length doesn’t match what was calculated from the data actually received: how do you know which error will be sent first, given that if one condition fails the other quite likely fails too? The specification doesn’t answer this, and my view is that it probably shouldn’t; these sorts of questions are best left to developer forums, as a specification can sometimes be so detailed that no-one ever reads it!
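
To make the ordering question concrete, here is a minimal Python sketch of an upload check that simply picks an order: length first, then digest. That order is my assumption, not something the documentation states. The error codes are S3’s documented IncompleteBody, InvalidDigest and BadDigest; `check_upload` is a hypothetical name.

```python
import base64
import hashlib
from typing import Optional


def check_upload(body: bytes, content_length: int,
                 content_md5: Optional[str]) -> Optional[str]:
    """Return an S3 error code for a bad PUT body, or None if it looks fine.

    `body` is whatever was read from the request (at most Content-Length bytes).
    Assumption: the length check wins when both checks would fail.
    """
    if len(body) < content_length:
        return "IncompleteBody"  # fewer bytes arrived than Content-Length promised
    if content_md5 is not None:
        try:
            expected = base64.b64decode(content_md5, validate=True)
        except ValueError:  # binascii.Error is a ValueError subclass
            return "InvalidDigest"  # header is not valid base64
        if len(expected) != 16:
            return "InvalidDigest"  # not a 128-bit MD5 digest
        if hashlib.md5(body).digest() != expected:
            return "BadDigest"  # data differs from what the client hashed
    return None
```

A truncated body sent with the MD5 of the full data therefore reports IncompleteBody rather than BadDigest; swapping the two `if` blocks flips that, which is exactly the choice the documentation leaves open.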

Then today, on my way to my parents’ house, I was thinking that maybe I was wrong to create the back-end database layer first and should have stuck with a contract-first approach, but on my way home I remembered why I didn’t: the Amazon S3 REST service doesn’t have a contract, it has documentation, which simply isn’t the same. The S3 SOAP service does have a contract of sorts, its WSDL, but even that doesn’t help you recreate or describe the ‘unhappy path’ of the underlying service. The only real way to do this is to write tests against the real service and hope that the people who own it don’t change it much and that your tests map out most of the paths that exist. Better still, if the specification came with a downloadable set of software tests (JUnit et al.), that would make building a client even easier: a baseline reference implementation of sorts.

Simply put, contract-first development works well when you own both the contract and the software behind it. I’m not fully convinced it works as well when you own neither and you’re trying to clone a service. I could write tests against S3, but that would mean signing up and possibly breaking the T&Cs, and this project wasn’t meant to threaten S3 or get me sued; it was meant to help me understand S3 and the fundamental principles of well-behaved web services on which it is based. I guess I’m someone who likes to take things apart to see how they work, and that’s what I’ve done.

Also, from my current experience, it’s harder to develop a REST service than (‘in theory’) a SOAP service; BUT I think a REST service is easier for clients to consume than a SOAP one. That’s simply because SOAP has massive interoperability problems between toolkits: the SOAP specification itself is ambiguous and in small parts incompatible with several languages. REST has none of this because it is based on the great HTTP specification, RFC 2616, on which the entire web is built (including the majority of SOAP-based services).
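
As an illustration of how little a REST client needs beyond HTTP, authenticating an S3 request is just one computed header: an HMAC-SHA1 over a canonical "string to sign". Below is a minimal Python sketch of S3’s original (signature version 2) scheme, plus the query-string variant behind the URL authentication mentioned earlier. The function names are hypothetical, and it skips some canonicalisation rules (duplicate or folded x-amz-* headers, x-amz-date replacing Date).

```python
import base64
import hashlib
import hmac
from typing import Optional
from urllib.parse import quote


def string_to_sign(verb: str, content_md5: str, content_type: str,
                   date_or_expires: str, amz_headers: dict, resource: str) -> str:
    """Build the StringToSign for S3's HMAC-SHA1 (version 2) scheme."""
    # x-amz-* headers are lower-cased, sorted by name and emitted one per line.
    canonical_amz = "".join(
        f"{name.lower()}:{value.strip()}\n"
        for name, value in sorted(amz_headers.items(), key=lambda kv: kv[0].lower())
    )
    # `resource` is the CanonicalizedResource, e.g. "/bucket/key" or "/bucket/key?acl".
    return (f"{verb}\n{content_md5}\n{content_type}\n{date_or_expires}\n"
            f"{canonical_amz}{resource}")


def sign(secret_key: str, to_sign: str) -> str:
    """Base64-encoded HMAC-SHA1 of the string to sign."""
    digest = hmac.new(secret_key.encode("utf-8"), to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key: str, secret_key: str, verb: str, date: str,
                         resource: str, content_md5: str = "", content_type: str = "",
                         amz_headers: Optional[dict] = None) -> str:
    """Value for the Authorization header of a signed REST request."""
    to_sign = string_to_sign(verb, content_md5, content_type, date,
                             amz_headers or {}, resource)
    return f"AWS {access_key}:{sign(secret_key, to_sign)}"


def presigned_url(endpoint: str, access_key: str, secret_key: str,
                  resource: str, expires: int) -> str:
    """Query-string authentication for a plain GET: Expires replaces Date."""
    to_sign = string_to_sign("GET", "", "", str(expires), {}, resource)
    signature = quote(sign(secret_key, to_sign), safe="")  # escape + / =
    return (f"{endpoint}{resource}?AWSAccessKeyId={access_key}"
            f"&Expires={expires}&Signature={signature}")
```

For example, `presigned_url("http://localhost", "AKID", "secret", "/mybucket/photo.jpg", 1175139620)` gives a link any browser can fetch without extra headers; the endpoint, keys and bucket are placeholders. Everything else is ordinary HTTP.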

I have no solutions, just more questions and that generally isn’t a bad thing!

Windows

Dear Windows,

I’m not sure how I’m going to say this, but I think our time together is at an end. I’m not sure that we were ever that suited to each other, even when we first met; around that time I’d recently left Commodore and I think I was just looking for something different.

We’ve been together for many years now (15 years, I believe) and I’ve seen you change from 3.1 to 95, 2000, XP and now finally Vista. I guess we’ve just grown apart and we simply don’t have the same interests anymore: you’re more into your business work and I’m still just a hacker at heart, something I don’t think you ever satisfied or appreciated. I should tell you that while at university I met Solaris (who knows Unix and Linux) and that’s where my doubts about you started, but even then I stuck by you, as they had serious issues which meant I couldn’t imagine living with them.

Many years ago I met Apple at work and, although we didn’t like each other at first (OS 7 to 9), a couple of years ago I cheated on you with Apple by buying a Mac mini. Now, after having fun with Apple for a couple of years, I’ve decided that Apple and I are better suited than you and I ever were. You may think this is just a phase I’m going through, and you may be right, but I think I need to try, so Apple is moving in next week (the MacBook Pro is on its way!) and I’m sorry to say you’ll need to move out (BT says I can’t keep you). I’m sure we’ll see each other around the office.

Take care,
Milan

Flock

I first tried Flock when it came out in its initial beta many, many moons ago, but with the recent death of Netscape and some fortunate stumbling, I downloaded the 1.1 beta release (now final) and gave it a go…
For those who have never tried Flock as a browser, I can best describe it as Firefox on the inside with a social networking wrapper on the outside. My default layout for Flock (originally shown in a screenshot, which I’ve since lost) is a left-hand ‘People’ frame showing all updates from my Twitter, Facebook and Flickr friends, sorted with the most recent first, plus a media stream of pictures my friends have uploaded to Facebook (although that’s usually hidden to gain browser space).
The one thing that really annoys me is that some people update their Facebook status using Twitter, which obviously causes duplicates in my People feed (not Flock’s fault). I can see why they do it, but for me a tweet is different from a status update; it’s just plain lazy and pointless duplication. You also have a ‘My World’ page which aggregates all of this, along with any Atom/RSS feeds I subscribe to, into a single page view. And if that wasn’t enough, you can save your bookmarks to del.icio.us, post directly to Twitter and Facebook, and even write a blog post. All of which I think is pretty damn cool.
OK, all that’s great and I do use it as my default browser, but what’s wrong with this picture… (I lost the picture)
Personally, I think that at the moment this is a cool but ultimately fringe browser for people who are interested in social networking or earn a living from it; there just aren’t that many people who will find it useful (i.e. most of my friends and family would never need it, yet). Also, unless you already have accounts on Facebook, Twitter, Flickr etc., the appeal is very limited, and you can’t easily add more services to the browser as the ones that are there are baked in.
Then again, I do have accounts on all those web services and I am interested in social networking, so it might as well be called Milan’s browser. I’m looking forward to future updates. This post was written using Flock.
