Simple Storage Service (You’ve Come a Long Way, Baby)

It’s taken a few weeks of cooking, but my Simple Storage Service is ready to come out of the oven to be eaten (and possibly spat out) by the world at large. There are still a couple of things on my TODO list, but nothing massive: URL authentication of requests (needs some thought), postObject (I need to read the docs), virtual hosting of buckets (a lot of thought) and some tiny changes and bugs that I’ll fix over the next few days. So what has changed since my last post:

  • Anonymous requests can now be made where permission to do so has been set.
  • Authenticated/AllUsers groups and ACL gets and sets have been implemented.
  • All REST calls have been implemented (except postObject)*
  • Exception handling matches the S3 documentation (with some guess work)
  • The REST layer was completely rewritten using test driven development
  • phpDocumentor comments are being added to the code, so docs can be generated
  • I’ve created a web form to help you create new users to the service
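
To give a feel for how the anonymous and group permissions above fit together, here is a minimal sketch in Python (not the project's PHP code). The two group URIs are the ones the S3 ACL documentation uses; the `is_allowed` function and the shape of the `acl` list are hypothetical and only for illustration.

```python
from typing import List, Optional, Tuple

# Group URIs as published in the S3 ACL documentation.
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"


def is_allowed(acl: List[Tuple[str, str]], user_id: Optional[str], permission: str) -> bool:
    """Return True if user_id (None for an anonymous request) holds permission.

    acl is a list of (grantee, permission) pairs; this shape is hypothetical.
    """
    for grantee, granted in acl:
        # A grant counts if it names the permission itself or FULL_CONTROL.
        if granted not in (permission, "FULL_CONTROL"):
            continue
        if grantee == ALL_USERS:
            return True  # anyone, including anonymous callers
        if user_id is not None and grantee in (AUTHENTICATED_USERS, user_id):
            return True  # any signed-in user, or this user specifically
    return False
```

With an ACL that grants READ to AllUsers, an anonymous request (`user_id=None`) can read but not write, which is the behaviour the first bullet describes.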

So what’s next …? I guess I’ll polish what’s been completed so far and add some documentation to make it simpler to deploy. As I’m off snowboarding from Saturday, I’ll wait to see what sort of feedback I get before getting started on the SOAP section, which should be easier now that I’ve got a good testing setup. I’ll also be looking for a new job, as I’ll be leaving mine soon! I’ve also found that the most popular PHP client for S3 (from a Google search) is missing some useful functionality, so I’m pondering rewriting it and making several optimizations so it can stream downloads from S3, etc.

The best use for this software, apart from academic curiosity and mocking, is probably as a failover/backup service in case S3 goes down (which it has done). This would work best if you are using a CNAME record to map to s3.amazonaws.com, as I believe that since this is under your DNS control it is fairly trivial to map it to another host.

Other than that I’ll write a blog post on how to set it up using XAMPP on Windows and MacPorts on a Mac (when my MacBook Pro arrives)…

You can check out the latest code from here: http://svn.magudia.com/s3server

UPDATE: svn has been broken since I moved to Slicehost; you can download the code here: http://projects.magudia.com/s3server.zip

* As this service hasn’t been developed to meet Amazon’s data consistency model, I implemented getBucketLocation but essentially it does nothing. Although in theory I could use MySQL clustering to implement this, I’m not going to unless someone wants to pay me, and I also don’t have a global server network to play with 😉

Specifications

I recently read one of Joel’s blog posts on how difficult it still is to reverse engineer a Microsoft Office document even though Microsoft have now released their specifications for the formats. Now the problems I’ve been facing are nowhere near the order of magnitude of those facing any developer attempting to reverse engineer one of Microsoft’s Office documents, but as some of you may know I’ve been attempting (mostly with success – more tomorrow on that) to create a clone of the Amazon S3 service from their freely published documentation.

The problem is that it’s quite easy to replicate the ‘happy path’ of the specification as that’s been quite clearly documented, but when you try to recreate how and when different errors are thrown from just the documentation, things become a little bit more murky. Say the document states that the service throws different errors depending on whether the Content-MD5 or the Content-Length doesn’t match what the service calculated from what it received: how do you know which will get sent first, as it’s quite likely that if one condition fails then the other will also fail? The specification doesn’t answer this, but my view is that it’s probably best that it doesn’t, and these sorts of questions are best left to developer forums, as sometimes a specification can be so detailed that no-one ever reads it!
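
To make the ordering question concrete, here is a small sketch in Python (not the project's PHP code). `IncompleteBody` and `BadDigest` are the error codes the S3 documentation lists for a short body and a Content-MD5 mismatch; the `validate_put` function and the decision to check the length first are my own assumptions, since the documentation doesn't specify an order.

```python
import base64
import hashlib
from typing import Optional


def validate_put(body: bytes, content_length: int, content_md5: str) -> Optional[str]:
    """Return an S3-style error code, or None if the body passes both checks.

    Assumption: the length is checked before the digest. The documentation
    does not say which comes first, so this is just one possible choice.
    """
    if len(body) < content_length:
        return "IncompleteBody"
    # Content-MD5 is the base64 encoding of the raw 128-bit MD5 digest.
    expected = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
    if content_md5 != expected:
        return "BadDigest"
    return None
```

A truncated upload fails both checks at once, so swapping the two `if` blocks would change the error a client sees for exactly the same request, and that is the behaviour the documentation leaves open.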

Then today I was thinking on my way to my parents’ house that maybe I was wrong to create the back end database layer first and I should have stuck with a contract first approach, but later on my way home I remembered the reason I didn’t: the Amazon S3 REST service doesn’t have a contract, it has documentation – which simply isn’t the same. The S3 SOAP service does have a contract of sorts – its WSDL – but even that doesn’t help you recreate/describe the ‘unhappy path’ of the underlying service. The only real way you can do this is to write tests against the real service and hope that they (the people who own the service) don’t change it much and that your tests map out most of the potential paths which exist. Even better, if the specification came with a downloadable set of software tests (JUnit et al) then that would make building a client even easier … a baseline reference implementation of sorts.

Simply put, contract first development works well when you own the software behind the contract and the contract itself. I’m not fully convinced it works as well when you have neither and you’re trying to clone a service. I could write tests against S3, but that would mean signing up and possibly breaking the T&Cs, and this project wasn’t meant to threaten S3 or get me sued, but to understand it and the fundamental principles of well behaved web based services it bases itself on. I guess I’m someone who likes to take things apart to see how they work and that’s what I’ve done.

Also, from my current experience it’s harder to develop a REST service than it is, ‘in theory’, a SOAP service; BUT I think a REST service is easier for clients to consume than SOAP. That is simply because SOAP has massive interoperability problems between toolkits, as the SOAP specification is itself ambiguous and is in small parts incompatible with several languages. REST has none of this because it is based on the great HTTP RFC 2616, which the entire web is based on (including the majority of SOAP based services).

I have no solutions, just more questions and that generally isn’t a bad thing!

Aspects and PHP

phpAspects is a project to bring aspect-oriented programming (AOP) to PHP. If you don’t know much about aspects then, to state it very simply, aspects are a way to separate cross-cutting concerns such as logging and dependencies like database handling from the rest of your code to produce more manageable and maintainable code, but the Wikipedia article on aspects can describe this better than I can!
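
As a rough illustration of the idea (and only that), here is the logging concern pulled out of a function using a Python decorator. This is not phpAspects syntax, and a real AOP weaver would attach the advice through pointcuts without touching the function definition at all; the names `logged`, `call_log` and `store_object` are made up for this sketch.

```python
import functools

call_log = []  # stands in for a real logger so the example stays self-contained


def logged(func):
    """Logging 'aspect': records entry to and exit from func around each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_log.append(f"enter {func.__name__}")
        result = func(*args, **kwargs)
        call_log.append(f"exit {func.__name__}")
        return result
    return wrapper


@logged
def store_object(bucket, key, data):
    # Business logic only; there is no logging code in here.
    return f"{bucket}/{key}:{len(data)}"
```

Calling `store_object` fills `call_log` with the enter/exit entries while the function body itself stays free of logging code, which is the separation that aspects are after.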

Now this project isn’t really mature at the moment, i.e. it’s at alpha version 0.10, which means it’s very likely to change a lot before it becomes final. That is the main reason I’ve decided not to use it for my SimpleStorageService project, although I’m really tempted to play around with it anyway. The other reason I’m not going to use it is that I’d have to add another PECL dependency to the project (on top of PDO), which is Parse_Tree (another alpha dependency… Hmmm!). Then again, over the last few days I’ve come to believe that if you want to write a REST interface on a LAMP style stack you’re going to need server level access to configure Apache to allow the PUT & DELETE HTTP verbs (editing httpd.conf), although I’m hoping to find a higher level solution to that problem for easier deployment.

But I do recommend it as a project you should keep your eye on, as sooner or later you’re going to want to use aspects to keep your code clean and simple. Once it is a bit more mature I will be using it in some of my more work related projects, where I’ll have more control over which extensions are compiled into PHP.