Thursday, 4 November 2010

RESTful architecture: what should we PUT?

I've recently been reading REST In Practice, by Ian Robinson, Jim Webber and Savas Parastatidis. I had some knowledge of the hypermedia-driven architectural style, but this book has really helped me clarify my understanding of both the how and why. It's also convinced me that I should definitely consider Atom for event-based integration requirements.

There was one concept I found puzzling. In Chapter 5 (the callout box on page 114, if you have the book), the authors recommend using POST to update the state of a resource. This is driven by a choice of interpretation of the semantics of PUT. According to the HTTP spec:

If the Request-URI refers to an already existing resource, the enclosed entity SHOULD be considered as a modified version of the one residing on the origin server.

In the book, the authors interpret this to mean that the body enclosed with the PUT request should contain the same elements as the representation served by GET requests at the same URI. Since we are using the HATEOAS style, representations include links and other hypermedia controls. The implication is therefore that clients should PUT representations containing both business data (for example the new contents of the coffee order) and hypermedia controls (the available next steps in the workflow). To quote the book:

this obliges a client to PUT all the resource state, including any links, as part of the representation it sends

The problem here is that the client has no business determining what workflow steps are available. It's the server's job to understand what steps are available given the current resource state, and advertise those to clients using hypermedia controls. Therefore, we don't want the client to send the complete representation including both data and hypermedia controls.

I see four potential resolutions to this conflict:

Use PATCH
The PATCH HTTP verb is designed to explicitly support partial updates to a resource. However it's not widely supported. It also feels semantically wrong to me. From the client's point of view, the business data comprising an order (how many lattes?) is the whole resource. The client doesn't see this as a PATCH, but as a replacement, i.e. a PUT. PATCH might make sense to express concepts like "use skimmed milk in the latte, instead of the full fat I originally ordered".
Use POST
This is the approach suggested in the book. For me, it has similar drawbacks to PATCH. POST implies appending to a resource. POSTing one cappuccino to a coffee order resource feels like it should add one cappuccino, not replace the existing set of ordered coffees with one cappuccino.
Use PUT, including hypermedia
To follow the strict interpretation that PUT should include entire representations, the client sends both the entire new coffee order and whatever hypermedia controls the service last sent it. The service then ignores these controls, since it's the service's job to determine what they should be, and on future GETs the service sends whatever controls it deems appropriate at that time. This feels nasty; we are sending unnecessary data just to satisfy some architectural OCD (hat tip to Seb Lambla for that phrase!).
Use PUT, don't include hypermedia
The client sends a complete representation of the new order, but no links. To me, this feels conceptually right. The client fulfils the expectations of PUT by sending a complete representation of the parts of the data for which it is responsible, but does not pretend to be responsible for determining what hypermedia controls are available.

I explored some of these thoughts through a twitter conversation with @serialseb, @iansrobinson and @jimwebber. Being able to explore these thoughts in conversation with some of the experts is incredibly rewarding, compared to sitting alone pondering, so thanks to those guys for their contributions. Ultimately we came up with a simple rule of thumb, which seemed to attract mutual agreement:

In response to GET requests, services serve complete representation of the current known state, including business data and available hypermedia controls. Clients PUT complete representations of the parts for which they are responsible.

HTTP places two significant expectations on PUT requests: idempotency, and the concept that the enclosed entity-body is a complete representation. This rule of thumb satisfies both, while absolving clients of the need to include data over which they have no control or responsibility, provided we accept that GET and PUT representations need not be the same.

This raises another question. Available hypermedia controls are just one example of representation elements that only the service should be generating. Other examples include service computed values (for example the total cost of a coffee order) and resource state that is owned by the service (for example, whether or not the order has been paid for). We don't want the client updating the total cost, or sending state fields telling the service that the client has paid, when it hasn't. Either of those could be bad for business! Should these data items be included in client PUT requests? My feeling is no. According to the rule of thumb, clients only include the parts for which they are responsible. However I have a feeling that this may be a controversial standpoint. From the twitter conversation, I learned that the authors aversion to partial PUT came from none other than Mark Nottingham.

Disclaimer: I hope that I have not misrepresented the viewpoint of the book and its authors. If I have, I apologise and welcome feedback and corrections.

Tuesday, 1 June 2010

Firefox autocomplete plays havoc with hidden input fields

We recently had a bug in our system which we eventually traced down to some odd Firefox behaviour. We’re using hidden input fields in our HTML to pass various pieces of data from the server to JavaScript code. We found that on some occasions, these hidden input fields would have different values from those sent by the server, even before any JavaScript that could change the values had run.

We realised that Firefox was using its autocomplete feature to auto-populate the hidden input fields with old values. The problem would only happen when some JavaScript had set the value of the hidden field, then the user pressed F5. The value set by the JavaScript would then reappear after the reload, even though the server sent a different value.

The solution we found (credit to Jim) is to use the non-standard HTML attribute autocomplete="off". You can also use an invisible span. The following page demonstrates the problem:

<html>
    <body>
        <input type="hidden" value="original" id="broken"></input>
        <input type="hidden" value="original" autocomplete="off" id="fixed"></input>
        <span style="display: none;" id="span_not_affected" value="original"></span>
        <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.js"> </script>
        <script type="text/javascript">
            alert("input with no autocomplete attribute: " + $("#broken").attr("value"));
            alert("input with autocomplete attribute: " + $("#fixed").attr("value"));
            alert("span: " + $("#span_not_affected").attr("value"));
            $("#broken").attr("value","changed");
            $("#fixed").attr("value","changed");
            $("#span_not_affected").attr("value","changed");
        </script>
    </body>
</html>

Save this page as an html file and open it in Firefox. You'll get 3 alerts saying 'original', as you might expect. Reload it (F5), however, and the first alert, will say 'changed'. The other two will not be affected. Firefox has remembered the value 'changed' that was set by the JavaScript the first time round and ignored the value in the source HTML.

I’ve seen this behaviour in Firefox 3.6, but not in any of Chrome, Safari or IE6.

Wednesday, 24 March 2010

So, er, which build label do I deploy this week?

The problem

A couple of weeks ago our team accidentally deployed the wrong build to our production environment. We deployed that latest green build from CruiseControl.NET, rather than the slightly older one that we’d intended. Fortunately, this didn’t matter too much. After all, this build had passed our test suite, so there were no serious bugs.

We decided to perform some root-cause analysis, using the 5-Whys technique, to understand how this had happened. We came up with a few root causes:

  • It’s hard to tell what code changes are contained in a given build. The only way to do this is to analyse the git commit history, which can be rather tangled. To make this easier, we have now agreed on a convention for git branch naming and merging patterns, based on this one from @nvie.
  • We were testing stories, not builds. This is a natural way to approach a story-driven Agile project, but stories aren’t what gets deployed to production, builds are. Come time to deploy, we’d know that stories #123, #134 and #96 are signed off, but that sign off would be tied to a point in time, not a build number. How do we know what build to deploy for these stories?
  • We didn’t have a way to retrospectively ‘fail’ green cruise builds. Just because CruiseControl.NET passes a build, it doesn’t mean it’s perfect. When a QA finds a bug during manual or exploratory testing of a story, they fail that story. But we really need to fail that build. We don’t want to deploy this build or any following one until this bug is fixed.

The solution

We came up with a great, low-tech way to solve this problem: coloured stickers!

  • When the developers think that they’ve finished coding a story and that it’s ready for test, they put a white sticker in the bottom left of the story card, and write the build number on.
  • If QA pass the story, they stick a green sticker on, with the build number against which they tested written on it. The story is now ready for deployment.
  • If they find a problem, they use a red sticker. Again, they write on the sticker the number of the build in which they found the problem.
  • When the developer commits their fix to the problem, they use another white sticker labelled with the build number containing the fix. This will be followed by another green or red, and so on.

What does this give us?

Imagine I want to deploy stories #123, #134 and #96 to production. We know that they’ve all been passed by QA, but we need to answer some more questions.

  • What build contains working versions of all these stories? It’s now very easy to answer this: just look at the green stickers on each card. To deploy these stories, we need the latest of the build numbers on the green stickers.
  • Is the build I’m deploying safe to deploy? Anything on the wall with a white or red sticker not followed by a green one represents a build containing code that either is known to have a bug or has never been tested. Let’s say we want to deploy build 2503 (it has a green sticker for a story we want to deploy to our users), and the currently deployed build is 2496. If we see a white or red sticker on a story that’s still in test with build 2499 on it, we know that we can’t deploy 2503. The team can now focus on finishing the story that’s carrying the sticker for build 2499, so that we can get these features to our users as quickly as possible.

What’s great about this is how easy it is to glance at the wall to get the answers to these questions. We don’t have to go digging in our cruise result logs or story tracking tool or the project manager’s spreadsheet.

Other benefits

Each time a story goes into test and fails, that is waste. In the language of lean thinking, we see rework (QA may have to retest areas of functionality after the fix, dev may have to re-code something), transportation (extra handoffs between QA and dev, often delayed if one party is busy working on something else) and waiting (QA and dev waiting for each other to be available for collaboration, dev idling while a story is QAed as they don’t want to pick up a new one and thus have to switch context back to the one in test). We are always looking to minimise waste, so we’d like to reduce the number of trips taken by each story around the dev-QA cycle.

The number of stickers on each card is an immediate visual indicator of this metric. A quick look at the wall (again, no digging in the story tracking tool!) shows which stories have incurred this type of waste. We might then decide to apply some root-cause analysis and other continuous improvement activities to those stories, so that we can understand what really caused the waste and prevent it in the future.

I can’t recall whose idea this was, so credit goes to the whole team. Extra credit to Nick Ashley, who chose the type of stickers to use when I was telling him to use a completely different sort that would have been entirely impractical.

Tuesday, 16 March 2010

Lambda-passing style for access to resources with lifecycles

When accessing a resource whose lifecycle must be carefully controlled, such as database or network connections, we would traditionally use a try..finally or using block in C#. Something like this:

using(var db = connectionPool.GetConnection())
{
    results = db.ExecuteQuery(query);
}

or

DatabaseConnection db = null;
try
{
    db = connectionPool.GetConnection();
    results = db.ExecuteQuery(query);
}
finally
{
    db.Close();
}

I'm not really a fan of either of these styles. The try..finally one is particularly nasty. It would be very easy to forget to call db.Close(), or to forget to put that call in a finally block. It’s also unpleasant to have to declare and initialize the db variable outside the scope of the try block. Not only is it ugly, but it means that code further down could try to access the database connection once I’ve closed it.

The using option isn't quite so bad, but as a client of the connection pool (or whoever is managing my resource), I still have to remember to call Dispose() or wrap it in a using block. It also requires that the DatabaseConnection type expose both the operations (e.g. ExecuteQuery) and the disposal mechanism, which may more naturally belong on the connection pool.

I prefer the following style:

var results = connectionPool.WithOpenConnection(c => c.ExecuteQuery(query));

With this style, there's no way we can forget to close the connection. There's also no way that we can access the connection object outside of the intended scope.

The code inside my connection pool object usually looks like one of the examples above, using using or try...finally. I've now managed to confine the use of those unpleasant constructs to just one place in the codebase, rather than repeat them every time I need to use my database.

Typically, WithOpenConnection has a generic type parameter to specify the return type. Because C# doesn't allow void to satisfy a generic type parameter, you typically need to provide two implementations of WithOpenConnection, one for when your operation has a return value and one when it doesn't. This is the only drawback I can see of this approach.

Building Quality In

A common theme in lean and agile software delivery is to build quality in. That is, rather than inspect our software for bugs after we’ve written it, we make the software inherently high quality. One way to do this is minimise the opportunity to make a mistake and introduce a bug. If we make it easier to do the right thing, we’re less likely to make a mistake. Sometimes this is referred to as setting ourselves up for success.

That principle underlies the reasons that I prefer the lambda passing style. It’s impossible to access the database connection and forget to close it, or to access a stale connection, so we should never be able cause a bug that way.

Friday, 15 January 2010

CruiseControl.NET Campfire plugin

On my current project, we’re using 37Signals’ Campfire for team chat and collaboration. We also use CruiseControl.NET for continuous integration. I thought it would be nice to have CruiseControl.NET automatically post build results into the Campfire chat room, to raise visibility of the build status out of the system tray and into the browser.

Implementing CruiseControl.NET plugins is quite straightforward, and Campfire has a very simple XML-over-HTTP API for posting messages into chatrooms. The code for the plugin is available on my github account. There are instructions there for using the plugin in your environment.

For the CruiseControl.NET integration, I found this blog post from Kris Selden very useful. The one thing I wasn’t sure about was how to handle errors. I didn’t want to risk exceptions being thrown from my plugin causing the build to fail. If something’s going to go wrong, I’d rather the plugin failed silently – we’ll quickly notice the lack of messages appearing in Campfire and look into it – so I made sure that I catch all exceptions and use CruiseControl.NET’s log4net facility to write to the server log.

I did have one problem with the Campfire API. I forgot to set the Content-Type HTTP header. Without this I was getting the HTTP response code 422 Unprocessable Entity, with a message “Body can't be blank”. This confused me for a while as I was definitely sending a well formed XML message with a body element. Setting the Content-Type header to application/xml fixed the problem.

Wednesday, 7 October 2009

Red-Green-Refactor at story level

A quick thought that came up this morning. When working on a story, we have two sets of responsibilities:
  • to the acceptance criteria and business value
  • to the technical quality of the codebase
We often forget to attend to the codebase once we've finished attending to the business value. We may end up with an unfinished refactoring. We are often in such a rush to claim the points. It's important that we keep the codebase clean. Red-Green-Refactor and "Make it work, make it right" apply at the story level as well as the code level. Satisfy the criteria, then make sure you're happy with the code quality.

Friday, 4 September 2009

NHibernate Bidirectional Cascade – when is it appropriate?

I recently found a situation where I needed to use NHibernate’s cascade facility from both ends of a relationship. Normally, I’d use it only in one direction, but in this scenario the only practical solution was to use it in both directions. I’m unsure whether this is a good thing to do, so in this post I’ll:

  • talk about why I normally use it in one direction
  • talk about the situation that led to the bidirectional cascade
  • ask whether this was the right solution.

One-directional cascades

Normally, the cascade is tied to the concept of ownership in the model. We cascade changes from the parent (e.g. an Order) to its logical children (e.g. OrderLines). This fits in with the Domain-Driven Design concept of an aggregate root. If we have the appropriate cascades set up from the aggregate root to the contained entities, then we can follow a simple workflow:

  1. load up an aggregate root from the repository
  2. modify, add and remove entities within the aggregate
  3. let NHibernate automatically persist the changes to the database when the session is flushed

This makes for very clean domain layer code which remains persistent ignorant. The domain code that is performing these modifications doesn’t have to explicitly save or delete any of the entities within the aggregate.

I suppose this model does not say that we can’t have cascade running from children to parents. But I always felt that the natural order of things was to have cascades running in one direction.

Why we needed bidirectional cascades

transient

In this scenario, the Customer is an aggregate root containing the ShoppingOrder and OrderLine objects. I’ve set up cascades from the Customer down to the OrderLine class, which allows the following code:

using (new SessionScope())
{
    var customer = ActiveRecordMediator<Customer>.FindByPrimaryKey(customerId);
    var shoppingOrder = customer.AddOrder(some, parameters);
    var newOrderLine = shoppingOrder.AddLine(some, parameters);
}

which executes SQL like this:

SELECT ... FROM Customer WHERE Id = ...
INSERT INTO ShoppingOrder ...
INSERT INTO OrderLine ...

Perfect. Note that I’m using Castle.ActiveRecord to configure and access NHibernate, with auto-flushing sessions. As far as I know, this scenario would translate directly to an equivalent plain NHibernate set up.

The problem comes with the OrderLineState. In our application, we have a separate part of the system which tracks the state of orders. There are two implications of this being a separate concern:

  • The OrderLineState is not considered part of the Customer aggregate. There’s no relation from the OrderLine to the OrderLineState.
  • The piece of code that creates the initial state object for an OrderLine is logically a long way from the piece of code that creates the OrderLine.

This means that when we create the OrderLineState, we need to explicitly tell NHibernate about the new entity:

using (new SessionScope())
{
    var customer = ActiveRecordMediator<Customer>.FindByPrimaryKey(customerId);
    var shoppingOrder = customer.AddOrder(some, parameters);
    var newOrderLine = shoppingOrder.AddLine(some, parameters);

    // In some other part of the system, perhaps responding to a domain event
    var state = new OrderLineState(newOrderLine);
    ActiveRecordMediator<OrderLineState>.Save(state);
}

But this code throws an exception from the call to Save(state):

NHibernate.PropertyValueException: not-null property references a null or transient valueNHOddities.OrderLine.Parent

This makes sense. We’ve asked NHibernate to save the OrderLineState object. This has a non-null, cascading reference to the OrderLine, so it tries to save the OrderLine. Because the relationship OrderLine.Parent (the Order) does not have cascading configured, NHibernate won’t try to save the Order yet. So the Order referenced by OrderLine.Parent is a transient object, and we get the corresponding exception.

There are a variety of ways to fix this. All of them work, in that they allow the code above to run and make the correct INSERT statements in the database.

  1. Remove the NHibernate NotNull constraint on the OrderLine.Parent property. This lets NHibernate save the OrderLine before it’s saved the order. I don’t like this for two reasons. Firstly, the NotNull constraint reflects a genuine constraint: an OrderLine must always belong to an order. Secondly, that NotNull constraint is probably reflected on the database column definition, so we’d have to change the database schema, removing a meaningful and useful constraint. In general, I don’t like having to change what is a genuine reflection of the domain just to make things convenient for the framework I’m using.
  2. Explicitly flush the session after adding the Order and OrderLine to the Customer aggregate, and before saving the new OrderLineState. I don’t like this one either. It requires my domain code to explicitly deal with session management. We usually consider this to be an infrastructural concern. We have a web app, so we handle session and transaction management at the request level.
  3. Add a cascade on the BelongsTo relation OrderLine.Parent.

Introducing this bidirectional cascade seems like the best (or at least, least-worst) solution. But it doesn’t quite seem right.

Is there anything wrong with a bidirectional cascade?

As I discussed earlier, it feels like cascades should run from parents to children, not the other way, and not in both directions. This isn’t written down as a rule or guidance anywhere that I can find in the NHibernate documentation, but it seems to be the standard approach. Indeed when I tweeted about this, I got a reply from none other than one of the authors of NHibernate In Action himself:

@ascordellis Cascade on both sides feels heavy, are you sure that both sides need to be the big daddy of the relationship?

For now, I’m leaving it in. For our problem, it’s the least intrusive solution. But I’d love to hear your opinion. Is there a better way to solve our problem? Should NHibernate just deal with it for us?