Thursday, 4 November 2010

RESTful architecture: what should we PUT?

I've recently been reading REST In Practice, by Ian Robinson, Jim Webber and Savas Parastatidis. I had some knowledge of the hypermedia-driven architectural style, but this book has really helped me clarify my understanding of both the how and why. It's also convinced me that I should definitely consider Atom for event-based integration requirements.

There was one concept I found puzzling. In Chapter 5 (the callout box on page 114, if you have the book), the authors recommend using POST to update the state of a resource. This is driven by a choice of interpretation of the semantics of PUT. According to the HTTP spec:

If the Request-URI refers to an already existing resource, the enclosed entity SHOULD be considered as a modified version of the one residing on the origin server.

In the book, the authors interpret this to mean that the body enclosed with the PUT request should contain the same elements as the representation served by GET requests at the same URI. Since we are using the HATEOAS style, representations include links and other hypermedia controls. The implication is therefore that clients should PUT representations containing both business data (for example the new contents of the coffee order) and hypermedia controls (the available next steps in the workflow). To quote the book:

this obliges a client to PUT all the resource state, including any links, as part of the representation it sends

The problem here is that the client has no business determining what workflow steps are available. It's the server's job to understand what steps are available given the current resource state, and advertise those to clients using hypermedia controls. Therefore, we don't want the client to send the complete representation including both data and hypermedia controls.

I see four potential resolutions to this conflict:

Use PATCH
The PATCH HTTP verb is designed to explicitly support partial updates to a resource. However it's not widely supported. It also feels semantically wrong to me. From the client's point of view, the business data comprising an order (how many lattes?) is the whole resource. The client doesn't see this as a PATCH, but as a replacement, i.e. a PUT. PATCH might make sense to express concepts like "use skimmed milk in the latte, instead of the full fat I originally ordered".
Use POST
This is the approach suggested in the book. For me, it has similar drawbacks to PATCH. POST implies appending to a resource. POSTing one cappuccino to a coffee order resource feels like it should add one cappuccino, not replace the existing set of ordered coffees with one cappuccino.
Use PUT, including hypermedia
To follow the strict interpretation that PUT should include entire representations, the client sends both the entire new coffee order and whatever hypermedia controls the service last sent it. The service then ignores these controls, since it's the service's job to determine what they should be, and on future GETs the service sends whatever controls it deems appropriate at that time. This feels nasty; we are sending unnecessary data just to satisfy some architectural OCD (hat tip to Seb Lambla for that phrase!).
Use PUT, don't include hypermedia
The client sends a complete representation of the new order, but no links. To me, this feels conceptually right. The client fulfils the expectations of PUT by sending a complete representation of the parts of the data for which it is responsible, but does not pretend to be responsible for determining what hypermedia controls are available.
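To make the semantic differences between these verbs concrete, here is a minimal Python sketch of how each might act on an in-memory coffee order. The function names and the dictionary layout are hypothetical and chosen only for illustration. Treating POST as "append" reflects my reading above, not a requirement of HTTP.

```python
from copy import deepcopy

def put_order(current, new_items):
    """PUT: the client-supplied items replace the existing ones wholesale."""
    updated = deepcopy(current)
    updated["items"] = deepcopy(new_items)
    return updated

def post_item(current, item):
    """POST, read as 'append': add one more item to the order."""
    updated = deepcopy(current)
    updated["items"].append(deepcopy(item))
    return updated

def patch_item(current, index, changes):
    """PATCH: change only the named fields of one existing item."""
    updated = deepcopy(current)
    updated["items"][index].update(changes)
    return updated

order = {"items": [{"drink": "latte", "milk": "full fat"}]}
cappuccino = {"drink": "cappuccino", "milk": "full fat"}

replaced = put_order(order, [cappuccino])             # order now holds only the cappuccino
appended = post_item(order, cappuccino)               # latte plus cappuccino
patched = patch_item(order, 0, {"milk": "skimmed"})   # same latte, different milk
```

Each function returns a new order rather than mutating its input, which keeps the three outcomes easy to compare side by side.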

I explored some of these thoughts through a twitter conversation with @serialseb, @iansrobinson and @jimwebber. Being able to explore these thoughts in conversation with some of the experts is incredibly rewarding, compared to sitting alone pondering, so thanks to those guys for their contributions. Ultimately we came up with a simple rule of thumb, which seemed to attract mutual agreement:

In response to GET requests, services serve a complete representation of the current known state, including business data and available hypermedia controls. Clients PUT complete representations of the parts for which they are responsible.

HTTP places two significant expectations on PUT requests: idempotency, and the concept that the enclosed entity-body is a complete representation. This rule of thumb satisfies both, while absolving clients of the need to include data over which they have no control or responsibility, provided we accept that GET and PUT representations need not be the same.
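The rule of thumb could be sketched as a toy in-memory service like the one below. This is only an illustration under stated assumptions: the `OrderService` class, the `/orders/...` and `/payments/...` URIs, and the link relation names are all invented here, and there is no real HTTP handling.

```python
class OrderService:
    """Toy in-memory service; not a real HTTP server."""

    def __init__(self):
        self._orders = {}

    def put(self, order_id, body):
        # The client sends a complete representation of the part it owns: the items.
        # Any previous items are replaced, so repeating the same PUT changes nothing.
        order = self._orders.setdefault(order_id, {"paid": False})
        order["items"] = [dict(item) for item in body["items"]]

    def get(self, order_id):
        order = self._orders[order_id]
        uri = f"/orders/{order_id}"
        # The service alone decides which workflow steps to advertise.
        links = [{"rel": "self", "href": uri}]
        if not order["paid"]:
            links.append({"rel": "update", "href": uri})
            links.append({"rel": "payment", "href": f"/payments/{order_id}"})
        return {"items": [dict(item) for item in order["items"]], "links": links}

service = OrderService()
body = {"items": [{"drink": "latte", "quantity": 2}]}  # no links in the PUT body

service.put("1234", body)
first = service.get("1234")
service.put("1234", body)   # the same PUT again
second = service.get("1234")
```

In this sketch the GET representation carries links that the PUT body never contained, and sending the same PUT twice leaves the resource in the same state, which is the idempotency that HTTP expects.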

This raises another question. Available hypermedia controls are just one example of representation elements that only the service should be generating. Other examples include service-computed values (for example, the total cost of a coffee order) and resource state that is owned by the service (for example, whether or not the order has been paid for). We don't want the client updating the total cost, or sending state fields telling the service that the client has paid when it hasn't. Either of those could be bad for business! Should these data items be included in client PUT requests? My feeling is no. According to the rule of thumb, clients only include the parts for which they are responsible. However, I have a feeling that this may be a controversial standpoint. From the twitter conversation, I learned that the authors' aversion to partial PUT came from none other than Mark Nottingham.
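As a rough sketch of what keeping these values on the service side might look like, the Python below derives the total from a price list and refuses PUT bodies that try to set service-owned fields. The price list, the field names, and the decision to reject rather than silently ignore such fields are all assumptions made for illustration; other policies are possible.

```python
from decimal import Decimal

PRICES = {"latte": Decimal("2.50"), "cappuccino": Decimal("2.80")}  # assumed price list
SERVER_OWNED = {"total", "paid", "links"}  # fields only the service may set

def apply_put(order, body):
    """Replace the client-owned part of the order; refuse service-owned fields."""
    forbidden = SERVER_OWNED.intersection(body)
    if forbidden:
        # One possible policy: fail loudly instead of quietly dropping the fields.
        raise ValueError(f"service-owned fields in PUT body: {sorted(forbidden)}")
    order["items"] = [dict(item) for item in body["items"]]

def represent(order):
    """Build the GET representation; total and paid come from the service only."""
    total = sum(
        (PRICES[item["drink"]] * item["quantity"] for item in order["items"]),
        Decimal("0"),
    )
    return {"items": order["items"], "total": str(total), "paid": order["paid"]}

order = {"items": [], "paid": False}
apply_put(order, {"items": [{"drink": "latte", "quantity": 2}]})
view = represent(order)

try:
    apply_put(order, {"items": [], "paid": True})  # client claims it has paid
    rejected = False
except ValueError:
    rejected = True
```

Here the check runs before any state changes, so a rejected PUT leaves the stored order untouched.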

Disclaimer: I hope that I have not misrepresented the viewpoint of the book and its authors. If I have, I apologise and welcome feedback and corrections.

11 comments:

  1. In this case, could you not remove the ambiguity by placing the content of the order under a different URI to the details of the entire transaction?

    The client can then PUT or POST to the order content URI to their heart's content with replace and append semantics intact. They retrieve the current state from the "root" URI, including total cost and so forth which they can only GET.

    I'm only familiar with REST so I'm probably violating several fundamental points with this suggestion.

  2. Regarding the use of the word "responsible", @mikekelly85 makes the good point: "how is that responsibility supposed to be determined/visible for a given request?" I'm tempted to drop the word "responsible". The client PUTs back a representation of _everything_ it "cares" about, schemas permitting; the server then adjusts, enriches, etc.

    Clients can always PUT bad data (marked-down prices, etc). The server's not obliged to take these details at face value; again, it can adjust or enrich the resource state according to some private business logic or business constraints. That is, the client declares a possible future resource state in its PUT, and the server aligns the resource state to reflect this declaration, at the same time applying these private business rules and constraints. A subsequent GET reflects the state of the resource after the server has finished applying these constraints in the course of applying the PUT.

    Kind regards

    ian

  3. Missed the twitter discussion! It looks like you were having fun.

    Ok.
    1. To me, HTTP operations are not for CRUD; that is why we end up in this type of discussion.

    2. There are two options for update. The PUT followers use PUT for updating and POST for adding. The POST followers (maybe a minority) use POST to update and PUT for creating new resources.

    3. The problem of PUT for creation is often related to the fact that the client needs to provide the URI. The problem of PUT for updating is related to the "full replace" behavior it has, meaning you should be careful with multiple writers (which is well discussed in the book).

    4. Then I can add another issue to the PUT for update: multiple representation formats. See, what if I want to PUT using JSON?

    5. Let's continue. What about the representation as a partial resource? A representation may not need to include all data from a resource. I may need to work with the header of a bill, and the server may send just that part, the details are of no use to me.

    See? A PUT update should not force clients to know how to handle the full resource, nor force the server to send every last detail of a resource implementation in a representation.

    Also, the server is the owner of the resource, and it should validate whether the representation it receives in a PUT matches the resource, contains discardable data, or generates a conflict.

    That is also useful when different writers have different security levels, e.g. different visibility or different authorization to update certain data.

    Fun, isn't it?

    Regards

  4. Thanks everyone for their comments. If this discussion proves anything, it's that this is not a simple topic and there's a lot of interesting depth here!

    Garry: Separating the hypermedia controls and content could make sense. We'd then be modelling the order content and the workflow protocol as separate resources. This is discussed in section 5.4.4. of the book, if you have it. We still need a hypermedia link from the content resource to the protocol resource, otherwise we lose the discoverability. I think this is a valid approach, but is more about resource modelling and I'm not sure it addresses the deeper issues that Ian and William are raising.

    Ian: That seems very sensible to me. One outstanding question might be: what should a service do with a PUT which is partially valid (change contents of coffee order) and partially invalid (set price to 1 penny)? A client might be surprised to receive a 2xx response, but then find that the service 'ignored' part of the PUT body. I could understand a disgruntled client developer blaming a bug on the service 'silently failing' in this way. Perhaps the lesson is: if we are going to accept PUTs (responding with 2xx or 3xx) without accepting every data item in the body, we have to be explicit about it in our media type to ensure that client application developers understand that it can happen. If we don't publish this explicitly, I feel we should respond with 4xx to avoid 'silent failure'.

    William: I don't quite follow what you mean by the multiple writers problem. I don't have the book to hand at the moment - will take a look later. I don't see the problem with multiple representation formats. Probably I'm missing something, but I see XML and JSON as a choice of preferred bracket shape with limited impact on the meaning of the content. (There is of course more than this, e.g. the whole XML ecosystem of namespaces, schemas etc, but I don't see a direct link to this topic).

    If the header is important as a distinct entity in its own right, should it not be a separate resource? Otherwise how would a service know that client A only cares about the header, while client B cares about the header and the detail?

    Agree completely that the service is responsible for validation etc and enforcing security rules. See my response to Ian above for a question about some potential implications of that.

  5. Sure Alex. Glad to explain a little bit more.

    1. Let’s start by saying we cannot take the actual format of the message for granted. That is, unless the server states it only serves XML. So, in an ideal world, the server manages the representation of the resource in several formats to serve different client needs, be that XML or JSON. But that is a minor problem. My point referred to the “send all back, including the links” approach. The client may receive the XML representation with all links in it, but it may want to PUT a JSON representation. See the point? The actual value of the representation is the important thing; the additional control info or format is secondary and should not be taken into account. Here we have another discussion, I recall, about defining a different resource per representation. Some suggested creating one resource for JSON and another for XML, or at least having different URIs for each although sharing the same resource. That also adds more spice to the PUT discussion.

    2. The multiple writers’ problem, in concurrency, refers to the problem of two writers reading an actual value, then each of them making different changes and posting them. You have a race condition there.

    3. Header example. It may. But then, for a bill, you have a header as a separate resource and the detail lines as other resources. It is a design decision, and I will always be in favor of using a complete entity, a bill. Still, some clients may need to know just general bill info and not the detail. The client should have a way to get a representation of the resource with just the information it needs.

    4. Again, that also applies to the security issue. For a resource, a client may be able to see only partial resource info. And from that, it may be able to update just a small field. It makes no sense to partition the resource into little resources to manage that atomicity.

    5. And something I forgot to mention, related to the PUT semantics. The URI in the PUT denotes an identifier or a place (like a locker) where I put the resource. Whatever is there is replaced by a new resource I create from the entity represented in the body of the request. That is the idea. That would mean the new resource should be created entirely from the data contained in the request. A partial representation in the request will create a partial new resource, provided the missing parts are optional or have a default value.

    6. See? If we follow that, a partial update is not possible, as we would replace the whole resource and there is no mechanism to say in the request “please fill in the blanks with the data of the resource already in that URI”.

    7. So, what to do? Partial updates may be done using POST. As the data sent in a POST is subordinate to the destination resource, it makes sense that the resource performs updates to itself, by adding a new resource under its management or updating its own data. POST is not required to create a new resource.

    What do you think?

  6. Hi Alex

    I suggest 409 Conflict:

    "The request could not be completed due to a conflict with the current state of the resource."

    Kind regards

    ian

  7. I made a blog post out of comments on this post:
    http://stage.vambenepe.com/archives/1665

    It starts this way:

    Alex Scordellis has a good blog post about how to handle partial PUT in REST. It starts by explaining why partial PUT is needed in the first place. And then (including in the comments) it runs into the issues this brings and proposes some solutions.

    I have bad news. There are many more issues.

    Let’s pick a simple example. What does it mean if an element is not present in a partial update? Is it an explicit omission, intended to represent the need to remove this element in the representation? Or does it mean “don’t change its current value”. If the latter, then how do I do removal? Do I need partial DELETE like I have partial PUT? Hopefully not, but then I have to have a mechanism to remove elements as part of a PUT. Empty value? That doesn’t necessarily mean the same thing as an absent element. Nil value? And how do I handle this with JSON?

    And how do you deal with repeating elements? If you PUT an element of that type, is it an addition or a replacement? If replacement, which one(s) are you replacing? Or do you force me to PUT the entire list? No matter how long it is? Even if it increases the risk of concurrency issues?

    more...


    William (@vambenepe)

  8. The problem is caused by mixing responsibility or ownership in one resource, and being distracted by the red herring of using HTTP methods to "edit that data" to convey client intent.

    You should have /two/ resources: a client order telling the server what /it/ wants, and a server "ticket" that confirms the details back to the client - and adds whatever extra data, links, etc, it wants.

    The server's ticket has a dependency on that client order, /plus/ on any internal or external process state it's privy to in the execution of the order.

    But who hosts the client order? Why, the client of course! Well, unless you have an asymmetric set-up, in which case the client can POST and/or PUT its order somewhere on the server.

    Now the client submits and changes the /whole/ order resource in one go, without worrying about any server-generated bits.

    Oh - and the client order /can itself/ have links, if there's value in the client supplying them. For example, a link to a prior RFQ or links to order items in a catalogue.

    Cheers!

    Duncan Cragg

  9. This excellent debate has rather outgrown a blog comments section! I've posted a brief summary and links to the rest-discuss mailing list [1]. It's currently awaiting moderation and will hopefully appear soon.

    Looking forward to continuing the discussions there!

    [1] http://tech.groups.yahoo.com/group/rest-discuss/

  10. I think the 'solution' is to PUT to the same URI, but use a different 'Content-type', one that is not a hypermedia type.
    For example you could PUT an update to http://example.org/alex/address/street using 'multipart/form-data'.

    Conceptually this means that links are not part of a resource, but only of its representation.
