In an exclusive nine-part dialogue with an imaginary eBay Architect, we present an accessible discussion of the REST vs. SOA issue.
Although eBay have what they call a 'REST' interface, it is, in fact, a STREST interface, and only works for a few of the many function calls that they make available via SOAP.
In this dialogue series, I argue the case for eBay to adopt a truly REST approach to their integration API.
Part 8: WS-Are-You-Sure (Security, Reliable Messaging and Transactions)
Duncan Cragg: So, back to your list of Enterprise functions. We're on to what I'm going to call the 'WS-Are-You-Sure': Security, Reliable Messaging and Transactions.
Let's attack these, starting with the Web!
eBay Architect: We could start with Security: authentication, authorisation and encryption. For example, you have to keep some information secret on eBay. Like Invoices, Offer details. Reserve price on an Item. And you have to ensure only the owners of data can change it.
DC: But with HTTPS you lose some of the benefits of using intermediaries, such as caching. If those intermediaries are untrustworthy, then you can use message-level rather than transport-level security: encrypt the resource state being transferred.
eA: Can't I use WS-Security for this?
DC: Possibly! However, the benefits of caching may be lost in the time taken to package and unpack each resource in turn. You may prefer a more lightweight approach, as suggested in the Atom and AtomPub specs.
eA: How does REST handle authorisation: such as read and write permissions?
DC: As I keep saying, REST is about much more than simple data read/write services. In REST we don't have the generic concept of authorisation on a specific process execution, such as a command that could cause state change.
REST infrastructure is about state transfer, which is thus really only about 'read permissions'.
Everything else is business logic: it's up to the target resource to manage its reaction to incoming non-GETs and to decide if or how it should change in response, according to internal integrity constraints and the identity of the source. Resources are masters of their own destiny and must be aware of the identity of interacting parties at that level.
eA: What can you do to secure the infrastructure level below the business logic?
DC: The department managing the infrastructure can see data going either out (GET) or in (POST), and can see the target URIs. They can thus do both server- (URI) and client- (request header) based security and partitioning.
For read permission, it's possible to implement a low-level lookup from the identity in the request header to whatever URIs they can GET. They can enforce simple rules at that level like 'only GETs are allowed on these URIs unless the client is in this list'. They can groom more and less sensitive traffic to different servers.
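As a rough illustration of that kind of infrastructure-level rule, here is a minimal Python sketch. It assumes the client identity has already been extracted from the request header; the `UriRule` class, the `RULES` table, the URI prefixes and the identities are all hypothetical, not anything eBay exposes.

```python
from dataclasses import dataclass, field


@dataclass
class UriRule:
    """Infrastructure-level rule for one URI prefix (all names are illustrative)."""
    prefix: str
    readers: set = field(default_factory=set)  # identities allowed to GET; empty means anyone
    writers: set = field(default_factory=set)  # identities allowed to send non-GETs


# Hypothetical rule table: invoices are private, items are publicly readable.
RULES = [
    UriRule("/invoices/", readers={"alice"}, writers={"alice"}),
    UriRule("/items/", writers={"alice", "bob"}),
]


def allowed(method: str, uri: str, identity: str) -> bool:
    """Decide from the method, the target URI and the header identity alone."""
    for rule in RULES:
        if uri.startswith(rule.prefix):
            if method == "GET":
                return not rule.readers or identity in rule.readers
            # 'only GETs are allowed on these URIs unless the client is in this list'
            return identity in rule.writers
    return False  # unknown URIs are refused by default
```

Passing this filter only lets a non-GET reach the resource; whether the resource then changes is still its own business-level decision.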
eA: Any more Security advice?
DC: Paul Prescod has written some notes on REST security.
Finally, remember to keep sensitive data out of those highly-propagatable unencrypted URIs by using POST instead of GET when submitting queries; another reason to use URIs that are literally opaque, not just treated as opaque operationally.
eA: Another of the WS-* specifications deals with Reliable Messaging. How does REST give me the assurances I need that an important message - such as a new Offer on an Item or a ResponseToBestOffer, or an Invoice - will be delivered? In the right order? I can't just rely on POST, as you suggested before, if I really care about this.
DC: In REST, there are no command messages that have to make it through. There's only state that may or may not need to be reliably transferred - or that may or may not need to be notified in a timely manner.
In the eBay example, as I described it before, "if you keep re-POSTing the same Invoice, or Item or Offer, it only gets created once".
eA: Ah! Define 'same'!
DC: If, as in this eBay example, the successful POST creates a server-side copy with its own new URI, then the Item, Invoice, etc, must have some uniquely identifying information on it. It could perhaps have a Message-ID header or get cheap, unique URIs minted for it from the server in advance. Alternatively, when the POSTed resource already has a URI itself on the 'client', then it's obviously the same each time it's POSTed.
When used as state notification, POST must be idempotent; repeatable.
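To make 'same' concrete, here is a minimal in-memory sketch of a collection that creates a resource only once per identifier. The `OfferCollection` class, the `/offers/` URIs and the choice of 201 for the first POST and 200 for repeats are assumptions made for illustration.

```python
import itertools


class OfferCollection:
    """In-memory stand-in for a server-side collection that accepts POSTs."""

    def __init__(self):
        self._by_message_id = {}  # Message-ID -> URI minted for that resource
        self._resources = {}      # URI -> stored state
        self._counter = itertools.count(1)

    def post(self, message_id: str, state: dict) -> tuple:
        """Create once; a repeated POST with the same Message-ID returns the same URI."""
        if message_id in self._by_message_id:
            return 200, self._by_message_id[message_id]
        uri = f"/offers/{next(self._counter)}"
        self._by_message_id[message_id] = uri
        self._resources[uri] = dict(state)
        return 201, uri
```

Re-POSTing with the same Message-ID after a lost response is then harmless: the client simply learns the URI that was minted the first time.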
So if the initial POST fails, just keep POSTing until you can see the appropriate response, whatever that may be in business terms. On the pull or poll side, keep GETting until you see what you expect.
eA: So that's another issue you're side-stepping by dumping it into the business logic?!
DC: Only the business logic knows the following things: what signifies receipt of the notification; if it matters that the state didn't get through; how frequently to push or poll; whether it matters that state is out of date and by how much; and when to give up and tell someone.
Set the push or pull frequency and total number according to the business logic's view of the importance of that state transfer. Set cache control according to your domain's tolerance of stale data.
It's just like in real life: if something I sent doesn't get a response - in a form that is completely dependent on the type of recipient - then, after a time - which is also completely dependent on the type of recipient - I'll chase it up.
eA: Can't REST give any support here at all?
DC: Well, it would be easy enough to write a REST support library that implemented a simple API for specifying your constraints on a successful state transfer.
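One possible shape for such a library, sketched in Python. Nothing here is an existing API: `TransferPolicy`, `transfer`, and the `push` and `confirmed` callbacks are hypothetical names, and the push is assumed to be idempotent so that repeating it is safe.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TransferPolicy:
    interval_seconds: float  # how frequently to push or poll
    max_attempts: int        # when to give up and tell someone


def transfer(push: Callable[[], object],
             confirmed: Callable[[object], bool],
             policy: TransferPolicy,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Repeat an idempotent push until the business-level check passes.

    Returns True once confirmed(response) holds, False when attempts run out.
    """
    for attempt in range(policy.max_attempts):
        try:
            response = push()
        except OSError:
            response = None  # treat a network failure as "not yet delivered"
        if response is not None and confirmed(response):
            return True
        if attempt + 1 < policy.max_attempts:
            sleep(policy.interval_seconds)
    return False
```

The business logic supplies `confirmed` (what signifies receipt) and the policy values (how often, and when to give up); the library only does the repetition.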
eA: Now, when you're a site like eBay, dealing with money all the time, you need the assurance that transactions give you. You need to make sure accounts are always consistent. But I suppose, like before, you're going to tell me that it'll all be fine in the end, right?
DC: Hold on. Let's not mix up financial transactions and database transactions! We'll first talk about the need for atomic units of work. Then see how to support financial transaction business logic.
Also, we're talking about units of work in public view, not hidden behind resources. Inside, it's up to a resource to ensure that its integrity and consistency are maintained through its interactions with others, and it's free to use transactions to achieve that internally if it wants, without exposing that to its clients.
eA: OK - so now say that it'll all be fine in the end!
DC: In a distributed system, you have to decide on what to give up out of Consistency, Availability and Partition Tolerance.
I have to say that eBay are actually fully clued up here: there is a paper about 'BASE' by Dan Pritchett, Technical Fellow at eBay, in which he discusses the benefits of Eventual Consistency - i.e., knowing that it'll all be fine in the end! Especially if you tidy things up eventually.
eA: Gah! Ya got me there!
DC: Essentially, the rule of thumb is, use ACID internally, use BASE externally.
We're back to the inevitable inversion from internal imperative thinking to external declarative thinking.
As an imperative programmer you're inclined to want to take your internal programming style out into the distributed world - to think single-thread, central control: 'begin - do work - commit'.
But the importance of Availability and Partition Tolerance in distributed systems usually outweighs the importance of Consistency, leading the wise architect to a more relaxed, less imperative, more declarative approach.
eA: Such as REST.
DC: Indeed. REST without transaction support.
REST isn't a database model: in the same way REST doesn't imply simple read/write services, it also doesn't imply inert data that needs locking. And resources in REST should model active domain data, not low-level, domain-independent transaction paraphernalia.
eA: How does REST without transactions work, BASE-ically, then?
DC: A handy phrase that sums it up is 'intention puts the system in tension'.
You start by declaring your intention that some state be true, which puts the system in tension - a tension that can only be resolved by the application of business logic constraints over each player in parallel, until the whole system settles or resolves into a new, consistent state.
eA: Examples, please!
DC: Think about how you'd do the classic transfer of funds between accounts, in the real world of loosely interacting, self-determined parties. Say inside a big company before computers came along, between an office that handles one account and an office that handles the other.
Your key resource is a signed declaration (the intention) by the payer that they are happy to have funds passed to the payee. As long as this fact doesn't appear in one account or the other, you have work to do (there is tension in the business rules).
eA: You've got to run around real quick with a piece of paper.
DC: It doesn't even need to happen all at the same time: you can visit one office, check the funds are available and deduct the amount, then wander over to the other office and tell them to increase the payee's balance. If you get waylaid and the auditors come, there is always the signed declaration and the account history available to resolve the situation.
You can enforce the constraint that no money appears to be in two places with the business rule that the payee account is only increased if the payer's account has an entry corresponding to the signed declaration.
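The two offices and the signed declaration can be sketched as follows. This is a single-process illustration under stated assumptions: `Declaration` and `Account` are hypothetical names, signature checking is left out, and in a distributed setting the payee's office would fetch the payer's account history rather than hold a reference to it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Declaration:
    """The payer's signed intention (signature checking is omitted here)."""
    id: str
    payer: str
    payee: str
    amount: int


@dataclass
class Account:
    owner: str
    balance: int
    history: dict = field(default_factory=dict)  # declaration id -> signed amount

    def debit(self, d: Declaration) -> bool:
        if d.id in self.history:
            return True  # already applied; repeating is harmless
        if d.payer != self.owner or self.balance < d.amount:
            return False
        self.balance -= d.amount
        self.history[d.id] = -d.amount
        return True

    def credit(self, d: Declaration, payer_account: "Account") -> bool:
        if d.id in self.history:
            return True
        # Business rule: only credit if the payer's history shows the matching debit.
        if d.payee != self.owner or payer_account.history.get(d.id) != -d.amount:
            return False
        self.balance += d.amount
        self.history[d.id] = d.amount
        return True
```

Between the debit and the credit the system is 'in tension': the declaration exists but appears in only one history. Either step can be retried at any time without the money appearing in two places.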
eA: Mmm. Sounds a bit too loosely coupled to me.
DC: It's life outside of Central Control.
Consider hotel and flight booking: you don't lock the hotel and the flight while telling them all in a two-phase commit what your itinerary will be. You do 'optimistic locking' with compensation: if things don't work out, you cancel a booking. A system may tell you something is available, but when it comes to booking it may have just been taken.
The real, distributed, reactive world doesn't work in a lock-step fashion, so our distributed, reactive systems don't need to work that way to model it. Reality is much more like optimistic locking with the possibility of compensation or merge on conflict that, again, can only be defined at the business level.
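A minimal sketch of booking with compensation, with no locks and no two-phase commit. The `Inventory` class and `book_trip` function are hypothetical stand-ins for two independent booking services and the client that uses them.

```python
class Inventory:
    """Stand-in for an independent booking service (hotel or airline)."""

    def __init__(self, free: int):
        self.free = free
        self.bookings = set()

    def book(self, ref: str) -> bool:
        if ref in self.bookings:
            return True   # re-booking the same reference is a no-op
        if self.free == 0:
            return False  # it looked available, but has just been taken
        self.free -= 1
        self.bookings.add(ref)
        return True

    def cancel(self, ref: str) -> None:
        if ref in self.bookings:
            self.bookings.remove(ref)
            self.free += 1


def book_trip(ref: str, hotel: Inventory, flight: Inventory) -> bool:
    """Book both without locks; compensate by cancelling if the second fails."""
    if not hotel.book(ref):
        return False
    if not flight.book(ref):
        hotel.cancel(ref)  # compensation, not rollback
        return False
    return True
```

The hotel booking is briefly visible to the world before the flight is attempted; the cancel is an ordinary business operation, not a hidden undo.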
eA: Why not do your optimistic locking below that? HTTP has support for it, right?
DC: In the same way that REST can support read permissions but is at the wrong level for write permissions, which are a business level concern, there is an asymmetry in read versioning versus write versioning.
While using ETags is great for optimising the reading and caching of data, I wouldn't use them in the optimistic locking pattern for writes that is supported by HTTP. The proper place for handling a mismatch of versions in an interaction is not in the HTTP headers.
REST should be about state declaration and intention, not absolute write commands. Only the business logic governing the evolution of a resource knows if, for example, it can go ahead and respond anyway to an edit request, even though it's possible that the sender has an out-of-date copy of it.
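The asymmetry can be sketched like this: an ETag drives the read side, while the resource applies its own rule to an edit based on a possibly stale copy. The `ItemResource` class, its two fields and the acceptance rule are assumptions chosen purely for illustration; a real resource would apply whatever rule its domain requires.

```python
import hashlib
import json


class ItemResource:
    """Hypothetical Item resource; only 'description' and 'reserve' are modelled."""

    def __init__(self, description: str, reserve: int):
        self.state = {"description": description, "reserve": reserve}

    def etag(self) -> str:
        body = json.dumps(self.state, sort_keys=True).encode()
        return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

    def get(self, if_none_match=None):
        """Read side: a matching ETag lets caches skip the body (304)."""
        if if_none_match == self.etag():
            return 304, None
        return 200, dict(self.state)

    def post_edit(self, seen: dict, wanted: dict) -> bool:
        """Write side: the resource judges an edit made from a possibly stale copy.

        Assumed rule: accept if every field the sender wants to change still
        has the value the sender last saw, even if other fields have moved on.
        """
        if any(self.state[k] != seen[k] for k in wanted):
            return False
        self.state.update(wanted)
        return True
```

A header-level version check would have rejected any edit from a stale copy outright; here an edit to the description still goes through after the reserve price has changed, because this particular resource decides that is safe.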
(c) 2006-2009 Duncan Cragg
In Part 9: Web Objects Ask, They Never Tell
Note that the opinions of our imaginary eBay Architect don't necessarily represent or reflect in any way the official opinions of eBay or the opinions of anyone at eBay.
Indeed, I can't guarantee that the opinions of our real blogger necessarily represent or reflect in any way the official opinions of Roy Fielding...