Distributing an application over a network isn't just a case of splitting it down a natural line and putting a network in-between. What works in-process simply doesn't work so well across the wire.

And calling an Internet version of your application and process interfaces 'Web' Services doesn't mean it has anything at all to do with the Web, or that it in any way shares the Web's scalability, flexibility and robustness.

Indeed, I claim that you cannot distribute without also 'inverting'; you have to face what I call the 'Imperative-to-Declarative Inversion', if you really want a successful, scalable, distributed application.

Declarative Architectures such as REST (i.e. the Web, and now 'Web 2.0') dominate the broader Internet.

Take a simple application: calendar events. Let's first see what the Web Services approach is to distribution...

We'll start with an object-oriented calendar events class that hides the data and gives us methods to read from and write to it, plus some methods to add and subtract days, to calculate the number of seconds till due, etc. The calendar class interface is public and its data is encapsulated and private.
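
As a rough sketch of that starting point, the class might look something like the following. The class name, method names and sample data are my own assumptions for illustration, not a definitive design.

```python
from datetime import datetime, timedelta


class CalendarEvent:
    """Encapsulated calendar event: callers see methods, never the fields."""

    def __init__(self, title: str, due: datetime) -> None:
        self._title = title  # leading underscore: private by convention
        self._due = due

    def get_title(self) -> str:
        return self._title

    def get_due(self) -> datetime:
        return self._due

    def set_due(self, due: datetime) -> None:
        self._due = due

    def add_days(self, days: int) -> None:
        self._due += timedelta(days=days)

    def subtract_days(self, days: int) -> None:
        self._due -= timedelta(days=days)

    def seconds_till_due(self, now: datetime) -> float:
        return (self._due - now).total_seconds()


# Hypothetical usage: the caller only ever goes through the methods.
event = CalendarEvent("Project review", datetime(2030, 1, 10, 9, 0))
event.add_days(2)
```

Everything the caller can do is spelled out as a method; how the due date is stored stays the class's own business.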

Now let's put this calendar event class on the Internet. We run up a Web Service that makes our calendar event class look much like it did before, with methods or, in fact, functions exposed by an interface. We lose a bit of object-orientation, but that is considered acceptable these days, after some bitter experiences with CORBA...

In fact, synchronicity is also frowned upon these days after similar lessons of the past. So let's help our Web Service a little by making it asynchronous. We'll drop off our commands and either come back every so often to see if they're done, or wait for a callback to tell us so. Instead of many synchronous functions, we have message schemas describing jobs, along with the needed data. Results come back as messages, too. The public message interface still talks of actions and functions that can be called or invoked. However, it acknowledges the constraints of the network by running asynchronously and by having coarser-grained functions: the messages are chunkier now, perhaps called 'documents'.
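
Here is a minimal in-process sketch of that drop-off-and-poll style. The message fields ("action", "id", "due", "days") and the class name are assumptions made up for this example; a real service would sit behind a network and a message schema.

```python
import itertools
from collections import deque
from datetime import datetime, timedelta


class CalendarJobService:
    """Accepts job messages, processes them later, returns result messages."""

    def __init__(self) -> None:
        self._events = {}        # hidden state: event id -> due datetime
        self._pending = deque()  # job messages waiting to be processed
        self._results = {}       # job id -> result message
        self._job_ids = itertools.count(1)

    def submit(self, message: dict) -> int:
        """Drop off a job message; returns a job id to poll with."""
        job_id = next(self._job_ids)
        self._pending.append((job_id, message))
        return job_id

    def poll(self, job_id: int):
        """Return the result message, or None if the job is not done yet."""
        return self._results.get(job_id)

    def process_pending(self) -> None:
        """Stands in for the service working through its queue."""
        while self._pending:
            job_id, msg = self._pending.popleft()
            if msg["action"] == "createEvent":
                self._events[msg["id"]] = datetime.fromisoformat(msg["due"])
                self._results[job_id] = {"status": "done"}
            elif msg["action"] == "addDays":
                self._events[msg["id"]] += timedelta(days=msg["days"])
                due = self._events[msg["id"]].isoformat()
                self._results[job_id] = {"status": "done", "due": due}
            else:
                self._results[job_id] = {"status": "error"}


service = CalendarJobService()
service.submit({"action": "createEvent", "id": "e1", "due": "2030-01-10T09:00:00"})
job = service.submit({"action": "addDays", "id": "e1", "days": 2})
before = service.poll(job)   # None: the job has not been processed yet
service.process_pending()
after = service.poll(job)    # the result message for the addDays job
```

Note that the messages still name actions to be invoked; only the timing and the granularity have changed.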

The difference with the distributed version is the noticeable lag in interaction and the loss of the service when the client is disconnected or the network fails. But it's acceptable on a good LAN connection.

Now, look at the benefits. Our calendar events application can be shared (very important for calendars). It's available and accessible to other applications on the network. Someone else can manage it, perhaps someone who is better at it than we are. It can be accessed regardless of operating system and programming language. A number of authorized users or clients can create, manage and process calendar events.

It may even be possible to export calendar events. The Web Service might have getCalendarEvent(id), getCalendarEvents(date), etc. request messages - and may return XML documents now that it uses a chunkier, message- or document-based interface. This doesn't necessarily break encapsulation because the syntax of the exposed data is considered part of the interface, with standards, versioning, contracts, etc.
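
A sketch of such an export function, using Python's standard xml.etree module. The element names, the version attribute and the sample record are assumptions for illustration only; a real service would publish its own schema.

```python
import xml.etree.ElementTree as ET

# Hidden store behind the service (hypothetical sample data).
_EVENTS = {"e1": {"summary": "Project review", "start": "2030-01-10T09:00:00"}}


def get_calendar_event(event_id: str) -> str:
    """Handle a getCalendarEvent(id) request; reply with an XML document."""
    record = _EVENTS[event_id]
    root = ET.Element("calendarEvent", {"id": event_id, "version": "1"})
    ET.SubElement(root, "summary").text = record["summary"]
    ET.SubElement(root, "start").text = record["start"]
    return ET.tostring(root, encoding="unicode")


document = get_calendar_event("e1")
parsed = ET.fromstring(document)  # what a client would do with the reply
```

The document's layout is now part of the contract, but the only way to reach it is still through this one service's function.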

We're now pretty much up-to-date with current best practice in the Web Services industry: rich, versioned invocation interfaces (WSDL, Message Schemas) in front of hidden or encapsulated data.

 

'Web' Services??

The above scenario is fine for small-scale, self-contained applications shared by a handful of regular clients.

But what if you actually want to publicise your calendar events? How do you advertise them and share them somewhere; somewhere like the Web?

Unfortunately the model fails us now: hidden data = outside of the Web!

Hidden data means no URLs. So how do you pass around a generally-understood reference, not to the service, but to a calendar event on the service? How do you index it? Google indexes pages, not services. How, indeed, do you bookmark it?

Hidden data means no caches. Web Services standards don't concern themselves with caching. Caching is implemented in an ad-hoc manner because it's outside the model.

So scaling our calendar service for wide publishing can't be done with caches like it is on the Web (or by BitTorrent). It can only be done at the service operator, by adding fatter pipes and fatter machines.
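
For contrast, here is a toy sketch of what an intermediary can do once data has a URL. The Origin and UrlCache classes and the /events/e1 path are invented for this example, and the sketch deliberately ignores expiry and validation, which real Web caches must handle.

```python
class Origin:
    """Stands in for the service operator's machines."""

    def __init__(self) -> None:
        self.hits = 0
        self._resources = {"/events/e1": "Project review, 2030-01-10T09:00:00"}

    def get(self, url: str) -> str:
        self.hits += 1
        return self._resources[url]


class UrlCache:
    """An intermediary that needs nothing but the URL to do its job."""

    def __init__(self, origin: Origin) -> None:
        self._origin = origin
        self._store = {}

    def get(self, url: str) -> str:
        # Only the first request for a URL travels to the origin.
        if url not in self._store:
            self._store[url] = self._origin.get(url)
        return self._store[url]


origin = Origin()
cache = UrlCache(origin)
responses = [cache.get("/events/e1") for _ in range(1000)]
```

A thousand readers cost the origin one request. With a function-call interface and no URL, the intermediary has no generally-understood key to cache under.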

This is where the 'Web' of 'Web Services' starts to look like a serious misnomer. There's none of the scalability and interoperability of the Web, because the data's hidden!

You can't index it, bookmark it or advertise it; you can't cache it.

Clients of the 'Web' Service may end up screaming to be able to see their locked-up data and to break the rules of encapsulation, so that they can start to realise some of the proven benefits of the Web itself, which is hugely scalable, bookmarkable, indexable, linkable and cacheable.

Just having getCalendarEvent(id), etc. export functions doesn't make you part of the Web - and the incentive to export to a known MIME type and schema is low when the whole of the rest of the interface is defined for this particular service.

Oh - and WS-Addressing and WS-Transfer may create a parallel WS-Web, but it's not the one most of us will be using.

 

Public Data and Open Formats are OK!

The Web has shown us that, in contrast to the hidden data of object-oriented programming, public data and open formats are not only OK, they're incredibly desirable.

On the Web, data replaces service as our 'public interface'. And data thereby tends towards standard, stable or open formats.

Our service interface to the calendar event objects can be replaced by publicly-visible calendar event resources - perhaps in the open hCalendar format, a microformat embedded in XHTML. (You can choose YAML instead of XML if you want, and you can restrict access if you want - I mean 'public' or 'open' data in principle.)

These calendar events are each given a URL and we tell everyone what to expect them to look like. The layout of the data is now our interface, and we have to apply the same rules of standards, versions, etc. that applied to our service interface.

They may be fetched using GET, and updated using POST. The data derivatives, such as time-till-due, can be added to our open format, retrieved by a separate query URL, or simply calculated by the client given the basic information. Many people may be authorised to edit them via POST, including adding and subtracting days or moving the time forward an hour.
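
A minimal in-memory sketch of that resource style follows. The http_get and http_post functions merely stand in for real HTTP requests, and the URL and field names are assumptions for illustration.

```python
from datetime import datetime

# Public state: URL -> representation. URLs and field names are hypothetical.
resources = {
    "/events/e1": {"summary": "Project review", "start": "2030-01-10T09:00:00"},
}


def http_get(url: str) -> dict:
    """GET: return a copy of the current representation."""
    return dict(resources[url])


def http_post(url: str, changes: dict) -> dict:
    """POST: merge the submitted fields into the resource."""
    resources[url].update(changes)
    return dict(resources[url])


def seconds_till_due(event: dict, now: datetime) -> float:
    """Derived data worked out client-side from the open format."""
    return (datetime.fromisoformat(event["start"]) - now).total_seconds()


# Moving the event forward an hour is an edit to the data, not a method call.
http_post("/events/e1", {"start": "2030-01-10T10:00:00"})
event = http_get("/events/e1")
remaining = seconds_till_due(event, datetime(2030, 1, 10, 9, 30))
```

There is no addDays or moveForward operation here: the client reads the data, changes it and sends it back, and works out time-till-due for itself.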

Perhaps they could be 'Atomised', using the Atom Publishing Protocol to read and update them and an Atom feed for when events become due or for when they've been updated, etc.
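
As a hedged illustration, an Atom entry for one event could be built like this with the standard library. The example.org URL and the sample values are placeholders, and a real feed would also need feed-level elements such as an author.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace


def atom_entry(url: str, title: str, updated: str) -> ET.Element:
    """Build an Atom entry with id, title, updated and a link to the event."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = url
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    ET.SubElement(entry, f"{{{ATOM_NS}}}link", {"href": url})
    return entry


entry = atom_entry(
    "http://example.org/events/e1", "Project review", "2030-01-10T09:00:00Z"
)
xml_text = ET.tostring(entry, encoding="unicode")
```

The event's URL doubles as the entry's identity and its link, so anyone following the feed can fetch, bookmark or cache the event itself.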

 

Due Diligence

Naturally, before jumping headlong into such a Web-enabled world, we must show due diligence over the costs of this approach.

The main cost as seen by object-oriented programmers is that the class or application can no longer change the internal data format or representation of calendar events without users seeing anything different in the application or object interface.

Of course, if at some point the data is exposed (whether exporting or serializing an object), then the need for versioning thwarts this benefit anyway in practical terms. Indeed, whatever is public needs to be controlled, agreed and versioned, whether it's our new data interface or the old service interface. We've just moved the constraint over.

Now, the data we're talking about exposing is not the data inside some low-level library that has nothing to do with the domain of calendar events itself. Object-oriented tenets, and the Abstract Data Types that preceded them, undoubtedly have a value at these lower levels, where function-based APIs are ubiquitous and programmers need to leave themselves some breathing room behind the API to improve the implementation.

No, calendar events are a domain concept, not an implementation concept. Unlike users of implementation classes, domain users do care about the basic structure of their domain entities.

Indeed, users only fully understand 'what', not 'how'; they need to talk in terms of tangible things first, then the evolutions and constraints of those things second. They may never, and arguably should never, understand the mapping from those domain concepts that they understand into the threads and processes that programmers understand.

That's why there's so much business analysis that goes straight into database design, so much business analysis that goes straight into user interface design, so much domain analysis that goes into ontologies and message schemas. Business or domain level data will always be openly discussed: there's no reason not to expose and constrain domain data structures.

In fact, some of the key features of object-oriented programming map perfectly well to a purely data-oriented analysis. Inheritance, 'Duck Typing', Polymorphism, and even forms of data Encapsulation are all possible at the domain data level. This is the subject of another article, however...

 

The Inversion

Now that our calendar event data is opened, the data has become more important than the application. The importance of our shared data is underlined by its acquiring a URL: a universal handle on it.

You could even see the application as 'animating' the data from within (you only see the changes to the data, not how they happened or which tool was used client-side or server-side).

We have gone from a public process wrapper around hidden state to public state animated by a hidden process.

This is the Imperative-to-Declarative Inversion.

And moving stuff to the Internet is one of the quickest ways to feel the Force of Inversion! By joining the Web community, we get the benefits of bookmarkable, indexable, linkable, scalable and cacheable calendar events.

The fact is, the Internet is driven by data and events, not service interfaces. Distributed systems must face the Imperative-to-Declarative Inversion if they want to join the community of widely-deployed, scalable and successful systems that run over the Internet.

Imperative, Service-Oriented Architectures, 'Web' Services and SOAP will always be limited to small-scale deployments sold by the vendor consortia.

Declarative Architectures and REST will continue to dominate the broad Internet.