On editing text
2012-05-10 § 12 Comments
Editing text is the opposite of handling exceptions; or, to put it another way, editing text is like exception handling but backwards in time. I realise this is an unexpected claim, so I hope you will permit me to explain. Although it has the ring of nonsense, there is a perfectly good sense in which it is just straightforwardly true.
Ah yes, category theory. Our old friend. Elucidating structural connections between apparently disconnected topics since 1945. Let me tell you a story.
I have been toying with a simple piece of collaborative text-editing software that may form part of a future Kiln project. Thanks to Google Docs you have probably experienced collaborative editing, where several people can edit the same document simultaneously on different computers, and see each others’ changes in more-or-less real time.
To implement collaborative editing, edits need to be propagated from one user to another. Let’s say Tom and Sarah are editing the same document: when Tom changes something, his change needs to be applied to Sarah’s copy; and when Sarah changes something her change needs to be applied to Tom’s. But internet communication is not instantaneous, so it will quite often happen that Tom and Sarah have both independently made a change before either has seen the other’s.
So now we need to apply Tom’s changes to Sarah’s copy, and Sarah’s changes to Tom’s copy, so that they both end up with the same document.
This is also the same problem a distributed version control system like git faces when merging two branches.
If you’re lucky enough to know some category theory, no doubt you are thinking that looks like a pushout (Wikipedia, nLab). I’ll keep the technical stuff for the end, so you don’t all tune out before we get to the good bit, but that idea turns out to work. A category is a general sort of mathematical gadget that consists of arrows that can be combined to make new ones: think functions or operations, which can be composed to form compound operations. There is a category whose arrows represent edits to a document, and this category does indeed have pushouts, and the pushouts give a canonical way to reconcile two divergent edits.
The nice mathematical properties of pushouts make this a useful tool. For example, the result will always be the same however the edits of the two parties are interleaved: we won’t ever get results that depend on the vagaries of the network.
But surely, I hear you cry, divergent edits don’t always have a canonical resolution. Suppose Tom and Sarah have a document that says “The cat sat on the mat”, and they both select the word cat, like this:
Then Tom types ‘dog’, and Sarah types ‘chicken’. How can this be reconciled? Well, if you compute the pushout of these two edits then something interesting happens: the resulting document is no longer linear. If it was originally like this:
then the resolution is like this:
I wonder if we could just go where the theory leads, and build a collaborative editor that allowed non-linear documents to exist. It’s a little bit like a conflicted file in git, except that git forces such a conflict to be resolved immediately. What would happen if we didn’t force it at all, and allowed users to relinearise documents in their own time, or even never? Deleting one of the two tracks to make the document linear again is a perfectly cromulent edit, so conflict resolution would not require any special-case handling. The challenge here is one of user experience design: could such a system be made comprehensible?
More conservatively, we might relinearise the document in some arbitrary way by picking one of the two parallel tracks. That would have a similar effect to what Google Docs does, as I understand it, where the result can depend on in which order the edits are processed by the server. But keeping a conceptual separation between the resolution algorithm and the relinearisation step helps to clarify the design. The resolution algorithm is robust, and well-behaved, and will never arbitrarily delete anyone’s work; all the trickiness and the trouble is confined to the linearisation step, which is the part that needs to be designed and tested with great care. Or the system could represent documents internally using a non-linear representation, but select one track to present to the user: perhaps we need only add some mechanism for switching tracks to turn this into the best of both worlds.
It is a curious fact that the category of edits we’ve been using is already familiar to theoretical computer science, but in a different guise. Essentially the same mathematical structure has been used to model exception handling – the key difference being that, when it’s used for that, the arrows point in the other direction.
In Haskell it is called the Error monad. What we were thinking of as the character set, in the editing context, becomes the set of possible errors. The character positions in the resulting document – after the edit – correspond to possible input values, thanks to the reversal of direction. So, for example, inserting a character corresponds to throwing a particular error on a particular input.
So editing text really is the opposite of handling exceptions.
(Even the pushout operation has its uses in the exception handling model – where, because of the reversal of direction, it is called a pullback. It has been used by Cockett and Spooner to construct models for asynchronous parallel computation with exceptions. [ref])
For the blessed and the brave
If you are lucky enough to know a bit of category theory, or if you’re up for following along anyway, welcome to the really good stuff.
So what is the category whose arrows are edits to a document? The fundamental editing operations are inserting new text, and deleting old text, so we want the arrows to represent some combination of insertions and deletions.
Let’s say the objects of the category are finite total orders. In fact for now let’s choose a skeleton of the category of finite total orders, and say the objects are natural numbers, which represent the length of a document. The arrows are combinations of insertions and deletions. So an arrow might represent an operation like this one:
which deletes the last letter and inserts two e’s, turning chest into cheese or frizz into frieze. It is obvious enough how to compose two edits: just perform them one after the other.
Each target position is mapped either to a source position or to a letter of the alphabet: a morphism A→B is a function from B to A+∑. And not just any function: it should be one-one on A and preserve the order of positions. In other words, it looks rather as though our category can be described as the opposite of the Kleisli category of the monad (– + ∑), on some sort of category of ordered sets.
In order to make this precise, we need to allow more general objects into our category than just finite total orders. Temporarily we’ll go very general indeed and say the objects are (finite) sets equipped with a transitive relation ◁. We’ll still refer to the elements of the set as positions, and to the relation as the ordering, even though it is not necessarily a partial order – or even a preorder – in the conventional sense.
We define a category RRef. An object of RRef is a set X equipped with a transitive relation ◁X on X, and a morphism f:X→Y is a function X→Y that reflects the relation, which means f(a) ◁Y f(b) ⇒ a ◁X b for all a, b in X. Let ∑ be the alphabet (the set of characters, say the set of Unicode code points) equipped with the empty relation. Then our category of edits is the opposite of the Kleisli category of the monad (– + ∑) on RRef.
But – in computer science that monad has a name. It’s called the exception monad. The idea is that ∑ is the set of possible exceptions, and a function may either return a result or throw an exception. This idea is modelled precisely by the Kleisli category of the exception monad, in which an arrow A→B is a function A→B+∑. It is an example of the way computational effects can be modelled using monads, as explained by Eugenio Moggi in his influential 1991 paper Notions of computation and monads.
So edits are composed in exactly the same way as we compose functions in the presence of exceptions – but back-to-front.
We needed to allow the order relation on RRef objects to be quite general, since we needed to equip ∑ with the empty relation, which is not even reflexive. Now we’ve formed the Kleisli category, we can restrict its objects to tamer ones, like say partial orders. (Restricting to total orders would mean that pushouts don’t necessarily exist, which rather defeats the object of the exercise.)
Now, those pushouts. The details were worked out by Cockett and Spooner in their 1995 paper Categories for synchrony and asynchrony. The important parts here are Proposition 3.8 and Example 3.10. Let’s say we have two edits like this:
Then the pushout concretely consists of a set of pairs. There are three types: either a pair of positions (a, b) with f(a)=g(b) for a∈A and b∈B, or else a pair (a, σ) with f(a)=σ for a∈A and σ∈∑, or finally a pair (σ, b) with g(b)=σ for σ∈∑ and b∈B.
The ordering relation is inherited using the rule
(a,b)◁(a’,b’) if a ◁A a’ or b ◁B b’.
To be precise it is the transitive closure of the relation defined by that rule. Here (as elsewhere) we assume that characters σ are not in the order relation with anything, so for example (a,σ)◁(a’,σ’) if a ◁A a’ for a∈A and σ∈∑.
If you work through a couple of examples using this definition, you will see that it usually corresponds to the expected behaviour – and that it turns conflicts into parallel tracks.
Actually there is one class of cases where the pushout doesn’t do quite what I naively expected. Suppose Tom and Sarah start with a document containing the text “xy”. Tom replaces the x with “whisk”, so it reads “whisky”; and at the same time Sarah replaces the y with “anax”, so her copy reads “xanax”. I would have expected the resolved document to read “whiskanax”, but actually there is nothing to force this ordering, and the result is “whisk” and “anax” on parallel tracks. If there had been another character between the x and the y, the result would have been in the expected order; I wonder if it would be helpful to do the computation as though there were a hidden spacer character at every character boundary, so we get the expected result in situations like this. Or were my expectations wrong?
A question for the cognoscenti
Is there any prior work on this category-theoretic approach to collaborative editing or distributed version control?