## On editing text

2012-05-10 § 17 Comments

Editing text is the opposite of handling exceptions; or, to put it another way, editing text is like exception handling but backwards in time. I realise this is an unexpected claim, so I hope you will permit me to explain. Although it has the ring of nonsense, there is a perfectly good sense in which it is just straightforwardly true.

**Added 2014-06-18**: it turns out that the story is more interesting than I realised when I wrote this, and that almost all of the technical details below are wrong. I am in the process of writing an account of what I currently believe on this subject.

Ah yes, category theory. Our old friend. Elucidating structural connections between apparently disconnected topics since 1945. Let me tell you a story.

I have been toying with a simple piece of collaborative text-editing software that may form part of a future Kiln project. Thanks to Google Docs you have probably experienced collaborative editing, where several people can edit the same document simultaneously on different computers, and see each others’ changes in more-or-less real time.

To implement collaborative editing, edits need to be propagated from one user to another. Let’s say Tom and Sarah are editing the same document: when Tom changes something, his change needs to be applied to Sarah’s copy; and when Sarah changes something her change needs to be applied to Tom’s. But internet communication is not instantaneous, so it will quite often happen that Tom and Sarah have both independently made a change before either has seen the other’s.

So now we need to apply Tom’s changes to Sarah’s copy, and Sarah’s changes to Tom’s copy, so that they both end up with the same document.

This is also the same problem a distributed version control system like git faces when merging two branches.

If you’re lucky enough to know some category theory, no doubt you are thinking that looks like a pushout (Wikipedia, nLab). I’ll keep the technical stuff for the end, so you don’t all tune out before we get to the good bit, but that idea turns out to work. A category is a general sort of mathematical gadget that consists of arrows that can be combined to make new ones: think functions or operations, which can be composed to form compound operations. There is a category whose arrows represent edits to a document, and this category does indeed have pushouts, and the pushouts give a canonical way to reconcile two divergent edits.

The nice mathematical properties of pushouts make this a useful tool. For example, the result will always be the same however the edits of the two parties are interleaved: we won’t ever get results that depend on the vagaries of the network.

But surely, I hear you cry, divergent edits don’t always *have* a canonical resolution. Suppose Tom and Sarah have a document that says “The cat sat on the mat”, and they both select the word cat, like this:

Then Tom types ‘dog’, and Sarah types ‘chicken’. How can this be reconciled? Well, if you compute the pushout of these two edits then something interesting happens: the resulting document is no longer linear. If it was originally like this:

then the resolution is like this:

I wonder if we could just go where the theory leads, and build a collaborative editor that allowed non-linear documents to exist. It’s a little bit like a conflicted file in git, except that git forces such a conflict to be resolved immediately. What would happen if we didn’t force it at all, and allowed users to relinearise documents in their own time, or even never? Deleting one of the two tracks to make the document linear again is a perfectly cromulent edit, so conflict resolution would not require any special-case handling. The challenge here is one of user experience design: could such a system be made comprehensible?

More conservatively, we might relinearise the document in some arbitrary way by picking one of the two parallel tracks. That would have a similar effect to what Google Docs does, as I understand it, where the result can depend on in which order the edits are processed by the server. But keeping a conceptual separation between the resolution algorithm and the relinearisation step helps to clarify the design. The resolution algorithm is robust, and well-behaved, and will never arbitrarily delete anyone’s work; all the trickiness and the trouble is confined to the linearisation step, which is the part that needs to be designed and tested with great care. Or the system could represent documents internally using a non-linear representation, but select one track to present to the user: perhaps we need only add some mechanism for switching tracks to turn this into the best of both worlds.

## Exception handling

It is a curious fact that the category of edits we’ve been using is already familiar to theoretical computer science, but in a different guise. Essentially the same mathematical structure has been used to model exception handling – the key difference being that, when it’s used for that, the arrows point in the other direction.

In Haskell it is called the Error monad. What we were thinking of as the character set, in the editing context, becomes the set of possible errors. The character positions in the *resulting* document – after the edit – correspond to possible *input* values, thanks to the reversal of direction. So, for example, inserting a character corresponds to throwing a particular error on a particular input.

So editing text really *is* the opposite of handling exceptions.

(Even the pushout operation has its uses in the exception handling model – where, because of the reversal of direction, it is called a pullback. It has been used by Cockett and Spooner to construct models for asynchronous parallel computation with exceptions. [ref])

## For the blessed and the brave

If you *are* lucky enough to know a bit of category theory, or if you’re up for following along anyway, welcome to the *really* good stuff.

So what is the category whose arrows are edits to a document? The fundamental editing operations are inserting new text, and deleting old text, so we want the arrows to represent some combination of insertions and deletions.

Let’s say the objects of the category are finite total orders. In fact for now let’s choose a skeleton of the category of finite total orders, and say the objects are natural numbers, which represent the length of a document. The arrows are combinations of insertions and deletions. So an arrow might represent an operation like this one:

which deletes the last letter and inserts two e’s, turning chest into cheese or frizz into frieze. It is obvious enough how to compose two edits: just perform them one after the other.

Each target position is mapped either to a source position or to a letter of the alphabet: a morphism *A*→*B* is a function from *B* to *A*+∑. And not just any function: it should be one-one on *A* and preserve the order of positions. In other words, it looks rather as though our category can be described as the opposite of the Kleisli category of the monad (– + ∑), on some sort of category of ordered sets.

In order to make this precise, we need to allow more general objects into our category than just finite total orders. Temporarily we’ll say the objects are (finite) sets equipped with a relation ◁ that partially orders some subset. We’ll still refer to the elements of the set as positions, and to the relation as the ordering, even though it is not necessarily a partial order of the whole set.

We define a category **RRef**. An object of **RRef** is a set *X* equipped with a transitive relation ◁_{X} on *X*, and a morphism *f*:*X*→*Y* is a function *X*→*Y* that reflects the relation, which means *f*(*a*) ◁_{Y} *f*(*b*) ⇒ *a* ◁_{X} *b* for all *a*, *b* in *X*. Let ∑ be the alphabet (the set of characters, say the set of Unicode code points) equipped with the empty relation: we’ll call any set equipped with the empty relation a *discrete* object. Then our category of edits is the opposite of the Kleisli category of the monad (– + ∑) on **RRef**.

But – in computer science that monad has a name. It’s called the *exception monad*. The idea is that ∑ is the set of possible exceptions, and a function may either return a result or throw an exception. This idea is modelled precisely by the Kleisli category of the exception monad, in which an arrow *A*→*B* is a function *A*→*B*+∑. It is an example of the way computational effects can be modelled using monads, as explained by Eugenio Moggi in his influential 1991 paper Notions of computation and monads.

So edits are composed in *exactly the same way* as we compose functions in the presence of exceptions – but back-to-front.

We needed to allow the order relation on **RRef** objects to be quite general, since we needed to equip ∑ with the empty relation, which is not even reflexive. Now we’ve formed the Kleisli category, we can restrict its objects to tamer ones, like say partial orders. (Restricting to *total* orders would mean that pushouts don’t necessarily exist, which rather defeats the object of the exercise.)

Now, those pushouts. The details were worked out by Cockett and Spooner in their 1995 paper Categories for synchrony and asynchrony. The important parts here are Proposition 3.8 and Example 3.10. Let’s say we have two edits like this:

Then the pushout concretely consists of a set of pairs. There are three types: either a pair of positions (*a*, *b*) with *f*(*a*)=*g*(*b*) for *a*∈*A* and *b*∈*B*, or else a pair (*a*, *σ*) with *f*(*a*)=*σ* for *a*∈*A* and *σ*∈∑, or finally a pair (*σ*, *b*) with *g*(*b*)=*σ* for *σ*∈∑ and *b*∈*B*.

The ordering relation is inherited using the rule

(*a*,*b*)◁(*a’*,*b’*) if *a* ◁_{A} *a’* or *b* ◁_{B} *b’*.

To be precise it is the transitive closure of the relation defined by that rule. Here (as elsewhere) we assume that characters *σ* are not in the order relation with anything, so for example (*a*,*σ*)◁(*a’*,*σ’*) if *a* ◁_{A} *a’* for *a*∈*A* and *σ*∈∑.

If you work through a couple of examples using this definition, you will see that it usually corresponds to the expected behaviour – and that it turns conflicts into parallel tracks.

Actually there is one class of cases where the pushout doesn’t do quite what I naively expected. Suppose Tom and Sarah start with a document containing the text “xy”. Tom replaces the x with “whisk”, so it reads “whisky”; and at the same time Sarah replaces the y with “anax”, so her copy reads “xanax”. I would have expected the resolved document to read “whiskanax”, but actually there is nothing to force this ordering, and the result is “whisk” and “anax” on parallel tracks. If there had been another character between the x and the y, the result would have been in the expected order; I wonder if it would be helpful to do the computation as though there were a hidden spacer character at every character boundary, so we get the expected result in situations like this. Or were my expectations wrong?

## A question for the cognoscenti

Is there any prior work on this category-theoretic approach to collaborative editing or distributed version control?

There’s something about your whiskanax example that makes me think that the category theory is being sensible for wanting to plead the 5th on the ordering of the edits in the pushout. Here’s another example to meditate on:

My initial document is “pear apple orange 2 3 4″.

Edit E1 takes it to “pear apple orange banana 3 4″

Edit E2 takes it to “pear apple 1 2 3 4″

Now if the human-intended semantics of E1 were “delete 2, insert banana after orange” and of E2 were “delete orange, insert 1 before 2″, then the resolution “pear apple banana 1 3 4″ makes sense.

If the intended semantics were E1: “replace 2 with banana” and E2: “replace orange with 1″, then the resolution “”pear apple 1 banana 3 4″ makes sense.

But the math doesn’t really ‘know’ what your intent is when the edited elements are so close in the ordering, so it makes a kind of sense for it to leave them incomparable in the pushout ordering. That being said, adding extra hidden spacer characters, like you say, seems like a legitimate way to force a preference for a more ‘replacey’ interpretation of edits, if that’s what you want.

(Or maybe I should have had the first pair of intended semanticses be E1: “delete 2, insert banana at end of list of fruits”, E2: “delete orange, insert 1 at head of list of numbers”)

(One further disambiguating note that came up during a discussion with someone else about this: I mean the tokens “pear”, “apple” etc. *as* elements of the alphabet. The spaces in my example strings are not meant to be characters themselves)

On your last question, I think that’s basically what darcs is based on.

Not quite.

The biggest similarity is that these are both “bag of patches” models, unlike the Git/Mercurial “DAG of snapshots” model. This means, I think, that Robin’s algorithm will do the Right Thing in cases where Darcs merges correctly and Git merges incorrectly – Git fails in these cases precisely because it doesn’t track the source of each line by composing a sequence of changes, but rather tries to guess it by diffing the snapshots at the start and end of the sequence. But the formalism’s quite different. Darcs assumes that a repository contains an ordered sequence of *patches*; the space of patches is required to satisfy the following axioms:

1) any two patches AB can be commuted past each other to get a new pair of patches B’A’,

2) all patches are invertible,

3) commutation commutes with inversion,

4) commutation is coherent.

For more precise statements, see Ian Lynagh’s paper “An Algebra of Patches”, available at http://urchin.earth.li/~ian/conflictors/paper-2006-10-30.pdf . Given these axioms, one can define a merge operation (see Ian’s paper). I’ve just sketched a proof that this merge operation gives a pushout, but I’m not entirely sure in which category! Anyway, I ran a Google search for “pushout site:darcs.net” and got 0 hits, so I’m fairly confident that this is not a well-established idea.

Some differences between Robin’s model and the darcs model:

– Robin’s patches fail axiom (2), since deletions don’t have unique inverses (as Robin proves here). Note that real-world diffs do include the contents of lines which have been deleted.

– I’m not sure what the state of the art on modelling conflicts is in the Darcs community, but I’m pretty sure it’s nothing like as simple as what Robin presents above. I remember someone saying on the darcs-users list a month or two ago that they still didn’t have a treatment of conflicts that covers all cases, but can’t now find the relevant message.

> I’ve just sketched a proof that this merge operation gives a pushout

… and after an embarrassingly long time, I realise that’s because in Darcs

every patch is invertiblesoevery context is isomorphic to every other context. Which makes the calculation of colimits trivial.Bah. Time for a gin, I think.

Thinking further, I don’t think I was right to describe this as a “bag of patches” model – the patches are effectively concrete diffs, and so must be applied in the right order (unlike darcs, which makes strenuous efforts to pretend that patches can be applied in any order). But the snapshots are implicit, so it might be better described as a “DAG of patches” model.

But given darcs’ origins in quantum mechanics, I’m amused by the parallel between Robin’s “merge creating parallel tracks, then relinearize” split and the split in QM between the unitary, deterministic evolution of the wave function and the nondeterministic collapse it undergoes on measurement.

Two minor points:

> It’s a little bit like a conflicted file in git, except that git forces such a conflict to be resolved immediately.

Not quite – it forces you to mark the conflict as resolved. It’s happy for you to

`git add`

a file containing conflict markers and commit it. I think it’s more accurate to say that git works with serialised possibly-nonlinear files. In fact, this is essential to how git’s recursive merge algorithm works – conflicts found when generating the pseudo-MRCA are just treated as text and passed forward for subsequent merges. Git only complains if there are conflict markers in the final result.> And not just any function: it should be one-one on A and preserve the order of positions.

It seems to me that reordering a file is a useful operation – perhaps you’re moving code from one function to another, for instance.

I’m not sure whether this is relevant, but your post reminded me strongly of a Dan Spiewak post about resolving collaborative edits (he even used similar diagrams!).

http://www.codecommit.com/blog/java/understanding-and-applying-operational-transformation

> he even used similar diagrams!

That at least should not be surprising – commutative diagrams are very general, and directed graphs even more so :-)

There are some obvious similarities: both systems are based on pushing around lists of changes rather than snapshots, for instance. But some obvious differences leap out on a quick skim:

1. OT (in Google Wave’s implementation, at least) works on XML domtrees rather than text documents, though I expect this could be handled by replacing

RRefwith some other category.1. OT does not allow for “documents with conflicts” – the linearisation step is performed immediately, or rather it’s not separated out from the merge step.

2. Consequently, it’s not automatic that all the diagrams drawn commute (that the procedure gives the same merge no matter what order you merge patches in), and the author talks as if this is a surprising result of Google’s clever implementation. The universal property of pushouts makes this automatic in Robin’s scheme.

3. In OT, the linearisation step depends on an explicit client/server distinction (at least in Spiewak’s description). Robin’s scheme is fully peer-to-peer.

4. In OT, each patch is tagged with a number or hash describing the document it was first applied to. I don’t think this is essential except for consistency checking (darcs does something similar, IIRC), and could probably be added to Robin’s scheme straightforwardly.

Reading that post, I’m even more convinced that Robin’s on to something here. In particular, allowing the user to resolve conflicts if necessary rather than trying to resolve them immediately by some arbitrary scheme seems like the Right Thing – the go[at] example given there is particularly horrible, because it would pass a spellchecker but is very unlikely to be semantically meaningful. Exercise for the reader: construct a sentence in which “Got”, “Goa” and “Goat” can all be used in the same position.

Mikołaj Konarski has done some sketches on categorical formalisation of Darcs at http://en.wikibooks.org/wiki/Talk:Understanding_Darcs/Patch_theory

[...] Editing text is the opposite of handling exceptions (via) [...]

The paper “CRDTs: Consistency without concurrency control” by Mihai Letia, Nuno Preguica and Marc Shapiro (http://pagesperso-systeme.lip6.fr/Marc.Shapiro/papers/RR-6956.pdf) describes a similar concurrent-editing system called Treedoc. In the terms used above, Treedoc relinearises documents by numbering all the peers (perhaps by IP address?) and declaring that, where text by peer n and peer m > n overlap, the text by peer n goes first. Deletions are handled by adding “tombstones” to the data structure, so as a matter of efficiency periodic garbage-collection is required, and the authors describe an algorithm for this: this step requires consensus among peers, but it’s not on the critical path for writes so the system can still provide high availability.

Following on from this paper, there’s been work in the distributed-database community on Convergent/Commutative Replicated Data Types as an approach to achieving eventual consistency without the need for explicit consensus protocols. @lenary kindly wrote me a summary at https://gist.github.com/lenary/6008876; the key paper appears to be http://hal.inria.fr/docs/00/55/55/88/PDF/techreport.pdf. I haven’t yet finished reading the paper, but Convergent RDTs define their merge operation in terms of the least-upper-bound operation of a semilattice, and LUBs are a special case of colimits.

I was just enjoying the blog when this post reminded me of more than a decade thinking about this kind of problem, more than a decade ago.

I dealt with this problem, in very much this way in my dissertation on collaborative editing (accepted in 2000), which presented ideas that were stable by ’95 in all key essentials. The importance of commutation was significant in operational transformation contexts, but I went to an an order-independent representation so that only a certain notion of logical dependency was required. While partial ordering was the key, I was working to define a complete set of editing operations, and so the work is mostly stated directly in terms of the partial order of set inclusion between sets of changes.

It’s readable online, at:

http://david.hub.agilepdf.com/Palimpsest

So my work really is a “bag of patches” model, with a change of notation so that patches depend on a locally extensible global address space, rather than diffs between portions of content.

The real meat of my work is definition of freely combinable move and copy operations, so that it’s possible to combine large changes (like rearrangement of sections or paragraphs), with smaller changes like copy revision, without central coordination.

Most of these ideas don’t pay off for code because of it’s internal structure, where documents require much finer granularity to represent meaningful changes.

It’s more than usually readable for a dissertation, and I think it summarizes the the state of the art back then pretty well. Since I left academia and didn’t publish followups, I’m not surprised it’s not well known, but I feel like I have to at least point it out.

[…] To understand collaborative text editing. […]

[…] two years ago I wrote about a category-theoretic treatment of collaborative text editing. That post is unique in the history of Bosker Blog in having been cited – twice so far that I […]

[…] To understand collaborative text editing. […]