Revisiting “On editing text”

This document is an incomplete draft.

About two years ago I wrote about a category-theoretic treatment of collaborative text editing. That post is unique in the history of Bosker Blog in having been cited – twice so far that I know of – in the academic literature; so it’s a little embarrassing for me to have to explain that it is almost entirely wrong. The good news is that the core idea can be rescued, and the corrected story is quite interesting. Other writers on this subject seem to have made at least some of the same mistakes I did, so I hope this will be useful to at least a few other people too.

I was considering the problem of collaborative text editing, where several people separately and simultaneously are editing the same document and communicating their changes asynchronously to each other:


Each editor must then incorporate the other’s change into their version of the document, which must be done in such a way that they both end up with the same thing:


The hard case is where Tom and Sarah’s changes overlap: say they have both changed the same word to something different. In the example I used, the document read “The cat sat on the mat”, and they both select the word cat:

The cat sat on the mat

Then Tom types ‘dog’, and Sarah types ‘chicken’. This appears to be unresolvable if we require our documents to be linear – but we can achieve a canonical resolution if we permit non-linear documents:


In general the resolved document contains all the characters that neither editor changed, together with all the characters inserted by Tom and all the characters inserted by Sarah, ordered in the minimal way that is consistent with both documents.

So far so good. Now it starts to go wrong. There are two fundamental problems: pushouts imply pathological documents, and deletions lead to disaster. In turn:

Pushouts imply pathological documents

I said this resolution was a pushout in a category of edits. That is not true as I stated it, and moreover I think the examples show that we don’t want it to be true.

My previous example will not do here, since “dog” and “chicken” have no letters in common. Let us suppose, then, that instead of dog and chicken we have badger and donkey:


Let us recall what it means for the resolution to be a pushout. Start with a document that we shall call A, and suppose that, simultaneously:

  • Tom makes an edit, call it f, resulting in the document B;
  • Sarah makes an edit, call it g, resulting in the document C.

These edits are propagated: both parties run the resolution algorithm, and then Tom applies Sarah’s edit and Sarah applies Tom’s. Let’s label the resolved version of f as f’, the resolved version of g as g’, and the resulting document P.


Now, the properties of the pushout require that, given any resolution of the two edits (“The donkey sat…” and “The badger sat…”) there must be a unique edit from the pushout to this other resolution. One possible resolution would be to insert letters into the name of the animal to make “badonkeyger”:



In other words, the properties of the pushout require us to admit an edit that takes the parallel donkey/badger and flattens it out, identifying the two d’s.

This seems perverse. It certainly is not in line with my intuition about what sorts of edits ought to be permissible. But it gets worse! By admitting these strange edits that merge letters from parallel tracks, we have opened the door to yet stranger things such as cyclic documents. Consider the following pair of edits that merge two parallel tracks. The bold pink d in badonkeydonkger indicates the image of the  ‘d’ in badger: it could have been merged with either of the two d’s in donkeydonk. Resolving this pair of edits will necessarily create a cyclic document.


Samuel Mimram and Cinzia Di Giusto’s article A Categorical Theory of Patches – rather like my earlier On Editing Text – develops a theory where edits are resolved as pushouts. Unlike me, however, they recognised this issue, which they deal with by admitting cyclic documents into their theory. I shall take the alternative approach of arguing that, on the contrary, these examples show that pushouts do not correctly model edit resolution.

It’s okay that our resolutions are not pushouts, because they still have the property we really wanted in the first place: that edits can be resolved in any order and give the same answer. (And it’s good that they’re not, because if they were we would have to admit circularities etc.)

Deletions lead to disaster

The second problem is worse, because we get different answers depending on what order we process the edits. To resolve it we’ll have to change the way we represent deletions.

We’ll start with the document “ABC” and consider three different edits of this document:

  1. Insert v and w to get “AvBwC”;
  2. Insert x and y to get “AxByC”;
  3. Delete the B to get “AC”.

Now, combining (1) and (2) gives:


And combining that with (3) gives:


On the other hand, if we resolved these edits in a different order then we get a different answer. Combining (1) and (3) gives “AvwC”, then combining that with (2) gives:


So this resolution procedure is not order-independent, which is very bad. In particular it cannot be a pushout, which I believe shows that Mimram and Di Giusto’s Theorem 20 is false.

Avoiding deletions using tombstones

We can avoid this problem by not allowing characters to be deleted. When a user deletes a character, rather than removing it from the internal representation of the document, replace it with a tombstone marking where the character used to be.

We don’t need to keep these tombstones around for ever, I don’t think. Once we’re sure that an edit has been incorporated by all clients, there is no harm in discarding the tombstones it produced.

An object of the category – a document – is a triple (A, ≤, s) where (A, ≤) is the poset of positions and s: A → Σ assigns a symbol to each position.

A morphism f: A→B is a function such that for all x, y in A:

  1. x ≤ y ⇔ f(x) ≤ f(y)
  2. Either s(f(x)) = s(x) or s(f(x)) is a tombstone.

The resolution R of  f: A→B and  g: A→C has positions A + (B-f[A]) + (C-g[A]). Note that A + (B-f[A]) is isomorphic to B, and A + (C-g[A]) is isomorphic to C. The order of positions is inherited from B and C:

  • If b ≤ b’ in B then b ≤ b’ in R
  • If c ≤ c’ in C then c ≤ c’ in R
  • b ≤ c, for b∈B and c∈C, if there is some a∈A such that b ≤ f(a) in B and g(a) ≤ c in C
  • c ≤ b, for b∈B and c∈C, if there is some a∈A such that c ≤ g(a) in C and f(a) ≤ b in B

The assignment of characters to positions is:

  • sR(b) = sB(b) for b∈B,
  • sR(c) = sC(c) for c∈C,
  • sR(a) is a tombstone if either sB(f(a)) or sC(g(a)) is; otherwise sR(a) = sB(f(a)) = sC(g(a)).

What about the connection to the exception monad?

As I understand the situation at the moment, that doesn’t work. Constructing the category of edits in that way seems to rely on having real deletions – which we’ve seen we can’t have.

This entry was posted in algorithms, category theory, Mathematics. Bookmark the permalink.

2 Responses to Revisiting “On editing text”

  1. Pingback: On editing text | Bosker Blog

  2. meygerjos says:

    I don’t think pushouts are bad in the way that you say. Let’s we start with a document “The sat”. I edit it to “The x sat” and you also edit it to “The x sat”. The pushout of our patches is not just “The x sat”, rather it is “The sat” where denotes that the ordering relation does not order the two x’s. This is because you could patch yours to “The xerxes sat”, adding “erxes” between “x” and ” sat”, and I could patch mine to “The xerxes sat”, adding “xer” and “es” on either side of “x”. These two patches do not factor through the patches that send both our patches to the same string “The x sat”, because this single “x” in “The x sat” couldn’t be sent to both “x”s in “xerxes” at the same time by a single map. The same logic applies to your example about “donkey” and “badger”. The “d”s must be duplicated in the pushout, or the pushout is not sufficiently universal.

    Another thing: we can’t know a patch just by knowing the source string and the target string. The string “a” could be changed into “aa” in three different ways: adding an “a” at the end, adding an “a” at the beginning, or deleting an “a” and adding “aa”. This is why the Mimram and Di Guisto paper models patches as partial injective monotonic functions that preserve letter assignments. I think git determines a patch just from the source and target string, which means that some patches are impossible for it to represent. (I’m not sure about it’s algorithm). I just wanted to point this out because in your post you usually talk about a patch just in terms of the source and target strings, and this might have been the source of the above confusion.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s