The algebra of Unix command substitution

2013-08-16 § 6 Comments

Shadab Ahmed raised an interesting question. Open a Unix command shell, type : '!!' and press return. Then type : "!!" '!!' and press return. Now repeat the following a few times: press the up arrow, and press return.

The result will be something like this:

bash$ : '!!'
bash$ : "!!" '!!'
: ": '!!'" '!!'
bash$ : ": '!!'" '!!'
: ": ': ": '!!'" '!!''" '!!'
bash$ : ": ': ": '!!'" '!!''" '!!'
: ": ': ": '!!'" ': ": ': ": '!!'" '!!''" '!!'''" '!!'

producing ever-lengthening strings. The first one has two occurrences of !!, the next has three, the next has five, etc. But these are not Fibonacci numbers! The sequence continues:

2, 3, 5, 17, 161, 15681, …

The shell substitutes each occurrence of !! for the previous command, unless it is inside single quotes.

So, what is the pattern? Can we predict how many there will be after n steps? I checked the OEIS and the sequence wasn’t there, so I knew it wasn’t something straightforward. Actually computing the strings becomes intractable after a couple more terms, because the sequence grows so rapidly. But there is a way to calculate the sequence.

We need to somehow keep track of whether each occurrence of !! is inside single quotes, so we can tell whether it is going to be expanded. Not only that! We need to be able to track how this changes after a substitution has been performed. The solution is to use the symmetric group S3, the group of all permutations of three things. The three things in this case are quotation states: 0=unquoted, 1=single-quoted, and 2=double-quoted. For example, the string '"' has the effect of swapping the single- and double-quoted states, so we would represent it as the permutation (1 2). The quote characters ' and " correspond to the swaps (0 1) and (0 2), which generate S3, so all six permutations can be represented by some string. Given a string, we can work out its permutation by doing the appropriate swap for each quote character and composing them: for example, the string "'"' is (0 2)(0 1)(0 2)(0 1) = (0 1 2).

Now we can represent a string containing !! codes as a polynomial with exponents from S3. (We’re using the group semiring \mathbb{N}[\mathrm{S}_3], for the mathematicians in the house.) The coefficients represent the number of occurrences of !! in each quotation state. We’ll assume that the quotes in the string are balanced eventually, i.e. that the entire string gives the identity permutation.

Let’s look at some examples. The string : '!!' has one occurrence of !!, and the string preceding that occurrence is : ' which has type (0 1), so the polynomial is just x(0 1). The next string in the sequence is : "!!" '!!', which has one occurrence of type (0 1) and one of type (0 2), so the polynomial is x(0 1) x(0 2).

The next one is : ": '!!'" '!!', which is x(0 2 1) + x(0 1).

Now, what happens when substitution is performed? Remember that the single-quoted occurrences of !! are not substituted, so we need to separate the polynomial into the part that undergoes substitution and the part that does not. The part that does not is the x(0 1) and x(0 1 2) terms, and the part that does is the other four terms. Let’s say we have a string represented by f(x), and we are going to replace its non-single-quoted occurrences of !! by the string represented by g(x). First we separate f into f1(x) + f2(x), where f1 is the substitutable part and f2 is the single-quoted part. Then the result of the substitution is represented by f1(x) g(x) + f2(x).

We can check this against the example. We are going to substitute : '!!' into : "!!" '!!', so we have f1(x) = x(0 2), f2(x) = x(0 1), and g(x) = x(0 1). The result is f1(xg(x) + f2(x) = (x(0 2))(x(0 1)) + x(0 1) = x(0 2)(0 1) + x(0 1) = x(0 2 1) + x(0 1), which is the same polynomial we computed directly. Phew!

This already gives a reasonably efficient algorithm, because there are only six coefficients to keep track of at each step. So the sequence of polynomials begins like this:

x(0 1)
x(0 2) + x(0 1)
x(0 2 1) + x(0 1)

Because of the way shell command expansion works, at subsequent steps the command is substituted into itself: pressing up-arrow and then return reissues the previous command, after all, so as far as the shell is concerned the current command is the same as the previous one. So from here on we combine each polynomial with itself to get the next one, and the next in the sequence is:

x(0 2 1)(x(0 2 1) + x(0 1)) + x(0 1)
= x(0 2 1)(0 2 1) + x(0 2 1)(0 1) + x(0 1)
= x(0 1 2) + x(0 2) + x(0 1).

Add up the coefficients to find the total number of !! occurrences in the string: three, in this case.

Continuing in the same way, we get:
x() + x(1 2) + x(2 1 0) + x(0 1) + x(0 1 2)
and then:
3x() + 2x(0 2) + 2x(1 2) + 2x(2 1 0) + 4x(0 1) + 4x(0 1 2).
followed by
25x() + 24x(0 2) + 24x(1 2) + 24x(2 1 0) + 32x(0 1) + 32x(0 1 2).
and then
2545x() + 2544x(0 2) + 2544x(1 2) + 2544x(2 1 0) + 2752x(0 1) + 2752x(0 1 2).

This is interesting! A pattern seems to be developing. It looks as though our polynomials all have the form

x() + p(x() + x(0 2) + x(1 2) + x(2 1 0)) + q(x(0 1) + x(0 1 2))

for some numbers p and q, which neatly separates the substitutable part from the single-quoted part. Let’s define

α = x() + x(0 2) + x(1 2) + x(2 1 0)
β = (0 1) + x(0 1 2)

so we can write these polynomials as x() + pα + qβ. If we could show that (x() + pα)(x() + pα + qβ) + qβ has the same form, then we know the pattern will continue for ever. Indeed this works! If you multiply out α2 and αβ, then you find that α2 = 3α + 2β and αβ = α + 2β, so we get

(x() + pα)(x() + pα + qβ) + qβ
= x() + pα + qβ + pα + p2α2 + pqαβ + qβ
= x() + 2pα + 2qβ + p2(3α + 2β) + pq(α + 2β)
= x() + (3p2+2p+pq)α + (2p2+2pq+2q)β

Thus we can continue the sequence using the recurrence relation

p(n+1) = 3p(n)2 + 2p(n) + p(n)q(n)
q(n+1) = 2p(n)2 + 2p(n)q(n) + 2q(n)

with base case p(4) = 2, q(4) = 4, and the number of occurrences of !! at the nth step (for n≥4) is 4p(n)+2q(n)+1.

Now it’s easy to write a program to compute the values:

p, q = 2, 4
for i in range(4, 16):
    print"%d: %d" % (i, 4*p + 2*q + 1)
    p,q = 3*p*p + 2*p + p*q, 2*p*p + 2*p*q + 2*q

Here are the next few:

4: 17
5: 161
6: 15681
7: 159591041
8: 16866847940875521
9: 189345699699803478502456213711361
10: 23891282133830642364466494435428811321279341197321685846692090881
11: 380489448557681961464558753339232227165919630391472190371135637819686949421263314703894423845640377201176155204913969665095280641

§ 6 Responses to The algebra of Unix command substitution

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

What’s this?

You are currently reading The algebra of Unix command substitution at Bosker Blog.

meta

Follow

Get every new post delivered to your Inbox.

Join 690 other followers

%d bloggers like this: