Shadab Ahmed raised an interesting question. Open a Unix command shell, type `: '!!'`

and press return. Then type `: "!!" '!!'`

and press return. Now repeat the following a few times: press the up arrow, and press return.

The result will be something like this:

bash$ : '!!' bash$ : "!!" '!!' : ": '!!'" '!!' bash$ : ": '!!'" '!!' : ": ': ": '!!'" '!!''" '!!' bash$ : ": ': ": '!!'" '!!''" '!!' : ": ': ": '!!'" ': ": ': ": '!!'" '!!''" '!!'''" '!!'

producing ever-lengthening strings. The first one has two occurrences of `!!`

, the next has three, the next has five, etc. But these are not Fibonacci numbers! The sequence continues:

2, 3, 5, 17, 161, 15681, …

The shell substitutes each occurrence of `!!`

for the previous command, unless it is inside single quotes.

So, what is the pattern? Can we predict how many there will be after *n* steps? I checked the OEIS and the sequence wasn’t there, so I knew it wasn’t something straightforward. Actually computing the strings becomes intractable after a couple more terms, because the sequence grows so rapidly. But there is a way to calculate the sequence.

We need to somehow keep track of whether each occurrence of `!!`

is inside single quotes, so we can tell whether it is going to be expanded. Not only that! We need to be able to track how this changes after a substitution has been performed. The solution is to use the symmetric group S_{3}, the group of all permutations of three things. The three things in this case are quotation states: 0=unquoted, 1=single-quoted, and 2=double-quoted. For example, the string `'"'`

has the effect of swapping the single- and double-quoted states, so we would represent it as the permutation (1 2). The quote characters `'`

and `"`

correspond to the swaps (0 1) and (0 2), which generate S_{3}, so all six permutations can be represented by some string. Given a string, we can work out its permutation by doing the appropriate swap for each quote character and composing them: for example, the string `"'"'`

is (0 2)(0 1)(0 2)(0 1) = (0 1 2).

Now we can represent a string containing `!!`

codes as a polynomial with exponents from S_{3}. (We’re using the group semiring , for the mathematicians in the house.) The coefficients represent the number of occurrences of `!!`

in each quotation state. We’ll assume that the quotes in the string are balanced eventually, i.e. that the entire string gives the identity permutation.

Let’s look at some examples. The string `: '!!'`

has one occurrence of `!!`

, and the string preceding that occurrence is `: '`

which has type (0 1), so the polynomial is just *x*^{(0 1)}. The next string in the sequence is `: "!!" '!!'`

, which has one occurrence of type (0 1) and one of type (0 2), so the polynomial is *x*^{(0 1) }+ *x*^{(0 2)}.

The next one is `: ": '!!'" '!!'`

, which is *x*^{(0 2 1)} + *x*^{(0 1)}.

Now, what happens when substitution is performed? Remember that the single-quoted occurrences of `!!`

are not substituted, so we need to separate the polynomial into the part that undergoes substitution and the part that does not. The part that does not is the *x*^{(0 1)} and *x*^{(0 1 2)} terms, and the part that does is the other four terms. Let’s say we have a string represented by *f*(*x*), and we are going to replace its non-single-quoted occurrences of `!!`

by the string represented by *g*(*x*). First we separate *f* into *f*_{1}(*x*) + *f*_{2}(*x*), where *f*_{1} is the substitutable part and *f*_{2} is the single-quoted part. Then the result of the substitution is represented by *f*_{1}(*x*) *g*(*x*) + *f*_{2}(*x*).

We can check this against the example. We are going to substitute `: '!!'`

into `: "!!" '!!'`

, so we have *f*_{1}(*x*) = *x*^{(0 2)}, *f*_{2}(*x*) = *x*^{(0 1)}, and *g*(*x*) = *x*^{(0 1)}. The result is *f*_{1}(*x*) *g*(*x*) + *f*_{2}(*x*) = (*x*^{(0 2)})(*x*^{(0 1)}) + *x*^{(0 1)} = *x*^{(0 2)(0 1)} + *x*^{(0 1)} = *x*^{(0 2 1)} + *x*^{(0 1)}, which is the same polynomial we computed directly. Phew!

This already gives a reasonably efficient algorithm, because there are only six coefficients to keep track of at each step. So the sequence of polynomials begins like this:

*x*^{(0 1)}

*x*^{(0 2)} + *x*^{(0 1)}

*x*^{(0 2 1)} + *x*^{(0 1)}

Because of the way shell command expansion works, at subsequent steps the command is substituted into itself: pressing up-arrow and then return reissues the previous command, after all, so as far as the shell is concerned the current command is the same as the previous one. So from here on we combine each polynomial with itself to get the next one, and the next in the sequence is:

*x*^{(0 2 1)}(*x*^{(0 2 1)} + *x*^{(0 1)}) + *x*^{(0 1)}

= *x*^{(0 2 1)(0 2 1)} + *x*^{(0 2 1)(0 1)} + *x*^{(0 1)}

= *x*^{(0 1 2)} + *x*^{(0 2)} + *x*^{(0 1)}.

Add up the coefficients to find the total number of `!!`

occurrences in the string: three, in this case.

Continuing in the same way, we get:

*x*^{()} + *x*^{(1 2)} + *x*^{(2 1 0)} + *x*^{(0 1)} + *x*^{(0 1 2)}

and then:

3*x*^{()} + 2*x*^{(0 2)} + 2*x*^{(1 2)} + 2*x*^{(2 1 0)} + 4*x*^{(0 1)} + 4*x*^{(0 1 2)}.

followed by

25*x*^{()} + 24*x*^{(0 2)} + 24*x*^{(1 2)} + 24*x*^{(2 1 0)} + 32*x*^{(0 1)} + 32*x*^{(0 1 2)}.

and then

2545*x*^{()} + 2544*x*^{(0 2)} + 2544*x*^{(1 2)} + 2544*x*^{(2 1 0)} + 2752*x*^{(0 1)} + 2752*x*^{(0 1 2)}.

This is interesting! A pattern seems to be developing. It looks as though our polynomials all have the form

*x*^{()} + p(*x*^{()} + *x*^{(0 2)} + *x*^{(1 2)} + *x*^{(2 1 0)}) + q(*x*^{(0 1)} + *x*^{(0 1 2)})

for some numbers p and q, which neatly separates the substitutable part from the single-quoted part. Let’s define

α = *x*^{()} + *x*^{(0 2)} + *x*^{(1 2)} + *x*^{(2 1 0)}

β = ^{(0 1)} + *x*^{(0 1 2)}

so we can write these polynomials as *x*^{()} + pα + qβ. If we could show that (*x*^{()} + pα)(*x*^{()} + pα + qβ) + qβ has the same form, then we know the pattern will continue for ever. Indeed this works! If you multiply out α^{2} and αβ, then you find that α^{2} = 3α + 2β and αβ = α + 2β, so we get

(*x*^{()} + pα)(*x*^{()} + pα + qβ) + qβ

= *x*^{()} + pα + qβ + pα + p^{2}α^{2} + pqαβ + qβ

= *x*^{()} + 2pα + 2qβ + p^{2}(3α + 2β) + pq(α + 2β)

= *x*^{()} + (3p^{2}+2p+pq)α + (2p^{2}+2pq+2q)β

Thus we can continue the sequence using the recurrence relation

p(n+1) = 3p(n)^{2} + 2p(n) + p(n)q(n)

q(n+1) = 2p(n)^{2} + 2p(n)q(n) + 2q(n)

with base case p(4) = 2, q(4) = 4, and the number of occurrences of `!!`

at the nth step (for n≥4) is 4p(n)+2q(n)+1.

Now it’s easy to write a program to compute the values:

p, q = 2, 4 for i in range(4, 16): print"%d: %d" % (i, 4*p + 2*q + 1) p,q = 3*p*p + 2*p + p*q, 2*p*p + 2*p*q + 2*q

Here are the next few:

4: 17 5: 161 6: 15681 7: 159591041 8: 16866847940875521 9: 189345699699803478502456213711361 10: 23891282133830642364466494435428811321279341197321685846692090881 11: 380489448557681961464558753339232227165919630391472190371135637819686949421263314703894423845640377201176155204913969665095280641

I submitted this sequence to the OEIS. Here is the draft that waits for review:

https://oeis.org/draft/A228162

The OEIS entry has been approved!

https://oeis.org/A228162

On OEIS, Joerg Arndt asks whether we could add a formula.

https://oeis.org/history?seq=A228162

Robin, would you mind engaging in this process? (Just register for an account on OEIS, then you can propose changes to every sequence entry including this one.)

Also, there are interesting related sequences, see the “CROSSREFS” section on OEIS:

https://oeis.org/A228162

Okay, I’ve applied for an account.

Pingback: Growth of recursive string substitution | Complex Projective 4-Space

THanks! I liked it so much, I looked around for something I could do on the topic and decided to write a minimal bash shell simulator in Python. Just enough to generate successive terms.

It works, but not very efficiently. The code is on my blog here: http://paddy3118.blogspot.co.uk/2013/08/recursive-unix-bang-bang-in-python.html