created: | 2015-01-26 |
---|---|
updated: | 2015-01-26 |
…yields falsehood when preceded by its quotation.
—Quine’s paradox
A quine is a non-empty computer program which does not accept any input and outputs its own source code. Usually it consists of three parts: A data section storing its source code, a program that outputs a quoted version of that section and another one that outputs the section literally (not necessarily in that order).
This post first presents a naive quine written in brainfuck and all the steps to get to that point. Then a simple yet effective optimization that reduces the quine’s size is developed. It is highly recommended to use a brainfuck interpreter to actually run the code presented below. The programs are portable and any brainfuck interpreter (online or offline) should execute them correctly.
Brainfuck is an esoteric language that lacks syntactic sugar one might expect from modern programming languages and is thus not suited for real-world programms. It supports eight instructions that operate on an infinite amount of memory cells: + and - increment or decrement the current cell’s value, < and > move the cell pointer left or right, , and . read or write a character to or from the current cell. And finally [ and ] jump to their counterpart if the current cell’s value is not zero (while loop). Overall brainfuck’s resemblance to the Turing machine is probably not completely unintentional.
The biggest question when writing a quine, especially in brainfuck, is: Where do we start? Let’s warm up by writing a program that prints a + character:
1 +++++++++++++++++++++++++++++++++++++++++++ 2 .
Since the ASCII value of + is 43 we add 43 to the first memory cell and output its value. Since we’ll be dealing with a lot of programs in this blog post we are going to give every program a unique name. This piece of code is called program A.
Now what does a program B, whose output is program A, look like? In fact it is quite similar:
1 +++++++++++++++++++++++++++++++++++++++++++ 2 ........................................... 3 +++ 4 .
The first line again stores a + sign in the first memory cell. That plus is printed exactly 43 times, the number of increments required to generate an ASCII plus character. Then we increment by three, turning the + into a dot and print it once. And voilà: Program B’s output is A.
Instead of writing program B for every program A ourselves we create another program C that does the job for us. It reads arbitrary input and outputs a program whose output is the original input, just like B’s output was A:
1 +++++++++++++++++++++++++++++++++++++++++++ 2 >++++++++++++++++++++++++++++++++++++++++++++++ 3 >++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 >, 5 [ 6 <.> 7 [ - <<<.>>> ] 8 <<.>> 9 , 10 ]
Lines one to three prepare the first three cells with the ASCII chars +, . and >, which are required to store and output any input character α. For each α in cell four we print one > from cell three, then α plus characters from cell one and a final dot from cell two. Fed with program A this program’s output is the following program BC (without line breaks, which have been added for readability):
1 >+++++++++++++++++++++++++++++++++++++++++++. 2 >+++++++++++++++++++++++++++++++++++++++++++. 3 >+++++++++++++++++++++++++++++++++++++++++++. 4 >+++++++++++++++++++++++++++++++++++++++++++. 5 >+++++++++++++++++++++++++++++++++++++++++++. 6 >+++++++++++++++++++++++++++++++++++++++++++. 7 >+++++++++++++++++++++++++++++++++++++++++++. 8 >+++++++++++++++++++++++++++++++++++++++++++. 9 >+++++++++++++++++++++++++++++++++++++++++++. 10 >+++++++++++++++++++++++++++++++++++++++++++. 11 >+++++++++++++++++++++++++++++++++++++++++++. 12 >+++++++++++++++++++++++++++++++++++++++++++. 13 >+++++++++++++++++++++++++++++++++++++++++++. 14 >+++++++++++++++++++++++++++++++++++++++++++. 15 >+++++++++++++++++++++++++++++++++++++++++++. 16 >+++++++++++++++++++++++++++++++++++++++++++. 17 >+++++++++++++++++++++++++++++++++++++++++++. 18 >+++++++++++++++++++++++++++++++++++++++++++. 19 >+++++++++++++++++++++++++++++++++++++++++++. 20 >+++++++++++++++++++++++++++++++++++++++++++. 21 >+++++++++++++++++++++++++++++++++++++++++++. 22 >+++++++++++++++++++++++++++++++++++++++++++. 23 >+++++++++++++++++++++++++++++++++++++++++++. 24 >+++++++++++++++++++++++++++++++++++++++++++. 25 >+++++++++++++++++++++++++++++++++++++++++++. 26 >+++++++++++++++++++++++++++++++++++++++++++. 27 >+++++++++++++++++++++++++++++++++++++++++++. 28 >+++++++++++++++++++++++++++++++++++++++++++. 29 >+++++++++++++++++++++++++++++++++++++++++++. 30 >+++++++++++++++++++++++++++++++++++++++++++. 31 >+++++++++++++++++++++++++++++++++++++++++++. 32 >+++++++++++++++++++++++++++++++++++++++++++. 33 >+++++++++++++++++++++++++++++++++++++++++++. 34 >+++++++++++++++++++++++++++++++++++++++++++. 35 >+++++++++++++++++++++++++++++++++++++++++++. 36 >+++++++++++++++++++++++++++++++++++++++++++. 37 >+++++++++++++++++++++++++++++++++++++++++++. 38 >+++++++++++++++++++++++++++++++++++++++++++. 39 >+++++++++++++++++++++++++++++++++++++++++++. 40 >+++++++++++++++++++++++++++++++++++++++++++. 41 >+++++++++++++++++++++++++++++++++++++++++++. 42 >+++++++++++++++++++++++++++++++++++++++++++. 43 >+++++++++++++++++++++++++++++++++++++++++++. 44 >++++++++++. 45 >++++++++++++++++++++++++++++++++++++++++++++++. 46 >++++++++++.
It might not look as pretty as the hand-written program B, but it does the job. Actually, it does the job very well, since it not just outputs program BC, but also writes it to memory cells. We could remove all the dots (which output the current cell’s value) and append this piece of code instead, too:
47 [<] 48 >[.>]
These additional two lines first rewind the cell pointer to the beginning of the program – assuming there were no zeroes in program C’s input – and then output program BC.
By definition a quine cannot have any input. So we have to modify program C to read its input from memory cells instead. This program is called program QC:
1 [<<]>> 2 [ 3 <++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.. 4 ------------------- 5 > [ <. >- >+< ] 6 > [ <+ >- ] 7 > 8 ]
Looking at the original program C again the first three cells stored ASCII characters we used to generate the output. Using a fixed location for them here, say before program A, is rather difficult since we’d need a position counter, which – again – needs to be stored somewhere. Instead we expect the input to contain “holes”, unused memory cells, that can be used as temporary storage while traversing the input. The string “ABC” then resides in six instead of three memory cells whose values are 0 A 0 B 0 C.
Line one assumes the cell pointer is at the last input cell and rewinds it to the first one, two at a time, since the temporary cells are set to zero. Then, in line three, for each input cell \(i\) the character > is written to \(i-1\) and printed twice. Afterwards it is transformed into plus character by subtracting 19 from 62. This character is printed as long as cell \(i\), which is decremented by one each time, is not zero (line five). We’ll need the value in cell \(i\) later and thus write a temporary copy to \(i+1\), which is moved back to cell \(i\) in line six.
So far we wrote:
Now we concatenate program QC and the (slightly modified) lines 47 and 48 of program BC and call it program Qtail:
1 [<<]>> 2 [ 3 <++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.. 4 ------------------- 5 > [ <. >- >+< ] 6 > [ <+ >- ] 7 > 8 ] 9 10 <<[<<]>> 11 [.>>]
Finally we bootstrap the quine by feeding program Qtail into a modified program C. Cbootstrap prints two > instead of one (second dot added to line six) and does not print the trailing dot (line eight removed):
1 +++++++++++++++++++++++++++++++++++++++++++ 2 >++++++++++++++++++++++++++++++++++++++++++++++ 3 >++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 >, 5 [ 6 <..> 7 [ - <<<.>>> ] 8 9 , 10 ]
Prepending the output to Qtail yields a valid quine: It expects no input and reproduces itself.
With 6711 bytes (without whitespace) the quine is quite big though. Heading back to program BC (#) we can see why: Storing the number \(N\) to a cell, for example the ASCII value of +, requires \(N\) plus characters. Since we’re using just the characters +-<>[]. – a comma is not required for a quine – instead of using ASCII we can map these characters to the numbers 1…7. We can keep the first part of program Qtail (#) (which equals QC, #) as is. Line eleven however needs to be replaced by program Qdecode, which maps 1…7 back to ASCII characters and outputs them:
1 [ 2 - 3 [ - <++> 4 [ - <+++++++++++++++> 5 [ - <++> 6 [ - <+++++++++++++++++++++++++++++> 7 [ - <++> 8 [ - <-----------------------------------------------> ] 9 ] 10 ] 11 ] 12 ] 13 ] 14 <.> >> 15 ]
Again this program loops over all cells. The loop body is equivalent to a switch statement in C, implemented with nested while loops. If the cell value minus one is zero (i.e. the cell’s value was one; line two), reuse cell \(i-1\)’s value. Recall that line four of program QC (#) stored a plus character in \(i-1\). Thus a value of one maps to +. If that was not the case enter the second loop. Again we subtract one from cell \(i\) and change cell \(i-1\)’s value to the ASCII value -. And so on… This way the following mapping is established:
Value | ASCII |
---|---|
1 | + |
2 | - |
3 | < |
4 | > |
5 | [ |
6 | ] |
7 | . |
This particular encoding has not been chosen randomly: The plus and minus characters occur most frequently in the quine we are currently writing, followed by greater/smaller sign and square brackets. The dot is used only a few times.
Back to the code above, line 14 finally prints the resulting ASCII character and advances to the next cell.
Again, we have to modify the bootstrapping program as well. We start with Cbootstrap (#) and get Cbootstrap,encode:
1 +++++++++++++++++++++++++++++++++++++++++++ 2 >++++++++++++++++++++++++++++++++++++++++++++++ 3 >++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 >, 5 [ 6 ------------------------------------------- >+< 7 [ -- >+< 8 [ - >+++++< 9 [ -------------- >----< 10 [ -- >+< 11 [ ----------------------------- >+< 12 [ -- >+< 13 [ - >[-]< ] 14 ] 15 ] 16 ] 17 ] 18 ] 19 ] 20 >[ 21 <<..>> 22 [ - <<<<.>>>> ] 23 ]< 24 , 25 ]
With a similar while chain (lines six to 19) we can decode ASCII into the range one to seven for every character read (outer loop). Brainfuck’s instruction characters are decoded ordered by their ASCII value (+, -, then ., and so on…). The encoded value (see table above) is stored in cell five. Line 13 is the switch statement’s fall through for characters that cannot be encoded (characters that is not part of the brainfuck instruction set). Both cells, four and five, are set to zero to break out of the chained loops. Then, in line 20 to 23, two > (line 21), followed by cell five’s value times + (line 22) are printed if cell five is not zero (line 20).
Again, we concatenate QC (#) and Qdecode (#) and call the result Qtail2:
1 [<<]>> 2 [ 3 <++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.. 4 ------------------- 5 > [ <. >- >+< ] 6 > [ <+ >- ] 7 > 8 ] 9 10 <<[<<]>> 11 [ 12 - 13 [ - <++> 14 [ - <+++++++++++++++> 15 [ - <++> 16 [ - <+++++++++++++++++++++++++++++> 17 [ - <++> 18 [ - <-----------------------------------------------> ] 19 ] 20 ] 21 ] 22 ] 23 ] 24 <.> >> 25 ]
Then we use Qtail2 as input for Qbootstrap,encode (#). The newlines have been inserted for readability and are not part of the actual output:
1 >>+++++>>+++>>+++>>++++++>>++++>>++++>>+++++>>+++>>+>>+>>+>>+>>+>>+>>+>>+ 2 >>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+ 3 >>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+ 4 >>+>>+>>+>>+>>+>>+>>+++++++>>+++++++>>++>>++>>++>>++>>++>>++>>++>>++>>++ 5 >>++>>++>>++>>++>>++>>++>>++>>++>>++>>++>>++++>>+++++>>+++>>+++++++>>++++ 6 >>++>>++++>>+>>+++>>++++++>>++++>>+++++>>+++>>+>>++++>>++>>++++++>>++++ 7 >>++++++>>+++>>+++>>+++++>>+++>>+++>>++++++>>++++>>++++>>+++++>>++>>+++++ 8 >>++>>+++>>+>>+>>++++>>+++++>>++>>+++>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+ 9 >>+>>+>>+>>++++>>+++++>>++>>+++>>+>>+>>++++>>+++++>>++>>+++>>+>>+>>+>>+ 10 >>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+>>+ 11 >>+>>++++>>+++++>>++>>+++>>+>>+>>++++>>+++++>>++>>+++>>++>>++>>++>>++>>++ 12 >>++>>++>>++>>++>>++>>++>>++>>++>>++>>++>>++>>++>>++>>++>>++>>++>>++>>++ 13 >>++>>++>>++>>++>>++>>++>>++>>++>>++>>++>>++>>++>>++>>++>>++>>++>>++>>++ 14 >>++>>++>>++>>++>>++>>++>>++++>>++++++>>++++++>>++++++>>++++++>>++++++ 15 >>++++++>>+++>>+++++++>>++++>>++++>>++++>>++++++
Together with Qtail2 this is a much smaller quine, just 1310 bytes large. That’s five times smaller than the naive approach. While there are numerous ways to further reduce the quine’s size this is left as an exercise for the reader.
Character | ASCII value |
---|---|
+ | 43 |
, | 44 |
- | 45 |
. | 46 |
< | 60 |
> | 62 |
[ | 91 |
] | 93 |