What is Gibberish?

The gibberish (nonsense text) presented here is generated by a remarkably simple computer program. Given some sample text, say Shakespeare, as input, the computer generates output which is random, but which has the same statistical distribution of characters or combinations of characters. (A character may be a letter, a digit, a space, a punctuation mark, etc.)

Level 1 gibberish: The output has the same distribution of single characters as the input. For example, the probability of seeing a character like "e" or "z" or "." will be approximately the same in the output as in the input.
Level 2 gibberish: The output has the same distribution of character pairs as the input. For example, the probability of seeing a pair like "th" or "te" or "t." will be approximately the same in the output as in the input.
Level n gibberish: The output has the same distribution of groups of n characters (n-tuples) as the input.
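
To make "distribution of groups of n characters" concrete, here is a minimal sketch that counts every n-character group in a sample string. The helper name ngramCounts and the sample sentence are my own choices for illustration and are not taken from GibGen.js. It also treats a "character" as one JavaScript string unit, which is an assumption that holds for plain ASCII text.

```javascript
// Count how often each group of n consecutive characters occurs in text.
// ngramCounts is a hypothetical helper name, not a function from GibGen.js.
function ngramCounts(text, n) {
  const counts = new Map();
  for (let i = 0; i + n <= text.length; i++) {
    const gram = text.slice(i, i + n);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

const sample = "the theme of the thesis";
const pairCounts = ngramCounts(sample, 2);
const totalPairs = sample.length - 1;

console.log(pairCounts.get("th"));              // 4
console.log(pairCounts.get("th") / totalPairs); // relative frequency of "th"
```

Level 2 gibberish generated from this sample should show "th" at roughly the same relative frequency, given enough output.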

The algorithm is a letter-based Markov text generator. Level n gibberish is a Markov chain of order n-1.
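
One way to see the Markov chain is to write out its transition table: for every context of n-1 characters, how often each character follows it. The sketch below is only an illustration of that idea. The name transitionTable is hypothetical, and my own implementation does not build such a table.

```javascript
// Build the transition table of an order-(n-1) Markov chain:
// context of n-1 characters -> Map of (next character -> count).
// transitionTable is a hypothetical name used only for this sketch.
function transitionTable(text, n) {
  const table = new Map();
  const order = n - 1; // context length equals the Markov order
  for (let i = 0; i + order < text.length; i++) {
    const context = text.slice(i, i + order);
    const next = text[i + order];
    if (!table.has(context)) table.set(context, new Map());
    const row = table.get(context);
    row.set(next, (row.get(next) || 0) + 1);
  }
  return table;
}

const level3 = transitionTable("banana", 3);
console.log(level3.get("an").get("a")); // 2: "an" is followed by "a" twice
console.log(level3.get("na").get("n")); // 1: "na" is followed by "n" once
```

At level 1 the context is the empty string, so the table has a single row holding the plain character counts.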

It is amazing how well this simple algorithm works, even at very low levels. For example, at level 2 you can easily recognize different languages. At level 3 you can recognize the styles of different authors.

For even more fun, the gibberish generator can easily blend two different languages or two different authors. If the input is simply the text from author A followed by the text from author B, the output will be a smooth blend of the two.
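
The blend comes from contexts that both texts share. A small sketch, assuming two made-up snippets stand in for the two authors and that a single space is placed between them when they are joined (the helper name followers is hypothetical):

```javascript
// List the characters that follow each occurrence of target in text.
// followers is a hypothetical helper; target must be non-empty.
function followers(text, target) {
  const result = [];
  let pos = text.indexOf(target);
  while (pos !== -1) {
    const next = text[pos + target.length];
    if (next !== undefined) result.push(next);
    pos = text.indexOf(target, pos + 1);
  }
  return result;
}

const authorA = "the thin";  // made-up stand-in for author A
const authorB = "thus thaw"; // made-up stand-in for author B

console.log(followers(authorA, "th"));                 // [ 'e', 'i' ]
console.log(followers(authorB, "th"));                 // [ 'u', 'a' ]
console.log(followers(authorA + " " + authorB, "th")); // [ 'e', 'i', 'u', 'a' ]
```

After the combined input is written, every time the output ends in "th" the next character may come from either author, which is where the generator can drift from one source into the other.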

To see some samples of plain and blended gibberish, go to Gibberish Samples.

To generate your own gibberish, go to Gibberish Generator.

You can see my source code in GibGen.html and GibGen.js (JavaScript). My implementation of the algorithm is simple (and inefficient, since it searches the whole input text for every character it writes). To generate level n gibberish, first pick a random string of n characters from the input text and copy it to the output. Then repeat the following steps. Set the target string to the last n-1 characters written. Find all occurrences of the target string in the input text. Randomly select one of these matching positions. Take the character that immediately follows the target string at that position in the input, and append it to the output.
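
The steps above can be sketched in a few lines of JavaScript. This is a fresh illustration written for this page, not the code in GibGen.js. The function name gibberish, the length parameter, and the optional random parameter are all assumptions of the sketch. It also assumes that generation should simply stop early if the target occurs only at the very end of the input with nothing after it, a case the description above does not cover.

```javascript
// Generate about `length` characters of level-n gibberish from `input`.
// gibberish, length and random are hypothetical names chosen for this sketch.
function gibberish(input, n, length, random = Math.random) {
  if (n < 1 || input.length < n) {
    throw new Error("input must contain at least n characters");
  }
  // Seed the output with a random run of n characters from the input.
  const start = Math.floor(random() * (input.length - n + 1));
  let output = input.slice(start, start + n);

  while (output.length < length) {
    // Target: the last n-1 characters written (the empty string at level 1).
    const target = output.slice(output.length - (n - 1));

    // Find every position where the target occurs with a character after it.
    const matches = [];
    for (let i = 0; i + target.length < input.length; i++) {
      if (input.startsWith(target, i)) matches.push(i);
    }
    // Assumption: stop early at a dead end (target only at the end of input).
    if (matches.length === 0) break;

    // Pick one match at random and copy the character that follows it.
    const pick = matches[Math.floor(random() * matches.length)];
    output += input[pick + target.length];
  }
  return output;
}

const text = "to be or not to be that is the question";
console.log(gibberish(text, 3, 60));
```

Passing a fixed function as random makes a run repeatable, which is handy for checking the logic. Like the approach described above, the sketch rescans the input for every character, so the work grows with the input length times the output length.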

A final thought: Is the human brain simply a level 100 gibberish generator?

References: The program Mark V. Shaney (a pun on "Markov chain") by Bruce Ellis, Rob Pike, and Don P. Mitchell, publicized in the June 1989 Scientific American "Computer Recreations" column titled "A potpourri of programmed prose and prosody" by A. K. Dewdney.