# Give a virtual monkey a virtual keyboard...



## Lenny (Apr 19, 2009)

Take a look at the following code and tell me what it does:


```
import java.util.Random;
import java.io.*;

public class RandomChar {
	
	public static String s = ""; 

	public static void main(String[] args) throws IOException {
		
		for(int i = 0; i < 1000; i++){
			String a = nextChar();
			makeSentence(a);
		}
		
		BufferedWriter writer = new BufferedWriter(new FileWriter(
		 new File("G:\\Users\\Lenny\\Desktop\\monkey.txt")));
		writer.write("Your monkey has typed: ");
		writer.newLine(); writer.newLine();
		writer.write(s);
		writer.close();
	}
	
	public static String nextChar() {
		Random r = new Random();
		int i = r.nextInt(95) + 32;
		
		String b = "" + (char)i + "";
				
		return b;
	}
	
	public static void makeSentence(String a) {
		s += "" + a + "";
	}

}
```

No? OK then, contemplate the famous theorem: If an infinite number of monkeys typed at an infinite number of typewriters for an infinite length of time, the collected works of Shakespeare would eventually be produced.

Aye, I'm currently breeding my own super-talented virtual monkeys to see if they can crack it! They work by generating a random number between 0 and 94 (95 different numbers), adding 32 to it to get the right decimal number for an ASCII character (adding 32 takes the random number from _0 to 94_ to _32 to 126_, which are all the English ASCII characters from SPACE to ~), and converting the decimal to ASCII, which is then output to a text file.

I'm not the first (not by a long shot), but I've been a bit bored, and I need to keep programming to keep my fingers flexible.

But there you have it. The code above (not perfect - I can do so many things to it to make it cleaner code, and I've got to do something about the file writing, as if the monkey dies halfway through choosing letters, what they've already chosen won't get written) shows just how easy a thought experiment can be carried out with everyday resources. Ain't technology wonderful? 

The current record, by the way, is 24 matching characters.

---

If anyone is wondering, I've got a number of things planned:

- Add some form of timing: either ten minutes/a user defined length, or simply creating a GUI and throwing in a start button and a stop button
- Allow multiple instances of the Virtual Monkey

- If I'm going to build a GUI, I might allow users to select where the file is saved (at the moment it's saved on my desktop) and what it's called (default is "monkey.txt").

I might compile an executable and upload it for others to try out.

EDIT: The last thing my monkey typed:



> Your monkey has typed:
> 
> dCsBCw>X>,SH#,wxZNIDdc?lYt9jYklOu;j|]|:jAIn:{@3#pCI3F!|vF!Nv*ubR*DazKyQqRPHo%/YO{,_+g9p+#QZ@[/i53-(V~;as3t=1g7WBY>a!?f[;nwo3u=RjWCqLqBgyBUx_rQuSq}T/'(f/{J8'YNJ5QdI}}WZ/c+88f/G4,^KR?+5=ziU$:c#>tDLMqd l9D@x<&B&XE-R&cv"dr*Qc#O((M2{J99[!JPS\fD0b=V9:Uj.TkEGHg YZ1dVEaYk4/Z"6v9N(Yf0Ye6^q"Jo3w"P~Kn`;B]FDV{sU7g`<I#VLGYJ-Sh"Mm(vd*^+PoVGae,?8]V4[\$~mpHsrkLw;`0i}iqAo_h}Mh6p$K?\u.JQ")\|k'Et}9$H$f*R+RQ+[;5Xj*4^h<\|9C>v^TVk{:eoBgx?gJQ%kCH}KrhQE#2f}?Ry/R#PwFP=Yidx?UN+%3q5g&%Gk5lQf,<a'e)m9UdTv`'9,3do3ctak^_h<^07iqq/?bi=c:=vj ArH]!Hl}EZ^3Uz2?\g.>ZY3e)}!&~`'L<iE-RMIvawlj~lX:k@aI.BS{?%M}$R~}o+N"ksBr)Wb1=h6n@vJl'L*QxZ.h:&S)TR3Q88J9gxT5La[`0:i0H8TDgIc7@qFo6Q(G6$k*cH]KiLs:ymTM{h'@-t~% D^&8%Wh/V/_Za"![]!)[~3j;c41\Ri:]#e&i7\-QWCp)X}clAb7?;_( dv=$OsF!G>B L~n!Exaun9JuRwIjMxn&:0F0uP!)rIcZOcH^U2y)ZqW;LF<2BYOkHaW%8tZVubtjOuB>=zHGj`{F'uFs-sxgf(qD96LB'9yOY'}wWtZ;Z@e$:Fb+!?G^/~7"#}Q`1]TT8VUEThiU.Dhg8;$WpL,%R5e88d&o^Mf]"^Br\,$[MB~?3-^nf>.`Ug84[C0UQ+O8e^R,U/xZ gTI//CVVg_?/ZxmxCEw`HlP-2QJh+=[|FvEGHjnQvtsb$ l)ZF4#o



There spaces in there, honest! Most of them are where the QUOTE box has decided to split onto a new line.


----------



## Ursa major (Apr 19, 2009)

Have you tried decreasing the randomness by building in bias based on the frequency of characters in standard (or Shakespearean) English? (Okay, this wouldn't be testing the monkey theory as such, but then you don't have infinite time, so you ought to allow yourself to cut the odd corner.)



Even if you don't go this far, I think you can leave most of the symbols out: you can always punctuate it later.


But if you do want to be strict: remember that some symbols require the shift key, so the likelihood of these symbols must be a good deal less than those that don't need a shift action - and even then, this assumes a key press when another is already pressed: you really ought to include this (somehow).




Now where's that banana?


----------



## Ursa major (Apr 19, 2009)

Sorry to double post.



Why not generate a random binary string and interpret it as if it were Huffman coded**?





** - I'm almost willing to bet that the frequency of character use in Shakespeares works has already been determined and may very well be available somewhere on the Web.


----------



## Lenny (Apr 20, 2009)

I did think about the SHIFT point when planning it all out - if I wanted to be really true to the keyboard, then I'd also need the ENTER key, and things like CAPS LOCK. When I thought about it more (and here I am, thinking about it again, so get ready for my entire thought processes! ), though, I didn't know how to do it in Java.  VB allows you to compare the "KeyAscii" value of a key when it is pressed (something I used to the death with small data entry forms - if KeyAscii = 13, which is the ENTER key, then it calls the code for the "Submit" button of the form. Java, being OOP rather than event-driven, needs ActionListeners and the like to even begin to think about that, and I don't know if it's got an equivalent to KeyAscii), but to do it in Java [my first thoughts are that] I'd probably need to select the case from a list (two reasons: 1. the ASCII character for decimal value 13 is not the same as the KeyAscii representation, 2. the KeyAscii table has gaps in it - it jumps from 47, which is the HELP key, to 65, which is the A key), which would be a damn big list:

- A standard keyboard has 105 keys
- With CAPS LOCK, that effectively increases to 131 (26 upper case letters)
- With shift held down, that further increases to 153 (22 more keys with an 'upper case' character)
- Which is finally rounded out to 155 with the two keys that can be three characters (4$€ and `¬¦ - the third character is obtained by holding CTRL and ALT).

So... one SELECT CASE (105 cases) for standard, one for CAPS LOCK (26 cases), one for SHIFT (26 upper case letters and 22 characters - 48 cases), and one for CTRL+ALT (2 cases), giving a list of 181 cases.

Now if I want to be really nit-picky, I can add in the Windows commands - a monkey could easily hit ALT+F4, or CTRL+ALT+DEL, in which case whatever is typed after that isn't recorded, unless another certain key is pressed... gah!

You can start to see why I went for the easy option with the initial build!  If I was doing it in VB then I'd have gladly gone over the top, but that's because I know what I'm doing in VB. Java, on the other hand, is a horrible language.

Another question that has sprung to mind - SHIFT needs to be held down for multiple characters to be shifted up. How would I model that? Take the sticky key option and assume that it's held down until another SHIFT is randomly chosen (which would be how I control CAPS-, NUM- and SCROLL LOCK)? Or just apply it to one character?

Then I've got to think that a standard keyboard has not one, but two SHIFT buttons, which doubles the probability of it being hit. And if I'm going to do that, then I need to do the same for CTRL and ALT... and what about ENTER? It spans two rows, so should that be two buttons (giving three altogether? Though the ENTER on the Num Pad spans two rows, too... as does PLUS, and 0INS spans two columns), and then there's the FORWARD SLASH, and + and - and *. And perhaps the biggest key of the lot - SPACE... what's that one worth?

Definitely something to think about for future monkeys - it'll be worth building it as above, and then seeing what I can replace to cut down on code and make it cleaner.

---

Which I think brings me nicely to your second post about Huffman Coding.

I must admit that  I had to look it up - to say I have a module that seems to be focused solely on trees, I've yet to come across Huffman coding. I've had a look at Wikipedia, and the general idea behind it seems pretty straight-forward - assigning codes to characters based on their frequencies in a given string... yes?

I've thought of a potential problem though (which may be because I don't know enough about Huffman Coding) - anagrams.

I've got two strings (of characters, rather than binary): *My name is Lenny* and *Sanely men in my*. Surely both would end up with the same Huffman Coding, as exactly the same number of characters (including spaces) occur in both, just in a different order?

So if my monkey somehow manages to produce the string of characters: "obeto   r ttbn ooe", would a program comparing it with the Huffman Code for "to be or not to be" not say they are a correct match?

I can how binary strings would solve it easily, but that takes the fun out of it - plus without arguments, how am I supposed to learn anything new? Huffman Codes are a brand new algorithm to me, and I want to make sure I understand them completely.

---

Thanks for the ideas, by the way - I find it fascinating to observe the thought processes of others when they are solving problems, particularly with programs. Give a class of 30 a problem to solve, and you'll be given not only 30 different solutions, but 30 completely different programs! 

EDIT: Oh, about the bias - my goal is to make it as true to life as possible. Within reason, of course (you can't expect a monkey to randomly hit a different key everytime, could you? You'd expect it to sit and press the same key for a while, before maybe hitting the keyboard with its other hand, at the same time still holding down the original key!).


----------



## Ursa major (Apr 20, 2009)

Lenny said:


> A standard keyboard has 105 keys.


 
Strictly speaking, Lenny, you're talking about a _computer_ keyboard, not a typewriter's (most of which would not have anything like 105 keys).




Lenny said:


> Thanks for the ideas, by the way - I find it fascinating to observe the thought processes of others when they are solving problems, particularly with programs. Give a class of 30 a problem to solve, and you'll be given not only 30 different solutions, but 30 completely different programs!


 
You're welcome (Hooray: Someone thinks I have thought processes! ) Oh, and and _at least _30.


----------



## Lenny (Apr 20, 2009)

Ursa major said:


> Strictly speaking, Lenny, you're talking about a _computer_ keyboard, not a typewriter's (most of which would not have anything like 105 keys).



Fair point - but modern thinking calls for modern objects, surely! 

For the sake of argument, a standard typewriter would have about fifty keys, right?

- 26 letters, arranged in any order *(26)*
- 0-9* (10)*
- I know for a fact (because I've just read up on typewriters!) that the SHIFT key wasn't standard in the early days, but let's throw it in anyway *(1? 2?)*
- SPACE, obviously *(1)*
- CAPS LOCK, I'd imagine *(1)*
- And all the miscellaneous keys. Let's go for: [];'#,./\ *(9)*

Which gives me 48 or 49 keys.

I don't suppose anyone else knows better, do they?

Thinking about it, if you go back to using an RNG and converting the decimal numbers to ASCII, it doesn't matter whether you use a keyboard or a typewriter, as both are more or less equal in terms of characters they can produce. It's not until you factor in repeat keys (the numbers, for instance) and extra keys that do nothing (in this context - Esc and most of the F keys do nothing if you're typing in Notepad), that you need to change things - a 9 has twice the probability of being typed than a [.

---

Am I right at all about Huffman Coding and anagrams if strings of characters are used?


----------



## Ursa major (Apr 20, 2009)

Lenny said:


> I've thought of a potential problem though (which may be because I don't know enough about Huffman Coding) - anagrams.
> 
> I've got two strings (of characters, rather than binary): *My name is Lenny* and *Sanely men in my*. Surely both would end up with the same Huffman Coding, as exactly the same number of characters (including spaces) occur in both, just in a different order?


 
My knowledge of Huffman coding is from a Byte magazine article I read in the 1980s, so I'm no expert.

The anagram issue is not an issue, as the text is kept in its original order, whether encoded (compressed) or not.



There are two stages in encoding (compressing) text.
The relative fequency of a symbol (say a character) in the text to be compressed is used to determine the coding. (This stage is only done if you want very good compression: otherwise you can use the Huffman code for English, say, as a whole. I would have thought with large amounts of text, the difference in compression efficiancy between the bespoke and the general case would decrease.)
The text to be compressed is then coded symbol by symbol in the order in which it appears in the original. (Decompressing works the same way, symbol by symbol, maintining symbol order.)


----------



## HareBrain (Apr 20, 2009)

Just to point out that if you had infinite monkeys, you wouldn't need infinite time to produce the complete Shakespeare, just a few weeks (depending on simian typing rate).

OK, so I'm bored too ...


----------



## Ursa major (Apr 20, 2009)

Don't forget that in those days (when the statement was first made, whenever that was), they had no computers and OCRs, so the checking of the texts would take a very long time. Unless there was to be an infinite number of checkers with an infinite number of accurate copies of Shakespeare's works against which to check.


----------



## HareBrain (Apr 20, 2009)

You're right. Infinite monkeys I can accept, but infinite checkers? What was I thinking??


----------



## Lenny (Apr 20, 2009)

Talking about checkers, I found a 58000 word dictionary in a .txt file last night, and I've added a but of code to check the string against real words:


```
public static void compare() throws IOException {
		
		int counter = 0;
		String line;
		String s2 = s.toLowerCase();
		
		BufferedReader in = new BufferedReader(
                  new FileReader("G:\\Users\\Lenny\\Desktop\\dictionary.txt"));
		BufferedWriter writer = new BufferedWriter(new FileWriter(
                  new File("G:\\Users\\Lenny\\Desktop\\monkey.txt"), true));
		writer.newLine(); writer.newLine();
		writer.write("Your very intelligent monkey has written:");
		writer.newLine(); writer.newLine();
		
		while((line = in.readLine()) != null){
			if(line.length() > 2){
				if(s2.indexOf(line) != -1){
					writer.write(line + "  ");
					counter++;
				}
			}
		}
		
		writer.newLine(); writer.newLine();
		writer.write("That's a whole " + counter + " word(s)!");
		writer.close();
	}
```

So far, the best a monkey has produced is 401 words in 100,000 characters - including "emu, "jam", "jazz" and "zulu".

The latest monkey is slowly working his way through 1,000,000 characters, and has been for twenty minutes!

EDIT: He just fatally crashed. 

---

Thinking about the logistics for a physical checker for infinite monkeys at typewriters, a checker for each monkey might be excessive. At the very max I'd imagine that ∞/2 checkers are needed - one checker to two monkeys. If they check each monkey's output every ten minutes, then they'll have five minutes to skim through a few thousand characters (I can hit 150wpm on a very good day - can you imagine a monkey hitting even 100wpm? Thus you're looking at maybe 7000 characters - say an average word is six letters long, and then spaces. You could skim through 7000 characters to see if there are any real words within a minute or two, and if you find a string of them then you could stop and crack open the Shakespeare... assuming you have a good knowledge of all the plays, otherwise you'd be there for days trying to match things up) before going to the next monkey. You could even have one checker for three monkeys, thus ∞/3 checkers.


----------



## HareBrain (Apr 20, 2009)

Lenny said:


> Thinking about the logistics for a physical checker for infinite monkeys at typewriters, a checker for each monkey might be excessive. At the very max I'd imagine that ∞/2 checkers are needed


 
But since ∞/2 is not distinguishable from ∞, you've ended up with as many checkers as before. In fact since 100∞ is not distinguishable from ∞, you've also ended up with 100 checkers for each monkey!! Who's going to pay all these underemployed people?


----------



## Lenny (Apr 20, 2009)

Hmmm... it made sense when I wrote it, honest!

The reason I started to breed Virtual Monkeys is because of an idea on the BBC program "Genius" - increase income tax by one penny in a pound and use the millions of pounds to buy monkeys, typewriters, paper and carers, and do the experiment.

Why are we discussing logistics, anyway?!


----------



## HareBrain (Apr 20, 2009)

Discussing something is always better than having a war about it. Now granted, the possibility of us going to war on this issue might seem slim, but better safe than sorry - all those monkeys could cause a lot of destruction!

Plus I'm waiting for dinner to cook.


----------



## Lenny (Apr 20, 2009)

'Twould be a lot easier for a monkey to produce a declaration of war over Shakespeare's collected works (if we think about the number of monkeys as finite )!

I've just has a suggestion from a friend - give the monkey the means to [potentially] program another monkey, which also has those means, and wait for world domination Planet of the Apes styley! Muahahaha...


----------



## Dave (Apr 20, 2009)

I found this a very interesting idea, and I don't think that adding something to narrow-down the 'randomness' by aligning it to the probability of letters in the English language is "cheating" at all. You should probably read some books on cryptology to give you an idea of where to start.


----------



## Lenny (Apr 21, 2009)

If I find myself with free time between now and my exams, I'll do just that. Until then, I'll concentrate on the coding and sort out the user interface. After my exams I'll be free to do whatever I want for about 18 weeks! 

---

Good news, everyone! I've thrown together a (very rough) distribution of sorts which can be downloaded here (.RAR file): Virtual_Monkey_1.0.rar Download File on FileFront

Instructions are inside the folder.

EDIT: Just a few numbers, times and expected #words.

10 Characters   --  Almost Instant  -- 0/1 Words
100 Characters -- Almost Instant -- <5 Words
1000 Characters -- Almost Instant -- <20 Words
10000 Characters -- A few seconds -- ~55 Words
100000 Characters -- ~Thirty Seconds -- ~400 Words
1000000 Characters -- A few hours? -- I've only done a million one, and got 1,012 words (which included my longest to date - "LEGEND", fittingly).


----------



## Lenny (Apr 22, 2009)

Virtual_Monkey_v1.6.rar Download File on FileFront

An update for all you lovely peoples...

I've sat down and optimised the code until it could be optimised no more! The end result is rather pleasing:

- In v1.0, the times for increasing numbers of characters went up exponentially, with the big leap coming between 100,000 characters and 1,000,000 (from fourteen seconds to three hours!)

- In v1.6, one million characters takes 0.4s, regardless of how many times you run it. You could set the monkey off on one million characters until the cows come home, and it would still rattle them out at a rate of one million per 0.4s. Furthermore, you can string them together! One billion characters, for instance, takes exactly six minutes and forty seconds which, if you work it out, is exactly one thousand times the length of time it takes to generate one million characters.

I've done this by writing the generated characters to a text file in blocks, clearing the string within the program, and repeating the process. Writing in blocks of 25,000 characters got the time down to 22s for one million characters, which was further shortened by writing in blocks of 10,000 (six seconds), 5,000 (one second) and 250 (0.5s). The optimal number of characters to write at a time is 1-200 (0.4s), so I've set it to write in blocks of 175 characters.

Just to see what would happen, I've increased the character limit to 2,147,483,647 (which is ((2^31)-1) - an integer in Java is 32 bits, thus the highest value it can store is two to the power of thirty-one, minus one). I haven't tried it yet, but I know it will generate in just under fourteen minutes and twenty seconds.

The problem I have now is with the comparison - it takes over a minute for one million characters! If I can split it up into blocks and compare the smaller blocks, then I imagine I can speed it up, but that can wait until tomorrow.

So there you have it, folks - a phenomenal increase in speed (I was knocked back with awe when I changed my code and it generated one million characters in 22s! Try it for yourself - generate 100,000 characters with v1.0 and v1.6 and see the difference in speed).

Enjoy!

Link again: Virtual_Monkey_v1.6.rar Download File on FileFront


----------



## Lenny (Dec 4, 2009)

Ursa major said:


> Why not generate a random binary string and interpret it as if it were Huffman coded**?




In our *Algorithms & Complexity* lecture this morning, we spent about forty minutes on Huffman coding, and how and why it's used.  We're also going to have a practical on it at the start of next term.

If I get time over Christmas, I'm going to have another crack at my monkeys - since Easter I've developed a much better understanding of threads and implementing them, and in this terms Algorithms & Complexity lectures we've done a lot about algorithm running time, searching algorithms and sorting algorithms, which I should be able to apply and speed things up.


----------



## HareBrain (Dec 4, 2009)

Lenny said:


> If I get time over Christmas, I'm going to have another crack at my monkeys


 
I just wanted to see this sentence out of context. That's all.


----------



## The Judge (Dec 4, 2009)

HareBrain said:


> I just wanted to see this sentence out of context. That's all.



That's the only sentence I've understood in the whole thread...

J


----------



## Dave (Dec 4, 2009)

Shakespeare? I've just realised the origin of the Nigerian Spam emails I've been getting.


----------



## Lenny (Jan 8, 2010)

Dave said:


> I found this a very interesting idea, and I don't think that adding something to narrow-down the 'randomness' by aligning it to the probability of letters in the English language is "cheating" at all. You should probably read some books on cryptology to give you an idea of where to start.



Thought I'd give this thread another read through (trying to find if I'd written down the size of the 2 billion character file), and I came across this post.

Next term (as well as doing a bit more on Huffman coding), I've got a module called *Codes and Cryptography* next term. I assume there'll be some crossover with cryptology (which, as I understand, is a slightly different practice including cryptanalysis).

I've not even looked at my monkeys over the past few weeks and, with the second half of the academic year looming, I doubt I'll get time between the work and exams. Afterwards, yes, as long as my programming for Android doesn't swallow my time.

---

I find it interesting how concepts suggested to me when I was building the Virtual Monkey are being taught to me as part of my course.


----------



## Ursa major (Jan 8, 2010)

Lenny said:


> I find it interesting how concepts suggested to me when I was building the Virtual Monkey are being taught to me as part of my course.


 
So you've only just realised that we at the Chrons are at the _Ape_x of this sort of thing? 


Oops! Should I have typed that?


----------



## The Judge (Jan 8, 2010)

Lenny said:


> I find it interesting how concepts suggested to me when I was building the Virtual Monkey are being taught to me as part of my course.



That's because your infinite monkeys are eager for you to pass your exams so you can start building them and they can exist.  Therefore, they are coming back from the future and whispering in Ursa's and Dave's ears, putting forward these concepts -- knowing your course work will include them -- so the suggestions are made here, so that you pay attention to the lectures when they are delivered, and you get 100% in your tests. 

Or not.

J


----------

