Create a KTouch lecture from a list of words

Sunday, September 12, 2010

Since I didn't find any lecture that I liked for KTouch, at least not one that allowed me to simply practice typing with a large number of random words, I thought it would be useful to create one based on random words taken from one of the word lists usually located in /usr/share/dict, although this very same procedure can be used with any list of words. It didn't take too long to do, but it was interesting nevertheless.

At the end I include the lecture that results from this procedure, as well as a lecture based on the words from the book The Adventures of Sherlock Holmes.

The structure of a KTouch lecture
Creating the KTouch lecture
A brief explanation of some of the parameters used here
Download the KTouch lectures
A short footnote

The structure of a KTouch lecture

First of all, the lectures in KTouch are just XML files. These XML files contain text divided into lines, and the lines are grouped into levels. This is how the lectures in KTouch are organized:

<KTouchLecture>
  <Title>The title of the lecture</Title>
  <Comment>The comment of the lecture</Comment>
  <Levels>
    <Level>
      <LevelComment>A comment for the level</LevelComment>
      <NewCharacters>The new characters of the level</NewCharacters>
      <Line>A line of text that will appear in KTouch</Line>
      <Line>Another line of text that will appear in KTouch</Line>
      <Line>One last line that will appear in KTouch</Line>
    </Level>
  </Levels>
</KTouchLecture>

Now that we know what we want to achieve, let's see how the list of words is organized in the dictionary file by looking at its first lines (in this example we are going to use the american-english file):
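A minimal way to print those first lines, assuming the word list is installed at the path used in this article (the file holds one word per line):

```shell
# Path used throughout this article; adjust it if your word list lives elsewhere.
dict=/usr/share/dict/american-english

# Print the first ten entries, or a short message if the file is missing.
if [ -r "$dict" ]; then
    head -n 10 "$dict"
else
    echo "word list not found: $dict" >&2
fi
```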


Creating the KTouch lecture

What we are going to do to obtain a usable lecture for KTouch is the following:

  1. Shuffle the list of words.
  2. Remove the line breaks to obtain a long line of words.
  3. Split this long line into smaller ones containing several words each.
  4. Add the required XML tags.

I suggest creating a temporary folder to perform this procedure; otherwise the files will be created in your current folder. The first step is to randomize the order in which the words are presented, and save the result in a new file:

sort -R /usr/share/dict/american-english > shuffled-words.txt
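On systems with GNU coreutils, shuf is an alternative for this step. GNU sort -R orders lines by a random hash, so identical lines end up next to each other, while shuf writes a plain random permutation. For a list without duplicates both give a shuffled list. A small sketch on a made-up sample list (the file names are only for illustration):

```shell
# Throwaway folder with a tiny sample list (made-up words, one per line).
work=$(mktemp -d)
printf '%s\n' apple banana cherry damson elder > "$work/sample-words.txt"

# shuf writes the input lines in a random order.
shuf "$work/sample-words.txt" > "$work/sample-shuffled.txt"

# Sorting both files again shows that they hold the same words.
sort "$work/sample-words.txt" > "$work/a.txt"
sort "$work/sample-shuffled.txt" > "$work/b.txt"
if cmp -s "$work/a.txt" "$work/b.txt"; then
    echo "same words"
fi
```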

Now we are going to turn the file into one single line. To do this we use tr to replace every line break in the file with a space:

tr '\n' ' ' < shuffled-words.txt > one-line.txt
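To see what this step does, here is the same tr call on three made-up words. The last line break also becomes a space, so the result ends in a space and has no final newline:

```shell
# Join three sample lines into one, exactly as the command above does.
joined=$(printf 'one\ntwo\nthree\n' | tr '\n' ' ')

# The brackets make the trailing space visible.
printf '[%s]\n' "$joined"
# prints: [one two three ]
```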

After this, we need to split this long line into several smaller ones without breaking the words. In my example I will set a maximum length of 80 characters per line:

fold -sw 80 one-line.txt > folded.txt

The default width of fold is already 80 characters; I specify it here so that you can change it to any length you want.

Once this is done, let's start generating the XML file. I will do this as one-liners for ease of copy-paste. We start by printing the head of the file:

echo -e "<KTouchLecture>\n  <Title>The whole list of English words</Title>\n  <Comment>The whole list of English words</Comment>\n  <FontSuggestions>Monospace</FontSuggestions>\n  <Levels>\n    <Level>\n      <LevelComment>This shows the lines of words 1 to 20</LevelComment>\n      <NewCharacters>Lines: 1-20</NewCharacters>" > final.xml

Then, we are going to use awk to read the file contents and add the necessary markup to it:

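awk '{
  # Before lines 21, 41, 61 and so on, close the current level and open the next one
  if (NR > 1 && NR % 20 == 1)
    print "    </Level>\n    <Level>\n      <LevelComment>This shows the lines of words " NR " to " (NR + 19) "</LevelComment>\n      <NewCharacters>Lines: " NR "-" (NR + 19) "</NewCharacters>"
  # Wrap the current line of words in a Line element
  print "      <Line>" $0 "</Line>"
}' folded.txt >> final.xml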
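The awk step copies every line into the XML file unchanged. If you use this procedure with a word list that might contain the characters &, < or >, they would have to be escaped first, or the result will not be valid XML. A minimal sketch, where escape_xml is a hypothetical helper name and not part of the original procedure:

```shell
# escape_xml: hypothetical filter that escapes &, < and > so they are safe
# inside XML text. The ampersand must be handled first, otherwise the
# entities added for < and > would be escaped a second time.
escape_xml() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

printf 'R&D <fast>\n' | escape_xml
# prints: R&amp;D &lt;fast&gt;
```

It could be applied to folded.txt before the awk step, for example with escape_xml < folded.txt > escaped.txt, and then awk would read escaped.txt instead.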

Then we add the foot of the file:

echo -e "    </Level>\n  </Levels>\n</KTouchLecture>" >> final.xml

Finally, we delete the intermediate files that we created and we are done. If you are doing this in the temporary folder that I suggested creating at the beginning, you may use *.txt instead:

rm folded.txt one-line.txt shuffled-words.txt

The end result will be something like this, only much larger:

<KTouchLecture>
  <Title>The whole list of English words</Title>
  <Comment>The whole list of English words</Comment>
  <FontSuggestions>Monospace</FontSuggestions>
  <Levels>
    <Level>
      <LevelComment>This shows the lines of words 1 to 20</LevelComment>
      <NewCharacters>Lines: 1-20</NewCharacters>
      <Line>charlatans outlook's excuses slangier spoors Tuvalu Southwest burrito </Line>
      <Line>ligature's Fredericton trousseaus Horatio Claude's engining tensed talon </Line>
    </Level>
  </Levels>
</KTouchLecture>
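The steps above can also be collected into one function. This is only a sketch under a few assumptions: make_lecture and its arguments are hypothetical names, it needs a sort that supports -R (GNU sort does), it uses mktemp -d for the temporary folder, and it uses printf instead of echo -e to write the fixed parts of the file.

```shell
# make_lecture WORDLIST OUTPUT [WIDTH] [LINES_PER_LEVEL]
# Hypothetical helper that chains the steps from this article.
make_lecture() {
    words=$1
    out=$2
    width=${3:-80}
    per_level=${4:-20}

    # Work in a throwaway folder so no intermediate files are left behind.
    tmp=$(mktemp -d) || return 1

    # Steps 1 to 3: shuffle, join into one line, fold at spaces.
    sort -R "$words" | tr '\n' ' ' | fold -s -w "$width" > "$tmp/folded.txt"

    # Step 4: wrap the folded lines in the lecture markup.
    {
        printf '<KTouchLecture>\n'
        printf '  <Title>The whole list of English words</Title>\n'
        printf '  <Comment>The whole list of English words</Comment>\n'
        printf '  <FontSuggestions>Monospace</FontSuggestions>\n'
        printf '  <Levels>\n'
        awk -v n="$per_level" '
            # Open a new level before lines 1, n+1, 2n+1, ...
            NR % n == 1 || n == 1 {
                if (NR > 1) print "    </Level>"
                print "    <Level>"
                print "      <LevelComment>This shows the lines of words " NR " to " (NR + n - 1) "</LevelComment>"
                print "      <NewCharacters>Lines: " NR "-" (NR + n - 1) "</NewCharacters>"
            }
            { print "      <Line>" $0 "</Line>" }
            # Close the last level if any line was read.
            END { if (NR > 0) print "    </Level>" }
        ' "$tmp/folded.txt"
        printf '  </Levels>\n</KTouchLecture>\n'
    } > "$out"

    rm -r "$tmp"
}
```

With the paths from this article, the call would be make_lecture /usr/share/dict/american-english final.xml.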

A brief explanation of some of the parameters used here

In fold, the -s parameter makes it break lines at spaces instead of in the middle of a word, and the -w parameter sets the maximum width of the line.
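The effect of -s is easiest to see on a short sample. This sketch folds a made-up sentence at 16 columns, first without and then with -s:

```shell
text='the quick brown fox jumps over the lazy dog'

# Without -s, fold cuts at exactly 16 columns, even in the middle of a word.
printf '%s\n' "$text" | fold -w 16

# With -s, fold breaks after the last space that fits within 16 columns.
printf '%s\n' "$text" | fold -s -w 16
```

In the first output "the" is split across two lines; in the second every word stays whole.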

In echo, we use the -e parameter to enable the interpretation of backslash escapes; otherwise we would end up printing a literal \n into the file instead of getting a line break.
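The -e option works with the echo built into bash, but not every shell's echo understands it. A portable alternative is printf, which always interprets backslash escapes in its format string. A sketch that builds the foot of the file this way:

```shell
# printf interprets \n in its format string, so no -e option is needed.
foot=$(printf '    </Level>\n  </Levels>\n</KTouchLecture>\n')

# Prints the three closing lines, one per line.
printf '%s\n' "$foot"
```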

Awk is too extensive to explain here, as it is basically a programming language. The notable parts are that whatever is between { and } is executed once for every line of input, NR holds the number of the current line, and $0 holds the line that is currently in the buffer. I believe the rest is self-explanatory for someone with a programming background.
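A tiny example may help to see NR and $0 at work. It uses three made-up lines and a group size of 2 instead of 20; the script in this article uses the same kind of modulo test to start a new level:

```shell
numbered=$(printf 'first\nsecond\nthird\n' | awk '{
    # NR is the number of the current line, $0 is the whole line.
    print NR ": " $0
    # The remainder is 0 on every second line.
    if (NR % 2 == 0)
        print "-- end of group --"
}')

printf '%s\n' "$numbered"
# prints:
# 1: first
# 2: second
# -- end of group --
# 3: third
```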

Download the KTouch lectures

I noticed that many people arrive here looking for lectures to download and not for a method to make them, so I decided to upload the result of this procedure. Because this lecture is the unfiltered list of English words, it may contain words that some people find offensive, and it is not safe for children. For that reason I also took a book in the public domain, The Adventures of Sherlock Holmes, turned it into the list of unique words that appear in the book, ran the same procedure on it, and made the result available for download.
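How the list of unique words was extracted from the book is not shown here. One possible way to do it, as a sketch: unique_words is a hypothetical helper, and it treats any run of letters and apostrophes as a word and keeps upper and lower case apart.

```shell
# unique_words FILE: hypothetical helper, prints each distinct word once.
unique_words() {
    # Turn every run of characters that is not a letter or an apostrophe
    # into a line break, drop empty lines, then keep one copy of each word.
    tr -cs "A-Za-z'" '\n' < "$1" | grep . | LC_ALL=C sort -u
}
```

With a plain-text copy of the book saved as, say, sherlock.txt (a made-up file name), unique_words sherlock.txt > book-words.txt would give a word list that can take the place of the dictionary file in the first step.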

Download the KTouch lecture with the full list of English words:
md5sum: df0de288a150a7ff9208fa57d7033fac

Download the KTouch lecture with the words from The Adventures of Sherlock Holmes:
md5sum: ba480f65ee49feef76cfa9ee5319bce2

A short footnote

When I originally made this I was inside Vim and I tried to use:

:set tw=80

However, doing this in Vim is a problem: because of how it stores the lines of a buffer, and since here there is one massive line, it slows down to a crawl and eats a massive amount of memory during this operation. Besides, fold is the most convenient way to achieve this, so Vim is not necessary at all. I just happened to be inside Vim and accidentally stumbled upon this issue.

Categories: Commands, FOSS, KDE, Linux, Text manipulation