Tackling L33t-Speak

Dave Taylor Thu, 04/05/2018 - 09:00

How to script a l33t-speak translator.

My daughter and I were bantering with each other via text message this morning as we often do, and I dropped into a sort of mock "leet speak". She wasn't impressed, but it got me thinking about formulaic substitutions in language and how they represent interesting programming challenges.

If you're not familiar with "leet speak", it's a variation on English that some youthful hackers like to use: something that obscures words sufficiently to leave everyone else confused but that still allows reasonably coherent communication. Take the word "elite", drop the leading "e" and change the spelling to "leet". Now replace the vowels with digits that look kind of, sort of the same: l33t.

There's a sort of sophomoric joy in speaking—or writing—l33t. I suppose it's similar to pig latin, the rhyming slang of East Londoners or the reverse-sentence structure of Australian shopkeepers. The intent's the same: it's us versus them and a way to share with those in the know without everyone else understanding what you're saying.

At their heart, however, many of these things are just substitution ciphers. For example, "apples and pears" replaces "stairs", and "baked bean" replaces "queen", in Cockney rhyming slang.

It turns out that l33t speak is even more formalized, and there's actually a Wikipedia page that outlines most of its rules and structure. I'm just going to start with word variations and letter substitutions here.

The Rules of L33t Speak

Okay, I got ahead of myself. There aren't "rules", because at its base, leet speak is a casual slang, so l33t and 733T are both valid variations of "elite". Still, there are a lot of typical substitutions, like dropping an initial vowel, replacing vowels with numerical digits or symbols (think "@" for "a"), replacing a trailing "s" with a "z", "cks" with "x" (so "sucks" becomes "sux"), and the suffixed "ed" becomes either 'd or just the letter "d".

All of this very much lends itself to a shell script, right? So let's test some mad skillz!

For simplicity, let's parse command-line arguments for the l33t.sh script and use some level of randomness to ensure that it's not too normalized. How do you do that in a shell script? With the variable $RANDOM. In Bash, each time you reference that variable, you'll get a different integer in the range 0 through 32767. Want to "flip a coin"? Use $(($RANDOM % 2)), which will return a 0 or a 1 in reasonably random order.
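Here's a minimal sketch of how that coin flip might gate a substitution so it fires only about half the time. The coin_flip helper and the sample word are my own illustrative assumptions, not part of the l33t.sh script, and $RANDOM itself is a feature of Bash, ksh and zsh rather than of plain POSIX sh:

```shell
#!/bin/bash
# coin_flip: hypothetical helper that prints 0 or 1 using the shell's $RANDOM
coin_flip() {
  echo $(( RANDOM % 2 ))
}

# Apply the trailing-"s" rule only when the coin comes up 1
word="skills"
if [ "$(coin_flip)" -eq 1 ]; then
  word="$(echo "$word" | sed 's/s$/z/')"
fi
echo "$word"    # either "skills" or "skillz", depending on the flip
```

Run it a few times and you should see both spellings turn up.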

So the fast and easy way to go through these substitutions is to use sed, the stream editor and an old mainstay of Linux (and of UNIX before it). Mostly I'm using sed here because its substitute command, s/pattern/newpattern/, is really easy to use, kind of like this:

word="$(echo "$word" | sed 's/ed$/d/')"

This will replace the sequence "ed" with just a "d", but only when it's the last two letters of the word. You wouldn't want to change education to ducation, after all.
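To see what the $ anchor buys you, here's a quick sketch that runs the same substitution with and without it. The trim_ed function name is a hypothetical wrapper I'm introducing purely for illustration:

```shell
# trim_ed: hypothetical helper wrapping the anchored substitution
trim_ed() {
  printf '%s\n' "$1" | sed 's/ed$/d/'
}

trim_ed hacked       # prints "hackd"
trim_ed education    # prints "education", since it doesn't end in "ed"

# Without the $ anchor, the first "ed" anywhere in the word is replaced
printf '%s\n' education | sed 's/ed/d/'    # prints "ducation"
```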

Here are a few more that can help:

word="$(echo "$word" | sed 's/s$/z/')"
word="$(echo "$word" | sed 's/cks/x/g;s/cke/x/g')"
word="$(echo "$word" | sed 's/a/@/g;s/e/3/g;s/o/0/g')"
word="$(echo "$word" | sed 's/^@/a/')"
word="$(echo "$word" | tr '[:lower:]' '[:upper:]')"

In order, a trailing "s" becomes a trailing "z"; "cks" anywhere in a word becomes an "x", as does "cke"; all instances of "a" are translated into "@"; all instances of "e" change to "3"; and all instances of "o" become "0". Next, the script cleans up any word that originally started with an "a" by turning its leading "@" back into an "a". Finally, all lowercase letters are converted to uppercase, because, well, it looks cool.
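Pulled together, those rules might look something like the sketch below. To be clear, this is an assumption about how the pieces fit, not the actual l33t.sh: the leet_word and leet_sentence function names are hypothetical, and every rule is applied unconditionally, with none of the $RANDOM coin flips mentioned earlier:

```shell
#!/bin/bash
# leet_word: hypothetical helper that applies the substitutions above,
# in the same order, to a single word and then uppercases the result.
leet_word() {
  printf '%s\n' "$1" |
    sed -e 's/ed$/d/' \
        -e 's/s$/z/' \
        -e 's/cks/x/g;s/cke/x/g' \
        -e 's/a/@/g;s/e/3/g;s/o/0/g' \
        -e 's/^@/a/' |
    tr '[:lower:]' '[:upper:]'
}

# leet_sentence: hypothetical helper that translates each argument
# and joins the results with single spaces.
leet_sentence() {
  out=""
  for w in "$@"; do
    out="$out${out:+ }$(leet_word "$w")"
  done
  printf '%s\n' "$out"
}

leet_sentence I am a master hacker with great skills
# prints "I AM A M@ST3R H@XR WITH GR3@T SKILLZ"
```

Because this version is fully deterministic, the same input always yields the same output; a script that mixes in randomness would vary from run to run.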

How does it work? Here's how this first script translates the sentence "I am a master hacker with great skills":


That's a good start, but there's more you can do, something I'll pick up in my next article. Meanwhile, if you consider yourself a l33t expert, hit me up and let's talk about some additional letter, letter-combination and word rules.