Like a lot of you, I discovered Wordle a while ago. I love the backstory: its creator wrote it for his partner, opened it up to the world, and then the New York Times bought it for real money!
I tend to get obsessed about stuff. Wordle’s no exception. I thought about playing strategies and decided that a smart player would think about letter frequencies. If you play words that have the letters that appear most often in all the available words, then you’ll have the highest likelihood of finding correct letters. And when letters don’t match, you’ll get rid of the most possible remaining words.
If you Google up “letter frequencies in English,” you’ll discover that lots of people have spent a lot of time counting, and that the rough consensus on the most frequent letters in written English, in order, is ETAOINSHRDLU. The exact rankings move around a little bit depending on who’s counting, but that’s a good list for you if you’re playing hangman against a third-grader.
Wordle’s a special case of the English language, though. All the words in Wordle are exactly five letters long. If you want to win at Wordle, you need to consider just the five-letter words.
I downloaded the online approximation of the Scrabble dictionary, pulled out just the five-letter words, and wrote a computer program to calculate letter frequencies from the result. That list begins SEAORILTNUDC. If you’re using my letter-frequency strategy, you want to choose words that contain the letters toward the beginning of that list.
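For the curious, here's a minimal sketch of that counting step in Python. It is not the original program: the function names are made up for illustration, the tiny word list is a stand-in for a real dictionary file, and it assumes every occurrence of a letter counts (so the two E's in GEESE count twice).

```python
from collections import Counter

def five_letter_words(words):
    """Keep only purely alphabetic five-letter words, lowercased."""
    return [w.lower() for w in words if len(w) == 5 and w.isalpha()]

def letter_frequencies(words):
    """Count every occurrence of every letter across the word list."""
    counts = Counter()
    for word in words:
        counts.update(word)
    return counts

def frequency_order(words):
    """Letters from most to least frequent; ties broken alphabetically."""
    counts = letter_frequencies(words)
    return "".join(sorted(counts, key=lambda ch: (-counts[ch], ch)))

# Stand-in data (an assumption): a real run would read a full dictionary file.
sample = five_letter_words(["arose", "least", "stare", "crane", "slate", "cat", "wordle"])
print(frequency_order(sample).upper())
```

Run against a full five-letter list, the printed string is the kind of ranking described above; the exact order will depend on which dictionary you feed it.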
So I always play AROSE as my opening word.
I use those same letter frequencies to make guesses later in the game. That’s not really an optimal strategy — every time you make a guess in Wordle, you eliminate a bunch of words. A smart player would recalculate letter frequencies from all remaining words after every guess, and would adjust next-guess word choice accordingly.
In order to do that, of course, a smart player would need to have the entire Wordle dictionary memorized, and would need to be really fast at counting. I’m just a hunk of meat running at low clock speed, but I know how to program computers, and they’re really good at stuff like that.
So I wrote a bot that plays exactly my strategy, but rigorously. It recalculates letter frequencies after every guess, scores all the words on those frequencies, and always picks the word at the top of the list. Ties get resolved by alphabetical order.
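A rough sketch of a bot along those lines, in Python, might look like the following. Treat the details as assumptions rather than the real thing: the scoring rule here adds up the frequencies of a word's distinct letters, the feedback string uses `g`, `y` and `-` for green, yellow and gray, and all the function names are invented for the example. It only ever guesses words still consistent with every clue so far, which also keeps it legal in hard mode.

```python
from collections import Counter

def feedback(guess, answer):
    """Wordle-style clue: 'g' right spot, 'y' wrong spot, '-' not in word."""
    result = ["-"] * len(guess)
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        else:
            unmatched[a] += 1  # answer letters still available for yellows
    for i, g in enumerate(guess):
        if result[i] == "-" and unmatched[g] > 0:
            result[i] = "y"
            unmatched[g] -= 1
    return "".join(result)

def best_guess(candidates):
    """Highest letter-frequency score wins; ties go to alphabetical order."""
    counts = Counter(ch for word in candidates for ch in word)

    def score(word):
        # Assumed scoring rule: each distinct letter counts once.
        return sum(counts[ch] for ch in set(word))

    return min(candidates, key=lambda w: (-score(w), w))

def play(answer, words, max_turns=6):
    """Play one game and return the list of guesses made."""
    candidates = list(words)
    guesses = []
    for _ in range(max_turns):
        if not candidates:
            break  # answer was not in the word list
        guess = best_guess(candidates)
        guesses.append(guess)
        if guess == answer:
            break
        clue = feedback(guess, answer)
        # Keep only words that would have produced the same clue.
        candidates = [w for w in candidates
                      if w != guess and feedback(guess, w) == clue]
    return guesses

# Stand-in word list (an assumption); a real bot would load a full dictionary.
sample = ["arose", "least", "stare", "crane", "slate"]
print(play("crane", sample))
```

Because the frequencies are recomputed from `candidates` on every turn, the ranking shifts as words get eliminated, which is the whole point of the strategy.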
For more than a year now, my every-morning routine has been coffee, quick scan of the headlines, play Wordle myself, have my Wordlebot play Wordle, write down what the word was and our respective scores. Sometimes I win, sometimes the bot wins. We tie a lot. A few times, we have played the same words in the same order.
In general, I beat my Wordlebot by about 0.2 guesses on any given day. The exact number fluctuates a little depending on the window: 0.18 lifetime, 0.15 over the last 100 days, 0.23 in the last month.
It’s a little surprising, on the face of it, that the hunk of meat would beat precision itself. The explanation’s simple: I am a better judge than my Wordlebot of what words an actual human editor would choose for the New York Times reading audience. ONTIC and CHIRU may be legal guesses, but, I mean, come on, man.
There are a few other facts about Wordle that I know, that I haven’t figured out how to render into code. For example, the editor never picks a plural. I use that fact to prune my guesses, but my bot doesn’t know what a plural is.
My Wordlebot guesses stuff that I rule out based on taste. Those are solid for pruning the field, but every one of them is wrong, and the bot’s guaranteed not to win on that particular turn. Add those up across all the games we play, and I have the edge.
I could add more smarts to the bot and have it consider word frequency using a table that I build from Google ngrams or something, but then I fear I would start losing. And who wants to start their day getting reliably beaten by a bot?
Notes to fanatics:
- Hard mode, thank you very much.
- I know that the Times has added its own Wordlebot that analyzes your play after your game. I love it! But my Wordlebot came first so I’m keeping the name.
- The Times Wordlebot always starts with LEAST. I suspect that’s because they’ve pruned words like ONTIC and CHIRU out of the corpus, and have calculated letter frequencies from the words remaining. I don’t have that dictionary.
- You might also do better if you considered letter position as well as frequency. That seems like a lot of trouble and a couple of new data structures to code up. Too hard.
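
For anyone tempted by that last note, here's a minimal sketch of position-aware counting in Python. It is an illustration only, not something the bot does: the function names are invented, the word list is a stand-in, and the scoring rule (add up how often each letter appears in that exact slot) is just one plausible choice.

```python
from collections import Counter

def positional_counts(words):
    """One Counter per letter slot: counts[i][ch] is how often ch sits at position i."""
    counts = [Counter() for _ in range(len(words[0]))]
    for word in words:
        for i, ch in enumerate(word):
            counts[i][ch] += 1
    return counts

def positional_score(word, counts):
    """Sum, over the word's letters, of how common each one is in its own slot."""
    return sum(counts[i][ch] for i, ch in enumerate(word))

# Stand-in word list (an assumption); a real run would use the full dictionary.
sample = ["arose", "least", "stare", "crane", "slate"]
counts = positional_counts(sample)
for word in sorted(sample, key=lambda w: (-positional_score(w, counts), w)):
    print(word, positional_score(word, counts))
```

The only new data structure is a list of five counters, so it may be less trouble than it sounds.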