Today, I decided to play with a new language so I gave Clojure a try. I have previously played with Lisp so I expected some fun with brackets.
I decided to build a simple spell checker.
Install the Cursive plugin in IntelliJ. Refer to the docs.
Once installed start by creating a new project.
Next choose the location on the file system and press the create
It will install the Clojure and Leign. Once project is setup you will see the structure as shown below.
Now that we dev environment setup let’s start creating our spell checker.
There are two main files generated by the plugin
- core.clj: It is the place where we will define our spell checker logic
- project.clj: It is our build file
- Rest are documentation and test files.
The core.clj has following code.
(ns spell-checker.core) (defn foo "I don't do a whole lot." [x] (println x "Hello, World!"))
It only prints
It is not executable so we first make it executable by adding the main function as shown below. And, then we call the
To run this using IntelliJ we have to define a new run configuration as shown below.After setting the values press
Apply and then
Now, you will be able to run the Clojure program.
Let’s write the Psuedocode of what we to go to implement a simple and basic spell checker.
- Find English word list on the web
- Load English words in a Set. We will use since we care about uniqueness and order does not matter
- Next, we will check if the given word exists in our Set if yes we will return the same word in a list
- If word does not exist then we find words that are similar to the given word using levenshtein distance
- Finally, we return a list of similar words
Step 1: Find English word list on the web
I did a quick Google search and found the following English word list https://www.mit.edu/~ecprice/wordlist.10000. It has 10,000 words.
Download this file to
Step 2: Load English words in a Set
We used the Clojure
slurp function to first load a file to a String. Then, we split the String using
split-lines and lower case each word. And, finally we added it to a Set.
(ns spell-checker.core (:require [clojure.string :as str])) (def words (set (map str/lower-case (str/split-lines (slurp "resources/words.txt"))))) (defn -main  (println "Total number of words loaded: " (count words)))
Run the main program and you will get the following output.
Total number of words loaded: 10000
Step 3: Word exists in the Set
Now, we will define our spell-checker function that will check the Set for the word. We used the collection
contains? function to check for existence of a word. If word is found we return the word in a list else we return an empty list.
(ns spell-checker.core (:require [clojure.string :as str])) (def words (set (map str/lower-case (str/split-lines (slurp "resources/words.txt"))))) (defn spell-checker [input] (let [word (str/lower-case input)] (let [exactMatchFound? (contains? words word)] (if exactMatchFound? (list word) ())))) (defn -main  (println (spell-checker "English")) (println (spell-checker "Englisx")) (println (spell-checker "nxsnk")) )
When we run the code shown above we get following result.
(English) () ()
The result was as expected.
English is an English word so we returned that
Step 4: Find similar words using levenshtein distance
First, we have to update the dependencies section of
:dependencies [ [org.clojure/clojure "1.10.1"] [clj-fuzzy "0.4.1"] ]
[clj-fuzzy "0.4.1"] dependency.
First, we will implement the
distance function that takes two words and returns their levenshtein distance as shown below.
(ns spell-checker.core (:require [clojure.string :as str]) (:use clj-fuzzy.metrics) ) (def words (set (map str/lower-case (str/split-lines (slurp "resources/words.txt"))))) (defn distance [word1 word2] (levenshtein word1 word2))
Next, we will have to implement
similar-words function that is called by
(defn spell-checker [input] (let [word (str/lower-case input)] (let [exactMatchFound? (contains? words word)] (if exactMatchFound? (list word) (similar-words word words)))))
similar-words function take the word and our English words set as input and return a list of similar words.
I wrote an inefficient version of
similar-words function. Please don’t use it. It can be optimized. I am a Clojure newbie so this is my first implementation that works.
(defn similar-words [word words] (let [mapF (partial distance word)] (take 3 (map second (sort-by first (map list (map mapF words) words))))))
The above code does following
- We create a partial function
mapF. It allows us to convert distance to one argument function. This is required
mapexpects a function that take the collection element and return a response.
- Next, we apply mapF on words. It will give us the distance between
wordand collection element.
- Next, I zip distance list with original words
- We sort the resultant tuple list by distance
- Map the list to get only the collection element
- Finally, take only the three elements from the result list
Now, if I run my code with three examples –
nxsnk we get following response. The response is sorted by levenshtein distance.
(english) (english england eagles) (funk junk desk)