Custom Hunspell Dictionary

Overview

I use a custom dictionary because the Hunspell plugin for Eclipse does not read my personal dictionary, and it does not let you add words from any of the plugin editors I use, PyDev and ReST to name two.

The solution was to copy the package dictionary to a new location and add words to it manually.

See the document "eclipse spell check" for how to install the Hunspell spelling service for Eclipse on a Linux workstation.

Create a new dictionary

Install Hunspell if you haven’t already.

$ sudo apt update
$ sudo apt install hunspell

Rather than start from scratch, we will copy the dictionary that comes with the Hunspell package. We could edit the package dictionary in place, but any update to the Hunspell package would overwrite our changes.

Note

You need to copy both the .dic and .aff files. We will discuss the .aff file later.

$ cd
$ mkdir .hunspell
$ cd .hunspell/
$ cp /usr/share/hunspell/en_US* .
$ ls
en_US.aff  en_US.dic
$ sudo chown billf: *     # (use your own user name)
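
The copy step above can be wrapped in a small function so that it is safe to run more than once. This is only a sketch: copy_dict and its three arguments are illustrative names, and the default paths are the ones used in this document. It refuses to overwrite files that already exist, so words added later are not lost.

```shell
# Copy the packaged dictionary into a per-user directory.
# copy_dict, src_dir and dest_dir are illustrative names, not part of Hunspell.
copy_dict() (
    lang=${1:-en_US}
    src_dir=${2:-/usr/share/hunspell}
    dest_dir=${3:-$HOME/.hunspell}

    # Both files are required: .dic holds the words, .aff the affix rules.
    for ext in aff dic; do
        if [ ! -f "$src_dir/$lang.$ext" ]; then
            echo "missing $src_dir/$lang.$ext" >&2
            exit 1
        fi
    done

    mkdir -p "$dest_dir" || exit 1
    for ext in aff dic; do
        # Never clobber a custom dictionary that already has added words.
        if [ -e "$dest_dir/$lang.$ext" ]; then
            echo "keeping existing $dest_dir/$lang.$ext" >&2
        else
            cp "$src_dir/$lang.$ext" "$dest_dir/" || exit 1
        fi
    done
)
```

Calling copy_dict with no arguments copies en_US from /usr/share/hunspell into ~/.hunspell.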

Test your new dictionary

Check a word to test your new dictionary.

Note

Pass only the base name en_US, not the full file name with its .dic or .aff extension.

$ echo "Linux" | hunspell -d ~/.hunspell/en_US
@(#) International Ispell Version 3.2.06 (but really Hunspell 1.7.0)
*

Hunspell returned an *, which means the word is in the dictionary and spelled correctly.

Add words to the dictionary

Check the spelling of the word "sudo":

$ echo "sudo" | hunspell -d ~/.hunspell/en_US
Hunspell 1.7.0
& sudo 3 0: suds, ludo, sumo

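The line starts with &, so Hunspell did not find the word in the dictionary; a match would have returned *. The 3 is the number of suggestions, the 0 is the position of the word in the input line, and the near misses suds, ludo, and sumo follow the colon. None of them is the word we want.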
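
The markers can be turned into plain descriptions with a small helper. This is a sketch that assumes the Ispell-style result lines shown above; explain_result is an illustrative name, and the + and # cases are included on the assumption that Hunspell follows the same Ispell conventions for them.

```shell
# Summarise one line of Hunspell's piped output.
# explain_result is an illustrative helper name, not a Hunspell command.
explain_result() {
    case $1 in
        '*'*)  echo "correct" ;;
        '+ '*) echo "correct (derived from root: ${1#+ })" ;;
        '& '*) echo "misspelled, suggestions: ${1#*: }" ;;   # text after "count offset: "
        '# '*) echo "misspelled, no suggestions" ;;
        *)     echo "not a result line" ;;                    # banner or blank line
    esac
}
```

To use it on real output, pipe the hunspell command into a loop such as: while IFS= read -r line; do explain_result "$line"; done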

Now, let's add sudo to our custom dictionary.

Danger

Be careful with the redirection: a single > overwrites the whole dictionary, leaving only the word you echoed. Always use >> to append, and always make a backup of your custom dictionary first.

$ cd ~/.hunspell
$ cp en_US.dic en_US.dic_BU_`date +"%m_%d_%Y_%I_%M_%p"`
$ echo "sudo" >> ~/.hunspell/en_US.dic

Let's check the spelling again:

$ echo "sudo" | hunspell -d ~/.hunspell/en_US
Hunspell 1.7.0
*

Hunspell now returns an *, so the word is found.
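
The backup and append steps can be combined so that the backup is never forgotten and the same word is not added twice. This is a sketch, not a Hunspell feature: add_word is an illustrative name, the default path is the one used in this document, and the backup name follows the date pattern used above.

```shell
# Append one word to a dictionary, backing it up first.
# add_word is an illustrative helper name, not part of Hunspell.
add_word() (
    word=$1
    dic=${2:-$HOME/.hunspell/en_US.dic}

    [ -n "$word" ] && [ -f "$dic" ] || exit 1

    # Skip the word if it is already listed, with or without /FLAGS.
    if cut -d/ -f1 "$dic" | grep -qxF -- "$word"; then
        echo "'$word' is already in $dic" >&2
        exit 0
    fi

    cp "$dic" "${dic}_BU_$(date +%m_%d_%Y_%I_%M_%p)" || exit 1

    # Make sure the file ends with a newline so the new word gets its own line.
    [ -z "$(tail -c 1 "$dic")" ] || echo >> "$dic"

    # Two '>' characters append; a single '>' would truncate the file.
    printf '%s\n' "$word" >> "$dic"
)
```

For example, add_word sudo adds the word to ~/.hunspell/en_US.dic. Because the backup name only has minute resolution, two additions within the same minute share one backup file.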

You can also add several words at once. Put the words in a file, one per line.

This example uses a file named new_words.

$ cd ~/.hunspell
$ cp en_US.dic en_US.dic_BU_`date +"%m_%d_%Y_%I_%M_%p"`
$ nano new_words   # add your new words each on a new line
$ cat "new_words" >> en_US.dic
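
Appending the whole file adds every line, including words the dictionary already has. The following sketch filters the list first. add_new_words is an illustrative name, and the function assumes the dictionary backup has already been made as shown above.

```shell
# Append only the words from a list that the dictionary does not already contain.
# add_new_words is an illustrative helper name, not part of Hunspell.
add_new_words() (
    list=$1
    dic=${2:-$HOME/.hunspell/en_US.dic}
    [ -f "$list" ] && [ -f "$dic" ] || exit 1

    known=$(mktemp) || exit 1
    # Dictionary entries look like word/FLAGS; compare on the word part only.
    cut -d/ -f1 "$dic" > "$known"

    # Drop blank lines and duplicates from the list, then keep only the words
    # that are not an exact match for a known entry.
    grep -v '^[[:space:]]*$' "$list" | sort -u | grep -vxFf "$known" >> "$dic"

    rm -f "$known"
)
```

Run it as add_new_words new_words from the ~/.hunspell directory. The comparison is case-sensitive and exact, so Linux and linux count as different words.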

Note

You need to restart Eclipse for the new words to show up, because Eclipse caches the dictionary when it starts.

Understanding the .dic and .aff files

The .aff (affix) file cuts down the number of entries in the .dic dictionary file. A single base word in the dictionary refers to common prefix and suffix rules in the affix file. For example, the entry for build also covers builders and building.

I'm going to use build as an example.

Let's first look at the build entry in the dictionary file.

$ grep build en_US.dic
bodybuilder/SM
bodybuilding/M
build/SMRZGJ
builder/M
building/M
buildup/SM
outbuilding/MS
overbuild/SG
rebuild/SG
shipbuilder/SM
shipbuilding/M

The build entry carries the types (flags) /SMRZGJ. Let's see where those come from and what they stand for.

Let's start with S and grep the affix file en_US.aff for "SFX S". SFX stands for suffix; the PFX lines in the file define prefixes.

$ grep "SFX S" en_US.aff
SFX S Y 4
SFX S   y     ies        [^aeiou]y
SFX S   0     s          [aeiou]y
SFX S   0     es         [sxzh]
SFX S   0     s          [^sxzhy]

The first line is the header: SFX = suffix, S = the type (flag) used in the dictionary entry, Y = the rule may be combined with prefixes, and 4 = the number of rules that follow.
In each following line, the third column is the text to strip from the end of the word (0 means strip nothing), the fourth column is the text to add, and the fifth column is the condition the end of the word must match.
The first rule (line 2) strips the final y and adds ies when the word ends in y preceded by a letter other than a, e, i, o, or u. For example, body becomes bodies.

For the word build, the last rule applies: strip nothing and add s, because the last letter is not s, x, z, h, or y. The result is builds.
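
To make the strip, add, and condition columns concrete, here is a sketch that re-implements the four SFX S rules as shell patterns. apply_sfx_s is an illustrative name; it hard-codes the rules shown above rather than reading the .aff file, and it is not how Hunspell itself is invoked.

```shell
# Apply the four "SFX S" rules shown above to a single word.
# apply_sfx_s is an illustrative re-implementation, not a Hunspell command.
apply_sfx_s() {
    case $1 in
        *[!aeiou]y) printf '%s\n' "${1%y}ies" ;;  # strip y, add ies
        *[aeiou]y)  printf '%s\n' "${1}s" ;;      # strip nothing, add s
        *[sxzh])    printf '%s\n' "${1}es" ;;     # strip nothing, add es
        *[!sxzhy])  printf '%s\n' "${1}s" ;;      # strip nothing, add s
        *)          return 1 ;;                   # no rule applies
    esac
}
```

With these rules, apply_sfx_s body prints bodies and apply_sfx_s build prints builds. The four conditions are mutually exclusive, so the order of the patterns does not matter.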

The other types work the same way.

M type: adds 's to the end of the word, giving build's

$ grep "SFX M" en_US.aff
SFX M Y 1
SFX M   0     's         .

R type: adds er to the end of the word, giving builder

$ grep "SFX R" en_US.aff
SFX R Y 4
SFX R   0     r          e
SFX R   y     ier        [^aeiou]y
SFX R   0     er         [aeiou]y
SFX R   0     er         [^ey]

Z type: adds ers to the end of the word, giving builders

$ grep "SFX Z" en_US.aff
SFX Z Y 4
SFX Z   0     rs         e
SFX Z   y     iers       [^aeiou]y
SFX Z   0     ers        [aeiou]y
SFX Z   0     ers        [^ey]

G type: adds ing to the end of the word, giving building

$ grep "SFX G" en_US.aff
SFX G Y 2
SFX G   e     ing        e
SFX G   0     ing        [^e]

J type: adds ings to the end of the word, giving buildings

$ grep "SFX J" en_US.aff
SFX J Y 2
SFX J   e     ings       e
SFX J   0     ings       [^e]
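
The per-type greps above can be run in one pass. The following sketch looks up a word's flags in the .dic file and prints the matching rules from the .aff file. show_affixes is an illustrative name, and the function assumes one-character flags with nothing after them on the line, as in the en_US entries shown in this document.

```shell
# Print the affix rules behind every flag attached to a dictionary entry.
# show_affixes is an illustrative helper; it assumes one-character flags.
show_affixes() (
    word=$1
    dic=${2:-en_US.dic}
    aff=${3:-en_US.aff}

    # Take the part after the slash on the first entry that starts with "word/".
    flags=$(grep "^$word/" "$dic" | head -n 1 | cut -d/ -f2)
    [ -n "$flags" ] || { echo "no flags found for '$word'" >&2; exit 1; }

    # fold -w 1 puts each flag character on its own line.
    printf '%s\n' "$flags" | fold -w 1 | while IFS= read -r flag; do
        echo "== $flag =="
        grep -E "^(SFX|PFX) $flag " "$aff" || echo "(no rules found)"
    done
)
```

Run from ~/.hunspell, show_affixes build prints one section for each of S, M, R, Z, G, and J.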

Here are a couple of links with more detail on the affix file.

https://zverok.github.io/blog/2021-03-16-spellchecking-dictionaries.html

https://www.systutorials.com/docs/linux/man/4-hunspell/