Custom Hunspell Dictionary
I use a custom dictionary because the Hunspell plugin for Eclipse does not read my personal dictionary, and it does not let you add words from any of the plugin editors I use, PyDev and ReST to name two.
The solution was to copy the package dictionary to a new location and add words to it manually.
See the eclipse spell check document for how to install the Hunspell spelling service for Eclipse on a Linux workstation.
Create a new dictionary
Install Hunspell if you haven’t already.
    $ sudo apt update
    $ sudo apt install hunspell
Rather than start from scratch, we will copy the dictionary that comes with the Hunspell package. We could use the package dictionary in place, but whenever the Hunspell package is updated it would overwrite any changes you made.
You need to copy both the .dic and the .aff file. The .aff file is discussed later.
    $ cd
    $ mkdir .hunspell
    $ cd .hunspell/
    $ cp /usr/share/hunspell/en_US* .
    $ ls
    en_US.aff  en_US.dic
    $ sudo chown billf: *    # (use your own user name)
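If you repeat this setup on several machines, the copy step can be wrapped in a small function. This is only a sketch: copy_dictionary is a hypothetical helper name, and the defaults are simply the paths used above. It refuses to overwrite an existing private copy, since that copy may already hold words you added.

```shell
#!/usr/bin/env bash
# Sketch only: copy_dictionary is a hypothetical helper, not part of Hunspell.
# Arguments (all optional): language, source directory, destination directory.
copy_dictionary() {
    local lang="${1:-en_US}"
    local src="${2:-/usr/share/hunspell}"
    local dest="${3:-$HOME/.hunspell}"
    local ext

    # Both files are required: .dic holds the words, .aff the affix rules.
    for ext in aff dic; do
        if [ ! -f "$src/$lang.$ext" ]; then
            echo "missing $src/$lang.$ext" >&2
            return 1
        fi
    done

    mkdir -p "$dest" || return 1
    for ext in aff dic; do
        # Keep an existing private copy; it may contain added words.
        if [ -e "$dest/$lang.$ext" ]; then
            echo "keeping existing $dest/$lang.$ext" >&2
        else
            cp "$src/$lang.$ext" "$dest/" || return 1
        fi
    done
}
```

Calling copy_dictionary with no arguments copies en_US from /usr/share/hunspell into ~/.hunspell.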
Test your new dictionary
Check a word to test your new dictionary.
Pass only the base name en_US to -d, not the full dictionary file name with its extension.
    $ echo "Linux" | hunspell -d ~/.hunspell/en_US
    @(#) International Ispell Version 3.2.06 (but really Hunspell 1.7.0)
    *
Hunspell returned an *, which means it found the word and considers it spelled correctly.
Add words to the dictionary
Check the spelling of the word "sudo".
    $ echo "sudo" | hunspell -d ~/.hunspell/en_US
    Hunspell 1.7.0
    & sudo 3 0: suds, ludo, sumo
The result line starts with &, so Hunspell did not find the word in the dictionary; a match would have printed *. It did find three close matches (the 3 is the number of suggestions and the 0 is the position of the word in the input line), but none of them is the word we want.
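When checking more than one word, the marker characters are easier to read with a small filter. The function below is a sketch under the assumption that the output follows the Ispell-style pipe format shown above (* for a match, + and - for matches via a root or compound, & for a miss with suggestions, # for a miss without any). summarize_hunspell_output is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: summarize_hunspell_output is a hypothetical helper.
# It reads Hunspell's pipe-style output on stdin and prints one line per word.
summarize_hunspell_output() {
    local marker word count rest
    while read -r marker word count rest; do
        case "$marker" in
            '*'|'+'|'-')
                echo "found" ;;
            '&')
                # rest looks like "0: suds, ludo, sumo"; drop the offset.
                echo "not found: $word ($count suggestions: ${rest#*: })" ;;
            '#')
                echo "not found: $word (no suggestions)" ;;
            *)
                : ;;  # version banner and blank separator lines
        esac
    done
}
```

It can be appended to the commands above, for example: echo "sudo" | hunspell -d ~/.hunspell/en_US | summarize_hunspell_output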
Now, let's add sudo to our custom dictionary.
Be careful with this command: with a single > instead of >>, the shell overwrites the whole dictionary with the one word you echoed. Always make a backup of your custom dictionary first.
    $ cd ~/.hunspell
    $ cp en_US.dic en_US.dic_BU_`date +"%m_%d_%Y_%I_%M_%p"`
    $ echo "sudo" >> ~/.hunspell/en_US.dic
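Since the difference between > and >> is easy to get wrong at the prompt, the backup and the append can be kept together in one function. This is a minimal sketch: add_word is a hypothetical helper, the default path is the one used in this document, and the sortable backup suffix is my own choice rather than the format above. It also skips words that are already listed, with or without flags.

```shell
#!/usr/bin/env bash
# Sketch only: add_word is a hypothetical helper, not part of Hunspell.
# Usage: add_word WORD [DIC_FILE]
add_word() {
    local word="$1"
    local dic="${2:-$HOME/.hunspell/en_US.dic}"

    [ -n "$word" ] || { echo "usage: add_word WORD [DIC_FILE]" >&2; return 2; }
    [ -f "$dic" ]  || { echo "no such dictionary: $dic" >&2; return 1; }

    # Entries look like "word" or "word/FLAGS"; compare the part before "/".
    if cut -d/ -f1 "$dic" | grep -Fxq -- "$word"; then
        echo "already present: $word"
        return 0
    fi

    # Back up first, then append. A single > here would truncate the file.
    cp -- "$dic" "${dic}_BU_$(date +%Y-%m-%d_%H%M%S)" || return 1
    printf '%s\n' "$word" >> "$dic"
}
```

With the defaults, add_word sudo has the same effect as the three commands above.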
Let's check the spelling again:
    $ echo "sudo" | hunspell -d ~/.hunspell/en_US
    Hunspell 1.7.0
    *
Hunspell now returns an *, so the word is found.
You can also add several words at one time. To do that, put the words in a file, one per line.
This example uses a file named new_words.
    $ cd ~/.hunspell
    $ cp en_US.dic en_US.dic_BU_`date +"%m_%d_%Y_%I_%M_%p"`
    $ nano new_words    # add your new words, one per line
    $ cat new_words >> en_US.dic
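A plain cat appends everything in the list, including blank lines and words the dictionary already has. The sketch below filters those out before appending. add_words_from_file is a hypothetical helper, and the backup suffix is again my own choice; it assumes the dictionary file is not empty.

```shell
#!/usr/bin/env bash
# Sketch only: add_words_from_file is a hypothetical helper.
# Usage: add_words_from_file WORD_LIST [DIC_FILE]
add_words_from_file() {
    local list="$1"
    local dic="${2:-$HOME/.hunspell/en_US.dic}"
    local new

    if [ ! -f "$list" ] || [ ! -s "$dic" ]; then
        echo "usage: add_words_from_file WORD_LIST [DIC_FILE]" >&2
        return 2
    fi
    new=$(mktemp) || return 1

    # First pass (dictionary): remember each entry without its /FLAGS part.
    # Second pass (list): print non-blank lines not seen before.
    awk -F/ 'NR == FNR { seen[$1] = 1; next }
             /[^ \t]/ && !seen[$0]++ { print }' "$dic" "$list" > "$new"

    if [ -s "$new" ]; then
        # Back up before changing anything, then append with >>.
        cp -- "$dic" "${dic}_BU_$(date +%Y-%m-%d_%H%M%S)" || { rm -f "$new"; return 1; }
        cat "$new" >> "$dic"
    fi
    echo "added $(wc -l < "$new" | tr -d ' ') word(s)"
    rm -f "$new"
}
```

Running add_words_from_file new_words from ~/.hunspell replaces the cp and cat steps above; running it a second time with the same list adds nothing.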
You need to restart Eclipse for the new word(s) to show up, because Eclipse caches the dictionary when it starts.
Understanding the .dic and .aff files
The .aff (affix) file cuts down the number of entries in the .dic dictionary file: the dictionary stores one base word together with flags that name the common prefixes and suffixes it can take. For example, the single entry build also covers builders and building.
I’m going to use build as an example.
Let's first look at the build entries in the dictionary file.
    $ grep build en_US.dic
    bodybuilder/SM
    bodybuilding/M
    build/SMRZGJ
    builder/M
    building/M
    buildup/SM
    outbuilding/MS
    overbuild/SG
    rebuild/SG
    shipbuilder/SM
    shipbuilding/M
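The build entry carries the flags /SMRZGJ. Let's see where they come from and what they stand for.
Start with S and grep the affix file en_US.aff for "SFX S". SFX stands for suffix; PFX lines in the same file define prefixes.
    $ grep "SFX S" en_US.aff
    SFX S Y 4
    SFX S y ies [^aeiou]y
    SFX S 0 s [aeiou]y
    SFX S 0 es [sxzh]
    SFX S 0 s [^sxzhy]
The header line SFX S Y 4 says that flag S is a suffix, that it may be combined with prefixes (Y), and that four rules follow. Each rule gives the characters to strip from the end of the word (0 for none), the characters to add, and a condition the end of the word must match. build ends in d, so only the last rule applies and the result is builds.
Each of the other flags works the same way.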
The M flag matches when 's is on the end of the word: build's
    $ grep "SFX M" en_US.aff
    SFX M Y 1
    SFX M 0 's .
The R flag matches when er is on the end of the word: builder
    $ grep "SFX R" en_US.aff
    SFX R Y 4
    SFX R 0 r e
    SFX R y ier [^aeiou]y
    SFX R 0 er [aeiou]y
    SFX R 0 er [^ey]
The Z flag matches when ers is on the end of the word: builders
    $ grep "SFX Z" en_US.aff
    SFX Z Y 4
    SFX Z 0 rs e
    SFX Z y iers [^aeiou]y
    SFX Z 0 ers [aeiou]y
    SFX Z 0 ers [^ey]
The G flag matches when ing is on the end of the word: building
    $ grep "SFX G" en_US.aff
    SFX G Y 2
    SFX G e ing e
    SFX G 0 ing [^e]
The J flag matches when ings is on the end of the word: buildings
    $ grep "SFX J" en_US.aff
    SFX J Y 2
    SFX J e ings e
    SFX J 0 ings [^e]
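To see which forms a flag produces for a given word, the rules can be applied by hand with awk. This is a simplified sketch, not how Hunspell itself is implemented: apply_suffix_rules is a hypothetical helper that handles only plain SFX rule lines like the ones shown above (strip, add, condition) and ignores prefixes and anything more advanced.

```shell
#!/usr/bin/env bash
# Sketch only: apply_suffix_rules is a hypothetical helper.
# Usage: apply_suffix_rules WORD FLAG < rules   (rules = lines from the .aff file)
apply_suffix_rules() {
    awk -v word="$1" -v flag="$2" '
        # Rule lines have five fields; the four-field header line is skipped.
        $1 == "SFX" && $2 == flag && NF >= 5 {
            strip = ($3 == "0") ? "" : $3   # characters removed from the end
            add   = ($4 == "0") ? "" : $4   # characters appended
            cond  = $5                      # pattern the word ending must match
            if (word ~ (cond "$")) {
                print substr(word, 1, length(word) - length(strip)) add
            }
        }'
}
```

For example, grep "SFX S" en_US.aff | apply_suffix_rules build S prints builds, and the same rules applied to party print parties, because only the rule that strips y and adds ies matches a word ending in a consonant followed by y.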
Here are a couple of links with more detail on the affix file.