# Phonetic Reference
Outdated
Rule configuration documentation is not yet up to date. We have made major changes and still have to update the documentation. Please refer to our release notes for more details.
The following phonetics can be used from within matchers that support phonetics.
## cologne
The `cologne` phonetic uses the Cologne phonetic algorithm, which is similar to the Soundex phonetic algorithm, but specialized for the German language.
Example:

- `Hendrik` → `06274`
- `Steven` → `8236`
- `Stefan` → `8236`
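
To illustrate how such codes arise, here is a minimal Python sketch of the commonly published Cologne phonetic rule table. It is not the matcher's actual implementation; the function name `cologne` and the constant names are hypothetical, and real implementations may differ in edge cases such as non-German letters.

```python
# Illustrative sketch of the published Cologne phonetic rules (an assumption,
# not the matcher's own code).
UMLAUTS = str.maketrans("ÄÖÜ", "AOU")
BEFORE_C_INITIAL = set("AHKLOQRUX")  # initial C followed by one of these -> 4
BEFORE_C_INNER = set("AHKOQUX")      # inner C followed by one of these -> 4
SIBILANTS = set("CSZ")


def cologne(text: str) -> str:
    # Upper-case (this turns "ß" into "SS"), fold umlauts, keep only A-Z.
    word = "".join(
        ch for ch in text.upper().translate(UMLAUTS) if "A" <= ch <= "Z"
    )
    codes = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in "AEIJOUY":
            code = "0"
        elif ch == "H":
            code = ""  # H produces no code
        elif ch == "B":
            code = "1"
        elif ch == "P":
            code = "3" if nxt == "H" else "1"
        elif ch in "DT":
            code = "8" if nxt in SIBILANTS else "2"
        elif ch in "FVW":
            code = "3"
        elif ch in "GKQ":
            code = "4"
        elif ch == "C":
            if i == 0:
                code = "4" if nxt in BEFORE_C_INITIAL else "8"
            elif prev in ("S", "Z"):
                code = "8"
            else:
                code = "4" if nxt in BEFORE_C_INNER else "8"
        elif ch == "X":
            code = "8" if prev in ("C", "K", "Q") else "48"
        elif ch == "L":
            code = "5"
        elif ch in "MN":
            code = "6"
        elif ch == "R":
            code = "7"
        else:  # only S and Z remain
            code = "8"
        codes.append(code)

    # Collapse runs of identical digits.
    collapsed = ""
    for digit in "".join(codes):
        if not collapsed or collapsed[-1] != digit:
            collapsed += digit

    # Drop every zero except a leading one.
    return collapsed[:1] + collapsed[1:].replace("0", "")


print(cologne("Hendrik"))  # 06274
print(cologne("Steven"))   # 8236
print(cologne("Stefan"))   # 8236
```

The sketch reproduces the three example codes listed above; `Steven` and `Stefan` collapse to the same code because `V` and `F` share the digit 3 and vowels are dropped after the first position.
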
## dynamicCologneLevenshtein
The `dynamicCologneLevenshtein` phonetic is an extension of the `cologne` phonetic, which introduces a length-dependent Levenshtein distance.
Depending on your use case, the issue with the `cologne` phonetic might be that it treats short texts as equal even if a human would not consider them equal. For example, `Leon` and `Liam` produce the same code (`56`) but do not sound similar. In fact, their Levenshtein distance is 3: three of the four characters differ and only one matches.
The `dynamicCologneLevenshtein` phonetic allows no changes when the shorter of the two texts has at most three characters, a distance of 1 for four to six characters, 2 for seven to nine characters, and 3 for ten or more characters.
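
The length-dependent thresholds can be sketched in Python as follows. This is only an illustration under assumptions: the function names are hypothetical, the distance is computed on the raw texts, and every tier is keyed on the length of the shorter text. How the actual matcher combines this check with the `cologne` code is not described here, so the sketch covers the distance check alone.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def allowed_distance(a: str, b: str) -> int:
    """Maximum distance, assumed to depend on the shorter text's length."""
    shortest = min(len(a), len(b))
    if shortest <= 3:
        return 0
    if shortest <= 6:
        return 1
    if shortest <= 9:
        return 2
    return 3


def within_allowed_distance(a: str, b: str) -> bool:
    return levenshtein(a, b) <= allowed_distance(a, b)


print(levenshtein("Leon", "Liam"))                   # 3
print(allowed_distance("Leon", "Liam"))              # 1
print(within_allowed_distance("Leon", "Liam"))       # False
print(within_allowed_distance("Hendrik", "Hendrick"))  # True
```

Under these assumptions, `Leon` and `Liam` are rejected even though their `cologne` codes are identical, because a distance of 3 exceeds the limit of 1 for four-character texts.
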
## equal
The `equal` phonetic does not allow any phonetic changes at all.
## metaphone
The `metaphone` phonetic uses the Metaphone algorithm, which is similar to the Soundex phonetic algorithm, but produces more accurate results.
Example:

- `Hendrik` → `NTRK`
- `Steven` → `STFN`
- `Stefan` → `STFN`
## noDiacritOrSpecials
The `noDiacritOrSpecials` phonetic removes all diacritics, leaving only the base letter. Where possible, it also replaces other special letters, e.g. the German `ß`, with phonetically similar letters. It also detects and decodes Unicode escape sequences, such as `\u00f5` and `\\u00f5`.
Example:

- `aāâäõŗłġß` → `aaaaorlgss`
- `aāâä\\u00f5ŗłġß` → `aaaaorlgss`
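
A comparable normalization can be approximated in Python with the standard library. This is a rough sketch, not the matcher's implementation: the function name, the `SPECIALS` mapping (deliberately incomplete), and the escape pattern are assumptions chosen for illustration.

```python
import re
import unicodedata

# Letters that Unicode decomposition does not split into base + mark.
# Hypothetical, incomplete mapping for illustration only.
SPECIALS = {"ß": "ss", "ł": "l", "Ł": "L", "ø": "o", "Ø": "O"}

# One or more literal backslashes, "u", then four hex digits.
ESCAPE = re.compile(r"\\+u([0-9a-fA-F]{4})")


def strip_diacritics_and_specials(text: str) -> str:
    # 1. Decode textual escapes such as \u00f5 or \\u00f5.
    decoded = ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # 2. Replace special letters that have no decomposition.
    replaced = "".join(SPECIALS.get(ch, ch) for ch in decoded)
    # 3. Decompose and drop the combining marks (the diacritics).
    decomposed = unicodedata.normalize("NFKD", replaced)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics_and_specials("aāâäõŗłġß"))          # aaaaorlgss
print(strip_diacritics_and_specials(r"aāâä\\u00f5ŗłġß"))   # aaaaorlgss
```

Decoding the escapes first means an escaped `õ` goes through the same diacritic removal as a literal one.
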
## soundex
The `soundex` phonetic uses the Soundex algorithm, a relatively simple way of producing the same output for similar-sounding words.
Example:

- `Hendrik` → `H536`
- `Steven` → `S315`
- `Stefan` → `S315`
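
For reference, the standard American Soundex rules fit in a few lines of Python. This is a minimal sketch under the assumption that the `soundex` phonetic follows the standard variant; the function and constant names are hypothetical.

```python
# Build the letter -> digit table of standard American Soundex.
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for letter in letters:
        CODES[letter] = digit


def soundex(text: str) -> str:
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]                # the first letter is kept as-is
    last = CODES.get(letters[0], "")   # its digit still suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        code = CODES.get(ch, "")
        if code and code != last:
            result += code
        last = code  # vowels reset this, so repeats across vowels count
    return (result + "000")[:4]        # pad or truncate to four characters


print(soundex("Hendrik"))  # H536
print(soundex("Steven"))   # S315
print(soundex("Stefan"))   # S315
```

The sketch reproduces the example codes above. Because only the first letter and three digits are kept, longer names that differ late in the word can share a code.
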