It is my philosophy that word lists for the purpose of translation
should be maintained in the simplest possible format, offloading
the complexity to search and translation tools.
The goal is to let users modify the database easily
and develop code quickly.
Complexity Must Add Information
Desirable traits in a "simple" word list are those which add
information at the expense of complexity.
An example of a simple word list would be a file containing single
line entries, where each entry contains a word and its translation
with a separating character in between.
For example, the word "fahren" and its translation "to drive"
could be on a single line separated by "::" as such:
fahren :: to drive
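A minimal sketch of how little code such an entry needs, assuming Python as the tool language (the text does not name one). The function name parse_entry is hypothetical and chosen only for illustration.

```python
def parse_entry(line, separator="::"):
    """Split one word-list line into (word, translation)."""
    # partition() splits on the first occurrence of the separator only.
    word, found, translation = line.partition(separator)
    if not found:
        raise ValueError(f"missing {separator!r} in entry: {line!r}")
    return word.strip(), translation.strip()


print(parse_entry("fahren :: to drive"))  # ('fahren', 'to drive')
```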
Additional information can be added through the use of bracketed modifiers.
To denote that the word above is a verb, one would add the characters "{vt}"
for "transitive verb" to the entry.
Note that this introduces the convention of placing parts of speech
within curly brackets.
It is not important which side takes the bracketed modifier.
fahren {vt} :: to drive
Continuing the idea, irregular tenses and helping verbs can be added
in the same way, using parentheses as the modifying bracket.
If an element within the parentheses is suffixed with a semicolon,
it refers to the second- and third-person singular umlaut form, fährt.
Following that optional entry is the simple past, das Präteritum, fuhr,
followed by the past participle used in the present perfect tense,
das Perfekt, gefahren, separated by commas.
If the past participle takes an irregular helping verb, it is
placed before the past participle, ist gefahren.
fahren (fährt; fuhr, ist gefahren) {vt} :: to drive (drove, driven)
Categorical information is added with square brackets.
In this case the category [Zool.] is added for illustration.
fahren (fährt; fuhr, ist gefahren) {vt} [Zool.] :: to drive (drove, driven)
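The three bracket conventions above can be pulled apart with a few regular expressions. The following is only a sketch under stated assumptions: Python is assumed as the tool language, brackets are assumed never to nest, and the names parse_side and parse_annotated_entry are hypothetical.

```python
import re

# One pattern per bracket convention (assumes brackets do not nest).
PART_OF_SPEECH = re.compile(r"\{([^}]*)\}")   # {vt}
CATEGORY = re.compile(r"\[([^\]]*)\]")        # [Zool.]
FORMS = re.compile(r"\(([^)]*)\)")            # (fährt; fuhr, ist gefahren)


def parse_side(text):
    """Return the bare word and the bracketed modifiers found on one side."""
    info = {
        "pos": PART_OF_SPEECH.findall(text),
        "categories": CATEGORY.findall(text),
        "forms": FORMS.findall(text),
    }
    # Remove every bracketed group, then collapse the leftover whitespace.
    bare = text
    for pattern in (PART_OF_SPEECH, CATEGORY, FORMS):
        bare = pattern.sub(" ", bare)
    info["word"] = " ".join(bare.split())
    return info


def parse_annotated_entry(line):
    """Split an entry on '::' and parse the modifiers on each side."""
    source, _, target = line.partition("::")
    return parse_side(source), parse_side(target)


source, target = parse_annotated_entry(
    "fahren (fährt; fuhr, ist gefahren) {vt} [Zool.] :: to drive (drove, driven)"
)
print(source["word"], source["pos"], source["categories"], source["forms"])
print(target["word"], target["forms"])
```

Because each bracket type is searched for independently, the modifiers may appear in any order and on either side of the separator.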
Additional entries are added on separate lines.
fahren (fährt; fuhr, ist gefahren) {vt} :: to drive (drove, driven)
fahren (fährt; fuhr, ist gefahren) {vt} :: to navigate
fahren (fährt; fuhr, ist gefahren) {vi} :: to ply (between)
fahren (fährt; fuhr, ist gefahren) {vt} :: to ride (rode, ridden)
ansteuern :: to drive (drove, driven)
antreiben :: to drive (drove, driven)
befahren (regelmäßig) :: to ply (between)
pendeln (zwischen) :: to ply (between)
verkehren :: to ply (between)
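With one entry per line, grouping the translations of a word is left to the search tool. A rough sketch, assuming Python and assuming the headword is the first whitespace-delimited token on the left side (an assumption that would not hold for multi-word headwords); build_index is a hypothetical name.

```python
from collections import defaultdict

WORD_LIST = """\
fahren (fährt; fuhr, ist gefahren) {vt} :: to drive (drove, driven)
fahren (fährt; fuhr, ist gefahren) {vt} :: to navigate
fahren (fährt; fuhr, ist gefahren) {vi} :: to ply (between)
fahren (fährt; fuhr, ist gefahren) {vt} :: to ride (rode, ridden)
ansteuern :: to drive (drove, driven)
pendeln (zwischen) :: to ply (between)
"""


def build_index(lines):
    """Map each headword to the list of its translations, in file order."""
    index = defaultdict(list)
    for line in lines:
        source, found, target = line.partition("::")
        if not found or not source.strip():
            continue  # skip blank or malformed lines
        headword = source.split()[0]  # assumption: single-token headword
        index[headword].append(target.strip())
    return dict(index)


index = build_index(WORD_LIST.splitlines())
for translation in index["fahren"]:
    print(translation)
```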
Although the file grows more complicated as more information is added,
it is still considered a "simple" file.
Examples will follow later that describe exactly what
a "complex" file is and the difficulties such files create for maintenance
and development.
For now I would like to introduce a few undesirable traits that
are found in simple word lists.
I define an undesirable trait as one that increases the complexity
of the file without adding information.
A perfect example of this is multiple translations for a word appearing
on a single line.
In the word lists I've seen, this is typically done with the use of the
"|" character to join multiple entries on a single line.
fahren (fährt; fuhr, ist gefahren) {vt} :: to drive (drove, driven) | to navigate | to ply (between)
Compare that with what I argue is the simpler and better format:
fahren (fährt; fuhr, ist gefahren) {vt} :: to drive (drove, driven)
fahren (fährt; fuhr, ist gefahren) {vt} :: to navigate
fahren (fährt; fuhr, ist gefahren) {vi} :: to ply (between)
Note that the same information is contained in both formats.
However, the first adds complexity (the separation of multiple entries
with the "|" character) without adding information.
The increase in complexity provides a decrease in file size.
While this is admirable, it can be better achieved by alternate
methods such as file compression.
All increases in word-list complexity must add information to that word list.
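As an illustration of leaving size reduction to compression, the sketch below round-trips a one-entry-per-line list through gzip using Python's standard library. Python and the helper names pack and unpack are assumptions made for this example; no size figures are claimed, since the savings depend on the actual word list.

```python
import gzip

ENTRIES = [
    "fahren (fährt; fuhr, ist gefahren) {vt} :: to drive (drove, driven)",
    "fahren (fährt; fuhr, ist gefahren) {vt} :: to navigate",
    "fahren (fährt; fuhr, ist gefahren) {vi} :: to ply (between)",
]


def pack(entries):
    """Join entries one per line and gzip-compress them as UTF-8."""
    text = "\n".join(entries) + "\n"
    return gzip.compress(text.encode("utf-8"))


def unpack(data):
    """Decompress and return the entries, one per original line."""
    return gzip.decompress(data).decode("utf-8").splitlines()


packed = pack(ENTRIES)
restored = unpack(packed)
print(restored == ENTRIES)  # True
```

The file on disk stays small while every tool still sees the simple format after decompression.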
If one is not careful, lumping multiple entries together
can lead to a loss of information.
Note how easy it was to miss that the intransitive verb,
"to ply (between)", was misclassified as a transitive verb, {vt}.
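The loss becomes visible when a joined line is mechanically expanded back into single-line entries. The following sketch assumes Python and uses a hypothetical helper, expand_joined; it copies the left side verbatim to every translation, since the joined line gives it nothing else to go on.

```python
def expand_joined(line, joiner="|", separator="::"):
    """Expand 'word :: a | b | c' into one 'word :: x' entry per translation."""
    source, found, targets = line.partition(separator)
    if not found:
        return [line]  # not an entry; leave it untouched
    return [
        f"{source.strip()} {separator} {target.strip()}"
        for target in targets.split(joiner)
        if target.strip()
    ]


joined = (
    "fahren (fährt; fuhr, ist gefahren) {vt} :: "
    "to drive (drove, driven) | to navigate | to ply (between)"
)
for entry in expand_joined(joined):
    print(entry)
```

Every expanded entry, including "to ply (between)", carries {vt}: the intransitive marking cannot be recovered from the joined line and would have to be corrected by hand.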
Ease of Modification
More to follow ...
Quick Code Development
More to follow ...