www.Michael-Forman.com

Word-List Philosophy

It is my philosophy that word lists intended for translation should be maintained in the simplest possible format, with the complexity offloaded to search and translation tools. The goal is to make the database easy for users to modify and easy for developers to write code against.

Complexity Must Add Information

Desirable traits in a "simple" word list are those that add information in exchange for the complexity they introduce. An example of a simple word list is a file of single-line entries, where each entry contains a word and its translation with a separator in between. For example, the word "fahren" and its translation "to drive" could be placed on a single line, separated by "::", as follows:
fahren :: to drive
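Because each entry is one line with one separator, a tool needs very little code to read it. A minimal sketch in Python, assuming the separator is exactly "::" and that whitespace around it is padding; the function name parse_line is hypothetical:

```python
def parse_line(line: str) -> tuple[str, str]:
    """Split a 'word :: translation' entry into its two sides."""
    # partition() splits on the first "::" only and reports whether it was found.
    left, sep, right = line.partition("::")
    if not sep:
        raise ValueError(f"no '::' separator in {line!r}")
    # Strip the padding spaces around the separator and any trailing newline.
    return left.strip(), right.strip()


print(parse_line("fahren :: to drive"))  # ('fahren', 'to drive')
```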
Additional information can be added through bracketed modifiers. To mark the words above as verbs, one would add the characters "{vt}", for "transitive verb", to the entry. Note that this introduces the convention of placing parts of speech within curly brackets. It does not matter which side of the separator carries the bracketed modifier.
fahren {vt} :: to drive
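Since either side may carry the modifier, a tool can apply the same extraction to both sides. A minimal sketch, assuming a part-of-speech tag is any text between curly brackets; the names POS_TAG and parts_of_speech are hypothetical:

```python
import re

# Assumed convention: a part-of-speech tag is any text in curly brackets.
POS_TAG = re.compile(r"\{([^{}]*)\}")


def parts_of_speech(side: str) -> tuple[str, list[str]]:
    """Return one side of an entry with its {...} tags removed, plus the tags."""
    tags = POS_TAG.findall(side)
    # Remove the tags and collapse the whitespace they leave behind.
    bare = " ".join(POS_TAG.sub("", side).split())
    return bare, tags


left, right = "fahren {vt} :: to drive".split("::", 1)
print(parts_of_speech(left))   # ('fahren', ['vt'])
print(parts_of_speech(right))  # ('to drive', [])
```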
Continuing the idea, irregular forms and helping verbs can be added in the same way, using parentheses as the modifying bracket. If the first element within the parentheses is followed by a semicolon, it gives the umlauted present-tense form used in the second- and third-person singular, here the third person, fährt. After that optional element comes the simple past (das Präteritum), fuhr, followed by the past participle used to form the present perfect (das Perfekt), gefahren; the two are separated by a comma. If the past participle takes an irregular helping verb (sein rather than the usual haben), the helping verb is placed before the participle: ist gefahren.
fahren (fährt; fuhr, ist gefahren) {vt} :: to drive (drove, driven)
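The parenthesized group can be decoded with a few string splits. A minimal sketch, assuming the group has already been cut out of the line (parentheses can also hold plain qualifiers such as "(zwischen)"), and assuming that a two-word participle begins with its helping verb; the names Forms and parse_forms are hypothetical. The same function also handles the English side, which has no semicolon:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Forms:
    present: Optional[str]    # umlauted 2nd/3rd-person singular, e.g. "fährt"
    past: str                 # simple past (Präteritum), e.g. "fuhr"
    participle: str           # past participle, e.g. "gefahren"
    auxiliary: Optional[str]  # helping verb written before the participle, e.g. "ist"


def parse_forms(group: str) -> Forms:
    """Parse a conjugation group such as '(fährt; fuhr, ist gefahren)'."""
    body = group.strip().removeprefix("(").removesuffix(")")
    present = None
    if ";" in body:
        # Anything before the semicolon is the optional present-tense form.
        present, body = [part.strip() for part in body.split(";", 1)]
    # The past and the participle are separated by a comma.
    past, participle = [part.strip() for part in body.split(",", 1)]
    auxiliary = None
    words = participle.split()
    if len(words) == 2:
        # Two words: the first is taken to be the helping verb ("ist gefahren").
        auxiliary, participle = words
    return Forms(present, past, participle, auxiliary)


print(parse_forms("(fährt; fuhr, ist gefahren)"))
# Forms(present='fährt', past='fuhr', participle='gefahren', auxiliary='ist')
print(parse_forms("(drove, driven)"))
# Forms(present=None, past='drove', participle='driven', auxiliary=None)
```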
Categorical information, such as a subject area, is added with square brackets. Here the category [Zool.] is added purely for illustration.
fahren (fährt; fuhr, ist gefahren) {vt} [Zool.] :: to drive (drove, driven)
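Because each bracket type has a single meaning, a search tool can filter on categories without the file needing any further structure. A minimal sketch, assuming a category is any text between square brackets; the names CATEGORY and categories are hypothetical:

```python
import re

# Assumed convention: a category is any text in square brackets, e.g. "[Zool.]".
CATEGORY = re.compile(r"\[([^\[\]]*)\]")


def categories(line: str) -> list[str]:
    """Return every [...] category found anywhere in an entry."""
    return CATEGORY.findall(line)


entries = [
    "fahren (fährt; fuhr, ist gefahren) {vt} [Zool.] :: to drive (drove, driven)",
    "fahren (fährt; fuhr, ist gefahren) {vt} :: to navigate",
]
# Keep only the entries tagged with the zoology category.
zoology = [e for e in entries if "Zool." in categories(e)]
print(len(zoology))  # 1
```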
Additional entries are added on separate lines.
fahren (fährt; fuhr, ist gefahren) {vt} :: to drive (drove, driven)
fahren (fährt; fuhr, ist gefahren) {vt} :: to navigate
fahren (fährt; fuhr, ist gefahren) {vi} :: to ply (between)
fahren (fährt; fuhr, ist gefahren) {vt} :: to ride (rode, ridden)
ansteuern                               :: to drive (drove, driven)
antreiben                               :: to drive (drove, driven)
befahren (regelmäßig)                   :: to ply (between)
pendeln (zwischen)                      :: to ply (between)
verkehren                               :: to ply (between)
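With one translation per line, building a lookup table that works in both directions is a single loop over the file. A minimal sketch using the lines above, assuming the headword is the text before the first bracket of any kind; the names headword and build_index are hypothetical, and a real tool would likely keep the full entry rather than only the headwords:

```python
import re
from collections import defaultdict

WORD_LIST = """\
fahren (fährt; fuhr, ist gefahren) {vt} :: to drive (drove, driven)
fahren (fährt; fuhr, ist gefahren) {vt} :: to navigate
fahren (fährt; fuhr, ist gefahren) {vi} :: to ply (between)
fahren (fährt; fuhr, ist gefahren) {vt} :: to ride (rode, ridden)
ansteuern                               :: to drive (drove, driven)
antreiben                               :: to drive (drove, driven)
befahren (regelmäßig)                   :: to ply (between)
pendeln (zwischen)                      :: to ply (between)
verkehren                               :: to ply (between)
"""


def headword(side: str) -> str:
    """Text before the first (, { or [ modifier, e.g. 'fahren' or 'to ply'."""
    return re.split(r"[({\[]", side, maxsplit=1)[0].strip()


def build_index(text: str) -> dict[str, list[str]]:
    """Map each headword, from either side, to the headwords it translates to."""
    index: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        if "::" not in line:
            continue  # skip blank or malformed lines
        german, english = (headword(s) for s in line.split("::", 1))
        index[german].append(english)
        index[english].append(german)
    return index


index = build_index(WORD_LIST)
print(index["fahren"])  # ['to drive', 'to navigate', 'to ply', 'to ride']
print(index["to ply"])  # ['fahren', 'befahren', 'pendeln', 'verkehren']
```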
Although the file grows more complicated as information is added, it is still considered a "simple" file. Later examples will describe exactly what makes a file "complex" and the difficulties such files create for maintenance and development.

For now, I would like to introduce a few undesirable traits found in simple word lists. I define an undesirable trait as one that increases the complexity of the file without adding information. A perfect example is placing multiple translations of a word on a single line. In the word lists I have seen, this is typically done by joining the entries with the "|" character.
fahren (fährt; fuhr, ist gefahren) {vt} :: to drive (drove, driven) | to navigate | to ply (between)
Compare that with what I argue is the simpler and better format:
fahren (fährt; fuhr, ist gefahren) {vt} :: to drive (drove, driven)
fahren (fährt; fuhr, ist gefahren) {vt} :: to navigate
fahren (fährt; fuhr, ist gefahren) {vi} :: to ply (between)
Note that the same information is contained in both formats. However, the first adds complexity (the separation of multiple entries with the "|" character) without adding information. The added complexity does reduce the file size, and while that is a worthwhile goal, it is better achieved by other methods, such as file compression. Any increase in a word list's complexity must add information to that word list.
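Repeated text such as the "fahren (fährt; fuhr, ist gefahren)" prefix is exactly what DEFLATE-based compressors like gzip are designed to exploit, so the one-entry-per-line file can be stored compressed while tools still read plain lines. A minimal sketch using Python's standard gzip module; the file name wordlist.txt.gz and the helper names are hypothetical, and no particular compression ratio is claimed:

```python
import gzip
import os
import tempfile

ENTRIES = [
    "fahren (fährt; fuhr, ist gefahren) {vt} :: to drive (drove, driven)",
    "fahren (fährt; fuhr, ist gefahren) {vt} :: to navigate",
    "fahren (fährt; fuhr, ist gefahren) {vi} :: to ply (between)",
]


def write_compressed(path: str, lines: list[str]) -> None:
    """Store the one-entry-per-line list gzip-compressed, as UTF-8 text."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


def read_compressed(path: str) -> list[str]:
    """Read the list back; decompression is invisible to the caller."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "wordlist.txt.gz")  # hypothetical file name
    write_compressed(path, ENTRIES)
    restored = read_compressed(path)
    print(restored == ENTRIES)  # True
```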

If one is not careful, lumping multiple entries together can even lead to a loss of information. Note how easy it was to miss that the intransitive verb "to ply (between)" was misclassified as a transitive verb, {vt}.
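Mechanically converting the "|" format into one entry per line shows why the lost information cannot be recovered by a tool: the single {vt} tag is simply copied onto every translation. A minimal sketch, assuming, as in the lumped example above, that only the right-hand side is joined with "|"; the function name expand_pipes is hypothetical:

```python
def expand_pipes(line: str) -> list[str]:
    """Split 'word :: a | b | c' into one 'word :: x' line per translation.

    Assumes only the right-hand side is joined with '|'; the left-hand
    side, modifiers included, is repeated unchanged on every line.
    """
    left, sep, right = line.partition("::")
    if not sep:
        raise ValueError(f"no '::' separator in {line!r}")
    left = left.strip()
    return [f"{left} :: {t.strip()}" for t in right.split("|")]


lumped = ("fahren (fährt; fuhr, ist gefahren) {vt} :: "
          "to drive (drove, driven) | to navigate | to ply (between)")
for entry in expand_pipes(lumped):
    print(entry)
# fahren (fährt; fuhr, ist gefahren) {vt} :: to drive (drove, driven)
# fahren (fährt; fuhr, ist gefahren) {vt} :: to navigate
# fahren (fährt; fuhr, ist gefahren) {vt} :: to ply (between)
```

The last line still carries {vt}; only a person who knows that "to ply (between)" is intransitive can correct it to {vi}.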

Ease of Modification

More to follow ...

Quick Code Development

More to follow ...


Copyright © 2008 Michael Forman