www.Michael-Forman.com

Online German-English Dictionary

Summary: The Lexica search form looks up German and English words using a local copy of the open-source dictionary from the Technische Universität Chemnitz. It differs from most online German-English dictionaries in that it aims to keep output concise and to provide grammatical searches useful to students of German or English.

Dictionary Instructions

Smart search looks up words in the dictionary using a series of matching algorithms, each more general than the last. If an algorithm finds no match, it passes the query on to the next one; if it finds a match, the results are returned and the search stops. This keeps the output concise by avoiding unnecessary matches.
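The cascade described above can be sketched in Python. The four matchers used here (exact, case-insensitive, whole-word, substring) and their order are assumptions for illustration; this page does not list the algorithms lexica actually uses. Each matcher is tried in turn, and the search stops at the first one that returns anything.

```python
import re
from typing import Callable, Optional

# One dictionary entry as a (German, English) pair.
Entry = tuple[str, str]
Matcher = Callable[[str, str], bool]


def exact(query: str, text: str) -> bool:
    return text == query


def case_insensitive(query: str, text: str) -> bool:
    return text.casefold() == query.casefold()


def whole_word(query: str, text: str) -> bool:
    # The query must occur as a complete word somewhere in the entry.
    return re.search(rf"\b{re.escape(query)}\b", text, re.IGNORECASE) is not None


def substring(query: str, text: str) -> bool:
    return query.casefold() in text.casefold()


# Hypothetical matchers, ordered from most specific to most general.
CASCADE: list[Matcher] = [exact, case_insensitive, whole_word, substring]


def smart_search(query: str, entries: list[Entry], side: int = 0) -> tuple[Optional[str], list[Entry]]:
    """Return (name of the first matcher that succeeded, its matches).

    side=0 searches the German column, side=1 the English column.
    """
    for matcher in CASCADE:
        hits = [entry for entry in entries if matcher(query, entry[side])]
        if hits:
            return matcher.__name__, hits  # stop at the first matcher with results
    return None, []


ENTRIES: list[Entry] = [
    ("Haus", "house"),
    ("Hausaufgabe", "homework"),
    ("Krankenhaus", "hospital"),
    ("nach Hause gehen", "to go home"),
]

print(smart_search("Haus", ENTRIES))     # ('exact', [('Haus', 'house')])
print(smart_search("aufgabe", ENTRIES))  # ('substring', [('Hausaufgabe', 'homework')])
```

A search for "Haus" stops after the exact match, so compounds such as "Krankenhaus" are not listed; only a fragment like "aufgabe" falls through to the broad substring match.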

Advanced search provides traditional control over search methods and output formatting. The matching algorithm, language, database, word type, and output mode can be modified to suit the user's needs.
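The advanced options can be pictured as a single query object. The following Python sketch is hypothetical: the class, option names, and values are invented for illustration, the database filenames are taken from the file table on this page, and output mode is left out. It shows how a word-type choice could select one of the split word lists and how the language option could pick which side of the `::` separator is matched.

```python
from dataclasses import dataclass

# Word-type databases, named after the downloadable files listed on this page.
WORD_TYPE_FILES = {
    "word": "de-en.word.txt",
    "phrase": "de-en.phra.txt",
    "noun": "de-en.noun.txt",
    "adjective": "de-en.adjv.txt",
    "verb": "de-en.verb.txt",
    "abbreviation": "de-en.abbr.txt",
}


@dataclass
class AdvancedQuery:
    """Hypothetical bundle of advanced-search options (output mode omitted)."""
    word: str
    language: str = "de"          # "de": match the German side, "en": the English side
    word_type: str = "word"       # key into WORD_TYPE_FILES
    algorithm: str = "substring"  # "exact" or "substring"

    def database(self) -> str:
        return WORD_TYPE_FILES[self.word_type]

    def matches(self, line: str) -> bool:
        german, _, english = line.partition("::")
        text = (german if self.language == "de" else english).strip().casefold()
        needle = self.word.casefold()
        if self.algorithm == "exact":
            return text == needle
        return needle in text


query = AdvancedQuery("house", language="en", word_type="noun")
print(query.database(), query.matches("Haus :: house"))  # de-en.noun.txt True
```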

Phonetic Transcription

The CMU Pronouncing Dictionary has been integrated into the search engine powering the Lexica Online German-English Dictionary. Phonetic pronunciations for English words are provided in IPA, SAMPA, or CMU format. The online English Phonetic Transcription page provides this same functionality without the dictionary interface and adds phonetic output in HTML and LaTeX formats for those who are interested.

[ðʌ si em ju prʌnaʊˈnsɪŋ dɪˈkʃʌneˌri hæˈz bɪˈn ɪˈntʌgreɪˌtʌd ɪˌntuˈ ðʌ ðʌ sɜːˈtʃ eˈndʒʌn paʊˈɜːɪŋ ðʌ ɔːˈnlaɪˌn dʒɜːˈmʌn ɪˈŋglɪˌʃ dɪˈkʃʌneˌri. fʌneˈtɪk prəʊnʌˌnsieɪˈʃʌnz fɔːˈr ɪˈŋglɪˌʃ wɜːˈdz ɒˈr prʌvaɪˈdʌd ʌn aɪˈ pi eɪˈ, sɒˈmpɒˈ, ɔːˈr ei em ju fɔːˈrmæˌt. ðʌ ɔːnlaɪn ɪŋglɪʃ fʌnetɪk trænskrɪpʃʌn peɪdʒ prʌvaɪdz ðɪs seɪm fʌŋkʃʌnælʌti wɪθaʊt ðʌ dɪkʃʌneri ɪntɜːfeɪs ænd ædz fʌnetɪk aʊtpʊt ʌn HTML ænd leɪteks fɔːrmæts fɔːr ðəʊz hu ɒr ɪntrʌstʌd. ]

The IPA phonetic transcriptions require the Lucida Sans Unicode TrueType font and a browser that supports Unicode.
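The CMU Pronouncing Dictionary lists each word followed by its ARPAbet phones, with a stress digit (0 unstressed, 1 primary, 2 secondary) on every vowel. A minimal Python sketch of the ARPAbet-to-IPA step follows. The symbol table is an assumption chosen to reproduce the sample transcription above (for example AH → ʌ, ER → ɜː, AA → ɒ); OY → ɔɪ does not occur in the sample and is a guess. Like the sample, the sketch appends the stress mark after the stressed vowel, whereas standard IPA places it before the stressed syllable. SAMPA output is not covered.

```python
import re

# ARPAbet symbols (as used by the CMU Pronouncing Dictionary) mapped to IPA.
# Vowel values are an assumption chosen to match the sample transcription.
ARPABET_TO_IPA = {
    "AA": "ɒ", "AE": "æ", "AH": "ʌ", "AO": "ɔː", "AW": "aʊ", "AY": "aɪ",
    "EH": "e", "ER": "ɜː", "EY": "eɪ", "IH": "ɪ", "IY": "i", "OW": "əʊ",
    "OY": "ɔɪ", "UH": "ʊ", "UW": "u",
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "F": "f", "G": "g",
    "HH": "h", "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n",
    "NG": "ŋ", "P": "p", "R": "r", "S": "s", "SH": "ʃ", "T": "t",
    "TH": "θ", "V": "v", "W": "w", "Y": "j", "Z": "z", "ZH": "ʒ",
}
# Stress digits on vowels: 0 = unstressed, 1 = primary, 2 = secondary.
STRESS_MARKS = {"0": "", "1": "ˈ", "2": "ˌ"}
PHONE_RE = re.compile(r"([A-Z]+)([012]?)")


def cmu_to_ipa(entry: str) -> tuple[str, str]:
    """Convert a line such as 'HAS  HH AE1 Z' to ('has', 'hæˈz')."""
    word, *phones = entry.split()
    ipa = []
    for phone in phones:
        match = PHONE_RE.fullmatch(phone)
        if match is None or match.group(1) not in ARPABET_TO_IPA:
            raise ValueError(f"unexpected phone: {phone!r}")
        symbol, stress = match.groups()
        # As in the sample, the stress mark follows the stressed vowel.
        ipa.append(ARPABET_TO_IPA[symbol] + STRESS_MARKS.get(stress, ""))
    return word.lower(), "".join(ipa)


print(cmu_to_ipa("DICTIONARY  D IH1 K SH AH0 N EH2 R IY0"))  # ('dictionary', 'dɪˈkʃʌneˌri')
```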

Dictionary History

The word list used in the online dictionary has its roots in one of the first dictionaries on the Internet. Maintained from its modest beginnings by Frank Richter, it has grown significantly over the last decade and is also used in the online dictionary at the Technische Universität Chemnitz. Frank Richter also maintains das Ding, a Linux German-English dictionary program that uses the same word list as the TU Chemnitz dictionary.

It's important to note that, while other websites eventually restricted access to their collaborative translation dictionaries, this word list has remained freely available: the dictionary is distributed freely, das Ding is released under the GPL, and individual contributors are still listed. I say "still" because my last contribution was in 1994, when I wrote Perl scripts to spell-check, alphabetize, and remove duplicate entries from the dictionary. At that time I also changed the format from one in which entries spanned multiple lines separated by the character "-" to the current one, in which both languages appear on the same line separated by the characters "::", a feature that, for better or worse, persists to this day.
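The original 1994 scripts were written in Perl and are not reproduced here. As a rough Python sketch of two of the tasks mentioned, alphabetizing and removing duplicate entries, for lines in the current "German :: English" format (the whitespace normalization and the sort key are assumptions):

```python
def normalize_word_list(lines: list[str]) -> list[str]:
    """Alphabetize 'German :: English' entries and drop exact duplicates."""
    seen: set[str] = set()
    entries: list[str] = []
    for raw in lines:
        line = " ".join(raw.split())  # collapse runs of whitespace
        german, sep, english = line.partition("::")
        if not sep:
            continue  # skip blank lines and lines without the separator
        entry = f"{german.strip()} :: {english.strip()}"
        if entry not in seen:
            seen.add(entry)
            entries.append(entry)
    # Case-insensitive order; the raw string breaks ties deterministically.
    return sorted(entries, key=lambda e: (e.casefold(), e))


print(normalize_word_list(["Maus :: mouse", "Haus  ::  house", "Haus :: house", "", "Apfel :: apple"]))
# ['Apfel :: apple', 'Haus :: house', 'Maus :: mouse']
```

Sorting by `casefold()` compares code points, so words beginning with an umlaut such as "Ä" sort after "z" rather than alongside "a"; a German-aware order would need a locale-based key such as `locale.strxfrm` under a German locale.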

Dictionary Resources

This dictionary is written in HTML with interpreted in-line Perl and JavaScript, backed by a Perl script called lexica. Lexica is a command-line dictionary that uses grep to perform an initial broad search of a dictionary text file, followed by a second, narrower search implemented with Perl regular expressions. To reduce unnecessary output and provide more accurate search results, lexica employs several search algorithms of increasing complexity to find the best translation. I will make lexica available shortly after I add textual output and finish the final search algorithm.
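The two-stage idea can be sketched in Python rather than Perl. Stage 1 stands in for grep (roughly `grep -i -F`, a case-insensitive fixed-string filter) and stage 2 applies a stricter regular expression to the much smaller candidate set. The whole-word rule used in stage 2 is an assumption for illustration; lexica's actual regular expressions are not described on this page.

```python
import re


def two_stage_search(query: str, lines: list[str]) -> list[str]:
    # Stage 1: cheap, broad filter, roughly what `grep -i -F query file` does.
    needle = query.lower()
    candidates = [line for line in lines if needle in line.lower()]

    # Stage 2: stricter regular expression run only on the surviving candidates.
    # Here the query must stand alone as a word (an assumed rule); lookarounds
    # are used instead of \b so queries starting with punctuation still work.
    pattern = re.compile(rf"(?<!\w){re.escape(query)}(?!\w)", re.IGNORECASE)
    return [line for line in candidates if pattern.search(line)]


LINES = [
    "Haus :: house",
    "Krankenhaus :: hospital",
    "nach Hause gehen :: to go home",
]
print(two_stage_search("haus", LINES))  # ['Haus :: house']
```

All three lines pass the broad first stage, but only "Haus :: house" survives the second, which is the point of splitting the work: the expensive pattern runs on a handful of candidates instead of the whole file.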

The best place to get the complete word list is the das Ding source distribution. However, I thought it would be useful to offer my simply formatted files for download. I modified the original word list to conform to my philosophy concerning word lists: keep the word list uniform and simple, and offload complexity to the dictionary program. With that in mind, single lines that contained multiple translations have been split into multiple lines.
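This page does not show the original line syntax. Assuming a Ding-style convention in which `|` separates parallel sub-entries whose German and English halves correspond position by position, the splitting step might look like the following Python sketch. `;`-separated alternatives within a sub-entry are left intact, and lines whose two sides do not line up are kept unchanged.

```python
def split_entry(line: str) -> list[str]:
    """Split a line whose '|'-separated sub-entries correspond pairwise
    (an assumed Ding-style convention) into one 'German :: English' line each."""
    german, sep, english = line.partition("::")
    if not sep:
        return [line.strip()]  # not a dictionary entry; leave it alone
    de_parts = [part.strip() for part in german.split("|")]
    en_parts = [part.strip() for part in english.split("|")]
    if len(de_parts) != len(en_parts):
        # The two sides do not line up, so keep the line rather than guess.
        return [line.strip()]
    return [f"{de} :: {en}" for de, en in zip(de_parts, en_parts)]


print(split_entry("Abend {m} | Abende {pl} :: evening | evenings"))
# ['Abend {m} :: evening', 'Abende {pl} :: evenings']
```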

I also found that searching a word list that included phrases produced excessive output, especially for common words. To solve this problem, I split the word list into two files, one containing phrases and the other words. To help identify errors and redundancies, I further split the list of words into several word lists based on word function (adjective, verb, noun, etc.). With these simplified, separate databases, the search engine can target a search to a specific category.
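A Python sketch of how entries might be sorted into category files follows. It assumes Ding-like part-of-speech tags in braces on the German side (for example `{n}`, `{vi}`, `{adj}`) and treats untagged multi-word entries as phrases; both the tag sets and the phrase rule are assumptions, since this page does not describe how the split was actually made. The category codes mirror the file names in the table (`de-en.noun.txt`, `de-en.phra.txt`, and so on); "other" has no corresponding file.

```python
import re
from collections import defaultdict

# Hypothetical tag sets in a Ding-like {tag} notation; the real markup may differ.
NOUN_TAGS = {"m", "f", "n", "pl"}
VERB_TAGS = {"vt", "vi", "vr"}
ADJV_TAGS = {"adj", "adv"}
TAG_RE = re.compile(r"\{([^}]*)\}")


def categorize(line: str) -> str:
    """Return 'noun', 'verb', 'adjv', 'phra' or 'other' for one entry line."""
    german = line.split("::", 1)[0]
    tags = set(TAG_RE.findall(german))
    words = TAG_RE.sub(" ", german).split()
    if tags & NOUN_TAGS:
        return "noun"
    if tags & VERB_TAGS:
        return "verb"
    if tags & ADJV_TAGS:
        return "adjv"
    if len(words) > 1:
        return "phra"  # untagged multi-word entry treated as a phrase
    return "other"


def split_by_category(lines: list[str]) -> dict[str, list[str]]:
    """Group entry lines by category; lines without '::' are skipped."""
    buckets = defaultdict(list)
    for line in lines:
        if "::" in line:
            buckets[categorize(line)].append(line)
    return dict(buckets)


for category, entries in split_by_category(["Haus {n} :: house", "nach Hause gehen :: to go home"]).items():
    print(f"de-en.{category}.txt", entries)
```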

 Lines    Words     Chars  Filename         Description
116537   722715   4880876  de-en.words.txt  Complete word list
154201   670020   4629607  de-en.word.txt   Words
 21543   156340    920213  de-en.phra.txt   Phrases
 83544   380032   2631626  de-en.noun.txt   Nouns
 45846   182698   1337023  de-en.adjv.txt   Adjectives and Adverbs
 13877    63133    369080  de-en.verb.txt   Verbs
  1142    10095     70018  de-en.abbr.txt   Abbreviations
436690  2185033  14838443  total

$Id: dictionary.html,v 2.1 2003/06/22 06:31:13 forman Exp forman
$Id: lexica,v 1.4 2003/06/27 04:50:50 forman Exp forman

Copyright © 2008 Michael Forman