Diglossa.org and Anthrax.js, beta version
Diglossa.org is a set of utilities for slow reading of multilingual texts. Ἅνθραξ (ulcer) is a morphological analyzer of ancient Greek. It contains no linguistics in the sense of a text-to-meaning model, only honest string processing.
A word form is represented in the program as:
1. prefix (or one or two leading vowels),
2. stem,
3. suffix,
4. ending.

Prefixes or leading vowels are identified not only in verbs, but also in nominals and any other inflected forms.
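The four-part representation can be pictured as a small record. The following is a minimal sketch, not the actual Anthrax.js data model: the names `makeScheme` and `formatScheme` and the field names are hypothetical, and keeping several prefix parts (including a connecting vowel) in one array is an assumption made for illustration.

```javascript
// Hypothetical record for one analysis scheme. Field names are illustrative
// and are not taken from Anthrax.js.
function makeScheme({ prefixes = [], stem, suffix = '', ending = '' }) {
  return { prefixes, stem, suffix, ending };
}

// Join the non-empty parts with hyphens, the way schemes are printed
// in the examples of this document.
function formatScheme(scheme) {
  return [...scheme.prefixes, scheme.stem, scheme.suffix, scheme.ending]
    .filter((part) => part.length > 0)
    .join('-');
}

// Assumption: the connecting vowel is stored as its own prefix part.
const scheme = makeScheme({
  prefixes: ['προ', 'κατ', 'α'],
  stem: 'λαμβαν',
  ending: 'ω',
});

console.log(formatScheme(scheme)); // προ-κατ-α-λαμβαν-ω
```

Empty parts are simply skipped when printing, so a word with no prefix or no suffix needs no special case.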
Nests, that is, groups of words that share a stem, are computed across all dictionaries. The dictionary databases are built from this dictionary of nests, so the dictionaries are organized by stems rather than by dictionary forms.
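As a rough illustration of organizing entries by stem, the sketch below groups a few dictionary words into nests. It is an assumption-laden toy: the stems are supplied by hand, the function name `buildNests` is hypothetical, and how Anthrax actually derives stems or stores its databases is not shown here.

```javascript
// Toy entries: each dictionary word is paired with a stem.
// The stems are given by hand; deriving them is outside this sketch.
const entries = [
  { word: 'λαμβάνω', stem: 'λαμβ' },
  { word: 'καταλαμβάνω', stem: 'λαμβ' },
  { word: 'προκαταλαμβάνω', stem: 'λαμβ' },
  { word: 'γράφω', stem: 'γραφ' },
  { word: 'ὑπογράφω', stem: 'γραφ' },
];

// Group words by stem: the key is the stem, the value is the nest.
function buildNests(items) {
  const nests = new Map();
  for (const { word, stem } of items) {
    if (!nests.has(stem)) nests.set(stem, []);
    nests.get(stem).push(word);
  }
  return nests;
}

const nests = buildNests(entries);
console.log(nests.get('λαμβ').length); // 3
console.log(nests.get('γραφ').length); // 2
```

A lookup by stem then returns every related dictionary word at once, whatever its prefixes.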
Thanks to this, Anthrax can compute the scheme of a word even if it is not in any dictionary, provided it consists of known prefixes, stems, and matching endings. It can also find related words with the same stem and, possibly, other prefixes, endings, etc.
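To make the idea concrete, here is a minimal sketch of splitting a form into known parts by plain string matching. It is not the Anthrax.js algorithm: the inventories `PREFIXES`, `CONNECTORS`, `STEMS`, and `ENDINGS` are tiny hand-picked lists, the helper names `plain` and `analyze` are hypothetical, and suffixes are left out for brevity.

```javascript
// Toy inventories, hand-picked for this sketch (not Anthrax data).
const PREFIXES = ['προσ', 'προ', 'κατ', 'υπ'];
const CONNECTORS = ['α', 'ο']; // assumed connecting vowels after a prefix
const STEMS = ['λαμβαν', 'γραφ'];
const ENDINGS = ['ω', 'ει', 'ομεν'];

// Strip accents and breathings so matching works on bare letters.
function plain(word) {
  return word.normalize('NFD').replace(/\p{M}/gu, '').toLowerCase();
}

// Return every reading of `word` as (prefix [vowel])* + stem + ending.
function analyze(word) {
  const results = [];
  const walk = (rest, parts) => {
    // Try to finish the word: a known stem followed by a known ending.
    for (const stem of STEMS) {
      if (!rest.startsWith(stem)) continue;
      const ending = rest.slice(stem.length);
      if (ENDINGS.includes(ending)) {
        results.push([...parts, stem, ending].join('-'));
      }
    }
    // Or peel off one more prefix, optionally followed by a connecting vowel.
    for (const prefix of PREFIXES) {
      if (!rest.startsWith(prefix)) continue;
      const after = rest.slice(prefix.length);
      walk(after, [...parts, prefix]);
      for (const vowel of CONNECTORS) {
        if (after.startsWith(vowel)) {
          walk(after.slice(vowel.length), [...parts, prefix, vowel]);
        }
      }
    }
  };
  walk(plain(word), []);
  return results;
}

// An inflected form that is not listed as a dictionary word in this document.
console.log(analyze('προσυπογράφομεν')); // one scheme: προσ-υπ-ο-γραφ-ομεν
```

Because the search only needs the parts to be known, the whole word never has to appear in a dictionary. A form built from unknown parts simply yields an empty list.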
The technology is new and complex, and it changes quickly, both as new dictionaries are added and in its fundamentals. This is a beta version, so bugs and errors are possible.
The word representation schemes may look unusual, but they are mathematically equivalent to the familiar grammatical schemes, and the analysis results match the Wiktionary data to within 1 percent. All word forms currently available in Wiktionary are used as tests.
For example, for the word προκαταλαμβάνω we get three schemes with the stem λαμβ:
- - προκαταλαμβάνω: προακαταλαμβαν-ω
- - καταλαμβάνω: προ-καταλαμβαν-ω
- - λαμβάνω: προ-κατ-α-λαμβαν-ω
and for the word προσυπογράφω only two, with the stem γραφ:
- - ὑπογράφω: προσ-υπογραφ-ω
- - γράφω: προσ-υπ-ο-γραφ-ω
because the full word προσυπογράφω was not in the original dictionaries when the nest dictionary was generated.
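The difference between three schemes and two can be reproduced with a small sketch in which one scheme is reported per matching dictionary word in the nest. This is a simplification under stated assumptions: the nests are written out by hand from the examples above, the names `plain` and `matchingNestWords` are hypothetical, and suffix matching on accent-free strings stands in for whatever comparison Anthrax really performs.

```javascript
// Strip accents and breathings so comparison works on bare letters.
function plain(word) {
  return word.normalize('NFD').replace(/\p{M}/gu, '').toLowerCase();
}

// Hand-written nests, taken from the examples in this document.
const nestLamb = ['λαμβάνω', 'καταλαμβάνω', 'προκαταλαμβάνω'];
const nestGraph = ['γράφω', 'ὑπογράφω']; // προσυπογράφω itself is absent

// Assumption: a nest word yields a scheme when it is a tail of the query form.
function matchingNestWords(form, nest) {
  const target = plain(form);
  return nest.filter((word) => target.endsWith(plain(word)));
}

console.log(matchingNestWords('προκαταλαμβάνω', nestLamb).length); // 3
console.log(matchingNestWords('προσυπογράφω', nestGraph).length); // 2
```

Adding προσυπογράφω to the second nest would produce a third match, which mirrors the explanation given for the missing scheme.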