Implementation

Grammar

Grammar is a very complex topic in linguistics, and it is hard to represent structurally. This design therefore likely does not cover all grammatical constructions. It may be rather Eurocentric, and it probably does not cover many languages whose grammar I'm not familiar with, such as:

  • Korean
  • Arabic (all dialects)
  • Swahili
  • Nahuatl
  • Lojban
  • Sign languages

I would be happy to extend the system (either myself or by merging contributions) so that it can represent those languages once the project is stable enough.

Inflection

Inflections, at least in the majority of Indo-European languages, occur as prefixes or suffixes. We should not exclude the possibility of other types of inflection:

  • Circumfix: haben -- gehabt (German)
  • Simulfix: goose -- geese (English, also known as umlaut or ablaut)
  • Separable verb (German trennbares Verb): einschlafen -- Ich schlafe ein.
  • Infix: No example found yet
  • Reduplication

I propose two formats to store inflection rules:

  • C-style string format, e.g. %Sen signifies that the stem is followed by en.
    • Example: the transformation %Sen --> %St turns haben into habt and liegen into liegt. However, it also turns senden into the incorrect *sendt.
  • RegEx, e.g. oo matches the first substring oo in the word, which is then transformed.
    • Example: the transformation oo --> ee turns foot into feet and tooth into teeth. However, it also turns book into the incorrect *beek.
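
To make the two proposed formats concrete, here is a minimal Python sketch of how such rules might be applied. It is not part of the design itself: the function names `apply_stem_rule` and `apply_regex_rule` are hypothetical, and the sketch assumes that `%S` stands for a non-empty stem and that a RegEx rule rewrites only the first match.

```python
import re
from typing import Optional


def apply_stem_rule(word: str, source: str, target: str) -> Optional[str]:
    """Apply a %S-style rule such as '%Sen' --> '%St'.

    Hypothetical helper: returns None when the word does not fit the
    source template. Assumes %S stands for a non-empty stem.
    """
    # Escape the literal parts, then turn the %S placeholder into a capture group.
    pattern = re.escape(source).replace(re.escape("%S"), "(.+)")
    match = re.fullmatch(pattern, word)
    if match is None:
        return None
    # Substitute the captured stem into the target template.
    return target.replace("%S", match.group(1))


def apply_regex_rule(word: str, pattern: str, replacement: str) -> str:
    """Apply a RegEx rule such as 'oo' --> 'ee' to the first match only."""
    return re.sub(pattern, replacement, word, count=1)


if __name__ == "__main__":
    for verb in ("haben", "liegen", "senden"):
        print(verb, "->", apply_stem_rule(verb, "%Sen", "%St"))
    for noun in ("foot", "tooth", "book"):
        print(noun, "->", apply_regex_rule(noun, "oo", "ee"))
```

Both helpers reproduce the overgeneration noted in the examples (*sendt, *beek), which suggests that either format would need some way to restrict a rule to the words it actually applies to. How that restriction is stored is left open here.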