# Database

To persist the information specified in [Use Cases](uc.md), we need to design
an appropriate database. This chapter describes the structure of each kind of
information.

Because the application targets Android, SQLite is chosen as the database.
SQLite is a relational database with a programming interface in virtually any
modern programming language. On top of that, it is natively supported on both
Android and iOS. It is also suitable as a custom
[application file format][app-format]; in fact, Anki uses SQLite for its
flashcard deck files. SQLite supports databases of up to [281 terabytes][size],
which is more than enough for this use case.

[app-format]: https://sqlite.org/appfileformat.html
[size]: https://sqlite.org/whentouse.html

## Phonology and Phonetics

### Set of consonants, vowels, tones

The table for this will likely have a mostly fixed number of rows.

- IPA (`TEXT`): The International Phonetic Alphabet representation of the sound.
- X_SAMPA (`TEXT`): The X-SAMPA equivalent, which allows users to type on a
  non-IPA keyboard.
- is_used (`BOOLEAN`): Whether the sound is used in the language.

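A minimal sketch of this table in SQLite, run through Python's built-in `sqlite3` module. The table name `sound`, the `id` primary key, the `UNIQUE` constraint, the seed rows, and the helpers `create_db` and `used_sounds` are assumptions for illustration. SQLite has no separate Boolean storage class, so `is_used` is stored as the integer 0 or 1.

```python
import sqlite3

# Hypothetical schema: only ipa, x_sampa and is_used come from the design;
# the table name and the id column are assumptions.
SCHEMA = """
CREATE TABLE sound (
    id      INTEGER PRIMARY KEY,
    ipa     TEXT NOT NULL UNIQUE,        -- IPA symbol, e.g. 'ʃ'
    x_sampa TEXT NOT NULL,               -- X-SAMPA equivalent, e.g. 'S'
    is_used BOOLEAN NOT NULL DEFAULT 0   -- stored as integer 0/1 in SQLite
);
"""


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # The row set is mostly fixed, so it can be seeded once at creation time.
    conn.executemany(
        "INSERT INTO sound (ipa, x_sampa) VALUES (?, ?)",
        [("p", "p"), ("ʃ", "S"), ("ŋ", "N"), ("ə", "@")],
    )
    return conn


def used_sounds(conn: sqlite3.Connection) -> list:
    """Return (IPA, X-SAMPA) pairs of the sounds marked as used."""
    return conn.execute(
        "SELECT ipa, x_sampa FROM sound WHERE is_used = 1 ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    conn = create_db()
    # The user marks /ʃ/ as part of the language by its X-SAMPA form.
    conn.execute("UPDATE sound SET is_used = 1 WHERE x_sampa = ?", ("S",))
    print(used_sounds(conn))  # [('ʃ', 'S')]
```

Looking sounds up by `x_sampa` is what lets a user without an IPA keyboard toggle `is_used`.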
## Grammar

According to Chomsky, a grammar consists of:

- a vocabulary V
- a subset T of V consisting of terminal symbols, which we call the word list
- a subset N of V consisting of non-terminal symbols, which we call word
  classes (e.g. nouns, adjectives, noun phrases)
- a start symbol S, which is usually a phrase or sentence
- a finite set of productions P [^1]

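Written as a tuple, the definition above can be illustrated with a toy grammar. The symbols below are a hypothetical example, not part of the design; they only show how the word list and word classes map onto T and N, and how the productions rewrite the start symbol into a sentence.

```math
\begin{aligned}
G &= (V, T, S, P), \qquad N = V \setminus T \\
V &= \{S, \mathit{NP}, \mathit{VP}, \text{dog}, \text{runs}\} \\
T &= \{\text{dog}, \text{runs}\} && \text{(word list)} \\
N &= \{S, \mathit{NP}, \mathit{VP}\} && \text{(word classes)} \\
P &= \{S \to \mathit{NP}\ \mathit{VP},\ \mathit{NP} \to \text{dog},\ \mathit{VP} \to \text{runs}\} \\
S &\Rightarrow \mathit{NP}\ \mathit{VP} \Rightarrow \text{dog}\ \mathit{VP} \Rightarrow \text{dog runs}
\end{aligned}
```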
In this project, "grammar" refers only to the production rules.

Each production rule is characterized by the following columns:

- name (`TEXT`, unique): The human-friendly identifier for the rule.
- transformation_syntax (`INTEGER`): ID of the transformation syntax.
- before (`TEXT`): The string to be transformed.
- after (`TEXT`): The string after the transformation.
- description (`TEXT`): Rule description to be added to the document.

There are two transformation syntaxes (elaborated in the
[Implementation](implementation.md) chapter):

- RegEx
- C-style string format

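The exact semantics of both syntaxes belong to the [Implementation](implementation.md) chapter; the following is only a rough sketch under stated assumptions. We assume that for RegEx rules `before` is a pattern and `after` its replacement (passed to Python's `re.sub`), and that for C-style rules `after` is a printf-style template in which `%s` stands for the input word, with `before` unused. The integer codes `REGEX = 0` and `C_FORMAT = 1` and the function `apply_rule` are hypothetical.

```python
import re

# Hypothetical IDs for the transformation_syntax column.
REGEX = 0
C_FORMAT = 1


def apply_rule(syntax: int, before: str, after: str, word: str) -> str:
    """Apply one production rule to a word (a sketch, not the real implementation)."""
    if syntax == REGEX:
        # `before` is a regular expression, `after` its replacement string,
        # which may use back-references such as \1.
        return re.sub(before, after, word)
    if syntax == C_FORMAT:
        # Assumption: `after` is a printf-style template where %s is the word;
        # `before` is ignored in this sketch.
        return after % word
    raise ValueError(f"unknown transformation syntax: {syntax}")


if __name__ == "__main__":
    # RegEx: replace a final -y with -ies.
    print(apply_rule(REGEX, r"y$", "ies", "story"))  # stories
    # C-style: wrap the word in a prefix and a suffix.
    print(apply_rule(C_FORMAT, "", "un%sable", "read"))  # unreadable
```

RegEx rules can express context-sensitive changes such as the `y$` example, while a C-style template is simpler to write for plain affixation.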
Each kind of production rule is represented in its own table and is described
in the following subsections.

### Inflection

An inflection usually applies to only one part of speech, so the inflection
table needs a column for the part of speech.

Extra columns:

- part_of_speech (`INTEGER`): ID of the part of speech

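A minimal SQLite sketch of an inflection table that combines the common production-rule columns with the extra `part_of_speech` column. The table names, `id` keys, foreign-key constraints, and the helpers `create_db` and `inflections_for` are assumptions. `before` and `after` are quoted because both are SQL keywords, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3

SCHEMA = """
CREATE TABLE part_of_speech (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);

CREATE TABLE inflection (
    id                    INTEGER PRIMARY KEY,
    name                  TEXT NOT NULL UNIQUE,  -- human-friendly identifier
    transformation_syntax INTEGER NOT NULL,      -- hypothetical codes: 0 = RegEx, 1 = C-style
    "before"              TEXT NOT NULL,         -- quoted: BEFORE/AFTER are SQL keywords
    "after"               TEXT NOT NULL,
    description           TEXT,
    part_of_speech        INTEGER NOT NULL REFERENCES part_of_speech(id)
);
"""


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO part_of_speech (id, name) VALUES (?, ?)",
        [(1, "noun"), (2, "verb")],
    )
    return conn


def inflections_for(conn: sqlite3.Connection, pos_name: str) -> list:
    """Names of the inflection rules that apply to a given part of speech."""
    rows = conn.execute(
        """SELECT i.name FROM inflection AS i
           JOIN part_of_speech AS p ON p.id = i.part_of_speech
           WHERE p.name = ? ORDER BY i.name""",
        (pos_name,),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = create_db()
    conn.execute(
        """INSERT INTO inflection
           (name, transformation_syntax, "before", "after", part_of_speech)
           VALUES ('plural', 0, '$', 's', 1)"""
    )
    print(inflections_for(conn, "noun"))  # ['plural']
```

With the foreign key enabled, a rule pointing at a non-existent part of speech is rejected with `sqlite3.IntegrityError` instead of being stored silently.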
### Phrase Syntax

TBD

### Derivation

Derivation is like inflection, except that it usually changes the word's part
of speech. Derivation rules are disabled by default.

Extra columns:

- part_of_speech_before (`INTEGER`): ID of the part of speech the rule applies to
- part_of_speech_after (`INTEGER`): ID of the part of speech the rule transforms
  the word into

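A comparable sketch for derivation rules, with two foreign keys into the part-of-speech table. The design states that derivation rules are disabled by default but not how that is stored; the `enabled` column here is a hypothetical way to record it. Table names, keys, sample rows, and the helpers `create_db` and `enabled_derivations` are also assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE part_of_speech (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);

CREATE TABLE derivation (
    id                    INTEGER PRIMARY KEY,
    name                  TEXT NOT NULL UNIQUE,
    transformation_syntax INTEGER NOT NULL,
    "before"              TEXT NOT NULL,   -- quoted: BEFORE/AFTER are SQL keywords
    "after"               TEXT NOT NULL,
    description           TEXT,
    part_of_speech_before INTEGER NOT NULL REFERENCES part_of_speech(id),
    part_of_speech_after  INTEGER NOT NULL REFERENCES part_of_speech(id),
    enabled               BOOLEAN NOT NULL DEFAULT 0  -- hypothetical: off by default
);
"""


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO part_of_speech (id, name) VALUES (?, ?)",
        [(1, "adjective"), (2, "noun")],
    )
    return conn


def enabled_derivations(conn: sqlite3.Connection, from_pos: str) -> list:
    """(rule name, target part of speech) for enabled rules starting at from_pos."""
    return conn.execute(
        """SELECT d.name, dst.name
           FROM derivation AS d
           JOIN part_of_speech AS src ON src.id = d.part_of_speech_before
           JOIN part_of_speech AS dst ON dst.id = d.part_of_speech_after
           WHERE src.name = ? AND d.enabled = 1
           ORDER BY d.name""",
        (from_pos,),
    ).fetchall()


if __name__ == "__main__":
    conn = create_db()
    # An adjective-to-noun rule in the spirit of English -ness (kind -> kindness).
    conn.execute(
        """INSERT INTO derivation
           (name, transformation_syntax, "before", "after",
            part_of_speech_before, part_of_speech_after)
           VALUES ('ness', 0, '$', 'ness', 1, 2)"""
    )
    print(enabled_derivations(conn, "adjective"))  # []
    conn.execute("UPDATE derivation SET enabled = 1 WHERE name = 'ness'")
    print(enabled_derivations(conn, "adjective"))  # [('ness', 'noun')]
```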
## Writing system

### Orthography rules

Orthography rules are divided into two categories: hard rules and soft rules.
Hard rules are enforced by the program, which checks whether a newly added word
or a body of text follows them. Soft rules are human-readable rules that are
exported into the document. Hard rules can be defined using RegEx, anti-RegEx
(a pattern matching text that is disallowed), BNF, or EBNF.

Columns:

- type (`TEXT`): One of `regex`, `anti-regex`, `bnf`, `ebnf`, or `soft`
- rule (`TEXT`): The rule itself, written in the syntax given by `type`
  (plain prose for soft rules)

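A minimal sketch of how the program might enforce the two RegEx-based kinds of hard rule; BNF and EBNF rules would need a grammar parser and are left out. We assume a `regex` rule must match the whole word (`re.fullmatch`) and an `anti-regex` rule must not match anywhere in it (`re.search`). The functions `violations` and `text_violations`, the `(type, rule)` row shape, and the sample rules are hypothetical. Soft rules are only exported to the document, so the checker skips them.

```python
import re


def violations(word: str, rules: list) -> list:
    """Return the hard rules that `word` breaks.

    `rules` holds (type, rule) rows from the orthography table. Only the
    'regex' and 'anti-regex' types are checked; 'bnf', 'ebnf' and 'soft'
    rows are ignored in this sketch.
    """
    broken = []
    for kind, rule in rules:
        if kind == "regex" and re.fullmatch(rule, word) is None:
            broken.append(rule)  # the word does not match the allowed pattern
        elif kind == "anti-regex" and re.search(rule, word) is not None:
            broken.append(rule)  # the word contains a disallowed sequence
    return broken


def text_violations(text: str, rules: list) -> dict:
    """Check a body of text word by word (split on whitespace, no punctuation handling)."""
    result = {}
    for word in text.split():
        broken = violations(word, rules)
        if broken:
            result[word] = broken
    return result


if __name__ == "__main__":
    rules = [
        ("regex", r"[ptkaeiou]+"),    # only these letters are allowed
        ("anti-regex", r"[ptk]{2}"),  # no two consonants in a row
        ("soft", "Stress falls on the first syllable."),
    ]
    print(violations("pakito", rules))  # []
    print(violations("pakko", rules))   # ['[ptk]{2}']
    print(violations("paxo", rules))    # ['[ptkaeiou]+']
```

Anti-RegEx rules are convenient for forbidding a few sequences without having to write a pattern that enumerates everything allowed.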
### Scripts

Columns:

- name (`TEXT`)
- glyph (`BLOB`): The content of the vector file

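A minimal sketch of storing a glyph's vector file in the `glyph` column. We assume the vector files are SVG and that the table is named `script`; the helpers `create_db`, `add_glyph`, and `load_glyph` are hypothetical. Python's `sqlite3` binds `bytes` as a `BLOB` and returns `BLOB` values as `bytes`, so the file content round-trips unchanged.

```python
import sqlite3

SCHEMA = """
CREATE TABLE script (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    glyph BLOB NOT NULL  -- raw content of the vector file (assumed SVG)
);
"""


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_glyph(conn: sqlite3.Connection, name: str, svg: bytes) -> int:
    """Store a glyph; in the app, `svg` might come from Path(...).read_bytes()."""
    cur = conn.execute(
        "INSERT INTO script (name, glyph) VALUES (?, ?)",
        (name, svg),  # Python bytes are bound as a SQLite BLOB
    )
    return cur.lastrowid


def load_glyph(conn: sqlite3.Connection, glyph_id: int) -> bytes:
    """Read a glyph's vector file back as bytes."""
    row = conn.execute(
        "SELECT glyph FROM script WHERE id = ?", (glyph_id,)
    ).fetchone()
    if row is None:
        raise KeyError(glyph_id)
    return row[0]


if __name__ == "__main__":
    conn = create_db()
    svg = b'<svg xmlns="http://www.w3.org/2000/svg"><path d="M0 0 L10 10"/></svg>'
    glyph_id = add_glyph(conn, "a", svg)
    print(load_glyph(conn, glyph_id) == svg)  # True
```

Keeping the glyph inside the database, rather than as a separate file, fits the choice of SQLite as a single-file application format.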
## Vocabulary

### Part of Speech & Word Class

Both tables share the same structure.

Columns:

- name (`TEXT`)

### Word List

Columns:

- word (`TEXT`)
- part_of_speech (`INTEGER`): Part of speech ID
- word_class (`INTEGER`): Word class ID
- definition (`TEXT`): A translation into a natural language or a definition
  in the conlang itself; this is up to the user.

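A minimal sketch of the vocabulary tables and a lookup that resolves the two IDs back to names. Because part of speech and word class share one structure, a single DDL template creates both. Table names, `id` keys, `UNIQUE` constraints, making `word_class` nullable, the sample rows, and the helpers `create_db` and `lookup` are all assumptions; the design above only lists the columns.

```python
import sqlite3

# Part of speech and word class share one structure, so one template serves both.
# Table names come from a fixed tuple below, never from user input.
CATEGORY_DDL = "CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);"

WORD_DDL = """
CREATE TABLE word (
    id             INTEGER PRIMARY KEY,
    word           TEXT NOT NULL,
    part_of_speech INTEGER NOT NULL REFERENCES part_of_speech(id),
    word_class     INTEGER REFERENCES word_class(id),  -- nullable (assumption)
    definition     TEXT  -- natural-language translation or conlang definition
);
"""


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    for table in ("part_of_speech", "word_class"):
        conn.execute(CATEGORY_DDL.format(table=table))
    conn.executescript(WORD_DDL)
    return conn


def lookup(conn: sqlite3.Connection, word: str) -> list:
    """Return (word, part of speech, word class, definition) rows for a word."""
    return conn.execute(
        """SELECT w.word, p.name, c.name, w.definition
           FROM word AS w
           JOIN part_of_speech AS p ON p.id = w.part_of_speech
           LEFT JOIN word_class AS c ON c.id = w.word_class
           WHERE w.word = ?
           ORDER BY w.id""",
        (word,),
    ).fetchall()


if __name__ == "__main__":
    conn = create_db()
    # Made-up sample data.
    conn.execute("INSERT INTO part_of_speech (id, name) VALUES (1, 'noun')")
    conn.execute("INSERT INTO word_class (id, name) VALUES (1, 'animate')")
    conn.execute(
        "INSERT INTO word (word, part_of_speech, word_class, definition)"
        " VALUES ('tako', 1, 1, 'dog')"
    )
    print(lookup(conn, "tako"))  # [('tako', 'noun', 'animate', 'dog')]
```

The `LEFT JOIN` keeps words that have no word class assigned, returning `None` for that field.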
[^1]: Kenneth H. Rosen, *Discrete Mathematics and Its Applications*.