This code is a raw OCR engine. It has NO PAGE LAYOUT ANALYSIS, NO
OUTPUT FORMATTING, and NO UI. It can only process an image of a
single column and create text from it. It can detect fixed pitch
vs proportional text. Having said that, in 1995, this engine was
in the top 3 in terms of character accuracy, and it compiles and
runs on both Linux and Windows. Another current limitation is that
it only recognizes English and its character set is only US-ASCII.
Training code is NOT included in the open source release, however;
it will be included in a future release.

TODO:
Compiles fine, but dumps core on NetBSD-4.99.3/amd64. Backtrace:
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004c1c70 in reverse32 ()
(gdb) bt
#0 0x00000000004c1c70 in reverse32 ()
#1 0x00000000004aed12 in read_squished_dawg ()
#2 0x00000000004aaded in init_permute ()
#3 0x0000000000485779 in program_editup ()
#4 0x0000000000485869 in start_recog ()
#5 0x0000000000403d04 in init_tesseract ()
#6 0x000000000040309b in main ()
(gdb)