d46db02864
Move to new home at Github. Clean up.

2015-02-17 - V3.04.01
- Added OSD renderer for psm 0. Works for single page and multi-page images.
- Improve tesstrain.sh script.
- Simplify build and run of ScrollView.
- Improved PDF output for OS X Preview utility.
- INCOMPATIBLE fix to hOCR line height information - commit 134ebc3.
- Added option to build Tesseract without Cube OCR engine (-DNO_CUBE_BUILD).
- Enable OpenMP support.
- Many bug fixes.

2015-07-11 - V3.04.00
- Tesseract development is now done with Git and hosted at github.com (Previously we used Subversion as a VCS and code.google.com for hosting).
- Tesseract now requires leptonica 1.71 or a higher version.
- Removed official support for VS 2008.
- Added support for 39 additional scripts/languages, including: amh, asm, aze_cyrl, bod, bos, ceb, cym, dzo, fas, gle, guj, hat, iku, jav, kat, kat_old, kaz, khm, kir, kur, lao, lat, mar, mya, nep, ori, pan, pus, san, sin, srp_latn, syr, tgk, tir, uig, urd, uzb, uzb_cyrl, yid
- Major updates to training system as a result of extensive testing on 100 languages.
- New training data for over 100 languages
- Improved performance with PIC compilation option.
- Significant change to invisible font system in pdf output to improve correctness and compatibility with external programs, particularly ghostscript.
- Improved font identification.
- Major change to improve layout analysis for heavily diacritic languages: Thai, Vietnamese, Kannada, Telugu etc.
- Fixed problems with shifted baselines so recognition can recover from layout analysis errors.
- Major refactor to improve speed on difficult images, especially when running a heap checker.
- Moved params from global in page layout to tesseractclass.
- Improved single column layout analysis.
- Allow ocr output to multiple formats using tesseract command line executable.
- Fixed issues with mixed eng+ara scripts.
- Improved script consistency in numbers.
- Major refactor of control.cpp to enable line recognition.
- Added tesstrain.sh - a master training script.
- Added ability to text2image training tool to just list available fonts.
- Added ability to text2image to underline words.
- Improved efficiency of image processing for PDF output.
- Added parameter description for each parameter listed with 'print-parameters' command line option.
- Added font info to hOCR output.
- Enabled streaming input and output of multi-page documents.
- Many bug fixes.

2014-02-04 - V3.03(rc1)
- Added new training tool text2image to generate box/tif file pairs from text and truetype fonts.
- Added support for PDF output with searchable text.
- Removed entire IMAGE class and all code in image directory.
- Tesseract executable: support for output to stdout; limited support for one page images from stdin (especially on Windows)
- Added Renderer to API to allow document-level processing and output of document formats, like hOCR, PDF.
- Major refactor of word-level recognition, beam search, eliminating dead code.
- Refactored classifier to make it easier to add new ones.
- Generalized feature extractor to allow feature extraction from greyscale.
- Improved sub/superscript treatment.
- Improved baseline fit.
- Added set_unicharset_properties to training tools.
- Many bug fixes.
- More training source data included.
238 lines
7 KiB
Text
@comment $NetBSD: PLIST,v 1.8 2016/03/17 12:51:14 fhajny Exp $
bin/ambiguous_words
bin/classifier_tester
bin/cntraining
bin/combine_tessdata
bin/dawg2wordlist
bin/mftraining
bin/set_unicharset_properties
bin/shapeclustering
bin/tesseract
bin/text2image
bin/unicharset_extractor
bin/wordlist2dawg
include/tesseract/apitypes.h
include/tesseract/baseapi.h
include/tesseract/basedir.h
include/tesseract/capi.h
include/tesseract/errcode.h
include/tesseract/fileerr.h
include/tesseract/genericvector.h
include/tesseract/helpers.h
include/tesseract/host.h
include/tesseract/ltrresultiterator.h
include/tesseract/memry.h
include/tesseract/ndminx.h
include/tesseract/ocrclass.h
include/tesseract/osdetect.h
include/tesseract/pageiterator.h
include/tesseract/params.h
include/tesseract/platform.h
include/tesseract/publictypes.h
include/tesseract/renderer.h
include/tesseract/resultiterator.h
include/tesseract/serialis.h
include/tesseract/strngs.h
include/tesseract/tesscallback.h
include/tesseract/thresholder.h
include/tesseract/unichar.h
include/tesseract/unicharmap.h
include/tesseract/unicharset.h
lib/libtesseract.la
lib/pkgconfig/tesseract.pc
man/man1/ambiguous_words.1
man/man1/cntraining.1
man/man1/combine_tessdata.1
man/man1/dawg2wordlist.1
man/man1/mftraining.1
man/man1/shapeclustering.1
man/man1/tesseract.1
man/man1/unicharset_extractor.1
man/man1/wordlist2dawg.1
man/man5/unicharambigs.5
man/man5/unicharset.5
share/tessdata/afr.traineddata
share/tessdata/amh.traineddata
share/tessdata/ara.cube.bigrams
share/tessdata/ara.cube.fold
share/tessdata/ara.cube.lm
share/tessdata/ara.cube.nn
share/tessdata/ara.cube.params
share/tessdata/ara.cube.size
share/tessdata/ara.cube.word-freq
share/tessdata/ara.traineddata
share/tessdata/asm.traineddata
share/tessdata/aze.traineddata
share/tessdata/aze_cyrl.traineddata
share/tessdata/bel.traineddata
share/tessdata/ben.traineddata
share/tessdata/bod.traineddata
share/tessdata/bos.traineddata
share/tessdata/bul.traineddata
share/tessdata/cat.traineddata
share/tessdata/ceb.traineddata
share/tessdata/ces.traineddata
share/tessdata/chi_sim.traineddata
share/tessdata/chi_tra.traineddata
share/tessdata/chr.traineddata
share/tessdata/configs/ambigs.train
share/tessdata/configs/api_config
share/tessdata/configs/bigram
share/tessdata/configs/box.train
share/tessdata/configs/box.train.stderr
share/tessdata/configs/digits
share/tessdata/configs/hocr
share/tessdata/configs/inter
share/tessdata/configs/kannada
share/tessdata/configs/linebox
share/tessdata/configs/logfile
share/tessdata/configs/makebox
share/tessdata/configs/pdf
share/tessdata/configs/quiet
share/tessdata/configs/rebox
share/tessdata/configs/strokewidth
share/tessdata/configs/txt
share/tessdata/configs/unlv
share/tessdata/cym.traineddata
share/tessdata/dan.traineddata
share/tessdata/dan_frak.traineddata
share/tessdata/deu.traineddata
share/tessdata/deu_frak.traineddata
share/tessdata/dzo.traineddata
share/tessdata/ell.traineddata
share/tessdata/eng.cube.bigrams
share/tessdata/eng.cube.fold
share/tessdata/eng.cube.lm
share/tessdata/eng.cube.nn
share/tessdata/eng.cube.params
share/tessdata/eng.cube.size
share/tessdata/eng.cube.word-freq
share/tessdata/eng.tesseract_cube.nn
share/tessdata/eng.traineddata
share/tessdata/eng.user-patterns
share/tessdata/eng.user-words
share/tessdata/enm.traineddata
share/tessdata/epo.traineddata
share/tessdata/equ.traineddata
share/tessdata/est.traineddata
share/tessdata/eus.traineddata
share/tessdata/fas.traineddata
share/tessdata/fin.traineddata
share/tessdata/fra.cube.bigrams
share/tessdata/fra.cube.fold
share/tessdata/fra.cube.lm
share/tessdata/fra.cube.nn
share/tessdata/fra.cube.params
share/tessdata/fra.cube.size
share/tessdata/fra.cube.word-freq
share/tessdata/fra.tesseract_cube.nn
share/tessdata/fra.traineddata
share/tessdata/frk.traineddata
share/tessdata/frm.traineddata
share/tessdata/gle.traineddata
share/tessdata/glg.traineddata
share/tessdata/grc.traineddata
share/tessdata/guj.traineddata
share/tessdata/hat.traineddata
share/tessdata/heb.traineddata
share/tessdata/hin.cube.bigrams
share/tessdata/hin.cube.fold
share/tessdata/hin.cube.lm
share/tessdata/hin.cube.nn
share/tessdata/hin.cube.params
share/tessdata/hin.cube.word-freq
share/tessdata/hin.tesseract_cube.nn
share/tessdata/hin.traineddata
share/tessdata/hrv.traineddata
share/tessdata/hun.traineddata
share/tessdata/iku.traineddata
share/tessdata/ind.traineddata
share/tessdata/isl.traineddata
share/tessdata/ita.cube.bigrams
share/tessdata/ita.cube.fold
share/tessdata/ita.cube.lm
share/tessdata/ita.cube.nn
share/tessdata/ita.cube.params
share/tessdata/ita.cube.size
share/tessdata/ita.cube.word-freq
share/tessdata/ita.tesseract_cube.nn
share/tessdata/ita.traineddata
share/tessdata/ita_old.traineddata
share/tessdata/jav.traineddata
share/tessdata/jpn.traineddata
share/tessdata/kan.traineddata
share/tessdata/kat.traineddata
share/tessdata/kat_old.traineddata
share/tessdata/kaz.traineddata
share/tessdata/khm.traineddata
share/tessdata/kir.traineddata
share/tessdata/kor.traineddata
share/tessdata/kur.traineddata
share/tessdata/lao.traineddata
share/tessdata/lat.traineddata
share/tessdata/lav.traineddata
share/tessdata/lit.traineddata
share/tessdata/mal.traineddata
share/tessdata/mar.traineddata
share/tessdata/mkd.traineddata
share/tessdata/mlt.traineddata
share/tessdata/msa.traineddata
share/tessdata/mya.traineddata
share/tessdata/nep.traineddata
share/tessdata/nld.traineddata
share/tessdata/nor.traineddata
share/tessdata/ori.traineddata
share/tessdata/osd.traineddata
share/tessdata/pan.traineddata
share/tessdata/pdf.ttf
share/tessdata/pol.traineddata
share/tessdata/por.traineddata
share/tessdata/pus.traineddata
share/tessdata/ron.traineddata
share/tessdata/rus.cube.fold
share/tessdata/rus.cube.lm
share/tessdata/rus.cube.nn
share/tessdata/rus.cube.params
share/tessdata/rus.cube.size
share/tessdata/rus.cube.word-freq
share/tessdata/rus.traineddata
share/tessdata/san.traineddata
share/tessdata/sin.traineddata
share/tessdata/slk.traineddata
share/tessdata/slk_frak.traineddata
share/tessdata/slv.traineddata
share/tessdata/spa.cube.bigrams
share/tessdata/spa.cube.fold
share/tessdata/spa.cube.lm
share/tessdata/spa.cube.nn
share/tessdata/spa.cube.params
share/tessdata/spa.cube.size
share/tessdata/spa.cube.word-freq
share/tessdata/spa.traineddata
share/tessdata/spa_old.traineddata
share/tessdata/sqi.traineddata
share/tessdata/srp.traineddata
share/tessdata/srp_latn.traineddata
share/tessdata/swa.traineddata
share/tessdata/swe.traineddata
share/tessdata/syr.traineddata
share/tessdata/tam.traineddata
share/tessdata/tel.traineddata
share/tessdata/tessconfigs/batch
share/tessdata/tessconfigs/batch.nochop
share/tessdata/tessconfigs/matdemo
share/tessdata/tessconfigs/msdemo
share/tessdata/tessconfigs/nobatch
share/tessdata/tessconfigs/segdemo
share/tessdata/tgk.traineddata
share/tessdata/tgl.traineddata
share/tessdata/tha.traineddata
share/tessdata/tir.traineddata
share/tessdata/tur.traineddata
share/tessdata/uig.traineddata
share/tessdata/ukr.traineddata
share/tessdata/urd.traineddata
share/tessdata/uzb.traineddata
share/tessdata/uzb_cyrl.traineddata
share/tessdata/vie.traineddata
share/tessdata/yid.traineddata