2017-02-21 18:51:18 +01:00
@comment $NetBSD: PLIST,v 1.9 2017/02/21 17:51:18 fhajny Exp $
2014-10-02 18:06:02 +02:00
bin/ambiguous_words
bin/classifier_tester
2007-05-18 08:39:27 +02:00
bin/cntraining
2014-10-02 18:06:02 +02:00
bin/combine_tessdata
bin/dawg2wordlist
2007-05-18 08:39:27 +02:00
bin/mftraining
Update graphics/tesseract to 3.04.01.
Move to new home at GitHub. Clean up.
2016-02-17 - V3.04.01
- Added OSD renderer for psm 0. Works for single page and
multi-page images.
- Improve tesstrain.sh script.
- Simplify build and run of ScrollView.
- Improved PDF output for OS X Preview utility.
- INCOMPATIBLE fix to hOCR line height information - commit
134ebc3.
- Added option to build Tesseract without Cube OCR engine
(-DNO_CUBE_BUILD).
- Enable OpenMP support.
- Many bug fixes.
2015-07-11 - V3.04.00
- Tesseract development is now done with Git and hosted at
github.com (Previously we used Subversion as a VCS and
code.google.com for hosting).
- Tesseract now requires leptonica 1.71 or a higher version.
- Removed official support for VS 2008.
- Added support for 39 additional scripts/languages, including:
amh, asm, aze_cyrl, bod, bos, ceb, cym, dzo, fas, gle, guj, hat,
iku, jav, kat, kat_old, kaz, khm, kir, kur, lao, lat, mar, mya,
nep, ori, pan, pus, san, sin, srp_latn, syr, tgk, tir, uig, urd,
uzb, uzb_cyrl, yid
- Major updates to training system as a result of extensive
testing on 100 languages.
- New training data for over 100 languages
- Improved performance with PIC compilation option.
- Significant change to invisible font system in pdf output to
improve correctness and compatibility with external programs,
particularly ghostscript.
- Improved font identification.
- Major change to improve layout analysis for heavily diacritic
languages: Thai, Vietnamese, Kannada, Telugu etc.
- Fixed problems with shifted baselines so recognition can recover
from layout analysis errors.
- Major refactor to improve speed on difficult images, especially
when running a heap checker.
- Moved params from global in page layout to tesseractclass.
- Improved single column layout analysis.
- Allow ocr output to multiple formats using tesseract command
line executable.
- Fixed issues with mixed eng+ara scripts.
- Improved script consistency in numbers.
- Major refactor of control.cpp to enable line recognition.
- Added tesstrain.sh - a master training script.
- Added ability to text2image training tool to just list available
fonts.
- Added ability to text2image to underline words.
- Improved efficiency of image processing for PDF output.
- Added parameter description for each parameter listed with
'print-parameters' command line option.
- Added font info to hOCR output.
- Enabled streaming input and output of multi-page documents.
- Many bug fixes.
2014-02-04 - V3.03(rc1)
- Added new training tool text2image to generate box/tif file
pairs from text and truetype fonts.
- Added support for PDF output with searchable text.
- Removed entire IMAGE class and all code in image directory.
- Tesseract executable: support for output to stdout; limited
support for one page images from stdin (especially on Windows)
- Added Renderer to API to allow document-level processing and
output of document formats, like hOCR, PDF.
- Major refactor of word-level recognition, beam search,
eliminating dead code.
- Refactored classifier to make it easier to add new ones.
- Generalized feature extractor to allow feature extraction from
greyscale.
- Improved sub/superscript treatment.
- Improved baseline fit.
- Added set_unicharset_properties to training tools.
- Many bug fixes.
- More training source data included.
2016-03-17 13:51:14 +01:00
bin/set_unicharset_properties
2014-10-02 18:06:02 +02:00
bin/shapeclustering
2007-05-18 08:39:27 +02:00
bin/tesseract
2016-03-17 13:51:14 +01:00
bin/text2image
2007-07-28 03:02:14 +02:00
bin/unicharset_extractor
bin/wordlist2dawg
2014-10-02 18:06:02 +02:00
include/tesseract/apitypes.h
2007-05-18 08:39:27 +02:00
include/tesseract/baseapi.h
include/tesseract/basedir.h
2014-10-02 18:06:02 +02:00
include/tesseract/capi.h
2007-05-18 08:39:27 +02:00
include/tesseract/errcode.h
include/tesseract/fileerr.h
2014-10-02 18:06:02 +02:00
include/tesseract/genericvector.h
include/tesseract/helpers.h
2007-05-18 08:39:27 +02:00
include/tesseract/host.h
2014-10-02 18:06:02 +02:00
include/tesseract/ltrresultiterator.h
2007-05-18 08:39:27 +02:00
include/tesseract/memry.h
include/tesseract/ndminx.h
2016-03-17 13:51:14 +01:00
include/tesseract/ocrclass.h
include/tesseract/osdetect.h
2014-10-02 18:06:02 +02:00
include/tesseract/pageiterator.h
include/tesseract/params.h
2007-05-18 08:39:27 +02:00
include/tesseract/platform.h
2014-10-02 18:06:02 +02:00
include/tesseract/publictypes.h
2016-03-17 13:51:14 +01:00
include/tesseract/renderer.h
2014-10-02 18:06:02 +02:00
include/tesseract/resultiterator.h
2007-05-18 08:39:27 +02:00
include/tesseract/serialis.h
include/tesseract/strngs.h
2014-10-02 18:06:02 +02:00
include/tesseract/tesscallback.h
include/tesseract/thresholder.h
2007-05-18 08:39:27 +02:00
include/tesseract/unichar.h
include/tesseract/unicharmap.h
include/tesseract/unicharset.h
2014-10-02 18:06:02 +02:00
lib/libtesseract.la
lib/pkgconfig/tesseract.pc
man/man1/ambiguous_words.1
man/man1/cntraining.1
man/man1/combine_tessdata.1
man/man1/dawg2wordlist.1
man/man1/mftraining.1
man/man1/shapeclustering.1
man/man1/tesseract.1
man/man1/unicharset_extractor.1
man/man1/wordlist2dawg.1
man/man5/unicharambigs.5
man/man5/unicharset.5
share/tessdata/afr.traineddata
2016-03-17 13:51:14 +01:00
share/tessdata/amh.traineddata
2014-10-02 18:06:02 +02:00
share/tessdata/ara.cube.bigrams
share/tessdata/ara.cube.fold
share/tessdata/ara.cube.lm
share/tessdata/ara.cube.nn
share/tessdata/ara.cube.params
share/tessdata/ara.cube.size
share/tessdata/ara.cube.word-freq
share/tessdata/ara.traineddata
2016-03-17 13:51:14 +01:00
share/tessdata/asm.traineddata
2014-10-02 18:06:02 +02:00
share/tessdata/aze.traineddata
2016-03-17 13:51:14 +01:00
share/tessdata/aze_cyrl.traineddata
2014-10-02 18:06:02 +02:00
share/tessdata/bel.traineddata
share/tessdata/ben.traineddata
2016-03-17 13:51:14 +01:00
share/tessdata/bod.traineddata
share/tessdata/bos.traineddata
2014-10-02 18:06:02 +02:00
share/tessdata/bul.traineddata
share/tessdata/cat.traineddata
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/ceb.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/ces.traineddata
|
|
|
|
share/tessdata/chi_sim.traineddata
|
|
|
|
share/tessdata/chi_tra.traineddata
|
|
|
|
share/tessdata/chr.traineddata
|
|
|
|
share/tessdata/configs/ambigs.train
|
2007-11-29 17:42:08 +01:00
|
|
|
share/tessdata/configs/api_config
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/configs/bigram
|
2007-07-28 03:02:14 +02:00
|
|
|
share/tessdata/configs/box.train
|
Update to 2.04. Set LICENSE.
June 30 2009 - V2.04
Integrated bug fixes and patches and misc changes for portability.
Integrated a patch to remove some of the "access" macros.
Removed dependence on lua from the viewer, speeding it up
dramatically.
Fixed the viewer so it compiles and runs properly!
Specifically fixing issues: 1, 63, 67, 71, 76, 81, 82, 106, 111,
112, 128, 129, 130, 133, 135, 142, 143, 145, 147, 153, 154, 160,
165, 170, 175, 177, 187, 192, 195, 199, 201, 205, 209, 108, 169
2009-07-22 22:57:47 +02:00
|
|
|
share/tessdata/configs/box.train.stderr
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/configs/digits
|
|
|
|
share/tessdata/configs/hocr
|
2007-07-28 03:02:14 +02:00
|
|
|
share/tessdata/configs/inter
|
2008-05-30 15:06:26 +02:00
|
|
|
share/tessdata/configs/kannada
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/configs/linebox
|
|
|
|
share/tessdata/configs/logfile
|
2007-07-28 03:02:14 +02:00
|
|
|
share/tessdata/configs/makebox
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/configs/pdf
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/configs/quiet
|
|
|
|
share/tessdata/configs/rebox
|
|
|
|
share/tessdata/configs/strokewidth
|
2017-02-21 18:51:18 +01:00
|
|
|
share/tessdata/configs/tsv
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/configs/txt
|
2007-07-28 03:02:14 +02:00
|
|
|
share/tessdata/configs/unlv
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/cym.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/dan.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/dan_frak.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/deu.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/deu_frak.traineddata
|
|
|
|
share/tessdata/dzo.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/ell.traineddata
|
|
|
|
share/tessdata/eng.cube.bigrams
|
|
|
|
share/tessdata/eng.cube.fold
|
|
|
|
share/tessdata/eng.cube.lm
|
|
|
|
share/tessdata/eng.cube.nn
|
|
|
|
share/tessdata/eng.cube.params
|
|
|
|
share/tessdata/eng.cube.size
|
|
|
|
share/tessdata/eng.cube.word-freq
|
|
|
|
share/tessdata/eng.tesseract_cube.nn
|
|
|
|
share/tessdata/eng.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/eng.user-patterns
|
|
|
|
share/tessdata/eng.user-words
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/enm.traineddata
|
|
|
|
share/tessdata/epo.traineddata
|
|
|
|
share/tessdata/equ.traineddata
|
|
|
|
share/tessdata/est.traineddata
|
|
|
|
share/tessdata/eus.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/fas.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/fin.traineddata
|
|
|
|
share/tessdata/fra.cube.bigrams
|
|
|
|
share/tessdata/fra.cube.fold
|
|
|
|
share/tessdata/fra.cube.lm
|
|
|
|
share/tessdata/fra.cube.nn
|
|
|
|
share/tessdata/fra.cube.params
|
|
|
|
share/tessdata/fra.cube.size
|
|
|
|
share/tessdata/fra.cube.word-freq
|
|
|
|
share/tessdata/fra.tesseract_cube.nn
|
|
|
|
share/tessdata/fra.traineddata
|
|
|
|
share/tessdata/frk.traineddata
|
|
|
|
share/tessdata/frm.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/gle.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/glg.traineddata
|
|
|
|
share/tessdata/grc.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/guj.traineddata
|
|
|
|
share/tessdata/hat.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/heb.traineddata
|
|
|
|
share/tessdata/hin.cube.bigrams
|
|
|
|
share/tessdata/hin.cube.fold
|
|
|
|
share/tessdata/hin.cube.lm
|
|
|
|
share/tessdata/hin.cube.nn
|
|
|
|
share/tessdata/hin.cube.params
|
|
|
|
share/tessdata/hin.cube.word-freq
|
|
|
|
share/tessdata/hin.tesseract_cube.nn
|
|
|
|
share/tessdata/hin.traineddata
|
|
|
|
share/tessdata/hrv.traineddata
|
|
|
|
share/tessdata/hun.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/iku.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/ind.traineddata
|
|
|
|
share/tessdata/isl.traineddata
|
|
|
|
share/tessdata/ita.cube.bigrams
|
|
|
|
share/tessdata/ita.cube.fold
|
|
|
|
share/tessdata/ita.cube.lm
|
|
|
|
share/tessdata/ita.cube.nn
|
|
|
|
share/tessdata/ita.cube.params
|
|
|
|
share/tessdata/ita.cube.size
|
|
|
|
share/tessdata/ita.cube.word-freq
|
|
|
|
share/tessdata/ita.tesseract_cube.nn
|
|
|
|
share/tessdata/ita.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/ita_old.traineddata
|
|
|
|
share/tessdata/jav.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/jpn.traineddata
|
|
|
|
share/tessdata/kan.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/kat.traineddata
|
|
|
|
share/tessdata/kat_old.traineddata
|
|
|
|
share/tessdata/kaz.traineddata
|
|
|
|
share/tessdata/khm.traineddata
|
|
|
|
share/tessdata/kir.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/kor.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/kur.traineddata
|
|
|
|
share/tessdata/lao.traineddata
|
|
|
|
share/tessdata/lat.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/lav.traineddata
|
|
|
|
share/tessdata/lit.traineddata
|
|
|
|
share/tessdata/mal.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/mar.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/mkd.traineddata
|
|
|
|
share/tessdata/mlt.traineddata
|
|
|
|
share/tessdata/msa.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/mya.traineddata
|
|
|
|
share/tessdata/nep.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/nld.traineddata
|
|
|
|
share/tessdata/nor.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/ori.traineddata
|
|
|
|
share/tessdata/osd.traineddata
|
|
|
|
share/tessdata/pan.traineddata
|
|
|
|
share/tessdata/pdf.ttf
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/pol.traineddata
|
|
|
|
share/tessdata/por.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/pus.traineddata
|
|
|
|
share/tessdata/ron.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/rus.cube.fold
|
|
|
|
share/tessdata/rus.cube.lm
|
|
|
|
share/tessdata/rus.cube.nn
|
|
|
|
share/tessdata/rus.cube.params
|
|
|
|
share/tessdata/rus.cube.size
|
|
|
|
share/tessdata/rus.cube.word-freq
|
|
|
|
share/tessdata/rus.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/san.traineddata
|
|
|
|
share/tessdata/sin.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/slk.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/slk_frak.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/slv.traineddata
|
|
|
|
share/tessdata/spa.cube.bigrams
|
|
|
|
share/tessdata/spa.cube.fold
|
|
|
|
share/tessdata/spa.cube.lm
|
|
|
|
share/tessdata/spa.cube.nn
|
|
|
|
share/tessdata/spa.cube.params
|
|
|
|
share/tessdata/spa.cube.size
|
|
|
|
share/tessdata/spa.cube.word-freq
|
|
|
|
share/tessdata/spa.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/spa_old.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/sqi.traineddata
|
|
|
|
share/tessdata/srp.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/srp_latn.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/swa.traineddata
|
|
|
|
share/tessdata/swe.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/syr.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/tam.traineddata
|
|
|
|
share/tessdata/tel.traineddata
|
2007-07-28 03:02:14 +02:00
|
|
|
share/tessdata/tessconfigs/batch
|
|
|
|
share/tessdata/tessconfigs/batch.nochop
|
|
|
|
share/tessdata/tessconfigs/matdemo
|
|
|
|
share/tessdata/tessconfigs/msdemo
|
|
|
|
share/tessdata/tessconfigs/nobatch
|
|
|
|
share/tessdata/tessconfigs/segdemo
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/tgk.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/tgl.traineddata
|
|
|
|
share/tessdata/tha.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/tir.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/tur.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/uig.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/ukr.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/urd.traineddata
|
|
|
|
share/tessdata/uzb.traineddata
|
|
|
|
share/tessdata/uzb_cyrl.traineddata
|
2014-10-02 18:06:02 +02:00
|
|
|
share/tessdata/vie.traineddata
|
2016-03-17 13:51:14 +01:00
|
|
|
share/tessdata/yid.traineddata
|