56 commits

Author SHA1 Message Date
joerg
3a3c07bc30 Remove @dirrm entries from PLISTs 2009-06-14 17:59:04 +00:00
wiz
4358c8cac0 Replace patch-ab with a post-extract rule. No change to the binary package,
just one file less in pkgsrc ;)
2008-10-30 22:12:59 +00:00
wiz
b4a554e958 Update to 2.03:
January 23 2008 - V2.02
          Improvements to clustering, training and classifier.
          Major internationalization improvements for large-character-set
          languages, eg Kannada.
          Removed some compiler warnings.
          Added multipage tiff support for training and running.
          Updated graphics output to talk to new java-based viewer.
          Added ability to save n-best lists.
          Added leptonica support for more file types.
          Improved Init/End to make them safe.
          Reduced memory use of dictionaries.
          Added some new APIs to TessBaseAPI.
April 21 2008 - V2.02 (again)
          Fixed namespace collisions with jpeg library (INT32).
          Portability fixes for Windows for new code.
          Updates to autoconf system for new code.
April 22 2008 - V2.03
          Fixed crash introduced in 2.02.
          Fixed lack of tessembedded.cpp in distribution.
          Added test for leptonica header files and conditional test for lib.
2008-05-30 13:06:26 +00:00
wiz
06d626133c Update to 2.01:
August 27 2007 - V2.01
	  Fixed UTF8 input problems with box file reader.
	  Fixed various infinite loops and crashes in dawg code.
	  Removed include of config_auto.h from host.h.
	  Added automatic wctype encoding to unicharset_extractor.
	  Fixed dawg table too full error.
	  Removed svn files from tarball.
	  Added new functions to tessdll.
	  Increased maximum utf8 string in a classification result to 8.
2007-11-29 16:42:08 +00:00
wiz
1da043e250 Update to 2.00, provided by Rumko on pkgsrc-users.
July 02 2007 - V2.00
	  Converted internal character handling to UTF8.
	  Trained with 6 languages.
	  Added unicharset_extractor, wordlist2dawg.
	  Added boxfile creation mode.
	  Added UNLV regression test capability.
	  Fixed problems with copyright and registered symbols.
	  Fixed extern "C" declarations problem.
2007-07-28 01:02:14 +00:00
wiz
e899e6021c Initial import of tesseract-1.04b from pkgsrc-wip (packaged by heinz@
and myself):

This code is a raw OCR engine. It has NO PAGE LAYOUT ANALYSIS, NO
OUTPUT FORMATTING, and NO UI. It can only process an image of a
single column and create text from it. It can detect fixed pitch
vs proportional text.  Having said that, in 1995, this engine was
in the top 3 in terms of character accuracy, and it compiles and
runs on both Linux and Windows. Another current limitation is that
it only recognizes English and its character set is only US-ASCII.
Training code IS included in the open source release however, and
will be included in a future release.
2007-05-18 06:39:27 +00:00