Comparison of optical character recognition software
This comparison of optical character recognition (OCR) software includes:
- OCR engines, which perform the actual character identification
- Layout analysis software, which divides scanned documents into zones suitable for OCR
- Graphical interfaces to one or more OCR engines
- Software development kits (SDKs) used to add OCR capabilities to other software (e.g. forms processing applications, document imaging management systems, e-discovery systems, records management solutions)
Name | Founded year | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Android | iOS | Programming language | SDK? | Languages | Fonts | Output Formats | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ABBYY FineReader | 1989 | 16 | 2022 | Proprietary | Yes | Yes | Yes | No | Yes | Yes | Yes | C/C++ | Yes | 192[1] | All fonts | DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2[2] | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac.[3] |
AnyDoc Software | 1989 | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | VBScript | ? | ? | ? | | Works with structured, semi-structured, and unstructured documents. |
Asprise OCR SDK | 1998 | 15 | 2015 | Proprietary | Yes | Yes | Yes | Yes | Yes | ? | ? | Java, C#, VB.NET, C/C++/Delphi | Yes | 20+[4] | ? | Plain text, searchable PDF, XML[5] | Java, C#, VB.NET, C/C++/Delphi SDKs for OCR and barcode recognition on Windows, Linux, Mac OS X and Unix.[6] |
CuneiForm | 1996 | 1.1 | 2011 | BSD variant | No | Yes | Yes | Yes | Yes | ? | ? | C/C++ | Yes | 28 | Any printed font | HTML, hOCR, native, RTF, TeX, TXT[7] | Enterprise-class system, can save text formatting and recognizes complicated tables of any structure |
Dynamsoft OCR SDK | 2003 | 8.2 | 2012 | Proprietary | Yes | Yes | No | No | No | ? | ? | C/C++ | Yes | 40+[8] | ? | PDF, TXT | |
E-aksharayan | 2010 | | | | | Yes | No | Yes | No | ? | ? | | | 14 | | RTF, TXT, BRL | |
GOCR | 2000 | 0.52[9] | 2018 | GPL | Yes[10] | Yes | Yes | Yes | Yes | ? | ? | C | ? | 20+ | ? | | |
Google Drive OCR or Google Cloud Vision | 2015 | | | Proprietary | Yes | Browser | Browser | Browser | Unknown | ? | ? | Unknown | Yes | 200+ | All fonts | text | Google blog post[11][12] |
Microsoft Office Document Imaging | ? | Office 2007 | 2007 | Proprietary | No | Yes | No | No | No | ? | | | | | | | |
Microsoft Office OneNote 2007 | 2011 | ? | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | ? | ? | | |
OCRFeeder | 2009-03 | 0.8.5 | 2022 | GPL | No | No | No | Yes | No | ? | ? | Python | ? | ? | ? | | Features a full user interface and a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines such as Tesseract or Ocrad. |
Ocrad | ? | 0.28[13] | 2022 | GPL | Yes | No | Yes | Yes | Yes | ? | ? | C++ | Yes | Latin alphabet | ? | | Command line |
OCRopus | 2007 | 1.3.3 | 2017 | Apache | No | No | Yes | Yes | Yes | ? | ? | Python | ? | All languages using Latin script (other languages can be trained) | Normal Latin script and Fraktur (other scripts can be trained) | TXT, hOCR,[14] PDF[15] | Pluggable framework under active development, used for Google Books |
OmniPage | 1970s | 19.2 | 2015 | Proprietary | Yes | Yes | Yes | Yes | No | ? | ? | C/C++, C#[16] | Yes | 125[17] | Machine and handprinted fonts | DOC/DOCX, XLS/XLSX, PPTX, RTF, PDF, PDF/A, Searchable PDF, HTML, Text, XML, ePUB, MP3 | Product of Nuance Communications |
Puma.NET | ? | ? | 2009 | BSD | No | Yes | No | No | No | ? | ? | C# | Yes | 28 | Any printed font | | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps the Puma COM server and provides a simplified API for .NET applications. |
ReadSoft | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | ? | ? | | Scans, captures and classifies business documents such as invoices, forms and purchase orders, integrated with business processes. |
Scantron | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | ? | ? | | For working with localized interfaces, corresponding language support is required. |
SmartScore | 1991 | 10.5.8 | 2015 | Proprietary | No | Yes | Yes | No | No | ? | ? | ? | ? | ? | ? | | For musical scores |
Tesseract | 1985 | 5.3.3 | 2023 | Apache | No | Yes | Yes | Yes | Yes | ? | ? | C++, C | Yes | 100+[18] | Any printed font | Text, ALTO, hOCR,[19] PDF, others with different user interfaces[20] or the API | Created by Hewlett-Packard; under further development by Google[21] |
Evaluation
A 2016 study compared the accuracy and reliability of four OCR packages (Google Docs OCR, Tesseract, ABBYY FineReader, and Transym) on a dataset of 1227 images from 15 different categories, and concluded that Google Docs OCR and ABBYY FineReader performed better than the others.[22]
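As an illustration of how OCR accuracy can be quantified, the sketch below computes a character error rate (CER): the edit distance between an engine's output and a ground-truth transcription, divided by the length of the ground truth. This is a commonly used metric and is shown only as an assumption about how such a comparison might be scored; the cited study's actual methodology is not described here, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # drop ca
                current[j - 1] + 1,      # add cb
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the ground-truth length (hypothetical helper)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    truth = "optical character recognition"
    ocr_output = "optica1 character recognltion"  # two misread characters
    print(f"CER: {character_error_rate(truth, ocr_output):.3f}")
```

A lower CER indicates output closer to the ground truth; averaging it over every image in a test set gives one way to rank engines against each other.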
References
- ↑ "ABBYY FineReader 14: Technical Specifications". Finereader.abbyy.com. https://www.abbyy.com/en-eu/finereader/tech-specs/. Retrieved 2017-02-23.
- ↑ "ABBYY FineReader 11: Technical Specifications". Finereader.abbyy.com. http://finereader.abbyy.com/professional/tech_specs/. Retrieved 2013-09-12.
- ↑ "Top OCR Software". Ocrworld.com. 2010-03-30. http://ocrworld.com/software/5-in-depth/149-top-ocr-software.html. Retrieved 2013-09-12.
- ↑ "Asprise OCR SDK Features". asprise.com. http://asprise.com/royalty-free-library/java-ocr-api-overview.html. Retrieved 2014-06-21.
- ↑ "Asprise Java OCR Library Features". asprise.com. http://asprise.com/royalty-free-library/java-ocr-api-overview.html. Retrieved 2014-06-21.
- ↑ "Asprise Java, C#/VB.NET OCR API". asprise.com. 2015-11-19. http://asprise.com/royalty-free-library/ocr-api-for-java-csharp-vb.net.html. Retrieved 2015-11-19.
- ↑ Debian manual page for Cuneiform for Linux version 1.1.0
- ↑ "OCR SDK Language Packages Download". Dynamsoft.com. http://www.dynamsoft.com/Downloads/OCR-Language-Package.aspx. Retrieved 2013-09-12.
- ↑ "GOCR Homepage". wasd.urz.uni-magdeburg.de. https://wasd.urz.uni-magdeburg.de/jschulen/ocr/. Retrieved 2018-10-17.
- ↑ "GOCR". Jocr.sourceforge.net. http://jocr.sourceforge.net/. Retrieved 2013-09-12.
- ↑ "Supported languages". Feb 11, 2022. https://support.google.com/drive/answer/176692#zippy=%2Csupported-languages.
- ↑ Ashok Popat (Sep 4, 2015). "IEEE SPS: Optical Character Recognition for Most of the World's Languages". https://www.youtube.com/watch?v=E0y41YU85tI.
- ↑ Diaz, Antonio (2022-01-17). "GNU Ocrad 0.28 released" (Mailing list). info-gnu.
- ↑ OCRopus includes the ocropus-hocr tool which produces hOCR from the recognition results.
- ↑ In combination with the hocr-tools
- ↑ "OmniPage CSDK - OCR Document Capture Toolkit | Document Imaging & OCR". Nuance. http://www.nuance.com/imaging/omnipage/omnipage-csdk.asp. Retrieved 2013-09-12.
- ↑ "OmniPage Standard Document Conversion". Nuance. http://www.nuance.com/for-business/by-product/omnipage/standard/index.htm. Retrieved 2014-02-25.
- ↑ Based on count of language training files for version 3.04. Available at the download page.
- ↑ Usage explained in the Tesseract Readme and FAQ
- ↑ Such as ODF with OCRFeeder
- ↑ "GitHub - tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository)". https://github.com/tesseract-ocr/tesseract#brief-history/. Retrieved 2018-11-05.
- ↑ Assefi, Mehdi (2016-12-01). "OCR as a Service: An Experimental Evaluation of Google Docs OCR, Tesseract, ABBYY FineReader, and Transym". https://www.researchgate.net/publication/310645810.
Original source: https://en.wikipedia.org/wiki/Comparison of optical character recognition software.