Apache PDFBox

Short description: Open-source PDF library
PDFBox
Developer(s): Apache Software Foundation
Stable release:
  1.8.x: 1.8.17 / 15 September 2022[1]
  2.0.x: 2.0.29 / 1 July 2023[1]
  3.0.x: 3.0.0 / 18 August 2023[1]
Repository: PDFBox Repository (Mirror)
Written in: Java
Operating system: Cross-platform
Type: Portable Document Format (PDF)
License: Apache License 2.0
Website: pdfbox.apache.org

Apache PDFBox is an open-source, pure-Java library for creating, rendering, printing, splitting, merging, altering, and verifying PDF files, and for extracting their text and metadata.

Open Hub reports over 11,000 commits (since the project's start at Apache) by 18 contributors, representing more than 140,000 lines of code. It characterizes PDFBox as a well-established, mature codebase maintained by an average-size development team with increasing year-over-year commits. Using the COCOMO model, Open Hub estimates that the codebase represents 46 person-years of effort.[2]
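The arithmetic behind a COCOMO-style estimate can be sketched as follows. This is an illustration only: it assumes the Basic COCOMO "organic" coefficients (a = 2.4, b = 1.05), which the text above does not specify, and the class and method names are hypothetical.

```java
public class CocomoEstimate {
    // Assumed Basic COCOMO "organic mode" coefficients; the article does not
    // state which COCOMO variant or coefficients the reported figure used.
    static final double A = 2.4;
    static final double B = 1.05;

    /** Estimated effort in person-months for a size given in thousands of lines (KLOC). */
    public static double personMonths(double kloc) {
        return A * Math.pow(kloc, B);
    }

    /** The same estimate converted to person-years (12 person-months per year). */
    public static double personYears(double kloc) {
        return personMonths(kloc) / 12.0;
    }

    public static void main(String[] args) {
        // "More than 140,000 lines of code" is treated here as a lower bound.
        double kloc = 140.0;
        System.out.printf("%.0f KLOC -> %.1f person-years%n", kloc, personYears(kloc));
    }
}
```

Under these assumed coefficients, 140 KLOC works out to roughly 36 person-years. The reported 46 person-years would correspond to a larger line count, which is consistent with the "more than 140,000" wording.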

Structure

Apache PDFBox has these components:

  • PDFBox: the core library
  • FontBox: handles font information
  • XmpBox: handles XMP metadata
  • Preflight (optional): checks PDF files for PDF/A-1b conformance

History

PDFBox was started in 2002 on SourceForge by Ben Litchfield, who wanted to extract text from PDF files for Lucene.[3] It became an Apache Incubator project in 2008 and an Apache top-level project in 2009.[4]

Preflight was originally named PaDaF and was developed by Atos Worldline, which donated it to the project in 2011.[5]

In February 2015, Apache PDFBox was named an Open Source Partner Organization of the PDF Association.[6]


References
