Software:Apache Hop

From HandWiki
Apache Hop
Apache Hop Logo
Original author(s)Bart Maertens
Developer(s)Apache Hop
Initial releaseSeptember 21, 2021; 2 years ago (2021-09-21)[1]
Stable release
1.1.0 / January 21, 2022; 2 years ago (2022-01-21)
RepositoryHop Repository
Written inJava
Operating systemMicrosoft Windows, macOS, Linux, Web
Available inJava
TypeData orchestration, data engineering, ETL
LicenseApache License 2.0
Website{{{1}}}

Apache Hop is a data orchestration and data integration platform.[2] It is an open-source[3] software platform developed by the Apache Software Foundation written in Java. The project provides a visual development environment (IDE) for the development of workflows and pipelines. Hop uses metadata to process data and provides tools and functionality to manage a data project throughout its entire life cycle.[4] Hop can process data in batch, streaming and hybrid modes, and can run in cloud, on-premise or hybrid cloud.

History

Apache Hop started as Project Hop in the summer of 2019 as a fork of Kettle (or Pentaho Data Integration) 8.2.0.7.[5] Hop was redesigned and re-architected from the ground up, and was intended to be a separate and incompatible project from the start. After entering the Apache Incubator in October 2020[6], the project was renamed to Apache Hop. Hop graduated as an Apache Top-Level Project in December 2021.

Architecture

Apache Hop Gui - showing an Apache Beam pipeline

Hop's main file types are pipelines and workflows. Pipelines operate on the data: they read, modify, enrich, combine and perform other operations on the data before writing out to one or more target platforms. Workflows don't act on the data directly, but perform orchestration tasks like performing environment checks and error handling, and execute pipelines and child workflows. Pipelines are a collection of transforms, workflows are a collection of actions. Both the transforms in a pipeline and the actions in a workflow are connected through hops. Pipelines are executed in parallel, workflows are executed sequentially.

Hop consists of a small and robust engine (kernel). All non-core functionality is added through plugins. One of the available plugins in Hop are the runtime engines that run pipelines and workflows. Workflows can run locally and remotely in Hop's native run configuration. Pipelines can also run in the Hop native local and remote run configuration, but can also run on Apache Spark, Apache Flink or Google Cloud Dataflow through Apache Beam.

Hop considers data projects to be more than just pipelines and workflows. The platform offers functionality to not only develop pipelines and workflows, but also to support data teams in a project's entire life cycle. This includes functionality to organize work in projects and environments, to build and run unit tests, integrate with CI/CD platforms, version control (git) and more.[7]

Tools

The Apache Hop platform comes with a variety of tools:

  • Hop Gui - the visual IDE where Hop developers build, run and test data pipelines and workflows
  • Hop Conf - command line tool to manage all aspects of a Hop installation and configuration (e.g. projects and environments, cloud configuration)
  • Hop Import - command line tool to import Pentaho Data Integration (Kettle) jobs and transformation to workflows and pipelines and upgrade them to Hop projects
  • Hop Run - command line tool to run workflows and pipelines
  • Hop Search - command line tool to search in a project's (or all of Hop's) metadata
  • Hop Server - light-weight web server to run workflows and pipelines remotely
  • Hop Translator - GUI tool to assist non-developers in translating the Hop tools

External links

References

Category:Free software Category:Metadata