Software:Cascading

Cascading
Stable release	3.3.0 / March 24, 2018^[1]
Preview release	4.0-wip-120 / March 27, 2021^[2]
Repository	github.com/Cascading/cascading
Written in	Java
License	Apache License v2^[3]
Website	www.cascading.org

Short description: Software abstraction layer for Apache Hadoop and Apache Flink

Cascading is a software abstraction layer for Apache Hadoop and Apache Flink. It is used to create and execute complex data processing workflows on a Hadoop cluster from any JVM-based language (Java, JRuby, Clojure, etc.), hiding the underlying complexity of MapReduce jobs. It is open source and available under the Apache License, version 2. Commercial support is available from Driven, Inc.^[4]

Cascading was originally authored by Chris Wensel, who later founded Concurrent, Inc., which has since been rebranded as Driven.^[5] Cascading is actively developed by the community,^{[citation needed]} and a number of add-on modules are available.^[6]

Architecture

To use Cascading, Apache Hadoop must also be installed, and the Hadoop job .jar must contain the Cascading .jars. Cascading consists of a data processing API, an integration API, a process planner and a process scheduler.

Cascading leverages the scalability of Hadoop but abstracts standard data processing operations away from the underlying map and reduce tasks.^[7]^{[better source needed]} Developers use Cascading to create a .jar file that describes the required processes. It follows a ‘source-pipe-sink’ paradigm: data is read from sources, passes through reusable ‘pipes’ that perform the data analysis, and the results are written to output files, or ‘sinks’. Pipes are created independently of the data they will process. Once a set of pipes is tied to data sources and sinks, it is called a ‘flow’. Flows can be grouped into a ‘cascade’, and the process scheduler ensures that a given flow does not execute until all of its dependencies are satisfied. Pipes and flows can be reused and reordered to support different business needs.^[8]
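
The source-pipe-sink idea can be sketched in plain Java without the Cascading library. The sketch below is illustrative only: the `Flow` class, the `runCascade` method and the in-memory `store` are hypothetical stand-ins, not Cascading's API. It shows pipes defined independently of any data, flows that bind a pipe to a named source and sink, and a cascade-style loop that runs a flow only after the flow producing its input has finished.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CascadeSketch {

    /** Hypothetical "flow": a pipe bound to a named source and a named sink. */
    static final class Flow {
        final String name;
        final String source;
        final String sink;
        final Function<List<String>, List<String>> pipe;

        Flow(String name, String source, String sink,
             Function<List<String>, List<String>> pipe) {
            this.name = name;
            this.source = source;
            this.sink = sink;
            this.pipe = pipe;
        }
    }

    /** Pipes know nothing about where data comes from, so they can be reused. */
    static final Function<List<String>, List<String>> SPLIT_WORDS = lines -> lines.stream()
            .flatMap(line -> Arrays.stream(line.split("\\s+")))
            .filter(word -> !word.isEmpty())
            .collect(Collectors.toList());

    static final Function<List<String>, List<String>> UPPER_CASE = records -> records.stream()
            .map(String::toUpperCase)
            .collect(Collectors.toList());

    /**
     * Runs the flows like a tiny cascade: a flow executes only once its source
     * data set exists in the store, i.e. after the flow that writes it has run.
     * Returns the flow names in the order they actually ran.
     */
    static List<String> runCascade(List<Flow> flows, Map<String, List<String>> store) {
        List<String> order = new ArrayList<>();
        Set<Flow> pending = new LinkedHashSet<>(flows);
        while (!pending.isEmpty()) {
            Flow ready = pending.stream()
                    .filter(f -> store.containsKey(f.source))
                    .findFirst()
                    .orElseThrow(() -> new IllegalStateException("unsatisfied dependency"));
            store.put(ready.sink, ready.pipe.apply(store.get(ready.source)));
            order.add(ready.name);
            pending.remove(ready);
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> store = new LinkedHashMap<>();
        store.put("raw", Arrays.asList("hello cascading", "hello hadoop"));

        // Deliberately listed out of order: "shout" reads what "tokenize" writes.
        List<Flow> flows = Arrays.asList(
                new Flow("shout", "words", "loud", UPPER_CASE),
                new Flow("tokenize", "raw", "words", SPLIT_WORDS));

        System.out.println(runCascade(flows, store));
        System.out.println(store.get("loud"));
    }
}
```

Running `main` prints `[tokenize, shout]` followed by `[HELLO, CASCADING, HELLO, HADOOP]`, even though the flows were listed in the opposite order. A real planner would also have to map such steps onto map and reduce tasks, which this sketch does not attempt.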

Developers write the code in a JVM-based language and do not need to learn MapReduce. The resulting program can be regression tested and integrated with external applications like any other Java application.^[9]
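
Because a processing step is ordinary JVM code, it can be checked against fixed inputs and recorded outputs. The following is a minimal sketch, assuming a hypothetical `countStatuses` step for log file analysis (it is not part of Cascading) and a made-up three-line sample log:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StatusCountRegression {

    /** Hypothetical processing step: count the status code in the last field of each line. */
    static Map<String, Integer> countStatuses(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            String[] fields = line.trim().split("\\s+");
            String status = fields[fields.length - 1];
            counts.merge(status, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Fixed sample input and the output recorded from a known-good run.
        List<String> sample = Arrays.asList(
                "GET /index.html 200",
                "GET /missing 404",
                "POST /login 200");
        Map<String, Integer> expected = new TreeMap<>();
        expected.put("200", 2);
        expected.put("404", 1);

        Map<String, Integer> actual = countStatuses(sample);
        if (!actual.equals(expected)) {
            throw new AssertionError("regression: expected " + expected + " but got " + actual);
        }
        System.out.println("ok " + actual);
    }
}
```

In a real project the same comparison would normally live in a unit-test framework and run as part of the build; the plain `main` method here only keeps the sketch free of dependencies.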

Cascading is most often used for ad targeting, log file analysis, bioinformatics, machine learning, predictive analytics, web content mining, and extract, transform and load (ETL) applications.^[5]

Uses of Cascading

Cascading was cited by SD Times in 2011 as one of the top five most powerful Hadoop projects^[10] and has been described as a major open source project relevant to bioinformatics.^[11] It is covered in Hadoop: The Definitive Guide by Tom White.^[12] The project has also been cited in presentations, conference proceedings and Hadoop user group meetings as a useful tool for working with Hadoop^[13]^[14]^[15]^[16] and with Apache Spark.^[17]

MultiTool on Amazon Web Services was developed using Cascading.^[18]
LogAnalyzer for Amazon CloudFront was developed using Cascading.^[19]
BackType^[20] - social analytics platform
Etsy^[7] - marketplace
FlightCaster^[21] - predicting flight delays
Ion Flux^[22] - analyzing DNA sequence data
RapLeaf^[23] - personalization and recommendation systems
Razorfish^[24] - digital advertising

Domain-Specific Languages Built on Cascading

PyCascading^[25] - developed by Twitter, available on GitHub (no longer maintained)
Cascading.jruby^[26] - developed by Gregoire Marabout, available on GitHub
Cascalog^[27] - authored by Nathan Marz, available on GitHub
Scalding^[28] - a Scala API for Cascading, developed by Twitter and available on GitHub; it makes it easier to transition Cascading/Scalding code to Spark

References

[1] "Releases · Cascading/cascading". https://github.com/Cascading/cascading/releases.
[2] "Releases · cwensel/cascading". https://github.com/cwensel/cascading/releases.
[3] "cascading/LICENSE.txt at 3.3 · Cascading/cascading". https://github.com/Cascading/cascading/blob/3.3/LICENSE.txt.
[4] "Cascading and Driven | Support". https://www.driven.io/support/.
[5] "Integrate.io - One Platform To Support Your Entire Data Journey". https://www.integrate.io/.
[6] "Cascading modules". http://www.cascading.org/modules.html.
[7] Blog post by Etsy describing their use of Cascading with Hadoop.
[8] "Cascading User Guide". Archived from the original on February 6, 2011. https://web.archive.org/web/20110206053054/http://www.cascading.org/1.2/userguide/pdf/userguide.pdf.
[9] "Hadoop Application Performance Management - DRIVEN's Features". https://www.driven.io/features/.
[10] Handy, Alex (1 June 2011). "The top five most powerful Hadoop projects". SD Times. http://www.sdtimes.com/content/article.aspx?ArticleID=35596&page=1. Retrieved 26 October 2013.
[11] Taylor, Ronald (21 December 2010). "An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics". BioMed Central (Springer Science+Business Media). http://www.biomedcentral.com/1471-2105/11/S12/S1. Retrieved 26 October 2013.
[12] White, Tom (September 24, 2010). Hadoop: The Definitive Guide. O'Reilly Media, Inc. ISBN 9781449396893. https://books.google.com/books?id=Nff49D7vnJcC&dq=cascading+hadoop&pg=PA548.
[13] "Getting Started on Hadoop". https://www.slideshare.net/pacoid/getting-started-on-hadoop.
[14] Julio Guijarro, Steve Loughran and Paolo Castagna. "Hadoop and beyond". HP Labs, Bristol UK, 2008. http://www.smartfrog.org/wiki/download/attachments/6193590/hadoop_and_beyond.pdf?version=1&modificationDate=1238073739000.
[15] "Flightcaster Presentation Hadoop". https://www.slideshare.net/hadoopusergroup/flightcaster-presentation-hadoop.
[16] "NoSQL, Hadoop, Cascading June 2010". https://www.slideshare.net/chriscurtin/nosql-hadoop-cascading-june-2010.
[17] "Using Cascading to Build Data-centric Applications on Spark". 2014-05-07. https://spark-summit.org/2014/talk/using-cascading-to-build-data-centric-applications-on-spark.
[18] "Cascading.Multitool on AWS". http://aws.amazon.com/articles/2293?_encoding=UTF8&jiveRedirect=1.
[19] "AWS Articles". https://aws.amazon.com/articles/item/.
[20] BackType blog.
[21] "FlightCaster". http://www.informationweek.com/news/software/infrastructure/224000240.
[22] "Ion Flux". Archived from the original on October 23, 2011. https://web.archive.org/web/20111023203553/http://www.concurrentinc.com/casestudies/ion_flux.
[23] RapLeaf Blog.
[24] "Razorfish Case Study". https://aws.amazon.com/solutions/case-studies/razorfish/.
[25] "PyCascading is no longer maintained". 17 September 2021. https://github.com/twitter/pycascading.
[26] "Cascading.JRuby". August 8, 2018. https://github.com/gmarabout/cascading.jruby.
[27] "Cascalog". June 23, 2023. https://github.com/nathanmarz/cascalog.
[28] "Scalding". June 22, 2023. https://github.com/twitter/scalding.

External links

Official website: www.cascading.org


