Software:Crawlee

Crawlee
Developer(s): Apify
Initial release: 13 July 2022
Written in: TypeScript, Python
Operating system: Windows, macOS, Linux
Type: Web crawler
License: Apache License 2.0

Crawlee is a free and open-source web-crawling and browser automation library developed by Apify. The TypeScript version was first released under the Crawlee name in 2022, and a Python version was added in 2024.

Crawlee's architecture is built around modular crawlers responsible for extracting data from websites[1]. The library follows a declarative programming approach, where users define crawling logic through a structured set of rules. Crawlee uses queues to manage requests; for each request, a specific function is executed to extract data or perform further processing[2].
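The queue-and-handler pattern described above can be illustrated with a minimal sketch in plain Python. This is a toy model and not Crawlee's actual API: the names `ToyCrawler`, `FAKE_SITE`, and `build_crawler`, the labels, and the example.com URLs are all hypothetical, and an in-memory dictionary stands in for real HTTP requests.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Set, Tuple

# Hypothetical stand-in for the web: maps a URL to the links found on that page.
FAKE_SITE: Dict[str, List[str]] = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b"],
    "https://example.com/b": [],
}

Request = Tuple[str, str]  # (url, handler label)


class ToyCrawler:
    """Toy model of a request queue that dispatches each request to a handler."""

    def __init__(self) -> None:
        self.queue: Deque[Request] = deque()
        self.seen: Set[str] = set()  # URLs already enqueued (deduplication)
        self.handlers: Dict[str, Callable[[str, "ToyCrawler"], None]] = {}
        self.results: List[str] = []

    def handler(self, label: str):
        """Decorator that registers a function as the handler for a label."""
        def register(func):
            self.handlers[label] = func
            return func
        return register

    def enqueue(self, url: str, label: str = "default") -> None:
        """Add a request to the queue unless the URL was enqueued before."""
        if url not in self.seen:
            self.seen.add(url)
            self.queue.append((url, label))

    def run(self, start_url: str) -> None:
        """Process requests in FIFO order until the queue is empty."""
        self.enqueue(start_url)
        while self.queue:
            url, label = self.queue.popleft()
            self.handlers[label](url, self)


def build_crawler() -> ToyCrawler:
    crawler = ToyCrawler()

    @crawler.handler("default")
    def handle_start(url: str, c: ToyCrawler) -> None:
        # The start page only discovers links; it stores no data itself.
        for link in FAKE_SITE.get(url, []):
            c.enqueue(link, label="detail")

    @crawler.handler("detail")
    def handle_detail(url: str, c: ToyCrawler) -> None:
        # "Extract data" (here, just record the URL) and follow further links.
        c.results.append(url)
        for link in FAKE_SITE.get(url, []):
            c.enqueue(link, label="detail")

    return crawler
```

Calling `build_crawler().run("https://example.com/")` drains the queue once per unique URL. The user only declares what should happen for each kind of request, while the loop in `run` decides when each handler is invoked.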

Crawlee supports both headless browser sessions (via Playwright and other browser automation software) and plain HTTP request-based scraping.

It also provides utilities related to web scraping, such as a sitemap parser[3] and an automatic HTTP proxy manager.
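To show what a sitemap parser does in principle, the following sketch extracts page URLs from a sitemap document using only the Python standard library. It is not Crawlee's implementation: the function name `parse_sitemap` and the sample document are assumptions made for illustration, and the sketch handles only a plain `urlset` sitemap (no sitemap index files or compressed sitemaps).

```python
import xml.etree.ElementTree as ET
from typing import List

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample sitemap used only for this illustration.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""


def parse_sitemap(xml_text: str) -> List[str]:
    """Return the URLs listed in the <loc> elements of a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text  # skip empty <loc> elements
    ]
```

The URLs returned by such a parser would typically be fed into the crawler's request queue as starting points.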

Notable projects that use Crawlee include GPT Crawler by Builder.io[4] and various generative AI projects maintained by AWS Labs[5].

History

The first stable TypeScript version was released in 2021 under the name Apify SDK[6]. This version offered both the open-source crawling framework and the proprietary storage implementation for use on the Apify platform.

In 2022, version 3.0.0 was released[7], renaming the library to Crawlee. This update made Crawlee independent of the Apify platform, moving most of the Apify-specific features into a separate package (also named Apify SDK).

In 2024, a beta version of Crawlee for Python was released[8].

References