Software:Norconex Web Crawler

Norconex Web Crawler
Other names: Norconex HTTP Collector
Developer(s): Norconex Inc.
Initial release: 2016
Stable release: 3.0.2 / 2022-01-05
Repository: GitHub Repository
Written in: Java
Operating system: Cross-platform
License: Apache License
Website: Norconex Web Crawler

Norconex Web Crawler is free and open-source web crawling and web scraping software written in Java and released under the Apache License. It can export data to many repositories, including Apache Solr, Elasticsearch,[1] Microsoft Azure Cognitive Search and Amazon CloudSearch.[2][3][4]

The crawler can be run as a standalone application or embedded in another Java application.[5][6]

Some key features are:

  • Multi-threaded crawling
  • Text extraction from a variety of file formats (HTML, PDF, Microsoft Word, etc.)
  • Extraction of metadata associated with documents
  • Support for pages rendered with JavaScript
  • Incremental crawls
  • Support for external commands to parse or manipulate documents
  • Export of extracted data to a variety of repositories
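Incremental crawling generally means that, on a later run, a crawler re-processes only documents that are new or whose content has changed since the previous run. The Java sketch below illustrates that general idea by storing a content checksum per URL. It is a simplified, hypothetical example: it does not use or reproduce Norconex's own API, and the class `IncrementalCrawlSketch` and its methods are illustrative names only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of incremental-crawl change detection.
 * Not the Norconex API: all names here are illustrative.
 */
public class IncrementalCrawlSketch {
    // Checksums recorded during earlier crawls, keyed by URL.
    private final Map<String, String> checksums = new HashMap<>();

    /** Returns true if the document is new or its content changed since it was last seen. */
    public boolean needsProcessing(String url, String content) {
        String checksum = sha256Hex(content);
        // put() returns the previous checksum, or null for a URL never seen before.
        String previous = checksums.put(url, checksum);
        return !checksum.equals(previous);
    }

    /** Hex-encoded SHA-256 digest of the content's UTF-8 bytes. */
    public static String sha256Hex(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        IncrementalCrawlSketch crawl = new IncrementalCrawlSketch();
        System.out.println(crawl.needsProcessing("https://example.com/", "v1")); // true: first visit
        System.out.println(crawl.needsProcessing("https://example.com/", "v1")); // false: unchanged
        System.out.println(crawl.needsProcessing("https://example.com/", "v2")); // true: content changed
    }
}
```

A real crawler would persist these checksums between runs and would also handle documents that have disappeared; the in-memory map here only keeps the example short.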

Organizations and products reported to use Norconex Web Crawler include the Apache Solr Ecosystem, the Department of National Defence, Universities Canada and the U.S. Department of Education.[7][8]

History

Norconex Web Crawler was released as free and open-source software in 2013.[9]

References

Mentions in academic research

  • Kancherla, Vinay (1 December 2014). "A Smart Web Crawler for a Concept Based Semantic Search Engine" (p. 18). Master's Projects. doi:10.31979/etd.ubfy-s3es. https://scholarworks.sjsu.edu/etd_projects/380/. Retrieved 28 September 2023.
  • Horváth, Balázs (28 August 2017). "Recommendation Techniques for smart cities" (p. 12). https://aaltodoc.aalto.fi/handle/123456789/27974. Retrieved 28 September 2023.
  • Wani, Mudasir Ahmad; Agarwal, Nancy; Jabin, Suraiya; Hussain, Syed Zesahn (2018). "Design of iMacros-based Data Crawler and the Behavioral Analysis of Facebook Users". arXiv:1802.09566 [cs.SI].
  • Abbasi, Vahid. "Phonetic Analysis and Searching with Google Glass API". https://uub.primo.exlibrisgroup.com/discovery/fulldisplay?docid=alma991018494504807596&context=L&vid=46LIBRIS_UUB:UUB&lang=en&search_scope=MyInst_and_CI&adaptor=Local%20Search%20Engine&tab=Everything&query=creator,contains,vahid%20abbasi&offset=0.

See also