Software:Norconex Web Crawler
From HandWiki
Other names | Norconex HTTP Collector |
---|---|
Developer(s) | Norconex Inc. |
Initial release | 2016 |
Stable release | 3.0.2 / 2022-01-05 |
Repository | GitHub Repository |
Written in | Java |
Operating system | Cross-platform |
License | Apache License |
Website | Norconex Web Crawler |
Norconex Web Crawler is a free and open-source web crawling and web scraping software written in Java and released under the Apache License. It can send crawled data to many repositories, such as Apache Solr, Elasticsearch, Microsoft Azure Cognitive Search and Amazon CloudSearch, among others.[1][2][3]
The crawler can run as a standalone application or be embedded in another Java application.[4][5]
Key features include:
- Multi-threaded crawling
- Text extraction from a variety of file formats (HTML, PDF, Word, etc.)
- Extraction of metadata associated with documents
- Support for pages rendered with JavaScript
- Incremental crawls
- Support for external commands to parse or manipulate documents
- Export of extracted data to a variety of repositories
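Incremental crawling generally means re-processing only documents that are new or have changed since the previous crawl, and flagging those that have disappeared. The Java sketch below illustrates that general idea by comparing SHA-256 content checksums with those stored from a previous run. It is a hypothetical, stand-alone illustration: the class, enum and method names are invented here, and it does not reproduce Norconex Web Crawler's actual API or implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch (not Norconex's API): classifies URLs for an incremental crawl. */
class IncrementalCrawlSketch {

    enum Status { NEW, MODIFIED, UNCHANGED, DELETED }

    /** Returns a lowercase hex SHA-256 digest of the document content. */
    static String checksum(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Compares the current crawl (URL -> content) with checksums saved from
     * the previous crawl (URL -> checksum). Results are sorted by URL.
     */
    static Map<String, Status> classify(Map<String, String> previousChecksums,
                                        Map<String, String> currentContent) {
        Map<String, Status> result = new TreeMap<>();
        for (Map.Entry<String, String> e : currentContent.entrySet()) {
            String old = previousChecksums.get(e.getKey());
            if (old == null) {
                result.put(e.getKey(), Status.NEW);          // never seen before
            } else if (old.equals(checksum(e.getValue()))) {
                result.put(e.getKey(), Status.UNCHANGED);    // can be skipped
            } else {
                result.put(e.getKey(), Status.MODIFIED);     // must be re-processed
            }
        }
        // URLs seen last time but missing from this crawl are reported as deleted.
        for (String url : previousChecksums.keySet()) {
            if (!currentContent.containsKey(url)) {
                result.put(url, Status.DELETED);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> previous = new HashMap<>();
        previous.put("https://example.com/a", checksum("hello"));
        previous.put("https://example.com/b", checksum("old text"));
        previous.put("https://example.com/c", checksum("gone soon"));

        Map<String, String> current = new HashMap<>();
        current.put("https://example.com/a", "hello");      // same content
        current.put("https://example.com/b", "new text");   // content changed
        current.put("https://example.com/d", "brand new");  // new page

        // Prints {https://example.com/a=UNCHANGED, https://example.com/b=MODIFIED,
        //         https://example.com/c=DELETED, https://example.com/d=NEW}
        System.out.println(classify(previous, current));
    }
}
```

Hashing the content rather than relying only on HTTP headers such as Last-Modified detects real changes even when servers report unreliable dates, at the cost of downloading each page again.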
Norconex Web Crawler is listed in the Apache Solr ecosystem, and its reported users include the Department of National Defence, Universities Canada and the U.S. Department of Education.[6][7]
History
Norconex Web Crawler was released as free and open-source software in 2013.[8]
References
- "Committers". https://opensource.norconex.com/committers/.
- Hoppa, Jocelyn (10 February 2020). "Importing Data from the Web with Norconex & Neo4j". https://neo4j.com/blog/importing-data-from-the-web-norconex-neo4j/.
- "Deploy a Norconex HTTP Collector Indexer Plugin | Cloud Search". https://developers.google.com/cloud-search/docs/guides/norconex-http-connector.
- Valcheva, Silvia (11 February 2018). "10 Best Open Source Web Crawlers: Web Data Extraction Software". https://www.intellspot.com/open-source-web-crawlers/.
- "Norconex HTTP Collector". https://www.softpedia.com/get/Internet/Other-Internet-Related/Norconex-HTTP-Collector.shtml.
- "SolrEcosystem - Solr - Apache Software Foundation". https://cwiki.apache.org/confluence/display/solr/SolrEcosystem.
- "Norconex Crawler Users". https://opensource.norconex.com/crawlers/usedby.
- "Norconex Gives Back to Open-Source – Norconex Inc". https://norconex.com/norconex-gives-back-to-open-source/.
Mentions in academic research
- Kancherla, Vinay (1 December 2014). "A Smart Web Crawler for a Concept Based Semantic Search Engine (pg. 18)". Master's Projects. doi:10.31979/etd.ubfy-s3es. https://scholarworks.sjsu.edu/etd_projects/380/. Retrieved 28 September 2023.
- Horváth, Balázs (28 August 2017). "Recommendation Techniques for smart cities (pg. 12)". https://aaltodoc.aalto.fi/handle/123456789/27974. Retrieved 28 September 2023.
- Wani, Mudasir Ahmad; Agarwal, Nancy; Jabin, Suraiya; Hussain, Syed Zesahn (2018). "Design of iMacros-based Data Crawler and the Behavioral Analysis of Facebook Users". arXiv:1802.09566 [cs.SI].
- Abbasi, Vahid. "Phonetic Analysis and Searching with Google Glass API". https://uub.primo.exlibrisgroup.com/discovery/fulldisplay?docid=alma991018494504807596&context=L&vid=46LIBRIS_UUB:UUB&lang=en&search_scope=MyInst_and_CI&adaptor=Local%20Search%20Engine&tab=Everything&query=creator,contains,vahid%20abbasi&offset=0.
See also
- Mitchell, Pete (8 April 2022). "25 Best Free Web Crawler Tools". https://techcult.com/best-free-web-crawler-tools/.
Original source: https://en.wikipedia.org/wiki/Norconex Web Crawler.