Software:Norconex Web Crawler
Latest revision as of 18:41, 15 May 2024
Other names | Norconex HTTP Collector |
---|---|
Developer(s) | Norconex Inc. |
Initial release | 2016 |
Stable release | 3.0.2 / 2022-01-05 |
Repository | GitHub Repository |
Written in | Java |
Operating system | Cross-platform |
License | Apache License |
Website | Norconex Web Crawler |
Norconex Web Crawler is free and open-source web crawling and web scraping software written in Java and released under the Apache License. It can export data to many repositories, such as Apache Solr, Elasticsearch,[1] Microsoft Azure Cognitive Search, Amazon CloudSearch and more.[2][3][4]
The crawler can be run on its own or embedded in a Java application.[5][6]
Some key features are:
- Multi-threaded crawling
- Text extraction from a variety of file formats (HTML, PDF, Word, etc.)
- Extraction of metadata associated with documents
- Support for pages rendered with JavaScript
- Incremental crawls
- Support for external commands to parse or manipulate documents
- Export of extracted data to a variety of repositories
Some well-known organizations and products using Norconex Web Crawler are the Apache Solr ecosystem, the Department of National Defence, Universities Canada, and the U.S. Department of Education.[7][8]
History
Norconex Web Crawler was released as free and open-source software in 2013.[9]
References
- ↑ "Enhance Your Search Capabilities with Norconex Web Crawler: Indexing Data to Elasticsearch". 12 April 2024. https://ohtwadi.medium.com/enhance-your-search-capabilities-with-norconex-web-crawler-indexing-data-to-elasticsearch-1a3e7b7d3617.
- ↑ "Committers". https://opensource.norconex.com/committers/.
- ↑ Hoppa, Jocelyn (10 February 2020). "Importing Data from the Web with Norconex & Neo4j" (in en). https://neo4j.com/blog/importing-data-from-the-web-norconex-neo4j/.
- ↑ "Deploy a Norconex HTTP Collector Indexer Plugin | Cloud Search" (in en). https://developers.google.com/cloud-search/docs/guides/norconex-http-connector.
- ↑ Valcheva, Silvia (11 February 2018). "10 Best Open Source Web Crawlers: Web Data Extraction Software". https://www.intellspot.com/open-source-web-crawlers/.
- ↑ "Norconex HTTP Collector". 9 July 2023. https://www.softpedia.com/get/Internet/Other-Internet-Related/Norconex-HTTP-Collector.shtml.
- ↑ "SolrEcosystem - Solr - Apache Software Foundation". https://cwiki.apache.org/confluence/display/solr/SolrEcosystem.
- ↑ "Norconex Crawler Users". https://opensource.norconex.com/crawlers/usedby.
- ↑ "Norconex Gives Back to Open-Source – Norconex Inc" (in en-US). https://norconex.com/norconex-gives-back-to-open-source/.
Mentions in Academic Research
- Kancherla, Vinay (1 December 2014). "A Smart Web Crawler for a Concept Based Semantic Search Engine (pg. 18)". Master's Projects. doi:10.31979/etd.ubfy-s3es. https://scholarworks.sjsu.edu/etd_projects/380/. Retrieved 28 September 2023.
- Horváth, Balázs (28 August 2017) (in en). Recommendation Techniques for smart cities (pg. 12). https://aaltodoc.aalto.fi/handle/123456789/27974. Retrieved 28 September 2023.
- Wani, Mudasir Ahmad; Agarwal, Nancy; Jabin, Suraiya; Hussain, Syed Zesahn (2018). "Design of iMacros-based Data Crawler and the Behavioral Analysis of Facebook Users". arXiv:1802.09566 [cs.SI].
- Abbasi, Vahid. "Phonetic Analysis and Searching with Google Glass API" (in en). https://uub.primo.exlibrisgroup.com/discovery/fulldisplay?docid=alma991018494504807596&context=L&vid=46LIBRIS_UUB:UUB&lang=en&search_scope=MyInst_and_CI&adaptor=Local%20Search%20Engine&tab=Everything&query=creator,contains,vahid%20abbasi&offset=0.
See also
- Mitchell, Pete (8 April 2022). "25 Best Free Web Crawler Tools". https://techcult.com/best-free-web-crawler-tools/.
- "19 Best Web Crawling Tools for Efficient Data Extraction". https://crawlbase.com/blog/best-web-crawling-tools/.
Original source: https://en.wikipedia.org/wiki/Norconex Web Crawler.