Beautiful Soup (HTML parser)
Original author(s) | Leonard Richardson |
---|---|
Initial release | 2004 |
Written in | Python |
Platform | Python |
Type | HTML parser library, Web scraping |
License | Python Software Foundation License (Beautiful Soup 3, an older version); MIT License (versions 4 and up)[1] |
Website | www |
Beautiful Soup is a Python package for parsing HTML and XML documents, including those with malformed markup. It builds a parse tree from a document, which can then be used to extract data from HTML,[2] a capability that is useful for web scraping.[1][3]
Beautiful Soup was started by Leonard Richardson, who continues to contribute to the project.[4] It is also supported by Tidelift, a paid subscription service for open-source maintenance.[5]
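Beautiful Soup's tolerance of malformed markup can be seen on a small fragment. The following is a minimal sketch, assuming the beautifulsoup4 package is installed; the HTML fragment is invented for illustration. How broken markup gets repaired depends on the parser backend (Python's built-in html.parser here), so other backends may build a slightly different tree.

```python
from bs4 import BeautifulSoup

# Illustrative fragment: the <b> tag is never closed, so the markup is malformed.
broken = "<p>Hello <b>world</p>"

# html.parser is the backend that ships with Python's standard library.
soup = BeautifulSoup(broken, "html.parser")

# The parse still yields a usable tree: both elements can be located and read.
print(soup.p.get_text())      # text of the whole paragraph
print(soup.b.get_text())      # text of the bold element
print(soup.b.parent.name)     # the <b> element ends up nested inside <p>
```

With html.parser, the dangling <b> is closed when the parser reaches </p>, so the bold text stays inside the paragraph instead of breaking the document.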
Code example
Beautiful Soup represents parsed data as a tree that can be searched and iterated over with ordinary Python loops.[6] The example below uses urllib[7] from the Python standard library to load Wikipedia's main page, then uses Beautiful Soup to parse the document and print the target of every link in it.

    #!/usr/bin/env python3
    # Anchor extraction from HTML document
    from urllib.request import urlopen

    from bs4 import BeautifulSoup

    with urlopen('https://en.wikipedia.org/wiki/Main_Page') as response:
        # Parse the response with Python's built-in html.parser backend
        soup = BeautifulSoup(response, 'html.parser')
        # find_all('a') returns every <a> element; print its href,
        # falling back to '/' when an anchor has no href attribute
        for anchor in soup.find_all('a'):
            print(anchor.get('href', '/'))
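The same kind of search can be tried without network access by parsing a small inline document. This is a minimal sketch, assuming beautifulsoup4 is installed; the HTML string and the variable names (hrefs, external, first_item) are invented for illustration. It also shows attribute filtering with find_all (href=True keeps only anchors that have an href, class_ matches the CSS class) and moving up the tree with .parent.

```python
from bs4 import BeautifulSoup

# Illustrative document with two real links and one anchor lacking an href.
html = """
<html><body>
  <h1>Links</h1>
  <ul>
    <li><a href="/wiki/Python">Python</a></li>
    <li><a href="https://example.org/" class="external">Example</a></li>
    <li><a name="no-href">No link</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like result, so an ordinary for loop works;
# .get returns None when the attribute is missing.
for anchor in soup.find_all("a"):
    print(anchor.get_text(), anchor.get("href"))

# href=True matches only tags that actually carry an href attribute.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]

# class_ (trailing underscore, since "class" is a Python keyword) filters on CSS class.
external = soup.find_all("a", class_="external")

# Navigation: .parent moves from the first <a> up to its enclosing <li>.
first_item = soup.find("a").parent

print(hrefs)
print(first_item.name)
```

Filtering inside find_all keeps the selection logic in one call, while the plain loop over its results is the "ordinary Python" iteration the paragraph above describes.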
History
Beautiful Soup is named after both a poem in Alice's Adventures in Wonderland[8] and tag soup, a term for poorly formed HTML.[9]
Beautiful Soup 3 was the official release line of Beautiful Soup from May 2006 to March 2012. The current release line is Beautiful Soup 4.x, which can be installed with pip:

    pip install beautifulsoup4
Support for Python 2.7 was retired in 2021; release 4.9.3 was the last to support it.[10]
References
- [1] "Beautiful Soup website". http://www.crummy.com/software/BeautifulSoup/#Download. Retrieved 18 April 2012. "Beautiful Soup is licensed under the same terms as Python itself"
- [2] Hajba, Gábor László (2018). "Using Beautiful Soup". Website Scraping with Python: Using BeautifulSoup and Scrapy. Apress. pp. 41–96. doi:10.1007/978-1-4842-3925-4_3. ISBN 978-1-4842-3925-4.
- [3] Real Python. "Beautiful Soup: Build a Web Scraper With Python". https://realpython.com/beautiful-soup-web-scraper-python/.
- [4] "Code : Leonard Richardson". https://code.launchpad.net/%7Eleonardr/+branches.
- [5] Tidelift. "beautifulsoup4 | pypi via the Tidelift Subscription". https://tidelift.com/subscription/pkg/pypi-beautifulsoup4.
- [6] "How To Scrape Web Pages with Beautiful Soup and Python 3". DigitalOcean. https://www.digitalocean.com/community/tutorials/how-to-scrape-web-pages-with-beautiful-soup-and-python-3.
- [7] Real Python. "Python's urllib.request for HTTP Requests". https://realpython.com/urllib-request/.
- [8] makcorps (2022-12-13). "BeautifulSoup tutorial: Let's Scrape Web Pages with Python". https://www.scrapingdog.com/blog/beautifulsoup-tutorial-web-scraping-with-python/.
- [9] "Python Web Scraping" (2021-02-11). https://www.udacity.com/blog/2021/02/python-web-scraping.html.
- [10] Richardson, Leonard (7 Sep 2021). "Beautiful Soup 4.10.0". Google Groups. https://groups.google.com/g/beautifulsoup/c/flWqqlrcJ9s.
Original source: https://en.wikipedia.org/wiki/Beautiful Soup (HTML parser).