<h3 align="center">
<img src="https://raw.githubusercontent.com/maxhumber/gazpacho/master/images/gazpacho.png" height="300px" alt="gazpacho">
</h3>
<p align="center">
<a href="https://travis-ci.org/maxhumber/gazpacho"><img alt="Travis" src="https://img.shields.io/travis/maxhumber/gazpacho.svg"></a>
<a href="https://pypi.python.org/pypi/gazpacho"><img alt="PyPI" src="https://img.shields.io/pypi/v/gazpacho.svg"></a>
<a href="https://pypi.python.org/pypi/gazpacho"><img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/gazpacho.svg"></a>
<a href="https://pepy.tech/project/gazpacho"><img alt="Downloads" src="https://pepy.tech/badge/gazpacho"></a>
</p>
## About
gazpacho is a simple, fast, and modern web scraping library. The library is stable, actively maintained, and installed with **zero** dependencies.
## Install
Install with `pip` at the command line:
```
pip install -U gazpacho
```
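To confirm the install (an optional check that uses only the standard library and needs Python 3.8+, nothing gazpacho-specific), the installed version can be read back with `importlib.metadata`:
```python
# Optional sanity check: read the installed version back (Python 3.8+)
from importlib.metadata import version

print(version("gazpacho"))  # e.g. '1.1'
```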
## Quickstart
Give this a try:
```python
from gazpacho import get, Soup

url = 'https://scrape.world/books'
html = get(url)
soup = Soup(html)
books = soup.find('div', {'class': 'book-'}, partial=True)

def parse(book):
    name = book.find('h4').text
    price = float(book.find('p').text[1:].split(' ')[0])
    return name, price

[parse(book) for book in books]
```
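The comprehension at the end yields `(name, price)` tuples. As a small follow-on sketch (plain Python only, no extra gazpacho features assumed), the results can be collected and sorted by price:
```python
# Collect the parsed (name, price) tuples and sort by price (plain Python)
results = [parse(book) for book in books]
cheapest_first = sorted(results, key=lambda pair: pair[1])
print(cheapest_first[:3])  # the three cheapest books
```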
## Tutorial
#### Import
Import gazpacho following the convention:
```python
from gazpacho import get, Soup
```
#### get
Use the `get` function to download raw HTML:
```python
url = 'https://scrape.world/soup'
html = get(url)
html[:50]
# '<!DOCTYPE html>\n<html lang="en">\n <head>\n <met'
```
Adjust `get` requests with optional `params` and `headers`:
```python
get(
url='https://httpbin.org/anything',
params={'foo': 'bar', 'bar': 'baz'},
headers={'User-Agent': 'gazpacho'}
)
```
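`params` is encoded into the query string and `headers` are sent with the request. For illustration only (this sketch uses the standard library, not gazpacho), the call above is equivalent to requesting:
```python
from urllib.parse import urlencode

# The params above expand to this query string on the end of the URL
query = urlencode({'foo': 'bar', 'bar': 'baz'})
print(f'https://httpbin.org/anything?{query}')
# https://httpbin.org/anything?foo=bar&bar=baz
```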
#### Soup
Use the `Soup` wrapper on raw HTML to enable parsing:
```python
soup = Soup(html)
```
Soup objects can alternatively be initialized with the `.get` classmethod:
```python
soup = Soup.get(url)
```
#### .find
Use the `.find` method to target and extract HTML tags:
```python
h1 = soup.find('h1')
print(h1)
# <h1 id="firstHeading" class="firstHeading" lang="en">Soup</h1>
```
#### attrs=
Use the `attrs` argument to isolate tags that contain specific HTML element attributes:
```python
soup.find('div', attrs={'class': 'section-'})
```
#### partial=
Element attributes are partially matched by default. Turn this off by setting `partial` to `False`:
```python
soup.find('div', {'class': 'soup'}, partial=False)
```
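As a small illustrative sketch (the HTML fragment here is made up, and the matching behaviour is inferred from the description above), partial matching treats the query value as a substring of the attribute, while `partial=False` requires an exact match:
```python
# Made-up fragment to contrast partial vs exact attribute matching
fragment = Soup('<div class="soup-header">a</div><div class="soup">b</div>')

fragment.find('div', {'class': 'soup'})                 # matches both divs (partial, the default)
fragment.find('div', {'class': 'soup'}, partial=False)  # matches only <div class="soup">b</div>
```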
#### mode=
Override the `mode` argument (`'auto'`, `'first'`, or `'all'`) to guarantee return behaviour:
```python
print(soup.find('span', mode='first'))
# <span class="navbar-toggler-icon"></span>
len(soup.find('span', mode='all'))
# 8
```
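Because `'auto'` chooses the return shape for you, a defensive pattern (a sketch based only on the behaviour shown above) is to ask for `mode='all'` whenever the code iterates over matches:
```python
# Ask for an explicit list when iterating, rather than relying on 'auto'
spans = soup.find('span', mode='all')
for span in spans:
    print(span.text)
```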
#### dir()
`Soup` objects have `html`, `tag`, `attrs`, and `text` attributes:
```python
dir(h1)
# ['attrs', 'find', 'get', 'html', 'strip', 'tag', 'text']
```
Use them accordingly:
```python
print(h1.html)
# <h1 id="firstHeading" class="firstHeading" lang="en">Soup</h1>
print(h1.tag)
# h1
print(h1.attrs)
# {'id': 'firstHeading', 'class': 'firstHeading', 'lang': 'en'}
print(h1.text)
# Soup
```
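These attributes compose with `.find`. For example (a sketch that assumes the page contains anchor tags), link targets can be pulled out of each match's `attrs` dict:
```python
# Collect href values from every anchor tag on the page
anchors = soup.find('a', mode='all')
hrefs = [a.attrs.get('href') for a in anchors if a.attrs]
print(hrefs[:5])
```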
## Support
If you use gazpacho, consider adding the [![scraper: gazpacho](https://img.shields.io/badge/scraper-gazpacho-C6422C)](https://github.com/maxhumber/gazpacho) badge to your project README.md:
```markdown
[![scraper: gazpacho](https://img.shields.io/badge/scraper-gazpacho-C6422C)](https://github.com/maxhumber/gazpacho)
```
## Contribute
For feature requests or bug reports, please use [GitHub Issues](https://github.com/maxhumber/gazpacho/issues).

For PRs, please read the [CONTRIBUTING.md](https://github.com/maxhumber/gazpacho/blob/master/CONTRIBUTING.md) document.