gazpacho

Name: gazpacho
Version: 1.1
Home page: https://github.com/maxhumber/gazpacho
Summary: The simple, fast, and modern web scraping library
Upload time: 2020-10-09 12:50:18
Author: Max Humber
Requires Python: >=3.6
License: MIT
Keywords: web scraping, beautifulsoup, requests
Requirements: none recorded

<h3 align="center">
  <img src="https://raw.githubusercontent.com/maxhumber/gazpacho/master/images/gazpacho.png" height="300px" alt="gazpacho">
</h3>
<p align="center">
  <a href="https://travis-ci.org/maxhumber/gazpacho"><img alt="Travis" src="https://img.shields.io/travis/maxhumber/gazpacho.svg"></a>
  <a href="https://pypi.python.org/pypi/gazpacho"><img alt="PyPI" src="https://img.shields.io/pypi/v/gazpacho.svg"></a>
	<a href="https://pypi.python.org/pypi/gazpacho"><img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/gazpacho.svg"></a>
  <a href="https://pepy.tech/project/gazpacho"><img alt="Downloads" src="https://pepy.tech/badge/gazpacho"></a>  
</p>




## About

gazpacho is a simple, fast, and modern web scraping library. The library is stable, actively maintained, and installed with **zero** dependencies.



## Install

Install with `pip` at the command line:

```
pip install -U gazpacho
```



## Quickstart

Give this a try:

```python
from gazpacho import get, Soup

url = 'https://scrape.world/books'
html = get(url)
soup = Soup(html)
books = soup.find('div', {'class': 'book-'}, partial=True)

def parse(book):
    name = book.find('h4').text
    price = float(book.find('p').text[1:].split(' ')[0])
    return name, price

[parse(book) for book in books]
```
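
The comprehension yields a list of `(name, price)` tuples, one per matched book. As an illustrative follow-up (not part of the library), the results could be collected into a dictionary keyed by title:

```python
# Illustrative: map each book title to its price
prices = dict(parse(book) for book in books)
```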



## Tutorial

#### Import

Import gazpacho following the convention:

```python
from gazpacho import get, Soup
```



#### get

Use the `get` function to download raw HTML:

```python
url = 'https://scrape.world/soup'
html = get(url)
print(html[:50])
# '<!DOCTYPE html>\n<html lang="en">\n  <head>\n    <met'
```

Adjust `get` requests with the optional `params` and `headers` arguments:

```python
get(
    url='https://httpbin.org/anything',
    params={'foo': 'bar', 'bar': 'baz'},
    headers={'User-Agent': 'gazpacho'}
)
```
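
The `params` dictionary is URL-encoded into the query string, so the call above should be roughly equivalent to requesting the fully assembled URL by hand (a sketch of the equivalence, not an additional feature):

```python
# Roughly equivalent: the query string written out manually
get('https://httpbin.org/anything?foo=bar&bar=baz',
    headers={'User-Agent': 'gazpacho'})
```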



#### Soup

Use the `Soup` wrapper on raw HTML to enable parsing:

```python
soup = Soup(html)
```

Soup objects can also be initialized with the `.get` classmethod:

```python
soup = Soup.get(url)
```
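
`Soup.get(url)` combines the two earlier steps, downloading and wrapping in one call (a sketch of the equivalence):

```python
# Equivalent to the classmethod: download, then wrap
soup = Soup(get(url))
```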



#### .find

Use the `.find` method to target and extract HTML tags:

```python
h1 = soup.find('h1')
print(h1)
# <h1 id="firstHeading" class="firstHeading" lang="en">Soup</h1>
```



#### attrs=

Use the `attrs` argument to isolate tags that contain specific HTML element attributes:

```python
soup.find('div', attrs={'class': 'section-'})
```
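
As in the Quickstart, the same dictionary can also be passed positionally as the second argument:

```python
soup.find('div', {'class': 'section-'})
```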



#### partial=

Element attributes are partially matched by default. Turn this off by setting `partial` to `False`:  

```python
soup.find('div', {'class': 'soup'}, partial=False)
```
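
In other words, with the default `partial=True` the attribute value only needs to contain the query string, while `partial=False` requires an exact match. A minimal illustration with hypothetical class names (not elements from the tutorial page):

```python
# Hypothetical: a page with <div class="soup"> and <div class="soup-of-the-day">
soup.find('div', {'class': 'soup'})                 # partial matching (default) hits both
soup.find('div', {'class': 'soup'}, partial=False)  # matches only class="soup"
```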



#### mode=

Override the `mode` argument (one of `'auto'`, `'first'`, `'all'`) to control the return behaviour:

```python
print(soup.find('span', mode='first'))
# <span class="navbar-toggler-icon"></span>
len(soup.find('span', mode='all'))
# 8
```
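
With the default `mode='auto'`, the return value depends on the match count: a single `Soup` for one match, a list for several, and `None` for none. A defensive pattern for normalizing the result (an assumption about typical usage, not library code):

```python
# Normalize an 'auto' result so downstream code always sees a list
spans = soup.find('span')  # may be None, a single Soup, or a list of Soups
if spans is None:
    spans = []
elif not isinstance(spans, list):
    spans = [spans]
```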



#### dir()

`Soup` objects have `html`, `tag`, `attrs`, and `text` attributes:

```python
dir(h1)
# ['attrs', 'find', 'get', 'html', 'strip', 'tag', 'text']
```

Use them accordingly:

```python
print(h1.html)
# '<h1 id="firstHeading" class="firstHeading" lang="en">Soup</h1>'
print(h1.tag)
# h1
print(h1.attrs)
# {'id': 'firstHeading', 'class': 'firstHeading', 'lang': 'en'}
print(h1.text)
# Soup
```
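
Because `attrs` is a plain dictionary, individual attributes can be read with normal dict access (illustrative):

```python
print(h1.attrs.get('id'))
# firstHeading
```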



## Support

If you use gazpacho, consider adding the [![scraper: gazpacho](https://img.shields.io/badge/scraper-gazpacho-C6422C)](https://github.com/maxhumber/gazpacho) badge to your project README.md:

```markdown
[![scraper: gazpacho](https://img.shields.io/badge/scraper-gazpacho-C6422C)](https://github.com/maxhumber/gazpacho)
```



## Contribute

For feature requests or bug reports, please use [GitHub Issues](https://github.com/maxhumber/gazpacho/issues).

For PRs, please read the [CONTRIBUTING.md](https://github.com/maxhumber/gazpacho/blob/master/CONTRIBUTING.md) document.
            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/maxhumber/gazpacho",
    "name": "gazpacho",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "web scraping,BeautifulSoup,requests",
    "author": "Max Humber",
    "author_email": "max.humber@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/1d/65/3151b3837e9fa0fa535524c56e535f88910c10a3703487d9aead154c1339/gazpacho-1.1.tar.gz",
    "platform": "",
    "description": "<h3 align=\"center\">\n  <img src=\"https://raw.githubusercontent.com/maxhumber/gazpacho/master/images/gazpacho.png\" height=\"300px\" alt=\"gazpacho\">\n</h3>\n<p align=\"center\">\n  <a href=\"https://travis-ci.org/maxhumber/gazpacho\"><img alt=\"Travis\" src=\"https://img.shields.io/travis/maxhumber/gazpacho.svg\"></a>\n  <a href=\"https://pypi.python.org/pypi/gazpacho\"><img alt=\"PyPI\" src=\"https://img.shields.io/pypi/v/gazpacho.svg\"></a>\n\t<a href=\"https://pypi.python.org/pypi/gazpacho\"><img alt=\"PyPI - Python Version\" src=\"https://img.shields.io/pypi/pyversions/gazpacho.svg\"></a>\n  <a href=\"https://pepy.tech/project/gazpacho\"><img alt=\"Downloads\" src=\"https://pepy.tech/badge/gazpacho\"></a>  \n</p>\n\n\n\n\n## About\n\ngazpacho is a simple, fast, and modern web scraping library. The library is stable, actively maintained, and installed with **zero** dependencies.\n\n\n\n## Install\n\nInstall with `pip` at the command line:\n\n```\npip install -U gazpacho\n```\n\n\n\n## Quickstart\n\nGive this a try:\n\n```python\nfrom gazpacho import get, Soup\n\nurl = 'https://scrape.world/books'\nhtml = get(url)\nsoup = Soup(html)\nbooks = soup.find('div', {'class': 'book-'}, partial=True)\n\ndef parse(book):\n    name = book.find('h4').text\n    price = float(book.find('p').text[1:].split(' ')[0])\n    return name, price\n\n[parse(book) for book in books]\n```\n\n\n\n## Tutorial\n\n#### Import\n\nImport gazpacho following the convention:\n\n```python\nfrom gazpacho import get, Soup\n```\n\n\n\n#### get\n\nUse the `get` function to download raw HTML:\n\n```python\nurl = 'https://scrape.world/soup'\nhtml = get(url)\nprint(html[:50])\n# '<!DOCTYPE html>\\n<html lang=\"en\">\\n  <head>\\n    <met'\n```\n\nAdjust `get` requests with optional params and headers:\n\n```python\nget(\n    url='https://httpbin.org/anything',\n    params={'foo': 'bar', 'bar': 'baz'},\n    headers={'User-Agent': 'gazpacho'}\n)\n```\n\n\n\n#### Soup\n\nUse the `Soup` wrapper on raw html to enable parsing:\n\n```python\nsoup = Soup(html)\n```\n\nSoup objects can alternatively be initialized with the  `.get` classmethod:\n\n```python\nsoup = Soup.get(url)\n```\n\n\n\n#### .find\n\nUse the `.find` method to target and extract HTML tags:\n\n```python\nh1 = soup.find('h1')\nprint(h1)\n# <h1 id=\"firstHeading\" class=\"firstHeading\" lang=\"en\">Soup</h1>\n```\n\n\n\n#### attrs=\n\nUse the `attrs` argument to isolate tags that contain specific HTML element attributes:\n\n```python\nsoup.find('div', attrs={'class': 'section-'})\n```\n\n\n\n#### partial=\n\nElement attributes are partially matched by default. 
Turn this off by setting `partial` to `False`:  \n\n```python\nsoup.find('div', {'class': 'soup'}, partial=False)\n```\n\n\n\n#### mode=\n\nOverride the mode argument {`'auto', 'first', 'all'`} to guarantee return behaviour:\n\n```python\nprint(soup.find('span', mode='first'))\n# <span class=\"navbar-toggler-icon\"></span>\nlen(soup.find('span', mode='all'))\n# 8\n```\n\n\n\n#### dir()\n\n`Soup` objects have `html`, `tag`, `attrs`, and `text` attributes:\n\n```python\ndir(h1)\n# ['attrs', 'find', 'get', 'html', 'strip', 'tag', 'text']\n```\n\nUse them accordingly:\n\n```python\nprint(h1.html)\n# '<h1 id=\"firstHeading\" class=\"firstHeading\" lang=\"en\">Soup</h1>'\nprint(h1.tag)\n# h1\nprint(h1.attrs)\n# {'id': 'firstHeading', 'class': 'firstHeading', 'lang': 'en'}\nprint(h1.text)\n# Soup\n```\n\n\n\n## Support\n\nIf you use gazpacho, consider adding the [![scraper: gazpacho](https://img.shields.io/badge/scraper-gazpacho-C6422C)](https://github.com/maxhumber/gazpacho) badge to your project README.md:\n\n```markdown\n[![scraper: gazpacho](https://img.shields.io/badge/scraper-gazpacho-C6422C)](https://github.com/maxhumber/gazpacho)\n```\n\n\n\n## Contribute\n\nFor feature requests or bug reports, please use [Github Issues](https://github.com/maxhumber/gazpacho/issues)\n\nFor PRs, please read the [CONTRIBUTING.md](https://github.com/maxhumber/gazpacho/blob/master/CONTRIBUTING.md) document",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "The simple, fast, and modern web scraping library",
    "version": "1.1",
    "project_urls": {
        "Homepage": "https://github.com/maxhumber/gazpacho"
    },
    "split_keywords": [
        "web scraping",
        "beautifulsoup",
        "requests"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1d653151b3837e9fa0fa535524c56e535f88910c10a3703487d9aead154c1339",
                "md5": "b5f3c09706b6a3c3f0963eb3e888a57e",
                "sha256": "1579c1be2de05b5ded0a97107b179d12491392fb095aeab185b283ea48cd7010"
            },
            "downloads": -1,
            "filename": "gazpacho-1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "b5f3c09706b6a3c3f0963eb3e888a57e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 7872,
            "upload_time": "2020-10-09T12:50:18",
            "upload_time_iso_8601": "2020-10-09T12:50:18.025143Z",
            "url": "https://files.pythonhosted.org/packages/1d/65/3151b3837e9fa0fa535524c56e535f88910c10a3703487d9aead154c1339/gazpacho-1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2020-10-09 12:50:18",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "maxhumber",
    "github_project": "gazpacho",
    "travis_ci": true,
    "coveralls": false,
    "github_actions": true,
    "lcname": "gazpacho"
}
        