# Grab
## Update (2025)
Since 2018 (the year of the most recent Grab release) I had been attempting a large refactoring of
the Grab code base, which ended up as a semi-working product that nobody uses, including me.
I have decided to reset all project files to the state of the most recent PyPI release, 0.6.41, dated June 2018.
At least now the code base corresponds to the live version of the product, which is still used by some people,
according to [pypi stats](https://clickpy.clickhouse.com/dashboard/grab).
I've updated the Grab code base and the code bases of its dependencies to be compatible with Python 2.7 and Python 3.13
(and, hopefully, all Python versions in between). I have set up a GitHub Actions workflow to run all tests on Python 2.7
and Python 3.13. I am going to publish a release of the updated Grab code base to PyPI soon.
There are NO new features. It is just an updated code base that is alive again, i.e. it runs on Python 2.7 and on
modern Python, its tests pass, and it has a GitHub CI config that runs the tests on new commits.
One backward-incompatible change: Grab no longer uses the `weblib.error.DataNotFound` exception. It now uses a
`DataNotFound` exception defined in the `grab.errors` module. So if your code imports `DataNotFound`
from weblib, you should fix those imports, as in the sketch below.
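A minimal before/after sketch of that import fix (module paths taken from the note above):

```python
# Old import, via the weblib dependency (Grab <= 0.6.41):
# from weblib.error import DataNotFound

# New import, from Grab itself:
from grab.errors import DataNotFound
```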
## Support
You are welcome to talk about web scraping and data processing in these Telegram chat groups: [@grablab](https://t.me/grablab) (English) and [@grablab\_ru](https://t.me/grablab_ru) (Russian)
Documentation: https://grab.readthedocs.io/en/stable/
## Installation
Run `pip install -U grab`
See details about installing Grab on different platforms here: https://grab.readthedocs.io/en/stable/usage/installation.html
## What is Grab?
Grab is a Python web scraping framework. Grab provides a number of helpful features
for performing network requests, scraping web sites, and processing the scraped content:
* Automatic cookies (session) support
* HTTP and SOCKS proxy with/without authorization
* Keep-Alive support
* IDN support
* Tools to work with web forms
* Easy multipart file uploading
* Flexible customization of HTTP requests
* Automatic charset detection
* Powerful API to extract data from the DOM tree of HTML documents with XPath queries
* Asynchronous API to make thousands of simultaneous queries. This part of the
  library is called Spider; see the list of Spider features below.
* Python 3 ready
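A small illustration of a few of these features together. This is a sketch, not canonical usage: the `setup` option names (`proxy`, `proxy_type`, `timeout`, `user_agent`) follow the Grab documentation, while the proxy address and user agent string are placeholders.

```python
from grab import Grab

g = Grab()
# Flexible request customization: route traffic through an HTTP proxy,
# set a network timeout and a custom User-Agent header.
g.setup(
    proxy='127.0.0.1:3128',        # placeholder proxy address
    proxy_type='http',
    timeout=10,
    user_agent='my-scraper/1.0',   # placeholder user agent
)
# Cookies are kept automatically between requests on the same Grab instance.
g.go('https://example.com/')
# Extract data from the DOM tree with an XPath query.
print(g.doc.select('//title').text())
```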
Spider is a framework for writing web-site scrapers. Features:
* Rules and conventions for organizing the request/parse logic in separate
  blocks of code
* Multiple parallel network requests
* Automatic processing of network errors (failed tasks go back to task queue)
* You can create network requests and parse responses with the Grab API (see above)
* HTTP proxy support
* Caching network results in permanent storage
* Different backends for the task queue (in-memory, Redis, MongoDB); see the sketch after this list
* Tools to debug and collect statistics
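A minimal sketch of a spider that picks a non-default task queue backend. The `prepare` hook and the `setup_queue(backend=...)` call are assumptions based on the Grab 0.6.x Spider docs, and the URL is a placeholder:

```python
from grab.spider import Spider, Task


class QueueSpider(Spider):
    def prepare(self):
        # Assumed API: switch the task queue from the default in-memory
        # backend to Redis, so tasks can outlive the process.
        self.setup_queue(backend='redis')

    def task_generator(self):
        yield Task('page', url='https://example.com/')

    def task_page(self, grab, task):
        print(grab.doc.select('//title').text())


bot = QueueSpider(thread_number=2)
bot.run()
```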
## Grab Example
```python
import logging

from grab import Grab

logging.basicConfig(level=logging.DEBUG)

g = Grab()

g.go('https://github.com/login')
g.doc.set_input('login', '****')
g.doc.set_input('password', '****')
g.doc.submit()

g.doc.save('/tmp/x.html')

g.doc('//ul[@id="user-links"]//button[contains(@class, "signout")]').assert_exists()

home_url = g.doc('//a[contains(@class, "header-nav-link name")]/@href').text()
repo_url = home_url + '?tab=repositories'

g.go(repo_url)

for elem in g.doc.select('//h3[@class="repo-list-name"]/a'):
    print('%s: %s' % (elem.text(),
                      g.make_url_absolute(elem.attr('href'))))
```
## Grab::Spider Example
```python
import logging

from grab.spider import Spider, Task

logging.basicConfig(level=logging.DEBUG)


class ExampleSpider(Spider):
    def task_generator(self):
        for lang in 'python', 'ruby', 'perl':
            url = 'https://www.google.com/search?q=%s' % lang
            yield Task('search', url=url, lang=lang)

    def task_search(self, grab, task):
        print('%s: %s' % (task.lang,
                          grab.doc('//div[@class="s"]//cite').text()))


bot = ExampleSpider(thread_number=2)
bot.run()
```