scrapfly-sdk

* Name: scrapfly-sdk
* Version: 0.8.16
* Home page: https://github.com/scrapfly/python-sdk
* Summary: Scrapfly SDK for Scrapfly
* Author: Scrapfly (tech@scrapfly.io)
* Requires Python: >=3.6
* License: BSD
* Keywords: scraping, web scraping, data extraction, scrapfly, sdk, cloud, scrapy
* Upload time: 2024-04-19 02:33:04
# Scrapfly SDK

## Installation

`pip install scrapfly-sdk`

You can also install extra dependencies:

* `pip install "scrapfly-sdk[seepdup]"` for performance improvement
* `pip install "scrapfly-sdk[concurrency]"` for concurrency out of the box (asyncio / thread)
* `pip install "scrapfly-sdk[scrapy]"` for scrapy integration
* `pip install "scrapfly-sdk[all]"` for everything

To use the built-in HTML parser (via the `ScrapeApiResponse.selector` property), you must also install either [parsel](https://pypi.org/project/parsel/) or [Scrapy](https://pypi.org/project/Scrapy/).

For usage references and examples, please check out the `/examples` folder in this repository.

## Get Your API Key

You can create a free account on [Scrapfly](https://scrapfly.io/register) to get your API Key.
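The SDK takes the API key as a plain string. A common pattern (not part of the SDK itself) is to read the key from an environment variable so it never ends up hard-coded in source control; the helper and variable names below are illustrative:

```python
import os

def get_api_key(env_var="SCRAPFLY_API_KEY"):
    # Illustrative helper: look up the Scrapfly API key in the
    # environment and fail loudly if it is missing.
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} to your Scrapfly API key")
    return key
```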

* [Usage](https://scrapfly.io/docs/sdk/python)
* [Python API](https://scrapfly.github.io/python-scrapfly/scrapfly)
* [Open API 3 Spec](https://scrapfly.io/docs/openapi#get-/scrape) 
* [Scrapy Integration](https://scrapfly.io/docs/sdk/scrapy)

## Migration

### Migrate from 0.7.x to 0.8

The asyncio-pool dependency has been dropped.

`scrapfly.concurrent_scrape` is now an async generator. If `concurrency` is `None` or not set, the maximum concurrency allowed by
your current subscription is used.

```python
async for result in scrapfly.concurrent_scrape(concurrency=10, scrape_configs=[ScrapeConfig(...), ...]):
    print(result)
```
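The concurrency cap this exposes can be sketched with `asyncio.Semaphore`: at most `concurrency` requests are in flight at once, and results are yielded as they complete. This is a minimal stdlib illustration of the pattern, not the SDK's actual implementation (the real HTTP call is replaced by a placeholder):

```python
import asyncio

async def concurrent_scrape(scrape_configs, concurrency=2):
    # Bound the number of in-flight tasks with a semaphore.
    semaphore = asyncio.Semaphore(concurrency)

    async def run_one(config):
        async with semaphore:
            await asyncio.sleep(0)  # stand-in for the real scrape request
            return f"scraped:{config}"

    tasks = [asyncio.create_task(run_one(c)) for c in scrape_configs]
    # Yield each result as soon as its task finishes, not in input order.
    for task in asyncio.as_completed(tasks):
        yield await task

async def main():
    results = []
    async for result in concurrent_scrape(["a", "b", "c"], concurrency=2):
        results.append(result)
    return results

print(asyncio.run(main()))
```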

The `brotli` argument is deprecated and will be removed in the next minor release. In most cases it offers no size benefit
over gzip and uses more CPU.

### What's new

### 0.8.x

* Better error logs
* Async improvements for concurrent scraping with asyncio
* Scrapy media pipelines are now supported out of the box
