crawlab-ai

Name: crawlab-ai
Version: 0.0.10
Homepage: https://github.com/crawlab-team/crawlab-ai-sdk
Summary: SDK for Crawlab AI
Author: Marvin Zhang
License: MIT
Keywords: crawlab, ai
Upload time: 2024-05-29 13:34:40
Requirements: Scrapy (>=2.4), requests (~=2.31.0), pytest (>=8.0.0), pandas (>=2.2), bs4 (==0.0.2), black (==24.3.0), tabulate (==0.9.0)
# Crawlab AI SDK

This is the Python SDK for [Crawlab AI](https://www.crawlab.cn/ai), an AI-powered web scraping platform maintained
by [Crawlab](https://www.crawlab.cn).

## Installation

```bash
pip install crawlab-ai
```

## Prerequisites

An API token is required to use this SDK. You can get the API token from
the [Crawlab official website](https://dev.crawlab.io/ai).
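
This README does not document how the token is passed to the SDK, so check the project's own documentation for the exact mechanism. As a sketch only, a common convention is an environment variable; the variable name below is hypothetical, not confirmed by this SDK:

```shell
# Hypothetical variable name -- verify the actual name in the Crawlab AI docs
export CRAWLAB_AI_API_TOKEN="your-api-token"
```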

## Usage

### Get data from a list page

```python
from crawlab_ai import read_list

# Define the target list page URL
url = "https://example.com"

# Get the data without specifying fields
df = read_list(url=url)
print(df)

# You can also specify fields
fields = ["title", "content"]
df = read_list(url=url, fields=fields)

# You can also return a list of dictionaries instead of a DataFrame
data = read_list(url=url, as_dataframe=False)
print(data)
```
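
With `as_dataframe=False`, `read_list` returns plain Python records (a list of dictionaries), which you can post-process without pandas. A minimal sketch, using made-up sample records shaped like the output for `fields=["title", "content"]` (real output depends on the scraped page):

```python
# Sample records shaped like read_list(url, fields=["title", "content"], as_dataframe=False)
records = [
    {"title": "First post", "content": "Hello"},
    {"title": "Second post", "content": "World"},
]

# Pull out a single field, analogous to df["title"] on the DataFrame variant
titles = [row["title"] for row in records]
print(titles)
```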

## Usage with Scrapy

Create a Scrapy spider by extending `ScrapyListSpider`:

```python
from crawlab_ai import ScrapyListSpider


class MySpider(ScrapyListSpider):
    name = "my_spider"
    start_urls = ["https://example.com"]
    fields = ["title", "content"]
```

Then run the spider:

```bash
scrapy crawl my_spider
```

            
