# Crawlab AI SDK
This is the Python SDK for [Crawlab AI](https://www.crawlab.cn/ai), an AI-powered web scraping platform maintained
by [Crawlab](https://www.crawlab.cn).
## Installation
```bash
pip install crawlab-ai
```
## Prerequisites
An API token is required to use this SDK. You can get the API token from
the [Crawlab official website](https://dev.crawlab.io/ai).
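The SDK needs this token at runtime. How it is supplied can vary by version; a common pattern is to set it as an environment variable before the SDK is used. A minimal sketch, assuming a variable name such as `CRAWLAB_AI_TOKEN` (the exact name is an assumption, so check the SDK documentation):

```python
import os

# Assumption: the SDK reads the API token from the environment.
# The variable name below is illustrative, not a documented constant.
os.environ["CRAWLAB_AI_TOKEN"] = "your-api-token"
```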
## Usage
### Get data from a list page
```python
from crawlab_ai import read_list
# Define the target list page URL
url = "https://example.com"
# Get the data without specifying fields
df = read_list(url=url)
print(df)
# You can also specify fields
fields = ["title", "content"]
df = read_list(url=url, fields=fields)
# You can also return a list of dictionaries instead of a DataFrame
data = read_list(url=url, as_dataframe=False)
print(data)
```
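Because `read_list` returns a pandas DataFrame by default, the usual pandas tooling applies to the result. For example, persisting the scraped rows to CSV (standard pandas API, not SDK-specific):

```python
from crawlab_ai import read_list

# The DataFrame returned by read_list supports standard pandas I/O.
df = read_list(url="https://example.com", fields=["title", "content"])
df.to_csv("results.csv", index=False)
```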
## Usage with Scrapy
Create a Scrapy spider by extending `ScrapyListSpider`:
```python
from crawlab_ai import ScrapyListSpider
class MySpider(ScrapyListSpider):
    name = "my_spider"
    start_urls = ["https://example.com"]
    fields = ["title", "content"]
```
Then run the spider:
```bash
scrapy crawl my_spider
```
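If you prefer not to shell out to the `scrapy` CLI, the same spider can be launched from plain Python with Scrapy's standard `CrawlerProcess`; this is stock Scrapy, using the spider class defined above:

```python
from scrapy.crawler import CrawlerProcess

from crawlab_ai import ScrapyListSpider


class MySpider(ScrapyListSpider):
    name = "my_spider"
    start_urls = ["https://example.com"]
    fields = ["title", "content"]


# CrawlerProcess runs the spider in-process, without the scrapy CLI.
process = CrawlerProcess()
process.crawl(MySpider)
process.start()  # blocks until the crawl finishes
```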