spiderwebai-py 0.1.4

Name: spiderwebai-py
Version: 0.1.4
Home page: https://github.com/spider-rs/spiderwebai-clients/tree/main/python
Summary: Python SDK for SpiderWebAI API
Author: Spider
Upload time: 2024-04-22 08:36:01
# SpiderWebAI Python SDK

The SpiderWebAI Python SDK offers a toolkit for straightforward website scraping, crawling at scale, and utilities such as link extraction and screenshot capture, so you can collect data formatted for use with large language models (LLMs). It provides a simple interface for seamless integration with the SpiderWebAI API.

## Installation

To install the SpiderWebAI Python SDK, you can use pip:

```bash
pip install spiderwebai-py
```

## Usage

1. Get an API key from [spiderwebai.xyz](https://spiderwebai.xyz)
2. Set the API key as an environment variable named `SPIDER_API_KEY` or pass it as a parameter to the `SpiderWebAIApp` class.
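For example, the environment variable can be set in-process before constructing the app. The variable name comes from the step above; that the constructor reads it implicitly when `api_key` is omitted is an assumption, so that call is left commented out:

```python
import os

# Set the key for the current process; `export SPIDER_API_KEY=...` in the
# shell achieves the same for child processes.
os.environ["SPIDER_API_KEY"] = "your_api_key"

# Assumption: with the variable set, the constructor can be called without
# an explicit api_key argument:
# app = SpiderWebAIApp()
```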

Here's an example of how to use the SDK:

```python
from spiderwebai import SpiderWebAIApp

# Initialize the SpiderWebAIApp with your API key
app = SpiderWebAIApp(api_key='your_api_key')

# Scrape a single URL
url = 'https://spiderwebai.xyz'
scraped_data = app.scrape_url(url)

# Crawl a website
crawler_params = {
    'limit': 1,
    'proxy_enabled': True,
    'store_data': False,
    'metadata': False,
    'request': 'http'
}
crawl_result = app.crawl_url(url, params=crawler_params)
```

### Scraping a URL

To scrape data from a single URL:

```python
url = 'https://example.com'
scraped_data = app.scrape_url(url)
```

### Crawling a Website

To automate crawling a website:

```python
url = 'https://example.com'
crawl_params = {
    'limit': 200,
    'request': 'smart_mode'
}
crawl_result = app.crawl_url(url, params=crawl_params)
```

### Retrieving Links from a URL

Extract all links from a specified URL:

```python
url = 'https://example.com'
links = app.links(url)
```

### Taking Screenshots of a URL

Capture a screenshot of a given URL:

```python
url = 'https://example.com'
screenshot = app.screenshot(url)
```

### Extracting Contact Information

Extract contact details from a specified URL:

```python
url = 'https://example.com'
contacts = app.extract_contacts(url)
```

### Labeling Data from a URL

Label the data extracted from a particular URL:

```python
url = 'https://example.com'
labeled_data = app.label(url)
```

### Checking Available Credits

You can check the remaining credits on your account:

```python
credits = app.get_credits()
```

## Streaming

If you need to stream the request, set the third parameter to `True`:

```python
url = 'https://example.com'

crawler_params = {
    'limit': 1,
    'proxy_enabled': True,
    'store_data': False,
    'metadata': False,
    'request': 'http'
}

links = app.links(url, crawler_params, True)
```

## Content-Type

The following `Content-Type` headers are supported via the fourth parameter:

1. `application/json`
2. `text/csv`
3. `application/xml`
4. `application/jsonl`

```python
url = 'https://example.com'

crawler_params = {
    'limit': 1,
    'proxy_enabled': True,
    'store_data': False,
    'metadata': False,
    'request': 'http'
}

# stream JSON lines back to the client
crawl_result = app.crawl_url(url, crawler_params, True, "application/jsonl")
```
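Independent of the SDK, an `application/jsonl` response is simply one standalone JSON object per line. A minimal sketch of consuming such a stream (the payload and field names below are illustrative, not the actual API schema):

```python
import json

# Simulated application/jsonl payload: one JSON object per line.
jsonl_payload = (
    '{"url": "https://example.com", "status": 200}\n'
    '{"url": "https://example.com/about", "status": 200}\n'
)

# Each non-empty line parses independently, so results can be processed
# as they arrive instead of buffering the whole response.
records = [json.loads(line) for line in jsonl_payload.splitlines() if line.strip()]
for record in records:
    print(record["url"], record["status"])
```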

## Error Handling

The SDK handles errors returned by the SpiderWebAI API and raises appropriate exceptions. If an error occurs during a request, an exception will be raised with a descriptive error message.
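The concrete exception classes are not listed in this README, so the sketch below catches broadly; `safe_scrape` is a hypothetical helper, not part of the SDK:

```python
def safe_scrape(app, url):
    """Return scraped data, or None if the SDK raises an API error."""
    try:
        return app.scrape_url(url)
    except Exception as exc:  # broad catch: exact exception types undocumented
        print(f"Scrape of {url} failed: {exc}")
        return None
```

In real code, narrow the `except` clause to the specific exception types the SDK raises once you have confirmed them in its source.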

## Contributing

Contributions to the SpiderWebAI Python SDK are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.

## License

The SpiderWebAI Python SDK is open-source and released under the [MIT License](https://opensource.org/licenses/MIT).

            
