botasaurus-api

- Name: botasaurus-api
- Version: 4.0.4
- Home page: https://github.com/omkarcloud/botasaurus-proxy-authentication
- Summary: The Botasaurus API Client provides programmatic access to Botasaurus scrapers with a developer-friendly API.
- Upload time: 2024-05-23 04:31:21
- Author: Chetan Jain
- Requires Python: >=3.6
- License: MIT
- Keywords: seleniumwire proxy authentication, proxy authentication

# Botasaurus API

The Botasaurus API client is a Python library for interacting with Botasaurus Scrapers via an API. 

It provides a simple and convenient way to create, fetch, download, abort, and delete tasks, as well as manage their results.

## Installation

To install the API client, use pip:

```bash
python -m pip install botasaurus_api
```

## Usage

First, import the `Api` class from the library:

```python
from botasaurus_api import Api
```

Then, create an instance of the `Api` class:

```python
api = Api()
```

You can also provide an optional `api_url` parameter to specify the base URL for the API server. If the `api_url` parameter is not provided, it defaults to `http://127.0.0.1:8000`.

```python
api = Api('https://example.com/')
```

Additionally, the API client will create response JSON files in the `output/responses/` directory to help with debugging and development. If you want to disable this feature in production, you can set `create_response_files=False`.

```python
api = Api(create_response_files=False)
```
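
One way to keep both settings out of the code is to read them from the environment. The sketch below is only an illustration: the variable names `BOTASAURUS_API_URL` and `BOTASAURUS_ENV` and the helper `api_settings_from_env` are hypothetical and are not read or provided by the library. Only the `api_url` and `create_response_files` parameters and the default URL come from the text above, and passing `api_url` by keyword is assumed to work.

```python
import os

DEFAULT_API_URL = "http://127.0.0.1:8000"  # default stated in this README


def api_settings_from_env(env=None):
    """Build keyword arguments for Api() from environment variables.

    BOTASAURUS_API_URL and BOTASAURUS_ENV are hypothetical names chosen
    for this sketch; the library itself does not read them.
    """
    env = os.environ if env is None else env
    return {
        "api_url": env.get("BOTASAURUS_API_URL", DEFAULT_API_URL),
        # Only write debug response files outside production.
        "create_response_files": env.get("BOTASAURUS_ENV", "development") != "production",
    }
```

With the client installed, this could be used as `api = Api(**api_settings_from_env())`.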

### Creating Tasks

There are two types of tasks:

- Asynchronous Task
- Synchronous Task

Asynchronous tasks run in the background: the client does not wait for the task to complete. The server returns a response immediately, containing information about the task but not the actual results. The client can then retrieve the results later.

Synchronous tasks, on the other hand, wait for the completion of the task. The server response will contain the results of the task.

Use asynchronous tasks when you want to run a task in the background and retrieve the results later. Synchronous tasks are better suited to a small number of tasks where you are willing to wait and receive the results in the same response.

To create an asynchronous task, use the `create_async_task` method:

```python
data = {'link': 'https://www.omkar.cloud/'}
task = api.create_async_task(data)
```

You can also provide an optional `scraper_name` parameter to specify the scraper to be used for the task. If it is not provided, the default scraper is used:

```python
task = api.create_async_task(data, scraper_name='scrape_heading_task')
```

To create a synchronous task, use the `create_sync_task` method:

```python
data = {'link': 'https://www.omkar.cloud/blog/'}
task = api.create_sync_task(data)
```

You can create multiple asynchronous or synchronous tasks at once using the `create_async_tasks` and `create_sync_tasks` methods, respectively:

```python
data_items = [{'link': 'https://www.omkar.cloud/'}, {'link': 'https://www.omkar.cloud/blog/'}]
tasks = api.create_async_tasks(data_items)
tasks = api.create_sync_tasks(data_items)
```
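
The bulk methods take a list of data dictionaries, so a list of URLs has to be converted first. A minimal sketch of that step follows; `build_data_items` is a hypothetical helper, not part of the library, and the `link` key simply mirrors the examples above (the keys your scraper expects may differ).

```python
def build_data_items(urls):
    """Turn an iterable of URLs into a list of {'link': url} dicts.

    Blank entries are skipped and duplicates are dropped, keeping the
    first occurrence so the original order is preserved.
    """
    seen = set()
    items = []
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        items.append({"link": url})
    return items


data_items = build_data_items([
    "https://www.omkar.cloud/",
    "https://www.omkar.cloud/blog/",
    "https://www.omkar.cloud/",  # duplicate, dropped
])
print(data_items)
```

The resulting list can be passed to `create_async_tasks` or `create_sync_tasks` as shown above.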

### Fetching Tasks

To fetch tasks from the server, use the `get_tasks` method:

```python
tasks = api.get_tasks()
```

By default, all tasks are returned. You can also apply pagination, views, sorts and filters:

```python
tasks = api.get_tasks(
    page=1,
    per_page=10,
    # view='overview',
    # sort='my-sort',
    # filters={'your_filter': 'value'},
)
```

To fetch a specific task by its ID, use the `get_task` method:

```python
task = api.get_task(task_id=1)
```
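
Because an asynchronous task returns before its results exist, a client usually needs some way to wait for it. One possible approach is to poll `get_task` until the task stops changing. The sketch below rests on assumptions this README does not confirm: that a task dictionary has a `status` field, and that `completed`, `failed`, and `aborted` are its final values. `wait_for_task` and `default_is_finished` are hypothetical helpers; pass your own `is_finished` predicate if your server reports state differently.

```python
import time

# Assumption: a task dict carries a 'status' field and these values mean
# the task will not change any further. Adjust to what your server returns.
FINISHED_STATUSES = {"completed", "failed", "aborted"}


def default_is_finished(task):
    return task.get("status") in FINISHED_STATUSES


def wait_for_task(api, task_id, is_finished=default_is_finished,
                  poll_seconds=2.0, timeout_seconds=300.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll api.get_task until is_finished(task) is true or time runs out."""
    deadline = clock() + timeout_seconds
    while True:
        task = api.get_task(task_id=task_id)
        if is_finished(task):
            return task
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout_seconds}s")
        sleep(poll_seconds)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real delays.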

### Fetching Task Results

To fetch the results of a specific task, use the `get_task_results` method:

```python
results = api.get_task_results(task_id=1)
```

You can also apply pagination, views, sorts and filters:

```python
results = api.get_task_results(
    task_id=1,
    page=1,
    per_page=20,
    # view='overview',
    # sort='my_sort',
    # filters={'your_filter': 'value'},
)
```
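
When a task has more results than fit on one page, the `page` and `per_page` parameters can be stepped through in a loop. The following is a rough sketch, not the library's own pagination helper. It assumes that a page is either a plain list or a dictionary holding the list under a `results` key, that a page with fewer than `per_page` items is the last one, and that requesting a page just past the end returns an empty page. None of this is confirmed by this README, so supply your own `get_items` function if the response looks different. `iter_task_results` is a hypothetical name.

```python
def iter_task_results(api, task_id, per_page=20, max_pages=1000, get_items=None):
    """Yield result items page by page via api.get_task_results.

    Assumptions (not confirmed by this README): a page is either a list
    of items or a dict holding the list under 'results', and a page with
    fewer than per_page items is the last one.
    """
    if get_items is None:
        def get_items(page_data):
            if isinstance(page_data, dict):
                return page_data.get("results") or []
            return page_data or []

    # max_pages is a safety cap so a misbehaving server cannot loop forever.
    for page in range(1, max_pages + 1):
        page_data = api.get_task_results(task_id=task_id, page=page, per_page=per_page)
        items = get_items(page_data)
        yield from items
        if len(items) < per_page:
            break
```

Because this is a generator, `for row in iter_task_results(api, task_id=1): ...` fetches pages lazily, one request at a time.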

### Downloading Task Results

To download the results of a specific task in a particular format, use the `download_task_results` method:

```python
results_bytes, filename = api.download_task_results(task_id=1, format='csv')
with open(filename, 'wb') as file:
    file.write(results_bytes)
```

You can also apply views, sorts and filters:

```python
results_bytes, filename = api.download_task_results(
    task_id=1,
    format='excel',  # format can be one of: json, csv or excel
    # view='overview',
    # sort='my_sort',
    # filters={'your_filter': 'value'},
)
```
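
The download call and the file write can be wrapped together. The sketch below is one way to do that, assuming (as in the example above) that `download_task_results` returns a `(bytes, filename)` pair. `download_to_dir` and the `downloads` directory name are hypothetical; the list of formats is taken from the comment in the example above.

```python
from pathlib import Path

ALLOWED_FORMATS = ("json", "csv", "excel")  # formats listed in this README


def download_to_dir(api, task_id, fmt="csv", directory="downloads"):
    """Download task results and write them into `directory`.

    Returns the path of the written file.
    """
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {fmt!r}")

    results_bytes, filename = api.download_task_results(task_id=task_id, format=fmt)

    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component so a server-supplied name
    # cannot point outside the target directory.
    target = target_dir / Path(filename).name
    target.write_bytes(results_bytes)
    return target
```

Checking the format before the request fails fast on a typo such as `'xlsx'` instead of relying on whatever error the server would send back.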

### Aborting and Deleting Tasks

To abort a specific task, use the `abort_task` method:

```python
api.abort_task(task_id=1)
```

To delete a specific task, use the `delete_task` method:

```python
api.delete_task(task_id=1)
```

You can also bulk abort or delete multiple tasks at once using the `abort_tasks` and `delete_tasks` methods, respectively:

```python
api.abort_tasks(1, 2, 3)
api.delete_tasks(4, 5, 6)
```
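
For a long list of task IDs, it may be preferable to send several smaller bulk requests rather than one very large one. A minimal sketch, assuming only that `delete_tasks` accepts IDs as positional arguments as shown above; `delete_in_batches` and the default batch size of 50 are hypothetical choices, not library behavior.

```python
def delete_in_batches(api, task_ids, batch_size=50):
    """Delete tasks in fixed-size batches via api.delete_tasks.

    Returns the number of batches sent.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = list(task_ids)
    batches = 0
    for start in range(0, len(ids), batch_size):
        # Unpack one slice of IDs into positional arguments.
        api.delete_tasks(*ids[start:start + batch_size])
        batches += 1
    return batches
```

The same loop would work for `abort_tasks` by swapping the method call.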

## Examples

The following script walks through the main operations of the API client:

```python
from botasaurus_api import Api

# Create an instance of the API client
api = Api()

# Create a synchronous task (waits for the scraper to finish)
data = {'link': 'https://www.omkar.cloud/'}
task = api.create_sync_task(data, scraper_name='scrape_heading_task')

# Fetch the task
task = api.get_task(task['id'])

# Fetch the task results
results = api.get_task_results(task['id'])

# Download the task results as a CSV
results_bytes, filename = api.download_task_results(task['id'], format='csv')

# Abort the task
api.abort_task(task['id'])

# Delete the task
api.delete_task(task['id'])

# --- Bulk Operations ---

# Create multiple synchronous tasks
data_items = [{'link': 'https://www.omkar.cloud/'}, {'link': 'https://www.omkar.cloud/blog/'}]
tasks = api.create_sync_tasks(data_items, scraper_name='scrape_heading_task')

# Fetch all tasks
all_tasks = api.get_tasks()

# Bulk abort tasks
api.abort_tasks(*[task['id'] for task in tasks])

# Bulk delete tasks
api.delete_tasks(*[task['id'] for task in tasks])
```
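
In a longer script, a failure between creating and deleting a task would leave the task behind on the server. One way to guard against that is a `try`/`finally` block, sketched below. `scrape_once` is a hypothetical helper; it uses only calls shown above and assumes, as the example does, that the created task exposes its ID as `task['id']`.

```python
def scrape_once(api, data, scraper_name=None):
    """Run one synchronous task, return its results, and always delete it."""
    # Pass scraper_name only when given, so the server default applies otherwise.
    kwargs = {"scraper_name": scraper_name} if scraper_name else {}
    task = api.create_sync_task(data, **kwargs)
    try:
        return api.get_task_results(task["id"])
    finally:
        # Runs whether fetching the results succeeded or raised.
        api.delete_task(task["id"])
```

With the client installed, a call might look like `scrape_once(Api(), {'link': 'https://www.omkar.cloud/'}, scraper_name='scrape_heading_task')`.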

## That's It!

Now, go and build something awesome.

            
