<div align="center">
<h1>browsy</h1>
</div>
## What is browsy?
browsy is a service that lets you run browser automation tasks without managing browser instances yourself. It provides:
- **Simple Job Definition**: Write Playwright-powered automation tasks in Python
- **HTTP API**: Queue jobs and retrieve results through HTTP endpoints
- **Docker Ready**: Run everything in containers without worrying about browser dependencies
- **Queue System**: Jobs are processed in order, with automatic retries and status tracking
- **Extensible**: Create any browser automation task - from screenshots and PDFs to complex scraping operations
Think of it as a way to turn your Playwright scripts into HTTP services that can be called from anywhere.
## Quick Start
### Download files
To get started, download the necessary files using the following script:
```bash
curl -LsSf https://raw.githubusercontent.com/mbroton/browsy/main/scripts/get.sh | sh
```
This script will download the `docker-compose` file and example jobs. Once downloaded, navigate to the `browsy/` directory.
Alternatively, you can clone the repository directly and navigate to the quickstart directory.
### Start browsy
To start the service, run:
```bash
docker compose up --build --scale worker=3
```
You can adjust the number of workers by modifying the `--scale worker` parameter.
Visit `http://localhost:8000/docs` to access the interactive API documentation provided by FastAPI.
### Defining custom jobs
A job is defined as any class that inherits from `browsy.BaseJob`. Browsy will automatically search for these classes within the `jobs/` directory.
Here's an example implementation:
```python
from browsy import BaseJob, Page

class ScreenshotJob(BaseJob):
    NAME = "screenshot"

    url: str | None = None
    html: str | None = None
    full_page: bool = False

    async def execute(self, page: Page) -> bytes:
        if self.url:
            await page.goto(self.url)
        elif self.html:
            await page.set_content(self.html)
        return await page.screenshot(full_page=self.full_page)

    async def validate_logic(self) -> bool:
        return bool(self.url) != bool(self.html)
```
- **Class Definition**: The `ScreenshotJob` class inherits from `BaseJob`, which itself is based on Pydantic's `BaseModel`. This provides automatic data validation and serialization.
- **Job Name**: The `NAME` attribute uniquely identifies the job type when making API calls.
- **Parameters**: Defined as class attributes, these are automatically validated by Pydantic during API calls. This ensures that input data meets the expected types and constraints before processing.
- **Validation Logic**: The `validate_logic` method runs during API calls to verify that the job's input parameters satisfy specific conditions. This validation occurs before the job is submitted for execution, allowing for early detection of configuration errors.
- **Execution Method**: The `execute` method carries out the browser automation using a Playwright `Page` object. Workers use this method to execute jobs.
Refer to [Playwright's documentation](https://playwright.dev/python/docs/api/class-page) for more details on what you can do with `page`.
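Other job types follow the same pattern. As a sketch, here is a hypothetical PDF job — `PDFJob`, its `NAME`, and its parameters are illustrative and not shipped with browsy, though `page.pdf()` is Playwright's built-in PDF renderer (Chromium only):

```python
from browsy import BaseJob, Page

class PDFJob(BaseJob):
    NAME = "pdf"

    url: str
    landscape: bool = False

    async def execute(self, page: Page) -> bytes:
        # Navigate to the target page, then render it as a PDF document.
        await page.goto(self.url)
        return await page.pdf(landscape=self.landscape)

    async def validate_logic(self) -> bool:
        # Require a non-empty URL before the job is queued.
        return bool(self.url)
```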
### Client
To interact with the service from Python, you can use the browsy client:
```bash
pip install browsy
```
Here's how you can use it:
```python
from browsy import BrowsyClient
# Also available:
# from browsy import AsyncBrowsyClient

client = BrowsyClient(base_url="http://localhost:8000")

job = client.create_job(name="screenshot", parameters={
    "url": "https://example.com",
    "full_page": True
})

result = client.get_job_output(job_id=job.id)
if result:
    with open("screenshot.png", "wb") as f:
        f.write(result)
```
This example demonstrates how to submit a screenshot job, retrieve the result, and save it locally.
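If your application is already async, `AsyncBrowsyClient` can be used instead. A minimal sketch, assuming it mirrors the synchronous API with awaitable methods:

```python
import asyncio

from browsy import AsyncBrowsyClient

async def main() -> None:
    client = AsyncBrowsyClient(base_url="http://localhost:8000")
    # Same call shape as the sync client, but awaited.
    job = await client.create_job(name="screenshot", parameters={
        "url": "https://example.com",
        "full_page": True,
    })
    result = await client.get_job_output(job_id=job.id)
    if result:
        with open("screenshot.png", "wb") as f:
            f.write(result)

asyncio.run(main())
```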
### API
You can explore and interact with the API using the Swagger UI documentation provided by FastAPI. Visit `http://localhost:8000/docs` to access it.
If you prefer to interact with the service directly via HTTP, here are example requests:
#### Submit a job
`POST /api/v1/jobs`
Example request:
```json
{
  "name": "screenshot",
  "parameters": {
    "url": "https://example.com",
    "html": null,
    "full_page": true
  }
}
```
Example response:
```json
{
  "id": 1,
  "name": "screenshot",
  "input": {
    "url": "https://example.com",
    "html": null,
    "full_page": true
  },
  "processing_time": null,
  "status": "pending",
  "created_at": "2024-12-30T15:02:04.720000",
  "updated_at": null,
  "worker": null
}
```
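For reference, the same submission from Python without the client library — a sketch using the `requests` package; only the endpoint and payload above come from browsy, the rest is illustrative:

```python
import requests

BASE_URL = "http://localhost:8000"

# Submit a screenshot job; the response echoes the job with its assigned id.
resp = requests.post(f"{BASE_URL}/api/v1/jobs", json={
    "name": "screenshot",
    "parameters": {
        "url": "https://example.com",
        "html": None,
        "full_page": True,
    },
})
resp.raise_for_status()
job_id = resp.json()["id"]
print(f"queued job {job_id}")
```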
#### Check job status
`GET /api/v1/jobs/{job_id}`
Example response:
```json
{
  "id": 1,
  "name": "screenshot",
  "input": {
    "url": "https://example.com",
    "html": null,
    "full_page": true
  },
  "processing_time": 1298,
  "status": "done",
  "created_at": "2024-12-30T15:06:39.204000",
  "updated_at": "2024-12-30T15:06:44.743000",
  "worker": "worker_lze4nFMy"
}
```
#### Retrieve job result
To retrieve the result of a job, use the following endpoint:
`GET /api/v1/jobs/{job_id}/result`
**Status Codes:**
- **200**: The job is complete, and the output is available. The response type is `application/octet-stream`.
- **202**: The job is pending or currently in progress.
- **204**: The job is complete or has failed, and there is no output available.
- **404**: No job exists with the provided ID.
**Response Headers:**
- `X-Job-Status`: Indicates the current status of the job.
- `X-Job-Last-Updated`: Shows the last time the job's status was updated.
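Putting the status codes and headers together, a polling loop might look like the sketch below — `wait_for_result` and the poll interval are illustrative, not part of browsy:

```python
import time

import requests

BASE_URL = "http://localhost:8000"

def wait_for_result(job_id: int, poll_interval: float = 1.0) -> bytes | None:
    """Poll the result endpoint until the job leaves the queue."""
    while True:
        resp = requests.get(f"{BASE_URL}/api/v1/jobs/{job_id}/result")
        if resp.status_code == 200:
            return resp.content   # output ready
        if resp.status_code == 204:
            return None           # finished (or failed) with no output
        if resp.status_code == 404:
            raise ValueError(f"no job with id {job_id}")
        # 202: still pending or in progress
        print("status:", resp.headers.get("X-Job-Status"))
        time.sleep(poll_interval)
```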
### Internal Dashboard
Browsy provides a built-in monitoring dashboard accessible at `/internal`. This interface gives you real-time visibility into the system:
- View all active workers and their current status
- Monitor job queues and execution status
- Track execution times
- Identify potential bottlenecks or issues
The dashboard updates automatically to show you the latest state of your system:
![Internal Dashboard](.github/assets/dashboard.png)
## How it works
![flow](.github/assets/flow.png)
1. You define jobs using Playwright's API
2. You queue job requests through the HTTP API
3. Workers execute the jobs in Docker containers
4. You retrieve the results when they're ready
## Documentation
For detailed setup and usage, check out the [documentation](https://broton.dev/).
## License
MIT License - see [LICENSE](LICENSE) for details.