spatula


Name: spatula
Version: 0.9.1
home_page: https://github.com/jamesturk/spatula/
Summary: A modern Python library for writing maintainable web scrapers.
upload_time: 2024-07-10 07:17:31
maintainer: None
docs_url: None
author: James Turk
requires_python: <4.0,>=3.8
license: MIT
keywords: None
VCS: GitHub
bugtrack_url: None
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.
# Overview

*spatula* is a modern Python library for writing maintainable web scrapers.

Source: [https://github.com/jamesturk/spatula](https://github.com/jamesturk/spatula)

Documentation: [https://jamesturk.github.io/spatula/](https://jamesturk.github.io/spatula/)

Issues: [https://github.com/jamesturk/spatula/issues](https://github.com/jamesturk/spatula/issues)

[![PyPI badge](https://badge.fury.io/py/spatula.svg)](https://badge.fury.io/py/spatula)
[![Test badge](https://github.com/jamesturk/spatula/workflows/Test%20&%20Lint/badge.svg)](https://github.com/jamesturk/spatula/actions?query=workflow%3A%22Test+%26+Lint%22)

## Features

- **Page-oriented design**: Encourages writing understandable & maintainable scrapers.
- **Not Just HTML**: Provides built-in [handlers for common data formats](https://jamesturk.github.io/spatula/reference/#pages) including CSV, JSON, XML, PDF, and Excel. Or write your own.
- **Fast HTML parsing**: Uses `lxml.html` for fast, consistent, and reliable parsing of HTML.
- **Flexible Data Model Support**: Compatible with `dataclasses`, `attrs`, `pydantic`, or bring your own data model classes for storing & validating your scraped data.
- **CLI Tools**: Offers several [CLI utilities](https://jamesturk.github.io/spatula/cli/) that can help streamline the development & testing cycle.
- **Fully Typed**: Makes full use of Python 3 type annotations.
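
To make the page-oriented idea and the `dataclasses` data model more concrete, here is a conceptual sketch that uses only the standard library. It deliberately does not use spatula's own classes: `StaffPage`, `Employee`, `process_page`, and the sample markup are hypothetical names chosen for illustration, so see the documentation linked above for the library's real API.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Employee:
    """Data model for one scraped record (hypothetical)."""

    first: str
    last: str


class StaffPage:
    """One class per page: it owns the extraction logic for that page.

    Hypothetical illustration only, not spatula's API.
    """

    # Matches rows such as <tr><td>Ada</td><td>Lovelace</td></tr>.
    # A regex keeps the sketch dependency-free; a real scraper would
    # use a proper HTML parser such as lxml.html.
    row_pattern = re.compile(r"<tr><td>(.*?)</td><td>(.*?)</td></tr>")

    def process_page(self, html: str) -> List[Employee]:
        """Turn the page's markup into typed records."""
        return [
            Employee(first, last)
            for first, last in self.row_pattern.findall(html)
        ]


# Assumed sample markup standing in for a fetched page.
SAMPLE_HTML = (
    "<table>"
    "<tr><td>Ada</td><td>Lovelace</td></tr>"
    "<tr><td>Alan</td><td>Turing</td></tr>"
    "</table>"
)

employees = StaffPage().process_page(SAMPLE_HTML)
```

Keeping each page's parsing logic in its own class means a change in one page's markup only touches one class, which is the maintainability argument the feature list makes.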

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/jamesturk/spatula/",
    "name": "spatula",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": "James Turk",
    "author_email": "dev@jamesturk.net",
    "download_url": "https://files.pythonhosted.org/packages/b7/36/ed4463a40ee0c2e48c71603fb204f74bf7dd5ee2e57de93e9fb1c8bfc7aa/spatula-0.9.1.tar.gz",
    "platform": null,
    "description": "# Overview\n\n*spatula* is a modern Python library for writing maintainable web scrapers.\n\nSource: [https://github.com/jamesturk/spatula](https://github.com/jamesturk/spatula)\n\nDocumentation: [https://jamesturk.github.io/spatula/](https://jamesturk.github.io/spatula/)\n\nIssues: [https://github.com/jamesturk/spatula/issues](https://github.com/jamesturk/spatula/issues)\n\n[![PyPI badge](https://badge.fury.io/py/spatula.svg)](https://badge.fury.io/py/spatula)\n[![Test badge](https://github.com/jamesturk/spatula/workflows/Test%20&%20Lint/badge.svg)](https://github.com/jamesturk/spatula/actions?query=workflow%3A%22Test+%26+Lint%22)\n\n## Features\n\n- **Page-oriented design**: Encourages writing understandable & maintainable scrapers.\n- **Not Just HTML**: Provides built in [handlers for common data formats](https://jamesturk.github.io/spatula/reference/#pages) including CSV, JSON, XML, PDF, and Excel.  Or write your own.\n- **Fast HTML parsing**: Uses `lxml.html` for fast, consistent, and reliable parsing of HTML.\n- **Flexible Data Model Support**: Compatible with `dataclasses`, `attrs`, `pydantic`, or bring your own data model classes for storing & validating your scraped data.\n- **CLI Tools**: Offers several [CLI utilities](https://jamesturk.github.io/spatula/cli/) that can help streamline development & testing cycle.\n- **Fully Typed**: Makes full use of Python 3 type annotations.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A modern Python library for writing maintainable web scrapers.",
    "version": "0.9.1",
    "project_urls": {
        "Documentation": "https://jamesturk.github.io/spatula/",
        "Homepage": "https://github.com/jamesturk/spatula/",
        "Repository": "https://github.com/jamesturk/spatula/"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0c4bf8650ff2003220b6edd166f188c32e67c64f58f9a5c259f39c61f9a29355",
                "md5": "fea7d831d36eda47419a9579c4f3ff3c",
                "sha256": "adf6474504090943a78e1507c7b00e38ee0fd761cf4c136696975d840ac8c798"
            },
            "downloads": -1,
            "filename": "spatula-0.9.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fea7d831d36eda47419a9579c4f3ff3c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.8",
            "size": 16617,
            "upload_time": "2024-07-10T07:17:30",
            "upload_time_iso_8601": "2024-07-10T07:17:30.163474Z",
            "url": "https://files.pythonhosted.org/packages/0c/4b/f8650ff2003220b6edd166f188c32e67c64f58f9a5c259f39c61f9a29355/spatula-0.9.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b736ed4463a40ee0c2e48c71603fb204f74bf7dd5ee2e57de93e9fb1c8bfc7aa",
                "md5": "ad026bf4453f6783e1ee99398bdfec96",
                "sha256": "245a71e46f01c2bd4ba8f67f979cfbf116caeaa3b17bf8b3110d807dab51a329"
            },
            "downloads": -1,
            "filename": "spatula-0.9.1.tar.gz",
            "has_sig": false,
            "md5_digest": "ad026bf4453f6783e1ee99398bdfec96",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.8",
            "size": 14904,
            "upload_time": "2024-07-10T07:17:31",
            "upload_time_iso_8601": "2024-07-10T07:17:31.789276Z",
            "url": "https://files.pythonhosted.org/packages/b7/36/ed4463a40ee0c2e48c71603fb204f74bf7dd5ee2e57de93e9fb1c8bfc7aa/spatula-0.9.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-10 07:17:31",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jamesturk",
    "github_project": "spatula",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "spatula"
}
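
A record like the one above can be consumed programmatically, for example to pick a release file and check a download against its published digest. The following is a minimal sketch under stated assumptions: `RAW` is a trimmed copy of the record, and the helper names `find_file` and `sha256_matches` are illustrative, not part of any library. Only the field names (`urls`, `packagetype`, `digests`, `sha256`) come from the record itself.

```python
import hashlib
import json
from typing import Optional

# Trimmed copy of the record above; only the fields used here are kept.
RAW = """
{
  "name": "spatula",
  "version": "0.9.1",
  "urls": [
    {
      "filename": "spatula-0.9.1-py3-none-any.whl",
      "packagetype": "bdist_wheel",
      "digests": {
        "sha256": "adf6474504090943a78e1507c7b00e38ee0fd761cf4c136696975d840ac8c798"
      }
    },
    {
      "filename": "spatula-0.9.1.tar.gz",
      "packagetype": "sdist",
      "digests": {
        "sha256": "245a71e46f01c2bd4ba8f67f979cfbf116caeaa3b17bf8b3110d807dab51a329"
      }
    }
  ]
}
"""


def find_file(metadata: dict, packagetype: str) -> Optional[dict]:
    """Return the first entry in `urls` with the given packagetype, or None."""
    for entry in metadata["urls"]:
        if entry["packagetype"] == packagetype:
            return entry
    return None


def sha256_matches(data: bytes, entry: dict) -> bool:
    """Check downloaded bytes against the entry's recorded sha256 digest."""
    return hashlib.sha256(data).hexdigest() == entry["digests"]["sha256"]


metadata = json.loads(RAW)
wheel = find_file(metadata, "bdist_wheel")
sdist = find_file(metadata, "sdist")
```

In practice `data` would be the bytes downloaded from the entry's `url` field; the sketch leaves the download step out so that it runs offline.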
        