Name | spatula |
Version | 0.9.1 |
home_page | https://github.com/jamesturk/spatula/ |
Summary | A modern Python library for writing maintainable web scrapers. |
upload_time | 2024-07-10 07:17:31 |
maintainer | None |
docs_url | None |
author | James Turk |
requires_python | <4.0,>=3.8 |
license | MIT |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# Overview
*spatula* is a modern Python library for writing maintainable web scrapers.
Source: [https://github.com/jamesturk/spatula](https://github.com/jamesturk/spatula)
Documentation: [https://jamesturk.github.io/spatula/](https://jamesturk.github.io/spatula/)
Issues: [https://github.com/jamesturk/spatula/issues](https://github.com/jamesturk/spatula/issues)
[![PyPI badge](https://badge.fury.io/py/spatula.svg)](https://badge.fury.io/py/spatula)
[![Test badge](https://github.com/jamesturk/spatula/workflows/Test%20&%20Lint/badge.svg)](https://github.com/jamesturk/spatula/actions?query=workflow%3A%22Test+%26+Lint%22)
## Features
- **Page-oriented design**: Encourages writing understandable & maintainable scrapers.
- **Not Just HTML**: Provides built-in [handlers for common data formats](https://jamesturk.github.io/spatula/reference/#pages) including CSV, JSON, XML, PDF, and Excel. Or write your own.
- **Fast HTML parsing**: Uses `lxml.html` for fast, consistent, and reliable parsing of HTML.
- **Flexible Data Model Support**: Works with `dataclasses`, `attrs`, or `pydantic`, or with your own data model classes, for storing & validating your scraped data.
- **CLI Tools**: Offers several [CLI utilities](https://jamesturk.github.io/spatula/cli/) that can help streamline the development & testing cycle.
- **Fully Typed**: Makes full use of Python 3 type annotations.
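
To make the page-oriented and data-model bullets above more concrete, here is a minimal sketch in plain Python. It deliberately does not use spatula's own classes: `Employee`, `EmployeeListPage`, `process_page`, and the sample HTML are all hypothetical names chosen for illustration, and the standard library's `html.parser` stands in for `lxml.html` so the snippet runs without extra dependencies.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Employee:
    """Hypothetical data model for one scraped table row."""
    first: str
    last: str


class _CellCollector(HTMLParser):
    """Collects the text of every <td>, grouped by enclosing <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            # Only track cells that appear inside a row.
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


class EmployeeListPage:
    """One class per kind of page: it knows how to turn that page into records."""

    def __init__(self, html: str):
        self.html = html

    def process_page(self):
        parser = _CellCollector()
        parser.feed(self.html)
        parser.close()
        for cells in parser.rows:
            if len(cells) == 2:
                yield Employee(first=cells[0], last=cells[1])


# Hypothetical sample input; a real scraper would fetch this from a URL.
SAMPLE = (
    "<table>"
    "<tr><td>Ada</td><td>Lovelace</td></tr>"
    "<tr><td>Alan</td><td>Turing</td></tr>"
    "</table>"
)

employees = list(EmployeeListPage(SAMPLE).process_page())
```

Keeping the parsing logic for each page type in its own small class, and returning typed records rather than loose dictionaries, is the general shape the feature list describes. For the library's actual page classes, see the documentation linked above.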
Raw data
{
"_id": null,
"home_page": "https://github.com/jamesturk/spatula/",
"name": "spatula",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.8",
"maintainer_email": null,
"keywords": null,
"author": "James Turk",
"author_email": "dev@jamesturk.net",
"download_url": "https://files.pythonhosted.org/packages/b7/36/ed4463a40ee0c2e48c71603fb204f74bf7dd5ee2e57de93e9fb1c8bfc7aa/spatula-0.9.1.tar.gz",
"platform": null,
"description": "# Overview\n\n*spatula* is a modern Python library for writing maintainable web scrapers.\n\nSource: [https://github.com/jamesturk/spatula](https://github.com/jamesturk/spatula)\n\nDocumentation: [https://jamesturk.github.io/spatula/](https://jamesturk.github.io/spatula/)\n\nIssues: [https://github.com/jamesturk/spatula/issues](https://github.com/jamesturk/spatula/issues)\n\n[![PyPI badge](https://badge.fury.io/py/spatula.svg)](https://badge.fury.io/py/spatula)\n[![Test badge](https://github.com/jamesturk/spatula/workflows/Test%20&%20Lint/badge.svg)](https://github.com/jamesturk/spatula/actions?query=workflow%3A%22Test+%26+Lint%22)\n\n## Features\n\n- **Page-oriented design**: Encourages writing understandable & maintainable scrapers.\n- **Not Just HTML**: Provides built in [handlers for common data formats](https://jamesturk.github.io/spatula/reference/#pages) including CSV, JSON, XML, PDF, and Excel. Or write your own.\n- **Fast HTML parsing**: Uses `lxml.html` for fast, consistent, and reliable parsing of HTML.\n- **Flexible Data Model Support**: Compatible with `dataclasses`, `attrs`, `pydantic`, or bring your own data model classes for storing & validating your scraped data.\n- **CLI Tools**: Offers several [CLI utilities](https://jamesturk.github.io/spatula/cli/) that can help streamline development & testing cycle.\n- **Fully Typed**: Makes full use of Python 3 type annotations.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "A modern Python library for writing maintainable web scrapers.",
"version": "0.9.1",
"project_urls": {
"Documentation": "https://jamesturk.github.io/spatula/",
"Homepage": "https://github.com/jamesturk/spatula/",
"Repository": "https://github.com/jamesturk/spatula/"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0c4bf8650ff2003220b6edd166f188c32e67c64f58f9a5c259f39c61f9a29355",
"md5": "fea7d831d36eda47419a9579c4f3ff3c",
"sha256": "adf6474504090943a78e1507c7b00e38ee0fd761cf4c136696975d840ac8c798"
},
"downloads": -1,
"filename": "spatula-0.9.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "fea7d831d36eda47419a9579c4f3ff3c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.8",
"size": 16617,
"upload_time": "2024-07-10T07:17:30",
"upload_time_iso_8601": "2024-07-10T07:17:30.163474Z",
"url": "https://files.pythonhosted.org/packages/0c/4b/f8650ff2003220b6edd166f188c32e67c64f58f9a5c259f39c61f9a29355/spatula-0.9.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b736ed4463a40ee0c2e48c71603fb204f74bf7dd5ee2e57de93e9fb1c8bfc7aa",
"md5": "ad026bf4453f6783e1ee99398bdfec96",
"sha256": "245a71e46f01c2bd4ba8f67f979cfbf116caeaa3b17bf8b3110d807dab51a329"
},
"downloads": -1,
"filename": "spatula-0.9.1.tar.gz",
"has_sig": false,
"md5_digest": "ad026bf4453f6783e1ee99398bdfec96",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.8",
"size": 14904,
"upload_time": "2024-07-10T07:17:31",
"upload_time_iso_8601": "2024-07-10T07:17:31.789276Z",
"url": "https://files.pythonhosted.org/packages/b7/36/ed4463a40ee0c2e48c71603fb204f74bf7dd5ee2e57de93e9fb1c8bfc7aa/spatula-0.9.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-10 07:17:31",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jamesturk",
"github_project": "spatula",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "spatula"
}
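
The `digests` entries in the raw data above can be used to check a downloaded release file. The following is a minimal sketch using only the standard library; the helper names `expected_sha256` and `matches_digest` are hypothetical, and the `metadata` dictionary is a trimmed copy of the record above that keeps only the fields the sketch reads.

```python
import hashlib
import hmac


def expected_sha256(metadata: dict, filename: str) -> str:
    """Return the recorded SHA-256 hex digest for `filename` in metadata["urls"]."""
    for entry in metadata["urls"]:
        if entry["filename"] == filename:
            return entry["digests"]["sha256"]
    raise KeyError(f"no release file named {filename!r}")


def matches_digest(payload: bytes, expected_hex: str) -> bool:
    """Hash the payload and compare the result with the recorded digest."""
    actual = hashlib.sha256(payload).hexdigest()
    return hmac.compare_digest(actual, expected_hex.lower())


# Trimmed copy of the metadata above (hypothetical variable name).
metadata = {
    "urls": [
        {
            "filename": "spatula-0.9.1-py3-none-any.whl",
            "digests": {
                "sha256": "adf6474504090943a78e1507c7b00e38ee0fd761cf4c136696975d840ac8c798"
            },
        },
        {
            "filename": "spatula-0.9.1.tar.gz",
            "digests": {
                "sha256": "245a71e46f01c2bd4ba8f67f979cfbf116caeaa3b17bf8b3110d807dab51a329"
            },
        },
    ]
}

wheel_digest = expected_sha256(metadata, "spatula-0.9.1-py3-none-any.whl")
```

In practice, the bytes of the downloaded wheel or sdist would be passed to `matches_digest` together with the value returned by `expected_sha256`; a `False` result means the file does not match the recorded release.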