# Rss parser
[![Downloads](https://pepy.tech/badge/rss-parser)](https://pepy.tech/project/rss-parser)
[![Downloads](https://pepy.tech/badge/rss-parser/month)](https://pepy.tech/project/rss-parser)
[![Downloads](https://pepy.tech/badge/rss-parser/week)](https://pepy.tech/project/rss-parser)
[![PyPI version](https://img.shields.io/pypi/v/rss-parser)](https://pypi.org/project/rss-parser)
[![Python versions](https://img.shields.io/pypi/pyversions/rss-parser)](https://pypi.org/project/rss-parser)
[![Wheel status](https://img.shields.io/pypi/wheel/rss-parser)](https://pypi.org/project/rss-parser)
[![License](https://img.shields.io/pypi/l/rss-parser?color=success)](https://github.com/dhvcc/rss-parser/blob/master/LICENSE)
![Docs](https://github.com/dhvcc/rss-parser/actions/workflows/pages/pages-build-deployment/badge.svg)
![CI](https://github.com/dhvcc/rss-parser/actions/workflows/ci.yml/badge.svg?branch=master)
![PyPi publish](https://github.com/dhvcc/rss-parser/actions/workflows/publish_to_pypi.yml/badge.svg)
## About
`rss-parser` is a typed Python RSS/Atom parsing module built using [pydantic](https://github.com/pydantic/pydantic) and [xmltodict](https://github.com/martinblech/xmltodict)
## Installation
```bash
pip install rss-parser
```
or
```bash
git clone https://github.com/dhvcc/rss-parser.git
cd rss-parser
poetry build
pip install dist/*.whl
```
## V1 -> V2 migration
- `Parser` class was renamed to `RSSParser`
- Models for RSS-specific schemas were moved from `rss_parser.models` to `rss_parser.models.rss`. Generic types were not changed
- Date parsing now uses pydantic's `validator` instead of `email.utils`, so fields that previously fell back to `str` are more likely to be parsed into datetimes
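To make the date change concrete: RSS feeds carry dates such as `pubDate` as RFC 822 strings. The sketch below uses only the standard library's `email.utils.parsedate_to_datetime` to show what converting such a string into a `datetime` looks like. It is not the library's own validator, and the sample date string is made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical pubDate value in the RFC 822 format commonly found in RSS feeds
raw_pub_date = "Tue, 10 Jan 2023 08:00:00 +0000"

parsed = parsedate_to_datetime(raw_pub_date)

# The result is a timezone-aware datetime rather than a plain string
assert isinstance(parsed, datetime)
print(parsed.isoformat())
# 2023-01-10T08:00:00+00:00
```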
## Usage
### Quickstart
**NOTE: For parsing Atom, use `AtomParser`**
```python
from rss_parser import RSSParser
from requests import get # noqa
rss_url = "https://rss.art19.com/apology-line"
response = get(rss_url)
rss = RSSParser.parse(response.text)
# Print out RSS metadata
print("Language", rss.channel.language)
print("RSS", rss.version)
# Iteratively print feed items
for item in rss.channel.items:
    print(item.title)
    print(item.description[:50])
# Language en
# RSS 2.0
# Wondery Presents - Flipping The Bird: Elon vs Twitter
# <p>When Elon Musk posted a video of himself arrivi
# Introducing: The Apology Line
# <p>If you could call a number and say you’re sorry
```
Note that the description still contains `<p>` tags. This is because the feed wraps it in a [CDATA](https://www.w3resource.com/xml/CDATA-sections.php) section, like so
```
<![CDATA[<p>If you could call ...</p>]]>
```
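If you need plain text instead of the embedded markup, you can post-process the description yourself. The following is a minimal sketch using only the standard library's `html.parser`; `strip_html` and `_TextExtractor` are hypothetical helpers and are not part of `rss-parser`.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are skipped
        self.chunks.append(data)


def strip_html(fragment: str) -> str:
    """Hypothetical helper: return the fragment's text without any tags."""
    extractor = _TextExtractor()
    extractor.feed(fragment)
    extractor.close()  # Flush any text still buffered by the parser
    return "".join(extractor.chunks).strip()


description = "<p>If you could call a number and say you're <b>sorry</b></p>"
plain_text = strip_html(description)
print(plain_text)
# If you could call a number and say you're sorry
```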
### Overriding schema
To customize the schema or provide your own, use the `schema` keyword argument of the parser
```python
from rss_parser import RSSParser
from rss_parser.models import XMLBaseModel
from rss_parser.models.rss import RSS
from rss_parser.models.types import Tag
class CustomSchema(RSS, XMLBaseModel):
    channel: None = None  # Removing previous channel field
    custom: Tag[str]


with open("tests/samples/custom.xml") as f:
    data = f.read()
rss = RSSParser.parse(data, schema=CustomSchema)
print("RSS", rss.version)
print("Custom", rss.custom)
# RSS 2.0
# Custom Custom tag data
```
### xmltodict
This library uses [xmltodict](https://github.com/martinblech/xmltodict) to parse XML data. See the [xmltodict documentation](https://github.com/martinblech/xmltodict#xmltodict) for details.
The main thing to know is that your data is converted into dictionaries.
For example, this data
```xml
<tag>content</tag>
```
will result in the following
```python
{
    "tag": "content"
}
```
*But* when a tag has attributes, its content also becomes a dictionary
```xml
<tag attr="1" data-value="data">data</tag>
```
Turns into
```python
{
    "tag": {
        "@attr": "1",
        "@data-value": "data",
        "#text": "data"
    }
}
```
Repeated child tags with the same name are collected into a list
```xml
<div>
    <tag>content</tag>
    <tag>content2</tag>
</div>
```
Results in a list under the repeated tag's key
```python
{
    "div": {
        "tag": ["content", "content2"]
    }
}
```
If you don't want to handle both cases and would rather **always** get a list,
use `rss_parser.models.types.only_list.OnlyList`, as is done in `Channel`
```python
from typing import Optional
from rss_parser.models.rss.item import Item
from rss_parser.models.types.only_list import OnlyList
from rss_parser.models.types.tag import Tag
from rss_parser.pydantic_proxy import import_v1_pydantic
pydantic = import_v1_pydantic()
...
class OptionalChannelElementsMixin(...):
    ...
    items: Optional[OnlyList[Tag[Item]]] = pydantic.Field(alias="item", default=[])
```
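The idea behind always getting a list can be sketched without the library. The `ensure_list` helper below is hypothetical and only illustrates the normalization step: a single child arrives as a lone value, several children arrive as a list, and callers get a list in both cases. It is not how `OnlyList` is implemented.

```python
from typing import Any, List


def ensure_list(value: Any) -> List[Any]:
    """Hypothetical helper: wrap a lone value so callers always get a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# One <item> child: the parsed dictionary holds a single value
single = {"item": {"title": "First"}}
# Several <item> children: the parsed dictionary holds a list
several = {"item": [{"title": "First"}, {"title": "Second"}]}

single_items = ensure_list(single["item"])
several_items = ensure_list(several["item"])
missing_items = ensure_list(single.get("missing"))

for item in single_items + several_items:
    print(item["title"])
```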
### Tag field
This is a generic field that handles a tag given either as raw data or as a dictionary that includes attributes
Example
```python
from rss_parser.models import XMLBaseModel
from rss_parser.models.types.tag import Tag
class Model(XMLBaseModel):
    width: Tag[int]
    category: Tag[str]


m = Model(
    width=48,
    category={"@someAttribute": "https://example.com", "#text": "valid string"},
)
# Content value is an integer, as per the generic type
assert m.width.content == 48
assert (type(m.width), type(m.width.content)) == (Tag[int], int)
# The attributes are empty by default
assert m.width.attributes == {}
# But they are populated when provided.
# Note that the @ symbol is trimmed from the beginning and the name is converted to snake_case
assert m.category.attributes == {'some_attribute': 'https://example.com'}
```
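The behaviour shown above (raw value versus dictionary with `@`-prefixed attributes and a `#text` key) can be sketched in plain Python. `split_tag` and `camel_to_snake` are hypothetical helpers written for illustration; they only assume the `@`/`#text` conventions described in the xmltodict section and are not the library's implementation of `Tag`.

```python
import re
from typing import Any, Dict, Tuple


def camel_to_snake(name: str) -> str:
    """Hypothetical helper: convert camelCase to snake_case."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def split_tag(value: Any) -> Tuple[Any, Dict[str, Any]]:
    """Hypothetical helper: split a parsed tag into (content, attributes)."""
    if not isinstance(value, dict):
        # Raw data: the value is the content and there are no attributes
        return value, {}
    content = value.get("#text")
    attributes = {
        camel_to_snake(key[1:]): attr  # Drop the leading "@"
        for key, attr in value.items()
        if key.startswith("@")
    }
    return content, attributes


raw_result = split_tag(48)
dict_result = split_tag(
    {"@someAttribute": "https://example.com", "#text": "valid string"}
)
print(raw_result)
print(dict_result)
```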
## Contributing
Pull requests are welcome. For major changes, please open an issue first
to discuss what you would like to change.
Install dependencies with `poetry install` (Poetry itself can be installed with `pip install poetry`)
`pre-commit` usage is highly recommended. To install the hooks, run
```bash
poetry run pre-commit install -t=pre-commit -t=pre-push
```
## License
[GPLv3](https://github.com/dhvcc/rss-parser/blob/master/LICENSE)
Raw data
{
"_id": null,
"home_page": null,
"name": "rss-parser",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": "python, python3, cli, rss, parser, gplv3, typed, typed-python",
"author": "dhvcc",
"author_email": "1337kwiz@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/72/f1/8853d9808f68b4a34a316977f0082906b32e8a2313b6fb3935155fb055a1/rss_parser-2.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPL-3.0",
"summary": "Typed pythonic RSS/Atom parser",
"version": "2.1.0",
"project_urls": {
"Bug Tracker": "https://github.com/dhvcc/rss-parser/issues",
"Homepage": "https://dhvcc.github.io/rss-parser",
"Source": "https://github.com/dhvcc/rss-parser"
},
"split_keywords": [
"python",
" python3",
" cli",
" rss",
" parser",
" gplv3",
" typed",
" typed-python"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d16643fb6a0a1b3be3974e03b3d1182c066cddb6efedd7b3b23609597f962631",
"md5": "ec36ce4be5bbdbe6a1213e857ea7b7e4",
"sha256": "193b76f3292657faf85dd11dfe823b9007551fb7722d4363316870e32aff5ced"
},
"downloads": -1,
"filename": "rss_parser-2.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ec36ce4be5bbdbe6a1213e857ea7b7e4",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 29816,
"upload_time": "2024-09-26T10:58:58",
"upload_time_iso_8601": "2024-09-26T10:58:58.059636Z",
"url": "https://files.pythonhosted.org/packages/d1/66/43fb6a0a1b3be3974e03b3d1182c066cddb6efedd7b3b23609597f962631/rss_parser-2.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "72f18853d9808f68b4a34a316977f0082906b32e8a2313b6fb3935155fb055a1",
"md5": "132a9fc810304d647ecd970c91ae97be",
"sha256": "4a1eb0f69442b9b8f3b8343c053c3a772c8e9a5c8a6a969edadc03800f30b47e"
},
"downloads": -1,
"filename": "rss_parser-2.1.0.tar.gz",
"has_sig": false,
"md5_digest": "132a9fc810304d647ecd970c91ae97be",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.9",
"size": 25511,
"upload_time": "2024-09-26T10:58:59",
"upload_time_iso_8601": "2024-09-26T10:58:59.582059Z",
"url": "https://files.pythonhosted.org/packages/72/f1/8853d9808f68b4a34a316977f0082906b32e8a2313b6fb3935155fb055a1/rss_parser-2.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-26 10:58:59",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "dhvcc",
"github_project": "rss-parser",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "rss-parser"
}