fastfeedparser


- Name: fastfeedparser
- Version: 0.2.2
- Home page: https://github.com/kagi-search/fastfeedparser
- Summary: High performance RSS, Atom and RDF parser in Python
- Upload time: 2024-11-15 01:43:06
- Author: Vladimir Prelovac
- Requires Python: >=3.7
- Requirements: No requirements were recorded.

# FastFeedParser

A high-performance RSS, Atom, and RDF feed parser for Python. FastFeedParser is designed to be fast, memory-efficient, and easy to use while providing comprehensive feed parsing capabilities.

## Why FastFeedParser?

The main advantages of FastFeedParser over the traditional feedparser library are its lightweight design and its performance: benchmarks show it to be 10x to 100x faster than feedparser while keeping a familiar API. This speed-up is achieved through:

- Efficient XML parsing using lxml
- Optimized memory usage
- Minimal dependencies
- Streamlined codebase focused on core functionality

FastFeedParser is used for high-performance processing of thousands of feeds for the [Kagi Small Web](https://github.com/kagisearch/smallweb) initiative.


## Features

- Fast parsing of RSS 2.0, Atom 1.0, and RDF/RSS 1.0 feeds
- Robust error handling and encoding detection
- Support for media content and enclosures
- Automatic date parsing and standardization to UTC ISO 8601 format
- Clean, Pythonic API similar to feedparser
- Comprehensive handling of feed metadata
- Support for various feed extensions (Media RSS, Dublin Core, etc.)
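Because dates are standardized to UTC ISO 8601, they can be turned into `datetime` objects with the standard library alone. The sketch below assumes the date arrives as a string such as `2024-11-15T01:43:06+00:00`; the helper name `to_utc_datetime` is hypothetical and not part of FastFeedParser.

```python
from datetime import datetime, timezone


def to_utc_datetime(published: str) -> datetime:
    """Convert an ISO 8601 date string into a timezone-aware UTC datetime.

    Hypothetical helper, not part of FastFeedParser. It assumes the input
    is an ISO 8601 string, optionally ending in "Z" or a numeric offset.
    """
    text = published.strip()
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit offset for older interpreters.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    # Treat naive timestamps as UTC, since that is the documented target zone.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

A helper like this can serve as a sort key, for example when ordering entries from several feeds by publication time.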


## Installation

```bash
pip install fastfeedparser
```

## Quick Start

```python
import fastfeedparser

# Parse from URL
myfeed = fastfeedparser.parse('https://example.com/feed.xml')

# Parse from string
xml_content = '''<?xml version="1.0"?>
<rss version="2.0">
    <channel>
        <title>Example Feed</title>
        ...
    </channel>
</rss>'''
myfeed = fastfeedparser.parse(xml_content)

# Access feed global information
print(myfeed.feed.title)
print(myfeed.feed.link)

# Access feed entries
for entry in myfeed.entries:
    print(entry.title)
    print(entry.link)
    print(entry.published)
```

## Key Features

### Feed Types Support
- RSS 2.0
- Atom 1.0
- RDF/RSS 1.0

### Content Handling
- Automatic encoding detection
- HTML content parsing
- Media content extraction
- Enclosure handling

### Metadata Support
- Feed title, link, and description
- Publication dates
- Author information
- Categories and tags
- Media content and thumbnails

## API Reference

### Main Functions

- `parse(source)`: Parse a feed from `source`, which can be either a URL or a string containing the feed XML
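Since `parse` accepts either a URL or raw XML, a caller may want to know in advance which case applies, for example to log it or to apply its own network policy. The following is a caller-side sketch only: `is_probably_url` is a hypothetical helper, and it does not describe how FastFeedParser itself tells the two inputs apart.

```python
from urllib.parse import urlparse


def is_probably_url(source: str) -> bool:
    """Guess whether `source` is an http(s) URL rather than feed XML.

    Hypothetical caller-side check; FastFeedParser's own detection logic
    is not documented here and may differ.
    """
    try:
        parts = urlparse(source.strip())
    except ValueError:
        # urlparse can reject malformed input; such input is not a usable URL.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

With this check, a caller could route URLs through its own fetching layer and pass only the downloaded text to `parse`.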


### Feed Object Structure

The parser returns a `FastFeedParserDict` object with two main sections:

- `feed`: Contains feed-level metadata
- `entries`: List of feed entries

Each entry contains:
- `title`: Entry title
- `link`: Entry URL
- `description`: Entry description/summary
- `published`: Publication date
- `author`: Author information
- `content`: Full content
- `media_content`: Media attachments
- `enclosures`: Attached files
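Real-world feeds often omit some of these fields, so reading them defensively avoids errors on sparse entries. The sketch below assumes that an entry behaves like a standard mapping with a `.get()` method, which the `FastFeedParserDict` name suggests but which is not confirmed here; `summarize_entry` is a hypothetical helper.

```python
from typing import Any, Dict, Mapping


def summarize_entry(entry: Mapping[str, Any]) -> Dict[str, Any]:
    """Build a small summary of a feed entry, tolerating missing fields.

    Hypothetical helper; assumes the entry supports mapping-style .get().
    """
    return {
        "title": entry.get("title", "(untitled)"),
        "link": entry.get("link"),
        "published": entry.get("published"),
        # A missing or empty enclosure list both count as zero attachments.
        "enclosure_count": len(entry.get("enclosures") or []),
    }
```

Used on plain dictionaries, `summarize_entry({"title": "Hello"})` yields a summary with `None` for the absent link and date and an `enclosure_count` of 0.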

## Requirements

- Python 3.7+
- httpx
- lxml
- parsedatetime
- python-dateutil

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

Inspired by the Universal Feed Parser (feedparser) project, FastFeedParser aims to provide a modern, high-performance alternative while maintaining a familiar API.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/kagi-search/fastfeedparser",
    "name": "fastfeedparser",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": null,
    "author": "Vladimir Prelovac",
    "author_email": "vlad@kagi.com",
    "download_url": "https://files.pythonhosted.org/packages/b0/6c/23004e7718547863c38feddd192d8897967677af85ccb62693357803fb46/fastfeedparser-0.2.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "High performance RSS, Atom and RDF parser in Python",
    "version": "0.2.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/kagi-search/fastfeedparser/issues",
        "Homepage": "https://github.com/kagi-search/fastfeedparser"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "524deb658ebce08b242b644aae12a506dce5456d5a0d2ebf3b650107642685b4",
                "md5": "185618f48e6c220e7ca5607ee5afab06",
                "sha256": "a23ee5d777721d9d192016978679b5d0401481fa04de26ffa9b9a7daa71f0cdd"
            },
            "downloads": -1,
            "filename": "fastfeedparser-0.2.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "185618f48e6c220e7ca5607ee5afab06",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 9225,
            "upload_time": "2024-11-15T01:43:04",
            "upload_time_iso_8601": "2024-11-15T01:43:04.858440Z",
            "url": "https://files.pythonhosted.org/packages/52/4d/eb658ebce08b242b644aae12a506dce5456d5a0d2ebf3b650107642685b4/fastfeedparser-0.2.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b06c23004e7718547863c38feddd192d8897967677af85ccb62693357803fb46",
                "md5": "9a913b895723683d5605b4221548b833",
                "sha256": "e81028bb9022e18fb6b15859ee10d7fd13c3328181cf65554837cef944fd63e2"
            },
            "downloads": -1,
            "filename": "fastfeedparser-0.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "9a913b895723683d5605b4221548b833",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 10635,
            "upload_time": "2024-11-15T01:43:06",
            "upload_time_iso_8601": "2024-11-15T01:43:06.194522Z",
            "url": "https://files.pythonhosted.org/packages/b0/6c/23004e7718547863c38feddd192d8897967677af85ccb62693357803fb46/fastfeedparser-0.2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-15 01:43:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "kagi-search",
    "github_project": "fastfeedparser",
    "github_not_found": true,
    "lcname": "fastfeedparser"
}
        