article-extraction


| Field | Value |
| --- | --- |
| Name | article-extraction |
| Version | 0.3.0 |
| Home page | https://github.com/pmatigakis/article-extraction |
| Summary | Article text extraction library |
| Upload time | 2023-03-18 16:47:46 |
| Author | Matigakis Panagiotis |
| Requires Python | >=3.8,<4.0 |
| License | MIT |
| Keywords | article extraction |
| Requirements | No requirements were recorded. |

# Article extraction library

article-extraction is a package that can be used to extract the article content
from an HTML page.

# Installation

Use Poetry to install the library from GitHub.

```bash
poetry add "git+https://github.com/pmatigakis/article-extraction.git"
```

# Usage

Extract the content of an article using article-extraction.

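```python
from urllib.request import urlopen

from articles.mss.extractors import MSSArticleExtractor

# Download the raw HTML of the page; the context manager closes the connection.
with urlopen("https://www.bbc.com/sport/formula1/64983451") as response:
    document = response.read()

# Extract the article content from the HTML and print it.
article_extractor = MSSArticleExtractor()
article = article_extractor.extract_article(document)
print(article)
```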
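
Downloads can fail, so it may help to keep the fetch step separate from the extraction step. The following is a minimal sketch, not part of the library: `extract_from_url` is a hypothetical helper that uses only the standard library and takes the extraction function as an argument, on the assumption that it accepts the raw response bytes as `extract_article` does in the example above.

```python
from typing import Any, Callable, Optional
from urllib.request import urlopen


def extract_from_url(
    url: str,
    extract: Callable[[bytes], Any],
    timeout: float = 10.0,
) -> Optional[Any]:
    """Fetch `url` and pass the raw response body to `extract`.

    Hypothetical helper: returns None when the page cannot be downloaded.
    """
    try:
        # The context manager closes the connection once the body is read.
        with urlopen(url, timeout=timeout) as response:
            document = response.read()
    except OSError:
        # urllib's URLError and HTTPError, as well as socket timeouts,
        # are all subclasses of OSError.
        return None
    return extract(document)


# Usage with the library (needs network access and article-extraction):
# from articles.mss.extractors import MSSArticleExtractor
# article = extract_from_url(
#     "https://www.bbc.com/sport/formula1/64983451",
#     MSSArticleExtractor().extract_article,
# )
```

Passing the extractor in as a callable keeps the helper independent of any particular extractor class, so it can be exercised without the library installed.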

            

# Raw data

{
    "_id": null,
    "home_page": "https://github.com/pmatigakis/article-extraction",
    "name": "article-extraction",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8,<4.0",
    "maintainer_email": "",
    "keywords": "article extraction",
    "author": "Matigakis Panagiotis",
    "author_email": "pmatigakis@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/87/52/a79cbb7ce210cacd430c55f4efb755ed4ab68867a74b4c4c034acbe33111/article-extraction-0.3.0.tar.gz",
    "platform": null,
    "description": "# Article extraction library.\n\narticle-extraction is a package that can be used to extract the article content\nfrom an HTML page.\n\n# Installation\n\nUse poetry to install the library from GitHub.\n\n```bash\npoetry add \"git+https://github.com/pmatigakis/article-extraction.git\"\n```\n\n# Usage\n\nExtract the content of an article using article-extraction.\n\n```python\nfrom urllib.request import urlopen\n\nfrom articles.mss.extractors import MSSArticleExtractor\n\ndocument = urlopen(\"https://www.bbc.com/sport/formula1/64983451\").read()\narticle_extractor = MSSArticleExtractor()\narticle = article_extractor.extract_article(document)\nprint(article)\n```\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Article text extraction library",
    "version": "0.3.0",
    "split_keywords": [
        "article",
        "extraction"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7545f78f8650845dc5feca2433f14559a607e23ea5b9ff3f28b4b1815dafa6c1",
                "md5": "039f5159a4a29260bf5093de47246a35",
                "sha256": "b02bbb6daa433237058aabbed9e0373c58e00d8c9f4636923931db46ab7ce016"
            },
            "downloads": -1,
            "filename": "article_extraction-0.3.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "039f5159a4a29260bf5093de47246a35",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<4.0",
            "size": 5055,
            "upload_time": "2023-03-18T16:47:47",
            "upload_time_iso_8601": "2023-03-18T16:47:47.855300Z",
            "url": "https://files.pythonhosted.org/packages/75/45/f78f8650845dc5feca2433f14559a607e23ea5b9ff3f28b4b1815dafa6c1/article_extraction-0.3.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8752a79cbb7ce210cacd430c55f4efb755ed4ab68867a74b4c4c034acbe33111",
                "md5": "44da2496337a514e28b3c58ba342217f",
                "sha256": "a1e5f3d4eb980f8c987bdce31b5d5bfebc385b0d0e8379237e4ec9a63ea2b699"
            },
            "downloads": -1,
            "filename": "article-extraction-0.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "44da2496337a514e28b3c58ba342217f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<4.0",
            "size": 4097,
            "upload_time": "2023-03-18T16:47:46",
            "upload_time_iso_8601": "2023-03-18T16:47:46.441108Z",
            "url": "https://files.pythonhosted.org/packages/87/52/a79cbb7ce210cacd430c55f4efb755ed4ab68867a74b4c4c034acbe33111/article-extraction-0.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-18 16:47:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "pmatigakis",
    "github_project": "article-extraction",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "article-extraction"
}
        