# Article extraction library
article-extraction is a Python package that extracts the article content
from an HTML page.
# Installation
Use Poetry to install the library directly from GitHub.
```bash
poetry add "git+https://github.com/pmatigakis/article-extraction.git"
```
# Usage
Download a page and pass its HTML to `MSSArticleExtractor` to extract the article content.
```python
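from urllib.request import urlopen

from articles.mss.extractors import MSSArticleExtractor

# Download the raw HTML of the page and close the connection afterwards
with urlopen("https://www.bbc.com/sport/formula1/64983451") as response:
    document = response.read()

# Extract the article content from the HTML document
article_extractor = MSSArticleExtractor()
article = article_extractor.extract_article(document)
print(article)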
```
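In practice the download step can fail or hang, so it may help to separate fetching from extraction. The following is a minimal sketch under stated assumptions, not part of the library: `fetch_html` and `extract_from_url` are hypothetical helper names, the timeout and `User-Agent` values are arbitrary, and the only library call assumed is `extract_article`, as shown in the example above.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10.0):
    """Download a page and return its raw bytes, or None on a network error.

    Hypothetical helper; the timeout and User-Agent header are arbitrary choices.
    """
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.read()
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return None


def extract_from_url(url, extractor, fetch=fetch_html):
    """Fetch `url` and run `extractor.extract_article` on the downloaded HTML.

    `extractor` is any object with an `extract_article(document)` method,
    for example an MSSArticleExtractor instance. Returns None if the
    download failed.
    """
    document = fetch(url)
    if document is None:
        return None
    return extractor.extract_article(document)


# Possible usage (requires the library and network access):
#   from articles.mss.extractors import MSSArticleExtractor
#   article = extract_from_url(
#       "https://www.bbc.com/sport/formula1/64983451", MSSArticleExtractor()
#   )
#   print(article)
```

Passing the extractor and the fetch function as arguments keeps the helper easy to test without network access.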
# Raw data
{
"_id": null,
"home_page": "https://github.com/pmatigakis/article-extraction",
"name": "article-extraction",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8,<4.0",
"maintainer_email": "",
"keywords": "article extraction",
"author": "Matigakis Panagiotis",
"author_email": "pmatigakis@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/87/52/a79cbb7ce210cacd430c55f4efb755ed4ab68867a74b4c4c034acbe33111/article-extraction-0.3.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Article text extraction library",
"version": "0.3.0",
"split_keywords": [
"article",
"extraction"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7545f78f8650845dc5feca2433f14559a607e23ea5b9ff3f28b4b1815dafa6c1",
"md5": "039f5159a4a29260bf5093de47246a35",
"sha256": "b02bbb6daa433237058aabbed9e0373c58e00d8c9f4636923931db46ab7ce016"
},
"downloads": -1,
"filename": "article_extraction-0.3.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "039f5159a4a29260bf5093de47246a35",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8,<4.0",
"size": 5055,
"upload_time": "2023-03-18T16:47:47",
"upload_time_iso_8601": "2023-03-18T16:47:47.855300Z",
"url": "https://files.pythonhosted.org/packages/75/45/f78f8650845dc5feca2433f14559a607e23ea5b9ff3f28b4b1815dafa6c1/article_extraction-0.3.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8752a79cbb7ce210cacd430c55f4efb755ed4ab68867a74b4c4c034acbe33111",
"md5": "44da2496337a514e28b3c58ba342217f",
"sha256": "a1e5f3d4eb980f8c987bdce31b5d5bfebc385b0d0e8379237e4ec9a63ea2b699"
},
"downloads": -1,
"filename": "article-extraction-0.3.0.tar.gz",
"has_sig": false,
"md5_digest": "44da2496337a514e28b3c58ba342217f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8,<4.0",
"size": 4097,
"upload_time": "2023-03-18T16:47:46",
"upload_time_iso_8601": "2023-03-18T16:47:46.441108Z",
"url": "https://files.pythonhosted.org/packages/87/52/a79cbb7ce210cacd430c55f4efb755ed4ab68867a74b4c4c034acbe33111/article-extraction-0.3.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-03-18 16:47:46",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "pmatigakis",
"github_project": "article-extraction",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "article-extraction"
}