newspaper4k

Name	newspaper4k
Version	0.9.3
home_page	https://github.com/AndyTheFactory/newspaper4k
Summary	Simplified python article discovery & extraction.
upload_time	2024-03-18 00:11:14
maintainer
docs_url	None
author	Andrei Paraschiv
requires_python	>=3.8,<4.0
license	MIT
keywords	nlp scraping newspaper article curation extraction
VCS
bugtrack_url
requirements	beautifulsoup4 certifi charset-normalizer click colorama feedparser filelock idna jieba joblib lxml nltk pillow pythainlp python-crfsuite python-dateutil pyyaml regex requests-file requests sgmllib3k six soupsieve tinydb tinysegmenter tldextract tqdm urllib3
Travis-CI
coveralls test coverage	No coveralls.

# Newspaper4k: Article Scraping & Curation, a continuation of the beloved newspaper3k by codelucas
[![PyPI version](https://badge.fury.io/py/newspaper4k.svg)](https://badge.fury.io/py/newspaper4k)
![Build status](https://github.com/AndyTheFactory/newspaper4k/actions/workflows/pipeline.yml/badge.svg)
[![Coverage status](https://coveralls.io/repos/github/AndyTheFactory/newspaper4k/badge.svg?branch=master)](https://coveralls.io/github/AndyTheFactory/newspaper4k)
[![Documentation Status](https://readthedocs.org/projects/newspaper4k/badge/?version=latest)](https://newspaper4k.readthedocs.io/en/latest/)

At the moment the Newspaper4k Project is a fork of the well-known newspaper3k by [codelucas](https://github.com/codelucas/newspaper), which has not been updated since September 2020. The initial goal of this fork is to keep the project alive, add new features, and fix bugs.

I have duplicated all issues from the original project and will try to fix them. If you have any issues or feature requests, please open an issue here.

| <!-- -->    | <!-- -->    |
|-------------|-------------|
| **Experimental ChatGPT helper bot for Newspaper4k:**         | [![ChatGPT helper](docs/user_guide/assets/chatgpt_chat200x75.png)](https://chat.openai.com/g/g-OxSqyKAhi-newspaper-4k-gpt)|



## Python compatibility
-   Python 3.8 or newer

# Quick start

``` bash
pip install newspaper4k
```

## Using the CLI

You can start directly from the command line, using the included CLI:
``` bash
python -m newspaper --url="https://edition.cnn.com/2023/11/17/success/job-seekers-use-ai/index.html" --language=en --output-format=json --output-file=article.json

```
More information about the CLI can be found in the [CLI documentation](https://newspaper4k.readthedocs.io/en/latest/user_guide/cli_reference.html).
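
To run the CLI over several URLs from a script, the command shown above can be assembled programmatically. The sketch below uses only the four flags that appear in that example (`--url`, `--language`, `--output-format`, `--output-file`). The helper names `build_cli_command` and `extract_to_files` and the output file naming scheme are hypothetical and not part of Newspaper4k.

```python
import subprocess
import sys
from typing import List


def build_cli_command(url: str, output_file: str, language: str = "en") -> List[str]:
    """Assemble the argument list for one `python -m newspaper` call."""
    return [
        sys.executable, "-m", "newspaper",
        f"--url={url}",
        f"--language={language}",
        "--output-format=json",
        f"--output-file={output_file}",
    ]


def extract_to_files(urls: List[str], language: str = "en") -> List[str]:
    """Run the CLI once per URL and return the output file names."""
    output_files = []
    for index, url in enumerate(urls):
        output_file = f"article_{index}.json"  # naming scheme is an assumption
        # check=True raises CalledProcessError if the CLI exits with an error.
        subprocess.run(build_cli_command(url, output_file, language), check=True)
        output_files.append(output_file)
    return output_files
```

Calling `extract_to_files([...])` with a list of article URLs would then write one JSON file per article into the current directory.
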
## Using the Python API

Alternatively, you can use Newspaper4k in Python:

### Processing one article / url at a time

``` python
import newspaper

article = newspaper.article('https://edition.cnn.com/2023/10/29/sport/nfl-week-8-how-to-watch-spt-intl/index.html')

print(article.authors)
# ['Hannah Brewitt', 'Minute Read', 'Published', 'Am Edt', 'Sun October']

print(article.publish_date)
# 2023-10-29 09:00:15.717000+00:00

print(article.text)
# New England Patriots head coach Bill Belichick, right, embraces Buffalo Bills head coach Sean McDermott ...

print(article.top_image)
# https://media.cnn.com/api/v1/images/stellar/prod/231015223702-06-nfl-season-gallery-1015.jpg?c=16x9&q=w_800,c_fill

print(article.movies)
# []

article.nlp()
print(article.keywords)
# ['broncos', 'game', 'et', 'wide', 'chiefs', 'mahomes', 'patrick', 'denver', 'nfl', 'stadium', 'week', 'quarterback', 'win', 'history', 'images']

print(article.summary)
# Kevin Sabitus/Getty Images Denver Broncos running back Javonte Williams evades Green Bay Packers safety Darnell Savage, bottom.
# Kathryn Riley/Getty Images Kansas City Chiefs quarterback Patrick Mahomes calls a play during the Chiefs' 19-8 Thursday Night Football win over the Denver Broncos on October 12.
# Paul Sancya/AP New York Jets running back Breece Hall carries the ball during a game against the Denver Broncos.
# The Broncos have not beaten the Chiefs since 2015, and have never beaten Chiefs quarterback Patrick Mahomes.
# Australia: NFL+, ESPN, 7Plus Brazil: NFL+, ESPN Canada: NFL+, CTV, TSN, RDS Germany: NFL+, ProSieben MAXX, DAZN Mexico: NFL+, TUDN, ESPN, Fox Sports, Sky Sports UK: NFL+, Sky Sports, ITV, Channel 5 US: NFL+, CBS, NBC, FOX, ESPN, Amazon Prime

```

## Parsing and scraping whole News Sources (websites) using the Source Class

You can build a `Source` object from a news website. This class gives you access to all the articles and categories on the site. Building the source does not download the articles: the `build()` call parses the front page, detects category links (if possible), collects any RSS feeds published by the news site, and creates a list of article links.
You need to call `download_articles()` to download the articles, but note that this can take a significant amount of time.

`download_articles()` downloads the articles in a multithreaded fashion using `ThreadPoolExecutor` from the `concurrent.futures` package. The number of concurrent threads can be configured in `Configuration.number_threads` or passed as an argument to `build()`.


``` python
import newspaper

cnn_paper = newspaper.build('http://cnn.com', number_threads=3)
print(cnn_paper.category_urls())

# ['https://cnn.com', 'https://money.cnn.com', 'https://arabic.cnn.com',
# 'https://cnnespanol.cnn.com', 'http://edition.cnn.com',
# 'https://edition.cnn.com', 'https://us.cnn.com', 'https://www.cnn.com']

article_urls = [article.url for article in cnn_paper.articles]
print(article_urls[:3])
# ['https://arabic.cnn.com/middle-east/article/2023/10/30/number-of-hostages-held-in-gaza-now-up-to-239-idf-spokesperson',
# 'https://arabic.cnn.com/middle-east/video/2023/10/30/v146619-sotu-sullivan-hostage-negotiations',
# 'https://arabic.cnn.com/middle-east/article/2023/10/29/norwegian-pm-israel-gaza']


article = cnn_paper.articles[0]
article.download()
article.parse()

print(article.title)
# المتحدث باسم الجيش الإسرائيلي: عدد الرهائن المحتجزين في غزة يصل إلى

```
Or, if you want to download articles from the website in bulk (keep in mind that this can take a long time and could get your IP blocked by the news site):

``` python
import newspaper

cnn_source = newspaper.build('http://cnn.com', number_threads=3)

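# article_urls() lists the article links found by build()
print(len(cnn_source.article_urls()))

# Download all articles using the configured number of threads
articles = cnn_source.download_articles()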

print(len(articles))

print(articles[0].title)
```
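
The threaded download described above can be pictured with a small standard-library sketch. It is not Newspaper4k's internal code: `fetch` is a stand-in that returns a placeholder string instead of making a network request, and `download_all` is a hypothetical helper. Only `ThreadPoolExecutor` and the idea of a configurable thread count come from the text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch(url: str) -> str:
    """Stand-in for a real HTTP download (assumption: no network access here)."""
    return f"<html>content of {url}</html>"


def download_all(urls: List[str], number_threads: int = 3,
                 fetcher: Callable[[str], str] = fetch) -> Dict[str, str]:
    """Download every URL using at most `number_threads` worker threads."""
    with ThreadPoolExecutor(max_workers=number_threads) as executor:
        # executor.map returns results in the same order as `urls`.
        pages = list(executor.map(fetcher, urls))
    return dict(zip(urls, pages))


pages = download_all(["https://example.com/a", "https://example.com/b"])
print(len(pages))
```

A small thread count keeps the request rate modest, which matters given the risk of being blocked by the news site.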

## Multilanguage features

Newspaper can extract and detect languages *seamlessly* based on the article meta tags. Additionally, you can specify the language for the website or article. If no language is specified, Newspaper will attempt to auto-detect the language from the available metadata. The fallback language is English.

Language detection is crucial for accurate article extraction. If the wrong language is detected or provided, chances are that no article text will be returned. Before parsing, check that your language is supported by our package.

``` python
from newspaper import Article

article = Article('https://www.bbc.com/zhongwen/simp/chinese-news-67084358')
article.download()
article.parse()

print(article.title)
# 晶片大战：台湾厂商助攻华为突破美国封锁？

if article.config.use_meta_language:
  # If we use the autodetected language, this config attribute will be true
  print(article.meta_lang)
else:
  print(article.config.language)

# zh
```
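
The fallback order described above (an explicitly specified language, then the language found in the page metadata, then English) can be sketched as follows. This is an illustration only: `HtmlLangParser` and `resolve_language` are hypothetical names, and reading the `lang` attribute of the `<html>` tag is just one assumed source of metadata, not necessarily what Newspaper4k inspects.

```python
from html.parser import HTMLParser
from typing import Optional


class HtmlLangParser(HTMLParser):
    """Records the `lang` attribute of the first <html> tag, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            value = dict(attrs).get("lang")
            if value:
                # Keep only the primary subtag, e.g. "zh-CN" -> "zh".
                self.lang = value.split("-")[0].lower()


def resolve_language(html: str, configured: Optional[str] = None) -> str:
    """Explicit language wins, then page metadata, then English."""
    if configured:
        return configured
    parser = HtmlLangParser()
    parser.feed(html)
    return parser.lang or "en"


print(resolve_language('<html lang="zh-CN"><body></body></html>'))   # zh
print(resolve_language("<html><body></body></html>"))                # en
print(resolve_language('<html lang="zh"></html>', configured="de"))  # de
```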

# Docs

Check out [The Docs](https://newspaper4k.readthedocs.io) for full and
detailed guides using newspaper.

# Features

-   Multi-threaded article download framework
-   Newspaper category detection
-   News url identification
-   Text extraction from html
-   Top image extraction from html
-   All image extraction from html
-   Keyword building from the extracted text
-   Automatic article text summarization
-   Author extraction from text
-   Easy to use Command Line Interface (`python -m newspaper ...`)
-   Output in various formats (json, csv, text)
-   Works in 10+ languages (English, Chinese, German, Arabic, ...)

# Evaluation

## Evaluation Results


Using the dataset from [ScrapingHub](https://github.com/scrapinghub/article-extraction-benchmark), I created an [evaluator script](tests/evaluation/evaluate.py) that compares the performance of newspaper against its previous versions. This way we can see how newspaper updates improve or worsen the performance of the library.

| Version            | Corpus BLEU Score | Corpus Precision Score | Corpus Recall Score | Corpus F1 Score |
|--------------------|-------------------|------------------------|---------------------|-----------------|
| Newspaper3k 0.2.8  | 0.8660            | 0.9128                 | 0.9071              | 0.9100          |
| Newspaper4k 0.9.0  | 0.9212            | 0.8992                 | 0.9336              | 0.9161          |
| Newspaper4k 0.9.1  | 0.9224            | 0.8895                 | 0.9242              | 0.9065          |
| Newspaper4k 0.9.2  | 0.9426            | 0.9070                 | 0.9087              | 0.9078          |

Precision, Recall and F1 are computed from the overlap of shingles (n-grams of size 4) between the extracted text and the reference text. The corpus BLEU score is computed using [NLTK's `bleu_score`](https://www.nltk.org/api/nltk.translate.bleu).
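
As a rough illustration of the shingle-based scores, the sketch below compares two strings using sets of 4-grams. It is not the project's evaluator script: whitespace tokenization, the use of sets rather than counts, and the function names `shingles` and `shingle_scores` are all assumptions made for this example.

```python
from typing import Dict, Set, Tuple


def shingles(text: str, n: int = 4) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams (shingles) in `text`."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shingle_scores(predicted: str, reference: str, n: int = 4) -> Dict[str, float]:
    """Precision, recall and F1 from the overlap of two shingle sets."""
    pred, ref = shingles(predicted, n), shingles(reference, n)
    overlap = len(pred & ref)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


reference = "the quick brown fox jumps over the lazy dog"
predicted = "the quick brown fox jumps over a fence"
print(shingle_scores(predicted, reference))
```

Here the predicted text shares 3 of its 5 shingles with the reference, which has 6, so precision is 0.6 and recall is 0.5.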

# Requirements and dependencies

The following system packages are required:

-   **Pillow**: `libjpeg-dev` `zlib1g-dev` `libpng12-dev`
-   **Lxml**: `libxml2-dev` `libxslt-dev`
-   Python Development version: `python-dev`


**If you are on Debian / Ubuntu**, install using the following:

-   Install `python3` and `python3-dev`:

        $ sudo apt-get install python3 python3-dev

-   Install `pip3` command needed to install `newspaper4k` package:

        $ sudo apt-get install python3-pip

-   lxml requirements:

        $ sudo apt-get install libxml2-dev libxslt-dev

-   For PIL to recognize .jpg images:

        $ sudo apt-get install libjpeg-dev zlib1g-dev libpng12-dev

NOTE: If you have problems installing `libpng12-dev`, try installing
`libpng-dev` instead.

-   Install the distribution via pip:

        $ pip3 install newspaper4k


**If you are on OSX**, install using the following. You may use either
Homebrew or MacPorts; the commands below use Homebrew:

    $ brew install libxml2 libxslt

    $ brew install libtiff libjpeg webp little-cms2

    $ pip3 install newspaper4k
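
After installation, a quick check that the compiled dependencies can be imported may save some debugging time. The sketch below is an optional helper, not part of Newspaper4k; `missing_modules` is a hypothetical name, and the check only confirms that the modules can be located, not that every image format is supported.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# `lxml` and `PIL` (Pillow) rely on the system libraries listed above.
missing = missing_modules(["lxml", "PIL", "newspaper"])
print("Missing modules:", missing or "none")
```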


# Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md).

# LICENSE

Authored and maintained by [Andrei Paraschiv](https://github.com/AndyTheFactory).

Newspaper was originally developed by Lucas Ou-Yang ([codelucas](https://codelucas.com/)); the original repository can be found [here](https://github.com/codelucas/newspaper). Newspaper is licensed under the MIT license.

# Credits
Thanks to Lucas Ou-Yang for creating the original Newspaper3k project and to all contributors of the original project.

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/AndyTheFactory/newspaper4k",
    "name": "newspaper4k",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8,<4.0",
    "maintainer_email": "",
    "keywords": "nlp,scraping,newspaper,article,curation,extraction",
    "author": "Andrei Paraschiv",
    "author_email": "andrei@thephpfactory.com",
    "download_url": "https://files.pythonhosted.org/packages/41/1c/003ad6cf29504d776f8d5cd1ccd7153507fa5c2968e226b723e1bd0a0087/newspaper4k-0.9.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Simplified python article discovery & extraction.",
    "version": "0.9.3",
    "project_urls": {
        "Documentation": "https://newspaper4k.readthedocs.io/en/latest/",
        "Homepage": "https://github.com/AndyTheFactory/newspaper4k",
        "Repository": "https://github.com/AndyTheFactory/newspaper4k"
    },
    "split_keywords": [
        "nlp",
        "scraping",
        "newspaper",
        "article",
        "curation",
        "extraction"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "723e9a3b8d2be42e09bc13cbe19ada20f08623f25c91cbfe0181085d85393da3",
                "md5": "79b687c814edbc214a736e6857309df3",
                "sha256": "63f33552cc339521976ab86a40cf24bdc9ba1aa3e0d8bb8020eefd1a541e438f"
            },
            "downloads": -1,
            "filename": "newspaper4k-0.9.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "79b687c814edbc214a736e6857309df3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<4.0",
            "size": 295956,
            "upload_time": "2024-03-18T00:11:11",
            "upload_time_iso_8601": "2024-03-18T00:11:11.968323Z",
            "url": "https://files.pythonhosted.org/packages/72/3e/9a3b8d2be42e09bc13cbe19ada20f08623f25c91cbfe0181085d85393da3/newspaper4k-0.9.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "411c003ad6cf29504d776f8d5cd1ccd7153507fa5c2968e226b723e1bd0a0087",
                "md5": "79b53e8a97da877b24af1aff68e275f8",
                "sha256": "f2047b0a092867ee3e8deca7846386860e3c55787f83112a456166c24017c1f3"
            },
            "downloads": -1,
            "filename": "newspaper4k-0.9.3.tar.gz",
            "has_sig": false,
            "md5_digest": "79b53e8a97da877b24af1aff68e275f8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<4.0",
            "size": 271780,
            "upload_time": "2024-03-18T00:11:14",
            "upload_time_iso_8601": "2024-03-18T00:11:14.127471Z",
            "url": "https://files.pythonhosted.org/packages/41/1c/003ad6cf29504d776f8d5cd1ccd7153507fa5c2968e226b723e1bd0a0087/newspaper4k-0.9.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-18 00:11:14",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "AndyTheFactory",
    "github_project": "newspaper4k",
    "travis_ci": true,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "beautifulsoup4",
            "specs": [
                [
                    "==",
                    "4.12.2"
                ]
            ]
        },
        {
            "name": "certifi",
            "specs": [
                [
                    "==",
                    "2023.11.17"
                ]
            ]
        },
        {
            "name": "charset-normalizer",
            "specs": [
                [
                    "==",
                    "2.0.12"
                ]
            ]
        },
        {
            "name": "click",
            "specs": [
                [
                    "==",
                    "8.0.4"
                ]
            ]
        },
        {
            "name": "colorama",
            "specs": [
                [
                    "==",
                    "0.4.5"
                ]
            ]
        },
        {
            "name": "feedparser",
            "specs": [
                [
                    "==",
                    "6.0.10"
                ]
            ]
        },
        {
            "name": "filelock",
            "specs": [
                [
                    "==",
                    "3.4.1"
                ]
            ]
        },
        {
            "name": "idna",
            "specs": [
                [
                    "==",
                    "3.4"
                ]
            ]
        },
        {
            "name": "jieba",
            "specs": [
                [
                    "==",
                    "0.42.1"
                ]
            ]
        },
        {
            "name": "joblib",
            "specs": [
                [
                    "==",
                    "1.1.1"
                ]
            ]
        },
        {
            "name": "lxml",
            "specs": [
                [
                    "==",
                    "4.9.3"
                ]
            ]
        },
        {
            "name": "nltk",
            "specs": [
                [
                    "==",
                    "3.6.7"
                ]
            ]
        },
        {
            "name": "pillow",
            "specs": [
                [
                    "==",
                    "8.4.0"
                ]
            ]
        },
        {
            "name": "pythainlp",
            "specs": [
                [
                    "==",
                    "2.3.2"
                ]
            ]
        },
        {
            "name": "python-crfsuite",
            "specs": [
                [
                    "==",
                    "0.9.9"
                ]
            ]
        },
        {
            "name": "python-dateutil",
            "specs": [
                [
                    "==",
                    "2.8.2"
                ]
            ]
        },
        {
            "name": "pyyaml",
            "specs": [
                [
                    "==",
                    "6.0.1"
                ]
            ]
        },
        {
            "name": "regex",
            "specs": [
                [
                    "==",
                    "2023.8.8"
                ]
            ]
        },
        {
            "name": "requests-file",
            "specs": [
                [
                    "==",
                    "1.5.1"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.27.1"
                ]
            ]
        },
        {
            "name": "sgmllib3k",
            "specs": [
                [
                    "==",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "six",
            "specs": [
                [
                    "==",
                    "1.16.0"
                ]
            ]
        },
        {
            "name": "soupsieve",
            "specs": [
                [
                    "==",
                    "2.3.2.post1"
                ]
            ]
        },
        {
            "name": "tinydb",
            "specs": [
                [
                    "==",
                    "4.7.0"
                ]
            ]
        },
        {
            "name": "tinysegmenter",
            "specs": [
                [
                    "==",
                    "0.4"
                ]
            ]
        },
        {
            "name": "tldextract",
            "specs": [
                [
                    "==",
                    "3.1.2"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    "==",
                    "4.64.1"
                ]
            ]
        },
        {
            "name": "urllib3",
            "specs": [
                [
                    "==",
                    "1.26.18"
                ]
            ]
        }
    ],
    "lcname": "newspaper4k"
}

Andrei Paraschiv