newspaperV3

Name	newspaperV3
Version	0.3.1
home_page	https://github.com/salah55s/newspaperV3
Summary	Advanced news extraction, article parsing, and content analysis.
upload_time	2025-08-31 09:05:52
maintainer	None
docs_url	None
author	Lucas Ou-Yang
requires_python	<4.0,>=3.8
license	MIT
keywords	newspaper news article extraction scraping nlp content parsing
requirements	requests python-dotenv responses
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# newspaperV3

An advanced library for news extraction, article parsing, and content analysis. It is a fork of the original `newspaper` library by Lucas Ou-Yang.

## Installation

Install the package using pip:

```bash
pip install newspaperV3
```
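
To confirm which release is installed, the standard library's `importlib.metadata` can be queried. The helper below is a minimal sketch; `installed_version` is a hypothetical name introduced here for illustration and is not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "newspaperV3") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    print("newspaperV3 version:", installed_version())
```

A result of `None` means pip did not install the package into the interpreter that is running the script.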

## Basic Usage

The following example downloads and parses an article, then runs NLP to produce a summary and keywords:

```python
from newspaperV3 import Article
import nltk

# The NLTK 'punkt' data must be downloaded once before the first run.
# Uncomment the next line to fetch it:
# nltk.download('punkt')

url = 'https://edition.cnn.com/2025/07/29/middleeast/israeli-settler-odeh-hathalin-west-bank-oscar-intl'

# Create an Article object
article = Article(url)

# Download and parse the article
article.download()
article.parse()

# Perform Natural Language Processing (NLP)
article.nlp()

# Print the results
print("Title:", article.title)
print("Authors:", article.authors)
print("Publish Date:", article.publish_date)
print("Top Image:", article.top_image)
print("\nSummary:")
print(article.summary)
print("\nKeywords:", article.keywords)
```
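
The same steps can be wrapped in a small function that returns the extracted fields as a dictionary. This is a sketch under stated assumptions: `extract_article`, `article_cls`, and `run_nlp` are hypothetical names introduced here, and the function uses only the `Article` methods and attributes shown in the example above. The class is passed in as a parameter so the helper can be tried with a stand-in object and no network access.

```python
from typing import Any, Callable, Dict, Optional


def extract_article(
    url: str,
    article_cls: Optional[Callable[[str], Any]] = None,
    run_nlp: bool = True,
) -> Dict[str, Any]:
    """Download and parse one article, optionally running NLP on it."""
    if article_cls is None:
        # Imported lazily so the helper also works with a stand-in class.
        from newspaperV3 import Article as article_cls

    article = article_cls(url)
    article.download()
    article.parse()

    result: Dict[str, Any] = {
        "url": url,
        "title": article.title,
        "authors": list(article.authors or []),
        "publish_date": article.publish_date,
        "top_image": article.top_image,
    }
    if run_nlp:
        # nlp() fills in the summary and keywords attributes.
        article.nlp()
        result["summary"] = article.summary
        result["keywords"] = list(article.keywords or [])
    return result
```

Calling `extract_article(url, run_nlp=False)` skips the NLP step, which may be useful when the NLTK data is not available.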

## Features

* **Article Extraction** : Automatically extract clean article text from web pages
* **Metadata Parsing** : Extract titles, authors, publication dates, and images
* **Natural Language Processing** : Generate summaries and extract keywords
* **Multi-language Support** : Process articles in various languages
* **Image Processing** : Extract and analyze article images
* **Content Analysis** : Advanced text processing and analysis capabilities

## Requirements

* Python 3.8+ (below 4.0)
* NLTK (for natural language processing)
* Additional dependencies are installed automatically
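
The package metadata declares `requires_python` as `<4.0,>=3.8`. A script can check the running interpreter against that range before importing the library. The function below is a minimal sketch; `python_is_supported` is a hypothetical name introduced here for illustration.

```python
import sys
from typing import Sequence


def python_is_supported(version_info: Sequence[int] = sys.version_info) -> bool:
    """Return True if the version falls within >=3.8 and <4.0."""
    major_minor = tuple(version_info[:2])
    return (3, 8) <= major_minor < (4, 0)


if __name__ == "__main__":
    if not python_is_supported():
        raise SystemExit("newspaperV3 declares support for Python >=3.8,<4.0")
```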

## License

This project is licensed under the MIT License.

Raw data

{
    "_id": null,
    "home_page": "https://github.com/salah55s/newspaperV3",
    "name": "newspaperV3",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.8",
    "maintainer_email": null,
    "keywords": "newspaper, news, article, extraction, scraping, nlp, content, parsing",
    "author": "Lucas Ou-Yang",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/7d/b0/30a0009a04923da0cd79540c54337b9bba6bd9109b5e85d341d5057c18e0/newspaperv3-0.3.1.tar.gz",
    "platform": null,
    "description": "# newspaperV3\n\nAn advanced library for news extraction, article parsing, and content analysis. This is a fork/version based on the original `newspaper` library by Lucas Ou-Yang.\n\n## Installation\n\nInstall the package using pip:\n\n```bash\npip install newspaperV3\n```\n\n## Basic Usage\n\nHere's a simple example of how to download and parse an article:\n\n```python\nfrom newspaperV3 import Article\nimport nltk\n\n# NLTK data is required for the first run\n# nltk.download('punkt')\n\nurl = 'https://edition.cnn.com/2025/07/29/middleeast/israeli-settler-odeh-hathalin-west-bank-oscar-intl'\n\n# Create an Article object\narticle = Article(url)\n\n# Download and parse the article\narticle.download()\narticle.parse()\n\n# Perform Natural Language Processing (NLP)\narticle.nlp()\n\n# Print the results\nprint(\"Title:\", article.title)\nprint(\"Authors:\", article.authors)\nprint(\"Publish Date:\", article.publish_date)\nprint(\"Top Image:\", article.top_image)\nprint(\"\\nSummary:\")\nprint(article.summary)\nprint(\"\\nKeywords:\", article.keywords)\n```\n\n## Features\n\n* **Article Extraction** : Automatically extract clean article text from web pages\n* **Metadata Parsing** : Extract titles, authors, publication dates, and images\n* **Natural Language Processing** : Generate summaries and extract keywords\n* **Multi-language Support** : Process articles in various languages\n* **Image Processing** : Extract and analyze article images\n* **Content Analysis** : Advanced text processing and analysis capabilities\n\n## Requirements\n\n* Python 3.6+\n* NLTK (for natural language processing)\n* Additional dependencies installed automatically\n\n## License\n\nThis project is licensed under the MIT License.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Advanced news extraction, article parsing, and content analysis.",
    "version": "0.3.1",
    "project_urls": {
        "Homepage": "https://github.com/salah55s/newspaperV3",
        "Repository": "https://github.com/salah55s/newspaperV3"
    },
    "split_keywords": [
        "newspaper",
        " news",
        " article",
        " extraction",
        " scraping",
        " nlp",
        " content",
        " parsing"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "40e3308a19170d111d8f357d5b464dcfb5048e3b7c3d03603ccedfcd584a2a38",
                "md5": "7d76bbe7405b1e416d20b3d536ac9bc4",
                "sha256": "9cc956032d3cc063536e1150575b87778c870b9d205dab9ae829af318d39bf21"
            },
            "downloads": -1,
            "filename": "newspaperv3-0.3.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7d76bbe7405b1e416d20b3d536ac9bc4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.8",
            "size": 225201,
            "upload_time": "2025-08-31T09:05:49",
            "upload_time_iso_8601": "2025-08-31T09:05:49.653282Z",
            "url": "https://files.pythonhosted.org/packages/40/e3/308a19170d111d8f357d5b464dcfb5048e3b7c3d03603ccedfcd584a2a38/newspaperv3-0.3.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7db030a0009a04923da0cd79540c54337b9bba6bd9109b5e85d341d5057c18e0",
                "md5": "2d8b93393b0edab5917e1baaaff967aa",
                "sha256": "4de1f1f9e67ceb67aff2f36c314f9119492696585c2394323ab12ebd8c16196d"
            },
            "downloads": -1,
            "filename": "newspaperv3-0.3.1.tar.gz",
            "has_sig": false,
            "md5_digest": "2d8b93393b0edab5917e1baaaff967aa",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.8",
            "size": 213900,
            "upload_time": "2025-08-31T09:05:52",
            "upload_time_iso_8601": "2025-08-31T09:05:52.568128Z",
            "url": "https://files.pythonhosted.org/packages/7d/b0/30a0009a04923da0cd79540c54337b9bba6bd9109b5e85d341d5057c18e0/newspaperv3-0.3.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-31 09:05:52",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "salah55s",
    "github_project": "newspaperV3",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "requests",
            "specs": [
                [
                    ">=",
                    "2.31.0"
                ]
            ]
        },
        {
            "name": "python-dotenv",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "responses",
            "specs": []
        }
    ],
    "tox": true,
    "lcname": "newspaperv3"
}
