py-newscollector


Name: py-newscollector
Version: 1.0.0
Home page: https://github.com/elisemercury/NewsCollector
Summary: NewsCollector - Python script for automated collection of most relevant news articles of the day.
Upload time: 2023-02-10 23:47:45
Author: Elise Landman
License: MIT
Keywords: news, collection, automation, newsletter
Requirements: No requirements were recorded.

# Python NewsCollector

As the internet has grown, so have the **sources of information at our disposal**. If you want to catch up on the most important news of the day, you have a **vast variety of news sources** to choose from. With that many sources available, instead of manually going through all their content...

**Couldn't we let automation pick the top news stories from various newspapers for us, and nicely combine them into a newsletter?**

<p align="center"><b>This is what the Python NewsCollector can do for us!</b></p>

For a **detailed usage guide**, please refer to the official [NewsCollector Usage Documentation](https://github.com/elisemercury/News-Collector/wiki/NewsCollector-Usage-Documentation).

Read more about how the NewsCollector algorithm works in [my Medium article](https://medium.com/@eliselandman/automated-news-article-collection-with-python-9267968c9ea).

-------

## Description

The Python NewsCollector lets you define a set of news sources, from which it picks the **most relevant articles** and bundles them into an **HTML-based newsletter**.

<p align="center">
View the full sample newsletter in PDF format <a href=https://github.com/elisemercury/NewsCollector/blob/main/sample_newsletter.pdf>here.</a>
</p>

The NewsCollector algorithm **scrapes** the source links provided and compares the articles it finds based on their **similarity**. If multiple articles from different sources cover similar topics, they are considered **relevant articles** and are included in the output newsletter.
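
To make the idea of similarity-based relevance concrete, here is a minimal sketch. It is not the package's actual implementation: the Jaccard word-overlap measure, the `0.5` threshold, and the names `tokens`, `jaccard`, and `relevant_articles` are all assumptions chosen for illustration. It marks a headline as relevant when a headline from a *different* source overlaps with it strongly enough.

```Python
import re
from itertools import combinations


def tokens(title: str) -> set[str]:
    """Lowercase word set of a headline, ignoring words of one or two characters."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if len(w) > 2}


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words the two sets have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def relevant_articles(articles, threshold=0.5):
    """Return headlines that have a similar counterpart from another source.

    `articles` is a list of (source, title) pairs; the threshold is an
    illustrative assumption, not a value taken from NewsCollector.
    """
    relevant = set()
    for (src_a, title_a), (src_b, title_b) in combinations(articles, 2):
        if src_a == src_b:
            continue  # only cross-source matches count as relevant
        if jaccard(tokens(title_a), tokens(title_b)) >= threshold:
            relevant.update({title_a, title_b})
    return sorted(relevant)


articles = [
    ("source_a", "Central bank raises interest rates again"),
    ("source_b", "Central bank raises interest rates to curb inflation"),
    ("source_a", "Local team wins championship"),
]
print(relevant_articles(articles))
```

In this sample, only the two central bank headlines are reported, because they come from different sources and share most of their words.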

## Basic Usage

You can run the NewsCollector algorithm as follows:

```Python
from newscollector import NewsCollector

# Run the full pipeline with the default settings
newsletter = NewsCollector()
output = newsletter.create()
```

This runs the full NewsCollector pipeline: it scrapes the sources listed in the `sources.json` file and outputs an HTML newsletter.

The `output` variable holds the path of the generated newsletter, so that you can easily retrieve it programmatically:

```Python
output
> 'C:\\Output\\Path\\newsletter.html'
```
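
Since the returned value is a plain path string, it can be passed on to other tools. The following is a small sketch of one way to do that, using only the standard library; the helper name `newsletter_uri` is hypothetical and not part of NewsCollector. It validates the path and converts it into a `file://` URI that a browser can open.

```Python
import webbrowser
from pathlib import Path


def newsletter_uri(output: str) -> str:
    """Turn the path returned by create() into a file:// URI.

    Hypothetical helper for illustration; raises if the file is missing.
    """
    path = Path(output).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No newsletter found at {path}")
    return path.as_uri()


# Example (requires a generated newsletter):
# webbrowser.open(newsletter_uri(output))
```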

## CLI Usage

The NewsCollector can also be run directly via the CLI with the following parameters:

```text
newscollector.py [-h] [-s [SOURCES]] [-n [NEWS_NAME]] [-d [NEWS_DATE]]
                 [-t [TEMPLATE]] [-o [OUTPUT_FILENAME]] [-a [AUTO_OPEN]]
                 [-r [RETURN_DETAILS]]
```
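
The bracketed form `[-s [SOURCES]]` is how Python's `argparse` renders an option whose value is itself optional (`nargs="?"`). Below is a minimal sketch of a parser that accepts the same flags, assuming the CLI is built on `argparse`. The `dest` names and default values are borrowed from the constructor call shown under Additional Parameters. The `str_to_bool` helper and the treatment of a bare `-a` or `-r` as `True` are illustrative assumptions, not the package's actual code. The expected format of `NEWS_DATE` is not documented here, so the sketch leaves it as a plain string.

```Python
import argparse


def str_to_bool(value: str) -> bool:
    """Interpret common truthy strings; anything else counts as False (assumption)."""
    return value.strip().lower() in {"1", "true", "yes", "y"}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of a parser matching the usage string above."""
    parser = argparse.ArgumentParser(prog="newscollector.py")
    parser.add_argument("-s", dest="sources", nargs="?", default="sources.json")
    parser.add_argument("-n", dest="news_name", nargs="?", default="Daily News Update")
    parser.add_argument("-d", dest="news_date", nargs="?", default=None)
    parser.add_argument("-t", dest="template", nargs="?", default="newsletter.html")
    parser.add_argument("-o", dest="output_filename", nargs="?", default="default")
    # A bare "-a" or "-r" (no value) falls back to const=True in this sketch.
    parser.add_argument("-a", dest="auto_open", nargs="?", type=str_to_bool,
                        default=False, const=True)
    parser.add_argument("-r", dest="return_details", nargs="?", type=str_to_bool,
                        default=False, const=True)
    return parser


args = build_parser().parse_args(["-n", "Morning Briefing", "-a", "true"])
print(args.news_name, args.auto_open, args.sources)
```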

## Output

The NewsCollector outputs an **HTML newsletter** with the most **relevant articles** it found while scraping the sources provided.

By default, the output newsletter is **created as an HTML file** in the installation directory of the package, in the folder `rendered`, under the filename `newsletter_YYYY-MM-DD.html`, where the date is the day for which the NewsCollector scraped its articles.
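
The naming scheme described above can be sketched as follows. This only illustrates the convention and is not the package's code: the function name `default_output_path` and the `package_dir` argument are hypothetical.

```Python
from datetime import date
from pathlib import Path


def default_output_path(package_dir: Path, news_date: date) -> Path:
    """Build <package_dir>/rendered/newsletter_YYYY-MM-DD.html (illustrative)."""
    # date.isoformat() yields the YYYY-MM-DD form used in the filename
    filename = f"newsletter_{news_date.isoformat()}.html"
    return package_dir / "rendered" / filename


path = default_output_path(Path("newscollector"), date(2023, 2, 10))
print(path.name)  # newsletter_2023-02-10.html
```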


To adjust the default settings, please refer to [Additional Parameters](https://github.com/elisemercury/NewsCollector#additional-parameters).

## Additional Parameters

You can customize the NewsCollector algorithm with the following optional parameters:

```Python
from datetime import date
from newscollector import NewsCollector

newsletter = NewsCollector(sources="sources.json", news_name="Daily News Update",
                           news_date=date.today(), template="newsletter.html",
                           output_filename="default", auto_open=False,
                           return_details=False)
```

            
