# Python NewsCollector
As the internet has grown, so has the number of **sources of information at our disposal**. Nowadays, if you want to catch up on the most important news of the day, you have a **vast variety of news sources** to choose from. With that many news sources available, instead of manually going through all their content...
**Couldn't we let automation pick the top news stories from various newspapers for us, and nicely combine them into a newsletter?**
<p align="center"><b>This is what the Python NewsCollector can do for us!</b></p>
For a **detailed usage guide**, please refer to the official [NewsCollector Usage Documentation](https://github.com/elisemercury/News-Collector/wiki/NewsCollector-Usage-Documentation).
Read more about how the algorithm of NewsCollector works in [my Medium article](https://medium.com/@eliselandman/automated-news-article-collection-with-python-9267968c9ea).
-------
## Description
The Python NewsCollector lets you define a variety of news sources from which it will pick the **most relevant articles** and bundle these in a **nice HTML-based newsletter**.
<p align="center">
View the full sample newsletter in PDF format <a href=https://github.com/elisemercury/NewsCollector/blob/main/sample_newsletter.pdf>here.</a>
</p>
The NewsCollector algorithm **scrapes** the source links provided and compares the articles it finds based on their **similarity**. If multiple articles from different sources cover similar topics, they are considered **relevant articles** and are included in the output newsletter.
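To make the idea of similarity-based relevance concrete, here is a minimal sketch. It is not the NewsCollector implementation: the Jaccard word-overlap measure, the `0.4` threshold, and the names `tokenize`, `jaccard`, and `find_relevant` are all assumptions chosen for illustration, and the sample articles are made up.

```python
import re
from itertools import combinations


def tokenize(title):
    """Lowercase a title and return its set of words longer than three characters."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if len(w) > 3}


def jaccard(a, b):
    """Jaccard similarity of two token sets: shared words divided by all words."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_relevant(articles, threshold=0.4):
    """Return titles that have a similar article from a different source.

    `articles` is a list of (source, title) pairs. The threshold is an
    illustrative assumption, not a value taken from NewsCollector.
    """
    relevant = set()
    for (src_a, title_a), (src_b, title_b) in combinations(articles, 2):
        if src_a == src_b:
            continue  # only matches across different sources count
        if jaccard(tokenize(title_a), tokenize(title_b)) >= threshold:
            relevant.update({title_a, title_b})
    return relevant


# Made-up sample data: two sources report the same story, one does not
articles = [
    ("source_a", "Central bank raises interest rates again"),
    ("source_b", "Interest rates raised again by central bank"),
    ("source_c", "Local team wins championship final"),
]
print(sorted(find_relevant(articles)))
```

In this sketch, the two interest-rate headlines share most of their words and are flagged as relevant, while the unrelated sports headline is left out.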
## Basic Usage
You can run the NewsCollector algorithm as follows:
```Python
from newscollector import NewsCollector

# Create a collector with the default settings and run the full pipeline
newsletter = NewsCollector()
output = newsletter.create()
```
This will run the full NewsCollector pipeline by scraping the sources from the `sources.json` file and outputting an HTML newsletter.
The `output` object holds the file path of the generated newsletter, so that you can easily retrieve it programmatically:
```Python
>>> output
'C:\\Output\\Path\\newsletter.html'
```
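As one possible use of the returned path, the sketch below reads the generated HTML file and wraps it in an email message using only the standard library. The function name `build_newsletter_email` and the example addresses are hypothetical, and the message is only built, not sent.

```python
from email.message import EmailMessage
from pathlib import Path


def build_newsletter_email(newsletter_path, sender, recipient,
                           subject="Daily News Update"):
    """Wrap a generated HTML newsletter file in an email message (not sent)."""
    html = Path(newsletter_path).read_text(encoding="utf-8")

    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    # Plain-text fallback first, then the HTML newsletter as an alternative
    message.set_content("This newsletter is best viewed in an HTML-capable mail client.")
    message.add_alternative(html, subtype="html")
    return message
```

With the path from the example above, this could be called as `build_newsletter_email(output, "me@example.com", "you@example.com")` and the result handed to `smtplib` for delivery.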
## CLI Usage
The NewsCollector can also be run directly via the CLI with the following parameters:
```text
newscollector.py [-h] [-s [SOURCES]] [-n [NEWS_NAME]] [-d [NEWS_DATE]]
                 [-t [TEMPLATE]] [-o [OUTPUT_FILENAME]] [-a [AUTO_OPEN]]
                 [-r [RETURN_DETAILS]]
```
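The bracketed notation above, such as `[-s [SOURCES]]`, is what `argparse` prints for an optional flag whose value is itself optional (`nargs="?"`). The sketch below shows a parser that would accept the same short flags. It is an assumption about how such a CLI could be declared, not the actual code of `newscollector.py`: the `dest` names are inferred from the usage line, the defaults are copied from the constructor call under Additional Parameters, and all values are kept as raw strings because the real script's type handling is not documented here.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented short flags."""
    parser = argparse.ArgumentParser(prog="newscollector.py")
    # nargs="?" makes the value after each flag optional
    parser.add_argument("-s", dest="sources", nargs="?", default="sources.json")
    parser.add_argument("-n", dest="news_name", nargs="?", default="Daily News Update")
    parser.add_argument("-d", dest="news_date", nargs="?", default=None)
    parser.add_argument("-t", dest="template", nargs="?", default="newsletter.html")
    parser.add_argument("-o", dest="output_filename", nargs="?", default="default")
    parser.add_argument("-a", dest="auto_open", nargs="?", default=False)
    parser.add_argument("-r", dest="return_details", nargs="?", default=False)
    return parser


# Parse an example command line instead of sys.argv
args = build_parser().parse_args(["-s", "my_sources.json", "-n", "Morning Briefing"])
print(args.sources, "|", args.news_name, "|", args.template)
```

Flags that are not passed fall back to their defaults, so only the settings you want to change need to appear on the command line.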
## Output
The NewsCollector will output an **HTML newsletter** with the most **relevant articles** it found while scraping the sources provided.
By default, the output newsletter is **created as an HTML file** in the installation directory of the package, in the folder `rendered`, under the filename `newsletter_YYYY-MM-DD.html`, where the date is the day whose articles the NewsCollector scraped.
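The default naming scheme described above can be reproduced with a few lines of Python, for example to locate a newsletter for a given day. This is a sketch of the documented pattern only: the helper name `default_newsletter_path` and the `base_dir` argument are assumptions, and the actual installation directory depends on your environment.

```python
from datetime import date
from pathlib import Path


def default_newsletter_path(base_dir, news_date):
    """Build <base_dir>/rendered/newsletter_YYYY-MM-DD.html for the given date."""
    # date.isoformat() yields the YYYY-MM-DD form used in the filename
    return Path(base_dir) / "rendered" / f"newsletter_{news_date.isoformat()}.html"


path = default_newsletter_path("newscollector", date(2023, 2, 10))
print(path.name)  # newsletter_2023-02-10.html
```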
To adjust the default settings, please refer to [Additional Parameters](https://github.com/elisemercury/NewsCollector#additional-parameters).
## Additional Parameters
You can customize the NewsCollector algorithm with the following optional parameters:
```Python
from datetime import date
from newscollector import NewsCollector

# All parameters are optional
newsletter = NewsCollector(sources="sources.json", news_name="Daily News Update",
                           news_date=date.today(), template='newsletter.html',
                           output_filename='default', auto_open=False,
                           return_details=False)
```
Raw data
{
"_id": null,
"home_page": "https://github.com/elisemercury/NewsCollector",
"name": "py-newscollector",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "news,collection,automation,newsletter",
"author": "Elise Landman",
"author_email": "elisejlandman@hotmail.com",
"download_url": "https://files.pythonhosted.org/packages/31/39/0d38f7709ca2177799c729106fa922eecaa23c69ae777569d5404615042c/py-newscollector-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "NewsCollector - Python script for automated collection of most relevant news articles of the day.",
"version": "1.0.0",
"split_keywords": [
"news",
"collection",
"automation",
"newsletter"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "31390d38f7709ca2177799c729106fa922eecaa23c69ae777569d5404615042c",
"md5": "51f116db1b845678bc37e001d45a290f",
"sha256": "7db0891ba4457623dc5bb283d746379d0c83301681d242bef87ecf594fdb953a"
},
"downloads": -1,
"filename": "py-newscollector-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "51f116db1b845678bc37e001d45a290f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 111889,
"upload_time": "2023-02-10T23:47:45",
"upload_time_iso_8601": "2023-02-10T23:47:45.187115Z",
"url": "https://files.pythonhosted.org/packages/31/39/0d38f7709ca2177799c729106fa922eecaa23c69ae777569d5404615042c/py-newscollector-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-02-10 23:47:45",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "elisemercury",
"github_project": "NewsCollector",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "py-newscollector"
}