![Pypi Publish](https://github.com/ejtraderLabs/ejtraderNS/actions/workflows/python-publish.yml/badge.svg)
![GitHub release (latest by date)](https://img.shields.io/github/v/release/ejtraderLabs/ejtraderNS)
[![License](https://img.shields.io/github/license/ejtraderLabs/ejtraderNS)](https://github.com/ejtraderLabs/ejtraderNS/blob/main/LICENSE)
# ejtraderNS
**Programmatically collect normalized news from (almost) any website.**
Filter by **topic**, **country**, or **language**.
## Installation
`pip install ejtraderNS --upgrade`
## Quick Start
```python
from ejtraderNS import Client
```
Get the latest news from the main feed of [nytimes.com](https://www.nytimes.com/)
(_we support thousands of news websites, so try your own!_):
```python
api = Client(website = 'nytimes.com')
results = api.get_news()
# results.keys()
# 'url', 'topic', 'language', 'country', 'articles'
# Get the articles
articles = results['articles']
first_article_summary = articles[0]['summary']
first_article_title = articles[0]['title']
```
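Feed entries do not always carry every field, so it can help to read them defensively. The sketch below assumes only the result shape shown above (a dictionary with an `articles` list of dict-like entries). The helper `headline_pairs` and the sample data are hypothetical and not part of ejtraderNS.

```python
def headline_pairs(results, limit=5):
    """Return up to `limit` (title, summary) pairs from a get_news()-style result."""
    if not results:
        return []
    pairs = []
    for article in results.get('articles', [])[:limit]:
        title = article.get('title', '')
        # Fall back to the title when an entry has no summary
        pairs.append((title, article.get('summary', title)))
    return pairs


# Hypothetical sample shaped like the dictionary returned by get_news()
sample = {
    'url': 'nytimes.com',
    'topic': 'news',
    'language': 'en',
    'country': 'US',
    'articles': [
        {'title': 'First headline', 'summary': 'First summary'},
        {'title': 'Second headline'},
    ],
}
print(headline_pairs(sample))
```

With a live client, `headline_pairs(api.get_news())` would be the equivalent call.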
Get the latest news from the **politics** feed of [nytimes.com](https://www.nytimes.com/):
```python
api = Client(website = 'nytimes.com', topic = 'politics')
results = api.get_news()
articles = results['articles']
```
Some websites support multiple countries, such as [investing.com](https://www.investing.com) or [tradingeconomics.com](https://www.tradingeconomics.com).
The following example queries such a website for one country,
retrieves several topics, and combines the results into a pandas DataFrame.
```python
import pandas as pd
from ejtraderNS import Client
from datetime import datetime

url = 'investing.com'  # or tradingeconomics.com
country = 'GB'
country_topic = ["finance", "news", "economics"]
dfs = []

for topic in country_topic:
    api = Client(website=url, topic=topic, country=country)
    getdata = api.get_news()
    print(f"topic: {topic}")

    # Skip topics for which no feed was returned
    if getdata is None:
        continue

    data = []
    for article in getdata['articles']:
        article_data = {
            'topic': getdata['topic'],
            # .get() avoids a KeyError when an entry lacks a field
            'author': article.get('author'),
            # Prefer the parsed date tuple; fall back to the raw date string
            'date': article.get('published_parsed') or article.get('published'),
            'country': getdata['country'],
            'language': getdata['language'],
            'title': article['title'],
            'summary': article.get('summary', article['title'])
        }
        data.append(article_data)

    df = pd.DataFrame(data)

    # Turn date tuples into datetime objects; values that cannot be parsed become NaT
    df['date'] = pd.to_datetime(df['date'].apply(lambda x: datetime(*x[:6]) if isinstance(x, tuple) else x), utc=True, errors='coerce')
    df.set_index('date', inplace=True)
    dfs.append(df)

# Combine all topics into one frame ordered by publication date
df = pd.concat(dfs)
df.sort_index(inplace=True)
print(df)
```
Example output:
| topic | author | country | language | title | summary |
|:----------|:--------------|:----------|:-----------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|
| finance | Reuters | GB | en | Italy pushes to limit executive pay in listed state-run firms | Italy pushes to limit executive pay in listed state-run firms |
| economics | Reuters | GB | en | UK's Cleverly raises Xinjiang and Taiwan with Chinese vice president | UK's Cleverly raises Xinjiang and Taiwan with Chinese vice president |
| news | Reuters | GB | en | Ukraine hails return of 45 Azov fighters, Russia says 3 pilots released | Ukraine hails return of 45 Azov fighters, Russia says 3 pilots released |
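The date handling in the example above can also be done with the standard library alone. The sketch below is one possible approach, not the package's own method: the helper name `to_utc_datetime` is hypothetical, and it assumes dates arrive either as a parsed time tuple in UTC (as feedparser provides in `published_parsed`) or as an RFC 2822 string.

```python
import calendar
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def to_utc_datetime(value):
    """Convert a feed date (time tuple or RFC 2822 string) to an aware UTC datetime."""
    if isinstance(value, (time.struct_time, tuple)):
        # timegm interprets the tuple as UTC, matching feedparser's parsed dates
        return datetime.fromtimestamp(calendar.timegm(value), tz=timezone.utc)
    if isinstance(value, str):
        try:
            parsed = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            return None  # not a recognizable RFC 2822 date
        if parsed.tzinfo is None:
            # Assumption: dates without an offset are treated as UTC
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    return None


print(to_utc_datetime('Sat, 06 May 2023 18:01:13 +0200'))
print(to_utc_datetime(time.gmtime(0)))
```

Returning `None` for unparseable values mirrors the `errors='coerce'` behavior used in the pandas example, where such values become `NaT`.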
Only a limited set of topics is available:
``` 'tech', 'news', 'business', 'science', 'finance', 'food', 'politics', 'economics', 'travel', 'entertainment', 'music', 'sport', 'world' ```
Extra topics available only for [investing.com](https://www.investing.com):
``` 'crypto', 'forex', 'stock', 'commodities', 'central_bank', 'forex_analysis', 'forex_technical', 'forex_fundamental', 'forex_opinion', 'forex_signal', 'bonds_analysis', 'bonds_technical', 'bonds_fundamental', 'bonds_opinion', 'bonds_strategy', 'bonds_government', 'bonds_corporate', 'stock_analysis', 'stock_technical', 'stock_fundamental', 'stock_opinion', 'stock_picks', 'indices_analysis', 'futures_analysis', 'options_analysis', 'commodities_analysis', 'commodities_technical', 'commodities_Fundamental', 'commodities_opinion', 'commodities_strategy', 'commodities_metals', 'commodities_energy', 'commodities_agriculture', 'overview_analysis', 'overview_technical', 'overview_fundamental', 'overview_opinion', 'overview_investing', 'crypto_opinion'```
However, not all topics are supported by every newspaper.
To check which topics a given newspaper supports:
```python
from ejtraderNS import describe_url
describe = describe_url('nytimes.com')
print(describe['topics'])
```
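Since not every website supports every topic, it may be useful to filter a list of wanted topics before creating clients. The following is a minimal sketch; `split_topics` is a hypothetical helper, and the hard-coded list stands in for the value of `describe_url(...)['topics']`.

```python
def split_topics(requested, supported):
    """Partition requested topics into (available, missing), preserving order."""
    supported_set = set(supported)
    available = [t for t in requested if t in supported_set]
    missing = [t for t in requested if t not in supported_set]
    return available, missing


# Hypothetical list standing in for describe_url('nytimes.com')['topics']
supported_topics = ['news', 'politics', 'world']
available, missing = split_topics(['finance', 'politics', 'news'], supported_topics)
print('query these:', available)
print('skip these:', missing)
```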
### Get the list of all news feeds by topic/language/country
To find the full list of supported news websites,
use the `urls()` function:
```python
from ejtraderNS import urls
# URLs by TOPIC
politic_urls = urls(topic = 'politics')
# URLs by COUNTRY
american_urls = urls(country = 'US')
# URLs by LANGUAGE
english_urls = urls(language = 'en')
# Combine any from topic, country, language
american_english_politics_urls = urls(country = 'US', topic = 'politics', language = 'en')
# note some websites do not explicitly declare their language
# as a result they will be excluded from queries based on language
```
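To see which websites a language filter removes, the two result lists can be compared. This is only a sketch: `dropped_by_language` is a hypothetical helper, the sample lists are made up, and it assumes `urls()` returns a plain list of website strings. A dropped site may either declare a different language or declare none at all.

```python
def dropped_by_language(without_language, with_language):
    """Return sites in the first list that are absent from the second, in original order."""
    kept = set(with_language)
    return [site for site in without_language if site not in kept]


# Hypothetical lists standing in for urls(country='US', topic='politics')
# and urls(country='US', topic='politics', language='en')
us_politics = ['nytimes.com', 'example-a.com', 'example-b.com']
us_politics_en = ['nytimes.com', 'example-b.com']
print(dropped_by_language(us_politics, us_politics_en))
```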
## Documentation
### `Client` Class
```python
from ejtraderNS import Client
Client(website, topic = None)
```
**Use the base form of the website's URL** (without `www.`, without `https://`, and without a `/` at the end).
For example: “nytimes.com”, “news.ycombinator.com” or “theverge.com”.
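If URLs come from user input, they can be reduced to this base form first. The helper below is a hypothetical sketch using only the standard library; it is not part of ejtraderNS, and it assumes the base form is the bare host name, so any path is discarded as well.

```python
from urllib.parse import urlsplit


def to_base_form(url):
    """Strip the scheme, a leading 'www.', and any path from a website URL."""
    url = url.strip()
    # urlsplit only recognizes the host when a scheme (or '//') is present
    if '//' not in url:
        url = '//' + url
    host = urlsplit(url).netloc.lower()
    if host.startswith('www.'):
        host = host[len('www.'):]
    return host


for raw in ['https://www.nytimes.com/', 'news.ycombinator.com', 'theverge.com/']:
    print(raw, '->', to_base_form(raw))
```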
___
`Client.get_news()` - Get the latest news from the website of interest.
Allowed topics:
`tech`, `news`, `business`, `science`, `finance`, `food`,
`politics`, `economics`, `travel`, `entertainment`,
`music`, `sport`, `world`
If no topic is provided, the main feed is returned.
Returns a dictionary of 5 elements:
1. `url` - URL of the website
2. `topic` - topic of the returned feed
3. `language` - language of returned feed
4. `country` - country of returned feed
5. `articles` - articles of the feed. [Feedparser object](https://pythonhosted.org/feedparser/reference.html)
___
`Client.get_headlines()` - Returns only the headlines
___
`Client.print_headlines(n)` - Print top `n` headlines
<br>
<br>
<br>
### `describe_url()` & `urls()`
These functions help you navigate the package.
___
```python
from ejtraderNS import describe_url
```
`describe_url(website)` - Get the main info on the website.
Returns a dictionary of 5 elements:
1. `url` - URL of the website
2. `topics` - list of all supported topics
3. `language` - language of website
4. `country` - country of returned feed
5. `main_topic` - main topic of a website
___
```python
from ejtraderNS import urls
```
`urls(topic = None, language = None, country = None)` - Get a list of all supported
news websites given any combination of `topic`, `language`, `country`
Returns a list of websites that match your combination of `topic`, `language`, `country`
Supported topics:
`tech`, `news`, `business`, `science`, `finance`, `food`,
`politics`, `economics`, `travel`, `entertainment`,
`music`, `sport`, `world`
Supported countries:
`US`, `GB`, `DE`, `FR`, `IN`, `RU`, `ES`, `BR`, `IT`, `CA`, `AU`, `NL`, `PL`, `NZ`, `PT`, `RO`, `UA`, `JP`, `AR`, `IR`, `IE`, `PH`, `IS`, `ZA`, `AT`, `CL`, `HR`, `BG`, `HU`, `KR`, `SZ`, `AE`, `EG`, `VE`, `CO`, `SE`, `CZ`, `ZH`, `MT`, `AZ`, `GR`, `BE`, `LU`, `IL`, `LT`, `NI`, `MY`, `TR`, `BM`, `NO`, `ME`, `SA`, `RS`, `BA`
Supported languages:
`EL`, `IT`, `ZH`, `EN`, `RU`, `CS`, `RO`, `FR`, `JA`, `DE`, `PT`, `ES`, `AR`, `HE`, `UK`, `PL`, `NL`, `TR`, `VI`, `KO`, `TH`, `ID`, `HR`, `DA`, `BG`, `NO`, `SK`, `FA`, `ET`, `SV`, `BN`, `GU`, `MK`, `PA`, `HU`, `SL`, `FI`, `LT`, `MR`, `HI`
## Tech/framework used
The package itself is nothing more than a SQLite database with
RSS feed endpoints for each website and some basic wrapper of
[feedparser](https://pythonhosted.org/feedparser/index.html).
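The idea of a lookup table of RSS endpoints can be illustrated with an in-memory SQLite database. The table name, columns, and rows below are hypothetical and chosen only for illustration; the actual database shipped with the package may be laid out differently.

```python
import sqlite3

# Hypothetical schema and rows; the real database layout may differ
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE feeds (website TEXT, topic TEXT, language TEXT, country TEXT, rss_url TEXT)'
)
conn.executemany(
    'INSERT INTO feeds VALUES (?, ?, ?, ?, ?)',
    [
        ('example.com', 'news', 'en', 'US', 'https://example.com/rss/news.xml'),
        ('example.com', 'politics', 'en', 'US', 'https://example.com/rss/politics.xml'),
    ],
)


def find_feed(conn, website, topic):
    """Return the RSS endpoint for a website/topic pair, or None if there is no match."""
    row = conn.execute(
        'SELECT rss_url FROM feeds WHERE website = ? AND topic = ?',
        (website, topic),
    ).fetchone()
    return row[0] if row else None


print(find_feed(conn, 'example.com', 'politics'))
```

In a wrapper of this kind, the URL found this way would then be passed to `feedparser.parse()` to fetch and parse the articles.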
## Acknowledgements
I would like to express my gratitude to [@kotartemiy](https://github.com/kotartemiy) for creating the initial project. Their work has been an invaluable starting point for my modifications and improvements.
Raw data
{
"_id": null,
"home_page": "https://ejtraderNS.readthedocs.io/",
"name": "ejtraderNS",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3",
"maintainer_email": "",
"keywords": "news,breaknews,historical-data,financial-data,stocks,funds,etfs,indices,currency crosses,bonds,commodities,crypto currencies",
"author": "Emerson Pedroso",
"author_email": "support@ejtrader.com",
"download_url": "https://files.pythonhosted.org/packages/f6/81/d930b550b98af64484f260be91a486d450bc16fdf416b0a58402ce28bd4f/ejtraderNS-0.0.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "This is a News Library.",
"version": "0.0.3",
"project_urls": {
"Bug Reports": "https://github.com/ejtraderLabs/ejtraderNS/issues",
"Documentation": "https://ejtraderNS.readthedocs.io/",
"Download": "https://github.com/ejtraderlabs/ejtraderNS",
"Homepage": "https://ejtraderNS.readthedocs.io/",
"Source": "https://github.com/ejtraderLabs/ejtraderNS"
},
"split_keywords": [
"news",
"breaknews",
"historical-data",
"financial-data",
"stocks",
"funds",
"etfs",
"indices",
"currency crosses",
"bonds",
"commodities",
"crypto currencies"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "566616fdea2cfc0f1c19cda25d3114074f938eb6cdfbefcbc6fbdb8a96f7b77d",
"md5": "324720e08ef65b6fb514de43bb590659",
"sha256": "cae62259ec81180d8bf57e89f477a7c71ae5b9aa0bb90f0a76ed95197e84f6f0"
},
"downloads": -1,
"filename": "ejtraderNS-0.0.3-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "324720e08ef65b6fb514de43bb590659",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3",
"size": 163238,
"upload_time": "2023-05-06T16:01:13",
"upload_time_iso_8601": "2023-05-06T16:01:13.390899Z",
"url": "https://files.pythonhosted.org/packages/56/66/16fdea2cfc0f1c19cda25d3114074f938eb6cdfbefcbc6fbdb8a96f7b77d/ejtraderNS-0.0.3-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f681d930b550b98af64484f260be91a486d450bc16fdf416b0a58402ce28bd4f",
"md5": "b7dc8bd8d28f491b7404f36461872e5b",
"sha256": "ce0469b5e7e9b7c44d439905223c2f633db16e07555b2b54b88c40ab02007ace"
},
"downloads": -1,
"filename": "ejtraderNS-0.0.3.tar.gz",
"has_sig": false,
"md5_digest": "b7dc8bd8d28f491b7404f36461872e5b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3",
"size": 166397,
"upload_time": "2023-05-06T16:01:15",
"upload_time_iso_8601": "2023-05-06T16:01:15.225735Z",
"url": "https://files.pythonhosted.org/packages/f6/81/d930b550b98af64484f260be91a486d450bc16fdf416b0a58402ce28bd4f/ejtraderNS-0.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-05-06 16:01:15",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ejtraderLabs",
"github_project": "ejtraderNS",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "ejtraderns"
}