# pygooglenews
If Google News had a Python library
Created by Artem from [newscatcherapi.com](https://newscatcherapi.com/), but you do not need anything from us or from anyone else to get the library going; it works out of the box.
My [blog post about how I did it](https://blog.newscatcherapi.com/google-news-rss/)
## Demo
![](pygooglenews-demo.gif)
You might also like to check our [Google News API](https://rapidapi.com/newscatcher-api-newscatcher-api-default/api/google-news?endpoint=apiendpoint_5e0fa919-2494-4b20-a212-d186b7e8c3d8) or [Financial Google News API](https://rapidapi.com/newscatcher-api-newscatcher-api-default/api/stock-google-news?endpoint=apiendpoint_c70050d2-34ac-4eac-a1b2-09aefb19d1a5)
### Table of Contents
- [About](#about)
- [Examples of Use Cases](#usecase)
- [Working with Google News in Production](#production)
- [Motivation](#motivation)
- [Installation](#installation)
- [Quickstart](#quickstart)
- [Documentation](#documentation)
- [Advanced Query Search Examples](#examples)
- [About me](#aboutme)
- [Change Log](#changelog)
<a name="about"/>
## **About**
A Python wrapper of the Google News RSS feed.
Top stories, topic-related news feeds, geolocation news feeds, and an extensive full-text search feed.
This work is more of a collection of all things I could find out about how Google News functions.
### **How is it different from other Pythonic Google News libraries?**
1. URL-escaping user input helper for the search function
2. Extensive support for the search function that makes it simple to use:
- exact match
 - in-title match, in-URL match, etc.
- search by date range (`from_` & `to_`), latest published (`when`)
3. Parsing of the sub articles. Almost always, every feed except the search one contains a set of similar news items for each article in the feed. This package takes care of extracting those sub articles. This can be highly useful for ML tasks where you need to collect data on similar article headlines
<a name="usecase"/>
## Examples of Use Cases
1. Integrating a news feed into your platform/application/website
2. Collecting data by topic to train your own ML model
3. Searching for the latest mentions of your new product
4. Media monitoring of people/organizations for PR
<a name="production"/>
## Working with Google News in Production
Before we start: if you want to integrate Google News data into your production system, I would advise you to use one of the 3 methods described below. Why? Because you do not want your server's IP address to be blocked by Google. Every time you call any function, an HTTPS request is made to Google's servers. **Don't get me wrong, this Python package still works out of the box.**
1. NewsCatcher's [Google News API](https://rapidapi.com/newscatcher-api-newscatcher-api-default/api/google-news?endpoint=apiendpoint_5e0fa919-2494-4b20-a212-d186b7e8c3d8) — all code is written for you, clean & structured JSON output. Low price. You can test it yourself with no credit card. Plus, [financial version of API](https://rapidapi.com/newscatcher-api-newscatcher-api-default/api/stock-google-news?endpoint=apiendpoint_c70050d2-34ac-4eac-a1b2-09aefb19d1a5) is also available.
2. [ScrapingBee API](https://www.scrapingbee.com?fpr=artem26), which handles proxy rotation for you. Each function in this package has a `scraping_bee` parameter where you pass your API key. You can also try it for free, no credit card required. See [example](#scrapingbeeexample)
3. Your own proxies. Already have a pool of proxies? Each function in this package has a `proxies` parameter (a Python dictionary) where you pass them in.
<a name="motivaion"/>
## **Motivation**
I love working with news data. I love it so much that I created [my own company that crawls hundreds of thousands of news articles and lets you search them via a news API](https://newscatcherapi.com/). But this time, I want to share with the community a Python package that makes it simple to get news data from the best search engine ever created: [Google](https://news.google.com/).
Most likely, you already know that Google has its own [news service](https://news.google.com/). It is different from the usual Google search that we use on a daily basis (sorry [DuckDuckGo](https://duckduckgo.com/), maybe next time).
**This package uses the RSS feeds of Google News, for example the [top stories feed](https://news.google.com/rss).**
RSS is an XML format that is already well structured. I rely heavily on the Feedparser package to parse the RSS feeds.
Google News used to have an API, but it was deprecated many years ago. (Unofficial) information about the RSS syntax is scattered across the web, and there is no official documentation. So, I tried my best to collect all this information in one place.
<a name="installation"/>
## **Installation**
```shell script
$ pip install pygooglenews --upgrade
```
<a name="quickstart"/>
## **Quickstart**
```python
from pygooglenews import GoogleNews
gn = GoogleNews()
```
### **Top Stories**
```python
top = gn.top_news()
```
### **Stories by Topic**
```python
business = gn.topic_headlines('business')
```
### **Geolocation Specific Stories**
```python
headquarters = gn.geo_headlines('San Fran')
```
### **Stories by a Query Search**
```python
# search for the best matching articles that mention MSFT and
# do not mention AAPL (over the past 6 months)
search = gn.search('MSFT -AAPL', when = '6m')
```
---
<a name="documentation"/>
## **Documentation - Functions & Classes**
### **GoogleNews Class**
```python
from pygooglenews import GoogleNews
# default GoogleNews instance
gn = GoogleNews(lang = 'en', country = 'US')
```
To get access to all the functions, you first have to instantiate the `GoogleNews` class.
It has 2 parameters: `lang` and `country`
You can try any combination of the two; however, not every combination exists. Only the combinations supported by Google News will work. Check the official Google News page to see what is covered:
At the bottom left of the Google News page you will find a `Language & region` section that lists all of the supported combinations.
For example, for `country=UA` (Ukraine), there are 2 languages supported:
- `lang=uk` Ukrainian
- `lang=ru` Russian
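For example, a client for the Ukrainian-language edition listed above would be created like this:
```python
from pygooglenews import GoogleNews

# Ukrainian-language Google News for Ukraine;
# swap lang='ru' for the Russian-language edition
gn_ua = GoogleNews(lang='uk', country='UA')
```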
---
### **Top Stories**
```python
top = gn.top_news(proxies=None, scraping_bee = None)
```
`top_news()` returns the top stories for the country and language that are defined in the `GoogleNews` class. The returned object contains `feed` (a FeedParserDict) and `entries`, a list of the articles found with all data parsed.
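For instance, you can iterate over the parsed entries right away; `title` and `link` below are standard Feedparser entry fields:
```python
top = gn.top_news()

print(top['feed'].title)          # name of the feed itself
for entry in top['entries']:
    print(entry['title'], entry['link'])
```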
---
### **Stories by Topic**
```python
business = gn.topic_headlines('BUSINESS', proxies=None, scraping_bee = None)
```
The returned object contains `feed` (FeedParserDict) and `entries` list of articles found with all data parsed.
Accepted topics are:
- `WORLD`
- `NATION`
- `BUSINESS`
- `TECHNOLOGY`
- `ENTERTAINMENT`
- `SCIENCE`
- `SPORTS`
- `HEALTH`
However, you can find some other topics that are also supported by Google News.
For example, if you search for `corona` in the search tab of `en` + `US` you will find `COVID-19` as a topic.
The URL looks like this: `https://news.google.com/topics/CAAqIggKIhxDQkFTRHdvSkwyMHZNREZqY0hsNUVnSmxiaWdBUAE?hl=en-US&gl=US&ceid=US%3Aen`
Copy the text after `topics/` and before `?`; you can then use it as input for the `topic_headlines()` function.
```python
from pygooglenews import GoogleNews
gn = GoogleNews()
covid = gn.topic_headlines('CAAqIggKIhxDQkFTRHdvSkwyMHZNREZqY0hsNUVnSmxiaWdBUAE')
```
**However, be aware that this topic will be unique for each language/country combination.**
---
### **Stories by Geolocation**
```python
gn = GoogleNews('uk', 'UA')
kyiv = gn.geo_headlines('kyiv', proxies=None, scraping_bee = None)
# or
kyiv = gn.geo_headlines('kiev', proxies=None, scraping_bee = None)
# or
kyiv = gn.geo_headlines('киев', proxies=None, scraping_bee = None)
# or
kyiv = gn.geo_headlines('Київ', proxies=None, scraping_bee = None)
```
The returned object contains `feed` (FeedParserDict) and `entries` list of articles found with all data parsed.
All of the above variations will return the same feed of the latest news about Kyiv, Ukraine:
```python
kyiv['feed'].title
# 'Київ - Останні - Google Новини'
```
It is language-agnostic; however, there is no guarantee that a feed exists for any specific place. For example, if you want to find the feed for `LA` or `Los Angeles`, you can do it with `GoogleNews('en', 'US')`.
The main (`en`, `US`) Google News client will most likely find a feed for most places.
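A minimal sketch of that Los Angeles lookup:
```python
from pygooglenews import GoogleNews

gn = GoogleNews('en', 'US')
la = gn.geo_headlines('Los Angeles')
print(la['feed'].title)
```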
---
### **Stories by a Query**
```python
gn.search(query: str, helper = True, when = None, from_ = None, to_ = None, proxies=None, scraping_bee=None)
```
The returned object contains `feed` (FeedParserDict) and `entries` list of articles found with all data parsed.
Google News search itself is a complex function that has inherited some features from the standard Google Search.
[The official reference on what could be inserted](https://developers.google.com/custom-search/docs/xml_results)
The biggest obstacle you might have is writing the URL-escaped input. To ease this process, `helper = True` is turned on by default.
`helper` uses `urllib.parse.quote_plus` to automatically convert the input.
For example:
- `'New York metro opening'` --> `'New+York+metro+opening'`
- `'AAPL -MSFT'` --> `'AAPL+-MSFT'`
- `'"Tokyo Olimpics date changes"'` --> `'%22Tokyo+Olimpics+date+changes%22'`
You can turn it off and write your own raw query by passing `helper = False`
`when` parameter (`str`) sets the time range for the published datetime. I could not find any documentation regarding this option, but here is what I deduced:
- `h` for hours (for me, it worked for up to `101h`). `when=12h` will return only the articles that match the search criteria and were published within the last 12 hours
- `d` for days
- `m` for months (for me, it worked for up to `48m`)
I did not set any hard limit here. You may try putting anything here, and it will probably work. However, be warned that a wrong input will not lead to an error; instead, the `when` parameter will simply be ignored by Google.
`from_` and `to_` accept dates in the `%Y-%m-%d` format, for example `2020-07-01`
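A short sketch of a date-bounded search using these parameters:
```python
from pygooglenews import GoogleNews

gn = GoogleNews()

# articles mentioning boeing published during the first week of July 2020
s = gn.search('boeing', from_='2020-07-01', to_='2020-07-07')
print(s['feed'].title)
```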
---
**[Google's Special Query Terms](https://developers.google.com/custom-search/docs/xml_results#special-query-terms) Cheat Sheet**
Many of Google's Special Query Terms have been tested one by one. Most of the core ones have been inherited by the Google News service. At first, I wanted to integrate all of them as `search()` function parameters, but I realised that it might be confusing and difficult to make them all work correctly.
Instead, **I decided to write some kind of a cheat sheet that should give you a decent understanding of what you could do**.
* Boolean OR Search [ OR ]
```python
from pygooglenews import GoogleNews
gn = GoogleNews()
s = gn.search('boeing OR airbus')
print(s['feed'].title)
# "boeing OR airbus" - Google News
```
* Exclude Query Term [-]
"The exclude (`-`) query term restricts results for a particular search request to documents that do not contain a particular word or phrase. To use the exclude query term, you would preface the word or phrase to be excluded from the matching documents with "-" (a minus sign)."
* Include Query Term [+]
"The include (`+`) query term specifies that a word or phrase must occur in all documents included in the search results. To use the include query term, you would preface the word or phrase that must be included in all search results with "+" (a plus sign).
The URL-escaped version of **`+`** (a plus sign) is `%2B`."
* Phrase Search
"The phrase search (`"`) query term allows you to search for complete phrases by enclosing the phrases in quotation marks or by connecting them with hyphens.
The URL-escaped version of **`"`** (a quotation mark) is **`%22`**.
Phrase searches are particularly useful if you are searching for famous quotes or proper names."
* allintext
"The **`allintext:`** query term requires each document in the search results to contain all of the words in the search query in the body of the document. The query should be formatted as **`allintext:`** followed by the words in your search query.
If your search query includes the **`allintext:`** query term, Google will only check the body text of documents for the words in your search query, ignoring links in those documents, document titles and document URLs."
* intitle
"The `intitle:` query term restricts search results to documents that contain a particular word in the document title. The search query should be formatted as `intitle:WORD` with no space between the intitle: query term and the following word."
* allintitle
"The **`allintitle:`** query term restricts search results to documents that contain all of the query words in the document title. To use the **`allintitle:`** query term, include "allintitle:" at the start of your search query.
Note: Putting **`allintitle:`** at the beginning of a search query is equivalent to putting [intitle:](https://developers.google.com/custom-search/docs/xml_results#TitleSearchqt) in front of each word in the search query."
* inurl
"The `inurl:` query term restricts search results to documents that contain a particular word in the document URL. The search query should be formatted as `inurl:WORD` with no space between the inurl: query term and the following word"
* allinurl
The `allinurl:` query term restricts search results to documents that contain all of the query words in the document URL. To use the `allinurl:` query term, include allinurl: at the start of your search query.
### List of operators that do not work (for me, at least):
1. Most (probably all) of the `as_*` terms do not work for Google News
2. `allinlinks:`
3. `related:`
**Tip**. If you want to build a near real-time feed for a specific topic, use `when='1h'`. If Google captured fewer than 100 articles over the past hour, you should be able to retrieve all of them.
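A rough sketch of such a polling loop; the link-based deduplication and the 15-minute interval are my own additions, not something the package provides:
```python
import time
from pygooglenews import GoogleNews

gn = GoogleNews()
seen_links = set()

while True:
    feed = gn.search('intitle:boeing', when='1h')
    for entry in feed['entries']:
        if entry['link'] not in seen_links:   # skip articles seen in a previous poll
            seen_links.add(entry['link'])
            print(entry['published'], entry['title'])
    time.sleep(15 * 60)  # poll every 15 minutes; mind Google's rate limits or use a proxy
```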
Check the [Useful Links](#useful-links) section if you want to dig into how Google Search works.
Especially, the [Special Query Terms](https://developers.google.com/custom-search/docs/xml_results#special-query-terms) section of the Google XML reference.
Plus, some more examples are provided under the [Advanced Query Search Examples](#examples) section.
---
### **Output Body**
All 4 functions return a `dictionary` that has 2 sub-objects:
- `feed` - contains the information on the feed metadata
- `entries` - contains the parsed articles
Both are inherited from [Feedparser](https://github.com/kurtmckee/feedparser). The only change is that each dictionary under `entries` also contains `sub_articles`, the similar articles found in the description. It is usually non-empty for `top_news()` and `topic_headlines()` feeds.
**Tip** To check the name of the feed that was found, just check the `title` under the `feed` dictionary.
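For example, to collect each headline together with its similar headlines (the ML use case mentioned in the About section), something like the sketch below should work. I am assuming each sub-article dict exposes a `title` key; inspect one entry's `sub_articles` to confirm the exact field names:
```python
top = gn.top_news()

for entry in top['entries']:
    headlines = [entry['title']]
    # assumed structure: each sub-article is a small dict with a 'title' key
    for sub in entry.get('sub_articles', []):
        headlines.append(sub['title'])
    print(headlines)
```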
---
<a name="scrapingbeeexample"/>
## How to use pygooglenews with [ScrapingBee](https://www.scrapingbee.com?fpr=artem26)
Every function has a `scraping_bee` parameter. It accepts your [ScrapingBee](https://www.scrapingbee.com?fpr=artem26) API key, which will be used to get the response from Google's servers.
You can take a look at what exactly is happening in the source code: check the `__scaping_bee_request()` function under the `GoogleNews` class.
Pay attention to the concurrency limits of each [ScrapingBee](https://www.scrapingbee.com?fpr=artem26) plan.
Usage example:
```python
gn = GoogleNews()
# it's a fake API key, do not try to use it
gn.top_news(scraping_bee = 'I5SYNPRFZI41WHVQWWUT0GNXFMO104343E7CXFIISR01E2V8ETSMXMJFK1XNKM7FDEEPUPRM0FYAHFF5')
```
---
## How to use pygooglenews with proxies
If you have your own HTTP/HTTPS proxies that you want to use for requests to Google, this is how you do it:
```python
gn = GoogleNews()
gn.top_news(proxies = {'https':'34.91.135.38:80'})
```
---
<a name="examples"/>
## **Advanced Query Search Examples**
### **Example 1. Search for articles that mention `boeing` and do not mention `airbus`**
```python
from pygooglenews import GoogleNews
gn = GoogleNews()
s = gn.search('boeing -airbus')
print(s['feed'].title)
# "boeing -airbus" - Google News
```
### **Example 2. Search for articles that mention `boeing` in title**
```python
from pygooglenews import GoogleNews
gn = GoogleNews()
s = gn.search('intitle:boeing')
print(s['feed'].title)
# "intitle:boeing" - Google News
```
### **Example 3. Search for articles that mention `boeing` in title and got published over the past hour**
```python
from pygooglenews import GoogleNews
gn = GoogleNews()
s = gn.search('intitle:boeing', when = '1h')
print(s['feed'].title)
# "intitle:boeing when:1h" - Google News
```
### **Example 4. Search for articles that mention `boeing` or `airbus`**
```python
from pygooglenews import GoogleNews
gn = GoogleNews()
s = gn.search('boeing OR airbus', when = '1h')
print(s['feed'].title)
# "boeing AND airbus when:1h" - Google News
```
---
<a name="useful-links"/>
## **Useful Links**
[Stack Overflow thread from which it all began](https://stackoverflow.com/questions/51537063/url-format-for-google-news-rss-feed)
[Google XML reference for the search query](https://developers.google.com/custom-search/docs/xml_results)
[Google News Search parameters (The Missing Manual)](http://web.archive.org/web/20150204025359/http://blog.slashpoundbang.com/post/12975232033/google-news-search-parameters-the-missing-manual)
---
<a name="built"/>
## **Built With**
[Feedparser](https://github.com/kurtmckee/feedparser)
[Beautifulsoup4](https://pypi.org/project/beautifulsoup4/)
---
<a name="aboutme"/>
## **About me**
My name is Artem. I ❤️ working with news data. I am a co-founder of [NewsCatcherAPI](https://newscatcherapi.com/) - **Ultra-fast API to find news articles by any topic, country, language, website, or keyword**
If you are interested in hiring me, please contact me by email - **bugara.artem@gmail.com** or **artem@newscatcherapi.com**
Follow me on 🖋 [Twitter](https://twitter.com/bugaralife) - I write about data engineering, python, entrepreneurship, and memes.
Want to read about how it all was done? Subscribe to [CODARIUM](https://codarium.substack.com/)
thx to Kizy
---
<a name="changelog"/>
## **Change Log**
v0.1.1 -- fixed language-country issues