aioTrends


| Field | Value |
|---|---|
| Name | aioTrends |
| Version | 0.0.4 |
| home_page | https://github.com/yuz0101/aioTrends |
| Summary | Library for fetching Google Trends in an async. way |
| upload_time | 2023-08-22 11:39:08 |
| maintainer | |
| docs_url | None |
| author | Stephen Zhang |
| requires_python | |
| license | MIT |
| keywords | google, trends, async, asyncio, aiohttp, googletrends, pytrends |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# aioTrends

## Intro.

This project obtains data from Google Trends asynchronously and efficiently. Inspired by [pytrends](https://github.com/GeneralMills/pytrends), I am developing it on top of the asynchronous framework asyncio and a related module, [aiohttp](https://github.com/aio-libs/aiohttp).

The project works in three stages: it first builds a pool of cookies, then obtains the tokenized queries (wrapped inside widgets) and stores them in a second pool, and finally retrieves the data using the widgets from the widget pool.
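
The three stages can be pictured with a small, self-contained sketch. It is only an illustration of the cookie pool, widget pool and data retrieval flow using asyncio from the standard library; the function names, the fake cookies and tokens, and the round-robin reuse of cookies are all assumptions made for the example and are not aioTrends' actual code. No network request is sent.

```python
import asyncio


async def build_cookie_pool(n: int) -> list[str]:
    """Stage 1 (simulated): collect n cookies concurrently."""
    async def fetch_cookie(i: int) -> str:
        await asyncio.sleep(0)  # stands in for an HTTP round trip
        return f"cookie-{i}"

    return list(await asyncio.gather(*(fetch_cookie(i) for i in range(n))))


async def build_widget_pool(cookies: list[str], queries: dict[int, str]) -> dict[int, dict]:
    """Stage 2 (simulated): turn every query into a widget that carries a token."""
    async def fetch_widget(qid: int, keyword: str) -> tuple[int, dict]:
        cookie = cookies[qid % len(cookies)]  # hypothetical: reuse cookies in turn
        await asyncio.sleep(0)
        return qid, {"keyword": keyword, "token": f"token-{qid}", "cookie": cookie}

    return dict(await asyncio.gather(*(fetch_widget(q, k) for q, k in queries.items())))


async def retrieve_data(widgets: dict[int, dict]) -> dict[int, str]:
    """Stage 3 (simulated): exchange every widget for its data."""
    async def fetch_data(qid: int, widget: dict) -> tuple[int, str]:
        await asyncio.sleep(0)
        return qid, f"data for {widget['keyword']} via {widget['token']}"

    return dict(await asyncio.gather(*(fetch_data(q, w) for q, w in widgets.items())))


async def main() -> dict[int, str]:
    cookies = await build_cookie_pool(2)
    widgets = await build_widget_pool(cookies, {0: "AAPL", 1: "AMZN", 2: "MSFT"})
    return await retrieve_data(widgets)


result = asyncio.run(main())
print(result)
```

Because each stage returns a plain Python object, its output could be saved to disk and reloaded later, which is what makes restarting from a later stage possible.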

Only interest-over-time data is tested and available at present.

## Pros & Cons
### Pros
- **Saving time** ~ Thanks to the asynchronous framework, the programme handles other requests while waiting for responses from Google Trends, so the waiting time is not wasted.
- **Avoiding repeated requests** ~ Tired of broken connections forcing you to restart the whole request process? The programme splits the process into (1) building a cookies pool, (2) building a widgets pool and (3) retrieving data. It can be started from any of these stages, so requests that already succeeded are not sent again.
- **No limit on the number of tasks** ~ Tons of queries? The programme handles them for you automatically.
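
To see why overlapping the waiting time helps, here is a minimal sketch that uses only the standard library. It simulates 20 requests of 0.05 seconds each and caps them at 10 concurrent tasks with an `asyncio.Semaphore`. The `fake_request` and `run_all` names and the timings are invented for this illustration; aioTrends may limit concurrency in a different way.

```python
import asyncio
import time


async def fake_request(i: int, limit: asyncio.Semaphore) -> int:
    """Pretend to call a remote service; the sleep stands in for network latency."""
    async with limit:  # at most `concurrency` requests are in flight at once
        await asyncio.sleep(0.05)
        return i * i


async def run_all(n_tasks: int, concurrency: int) -> list[int]:
    limit = asyncio.Semaphore(concurrency)
    # gather() returns the results in the order the tasks were passed in
    return await asyncio.gather(*(fake_request(i, limit) for i in range(n_tasks)))


start = time.perf_counter()
results = asyncio.run(run_all(n_tasks=20, concurrency=10))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s (running one by one would take about {20 * 0.05:.2f}s)")
```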

### Cons
- **Heavy reliance on proxies** ~ When running a large number of queries, proxies are required to receive responses successfully. In that case you need either a small number of rotating proxies or a large number of static proxies.
- **Only time-series interest data is available now** ~ Other data types will be tested in the future.
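
As a rough illustration of spreading queries over a list of static proxies, the sketch below assigns proxies in round-robin order with `itertools.cycle`. The `assign_proxies` helper and the example proxy addresses are hypothetical; how aioTrends actually picks a proxy for each request is not described here.

```python
from itertools import cycle


def assign_proxies(queries: list[str], proxies: list[str]) -> list[tuple[str, str]]:
    """Pair each query with the next proxy, wrapping around when the list ends."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    rotation = cycle(proxies)
    return [(query, next(rotation)) for query in queries]


# Placeholder addresses, not real proxies.
plan = assign_proxies(
    ["AAPL", "AMZN", "MSFT"],
    ["http://proxy-a.example:8080", "http://proxy-b.example:8080"],
)
for query, proxy in plan:
    print(query, "->", proxy)
```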

## Requirements
- python >= 3.10
- aiohttp
- aiofiles
- numpy
- pandas
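
A quick way to check these requirements before installing is sketched below. It relies only on the standard library (`sys.version_info` and `importlib.util.find_spec`); the `check_environment` helper is a name made up for this example and is not part of aioTrends.

```python
import sys
from importlib.util import find_spec

REQUIRED_MODULES = ("aiohttp", "aiofiles", "numpy", "pandas")


def check_environment(min_version: tuple[int, int] = (3, 10)) -> dict:
    """Report whether Python is recent enough and which modules cannot be found."""
    return {
        "python_ok": sys.version_info >= min_version,
        "missing": [name for name in REQUIRED_MODULES if find_spec(name) is None],
    }


report = check_environment()
print(report)
```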

## Files
Settings can be customized by editing settings.json in the settings folder.

An example input of queries is given under the data folder.

An example proxies file is given under the proxies folder.

The file userAgents.json is from [Said-Ait-Driss](https://github.com/Said-Ait-Driss/user-agents).

## Before You Start
### I. Initial stage
1. Install [Python 3.10 or later](https://www.python.org/downloads/) if you don't already have it. This example uses Python 3.11.
2. Install the packages with pip from the command line (Terminal on macOS, CMD on Windows).
```console
pip install aioTrends
pip install virtualenv
```
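3. Create a virtual environment named atenv, so that Python 3.11 runs without affecting your other setups. First locate the Python 3.11 executable (`where` works in Windows CMD; on macOS and Linux use `which python3.11` instead).
```console
where python3.11
```
Copy the path it prints and use it in place of the path below.
```console
virtualenv -p /path/to/python3.11 atenv
```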
4. Activate the virtual environment.

On Windows:
```console
atenv\Scripts\activate
```
On macOS and Linux:
```console
source atenv/bin/activate
```
5. Install aioTrends inside the activated environment.

The package must be installed in a Python 3.10+ environment.
```console
pip install aioTrends
```

6. Check that the package is installed properly. The programme will create folders; please follow the instructions it gives.
```console
cd path/to/your/working/path
python
>>> import aioTrends as at
```
7. Edit settings.json in the 'settings' folder.
8. Paste your proxies into proxies.txt in the 'proxies' folder.
9. Get the userAgents.json file from [Said-Ait-Driss](https://github.com/Said-Ait-Driss/user-agents) and paste it into the 'settings' folder.
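
Once steps 7 to 9 are done, a small check like the one below can confirm that the files are in place. It is a sketch that assumes the 'settings' and 'proxies' folders sit directly inside the working directory; `missing_files` is a hypothetical helper, not an aioTrends function.

```python
from pathlib import Path

# File locations taken from steps 7 to 9, relative to the working directory.
EXPECTED_FILES = (
    "settings/settings.json",
    "settings/userAgents.json",
    "proxies/proxies.txt",
)


def missing_files(workdir: str = ".") -> list[str]:
    """Return the expected files that do not exist under workdir."""
    root = Path(workdir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


print("missing:", missing_files())
```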

## Getting Started
### II. Set up a queries file

```python
import pickle

# Each entry maps a query id to its keywords, period and frequency.
qrys = {
    0: {'keywords': ['AAPL'], 'periods': '2007-01-01 2007-08-31', 'freq': 'D'},
    1: {'keywords': ['AMZN'], 'periods': 'all', 'freq': 'M'},
    2: {'keywords': ['AAPL', 'AMZN'], 'periods': 'all', 'freq': 'M'},
    # ... further queries go here ...
    10000: {'keywords': ['MSFT'], 'periods': '2004-01-01 2022-12-31', 'freq': 'M'},
}

# Save the queries in the data folder.
with open('./data/qrys.pkl', 'wb') as f:
    pickle.dump(qrys, f)
```
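
Before pickling a large set of queries, it can help to sanity-check each entry. The sketch below validates the shape used in the example above. The rules it enforces ('periods' is either 'all' or two ISO dates separated by a space, 'freq' is 'D' or 'M') are inferred from the example entries only, so treat them as assumptions; the library may accept other values.

```python
from datetime import date


def validate_query(query: dict) -> list[str]:
    """Return a list of problems found in one query entry (empty if none)."""
    problems = []

    keywords = query.get("keywords")
    if not isinstance(keywords, list) or not keywords or not all(isinstance(k, str) for k in keywords):
        problems.append("keywords must be a non-empty list of strings")

    periods = query.get("periods")
    if periods != "all":
        try:
            # Expect exactly two ISO dates; anything else raises ValueError.
            start, end = (date.fromisoformat(p) for p in str(periods).split())
            if start > end:
                problems.append("start date is after end date")
        except ValueError:
            problems.append("periods must be 'all' or 'YYYY-MM-DD YYYY-MM-DD'")

    if query.get("freq") not in ("D", "M"):  # assumed set of frequencies
        problems.append("freq must be 'D' or 'M'")

    return problems


good = {'keywords': ['AAPL'], 'periods': '2007-01-01 2007-08-31', 'freq': 'D'}
bad = {'keywords': [], 'periods': '2007-08-31 2007-01-01', 'freq': 'W'}
print(validate_query(good))
print(validate_query(bad))
```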

Alternatively, the `formQueries` function builds the query dataset from the list of keywords you give.
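```python
from aioTrends import formQueries
from datetime import date
import pickle

# Build daily queries for each keyword from 2004-01-01 until today.
qrys = formQueries(keywords=['AMZN', 'MSFT'], start='2004-01-01', end=date.today(), freq='D')

# Save the queries in the data folder.
with open('./data/qrys.pkl', 'wb') as f:
    pickle.dump(qrys, f)
```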
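
For readers who want to see what such a query dataset might look like without installing the package, here is a hypothetical stand-in. It is not the library's `formQueries`: the `build_queries` name, the choice of one query per keyword and the handling of dates are assumptions that simply reproduce the dictionary shape shown in the hand-written example.

```python
from datetime import date


def build_queries(keywords: list[str], start, end, freq: str = "D") -> dict[int, dict]:
    """Build one query per keyword in the {id: {...}} shape shown earlier."""
    def as_text(value) -> str:
        # Accept date objects or strings already in YYYY-MM-DD form.
        return value.isoformat() if isinstance(value, date) else str(value)

    periods = f"{as_text(start)} {as_text(end)}"
    return {
        i: {"keywords": [keyword], "periods": periods, "freq": freq}
        for i, keyword in enumerate(keywords)
    }


example = build_queries(["AMZN", "MSFT"], start="2004-01-01", end=date(2022, 12, 31), freq="M")
print(example)
```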

### III. Create a Python script named example.py

```python
import aioTrends as at

# Step 0: set the log file. Other settings can be customized by editing settings.json in the settings folder.
at.setLog('./data/hello.log')

# Step 1: collect 1000 cookies with 100 concurrent tasks. The number of concurrent tasks can be customized.
at.CookeisPool(100).run(1000)

# Step 2: get widgets with 100 concurrent tasks. The number of concurrent tasks can be customized.
at.WidgetsPool(100).run()

# Step 3: get data with 100 concurrent tasks. The number of concurrent tasks can be customized.
at.DataInterestOverTime(100).run()
```

Alternatively, you can form the queries and fetch scaled daily data or monthly data with a single call each, as below.
```python
import aioTrends as at
from datetime import date

qry_list = ['AMZN', 'AAPL', 'MSFT']

# run 50 concurrent tasks
ataio = at.Aio(50)

df = ataio.getScaledDailyData(
    keywords=qry_list, # the query keyword list
    filename='test.csv', # output file; json and pickle are also supported
    start='2004-01-01', # both datetime and str are supported
    end=date.today()
    )

# DataFrame.plot needs matplotlib, which is not in the requirements list above
fig = df.plot(figsize=(16,8), title='TEST_SCALED_DAILY_DATA').get_figure()
fig.savefig('test_scaled_daily_data.png')

df_m = ataio.getMonthlyData(
    keywords=qry_list,
    start='2004-01-01',
    end='2022-12-31'
    )
fig = df_m.plot(figsize=(16,8), title='TEST_MONTHLY_DATA').get_figure()
fig.savefig('test_monthly_data.png')
```

### IV. Run example.py from your terminal or CMD (the code must run in a Python 3.10+ environment)

```console
python example.py
```

![Monthly Data](test_monthly_data.png)
![Scaled Daily Data](test_scaled_daily_data.png)

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/yuz0101/aioTrends",
    "name": "aioTrends",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "google,trends,async,asyncio,aiohttp,googletrends,pytrends",
    "author": "Stephen Zhang",
    "author_email": "stephen_se@outlook.com",
    "download_url": "https://files.pythonhosted.org/packages/6d/38/d832a971cd304421352aedda5baaf1fffc29e3b9058c401ae90df0c99cf5/aioTrends-0.0.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Library for fetching Google Trends in an async. way",
    "version": "0.0.4",
    "project_urls": {
        "Download": "https://github.com/yuz0101/aioTrends/archive/refs/tags/v_04.tar.gz",
        "Homepage": "https://github.com/yuz0101/aioTrends"
    },
    "split_keywords": [
        "google",
        "trends",
        "async",
        "asyncio",
        "aiohttp",
        "googletrends",
        "pytrends"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6d38d832a971cd304421352aedda5baaf1fffc29e3b9058c401ae90df0c99cf5",
                "md5": "22127f63f7205d89ccdffef4b0f9091e",
                "sha256": "dde660a4312a63126ee2a60c6752f02aa8154c7752ecc5b514d326e4051f9e00"
            },
            "downloads": -1,
            "filename": "aioTrends-0.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "22127f63f7205d89ccdffef4b0f9091e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 327765,
            "upload_time": "2023-08-22T11:39:08",
            "upload_time_iso_8601": "2023-08-22T11:39:08.642556Z",
            "url": "https://files.pythonhosted.org/packages/6d/38/d832a971cd304421352aedda5baaf1fffc29e3b9058c401ae90df0c99cf5/aioTrends-0.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-08-22 11:39:08",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yuz0101",
    "github_project": "aioTrends",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "aiotrends"
}
        