# TwiGet
TwiGet is a Python package for managing the queries on the filtered stream of the Twitter API and for collecting tweets from it.
It can be used as a command line tool ([`twiget-cli`](#command-line-tool-twiget-cli)) or as a Python class ([`TwiGet`](#python-class-twiget)).
## Installation
```
> pip install twiget
```
The command installs the package and also makes the `twiget-cli` command available.
## Command line tool: twiget-cli
TwiGet implements a command line interface that can be started with the command:
```
> twiget-cli
```
When launched without arguments, the program searches for a `.twiget.conf` file in the `HOME` directory (the directory pointed to by the `$HOME` or `%userprofile%` environment variable).
The file must contain, on its first line, the [__bearer token__](https://developer.twitter.com/en/docs/authentication/oauth-2-0/bearer-tokens) that allows the program to access the Twitter API.
Alternatively, the name of the file from which to obtain the bearer token can be given as an argument when starting the program:
```
> twiget-cli -b path_to_file/with_token.txt
```
__NOTE: store the bearer token in a file with minimum access permissions. Never share it. Revoke any tokens that may have been made public.__
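For example, on Linux or macOS the token file can be created with owner-only permissions (the token value shown is a placeholder):
```
> echo "AAAA...your_bearer_token" > ~/.twiget.conf
> chmod 600 ~/.twiget.conf
```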
Another optional argument sets the path where collected tweets are saved.
By default, a `data` subdirectory is created in the current working directory.
```
> twiget-cli -s ./save_dir
```
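The two options can also be combined in a single invocation:
```
> twiget-cli -b path_to_file/with_token.txt -s ./save_dir
```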
#### prompt
When started, twiget-cli shows the available commands and the queries currently registered for the given bearer token (queries are permanently stored on Twitter's servers).
```
TwiGet 0.1.1
Available commands (type help <command> for details):
create, delete, exit, help, list, refresh, save_to, size, start, stop
Registered queries:
ID=1385892384573355842 query="#usa" tag="usa"
ID=1405490304970434817 query="bts" tag="bts"
```
The command prompt shows whether twiget-cli is currently collecting tweets, the number of tweets collected, and the save path.
```
[not collecting (0 since last start), save path "data"]>
```
When collecting tweets, the prompt is automatically refreshed every time a given number of tweets is collected (see [the refresh command](#refresh)).
### Commands
#### create
Format:
```
> create <tag> <query>
```
Creates a filtering rule associated with a given tag name.
Collected tweets are saved in json format in a file named `<tag>.json`, in the given save path.
The tag name is the first argument of the command and cannot contain spaces.
Every word after the tag defines the query.
[Info on how to define rules](https://developer.twitter.com/en/docs/twitter-api/tweets/filtered-stream/integrate/build-a-rule).
Example:
```
[not collecting (0 since last start), save path "data"]>create usa joe biden
Tweets matching the query "joe biden" will be saved in the file data/usa.json
ID=1395720345987340524
```
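Queries can use the full rule syntax of the Twitter API (see the link above), including operators such as `lang:` and `-is:retweet`. For instance, a rule collecting only English, non-retweet tweets might look like:
```
[not collecting (0 since last start), save path "data"]>create ml "machine learning" lang:en -is:retweet
```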
#### list
Format:
```
> list
```
Lists the queries currently registered for the filtered stream, with their IDs and tags.
Example:
```
[not collecting (0 since last start), save path "data"]> list
Registered queries:
ID=1385892384573355842 query="#usa" tag="usa"
ID=1405490304970434817 query="bts" tag="bts"
ID=1395720345987340524 query="joe biden" tag="usa"
```
#### delete
Format:
```
> delete <ID>
```
Deletes a query, given its ID.
Example:
```
[not collecting (0 since last start), save path "data"]> delete 1385892384573355842
```
#### start
Format:
```
> start
```
Starts a background process that collects tweets from the filtered stream and writes them to json files, according to the tag they are associated with.
Collection continues until a `stop` or an `exit` command is entered.
To collect data over longer periods of time, I suggest running TwiGet within a virtual terminal session, e.g., using `screen` or `tmux`.
_Note: the `create` and `delete` commands can also be issued while collecting tweets. The collection process is updated immediately._
Example:
```
[not collecting (0 since last start), save path "data"]> start
[collecting (0 since last start), save path "data"]>
```
#### stop
Format:
```
> stop
```
Stops data collection.
Example:
```
[collecting (3000 since last start), save path "data"]> stop
[not collecting (3152 since last start), save path "data"]>
```
#### save_to
Format:
```
> save_to <path>
```
Sets the path where json files are saved.
_Note: changing the path while collecting tweets will immediately create new json files in the new path, leaving all tweets collected up to that moment in the old path._
Example:
```
[not collecting (0 since last start), save path "data"]> save_to ../my_project
[not collecting (0 since last start), save path "../my_project"]>
```
#### size
Format:
```
> size <size>
```
Sets the maximum size in bytes of json files.
When a json file reaches this size, a new file with an incremented index (e.g., `tag_0.json`, `tag_1.json`, `tag_2.json`...) is created.
Example:
```
[not collecting (0 since last start), save path "data"]> size 1000000
```
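As a sketch for reading collected tweets back, assuming each saved file holds one json object per line (a common layout for streamed data; verify against your own files), the rotated files for a tag could be processed like this:
```python
import glob
import json

# Read all rotated files for the "usa" tag, assuming one json object per line.
for filename in sorted(glob.glob('data/usa*.json')):
    with open(filename, encoding='utf-8') as f:
        for line in f:
            if line.strip():
                tweet = json.loads(line)
                # In filtered-stream responses, 'data' holds the tweet fields.
                print(tweet['data']['id'])
```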
#### refresh
Format:
```
> refresh <count>
```
Sets the number of collected tweets that triggers an automatic refresh of the prompt.
Example:
```
[not collecting (0 since last start), save path "data"]> refresh 10000
```
### Implementing a custom command line tool
The `TwiGetCLIBase` class in the `twiget_cli.py` module implements all the above functions except those related to saving to json files (i.e., `save_to` and `size`).
It can be used to implement a command line tool that processes the collected data differently, e.g., saving it to a database, as in the sketch below.
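As a sketch of the idea, using only the `TwiGet` class documented below, one could register a callback that stores tweets in an SQLite database instead of json files (the table layout here is only illustrative):
```python
import json
import sqlite3

from twiget import TwiGet

bearer = 'put here the bearer token'
collector = TwiGet(bearer)

# check_same_thread=False because the callback may run in the background
# collection thread, not the one that opened the connection.
db = sqlite3.connect('tweets.db', check_same_thread=False)
db.execute('CREATE TABLE IF NOT EXISTS tweets (tag TEXT, payload TEXT)')

def save_to_db(data):
    # data is the parsed json object passed to callbacks (see below).
    tag = data['matching_rules'][0]['tag']
    db.execute('INSERT INTO tweets VALUES (?, ?)', (tag, json.dumps(data)))
    db.commit()

collector.add_callback('save to db', save_to_db)
collector.start_getting_stream()
```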
## Python class: TwiGet
TwiGet's core functionality is implemented in a Python class that can be used directly in Python code.
```python
from twiget import TwiGet
bearer = 'put here the bearer token'
collector = TwiGet(bearer)
# Adding a filtering rule
# https://developer.twitter.com/en/docs/twitter-api/tweets/filtered-stream/api-reference/post-tweets-search-stream-rules
query = 'support vector machine'
tag = 'ml'
answer = collector.add_rule(query, tag)
# returns the parsed json answer from the server.
print(answer)
# Listing the current filtering rules
# https://developer.twitter.com/en/docs/twitter-api/tweets/filtered-stream/api-reference/get-tweets-search-stream-rules
answer = collector.get_rules()
# returns the parsed json answer from the server.
print(answer)
# Deleting rules by their IDs
# https://developer.twitter.com/en/docs/twitter-api/tweets/filtered-stream/api-reference/post-tweets-search-stream-rules
ids = [48573094587309485, 3029834285720978]
answer = collector.delete_rules(ids)
# returns the parsed json answer from the server.
print(answer)
# Adding a callback
# The data argument contains the content and information about the retrieved tweet
# https://developer.twitter.com/en/docs/twitter-api/tweets/filtered-stream/api-reference/get-tweets-search-stream
def print_tag(data):
    print(data['matching_rules'][0]['tag'])
answer = collector.add_callback('print tag', print_tag)
# returns the parsed json answer from the server.
print(answer)
# Getting callbacks
callbacks = collector.get_callbacks()
# returns a list of pairs with the name of the callback and the callback method.
print(callbacks)
# Deleting a callback
collector.delete_callback('print tag')
# Starting tweet collection
collector.start_getting_stream()
# Checking status of collection
running = collector.is_getting_stream()
# returns a boolean. True if collection is active.
print(running)
# Stopping tweet collection
collector.stop_getting_stream()
```
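Putting these calls together, a minimal script that collects tweets for a fixed amount of time might look like this (the 60-second duration and the counter callback are only illustrative):
```python
import time

from twiget import TwiGet

bearer = 'put here the bearer token'
collector = TwiGet(bearer)

count = 0

def count_tweets(data):
    # Count tweets as they arrive from the stream.
    global count
    count += 1

collector.add_callback('counter', count_tweets)

# Collect for 60 seconds, then stop.
collector.start_getting_stream()
time.sleep(60)
collector.stop_getting_stream()
print(f'Collected {count} tweets')
```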
## License
Author: [Andrea Esuli](http://esuli.it)
BSD 3-Clause License, see [license file](COPYING)