twclient


Nametwclient JSON
Version 0.2.0 PyPI version JSON
download
home_pagehttps://github.com/social-machines/twclient
SummaryA high-level analytics-focused client for the Twitter API
upload_time2023-02-08 07:34:15
maintainer
docs_urlNone
authorWilliam Brannon
requires_python>=3.7
licenseApache 2.0
keywords twitter tweepy data science analytics
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            .. image:: https://img.shields.io/badge/License-Apache_2.0-blue.svg
   :target: https://www.apache.org/licenses/LICENSE-2.0
   :alt: Apache 2.0 License

.. image:: https://badge.fury.io/py/twclient.svg
   :target: https://pypi.python.org/pypi/twclient/
   :alt: PyPi version

.. image:: https://img.shields.io/pypi/pyversions/twclient.svg
   :target: https://pypi.python.org/pypi/twclient/
   :alt: Python versions

..
    .. image:: https://readthedocs.org/projects/twclient/badge/?version=latest
       :target: http://twclient.readthedocs.io/?badge=latest
       :alt: Documentation Status

twclient
========

This package provides a high-level command-line client for the Twitter API,
with a focus on loading data into a database for analysis or bulk use.

**Documentation**: `mit-ccc.github.io/twclient <https://mit-ccc.github.io/twclient>`__

Why use this project?
=====================

This project offers high-level primitives for researchers who want to get
data out of Twitter, without worrying about the API details. The client can
handle multiple sets of API credentials seamlessly, helping avoid rate limit
issues. [1]_ There's also support for exporting bulk datasets from the fetched
raw data.

Installation
============

Install the package from pypi:

.. code-block:: bash

   pip3 install twclient

or, if you want to use the development version, clone this repo and install:

.. code-block:: bash

   git clone git@github.com:mit-ccc/twclient.git && cd twclient
   pip3 install .

You can also use the ``-e`` flag to install in editable mode:

.. code-block:: bash

    pip3 install -e .

To install all development dependencies, replace ``.`` with ``.[dev]`` in the
arguments to ``pip3 install``.

Usage
=====

First, you need to tell twclient about your database backend and Twitter
credentials. On the database side, we've only tested with Postgres and SQLite.
While the package may well work with other DB engines, be aware that you may
encounter issues.

Setup: Database
---------------

The database backend can be either sqlite or an arbitrary database
specified by a sqlalchemy connection string.

You can set up the database in one of two ways. Both create a persistent
profile in your ``.twclientrc`` file (or whatever other file you specify), so
there's no need to type the database details repeatedly.

First, you can specify the DB with a sqlalchemy connection URL:

.. code-block:: bash

   # Postgres -- this becomes the default DB because you've created it first
   twclient config add-db -u "postgresql+psycopg2://username@hostname:5432/dbname" my_postgres_db

   # Or you could use SQLite
   twclient config add-db -u "sqlite:///home/user/twitter.db" my_sqlite_db

There's also support for using SQLite without having to think about sqlalchemy
and connection URLs:

.. code-block:: bash

   twclient config add-db -f ./twitter2.db my_sqlite_db2

If you specify a file-backed sqlite DB, as in the examples above, it'll be
created if it doesn't exist. Other databases (Postgres, for example) will need
to be set up separately.

Finally, you have to install our database schema into your database to receive
Twitter data:

.. code-block:: bash

   # You have to specify the -y to say you're aware all data will be dropped
   twclient initialize -d my_postgres_db -y

Be aware that doing this will **DROP ALL EXISTING TWCLIENT DATA**!!! (Or other
tables with the same names.) If you're not just getting started, check to make
sure you're using a new or empty database, don't care about the contents,
and/or have backups before running this.

Setup: Twitter
----------------

You'll also need to set up your Twitter API credentials. [1]_ As with the
database setup, doing this stores the credentials in a config file (the same
config file as for database info) for ease of use. Only two sets of credentials
are shown, but you can add as many as you want.

Here's an example of adding two API keys:

.. code-block:: bash

   twclient config add-api -n twitter1 \
       --consumer-key XXXXX \
       --consumer-secret XXXXXX \
       --token XXXXXX \
       --token-secret XXXXXX

   twclient config add-api -n twitter2 \
       --consumer-key XXXXX \
       --consumer-secret XXXXXX \
       --token XXXXXX \
       --token-secret XXXXXX

Here's an example of adding credentials that use `app-only auth <https://developer.twitter.com/en/docs/authentication/oauth-2-0/application-only>`_:

.. code-block:: bash

   twclient config add-api -n twitter3 \
       --consumer-key XXXXX \
       --consumer-secret XXXXXX

Pulling data
--------------

To actually pull data, use the ``twclient fetch`` command. We'll pull
information about three specific users and a Twitter list here. Note that you
can refer to lists either by their "slug" (username/listname) or by the ID at
the end of a URL of the form `https://twitter.com/i/lists/53603015`.

First, let's load some users and their basic info:

.. code-block:: bash

   # you could instead also end this with "-l 53603015"; it's the same list
   twclient fetch users -n wwbrannon CCCatMIT MIT -l MIT/peers1

Now, to save typing, let's use the ``twclient tag`` command to apply a tag we
can use to keep track of these users later:

.. code-block:: bash

   twclient tag create subjects
   twclient tag apply subjects -n wwbrannon CCCatMIT MIT -l MIT/peers1

We can now use this tag in specifying users, such as which users we'd like to
fetch tweets for:

.. code-block:: bash

   twclient fetch tweets -g subjects

And if we also want their follow-graph info (note that a "friend" is Twitter's
term for a follow-ee, an account you follow):

.. code-block:: bash

   twclient fetch friends -g subjects
   twclient fetch followers -g subjects

At this point, the loaded data is in the database configured with ``config
add-db``. Useful features have been normalized out to save processing time. The
raw API responses are also saved for later analysis.

Exporting data
----------------

You can query the data with the usual database tools (``psql`` for postgres,
``sqlite3`` for sqlite, ODBC clients, etc.) or export certain pre-defined bulk
datasets with the ``twclient export`` command. For example, here are the follow
graph and mention graph over users:

.. code-block:: bash

    twclient export follow-graph -o follow-graph.csv
    twclient export mention-graph -o mention-graph.csv

If you want to restrict the export to only the users specified above:

.. code-block:: bash

    twclient export follow-graph -g subjects -o follow-graph.csv
    twclient export mention-graph -g subjects -o mention-graph.csv

For other exports and other options, see the documentation.

Feedback or Contributions
=========================

If you come across a bug, please report it on the Github issue tracker. If you
want to contribute, reach out! Extensions and improvements are welcome.

Copyright
===========

Copyright © 2019-2023 Massachusetts Institute of Technology.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this software except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

.. [1] Of course, you'll need to make sure you have the right to use all of
   your credentials and are complying with Twitter's terms of use.

Changelog
=========

All notable changes to this project will be documented here.

The format is based on `Keep a Changelog
<https://keepachangelog.com/en/1.0.0/>`__, and this project follows `Semantic
Versioning <https://semver.org/spec/v2.0.0.html>`__.

v0.2.0
--------

Added
~~~~~~~

New features:

*  First stable version.
*  API documentation.
*  A command-line interface for easy automated fetching of data.
*  A stable relational data model, to make further analysis or data processing
   independent of the details of data ingest.
*  Support for fetching follow-graph edges, tweets, lists and user info.
*  Support for a variety of database backends via sqlalchemy.
*  Type-2 SCD for tracking the follow graph and list membership over time.
*  Many sanity checks in the data model to ensure correctness of loaded data.
*  Loaded data is extensively normalized: mentions, replies, retweets,
   and entities like hashtags are extracted into first-class objects for
   more convenient and accessible analysis.
*  Users can be tagged into arbitrary groups for greater convenience in
   analysis or data collection.
*  Support for both app-only and user authentication to the Twitter API.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/social-machines/twclient",
    "name": "twclient",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "twitter,tweepy,data science,analytics",
    "author": "William Brannon",
    "author_email": "wbrannon@mit.edu",
    "download_url": "",
    "platform": "any",
    "description": ".. image:: https://img.shields.io/badge/License-Apache_2.0-blue.svg\n   :target: https://www.apache.org/licenses/LICENSE-2.0\n   :alt: Apache 2.0 License\n\n.. image:: https://badge.fury.io/py/twclient.svg\n   :target: https://pypi.python.org/pypi/twclient/\n   :alt: PyPi version\n\n.. image:: https://img.shields.io/pypi/pyversions/twclient.svg\n   :target: https://pypi.python.org/pypi/twclient/\n   :alt: Python versions\n\n..\n    .. image:: https://readthedocs.org/projects/twclient/badge/?version=latest\n       :target: http://twclient.readthedocs.io/?badge=latest\n       :alt: Documentation Status\n\ntwclient\n========\n\nThis package provides a high-level command-line client for the Twitter API,\nwith a focus on loading data into a database for analysis or bulk use.\n\n**Documentation**: `mit-ccc.github.io/twclient <https://mit-ccc.github.io/twclient>`__\n\nWhy use this project?\n=====================\n\nThis project offers high-level primitives for researchers who want to get\ndata out of Twitter, without worrying about the API details. The client can\nhandle multiple sets of API credentials seamlessly, helping avoid rate limit\nissues. [1]_ There's also support for exporting bulk datasets from the fetched\nraw data.\n\nInstallation\n============\n\nInstall the package from pypi:\n\n.. code-block:: bash\n\n   pip3 install twclient\n\nor, if you want to use the development version, clone this repo and install:\n\n.. code-block:: bash\n\n   git clone git@github.com:mit-ccc/twclient.git && cd twclient\n   pip3 install .\n\nYou can also use the ``-e`` flag to install in editable mode:\n\n.. code-block:: bash\n\n    pip3 install -e .\n\nTo install all development dependencies, replace ``.`` with ``.[dev]`` in the\narguments to ``pip3 install``.\n\nUsage\n=====\n\nFirst, you need to tell twclient about your database backend and Twitter\ncredentials. On the database side, we've only tested with Postgres and SQLite.\nWhile the package may well work with other DB engines, be aware that you may\nencounter issues.\n\nSetup: Database\n---------------\n\nThe database backend can be either sqlite or an arbitrary database\nspecified by a sqlalchemy connection string.\n\nYou can set up the database in one of two ways. Both create a persistent\nprofile in your ``.twclientrc`` file (or whatever other file you specify), so\nthere's no need to type the database details repeatedly.\n\nFirst, you can specify the DB with a sqlalchemy connection URL:\n\n.. code-block:: bash\n\n   # Postgres -- this becomes the default DB because you've created it first\n   twclient config add-db -u \"postgresql+psycopg2://username@hostname:5432/dbname\" my_postgres_db\n\n   # Or you could use SQLite\n   twclient config add-db -u \"sqlite:///home/user/twitter.db\" my_sqlite_db\n\nThere's also support for using SQLite without having to think about sqlalchemy\nand connection URLs:\n\n.. code-block:: bash\n\n   twclient config add-db -f ./twitter2.db my_sqlite_db2\n\nIf you specify a file-backed sqlite DB, as in the examples above, it'll be\ncreated if it doesn't exist. Other databases (Postgres, for example) will need\nto be set up separately.\n\nFinally, you have to install our database schema into your database to receive\nTwitter data:\n\n.. code-block:: bash\n\n   # You have to specify the -y to say you're aware all data will be dropped\n   twclient initialize -d my_postgres_db -y\n\nBe aware that doing this will **DROP ALL EXISTING TWCLIENT DATA**!!! (Or other\ntables with the same names.) If you're not just getting started, check to make\nsure you're using a new or empty database, don't care about the contents,\nand/or have backups before running this.\n\nSetup: Twitter\n----------------\n\nYou'll also need to set up your Twitter API credentials. [1]_ As with the\ndatabase setup, doing this stores the credentials in a config file (the same\nconfig file as for database info) for ease of use. Only two sets of credentials\nare shown, but you can add as many as you want.\n\nHere's an example of adding two API keys:\n\n.. code-block:: bash\n\n   twclient config add-api -n twitter1 \\\n       --consumer-key XXXXX \\\n       --consumer-secret XXXXXX \\\n       --token XXXXXX \\\n       --token-secret XXXXXX\n\n   twclient config add-api -n twitter2 \\\n       --consumer-key XXXXX \\\n       --consumer-secret XXXXXX \\\n       --token XXXXXX \\\n       --token-secret XXXXXX\n\nHere's an example of adding credentials that use `app-only auth <https://developer.twitter.com/en/docs/authentication/oauth-2-0/application-only>`_:\n\n.. code-block:: bash\n\n   twclient config add-api -n twitter3 \\\n       --consumer-key XXXXX \\\n       --consumer-secret XXXXXX\n\nPulling data\n--------------\n\nTo actually pull data, use the ``twclient fetch`` command. We'll pull\ninformation about three specific users and a Twitter list here. Note that you\ncan refer to lists either by their \"slug\" (username/listname) or by the ID at\nthe end of a URL of the form `https://twitter.com/i/lists/53603015`.\n\nFirst, let's load some users and their basic info:\n\n.. code-block:: bash\n\n   # you could instead also end this with \"-l 53603015\"; it's the same list\n   twclient fetch users -n wwbrannon CCCatMIT MIT -l MIT/peers1\n\nNow, to save typing, let's use the ``twclient tag`` command to apply a tag we\ncan use to keep track of these users later:\n\n.. code-block:: bash\n\n   twclient tag create subjects\n   twclient tag apply subjects -n wwbrannon CCCatMIT MIT -l MIT/peers1\n\nWe can now use this tag in specifying users, such as which users we'd like to\nfetch tweets for:\n\n.. code-block:: bash\n\n   twclient fetch tweets -g subjects\n\nAnd if we also want their follow-graph info (note that a \"friend\" is Twitter's\nterm for a follow-ee, an account you follow):\n\n.. code-block:: bash\n\n   twclient fetch friends -g subjects\n   twclient fetch followers -g subjects\n\nAt this point, the loaded data is in the database configured with ``config\nadd-db``. Useful features have been normalized out to save processing time. The\nraw API responses are also saved for later analysis.\n\nExporting data\n----------------\n\nYou can query the data with the usual database tools (``psql`` for postgres,\n``sqlite3`` for sqlite, ODBC clients, etc.) or export certain pre-defined bulk\ndatasets with the ``twclient export`` command. For example, here are the follow\ngraph and mention graph over users:\n\n.. code-block:: bash\n\n    twclient export follow-graph -o follow-graph.csv\n    twclient export mention-graph -o mention-graph.csv\n\nIf you want to restrict the export to only the users specified above:\n\n.. code-block:: bash\n\n    twclient export follow-graph -g subjects -o follow-graph.csv\n    twclient export mention-graph -g subjects -o mention-graph.csv\n\nFor other exports and other options, see the documentation.\n\nFeedback or Contributions\n=========================\n\nIf you come across a bug, please report it on the Github issue tracker. If you\nwant to contribute, reach out! Extensions and improvements are welcome.\n\nCopyright\n===========\n\nCopyright \u00a9 2019-2023 Massachusetts Institute of Technology.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this software except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n\n.. [1] Of course, you'll need to make sure you have the right to use all of\n   your credentials and are complying with Twitter's terms of use.\n\nChangelog\n=========\n\nAll notable changes to this project will be documented here.\n\nThe format is based on `Keep a Changelog\n<https://keepachangelog.com/en/1.0.0/>`__, and this project follows `Semantic\nVersioning <https://semver.org/spec/v2.0.0.html>`__.\n\nv0.2.0\n--------\n\nAdded\n~~~~~~~\n\nNew features:\n\n*  First stable version.\n*  API documentation.\n*  A command-line interface for easy automated fetching of data.\n*  A stable relational data model, to make further analysis or data processing\n   independent of the details of data ingest.\n*  Support for fetching follow-graph edges, tweets, lists and user info.\n*  Support for a variety of database backends via sqlalchemy.\n*  Type-2 SCD for tracking the follow graph and list membership over time.\n*  Many sanity checks in the data model to ensure correctness of loaded data.\n*  Loaded data is extensively normalized: mentions, replies, retweets,\n   and entities like hashtags are extracted into first-class objects for\n   more convenient and accessible analysis.\n*  Users can be tagged into arbitrary groups for greater convenience in\n   analysis or data collection.\n*  Support for both app-only and user authentication to the Twitter API.\n",
    "bugtrack_url": null,
    "license": "Apache 2.0",
    "summary": "A high-level analytics-focused client for the Twitter API",
    "version": "0.2.0",
    "split_keywords": [
        "twitter",
        "tweepy",
        "data science",
        "analytics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0801ca81712fe0df4b8893f2fe1e6fd0b21e1a0af1c5f6655b50852e59eebe78",
                "md5": "0356d092126baa5386905eb34fab6fdb",
                "sha256": "f98ecbe4e1661e1c0c1691c42ed7bd755ae1336708a66db3563aacc8fc979412"
            },
            "downloads": -1,
            "filename": "twclient-0.2.0-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0356d092126baa5386905eb34fab6fdb",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.7",
            "size": 77043,
            "upload_time": "2023-02-08T07:34:15",
            "upload_time_iso_8601": "2023-02-08T07:34:15.059373Z",
            "url": "https://files.pythonhosted.org/packages/08/01/ca81712fe0df4b8893f2fe1e6fd0b21e1a0af1c5f6655b50852e59eebe78/twclient-0.2.0-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-02-08 07:34:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "social-machines",
    "github_project": "twclient",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "twclient"
}
        
Elapsed time: 0.05294s