gspread-pandas


Namegspread-pandas JSON
Version 3.3.0 PyPI version JSON
download
home_pagehttps://github.com/aiguofer/gspread-pandas
SummaryA package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
upload_time2024-02-13 17:02:55
maintainer
docs_urlNone
authorDiego Fernandez
requires_python
licenseBSD
keywords gspread pandas google spreadsheets
VCS
bugtrack_url
requirements gspread pandas decorator google-auth google-auth-oauthlib
Travis-CI No Travis.
coveralls test coverage No coveralls.
            ===============
Getting Started
===============

.. image:: https://img.shields.io/pypi/v/gspread-pandas.svg
        :target: https://pypi.python.org/pypi/gspread-pandas
        :alt: PyPI Version

.. image:: https://readthedocs.org/projects/gspread-pandas/badge/?version=latest
        :target: https://gspread-pandas.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation Status

author: Diego Fernandez

Links:

-  `Documentation <http://gspread-pandas.readthedocs.io/>`_
-  `Source code <https://github.com/aiguofer/gspread-pandas>`_
-  `Short video tutorial <https://youtu.be/2yIcNYzfzPw>`_

.. attention:: Looking for maintainer

   I don't really use this package anymore and with my current lifestyle I don't
   have much time to maintain this project. The library as-is only supports
   gspread version <6. Updating this to support v6 might involve a fair bit of work
   which I unfortunately cant do. I'm happy to work with anyone who might want to
   maintain, or I can still accept PRs.


Overview
========

A package to easily open an instance of a Google spreadsheet and
interact with worksheets through Pandas DataFrames. It enables you to
easily pull data from Google spreadsheets into DataFrames as well as
push data into spreadsheets from DataFrames. It leverages
`gspread <https://github.com/burnash/gspread/>`__ in the backend for
most of the heavylifting, but it has a lot of added functionality
to handle things specific to working with DataFrames as well as
some extra nice to have features.

The target audience are Data Analysts and Data Scientists, but it can also
be used by Data Engineers or anyone trying to automate workflows with Google
Sheets and Pandas.

Some key goals/features:

-  Be easy to use interactively, with good docstrings and auto-completion
-  Nicely handle headers and indexes (including multi-level headers and merged cells)
-  Run on Jupyter, headless server, and/or scripts
-  Allow storing different user credentials or using Service Accounts
-  Automatically handle token refreshes
-  Enable handling of frozen rows and columns
-  Enable filling in all merged cells when pulling data
-  Nicely handle large data sets and auto-retries
-  Enable creation of filters
-  Handle retries when exceeding 100 second user quota
-  When pushing DataFrames with MultiIndex columns, allow merging or flattening headers
-  Ability to handle Spreadsheet permissions
-  Ability to specify ``ValueInputOption`` and ``ValueRenderOption`` for specific columns

Installation / Usage
====================

To install use pip:

.. code-block:: console

    $ pip install gspread-pandas

Or clone the repo:

.. code-block:: console

    $ git clone https://github.com/aiguofer/gspread-pandas.git
    $ python setup.py install

Before using, you will need to download Google client credentials for
your app.

Client Credentials
------------------

To allow a script to use Google Drive API we need to authenticate our
self towards Google. To do so, we need to create a project, describing
the tool and generate credentials. Please use your web browser and go to
`Google console <https://console.developers.google.com/>`__ and :

-  Choose **Create Project** in popup menu on the top.
-  A dialog box appears, so give your project a name and click on
   **Create** button.
-  On the left-side menu click on **API Manager**.
-  A table of available APIs is shown. Switch **Drive API** and click on
   **Enable API** button. Do the same for **Sheets API**. Other APIs might
   be switched off, for our purpose.
-  On the left-side menu click on **Credentials**.
-  In section **OAuth consent screen** select your email address and
   give your product a name. Then click on **Save** button.
-  In section **Credentials** click on **Add credentials** and switch
   **OAuth client ID** (if you want to use your own account or enable
   the use of multiple accounts) or **Service account key** (if you prefer
   to have a service account interacting with spreadsheets).
-  If you select **OAuth client ID**:

   -  Select **Application type** item as **Desktop app** and give it a name.
   -  Click on **Create** button.
   -  Click on **Download JSON** icon on the right side of created
      **OAuth client IDs** and store the downloaded file on your file system.
-  If you select **Service account key**

   -  Click on **Service account** dropdown and select **New service account**
   -  Give it a **Service account name** and ignore the **Role** dropdown
      (unless you know you need this for something else, it's not necessary for
      working with spreadsheets)
   -  Note the **Service account ID** as you might need to give that user
      permission to interact with your spreadsheets
   -  Leave **Key type** as **JSON**
   -  Click **Create** and store the downloaded file on your file system.
-  Please be aware, the file contains your private credentials, so take
   care of the file in the same way you care of your private SSH key;
   Move the downloaded JSON to ``~/.config/gspread_pandas/google_secret.json``
   (or you can configure the directory and file name by directly calling
   ``gspread_pandas.conf.get_config``


Thanks to similar project
`df2gspread <https://github.com/maybelinot/df2gspread>`__ for this great
description of how to get the client credentials.

You can read more about it in the `configuration docs
<https://gspread-pandas.readthedocs.io/en/latest/configuration.html>`__
including how to change the default behavior.

Example
=======

.. code:: python

    import pandas as pd
    from gspread_pandas import Spread, Client

    file_name = "http://stats.idre.ucla.edu/stat/data/binary.csv"
    df = pd.read_csv(file_name)

    # 'Example Spreadsheet' needs to already exist and your user must have access to it
    spread = Spread('Example Spreadsheet')
    # This will ask to authenticate if you haven't done so before

    # Display available worksheets
    spread.sheets

    # Save DataFrame to worksheet 'New Test Sheet', create it first if it doesn't exist
    spread.df_to_sheet(df, index=False, sheet='New Test Sheet', start='A2', replace=True)
    spread.update_cells('A1', 'B1', ['Created by:', spread.email])
    print(spread)
    # <gspread_pandas.client.Spread - User: '<example_user>@gmail.com', Spread: 'Example Spreadsheet', Sheet: 'New Test Sheet'>

    # You can now first instanciate a Client separately and query folders and
    # instanciate other Spread objects by passing in the Client
    client = Client()
    # Assumming you have a dir called 'example dir' with sheets in it
    available_sheets = client.find_spreadsheet_files_in_folders('example dir')
    spreads = []
    for sheet in available_sheets.get('example dir', []):
        spreads.append(Spread(sheet['id'], client=client))

Troubleshooting
===============

EOFError in Rodeo
-----------------

If you're trying to use ``gspread_pandas`` from within
`Rodeo <https://www.yhat.com/products/rodeo>`_ you might get an
``EOFError: EOF when reading a line`` error when trying to pass in the verification
code. The workaround for this is to first verify your account in a regular shell.
Since you're just doing this to get your Oauth token, the spreadsheet doesn't need
to be valid. Just run this in shell:

.. code:: python

   python -c "from gspread_pandas import Spread; Spread('<user_key>','')"

Then follow the instructions to create and store the OAuth creds.


This action would increase the number of cells in the workbook above the limit of 10000000 cells.
-------------------------------------------------------------------------------------------------

IMO, Google sheets is not the right tool for large datasets. However, there's probably good reaons
you might have to use it in such cases. When uploading a large DataFrame, you might run into this
error.

By default, ``Spread.df_to_sheet`` will add rows and/or columns needed to accomodate the DataFrame.
Since a new sheet contains a fairly large number of columns, if you're uploading a DF with lots of
rows you might exceed the max number of cells in a worksheet even if your data does not. In order
to fix this you have 2 options:

1. The easiest is to pass ``replace=True``, which will first resize the worksheet and clear out all values.
2. Another option is to first resize to 1x1 using ``Spread.sheet.resize(1, 1)`` and then do ``df_to_sheet``

There's a strange caveat with resizing, so going to 1x1 first is recommended (``replace=True`` already does this). To read more see `this issue <https://issuetracker.google.com/issues/213126648>`_

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/aiguofer/gspread-pandas",
    "name": "gspread-pandas",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "gspread pandas google spreadsheets",
    "author": "Diego Fernandez",
    "author_email": "aiguo.fernandez@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/5e/5c/851abfb9adf4e70b232f3b6e0237e416776aae9257b59eaa0f44c0960084/gspread-pandas-3.3.0.tar.gz",
    "platform": null,
    "description": "===============\nGetting Started\n===============\n\n.. image:: https://img.shields.io/pypi/v/gspread-pandas.svg\n        :target: https://pypi.python.org/pypi/gspread-pandas\n        :alt: PyPI Version\n\n.. image:: https://readthedocs.org/projects/gspread-pandas/badge/?version=latest\n        :target: https://gspread-pandas.readthedocs.io/en/latest/?badge=latest\n        :alt: Documentation Status\n\nauthor: Diego Fernandez\n\nLinks:\n\n-  `Documentation <http://gspread-pandas.readthedocs.io/>`_\n-  `Source code <https://github.com/aiguofer/gspread-pandas>`_\n-  `Short video tutorial <https://youtu.be/2yIcNYzfzPw>`_\n\n.. attention:: Looking for maintainer\n\n   I don't really use this package anymore and with my current lifestyle I don't\n   have much time to maintain this project. The library as-is only supports\n   gspread version <6. Updating this to support v6 might involve a fair bit of work\n   which I unfortunately cant do. I'm happy to work with anyone who might want to\n   maintain, or I can still accept PRs.\n\n\nOverview\n========\n\nA package to easily open an instance of a Google spreadsheet and\ninteract with worksheets through Pandas DataFrames. It enables you to\neasily pull data from Google spreadsheets into DataFrames as well as\npush data into spreadsheets from DataFrames. It leverages\n`gspread <https://github.com/burnash/gspread/>`__ in the backend for\nmost of the heavylifting, but it has a lot of added functionality\nto handle things specific to working with DataFrames as well as\nsome extra nice to have features.\n\nThe target audience are Data Analysts and Data Scientists, but it can also\nbe used by Data Engineers or anyone trying to automate workflows with Google\nSheets and Pandas.\n\nSome key goals/features:\n\n-  Be easy to use interactively, with good docstrings and auto-completion\n-  Nicely handle headers and indexes (including multi-level headers and merged cells)\n-  Run on Jupyter, headless server, and/or scripts\n-  Allow storing different user credentials or using Service Accounts\n-  Automatically handle token refreshes\n-  Enable handling of frozen rows and columns\n-  Enable filling in all merged cells when pulling data\n-  Nicely handle large data sets and auto-retries\n-  Enable creation of filters\n-  Handle retries when exceeding 100 second user quota\n-  When pushing DataFrames with MultiIndex columns, allow merging or flattening headers\n-  Ability to handle Spreadsheet permissions\n-  Ability to specify ``ValueInputOption`` and ``ValueRenderOption`` for specific columns\n\nInstallation / Usage\n====================\n\nTo install use pip:\n\n.. code-block:: console\n\n    $ pip install gspread-pandas\n\nOr clone the repo:\n\n.. code-block:: console\n\n    $ git clone https://github.com/aiguofer/gspread-pandas.git\n    $ python setup.py install\n\nBefore using, you will need to download Google client credentials for\nyour app.\n\nClient Credentials\n------------------\n\nTo allow a script to use Google Drive API we need to authenticate our\nself towards Google. To do so, we need to create a project, describing\nthe tool and generate credentials. Please use your web browser and go to\n`Google console <https://console.developers.google.com/>`__ and :\n\n-  Choose **Create Project** in popup menu on the top.\n-  A dialog box appears, so give your project a name and click on\n   **Create** button.\n-  On the left-side menu click on **API Manager**.\n-  A table of available APIs is shown. Switch **Drive API** and click on\n   **Enable API** button. Do the same for **Sheets API**. Other APIs might\n   be switched off, for our purpose.\n-  On the left-side menu click on **Credentials**.\n-  In section **OAuth consent screen** select your email address and\n   give your product a name. Then click on **Save** button.\n-  In section **Credentials** click on **Add credentials** and switch\n   **OAuth client ID** (if you want to use your own account or enable\n   the use of multiple accounts) or **Service account key** (if you prefer\n   to have a service account interacting with spreadsheets).\n-  If you select **OAuth client ID**:\n\n   -  Select **Application type** item as **Desktop app** and give it a name.\n   -  Click on **Create** button.\n   -  Click on **Download JSON** icon on the right side of created\n      **OAuth client IDs** and store the downloaded file on your file system.\n-  If you select **Service account key**\n\n   -  Click on **Service account** dropdown and select **New service account**\n   -  Give it a **Service account name** and ignore the **Role** dropdown\n      (unless you know you need this for something else, it's not necessary for\n      working with spreadsheets)\n   -  Note the **Service account ID** as you might need to give that user\n      permission to interact with your spreadsheets\n   -  Leave **Key type** as **JSON**\n   -  Click **Create** and store the downloaded file on your file system.\n-  Please be aware, the file contains your private credentials, so take\n   care of the file in the same way you care of your private SSH key;\n   Move the downloaded JSON to ``~/.config/gspread_pandas/google_secret.json``\n   (or you can configure the directory and file name by directly calling\n   ``gspread_pandas.conf.get_config``\n\n\nThanks to similar project\n`df2gspread <https://github.com/maybelinot/df2gspread>`__ for this great\ndescription of how to get the client credentials.\n\nYou can read more about it in the `configuration docs\n<https://gspread-pandas.readthedocs.io/en/latest/configuration.html>`__\nincluding how to change the default behavior.\n\nExample\n=======\n\n.. code:: python\n\n    import pandas as pd\n    from gspread_pandas import Spread, Client\n\n    file_name = \"http://stats.idre.ucla.edu/stat/data/binary.csv\"\n    df = pd.read_csv(file_name)\n\n    # 'Example Spreadsheet' needs to already exist and your user must have access to it\n    spread = Spread('Example Spreadsheet')\n    # This will ask to authenticate if you haven't done so before\n\n    # Display available worksheets\n    spread.sheets\n\n    # Save DataFrame to worksheet 'New Test Sheet', create it first if it doesn't exist\n    spread.df_to_sheet(df, index=False, sheet='New Test Sheet', start='A2', replace=True)\n    spread.update_cells('A1', 'B1', ['Created by:', spread.email])\n    print(spread)\n    # <gspread_pandas.client.Spread - User: '<example_user>@gmail.com', Spread: 'Example Spreadsheet', Sheet: 'New Test Sheet'>\n\n    # You can now first instanciate a Client separately and query folders and\n    # instanciate other Spread objects by passing in the Client\n    client = Client()\n    # Assumming you have a dir called 'example dir' with sheets in it\n    available_sheets = client.find_spreadsheet_files_in_folders('example dir')\n    spreads = []\n    for sheet in available_sheets.get('example dir', []):\n        spreads.append(Spread(sheet['id'], client=client))\n\nTroubleshooting\n===============\n\nEOFError in Rodeo\n-----------------\n\nIf you're trying to use ``gspread_pandas`` from within\n`Rodeo <https://www.yhat.com/products/rodeo>`_ you might get an\n``EOFError: EOF when reading a line`` error when trying to pass in the verification\ncode. The workaround for this is to first verify your account in a regular shell.\nSince you're just doing this to get your Oauth token, the spreadsheet doesn't need\nto be valid. Just run this in shell:\n\n.. code:: python\n\n   python -c \"from gspread_pandas import Spread; Spread('<user_key>','')\"\n\nThen follow the instructions to create and store the OAuth creds.\n\n\nThis action would increase the number of cells in the workbook above the limit of 10000000 cells.\n-------------------------------------------------------------------------------------------------\n\nIMO, Google sheets is not the right tool for large datasets. However, there's probably good reaons\nyou might have to use it in such cases. When uploading a large DataFrame, you might run into this\nerror.\n\nBy default, ``Spread.df_to_sheet`` will add rows and/or columns needed to accomodate the DataFrame.\nSince a new sheet contains a fairly large number of columns, if you're uploading a DF with lots of\nrows you might exceed the max number of cells in a worksheet even if your data does not. In order\nto fix this you have 2 options:\n\n1. The easiest is to pass ``replace=True``, which will first resize the worksheet and clear out all values.\n2. Another option is to first resize to 1x1 using ``Spread.sheet.resize(1, 1)`` and then do ``df_to_sheet``\n\nThere's a strange caveat with resizing, so going to 1x1 first is recommended (``replace=True`` already does this). To read more see `this issue <https://issuetracker.google.com/issues/213126648>`_\n",
    "bugtrack_url": null,
    "license": "BSD",
    "summary": "A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.",
    "version": "3.3.0",
    "project_urls": {
        "Download": "https://github.com/aiguofer/gspread-pandas/tarball/v3.3.0",
        "Homepage": "https://github.com/aiguofer/gspread-pandas"
    },
    "split_keywords": [
        "gspread",
        "pandas",
        "google",
        "spreadsheets"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "55408839c83d13d31687e28d2eed21ae66f77889dcdbfcedc71ae6a0aa93863f",
                "md5": "55a4abb250d3123f4527b36f63b47aef",
                "sha256": "a4dfddd7b1c5418742e30099a01766ecd96d1f5bbcad8b1e1060c4a5f16fd627"
            },
            "downloads": -1,
            "filename": "gspread_pandas-3.3.0-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "55a4abb250d3123f4527b36f63b47aef",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 27301,
            "upload_time": "2024-02-13T17:02:52",
            "upload_time_iso_8601": "2024-02-13T17:02:52.997266Z",
            "url": "https://files.pythonhosted.org/packages/55/40/8839c83d13d31687e28d2eed21ae66f77889dcdbfcedc71ae6a0aa93863f/gspread_pandas-3.3.0-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5e5c851abfb9adf4e70b232f3b6e0237e416776aae9257b59eaa0f44c0960084",
                "md5": "4d1972299a19b787f698dc5bd9260038",
                "sha256": "aac84bd63594db6271ad2cfe10be64614ea5d1129d063ca57b40c2b9dcc18013"
            },
            "downloads": -1,
            "filename": "gspread-pandas-3.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "4d1972299a19b787f698dc5bd9260038",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 29525,
            "upload_time": "2024-02-13T17:02:55",
            "upload_time_iso_8601": "2024-02-13T17:02:55.886089Z",
            "url": "https://files.pythonhosted.org/packages/5e/5c/851abfb9adf4e70b232f3b6e0237e416776aae9257b59eaa0f44c0960084/gspread-pandas-3.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-13 17:02:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "aiguofer",
    "github_project": "gspread-pandas",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "gspread",
            "specs": [
                [
                    "<",
                    "6"
                ],
                [
                    ">=",
                    "5.0.0"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "0.20.0"
                ]
            ]
        },
        {
            "name": "decorator",
            "specs": []
        },
        {
            "name": "google-auth",
            "specs": []
        },
        {
            "name": "google-auth-oauthlib",
            "specs": []
        }
    ],
    "lcname": "gspread-pandas"
}
        
Elapsed time: 0.22426s