google-sheets-helper


Namegoogle-sheets-helper JSON
Version 1.0.1 PyPI version JSON
download
home_pagehttps://github.com/machado000/google-sheets-helper
SummaryHelper module to parse data from GSheets into database-optimized DataFrames
upload_time2025-08-06 18:24:08
maintainerNone
docs_urlNone
authorJoao Brito
requires_python<3.13,>=3.10
licenseMIT
keywords google-sheets pandas etl data-extraction
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # Google Sheets Helper

A Python ETL driver for reading and transforming Google Sheets and Excel data from Google Drive. Simplifies the process of extracting spreadsheet data and converting it to database-ready pandas DataFrames with comprehensive optimization features.

[![PyPI version](https://img.shields.io/pypi/v/google-sheets-helper)](https://pypi.org/project/google-sheets-helper/)
[![Issues](https://img.shields.io/github/issues/machado000/google-sheets-helper)](https://github.com/machado000/google-sheets-helper/issues)
[![Last Commit](https://img.shields.io/github/last-commit/machado000/google-sheets-helper)](https://github.com/machado000/google-sheets-helper/commits/main)
[![License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/machado000/google-sheets-helper/blob/main/LICENSE)

## Features

- **Google Sheets & Excel Support**: Read Google Sheets and Excel files directly from Google Drive
- **Database-Ready DataFrames**: Optimized data types and encoding for seamless database storage
- **Flexible Column Naming**: Choose between snake_case or camelCase column conventions
- **Smart Type Detection**: Dynamic conversion of metrics to appropriate int64/float64 types
- **Configurable Missing Values**: Granular control over NaN/NaT handling by column type
- **Character Encoding Cleanup**: Automatic text sanitization for database compatibility
- **Robust Error Handling**: Comprehensive error handling with specific exceptions
- **Type Hints**: Full type hint support for better IDE experience

## Installation

```bash
pip install google-sheets-helper
```

## Quick Start

### 1. Set up credentials

Place your Google service account credentials in `secrets/client_secret.json`.

### 2. Basic usage

```python
from google_sheets_helper import GoogleSheetsHelper, load_client_secret, setup_logging

setup_logging()
client_secret = load_client_secret()
gs_helper = GoogleSheetsHelper(client_secret)

spreadsheet_id = "your_spreadsheet_id"
worksheet_name = "your_worksheet_name"

df = gs_helper.load_sheet_as_dataframe(spreadsheet_id, worksheet_name)
utils = DataframeUtils()

df = utils.fix_data_types(df, skip_columns=None)
df = utils.handle_missing_values(df)
df = utils.clean_text_encoding(df)
df = utils.transform_column_names(df, naming_convention="snake_case")

print(df.head(), df.dtypes)

os.makedirs("data", exist_ok=True)
filename = os.path.join("data", f"{spreadsheet_id}_{worksheet_name}.csv")

df.to_csv(filename, index=False)
```

## Data Cleaning Pipeline

You can use the built-in DataFrame utilities for further cleaning:

```python
from google_sheets_helper import DataframeUtils

utils = DataframeUtils()
df = utils.fix_data_types(df)
df = utils.handle_missing_values(df)
df = utils.clean_text_encoding(df)
df = utils.transform_column_names(df, naming_convention="snake_case")
```

## API Reference

- `GoogleSheetsHelper`: Main class for reading and transforming Google Sheets/Excel data
- `load_client_secret`: Loads credentials from a JSON file
- `setup_logging`: Configures logging for the package
- `DataframeUtils`: Utility class for DataFrame cleaning and optimization
- Exception classes: `AuthenticationError`, `APIError`, `ConfigurationError`, `DataProcessingError`, `ValidationError`

## Error Handling

```python
from google_sheets_helper import (
    GoogleSheetsHelper,
    AuthenticationError,
    ValidationError,
    APIError,
    DataProcessingError,
    ConfigurationError
)

try:
    df = gs_helper.load_sheet_as_dataframe(spreadsheet_id, worksheet_name)
except AuthenticationError:
    # Handle credential issues
    pass
except ValidationError:
    # Handle input validation errors
    pass
except APIError:
    # Handle API errors
    pass
except DataProcessingError:
    # Handle data processing errors
    pass
```

## Examples

Check the `examples/` directory for comprehensive usage examples:

- `basic_usage.py` - Simple sheet extraction and cleaning

## Requirements

- Python 3.9-3.12
- pandas >= 2.0.0
- gspread >= 5.10.0
- google-api-python-client >= 2.0.0
- tqdm >= 4.65.0

## Development

For development installation:

```bash
git clone https://github.com/machado000/google-sheets-helper
cd google-sheets-helper
pip install -e ".[dev]"
```

## License

MIT License. See [LICENSE](LICENSE) file for details.

## Support

- [Documentation](https://github.com/machado000/google-sheets-helper#readme)
- [Issues](https://github.com/machado000/google-sheets-helper/issues)
- [Examples](examples/)

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/machado000/google-sheets-helper",
    "name": "google-sheets-helper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.10",
    "maintainer_email": null,
    "keywords": "google-sheets, pandas, etl, data-extraction",
    "author": "Joao Brito",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/0b/66/d937944aad10f6b4b2de7a0cbc7b14a65a17d8b2362702c2f2636fad0dd8/google_sheets_helper-1.0.1.tar.gz",
    "platform": null,
    "description": "# Google Sheets Helper\n\nA Python ETL driver for reading and transforming Google Sheets and Excel data from Google Drive. Simplifies the process of extracting spreadsheet data and converting it to database-ready pandas DataFrames with comprehensive optimization features.\n\n[![PyPI version](https://img.shields.io/pypi/v/google-sheets-helper)](https://pypi.org/project/google-sheets-helper/)\n[![Issues](https://img.shields.io/github/issues/machado000/google-sheets-helper)](https://github.com/machado000/google-sheets-helper/issues)\n[![Last Commit](https://img.shields.io/github/last-commit/machado000/google-sheets-helper)](https://github.com/machado000/google-sheets-helper/commits/main)\n[![License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/machado000/google-sheets-helper/blob/main/LICENSE)\n\n## Features\n\n- **Google Sheets & Excel Support**: Read Google Sheets and Excel files directly from Google Drive\n- **Database-Ready DataFrames**: Optimized data types and encoding for seamless database storage\n- **Flexible Column Naming**: Choose between snake_case or camelCase column conventions\n- **Smart Type Detection**: Dynamic conversion of metrics to appropriate int64/float64 types\n- **Configurable Missing Values**: Granular control over NaN/NaT handling by column type\n- **Character Encoding Cleanup**: Automatic text sanitization for database compatibility\n- **Robust Error Handling**: Comprehensive error handling with specific exceptions\n- **Type Hints**: Full type hint support for better IDE experience\n\n## Installation\n\n```bash\npip install google-sheets-helper\n```\n\n## Quick Start\n\n### 1. Set up credentials\n\nPlace your Google service account credentials in `secrets/client_secret.json`.\n\n### 2. Basic usage\n\n```python\nfrom google_sheets_helper import GoogleSheetsHelper, load_client_secret, setup_logging\n\nsetup_logging()\nclient_secret = load_client_secret()\ngs_helper = GoogleSheetsHelper(client_secret)\n\nspreadsheet_id = \"your_spreadsheet_id\"\nworksheet_name = \"your_worksheet_name\"\n\ndf = gs_helper.load_sheet_as_dataframe(spreadsheet_id, worksheet_name)\nutils = DataframeUtils()\n\ndf = utils.fix_data_types(df, skip_columns=None)\ndf = utils.handle_missing_values(df)\ndf = utils.clean_text_encoding(df)\ndf = utils.transform_column_names(df, naming_convention=\"snake_case\")\n\nprint(df.head(), df.dtypes)\n\nos.makedirs(\"data\", exist_ok=True)\nfilename = os.path.join(\"data\", f\"{spreadsheet_id}_{worksheet_name}.csv\")\n\ndf.to_csv(filename, index=False)\n```\n\n## Data Cleaning Pipeline\n\nYou can use the built-in DataFrame utilities for further cleaning:\n\n```python\nfrom google_sheets_helper import DataframeUtils\n\nutils = DataframeUtils()\ndf = utils.fix_data_types(df)\ndf = utils.handle_missing_values(df)\ndf = utils.clean_text_encoding(df)\ndf = utils.transform_column_names(df, naming_convention=\"snake_case\")\n```\n\n## API Reference\n\n- `GoogleSheetsHelper`: Main class for reading and transforming Google Sheets/Excel data\n- `load_client_secret`: Loads credentials from a JSON file\n- `setup_logging`: Configures logging for the package\n- `DataframeUtils`: Utility class for DataFrame cleaning and optimization\n- Exception classes: `AuthenticationError`, `APIError`, `ConfigurationError`, `DataProcessingError`, `ValidationError`\n\n## Error Handling\n\n```python\nfrom google_sheets_helper import (\n    GoogleSheetsHelper,\n    AuthenticationError,\n    ValidationError,\n    APIError,\n    DataProcessingError,\n    ConfigurationError\n)\n\ntry:\n    df = gs_helper.load_sheet_as_dataframe(spreadsheet_id, worksheet_name)\nexcept AuthenticationError:\n    # Handle credential issues\n    pass\nexcept ValidationError:\n    # Handle input validation errors\n    pass\nexcept APIError:\n    # Handle API errors\n    pass\nexcept DataProcessingError:\n    # Handle data processing errors\n    pass\n```\n\n## Examples\n\nCheck the `examples/` directory for comprehensive usage examples:\n\n- `basic_usage.py` - Simple sheet extraction and cleaning\n\n## Requirements\n\n- Python 3.9-3.12\n- pandas >= 2.0.0\n- gspread >= 5.10.0\n- google-api-python-client >= 2.0.0\n- tqdm >= 4.65.0\n\n## Development\n\nFor development installation:\n\n```bash\ngit clone https://github.com/machado000/google-sheets-helper\ncd google-sheets-helper\npip install -e \".[dev]\"\n```\n\n## License\n\nMIT License. See [LICENSE](LICENSE) file for details.\n\n## Support\n\n- [Documentation](https://github.com/machado000/google-sheets-helper#readme)\n- [Issues](https://github.com/machado000/google-sheets-helper/issues)\n- [Examples](examples/)\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Helper module to parse data from GSheets into database-optimized DataFrames",
    "version": "1.0.1",
    "project_urls": {
        "Documentation": "https://github.com/machado000/google-sheets-helper#readme",
        "Homepage": "https://github.com/machado000/google-sheets-helper",
        "Issues": "https://github.com/machado000/google-sheets-helper/issues"
    },
    "split_keywords": [
        "google-sheets",
        " pandas",
        " etl",
        " data-extraction"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8d5a7ca824a1925d257af1874dc1df39d5c7ba39f3231c080105a4495da49428",
                "md5": "6816c1fc77492a89f3f5780c6a7fd36e",
                "sha256": "b70e89d5f4c4bdc70622c422240e6b9e7e4db796f7b9daf1521ad21959f7e93e"
            },
            "downloads": -1,
            "filename": "google_sheets_helper-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6816c1fc77492a89f3f5780c6a7fd36e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.10",
            "size": 12453,
            "upload_time": "2025-08-06T18:24:07",
            "upload_time_iso_8601": "2025-08-06T18:24:07.181527Z",
            "url": "https://files.pythonhosted.org/packages/8d/5a/7ca824a1925d257af1874dc1df39d5c7ba39f3231c080105a4495da49428/google_sheets_helper-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0b66d937944aad10f6b4b2de7a0cbc7b14a65a17d8b2362702c2f2636fad0dd8",
                "md5": "409d29db8afbeb8b88445090aa31605c",
                "sha256": "b792607d29916ba5c71128aa91f9522e75b3433a05c7a0dd7451dfe61a010e37"
            },
            "downloads": -1,
            "filename": "google_sheets_helper-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "409d29db8afbeb8b88445090aa31605c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.13,>=3.10",
            "size": 12161,
            "upload_time": "2025-08-06T18:24:08",
            "upload_time_iso_8601": "2025-08-06T18:24:08.243660Z",
            "url": "https://files.pythonhosted.org/packages/0b/66/d937944aad10f6b4b2de7a0cbc7b14a65a17d8b2362702c2f2636fad0dd8/google_sheets_helper-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-06 18:24:08",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "machado000",
    "github_project": "google-sheets-helper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "google-sheets-helper"
}
        
Elapsed time: 0.60659s