# Google Sheets Helper
A Python ETL driver for reading and transforming Google Sheets and Excel data from Google Drive. Simplifies the process of extracting spreadsheet data and converting it to database-ready pandas DataFrames with comprehensive optimization features.
[](https://pypi.org/project/google-sheets-helper/)
[](https://github.com/machado000/google-sheets-helper/issues)
[](https://github.com/machado000/google-sheets-helper/commits/main)
[](https://github.com/machado000/google-sheets-helper/blob/main/LICENSE)
## Features
- **Google Sheets & Excel Support**: Read Google Sheets and Excel files directly from Google Drive
- **Database-Ready DataFrames**: Optimized data types and encoding for seamless database storage
- **Flexible Column Naming**: Choose between snake_case or camelCase column conventions
- **Smart Type Detection**: Dynamic conversion of metrics to appropriate int64/float64 types
- **Configurable Missing Values**: Granular control over NaN/NaT handling by column type
- **Character Encoding Cleanup**: Automatic text sanitization for database compatibility
- **Robust Error Handling**: Comprehensive error handling with specific exceptions
- **Type Hints**: Full type hint support for better IDE experience
## Installation
```bash
pip install google-sheets-helper
```
## Quick Start
### 1. Set up credentials
Place your Google service account credentials in `secrets/client_secret.json`.
### 2. Basic usage
```python
from google_sheets_helper import GoogleSheetsHelper, load_client_secret, setup_logging
setup_logging()
client_secret = load_client_secret()
gs_helper = GoogleSheetsHelper(client_secret)
spreadsheet_id = "your_spreadsheet_id"
worksheet_name = "your_worksheet_name"
df = gs_helper.load_sheet_as_dataframe(spreadsheet_id, worksheet_name)
utils = DataframeUtils()
df = utils.fix_data_types(df, skip_columns=None)
df = utils.handle_missing_values(df)
df = utils.clean_text_encoding(df)
df = utils.transform_column_names(df, naming_convention="snake_case")
print(df.head(), df.dtypes)
os.makedirs("data", exist_ok=True)
filename = os.path.join("data", f"{spreadsheet_id}_{worksheet_name}.csv")
df.to_csv(filename, index=False)
```
## Data Cleaning Pipeline
You can use the built-in DataFrame utilities for further cleaning:
```python
from google_sheets_helper import DataframeUtils
utils = DataframeUtils()
df = utils.fix_data_types(df)
df = utils.handle_missing_values(df)
df = utils.clean_text_encoding(df)
df = utils.transform_column_names(df, naming_convention="snake_case")
```
## API Reference
- `GoogleSheetsHelper`: Main class for reading and transforming Google Sheets/Excel data
- `load_client_secret`: Loads credentials from a JSON file
- `setup_logging`: Configures logging for the package
- `DataframeUtils`: Utility class for DataFrame cleaning and optimization
- Exception classes: `AuthenticationError`, `APIError`, `ConfigurationError`, `DataProcessingError`, `ValidationError`
## Error Handling
```python
from google_sheets_helper import (
GoogleSheetsHelper,
AuthenticationError,
ValidationError,
APIError,
DataProcessingError,
ConfigurationError
)
try:
df = gs_helper.load_sheet_as_dataframe(spreadsheet_id, worksheet_name)
except AuthenticationError:
# Handle credential issues
pass
except ValidationError:
# Handle input validation errors
pass
except APIError:
# Handle API errors
pass
except DataProcessingError:
# Handle data processing errors
pass
```
## Examples
Check the `examples/` directory for comprehensive usage examples:
- `basic_usage.py` - Simple sheet extraction and cleaning
## Requirements
- Python 3.9-3.12
- pandas >= 2.0.0
- gspread >= 5.10.0
- google-api-python-client >= 2.0.0
- tqdm >= 4.65.0
## Development
For development installation:
```bash
git clone https://github.com/machado000/google-sheets-helper
cd google-sheets-helper
pip install -e ".[dev]"
```
## License
MIT License. See [LICENSE](LICENSE) file for details.
## Support
- [Documentation](https://github.com/machado000/google-sheets-helper#readme)
- [Issues](https://github.com/machado000/google-sheets-helper/issues)
- [Examples](examples/)
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Raw data
{
"_id": null,
"home_page": "https://github.com/machado000/google-sheets-helper",
"name": "google-sheets-helper",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.13,>=3.10",
"maintainer_email": null,
"keywords": "google-sheets, pandas, etl, data-extraction",
"author": "Joao Brito",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/0b/66/d937944aad10f6b4b2de7a0cbc7b14a65a17d8b2362702c2f2636fad0dd8/google_sheets_helper-1.0.1.tar.gz",
"platform": null,
"description": "# Google Sheets Helper\n\nA Python ETL driver for reading and transforming Google Sheets and Excel data from Google Drive. Simplifies the process of extracting spreadsheet data and converting it to database-ready pandas DataFrames with comprehensive optimization features.\n\n[](https://pypi.org/project/google-sheets-helper/)\n[](https://github.com/machado000/google-sheets-helper/issues)\n[](https://github.com/machado000/google-sheets-helper/commits/main)\n[](https://github.com/machado000/google-sheets-helper/blob/main/LICENSE)\n\n## Features\n\n- **Google Sheets & Excel Support**: Read Google Sheets and Excel files directly from Google Drive\n- **Database-Ready DataFrames**: Optimized data types and encoding for seamless database storage\n- **Flexible Column Naming**: Choose between snake_case or camelCase column conventions\n- **Smart Type Detection**: Dynamic conversion of metrics to appropriate int64/float64 types\n- **Configurable Missing Values**: Granular control over NaN/NaT handling by column type\n- **Character Encoding Cleanup**: Automatic text sanitization for database compatibility\n- **Robust Error Handling**: Comprehensive error handling with specific exceptions\n- **Type Hints**: Full type hint support for better IDE experience\n\n## Installation\n\n```bash\npip install google-sheets-helper\n```\n\n## Quick Start\n\n### 1. Set up credentials\n\nPlace your Google service account credentials in `secrets/client_secret.json`.\n\n### 2. Basic usage\n\n```python\nfrom google_sheets_helper import GoogleSheetsHelper, load_client_secret, setup_logging\n\nsetup_logging()\nclient_secret = load_client_secret()\ngs_helper = GoogleSheetsHelper(client_secret)\n\nspreadsheet_id = \"your_spreadsheet_id\"\nworksheet_name = \"your_worksheet_name\"\n\ndf = gs_helper.load_sheet_as_dataframe(spreadsheet_id, worksheet_name)\nutils = DataframeUtils()\n\ndf = utils.fix_data_types(df, skip_columns=None)\ndf = utils.handle_missing_values(df)\ndf = utils.clean_text_encoding(df)\ndf = utils.transform_column_names(df, naming_convention=\"snake_case\")\n\nprint(df.head(), df.dtypes)\n\nos.makedirs(\"data\", exist_ok=True)\nfilename = os.path.join(\"data\", f\"{spreadsheet_id}_{worksheet_name}.csv\")\n\ndf.to_csv(filename, index=False)\n```\n\n## Data Cleaning Pipeline\n\nYou can use the built-in DataFrame utilities for further cleaning:\n\n```python\nfrom google_sheets_helper import DataframeUtils\n\nutils = DataframeUtils()\ndf = utils.fix_data_types(df)\ndf = utils.handle_missing_values(df)\ndf = utils.clean_text_encoding(df)\ndf = utils.transform_column_names(df, naming_convention=\"snake_case\")\n```\n\n## API Reference\n\n- `GoogleSheetsHelper`: Main class for reading and transforming Google Sheets/Excel data\n- `load_client_secret`: Loads credentials from a JSON file\n- `setup_logging`: Configures logging for the package\n- `DataframeUtils`: Utility class for DataFrame cleaning and optimization\n- Exception classes: `AuthenticationError`, `APIError`, `ConfigurationError`, `DataProcessingError`, `ValidationError`\n\n## Error Handling\n\n```python\nfrom google_sheets_helper import (\n GoogleSheetsHelper,\n AuthenticationError,\n ValidationError,\n APIError,\n DataProcessingError,\n ConfigurationError\n)\n\ntry:\n df = gs_helper.load_sheet_as_dataframe(spreadsheet_id, worksheet_name)\nexcept AuthenticationError:\n # Handle credential issues\n pass\nexcept ValidationError:\n # Handle input validation errors\n pass\nexcept APIError:\n # Handle API errors\n pass\nexcept DataProcessingError:\n # Handle data processing errors\n pass\n```\n\n## Examples\n\nCheck the `examples/` directory for comprehensive usage examples:\n\n- `basic_usage.py` - Simple sheet extraction and cleaning\n\n## Requirements\n\n- Python 3.9-3.12\n- pandas >= 2.0.0\n- gspread >= 5.10.0\n- google-api-python-client >= 2.0.0\n- tqdm >= 4.65.0\n\n## Development\n\nFor development installation:\n\n```bash\ngit clone https://github.com/machado000/google-sheets-helper\ncd google-sheets-helper\npip install -e \".[dev]\"\n```\n\n## License\n\nMIT License. See [LICENSE](LICENSE) file for details.\n\n## Support\n\n- [Documentation](https://github.com/machado000/google-sheets-helper#readme)\n- [Issues](https://github.com/machado000/google-sheets-helper/issues)\n- [Examples](examples/)\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Helper module to parse data from GSheets into database-optimized DataFrames",
"version": "1.0.1",
"project_urls": {
"Documentation": "https://github.com/machado000/google-sheets-helper#readme",
"Homepage": "https://github.com/machado000/google-sheets-helper",
"Issues": "https://github.com/machado000/google-sheets-helper/issues"
},
"split_keywords": [
"google-sheets",
" pandas",
" etl",
" data-extraction"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8d5a7ca824a1925d257af1874dc1df39d5c7ba39f3231c080105a4495da49428",
"md5": "6816c1fc77492a89f3f5780c6a7fd36e",
"sha256": "b70e89d5f4c4bdc70622c422240e6b9e7e4db796f7b9daf1521ad21959f7e93e"
},
"downloads": -1,
"filename": "google_sheets_helper-1.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6816c1fc77492a89f3f5780c6a7fd36e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.13,>=3.10",
"size": 12453,
"upload_time": "2025-08-06T18:24:07",
"upload_time_iso_8601": "2025-08-06T18:24:07.181527Z",
"url": "https://files.pythonhosted.org/packages/8d/5a/7ca824a1925d257af1874dc1df39d5c7ba39f3231c080105a4495da49428/google_sheets_helper-1.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "0b66d937944aad10f6b4b2de7a0cbc7b14a65a17d8b2362702c2f2636fad0dd8",
"md5": "409d29db8afbeb8b88445090aa31605c",
"sha256": "b792607d29916ba5c71128aa91f9522e75b3433a05c7a0dd7451dfe61a010e37"
},
"downloads": -1,
"filename": "google_sheets_helper-1.0.1.tar.gz",
"has_sig": false,
"md5_digest": "409d29db8afbeb8b88445090aa31605c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.10",
"size": 12161,
"upload_time": "2025-08-06T18:24:08",
"upload_time_iso_8601": "2025-08-06T18:24:08.243660Z",
"url": "https://files.pythonhosted.org/packages/0b/66/d937944aad10f6b4b2de7a0cbc7b14a65a17d8b2362702c2f2636fad0dd8/google_sheets_helper-1.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-06 18:24:08",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "machado000",
"github_project": "google-sheets-helper",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "google-sheets-helper"
}