# Google Sheets Helper
A Python ETL driver for reading and transforming Google Sheets and Excel data from Google Drive. Simplifies the process of extracting spreadsheet data and converting it to database-ready pandas DataFrames with comprehensive optimization features.
[](https://pypi.org/project/google-sheets-helper/)
[](https://github.com/machado000/google-sheets-helper/issues)
[](https://github.com/machado000/google-sheets-helper/commits/main)
[](https://github.com/machado000/google-sheets-helper/blob/main/LICENSE)
## Features
- **Google Sheets & Excel Support**: Read Google Sheets and Excel files directly from Google Drive
- **Flexible Column Naming**: Choose between snake_case or camelCase column conventions, with robust ASCII normalization and automatic removal of unnamed columns (e.g., from Excel/CSV exports)
- **Progress Bar for Large Downloads**: Uses `tqdm` to show download progress for large Excel files
- **Advanced DataFrame type detection and cleaning**: Now available via the standalone package [`pandas-type-detector`](https://pypi.org/project/pandas-type-detector/)
- **Robust Error Handling**: Comprehensive error handling with specific exceptions
- **Progress Bar for Large Downloads**: Uses `tqdm` to show download progress for large Excel files
- **Type Hints**: Full type hint support for better IDE experience
## Installation
```bash
pip install google-sheets-helper
```
## Quick Start
### 1. Set up credentials
Place your Google service account credentials in `secrets/client_secret.json`.
### 2. Basic usage
setup_logging()
spreadsheet_id = "your_spreadsheet_id"
worksheet_name = "your_worksheet_name"
df = utils.fix_data_types(df, skip_columns=None)
df = utils.transform_column_names(df, naming_convention="snake_case")
```python
from google_sheets_helper import GoogleSheetsHelper, load_client_secret, setup_logging
from google_sheets_helper import DataframeUtils
setup_logging()
client_secret = load_client_secret()
gs_helper = GoogleSheetsHelper(client_secret)
spreadsheet_id = "your_spreadsheet_id"
worksheet_name = "your_worksheet_name"
df = gs_helper.load_sheet_as_dataframe(spreadsheet_id, worksheet_name)
utils = DataframeUtils()
df = utils.handle_missing_values(df)
df = utils.clean_text_encoding(df)
df = utils.transform_column_names(df, naming_convention="snake_case")
print(df.head(), df.dtypes)
print(df.head(), df.dtypes)
filename = os.path.join("data", f"{spreadsheet_id}_{worksheet_name}.csv")
```
os.makedirs("data", exist_ok=True)
filename = os.path.join("data", f"{spreadsheet_id}_{worksheet_name}.csv")
df.to_csv(filename, index=False)
```
## Data Cleaning Pipeline
You can use advanced DataFrame type detection and cleaning via the [`pandas-type-detector`](https://pypi.org/project/pandas-type-detector/) package:
```python
from google_sheets_helper import DataframeUtils
utils = DataframeUtils()
df = utils.handle_missing_values(df)
df = utils.clean_text_encoding(df)
df = utils.transform_column_names(df, naming_convention="snake_case")
df = utils.remove_unnamed_columns(df)
```
## API Reference
- `GoogleSheetsHelper`: Main class for reading and transforming Google Sheets/Excel data
- `get_drive_file_metadata`: Retrieve file name and MIME type from Google Drive
- `list_files_in_folder`: List files in a Google Drive folder
- `load_client_secret`: Loads credentials from a JSON file
- `setup_logging`: Configures logging for the package
- For advanced DataFrame cleaning and optimization, use [`pandas-type-detector`](https://pypi.org/project/pandas-type-detector/) and its `DataframeUtils` class
- Exception classes: `AuthenticationError`, `APIError`, `ConfigurationError`, `DataProcessingError`, `ValidationError`
## Error Handling
```python
from google_sheets_helper import (
GoogleSheetsHelper,
AuthenticationError,
ValidationError,
APIError,
DataProcessingError,
ConfigurationError
)
try:
df = gs_helper.load_sheet_as_dataframe(spreadsheet_id, worksheet_name)
except AuthenticationError:
# Handle credential issues
pass
except ValidationError:
# Handle input validation errors
pass
except APIError:
# Handle API errors
pass
except DataProcessingError:
# Handle data processing errors
pass
```
## Examples
Check the `examples/` directory for comprehensive usage examples:
- `basic_usage.py` - Simple sheet extraction and cleaning
## Requirements
- Python 3.9-3.12
- pandas >= 2.0.0
- gspread >= 5.10.0
- google-api-python-client >= 2.0.0
- tqdm >= 4.65.0
## Development
For development installation:
```bash
git clone https://github.com/machado000/google-sheets-helper
cd google-sheets-helper
pip install -e ".[dev]"
```
## License
MIT License. See [LICENSE](LICENSE) file for details.
## Support
- [Documentation](https://github.com/machado000/google-sheets-helper#readme)
- [Issues](https://github.com/machado000/google-sheets-helper/issues)
- [Examples](examples/)
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Raw data
{
"_id": null,
"home_page": "https://github.com/machado000/google-sheets-helper",
"name": "google-sheets-helper",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.13,>=3.10",
"maintainer_email": null,
"keywords": "google-sheets, pandas, etl, data-extraction",
"author": "Joao Brito",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/39/5f/b9174521e7e9b96d13462ab4b067fb0ffb6df2612dffce50fa8b03b9eabf/google_sheets_helper-1.1.2.tar.gz",
"platform": null,
"description": "# Google Sheets Helper\n\nA Python ETL driver for reading and transforming Google Sheets and Excel data from Google Drive. Simplifies the process of extracting spreadsheet data and converting it to database-ready pandas DataFrames with comprehensive optimization features.\n\n[](https://pypi.org/project/google-sheets-helper/)\n[](https://github.com/machado000/google-sheets-helper/issues)\n[](https://github.com/machado000/google-sheets-helper/commits/main)\n[](https://github.com/machado000/google-sheets-helper/blob/main/LICENSE)\n\n## Features\n\n- **Google Sheets & Excel Support**: Read Google Sheets and Excel files directly from Google Drive\n- **Flexible Column Naming**: Choose between snake_case or camelCase column conventions, with robust ASCII normalization and automatic removal of unnamed columns (e.g., from Excel/CSV exports)\n- **Progress Bar for Large Downloads**: Uses `tqdm` to show download progress for large Excel files\n- **Advanced DataFrame type detection and cleaning**: Now available via the standalone package [`pandas-type-detector`](https://pypi.org/project/pandas-type-detector/)\n- **Robust Error Handling**: Comprehensive error handling with specific exceptions\n- **Progress Bar for Large Downloads**: Uses `tqdm` to show download progress for large Excel files\n- **Type Hints**: Full type hint support for better IDE experience\n\n## Installation\n\n```bash\npip install google-sheets-helper\n```\n\n## Quick Start\n\n### 1. Set up credentials\n\nPlace your Google service account credentials in `secrets/client_secret.json`.\n\n### 2. Basic usage\nsetup_logging()\n\nspreadsheet_id = \"your_spreadsheet_id\"\nworksheet_name = \"your_worksheet_name\"\n\ndf = utils.fix_data_types(df, skip_columns=None)\ndf = utils.transform_column_names(df, naming_convention=\"snake_case\")\n```python\nfrom google_sheets_helper import GoogleSheetsHelper, load_client_secret, setup_logging\nfrom google_sheets_helper import DataframeUtils\n\nsetup_logging()\nclient_secret = load_client_secret()\ngs_helper = GoogleSheetsHelper(client_secret)\n\nspreadsheet_id = \"your_spreadsheet_id\"\nworksheet_name = \"your_worksheet_name\"\n\ndf = gs_helper.load_sheet_as_dataframe(spreadsheet_id, worksheet_name)\nutils = DataframeUtils()\n\ndf = utils.handle_missing_values(df)\ndf = utils.clean_text_encoding(df)\ndf = utils.transform_column_names(df, naming_convention=\"snake_case\")\n\nprint(df.head(), df.dtypes)\n\nprint(df.head(), df.dtypes)\nfilename = os.path.join(\"data\", f\"{spreadsheet_id}_{worksheet_name}.csv\")\n\n\n```\nos.makedirs(\"data\", exist_ok=True)\nfilename = os.path.join(\"data\", f\"{spreadsheet_id}_{worksheet_name}.csv\")\n\ndf.to_csv(filename, index=False)\n```\n\n## Data Cleaning Pipeline\n\nYou can use advanced DataFrame type detection and cleaning via the [`pandas-type-detector`](https://pypi.org/project/pandas-type-detector/) package:\n\n```python\nfrom google_sheets_helper import DataframeUtils\n\nutils = DataframeUtils()\ndf = utils.handle_missing_values(df)\ndf = utils.clean_text_encoding(df)\ndf = utils.transform_column_names(df, naming_convention=\"snake_case\")\ndf = utils.remove_unnamed_columns(df)\n```\n\n## API Reference\n\n- `GoogleSheetsHelper`: Main class for reading and transforming Google Sheets/Excel data\n- `get_drive_file_metadata`: Retrieve file name and MIME type from Google Drive\n- `list_files_in_folder`: List files in a Google Drive folder\n- `load_client_secret`: Loads credentials from a JSON file\n- `setup_logging`: Configures logging for the package\n- For advanced DataFrame cleaning and optimization, use [`pandas-type-detector`](https://pypi.org/project/pandas-type-detector/) and its `DataframeUtils` class\n- Exception classes: `AuthenticationError`, `APIError`, `ConfigurationError`, `DataProcessingError`, `ValidationError`\n\n## Error Handling\n\n```python\nfrom google_sheets_helper import (\n GoogleSheetsHelper,\n AuthenticationError,\n ValidationError,\n APIError,\n DataProcessingError,\n ConfigurationError\n)\n\ntry:\n df = gs_helper.load_sheet_as_dataframe(spreadsheet_id, worksheet_name)\nexcept AuthenticationError:\n # Handle credential issues\n pass\nexcept ValidationError:\n # Handle input validation errors\n pass\nexcept APIError:\n # Handle API errors\n pass\nexcept DataProcessingError:\n # Handle data processing errors\n pass\n```\n\n## Examples\n\nCheck the `examples/` directory for comprehensive usage examples:\n\n- `basic_usage.py` - Simple sheet extraction and cleaning\n\n## Requirements\n\n- Python 3.9-3.12\n- pandas >= 2.0.0\n- gspread >= 5.10.0\n- google-api-python-client >= 2.0.0\n- tqdm >= 4.65.0\n\n## Development\n\nFor development installation:\n\n```bash\ngit clone https://github.com/machado000/google-sheets-helper\ncd google-sheets-helper\npip install -e \".[dev]\"\n```\n\n## License\n\nMIT License. See [LICENSE](LICENSE) file for details.\n\n## Support\n\n- [Documentation](https://github.com/machado000/google-sheets-helper#readme)\n- [Issues](https://github.com/machado000/google-sheets-helper/issues)\n- [Examples](examples/)\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Helper module to parse data from GSheets into database-optimized DataFrames",
"version": "1.1.2",
"project_urls": {
"Documentation": "https://github.com/machado000/google-sheets-helper#readme",
"Homepage": "https://github.com/machado000/google-sheets-helper",
"Issues": "https://github.com/machado000/google-sheets-helper/issues"
},
"split_keywords": [
"google-sheets",
" pandas",
" etl",
" data-extraction"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "45c0a9b5f0713fd9f64edf6987dcc8c201184afaaa6f9a27ac87f199c278ad4b",
"md5": "8b063160d97b621306add09c8f6c32cf",
"sha256": "b07751a7a49db9ddc793e4e4b07acc573a5cb76380f481595860925beff10dfc"
},
"downloads": -1,
"filename": "google_sheets_helper-1.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8b063160d97b621306add09c8f6c32cf",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.13,>=3.10",
"size": 10754,
"upload_time": "2025-08-20T04:13:19",
"upload_time_iso_8601": "2025-08-20T04:13:19.011784Z",
"url": "https://files.pythonhosted.org/packages/45/c0/a9b5f0713fd9f64edf6987dcc8c201184afaaa6f9a27ac87f199c278ad4b/google_sheets_helper-1.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "395fb9174521e7e9b96d13462ab4b067fb0ffb6df2612dffce50fa8b03b9eabf",
"md5": "9052bb74786e4bb81a64061c4ddbfb38",
"sha256": "a6538c69b46fd131a1758358bd056e5b56a059b3999d0735678b94ca8474eb13"
},
"downloads": -1,
"filename": "google_sheets_helper-1.1.2.tar.gz",
"has_sig": false,
"md5_digest": "9052bb74786e4bb81a64061c4ddbfb38",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.10",
"size": 10174,
"upload_time": "2025-08-20T04:13:20",
"upload_time_iso_8601": "2025-08-20T04:13:20.411502Z",
"url": "https://files.pythonhosted.org/packages/39/5f/b9174521e7e9b96d13462ab4b067fb0ffb6df2612dffce50fa8b03b9eabf/google_sheets_helper-1.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-20 04:13:20",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "machado000",
"github_project": "google-sheets-helper",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "google-sheets-helper"
}