# df-types
[PyPI package](https://badge.fury.io/py/df-types)
[License: MIT](https://opensource.org/licenses/MIT)
A Python tool for automatically generating dataclass definitions from pandas DataFrames.
## Getting Started
```bash
pip install df-types
```
```python
from df_types import DFTypes
import pandas as pd
import random

# Load your data
df = pd.DataFrame({
    "id": list(range(1, 301)),
    "name": ["Alice", "Bob", "Charlie"] * 100,
    "age": [random.randint(18, 100) for _ in range(300)],
    # "prefers-pizza" is not a valid Python identifier, so it will be normalized
    "prefers-pizza": [random.choice([True, False]) for _ in range(300)],
})

# Generate type definitions
dft = DFTypes(df)
dft.write_types()  # Creates typed_df.py

# Creates the following dataclass:
#
# @dataclass(slots=True)
# class TypedRowData:
#     id: int
#     name: Literal['Alice', 'Bob', 'Charlie']
#     age: int
#     prefers_pizza: bool

# Use the generated types
from typed_df import convert, iter_dataclasses

df_typed = convert(df)  # Converts NaNs to None, normalizes column names to Python identifiers
for row_data in iter_dataclasses(df_typed):
    # Each row_data is now a typed dataclass
    print(f"ID: {row_data.id}, Name: {row_data.name}, Age: {row_data.age}, Prefers Pizza: {row_data.prefers_pizza}")
```
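The column-name normalization mentioned above (`prefers-pizza` becoming `prefers_pizza`) can be pictured with a small standalone sketch. The helper `normalize_column_name` below is hypothetical and is not part of `df-types`; the exact rules the library applies may differ, so treat this only as an illustration of the idea.

```python
import keyword
import re


def normalize_column_name(name: str) -> str:
    """Hypothetical helper: turn an arbitrary column label into a valid identifier."""
    # Replace every character that is not a letter, digit, or underscore.
    identifier = re.sub(r"\W", "_", name)
    # Identifiers cannot be empty, start with a digit, or be reserved words.
    if not identifier or identifier[0].isdigit() or keyword.iskeyword(identifier):
        identifier = f"_{identifier}"
    return identifier


print(normalize_column_name("prefers-pizza"))  # prefers_pizza
print(normalize_column_name("2024 total"))     # _2024_total
print(normalize_column_name("class"))          # _class
```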
## Features
### Supported Types
| Feature | Example | Description |
| ------------------- | ----------------------------- | -------------------------------------- |
| **Literal Types** | `Literal["A", "B", "C"]` | For categorical data with known values |
| **Union Types** | `int \| float` | For columns with mixed numeric types |
| **Optional Types** | `str \| None` | For columns with missing values |
| **Custom Types** | `pd.Timestamp`, `Decimal` | Import and use external types |
| **Primitive Types** | `int`, `str`, `bool`, `float` | Standard Python types |
### Configuration
```python
from df_types import DFTypes
from df_types.config import DFTypesConfig

# Basic options
config = DFTypesConfig(
    filename="my_types.py",
    class_name="MyRow",  # Default is "TypedRowData"
    max_literal_values=10,  # Increase to infer Literal types for columns with more distinct values
)

# df is the DataFrame from Getting Started
dft = DFTypes(df, config=config)
dft.write_types()  # Creates my_types.py

# Use the generated types
from my_types import convert, iter_dataclasses, MyRow
```
## Considerations
If a column's type cannot be imported from within the generated file, the field is given the type hint `object` and a warning is printed. This most often happens when the type is defined in the calling module (e.g., a `main.py` that imports `df_types` and defines a `CustomType` stored in the DataFrame). Moving the type definition to a separate module avoids the warning.
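The following sketch shows, in plain Python, why some types cannot be referenced from another file. The function `can_be_imported` is a hypothetical helper written for this explanation; it is an assumption about the kind of check involved, not the check `df-types` actually performs.

```python
import importlib
from decimal import Decimal


def can_be_imported(tp: type) -> bool:
    """Hypothetical check: can `tp` be reached via `from <module> import <name>`?"""
    if tp.__module__ == "__main__":
        # Defined in the script being run; another file cannot import it by that name.
        return False
    try:
        obj = importlib.import_module(tp.__module__)
        # Walk the qualified name; classes defined inside functions contain "<locals>".
        for part in tp.__qualname__.split("."):
            obj = getattr(obj, part)
    except (ImportError, AttributeError, ValueError):
        return False
    return obj is tp


def make_local_type() -> type:
    class LocalType:  # defined inside a function, so not importable by name
        pass
    return LocalType


print(can_be_imported(Decimal))            # True
print(can_be_imported(make_local_type()))  # False
```

A class defined at the top level of its own module (say, a hypothetical `custom_types.py`) passes this kind of check, which is why moving the definition out of the calling script helps.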
Types are inferred from a sample of rows. If a DataFrame has many rows and a heavily skewed distribution of values, the inferred literals may not include every value that occurs. Increase `sample_middle_rows`, `sample_head_rows`, or `sample_tail_rows` to sample more rows.
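The effect can be reproduced with a few lines of standard-library Python. The sampler below is a hypothetical stand-in that takes the first rows, a contiguous block from the middle, and the last rows; how `df-types` actually selects its sample is not described here, so the window layout is an assumption.

```python
def sample_head_middle_tail(values: list, head: int, middle: int, tail: int) -> list:
    """Hypothetical sampler: first rows, a contiguous middle block, and last rows."""
    if len(values) <= head + middle + tail:
        return list(values)
    start = (len(values) - middle) // 2
    return values[:head] + values[start:start + middle] + values[len(values) - tail:]


# 10,000 rows where one rare category appears exactly once, outside all three windows.
column = ["common"] * 10_000
column[2_500] = "rare"

small = set(sample_head_middle_tail(column, head=100, middle=100, tail=100))
large = set(sample_head_middle_tail(column, head=3_000, middle=100, tail=100))

print(sorted(small))  # ['common']
print(sorted(large))  # ['common', 'rare']
```

With the small windows, a `Literal` inferred from the sample would omit `'rare'`; widening a window until it covers the rare row recovers it.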
## Future Features
- [ ] Support for typed containers (e.g., `List[int]`, `Dict[str, int]`)
- [ ] Support for nested dataclasses
- [ ] More advanced configuration options
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Raw data
{
"_id": null,
"home_page": "https://github.com/jon-edward/df-types",
"name": "df-types",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "python, pandas, dataclass, dataframe",
"author": "jon-edward",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/0b/d0/cb7490a7eb29539a3fce9574177e90ba41b706ceb04cc112a99853dfa300/df-types-0.0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A tool for generating dataclass type files for pandas DataFrame rows",
"version": "0.0.2",
"project_urls": {
"Homepage": "https://github.com/jon-edward/df-types"
},
"split_keywords": [
"python",
" pandas",
" dataclass",
" dataframe"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "bc1f8f8106e210548f71d43302ff5c3bd8676c8c444179a803c9339b4d359ea4",
"md5": "8085299189a819b3c0868a411e610a4c",
"sha256": "5f972f64d1422e0085be6146e086c395f40cc25efa953e2a6b9b7fa1bee9f3e7"
},
"downloads": -1,
"filename": "df_types-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8085299189a819b3c0868a411e610a4c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 8471,
"upload_time": "2025-07-27T20:38:57",
"upload_time_iso_8601": "2025-07-27T20:38:57.465247Z",
"url": "https://files.pythonhosted.org/packages/bc/1f/8f8106e210548f71d43302ff5c3bd8676c8c444179a803c9339b4d359ea4/df_types-0.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "0bd0cb7490a7eb29539a3fce9574177e90ba41b706ceb04cc112a99853dfa300",
"md5": "5ca450b6e1c90acf76adae4a444899f7",
"sha256": "e12a5ed0f2a99a995067b98ea6dfeecafe4369f6812e3c492619f6f3009077b1"
},
"downloads": -1,
"filename": "df-types-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "5ca450b6e1c90acf76adae4a444899f7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 7319,
"upload_time": "2025-07-27T20:38:58",
"upload_time_iso_8601": "2025-07-27T20:38:58.520395Z",
"url": "https://files.pythonhosted.org/packages/0b/d0/cb7490a7eb29539a3fce9574177e90ba41b706ceb04cc112a99853dfa300/df-types-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-27 20:38:58",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jon-edward",
"github_project": "df-types",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "pandas",
"specs": [
[
">=",
"2.3.1"
]
]
}
],
"lcname": "df-types"
}