- Name: df-types
- Version: 0.0.2
- Home page: https://github.com/jon-edward/df-types
- Summary: A tool for generating dataclass type files for pandas DataFrame rows
- Upload time: 2025-07-27 20:38:58
- Author: jon-edward
- License: MIT
- Keywords: python, pandas, dataclass, dataframe
- Requirements: pandas

# df-types

[![PyPI version](https://badge.fury.io/py/df-types.svg)](https://badge.fury.io/py/df-types)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A Python tool for automatically generating dataclass definitions from pandas DataFrames.

## Getting Started

```bash
pip install df-types
```

```python
from df_types import DFTypes
import pandas as pd
import random

# Load your data
df = pd.DataFrame({
    "id": list(range(1, 301)),
    "name": ["Alice", "Bob", "Charlie"] * 100,
    "age": [random.randint(18, 100) for _ in range(300)],
    "prefers-pizza": [random.choice([True, False]) for _ in range(300)]  # Not a valid Python identifier, will be normalized
})

# Generate type definitions
dft = DFTypes(df)
dft.write_types()  # Creates typed_df.py

# Creates the following dataclass:
#
# @dataclass(slots=True)
# class TypedRowData:
#     id: int
#     name: Literal['Alice', 'Bob', 'Charlie']
#     age: int
#     prefers_pizza: bool

# Use the generated types
from typed_df import convert, iter_dataclasses

df_typed = convert(df)  # Converts NaNs to None, normalizes column names to Python identifiers
for row_data in iter_dataclasses(df_typed):
    # Each row_data is now a typed dataclass
    print(f"ID: {row_data.id}, Name: {row_data.name}, Age: {row_data.age}, Prefers Pizza: {row_data.prefers_pizza}")
```
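The example above relies on `prefers-pizza` being normalized to `prefers_pizza`. The following is a minimal sketch of how such a normalization could work, using only the standard library. `normalize_column_name` is a hypothetical helper written for illustration; it is not part of df-types, and the rules df-types actually applies may differ.

```python
import keyword
import re


def normalize_column_name(name: str) -> str:
    """Turn a column label into a Python identifier (illustrative, covers common ASCII cases)."""
    # Replace each run of non-word characters (spaces, hyphens, ...) with one underscore
    candidate = re.sub(r"\W+", "_", str(name)).strip("_")
    # An identifier cannot be empty or start with a digit
    if not candidate or candidate[0].isdigit():
        candidate = f"_{candidate}"
    # Avoid clashing with reserved words such as "class"
    if keyword.iskeyword(candidate):
        candidate += "_"
    return candidate


print(normalize_column_name("prefers-pizza"))  # prefers_pizza
print(normalize_column_name("2024 sales"))     # _2024_sales
print(normalize_column_name("class"))          # class_
```

Two different labels can normalize to the same identifier under this scheme (for example `a-b` and `a b`), so a real implementation would also need a rule for resolving collisions.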

## Features

### Supported Types

| Feature             | Example                       | Description                            |
| ------------------- | ----------------------------- | -------------------------------------- |
| **Literal Types**   | `Literal["A", "B", "C"]`      | For categorical data with known values |
| **Union Types**     | `int \| float`                | For columns with mixed numeric types   |
| **Optional Types**  | `str \| None`                 | For columns with missing values        |
| **Custom Types**    | `pd.Timestamp`, `Decimal`     | Import and use external types          |
| **Primitive Types** | `int`, `str`, `bool`, `float` | Standard Python types                  |
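To make the table concrete, here is a rough sketch of how a type hint could be derived from one column's values. It works on plain Python lists rather than a DataFrame, treats `None` as a missing value, and only builds `Literal` hints for string columns. `infer_annotation` is a hypothetical function, not the algorithm df-types uses; the `max_literal_values` parameter borrows its name from the configuration option shown in the next section.

```python
from typing import Any, Iterable


def infer_annotation(values: Iterable[Any], max_literal_values: int = 10) -> str:
    """Build a type-hint string for one column's values (illustrative only)."""
    values = list(values)
    present = [v for v in values if v is not None]
    if not present:
        return "None"

    # Type names in first-seen order, without duplicates
    type_names = list(dict.fromkeys(type(v).__name__ for v in present))

    # Few distinct strings: treat the column as categorical
    distinct = sorted(set(present)) if type_names == ["str"] else []
    if distinct and len(distinct) <= max_literal_values:
        hint = "Literal[" + ", ".join(repr(v) for v in distinct) + "]"
    else:
        hint = " | ".join(type_names)

    # Missing values make the hint optional
    if len(present) < len(values):
        hint += " | None"
    return hint


print(infer_annotation(["Alice", "Bob", "Charlie"] * 2))  # Literal['Alice', 'Bob', 'Charlie']
print(infer_annotation([1, 2.5, 3]))                      # int | float
print(infer_annotation(["x", "y", None], max_literal_values=1))  # str | None
```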

### Configuration

```python
from df_types import DFTypes
from df_types.config import DFTypesConfig

# `df` is the DataFrame built in the Getting Started example

# Basic options
config = DFTypesConfig(
    filename="my_types.py",
    class_name="MyRow",  # Default is "TypedRowData"
    max_literal_values=10  # Increase if you have more categories you want to infer as Literal types
)

dft = DFTypes(df, config=config)
dft.write_types()

from my_types import convert, iter_dataclasses, MyRow

# Use the generated types
```

## Considerations

If a column's type cannot be imported by the generated file, it is annotated as `object` and a warning is printed. This most often happens when the type is defined in the calling module (e.g., `main.py` imports `df_types` and also defines a `CustomType` that appears in the DataFrame). Moving the type definition to a separate module avoids the warning.
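The sketch below illustrates why a type defined in the calling script is hard to import from another file: its `__module__` is `"__main__"`, which does not name an importable module. A type defined inside a function has a similar problem because its qualified name contains `<locals>`. `import_target` is a hypothetical helper for illustration only, and the check df-types performs may be different.

```python
from decimal import Decimal
from typing import Optional, Tuple


def import_target(tp: type) -> Optional[Tuple[str, str]]:
    """Return (module, name) if `tp` looks importable from another file, else None."""
    module, qualname = tp.__module__, tp.__qualname__
    # Types defined in the script being run live in "__main__"
    if module == "__main__":
        return None
    # Types defined inside a function cannot be reached by a plain import
    if "<locals>" in qualname:
        return None
    return module, qualname


def make_local_type() -> type:
    class CustomType:
        pass

    return CustomType


print(import_target(Decimal))            # ('decimal', 'Decimal')
print(import_target(make_local_type()))  # None
```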

Types are inferred from a sample of rows. If the DataFrame has many rows and a column's values are unevenly distributed, the inferred `Literal` may omit values that occur only rarely. Increase `sample_middle_rows`, `sample_head_rows`, or `sample_tail_rows` to sample more rows.
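The effect can be reproduced with a small sketch. It assumes a strategy that takes rows from the start, centre, and end of the data, which is a guess based on the option names above; `sample_rows` and its `head`, `middle`, and `tail` arguments are hypothetical stand-ins, not the df-types implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_rows(rows: Sequence[T], head: int = 3, middle: int = 3, tail: int = 3) -> List[T]:
    """Pick rows from the start, centre, and end of a sequence (assumed strategy)."""
    n = len(rows)
    if n <= head + middle + tail:
        return list(rows)
    mid_start = (n - middle) // 2
    # A set of indices avoids returning the same row twice if the windows overlap
    indices = sorted(
        set(range(head))
        | set(range(mid_start, mid_start + middle))
        | set(range(n - tail, n))
    )
    return [rows[i] for i in indices]


# 1000 rows where a single rare value sits outside the sampled windows
status = ["ok"] * 1000
status[200] = "failed"

print(set(sample_rows(status)))            # {'ok'}: the rare value is missed
print(sorted(set(sample_rows(status, head=300))))  # ['failed', 'ok']
```

With the small windows, a `Literal` inferred from the sample would contain only `'ok'`; widening the head window far enough to reach row 200 recovers `'failed'`.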

## Future Features

- [ ] Support for typed containers (e.g., `List[int]`, `Dict[str, int]`)
- [ ] Support for nested dataclasses
- [ ] More advanced configuration options

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/jon-edward/df-types",
    "name": "df-types",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "python, pandas, dataclass, dataframe",
    "author": "jon-edward",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/0b/d0/cb7490a7eb29539a3fce9574177e90ba41b706ceb04cc112a99853dfa300/df-types-0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A tool for generating dataclass type files for pandas DataFrame rows",
    "version": "0.0.2",
    "project_urls": {
        "Homepage": "https://github.com/jon-edward/df-types"
    },
    "split_keywords": [
        "python",
        " pandas",
        " dataclass",
        " dataframe"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "bc1f8f8106e210548f71d43302ff5c3bd8676c8c444179a803c9339b4d359ea4",
                "md5": "8085299189a819b3c0868a411e610a4c",
                "sha256": "5f972f64d1422e0085be6146e086c395f40cc25efa953e2a6b9b7fa1bee9f3e7"
            },
            "downloads": -1,
            "filename": "df_types-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8085299189a819b3c0868a411e610a4c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 8471,
            "upload_time": "2025-07-27T20:38:57",
            "upload_time_iso_8601": "2025-07-27T20:38:57.465247Z",
            "url": "https://files.pythonhosted.org/packages/bc/1f/8f8106e210548f71d43302ff5c3bd8676c8c444179a803c9339b4d359ea4/df_types-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0bd0cb7490a7eb29539a3fce9574177e90ba41b706ceb04cc112a99853dfa300",
                "md5": "5ca450b6e1c90acf76adae4a444899f7",
                "sha256": "e12a5ed0f2a99a995067b98ea6dfeecafe4369f6812e3c492619f6f3009077b1"
            },
            "downloads": -1,
            "filename": "df-types-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "5ca450b6e1c90acf76adae4a444899f7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 7319,
            "upload_time": "2025-07-27T20:38:58",
            "upload_time_iso_8601": "2025-07-27T20:38:58.520395Z",
            "url": "https://files.pythonhosted.org/packages/0b/d0/cb7490a7eb29539a3fce9574177e90ba41b706ceb04cc112a99853dfa300/df-types-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-27 20:38:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jon-edward",
    "github_project": "df-types",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "2.3.1"
                ]
            ]
        }
    ],
    "lcname": "df-types"
}
        