typeddfs


- Name: typeddfs
- Version: 0.16.5
- Home page: https://github.com/dmyersturnbull/typed-dfs
- Summary: Pandas DataFrame subclasses that enforce structure and can self-organize.
- Upload time: 2022-03-01 04:56:11
- Maintainer: dmyersturnbull
- Author: Douglas Myers-Turnbull
- Requires Python: >=3.8,<4.0
- License: Apache-2.0
- Keywords: pandas, typing, columns, structured

# Typed DataFrames

[![Version status](https://img.shields.io/pypi/status/typeddfs?label=status)](https://pypi.org/project/typeddfs)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python version compatibility](https://img.shields.io/pypi/pyversions/typeddfs?label=Python)](https://pypi.org/project/typeddfs)
[![Version on GitHub](https://img.shields.io/github/v/release/dmyersturnbull/typed-dfs?include_prereleases&label=GitHub)](https://github.com/dmyersturnbull/typed-dfs/releases)
[![Version on PyPi](https://img.shields.io/pypi/v/typeddfs?label=PyPi)](https://pypi.org/project/typeddfs)  
[![Build (Actions)](https://img.shields.io/github/workflow/status/dmyersturnbull/typed-dfs/Build%20&%20test?label=Tests)](https://github.com/dmyersturnbull/typed-dfs/actions)
[![Coverage (coveralls)](https://coveralls.io/repos/github/dmyersturnbull/typed-dfs/badge.svg?branch=main&service=github)](https://coveralls.io/github/dmyersturnbull/typed-dfs?branch=main)
[![Documentation status](https://readthedocs.org/projects/typed-dfs/badge)](https://typed-dfs.readthedocs.io/en/stable/)
[![Maintainability](https://api.codeclimate.com/v1/badges/6b804351b6ba5e7694af/maintainability)](https://codeclimate.com/github/dmyersturnbull/typed-dfs/maintainability)
[![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/dmyersturnbull/typed-dfs/badges/quality-score.png?b=main)](https://scrutinizer-ci.com/g/dmyersturnbull/typed-dfs/?branch=main)
[![Created with Tyrannosaurus](https://img.shields.io/badge/Created_with-Tyrannosaurus-0000ff.svg)](https://github.com/dmyersturnbull/tyrannosaurus)

Pandas DataFrame subclasses that self-organize and serialize robustly.

```python
from typeddfs import TypedDfs

Film = TypedDfs.typed("Film").require("name", "studio", "year").build()
df = Film.read_csv("file.csv")
assert df.columns.tolist() == ["name", "studio", "year"]
type(df)  # Film
```

Your types remember how to be read,
including columns, dtypes, indices, and custom requirements.
No index_cols=, header=, set_index, or astype needed.

**Read and write any format:**

```python
path = input("input file? [.csv/.tsv/.tab/.json/.xml.bz2/.feather/.snappy.h5/...]")
df = Film.read_file(path)
df.write_file("output.snappy")
```

**Need dataclasses?**

```python
instances = df.to_dataclass_instances()
Film.from_dataclass_instances(instances)
```

**Save metadata?**

```python
df = df.set_attrs(dataset="piano")
df.write_file("df.csv", attrs=True)
df = Film.read_file("df.csv", attrs=True)
print(df.attrs)  # e.g. {"dataset": "piano"}
```

**Make dirs? Don’t overwrite?**

```python
df.write_file("df.csv", mkdirs=True, overwrite=False)
```

**Write / verify checksums?**

```python
df.write_file("df.csv", file_hash=True)
df = Film.read_file("df.csv", file_hash=True)  # fails if wrong
```
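
The idea behind `file_hash=True` can be illustrated with the standard library alone. The sketch below is not typeddfs's implementation: the helper names `write_with_hash` and `read_verified`, the `.sha256` sidecar suffix, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sidecar_path(path: Path) -> Path:
    # Hypothetical convention: store the digest next to the file as "<name>.sha256".
    return path.with_name(path.name + ".sha256")


def write_with_hash(path: Path, data: bytes) -> None:
    # Write the payload, then record its SHA-256 hex digest in the sidecar file.
    path.write_bytes(data)
    sidecar_path(path).write_text(hashlib.sha256(data).hexdigest(), encoding="utf-8")


def read_verified(path: Path) -> bytes:
    # Re-hash the payload and refuse to return it if the digest does not match.
    data = path.read_bytes()
    expected = sidecar_path(path).read_text(encoding="utf-8").strip()
    if hashlib.sha256(data).hexdigest() != expected:
        raise ValueError(f"Checksum mismatch for {path}")
    return data


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "df.csv"
    write_with_hash(target, b"name,studio,year\n")
    roundtrip = read_verified(target)  # passes because the digest matches
```

If the file is modified after writing, `read_verified` raises instead of returning stale or corrupted data, which is the behavior the `# fails if wrong` comment above describes.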

**Get example datasets?**

```python
print(ExampleDfs.penguins().df)
#    species     island  bill_length_mm  ...  flipper_length_mm  body_mass_g     sex
# 0    Adelie  Torgersen            39.1  ...              181.0       3750.0    MALE
```

**Pretty-print the obvious way?**

```python
df.pretty_print(to="all_data.md.zip")
wiki_txt = df.pretty_print(fmt="mediawiki")
```

All standard DataFrame methods remain available.
Use `.of(df)` to convert to your type, or `.vanilla()` for a plain DataFrame.

**[Read the docs 📚](https://typed-dfs.readthedocs.io/en/stable/)** for more info and examples.

### 🐛 Pandas serialization bugs fixed

Pandas has several issues with serialization.

<details>
<summary><em>See: Fixed issues</em></summary>

Depending on the format and columns, these issues can occur:

- columns are silently added or dropped,
- errors are raised on reading or writing empty DataFrames,
- DataFrames with indices cannot be written to Feather,
- writing to Parquet fails with half-precision floats,
- partially written files linger after an error,
- the buggy xlrd is preferred by read_excel,
- the buggy odfpy is also preferred,
- writing a file and reading it back yields a different DataFrame,
- fixed-width format cannot be written,
- the platform text encoding is used rather than UTF-8,
- and invalid JSON is written via the built-in json library.
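
The invalid-JSON item is easy to reproduce with the standard library alone. A minimal sketch (the record and variable names are illustrative): by default, Python's built-in `json` module emits the bare token `NaN`, which the JSON specification does not allow, and passing `allow_nan=False` makes it raise instead.

```python
import json

record = {"bill_length_mm": float("nan")}

# Default behaviour: emits the bare token NaN, which is not valid JSON.
lenient = json.dumps(record)

# Strict behaviour: refuse to serialize NaN and infinities at all.
try:
    json.dumps(record, allow_nan=False)
    strict_failed = False
except ValueError:
    strict_failed = True
```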

</details>

### 🎁 Other features

See more in the [guided walkthrough ✏️](https://typed-dfs.readthedocs.io/en/latest/guide.html)

<details>
<summary><em>See: Short feature list</em></summary>

- Dtype-aware natural sorting
- UTF-8 by default
- Near-atomicity of read/write
- Matrix-like typed dataframes and methods (e.g. `matrix.is_symmetric()`)
- DataFrame-compatible frozen, hashable, ordered collections (dict, list, and set)
- Serialize JSON robustly, preserving NaN, inf, −inf, enums, timezones, complex numbers, etc.
- Serialize more formats like TOML and INI
- Interpreting paths and formats (e.g. `FileFormat.split("dir/myfile.csv.gz").compression # gz`)
- Generate good CLI help text for input DataFrames
- Parse/verify/add/update/delete files in a .shasum-like file
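
Near-atomic writing is commonly achieved by writing to a temporary file in the same directory and renaming it over the target only on success. The sketch below shows that general pattern with the standard library; it is not typeddfs's code, and `atomic_write_text` is a hypothetical helper name.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    # Create the temporary file in the target's directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name, suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        # os.replace swaps the finished file into place in a single step.
        os.replace(tmp_name, path)
    except BaseException:
        # On any failure, remove the partial file instead of leaving it behind.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

With this pattern a reader sees either the old file or the complete new one, and a failed write leaves no half-written file at the target path.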

</details>

### 💔 Limitations

<details>
<summary><em>See: List of limitations</em></summary>

- Multi-level columns are not yet supported.
- Columns and index levels cannot share names.
- Duplicate column names are not supported. (These are strange anyway.)
- A typed DF cannot have columns "level_0", "index", or "Unnamed: 0".
- `inplace` is forbidden in some functions; avoid it or use `.vanilla()`.

</details>

### 🔌 Serialization support

TypedDfs provides the methods `read_file` and `write_file`, which guess the format from the
filename extension. For example, this will convert a gzipped, tab-delimited file to Feather:

```python
import typeddfs

TastyDf = typeddfs.typed("TastyDf").build()
TastyDf.read_file("myfile.tab.gz").write_file("myfile.feather")
```
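
To make the idea of guessing from the filename concrete, here is a rough standard-library sketch. It is not how typeddfs implements its format detection; the function `split_format` and the set of compression suffixes are assumptions for illustration only.

```python
from pathlib import PurePath
from typing import Optional, Tuple

# Assumed set of compression suffixes; the real library may recognize others.
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".xz", ".zip"}


def split_format(path: str) -> Tuple[str, Optional[str]]:
    """Return (format suffix, compression suffix or None) for a filename."""
    suffixes = PurePath(path).suffixes
    compression = None
    # A trailing compression suffix is peeled off first.
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        compression = suffixes.pop()
    # Whatever suffix remains last is treated as the data format.
    fmt = suffixes[-1] if suffixes else ""
    return fmt, compression


fmt, compression = split_format("myfile.tab.gz")
```

For `myfile.tab.gz` this yields the format suffix `.tab` and the compression suffix `.gz`, which is the kind of information a reader needs to pick the right parser.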

Pandas does most of the serialization, but some formats require extra packages.
Typed-dfs specifies [extras](https://python-poetry.org/docs/pyproject/#extras)
to help you install the required packages at compatible versions.

Here are the extras:

- `feather`: [Feather](https://arrow.apache.org/docs/python/feather.html) (uses: pyarrow)
- `parquet`: [Parquet (e.g. .snappy)](https://github.com/apache/parquet-format) (uses: pyarrow)
- `xml` (uses: lxml)
- `excel`: Excel and LibreOffice .xlsx/.ods/.xls, etc. (uses: openpyxl, defusedxml)
- `toml`: [TOML](https://toml.io/en/) (uses: tomlkit)
- `html` (uses: html5lib, beautifulsoup4)
- `xlsb`: rare binary Excel file (uses: pyxlsb)
- [HDF5](https://www.hdfgroup.org/solutions/hdf5/) _{no extra provided}_ (_use:_ `tables`)

For example, for Feather and TOML support use: `typeddfs[feather,toml]`  
As a shorthand for all formats, use `typeddfs[all]`.

### 📊 Serialization in-depth

<details>
<summary><em>See: Full table</em></summary>

| format      | packages                     | extra     | sanity | speed | file sizes |
| ----------- | ---------------------------- | --------- | ------ | ----- | ---------- |
| Feather     | `pyarrow`                    | `feather` | +++    | ++++  | +++        |
| Parquet     | `pyarrow` or `fastparquet` † | `parquet` | ++     | +++   | ++++       |
| csv/tsv     | none                         | none      | ++     | −−    | −−         |
| flexwf ‡    | none                         | none      | ++     | −−    | −−         |
| .fwf        | none                         | none      | +      | −−    | −−         |
| json        | none                         | none      | −−     | −−−   | −−−        |
| xml         | `lxml`                       | `xml`     | −      | −−−   | −−−        |
| .properties | none                         | none      | −−     | −−    | −−         |
| toml        | `tomlkit`                    | `toml`    | −−     | −−    | −−         |
| INI         | none                         | none      | −−−    | −−    | −−         |
| .lines      | none                         | none      | ++     | −−    | −−         |
| .npy        | none                         | none      | −      | +     | +++        |
| .npz        | none                         | none      | −      | +     | +++        |
| .html       | `html5lib,beautifulsoup4`    | `html`    | −−     | −−−   | −−−        |
| pickle      | none                         | none      | −−     | −−−   | −−−        |
| XLSX        | `openpyxl,defusedxml`        | `excel`   | +      | −−    | +          |
| ODS         | `openpyxl,defusedxml`        | `excel`   | +      | −−    | +          |
| XLS         | `openpyxl,defusedxml`        | `excel`   | −−     | −−    | +          |
| XLSB        | `pyxlsb`                     | `xlsb`    | −−     | −−    | ++         |
| HDF5        | `tables`                     | `hdf5`    | −−     | −     | ++         |

**⚠ Note:** The `hdf5` extra is currently disabled.

</details>

<details>
<summary><em>See: serialization notes</em></summary>

- † `fastparquet` can be used instead. It is slower but much smaller.
- Parquet only supports str, float64, float32, int64, int32, and bool.
  Other numeric types are automatically converted during write.
- ‡ `.flexwf` is fixed-width with optional delimiters.
- JSON has inconsistent handling of `None`. ([orjson](https://github.com/ijl/orjson) is more consistent).
- XML requires Pandas 1.3+.
- Not all JSON, XML, TOML, and HDF5 files can be read.
- .ini and .properties can only be written with exactly 2 columns + index levels:
  a key and a value. INI keys are in the form `section.name`.
- .lines can only be written with exactly 1 column or index level.
- .npy and .npz only serialize numpy objects.
  They are not supported in `read_file` and `write_file`.
- .html is not supported in `read_file` and `write_file`.
- Pickle is insecure and not recommended.
- Pandas supports odfpy for ODS and xlrd for XLS. In fact, it prefers those.
  However, they are very buggy; openpyxl is much better.
- XLSM, XLTX, XLTM, XLS, and XLSB files can contain macros, which Microsoft Excel will ingest.
- XLS is a deprecated format.
- XLSB is not fully supported in Pandas.
- HDF may not work on all platforms yet due to a
  [tables issue](https://github.com/PyTables/PyTables/issues/854).
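
As an illustration of the `section.name` key convention for INI noted above, the sketch below turns key/value pairs into INI text with the standard library's `configparser`. It only mirrors the convention; `pairs_to_ini` is a hypothetical helper, not a typeddfs function, and splitting on the first dot is an assumption.

```python
import configparser
import io
from typing import Iterable, Tuple


def pairs_to_ini(pairs: Iterable[Tuple[str, str]]) -> str:
    parser = configparser.ConfigParser()
    for key, value in pairs:
        # Assumption: the text before the first dot names the section.
        section, _, name = key.partition(".")
        if not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, name, value)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


ini_text = pairs_to_ini([("film.name", "Alien"), ("film.year", "1979")])
```

Each row contributes exactly one key and one value, which is why a table needs exactly two columns (or index levels) to be written in this form.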

Feather offers massively better performance over CSV, gzipped CSV, and HDF5
in read speed, write speed, memory overhead, and compression ratios.
Parquet typically results in smaller file sizes than Feather at some cost in speed.
Feather is the preferred format for most cases.

</details>

### 🔒 Security

Refer to the [security policy](https://github.com/dmyersturnbull/typed-dfs/blob/main/SECURITY.md).

### 📝 Extra notes

<details>
<summary><em>See: Pinned versions</em></summary>

Dependencies in the extras only have version minimums, not maximums.
For example, typed-dfs requires pyarrow >= 4.
[natsort](https://github.com/SethMMorton/natsort) is also only assigned a minimum version number.
This means that the result of typed-dfs’ `sort_natural` could change when natsort is upgraded.
To fix this, pin natsort to a specific major version;
e.g. `natsort = "^8"` with [Poetry](https://python-poetry.org/) or `natsort>=8,<9` with pip.
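
For readers unfamiliar with natural sorting, the following standard-library sketch shows the basic idea of ordering embedded numbers by value. It is a simplified stand-in, not natsort's algorithm (which handles many more cases), and `natural_key` is a hypothetical helper.

```python
import re
from typing import List, Tuple, Union


def natural_key(text: str) -> List[Tuple[int, Union[int, str]]]:
    # Split into digit and non-digit runs; tag each part so that numbers and
    # strings are never compared with each other directly.
    parts = re.findall(r"\d+|\D+", text)
    return [(0, int(p)) if p.isdecimal() else (1, p) for p in parts]


names = ["film10", "film2", "film1"]
plain = sorted(names)                      # lexicographic order
natural = sorted(names, key=natural_key)   # numeric runs compared by value
```

Edge cases such as signs, decimals, and locale rules are where library versions can differ, which is why pinning the major version keeps sort results stable.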

</details>

### 🍁 Contributing

Typed-Dfs is licensed under the [Apache License, version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
[New issues](https://github.com/dmyersturnbull/typed-dfs/issues) and pull requests are welcome.
Please refer to the [contributing guide](https://github.com/dmyersturnbull/typed-dfs/blob/main/CONTRIBUTING.md).
Generated with [Tyrannosaurus](https://github.com/dmyersturnbull/tyrannosaurus).

            
