zensols-datdesc


Namezensols-datdesc JSON
Version 1.3.2 PyPI version JSON
download
home_pageNone
SummaryThis API and command line program describes data in tables with metadata and generate LaTeX tables in a `.sty` file from CSV files.
upload_time2025-07-17 03:08:41
maintainerNone
docs_urlNone
authorNone
requires_python<3.13,>=3.11
licenseMIT
keywords academia data tooling
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # Describe and optimize data

[![PyPI][pypi-badge]][pypi-link]
[![Python 3.11][python311-badge]][python311-link]
[![Python 3.12][python311-badge]][python312-link]
[![Build Status][build-badge]][build-link]

In this package, Pythonic objects are used to easily (un)serialize to create
LaTeX tables, figures and Excel files.  The API and command-line program
describes data in tables with metadata and using YAML and CSV files and
integrates with [Pandas].  The paths to the CSV files to create tables from and
their metadata is given as a YAML configuration file.

Features:
* Create LaTeX tables (with captions) and Excel files (with notes) of tabular
  metadata from CSV files.
* Create LaTeX friendly encapsulated postscript (`.eps`) files from CSV files.
* Data and metadata is viewable in a nice format with paging in a web browser
  using the [Render program].
* Usable as an API during data collection for research projects.


<!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-refresh-toc -->
## Table of Contents

- [Documentation](#documentation)
- [Obtaining](#obtaining)
- [Usage](#usage)
    - [Tables](#tables)
    - [Figures](#figures)
    - [Hyperparameters](#hyperparameters)
- [Changelog](#changelog)
- [Community](#community)
- [License](#license)

<!-- markdown-toc end -->


## Documentation

See the [full documentation](https://plandes.github.io/datdesc/index.html).
The [API reference](https://plandes.github.io/datdesc/api.html) is also
available.


## Obtaining

The library can be installed with pip from the [pypi] repository:
```bash
pip3 install zensols.datdesc
```

Binaries are also available on [pypi].


## Usage

The library can be used as a Python API to programmatically create tables,
figures, and/or represent tabular data.  However, it also has a very robust
command-line that is intended by be used by [GNU make].  The command-line can
be used to create on the fly LaTeX `.sty` files that are generated as commands
and figures are generated as Encapsulated Postscript (`.eps`) files.

The YAML file format is used to create both tables and figures.  Parameters are
both files or both directories when using directories, only files that match
`*-table.yml` are considered on the command line.  In addition, the described
data can be hyperparameter metadata, which can be optimized with the
[hyperparameter module](#hyperparameters).


### Tables

First create the table's configuration file.  For example, to create a Latex
`.sty` file from the CSV file `test-resources/section-id.csv` using the first
column as the index (makes that column go away) using a variable size and
placement, use:
```yaml
intercodertab:
  type: one_column
  path: test-resources/section-id.csv
  caption: >-
    Krippendorff’s ...
  single_column: true
  uses: zentable
  read_params:
    index_col: 0
  tabulate_params:
    disable_numparse: true
  replace_nan: ' '
  blank_columns: [0]
  bold_cells: [[0, 0], [1, 0], [2, 0], [3, 0]]
```

Some of these fields include:

* **index_col**: clears column 0 and
* **bold_cells**: make certain cells bold
* **disable_numparse** tells the `tabulate` module not reformat numbers

See the [Table] class for a full listing of options.


### Figures

Figures can be generated in any format supported by [matplotlib] (namely
`.eps`, `.svg`, and `.pdf`).  Figures are configured in a very similar fashion
to [tables](#tables).  The configuration also points to a CSV file, but
describes the plot.

The primary difference is that the YAML is parsed using the [Zensols parsing
rules] so the string `path: target` will be given to a new [Plot] instance as a
[pathlib.Path].

A bar plot is configured below:
```yaml
irisFig:
  image_dir: 'path: target'
  seaborn:
    style:
      style: darkgrid
      rc:
        axes.facecolor: 'str: .9'
    context:
      context: 'paper'
      font_scale: 1.3
  plots:
    - type: bar
      data: 'dataframe: test-resources/fig/iris.csv'
      title: 'Iris Splits'
      x_column_name: ds_type
      y_column_name: count
      code: |
        df = df.groupby('ds_type').agg({'ds_type': 'count'}).\
          rename(columns={'ds_type': 'count'}).reset_index()
```
This configuration meaning:
* The top level `irisFig` creates a [Figure] instance, and when used with the
  command line, outputs this root level string as the name in the `image_dir`
  directory.
* The `image_dir` tells where to write the image.  This should be left out when
  invoking from the command-line to allow it to decide where to write the file.
* The `seaborn` section configures the [seaborn] module.
* The plots are a *list* of [Plot] instances that, like the [Figure] level, are
  populated with all the values.
* The `code` (optionally) allows the massaging of the [Pandas] dataframe
  (pointed to by `data`).  This feature also exists for [Table].

See the [Figure] and [Plot] classes for a full listing of options.



### Hyperparameters

Hyperparameter metadata is largely isomorphic to `datdesc` tables.  This
package was designed for the following purposes:

* Provide a basic scaffolding to update model hyperparameters such as
  [hyperopt].
* Generate LaTeX tables of the hyperparamers and their descriptions for
  academic papers.

Access to the hyperparameters via the API is done by calling the *set* or
*model* levels with a *dotted path notation* string.  For example, `svm.C`
first navigates to model `svm`, then to the hyperparameter named `C`.

A command line access to create LaTeX tables from the hyperparameter
definitions is available with the `hyper` action.  An example of a
hyperparameter set (a grouping of models that in turn have hyperparameters)
follows:
```yaml
svm:
  doc: 'support vector machine'
  params:
    kernel:
      type: choice
      choices: [radial, linear]
      doc: 'maps the observations into some feature space'
    C:
      type: float
      doc: 'regularization parameter'
    max_iter:
      type: int
      doc: 'number of iterations'
      value: 20
      interval: [1, 30]
```
In the example, the `svm` model has hyperparameters `kernel`, `C` and
`max_iter`.  The `kernel` type is set as a choice, which is a string that has
the constraints of matching a string in the list.  The `C` hyperparameter is a
floating point number, and the `max_iter` is an integer that must be between 1
and 30.

In this next example, the `k_means` model uses the string `k-means` in human
readable documentation, which can be Python generated code in a `dataclass`.
```yaml
k_means:
  desc: k-means
  doc: 'k-means clustering'
  params:
    n_clusters:
      type: int
      doc: 'number of clusters'
    copy_x:
      type: bool
      value: True
      doc: 'When pre-computing distances it is more numerically accurate to center the data first'
    strata:
      type: list
      doc: 'An array of stratified hyperparameters (made up for test cases).'
      value: [1, 2]
    kwargs:
      type: dict
      doc: 'Model keyword arguments (made up for test cases).'
      value:
        learning_rate: 0.01
        epochs: 3
```


## Changelog

An extensive changelog is available [here](CHANGELOG.md).


## Community

Please star this repository and let me know how and where you use this API.
Contributions as pull requests, feedback and any input is welcome.


## License

[MIT License](LICENSE.md)

Copyright (c) 2023 - 2025 Paul Landes


<!-- links -->
[pypi]: https://pypi.org/project/zensols.datdesc/
[pypi-link]: https://pypi.python.org/pypi/zensols.datdesc
[pypi-badge]: https://img.shields.io/pypi/v/zensols.datdesc.svg
[python311-badge]: https://img.shields.io/badge/python-3.11-blue.svg
[python311-link]: https://www.python.org/downloads/release/python-3110
[python312-badge]: https://img.shields.io/badge/python-3.12-blue.svg
[python312-link]: https://www.python.org/downloads/release/python-3120
[build-badge]: https://github.com/plandes/datdesc/workflows/CI/badge.svg
[build-link]: https://github.com/plandes/datdesc/actions

[GNU make]: https://www.gnu.org/software/make/
[matplotlib]: https://matplotlib.org
[seaborn]: http://seaborn.pydata.org
[hyperopt]: http://hyperopt.github.io/hyperopt/
[pathlib.Path]: https://docs.python.org/3/library/pathlib.html
[Pandas]: https://pandas.pydata.org

[Zensols parsing rules]: https://plandes.github.io/util/doc/config.html#parsing
[Render program]: https://github.com/plandes/rend

[Table]: api/zensols.datdesc.html#zensols.datdesc.table.Table
[Figure]: api/zensols.datdesc.html#zensols.datdesc.figure.Figure
[Plot]: api/zensols.datdesc.html#zensols.datdesc.figure.Plot

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "zensols-datdesc",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.11",
    "maintainer_email": null,
    "keywords": "academia, data, tooling",
    "author": null,
    "author_email": "Paul Landes <landes@mailc.net>",
    "download_url": null,
    "platform": null,
    "description": "# Describe and optimize data\n\n[![PyPI][pypi-badge]][pypi-link]\n[![Python 3.11][python311-badge]][python311-link]\n[![Python 3.12][python311-badge]][python312-link]\n[![Build Status][build-badge]][build-link]\n\nIn this package, Pythonic objects are used to easily (un)serialize to create\nLaTeX tables, figures and Excel files.  The API and command-line program\ndescribes data in tables with metadata and using YAML and CSV files and\nintegrates with [Pandas].  The paths to the CSV files to create tables from and\ntheir metadata is given as a YAML configuration file.\n\nFeatures:\n* Create LaTeX tables (with captions) and Excel files (with notes) of tabular\n  metadata from CSV files.\n* Create LaTeX friendly encapsulated postscript (`.eps`) files from CSV files.\n* Data and metadata is viewable in a nice format with paging in a web browser\n  using the [Render program].\n* Usable as an API during data collection for research projects.\n\n\n<!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-refresh-toc -->\n## Table of Contents\n\n- [Documentation](#documentation)\n- [Obtaining](#obtaining)\n- [Usage](#usage)\n    - [Tables](#tables)\n    - [Figures](#figures)\n    - [Hyperparameters](#hyperparameters)\n- [Changelog](#changelog)\n- [Community](#community)\n- [License](#license)\n\n<!-- markdown-toc end -->\n\n\n## Documentation\n\nSee the [full documentation](https://plandes.github.io/datdesc/index.html).\nThe [API reference](https://plandes.github.io/datdesc/api.html) is also\navailable.\n\n\n## Obtaining\n\nThe library can be installed with pip from the [pypi] repository:\n```bash\npip3 install zensols.datdesc\n```\n\nBinaries are also available on [pypi].\n\n\n## Usage\n\nThe library can be used as a Python API to programmatically create tables,\nfigures, and/or represent tabular data.  However, it also has a very robust\ncommand-line that is intended by be used by [GNU make].  The command-line can\nbe used to create on the fly LaTeX `.sty` files that are generated as commands\nand figures are generated as Encapsulated Postscript (`.eps`) files.\n\nThe YAML file format is used to create both tables and figures.  Parameters are\nboth files or both directories when using directories, only files that match\n`*-table.yml` are considered on the command line.  In addition, the described\ndata can be hyperparameter metadata, which can be optimized with the\n[hyperparameter module](#hyperparameters).\n\n\n### Tables\n\nFirst create the table's configuration file.  For example, to create a Latex\n`.sty` file from the CSV file `test-resources/section-id.csv` using the first\ncolumn as the index (makes that column go away) using a variable size and\nplacement, use:\n```yaml\nintercodertab:\n  type: one_column\n  path: test-resources/section-id.csv\n  caption: >-\n    Krippendorff\u2019s ...\n  single_column: true\n  uses: zentable\n  read_params:\n    index_col: 0\n  tabulate_params:\n    disable_numparse: true\n  replace_nan: ' '\n  blank_columns: [0]\n  bold_cells: [[0, 0], [1, 0], [2, 0], [3, 0]]\n```\n\nSome of these fields include:\n\n* **index_col**: clears column 0 and\n* **bold_cells**: make certain cells bold\n* **disable_numparse** tells the `tabulate` module not reformat numbers\n\nSee the [Table] class for a full listing of options.\n\n\n### Figures\n\nFigures can be generated in any format supported by [matplotlib] (namely\n`.eps`, `.svg`, and `.pdf`).  Figures are configured in a very similar fashion\nto [tables](#tables).  The configuration also points to a CSV file, but\ndescribes the plot.\n\nThe primary difference is that the YAML is parsed using the [Zensols parsing\nrules] so the string `path: target` will be given to a new [Plot] instance as a\n[pathlib.Path].\n\nA bar plot is configured below:\n```yaml\nirisFig:\n  image_dir: 'path: target'\n  seaborn:\n    style:\n      style: darkgrid\n      rc:\n        axes.facecolor: 'str: .9'\n    context:\n      context: 'paper'\n      font_scale: 1.3\n  plots:\n    - type: bar\n      data: 'dataframe: test-resources/fig/iris.csv'\n      title: 'Iris Splits'\n      x_column_name: ds_type\n      y_column_name: count\n      code: |\n        df = df.groupby('ds_type').agg({'ds_type': 'count'}).\\\n          rename(columns={'ds_type': 'count'}).reset_index()\n```\nThis configuration meaning:\n* The top level `irisFig` creates a [Figure] instance, and when used with the\n  command line, outputs this root level string as the name in the `image_dir`\n  directory.\n* The `image_dir` tells where to write the image.  This should be left out when\n  invoking from the command-line to allow it to decide where to write the file.\n* The `seaborn` section configures the [seaborn] module.\n* The plots are a *list* of [Plot] instances that, like the [Figure] level, are\n  populated with all the values.\n* The `code` (optionally) allows the massaging of the [Pandas] dataframe\n  (pointed to by `data`).  This feature also exists for [Table].\n\nSee the [Figure] and [Plot] classes for a full listing of options.\n\n\n\n### Hyperparameters\n\nHyperparameter metadata is largely isomorphic to `datdesc` tables.  This\npackage was designed for the following purposes:\n\n* Provide a basic scaffolding to update model hyperparameters such as\n  [hyperopt].\n* Generate LaTeX tables of the hyperparamers and their descriptions for\n  academic papers.\n\nAccess to the hyperparameters via the API is done by calling the *set* or\n*model* levels with a *dotted path notation* string.  For example, `svm.C`\nfirst navigates to model `svm`, then to the hyperparameter named `C`.\n\nA command line access to create LaTeX tables from the hyperparameter\ndefinitions is available with the `hyper` action.  An example of a\nhyperparameter set (a grouping of models that in turn have hyperparameters)\nfollows:\n```yaml\nsvm:\n  doc: 'support vector machine'\n  params:\n    kernel:\n      type: choice\n      choices: [radial, linear]\n      doc: 'maps the observations into some feature space'\n    C:\n      type: float\n      doc: 'regularization parameter'\n    max_iter:\n      type: int\n      doc: 'number of iterations'\n      value: 20\n      interval: [1, 30]\n```\nIn the example, the `svm` model has hyperparameters `kernel`, `C` and\n`max_iter`.  The `kernel` type is set as a choice, which is a string that has\nthe constraints of matching a string in the list.  The `C` hyperparameter is a\nfloating point number, and the `max_iter` is an integer that must be between 1\nand 30.\n\nIn this next example, the `k_means` model uses the string `k-means` in human\nreadable documentation, which can be Python generated code in a `dataclass`.\n```yaml\nk_means:\n  desc: k-means\n  doc: 'k-means clustering'\n  params:\n    n_clusters:\n      type: int\n      doc: 'number of clusters'\n    copy_x:\n      type: bool\n      value: True\n      doc: 'When pre-computing distances it is more numerically accurate to center the data first'\n    strata:\n      type: list\n      doc: 'An array of stratified hyperparameters (made up for test cases).'\n      value: [1, 2]\n    kwargs:\n      type: dict\n      doc: 'Model keyword arguments (made up for test cases).'\n      value:\n        learning_rate: 0.01\n        epochs: 3\n```\n\n\n## Changelog\n\nAn extensive changelog is available [here](CHANGELOG.md).\n\n\n## Community\n\nPlease star this repository and let me know how and where you use this API.\nContributions as pull requests, feedback and any input is welcome.\n\n\n## License\n\n[MIT License](LICENSE.md)\n\nCopyright (c) 2023 - 2025 Paul Landes\n\n\n<!-- links -->\n[pypi]: https://pypi.org/project/zensols.datdesc/\n[pypi-link]: https://pypi.python.org/pypi/zensols.datdesc\n[pypi-badge]: https://img.shields.io/pypi/v/zensols.datdesc.svg\n[python311-badge]: https://img.shields.io/badge/python-3.11-blue.svg\n[python311-link]: https://www.python.org/downloads/release/python-3110\n[python312-badge]: https://img.shields.io/badge/python-3.12-blue.svg\n[python312-link]: https://www.python.org/downloads/release/python-3120\n[build-badge]: https://github.com/plandes/datdesc/workflows/CI/badge.svg\n[build-link]: https://github.com/plandes/datdesc/actions\n\n[GNU make]: https://www.gnu.org/software/make/\n[matplotlib]: https://matplotlib.org\n[seaborn]: http://seaborn.pydata.org\n[hyperopt]: http://hyperopt.github.io/hyperopt/\n[pathlib.Path]: https://docs.python.org/3/library/pathlib.html\n[Pandas]: https://pandas.pydata.org\n\n[Zensols parsing rules]: https://plandes.github.io/util/doc/config.html#parsing\n[Render program]: https://github.com/plandes/rend\n\n[Table]: api/zensols.datdesc.html#zensols.datdesc.table.Table\n[Figure]: api/zensols.datdesc.html#zensols.datdesc.figure.Figure\n[Plot]: api/zensols.datdesc.html#zensols.datdesc.figure.Plot\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "This API and command line program describes data in tables with metadata and generate LaTeX tables in a `.sty` file from CSV files.",
    "version": "1.3.2",
    "project_urls": {
        "Changelog": "https://github.com/plandes/datdesc/blob/master/CHANGELOG.md",
        "Documentation": "https://plandes.github.io/datdesc",
        "Homepage": "https://github.com/plandes/datdesc",
        "Issues": "https://github.com/plandes/datdesc/issues",
        "Repository": "https://github.com/plandes/datdesc.git"
    },
    "split_keywords": [
        "academia",
        " data",
        " tooling"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5980a079e5505baae34aa473b1357e75762ad8f7ff4b4168906a97e5f2c39048",
                "md5": "7e78218ae198f03d7913afe07c7adfa8",
                "sha256": "1e63bb3134d5051e58b783d80db760cd0bc82fa3fef924ce375cd6a35ae44df6"
            },
            "downloads": -1,
            "filename": "zensols_datdesc-1.3.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7e78218ae198f03d7913afe07c7adfa8",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.11",
            "size": 56482,
            "upload_time": "2025-07-17T03:08:41",
            "upload_time_iso_8601": "2025-07-17T03:08:41.899032Z",
            "url": "https://files.pythonhosted.org/packages/59/80/a079e5505baae34aa473b1357e75762ad8f7ff4b4168906a97e5f2c39048/zensols_datdesc-1.3.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-17 03:08:41",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "plandes",
    "github_project": "datdesc",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "zensols-datdesc"
}
        
Elapsed time: 0.74543s