# Describe and optimize data
[![PyPI][pypi-badge]][pypi-link]
[![Python 3.10][python310-badge]][python310-link]
[![Python 3.11][python311-badge]][python311-link]
[![Build Status][build-badge]][build-link]
This API and command line program describes data in tables with metadata and
generates LaTeX tables in a `.sty` file from CSV files. The paths to the CSV
files to create tables from, along with their metadata, are given in a YAML
configuration file. Parameters are either both files or both directories. When
using directories, only files that match `*-table.yml` are considered. In
addition, the described data can be hyperparameter metadata, which can be
optimized with the [hyperparameter module](#hyperparameters).
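The directory filtering described above can be pictured with a short sketch. This is not the package's implementation: the function name `find_table_configs` is hypothetical, and the sketch assumes only the top level of the directory is searched, which the text does not specify.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import List


def find_table_configs(directory: Path,
                       pattern: str = '*-table.yml') -> List[Path]:
    """Return the files directly inside ``directory`` whose names match
    ``pattern`` (hypothetical helper, top level only by assumption)."""
    return sorted(p for p in directory.iterdir()
                  if p.is_file() and fnmatch(p.name, pattern))
```

A file named `results-table.yml` would be selected, while `results.yml` or `results-table.yaml` would be skipped.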
Features:
* Associate metadata with each column in a Pandas DataFrame.
* DataFrame metadata is used to format LaTeX data and is exported to Excel as
  column header notes.
* Data and metadata are viewable in a nice format with paging in a web browser
  using the [Render program].
* Usable as an API during data collection for research projects.
## Documentation
See the [full documentation](https://plandes.github.io/datdesc/index.html).
The [API reference](https://plandes.github.io/datdesc/api.html) is also
available.
## Obtaining
The easiest way to install the command line program is via the `pip` installer:
```bash
pip3 install zensols.datdesc
```
Binaries are also available on [pypi].
## Usage
First create the table's configuration file. For example, to create a LaTeX
`.sty` file from the CSV file `test-resources/section-id.csv`, using the first
column as the index (which makes that column go away) and a variable size and
placement, use:
```yaml
intercodertab:
  path: test-resources/section-id.csv
  caption: >-
    Krippendorff’s ...
  size: VAR
  placement: VAR
  single_column: true
  uses: zentable
  read_kwargs:
    index_col: 0
  write_kwargs:
    disable_numparse: true
  replace_nan: ' '
  blank_columns: [0]
  bold_cells: [[0, 0], [1, 0], [2, 0], [3, 0]]
```
Some of these fields include:
* **placement**: the table placement (e.g. `h!`); `VAR` means to create the
  command with a variable to use as the first parameter
* **size**: the font size (e.g. `small`); `VAR` means to create the
  command with a variable to use as the second parameter
* **index_col**: uses column 0 as the index, which removes it as a data column
* **bold_cells**: makes certain cells bold
* **disable_numparse**: tells the `tabulate` module not to reformat numbers
See the [Table] class for a full listing of options.
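To make the `bold_cells` setting more concrete, here is a minimal sketch of what bolding selected cells could look like when rows are written as LaTeX. It is not how the [Table] class works internally: the function `to_latex_rows` is hypothetical, the cell values are placeholders, and reading each `bold_cells` entry as a `[row, column]` pair is an assumption.

```python
from typing import List, Sequence


def to_latex_rows(rows: Sequence[Sequence[str]],
                  bold_cells: Sequence[Sequence[int]] = ()) -> List[str]:
    """Format rows as LaTeX tabular lines, wrapping the listed cells in
    ``\\textbf``.  Entries are assumed to be (row, column) pairs."""
    bold = {tuple(cell) for cell in bold_cells}
    lines = []
    for r, row in enumerate(rows):
        cells = [f'\\textbf{{{cell}}}' if (r, c) in bold else cell
                 for c, cell in enumerate(row)]
        lines.append(' & '.join(cells) + ' \\\\')
    return lines


# placeholder data; bold the first column of both rows
rows = [['row 1', 'x'], ['row 2', 'y']]
for line in to_latex_rows(rows, bold_cells=[[0, 0], [1, 0]]):
    print(line)
```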
## Hyperparameters
The hyperparameter module provides access to, and documentation of,
hyperparameter metadata. It was designed for the following purposes:
* Provide basic scaffolding for updating model hyperparameters with tools such
  as [hyperopt].
* Generate LaTeX tables of the hyperparameters and their descriptions for
  academic papers.
Hyperparameters are accessed through the API by calling the *set* or *model*
level with a *dotted path notation* string. For example, `svm.C` first
navigates to model `svm`, then to the hyperparameter named `C`.
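The dotted path lookup can be illustrated with plain dictionaries. This is only a sketch of the idea, not the package's API: the `resolve` function is hypothetical, and the nested dictionary simply mirrors the layout of the YAML hyperparameter definitions.

```python
from typing import Any, Mapping


def resolve(models: Mapping[str, Any], path: str) -> Any:
    """Resolve a ``model.param`` dotted path to the parameter definition
    (hypothetical helper operating on plain dictionaries)."""
    model_name, param_name = path.split('.', 1)
    return models[model_name]['params'][param_name]


# dictionary form of a hyperparameter set with one model
hyperparams = {
    'svm': {
        'doc': 'support vector machine',
        'params': {
            'C': {'type': 'float', 'doc': 'regularization parameter'},
            'max_iter': {'type': 'int', 'doc': 'number of iterations',
                         'value': 20, 'interval': [1, 30]},
        },
    },
}

print(resolve(hyperparams, 'svm.max_iter')['value'])
```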
Command line access to create LaTeX tables from the hyperparameter
definitions is available with the `hyper` action. An example of a
hyperparameter set (a grouping of models that in turn have hyperparameters)
follows:
```yaml
svm:
  doc: 'support vector machine'
  params:
    kernel:
      type: choice
      choices: [radial, linear]
      doc: 'maps the observations into some feature space'
    C:
      type: float
      doc: 'regularization parameter'
    max_iter:
      type: int
      doc: 'number of iterations'
      value: 20
      interval: [1, 30]
```
In the example, the `svm` model has the hyperparameters `kernel`, `C` and
`max_iter`. The `kernel` type is set as a choice, which is a string
constrained to match one of the strings in the `choices` list. The `C`
hyperparameter is a floating point number, and `max_iter` is an integer that
must be between 1 and 30.
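The constraints just described can be written out as a small check. This is a sketch under stated assumptions rather than the package's validation logic: the `is_valid` function is hypothetical, and the interval bounds are assumed to be inclusive.

```python
from typing import Any, Mapping


def is_valid(definition: Mapping[str, Any], value: Any) -> bool:
    """Check ``value`` against a parameter definition (hypothetical helper;
    interval bounds are assumed inclusive)."""
    kind = definition['type']
    if kind == 'choice':
        return value in definition['choices']
    if kind == 'int':
        # bool is a subclass of int in Python, so exclude it explicitly
        ok = isinstance(value, int) and not isinstance(value, bool)
    elif kind == 'float':
        ok = isinstance(value, (int, float)) and not isinstance(value, bool)
    else:
        return False
    if ok and 'interval' in definition:
        low, high = definition['interval']
        ok = low <= value <= high
    return ok


kernel = {'type': 'choice', 'choices': ['radial', 'linear']}
max_iter = {'type': 'int', 'value': 20, 'interval': [1, 30]}

print(is_valid(kernel, 'linear'))   # in the choices list
print(is_valid(max_iter, 31))       # outside the interval
```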
In this next example, the `k_means` model uses the string `k-means` (set with
`desc`) in human-readable documentation, which can include Python code
generated as a `dataclass`.
```yaml
k_means:
  desc: k-means
  doc: 'k-means clustering'
  params:
    n_clusters:
      type: int
      doc: 'number of clusters'
    copy_x:
      type: bool
      value: True
      doc: 'When pre-computing distances it is more numerically accurate to center the data first'
    strata:
      type: list
      doc: 'An array of stratified hyperparameters (made up for test cases).'
      value: [1, 2]
    kwargs:
      type: dict
      doc: 'Model keyword arguments (made up for test cases).'
      value:
        learning_rate: 0.01
        epochs: 3
```
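As a rough picture of how the definition above might map onto a `dataclass`, consider the hand-written sketch below. The class name `KMeansHyperparams` is hypothetical, and the code the package actually generates may look quite different; only the field names, types, defaults and descriptions are taken from the YAML.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class KMeansHyperparams:
    """k-means clustering (hypothetical, hand-written illustration)."""

    n_clusters: Optional[int] = None
    """number of clusters (no default value in the definition)"""

    copy_x: bool = True
    """When pre-computing distances it is more numerically accurate to
    center the data first"""

    strata: List[int] = field(default_factory=lambda: [1, 2])
    """An array of stratified hyperparameters (made up for test cases)."""

    kwargs: Dict[str, Any] = field(
        default_factory=lambda: {'learning_rate': 0.01, 'epochs': 3})
    """Model keyword arguments (made up for test cases)."""


params = KMeansHyperparams(n_clusters=4)
print(params)
```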
## Changelog
An extensive changelog is available [here](CHANGELOG.md).
## Community
Please star this repository and let me know how and where you use this API.
Contributions as pull requests, feedback and any other input are welcome.
## License
[MIT License](LICENSE.md)
Copyright (c) 2023 Paul Landes
<!-- links -->
[pypi]: https://pypi.org/project/zensols.datdesc/
[pypi-link]: https://pypi.python.org/pypi/zensols.datdesc
[pypi-badge]: https://img.shields.io/pypi/v/zensols.datdesc.svg
[python310-badge]: https://img.shields.io/badge/python-3.10-blue.svg
[python310-link]: https://www.python.org/downloads/release/python-3100
[python311-badge]: https://img.shields.io/badge/python-3.11-blue.svg
[python311-link]: https://www.python.org/downloads/release/python-3110
[build-badge]: https://github.com/plandes/datdesc/workflows/CI/badge.svg
[build-link]: https://github.com/plandes/datdesc/actions
[hyperopt]: http://hyperopt.github.io/hyperopt/
[Render program]: https://github.com/plandes/rend
[Table]: api/zensols.datdesc.html#zensols.datdesc.table.Table