# TA Data Kit (tadatakit)
TA Data Kit is a Python library developed by [TA Instruments™](https://www.tainstruments.com/), designed for easy parsing and handling of data exported by [TRIOS™ JSON Export Feature](https://www.tainstruments.com/trios-software/#data).
## Examples
If you would like to jump straight to usage examples, head over to our [collection of Jupyter Notebooks](examples/README.md), which covers everything from data reading and plotting to analysis, data conversion, and more.
## Installation
### Prerequisites
Before installing `tadatakit`, ensure that you have Python 3.9 or later installed on your system. You can download Python from the official [Python website](https://www.python.org/downloads/).
### Installing via Pip
Open your terminal or command prompt and run the following command:
```bash
pip install tadatakit
```
## Features
The `tadatakit` library offers a robust suite of features designed to simplify the way you handle data from the TRIOS JSON Export Feature.
- **Dynamic Class Generation:** Automatically generates Python classes from the [TRIOS JSON Schema](https://software.tainstruments.com/schemas/TRIOSJSONExportSchema). Because the data models are derived from the schema, they stay consistent with its definitions.
- **Pandas Integration:** Seamlessly converts data into [pandas](https://pandas.pydata.org/) DataFrames, making it easier to perform complex data analysis, visualization, and manipulation directly from experiment results.
- **Extensible Architecture:** Designed with flexibility in mind, so the generated classes can be extended or customized to suit specific needs, whether by adding new methods, integrating with other libraries, or modifying property behaviors.
- **Type-Safe Operations:** Employs Python type hints throughout the dynamically generated classes, which enhances code quality and reliability through static type checking.
- **Serialization and Deserialization:** Includes built-in methods for JSON serialization and deserialization, facilitating easy data storage and retrieval, and ensuring data integrity across different stages of data handling.
- **Schema-Driven Validation:** Automatically validates data against the schema upon loading, ensuring that the data conforms to the expected structure and types defined by TA Instruments.
## Quick Start
### Classes
To utilize classes like `Experiment`, import them directly from the `tadatakit.classes` module. These classes are dynamically generated based on the data schema, with helper functions added.
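Explore the `Experiment` class in an interactive environment such as IPython or a Jupyter Notebook (the trailing `?` is IPython help syntax and does not work in the standard Python REPL):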
```python
from tadatakit.classes import Experiment
Experiment?
```
### File Parsing
Easily parse files using the `from_json` method on the `Experiment` class, as demonstrated below:
```python
from tadatakit.classes import Experiment
experiment = Experiment.from_json("<path/to/json_file.json>")
```
Because exported files can be large, be aware that parsing them may require a substantial amount of memory.
### Using The Data
`Experiment` includes a convenience method that returns the results data as a [pandas](https://pandas.pydata.org/) DataFrame. The example below parses a file and retrieves the DataFrame:
```python
from tadatakit.classes import Experiment
experiment = Experiment.from_json("<path/to/json_file.json>")
df = experiment.get_dataframe()
```
## Utilizing and Extending Classes in TA Data Kit
The `tadatakit` library offers a versatile framework for handling and manipulating data through dynamically generated classes. These classes can be used directly (1), extended with additional functionality (2), or fully customized (3).
### 1. Using Auto-Generated Classes
Classes such as `Experiment` are dynamically created from a JSON schema by the `class_generator` module. They come equipped with all necessary properties and methods for basic data handling:
```python
from tadatakit.class_generator import Experiment
experiment = Experiment.from_json('experiment_data.json')
print(experiment.start_time)
```
### 2. Using Auto-Generated Classes Extended With Helper Functions
Use `Experiment` imported from the `classes` module to take advantage of helper functions like:
- **`get_dataframe`**: Transforms `Experiment` results into a pandas DataFrame.
- **`get_dataframes_by_step`**: Divides results into multiple DataFrames, one per procedure step.
**Usage Example:**
```python
from tadatakit.classes import Experiment
experiment = Experiment.from_json('path_to_data.json')
df = experiment.get_dataframe()
print(df.head(5))
# One DataFrame per procedure step, paired with the step it came from
steps, dfs = experiment.get_dataframes_by_step()
for step, step_df in zip(steps, dfs):
    print(step)
    print(step_df.head(5))
```
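
For readers curious about what splitting by step involves, here is a plain-Python sketch that uses neither `tadatakit` nor pandas. It assumes, hypothetically, that every result row carries a step identifier under a `"step"` key; the real TRIOS results layout and the actual implementation of `get_dataframes_by_step` may differ, and the sample values are illustrative only.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_rows_by_step(
    rows: List[Row], step_key: str = "step"
) -> Tuple[List[Any], List[List[Row]]]:
    """Group consecutive rows that share a step identifier.

    Hypothetical helper: returns a (steps, groups) pair, echoing the pair that
    get_dataframes_by_step returns, but works on plain lists of dicts.
    """
    steps: List[Any] = []
    groups: List[List[Row]] = []
    # groupby merges only *consecutive* rows, so procedure order is preserved
    for step, group in groupby(rows, key=lambda row: row[step_key]):
        steps.append(step)
        groups.append(list(group))
    return steps, groups


rows = [
    {"step": 1, "time_s": 0.0, "temperature_c": 25.0},
    {"step": 1, "time_s": 1.0, "temperature_c": 26.0},
    {"step": 2, "time_s": 2.0, "temperature_c": 35.0},
]
steps, groups = split_rows_by_step(rows)
for step, group in zip(steps, groups):
    print(step, len(group))
```

Because `groupby` only merges consecutive rows, a step that reappears later in the data would form a new group rather than being merged with the earlier one, which keeps the groups in procedure order.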
### 3. Building Custom Extensions
Create custom functionality by adding new methods or altering existing behaviors, for example to add polars support, build an analysis pipeline, or load data into databases or a LIMS (Laboratory Information Management System).
**Steps to Extend:**
1. **Define New Functions**: Craft functions that fulfill specific requirements.
2. **Attach to Classes**: Dynamically bind these functions to the classes.
3. **Implement in Workflow**: Integrate these enhanced objects into your application.
**Custom Method Example:**
```python
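import datetime

from tadatakit.class_generator import Experiment

def time_since_experiment(self):
    """Return the time elapsed since the experiment started."""
    # Use start_time's tzinfo so this works whether start_time is naive or timezone-aware
    return datetime.datetime.now(self.start_time.tzinfo) - self.start_time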
setattr(Experiment, "time_since_experiment", time_since_experiment)
experiment = Experiment.from_json('data.json')
print(experiment.time_since_experiment())
```
> Note: we provide no guarantee that your functions will not conflict with future additions to the schema. For example, if you add a dynamic property `Experiment.end_time`, it may conflict with an `EndTime` property added to the schema in the future.
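
One way to reduce the risk described in the note is to refuse to overwrite anything the class already defines. The sketch below uses a hypothetical `StandInExperiment` class in place of a generated one and a hypothetical `attach` helper; neither is part of `tadatakit`.

```python
import datetime


class StandInExperiment:
    """Hypothetical stand-in for a generated class; not tadatakit's Experiment."""

    def __init__(self, start_time: datetime.datetime, duration: datetime.timedelta):
        self.start_time = start_time
        self.duration = duration


def attach(cls: type, name: str, member: object) -> None:
    """Attach `member` to `cls`, refusing to overwrite an existing class attribute."""
    # hasattr on the class sees methods and properties, not attributes set only in __init__
    if hasattr(cls, name):
        raise AttributeError(f"{cls.__name__} already defines {name!r}")
    setattr(cls, name, member)


# Add a read-only computed property after the class has been created
attach(
    StandInExperiment,
    "end_time",
    property(lambda self: self.start_time + self.duration),
)

exp = StandInExperiment(
    datetime.datetime(2024, 7, 17, 12, 0), datetime.timedelta(hours=2)
)
print(exp.end_time)  # 2024-07-17 14:00:00
```

If a later schema version were to generate a member with the same name, `attach` would raise when your extension is loaded instead of silently replacing the generated member.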
## Explanation Of Approach
The `tadatakit.class_generator` module within the TA Data Kit automates the creation of Python classes directly from the TA Instruments TRIOS JSON Export Schema. This process allows for dynamic and efficient handling of data that conforms to defined standards, enhancing both development speed and data integrity. Here’s how the library achieves this:
### Overview
The library converts a JSON schema provided in a specification file into fully functional Python classes. These classes include type hints, serialization methods, and custom behaviors, closely mirroring the structure and requirements laid out in the schema.
### Steps for Class Generation
#### 1. Schema Loading
The process begins with loading the JSON schema. This schema defines the structure, types, required fields, and additional validation rules for the data.
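
As an illustration of this step, the sketch below parses a tiny, made-up schema fragment with Python's standard `json` module and lists each definition's properties. The fragment, its property names, and the choice of `definitions` versus `$defs` are assumptions for demonstration; `json.load` on an open file works the same way for a schema stored on disk.

```python
import json

# Hypothetical, heavily trimmed schema fragment; the real TRIOS schema is far larger
SCHEMA_TEXT = """
{
  "definitions": {
    "Experiment": {
      "type": "object",
      "properties": {
        "StartTime": {"type": "string", "format": "date-time"},
        "Id": {"type": "string", "format": "uuid"}
      },
      "required": ["StartTime"]
    }
  }
}
"""

schema = json.loads(SCHEMA_TEXT)
# Older JSON Schema drafts keep reusable models under "definitions"; newer ones use "$defs"
definitions = schema.get("definitions") or schema.get("$defs", {})
for name, definition in definitions.items():
    required = set(definition.get("required", []))
    for prop, node in definition.get("properties", {}).items():
        status = "required" if prop in required else "optional"
        print(name, prop, node.get("type"), status)
```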
#### 2. Schema Parsing
The loaded schema is parsed to identify different data structures and types. This includes simple types like strings and numbers, complex objects, arrays, and special formats like dates and UUIDs.
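
A minimal sketch of this kind of classification follows. The hypothetical `classify` helper and its category names are assumptions made for illustration, not the library's internal API; it recognises `$ref` pointers, the `allOf`/`anyOf`/`oneOf` combinators, simple types, objects, arrays, and the `date-time` and `uuid` string formats.

```python
from typing import Any, Dict


def classify(node: Dict[str, Any]) -> str:
    """Return a coarse category for one JSON Schema node (hypothetical helper)."""
    if "$ref" in node:
        return "reference"
    if any(key in node for key in ("allOf", "anyOf", "oneOf")):
        return "combinator"
    kind = node.get("type")
    # Special string formats are checked before plain strings
    if kind == "string" and node.get("format") in ("date-time", "uuid"):
        return f"special:{node['format']}"
    if kind in ("string", "number", "integer", "boolean", "null"):
        return "simple"
    if kind == "object":
        return "object"
    if kind == "array":
        return "array"
    return "unknown"


print(classify({"type": "string", "format": "uuid"}))  # special:uuid
print(classify({"type": "array", "items": {"type": "number"}}))  # array
```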
#### 3. Class Creation
For each definition in the schema (representing a potential data model), a Python class is dynamically generated. The library maps JSON types to Python types (e.g., `integer` to `int`, `object` to custom classes) and integrates any constraints and nested structures as class attributes.
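
The sketch below shows the general technique using Python's built-in `type(name, bases, namespace)` constructor. The type mapping, the hypothetical `make_class` helper, and the `Sample` definition are assumptions; in particular, nested `object` definitions are mapped to plain `dict` here, whereas the approach described above maps them to their own generated classes.

```python
import datetime
import uuid
from typing import Any, Dict

# Assumed mappings from JSON Schema types and formats to Python types
TYPE_MAP: Dict[str, type] = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": bool,
    "array": list,
    "object": dict,  # a real generator would point at a nested generated class
}
FORMAT_MAP: Dict[str, type] = {"date-time": datetime.datetime, "uuid": uuid.UUID}


def python_type(node: Dict[str, Any]) -> type:
    """Pick a Python type for one schema node, preferring format over type."""
    if node.get("format") in FORMAT_MAP:
        return FORMAT_MAP[node["format"]]
    return TYPE_MAP.get(node.get("type", ""), object)


def make_class(name: str, definition: Dict[str, Any]) -> type:
    """Create a class at runtime whose annotations mirror the definition's properties."""
    annotations = {
        prop: python_type(node)
        for prop, node in definition.get("properties", {}).items()
    }
    # type(name, bases, namespace) is Python's built-in dynamic class constructor
    return type(name, (object,), {"__annotations__": annotations})


Sample = make_class(
    "Sample",
    {
        "type": "object",
        "properties": {
            "Name": {"type": "string"},
            "Mass": {"type": "number"},
            "Id": {"type": "string", "format": "uuid"},
        },
    },
)
print(Sample.__name__, Sample.__annotations__)
```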
#### 4. Property Handling
Each class is equipped with properties based on the schema's definitions. Properties are added dynamically with appropriate getters, setters, and deleters to manage data access and ensure type safety. In some places, for example results data, the schema allows `additionalProperties`, which are treated as `kwargs` in Python.
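
To illustrate type-checked properties and `additionalProperties`, here is a sketch built from Python's `property(fget, fset, fdel)`. The `typed_property` factory and the `ResultPoint` class are hypothetical and not taken from `tadatakit`.

```python
from typing import Any


def typed_property(name: str, expected: type) -> property:
    """Build a property that stores its value privately and checks its type."""
    attr = f"_{name}"

    def getter(self) -> Any:
        # Unset or deleted values read back as None
        return getattr(self, attr, None)

    def setter(self, value: Any) -> None:
        if value is not None and not isinstance(value, expected):
            raise TypeError(
                f"{name} must be {expected.__name__}, got {type(value).__name__}"
            )
        setattr(self, attr, value)

    def deleter(self) -> None:
        if hasattr(self, attr):
            delattr(self, attr)

    return property(getter, setter, deleter)


class ResultPoint:
    """Hypothetical class: one declared property plus open-ended extra fields."""

    time_s = typed_property("time_s", float)

    def __init__(self, time_s: float, **kwargs: Any) -> None:
        self.time_s = time_s  # goes through the type-checking setter
        # Fields allowed by additionalProperties arrive as keyword arguments
        self.additional_properties = dict(kwargs)


point = ResultPoint(1.5, HeatFlow=0.25)
print(point.time_s, point.additional_properties)  # 1.5 {'HeatFlow': 0.25}
```

This strict `isinstance` check rejects an `int` where a `float` is expected; a real generator might choose to accept both.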
#### 5. Method Integration
Serialization and deserialization methods such as `from_json`, `to_json`, `from_dict`, and `to_dict` are integrated into each class. These methods handle conversion between JSON strings, dictionaries, and class instances, facilitating easy data exchange and storage operations.
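
The round trip these methods provide can be sketched with a dataclass and the standard `json` module. `Sample`, its fields, and the key mapping are hypothetical. The JSON methods are named `from_json_text`/`to_json_text` here because they work on strings, whereas the Quick Start passes a file path to `Experiment.from_json`.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Sample:
    """Hypothetical stand-in; generated classes may name and map fields differently."""

    name: str
    mass_mg: float

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Sample":
        # Schema keys (PascalCase here) are mapped onto Python attribute names
        return cls(name=data["Name"], mass_mg=data["MassMg"])

    def to_dict(self) -> Dict[str, Any]:
        return {"Name": self.name, "MassMg": self.mass_mg}

    @classmethod
    def from_json_text(cls, text: str) -> "Sample":
        return cls.from_dict(json.loads(text))

    def to_json_text(self) -> str:
        return json.dumps(self.to_dict())


original = Sample(name="Indium", mass_mg=5.2)
text = original.to_json_text()
print(text)  # {"Name": "Indium", "MassMg": 5.2}
print(Sample.from_json_text(text) == original)  # True
```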
#### 6. Inheritance and Composition
If the schema specifies inheritance (using `allOf`) or composition (using `anyOf` or `oneOf`), the library constructs classes that inherit from multiple bases or handle multiple data types, ensuring that the generated classes faithfully represent the schema's intended data structures.
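
The sketch below shows one way `allOf` can become multiple inheritance and `oneOf`/`anyOf` can become a `typing.Union`. The base classes, the registry dictionary, and the `#/definitions/<Name>` reference format are assumptions for illustration only.

```python
from typing import Any, Dict, Union, get_args


class Identified:
    """Hypothetical base generated from a shared 'Identified' definition."""

    id: str


class Timestamped:
    """Hypothetical base generated from a shared 'Timestamped' definition."""

    created: str


REGISTRY: Dict[str, type] = {"Identified": Identified, "Timestamped": Timestamped}


def class_from_all_of(name: str, definition: Dict[str, Any]) -> type:
    """Build a class whose bases are the classes referenced by allOf."""
    # Each allOf entry is assumed to be a {"$ref": "#/definitions/<Name>"} pointer
    bases = tuple(
        REGISTRY[entry["$ref"].rsplit("/", 1)[-1]] for entry in definition["allOf"]
    )
    return type(name, bases, {})


Step = class_from_all_of(
    "Step",
    {"allOf": [{"$ref": "#/definitions/Identified"}, {"$ref": "#/definitions/Timestamped"}]},
)
print([cls.__name__ for cls in Step.__mro__])
# ['Step', 'Identified', 'Timestamped', 'object']

# oneOf / anyOf over {"type": "number"} and {"type": "string"} maps naturally to a Union
NumberOrText = Union[float, str]
print(get_args(NumberOrText))  # (<class 'float'>, <class 'str'>)
```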
#### 7. Registration and Accessibility
Generated classes are registered in a global class registry within the library. This registry allows for easy retrieval and instantiation of classes based on schema names, supporting dynamic access and manipulation of data in a type-safe manner.
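
A registry of this kind can be as simple as a dictionary keyed by schema name. The `register` decorator and `get_class` function below are hypothetical names, not the library's API.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T", bound=type)

# Hypothetical global registry mapping schema names to classes
_CLASS_REGISTRY: Dict[str, type] = {}


def register(schema_name: str) -> Callable[[T], T]:
    """Class decorator that records a class under its schema name."""

    def decorator(cls: T) -> T:
        _CLASS_REGISTRY[schema_name] = cls
        return cls

    return decorator


def get_class(schema_name: str) -> type:
    """Look up a registered class, with a clearer error for unknown names."""
    try:
        return _CLASS_REGISTRY[schema_name]
    except KeyError:
        raise KeyError(f"No class registered for schema name {schema_name!r}") from None


@register("Sample")
class Sample:
    def __init__(self, name: str) -> None:
        self.name = name


instance = get_class("Sample")("Indium")
print(type(instance).__name__, instance.name)  # Sample Indium
```

A generator would typically store each class in the registry as it creates it, rather than relying on a decorator in hand-written code.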
## Contributing
We welcome contributions from the community and are pleased to have you join us in improving `tadatakit`. Whether you are fixing bugs, adding new features, improving documentation, or suggesting new functionality, your input is valuable!
If you are interested in contributing to the `tadatakit` library, please read our [contributing guidelines](CONTRIBUTING.md) for detailed information on how to get started, coding conventions, and the pull request process.
## Notes
TA Instruments, TA, and TRIOS are trademarks of Waters Technologies Corporation.
Raw data
{
"_id": null,
"home_page": "https://www.tainstruments.com/",
"name": "tadatakit",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": "TA Instruments, TRIOS, JSON, data parsing, data analysis, materials science, DSC, TGA, rheology",
"author": "Stuart Cook",
"author_email": "stuart_cook@waters.com",
"download_url": "https://files.pythonhosted.org/packages/d7/8c/7ae540af4795e422fa158e1e4868a34b875c4ad5e28e57d8834e61a32736/tadatakit-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A Python library for parsing and handling data exported by TA Instruments' TRIOS JSON Export",
"version": "0.1.1",
"project_urls": {
"Homepage": "https://www.tainstruments.com/",
"Repository": "https://github.com/TA-Instruments/tadatakit"
},
"split_keywords": [
"ta instruments",
" trios",
" json",
" data parsing",
" data analysis",
" materials science",
" dsc",
" tga",
" rheology"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "44c0c47b3aa7f7ae893c949360873309bd1586da78177a8c76d504c98061252d",
"md5": "c251843b538d24b01b85bc146ea36539",
"sha256": "8258878430cf21c32e4d69a25743db245cf322b4d8d726068ff97149e282228a"
},
"downloads": -1,
"filename": "tadatakit-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c251843b538d24b01b85bc146ea36539",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 23896,
"upload_time": "2024-07-17T16:51:51",
"upload_time_iso_8601": "2024-07-17T16:51:51.747157Z",
"url": "https://files.pythonhosted.org/packages/44/c0/c47b3aa7f7ae893c949360873309bd1586da78177a8c76d504c98061252d/tadatakit-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d78c7ae540af4795e422fa158e1e4868a34b875c4ad5e28e57d8834e61a32736",
"md5": "2ba71df111ca00c88e937c36f385fb86",
"sha256": "5e6c5033c313571153b1d013c7ac5eea0b8312fa39011f5bab532cc350049697"
},
"downloads": -1,
"filename": "tadatakit-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "2ba71df111ca00c88e937c36f385fb86",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.9",
"size": 24047,
"upload_time": "2024-07-17T16:51:53",
"upload_time_iso_8601": "2024-07-17T16:51:53.143413Z",
"url": "https://files.pythonhosted.org/packages/d7/8c/7ae540af4795e422fa158e1e4868a34b875c4ad5e28e57d8834e61a32736/tadatakit-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-17 16:51:53",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "TA-Instruments",
"github_project": "tadatakit",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "tadatakit"
}