# `fmtr.tools`
A collection of high-level tools to simplify everyday development tasks, with a slight focus on full-stack AI/ML.
This repository is an attempt to provide a one-stop source for a wide range of utilities and tools designed to streamline a typical, modern development workflow. There is an emphasis on a lean and nimble approach to dependencies, which tries to balance powerful functionality against unnecessary bloat.
## Why?
Personally, I'm grossly impatient, and simply resent writing the same code, however simple, in multiple projects.
This could be trivial stuff like reading an integer from an environment variable (while handling errors gracefully), or something more complex (like just wanting a simple parallel-processing function without writing Queues, or remembering which libraries you need to do it for you).
At the same time, I find that traditional tools collections inevitably become bloated and unwieldy over time, so I wanted something with a somewhat sophisticated approach to dependencies.
## Key Features
- Wide-Ranging Utilities: The collection includes tools for configuration, data types, environment management, functions, hashing, importing, iterating, JSON handling, path manipulation, platform-specific operations, randomness, and string operations.
- Lean Dependencies: Dependencies are managed via extras, allowing you to install only what you need. Missing dependencies are handled in a clear way, telling you what's missing and how to install it.
## Installing
The base library can be installed like this:
```bash
pip install fmtr.tools
```
## Usage
Some simple import and usage examples:
### Read an integer from an environment variable and write it to a (human-readable) JSON file
```python
from fmtr import tools
from fmtr.tools import Path

# Read MY_VALUE as an integer, using the default if it is not set.
value = tools.env.get_int('MY_VALUE', default=None)
data = dict(value=value)
Path('data.json').write_json(data)
```
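
For comparison, this is roughly the boilerplate the snippet above saves you from writing against the standard library alone. It is a minimal sketch, not how `fmtr.tools` is implemented: `get_int_from_env` and `write_json` are hypothetical helpers, and falling back to the default on an unparsable value is only an assumption about what "handling errors gracefully" could mean.

```python
import json
import os
from pathlib import Path


def get_int_from_env(name, default=None):
    """Return the environment variable `name` as an int, or `default`.

    Hypothetical helper: falls back to `default` when the variable is
    unset or cannot be parsed as an integer.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw.strip())
    except ValueError:
        return default


def write_json(path, data):
    """Write `data` to `path` as indented JSON, keeping non-ASCII text readable."""
    text = json.dumps(data, indent=4, ensure_ascii=False)
    Path(path).write_text(text, encoding='utf-8')
```

Calling `write_json('data.json', dict(value=get_int_from_env('MY_VALUE')))` would then mirror the three-line example above.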
### Zero-faff parallel multi-processing
Install the extra:
```bash
pip install "fmtr.tools[parallel]" --upgrade
```
```python
from fmtr.tools import parallel


def expensive_computation(n):
    """CPU-bound toy workload: sum sqrt(i) * sin(i) * log(i) for i in 1..n."""
    import math
    result = 0
    for i in range(1, n + 1):
        result += math.sqrt(i) * math.sin(i) * math.log(i)
    return result


if __name__ == '__main__':
    # Apply the function to each of the 1,000 inputs in parallel.
    results = parallel.apply(expensive_computation, [10_000] * 1_000)
```
## Extras
Most tools require no additional dependencies, but for any that do, you can add them like this:
```bash
pip install "fmtr.tools[<extra>]" --upgrade
```
If you try to use a module without the required extras, you'll get a message telling you which one is needed:
```
MissingExtraError: The current module is missing dependencies. To install them, run: `pip install fmtr.tools[logging] --upgrade`
```
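
The general pattern behind this kind of message can be sketched as a guarded import. This is only an illustration under stated assumptions, not the library's actual code: the `import_extra` helper, its signature, and the choice to subclass `ImportError` are all hypothetical. Only the error name and the message wording are taken from the output above.

```python
import importlib


class MissingExtraError(ImportError):
    """Raised when an optional dependency group (an "extra") is not installed."""


def import_extra(module_name, extra, package='fmtr.tools'):
    """Import `module_name`, or raise MissingExtraError naming the extra to install.

    Hypothetical helper: `module_name` is the third-party module a tool needs,
    and `extra` is the name of the extra that provides it.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exception:
        message = (
            f'The current module is missing dependencies. '
            f'To install them, run: `pip install {package}[{extra}] --upgrade`'
        )
        # Chain the original error so the real cause stays in the traceback.
        raise MissingExtraError(message) from exception
```

A module that depends on an optional package could call something like `import_extra('loguru', 'logging')` at import time, so the failure appears immediately and names the fix.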
## Modules
The included modules, plus any extra requirements, are as follows:
- `tools.ai`: Manages bulk inference for LLMs using dynamic batching. Includes classes for managing prompt encoding, generating outputs, and handling tool calls, with support for both local and remote models. Uses PyTorch and Transformers for model operations.
  - Extras: `ai`
- `tools.config`: Base config class with overridable field processors.
  - Extras: None
- `tools.dataclass`: Utilities for extracting and filtering fields and metadata from dataclasses, with support for applying filters and retrieving enabled fields based on metadata attributes.
  - Extras: None
- `tools.datatype`
  - Extras: None
- `tools.dm`: Defines custom data modelling base classes for creating Pydantic models with error-tolerant deserialization from JSON (e.g. when output from an LLM).
  - Extras: `dm`
- `tools.environment`: Tools for managing environment variables, including functions to retrieve variables with type conversions and default values. Features include environment variable fetching, handling missing variables, and creating type-specific getters for integers, floats, booleans, dates, and paths.
  - Extras: None
- `tools.env`: Alias of `tools.environment`.
  - Extras: None
- `tools.function`: Utilities for combining and splitting arguments and keyword arguments.
  - Extras: None
- `tools.hash`: String hashing.
  - Extras: None
- `tools.hfh`: Utilities for caching and managing Hugging Face model repositories: setting tokens, downloading snapshots, tagging repositories, and retrieving local cache paths.
  - Extras: `hfh`
- `tools.html`: Utilities for converting HTML documents to plain text.
  - Extras: `html`
- `tools.interface`: Provides a base class for building Streamlit interfaces with a class-based structure.
  - Extras: `interface`
- `tools.iterator`: Pivoting/unpivoting data structures.
  - Extras: None
- `tools.json`: Serialisation/deserialisation to human-readable, unicode JSON.
  - Extras: None
- `tools.merge`: Utility for recursively merging multiple dictionaries or objects using the DeepMerge library.
  - Extras: `merge`
- `tools.name`: Generates random memorable names (similar to Docker container names) by combining an adjective with a surname.
  - Extras: None
- `tools.openai`: Utilities for interacting with the OpenAI API, simple text-to-text output, etc.
  - Extras: `openai.api`
- `tools.Path`: Enhanced `pathlib.Path` object with additional functionality for Windows-to-Unix path conversion, reading/writing JSON and YAML files, and convenient directory creation with parent directories. Includes methods for obtaining paths to modules and temporary directories.
  - Extras: None
- `tools.platform`: Detecting if host is WSL, Docker, etc.
  - Extras: None
- `tools.ContextProcess`: Manages a function running in a separate process using a context manager. Provides methods to start, stop, and restart the process, with configurable restart delays. Useful for ensuring clean process management and automatic stopping when the context manager exits.
  - Extras: None
- `tools.random`: Provides additional functions for random number generation and selection, useful for data augmentation.
  - Extras: None
- `tools.semantic`: Manages semantic similarity operations using Sentence Transformers: loading a pre-trained model, vectorizing a text corpus, and retrieving the top matches based on similarity scores for a given query string.
  - Extras: `semantic`
- `tools.string`: Provides utilities for handling string formatting.
  - Extras: None
- `tools.logging`: Configures and initializes a logger using the Loguru library. Provides customizable logging formats for time, level, file, function, and message components.
  - Extras: `logging`
- `tools.logger`: Prefabricated `logger` object, suitable for most projects, timestamped, color-coded, etc.
  - Extras: `logging`
- `tools.augmentation`: Data augmentation stub.
  - Extras: `augmentation`
- `tools.Container`: Runs a Docker container within a context manager, ensuring the container is stopped and removed when the context is exited.
  - Extras: `docker.api`
- `tools.parallel`: Provides utilities for parallel computation using Dask. Supports executing functions across multiple workers or processes, handles different data formats, and offers options for progress display and parallelism configuration.
  - Extras: `parallel`
- `tools.profiling`: Context-based code timing.
  - Extras: `profiling`
- `tools.tokenization`: Provides utilities for creating and configuring tokenizers using the Tokenizers library. Includes functions for training both word-level and byte-pair encoding (BPE) tokenizers, applying special formatting and templates, and managing tokenizer configurations such as padding, truncation, and special tokens.
  - Extras: `tokenization`
- `tools.unicode`: Simple unicode decoding (via `Unidecode`).
  - Extras: `unicode`
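
To give a flavour of what "context-based code timing" means in the list above, here is a minimal standard-library sketch of the idea. It does not show the `tools.profiling` API: the `timer` name, its `label` parameter, and the dictionary it yields are all assumptions made for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label='block'):
    """Time the enclosed block and print the elapsed seconds on exit."""
    result = {'label': label, 'seconds': None}
    start = time.perf_counter()
    try:
        # Hand the result dict to the caller; it is filled in on exit.
        yield result
    finally:
        result['seconds'] = time.perf_counter() - start
        print(f"{label}: {result['seconds']:.4f}s")
```

Wrapping any block in `with timer('load data') as t:` prints the elapsed time when the block exits, even if it raises, and leaves the measurement in `t['seconds']` for later use.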
## Contribution
Any contributions would be most welcome! If you have a utility that fits well within this collection, or improvements to existing tools, feel free to open a pull request.
## License
This project is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.
Raw data
{
"_id": null,
"home_page": "https://github.com/fmtr/fmtr.tools",
"name": "fmtr.tools",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Frontmatter",
"author_email": "innovative.fowler@mask.pro.fmtr.dev",
"download_url": "https://files.pythonhosted.org/packages/a3/e3/bd09a70d427a4df2a0b255017f14bd5ab42bf993fa56aaa65d9f32f6cebb/fmtr.tools-1.0.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Copyright \u00a9 2024 Frontmatter. All rights reserved.",
"summary": "Collection of high-level tools to simplify everyday development tasks, with a focus on AI/ML",
"version": "1.0.4",
"project_urls": {
"Homepage": "https://github.com/fmtr/fmtr.tools"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6e44fcd94b5b13eab27387d1bcb5471d95bd45dd27058f69925ce4a1021cf49a",
"md5": "19bb1cb48e26eb10fff938a5e79514dc",
"sha256": "ad423a7ec1797f7bb718b4c22ff61db04d05487d773876024b0cf4b2df131900"
},
"downloads": -1,
"filename": "fmtr.tools-1.0.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "19bb1cb48e26eb10fff938a5e79514dc",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 44589,
"upload_time": "2024-11-22T12:36:28",
"upload_time_iso_8601": "2024-11-22T12:36:28.872560Z",
"url": "https://files.pythonhosted.org/packages/6e/44/fcd94b5b13eab27387d1bcb5471d95bd45dd27058f69925ce4a1021cf49a/fmtr.tools-1.0.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a3e3bd09a70d427a4df2a0b255017f14bd5ab42bf993fa56aaa65d9f32f6cebb",
"md5": "6aa5de1cafaf956f60f28f5561cf3ca7",
"sha256": "afd12705c186db317f19227e66287045d148f174ed9c344711353293244c2745"
},
"downloads": -1,
"filename": "fmtr.tools-1.0.4.tar.gz",
"has_sig": false,
"md5_digest": "6aa5de1cafaf956f60f28f5561cf3ca7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 36531,
"upload_time": "2024-11-22T12:36:30",
"upload_time_iso_8601": "2024-11-22T12:36:30.923661Z",
"url": "https://files.pythonhosted.org/packages/a3/e3/bd09a70d427a4df2a0b255017f14bd5ab42bf993fa56aaa65d9f32f6cebb/fmtr.tools-1.0.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-22 12:36:30",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "fmtr",
"github_project": "fmtr.tools",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"lcname": "fmtr.tools"
}