# RXN utilities package
This repository contains general Python utilities commonly used in the RXN universe.
For utilities related to chemistry, see our other repository [`rxn-chemutils`](https://github.com/rxn4chemistry/rxn-chemutils).
Links:
* [GitHub repository](https://github.com/rxn4chemistry/rxn-utilities)
* [Documentation](https://rxn4chemistry.github.io/rxn-utilities/)
* [PyPI package](https://pypi.org/project/rxn-utils/)
## System Requirements
This package is supported on all operating systems.
It has been tested on the following systems:
+ macOS: Big Sur (11.1)
+ Linux: Ubuntu 18.04.4
Python 3.7 or greater is required.
## Installation guide
The package can be installed from PyPI:
```bash
pip install rxn-utils
```
For local development, the package can be installed with:
```bash
pip install -e ".[dev]"
```
## Package highlights
### File-related utilities
* [`load_list_from_file`](./src/rxn/utilities/files.py): read a file into a list of strings.
* [`iterate_lines_from_file`](./src/rxn/utilities/files.py): same as `load_list_from_file`, but produces an iterator instead of a list. This can be much more memory-efficient.
* [`dump_list_to_file`](./src/rxn/utilities/files.py) and [`append_to_file`](./src/rxn/utilities/files.py): write an iterable of strings to a file (one per line).
* [`named_temporary_path`](./src/rxn/utilities/files.py) and [`named_temporary_directory`](./src/rxn/utilities/files.py): provide a context with a file or directory that will be deleted when the context closes. Useful for unit tests.
```pycon
>>> from rxn.utilities.files import named_temporary_path
>>> with named_temporary_path() as temporary_path:
...     # do something on the temporary path.
...     # The file or directory at that path will be deleted at the
...     # end of the context, except if delete=False.
...     pass
```
* ... and others.
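
The difference between loading a file into a list and iterating over its lines can be illustrated with the standard library alone. The sketch below is not the package's implementation; `iter_lines`, `read_lines`, and `demo_lines` are hypothetical names used only for illustration.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List


def iter_lines(path: Path) -> Iterator[str]:
    """Yield lines one at a time, without the trailing newline."""
    with open(path, "rt") as f:
        for line in f:
            yield line.rstrip("\r\n")


def read_lines(path: Path) -> List[str]:
    """Materialize all lines in a list (holds the whole file in memory)."""
    return list(iter_lines(path))


def demo_lines() -> List[str]:
    # Write three lines to a throwaway file, then read them back.
    with tempfile.TemporaryDirectory() as directory:
        path = Path(directory) / "values.txt"
        path.write_text("a\nb\nc\n")
        return read_lines(path)
```

The generator keeps only one line in memory at a time, which is why the iterator variant scales better to large files.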
### CSV-related functionality
* The function [`iterate_csv_column`](./src/rxn/utilities/csv/column_iterator.py) and the related executable `rxn-extract-csv-column` provide an easy way to extract a single column from a CSV file.
* The [`StreamingCsvEditor`](./src/rxn/utilities/csv/streaming_csv_editor.py) applies a series of operations to a CSV file without loading it fully into memory.
It is used, for instance, in [`rxn-reaction-preprocessing`](https://github.com/rxn4chemistry/rxn-reaction-preprocessing).
See a few examples in the [unit tests](./tests/csv/test_streaming_csv_editor.py).
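
As a rough illustration of streaming a single column out of a CSV file, the following sketch uses only the standard `csv` module. It does not reproduce the package's API; `iter_column` and the sample data are assumptions made for this example.

```python
import csv
import io
from typing import Iterator, TextIO


def iter_column(stream: TextIO, column: str) -> Iterator[str]:
    """Yield the values of one column, reading one row at a time."""
    reader = csv.DictReader(stream)
    for row in reader:
        yield row[column]


# An in-memory CSV stands in for a file on disk.
sample = io.StringIO("id,name\n1,first\n2,second\n")
column_values = list(iter_column(sample, "name"))
```

Because rows are consumed lazily, the full CSV never needs to fit in memory.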
### Stable shuffling
For reproducible shuffling, or for shuffling two files of identical length so that the same permutation is obtained, one can use the [`stable_shuffle`](./src/rxn/utilities/files.py) function.
The executable `rxn-stable-shuffle` is also provided for this purpose.
Both also work with CSV files if the appropriate flag is provided.
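
The idea of obtaining the same permutation for two sequences of identical length can be sketched with a seeded `random.Random` instance. This is a minimal illustration under that assumption, not the algorithm used by `stable_shuffle`; `seeded_shuffle` is a hypothetical helper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def seeded_shuffle(values: Sequence[T], seed: int) -> List[T]:
    """Return a shuffled copy; the permutation depends only on seed and length."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled


sources = ["src0", "src1", "src2", "src3", "src4"]
targets = ["tgt0", "tgt1", "tgt2", "tgt3", "tgt4"]

# Same seed and same length: both lists are permuted identically,
# so corresponding entries stay aligned.
shuffled_sources = seeded_shuffle(sources, seed=42)
shuffled_targets = seeded_shuffle(targets, seed=42)
```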
### `chunker` and `remove_duplicates`
For batching an iterable into lists of a specified size, `chunker` comes in handy.
It also does so in a memory-efficient way.
```pycon
>>> from rxn.utilities.containers import chunker
>>> for chunk in chunker(range(1, 10), chunk_size=4):
...     print(chunk)
[1, 2, 3, 4]
[5, 6, 7, 8]
[9]
```
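
For readers curious how such batching can stay memory-efficient, here is one possible approach based on `itertools.islice`. It is a sketch, not the package's implementation; `chunk_iterable` is a hypothetical name.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunk_iterable(iterable: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most chunk_size items."""
    iterator = iter(iterable)
    while True:
        # Pull at most chunk_size items; only one chunk is held at a time.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


chunks = list(chunk_iterable(range(1, 10), chunk_size=4))
```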
[`remove_duplicates`](./src/rxn/utilities/containers.py) (or [`iterate_unique_values`](./src/rxn/utilities/containers.py), its memory-efficient variant) removes duplicates from a container, possibly based on a callable instead of the values:
```pycon
>>> from rxn.utilities.containers import remove_duplicates
>>> remove_duplicates([3, 6, 9, 2, 3, 1, 9])
[3, 6, 9, 2, 1]
>>> remove_duplicates(["ab", "cd", "efg", "hijk", "", "lmn"], key=lambda x: len(x))
['ab', 'efg', 'hijk', '']
```
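
Order-preserving deduplication with an optional key can be sketched as a generator that remembers the keys it has already seen. The code below is an illustration under that assumption rather than the package's code; `unique_values` is a hypothetical name.

```python
from typing import Callable, Hashable, Iterable, Iterator, Optional, Set, TypeVar

T = TypeVar("T")


def unique_values(
    iterable: Iterable[T], key: Optional[Callable[[T], Hashable]] = None
) -> Iterator[T]:
    """Yield the first value seen for each distinct key, in input order."""
    seen: Set[Hashable] = set()
    for value in iterable:
        marker = value if key is None else key(value)
        if marker not in seen:
            seen.add(marker)
            yield value


deduplicated = list(unique_values([3, 6, 9, 2, 3, 1, 9]))
by_length = list(unique_values(["ab", "cd", "efg", "hijk", "", "lmn"], key=len))
```

Only the set of seen keys is stored, so the input itself can be an arbitrarily long iterator.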
### Regex utilities
[`regex.py`](./src/rxn/utilities/regex.py) provides a few functions that make it easier to build regex strings (considering whether segments should be optional, capturing, etc.).
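
To give a flavor of what such helpers can look like, the sketch below composes a regex from small building blocks. The function names `group` and `optional_segment` are hypothetical and are not taken from `regex.py`.

```python
import re


def group(segment: str, capture: bool = True) -> str:
    """Wrap a segment in a capturing or non-capturing group."""
    return f"({segment})" if capture else f"(?:{segment})"


def optional_segment(segment: str, capture: bool = False) -> str:
    """Make a whole segment optional by grouping it first."""
    return group(segment, capture=capture) + "?"


# Matches e.g. "v2" or "v2.1", capturing the major and the optional minor number.
version_regex = "v" + group(r"\d+") + optional_segment(r"\." + group(r"\d+"))
match_full = re.fullmatch(version_regex, "v2.1")
match_short = re.fullmatch(version_regex, "v2")
```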
### Others
* A custom, more general enum class, [`RxnEnum`](./src/rxn/utilities/types.py).
* [`remove_prefix`](./src/rxn/utilities/strings.py), [`remove_postfix`](./src/rxn/utilities/strings.py).
* Initialization of loggers, in a `logging`-compatible way: [`logging.py`](./src/rxn/utilities/logging.py).
* [`sandboxed_random_context`](./src/rxn/utilities/basic.py) and [`temporary_random_seed`](./src/rxn/utilities/basic.py), to create a context with a specific random state that will not have side effects.
This is especially useful in unit tests.
* ... and others.
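
The notion of a random context without side effects can be illustrated by saving and restoring the state of Python's `random` module. This is a minimal sketch that covers only the standard-library generator; `isolated_random_state` is a hypothetical name, not the package's function.

```python
import random
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def isolated_random_state(seed: Optional[int] = None) -> Iterator[None]:
    """Run a block with its own random state, then restore the previous one."""
    state = random.getstate()
    try:
        if seed is not None:
            random.seed(seed)
        yield
    finally:
        # Restore the outer state so the block leaves no trace.
        random.setstate(state)


# Reference: the value the global generator produces right after seeding.
random.seed(0)
expected_next = random.random()

# Same starting point, but with a sandboxed block in between.
random.seed(0)
with isolated_random_state(seed=123):
    first_inside = random.random()
after_context = random.random()

with isolated_random_state(seed=123):
    second_inside = random.random()
```

Here `after_context` equals `expected_next`, since the block did not advance the outer generator, and the two values drawn inside the seeded contexts are identical.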