Name | joinem |
Version | 0.1.5 |
home_page | |
Summary | CLI for fast, flexible concatenation of tabular data using polars. |
upload_time | 2024-02-19 19:12:38 |
maintainer | |
docs_url | None |
author | |
requires_python | >=3.8 |
license | MIT license |
keywords | polars, data processing, csv, parquet, data science |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
[
![PyPi](https://img.shields.io/pypi/v/joinem.svg?)
](https://pypi.python.org/pypi/joinem)
[
![CI](https://github.com/mmore500/joinem/actions/workflows/ci.yaml/badge.svg)
](https://github.com/mmore500/joinem/actions)
[
![GitHub stars](https://img.shields.io/github/stars/mmore500/joinem.svg?style=round-square&logo=github&label=Stars&logoColor=white)](https://github.com/mmore500/joinem)
_joinem_ provides a CLI for fast, flexible concatenation of tabular data using [polars](https://pola.rs/).
- Free software: MIT license
- Repository: <https://github.com/mmore500/joinem>
## Install
`python3 -m pip install joinem`
## Features
- Lazily streams I/O to expeditiously handle numerous large files.
- Supports CSV and parquet input files.
  - Due to current polars limitations, JSON and feather files are not supported.
  - Input formats may be mixed.
- Supports output to CSV, JSON, parquet, and feather file types.
- Allows mismatched columns and/or empty data files with `--how diagonal` and `--how diagonal_relaxed`.
- Provides a progress bar with `--progress`.
## Example Usage
Pass input filenames via stdin, one filename per line.
```
find path/to/*.parquet path/to/*.csv | python3 -m joinem out.parquet
```
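
When the list of input files is assembled in Python rather than in a shell pipeline, the same stdin convention can be reproduced with `subprocess`. The sketch below is illustrative only: the helper names `build_stdin_payload`, `build_command`, and `run_joinem` are hypothetical, and it assumes joinem is installed in the interpreter that runs it.

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List


def build_stdin_payload(paths: Iterable[Path]) -> str:
    """Render one filename per line, the stdin format joinem expects."""
    return "".join(f"{p}\n" for p in paths)


def build_command(output_file: str) -> List[str]:
    """Mirror `python3 -m joinem <output_file>` using the current interpreter."""
    return [sys.executable, "-m", "joinem", output_file]


def run_joinem(paths: Iterable[Path], output_file: str) -> None:
    """Invoke joinem as a subprocess; raises CalledProcessError on failure."""
    subprocess.run(
        build_command(output_file),
        input=build_stdin_payload(paths),
        text=True,
        check=True,
    )


# Example (requires joinem to be installed):
# run_joinem(sorted(Path("path/to").glob("*.csv")), "out.parquet")
```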
The output file type is inferred from the extension of the output file name.
Supported output types are feather, JSON, parquet, and CSV.
```
find . -name '*.parquet' | python3 -m joinem out.json
```
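
Extension-based inference can be pictured roughly as follows. This is a minimal sketch, not joinem's actual code: the `FORMAT_BY_SUFFIX` table and the `infer_output_format` function are hypothetical, and the real tool may accept additional or different extensions.

```python
from pathlib import Path

# Hypothetical extension table for illustration; joinem's real mapping may differ.
FORMAT_BY_SUFFIX = {
    ".csv": "csv",
    ".json": "json",
    ".parquet": "parquet",
    ".feather": "feather",
}


def infer_output_format(output_file: str) -> str:
    """Return the output format implied by the file name's extension."""
    suffix = Path(output_file).suffix.lower()
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported output extension: {suffix!r}") from None
```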
Use `--progress` to show a progress bar.
```
ls -1 path/{*.csv,*.pqt} | python3 -m joinem out.csv --progress
```
If columns may differ between files, use `--how diagonal`.
```
find path/to/ -name '*.csv' | python3 -m joinem out.csv --how diagonal
```
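
To make the effect of a diagonal concatenation concrete, here is a plain-Python sketch of the semantics: rows are stacked, the result carries the union of all columns, and cells a file did not provide are filled with nulls. It uses lists of dicts rather than polars frames, and `diagonal_concat` is a hypothetical helper, not part of joinem or polars.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def diagonal_concat(tables: List[List[Row]]) -> List[Row]:
    """Stack tables row-wise over the union of columns, filling gaps with None."""
    # Collect column names in first-seen order across all tables.
    columns: List[str] = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Re-emit every row with the full column set.
    return [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]


first = [{"id": "1", "x": "a"}]   # a file with columns id, x
second = [{"id": "2", "y": "b"}]  # a file with columns id, y
combined = diagonal_concat([first, second])
```

Here `combined` has the columns `id`, `x`, and `y`, with `None` wherever a source table lacked the column.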
If some files may be empty, use `--how diagonal_relaxed`, which behaves like `diagonal` but also casts mismatched column types to a common supertype.

To run via Singularity/Apptainer:
```
ls -1 *.csv | singularity run docker://ghcr.io/mmore500/joinem out.feather
```
## API
```
usage: __main__.py [-h] [--version] [--progress]
                   [--how {vertical,horizontal,diagonal,diagonal_relaxed}]
                   output_file

Concatenate CSV and/or parquet tabular data files.

positional arguments:
  output_file           Output file name

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --progress            Show progress bar
  --how {vertical,horizontal,diagonal,diagonal_relaxed}
                        How to concatenate frames. See
                        <https://docs.pola.rs/py-polars/html/reference/api/polars.concat.html>
                        for more information.

Provide input filenames via stdin. Example:
find path/to/ -name '*.csv' | python3 -m joinem out.csv
```
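
For readers who want to experiment with the interface, the help text above can be approximated with `argparse`. This is a reconstruction from the usage message only, not joinem's source; `make_parser` is a hypothetical name, and the default value for `--how` is an assumption (the help text does not state one).

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the same arguments as the usage message."""
    parser = argparse.ArgumentParser(
        description="Concatenate CSV and/or parquet tabular data files.",
    )
    parser.add_argument("--version", action="version", version="0.1.5")
    parser.add_argument("--progress", action="store_true", help="Show progress bar")
    parser.add_argument(
        "--how",
        choices=["vertical", "horizontal", "diagonal", "diagonal_relaxed"],
        default="vertical",  # assumed default; not stated in the help text
        help="How to concatenate frames.",
    )
    parser.add_argument("output_file", help="Output file name")
    return parser


# The output file is positional; options may appear before or after it.
args = make_parser().parse_args(["out.csv", "--how", "diagonal"])
```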
Raw data
{
"_id": null,
"home_page": "",
"name": "joinem",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "polars,data processing,CSV,parquet,data science",
"author": "",
"author_email": "Matthew Andres moreno <m.more500@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/d6/5a/7dda20bf3bd078017ef6f39890a4c0ade79abb864b4297918264435c7dd1/joinem-0.1.5.tar.gz",
"platform": null,
"description": "[\n![PyPi](https://img.shields.io/pypi/v/joinem.svg?)\n](https://pypi.python.org/pypi/joinem)\n[\n![CI](https://github.com/mmore500/joinem/actions/workflows/ci.yaml/badge.svg)\n](https://github.com/mmore500/joinem/actions)\n[\n![GitHub stars](https://img.shields.io/github/stars/mmore500/joinem.svg?style=round-square&logo=github&label=Stars&logoColor=white)](https://github.com/mmore500/joinem)\n\n_joinem_ provides a CLI for fast, flexbile concatenation of tabular data using [polars](https://pola.rs/)\n\n- Free software: MIT license\n- Repository: <https://github.com/mmore500/joinem>\n\n## Install\n\n`python3 -m pip install joinem`\n\n## Features\n\n- Lazily streams I/O to expeditiously handle numerous large files.\n- Supports CSV and parquet input files.\n - Due to current polars limitations, JSON and feather files are not supported.\n - Input formats may be mixed.\n- Supports output to CSV, JSON, parquet, and feather file types.\n- Allows mismatched columns and/or empty data files with `--how diagonal` and `--how diagonal_relaxed`.\n- Provides a progress bar with `--progress`.\n\n## Example Usage\n\nPass input filenames via stdin, one filename per line.\n```\nfind path/to/*.parquet path/to/*.csv | python3 -m joinem out.parquet\n```\n\nOutput file type is inferred from the extension of the output file name.\nSupported output types are feather, JSON, parquet, and csv.\n```\nfind -name '*.parquet' | python3 -m joinem out.json\n```\n\nUse `--progress` to show a progress bar.\n```\nls -1 path/{*.csv,*.pqt} | python3 -m joinem out.csv --progress\n```\n\nIf file columns may mismatch, use `--how diagonal`.\n```\nfind path/to/ -name '*.csv' | python3 -m joinem out.csv --how diagonal\n```\n\nIf some files may be empty, use `--how diagonal_relaxed`.\n\nTo run via Singularity/Apptainer,\n```\nls -1 *.csv | singularity run docker://ghcr.io/mmore500/joinem out.feather\n```\n\n## API\n\n```\nusage: __main__.py [-h] [--version] [--progress]\n [--how 
{vertical,horizontal,diagonal,diagonal_relaxed}]\n output_file\n\nConcatenate CSV and/or parquet tabular data files.\n\npositional arguments:\n output_file Output file name\n\noptions:\n -h, --help show this help message and exit\n --version show program's version number and exit\n --progress Show progress bar\n --how {vertical,horizontal,diagonal,diagonal_relaxed}\n How to concatenate frames. See <https://docs.pola.rs/py-\n polars/html/reference/api/polars.concat.html> for more information.\n\nProvide input filenames via stdin. Example: find path/to/ -name '*.csv' | python3 -m joinem\n-o out.csv\n```\n",
"bugtrack_url": null,
"license": "MIT license",
"summary": "CLI for fast, flexbile concatenation of tabular data using polars.",
"version": "0.1.5",
"project_urls": {
"documentation": "https://github.com/mmore500/joinem",
"homepage": "https://github.com/mmore500/joinem",
"repository": "https://github.com/mmore500/joinem",
"tracker": "https://github.com/mmore500/joinem/issues"
},
"split_keywords": [
"polars",
"data processing",
"csv",
"parquet",
"data science"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "fec74e4962ea54832ddd5cb027987707f1cfb8590f84de364fa1bee5cacbf289",
"md5": "1f61d999a92828647c816fdf04402663",
"sha256": "92fafb87dedf277d8e93ff9b4c9370aa1e8462a5dcefe8cb5e49d96e0b7a0476"
},
"downloads": -1,
"filename": "joinem-0.1.5-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "1f61d999a92828647c816fdf04402663",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.8",
"size": 4944,
"upload_time": "2024-02-19T19:12:37",
"upload_time_iso_8601": "2024-02-19T19:12:37.352828Z",
"url": "https://files.pythonhosted.org/packages/fe/c7/4e4962ea54832ddd5cb027987707f1cfb8590f84de364fa1bee5cacbf289/joinem-0.1.5-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d65a7dda20bf3bd078017ef6f39890a4c0ade79abb864b4297918264435c7dd1",
"md5": "62ffae6d118cb01ae82fd9dab39447ed",
"sha256": "fe034de9fb8f246e323a1f49657b40dfc5474152b5b8a5558f0a08c8e4861b8b"
},
"downloads": -1,
"filename": "joinem-0.1.5.tar.gz",
"has_sig": false,
"md5_digest": "62ffae6d118cb01ae82fd9dab39447ed",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 4753,
"upload_time": "2024-02-19T19:12:38",
"upload_time_iso_8601": "2024-02-19T19:12:38.352990Z",
"url": "https://files.pythonhosted.org/packages/d6/5a/7dda20bf3bd078017ef6f39890a4c0ade79abb864b4297918264435c7dd1/joinem-0.1.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-19 19:12:38",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "mmore500",
"github_project": "joinem",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "joinem"
}