joinem


- Name: joinem
- Version: 0.1.5
- Summary: CLI for fast, flexible concatenation of tabular data using polars.
- Upload time: 2024-02-19 19:12:38
- Requires Python: >=3.8
- License: MIT license
- Keywords: polars, data processing, csv, parquet, data science
- Requirements: No requirements were recorded.
- Travis-CI: No Travis.
- Coveralls test coverage: No coveralls.

[
![PyPi](https://img.shields.io/pypi/v/joinem.svg?)
](https://pypi.python.org/pypi/joinem)
[
![CI](https://github.com/mmore500/joinem/actions/workflows/ci.yaml/badge.svg)
](https://github.com/mmore500/joinem/actions)
[
![GitHub stars](https://img.shields.io/github/stars/mmore500/joinem.svg?style=round-square&logo=github&label=Stars&logoColor=white)](https://github.com/mmore500/joinem)

_joinem_ provides a CLI for fast, flexible concatenation of tabular data using [polars](https://pola.rs/).

- Free software: MIT license
- Repository: <https://github.com/mmore500/joinem>

## Install

`python3 -m pip install joinem`

## Features

- Lazily streams I/O to expeditiously handle numerous large files.
- Supports CSV and parquet input files.
    - Due to current polars limitations, JSON and feather files are not supported.
    - Input formats may be mixed.
- Supports output to CSV, JSON, parquet, and feather file types.
- Allows mismatched columns and/or empty data files with `--how diagonal` and `--how diagonal_relaxed`.
- Provides a progress bar with `--progress`.

## Example Usage

Pass input filenames via stdin, one filename per line.
```
find path/to/*.parquet path/to/*.csv | python3 -m joinem out.parquet
```

Output file type is inferred from the extension of the output file name.
Supported output types are feather, JSON, parquet, and CSV.
```
find -name '*.parquet' | python3 -m joinem out.json
```

Use `--progress` to show a progress bar.
```
ls -1 path/{*.csv,*.pqt} | python3 -m joinem out.csv --progress
```

If file columns may mismatch, use `--how diagonal`.
```
find path/to/ -name '*.csv' | python3 -m joinem out.csv --how diagonal
```

If some files may be empty, use `--how diagonal_relaxed`.
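
As a minimal sketch of the empty-file case, the snippet below builds two small sample inputs and concatenates them with `--how diagonal_relaxed`. The scratch directory, the file names, and the use of a header-only CSV as the "empty" file are assumptions made for illustration. The joinem call itself is skipped when the package is not installed.

```shell
# Hypothetical scratch directory and file names, for illustration only.
workdir="$(mktemp -d)"

# One CSV with rows, and one header-only CSV standing in for an "empty" file.
printf 'a,b\n1,2\n3,4\n' > "$workdir/with_rows.csv"
printf 'a,b\n' > "$workdir/header_only.csv"

# Count the inputs that will be passed on stdin, one filename per line.
n_inputs="$(find "$workdir" -name '*.csv' | wc -l | tr -d ' ')"
echo "concatenating $n_inputs files"

# Run joinem only if it is installed in the current Python environment.
if python3 -c 'import joinem' 2>/dev/null; then
  find "$workdir" -name '*.csv' \
    | python3 -m joinem "$workdir/out.parquet" --how diagonal_relaxed
fi
```

Writing the output as parquet keeps it from matching the `*.csv` pattern if the command is run again in the same directory.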

To run via Singularity/Apptainer:
```
ls -1 *.csv | singularity run docker://ghcr.io/mmore500/joinem out.feather
```

## API

```
usage: __main__.py [-h] [--version] [--progress]
                   [--how {vertical,horizontal,diagonal,diagonal_relaxed}]
                   output_file

Concatenate CSV and/or parquet tabular data files.

positional arguments:
  output_file           Output file name

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --progress            Show progress bar
  --how {vertical,horizontal,diagonal,diagonal_relaxed}
                        How to concatenate frames. See <https://docs.pola.rs/py-
                        polars/html/reference/api/polars.concat.html> for more information.

Provide input filenames via stdin. Example: find path/to/ -name '*.csv' | python3 -m joinem
-o out.csv
```

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "joinem",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "polars,data processing,CSV,parquet,data science",
    "author": "",
    "author_email": "Matthew Andres moreno <m.more500@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/d6/5a/7dda20bf3bd078017ef6f39890a4c0ade79abb864b4297918264435c7dd1/joinem-0.1.5.tar.gz",
    "platform": null,
    "description": "[\n![PyPi](https://img.shields.io/pypi/v/joinem.svg?)\n](https://pypi.python.org/pypi/joinem)\n[\n![CI](https://github.com/mmore500/joinem/actions/workflows/ci.yaml/badge.svg)\n](https://github.com/mmore500/joinem/actions)\n[\n![GitHub stars](https://img.shields.io/github/stars/mmore500/joinem.svg?style=round-square&logo=github&label=Stars&logoColor=white)](https://github.com/mmore500/joinem)\n\n_joinem_ provides a CLI for fast, flexbile concatenation of tabular data using [polars](https://pola.rs/)\n\n- Free software: MIT license\n- Repository: <https://github.com/mmore500/joinem>\n\n## Install\n\n`python3 -m pip install joinem`\n\n## Features\n\n- Lazily streams I/O to expeditiously handle numerous large files.\n- Supports CSV and parquet input files.\n    - Due to current polars limitations, JSON and feather files are not supported.\n    - Input formats may be mixed.\n- Supports output to CSV, JSON, parquet, and feather file types.\n- Allows mismatched columns and/or empty data files with `--how diagonal` and `--how diagonal_relaxed`.\n- Provides a progress bar with `--progress`.\n\n## Example Usage\n\nPass input filenames via stdin, one filename per line.\n```\nfind path/to/*.parquet path/to/*.csv | python3 -m joinem out.parquet\n```\n\nOutput file type is inferred from the extension of the output file name.\nSupported output types are feather, JSON, parquet, and csv.\n```\nfind -name '*.parquet' | python3 -m joinem out.json\n```\n\nUse `--progress` to show a progress bar.\n```\nls -1 path/{*.csv,*.pqt} | python3 -m joinem out.csv --progress\n```\n\nIf file columns may mismatch, use `--how diagonal`.\n```\nfind path/to/ -name '*.csv' | python3 -m joinem out.csv --how diagonal\n```\n\nIf some files may be empty, use `--how diagonal_relaxed`.\n\nTo run via Singularity/Apptainer,\n```\nls -1 *.csv | singularity run docker://ghcr.io/mmore500/joinem out.feather\n```\n\n## API\n\n```\nusage: __main__.py [-h] [--version] [--progress]\n              
     [--how {vertical,horizontal,diagonal,diagonal_relaxed}]\n                   output_file\n\nConcatenate CSV and/or parquet tabular data files.\n\npositional arguments:\n  output_file           Output file name\n\noptions:\n  -h, --help            show this help message and exit\n  --version             show program's version number and exit\n  --progress            Show progress bar\n  --how {vertical,horizontal,diagonal,diagonal_relaxed}\n                        How to concatenate frames. See <https://docs.pola.rs/py-\n                        polars/html/reference/api/polars.concat.html> for more information.\n\nProvide input filenames via stdin. Example: find path/to/ -name '*.csv' | python3 -m joinem\n-o out.csv\n```\n",
    "bugtrack_url": null,
    "license": "MIT license",
    "summary": "CLI for fast, flexbile concatenation of tabular data using polars.",
    "version": "0.1.5",
    "project_urls": {
        "documentation": "https://github.com/mmore500/joinem",
        "homepage": "https://github.com/mmore500/joinem",
        "repository": "https://github.com/mmore500/joinem",
        "tracker": "https://github.com/mmore500/joinem/issues"
    },
    "split_keywords": [
        "polars",
        "data processing",
        "csv",
        "parquet",
        "data science"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fec74e4962ea54832ddd5cb027987707f1cfb8590f84de364fa1bee5cacbf289",
                "md5": "1f61d999a92828647c816fdf04402663",
                "sha256": "92fafb87dedf277d8e93ff9b4c9370aa1e8462a5dcefe8cb5e49d96e0b7a0476"
            },
            "downloads": -1,
            "filename": "joinem-0.1.5-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1f61d999a92828647c816fdf04402663",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.8",
            "size": 4944,
            "upload_time": "2024-02-19T19:12:37",
            "upload_time_iso_8601": "2024-02-19T19:12:37.352828Z",
            "url": "https://files.pythonhosted.org/packages/fe/c7/4e4962ea54832ddd5cb027987707f1cfb8590f84de364fa1bee5cacbf289/joinem-0.1.5-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d65a7dda20bf3bd078017ef6f39890a4c0ade79abb864b4297918264435c7dd1",
                "md5": "62ffae6d118cb01ae82fd9dab39447ed",
                "sha256": "fe034de9fb8f246e323a1f49657b40dfc5474152b5b8a5558f0a08c8e4861b8b"
            },
            "downloads": -1,
            "filename": "joinem-0.1.5.tar.gz",
            "has_sig": false,
            "md5_digest": "62ffae6d118cb01ae82fd9dab39447ed",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 4753,
            "upload_time": "2024-02-19T19:12:38",
            "upload_time_iso_8601": "2024-02-19T19:12:38.352990Z",
            "url": "https://files.pythonhosted.org/packages/d6/5a/7dda20bf3bd078017ef6f39890a4c0ade79abb864b4297918264435c7dd1/joinem-0.1.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-19 19:12:38",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mmore500",
    "github_project": "joinem",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "joinem"
}