jskiner


- Name: jskiner
- Version: 0.1.1
- Upload time: 2024-01-16 15:18:48
- Requires Python: >=3.7

[![Continuous Integration](https://github.com/jeffrey82221/JSkiner/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/jeffrey82221/JSkiner/actions/workflows/ci.yml)

# JSkiner 

JSkiner is a Python **Js**on **Sch**ema **In**ference **E**ngine with a **R**ust core. Its inference speed is about 10 times that of its pure-Python counterpart ([jsonschema-inference](https://pypi.org/project/jsonschema-inference/)).

# Installation 

```bash
pip install jskiner
```

# Usage

## Checking the JSON Schema of a Large .jsonl File

```bash
jskiner \
    --in <path_to_jsonl> \
    --verbose <false/true> \
    --out <output_file_path> \
    --nworkers <number_of_cpu_core> \
    --split <number_of_split_batch_size> \
    --split-path <path_to_store_the_split_files>
```
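
The `--in` option above takes a `.jsonl` (JSON Lines) file, which holds one JSON document per line. As a minimal sketch of producing such an input with the Python standard library: the helper name `write_jsonl` and the file name `records.jsonl` are illustrative assumptions and are not part of jskiner.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON document per line (hypothetical helper, not a jskiner API)."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            # json.dumps does not emit raw newlines by default,
            # so every record stays on a single line.
            f.write(json.dumps(record) + "\n")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        out = write_jsonl([1, 1.2, None, {"a": 1}], Path(tmp_dir) / "records.jsonl")
        print(out.read_text(encoding="utf-8"))
```

The resulting file path is what would be passed as `<path_to_jsonl>`.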

## Checking the JSON Schema for a Folder of JSON Files

```bash
jskiner \
    --in <path_to_jsons> \
    --verbose <false/true> \
    --out <output_file_path> \
    --nworkers <number_of_cpu_core> \
    --batch-size <batch_size_for_inferencing> \
    --cuckoo-path <path_to_store_the_cuckoo_filter> \
    --cuckoo-size <approximated_size_of_the_cuckoo_filter (Recommend using 10X of current json count)> \
    --cuckoo-fpr <false_positive_rate_of_the_cuckoo_filter>
```
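
The `--cuckoo-size` placeholder above recommends roughly 10 times the current JSON count. A small sketch of computing that number is given here; it assumes the files end in `.json` and sit directly inside the input folder, and `suggested_cuckoo_size` is a hypothetical helper rather than something jskiner provides.

```python
from pathlib import Path


def suggested_cuckoo_size(json_dir, factor=10):
    """Return `factor` times the number of *.json files directly inside json_dir.

    Hypothetical helper: it only applies the "10X of current json count"
    rule of thumb and does not touch the cuckoo filter itself.
    """
    count = sum(1 for _ in Path(json_dir).glob("*.json"))
    return count * factor


if __name__ == "__main__":
    # Prints the suggested size for the current working directory.
    print(suggested_cuckoo_size("."))
```

The printed value can then be supplied to `--cuckoo-size`.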

## Inferring the Schema in Python

```python
from jskiner import InferenceEngine
cpu_cnt = 16
engine = InferenceEngine(cpu_cnt)
json_string_list = ["1", "1.2", "null", "{\"a\": 1}"]
schema = engine.run(json_string_list)
schema
```
>> Union({Atomic(Float()), Atomic(Int()), Atomic(Non()), Record({"a": Atomic(Int())})})
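
To see why the four strings above produce a union of four members, the following standard-library sketch parses each string and records the top-level Python type it decodes to. This is only a conceptual illustration and an assumption about the general idea; it is not how jskiner's Rust core is implemented, and `observed_types` is a hypothetical name.

```python
import json


def observed_types(json_string_list):
    """Return the sorted names of the top-level Python types found after parsing."""
    names = {type(json.loads(s)).__name__ for s in json_string_list}
    return sorted(names)


if __name__ == "__main__":
    json_string_list = ["1", "1.2", "null", "{\"a\": 1}"]
    # Four distinct top-level types, matching the four union members above.
    print(observed_types(json_string_list))
    # → ['NoneType', 'dict', 'float', 'int']
```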

## Calculating the Union of a List of Schemas

```python
from jskiner import InferenceEngine
from jskiner.schema import Atomic, Int, Non
cpu_cnt = 16
engine = InferenceEngine(cpu_cnt)
schema = engine.run([Atomic(Int()), Atomic(Non())])
schema
```
>> Optional(Atomic(Int()))

## Using the `|` Operator between Two Schemas

```python
from jskiner import Atomic, Int, Non
schema = Atomic(Int()) | Atomic(Non())
schema
```
>> Optional(Atomic(Int()))
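
The result above suggests that a union of exactly one type with `Non` is displayed as `Optional(...)`. The toy model below reproduces that display rule with plain sets of labels. It is a sketch under that assumption only: `union` and `describe` are hypothetical functions, the labels are plain strings, and none of this is jskiner's actual schema implementation.

```python
def union(a, b):
    """Combine two toy schemas, each modeled as a set of type labels."""
    return frozenset(a) | frozenset(b)


def describe(schema):
    """Render a toy schema; a single type plus "Non" is shown as Optional."""
    others = sorted(schema - {"Non"})
    if "Non" in schema and len(others) == 1:
        return f"Optional({others[0]})"
    if len(schema) == 1:
        return next(iter(schema))
    return "Union({" + ", ".join(sorted(schema)) + "})"


if __name__ == "__main__":
    print(describe(union({"Int"}, {"Non"})))    # → Optional(Int)
    print(describe(union({"Int"}, {"Float"})))  # → Union({Float, Int})
```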

# TODO:

- [X] Enable inference from a folder of JSON files
- [X] Enable skipping of already-processed JSON files using a cuckoo filter
- [X] Enable adding a starting schema file
- [X] Enable batch-by-batch processing of large .jsonl files
- [X] Fix: make sure `__repr__` escapes special characters
- [X] Auto-format using Black
- [X] Enable sampling of JSON files
- [X] Debug: show the input that causes a panic (alter the panic string / alter `reduce.py` exception logging)
- [X] Fix: add the UnionRecord schema object
- [ ] Enable direct inference from an online API (to avoid repeated downloads of JSON)
- [ ] Enable regex to represent a patterned FieldSet


            
