| Field | Value |
|---|---|
| Name | jskiner |
| Version | 0.1.1 |
| upload_time | 2024-01-16 15:18:48 |
| requires_python | >=3.7 |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
[![Continuous Integration](https://github.com/jeffrey82221/JSkiner/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/jeffrey82221/JSkiner/actions/workflows/ci.yml)

# JSkiner

This is a Python **Js**on **Sch**ema **In**ference **E**ngine with a **R**ust core. It infers schemas about 10 times faster than its pure-Python counterpart ([jsonschema-inference](https://pypi.org/project/jsonschema-inference/)).

# Installation

```bash
pip install jskiner
```

# Usage

## Checking the JSON Schema of a large .jsonl file

```bash
jskiner \
  --in <path_to_jsonl> \
  --verbose <false/true> \
  --out <output_file_path> \
  --nworkers <number_of_cpu_cores> \
  --split <number_of_split_batch_size> \
  --split-path <path_to_store_the_split_files>
```

## Checking the JSON Schema of a folder of JSON files

```bash
jskiner \
  --in <path_to_jsons> \
  --verbose <false/true> \
  --out <output_file_path> \
  --nworkers <number_of_cpu_cores> \
  --batch-size <batch_size_for_inferencing> \
  --cuckoo-path <path_to_store_the_cuckoo_filter> \
  --cuckoo-size <approximate_size_of_the_cuckoo_filter> \
  --cuckoo-fpr <false_positive_rate_of_the_cuckoo_filter>
```

For `--cuckoo-size`, a value of about 10 times the current number of JSON files is recommended.

## Inferring the Schema in Python

```python
from jskiner import InferenceEngine

cpu_cnt = 16  # number of CPU cores to use
engine = InferenceEngine(cpu_cnt)
json_string_list = ["1", "1.2", "null", "{\"a\": 1}"]
schema = engine.run(json_string_list)
schema
```
>> Union({Atomic(Float()), Atomic(Int()), Atomic(Non()), Record({"a": Atomic(Int())})})

## Calculating the Union of a List of Schemas

```python
from jskiner import InferenceEngine
from jskiner.schema import Atomic, Int, Non

cpu_cnt = 16  # number of CPU cores to use
engine = InferenceEngine(cpu_cnt)
# Passing schema objects instead of JSON strings merges them into one schema
schema = engine.run([Atomic(Int()), Atomic(Non())])
schema
```
>> Optional(Atomic(Int()))

## Using the | Operator between Two Schemas

```python
from jskiner import Atomic, Int, Non

schema = Atomic(Int()) | Atomic(Non())
schema
```
>> Optional(Atomic(Int()))

# TODO

- [X] Enable inference from a folder of JSON files
- [X] Enable ignoring of existing JSON files using a cuckoo filter
- [X] Enable adding a starting schema file
- [X] Enable batch-by-batch processing of large JSONL files
- [X] FIX: make sure `__repr__` escapes special characters
- [X] Auto-formatting using Black
- [X] Enable sampling of JSON files
- [X] Debug: show the input that causes a panic (alter the panic string / alter exception logging in reduce.py)
- [X] Fix: add the UnionRecord schema object
- [ ] Enable direct inference from an online API (to avoid downloading the same JSON repeatedly)
- [ ] Enable Regex to represent patterned FieldSet
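The usage examples above show `engine.run` folding many inputs into a single schema and `|` joining two schemas, with `{Int, null}` collapsing into `Optional(...)`. The idea can be illustrated with a toy, pure-Python model: infer a schema for each JSON value, then reduce all of them with a union. This is a hypothetical sketch, not jskiner's implementation or API: the frozenset representation and the names `infer`, `union`, `infer_all`, and `render` are invented for illustration. It also keeps differently shaped records as separate alternatives, whereas a real engine may merge them more cleverly (the TODO list mentions a `UnionRecord` schema object).

```python
import json
from functools import reduce

# Toy schema model (hypothetical, NOT jskiner's classes): a schema is a
# frozenset of alternatives. An alternative is an atomic type name, an
# ("array", element_schema) pair, or a ("record", ((field, schema), ...)) pair.


def infer(value):
    """Infer the toy schema of one parsed JSON value."""
    if value is None:
        return frozenset({"null"})
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return frozenset({"bool"})
    if isinstance(value, int):
        return frozenset({"int"})
    if isinstance(value, float):
        return frozenset({"float"})
    if isinstance(value, str):
        return frozenset({"str"})
    if isinstance(value, list):
        # Element schema = union of the schemas of all elements.
        elements = reduce(frozenset.union, (infer(v) for v in value), frozenset())
        return frozenset({("array", elements)})
    # json.loads only produces dicts at this point; sort fields for a stable key.
    fields = tuple(sorted((key, infer(v)) for key, v in value.items()))
    return frozenset({("record", fields)})


def union(a, b):
    """Join two schemas: the result admits every value either one admits."""
    return a | b


def infer_all(json_strings):
    """Map each JSON string to a schema, then reduce the schemas with union."""
    return reduce(union, (infer(json.loads(s)) for s in json_strings), frozenset())


def render(schema):
    """Pretty-print a schema, collapsing {X, null} into Optional(X)."""
    parts = sorted(_render_alternative(alt) for alt in schema)
    if not parts:
        return "Empty"
    if len(parts) == 1:
        return parts[0]
    if len(parts) == 2 and "null" in parts:
        other = parts[0] if parts[1] == "null" else parts[1]
        return f"Optional({other})"
    return "Union({" + ", ".join(parts) + "})"


def _render_alternative(alt):
    if isinstance(alt, str):
        return alt
    kind, body = alt
    if kind == "array":
        return f"Array({render(body)})"
    return "Record({" + ", ".join(f"{key!r}: {render(s)}" for key, s in body) + "})"


print(render(infer_all(["1", "1.2", "null", '{"a": 1}'])))
# -> Union({Record({'a': int}), float, int, null})
print(render(union(infer(1), infer(None))))
# -> Optional(int)
```

Because set union is commutative and associative, partial schemas from different batches or workers can be combined in any order. That property is presumably what lets options such as `--nworkers` and batch processing parallelize inference without changing the result.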
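The `--split` and `--split-path` flags, together with the TODO item on batch-by-batch processing, suggest that a large `.jsonl` file is cut into smaller batch files before inference. Below is a minimal pure-Python sketch of such a split. It assumes `--split` means the number of lines per batch file, which the README does not state explicitly. The `split_jsonl` function and the `batch_00000.jsonl` naming are hypothetical and are not jskiner's actual code or on-disk layout.

```python
import itertools
import os
import tempfile


def split_jsonl(in_path, out_dir, batch_size):
    """Split a JSONL file into files of at most `batch_size` lines each.

    Hypothetical helper: returns the paths of the batch files in order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    with open(in_path, "r", encoding="utf-8") as src:
        for idx in itertools.count():
            # islice advances the shared file iterator, so each call
            # picks up exactly where the previous batch stopped.
            lines = list(itertools.islice(src, batch_size))
            if not lines:
                break
            path = os.path.join(out_dir, f"batch_{idx:05d}.jsonl")
            with open(path, "w", encoding="utf-8") as dst:
                dst.writelines(lines)
            paths.append(path)
    return paths


# Usage: split a 5-line file into batches of 2 lines in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "big.jsonl")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write("".join(f'{{"i": {i}}}\n' for i in range(5)))
    batches = split_jsonl(src_path, os.path.join(tmp, "splits"), batch_size=2)
    names = [os.path.basename(p) for p in batches]
    line_counts = []
    for p in batches:
        with open(p, encoding="utf-8") as f:
            line_counts.append(sum(1 for _ in f))

print(names)        # -> ['batch_00000.jsonl', 'batch_00001.jsonl', 'batch_00002.jsonl']
print(line_counts)  # -> [2, 2, 1]
```

Streaming with `itertools.islice` keeps only one batch in memory at a time. That matters when the source file may not fit in RAM, which is the situation that splitting is meant to handle.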
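The folder mode's `--cuckoo-*` flags and the TODO item on ignoring existing JSON files rely on a cuckoo filter. A cuckoo filter is a compact probabilistic set that answers either "probably seen" or "definitely not seen". Below is a minimal from-scratch sketch of one (partial-key cuckoo hashing with 4-slot buckets), used here to skip file names that were already processed. It is not jskiner's implementation. `CuckooFilter`, `unseen`, and their parameters are hypothetical, and how jskiner maps `--cuckoo-size` and `--cuckoo-fpr` onto its internal filter is left as an open assumption.

```python
import hashlib
import random


class CuckooFilter:
    """Approximate set membership via partial-key cuckoo hashing (sketch)."""

    def __init__(self, capacity, bucket_size=4, fingerprint_bits=16, max_kicks=500, seed=0):
        # Round the bucket count up to a power of two so the XOR-based
        # alternate index stays in range and is its own inverse.
        num_buckets = 1
        while num_buckets * bucket_size < capacity:
            num_buckets *= 2
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.fp_mask = (1 << fingerprint_bits) - 1
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self._rng = random.Random(seed)  # seeded for reproducible evictions

    @staticmethod
    def _hash(data: bytes) -> int:
        # Deterministic 64-bit hash (built-in hash() is salted per process).
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint_and_index(self, item: str):
        h = self._hash(item.encode("utf-8"))
        fingerprint = (h >> 32) & self.fp_mask  # high bits -> fingerprint
        index = h % self.num_buckets            # low bits -> primary bucket
        return fingerprint, index

    def _alt_index(self, index: int, fingerprint: int) -> int:
        # i2 = i1 XOR hash(fingerprint); applying it twice returns i1.
        return (index ^ self._hash(fingerprint.to_bytes(8, "big"))) % self.num_buckets

    def add(self, item: str) -> bool:
        fp, i1 = self._fingerprint_and_index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets full: evict a random resident and move it to its other bucket.
        i = self._rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self._rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Effectively full: the last evicted fingerprint is dropped, so a real
        # implementation would resize or report failure instead.
        return False

    def __contains__(self, item: str) -> bool:
        fp, i1 = self._fingerprint_and_index(item)
        i2 = self._alt_index(i1, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]


def unseen(paths, seen):
    """Return the paths not yet recorded in `seen`, then record them (hypothetical)."""
    fresh = [p for p in paths if p not in seen]
    for p in fresh:
        seen.add(p)
    return fresh


seen = CuckooFilter(capacity=1000)
first_run = unseen(["a.json", "b.json"], seen)
second_run = unseen(["a.json", "c.json"], seen)  # "a.json" is skipped
print(first_run)  # -> ['a.json', 'b.json']
```

With bucket size b and f-bit fingerprints, the false-positive rate is bounded by roughly 2b/2^f, about 0.012% for the defaults above. A target rate such as `--cuckoo-fpr` would therefore translate into a choice of fingerprint width. Sizing the filter well above the number of files, as the README recommends for `--cuckoo-size`, keeps the load factor low, so inserts rarely need evictions.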