| Field | Value |
| --- | --- |
| Name | jskiner |
| Version | 0.1.1 |
| home_page | |
| Summary | |
| upload_time | 2024-01-16 15:18:48 |
| maintainer | |
| docs_url | None |
| author | |
| requires_python | >=3.7 |
| license | |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
[CI](https://github.com/jeffrey82221/JSkiner/actions/workflows/ci.yml)
# JSkiner
This is a Python **Js**on **Sch**ema **In**ference **E**ngine with a **R**ust core. Its inference speed is about 10 times that of its pure-Python counterpart ([jsonschema-inference](https://pypi.org/project/jsonschema-inference/)).
# Installation
```bash
pip install jskiner
```
# Usage
## Checking the JSON Schema of a Large .jsonl File
```bash
jskiner \
    --in <path_to_jsonl> \
    --verbose <false/true> \
    --out <output_file_path> \
    --nworkers <number_of_cpu_core> \
    --split <number_of_split_batch_size> \
    --split-path <path_to_store_the_split_files>
```
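The `--split` and `--split-path` options suggest that a large `.jsonl` file is cut into smaller batch files before inference. The sketch below illustrates that idea in plain Python. It is only an assumption about the general approach, not jskiner's actual code: the helper name `split_jsonl` and the `part_NNNNN.jsonl` file naming are hypothetical.

```python
from itertools import islice
from pathlib import Path


def split_jsonl(src, out_dir, batch_size):
    """Write consecutive batches of `batch_size` lines from `src` into `out_dir`.

    Hypothetical helper for illustration; not part of the jskiner API.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with open(src, "r", encoding="utf-8") as f:
        index = 0
        while True:
            # Read at most `batch_size` lines without loading the whole file.
            batch = list(islice(f, batch_size))
            if not batch:
                break
            path = out_dir / f"part_{index:05d}.jsonl"
            path.write_text("".join(batch), encoding="utf-8")
            paths.append(path)
            index += 1
    return paths
```

Because the file is read lazily, memory use is bounded by one batch rather than by the size of the whole input.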
## Checking the JSON Schema for a Folder of JSON Files
```bash
jskiner \
    --in <path_to_jsons> \
    --verbose <false/true> \
    --out <output_file_path> \
    --nworkers <number_of_cpu_core> \
    --batch-size <batch_size_for_inferencing> \
    --cuckoo-path <path_to_store_the_cuckoo_filter> \
    --cuckoo-size <approximated_size_of_the_cuckoo_filter (Recommend using 10X of current json count)> \
    --cuckoo-fpr <false_positive_rate_of_the_cuckoo_filter>
```
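The recommendation for `--cuckoo-size` is about 10 times the current JSON file count. A small sketch of how that number could be computed before calling the CLI is shown below. The helper `suggest_cuckoo_size` is hypothetical, and it assumes the files sit directly in one folder with a `.json` extension.

```python
from pathlib import Path


def suggest_cuckoo_size(json_dir, factor=10):
    """Return `factor` times the number of .json files directly inside `json_dir`.

    Hypothetical helper for illustration; not part of the jskiner API.
    """
    json_count = sum(1 for _ in Path(json_dir).glob("*.json"))
    # Assumption: fall back to 1 so an empty folder never yields a size of 0.
    return max(1, factor * json_count)
```

The returned value could then be passed as the `--cuckoo-size` argument.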
## Inferring the Schema in Python
```python
from jskiner import InferenceEngine

cpu_cnt = 16  # number of CPU cores to use
engine = InferenceEngine(cpu_cnt)
# Each element is one JSON document serialized as a string.
json_string_list = ["1", "1.2", "null", "{\"a\": 1}"]
schema = engine.run(json_string_list)
schema
```
>> Union({Atomic(Float()), Atomic(Int()), Atomic(Non()), Record({"a": Atomic(Int())})})
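Since `engine.run` is shown taking a list of JSON strings, data stored on disk first has to be read into such a list. The sketch below does this for a `.jsonl` file using only the standard library. The helper `read_json_strings` is hypothetical, and the up-front validation with `json.loads` is an added assumption so that malformed lines are reported with a line number.

```python
import json


def read_json_strings(path):
    """Return the non-empty lines of a .jsonl file as a list of JSON strings.

    Hypothetical helper for illustration; not part of the jskiner API.
    """
    strings = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                json.loads(line)  # validate only; the raw string is kept
            except json.JSONDecodeError as err:
                raise ValueError(f"line {line_no} is not valid JSON: {err}") from err
            strings.append(line)
    return strings


# With jskiner installed, the result could be passed to the engine in the
# same way as `json_string_list` above (the file name is a placeholder):
#   schema = InferenceEngine(cpu_cnt).run(read_json_strings("data.jsonl"))
```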
## Calculating the Union of a List of Schemas
```python
from jskiner import InferenceEngine
from jskiner.schema import Atomic, Int, Non
cpu_cnt = 16
engine = InferenceEngine(cpu_cnt)
schema = engine.run([Atomic(Int()), Atomic(Non())])
schema
```
>> Optional(Atomic(Int()))
## Using the `|` Operator between Two Schemas
```python
from jskiner import Atomic, Int, Non
schema = Atomic(Int()) | Atomic(Non())
schema
```
>> Optional(Atomic(Int()))
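The outputs above suggest that a union with the null type collapses to `Optional`, while other mixed types stay a `Union`. The toy model below imitates that behaviour for pairs of atomic types by overloading `__or__`. It is a simplified illustration under that assumption, not jskiner's implementation: `ToyType` is hypothetical and uses plain strings in place of the real schema objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyType:
    """Stand-in for an atomic schema; "Non" plays the role of JSON null."""

    name: str

    def __or__(self, other):
        if self == other:
            return self  # the union of a type with itself is unchanged
        if self.name == "Non":
            return ToyType(f"Optional({other.name})")
        if other.name == "Non":
            return ToyType(f"Optional({self.name})")
        # Sort the names so the result does not depend on operand order.
        names = sorted([self.name, other.name])
        return ToyType("Union({" + ", ".join(names) + "})")


print((ToyType("Int") | ToyType("Non")).name)
```

Sorting the member names makes the toy union commutative, which matches the set-like notation `Union({...})` used in the output shown earlier.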
# TODO:
- [X] Enable inference from a folder of json files
- [X] Enable ignoring of existing json files using cuckoo filter
- [X] Enable adding a starting schema file
- [X] Enable batch-by-batch processing of large jsonl files
- [X] FIX: make sure `__repr__` escapes special characters.
- [X] Auto Formatting Using Black
- [X] Enable sampling of json files
- [X] Debug: show the input that causes a panic. (alter panic str / alter reduce.py exception logging)
- [X] Fix: adding UnionRecord schema object
- [ ] Enable direct inference from an online API (to avoid repeated downloads of json).
- [ ] Enable Regex to represent patterned FieldSet
Raw data
{
"_id": null,
"home_page": "",
"name": "jskiner",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "",
"author": "",
"author_email": "",
"download_url": "",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "",
"version": "0.1.1",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "24bf59f39ea99762d6b9bb206751a940513f38434a128666c42c7567881cbced",
"md5": "9cc5bcc742135d08cc9dee9dfb85a65f",
"sha256": "a51fa54d4d0833769a694d52e8b81e6b90d33dd51d6c2ee5e80bc2c15cfaf236"
},
"downloads": -1,
"filename": "jskiner-0.1.1-cp310-cp310-macosx_10_12_x86_64.whl",
"has_sig": false,
"md5_digest": "9cc5bcc742135d08cc9dee9dfb85a65f",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.7",
"size": 399863,
"upload_time": "2024-01-16T15:18:48",
"upload_time_iso_8601": "2024-01-16T15:18:48.790432Z",
"url": "https://files.pythonhosted.org/packages/24/bf/59f39ea99762d6b9bb206751a940513f38434a128666c42c7567881cbced/jskiner-0.1.1-cp310-cp310-macosx_10_12_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "60266945b4849adc209c58f47980491163f75ee46674f9be88edc3e89fcc57af",
"md5": "a8796a49c9d4e5b1985e6f096f1f3eef",
"sha256": "1a482112ea462b3b803b946aee3291e346bbd7003d4a9ecc81bf473fc060da82"
},
"downloads": -1,
"filename": "jskiner-0.1.1-cp310-cp310-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "a8796a49c9d4e5b1985e6f096f1f3eef",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.7",
"size": 390244,
"upload_time": "2024-01-16T15:18:50",
"upload_time_iso_8601": "2024-01-16T15:18:50.121362Z",
"url": "https://files.pythonhosted.org/packages/60/26/6945b4849adc209c58f47980491163f75ee46674f9be88edc3e89fcc57af/jskiner-0.1.1-cp310-cp310-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "da0946b47b5d8faa84d846431096c09a170485f6748369fdcdd752f3628c77f1",
"md5": "3b7c4a9ffbd05589a105c9f619285f23",
"sha256": "0d5f5cb30b3c10641500e1aff6f920cd1deacb5c2a2db06fc9cde8bd1bf43d60"
},
"downloads": -1,
"filename": "jskiner-0.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "3b7c4a9ffbd05589a105c9f619285f23",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.7",
"size": 2555475,
"upload_time": "2024-01-16T15:18:51",
"upload_time_iso_8601": "2024-01-16T15:18:51.451475Z",
"url": "https://files.pythonhosted.org/packages/da/09/46b47b5d8faa84d846431096c09a170485f6748369fdcdd752f3628c77f1/jskiner-0.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f721b719d7fa8858be09d8a0f58649fbb80a09894580312792be6256ba7f4882",
"md5": "b414f159c05b0ceb1f4843be48a67ffa",
"sha256": "c14bbb98e5cb3d58132b005c97cf78f3dd1840f5f86dd03bc03153552e1dac95"
},
"downloads": -1,
"filename": "jskiner-0.1.1-cp37-cp37m-macosx_10_12_x86_64.whl",
"has_sig": false,
"md5_digest": "b414f159c05b0ceb1f4843be48a67ffa",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": ">=3.7",
"size": 401322,
"upload_time": "2024-01-16T15:18:52",
"upload_time_iso_8601": "2024-01-16T15:18:52.795741Z",
"url": "https://files.pythonhosted.org/packages/f7/21/b719d7fa8858be09d8a0f58649fbb80a09894580312792be6256ba7f4882/jskiner-0.1.1-cp37-cp37m-macosx_10_12_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f2e7afffe3aa660f96ba1f855dd8d10131d8c9e069ec9145cd2b8e1c1e31e3ba",
"md5": "dc94087afe9ef9799808c382f71a8ddd",
"sha256": "40d3018adfc610ecb5aa67cdb9e9e77d64c11bbf954c411bee90d9e147a5c561"
},
"downloads": -1,
"filename": "jskiner-0.1.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "dc94087afe9ef9799808c382f71a8ddd",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": ">=3.7",
"size": 2555409,
"upload_time": "2024-01-16T15:18:54",
"upload_time_iso_8601": "2024-01-16T15:18:54.813684Z",
"url": "https://files.pythonhosted.org/packages/f2/e7/afffe3aa660f96ba1f855dd8d10131d8c9e069ec9145cd2b8e1c1e31e3ba/jskiner-0.1.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a7eab04d303072e556781c4c05cd054e365d7af30c5a513905df0c32158ce9f1",
"md5": "486482a9a9f4bacdb3210f9e71f85ae1",
"sha256": "b8eafd55b1b77b7c21bf3688ca41303a8d4d73e2d950e4bbbfc0a84b3f156568"
},
"downloads": -1,
"filename": "jskiner-0.1.1-cp38-cp38-macosx_10_12_x86_64.whl",
"has_sig": false,
"md5_digest": "486482a9a9f4bacdb3210f9e71f85ae1",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.7",
"size": 423722,
"upload_time": "2024-01-16T15:18:56",
"upload_time_iso_8601": "2024-01-16T15:18:56.184538Z",
"url": "https://files.pythonhosted.org/packages/a7/ea/b04d303072e556781c4c05cd054e365d7af30c5a513905df0c32158ce9f1/jskiner-0.1.1-cp38-cp38-macosx_10_12_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "bfdad7621fe8da80c8069e6544aee7d0729ef4ce9fe8c1d76308f4e9c1afb22e",
"md5": "7106a7d75ce7f3fa10df4bb25292b78d",
"sha256": "c8d867e68a716173702494e028006ef8162dcf1bba6bd0b48acd8a5e49a2fbc8"
},
"downloads": -1,
"filename": "jskiner-0.1.1-cp38-cp38-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "7106a7d75ce7f3fa10df4bb25292b78d",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.7",
"size": 418897,
"upload_time": "2024-01-16T15:18:57",
"upload_time_iso_8601": "2024-01-16T15:18:57.841461Z",
"url": "https://files.pythonhosted.org/packages/bf/da/d7621fe8da80c8069e6544aee7d0729ef4ce9fe8c1d76308f4e9c1afb22e/jskiner-0.1.1-cp38-cp38-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "bae934c6b1137c7286a786080f2fb3b13960bd58ccc4d315a749eb0a4970d9d8",
"md5": "9249c8e6e0aee77e7672a221db6f6d6e",
"sha256": "f8d94a6e2662f44837c6c80fe0a3eaa169708bb92b810a2e7578743f32b4e3ab"
},
"downloads": -1,
"filename": "jskiner-0.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "9249c8e6e0aee77e7672a221db6f6d6e",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.7",
"size": 2555015,
"upload_time": "2024-01-16T15:18:59",
"upload_time_iso_8601": "2024-01-16T15:18:59.583386Z",
"url": "https://files.pythonhosted.org/packages/ba/e9/34c6b1137c7286a786080f2fb3b13960bd58ccc4d315a749eb0a4970d9d8/jskiner-0.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "0b52c409445980973c2877fbf8651d1b1e5f18193b9b7f7be93de898cc85cb4a",
"md5": "ce111156785e66ee7b35621b37f50e72",
"sha256": "734a4b542db130dead020e0c842fe591337c50e4c8b66f2e8189f989d0d1dcce"
},
"downloads": -1,
"filename": "jskiner-0.1.1-cp39-cp39-macosx_10_12_x86_64.whl",
"has_sig": false,
"md5_digest": "ce111156785e66ee7b35621b37f50e72",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.7",
"size": 423647,
"upload_time": "2024-01-16T15:19:02",
"upload_time_iso_8601": "2024-01-16T15:19:02.248620Z",
"url": "https://files.pythonhosted.org/packages/0b/52/c409445980973c2877fbf8651d1b1e5f18193b9b7f7be93de898cc85cb4a/jskiner-0.1.1-cp39-cp39-macosx_10_12_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "65507b7f46d284f9c9ececa46ce45e7b3387ee51b22d1b04364abd5c8057efd9",
"md5": "903ecd10f646facfff71eebdff514b9a",
"sha256": "32a33728096302b31a3aae5ef940dd0548319423722524ca4fb2d605ba87bb17"
},
"downloads": -1,
"filename": "jskiner-0.1.1-cp39-cp39-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "903ecd10f646facfff71eebdff514b9a",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.7",
"size": 391109,
"upload_time": "2024-01-16T15:19:04",
"upload_time_iso_8601": "2024-01-16T15:19:04.692525Z",
"url": "https://files.pythonhosted.org/packages/65/50/7b7f46d284f9c9ececa46ce45e7b3387ee51b22d1b04364abd5c8057efd9/jskiner-0.1.1-cp39-cp39-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "46d143e652f84635dcb48bae62a7b1c470ee21727fb23fd518412636c1ed9150",
"md5": "843ed9d2bf7de483e94809ba47497a27",
"sha256": "abeb086eece2884ac6f0db7866aa769efc2f590a736e384e1f3626ff97a873d0"
},
"downloads": -1,
"filename": "jskiner-0.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "843ed9d2bf7de483e94809ba47497a27",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.7",
"size": 2555966,
"upload_time": "2024-01-16T15:19:07",
"upload_time_iso_8601": "2024-01-16T15:19:07.184210Z",
"url": "https://files.pythonhosted.org/packages/46/d1/43e652f84635dcb48bae62a7b1c470ee21727fb23fd518412636c1ed9150/jskiner-0.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-01-16 15:18:48",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "jskiner"
}