# PyJSONKit
A comprehensive Python toolkit for JSON processing with advanced AI-focused features for modern data workflows.
## 🚀 Features
### Core JSON Operations
- Easy JSON file manipulation and validation
- Pretty printing and formatting
- Robust parsing with error handling
- Simple API for common JSON operations
### 🤖 AI-Focused Features
- **AI JSON Processing**: Extract and fix JSON from AI model responses
- **Advanced Data Extraction**: JSONPath-like queries and entity extraction
- **Intelligent Data Cleaning**: Remove AI artifacts and sanitize data
- **Smart Data Transformation**: Reshape data for ML/AI workflows
- **Schema Generation**: Auto-generate schemas from AI data samples
## Installation
```bash
pip install pyjsonkit
```
## Quick Start
### Basic JSON Operations
```python
from pyjsonkit import JSONHandler
# Create a JSON handler
handler = JSONHandler("data.json")
# Get a value
value = handler.get("key")
# Set a value
handler.set("key", "value")
# Validate JSON
is_valid = handler.validate()
```
### AI-Focused Features
````python
from pyjsonkit import AIJSONProcessor, JSONExtractor, JSONCleaner
# Extract JSON from AI responses
ai_response = '''
Here's the data you requested:
```json
{"name": "John", "age": 30}
```
'''
data = AIJSONProcessor.extract_json_from_text(ai_response)
# Clean AI-generated artifacts
messy_data = {"name": "[AI_GENERATED] John Doe", "note": "[TODO: verify]"}
clean_data = JSONCleaner.clean_ai_artifacts(messy_data)
# Extract data with complex queries (sample input that contains a "users" array)
users_data = {"users": [{"name": "John"}, {"name": "Jane"}]}
result = JSONExtractor.extract_by_path(users_data, "users[*].name")
````
## 📚 Comprehensive Feature Set
### AI JSON Processor
- Extract JSON from mixed AI responses (markdown, code blocks, plain text)
- Fix common AI JSON errors (quotes, booleans, trailing commas)
- Batch process multiple AI responses with error handling
- Extract structured data from natural language text
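
To make the first two bullets concrete, here is a minimal standard-library sketch of the general technique. It is not PyJSONKit's implementation: the function names `extract_json_from_text_sketch` and `fix_common_errors_sketch` are hypothetical, and the repair rules are deliberately naive.

```python
import json
import re

# Built programmatically so this example can itself sit inside a Markdown fence.
FENCE = "`" * 3


def extract_json_from_text_sketch(text):
    """Return the first JSON object or array found in text, or None."""
    # Prefer the contents of a fenced code block if one is present.
    match = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL)
    candidate = match.group(1) if match else text
    decoder = json.JSONDecoder()
    # Try every opening brace or bracket as a possible start of a JSON value.
    for i, ch in enumerate(candidate):
        if ch in "{[":
            try:
                value, _ = decoder.raw_decode(candidate, i)
                return value
            except json.JSONDecodeError:
                continue
    return None


def fix_common_errors_sketch(raw):
    """Naive repairs: Python-style literals and trailing commas.

    Caveat: these regexes also rewrite matching text inside string values.
    """
    fixed = re.sub(r"\bTrue\b", "true", raw)
    fixed = re.sub(r"\bFalse\b", "false", fixed)
    fixed = re.sub(r"\bNone\b", "null", fixed)
    # Drop a comma that directly precedes a closing brace or bracket.
    fixed = re.sub(r",\s*([}\]])", r"\1", fixed)
    return fixed
```

A production implementation would need a tokenizer-aware repair step so that string contents are left untouched; the sketch only shows the overall shape of the approach.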
### Advanced Data Extraction
- JSONPath-like data extraction with complex path queries
- Multi-path extraction in single operations
- AI entity extraction (emails, phones, URLs, etc.)
- Nested array extraction with configurable depth
- Schema analysis and statistics
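
As an illustration of what a JSONPath-like query involves, the sketch below resolves dotted paths with optional `[*]` or `[n]` suffixes using only the standard library. The function name `extract_by_path_sketch` and the supported path grammar are assumptions made for this example; PyJSONKit's own `extract_by_path` may accept a different syntax and return different shapes.

```python
import re


def extract_by_path_sketch(data, path):
    """Resolve a dotted path such as 'users[*].name' or 'users[0].name'.

    Returns a list when the path contains a wildcard, otherwise a single
    value (or None when nothing matches).
    """
    current = [data]  # every value matched so far
    wildcard_used = False
    for part in path.split("."):
        m = re.fullmatch(r"([^\[\]]+)(?:\[(\*|\d+)\])?", part)
        if m is None:
            raise ValueError(f"unsupported path segment: {part!r}")
        key, index = m.group(1), m.group(2)
        next_values = []
        for value in current:
            if not isinstance(value, dict) or key not in value:
                continue  # silently skip values that lack the key
            child = value[key]
            if index is None:
                next_values.append(child)
            elif not isinstance(child, list):
                continue  # an index was requested on a non-list value
            elif index == "*":
                next_values.extend(child)
            elif int(index) < len(child):
                next_values.append(child[int(index)])
        if index == "*":
            wildcard_used = True
        current = next_values
    if wildcard_used:
        return current
    return current[0] if current else None
```

Skipping missing keys instead of raising keeps the function usable on inconsistent records, at the cost of hiding typos in the path.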
### Intelligent Data Cleaning
- Remove AI-generated artifacts and markers
- Sanitize sensitive data for AI processing
- Normalize strings and remove extra whitespace
- Deduplicate arrays and remove null/empty values
- Clean malformed data structures
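
The cleaning steps listed above can be sketched as one recursive pass over the data. This is a standard-library illustration only: the name `clean_sketch` and the marker pattern (bracketed tags starting with `AI_GENERATED` or `TODO`, as in the Quick Start example) are assumptions, not PyJSONKit's actual rules.

```python
import re

# Assumed marker format, based on the "[AI_GENERATED]" and "[TODO: ...]"
# strings shown in the Quick Start example.
MARKER = re.compile(r"\[(?:AI_GENERATED|TODO)[^\]]*\]")
EMPTY = (None, "", [], {})


def clean_sketch(value):
    """Strip markers, collapse whitespace, drop empties, dedupe lists."""
    if isinstance(value, str):
        without_markers = MARKER.sub("", value)
        return " ".join(without_markers.split())  # collapse whitespace runs
    if isinstance(value, list):
        cleaned = []
        for item in value:
            item = clean_sketch(item)
            if item in EMPTY:
                continue
            if item not in cleaned:  # order-preserving deduplication
                cleaned.append(item)
        return cleaned
    if isinstance(value, dict):
        result = {}
        for key, child in value.items():
            child = clean_sketch(child)
            if child not in EMPTY:
                result[key] = child
        return result
    return value  # numbers and booleans pass through unchanged
```

Note that `0` and `False` are kept, since neither compares equal to any of the values treated as empty.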
### Smart Data Transformation
- Reshape data for ML training (features/labels separation)
- Convert to chat/conversation formats for LLMs
- Create embeddings-ready format with metadata
- Aggregate and pivot data for analysis
- Flatten nested structures for CSV export
- Normalize data for AI prompts with size limits
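
Two of the transformations above, flattening for CSV export and features/labels separation, are simple enough to sketch with the standard library. The function names and the dotted-key convention are assumptions for illustration and may not match PyJSONKit's output format.

```python
def flatten_sketch(value, parent_key="", sep="."):
    """Flatten nested dicts and lists into one dict with dotted keys.

    List positions become numeric key parts, e.g. 'tags.0'.
    Empty dicts and lists contribute no keys.
    """
    if isinstance(value, dict):
        pairs = value.items()
    elif isinstance(value, list):
        pairs = enumerate(value)
    else:
        return {parent_key: value}  # leaf value
    items = {}
    for key, child in pairs:
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        items.update(flatten_sketch(child, full_key, sep))
    return items


def split_features_labels_sketch(records, label_key):
    """Separate each record into its feature dict and its label value."""
    features = [{k: v for k, v in r.items() if k != label_key} for r in records]
    labels = [r[label_key] for r in records]
    return features, labels
```

The flattened dicts can be passed to `csv.DictWriter` once a common set of column names has been collected across all records.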
### Schema Generation
- Auto-generate JSON schemas from data samples
- Create AI prompt schemas with validation rules
- Support for strict and flexible schema modes
- Include examples and constraints in generated schemas
- Validate data against generated schemas
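
Schema inference from a sample can be sketched as a recursive mapping from Python types to JSON Schema `type` keywords. The function name `infer_schema_sketch` is hypothetical, and the sketch assumes a single sample with homogeneous arrays; PyJSONKit's generator presumably handles multiple samples, strict and flexible modes, and constraints, none of which are modeled here.

```python
def infer_schema_sketch(value):
    """Infer a minimal JSON Schema fragment from one sample value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            # Assumption: arrays are homogeneous, so the first item is representative.
            schema["items"] = infer_schema_sketch(value[0])
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema_sketch(v) for k, v in value.items()},
            "required": sorted(value),  # every sampled key is treated as required
        }
    raise TypeError(f"not a JSON-compatible value: {type(value).__name__}")
```

Marking every observed key as required corresponds roughly to a strict mode; a more flexible variant would merge several samples and require only the keys present in all of them.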
## 🎯 Use Cases
- **AI/ML Data Pipelines**: Process and clean data from AI models
- **LLM Integration**: Extract structured data from language model outputs
- **Data Validation**: Ensure data quality in automated workflows
- **API Response Processing**: Handle inconsistent JSON from various sources
- **Data Transformation**: Prepare data for different ML frameworks
## Development
### Setup
```bash
git clone https://github.com/Pikachoo1111/jsonkit.git
cd jsonkit
pip install -e ".[dev]"
```
### Running Tests
```bash
pytest
```
### Code Formatting
```bash
black src/
isort src/
```
## 📖 Documentation
For detailed documentation and examples, see the [examples](examples/) directory.
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
MIT License - see LICENSE file for details.
Raw data
{
"_id": null,
"home_page": null,
"name": "pyjsonkit",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "Armaan Shahpuri <armaan30312@gmail.com>",
"keywords": "json, ai, ml, data-processing, validation, parsing, extraction, cleaning, transformation, schema-generation",
"author": null,
"author_email": "Armaan Shahpuri <armaan30312@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/0f/5d/0f5ffd0e37deca37e6a43fd373ee7c3e75d9050ced1ce33fa48cf504b06c/pyjsonkit-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A comprehensive Python toolkit for JSON processing with advanced AI-focused features for modern data workflows",
"version": "0.1.0",
"project_urls": {
"Bug Tracker": "https://github.com/Pikachoo1111/jsonkit/issues",
"Homepage": "https://github.com/Pikachoo1111/jsonkit",
"Repository": "https://github.com/Pikachoo1111/jsonkit.git"
},
"split_keywords": [
"json",
" ai",
" ml",
" data-processing",
" validation",
" parsing",
" extraction",
" cleaning",
" transformation",
" schema-generation"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "0efe1c5036ac3b1bed46324cf9c92fff1c691622c5e759ec31ea37dd2956f1b8",
"md5": "b78370c867b86460686895384374edfe",
"sha256": "12226ef2ab78747150a945c3afd61dd0e71a62f46c3738105ca4bf5a04972cb0"
},
"downloads": -1,
"filename": "pyjsonkit-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b78370c867b86460686895384374edfe",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 24526,
"upload_time": "2025-08-15T17:12:20",
"upload_time_iso_8601": "2025-08-15T17:12:20.281391Z",
"url": "https://files.pythonhosted.org/packages/0e/fe/1c5036ac3b1bed46324cf9c92fff1c691622c5e759ec31ea37dd2956f1b8/pyjsonkit-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "0f5d0f5ffd0e37deca37e6a43fd373ee7c3e75d9050ced1ce33fa48cf504b06c",
"md5": "a3c22ed6ba73ca747a3dd66afa91a3bf",
"sha256": "c90be03d52e38584c0e8af0ae8cc447954b95f292a7098ef2f09267340969c34"
},
"downloads": -1,
"filename": "pyjsonkit-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "a3c22ed6ba73ca747a3dd66afa91a3bf",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 40369,
"upload_time": "2025-08-15T17:12:21",
"upload_time_iso_8601": "2025-08-15T17:12:21.474901Z",
"url": "https://files.pythonhosted.org/packages/0f/5d/0f5ffd0e37deca37e6a43fd373ee7c3e75d9050ced1ce33fa48cf504b06c/pyjsonkit-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-15 17:12:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Pikachoo1111",
"github_project": "jsonkit",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "pyjsonkit"
}