# JsonAI: Production-Ready Structured JSON Generation with LLMs
## Environment Configuration
This project uses separate environment files for dev, qa, perf, cte, and prod, each located at the project root as `.env.dev`, `.env.qa`, `.env.perf`, `.env.cte`, and `.env.prod`. These files contain environment-specific variables for OIDC, metrics, tracing, and service endpoints. All files use the same variable structure for consistency and ease of deployment. See the `examples/stripe_schemas/` directory for environment-specific schema configs.
JsonAI is a comprehensive Python library for generating structured JSON data using Large Language Models (LLMs). It provides enterprise-grade features including robust JSON schema validation, multiple model backends, REST API, React frontend, CLI interface, and production deployment configurations.
Current version: 0.15.1
## 🔔 What's New in 0.15.1
- Stabilized FastAPI REST API with endpoints for sync/async generation, batch processing, stats, cache management, and schema validation
- Performance suite:
- PerformanceMonitor async timing fixes
- CachedJsonformer with LRU/TTL caching
- BatchProcessor for efficient concurrent execution
- OptimizedJsonformer combines caching + batch processing with warmup
- Async generation improvements:
- FullAsyncJsonformer (aliased as AsyncJsonformer in the API)
- AsyncJsonformer wrapper in main.py for async tool execution
- Logging hygiene: lazy logging interpolation to reduce overhead
- Packaging: PyPI publish flow cleaned; version bumped to 0.15.1
## 🚀 Features
### Quantitative Output Quality Metrics
JsonAI's output quality is validated with statistical metrics. The following table summarizes KL divergence (lower is better) and timing (seconds) for core types, measured using uniform schema sampling and the built-in metrics suite:
| Type | KL Divergence | Time (s) |
|---------|---------------|----------|
| number | 0.016813 | 4.5798 |
| integer | 0.000864 | 4.5564 |
| boolean | 0.000018 | 4.4584 |
| enum | 0.000108 | 4.4765 |
All values are well below the recommended threshold (KL < 0.5), demonstrating high-fidelity, schema-faithful sampling. See `tests/test_metrics_sampling.py` for methodology.
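The methodology compares empirical sampling frequencies against a uniform reference distribution. As a minimal sketch (not the project's actual test code; see `tests/test_metrics_sampling.py` for the real methodology), the KL divergence for an enum-like type can be computed like this:

```python
import math
from collections import Counter

def kl_divergence(samples, support):
    """KL(P_empirical || Uniform) over a finite support, in nats.
    0.0 means the sampler matched the uniform reference exactly."""
    counts = Counter(samples)
    n = len(samples)
    q = 1.0 / len(support)  # uniform reference probability
    kl = 0.0
    for value in support:
        p = counts.get(value, 0) / n
        if p > 0:  # a zero-probability bin contributes nothing
            kl += p * math.log(p / q)
    return kl

# A perfectly uniform draw over {A, B, C} gives KL = 0.0
print(kl_divergence(["A", "B", "A", "C", "B", "C"], ["A", "B", "C"]))
```

Values near zero, as in the table above, indicate the sampler closely matches the schema-implied distribution.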
### Core Capabilities
- Multiple LLM Backends: Ollama, OpenAI, and HuggingFace Transformers
- Full JSON Schema Coverage: primitives, arrays, objects, enums, nested structures, oneOf
- Performance Optimization: caching (LRU/TTL), batch processing, async operations
- Production Ready: Docker, FastAPI, monitoring, scaling considerations
### Interfaces & APIs
- REST API: FastAPI-based service with OpenAPI docs
- React Frontend: Modern web interface for JSON generation
- CLI Interface: Command-line tools for automation and batch processing
- Python Library: Programmatic access with sync and async support
### Enterprise Features
- Caching System: Intelligent multi-level caching (LRU/TTL)
- Batch Processing: Concurrent batch execution
- Performance Monitoring: Built-in metrics via PerformanceMonitor
- Schema Validation: Comprehensive validation with jsonschema
- Multiple Output Formats: JSON, YAML, XML, and CSV
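The schema validation feature is backed by the `jsonschema` library. As a generic sketch of the underlying validation step (this shows the library call directly, not JsonAI's `SchemaValidator` API):

```python
# Validate generated data against a JSON Schema with the jsonschema library.
from jsonschema import Draft7Validator

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["age"],
}
validator = Draft7Validator(schema)
# iter_errors yields every violation instead of stopping at the first
errors = [error.message for error in validator.iter_errors({"age": "forty"})]
print(errors)
```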
## 📦 Installation
### Option 1: pip (Recommended)
```bash
pip install jsonai
```
### Option 2: From Source
```bash
git clone https://github.com/yourusername/JsonAI.git
cd JsonAI
poetry install
```
### Option 3: Docker
```bash
# Quick start with Docker
docker run -p 8000:8000 jsonai:latest
# Full stack with Docker Compose
docker-compose up -d
```
## Architecture Overview
The `jsonAI` library is modular and consists of the following components:
- **Jsonformer** (jsonAI.main): Orchestrates generation, formatting, and validation
- **TypeGenerator**: Generates values for each JSON Schema type
- **OutputFormatter**: Converts data into JSON, YAML, XML, CSV
- **SchemaValidator**: Validates data with jsonschema
- **ToolRegistry**: Registers and resolves Python/MCP tools
- **Async Paths**:
- **FullAsyncJsonformer** (jsonAI.async_jsonformer): asynchronous generator taking model_backend, json_schema, prompt (aliased as AsyncJsonformer in API)
- **AsyncJsonformer wrapper** (jsonAI.main): wraps a Jsonformer instance for async tool execution
## Testing
The project includes comprehensive tests for each component and integration:
- **Unit Tests**: Test individual components.
- **Integration Tests**: Validate the interaction between components.
To run tests:
```bash
pytest tests/
```
## Quick API Start (FastAPI)
Run the API with uvicorn:
```bash
uvicorn jsonAI.api:app --host 0.0.0.0 --port 8000
```
Then open http://localhost:8000/docs for interactive Swagger UI.
### REST Endpoints
- POST /generate – synchronous generation
- POST /generate/async – asynchronous generation
- POST /generate/batch – concurrent batch generation
- GET /stats – performance and cache statistics
- DELETE /cache – clear all caches
- POST /validate – validate a JSON schema
Minimal cURL examples:
```bash
# Sync generate
curl -X POST http://localhost:8000/generate -H "Content-Type: application/json" -d '{
"prompt": "Generate a simple user object",
"schema": {"type":"object","properties":{"name":{"type":"string"},"age":{"type":"integer"}}},
"model_name": "ollama",
"model_path": "mistral:latest"
}'
# Async generate
curl -X POST http://localhost:8000/generate/async -H "Content-Type: application/json" -d '{
"prompt": "Generate a simple user object",
"schema": {"type":"object","properties":{"name":{"type":"string"},"age":{"type":"integer"}}},
"model_name": "ollama",
"model_path": "mistral:latest"
}'
# Batch generate
curl -X POST http://localhost:8000/generate/batch -H "Content-Type: application/json" -d '{
"requests": [
{"prompt":"User 1","schema":{"type":"object","properties":{"name":{"type":"string"}}},"model_name":"ollama","model_path":"mistral:latest"},
{"prompt":"User 2","schema":{"type":"object","properties":{"name":{"type":"string"}}},"model_name":"ollama","model_path":"mistral:latest"}
],
"max_concurrent": 5
}'
```
## Examples
### Stripe Schema Demo
A full demonstration of environment-based configuration and schema-driven generation is provided in both:
- [`examples/stripe_schemas/stripe_schema_demo.py`](examples/stripe_schemas/stripe_schema_demo.py) (Python script)
- [`examples/stripe_schemas/stripe_schema_demo.ipynb`](examples/stripe_schemas/stripe_schema_demo.ipynb) (Jupyter notebook)
**Features demonstrated:**
- Loading Stripe-like schemas and environment-specific config files
- Switching between multiple schemas (`transfer_reversals_metadata`, `tax_rates_metadata`, `transfer_reversals`) and environments (`dev`, `qa`, `cte`, `perf`, `prod`)
- Using config file naming conventions: `<schema>.<env>.json` (e.g., `transfer_reversals_metadata.dev.json`)
- Tool chaining and environment-driven config patterns
- Integration with Ollama and JsonAI's tool registry
**Usage pattern:**
```python
env = "dev" # or "qa", "cte", "perf", "prod"
schema_choice = "transfer_reversals_metadata" # or "tax_rates_metadata", "transfer_reversals"
config_path = base_dir / f"{schema_choice}.{env}.json"
```
All required schema and config files are provided in [`examples/stripe_schemas/`](examples/stripe_schemas/).
You can run the Python script or the notebook to see how to generate and validate data for any supported schema/environment combination.
See the `examples/stripe_schemas/` directory for all related files and configuration patterns.
### Basic JSON Generation
```python
from jsonAI.main import Jsonformer
from jsonAI.model_backends import DummyBackend
backend = DummyBackend() # replace with OllamaBackend/OpenAIBackend/etc.
# Primitive type: string
schema = {"type": "string"}
prompt = "Generate a random color name."
jsonformer = Jsonformer(model_backend=backend, json_schema=schema, prompt=prompt)
print(jsonformer()) # e.g., "blue"
# Primitive type: number
schema = {"type": "number"}
prompt = "Generate a random floating point number."
jsonformer = Jsonformer(model_backend=backend, json_schema=schema, prompt=prompt)
print(jsonformer()) # e.g., 3.1415
# Enum type
schema = {"type": "string", "enum": ["A", "B", "C"]}
prompt = "Pick a letter from the set A, B, or C."
jsonformer = Jsonformer(model_backend=backend, json_schema=schema, prompt=prompt)
print(jsonformer()) # e.g., "B"
# Object type
schema = {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"},
"isStudent": {"type": "boolean"}
}
}
prompt = "Generate a person's profile."
jsonformer = Jsonformer(model_backend=backend, json_schema=schema, prompt=prompt)
output = jsonformer()
print(output)
```
### YAML Output
```python
schema = {
"type": "object",
"properties": {
"city": {"type": "string"},
"population": {"type": "integer"}
}
}
prompt = "Generate a city profile."
jsonformer = Jsonformer(model_backend=backend, json_schema=schema, prompt=prompt, output_format="yaml")
output = jsonformer()
print(output)
```
### CSV Output
```python
schema = {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string"},
"score": {"type": "number"}
}
}
}
prompt = "Generate a list of students and their scores."
jsonformer = Jsonformer(model_backend=backend, json_schema=schema, prompt=prompt, output_format="csv")
output = jsonformer()
print(output)
```
### CLI Example
#### Basic CLI Usage
```bash
python -m jsonAI.cli generate --schema schema.json --prompt "Generate a product" --output-format json
```
#### Using Ollama Backend (Recommended for LLMs)
```bash
python -m jsonAI.cli generate --schema complex_schema.json \
--prompt "Generate a comprehensive person profile as JSON." \
--use-ollama --ollama-model mistral:latest
```
#### Features
- Robustly extracts the first valid JSON object from any LLM output (even if wrapped in `<answer>` tags or surrounded by extra text)
- Supports all JSON schema types: primitives, enums, arrays, objects, null, oneOf, nested/complex
- Validates output against the schema and warns if invalid
- Pretty-prints objects/arrays, prints primitives/null as-is
- Production-ready for any schema and LLM output style
#### Example Output
```json
{
"id": "profile with all supported JSON schema types.",
"name": "re",
"age": 30,
"is_active": true,
"email": "example@example.com",
"roles": ["admin", "user"],
"address": {"street": "123 Main St", "city": "Anytown", "zip": "12345", "country": "USA"},
"preferences": {"newsletter": true, "theme": "dark", "language": "en"},
"tags": ["tech", "developer"],
"score": 95,
"metadata": {"key1": "value1", "key2": "value2"},
"status": "active",
"history": [{"date": "2023-01-01", "event": "joined", "details": "Account created"}],
"profile_picture": "https://example.com/avatar.jpg",
"settings": {"notifications": true, "privacy": "private"},
"null_field": null
}
```
See `complex_schema.json` for a comprehensive schema example.
### Tool Calling Example
```python
def send_email(email):
print(f"Sending email to {email}")
return "Email sent"
tool_registry = ToolRegistry()
tool_registry.register_tool("send_email", send_email)
schema = {
"type": "object",
"properties": {
"email": {"type": "string", "format": "email"}
},
"x-jsonai-tool-call": {
"name": "send_email",
"arguments": {"email": "email"}
}
}
prompt = "Generate a user email."
jsonformer = Jsonformer(model_backend=backend, json_schema=schema, prompt=prompt, tool_registry=tool_registry)
output = jsonformer()
print(output)
```
### MCP Integration Example
```python
def mcp_callback(tool_name, server_name, kwargs):
# Simulate MCP call
return f"Called {tool_name} on {server_name} with {kwargs}"
schema = {
"type": "object",
"properties": {
"query": {"type": "string"}
},
"x-jsonai-tool-call": {
"name": "search_tool",
"arguments": {"query": "query"}
}
}
jsonformer = Jsonformer(model_backend=backend, json_schema=schema, prompt=prompt, mcp_callback=mcp_callback)
output = jsonformer()
print(output)
```
### Complex Schema Example
```python
schema = {
"type": "object",
"properties": {
"user": {
"type": "object",
"properties": {
"id": {"type": "uuid"},
"name": {"type": "string"},
"email": {"type": "string", "format": "email"}
}
},
"roles": {
"type": "array",
"items": {"type": "string", "enum": ["admin", "user", "guest"]}
},
"profile": {
"oneOf": [
{"type": "object", "properties": {"age": {"type": "integer"}}},
{"type": "object", "properties": {"birthdate": {"type": "date"}}}
]
}
},
"x-jsonai-tool-call": {
"name": "send_welcome_email",
"arguments": {"email": "user.email"}
}
}
# ...setup backend, tool_registry, etc. as in the earlier examples...
jsonformer = Jsonformer(model_backend=backend, json_schema=schema, prompt=prompt, tool_registry=tool_registry)
output = jsonformer()
print(output)
```
### XML Output
```python
schema = {
"type": "object",
"properties": {
"book": {
"type": "object",
"properties": {
"title": {"type": "string"},
"author": {"type": "string"},
"year": {"type": "integer"}
}
}
}
}
prompt = "Generate details for a book."
jsonformer = Jsonformer(model_backend=backend, json_schema=schema, prompt=prompt, output_format="xml")
output = jsonformer()
print(output)
```
### Tool Chaining Example
You can chain multiple tools together using the `x-jsonai-tool-chain` schema key. Each tool in the chain receives arguments from the generated data and/or previous tool outputs.
```python
from jsonAI.main import Jsonformer
from jsonAI.tool_registry import ToolRegistry
def add(x, y):
return {"sum": x + y}
def multiply(sum, factor):
return {"product": sum * factor}
registry = ToolRegistry()
registry.register_tool("add", add)
registry.register_tool("multiply", multiply)
schema = {
"type": "object",
"properties": {
"x": {"type": "integer"},
"y": {"type": "integer"},
"factor": {"type": "integer"}
},
"x-jsonai-tool-chain": [
{
"name": "add",
"arguments": {"x": "x", "y": "y"}
},
{
"name": "multiply",
"arguments": {"sum": "sum", "factor": "factor"}
}
]
}
prompt = "Calculate (x + y) * factor."
jsonformer = Jsonformer(
model_backend=None, # Not used in this example
json_schema=schema,
prompt=prompt,
tool_registry=registry
)
# Provide input data (simulate generated data)
jsonformer.value = {"x": 2, "y": 3, "factor": 4}
generated = jsonformer.generate_data()
result = jsonformer._execute_tool_call(generated)
print(result)
# Output will include all intermediate and final tool results.
```
## Performance and Caching
JsonAI includes a performance suite to optimize throughput and latency.
- **PerformanceMonitor**: measures durations for operations (async-safe)
- **CachedJsonformer**: two-level caching
- LRU cache for simple schema-based results
- TTL cache for prompt-based entries for complex schemas
- **OptimizedJsonformer**: all performance features plus cache warmup and batch helpers
- **BatchProcessor**: asynchronous concurrent processing (configurable semaphore)
Example:
```python
from jsonAI.performance import OptimizedJsonformer
from jsonAI.model_backends import DummyBackend
backend = DummyBackend()
schema = {"type":"object","properties":{"name":{"type":"string"}}}
jsonformer = OptimizedJsonformer(
model=backend, # accepts a ModelBackend
tokenizer=backend.tokenizer,
schema=schema,
cache_size=1000,
cache_ttl=3600
)
# Single generation (cached)
print(jsonformer.generate("Generate a name"))
# Batch generation
requests = [
{"prompt":"User A","kwargs":{}},
{"prompt":"User B","kwargs":{}}
]
print(jsonformer.generate_batch(requests))
```
To inspect performance and cache stats at runtime, use the REST API `GET /stats` or:
```python
jsonformer.get_comprehensive_stats()
```
## Output Format × Type Coverage

| Type      | Example        | JSON | XML | YAML | CSV* |
|-----------|----------------|------|-----|------|------|
| number    | 3.14           | ✅   | ✅  | ✅   | ✅   |
| integer   | 42             | ✅   | ✅  | ✅   | ✅   |
| boolean   | true           | ✅   | ✅  | ✅   | ✅   |
| string    | "hello"        | ✅   | ✅  | ✅   | ✅   |
| datetime  | "2023-06-29T12:00:00Z" | ✅ | ✅ | ✅ | ✅ |
| date      | "2023-06-29"   | ✅   | ✅  | ✅   | ✅   |
| time      | "12:00:00"     | ✅   | ✅  | ✅   | ✅   |
| uuid      | "123e4567-e89b-12d3-a456-426614174000" | ✅ | ✅ | ✅ | ✅ |
| binary    | "SGVsbG8="     | ✅   | ✅  | ✅   | ✅   |
| null      | null           | ✅   | ⚠️  | ✅   | ⚠️   |
| array     | [1,2,3]        | ✅   | ✅  | ✅   | ⚠️   |
| object    | {"a":1}        | ✅   | ✅  | ✅   | ⚠️   |
| enum      | "red"          | ✅   | ✅  | ✅   | ✅   |
| p_enum    | "blue"         | ✅   | ✅  | ✅   | ✅   |
| p_integer | 7              | ✅   | ✅  | ✅   | ✅   |

✅ = Supported
⚠️ = Supported with caveats (e.g., nulls in XML/CSV, arrays/objects in CSV)

*CSV: Only arrays of objects (tabular) are practical
## Integrations & Capabilities
- LLMs: HuggingFace Transformers, OpenAI, Ollama (vLLM patterns apply)
- FastAPI: See `jsonAI/api.py` and `examples/fastapi_example.py`
- Tool Registry: Register and call Python or MCP tools from schemas; supports tool chaining via `x-jsonai-tool-chain`
- Async Support:
- `FullAsyncJsonformer` for async generation with `model_backend/json_schema/prompt`
- `AsyncJsonformer` wrapper (jsonAI.main) for async tool execution
See the [examples/](examples/) directory for more advanced usage and integration patterns.
## License
This project is licensed under the MIT License.
## Native Library Usage
JsonAI leverages high-performance native libraries for data processing and extensibility:
- **PyYAML** for YAML serialization
- **lxml** for XML output
- **cachetools** for caching
- **requests** and **aiohttp** for HTTP
- **jsonschema** for validation
For any tabular or batch data processing, it is recommended to use **pandas** for reliability and performance. If you extend JsonAI or build custom output logic, prefer native libraries like pandas, numpy, or others for best results.
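For example, an array-of-objects result (the only shape that maps cleanly to CSV, per the coverage table) can be flattened with pandas. A sketch using hand-written sample data:

```python
# Flatten a generated array of objects into CSV with pandas.
import pandas as pd

generated = [
    {"name": "Alice", "score": 91.5},
    {"name": "Bob", "score": 84.0},
]
csv_text = pd.DataFrame(generated).to_csv(index=False)
print(csv_text)
```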
## Multi-Environment Support
JsonAI supports multiple environments: dev, qa, perf, cte, and prod. Each environment has its own `.env` file at the project root.
- **Local Development:**
Copy or rename the desired `.env.*` file to `.env` before running locally.
```bash
cp .env.dev .env
uvicorn jsonAI.api:app --host 0.0.0.0 --port 8000
```
- **Docker Compose:**
Edit `docker-compose.yml` to set the `env_file` for the desired environment (e.g., `.env.prod`).
Or override at runtime:
```bash
docker-compose --env-file .env.qa up -d
```
- **Docker:**
Pass the environment file at runtime:
```bash
docker run --env-file .env.prod -p 8000:8000 jsonai:latest
```
- **CI/CD:**
The GitHub Actions workflow tests all environments by copying the correct `.env.*` file to `.env` for each matrix job.
- **APP_ENV Variable:**
The Dockerfile sets `APP_ENV` (default: dev) for extensibility. You can override this at runtime.
See `docs/deployment.md` for more details.
## Deployment
- API:
- `uvicorn jsonAI.api:app --host 0.0.0.0 --port 8000`
- CORS is enabled by default for development; harden for production
- Docker:
- `docker build -t jsonai:latest .`
- `docker run -p 8000:8000 jsonai:latest`
- Docker Compose:
- `docker-compose up -d`
- See `docs/deployment.md` for more
## Versioning and Release
PyPI forbids reusing the same filename for the same version. Always bump the version:
```bash
poetry version patch # or minor/major
poetry build
poetry publish -u __token__ -p $PYPI_TOKEN
```
Automate in CI by bumping on tags and using repository secrets for tokens.
## Streaming Support
JsonAI supports streaming data generation (experimental API in examples). Example pattern:
```python
jsonformer = Jsonformer(model_backend, json_schema, prompt)
for data_chunk in jsonformer.stream_generate_data():
print(data_chunk)
```
For async streaming, adapt the pattern with the async wrapper as needed.
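One way to adapt the sync streaming pattern for async code is to run the blocking generator off the event loop with `asyncio.to_thread`. In this sketch the chunk generator is a stand-in for `jsonformer.stream_generate_data()`:

```python
# Bridge a blocking chunk generator into async code.
import asyncio

def stream_chunks():
    # Stand-in for jsonformer.stream_generate_data()
    yield '{"name":'
    yield ' "Ada"}'

async def consume(chunks):
    collected = []
    while True:
        # next() may block on model inference, so push it to a worker thread
        chunk = await asyncio.to_thread(next, chunks, None)
        if chunk is None:
            break
        collected.append(chunk)
    return collected

result = asyncio.run(consume(stream_chunks()))
print("".join(result))  # {"name": "Ada"}
```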
## Limitations
- All native JSON schema types are now fully supported and tested, including primitives (`string`, `number`, `integer`, `boolean`, `null`), enums, arrays, objects, oneOf, and nested/complex schemas.
- See [examples/test_json_schema_variety.py](examples/test_json_schema_variety.py) for comprehensive test coverage and usage patterns.
Caching\n\nJsonAI includes a performance suite to optimize throughput and latency.\n\n## Quantitative Output Quality Metrics\n\nJsonAI's output quality is validated with statistical metrics. The following table summarizes KL divergence (lower is better) and timing (seconds) for core types, measured using uniform schema sampling and the built-in metrics suite:\n\n| Type | KL Divergence | Time (s) |\n|---------|---------------|----------|\n| number | 0.016813 | 4.5798 |\n| integer | 0.000864 | 4.5564 |\n| boolean | 0.000018 | 4.4584 |\n| enum | 0.000108 | 4.4765 |\n\nAll values are well below the recommended threshold (KL < 0.5), demonstrating high-fidelity, schema-faithful sampling. See `tests/test_metrics_sampling.py` for methodology.\n\n- **PerformanceMonitor**: measures durations for operations (async-safe)\n- **CachedJsonformer**: two-level caching\n - LRU cache for simple schema-based results\n - TTL cache for prompt-based entries for complex schemas\n- **OptimizedJsonformer**: all performance features plus cache warmup and batch helpers\n- **BatchProcessor**: asynchronous concurrent processing (configurable semaphore)\n\nExample:\n\n```python\nfrom jsonAI.performance import OptimizedJsonformer\nfrom jsonAI.model_backends import DummyBackend\n\nbackend = DummyBackend()\nschema = {\"type\":\"object\",\"properties\":{\"name\":{\"type\":\"string\"}}}\n\njsonformer = OptimizedJsonformer(\n model=backend, # accepts a ModelBackend\n tokenizer=backend.tokenizer,\n schema=schema,\n cache_size=1000,\n cache_ttl=3600\n)\n\n# Single generation (cached)\nprint(jsonformer.generate(\"Generate a name\"))\n\n# Batch generation\nrequests = [\n {\"prompt\":\"User A\",\"kwargs\":{}},\n {\"prompt\":\"User B\",\"kwargs\":{}}\n]\nprint(jsonformer.generate_batch(requests))\n```\n\nTo inspect performance and cache stats at runtime, use the REST API `GET /stats` or:\n```python\njsonformer.get_comprehensive_stats()\n```\n\n## Output Format \u00d7 Type Coverage\n\n\n| Type | Example | 
JSON | XML | YAML | CSV* |\n|-----------|----------------|------|------|------|------|\n| number | 3.14 | \u2705 | \u2705 | \u2705 | \u2705 |\n| integer | 42 | \u2705 | \u2705 | \u2705 | \u2705 |\n| boolean | true | \u2705 | \u2705 | \u2705 | \u2705 |\n| string | \"hello\" | \u2705 | \u2705 | \u2705 | \u2705 |\n| datetime | \"2023-06-29T12:00:00Z\" | \u2705 | \u2705 | \u2705 | \u2705 |\n| date | \"2023-06-29\" | \u2705 | \u2705 | \u2705 | \u2705 |\n| time | \"12:00:00\" | \u2705 | \u2705 | \u2705 | \u2705 |\n| uuid | \"123e4567-e89b-12d3-a456-426614174000\" | \u2705 | \u2705 | \u2705 | \u2705 |\n| binary | \"SGVsbG8=\" | \u2705 | \u2705 | \u2705 | \u2705 |\n| null | null | \u2705 | (\u26a0\ufe0f) | \u2705 | (\u26a0\ufe0f) |\n| array | [1,2,3] | \u2705 | \u2705 | \u2705 | (\u26a0\ufe0f) |\n| object | {\"a\":1} | \u2705 | \u2705 | \u2705 | (\u26a0\ufe0f) |\n| enum | \"red\" | \u2705 | \u2705 | \u2705 | \u2705 |\n| p_enum | \"blue\" | \u2705 | \u2705 | \u2705 | \u2705 |\n| p_integer | 7 | \u2705 | \u2705 | \u2705 | \u2705 |\n\n\u2705 = Supported\n\u26a0\ufe0f = Supported with caveats (e.g., nulls in XML/CSV, arrays/objects in CSV)\n*CSV: Only arrays of objects (tabular) are practical\n\n\n## Integrations & Capabilities\n\n- LLMs: HuggingFace Transformers, OpenAI, Ollama (vLLM patterns apply)\n- FastAPI: See `jsonAI/api.py` and `examples/fastapi_example.py`\n- Tool Registry: Register and call Python or MCP tools from schemas; supports tool chaining via `x-jsonai-tool-chain`\n- Async Support:\n - `FullAsyncJsonformer` for async generation with `model_backend/json_schema/prompt`\n - `AsyncJsonformer` wrapper (jsonAI.main) for async tool execution\n\nSee the [examples/](examples/) directory for more advanced usage and integration patterns.\n\n## License\n\nThis project is licensed under the MIT License.\n\n## Native Library Usage\n\nJsonAI leverages high-performance native libraries for data processing and extensibility:\n\n- **PyYAML** for YAML serialization\n- **lxml** 
for XML output\n- **cachetools** for caching\n- **requests** and **aiohttp** for HTTP\n- **jsonschema** for validation\n\nFor any tabular or batch data processing, it is recommended to use **pandas** for reliability and performance. If you extend JsonAI or build custom output logic, prefer native libraries like pandas, numpy, or others for best results.\n\n## Multi-Environment Support\n\nJsonAI supports multiple environments: dev, qa, perf, cte, and prod. Each environment has its own `.env` file at the project root.\n\n- **Local Development:** \n Copy or rename the desired `.env.*` file to `.env` before running locally.\n ```bash\n cp .env.dev .env\n uvicorn jsonAI.api:app --host 0.0.0.0 --port 8000\n ```\n\n- **Docker Compose:** \n Edit `docker-compose.yml` to set the `env_file` for the desired environment (e.g., `.env.prod`). \n Or override at runtime:\n ```bash\n docker-compose --env-file .env.qa up -d\n ```\n\n- **Docker:** \n Pass the environment file at runtime:\n ```bash\n docker run --env-file .env.prod -p 8000:8000 jsonai:latest\n ```\n\n- **CI/CD:** \n The GitHub Actions workflow tests all environments by copying the correct `.env.*` file to `.env` for each matrix job.\n\n- **APP_ENV Variable:** \n The Dockerfile sets `APP_ENV` (default: dev) for extensibility. You can override this at runtime.\n\nSee `docs/deployment.md` for more details.\n\n## Deployment\n\n- API:\n - `uvicorn jsonAI.api:app --host 0.0.0.0 --port 8000`\n - CORS is enabled by default for development; harden for production\n- Docker:\n - `docker build -t jsonai:latest .`\n - `docker run -p 8000:8000 jsonai:latest`\n- Docker Compose:\n - `docker-compose up -d`\n- See `docs/deployment.md` for more\n\n## Versioning and Release\n\nPyPI forbids reusing the same filename for the same version. 
Always bump the version:\n\n```bash\npoetry version patch # or minor/major\npoetry build\npoetry publish -u __token__ -p $PYPI_TOKEN\n```\n\nAutomate in CI by bumping on tags and using repository secrets for tokens.\n\n## Streaming Support\n\nJsonAI supports streaming data generation (experimental API in examples). Example pattern:\n\n```python\njsonformer = Jsonformer(model_backend, json_schema, prompt)\nfor data_chunk in jsonformer.stream_generate_data():\n print(data_chunk)\n```\n\nFor async streaming, adapt the pattern with the async wrapper as needed.\n\n## Limitations\n\n- All native JSON schema types are now fully supported and tested, including primitives (`string`, `number`, `integer`, `boolean`, `null`), enums, arrays, objects, oneOf, and nested/complex schemas.\n- See [examples/test_json_schema_variety.py](examples/test_json_schema_variety.py) for comprehensive test coverage and usage patterns.\n",
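The CLI feature list above mentions robustly extracting the first valid JSON object from noisy LLM output (e.g., text wrapped in `<answer>` tags). The sketch below illustrates the general technique with the standard library's `json.JSONDecoder.raw_decode`; it is not JsonAI's internal implementation, and `extract_first_json` is a hypothetical helper name:

```python
import json

def extract_first_json(text):
    """Return the first parseable JSON object found in `text`, or None.

    Illustrative sketch: try raw_decode starting at each '{' until one
    succeeds; raw_decode stops at the end of the first complete value,
    so trailing junk after the object is ignored.
    """
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch == "{":
            try:
                obj, _ = decoder.raw_decode(text[i:])
                return obj
            except json.JSONDecodeError:
                continue
    return None

raw = '<answer>Here you go: {"name": "Ada", "age": 36} hope that helps</answer>'
print(extract_first_json(raw))  # {'name': 'Ada', 'age': 36}
```

This approach tolerates arbitrary prefixes and suffixes around the payload, which is why it works across different LLM output styles.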
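The `x-jsonai-tool-call` examples above map tool argument names to paths in the generated data (e.g., `{"email": "user.email"}`). A minimal sketch of how such a dotted-path mapping can be resolved; `resolve_arguments` is an illustrative helper, not a JsonAI API:

```python
def resolve_arguments(mapping, data):
    """Resolve a tool-call arguments mapping against generated data.

    Each mapping value is a (possibly dotted) path into the generated
    object, mirroring the x-jsonai-tool-call convention shown above.
    """
    def lookup(path):
        obj = data
        for part in path.split("."):
            obj = obj[part]  # descend one level per dotted segment
        return obj

    return {name: lookup(path) for name, path in mapping.items()}

data = {"user": {"email": "a@example.com"}, "query": "cats"}
print(resolve_arguments({"email": "user.email"}, data))  # {'email': 'a@example.com'}
```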
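BatchProcessor is described above as running requests concurrently behind a configurable semaphore. The asyncio pattern it alludes to can be sketched as follows; `process_batch` and `fake_generate` are illustrative names, not JsonAI APIs:

```python
import asyncio

async def process_batch(prompts, worker, max_concurrency=4):
    """Run `worker(prompt)` for each prompt with bounded concurrency.

    A semaphore caps how many workers run at once; gather preserves
    the input order in the results.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(prompt):
        async with sem:
            return await worker(prompt)

    return await asyncio.gather(*(bounded(p) for p in prompts))

async def fake_generate(prompt):
    await asyncio.sleep(0.01)  # stand-in for a model call
    return {"prompt": prompt, "ok": True}

results = asyncio.run(process_batch(["a", "b", "c"], fake_generate, max_concurrency=2))
print(results)
```

Bounding concurrency this way keeps memory and backend load predictable even for large batches.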
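The KL divergence figures in the metrics table compare an empirical output distribution against a uniform target. A self-contained sketch of that computation (illustrative only; see `tests/test_metrics_sampling.py` for the actual methodology):

```python
import math
from collections import Counter

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions given as dicts of probabilities."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0)

# Empirical distribution of sampled enum values vs. a uniform target
samples = ["A", "B", "C", "A", "B", "C", "A", "B"]
counts = Counter(samples)
total = len(samples)
empirical = {k: v / total for k, v in counts.items()}
uniform = {k: 1 / len(counts) for k in counts}

print(round(kl_divergence(empirical, uniform), 6))
```

KL divergence is zero only when the two distributions match exactly, which is why lower values in the table indicate more schema-faithful sampling.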
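As the coverage table notes, CSV output is only practical for arrays of objects, where each object becomes a row and each property a column. That mapping can be sketched dependency-free with the standard library (for heavier workloads, pandas' `DataFrame.to_csv` is the recommended route):

```python
import csv
import io

# Generated data matching an array-of-objects schema, the only shape
# that maps cleanly onto CSV per the coverage table above
rows = [
    {"name": "Alice", "score": 91.5},
    {"name": "Bob", "score": 84.0},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "score"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
print(csv_text)
```

Nested objects or arrays inside a row have no natural column representation, which is the caveat the ⚠️ entries flag.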
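The multi-environment convention above (one `.env.<env>` file per environment, selected via `APP_ENV`) can be expressed as a small helper. This is an illustrative sketch, not part of JsonAI; `env_file_for` is a hypothetical name:

```python
import os

def env_file_for(app_env):
    """Map an APP_ENV value to its project-root env file, per the convention above."""
    supported = {"dev", "qa", "perf", "cte", "prod"}
    if app_env not in supported:
        raise ValueError(f"unknown environment: {app_env}")
    return f".env.{app_env}"

# APP_ENV defaults to "dev", matching the Dockerfile default described above
print(env_file_for(os.getenv("APP_ENV", "dev")))
```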
## Project Links

- Homepage: https://github.com/kishoretvk/GenerativeJson
- Documentation: https://github.com/kishoretvk/GenerativeJson#readme
- Repository: https://github.com/kishoretvk/GenerativeJson

Published on PyPI as `jsonai` (requires Python >=3.9, <4.0).