robust-json-parser

Name	robust-json-parser
Version	0.1.5
home_page	None
Summary	Robust JSON extraction and repair utilities for LLM-generated content.
upload_time	2025-10-16 03:45:45
maintainer	None
docs_url	None
author	None
requires_python	>=3.9
license	None
keywords	json llm ai validation repair parser extraction chatgpt claude
requirements	No requirements were recorded.

# 🛠️ robust-json

[![Python Version](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**Robust JSON extraction and repair utilities for LLM-generated content.**

Parse JSON from messy LLM outputs with confidence. `robust-json` extracts and repairs JSON even when models mix commentary with structured data, use incorrect quotes, add trailing commas, include comments, or truncate responses mid-object.

---

## ✨ Why robust-json?

Large Language Models are powerful but inconsistent when generating JSON. They might:

- 📝 **Mix text and JSON**: Embed JSON inside markdown code blocks or conversational responses
- 💬 **Add comments**: Include `//` or `#` comments that break standard JSON parsers
- 🔤 **Use wrong quotes**: Generate single quotes (`'`) instead of double quotes (`"`)
- 🔚 **Add trailing commas**: Place commas after the last item in arrays/objects
- ✂️ **Truncate output**: Stop mid-JSON due to token limits or errors

`robust-json` handles all these cases automatically, so you can focus on using the data instead of fighting with parser errors.
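
To make the failure modes concrete, the sketch below feeds one illustrative sample per case to Python's standard `json` module, which rejects all of them. It uses only the standard library; the sample strings and the `strict_failures` helper are made up for this illustration and are not part of `robust-json`.

```python
import json

# One illustrative sample per failure mode listed above.
samples = {
    "text around JSON": 'Sure! Here is the data: {"a": 1}',
    "comment": '{"a": 1}  // note',
    "single quotes": "{'a': 1}",
    "trailing comma": '{"a": [1, 2,]}',
    "truncated": '{"a": {"b": 1',
}


def strict_failures(cases):
    """Return the labels of the inputs that json.loads rejects."""
    failed = []
    for label, text in cases.items():
        try:
            json.loads(text)
        except json.JSONDecodeError:
            failed.append(label)
    return failed


print(strict_failures(samples))
# ['text around JSON', 'comment', 'single quotes', 'trailing comma', 'truncated']
```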

---

## 🚀 Features

- **🔍 Smart extraction**: Automatically finds JSON objects and arrays within free-form text
- **🔧 Auto-repair**: Fixes common LLM errors including:
  - Single-quoted strings → double quotes
  - Mixed quote types (e.g., `'text"` → `'text'`)
  - Inline comments (`//` and `#`)
  - Trailing commas
  - Unclosed braces and brackets
- **🎯 Multiple parsers**: Falls back through `json` → `ast.literal_eval` for maximum compatibility
- **⚡ Performance**: Optional speedups with `regex` (enhanced regex engine) and `numba` (JIT-compiled bracket scanning)
- **🌍 Unicode support**: Handles international characters and emoji seamlessly
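
The `json` → `ast.literal_eval` fallback mentioned above can be sketched with the standard library alone. This is a minimal illustration of the idea, not the library's actual code; `parse_with_fallback` is a hypothetical name.

```python
import ast
import json


def parse_with_fallback(text):
    """Try strict JSON first, then fall back to Python literal syntax."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # literal_eval accepts single quotes, trailing commas, and
        # True/False/None, but not JSON's true/false/null.
        # It raises ValueError or SyntaxError if the text is not a literal.
        return ast.literal_eval(text)


print(parse_with_fallback('{"ok": true}'))
# {'ok': True}
print(parse_with_fallback("{'ok': True, 'n': [1, 2,]}"))
# {'ok': True, 'n': [1, 2]}
```

The two parsers complement each other: `json.loads` understands `true`/`false`/`null`, while `ast.literal_eval` tolerates Python-style quoting and trailing commas.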

---

## 📦 Installation

**Basic installation:**
```bash
pip install robust-json-parser
```

**With performance optimizations (numba JIT):**
```bash
pip install robust-json-parser[speedups]
```

**With regex (enhanced regex engine with better Unicode support):**
```bash
pip install robust-json-parser[regex]
```

**All extras:**
```bash
pip install robust-json-parser[speedups,regex]
```

**Requirements:** Python 3.9+

---

## 🎯 Quick Start

### Basic Usage

```python
from robust_json import loads

# LLM output with mixed formatting
llm_response = """
Sure! Here's the data you requested:
```json
{
  "name": "Alice",
  "age": 30,
  "hobbies": ["reading", "coding",],  // trailing comma
  "active": true,  # Python-style comment
}

Hope this helps!
"""

data = loads(llm_response)
print(data)
# {'name': 'Alice', 'age': 30, 'hobbies': ['reading', 'coding'], 'active': True}
```

### Handling Malformed JSON

```python
from robust_json import loads

# Mixed quotes and comments embedded in conversational text
message = """
Hello, I'm a recruitment consultant. Here's the job description for your matching assessment:
```json
{"id": "algo", "position": "Large Language Model Algorithm Engineer",
# this is the keywords list used to analyze the candidate
 "keywords": {"positive": ["PEFT", "RLHF"], "negative": ["CNN", "RNN"]}, # negative keywords is supported
 "summary": 'The candidate has some AI background, but lacks experience."
 }
"""

data = loads(message)
print(data["keywords"]["positive"])
# ['PEFT', 'RLHF']
```
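
Comment removal has to leave `#` and `//` alone when they occur inside string values (for example in URLs). The following is a rough sketch of one way to do that, assuming the fragment has already been extracted and that each string is closed by the same quote character that opened it. `strip_comments` is a hypothetical helper, not the library's implementation.

```python
import json


def strip_comments(text):
    """Remove // and # comments that appear outside of quoted strings."""
    out = []
    quote = None  # quote character of the string we are inside, if any
    i = 0
    while i < len(text):
        ch = text[i]
        if quote:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Keep the escaped character verbatim.
                out.append(text[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
            out.append(ch)
        elif ch == "#" or text.startswith("//", i):
            # Skip to the end of the line, keeping the newline itself.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


fragment = '{"url": "http://x#y", // site\n "n": 1 # count\n}'
print(json.loads(strip_comments(fragment)))
# {'url': 'http://x#y', 'n': 1}
```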

### Truncated/Partial JSON

```python
from robust_json import loads

# JSON cut off mid-object
incomplete = '{"user": {"name": "Bob", "email": "bob@example.com"'

data = loads(incomplete)
print(data)
# {'user': {'name': 'Bob', 'email': 'bob@example.com'}}
```
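
One plausible way to complete truncated input is to track open brackets on a stack and append the missing closers. The sketch below illustrates that idea with the standard library only; `close_open_brackets` is a hypothetical helper, and it does not cover every truncation point (for example, input that stops right after a key or a comma).

```python
import json


def close_open_brackets(text):
    """Append the closers for any brackets left open at the end of text."""
    closers = {"{": "}", "[": "]"}
    stack = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in closers:
            stack.append(closers[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
    # Close a string cut off mid-value, then close brackets innermost-first.
    tail = '"' if in_string else ""
    return text + tail + "".join(reversed(stack))


incomplete = '{"user": {"name": "Bob", "email": "bob@example.com"'
print(json.loads(close_open_brackets(incomplete)))
# {'user': {'name': 'Bob', 'email': 'bob@example.com'}}
```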

### Extract Multiple JSON Objects

```python
from robust_json import extract_all, RobustJSONParser

text = """
First result: {"a": 1, "b": 2}
Some text in between...
Second result: {"x": 10, "y": 20}
"""

# Get all extractions with metadata
extractions = extract_all(text)
for extraction in extractions:
    print(f"Found at position {extraction.start}: {extraction.text}")

# Or just get the parsed objects
parser = RobustJSONParser()
objects = parser.parse_all(text)
print(objects)
# [{'a': 1, 'b': 2}, {'x': 10, 'y': 20}]
```
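
For intuition about how fragments and their positions might be located, here is a small bracket-scanning sketch. It is an assumption about the general technique, not the library's code; `find_balanced_regions` is a hypothetical name. Quotes are only tracked while inside a region, so quotation marks in the surrounding prose do not confuse the scan.

```python
def find_balanced_regions(text):
    """Return (start, end) index pairs of top-level {...} and [...] regions."""
    pairs = {"{": "}", "[": "]"}
    regions = []
    stack = []
    start = None
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"' and stack:
            in_string = True
        elif ch in pairs:
            if not stack:
                start = i  # a new top-level region begins here
            stack.append(pairs[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
            if not stack:
                regions.append((start, i + 1))
    return regions


sample = 'First result: {"a": 1, "b": 2}\nSecond result: {"x": 10, "y": 20}'
print([sample[s:e] for s, e in find_balanced_regions(sample)])
# ['{"a": 1, "b": 2}', '{"x": 10, "y": 20}']
```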

---

## 📚 API Reference

### `loads(source, *, allow_partial=True, default=None, strict=False)`

Parse the first JSON object found in the source text.

**Parameters:**
- `source` (str): Text containing JSON
- `allow_partial` (bool): If `True`, auto-complete truncated JSON (default: `True`)
- `default` (Optional): Value to return if no JSON is found. When left as `None` (the default), a `ValueError` is raised instead
- `strict` (bool): If `True`, only extract from code blocks and brace-delimited content (default: `False`)

**Returns:** Parsed Python object (dict, list, etc.)

**Raises:** `ValueError` if no JSON found and no default provided
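
Because leaving `default` as `None` is documented to raise, a caller who wants `None` back on failure needs to catch the `ValueError`. The wrapper below is a hypothetical sketch of that pattern. To stay runnable without the package installed it uses `json.loads` as a stand-in parser (its `JSONDecodeError` is a subclass of `ValueError`); in real use you would pass `robust_json.loads` as `parse`.

```python
import json


def loads_or_none(text, parse=json.loads):
    """Return the parsed value, or None when `parse` raises ValueError."""
    try:
        return parse(text)
    except ValueError:
        return None


print(loads_or_none('{"a": 1}'))
# {'a': 1}
print(loads_or_none("no json here"))
# None
```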

---

### `extract(source, *, allow_partial=True)`

Extract the first JSON-like fragment with metadata.

**Returns:** `Extraction` object or `None`

---

### `extract_all(source, *, allow_partial=True)`

Extract all JSON-like fragments from text.

**Returns:** List of `Extraction` objects

---

### `RobustJSONParser`

Main parser class for advanced usage.

**Methods:**
- `extract(source, limit=None)`: Find JSON fragments (returns list of `Extraction` objects)
- `parse_first(source)`: Parse first JSON object (returns parsed object or `None`)
- `parse_all(source)`: Parse all JSON objects (returns list of parsed objects)

**Constructor parameters:**
- `allow_partial` (bool): Auto-complete truncated JSON (default: `True`)
- `strict` (bool): Only extract from explicit JSON contexts (default: `False`)

---

### `Extraction`

Dataclass representing an extracted JSON candidate.

**Attributes:**
- `text` (str): The extracted text
- `start` (int): Starting position in source
- `end` (int): Ending position in source
- `is_partial` (bool): Whether the extraction appears truncated
- `repaired` (Optional[str]): The repaired version after processing
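
As a reading aid, the attributes above can be mirrored in a stand-in dataclass. `ExtractionSketch` is hypothetical: the field defaults, and the assumption that `end` is an exclusive slice bound, are guesses made for illustration and are not confirmed by the library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionSketch:
    """Hypothetical mirror of the documented attributes; not the library's class."""

    text: str
    start: int
    end: int
    is_partial: bool = False
    repaired: Optional[str] = None


source = 'Result: {"a": 1,} done'
start = source.index("{")
end = source.index("}") + 1  # assumption: `end` is exclusive, like a slice bound
found = ExtractionSketch(
    text=source[start:end], start=start, end=end, repaired='{"a": 1}'
)
print(found.text)
# {"a": 1,}
print(found.repaired)
# {"a": 1}
```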

---

## 🔧 How It Works

1. **🔎 Extraction**: Scans text for JSON patterns using:
   - Markdown code blocks (`` ```json ... ``` ``)
   - Brace-balanced regions (`{...}`, `[...]`)

2. **🛠️ Repair**: Applies fixes in order:
   - Strip `//` and `#` comments
   - Fix mixed quote types (e.g., `'text"` → `'text'`)
   - Normalize single quotes to double quotes
   - Remove trailing commas
   - Balance unclosed braces (if `allow_partial=True`)

3. **✅ Parse**: Attempts parsing with:
   - `json.loads()` (standard JSON)
   - `ast.literal_eval()` (Python literals)

4. **📊 Return**: Returns first successful parse or continues to next candidate
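
The order of the repair steps matters. The sketch below uses two deliberately naive regex helpers (hypothetical, and unlike a real implementation they do not protect string contents) to show that a trailing comma hidden behind a comment is only removed if comments are stripped first.

```python
import json
import re


def strip_line_comments(text):
    # Naive: drops everything from // or # to end of line, even inside strings.
    return re.sub(r"(//|#)[^\n]*", "", text)


def strip_trailing_commas(text):
    # Naive: removes a comma followed only by whitespace and a closing bracket.
    return re.sub(r",\s*([}\]])", r"\1", text)


def parses(text):
    """Report whether the standard json module accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


raw = '{"tags": ["a", "b", // last tag\n]}'

in_order = strip_trailing_commas(strip_line_comments(raw))
reversed_order = strip_line_comments(strip_trailing_commas(raw))

print(parses(in_order), parses(reversed_order))
# True False
```

In the reversed order the comma is still followed by the comment when the trailing-comma pass runs, so it survives and the result stays invalid.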

---

## 🎨 Use Cases

- **🤖 LLM Integration**: Parse structured output from ChatGPT, Claude, Llama, etc.
- **📊 Data Extraction**: Extract JSON from logs, documentation, or mixed-format files
- **🔄 API Responses**: Handle malformed API responses gracefully
- **🧪 Testing**: Validate and repair JSON in test fixtures
- **📝 Data Migration**: Clean up inconsistent JSON during migrations

---

## ⚡ Performance Tips

1. **Install speedups** for large-scale processing:
   ```bash
   pip install robust-json-parser[speedups]  # numba JIT compilation
   pip install robust-json-parser[regex]  # enhanced regex engine with better Unicode support
   ```

2. **Use strict mode** when JSON is always in code blocks:
   ```python
   loads(text, strict=True)  # Faster, skips fallback attempts
   ```

3. **Disable partial completion** if you know JSON is complete:
   ```python
   loads(text, allow_partial=False)  # Skips brace-balancing step
   ```

4. **Reuse parser instance** for multiple parses:
   ```python
   parser = RobustJSONParser()
   for text in texts:
       data = parser.parse_first(text)
   ```

---

## 🧪 Test Status

**Overall Test Pass Rate: 98.6% (140/142 tests passing)**

| Category | Test File | Passed | Failed | Total | Pass Rate | Status |
|----------|-----------|--------|--------|-------|-----------|---------|
| **Core Functionality** | test_parser.py | 5 | 0 | 5 | 100.0% | ✅ |
| **Comprehensive Tests** | test_comprehensive.py | 50 | 1 | 51 | 98.0% | ✅ |
| **Edge Cases** | test_edge_cases.py | 38 | 1 | 39 | 97.4% | ✅ |
| **LLM Scenarios** | test_llm_scenarios.py | 31 | 0 | 31 | 100.0% | ✅ |
| **Performance** | test_performance.py | 11 | 0 | 11 | 100.0% | ✅ |
| **Batch Processing** | test_batch_performance.py | 5 | 0 | 5 | 100.0% | ✅ |

### Test Categories Breakdown

- **✅ Core Functionality (100%)**: Basic parsing, extraction, and repair features
- **✅ Comprehensive Tests (98.0%)**: Real-world scenarios, complex nested structures, multilingual content
- **✅ Edge Cases (97.4%)**: Unicode handling, malformed JSON, bracket matching, error recovery
- **✅ LLM Scenarios (100%)**: ChatGPT/Claude-style outputs, conversational text extraction
- **✅ Performance (100%)**: Large datasets, memory usage, parsing speed benchmarks
- **✅ Batch Processing (100%)**: Parallel processing, multiprocessing, error handling

### Known Issues (2 failing tests)
- **Extraction Order**: `extract_all` function needs to preserve proper ordering
- **Deep Nesting**: Complex nested structures with mismatched brackets need enhanced repair
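
Until the ordering issue is resolved, sorting results by their `start` attribute may serve as a workaround, assuming the reported offsets are accurate. The sketch uses a hypothetical `Found` namedtuple as a stand-in for `Extraction` objects so that it runs without the package installed.

```python
from collections import namedtuple

# Hypothetical stand-in exposing the documented `text` and `start` attributes.
Found = namedtuple("Found", ["text", "start"])


def in_source_order(extractions):
    """Sort extraction-like objects by their `start` offset."""
    return sorted(extractions, key=lambda e: e.start)


shuffled = [Found('{"x": 10}', 40), Found('{"a": 1}', 14)]
print([f.text for f in in_source_order(shuffled)])
# ['{"a": 1}', '{"x": 10}']
```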

---

## 🤝 Contributing

We welcome contributions from developers of all skill levels! Whether you're fixing bugs, adding features, or improving documentation, your help makes this project better for everyone.

### 🎯 How to Contribute

1. **🐛 Bug Reports**: Found an issue? Open a GitHub issue with:
   - Clear description of the problem
   - Minimal reproducible example
   - Expected vs actual behavior

2. **✨ Feature Requests**: Have an idea? We'd love to hear it! Open an issue to discuss:
   - Use case and motivation
   - Proposed implementation approach
   - Any breaking changes

3. **🔧 Code Contributions**: Ready to code? Here's how:
   ```bash
   # Fork and clone the repository
   git clone https://github.com/your-username/robust-json.git
   cd robust-json
   
   # Install in development mode
   pip install -e ".[speedups,regex,dev]"
   
   # Run tests to ensure everything works
   pytest tests/
   
   # Make your changes and test them
   pytest tests/ -v
   
   # Submit a pull request
   ```

### 🧪 Testing Your Changes

```bash
# Run all tests
pytest tests/

# Run specific test categories
pytest tests/test_parser.py          # Core functionality
pytest tests/test_comprehensive.py   # Comprehensive scenarios
pytest tests/test_llm_scenarios.py   # LLM-specific cases
pytest tests/test_edge_cases.py      # Edge cases and error handling
pytest tests/test_performance.py     # Performance benchmarks
pytest tests/test_batch_performance.py  # Batch processing

# Run with coverage
pytest tests/ --cov=robust_json --cov-report=html
```

### 🎨 Areas We'd Love Help With

- **🌍 Internationalization**: Better support for non-Latin scripts and RTL languages
- **⚡ Performance**: Optimize parsing speed for very large JSON objects
- **🔍 LLM Integration**: Improve extraction from more LLM output formats
- **📚 Documentation**: Examples, tutorials, and API documentation
- **🧪 Test Coverage**: Add more edge cases and real-world scenarios
- **🐛 Bug Fixes**: Help us get to 100% test pass rate!

### 📋 Development Guidelines

- **Code Style**: Follow PEP 8, use type hints, and add docstrings
- **Testing**: Add tests for new features and bug fixes
- **Documentation**: Update README and docstrings as needed
- **Performance**: Consider performance impact of changes
- **Compatibility**: Maintain Python 3.9+ compatibility

### 🏆 Recognition

Contributors will be recognized in our README and release notes. We appreciate every contribution, no matter how small!

**Ready to get started?** Check out our [open issues](https://github.com/callzhang/robust-json/issues) or start with the failing tests above!

---

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## 🙏 Acknowledgments

Built for developers working with LLM-generated content who need reliability without sacrificing flexibility.

---

**Made with ❤️ for the AI/LLM community**

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "robust-json-parser",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": "Derek <derek@preseen.ai>",
    "keywords": "json, llm, ai, validation, repair, parser, extraction, chatgpt, claude",
    "author": null,
    "author_email": "Derek <derek@preseen.ai>",
    "download_url": "https://files.pythonhosted.org/packages/7b/45/a210bf643e0a8123f2c7f2c142510c34194d959e95164bbb18c18318617c/robust_json_parser-0.1.5.tar.gz",
    "platform": null,
    "description": "# \ud83d\udee0\ufe0f robust-json\n\n[![Python Version](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n**Robust JSON extraction and repair utilities for LLM-generated content.**\n\nParse JSON from messy LLM outputs with confidence. `robust-json` extracts and repairs JSON even when models mix commentary with structured data, use incorrect quotes, add trailing commas, include comments, or truncate responses mid-object.\n\n---\n\n## \u2728 Why robust-json?\n\nLarge Language Models are powerful but inconsistent when generating JSON. They might:\n\n- \ud83d\udcdd **Mix text and JSON**: Embed JSON inside markdown code blocks or conversational responses\n- \ud83d\udcac **Add comments**: Include `//` or `#` comments that break standard JSON parsers\n- \ud83d\udd24 **Use wrong quotes**: Generate single quotes (`'`) instead of double quotes (`\"`)\n- \ud83d\udd1a **Add trailing commas**: Place commas after the last item in arrays/objects\n- \u2702\ufe0f **Truncate output**: Stop mid-JSON due to token limits or errors\n\n`robust-json` handles all these cases automatically, so you can focus on using the data instead of fighting with parser errors.\n\n---\n\n## \ud83d\ude80 Features\n\n- **\ud83d\udd0d Smart extraction**: Automatically finds JSON objects and arrays within free-form text\n- **\ud83d\udd27 Auto-repair**: Fixes common LLM errors including:\n  - Single-quoted strings \u2192 double quotes\n  - Mixed quote types (e.g., `'text\"` \u2192 `'text'`)\n  - Inline comments (`//` and `#`)\n  - Trailing commas\n  - Unclosed braces and brackets\n- **\ud83c\udfaf Multiple parsers**: Falls back through `json` \u2192 `ast.literal_eval` for maximum compatibility\n- **\u26a1 Performance**: Optional speedups with `regex` (enhanced regex engine) and `numba` (JIT-compiled bracket scanning)\n- **\ud83c\udf0d Unicode 
support**: Handles international characters and emoji seamlessly\n\n---\n\n## \ud83d\udce6 Installation\n\n**Basic installation:**\n```bash\npip install robust-json-parser\n```\n\n**With performance optimizations (numba JIT):**\n```bash\npip install robust-json-parser[speedups]\n```\n\n**With regex (enhanced regex engine with better Unicode support):**\n```bash\npip install robust-json-parser[regex]\n```\n\n**All extras:**\n```bash\npip install robust-json-parser[speedups,regex]\n```\n\n**Requirements:** Python 3.9+\n\n---\n\n## \ud83c\udfaf Quick Start\n\n### Basic Usage\n\n```python\nfrom robust_json import loads\n\n# LLM output with mixed formatting\nllm_response = \"\"\"\nSure! Here's the data you requested:\n```json\n{\n  \"name\": \"Alice\",\n  \"age\": 30,\n  \"hobbies\": [\"reading\", \"coding\",],  // trailing comma\n  \"active\": true,  # Python-style comment\n}\n\nHope this helps!\n\"\"\"\n\ndata = loads(llm_response)\nprint(data)\n# {'name': 'Alice', 'age': 30, 'hobbies': ['reading', 'coding'], 'active': True}\n```\n\n### Handling Malformed JSON\n\n```python\nfrom robust_json import loads\n\n# Mixed quotes, comments, and multilingual text\nmessage = \"\"\"\nHello, I'm a recruitment consultant. 
Here's the job description for your matching assessment:\n```json\n{\"id\": \"algo\", \"position\": \"Large Language Model Algorithm Engineer\",\n# this is the keywords list used to analyze the candidate\n \"keywords\": {\"positive\": [\"PEFT\", \"RLHF\"], \"negative\": [\"CNN\", \"RNN\"]}, # negative keywords is supported\n \"summary\": 'The candidate has some AI background, but lacks experience.\"\n }\n\"\"\"\n\ndata = loads(message)\nprint(data[\"keywords\"][\"positive\"])\n# ['PEFT', 'RLHF']\n```\n\n### Truncated/Partial JSON\n\n```python\nfrom robust_json import loads\n\n# JSON cut off mid-object\nincomplete = '{\"user\": {\"name\": \"Bob\", \"email\": \"bob@example.com\"'\n\ndata = loads(incomplete)\nprint(data)\n# {'user': {'name': 'Bob', 'email': 'bob@example.com'}}\n```\n\n### Extract Multiple JSON Objects\n\n```python\nfrom robust_json import extract_all, RobustJSONParser\n\ntext = \"\"\"\nFirst result: {\"a\": 1, \"b\": 2}\nSome text in between...\nSecond result: {\"x\": 10, \"y\": 20}\n\"\"\"\n\n# Get all extractions with metadata\nextractions = extract_all(text)\nfor extraction in extractions:\n    print(f\"Found at position {extraction.start}: {extraction.text}\")\n\n# Or just get the parsed objects\nparser = RobustJSONParser()\nobjects = parser.parse_all(text)\nprint(objects)\n# [{'a': 1, 'b': 2}, {'x': 10, 'y': 20}]\n```\n\n---\n\n## \ud83d\udcda API Reference\n\n### `loads(source, *, allow_partial=True, default=None, strict=False)`\n\nParse the first JSON object found in the source text.\n\n**Parameters:**\n- `source` (str): Text containing JSON\n- `allow_partial` (bool): If `True`, auto-complete truncated JSON (default: `True`)\n- `default` (Optional): Return this value if no JSON found (default: `None` raises error)\n- `strict` (bool): If `True`, only extract from code blocks and brace-delimited content (default: `False`)\n\n**Returns:** Parsed Python object (dict, list, etc.)\n\n**Raises:** `ValueError` if no JSON found and no default 
provided\n\n---\n\n### `extract(source, *, allow_partial=True)`\n\nExtract the first JSON-like fragment with metadata.\n\n**Returns:** `Extraction` object or `None`\n\n---\n\n### `extract_all(source, *, allow_partial=True)`\n\nExtract all JSON-like fragments from text.\n\n**Returns:** List of `Extraction` objects\n\n---\n\n### `RobustJSONParser`\n\nMain parser class for advanced usage.\n\n**Methods:**\n- `extract(source, limit=None)`: Find JSON fragments (returns list of `Extraction` objects)\n- `parse_first(source)`: Parse first JSON object (returns parsed object or `None`)\n- `parse_all(source)`: Parse all JSON objects (returns list of parsed objects)\n\n**Parameters:**\n- `allow_partial` (bool): Auto-complete truncated JSON (default: `True`)\n- `strict` (bool): Only extract from explicit JSON contexts (default: `False`)\n\n---\n\n### `Extraction`\n\nDataclass representing an extracted JSON candidate.\n\n**Attributes:**\n- `text` (str): The extracted text\n- `start` (int): Starting position in source\n- `end` (int): Ending position in source\n- `is_partial` (bool): Whether the extraction appears truncated\n- `repaired` (Optional[str]): The repaired version after processing\n\n---\n\n## \ud83d\udd27 How It Works\n\n1. **\ud83d\udd0e Extraction**: Scans text for JSON patterns using:\n   - Markdown code blocks (`` ```json ... ``` ``)\n   - Brace-balanced regions (`{...}`, `[...]`)\n\n2. **\ud83d\udee0\ufe0f Repair**: Applies fixes in order:\n   - Strip `//` and `#` comments\n   - Fix mixed quote types (e.g., `'text\"` \u2192 `'text'`)\n   - Normalize single quotes to double quotes\n   - Remove trailing commas\n   - Balance unclosed braces (if `allow_partial=True`)\n\n3. **\u2705 Parse**: Attempts parsing with:\n   - `json.loads()` (standard JSON)\n   - `ast.literal_eval()` (Python literals)\n\n4. 
**\ud83d\udcca Return**: Returns first successful parse or continues to next candidate\n\n---\n\n## \ud83c\udfa8 Use Cases\n\n- **\ud83e\udd16 LLM Integration**: Parse structured output from ChatGPT, Claude, Llama, etc.\n- **\ud83d\udcca Data Extraction**: Extract JSON from logs, documentation, or mixed-format files\n- **\ud83d\udd04 API Responses**: Handle malformed API responses gracefully\n- **\ud83e\uddea Testing**: Validate and repair JSON in test fixtures\n- **\ud83d\udcdd Data Migration**: Clean up inconsistent JSON during migrations\n\n---\n\n## \u26a1 Performance Tips\n\n1. **Install speedups** for large-scale processing:\n   ```bash\n   pip install robust-json-parser[speedups]  # numba JIT compilation\n   pip install robust-json-parser[regex]  # enhanced regex engine with better Unicode support\n   ```\n\n2. **Use strict mode** when JSON is always in code blocks:\n   ```python\n   loads(text, strict=True)  # Faster, skips fallback attempts\n   ```\n\n3. **Disable partial completion** if you know JSON is complete:\n   ```python\n   loads(text, allow_partial=False)  # Skips brace-balancing step\n   ```\n\n4. 
**Reuse parser instance** for multiple parses:\n   ```python\n   parser = RobustJSONParser()\n   for text in texts:\n       data = parser.parse_first(text)\n   ```\n\n---\n\n## \ud83e\uddea Test Status\n\n**Overall Test Coverage: 98.6% (140/142 tests passing)**\n\n| Category | Test File | Passed | Failed | Total | Pass Rate | Status |\n|----------|-----------|--------|--------|-------|-----------|---------|\n| **Core Functionality** | test_parser.py | 5 | 0 | 5 | 100.0% | \u2705 |\n| **Comprehensive Tests** | test_comprehensive.py | 50 | 1 | 51 | 98.0% | \u2705 |\n| **Edge Cases** | test_edge_cases.py | 38 | 1 | 39 | 97.4% | \u2705 |\n| **LLM Scenarios** | test_llm_scenarios.py | 31 | 0 | 31 | 100.0% | \u2705 |\n| **Performance** | test_performance.py | 11 | 0 | 11 | 100.0% | \u2705 |\n| **Batch Processing** | test_batch_performance.py | 5 | 0 | 5 | 100.0% | \u2705 |\n\n### Test Categories Breakdown\n\n- **\u2705 Core Functionality (100%)**: Basic parsing, extraction, and repair features\n- **\u2705 Comprehensive Tests (98.0%)**: Real-world scenarios, complex nested structures, multilingual content\n- **\u2705 Edge Cases (97.4%)**: Unicode handling, malformed JSON, bracket matching, error recovery\n- **\u2705 LLM Scenarios (100%)**: ChatGPT/Claude-style outputs, conversational text extraction\n- **\u2705 Performance (100%)**: Large datasets, memory usage, parsing speed benchmarks\n- **\u2705 Batch Processing (100%)**: Parallel processing, multiprocessing, error handling\n\n### Known Issues (2 failing tests)\n- **Extraction Order**: `extract_all` function needs to preserve proper ordering\n- **Deep Nesting**: Complex nested structures with mismatched brackets need enhanced repair\n\n---\n\n## \ud83e\udd1d Contributing\n\nWe welcome contributions from developers of all skill levels! Whether you're fixing bugs, adding features, or improving documentation, your help makes this project better for everyone.\n\n### \ud83c\udfaf How to Contribute\n\n1. 
**\ud83d\udc1b Bug Reports**: Found an issue? Open a GitHub issue with:\n   - Clear description of the problem\n   - Minimal reproducible example\n   - Expected vs actual behavior\n\n2. **\u2728 Feature Requests**: Have an idea? We'd love to hear it! Open an issue to discuss:\n   - Use case and motivation\n   - Proposed implementation approach\n   - Any breaking changes\n\n3. **\ud83d\udd27 Code Contributions**: Ready to code? Here's how:\n   ```bash\n   # Fork and clone the repository\n   git clone https://github.com/your-username/robust-json.git\n   cd robust-json\n   \n   # Install in development mode\n   pip install -e \".[speedups,regex,dev]\"\n   \n   # Run tests to ensure everything works\n   pytest tests/\n   \n   # Make your changes and test them\n   pytest tests/ -v\n   \n   # Submit a pull request\n   ```\n\n### \ud83e\uddea Testing Your Changes\n\n```bash\n# Run all tests\npytest tests/\n\n# Run specific test categories\npytest tests/test_parser.py          # Core functionality\npytest tests/test_comprehensive.py   # Comprehensive scenarios\npytest tests/test_llm_scenarios.py   # LLM-specific cases\npytest tests/test_edge_cases.py      # Edge cases and error handling\npytest tests/test_performance.py     # Performance benchmarks\n\n# Run with coverage\npytest tests/ --cov=robust_json --cov-report=html\n```\n\n### \ud83c\udfa8 Areas We'd Love Help With\n\n- **\ud83c\udf0d Internationalization**: Better support for non-Latin scripts and RTL languages\n- **\u26a1 Performance**: Optimize parsing speed for very large JSON objects\n- **\ud83d\udd0d LLM Integration**: Improve extraction from more LLM output formats\n- **\ud83d\udcda Documentation**: Examples, tutorials, and API documentation\n- **\ud83e\uddea Test Coverage**: Add more edge cases and real-world scenarios\n- **\ud83d\udc1b Bug Fixes**: Help us get to 100% test pass rate!\n\n### \ud83d\udccb Development Guidelines\n\n- **Code Style**: Follow PEP 8, use type hints, and add docstrings\n- 
**Testing**: Add tests for new features and bug fixes\n- **Documentation**: Update README and docstrings as needed\n- **Performance**: Consider performance impact of changes\n- **Compatibility**: Maintain Python 3.9+ compatibility\n\n### \ud83c\udfc6 Recognition\n\nContributors will be recognized in our README and release notes. We appreciate every contribution, no matter how small!\n\n**Ready to get started?** Check out our [open issues](https://github.com/callzhang/robust-json/issues) or start with the failing tests above!\n\n---\n\n## \ud83d\udcdd License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n---\n\n## \ud83d\ude4f Acknowledgments\n\nBuilt for developers working with LLM-generated content who need reliability without sacrificing flexibility.\n\n---\n\n**Made with \u2764\ufe0f for the AI/LLM community**\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Robust JSON extraction and repair utilities for LLM-generated content.",
    "version": "0.1.5",
    "project_urls": {
        "Bug Tracker": "https://github.com/callzhang/robust-json/issues",
        "Documentation": "https://github.com/callzhang/robust-json#readme",
        "Homepage": "https://github.com/callzhang/robust-json",
        "Repository": "https://github.com/callzhang/robust-json"
    },
    "split_keywords": [
        "json",
        " llm",
        " ai",
        " validation",
        " repair",
        " parser",
        " extraction",
        " chatgpt",
        " claude"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "04da328894a700a03dcb5acf2d355eb87fdc48bc0017e4ca779b9d2395d45344",
                "md5": "e7020ce327ee7eac811f5e03ad4e0b5b",
                "sha256": "607d377cadf6330f1399e6858d41251db4fce3307adc3aa8983eac6bd3c1bb49"
            },
            "downloads": -1,
            "filename": "robust_json_parser-0.1.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e7020ce327ee7eac811f5e03ad4e0b5b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 13259,
            "upload_time": "2025-10-16T03:45:44",
            "upload_time_iso_8601": "2025-10-16T03:45:44.050767Z",
            "url": "https://files.pythonhosted.org/packages/04/da/328894a700a03dcb5acf2d355eb87fdc48bc0017e4ca779b9d2395d45344/robust_json_parser-0.1.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7b45a210bf643e0a8123f2c7f2c142510c34194d959e95164bbb18c18318617c",
                "md5": "791e74ddf8d8c8fb96c7100311c58aaf",
                "sha256": "60a57ec6aba14fb7c9acffc745951cb1249dc35168841354af81571fcbab0193"
            },
            "downloads": -1,
            "filename": "robust_json_parser-0.1.5.tar.gz",
            "has_sig": false,
            "md5_digest": "791e74ddf8d8c8fb96c7100311c58aaf",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 32707,
            "upload_time": "2025-10-16T03:45:45",
            "upload_time_iso_8601": "2025-10-16T03:45:45.255487Z",
            "url": "https://files.pythonhosted.org/packages/7b/45/a210bf643e0a8123f2c7f2c142510c34194d959e95164bbb18c18318617c/robust_json_parser-0.1.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-16 03:45:45",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "callzhang",
    "github_project": "robust-json",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "robust-json-parser"
}
