llm-conversation-parser


- **Name**: llm-conversation-parser
- **Version**: 1.0.1
- **Home page**: https://github.com/nullhyeon/llm-conversation-parser
- **Summary**: A Python library for parsing LLM conversation JSON files into RAG-optimized format
- **Upload time**: 2025-10-13 13:18:45
- **Author**: nullhyeon
- **Requires Python**: >=3.8
- **License**: MIT
- **Keywords**: llm, conversation, parser, rag, claude, gpt, grok, json
- **Requirements**: none recorded
# LLM Conversation Parser

A Python library for parsing LLM conversation JSON files into RAG-optimized format.

## Documentation

- **[Korean Documentation](README_ko.md)** - Detailed documentation for Korean users
- **[LLM JSON Format Guide](LLM_JSON_FORMATS.md)** - Claude, ChatGPT, Grok JSON file structure analysis

## Supported LLMs

- **Claude** (Anthropic)
- **ChatGPT** (OpenAI)
- **Grok** (xAI)

## Installation

```bash
pip install llm-conversation-parser
```

## Quick Start

```python
from llm_conversation_parser import LLMConversationParser

# Initialize parser
parser = LLMConversationParser()

# Parse single file (auto-detect LLM type)
data = parser.parse_file("claude_conversations.json")
print(f"Parsed {len(data)} conversations")

# Parse multiple files
all_data = parser.parse_multiple_files([
    "claude_conversations.json",
    "gpt_conversations.json",
    "grok_conversations.json"
])

# Save parsed data
parser.save_parsed_data_by_llm(all_data, "parsed_data")
```

## Key Features

- **Automatic LLM Detection**: Analyzes JSON structure to determine LLM type
- **Unified Output Format**: Converts all LLM formats to standardized RAG-optimized structure
- **Batch Processing**: Process multiple files at once
- **Error Handling**: Robust error handling with detailed error messages
- **Zero Dependencies**: Uses only Python standard library
- **CLI Support**: Command-line interface included
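The library does not document its detection heuristics, but structure-based detection generally works by checking for keys that are distinctive to each vendor's export format. A minimal sketch of the idea — the key names (`chat_messages`, `mapping`) and the fallback-to-Grok rule are assumptions for illustration, not the library's actual logic:

```python
import json

def guess_llm_type(path: str) -> str:
    """Guess the source LLM from the top-level structure of an export file.

    The marker keys below are assumptions based on common export formats,
    not the library's documented heuristics.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    records = data if isinstance(data, list) else [data]
    first = records[0] if records else {}
    if "chat_messages" in first:   # assumed Claude export marker
        return "claude"
    if "mapping" in first:         # assumed ChatGPT export marker
        return "gpt"
    return "grok"                  # assumed fallback
```

Explicitly passing the LLM type (shown in the usage examples below) sidesteps any ambiguity when a file's structure is unusual.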

## Output Format

```json
[
  {
    "id": "message_uuid",
    "content": {
      "user_query": "User's question",
      "conversation_flow": "[AI_ANSWER] Previous AI response\n[USER_QUESTION] User's question"
    },
    "metadata": {
      "previous_ai_answer": "Previous AI response or null",
      "conversation_id": "conversation_uuid"
    }
  }
]
```
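Because the output is a plain JSON array, downstream code needs only the standard library. A minimal sketch (the sample records are illustrative placeholders in the documented schema) that groups records by their `conversation_id`:

```python
import json
from collections import defaultdict

def group_by_conversation(records):
    """Group parsed records by the conversation_id metadata field."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["metadata"]["conversation_id"]].append(rec)
    return dict(groups)

# Illustrative records following the documented output schema.
sample = [
    {"id": "m1",
     "content": {"user_query": "Hi",
                 "conversation_flow": "[USER_QUESTION] Hi"},
     "metadata": {"previous_ai_answer": None, "conversation_id": "c1"}},
    {"id": "m2",
     "content": {"user_query": "More?",
                 "conversation_flow": "[AI_ANSWER] Hello\n[USER_QUESTION] More?"},
     "metadata": {"previous_ai_answer": "Hello", "conversation_id": "c1"}},
]

grouped = group_by_conversation(sample)
```

A saved output file can be read the same way with `json.load()` before grouping.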

## Usage Examples

### 1. Automatic LLM Type Detection

```python
from llm_conversation_parser import LLMConversationParser

parser = LLMConversationParser()

# Automatically detect LLM type based on JSON structure
claude_data = parser.parse_file("my_conversations.json")  # Auto-detected as Claude
gpt_data = parser.parse_file("chat_history.json")       # Auto-detected as ChatGPT
grok_data = parser.parse_file("ai_chat.json")          # Auto-detected as Grok
```

### 2. Explicit LLM Type Specification

```python
# Specify LLM type explicitly
claude_data = parser.parse_file("conversations.json", "claude")
gpt_data = parser.parse_file("conversations.json", "gpt")
grok_data = parser.parse_file("conversations.json", "grok")
```

### 3. Batch Processing

```python
# Process multiple files at once
files = [
    "claude_conversations.json",
    "gpt_conversations.json",
    "grok_conversations.json"
]

# Process all files with auto-detection
data_by_llm = parser.parse_multiple_files(files)

# Check results
for llm_type, conversations in data_by_llm.items():
    print(f"{llm_type}: {len(conversations)} conversations")

# Save by LLM type
parser.save_parsed_data_by_llm(data_by_llm, "parsed_data")
```

### 4. RAG Data Utilization

```python
# Use parsed data for RAG systems
for conversation in data:
    message_id = conversation["id"]
    user_query = conversation["content"]["user_query"]
    conversation_flow = conversation["content"]["conversation_flow"]

    # Extract text for vectorization
    rag_text = f"{user_query}\n{conversation_flow}"

    # Store in vector database
    # vector_db.add_document(message_id, rag_text)
```
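To make the loop above runnable end to end without a real vector database, a toy in-memory store can stand in for `vector_db`. This sketch matches on shared keywords rather than embeddings — `ToyStore` and its methods are illustrative, not part of this library:

```python
class ToyStore:
    """Keyword-overlap stand-in for a vector database (illustration only)."""

    def __init__(self):
        self.docs = {}  # message_id -> text

    def add_document(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, query, top_k=3):
        # Score each document by the number of query words it shares.
        q = set(query.lower().split())
        scored = [(len(q & set(t.lower().split())), i)
                  for i, t in self.docs.items()]
        scored.sort(reverse=True)
        return [i for score, i in scored[:top_k] if score > 0]

store = ToyStore()
store.add_document("m1", "How do I parse Claude JSON exports?")
store.add_document("m2", "Weather is nice today")
```

In a real pipeline the `add_document` call would be replaced with an embedding-based store, but the `message_id`/`rag_text` interface stays the same.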

## Command Line Interface

```bash
# Parse single file
llm-conversation-parser parse input.json

# Parse multiple files
llm-conversation-parser parse file1.json file2.json --output parsed_data/

# Auto-detect LLM type
llm-conversation-parser parse conversations.json

# Specify LLM type explicitly
llm-conversation-parser parse conversations.json --llm-type claude
```

## License

MIT License

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for detailed changelog.

            
