# LLM Conversation Parser
A Python library for parsing LLM conversation JSON files into a RAG-optimized format.
## Documentation
- **[Korean Documentation](README_ko.md)** - Detailed documentation for Korean users
- **[LLM JSON Format Guide](LLM_JSON_FORMATS.md)** - Claude, ChatGPT, Grok JSON file structure analysis
## Supported LLMs
- **Claude** (Anthropic)
- **ChatGPT** (OpenAI)
- **Grok** (xAI)
## Installation
```bash
pip install llm-conversation-parser
```
## Quick Start
```python
from llm_conversation_parser import LLMConversationParser

# Initialize parser
parser = LLMConversationParser()

# Parse single file (auto-detect LLM type)
data = parser.parse_file("claude_conversations.json")
print(f"Parsed {len(data)} conversations")

# Parse multiple files
all_data = parser.parse_multiple_files([
    "claude_conversations.json",
    "gpt_conversations.json",
    "grok_conversations.json"
])

# Save parsed data
parser.save_parsed_data_by_llm(all_data, "parsed_data")
```
## Key Features
- **Automatic LLM Detection**: Analyzes JSON structure to determine LLM type (see the sketch after this list)
- **Unified Output Format**: Converts all LLM formats to standardized RAG-optimized structure
- **Batch Processing**: Process multiple files at once
- **Error Handling**: Robust error handling with detailed error messages
- **Zero Dependencies**: Uses only Python standard library
- **CLI Support**: Command-line interface included
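As a rough illustration of what structure-based detection can look like, the sketch below checks for top-level keys that commonly appear in each vendor's export. The specific keys are assumptions for illustration, not the library's actual detection logic.
```python
import json


def guess_llm_type(path: str) -> str:
    """Guess the source LLM from an export file's top-level JSON structure.

    The keys checked here are assumptions about typical export formats,
    not the library's real detection rules.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

    # Exports are usually a list of conversation objects; inspect the first one.
    first = data[0] if isinstance(data, list) and data else data
    if isinstance(first, dict):
        if "chat_messages" in first:   # Claude-style export
            return "claude"
        if "mapping" in first:         # ChatGPT-style export
            return "gpt"
        if "responses" in first:       # Grok-style export (assumed key)
            return "grok"
    raise ValueError(f"Could not determine LLM type for {path}")
```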
## Output Format
```json
[
  {
    "id": "message_uuid",
    "content": {
      "user_query": "User's question",
      "conversation_flow": "[AI_ANSWER] Previous AI response\n[USER_QUESTION] User's question"
    },
    "metadata": {
      "previous_ai_answer": "Previous AI response or null",
      "conversation_id": "conversation_uuid"
    }
  }
]
```
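Each entry pairs a user query with the AI answer that preceded it. For reference, the same schema expressed as Python type hints (the type names below are illustrative, not exported by the library):
```python
from typing import List, Optional, TypedDict


class Content(TypedDict):
    user_query: str
    conversation_flow: str


class Metadata(TypedDict):
    previous_ai_answer: Optional[str]  # null/None when there is no previous AI answer
    conversation_id: str


class ParsedMessage(TypedDict):
    id: str  # message UUID
    content: Content
    metadata: Metadata


# A parsed file is a flat list of such messages.
ParsedFile = List[ParsedMessage]
```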
## Usage Examples
### 1. Automatic LLM Type Detection
```python
from llm_conversation_parser import LLMConversationParser
parser = LLMConversationParser()
# Automatically detect LLM type based on JSON structure
claude_data = parser.parse_file("my_conversations.json") # Auto-detected as Claude
gpt_data = parser.parse_file("chat_history.json") # Auto-detected as ChatGPT
grok_data = parser.parse_file("ai_chat.json") # Auto-detected as Grok
```
### 2. Explicit LLM Type Specification
```python
# Specify LLM type explicitly
claude_data = parser.parse_file("conversations.json", "claude")
gpt_data = parser.parse_file("conversations.json", "gpt")
grok_data = parser.parse_file("conversations.json", "grok")
```
### 3. Batch Processing
```python
# Process multiple files at once
files = [
    "claude_conversations.json",
    "gpt_conversations.json",
    "grok_conversations.json"
]

# Process all files with auto-detection
data_by_llm = parser.parse_multiple_files(files)

# Check results
for llm_type, conversations in data_by_llm.items():
    print(f"{llm_type}: {len(conversations)} conversations")

# Save by LLM type
parser.save_parsed_data_by_llm(data_by_llm, "parsed_data")
```
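The saved results can be read back later from the output directory. The snippet below is a sketch that only assumes `save_parsed_data_by_llm` writes one JSON file per LLM type into `parsed_data/`; the exact file names are not specified here.
```python
import json
from pathlib import Path

# Read every per-LLM JSON file back from the output directory.
for path in Path("parsed_data").glob("*.json"):
    with path.open(encoding="utf-8") as f:
        messages = json.load(f)
    print(f"{path.name}: {len(messages)} parsed entries")
```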
### 4. RAG Data Utilization
```python
# Use parsed data for RAG systems
for conversation in data:
    message_id = conversation["id"]
    user_query = conversation["content"]["user_query"]
    conversation_flow = conversation["content"]["conversation_flow"]

    # Extract text for vectorization
    rag_text = f"{user_query}\n{conversation_flow}"

    # Store in vector database
    # vector_db.add_document(message_id, rag_text)
```
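As a concrete (hypothetical) continuation of the commented-out `vector_db.add_document` call, the sketch below loads the parsed messages into a Chroma collection. Chroma is not a dependency of this library; any vector store with an `add`-style API would work the same way.
```python
import chromadb  # third-party example dependency, not required by the parser

from llm_conversation_parser import LLMConversationParser

parser = LLMConversationParser()
data = parser.parse_file("claude_conversations.json")

client = chromadb.Client()
collection = client.get_or_create_collection("llm_conversations")

for conversation in data:
    collection.add(
        ids=[conversation["id"]],
        documents=[conversation["content"]["conversation_flow"]],
        metadatas=[{"conversation_id": conversation["metadata"]["conversation_id"]}],
    )
```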
## Command Line Interface
```bash
# Parse single file
llm-conversation-parser parse input.json
# Parse multiple files
llm-conversation-parser parse file1.json file2.json --output parsed_data/
# Auto-detect LLM type
llm-conversation-parser parse conversations.json
# Specify LLM type explicitly
llm-conversation-parser parse conversations.json --llm-type claude
```
## License
MIT License
## Changelog
See [CHANGELOG.md](CHANGELOG.md) for detailed changelog.