# ResumeParser Pro 🚀
[PyPI version](https://badge.fury.io/py/ai-resume-parser)
[Python versions](https://pypi.org/project/ai-resume-parser/)
[License: MIT](https://opensource.org/licenses/MIT)
Production-ready AI-powered resume parser with parallel processing capabilities. Extract structured data from resumes in **PDF, DOCX, TXT, images (PNG, JPG), HTML, and ODT** formats using state-of-the-art language models.
## 🌟 Features
- **🤖 AI-Powered**: Uses advanced language models (GPT, Gemini, Claude, etc.) for high-accuracy extraction.
- **⚡ Parallel Processing**: Process multiple resumes simultaneously, significantly speeding up bulk operations.
- **📊 Structured Output**: Returns Pydantic-validated data that exports cleanly to JSON for easy integration.
- **🎯 High Accuracy**: Extracts over 20 distinct fields, including categorized skills and work duration in months.
- **📁 Multi-Format Support**: Natively handles PDF, DOCX, and TXT, with optional support for images (OCR), HTML, and ODT files.
- **📈 Production Ready**: Features robust error handling, logging, and clear, structured results.
- **🔌 Easy Integration**: A simple and intuitive API gets you started in just a few lines of code.
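To make "structured, validated output" concrete, here is a minimal sketch of what a typed schema for part of the result might look like. It uses standard-library dataclasses as a stand-in for Pydantic, and the class names `ContactInfo` and `WorkExperience` plus the validation rule are hypothetical, modeled on the field names in the example output later in this README rather than taken from the library's actual models.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContactInfo:
    # Hypothetical mirror of the 'contact_info' fields in the example output.
    full_name: str
    email: Optional[str] = None
    phone: Optional[str] = None
    location: Optional[str] = None
    linkedin: Optional[str] = None


@dataclass
class WorkExperience:
    # Hypothetical mirror of one 'work_experience' entry.
    job_title: str
    company: str
    start_date: Optional[str] = None  # "YYYY-MM"
    end_date: Optional[str] = None    # "YYYY-MM"
    duration_months: Optional[int] = None
    achievements: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Minimal validation in the spirit of a Pydantic validator.
        if self.duration_months is not None and self.duration_months < 0:
            raise ValueError("duration_months must be non-negative")


job = WorkExperience(
    job_title="Amazon Warehouse Associate",
    company="Amazon",
    start_date="2021-01",
    end_date="2022-07",
    duration_months=19,
)
print(job.company, job.duration_months)  # Amazon 19
```

With real Pydantic models, type coercion and error reporting come for free; the dataclass version only shows the shape of the data.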
## 🚀 Quick Start
### Installation
For core functionality (PDF, DOCX, TXT), install the base package:
```bash
pip install ai-resume-parser
```
For full functionality, including support for images, HTML, and ODT files (recommended). The quotes stop shells such as zsh from treating the brackets as a glob pattern:
```bash
pip install "ai-resume-parser[full]"
```
See the "Supported File Formats" section for more specific installation options.
### Basic Usage
It only takes a few lines to parse your first resume.
```python
from resumeparser_pro import ResumeParserPro

# Initialize the parser with your chosen AI provider and API key
parser = ResumeParserPro(
    provider="google_genai",
    model_name="gemini-2.0-flash",  # Or another model offered by the chosen provider
    api_key="your-llm-provider-api-key",
)
```
```python
# Parse a single resume file
# Supports .pdf, .docx, .txt, .png, .jpg, and more
result = parser.parse_resume("path/to/your/resume.pdf")

# Check if parsing was successful and access the data
if result.success:
    print("✅ Resume parsed successfully!")
    print(f"Name: {result.resume_data.contact_info.full_name}")
    print(f"Total Experience: {result.resume_data.total_experience_months} months")
    print(f"Industry: {result.resume_data.industry}")

    # You can also get a quick summary
    # print(result.get_summary())  # Only if you have added this convenience method

    # Or export the full data to a dictionary
    # resume_dict = result.model_dump()
else:
    print(f"❌ Parsing failed: {result.error_message}")
```
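If you have not added a `get_summary()` method, a small helper over the dictionary form of the result can fill the gap. This is a hypothetical sketch: `summarize_resume` is not part of the library, and it assumes the nested keys shown in the example output further down this page (`resume_data`, `contact_info`, `total_experience_months`, `industry`). It uses `.get()` so that missing or null fields do not raise.

```python
from typing import Any, Dict


def summarize_resume(result: Dict[str, Any]) -> str:
    """Build a one-line summary from a parsed result in dict form (hypothetical helper)."""
    if not result.get("success"):
        return f"Failed: {result.get('error_message') or 'unknown error'}"
    data = result.get("resume_data") or {}
    contact = data.get("contact_info") or {}
    name = contact.get("full_name") or "Unknown"
    months = data.get("total_experience_months")
    # Express experience as years and months when the field is present.
    if months is None:
        experience = "experience unknown"
    else:
        years, rest = divmod(months, 12)
        experience = f"{years}y {rest}m experience"
    industry = data.get("industry") or "unspecified industry"
    return f"{name} | {experience} | {industry}"


example = {
    "success": True,
    "resume_data": {
        "contact_info": {"full_name": "Jason Miller"},
        "total_experience_months": 43,
        "industry": "Logistics & Supply Chain",
    },
}
print(summarize_resume(example))
# Jason Miller | 3y 7m experience | Logistics & Supply Chain
```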
### Batch Processing
Process multiple resumes in parallel for maximum speed.
```python
# Process multiple resumes at once
file_paths = ["resume1.pdf", "resume2.docx", "scanned_resume.png"]
results = parser.parse_batch(file_paths)
```
```python
# Filter for only the successfully parsed resumes
successful_resumes = parser.get_successful_resumes(results)
print(f"Successfully parsed {len(successful_resumes)} out of {len(file_paths)} resumes.")
```
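The library's internals are not documented here, but parallel parsing of I/O-bound LLM calls is commonly built on a thread pool. The sketch below shows that pattern with `concurrent.futures.ThreadPoolExecutor`. `parse_one`, `ParseResult`, and `parse_batch_sketch` are hypothetical stand-ins (the real `parse_batch` makes LLM calls); errors are captured per file, and results come back in input order.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ParseResult:
    # Hypothetical stand-in for the library's result object.
    file_path: str
    success: bool
    error_message: Optional[str] = None


def parse_batch_sketch(
    file_paths: List[str],
    parse_one: Callable[[str], ParseResult],
    max_workers: int = 4,
) -> List[ParseResult]:
    """Run parse_one over all paths concurrently; one failure does not stop the batch."""

    def safe_parse(path: str) -> ParseResult:
        try:
            return parse_one(path)
        except Exception as exc:  # Record the error instead of aborting the batch.
            return ParseResult(path, success=False, error_message=str(exc))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as file_paths.
        return list(pool.map(safe_parse, file_paths))


def fake_parse(path: str) -> ParseResult:
    # Simulated parser: pretend image files fail (e.g. Tesseract is missing).
    if path.endswith(".png"):
        raise RuntimeError("OCR engine not available")
    return ParseResult(path, success=True)


paths = ["resume1.pdf", "resume2.docx", "scanned_resume.png"]
results = parse_batch_sketch(paths, fake_parse)
successful = [r for r in results if r.success]
print(f"Successfully parsed {len(successful)} out of {len(results)} resumes.")
# Successfully parsed 2 out of 3 resumes.
```

Threads suit this workload because each parse mostly waits on a network call; CPU-heavy steps such as OCR may benefit less.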
## 📁 Supported File Formats
ResumeParser Pro supports a wide range of file formats. For formats beyond PDF, DOCX, and TXT, you need to install optional dependencies.
| Format | Extensions | Required Installation Command |
|-----------------|--------------------------|----------------------------------------|
| **Core Formats**| `.pdf`, `.docx`, `.txt` | `pip install ai-resume-parser` |
| **Images (OCR)**| `.png`, `.jpg`, `.jpeg` | `pip install "ai-resume-parser[ocr]"` |
| **HTML** | `.html`, `.htm` | `pip install "ai-resume-parser[html]"` |
| **OpenDocument**| `.odt` | `pip install "ai-resume-parser[odt]"` |
**❗️ Important Note for Image Parsing:**
To parse images (`.png`, `.jpg`), you must have the **Google Tesseract OCR engine** installed on your system. This is a separate step from the `pip` installation.
* [Tesseract Installation Guide](https://github.com/tesseract-ocr/tesseract/wiki)
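Because the optional extras and the Tesseract engine are installed separately, a quick preflight check can catch a missing dependency before a batch run. This is a hypothetical helper, not part of the library: the extension-to-extra mapping is copied from the table above, and the Tesseract check only looks for a `tesseract` executable on `PATH` via `shutil.which`.

```python
import shutil
from pathlib import Path
from typing import Dict, Optional

# Extension -> pip extra, taken from the table above (None means the core install).
EXTRA_FOR_EXTENSION: Dict[str, Optional[str]] = {
    ".pdf": None, ".docx": None, ".txt": None,
    ".png": "ocr", ".jpg": "ocr", ".jpeg": "ocr",
    ".html": "html", ".htm": "html",
    ".odt": "odt",
}


def install_hint(file_path: str) -> Optional[str]:
    """Return a setup hint for file_path, or None if the core install suffices."""
    ext = Path(file_path).suffix.lower()
    if ext not in EXTRA_FOR_EXTENSION:
        return f"Unsupported extension: {ext or '(none)'}"
    extra = EXTRA_FOR_EXTENSION[ext]
    if extra is None:
        return None
    hint = f'pip install "ai-resume-parser[{extra}]"'
    # Image parsing additionally needs the Tesseract binary on PATH.
    if extra == "ocr" and shutil.which("tesseract") is None:
        hint += " and install the Tesseract OCR engine"
    return hint


for path in ["resume.pdf", "page.HTML", "notes.rtf"]:
    print(path, "->", install_hint(path))
```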
## 📊 Example Parsed Resume Data
The parser returns a structured `ParsedResumeResult` object. The core data is in `result.resume_data`, which follows a detailed Pydantic schema. Exported to a dictionary, a result looks like this (long values abbreviated with `...`):
```python
{
    'file_path': 'resume.pdf',
    'success': True,
    'resume_data': {
        'contact_info': {
            'full_name': 'Jason Miller',
            'email': 'email@email.com',
            'phone': '+1386862',
            'location': 'Los Angeles, CA 90291, United States',
            'linkedin': 'https://www.linkedin.com/in/jason-miller'
        },
        'professional_summary': 'Experienced Amazon Associate with five years’ tenure...',
        'skills': [
            {'category': 'Technical Skills', 'skills': ['Picking', 'Packing', 'Inventory Management']}
        ],
        'work_experience': [{
            'job_title': 'Amazon Warehouse Associate',
            'company': 'Amazon',
            'start_date': '2021-01',
            'end_date': '2022-07',
            'duration_months': 19,
            'description': 'Performed all warehouse laborer duties...',
            'achievements': ['Consistently maintained picking/packing speeds in the 98th percentile.']
        }],
        'education': [{
            'degree': 'Associates Degree in Logistics and Supply Chain Fundamentals',
            'institution': 'Atlanta Technical College'
        }],
        'total_experience_months': 43,
        'industry': 'Logistics & Supply Chain',
        'seniority_level': 'Mid-level'
    },
    'parsing_time_seconds': 3.71,
    'timestamp': '2025-07-25T15:19:50.614831'
}
```
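The example's `duration_months` of 19 for `2021-01` to `2022-07` is consistent with counting both the start and the end month. A minimal sketch of that arithmetic, assuming `YYYY-MM` dates and inclusive counting; `months_between` is a hypothetical helper, and the library's own rules (for example, how it handles "Present" or missing dates) may differ.

```python
def months_between(start: str, end: str) -> int:
    """Inclusive month count between two 'YYYY-MM' strings (hypothetical helper)."""
    start_year, start_month = (int(part) for part in start.split("-"))
    end_year, end_month = (int(part) for part in end.split("-"))
    # Whole years as months, plus the month offset, plus 1 to include the start month.
    months = (end_year - start_year) * 12 + (end_month - start_month) + 1
    if months < 1:
        raise ValueError(f"end date {end} is before start date {start}")
    return months


print(months_between("2021-01", "2022-07"))  # 19
```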
## 🎯 Supported AI Providers
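The library is built on LangChain, so it supports a wide range of LLM providers. Because only `langchain-core` is a declared dependency, you will likely also need the LangChain integration package for your provider (for example, `langchain-openai`). Here are some of the most common providers: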
| Provider | Example Models | Setup |
|-----------------|-------------------------------------------|------------------------|
| **Google** | `gemini-2.0-flash`, `gemini-1.5-pro` | `provider="google_genai"`|
| **OpenAI** | `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo` | `provider="openai"` |
| **Anthropic** | `claude-3-5-sonnet-20240620`, `claude-3-opus-20240229` | `provider="anthropic"` |
| **Azure OpenAI**| `gpt-4`, `gpt-35-turbo` | `provider="azure_openai"`|
| **AWS Bedrock** | Claude, Llama, Titan models | `provider="bedrock"` |
| **Ollama** | Local models like `llama3`, `codellama` | `provider="ollama"` |
**Full list**: See the [LangChain Chat Model Integrations](https://python.langchain.com/v0.2/docs/integrations/chat/) for a complete list of supported providers and model names.
### Provider Usage Examples
```python
# Using OpenAI's GPT-4o-mini
parser = ResumeParserPro(provider="openai", model_name="gpt-4o-mini", api_key="your-openai-key")
```
```python
# Using a local model with Ollama (no API key needed, so pass a placeholder value)
parser = ResumeParserPro(provider="ollama", model_name="llama3:8b", api_key="NA")
```
```python
# Using Anthropic's Claude 3.5 Sonnet
parser = ResumeParserPro(provider="anthropic", model_name="claude-3-5-sonnet-20240620", api_key="your-anthropic-key")
```
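Hard-coding keys as in the examples above is fine for a quick test, but in practice you will usually read them from the environment. A minimal sketch, assuming the conventional variable names used by the providers' own SDKs (`OPENAI_API_KEY`, `GOOGLE_API_KEY`, `ANTHROPIC_API_KEY`). `parser_kwargs` is a hypothetical helper that only builds the keyword arguments shown in this README; the `"NA"` placeholder for Ollama mirrors the example above.

```python
import os
from typing import Dict, Mapping, Optional

# Conventional environment variable per provider (assumption; Ollama needs no key).
API_KEY_ENV_VARS: Dict[str, Optional[str]] = {
    "openai": "OPENAI_API_KEY",
    "google_genai": "GOOGLE_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "ollama": None,
}


def parser_kwargs(
    provider: str, model_name: str, env: Mapping[str, str] = os.environ
) -> Dict[str, str]:
    """Build keyword arguments for ResumeParserPro from the environment (hypothetical helper)."""
    if provider not in API_KEY_ENV_VARS:
        raise ValueError(f"No key mapping for provider {provider!r}")
    var = API_KEY_ENV_VARS[provider]
    if var is None:
        api_key = "NA"  # Placeholder, as in the Ollama example above.
    else:
        api_key = env.get(var, "")
        if not api_key:
            raise RuntimeError(f"Set {var} before using provider {provider!r}")
    return {"provider": provider, "model_name": model_name, "api_key": api_key}


# With the real class: parser = ResumeParserPro(**parser_kwargs("openai", "gpt-4o-mini"))
print(parser_kwargs("ollama", "llama3:8b"))
# {'provider': 'ollama', 'model_name': 'llama3:8b', 'api_key': 'NA'}
```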
## 🛠️ Advanced Configuration
You can customize the parser's behavior during initialization.
```python
parser = ResumeParserPro(
    provider="openai",
    model_name="gpt-4o-mini",
    api_key="your-api-key",
    max_workers=10,   # Parallel workers for batch parsing; higher values may hit provider rate limits
    temperature=0.0,  # 0.0 gives the most consistent, least random output
)
```
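As a rough guide to choosing `max_workers`: if each resume takes about the same time and the provider's rate limits do not bind, a batch proceeds in waves of `max_workers` files. The sketch below is back-of-the-envelope arithmetic under those assumptions, using the 3.71 s `parsing_time_seconds` from the example output; `estimated_batch_seconds` is a hypothetical helper, and real timings vary with model, file size, and rate limits.

```python
import math


def estimated_batch_seconds(n_files: int, max_workers: int, seconds_per_file: float) -> float:
    """Idealized wall-clock time: ceil(n_files / max_workers) waves of one parse each."""
    if n_files < 0 or max_workers < 1:
        raise ValueError("need n_files >= 0 and max_workers >= 1")
    waves = math.ceil(n_files / max_workers)
    return waves * seconds_per_file


print(round(estimated_batch_seconds(100, 1, 3.71), 2))   # 371.0 (sequential)
print(round(estimated_batch_seconds(100, 10, 3.71), 2))  # 37.1
```

Once the provider starts throttling requests, raising `max_workers` further is likely to produce rate-limit errors rather than extra speed.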
## 🤝 Contributing
Contributions are highly welcome! Please feel free to submit a pull request or open an issue for bugs, feature requests, or suggestions.
## 📄 License
This project is licensed under the MIT License - see the `LICENSE` file for details.
## 🆘 Support
- 📖 **Documentation**: See the [docs folder](https://github.com/Ruthikr/ai-resume-parser/tree/main/docs) and the code and examples in this repository.
- 🐛 **Issue Tracker**: Report bugs or issues [here](https://github.com/Ruthikr/ai-resume-parser/issues).
- 💬 **Discussions**: Ask questions or share ideas in our [Discussions tab](https://github.com/Ruthikr/ai-resume-parser/discussions).
---
**Built with ❤️ for the recruitment and HR community.**
**Package metadata** (PyPI release 1.0.6, uploaded 2025-07-29):
- **Distribution name**: `ai-resume-parser` (wheel `ai_resume_parser-1.0.6-py3-none-any.whl`, sdist `ai_resume_parser-1.0.6.tar.gz`)
- **Author**: Ruthik Reddy <ruthikr369@gmail.com>
- **Requires Python**: >=3.8
- **Dependencies**: `pydantic>=2.0.0`, `langchain-core>=0.1.0`, `python-dateutil>=2.8.0`, `pdfminer.six>=20221105`, `PyMuPDF>=1.23.0`, `python-docx>=0.8.11`, `phonenumbers>=8.13.0`
- **Project links**: [Repository](https://github.com/Ruthikr/ai-resume-parser), [Documentation](https://github.com/Ruthikr/ai-resume-parser/tree/main/docs), [Issues](https://github.com/Ruthikr/ai-resume-parser/issues), [Homepage](https://github.com/Ruthikr)