# Pydantic Scrape
A modular AI-powered web scraping framework built on [pydantic-ai](https://github.com/pydantic/pydantic-ai) and [pydantic-graph](https://github.com/pydantic/pydantic-graph) for intelligent content extraction and research workflows.
## What is Pydantic Scrape?
Pydantic Scrape is a framework for building intelligent web scraping workflows that combine:
- **AI-powered content extraction** using pydantic-ai agents
- **Graph-based workflow orchestration** with pydantic-graph
- **Type-safe dependency injection** for modular, reusable components
- **Specialized content handlers** for academic papers, articles, videos, and more
## ⚡ Quick Start: Search → Answer Workflow
Get comprehensive research answers in seconds with our streamlined search-to-answer pipeline:
```python
from pydantic_scrape.graphs.search_answer import search_answer
# One line to research any topic
result = await search_answer(
    query="Ivermectin working as a treatment for Cancer",
    max_search_results=5
)

# Rich structured output with sources
print(f"✅ Found {result['processing_stats']['search_results']} sources")
print(f"📝 Answer: {result['answer']['answer']}")
print(f"💡 Key insights: {len(result['answer']['key_insights'])}")
print(f"📚 Sources: {len(result['answer']['sources'])}")
```
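`search_answer` is a coroutine, so outside a notebook or async REPL you need an event loop. A minimal standalone script, sketched with the same call and result keys as above:

```python
import asyncio

from pydantic_scrape.graphs.search_answer import search_answer

async def main() -> None:
    # Same call as above, wrapped so it can run as a plain script
    result = await search_answer(
        query="Ivermectin working as a treatment for Cancer",
        max_search_results=5
    )
    print(f"Found {result['processing_stats']['search_results']} sources")
    print(result["answer"]["answer"][:300])

if __name__ == "__main__":
    asyncio.run(main())
```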
**What it does:**
1. 🔍 **Intelligent search** - Finds relevant academic papers and articles
2. 📄 **Content synthesis** - Combines multiple sources into comprehensive summaries
3. 🎯 **Answer generation** - Creates structured answers with key insights and sources
4. ⚡ **Fast execution** - Complete research workflow in ~10 seconds
## Core Architecture: Agents + Dependencies + Graphs
Pydantic Scrape follows a clean three-layer architecture:
### 🤖 **Agents** - AI-powered workers
```python
# Intelligent search agent
from pydantic_scrape.agents.search import search_agent
# AI summarization agent
from pydantic_scrape.agents.summarization import summarize_content
# Dynamic scraping agent
from pydantic_scrape.agents.bs4_scrape_script_agent import get_bs4_scrape_script_agent
```
### 🔧 **Dependencies** - Reusable components
```python
# Content fetching with browser automation
from pydantic_scrape.dependencies.fetch import FetchDependency
# Academic API integrations
from pydantic_scrape.dependencies.openalex import OpenAlexDependency
from pydantic_scrape.dependencies.crossref import CrossrefDependency
# Content analysis and extraction
from pydantic_scrape.dependencies.content_analysis import ContentAnalysisDependency
```
### 📊 **Graphs** - Workflow orchestration
```python
# Fast search → answer workflow
from pydantic_scrape.graphs.search_answer import search_answer_graph
# Complete science paper extraction
from pydantic_scrape.graphs.science import science_graph
# Dynamic scraping workflows
from pydantic_scrape.graphs.dynamic_scrape import dynamic_scrape_graph
```
## 🔬 Example: AI Content Summarization
Create structured summaries from any content:
```python
from pydantic_scrape.agents.summarization import summarize_content
# Single document
summary = await summarize_content(
    "Machine learning advances in 2024 have focused on efficiency and safety...",
    max_length=1000
)
print(f"Title: {summary.title}")
print(f"Summary: {summary.summary}")
print(f"Key findings: {summary.key_findings}")
print(f"Confidence: {summary.confidence_score}")
# Multiple documents (returns comprehensive summary)
combined_summary = await summarize_content([
    doc1, doc2, doc3  # List of content objects
])
```
## 🧩 Example: Custom Dependency
Build reusable components for specific content types:
```python
from dataclasses import dataclass
from pydantic import BaseModel

class TwitterContent(BaseModel):
    tweet_text: str
    author: str
    likes: int
    retweets: int

@dataclass
class TwitterDependency:
    """Extract structured data from Twitter/X"""

    api_key: str

    async def extract_tweet_data(self, url: str) -> TwitterContent:
        # Custom extraction logic here
        ...
```
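A usage sketch for the hypothetical `TwitterDependency` above; the API key and tweet URL are placeholders, and the call only returns real data once `extract_tweet_data` is implemented:

```python
import asyncio

async def main() -> None:
    twitter = TwitterDependency(api_key="YOUR_API_KEY")  # placeholder key
    tweet = await twitter.extract_tweet_data("https://x.com/example/status/1")  # placeholder URL
    print(tweet)  # a TwitterContent instance once the extraction logic is filled in

asyncio.run(main())
```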
## 📈 Example: Custom Graph Workflow
Compose agents and dependencies into intelligent workflows:
```python
from dataclasses import dataclass
from typing import Union
from pydantic_graph import BaseNode, Graph, GraphRunContext, End
@dataclass
class ResearchState:
    query: str
    sources_found: list = None
    summaries: list = None
    final_report: str = None

@dataclass
class ResearchDeps:
    search: SearchDependency
    summarizer: SummarizationDependency

@dataclass
class SearchNode(BaseNode[ResearchState, ResearchDeps, dict]):
    async def run(
        self, ctx: GraphRunContext[ResearchState, ResearchDeps]
    ) -> Union["SummarizeNode", End[dict]]:
        sources = await ctx.deps.search.find_sources(ctx.state.query)
        if not sources:
            return End({"error": "No sources found"})
        ctx.state.sources_found = sources
        return SummarizeNode()

# SummarizeNode and ReportNode follow the same pattern (omitted for brevity)

# Assemble the graph
research_graph = Graph(nodes=[SearchNode, SummarizeNode, ReportNode])
```
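Once `SummarizeNode` and `ReportNode` are defined, the assembled graph is executed with pydantic-graph's `Graph.run`. A minimal sketch, assuming the two dependency classes can be instantiated without arguments:

```python
import asyncio

async def main() -> None:
    state = ResearchState(query="graph-based scraping workflows")
    deps = ResearchDeps(search=SearchDependency(), summarizer=SummarizationDependency())

    # Graph.run starts at the given node and follows returned nodes until an End is reached
    result = await research_graph.run(SearchNode(), state=state, deps=deps)
    print(result.output)  # the dict passed to End(...), per pydantic-graph's GraphRunResult

asyncio.run(main())
```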
## 🛠️ Installation
### Development Installation
```bash
# Clone the repository
git clone https://github.com/philmade/pydantic_scrape.git
cd pydantic_scrape
# Install with development dependencies (using uv for speed)
uv pip install -e ".[dev]"
# or with pip
pip install -e ".[dev]"
# Set up environment variables
cp .env.example .env
# Add your API keys (OPENAI_API_KEY, etc.)
```
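A minimal `.env` sketch; only `OPENAI_API_KEY` is named in this README, so any other keys you add are assumptions tied to the providers you actually enable:

```bash
# .env (only OPENAI_API_KEY is explicitly called out above;
# add further provider keys depending on which agents you use)
OPENAI_API_KEY=sk-...
```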
## 🧪 Comprehensive Testing & Validation
**✅ ALL 4 CORE GRAPHS TESTED AND OPERATIONAL!**
Run the complete test suite:
```bash
# Test all 4 graphs with real examples
python test_all_graphs.py
# Results: 4/4 graphs passing in ~90 seconds
# ✅ Search → Answer: Research workflow (32.9s)
# ✅ Dynamic AI Scraping: Extract from any site (12.4s)
# ✅ Complete Science Scraping: Full academic processing (20.0s)
# ✅ Search → Scrape → Answer: Advanced research pipeline (29.0s)
```
**🎯 Framework Capabilities Demonstrated:**
- 🔍 **Fast Research** - Search academic sources and generate comprehensive answers
- 🤖 **AI Extraction** - Dynamically extract structured data from any website using AI agents
- 📄 **Science Processing** - Complete academic paper processing with metadata enrichment
- 🔬 **Deep Research** - Advanced pipeline that searches, scrapes full content, and synthesizes answers
### Quick Individual Tests
```bash
# Test search-answer workflow
python -c "
import asyncio
from pydantic_scrape.graphs.search_answer import search_answer
async def test():
    result = await search_answer('latest advances in quantum computing')
    print(f'Found {len(result[\"answer\"][\"sources\"])} sources')
    print(result['answer']['answer'][:200] + '...')
asyncio.run(test())
"
# Test summarization agent
python -c "
import asyncio
from pydantic_scrape.agents.summarization import summarize_content
async def test():
    summary = await summarize_content(
        'Artificial intelligence is transforming scientific research...'
    )
    print(f'Summary: {summary.summary}')
asyncio.run(test())
"
```
## 🤝 Contributing - We Need Your Help!
We're building the future of intelligent web scraping and **we want you to be part of it!**
### 🎯 What We're Looking For
#### 🤖 **Agent Builders**
Create specialized AI agents for:
- **Domain-specific extraction** (legal docs, medical papers, financial reports)
- **Multi-modal content** (image + text analysis, video transcription)
- **Real-time processing** (news monitoring, social media tracking)
- **Quality assurance** (fact-checking, source verification)
#### 🔧 **Dependency Developers**
Build reusable components for:
- **API integrations** (Google Scholar, PubMed, arXiv, GitHub, social platforms)
- **Content processors** (PDF extraction, video analysis, image recognition)
- **Data enrichment** (NLP analysis, metadata extraction, classification)
- **Storage & caching** (vector databases, knowledge graphs, search indices)
#### 📊 **Graph Architects**
Design intelligent workflows for:
- **Research pipelines** (literature review, systematic analysis, meta-analysis)
- **Content monitoring** (news tracking, social listening, trend analysis)
- **Knowledge extraction** (entity recognition, relationship mapping, fact extraction)
- **Quality control** (validation, verification, bias detection)
### 🚀 Getting Started as a Contributor
```bash
# 1. Fork and clone
git clone https://github.com/yourusername/pydantic-scrape.git
cd pydantic-scrape
# 2. Install development dependencies
uv pip install -e ".[dev]"
# 3. Test current functionality
python test_search_answer.py # Should work out of the box
# 4. Check the current structure
ls pydantic_scrape/agents/ # See existing agents
ls pydantic_scrape/dependencies/ # See existing dependencies
ls pydantic_scrape/graphs/ # See existing graphs
# 5. Start building!
```
### 💡 Contribution Ideas
**Easy wins for new contributors:**
- Add a new academic API (NASA ADS, bioRxiv, SSRN)
- Create a social media dependency (Reddit, LinkedIn, Mastodon)
- Build a specialized graph for a domain (legal research, patent analysis)
- Add content format support (EPUB, Markdown, slides)
**Advanced challenges:**
- Multi-agent coordination for complex research tasks
- Real-time streaming workflows with live updates
- Advanced caching and optimization strategies
- Cross-language content extraction and translation
### 🌟 Community & Support
- 🐛 **Found a bug?** [Open an issue](https://github.com/philmade/pydantic_scrape/issues) with reproduction steps
- 💡 **Have an idea?** [Start a discussion](https://github.com/philmade/pydantic_scrape/discussions) about new features
- 🔧 **Ready to contribute?** Check out our [contribution guidelines](CONTRIBUTING.md)
- 📧 **Questions?** Reach out to the maintainers
## 📋 Core Dependencies
- **AI Framework**: [pydantic-ai](https://github.com/pydantic/pydantic-ai) - Type-safe AI agents with structured outputs
- **Workflow Engine**: [pydantic-graph](https://github.com/pydantic/pydantic-graph) - Graph-based workflow orchestration
- **Browser Automation**: [Camoufox](https://github.com/daijro/camoufox) - Undetectable browser automation
- **Content Processing**: BeautifulSoup4, newspaper3k, pypdf
- **Academic APIs**: Integration with OpenAlex, Crossref, arXiv
## 📄 License
MIT License - see [LICENSE](LICENSE) file for details.
---
## 🎉 Join Us!
Pydantic Scrape is more than a framework - it's a community building the next generation of intelligent web scraping tools. Whether you're a researcher, developer, data scientist, or domain expert, there's a place for you here.
**Let's build something amazing together!** 🚀