# PostCrawl Python SDK
Official Python SDK for [PostCrawl](https://postcrawl.com) - The Fastest LLM-Ready Social Media Crawler. Extract and search content from Reddit and TikTok with a simple, type-safe Python interface.
## Features
- 🔍 **Search** across Reddit and TikTok with advanced filtering
- 📊 **Extract** content from social media URLs with optional comments
- 🚀 **Combined search and extract** in a single operation
- 🏷️ **Type-safe** with Pydantic models and full type hints
- ⚡ **Async/await** support with synchronous convenience methods
- 🛡️ **Comprehensive error handling** with detailed exceptions
- 📈 **Rate limiting** support with credit tracking
- 🔄 **Automatic retries** for network errors
- 🎯 **Platform-specific models** for Reddit and TikTok data with strong typing
- 📝 **Rich content formatting** with markdown support
- 🐍 **Python 3.10+** with modern type annotations and snake_case naming
## Installation
### Using uv (Recommended)
[uv](https://github.com/astral-sh/uv) is a fast Python package manager that we recommend:
```bash
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh
# Add postcrawl to your project
uv add postcrawl
```
### Using pip
```bash
pip install postcrawl
```
### Optional: Environment Variables
For loading API keys from .env files:
```bash
uv add python-dotenv
# or
pip install python-dotenv
```
## Requirements
- Python 3.10 or higher
- PostCrawl API key ([Get one for free](https://postcrawl.com))
## Quick Start
### Async Usage (Recommended)
```python
import asyncio
from postcrawl import PostCrawlClient
async def main():
    # Initialize the client with your API key
    async with PostCrawlClient(api_key="sk_your_api_key_here") as client:
        # Search for content
        results = await client.search(
            social_platforms=["reddit"],
            query="machine learning",
            results=10,
            page=1
        )

        # Process results
        for post in results:
            print(f"{post.title} - {post.url}")
            print(f"  Date: {post.date}")
            print(f"  Snippet: {post.snippet[:100]}...")

# Run the async function
asyncio.run(main())
```
### Synchronous Usage
```python
from postcrawl import PostCrawlClient
# Initialize the client
client = PostCrawlClient(api_key="sk_your_api_key_here")
# Search synchronously
results = client.search_sync(
    social_platforms=["reddit", "tiktok"],
    query="artificial intelligence",
    results=5
)

# Extract content from URLs
posts = client.extract_sync(
    urls=["https://reddit.com/r/...", "https://tiktok.com/@..."],
    include_comments=True
)
```
## API Reference
### Search
```python
results = await client.search(
    social_platforms=["reddit", "tiktok"],
    query="your search query",
    results=10,  # 1-100
    page=1       # pagination
)
```
### Extract
```python
posts = await client.extract(
    urls=["https://reddit.com/...", "https://tiktok.com/..."],
    include_comments=True,
    response_mode="raw"  # or "markdown"
)
```
### Search and Extract
```python
posts = await client.search_and_extract(
    social_platforms=["reddit"],
    query="search query",
    results=5,
    page=1,
    include_comments=False,
    response_mode="markdown"
)
```
### Synchronous Methods
```python
# All methods have synchronous versions
results = client.search_sync(...)
posts = client.extract_sync(...)
combined = client.search_and_extract_sync(...)
```
## Examples
Check out the `examples/` directory for complete working examples:
- `search_101.py` - Basic search functionality demo
- `extract_101.py` - Content extraction demo
- `search_and_extract_101.py` - Combined operation demo
Run examples with:
```bash
# Using uv (recommended)
uv run python examples/search_101.py
# Or with standard Python
cd examples
python search_101.py
```
## Response Models
### SearchResult
Response from the search endpoint:
- `title`: Title of the search result
- `url`: URL of the search result
- `snippet`: Text snippet from the content
- `date`: Date of the post (e.g., "Dec 28, 2024")
- `image_url`: URL of associated image (can be empty string)
### ExtractedPost
- `url`: Original URL
- `source`: Platform name ("reddit" or "tiktok")
- `raw`: Raw content data (RedditPost or TiktokPost object) - strongly typed
- `markdown`: Markdown formatted content (when response_mode="markdown")
- `error`: Error message if extraction failed
## Working with Platform-Specific Types
The SDK provides type-safe access to platform-specific data:
```python
from postcrawl import PostCrawlClient, RedditPost, TiktokPost
# Extract content with proper type handling
posts = await client.extract(urls=["https://reddit.com/..."])
for post in posts:
    if post.error:
        print(f"Error: {post.error}")
    elif isinstance(post.raw, RedditPost):
        # Access Reddit-specific fields with snake_case attributes
        print(f"Subreddit: r/{post.raw.subreddit_name}")
        print(f"Score: {post.raw.score}")
        print(f"Title: {post.raw.title}")
        print(f"Upvotes: {post.raw.upvotes}")
        print(f"Created: {post.raw.created_at}")
        if post.raw.comments:
            print(f"Comments: {len(post.raw.comments)}")
    elif isinstance(post.raw, TiktokPost):
        # Access TikTok-specific fields with snake_case attributes
        print(f"Username: @{post.raw.username}")
        print(f"Likes: {post.raw.likes}")
        print(f"Total Comments: {post.raw.total_comments}")
        print(f"Created: {post.raw.created_at}")
        if post.raw.hashtags:
            print(f"Hashtags: {', '.join(post.raw.hashtags)}")
```
## Error Handling
```python
from postcrawl.exceptions import (
AuthenticationError, # Invalid API key
InsufficientCreditsError, # Not enough credits
RateLimitError, # Rate limit exceeded
ValidationError # Invalid parameters
)
```
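Distinguishing these exception types makes it easy to retry only transient failures. The sketch below is a hypothetical helper (not part of the SDK) that wraps any async call with exponential backoff; in practice you might pass `RateLimitError` as the retryable type:

```python
import asyncio

async def with_retries(coro_factory, retryable=(Exception,), attempts=3, base_delay=1.0):
    """Retry an async call with exponential backoff.

    coro_factory: zero-argument callable returning a fresh coroutine per attempt.
    retryable:    exception classes worth retrying (e.g. RateLimitError).
    """
    for attempt in range(attempts):
        try:
            return await coro_factory()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts - surface the error to the caller
            await asyncio.sleep(base_delay * 2 ** attempt)
```

Usage might look like `results = await with_retries(lambda: client.search(social_platforms=["reddit"], query="ml", results=10, page=1), retryable=(RateLimitError,))`. Note that `AuthenticationError` and `ValidationError` are deliberately not retried by default, since repeating the call cannot fix a bad key or bad parameters.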
## Development
This project uses [uv](https://github.com/astral-sh/uv) for dependency management. See [DEVELOPMENT.md](DEVELOPMENT.md) for detailed setup and contribution guidelines.
### Quick Development Setup
```bash
# Clone the repository
git clone https://github.com/post-crawl/python-sdk.git
cd python-sdk
# Install dependencies
uv sync
# Run tests
make test
# Run all checks (format, lint, test)
make check
# Build the package
make build
```
### Available Commands
```bash
make help # Show all available commands
make format # Format code with black and ruff
make lint # Run linting and type checking
make test # Run test suite
make check # Run format, lint, and tests
make build # Build distribution packages
make verify # Verify package installation
make publish-test # Publish to TestPyPI
```
## API Key Management
### Environment Variables (Recommended)
Store your API key securely in environment variables:
```bash
export POSTCRAWL_API_KEY="sk_your_api_key_here"
```
Or use a `.env` file:
```bash
# .env
POSTCRAWL_API_KEY=sk_your_api_key_here
```
Then load it in your code:
```python
import os
from dotenv import load_dotenv
from postcrawl import PostCrawlClient
load_dotenv()
client = PostCrawlClient(api_key=os.getenv("POSTCRAWL_API_KEY"))
```
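Since `os.getenv` silently returns `None` for a missing variable, it can help to fail fast with a clear message before constructing the client. `require_api_key` below is a small hypothetical helper, not part of the SDK:

```python
import os

def require_api_key(env_var="POSTCRAWL_API_KEY"):
    """Return the API key from the environment, failing fast if it is missing."""
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or add it to your .env file"
        )
    return key
```

You could then write `client = PostCrawlClient(api_key=require_api_key())` and get an immediate, descriptive error instead of a later authentication failure.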
### Security Best Practices
- **Never hardcode API keys** in your source code
- **Add `.env` to `.gitignore`** to prevent accidental commits
- **Use environment variables** in production
- **Rotate keys regularly** through the PostCrawl dashboard
- **Set key permissions** to limit access to specific operations
## Rate Limits & Credits
PostCrawl uses a credit-based system:
- **Search**: ~1 credit per 10 results
- **Extract**: ~1 credit per URL (without comments)
- **Extract with comments**: ~3 credits per URL
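The approximate rates above can be turned into a rough budgeting sketch before running a job. `estimate_credits` below is a hypothetical helper based on those approximate rates, not an official billing calculation; check your dashboard for exact charges:

```python
import math

def estimate_credits(search_results=0, extract_urls=0, with_comments=False):
    """Rough credit estimate from the approximate documented rates:
    ~1 credit per 10 search results, ~1 credit per URL, ~3 with comments."""
    search_cost = math.ceil(search_results / 10) if search_results else 0
    per_url = 3 if with_comments else 1
    return search_cost + extract_urls * per_url
```

For example, `estimate_credits(search_results=50, extract_urls=10, with_comments=True)` gives 5 + 30 = 35 credits under these assumptions.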
Rate limit information is returned in response headers and exposed on the client:
```python
client = PostCrawlClient(api_key="sk_...")
results = await client.search(...)
print(f"Rate limit: {client.rate_limit_info['limit']}")
print(f"Remaining: {client.rate_limit_info['remaining']}")
print(f"Reset at: {client.rate_limit_info['reset']}")
```
## Support
- **Documentation**: [github.com/post-crawl/python-sdk](https://github.com/post-crawl/python-sdk)
- **Issues**: [github.com/post-crawl/python-sdk/issues](https://github.com/post-crawl/python-sdk/issues)
- **Email**: support@postcrawl.com
## License
MIT License - see [LICENSE](LICENSE) file for details.