postcrawl

Name: postcrawl
Version: 1.0.0
Summary: Python SDK for PostCrawl - The Fastest LLM-Ready Social Media Crawler
Upload time: 2025-07-10 12:47:33
Requires Python: >=3.10
License: MIT
Keywords: api, postcrawl, reddit, scraping, sdk, social-media, tiktok
# PostCrawl Python SDK

Official Python SDK for [PostCrawl](https://postcrawl.com) - The Fastest LLM-Ready Social Media Crawler. Extract and search content from Reddit and TikTok with a simple, type-safe Python interface.

## Features

- 🔍 **Search** across Reddit and TikTok with advanced filtering
- 📊 **Extract** content from social media URLs with optional comments
- 🚀 **Combined search and extract** in a single operation
- 🏷️ **Type-safe** with Pydantic models and full type hints
- ⚡ **Async/await** support with synchronous convenience methods
- 🛡️ **Comprehensive error handling** with detailed exceptions
- 📈 **Rate limiting** support with credit tracking
- 🔄 **Automatic retries** for network errors
- 🎯 **Platform-specific models** for Reddit and TikTok data with strong typing
- 📝 **Rich content formatting** with markdown support
- 🐍 **Python 3.10+** with modern type annotations and snake_case naming

## Installation

### Using uv (Recommended)

[uv](https://github.com/astral-sh/uv) is a fast Python package manager that we recommend:

```bash
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Add postcrawl to your project
uv add postcrawl
```

### Using pip

```bash
pip install postcrawl
```

### Optional: Environment Variables

To load API keys from `.env` files, install `python-dotenv`:

```bash
uv add python-dotenv
# or
pip install python-dotenv
```

## Requirements

- Python 3.10 or higher
- PostCrawl API key ([Get one for free](https://postcrawl.com))

## Quick Start

### Async Usage (Recommended)
```python
import asyncio
from postcrawl import PostCrawlClient

async def main():
    # Initialize the client with your API key
    async with PostCrawlClient(api_key="sk_your_api_key_here") as client:
        # Search for content
        results = await client.search(
            social_platforms=["reddit"],
            query="machine learning",
            results=10,
            page=1
        )

        # Process results
        for post in results:
            print(f"{post.title} - {post.url}")
            print(f"  Date: {post.date}")
            print(f"  Snippet: {post.snippet[:100]}...")

# Run the async function
asyncio.run(main())
```

### Synchronous Usage
```python
from postcrawl import PostCrawlClient

# Initialize the client
client = PostCrawlClient(api_key="sk_your_api_key_here")

# Search synchronously
results = client.search_sync(
    social_platforms=["reddit", "tiktok"],
    query="artificial intelligence",
    results=5
)

# Extract content from URLs
posts = client.extract_sync(
    urls=["https://reddit.com/r/...", "https://tiktok.com/@..."],
    include_comments=True
)
```


## API Reference

### Search
```python
results = await client.search(
    social_platforms=["reddit", "tiktok"],
    query="your search query",
    results=10,  # 1-100
    page=1       # pagination
)
```

### Extract
```python
posts = await client.extract(
    urls=["https://reddit.com/...", "https://tiktok.com/..."],
    include_comments=True,
    response_mode="raw"  # or "markdown"
)
```

### Search and Extract
```python
posts = await client.search_and_extract(
    social_platforms=["reddit"],
    query="search query",
    results=5,
    page=1,
    include_comments=False,
    response_mode="markdown"
)
```

### Synchronous Methods
```python
# All methods have synchronous versions
results = client.search_sync(...)
posts = client.extract_sync(...)
combined = client.search_and_extract_sync(...)
```

## Examples

Check out the `examples/` directory for complete working examples:
- `search_101.py` - Basic search functionality demo
- `extract_101.py` - Content extraction demo
- `search_and_extract_101.py` - Combined operation demo

Run examples with:
```bash
# Using uv (recommended)
uv run python examples/search_101.py

# Or with standard Python
cd examples
python search_101.py
```

## Response Models

### SearchResult
Response from the search endpoint:
- `title`: Title of the search result
- `url`: URL of the search result
- `snippet`: Text snippet from the content
- `date`: Date of the post (e.g., "Dec 28, 2024")
- `image_url`: URL of associated image (can be empty string)

### ExtractedPost
Response from the extract endpoint:
- `url`: Original URL
- `source`: Platform name ("reddit" or "tiktok")
- `raw`: Raw content data (RedditPost or TiktokPost object) - strongly typed
- `markdown`: Markdown formatted content (when response_mode="markdown")
- `error`: Error message if extraction failed

## Working with Platform-Specific Types

The SDK provides type-safe access to platform-specific data:

```python
from postcrawl import PostCrawlClient, RedditPost, TiktokPost

# Extract content with proper type handling
posts = await client.extract(urls=["https://reddit.com/..."])

for post in posts:
    if post.error:
        print(f"Error: {post.error}")
    elif isinstance(post.raw, RedditPost):
        # Access Reddit-specific fields with snake_case attributes
        print(f"Subreddit: r/{post.raw.subreddit_name}")
        print(f"Score: {post.raw.score}")
        print(f"Title: {post.raw.title}")
        print(f"Upvotes: {post.raw.upvotes}")
        print(f"Created: {post.raw.created_at}")
        if post.raw.comments:
            print(f"Comments: {len(post.raw.comments)}")
    elif isinstance(post.raw, TiktokPost):
        # Access TikTok-specific fields with snake_case attributes
        print(f"Username: @{post.raw.username}")
        print(f"Likes: {post.raw.likes}")
        print(f"Total Comments: {post.raw.total_comments}")
        print(f"Created: {post.raw.created_at}")
        if post.raw.hashtags:
            print(f"Hashtags: {', '.join(post.raw.hashtags)}")
```

## Error Handling

```python
from postcrawl.exceptions import (
    AuthenticationError,       # Invalid API key
    InsufficientCreditsError,  # Not enough credits
    RateLimitError,            # Rate limit exceeded
    ValidationError,           # Invalid parameters
)
```
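A typical call can be wrapped so each failure mode is handled explicitly. The sketch below reuses the `search` signature from the Quick Start; how you react to each exception (log, retry, re-raise) is up to your application:

```python
import asyncio

from postcrawl import PostCrawlClient
from postcrawl.exceptions import (
    AuthenticationError,
    InsufficientCreditsError,
    RateLimitError,
    ValidationError,
)


async def safe_search(query: str) -> list:
    """Run a search, returning [] on recoverable API errors."""
    async with PostCrawlClient(api_key="sk_your_api_key_here") as client:
        try:
            return await client.search(
                social_platforms=["reddit"],
                query=query,
                results=10,
                page=1,
            )
        except AuthenticationError:
            print("Check that your API key is valid and active")
        except InsufficientCreditsError:
            print("Not enough credits - top up in the PostCrawl dashboard")
        except RateLimitError:
            print("Rate limit exceeded - back off and retry later")
        except ValidationError as exc:
            print(f"Invalid request parameters: {exc}")
        return []


asyncio.run(safe_search("machine learning"))
```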

## Development

This project uses [uv](https://github.com/astral-sh/uv) for dependency management. See [DEVELOPMENT.md](DEVELOPMENT.md) for detailed setup and contribution guidelines.

### Quick Development Setup

```bash
# Clone the repository
git clone https://github.com/post-crawl/python-sdk.git
cd python-sdk

# Install dependencies
uv sync

# Run tests
make test

# Run all checks (format, lint, test)
make check

# Build the package
make build
```

### Available Commands

```bash
make help         # Show all available commands
make format       # Format code with black and ruff
make lint         # Run linting and type checking
make test         # Run test suite
make check        # Run format, lint, and tests
make build        # Build distribution packages
make verify       # Verify package installation
make publish-test # Publish to TestPyPI
```

## API Key Management

### Environment Variables (Recommended)

Store your API key securely in environment variables:

```bash
export POSTCRAWL_API_KEY="sk_your_api_key_here"
```

Or use a `.env` file:
```bash
# .env
POSTCRAWL_API_KEY=sk_your_api_key_here
```

Then load it in your code:
```python
import os
from dotenv import load_dotenv
from postcrawl import PostCrawlClient

load_dotenv()
client = PostCrawlClient(api_key=os.getenv("POSTCRAWL_API_KEY"))
```

### Security Best Practices

- **Never hardcode API keys** in your source code
- **Add `.env` to `.gitignore`** to prevent accidental commits
- **Use environment variables** in production
- **Rotate keys regularly** through the PostCrawl dashboard
- **Set key permissions** to limit access to specific operations
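
Putting the first three points together, a small fail-fast helper keeps key handling in one place (`load_api_key` is a hypothetical helper for illustration, not part of the SDK):

```python
import os


def load_api_key(env_var: str = "POSTCRAWL_API_KEY") -> str:
    """Read the API key from the environment, failing fast if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or add it to a .env file"
        )
    return key
```

Use it when constructing the client: `PostCrawlClient(api_key=load_api_key())`.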

## Rate Limits & Credits

PostCrawl uses a credit-based system:

- **Search**: ~1 credit per 10 results
- **Extract**: ~1 credit per URL (without comments)
- **Extract with comments**: ~3 credits per URL
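
Under these rates, the cost of a batch can be estimated up front. `estimate_credits` is a hypothetical helper (not part of the SDK), and the rates above are approximate, so treat the result as a rough budget figure:

```python
import math


def estimate_credits(search_results: int = 0, extract_urls: int = 0,
                     with_comments: bool = False) -> int:
    """Rough credit estimate based on the published per-operation rates."""
    # Search: ~1 credit per 10 results, rounded up
    search_cost = math.ceil(search_results / 10)
    # Extract: ~1 credit per URL, ~3 credits per URL when comments are included
    extract_cost = extract_urls * (3 if with_comments else 1)
    return search_cost + extract_cost
```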

Rate limit information is returned in response headers and exposed on the client:
```python
client = PostCrawlClient(api_key="sk_...")
results = await client.search(...)

print(f"Rate limit: {client.rate_limit_info['limit']}")
print(f"Remaining: {client.rate_limit_info['remaining']}")
print(f"Reset at: {client.rate_limit_info['reset']}")
```

## Support

- **Documentation**: [github.com/post-crawl/python-sdk](https://github.com/post-crawl/python-sdk)
- **Issues**: [github.com/post-crawl/python-sdk/issues](https://github.com/post-crawl/python-sdk/issues)
- **Email**: support@postcrawl.com

## License

MIT License - see [LICENSE](LICENSE) file for details.

            
