| Name | olyptik |
| Version | 0.1.2 |
| download | |
| home_page | None |
| Summary | Official Python SDK for Olyptik API |
| upload_time | 2025-09-06 14:56:18 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | MIT |
| keywords | sdk, crawler, olyptik, api |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# Olyptik Python SDK
The Olyptik Python SDK provides a simple and intuitive interface for web crawling and content extraction. It supports both synchronous and asynchronous programming patterns with full type hints.
## Installation
Install the SDK using pip:
```bash
pip install olyptik
```
## Configuration
First, initialize the SDK with your API key, which you can get from the [settings page](https://app.olyptik.io/settings/crawl). You can pass it directly, or read it from an environment variable as sketched below.
```python
from olyptik import Olyptik
# Initialize with API key
client = Olyptik(api_key="your_api_key_here")
```
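If you prefer not to hard-code the key, you can read it from the environment yourself before constructing the client. A minimal sketch, assuming a variable named `OLYPTIK_API_KEY` (the name is illustrative, not an SDK convention):

```python
import os

from olyptik import Olyptik

# Read the API key from the environment; OLYPTIK_API_KEY is our own choice of name
client = Olyptik(api_key=os.environ["OLYPTIK_API_KEY"])
```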
## Synchronous Usage
### Start a crawl
<CodeGroup>
Minimal settings crawl:
```python
crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "maxResults": 50
})

print(f"Crawl started with ID: {crawl.id}")
print(f"Status: {crawl.status}")
```
Full example:
```python
# Start a crawl
crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "maxResults": 50,
    "maxDepth": 2,
    "engineType": "auto",
    "includeLinks": True,
    "timeout": 60,
    "useSitemap": False,
    "entireWebsite": False,
    "excludeNonMainTags": True,
    "useStaticIps": False
})

print(f"Crawl started with ID: {crawl.id}")
print(f"Status: {crawl.status}")
```
</CodeGroup>
### Query crawls
```python
from olyptik import CrawlStatus
result = client.query_crawls({
    "startUrls": ["https://example.com"],
    "status": [CrawlStatus.SUCCEEDED],
    "page": 0,
})

print("Crawls: ", result.results)
print("Page: ", result.page)
print("Total pages: ", result.totalPages)
print("Count of items per page: ", result.limit)
print("Total matched crawls: ", result.totalResults)
```
### Get crawl results
Retrieve the results of your crawl using the crawl ID.
The results are paginated, and you can specify the page number and limit per page.
```python
limit = 50
page = 0
results = client.get_crawl_results(crawl.id, page, limit)
for result in results.results:
    print(f"URL: {result.url}")
    print(f"Title: {result.title}")
    print(f"Depth: {result.depthOfUrl}")
```
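To collect every result, you can walk the pages until none remain. The sketch below assumes the paginated response exposes a `totalPages` field, as the `query_crawls` response shown above does:

```python
# Fetch all pages of results (assumes get_crawl_results responses carry totalPages)
page = 0
all_results = []
while True:
    batch = client.get_crawl_results(crawl.id, page, limit)
    all_results.extend(batch.results)
    page += 1
    if page >= batch.totalPages:
        break
print(f"Collected {len(all_results)} results")
```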
### Abort a crawl
```python
aborted_crawl = client.abort_crawl(crawl.id)
print(f"Crawl aborted with ID: {aborted_crawl.id}")
```
## Asynchronous Usage
For better performance with I/O operations, use the async client:
### Start a crawl
<CodeGroup>
Minimal settings crawl:
```python
import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })

        print(f"Crawl started with ID: {crawl.id}")
        print(f"Status: {crawl.status}")

asyncio.run(main())
```
Full example:
```python
import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # Start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50,
            "maxDepth": 2,
            "engineType": "auto",
            "includeLinks": True,
            "timeout": 60,
            "useSitemap": False,
            "entireWebsite": False,
            "excludeNonMainTags": True,
            "useStaticIps": False
        })

        print(f"Crawl started with ID: {crawl.id}")
        print(f"Status: {crawl.status}")

asyncio.run(main())
```
</CodeGroup>
### Query crawls
```python
import asyncio
from olyptik import AsyncOlyptik, CrawlStatus

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        result = await client.query_crawls({
            "startUrls": ["https://example.com"],
            "status": [CrawlStatus.SUCCEEDED],
            "page": 0,
        })

        print("Crawls: ", result.results)
        print("Page: ", result.page)
        print("Total pages: ", result.totalPages)
        print("Count of items per page: ", result.limit)
        print("Total matched crawls: ", result.totalResults)

asyncio.run(main())
```
### Get crawl results
```python
import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # First start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })

        # Get crawl results
        limit = 50
        page = 0
        results = await client.get_crawl_results(crawl.id, page, limit)
        for result in results.results:
            print(f"URL: {result.url}")
            print(f"Title: {result.title}")
            print(f"Depth: {result.depthOfUrl}")

asyncio.run(main())
```
### Abort a crawl
```python
import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # First start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })

        # Abort the crawl
        aborted_crawl = await client.abort_crawl(crawl.id)
        print(f"Crawl aborted with ID: {aborted_crawl.id}")

asyncio.run(main())
```
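Because the async client is non-blocking, you can also start several crawls concurrently with `asyncio.gather`. A sketch built from the same calls shown above (the second URL is purely illustrative):

```python
import asyncio

from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # Launch both crawls at once and wait for both to be created
        crawls = await asyncio.gather(
            client.run_crawl({"startUrl": "https://example.com", "maxResults": 10}),
            client.run_crawl({"startUrl": "https://example.org", "maxResults": 10}),
        )
        for crawl in crawls:
            print(f"Started crawl {crawl.id} with status {crawl.status}")

asyncio.run(main())
```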
## Configuration Options
### StartCrawlPayload
The available crawl configuration options are listed below. Note that you must provide at least one of `maxResults`, `useSitemap`, or `entireWebsite`; a sitemap-only example follows the table.
| Property | Type | Required | Default | Description |
|--------|------|----------|---------|-------------|
| startUrl | string | ✅ | - | The URL to start crawling from |
| maxResults | number | ❌ | - | Maximum number of results to collect (1-5,000) |
| useSitemap | boolean | ❌ | false | Whether to use sitemap.xml to crawl the website |
| entireWebsite | boolean | ❌ | false | Whether to use sitemap.xml and all found links to crawl the website |
| maxDepth | number | ❌ | 10 | Maximum depth of pages to crawl (1-100) |
| includeLinks | boolean | ❌ | true | Whether to include links in the crawl results' markdown |
| excludeNonMainTags | boolean | ❌ | true | Whether to exclude non-main HTML tags (header, footer, aside, etc.) from the crawl results |
| timeout | number | ❌ | 60 | Timeout duration in minutes |
| engineType | string | ❌ | "auto" | The engine to use: "auto", "cheerio" (fast, static sites), "playwright" (dynamic sites) |
| useStaticIps | boolean | ❌ | false | Whether to use static IPs for the crawl |
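
For example, a sitemap-driven crawl can omit `maxResults` entirely, since `useSitemap` alone satisfies the at-least-one rule above. A sketch reusing the synchronous client:

```python
# Crawl the URLs listed in the site's sitemap.xml; no explicit result cap
crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "useSitemap": True
})
print(f"Sitemap crawl started: {crawl.id}")
```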
### Engine Types
Choose the appropriate engine for your crawling needs:
```python
from olyptik import EngineType
# Available engine types
EngineType.AUTO # Automatically choose the best engine
EngineType.PLAYWRIGHT # Use Playwright for JavaScript-heavy sites
EngineType.CHEERIO # Use Cheerio for faster, static content crawling
```
### Crawl Status
Monitor your crawl status using the `CrawlStatus` enum:
```python
from olyptik import CrawlStatus
# Possible status values
CrawlStatus.RUNNING # Crawl is currently running
CrawlStatus.SUCCEEDED # Crawl completed successfully
CrawlStatus.FAILED # Crawl failed due to an error
CrawlStatus.TIMED_OUT # Crawl exceeded timeout limit
CrawlStatus.ABORTED # Crawl was manually aborted
CrawlStatus.ERROR # Crawl encountered an error
```
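One way to use the enum is as a `query_crawls` filter, for example to check whether any crawls of a given start URL are still running. A sketch based on the query shown earlier:

```python
from olyptik import CrawlStatus

# Find crawls of this start URL that have not finished yet
running = client.query_crawls({
    "startUrls": ["https://example.com"],
    "status": [CrawlStatus.RUNNING],
    "page": 0,
})
print(f"{running.totalResults} crawl(s) still running")
```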
## Error Handling
The SDK raises exceptions for various error scenarios. Always wrap your calls in try/except blocks:
```python
from olyptik import Olyptik, ApiError
client = Olyptik(api_key="your_api_key_here")
try:
    crawl = client.run_crawl({
        "startUrl": "https://example.com",
        "maxResults": 10
    })
except ApiError as e:
    # API returned an error response
    print(f"API Error: {e.message}")
    print(f"Status Code: {e.status_code}")
```
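If transient failures are a concern, the same pattern extends to a simple retry loop with exponential backoff. A minimal sketch, not an SDK feature:

```python
import time

from olyptik import ApiError

# Retry up to three times, backing off 1s then 2s between attempts
for attempt in range(3):
    try:
        crawl = client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 10
        })
        break
    except ApiError:
        if attempt == 2:
            raise  # give up after the final attempt
        time.sleep(2 ** attempt)
```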
## Data Models
### CrawlResult
Each crawl result contains:
```python
@dataclass
class CrawlResult:
    crawlId: str  # Unique identifier for the crawl
    brandId: str  # Brand identifier
    url: str  # The crawled URL
    title: str  # Page title
    markdown: str  # Extracted content in markdown format
    depthOfUrl: int  # How deep this URL was in the crawl
    createdAt: str  # When the result was created
```
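Since each result carries its extracted markdown, persisting a crawl to disk is a short loop. A sketch using only the fields above and a `results` page from `get_crawl_results` as shown earlier (the filename scheme is our own):

```python
from pathlib import Path

# Write each page's markdown to crawl_output/<mangled-url>.md
out_dir = Path("crawl_output")
out_dir.mkdir(exist_ok=True)
for result in results.results:
    name = result.url.replace("https://", "").replace("/", "_") or "index"
    (out_dir / f"{name}.md").write_text(f"# {result.title}\n\n{result.markdown}")
```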
### Crawl
Crawl metadata includes:
```python
@dataclass
class Crawl:
    id: str  # Unique crawl identifier
    status: CrawlStatus  # Current status
    startUrls: List[str]  # Starting URLs
    includeLinks: bool  # Whether links are included
    maxDepth: int  # Maximum crawl depth
    maxResults: int  # Maximum number of results
    brandId: str  # Brand identifier
    createdAt: str  # Creation timestamp
    completedAt: Optional[str]  # Completion timestamp
    durationInSeconds: int  # Total duration
    numberOfResults: int  # Number of results found
    useSitemap: bool  # Whether the sitemap was used
    entireWebsite: bool  # Whether both the sitemap and all found links were used
    excludeNonMainTags: bool  # Whether non-main HTML tags were excluded
    timeout: int  # Timeout setting (minutes)
    useStaticIps: bool  # Whether static IPs were used
    engineType: EngineType  # Engine type used
```