scrapegraph-py


- Name: scrapegraph-py
- Version: 1.14.1
- Summary: ScrapeGraph Python SDK for API
- Upload time: 2025-07-08 14:48:00
- Requires Python: `<4.0,>=3.10`
- License: MIT
- Keywords: ai, api, artificial intelligence, gpt, graph, machine learning, natural language processing, nlp, openai, scraping, sdk, web scraping tool, webscraping

# 🌐 ScrapeGraph Python SDK

[![PyPI version](https://badge.fury.io/py/scrapegraph-py.svg)](https://badge.fury.io/py/scrapegraph-py)
[![Python Support](https://img.shields.io/pypi/pyversions/scrapegraph-py.svg)](https://pypi.org/project/scrapegraph-py/)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Documentation Status](https://readthedocs.org/projects/scrapegraph-py/badge/?version=latest)](https://docs.scrapegraphai.com)

<p align="left">
  <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">
</p>

Official [Python SDK](https://scrapegraphai.com) for the ScrapeGraph API: smart web scraping powered by AI.

## 📦 Installation

```bash
pip install scrapegraph-py
```

## 🚀 Features

- 🤖 AI-powered web scraping and search
- 🔄 Both sync and async clients
- 📊 Structured output with Pydantic schemas
- 🔍 Detailed logging
- ⚡ Automatic retries
- 🔐 Secure authentication

## 🎯 Quick Start

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")
```

> [!NOTE]
> You can set the `SGAI_API_KEY` environment variable and initialize the client without parameters: `client = Client()`
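
When relying on the environment variable, it can help to check that it is actually set before constructing the client, so a missing key produces a readable error. The following is a minimal sketch under that assumption: `require_api_key` and `make_client` are hypothetical helper names, not part of `scrapegraph_py`, and only `Client()` and `SGAI_API_KEY` come from this README.

```python
import os

ENV_VAR = "SGAI_API_KEY"  # variable name taken from the note above


def require_api_key(env=os.environ):
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper, not part of scrapegraph_py.
    """
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before creating a client")
    return key


def make_client():
    """Create a Client that picks up the key from the environment."""
    require_api_key()  # fail early with a readable message
    from scrapegraph_py import Client  # imported lazily so the check runs first

    return Client()
```

Passing a plain dict as `env` makes the check easy to exercise in tests without touching the real environment.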

## 📚 Available Endpoints

### 🤖 SmartScraper

Extract structured data from any webpage or HTML content using AI.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

# Using a URL
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description"
)

# Or using HTML content
html_content = """
<html>
    <body>
        <h1>Company Name</h1>
        <p>We are a technology company focused on AI solutions.</p>
    </body>
</html>
"""

response = client.smartscraper(
    website_html=html_content,
    user_prompt="Extract the company description"
)

print(response)
```

<details>
<summary>Output Schema (Optional)</summary>

```python
from pydantic import BaseModel, Field
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

class WebsiteData(BaseModel):
    title: str = Field(description="The page title")
    description: str = Field(description="The meta description")

response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the title and description",
    output_schema=WebsiteData
)
```

</details>
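
Because SmartScraper accepts either `website_url` or `website_html`, calling code that handles both kinds of input may want to pick the right keyword automatically. The sketch below shows one way to do that; `smartscraper_kwargs` is a hypothetical helper, and the "starts with http:// or https://" test is an assumption about how you distinguish the two inputs, not SDK behavior.

```python
def smartscraper_kwargs(source: str, user_prompt: str) -> dict:
    """Build keyword arguments for client.smartscraper (hypothetical helper).

    Sources that look like http(s) URLs are passed as website_url;
    anything else is treated as raw HTML and passed as website_html.
    """
    is_url = source.lstrip().lower().startswith(("http://", "https://"))
    if is_url:
        return {"website_url": source.strip(), "user_prompt": user_prompt}
    return {"website_html": source, "user_prompt": user_prompt}


# Usage with a real client (requires an API key):
# response = client.smartscraper(
#     **smartscraper_kwargs("https://example.com", "Extract the main heading")
# )
```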

### 🔍 SearchScraper

Perform AI-powered web searches with structured results and reference URLs.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

response = client.searchscraper(
    user_prompt="What is the latest version of Python and its main features?"
)

print(f"Answer: {response['result']}")
print(f"Sources: {response['reference_urls']}")
```

<details>
<summary>Output Schema (Optional)</summary>

```python
from pydantic import BaseModel, Field
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

class PythonVersionInfo(BaseModel):
    version: str = Field(description="The latest Python version number")
    release_date: str = Field(description="When this version was released")
    major_features: list[str] = Field(description="List of main features")

response = client.searchscraper(
    user_prompt="What is the latest version of Python and its main features?",
    output_schema=PythonVersionInfo
)
```

</details>
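
The SearchScraper example above reads `response['result']` and `response['reference_urls']` directly. As a small sketch, assuming the response is a plain dict with those two keys as shown, the hypothetical helper below formats the answer with numbered sources and tolerates a missing `reference_urls` key.

```python
def format_search_response(response: dict) -> str:
    """Render a SearchScraper response as text (hypothetical helper).

    Assumes the 'result' and 'reference_urls' keys used in the example above.
    """
    result = response.get("result")
    sources = response.get("reference_urls") or []  # treat missing/None as empty
    lines = [f"Answer: {result}"]
    lines += [f"  [{i}] {url}" for i, url in enumerate(sources, start=1)]
    return "\n".join(lines)


# Usage with a real client (requires an API key):
# print(format_search_response(client.searchscraper(user_prompt="...")))
```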

### 📝 Markdownify

Convert any webpage into clean, formatted Markdown.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

response = client.markdownify(
    website_url="https://example.com"
)

print(response)
```

## ⚡ Async Support

All endpoints support async operations:

```python
import asyncio
from scrapegraph_py import AsyncClient

async def main():
    async with AsyncClient() as client:
        response = await client.smartscraper(
            website_url="https://example.com",
            user_prompt="Extract the main content"
        )
        print(response)

asyncio.run(main())
```
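
One common reason to use the async client is to scrape several pages concurrently. The sketch below shows one way this could look with `asyncio.gather` and a semaphore that caps the number of in-flight requests. `scrape_many` and its `limit` parameter are hypothetical; the only SDK call assumed is the `smartscraper(website_url=..., user_prompt=...)` coroutine shown above.

```python
import asyncio


async def scrape_many(client, urls, user_prompt, limit=3):
    """Call client.smartscraper for each URL, at most `limit` at a time.

    Hypothetical helper. Returns a dict mapping each URL to its response,
    or to the exception raised for that URL.
    """
    semaphore = asyncio.Semaphore(limit)

    async def scrape_one(url):
        async with semaphore:
            try:
                return await client.smartscraper(
                    website_url=url, user_prompt=user_prompt
                )
            except Exception as exc:  # keep one failure from aborting the batch
                return exc

    results = await asyncio.gather(*(scrape_one(url) for url in urls))
    return dict(zip(urls, results))


# Usage with the real client (requires an API key):
# async with AsyncClient() as client:
#     results = await scrape_many(client, ["https://example.com"], "Extract the title")
```

Returning exceptions as values keeps the successful responses available even when some URLs fail; callers can check each value with `isinstance(value, Exception)`.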

## 📖 Documentation

For detailed documentation, visit [docs.scrapegraphai.com](https://docs.scrapegraphai.com).

## 🛠️ Development

For information about setting up the development environment and contributing to the project, see our [Contributing Guide](CONTRIBUTING.md).

## 💬 Support & Feedback

- 📧 Email: support@scrapegraphai.com
- 💻 GitHub Issues: [Create an issue](https://github.com/ScrapeGraphAI/scrapegraph-sdk/issues)
- 🌟 Feature Requests: [Request a feature](https://github.com/ScrapeGraphAI/scrapegraph-sdk/issues/new)
- ⭐ API Feedback: You can also submit feedback programmatically using the feedback endpoint:
  ```python
  from scrapegraph_py import Client

  client = Client(api_key="your-api-key-here")

  client.submit_feedback(
      request_id="your-request-id",
      rating=5,
      feedback_text="Great results!"
  )
  ```
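
The feedback call needs the ID of the request being rated. A minimal sketch of wiring that up is shown below, assuming each response dict carries a `request_id` key; that key name is an assumption not stated in this README, so verify it against your actual responses. `rate_response` is a hypothetical helper.

```python
def rate_response(client, response: dict, rating: int, feedback_text: str = ""):
    """Submit feedback for the request that produced `response`.

    Hypothetical helper; assumes the response dict has a 'request_id' key.
    """
    request_id = response.get("request_id")  # assumed key, check your responses
    if not request_id:
        raise ValueError("response has no 'request_id'; cannot attach feedback")
    return client.submit_feedback(
        request_id=request_id,
        rating=rating,
        feedback_text=feedback_text,
    )


# Usage with a real client (requires an API key):
# response = client.markdownify(website_url="https://example.com")
# rate_response(client, response, rating=5, feedback_text="Great results!")
```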

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🔗 Links

- [Website](https://scrapegraphai.com)
- [Documentation](https://docs.scrapegraphai.com)
- [GitHub](https://github.com/ScrapeGraphAI/scrapegraph-sdk)

---

Made with ❤️ by [ScrapeGraph AI](https://scrapegraphai.com)

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "scrapegraph-py",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": null,
    "keywords": "ai, api, artificial intelligence, gpt, graph, machine learning, natural language processing, nlp, openai, scraping, sdk, web scraping tool, webscraping",
    "author": null,
    "author_email": "Marco Vinciguerra <mvincig11@gmail.com>, perinim.98@gmail.com, Lorenzo Padoan <lorenzo.padoan977@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/44/69/c4b971bfe32a48fcc2dd9363fdc07c87481fb7d4147c5d6ded1b915baaf0/scrapegraph_py-1.14.1.tar.gz",
    "platform": null,
    "description": "# \ud83c\udf10 ScrapeGraph Python SDK\n\n[![PyPI version](https://badge.fury.io/py/scrapegraph-py.svg)](https://badge.fury.io/py/scrapegraph-py)\n[![Python Support](https://img.shields.io/pypi/pyversions/scrapegraph-py.svg)](https://pypi.org/project/scrapegraph-py/)\n[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![Documentation Status](https://readthedocs.org/projects/scrapegraph-py/badge/?version=latest)](https://docs.scrapegraphai.com)\n\n<p align=\"left\">\n  <img src=\"https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png\" alt=\"ScrapeGraph API Banner\" style=\"width: 70%;\">\n</p>\n\nOfficial [Python SDK ](https://scrapegraphai.com) for the ScrapeGraph API - Smart web scraping powered by AI.\n\n## \ud83d\udce6 Installation\n\n```bash\npip install scrapegraph-py\n```\n\n## \ud83d\ude80 Features\n\n- \ud83e\udd16 AI-powered web scraping and search\n- \ud83d\udd04 Both sync and async clients\n- \ud83d\udcca Structured output with Pydantic schemas\n- \ud83d\udd0d Detailed logging\n- \u26a1 Automatic retries\n- \ud83d\udd10 Secure authentication\n\n## \ud83c\udfaf Quick Start\n\n```python\nfrom scrapegraph_py import Client\n\nclient = Client(api_key=\"your-api-key-here\")\n```\n\n> [!NOTE]\n> You can set the `SGAI_API_KEY` environment variable and initialize the client without parameters: `client = Client()`\n\n## \ud83d\udcda Available Endpoints\n\n### \ud83e\udd16 SmartScraper\n\nExtract structured data from any webpage or HTML content using AI.\n\n```python\nfrom scrapegraph_py import Client\n\nclient = Client(api_key=\"your-api-key-here\")\n\n# Using a URL\nresponse = client.smartscraper(\n    website_url=\"https://example.com\",\n    user_prompt=\"Extract the main heading and description\"\n)\n\n# Or using HTML content\nhtml_content = 
\"\"\"\n<html>\n    <body>\n        <h1>Company Name</h1>\n        <p>We are a technology company focused on AI solutions.</p>\n    </body>\n</html>\n\"\"\"\n\nresponse = client.smartscraper(\n    website_html=html_content,\n    user_prompt=\"Extract the company description\"\n)\n\nprint(response)\n```\n\n<details>\n<summary>Output Schema (Optional)</summary>\n\n```python\nfrom pydantic import BaseModel, Field\nfrom scrapegraph_py import Client\n\nclient = Client(api_key=\"your-api-key-here\")\n\nclass WebsiteData(BaseModel):\n    title: str = Field(description=\"The page title\")\n    description: str = Field(description=\"The meta description\")\n\nresponse = client.smartscraper(\n    website_url=\"https://example.com\",\n    user_prompt=\"Extract the title and description\",\n    output_schema=WebsiteData\n)\n```\n\n</details>\n\n### \ud83d\udd0d SearchScraper\n\nPerform AI-powered web searches with structured results and reference URLs.\n\n```python\nfrom scrapegraph_py import Client\n\nclient = Client(api_key=\"your-api-key-here\")\n\nresponse = client.searchscraper(\n    user_prompt=\"What is the latest version of Python and its main features?\"\n)\n\nprint(f\"Answer: {response['result']}\")\nprint(f\"Sources: {response['reference_urls']}\")\n```\n\n<details>\n<summary>Output Schema (Optional)</summary>\n\n```python\nfrom pydantic import BaseModel, Field\nfrom scrapegraph_py import Client\n\nclient = Client(api_key=\"your-api-key-here\")\n\nclass PythonVersionInfo(BaseModel):\n    version: str = Field(description=\"The latest Python version number\")\n    release_date: str = Field(description=\"When this version was released\")\n    major_features: list[str] = Field(description=\"List of main features\")\n\nresponse = client.searchscraper(\n    user_prompt=\"What is the latest version of Python and its main features?\",\n    output_schema=PythonVersionInfo\n)\n```\n\n</details>\n\n### \ud83d\udcdd Markdownify\n\nConverts any webpage into clean, formatted 
markdown.\n\n```python\nfrom scrapegraph_py import Client\n\nclient = Client(api_key=\"your-api-key-here\")\n\nresponse = client.markdownify(\n    website_url=\"https://example.com\"\n)\n\nprint(response)\n```\n\n## \u26a1 Async Support\n\nAll endpoints support async operations:\n\n```python\nimport asyncio\nfrom scrapegraph_py import AsyncClient\n\nasync def main():\n    async with AsyncClient() as client:\n        response = await client.smartscraper(\n            website_url=\"https://example.com\",\n            user_prompt=\"Extract the main content\"\n        )\n        print(response)\n\nasyncio.run(main())\n```\n\n## \ud83d\udcd6 Documentation\n\nFor detailed documentation, visit [docs.scrapegraphai.com](https://docs.scrapegraphai.com)\n\n## \ud83d\udee0\ufe0f Development\n\nFor information about setting up the development environment and contributing to the project, see our [Contributing Guide](CONTRIBUTING.md).\n\n## \ud83d\udcac Support & Feedback\n\n- \ud83d\udce7 Email: support@scrapegraphai.com\n- \ud83d\udcbb GitHub Issues: [Create an issue](https://github.com/ScrapeGraphAI/scrapegraph-sdk/issues)\n- \ud83c\udf1f Feature Requests: [Request a feature](https://github.com/ScrapeGraphAI/scrapegraph-sdk/issues/new)\n- \u2b50 API Feedback: You can also submit feedback programmatically using the feedback endpoint:\n  ```python\n  from scrapegraph_py import Client\n\n  client = Client(api_key=\"your-api-key-here\")\n\n  client.submit_feedback(\n      request_id=\"your-request-id\",\n      rating=5,\n      feedback_text=\"Great results!\"\n  )\n  ```\n\n## \ud83d\udcc4 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## \ud83d\udd17 Links\n\n- [Website](https://scrapegraphai.com)\n- [Documentation](https://docs.scrapegraphai.com)\n- [GitHub](https://github.com/ScrapeGraphAI/scrapegraph-sdk)\n\n---\n\nMade with \u2764\ufe0f by [ScrapeGraph AI](https://scrapegraphai.com)\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "ScrapeGraph Python SDK for API",
    "version": "1.14.1",
    "project_urls": null,
    "split_keywords": [
        "ai",
        " api",
        " artificial intelligence",
        " gpt",
        " graph",
        " machine learning",
        " natural language processing",
        " nlp",
        " openai",
        " scraping",
        " sdk",
        " web scraping tool",
        " webscraping"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9d936f0713d285c0bb61f03fbb27b226846aa469ff4ff32528fcb983846b254a",
                "md5": "d368ed8678c84a031b96be05da781e75",
                "sha256": "04d4799a756c840d89ed8ca4e627b2c9d7c1b07a90d1b9120361eb74a96c0abf"
            },
            "downloads": -1,
            "filename": "scrapegraph_py-1.14.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d368ed8678c84a031b96be05da781e75",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.10",
            "size": 18059,
            "upload_time": "2025-07-08T14:48:00",
            "upload_time_iso_8601": "2025-07-08T14:48:00.042177Z",
            "url": "https://files.pythonhosted.org/packages/9d/93/6f0713d285c0bb61f03fbb27b226846aa469ff4ff32528fcb983846b254a/scrapegraph_py-1.14.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4469c4b971bfe32a48fcc2dd9363fdc07c87481fb7d4147c5d6ded1b915baaf0",
                "md5": "bc8c35a0c28a03970f538dcd653277ce",
                "sha256": "0b8b5b2877648fbdde27387dece06401375cf3c2c88552ad170efd39bf9355dc"
            },
            "downloads": -1,
            "filename": "scrapegraph_py-1.14.1.tar.gz",
            "has_sig": false,
            "md5_digest": "bc8c35a0c28a03970f538dcd653277ce",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 148957,
            "upload_time": "2025-07-08T14:48:00",
            "upload_time_iso_8601": "2025-07-08T14:48:00.980562Z",
            "url": "https://files.pythonhosted.org/packages/44/69/c4b971bfe32a48fcc2dd9363fdc07c87481fb7d4147c5d6ded1b915baaf0/scrapegraph_py-1.14.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-08 14:48:00",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "scrapegraph-py"
}
        