scrapegraph-py


- Name: scrapegraph-py
- Version: 1.9.0
- Summary: ScrapeGraph Python SDK for API
- Upload time: 2025-01-08 12:02:23
- Requires Python: `<4.0,>=3.10`
- License: MIT
- Keywords: ai, api, artificial intelligence, gpt, graph, machine learning, natural language processing, nlp, openai, scraping, sdk, web scraping tool, webscraping

# 🌐 ScrapeGraph Python SDK

[![PyPI version](https://badge.fury.io/py/scrapegraph-py.svg)](https://badge.fury.io/py/scrapegraph-py)
[![Python Support](https://img.shields.io/pypi/pyversions/scrapegraph-py.svg)](https://pypi.org/project/scrapegraph-py/)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Documentation Status](https://readthedocs.org/projects/scrapegraph-py/badge/?version=latest)](https://docs.scrapegraphai.com) 

<p align="left">
  <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">
</p>

Official [Python SDK](https://scrapegraphai.com) for the ScrapeGraph API: smart web scraping powered by AI.

## 📦 Installation

```bash
pip install scrapegraph-py
```

## 🚀 Features

- 🤖 AI-powered web scraping
- 🔄 Both sync and async clients
- 📊 Structured output with Pydantic schemas
- 🔍 Detailed logging
- ⚡ Automatic retries
- 🔐 Secure authentication

## 🎯 Quick Start

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")
```

> [!NOTE]
> You can set the `SGAI_API_KEY` environment variable and initialize the client without parameters: `client = Client()`
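
A minimal sketch of the environment-variable route, using only the standard library. The helper name `load_api_key` is hypothetical and not part of the SDK; it only checks that `SGAI_API_KEY` is set and fails early with a clear message before any client is constructed.

```python
import os


def load_api_key(env_var: str = "SGAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of scrapegraph-py.
    """
    # Treat a missing variable and a whitespace-only value the same way.
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return api_key
```

With the variable exported in your shell, `Client()` should pick it up as the note describes; the explicit form `Client(api_key=load_api_key())` is an alternative when you want the failure to surface at startup.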

## 📚 Available Endpoints

### 🔍 SmartScraper

Scrapes any webpage using AI to extract specific information.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

# Basic usage
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description"
)

print(response)
```

<details>
<summary>Output Schema (Optional)</summary>

```python
from pydantic import BaseModel, Field
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

class WebsiteData(BaseModel):
    title: str = Field(description="The page title")
    description: str = Field(description="The meta description")

response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the title and description",
    output_schema=WebsiteData
)
```

</details>

### 📝 Markdownify

Converts any webpage into clean, formatted markdown.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

response = client.markdownify(
    website_url="https://example.com"
)

print(response)
```

### 💻 LocalScraper

Extracts information from HTML content using AI.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

html_content = """
<html>
    <body>
        <h1>Company Name</h1>
        <p>We are a technology company focused on AI solutions.</p>
        <div class="contact">
            <p>Email: contact@example.com</p>
        </div>
    </body>
</html>
"""

response = client.localscraper(
    user_prompt="Extract the company description",
    website_html=html_content
)

print(response)
```
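
The HTML will often live in a file rather than an inline string. The sketch below reads a file and forwards its contents to `localscraper` with the same keyword arguments as the example above. `scrape_html_file` is a hypothetical helper, not part of the SDK, and it assumes the file is UTF-8 encoded.

```python
from pathlib import Path
from typing import Any


def scrape_html_file(client: Any, path: str, user_prompt: str) -> Any:
    """Read an HTML file from disk and pass its contents to `client.localscraper`.

    `client` is expected to be a scrapegraph_py `Client`. This wrapper is a
    hypothetical convenience function, not part of the SDK.
    """
    # ASSUMPTION: the file is UTF-8; adjust the encoding for other sources.
    html_content = Path(path).read_text(encoding="utf-8")
    return client.localscraper(
        user_prompt=user_prompt,
        website_html=html_content,
    )
```

Taking the client as a parameter keeps the helper easy to test with a stand-in object and lets one client be reused across many files.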

## ⚡ Async Support

All endpoints support async operations:

```python
import asyncio
from scrapegraph_py import AsyncClient

async def main():
    async with AsyncClient() as client:
        response = await client.smartscraper(
            website_url="https://example.com",
            user_prompt="Extract the main content"
        )
        print(response)

asyncio.run(main())
```
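
The async client is most useful when several pages are scraped at once. The following is a sketch of that pattern with `asyncio.gather`; `scrape_many` is a hypothetical helper, not part of the SDK, and it expects an already-open `AsyncClient`. Whether the API rate-limits concurrent requests is not covered here, so keep batches small until you have checked.

```python
import asyncio
from typing import Any, Iterable


async def scrape_many(client: Any, urls: Iterable[str], user_prompt: str) -> list[Any]:
    """Send one smartscraper request per URL concurrently.

    Results come back in the same order as `urls`. A failed request is
    returned as its exception object instead of aborting the whole batch.
    """
    tasks = [
        client.smartscraper(website_url=url, user_prompt=user_prompt)
        for url in urls
    ]
    # return_exceptions=True keeps one bad URL from cancelling the others.
    return await asyncio.gather(*tasks, return_exceptions=True)
```

Inside `async with AsyncClient() as client:`, call `await scrape_many(client, urls, "Extract the main content")` and check each item with `isinstance(item, Exception)` before using it.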

## 📖 Documentation

For detailed documentation, visit [docs.scrapegraphai.com](https://docs.scrapegraphai.com).

## 🛠️ Development

For information about setting up the development environment and contributing to the project, see our [Contributing Guide](CONTRIBUTING.md).

## 💬 Support & Feedback

- 📧 Email: support@scrapegraphai.com
- 💻 GitHub Issues: [Create an issue](https://github.com/ScrapeGraphAI/scrapegraph-sdk/issues)
- 🌟 Feature Requests: [Request a feature](https://github.com/ScrapeGraphAI/scrapegraph-sdk/issues/new)
- ⭐ API Feedback: You can also submit feedback programmatically using the feedback endpoint:
  ```python
  from scrapegraph_py import Client

  client = Client(api_key="your-api-key-here")

  client.submit_feedback(
      request_id="your-request-id",
      rating=5,
      feedback_text="Great results!"
  )
  ```
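
The `request_id` passed to `submit_feedback` has to come from an earlier request. The sketch below assumes the response is a dict that carries it under a `"request_id"` key; this README does not document the response shape, so treat that key as an assumption and verify it against the API documentation. `rate_response` is a hypothetical helper, not part of the SDK.

```python
from typing import Any


def rate_response(client: Any, response: dict, rating: int, feedback_text: str) -> Any:
    """Submit feedback for a previous response.

    ASSUMPTION: `response` is a dict with the request ID stored under
    "request_id". Verify this against the API documentation before use.
    """
    request_id = response.get("request_id")
    if not request_id:
        raise ValueError("response has no 'request_id'; cannot submit feedback")
    return client.submit_feedback(
        request_id=request_id,
        rating=rating,
        feedback_text=feedback_text,
    )
```

A typical call would be `rate_response(client, response, 5, "Great results!")` right after a `smartscraper` request.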

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🔗 Links

- [Website](https://scrapegraphai.com)
- [Documentation](https://docs.scrapegraphai.com) 
- [GitHub](https://github.com/ScrapeGraphAI/scrapegraph-sdk)

---

Made with ❤️ by [ScrapeGraph AI](https://scrapegraphai.com)

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "scrapegraph-py",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": null,
    "keywords": "ai, api, artificial intelligence, gpt, graph, machine learning, natural language processing, nlp, openai, scraping, sdk, web scraping tool, webscraping",
    "author": null,
    "author_email": "Marco Vinciguerra <mvincig11@gmail.com>, Marco Perini <perinim.98@gmail.com>, Lorenzo Padoan <lorenzo.padoan977@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/6a/58/90176999dd08572d8b6dd783f4537d222006217877d4e6cb0a01b2d031b1/scrapegraph_py-1.9.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "ScrapeGraph Python SDK for API",
    "version": "1.9.0",
    "project_urls": null,
    "split_keywords": [
        "ai",
        " api",
        " artificial intelligence",
        " gpt",
        " graph",
        " machine learning",
        " natural language processing",
        " nlp",
        " openai",
        " scraping",
        " sdk",
        " web scraping tool",
        " webscraping"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "42d367ac27ae8af5ca0f4e0bb10068a27e074dfc216cf0f5d145cb041469b168",
                "md5": "d263e00dd80fc4ed5238ef87bf2f8b0c",
                "sha256": "33fd727db8c1a83736b1e790fb416c508f4fd192034b5f22f80c50fdf99a9c4f"
            },
            "downloads": -1,
            "filename": "scrapegraph_py-1.9.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d263e00dd80fc4ed5238ef87bf2f8b0c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.10",
            "size": 14520,
            "upload_time": "2025-01-08T12:02:22",
            "upload_time_iso_8601": "2025-01-08T12:02:22.137214Z",
            "url": "https://files.pythonhosted.org/packages/42/d3/67ac27ae8af5ca0f4e0bb10068a27e074dfc216cf0f5d145cb041469b168/scrapegraph_py-1.9.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6a5890176999dd08572d8b6dd783f4537d222006217877d4e6cb0a01b2d031b1",
                "md5": "87eac278834cf8bb3b93b09ab0488577",
                "sha256": "2d5fd0c457037541d646a2a4aaa46d2d8d20b6fba39c0c32f678618e6aac3899"
            },
            "downloads": -1,
            "filename": "scrapegraph_py-1.9.0.tar.gz",
            "has_sig": false,
            "md5_digest": "87eac278834cf8bb3b93b09ab0488577",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 110214,
            "upload_time": "2025-01-08T12:02:23",
            "upload_time_iso_8601": "2025-01-08T12:02:23.047745Z",
            "url": "https://files.pythonhosted.org/packages/6a/58/90176999dd08572d8b6dd783f4537d222006217877d4e6cb0a01b2d031b1/scrapegraph_py-1.9.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-08 12:02:23",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "scrapegraph-py"
}
        