# LlamaIndex Tool - Scrapegraph
This tool integrates [Scrapegraph](https://scrapegraphai.com) with LlamaIndex, providing intelligent web scraping capabilities with structured data extraction.
## Installation
```bash
pip install llama-index-tools-scrapegraphai
```
## Usage
First, import and initialize the ScrapegraphToolSpec:
```python
from llama_index.tools.scrapegraph import ScrapegraphToolSpec
scrapegraph_tool = ScrapegraphToolSpec()
```
### Available Functions
The tool provides the following capabilities:
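1. **Smart Scraper**
Extract structured data from a webpage using a natural-language prompt and an optional Pydantic schema: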
```python
from pydantic import BaseModel


# Define your schema (optional)
class ProductSchema(BaseModel):
    name: str
    price: float
    description: str


schema = [ProductSchema]

# Perform the scraping
result = scrapegraph_tool.scrapegraph_smartscraper(
    prompt="Extract product information",
    url="https://example.com/product",
    api_key="your-api-key",
    schema=schema,  # Optional
)
```
2. **Markdownify**
Convert webpage content to markdown format:
```python
markdown_content = scrapegraph_tool.scrapegraph_markdownify(
    url="https://example.com", api_key="your-api-key"
)
```
3. **Local Scrape**
Extract structured data from raw text:
```python
text = """
Your raw text content here...
"""
structured_data = scrapegraph_tool.scrapegraph_local_scrape(
    text=text, api_key="your-api-key"
)
```
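All three calls above take an `api_key` argument, shown as a hard-coded placeholder. A minimal sketch of reading the key from the environment instead is given below. The helper name `load_api_key` and the variable name `SGAI_API_KEY` are assumptions made for this illustration, not part of the tool's API.

```python
import os


def load_api_key(env_var: str = "SGAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing.

    Both this helper and the default variable name are hypothetical;
    use whatever name your deployment defines.
    """
    # Treat an unset variable and a blank value the same way.
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Scrapegraph API key"
        )
    return api_key
```

The returned value can then be passed as `api_key=load_api_key()` to any of the calls shown above, which keeps the secret out of source control.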
## Requirements
- Python 3.10 or later, below 4.0 (per the package's `requires_python` metadata)
- `scrapegraph-py` package
- Valid Scrapegraph API key
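The requirements above can be checked before first use with a short script. The following is a sketch only: the function name `check_requirements` is hypothetical, the import name `scrapegraph_py` for the `scrapegraph-py` package is an assumption, and the minimum Python version is taken from the package metadata (`requires_python` of `<4.0,>=3.10`).

```python
import importlib.util
import sys

# Taken from the package metadata: requires_python "<4.0,>=3.10".
MIN_PYTHON = (3, 10)


def check_requirements(module_name: str = "scrapegraph_py") -> dict:
    """Report whether the interpreter and the client package look usable.

    `module_name` is assumed to be the import name of `scrapegraph-py`.
    """
    return {
        # True when the running interpreter is within the supported range.
        "python_ok": MIN_PYTHON <= sys.version_info[:2] < (4, 0),
        # True when the module can be located without importing it.
        "package_ok": importlib.util.find_spec(module_name) is not None,
    }
```

Calling `check_requirements()` returns a dictionary with two booleans. It does not verify the API key, which can only be confirmed by making a request.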
Raw data
{
"_id": null,
"home_page": null,
"name": "llama-index-tools-scrapegraphai",
"maintainer": "Vincigit00",
"docs_url": null,
"requires_python": "<4.0,>=3.10",
"maintainer_email": null,
"keywords": "scraping",
"author": "Marco Vinciguerra",
"author_email": "mvincig11@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/7f/6f/edd41438f189f4f05d3870177f26f4179823afd4152537955c58338574b1/llama_index_tools_scrapegraphai-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "llama-index tools integrating ScrapegraphAI",
"version": "0.1.1",
"project_urls": null,
"split_keywords": [
"scraping"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a3c039942ae68c106223376d7942dd290dd89692c1a2412d3e1ab2b6ada4aeaf",
"md5": "552ea25dbc4d40e0a57cfe8345492f57",
"sha256": "21c7fe1339585ea5367db88e0093a513aca97c470cb9163af6549e9ee25ec073"
},
"downloads": -1,
"filename": "llama_index_tools_scrapegraphai-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "552ea25dbc4d40e0a57cfe8345492f57",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.10",
"size": 2908,
"upload_time": "2025-02-05T20:59:29",
"upload_time_iso_8601": "2025-02-05T20:59:29.267133Z",
"url": "https://files.pythonhosted.org/packages/a3/c0/39942ae68c106223376d7942dd290dd89692c1a2412d3e1ab2b6ada4aeaf/llama_index_tools_scrapegraphai-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7f6fedd41438f189f4f05d3870177f26f4179823afd4152537955c58338574b1",
"md5": "0e04435ecb62d89844c769536e53c425",
"sha256": "64840de80e134fc9091f1daa9ec744dd897707c0a3325d3186a134f5ec87f30a"
},
"downloads": -1,
"filename": "llama_index_tools_scrapegraphai-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "0e04435ecb62d89844c769536e53c425",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.10",
"size": 2573,
"upload_time": "2025-02-05T20:59:30",
"upload_time_iso_8601": "2025-02-05T20:59:30.668933Z",
"url": "https://files.pythonhosted.org/packages/7f/6f/edd41438f189f4f05d3870177f26f4179823afd4152537955c58338574b1/llama_index_tools_scrapegraphai-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-05 20:59:30",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "llama-index-tools-scrapegraphai"
}