langchain-scrapeless

- Name: langchain-scrapeless
- Version: 0.1.3
- Summary: An integration package connecting Scrapeless and LangChain
- Author: Scrapeless Team
- License: MIT
- Requires Python: <4.0,>=3.9
- Keywords: scrapeless, langchain, integration, universal scraping api, scraping, crawl
- Repository: https://github.com/scrapeless-ai/langchain-scrapeless
- Uploaded: 2025-07-17 07:16:32
            <div align="center">
  <img width="180" src="https://app.scrapeless.com/assets/logo.svg" alt="Scrapeless logo">

LangChain Scrapeless: an all-in-one, highly scalable web scraping toolkit for enterprises and developers that also integrates with LangChain’s AI tools. Maintained by [Scrapeless](https://app.scrapeless.com/passport/login?utm_source=github&utm_medium=langchain-scrapeless&utm_campaign=langchain-scrapeless).

[Scrapeless](https://app.scrapeless.com/passport/login?utm_source=github&utm_medium=langchain-scrapeless&utm_campaign=langchain-scrapeless) | [Documentation](https://docs.scrapeless.com) | [LangChain](https://langchain.com)

</div>

---

**langchain-scrapeless** is designed for seamless integration with LangChain, enabling you to:

- Run custom scraping tasks using your own crawlers or scraping logic.
- Automate data extraction and processing workflows in Python.
- Manage and interact with datasets produced by your scraping jobs.
- Access scraping and data handling capabilities as LangChain tools, making them easy to compose with LLM-powered chains and agents.

## πŸ“¦ Installation

```bash
pip install langchain-scrapeless
```

## βœ… Prerequisites

Configure your Scrapeless API credentials as environment variables:

- `SCRAPELESS_API_KEY`: Your Scrapeless API key.

If you don't have an API key, you can register [here](https://app.scrapeless.com/passport/register?utm_source=github&utm_medium=langchain-scrapeless&utm_campaign=langchain-scrapeless) and learn how to get your API key from the [Scrapeless documentation](https://docs.scrapeless.com/en/sdk/node-sdk/#quick-start).
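For example, the key can be set from within Python before any tool is constructed (a minimal sketch; `"your-api-key"` is a placeholder, not a real key):

```python
import os

# Set the Scrapeless API key before importing or constructing any tools.
# Replace the placeholder with your real key, or export
# SCRAPELESS_API_KEY in your shell instead.
os.environ["SCRAPELESS_API_KEY"] = "your-api-key"
```

Setting the variable in the shell (e.g. `export SCRAPELESS_API_KEY=...`) works equally well and keeps the key out of your source code.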

## πŸ› οΈ Available Tools

### πŸ” DeepSerp

#### 🌐 ScrapelessDeepSerpGoogleSearchTool

Perform Google search queries and get the results.

```python
from langchain_scrapeless import ScrapelessDeepSerpGoogleSearchTool

tool = ScrapelessDeepSerpGoogleSearchTool()

# Basic usage
# result = tool.invoke("I want to know Scrapeless")
# print(result)

# Advanced usage
result = tool.invoke({
    "q": "Scrapeless",
    "hl": "en",
    "google_domain": "google.com"
})
print(result)

# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessDeepSerpGoogleSearchTool
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI()

tool = ScrapelessDeepSerpGoogleSearchTool()

# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)

for chunk in agent.stream(
        {"messages": [("human", "I want to what is Scrapeless")]},
        stream_mode="values"
):
    chunk["messages"][-1].pretty_print()
```

You can visit [here](https://apidocs.scrapeless.com/doc-800321) to learn more about customization options.

#### 🌐 ScrapelessDeepSerpGoogleTrendsTool

Perform Google trends queries and get the results.

```python
from langchain_scrapeless import ScrapelessDeepSerpGoogleTrendsTool

tool = ScrapelessDeepSerpGoogleTrendsTool()

# Basic usage
# result = tool.invoke("Funny 2048,negamon monster trainer")
# print(result)

# Advanced usage
result = tool.invoke({
    "q": "Scrapeless",
    "data_type": "related_topics",
    "hl": "en"
})
print(result)

# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessDeepSerpGoogleTrendsTool
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI()

tool = ScrapelessDeepSerpGoogleTrendsTool()

# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)

for chunk in agent.stream(
        {"messages": [("human", "I want to know the iphone keyword trends")]},
        stream_mode="values"
):
    chunk["messages"][-1].pretty_print()
```

You can visit [here](https://apidocs.scrapeless.com/doc-796980) to learn more about customization options.

### πŸ”“ ScrapelessUniversalScrapingTool

Access any website at scale and say goodbye to blocks.

```python
from langchain_scrapeless import ScrapelessUniversalScrapingTool

tool = ScrapelessUniversalScrapingTool()

# Basic usage
# result = tool.invoke("https://example.com")
# print(result)

# Advanced usage
result = tool.invoke({
    "url": "https://exmaple.com",
    "response_type": "markdown"
})
print(result)

# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessUniversalScrapingTool
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI()

tool = ScrapelessUniversalScrapingTool()

# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)

for chunk in agent.stream(
        {"messages": [("human", "Use the scrapeless scraping tool to fetch https://www.scrapeless.com/en and extract the h1 tag.")]},
        stream_mode="values"
):
    chunk["messages"][-1].pretty_print()
```

You can visit [here](https://apidocs.scrapeless.com/api-12948840) to learn more about customization options.

### πŸ•·οΈ Crawler

#### 🌐 ScrapelessCrawlerCrawlTool

Crawl a website and its linked pages to extract comprehensive data.

```python
from langchain_scrapeless import ScrapelessCrawlerCrawlTool

tool = ScrapelessCrawlerCrawlTool()

# Basic
# result = tool.invoke("https://example.com")
# print(result)

# Advanced usage
result = tool.invoke({
    "url": "https://exmaple.com",
    "limit": 4
})
print(result)

# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessCrawlerCrawlTool
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI()

tool = ScrapelessCrawlerCrawlTool()

# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)

for chunk in agent.stream(
        {"messages": [("human", "Use the scrapeless crawler crawl tool to crawl the website https://example.com and output the markdown content as a string.")]},
        stream_mode="values"
):
    chunk["messages"][-1].pretty_print()
```

You can visit [here](https://apidocs.scrapeless.com/api-17509010) to learn more about customization options.

#### 🌐 ScrapelessCrawlerScrapeTool

Extract data from one or more webpages.

```python
from langchain_scrapeless import ScrapelessCrawlerScrapeTool

tool = ScrapelessCrawlerScrapeTool()

result = tool.invoke({
    "urls": ["https://exmaple.com", "https://www.scrapeless.com/en"],
    "formats": ["markdown"]
})
print(result)

# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessCrawlerScrapeTool
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI()

tool = ScrapelessCrawlerScrapeTool()

# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)

for chunk in agent.stream(
        {"messages": [("human", "Use the scrapeless crawler scrape tool to get the website content of https://example.com and output the html content as a string.")]},
        stream_mode="values"
):
    chunk["messages"][-1].pretty_print()
```


            
