| Field | Value |
| --- | --- |
| Name | snakyscraper |
| Version | 1.0.0 |
| home_page | https://github.com/riodevnet/snakyscraper |
| Summary | SnakyScraper is a lightweight and Pythonic web scraping toolkit built on top of BeautifulSoup and Requests. It provides an elegant interface for extracting structured HTML and metadata from websites with clean, direct outputs. |
| upload_time | 2025-08-15 08:01:07 |
| maintainer | None |
| docs_url | None |
| author | Rio Dev |
| requires_python | >=3.6 |
| license | MIT |
| keywords | snakyscraper, scraping, scraper |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# 🐍 SnakyScraper
**SnakyScraper** is a lightweight and Pythonic web scraping toolkit built on top of BeautifulSoup and Requests. It provides an elegant interface for extracting structured HTML and metadata from websites with clean, direct outputs.
> Fast. Accurate. Snake-style scraping. 🐍🎯
---
## 🚀 Features

- ✅ Extract metadata: title, description, keywords, author, and more
- ✅ Built-in support for Open Graph, Twitter Card, canonical, and CSRF tags
- ✅ Extract HTML structures: `h1`–`h6`, `p`, `ul`, `ol`, `img`, links
- ✅ Powerful `filter()` method with class, ID, and tag-based selectors
- ✅ `return_html` toggle to return clean text or raw HTML
- ✅ Simple return values: string, list, or dictionary
- ✅ Powered by BeautifulSoup4 and Requests
---
## 📦 Installation
```bash
pip install snakyscraper
```
> Requires Python 3.7 or later
---
## 🛠️ Basic Usage
```python
from snakyscraper import SnakyScraper

scraper = SnakyScraper("https://example.com")

# Get the page title
print(scraper.title())  # "Welcome to Example.com"

# Get meta description
print(scraper.description())  # "This is the example meta description."

# Get all <h1> elements
print(scraper.h1())  # ["Welcome", "Latest News"]

# Extract Open Graph metadata
print(scraper.open_graph())  # {"og:title": "...", "og:description": "...", ...}

# Custom filter: find all div.card elements and extract child tags
print(scraper.filter(
    element="div",
    attributes={"class": "card"},
    multiple=True,
    extract=["h1", "p", ".title", "#desc"]
))
```
---
## 🧪 Available Methods

### 🔹 Page Metadata
```python
scraper.title()
scraper.description()
scraper.keywords()
scraper.keyword_string()
scraper.charset()
scraper.canonical()
scraper.content_type()
scraper.author()
scraper.csrf_token()
scraper.image()
```
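Two similarly named helpers appear in this list; presumably `keywords()` parses the meta keywords into a list while `keyword_string()` returns the raw attribute value. An illustrative (unverified) sketch:

```python
# Assumed distinction, not documented above: keywords() parses the
# <meta name="keywords"> content into a list, while keyword_string()
# returns the raw comma-separated attribute value.
print(scraper.keywords())        # e.g. ["snakyscraper", "scraping", "scraper"]
print(scraper.keyword_string())  # e.g. "snakyscraper, scraping, scraper"
```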
### 🔹 Open Graph & Twitter Card
```python
scraper.open_graph()
scraper.open_graph("og:title")

scraper.twitter_card()
scraper.twitter_card("twitter:title")
```
### 🔹 Headings & Text
```python
scraper.h1()
scraper.h2()
scraper.h3()
scraper.h4()
scraper.h5()
scraper.h6()
scraper.p()
```
### 🔹 Lists
```python
scraper.ul()
scraper.ol()
```
### 🔹 Images
```python
scraper.images()
scraper.image_details()
```
### 🔹 Links
```python
scraper.links()
scraper.link_details()
```
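Neither collection section documents its return shapes. Given the "string, list, or dictionary" note under Features, a plausible reading is that `images()` and `links()` return flat lists of URLs while the `*_details()` variants return one record per element; a sketch under that assumption:

```python
# Assumed shapes (not documented above): flat URL lists for images()/links(),
# one record per element for image_details()/link_details().
for src in scraper.images():
    print(src)

for href in scraper.links():
    print(href)

for record in scraper.link_details():
    print(record)  # e.g. href, anchor text, rel -- whatever the library exposes
```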
---
## 🔍 Custom DOM Filtering
Use `filter()` to target specific DOM elements and extract nested content.
#### ▸ Single element
```python
scraper.filter(
    element="div",
    attributes={"id": "main"},
    multiple=False,
    extract=[".title", "#description", "p"]
)
```
#### ▸ Multiple elements
```python
scraper.filter(
    element="div",
    attributes={"class": "card"},
    multiple=True,
    extract=["h1", ".subtitle", "#meta"]
)
```
> The `extract` argument accepts tag names, class selectors (e.g., `.title`), or ID selectors (e.g., `#meta`).
> Output keys are automatically normalized:
> `.title` → `class__title`, `#meta` → `id__meta`
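As an illustration of that normalization, the multiple-element filter above might return something shaped like this (values are invented, and the list-of-dictionaries shape is an assumption based on the `multiple=True` flag):

```python
# Hypothetical result for the div.card filter above. Plain tag names keep
# their own key; ".subtitle" becomes "class__subtitle" and "#meta" becomes
# "id__meta" per the normalization rule.
[
    {"h1": "Card One", "class__subtitle": "First subtitle", "id__meta": "Meta one"},
    {"h1": "Card Two", "class__subtitle": "Second subtitle", "id__meta": "Meta two"},
]
```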
#### ▸ Clean Text Output
You can also disable raw HTML output:
```python
scraper.filter(
    element="p",
    attributes={"class": "dark-text"},
    multiple=True,
    return_html=False
)
```
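The practical difference is presumably the following (output values in the comments are invented, and whether `return_html` defaults to `True` is not stated here):

```python
# Illustrative comparison; assumes return_html accepts True as well as False.
with_markup = scraper.filter(
    element="p",
    attributes={"class": "dark-text"},
    multiple=True,
    return_html=True,   # e.g. ['<p class="dark-text">Hello</p>', ...]
)
text_only = scraper.filter(
    element="p",
    attributes={"class": "dark-text"},
    multiple=True,
    return_html=False,  # e.g. ['Hello', ...]
)
```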
---
## 📦 Output Example
```python
scraper.title()
# "Welcome to Example.com"

scraper.h1()
# ["Main Heading", "Another Title"]

scraper.open_graph("og:title")
# "Example OG Title"
```
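Since pages are fetched with Requests under the hood (per the summary above), network failures should surface as Requests exceptions. A defensive sketch, assuming the page is fetched eagerly when the scraper is constructed and that Requests errors propagate unchanged:

```python
import requests

from snakyscraper import SnakyScraper

try:
    # Assumption: the page is downloaded at construction time.
    scraper = SnakyScraper("https://example.com")
    print(scraper.title())
except requests.exceptions.RequestException as exc:
    # Covers connection errors, timeouts, and invalid URLs raised by Requests.
    print(f"Fetch failed: {exc}")
```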
---
## 🤝 Contributing

Contributions are welcome!
Found a bug or want to request a feature? Please open an [issue](https://github.com/riodevnet/snakyscraper/issues) or submit a pull request.

---
## 📄 License

MIT License © 2025 SnakyScraper

---
## 🔗 Related Projects
- [BeautifulSoup4](https://www.crummy.com/software/BeautifulSoup/)
- [Requests](https://docs.python-requests.org/)
- [lxml](https://lxml.de/)
---
## 💡 Why SnakyScraper?

> Think of it as your Pythonic sniper, targeting HTML content with precision and elegance.
Raw data
```json
{
  "_id": null,
  "home_page": "https://github.com/riodevnet/snakyscraper",
  "name": "snakyscraper",
  "maintainer": null,
  "docs_url": null,
  "requires_python": ">=3.6",
  "maintainer_email": null,
  "keywords": "snakyscraper, scraping, scraper",
  "author": "Rio Dev",
  "author_email": "my.riodev.net@gmail.com",
  "download_url": "https://files.pythonhosted.org/packages/0b/58/b404a78e02290ad3cb5eb8467954a8f93c930226e6d6d5f6dc7e7d844ca4/snakyscraper-1.0.0.tar.gz",
  "platform": null,
"description": "\r\n# \ud83d\udc0d SnakyScraper\r\n\r\n**SnakyScraper** is a lightweight and Pythonic web scraping toolkit built on top of BeautifulSoup and Requests. It provides an elegant interface for extracting structured HTML and metadata from websites with clean, direct outputs.\r\n\r\n> Fast. Accurate. Snake-style scraping. \ud83d\udc0d\ud83c\udfaf\r\n\r\n---\r\n\r\n## \ud83d\ude80 Features\r\n\r\n- \u2705 Extract metadata: title, description, keywords, author, and more\r\n- \u2705 Built-in support for Open Graph, Twitter Card, canonical, and CSRF tags\r\n- \u2705 Extract HTML structures: `h1`\u2013`h6`, `p`, `ul`, `ol`, `img`, links\r\n- \u2705 Powerful `filter()` method with class, ID, and tag-based selectors\r\n- \u2705 `return_html` toggle to return clean text or raw HTML\r\n- \u2705 Simple return values: string, list, or dictionary\r\n- \u2705 Powered by BeautifulSoup4 and Requests\r\n\r\n---\r\n\r\n## \ud83d\udce6 Installation\r\n\r\n```bash\r\npip install snakyscraper\r\n```\r\n\r\n> Requires Python 3.7 or later\r\n\r\n---\r\n\r\n## \ud83d\udee0\ufe0f Basic Usage\r\n\r\n```python\r\nfrom snakyscraper import SnakyScraper\r\n\r\nscraper = SnakyScraper(\"https://example.com\")\r\n\r\n# Get the page title\r\nprint(scraper.title()) # \"Welcome to Example.com\"\r\n\r\n# Get meta description\r\nprint(scraper.description()) # \"This is the example meta description.\"\r\n\r\n# Get all <h1> elements\r\nprint(scraper.h1()) # [\"Welcome\", \"Latest News\"]\r\n\r\n# Extract Open Graph metadata\r\nprint(scraper.open_graph()) # {\"og:title\": \"...\", \"og:description\": \"...\", ...}\r\n\r\n# Custom filter: find all div.card elements and extract child tags\r\nprint(scraper.filter(\r\n element=\"div\",\r\n attributes={\"class\": \"card\"},\r\n multiple=True,\r\n extract=[\"h1\", \"p\", \".title\", \"#desc\"]\r\n))\r\n```\r\n\r\n---\r\n\r\n## \ud83e\uddea Available Methods\r\n\r\n### \ud83d\udd39 Page Metadata\r\n\r\n```python\r\nscraper.title()\r\nscraper.description()\r\nscraper.keywords()\r\nscraper.keyword_string()\r\nscraper.charset()\r\nscraper.canonical()\r\nscraper.content_type()\r\nscraper.author()\r\nscraper.csrf_token()\r\nscraper.image()\r\n```\r\n\r\n### \ud83d\udd39 Open Graph & Twitter Card\r\n\r\n```python\r\nscraper.open_graph()\r\nscraper.open_graph(\"og:title\")\r\n\r\nscraper.twitter_card()\r\nscraper.twitter_card(\"twitter:title\")\r\n```\r\n\r\n### \ud83d\udd39 Headings & Text\r\n\r\n```python\r\nscraper.h1()\r\nscraper.h2()\r\nscraper.h3()\r\nscraper.h4()\r\nscraper.h5()\r\nscraper.h6()\r\nscraper.p()\r\n```\r\n\r\n### \ud83d\udd39 Lists\r\n\r\n```python\r\nscraper.ul()\r\nscraper.ol()\r\n```\r\n\r\n### \ud83d\udd39 Images\r\n\r\n```python\r\nscraper.images()\r\nscraper.image_details()\r\n```\r\n\r\n### \ud83d\udd39 Links\r\n\r\n```python\r\nscraper.links()\r\nscraper.link_details()\r\n```\r\n\r\n---\r\n\r\n## \ud83d\udd0d Custom DOM Filtering\r\n\r\nUse `filter()` to target specific DOM elements and extract nested content.\r\n\r\n#### \u25b8 Single element\r\n\r\n```python\r\nscraper.filter(\r\n element=\"div\",\r\n attributes={\"id\": \"main\"},\r\n multiple=False,\r\n extract=[\".title\", \"#description\", \"p\"]\r\n)\r\n```\r\n\r\n#### \u25b8 Multiple elements\r\n\r\n```python\r\nscraper.filter(\r\n element=\"div\",\r\n attributes={\"class\": \"card\"},\r\n multiple=True,\r\n extract=[\"h1\", \".subtitle\", \"#meta\"]\r\n)\r\n```\r\n\r\n> The `extract` argument accepts tag names, class selectors (e.g., `.title`), or ID selectors (e.g., `#meta`). 
\r\n> Output keys are automatically normalized: \r\n> `.title` \u2192 `class__title`, `#meta` \u2192 `id__meta`\r\n\r\n#### \u25b8 Clean Text Output\r\n\r\nYou can also disable raw HTML output:\r\n\r\n```python\r\nscraper.filter(\r\n element=\"p\",\r\n attributes={\"class\": \"dark-text\"},\r\n multiple=True,\r\n return_html=False\r\n)\r\n```\r\n\r\n---\r\n\r\n## \ud83d\udce6 Output Example\r\n\r\n```python\r\nscraper.title()\r\n# \"Welcome to Example.com\"\r\n\r\nscraper.h1()\r\n# [\"Main Heading\", \"Another Title\"]\r\n\r\nscraper.open_graph(\"og:title\")\r\n# \"Example OG Title\"\r\n```\r\n\r\n---\r\n\r\n## \ud83e\udd1d Contributing\r\n\r\nContributions are welcome! \r\nFound a bug or want to request a feature? Please open an [issue](https://github.com/riodevnet/snakyscraper/issues) or submit a pull request.\r\n\r\n---\r\n\r\n## \ud83d\udcc4 License\r\n\r\nMIT License \u00a9 2025 \u2014 SnakyScraper\r\n\r\n---\r\n\r\n## \ud83d\udd17 Related Projects\r\n\r\n- [BeautifulSoup4](https://www.crummy.com/software/BeautifulSoup/)\r\n- [Requests](https://docs.python-requests.org/)\r\n- [lxml](https://lxml.de/)\r\n\r\n---\r\n\r\n## \ud83d\udca1 Why SnakyScraper?\r\n\r\n> Think of it as your Pythonic sniper \u2014 targeting HTML content with precision and elegance.\r\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "SnakyScraper is a lightweight and Pythonic web scraping toolkit built on top of BeautifulSoup and Requests. It provides an elegant interface for extracting structured HTML and metadata from websites with clean, direct outputs.",
"version": "1.0.0",
"project_urls": {
"Homepage": "https://github.com/riodevnet/snakyscraper"
},
"split_keywords": [
"snakyscraper",
" scraping",
" scraper"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "b51a2004ba337ba6e958462e2d873f7335bdeb6dacaf7432a1431e31a7a54bd8",
"md5": "5ad0873fd51fded67bc4a2d57bd88a82",
"sha256": "564eeccaf88a83803526c0fcb4c35d398b70dfb995eec27c70eb47a3b3d87bca"
},
"downloads": -1,
"filename": "snakyscraper-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5ad0873fd51fded67bc4a2d57bd88a82",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 5527,
"upload_time": "2025-08-15T08:01:05",
"upload_time_iso_8601": "2025-08-15T08:01:05.452920Z",
"url": "https://files.pythonhosted.org/packages/b5/1a/2004ba337ba6e958462e2d873f7335bdeb6dacaf7432a1431e31a7a54bd8/snakyscraper-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "0b58b404a78e02290ad3cb5eb8467954a8f93c930226e6d6d5f6dc7e7d844ca4",
"md5": "9fdb11c1c4ac4e743470fba280798c3b",
"sha256": "368dab92aff789b48fdf2b28d0e883b42b0df9261ddfd68a97b19d302875747b"
},
"downloads": -1,
"filename": "snakyscraper-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "9fdb11c1c4ac4e743470fba280798c3b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 6554,
"upload_time": "2025-08-15T08:01:07",
"upload_time_iso_8601": "2025-08-15T08:01:07.113132Z",
"url": "https://files.pythonhosted.org/packages/0b/58/b404a78e02290ad3cb5eb8467954a8f93c930226e6d6d5f6dc7e7d844ca4/snakyscraper-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-15 08:01:07",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "riodevnet",
"github_project": "snakyscraper",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "snakyscraper"
}