# ChainableSoup
[GitHub](https://github.com/thefcraft/ChainableSoup)
[![PyPI version](https://badge.fury.io/py/ChainableSoup.svg)](https://badge.fury.io/py/ChainableSoup)
**ChainableSoup** provides a fluent, pipeline-based interface for querying HTML and XML documents with BeautifulSoup, turning complex nested searches into clean, readable, and chainable method calls.
## The Problem
Working with [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) is great, but navigating deeply nested structures can lead to verbose and hard-to-read code:
```python
# Standard BeautifulSoup
try:
    doc = soup.find('div', class_='document')
    wrapper = doc.find('div', class_='documentwrapper')
    body_wrapper = wrapper.find('div', class_='bodywrapper')
    body = body_wrapper.find('div', class_='body')
    section = body.find('section', recursive=False)
    p_tag = section.find_all('p', recursive=False)[0]
    print(p_tag.text)
except (AttributeError, IndexError):  # missing tag, or empty find_all result
    print("One of the tags was not found.")
```
This pattern is repetitive, and the error handling can obscure the main logic.
## The Solution: A Fluent Pipeline
ChainableSoup elegantly solves this by introducing a `Pipeline` that lets you chain `find` operations. The same query becomes:
```python
from ChainableSoup import Pipeline
# With ChainableSoup
pipeline = (
    Pipeline()
    .find_tag('div', class_='document')
    .find_tag('div', class_='documentwrapper')
    .find_tag('div', class_='bodywrapper')
    .find_tag('div', class_='body')
    .find_tag('section', recursive=False)
    .find_all_tags('p', recursive=False)[0]
)
# Execute the pipeline and get the result
first_p = pipeline.raise_for_error.run(soup)
print(first_p.text)
```
or
```python
from ChainableSoup import Pipeline, NestedArg, SpecalArg
# With ChainableSoup
pipeline = Pipeline().find_nested_tag(
    name = NestedArg() >> 'div' >> 'div' >> 'div' >> 'div' >> 'section',
    class_ = NestedArg() >> 'document' >> 'documentwrapper' >> 'bodywrapper' >> 'body',
    recursive = NestedArg() >> True >> True >> True >> True >> False >> SpecalArg.EXPANDLAST
).find_all_tags('p', recursive=False)[0]
# Execute the pipeline and get the result
first_p = pipeline.raise_for_error.run(soup)
print(first_p.text)
```
## Features
- **Fluent Chaining:** Link `find_tag` and `find_all_tags` calls in a natural, readable sequence.
- **Powerful Nested Searches:** Use `find_nested_tag` with `NestedArg` to perform complex deep searches with a single method call.
- **Sequence Operations:** After a `find_all_tags` call, you can `filter`, `map`, and perform assertions on the sequence of results.
- **Robust Error Handling:** Choose your style: either get a descriptive `Error` object back or have an exception raised automatically on failure.
- **Intelligent Argument Resolution:** Automatically handle varying arguments for each level of a nested search.
## Installation
```bash
pip install ChainableSoup
```
## Quickstart
### 1. Basic Find
Create a `Pipeline` and chain `find_tag` calls to navigate to a specific element.
```python
from bs4 import BeautifulSoup
from ChainableSoup import Pipeline
html = '''
<body>
    <div id="content">
        <h1>Title</h1>
        <p>First paragraph.</p>
        <p>Second paragraph.</p>
    </div>
</body>
'''
soup = BeautifulSoup(html, 'html.parser')
# Build the pipeline
pipeline = Pipeline().find_tag('body').find_tag('div', id='content').find_tag('p')
# Execute it and raise an exception if any tag is not found
first_p = pipeline.raise_for_error.run(soup)
print(first_p.text)
# Output: First paragraph.
# Alternatively, execute without raising an error
result = pipeline.run(soup)
if not result:
    print(f"Pipeline failed: {result.msg}")
else:
    print(result.text)
```
### 2. Finding All Tags and Filtering
Use `find_all_tags` to get a sequence of results. This returns a `PipelineSequence` object, which you can use to filter, map, or select items.
```python
# Continues from the previous example...
# Find all <p> tags inside the div
p_sequence = Pipeline().find_tag('div', id='content').find_all_tags('p')
# Select the second paragraph (index 1)
second_p_pipeline = p_sequence[1]
print(second_p_pipeline.raise_for_error.run(soup).text)
# Output: Second paragraph.
# Or use .first / .last properties
first_p_pipeline = p_sequence.first
print(first_p_pipeline.raise_for_error.run(soup).text)
# Output: First paragraph.
# Filter the sequence
contains_second = lambda tag: "Second" in tag.text
filtered_sequence = p_sequence.filter(contains_second)
# This will now find the first (and only) tag that matches the filter
result = filtered_sequence.first.raise_for_error.run(soup)
print(result.text)
# Output: Second paragraph.
```
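The sequence also supports `map` and `assert_all`, listed in the API overview below. Here is a minimal sketch of both, continuing from the same `soup`; the exact shape of the results is an assumption (we assume `map(fn)` replaces each found tag with `fn(tag)`, and `assert_all(fn)` fails the run if `fn` is falsy for any tag), so treat it as illustrative:

```python
# Hedged sketch: .map() and .assert_all(), continuing the quickstart.
p_sequence = Pipeline().find_tag('div', id='content').find_all_tags('p')

# Transform every matched <p> into its upper-cased text.
upper_texts = p_sequence.map(lambda tag: tag.text.upper())
print(upper_texts.first.raise_for_error.run(soup))
# Expected (if the assumption holds): FIRST PARAGRAPH.

# Assert that every paragraph has non-empty text.
checked = p_sequence.assert_all(lambda tag: len(tag.text) > 0)
print(checked.last.raise_for_error.run(soup).text)
# Output: Second paragraph.
```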
## Advanced Usage: `find_nested_tag`
The `find_nested_tag` method is the most powerful feature of ChainableSoup. It allows you to define an entire path of `find` operations in a single, declarative call using `NestedArg`.
### `NestedArg`
A `NestedArg` is a fluent builder that collects one argument value per level of the search. You can chain values using the `>>` operator or the `.add()` method.
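Both spellings build the same list of per-level values; `.add()` is handy when the path is computed at runtime. A small sketch, assuming `.add()` returns the builder (which is what makes the chaining described above possible):

```python
from ChainableSoup import NestedArg

# Operator form...
levels = NestedArg() >> 'div' >> 'div' >> 'section'

# ...and the equivalent .add() form (assumed to return the builder).
levels = NestedArg().add('div').add('div').add('section')

# .add() also works well when the path comes from data:
path = ['div', 'div', 'section']
built = NestedArg()
for name in path:
    built = built.add(name)
```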
### Example
Let's revisit the complex example from the introduction.
```python
from ChainableSoup import Pipeline, NestedArg, SpecalArg
# ... setup soup ...
pipeline = Pipeline().find_nested_tag(
    # For each level of the search, specify the tag 'name'.
    name = NestedArg() >> 'body' >> 'div' >> 'div' >> 'div' >> 'div',
    # Specify attributes for each level. The lists are matched by index.
    attrs={
        'class': NestedArg() >> None >> 'document' >> 'documentwrapper' >> 'bodywrapper' >> 'body'
    },
    # Specify the `recursive` flag. Here, we use a special argument:
    # it will be True, then False, and EXPANDLAST repeats `False` for the rest.
    recursive = NestedArg() >> True >> False >> SpecalArg.EXPANDLAST
).find_all_tags(
    name='section',
    recursive=False
).first.find_all_tags(
    name='p',
    recursive=False
)
# Create two branches of the pipeline to get the first and second <p> tags
first_p_pipeline = pipeline[0]
second_p_pipeline = pipeline[1]
# Execute both
print(first_p_pipeline.raise_for_error.run(soup).text)
print(second_p_pipeline.raise_for_error.run(soup).text)
```
### `SpecalArg` Enum
When argument lists have different lengths, `SpecalArg` controls how the shorter lists are padded to match the longest one (illustrated in the sketch after this list).
- `SpecalArg.EXPANDLAST`: Repeats the last provided value.
- `SpecalArg.FILLNONE`: Fills with `None` (the default).
- `SpecalArg.FILLTRUE`: Fills with `True`.
- `SpecalArg.FILLFALSE`: Fills with `False`.
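To make the padding concrete, here is a hedged sketch; the comments spell out the per-level values each argument should resolve to under the rules above:

```python
from ChainableSoup import Pipeline, NestedArg, SpecalArg

# Three levels of `name`, but shorter lists for the other arguments.
pipeline = Pipeline().find_nested_tag(
    name      = NestedArg() >> 'div' >> 'div' >> 'section',
    # EXPANDLAST repeats the trailing False to cover level 3,
    # so `recursive` should resolve to [True, False, False].
    recursive = NestedArg() >> True >> False >> SpecalArg.EXPANDLAST,
    # No SpecalArg given: the default FILLNONE pads with None,
    # so `class_` should resolve to ['document', 'body', None].
    class_    = NestedArg() >> 'document' >> 'body',
)
```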
## API Overview
- **`Pipeline`**: The main object for building a query that results in a **single `Tag`**.
- `.find_tag(...)`: Appends a `find` operation.
- `.find_nested_tag(...)`: Appends a series of `find` operations.
- `.find_all_tags(...)`: Transitions the query into a `PipelineSequence`.
- `.run(soup)`: Executes the pipeline and returns a `Tag` or `Error` object.
  - `.run_and_raise_for_error(soup)`: Executes and raises an `Error` on failure (both error-handling styles are sketched after this list).
- **`PipelineSequence`**: An object for building a query that results in a **sequence of `Tag`s**.
- `.filter(fn)`: Filters the sequence.
- `.map(fn)`: Applies a function to each tag in the sequence.
- `.assert_all(fn)`: Asserts a condition for all tags.
- `.first`, `.last`, `[index]`: Selects a single element, returning control to a `Pipeline`.
- **`NestedArg`**: A helper class to build argument lists for `find_nested_tag`.
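The two error-handling styles from the quickstart map onto this API as follows. A short sketch, assuming (as the quickstart shows) that a failed `run` returns a falsy `Error` carrying a `.msg`; the concrete exception class raised on failure is not documented here, so the sketch catches broadly:

```python
from ChainableSoup import Pipeline

pipeline = Pipeline().find_tag('div', id='does-not-exist')

# Style 1: inspect the result yourself.
result = pipeline.run(soup)
if not result:
    print(f"Pipeline failed: {result.msg}")

# Style 2: raise on failure, via the property used throughout this README...
try:
    tag = pipeline.raise_for_error.run(soup)
except Exception as exc:  # the exact exception class is an assumption
    print(f"Raised: {exc}")

# ...or via the equivalent one-shot method listed above.
try:
    tag = pipeline.run_and_raise_for_error(soup)
except Exception as exc:
    print(f"Raised: {exc}")
```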
## Contributing
Contributions are welcome! If you have a feature request, find a bug, or want to improve the documentation, please open an issue or submit a pull request on the [GitHub repository](https://github.com/thefcraft/ChainableSoup).
## License
This project is licensed under the MIT License.