gobeautifulsoup

Name	gobeautifulsoup
Version	1.0.0
home_page	https://github.com/coffeecms/gobeautifulsoup
Summary	A high-performance BeautifulSoup replacement powered by Go
upload_time	2025-07-29 02:01:04
maintainer	None
docs_url	None
author	CoffeeCMS Team
requires_python	>=3.7
license	MIT License Copyright (c) 2025 CoffeeCMS Team Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords	html xml parsing beautifulsoup go performance web-scraping
requirements	No requirements were recorded.

# GoBeautifulSoup

[![PyPI version](https://badge.fury.io/py/gobeautifulsoup.svg)](https://badge.fury.io/py/gobeautifulsoup)
[![Python versions](https://img.shields.io/pypi/pyversions/gobeautifulsoup.svg)](https://pypi.org/project/gobeautifulsoup/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://pepy.tech/badge/gobeautifulsoup)](https://pepy.tech/project/gobeautifulsoup)

**GoBeautifulSoup** is a high-performance HTML/XML parsing library whose API is 100% compatible with BeautifulSoup4 but whose parsing engine is written in Go. It is designed as a drop-in replacement for BeautifulSoup4 with significantly faster parsing and querying.

## 🚀 Why GoBeautifulSoup?

- **🔥 10-50x faster** than BeautifulSoup4 for parsing and querying (see the benchmark tables below)
- **🔄 100% API Compatible** - Drop-in replacement for BeautifulSoup4
- **⚡ Go-Powered Backend** - Leverages Go's performance for HTML/XML processing
- **🌐 Cross-Platform** - Works on Windows, macOS, and Linux (x64/ARM64)
- **💾 Memory Efficient** - Optimized memory usage for large documents
- **🛡️ Production Ready** - Thoroughly tested with comprehensive benchmarks

## 📊 Performance Comparison

In the benchmarks below, GoBeautifulSoup outperforms BeautifulSoup4 on every operation measured:

### Parsing Performance

| Document Size | GoBeautifulSoup | BeautifulSoup4 (html.parser) | BeautifulSoup4 (lxml) | Speed Improvement (vs. html.parser) |
|---------------|-----------------|-------------------------------|----------------------|-------------------------------------|
| Small (1KB)   | 0.044ms        | 2.1ms                        | 1.8ms               | **48x faster**    |
| Medium (100KB)| 5.7ms          | 89ms                         | 76ms                | **15x faster**    |
| Large (1MB)   | 154ms          | 2,400ms                      | 1,980ms             | **15x faster**    |

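The improvement column can be checked by dividing a BeautifulSoup4 time by the GoBeautifulSoup time for the same row. The sketch below does that arithmetic with the figures from the table; reading the column as "relative to `html.parser`" is an inference from the numbers rather than something the README states, and `speedup` is an illustrative helper name.

```python
# Timings in milliseconds, copied from the parsing table above.
parse_times_ms = {
    "Small (1KB)": {"gobeautifulsoup": 0.044, "html.parser": 2.1, "lxml": 1.8},
    "Medium (100KB)": {"gobeautifulsoup": 5.7, "html.parser": 89.0, "lxml": 76.0},
    "Large (1MB)": {"gobeautifulsoup": 154.0, "html.parser": 2400.0, "lxml": 1980.0},
}


def speedup(baseline_ms, candidate_ms):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


for size, times in parse_times_ms.items():
    vs_html_parser = speedup(times["html.parser"], times["gobeautifulsoup"])
    vs_lxml = speedup(times["lxml"], times["gobeautifulsoup"])
    print(f"{size}: {vs_html_parser:.1f}x vs html.parser, {vs_lxml:.1f}x vs lxml")
```

The ratios against `lxml` come out smaller than those against `html.parser`, so the baseline parser matters when quoting a single multiplier.
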
### Query Performance (Medium Document)

| Operation              | GoBeautifulSoup | BeautifulSoup4 | Speed Improvement |
|------------------------|-----------------|----------------|-------------------|
| `find('div')`         | 0.16ms         | 3.2ms         | **20x faster**    |
| `find_all('div')`     | 4.5ms          | 45ms          | **10x faster**    |
| `select('h3')`        | 2.5ms          | 28ms          | **11x faster**    |
| `find(class_='item')` | 0.55ms         | 8.9ms         | **16x faster**    |

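The tables do not state the hardware or the measurement method, so results on other machines may differ. A minimal timing harness for taking your own measurements might look like the sketch below. The helper names (`median_ms`, `make_document`, `parse_with_stdlib`) are hypothetical, and the standard-library `HTMLParser` is used only as a stand-in workload so that the snippet runs without third-party packages.

```python
import statistics
import time
from html.parser import HTMLParser


def median_ms(func, repeat=5):
    """Run func() `repeat` times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def make_document(n_items):
    """Build a synthetic HTML document with n_items <div class="item"> blocks."""
    rows = "".join(
        f'<div class="item"><h3>Item {i}</h3></div>' for i in range(n_items)
    )
    return f"<html><body>{rows}</body></html>"


def parse_with_stdlib(document):
    """Stand-in workload: push the document through the stdlib tokenizer."""
    parser = HTMLParser()
    parser.feed(document)
    parser.close()


doc = make_document(1000)
print(f"stdlib HTMLParser: {median_ms(lambda: parse_with_stdlib(doc)):.2f} ms")

# To time the libraries compared above, swap in a different callable, e.g.:
#   from gobeautifulsoup import BeautifulSoup
#   median_ms(lambda: BeautifulSoup(doc, 'html.parser'))
```

Using the median of several runs reduces the effect of one-off stalls such as garbage collection or cold caches.
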
## 🔧 Installation

```bash
pip install gobeautifulsoup
```

## 📖 Quick Start

GoBeautifulSoup provides the exact same API as BeautifulSoup4:

```python
from gobeautifulsoup import BeautifulSoup

# Parse HTML
html = """
<html>
    <head><title>Example</title></head>
    <body>
        <div class="container">
            <p class="highlight">Hello World!</p>
            <a href="https://example.com">Link</a>
        </div>
    </body>
</html>
"""

soup = BeautifulSoup(html, 'html.parser')

# All familiar BeautifulSoup methods work exactly the same
title = soup.find('title').get_text()
print(title)  # "Example"

paragraph = soup.find('p', class_='highlight')
print(paragraph.get_text())  # "Hello World!"

links = soup.find_all('a')
for link in links:
    print(link.get('href'))  # "https://example.com"
```

## 💡 Usage Examples

### 1. Basic HTML Parsing

```python
from gobeautifulsoup import BeautifulSoup

html = """
<html>
    <body>
        <h1>Welcome</h1>
        <p class="intro">This is an introduction.</p>
        <ul>
            <li>Item 1</li>
            <li>Item 2</li>
            <li>Item 3</li>
        </ul>
    </body>
</html>
"""

soup = BeautifulSoup(html, 'html.parser')

# Find elements
heading = soup.find('h1')
print(f"Heading: {heading.get_text()}")

# Find by class
intro = soup.find('p', class_='intro')
print(f"Introduction: {intro.get_text()}")

# Find all list items
items = soup.find_all('li')
for i, item in enumerate(items, 1):
    print(f"Item {i}: {item.get_text()}")
```

### 2. Web Scraping with Requests

```python
import requests
from gobeautifulsoup import BeautifulSoup

# Scrape a webpage (fail fast on network stalls and HTTP errors)
url = "https://httpbin.org/html"
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')

# Extract all links
links = soup.find_all('a')
for link in links:
    href = link.get('href')
    text = link.get_text().strip()
    if href:
        print(f"Link: {text} -> {href}")

# Extract all headings
for heading in soup.find_all(['h1', 'h2', 'h3']):
    print(f"{heading.name}: {heading.get_text()}")
```

### 3. CSS Selector Support

```python
from gobeautifulsoup import BeautifulSoup

html = """
<div class="content">
    <article id="post-1" class="post featured">
        <h2>Featured Post</h2>
        <p class="excerpt">This is a featured post excerpt.</p>
    </article>
    <article id="post-2" class="post">
        <h2>Regular Post</h2>
        <p class="excerpt">This is a regular post excerpt.</p>
    </article>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# CSS selectors work exactly like BeautifulSoup4
featured_posts = soup.select('.post.featured')
print(f"Featured posts: {len(featured_posts)}")

# Complex selectors
excerpts = soup.select('article p.excerpt')
for excerpt in excerpts:
    print(f"Excerpt: {excerpt.get_text()}")

# ID selectors (select_one returns the first match, or None if nothing matches)
specific_post = soup.select_one('#post-1 h2')
print(f"Specific post title: {specific_post.get_text()}")
```

### 4. XML Processing

```python
from gobeautifulsoup import BeautifulSoup

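# The XML declaration must be the very first thing in the document,
# so the string starts immediately after the opening quotes.
xml_data = """<?xml version="1.0" encoding="UTF-8"?>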
<catalog>
    <book id="1">
        <title>Python Programming</title>
        <author>John Doe</author>
        <price currency="USD">29.99</price>
    </book>
    <book id="2">
        <title>Web Development</title>
        <author>Jane Smith</author>
        <price currency="USD">34.99</price>
    </book>
</catalog>
"""

soup = BeautifulSoup(xml_data, 'xml')

# Process XML data
books = soup.find_all('book')
for book in books:
    book_id = book.get('id')
    title = book.find('title').get_text()
    author = book.find('author').get_text()
    price = book.find('price')
    
    print(f"Book {book_id}: {title} by {author}")
    print(f"Price: {price.get('currency')} {price.get_text()}")
    print("-" * 40)
```

### 5. Advanced Data Extraction

```python
from gobeautifulsoup import BeautifulSoup
import re

html = """
<table class="data-table">
    <thead>
        <tr>
            <th>Product</th>
            <th>Price</th>
            <th>Stock</th>
        </tr>
    </thead>
    <tbody>
        <tr data-product-id="123">
            <td class="product-name">Laptop</td>
            <td class="price">$999.99</td>
            <td class="stock in-stock">Available</td>
        </tr>
        <tr data-product-id="124">
            <td class="product-name">Mouse</td>
            <td class="price">$29.99</td>
            <td class="stock out-of-stock">Out of Stock</td>
        </tr>
    </tbody>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

# Extract structured data
products = []
rows = soup.select('tbody tr')

for row in rows:
    product_id = row.get('data-product-id')
    name = row.select_one('.product-name').get_text()
    price_text = row.select_one('.price').get_text()
    stock_cell = row.select_one('.stock')
    
    # Extract price using regex
    price_match = re.search(r'\$(\d+\.?\d*)', price_text)
    price = float(price_match.group(1)) if price_match else 0.0
    
    # Determine stock status
    in_stock = 'in-stock' in stock_cell.get('class', [])
    
    products.append({
        'id': product_id,
        'name': name,
        'price': price,
        'in_stock': in_stock
    })

# Display extracted data
for product in products:
    status = "✅ Available" if product['in_stock'] else "❌ Out of Stock"
    print(f"{product['name']} (ID: {product['id']})")
    print(f"Price: ${product['price']:.2f} | Status: {status}")
    print("-" * 50)
```

## 🔄 Migration from BeautifulSoup4

GoBeautifulSoup is designed as a drop-in replacement. Simply change your import:

```python
# Before
from bs4 import BeautifulSoup

# After  
from gobeautifulsoup import BeautifulSoup

# Everything else stays exactly the same!
```

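Because only the import changes, a codebase can also adopt the package gradually by falling back to BeautifulSoup4 when GoBeautifulSoup is not installed. The following is a minimal sketch of that pattern; `load_beautifulsoup` is a hypothetical helper, not part of either library, and it only handles `ImportError`.

```python
import importlib


def load_beautifulsoup(candidates=("gobeautifulsoup", "bs4")):
    """Return (module_name, BeautifulSoup class) for the first importable candidate.

    Returns (None, None) when none of the candidates can be imported.
    """
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # not installed; try the next candidate
        return module_name, getattr(module, "BeautifulSoup")
    return None, None


backend, BeautifulSoup = load_beautifulsoup()
print(f"HTML backend in use: {backend}")
```

Logging which backend was selected makes it easier to attribute any behavioral difference to the right library.
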
## 📋 Supported Features

✅ **Full BeautifulSoup4 API Compatibility**
- `find()` and `find_all()` methods
- CSS selector support with `select()`
- Tree navigation (parent, children, siblings)
- Attribute access and modification
- Text extraction and manipulation

✅ **Parser Support**
- HTML parser (`html.parser`)
- XML parser (`xml`) 
- Automatic encoding detection

✅ **Advanced Features**
- Regular expression search
- Custom attribute filters
- Tree modification methods
- Pretty printing

## 🏗️ Architecture

GoBeautifulSoup consists of two main components:

1. **Go Core**: High-performance HTML/XML parsing engine written in Go
2. **Python Wrapper**: Provides BeautifulSoup4-compatible API

The Go core handles all the heavy lifting (parsing, querying, tree traversal), while the Python wrapper ensures 100% API compatibility.

## 🌟 Performance Tips

1. **Parse Once, Query Many**: A `BeautifulSoup` object represents one parsed document, so parse each document once and reuse that object for all queries instead of re-parsing
2. **Use Specific Selectors**: More specific CSS selectors perform better than broad searches
3. **Limit Search Scope**: Use `find()` instead of `find_all()` when you only need one result
4. **Choose the Right Parser**: Use `'html.parser'` for HTML and `'xml'` for XML documents

## 📚 Documentation

- **API Reference**: [docs/api.md](docs/api.md)
- **Migration Guide**: [docs/migration.md](docs/migration.md)
- **Performance Guide**: [docs/performance.md](docs/performance.md)
- **Examples**: [examples/](examples/)

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 🐛 Bug Reports

Found a bug? Please create an issue on [GitHub Issues](https://github.com/coffeecms/gobeautifulsoup/issues) with:

- Python version
- Operating system
- Minimal code example
- Expected vs actual behavior

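The first two items can be collected automatically. The snippet below is a small sketch using only the standard library; `environment_report` is an illustrative name, and `importlib.metadata` requires Python 3.8 or newer, which is slightly stricter than the package's stated `>=3.7`.

```python
import platform
from importlib import metadata  # available from Python 3.8


def environment_report(package="gobeautifulsoup"):
    """Collect the interpreter, OS, and installed package version for a bug report."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "not installed"
    return {
        "python": platform.python_version(),
        "os": f"{platform.system()} {platform.release()}",
        "machine": platform.machine(),
        package: version,
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Pasting this output into the issue alongside the minimal code example covers the environment details requested above.
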
## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Inspired by the excellent [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) library by Leonard Richardson
- Built with [Go](https://golang.org/) for maximum performance
- Thanks to all contributors and users

## 📊 Project Stats

- **GitHub**: https://github.com/coffeecms/gobeautifulsoup
- **PyPI**: https://pypi.org/project/gobeautifulsoup/
- **Documentation**: https://gobeautifulsoup.readthedocs.io/
- **Benchmarks**: [benchmarks/](benchmarks/)

---

**Ready to supercharge your HTML parsing? Install GoBeautifulSoup today and experience the performance difference!**

```bash
pip install gobeautifulsoup
```

Raw data

{
    "_id": null,
    "home_page": "https://github.com/coffeecms/gobeautifulsoup",
    "name": "gobeautifulsoup",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "CoffeeCMS Team <team@coffeecms.com>",
    "keywords": "html, xml, parsing, beautifulsoup, go, performance, web-scraping",
    "author": "CoffeeCMS Team",
    "author_email": "CoffeeCMS Team <team@coffeecms.com>",
    "download_url": "https://files.pythonhosted.org/packages/77/65/3ae535e203219cd2dea66a3b63648c08ef8d4bd891543bf3b2a265d27561/gobeautifulsoup-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License\r\n        \r\n        Copyright (c) 2025 CoffeeCMS Team\r\n        \r\n        Permission is hereby granted, free of charge, to any person obtaining a copy\r\n        of this software and associated documentation files (the \"Software\"), to deal\r\n        in the Software without restriction, including without limitation the rights\r\n        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\r\n        copies of the Software, and to permit persons to whom the Software is\r\n        furnished to do so, subject to the following conditions:\r\n        \r\n        The above copyright notice and this permission notice shall be included in all\r\n        copies or substantial portions of the Software.\r\n        \r\n        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\r\n        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\r\n        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\r\n        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\r\n        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\r\n        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\r\n        SOFTWARE.\r\n        ",
    "summary": "A high-performance BeautifulSoup replacement powered by Go",
    "version": "1.0.0",
    "project_urls": {
        "Benchmarks": "https://github.com/coffeecms/gobeautifulsoup/tree/main/benchmarks",
        "Bug Reports": "https://github.com/coffeecms/gobeautifulsoup/issues",
        "Changelog": "https://github.com/coffeecms/gobeautifulsoup/blob/main/CHANGELOG.md",
        "Documentation": "https://gobeautifulsoup.readthedocs.io/",
        "Homepage": "https://github.com/coffeecms/gobeautifulsoup",
        "Source": "https://github.com/coffeecms/gobeautifulsoup"
    },
    "split_keywords": [
        "html",
        " xml",
        " parsing",
        " beautifulsoup",
        " go",
        " performance",
        " web-scraping"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7149ee4c860847faeea50ef7f1bb565ed2997864a278e8dcd02dcea47092cf10",
                "md5": "ac1ca4505b5ecdcafd8ab66d83e278a1",
                "sha256": "b1a6a07e332dc043af622566109c0127bf6c64b8487f139e9b4059ee7b9d516b"
            },
            "downloads": -1,
            "filename": "gobeautifulsoup-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ac1ca4505b5ecdcafd8ab66d83e278a1",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 16302,
            "upload_time": "2025-07-29T02:01:02",
            "upload_time_iso_8601": "2025-07-29T02:01:02.054458Z",
            "url": "https://files.pythonhosted.org/packages/71/49/ee4c860847faeea50ef7f1bb565ed2997864a278e8dcd02dcea47092cf10/gobeautifulsoup-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "77653ae535e203219cd2dea66a3b63648c08ef8d4bd891543bf3b2a265d27561",
                "md5": "4be8466587c3e793918795e45a6dee68",
                "sha256": "55a59a24f79681495447a35bdca98af049c6f28ad04c3ab4287eca77829bbe5d"
            },
            "downloads": -1,
            "filename": "gobeautifulsoup-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "4be8466587c3e793918795e45a6dee68",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 48712,
            "upload_time": "2025-07-29T02:01:04",
            "upload_time_iso_8601": "2025-07-29T02:01:04.208327Z",
            "url": "https://files.pythonhosted.org/packages/77/65/3ae535e203219cd2dea66a3b63648c08ef8d4bd891543bf3b2a265d27561/gobeautifulsoup-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-29 02:01:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "coffeecms",
    "github_project": "gobeautifulsoup",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "tox": true,
    "lcname": "gobeautifulsoup"
}

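The `digests` entries above can be used to check a manually downloaded archive before installing it. The sketch below compares a local file against the SHA-256 value listed for `gobeautifulsoup-1.0.0.tar.gz`; the function names and the local file path are assumptions made for illustration.

```python
import hashlib

# SHA-256 digest of gobeautifulsoup-1.0.0.tar.gz, copied from the metadata above.
EXPECTED_SDIST_SHA256 = (
    "55a59a24f79681495447a35bdca98af049c6f28ad04c3ab4287eca77829bbe5d"
)


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex=EXPECTED_SDIST_SHA256):
    """Return True when the file's SHA-256 digest equals the expected value."""
    return sha256_of_file(path) == expected_hex.lower()


# Example (assumes the archive was downloaded to the current directory):
#   matches_expected("gobeautifulsoup-1.0.0.tar.gz")
```
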