scrape-bing


- Name: scrape-bing
- Version: 0.1.2.1
- home_page: None
- Summary: A Python package for scraping Bing search results
- upload_time: 2025-01-28 19:09:33
- maintainer: None
- docs_url: None
- author: None
- requires_python: >=3.7
- license: MIT License Copyright (c) 2025 Affan Shaikhsurab Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- keywords: bing, search, scraper, web scraping
- requirements: No requirements were recorded.

# Scrape Bing

A robust Python package for scraping search results from Bing with built-in rate limiting, retry mechanisms, and result cleaning features.

## Features

- 🔍 Clean and structured search results
- 🔄 Automatic retry mechanism for failed requests
- ⏱️ Built-in rate limiting to prevent blocking
- 🧹 URL cleaning and validation
- 🔄 User agent rotation
- 💪 Type hints and proper error handling
- 📝 Comprehensive documentation
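
To make the retry and rate-limiting features more concrete, here is a minimal sketch of how the two can be combined. It is not the package's actual implementation: `RateLimitedRetrier` and its `sleep`/`clock` parameters are hypothetical names introduced for illustration, and only the `max_retries` and `delay_between_requests` names come from this README.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitedRetrier:
    """Hypothetical helper: spaces out calls and retries network failures."""

    def __init__(
        self,
        max_retries: int = 3,
        delay_between_requests: float = 1.0,
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_retries = max_retries
        self.delay = delay_between_requests
        self._sleep = sleep
        self._clock = clock
        self._last_call: Optional[float] = None

    def _wait_for_slot(self) -> None:
        # Sleep only for the part of the delay that has not yet elapsed.
        if self._last_call is not None:
            remaining = self.delay - (self._clock() - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
        self._last_call = self._clock()

    def call(self, func: Callable[[], T]) -> T:
        last_error: Optional[Exception] = None
        # One initial attempt plus up to max_retries retries.
        for _ in range(self.max_retries + 1):
            self._wait_for_slot()
            try:
                return func()
            except ConnectionError as exc:
                last_error = exc
        raise ConnectionError(
            f"gave up after {self.max_retries} retries"
        ) from last_error
```

Injecting `sleep` and `clock` keeps the helper testable without real waiting; in normal use the defaults apply and `retrier.call(lambda: fetch(url))` would wrap any callable that may raise `ConnectionError` (`fetch` here is a placeholder).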

## Installation

You can install the package using pip:

```bash
pip install scrape-bing
```

For development installation:

```bash
git clone https://github.com/affanshaikhsurab/scrape-bing.git
cd scrape-bing
pip install -e .
```

## Quick Start

```python
from scrape_bing import BingScraper

# Initialize the scraper
scraper = BingScraper(
    max_retries=3,
    delay_between_requests=1.0
)

# Perform a search
results = scraper.search("python programming", num_results=5)

# Process results
for result in results:
    print(f"\nTitle: {result.title}")
    print(f"URL: {result.url}")
    print(f"Description: {result.description}")
```

## Advanced Usage

### Custom Configuration

```python
# Configure with custom parameters
scraper = BingScraper(
    max_retries=5,                # Maximum retry attempts
    delay_between_requests=2.0    # Delay between requests in seconds
)
```

### Error Handling

```python
from scrape_bing import BingScraper

scraper = BingScraper()

try:
    results = scraper.search("python programming")
  
except ValueError as e:
    print(f"Invalid input: {e}")
  
except ConnectionError as e:
    print(f"Network error: {e}")
  
except RuntimeError as e:
    print(f"Parsing error: {e}")
```
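
When a failed search should not interrupt the calling code, the three handlers above can be folded into one helper. The following is a sketch under that assumption: `safe_search` is a hypothetical name, and it accepts any object exposing the documented `search(query, num_results=...)` method rather than importing the package.

```python
from typing import Any, List


def safe_search(scraper: Any, query: str, num_results: int = 10) -> List[Any]:
    """Return search results, or an empty list if a documented error occurs.

    `scraper` is assumed to expose search(query, num_results=...) and to raise
    ValueError, ConnectionError, or RuntimeError as described in this README.
    """
    try:
        return scraper.search(query, num_results=num_results)
    except ValueError as exc:
        print(f"Invalid input: {exc}")
    except ConnectionError as exc:
        print(f"Network error: {exc}")
    except RuntimeError as exc:
        print(f"Parsing error: {exc}")
    # Any other exception type propagates to the caller unchanged.
    return []
```

Returning an empty list keeps downstream loops simple, at the cost of hiding which error occurred; code that needs to distinguish the cases should keep the explicit `try`/`except` form.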

### Search Result Structure

Each search result contains:

- `title`: The title of the search result
- `url`: The cleaned and validated URL
- `description`: The description snippet (if available)

```python
# Access result attributes
for result in results:
    print(result.title)       # Title of the page
    print(result.url)         # Clean URL
    print(result.description) # Description (may be None)
```
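
Because `description` may be `None`, code that exports results usually needs to handle the missing value explicitly. The sketch below serializes results to JSON. It defines a local stand-in `SearchResult` that mirrors the three documented fields so the example runs on its own; `results_to_json` is a hypothetical helper, and using `dataclasses.asdict` assumes the real class is a dataclass, as the API reference indicates.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class SearchResult:
    """Local stand-in mirroring the documented fields, for illustration only."""
    title: str
    url: str
    description: Optional[str] = None


def results_to_json(results: List[SearchResult]) -> str:
    """Serialize results to a JSON array, using "" for a missing description."""
    rows = []
    for result in results:
        row = asdict(result)
        row["description"] = row["description"] or ""
        rows.append(row)
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    sample = [SearchResult("Python", "https://www.python.org", None)]
    print(results_to_json(sample))
```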

## API Reference

### BingScraper Class

```python
class BingScraper:
    def __init__(self, max_retries: int = 3, delay_between_requests: float = 1.0):
        """
        Initialize the BingScraper.
  
        Args:
            max_retries: Maximum number of retry attempts for failed requests
            delay_between_requests: Minimum delay between requests in seconds
        """
        pass

    def search(self, query: str, num_results: int = 10) -> List[SearchResult]:
        """
        Perform a Bing search and return results.
  
        Args:
            query: Search query string
            num_results: Maximum number of results to return
  
        Returns:
            List of SearchResult objects
  
        Raises:
            ValueError: If query is empty
            ConnectionError: If network connection fails
            RuntimeError: If parsing fails
        """
        pass
```

### SearchResult Class

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchResult:
    title: str                    # Title of the search result
    url: str                      # Cleaned URL
    description: Optional[str]    # Description (may be None)
```
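
This README does not spell out what a "cleaned" URL means. As one plausible interpretation, the sketch below validates the scheme and host, drops the fragment, and removes tracking parameters. The `clean_url` function and the `TRACKING_PREFIXES` tuple are assumptions made for illustration and may differ from what the package does.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed for illustration: query keys with these prefixes count as tracking.
TRACKING_PREFIXES = ("utm_",)


def clean_url(raw: str) -> Optional[str]:
    """Return a normalized http(s) URL, or None if the input is not one."""
    parts = urlsplit(raw.strip())
    # Validation: require an http/https scheme and a host.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    # Cleaning: keep only query parameters that are not tracking keys.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # Rebuild the URL without the fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))
```

Returning `None` for invalid input lets a caller skip a result instead of storing an unusable link.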

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## Running Tests

```bash
# Install development dependencies
pip install -e ".[dev]"

# Run tests
python -m pytest tests/
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Beautiful Soup 4 for HTML parsing
- Requests library for HTTP requests
- Python typing for type hints

## Support

If you encounter any issues or have questions, please file an issue on the GitHub repository.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "scrape-bing",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "bing, search, scraper, web scraping",
    "author": null,
    "author_email": "Affan Shaikhsurab <affanshaikhsurabofficial@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/0b/25/5073aaef0fb5a2d2ac4d7f7b320d458dccc95c912ee0c89375fa9e32cbec/scrape_bing-0.1.2.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License\r\n        \r\n        Copyright (c) 2025 Affan Shaikhsurab\r\n        \r\n        Permission is hereby granted, free of charge, to any person obtaining a copy\r\n        of this software and associated documentation files (the \"Software\"), to deal\r\n        in the Software without restriction, including without limitation the rights\r\n        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\r\n        copies of the Software, and to permit persons to whom the Software is\r\n        furnished to do so, subject to the following conditions:\r\n        \r\n        The above copyright notice and this permission notice shall be included in all\r\n        copies or substantial portions of the Software.\r\n        \r\n        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\r\n        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\r\n        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\r\n        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\r\n        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\r\n        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\r\n        SOFTWARE.",
    "summary": "A Python package for scraping Bing search results",
    "version": "0.1.2.1",
    "project_urls": {
        "Homepage": "https://github.com/yourusername/scrape-bing",
        "Repository": "https://github.com/affanshaikhsurab/scrape-bing.git"
    },
    "split_keywords": [
        "bing",
        " search",
        " scraper",
        " web scraping"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d13edc49dc5734712b2dab9d031af65df70c343d8937d4a9d6612223013605ec",
                "md5": "239360586f78610481285529839cf0b4",
                "sha256": "f57a81f22f288fbe7d4aca095cab43f9969277f50e4c6bafaa1375ee0cac75f5"
            },
            "downloads": -1,
            "filename": "scrape_bing-0.1.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "239360586f78610481285529839cf0b4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 6971,
            "upload_time": "2025-01-28T19:09:31",
            "upload_time_iso_8601": "2025-01-28T19:09:31.750810Z",
            "url": "https://files.pythonhosted.org/packages/d1/3e/dc49dc5734712b2dab9d031af65df70c343d8937d4a9d6612223013605ec/scrape_bing-0.1.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0b255073aaef0fb5a2d2ac4d7f7b320d458dccc95c912ee0c89375fa9e32cbec",
                "md5": "57bab0b6feed55f7b32a9ec179a96087",
                "sha256": "ef182d24050c11f5322b1b705b2f5c6918c6edb248ce80dc74e3ac6381f19217"
            },
            "downloads": -1,
            "filename": "scrape_bing-0.1.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "57bab0b6feed55f7b32a9ec179a96087",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 5907,
            "upload_time": "2025-01-28T19:09:33",
            "upload_time_iso_8601": "2025-01-28T19:09:33.033475Z",
            "url": "https://files.pythonhosted.org/packages/0b/25/5073aaef0fb5a2d2ac4d7f7b320d458dccc95c912ee0c89375fa9e32cbec/scrape_bing-0.1.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-28 19:09:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yourusername",
    "github_project": "scrape-bing",
    "github_not_found": true,
    "lcname": "scrape-bing"
}
        