scrapy-ua-rotator


- Name: scrapy-ua-rotator
- Version: 1.0.1
- Home page: https://github.com/geeone/scrapy-ua-rotator
- Summary: Flexible and modern User-Agent rotator middleware for Scrapy, supporting Faker, fake-useragent, and custom providers.
- Upload time: 2025-07-13 07:08:20
- Author: Sergei Denisenko
- Requires Python: >=3.9
- License: MIT
- Keywords: scrapy, user-agent, middleware, rotation, fake-useragent, faker, providers, proxy, web-scraping
- Requirements: Faker, fake-useragent
            

# scrapy-ua-rotator

[![PyPI](https://img.shields.io/pypi/v/scrapy-ua-rotator)](https://pypi.org/project/scrapy-ua-rotator/)
[![Python](https://img.shields.io/badge/Python-3.9%20|%203.10%20|%203.11%20|%203.12%20|%203.13-blue)](https://pypi.org/project/scrapy-ua-rotator/)
[![License](https://img.shields.io/github/license/geeone/scrapy-ua-rotator)](LICENSE)
[![Build Status](https://github.com/geeone/scrapy-ua-rotator/actions/workflows/build.yml/badge.svg)](https://github.com/geeone/scrapy-ua-rotator/actions/workflows/build.yml)
[![Codecov](https://codecov.io/gh/geeone/scrapy-ua-rotator/branch/main/graph/badge.svg)](https://codecov.io/gh/geeone/scrapy-ua-rotator)

A modern, pluggable User-Agent rotator middleware for the Scrapy framework.

Supports rotation via:
- [`fake-useragent`](https://pypi.org/project/fake-useragent/)
- [`Faker`](https://faker.readthedocs.io/en/stable/providers/faker.providers.user_agent.html)
- Scrapy’s built-in `USER_AGENT` setting

Also supports per-proxy rotation and easy extensibility with custom providers.

---

## 📋 Requirements

- Python 3.9+
- `Faker >= 18.0.0`
- `fake-useragent >= 1.5.0`

> ✅ **Tested with**: Scrapy 2.9, 2.10, 2.11, and 2.12  

---

## 📦 Installation

```bash
pip install scrapy-ua-rotator
```

---

## ⚙️ Configuration

Disable Scrapy’s built-in `UserAgentMiddleware` and `RetryMiddleware`, then enable the replacements from this package:

```python
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'scrapy_ua_rotator.middleware.RandomUserAgentMiddleware': 400,
    'scrapy_ua_rotator.middleware.RetryUserAgentMiddleware': 550,
}
```

Recommended provider order:

```python
USERAGENT_PROVIDERS = [
    'scrapy_ua_rotator.providers.FakeUserAgentProvider',  # Primary provider using the fake-useragent library
    'scrapy_ua_rotator.providers.FakerProvider',          # Fallback provider that generates synthetic UAs via Faker
    'scrapy_ua_rotator.providers.FixedUserAgentProvider', # Final fallback: uses the static USER_AGENT setting
]

# Static user-agent string to be used if all providers fail to return a valid value
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)..."
```
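
The ordering matters because the list is described as a chain of fallbacks. The sketch below illustrates that idea only: it is not the package's implementation, and the three provider functions and `first_available_ua` are hypothetical stand-ins invented for this example.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-ins for the three providers listed above.
# Each returns a User-Agent string, or None when it cannot produce one.
def fake_useragent_provider() -> Optional[str]:
    return None  # simulate a failure, e.g. no data matched the filters

def faker_provider() -> Optional[str]:
    return "Mozilla/5.0 (synthetic; generated)"

def fixed_provider() -> Optional[str]:
    return "Mozilla/5.0 (X11; Linux x86_64)"

def first_available_ua(
    providers: Sequence[Callable[[], Optional[str]]],
) -> Optional[str]:
    """Try providers in list order and return the first non-empty result."""
    for provider in providers:
        try:
            ua = provider()
        except Exception:
            continue  # a failing provider simply hands over to the next one
        if ua:
            return ua
    return None

chosen = first_available_ua(
    [fake_useragent_provider, faker_provider, fixed_provider]
)
print(chosen)
```

Here the first stand-in fails, so the second one supplies the value and the fixed string is never consulted.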

---

## 🧩 Provider Details

### FakeUserAgentProvider

Assigns a new user-agent using [`fake-useragent`](https://github.com/fake-useragent/fake-useragent).  
Supports fine-tuned filtering via:

```python
FAKE_USERAGENT_UA_TYPE = 'Chrome Mobile iOS'           # str; browser to prioritize (default: 'random')
FAKE_USERAGENT_OS = ['Linux']                          # str or list[str]; OS filter (default: None — all OSes)
FAKE_USERAGENT_PLATFORMS = ['mobile']                  # str or list[str]; platform filter (default: None — all platforms)
FAKE_USERAGENT_FALLBACK = 'Mozilla/5.0 (...)'          # str; fallback UA string (default: internal fallback)
```

> 💡 **Note:** See [docs](https://github.com/fake-useragent/fake-useragent/blob/main/README.md) for supported options and advanced usage.

### FakerProvider

Uses [`Faker`](https://faker.readthedocs.io/en/stable/providers/faker.providers.user_agent.html) to generate synthetic UA strings.

```python
FAKER_UA_TYPE = 'chrome'  # or 'firefox', 'safari', etc. (default: 'user_agent' — random web browser)
```

> 💡 **Note:** See [docs](https://faker.readthedocs.io/en/stable/providers/faker.providers.user_agent.html) for supported options and advanced usage.

### FixedUserAgentProvider

Simply uses the provided `USER_AGENT` setting without rotation.  
Useful as a fallback if other providers fail.

```python
USER_AGENT = "Mozilla/5.0 ..."
```

---

## 🔀 Proxy-Aware Mode

If you’re using rotating proxies (e.g., via `scrapy-proxies`), enable per-proxy UA assignment:

```python
RANDOM_UA_PER_PROXY = True
```

Make sure `RandomUserAgentMiddleware` has a larger order number than your proxy middleware, so that the proxy is already set on the request by the time the User-Agent is chosen.
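
Conceptually, per-proxy assignment means remembering which User-Agent was handed to each proxy and reusing it. The following is a minimal sketch of that idea, not the middleware's actual code: `PerProxyUACache`, `UA_POOL`, and the placeholder strings are all assumptions made for illustration.

```python
import random
from typing import Callable, Dict, Optional

# Placeholder strings, not real browser User-Agents.
UA_POOL = ["example-agent/1.0", "example-agent/2.0", "example-agent/3.0"]

class PerProxyUACache:
    """Hypothetical helper: one User-Agent per proxy, chosen on first use."""

    def __init__(self, pick_ua: Callable[[], str]) -> None:
        self._pick_ua = pick_ua
        self._by_proxy: Dict[str, str] = {}

    def ua_for(self, proxy: Optional[str]) -> str:
        if proxy is None:
            # No proxy on the request: pick a fresh UA every time.
            return self._pick_ua()
        if proxy not in self._by_proxy:
            self._by_proxy[proxy] = self._pick_ua()
        return self._by_proxy[proxy]

cache = PerProxyUACache(lambda: random.choice(UA_POOL))
first = cache.ua_for("http://10.0.0.1:8080")
second = cache.ua_for("http://10.0.0.1:8080")
print(first == second)  # the same proxy keeps the same UA
```

A lookup like this only works if the proxy is known when the User-Agent is picked, which is why the relative order of the two middlewares matters.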

---

## 🧪 Example Output

To verify it’s working, log your request headers in your spider:

```python
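def parse(self, response):
    # Scrapy stores header values as bytes, so decode before logging.
    ua = response.request.headers.get('User-Agent', b'').decode('utf-8', 'replace')
    self.logger.info("Using UA: %s", ua)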
```

---

## 🔧 Extending with Custom Providers

Add your own class:

```python
USERAGENT_PROVIDERS = [
    'your_project.providers.MyCustomProvider',
    ...
]
```

Just inherit from `BaseProvider` and implement `get_random_ua()`.
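
As a rough sketch of what such a provider could look like, the example below picks a random entry from a list in the settings. Several details are assumptions rather than documented behaviour: the local `BaseProvider` is a stand-in so the snippet runs on its own (in a real project, import the package's `BaseProvider` instead), the constructor is assumed to receive the settings object, `MY_UA_LIST` is a hypothetical setting name, and returning `None` is assumed to signal that no value is available.

```python
import random
from typing import Optional

class BaseProvider:
    """Local stand-in for the package's BaseProvider (assumed interface)."""

    def __init__(self, settings) -> None:
        self.settings = settings

    def get_random_ua(self) -> Optional[str]:
        raise NotImplementedError

class MyCustomProvider(BaseProvider):
    """Picks a random User-Agent from a list defined in the settings."""

    def __init__(self, settings) -> None:
        super().__init__(settings)
        # MY_UA_LIST is a hypothetical setting name used only in this sketch.
        self._pool = list(settings.get("MY_UA_LIST", []))

    def get_random_ua(self) -> Optional[str]:
        if not self._pool:
            return None  # nothing configured
        return random.choice(self._pool)

# A plain dict stands in for Scrapy's settings object here.
provider = MyCustomProvider({"MY_UA_LIST": ["example-agent/1.0", "example-agent/2.0"]})
print(provider.get_random_ua())
```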

---

## 🤝 Contributing

Contributions, suggestions, and issues are welcome!  
Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

---

## 📄 License

MIT © [Sergei Denisenko](https://github.com/geeone)  
See [LICENSE](https://github.com/geeone/scrapy-ua-rotator/blob/main/LICENSE)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/geeone/scrapy-ua-rotator",
    "name": "scrapy-ua-rotator",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "scrapy user-agent middleware rotation fake-useragent faker providers proxy web-scraping",
    "author": "Sergei Denisenko",
    "author_email": "sergei.denisenko@ieee.org",
    "download_url": "https://files.pythonhosted.org/packages/cc/bb/03b781cbf4705d2a43f3a2f58ee079b1dd8ac9c593a60929b46c25ded649/scrapy_ua_rotator-1.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Flexible and modern User-Agent rotator middleware for Scrapy, supporting Faker, fake-useragent, and custom providers.",
    "version": "1.0.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/geeone/scrapy-ua-rotator/issues",
        "Documentation": "https://github.com/geeone/scrapy-ua-rotator",
        "Homepage": "https://github.com/geeone/scrapy-ua-rotator",
        "Source": "https://github.com/geeone/scrapy-ua-rotator"
    },
    "split_keywords": [
        "scrapy",
        "user-agent",
        "middleware",
        "rotation",
        "fake-useragent",
        "faker",
        "providers",
        "proxy",
        "web-scraping"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8ad58c33068c44c8ca3c3d8d0f9c6c9f813c0bcc5a6b79d62ad665e5f2eb42c4",
                "md5": "c384136e69926f38a502b2b713c79054",
                "sha256": "ff985f2b92736ecc3af0769bb276a51576738648a5bc8e700d697c06c30b7a49"
            },
            "downloads": -1,
            "filename": "scrapy_ua_rotator-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c384136e69926f38a502b2b713c79054",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 7062,
            "upload_time": "2025-07-13T07:08:19",
            "upload_time_iso_8601": "2025-07-13T07:08:19.077036Z",
            "url": "https://files.pythonhosted.org/packages/8a/d5/8c33068c44c8ca3c3d8d0f9c6c9f813c0bcc5a6b79d62ad665e5f2eb42c4/scrapy_ua_rotator-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ccbb03b781cbf4705d2a43f3a2f58ee079b1dd8ac9c593a60929b46c25ded649",
                "md5": "87daa2ed052cb8989ba0c9a72bb4c6a0",
                "sha256": "32701c60190bc565be3cfcd2cb439cc40c15a7a6833165160e7e4ba90fab749f"
            },
            "downloads": -1,
            "filename": "scrapy_ua_rotator-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "87daa2ed052cb8989ba0c9a72bb4c6a0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 11455,
            "upload_time": "2025-07-13T07:08:20",
            "upload_time_iso_8601": "2025-07-13T07:08:20.151211Z",
            "url": "https://files.pythonhosted.org/packages/cc/bb/03b781cbf4705d2a43f3a2f58ee079b1dd8ac9c593a60929b46c25ded649/scrapy_ua_rotator-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-13 07:08:20",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "geeone",
    "github_project": "scrapy-ua-rotator",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "requirements": [
        {
            "name": "Faker",
            "specs": [
                [
                    ">=",
                    "36.0.0"
                ]
            ]
        },
        {
            "name": "fake-useragent",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        }
    ],
    "lcname": "scrapy-ua-rotator"
}
        