UserAgentFilter


- Name: UserAgentFilter
- Version: 1.0.0
- Home page: https://github.com/ambilynanjilath/UserAgentFilter.git
- Summary: A package for testing user agents on specific websites
- Upload time: 2024-07-26 11:37:06
- Author: Ambily Biju & Shahana Farvin
- Requires Python: >=3.7
- Keywords: user agent testing, web scraping, requests

# UserAgentFilter
![UserAgentFilter Logo](https://github.com/ambilynanjilath/UserAgentFilter/blob/main/logo.png)

**UserAgentFilter** is a Python package for testing user agents against a specific website. It sends requests with each user agent in a list and separates those that work from those that fail, so you can see which user agents are usable for web scraping or automated testing on that site.

## Key Features
- Tests a list of user agents against a specified website.
- Supports optional proxy configuration.
- Handles errors and retries for transient issues.
- Adds random delays between requests to mimic human browsing behavior.
- Outputs results in a text file for easy review.

## Prerequisites

- Python 3.7 or higher
- `requests` library
- `beautifulsoup4` library

## Installation

You can install **UserAgentFilter** via pip. Run the following command:

```
pip install useragentfilter
```
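
To confirm that the installation succeeded, one option is to ask the standard library for the installed version. This is a minimal sketch, assuming Python 3.8 or later (where `importlib.metadata` is available); the helper name `installed_version` is made up for illustration and is not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("UserAgentFilter")
    print(found if found else "UserAgentFilter is not installed")
```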

## Usage

To use **UserAgentFilter**, follow these steps:

1. Import the Package
First, import the **UserAgentTester** class from the package.

```
from UserAgentFilter import UserAgentTester
```

2. Initialize the UserAgentTester
Create an instance of the UserAgentTester class. You need to specify the URL of the website you want to test the user agents against. Optionally, you can provide proxy settings, a timeout period, the number of retries, and a range for random delays between requests to mimic human behavior.

```
tester = UserAgentTester(
    test_url='https://www.example.com',  # The URL to test user agents against
    proxy={'http': 'http://your_proxy:port', 'https': 'https://your_proxy:port'},  # Optional proxy settings
    timeout=10,  # Timeout for each request in seconds
    max_retries=3,  # Number of retries for each request
    delay_range=(3, 8)  # Random delay range between requests in seconds
)
```

3. Prepare a List of User Agents
Prepare a text file containing a list of user agents, with each user agent on a new line. For example, save the following content to tests/user_agents.txt:

```
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 Edg/91.0.864.64
```
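
If the user agents live in a Python list rather than a hand-edited file, they can be written out in the same one-per-line layout. The sketch below is an illustration only: `write_user_agents` and `read_user_agents` are hypothetical helpers, and skipping blank entries is an assumption, not documented package behavior.

```python
import tempfile
from pathlib import Path


def write_user_agents(path, user_agents):
    """Write one user agent per line, skipping empty entries. Returns the count."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create e.g. tests/ if needed
    cleaned = [ua.strip() for ua in user_agents if ua.strip()]
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)


def read_user_agents(path):
    """Read the file back as a list, ignoring blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    agents = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0",
    ]
    # A temporary directory keeps the demo from touching real project files.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp, "tests", "user_agents.txt")
        count = write_user_agents(target, agents)
        print(f"Wrote {count} user agent(s): {read_user_agents(target)}")
```
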
4. Filter User Agents
Call the filter_user_agents method to filter the user agents. This method takes two arguments: the path to the input file containing user agents and the path to the output file where the filtered user agents will be saved.

```
tester.filter_user_agents(
    user_agents_file='tests/user_agents.txt',  # Path to the input file with user agents
    output_file='filtered_user_agents.txt'  # Path to the output file to save the filtered user agents
)
```
5. Review the Results
After the filtering process is complete, the successful user agents will be saved to the specified output file (filtered_user_agents.txt). You can review this file to see which user agents passed the test.
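
To see which user agents were rejected, the input and output files can be compared. This sketch assumes the output file lists one passing user agent per line, written exactly as it appeared in the input; `summarize_results` and `load_lines` are hypothetical helpers, not part of the package.

```python
import tempfile
from pathlib import Path


def load_lines(path):
    """Return the non-blank, stripped lines of a text file."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def summarize_results(input_file, output_file):
    """Compare the tested user agents with those that passed."""
    tested = load_lines(input_file)
    passed = set(load_lines(output_file))
    failed = [ua for ua in tested if ua not in passed]
    return {
        "tested": len(tested),
        "passed": len(tested) - len(failed),
        "failed": failed,
    }


if __name__ == "__main__":
    # Stand-in files so the sketch runs without the package installed.
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp, "user_agents.txt")
        result = Path(tmp, "filtered_user_agents.txt")
        source.write_text("AgentA/1.0\nAgentB/2.0\n", encoding="utf-8")
        result.write_text("AgentA/1.0\n", encoding="utf-8")
        summary = summarize_results(source, result)
        print(f"{summary['passed']} of {summary['tested']} user agents passed")
        for ua in summary["failed"]:
            print("failed:", ua)
```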

## Example Workflow
Here’s a complete example of the entire workflow:

```
from UserAgentFilter import UserAgentTester

# Define the target URL and proxy settings (if needed)
test_url = 'https://www.example.com'
proxy = {'http': 'http://your_proxy:port', 'https': 'https://your_proxy:port'}

# Create an instance of UserAgentTester
tester = UserAgentTester(
    test_url=test_url,
    proxy=proxy,
    timeout=10,
    max_retries=3,
    delay_range=(3, 8)
)

# Filter user agents from the input file and save the successful ones to the output file
tester.filter_user_agents(
    user_agents_file='tests/user_agents.txt',
    output_file='filtered_user_agents.txt'
)

print("User agents have been filtered and saved to 'filtered_user_agents.txt'")
```
### Additional Tips
- **Error Handling**: The UserAgentTester handles various errors such as connection timeouts and HTTP errors. It retries requests up to the specified max_retries before giving up on a user agent.
- **Random Delays**: The delay_range parameter introduces random delays between requests to help mimic human browsing behavior, which can help avoid detection when testing multiple user agents.
- **Proxy Configuration**: If you need to use a proxy, make sure to provide the correct proxy settings in the proxy dictionary. The dictionary should include keys for http and https proxies.
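
The retry and delay behavior described in these tips can be pictured with a small standalone sketch. It is not the package's implementation: `check_user_agent` is a hypothetical function, `fetch` stands in for whatever performs the HTTP request, and `max_retries` is treated here as the total number of attempts, which is an assumption.

```python
import random
import time


def check_user_agent(fetch, user_agent, max_retries=3, delay_range=(3, 8), sleep=time.sleep):
    """Return True if fetch(user_agent) yields HTTP 200 within max_retries attempts.

    fetch is any callable that returns a status code or raises OSError
    (the base class of connection and timeout errors) on failure.
    """
    for attempt in range(1, max_retries + 1):
        try:
            if fetch(user_agent) == 200:
                return True
        except OSError:
            pass  # treat as transient and fall through to the retry
        if attempt < max_retries:
            # Wait a random interval before the next attempt.
            sleep(random.uniform(*delay_range))
    return False


if __name__ == "__main__":
    attempts = []

    def flaky_fetch(user_agent):
        """Fake request: fails twice, then succeeds."""
        attempts.append(user_agent)
        if len(attempts) < 3:
            raise ConnectionError("simulated transient failure")
        return 200

    # sleep is replaced with a no-op so the demo finishes immediately.
    ok = check_user_agent(flaky_fetch, "ExampleAgent/1.0", sleep=lambda seconds: None)
    print("passed" if ok else "failed", "after", len(attempts), "attempts")
```

Passing `fetch` and `sleep` as parameters keeps the retry logic testable without network access or real waiting.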

## Configuration Options

- test_url: The URL of the website to test user agents against.
- proxy: A dictionary containing proxy settings (optional). A proxy is especially useful if requests fail with a 403 Forbidden error.
- timeout: The maximum time to wait for a response, in seconds. The default is 10.
- max_retries: The number of times to retry a request after a transient error. The default is 3.
- delay_range: A tuple giving the range, in seconds, for random delays between requests. The default is (3, 8).
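
One way to keep these options in one place and catch obvious mistakes early is a small settings object. This is a sketch, not part of the package: `TesterConfig` is a hypothetical class, the validation rules are assumptions, and `None` as the default for `proxy` is a guess based on the option being described as optional.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TesterConfig:
    """Holds the options listed above, using the documented defaults."""

    test_url: str
    proxy: Optional[Dict[str, str]] = None  # assumed default; documented only as optional
    timeout: float = 10
    max_retries: int = 3
    delay_range: Tuple[float, float] = (3, 8)

    def __post_init__(self):
        if not self.test_url.startswith(("http://", "https://")):
            raise ValueError("test_url must start with http:// or https://")
        if self.timeout <= 0:
            raise ValueError("timeout must be positive")
        if self.max_retries < 1:
            raise ValueError("max_retries must be at least 1")
        low, high = self.delay_range
        if low < 0 or low > high:
            raise ValueError("delay_range must be (low, high) with 0 <= low <= high")

    def as_kwargs(self):
        """Return the options as a dict keyed by the documented parameter names."""
        return {
            "test_url": self.test_url,
            "proxy": self.proxy,
            "timeout": self.timeout,
            "max_retries": self.max_retries,
            "delay_range": self.delay_range,
        }


if __name__ == "__main__":
    config = TesterConfig(test_url="https://www.example.com", timeout=5)
    print(config.as_kwargs())
```

If the constructor accepts these keyword names as shown in the usage section, the dictionary could be passed along as `UserAgentTester(**config.as_kwargs())`.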

## Contributing

Contributions are welcome! If you would like to contribute to **UserAgentFilter**, please follow these steps:

- Fork the repository.
- Create a new branch for your feature or bugfix.
- Commit your changes.
- Push your branch and create a pull request.

## License
**UserAgentFilter** is licensed under the MIT License. See the LICENSE file for more information.

## Contact
If you have any questions, suggestions, or issues, please feel free to contact us at shahana50997@gmail.com or ambilybiju2408@gmail.com.






            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/ambilynanjilath/UserAgentFilter.git",
    "name": "UserAgentFilter",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "user agent testing, web scraping, requests",
    "author": "Ambily Biju & Shahana Farvin",
    "author_email": "ambilybiju2408@gmail.com,shahana50997@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/8e/97/921eef0a6c63eb0f7a69f76533b02531e34120fcd91e7fe0081ff028a555/UserAgentFilter-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A package for testing user agents on specific websites",
    "version": "1.0.0",
    "project_urls": {
        "Documentation": "https://github.com/ambilynanjilath/UserAgentFilter/blob/main/README.md",
        "Homepage": "https://github.com/ambilynanjilath/UserAgentFilter.git",
        "Source": "https://github.com/ambilynanjilath/UserAgentFilter",
        "Tracker": "https://github.com/ambilynanjilath/UserAgentFilter/issues"
    },
    "split_keywords": [
        "user agent testing",
        " web scraping",
        " requests"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "498503392e9a36d85c0e3977f2cc6e655983aa84dc3944ec02464e47c5c15d87",
                "md5": "9c34cfab0804612db0cc58070f764195",
                "sha256": "e5ef230a5c0c66787c335bb87981e450cd9e4a28250794aca55878129f5cb713"
            },
            "downloads": -1,
            "filename": "UserAgentFilter-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9c34cfab0804612db0cc58070f764195",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 10075,
            "upload_time": "2024-07-26T11:37:05",
            "upload_time_iso_8601": "2024-07-26T11:37:05.094327Z",
            "url": "https://files.pythonhosted.org/packages/49/85/03392e9a36d85c0e3977f2cc6e655983aa84dc3944ec02464e47c5c15d87/UserAgentFilter-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8e97921eef0a6c63eb0f7a69f76533b02531e34120fcd91e7fe0081ff028a555",
                "md5": "7a878a6e9f52e8cbe66209ea0d77ebf3",
                "sha256": "b3d791b0ed117955a2558689f75ab7d47262c8480f7462c4c1c242d18eec08ff"
            },
            "downloads": -1,
            "filename": "UserAgentFilter-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "7a878a6e9f52e8cbe66209ea0d77ebf3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 11484,
            "upload_time": "2024-07-26T11:37:06",
            "upload_time_iso_8601": "2024-07-26T11:37:06.773926Z",
            "url": "https://files.pythonhosted.org/packages/8e/97/921eef0a6c63eb0f7a69f76533b02531e34120fcd91e7fe0081ff028a555/UserAgentFilter-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-26 11:37:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ambilynanjilath",
    "github_project": "UserAgentFilter",
    "github_not_found": true,
    "lcname": "useragentfilter"
}
        