# Torsel: Tor and Selenium Automation
**Torsel** is a Python module designed to manage multiple Tor instances and automate web tasks using Selenium. It is particularly useful for web automation and web scraping tasks that require IP rotation to enhance anonymity.
## Disclaimer
This project is currently under development and subject to ongoing updates and enhancements. Please note that features and functionality may change as the project evolves.
It hasn't been tested on macOS, so any feedback is welcome! If you're interested in collaborating, check out the [Contributing](#contributing) section below <img src="https://images.emojiterra.com/google/noto-emoji/unicode-15.1/color/svg/1f447.svg" alt="emoji pointing down" width="20"/>
## Key Features
- **Cross-Platform Support**: Targets Linux, Windows, and macOS (macOS is still untested; see the Disclaimer above).
- **Automated IP Rotation**: Seamlessly rotate IP addresses using multiple Tor instances.
- **Web Scraping and Automation**: Ideal for tasks that require anonymity.
- **Easy Configuration**: Automatically sets up, configures, and manages Tor instances.
- **Integration with Selenium**: Run your Selenium scripts with the added anonymity of Tor.
- **Flexible and Advanced Cookie Management**: Load and manage custom cookies across multiple instances with support for both simple and advanced mapping configurations.
- **Bypassing IP-Based Restrictions**: Torsel can help bypass some IP-based restrictions by rotating IP addresses through Tor nodes.
## Considerations
- **Tor Exit Node Blocking**: Be aware that some websites actively block traffic from Tor exit nodes, which may limit the effectiveness of this approach.
- **Cookie Loading Limitations**: Some sites have restrictions that prevent cookies from being loaded successfully, so cookie loading will not always work.
## Installation
You can install Torsel directly from PyPI:
```
pip install torsel
```
## Prerequisites
### Linux
On Linux, make sure Tor and Chromium are installed. On distributions that use `apt`, the following command installs both:
```
sudo apt install tor chromium
```
### Windows
You need the Tor binary available locally so that you can pass its path to the Torsel object through `tor_path`.<br>You can download the expert bundle, which contains the Tor binaries, [here](https://www.torproject.org/download/tor/).
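If you would rather not hard-code the location, a small helper can look for the binary before you construct the Torsel object. The sketch below is only an illustration and is not part of Torsel: `resolve_tor_path` is a hypothetical helper name, and it relies solely on the standard library.

```python
import os
import shutil
from typing import Optional


def resolve_tor_path(explicit: Optional[str] = None) -> Optional[str]:
    """Return a path to the Tor binary, or None if nothing is found.

    Hypothetical helper for illustration; Torsel itself does not provide it.
    """
    # An explicit path (for example, into an unpacked expert bundle)
    # takes priority when it points at an existing file.
    if explicit and os.path.isfile(explicit):
        return explicit
    # Otherwise search the directories listed in PATH.
    return shutil.which("tor")


tor_path = resolve_tor_path()
print(tor_path or "Tor not found; pass the path to the binary explicitly")
```

The returned value can then be passed as `tor_path` when creating the Torsel object.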
## Usage
### Simple example
This simple example scrapes the IP address 10 times, demonstrating IP rotation using Tor:
```python
from torsel import Torsel

# Selenium function to invoke in the Torsel object
def collect_ip(driver, wait, EC, By):
    driver.get("http://icanhazip.com")
    # Wait until the page body contains a dot, i.e. an IPv4 address has loaded
    wait.until(EC.text_to_be_present_in_element((By.TAG_NAME, "body"), "."))
    ip_address = driver.find_element(By.TAG_NAME, "body").text.strip()
    print(f"[+] Current Tor IP: {ip_address}")

# Torsel object
torsel = Torsel(headless=True,                     # run the browser in headless mode
                tor_path="/usr/bin/tor",           # path to the Tor executable
                tor_data_dir="/tmp/tor_profiles")  # directory for Tor profile data
torsel.run(10, collect_ip)  # run collect_ip 10 times
```
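To judge how well the rotation worked, you could append each address to a list inside `collect_ip` instead of only printing it, and summarize the list afterwards. The following is a minimal sketch of that summary step; `summarize_ips` is a hypothetical helper, and the sample addresses are made-up values from the documentation-only IP ranges, not real Tor exit nodes.

```python
from collections import Counter


def summarize_ips(ips):
    """Count how often each IP address appears in a list of observations."""
    counts = Counter(ips)
    return {
        "total": len(ips),          # number of requests observed
        "unique": len(counts),      # number of distinct addresses
        "most_common": counts.most_common(3),
    }


# Made-up sample data standing in for addresses collected by collect_ip
sample = ["192.0.2.10", "198.51.100.7", "192.0.2.10"]
summary = summarize_ips(sample)
print(f"{summary['unique']} unique IPs in {summary['total']} requests")
```

A low ratio of unique to total addresses suggests that several actions went out through the same exit node.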
For detailed examples on how to use Torsel, please refer to the [examples directory](https://github.com/azuk4r/torsel/tree/main/examples).
### List of examples:
* [Detailed simple example (Single thread IP rotation)](https://github.com/azuk4r/torsel/blob/main/examples/simple_ip_rotation.py)
* [Verify Tor IP rotation with multithreading](https://github.com/azuk4r/torsel/blob/main/examples/multithread_ip_rotation.py)
* [Script to analyze the frequency of IP usage](https://github.com/azuk4r/torsel/blob/main/examples/tor_ip_usage_analyzer.py)
* [Simple session cookie loading with a single instance and single URL](https://github.com/azuk4r/torsel/blob/main/examples/loading_cookies/simple_one_url_one_instance.py)
* [Simple session cookie loading with multiple instances across the same URL](https://github.com/azuk4r/torsel/blob/main/examples/loading_cookies/simple_one_url_multi_instance.py)
* [Load and verify session cookies for two different URLs with a single instance](https://github.com/azuk4r/torsel/blob/main/examples/loading_cookies/advanced_mapping_two_url_one_instance.py)
* [Load and verify different session cookies for two URLs across multiple instances](https://github.com/azuk4r/torsel/blob/main/examples/loading_cookies/advanced_mapping_two_url_multi_instance.py)
### Advanced Configuration
Torsel is highly configurable to suit various use cases:
* **total_instances**: Number of Tor instances to create.
* **max_threads**: Maximum number of concurrent threads.
* **tor_base_port**: Starting port for Tor SOCKS connections.
* **tor_control_base_port**: Starting port for Tor control connections.
* **tor_path**: Path to the Tor executable.
* **tor_data_dir**: Directory to store Tor profile data.
* **user_agent**: User-Agent string to use; if `None`, a random one is selected.
* **headless**: Run Selenium in headless mode if `True`.
* **verbose**: Enable detailed logging if `True`.
* **cookies_dir**: Directory to store and load cookies (optional).
* **cookies_mapping**: A mapping of URLs to specific cookie files, allowing for advanced session management across multiple instances (optional).
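Because `tor_base_port` and `tor_control_base_port` are both starting ports, it helps to check that the two ranges cannot collide before launching many instances. The sketch below assumes that each instance takes the next consecutive port after its base; that allocation scheme, the `plan_ports` helper, and the port numbers are assumptions made for illustration, not documented Torsel behavior.

```python
def plan_ports(total_instances, tor_base_port, tor_control_base_port):
    """Return (socks_port, control_port) pairs, one per instance.

    Assumes consecutive allocation from each base port (an assumption,
    not confirmed Torsel behavior) and rejects overlapping ranges.
    """
    socks = range(tor_base_port, tor_base_port + total_instances)
    control = range(tor_control_base_port,
                    tor_control_base_port + total_instances)
    overlap = sorted(set(socks) & set(control))
    if overlap:
        raise ValueError(f"SOCKS and control port ranges overlap at {overlap}")
    return list(zip(socks, control))


# Hypothetical base ports chosen far enough apart for three instances
for socks_port, control_port in plan_ports(3, 9050, 9150):
    print(f"SOCKS {socks_port} / control {control_port}")
```

Spacing the two base ports by at least `total_instances` keeps the ranges disjoint under this assumption.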
Torsel also supplies the following parameters automatically to the function you pass to it, so you only need to declare the ones you use:
* **driver**: Managed by Torsel and passed automatically to your function. No need to instantiate or manage it yourself.
* **wait**: An instance of WebDriverWait configured with a 10-second timeout, provided by Torsel.
* **By**: The `By` class from Selenium, used for locating elements (e.g., by ID or class name).
* **EC** (expected conditions): Selenium's `expected_conditions` module, used to define conditions such as element visibility or text presence.
* **action_num**: The number of the current action being executed, provided automatically by Torsel.
* **instance_num**: The instance number of the Tor connection in use, passed automatically to your function.
* **log**: A logging function provided by Torsel to output messages during execution.
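To make the idea of automatically passed parameters concrete, here is one common way a runner can hand a function only the arguments it declares, by inspecting its signature. This is a generic sketch and not Torsel's actual implementation: `call_with_available` is a hypothetical helper, and the placeholder strings stand in for the real Selenium objects.

```python
import inspect


def call_with_available(func, available):
    """Call func with only the named arguments it declares.

    Generic illustration of by-name injection; not taken from Torsel.
    """
    params = inspect.signature(func).parameters
    unknown = [name for name in params if name not in available]
    if unknown:
        raise TypeError(f"unsupported parameters: {unknown}")
    return func(**{name: available[name] for name in params})


def action(driver, log, instance_num):
    # Declares only three of the seven parameters listed above
    log(f"instance {instance_num} using {driver}")
    return instance_num


# Placeholder values standing in for the objects a runner would provide
available = {
    "driver": "<driver>", "wait": "<wait>", "By": "<By>", "EC": "<EC>",
    "action_num": 1, "instance_num": 0, "log": print,
}
call_with_available(action, available)
```

With this pattern the order of the declared parameters does not matter, which would match the simple example above declaring `EC` before `By`.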
## Contributing
Hey! <img src="https://images.emojiterra.com/google/noto-emoji/unicode-15.1/color/svg/1f44b.svg" alt="emoji waving hand" width="20"/> Any kind of contribution is welcome. Send a PR if you have improvements or usage examples to contribute!
## License
This project is licensed under the MIT License - see the [LICENSE](https://github.com/azuk4r/torsel/blob/main/LICENSE) file for details.
Raw data
{
"_id": null,
"home_page": "https://github.com/azuk4r/torsel",
"name": "torsel",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "tor selenium web scraping automation",
"author": "azuk4r",
"author_email": "azuk4r@tuta.io",
"download_url": "https://files.pythonhosted.org/packages/91/f9/1c0ce8538a890eb0b762fe790f2fa910a828e90ecc2465c5ac3c397c7f17/torsel-0.4.31.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A Python module for managing Tor instances with Selenium",
"version": "0.4.31",
"project_urls": {
"Homepage": "https://github.com/azuk4r/torsel"
},
"split_keywords": [
"tor",
"selenium",
"web",
"scraping",
"automation"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "f7b62eda3a73d979120f54d56d223ac68cdfd6fbf729318cb07dc4bbac6b496d",
"md5": "5a50977be87273229b1b20717d662911",
"sha256": "134595d1e2c2c5358f6708758ec6838bc6e5e0cd1d88ca4466f312c0e109062c"
},
"downloads": -1,
"filename": "torsel-0.4.31-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5a50977be87273229b1b20717d662911",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 10074,
"upload_time": "2024-09-08T19:20:45",
"upload_time_iso_8601": "2024-09-08T19:20:45.205991Z",
"url": "https://files.pythonhosted.org/packages/f7/b6/2eda3a73d979120f54d56d223ac68cdfd6fbf729318cb07dc4bbac6b496d/torsel-0.4.31-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "91f91c0ce8538a890eb0b762fe790f2fa910a828e90ecc2465c5ac3c397c7f17",
"md5": "c21a1ee2639ca23f4b8074e36251e01d",
"sha256": "be6f4bf518d0aa581a0e6fb77768e336242576aa0f0dcd58ec3b87d97d9e79e0"
},
"downloads": -1,
"filename": "torsel-0.4.31.tar.gz",
"has_sig": false,
"md5_digest": "c21a1ee2639ca23f4b8074e36251e01d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 11593,
"upload_time": "2024-09-08T19:20:46",
"upload_time_iso_8601": "2024-09-08T19:20:46.478141Z",
"url": "https://files.pythonhosted.org/packages/91/f9/1c0ce8538a890eb0b762fe790f2fa910a828e90ecc2465c5ac3c397c7f17/torsel-0.4.31.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-08 19:20:46",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "azuk4r",
"github_project": "torsel",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "setuptools",
"specs": [
[
"==",
"73.0.1"
]
]
},
{
"name": "selenium",
"specs": [
[
"==",
"4.23.1"
]
]
},
{
"name": "stem",
"specs": [
[
"==",
"1.8.2"
]
]
},
{
"name": "psutil",
"specs": [
[
"==",
"5.9.8"
]
]
}
],
"lcname": "torsel"
}