<p align="center"><img src=".github/images/logo.png" width="256" alt="PhishingWebCollector" title="PhishingWebCollector"/></p>
<h1 align="center">
⚔️ PhishingWebCollector: A Python Library for Phishing Website Collection ⚔️
</h1>
<p align="center">
<img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/phishing-web-collector.svg">
<img alt="PyPI - Downloads" src="https://img.shields.io/pypi/dm/phishing-web-collector.svg" href="https://pepy.tech/project/phishing-web-collector">
<a href="https://repology.org/project/python:phishing-web-collector/versions">
<img src="https://repology.org/badge/tiny-repos/python:phishing-web-collector.svg" alt="Packaging status">
</a>
<img alt="Downloads" src="https://pepy.tech/badge/phishing-web-collector">
<img alt="GitHub license" src="https://img.shields.io/github/license/damianfraszczak/phishing-web-collector.svg" href="https://github.com/damianfraszczak/phishing-web-collector/blob/master/LICENSE">
<img alt="Documentation Status" src="https://readthedocs.org/projects/phishing-web-collector/badge/?version=latest" href="https://phishing-web-collector.readthedocs.io/en/latest/?badge=latest">
</p>
<p align="center">
<a href="https://github.com/damianfraszczak/phishing-web-collector?tab=readme-ov-file#why-PhishingWebCollector">✨ Why PhishingWebCollector?</a>
<a href="https://github.com/damianfraszczak/phishing-web-collector?tab=readme-ov-file#features">📦 Features</a>
<a href="https://github.com/damianfraszczak/phishing-web-collector/blob/master/docs/files/QUICK_START.md">🚀 Quick Start</a>
<a href="https://phishing-web-collector.readthedocs.io/">📮 Documentation</a>
<a href="https://github.com/damianfraszczak/phishing-web-collector/blob/master/docs/files/jupyter">📓 Jupyter Notebook examples</a>
<a href="LICENSE">🔑 License</a>
</p>
## Overview
`PhishingWebCollector` is a Python library that integrates 20 phishing feeds into one solution and offers a platform for collecting and managing malicious website data.
It is suitable for practical cybersecurity applications, such as updating local blacklists, and for research, such as building phishing detection datasets.
It uses the `asyncio` module to collect data from multiple feeds concurrently.
Users can gather historical data from free feeds to construct extensive datasets without costly API subscriptions.
Its ease of use, scalability, and support for various data formats enhance the threat detection capabilities of cybersecurity teams and researchers while minimizing technical overhead.
* **Free software:** MIT license
* **Documentation:** https://phishing-web-collector.readthedocs.io/en/latest/
* **Python versions:** 3.9 | 3.10 | 3.11
* **Tested OS:** Windows, Ubuntu, Fedora, and CentOS. **It may also work on other systems, but they have not been tested.**
* **All-in-One Solution:** PhishingWebCollector is a single library for collecting a wide range of information about malicious websites.
* **Efficiency and Expertise:** Building a similar solution independently would be time-consuming and require specialized knowledge.
* **Open Source Advantage:** Publishing this tool as open source simplifies many studies and lets researchers and industry professionals focus on more advanced tasks.
* **Continuous Improvement:** New techniques will be added over time.
## Features
- Integration of 20 Different Sources: Reduces the need to maintain multiple integrations.
- Local Data Collection: Supports building and maintaining local phishing databases.
- Data Export: Allows exporting all collected data in a unified JSON format.
- Asynchronous Performance: Uses asyncio for faster, simultaneous data collection.
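
The last two features can be pictured with a small, library-independent sketch. It is not the library's internal implementation: `Entry`, `fetch_feed`, `collect_all`, and `to_unified_json` are hypothetical names, and the feeds are simulated with `asyncio.sleep` instead of real network requests. It only illustrates the general pattern of collecting several feeds concurrently with `asyncio.gather` and serializing the results into one JSON structure.

```python
import asyncio
import json
from dataclasses import asdict, dataclass


@dataclass
class Entry:
    """Hypothetical unified record; the library's real entry type may differ."""
    url: str
    source: str


async def fetch_feed(source: str, urls: list[str], delay: float) -> list[Entry]:
    # Stand-in for a real HTTP download: the sleep simulates network latency.
    await asyncio.sleep(delay)
    return [Entry(url=u, source=source) for u in urls]


async def collect_all() -> list[Entry]:
    # Made-up feeds; a real collector would download these lists.
    feeds = [
        ("feed_a", ["http://phish.example/login"], 0.05),
        ("feed_b", ["http://bad.example/pay", "http://bad.example/verify"], 0.05),
    ]
    # gather() runs all coroutines concurrently and keeps the input order.
    batches = await asyncio.gather(*(fetch_feed(*feed) for feed in feeds))
    return [entry for batch in batches for entry in batch]


def to_unified_json(entries: list[Entry]) -> str:
    # Every entry is exported with the same keys, whatever feed it came from.
    return json.dumps([asdict(e) for e in entries], indent=2)


if __name__ == "__main__":
    print(to_unified_json(asyncio.run(collect_all())))
```

Because the simulated downloads overlap, the total waiting time is close to the slowest feed rather than the sum of all feeds.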
### Integrations
- BinaryDefence
- BlockListDe
- Botvrij
- C2IntelFeeds
- C2Tracker
- CertPL
- Ellio
- GreenSnow
- MiraiSecurity
- OpenPhish
- PhishTank
- PhishingArmy
- PhishingDatabase
- PhishStats
- Proofpoint
- ThreatView
- TweetFeed
- URLAbuse
- URLHaus
- Valdin
## Why PhishingWebCollector?
While many tools and scripts can collect phishing data, none offer a complete all-in-one solution like `PhishingWebCollector`. It combines comprehensive functionality with high performance, asynchronous data collection, and easy configuration, making it both efficient and user-friendly.
## How to use
The library can be installed using pip:
```bash
pip install phishing-web-collector
```
## Code usage
### Getting all phishing domains from all available sources
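```python
import phishing_web_collector as pwc

# Configure the manager with every available feed source and a local storage directory.
manager = pwc.FeedManager(
    sources=list(pwc.FeedSource),
    storage_path="feeds_data",
)

# Refresh all feeds, then retrieve the collected entries.
manager.sync_refresh_all()
entries = manager.sync_retrieve_all()

# Reduce each entry's URL to its domain.
phishing_domains = [pwc.get_domain_from_url(item.url) for item in entries]

for domain in phishing_domains:
    print(domain)
```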
Running this prints the list of phishing domains.
All modules are exported into the main package, so you can import the package and call them directly.
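
For the local-blacklist use case mentioned in the overview, collected URLs usually need to be deduplicated before use. The following is a minimal sketch that uses only the standard library; `extract_domain`, `build_blocklist`, and `to_hosts_lines` are hypothetical helpers, not part of `phishing_web_collector`, and `extract_domain` is an assumption-based stand-in that may behave differently from the library's `get_domain_from_url`.

```python
from urllib.parse import urlparse


def extract_domain(url: str) -> str:
    """Return the lowercase hostname of a URL, or "" if none can be found."""
    # urlparse only fills in the host when a scheme or a leading "//" is present.
    parsed = urlparse(url if "://" in url else "//" + url)
    return (parsed.hostname or "").lower()


def build_blocklist(urls: list[str]) -> list[str]:
    """Deduplicate the domains behind a list of URLs and sort them."""
    domains = {extract_domain(u) for u in urls}
    domains.discard("")  # drop entries without a usable hostname
    return sorted(domains)


def to_hosts_lines(domains: list[str]) -> list[str]:
    """Format domains as hosts-file lines that point to a non-routable address."""
    return [f"0.0.0.0 {d}" for d in domains]


if __name__ == "__main__":
    sample = [
        "http://Bad.Example/login",
        "https://bad.example:8443/pay",
        "phish.example/verify",
    ]
    for line in to_hosts_lines(build_blocklist(sample)):
        print(line)
```

In practice the input would be the `url` values of the entries returned by the manager in the example above.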
## Contributing
To contribute, see the [CONTRIBUTING.md](.github/CONTRIBUTING.md) file.
We are a welcoming community; just follow the [Code of Conduct](.github/CODE_OF_CONDUCT.md).
## Maintainers
Project maintainers are:
- Damian Frąszczak
- Edyta Frąszczak
Raw data
{
"_id": null,
"home_page": "https://github.com/damianfraszczak/phishing-web-collector",
"name": "phishing-web-collector",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "phishing_websites malicious_websites phishing",
"author": "Damian Fr\u0105szczak, Edyta Fr\u0105szczak",
"author_email": "damian.fraszczak@wat.edu.pl",
"download_url": "https://files.pythonhosted.org/packages/47/50/70c30cb6ecfd5edc06f2f184382387a6c0da0985242e3f8f1b4e5929893f/phishing_web_collector-0.1.1.tar.gz",
"platform": null,
"description": "<p align=\"center\"><img src=\".github/images/logo.png\" width=\"256\" alt=\"PhishingWebCollector\" title=\"PhishingWebCollector\"/></p>\n\n<h1 align=\"center\">\n \u2694\ufe0f PhishingWebCollector: A Python Library for Phishing Website Collection \u2694\ufe0f\n</h1>\n\n<p align=\"center\">\n <img alt=\"PyPI - Python Version\" src=\"https://img.shields.io/pypi/pyversions/phishing-web-collector.svg\">\n <img alt=\"PyPI - Downloads\" src=\"https://img.shields.io/pypi/dm/phishing-web-collector.svg\" href=\"https://pepy.tech/project/phishing-web-collector\">\n <a href=\"https://repology.org/project/python:phishing-web-collector/versions\">\n <img src=\"https://repology.org/badge/tiny-repos/python:phishing-web-collector.svg\" alt=\"Packaging status\">\n </a>\n <img alt=\"Downloads\" src=\"https://pepy.tech/badge/phishing-web-collector\">\n <img alt=\"GitHub license\" src=\"https://img.shields.io/github/license/damianfraszczak/phishing-web-collector.svg\" href=\"https://github.com/damianfraszczak/phishing-web-collector/blob/master/LICENSE\">\n <img alt=\"Documentation Status\" src=\"https://readthedocs.org/projects/phishing-web-collector/badge/?version=latest\" href=\"https://phishing-web-collector.readthedocs.io/en/latest/?badge=latest\">\n</p>\n\n<p align=\"center\">\n <a href=\"https://github.com/damianfraszczak/phishing-web-collector?tab=readme-ov-file#why-PhishingWebCollector\">\u2728 Why PhishingWebCollector?</a>\n <a href=\"https://github.com/damianfraszczak/phishing-web-collector?tab=readme-ov-file#features\">\ud83d\udce6 Features</a>\n <a href=\"https://github.com/damianfraszczak/phishing-web-collector/blob/master/docs/files/QUICK_START.md\">\ud83d\ude80 Quick Start</a>\n <a href=\"https://phishing-web-collector.readthedocs.io/\">\ud83d\udcee Documentation</a>\n <a href=\"https://github.com/damianfraszczak/phishing-web-collector/blob/master/docs/files/jupyter\">\ud83d\udcd3 Jupyter Notebook examples</a>\n <a href=\"LICENSE\">\ud83d\udd11 
License</a>\n</p>\n\n\n## Overview\n`PhishingWebCollector` is a Python library that integrates 20 phishing feeds into one solution and offers a platform for collecting and managing malicious website data.\nSuitable for practical cybersecurity applications, like updating local blacklists, and research, such as building phishing detection datasets.\nIt utilizes the asyncio module for efficient parallel processing and data collection.\nUsers can gather historical data from free feeds to construct extensive datasets without costly API subscriptions.\nIts ease of use, scalability, and support for various data formats enhance the threat detection capabilities of cybersecurity teams and researchers while minimizing technical overhead.\n\n\n\n* **Free software:** MIT license,\n* **Documentation:** https://phishing-web-collector.readthedocs.io/en/latest/,\n* **Python versions:** 3.9 | 3.10 | 3.11\n* **Tested OS:** Windows, Ubuntu, Fedora and CentOS. **However, that does not mean it does not work on others.**\n* **All-in-One Solution::** PhishingWebCollector is an all-in-one solution that allows for the collection of a wide range of information about websites.\n* **Efficiency and Expertise: :** Building a similar solution independently would be very time-consuming and require specialized knowledge.\n* **Open Source Advantage: :** Publishing this tool as open source will facilitate many studies, making them simpler and allowing researchers and industry professionals to focus on more advanced tasks.\n* **Continuous Improvement: :** New techniques will be added successively, ensuring continuous growth in this area.\n\n## Features\n- Integration of 20 Different Sources: Reduces the need to maintain multiple integrations.\n- Local Data Collection: Supports building and maintaining local phishing databases.\n- Data Export: Allows exporting all collected data in a unified JSON format.\n- Asynchronous Performance: Uses asyncio for faster, simultaneous data collection.\n\n### 
Integrations\n- BinaryDefence\n- BlockListDe\n- Botvrij\n- C2IntelFeeds\n- C2Tracker\n- CertPL\n- Ellio\n- GreenSnow\n- MiraiSecurity\n- OpenPhish\n- PhishTank\n- PhishingArmy\n- PhishingDatabase\n- PhishStats\n- Proofpoint\n- ThreatView\n- TweetFeed\n- URLAbuse\n- URLHaus\n- Valdin\n\n## Why PhishingWebCollector?\nWhile many tools and scripts can collect phishing data, none offer a complete all-in-one solution like `PhishingWebCollector`. It combines comprehensive functionality with high performance, asynchronous data collection, and easy configuration, making it both efficient and user-friendly.\n\n\n## How to use\nLibrary can be installed using pip:\n\n```bash\npip install phishing-web-collector\n```\n\n## Code usage\n\n### Getting all phishing domains from all available sources\n\n```python\nimport phishing_web_collector as pwc\n\nmanager = pwc.FeedManager(\n sources=list(pwc.FeedSource),\n storage_path=\"feeds_data\"\n)\n\nmanager.sync_refresh_all()\nentries = manager.sync_retrieve_all()\n\nphishing_domains = [pwc.get_domain_from_url(item.url) for item in entries]\n\nfor domain in phishing_domains:\n print(domain)\n\n```\nand as a results you will get the list of phishing domains.\n\nAll modules are exported into main package, so you can use import module and invoke them directly.\n\n## Contributing\n\nFor contributing, refer to its [CONTRIBUTING.md](.github/CONTRIBUTING.md) file.\nWe are a welcoming community... just follow the [Code of Conduct](.github/CODE_OF_CONDUCT.md).\n\n## Maintainers\n\nProject maintainers are:\n\n- Damian Fr\u0105szczak\n- Edyta Fr\u0105szczak\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Phishing Web Collector",
"version": "0.1.1",
"project_urls": {
"Homepage": "https://github.com/damianfraszczak/phishing-web-collector"
},
"split_keywords": [
"phishing_websites",
"malicious_websites",
"phishing"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "aba828705525c8764510f8f0d267bffba1fcb09eb2ddd1e39209a7b412779d5e",
"md5": "388df1ef50112d0ef72fea2bb01bcbaf",
"sha256": "ef0deadb67f3ef542e9cdd6e955dd6c8aca2abcb0168720b865c34f61f9770ac"
},
"downloads": -1,
"filename": "phishing_web_collector-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "388df1ef50112d0ef72fea2bb01bcbaf",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 23332,
"upload_time": "2025-02-25T11:10:45",
"upload_time_iso_8601": "2025-02-25T11:10:45.312158Z",
"url": "https://files.pythonhosted.org/packages/ab/a8/28705525c8764510f8f0d267bffba1fcb09eb2ddd1e39209a7b412779d5e/phishing_web_collector-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "475070c30cb6ecfd5edc06f2f184382387a6c0da0985242e3f8f1b4e5929893f",
"md5": "e83a4e73c396041d336f98aa7947123d",
"sha256": "e3f2f05662d8a629bf76d8b9dfb22bdacc70523c255c8ea37b44d3fc36ddec27"
},
"downloads": -1,
"filename": "phishing_web_collector-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "e83a4e73c396041d336f98aa7947123d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 14502,
"upload_time": "2025-02-25T11:10:48",
"upload_time_iso_8601": "2025-02-25T11:10:48.132935Z",
"url": "https://files.pythonhosted.org/packages/47/50/70c30cb6ecfd5edc06f2f184382387a6c0da0985242e3f8f1b4e5929893f/phishing_web_collector-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-25 11:10:48",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "damianfraszczak",
"github_project": "phishing-web-collector",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"requirements": [
{
"name": "aiohttp",
"specs": []
}
],
"lcname": "phishing-web-collector"
}