# Parsextract
Parsextract is a Python library that extracts **IP addresses, domains, and email addresses** from raw text. It validates extracted domains against IANA-recognized TLDs and a set of additional custom TLDs sourced from OpenNIC.
## Features
- Extracts IPv4 addresses from raw text.
- Extracts domain names and validates them against IANA and OpenNIC TLDs.
- Extracts email addresses using regex-based parsing.
- Lightweight and easy to integrate.
## Installation
```sh
pip install parsextract
```
## Usage
```python
import parsextract
text = """
Contact us at support@example.com or visit https://mywebsite.geek for more info.
Our server IP is 192.168.1.1.
"""
# Extract IPs
ips = parsextract.extract_ips(text)
print("IPs:", ips)
# Extract Domains
domains = parsextract.extract_domains(text)
print("Domains:", domains)
# Extract Emails
emails = parsextract.extract_emails(text)
print("Emails:", emails)
```
### Example Output
```
IPs: ['192.168.1.1']
Domains: ['mywebsite.geek']
Emails: ['support@example.com']
```
## Regular Expressions Used
- **IP Address Regex**:
```regex
(?i)(?<![a-zA-Z0-9:.])(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])){3}(?![a-zA-Z0-9/.])
```
- **Domain Regex**:
```regex
(?i)(?:^|\s|["']|[,])((?:(?:[A-Z0-9_](?:[A-Z0-9_-]{0,61}[A-Z0-9_])?\.)+)(?:[A-Z0-9-]{1,63}(?<!-)))(?!(?:/[^\s/]+))(?![a-zA-Z0-9@])
```
- **Email Regex**:
```regex
(?<!@)\b[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,}\b(?!@)
```
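To see how these patterns behave on their own, here is a minimal sketch that compiles them with Python's standard `re` module. The patterns are copied verbatim from the list above. The names `IP_RE`, `DOMAIN_RE`, and `EMAIL_RE` and the sample strings are made up for illustration, and the library may apply extra pre- or post-processing (such as TLD validation) that this sketch does not reproduce.

```python
import re

# Patterns copied verbatim from the README; the variable names are hypothetical.
IP_RE = re.compile(
    r"(?i)(?<![a-zA-Z0-9:.])(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])"
    r"(?:\.(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])){3}(?![a-zA-Z0-9/.])"
)
DOMAIN_RE = re.compile(
    r"""(?i)(?:^|\s|["']|[,])((?:(?:[A-Z0-9_](?:[A-Z0-9_-]{0,61}[A-Z0-9_])?\.)+)"""
    r"""(?:[A-Z0-9-]{1,63}(?<!-)))(?!(?:/[^\s/]+))(?![a-zA-Z0-9@])"""
)
EMAIL_RE = re.compile(
    r"(?<!@)\b[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,}\b(?!@)"
)

# Octets above 255 are rejected by the IP pattern.
ips = IP_RE.findall("Ping 10.0.0.5 but not 10.0.0.256")

# The domain pattern has one capturing group, so findall returns that group.
domains = DOMAIN_RE.findall("Browse mywebsite.geek or docs.example.org today")

emails = EMAIL_RE.findall("Reach admin@example.org today")

# The IP pattern's trailing lookahead forbids a following ".", so an address
# that ends a sentence is not matched by the raw pattern.
trailing = IP_RE.findall("Server is 10.0.0.5.")

print(ips)       # ['10.0.0.5']
print(domains)   # ['mywebsite.geek', 'docs.example.org']
print(emails)    # ['admin@example.org']
print(trailing)  # []
```

When the raw patterns are used directly, an IP address immediately followed by a period is skipped, so sentence-final addresses may need the punctuation stripped first. Whether the library handles this internally is not stated here.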
## Supported TLDs
The library validates extracted domains against **IANA-recognized TLDs** and the following custom TLDs from OpenNIC.
### Supported Custom TLDs
```
.bbs, .chan, .cyb, .dyn, .epic, .geek, .gopher, .indy, .libre, .neo, .null,
.o, .oss, .oz, .parody, .pirate
```
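The validation step can be pictured as a set lookup on the last label of a candidate domain. The sketch below is an assumption about how such a check might look, not the library's actual implementation. `has_known_tld` and `IANA_SAMPLE_TLDS` are hypothetical names, and the IANA set is a three-entry sample rather than the full registry.

```python
# OpenNIC TLDs copied from the list above.
OPENNIC_TLDS = frozenset({
    "bbs", "chan", "cyb", "dyn", "epic", "geek", "gopher", "indy",
    "libre", "neo", "null", "o", "oss", "oz", "parody", "pirate",
})

# Assumption: a tiny stand-in for the full IANA TLD list.
IANA_SAMPLE_TLDS = frozenset({"com", "org", "net"})


def has_known_tld(domain, known=OPENNIC_TLDS | IANA_SAMPLE_TLDS):
    """Return True if the domain's last label is in the known-TLD set."""
    # Drop a trailing root dot, take the text after the last ".", and
    # compare case-insensitively.
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld in known


print(has_known_tld("mywebsite.geek"))  # True
print(has_known_tld("Example.COM"))     # True
print(has_known_tld("10.0.0.5"))        # False: "5" is not a TLD
```

A check like this filters out dotted strings that only look like domains, such as version numbers or IP addresses.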
## License
This project is licensed under the MIT License.
## Contributing
Feel free to open an issue or submit a pull request for improvements.
## Author
Maintained by Deepak Kumar (deepak.kumar@cyware.com).
Raw data
{
"_id": null,
"home_page": "https://github.com/deepakkumar/parsextract",
"name": "parsextract",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": null,
"author": "Deepak Kumar",
"author_email": "deepak.kumar@cyware.com",
"download_url": "https://files.pythonhosted.org/packages/9f/d3/096ff42084cab7186849dcf6959ed712a56500a511df5e3c1f02e4c26312/parsextract-0.1.10.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A library to extract IPs, domains, and emails from text",
"version": "0.1.10",
"project_urls": {
"Homepage": "https://github.com/deepakkumar/parsextract"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "26791b43423cd57ef5eb530934807243f5d5044bda2b5a405e6747ce33542061",
"md5": "ff45c13eb78fa8051adae89dc4441892",
"sha256": "b93e1f460fc176492ef233c80cb40b4fb64e5458f4e79a82419931d75d57a610"
},
"downloads": -1,
"filename": "parsextract-0.1.10-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ff45c13eb78fa8051adae89dc4441892",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 9573,
"upload_time": "2025-02-03T09:30:25",
"upload_time_iso_8601": "2025-02-03T09:30:25.506145Z",
"url": "https://files.pythonhosted.org/packages/26/79/1b43423cd57ef5eb530934807243f5d5044bda2b5a405e6747ce33542061/parsextract-0.1.10-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "9fd3096ff42084cab7186849dcf6959ed712a56500a511df5e3c1f02e4c26312",
"md5": "e1f84d5c61c138d6406e18b02ecd0c37",
"sha256": "512b5d1dff16e8ef0117072be19418ae837bc765c6d4578f5ee46ac037c814e3"
},
"downloads": -1,
"filename": "parsextract-0.1.10.tar.gz",
"has_sig": false,
"md5_digest": "e1f84d5c61c138d6406e18b02ecd0c37",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 9641,
"upload_time": "2025-02-03T09:30:27",
"upload_time_iso_8601": "2025-02-03T09:30:27.668406Z",
"url": "https://files.pythonhosted.org/packages/9f/d3/096ff42084cab7186849dcf6959ed712a56500a511df5e3c1f02e4c26312/parsextract-0.1.10.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-03 09:30:27",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "deepakkumar",
"github_project": "parsextract",
"github_not_found": true,
"lcname": "parsextract"
}