- Name: parsextract
- Version: 0.1.10
- Home page: https://github.com/deepakkumar/parsextract
- Summary: A library to extract IPs, domains, and emails from text
- Upload time: 2025-02-03 09:30:27
- Author: Deepak Kumar
- Requires Python: >=3.6
- Requirements: none recorded

# Parsextract

Parsextract is a Python library that extracts **IPv4 addresses, domains, and email addresses** from raw text. It validates extracted domains against IANA-recognized TLDs plus a set of custom TLDs sourced from OpenNIC.

## Features
- Extracts IPv4 addresses from raw text.
- Extracts domain names and validates them against IANA and OpenNIC TLDs.
- Extracts email addresses using regex-based parsing.
- Lightweight and easy to integrate.

## Installation
```sh
pip install parsextract
```

## Usage
```python
import parsextract

text = """
    Contact us at support@example.com or visit https://mywebsite.geek for more info.
    Our server IP is 192.168.1.1.
"""

# Extract IPs
ips = parsextract.extract_ips(text)
print("IPs:", ips)

# Extract Domains
domains = parsextract.extract_domains(text)
print("Domains:", domains)

# Extract Emails
emails = parsextract.extract_emails(text)
print("Emails:", emails)
```

### Example Output
```
IPs: ['192.168.1.1']
Domains: ['mywebsite.geek']
Emails: ['support@example.com']
```

## Regular Expressions Used
- **IP Address Regex**:
  ```regex
  (?i)(?<![a-zA-Z0-9:.])(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])){3}(?![a-zA-Z0-9/.])
  ```
- **Domain Regex**:
  ```regex
  (?i)(?:^|\s|["']|[,])((?:(?:[A-Z0-9_](?:[A-Z0-9_-]{0,61}[A-Z0-9_])?\.)+)(?:[A-Z0-9-]{1,63}(?<!-)))(?!(?:/[^\s/]+))(?![a-zA-Z0-9@])
  ```
- **Email Regex**:
  ```regex
  (?<!@)\b[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,}\b(?!@)
  ```
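
To see what the three patterns above match on their own, here is a minimal sketch that compiles them with Python's standard `re` module. It only illustrates the expressions. It does not show how parsextract applies them internally, and the sample text and the names `IP_RE`, `DOMAIN_RE`, and `EMAIL_RE` are made up for this example.

```python
import re

# Patterns copied verbatim from the list above. Compiling them with `re` is an
# illustration only; parsextract's own processing may differ.
IP_RE = re.compile(
    r"(?i)(?<![a-zA-Z0-9:.])(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])"
    r"(?:\.(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])){3}(?![a-zA-Z0-9/.])"
)
DOMAIN_RE = re.compile(
    r"""(?i)(?:^|\s|["']|[,])"""
    r"""((?:(?:[A-Z0-9_](?:[A-Z0-9_-]{0,61}[A-Z0-9_])?\.)+)"""
    r"""(?:[A-Z0-9-]{1,63}(?<!-)))(?!(?:/[^\s/]+))(?![a-zA-Z0-9@])"""
)
EMAIL_RE = re.compile(
    r"(?<!@)\b[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,}\b(?!@)"
)

# Hypothetical sample text, chosen for this sketch.
sample = "Server 10.0.0.5 is up, see docs.example.org or mail admin@example.org"

ips = IP_RE.findall(sample)          # no capture groups: returns full matches
domains = DOMAIN_RE.findall(sample)  # one capture group: returns the domain part
emails = EMAIL_RE.findall(sample)

print("IPs:", ips)                    # IPs: ['10.0.0.5']
print("Domain candidates:", domains)  # Domain candidates: ['10.0.0.5', 'docs.example.org']
print("Emails:", emails)              # Emails: ['admin@example.org']
```

Two things stand out when the patterns are applied bare like this. The domain pattern also returns `10.0.0.5`, because an all-numeric last label fits the expression, so a TLD check is needed to discard it. The IP pattern does not match an address directly followed by a period, because `.` is in its trailing lookahead; whether parsextract preprocesses such text is not shown here.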

## Supported TLDs
The library validates domains using **IANA-approved TLDs** and additional custom TLDs from OpenNIC.

### Custom TLDs Supported
```
.bbs, .chan, .cyb, .dyn, .epic, .geek, .gopher, .indy, .libre, .neo, .null,
.o, .oss, .oz, .parody, .pirate
```
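
TLD validation of this kind can be sketched as a set lookup on the last label of a candidate domain. This is a rough illustration under stated assumptions, not the library's implementation: `has_known_tld`, `KNOWN_TLDS`, and `IANA_TLDS_SAMPLE` are hypothetical names, and the three-entry IANA set stands in for the full IANA list.

```python
# Custom OpenNIC TLDs, copied from the list above.
OPENNIC_TLDS = {
    "bbs", "chan", "cyb", "dyn", "epic", "geek", "gopher", "indy",
    "libre", "neo", "null", "o", "oss", "oz", "parody", "pirate",
}

# Assumption: a three-entry stand-in for the IANA list, which is far larger.
IANA_TLDS_SAMPLE = {"com", "org", "net"}

KNOWN_TLDS = IANA_TLDS_SAMPLE | OPENNIC_TLDS


def has_known_tld(domain: str) -> bool:
    """Return True if the last label of `domain` is in KNOWN_TLDS."""
    # Drop a trailing root dot, take the text after the last ".", ignore case.
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld in KNOWN_TLDS


candidates = ["mywebsite.geek", "Example.COM", "10.0.0.5", "host.notatld"]
valid = [d for d in candidates if has_known_tld(d)]
print(valid)  # ['mywebsite.geek', 'Example.COM']
```

A set gives constant-time membership checks, and lowercasing the last label keeps the comparison case-insensitive, as domain names are.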

## License
This project is licensed under the MIT License.

## Contributing
Feel free to open an issue or submit a pull request for improvements.

## Author
Maintained by deepak.kumar@cyware.com.


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/deepakkumar/parsextract",
    "name": "parsextract",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": "Deepak Kumar",
    "author_email": "deepak.kumar@cyware.com",
    "download_url": "https://files.pythonhosted.org/packages/9f/d3/096ff42084cab7186849dcf6959ed712a56500a511df5e3c1f02e4c26312/parsextract-0.1.10.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A library to extract IPs, domains, and emails from text",
    "version": "0.1.10",
    "project_urls": {
        "Homepage": "https://github.com/deepakkumar/parsextract"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "26791b43423cd57ef5eb530934807243f5d5044bda2b5a405e6747ce33542061",
                "md5": "ff45c13eb78fa8051adae89dc4441892",
                "sha256": "b93e1f460fc176492ef233c80cb40b4fb64e5458f4e79a82419931d75d57a610"
            },
            "downloads": -1,
            "filename": "parsextract-0.1.10-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ff45c13eb78fa8051adae89dc4441892",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 9573,
            "upload_time": "2025-02-03T09:30:25",
            "upload_time_iso_8601": "2025-02-03T09:30:25.506145Z",
            "url": "https://files.pythonhosted.org/packages/26/79/1b43423cd57ef5eb530934807243f5d5044bda2b5a405e6747ce33542061/parsextract-0.1.10-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9fd3096ff42084cab7186849dcf6959ed712a56500a511df5e3c1f02e4c26312",
                "md5": "e1f84d5c61c138d6406e18b02ecd0c37",
                "sha256": "512b5d1dff16e8ef0117072be19418ae837bc765c6d4578f5ee46ac037c814e3"
            },
            "downloads": -1,
            "filename": "parsextract-0.1.10.tar.gz",
            "has_sig": false,
            "md5_digest": "e1f84d5c61c138d6406e18b02ecd0c37",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 9641,
            "upload_time": "2025-02-03T09:30:27",
            "upload_time_iso_8601": "2025-02-03T09:30:27.668406Z",
            "url": "https://files.pythonhosted.org/packages/9f/d3/096ff42084cab7186849dcf6959ed712a56500a511df5e3c1f02e4c26312/parsextract-0.1.10.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-03 09:30:27",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "deepakkumar",
    "github_project": "parsextract",
    "github_not_found": true,
    "lcname": "parsextract"
}
        