Name | CD-Parser JSON |
Version | 0.1.2 |
home_page | https://codedocta.com |
Summary | A wrapper around the wonderful re and lxml libraries that makes it easier for new and experienced users to scrape pages |
upload_time | 2023-10-27 20:36:52 |
maintainer | |
docs_url | None |
author | codedocta |
requires_python | |
license | |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# RegexParser
## Installation for both the XPath and regex parsers
Install the package with pip. The XPath parser is built on `lxml`, so make sure that library is available in your environment as well:
```bash
pip install cd-parser
```
A utility class for commonly used regex operations in Python.
## Features
- **Replace**: Easily replace occurrences of a regex pattern with a new string.
- **Find All**: Retrieve all occurrences of a regex pattern in a string.
- **Find First**: Get the first occurrence of a regex pattern in a string.
- **Find Before**: Extract the portion of text immediately before a given substring.
- **Find After**: Fetch the portion of text immediately after a given substring.
- **Find Between**: Find text between two specified substrings.
- **Is Match**: Check if the input text matches a given regex pattern from the start.
- **Split**: Divide the input text using a provided regex pattern.
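To make the substring-oriented features above concrete, here is a minimal sketch of what they could look like using only the standard library. The function names and argument order are assumptions chosen to mirror the feature list; this is not the package's own implementation, and the real `RegexParser` methods may differ in signature and return values.

```python
import re

# Hypothetical stand-ins for the listed features, built on the stdlib only.

def find_first(pattern, text):
    """Return the first regex match as a string, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

def find_before(marker, text):
    """Return the text before the first occurrence of a literal marker."""
    head, sep, _ = text.partition(marker)
    return head if sep else None

def find_after(marker, text):
    """Return the text after the first occurrence of a literal marker."""
    _, sep, tail = text.partition(marker)
    return tail if sep else None

def find_between(start, end, text):
    """Return the shortest text enclosed by two literal markers."""
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    m = re.search(pattern, text, re.DOTALL)
    return m.group(1) if m else None

sample = "id=42; name=apple; color=red"
print(find_first(r"\d+", sample))          # 42
print(find_before(";", sample))            # id=42
print(find_after("color=", sample))        # red
print(find_between("name=", ";", sample))  # apple
```

The markers are escaped with `re.escape` so that characters such as `.` or `(` are treated literally; whether the package does the same is not stated in this README.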
## Usage
Here are some example usages of the `RegexParser` class:
```python
from cd_parser.regex_parser import RegexParser
# Replace text
modified_text = RegexParser.replace("an old", "a new", "This is an old text.")
print(modified_text) # Output: "This is a new text."
# Find all matches
matches = RegexParser.find_all("[A-Za-z]+", "123 apple 456 banana")
print(matches) # Output: ['apple', 'banana']
```
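The **Is Match** and **Split** features are not shown above. As a rough illustration of the behavior they describe, the sketch below uses the standard `re` module directly; it assumes the package wraps `re.match` and `re.split`, which this README does not confirm.

```python
import re

text = "123 apple 456 banana"

# re.match only succeeds if the pattern matches at the start of the string.
starts_with_digits = re.match(r"\d+", text) is not None
starts_with_letters = re.match(r"[A-Za-z]+", text) is not None
print(starts_with_digits)   # True
print(starts_with_letters)  # False

# re.split divides the text wherever the pattern matches.
parts = re.split(r"\s+", text)
print(parts)  # ['123', 'apple', '456', 'banana']
```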
## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
## License
[MIT](https://choosealicense.com/licenses/mit/)
---
# XpathParser
A simple and lightweight XPath parser class for extracting data from HTML/XML content. Built on top of the `lxml` library, it offers a variety of methods for precise element extraction based on various criteria.
## Features
- Fetch multiple elements or a single element using a custom XPath query.
- Predefined methods for common XPath queries like selecting by tag, attribute, text, etc.
- Simple, user-friendly, and Pythonic API.
## Usage
### Initialization
Create an instance of the `XpathParser` class with your HTML/XML content:
```python
from cd_parser.xpath_parser import XpathParser
doc_text = """
<html>
<body>
<a id="link1" href="https://example.com/page1">Link 1</a>
<a id="link2" href="https://example.com/page2">Link 2</a>
</body>
</html>
"""
parser = XpathParser(doc_text)
```
### Fetch Elements
Using custom XPath:
```python
links = parser.get_elements('//a')
print([link.text for link in links])
```
Get a single element (the first match):
```python
single_link = parser.get_element('//*[@id="link1"]')
if single_link is not None:
    print(single_link.text)
```
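For readers who want to see the same two lookups without the wrapper, here is a minimal sketch using the standard library's `xml.etree.ElementTree` as a stand-in for `lxml`. The helper names mirror `get_elements` and `get_element` but are hypothetical, and ElementTree supports only a subset of XPath, so paths are written relative to the root (`.//a`).

```python
import xml.etree.ElementTree as ET

doc_text = """
<html>
  <body>
    <a id="link1" href="https://example.com/page1">Link 1</a>
    <a id="link2" href="https://example.com/page2">Link 2</a>
  </body>
</html>
"""

# ElementTree needs well-formed XML; real-world HTML usually needs lxml.
root = ET.fromstring(doc_text.strip())

def get_elements(path):
    """Return every element matching the path (possibly an empty list)."""
    return root.findall(path)

def get_element(path):
    """Return the first matching element, or None when nothing matches."""
    return root.find(path)

links = get_elements(".//a")
print([link.text for link in links])  # ['Link 1', 'Link 2']

first = get_element(".//a[@id='link1']")
if first is not None:
    print(first.get("href"))  # https://example.com/page1

missing = get_element(".//a[@id='nope']")
print(missing is None)  # True
```

Comparing against `None` is deliberate: in `lxml`, an element without child elements evaluates as falsy, so a plain truthiness check can skip an element that was actually found.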
### Predefined Queries
Select all nodes:
```python
all_nodes = parser.select_all_nodes()
```
Select by tag:
```python
anchors = parser.select_by_tag("a")
```
Select by tag and class:
```python
divs_with_class = parser.select_by_class("div", "my-class")
```
... and many more. Refer to the class docstrings for details on each method.
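Predefined queries like these are typically thin wrappers that build an XPath expression from their arguments. The sketch below shows one plausible set of expressions; the helper names are hypothetical and the expressions the package actually generates may differ, so check the class docstrings before relying on them.

```python
# Hypothetical helpers that build XPath 1.0 expressions as plain strings.

def xpath_by_tag(tag):
    """Match every element with the given tag name."""
    return f"//{tag}"

def xpath_by_attribute(tag, attr, value):
    """Match elements whose attribute equals the given value exactly."""
    return f'//{tag}[@{attr}="{value}"]'

def xpath_by_class(tag, class_name):
    """Match elements whose space-separated class list contains class_name."""
    return (
        f'//{tag}[contains(concat(" ", normalize-space(@class), " "), '
        f'" {class_name} ")]'
    )

def xpath_by_text(tag, text):
    """Match elements whose first text node, whitespace-trimmed, equals text."""
    return f'//{tag}[normalize-space(text())="{text}"]'

print(xpath_by_tag("a"))
print(xpath_by_attribute("a", "id", "link1"))
print(xpath_by_class("div", "my-class"))
print(xpath_by_text("a", "Link 1"))
```

The class expression pads the attribute with spaces so that `my-class` does not accidentally match `my-class-large`. Values are interpolated without escaping, so inputs containing double quotes would need extra handling.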
## Contributing
Feel free to fork the repository, make your changes, and submit pull requests. We appreciate all contributions!
---
License
MIT License
More documentation at:
[Code Docta](https://codedocta.com "Code Docta")
Raw data
{
"_id": null,
"home_page": "https://codedocta.com",
"name": "CD-Parser",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "codedocta",
"author_email": "codedocta@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/b8/08/55edc92b97c7af756f87dc46e30bb54ea27384b19f6338d6c0cf0f4284d4/CD_Parser-0.1.2.tar.gz",
"platform": null,
"description": "\r\n# RegexParser\r\n\r\n## Installation for both xpath, regex parsers\r\n\r\nBefore you start, ensure you have `lxml` library installed:\r\n```bash\r\npip install cd-parser\r\n```\r\n\r\n\r\nA utility class for commonly used regex operations in Python.\r\n\r\n## Features\r\n\r\n- **Replace**: Easily replace occurrences of a regex pattern with a new string.\r\n- **Find All**: Retrieve all occurrences of a regex pattern in a string.\r\n- **Find First**: Get the first occurrence of a regex pattern in a string.\r\n- **Find Before**: Extract the portion of text immediately before a given substring.\r\n- **Find After**: Fetch the portion of text immediately after a given substring.\r\n- **Find Between**: Find text between two specified substrings.\r\n- **Is Match**: Check if the input text matches a given regex pattern from the start.\r\n- **Split**: Divide the input text using a provided regex pattern.\r\n\r\n## Usage\r\n\r\nHere are some example usages of the `RegexParser` class:\r\n\r\n```python\r\nfrom cd_parser.regex_parser import RegexParser\r\n\r\n\r\n# Replace text\r\nmodified_text = RegexParser.replace(\"old\", \"new\", \"This is an old text.\")\r\nprint(modified_text) # Output: \"This is a new text.\"\r\n\r\n# Find all matches\r\nmatches = RegexParser.find_all(\"[A-Za-z]+\", \"123 apple 456 banana\")\r\nprint(matches) # Output: ['apple', 'banana']\r\n\r\n# ... [You can add more examples for other methods]\r\n```\r\n\r\n\r\n## Contributing\r\n\r\nPull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.\r\n\r\n## License\r\n\r\n[MIT](https://choosealicense.com/licenses/mit/)\r\n\r\n\r\nAbsolutely. Here's a README.md file for the `XpathParser` class:\r\n\r\n---\r\n\r\n# XpathParser\r\n\r\nA simple and lightweight XPath parser class for extracting data from HTML/XML content. 
Built on top of the `lxml` library, it offers a variety of methods for precise element extraction based on various criteria.\r\n\r\n## Features\r\n- Fetch multiple elements or a single element using a custom XPath query.\r\n- Predefined methods for common XPath queries like selecting by tag, attribute, text, etc.\r\n- Simple, user-friendly, and Pythonic API.\r\n\r\n\r\n\r\n## Usage\r\n\r\n### Initialization\r\nCreate an instance of the `XpathParser` class with your HTML/XML content:\r\n\r\n```python\r\nfrom cd_parser.xpath_parser import XpathParser\r\n\r\ndoc_text = \"\"\"\r\n<html>\r\n <body>\r\n <a id=\"link1\" href=\"https://example.com/page1\">Link 1</a>\r\n <a id=\"link2\" href=\"https://example.com/page2\">Link 2</a>\r\n </body>\r\n</html>\r\n\"\"\"\r\n\r\nparser = XpathParser(doc_text)\r\n```\r\n\r\n### Fetch Elements\r\n\r\nUsing custom XPath:\r\n```python\r\nlinks = parser.get_elements('//a')\r\nprint([link.text for link in links])\r\n```\r\n\r\nGet a single element (the first match):\r\n```python\r\nsingle_link = parser.get_element('//*[@id=\"link1\"]')\r\nif single_link:\r\n print(single_link.text)\r\n```\r\n\r\n### Predefined Queries\r\n\r\nSelect all nodes:\r\n```python\r\nall_nodes = parser.select_all_nodes()\r\n```\r\n\r\nSelect by tag:\r\n```python\r\nanchors = parser.select_by_tag(\"a\")\r\n```\r\n\r\nSelect by attribute:\r\n```python\r\ndivs_with_class = parser.select_by_class(\"div\", \"my-class\")\r\n```\r\n\r\n... and many more. Refer to the class docstrings for details on each method.\r\n\r\n## Contributing\r\nFeel free to fork the repository, make your changes, and submit pull requests. We appreciate all contributions!\r\n\r\n---\r\n\r\nPlease note:\r\n1. The filename `xpath_parser.py` is assumed in the usage example. Adjust it accordingly if you're using a different filename.\r\n2. Modify sections like \"Contributing\" as per your actual project needs and repository policies. 
This is a generic template to help you get started.\r\n\r\n\r\nLicense\r\n\r\nMIT License\r\n\r\nMore documentation at:\r\n[Code Docta](https://codedocta.com \"Code Docta\")\r\n",
"bugtrack_url": null,
"license": "",
"summary": "An wrapper around the wonderful re and lmxl libraries to make it easier for new users and old. To scrape pages",
"version": "0.1.2",
"project_urls": {
"Bug Reports": "https://github.com/codedocta/CD_Parser/issues",
"Homepage": "https://codedocta.com",
"Source": "https://github.com/codedocta/CD_Parser/"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "60f8aa0eabd7fa1eab03a5ad74c6038356cb87b5b5d1d554a87105d30c9a658e",
"md5": "d0c7d749ff742f672e4a22d1c6c63cff",
"sha256": "f07d6f5fbea13e5bef58522ca084090afb926ccff69c28bdba784412cb98fc1a"
},
"downloads": -1,
"filename": "CD_Parser-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d0c7d749ff742f672e4a22d1c6c63cff",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 7026,
"upload_time": "2023-10-27T20:36:50",
"upload_time_iso_8601": "2023-10-27T20:36:50.372231Z",
"url": "https://files.pythonhosted.org/packages/60/f8/aa0eabd7fa1eab03a5ad74c6038356cb87b5b5d1d554a87105d30c9a658e/CD_Parser-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b80855edc92b97c7af756f87dc46e30bb54ea27384b19f6338d6c0cf0f4284d4",
"md5": "bdb4b9f6f378fdd68a4a483febea795b",
"sha256": "2faac62d3f80c616824cca1517098eb9f67dec6fa9650f7f8e946110d3b56603"
},
"downloads": -1,
"filename": "CD_Parser-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "bdb4b9f6f378fdd68a4a483febea795b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 5582,
"upload_time": "2023-10-27T20:36:52",
"upload_time_iso_8601": "2023-10-27T20:36:52.898713Z",
"url": "https://files.pythonhosted.org/packages/b8/08/55edc92b97c7af756f87dc46e30bb54ea27384b19f6338d6c0cf0f4284d4/CD_Parser-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-10-27 20:36:52",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "codedocta",
"github_project": "CD_Parser",
"github_not_found": true,
"lcname": "cd-parser"
}