CD-Parser


Name: CD-Parser
Version: 0.1.2
Home page: https://codedocta.com
Summary: A wrapper around the wonderful re and lxml libraries that makes scraping pages easier for new and experienced users.
Upload time: 2023-10-27 20:36:52
Author: codedocta
Requirements: No requirements were recorded.
            
# RegexParser

## Installation (XPath and regex parsers)

Install the package from PyPI. `XpathParser` is built on the `lxml` library, so make sure `lxml` is available in your environment as well (`pip install lxml`):
```bash
pip install cd-parser
```
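To confirm that both modules are importable after installation, a quick check along these lines may help. It is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper, not part of `cd_parser`, and the import name `cd_parser` is taken from the usage examples in this README.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Report whether each module the examples rely on can be located.
for name in ("lxml", "cd_parser"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```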


`RegexParser` is a utility class for commonly used regex operations in Python.

## Features

- **Replace**: Easily replace occurrences of a regex pattern with a new string.
- **Find All**: Retrieve all occurrences of a regex pattern in a string.
- **Find First**: Get the first occurrence of a regex pattern in a string.
- **Find Before**: Extract the portion of text immediately before a given substring.
- **Find After**: Fetch the portion of text immediately after a given substring.
- **Find Between**: Find text between two specified substrings.
- **Is Match**: Check if the input text matches a given regex pattern from the start.
- **Split**: Divide the input text using a provided regex pattern.
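This README does not show the signatures of most of these helpers, so the sketch below does not call `RegexParser` at all. It uses only the standard `re` module to illustrate what each operation in the list amounts to. The function names are hypothetical, and the library's actual behavior (for example, when a substring is missing) may differ.

```python
import re


def find_first(pattern, text):
    """Return the first match of a regex pattern, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


def find_before(marker, text):
    """Return the text before the first occurrence of a literal marker."""
    head, sep, _ = text.partition(marker)
    return head if sep else None


def find_after(marker, text):
    """Return the text after the first occurrence of a literal marker."""
    _, sep, tail = text.partition(marker)
    return tail if sep else None


def find_between(start, end, text):
    """Return the shortest text between two literal substrings, or None."""
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None


def is_match(pattern, text):
    """Check whether the pattern matches at the start of the text."""
    return re.match(pattern, text) is not None


def split(pattern, text):
    """Split the text wherever the pattern matches."""
    return re.split(pattern, text)


sample = "price: 42 USD, tax: 7 USD"
print(find_first(r"\d+", sample))                # 42
print(find_before(":", sample))                  # price
print(find_after("tax: ", sample))               # 7 USD
print(find_between("price: ", " USD", sample))   # 42
print(is_match(r"price", sample))                # True
print(split(r",\s*", sample))                    # ['price: 42 USD', 'tax: 7 USD']
```

Note that `is_match` uses `re.match`, which anchors at the start of the string only; a pattern that must cover the whole input would need `re.fullmatch` instead.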

## Usage

Here are some example usages of the `RegexParser` class:

```python
from cd_parser.regex_parser import RegexParser


# Replace text (arguments: pattern, replacement, input text)
modified_text = RegexParser.replace("an old", "a new", "This is an old text.")
print(modified_text)  # Output: "This is a new text."

# Find all matches
matches = RegexParser.find_all("[A-Za-z]+", "123 apple 456 banana")
print(matches)  # Output: ['apple', 'banana']
```


## Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

## License

[MIT](https://choosealicense.com/licenses/mit/)



# XpathParser

A simple and lightweight XPath parser class for extracting data from HTML/XML content. Built on top of the `lxml` library, it offers a variety of methods for precise element extraction based on various criteria.

## Features
- Fetch multiple elements or a single element using a custom XPath query.
- Predefined methods for common XPath queries like selecting by tag, attribute, text, etc.
- Simple, user-friendly, and Pythonic API.



## Usage

### Initialization
Create an instance of the `XpathParser` class with your HTML/XML content:

```python
from cd_parser.xpath_parser import XpathParser

doc_text = """
<html>
    <body>
        <a id="link1" href="https://example.com/page1">Link 1</a>
        <a id="link2" href="https://example.com/page2">Link 2</a>
    </body>
</html>
"""

parser = XpathParser(doc_text)
```

### Fetch Elements

Using custom XPath:
```python
links = parser.get_elements('//a')
print([link.text for link in links])
```

Get a single element (the first match):
```python
single_link = parser.get_element('//*[@id="link1"]')
# Compare with None explicitly: an lxml element without child elements is falsy.
if single_link is not None:
    print(single_link.text)
```

### Predefined Queries

Select all nodes:
```python
all_nodes = parser.select_all_nodes()
```

Select by tag:
```python
anchors = parser.select_by_tag("a")
```

Select by class attribute:
```python
divs_with_class = parser.select_by_class("div", "my-class")
```

Further predefined queries are available. Refer to the class docstrings for details on each method.
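The README does not list the XPath expressions behind the predefined queries. As a rough sketch only, the helpers below (hypothetical names, not part of `cd_parser`) build the kind of expressions such queries commonly correspond to. Under that assumption, any of the resulting strings could be passed to `parser.get_elements(...)` as a custom query.

```python
def xpath_all_elements():
    """Expression selecting every element in the document."""
    return "//*"


def xpath_by_tag(tag):
    """Expression selecting every element with the given tag name."""
    return f"//{tag}"


def xpath_by_attribute(tag, attribute, value):
    """Expression selecting elements whose attribute equals value.

    Assumes value contains no double quotes.
    """
    return f'//{tag}[@{attribute}="{value}"]'


def xpath_by_class(tag, class_name):
    """Expression matching one whole token of a space-separated class list."""
    # Padding with spaces lets "my-class" match class="box my-class"
    # without also matching class="my-class-2".
    return (
        f'//{tag}[contains(concat(" ", normalize-space(@class), " "), '
        f'" {class_name} ")]'
    )


print(xpath_by_tag("a"))                       # //a
print(xpath_by_attribute("a", "id", "link1"))  # //a[@id="link1"]
print(xpath_by_class("div", "my-class"))
```

The class helper matches on whole class tokens rather than using `@class="my-class"`, because an exact comparison would miss elements that carry more than one class.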

## Contributing
Feel free to fork the repository, make your changes, and submit pull requests. We appreciate all contributions!



## License

MIT License

More documentation at:
[Code Docta](https://codedocta.com "Code Docta")

            

Raw data

            {
    "_id": null,
    "home_page": "https://codedocta.com",
    "name": "CD-Parser",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "codedocta",
    "author_email": "codedocta@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/b8/08/55edc92b97c7af756f87dc46e30bb54ea27384b19f6338d6c0cf0f4284d4/CD_Parser-0.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "An wrapper around the wonderful re and lmxl libraries to make it easier for new users and old. To scrape pages",
    "version": "0.1.2",
    "project_urls": {
        "Bug Reports": "https://github.com/codedocta/CD_Parser/issues",
        "Homepage": "https://codedocta.com",
        "Source": "https://github.com/codedocta/CD_Parser/"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "60f8aa0eabd7fa1eab03a5ad74c6038356cb87b5b5d1d554a87105d30c9a658e",
                "md5": "d0c7d749ff742f672e4a22d1c6c63cff",
                "sha256": "f07d6f5fbea13e5bef58522ca084090afb926ccff69c28bdba784412cb98fc1a"
            },
            "downloads": -1,
            "filename": "CD_Parser-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d0c7d749ff742f672e4a22d1c6c63cff",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 7026,
            "upload_time": "2023-10-27T20:36:50",
            "upload_time_iso_8601": "2023-10-27T20:36:50.372231Z",
            "url": "https://files.pythonhosted.org/packages/60/f8/aa0eabd7fa1eab03a5ad74c6038356cb87b5b5d1d554a87105d30c9a658e/CD_Parser-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b80855edc92b97c7af756f87dc46e30bb54ea27384b19f6338d6c0cf0f4284d4",
                "md5": "bdb4b9f6f378fdd68a4a483febea795b",
                "sha256": "2faac62d3f80c616824cca1517098eb9f67dec6fa9650f7f8e946110d3b56603"
            },
            "downloads": -1,
            "filename": "CD_Parser-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "bdb4b9f6f378fdd68a4a483febea795b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 5582,
            "upload_time": "2023-10-27T20:36:52",
            "upload_time_iso_8601": "2023-10-27T20:36:52.898713Z",
            "url": "https://files.pythonhosted.org/packages/b8/08/55edc92b97c7af756f87dc46e30bb54ea27384b19f6338d6c0cf0f4284d4/CD_Parser-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-10-27 20:36:52",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "codedocta",
    "github_project": "CD_Parser",
    "github_not_found": true,
    "lcname": "cd-parser"
}
        