# WrapperXSelector(CroW)

[![PyPI Version](https://img.shields.io/pypi/v/WrapperXSelector.svg)](https://pypi.org/project/WrapperXSelector/)
[![Python Versions](https://img.shields.io/pypi/pyversions/WrapperXSelector.svg)](https://pypi.org/project/WrapperXSelector/)
[![License](https://img.shields.io/pypi/l/WrapperXSelector.svg)](https://opensource.org/licenses/MIT)

WrapperXSelector (CroW) is a Python-based tool that streamlines web scraping and data extraction through the creation and use of wrappers.
Leveraging Selenium for browser automation and BeautifulSoup for HTML parsing, it lets users set up wrappers for tables and for general
data on web pages. Functions such as `setTableWrapper` and `setGeneralWrapper` define the structure of the extraction, `getWrapperData`
retrieves data based on these wrappers, and `listWrappers` lists the wrappers that already exist.
Overall, WrapperXSelector offers a versatile way to simplify web scraping workflows and improve data extraction efficiency in Python.



**Important Note:**

**1. The Select operation is triggered only by a right mouse click.**

**2. Please make sure Chrome is installed on your machine.**




## Table of Contents

- [Installation](#installation)
  - [Requirements](#requirements)
  - [Installation Steps](#installation-steps)
- [Usage](#usage)
  - [Setting up a Table Wrapper](#setting-up-a-table-wrapper)
  - [Setting up a General Wrapper](#setting-up-a-general-wrapper)
  - [Getting Wrapper Data](#getting-wrapper-data)
  - [Listing Wrappers](#listing-wrappers)
- [Dependencies](#dependencies)
- [Disclaimer](#disclaimer)

## Installation

### Requirements

- Python 3.7 or later
- Chrome browser (for Selenium)

### Installation Steps

You can install WrapperXSelector using pip:

```bash
pip install WrapperXSelector
```
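
As a quick check that the installation worked, you can import the package's functions in a Python session. The sketch below assumes the `web_wrapper_project` import name used in the usage examples further down; if your installation exposes a different module name, adjust the import accordingly.

```python
# Minimal post-install check (assumes the import name used in the examples below).
from web_wrapper_project import (
    setTableWrapper,
    setGeneralWrapper,
    getWrapperData,
    listWrappers,
)

print("WrapperXSelector functions imported successfully.")
```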

## Usage

### Setting up a Table Wrapper

To set up a table wrapper, use the `setTableWrapper` function. This function automates the process of creating a web scraping wrapper for a table on a specified web page.

#### Function Input:

- **URL (string):** The URL of the web page containing the table you want to scrape.
- **Wrapper Name (string, optional):** A custom name for the wrapper. If not provided, a unique name will be generated.

```python
from web_wrapper_project import setTableWrapper

# Example: Setting up a table wrapper for "https://example.com" with a custom name
result = setTableWrapper("https://example.com", wrapper_name="my_table_wrapper")
print(result)
```

#### Function Output:

The `setTableWrapper` function returns a tuple with information about the operation:

- **Success Flag (bool):** True if the operation was successful, False otherwise.
- **Wrapper Name (string):** The name assigned to the wrapper, either custom or auto-generated. None if unsuccessful.
- **Error Code (int or None):** If unsuccessful, an error code indicating the nature of the failure. None if successful.
- **Error Type (string or None):** The type of the raised exception (if any). None if successful.
- **Error Message (string or None):** A descriptive error message (if any). None if successful.

```python
# Example Output
# (True, 'my_table_wrapper_abc123.json', None, None, None)
```
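
Because the error fields are only populated on failure, a common pattern is to unpack the tuple and branch on the success flag. A minimal sketch, assuming the five-element return value described above:

```python
from web_wrapper_project import setTableWrapper

# Unpack the documented five-element result tuple.
success, wrapper_name, error_code, error_type, error_message = setTableWrapper(
    "https://example.com", wrapper_name="my_table_wrapper"
)

if success:
    print(f"Wrapper created: {wrapper_name}")
else:
    # On failure, the error fields describe what went wrong.
    print(f"Wrapper setup failed ({error_code}, {error_type}): {error_message}")
```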



### Setting up a General Wrapper

To set up a general wrapper, use the `setGeneralWrapper` function. This function creates a web scraping wrapper for a general web page structure.

#### Function Input:

- **URL (string):** The URL of the web page you want to scrape.
- **Wrapper Name (string, optional):** A custom name for the wrapper. If not provided, a unique name will be generated.
- **Repeat (string, optional):** Indicates whether to repeat the selection pattern. Options are 'yes' or 'no'; the default is 'no'.

```python
from web_wrapper_project import setGeneralWrapper

# Example: Setting up a general wrapper for "https://example.com" with a custom name and repeat pattern
result = setGeneralWrapper("https://example.com", wrapper_name="my_general_wrapper", repeat="yes")
print(result)
```

#### Function Output:

The `setGeneralWrapper` function returns a tuple with information about the operation:

- **Success Flag (bool):** True if the operation was successful, False otherwise.
- **Wrapper Name (string):** The name assigned to the wrapper, either custom or auto-generated. None if unsuccessful.
- **Error Code (int or None):** If unsuccessful, an error code indicating the nature of the failure. None if successful.
- **Error Type (string or None):** The type of the raised exception (if any). None if successful.
- **Error Message (string or None):** A descriptive error message (if any). None if successful.

```python
# Example Output
# (True, 'my_general_wrapper_abc123.json', None, None, None)
```
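
If you prefer exceptions over checking a flag, the result tuple can be wrapped in a small helper. The `create_general_wrapper` function below is a hypothetical convenience sketch built on the documented return values, not part of the package API:

```python
from web_wrapper_project import setGeneralWrapper


def create_general_wrapper(url, **kwargs):
    """Hypothetical helper: raise on failure instead of returning error fields."""
    success, wrapper_name, error_code, error_type, error_message = setGeneralWrapper(url, **kwargs)
    if not success:
        raise RuntimeError(f"{error_type} ({error_code}): {error_message}")
    return wrapper_name


# Any failure surfaces as a RuntimeError carrying the reported details.
name = create_general_wrapper("https://example.com", wrapper_name="my_general_wrapper", repeat="yes")
print(name)
```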




### Getting Wrapper Data

To retrieve data from a previously created wrapper, use the `getWrapperData` function. This function extracts structured data from a web page based on the defined wrapper.

#### Function Input:

- **Wrapper Name (string):** The name of the wrapper from which to retrieve data.
- **Maximum Data Count (int, optional):** The maximum number of rows to extract (default: 100).
- **URL (string, optional):** The URL of the web page. If not provided, the URL from the original wrapper setup will be used.

```python
from web_wrapper_project import getWrapperData

# Example: Getting data from the wrapper named "my_table_wrapper"
result = getWrapperData("my_table_wrapper", 200, url="https://example.com")
print(result)
```

#### Function Output:

The `getWrapperData` function returns a tuple with information about the operation:

- **Success Flag (bool):** True if the operation was successful, False otherwise.
- **Data (list or string):** If successful, the structured data extracted from the web page based on the wrapper. If unsuccessful, an error message describing the issue.

```python
# Example Output
# (True, [['Column 1', 'Column 2'], ['Data 1', 'Data 2']])

# Example Output for Error
# (False, 'Permission denied: Unable to write to wrappers_5ece4797eaf5e')

```
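
When the call succeeds, the data comes back as a list of rows, so it can be written straight to a CSV file. A minimal sketch, assuming the list-of-rows shape shown in the example output above:

```python
import csv

from web_wrapper_project import getWrapperData

success, data = getWrapperData("my_table_wrapper", 200, url="https://example.com")

if success:
    # Assumes the list-of-rows shape shown above; the first row may be a header.
    with open("my_table_wrapper.csv", "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(data)
    print(f"Saved {len(data)} rows to my_table_wrapper.csv")
else:
    # On failure, the second element is an error message string.
    print(f"Extraction failed: {data}")
```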







### Listing Wrappers

To retrieve a list of all available wrappers, use the `listWrappers` function. This function provides the names of all wrappers that have been set up in the system.

#### Function Input:

None

```python
from web_wrapper_project import listWrappers

# Example: Listing all available wrappers
result = listWrappers()
print(result)
```

#### Function Output:

The `listWrappers` function returns a tuple with information about the operation:

- **Success Flag (bool):** True if the operation was successful, False otherwise.
- **Wrappers (list or string):** If successful, a list containing the names of all available wrappers. If unsuccessful, an error message describing the issue.

```python
# Example Output
# (True, ['wrapper1.json', 'wrapper2.json'])

# Example Output for Error
# (False, 'Permission denied: Unable to read the wrappers')

```
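
Combining `listWrappers` with `getWrapperData` gives a simple way to refresh every stored wrapper in one pass. A rough sketch under the return shapes documented above; it assumes the names returned by `listWrappers` can be passed directly to `getWrapperData`:

```python
from web_wrapper_project import getWrapperData, listWrappers

success, wrappers = listWrappers()
if not success:
    raise SystemExit(f"Could not list wrappers: {wrappers}")

# Pull up to 100 rows (the documented default) from each stored wrapper.
for wrapper_name in wrappers:
    ok, data = getWrapperData(wrapper_name, 100)
    if ok:
        print(f"{wrapper_name}: {len(data)} rows extracted")
    else:
        print(f"{wrapper_name}: failed ({data})")
```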


## Dependencies

- [Selenium](https://pypi.org/project/selenium/)
- ChromeDriver, managed via [webdriver-manager](https://pypi.org/project/webdriver-manager/)
- [beautifulsoup4](https://pypi.org/project/beautifulsoup4/)

## Disclaimer

This tool is intended for educational and lawful use only. Users must comply with the terms of service of the websites they scrape and with applicable laws and regulations.


            
