Web-page-Screenshot-Segmentation

Name	Web-page-Screenshot-Segmentation
Version	1.0.4
Summary	Divide long web page screenshots into blocks for input to models with shorter context windows.
upload_time	2024-01-23 03:41:42
docs_url	None
requires_python	>=3.9
keywords	opencv website image segementation screenshot gpt
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

[![PyPI - Version](https://img.shields.io/pypi/v/Web_page_Screenshot_Segmentation)](https://pypi.org/project/Web_page_Screenshot_Segmentation/) [![GitHub Workflow Status (with event)](https://img.shields.io/github/actions/workflow/status/Tim-Saijun/Web-page-Screenshot-Segmentation/python-publish.yml)](https://github.com/Tim-Saijun/Web-page-Screenshot-Segmentation/actions/workflows/python-publish.yml) [![PyPI - License](https://img.shields.io/pypi/l/Web_page_Screenshot_Segmentation)](https://pypi.org/project/Web_page_Screenshot_Segmentation/) [![Static Badge](https://img.shields.io/badge/%E7%AE%80%E4%BD%93%E4%B8%AD%E6%96%87-8A2BE2)](README-ZH.md) [![Static Badge](https://img.shields.io/badge/English-blue)](README.md)

## Introduction
This project splits long web page screenshots into several parts based on the height of the text. The main idea is to find the low-variation regions of the image and then place a split line inside each of those regions.
![The red lines are split lines](images/demo.png)
The output is a set of small but complete images of the web page, which can be used to generate web pages with [Screen-to-code](https://github.com/abi/screenshot-to-code) or to train models.
More results can be found in the [images](images) directory.
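
The idea of looking for low-variation regions can be sketched in pure Python. This is a minimal illustration and not the package's implementation: the helper names `row_variation` and `find_split_lines`, the list-of-grayscale-rows input, and the choice of placing the line in the middle of each quiet band are all assumptions made for this example.

```python
from statistics import pvariance


def row_variation(gray_rows):
    """Return the population variance of the pixel values in each row."""
    return [pvariance(row) for row in gray_rows]


def find_split_lines(gray_rows, variation_threshold=0.5, height_threshold=3):
    """Find one split line per band of consecutive low-variation rows.

    A band qualifies when it is at least `height_threshold` rows tall.
    The split line is placed in the middle of the band (an assumption).
    """
    lines = []
    band_start = None
    variations = row_variation(gray_rows)
    # A sentinel row closes any band that reaches the bottom edge.
    for y, variation in enumerate(variations + [float("inf")]):
        if variation <= variation_threshold:
            if band_start is None:
                band_start = y
        elif band_start is not None:
            if y - band_start >= height_threshold:
                lines.append((band_start + y - 1) // 2)
            band_start = None
    return lines


# A toy 10-row "screenshot": textured rows separated by flat rows.
textured = [0, 255, 0, 255]
flat = [255, 255, 255, 255]
image = [textured, textured, flat, flat, flat,
         textured, textured, flat, textured, textured]
print(find_split_lines(image))  # prints [3]
```

Only the three-row flat band (rows 2 to 4) is tall enough to qualify, so the single split line lands on row 3; the one-row flat band at row 7 is ignored.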

## Getting started
### Install 
```bash
pip install Web-page-Screenshot-Segmentation
```

## Using in the command line
### Obtain the heights of the split lines
```bash
python -m Web_page_Screenshot_Segmentation.master -f "path/to/img"
```
The output looks like this: `[6, 868, 1912, 2672, 3568, 4444, 5124, 6036, 7698]`. It is the list of heights of the split lines in the image.
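
If you capture that printed list in a script, it can be turned into pixel ranges for each block. The sketch below assumes that the heights are y-coordinates measured in pixels from the top of the image; `parse_heights` and `to_segments` are hypothetical helpers, not part of the package.

```python
import ast


def parse_heights(cli_output):
    """Parse the printed list, e.g. '[6, 868, 1912]', into a list of ints."""
    heights = ast.literal_eval(cli_output.strip())
    return [int(h) for h in heights]


def to_segments(heights, image_height):
    """Turn split-line heights into (top, bottom) pixel ranges."""
    # Add the image edges, drop duplicates, and sort so the ranges
    # cover the whole image from top to bottom.
    edges = sorted({0, image_height, *heights})
    return list(zip(edges[:-1], edges[1:]))


heights = parse_heights(" [6, 868, 1912] ")
print(to_segments(heights, image_height=2000))
# prints [(0, 6), (6, 868), (868, 1912), (1912, 2000)]
```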

If you want to check the split line on the image, you can use the following command:
```bash
python -m Web_page_Screenshot_Segmentation.master -f "path/to/img" -s True
```
The command then returns the path to the result image.

### Draw the split lines on the image
```bash
python -m Web_page_Screenshot_Segmentation.drawer --image_file path/to/image.jpg --hl "[100,200]" --color "(0,255,0)"
```

### Split the image
```bash
python -m Web_page_Screenshot_Segmentation.spliter --f path/to/image.jpg -ht "[233,456]"
```
You will get the split image at the path returned by the command.
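
When calling the splitter from a script, passing the arguments as a list avoids shell quoting problems with the bracketed height list. The following is a sketch only: `spliter_command` is a hypothetical helper, and the flags are copied from the command above rather than checked against the package's argument parser.

```python
import sys


def spliter_command(image_path, heights):
    """Build the argument list for the spliter command shown above."""
    # Format the heights as "[233,456]" (no spaces), matching the example.
    heights_arg = "[" + ",".join(str(int(h)) for h in heights) + "]"
    return [
        sys.executable, "-m", "Web_page_Screenshot_Segmentation.spliter",
        "--f", image_path,
        "-ht", heights_arg,
    ]


command = spliter_command("path/to/image.jpg", [233, 456])
print(command[1:])
# To run it (requires the package to be installed):
# subprocess.run(command, capture_output=True, text=True, check=True)
```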

For details, refer to the help output of each command:
```bash
python -m Web_page_Screenshot_Segmentation.master --help
python -m Web_page_Screenshot_Segmentation.drawer --help
python -m Web_page_Screenshot_Segmentation.spliter --help
```

## Using from the Source Code
 
### split_heights function

The `split_heights` function is used to split an image into several parts based on various thresholds. It takes the following parameters:

- `file_path`: The path of the image file.
- `split`: A boolean indicating whether to split the image.
- `height_threshold`: The height threshold of the low variation region.
- `variation_threshold`: The variation threshold of the low variation region.
- `color_threshold`: The threshold of the color difference.
- `color_variation_threshold`: The threshold of the color difference variation.
- `merge_threshold`: The threshold of the least distance between two lines.

The function returns a list of heights of the split lines if `split` is `False`, or the path of the split image if `split` is `True`.
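
To make the role of `merge_threshold` concrete, here is one way a minimum distance between lines could be enforced. It is a sketch under assumptions: the package may merge nearby lines differently (for example by averaging them), and `merge_close_lines` is a hypothetical name.

```python
def merge_close_lines(heights, merge_threshold=350):
    """Drop any line closer than `merge_threshold` pixels to the last kept line."""
    merged = []
    for h in sorted(heights):
        if not merged or h - merged[-1] >= merge_threshold:
            merged.append(h)
    return merged


print(merge_close_lines([6, 120, 868, 900, 1912], merge_threshold=350))
# prints [6, 868, 1912]
```

A larger threshold yields fewer, taller blocks; a smaller one keeps more of the candidate lines.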

#### Example usage

```python
import Web_page_Screenshot_Segmentation
from Web_page_Screenshot_Segmentation.master import split_heights

# Split the image at 'path/to/image.jpg' into several parts
split_image_path = split_heights(
    file_path='path/to/image.jpg',
    split=True,
    height_threshold=102,
    variation_threshold=0.5,
    color_threshold=100,
    color_variation_threshold=15,
    merge_threshold=350
)

print(f"The split image is saved at {split_image_path}")
```

In this example, the image at 'path/to/image.jpg' is split into several parts based on the provided thresholds. The split image is saved at the path returned by the function.


### draw_line_from_file function

The `draw_line_from_file` function is used to draw lines on an image at specified heights. It takes the following parameters:

- `image_file`: The path of the image file.
- `heights`: A list of heights at which to draw the lines.
- `color`: The color of the lines to be drawn, in BGR channel order as used by OpenCV. The default color is red `(0, 0, 255)`.

The function reads the image from the provided file path, draws lines at the specified heights, and then saves the modified image to a new file. The new file is saved in the `result` directory with the same name as the original file, but with 'result' appended before the file extension.
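
The naming rule can be sketched with `pathlib`. This is an illustration under assumptions: it treats `result` as a directory relative to the current working directory and inserts the word directly before the extension with no separator; `result_path` is a hypothetical helper, and the path returned by the function remains the authoritative value.

```python
from pathlib import Path


def result_path(image_file, result_dir="result"):
    """Build '<result_dir>/<stem>result<suffix>' for a given image path."""
    source = Path(image_file)
    return Path(result_dir) / f"{source.stem}result{source.suffix}"


print(result_path("path/to/image.jpg").as_posix())
# prints result/imageresult.jpg
```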

If the function encounters an error while reading the image file (for example, if the file path contains '.' or Chinese characters), it raises an exception.
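
A script can check for these path problems before calling the function. The helper below is a hypothetical pre-check, not part of the package, and it reads "contains '.'" as meaning a dot other than the one before the file extension, which is an assumption.

```python
from pathlib import Path


def path_warnings(image_file):
    """Return reasons why a path might be hard to read, per the note above."""
    path = Path(image_file)
    warnings = []
    if not image_file.isascii():
        warnings.append("non-ASCII characters in path")
    # Look for dots in directory names or in the file stem; the dot before
    # the extension and '..' parent references are not counted.
    parts = (*path.parent.parts, path.stem)
    if any("." in part for part in parts if part != ".."):
        warnings.append("extra '.' in path")
    return warnings


print(path_warnings("path/to/image.jpg"))    # prints []
print(path_warnings("shots/v1.2/page.jpg"))  # prints ["extra '.' in path"]
```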

#### Example usage

```python
import Web_page_Screenshot_Segmentation
from Web_page_Screenshot_Segmentation.spliter import draw_line_from_file

# Draw lines on the image at 'path/to/image.jpg' at heights 100 and 200
result_image_path = draw_line_from_file(
    image_file='path/to/image.jpg',
    heights=[100, 200],
    color=(0, 255, 0)  # Draw the lines in green
)

print(f"The modified image is saved at {result_image_path}")
```

In this example, the image at 'path/to/image.jpg' is modified by drawing green lines at heights 100 and 200. The modified image is saved at the path returned by the function.

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "Web-page-Screenshot-Segmentation",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": "",
    "keywords": "opencv,website,image segementation,screenshot,gpt",
    "author": "",
    "author_email": "Tim Saijun <code@zair.top>",
    "download_url": "https://files.pythonhosted.org/packages/7e/37/c1a2f89b0c277ce74bca0054ba56480a2f06578e50d5d63657b0d229e568/Web_page_Screenshot_Segmentation-1.0.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Divide long web page screenshots into blocks to input models with shorter contexts. \u5c06\u957f\u7f51\u9875\u622a\u56fe\u8fdb\u884c\u533a\u5757\u5206\u5272\uff0c\u7528\u4e8e\u8f93\u5165\u4e0a\u4e0b\u6587\u8f83\u77ed\u7684\u6a21\u578b",
    "version": "1.0.4",
    "project_urls": {
        "Documentation": "https://tim-saijun.github.io/Web_page_Screenshot_Segmentation/",
        "Homepage": "https://github.com/Tim-Saijun/Web_page_Screenshot_Segmentation"
    },
    "split_keywords": [
        "opencv",
        "website",
        "image segementation",
        "screenshot",
        "gpt"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "034f171760c1fae734d0c218d7a153f6079e332e9a00de3067d771b51728d46b",
                "md5": "180cc3cd93b5005d9410b766f1809001",
                "sha256": "5c943c53d3558c9f7c45086a25d8b2c71fb404deb2f421afb67964847b6af304"
            },
            "downloads": -1,
            "filename": "Web_page_Screenshot_Segmentation-1.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "180cc3cd93b5005d9410b766f1809001",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 10296,
            "upload_time": "2024-01-23T03:41:41",
            "upload_time_iso_8601": "2024-01-23T03:41:41.478720Z",
            "url": "https://files.pythonhosted.org/packages/03/4f/171760c1fae734d0c218d7a153f6079e332e9a00de3067d771b51728d46b/Web_page_Screenshot_Segmentation-1.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7e37c1a2f89b0c277ce74bca0054ba56480a2f06578e50d5d63657b0d229e568",
                "md5": "5ba99dcd63a9cebdb6034d9a913f5358",
                "sha256": "cf3bfb4f6b775bcd6724cd6a1f7e6d3ed5eb6735108a11aeb2ce3e59856fe02f"
            },
            "downloads": -1,
            "filename": "Web_page_Screenshot_Segmentation-1.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "5ba99dcd63a9cebdb6034d9a913f5358",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 9628,
            "upload_time": "2024-01-23T03:41:42",
            "upload_time_iso_8601": "2024-01-23T03:41:42.484829Z",
            "url": "https://files.pythonhosted.org/packages/7e/37/c1a2f89b0c277ce74bca0054ba56480a2f06578e50d5d63657b0d229e568/Web_page_Screenshot_Segmentation-1.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-23 03:41:42",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Tim-Saijun",
    "github_project": "Web_page_Screenshot_Segmentation",
    "github_not_found": true,
    "lcname": "web-page-screenshot-segmentation"
}