pictureTextCrop


NamepictureTextCrop JSON
Version 0.6.1 PyPI version JSON
download
home_page
SummaryInteractive extraction of selected text from images and batch processing of stored image files.
upload_time2024-01-04 05:34:56
maintainer
docs_urlNone
author
requires_python>=3.6
license
keywords image text extraction text extraction screenshot
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # pictureTextCrop

This application can be run by installing the python package dependencies using pip and then simply unzipping the source archive, pictureTextCrop.tar.gz into a new directory that you create to contain the application and its files.  The python packages required, available at PyPI.org, include one rather large one, PyQt5.  The complete list of packages required is: PyQt5, pytesseract, and Pillow, which is included as a library module in current releases of Python.

Installation Steps:

1.    Download the source code archive from this account: PictureTextCrop-main.zip.
2.    Put it in a folder you make to host this application.
3.    Extract the source code archive in its application folder.
4.    Go into the extrated folder, then into the src/ folder.  You will find the main python file: pictureTextCrop.py.  This is the file you will run in a console.
      Open a console / terminal in this folder.  This is the working folder for the application.
6.    Check your python version with python -V or python3 -V.  If these do not work try the -h option for a list of options available.  The version must be 3.6 or higher.
        (Installing python is beyond the scope of this Quick Start guide.)
7.    If the python package installer, pip, is not installed, execute the following command in the console:<br>
  
          sudo apt install python3-pip
9.    To install the python GUI package required, execute:
   
          pip install PyQt5
11.    Then to install the text extraction package, run:
   
           pip install pytesseract
   
      then
      
          sudo apt install tesseract_ocr
13.    Although sqlite3 is provided in python as a library, you will want to use it outside of this context and will want a GUI viewer for the database file:
   
           sudo apt install sqlite3
   
           sudo apt install squlitebrowser
   

Once these steps succeed, go into the PictureTextCrop-main folder, then into its src/ folder, and locate the pictureTextCrop.py source file. To start the application, execute the following on the command line:

$ python3 pictureTextCrop.py


Introduction:

This application is a simple tool which allows you to interactively extract text from image files by dragging from the top-left corner of a crop rectangle to the bottom-right corner. When you release the mouse button the text inside the rectangle is printed to the console and a record of the conversion and its result is made in the SQLite3 database, TextExtraction.db, in the CropLog table. Each record of this table records the date and time of the extraction in its timeStamp field, the full path to the file on your disk of the image file in the filePath field, the coordinates of the crop in the coordinates field, and the text of the crop in the text field. The value of storing all extractions in a database table is simply that you can then search large numbers of images containing particular text using SQL.

The image formats recognized are those recognized by the PyQy5 QImageReader class, which includes those associated with the following file extensions:

    'bmp', 'cur', 'gif', 'icns', 'ico', 'jpeg', 'jpg', 'mng', 'pbm', 'pgm', 'png', 'ppm', 'svg', 'svgz', 'tga', 'tif', 'tiff', 'wbmp', 'webp', 'xbm', 'xpm'

The application main window displays a list on the left which will contain the paths of the image files identified when you select folders to scan.  In the right half of the dialog will be a space which shows the image selected when you click on a file path in the list.  The dialog can be resized to accommodate your needs.

To extract text from an image, select the checkbox located to the left of it and click on the "Select Image Text" button in the button-bar at the top. You can select multiple files and a text selection dialog will be stacked up for each. Press the left mouse button at the top left of a rectangle containing the text you want to extract and then drag to the lower right corner. When you release the button the extracted text is printed to the console and a record of the extraction and its result is made in the SQLite3 database. You can do this as many times as you like for each image. A separate CropLog record will be made for each.

Batch Mode:

If you click on the "Batch Process" button in the tool bar the application will extract all of the text from each of the files in the folder scan list. This process can take some time, my own trials on my ordinary laptop resulting in a pace of more than 5 seconds per image, sometimes 15 or so. With only 50 images, the process could take up to 10 minutes. The resulting extracted text will be placed into the BatchMaster table in the SQLite3 database. The fields written in each record in include the extraction TimeStamp, the FolderPath, the FileName, the Text, and the file metadata in the Info field. Views for reviewing the file metadata and other components saved in convenient formats are planned.


MIME Information Collection:

Coming Soon.

When processing digital evidence you must be careful to preserve the file metadata, especially the time stamps. 
Know how operating systems treat timestamps and permissions when you copy files and how your archiving applications do as well. 


If you would like to support this project, you can contribute here: https://www.buymeacoffee.com/keithmichah


            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "pictureTextCrop",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "Keith Watson <keith.watson.py@gmail.com>",
    "keywords": "image,text,extraction,text extraction,screenshot",
    "author": "",
    "author_email": "Keith Watson <georgekeithwatson@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/f6/47/3fbdaab19136ac8ed4e0bf694e51530b91c24d416143ade96727e8ceb379/pictureTextCrop-0.6.1.tar.gz",
    "platform": null,
    "description": "# pictureTextCrop\n\nThis application can be run by installing the python package dependencies using pip and then simply unzipping the source archive, pictureTextCrop.tar.gz into a new directory that you create to contain the application and its files.  The python packages required, available at PyPI.org, include one rather large one, PyQt5.  The complete list of packages required is: PyQt5, pytesseract, and Pillow, which is included as a library module in current releases of Python.\n\nInstallation Steps:\n\n1.    Download the source code archive from this account: PictureTextCrop-main.zip.\n2.    Put it in a folder you make to host this application.\n3.    Extract the source code archive in its application folder.\n4.    Go into the extrated folder, then into the src/ folder.  You will find the main python file: pictureTextCrop.py.  This is the file you will run in a console.\n      Open a console / terminal in this folder.  This is the working folder for the application.\n6.    Check your python version with python -V or python3 -V.  If these do not work try the -h option for a list of options available.  The version must be 3.6 or higher.\n        (Installing python is beyond the scope of this Quick Start guide.)\n7.    If the python package installer, pip, is not installed, execute the following command in the console:<br>\n  \n          sudo apt install python3-pip\n9.    To install the python GUI package required, execute:\n   \n          pip install PyQt5\n11.    Then to install the text extraction package, run:\n   \n           pip install pytesseract\n   \n      then\n      \n          sudo apt install tesseract_ocr\n13.    Although sqlite3 is provided in python as a library, you will want to use it outside of this context and will want a GUI viewer for the database file:\n   \n           sudo apt install sqlite3\n   \n           sudo apt install squlitebrowser\n   \n\nOnce these steps succeed, go into the PictureTextCrop-main folder, then into its src/ folder, and locate the pictureTextCrop.py source file. To start the application, execute the following on the command line:\n\n$ python3 pictureTextCrop.py\n\n\nIntroduction:\n\nThis application is a simple tool which allows you to interactively extract text from image files by dragging from the top-left corner of a crop rectangle to the bottom-right corner. When you release the mouse button the text inside the rectangle is printed to the console and a record of the conversion and its result is made in the SQLite3 database, TextExtraction.db, in the CropLog table. Each record of this table records the date and time of the extraction in its timeStamp field, the full path to the file on your disk of the image file in the filePath field, the coordinates of the crop in the coordinates field, and the text of the crop in the text field. The value of storing all extractions in a database table is simply that you can then search large numbers of images containing particular text using SQL.\n\nThe image formats recognized are those recognized by the PyQy5 QImageReader class, which includes those associated with the following file extensions:\n\n    'bmp', 'cur', 'gif', 'icns', 'ico', 'jpeg', 'jpg', 'mng', 'pbm', 'pgm', 'png', 'ppm', 'svg', 'svgz', 'tga', 'tif', 'tiff', 'wbmp', 'webp', 'xbm', 'xpm'\n\nThe application main window displays a list on the left which will contain the paths of the image files identified when you select folders to scan.  In the right half of the dialog will be a space which shows the image selected when you click on a file path in the list.  The dialog can be resized to accommodate your needs.\n\nTo extract text from an image, select the checkbox located to the left of it and click on the \"Select Image Text\" button in the button-bar at the top. You can select multiple files and a text selection dialog will be stacked up for each. Press the left mouse button at the top left of a rectangle containing the text you want to extract and then drag to the lower right corner. When you release the button the extracted text is printed to the console and a record of the extraction and its result is made in the SQLite3 database. You can do this as many times as you like for each image. A separate CropLog record will be made for each.\n\nBatch Mode:\n\nIf you click on the \"Batch Process\" button in the tool bar the application will extract all of the text from each of the files in the folder scan list. This process can take some time, my own trials on my ordinary laptop resulting in a pace of more than 5 seconds per image, sometimes 15 or so. With only 50 images, the process could take up to 10 minutes. The resulting extracted text will be placed into the BatchMaster table in the SQLite3 database. The fields written in each record in include the extraction TimeStamp, the FolderPath, the FileName, the Text, and the file metadata in the Info field. Views for reviewing the file metadata and other components saved in convenient formats are planned.\n\n\nMIME Information Collection:\n\nComing Soon.\n\nWhen processing digital evidence you must be careful to preserve the file metadata, especially the time stamps. \nKnow how operating systems treat timestamps and permissions when you copy files and how your archiving applications do as well. \n\n\nIf you would like to support this project, you can contribute here: https://www.buymeacoffee.com/keithmichah\n\n",
    "bugtrack_url": null,
    "license": "",
    "summary": "Interactive extraction of selected text from images and batch processing of stored image files.",
    "version": "0.6.1",
    "project_urls": {
        "Homepage": "https://github.com/keithCollins77093/PictureTextCrop"
    },
    "split_keywords": [
        "image",
        "text",
        "extraction",
        "text extraction",
        "screenshot"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cad02d631682bbff56ca784b18b5dff38d6c10cc846f9f532aca3e5741429ee8",
                "md5": "6d2945e2a70fd8394935eca3ee6307a9",
                "sha256": "720e5f68500e396ad0cd61feff8c17df4fb528a105dfe9e6f52016feec143515"
            },
            "downloads": -1,
            "filename": "pictureTextCrop-0.6.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6d2945e2a70fd8394935eca3ee6307a9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 223696,
            "upload_time": "2024-01-04T05:34:54",
            "upload_time_iso_8601": "2024-01-04T05:34:54.061742Z",
            "url": "https://files.pythonhosted.org/packages/ca/d0/2d631682bbff56ca784b18b5dff38d6c10cc846f9f532aca3e5741429ee8/pictureTextCrop-0.6.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f6473fbdaab19136ac8ed4e0bf694e51530b91c24d416143ade96727e8ceb379",
                "md5": "242d1036a07c49b535bedfa34374b79e",
                "sha256": "dfbe3c8d05b0c4e71e7d6d0ba176f5872989cea4ffcb10c018e37cfcfb91faa4"
            },
            "downloads": -1,
            "filename": "pictureTextCrop-0.6.1.tar.gz",
            "has_sig": false,
            "md5_digest": "242d1036a07c49b535bedfa34374b79e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 212606,
            "upload_time": "2024-01-04T05:34:56",
            "upload_time_iso_8601": "2024-01-04T05:34:56.139994Z",
            "url": "https://files.pythonhosted.org/packages/f6/47/3fbdaab19136ac8ed4e0bf694e51530b91c24d416143ade96727e8ceb379/pictureTextCrop-0.6.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-04 05:34:56",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "keithCollins77093",
    "github_project": "PictureTextCrop",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "picturetextcrop"
}
        
Elapsed time: 0.15738s