ocraccuracyreporter


Name: ocraccuracyreporter
Version: 0.0.5
Home page: https://github.com/lucidprogrammer/ocraccuracyreporter
Summary: OCR Accuracy Reporter
Upload time: 2018-02-16 02:24:48
Author: Lucid Programmer
License: MIT
Requirements: none recorded
============
Overview
============

Your OCR pipeline may have several stages and use different tools.
You need a simple way to run samples, as a whole or piece by piece, and be able to say that the OCR accuracy is, say, 98%.

=========
Usage
=========

$ pip install ocraccuracyreporter

>>> from ocraccuracyreporter.oar import oar

.. topic:: initialising the reporter

>>> oreport = oar(expected='john', given='joh', label='name')

>>> print(oreport)
name,john,joh,86,100,86,86,94,1

Or you may have several OCR results for the same item, so you may want to initialise with the expected value alone,
with or without a label.

>>> oreport = oar(expected='john', label='name')
>>> oreport.given = 'joh'
>>> repr(oreport)

If you are creating a CSV report with header info, ``repr`` gives the header row along with the data row::

  label,expected,given,ratio,partial_ratio,token_sort_ratio,token_set_ratio,jaro_winkler,distance
  name,john,joh,86,100,86,86,94,1
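
To aggregate over a larger sample set, one possible approach (a minimal sketch; the sample data, the output file name and the averaging of the ``ratio`` column are purely illustrative, and ``str()`` of a report is assumed to give the CSV data row shown above) is::

  from ocraccuracyreporter.oar import oar

  # illustrative samples: (label, expected, given)
  samples = [
      ('name', 'john', 'joh'),
      ('city', 'london', 'lond0n'),
  ]

  reports = [oar(expected=e, given=g, label=l) for l, e, g in samples]

  with open('ocr_report.csv', 'w') as f:
      # header row as shown in the repr example above
      f.write('label,expected,given,ratio,partial_ratio,'
              'token_sort_ratio,token_set_ratio,jaro_winkler,distance\n')
      for r in reports:
          f.write(str(r) + '\n')  # str(r) is assumed to be the data row without a newline

  # one simple overall figure: average of the ratio column (4th field)
  ratios = [int(str(r).split(',')[3]) for r in reports]
  print('overall accuracy: {}%'.format(sum(ratios) // len(ratios)))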

.. topic:: Items in the report

  ratio - uses pure Levenshtein distance based matching
          (100 means a perfect match)

  partial_ratio - matches based on the best matching substrings

  token_sort_ratio - tokenizes the strings and sorts the tokens alphabetically before matching

  token_set_ratio - tokenizes the strings and compares the intersection of the token sets

  jaro_winkler - gives more weight to a common prefix
                 (useful when, for example, the start of the string is read correctly but later parts are missing)

  distance - the number of characters in given that actually differ from expected
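
These metric names correspond to the kinds of scores provided by the fuzzywuzzy and python-Levenshtein libraries. The snippet below is only an illustrative sketch using those libraries (an assumption for illustration, not the reporter's own implementation) and reproduces the numbers from the ``name,john,joh`` example::

  from fuzzywuzzy import fuzz  # assumed here purely for illustration
  import Levenshtein           # python-Levenshtein

  expected, given = 'john', 'joh'

  print(fuzz.ratio(expected, given))              # 86  -> ratio
  print(fuzz.partial_ratio(expected, given))      # 100 -> partial_ratio
  print(fuzz.token_sort_ratio(expected, given))   # 86  -> token_sort_ratio
  print(fuzz.token_set_ratio(expected, given))    # 86  -> token_set_ratio
  print(int(round(Levenshtein.jaro_winkler(expected, given) * 100)))  # 94 -> jaro_winkler
  print(Levenshtein.distance(expected, given))    # 1   -> distance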




===============
Class variables
===============

label - a meaningful name for the OCR string
expected - the expected (ground truth) result
given - the result you got out of the OCR pipeline

total_expected_char_count - calculated character count of expected
total_expected_word_count - calculated word count of expected

total_given_char_count - calculated character count of given
total_given_word_count - calculated word count of given
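
A small sketch of reading these attributes after initialisation (attribute names as listed above; what exactly the calculated counts include, e.g. whether spaces are counted, is not asserted here)::

  from ocraccuracyreporter.oar import oar

  oreport = oar(expected='john doe', given='john d0e', label='name')

  print(oreport.label)                      # 'name'
  print(oreport.expected)                   # 'john doe'
  print(oreport.given)                      # 'john d0e'
  print(oreport.total_expected_char_count)  # calculated character count of 'john doe'
  print(oreport.total_expected_word_count)  # calculated word count of 'john doe'
  print(oreport.total_given_char_count)     # calculated character count of 'john d0e'
  print(oreport.total_given_word_count)     # calculated word count of 'john d0e'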

            
