pic2prose


Name: pic2prose
Version: 0.0.2
Summary: Enables real-world data collection by bridging the gap between OCR and NLP, letting you convert the text in any image into ready-to-use NLP data structures.
Upload time: 2023-09-22 05:01:33
Author: Rohit Mishra
Keywords: python, images, text, nlp, natural, language, lexicon
Requirements: none recorded
# pic2prose
A package that takes in images, builds a corpus from the text they contain, and produces NLP data structures for direct use in experimentation and model training.

Take any image containing text and use pic2prose (p2p) to generate NLP data structures ready for fine-tuning LLMs, generating embeddings, sentiment classification, and more.
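The README does not describe what happens between OCR and the NLP structures. As a hypothetical sketch only, the function below assumes the OCR step has already produced a plain string and turns it into a corpus: it re-joins words hyphenated across line breaks, splits sentences on `.`, `!` and `?`, and lowercases regex tokens. The name `build_corpus` and every cleaning rule in it are illustrative assumptions, not pic2prose's API.

```python
from __future__ import annotations

import re

# Hypothetical sketch of OCR-text cleanup; not pic2prose's implementation.


def build_corpus(ocr_text: str) -> tuple[list[list[str]], list[str]]:
    """Turn raw OCR output into tokenized sentences and a sorted vocabulary."""
    # Re-join words that OCR split across lines with a hyphen ("exam-\nple" -> "example").
    text = re.sub(r"-\n\s*", "", ocr_text)
    # Collapse remaining line breaks and runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Split into sentences after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    # Lowercase and keep runs of letters, digits, and apostrophes as tokens.
    documents = [re.findall(r"[a-z0-9']+", s.lower()) for s in sentences]
    documents = [doc for doc in documents if doc]
    vocabulary = sorted({tok for doc in documents for tok in doc})
    return documents, vocabulary


if __name__ == "__main__":
    raw = "OCR turns pic-\ntures into text.\nNLP turns text into data!"
    docs, vocab = build_corpus(raw)
    print(docs)   # [['ocr', 'turns', 'pictures', 'into', 'text'], ['nlp', 'turns', 'text', 'into', 'data']]
    print(vocab)  # ['data', 'into', 'nlp', 'ocr', 'pictures', 'text', 'turns']
```

Re-joining on a hyphen followed by a line break will also merge genuine hyphenated compounds that happen to wrap, which is a common trade-off when cleaning OCR output.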

# Installation
```
pip install pic2prose
```

Open your favorite editor, import the package, and build a corpus:
```
from pic2prose.structures import Corp

# initialize the object
# may take longer if you're not using a GPU
corpus = Corp(image_path="ex1.png")

# generate co-occurrence matrix
corpus.get_co_occurrence_matrix()

# generate tf-idf matrix
corpus.get_tfidf_matrix()

# one-hot encodings
corpus.one_hot_encode()
```
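The README does not say how `Corp` defines these three structures (window size, tf-idf weighting, or what exactly gets one-hot encoded). As a rough, hypothetical sketch of what each typically means, the pure-Python functions below take a corpus as a list of tokenized documents and build a symmetric windowed co-occurrence matrix, a document-by-term tf-idf matrix, and one-hot vectors over the vocabulary. The function names, the window of 2, and the unsmoothed idf = log(N / df) are illustrative assumptions, not pic2prose's API or defaults.

```python
from __future__ import annotations

import math
from collections import Counter

# Hypothetical sketch of common NLP structures; not pic2prose's implementation.


def vocabulary_index(documents: list[list[str]]) -> dict[str, int]:
    """Map each distinct token to a row/column index, in sorted order."""
    vocab = sorted({tok for doc in documents for tok in doc})
    return {tok: i for i, tok in enumerate(vocab)}


def co_occurrence_matrix(documents: list[list[str]], window: int = 2) -> list[list[int]]:
    """Count token pairs that appear within `window` positions of each other."""
    index = vocabulary_index(documents)
    matrix = [[0] * len(index) for _ in index]
    for doc in documents:
        for i, word in enumerate(doc):
            # Scan only to the right, then record both directions; this equals
            # counting neighbors on both sides of every position.
            for j in range(i + 1, min(i + window + 1, len(doc))):
                a, b = index[word], index[doc[j]]
                matrix[a][b] += 1
                matrix[b][a] += 1
    return matrix


def tfidf_matrix(documents: list[list[str]]) -> list[list[float]]:
    """Rows are documents, columns are vocabulary terms.

    tf = count / document length; idf = log(N / df), unsmoothed (an assumption).
    """
    index = vocabulary_index(documents)
    n_docs = len(documents)
    # Document frequency: in how many documents each token appears at least once.
    df = Counter(tok for doc in documents for tok in set(doc))
    rows = []
    for doc in documents:
        row = [0.0] * len(index)
        for tok, count in Counter(doc).items():
            row[index[tok]] = (count / len(doc)) * math.log(n_docs / df[tok])
        rows.append(row)
    return rows


def one_hot_encode(documents: list[list[str]]) -> dict[str, list[int]]:
    """Give every vocabulary token a vector with a single 1 at its index."""
    index = vocabulary_index(documents)
    size = len(index)
    return {tok: [1 if j == i else 0 for j in range(size)] for tok, i in index.items()}


if __name__ == "__main__":
    docs = [["ocr", "turns", "pictures", "into", "text"],
            ["nlp", "turns", "text", "into", "data"]]
    index = vocabulary_index(docs)
    cooc = co_occurrence_matrix(docs)
    tfidf = tfidf_matrix(docs)
    print(cooc[index["turns"]][index["into"]])   # 2
    print(round(tfidf[0][index["ocr"]], 4))      # 0.1386
    print(one_hot_encode(docs)["data"])          # [1, 0, 0, 0, 0, 0, 0]
```

With unsmoothed idf, a term that appears in every document gets weight 0. Libraries differ here: scikit-learn's `TfidfVectorizer`, for example, smooths idf and L2-normalizes rows by default, so its numbers will not match this sketch.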

# Coming Soon
Support for building corpora from URLs

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "pic2prose",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "python,images,text,nlp,natural,language,lexicon",
    "author": "Rohit Mishra",
    "author_email": "<rohitnmishra2@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/92/96/b5153f18a6170b8c3494e13f3fe69cee953a1ddbb413168684b8e8144f27/pic2prose-0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Enables real-world data collection, bridges the gap between OCR and NLP, enabling you to convert text from any image to ready to use nlp data structures.",
    "version": "0.0.2",
    "project_urls": null,
    "split_keywords": [
        "python",
        "images",
        "text",
        "nlp",
        "natural",
        "language",
        "lexicon"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5b112370c3e43945cd8776f3774f2ac5173de108d46a43b04b7fc8d32a1775a7",
                "md5": "fffb41f158cd904e12bacced79a12bca",
                "sha256": "74c80581b289a18e4cd3fc9f05137f6ba35e588ef2029feb8a13335e50521d27"
            },
            "downloads": -1,
            "filename": "pic2prose-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fffb41f158cd904e12bacced79a12bca",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 4205,
            "upload_time": "2023-09-22T05:01:31",
            "upload_time_iso_8601": "2023-09-22T05:01:31.526642Z",
            "url": "https://files.pythonhosted.org/packages/5b/11/2370c3e43945cd8776f3774f2ac5173de108d46a43b04b7fc8d32a1775a7/pic2prose-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9296b5153f18a6170b8c3494e13f3fe69cee953a1ddbb413168684b8e8144f27",
                "md5": "999d9cc68db96c98e83eaf68d42e7070",
                "sha256": "6738f67247805f53325aa140999c2073fabc46bfaa552139fa01dc46c14f8984"
            },
            "downloads": -1,
            "filename": "pic2prose-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "999d9cc68db96c98e83eaf68d42e7070",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 4033,
            "upload_time": "2023-09-22T05:01:33",
            "upload_time_iso_8601": "2023-09-22T05:01:33.687385Z",
            "url": "https://files.pythonhosted.org/packages/92/96/b5153f18a6170b8c3494e13f3fe69cee953a1ddbb413168684b8e8144f27/pic2prose-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-22 05:01:33",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "pic2prose"
}
        