Name | pic2prose |
Version | 0.0.2 |
home_page | |
Summary | Enables real-world data collection and bridges the gap between OCR and NLP by converting text from any image into ready-to-use NLP data structures. |
upload_time | 2023-09-22 05:01:33 |
maintainer | |
docs_url | None |
author | Rohit Mishra |
requires_python | |
license | |
keywords | python, images, text, nlp, natural, language, lexicon |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# pic2prose
A package that takes in images, builds a corpus from the text they contain, and produces NLP data structures for direct use in experimentation and model training.
Take any image with text and use p2p to generate NLP data structures ready for fine-tuning LLMs, generating embeddings, sentiment classification, etc.
# Installation
```
pip install pic2prose
```
Open up your favorite editor, import the package, and build a robust corpus.
```
# import the corpus class explicitly instead of using a wildcard import
from pic2prose.structures import Corp
# initialize the object
# may take longer if you're not using a GPU
corpus = Corp(image_path="ex1.png")
# generate co-occurrence matrix
corpus.get_co_occurrence_matrix()
# generate tf-idf matrix
corpus.get_tfidf_matrix()
# one-hot encodings
corpus.one_hot_encode()
```
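To make the three structures above concrete, here is a minimal pure-Python sketch of what a co-occurrence matrix, a tf-idf matrix, and one-hot encodings look like for already-extracted text. It is not pic2prose's implementation: the helper names (`tokenize`, `build_vocab`, `co_occurrence_matrix`, `tfidf_matrix`, `one_hot_encode` as a free function), the window size, and the tf-idf weighting are all assumptions chosen for illustration, and the two sample lines stand in for text that OCR would produce.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,;:!?\"'()").lower() for word in text.split())
    return [tok for tok in tokens if tok]


def build_vocab(docs):
    """Return the sorted list of unique tokens across all documents."""
    return sorted({tok for doc in docs for tok in doc})


def co_occurrence_matrix(docs, vocab, window=1):
    """Count how often two tokens appear within `window` positions of each other."""
    index = {word: i for i, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for doc in docs:
        for i, word in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    matrix[index[word]][index[doc[j]]] += 1
    return matrix


def tfidf_matrix(docs, vocab):
    """One row per document: term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        rows.append([
            (counts[w] / total) * math.log(n_docs / doc_freq[w]) if total else 0.0
            for w in vocab
        ])
    return rows


def one_hot_encode(doc, vocab):
    """Map each token to a vector with a single 1 at its vocabulary index."""
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for tok in doc:
        vec = [0] * len(vocab)
        vec[index[tok]] = 1
        vectors.append(vec)
    return vectors


# Sample lines standing in for text extracted from an image (hypothetical data).
lines = ["The cat sat.", "The dog sat down."]
docs = [tokenize(line) for line in lines]
vocab = build_vocab(docs)
cooc = co_occurrence_matrix(docs, vocab)
tfidf = tfidf_matrix(docs, vocab)
one_hot = one_hot_encode(docs[0], vocab)
```

Words that appear in every document (here, "the" and "sat") get a tf-idf weight of zero under this unsmoothed formula; libraries commonly apply smoothing, so their numbers will differ.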
# Coming Soon
Support for building corpora from URLs
Raw data
{
"_id": null,
"home_page": "",
"name": "pic2prose",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "python,images,text,nlp,natural,language,lexicon",
"author": "Rohit Mishra",
"author_email": "<rohitnmishra2@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/92/96/b5153f18a6170b8c3494e13f3fe69cee953a1ddbb413168684b8e8144f27/pic2prose-0.0.2.tar.gz",
"platform": null,
"description": "\n# pic2prose\nA package that can take in images and build a corpus and produce nlp datastructures for direct use in experimentation and model training.\n\nTake any image with text, use p2p to generate NLP datastructures ready for use in fine-tuning LLM's, generating embeddings, sentiment classification, etc.\n\n# Installation\n```\npip install pic2prose\n```\n\nOpen up your favorite editor, import, and build a robust corpus.\n```\nfrom pic2prose.structures import *\n\n# initialize the object\n# may take longer if you're not using a GPU\ncorpus = Corp(image_path=\"ex1.png\")\n\n# generate co-occurrence matrix\ncorpus.get_co_occurrence_matrix()\n\n# generate tf-idf matrix\ncorpus.get_tfidf_matrix()\n\n# one-hot encodings\ncorpus.one_hot_encode()\n```\n\n# Coming Soon\nSupport for building corpi from URL\n",
"bugtrack_url": null,
"license": "",
"summary": "Enables real-world data collection, bridges the gap between OCR and NLP, enabling you to convert text from any image to ready to use nlp data structures.",
"version": "0.0.2",
"project_urls": null,
"split_keywords": [
"python",
"images",
"text",
"nlp",
"natural",
"language",
"lexicon"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5b112370c3e43945cd8776f3774f2ac5173de108d46a43b04b7fc8d32a1775a7",
"md5": "fffb41f158cd904e12bacced79a12bca",
"sha256": "74c80581b289a18e4cd3fc9f05137f6ba35e588ef2029feb8a13335e50521d27"
},
"downloads": -1,
"filename": "pic2prose-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "fffb41f158cd904e12bacced79a12bca",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 4205,
"upload_time": "2023-09-22T05:01:31",
"upload_time_iso_8601": "2023-09-22T05:01:31.526642Z",
"url": "https://files.pythonhosted.org/packages/5b/11/2370c3e43945cd8776f3774f2ac5173de108d46a43b04b7fc8d32a1775a7/pic2prose-0.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "9296b5153f18a6170b8c3494e13f3fe69cee953a1ddbb413168684b8e8144f27",
"md5": "999d9cc68db96c98e83eaf68d42e7070",
"sha256": "6738f67247805f53325aa140999c2073fabc46bfaa552139fa01dc46c14f8984"
},
"downloads": -1,
"filename": "pic2prose-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "999d9cc68db96c98e83eaf68d42e7070",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 4033,
"upload_time": "2023-09-22T05:01:33",
"upload_time_iso_8601": "2023-09-22T05:01:33.687385Z",
"url": "https://files.pythonhosted.org/packages/92/96/b5153f18a6170b8c3494e13f3fe69cee953a1ddbb413168684b8e8144f27/pic2prose-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-09-22 05:01:33",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "pic2prose"
}