# djangotextsplitter
This package is meant as a django-extension for the pdftextsplitter package. As such, the pdftextsplitter package
should be installed before this package. <br />
<br />
This django-extension provides an out-of-the-box django-app with database models for the python-classes in pdftextsplitter.
As such, it becomes possible to store the results of the pdftextsplitter package in the django database. <br />
<br />
The django-application in this package does not contain any views, urls, templates, static files or any
other functionality. It provides only database models (including admin-registration) and load/write functions.
These models and load/write functions can then be used in other django applications, together with
the pdftextsplitter engine. <br />
<br />
Installation works like: pip install djangotextsplitter <br />
<br />
## List of database models
The database models in this application are:
* textpart (corresponds to the textpart-class from the pdftextsplitter-package)
* fontregion (corresponds to the fontregion-class from the pdftextsplitter-package)
* lineregion (corresponds to the lineregion-class from the pdftextsplitter-package)
* readingline (needed to store certain information, but does not have an equivalent in the pdftextsplitter-package)
* readinghistogram (needed to store certain information, but does not have an equivalent in the pdftextsplitter-package)
* title (corresponds to the title-class from the pdftextsplitter-package)
* body (corresponds to the body-class from the pdftextsplitter-package)
* footer (corresponds to the footer-class from the pdftextsplitter-package)
* headlines (corresponds to the headlines-class from the pdftextsplitter-package)
* headlines_hierarchy (needed to store certain information, but does not have an equivalent in the pdftextsplitter-package)
* enumeration (corresponds to the enumeration-class from the pdftextsplitter-package)
* enumeration_hierarchy (needed to store certain information, but does not have an equivalent in the pdftextsplitter-package)
* textsplitter (corresponds to the textsplitter-class from the pdftextsplitter-package)
* native_toc_element (corresponds to the native_toc_element-class from the pdftextsplitter-package)
* breakdown_decision (needed to store certain information, but does not have an equivalent in the pdftextsplitter-package)
* textalinea (corresponds to the textalinea-class from the pdftextsplitter-package)
## Getting started
Within a django-environment (provided that djangotextsplitter is installed in the virtual environment and registered in the django project),
one can simply access a model by importing it: <br />
from djangotextsplitter.models import textsplitter as db_textsplitter <br />
We recommend using the 'as db_' to distinguish django database models from base classes in the pdftextsplitter-package. <br />
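
Registering the app in a django project usually means listing it in INSTALLED_APPS in the project's settings.py. The fragment below is a minimal sketch, not taken from the package documentation: it assumes that the app is registered under its import name djangotextsplitter, and the other entries are simply the defaults of a freshly generated django project.

```python
# settings.py (fragment) of a hypothetical django project
INSTALLED_APPS = [
    # Default django apps; the admin, auth and contenttypes apps
    # are needed for the admin registration of the models.
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    # Assumption: the app is registered under its import name.
    "djangotextsplitter",
]
```

If the package ships its own migrations, running the usual `python manage.py migrate` afterwards should create the database tables for the models.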
Loading/writing operations can be accessed with: <br />
from djangotextsplitter.loads import load_textsplitter <br />
Each model that has an associated class in pdftextsplitter has a load-function, a newwrite-function, an overwrite-function and a delete-function. <br />
They can be called as: <br />
from pdftextsplitter import textsplitter <br />
from djangotextsplitter.models import textsplitter as db_textsplitter <br />
from djangotextsplitter.loads import load_textsplitter <br />
from djangotextsplitter.newwrites import newwrite_textsplitter <br />
from djangotextsplitter.overwrites import overwrite_textsplitter <br />
from djangotextsplitter.deletes import delete_textsplitter <br />
mysplitter = load_textsplitter(31) # 31 is database primary key; in django the pk <br />
db_splitter = newwrite_textsplitter(mysplitter) # No need for a key here, as it is appended to the list <br />
db_splitter = overwrite_textsplitter(31,mysplitter) # 31 is database primary key; in django the pk <br />
delete_textsplitter(31) # 31 is database primary key; in django the pk <br />
<br />
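
The choice between newwrite and overwrite in the calls above can be wrapped in a small helper. This is only a sketch: `store_splitter` is a hypothetical name that is not part of djangotextsplitter, and the write functions are passed in as arguments so that the block runs without a configured django project.

```python
from typing import Any, Callable, Optional


def store_splitter(
    splitter: Any,
    newwrite: Callable[[Any], Any],
    overwrite: Callable[[int, Any], Any],
    pk: Optional[int] = None,
) -> Any:
    """Hypothetical helper: create a new record, or replace the record with primary key pk."""
    if pk is None:
        # No primary key given: store the object as a new database record.
        return newwrite(splitter)
    # Primary key given: replace the existing record with that key.
    return overwrite(pk, splitter)


# Inside a configured django project (not executed here), one would call:
# from djangotextsplitter.newwrites import newwrite_textsplitter
# from djangotextsplitter.overwrites import overwrite_textsplitter
# db_splitter = store_splitter(mysplitter, newwrite_textsplitter, overwrite_textsplitter, pk=31)
```

Passing the write functions explicitly also makes it easy to test the calling code with simple stand-ins instead of a real database.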
For further details, we refer the user to the documentation of [pdftextsplitter](https://pypi.org/project/pdftextsplitter/), or to the more detailed documentation in the docs-folder of this package. <br />
djangotextsplitter is not very complicated. It just provides the database models and load/newwrite/overwrite/delete functions for the pdftextsplitter package, so that the pdftextsplitter package can be used efficiently from within a django web application.
## Permissions
The admin registration of the models is done in such a way that only superusers have access
to the models in the django admin, even if other users have admin-access and the permissions
to view/add/change/delete them. This is done to ensure that the models are only changed through
the load/newwrite/overwrite/delete functions. If someone were to manually change the structure
of the models somewhere in the hierarchy, this could cause major disruptions.
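
A superuser-only restriction of this kind could be built on the permission hooks of django's ModelAdmin. The sketch below shows one possible way and is not necessarily how this package implements it: `SuperuserOnlyAdminMixin` is a hypothetical name, and the mixin is written as a plain python class, so it runs without django installed.

```python
class SuperuserOnlyAdminMixin:
    """Hypothetical mixin: list it before admin.ModelAdmin in the base classes."""

    @staticmethod
    def _is_superuser(request) -> bool:
        # Requests without a user attribute are treated as non-superusers.
        user = getattr(request, "user", None)
        return bool(getattr(user, "is_superuser", False))

    def has_module_permission(self, request):
        # Controls whether the app shows up on the admin index page.
        return self._is_superuser(request)

    def has_view_permission(self, request, obj=None):
        return self._is_superuser(request)

    def has_add_permission(self, request):
        return self._is_superuser(request)

    def has_change_permission(self, request, obj=None):
        return self._is_superuser(request)

    def has_delete_permission(self, request, obj=None):
        return self._is_superuser(request)
```

In a django project the mixin would be combined with a model admin, for example `class TextsplitterAdmin(SuperuserOnlyAdminMixin, admin.ModelAdmin)`, where `TextsplitterAdmin` is again only an illustrative name.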
Raw data
{
"_id": null,
"home_page": "https://gitlab.com/datainnovatielab/public/djangotextsplitter/dist/",
"name": "djangotextsplitter",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": "",
"keywords": "NLP,PDF,Text recognition,Structure recognition,ChatGPT",
"author": "Unit Data en Innovatie, Ministerie van Infrastructuur en Waterstaat, Netherlands",
"author_email": "dataloket@minienw.nl",
"download_url": "https://files.pythonhosted.org/packages/5f/b7/5dcd1c08fb8f9aad431fba267e9aa91755d6b0860114f6c34713b9509cf6/djangotextsplitter-1.2.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "This package allows the pdftextsplitter engine to communicate with a Django-database",
"version": "1.2.1",
"project_urls": {
"Download": "https://gitlab.com/datainnovatielab/public/djangotextsplitter/dist/",
"Homepage": "https://gitlab.com/datainnovatielab/public/djangotextsplitter/dist/"
},
"split_keywords": [
"nlp",
"pdf",
"text recognition",
"structure recognition",
"chatgpt"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "447cb434dfcf1f4db95d9da2537ad96a2587dc984a979753dff7ed9b0a56ff34",
"md5": "14fca6251461a3e3d11938b02c79d22e",
"sha256": "0ab89b6762a814e01395ce0745268fdf0f4e1821052ab13d638d628187068c6e"
},
"downloads": -1,
"filename": "djangotextsplitter-1.2.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "14fca6251461a3e3d11938b02c79d22e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 61774,
"upload_time": "2023-12-15T13:54:00",
"upload_time_iso_8601": "2023-12-15T13:54:00.533076Z",
"url": "https://files.pythonhosted.org/packages/44/7c/b434dfcf1f4db95d9da2537ad96a2587dc984a979753dff7ed9b0a56ff34/djangotextsplitter-1.2.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "5fb75dcd1c08fb8f9aad431fba267e9aa91755d6b0860114f6c34713b9509cf6",
"md5": "9fa606e96f2f104e4b92fe82667c465c",
"sha256": "0f8b87a76b10676381256d25117ef1545775fdee735cfa64af8fc5bc0a47ce25"
},
"downloads": -1,
"filename": "djangotextsplitter-1.2.1.tar.gz",
"has_sig": false,
"md5_digest": "9fa606e96f2f104e4b92fe82667c465c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 560819,
"upload_time": "2023-12-15T13:54:02",
"upload_time_iso_8601": "2023-12-15T13:54:02.845159Z",
"url": "https://files.pythonhosted.org/packages/5f/b7/5dcd1c08fb8f9aad431fba267e9aa91755d6b0860114f6c34713b9509cf6/djangotextsplitter-1.2.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-12-15 13:54:02",
"github": false,
"gitlab": true,
"bitbucket": false,
"codeberg": false,
"gitlab_user": "datainnovatielab",
"gitlab_project": "public",
"lcname": "djangotextsplitter"
}