djangotextsplitter


Name: djangotextsplitter
Version: 1.2.1
Home page: https://gitlab.com/datainnovatielab/public/djangotextsplitter/dist/
Summary: This package allows the pdftextsplitter engine to communicate with a Django-database
Upload time: 2023-12-15 13:54:02
Author: Unit Data en Innovatie, Ministerie van Infrastructuur en Waterstaat, Netherlands
Requires Python: >=3.10
License: MIT
Keywords: nlp, pdf, text recognition, structure recognition, chatgpt
Requirements: No requirements were recorded.
# djangotextsplitter

This package is a Django extension for the pdftextsplitter package, so pdftextsplitter
should be installed before this package. <br />
<br />
The extension provides an out-of-the-box Django app with database models for the Python classes in pdftextsplitter,
which makes it possible to store the results of pdftextsplitter in the Django database. <br />
<br />
The Django application in this package does not contain any views, URLs, templates, static files or
other functionality; it contains only database models (including admin registration) and load/write functions.
These models and load/write functions can then be used in other Django applications, together with
the pdftextsplitter engine. <br />
<br />
Install with: pip install djangotextsplitter <br />
<br />
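Besides installing the package, the app has to be registered in the Django project. In Django this is normally done through `INSTALLED_APPS` in the project's settings. The fragment below is a minimal sketch, assuming the app label matches the import path `djangotextsplitter` used in this README; the other entries are the standard Django contrib apps and are shown only for context.

```python
# settings.py (fragment)
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    # Assumption: the app label equals the import path used in this README.
    "djangotextsplitter",
]
```

Running `python manage.py migrate` afterwards is the usual Django way to create the database tables for a newly registered app.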

## List of database models

The database models in this application are:
* textpart (corresponds to the textpart-class from the pdftextsplitter-package)
* fontregion (corresponds to the fontregion-class from the pdftextsplitter-package)
* lineregion (corresponds to the lineregion-class from the pdftextsplitter-package)
* readingline (needed to store certain information, but does not have an equivalent in the pdftextsplitter-package)
* readinghistogram (needed to store certain information, but does not have an equivalent in the pdftextsplitter-package)
* title (corresponds to the title-class from the pdftextsplitter-package)
* body (corresponds to the body-class from the pdftextsplitter-package)
* footer (corresponds to the footer-class from the pdftextsplitter-package)
* headlines (corresponds to the headlines-class from the pdftextsplitter-package)
* headlines_hierarchy (needed to store certain information, but does not have an equivalent in the pdftextsplitter-package)
* enumeration (corresponds to the enumeration-class from the pdftextsplitter-package)
* enumeration_hierarchy (needed to store certain information, but does not have an equivalent in the pdftextsplitter-package)
* textsplitter (corresponds to the textsplitter-class from the pdftextsplitter-package)
* native_toc_element (corresponds to the native_toc_element-class from the pdftextsplitter-package)
* breakdown_decision (needed to store certain information, but does not have an equivalent in the pdftextsplitter-package)
* textalinea (corresponds to the textalinea-class from the pdftextsplitter-package)

## Getting started

Within a Django environment (with djangotextsplitter installed in the virtual environment and registered in the Django project),
the models can be imported directly: <br />
from djangotextsplitter.models import textsplitter as db_textsplitter <br />
We recommend the 'as db_' alias to distinguish the Django database models from the base classes of the same name in the pdftextsplitter package. <br />
The loading/writing operations can be imported with: <br />
from djangotextsplitter.loads import load_textsplitter <br />
Each model that has an associated class in pdftextsplitter has a load function, a newwrite function, an overwrite function and a delete function. <br />
They can be called as follows: <br />
from pdftextsplitter import textsplitter <br />
from djangotextsplitter.models import textsplitter as db_textsplitter <br />
from djangotextsplitter.loads import load_textsplitter <br />
from djangotextsplitter.newwrites import newwrite_textsplitter <br />
from djangotextsplitter.overwrites import overwrite_textsplitter <br />
from djangotextsplitter.deletes import delete_textsplitter <br />
mysplitter = load_textsplitter(31) # 31 is the database primary key (the Django pk) <br />
db_splitter = newwrite_textsplitter(mysplitter) # no key needed: the object is appended as a new record <br />
db_splitter = overwrite_textsplitter(31, mysplitter) # 31 is the primary key of the record to overwrite <br />
delete_textsplitter(31) # 31 is the primary key of the record to delete <br />
<br />
For further details, see the documentation of [pdftextsplitter](https://pypi.org/project/pdftextsplitter/), or the more detailed documentation in the docs-folder of this package. <br />
djangotextsplitter is deliberately small: it only provides the database models and the load/newwrite/overwrite/delete functions for the pdftextsplitter package, so that pdftextsplitter can be used efficiently from within a Django web application.
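
Because newwrite creates a new record and overwrite replaces an existing one, calling code often has to choose between the two depending on whether a primary key is already known. The helper below is a hypothetical sketch of that choice and is not part of djangotextsplitter. It receives the two write functions as arguments so that it runs without Django, and it assumes only the call shapes shown above: `newwrite_textsplitter(obj)` and `overwrite_textsplitter(pk, obj)`.

```python
from typing import Any, Callable, Optional


def save_splitter(
    splitter: Any,
    pk: Optional[int] = None,
    *,
    newwrite: Callable[[Any], Any],
    overwrite: Callable[[int, Any], Any],
) -> Any:
    """Store `splitter`, creating a new record or replacing an existing one.

    `newwrite` and `overwrite` are injected so this helper does not import
    Django itself. In a real project they would be newwrite_textsplitter and
    overwrite_textsplitter from djangotextsplitter.
    """
    if pk is None:
        # No primary key known yet: create a new database record.
        return newwrite(splitter)
    # A primary key is known: replace the record stored under it.
    return overwrite(pk, splitter)


# Hypothetical usage inside a Django project:
# db_splitter = save_splitter(
#     mysplitter, 31,
#     newwrite=newwrite_textsplitter,
#     overwrite=overwrite_textsplitter,
# )
```

Passing the functions in also makes the helper easy to unit-test with simple stand-ins, without touching a database.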

## Permissions

The models are registered in the admin in such a way that only superusers can access
them there, even if other users have admin access and the permissions
to view/add/change/delete them. This forces users to change the models only through
the load/newwrite/overwrite/delete functions. Manually changing the structure
of the models somewhere in the hierarchy could cause major disruptions.
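
How this package implements the restriction is not shown in this README. One common Django approach is to override the permission hooks of `ModelAdmin` so that they depend only on `is_superuser`. The mixin below is a hypothetical sketch of that idea: the class name is an assumption, and the stand-in request objects exist only so that the example runs without Django installed.

```python
from types import SimpleNamespace


class SuperuserOnlyAdminMixin:
    """Hypothetical mixin: grants admin access to superusers only.

    Intended to be combined with django.contrib.admin.ModelAdmin, e.g.
        class TextsplitterAdmin(SuperuserOnlyAdminMixin, admin.ModelAdmin): ...
    The method names below are the permission hooks that ModelAdmin defines.
    """

    def has_module_permission(self, request):
        return request.user.is_superuser

    def has_view_permission(self, request, obj=None):
        return request.user.is_superuser

    def has_add_permission(self, request):
        return request.user.is_superuser

    def has_change_permission(self, request, obj=None):
        return request.user.is_superuser

    def has_delete_permission(self, request, obj=None):
        return request.user.is_superuser


# Stand-in request objects so the sketch runs without Django installed.
staff_request = SimpleNamespace(user=SimpleNamespace(is_staff=True, is_superuser=False))
super_request = SimpleNamespace(user=SimpleNamespace(is_staff=True, is_superuser=True))

admin_rules = SuperuserOnlyAdminMixin()
print(admin_rules.has_change_permission(staff_request))  # False
print(admin_rules.has_change_permission(super_request))  # True
```

With hooks like these, a staff user who holds the model permissions still does not see or edit the models in the admin, which matches the behaviour described above.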

            

Raw data

            {
    "_id": null,
    "home_page": "https://gitlab.com/datainnovatielab/public/djangotextsplitter/dist/",
    "name": "djangotextsplitter",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "",
    "keywords": "NLP,PDF,Text recognition,Structure recognition,ChatGPT",
    "author": "Unit Data en Innovatie, Ministerie van Infrastructuur en Waterstaat, Netherlands",
    "author_email": "dataloket@minienw.nl",
    "download_url": "https://files.pythonhosted.org/packages/5f/b7/5dcd1c08fb8f9aad431fba267e9aa91755d6b0860114f6c34713b9509cf6/djangotextsplitter-1.2.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "This package allows the pdftextsplitter engine to communicate with a Django-database",
    "version": "1.2.1",
    "project_urls": {
        "Download": "https://gitlab.com/datainnovatielab/public/djangotextsplitter/dist/",
        "Homepage": "https://gitlab.com/datainnovatielab/public/djangotextsplitter/dist/"
    },
    "split_keywords": [
        "nlp",
        "pdf",
        "text recognition",
        "structure recognition",
        "chatgpt"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "447cb434dfcf1f4db95d9da2537ad96a2587dc984a979753dff7ed9b0a56ff34",
                "md5": "14fca6251461a3e3d11938b02c79d22e",
                "sha256": "0ab89b6762a814e01395ce0745268fdf0f4e1821052ab13d638d628187068c6e"
            },
            "downloads": -1,
            "filename": "djangotextsplitter-1.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "14fca6251461a3e3d11938b02c79d22e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 61774,
            "upload_time": "2023-12-15T13:54:00",
            "upload_time_iso_8601": "2023-12-15T13:54:00.533076Z",
            "url": "https://files.pythonhosted.org/packages/44/7c/b434dfcf1f4db95d9da2537ad96a2587dc984a979753dff7ed9b0a56ff34/djangotextsplitter-1.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5fb75dcd1c08fb8f9aad431fba267e9aa91755d6b0860114f6c34713b9509cf6",
                "md5": "9fa606e96f2f104e4b92fe82667c465c",
                "sha256": "0f8b87a76b10676381256d25117ef1545775fdee735cfa64af8fc5bc0a47ce25"
            },
            "downloads": -1,
            "filename": "djangotextsplitter-1.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "9fa606e96f2f104e4b92fe82667c465c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 560819,
            "upload_time": "2023-12-15T13:54:02",
            "upload_time_iso_8601": "2023-12-15T13:54:02.845159Z",
            "url": "https://files.pythonhosted.org/packages/5f/b7/5dcd1c08fb8f9aad431fba267e9aa91755d6b0860114f6c34713b9509cf6/djangotextsplitter-1.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-15 13:54:02",
    "github": false,
    "gitlab": true,
    "bitbucket": false,
    "codeberg": false,
    "gitlab_user": "datainnovatielab",
    "gitlab_project": "public",
    "lcname": "djangotextsplitter"
}
        