# ArchiTXT: Text-to-Database Structuring Tool

**ArchiTXT** is a robust tool designed to convert unstructured textual data into structured formats that are ready for
database storage. It automates the generation of database schemas and creates corresponding data instances, simplifying
the integration of text-based information into database systems.

Working with unstructured text can be challenging when you need to store and query it in a structured database.
**ArchiTXT** bridges this gap by transforming raw text into organized, query-friendly structures. By automating both
schema generation and data instance creation, it streamlines the entire process of managing textual information in
databases.
## Installation
To install **ArchiTXT**, make sure you have Python 3.10, 3.11, or 3.12 and pip installed (the package declares `requires_python` as `>=3.10,<3.13`). Then run:
```sh
pip install architxt
```
To install the development version, install directly from the Git repository:
```sh
pip install git+https://github.com/Neplex/ArchiTXT.git
```
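Because the package metadata restricts `requires_python` to `>=3.10,<3.13`, installing on Python 3.13 or 3.9 fails at dependency resolution. The following is a minimal POSIX shell sketch of a pre-install check; the `supported_python` function name is hypothetical and not part of **ArchiTXT**.

```sh
# Hypothetical helper: succeed only for Python versions allowed by the
# package metadata (requires_python ">=3.10,<3.13").
supported_python() {
  local major rest minor
  major="${1%%.*}"   # "3.11.4" -> "3"
  rest="${1#*.}"     # "3.11.4" -> "11.4"
  minor="${rest%%.*}" # "11.4"  -> "11"
  [ "$major" -eq 3 ] && [ "$minor" -ge 10 ] && [ "$minor" -le 12 ]
}

# Example usage (not executed here): check the interpreter, then install
# into an isolated virtual environment.
#   supported_python "$(python3 -c 'import platform; print(platform.python_version())')" \
#     && python3 -m venv .venv && . .venv/bin/activate && pip install architxt
```

Using a virtual environment keeps **ArchiTXT** and its dependencies separate from the system Python.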
## Usage
**ArchiTXT** is built to work with BRAT-annotated corpora that include pre-labeled named entities.
It also requires access to a CoreNLP server, which you can set up using the Docker configuration available in
the source repository.
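In BRAT's standoff format, each document is a `.txt` file paired with an `.ann` file whose entity lines read `ID<TAB>Type Start End<TAB>Text`, with character offsets into the text. The sketch below builds a one-document corpus by hand to show that shape. The directory name, file names, and entity types (`PERSON`, `ORGANIZATION`) are hypothetical, and whether `architxt run` expects a directory like this or an archive of it is an assumption to check against the project documentation.

```sh
# Minimal sketch: build a one-document BRAT corpus by hand.
# Directory name, file names, and entity types are hypothetical examples.
corpus_dir="./my-corpus"
mkdir -p "$corpus_dir"

# Raw text, in French to match the default --language of `architxt run`.
printf 'Paul travaille pour Renault.\n' > "$corpus_dir/doc1.txt"

# Standoff annotations: ID<TAB>Type Start End<TAB>Covered text.
# Start/End are 0-based character offsets into doc1.txt; End is exclusive.
printf 'T1\tPERSON 0 4\tPaul\nT2\tORGANIZATION 20 27\tRenault\n' > "$corpus_dir/doc1.ann"
```

Keeping the text ASCII-only here makes the offsets easy to verify with byte-based tools such as `cut`.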
```sh
$ architxt --help
Usage: architxt [OPTIONS] COMMAND [ARGS]...
ArchiTXT is a tool for structuring textual data into a valid database model.
It is guided by a meta-grammar and uses an iterative process of tree rewriting.
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --install-completion Install completion for the current shell. │
│ --show-completion Show completion for the current shell, to copy it or customize the installation. │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ─────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ run Extract a database schema from a corpus. │
│ ui Launch the web-based UI. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
```sh
$ architxt run --help
Usage: architxt run [OPTIONS] CORPUS_PATH
Extract a database schema from a corpus.
╭─ Arguments ────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * corpus_path PATH Path to the input corpus. [default: None] [required] │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --tau FLOAT The similarity threshold. [default: 0.7] │
│ --epoch INTEGER Number of iterations for tree rewriting. [default: 100] │
│ --min-support INTEGER Minimum support for tree patterns. [default: 20] │
│ --corenlp-url TEXT URL of the CoreNLP server. [default: http://localhost:9000] │
│ --gen-instances INTEGER Number of synthetic instances to generate. [default: 0] │
│ --language TEXT Language of the input corpus. [default: French] │
│ --debug --no-debug Enable debug mode for more verbose output. [default: no-debug] │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
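To make runs reproducible, it can help to pin the parameters in a small wrapper. The sketch below uses only the flags and defaults listed in the `architxt run --help` output; the `run_architxt` function and the `ARCHITXT`, `TAU`, `EPOCH`, `MIN_SUPPORT`, `CORENLP_URL`, and `CORPUS_LANGUAGE` environment variables are hypothetical conveniences, not part of **ArchiTXT**.

```sh
# Hypothetical wrapper around `architxt run`. Flag names and default values
# mirror the `architxt run --help` output; each can be overridden through an
# environment variable. ARCHITXT lets you substitute the executable (e.g. `echo`
# for a dry run that only prints the arguments).
run_architxt() {
  local corpus_path="$1"
  "${ARCHITXT:-architxt}" run \
    --tau "${TAU:-0.7}" \
    --epoch "${EPOCH:-100}" \
    --min-support "${MIN_SUPPORT:-20}" \
    --corenlp-url "${CORENLP_URL:-http://localhost:9000}" \
    --language "${CORPUS_LANGUAGE:-French}" \
    "$corpus_path"
}

# Example usage (not executed here):
#   TAU=0.8 run_architxt ./my-corpus
#   ARCHITXT=echo run_architxt ./my-corpus   # dry run: print the arguments only
```

`CORPUS_LANGUAGE` is used instead of `LANGUAGE` because the latter is already read by gettext-based programs.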
To start the CoreNLP server from a clone of the source repository, use Docker Compose:
```sh
docker compose up -d corenlp
```
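`docker compose up -d` returns as soon as the container starts, while CoreNLP may still be loading its models. A minimal sketch for waiting until the server answers, assuming the server exposes the `/ready` readiness endpoint provided by recent CoreNLP server releases (check your version) and that `curl` is installed; the `wait_for_corenlp` function name is hypothetical.

```sh
# Hypothetical helper: poll the CoreNLP readiness endpoint once per second
# until it returns a 2xx status, or give up after TIMEOUT seconds.
wait_for_corenlp() {
  local url="${1:-http://localhost:9000}"
  local timeout="${2:-60}"
  local elapsed=0
  until curl --silent --fail --output /dev/null "$url/ready"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "CoreNLP not ready at $url after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Example usage (not executed here):
#   docker compose up -d corenlp && wait_for_corenlp http://localhost:9000 120
```

The URL argument defaults to the same value as the `--corenlp-url` option of `architxt run`.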
Raw data
{
"_id": null,
"home_page": null,
"name": "architxt",
"maintainer": "Nicolas Hiot",
"docs_url": null,
"requires_python": "<3.13,>=3.10",
"maintainer_email": "nicolas.hiot@univ-orleans.fr",
"keywords": "python, nlp, database, structuration, text mining, text analysis, data analysis",
"author": "Nicolas Hiot",
"author_email": "nicolas.hiot@univ-orleans.fr",
"download_url": "https://files.pythonhosted.org/packages/2b/dd/4d07e25f2db7c9ca8ba9628a06804a27c1a84f4c26998b6fcbe991aed8b1/architxt-0.3.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "ArchiTXT is a tool for structuring textual data into a valid database model. It is guided by a meta-grammar and uses an iterative process of tree rewriting.",
"version": "0.3.1",
"project_urls": {
"Documentation": "https://neplex.github.io/ArchiTXT",
"Repository": "https://github.com/neplex/ArchiTXT"
},
"split_keywords": [
"python",
" nlp",
" database",
" structuration",
" text mining",
" text analysis",
" data analysis"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "e046601ee4ea28c92a0efe8315da4bd67e9b8f20f9d14a8f34ab79928f29d34a",
"md5": "1a35c174cb72515b761cb69e2834315b",
"sha256": "4597191d738264369a975244613a2169fdc0e50588a3940a655795e20e07cb71"
},
"downloads": -1,
"filename": "architxt-0.3.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1a35c174cb72515b761cb69e2834315b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.13,>=3.10",
"size": 106206,
"upload_time": "2025-07-15T12:38:15",
"upload_time_iso_8601": "2025-07-15T12:38:15.494510Z",
"url": "https://files.pythonhosted.org/packages/e0/46/601ee4ea28c92a0efe8315da4bd67e9b8f20f9d14a8f34ab79928f29d34a/architxt-0.3.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "2bdd4d07e25f2db7c9ca8ba9628a06804a27c1a84f4c26998b6fcbe991aed8b1",
"md5": "5b9ace191f5cb6f00b70fd595d45c40e",
"sha256": "a4fed117f13d14642fc1912b830358489813e46a7b98eda9759c7a0ca54746ae"
},
"downloads": -1,
"filename": "architxt-0.3.1.tar.gz",
"has_sig": false,
"md5_digest": "5b9ace191f5cb6f00b70fd595d45c40e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.10",
"size": 86737,
"upload_time": "2025-07-15T12:38:17",
"upload_time_iso_8601": "2025-07-15T12:38:17.199854Z",
"url": "https://files.pythonhosted.org/packages/2b/dd/4d07e25f2db7c9ca8ba9628a06804a27c1a84f4c26998b6fcbe991aed8b1/architxt-0.3.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-15 12:38:17",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "neplex",
"github_project": "ArchiTXT",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "architxt"
}