extralit-server


| Field | Value |
|---|---|
| Name | extralit-server |
| Version | 0.1.0a3 |
| home_page | https://www.argilla.io |
| Summary | Open-source tool for accurate & fast scientific literature data extraction with LLM and human-in-the-loop. |
| upload_time | 2024-06-17 06:15:07 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | `<3.11,>=3.9` |
| license | Apache-2.0 |
| keywords | literature-review, data-annotation, artificial-intelligence, machine-learning, human-in-the-loop, mlops |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
            
<h1 align="center">
  <a href=""><img src="https://github.com/dvsrepo/imgs/raw/main/rg.svg" alt="Argilla" width="150"></a>
  <br>
  Extralit
  <br>
</h1>

<h2 align="center">Open-source feedback layer for LLM-assisted data extractions</h2>

<h3>
<p align="center">
<a href="https://docs.argilla.io">📄 Documentation</a> <span> | </span>
<a href="#-development-quickstart">🚀 Quickstart</a> <span> | </span>
<a href="#-project-architecture">🛠️ Architecture</a>
</p>
</h3>

## What is Extralit?

Extralit is a UI and platform for LLM-based document data extraction. It integrates human and model feedback loops for continuous LLM refinement and oversight of the extracted data.

With a Python SDK and flexible UI, you can create human and model-in-the-loop workflows for:

* Data extraction validation
* Supervised fine-tuning
* Preference tuning (RLHF, DPO, RLAIF, and more)
* Small, specialized NLP models
* Scalable evaluation

## 🚀 Development Quickstart

### Install the Prerequisites
These tools are required to run and develop Argilla locally.

1. Install [Docker Desktop](https://docs.docker.com/get-docker/)
2. Install [kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installation)
3. Install [ctlptl](https://github.com/tilt-dev/ctlptl/tree/main#how-do-i-install-it)
4. Install [Tilt](https://docs.tilt.dev/)
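
As a quick sanity check, the sketch below reports which of the command-line tools are missing from `PATH`. The helper name `missing_tools` is hypothetical, and `kubectl` is included only because later steps call it; Docker Desktop is checked through its `docker` CLI.

```sh
# Print the name of every listed tool that is not found on PATH.
# (missing_tools is an illustrative helper, not part of this project.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools docker kind ctlptl tilt kubectl)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing prerequisites:" $missing
else
  echo "All prerequisites found."
fi
```

An empty result only means the binaries are on `PATH`; it does not confirm that the Docker daemon is running.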

### Set up local infrastructure for Kind

1. Create a `kind` cluster

```bash
ctlptl create registry ctlptl-registry --port=5005
ctlptl create cluster kind --registry=ctlptl-registry
```


2. Apply config to mount local directory

```bash
ctlptl apply -f k8s/kind/kind-config.yaml
kubectl taint node kind-control-plane node-role.kubernetes.io/control-plane:NoSchedule-
```

### Start local development

1. Run Tilt

Select the K8s cluster context:
```bash
kubectl config use-context <context_name>
```

Setting the `ENV` variable to `dev` enables hot-reloading of Docker containers for 🚀 rapid deployment:
```bash
kubectl create ns <namespace>
ENV=dev tilt up --namespace=<namespace>
```

### Start staging/prod K8s deployment

```bash
ENV=dev DOCKER_REPO=<remote docker repository> tilt up --namespace <namespace> --context <K8s cluster context>
```

## 🛠️ Developer guide

### Editing the database schema
Editing the database schema files at `src/argilla/server/models/*.py` requires running these commands to apply revisions to the database.

1. Create revision
```bash
cd src/argilla
alembic revision -m "<message>"
```

If you run into errors caused by revisions from the upstream argilla-io/argilla repo, set the `down_revision` of your new revision file in `src/argilla/server/alembic/versions` to the latest upstream revision, `"7552df94427a"`.

2. Apply the revision
```bash
# Be sure to set environment variables ARGILLA_ELASTICSEARCH and ARGILLA_DATABASE_URL
python -m argilla server database migrate
```

3. Rebuild the frontend and package it with the API backend

```bash
bash scripts/build_frontend.sh
python setup.py bdist_wheel
```
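
Step 2 depends on `ARGILLA_ELASTICSEARCH` and `ARGILLA_DATABASE_URL` being set. The following is a minimal sketch of a guard to run before migrating, assuming a POSIX shell; `require_env` is a hypothetical helper, not part of Argilla.

```sh
# Return 0 only if every named environment variable is set and non-empty.
# Each missing name is reported on stderr.
# (require_env is an illustrative helper, not part of Argilla.)
require_env() {
  rc=0
  for name in "$@"; do
    eval "value=\${$name:-}"   # indirect lookup of the variable called $name
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}

if require_env ARGILLA_ELASTICSEARCH ARGILLA_DATABASE_URL; then
  echo "Environment looks ready for: python -m argilla server database migrate"
fi
```

The guard only checks that the variables are present; it does not validate that the URLs they hold are reachable.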

## 🛠️ Project Architecture

Argilla is built on 5 core components:

- **Python SDK**: A Python SDK, installable with `pip install argilla`, for interacting with the Argilla Server and the Argilla UI. It provides an API to manage the data, configuration and annotation workflows.
- **FastAPI Server**: The core of Argilla is a *Python FastAPI* server that manages the data, by pre-processing it and storing it in the vector database. Also, it stores application information in the relational database. It provides a REST API to interact with the data from the Python SDK and the Argilla UI. It also provides a web interface to visualize the data.
- **Relational Database**: A relational database to store the metadata of the records and the annotations. *SQLite* is the default built-in option and runs alongside the Argilla Server, but a separate *PostgreSQL* instance can be used too.
- **Vector Database**: A vector database to store the records data and perform scalable vector similarity searches and basic document searches. We currently support *ElasticSearch* and *AWS OpenSearch* and they can be deployed as separate Docker images.
- **Vue.js UI**: A web application to visualize and annotate your data, users and teams. It is built with *Vue.js* and is directly deployed alongside the Argilla Server within our Argilla Docker image.



<p align="center">
<a  href="https://pypi.org/project/argilla-server/">
<img alt="PyPI version" src="https://img.shields.io/pypi/v/argilla.svg?style=flat-round&logo=pypi&logoColor=white">
</a>
<img alt="Codecov" src="https://codecov.io/gh/argilla-io/argilla-server/branch/main/graph/badge.svg?token=VDVR29VOMG"/>
<a href="https://pepy.tech/project/argilla-server">
<img alt="PyPI downloads per month" src="https://static.pepy.tech/personalized-badge/argilla-server?period=month&units=international_system&left_color=grey&right_color=blue&left_text=pypi%20downloads/month">
</a>
<a href="https://huggingface.co/new-space?template=argilla/argilla-template-space">
<img src="https://huggingface.co/datasets/huggingface/badges/raw/main/deploy-to-spaces-sm.svg"/>
</a>
</p>

<p align="center">
<a href="https://twitter.com/argilla_io">
<img src="https://img.shields.io/badge/twitter-black?logo=x"/>
</a>
<a href="https://www.linkedin.com/company/argilla-io">
<img src="https://img.shields.io/badge/linkedin-blue?logo=linkedin"/>
</a>
<a href="https://join.slack.com/t/rubrixworkspace/shared_invite/zt-whigkyjn-a3IUJLD7gDbTZ0rKlvcJ5g">
<img src="https://img.shields.io/badge/slack-purple?logo=slack"/>
</a>
</p>


## Clone repository

`argilla-server` uses the `argilla` repository as a submodule to build the frontend static files, so clone it with the following command:

```sh
git clone --recurse-submodules git@github.com:argilla-io/argilla-server.git
```

If you already cloned the repository without `--recurse-submodules`, you can initialize and update the submodules with:

```sh
git submodule update --remote --recursive --init
```

> [!IMPORTANT]
> By default, the `argilla` submodule tracks the `develop` branch, so the previous command fetches the latest commit from that branch.

### Specify a tag for argilla submodule

When doing a release, we should switch the `argilla` submodule to a specific tag. The following example checks out tag `v1.22.0`:

```sh
cd argilla
git fetch --tags
git checkout v1.22.0
```

> [!NOTE]
> You should see changes in the `argilla-server` root folder, where the subproject commit now points to the commit of the tagged version. Feel free to commit these changes.
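
To confirm that the submodule checkout really sits on the intended tag before committing, a check along these lines may help. This is a sketch, not part of the repository: `submodule_on_tag` is a hypothetical helper built on `git describe --tags --exact-match`, and if several tags point at the same commit, `git describe` reports only one of them.

```sh
# Succeed only if the checkout in directory $1 sits exactly on tag $2.
# (submodule_on_tag is an illustrative helper, not part of this repository.)
submodule_on_tag() {
  [ "$(git -C "$1" describe --tags --exact-match 2>/dev/null)" = "$2" ]
}

if submodule_on_tag argilla v1.22.0; then
  echo "argilla submodule is on v1.22.0"
else
  echo "argilla submodule is not on v1.22.0"
fi
```

Run it from the `argilla-server` root folder so that the relative path `argilla` resolves to the submodule.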

## Development environment

By default, all commands executed with `pdm run` read their environment variables from `.env.dev`. The exception is `pdm test`, which overrides some of them with values from the `.env.test` file.

These environment variables can be overridden if necessary, so feel free to define your own locally.

### Run cli

```sh
pdm cli
```

### Run database migrations

By default, a SQLite database located at `~/.argilla/argilla.db` will be used. You can create the database and run migrations with the following custom PDM command:

```sh
pdm migrate
```

### Run tests

A SQLite database located at `~/.argilla/argilla-test.db` will be automatically created to run tests. You can run the entire test suite using the following custom PDM command:

```sh
pdm test
```

## Run development server

### Build frontend static files

Before running the Argilla development server, we need to build the frontend static files. Node version 18 is required for this step:

```sh
brew install node@18
```
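
Since the build expects Node 18, it can be worth checking which version is active first. The sketch below assumes `node --version` prints a string of the form `v18.19.0`; `node_major` is a hypothetical helper name.

```sh
# Extract the major version from a string such as "v18.19.0".
# (node_major is an illustrative helper, not part of this repository.)
node_major() {
  version=${1#v}                    # drop the leading "v"
  printf '%s\n' "${version%%.*}"    # keep everything before the first dot
}

if command -v node >/dev/null 2>&1; then
  major=$(node_major "$(node --version)")
  if [ "$major" = "18" ]; then
    echo "Node $major detected."
  else
    echo "Node 18 is required, found major version $major."
  fi
else
  echo "node was not found on PATH."
fi
```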

After that you can build the frontend static files:

```sh
./scripts/build_frontend.sh
```

After running the previous script you should have a folder at `src/argilla_server/static` with all the frontend static files successfully generated.

### Run uvicorn development server

```sh
pdm server
```

            

Raw data

            {
    "_id": null,
    "home_page": "https://www.argilla.io",
    "name": "extralit-server",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.11,>=3.9",
    "maintainer_email": "Jonny Tran <nhat.c.tran@gmail.com>, argilla <contact@argilla.io>",
    "keywords": "literature-review, data-annotation, artificial-intelligence, machine-learning, human-in-the-loop, mlops",
    "author": null,
    "author_email": "Jonny Tran <nhat.c.tran@gmail.com>, argilla <contact@argilla.io>",
    "download_url": "https://files.pythonhosted.org/packages/64/9e/116874b5d6f1c442a1b36d9d5bce26be5ddf09b5659a45ff43f7b903659c/extralit_server-0.1.0a3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Open-source tool for accurate & fast scientific literature data extraction with LLM and human-in-the-loop.",
    "version": "0.1.0a3",
    "project_urls": {
        "Documentation": "https://docs.argilla.io",
        "Homepage": "https://www.argilla.io",
        "Repository": "https://github.com/argilla-io/argilla"
    },
    "split_keywords": [
        "literature-review",
        " data-annotation",
        " artificial-intelligence",
        " machine-learning",
        " human-in-the-loop",
        " mlops"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c28213d8d4dd6b22cddebd0557798e184a955eb51dacc8345e56e177bea6bf4c",
                "md5": "9e35e90ff748e55a4769c929412713cf",
                "sha256": "89318ec5a1d48cbf88e8adfcdaa0f789be33a1e9d537f8d0657605058e45b4e9"
            },
            "downloads": -1,
            "filename": "extralit_server-0.1.0a3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9e35e90ff748e55a4769c929412713cf",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.11,>=3.9",
            "size": 4747660,
            "upload_time": "2024-06-17T06:14:40",
            "upload_time_iso_8601": "2024-06-17T06:14:40.845809Z",
            "url": "https://files.pythonhosted.org/packages/c2/82/13d8d4dd6b22cddebd0557798e184a955eb51dacc8345e56e177bea6bf4c/extralit_server-0.1.0a3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "649e116874b5d6f1c442a1b36d9d5bce26be5ddf09b5659a45ff43f7b903659c",
                "md5": "b8279e5a19154fe897c04a9022b3f2b2",
                "sha256": "da2ebbc6a2a337a07f2ebf028fb7da2eeb85315f1fc2aa483b97dbf14e11d398"
            },
            "downloads": -1,
            "filename": "extralit_server-0.1.0a3.tar.gz",
            "has_sig": false,
            "md5_digest": "b8279e5a19154fe897c04a9022b3f2b2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.11,>=3.9",
            "size": 4324812,
            "upload_time": "2024-06-17T06:15:07",
            "upload_time_iso_8601": "2024-06-17T06:15:07.629139Z",
            "url": "https://files.pythonhosted.org/packages/64/9e/116874b5d6f1c442a1b36d9d5bce26be5ddf09b5659a45ff43f7b903659c/extralit_server-0.1.0a3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-17 06:15:07",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "argilla-io",
    "github_project": "argilla",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "extralit-server"
}
        