<h1 align="center">
<a href=""><img src="https://github.com/extralit/extralit/raw/develop/extralit/docs/assets/logo.svg" alt="Extralit" width="150"></a>
<br>
Extralit Server
<br>
</h1>
<h3 align="center">Extract structured data from scientific literature with human validation</h2>
<p align="center">
<a href="https://pypi.org/project/extralit/">
<img alt="CI" src="https://img.shields.io/pypi/v/extralit.svg?style=flat-round&logo=pypi&logoColor=white">
</a>
<img alt="Codecov" src="https://codecov.io/gh/extralit/extralit/branch/main/graph/badge.svg"/>
<a href="https://pepy.tech/project/extralit">
<img alt="Downloads" src="https://static.pepy.tech/personalized-badge/extralit?period=month&units=international_system&left_color=grey&right_color=blue&left_text=pypi%20downloads/month">
</a>
</p>
<p align="center">
<a href="https://www.linkedin.com/company/extralit-ai">
<img src="https://img.shields.io/badge/linkedin-blue?logo=linkedin"/>
</a>
</p>
This README covers developer information for the backend server components. For general usage, please refer to our [main repository](https://github.com/extralit/extralit) or our [documentation](https://docs.extralit.ai/latest/).
## Source Code Structure
The server components are split into two main services. The extraction service handles document processing and the ML extraction pipeline:
```
/extralit_server
/api # Core extraction API endpoints
/handlers # FastAPI request handlers
/schemas # Data models and validation
/services # Business logic services
/utils # Helper utilities
/ml # Machine learning components
/extractors # Document extraction models
/ocr # OCR processing
/pipeline # Extraction pipeline orchestration
/storage # Data persistence layer
/models # Database models
/search # Search engine integration
/vector # Vector store
```

The annotation service manages the validation UI and user data:

```
/extralit_server
/api # Annotation UI API endpoints
 /handlers # FastAPI request handlers
 /schemas # Data models and validation
/models # Database models
/auth # Authentication
/tasks # Background jobs
```
## Development Environment
The development environment uses Docker Compose to run all required services. Key commands:
```sh
# Start all services
docker-compose up -d
# Run server in dev mode
pdm run dev
# Run tests
pdm test
# Format and lint
pdm format
pdm lint
# Run all checks
pdm all
```
## Key Components
### FastAPI Servers
- **Extraction Server**: Handles document processing, the extraction pipeline, and ML model serving (a minimal endpoint sketch follows this list)
- **Annotation Server**: Manages the UI, the data validation workflow, and user collaboration
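As a rough illustration of the kind of endpoint the extraction server exposes, here is a minimal FastAPI sketch. The route, request model, and field names are hypothetical and do not reflect the actual Extralit API:

```python
# Hypothetical extraction endpoint sketch; names do not reflect the real Extralit API.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="extraction-server-sketch")


class ExtractionRequest(BaseModel):
    document_id: str  # identifier of an already-uploaded document
    schema_name: str  # which extraction schema to apply


class ExtractionResponse(BaseModel):
    document_id: str
    status: str


@app.post("/extractions", response_model=ExtractionResponse)
async def create_extraction(request: ExtractionRequest) -> ExtractionResponse:
    # A real implementation would enqueue the extraction pipeline
    # (OCR, ML extractors, vector indexing) as a background job.
    return ExtractionResponse(document_id=request.document_id, status="queued")
```

In the real server, request handlers live under the `/api/handlers` modules shown in the source structure above.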
### Databases
- **PostgreSQL**: Main database for user data, annotations, and metadata
- **Elasticsearch**: Vector store for semantic search and document indexing (see the query sketch after this list)
- **Weaviate**: Vector database for table and section embeddings
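As a rough illustration of how the document index might be queried, here is a hedged sketch using the Elasticsearch 8.x Python client; the host, index name, and field name are assumptions rather than the actual Extralit configuration:

```python
# Hedged sketch: full-text query against the search index (all names are assumptions).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="records",                                      # hypothetical index name
    query={"match": {"text": "insecticide resistance"}},  # hypothetical field name
    size=5,
)
for hit in response["hits"]["hits"]:
    print(hit["_id"], hit["_score"])
```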
### Background Processing
The server uses Celery for asynchronous tasks such as the following (see the task sketch after this list):
- Document OCR and preprocessing
- ML model inference
- Batch extraction jobs
- Data export
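Below is a minimal sketch of how such a task could be defined with Celery. The Redis broker URL (Redis is one of the services provisioned for CI), the task name, and its body are assumptions for illustration, not the actual Extralit worker code:

```python
# Hypothetical Celery task sketch; broker URL and task/function names are assumptions.
from celery import Celery

app = Celery("extralit_tasks", broker="redis://localhost:6379/0")


@app.task(bind=True, max_retries=3)
def ocr_document(self, document_id: str) -> dict:
    """Run OCR on a stored document and return a summary of the result."""
    try:
        # A real worker would fetch the PDF from object storage, run OCR,
        # and persist the extracted text for the downstream pipeline.
        return {"document_id": document_id, "status": "done"}
    except Exception as exc:
        # Retry transient failures (e.g. storage or OCR service hiccups).
        raise self.retry(exc=exc, countdown=30)
```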
## CLI Commands
Key management commands:
```sh
# Database management
python -m extralit_server db migrate
python -m extralit_server db create-user
# Start server
python -m extralit_server start
# Run workers
python -m extralit_server worker
```
See full CLI documentation in our [developer docs](https://docs.extralit.ai/latest/developer).
## Running Tests
The pytest suite is primarily designed to run in the CI environment using GitHub Actions as defined in `.github/workflows/extralit-server.yml`. This workflow sets up the necessary dependencies including Elasticsearch, PostgreSQL, Redis, and Minio.
Note that some tests are skipped when running locally because the CI environment provides services and configuration that a typical local setup does not (a sketch of the common skip pattern follows this list). These tests may involve:
- Search engine behavior (Elasticsearch/OpenSearch compatibility)
- File storage operations with Minio
- Authentication and permission checks
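Tests that depend on CI-only services are typically guarded with a skip marker. The sketch below shows that pattern, assuming the standard `CI` environment variable that GitHub Actions sets; the marker and test names are hypothetical:

```python
import os

import pytest

# Hypothetical guard: skip tests that need CI-provisioned services when run locally.
requires_ci_services = pytest.mark.skipif(
    os.environ.get("CI") != "true",  # GitHub Actions sets CI=true
    reason="requires Elasticsearch/Minio services provisioned in CI",
)


@requires_ci_services
def test_search_engine_indexing():
    ...  # exercises behavior that differs between CI and local environments
```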
To run tests in CI, create a pull request to trigger the test workflow.
If you need to run a specific test locally for debugging purposes, you can use:
```bash
cd extralit-server
python -m pytest [test_path] -v
```
However, expect some tests to fail or be skipped when running locally.
## Contributing
Check our [contribution guide](https://docs.extralit.ai/latest/community/contributor) and join our [Slack community](https://join.slack.com/t/extralit/shared_invite/zt-2kt8t12r7-uFj0bZ5SPAOhRFkxP7ZQaQ).
## Roadmap
See our [development roadmap](https://github.com/orgs/extralit/projects/2/views/1) and share your ideas!