embed-rerank

- Name: embed-rerank
- Version: 1.1.3
- Summary: Single Model Embedding & Reranker API with Apple Silicon acceleration
- Author: joonsoo-me <bear8203@gmail.com>
- Source: https://github.com/joonsoo-me/embed-rerank
- Uploaded: 2025-09-03 07:53:12
- Requires Python: >=3.13
- Keywords: apple-silicon, embeddings, fastapi, mlx, reranking
- Requirements: fastapi>=0.104.0, uvicorn>=0.24.0, pydantic>=2.0.0, pydantic-settings>=2.0.0, torch>=2.0.0, sentence-transformers>=2.2.0, transformers>=4.30.0, numpy>=1.24.0, httpx>=0.25.0, python-multipart, python-dotenv, structlog, prometheus-client, psutil>=5.9.0, mlx>=0.4.0, mlx-lm>=0.2.0, pytest>=7.4.0, pytest-asyncio>=0.21.0, black>=23.0.0, flake8>=6.0.0, mypy>=1.5.0
# 🔥 Single Model Embedding & Reranking API

<div align="center">
<strong>Lightning-fast local embeddings & reranking for Apple Silicon (MLX-first, OpenAI & TEI compatible)</strong>
<br/><br/>
<a href="https://pypi.org/project/embed-rerank/"><img src="https://img.shields.io/pypi/v/embed-rerank?logo=pypi&logoColor=white" /></a>
<a href="https://pypi.org/project/embed-rerank/"><img src="https://img.shields.io/pypi/dm/embed-rerank?logo=pypi&logoColor=white" /></a>
<a href="https://pypi.org/project/embed-rerank/"><img src="https://img.shields.io/pypi/pyversions/embed-rerank?logo=python&logoColor=white" /></a>
<a href="https://github.com/joonsoo-me/embed-rerank/blob/main/LICENSE"><img src="https://img.shields.io/github/license/joonsoo-me/embed-rerank?logo=opensource&logoColor=white" /></a>
<a href="https://developer.apple.com/silicon/"><img src="https://img.shields.io/badge/Apple_Silicon-Ready-blue?logo=apple&logoColor=white" /></a>
<a href="https://ml-explore.github.io/mlx/"><img src="https://img.shields.io/badge/MLX-Optimized-green?logo=apple&logoColor=white" /></a>
<a href="https://fastapi.tiangolo.com/"><img src="https://img.shields.io/badge/FastAPI-009688?logo=fastapi&logoColor=white" /></a>
</div>

---

## ⚡ Why This Matters

Transform your text processing with **10x faster** embeddings and reranking on Apple Silicon. It's a drop-in replacement for the OpenAI API and Hugging Face TEI, with **zero code changes** required.

### ๐Ÿ† Performance Comparison

| Operation | This API (MLX) | OpenAI API | Hugging Face TEI |
|-----------|----------------|------------|------------------|
| **Embeddings** | `0.78ms` | `200ms+` | `15ms` |
| **Reranking** | `1.04ms` | `N/A` | `25ms` |
| **Model Loading** | `0.36s` | `N/A` | `3.2s` |
| **Cost** | `$0` | `$0.02/1K` | `$0` |

*Tested on Apple M4 Max*

---

## 🚀 Quick Start

### Option 1: Install from PyPI (Recommended)

```bash
# Install the package
pip install embed-rerank

# Start the server (default port 9000)
embed-rerank

# Or with custom port and options
embed-rerank --port 8080 --host 127.0.0.1

# See all options
embed-rerank --help
```

### Option 2: From Source (Development)

```bash
# 1. Clone and setup
git clone https://github.com/joonsoo-me/embed-rerank.git
cd embed-rerank
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. Start server (macOS/Linux)
./tools/server-run.sh

# 3. Test it works
curl http://localhost:9000/health/
```

🎉 **Done!** Visit http://localhost:9000/docs for interactive API documentation.

---

## 🛠 Server Management (macOS/Linux)

```bash
# Start server (background)
./tools/server-run.sh

# Start server (foreground/development)
./tools/server-run-foreground.sh

# Stop server
./tools/server-stop.sh
```

> **Windows Support**: Coming soon! Currently optimized for macOS/Linux.

---

## ⚙️ CLI Configuration

### PyPI Package CLI Options

**Server Options:**
- `--host`: Server host (default: 0.0.0.0)
- `--port`: Server port (default: 9000)
- `--reload`: Enable auto-reload for development
- `--log-level`: Set log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)

**Testing Options:**
- `--test quick`: Run quick validation tests
- `--test performance`: Run performance benchmark tests  
- `--test quality`: Run quality validation tests
- `--test full`: Run comprehensive test suite
- `--test-url`: Custom server URL for testing
- `--test-output`: Test output directory

**Examples:**
```bash
# Custom server configuration
embed-rerank --port 8080 --host 127.0.0.1 --reload

# Built-in performance testing
embed-rerank --port 8080 &
embed-rerank --test performance --test-url http://localhost:8080
pkill -f embed-rerank

# Environment variables
export PORT=8080 HOST=127.0.0.1
embed-rerank
```

### Source Code Configuration

Create `.env` file for development:

```env
# Server
PORT=9000
HOST=0.0.0.0

# Backend
BACKEND=auto                                   # auto | mlx | torch
MODEL_NAME=mlx-community/Qwen3-Embedding-4B-4bit-DWQ

# Model Cache (first run downloads ~2.3GB model)
MODEL_PATH=                               # Custom model directory
TRANSFORMERS_CACHE=                           # HF cache override
# Default: ~/.cache/huggingface/hub/

# Performance
BATCH_SIZE=32
MAX_TEXTS_PER_REQUEST=100
```

---

### 📂 Model Cache Management

The service automatically manages model downloads and caching:

| Environment Variable | Purpose | Default |
|---------------------|---------|---------|
| `MODEL_PATH` | Custom model directory | *(uses HF cache)* |
| `TRANSFORMERS_CACHE` | Override HF cache location | `~/.cache/huggingface/transformers` |
| `HF_HOME` | HF home directory | `~/.cache/huggingface` |
| *(auto)* | Default HF cache | `~/.cache/huggingface/hub/` |

#### Cache Location Check
```bash
# Find where your model is cached
python3 -c "
import os
print('MODEL_PATH:', os.getenv('MODEL_PATH', '<not set>'))
print('TRANSFORMERS_CACHE:', os.getenv('TRANSFORMERS_CACHE', '<not set>'))
print('HF_HOME:', os.getenv('HF_HOME', '<not set>'))
print('Default cache:', os.path.expanduser('~/.cache/huggingface/hub'))
"

# List cached Qwen3 models
ls ~/.cache/huggingface/hub | grep -i qwen3 || echo "No Qwen3 models found in cache"
```
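
To relocate the roughly 2.3 GB of model weights (for example onto a larger volume), set `HF_HOME` before the first run; the path below is purely illustrative:

```bash
# Illustrative path: point the whole Hugging Face cache at another volume
export HF_HOME=/Volumes/Models/huggingface
embed-rerank
```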

---

## 🌐 Three APIs, One Service

| API | Endpoint | Use Case |
|-----|----------|----------|
| **Native** | `/api/v1/embed`, `/api/v1/rerank` | New projects |
| **OpenAI** | `/v1/embeddings` | Existing OpenAI code |
| **TEI** | `/embed`, `/rerank` | Hugging Face TEI replacement |

### OpenAI Compatible (Drop-in)

```python
import openai

client = openai.OpenAI(
    api_key="dummy-key",
    base_url="http://localhost:9000/v1"
)

response = client.embeddings.create(
    input=["Hello world", "Apple Silicon is fast!"],
    model="text-embedding-ada-002"
)
# 🚀 10x faster than OpenAI, same code!
```
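
Because the response mirrors the OpenAI SDK schema (a `data` list with one `embedding` per input), downstream code keeps working unchanged. For example, a quick cosine-similarity check over the returned vectors with numpy:

```python
import numpy as np
import openai

client = openai.OpenAI(api_key="dummy-key", base_url="http://localhost:9000/v1")
resp = client.embeddings.create(
    input=["Hello world", "Apple Silicon is fast!"],
    model="text-embedding-ada-002",
)

# One vector per input text, in input order (standard OpenAI response shape).
vecs = np.array([d.embedding for d in resp.data])
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # L2-normalize rows
print("cosine similarity:", float(vecs[0] @ vecs[1]))
```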

### TEI Compatible

```bash
curl -X POST "http://localhost:9000/embed" 
  -H "Content-Type: application/json" 
  -d '{"inputs": ["Hello world"], "truncate": true}'
```
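
The TEI-style `/rerank` endpoint follows the same pattern; here is a sketch assuming the standard TEI request shape (`query` plus `texts`):

```bash
curl -X POST "http://localhost:9000/rerank" \
  -H "Content-Type: application/json" \
  -d '{"query": "machine learning", "texts": ["AI is cool", "Dogs are pets", "MLX is fast"]}'
```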

### Native API

```bash
# Embeddings
curl -X POST "http://localhost:9000/api/v1/embed/" 
  -H "Content-Type: application/json" 
  -d '{"texts": ["Apple Silicon", "MLX acceleration"]}'

# Reranking  
curl -X POST "http://localhost:9000/api/v1/rerank/" 
  -H "Content-Type: application/json" 
  -d '{"query": "machine learning", "passages": ["AI is cool", "Dogs are pets", "MLX is fast"]}'
```
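
The same native endpoints are easy to call from Python; below is a minimal client sketch using httpx (already in the requirements). The exact response fields come from the native schema, which the interactive docs at `/docs` describe:

```python
import httpx

BASE = "http://localhost:9000/api/v1"

with httpx.Client(timeout=30.0) as client:
    # Embeddings
    emb = client.post(f"{BASE}/embed/", json={"texts": ["Apple Silicon", "MLX acceleration"]})
    emb.raise_for_status()

    # Reranking
    rr = client.post(
        f"{BASE}/rerank/",
        json={
            "query": "machine learning",
            "passages": ["AI is cool", "Dogs are pets", "MLX is fast"],
        },
    )
    rr.raise_for_status()

    print(emb.json())  # field names depend on the native schema; see /docs
    print(rr.json())
```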

---

## 🧪 Performance Testing & Validation

### 🚀 Built-in CLI Testing (PyPI Package)

The PyPI package includes powerful built-in testing capabilities:

```bash
# Quick validation (basic functionality check)
embed-rerank --test quick

# Performance benchmark (latency, throughput, concurrency)
embed-rerank --test performance --test-url http://localhost:9000

# Quality validation (semantic similarity, multilingual)  
embed-rerank --test quality --test-url http://localhost:9000

# Full comprehensive test suite
embed-rerank --test full --test-url http://localhost:9000
```

**Test Results Include:**
- 📊 **Latency Metrics**: Mean, P95, P99 response times
- 🚀 **Throughput Analysis**: Texts/sec processing rates
- 🔄 **Concurrency Testing**: Multi-threaded request handling
- 🧠 **Semantic Validation**: Quality of embeddings and reranking
- 🌍 **Multilingual Support**: Cross-language performance
- 📈 **JSON Reports**: Detailed metrics for automation

**Example Output:**
```bash
🧪 Running Embed-Rerank Test Suite
📍 Target URL: http://localhost:9000
🎯 Test Mode: performance

⚡ Performance Results:
• Latency: 0.8ms avg, 1.2ms max
• Throughput: 1,250 texts/sec peak
• Concurrency: 5/5 successful (100%)
📁 Results saved to: ./test-results/performance_test_results.json
```
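
Since each run writes a JSON report, results plug straight into CI or dashboards. A minimal loader sketch (the key names inside the report are whatever the CLI emits; inspect the file once before automating against it):

```python
import json
from pathlib import Path

# Path matches the example output above; key names are CLI-defined.
report_path = Path("./test-results/performance_test_results.json")
report = json.loads(report_path.read_text())
print(json.dumps(report, indent=2))
```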

### 🔧 Advanced Testing (Source Code)

For development and comprehensive testing with the source code:

```bash
# Comprehensive test suite (shell script)
./tools/server-tests.sh

# Run with specific test modes
./tools/server-tests.sh --quick            # Quick validation only
./tools/server-tests.sh --performance      # Performance tests only
./tools/server-tests.sh --full             # Full test suite

# Custom server URL
./tools/server-tests.sh --url http://localhost:8080

# Manual health check
curl http://localhost:9000/health/

# Unit tests with pytest
pytest tests/ -v
```

---

## 🛠 Development & Deployment

### Local Development (Source Code)

```bash
# Start server (background)
./tools/server-run.sh

# Start server (foreground/development)
./tools/server-run-foreground.sh

# Stop server
./tools/server-stop.sh
```

### Production Deployment (PyPI Package)

```bash
# Install and run
pip install embed-rerank
embed-rerank --port 9000 --host 0.0.0.0

# With custom configuration
embed-rerank --port 8080 --reload --log-level DEBUG

# Background deployment
embed-rerank --port 9000 &
```
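
For servers that must survive a closed terminal, a bare `&` is fragile; a slightly sturdier sketch uses nohup (a full process manager such as systemd or launchd is the better long-term choice):

```bash
# Survives logout; logs to a file instead of the terminal
nohup embed-rerank --port 9000 --host 0.0.0.0 > embed-rerank.log 2>&1 &
```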

> **Windows Support**: Coming soon! Currently optimized for macOS/Linux.

---

## 🚀 What You Get

### 🎯 Core Features
- ✅ **Zero Code Changes**: Drop-in replacement for OpenAI API and TEI
- ⚡ **10x Performance**: Apple MLX acceleration on Apple Silicon
- 💰 **Zero Costs**: No API fees, runs locally
- 🔒 **Privacy**: Your data never leaves your machine
- 🎯 **Three APIs**: Native, OpenAI, and TEI compatibility
- 📊 **Production Ready**: Health checks, monitoring, structured logging

### 🧪 Built-in Testing & Benchmarking
- 📈 **CLI Performance Testing**: One-command benchmarking
- 🔄 **Concurrency Testing**: Multi-threaded request validation
- 🧠 **Quality Validation**: Semantic similarity and multilingual testing
- 📊 **JSON Reports**: Automated performance monitoring
- 🚀 **Real-time Metrics**: Latency, throughput, and success rates

### 🛠 Deployment Options
- 📦 **PyPI Package**: `pip install embed-rerank` for instant deployment
- 🔧 **Source Code**: Full development environment with advanced tooling
- 🌐 **Multi-API Support**: OpenAI, TEI, and native endpoints
- ⚙️ **Flexible Configuration**: Environment variables, CLI args, .env files

---

## Quick Reference

### Installation & Startup
```bash
# PyPI Package (Production)
pip install embed-rerank && embed-rerank

# Source Code (Development)  
git clone https://github.com/joonsoo-me/embed-rerank.git
cd embed-rerank && ./tools/server-run.sh
```

### Performance Testing
```bash
# One-command benchmark
embed-rerank --test performance --test-url http://localhost:9000

# Comprehensive testing
./tools/server-tests.sh --full
```

### API Endpoints
- **Native**: `POST /api/v1/embed/` and `/api/v1/rerank/`
- **OpenAI**: `POST /v1/embeddings` (drop-in replacement)
- **TEI**: `POST /embed` and `/rerank` (Hugging Face compatible)
- **Health**: `GET /health/` (monitoring and diagnostics)
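
A quick smoke test across the surfaces listed above (assumes the server is running on port 9000):

```bash
# Health, then an OpenAI-style embedding call
curl -s http://localhost:9000/health/
curl -s -X POST http://localhost:9000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": ["ping"], "model": "text-embedding-ada-002"}'
```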

---

## 📄 License

MIT License - build amazing things with this code!

            
