# OpenSearchEval: Ultimate Search Evaluation Platform
## Overview
OpenSearchEval is a comprehensive, production-ready platform for evaluating search quality, conducting A/B tests, and analyzing user behavior. It is built with modern Python technologies and features an agent-based architecture, FastAPI endpoints, and MLX integration optimized for Apple Silicon.
### Key Features
- **Search Quality Metrics**: MRR, NDCG, Precision@K, Recall@K, and more
- **A/B Testing Framework**: Design, run, and analyze experiments with statistical significance
- **User Behavior Analytics**: Click tracking, dwell time, satisfaction metrics, and journey analysis
- **Agent Architecture**: Distributed processing with asynchronous task handling
- **MLX Integration**: Optimized ML components for Apple Silicon with GPU acceleration
- **LLM-as-Judge**: AI-powered qualitative evaluation of search results
- **FastAPI Endpoints**: Production-ready REST API with automatic documentation
- **Rich Visualizations**: Interactive charts, dashboards, and reporting tools
- **Extensible Design**: Plugin architecture for custom metrics and data sources
- **Performance Monitoring**: Real-time metrics collection and alerting
## Quick Start
### Installation
```bash
# Install from PyPI
pip install opensearcheval
# Or install with all optional dependencies
pip install opensearcheval[all]
# For development
pip install opensearcheval[dev]
```
### Docker Installation (Recommended)
```bash
# Clone the repository
git clone https://github.com/llamasearchai/OpenSearchEval.git
cd OpenSearchEval
# Start with Docker Compose
docker-compose up -d
# Access the API at http://localhost:8000
# Access the UI at http://localhost:5000
```
### Basic Usage
```python
import opensearcheval as ose

# Initialize experiment manager
experiments = ose.ExperimentManager()

# Create a new A/B test
experiment = experiments.create_experiment(
    name="New Ranking Algorithm",
    description="Testing improved relevance scoring",
    metrics=["mean_reciprocal_rank", "click_through_rate", "satisfaction_score"]
)

# Evaluate search results
results = [
    {"doc_id": "doc1", "title": "Python Tutorial", "score": 0.95},
    {"doc_id": "doc2", "title": "Machine Learning Guide", "score": 0.87},
]

# Calculate metrics
mrr = ose.mean_reciprocal_rank(
    query="python tutorial",
    results=results,
    relevance_judgments={"doc1": 2, "doc2": 1}
)

print(f"Mean Reciprocal Rank: {mrr:.3f}")
```
### API Usage
```python
import asyncio
import httpx

# Start the API server first:
#   opensearcheval-api --host 0.0.0.0 --port 8000

# Evaluate search results via the API (reuses `results` from the Basic Usage example)
async def evaluate_via_api():
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8000/api/v1/evaluate",
            json={
                "id": "eval_001",
                "query": "python tutorial",
                "results": results,
                "relevance_judgments": {"doc1": 2, "doc2": 1}
            }
        )
    metrics = response.json()["metrics"]
    print(f"API Response: {metrics}")

asyncio.run(evaluate_via_api())
```
### Command Line Interface
```bash
# Evaluate search results from file
opensearcheval evaluate --input-file search_data.json --output-file results.json
# Create an experiment
opensearcheval experiment create \
--name "Improved Ranking" \
--metrics "mrr,ndcg_at_10,ctr" \
--description "Testing new ranking algorithm"
# Generate embeddings
opensearcheval embedding generate \
--input-file documents.json \
--output-file embeddings.json \
--model text-embedding-ada-002
# Run A/B test analysis
opensearcheval ab-test analyze \
--control-data control.json \
--treatment-data treatment.json \
--confidence-level 0.95
```
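The `ab-test analyze` command reports whether the control/treatment difference is statistically significant at the requested confidence level. As a rough illustration of that kind of test (not the tool's actual implementation), here is a standalone sketch using SciPy; the per-query `ctr` field in the JSON files is an assumption for the example:

```python
# Hypothetical illustration of a control-vs-treatment significance test.
# Assumes control.json / treatment.json each contain a list of objects
# with a per-query "ctr" field; the real file format may differ.
import json
from scipy import stats

def load_ctr(path: str) -> list[float]:
    with open(path) as f:
        return [row["ctr"] for row in json.load(f)]

control = load_ctr("control.json")
treatment = load_ctr("treatment.json")

# Welch's t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"t={t_stat:.3f}, p={p_value:.4f}")
print("Significant at 95% confidence" if p_value < 0.05 else "Not significant")
```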
## Architecture
### Agent-Based Processing
```python
# Initialize agent manager
agent_manager = ose.AgentManager()

# Create specialized agents
search_agent = ose.SearchEvaluationAgent(
    name="search_evaluator",
    metrics=[ose.mean_reciprocal_rank, ose.ndcg_at_k],
    config={"batch_size": 100}
)

ab_test_agent = ose.ABTestAgent(
    name="ab_tester",
    statistical_tests=[ose.t_test, ose.mann_whitney_u_test],
    config={"confidence_level": 0.95}
)

# Register and start agents (the await below must run inside an async context)
agent_manager.register_agent(search_agent)
agent_manager.register_agent(ab_test_agent)
await agent_manager.start_all()
```
### MLX Integration (Apple Silicon)
```python
# Use MLX for accelerated ML operations
from opensearcheval.ml import SearchRankingModel, ClickThroughRatePredictor

# Initialize MLX-powered ranking model
ranking_model = SearchRankingModel(
    embedding_dim=768,
    hidden_dim=256,
    use_mlx=True
)

# Initialize a CTR prediction model
ctr_model = ClickThroughRatePredictor(
    feature_dim=20,
    hidden_dims=[64, 32]
)

# Train the model (train_ctr_model and training_examples come from your training pipeline)
trained_model = train_ctr_model(
    training_data=training_examples,
    epochs=50,
    batch_size=64
)
```
## Available Metrics
### Relevance Metrics
- **Mean Reciprocal Rank (MRR)**: Average of reciprocal ranks
- **NDCG@K**: Normalized Discounted Cumulative Gain (MRR and NDCG@K are sketched in code after this list)
- **Precision@K**: Precision at various cutoff points
- **Recall@K**: Recall at various cutoff points
- **F1-Score**: Harmonic mean of precision and recall
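A minimal, self-contained sketch of how MRR and NDCG@K are typically computed; it is illustrative only and may differ from the library's own implementations (for example, it treats any judgment greater than zero as relevant for MRR):

```python
import math

def mrr(ranked_doc_ids, judgments):
    """Reciprocal rank of the first relevant result (judgment > 0)."""
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if judgments.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_doc_ids, judgments, k=10):
    """NDCG@K using the (2**rel - 1) / log2(rank + 1) gain formulation."""
    def dcg(ids):
        return sum(
            (2 ** judgments.get(d, 0) - 1) / math.log2(i + 2)
            for i, d in enumerate(ids[:k])
        )
    ideal = sorted(ranked_doc_ids, key=lambda d: judgments.get(d, 0), reverse=True)
    ideal_dcg = dcg(ideal)
    return dcg(ranked_doc_ids) / ideal_dcg if ideal_dcg > 0 else 0.0

print(mrr(["doc1", "doc2"], {"doc1": 2, "doc2": 1}))        # 1.0
print(ndcg_at_k(["doc2", "doc1"], {"doc1": 2, "doc2": 1}))  # ~0.80, ideal order is reversed
```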
### User Behavior Metrics
- **Click-Through Rate (CTR)**: Percentage of results clicked (see the sketch after this list)
- **Time to First Click**: Average time before first interaction
- **Dwell Time**: Time spent on clicked results
- **Abandonment Rate**: Percentage of searches without clicks
- **Satisfaction Score**: Composite user satisfaction metric
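For illustration, the click-based metrics above can be derived from a per-query search log; the `clicks` and `dwell_times` fields below are assumptions for the sketch, not the platform's actual schema:

```python
# Hypothetical per-query search log; field names are assumptions for this sketch
sessions = [
    {"query": "python tutorial", "clicks": 2, "dwell_times": [45.0, 120.0]},
    {"query": "ml guide", "clicks": 0, "dwell_times": []},
]

total = len(sessions)
# Query-level CTR: share of queries with at least one click (one common convention)
ctr = sum(1 for s in sessions if s["clicks"] > 0) / total
abandonment_rate = sum(1 for s in sessions if s["clicks"] == 0) / total
all_dwell = [t for s in sessions for t in s["dwell_times"]]
avg_dwell = sum(all_dwell) / len(all_dwell) if all_dwell else 0.0

print(f"CTR: {ctr:.0%}  Abandonment: {abandonment_rate:.0%}  Avg dwell: {avg_dwell:.1f}s")
```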
### Advanced Metrics
- **Reciprocal Rank Fusion**: Combines multiple result sets (see the sketch after this list)
- **Diversity Score**: Measures result diversity
- **Novelty Score**: Measures result novelty
- **Coverage**: Measures catalog coverage
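Reciprocal Rank Fusion in particular follows a simple formula, `score(d) = sum_i 1 / (k + rank_i(d))`; the sketch below uses the conventional `k = 60` and may differ from the library's exact implementation:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked doc_id lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion([["doc1", "doc2", "doc3"], ["doc2", "doc1"]])
print(fused)  # doc1 and doc2 score above doc3
```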
## Configuration
### Environment Variables
```bash
# Application
APP_NAME=OpenSearchEval
ENVIRONMENT=production
DEBUG=false
# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
# Database
DB_TYPE=postgresql
DB_URL=postgresql://user:password@localhost/opensearcheval
# LLM Configuration
LLM_MODEL=gpt-4-turbo
LLM_API_KEY=your-openai-api-key
LLM_TEMPERATURE=0.1
# MLX Configuration
USE_MLX=true
MLX_MODEL_PATH=./models/mlx_model
# Caching
REDIS_URL=redis://localhost:6379/0
ENABLE_CACHING=true
CACHE_TTL=3600
# Monitoring
ENABLE_METRICS=true
METRICS_PORT=9090
```
### Configuration File
```python
from opensearcheval.core.config import get_settings
settings = get_settings()
# Access configuration
print(f"API Host: {settings.API_HOST}")
print(f"Database URL: {settings.database_url}")
print(f"Using MLX: {settings.USE_MLX}")
```
## API Documentation
### REST Endpoints
- `GET /health` - Health check
- `POST /api/v1/evaluate` - Evaluate search results
- `POST /api/v1/analyze-ab-test` - Analyze A/B test results
- `POST /api/v1/llm-judge` - LLM-based evaluation
- `GET /api/v1/experiments` - List experiments
- `POST /api/v1/experiments` - Create experiment (see the example below)
- `GET /api/v1/experiments/{id}` - Get experiment details
- `POST /api/v1/experiments/{id}/start` - Start experiment
- `POST /api/v1/experiments/{id}/stop` - Stop experiment
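As a sketch of driving the experiment endpoints from a script (the request and response fields are assumed from the Basic Usage example and may differ from the actual schema):

```python
import httpx

BASE = "http://localhost:8000"

with httpx.Client() as client:
    # Create an experiment (body fields assumed from the Basic Usage example)
    created = client.post(f"{BASE}/api/v1/experiments", json={
        "name": "New Ranking Algorithm",
        "description": "Testing improved relevance scoring",
        "metrics": ["mean_reciprocal_rank", "click_through_rate"],
    }).json()

    experiment_id = created["id"]  # assumed response field

    # Start it, then list all experiments
    client.post(f"{BASE}/api/v1/experiments/{experiment_id}/start")
    print(client.get(f"{BASE}/api/v1/experiments").json())
```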
### WebSocket Support
```python
# Real-time metrics streaming
import asyncio
import json

import websockets

async def metrics_stream():
    uri = "ws://localhost:8000/ws/metrics"
    async with websockets.connect(uri) as websocket:
        async for message in websocket:
            metrics = json.loads(message)
            print(f"Real-time metrics: {metrics}")

asyncio.run(metrics_stream())
```
## Data Connectors
### Supported Data Sources
- **Databases**: PostgreSQL, MySQL, SQLite, MongoDB
- **Big Data**: Apache Spark, Databricks, Snowflake
- **Search Engines**: Elasticsearch, OpenSearch, Solr
- **Cloud Storage**: AWS S3, Google Cloud Storage, Azure Blob
- **APIs**: REST APIs, GraphQL endpoints
### Custom Connectors
```python
from opensearcheval.data.connectors import BaseConnector

class CustomConnector(BaseConnector):
    def connect(self):
        # Implement connection logic
        pass

    def fetch_data(self, query):
        # Implement data fetching
        pass
```
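As a concrete, hypothetical example (a `JsonFileConnector` is not part of the package), the two hooks might be filled in like this:

```python
import json
from opensearcheval.data.connectors import BaseConnector

class JsonFileConnector(BaseConnector):
    """Hypothetical connector that serves records from a local JSON file."""

    def __init__(self, path: str):
        self.path = path
        self.records = []

    def connect(self):
        with open(self.path) as f:
            self.records = json.load(f)

    def fetch_data(self, query):
        # Naive substring match on the document title
        return [r for r in self.records if query.lower() in r.get("title", "").lower()]

connector = JsonFileConnector("documents.json")
connector.connect()
print(connector.fetch_data("python"))
```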
## Testing
### Running Tests
```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=opensearcheval --cov-report=html
# Run specific test categories
pytest tests/test_metrics.py -v
pytest tests/test_agents.py -v
pytest tests/test_api.py -v
# Run performance tests
pytest tests/performance/ -v
```
### Test Data Generation
```python
from opensearcheval.testing import generate_test_data
# Generate synthetic search data
test_data = generate_test_data(
    num_queries=1000,
    num_results_per_query=50,
    relevance_distribution=[0.6, 0.3, 0.1]  # irrelevant, relevant, highly relevant
)
```
## Deployment
### Production Deployment
```yaml
# docker-compose.prod.yml
version: '3.8'
services:
  opensearcheval-api:
    image: opensearcheval:latest
    ports:
      - "8000:8000"
    environment:
      - ENVIRONMENT=production
      - DB_URL=postgresql://user:password@db/opensearcheval
      - REDIS_URL=redis://redis:6379/0
    depends_on:
      - db
      - redis

  opensearcheval-ui:
    image: opensearcheval-ui:latest
    ports:
      - "5000:5000"
    depends_on:
      - opensearcheval-api

  db:
    image: postgres:15
    environment:
      - POSTGRES_DB=opensearcheval
      - POSTGRES_USER=opensearcheval
      - POSTGRES_PASSWORD=secure_password
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:
```
### Kubernetes Deployment
```yaml
# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: opensearcheval-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: opensearcheval-api
  template:
    metadata:
      labels:
        app: opensearcheval-api
    spec:
      containers:
        - name: api
          image: opensearcheval:latest
          ports:
            - containerPort: 8000
          env:
            - name: ENVIRONMENT
              value: "production"
            - name: DB_URL
              valueFrom:
                secretKeyRef:
                  name: opensearcheval-secrets
                  key: database-url
```
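The Deployment is typically paired with a Service so that other pods or an Ingress can reach the API. A minimal sketch (not part of the project's published manifests):

```yaml
# k8s-service.yaml (illustrative companion to the Deployment above)
apiVersion: v1
kind: Service
metadata:
  name: opensearcheval-api
spec:
  selector:
    app: opensearcheval-api
  ports:
    - port: 80
      targetPort: 8000
```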
## Monitoring and Observability
### Metrics Collection
```python
from opensearcheval.monitoring import MetricsCollector
collector = MetricsCollector()
# Custom metrics
collector.counter("search_evaluations_total").inc()
collector.histogram("evaluation_duration_seconds").observe(0.5)
collector.gauge("active_experiments").set(5)
```
### Grafana Dashboard
Pre-built Grafana dashboards are available at `/grafana/dashboards/`.
### Alerting
```python
from opensearcheval.monitoring import AlertManager
alert_manager = AlertManager()
# Configure alerts
alert_manager.add_alert(
    name="high_error_rate",
    condition="error_rate > 0.05",
    notification_channels=["slack", "email"]
)
```
## Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
### Development Setup
```bash
# Clone the repository
git clone https://github.com/llamasearchai/OpenSearchEval.git
cd OpenSearchEval
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install development dependencies
pip install -e .[dev]
# Install pre-commit hooks
pre-commit install
# Run tests
pytest
# Format code
black opensearcheval/
isort opensearcheval/
# Type checking
mypy opensearcheval/
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Support
- **Documentation**: [https://opensearcheval.readthedocs.io/](https://opensearcheval.readthedocs.io/)
- **Issues**: [GitHub Issues](https://github.com/llamasearchai/OpenSearchEval/issues)
- **Discussions**: [GitHub Discussions](https://github.com/llamasearchai/OpenSearchEval/discussions)
- **Email**: nikjois@llamasearch.ai
## Changelog
See [CHANGELOG.md](CHANGELOG.md) for version history and release notes.
## Acknowledgments
- Built with [FastAPI](https://fastapi.tiangolo.com/) and [MLX](https://ml-explore.github.io/mlx/)
- Inspired by modern search evaluation best practices
- Special thanks to the open-source community
---
<div align="center">
<p>Made with love by <a href="https://github.com/nikjois">Nik Jois</a></p>
<p>Star this project if you find it useful!</p>
</div>