# OpenSearchEval: Ultimate Search Evaluation Platform
## Overview
OpenSearchEval is a comprehensive, production-ready platform for evaluating search quality, conducting A/B tests, and analyzing user behavior. It is built with modern Python technologies and features an agent-based architecture, FastAPI endpoints, and MLX integration optimized for Apple Silicon.
### Key Features
- **Search Quality Metrics**: MRR, NDCG, Precision@K, Recall@K, and more
- **A/B Testing Framework**: Design, run, and analyze experiments with statistical significance
- **User Behavior Analytics**: Click tracking, dwell time, satisfaction metrics, and journey analysis
- **Agent Architecture**: Distributed processing with asynchronous task handling
- **MLX Integration**: Optimized ML components for Apple Silicon with GPU acceleration
- **LLM-as-Judge**: AI-powered qualitative evaluation of search results
- **FastAPI Endpoints**: Production-ready REST API with automatic documentation
- **Rich Visualizations**: Interactive charts, dashboards, and reporting tools
- **Extensible Design**: Plugin architecture for custom metrics and data sources
- **Performance Monitoring**: Real-time metrics collection and alerting
## Quick Start
### Installation
```bash
# Install from PyPI
pip install opensearcheval
# Or install with all optional dependencies
pip install opensearcheval[all]
# For development
pip install opensearcheval[dev]
```
### Docker Installation (Recommended)
```bash
# Clone the repository
git clone https://github.com/llamasearchai/OpenSearchEval.git
cd OpenSearchEval
# Start with Docker Compose
docker-compose up -d
# Access the API at http://localhost:8000
# Access the UI at http://localhost:5000
```
### Basic Usage
```python
import opensearcheval as ose

# Initialize experiment manager
experiments = ose.ExperimentManager()

# Create a new A/B test
experiment = experiments.create_experiment(
    name="New Ranking Algorithm",
    description="Testing improved relevance scoring",
    metrics=["mean_reciprocal_rank", "click_through_rate", "satisfaction_score"]
)

# Evaluate search results
results = [
    {"doc_id": "doc1", "title": "Python Tutorial", "score": 0.95},
    {"doc_id": "doc2", "title": "Machine Learning Guide", "score": 0.87},
]

# Calculate metrics
mrr = ose.mean_reciprocal_rank(
    query="python tutorial",
    results=results,
    relevance_judgments={"doc1": 2, "doc2": 1}
)

print(f"Mean Reciprocal Rank: {mrr:.3f}")
```
### API Usage
```python
import asyncio
import httpx

# Start the API server first:
#   opensearcheval-api --host 0.0.0.0 --port 8000

# Evaluate search results via the API (reuses `results` from the Basic Usage example)
async def evaluate_via_api():
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8000/api/v1/evaluate",
            json={
                "id": "eval_001",
                "query": "python tutorial",
                "results": results,
                "relevance_judgments": {"doc1": 2, "doc2": 1}
            }
        )
    metrics = response.json()["metrics"]
    print(f"API Response: {metrics}")

asyncio.run(evaluate_via_api())
```
### Command Line Interface
```bash
# Evaluate search results from file
opensearcheval evaluate --input-file search_data.json --output-file results.json
# Create an experiment
opensearcheval experiment create \
--name "Improved Ranking" \
--metrics "mrr,ndcg_at_10,ctr" \
--description "Testing new ranking algorithm"
# Generate embeddings
opensearcheval embedding generate \
--input-file documents.json \
--output-file embeddings.json \
--model text-embedding-ada-002
# Run A/B test analysis
opensearcheval ab-test analyze \
--control-data control.json \
--treatment-data treatment.json \
--confidence-level 0.95
```
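The `ab-test analyze` command reports whether the control/treatment difference is statistically significant at the requested confidence level. As a rough illustration of that kind of test (not the tool's actual implementation), here is a standalone sketch using SciPy; the per-query `ctr` field in the JSON files is an assumption for the example:

```python
# Hypothetical illustration of a control-vs-treatment significance test.
# Assumes control.json / treatment.json each contain a list of objects
# with a per-query "ctr" field; the real file format may differ.
import json
from scipy import stats

def load_ctr(path: str) -> list[float]:
    with open(path) as f:
        return [row["ctr"] for row in json.load(f)]

control = load_ctr("control.json")
treatment = load_ctr("treatment.json")

# Welch's t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"t={t_stat:.3f}, p={p_value:.4f}")
print("Significant at 95% confidence" if p_value < 0.05 else "Not significant")
```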
## Architecture
### Agent-Based Processing
```python
# Initialize agent manager
agent_manager = ose.AgentManager()

# Create specialized agents
search_agent = ose.SearchEvaluationAgent(
    name="search_evaluator",
    metrics=[ose.mean_reciprocal_rank, ose.ndcg_at_k],
    config={"batch_size": 100}
)

ab_test_agent = ose.ABTestAgent(
    name="ab_tester",
    statistical_tests=[ose.t_test, ose.mann_whitney_u_test],
    config={"confidence_level": 0.95}
)

# Register and start agents (the await below must run inside an async context)
agent_manager.register_agent(search_agent)
agent_manager.register_agent(ab_test_agent)
await agent_manager.start_all()
```
### MLX Integration (Apple Silicon)
```python
# Use MLX for accelerated ML operations
from opensearcheval.ml import SearchRankingModel, ClickThroughRatePredictor

# Initialize MLX-powered ranking model
ranking_model = SearchRankingModel(
    embedding_dim=768,
    hidden_dim=256,
    use_mlx=True
)

# Initialize a CTR prediction model
ctr_model = ClickThroughRatePredictor(
    feature_dim=20,
    hidden_dims=[64, 32]
)

# Train the model (train_ctr_model and training_examples come from your training pipeline)
trained_model = train_ctr_model(
    training_data=training_examples,
    epochs=50,
    batch_size=64
)
```
## Available Metrics
### Relevance Metrics
- **Mean Reciprocal Rank (MRR)**: Average of reciprocal ranks
- **NDCG@K**: Normalized Discounted Cumulative Gain (MRR and NDCG@K are sketched in code after this list)
- **Precision@K**: Precision at various cutoff points
- **Recall@K**: Recall at various cutoff points
- **F1-Score**: Harmonic mean of precision and recall
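A minimal, self-contained sketch of how MRR and NDCG@K are typically computed; it is illustrative only and may differ from the library's own implementations (for example, it treats any judgment greater than zero as relevant for MRR):

```python
import math

def mrr(ranked_doc_ids, judgments):
    """Reciprocal rank of the first relevant result (judgment > 0)."""
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if judgments.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_doc_ids, judgments, k=10):
    """NDCG@K using the (2**rel - 1) / log2(rank + 1) gain formulation."""
    def dcg(ids):
        return sum(
            (2 ** judgments.get(d, 0) - 1) / math.log2(i + 2)
            for i, d in enumerate(ids[:k])
        )
    ideal = sorted(ranked_doc_ids, key=lambda d: judgments.get(d, 0), reverse=True)
    ideal_dcg = dcg(ideal)
    return dcg(ranked_doc_ids) / ideal_dcg if ideal_dcg > 0 else 0.0

print(mrr(["doc1", "doc2"], {"doc1": 2, "doc2": 1}))        # 1.0
print(ndcg_at_k(["doc2", "doc1"], {"doc1": 2, "doc2": 1}))  # ~0.80, ideal order is reversed
```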
### User Behavior Metrics
- **Click-Through Rate (CTR)**: Percentage of results clicked (see the sketch after this list)
- **Time to First Click**: Average time before first interaction
- **Dwell Time**: Time spent on clicked results
- **Abandonment Rate**: Percentage of searches without clicks
- **Satisfaction Score**: Composite user satisfaction metric
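For illustration, the click-based metrics above can be derived from a per-query search log; the `clicks` and `dwell_times` fields below are assumptions for the sketch, not the platform's actual schema:

```python
# Hypothetical per-query search log; field names are assumptions for this sketch
sessions = [
    {"query": "python tutorial", "clicks": 2, "dwell_times": [45.0, 120.0]},
    {"query": "ml guide", "clicks": 0, "dwell_times": []},
]

total = len(sessions)
# Query-level CTR: share of queries with at least one click (one common convention)
ctr = sum(1 for s in sessions if s["clicks"] > 0) / total
abandonment_rate = sum(1 for s in sessions if s["clicks"] == 0) / total
all_dwell = [t for s in sessions for t in s["dwell_times"]]
avg_dwell = sum(all_dwell) / len(all_dwell) if all_dwell else 0.0

print(f"CTR: {ctr:.0%}  Abandonment: {abandonment_rate:.0%}  Avg dwell: {avg_dwell:.1f}s")
```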
### Advanced Metrics
- **Reciprocal Rank Fusion**: Combines multiple result sets (see the sketch after this list)
- **Diversity Score**: Measures result diversity
- **Novelty Score**: Measures result novelty
- **Coverage**: Measures catalog coverage
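Reciprocal Rank Fusion in particular follows a simple formula, `score(d) = sum_i 1 / (k + rank_i(d))`; the sketch below uses the conventional `k = 60` and may differ from the library's exact implementation:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked doc_id lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion([["doc1", "doc2", "doc3"], ["doc2", "doc1"]])
print(fused)  # doc1 and doc2 score above doc3
```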
## Configuration
### Environment Variables
```bash
# Application
APP_NAME=OpenSearchEval
ENVIRONMENT=production
DEBUG=false
# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
# Database
DB_TYPE=postgresql
DB_URL=postgresql://user:password@localhost/opensearcheval
# LLM Configuration
LLM_MODEL=gpt-4-turbo
LLM_API_KEY=your-openai-api-key
LLM_TEMPERATURE=0.1
# MLX Configuration
USE_MLX=true
MLX_MODEL_PATH=./models/mlx_model
# Caching
REDIS_URL=redis://localhost:6379/0
ENABLE_CACHING=true
CACHE_TTL=3600
# Monitoring
ENABLE_METRICS=true
METRICS_PORT=9090
```
### Configuration File
```python
from opensearcheval.core.config import get_settings
settings = get_settings()
# Access configuration
print(f"API Host: {settings.API_HOST}")
print(f"Database URL: {settings.database_url}")
print(f"Using MLX: {settings.USE_MLX}")
```
## API Documentation
### REST Endpoints
- `GET /health` - Health check
- `POST /api/v1/evaluate` - Evaluate search results
- `POST /api/v1/analyze-ab-test` - Analyze A/B test results
- `POST /api/v1/llm-judge` - LLM-based evaluation
- `GET /api/v1/experiments` - List experiments
- `POST /api/v1/experiments` - Create experiment (see the example below)
- `GET /api/v1/experiments/{id}` - Get experiment details
- `POST /api/v1/experiments/{id}/start` - Start experiment
- `POST /api/v1/experiments/{id}/stop` - Stop experiment
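As a sketch of driving the experiment endpoints from a script (the request and response fields are assumed from the Basic Usage example and may differ from the actual schema):

```python
import httpx

BASE = "http://localhost:8000"

with httpx.Client() as client:
    # Create an experiment (body fields assumed from the Basic Usage example)
    created = client.post(f"{BASE}/api/v1/experiments", json={
        "name": "New Ranking Algorithm",
        "description": "Testing improved relevance scoring",
        "metrics": ["mean_reciprocal_rank", "click_through_rate"],
    }).json()

    experiment_id = created["id"]  # assumed response field

    # Start it, then list all experiments
    client.post(f"{BASE}/api/v1/experiments/{experiment_id}/start")
    print(client.get(f"{BASE}/api/v1/experiments").json())
```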
### WebSocket Support
```python
# Real-time metrics streaming
import asyncio
import json

import websockets

async def metrics_stream():
    uri = "ws://localhost:8000/ws/metrics"
    async with websockets.connect(uri) as websocket:
        async for message in websocket:
            metrics = json.loads(message)
            print(f"Real-time metrics: {metrics}")

asyncio.run(metrics_stream())
```
## Data Connectors
### Supported Data Sources
- **Databases**: PostgreSQL, MySQL, SQLite, MongoDB
- **Big Data**: Apache Spark, Databricks, Snowflake
- **Search Engines**: Elasticsearch, OpenSearch, Solr
- **Cloud Storage**: AWS S3, Google Cloud Storage, Azure Blob
- **APIs**: REST APIs, GraphQL endpoints
### Custom Connectors
```python
from opensearcheval.data.connectors import BaseConnector

class CustomConnector(BaseConnector):
    def connect(self):
        # Implement connection logic
        pass

    def fetch_data(self, query):
        # Implement data fetching
        pass
```
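As a concrete, hypothetical example (a `JsonFileConnector` is not part of the package), the two hooks might be filled in like this:

```python
import json
from opensearcheval.data.connectors import BaseConnector

class JsonFileConnector(BaseConnector):
    """Hypothetical connector that serves records from a local JSON file."""

    def __init__(self, path: str):
        self.path = path
        self.records = []

    def connect(self):
        with open(self.path) as f:
            self.records = json.load(f)

    def fetch_data(self, query):
        # Naive substring match on the document title
        return [r for r in self.records if query.lower() in r.get("title", "").lower()]

connector = JsonFileConnector("documents.json")
connector.connect()
print(connector.fetch_data("python"))
```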
## Testing
### Running Tests
```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=opensearcheval --cov-report=html
# Run specific test categories
pytest tests/test_metrics.py -v
pytest tests/test_agents.py -v
pytest tests/test_api.py -v
# Run performance tests
pytest tests/performance/ -v
```
### Test Data Generation
```python
from opensearcheval.testing import generate_test_data
# Generate synthetic search data
test_data = generate_test_data(
    num_queries=1000,
    num_results_per_query=50,
    relevance_distribution=[0.6, 0.3, 0.1]  # irrelevant, relevant, highly relevant
)
```
## Deployment
### Production Deployment
```yaml
# docker-compose.prod.yml
version: '3.8'
services:
  opensearcheval-api:
    image: opensearcheval:latest
    ports:
      - "8000:8000"
    environment:
      - ENVIRONMENT=production
      - DB_URL=postgresql://user:password@db/opensearcheval
      - REDIS_URL=redis://redis:6379/0
    depends_on:
      - db
      - redis

  opensearcheval-ui:
    image: opensearcheval-ui:latest
    ports:
      - "5000:5000"
    depends_on:
      - opensearcheval-api

  db:
    image: postgres:15
    environment:
      - POSTGRES_DB=opensearcheval
      - POSTGRES_USER=opensearcheval
      - POSTGRES_PASSWORD=secure_password
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:
```
### Kubernetes Deployment
```yaml
# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: opensearcheval-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: opensearcheval-api
  template:
    metadata:
      labels:
        app: opensearcheval-api
    spec:
      containers:
        - name: api
          image: opensearcheval:latest
          ports:
            - containerPort: 8000
          env:
            - name: ENVIRONMENT
              value: "production"
            - name: DB_URL
              valueFrom:
                secretKeyRef:
                  name: opensearcheval-secrets
                  key: database-url
```
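The Deployment is typically paired with a Service so that other pods or an Ingress can reach the API. A minimal sketch (not part of the project's published manifests):

```yaml
# k8s-service.yaml (illustrative companion to the Deployment above)
apiVersion: v1
kind: Service
metadata:
  name: opensearcheval-api
spec:
  selector:
    app: opensearcheval-api
  ports:
    - port: 80
      targetPort: 8000
```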
## Monitoring and Observability
### Metrics Collection
```python
from opensearcheval.monitoring import MetricsCollector
collector = MetricsCollector()
# Custom metrics
collector.counter("search_evaluations_total").inc()
collector.histogram("evaluation_duration_seconds").observe(0.5)
collector.gauge("active_experiments").set(5)
```
### Grafana Dashboard
Pre-built Grafana dashboards are available at `/grafana/dashboards/`.
### Alerting
```python
from opensearcheval.monitoring import AlertManager
alert_manager = AlertManager()
# Configure alerts
alert_manager.add_alert(
    name="high_error_rate",
    condition="error_rate > 0.05",
    notification_channels=["slack", "email"]
)
```
## Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
### Development Setup
```bash
# Clone the repository
git clone https://github.com/llamasearchai/OpenSearchEval.git
cd OpenSearchEval
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install development dependencies
pip install -e .[dev]
# Install pre-commit hooks
pre-commit install
# Run tests
pytest
# Format code
black opensearcheval/
isort opensearcheval/
# Type checking
mypy opensearcheval/
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Support
- **Documentation**: [https://opensearcheval.readthedocs.io/](https://opensearcheval.readthedocs.io/)
- **Issues**: [GitHub Issues](https://github.com/llamasearchai/OpenSearchEval/issues)
- **Discussions**: [GitHub Discussions](https://github.com/llamasearchai/OpenSearchEval/discussions)
- **Email**: nikjois@llamasearch.ai
## Changelog
See [CHANGELOG.md](CHANGELOG.md) for version history and release notes.
## Acknowledgments
- Built with [FastAPI](https://fastapi.tiangolo.com/) and [MLX](https://ml-explore.github.io/mlx/)
- Inspired by modern search evaluation best practices
- Special thanks to the open-source community
---
<div align="center">
<p>Made with love by <a href="https://github.com/nikjois">Nik Jois</a></p>
<p>Star this project if you find it useful!</p>
</div>