coreset

Name: coreset
Version: 0.0.1
Summary: A flexible framework for experimenting with and evaluating different sample selection strategies
Author: Yasas Senarath
Requires Python: >=3.9
Uploaded: 2025-01-18 02:12:53

# Exemplar Sample Selection Framework

A flexible framework for experimenting with and evaluating different sample selection strategies. This framework allows you to:
- Compare different selection strategies
- Evaluate using multiple metrics
- Work with various datasets
- Extend with custom strategies and metrics

## Installation

1. Clone the repository:
```bash
git clone https://github.com/yourusername/exemplar-sample-selection.git
cd exemplar-sample-selection
```

2. Install dependencies using Poetry:
```bash
# Install Poetry if you haven't already:
# curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies and create virtual environment
poetry install
```

3. Activate the virtual environment:
```bash
poetry shell
```

## Quick Start

Run the example experiment:
```bash
python examples/run_experiment.py
```

This will:
1. Load the IMDB dataset
2. Extract features using a sentence transformer
3. Run random selection (baseline strategy)
4. Evaluate using coverage metrics
5. Save results to `outputs/imdb_random/` (a config-driven equivalent is sketched below)
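
Under the hood, the same run can be expressed through the experiment API described later in this README. A minimal sketch, assuming the dataset, strategy, and metric are registered as `imdb`, `random`, and `coverage` (these exact registry names are assumptions, not confirmed by the README):

```python
from src.experiments import ExperimentConfig, ExperimentRunner
from src.experiments.config import DatasetConfig, SelectionConfig, MetricConfig

# "imdb", "random", and "coverage" are hypothetical registry names; substitute
# whatever is actually registered in src/data, src/selection, and src/metrics.
config = ExperimentConfig(
    name="imdb_random",
    dataset=DatasetConfig(name="imdb", split="train"),
    selection=SelectionConfig(name="random", params={}, n_samples=1000),
    metrics=[MetricConfig(name="coverage", params={})],
)

results = ExperimentRunner(config).run()
```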

## Framework Structure

```
exemplar-sample-selection/
├── src/
│   ├── data/              # Dataset handling
│   ├── selection/         # Selection strategies
│   ├── metrics/           # Evaluation metrics
│   ├── experiments/       # Experiment management
│   └── utils/             # Utilities
├── tests/                 # Unit tests
├── configs/               # Experiment configs
├── examples/              # Example scripts
└── docs/                  # Documentation
```

## Core Components

### 1. Dataset Management
- Standardized dataset interface
- Built-in support for text datasets
- Feature extraction and caching (sketched after this list)
- Easy extension to other data types
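
The Quick Start extracts text features with a sentence transformer. A minimal sketch of that step with on-disk caching, using the `sentence-transformers` library; the framework's actual dataset interface is not shown in this README, so the function and file names here are illustrative assumptions:

```python
from pathlib import Path

import numpy as np
from sentence_transformers import SentenceTransformer

def extract_features(texts, cache_path="features.npy",
                     model_name="all-MiniLM-L6-v2"):
    """Encode texts once and cache the embeddings to disk (illustrative)."""
    cache = Path(cache_path)
    if cache.exists():
        return np.load(cache)  # reuse the cached embeddings
    model = SentenceTransformer(model_name)
    features = model.encode(texts)  # shape: (len(texts), embedding_dim)
    np.save(cache, features)
    return features
```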

### 2. Selection Strategies
- Base strategy interface
- Random selection baseline
- Support for both supervised and unsupervised selection
- Easy addition of new strategies

### 3. Evaluation Metrics
- Coverage metrics (one formulation is sketched after this list)
- Distribution matching
- Performance metrics
- Extensible metric system
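
The README does not define its coverage metrics; one common formulation, given here as an assumption rather than the framework's actual definition, scores a subset by the fraction of full-set points lying within a radius of some selected point:

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def coverage(selected_features, full_features, radius=0.5):
    """Fraction of the full set within `radius` of any selected sample
    (one plausible coverage metric, not necessarily the framework's)."""
    dists = pairwise_distances(full_features, selected_features)
    return float(np.mean(dists.min(axis=1) <= radius))
```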

### 4. Experiment Management
- Configuration-based setup
- Automated logging
- Result tracking
- Reproducible experiments

## Adding New Components

### Adding a New Selection Strategy

1. Create a new file in `src/selection/`:
```python
import numpy as np

from .base import SelectionStrategy

class MyStrategy(SelectionStrategy):
    def select(self, features, labels=None, n_samples=100):
        # Implement your selection logic here; as a runnable placeholder,
        # draw a uniform random subset of row indices without replacement.
        rng = np.random.default_rng()
        selected_indices = rng.choice(len(features), size=n_samples,
                                      replace=False)
        return selected_indices
```

2. Register in `src/selection/__init__.py`
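
The README doesn't show what registration looks like; a common pattern, assumed here, is a name-to-class registry that `SelectionConfig(name=...)` can look up:

```python
# src/selection/__init__.py (hypothetical registry; the actual mechanism
# is not documented in this README)
from .base import SelectionStrategy
from .random import RandomSelection
from .my_strategy import MyStrategy

STRATEGIES = {
    "random": RandomSelection,
    "my_strategy": MyStrategy,
}
```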

### Adding a New Metric

1. Create a new file in `src/metrics/`:
```python
from .base import Metric

class MyMetric(Metric):
    def compute(self, selected_features, full_features,
                selected_labels=None, full_labels=None):
        # Implement your metric computation here; as a runnable placeholder,
        # report what fraction of the full set was selected.
        value = len(selected_features) / len(full_features)
        return {'my_metric': value}
```

2. Register in `src/metrics/__init__.py`
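
Registration presumably mirrors the strategy side; a hypothetical `src/metrics/__init__.py`:

```python
# src/metrics/__init__.py (hypothetical, mirroring the strategy registry above)
from .base import Metric
from .coverage import CoverageMetric
from .my_metric import MyMetric

METRICS = {
    "coverage": CoverageMetric,
    "my_metric": MyMetric,
}
```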

## Running Experiments

### 1. Create Configuration

```python
from src.experiments import ExperimentConfig
from src.experiments.config import DatasetConfig, SelectionConfig, MetricConfig

config = ExperimentConfig(
    name="My Experiment",
    dataset=DatasetConfig(
        name="dataset_name",
        split="train"
    ),
    selection=SelectionConfig(
        name="strategy_name",
        params={"param1": value1},
        n_samples=1000
    ),
    metrics=[
        MetricConfig(
            name="metric_name",
            params={"param1": value1}
        )
    ]
)
```

### 2. Run Experiment

```python
from src.experiments import ExperimentRunner

runner = ExperimentRunner(config)
results = runner.run()
```

### 3. Examine Results

Results are saved in the output directory (loading them back is sketched after this list):
- `config.json`: Experiment configuration
- `results.json`: Detailed results
- `summary.txt`: Human-readable summary
- `experiment.log`: Execution log
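
To inspect the detailed results programmatically, a minimal sketch, assuming the Quick Start output path and that `results.json` is plain JSON (its exact schema is not documented here):

```python
import json
from pathlib import Path

out_dir = Path("outputs/imdb_random")  # output path from the Quick Start
results = json.loads((out_dir / "results.json").read_text())
print(json.dumps(results, indent=2))  # pretty-print whatever was recorded
```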

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## License

MIT License - see LICENSE file for details


            
