Name | coreset
Version | 0.0.1
home_page | None
Summary | A flexible framework for experimenting with and evaluating different sample selection strategies
upload_time | 2025-01-18 02:12:53
maintainer | None
docs_url | None
author | Yasas Senarath
requires_python | >=3.9
license | None
keywords | None
VCS | None
bugtrack_url | None
requirements | No requirements were recorded.
Travis-CI | No Travis.
coveralls test coverage | No coveralls.
# Exemplar Sample Selection Framework
A flexible framework for experimenting with and evaluating different sample selection strategies. This framework allows you to:
- Compare different selection strategies
- Evaluate using multiple metrics
- Work with various datasets
- Extend with custom strategies and metrics
## Installation
1. Clone the repository:
```bash
git clone https://github.com/yourusername/exemplar-sample-selection.git
cd exemplar-sample-selection
```
2. Install dependencies using Poetry:
```bash
# Install Poetry if you haven't already:
# curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies and create virtual environment
poetry install
```
3. Activate the virtual environment:
```bash
poetry shell
```
## Quick Start
Run the example experiment:
```bash
python examples/run_experiment.py
```
This will:
1. Load the IMDB dataset
2. Extract features using a sentence transformer
3. Run random selection (baseline strategy)
4. Evaluate using coverage metrics
5. Save results to `outputs/imdb_random/`
## Framework Structure
```
exemplar-sample-selection/
├── src/
│   ├── data/          # Dataset handling
│   ├── selection/     # Selection strategies
│   ├── metrics/       # Evaluation metrics
│   ├── experiments/   # Experiment management
│   └── utils/         # Utilities
├── tests/             # Unit tests
├── configs/           # Experiment configs
├── examples/          # Example scripts
└── docs/              # Documentation
```
## Core Components
### 1. Dataset Management
- Standardized dataset interface
- Built-in support for text datasets
- Feature extraction and caching (sketched below)
- Easy extension to other data types
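The concrete interface lives in `src/data/`; as a rough illustration only (the class and method names here are assumptions, not the framework's actual API), a standardized text dataset with feature caching might look like:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional

# Hypothetical sketch of a standardized dataset interface;
# the real classes live in src/data/ and may differ.
@dataclass
class TextDataset:
    texts: List[str]
    labels: Optional[List[int]] = None
    _features: Optional[Any] = field(default=None, repr=False)

    def features(self, extractor: Callable[[List[str]], Any]) -> Any:
        # Extract features once and cache them, so selection and
        # metric code can reuse them without re-encoding the texts.
        if self._features is None:
            self._features = extractor(self.texts)
        return self._features
```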
### 2. Selection Strategies
- Base strategy interface
- Random selection baseline
- Support for both supervised and unsupervised selection
- Easy addition of new strategies
### 3. Evaluation Metrics
- Coverage metrics
- Distribution matching
- Performance metrics
- Extensible metric system
### 4. Experiment Management
- Configuration-based setup
- Automated logging
- Result tracking
- Reproducible experiments
## Adding New Components
### Adding a New Selection Strategy
1. Create a new file in `src/selection/`:
```python
from .base import SelectionStrategy

class MyStrategy(SelectionStrategy):
    def select(self, features, labels=None, n_samples=100):
        # Implement your selection logic here and return the indices
        # of the chosen samples; a trivial placeholder:
        selected_indices = list(range(min(n_samples, len(features))))
        return selected_indices
```
2. Register the new strategy in `src/selection/__init__.py`.
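Registration details depend on how `src/selection/__init__.py` is organized; a minimal hypothetical re-export (module and class names are assumptions) might be:

```python
# src/selection/__init__.py (hypothetical layout)
from .base import SelectionStrategy
from .my_strategy import MyStrategy  # expose the new strategy
```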
### Adding a New Metric
1. Create a new file in `src/metrics/`:
```python
from .base import Metric

class MyMetric(Metric):
    def compute(self, selected_features, full_features,
                selected_labels=None, full_labels=None):
        # Implement your metric computation here and return a dict of
        # named values; a trivial placeholder (fraction selected):
        value = len(selected_features) / len(full_features)
        return {'my_metric': value}
```
2. Register the new metric in `src/metrics/__init__.py`.
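Registration mirrors the strategy case; a hypothetical `src/metrics/__init__.py` re-export:

```python
# src/metrics/__init__.py (hypothetical layout)
from .base import Metric
from .my_metric import MyMetric  # expose the new metric
```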
## Running Experiments
### 1. Create Configuration
```python
from src.experiments import ExperimentConfig
from src.experiments.config import DatasetConfig, SelectionConfig, MetricConfig

config = ExperimentConfig(
    name="My Experiment",
    dataset=DatasetConfig(
        name="dataset_name",
        split="train"
    ),
    selection=SelectionConfig(
        name="strategy_name",
        params={"param1": value1},
        n_samples=1000
    ),
    metrics=[
        MetricConfig(
            name="metric_name",
            params={"param1": value1}
        )
    ]
)
```
### 2. Run Experiment
```python
from src.experiments import ExperimentRunner

runner = ExperimentRunner(config)
results = runner.run()
```
### 3. Examine Results
Results are saved in the output directory:
- `config.json`: Experiment configuration
- `results.json`: Detailed results (see the loading snippet below)
- `summary.txt`: Human-readable summary
- `experiment.log`: Execution log
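For programmatic access, `results.json` can be read back with the standard library. A minimal sketch, assuming the output directory from the Quick Start above:

```python
import json
from pathlib import Path

out_dir = Path("outputs/imdb_random")  # example output directory
with open(out_dir / "results.json") as f:
    results = json.load(f)
print(results)
```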
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## License
MIT License - see LICENSE file for details