# Surrogate Concept Retrieval
Implementation of the paper "Concept Retrieval - What and How?". The repository is named `surrogate_concept_retrieval`; the package is published on PyPI as `coret`.
## Package Status
✅ **Added**:
- Project URLs and Documentation links
- Keywords and classifiers for PyPI
- Populated `__init__.py` for proper imports
- Documentation structure with Sphinx
- Example code
- Improved README with usage examples
🔄 **In Progress**:
- Comprehensive documentation
- Test coverage
- CI/CD setup
## Getting Started
```bash
# Install the package in editable mode from a local clone of the repository
pip install -e .
```
See `RECOMMENDATIONS.md` for full details on package improvements.
## Overview
This package provides tools for extracting concepts from large datasets using the surrogate concept retrieval method.
## Features
- Fast embedding indexing using FAISS
- GPU-accelerated similarity computation
- Automatic concept extraction from embedding spaces
- Flexible concept filtering and refinement
- Support for projection-based concept analysis
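
The package's internals are not described in this README. As a rough illustration of what similarity search over unit-normalized embeddings involves, here is a minimal CPU-only NumPy sketch. The function name `top_k_cosine` is hypothetical and is not part of the `coret` API; it is a brute-force stand-in for what an exact inner-product index computes, not the library's implementation.

```python
import numpy as np


def top_k_cosine(embeddings: np.ndarray, query: np.ndarray, k: int = 5):
    """Return indices and scores of the k rows most similar to `query`.

    Assumes the rows of `embeddings` and `query` are already L2-normalized,
    so the inner product equals cosine similarity.
    """
    scores = embeddings @ query  # shape (n,)
    k = min(k, scores.shape[0])
    # argpartition selects the k largest scores without a full sort;
    # the k candidates are then sorted by descending score.
    candidates = np.argpartition(-scores, k - 1)[:k]
    order = candidates[np.argsort(-scores[candidates])]
    return order, scores[order]


rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 64)).astype(np.float32)
data /= np.linalg.norm(data, axis=1, keepdims=True)

indices, similarities = top_k_cosine(data, data[42], k=5)
print(indices[0])  # the query row itself ranks first
```

Brute-force search like this scales linearly with the number of embeddings, which is why a dedicated index such as FAISS becomes attractive for large datasets.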
## Installation
```bash
# Install from PyPI
pip install coret
# Install with development dependencies
pip install "coret[dev]"
```
## Quick Start
```python
import numpy as np
from coret import ConceptRetrieval

# Load your embeddings (example uses random data)
embeddings = np.random.randn(1000, 768)
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
embeddings = np.ascontiguousarray(embeddings, dtype=np.float32)

# Initialize concept retrieval
concept_retriever = ConceptRetrieval()

# Fit the model with embeddings
concept_retriever.fit(embeddings=embeddings)

# Select a random query embedding for demonstration
query_index = np.random.randint(0, len(embeddings))
query_embedding = embeddings[query_index]

# Retrieve concepts for the query
concepts = concept_retriever.retrieve(
    query=query_embedding,
    number_of_concepts=5,
    number_of_samples_per_concept=5
)

# Print retrieved concepts
top_k_concepts_indices_s = concepts['top_k_concepts_indices_s']

print(f"Query index: {query_index}")
for i, concept_indices in enumerate(top_k_concepts_indices_s):
    print(f"Concept {i+1}:")
    print(f"  Indices: {concept_indices}")
    print()
```
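
The Quick Start normalizes the embeddings to unit length and converts them to a contiguous `float32` array before calling `fit`. When working with real data, it may help to wrap those steps in a small helper that also rejects malformed input. The sketch below is one way to do that; `prepare_embeddings` is a hypothetical name, not a function provided by `coret`, and the validation rules are assumptions rather than documented requirements of the library.

```python
import numpy as np


def prepare_embeddings(raw) -> np.ndarray:
    """Hypothetical helper: return unit-norm, C-contiguous float32 rows."""
    arr = np.asarray(raw, dtype=np.float32)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {arr.shape}")
    if not np.isfinite(arr).all():
        raise ValueError("embeddings contain NaN or infinite values")
    norms = np.linalg.norm(arr, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("cannot normalize an all-zero embedding")
    # Same three steps as the Quick Start: normalize, cast, make contiguous.
    return np.ascontiguousarray(arr / norms, dtype=np.float32)


prepared = prepare_embeddings(np.random.randn(1000, 768))
print(prepared.dtype, prepared.flags["C_CONTIGUOUS"])
```

The returned array can then be passed as `embeddings=prepared` in place of the manual preprocessing shown in the Quick Start.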
## Requirements
- Python 3.9 to 3.12 (the package metadata declares `requires_python` as `>=3.9,<3.13`)
- CUDA-compatible GPU (recommended for large datasets)
- Dependencies:
  - numpy
  - faiss-gpu (or faiss-cpu)
  - scipy
  - scikit-learn
  - tqdm
  - cupy (for GPU acceleration)
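
Because the FAISS and CuPy dependencies come in GPU and CPU variants, it can be useful to check which of them are importable in the current environment before fitting on a large dataset. The following is a minimal sketch using only the standard library; `check_optional_backends` is a hypothetical helper, not part of `coret`. It assumes the usual import names `faiss` (for both `faiss-gpu` and `faiss-cpu`) and `cupy`.

```python
import importlib.util


def check_optional_backends(modules=("faiss", "cupy")):
    """Report which modules are importable, without actually importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


availability = check_optional_backends()
for name, found in availability.items():
    print(f"{name}: {'available' if found else 'not installed'}")
```

Note that this only confirms the modules are installed; it does not verify that a CUDA-compatible GPU is present or usable.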
## Documentation
For detailed API documentation and examples, please visit our [documentation site](https://onr.github.io/surrogate_concept_retrieval).
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
<!-- ## Citation
If you use this library in your research, please cite:
```bibtex
@article{author2025concept,
title={Concept Retrieval - What and How?},
author={Author, A.},
journal={Journal Name},
year={2025}
}
``` -->