coret


| Field | Value |
| --- | --- |
| Name | coret |
| Version | 0.1.1 |
| Summary | Surrogate based Concept Retrieval for Large Datasets |
| Upload time | 2025-08-06 11:52:02 |
| Requires Python | <3.13,>=3.9 |
| License | MIT |
| Keywords | concept-retrieval, interpretability, xai, computer-vision, machine-learning |
| Requirements | No requirements were recorded. |

# Surrogate Concept Retrieval

[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

# surrogate_concept_retrieval
Implementation of the paper "Concept Retrieval - What and How?"

## Package Status

✅ **Added**:
- Project URLs and Documentation links
- Keywords and classifiers for PyPI
- Populated `__init__.py` for proper imports
- Documentation structure with Sphinx
- Example code
- Improved README with usage examples

🔄 **In Progress**:
- Comprehensive documentation
- Test coverage
- CI/CD setup

## Getting Started

```bash
# Install from a local clone of the repository (editable mode)
pip install -e .
```

See `RECOMMENDATIONS.md` for full details on package improvements.

## Overview

This package provides tools for extracting concepts from large datasets using the surrogate concept retrieval method.

## Features

- Fast embedding indexing using FAISS
- GPU-accelerated similarity computation
- Automatic concept extraction from embedding spaces
- Flexible concept filtering and refinement
- Support for projection-based concept analysis
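
The indexing and similarity features above rest on nearest-neighbor search over embeddings. As a rough illustration of that operation only, the NumPy sketch below does a brute-force top-k search. It is not the package's implementation, which according to the list above relies on FAISS and GPU kernels. The function name `top_k_neighbors` is hypothetical. On unit-normalized vectors the inner product equals cosine similarity, which is why the Quick Start normalizes its embeddings.

```python
import numpy as np


def top_k_neighbors(embeddings: np.ndarray, query: np.ndarray, k: int = 5):
    """Return (indices, scores) of the k rows most similar to `query`.

    Assumes the rows of `embeddings` and `query` are unit-normalized, so
    the inner product is the cosine similarity. Hypothetical helper, not
    part of the coret API.
    """
    scores = embeddings @ query  # shape: (n_samples,)
    k = min(k, scores.shape[0])
    # argpartition selects the k largest scores without a full sort;
    # only those k candidates are then sorted in descending order.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return top, scores[top]


rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 64)).astype(np.float32)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

indices, scores = top_k_neighbors(embeddings, embeddings[42], k=5)
print(indices, scores)  # the query row itself ranks first
```

Brute force is linear in the number of samples per query, which is the cost an index structure is meant to reduce on large datasets.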

## Installation

```bash
# Install from PyPI
pip install coret

# Install with development dependencies
pip install "coret[dev]"
```
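
To confirm which version ended up in the active environment, the standard library can query the installed distribution metadata. This is a small sketch using `importlib.metadata`; the helper name `installed_version` is hypothetical, and it returns `None` when the distribution is not installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of `distribution`, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if coret is installed, otherwise None.
print(installed_version("coret"))
```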

## Quick Start

```python
import numpy as np
from coret import ConceptRetrieval

# Load your embeddings (example uses random data)
embeddings = np.random.randn(1000, 768)
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
embeddings = np.ascontiguousarray(embeddings, dtype=np.float32)

# Initialize concept retrieval
concept_retriever = ConceptRetrieval()

# Fit the model with embeddings
concept_retriever.fit(embeddings=embeddings)

# Select a random query embedding for demonstration
query_index = np.random.randint(0, len(embeddings))
query_embedding = embeddings[query_index]

# Retrieve concepts for the query
concepts = concept_retriever.retrieve(
    query=query_embedding,
    number_of_concepts=5,
    number_of_samples_per_concept=5
)

# Print retrieved concepts
top_k_concepts_indices_s = concepts['top_k_concepts_indices_s']

print(f"Query index: {query_index}")
for i, concept_indices in enumerate(top_k_concepts_indices_s):
    print(f"Concept {i+1}:")
    print(f"    Indices: {concept_indices}")
    print()
```
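
The three preprocessing lines in the Quick Start (normalize, then cast to contiguous `float32`) can be wrapped in a helper when working with real embeddings. The sketch below is one way to do that. The name `prepare_embeddings` and the zero-row guard are additions for illustration, and whether `ConceptRetrieval.fit` strictly requires this layout is an assumption based on the Quick Start, not a documented guarantee.

```python
import numpy as np


def prepare_embeddings(raw: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """L2-normalize rows and return a C-contiguous float32 array.

    Hypothetical helper mirroring the Quick Start preprocessing.
    """
    raw = np.asarray(raw, dtype=np.float64)
    if raw.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {raw.shape}")
    norms = np.linalg.norm(raw, axis=1, keepdims=True)
    # Guard against all-zero rows, which would otherwise divide by zero
    # and produce NaNs; such rows are left as zeros.
    normalized = raw / np.maximum(norms, eps)
    return np.ascontiguousarray(normalized, dtype=np.float32)


raw = np.random.default_rng(0).standard_normal((1000, 768))
embeddings = prepare_embeddings(raw)
print(embeddings.dtype, embeddings.flags["C_CONTIGUOUS"], embeddings.shape)
```

The result can be passed to `concept_retriever.fit(embeddings=embeddings)` exactly as in the Quick Start.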

## Requirements

- Python 3.9 to 3.12 (the package metadata declares `requires_python` as `<3.13,>=3.9`)
- CUDA-compatible GPU (recommended for large datasets)
- Dependencies:
  - numpy
  - faiss-gpu (or faiss-cpu)
  - scipy
  - scikit-learn
  - tqdm
  - cupy (for GPU acceleration)
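
Because the GPU packages are optional, it can help to check which of the listed dependencies are importable before fitting on a large dataset. The sketch below uses only the standard library. The mapping and the function name `check_dependencies` are illustrative, and the import names reflect how these projects are commonly imported (for example, scikit-learn as `sklearn`, and both FAISS builds as `faiss`).

```python
from importlib.util import find_spec

# Requirement label -> top-level import name.
MODULES = {
    "numpy": "numpy",
    "faiss-gpu / faiss-cpu": "faiss",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "tqdm": "tqdm",
    "cupy": "cupy",
}


def check_dependencies() -> dict:
    """Map each requirement to True if its module can be located."""
    return {label: find_spec(module) is not None
            for label, module in MODULES.items()}


for label, available in check_dependencies().items():
    print(f"{label:24s} {'found' if available else 'missing'}")
```

This only checks that a module can be located; it does not verify that a CUDA device is actually usable.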

## Documentation

For detailed API documentation and examples, please visit our [documentation site](https://onr.github.io/surrogate_concept_retrieval).


## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

<!-- ## Citation

If you use this library in your research, please cite:

```bibtex
@article{author2025concept,
  title={Concept Retrieval - What and How?},
  author={Author, A.},
  journal={Journal Name},
  year={2025}
}
``` -->

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "coret",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.9",
    "maintainer_email": null,
    "keywords": "concept-retrieval, interpretability, xai, computer-vision, machine-learning",
    "author": null,
    "author_email": "Onr <restin3@gmail.com>",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Surrogate based Concept Retrieval for Large Datasets",
    "version": "0.1.1",
    "project_urls": {
        "Bug_Tracker": "https://github.com/Onr/surrogate_concept_retrieval/issues",
        "Documentation": "https://onr.github.io/surrogate_concept_retrieval/",
        "Homepage": "https://github.com/Onr/surrogate_concept_retrieval"
    },
    "split_keywords": [
        "concept-retrieval",
        " interpretability",
        " xai",
        " computer-vision",
        " machine-learning"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9b9a7364d92e6fa18b502d04a65a37da20362072092a4cfaae6fb678228d1df2",
                "md5": "9b4a0827fd35ec7829a1a6e2d68f0e38",
                "sha256": "497c356eb1285389a3e5798a8aa42a25a523fd93794fe71613dd816f75737531"
            },
            "downloads": -1,
            "filename": "coret-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9b4a0827fd35ec7829a1a6e2d68f0e38",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.9",
            "size": 41520,
            "upload_time": "2025-08-06T11:52:02",
            "upload_time_iso_8601": "2025-08-06T11:52:02.034132Z",
            "url": "https://files.pythonhosted.org/packages/9b/9a/7364d92e6fa18b502d04a65a37da20362072092a4cfaae6fb678228d1df2/coret-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-06 11:52:02",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Onr",
    "github_project": "surrogate_concept_retrieval",
    "github_not_found": true,
    "lcname": "coret"
}
        