# Athelas: Zettelkasten-Inspired ML Model Catalog
Athelas is a comprehensive machine learning repository organized according to Zettelkasten knowledge management principles. It provides a unified catalog of ML models, data processing components, and intelligent knowledge management tools designed to facilitate model discovery, comparison, and innovation.
## Overview
Athelas turns a traditional ML repository into a living knowledge system. It pairs a dual-layer architecture (implementation notes and literature notes) with intelligent agents that keep the two layers connected:
- **Implementation Notes**: Atomic, reusable ML components (models, processors, utilities)
- **Literature Notes**: Contextual documentation that connects and explains components
- **Intelligent Agents**: Knowledge orchestrator and retriever for automated knowledge management
## Key Features
### 🧠 **Intelligent Knowledge Management**
- **Knowledge Orchestrator**: Automatically maintains connections between components, generates documentation, and validates relationships
- **Knowledge Retriever**: Enables semantic search and discovery through RAG-based interfaces and knowledge graph exploration
- **Dual-Layer Architecture**: Combines executable code with rich contextual documentation
### 🔗 **Explicit Component Connectivity**
- Connection registries document relationships between models and processors
- Cross-reference metadata enables discovery of compatible components
- Knowledge graph visualization of component relationships
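As a rough illustration of what a connection registry might look like, the sketch below stores typed relationships between component names and answers "what is related to this component?". The `ConnectionRegistry` class, its methods, and the relation labels are hypothetical names chosen for this example; they are not taken from the Athelas codebase.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


class ConnectionRegistry:
    """Hypothetical in-memory registry of typed links between components."""

    def __init__(self) -> None:
        # Maps a component name to a set of (relation, target) pairs.
        self._edges: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def connect(self, source: str, relation: str, target: str) -> None:
        """Record that `source` is linked to `target` by `relation`."""
        self._edges[source].add((relation, target))

    def related(self, component: str, relation: Optional[str] = None) -> List[str]:
        """Return sorted targets linked from `component`, optionally filtered by relation."""
        edges = self._edges.get(component, set())
        return sorted(t for r, t in edges if relation is None or r == relation)


registry = ConnectionRegistry()
registry.connect("bert_classifier", "requires", "bert_tokenize_processor")
registry.connect("bert_classifier", "alternative", "text_cnn")
registry.connect("text_cnn", "alternative", "bert_classifier")

print(registry.related("bert_classifier"))
print(registry.related("bert_classifier", relation="requires"))
```

Keeping the relation type on every edge is what would let a tool distinguish "this model needs that processor" from "this model can replace that one".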
### ⚡ **Multi-Framework Support**
- **PyTorch Lightning**: Advanced neural network implementations
- **PyTorch**: Native PyTorch models and components
- **XGBoost & LightGBM**: Gradient boosting models
- **Reinforcement Learning**: Actor-critic and bandit algorithms
- **AWS Bedrock**: Cloud-based model integrations
### 🛠 **Comprehensive Processing Pipeline**
- **Text Processing**: BERT tokenization, Gensim processing, text augmentation
- **Tabular Processing**: Numerical imputation, categorical encoding, feature engineering
- **Image Processing**: Computer vision preprocessing and augmentation
- **Multimodal Processing**: Cross-modal fusion and attention mechanisms
## Architecture
### Repository Structure
```
src/
├── models/ # ML Model Implementations
│ ├── lightning/ # PyTorch Lightning models
│ ├── pytorch/ # Native PyTorch implementations
│ ├── xgboost/ # XGBoost models
│ ├── lightgbm/ # LightGBM models
│ └── actor_critic/ # Reinforcement learning models
├── processing/ # Data Processing Components
│ ├── text/ # Text processing (BERT, Gensim, etc.)
│ ├── tabular/ # Tabular data processing
│ ├── image/ # Image processing
│ ├── feature/ # Feature engineering
│ └── augmentation/ # Data augmentation
├── knowledge/ # Knowledge Management System
│ ├── orchestrator/ # Knowledge orchestration agents
│ ├── retriever/ # Intelligent retrieval system
│ ├── connections/ # Component relationship registries
│ └── demonstrations/ # Usage examples and tutorials
└── utils/ # Shared utilities
slipbox/ # Knowledge Documentation
├── models/ # Model documentation and analysis
├── processing/ # Processing component documentation
└── knowledge/ # Knowledge system documentation
```
### Core Design Principles
1. **Atomicity**: Each component focuses on a single, well-defined responsibility
2. **Explicit Connectivity**: Relationships between components are explicitly documented
3. **Emergent Organization**: Structure evolves naturally from content relationships
4. **Knowledge Preservation**: Implementation and context are preserved together
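The atomicity principle can be pictured with a small sketch: each processor does exactly one thing, and a pipeline is just their composition. The function names below (`compose`, `normalize_whitespace`, `lowercase`, `whitespace_tokenize`) are illustrative assumptions and do not correspond to actual Athelas processors.

```python
from functools import reduce
from typing import Any, Callable, List

Processor = Callable[[Any], Any]


def compose(*steps: Processor) -> Processor:
    """Chain single-purpose processors into one callable, applied left to right."""
    def run(value: Any) -> Any:
        return reduce(lambda acc, step: step(acc), steps, value)
    return run


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())


def lowercase(text: str) -> str:
    """Convert text to lower case."""
    return text.lower()


def whitespace_tokenize(text: str) -> List[str]:
    """Split text into tokens on whitespace."""
    return text.split()


pipeline = compose(normalize_whitespace, lowercase, whitespace_tokenize)
tokens = pipeline("  Example   TEXT for\tclassification ")
print(tokens)  # ['example', 'text', 'for', 'classification']
```

Because each step has one responsibility, any of them can be swapped or reused in another pipeline without touching the others.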
## Installation
```bash
pip install athelas
```
### Development Installation
```bash
git clone https://github.com/TianpeiLuke/athelas.git
cd athelas
pip install -e .
```
## Quick Start
### Basic Model Usage
```python
# Assumes the Athelas models subclass pytorch_lightning.LightningModule
import pytorch_lightning as pl

from athelas.models.lightning import BertClassifier
from athelas.processing.text import BertTokenizeProcessor

# Initialize components
tokenizer = BertTokenizeProcessor()
model = BertClassifier(config={
    'num_classes': 2,
    'learning_rate': 2e-5
})

# Process data
processed_text = tokenizer("Example text for classification")

# Train model (train_dataloader is a DataLoader you build from your own dataset)
trainer = pl.Trainer(max_epochs=3)
trainer.fit(model, train_dataloader)
```
### Knowledge System Queries
```python
from athelas.knowledge.retriever import KnowledgeRetriever
# Initialize knowledge retriever
retriever = KnowledgeRetriever()
# Semantic search for components
results = retriever.search("text classification with BERT")
# Explore component relationships
related = retriever.find_related_components("bert_classifier")
# Get recommendations based on context
recommendations = retriever.recommend_components({
    'task': 'text_classification',
    'data_type': 'text',
    'framework': 'lightning'
})
```
### Component Discovery
```python
from athelas.knowledge.orchestrator import KnowledgeOrchestrator
# Initialize orchestrator
orchestrator = KnowledgeOrchestrator()
# Discover compatible processors for a model
compatible = orchestrator.find_compatible_processors("bert_classifier")
# Get alternative models for a task
alternatives = orchestrator.find_alternatives("text_classification")
```
## Available Components
### Models
#### Text Classification
- **BERT Classifier**: Transformer-based classification with Hugging Face integration
- **Text CNN**: Convolutional neural networks for text classification
- **LSTM**: Recurrent neural networks for sequence classification
#### Multimodal Models
- **Multimodal BERT**: Text and tabular data fusion
- **Cross-Attention**: Attention-based multimodal fusion
- **Gate Fusion**: Gated multimodal feature combination
- **Mixture of Experts**: Sparse multimodal processing
#### Traditional ML
- **XGBoost**: Gradient boosting for tabular data
- **LightGBM**: Fast gradient boosting implementation
### Processing Components
#### Text Processing
- **BERT Tokenizer**: Hugging Face BERT tokenization
- **Gensim Processor**: Word2Vec and Doc2Vec processing
- **Text Augmentation**: Data augmentation for text
#### Tabular Processing
- **Numerical Imputation**: Missing value handling
- **Categorical Encoding**: Label and one-hot encoding
- **Feature Engineering**: Automated feature creation
## Knowledge Management
### Intelligent Discovery
Athelas includes intelligent agents that help you discover and connect components:
```python
# `retriever` is the KnowledgeRetriever instance created in Quick Start
# Ask natural language questions about the catalog
answer = retriever.ask("What models work best for multimodal classification?")
# Explore the knowledge graph
graph = retriever.get_knowledge_graph()
connections = graph.get_connections("bert_classifier")
```
### Automatic Documentation
The Knowledge Orchestrator automatically:
- Extracts metadata from component implementations
- Maintains connection registries between components
- Generates and updates documentation
- Validates component relationships
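To make "validates component relationships" concrete, here is a minimal sketch of one possible check: flagging connections that point at components nobody has registered. The function name `find_dangling_references` and the dictionary layout are assumptions made for this example, not the orchestrator's real interface.

```python
from typing import Dict, List

# Hypothetical shape: component name -> {relation: [target component names]}
Catalog = Dict[str, Dict[str, List[str]]]


def find_dangling_references(catalog: Catalog) -> Dict[str, List[str]]:
    """Return, per component, the connection targets that are not in the catalog."""
    known = set(catalog)
    dangling: Dict[str, List[str]] = {}
    for component, connections in catalog.items():
        missing = sorted(
            {target
             for targets in connections.values()
             for target in targets
             if target not in known}
        )
        if missing:
            dangling[component] = missing
    return dangling


catalog: Catalog = {
    "bert_classifier": {
        "requires": ["bert_tokenize_processor"],
        "alternatives": ["text_cnn"],
    },
    "text_cnn": {
        "requires": ["gensim_processor"],  # not registered below
        "alternatives": ["bert_classifier"],
    },
    "bert_tokenize_processor": {},
}

dangling = find_dangling_references(catalog)
print(dangling)  # {'text_cnn': ['gensim_processor']}
```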
## Contributing
Athelas follows Zettelkasten principles for contributions:
1. **Atomic Components**: Create focused, single-purpose implementations
2. **Explicit Connections**: Document relationships with other components
3. **Rich Metadata**: Include structured metadata in component docstrings
4. **Knowledge Documentation**: Provide contextual documentation in the slipbox
### Adding a New Component
```python
"""
---
component_type: model
framework: lightning
task: text_classification
connections:
  requires:
    - "processing.text.bert_tokenize_processor.BertTokenizeProcessor"
  alternatives:
    - "models.lightning.text_cnn.TextCNN"
---
"""
from pytorch_lightning import LightningModule


class YourModel(LightningModule):
    """Your model implementation with metadata."""
    pass
```
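One way such docstring metadata could be read back is sketched below, using only the standard library. It parses the module docstring with `ast` and pulls the flat `key: value` fields out of the `---` block. The helper `read_front_matter` is hypothetical, and it deliberately skips nested blocks such as `connections:`; a real implementation would more likely hand the block to a YAML parser.

```python
import ast
import re
from typing import Dict

# Example component source; the class body is a placeholder.
SOURCE = '''"""
---
component_type: model
framework: lightning
task: text_classification
---
"""

class YourModel:
    pass
'''


def read_front_matter(source: str) -> Dict[str, str]:
    """Return top-level `key: value` pairs from a module docstring's front matter."""
    docstring = ast.get_docstring(ast.parse(source)) or ""
    match = re.search(r"^---\n(.*?)\n---$", docstring, re.DOTALL | re.MULTILINE)
    if match is None:
        return {}
    fields: Dict[str, str] = {}
    for line in match.group(1).splitlines():
        # Only flat scalar fields are handled; indented or list lines are ignored.
        if line and not line.startswith((" ", "-")) and ":" in line:
            key, _, value = line.partition(":")
            if value.strip():
                fields[key.strip()] = value.strip()
    return fields


metadata = read_front_matter(SOURCE)
print(metadata)
```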
## License
MIT License - see [LICENSE](LICENSE) for details.
## Citation
If you use Athelas in your research, please cite:
```bibtex
@software{athelas2025,
title={Athelas: Zettelkasten-Inspired ML Model Catalog},
author={Xie, Tianpei},
year={2025},
url={https://github.com/TianpeiLuke/athelas}
}
```
## Related Projects
- [Zettelkasten Method](https://zettelkasten.de/): Knowledge management methodology
- [PyTorch Lightning](https://lightning.ai/): Framework for professional AI research
- [Hugging Face Transformers](https://huggingface.co/transformers/): State-of-the-art NLP models
---
**Athelas**: *From the Greek word meaning "healing" - helping researchers and practitioners heal the fragmentation in ML model development through
Raw data
{
"_id": null,
"home_page": null,
"name": "athelas",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "machine-learning, deep-learning, pytorch, knowledge-management, zettelkasten, model-catalog",
"author": null,
"author_email": "Tianpei Xie <unidoctor@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/ca/f9/ad902165310dff3c307028e27029c626ca2f88705fb5c2378c69762daf9c/athelas-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Zettelkasten-inspired ML model catalog with intelligent knowledge management",
"version": "0.1.0",
"project_urls": {
"Bug Tracker": "https://github.com/TianpeiLuke/athelas/issues",
"Documentation": "https://github.com/TianpeiLuke/athelas/blob/main/README.md",
"Homepage": "https://github.com/TianpeiLuke/athelas",
"Repository": "https://github.com/TianpeiLuke/athelas"
},
"split_keywords": [
"machine-learning",
" deep-learning",
" pytorch",
" knowledge-management",
" zettelkasten",
" model-catalog"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "24d2b73633dbad06d9c8ae20d00705a8bf68e87351c8d422559422ad3d6ec895",
"md5": "544a4720cdea8a29e21fd3cfeae8dab5",
"sha256": "b596ae4d2619a7608dc8e3669e598396e3e229e1f47f16775f375638f120870a"
},
"downloads": -1,
"filename": "athelas-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "544a4720cdea8a29e21fd3cfeae8dab5",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 75469,
"upload_time": "2025-08-28T03:48:45",
"upload_time_iso_8601": "2025-08-28T03:48:45.877843Z",
"url": "https://files.pythonhosted.org/packages/24/d2/b73633dbad06d9c8ae20d00705a8bf68e87351c8d422559422ad3d6ec895/athelas-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "caf9ad902165310dff3c307028e27029c626ca2f88705fb5c2378c69762daf9c",
"md5": "8535d642229388a60ea65791e8724402",
"sha256": "a1613510506fc451015c7697f7ad44a0589d0feef40c7ebce6921ff043e6b64b"
},
"downloads": -1,
"filename": "athelas-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "8535d642229388a60ea65791e8724402",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 112942,
"upload_time": "2025-08-28T03:48:47",
"upload_time_iso_8601": "2025-08-28T03:48:47.711428Z",
"url": "https://files.pythonhosted.org/packages/ca/f9/ad902165310dff3c307028e27029c626ca2f88705fb5c2378c69762daf9c/athelas-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-28 03:48:47",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "TianpeiLuke",
"github_project": "athelas",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "boto3",
"specs": [
[
">=",
"1.39.0"
]
]
},
{
"name": "botocore",
"specs": [
[
">=",
"1.39.0"
]
]
},
{
"name": "sagemaker",
"specs": [
[
">=",
"2.248.0"
]
]
},
{
"name": "pydantic",
"specs": [
[
">=",
"2.11.0"
]
]
},
{
"name": "PyYAML",
"specs": [
[
">=",
"6.0.0"
]
]
},
{
"name": "networkx",
"specs": [
[
">=",
"3.0.0"
]
]
},
{
"name": "click",
"specs": [
[
">=",
"8.2.0"
]
]
},
{
"name": "requests",
"specs": [
[
">=",
"2.32.0"
]
]
},
{
"name": "packaging",
"specs": [
[
">=",
"24.2.0"
]
]
},
{
"name": "typing_extensions",
"specs": [
[
">=",
"4.14.0"
]
]
},
{
"name": "pandas",
"specs": [
[
">=",
"2.1.0"
]
]
},
{
"name": "numpy",
"specs": [
[
">=",
"1.26.0"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
">=",
"1.3.0"
]
]
},
{
"name": "joblib",
"specs": [
[
">=",
"1.5.0"
]
]
},
{
"name": "pyarrow",
"specs": [
[
">=",
"14.0.0"
]
]
},
{
"name": "torch",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "torchvision",
"specs": [
[
">=",
"0.15.0"
]
]
},
{
"name": "torchaudio",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "pytorch-lightning",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "torchmetrics",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "lightning",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "transformers",
"specs": [
[
">=",
"4.30.0"
]
]
},
{
"name": "spacy",
"specs": [
[
">=",
"3.7.0"
]
]
},
{
"name": "tokenizers",
"specs": [
[
">=",
"0.15.0"
]
]
},
{
"name": "huggingface-hub",
"specs": [
[
">=",
"0.20.0"
]
]
},
{
"name": "gensim",
"specs": [
[
">=",
"4.3.0"
]
]
},
{
"name": "beautifulsoup4",
"specs": [
[
">=",
"4.12.0"
]
]
},
{
"name": "xgboost",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "matplotlib",
"specs": [
[
">=",
"3.8.0"
]
]
},
{
"name": "onnx",
"specs": [
[
">=",
"1.15.0"
]
]
},
{
"name": "onnxruntime",
"specs": [
[
">=",
"1.16.0"
]
]
},
{
"name": "flask",
"specs": [
[
">=",
"3.0.0"
]
]
},
{
"name": "jupyter",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "ipywidgets",
"specs": [
[
">=",
"8.0.0"
]
]
},
{
"name": "plotly",
"specs": [
[
">=",
"5.0.0"
]
]
},
{
"name": "nbformat",
"specs": [
[
">=",
"5.0.0"
]
]
},
{
"name": "jinja2",
"specs": [
[
">=",
"3.0.0"
]
]
},
{
"name": "seaborn",
"specs": [
[
">=",
"0.12.0"
]
]
},
{
"name": "jupyterlab",
"specs": [
[
">=",
"4.0.0"
]
]
},
{
"name": "ipython",
"specs": [
[
">=",
"8.0.0"
]
]
},
{
"name": "pytest",
"specs": [
[
">=",
"7.0.0"
]
]
},
{
"name": "pytest-cov",
"specs": [
[
">=",
"4.0.0"
]
]
},
{
"name": "pytest-mock",
"specs": [
[
">=",
"3.10.0"
]
]
},
{
"name": "black",
"specs": [
[
">=",
"23.0.0"
]
]
},
{
"name": "flake8",
"specs": [
[
">=",
"6.0.0"
]
]
},
{
"name": "isort",
"specs": [
[
">=",
"5.0.0"
]
]
},
{
"name": "mypy",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "pre-commit",
"specs": [
[
">=",
"3.0.0"
]
]
}
],
"lcname": "athelas"
}