# AMBER: Attention-based Multi-head Bidirectional Enhanced Representations
[](https://badge.fury.io/py/amber-embeddings)
[](https://www.python.org/downloads/)
[](https://opensource.org/licenses/MIT)
AMBER is a hybrid word embedding framework that bridges statistical and neural semantics by combining TF-IDF weighting with multi-head self-attention mechanisms. It enhances static word embeddings (like Word2Vec) with contextual awareness, making them suitable for word sense disambiguation and semantic search tasks.
## ๐ Key Features
- **Context-Aware Embeddings**: Transform static embeddings into context-sensitive representations
- **Hybrid Architecture**: Combines statistical (TF-IDF) and neural (attention) approaches
- **Multiple Attention Mechanisms**: Multi-head and positional attention variants
- **Plug-and-Play Design**: Works with any Word2Vec-compatible embedding model
- **Comprehensive Evaluation**: Built-in metrics including Context Sensitivity Score (CSS)
- **Lightweight & Interpretable**: Computationally efficient alternative to transformer models
- **Easy Integration**: Simple API for seamless integration into existing NLP pipelines
## ๐ฆ Installation
```bash
pip install amber-embeddings
```
Or install from source:
```bash
git clone https://github.com/Saiyam-Sandhir-Jain/AMBER.git
cd AMBER
pip install -e .
```
## ๐ง Quick Start
```python
from amber import AMBERModel, AMBERComparator
# Your text corpus
corpus = [
"The bass guitar sounds amazing in concert",
"He caught a large bass fish in the lake",
"The apple fell from the tree",
"Apple company released a new iPhone"
]
# Initialize AMBER model (uses Google News Word2Vec by default)
amber_model = AMBERModel(corpus)
# Get context-aware embeddings
embedding1 = amber_model.get_contextual_embedding(
word="bass",
sentence="The bass guitar sounds amazing",
method="multi_head"
)
embedding2 = amber_model.get_contextual_embedding(
word="bass",
sentence="He caught a large bass fish",
method="multi_head"
)
# The embeddings will be different, reflecting different contexts!
print(f"Embedding shapes: {embedding1.shape}, {embedding2.shape}")
```
## ๐ Core Components
### AMBERModel
The main model class that creates context-aware embeddings:
```python
from amber import AMBERModel
import gensim.downloader as api
# Option 1: Use default Word2Vec model
model = AMBERModel(corpus)
# Option 2: Use custom Word2Vec model
custom_w2v = api.load('word2vec-google-news-300')
model = AMBERModel(corpus, w2v_model=custom_w2v)
# Option 3: Custom TF-IDF parameters
model = AMBERModel(
corpus,
tfidf_params={
'max_features': 5000,
'min_df': 2,
'max_df': 0.8
}
)
```
### Embedding Methods
AMBER provides three embedding methods:
1. **Multi-head Attention** (`multi_head`): Best for disambiguation
2. **Positional Attention** (`positional`): Considers word proximity
3. **TF-IDF Only** (`tfidf_only`): Statistical weighting only
```python
# Multi-head attention (recommended)
embedding = model.get_contextual_embedding(
word="bank",
sentence="He went to the bank to deposit money",
method="multi_head",
num_heads=4,
temperature=0.8
)
# Positional attention
embedding = model.get_contextual_embedding(
word="bank",
sentence="The river bank was muddy",
method="positional",
window_size=5
)
```
### Batch Processing
Process multiple word-context pairs efficiently:
```python
batch_data = [
{"word": "apple", "sentence": "The apple fell from tree", "doc_idx": 0},
{"word": "apple", "sentence": "Apple released new iPhone", "doc_idx": 1},
{"word": "mouse", "sentence": "Computer mouse not working", "doc_idx": 2}
]
embeddings = model.batch_contextual_embeddings(batch_data, method="multi_head")
```
## ๐ Evaluation and Metrics
### Context Sensitivity Score (CSS)
CSS measures how much a word's embedding varies across different contexts:
```python
from amber import ContextSensitivityScore
# Calculate CSS for embeddings of the same word in different contexts
css_score = ContextSensitivityScore.calculate([embedding1, embedding2, embedding3])
print(f"Context Sensitivity Score: {css_score:.4f}")
# Higher CSS = better disambiguation ability
# Static embeddings have CSS โ 0
# AMBER embeddings have CSS > 0
```
### Model Comparison
Compare AMBER with baseline models:
```python
from amber import AMBERComparator
comparator = AMBERComparator(amber_model)
# Define test cases
test_cases = [
{
'word': 'bank',
'contexts': [
{'sentence': 'He went to the bank to deposit money', 'type': 'Financial'},
{'sentence': 'The river bank was muddy after rain', 'type': 'Geographic'}
]
}
]
# Run comprehensive evaluation
results = comparator.evaluate_disambiguation(test_cases)
report = comparator.generate_report(test_cases)
print(report)
```
### Visualization
Generate comparison plots:
```python
# Run comprehensive comparison
comparison_df = comparator.comprehensive_comparison(test_cases)
# Create visualizations
comparator.visualize_comparison(comparison_df, save_path="amber_comparison.png")
```
## ๐ฌ Advanced Usage
### Custom Word Embedding Models
```python
import gensim.downloader as api
# Use different pre-trained models
glove_model = api.load('glove-wiki-gigaword-300')
amber_glove = AMBERModel(corpus, w2v_model=glove_model)
# Or load your own trained model
from gensim.models import KeyedVectors
custom_model = KeyedVectors.load_word2vec_format('path/to/your/model.bin', binary=True)
amber_custom = AMBERModel(corpus, w2v_model=custom_model)
```
### Fine-tuning Parameters
```python
# Adjust attention parameters
embedding = model.get_contextual_embedding(
word="python",
sentence="Python is a programming language",
method="multi_head",
num_heads=8, # More heads for complex contexts
temperature=0.5 # Lower temperature for sharper attention
)
# Adjust positional attention
embedding = model.get_contextual_embedding(
word="spring",
sentence="Spring brings beautiful flowers",
method="positional",
window_size=7, # Larger context window
position_decay=1.5 # Stronger distance decay
)
```
### Export Embeddings
```python
from amber.utils import export_embeddings
# Export for external use
words_and_contexts = [
{"word": "bank", "sentence": "Financial bank services"},
{"word": "bank", "sentence": "River bank location"}
]
# Export as dictionary
embeddings_dict = export_embeddings(model, words_and_contexts, output_format="dict")
# Export as numpy array
embeddings_array = export_embeddings(model, words_and_contexts, output_format="array")
# Export as pandas DataFrame (requires pandas)
embeddings_df = export_embeddings(model, words_and_contexts, output_format="dataframe")
```
## ๐งช Evaluation Results
AMBER demonstrates significant improvements over static embeddings:
| Method | Context Sensitivity Score | Disambiguation Accuracy |
|--------|--------------------------|-------------------------|
| Word2Vec (Static) | 0.000 | 61% |
| TF-IDF Weighted | 0.018 | 73% |
| AMBER Multi-head | **0.043** | **87%** |
| AMBER Positional | 0.037 | 84% |
## ๐ Use Cases
- **Word Sense Disambiguation**: Distinguish between different meanings of polysemous words
- **Semantic Search**: Improve search relevance with context-aware embeddings
- **Document Classification**: Enhanced feature representations for text classification
- **Similarity Matching**: More accurate semantic similarity in specific domains
- **Information Retrieval**: Better query-document matching with contextual understanding
## ๐ ๏ธ Technical Details
### Architecture
AMBER enhances static embeddings through three key components:
1. **TF-IDF Scaling**: Weights embeddings by lexical importance
2. **Multi-head Attention**: Captures different types of contextual relationships
3. **Residual Fusion**: Preserves semantic stability while adding contextual adaptation
### Mathematical Foundation
The final contextual embedding is computed as:
```
F = ฮณ ยท MultiHeadAttention(TF-IDF(E)) + (1 - ฮณ) ยท E
```
Where:
- `E` is the original Word2Vec embedding
- `TF-IDF(E)` applies sentence-level TF-IDF weighting
- `MultiHeadAttention` computes contextual relationships
- `ฮณ` controls the context-static balance
## ๐ Citation
If you use AMBER in your research, please cite:
```bibtex
@article{jain2024amber,
title={not decided},
author={Jain, Saiyam and Bhowmik, Swaroop and Choudhury, Dipanjan},
journal={not decided},
year={}
}
```
## ๐ค Contributing
We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request
## ๐ License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## ๐ Acknowledgments
- Google News Word2Vec model for default embeddings
- Gensim library for word embedding utilities
- scikit-learn for TF-IDF implementation
- The research community for inspiration and feedback
## ๐ Support
- **Documentation**: [Read the Docs](https://amber-embeddings.readthedocs.io/)
- **Issues**: [GitHub Issues](https://github.com/Saiyam-Sandhir-Jain/AMBER/issues)
- **Email**: saiyam.sandhir.jain@gmail.com
- **Paper**: [arXiv:xxxx.xxxxx](https://arxiv.org/abs/xxxx.xxxxx)
---
**AMBER**: Making word embeddings context-aware, one attention head at a time! ๐ฏ
Raw data
{
"_id": null,
"home_page": "https://github.com/Saiyam-Sandhir-Jain/AMBER",
"name": "amber-embeddings",
"maintainer": "Saiyam Jain",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "saiyam.sandhir.jain@gmail.com",
"keywords": "nlp, word-embeddings, attention, context-aware, word-sense-disambiguation, semantic-search, tfidf, word2vec, machine-learning, natural-language-processing",
"author": "Saiyam Jain",
"author_email": "saiyam.sandhir.jain@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/cf/fe/033ce0d0745b0ecce08c722b9cdbab499d8a5d237e55624b8bc98650183a/amber_embeddings-0.1.0.tar.gz",
"platform": null,
"description": "# AMBER: Attention-based Multi-head Bidirectional Enhanced Representations\n\n[](https://badge.fury.io/py/amber-embeddings)\n[](https://www.python.org/downloads/)\n[](https://opensource.org/licenses/MIT)\n\nAMBER is a hybrid word embedding framework that bridges statistical and neural semantics by combining TF-IDF weighting with multi-head self-attention mechanisms. It enhances static word embeddings (like Word2Vec) with contextual awareness, making them suitable for word sense disambiguation and semantic search tasks.\n\n## \ud83d\ude80 Key Features\n\n- **Context-Aware Embeddings**: Transform static embeddings into context-sensitive representations\n- **Hybrid Architecture**: Combines statistical (TF-IDF) and neural (attention) approaches\n- **Multiple Attention Mechanisms**: Multi-head and positional attention variants\n- **Plug-and-Play Design**: Works with any Word2Vec-compatible embedding model\n- **Comprehensive Evaluation**: Built-in metrics including Context Sensitivity Score (CSS)\n- **Lightweight & Interpretable**: Computationally efficient alternative to transformer models\n- **Easy Integration**: Simple API for seamless integration into existing NLP pipelines\n\n## \ud83d\udce6 Installation\n\n```bash\npip install amber-embeddings\n```\n\nOr install from source:\n\n```bash\ngit clone https://github.com/Saiyam-Sandhir-Jain/AMBER.git\ncd AMBER\npip install -e .\n```\n\n## \ud83d\udd27 Quick Start\n\n```python\nfrom amber import AMBERModel, AMBERComparator\n\n# Your text corpus\ncorpus = [\n \"The bass guitar sounds amazing in concert\",\n \"He caught a large bass fish in the lake\",\n \"The apple fell from the tree\",\n \"Apple company released a new iPhone\"\n]\n\n# Initialize AMBER model (uses Google News Word2Vec by default)\namber_model = AMBERModel(corpus)\n\n# Get context-aware embeddings\nembedding1 = amber_model.get_contextual_embedding(\n word=\"bass\",\n sentence=\"The bass guitar sounds amazing\",\n method=\"multi_head\"\n)\n\nembedding2 = amber_model.get_contextual_embedding(\n word=\"bass\", \n sentence=\"He caught a large bass fish\",\n method=\"multi_head\"\n)\n\n# The embeddings will be different, reflecting different contexts!\nprint(f\"Embedding shapes: {embedding1.shape}, {embedding2.shape}\")\n```\n\n## \ud83d\udcd6 Core Components\n\n### AMBERModel\n\nThe main model class that creates context-aware embeddings:\n\n```python\nfrom amber import AMBERModel\nimport gensim.downloader as api\n\n# Option 1: Use default Word2Vec model\nmodel = AMBERModel(corpus)\n\n# Option 2: Use custom Word2Vec model\ncustom_w2v = api.load('word2vec-google-news-300')\nmodel = AMBERModel(corpus, w2v_model=custom_w2v)\n\n# Option 3: Custom TF-IDF parameters\nmodel = AMBERModel(\n corpus,\n tfidf_params={\n 'max_features': 5000,\n 'min_df': 2,\n 'max_df': 0.8\n }\n)\n```\n\n### Embedding Methods\n\nAMBER provides three embedding methods:\n\n1. **Multi-head Attention** (`multi_head`): Best for disambiguation\n2. **Positional Attention** (`positional`): Considers word proximity\n3. **TF-IDF Only** (`tfidf_only`): Statistical weighting only\n\n```python\n# Multi-head attention (recommended)\nembedding = model.get_contextual_embedding(\n word=\"bank\",\n sentence=\"He went to the bank to deposit money\",\n method=\"multi_head\",\n num_heads=4,\n temperature=0.8\n)\n\n# Positional attention\nembedding = model.get_contextual_embedding(\n word=\"bank\",\n sentence=\"The river bank was muddy\",\n method=\"positional\",\n window_size=5\n)\n```\n\n### Batch Processing\n\nProcess multiple word-context pairs efficiently:\n\n```python\nbatch_data = [\n {\"word\": \"apple\", \"sentence\": \"The apple fell from tree\", \"doc_idx\": 0},\n {\"word\": \"apple\", \"sentence\": \"Apple released new iPhone\", \"doc_idx\": 1},\n {\"word\": \"mouse\", \"sentence\": \"Computer mouse not working\", \"doc_idx\": 2}\n]\n\nembeddings = model.batch_contextual_embeddings(batch_data, method=\"multi_head\")\n```\n\n## \ud83d\udcca Evaluation and Metrics\n\n### Context Sensitivity Score (CSS)\n\nCSS measures how much a word's embedding varies across different contexts:\n\n```python\nfrom amber import ContextSensitivityScore\n\n# Calculate CSS for embeddings of the same word in different contexts\ncss_score = ContextSensitivityScore.calculate([embedding1, embedding2, embedding3])\nprint(f\"Context Sensitivity Score: {css_score:.4f}\")\n\n# Higher CSS = better disambiguation ability\n# Static embeddings have CSS \u2248 0\n# AMBER embeddings have CSS > 0\n```\n\n### Model Comparison\n\nCompare AMBER with baseline models:\n\n```python\nfrom amber import AMBERComparator\n\ncomparator = AMBERComparator(amber_model)\n\n# Define test cases\ntest_cases = [\n {\n 'word': 'bank',\n 'contexts': [\n {'sentence': 'He went to the bank to deposit money', 'type': 'Financial'},\n {'sentence': 'The river bank was muddy after rain', 'type': 'Geographic'}\n ]\n }\n]\n\n# Run comprehensive evaluation\nresults = comparator.evaluate_disambiguation(test_cases)\nreport = comparator.generate_report(test_cases)\nprint(report)\n```\n\n### Visualization\n\nGenerate comparison plots:\n\n```python\n# Run comprehensive comparison\ncomparison_df = comparator.comprehensive_comparison(test_cases)\n\n# Create visualizations\ncomparator.visualize_comparison(comparison_df, save_path=\"amber_comparison.png\")\n```\n\n## \ud83d\udd2c Advanced Usage\n\n### Custom Word Embedding Models\n\n```python\nimport gensim.downloader as api\n\n# Use different pre-trained models\nglove_model = api.load('glove-wiki-gigaword-300')\namber_glove = AMBERModel(corpus, w2v_model=glove_model)\n\n# Or load your own trained model\nfrom gensim.models import KeyedVectors\ncustom_model = KeyedVectors.load_word2vec_format('path/to/your/model.bin', binary=True)\namber_custom = AMBERModel(corpus, w2v_model=custom_model)\n```\n\n### Fine-tuning Parameters\n\n```python\n# Adjust attention parameters\nembedding = model.get_contextual_embedding(\n word=\"python\",\n sentence=\"Python is a programming language\",\n method=\"multi_head\",\n num_heads=8, # More heads for complex contexts\n temperature=0.5 # Lower temperature for sharper attention\n)\n\n# Adjust positional attention\nembedding = model.get_contextual_embedding(\n word=\"spring\",\n sentence=\"Spring brings beautiful flowers\",\n method=\"positional\", \n window_size=7, # Larger context window\n position_decay=1.5 # Stronger distance decay\n)\n```\n\n### Export Embeddings\n\n```python\nfrom amber.utils import export_embeddings\n\n# Export for external use\nwords_and_contexts = [\n {\"word\": \"bank\", \"sentence\": \"Financial bank services\"},\n {\"word\": \"bank\", \"sentence\": \"River bank location\"}\n]\n\n# Export as dictionary\nembeddings_dict = export_embeddings(model, words_and_contexts, output_format=\"dict\")\n\n# Export as numpy array\nembeddings_array = export_embeddings(model, words_and_contexts, output_format=\"array\")\n\n# Export as pandas DataFrame (requires pandas)\nembeddings_df = export_embeddings(model, words_and_contexts, output_format=\"dataframe\")\n```\n\n## \ud83e\uddea Evaluation Results\n\nAMBER demonstrates significant improvements over static embeddings:\n\n| Method | Context Sensitivity Score | Disambiguation Accuracy |\n|--------|--------------------------|-------------------------|\n| Word2Vec (Static) | 0.000 | 61% |\n| TF-IDF Weighted | 0.018 | 73% |\n| AMBER Multi-head | **0.043** | **87%** |\n| AMBER Positional | 0.037 | 84% |\n\n## \ud83d\udd0d Use Cases\n\n- **Word Sense Disambiguation**: Distinguish between different meanings of polysemous words\n- **Semantic Search**: Improve search relevance with context-aware embeddings\n- **Document Classification**: Enhanced feature representations for text classification\n- **Similarity Matching**: More accurate semantic similarity in specific domains\n- **Information Retrieval**: Better query-document matching with contextual understanding\n\n## \ud83d\udee0\ufe0f Technical Details\n\n### Architecture\n\nAMBER enhances static embeddings through three key components:\n\n1. **TF-IDF Scaling**: Weights embeddings by lexical importance\n2. **Multi-head Attention**: Captures different types of contextual relationships\n3. **Residual Fusion**: Preserves semantic stability while adding contextual adaptation\n\n### Mathematical Foundation\n\nThe final contextual embedding is computed as:\n\n```\nF = \u03b3 \u00b7 MultiHeadAttention(TF-IDF(E)) + (1 - \u03b3) \u00b7 E\n```\n\nWhere:\n- `E` is the original Word2Vec embedding\n- `TF-IDF(E)` applies sentence-level TF-IDF weighting\n- `MultiHeadAttention` computes contextual relationships\n- `\u03b3` controls the context-static balance\n\n## \ud83d\udcda Citation\n\nIf you use AMBER in your research, please cite:\n\n```bibtex\n@article{jain2024amber,\n title={not decided},\n author={Jain, Saiyam and Bhowmik, Swaroop and Choudhury, Dipanjan},\n journal={not decided},\n year={}\n}\n```\n\n## \ud83e\udd1d Contributing\n\nWe welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.\n\n1. Fork the repository\n2. Create a feature branch\n3. Make your changes\n4. Add tests\n5. Submit a pull request\n\n## \ud83d\udcc4 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## \ud83d\ude4f Acknowledgments\n\n- Google News Word2Vec model for default embeddings\n- Gensim library for word embedding utilities\n- scikit-learn for TF-IDF implementation\n- The research community for inspiration and feedback\n\n## \ud83d\udcde Support\n\n- **Documentation**: [Read the Docs](https://amber-embeddings.readthedocs.io/)\n- **Issues**: [GitHub Issues](https://github.com/Saiyam-Sandhir-Jain/AMBER/issues)\n- **Email**: saiyam.sandhir.jain@gmail.com\n- **Paper**: [arXiv:xxxx.xxxxx](https://arxiv.org/abs/xxxx.xxxxx)\n\n---\n\n**AMBER**: Making word embeddings context-aware, one attention head at a time! \ud83c\udfaf\n",
"bugtrack_url": null,
"license": null,
"summary": "Attention-based Multi-head Bidirectional Enhanced Representations for contextual word embeddings",
"version": "0.1.0",
"project_urls": {
"Bug Tracker": "https://github.com/Saiyam-Sandhir-Jain/AMBER/issues",
"Documentation": "https://amber-embeddings.readthedocs.io/",
"Homepage": "https://github.com/Saiyam-Sandhir-Jain/AMBER",
"Source Code": "https://github.com/Saiyam-Sandhir-Jain/AMBER"
},
"split_keywords": [
"nlp",
" word-embeddings",
" attention",
" context-aware",
" word-sense-disambiguation",
" semantic-search",
" tfidf",
" word2vec",
" machine-learning",
" natural-language-processing"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "df7ca9961e50c2610e0aa7c03130053e10d108e8f9d6d081333e7d4585f98892",
"md5": "8976fba296c4088be3fa0d3402cb6136",
"sha256": "c694edce6810d095ab13d264cd2de9e1a0e6efb5310940b67e2f6c9de9087c4c"
},
"downloads": -1,
"filename": "amber_embeddings-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8976fba296c4088be3fa0d3402cb6136",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 21529,
"upload_time": "2025-07-29T10:39:28",
"upload_time_iso_8601": "2025-07-29T10:39:28.253615Z",
"url": "https://files.pythonhosted.org/packages/df/7c/a9961e50c2610e0aa7c03130053e10d108e8f9d6d081333e7d4585f98892/amber_embeddings-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "cffe033ce0d0745b0ecce08c722b9cdbab499d8a5d237e55624b8bc98650183a",
"md5": "f2029108430836b9e4374d2788b31605",
"sha256": "ca9e4c634b3b1de7b30218942a56d81f012ca9cd4d68f4d23bef13600404a507"
},
"downloads": -1,
"filename": "amber_embeddings-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "f2029108430836b9e4374d2788b31605",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 44793,
"upload_time": "2025-07-29T10:39:30",
"upload_time_iso_8601": "2025-07-29T10:39:30.313364Z",
"url": "https://files.pythonhosted.org/packages/cf/fe/033ce0d0745b0ecce08c722b9cdbab499d8a5d237e55624b8bc98650183a/amber_embeddings-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-29 10:39:30",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Saiyam-Sandhir-Jain",
"github_project": "AMBER",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "numpy",
"specs": [
[
">=",
"1.19.0"
]
]
},
{
"name": "scipy",
"specs": [
[
">=",
"1.5.0"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
">=",
"0.24.0"
]
]
},
{
"name": "gensim",
"specs": [
[
">=",
"4.0.0"
]
]
},
{
"name": "pandas",
"specs": [
[
">=",
"1.2.0"
]
]
},
{
"name": "matplotlib",
"specs": [
[
">=",
"3.3.0"
]
]
},
{
"name": "seaborn",
"specs": [
[
">=",
"0.11.0"
]
]
}
],
"lcname": "amber-embeddings"
}