# Layered Bias Probe
A comprehensive Python package for performing layer-wise bias analysis in language models, with support for fine-tuning and bias evolution tracking.
## Features
- **Layer-wise Bias Analysis**: Probe bias at every transformer layer using the WEAT (Word Embedding Association Test) methodology
- **Multiple WEAT Categories**: Supports all WEAT categories: the original tests, newer human-bias tests, and India-specific bias tests
- **Fine-tuning Integration**: Track how bias evolves during model fine-tuning
- **Multi-language Support**: Analyze bias across languages (English, Hindi, Bengali, etc.)
- **Flexible Model Support**: Works out of the box with the nine language models listed below
- **Export Results**: Save analysis results as CSV files with consistent naming conventions
## Supported Models
- apple/OpenELM-270M
- facebook/MobileLLM-125M
- cerebras/Cerebras-GPT-111M
- EleutherAI/pythia-70m
- meta-llama/Llama-3.2-1B
- Qwen/Qwen2.5-1.5B
- google/gemma-2-2b
- ibm-granite/granite-3.3-2b-base
- HuggingFaceTB/SmolLM2-135M
## Installation

Requires Python 3.8 or newer.
```bash
pip install layered-bias-probe
```
Or install from source:
```bash
git clone https://github.com/DevDaring/layered-bias-probe.git
cd layered-bias-probe
pip install -e .
```
## Quick Start
### Basic Bias Analysis
```python
from layered_bias_probe import BiasProbe

# Initialize the probe
probe = BiasProbe(
    model_name="EleutherAI/pythia-70m",
    cache_dir="./cache"
)

# Run bias analysis
results = probe.analyze_bias(
    languages=["en", "hi"],
    weat_categories=["WEAT1", "WEAT2", "WEAT6"],
    output_dir="./results"
)

print(f"Analysis complete! Results saved to: {results['output_path']}")
```
### Fine-tuning with Bias Tracking
```python
from layered_bias_probe import FineTuner

# Initialize fine-tuner with bias tracking
tuner = FineTuner(
    model_name="EleutherAI/pythia-70m",
    dataset_name="iamshnoo/alpaca-cleaned-hindi",
    track_bias=True,
    bias_languages=["en", "hi"],
    weat_categories=["WEAT1", "WEAT2", "WEAT6"]
)

# Fine-tune model and track bias evolution
results = tuner.train(
    num_epochs=5,
    batch_size=4,
    learning_rate=2e-5,
    output_dir="./fine_tuned_model"
)
```
### Command Line Interface
```bash
# Basic bias analysis
layered-bias-probe analyze --model "EleutherAI/pythia-70m" --languages en hi --output ./results

# Fine-tuning with bias tracking
layered-bias-probe finetune --model "EleutherAI/pythia-70m" --dataset "iamshnoo/alpaca-cleaned-hindi" --track-bias --output ./results

# List available WEAT categories
layered-bias-probe list-weat

# Get model info
layered-bias-probe model-info --model "EleutherAI/pythia-70m"
```
## WEAT Categories
The package supports multiple WEAT (Word Embedding Association Test) categories:
### Original WEAT Tests
- **WEAT1**: Flowers vs. Insects with Pleasant vs. Unpleasant
- **WEAT2**: Instruments vs. Weapons with Pleasant vs. Unpleasant
- **WEAT6**: Career vs. Family with Male vs. Female Names
- **WEAT7**: Math vs. Arts with Male vs. Female Terms
- **WEAT8**: Science vs. Arts with Male vs. Female Terms
- **WEAT9**: Mental vs. Physical Disease with Temporary vs. Permanent
### New Human Biases (WEAT11-15)
- **WEAT11-15**: Various social and cultural bias categories
### India-Specific Biases (WEAT16-26)
- **WEAT16-26**: Caste, religion, and regional bias categories specific to the Indian context
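
For reference, layer-wise WEAT scores follow the standard effect-size formulation of Caliskan et al. (2017): each target word's association is its mean cosine similarity to one attribute set minus the other. The NumPy sketch below is an illustrative reimplementation of that formula on precomputed embeddings, not the package's internal code:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # s(w, A, B): mean similarity to attribute set A minus attribute set B
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Effect size: difference in mean association between the two target sets,
    # normalized by the std. dev. of associations over all target words
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y)
```

Applied to the hidden states of each layer in turn, this yields the one-score-per-layer profile recorded in the CSV output described below.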
## Configuration
Create a `config.yaml` file to customize default settings:
```yaml
# Default model settings
model:
  cache_dir: "./cache"
  device_map: "auto"
  torch_dtype: "float16"
  quantization: true

# Bias analysis settings
bias_analysis:
  default_languages: ["en"]
  default_weat_categories: ["WEAT1", "WEAT2", "WEAT6"]
  batch_size: 1

# Fine-tuning settings
fine_tuning:
  default_epochs: 5
  default_batch_size: 4
  default_learning_rate: 2e-5
  save_strategy: "epoch"

# Output settings
output:
  results_format: "csv"
  include_timestamp: true
  compression: false
```
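
The README does not specify how `config.yaml` is consumed internally, so as a minimal sketch you can load it with PyYAML and pass the values to the API yourself. Only `BiasProbe`'s keyword arguments from the Quick Start are taken from the package; the file path and config keys simply mirror the example above:

```python
import yaml
from layered_bias_probe import BiasProbe

# Load the user configuration (path is illustrative)
with open("config.yaml") as f:
    config = yaml.safe_load(f)

probe = BiasProbe(
    model_name="EleutherAI/pythia-70m",
    cache_dir=config["model"]["cache_dir"],
)

# Reuse the configured defaults for an analysis run
results = probe.analyze_bias(
    languages=config["bias_analysis"]["default_languages"],
    weat_categories=config["bias_analysis"]["default_weat_categories"],
    output_dir="./results",
)
```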
## Advanced Usage
### Custom WEAT Categories
```python
from layered_bias_probe import BiasProbe, WEATCategory

# Define custom WEAT category
custom_weat = WEATCategory(
    name="CUSTOM1",
    target1=["word1", "word2"],
    target2=["word3", "word4"],
    attribute1=["attr1", "attr2"],
    attribute2=["attr3", "attr4"],
    language="en"
)

probe = BiasProbe("EleutherAI/pythia-70m")
results = probe.analyze_custom_bias(custom_weat, output_dir="./results")
```
### Batch Processing Multiple Models
```python
from layered_bias_probe import BatchProcessor

models = [
    "EleutherAI/pythia-70m",
    "facebook/MobileLLM-125M",
    "cerebras/Cerebras-GPT-111M"
]

processor = BatchProcessor(models)
results = processor.run_bias_analysis(
    languages=["en", "hi"],
    weat_categories=["WEAT1", "WEAT2", "WEAT6"],
    output_dir="./batch_results"
)
```
### Results Analysis and Visualization
```python
from layered_bias_probe import ResultsAnalyzer

# Load and analyze results
analyzer = ResultsAnalyzer("./results")

# Generate bias evolution plots
analyzer.plot_bias_evolution(
    model_name="EleutherAI/pythia-70m",
    weat_category="WEAT1",
    language="en"
)

# Create heatmaps
analyzer.create_bias_heatmap(
    model_name="EleutherAI/pythia-70m",
    languages=["en", "hi"]
)

# Export summary statistics
summary = analyzer.generate_summary_report()
```
## Output Format
Results are saved in CSV format with the following structure:
```csv
model_id,language,weat_category_id,layer_idx,weat_score,comments,timestamp
EleutherAI/pythia-70m,en,WEAT1,0,-0.234,Before_finetuning,2024-01-01T12:00:00
EleutherAI/pythia-70m,en,WEAT1,1,-0.187,Before_finetuning,2024-01-01T12:00:00
...
```
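
Because each row holds a single (model, language, category, layer) score, a model's layer-wise bias profile can be recovered with a pandas pivot. A minimal sketch, assuming a results file with the columns shown above (the filename is illustrative):

```python
import pandas as pd

df = pd.read_csv("results/pythia-70m_weat_scores.csv")  # hypothetical filename

# One row per layer, one column per WEAT category, for a given model and language
profile = (
    df[(df["model_id"] == "EleutherAI/pythia-70m") & (df["language"] == "en")]
    .pivot_table(index="layer_idx", columns="weat_category_id", values="weat_score")
)
print(profile)
```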
## Contributing
We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Citation
If you use this package in your research, please cite:
```bibtex
@software{layered_bias_probe,
  title={Layered Bias Probe: A Toolkit for Layer-wise Bias Analysis in Language Models},
  author={Koushik Deb},
  year={2025},
  url={https://github.com/DevDaring/layered-bias-probe}
}
```
## Acknowledgments
This package builds upon the WEAT methodology and WEATHub dataset. Special thanks to the research community for their contributions to bias detection in NLP.