layered-bias-probe


Name: layered-bias-probe
Version: 1.0.1
Home page: https://github.com/DevDaring/layered-bias-probe
Summary: A comprehensive toolkit for layer-wise bias analysis in language models with fine-tuning support
Upload time: 2025-08-29 15:31:01
Author: DebK (koushikdeb88@gmail.com)
Requires Python: >=3.8
Keywords: bias, nlp, language-models, weat, fairness, transformers, machine-learning, layer-analysis, fine-tuning
Requirements: torch>=2.0.0, transformers>=4.32.0, datasets>=2.0.0, pandas>=1.5.0, numpy>=1.21.0, scikit-learn>=1.1.0, tqdm>=4.64.0, accelerate>=0.20.0, bitsandbytes>=0.40.0, huggingface_hub>=0.16.0, plotly>=5.0.0, matplotlib>=3.5.0, seaborn>=0.11.0, pyyaml>=6.0, click>=8.0.0, kaleido>=0.1.0
# Layered Bias Probe

A comprehensive Python package for performing layer-wise bias analysis in language models, with support for fine-tuning and bias evolution tracking.

## Features

- **Layer-wise Bias Analysis**: Probe bias at each transformer layer using WEAT (Word Embedding Association Test) methodology
- **Multiple WEAT Categories**: Support for all WEAT categories including original, new human biases, and India-specific biases
- **Fine-tuning Integration**: Track bias evolution during model fine-tuning
- **Multi-language Support**: Analyze bias across different languages (English, Hindi, Bengali, etc.)
- **Flexible Model Support**: Works with 9+ popular language models
- **Export Results**: Save analysis results in CSV format with proper naming conventions

## Supported Models

- apple/OpenELM-270M
- facebook/MobileLLM-125M
- cerebras/Cerebras-GPT-111M 
- EleutherAI/pythia-70m
- meta-llama/Llama-3.2-1B
- Qwen/Qwen2.5-1.5B
- google/gemma-2-2b
- ibm-granite/granite-3.3-2b-base
- HuggingFaceTB/SmolLM2-135M

## Installation

```bash
pip install layered-bias-probe
```

Or install from source:

```bash
git clone https://github.com/DevDaring/layered-bias-probe.git
cd layered-bias-probe
pip install -e .
```

## Quick Start

### Basic Bias Analysis

```python
from layered_bias_probe import BiasProbe

# Initialize the probe
probe = BiasProbe(
    model_name="EleutherAI/pythia-70m",
    cache_dir="./cache"
)

# Run bias analysis
results = probe.analyze_bias(
    languages=["en", "hi"],
    weat_categories=["WEAT1", "WEAT2", "WEAT6"],
    output_dir="./results"
)

print(f"Analysis complete! Results saved to: {results['output_path']}")
```

### Fine-tuning with Bias Tracking

```python
from layered_bias_probe import FineTuner

# Initialize fine-tuner with bias tracking
tuner = FineTuner(
    model_name="EleutherAI/pythia-70m",
    dataset_name="iamshnoo/alpaca-cleaned-hindi",
    track_bias=True,
    bias_languages=["en", "hi"],
    weat_categories=["WEAT1", "WEAT2", "WEAT6"]
)

# Fine-tune model and track bias evolution
results = tuner.train(
    num_epochs=5,
    batch_size=4,
    learning_rate=2e-5,
    output_dir="./fine_tuned_model"
)
```

### Command Line Interface

```bash
# Basic bias analysis
layered-bias-probe analyze --model "EleutherAI/pythia-70m" --languages en hi --output ./results

# Fine-tuning with bias tracking
layered-bias-probe finetune --model "EleutherAI/pythia-70m" --dataset "iamshnoo/alpaca-cleaned-hindi" --track-bias --output ./results

# List available WEAT categories
layered-bias-probe list-weat

# Get model info
layered-bias-probe model-info --model "EleutherAI/pythia-70m"
```

## WEAT Categories

The package supports multiple WEAT (Word Embedding Association Test) categories:

### Original WEAT Tests
- **WEAT1**: Flowers vs. Insects with Pleasant vs. Unpleasant
- **WEAT2**: Instruments vs. Weapons with Pleasant vs. Unpleasant  
- **WEAT6**: Career vs. Family with Male vs. Female Names
- **WEAT7**: Math vs. Arts with Male vs. Female Terms
- **WEAT8**: Science vs. Arts with Male vs. Female Terms
- **WEAT9**: Mental vs. Physical Disease with Temporary vs. Permanent

### New Human Biases (WEAT11-15)
- **WEAT11-15**: Various social and cultural bias categories

### India-Specific Biases (WEAT16-26)  
- **WEAT16-26**: Caste, religion, and regional bias categories specific to the Indian context
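
Every category reduces to the same per-layer statistic: the WEAT effect size between two target word sets (X, Y) and two attribute word sets (A, B). Below is a minimal illustrative sketch of that computation on one layer's hidden states; the function names and the mean-pooling choice are assumptions for illustration, not the package's internal API.

```python
# Illustrative sketch (not the package's internal API): compute a WEAT
# effect size from one transformer layer's hidden states.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

def embed(words, tokenizer, model, layer_idx):
    """Mean-pooled hidden state of each word at a given layer."""
    vecs = []
    for w in words:
        inputs = tokenizer(w, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        # hidden_states[0] is the embedding layer; layer_idx picks a block
        vecs.append(out.hidden_states[layer_idx][0].mean(dim=0).numpy())
    return vecs

def association(w, A, B):
    # s(w, A, B): mean cosine similarity to attribute set A minus set B
    cos = lambda u, v: np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # d = [mean_x s(x,A,B) - mean_y s(y,A,B)] / std_{w in X∪Y} s(w,A,B)
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)
```

Repeating this for every `layer_idx` yields the per-layer score column shown in the output format below.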

## Configuration

Create a `config.yaml` file to customize default settings:

```yaml
# Default model settings
model:
  cache_dir: "./cache"
  device_map: "auto"
  torch_dtype: "float16"
  quantization: true

# Bias analysis settings
bias_analysis:
  default_languages: ["en"]
  default_weat_categories: ["WEAT1", "WEAT2", "WEAT6"]
  batch_size: 1
  
# Fine-tuning settings
fine_tuning:
  default_epochs: 5
  default_batch_size: 4
  default_learning_rate: 2e-5
  save_strategy: "epoch"
  
# Output settings
output:
  results_format: "csv"
  include_timestamp: true
  compression: false
```
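
How the package locates and applies this file is internal; as a rough sketch, under the assumption that it is plain YAML read from the working directory, the values can also be loaded manually with PyYAML (a declared dependency) and passed to the documented entry points:

```python
# Sketch only: read config.yaml with PyYAML and reuse its values with the
# documented BiasProbe constructor. The file path is an assumption, not a
# documented lookup rule.
import yaml
from layered_bias_probe import BiasProbe

with open("config.yaml") as f:
    config = yaml.safe_load(f)

probe = BiasProbe(
    model_name="EleutherAI/pythia-70m",
    cache_dir=config["model"]["cache_dir"],
)
```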

## Advanced Usage

### Custom WEAT Categories

```python
from layered_bias_probe import BiasProbe, WEATCategory

# Define custom WEAT category
custom_weat = WEATCategory(
    name="CUSTOM1",
    target1=["word1", "word2"],
    target2=["word3", "word4"], 
    attribute1=["attr1", "attr2"],
    attribute2=["attr3", "attr4"],
    language="en"
)

probe = BiasProbe("EleutherAI/pythia-70m")
results = probe.analyze_custom_bias(custom_weat, output_dir="./results")
```

### Batch Processing Multiple Models

```python
from layered_bias_probe import BatchProcessor

models = [
    "EleutherAI/pythia-70m",
    "facebook/MobileLLM-125M", 
    "cerebras/Cerebras-GPT-111M"
]

processor = BatchProcessor(models)
results = processor.run_bias_analysis(
    languages=["en", "hi"],
    weat_categories=["WEAT1", "WEAT2", "WEAT6"],
    output_dir="./batch_results"
)
```

### Results Analysis and Visualization

```python
from layered_bias_probe import ResultsAnalyzer

# Load and analyze results
analyzer = ResultsAnalyzer("./results")

# Generate bias evolution plots
analyzer.plot_bias_evolution(
    model_name="EleutherAI/pythia-70m",
    weat_category="WEAT1",
    language="en"
)

# Create heatmaps
analyzer.create_bias_heatmap(
    model_name="EleutherAI/pythia-70m",
    languages=["en", "hi"]
)

# Export summary statistics
summary = analyzer.generate_summary_report()
```

## Output Format

Results are saved in CSV format with the following structure:

```csv
model_id,language,weat_category_id,layer_idx,weat_score,comments,timestamp
EleutherAI/pythia-70m,en,WEAT1,0,-0.234,Before_finetuning,2024-01-01T12:00:00
EleutherAI/pythia-70m,en,WEAT1,1,-0.187,Before_finetuning,2024-01-01T12:00:00
...
```
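
Since each row is one (model, language, category, layer) score, the CSV reshapes naturally for inspection. A short sketch with pandas follows; the file name below is hypothetical and depends on the package's naming convention:

```python
# Load the exported scores and pivot to a layers-by-category view.
# "results/weat_scores.csv" is a hypothetical path for illustration.
import pandas as pd

df = pd.read_csv("results/weat_scores.csv")
pivot = df.pivot_table(
    index="layer_idx",
    columns="weat_category_id",
    values="weat_score",
)
print(pivot.round(3))
```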

## Contributing

We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Citation

If you use this package in your research, please cite:

```bibtex
@software{layered_bias_probe,
  title={Layered Bias Probe: A Toolkit for Layer-wise Bias Analysis in Language Models},
  author={Koushik Deb},
  year={2025},
  url={https://github.com/DevDaring/layered-bias-probe}
}
```

## Acknowledgments

This package builds upon the WEAT methodology and WEATHub dataset. Special thanks to the research community for their contributions to bias detection in NLP.

            
