genderbench


Namegenderbench JSON
Version 0.5.1 PyPI version JSON
download
home_pagehttps://github.com/matus-pikuliak/genderbench
SummaryEvaluation suite for gender biases in LLMs.
upload_time2025-03-18 22:43:44
maintainerNone
docs_urlNone
authorMatúš Pikuliak
requires_python>=3.12
licenseCopyright 2024 Matúš Pikuliak Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. This license applies only to the code in this repository. The contents of the `/src/genderbench/resources` folder are not covered by this license and are subject to fair use as outlined in the appropriate `FAIR_USE.md` files or may include their own `LICENSE` files, which specify different terms. In such cases, the terms in those files take precedence for the corresponding content.
keywords gender-bias fairness-ai llms llms-benchmarking
VCS
bugtrack_url
requirements aiohttp aiosignal alabaster annotated-types anthropic anyio asttokens attrs babel build certifi cffi charset-normalizer click comm contourpy cryptography cycler datasets debugpy decorator dill distro docutils executing filelock fonttools frozenlist fsspec gitdb GitPython h11 httpcore httpx huggingface-hub id idna imagesize ipykernel ipython isort jaraco.classes jaraco.context jaraco.functools jedi jeepney Jinja2 jiter joblib jupyter_client jupyter_core keyring kiwisolver markdown-it-py MarkupSafe matplotlib matplotlib-inline mdit-py-plugins mdurl mistune more-itertools multidict multiprocess myst-parser nest-asyncio nh3 nltk numpy openai packaging pandas parso pexpect pillow platformdirs prompt_toolkit psutil ptyprocess pure-eval pyarrow pyarrow-hotfix pycparser pydantic pydantic_core Pygments pypandoc pyparsing pyproject_hooks python-dateutil python-dotenv pytz PyYAML pyzmq readme_renderer regex requests requests-toolbelt rfc3986 rich scikit-learn scipy SecretStorage setuptools six smmap sniffio snowballstemmer Sphinx sphinx-rtd-theme sphinx_mdinclude sphinxcontrib-applehelp sphinxcontrib-devhelp sphinxcontrib-htmlhelp sphinxcontrib-jquery sphinxcontrib-jsmath sphinxcontrib-qthelp sphinxcontrib-serializinghtml stack-data threadpoolctl tornado tqdm traitlets twine typing_extensions tzdata urllib3 wcwidth wheel xxhash yarl
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # GenderBench - Evaluation suite for gender biases in LLMs

`GenderBench` is an evaluation suite designed to measure and benchmark gender
biases in large language models. It uses a variety of tests, called **probes**,
each targeting a specific type of unfair behavior. Our goal is to cover as many
types of unfair behavior as possible.

This project has two purposes:

1. **To publish the results we measured for various LLMs.** Our goal is to
inform the public about the state of the field and raise awareness about the
gender-related issues that LLMs have.

2. **To allow researchers to run the benchmark on their own LLMs.** Our goal is
to make the research in the area easier and more reproducible. `GenderBench` can
serve as a base to pursue various fairness-related research questions.

The probes we provide here are often inspired by existing published scientific
methodologies. Our philosophy when creating the probes is to prefer quality over
quantity, i.e., we carefully vet the data and evaluation protocols to ensure
high reliability.

## Results

`GenderBench` quantifies the intensity of harmful behavior in text generators.
To categorize the severity of harmful behaviors, we use a four-tier
_mark_ system:

- **A - Healthy.** No detectable signs of harmful behavior.
- **B - Cautionary.** Low-intensity harmful behavior, often subtle enough to go
unnoticed by most users.
- **C - Critical.** Noticeable harmful behavior that may affect user experience.
- **D - Catastrophic.** Harmful behavior is common and present in most
interactions.

To calculate these marks, we use the so-called `Probes`. Each probe measures one
or more harmful behaviors. A probe consists of a set of prompts that are fed
into the LLM. The responses are then evaluated with various techniques, and
based on this evaluation, the probe quantifies how the LLM behaves.

For example, one of our probes -- `JobsLumProbe` -- asks the model to generate
novel characters with certain occupations. We analyze the genders of the
generated characters by observing the pronouns the LLM decided to use. Then we
award the model with two marks, (1) based on how gender-balanced the generation
is and (2) based on how strongly the LLM associates occupations with their
stereotypical genders.

### Report
<a href="https://genderbench.readthedocs.io/latest/_static/reports/genderbench_report_1_0.html">↗ GenderBench Report 1.0 available here.</a>

This is the current version of the **GenderBench Report**, summarizing the
results for a selected set of 12 LLMs with the most recent version of
`GenderBench`.

## Documentation

<a href="https://genderbench.readthedocs.io/">↗ Documentation.</a>

This is the documentation for developers that can help you run the code and
implement additional probes.

## Licensing & Fair Use

Read our full [`LICENSE`](https://github.com/matus-pikuliak/genderbench/blob/main/LICENSE) before using or sharing this repository.

- The **code** in this repository is licensed under the MIT License.
- Some **resources** in the `src/genderbench/resources` folder are used under
**fair use** for research and educational purposes. See the appropriate
`FAIR_USE.md` files for details.
-  Some **resources** in the `src/genderbench/resources` folder are licensed
under various additional licenses. See the appropriate `LICENSE` files.

**Do not use or redistribute** the `resources` folder unless you verify that you
comply with applicable laws.

## Usage

This section is for the researchers trying to run `GenderBench` on their own.
`GenderBench` can be used to evaluate an arbitrary text generator, i.e.,
anything that is able to call `generate(texts: list[str]) -> list[str]` method.

### Installation

1. Install the package from this repository:

```
pip install genderbench
```

2. Optionally, set the environment variables. Default values are shown below:

```
# Directory where the logs are stored. Logs include probing results.
export GENDER_BENCH_LOG_DIR=logs/
```

### Probes

Each `Probe` is designed to measure one or more types of harmful behavior. Here
is a snippet of how to use one of the existing probes to measure the intensity
of harmful behavior. `GestCreativeProbe` analyzes the genders of novel
characters written by the model and whether they agree with gender-stereotypical
reasoning (i.e., a character that is supposed to be emotional is a woman).

```python
>>> from genderbench.generators.random import RandomGenerator
>>> from genderbench.probes import GestCreativeProbe

>>> generator = RandomGenerator(["He was a punk", "She did ballet"])
>>> probe = GestCreativeProbe(num_repetitions=10)
>>> marks, metrics = probe.run(generator)
>>> print(marks)

{
    'stereotype_rate': {
        'mark': 0,
        'value': (-0.000424083125234594, 0.021459989952670712),
        'description': 'The extent to which protagonists generated by the model based on specific traits reflect stereotypical norms.',
        'harm_types': ['Stereotyping'],
        'mark_ranges': {0: [(-1, 0.05)], 1: [(0.05, 0.15)], 2: [(0.15, 0.5)], 3: [(0.5, 1)]}
    },
    'masculine_rate': {
        'mark': 0,
        'value': (0.49493275319102825, 0.5053406479328618),
        'description': 'The extent to which protagonists generated based on various traits are gender-balanced.',
        'harm_types': ['Representational Harm'],
        'mark_ranges': {
            0: [(0.45, 0.55)],
            1: [(0.4, 0.45), (0.55, 0.6)],
            2: [(0.2, 0.4), (0.6, 0.8)],
            3: [(0, 0.2), (0.8, 1)]
        }
    }
}
```

This probe returns two marks, `stereotype_rate` and `masculine_rate`. The `mark`
field has the final mark value (0-3 correspond to A-D) as well as additional
information about the assessment.

Each probe also returns _metrics_. Metrics are various statistics calculated
from evaluating the generated texts. Some of the metrics are interpreted as
marks, others can be used for deeper analysis of the behavior.

```python
>>> print(metrics)

{
    'masculine_rate_1': (0.48048006423314693, 0.5193858953694468),
    'masculine_rate_2': (0.48399659154678404, 0.5254386064452468),
    'masculine_rate_3': (0.47090795152805015, 0.510947638616683),
    'masculine_rate_4': (0.48839445645726937, 0.5296722203113409),
    'masculine_rate_5': (0.4910796025082781, 0.5380797154294977),
    'masculine_rate_6': (0.46205626682788525, 0.5045443731017809),
    'masculine_rate_7': (0.47433983921265566, 0.5131845674198158),
    'masculine_rate_8': (0.4725341930823318, 0.5124063381595765),
    'masculine_rate_9': (0.4988185260308012, 0.5380271387495005),
    'masculine_rate_10': (0.48079375199930596, 0.5259076517813326),
    'masculine_rate_11': (0.4772442605197886, 0.5202096109660775),
    'masculine_rate_12': (0.4648792975582989, 0.5067107903737995),
    'masculine_rate_13': (0.48985062489334896, 0.5271224515622255),
    'masculine_rate_14': (0.49629854649442573, 0.5412001544322199),
    'masculine_rate_15': (0.4874085730954739, 0.5289167071824322),
    'masculine_rate_16': (0.4759040068439664, 0.5193538086025689),
    'masculine_rate': (0.4964871874310115, 0.5070187014024483),
    'stereotype_rate': (-0.00727218880142508, 0.01425014866363799),
    'undetected_rate_items': (0.0, 0.0),
    'undetected_rate_attempts': (0.0, 0.0)
}
```

In this case, apart from the two metrics used to calculate marks (`stereotype_rate`
and `masculine_rate`), we also have 18 additional metrics.

### Harnesses

To run a comprehensive evaluation, probes are organized into predefined sets
called `harnesses`. Each harness returns the marks and metrics from the probes
it entails. Harnesses are used to generate data for our reports. Currently,
there is only one harness in the repository, `DefaultHarness`:

```python
from genderbench.harnesses.default import DefaultHarness

harness = DefaultHarness()
marks, metrics = harness.run(generator)
```

### Report generation

The logs generated by harnesses can be used to generate a comprehensive and
sharable HTML report that summarizes the findings.

```python
from genderbench.report_generation.report import create_report


logfiles = [
    "logs/meta_llama_3_1_8b_instruct/defaultharness_e3b73c08-f7f3-4a45-8429-a8089cb6f042.jsonl",
    "logs/mistral_7b_instruct_v0_3/defaultharness_2b0a0385-47ed-48c2-967e-0e26b0b7add4.jsonl",
    "logs/meta_llama_3_1_70b_instruct/defaultharness_a4047219-d16c-407d-9e5d-4a3e5e47a17a.jsonl",    
]
model_names = [
    "meta_llama_3_1_8b_instruct",
    "mistral_7b_instruct_v0_3",
    "meta_llama_3_1_70b_instruct",
]
create_report(
    output_file_path="reports/new_report.html",
    log_files=logfiles,
    model_names=model_names,
)
```

## Probes

This sections briefly describes the probes that are currently present in
`GenderBench`. Each probe has a more detailed documentation that contains all
the necessary information about its data and evaluation methodology.

The probes are organized into sections based on different use cases, which
include:

- **Decision Making** - Evaluates how fair and unbiased the model is in making
decisions.
- **Creative Writing** - Examines how the model handles stereotypes and
representation in creative outputs.
- **Manifested Opinions** - Assesses whether the model's expressed opinions show
bias when asked.
- **Affective Computing** - Looks at whether the model makes assumptions about
users' emotional states based on their gender.

### Decision Making

Evaluates how fair and unbiased the model is in making decisions. We simulate
scenarios where the model is used in fully automated systems or as a
decision-making assistant for users. The focus is on important real-world
situations, such as hiring.

- `DiscriminationTamkin` - The model is asked to make a yes-or-no decision about 
various questions (e.g., should a person get a loan, should a person get a job 
offer). The gender of the person is specified. We study whether the model gives 
better outcomes to any genders. [Documentation](https://genderbench.readthedocs.io/latest/probes/discrimination_tamkin.html).

- `HiringAn` - The model is asked about a candidate for a job. The candidate is 
described using a gender-coded name. We study how the candidate's name 
correlates with the stereotypical gender associated with the job. [Documentation](https://genderbench.readthedocs.io/latest/probes/hiring_an.html).

- `HiringBloomberg` - The model is asked to select candidates from a list of
CVs. The CVs contain gender-coded names. We study which genders tend to win for
different occupations. [Documentation](https://genderbench.readthedocs.io/latest/probes/hiring_bloomberg.html).

- `DiversityMedQa` - We ask the model the same medical question for either a
male or a female patient. We study whether the gender of the patient affects
the accuracy of the model. [Documentation](https://genderbench.readthedocs.io/latest/probes/diversitymedqa.html).


### Creative Writing

Examines how the model handles stereotypes and representation in creative
outputs. We simulate scenarios where authors ask the model to help them with
creative writing. Writing is a common LLM application.

- `GestCreative` - We ask the model to generate a character profile for a novel 
based on a motto. The mottos are associated with various gender stereotypes. We 
study what gender the model uses for the character. [Documentation](https://genderbench.readthedocs.io/latest/probes/gest_creative.html).

- `Inventories` - We ask the model to generate a character profile based on a 
simple description. The descriptions come from gender inventories and are 
associated with various gender stereotypes. We study what gender does the model 
use for the character. [Documentation](https://genderbench.readthedocs.io/latest/probes/inventories.html).

- `JobsLum` - We ask the model to generate a character profile based on an 
occupation. We compare the gender of the generated characters with the 
stereotypical gender of the occupations. [Documentation](https://genderbench.readthedocs.io/latest/probes/jobs_lum.html).

### Manifested Opinions

Assesses whether the model's expressed opinions show bias when asked. We coverly
or overtly inquire about how the model perceives genders. While this may not
reflect typical use cases, it provides insight into the underlying ideologies
embedded in the model.

- `BBQ` - The BBQ dataset contains tricky multiple-choice questions that test 
whether the model uses gender-stereotypical reasoning. [Documentation](https://genderbench.readthedocs.io/latest/probes/bbq.html).

- `BusinessVocabulary` - We ask the model to generate various business communication documents (reference letter, motivational letter, and employee review). We study how gender-stereotypical the vocabulary used in those documents is. [Documentation](https://genderbench.readthedocs.io/latest/probes/business_vocabulary.html).

- `Direct` - We ask the model whether it agrees with various stereotypical 
statements about genders. [Documentation](https://genderbench.readthedocs.io/latest/probes/direct.html).

- `Gest` - We ask the model questions that can be answered using either logical 
or stereotypical reasoning. We observe how often stereotypical reasoning is 
used. [Documentation](https://genderbench.readthedocs.io/latest/probes/gest.html).

- `RelationshipLevy` - We ask the model about everyday relationship conflicts
between a married couple. We study how often the model thinks that either men
or women are in the right. [Documentation](https://genderbench.readthedocs.io/latest/probes/relationship_levy.html).

### Affective Computing

Looks at whether the model makes assumptions about users' emotional states based
on their gender. When the model is aware of a user's gender, it may treat them
differently by assuming certain psychological traits or states. This can result
in unintended unequal treatment.

- `Dreaddit` - We ask the model to predict how stressed the author of a text is. 
We study whether the model exhibits different perceptions of stress based on the 
gender of the author. [Documentation](https://genderbench.readthedocs.io/latest/probes/dreaddit.html).

- `Isear` - We ask the model to role-play as a person of a specific gender and 
inquire about its emotional response to various events. We study whether the 
model exhibits different perceptions of emotionality based on gender. 
[Documentation](https://genderbench.readthedocs.io/latest/probes/isear.html).

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/matus-pikuliak/genderbench",
    "name": "genderbench",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.12",
    "maintainer_email": null,
    "keywords": "gender-bias, fairness-ai, llms, llms-benchmarking",
    "author": "Mat\u00fa\u0161 Pikuliak",
    "author_email": "matus.pikuliak@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/c2/d4/f16ef83ed9edc660076c308006a2a4b7f436c4676495c1c46e4750f38a7f/genderbench-0.5.1.tar.gz",
    "platform": null,
    "description": "# GenderBench - Evaluation suite for gender biases in LLMs\n\n`GenderBench` is an evaluation suite designed to measure and benchmark gender\nbiases in large language models. It uses a variety of tests, called **probes**,\neach targeting a specific type of unfair behavior. Our goal is to cover as many\ntypes of unfair behavior as possible.\n\nThis project has two purposes:\n\n1. **To publish the results we measured for various LLMs.** Our goal is to\ninform the public about the state of the field and raise awareness about the\ngender-related issues that LLMs have.\n\n2. **To allow researchers to run the benchmark on their own LLMs.** Our goal is\nto make the research in the area easier and more reproducible. `GenderBench` can\nserve as a base to pursue various fairness-related research questions.\n\nThe probes we provide here are often inspired by existing published scientific\nmethodologies. Our philosophy when creating the probes is to prefer quality over\nquantity, i.e., we carefully vet the data and evaluation protocols to ensure\nhigh reliability.\n\n## Results\n\n`GenderBench` quantifies the intensity of harmful behavior in text generators.\nTo categorize the severity of harmful behaviors, we use a four-tier\n_mark_ system:\n\n- **A - Healthy.** No detectable signs of harmful behavior.\n- **B - Cautionary.** Low-intensity harmful behavior, often subtle enough to go\nunnoticed by most users.\n- **C - Critical.** Noticeable harmful behavior that may affect user experience.\n- **D - Catastrophic.** Harmful behavior is common and present in most\ninteractions.\n\nTo calculate these marks, we use the so-called `Probes`. Each probe measures one\nor more harmful behaviors. A probe consists of a set of prompts that are fed\ninto the LLM. The responses are then evaluated with various techniques, and\nbased on this evaluation, the probe quantifies how the LLM behaves.\n\nFor example, one of our probes -- `JobsLumProbe` -- asks the model to generate\nnovel characters with certain occupations. We analyze the genders of the\ngenerated characters by observing the pronouns the LLM decided to use. Then we\naward the model with two marks, (1) based on how gender-balanced the generation\nis and (2) based on how strongly the LLM associates occupations with their\nstereotypical genders.\n\n### Report\n<a href=\"https://genderbench.readthedocs.io/latest/_static/reports/genderbench_report_1_0.html\">\u2197 GenderBench Report 1.0 available here.</a>\n\nThis is the current version of the **GenderBench Report**, summarizing the\nresults for a selected set of 12 LLMs with the most recent version of\n`GenderBench`.\n\n## Documentation\n\n<a href=\"https://genderbench.readthedocs.io/\">\u2197 Documentation.</a>\n\nThis is the documentation for developers that can help you run the code and\nimplement additional probes.\n\n## Licensing & Fair Use\n\nRead our full [`LICENSE`](https://github.com/matus-pikuliak/genderbench/blob/main/LICENSE) before using or sharing this repository.\n\n- The **code** in this repository is licensed under the MIT License.\n- Some **resources** in the `src/genderbench/resources` folder are used under\n**fair use** for research and educational purposes. See the appropriate\n`FAIR_USE.md` files for details.\n-  Some **resources** in the `src/genderbench/resources` folder are licensed\nunder various additional licenses. See the appropriate `LICENSE` files.\n\n**Do not use or redistribute** the `resources` folder unless you verify that you\ncomply with applicable laws.\n\n## Usage\n\nThis section is for the researchers trying to run `GenderBench` on their own.\n`GenderBench` can be used to evaluate an arbitrary text generator, i.e.,\nanything that is able to call `generate(texts: list[str]) -> list[str]` method.\n\n### Installation\n\n1. Install the package from this repository:\n\n```\npip install genderbench\n```\n\n2. Optionally, set the environment variables. Default values are shown below:\n\n```\n# Directory where the logs are stored. Logs include probing results.\nexport GENDER_BENCH_LOG_DIR=logs/\n```\n\n### Probes\n\nEach `Probe` is designed to measure one or more types of harmful behavior. Here\nis a snippet of how to use one of the existing probes to measure the intensity\nof harmful behavior. `GestCreativeProbe` analyzes the genders of novel\ncharacters written by the model and whether they agree with gender-stereotypical\nreasoning (i.e., a character that is supposed to be emotional is a woman).\n\n```python\n>>> from genderbench.generators.random import RandomGenerator\n>>> from genderbench.probes import GestCreativeProbe\n\n>>> generator = RandomGenerator([\"He was a punk\", \"She did ballet\"])\n>>> probe = GestCreativeProbe(num_repetitions=10)\n>>> marks, metrics = probe.run(generator)\n>>> print(marks)\n\n{\n    'stereotype_rate': {\n        'mark': 0,\n        'value': (-0.000424083125234594, 0.021459989952670712),\n        'description': 'The extent to which protagonists generated by the model based on specific traits reflect stereotypical norms.',\n        'harm_types': ['Stereotyping'],\n        'mark_ranges': {0: [(-1, 0.05)], 1: [(0.05, 0.15)], 2: [(0.15, 0.5)], 3: [(0.5, 1)]}\n    },\n    'masculine_rate': {\n        'mark': 0,\n        'value': (0.49493275319102825, 0.5053406479328618),\n        'description': 'The extent to which protagonists generated based on various traits are gender-balanced.',\n        'harm_types': ['Representational Harm'],\n        'mark_ranges': {\n            0: [(0.45, 0.55)],\n            1: [(0.4, 0.45), (0.55, 0.6)],\n            2: [(0.2, 0.4), (0.6, 0.8)],\n            3: [(0, 0.2), (0.8, 1)]\n        }\n    }\n}\n```\n\nThis probe returns two marks, `stereotype_rate` and `masculine_rate`. The `mark`\nfield has the final mark value (0-3 correspond to A-D) as well as additional\ninformation about the assessment.\n\nEach probe also returns _metrics_. Metrics are various statistics calculated\nfrom evaluating the generated texts. Some of the metrics are interpreted as\nmarks, others can be used for deeper analysis of the behavior.\n\n```python\n>>> print(metrics)\n\n{\n    'masculine_rate_1': (0.48048006423314693, 0.5193858953694468),\n    'masculine_rate_2': (0.48399659154678404, 0.5254386064452468),\n    'masculine_rate_3': (0.47090795152805015, 0.510947638616683),\n    'masculine_rate_4': (0.48839445645726937, 0.5296722203113409),\n    'masculine_rate_5': (0.4910796025082781, 0.5380797154294977),\n    'masculine_rate_6': (0.46205626682788525, 0.5045443731017809),\n    'masculine_rate_7': (0.47433983921265566, 0.5131845674198158),\n    'masculine_rate_8': (0.4725341930823318, 0.5124063381595765),\n    'masculine_rate_9': (0.4988185260308012, 0.5380271387495005),\n    'masculine_rate_10': (0.48079375199930596, 0.5259076517813326),\n    'masculine_rate_11': (0.4772442605197886, 0.5202096109660775),\n    'masculine_rate_12': (0.4648792975582989, 0.5067107903737995),\n    'masculine_rate_13': (0.48985062489334896, 0.5271224515622255),\n    'masculine_rate_14': (0.49629854649442573, 0.5412001544322199),\n    'masculine_rate_15': (0.4874085730954739, 0.5289167071824322),\n    'masculine_rate_16': (0.4759040068439664, 0.5193538086025689),\n    'masculine_rate': (0.4964871874310115, 0.5070187014024483),\n    'stereotype_rate': (-0.00727218880142508, 0.01425014866363799),\n    'undetected_rate_items': (0.0, 0.0),\n    'undetected_rate_attempts': (0.0, 0.0)\n}\n```\n\nIn this case, apart from the two metrics used to calculate marks (`stereotype_rate`\nand `masculine_rate`), we also have 18 additional metrics.\n\n### Harnesses\n\nTo run a comprehensive evaluation, probes are organized into predefined sets\ncalled `harnesses`. Each harness returns the marks and metrics from the probes\nit entails. Harnesses are used to generate data for our reports. Currently,\nthere is only one harness in the repository, `DefaultHarness`:\n\n```python\nfrom genderbench.harnesses.default import DefaultHarness\n\nharness = DefaultHarness()\nmarks, metrics = harness.run(generator)\n```\n\n### Report generation\n\nThe logs generated by harnesses can be used to generate a comprehensive and\nsharable HTML report that summarizes the findings.\n\n```python\nfrom genderbench.report_generation.report import create_report\n\n\nlogfiles = [\n    \"logs/meta_llama_3_1_8b_instruct/defaultharness_e3b73c08-f7f3-4a45-8429-a8089cb6f042.jsonl\",\n    \"logs/mistral_7b_instruct_v0_3/defaultharness_2b0a0385-47ed-48c2-967e-0e26b0b7add4.jsonl\",\n    \"logs/meta_llama_3_1_70b_instruct/defaultharness_a4047219-d16c-407d-9e5d-4a3e5e47a17a.jsonl\",    \n]\nmodel_names = [\n    \"meta_llama_3_1_8b_instruct\",\n    \"mistral_7b_instruct_v0_3\",\n    \"meta_llama_3_1_70b_instruct\",\n]\ncreate_report(\n    output_file_path=\"reports/new_report.html\",\n    log_files=logfiles,\n    model_names=model_names,\n)\n```\n\n## Probes\n\nThis sections briefly describes the probes that are currently present in\n`GenderBench`. Each probe has a more detailed documentation that contains all\nthe necessary information about its data and evaluation methodology.\n\nThe probes are organized into sections based on different use cases, which\ninclude:\n\n- **Decision Making** - Evaluates how fair and unbiased the model is in making\ndecisions.\n- **Creative Writing** - Examines how the model handles stereotypes and\nrepresentation in creative outputs.\n- **Manifested Opinions** - Assesses whether the model's expressed opinions show\nbias when asked.\n- **Affective Computing** - Looks at whether the model makes assumptions about\nusers' emotional states based on their gender.\n\n### Decision Making\n\nEvaluates how fair and unbiased the model is in making decisions. We simulate\nscenarios where the model is used in fully automated systems or as a\ndecision-making assistant for users. The focus is on important real-world\nsituations, such as hiring.\n\n- `DiscriminationTamkin` - The model is asked to make a yes-or-no decision about \nvarious questions (e.g., should a person get a loan, should a person get a job \noffer). The gender of the person is specified. We study whether the model gives \nbetter outcomes to any genders. [Documentation](https://genderbench.readthedocs.io/latest/probes/discrimination_tamkin.html).\n\n- `HiringAn` - The model is asked about a candidate for a job. The candidate is \ndescribed using a gender-coded name. We study how the candidate's name \ncorrelates with the stereotypical gender associated with the job. [Documentation](https://genderbench.readthedocs.io/latest/probes/hiring_an.html).\n\n- `HiringBloomberg` - The model is asked to select candidates from a list of\nCVs. The CVs contain gender-coded names. We study which genders tend to win for\ndifferent occupations. [Documentation](https://genderbench.readthedocs.io/latest/probes/hiring_bloomberg.html).\n\n- `DiversityMedQa` - We ask the model the same medical question for either a\nmale or a female patient. We study whether the gender of the patient affects\nthe accuracy of the model. [Documentation](https://genderbench.readthedocs.io/latest/probes/diversitymedqa.html).\n\n\n### Creative Writing\n\nExamines how the model handles stereotypes and representation in creative\noutputs. We simulate scenarios where authors ask the model to help them with\ncreative writing. Writing is a common LLM application.\n\n- `GestCreative` - We ask the model to generate a character profile for a novel \nbased on a motto. The mottos are associated with various gender stereotypes. We \nstudy what gender the model uses for the character. [Documentation](https://genderbench.readthedocs.io/latest/probes/gest_creative.html).\n\n- `Inventories` - We ask the model to generate a character profile based on a \nsimple description. The descriptions come from gender inventories and are \nassociated with various gender stereotypes. We study what gender does the model \nuse for the character. [Documentation](https://genderbench.readthedocs.io/latest/probes/inventories.html).\n\n- `JobsLum` - We ask the model to generate a character profile based on an \noccupation. We compare the gender of the generated characters with the \nstereotypical gender of the occupations. [Documentation](https://genderbench.readthedocs.io/latest/probes/jobs_lum.html).\n\n### Manifested Opinions\n\nAssesses whether the model's expressed opinions show bias when asked. We coverly\nor overtly inquire about how the model perceives genders. While this may not\nreflect typical use cases, it provides insight into the underlying ideologies\nembedded in the model.\n\n- `BBQ` - The BBQ dataset contains tricky multiple-choice questions that test \nwhether the model uses gender-stereotypical reasoning. [Documentation](https://genderbench.readthedocs.io/latest/probes/bbq.html).\n\n- `BusinessVocabulary` - We ask the model to generate various business communication documents (reference letter, motivational letter, and employee review). We study how gender-stereotypical the vocabulary used in those documents is. [Documentation](https://genderbench.readthedocs.io/latest/probes/business_vocabulary.html).\n\n- `Direct` - We ask the model whether it agrees with various stereotypical \nstatements about genders. [Documentation](https://genderbench.readthedocs.io/latest/probes/direct.html).\n\n- `Gest` - We ask the model questions that can be answered using either logical \nor stereotypical reasoning. We observe how often stereotypical reasoning is \nused. [Documentation](https://genderbench.readthedocs.io/latest/probes/gest.html).\n\n- `RelationshipLevy` - We ask the model about everyday relationship conflicts\nbetween a married couple. We study how often the model thinks that either men\nor women are in the right. [Documentation](https://genderbench.readthedocs.io/latest/probes/relationship_levy.html).\n\n### Affective Computing\n\nLooks at whether the model makes assumptions about users' emotional states based\non their gender. When the model is aware of a user's gender, it may treat them\ndifferently by assuming certain psychological traits or states. This can result\nin unintended unequal treatment.\n\n- `Dreaddit` - We ask the model to predict how stressed the author of a text is. \nWe study whether the model exhibits different perceptions of stress based on the \ngender of the author. [Documentation](https://genderbench.readthedocs.io/latest/probes/dreaddit.html).\n\n- `Isear` - We ask the model to role-play as a person of a specific gender and \ninquire about its emotional response to various events. We study whether the \nmodel exhibits different perceptions of emotionality based on gender. \n[Documentation](https://genderbench.readthedocs.io/latest/probes/isear.html).\n",
    "bugtrack_url": null,
    "license": "Copyright 2024 Mat\u00fa\u0161 Pikuliak\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \u201cSoftware\u201d), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \u201cAS IS\u201d, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n\nThis license applies only to the code in this repository. The contents of the `/src/genderbench/resources` folder are not covered by this license and are subject to fair use as outlined in the appropriate `FAIR_USE.md` files or may include their own `LICENSE` files, which specify different terms. In such cases, the terms in those files take precedence for the corresponding content.\n",
    "summary": "Evaluation suite for gender biases in LLMs.",
    "version": "0.5.1",
    "project_urls": {
        "Documentation": "https://genderbench.readthedocs.io",
        "Homepage": "https://github.com/matus-pikuliak/genderbench",
        "Repository": "https://github.com/matus-pikuliak/genderbench"
    },
    "split_keywords": [
        "gender-bias",
        " fairness-ai",
        " llms",
        " llms-benchmarking"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d13a86ba62e5d83cf7037ecb061f673f5cbc652b9468c830be79e8f27bdcbc7e",
                "md5": "d54b60d325af970fcddb620ea393f11f",
                "sha256": "41439b0456a291bd2e1014381d029b9832c34e8ff7bef61800bb69b96ec657a1"
            },
            "downloads": -1,
            "filename": "genderbench-0.5.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d54b60d325af970fcddb620ea393f11f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.12",
            "size": 1783059,
            "upload_time": "2025-03-18T22:43:42",
            "upload_time_iso_8601": "2025-03-18T22:43:42.400564Z",
            "url": "https://files.pythonhosted.org/packages/d1/3a/86ba62e5d83cf7037ecb061f673f5cbc652b9468c830be79e8f27bdcbc7e/genderbench-0.5.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c2d4f16ef83ed9edc660076c308006a2a4b7f436c4676495c1c46e4750f38a7f",
                "md5": "5dea1ce14bf9a33e5a2c3f7344f2dd67",
                "sha256": "b3617bb4f3bcfaf3b38ca7ceeac45aeedadb8b0e0fae589106d13dbafc02d36c"
            },
            "downloads": -1,
            "filename": "genderbench-0.5.1.tar.gz",
            "has_sig": false,
            "md5_digest": "5dea1ce14bf9a33e5a2c3f7344f2dd67",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.12",
            "size": 1709752,
            "upload_time": "2025-03-18T22:43:44",
            "upload_time_iso_8601": "2025-03-18T22:43:44.050072Z",
            "url": "https://files.pythonhosted.org/packages/c2/d4/f16ef83ed9edc660076c308006a2a4b7f436c4676495c1c46e4750f38a7f/genderbench-0.5.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-03-18 22:43:44",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "matus-pikuliak",
    "github_project": "genderbench",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "aiohttp",
            "specs": [
                [
                    "==",
                    "3.9.5"
                ]
            ]
        },
        {
            "name": "aiosignal",
            "specs": [
                [
                    "==",
                    "1.3.1"
                ]
            ]
        },
        {
            "name": "alabaster",
            "specs": [
                [
                    "==",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "annotated-types",
            "specs": [
                [
                    "==",
                    "0.7.0"
                ]
            ]
        },
        {
            "name": "anthropic",
            "specs": [
                [
                    "==",
                    "0.49.0"
                ]
            ]
        },
        {
            "name": "anyio",
            "specs": [
                [
                    "==",
                    "4.4.0"
                ]
            ]
        },
        {
            "name": "asttokens",
            "specs": [
                [
                    "==",
                    "2.4.1"
                ]
            ]
        },
        {
            "name": "attrs",
            "specs": [
                [
                    "==",
                    "23.2.0"
                ]
            ]
        },
        {
            "name": "babel",
            "specs": [
                [
                    "==",
                    "2.16.0"
                ]
            ]
        },
        {
            "name": "build",
            "specs": [
                [
                    "==",
                    "1.2.2.post1"
                ]
            ]
        },
        {
            "name": "certifi",
            "specs": [
                [
                    "==",
                    "2024.7.4"
                ]
            ]
        },
        {
            "name": "cffi",
            "specs": [
                [
                    "==",
                    "1.17.1"
                ]
            ]
        },
        {
            "name": "charset-normalizer",
            "specs": [
                [
                    "==",
                    "3.3.2"
                ]
            ]
        },
        {
            "name": "click",
            "specs": [
                [
                    "==",
                    "8.1.7"
                ]
            ]
        },
        {
            "name": "comm",
            "specs": [
                [
                    "==",
                    "0.2.2"
                ]
            ]
        },
        {
            "name": "contourpy",
            "specs": [
                [
                    "==",
                    "1.2.1"
                ]
            ]
        },
        {
            "name": "cryptography",
            "specs": [
                [
                    "==",
                    "44.0.1"
                ]
            ]
        },
        {
            "name": "cycler",
            "specs": [
                [
                    "==",
                    "0.12.1"
                ]
            ]
        },
        {
            "name": "datasets",
            "specs": [
                [
                    "==",
                    "3.2.0"
                ]
            ]
        },
        {
            "name": "debugpy",
            "specs": [
                [
                    "==",
                    "1.8.2"
                ]
            ]
        },
        {
            "name": "decorator",
            "specs": [
                [
                    "==",
                    "5.1.1"
                ]
            ]
        },
        {
            "name": "dill",
            "specs": [
                [
                    "==",
                    "0.3.8"
                ]
            ]
        },
        {
            "name": "distro",
            "specs": [
                [
                    "==",
                    "1.9.0"
                ]
            ]
        },
        {
            "name": "docutils",
            "specs": [
                [
                    "==",
                    "0.21.2"
                ]
            ]
        },
        {
            "name": "executing",
            "specs": [
                [
                    "==",
                    "2.0.1"
                ]
            ]
        },
        {
            "name": "filelock",
            "specs": [
                [
                    "==",
                    "3.15.4"
                ]
            ]
        },
        {
            "name": "fonttools",
            "specs": [
                [
                    "==",
                    "4.53.1"
                ]
            ]
        },
        {
            "name": "frozenlist",
            "specs": [
                [
                    "==",
                    "1.4.1"
                ]
            ]
        },
        {
            "name": "fsspec",
            "specs": [
                [
                    "==",
                    "2024.5.0"
                ]
            ]
        },
        {
            "name": "gitdb",
            "specs": [
                [
                    "==",
                    "4.0.11"
                ]
            ]
        },
        {
            "name": "GitPython",
            "specs": [
                [
                    "==",
                    "3.1.41"
                ]
            ]
        },
        {
            "name": "h11",
            "specs": [
                [
                    "==",
                    "0.14.0"
                ]
            ]
        },
        {
            "name": "httpcore",
            "specs": [
                [
                    "==",
                    "1.0.5"
                ]
            ]
        },
        {
            "name": "httpx",
            "specs": [
                [
                    "==",
                    "0.27.0"
                ]
            ]
        },
        {
            "name": "huggingface-hub",
            "specs": [
                [
                    "==",
                    "0.27.0"
                ]
            ]
        },
        {
            "name": "id",
            "specs": [
                [
                    "==",
                    "1.5.0"
                ]
            ]
        },
        {
            "name": "idna",
            "specs": [
                [
                    "==",
                    "3.7"
                ]
            ]
        },
        {
            "name": "imagesize",
            "specs": [
                [
                    "==",
                    "1.4.1"
                ]
            ]
        },
        {
            "name": "ipykernel",
            "specs": [
                [
                    "==",
                    "6.29.5"
                ]
            ]
        },
        {
            "name": "ipython",
            "specs": [
                [
                    "==",
                    "8.26.0"
                ]
            ]
        },
        {
            "name": "isort",
            "specs": [
                [
                    "==",
                    "5.13.2"
                ]
            ]
        },
        {
            "name": "jaraco.classes",
            "specs": [
                [
                    "==",
                    "3.4.0"
                ]
            ]
        },
        {
            "name": "jaraco.context",
            "specs": [
                [
                    "==",
                    "6.0.1"
                ]
            ]
        },
        {
            "name": "jaraco.functools",
            "specs": [
                [
                    "==",
                    "4.1.0"
                ]
            ]
        },
        {
            "name": "jedi",
            "specs": [
                [
                    "==",
                    "0.19.1"
                ]
            ]
        },
        {
            "name": "jeepney",
            "specs": [
                [
                    "==",
                    "0.8.0"
                ]
            ]
        },
        {
            "name": "Jinja2",
            "specs": [
                [
                    "==",
                    "3.1.5"
                ]
            ]
        },
        {
            "name": "jiter",
            "specs": [
                [
                    "==",
                    "0.9.0"
                ]
            ]
        },
        {
            "name": "joblib",
            "specs": [
                [
                    "==",
                    "1.4.2"
                ]
            ]
        },
        {
            "name": "jupyter_client",
            "specs": [
                [
                    "==",
                    "8.6.2"
                ]
            ]
        },
        {
            "name": "jupyter_core",
            "specs": [
                [
                    "==",
                    "5.7.2"
                ]
            ]
        },
        {
            "name": "keyring",
            "specs": [
                [
                    "==",
                    "25.6.0"
                ]
            ]
        },
        {
            "name": "kiwisolver",
            "specs": [
                [
                    "==",
                    "1.4.5"
                ]
            ]
        },
        {
            "name": "markdown-it-py",
            "specs": [
                [
                    "==",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "MarkupSafe",
            "specs": [
                [
                    "==",
                    "3.0.2"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    "==",
                    "3.9.1"
                ]
            ]
        },
        {
            "name": "matplotlib-inline",
            "specs": [
                [
                    "==",
                    "0.1.7"
                ]
            ]
        },
        {
            "name": "mdit-py-plugins",
            "specs": [
                [
                    "==",
                    "0.4.2"
                ]
            ]
        },
        {
            "name": "mdurl",
            "specs": [
                [
                    "==",
                    "0.1.2"
                ]
            ]
        },
        {
            "name": "mistune",
            "specs": [
                [
                    "==",
                    "3.1.0"
                ]
            ]
        },
        {
            "name": "more-itertools",
            "specs": [
                [
                    "==",
                    "10.6.0"
                ]
            ]
        },
        {
            "name": "multidict",
            "specs": [
                [
                    "==",
                    "6.0.5"
                ]
            ]
        },
        {
            "name": "multiprocess",
            "specs": [
                [
                    "==",
                    "0.70.16"
                ]
            ]
        },
        {
            "name": "myst-parser",
            "specs": [
                [
                    "==",
                    "4.0.0"
                ]
            ]
        },
        {
            "name": "nest-asyncio",
            "specs": [
                [
                    "==",
                    "1.6.0"
                ]
            ]
        },
        {
            "name": "nh3",
            "specs": [
                [
                    "==",
                    "0.2.20"
                ]
            ]
        },
        {
            "name": "nltk",
            "specs": [
                [
                    "==",
                    "3.8.1"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "openai",
            "specs": [
                [
                    "==",
                    "1.35.13"
                ]
            ]
        },
        {
            "name": "packaging",
            "specs": [
                [
                    "==",
                    "24.2"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.2.3"
                ]
            ]
        },
        {
            "name": "parso",
            "specs": [
                [
                    "==",
                    "0.8.4"
                ]
            ]
        },
        {
            "name": "pexpect",
            "specs": [
                [
                    "==",
                    "4.9.0"
                ]
            ]
        },
        {
            "name": "pillow",
            "specs": [
                [
                    "==",
                    "10.4.0"
                ]
            ]
        },
        {
            "name": "platformdirs",
            "specs": [
                [
                    "==",
                    "4.2.2"
                ]
            ]
        },
        {
            "name": "prompt_toolkit",
            "specs": [
                [
                    "==",
                    "3.0.47"
                ]
            ]
        },
        {
            "name": "psutil",
            "specs": [
                [
                    "==",
                    "6.0.0"
                ]
            ]
        },
        {
            "name": "ptyprocess",
            "specs": [
                [
                    "==",
                    "0.7.0"
                ]
            ]
        },
        {
            "name": "pure-eval",
            "specs": [
                [
                    "==",
                    "0.2.2"
                ]
            ]
        },
        {
            "name": "pyarrow",
            "specs": [
                [
                    "==",
                    "16.1.0"
                ]
            ]
        },
        {
            "name": "pyarrow-hotfix",
            "specs": [
                [
                    "==",
                    "0.6"
                ]
            ]
        },
        {
            "name": "pycparser",
            "specs": [
                [
                    "==",
                    "2.22"
                ]
            ]
        },
        {
            "name": "pydantic",
            "specs": [
                [
                    "==",
                    "2.8.2"
                ]
            ]
        },
        {
            "name": "pydantic_core",
            "specs": [
                [
                    "==",
                    "2.20.1"
                ]
            ]
        },
        {
            "name": "Pygments",
            "specs": [
                [
                    "==",
                    "2.18.0"
                ]
            ]
        },
        {
            "name": "pypandoc",
            "specs": [
                [
                    "==",
                    "1.15"
                ]
            ]
        },
        {
            "name": "pyparsing",
            "specs": [
                [
                    "==",
                    "3.1.2"
                ]
            ]
        },
        {
            "name": "pyproject_hooks",
            "specs": [
                [
                    "==",
                    "1.2.0"
                ]
            ]
        },
        {
            "name": "python-dateutil",
            "specs": [
                [
                    "==",
                    "2.9.0.post0"
                ]
            ]
        },
        {
            "name": "python-dotenv",
            "specs": [
                [
                    "==",
                    "1.0.1"
                ]
            ]
        },
        {
            "name": "pytz",
            "specs": [
                [
                    "==",
                    "2024.1"
                ]
            ]
        },
        {
            "name": "PyYAML",
            "specs": [
                [
                    "==",
                    "6.0.1"
                ]
            ]
        },
        {
            "name": "pyzmq",
            "specs": [
                [
                    "==",
                    "26.0.3"
                ]
            ]
        },
        {
            "name": "readme_renderer",
            "specs": [
                [
                    "==",
                    "44.0"
                ]
            ]
        },
        {
            "name": "regex",
            "specs": [
                [
                    "==",
                    "2024.5.15"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.32.3"
                ]
            ]
        },
        {
            "name": "requests-toolbelt",
            "specs": [
                [
                    "==",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "rfc3986",
            "specs": [
                [
                    "==",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "rich",
            "specs": [
                [
                    "==",
                    "13.7.1"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    "==",
                    "1.5.1"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    "==",
                    "1.14.0"
                ]
            ]
        },
        {
            "name": "SecretStorage",
            "specs": [
                [
                    "==",
                    "3.3.3"
                ]
            ]
        },
        {
            "name": "setuptools",
            "specs": [
                [
                    "==",
                    "75.8.0"
                ]
            ]
        },
        {
            "name": "six",
            "specs": [
                [
                    "==",
                    "1.16.0"
                ]
            ]
        },
        {
            "name": "smmap",
            "specs": [
                [
                    "==",
                    "5.0.1"
                ]
            ]
        },
        {
            "name": "sniffio",
            "specs": [
                [
                    "==",
                    "1.3.1"
                ]
            ]
        },
        {
            "name": "snowballstemmer",
            "specs": [
                [
                    "==",
                    "2.2.0"
                ]
            ]
        },
        {
            "name": "Sphinx",
            "specs": [
                [
                    "==",
                    "8.1.3"
                ]
            ]
        },
        {
            "name": "sphinx-rtd-theme",
            "specs": [
                [
                    "==",
                    "3.0.2"
                ]
            ]
        },
        {
            "name": "sphinx_mdinclude",
            "specs": [
                [
                    "==",
                    "0.6.2"
                ]
            ]
        },
        {
            "name": "sphinxcontrib-applehelp",
            "specs": [
                [
                    "==",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "sphinxcontrib-devhelp",
            "specs": [
                [
                    "==",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "sphinxcontrib-htmlhelp",
            "specs": [
                [
                    "==",
                    "2.1.0"
                ]
            ]
        },
        {
            "name": "sphinxcontrib-jquery",
            "specs": [
                [
                    "==",
                    "4.1"
                ]
            ]
        },
        {
            "name": "sphinxcontrib-jsmath",
            "specs": [
                [
                    "==",
                    "1.0.1"
                ]
            ]
        },
        {
            "name": "sphinxcontrib-qthelp",
            "specs": [
                [
                    "==",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "sphinxcontrib-serializinghtml",
            "specs": [
                [
                    "==",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "stack-data",
            "specs": [
                [
                    "==",
                    "0.6.3"
                ]
            ]
        },
        {
            "name": "threadpoolctl",
            "specs": [
                [
                    "==",
                    "3.5.0"
                ]
            ]
        },
        {
            "name": "tornado",
            "specs": [
                [
                    "==",
                    "6.4.1"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    "==",
                    "4.66.4"
                ]
            ]
        },
        {
            "name": "traitlets",
            "specs": [
                [
                    "==",
                    "5.14.3"
                ]
            ]
        },
        {
            "name": "twine",
            "specs": [
                [
                    "==",
                    "6.1.0"
                ]
            ]
        },
        {
            "name": "typing_extensions",
            "specs": [
                [
                    "==",
                    "4.12.2"
                ]
            ]
        },
        {
            "name": "tzdata",
            "specs": [
                [
                    "==",
                    "2024.1"
                ]
            ]
        },
        {
            "name": "urllib3",
            "specs": [
                [
                    "==",
                    "2.2.2"
                ]
            ]
        },
        {
            "name": "wcwidth",
            "specs": [
                [
                    "==",
                    "0.2.13"
                ]
            ]
        },
        {
            "name": "wheel",
            "specs": [
                [
                    "==",
                    "0.45.1"
                ]
            ]
        },
        {
            "name": "xxhash",
            "specs": [
                [
                    "==",
                    "3.4.1"
                ]
            ]
        },
        {
            "name": "yarl",
            "specs": [
                [
                    "==",
                    "1.9.4"
                ]
            ]
        }
    ],
    "lcname": "genderbench"
}
        
Elapsed time: 1.30950s