| Field | Value |
| --------------- | ------------------------------------------------------------------------ |
| Name | multimedeval |
| Version | 1.0.0 |
| Summary | A Python tool to evaluate the performance of VLMs in the medical domain. |
| Author | Corentin Royer |
| License | MIT |
| Requires Python | <3.13,>=3.9 |
| Keywords | evaluation, medical, vlm |
| Repository | https://github.com/corentin-ryr/MultiMedEval |
| Upload time | 2025-07-23 14:44:40 |
# MultiMedEval
MultiMedEval is a library for evaluating the performance of Vision-Language Models (VLMs) on medical-domain tasks. The goal is to provide a set of benchmarks with a unified evaluation scheme to facilitate the development and comparison of medical VLMs.
We include 24 tasks covering 10 different imaging modalities, plus some text-only tasks.
   
## Tasks
<details>
<summary>Question Answering</summary>
| Task | Description | Modality | Size |
| -------- | ------------------------------------------------------ | ---------------- | ---- |
| MedQA | Multiple choice questions on general medical knowledge | General medicine | 1273 |
| PubMedQA | Yes/no/maybe questions based on PubMed paper abstracts | General medicine | 500 |
| MedMCQA | Multiple choice questions on general medical knowledge | General medicine | 4183 |
</details>
<br/>
<details>
<summary>Visual Question Answering</summary>
| Task | Description | Modality | Size |
| -------- | ---------------------------------------- | --------- | ---- |
| VQA-RAD | Open-ended questions on radiology images | X-ray | 451 |
| Path-VQA | Open-ended questions on pathology images | Pathology | 6719 |
| SLAKE | Open-ended questions on radiology images | X-ray | 1061 |
</details>
<br/>
<details>
<summary>Report Comparison</summary>
| Task | Description | Modality | Size |
| -------------------------- | --------------------------------------------------------------------------------- | ----------- | ----- |
| MIMIC-CXR-ReportGeneration | Generation of the findings section of radiology reports based on the radiology images | Chest X-ray | 2347 |
| MIMIC-III | Summarization of radiology reports | Text | 13054 |
</details>
<br/>
<details>
<summary>Natural Language Inference</summary>
| Task | Description | Modality | Size |
| ------ | ------------------------------------------------ | ---------------- | ---- |
| MedNLI | Natural Language Inference on medical sentences. | General medicine | 1422 |
</details>
<br/>
<details>
<summary>Image Classification</summary>
| Task | Description | Modality | Size |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------- | ------------- | ----- |
| MIMIC-CXR-ImageClassification | Classification of radiology images into 5 diseases | Chest X-ray | 5159 |
| VinDr-Mammo | Classification of mammography images into 5 BIRADS levels | Mammography | 429 |
| Pad-UFES-20 | Classification of skin lesion images into 7 diseases | Dermatology | 2298 |
| CBIS-DDSM-Mass | Classification of masses in mammography images into "benign", "malignant" or "benign without callback" | Mammography | 378 |
| CBIS-DDSM-Calcification | Classification of calcifications in mammography images into "benign", "malignant" or "benign without callback" | Mammography | 326 |
| MNIST-Oct | Image classification of optical coherence tomography of the retina | OCT | 1000 |
| MNIST-Path | Image classification of pathology images | Pathology | 7180 |
| MNIST-Blood | Image classification of blood cells seen through a microscope | Microscopy | 3421 |
| MNIST-Breast | Image classification of mammography | Mammography | 156 |
| MNIST-Derma | Image classification of skin defect images | Dermatology | 2005 |
| MNIST-OrganC | Image classification of abdominal CT scans | CT | 8216 |
| MNIST-OrganS | Image classification of abdominal CT scans | CT | 8827 |
| MNIST-Pneumonia | Image classification of chest X-rays | X-Ray | 624 |
| MNIST-Retina | Image classification of the retina taken with a fundus camera | Fundus camera | 400 |
| MNIST-Tissue | Image classification of kidney cortex seen through a microscope | Microscopy | 12820 |
</details>
<br/>
<p align="center">
<img src="figures/sankey.png" alt="Sankey graph">
<br>
<em>Representation of the modalities, tasks and datasets in MultiMedEval</em>
</p>
## Setup
To install the library, use `pip`:
```console
pip install multimedeval
```
To run the benchmark on your model, you first need to create an instance of the `MultiMedEval` class.
```python
from multimedeval import MultiMedEval, SetupParams, EvalParams
from multimedeval.utils import BatcherInput, BatcherOutput
engine = MultiMedEval()
```
You then need to call the `setup` function of the `engine`. This will download the datasets if needed and prepare them for evaluation. You can specify where to store the data and which datasets you want to download.
```python
setupParams = SetupParams(medqa_dir="data/")
tasksReady = engine.setup(setup_params=setupParams)
```
Here we initialize the `SetupParams` dataclass with only the path for the MedQA dataset. If you don't pass a directory for a dataset, it will be skipped during evaluation. During setup, the script needs a PhysioNet username and password to download "VinDr-Mammo", "MIMIC-CXR" and "MIMIC-III". You also need to set up Kaggle on your machine before running the setup, as "CBIS-DDSM" is hosted on Kaggle. At the end of the setup process, you will see a summary of which tasks are ready and which failed, and the function will return this summary as a dictionary.
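As a sketch, a fuller setup might pass several dataset directories along with the PhysioNet credentials; the paths and credentials below are placeholders, and only the tasks whose directory is provided will be prepared.
```python
from multimedeval import SetupParams

# Placeholder paths; reuse the `engine` created above.
setup_params = SetupParams(
    medqa_dir="data/medqa",
    vqa_rad_dir="data/vqa_rad",
    mimic_cxr_dir="data/mimic_cxr",      # downloaded from PhysioNet
    cbis_ddsm_dir="data/cbis_ddsm",      # requires a configured Kaggle API
    physionet_username="my_username",    # placeholder credentials
    physionet_password="my_password",
)
tasks_ready = engine.setup(setup_params=setup_params)
print(tasks_ready)  # dictionary summarizing which tasks are ready
```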
## Usage
### Implement the Batcher
The user must implement one Callable: `batcher`. It takes a batch of inputs and must return the answers.
The batch is a list of inputs.
Each input is an instance of the `BatcherInput` dataclass, which contains the following fields:
- `conversation`: a prompt in the form of a Hugging Face-style conversation between a user and an assistant.
- `images`: a list of Pillow images. The number of images matches the number of `<img>` tokens in the prompt, and the images are given in the same order.
- `segmentation_masks`: (optional) a list of segmentation masks; their number matches that of the `<seg>` tokens in the prompt and they are given in the same order.
```python
[
    BatcherInput(
        conversation=[
            {"role": "user", "content": "This is a question with an image <img>."},
            {"role": "assistant", "content": "This is the answer."},
            {"role": "user", "content": "This is a question with an image <img>."},
        ],
        images=[PIL.Image(), PIL.Image()],
        segmentation_masks=[PIL.Image(), PIL.Image()],
    ),
    BatcherInput(
        conversation=[
            {"role": "user", "content": "This is a question without images."},
            {"role": "assistant", "content": "This is the answer."},
            {"role": "user", "content": "This is a question without images."},
        ],
        images=[],
        segmentation_masks=[],
    ),
]
```
Here is an example of a `batcher` without any logic:
```python
def batcher(prompts: List[BatcherInput]) -> List[BatcherOutput]:
    # Return a constant answer for every prompt in the batch.
    return [BatcherOutput(text="Answer") for _ in prompts]
```
A function is the simplest example of a Callable, but the batcher can also be implemented as a Callable class (i.e. a class implementing the `__call__` method). Doing it this way allows you to initialize the model in the `__init__` method of the class. We give an example for the Mistral model (a language-only model).
```python
from typing import List

from transformers import AutoModelForCausalLM, AutoTokenizer
from multimedeval.utils import BatcherInput, BatcherOutput


class batcherMistral:
    def __init__(self) -> None:
        self.model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
        self.tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
        self.tokenizer.pad_token = self.tokenizer.eos_token

    def __call__(self, prompts: List[BatcherInput]) -> List[BatcherOutput]:
        # Render each conversation with the model's chat template, then tokenize the batch.
        model_inputs = [self.tokenizer.apply_chat_template(messages.conversation, tokenize=False) for messages in prompts]
        model_inputs = self.tokenizer(model_inputs, padding="max_length", truncation=True, max_length=1024, return_tensors="pt")

        generated_ids = self.model.generate(**model_inputs, max_new_tokens=200, do_sample=True, pad_token_id=self.tokenizer.pad_token_id)

        # Keep only the newly generated tokens (drop the prompt tokens).
        generated_ids = generated_ids[:, model_inputs["input_ids"].shape[1]:]

        answers = self.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
        return [BatcherOutput(text=answer) for answer in answers]
```
### Run the benchmark
To run the benchmark, call the `eval` method of the `MultiMedEval` class with the list of tasks to benchmark, the batcher to evaluate and the evaluation parameters. If the list is empty, all the tasks will be benchmarked.
```python
evalParams = EvalParams(batch_size=128)
results = engine.eval(["MedQA", "VQA-RAD"], batcher, eval_params=evalParams)
```
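Passing an empty task list evaluates the batcher on every task that was prepared during setup:
```python
# An empty list means "benchmark on all ready tasks".
results = engine.eval([], batcher, eval_params=evalParams)
```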
## MultiMedEval parameters
The `SetupParams` class takes a path for each dataset:
- medqa_dir: used as `cache_dir` in Hugging Face's `load_dataset`
- pubmedqa_dir: used as `cache_dir` in Hugging Face's `load_dataset`
- medmcqa_dir: used as `cache_dir` in Hugging Face's `load_dataset`
- vqa_rad_dir: used as `cache_dir` in Hugging Face's `load_dataset`
- path_vqa_dir: used as `cache_dir` in Hugging Face's `load_dataset`
- slake_dir: the dataset is currently hosted on Google Drive, which can be an issue on some systems.
- mimic_iii_dir: path for the (PhysioNet) MIMIC-III dataset.
- mednli_dir: used as `cache_dir` in Hugging Face's `load_dataset`
- mimic_cxr_dir: path for the (PhysioNet) MIMIC-CXR dataset.
- vindr_mammo_dir: path for the (PhysioNet) VinDr-Mammo dataset.
- pad_ufes_20_dir
- cbis_ddsm_dir: dataset hosted on Kaggle. Kaggle must be set up on the system (see [this](https://www.kaggle.com/docs/api#getting-started-installation-&-authentication))
- mnist_oct_dir
- mnist_path_dir
- mnist_blood_dir
- mnist_breast_dir
- mnist_derma_dir
- mnist_organc_dir
- mnist_organs_dir
- mnist_pneumonia_dir
- mnist_retina_dir
- mnist_tissue_dir
- chexbert_dir: path for the CheXbert model checkpoint
- physionet_username: PhysioNet username used to download MIMIC and VinDr-Mammo
- physionet_password: password for the PhysioNet account
The `EvalParams` class takes the following arguments (a sample instantiation follows the list):
- batch_size: The size of the batches sent to the user's batcher Callable.
- run_name: The name of the folder where the output will be stored.
- fewshot: A boolean indicating whether the evaluation is few-shot.
- num_workers: The number of workers for the dataloader.
- device: The device to run the evaluation on.
- tensorBoardWriter: The TensorBoard writer to use for logging.
- tensorboardStep: The global step for logging to TensorBoard.
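As a rough sketch, assuming a standard PyTorch `SummaryWriter` works as the `tensorBoardWriter` (the expected writer type is not specified here), a fuller `EvalParams` could look like this:
```python
from torch.utils.tensorboard import SummaryWriter

from multimedeval import EvalParams

writer = SummaryWriter(log_dir="runs/multimedeval")  # assumption: a PyTorch TensorBoard writer

eval_params = EvalParams(
    batch_size=64,
    run_name="mistral_baseline",
    fewshot=False,
    num_workers=4,
    device="cuda",
    tensorBoardWriter=writer,
    tensorboardStep=0,
)
```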
## Additional tasks
To add a new task to the list of already implemented ones, create a folder named `MultiMedEvalAdditionalDatasets` with a subfolder named after your dataset.
Inside your dataset folder, create a JSON file that follows this template for a VQA dataset:
```json
{
"taskType": "VQA",
"modality": "Radiology",
"samples": [
{
"question": "Question 1",
"answer": "Answer 1",
"images": ["image1.png", "image2.png"]
},
{ "question": "Question 2", "answer": "Answer 2", "images": ["image1.png"] }
]
}
```
And for a QA dataset:
```json
{
"taskType": "QA",
"modality": "Pathology",
"samples": [
{
"question": "Question 1",
"answer": "Answer 1",
"options": ["Option 1", "Option 2"],
"images": ["image1.png", "image2.png"]
},
{
"question": "Question 2",
"answer": "Answer 2",
"options": ["Option 1", "Option 2"],
"images": ["image1.png"]
}
]
}
```
Note that in both cases the `images` key is optional. If the `taskType` is VQA, the metrics computed are BLEU-1, accuracy for closed and open questions, recall for open questions, and F1. For the QA `taskType`, the tool reports accuracy (by comparing the answer to every option using BLEU).
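For illustration, a small script like the one below creates the expected folder layout and writes a VQA template; the dataset name `MyDataset` and the file name `dataset.json` are placeholders, since the exact file name is not specified above.
```python
import json
from pathlib import Path

# Placeholder dataset folder inside the additional-datasets directory.
dataset_dir = Path("MultiMedEvalAdditionalDatasets") / "MyDataset"
dataset_dir.mkdir(parents=True, exist_ok=True)

task = {
    "taskType": "VQA",
    "modality": "Radiology",
    "samples": [
        {"question": "Question 1", "answer": "Answer 1", "images": ["image1.png"]},
        {"question": "Question 2", "answer": "Answer 2"},  # the "images" key is optional
    ],
}

(dataset_dir / "dataset.json").write_text(json.dumps(task, indent=2))  # file name is an assumption
```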
## Reference
```bibtex
@misc{royer2024multimedeval,
    title={MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models},
    author={Corentin Royer and Bjoern Menze and Anjany Sekuboyina},
    year={2024},
    eprint={2402.09262},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```