maverick-coref

- Version: 1.0.3
- Home page: https://github.com/sapienzanlp/maverick-coref
- Author: Giuliano Martinelli
- Requires Python: >=3.8.0
- Uploaded: 2024-08-27 07:35:06
<h1 align="center">
  Maverick Coref
</h1>
<div align="center">


[![Conference](https://img.shields.io/badge/ACL%202024%20Paper-red)](https://aclanthology.org/2024.acl-long.722.pdf)
[![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-green.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
[![Pip Package](https://img.shields.io/badge/🐍%20Python%20package-blue)](https://pypi.org/project/maverick-coref/)
[![git](https://img.shields.io/badge/Git%20Repo%20-yellow.svg)](https://github.com/SapienzaNLP/maverick-coref)
</div>


This is the official repository for [*Maverick:
Efficient and Accurate Coreference Resolution Defying Recent Trends*](https://aclanthology.org/2024.acl-long.722.pdf).  


# Python Package
The `maverick-coref` Python package provides an easy API for Maverick models, enabling efficient and accurate coreference resolution with just a few lines of code.

Install the library from [PyPI](https://pypi.org/project/maverick-coref/):

```bash
pip install maverick-coref
```
or from source:

```bash
git clone https://github.com/SapienzaNLP/maverick-coref.git
cd maverick-coref
pip install -e .
```

## Loading a Pretrained Model
Maverick models can be loaded by Hugging Face model name or from a local checkpoint path:
```python
from maverick import Maverick

model = Maverick(
    # Hugging Face model name or local checkpoint path
    # (default: "sapienzanlp/maverick-mes-ontonotes")
    hf_name_or_path="sapienzanlp/maverick-mes-ontonotes",
    # "cpu" or "cuda" (default: "cuda:0")
    device="cuda:0",
)
```

## Available Models

Available models on the [SapienzaNLP Hugging Face hub](https://huggingface.co/collections/sapienzanlp/maverick-coreference-resolution-66a750a50246fad8d9c7086a):

|            hf_model_name            | Training dataset | Score (CoNLL-2012 avg. F1) | Singletons |
|:-----------------------------------:|:----------------:|:-----:|:----------:|
|    ["sapienzanlp/maverick-mes-ontonotes"](https://huggingface.co/sapienzanlp/maverick-mes-ontonotes)    |     OntoNotes    |  83.6 |     No     |
|     ["sapienzanlp/maverick-mes-litbank"](https://huggingface.co/sapienzanlp/maverick-mes-litbank)     |      LitBank     |  78.0 |     Yes    |
|      ["sapienzanlp/maverick-mes-preco"](https://huggingface.co/sapienzanlp/maverick-mes-preco)      |       PreCo      |  87.4 |     Yes    |
<!-- |    "sapienzanlp/maverick-s2e-ontonotes"    |     OntoNotes    |  83.4 |     No     |     No    | -->
<!-- |    "sapienzanlp/maverick-incr-ontonotes"   |     Ontonotes    |  83.5 |     No     |     No    | -->
<!-- |  "sapienzanlp/maverick-mes-ontonotes-base" |     Ontonotes    |  81.4 |     No     |     No    | -->
<!-- | "sapienzanlp/maverick-s2e-ontonotes-base"  |     Ontonotes    |  81.1 |     No     |     No    | -->
<!-- | "sapienzanlp/maverick-incr-ontonotes-base" |     Ontonotes    |  81.0 |     No     |     No    | -->
<!-- |     "sapienzanlp/maverick-s2e-litbank"     |      LitBank     |  77.6 |     Yes    |     No    | -->
<!-- |     "sapienzanlp/maverick-incr-litbank"    |      LitBank     |  78.3 |     Yes    |     No    | -->
<!-- |      "sapienzanlp/maverick-s2e-preco"      |       PreCo      |  87.2 |     Yes    |     No    | -->
<!-- |      "sapienzanlp/maverick-incr-preco"     |       PreCo      |  88.0 |     Yes    |     No    | -->
N.B. Each dataset has different annotation guidelines; choose your model according to your use case.
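
For example, if your use case needs singleton mentions, you can load the PreCo-based model from the table above (a minimal sketch using the constructor shown earlier):
```python
from maverick import Maverick

# PreCo-trained weights annotate singletons, unlike the OntoNotes ones
model = Maverick(hf_name_or_path="sapienzanlp/maverick-mes-preco", device="cpu")
```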

## Inference
### Inputs
Maverick inputs can be formatted in any of the following ways (a combined sketch follows this list):
- plain text:
  ```python
  text = "Barack Obama is traveling to Rome. The city is sunny and the president plans to visit its most important attractions"
  ```
- word-tokenized text, as a list of tokens:
  ```python
  word_tokenized = ['Barack', 'Obama', 'is', 'traveling', 'to', 'Rome', '.',  'The', 'city', 'is', 'sunny', 'and', 'the', 'president', 'plans', 'to', 'visit', 'its', 'most', 'important', 'attractions']
  ```
- sentence-split, word-tokenized text (i.e., OntoNotes-like input), as a list of lists of tokens:
  ```python
  ontonotes_format = [['Barack', 'Obama', 'is', 'traveling', 'to', 'Rome', '.'], ['The', 'city', 'is', 'sunny', 'and', 'the', 'president', 'plans', 'to', 'visit', 'its', 'most', 'important', 'attractions']] 
  ```
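
All three formats can be passed to the same `predict()` call; a minimal sketch, assuming the `model` and the example variables defined above:
```python
# The three input formats are interchangeable: plain text is tokenized
# internally, while pre-tokenized inputs are used as-is.
for doc in (text, word_tokenized, ontonotes_format):
    result = model.predict(doc)
    print(result["clusters_text_mentions"])
```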

### Predict
You can use `model.predict()` to obtain coreference predictions.
Given an input, the model returns a dictionary containing:
- `tokens`, the word-tokenized version of the input.
- `clusters_token_offsets`, a list of clusters containing mentions' token offsets.
- `clusters_text_mentions`, a list of clusters containing mentions in plain text.

Example:
```python
model.predict(ontonotes_format)
>>> {
  'tokens': ['Barack', 'Obama', 'is', 'traveling', 'to', 'Rome', '.', 'The', 'city', 'is', 'sunny', 'and', 'the', 'president', 'plans', 'to', 'visit', 'its', 'most', 'important', 'attractions'],
  'clusters_token_offsets': [[(5, 5), (7, 8), (17, 17)], [(0, 1), (12, 13)]],
  'clusters_text_mentions': [['Rome', 'The city', 'its'], ['Barack Obama', 'the president']]
}
```
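
The token offsets in `clusters_token_offsets` are inclusive `(start, end)` pairs into `tokens`, as the example shows; a quick sketch that reconstructs the text mentions from them:
```python
out = model.predict(ontonotes_format)

# Offsets are inclusive, so a mention covers tokens[start : end + 1]
for cluster in out["clusters_token_offsets"]:
    print([" ".join(out["tokens"][s : e + 1]) for s, e in cluster])
# ['Rome', 'The city', 'its']
# ['Barack Obama', 'the president']
```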

If you input plain text, the model will also include character-level offsets as `clusters_char_offsets`:
```python
model.predict(text)
>>> {
  'tokens': [...], 
  'clusters_token_offsets': [...], 
  'clusters_char_offsets': [[(29, 32), (35, 42), (86, 88)], [(0, 11), (57, 69)]], 
  'clusters_text_mentions': [...]
  }
```
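
The character offsets are likewise inclusive indices into the original string, so mentions can be recovered by slicing (a sketch based on the example offsets above):
```python
out = model.predict(text)

# Example: (29, 32) -> text[29:33] == "Rome"
for cluster in out["clusters_char_offsets"]:
    print([text[s : e + 1] for s, e in cluster])
# ['Rome', 'The city', 'its']
# ['Barack Obama', 'the president']
```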

### 🚨Additional Features🚨
Since coreference resolution often serves as a stepping stone for downstream use cases, this package covers several additional features:

- **Singletons**: include or exclude singleton (i.e., single-mention cluster) predictions by setting `singletons` to `True` or `False`.
*(Hint: for accurate singletons, use PreCo- or LitBank-based models, since OntoNotes does not annotate singletons and the model is therefore not trained to extract any.)*
  ```python
  # supported input: ontonotes_format
  model.predict(ontonotes_format, singletons=True)
  >>> {'tokens': [...],
  'clusters_token_offsets': [((5, 5), (7, 8), (17, 17)), ((0, 1), (12, 13)), ((17, 20),)],
  'clusters_char_offsets': None,
  'clusters_token_text': [['Rome', 'The city', 'its'], ['Barack Obama', 'the president'], ['its most important attractions']],
  'clusters_char_text': None
  }
  ```

- **Clustering-only**: predict with predefined mentions, passing them as a list of token offsets.
  ```python
  # supported input: ontonotes_format
  mentions = [(0, 1), (5, 5), (7, 8)]
  model.predict(ontonotes_format, predefined_mentions=mentions)
  >>> {'tokens': [...],
  'clusters_token_offsets': [((5, 5), (7, 8))],
  'clusters_char_offsets': None,
  'clusters_token_text': [['Rome', 'The city']],
  'clusters_char_text': None}
  ```

- **Starting from gold clusters**: predict starting from gold clusters, passing them to the model as lists of token offsets.
*(Note: since the starting clusters come first in the token-offset outputs, to obtain the coreference predictions **only for the starting clusters** it is enough to take the first N clusters, where N is the number of starting clusters; see the sketch after this example.)*
  ```python
  # supported input: ontonotes_format
  clusters = [[(5, 5), (7, 8)], [(0, 1)]]
  model.predict(ontonotes_format, add_gold_clusters=clusters)
  >>> {'tokens': [...],
  'clusters_token_offsets': [((5, 5), (7, 8), (17, 17)), ((0, 1), (12, 13))],
  'clusters_char_offsets': None,
  'clusters_token_text': [['Rome', 'The city', 'its'], ['Barack Obama', 'the president']],
  'clusters_char_text': None}
  ```
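  A minimal sketch of the "first N clusters" trick from the note above (variable names are illustrative):
  ```python
  gold = [[(5, 5), (7, 8)], [(0, 1)]]
  out = model.predict(ontonotes_format, add_gold_clusters=gold)

  # Starting clusters come first in the output, so the predictions for
  # them (possibly extended with new mentions) are the first len(gold).
  predictions_for_gold = out["clusters_token_offsets"][: len(gold)]
  ```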

- **Speaker information**: since OntoNotes models are trained with additional speaker information [(more info here)](https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf), you can provide speaker information alongside OntoNotes-format input.
  ```python
  # supported input: ontonotes_format
  speakers = [["Mark", "Mark", "Mark", "Mark", "Mark"], ["John", "John", "John", "John"]]
  model.predict(ontonotes_format, speakers=speakers)
  ```
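
The speaker lists appear to mirror the sentence/token structure of the input; a hypothetical sketch that builds them programmatically (the one-label-per-token alignment is an assumption based on the OntoNotes format):
```python
# Assumption: one speaker label per token, parallel to ontonotes_format
sentence_speakers = ["Mark", "John"]  # hypothetical, one speaker per sentence
speakers = [
    [name] * len(sentence)
    for name, sentence in zip(sentence_speakers, ontonotes_format)
]
model.predict(ontonotes_format, speakers=speakers)
```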

# Using the Official Training and Evaluation Script

This repository also contains the code to train and evaluate Maverick systems using PyTorch Lightning and Hydra.

**We strongly suggest using the [python package](https://pypi.org/project/maverick-coref/) directly for easier inference and downstream usage.**

## Environment
To set up the training and evaluation environment, run the bash script `setup.sh` at the top level of this repository. This script creates a new conda environment and takes care of all the requirements and data preprocessing for training and evaluating a model on OntoNotes.

Simply run on the command line:
```
bash ./setup.sh
```
N.B. Remember to put the archive *ontonotes-release-5.0_LDC2013T19.tgz* in the folder *data/prepare_ontonotes/* if you want to preprocess OntoNotes with the standard preprocessing proposed by [e2e-coref](https://github.com/kentonl/e2e-coref/). OntoNotes can be downloaded, upon registration, at the following [link](https://catalog.ldc.upenn.edu/LDC2013T19).

## Hydra
This repository uses the [Hydra](https://hydra.cc/) configuration framework.

- *conf/data/*: each yaml file contains a dataset configuration.
- *conf/evaluation/*: contains the model checkpoint file path and device settings for model evaluation.
- *conf/logging/*: contains details for wandb logging.
- *conf/model/*: each yaml file contains a model setup.
- *conf/train/*: contains training configurations.
- *conf/root.yaml*: regulates the overall configuration of the environment.


## Train
To train a Maverick model, modify *conf/root.yaml* with your custom setup. 
By default, this file contains the settings for training and evaluating on the OntoNotes dataset.

To train a new model, follow the steps in the [Environment](#environment) section and run the following:
```
conda activate maverick_env
python maverick/train.py
```


## Evaluate
To evaluate an existing model, you need to set two configuration values:
1. the dataset path, in *conf/root.yaml* (by default it points to OntoNotes);
2. the model checkpoint path, in *conf/evaluation/default_evaluation.yaml*.

Finally, run the following:
```
conda activate maverick_env
python maverick/evaluate.py
```
This will directly output the CoNLL-2012 scores and, under the *experiments/* folder, write an *output.jsonlines* file containing the model outputs in OntoNotes style.

### Replicate paper results
The weights of each model can be found in the [SapienzaNLP huggingface hub](https://huggingface.co/collections/sapienzanlp/maverick-coreference-resolution-66a750a50246fad8d9c7086a).
To replicate any of the paper's results, download a model's *weights.ckpt* from its model card files and follow the steps in the [Evaluate](#evaluate) section.

E.g., to replicate the state-of-the-art results of *Maverick_mes* on OntoNotes:
- download the weights from [here](https://huggingface.co/sapienzanlp/maverick-mes-ontonotes/blob/main/weights.ckpt);
- copy the local path of the weights into *conf/evaluation/default_evaluation.yaml*;
- activate the project's conda environment with *conda activate maverick_env*;
- run *python maverick/evaluate.py*.

# Citation
This work was published at the [ACL 2024 main conference](https://aclanthology.org/2024.acl-long.722.pdf). If you use any part of it, please consider citing our paper as follows:
```bibtex
@inproceedings{martinelli-etal-2024-maverick,
    title = "Maverick: Efficient and Accurate Coreference Resolution Defying Recent Trends",
    author = "Martinelli, Giuliano  and
      Barba, Edoardo  and
      Navigli, Roberto",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.722",
    pages = "13380--13394",
}
```


## License

The data and software are licensed under [Creative Commons Attribution-NonCommercial-ShareAlike 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).
            
