hulu-evaluate

Name: hulu-evaluate
Version: 0.0.5
Summary: Client library to fine-tune and evaluate models on the HuLU benchmark.
Upload time: 2025-10-29 09:44:46
Requires Python: >=3.8
License: Apache
Keywords: artificial-intelligence, machine-learning, deep-learning, natural-language-processing, fine-tuning, evaluation, benchmark
Homepage: https://hulu.nytud.hu/
Repository: https://github.com/nytud/HuLU
Requirements: No requirements were recorded.
# HuLU
[Hungarian Language Understanding Benchmark Kit](https://hulu.nytud.hu/)


This repository contains the databases included in HuLU, the Hungarian Language Understanding Benchmark Kit developed, maintained, and updated by the Language Technology Research Group of the Hungarian Research Centre for Linguistics.

Currently (11/07/2024), six corpora are available for download and for testing models on (a loading sketch follows the list below).

- **HuCOLA** (Hungarian Corpus of Linguistic Acceptability) contains 9 076 Hungarian sentences labeled for acceptability/grammaticality (0/1). The sentences were collected by two human annotators from three linguistics books. Each sentence was annotated by four human annotators, and the final label is the one assigned by the majority of the annotators. The train, validation, and test sets make up 80% (7 276 sentences), 10% (900 sentences), and 10% (900 sentences) of the corpus, respectively.
- **HuCoPa** (Hungarian Choice of Plausible Alternatives Corpus) contains 1,000 instances. Each instance is composed of a premise and two alternatives. The task is to select the alternative that describes a situation standing in causal relation to the situation described by the premise. The corpus was created by translating and re-annotating the original English CoPA corpus. The train, validation, and test sets contain 400, 100 and 500 instances, respectively.
- **HuRC** (Hungarian Reading Comprehension Dataset) contains 80,621 instances. Each instance is composed of a passage and a cloze-style query with a masked entity. The task is to select the named entity that is being masked in the query. The data was collected from the online news of Népszabadság online (nol.hu).
- **HuSST** (Hungarian version of the Stanford Sentiment Treebank) contains 11 683 sentences. Each sentence is annotated for its sentiment on a three-point scale. The corpus was created by translating and re-annotating the full sentences of the SST. The train, validation, and test sets contain 9 347, 1 168, and 1 168 sentences, respectively.
- **HuWNLI** is a Hungarian dataset for anaphora resolution, designed as a sentence-pair classification task of natural language inference. Its base, the HuWS corpus, was created by translating and manually curating the original English Winograd schemata. The NLI format was created by replacing the ambiguous pronoun with each possible referent in the schemata. We extended the set of sentence pairs derived from the schemata with translations of the sentence pairs that make up the WNLI dataset of GLUE. The data is distributed in three splits: training set (562), development set (59), and test set (134).
- **HuCB** (Hungarian CommitmentBank) consists of short text fragments in which at least one sentence contains a subordinating clause, which is syntactically subordinated to a logical inference-canceling operator. In the database, the premise is the complete text fragment and the hypothesis is the embedded tag clause. In the inference task, it is necessary to decide to what extent the author of the text is committed to the truth of the subordinate clause. The corpus consists of a training, a validation, and a test set (of 250, 103, and 250 examples, respectively).
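
The corpora can be downloaded from the HuLU site and are also commonly loaded through the Hugging Face `datasets` library. Below is a minimal sketch, assuming the corpora are published on the Hugging Face Hub under the `NYTK` organization; the hub ID is an assumption, not something stated in this README.

```python
# Minimal sketch of loading one HuLU corpus with the Hugging Face `datasets` library.
# The hub ID "NYTK/HuCOLA" is an assumption about where the corpus is hosted;
# the corpora can also be downloaded directly from https://hulu.nytud.hu/.
from datasets import load_dataset

hucola = load_dataset("NYTK/HuCOLA")   # assumed hub ID
print(hucola)                          # expected splits: train / validation / test
print(hucola["train"][0])              # one sentence with its acceptability label (0/1)
```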


## Evaluation

An evaluation library / script is provided for fine-tuning and benchmarking language models on Hungarian tasks within the [HuLU benchmark](https://hulu.nytud.hu/).

It provides a unified CLI tool for running experiments across all six tasks, with support for Low-Rank Adaptation (LoRA), submission-ready outputs for the HuLU leaderboard, and multi-task training (i.e. training on more than one task). The library is configurable via parameters in a JSON file (see `parameters.json`) and compatible with HuggingFace's `transformers` library. For each task, the following evaluation metrics are computed: accuracy, balanced accuracy, precision, recall, F1 score, ROC-AUC, specificity and Matthews correlation coefficient (MCC).
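
The exact schema of `parameters.json` is defined by the library itself and is not reproduced here; the sketch below only illustrates the kind of hyperparameters such a file typically holds. Every key name is an illustrative assumption, not a confirmed option of `hulu-evaluate`.

```python
# Illustrative only: these keys are assumptions about typical fine-tuning
# hyperparameters, NOT the confirmed schema of hulu-evaluate's parameters.json.
import json

params = {
    "learning_rate": 2e-5,               # assumed key
    "num_train_epochs": 3,               # assumed key
    "per_device_train_batch_size": 16,   # assumed key
    "weight_decay": 0.01,                # assumed key
    "use_lora": True,                    # assumed key; LoRA settings may live here
    "lora_r": 8,                         # assumed key
    "lora_alpha": 16,                    # assumed key
}

with open("parameters.json", "w", encoding="utf-8") as f:
    json.dump(params, f, indent=2)
```

Consult the `parameters.json` shipped with the repository for the authoritative keys.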

### Installation

Installation via PyPI:

```bash
pip install hulu-evaluate
```

To install the library for testing and development, clone the repository and install dependencies:

```bash
git clone git@github.com:nytud/HuLU.git
cd HuLU/evaluate/
pip install .
```
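
For iterative development, an editable install (a standard pip feature, not specific to this project) may be more convenient, since source changes take effect without reinstalling:

```bash
# Editable install: the package is linked to the source tree instead of copied.
pip install -e .
```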

### Usage

The CLI provides a unified interface for configuring and running HuLU model evaluation and fine-tuning.


```bash
hulu <command> [<args>]
```

Below are the supported command-line arguments.

| Option | Description |
|--------|-------------|
| `--model-name <str>` | **(Required)** Name or path of the pretrained model to load. |
| `--parameters-path <path>` | **(Required)** Path to the JSON configuration file containing training hyperparameters (e.g., `parameters.json`). |
| `--tokenizer-name <str>` | Name or path of the tokenizer. Defaults to the model’s tokenizer if not specified. |
| `--tasks <list>` | List of HuLU tasks to run. Choices: `hucola`, `hurte`, `huwnli`, `hucommitmentbank`, `husst`, `hucopa`. |
| `--eval-test <true/false>` | Whether to evaluate on the test set after training. Default: `false`. |
| `--report-to <str>` | Reporting backend (e.g., `wandb`, `mlflow`). |
| `--report-uri <str>` | URI for the selected reporting backend. |
| `--experiment-name <str>` | Experiment/project name used for reporting. Default: `hulu-finetune`. |
| `--run-name <str>` | Custom run name used for logging and reporting. |
| `--save-results-path <path>` | Directory to save prediction outputs and logs. Default: `./results/`. |

---

#### Example Usage

```bash
hulu --model-name distilbert/distilbert-base-uncased --tasks hucola husst --parameters-path ./parameters.json --experiment-name test-experiment --eval-test false
```
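
A longer invocation that exercises more of the options from the table above, for example evaluating on the test set and logging to Weights & Biases. The flag names come from the table; the model ID and values are only illustrative:

```bash
hulu \
  --model-name SZTAKI-HLT/hubert-base-cc \
  --parameters-path ./parameters.json \
  --tasks hucola husst hucopa \
  --eval-test true \
  --report-to wandb \
  --experiment-name hulu-finetune \
  --run-name hubert-base-run1 \
  --save-results-path ./results/
```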

### Submitting the Results to the HuLU Leaderboard

The official [HuLU page](https://hulu.nytud.hu/tasks) allows you to validate the results of your training procedure. The output path is determined by the `--save-results-path` option; if it is not provided, a `results` directory is created in the working directory. This folder contains the model’s predictions for each task, and these files are required for submission.

Navigate to https://hulu.nytud.hu/ and sign up. After successful authorization, select `Submission`, fill in the form, and upload the newly created predictions for the given task, one at a time.

## Citation

If you use these resources or any part of their documentation, please cite:

Noémi Ligeti-Nagy, Gergő Ferenczi, Enikő Héja, László János Laki, Noémi Vadász, Zijian Győző Yang, and Tamás Váradi. 2024. HuLU: Hungarian Language Understanding Benchmark Kit. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 8360–8371, Torino, Italia. ELRA and ICCL.

```bibtex
@inproceedings{ligeti-nagy-etal-2024-hulu-hungarian,
    title = "{H}u{LU}: {H}ungarian Language Understanding Benchmark Kit",
    author = "Ligeti-Nagy, Noémi  and
      Ferenczi, Gergő  and
      Héja, Enikő  and
      Laki, László János  and
      Vadász, Noémi  and
      Yang, Zijian Győző  and
      Váradi, Tamás",
    editor = "Calzolari, Nicoletta  and
      Kan, Min-Yen  and
      Hoste, Veronique  and
      Lenci, Alessandro  and
      Sakti, Sakriani  and
      Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.733",
    pages = "8360--8371",
}
```

and to any other references listed in the readme files of the individual corpora.

If you use the evaluation scripts / library, please cite:

```bibtex
@inproceedings{hatvani2024hulu,
  author    = {Péter Hatvani and Kristóf Varga and Zijian Győző Yang},
  title     = {Evaluation Library for the Hungarian Language Understanding Benchmark (HuLU)},
  booktitle = {Proceedings of the 21st Hungarian Computational Linguistics Conference},
  year      = {2024},
  address   = {Hungary},
  publisher = {Szegedi Tudományegyetem TTIK, Informatikai Intézet},
  note      = {Affiliations: PPKE Doctoral School of Linguistics, HUN-REN Hungarian Research Center for Linguistics},
  email     = {hatvani9823@gmail.com, varga.kristof@nytud.hun-ren.hu, yang.zijian.gyozo@nytud.hun-ren.hu}
}
```

The evaluation scripts (only!) are licensed under the Apache License.

            
