<div align="center">
# WebLINX
| [**🤗Dataset**](https://huggingface.co/datasets/McGill-NLP/WebLINX) | [**📄Paper**](https://arxiv.org/abs/2402.05930) | [**🌐Website**](https://mcgill-nlp.github.io/weblinx) | [**📓Colab**](https://colab.research.google.com/github/McGill-NLP/weblinx/blob/main/examples/WebLINX_Colab_Notebook.ipynb) |
| :--: | :--: | :--: | :--: |
| [**🤖Models**](https://huggingface.co/collections/McGill-NLP/weblinx-models-65c57d4afeeb282d1dcf8434) | [**💻Explorer**](https://huggingface.co/spaces/McGill-NLP/weblinx-explorer) | [**🐦Tweets**](https://twitter.com/sivareddyg/status/1755799365031965140) | [**🏆Leaderboard**](https://paperswithcode.com/sota/conversational-web-navigation-on-weblinx) |
<br>
**[WebLINX: Real-World Website Navigation with Multi-Turn Dialogue](https://mcgill-nlp.github.io/weblinx)**\
*[Xing Han Lù*](https://xinghanlu.com), [Zdeněk Kasner*](https://kasnerz.github.io/), [Siva Reddy](https://sivareddy.in)*\
_\*Equal contribution_\
**ICML 2024 (Spotlight)**
<img src="https://github.com/McGill-NLP/weblinx/raw/main/docs/assets/images/webnav.demo.svg" width="80%" alt="Sample conversation between a user and an agent" />
</div>
## Intro
Welcome to `WebLINX`'s official repository! In addition to the code used to train [the models](https://huggingface.co/collections/McGill-NLP/weblinx-models-65c57d4afeeb282d1dcf8434) reported in our [WebLINX paper](https://arxiv.org/abs/2402.05930), we provide a comprehensive Python library, `weblinx`, to help you work with the [WebLINX dataset](https://huggingface.co/datasets/McGill-NLP/WebLINX).
If you want to get started with `weblinx`, please check out the following places:
| | | |
| :---: | :---: | --- |
| 🌐 | [Website](https://mcgill-nlp.github.io/weblinx) | If you want a quick overview of the project, this is the best place to start.|
| 📓 | [Colab](https://colab.research.google.com/github/McGill-NLP/weblinx/blob/main/examples/WebLINX_Colab_Notebook.ipynb) | Eager to try it out? Start by running this Colab notebook!|
| 🗄️ | [Docs](https://mcgill-nlp.github.io/weblinx/docs/) | You can find quickstart instructions, the official user guide, and all relevant API specifications in the docs. |
| 📄 | [Paper](https://arxiv.org/abs/2402.05930) | For more depth, read our paper, which gives a comprehensive description of the project and reports the relevant results.|
| 🤗 | [Dataset](https://huggingface.co/datasets/McGill-NLP/WebLINX) | The official dataset page, where you can download the preprocessed dataset and follow the instructions to get started.|
| | | |
*If you want to learn more about the codebase itself, please keep on reading!*
## Installation
`weblinx` requires Python 3.8 or later.
```bash
# Install the base package
pip install weblinx
# Install all optional dependencies
# (quoting the extras keeps shells such as zsh from expanding the brackets)
pip install "weblinx[all]"
# Install specific dependencies for...
# ...processing HTML 🖥️
pip install "weblinx[processing]"
# ...video processing 📽️
pip install "weblinx[video]"
# ...evaluating models 🔬
pip install "weblinx[eval]"
# ...development of this library 🛠️
pip install "weblinx[dev]"
```
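To confirm from Python that the package is available, you can query the installed distribution with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of `weblinx`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "weblinx") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper for illustration; it only uses importlib.metadata.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"weblinx {v}" if v else "weblinx is not installed")
```

`importlib.metadata` ships with Python 3.8+, so the check adds no dependencies.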
## Structure
The `weblinx` library is organized into the following modules:
| Module | Description |
| --- | --- |
| `weblinx` | Its `__init__.py` provides high-level abstractions for working with the dataset in a Pythonic way. For example, you can use `weblinx.Demonstration` to manipulate a demonstration as a whole, `weblinx.Replay` to work with its finer-grained details, such as iterating over its turns, or `weblinx.Turn` to focus on a specific turn. See the documentation for details. |
| `weblinx.eval` | Code for evaluating action models trained on WebLINX. It provides `import`able functions and metrics, and can also be run from the command line. |
| `weblinx.processing` | Code for processing the inputs and outputs of the models; it is used extensively in the modeling code. |
| `weblinx.utils` | Miscellaneous utility functions used across the codebase. |
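To make the idea of "a replay made of turns" concrete, here is a self-contained toy model of that part of the hierarchy. The classes `ToyTurn` and `ToyReplay`, their fields, and the `with_intents` method are hypothetical stand-ins chosen for illustration; they are not the `weblinx` API, whose actual classes and methods are documented in the docs.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


# Hypothetical toy classes, for illustration only: they do NOT reproduce the
# attributes or methods of weblinx.Turn or weblinx.Replay.
@dataclass
class ToyTurn:
    index: int
    intent: str                    # e.g. "say", "load", "click"
    speaker: Optional[str] = None  # only set for chat turns


@dataclass
class ToyReplay:
    turns: List[ToyTurn] = field(default_factory=list)

    def __iter__(self) -> Iterator[ToyTurn]:
        # Iterating over a replay yields its turns in order.
        return iter(self.turns)

    def __len__(self) -> int:
        return len(self.turns)

    def __getitem__(self, i: int) -> ToyTurn:
        # Indexing selects a single turn to focus on.
        return self.turns[i]

    def with_intents(self, *intents: str) -> List[ToyTurn]:
        # Keep only the turns whose intent is one of `intents`.
        wanted = set(intents)
        return [turn for turn in self if turn.intent in wanted]


replay = ToyReplay([
    ToyTurn(0, "say", speaker="instructor"),
    ToyTurn(1, "load"),
    ToyTurn(2, "click"),
    ToyTurn(3, "say", speaker="navigator"),
])
actions = replay.with_intents("load", "click")
print([turn.index for turn in actions])  # [1, 2]
```

Keeping turns as plain records and putting selection logic on the container is one simple way to separate "what happened at one step" from "how to slice a whole session".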
## Modeling
The `modeling/` directory at the root of this repository contains the code for processing, training and evaluating the models reported in the paper (DMR, LLaMA, MindAct, Pix2Act, Flan-T5). It is not part of the `weblinx` library, which focuses on data processing and evaluation, but it requires `weblinx` as a dependency. To use it, install `weblinx` and clone this repository; we recommend editing the files in `modeling/` directly to suit your needs:
```bash
# First, install the base package
pip install weblinx
# Then, clone this repo
git clone https://github.com/McGill-NLP/weblinx
cd weblinx/modeling
```
For the rest of the instructions, please take a look at the [modeling README](./modeling/README.md).
## Evaluation
To install packages necessary for evaluation, run:
```bash
pip install "weblinx[eval]"
```
You can now access the evaluation module by importing it in Python:
```python
import weblinx.eval
```
Use `weblinx.eval.metrics` for the evaluation metrics and the functions defined in `weblinx.eval` (in its `__init__.py`) for other evaluation utilities. You may also want to look at `weblinx.processing.outputs` to see how model outputs are processed for evaluation.
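Before model outputs can be scored, they generally have to be parsed into structured actions. The sketch below assumes an action string of the form `intent(key="value", ...)`, such as `click(uid="a1b2")`; that format, the regular expressions, and the helper names `parse_action` and `intent_match` are assumptions made for illustration. For the library's actual parsing logic, refer to `weblinx.processing.outputs`.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed (hypothetical) format: intent(key="value", key2="value2")
_ACTION_RE = re.compile(r'^\s*(\w+)\((.*)\)\s*$', re.DOTALL)
# key="value" pairs; escaped quotes inside values are kept as-is (not unescaped)
_ARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_action(text: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Split an action string into (intent, {arg: value}); None if malformed."""
    m = _ACTION_RE.match(text)
    if m is None:
        return None
    intent, arg_str = m.group(1), m.group(2)
    args = {key: value for key, value in _ARG_RE.findall(arg_str)}
    return intent, args


def intent_match(pred: str, ref: str) -> float:
    """1.0 if both strings parse and share the same intent, else 0.0."""
    p, r = parse_action(pred), parse_action(ref)
    if p is None or r is None:
        return 0.0
    return float(p[0] == r[0])


print(parse_action('click(uid="a1b2")'))  # ('click', {'uid': 'a1b2'})
print(intent_match('say(speaker="navigator", utterance="Done!")',
                   'say(speaker="navigator", utterance="Sure")'))  # 1.0
```

Returning `None` for malformed strings, rather than raising, lets an evaluation loop score unparseable predictions as misses instead of crashing.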
The automatic evaluation can also be run from the command line; to see the available options, run:
```bash
python -m weblinx.eval --help
```
For more examples of how to use `weblinx.eval`, take a look at the [modeling README](./modeling/README.md).
> Note: We are still working on the code for `weblinx.eval` and `weblinx.processing.outputs`. If you have any questions or would like to contribute docs, please feel free to open an issue or a pull request.
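As an illustration of what an element-level metric can look like, one common way to compare a predicted element with the reference element is the intersection over union (IoU) of their bounding boxes. This is a minimal sketch assuming boxes given as `(x, y, width, height)`; the function name `iou` and the box layout are assumptions, and this is not the code in `weblinx.eval.metrics`.

```python
from typing import Tuple

# Assumed box layout: (x, y, width, height), e.g. in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes, between 0.0 and 1.0."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Size of the overlapping rectangle, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 0, 10, 10)))  # 0.3333333333333333
```

Identical boxes score 1.0 and non-overlapping boxes score 0.0, so the value can be averaged over turns like any other per-turn score.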
## Citations
If you use this library, please cite our work using the following:
```bibtex
@misc{lù2024weblinx,
title={WebLINX: Real-World Website Navigation with Multi-Turn Dialogue},
author={Xing Han Lù and Zdeněk Kasner and Siva Reddy},
year={2024},
eprint={2402.05930},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
## License
This project's license can be found at [LICENSE](./LICENSE). Please note that the data in `tests/data` follows the license of the official dataset, not the license of this repository. The official dataset's license can be found on the [official dataset page](https://huggingface.co/datasets/McGill-NLP/WebLINX). Models trained with this repository may be released under different licenses; please refer to their respective model cards.