weblinx


| Field | Value |
| --- | --- |
| Name | weblinx |
| Version | 0.3.2 |
| Home page | https://github.com/McGill-NLP/weblinx |
| Summary | The official weblinx library |
| Upload time | 2024-10-01 23:55:58 |
| Author | McGill NLP |
| Requires Python | >=3.8 |

<div align="center">

# WebLINX

| [**🤗Dataset**](https://huggingface.co/datasets/McGill-NLP/WebLINX) | [**📄Paper**](https://arxiv.org/abs/2402.05930) | [**🌐Website**](https://mcgill-nlp.github.io/weblinx) | [**📓Colab**](https://colab.research.google.com/github/McGill-NLP/weblinx/blob/main/examples/WebLINX_Colab_Notebook.ipynb) |
| :--: | :--: | :--: | :--: |
| [**🤖Models**](https://huggingface.co/collections/McGill-NLP/weblinx-models-65c57d4afeeb282d1dcf8434) | [**💻Explorer**](https://huggingface.co/spaces/McGill-NLP/weblinx-explorer) | [**🐦Tweets**](https://twitter.com/sivareddyg/status/1755799365031965140) | [**🏆Leaderboard**](https://paperswithcode.com/sota/conversational-web-navigation-on-weblinx) |

<br>

**[WebLINX: Real-World Website Navigation with Multi-Turn Dialogue](https://mcgill-nlp.github.io/weblinx)**\
*[Xing Han Lù*](https://xinghanlu.com), [Zdeněk Kasner*](https://kasnerz.github.io/), [Siva Reddy](https://sivareddy.in)*\
_\*Equal contribution_\
**ICML 2024 (Spotlight)**

<img src="https://github.com/McGill-NLP/weblinx/raw/main/docs/assets/images/webnav.demo.svg" width="80%" alt="Sample conversation between a user and an agent" />

</div>

## Intro

Welcome to `WebLINX`'s official repository! In addition to providing code used to train [the models](https://huggingface.co/collections/McGill-NLP/weblinx-models-65c57d4afeeb282d1dcf8434) reported in our [WebLINX paper](https://arxiv.org/abs/2402.05930), we also provide a comprehensive Python library (aka API) to help you work with the [WebLINX dataset](https://huggingface.co/datasets/McGill-NLP/WebLINX). 

If you want to get started with `weblinx`, please check out the following places:

| | | |
| :---: | :---: | --- |
| 🌐 | [Website](https://mcgill-nlp.github.io/weblinx) | If you want a quick overview of the project, this is the best place to start.|
| 📓 | [Colab](https://colab.research.google.com/github/McGill-NLP/weblinx/blob/main/examples/WebLINX_Colab_Notebook.ipynb) | Eager to try it out? Start by running this Colab notebook!|
| 🗄️ | [Docs](https://mcgill-nlp.github.io/weblinx/docs/) | You can find quickstart instructions, the official user guide, and all relevant API specifications in the docs. |
| 📄 | [Paper](https://arxiv.org/abs/2402.05930) | If you want to go more in-depth, please read our paper, which provides a comprehensive description of the project and reports the relevant results.|
| 🤗 | [Dataset](https://huggingface.co/datasets/McGill-NLP/WebLINX) | The official dataset page, where you can download the preprocessed dataset and follow the instructions to get started.|
| | | |

*If you want to learn more about the codebase itself, please keep on reading!*

## Installation

```bash
# Install the base package
pip install weblinx

# Quoting the extras keeps shells such as zsh from treating the brackets as a glob pattern
# Install all dependencies
pip install "weblinx[all]"

# Install specific dependencies for...
# ...processing HTML 🖥️
pip install "weblinx[processing]"
# ...video processing 📽️
pip install "weblinx[video]"
# ...evaluating models 🔬
pip install "weblinx[eval]"
# ...development of this library 🛠️
pip install "weblinx[dev]"
```
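
After installing, it can help to confirm what actually landed in your environment. The following is a minimal sketch using only the Python standard library; the helper names `installed_version` and `is_importable` are our own and are not part of `weblinx`. Note that it only checks whether a module can be located, not whether every optional dependency of an extra is present.

```python
import importlib.util
from importlib import metadata


def installed_version(dist_name="weblinx"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_importable(module_name):
    """Return True if the module can be located by the import system."""
    try:
        # For dotted names, find_spec imports the parent package first and
        # raises ModuleNotFoundError when that parent is missing.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False
```

For example, `installed_version()` returns `None` when `weblinx` is not installed, and `is_importable("weblinx.eval")` reports whether the evaluation module can be found.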

## Structure

This repository is structured in the following way:

| Module | Description |
| --- | --- |
| `weblinx` | The `__init__.py` provides abstractions that offer a Pythonic experience when working with the dataset. For example, you can use `weblinx.Demonstration` to manipulate a demonstration at a high level, `weblinx.Replay` to focus on the fine-grained details of a demonstration, including iterating over its turns, or `weblinx.Turn` to focus on a specific turn. All relevant information is included in the documentation! |
| `weblinx.eval` | Code for evaluating action models trained with WebLINX. It provides `import`able functions and metrics, and can also be run from the command line. |
| `weblinx.processing` | Code for processing the various inputs and outputs used by the models. It is used extensively in the models' processing code. |
| `weblinx.utils` | Miscellaneous utility functions used across the codebase. |
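
As a rough illustration of how these abstractions fit together, here is a hedged sketch that counts the turns of one demonstration by type. The function name `count_turn_types` is hypothetical, and the call signatures (`Demonstration(name, base_dir=...)`, `Replay.from_demonstration(...)`, and a `type` attribute on each turn) are assumptions based on our reading of the docs; please verify them against the API reference before relying on them.

```python
from collections import Counter
from pathlib import Path


def count_turn_types(demo_name, base_dir):
    """Count the turns of one demonstration, grouped by their `type` attribute."""
    # Imported lazily so this file can be loaded without weblinx installed.
    import weblinx as wl

    # Assumption: Demonstration takes a name plus the directory of demonstrations.
    demo = wl.Demonstration(demo_name, base_dir=Path(base_dir))
    # Assumption: Replay offers a `from_demonstration` constructor.
    replay = wl.Replay.from_demonstration(demo)
    # Iterating over a Replay yields its turns; fall back to "unknown" if a
    # turn has no `type` attribute.
    return Counter(getattr(turn, "type", "unknown") for turn in replay)
```

Given a local copy of the dataset, you would call it with the name of a demonstration and the path to your demonstrations directory.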

## Modeling

Our repo-level `modeling/` directory contains code for processing, training, and evaluating the models reported in the paper (DMR, LLaMA, MindAct, Pix2Act, Flan-T5). It is separate from the `weblinx` library, which focuses on data processing and evaluation, but requires the library as a dependency. To use it, clone this repository; we recommend editing the files in `modeling/` directly to suit your needs. You can set up the modeling code by running:

```bash
# First, install the base package
pip install weblinx

# Then, clone this repo
git clone https://github.com/McGill-NLP/weblinx
cd weblinx/modeling
```

For the rest of the instructions, please take a look at the [modeling README](./modeling/README.md).

## Evaluation

To install packages necessary for evaluation, run:

```bash
pip install "weblinx[eval]"
```

You can now access the evaluation module by importing in Python:

```python
import weblinx.eval
```

Use `weblinx.eval.metrics` for evaluation metrics and `weblinx.eval.__init__` for useful evaluation-related functions. You may also find it useful to look at `weblinx.processing.outputs` to see how the outputs of the model are used for evaluation.

To see the available options for running the automatic evaluation from the command line, use the following command:

```bash
python -m weblinx.eval --help
```
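
If you prefer to launch the command from a Python script, the sketch below wraps the same invocation with the standard library. The helper names are our own; using `sys.executable` is a design choice that keeps the call in the interpreter (and virtual environment) where `weblinx` is installed.

```python
import subprocess
import sys


def eval_help_command(python=sys.executable):
    """Build the argument list for `python -m weblinx.eval --help`."""
    return [python, "-m", "weblinx.eval", "--help"]


def run_eval_help():
    """Run the help command; a non-zero exit code does not raise."""
    return subprocess.run(
        eval_help_command(), capture_output=True, text=True, check=False
    )
```

Calling `run_eval_help()` returns a `CompletedProcess` whose `stdout` holds the help text, or whose `stderr` explains what went wrong (for example, if the package is not installed).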

For more examples on how to use `weblinx.eval`, take a look at the [modeling README](./modeling/README.md).

> Note: We are still working on the code for `weblinx.eval` and `weblinx.processing.outputs`. If you have any questions or would like to contribute docs, please feel free to open an issue or a pull request.

## Citations

If you use this library, please cite our work using the following:

```bibtex
@misc{lù2024weblinx,
      title={WebLINX: Real-World Website Navigation with Multi-Turn Dialogue}, 
      author={Xing Han Lù and Zdeněk Kasner and Siva Reddy},
      year={2024},
      eprint={2402.05930},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## License

This project's license can be found at [LICENSE](./LICENSE). Please note that the data in `tests/data` follows the license of the official dataset, not the license of this repository. The official dataset's license can be found on the [official dataset page](https://huggingface.co/datasets/McGill-NLP/WebLINX). The licenses of the models trained using this repo might also differ; please find them in the respective model cards.

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/McGill-NLP/weblinx",
    "name": "weblinx",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": "McGill NLP",
    "author_email": "weblinx@googlegroups.com",
    "download_url": "https://files.pythonhosted.org/packages/4a/c6/5d65086c948f8b0b874f5d94d33e87d7c698049c6aba37124a67aac5d7f2/weblinx-0.3.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "The official weblinx library",
    "version": "0.3.2",
    "project_urls": {
        "Homepage": "https://github.com/McGill-NLP/weblinx"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ea3c940850b54ea8b5927e0b794fda6839a7cf2cc542c03af2388a55f0483d7b",
                "md5": "019379d9b62cd18740b9f3924f1c4066",
                "sha256": "9ab8de1c631617827955debaeb76864b8b3122d230185af1f9c30f5c793e7213"
            },
            "downloads": -1,
            "filename": "weblinx-0.3.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "019379d9b62cd18740b9f3924f1c4066",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 82905,
            "upload_time": "2024-10-01T23:55:57",
            "upload_time_iso_8601": "2024-10-01T23:55:57.150357Z",
            "url": "https://files.pythonhosted.org/packages/ea/3c/940850b54ea8b5927e0b794fda6839a7cf2cc542c03af2388a55f0483d7b/weblinx-0.3.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4ac65d65086c948f8b0b874f5d94d33e87d7c698049c6aba37124a67aac5d7f2",
                "md5": "0bb624c4d318c4647dddc00244b416bf",
                "sha256": "259946c2b08cf50b48929fdd1c17f09ff5808b7d53528d728d25b583b8a06c85"
            },
            "downloads": -1,
            "filename": "weblinx-0.3.2.tar.gz",
            "has_sig": false,
            "md5_digest": "0bb624c4d318c4647dddc00244b416bf",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 80931,
            "upload_time": "2024-10-01T23:55:58",
            "upload_time_iso_8601": "2024-10-01T23:55:58.618675Z",
            "url": "https://files.pythonhosted.org/packages/4a/c6/5d65086c948f8b0b874f5d94d33e87d7c698049c6aba37124a67aac5d7f2/weblinx-0.3.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-01 23:55:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "McGill-NLP",
    "github_project": "weblinx",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "weblinx"
}
        