forte


| Field | Value |
|:--|:--|
| Name | forte |
| Version | 0.0.1a3 |
| Home page | https://github.com/asyml/forte |
| Summary | Forte is extensible framework for building composable and modularized NLP workflows. |
| Upload time | 2021-01-13 19:37:19 |
| Docs URL | None |
| License | Apache License Version 2.0 |
| Requirements | No requirements were recorded. |

<div align="center">
   <img src="https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/logo_h.png"><br><br>
</div>

-----------------

[![Build Status](https://travis-ci.org/asyml/forte.svg?branch=master)](https://travis-ci.org/asyml/forte)
[![codecov](https://codecov.io/gh/asyml/forte/branch/master/graph/badge.svg)](https://codecov.io/gh/asyml/forte)
[![Documentation Status](https://readthedocs.org/projects/asyml-forte/badge/?version=latest)](https://asyml-forte.readthedocs.io/en/latest/?badge=latest)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/asyml/forte/blob/master/LICENSE)
[![Chat](http://img.shields.io/badge/gitter.im-asyml/forte-blue.svg)](https://gitter.im/asyml/community)


**Forte** is a toolkit for building Natural Language Processing pipelines, featuring cross-task
interaction, adaptable data-model interfaces and composable pipelines.
Forte was originally developed at CMU and is actively contributed to by [Petuum](https://petuum.com/)
in collaboration with other institutes.
This project is part of the [CASL Open Source](http://casl-project.ai/) family.

Forte provides a platform to assemble
state-of-the-art NLP and ML technologies in a highly composable fashion, covering a wide
spectrum of tasks, from Information Retrieval and Natural Language Understanding to Natural
Language Generation.

With Forte, it is simple to build an integrated system that can search documents,
analyze them, extract information and generate language all in one place. This allows developers
to fully utilize the strengths of individual modules and combine the results from each step, and it enables
the system to make fully informed decisions at the end of the pipeline.

Forte not only makes it easy to integrate with arbitrary third-party tools (check out these [examples](./examples)!),
but also offers a collection of deep learning modules via Texar and
a convenient model-data interface for casting tasks to models.

## Core Design Principles

The core design principle of Forte is the abstraction of NLP concepts and machine learning models. It
not only separates data, models and tasks but also enables interactions between different components of
the pipeline. Based on this principle, we make Forte:

* **Composable**: Forte helps users decompose a problem into *data*, *models* and *tasks*.
The tasks can be further divided into sub-tasks. A complex use case
can be solved by composing heterogeneous modules via straightforward Python APIs or declarative
configuration files. The components (e.g. models or tasks) in the pipeline can be flexibly
swapped in and out, as long as the API contracts are matched. This approach greatly improves module
reusability, enables fast development and enhances the flexibility of using libraries.

* **Generalizable and Extensible**: Forte not only generalizes well to a wide
range of NLP tasks, but also extends easily to new tasks or new domains. In particular, Forte
provides the *Ontology* system that helps users define types according to their specific tasks.
Users can declaratively specify types through simple JSON files, and our Code Generation tool
will automatically generate ready-to-use Python files for their project. Check out our
[Ontology Generation documentation](./docs/ontology_generation.md) for more details.

* **Universal Data Flow**: Forte enables a universal data flow that moves data seamlessly between
different steps. Central to Forte's composable architecture, a transparent data flow facilitates flexible
process interventions and simple pipeline management. Because it adapts to generic data formats, Forte is
well suited to data inspection, component swapping and result sharing.
This is particularly helpful during team collaboration!
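
The composability and shared data flow described above can be illustrated with a minimal, library-free sketch. Nothing here is Forte's API: `ToyPack`, `run_pipeline` and the two step functions are hypothetical names, chosen only to show how steps that read from and write to one shared container can be swapped as long as they honor the same contract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyPack:
    """Hypothetical stand-in for a data pack: raw text plus results keyed by name."""
    text: str
    annotations: Dict[str, List[str]] = field(default_factory=dict)


# Contract for every step: read from the shared pack, write results back into it.
Step = Callable[[ToyPack], None]


def whitespace_tokenizer(pack: ToyPack) -> None:
    pack.annotations["tokens"] = pack.text.split()


def capitalized_tagger(pack: ToyPack) -> None:
    # Depends only on the "tokens" entry, not on which step produced it,
    # so any tokenizer honoring the contract can be swapped in.
    tokens = pack.annotations["tokens"]
    pack.annotations["tags"] = ["CAP" if t[:1].isupper() else "O" for t in tokens]


def run_pipeline(steps: List[Step], text: str) -> ToyPack:
    pack = ToyPack(text)
    for step in steps:
        step(pack)
    return pack


result = run_pipeline([whitespace_tokenizer, capitalized_tagger],
                      "Forte was developed at CMU")
print(result.annotations["tags"])
```

Because every intermediate result stays in the pack, each step's output can be inspected after the run, which is the property the "Universal Data Flow" bullet emphasizes.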

-----------------
| ![forte_arch.jpg](https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/forte_arch.png) | 
|:--:| 
| *A high-level architecture of Forte showing how ontology and entries work with the pipeline.* |
-----------------
| ![forte_results.jpg](https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/forte_results.png) | 
|:--:| 
| *Forte stores results in data packs and uses the ontology to represent task logic.* |
-----------------

## Package Overview

<table>
<tr>
    <td><b> forte </b></td>
    <td> an open-source toolkit for NLP  </td>
</tr>
<tr>
    <td><b> forte.data.readers </b></td>
    <td> a data module for reading different formats of text data such as CoNLL, OntoNotes, etc.
    </td>
</tr>
<tr>
    <td><b> forte.processors </b></td>
    <td> a collection of processors for building NLP pipelines </td>
</tr>
<tr>
    <td><b> forte.trainer </b></td>
    <td> a collection of modules for training different NLP tasks </td>
</tr>
<tr>
    <td><b> ft.onto.base_ontology </b></td>
    <td> a module containing basic ontology types such as Token, Sentence, Document, etc. </td>
</tr>
</table>

### Library API Example

A simple code example that runs a Named Entity Recognizer:

```python
import yaml

from forte.pipeline import Pipeline
from forte.data.readers import CoNLL03Reader
from forte.processors import CoNLLNERPredictor
from ft.onto.base_ontology import Token, Sentence
from forte.common.configuration import Config


# Load the data and model settings; `with` closes each file after reading.
with open("config_data.yml", "r") as f:
    config_data = yaml.safe_load(f)
with open("config_model.yml", "r") as f:
    config_model = yaml.safe_load(f)

# Bundle both settings into a single configuration for the predictor.
config = Config({}, default_hparams=None)
config.add_hparam('config_data', config_data)
config.add_hparam('config_model', config_model)


# Assemble the pipeline: a CoNLL-2003 reader followed by the NER predictor.
pl = Pipeline()
pl.set_reader(CoNLL03Reader())
pl.add(CoNLLNERPredictor(), config=config)

pl.initialize()

# Process the test set and print the predicted NER tags sentence by sentence.
for pack in pl.process_dataset(config.config_data.test_path):
    for pred_sentence in pack.get_data(context_type=Sentence, request={Token: {"fields": ["ner"]}}):
        print("============================")
        print(pred_sentence["context"])
        print("The entities are...")
        print(pred_sentence["Token"]["ner"])
        print("============================")

```
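
The example above prints one NER tag per token. As a rough sketch of a possible post-processing step, the helper below groups such per-token tags into entity spans. It rests on two assumptions that the example does not confirm: that the tags follow the BIO scheme common for CoNLL-2003 (`B-PER`, `I-PER`, `O`, ...), and that the token texts are available as a parallel list. `group_entities` is a hypothetical helper, not part of Forte.

```python
from typing import List, Optional, Tuple


def group_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse BIO tags (assumed scheme) into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    current_words: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Emit the entity collected so far, if any.
        if current_words and current_type is not None:
            entities.append((" ".join(current_words), current_type))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            flush()
            current_words, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current_words.append(token)
        else:  # "O" or any unrecognized tag closes the open entity
            flush()
            current_words, current_type = [], None
    flush()
    return entities


print(group_entities(["John", "Smith", "visited", "Paris"],
                     ["B-PER", "I-PER", "O", "B-LOC"]))
```

An `I-` tag whose type differs from the open entity is treated as the start of a new entity, which keeps the helper tolerant of slightly inconsistent predictions.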

Find more examples [here](./examples).

### Download and Installation

To install the released version from PyPI:
```bash
pip install forte
```

To install from source:
```bash
git clone https://github.com/asyml/forte.git
cd forte
pip install .
```

### Getting Started

* [Examples](./examples)
* [Documentation](https://asyml-forte.readthedocs.io/)
* Currently we are working on some interesting [tutorials](https://github.com/asyml/forte/wiki)

### Troubleshooting
1. If you run the `generate_ontology` script and encounter the following error:
    ```
    Traceback (most recent call last):
      File "~/anaconda3/bin/generate_ontology", line 33, in <module>
        sys.exit(load_entry_point('forte', 'console_scripts', 'generate_ontology')())
      File "~/anaconda3/bin/generate_ontology", line 22, in importlib_load_entry_point
        for entry_point in distribution(dist_name).entry_points
      File "~/anaconda3/lib/python3.6/site-packages/importlib_metadata/__init__.py", line 418, in distribution
        return Distribution.from_name(package)
      File "~/anaconda3/lib/python3.6/site-packages/importlib_metadata/__init__.py", line 184, in from_name
        raise PackageNotFoundError(name)
    importlib_metadata.PackageNotFoundError: forte
    ```
    This is likely caused by multiple conflicting installations, such as
    installing both from source and from pip. One way to solve this is to manually
    remove the script `~/anaconda3/bin/generate_ontology` and re-install the package.
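
To check whether Python can find the package metadata that the traceback above complains about, a small diagnostic along these lines may help. It is only a sketch: it uses the standard-library `importlib.metadata` module (Python 3.8+; the traceback shows the `importlib_metadata` backport on Python 3.6), and `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if its metadata is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Same error class as in the traceback: no metadata found for this name.
        return None


# Prints a version string if the metadata is discoverable, otherwise None.
print(installed_version("forte"))
```

If this prints `None` in the environment where the script fails, the installation metadata is missing or shadowed, which is consistent with the conflicting-installation explanation above.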

### Contributing
If you are interested in making enhancements to Forte, please first go over our [Code of Conduct](https://github.com/asyml/forte/blob/master/CODE_OF_CONDUCT.md) and [Contribution Guideline](https://github.com/asyml/forte/blob/master/CONTRIBUTING.md).

### License

[Apache License 2.0](./LICENSE)

### Companies and Universities Supporting Forte
<p float="left">
   <img src="https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/Petuum.png" width="200" align="top">
   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
   <img src="https://asyml.io/assets/institutions/cmu.png" width="200" align="top">
</p>




            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/asyml/forte",
    "name": "forte",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/cf/54/9a4a9fa536f94167b947570d85614b59c66d4770d284eb3c55a9f5caaeaa/forte-0.0.1a3.tar.gz",
    "platform": "any",
    "bugtrack_url": null,
    "license": "Apache License Version 2.0",
    "summary": "Forte is extensible framework for building composable and modularized NLP workflows.",
    "version": "0.0.1a3",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "acaae58720e69aae62169e01b372236e",
                "sha256": "91c240166806feaeab97cf0a0f619cf91e82111a40b5de17d545e9b00905783d"
            },
            "downloads": -1,
            "filename": "forte-0.0.1a3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "acaae58720e69aae62169e01b372236e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 417664,
            "upload_time": "2021-01-13T19:37:16",
            "upload_time_iso_8601": "2021-01-13T19:37:16.433217Z",
            "url": "https://files.pythonhosted.org/packages/7a/68/0cc9ecd477bb90e63a22f9972b24cb98586ce214119d81d6661d88c92a20/forte-0.0.1a3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "397a7a31738a0842c31e461d8e5c8ca2",
                "sha256": "3d8b721d3fad47974f6855057687d3cc91fefc47dc9808055a25a8e186105dcf"
            },
            "downloads": -1,
            "filename": "forte-0.0.1a3.tar.gz",
            "has_sig": false,
            "md5_digest": "397a7a31738a0842c31e461d8e5c8ca2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 269353,
            "upload_time": "2021-01-13T19:37:19",
            "upload_time_iso_8601": "2021-01-13T19:37:19.195016Z",
            "url": "https://files.pythonhosted.org/packages/cf/54/9a4a9fa536f94167b947570d85614b59c66d4770d284eb3c55a9f5caaeaa/forte-0.0.1a3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2021-01-13 19:37:19",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": null,
    "github_project": "asyml",
    "error": "Could not fetch GitHub repository",
    "lcname": "forte"
}
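
The `digests` entries in the raw data above can be used to verify a downloaded file. The following is a minimal sketch using only the standard library; `sha256_of` and `matches_digest` are hypothetical helper names, and the local file name is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA-256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# sha256 of forte-0.0.1a3.tar.gz as listed in the raw data above.
EXPECTED = "3d8b721d3fad47974f6855057687d3cc91fefc47dc9808055a25a8e186105dcf"
archive = Path("forte-0.0.1a3.tar.gz")  # assumed local download location
if archive.exists():
    print(matches_digest(archive, EXPECTED))
```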
        