Trial2Vec

Name	Trial2Vec
Version	0.1.0
home_page	https://github.com/RyanWangZf/Trial2Vec
Summary	Pretrained BERT models for encoding clinical trial documents to compact embeddings.
upload_time	2023-04-21 15:34:43
maintainer
docs_url	None
author	Zifeng Wang
requires_python
license
keywords	clinical trial machine learning data mining information retrieval
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# Trial2Vec
[![PyPI version](https://badge.fury.io/py/trial2vec.svg)](https://badge.fury.io/py/trial2vec)
[![Downloads](https://pepy.tech/badge/trial2vec)](https://pepy.tech/project/trial2vec)
![GitHub Repo stars](https://img.shields.io/github/stars/ryanwangzf/trial2vec)
![GitHub Repo forks](https://img.shields.io/github/forks/ryanwangzf/trial2vec)

Wang, Zifeng and Sun, Jimeng. (2022). Trial2Vec: Zero-Shot Clinical Trial Document Similarity Search using Self-Supervision. Findings of EMNLP'22.

# News
- 12/8/2022: Support `download_embedding`, which obtains only the pretrained embeddings. It saves a lot of GPU/CPU memory! Please refer to this [example](example/demo_download_embedding.ipynb) for detailed use cases.

```python
from trial2vec import download_embedding
t2v_emb = download_embedding()
```

- 10/27/2022: Support `word_vector` and `sentence_vector`!
```python
# sentence vectors
inputs = ['I am a sentence', 'I am another sentence']
outputs = model.sentence_vector(inputs)
# torch.tensor w/ shape [2, 128]
```

```python
# word vectors
inputs = ['I am a sentence', 'I am another sentence abcdefg xyz']
outputs = model.word_vector(inputs)
# {'word_embs': torch.tensor w/ shape [2, max_token, 128], 'mask': torch.tensor w/ shape [2, max_token]}
```
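
The `mask` returned next to `word_embs` presumably marks which token positions hold real tokens and which are padding. The README does not define it, so treat that as an assumption. Under that assumption, a masked mean over the token axis gives one vector per input. The sketch below uses plain nested lists and a tiny embedding size so it runs without PyTorch; `masked_mean` is a hypothetical helper, not part of `trial2vec`.

```python
def masked_mean(word_embs, mask):
    """Average token vectors per sentence, skipping positions where mask == 0.

    word_embs: nested list shaped [batch, max_token, dim]
    mask:      nested list shaped [batch, max_token]; 1 = real token, 0 = padding
               (assumed convention, not confirmed by the README)
    """
    pooled = []
    for tokens, flags in zip(word_embs, mask):
        kept = [vec for vec, flag in zip(tokens, flags) if flag]
        if not kept:
            # No real tokens: fall back to a zero vector of the right size.
            pooled.append([0.0] * len(tokens[0]))
            continue
        dim = len(kept[0])
        pooled.append([sum(vec[i] for vec in kept) / len(kept) for i in range(dim)])
    return pooled


# Toy batch: 2 sentences, max_token = 3, dim = 2 (the real model uses dim = 128).
word_embs = [
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # third position is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
]
mask = [[1, 1, 0], [1, 1, 1]]
sentence_vecs = masked_mean(word_embs, mask)
print(sentence_vecs)
```

With the real tensors, the same idea would be roughly `(word_embs * mask.unsqueeze(-1)).sum(dim=1) / mask.sum(dim=1, keepdim=True).clamp(min=1)`. This is not necessarily how `sentence_vector` pools internally.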


# Usage
Load the pretrained Trial2Vec model in three lines:

```python
from trial2vec import Trial2Vec

model = Trial2Vec()

model.from_pretrained()
```

A Jupyter notebook example is available at https://github.com/RyanWangZf/Trial2Vec/blob/main/example/demo_trial2vec.ipynb.

# How to install
Install the correct `PyTorch` version by referring to https://pytorch.org/get-started/locally/.

Then install `Trial2Vec` by

```bash
# Recommended: the GitHub version is the most up to date, and small bugs are fixed there continuously
pip install git+https://github.com/RyanWangZf/Trial2Vec.git
```

or
```bash
pip install trial2vec
```
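
After installing, it can help to confirm which versions ended up in the environment. The snippet below is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+); `installed_version` is a hypothetical helper name, and how distribution names are matched may differ slightly across Python versions.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Report on the two distributions this README asks you to install.
for name in ("torch", "trial2vec"):
    found = installed_version(name)
    print(f"{name}: {found if found is not None else 'not installed'}")
```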

# Search similar trials
Use `Trial2Vec` to search for similar clinical trials:

```python

# load demo data
from trial2vec import load_demo_data
data = load_demo_data()

# contains trial documents
test_data = {'x': data['x']} 

# make prediction
pred = model.predict(test_data)
```
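
Similarity search over embeddings generally comes down to ranking stored vectors by how close they are to a query vector. The README does not state which measure `predict` uses, so the sketch below assumes cosine similarity and uses made-up 3-dimensional vectors with placeholder IDs. `cosine` and `top_k_similar` are hypothetical helpers, not part of `trial2vec`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(query_id, embeddings, k=2):
    """Rank every other trial by cosine similarity to the query trial."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Placeholder IDs and toy vectors, not real trial embeddings.
embeddings = {
    "trial_a": [1.0, 0.0, 0.0],
    "trial_b": [0.9, 0.1, 0.0],
    "trial_c": [0.0, 1.0, 0.0],
    "trial_d": [-1.0, 0.0, 0.0],
}
ranked = top_k_similar("trial_a", embeddings, k=2)
print(ranked)
```

For more than a few thousand trials, a vectorized version of the same ranking (a matrix product over normalized embeddings) would be the more practical choice.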

# Encode trials

Use `Trial2Vec` to encode clinical trial documents:

```python
# df: a dataframe of trial documents, formatted like `data['x']` in the demo data
test_data = {'x': df}

emb = model.encode(test_data)  # run inference

# or look up the pre-encoded embeddings of known trials by NCT ID
emb = [model[nct_id] for nct_id in test_data['x']['nct_id']]
```
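
The `model[nct_id]` lookup can only return an embedding for trials that were already encoded. How the model reacts to an unknown ID is not stated in this README; the sketch below assumes it raises `KeyError` like a dictionary would. `lookup_embeddings` is a hypothetical helper, and a plain dict stands in for the loaded model.

```python
def lookup_embeddings(store, nct_ids):
    """Split IDs into those with a stored embedding and those without.

    store: any object supporting store[nct_id]; assumed to raise KeyError
           for unknown IDs (a plain dict is used as a stand-in below).
    """
    found, missing = {}, []
    for nct_id in nct_ids:
        try:
            found[nct_id] = store[nct_id]
        except KeyError:
            missing.append(nct_id)
    return found, missing


# Stand-in for a loaded model; the IDs and vectors are made up for illustration.
fake_store = {"NCT00000001": [0.1, 0.2], "NCT00000002": [0.3, 0.4]}
found, missing = lookup_embeddings(fake_store, ["NCT00000001", "NCT99999999"])
print(found, missing)
```

Trials that end up in `missing` are the ones that would need to go through `model.encode` instead.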

# Continue training

Continue training the pretrained model on new trials as follows:

```python
from trial2vec import load_demo_data

# format your own trial documents the same way as `data`
data = load_demo_data()

model.fit(
    {
        'x': data['x'],  # document dataframe
        'fields': data['fields'],  # attribute field columns
        'ctx_fields': data['ctx_fields'],  # context field columns
        'tag': data['tag'],  # nct_id is the unique tag for each trial
    },
    valid_data={
        'x': data['x_val'],
        'y': data['y_val'],
    },
)

# save
model.save_model('./finetuned-trial2vec')

```
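
When preparing your own training data, a quick structural check before calling `fit` can catch a mistyped column name early. The sketch below assumes that `fields` and `ctx_fields` are lists of column names in `x` and that `tag` is a single column name such as `nct_id`. That reading follows the comments in the example above but is not confirmed by the README. `check_fit_inputs` and the column names are hypothetical.

```python
REQUIRED_KEYS = ("x", "fields", "ctx_fields", "tag")


def check_fit_inputs(data, columns):
    """Return a list of problems found in a training dict; empty means it looks fine.

    data:    dict with the keys used in the `model.fit` call above
    columns: column names of data['x'] (e.g. list(df.columns) for a dataframe)
    """
    problems = [f"missing key: {key!r}" for key in REQUIRED_KEYS if key not in data]
    if problems:
        return problems
    # Assumption: 'fields' and 'ctx_fields' are lists of column names and
    # 'tag' is a single column name such as 'nct_id'.
    referenced = list(data["fields"]) + list(data["ctx_fields"]) + [data["tag"]]
    problems += [f"column not in x: {name!r}" for name in referenced if name not in columns]
    return problems


# Hypothetical column names, chosen only for illustration.
columns = ["nct_id", "title", "description"]
good = {"x": None, "fields": ["title"], "ctx_fields": ["description"], "tag": "nct_id"}
bad = {"x": None, "fields": ["title", "phase"], "ctx_fields": [], "tag": "nct_id"}
print(check_fit_inputs(good, columns))
print(check_fit_inputs(bad, columns))
```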

Raw data

{
    "_id": null,
    "home_page": "https://github.com/RyanWangZf/Trial2Vec",
    "name": "Trial2Vec",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "clinical trial,machine learning,data mining,information retrieval",
    "author": "Zifeng Wang",
    "author_email": "zifengw2@illinois.edu",
    "download_url": "https://files.pythonhosted.org/packages/d8/77/3917d553ab6d02b09d8f415977e8f124e8e89b4c6062184fb8bbbead56e6/Trial2Vec-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Pretrained BERT models for encoding clinical trial documents to compact embeddings.",
    "version": "0.1.0",
    "split_keywords": [
        "clinical trial",
        "machine learning",
        "data mining",
        "information retrieval"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0a0d15db1ee739865367d5bf4ea02426becaa489cb0188ffb33606db5fcdb9de",
                "md5": "66e74cc9a6ae84878cc68e7d9ac3f733",
                "sha256": "361c88f5a2fc7e74a5bcafceb409b109ce0ddce615721d412d960e684009a815"
            },
            "downloads": -1,
            "filename": "Trial2Vec-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "66e74cc9a6ae84878cc68e7d9ac3f733",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 27974,
            "upload_time": "2023-04-21T15:34:39",
            "upload_time_iso_8601": "2023-04-21T15:34:39.822147Z",
            "url": "https://files.pythonhosted.org/packages/0a/0d/15db1ee739865367d5bf4ea02426becaa489cb0188ffb33606db5fcdb9de/Trial2Vec-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d8773917d553ab6d02b09d8f415977e8f124e8e89b4c6062184fb8bbbead56e6",
                "md5": "0ea10528fa8ca855c326ee5664bbcd72",
                "sha256": "c1341bd852c598a6a02aefa43e38f3e3f80a17299469da7c9c57174979b927e3"
            },
            "downloads": -1,
            "filename": "Trial2Vec-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "0ea10528fa8ca855c326ee5664bbcd72",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 24307,
            "upload_time": "2023-04-21T15:34:43",
            "upload_time_iso_8601": "2023-04-21T15:34:43.439483Z",
            "url": "https://files.pythonhosted.org/packages/d8/77/3917d553ab6d02b09d8f415977e8f124e8e89b4c6062184fb8bbbead56e6/Trial2Vec-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-04-21 15:34:43",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "RyanWangZf",
    "github_project": "Trial2Vec",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "trial2vec"
}
