# Trial2Vec
[![PyPI version](https://badge.fury.io/py/trial2vec.svg)](https://badge.fury.io/py/trial2vec)
[![Downloads](https://pepy.tech/badge/trial2vec)](https://pepy.tech/project/trial2vec)
![GitHub Repo stars](https://img.shields.io/github/stars/ryanwangzf/trial2vec)
![GitHub Repo forks](https://img.shields.io/github/forks/ryanwangzf/trial2vec)
Wang, Zifeng and Sun, Jimeng. (2022). Trial2Vec: Zero-Shot Clinical Trial Document Similarity Search using Self-Supervision. Findings of EMNLP'22.
# News
- 12/8/2022: Added `download_embedding`, which fetches only the pretrained embeddings without loading the full model. It saves a lot of GPU/CPU memory! Please refer to this [example](example/demo_download_embedding.ipynb) for detailed use cases.
```python
from trial2vec import download_embedding
t2v_emb = download_embedding()
```
- 10/27/2022: Added support for `word_vector` and `sentence_vector`!
```python
# sentence vectors
inputs = ['I am a sentence', 'I am another sentence']
outputs = model.sentence_vector(inputs)
# torch.tensor w/ shape [2, 128]
```
```python
# word vectors
inputs = ['I am a sentence', 'I am another sentence abcdefg xyz']
outputs = model.word_vector(inputs)
# {'word_embs': torch.tensor w/ shape [2, max_token, 128], 'mask': torch.tensor w/ shape [2, max_token]}
```
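The `mask` returned next to `word_embs` matters whenever the token vectors are aggregated, because padded positions should not contribute. The sketch below shows masked mean pooling on plain Python lists that stand in for the tensors. It assumes that a mask value of 1 marks a real token and 0 marks padding, which the README does not state, and the helper `masked_mean_pool` is hypothetical and not part of `trial2vec`.

```python
def masked_mean_pool(word_embs, mask):
    """Average the token vectors of each sentence, skipping positions where mask is 0."""
    pooled = []
    for token_vecs, token_mask in zip(word_embs, mask):
        dim = len(token_vecs[0])
        total = [0.0] * dim
        count = 0
        for vec, keep in zip(token_vecs, token_mask):
            if keep:  # assumption: 1 = real token, 0 = padding
                count += 1
                for i, value in enumerate(vec):
                    total[i] += value
        # Guard against an all-padding row to avoid dividing by zero.
        pooled.append([t / count for t in total] if count else total)
    return pooled


# Toy batch: 2 sentences, max_token = 3, embedding size 2 (the real size is 128).
word_embs = [
    [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]],  # last position is padding
    [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]],
]
mask = [[1, 1, 0], [1, 1, 1]]

pooled = masked_mean_pool(word_embs, mask)
print(pooled)
```

With real outputs the same idea applies to the `word_embs` and `mask` tensors directly; the list version here only keeps the example dependency-free.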
# Usage
Get the pretrained Trial2Vec model in three lines:
```python
from trial2vec import Trial2Vec
model = Trial2Vec()
model.from_pretrained()
```
A Jupyter notebook example is available at https://github.com/RyanWangZf/Trial2Vec/blob/main/example/demo_trial2vec.ipynb.
# How to install
Install the correct `PyTorch` version by referring to https://pytorch.org/get-started/locally/.
Then install `Trial2Vec` by
```bash
# Recommended: the GitHub version is the most up to date and receives small bug fixes first
pip install git+https://github.com/RyanWangZf/Trial2Vec.git
```
or
```bash
pip install trial2vec
```
# Search similar trials
Use `Trial2Vec` to search for similar clinical trials:
```python
# load demo data
from trial2vec import load_demo_data
data = load_demo_data()
# contains trial documents
test_data = {'x': data['x']}
# make prediction
pred = model.predict(test_data)
```
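To make the idea of embedding-based similarity search concrete, here is a minimal, dependency-free sketch that ranks trials by cosine similarity to a query trial. It is not the implementation behind `model.predict`: the choice of cosine similarity, the helpers `cosine_similarity` and `top_k_similar`, and the toy IDs and 2-dimensional vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_id, embeddings, k=2):
    """Return the k (trial_id, score) pairs most similar to the query trial."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id  # do not return the query itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical trial IDs with toy 2-d embeddings (real embeddings are 128-d).
embeddings = {
    "NCT_A": [1.0, 0.0],
    "NCT_B": [0.9, 0.1],
    "NCT_C": [0.0, 1.0],
    "NCT_D": [-1.0, 0.0],
}

ranked = top_k_similar("NCT_A", embeddings, k=2)
print(ranked)
```

For large collections a vectorized or approximate nearest-neighbor search would be the usual choice; the loop above only illustrates the ranking step.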
# Encode trials
Use `Trial2Vec` to encode clinical trial documents:
```python
# `df` is a dataframe of trial documents, e.g. `data['x']` from `load_demo_data()`
test_data = {'x': df}
emb = model.encode(test_data) # make inference
# or just look up the pre-encoded trial documents by their NCT IDs
emb = [model[nct_id] for nct_id in test_data['x']['nct_id']]
```
# Continue training
You can continue training the pretrained model on new trials as follows:
```python
# format your trial documents the same way as `data`
data = load_demo_data()
model.fit(
    {
        'x': data['x'],                    # document dataframe
        'fields': data['fields'],          # attribute field columns
        'ctx_fields': data['ctx_fields'],  # context field columns
        'tag': data['tag'],                # nct_id is the unique tag for each trial
    },
    valid_data={
        'x': data['x_val'],
        'y': data['y_val'],
    },
)
# save
model.save_model('./finetuned-trial2vec')
```