harmonydata


* Name: harmonydata
* Version: 1.0.1
* Home page: None
* Summary: Harmony Tool for Retrospective Data Harmonisation
* Upload time: 2024-11-26 15:10:03
* Maintainer: None
* Docs URL: None
* Author: None
* Requires Python: `<=3.13.0,>=3.6`
* License: MIT License. Copyright (c) 2023 Ulster University. Information at: https://harmonydata.ac.uk (maintainer: Thomas Wood, https://fastdatascience.com/harmony-wellcome-data-prize/) Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
* Keywords: harmony, harmonisation, harmonization, harmonise
![The Harmony Project logo](https://raw.githubusercontent.com/harmonydata/brand/main/Logo/PNG/%D0%BB%D0%BE%D0%B3%D0%BE%20%D1%84%D1%83%D0%BB-05.png)

<a href="https://harmonydata.ac.uk"><span align="left">🌐 harmonydata.ac.uk</span></a>
<a href="https://www.linkedin.com/company/harmonydata"><img align="left" src="https://raw.githubusercontent.com//harmonydata/.github/main/profile/linkedin.svg" alt="Harmony | LinkedIn" width="21px"/></a>
<a href="https://twitter.com/harmony_data"><img align="left" src="https://raw.githubusercontent.com//harmonydata/.github/main/profile/x.svg" alt="Harmony | X" width="21px"/></a>
<a href="https://www.instagram.com/harmonydata/"><img align="left" src="https://raw.githubusercontent.com//harmonydata/.github/main/profile/instagram.svg" alt="Harmony | Instagram" width="21px"/></a>
<a href="https://www.facebook.com/people/Harmony-Project/100086772661697/"><img align="left" src="https://raw.githubusercontent.com//harmonydata/.github/main/profile/fb.svg" alt="Harmony | Facebook" width="21px"/></a>
<a href="https://www.youtube.com/channel/UCraLlfBr0jXwap41oQ763OQ"><img align="left" src="https://raw.githubusercontent.com//harmonydata/.github/main/profile/yt.svg" alt="Harmony | YouTube" width="21px"/></a>

 [![Harmony on Twitter](https://img.shields.io/twitter/follow/harmony_data.svg?style=social&label=Follow)](https://twitter.com/harmony_data) 


# Harmony Python library

<!-- badges: start -->
[![PyPI package](https://img.shields.io/badge/pip%20install-harmonydata-brightgreen)](https://pypi.org/project/harmonydata/) ![my badge](https://badgen.net/badge/Status/In%20Development/orange) [![License](https://img.shields.io/github/license/harmonydata/harmony)](https://github.com/harmonydata/harmony/blob/main/LICENSE)
[![tests](https://github.com/harmonydata/harmony/actions/workflows/test.yml/badge.svg)](https://github.com/harmonydata/harmony/actions/workflows/test.yml)
[![Current Release Version](https://img.shields.io/github/release/harmonydata/harmony.svg?style=flat-square&logo=github)](https://github.com/harmonydata/harmony/releases)
[![pypi Version](https://img.shields.io/pypi/v/harmonydata.svg?style=flat-square&logo=pypi&logoColor=white)](https://pypi.org/project/harmonydata/)
 [![version number](https://img.shields.io/pypi/v/harmonydata?color=green&label=version)](https://github.com/harmonydata/harmony/releases) [![PyPi downloads](https://static.pepy.tech/personalized-badge/harmonydata?period=total&units=international_system&left_color=grey&right_color=orange&left_text=pip%20downloads)](https://pypi.org/project/harmonydata/)
[![forks](https://img.shields.io/github/forks/harmonydata/harmony)](https://github.com/harmonydata/harmony/forks)
[![docker](https://img.shields.io/badge/docker-pull-blue.svg?logo=docker&logoColor=white)](https://hub.docker.com/r/harmonydata/harmonyapi)

You can also join [our Discord server](https://discord.gg/harmonydata)! If you found Harmony helpful, you can [leave us a review](https://g.page/r/CaRWc2ViO653EBM/review)!

# What does Harmony do?

* Psychologists and social scientists often have to match items in different questionnaires, such as "I often feel anxious" and "Feeling nervous, anxious or afraid". 
* This is called **harmonisation**.
* Harmonisation is a time-consuming and subjective process.
* Going through long PDFs of questionnaires and putting the questions into Excel is no fun.
* Enter [Harmony](https://harmonydata.ac.uk/app), a tool that uses [natural language processing](https://naturallanguageprocessing.com) and generative AI models to help researchers harmonise questionnaire items, even in different languages.

# Quick start with the code

[Read our guide to contributing to Harmony here](https://harmonydata.ac.uk/contributing-to-harmony/) or read [CONTRIBUTING.md](./CONTRIBUTING.md).

You can run the walkthrough Python notebook in [Google Colab](https://colab.research.google.com/github/harmonydata/harmony/blob/main/Harmony_example_walkthrough.ipynb) with a single click: <a href="https://colab.research.google.com/github/harmonydata/harmony/blob/main/Harmony_example_walkthrough.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

You can also download an R markdown notebook to run in R Studio: <a href="https://harmonydata.ac.uk/harmony_r_example.nb.html" target="_parent"><img src="https://img.shields.io/badge/RStudio-4285F4" alt="Open In R Studio"/></a>

You can run the walkthrough R notebook in Google Colab with a single click: <a href="https://colab.research.google.com/github/harmonydata/experiments/blob/main/Harmony_R_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> [View the PDF documentation of the R package on CRAN](https://cran.r-project.org/web/packages/harmonydata/harmonydata.pdf)

# Looking for examples?

Check out our examples repository at [https://github.com/harmonydata/harmony_examples](https://github.com/harmonydata/harmony_examples)


<!-- badges: end -->

# The Harmony Project

Harmony is a tool using AI which allows you to compare items from questionnaires and identify similar content. You can try Harmony at https://harmonydata.ac.uk/app and you can read our blog at https://harmonydata.ac.uk/blog/.

## Who to contact?

You can contact the Harmony team at https://harmonydata.ac.uk/, or Thomas Wood at https://fastdatascience.com/.

## 🖥 Installation instructions (video)

[![Installing Harmony](https://raw.githubusercontent.com/harmonydata/.github/main/profile/installation_video.jpg)](https://www.youtube.com/watch?v=enWh0-4I0Sg "Installing Harmony")

## 🖱 Looking to try Harmony in the browser?

Visit: https://harmonydata.ac.uk/app/

You can also visit our blog at https://harmonydata.ac.uk/blog/

## ✅ You need Tika if you want to extract instruments from PDFs

Download and install Java if you don't have it already. Then download Apache Tika from https://tika.apache.org/download.html and run it on your computer:

```
java -jar tika-server-standard-2.3.0.jar
```
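Before loading PDFs, it can help to confirm that the Tika server is actually listening. The sketch below assumes Tika's default address of `localhost:9998`; the helper name `tika_is_running` is hypothetical and is not part of Harmony.

```python
import socket


def tika_is_running(host="localhost", port=9998, timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    9998 is the default port of the Tika server; change it if you started
    Tika on a different port. This only checks that the port is open, not
    that the listener is really Tika.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        return False


if __name__ == "__main__":
    print("Tika reachable:", tika_is_running())
```

If this prints `False`, start the server with the `java -jar` command above and try again.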

## Requirements

You need a Windows, Linux or Mac system with:

* Python 3.8 or above
* the requirements in [requirements.txt](./requirements.txt)
* Java (if you want to extract items from PDFs)
* [Apache Tika](https://tika.apache.org/download.html) (if you want to extract items from PDFs)
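A quick way to check the first and third items on your own machine is sketched below. It is only an illustration: `check_prerequisites` is a hypothetical helper, and finding `java` on the `PATH` is assumed to be a good enough sign that Java is installed.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Report which of the listed prerequisites are present on this machine."""
    return {
        # The requirements above ask for Python 3.8 or above
        "python_ok": sys.version_info >= min_python,
        # Java is only needed for extracting items from PDFs
        "java_on_path": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```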

## 🖥 Installing the Harmony Python package

You can install from [PyPI](https://pypi.org/project/harmonydata/).

```
pip install harmonydata
```

## Loading all models

Harmony uses spaCy to help with text extraction from PDFs. spaCy models can be downloaded with the following command in Python:

```
import harmony
harmony.download_models()
```

## Matching example instruments

```
import harmony
# Pair two bundled example questionnaires and compare their items
instruments = harmony.example_instruments["CES_D English"], harmony.example_instruments["GAD-7 Portuguese"]
questions, similarity, query_similarity, new_vectors_dict = harmony.match_instruments(instruments)
```

## How to load a PDF, Excel or Word into an instrument

```
import harmony
instruments = harmony.load_instruments_from_local_file("gad-7.pdf")
```

## Optional environment variables

As an alternative to downloading models, you can set environment variables so that Harmony calls spaCy on a remote server. This is only necessary if you are making a server deployment of Harmony.

* `HARMONY_SPACY_PATH` - determines where model files are stored. Defaults to `HOME DIRECTORY/harmony`
* `HARMONY_DATA_PATH` - determines where data files are stored. Defaults to `HOME DIRECTORY/harmony`
* `HARMONY_NO_PARSING` - set to 1 to import a lightweight variant of Harmony which doesn't support PDF parsing.
* `HARMONY_NO_MATCHING` - set to 1 to import a lightweight variant of Harmony which doesn't support matching.
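To see which directories and switches would apply on your machine, the following sketch mirrors the behaviour described above: use the environment variable if it is set, otherwise fall back to a `harmony` folder in the home directory. It illustrates the documented rule and is not Harmony's own code; `resolve_harmony_path` and `flag_is_set` are hypothetical helpers.

```python
import os
from pathlib import Path


def resolve_harmony_path(variable, environ=None):
    """Return the directory named by `variable`, or ~/harmony if unset."""
    environ = os.environ if environ is None else environ
    default = Path.home() / "harmony"
    return Path(environ.get(variable) or default)


def flag_is_set(variable, environ=None):
    """True when a switch such as HARMONY_NO_PARSING is set to "1"."""
    environ = os.environ if environ is None else environ
    return environ.get(variable) == "1"


if __name__ == "__main__":
    for name in ("HARMONY_SPACY_PATH", "HARMONY_DATA_PATH"):
        print(name, "->", resolve_harmony_path(name))
    for name in ("HARMONY_NO_PARSING", "HARMONY_NO_MATCHING"):
        print(name, "->", flag_is_set(name))
```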

## Creating instruments from a list of strings

You can also create instruments quickly from a list of strings:

```
from harmony import create_instrument_from_list, match_instruments
instrument1 = create_instrument_from_list(["I feel anxious", "I feel nervous"])
instrument2 = create_instrument_from_list(["I feel afraid", "I feel worried"])
all_questions, similarity, query_similarity, new_vectors_dict = match_instruments([instrument1, instrument2])
```

## Loading instruments from PDFs

If you have a local file, you can load it into a list of `Instrument` instances:

```
from harmony import load_instruments_from_local_file
instruments = load_instruments_from_local_file("gad-7.pdf")
```

## Matching instruments

Once you have some instruments, you can match them with each other with a call to `match_instruments`.

```
from harmony import match_instruments
all_questions, similarity, query_similarity, new_vectors_dict = match_instruments(instruments)
```

* `all_questions` is a list of the questions passed to Harmony, in order.
* `similarity` is the similarity matrix returned by Harmony.
* `query_similarity` is the degree of similarity of each item to an optional query passed as argument to `match_instruments`.
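As an illustration of how these outputs fit together, the sketch below finds, for each item of one instrument, its closest item in the other. It assumes that `similarity` is a square matrix whose rows and columns follow the order of `all_questions`. The question texts and numbers are made up, and `best_matches` is a hypothetical helper rather than part of Harmony. If the real matrix is a NumPy array, the `similarity[i][j]` indexing works unchanged.

```python
def best_matches(questions, similarity, first_size):
    """Pair each of the first `first_size` questions with its closest
    question among the remaining ones (i.e. in the other instrument)."""
    pairs = []
    for i in range(first_size):
        # Only look at columns belonging to the second instrument
        candidates = range(first_size, len(questions))
        j = max(candidates, key=lambda col: similarity[i][col])
        pairs.append((questions[i], questions[j], similarity[i][j]))
    return pairs


# Made-up example: two items per instrument, values invented for illustration
questions = ["I feel anxious", "I feel nervous", "I feel afraid", "I feel worried"]
similarity = [
    [1.00, 0.80, 0.60, 0.75],
    [0.80, 1.00, 0.55, 0.70],
    [0.60, 0.55, 1.00, 0.65],
    [0.75, 0.70, 0.65, 1.00],
]

for left, right, score in best_matches(questions, similarity, first_size=2):
    print(f"{left!r} <-> {right!r}: {score:.2f}")
```

Restricting the search to the other instrument's columns avoids reporting each item as its own best match, since the diagonal of such a matrix is normally the highest value in its row.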

## ⇗⇗ Using a different vectorisation function

Harmony defaults to `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` ([HuggingFace link](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2)). However, you can use other sentence transformers from HuggingFace by setting the environment variable `HARMONY_SENTENCE_TRANSFORMER_PATH` before importing Harmony:

```
export HARMONY_SENTENCE_TRANSFORMER_PATH=sentence-transformers/distiluse-base-multilingual-cased-v2
```
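If you work in a notebook, or anywhere outside a Unix shell where `export` is not available, the same variable can be set from Python. This sketch relies on the statement above that the variable must be set before Harmony is imported. The `import harmony` line is left commented out so that the snippet runs without the package installed.

```python
import os

# Set the variable first: per the note above, it must be in place
# before Harmony is imported.
os.environ["HARMONY_SENTENCE_TRANSFORMER_PATH"] = (
    "sentence-transformers/distiluse-base-multilingual-cased-v2"
)

# import harmony  # uncomment once harmonydata is installed

print(os.environ["HARMONY_SENTENCE_TRANSFORMER_PATH"])
```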

## Using OpenAI or other LLMs for vectorisation

Any word vector representation can be used by Harmony. The example below works for OpenAI's [text-embedding-ada-002](https://openai.com/blog/new-and-improved-embedding-model) model as of July 2023, provided you have created a paid OpenAI account. However, since LLMs are progressing rapidly, we have chosen not to integrate Harmony directly into the OpenAI client libraries, but instead allow you to pass Harmony any vectorisation function of your choice.

```
import numpy as np
from harmony import match_instruments_with_function, example_instruments
from openai import OpenAI

client = OpenAI()
model_name = "text-embedding-ada-002"
def convert_texts_to_vector(texts):
    # Request one embedding per input text and stack them into a 2D array
    vectors = client.embeddings.create(input=texts, model=model_name).data
    return np.asarray([v.embedding for v in vectors])
instruments = example_instruments["CES_D English"], example_instruments["GAD-7 Portuguese"]
all_questions, similarity, query_similarity, new_vectors_dict = match_instruments_with_function(instruments, None, convert_texts_to_vector)
```
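The contract visible in the example above is that the function takes a list of texts and returns one vector per text. As a dependency-free illustration of that shape, here is a toy vectoriser based on hashed character trigrams. It is not a meaningful embedding and `toy_vectorise` is a hypothetical name. To hand it to `match_instruments_with_function` you would presumably wrap the result in `np.asarray`, as the OpenAI example does; that step is an assumption based on the example rather than a documented requirement.

```python
import math
import zlib


def toy_vectorise(texts, dim=64):
    """Map each text to a fixed-length, L2-normalised vector.

    Character trigrams are hashed into `dim` buckets. This captures spelling
    overlap only, not meaning, and is meant purely to show the expected shape.
    """
    vectors = []
    for text in texts:
        counts = [0.0] * dim
        padded = f"  {text.lower()}  "
        for k in range(len(padded) - 2):
            trigram = padded[k:k + 3]
            # crc32 is stable across runs, unlike the built-in hash()
            counts[zlib.crc32(trigram.encode("utf-8")) % dim] += 1.0
        norm = math.sqrt(sum(c * c for c in counts)) or 1.0
        vectors.append([c / norm for c in counts])
    return vectors


vectors = toy_vectorise(["I feel anxious", "I feel nervous"])
print(len(vectors), "vectors of length", len(vectors[0]))
```

Swapping in a real model only requires keeping the same input and output shape: a list of strings in, one fixed-length vector per string out.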
 
## 💻 Do you want to run Harmony in your browser locally?

Download and install Docker:

* https://docs.docker.com/desktop/install/mac-install/
* https://docs.docker.com/desktop/install/windows-install/
* https://docs.docker.com/desktop/install/linux-install/

Open a terminal and run:

```
docker run -p 8000:8000 -p 3000:3000 harmonydata/harmonylocal
```

Then go to http://localhost:3000 in your browser.

## Looking for the Harmony API?

Visit: https://github.com/harmonydata/harmonyapi

* 📰 The code for training the PDF extraction is here: https://github.com/harmonydata/pdf-questionnaire-extraction

## Docker images

If you are a Docker user, you can run Harmony from a pre-built Docker image.

* https://hub.docker.com/repository/docker/harmonydata/harmonyapi - just the Harmony API
* https://hub.docker.com/repository/docker/harmonydata/harmonylocal - Harmony API and React front end

## Contributing to Harmony

If you'd like to contribute to this project, you can contact us at https://harmonydata.ac.uk/ or make a pull request on our [GitHub repository](https://github.com/harmonydata/harmonyapi). You can also [raise an issue](https://github.com/harmonydata/harmony/issues).

## Developing Harmony

### 🧪 Automated tests

Test code is in the **tests/** folder and uses [unittest](https://docs.python.org/3/library/unittest.html).

The testing tool `tox` is used in the automation with GitHub Actions CI/CD. **Since the PDF extraction also needs Java and Tika installed, you cannot run the unit tests without first installing Java and Tika. See above for instructions.**

### 🧪 Use tox locally

Install tox and run it:

```
pip install tox
tox
```

In our configuration, tox runs a check of the source distribution using [check-manifest](https://pypi.org/project/check-manifest/) (which requires your repo to be at least git-initialized (`git init`) and added (`git add .`)), setuptools's check, and unit tests using pytest. You don't need to install check-manifest or pytest yourself; tox installs them in a separate environment.

The automated tests run against several Python versions, but your machine may have only one. If that is Python 3.9, run:

```
tox -e py39
```

Thanks to GitHub Actions' automated process, you don't need to generate distribution files locally. 

### ⚙️ Continuous integration/deployment to PyPI

This package is based on the template https://pypi.org/project/example-pypi-package/

This package

- uses GitHub Actions for both testing and publishing
- is tested when pushing to the `master` or `main` branch, and is published when a release is created
- includes test files in the source distribution
- uses **setup.cfg** for [version single-sourcing](https://packaging.python.org/guides/single-sourcing-package-version/) (setuptools 46.4.0+)

## ⚙️ Re-releasing the package manually

The code to re-release Harmony on PyPI is as follows:

```
source activate py311
pip install twine
rm -rf dist
python setup.py sdist
twine upload dist/*
```

## 😃💁 Who worked on Harmony?

Harmony is a collaboration project between [Ulster University](https://ulster.ac.uk/), [University College London](https://ucl.ac.uk/), the [Universidade Federal de Santa Maria](https://www.ufsm.br/), and [Fast Data Science](http://fastdatascience.com/).  Harmony is funded by [Wellcome](https://wellcome.org/) as part of the [Wellcome Data Prize in Mental Health](https://wellcome.org/grant-funding/schemes/wellcome-mental-health-data-prize).

The core team at Harmony is made up of:

* [Dr Bettina Moltrecht, PhD](https://profiles.ucl.ac.uk/60736-bettina-moltrecht) (UCL)
* [Dr Eoin McElroy](https://www.ulster.ac.uk/staff/e-mcelroy) (University of Ulster)
* [Dr George Ploubidis](https://profiles.ucl.ac.uk/48171-george-ploubidis) (UCL)
* [Dr Mauricio Scopel Hoffmann](https://ufsmpublica.ufsm.br/docente/18264) (Universidade Federal de Santa Maria, Brazil)
* [Thomas Wood](https://freelancedatascientist.net/) ([Fast Data Science](https://fastdatascience.com))

## 📜 License

MIT License. Copyright (c) 2023 Ulster University (https://www.ulster.ac.uk)

## 📜 How do I cite Harmony?

You can cite our validation paper:

 McElroy, Wood, Bond, Mulvenna, Shevlin, Ploubidis, Scopel Hoffmann, Moltrecht, [Using natural language processing to facilitate the harmonisation of mental health questionnaires: a validation study using real-world data](https://bmcpsychiatry.biomedcentral.com/articles/10.1186/s12888-024-05954-2#citeas). BMC Psychiatry 24, 530 (2024), https://doi.org/10.1186/s12888-024-05954-2
 

A BibTeX entry for LaTeX users is

```
@article{mcelroy2024using,
  title={Using natural language processing to facilitate the harmonisation of mental health questionnaires: a validation study using real-world data},
  author={McElroy, Eoin and Wood, Thomas and Bond, Raymond and Mulvenna, Maurice and Shevlin, Mark and Ploubidis, George B and Hoffmann, Mauricio Scopel and Moltrecht, Bettina},
  journal={BMC psychiatry},
  volume={24},
  number={1},
  pages={530},
  year={2024},
  publisher={Springer}
}
```

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "harmonydata",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<=3.13.0,>=3.6",
    "maintainer_email": "Thomas Wood <thomas@fastdatascience.com>",
    "keywords": "harmony, harmonisation, harmonization, harmonise",
    "author": null,
    "author_email": "Thomas Wood <thomas@fastdatascience.com>",
    "download_url": "https://files.pythonhosted.org/packages/14/21/659305f4b7d4c013ceaeade662ed1ade4fbc843b75103483646ba02c9dbb/harmonydata-1.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2023 Ulster University. Information at: https://harmonydata.ac.uk (maintainer: Thomas Wood, https://fastdatascience.com/harmony-wellcome-data-prize/)  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "Harmony Tool for Retrospective Data Harmonisation",
    "version": "1.0.1",
    "project_urls": {
        "Bug Reports": "https://github.com/harmonydata/harmony/issues",
        "Documentation": "https://harmonydata.ac.uk/",
        "Source Code": "https://github.com/harmonydata/harmony"
    },
    "split_keywords": [
        "harmony",
        " harmonisation",
        " harmonization",
        " harmonise"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1395406f15f920265879b35dce4565d780e652a52767779cd5b22c0b95bf4137",
                "md5": "1c8f00f05b70fecc5b9fa47e991ab26d",
                "sha256": "f84e3a7d9ae70c8d7f8f5dc7f9eb0c496571a63940a10425fff7e638a078981b"
            },
            "downloads": -1,
            "filename": "harmonydata-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1c8f00f05b70fecc5b9fa47e991ab26d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<=3.13.0,>=3.6",
            "size": 151249,
            "upload_time": "2024-11-26T15:10:02",
            "upload_time_iso_8601": "2024-11-26T15:10:02.490772Z",
            "url": "https://files.pythonhosted.org/packages/13/95/406f15f920265879b35dce4565d780e652a52767779cd5b22c0b95bf4137/harmonydata-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1421659305f4b7d4c013ceaeade662ed1ade4fbc843b75103483646ba02c9dbb",
                "md5": "c234cbc01684d9e59ea14564f807f91b",
                "sha256": "5bc92a309ddba1e75feac278bbb8f61ba32e1eabbf4e54081e6083b7c27651ba"
            },
            "downloads": -1,
            "filename": "harmonydata-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "c234cbc01684d9e59ea14564f807f91b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<=3.13.0,>=3.6",
            "size": 174119,
            "upload_time": "2024-11-26T15:10:03",
            "upload_time_iso_8601": "2024-11-26T15:10:03.735722Z",
            "url": "https://files.pythonhosted.org/packages/14/21/659305f4b7d4c013ceaeade662ed1ade4fbc843b75103483646ba02c9dbb/harmonydata-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-26 15:10:03",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "harmonydata",
    "github_project": "harmony",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "tox": true,
    "lcname": "harmonydata"
}
        