influencemapper


Nameinfluencemapper JSON
Version 0.9.4 PyPI version JSON
download
home_pagehttps://github.com/networkdynamics/influencemapper
SummaryA tool for extracting information from disclosure statements.
upload_time2024-12-22 04:29:44
maintainerNone
docs_urlNone
authorHardy
requires_python<4.0,>=3.9
licenseMIT
keywords disclosure conflict of interests competing interest research
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # InfluenceMapper

InfluenceMapper is a python library for extracting disclosure information from scholarly articles. It uses fine-tuned OpenAI's GPT models for the extraction of entities and relationships from the text. The functions included in the library are:
- Extract entities from the text.
- Extract relationships between authors and entities.
- Extract relationships between entities and the study.

## Installation

To install the library, run the following command:

```bash
pip install influencemapper
```

## Training the model

The model is trained on a dataset of scholarly articles. The dataset is available at the `data` folder. To train the model, clone the directory and run the following command:

```bash
python core/src/influencemapper/cli.py fine_tune -train_data data/train.jsonl -valid_data data/valid.jsonl -model_name gpt-4o-mini -threshold 1500 study_org 
python core/src/influencemapper/cli.py fine_tune -train_data data/train.jsonl -valid_data data/valid.jsonl -model_name gpt-4o-mini -threshold 1500 author_org
```

As of the writing of this README, the resulting file has to be uploaded manually to the OpenAI platform to fine-tune the model. The model will be available for use after the fine-tuning process is completed.The `threshold` parameter is used to restrict samples, allowing only those with a maximum token count that meets the training requirements to pass.

## Inferring entities and relationships

To infer entities and relationships from a disclosure text, run the following command:

```bash 
python core/src/influencemapper/cli.py infer -data data/test.jsonl -model_name gpt-4o-mini -API_KEY [API_KEY] study_org
python core/src/influencemapper/cli.py infer -data data/test.jsonl -model_name gpt-4o-mini -API_KEY [API_KEY] author_org
```

To get the results, you have to visit the OpenAI platform and download the results. After dowlnoading the results, you need to combine the results back to the original dataset using the following command:

```bash
python core/src/influencemapper/cli.py combine -data data/test.jsonl -result batch*.jsonl study_org
python core/src/influencemapper/cli.py combine -data data/test.jsonl -result batch*.jsonl author_org
```
            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/networkdynamics/influencemapper",
    "name": "influencemapper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.9",
    "maintainer_email": null,
    "keywords": "disclosure, conflict of interests, competing interest, research",
    "author": "Hardy",
    "author_email": "hardy.oei@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/a1/3e/7c98dc7f96a29005abb806425f8082a11e84806430288298516601d4b297/influencemapper-0.9.4.tar.gz",
    "platform": null,
    "description": "# InfluenceMapper\n\nInfluenceMapper is a python library for extracting disclosure information from scholarly articles. It uses fine-tuned OpenAI's GPT models for the extraction of entities and relationships from the text. The functions included in the library are:\n- Extract entities from the text.\n- Extract relationships between authors and entities.\n- Extract relationships between entities and the study.\n\n## Installation\n\nTo install the library, run the following command:\n\n```bash\npip install influencemapper\n```\n\n## Training the model\n\nThe model is trained on a dataset of scholarly articles. The dataset is available at the `data` folder. To train the model, clone the directory and run the following command:\n\n```bash\npython core/src/influencemapper/cli.py fine_tune -train_data data/train.jsonl -valid_data data/valid.jsonl -model_name gpt-4o-mini -threshold 1500 study_org \npython core/src/influencemapper/cli.py fine_tune -train_data data/train.jsonl -valid_data data/valid.jsonl -model_name gpt-4o-mini -threshold 1500 author_org\n```\n\nAs of the writing of this README, the resulting file has to be uploaded manually to the OpenAI platform to fine-tune the model. The model will be available for use after the fine-tuning process is completed.The `threshold` parameter is used to restrict samples, allowing only those with a maximum token count that meets the training requirements to pass.\n\n## Inferring entities and relationships\n\nTo infer entities and relationships from a disclosure text, run the following command:\n\n```bash \npython core/src/influencemapper/cli.py infer -data data/test.jsonl -model_name gpt-4o-mini -API_KEY [API_KEY] study_org\npython core/src/influencemapper/cli.py infer -data data/test.jsonl -model_name gpt-4o-mini -API_KEY [API_KEY] author_org\n```\n\nTo get the results, you have to visit the OpenAI platform and download the results. After dowlnoading the results, you need to combine the results back to the original dataset using the following command:\n\n```bash\npython core/src/influencemapper/cli.py combine -data data/test.jsonl -result batch*.jsonl study_org\npython core/src/influencemapper/cli.py combine -data data/test.jsonl -result batch*.jsonl author_org\n```",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A tool for extracting information from disclosure statements.",
    "version": "0.9.4",
    "project_urls": {
        "Homepage": "https://github.com/networkdynamics/influencemapper"
    },
    "split_keywords": [
        "disclosure",
        " conflict of interests",
        " competing interest",
        " research"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c0000b17156519213622fcf7d02894a2b2384981da8453158081b34edd2e1dc8",
                "md5": "0628bd1efa8a36b5b74c45e71c47ad5b",
                "sha256": "3217e74848d221fe213ac01314298fadb91918d0f52536de89c6f15c95569a37"
            },
            "downloads": -1,
            "filename": "influencemapper-0.9.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0628bd1efa8a36b5b74c45e71c47ad5b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.9",
            "size": 22707,
            "upload_time": "2024-12-22T04:29:42",
            "upload_time_iso_8601": "2024-12-22T04:29:42.346683Z",
            "url": "https://files.pythonhosted.org/packages/c0/00/0b17156519213622fcf7d02894a2b2384981da8453158081b34edd2e1dc8/influencemapper-0.9.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a13e7c98dc7f96a29005abb806425f8082a11e84806430288298516601d4b297",
                "md5": "93781a924a5d9da7812dfe336b1b5ade",
                "sha256": "0f3e3669fac8a489120c2d74bb68b03bffc46277736b6bf69bdd1105c645218a"
            },
            "downloads": -1,
            "filename": "influencemapper-0.9.4.tar.gz",
            "has_sig": false,
            "md5_digest": "93781a924a5d9da7812dfe336b1b5ade",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.9",
            "size": 16707,
            "upload_time": "2024-12-22T04:29:44",
            "upload_time_iso_8601": "2024-12-22T04:29:44.470962Z",
            "url": "https://files.pythonhosted.org/packages/a1/3e/7c98dc7f96a29005abb806425f8082a11e84806430288298516601d4b297/influencemapper-0.9.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-22 04:29:44",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "networkdynamics",
    "github_project": "influencemapper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "influencemapper"
}
        
Elapsed time: 0.53654s