tf-idf-cosimm


Name: tf-idf-cosimm
Version: 0.0.2
home_page: https://github.com/cooperrc/tf_idf_cosimm
Summary: This is a short set of functions meant to help analyze cosine similarity between texts
upload_time: 2024-05-28 18:03:19
maintainer: None
docs_url: None
author: Ryan C. Cooper
requires_python: >=3.7
license: Apache Software License 2.0
keywords: nbdev jupyter notebook python
tf_idf
================

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

`tf_idf` is a short set of functions meant to help analyze cosine
similarity between texts.

## Install

``` sh
pip install tf-idf-cosimm
```

## How to use

The example below compares two short passages, `AI` and `ME`, step by step:

``` python
import tf_idf.core as tf_idf
import pandas as pd
```

``` python
AI = 'For instance, in the design phase of a structural engineering project, Monte Carlo simulations can help evaluate the performance of a proposed design under different loading conditions and material properties, providing valuable insights into its reliability and safety'
ME = 'For instance, Monte Carlo simulations can simulate hundreds or thousands of different combinations of loading conditions and material properties to create statistical predictions of structure stiffness'
# word_tokenize(AI.lower().split())
# preprocess_text(AI)
```

``` python
compare = tf_idf.preprocess_text(AI)
```

``` python
compare = pd.concat([compare, tf_idf.preprocess_text(ME)], ignore_index=True)
compare
```

<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>DOCUMENT</th>
      <th>LOWERCASE</th>
      <th>CLEANING</th>
      <th>TOKENIZATION</th>
      <th>STOP-WORDS</th>
      <th>STEMMING</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>For instance, in the design phase of a structural engineering project, Monte Carlo simulations can help evaluate the performance of a proposed design under different loading conditions and material properties, providing valuable insights into its reliability and safety</td>
      <td>for instance, in the design phase of a structural engineering project, monte carlo simulations can help evaluate the performance of a proposed design under different loading conditions and material properties, providing valuable insights into its reliability and safety</td>
      <td>for instance in the design phase of a structural engineering project monte carlo simulations can help evaluate the performance of a proposed design under different loading conditions and material properties providing valuable insights into its reliability and safety</td>
      <td>[for, instance, in, the, design, phase, of, a, structural, engineering, project, monte, carlo, simulations, can, help, evaluate, the, performance, of, a, proposed, design, under, different, loading, conditions, and, material, properties, providing, valuable, insights, into, its, reliability, and, safety]</td>
      <td>[instance, design, phase, structural, engineering, project, monte, carlo, simulations, evaluate, performance, proposed, design, different, loading, conditions, material, properties, providing, valuable, insights, reliability, safety]</td>
      <td>[instanc, design, phase, structur, engin, project, mont, carlo, simul, evalu, perform, propos, design, differ, load, condit, materi, properti, provid, valuabl, insight, reliabl, safeti]</td>
    </tr>
    <tr>
      <th>1</th>
      <td>For instance, Monte Carlo simulations can simulate hundreds or thousands of different combinations of loading conditions and material properties to create statistical predictions of structure stiffness</td>
      <td>for instance, monte carlo simulations can simulate hundreds or thousands of different combinations of loading conditions and material properties to create statistical predictions of structure stiffness</td>
      <td>for instance monte carlo simulations can simulate hundreds or thousands of different combinations of loading conditions and material properties to create statistical predictions of structure stiffness</td>
      <td>[for, instance, monte, carlo, simulations, can, simulate, hundreds, or, thousands, of, different, combinations, of, loading, conditions, and, material, properties, to, create, statistical, predictions, of, structure, stiffness]</td>
      <td>[instance, monte, carlo, simulations, simulate, hundreds, thousands, different, combinations, loading, conditions, material, properties, create, statistical, predictions, structure, stiffness]</td>
      <td>[instanc, mont, carlo, simul, simul, hundr, thousand, differ, combin, load, condit, materi, properti, creat, statist, predict, structur, stiff]</td>
    </tr>
  </tbody>
</table>
</div>
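
The columns of this table correspond to successive preprocessing steps. As a rough illustration of the first four, here is a minimal standard-library sketch. It is not the package's implementation: `preprocess` and the tiny `STOP_WORDS` set are hypothetical, and stemming is left out because it needs a stemmer such as the Porter algorithm.

```python
import re

# Hypothetical, deliberately tiny stop-word list; the package's own list is not shown here.
STOP_WORDS = {"a", "and", "can", "for", "in", "of", "or", "the", "to"}

def preprocess(text):
    """Return the intermediate results of each step, keyed like the table columns."""
    lowered = text.lower()                       # LOWERCASE
    cleaned = re.sub(r"[^a-z\s]", "", lowered)   # CLEANING: drop punctuation and digits
    tokens = cleaned.split()                     # TOKENIZATION: split on whitespace
    kept = [t for t in tokens if t not in STOP_WORDS]  # STOP-WORDS
    return {
        "LOWERCASE": lowered,
        "CLEANING": cleaned,
        "TOKENIZATION": tokens,
        "STOP-WORDS": kept,
    }

steps = preprocess("For instance, Monte Carlo simulations can simulate hundreds or thousands of different combinations")
print(steps["STOP-WORDS"])
```

A real stop-word list is much longer than the one assumed here, so results on other texts will differ from the package's output.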

``` python
compare_tfidf = tf_idf.calculate_tfidf(compare)
compare_tfidf
```

<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>DOCUMENT</th>
      <th>LOWERCASE</th>
      <th>CLEANING</th>
      <th>TOKENIZATION</th>
      <th>STOP-WORDS</th>
      <th>STEMMING</th>
      <th>carlo</th>
      <th>combin</th>
      <th>condit</th>
      <th>creat</th>
      <th>...</th>
      <th>propos</th>
      <th>provid</th>
      <th>reliabl</th>
      <th>safeti</th>
      <th>simul</th>
      <th>statist</th>
      <th>stiff</th>
      <th>structur</th>
      <th>thousand</th>
      <th>valuabl</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>For instance, in the design phase of a structural engineering project, Monte Carlo simulations can help evaluate the performance of a proposed design under different loading conditions and material properties, providing valuable insights into its reliability and safety</td>
      <td>for instance, in the design phase of a structural engineering project, monte carlo simulations can help evaluate the performance of a proposed design under different loading conditions and material properties, providing valuable insights into its reliability and safety</td>
      <td>for instance in the design phase of a structural engineering project monte carlo simulations can help evaluate the performance of a proposed design under different loading conditions and material properties providing valuable insights into its reliability and safety</td>
      <td>[for, instance, in, the, design, phase, of, a, structural, engineering, project, monte, carlo, simulations, can, help, evaluate, the, performance, of, a, proposed, design, under, different, loading, conditions, and, material, properties, providing, valuable, insights, into, its, reliability, and, safety]</td>
      <td>[instance, design, phase, structural, engineering, project, monte, carlo, simulations, evaluate, performance, proposed, design, different, loading, conditions, material, properties, providing, valuable, insights, reliability, safety]</td>
      <td>[instanc, design, phase, structur, engin, project, mont, carlo, simul, evalu, perform, propos, design, differ, load, condit, materi, properti, provid, valuabl, insight, reliabl, safeti]</td>
      <td>0.158850</td>
      <td>0.000000</td>
      <td>0.158850</td>
      <td>0.000000</td>
      <td>...</td>
      <td>0.223259</td>
      <td>0.223259</td>
      <td>0.223259</td>
      <td>0.223259</td>
      <td>0.158850</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.158850</td>
      <td>0.000000</td>
      <td>0.223259</td>
    </tr>
    <tr>
      <th>1</th>
      <td>For instance, Monte Carlo simulations can simulate hundreds or thousands of different combinations of loading conditions and material properties to create statistical predictions of structure stiffness</td>
      <td>for instance, monte carlo simulations can simulate hundreds or thousands of different combinations of loading conditions and material properties to create statistical predictions of structure stiffness</td>
      <td>for instance monte carlo simulations can simulate hundreds or thousands of different combinations of loading conditions and material properties to create statistical predictions of structure stiffness</td>
      <td>[for, instance, monte, carlo, simulations, can, simulate, hundreds, or, thousands, of, different, combinations, of, loading, conditions, and, material, properties, to, create, statistical, predictions, of, structure, stiffness]</td>
      <td>[instance, monte, carlo, simulations, simulate, hundreds, thousands, different, combinations, loading, conditions, material, properties, create, statistical, predictions, structure, stiffness]</td>
      <td>[instanc, mont, carlo, simul, simul, hundr, thousand, differ, combin, load, condit, materi, properti, creat, statist, predict, structur, stiff]</td>
      <td>0.193068</td>
      <td>0.271351</td>
      <td>0.193068</td>
      <td>0.271351</td>
      <td>...</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.386137</td>
      <td>0.271351</td>
      <td>0.271351</td>
      <td>0.193068</td>
      <td>0.271351</td>
      <td>0.000000</td>
    </tr>
  </tbody>
</table>
<p>2 rows × 35 columns</p>
</div>
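
The weights in this table are consistent with a smoothed inverse document frequency, `idf = ln((1 + n) / (1 + df)) + 1`, followed by scaling each row to unit length, which is the default behaviour of scikit-learn's `TfidfVectorizer`. Whether `calculate_tfidf` is implemented that way is an assumption. The short check below only shows that this formula reproduces the first-row values 0.158850 and 0.223259 from the stem counts.

```python
import math

n_docs = 2  # the two texts being compared

def smoothed_idf(doc_freq):
    # Assumed formula (scikit-learn's default), not confirmed from the package source.
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1

idf_shared = smoothed_idf(2)  # stem occurs in both texts, e.g. "carlo"
idf_unique = smoothed_idf(1)  # stem occurs in one text only, e.g. "valuabl"

# Row 0 holds 10 shared stems once each, 11 unique stems once each, and "design" twice.
norm_row0 = math.sqrt(10 * idf_shared**2 + 11 * idf_unique**2 + (2 * idf_unique) ** 2)

weight_shared = idf_shared / norm_row0
weight_unique = idf_unique / norm_row0
print(f"{weight_shared:.6f} {weight_unique:.6f}")
```

Stems that appear in only one text get the larger weight, which is why `valuabl` scores higher than `carlo` in the first row.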

``` python
tf_idf.cosineSimilarity(compare)
```

<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>DOCUMENT</th>
      <th>STEMMING</th>
      <th>COSIM</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>For instance, in the design phase of a structural engineering project, Monte Carlo simulations can help evaluate the performance of a proposed design under different loading conditions and material properties, providing valuable insights into its reliability and safety</td>
      <td>[instanc, design, phase, structur, engin, project, mont, carlo, simul, evalu, perform, propos, design, differ, load, condit, materi, properti, provid, valuabl, insight, reliabl, safeti]</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>1</th>
      <td>For instance, Monte Carlo simulations can simulate hundreds or thousands of different combinations of loading conditions and material properties to create statistical predictions of structure stiffness</td>
      <td>[instanc, mont, carlo, simul, simul, hundr, thousand, differ, combin, load, condit, materi, properti, creat, statist, predict, structur, stiff]</td>
      <td>0.337359</td>
    </tr>
  </tbody>
</table>
</div>
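
Cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below is a standard-library illustration, not the package's code: `tfidf_rows` and `cosine` are hypothetical helpers, and the smoothed IDF with unit-length rows is an assumed weighting. Under those assumptions, the stems from the STEMMING column give a similarity that agrees with the second row of the COSIM column.

```python
import math
from collections import Counter

def tfidf_rows(docs):
    """Return one {term: weight} dict per tokenised document (smoothed IDF, L2-normalised)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        raw = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        rows.append({t: w / norm for t, w in raw.items()})
    return rows

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)

# Stems copied from the STEMMING column above.
ai_stems = ("instanc design phase structur engin project mont carlo simul evalu "
            "perform propos design differ load condit materi properti provid "
            "valuabl insight reliabl safeti").split()
me_stems = ("instanc mont carlo simul simul hundr thousand differ combin load "
            "condit materi properti creat statist predict structur stiff").split()

rows = tfidf_rows([ai_stems, me_stems])
similarity = cosine(rows[0], rows[1])
print(f"{similarity:.6f}")
```

Comparing a row with itself, `cosine(rows[0], rows[0])`, gives 1 up to floating-point rounding, which matches the first row of the COSIM column if that column measures similarity to the first text.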

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/cooperrc/tf_idf_cosimm",
    "name": "tf-idf-cosimm",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "nbdev jupyter notebook python",
    "author": "Ryan C. Cooper",
    "author_email": "ryan.c.cooper@uconn.edu",
    "download_url": "https://files.pythonhosted.org/packages/de/da/f1897e332602ef43985bac7c301285a94e371c7ccdfd84935835037cc94b/tf_idf_cosimm-0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache Software License 2.0",
    "summary": "This is a short set of functions meant to help analyze cosine similarity between texts",
    "version": "0.0.2",
    "project_urls": {
        "Homepage": "https://github.com/cooperrc/tf_idf_cosimm"
    },
    "split_keywords": [
        "nbdev",
        "jupyter",
        "notebook",
        "python"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c6a0b83d5cd1985bc46ccab65f313708019a370deed2deee39de7e017f06cefd",
                "md5": "055f30b238d0e168dc78460686e06c91",
                "sha256": "daac75b3065830310aa19fb844109e622f138a809e2d7d958545ce4a2e8cd667"
            },
            "downloads": -1,
            "filename": "tf_idf_cosimm-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "055f30b238d0e168dc78460686e06c91",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 9200,
            "upload_time": "2024-05-28T18:03:18",
            "upload_time_iso_8601": "2024-05-28T18:03:18.368588Z",
            "url": "https://files.pythonhosted.org/packages/c6/a0/b83d5cd1985bc46ccab65f313708019a370deed2deee39de7e017f06cefd/tf_idf_cosimm-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "dedaf1897e332602ef43985bac7c301285a94e371c7ccdfd84935835037cc94b",
                "md5": "a01ba3d2cea0953d7717ed50a3c0a26e",
                "sha256": "a3e9a38c4cd53e5720bca687215abdc273a71d5a39f7e59ae659f9abc4e69c96"
            },
            "downloads": -1,
            "filename": "tf_idf_cosimm-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "a01ba3d2cea0953d7717ed50a3c0a26e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 10424,
            "upload_time": "2024-05-28T18:03:19",
            "upload_time_iso_8601": "2024-05-28T18:03:19.363731Z",
            "url": "https://files.pythonhosted.org/packages/de/da/f1897e332602ef43985bac7c301285a94e371c7ccdfd84935835037cc94b/tf_idf_cosimm-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-28 18:03:19",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "cooperrc",
    "github_project": "tf_idf_cosimm",
    "github_not_found": true,
    "lcname": "tf-idf-cosimm"
}
        