team-comm-tools


Name: team-comm-tools
Version: 0.1.4.post2
home_page: None
Summary: A toolkit that generates a variety of features for team conversation data.
upload_time: 2024-10-08 05:32:15
maintainer: None
docs_url: None
author: None
requires_python: >=3.10
license: MIT License Copyright (c) 2022 Xinlan Emily Hu Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords: computational social science, teams, communication, conversation, chat, analysis
requirements: chardet, convokit, emoji, flask, gensim, nltk, numpy, pandas, pyphen, pytest, pytest-runner, python-dateutil, pytz, regex, scikit-learn, scipy, sentence-transformers, sentencepiece, spacy, spacy-legacy, spacy-loggers, textblob, tokenizers, torch, torchaudio, torchvision, transformers, tqdm, tzdata, tzlocal
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.

[![Testing Features](https://github.com/Watts-Lab/team_comm_tools/workflows/Testing%20Features/badge.svg)](https://github.com/Watts-Lab/team_comm_tools/actions?query=workflow:"Testing+Features")
[![GitHub release](https://img.shields.io/github/release/Watts-Lab/team_comm_tools?include_prereleases=&sort=semver&color=blue)](https://github.com/Watts-Lab/team_comm_tools/releases/)
[![License](https://img.shields.io/badge/License-MIT-blue)](#license)

# The Team Communication Toolkit
The Team Communication Toolkit is a Python package that makes it easy for social scientists to analyze and understand *text-based communication data*. Our aim is to facilitate seamless analyses of conversational data --- especially among groups and teams! --- by providing a single interface for researchers to generate and explore dozens of research-backed conversational features.

We are a research project created by the [Computational Social Science Lab at UPenn](https://css.seas.upenn.edu/) and funded by the [Wharton AI and Analytics Initiative](https://ai-analytics.wharton.upenn.edu/).

<div align="center">

[![View - Home Page](https://img.shields.io/badge/View_site-GH_Pages-2ea44f?style=for-the-badge)](https://teamcommtools.seas.upenn.edu/)

[![View - Documentation](https://img.shields.io/badge/view-Documentation-blue?style=for-the-badge)](https://conversational-featurizer.readthedocs.io/en/latest/ "Go to project documentation")

The Team Communication Toolkit is an academic project and is intended to be used for academic purposes only.

</div>

# Getting Started

To use our tool, please ensure that you have Python >= 3.10 installed and a working version of [pip](https://pypi.org/project/pip/), which is Python's package installer. Then, in your local environment, run the following:

```sh
pip install team_comm_tools
```

This command will automatically install our package and all required dependencies.

## Troubleshooting

If some dependency installations fail (for example, you get an error that `en_core_web_sm` from spaCy is not found, or that an NLTK resource is missing), run the following one-line command in your terminal, which forces the installation of the spaCy and NLTK dependencies:

```sh
download_resources
```

If you encounter a further issue in which the 'wordnet' package from NLTK is not found, it may be related to a known bug in NLTK in which the wordnet package does not unzip automatically. If this is the case, please follow the instructions to manually unzip it, documented in [this thread](https://github.com/nltk/nltk/issues/3028).
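
To see which pieces are still missing after running `download_resources`, a quick check along the following lines may help. This is a minimal sketch and not part of the toolkit: the helper names `missing_modules` and `wordnet_available` are hypothetical, and the sketch assumes that `en_core_web_sm` is installed as an importable Python package and that WordNet lives under NLTK's `corpora/wordnet` resource path.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def wordnet_available():
    """Return True if NLTK is installed and can locate the WordNet corpus."""
    try:
        import nltk  # imported lazily so the check also works without NLTK
        nltk.data.find("corpora/wordnet")  # raises LookupError when missing
    except (ImportError, LookupError):
        return False
    return True


# Hypothetical usage: report what still needs attention.
print("Missing modules:", missing_modules(["spacy", "nltk", "en_core_web_sm"]))
print("WordNet found:", wordnet_available())
```

If `en_core_web_sm` shows up as missing, rerunning `download_resources` or updating pip is the first thing to try; if only WordNet is missing, the NLTK thread linked above is the more likely fix.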

## Import Recommendations: Virtual Environment and Pip

**We strongly recommend using a virtual environment in Python to run the package.** We have several specific dependency requirements. One important requirement is that the package is currently compatible only with numpy < 2.0.0, because [numpy 2.0.0 and above](https://numpy.org/devdocs/release/2.0.0-notes.html#changes) made significant changes that are not compatible with other dependencies of our package. As those dependencies are updated, we will support later versions of numpy.

**We also strongly recommend ensuring that your version of pip is up to date (>=24.0).** Some users have reported trouble downloading dependencies (specifically, the spaCy package) with older versions of pip. If you get an error when downloading `en_core_web_sm`, we recommend updating pip.
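
Both recommendations can be checked from inside the environment you plan to use. The sketch below is an illustration under simple assumptions, not part of the toolkit: `major_version` and `check_environment` are hypothetical helper names, and only the leading component of each version string is compared (numpy major version below 2, pip major version at least 24).

```python
from importlib import metadata


def major_version(version):
    """Return the leading integer of a version string such as '1.26.4' or '24.0'."""
    return int(version.split(".")[0])


def check_environment():
    """Report whether the installed numpy and pip match the recommendations."""
    rules = (
        ("numpy", lambda major: major < 2),   # numpy < 2.0.0
        ("pip", lambda major: major >= 24),   # pip >= 24.0
    )
    report = {}
    for package, is_ok in rules:
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "not installed"
            continue
        status = "ok" if is_ok(major_version(installed)) else "outside the recommended range"
        report[package] = f"{installed} ({status})"
    return report


print(check_environment())
```

Running this inside a fresh virtual environment, after installing the package, shows the versions that pip actually resolved there rather than those of the system-wide Python.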


## Using the FeatureBuilder
After you install the package and its dependencies, you can import our tool in your Python script as follows:

```python
from team_comm_tools import FeatureBuilder
```

*Note*: PyPI treats hyphens and underscores equally, so `pip install team_comm_tools` and `pip install team-comm-tools` are equivalent. However, Python does NOT treat them equally, and **you should use underscores when you import the package, like this: `from team_comm_tools import FeatureBuilder`**.
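
The difference can be illustrated with the name normalization rule that PEP 503 defines for package indexes, alongside Python's own rule for identifiers. This is a small sketch for illustration; the function name `normalize_distribution_name` is hypothetical.

```python
import re


def normalize_distribution_name(name):
    """Normalize a project name as PEP 503 describes: runs of '-', '_' and '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Both spellings refer to the same project on the package index.
print(normalize_distribution_name("team_comm_tools"))  # team-comm-tools
print(normalize_distribution_name("team-comm-tools"))  # team-comm-tools

# Only the underscore spelling is a valid Python identifier, so only it can be imported.
print("team_comm_tools".isidentifier())  # True
print("team-comm-tools".isidentifier())  # False
```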

Once you import the tool, you will be able to declare a FeatureBuilder object, which is the heart of our tool. Here is some sample syntax:

```python
from team_comm_tools import FeatureBuilder

# `my_pandas_dataframe` is a pandas DataFrame that you have already loaded
my_feature_builder = FeatureBuilder(
   input_df = my_pandas_dataframe,
   # this means there's a column in your data called 'conversation_id' that uniquely identifies a conversation
   conversation_id_col = "conversation_id",
   # this means there's a column in your data called 'speaker_id' that uniquely identifies a speaker
   speaker_id_col = "speaker_id",
   # this means there's a column in your data called 'message' that contains the content you want to featurize
   message_col = "message",
   # this means there's a column in your data called 'timestamp' that contains the time associated with each message; we also accept a list of (timestamp_start, timestamp_end), in case your data is formatted in that way.
   timestamp_col = "timestamp",
   # this is where we'll cache things like sentence vectors; this directory doesn't have to exist; we'll create it for you!
   vector_directory = "./vector_data/",
   # this is the base file name from which we generate the three outputs;
   # you will get your outputs in output/chat/my_output_chat_level.csv; output/conv/my_output_conv_level.csv; and output/user/my_output_user_level.
   output_file_base = "my_output",
   # it will also store the output into output/turns/my_output_chat_level.csv
   turns = False,
   # these features depend on sentence vectors, so they take longer to generate on larger datasets. Add them in manually if you are interested in adding them to your output!
   custom_features = [
         "(BERT) Mimicry",
         "Moving Mimicry",
         "Forward Flow",
         "Discursive Diversity"
   ],
)

# this line of code runs the FeatureBuilder on your data
my_feature_builder.featurize()
```

### Data Format
We accept input data in the form of a pandas DataFrame. Your data needs three (3) required columns and one optional column:

1. A **conversation ID**, which uniquely identifies a conversation;
2. A **speaker ID**, which uniquely identifies a speaker;
3. A **message/text input**, which contains the content that you want to featurize;
4. (Optional) A **timestamp**. This is not necessary for generating features, but behaviors related to the conversation's pace (for example, the average delay between messages; the "burstiness" of a conversation) cannot be measured without it.
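
Before constructing a FeatureBuilder, it can be useful to confirm that your data has the expected columns. The sketch below is a hypothetical pre-check, not part of the toolkit: the column names are the ones used in the sample syntax above (yours may differ, since you pass the names in yourself), the rows are made-up toy data, and `missing_columns` is an illustrative helper.

```python
REQUIRED_COLUMNS = ("conversation_id", "speaker_id", "message")
OPTIONAL_COLUMNS = ("timestamp",)


def missing_columns(columns, expected):
    """Return the entries of `expected` that do not appear in `columns`."""
    return [name for name in expected if name not in columns]


# Toy rows for illustration; with pandas, pd.DataFrame(rows) turns them into a DataFrame.
rows = [
    {"conversation_id": "c1", "speaker_id": "alice", "message": "Hi team!",
     "timestamp": "2024-01-01 09:00:00"},
    {"conversation_id": "c1", "speaker_id": "bob", "message": "Hello, ready to start?",
     "timestamp": "2024-01-01 09:00:12"},
]

columns = list(rows[0].keys())  # with a DataFrame, use list(df.columns) instead
print("Missing required columns:", missing_columns(columns, REQUIRED_COLUMNS))
print("Missing optional columns:", missing_columns(columns, OPTIONAL_COLUMNS))
```

An empty list for the required columns means the data has the shape described above; a missing timestamp only rules out the pace-related features.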

### Featurized Outputs: Levels of Analysis

Not all communication features are created equal: they can be defined at different levels of analysis. For example, a single utterance ("you are great!") may be described as a "positive statement." An individual who makes many such utterances may be described as a "positive person." Finally, the entire team may enjoy a "positive conversation," an interaction in which everyone speaks positively to each other. In this way, the same concept of positivity can be applied to three levels: 

1. The **utterance**,
2. The **speaker**, and
3. The **conversation**
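
To make the three levels concrete, the sketch below rolls a made-up utterance-level score up to the speaker and conversation levels. The `positivity` values are invented toy numbers, and simple averaging is an assumption chosen for illustration; it is not a description of how the toolkit aggregates its features.

```python
from collections import defaultdict
from statistics import mean

# Utterance level: one hypothetical score per message.
utterances = [
    {"conversation_id": "c1", "speaker_id": "alice", "positivity": 1.0},
    {"conversation_id": "c1", "speaker_id": "bob", "positivity": 0.0},
    {"conversation_id": "c1", "speaker_id": "alice", "positivity": 0.5},
    {"conversation_id": "c2", "speaker_id": "carol", "positivity": 0.25},
]


def average_by(rows, key_fields, value_field):
    """Group rows by the given key fields and average `value_field` within each group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        groups[key].append(row[value_field])
    return {key: mean(values) for key, values in groups.items()}


# Speaker level: one value per speaker within a conversation.
speaker_level = average_by(utterances, ("conversation_id", "speaker_id"), "positivity")
# Conversation level: one value per conversation.
conversation_level = average_by(utterances, ("conversation_id",), "positivity")

print(speaker_level[("c1", "alice")])  # 0.75
print(conversation_level[("c1",)])     # 0.5
```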

**We generate a separate output file for each level.** When you declare a FeatureBuilder, you can use the `output_file_base` to define a base path shared among all three levels, and an output path will be automatically generated for each level of analysis.
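
As a rough guide to where the files land, the sketch below reproduces the naming pattern shown in the comments of the sample syntax above. It is an illustration only, not the package's own code: `expected_output_paths` is a hypothetical helper, and the `.csv` extension on the user-level file is an assumption, since the sample comment leaves that file name without an extension.

```python
def expected_output_paths(output_file_base):
    """Build the per-level output paths following the pattern in the sample comments."""
    # "chat" is the utterance level, "conv" the conversation level, "user" the speaker level.
    levels = ("chat", "conv", "user")
    return {
        level: f"output/{level}/{output_file_base}_{level}_level.csv"
        for level in levels
    }


for level, path in expected_output_paths("my_output").items():
    print(level, "->", path)
```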

For more information, please refer to the [Introduction on our Read the Docs Page](https://conversational-featurizer.readthedocs.io/en/latest/intro.html#intro).

# Learn More
Please visit our website, [https://teamcommtools.seas.upenn.edu/](https://teamcommtools.seas.upenn.edu/), for general information about our project and research. For more detailed documentation on our features and examples, please visit our [Read the Docs Page](https://conversational-featurizer.readthedocs.io/en/latest/).

# Becoming a Contributor
If you would like to make pull requests to this open-source repository, please read our [GitHub Repo Getting Started Guide](/github_repo_getting_started.md). We welcome new feature contributions and improvements to our framework.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "team-comm-tools",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "computational social science, teams, communication, conversation, chat, analysis",
    "author": null,
    "author_email": "Xinlan Emily Hu <xehu@wharton.upenn.edu>, Yuxuan Zhang <yuxuanzh@seas.upenn.edu>",
    "download_url": "https://files.pythonhosted.org/packages/88/ef/bec1f15e2b62e1119e32f7fe8c4a2119a43b553599f1aadde15c49fd90fe/team_comm_tools-0.1.4.post2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2022 Xinlan Emily Hu  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "A toolkit that generates a variety of features for team conversation data.",
    "version": "0.1.4.post2",
    "project_urls": {
        "Documentation": "https://conversational-featurizer.readthedocs.io/en/latest/",
        "Homepage": "https://teamcommtools.seas.upenn.edu/",
        "Repository": "https://github.com/Watts-Lab/team-comm-tools"
    },
    "split_keywords": [
        "computational social science",
        " teams",
        " communication",
        " conversation",
        " chat",
        " analysis"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "81546e95077ddeca606fbfd6c6b4805064dfd64f5b9770e124ecdbcb21acc5ab",
                "md5": "7adc3f415dd86bdcf2fb68d7e6073a29",
                "sha256": "e5366e7a3a6f868172255f4e9c63364aee270e0791481daf3efcc57912aeeabc"
            },
            "downloads": -1,
            "filename": "team_comm_tools-0.1.4.post2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7adc3f415dd86bdcf2fb68d7e6073a29",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 219188,
            "upload_time": "2024-10-08T05:32:13",
            "upload_time_iso_8601": "2024-10-08T05:32:13.893250Z",
            "url": "https://files.pythonhosted.org/packages/81/54/6e95077ddeca606fbfd6c6b4805064dfd64f5b9770e124ecdbcb21acc5ab/team_comm_tools-0.1.4.post2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "88efbec1f15e2b62e1119e32f7fe8c4a2119a43b553599f1aadde15c49fd90fe",
                "md5": "b40af800299ca3749dadab1c6c563d06",
                "sha256": "6286b9fcde9a67fb370b6f4f97910d045874e3c01e5a6334b2f2990f0800b173"
            },
            "downloads": -1,
            "filename": "team_comm_tools-0.1.4.post2.tar.gz",
            "has_sig": false,
            "md5_digest": "b40af800299ca3749dadab1c6c563d06",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 210442,
            "upload_time": "2024-10-08T05:32:15",
            "upload_time_iso_8601": "2024-10-08T05:32:15.921728Z",
            "url": "https://files.pythonhosted.org/packages/88/ef/bec1f15e2b62e1119e32f7fe8c4a2119a43b553599f1aadde15c49fd90fe/team_comm_tools-0.1.4.post2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-08 05:32:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Watts-Lab",
    "github_project": "team-comm-tools",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "chardet",
            "specs": [
                [
                    ">=",
                    "3.0.4"
                ]
            ]
        },
        {
            "name": "convokit",
            "specs": [
                [
                    "==",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "emoji",
            "specs": [
                [
                    "==",
                    "1.7.0"
                ]
            ]
        },
        {
            "name": "flask",
            "specs": [
                [
                    "==",
                    "3.0.3"
                ]
            ]
        },
        {
            "name": "gensim",
            "specs": [
                [
                    ">=",
                    "4.3.3"
                ]
            ]
        },
        {
            "name": "nltk",
            "specs": [
                [
                    "==",
                    "3.9.1"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "<",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.2.2"
                ]
            ]
        },
        {
            "name": "pyphen",
            "specs": [
                [
                    "==",
                    "0.14.0"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    "==",
                    "8.3.2"
                ]
            ]
        },
        {
            "name": "pytest-runner",
            "specs": [
                [
                    "==",
                    "6.0.1"
                ]
            ]
        },
        {
            "name": "python-dateutil",
            "specs": [
                [
                    "==",
                    "2.9.0"
                ]
            ]
        },
        {
            "name": "pytz",
            "specs": [
                [
                    "==",
                    "2024.1"
                ]
            ]
        },
        {
            "name": "regex",
            "specs": [
                [
                    "==",
                    "2023.12.25"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    "==",
                    "1.5.1"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    "<",
                    "1.14.0"
                ]
            ]
        },
        {
            "name": "sentence-transformers",
            "specs": [
                [
                    ">=",
                    "2.3.1"
                ]
            ]
        },
        {
            "name": "sentencepiece",
            "specs": [
                [
                    ">=",
                    "0.2.0"
                ]
            ]
        },
        {
            "name": "spacy",
            "specs": [
                [
                    ">=",
                    "3.7.2"
                ]
            ]
        },
        {
            "name": "spacy-legacy",
            "specs": [
                [
                    "==",
                    "3.0.12"
                ]
            ]
        },
        {
            "name": "spacy-loggers",
            "specs": [
                [
                    "==",
                    "1.0.5"
                ]
            ]
        },
        {
            "name": "textblob",
            "specs": [
                [
                    "==",
                    "0.17.1"
                ]
            ]
        },
        {
            "name": "tokenizers",
            "specs": [
                [
                    "==",
                    "0.19.1"
                ]
            ]
        },
        {
            "name": "torch",
            "specs": [
                [
                    "==",
                    "2.4.1"
                ]
            ]
        },
        {
            "name": "torchaudio",
            "specs": [
                [
                    "==",
                    "2.4.1"
                ]
            ]
        },
        {
            "name": "torchvision",
            "specs": [
                [
                    "==",
                    "0.19.1"
                ]
            ]
        },
        {
            "name": "transformers",
            "specs": [
                [
                    "==",
                    "4.44.0"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    ">=",
                    "4.66.5"
                ]
            ]
        },
        {
            "name": "tzdata",
            "specs": [
                [
                    ">=",
                    "2023.3"
                ]
            ]
        },
        {
            "name": "tzlocal",
            "specs": [
                [
                    "==",
                    "5.2"
                ]
            ]
        }
    ],
    "lcname": "team-comm-tools"
}
        