texture-viz


- Name: texture-viz
- Version: 0.0.4
- Home page: https://github.com/cmudig/Texture
- Summary: Process and profile text datasets interactively
- Upload time: 2024-05-08 21:30:28
- Author: Will Epperson
- Requires Python: <4.0,>=3.10
- Keywords: text, nlp, data profiling, llm

# Texture: Structured Text Analytics

[![PyPi](https://img.shields.io/pypi/v/texture-viz.svg)](https://pypi.org/project/texture-viz/)

Texture is a system for exploring and creating structured insights with your text datasets.

1. **Interactive attribute profiles**: Texture visualizes structured attributes alongside your text data in interactive, cross-filterable charts.
2. **Flexible attribute definitions**: Attributes can come from different tables and describe any level of a document, such as words, sentences, or whole documents.
3. **Derive new attributes**: Texture helps you derive new attributes during analysis with code and LLM transformations.

![screenshot of Texture interface](.github/screenshots/texture_sc.png)

## Install and run

Install texture with pip:

```bash
pip install texture-viz
```

Then run Texture from a Python script or notebook by passing it a dataframe that holds your text data and attributes.

```python
import texture

# df: a pandas DataFrame with your text column and attribute columns
texture.run(df)
```

## Texture Configuration

You can pass optional arguments to the [`run`](./texture/runner.py) function to configure the interface. Notable configuration options are:

- `embeddings: np.ndarray`: Embeddings of your text data, which enable similarity search and a projection overview. If you already have a 2D projection of these embeddings, you must provide it as the columns `umap_x` and `umap_y` in the dataframe.
- `column_info: List[ColumnInputInfo]`: Overrides default column types and provides derived tables. Texture automatically infers the type of each column (text, categorical, number, or date), but you can override the inferred types here. You can also describe columns that come from another table, such as a word table.
- `api_key`: Your OpenAI API key to enable LLM attribute derivation.
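
Because `column_info` entries are plain dictionaries, a typo in a type name is easy to miss. The sketch below builds entries through a small helper that checks the type against the four types listed above. The helper `column_entry` and the constant `VALID_TYPES` are hypothetical names introduced for illustration and are not part of Texture; the dictionary keys mirror the ones used in this README.

```python
from typing import Any, Optional

# Column types named in this README; the helper itself is hypothetical.
VALID_TYPES = {"text", "categorical", "number", "date"}


def column_entry(
    name: str,
    col_type: str,
    derived_from: Optional[str] = None,
    table_data: Any = None,
) -> dict:
    """Build one column_info entry, rejecting unknown types early."""
    if col_type not in VALID_TYPES:
        raise ValueError(f"unknown column type {col_type!r} for {name!r}")
    entry = {"name": name, "type": col_type}
    # Only derived columns carry a source column and a separate table.
    if derived_from is not None:
        entry["derived_from"] = derived_from
        entry["table_data"] = table_data
    return entry


column_info = [
    column_entry("Abstract", "text"),
    column_entry("Year", "number"),
]
```

The resulting list could then be passed as `column_info=column_info` to `texture.run`, assuming the dictionary form shown in this README is accepted as-is.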

Texture provides preprocessing functions that calculate embeddings, projections, and word tables. Use them to prepare your data before launching the Texture app.

```python
import pandas as pd
import texture

df_vis_papers = pd.read_parquet("https://raw.githubusercontent.com/cmudig/Texture/main/examples/vis_papers/vis_paper_data.parquet")

# get embeddings and projection
embeddings, projection = texture.preprocess.get_embeddings_and_projection(
    df_vis_papers["Abstract"], ".", "all-mpnet-base-v2"
)

df_vis_papers["umap_x"] = projection[:, 0]
df_vis_papers["umap_y"] = projection[:, 1]

# get word table
df_words = texture.preprocess.get_df_words_w_span(df_vis_papers["Abstract"], df_vis_papers["id"])

# launch texture
texture.run(
    df_vis_papers,
    embeddings=embeddings,
    column_info=[
        {"name": "Abstract", "type": "text"},
        {"name": "Title", "type": "categorical"},
        {"name": "Year", "type": "number"},
        {
            "name": "word",
            "derived_from": "Abstract",
            "table_data": df_words,
            "type": "categorical",
        },
    ],
)
```
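
To make the idea of a word-level derived table concrete, here is a minimal sketch of one: each row links a word back to its source document by id and records where the word sits in the text. This is not Texture's implementation. The function `words_with_spans`, the `\w+` tokenizer, and the field names `span_start` and `span_end` are assumptions for illustration, and the actual output of `get_df_words_w_span` may differ.

```python
import re


def words_with_spans(texts, ids):
    """Return one row per word: document id, word, and character span."""
    rows = []
    for doc_id, text in zip(ids, texts):
        # \w+ is a simple stand-in tokenizer, not Texture's own.
        for match in re.finditer(r"\w+", text):
            rows.append(
                {
                    "id": doc_id,
                    "word": match.group().lower(),
                    "span_start": match.start(),
                    "span_end": match.end(),
                }
            )
    return rows


rows = words_with_spans(["Texture profiles text."], [7])
```

Keeping the document id on every row is what lets a chart over words filter the documents they came from.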

## Dev install

See [DEV.md](DEV.md) for dev workflows and setup.

            
