| Field | Value |
|---|---|
| Name | kangas |
| Version | 2.4.7 |
| Home page | https://github.com/comet-ml/kangas |
| Summary | Tool for exploring columnar data, including multimedia |
| Upload time | 2024-01-26 13:23:50 |
| Docs URL | None |
| Author | Kangas Development Team |
| Requires Python | >=3.7 |
| License | MIT License |
| Keywords | data science, python, machine learning |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |

<div align="center">
<img src="https://raw.githubusercontent.com/comet-ml/kangas/main/imgs/kangas-datagrid.png"><br>
</div>

-----------------

<p align="center">
    <a href="https://badge.fury.io/py/kangas">
        <img src="https://badge.fury.io/py/kangas.png" alt="PyPI version" height="18">
    </a>
    <a rel="nofollow" href="https://opensource.org/licenses/Apache-2.0">
        <img alt="GitHub" src="https://img.shields.io/badge/License-Apache%202.0-blue.svg">
    </a>
    <a rel="nofollow" href="https://colab.research.google.com/github/comet-ml/kangas/blob/main/notebooks/DataGrid-Getting%20Started.ipynb">
        <img src="https://colab.research.google.com/assets/colab-badge.svg">
    </a>
    <a href="https://kangas.comet.com?datagrid=/data/coco-500.datagrid" rel="nofollow">
        <img src="https://img.shields.io/badge/Kangas-Live%20Demo-blue.svg" alt="Kangas Live Demo">
    </a>
    <a href="https://github.com/comet-ml/kangas/wiki" rel="nofollow">
        <img src="https://img.shields.io/badge/Kangas-Docs-blue.svg" alt="Kangas Documentation">
    </a>
    <a rel="nofollow" href="https://pepy.tech/project/kangas">
        <img style="max-width: 100%;" data-canonical-src="https://pepy.tech/badge/kangas" alt="Downloads"  src="https://camo.githubusercontent.com/708e470ec83922035f2189544eb968c8c5bba5c8623b0ebb9cb88c5c370766c4/68747470733a2f2f706570792e746563682f62616467652f6b616e676173">
    </a>
    <a rel="nofollow" href="https://doi.org/10.5281/zenodo.7410884">
        <img src="https://zenodo.org/badge/DOI/10.5281/zenodo.7410884.svg" alt="DOI">
    </a>
</p>

# Kangas: Explore Multimedia Datasets at Scale :kangaroo:

Kangas is a tool for exploring, analyzing, and visualizing large-scale multimedia data. It provides a straightforward Python API
for logging large tables of data, along with an intuitive visual interface for performing complex queries against your dataset.

The key features of Kangas include:

- **Scalability**. Kangas DataGrid, the fundamental class for representing datasets, can easily store millions of rows of data.
- **Performance**. Group, sort, and filter across millions of data points in seconds with a simple, fast UI.
- **Interoperability**. Any data, any environment. Kangas can run in a notebook or as a standalone app, both locally and remotely.
- **Integrated computer vision support**. Visualize and filter bounding boxes, labels, and metadata without any extra setup.

You can access a live demo of Kangas at <a href="https://kangas.comet.com?datagrid=/data/coco-500.datagrid">kangas.comet.com</a>. 

## Getting Started

Kangas is available as a Python library via pip:
```
pip install kangas
```

Once installed, there are many ways to load or create a DataGrid. 

Without writing any code, you can even download a DataGrid and begin exploring the data. At the console:

```
kangas server https://github.com/caleb-kaiser/kangas_examples/raw/master/coco-500.datagrid.zip
```

That's it!

The next example loads a publicly available DataGrid file, but the Kangas API also provides methods for ingesting CSVs and Pandas DataFrames, and for manually constructing a new DataGrid:

```python
import kangas as kg

# Load an existing DataGrid
dg = kg.read_datagrid("https://github.com/caleb-kaiser/kangas_examples/raw/master/coco-500.datagrid.zip")
```
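
Manual construction, mentioned above, might look like the following. This is a minimal sketch, assuming the `DataGrid(name=..., columns=...)` constructor and the `extend()` method behave as described in the Kangas wiki. The column names, the sample values, and the `make_rows` and `build_datagrid` helpers are hypothetical.

```python
def make_rows():
    """Return hypothetical training results as plain Python lists."""
    layer_sizes = [8, 16, 64]
    losses = [0.97, 0.53, 0.12]
    return [[size, loss] for size, loss in zip(layer_sizes, losses)]


def build_datagrid(rows):
    """Build a DataGrid from a list of rows (assumes Kangas is installed)."""
    import kangas as kg  # imported lazily so make_rows() works without Kangas

    # Assumption: columns are declared up front, then rows are added in bulk.
    dg = kg.DataGrid(name="example", columns=["hidden_layer_size", "loss"])
    dg.extend(rows)
    return dg


rows = make_rows()
```

Calling `build_datagrid(rows)` should then give you a DataGrid equivalent to one loaded from a file, with one row per entry in `rows`.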

After your DataGrid is initialized, you can render it within the Kangas Viewer directly from Python:

```python
dg.show()
```
<img width="1789" alt="image" src="https://user-images.githubusercontent.com/42076840/197875668-5519d504-2209-472f-952e-ed09554ecb7a.png">

From the Kangas Viewer, you can group, sort, and filter data. In addition, Kangas will do its best to parse any metadata attached to your assets. For example, if you're using the COCO-500 DataGrid from the quickstart above, Kangas will automatically parse labels and scores for each image:

<img src="https://github.com/caleb-kaiser/kangas_examples/blob/master/Oct-25-2022%2016-43-56.gif">

And voil&agrave;! You're now up and running with Kangas.

### Pandas DataFrames

Kangas can also read Pandas DataFrame objects directly:

```python
import kangas as kg
import pandas as pd

df = pd.DataFrame({"hidden_layer_size": [8, 16, 64], "loss": [0.97, 0.53, 0.12]})
dg = kg.read_dataframe(df)
```
### HuggingFace Datasets

HuggingFace datasets can also be loaded into a DataGrid directly, because they consist of
rows of dictionaries and represent images as PIL images. DataGrid will
automatically convert PIL images into a [Kangas Image](https://github.com/comet-ml/kangas/wiki/Image#image):

```python
import kangas as kg
from datasets import load_dataset

dataset = load_dataset("beans", split="train")
dg = kg.DataGrid(dataset)
```

### Parquet files

> **Note**: You will need to have pyarrow installed to read parquet files.

```python
import kangas as kg

dg = kg.read_parquet("https://github.com/Teradata/kylo/raw/master/samples/sample-data/parquet/userdata5.parquet")
```

If you'd like to explore further, take a look at our example notebooks below:

## Documentation

1. <a href="https://github.com/comet-ml/kangas/wiki">Documentation Homepage</a>
2. <a href="https://github.com/comet-ml/kangas/blob/main/notebooks/DataGrid-Getting%20Started.ipynb">Quickstart Notebook</a> <a href="https://colab.research.google.com/github/comet-ml/kangas/blob/main/notebooks/DataGrid-Getting%20Started.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg"></a>
3. <a href="https://github.com/comet-ml/kangas/blob/main/notebooks/Integrations.ipynb">Integrations Notebook</a> <a href="https://colab.research.google.com/github/comet-ml/kangas/blob/main/notebooks/Integrations.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg"></a>
4. <a href="https://github.com/comet-ml/kangas/blob/main/examples/mnist_script.py">MNIST Classification Example</a>

## FAQ

### Is Kangas ready for public use?
Kangas is currently in open beta. We stress test Kangas heavily and often, and are confident in sharing it with the public. That said, it is a very young project, and there will be bugs and rough edges. New features will also be added at a fast pace, so if you find a bug or have a request, please do not hesitate to open a ticket or start a discussion.

### Does Kangas support _____ system?
Kangas can be run as a standalone application on newer versions of Windows, macOS, and most popular Linux distributions. In addition, Kangas can run remotely via Google Colab, or within any Jupyter notebook environment.

### When should I use Kangas instead of _____?
#### Pandas
Kangas and Pandas are complementary tools. Once you've wrangled your data into a Pandas DataFrame, Kangas can ingest that DataFrame via the `DataGrid.read_dataframe()` method, making it easy to visualize and explore your tabular data. Additionally, if your data is too large to process in Pandas or involves multimedia assets, Kangas is a strong alternative.

#### TensorBoard
TensorBoard is one of several tools (including Kangas' parent organization, [Comet](https://www.comet.com/site/?utm_source=kangas&utm_medium=referral&utm_campaign=kangas_datagrids_2022&utm_content=github)) that specialize in experiment management and monitoring. Like Kangas, it provides charting and visualizations out of the box, but it is specifically designed for analyzing training workflows. Kangas, in contrast, is designed to analyze any dataset. For example, even if you use a tool like TensorBoard for analyzing training runs, you may still use Kangas before training for exploratory data analysis, or for prediction analysis post-deployment.
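
As one way to prepare data for the prediction analysis mentioned above, per-sample results can be collected as rows of dictionaries before loading them into a DataGrid. This is only a sketch: the `prediction_rows` and `to_datagrid` helpers, the column names, the threshold, and the sample values are all hypothetical, and it assumes that `DataGrid` accepts a list of dictionaries in the same way it accepts the rows of a HuggingFace dataset.

```python
def prediction_rows(labels, scores, threshold=0.5):
    """Turn binary labels and predicted scores into one dict per sample."""
    rows = []
    for index, (label, score) in enumerate(zip(labels, scores)):
        predicted = int(score >= threshold)
        rows.append(
            {
                "sample": index,
                "label": label,
                "score": score,
                "predicted": predicted,
                "correct": predicted == label,
            }
        )
    return rows


def to_datagrid(rows, name="predictions"):
    """Load the rows into a DataGrid (assumes Kangas is installed)."""
    import kangas as kg  # lazy import: prediction_rows() needs only Python

    # Assumption: a list of dicts is accepted as the first argument.
    return kg.DataGrid(rows, name=name)


rows = prediction_rows(labels=[1, 0, 1, 0], scores=[0.91, 0.42, 0.35, 0.77])
```

With the rows in a DataGrid, grouping or filtering on the `correct` column in the Kangas Viewer is one way to look at misclassified samples.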

### What is Kangas' relationship with Comet?
Kangas is developed and maintained by the Research team at [Comet](https://www.comet.com/site/?utm_source=kangas&utm_medium=referral&utm_campaign=kangas_datagrids_2022&utm_content=github). It began life as a prototype for Comet users who needed to visualize large computer vision datasets, and was later spun out into a standalone open source project. Kangas is and always will be free and open source software, and we are more than happy to accept community contributions.

## Contributing
Kangas has only recently been released, and as such, we don't have much of a formal process for contributions. If you have an idea or would like to make a contribution, we recommend opening a ticket describing your proposed contribution so that we can collaborate directly. We love working with community contributors.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/comet-ml/kangas",
    "name": "kangas",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "data science,python,machine learning",
    "author": "Kangas Development Team",
    "author_email": "",
    "download_url": "",
    "platform": "Linux",
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "Tool for exploring columnar data, including multimedia",
    "version": "2.4.7",
    "project_urls": {
        "Homepage": "https://github.com/comet-ml/kangas"
    },
    "split_keywords": [
        "data science",
        "python",
        "machine learning"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b202f246ae42b056dc02a6275c105db9506c68db5817c073ec3eddb1321f6203",
                "md5": "017e12c4892938e9d8041afae85fbde2",
                "sha256": "2d5538afa840754da0f44c0aae2dd8e1311759dd524cffb3209c6a38a4ee9e4a"
            },
            "downloads": -1,
            "filename": "kangas-2.4.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "017e12c4892938e9d8041afae85fbde2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 15725112,
            "upload_time": "2024-01-26T13:23:50",
            "upload_time_iso_8601": "2024-01-26T13:23:50.762837Z",
            "url": "https://files.pythonhosted.org/packages/b2/02/f246ae42b056dc02a6275c105db9506c68db5817c073ec3eddb1321f6203/kangas-2.4.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-26 13:23:50",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "comet-ml",
    "github_project": "kangas",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "kangas"
}
        