vectorizers


:Name: vectorizers
:Version: 0.2.2
:Home page: https://github.com/TutteInstitute/vectorizers
:Summary: A suite of vectorizers for various data types.
:Upload time: 2024-11-13 14:52:42
:Maintainer: John Healy, Leland McInnes, Colin Weir
:Maintainer email: leland.mcinnes@gmail.com
:License: new BSD
:Requirements: numpy, scipy, scikit-learn, numba, pandas, dask, pynndescent >= 0.5, pomegranate

.. -*- mode: rst -*-

.. image:: doc/vectorizers_logo_text.png
  :width: 600
  :alt: Vectorizers Logo

|Travis|_ |AppVeyor|_ |Codecov|_ |CircleCI|_ |ReadTheDocs|_

.. |Travis| image:: https://travis-ci.com/TutteInstitute/vectorizers.svg?branch=master
.. _Travis: https://travis-ci.com/TutteInstitute/vectorizers

.. |AppVeyor| image:: https://ci.appveyor.com/api/projects/status/sjawsgwo7g4k3jon?svg=true
.. _AppVeyor: https://ci.appveyor.com/project/lmcinnes/vectorizers

.. |Codecov| image:: https://codecov.io/gh/TutteInstitute/vectorizers/branch/master/graph/badge.svg
.. _Codecov: https://codecov.io/gh/TutteInstitute/vectorizers


.. |CircleCI| image:: https://circleci.com/gh/TutteInstitute/vectorizers.svg?style=shield&circle-token=:circle-token
.. _CircleCI: https://circleci.com/gh/TutteInstitute/vectorizers

.. |ReadTheDocs| image:: https://readthedocs.org/projects/vectorizers/badge/?version=latest
.. _ReadTheDocs: https://vectorizers.readthedocs.io/en/latest/?badge=latest

===========
Vectorizers
===========

Many machine learning tools are available for exploring and working effectively
with data given as vectors (ideally with a defined notion of distance as well).
A great deal of data, however, does not come neatly packaged as vectors. It
could be text data, variable length sequence data (either numeric or categorical),
dataframes of mixed data types, sets of point clouds, or more. Usually, one way or another,
such data can be wrangled into vectors in a way that preserves some relevant properties
of the original data. This library provides a suite of general purpose techniques
for such wrangling, making it easier and faster for users to get various kinds of
unstructured sequence data into vector formats for exploration and
machine learning.

--------------------
Why use Vectorizers?
--------------------

Data wrangling can be tedious and error-prone, and the resulting code is often fragile
once it is integrated into production pipelines. The vectorizers library aims to provide
a set of easy-to-use tools for turning various kinds of unstructured sequence data into
vectors. By following the scikit-learn transformer API, we ensure that any of the
vectorizer classes can be integrated into existing sklearn workflows or pipelines with
minimal effort. By keeping the vectorization approaches as general as possible (as
opposed to specialising on very specific data types), we aim to ensure that a very broad
range of data can be handled efficiently. Finally, we favour robust techniques with sound
mathematical foundations over potentially more powerful but black-box approaches, for
greater transparency in data processing and transformation.
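
The transformer contract mentioned above comes down to three methods: ``fit``
learns state from the data, ``transform`` maps data to vectors, and
``fit_transform`` does both in one call. The following is a minimal,
standard-library-only sketch of that contract for variable length token
sequences. ``ToyCountVectorizer`` is a hypothetical class written purely for
illustration; it is not part of the vectorizers package, and the real classes
and their parameters should be taken from the package documentation.

.. code:: python

    from collections import Counter


    class ToyCountVectorizer:
        """Hypothetical illustration of the fit/transform contract.

        Not part of the vectorizers package.
        """

        def fit(self, sequences, y=None):
            # Learn a fixed column index for every distinct token seen.
            vocabulary = sorted({token for seq in sequences for token in seq})
            self.column_index_ = {token: i for i, token in enumerate(vocabulary)}
            return self  # returning self allows method chaining

        def transform(self, sequences):
            # Map each variable length sequence to a fixed length count vector.
            # Tokens that were not seen during fit are ignored.
            rows = []
            for seq in sequences:
                counts = Counter(t for t in seq if t in self.column_index_)
                row = [0] * len(self.column_index_)
                for token, count in counts.items():
                    row[self.column_index_[token]] = count
                rows.append(row)
            return rows

        def fit_transform(self, sequences, y=None):
            return self.fit(sequences).transform(sequences)


    sequences = [["a", "b", "a"], ["b", "c"], ["c", "c", "c", "a"]]
    vectors = ToyCountVectorizer().fit_transform(sequences)
    print(vectors)  # [[2, 1, 0], [0, 1, 1], [1, 0, 3]]

Sequences of different lengths all come out as rows of the same width, which is
what downstream vector-based tools expect. scikit-learn estimators usually also
inherit from ``BaseEstimator`` and ``TransformerMixin``, which this sketch
omits to stay dependency free.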

----------------------
How to use Vectorizers
----------------------

Quick start examples to be added soon ...

For further examples of using this library on text, we recommend the documentation
written in the EasyData reproducible data science framework by some of our colleagues:
https://github.com/hackalog/vectorizers_playground

----------
Installing
----------

Vectorizers is a pure Python module designed to be easy to install, with
relatively light requirements:

* numpy
* scipy
* scikit-learn >= 0.22
* numba >= 0.51
* pandas
* dask
* pynndescent >= 0.5
* pomegranate

To install the package from PyPI:

.. code:: bash

    pip install vectorizers

To install the package from source:

.. code:: bash

    pip install https://github.com/TutteInstitute/vectorizers/archive/master.zip
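
If an install misbehaves, one way to see which requirements are present is to
query the installed distributions. The sketch below uses only the standard
library (``importlib.metadata``, Python 3.8 or later). The helper names
``release_tuple`` and ``check_requirements`` are hypothetical, and the version
comparison is deliberately simplified: it looks only at the leading numeric
components and is not a full implementation of Python's version rules.

.. code:: python

    from importlib.metadata import PackageNotFoundError, version

    # Minimums as stated in the README and the package metadata;
    # None means any version is accepted.
    MINIMUMS = {
        "numpy": None,
        "scipy": None,
        "scikit-learn": "0.22",
        "numba": "0.51",
        "pandas": None,
        "dask": None,
        "pynndescent": "0.5",
        "pomegranate": None,
    }


    def release_tuple(version_string):
        # Keep only the leading numeric components: "1.4.2.post1" -> (1, 4, 2).
        parts = []
        for piece in version_string.split("."):
            digits = ""
            for ch in piece:
                if not ch.isdigit():
                    break
                digits += ch
            if not digits:
                break
            parts.append(int(digits))
        return tuple(parts)


    def check_requirements(minimums=MINIMUMS):
        # Return a {package name: status string} report.
        report = {}
        for name, minimum in minimums.items():
            try:
                installed = version(name)
            except PackageNotFoundError:
                report[name] = "missing"
                continue
            too_old = (
                minimum is not None
                and release_tuple(installed) < release_tuple(minimum)
            )
            if too_old:
                report[name] = f"{installed} (needs >= {minimum})"
            else:
                report[name] = f"{installed} ok"
        return report


    for package, status in check_requirements().items():
        print(f"{package}: {status}")

The output depends on the environment, so none is shown here. Any package
reported as ``missing`` or too old can be installed or upgraded with ``pip``.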

----------------
Help and Support
----------------

This project is still young and the `documentation <https://vectorizers.readthedocs.io/en/latest/>`_ is still growing. In the meantime, please
`open an issue <https://github.com/TutteInstitute/vectorizers/issues/new>`_
and we will try to provide any help and guidance that we can. Please also check
the docstrings in the code, which provide some description of the parameters.

------------
Contributing
------------

Contributions are more than welcome! There are lots of opportunities
for potential projects, so please get in touch if you would like to
help out. Code, notebooks, examples, and documentation
are all *equally valuable*, so please don't feel
you can't contribute. We would greatly appreciate the contribution of
tutorial notebooks applying vectorizer tools to diverse or interesting
datasets. If you find vectorizers useful for your data, please consider
contributing an example showing how it can apply to the kind of data
you work with!


To contribute, please `fork the project <https://github.com/TutteInstitute/vectorizers/issues#fork-destination-box>`_, make your changes, and
submit a pull request. We will do our best to work through any issues with
you and get your code merged into the main branch.

-------
License
-------

The vectorizers package is 3-clause BSD licensed.



            
