:Name: shellinford
:Version: 0.4.1
:Home page: https://github.com/ikegami-yukino/shellinford-python
:Summary: Wavelet Matrix/Tree succinct data structure for full text search (using shellinford C++ library)
:Upload time: 2019-02-08 13:56:24
:Author: Yukino Ikegami
:Keywords: full text search, fm-index, wavelet matrix
:Requirements: No requirements were recorded.

shellinford
===========
|travis| |coveralls| |pyversion| |version| |license|

Shellinford is an implementation of a Wavelet Matrix/Tree succinct data structure for document retrieval.

It is based on the `shellinford`_ C++ library.

.. _shellinford: https://github.com/echizentm/shellinford
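
As background, the counting step of an FM-index can be sketched in pure Python. This is only an illustration of the general technique (Burrows-Wheeler transform plus backward search), not the code used by this module or by the C++ library. The function names ``bwt_from_text`` and ``count_occurrences`` are made up for this sketch.

.. code:: python

 def bwt_from_text(text):
     # Append a terminator that sorts before every other character,
     # then take the last column of the sorted rotations.
     text = text + "\0"
     rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
     return "".join(rotation[-1] for rotation in rotations)


 def count_occurrences(bwt, pattern):
     # c_table[ch] = number of characters in the text smaller than ch.
     frequencies = {}
     for ch in bwt:
         frequencies[ch] = frequencies.get(ch, 0) + 1
     c_table, total = {}, 0
     for ch in sorted(frequencies):
         c_table[ch] = total
         total += frequencies[ch]

     def rank(ch, i):
         # Occurrences of ch in bwt[:i]; naive linear scan in this sketch.
         return bwt.count(ch, 0, i)

     # Backward search: narrow the suffix range one character at a time,
     # starting from the last character of the pattern.
     lo, hi = 0, len(bwt)
     for ch in reversed(pattern):
         if ch not in c_table:
             return 0
         lo = c_table[ch] + rank(ch, lo)
         hi = c_table[ch] + rank(ch, hi)
         if lo >= hi:
             return 0
     return hi - lo


 bwt = bwt_from_text("abracadabra")
 print(count_occurrences(bwt, "abra"))

Here ``rank`` scans the string on every call. A wavelet tree or wavelet matrix is a data structure that answers the same rank query without a linear scan, which is the role it plays in an FM-index.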

NOTE: This module requires a C++11 compiler.

Installation
============

::

 $ pip install shellinford


Usage
=====

Create a new FM-index instance
-------------------------------

.. code:: python

 >>> import shellinford
 >>> fm = shellinford.FMIndex()


- shellinford.FMIndex([use_wavelet_tree=True, filename=None])

  - When given a filename, the FM-index data is loaded from that file


Build FM-index
-----------------------------

.. code:: python

 >>> fm.build(['Milky Holmes', 'Sherlock "Sheryl" Shellingford', 'Milky'], 'milky.fm')

- build([docs, filename])

  - When given a filename, Shellinford stores FM-index data to the file


Search word from FM-index
---------------------------------

.. code:: python

 >>> for doc in fm.search('Milky'):
 ...     print('doc_id:', doc.doc_id)
 ...     print('count:', doc.count)
 ...     print('text:', doc.text)
 doc_id: 0
 count: [1]
 text: Milky Holmes
 doc_id: 2
 count: [1]
 text: Milky

 >>> for doc in fm.search(['Milky', 'Holmes']):
 ...     print('doc_id:', doc.doc_id)
 ...     print('count:', doc.count)
 ...     print('text:', doc.text)
 doc_id: 0
 count: [1]
 text: Milky Holmes

- search(query, [_or=False, ignores=[]])

  - If `_or` is True, an "OR" search is executed; otherwise an "AND" search is executed
  - When `ignores` is given, a "NOT" search is also executed
  - NOTE: The search function is available only after the FM-index has been built or loaded
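
The "AND", "OR" and "NOT" semantics described above can be modelled with a naive pure-Python reference function. This is only a sketch for reasoning about expected results, not the module's implementation. The name ``reference_search``, the ``Result`` tuple and the per-query count list are assumptions made for illustration, and ``str.count`` counts non-overlapping matches, which may differ from what an FM-index reports.

.. code:: python

 from collections import namedtuple

 # Hypothetical result record mirroring the attributes used in the examples.
 Result = namedtuple("Result", ["doc_id", "count", "text"])


 def reference_search(docs, query, _or=False, ignores=()):
     # Accept a single string or a sequence of strings.
     queries = [query] if isinstance(query, str) else list(query)
     results = []
     for doc_id, text in enumerate(docs):
         # "NOT" search: skip documents containing any ignored string.
         if any(ignored in text for ignored in ignores):
             continue
         counts = [text.count(q) for q in queries]
         # "OR" needs at least one hit, "AND" needs a hit for every query.
         matched = any(counts) if _or else all(counts)
         if matched:
             results.append(Result(doc_id, counts, text))
     return results


 docs = ['Milky Holmes', 'Sherlock "Sheryl" Shellingford', 'Milky']
 for doc in reference_search(docs, 'Milky', ignores=['Holmes']):
     print(doc.doc_id, doc.text)

Comparing the module's results against a brute-force model like this one on a small document set is a simple way to check query logic before indexing a large corpus.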


Count word from FM-index
---------------------------------

.. code:: python

 >>> fm.count('Milky')
 2

 >>> fm.count(['Milky', 'Holmes'])
 1

- count(query, [_or=False])

  - If `_or` is True, an "OR" search is executed; otherwise an "AND" search is executed
  - NOTE: The count function is available only after the FM-index has been built or loaded
  - This function is slightly faster than the search function
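
Judging from the example above, the returned value is the number of matching documents. A brute-force model of that behaviour, useful only as a cross-check on small inputs, might look like the sketch below. The name ``reference_count`` is hypothetical, and the code is not taken from the module.

.. code:: python

 def reference_count(docs, query, _or=False):
     # Accept a single string or a sequence of strings.
     queries = [query] if isinstance(query, str) else list(query)
     # "OR" needs any query to occur, "AND" needs all of them.
     combine = any if _or else all
     return sum(1 for text in docs if combine(q in text for q in queries))


 docs = ['Milky Holmes', 'Sherlock "Sheryl" Shellingford', 'Milky']
 print(reference_count(docs, 'Milky'))
 print(reference_count(docs, ['Milky', 'Holmes']))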



Add a document
---------------------------------

.. code:: python

 >>> fm.push_back('Baritsu')

- push_back(doc)

  - NOTE: A document added by this method cannot be searched until the FM-index is built
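
The "add, then build" behaviour can be pictured with a toy class in which added documents are staged and become visible only after a build step. This is a minimal sketch under stated assumptions: ``ToyIndex`` is a made-up stand-in that uses plain lists, and whether the real ``build()`` keeps or replaces earlier documents is not specified here.

.. code:: python

 class ToyIndex:
     """Hypothetical stand-in: staged documents are invisible until build()."""

     def __init__(self):
         self._pending = []   # added but not yet searchable
         self._indexed = []   # searchable after build()

     def push_back(self, doc):
         # Empty strings are ignored, as noted in the 0.3.5 changelog entry.
         if doc:
             self._pending.append(doc)

     def build(self, docs=None):
         for doc in docs or []:
             self.push_back(doc)
         # Assumption of this sketch: build() appends the staged documents.
         self._indexed.extend(self._pending)
         self._pending = []

     def __contains__(self, query):
         # Supports the "in" operator over searchable documents only.
         return any(query in text for text in self._indexed)


 index = ToyIndex()
 index.push_back('Baritsu')
 print('Baritsu' in index)   # staged only, so not found yet
 index.build()
 print('Baritsu' in index)   # found after the build step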


Read FM-index from a binary file
---------------------------------

.. code:: python

 >>> fm.read('milky_holmes.fm')

- read(path)


Write FM-index binary to a file
---------------------------------

.. code:: python

 >>> fm.write('milky_holmes.fm')

- write(path)


Check whether the FM-index contains a string
--------------------------------------------

.. code:: python

 >>> 'baritsu' in fm


License
=========
- Wrapper code is licensed under the New BSD License.
- Bundled `shellinford`_ C++ library (c) 2012 echizen_tm is licensed under the New BSD License.


.. |travis| image:: https://travis-ci.org/ikegami-yukino/shellinford-python.svg?branch=master
    :target: https://travis-ci.org/ikegami-yukino/shellinford-python
    :alt: travis-ci.org

.. |coveralls| image:: https://coveralls.io/repos/ikegami-yukino/shellinford-python/badge.svg?branch=master&service=github
    :target: https://coveralls.io/github/ikegami-yukino/shellinford-python?branch=master
    :alt: coveralls.io

.. |pyversion| image:: https://img.shields.io/pypi/pyversions/shellinford.svg

.. |version| image:: https://img.shields.io/pypi/v/shellinford.svg
    :target: http://pypi.python.org/pypi/shellinford/
    :alt: latest version

.. |license| image:: https://img.shields.io/pypi/l/shellinford.svg
    :target: http://pypi.python.org/pypi/shellinford/
    :alt: license


CHANGES
=======

0.4.1 (2019-02-08)
------------------

- Make "in" operator faster

0.4.0 (2018-09-30)
------------------

- `FMIndex.count()` is added
- Python 2.6 is no longer supported
- Bug fix

0.3.5 (2018-09-05)
------------------

- `FMIndex.build()` and `FMIndex.push_back()` ignore empty strings
- `FMIndex` supports the "in" operator (e.g., 'a' in fm)
- Support Python 3.5, 3.6 and 3.7

0.3.4 (2016-10-28)
------------------

- `FMIndex.search()` returns list

0.3 (2014-11-24)
----------------

- "OR" search and "NOT" search are available in `FMIndex.search()`.
- `FMIndex.size` and `FMIndex.docsize` are available as properties

0.2 (2014-03-28)
----------------

"AND" search is available by giving a sequence (list, tuple, etc.) to `FMIndex.search()`

0.1 (2014-03-11)
----------------

First release.
            
