shellinford
===========
|travis| |coveralls| |pyversion| |version| |license|

Shellinford is an implementation of a Wavelet Matrix/Tree succinct data structure for document retrieval.

It is based on the `shellinford`_ C++ library.

.. _shellinford: https://github.com/echizentm/shellinford

NOTE: This module requires a C++11 compiler.

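
If you are new to FM-indexes, this pure-Python sketch shows the core idea. It builds the Burrows-Wheeler transform (BWT) of a text, then counts a pattern by *backward search*. Backward search needs only one query, ``rank(c, i)``, which returns the number of occurrences of ``c`` in the first ``i`` BWT characters. This is a naive illustration, not shellinford's implementation, and ``TinyFMIndex`` is a hypothetical name. Construction here sorts suffixes directly and ``rank`` scans a string. In practice, a succinct structure such as a wavelet tree or wavelet matrix is what makes ``rank`` fast.

.. code:: python

    def bwt(text: str, sentinel: str = "\0") -> str:
        """Burrows-Wheeler transform built by naively sorting all suffixes."""
        s = text + sentinel  # sentinel must be unique and smaller than every text char
        suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
        # The BWT character is the one preceding each sorted suffix (wrapping around).
        return "".join(s[i - 1] for i in suffix_array)


    class TinyFMIndex:
        """Hypothetical, non-succinct FM-index used only for illustration."""

        def __init__(self, text: str) -> None:
            self.bwt = bwt(text)
            # c_table[c] = number of characters in the BWT that sort before c.
            self.c_table = {}
            total = 0
            for c in sorted(set(self.bwt)):
                self.c_table[c] = total
                total += self.bwt.count(c)

        def rank(self, c: str, i: int) -> int:
            """Occurrences of c in bwt[:i]; a wavelet tree/matrix answers this quickly."""
            return self.bwt[:i].count(c)

        def count(self, pattern: str) -> int:
            """Number of occurrences of pattern, found by backward search."""
            lo, hi = 0, len(self.bwt)  # current range of sorted suffixes
            for c in reversed(pattern):
                if c not in self.c_table:
                    return 0
                lo = self.c_table[c] + self.rank(c, lo)
                hi = self.c_table[c] + self.rank(c, hi)
                if lo >= hi:
                    return 0
            return hi - lo

For example, ``TinyFMIndex('abracadabra').count('abra')`` returns 2.
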
Installation
============

::

    $ pip install shellinford

Usage
=====

Create a new FM-index instance
-------------------------------

.. code:: python

    >>> import shellinford
    >>> fm = shellinford.FMIndex()

- shellinford.FMIndex([use_wavelet_tree=True, filename=None])

  - When ``filename`` is given, the FM-index data is loaded from that file.

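
The ``use_wavelet_tree`` flag suggests a choice between two structures for answering ``rank`` queries over the BWT: a wavelet tree or a wavelet matrix. That reading is an assumption based on the parameter name. To illustrate the wavelet matrix variant, here is a minimal sketch with a hypothetical ``WaveletMatrix`` class. It stores plain prefix-sum lists per bit level instead of compressed bit vectors. So it shows the algorithm, but not the space savings.

.. code:: python

    from typing import List, Sequence, Tuple


    class WaveletMatrix:
        """Hypothetical wavelet matrix over integers in [0, 2**bits), for illustration."""

        def __init__(self, values: Sequence[int], bits: int) -> None:
            # One entry per level, top bit first: (bit, prefix_ones, zeros).
            self.levels: List[Tuple[int, List[int], int]] = []
            cur = list(values)
            for bit in reversed(range(bits)):
                flags = [(v >> bit) & 1 for v in cur]
                prefix_ones = [0]  # prefix_ones[i] = number of 1-bits in flags[:i]
                for f in flags:
                    prefix_ones.append(prefix_ones[-1] + f)
                zeros = len(cur) - prefix_ones[-1]
                self.levels.append((bit, prefix_ones, zeros))
                # Stable partition: elements with a 0 at this bit first, then 1s.
                cur = [v for v, f in zip(cur, flags) if f == 0] + [
                    v for v, f in zip(cur, flags) if f == 1
                ]

        def rank(self, c: int, i: int) -> int:
            """Number of occurrences of value c in values[:i]."""
            b, e = 0, i  # [b, e) tracks matching elements at the current level
            for bit, prefix_ones, zeros in self.levels:
                if (c >> bit) & 1:
                    b = zeros + prefix_ones[b]
                    e = zeros + prefix_ones[e]
                else:
                    b -= prefix_ones[b]  # rank0(b) = b - rank1(b)
                    e -= prefix_ones[e]
            return e - b

Each level partitions the sequence stably by one bit, with zeros first. As a result, ``rank`` visits one level per bit of the alphabet, which is O(log σ) steps for an alphabet of size σ.
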
Build FM-index
-----------------------------

.. code:: python

    >>> fm.build(['Milky Holmes', 'Sherlock "Sheryl" Shellingford', 'Milky'], 'milky.fm')

- build([docs, filename])

  - When ``filename`` is given, the built FM-index data is also saved to that file.

Search for a word in the FM-index
---------------------------------

.. code:: python

    >>> for doc in fm.search('Milky'):
    ...     print('doc_id:', doc.doc_id)
    ...     print('count:', doc.count)
    ...     print('text:', doc.text)
    doc_id: 0
    count: [1]
    text: Milky Holmes
    doc_id: 2
    count: [1]
    text: Milky

    >>> for doc in fm.search(['Milky', 'Holmes']):
    ...     print('doc_id:', doc.doc_id)
    ...     print('count:', doc.count)
    ...     print('text:', doc.text)
    doc_id: 0
    count: [1]
    text: Milky Holmes

- search(query, [_or=False, ignores=[]])

  - If ``_or=True``, an OR search is executed; otherwise, an AND search is executed.
  - If ``ignores`` is given, a NOT search is also executed: documents containing any of the ignored words are excluded.
  - NOTE: ``search()`` is available only after the FM-index has been built or loaded.

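
To make the AND/OR/NOT rules concrete, here is a naive reference model. It scans every document directly instead of using an FM-index. ``reference_search`` and ``Hit`` are hypothetical names. The sketch makes these assumptions:

- Matching is substring matching.
- Occurrences of each query word are counted including overlaps, as an FM-index does.
- ``ignores`` means "exclude any document containing an ignored word".

The exact contents of ``doc.count`` returned by shellinford may differ. Treat this model only as a way to cross-check results on small inputs.

.. code:: python

    from typing import Iterable, List, NamedTuple, Sequence, Union


    class Hit(NamedTuple):
        doc_id: int
        count: List[int]  # occurrences of each query word in this document
        text: str


    def count_occurrences(text: str, word: str) -> int:
        """Count possibly overlapping occurrences of word in text."""
        n, pos = 0, text.find(word)
        while pos != -1:
            n += 1
            pos = text.find(word, pos + 1)
        return n


    def reference_search(
        docs: Sequence[str],
        query: Union[str, Sequence[str]],
        _or: bool = False,
        ignores: Iterable[str] = (),
    ) -> List[Hit]:
        """Naive model of AND/OR/NOT search: scans every document directly."""
        words = [query] if isinstance(query, str) else list(query)
        ignores = list(ignores)
        hits = []
        for doc_id, text in enumerate(docs):
            counts = [count_occurrences(text, w) for w in words]
            matched = any(counts) if _or else all(counts)  # OR vs. AND
            excluded = any(w in text for w in ignores)  # NOT
            if matched and not excluded:
                hits.append(Hit(doc_id, counts, text))
        return hits

With the three documents from the build example, ``reference_search(docs, 'Milky', ignores=['Holmes'])`` keeps only document 2.
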
Count matching documents in the FM-index
----------------------------------------

.. code:: python

    >>> fm.count('Milky')
    2

    >>> fm.count(['Milky', 'Holmes'])
    1

- count(query, [_or=False])

  - Returns the number of documents that match the query.
  - If ``_or=True``, an OR search is executed; otherwise, an AND search is executed.
  - NOTE: ``count()`` is available only after the FM-index has been built or loaded.
  - ``count()`` is slightly faster than ``search()``.

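
An FM-index over several documents commonly indexes their concatenation and maps each match position back to a document id. Counting documents then amounts to counting distinct ids. Whether shellinford stores documents exactly this way is an assumption. In this sketch, the hypothetical ``ConcatenatedCorpus`` class:

- joins documents with a separator assumed absent from every document;
- uses ``str.find`` as a stand-in for locating matches with the index;
- maps positions to ids with ``bisect`` over document start offsets.

.. code:: python

    import bisect
    from typing import List, Sequence, Set, Union

    SEP = "\x01"  # hypothetical separator, assumed absent from every document


    class ConcatenatedCorpus:
        """Hypothetical model: all documents joined into one text."""

        def __init__(self, docs: Sequence[str]) -> None:
            self.starts: List[int] = []  # start offset of each document in self.text
            offset = 0
            for d in docs:
                self.starts.append(offset)
                offset += len(d) + len(SEP)
            self.text = SEP.join(docs)

        def doc_of(self, pos: int) -> int:
            """Map a position in the concatenated text to its document id."""
            return bisect.bisect_right(self.starts, pos) - 1

        def docs_containing(self, word: str) -> Set[int]:
            """Ids of documents containing word; str.find stands in for FM-index locate."""
            found: Set[int] = set()
            pos = self.text.find(word)
            while pos != -1:
                found.add(self.doc_of(pos))
                pos = self.text.find(word, pos + 1)
            return found

        def count(self, query: Union[str, Sequence[str]], _or: bool = False) -> int:
            """Number of documents matching all (AND) or any (OR) query words."""
            words = [query] if isinstance(query, str) else list(query)
            if not words:
                return 0
            sets = [self.docs_containing(w) for w in words]
            merged = set.union(*sets) if _or else set.intersection(*sets)
            return len(merged)

This sketch uses the three documents from the build example. With them, ``ConcatenatedCorpus(docs).count('Milky')`` gives 2 and ``count(['Milky', 'Holmes'])`` gives 1, in line with the ``fm.count`` examples in this section.
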
Add a document
---------------------------------

.. code:: python

    >>> fm.push_back('Baritsu')

- push_back(doc)

  - NOTE: A document added with this method is not searchable until ``build()`` is called.

Read FM-index from a binary file
---------------------------------

.. code:: python

    >>> fm.read('milky_holmes.fm')

- read(path)

Write the FM-index to a binary file
-----------------------------------

.. code:: python

    >>> fm.write('milky_holmes.fm')

- write(path)

Check whether the FM-index contains a string
--------------------------------------------

.. code:: python

    >>> 'baritsu' in fm

License
=========

- Wrapper code is licensed under the New BSD License.
- Bundled `shellinford`_ C++ library (c) 2012 echizen_tm is licensed under the New BSD License.

.. |travis| image:: https://travis-ci.org/ikegami-yukino/shellinford-python.svg?branch=master
   :target: https://travis-ci.org/ikegami-yukino/shellinford-python
   :alt: travis-ci.org

.. |coveralls| image:: https://coveralls.io/repos/ikegami-yukino/shellinford-python/badge.svg?branch=master&service=github
   :target: https://coveralls.io/github/ikegami-yukino/shellinford-python?branch=master
   :alt: coveralls.io

.. |pyversion| image:: https://img.shields.io/pypi/pyversions/shellinford.svg

.. |version| image:: https://img.shields.io/pypi/v/shellinford.svg
   :target: http://pypi.python.org/pypi/shellinford/
   :alt: latest version

.. |license| image:: https://img.shields.io/pypi/l/shellinford.svg
   :target: http://pypi.python.org/pypi/shellinford/
   :alt: license

CHANGES
=======

0.4.1 (2019-02-08)
------------------

- Make the "in" operator faster

0.4.0 (2018-09-30)
------------------

- Added ``FMIndex.count()``
- Dropped support for Python 2.6
- Bug fix

0.3.5 (2018-09-05)
------------------

- ``FMIndex.build()`` and ``FMIndex.push_back()`` ignore empty strings
- ``FMIndex`` supports the ``in`` operator (e.g., ``'a' in fm``)
- Added support for Python 3.5, 3.6 and 3.7

0.3.4 (2016-10-28)
------------------

- ``FMIndex.search()`` returns a list

0.3 (2014-11-24)
----------------

- "OR" search and "NOT" search are available in ``FMIndex.search()``
- ``FMIndex.size`` and ``FMIndex.docsize`` are available as properties

0.2 (2014-03-28)
----------------

"AND" search is available by passing a sequence (list, tuple, etc.) to ``FMIndex.search()``.

0.1 (2014-03-11)
----------------

First release.