fasttext-numpy2
===============
fasttext with one line changed to support numpy 2.
install
-------
.. code:: bash

    pip install fasttext-numpy2
or
.. code:: bash

    # clone and cd into
    pip install -e .
build
-----
.. code:: bash

    python -m build
build for pypi
--------------
see ``pypibuild.md``
notes
-----
all credits go to original authors.
fastText |CircleCI|
===================
`fastText <https://fasttext.cc/>`__ is a library for efficient learning
of word representations and sentence classification.
In this document we present how to use fastText in python.
Table of contents
-----------------
- `Requirements <#requirements>`__
- `Installation <#installation>`__
- `Usage overview <#usage-overview>`__

  - `Word representation model <#word-representation-model>`__
  - `Text classification model <#text-classification-model>`__
  - `IMPORTANT: Preprocessing data / encoding
    conventions <#important-preprocessing-data-encoding-conventions>`__
  - `More examples <#more-examples>`__

- `API <#api>`__

  - ```train_unsupervised``
    parameters <#train_unsupervised-parameters>`__
  - ```train_supervised`` parameters <#train_supervised-parameters>`__
  - ```model`` object <#model-object>`__
Requirements
============
`fastText <https://fasttext.cc/>`__ builds on modern Mac OS and Linux
distributions. Since it uses C++11 features, it requires a compiler with
good C++11 support. You will need `Python <https://www.python.org/>`__
(version 2.7 or ≥ 3.4), `NumPy <http://www.numpy.org/>`__ &
`SciPy <https://www.scipy.org/>`__ and
`pybind11 <https://github.com/pybind/pybind11>`__.
Installation
============
To install the latest release, you can do:

.. code:: bash

    $ pip install fasttext
or, to get the latest development version of fasttext, you can install
from our GitHub repository:

.. code:: bash

    $ git clone https://github.com/facebookresearch/fastText.git
    $ cd fastText
    $ sudo pip install .
    $ # or:
    $ sudo python setup.py install
Usage overview
==============
Word representation model
-------------------------
In order to learn word vectors, as `described
here <https://fasttext.cc/docs/en/references.html#enriching-word-vectors-with-subword-information>`__,
we can use ``fasttext.train_unsupervised`` function like this:
.. code:: py

    import fasttext

    # Skipgram model:
    model = fasttext.train_unsupervised('data.txt', model='skipgram')

    # or, cbow model:
    model = fasttext.train_unsupervised('data.txt', model='cbow')
where ``data.txt`` is a training file containing utf-8 encoded text.
The returned ``model`` object represents your learned model, and you can
use it to retrieve information.
.. code:: py

    print(model.words)   # list of words in the dictionary
    print(model['king']) # get the vector of the word 'king'
Saving and loading a model object
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can save your trained model object by calling the function
``save_model``.
.. code:: py

    model.save_model("model_filename.bin")
and retrieve it later with the function ``load_model``:

.. code:: py

    model = fasttext.load_model("model_filename.bin")
For more information about word representation usage of fasttext, you
can refer to our `word representations
tutorial <https://fasttext.cc/docs/en/unsupervised-tutorial.html>`__.
Text classification model
-------------------------
In order to train a text classifier using the method `described
here <https://fasttext.cc/docs/en/references.html#bag-of-tricks-for-efficient-text-classification>`__,
we can use ``fasttext.train_supervised`` function like this:
.. code:: py

    import fasttext

    model = fasttext.train_supervised('data.train.txt')
where ``data.train.txt`` is a text file containing one training sentence
per line along with its labels. By default, we assume that labels are
words prefixed by the string ``__label__``.
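For illustration, the sketch below (with hypothetical labels and sentences, not taken from any fastText dataset) writes a tiny training file in this format:

```python
import os
import tempfile

# Hypothetical training lines: each line is "__label__<label> <sentence>".
lines = [
    "__label__baking Which baking dish is best to bake a banana bread ?",
    "__label__equipment Why not put knives in the dishwasher ?",
]

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("\n".join(lines) + "\n")
    path = f.name

# Every line carries the default label prefix.
with open(path) as f:
    assert all(line.startswith("__label__") for line in f)
os.remove(path)
```

A file like this could then be passed to ``fasttext.train_supervised``.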
Once the model is trained, we can retrieve the list of words and labels:
.. code:: py

    print(model.words)
    print(model.labels)
To evaluate our model by computing the precision at 1 (P@1) and the
recall on a test set, we use the ``test`` function:
.. code:: py

    def print_results(N, p, r):
        print("N\t" + str(N))
        print("P@{}\t{:.3f}".format(1, p))
        print("R@{}\t{:.3f}".format(1, r))

    print_results(*model.test('test.txt'))
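Since ``test`` returns the number of examples together with precision and recall, you can derive further metrics yourself. A minimal sketch (the helper name ``f1_score`` is our own, not part of the fastText API):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. N, p, r = model.test('test.txt'); f1 = f1_score(p, r)
f1_score(0.8, 0.6)  # ~0.686
```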
We can also predict labels for a specific text:

.. code:: py

    model.predict("Which baking dish is best to bake a banana bread ?")
By default, ``predict`` returns only one label: the one with the
highest probability. You can also predict more than one label by
specifying the parameter ``k``:

.. code:: py

    model.predict("Which baking dish is best to bake a banana bread ?", k=3)
If you want to predict more than one sentence, you can pass an array of
strings:

.. code:: py

    model.predict(["Which baking dish is best to bake a banana bread ?", "Why not put knives in the dishwasher?"], k=3)
Of course, you can also save and load a model to/from a file as `in the
word representation usage <#saving-and-loading-a-model-object>`__.
For more information about text classification usage of fasttext, you
can refer to our `text classification
tutorial <https://fasttext.cc/docs/en/supervised-tutorial.html>`__.
Compress model files with quantization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When you want to save a supervised model file, fastText can compress it
to produce a much smaller model file, sacrificing only a little
performance.

.. code:: py

    # with the previously trained `model` object, call:
    model.quantize(input='data.train.txt', retrain=True)

    # then display results and save the new model:
    print_results(*model.test(valid_data))
    model.save_model("model_filename.ftz")
``model_filename.ftz`` will have a much smaller size than
``model_filename.bin``.
For further reading on quantization, you can refer to `this paragraph
from our blog
post <https://fasttext.cc/blog/2017/10/02/blog-post.html#model-compression>`__.
IMPORTANT: Preprocessing data / encoding conventions
----------------------------------------------------
In general it is important to properly preprocess your data. In
particular our example scripts in the `root
folder <https://github.com/facebookresearch/fastText>`__ do this.
fastText assumes UTF-8 encoded text. All text must be `unicode for
Python2 <https://docs.python.org/2/library/functions.html#unicode>`__
and `str for
Python3 <https://docs.python.org/3.5/library/stdtypes.html#textseq>`__.
The passed text will be `encoded as UTF-8 by
pybind11 <https://pybind11.readthedocs.io/en/master/advanced/cast/strings.html?highlight=utf-8#strings-bytes-and-unicode-conversions>`__
before being passed to the fastText C++ library. This means it is
important to use UTF-8 encoded text when building a model. On Unix-like
systems you can convert text using `iconv <https://en.wikipedia.org/wiki/Iconv>`__.
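If you would rather stay in Python than shell out to ``iconv``, the standard library can perform the same conversion. A minimal sketch, assuming the source file is Latin-1 (replace that with your file's actual encoding):

```python
import os
import tempfile

# Hypothetical legacy file encoded in Latin-1.
src = tempfile.NamedTemporaryFile(delete=False)
src.write("café au lait".encode("latin-1"))
src.close()

# Decode with the source encoding, then rewrite the file as UTF-8.
with open(src.name, encoding="latin-1") as f:
    text = f.read()
with open(src.name, "w", encoding="utf-8") as f:
    f.write(text)

with open(src.name, "rb") as f:
    utf8_bytes = f.read()
os.remove(src.name)
```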
fastText will tokenize (split text into pieces) based on the following
ASCII characters (bytes). In particular, it is not aware of UTF-8
whitespace. We advise the user to convert UTF-8 whitespace / word
boundaries into one of the following symbols as appropriate:
- space
- tab
- vertical tab
- carriage return
- formfeed
- the null character
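A rough Python approximation of this splitting (an illustration only, not fastText's actual tokenizer, which also handles the EOS token and MAX_LINE_SIZE):

```python
import re

# The ASCII delimiter bytes listed above: space, tab, vertical tab,
# carriage return, formfeed and the null character.
DELIMITERS = re.compile(r"[ \t\v\r\f\x00]+")

def rough_tokenize(line):
    """Split a single line of text on fastText's delimiter bytes."""
    return [token for token in DELIMITERS.split(line) if token]

rough_tokenize("hello\tworld  foo")  # ['hello', 'world', 'foo']
```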
The newline character is used to delimit lines of text. In particular,
the EOS token is appended to a line of text if a newline character is
encountered. The only exception is if the number of tokens exceeds the
MAX_LINE_SIZE constant as defined in the `Dictionary
header <https://github.com/facebookresearch/fastText/blob/master/src/dictionary.h>`__.
This means if you have text that is not separated by newlines, such as
the `fil9 dataset <http://mattmahoney.net/dc/textdata>`__, it will be
broken into chunks of MAX_LINE_SIZE tokens and the EOS token is not
appended.
The length of a token is the number of UTF-8 characters, determined by
considering the `leading two bits of a
byte <https://en.wikipedia.org/wiki/UTF-8#Description>`__ to identify
`subsequent bytes of a multi-byte
sequence <https://github.com/facebookresearch/fastText/blob/master/src/dictionary.cc>`__.
Knowing this is especially important when choosing the minimum and
maximum length of subwords. Further, the EOS token (as specified in the
`Dictionary
header <https://github.com/facebookresearch/fastText/blob/master/src/dictionary.h>`__)
is considered a character and will not be broken into subwords.
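This counting rule can be sketched in a few lines of Python: a UTF-8 continuation byte has the bit pattern ``10xxxxxx``, so counting every byte that is not a continuation byte gives the number of characters (an illustration of the rule, not fastText's code):

```python
def utf8_char_count(data: bytes) -> int:
    # Continuation bytes look like 10xxxxxx; count every other byte.
    return sum(1 for b in data if (b & 0xC0) != 0x80)

utf8_char_count("héllo".encode("utf-8"))  # 5 characters in 6 bytes
```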
More examples
-------------
In order to have a better knowledge of fastText models, please consider
the main
`README <https://github.com/facebookresearch/fastText/blob/master/README.md>`__
and in particular `the tutorials on our
website <https://fasttext.cc/docs/en/supervised-tutorial.html>`__.
You can find further python examples in `the doc
folder <https://github.com/facebookresearch/fastText/tree/master/python/doc/examples>`__.
As with any package you can get help on any Python function using the
help function.
For example

::

    >>> import fasttext
    >>> help(fasttext.FastText)

    Help on module fasttext.FastText in fasttext:

    NAME
        fasttext.FastText

    DESCRIPTION
        # Copyright (c) 2017-present, Facebook, Inc.
        # All rights reserved.
        #
        # This source code is licensed under the MIT license found in the
        # LICENSE file in the root directory of this source tree.

    FUNCTIONS
        load_model(path)
            Load a model given a filepath and return a model object.

        tokenize(text)
            Given a string of text, tokenize it and return a list of tokens
    [...]
API
===
``train_unsupervised`` parameters
---------------------------------
.. code:: python

    input            # training file path (required)
    model            # unsupervised fasttext model {cbow, skipgram} [skipgram]
    lr               # learning rate [0.05]
    dim              # size of word vectors [100]
    ws               # size of the context window [5]
    epoch            # number of epochs [5]
    minCount         # minimal number of word occurrences [5]
    minn             # min length of char ngram [3]
    maxn             # max length of char ngram [6]
    neg              # number of negatives sampled [5]
    wordNgrams       # max length of word ngram [1]
    loss             # loss function {ns, hs, softmax, ova} [ns]
    bucket           # number of buckets [2000000]
    thread           # number of threads [number of cpus]
    lrUpdateRate     # change the rate of updates for the learning rate [100]
    t                # sampling threshold [0.0001]
    verbose          # verbose [2]
``train_supervised`` parameters
-------------------------------
.. code:: python

    input             # training file path (required)
    lr                # learning rate [0.1]
    dim               # size of word vectors [100]
    ws                # size of the context window [5]
    epoch             # number of epochs [5]
    minCount          # minimal number of word occurrences [1]
    minCountLabel     # minimal number of label occurrences [1]
    minn              # min length of char ngram [0]
    maxn              # max length of char ngram [0]
    neg               # number of negatives sampled [5]
    wordNgrams        # max length of word ngram [1]
    loss              # loss function {ns, hs, softmax, ova} [softmax]
    bucket            # number of buckets [2000000]
    thread            # number of threads [number of cpus]
    lrUpdateRate      # change the rate of updates for the learning rate [100]
    t                 # sampling threshold [0.0001]
    label             # label prefix ['__label__']
    verbose           # verbose [2]
    pretrainedVectors # pretrained word vectors (.vec file) for supervised learning []
``model`` object
----------------
``train_supervised``, ``train_unsupervised`` and ``load_model``
functions return an instance of the ``_FastText`` class, which we
generally call the ``model`` object.

This object exposes these training arguments as properties: ``lr``,
``dim``, ``ws``, ``epoch``, ``minCount``, ``minCountLabel``, ``minn``,
``maxn``, ``neg``, ``wordNgrams``, ``loss``, ``bucket``, ``thread``,
``lrUpdateRate``, ``t``, ``label``, ``verbose``, ``pretrainedVectors``.
So ``model.wordNgrams`` will give you the max length of word ngram used
for training this model.
In addition, the object exposes several functions:

.. code:: python

    get_dimension           # Get the dimension (size) of a lookup vector (hidden layer).
                            # This is equivalent to the `dim` property.
    get_input_vector        # Given an index, get the corresponding vector of the Input Matrix.
    get_input_matrix        # Get a copy of the full input matrix of a Model.
    get_labels              # Get the entire list of labels of the dictionary.
                            # This is equivalent to the `labels` property.
    get_line                # Split a line of text into words and labels.
    get_output_matrix       # Get a copy of the full output matrix of a Model.
    get_sentence_vector     # Given a string, get a single vector representation. This function
                            # assumes to be given a single line of text. We split words on
                            # whitespace (space, newline, tab, vertical tab) and the control
                            # characters carriage return, formfeed and the null character.
    get_subword_id          # Given a subword, return the index (within input matrix) it hashes to.
    get_subwords            # Given a word, get the subwords and their indices.
    get_word_id             # Given a word, get the word id within the dictionary.
    get_word_vector         # Get the vector representation of a word.
    get_words               # Get the entire list of words of the dictionary.
                            # This is equivalent to the `words` property.
    is_quantized            # Whether the model has been quantized.
    predict                 # Given a string, get a list of labels and a list of corresponding probabilities.
    quantize                # Quantize the model, reducing its size and memory footprint.
    save_model              # Save the model to the given path.
    test                    # Evaluate the supervised model on the file given by path.
    test_label              # Return the precision and recall score for each label.
The properties ``words`` and ``labels`` return the words and labels from
the dictionary:

.. code:: py

    model.words   # equivalent to model.get_words()
    model.labels  # equivalent to model.get_labels()
The object overrides the ``__getitem__`` and ``__contains__`` functions
in order to return the representation of a word and to check whether a
word is in the vocabulary.

.. code:: py

    model['king']    # equivalent to model.get_word_vector('king')
    'king' in model  # equivalent to `'king' in model.get_words()`
Join the fastText community
---------------------------
- `Facebook page <https://www.facebook.com/groups/1174547215919768>`__
- `Stack
overflow <https://stackoverflow.com/questions/tagged/fasttext>`__
- `Google
group <https://groups.google.com/forum/#!forum/fasttext-library>`__
- `GitHub <https://github.com/facebookresearch/fastText>`__
.. |CircleCI| image:: https://circleci.com/gh/facebookresearch/fastText/tree/master.svg?style=svg
:target: https://circleci.com/gh/facebookresearch/fastText/tree/master
"filename": "fasttext_numpy2-0.10.4-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
"has_sig": false,
"md5_digest": "b0e74a0b42f8d7e445712259e8a7ff3b",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": null,
"size": 4563745,
"upload_time": "2024-11-08T16:50:31",
"upload_time_iso_8601": "2024-11-08T16:50:31.840169Z",
"url": "https://files.pythonhosted.org/packages/e4/a9/118ff7f2b38f12794ae04091b684b5f0718db729840be5581df527d419b8/fasttext_numpy2-0.10.4-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "14b8e682c5f2ee0600ba48e3e848e16f1571773b9e6f48944929de1b51763b3f",
"md5": "0f2189dd5c8385527131981d8fed66cb",
"sha256": "5f65f7c96aa6a66fed58994ee0d45368f07fa3de5080d2f84f9f9b5ed7bca380"
},
"downloads": -1,
"filename": "fasttext_numpy2-0.10.4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "0f2189dd5c8385527131981d8fed66cb",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": null,
"size": 4687226,
"upload_time": "2024-11-08T16:50:33",
"upload_time_iso_8601": "2024-11-08T16:50:33.765275Z",
"url": "https://files.pythonhosted.org/packages/14/b8/e682c5f2ee0600ba48e3e848e16f1571773b9e6f48944929de1b51763b3f/fasttext_numpy2-0.10.4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ce1f3114fc06342225e724ffcd7d9055f1f2a382fff3653ec20034c823ea323e",
"md5": "902376a9cb9dc18e7ddf6e303856fc77",
"sha256": "c4719752a197f1d76f9bfb1343d6c63346f56ec037ba5fd85d14f024740402cb"
},
"downloads": -1,
"filename": "fasttext_numpy2-0.10.4-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
"has_sig": false,
"md5_digest": "902376a9cb9dc18e7ddf6e303856fc77",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": null,
"size": 4563495,
"upload_time": "2024-11-08T16:50:36",
"upload_time_iso_8601": "2024-11-08T16:50:36.484405Z",
"url": "https://files.pythonhosted.org/packages/ce/1f/3114fc06342225e724ffcd7d9055f1f2a382fff3653ec20034c823ea323e/fasttext_numpy2-0.10.4-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c15e7fce71d32c175587f9b7022b7aeaf3d92b4d61d6cd678301048065327f0d",
"md5": "43372b41515f4b72248cdd1de9c08fcd",
"sha256": "4258e8d7d9ac20c4fcb26a6382b6c0ba75c1e3377cf8eef6718dab99066ce677"
},
"downloads": -1,
"filename": "fasttext_numpy2-0.10.4-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "43372b41515f4b72248cdd1de9c08fcd",
"packagetype": "bdist_wheel",
"python_version": "cp36",
"requires_python": null,
"size": 4737341,
"upload_time": "2024-11-08T16:50:38",
"upload_time_iso_8601": "2024-11-08T16:50:38.432037Z",
"url": "https://files.pythonhosted.org/packages/c1/5e/7fce71d32c175587f9b7022b7aeaf3d92b4d61d6cd678301048065327f0d/fasttext_numpy2-0.10.4-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "cc9e6ef7e70b05a3962598d16e847584f483d9920ef673c36f68894af3552e80",
"md5": "78627b64d60ba20c741fe131aa8f81a4",
"sha256": "0edc85a3ca85d4f0170a84e9ec2bfdf606e2e490c107bea95291051cc80335d9"
},
"downloads": -1,
"filename": "fasttext_numpy2-0.10.4-cp36-cp36m-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
"has_sig": false,
"md5_digest": "78627b64d60ba20c741fe131aa8f81a4",
"packagetype": "bdist_wheel",
"python_version": "cp36",
"requires_python": null,
"size": 4661378,
"upload_time": "2024-11-08T16:50:41",
"upload_time_iso_8601": "2024-11-08T16:50:41.108855Z",
"url": "https://files.pythonhosted.org/packages/cc/9e/6ef7e70b05a3962598d16e847584f483d9920ef673c36f68894af3552e80/fasttext_numpy2-0.10.4-cp36-cp36m-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "9b7cab17bffc73cee538e592c14909048ad93fd8a82c74e050ca142c9c3ad12f",
"md5": "b2dd6079def682d85d0755e83a292735",
"sha256": "6d5fe255ec61a96f1e019ce472b88fe669cd7262dea06d8bdb8e193f91edb06c"
},
"downloads": -1,
"filename": "fasttext_numpy2-0.10.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "b2dd6079def682d85d0755e83a292735",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": null,
"size": 4753036,
"upload_time": "2024-11-08T16:50:43",
"upload_time_iso_8601": "2024-11-08T16:50:43.903214Z",
"url": "https://files.pythonhosted.org/packages/9b/7c/ab17bffc73cee538e592c14909048ad93fd8a82c74e050ca142c9c3ad12f/fasttext_numpy2-0.10.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8008b2a23f0069f1d605504ef3cdd2394ef80afd067fe65698b5393466d42148",
"md5": "6f8bb17068d0403858742c7fb2a5b994",
"sha256": "193f708baa15fd99c9f9094022626c1793ea70e0eccb8ea94107d5099d2b681f"
},
"downloads": -1,
"filename": "fasttext_numpy2-0.10.4-cp37-cp37m-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
"has_sig": false,
"md5_digest": "6f8bb17068d0403858742c7fb2a5b994",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": null,
"size": 4667978,
"upload_time": "2024-11-08T16:50:45",
"upload_time_iso_8601": "2024-11-08T16:50:45.822889Z",
"url": "https://files.pythonhosted.org/packages/80/08/b2a23f0069f1d605504ef3cdd2394ef80afd067fe65698b5393466d42148/fasttext_numpy2-0.10.4-cp37-cp37m-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7069c8fae904fec6a5deef238583d3a15618b04184a4c1844107db6a333d0c21",
"md5": "a73d7f38d4848803948b4d4062ade6f5",
"sha256": "7e6ce53c3c733034d0ac94211cac055ef2cd8e4eb237a7f8d352fae9ce5c2eca"
},
"downloads": -1,
"filename": "fasttext_numpy2-0.10.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "a73d7f38d4848803948b4d4062ade6f5",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": null,
"size": 4651106,
"upload_time": "2024-11-08T16:50:47",
"upload_time_iso_8601": "2024-11-08T16:50:47.603046Z",
"url": "https://files.pythonhosted.org/packages/70/69/c8fae904fec6a5deef238583d3a15618b04184a4c1844107db6a333d0c21/fasttext_numpy2-0.10.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "90dbd0b2d561b8ec9bad5d9307e21ad598c304f75a0a2a0615f1e811907b6fa1",
"md5": "f5c1967dd47412066116d861b57fb2e0",
"sha256": "be188cc52108aad32b15c46b8d53122cf3d1462db9dcd94897533d8dd6eedc44"
},
"downloads": -1,
"filename": "fasttext_numpy2-0.10.4-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
"has_sig": false,
"md5_digest": "f5c1967dd47412066116d861b57fb2e0",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": null,
"size": 4544184,
"upload_time": "2024-11-08T16:50:50",
"upload_time_iso_8601": "2024-11-08T16:50:50.783730Z",
"url": "https://files.pythonhosted.org/packages/90/db/d0b2d561b8ec9bad5d9307e21ad598c304f75a0a2a0615f1e811907b6fa1/fasttext_numpy2-0.10.4-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ee81ca4c33e4248948401a4e0b2121f70fcabe988922c54368a93943b82f008b",
"md5": "ab703e1747f57c7f24249fc5691576fa",
"sha256": "010cd3d59f8cafc32112f999e37bc13c75fdf7f690d1d3a7fb5eb132d551baac"
},
"downloads": -1,
"filename": "fasttext_numpy2-0.10.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "ab703e1747f57c7f24249fc5691576fa",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": null,
"size": 4645995,
"upload_time": "2024-11-08T16:50:52",
"upload_time_iso_8601": "2024-11-08T16:50:52.733412Z",
"url": "https://files.pythonhosted.org/packages/ee/81/ca4c33e4248948401a4e0b2121f70fcabe988922c54368a93943b82f008b/fasttext_numpy2-0.10.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7cd9f85053bff9631c4a90faee34902361bc291808e11a58ffd8a9ca27198c51",
"md5": "4d67746cd1311c2808fe735bf059a163",
"sha256": "41cfa07bad0a60bf2a0a0139c125a102b1d6101f46841586f2a1e879151bcc61"
},
"downloads": -1,
"filename": "fasttext_numpy2-0.10.4-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
"has_sig": false,
"md5_digest": "4d67746cd1311c2808fe735bf059a163",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": null,
"size": 4539947,
"upload_time": "2024-11-08T16:50:54",
"upload_time_iso_8601": "2024-11-08T16:50:54.840768Z",
"url": "https://files.pythonhosted.org/packages/7c/d9/f85053bff9631c4a90faee34902361bc291808e11a58ffd8a9ca27198c51/fasttext_numpy2-0.10.4-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-08 16:50:17",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "simon-ging",
"github_project": "fasttext-numpy2",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "fasttext-numpy2"
}