Name | py-skipgram-24 |
Version | 0.2.0 |
home_page | None |
Summary | Implementing my own skipgram model |
upload_time | 2024-03-22 19:25:54 |
maintainer | None |
docs_url | None |
author | Bill |
requires_python | <4.0,>=3.9 |
license | MIT |
requirements | No requirements were recorded. |
# py_skipgram_24
![ci-cd](https://github.com/billwan96/2024_03-skipgram/actions/workflows/ci-cd.yml/badge.svg)
[![PyPI version](https://badge.fury.io/py/py-skipgram-24.svg)](https://badge.fury.io/py/py-skipgram-24)
## 📄 About
This package, named “py_skipgram_24”, is a toolkit for Skip-gram modeling and evaluation. It provides classes and functions for each step of working with Skip-gram algorithms: preprocessing the data, creating input pairs, training the model, and extracting word vectors. The aim is to simplify the process by providing the essential functionality for data manipulation, model training, and evaluation.
## 📦 Functions
This package provides two classes and several functions. Those documented here are listed below (`train_model` appears only in the Usage example):
- SkipgramModel(vocab_size, embedding_dim): This class initializes the Skipgram model with the vocabulary size and embedding dimension, and defines the forward pass.
- MyPreprocessor(texts, stopwords, strip_puncts=True): This class preprocesses the given texts by tokenizing the sentences, converting to lower case, and removing stopwords and punctuation.
- create_input_pairs(pp_corpus, word2idx, context_size=2): This function creates input pairs for the Skipgram model from the preprocessed corpus, word-to-index mapping, and context size.
- get_vocab(tokenized_corpus): This function gets the vocabulary from the tokenized corpus.
- get_word_vectors(model, word2idx): This function gets the word vectors from the trained model and word-to-index mapping.
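To make the data flow between these pieces concrete, here is a minimal, dependency-free sketch of the preprocessing, vocabulary, and pair-creation steps. It is not the package's implementation: `preprocess`, `build_vocab`, and `make_pairs` are hypothetical stand-ins for `MyPreprocessor`, `get_vocab`, and `create_input_pairs`, and the sentence splitting, the sorted vocabulary order, and the (center, context) pair layout are all assumptions made for illustration.

```python
import re


def preprocess(texts, stopwords=frozenset(), strip_puncts=True):
    """Hypothetical stand-in for MyPreprocessor: yield one token list per sentence."""
    for text in texts:
        # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            tokens = sentence.lower().split()
            if strip_puncts:
                # Keep only word characters in each token.
                tokens = [re.sub(r"[^\w]", "", tok) for tok in tokens]
            tokens = [tok for tok in tokens if tok and tok not in stopwords]
            if tokens:
                yield tokens


def build_vocab(tokenized_corpus):
    """Hypothetical stand-in for get_vocab: sorted list of unique tokens."""
    return sorted({tok for sent in tokenized_corpus for tok in sent})


def make_pairs(pp_corpus, word2idx, context_size=2):
    """Hypothetical stand-in for create_input_pairs: (center, context) index pairs."""
    pairs = []
    for sent in pp_corpus:
        ids = [word2idx[tok] for tok in sent]
        for i, center in enumerate(ids):
            # Window of up to context_size words on each side, within the sentence.
            lo = max(0, i - context_size)
            hi = min(len(ids), i + context_size + 1)
            pairs.extend((center, ids[j]) for j in range(lo, hi) if j != i)
    return pairs


pp = list(preprocess(["The sky is blue. Remember that."], stopwords={"the", "is"}))
vocab = build_vocab(pp)
word2idx = {word: idx for idx, word in enumerate(vocab)}
pairs = make_pairs(pp, word2idx, context_size=1)
print(pp)     # [['sky', 'blue'], ['remember', 'that']]
print(pairs)  # [(2, 0), (0, 2), (1, 3), (3, 1)]
```

In this sketch the context window never crosses a sentence boundary, so a larger `context_size` only adds pairs inside longer sentences. Whether the package behaves the same way is not stated in this README.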
## 🛠️ Installation
Option 1 (For Users)
The package is published on PyPI and can be installed with pip.
Create and activate a virtual environment using conda
```
$ conda create --name <env_name> pip -y
$ conda activate <env_name>
```
Install the package using the command below
```
$ pip install py_skipgram_24
```
Option 2 (For Developers)
To run the installation commands below, you need conda and poetry (see the conda and poetry installation guides).
Clone this repository
```
$ git clone git@github.com:<your_username>/py_skipgram_24.git
```
Navigate to the root of this repository.
Create a Conda virtual environment with Python and activate it by running the following commands in a terminal:
```
$ conda create --name py_skipgram_24 python=3.11 -y
$ conda activate py_skipgram_24
```
Install the package with poetry by running the following command:
```
$ poetry install
```
## ✅ Testing
To test this package, please run the following command from the root directory of the repository:
```
$ pytest tests/
```
Branch coverage can be viewed with the following command (the `--cov` options come from the pytest-cov plugin):
```
$ pytest --cov-branch --cov=py_skipgram_24
```
## Usage
To train our Skip-gram model and obtain word vectors, first follow the installation instructions above, then run the following code in a Python notebook. Alternatively, see the example notebook in the doc folder.
```python
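# train_model is called below, so it is imported here (assuming the package exports it).
from py_skipgram_24 import (
    MyPreprocessor, SkipgramModel, create_input_pairs,
    get_vocab, get_word_vectors, train_model,
)

corpus = ["It was a great day. I loved the movie and spending time with you. I wish we had more time.",
          "The sky is always blue underneath. Remember that."]
# Tokenize, lowercase, and remove stopwords and punctuation.
sentences = MyPreprocessor(corpus)
pp_corpus = list(sentences)
# Build the vocabulary and a word-to-index mapping.
vocab = get_vocab(pp_corpus)
word2idx = {word: idx for idx, word in enumerate(vocab)}
# Create the input pairs using a context size of 2.
idx_pairs = create_input_pairs(pp_corpus, word2idx, context_size=2)
# Train a model with 10-dimensional embeddings, then read out the word vectors.
model = SkipgramModel(len(vocab), 10)
train_model(model, idx_pairs, epochs=250, learning_rate=0.025)
word_vectors = get_word_vectors(model, word2idx)
print(word_vectors)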
```
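The `train_model` call above takes `epochs` and `learning_rate`, but this README does not show what the training loop does. As a rough illustration only, the sketch below trains Skip-gram embeddings on index pairs with a full softmax and plain stochastic gradient descent, written without any deep learning framework. `train_skipgram` and all of its parameters are hypothetical and are not part of py_skipgram_24; the package's own loss, optimizer, and initialization may differ.

```python
import math
import random


def train_skipgram(pairs, vocab_size, dim=8, epochs=100, learning_rate=0.1, seed=0):
    """Hypothetical sketch: full-softmax Skip-gram trained with per-pair SGD."""
    rng = random.Random(seed)
    # Two embedding tables: one for center words, one for context words.
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for center, context in pairs:
            h = w_in[center]
            # Forward pass: one score per vocabulary word, then a stable softmax.
            scores = [sum(a * b for a, b in zip(h, w_out[k])) for k in range(vocab_size)]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            norm = sum(exps)
            probs = [e / norm for e in exps]
            total += -math.log(probs[context])
            # Backward pass: d(loss)/d(score_k) = probs[k] - 1 if k is the context word.
            grad_h = [0.0] * dim
            for k in range(vocab_size):
                g = probs[k] - (1.0 if k == context else 0.0)
                for d in range(dim):
                    grad_h[d] += g * w_out[k][d]  # uses the pre-update output weight
                    w_out[k][d] -= learning_rate * g * h[d]
            for d in range(dim):
                h[d] -= learning_rate * grad_h[d]
        # Average loss over the pairs seen in this epoch.
        losses.append(total / len(pairs))
    return w_in, losses


toy_pairs = [(2, 0), (0, 2), (1, 3), (3, 1)]
embeddings, losses = train_skipgram(toy_pairs, vocab_size=4)
print(f"first epoch loss: {losses[0]:.3f}, last epoch loss: {losses[-1]:.3f}")
```

In this sketch, `epochs` is the number of passes over all pairs and `learning_rate` scales each gradient step. The rows of `embeddings` play the role of the word vectors that `get_word_vectors` returns in the example above.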
### 📚 Package Integration within the Python Ecosystem
While PyTorch’s nn.Module is robust and capable, py_skipgram_24 aims to offer a specialized, streamlined toolkit tailored explicitly for Skip-gram tasks. It is a lightweight, focused alternative for users who want a concise package covering preprocessing, input-pair creation, model training, and word-vector extraction. PyTorch covers a much broader spectrum of deep learning algorithms; py_skipgram_24 may appeal to those who prefer an implementation tailored to their Skip-gram workflows.
### Contributing
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
## License
`py_skipgram_24` was created by Bill. It is licensed under the terms of the MIT license.
## Credits
`py_skipgram_24` was created with [`cookiecutter`](https://cookiecutter.readthedocs.io/en/latest/) and the `py-pkgs-cookiecutter` [template](https://github.com/py-pkgs/py-pkgs-cookiecutter).
Raw data
{
"_id": null,
"home_page": null,
"name": "py-skipgram-24",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": null,
"author": "Bill",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/36/4b/020cc255e14021f36f947e3c115eb6b51e799a52335a8581ef32cef4e97c/py_skipgram_24-0.2.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Implementing my own skipgram model",
"version": "0.2.0",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "83f09ba341b3cf1005313311c897833554470dd8650df5007052e9f0340c7828",
"md5": "66db50b80607425a6eba27793281818b",
"sha256": "8a885d73f68cd0de52a5734a6a38f42e8c48a4e4350aece133dc0e05d038ba31"
},
"downloads": -1,
"filename": "py_skipgram_24-0.2.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "66db50b80607425a6eba27793281818b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 6952,
"upload_time": "2024-03-22T19:25:53",
"upload_time_iso_8601": "2024-03-22T19:25:53.260362Z",
"url": "https://files.pythonhosted.org/packages/83/f0/9ba341b3cf1005313311c897833554470dd8650df5007052e9f0340c7828/py_skipgram_24-0.2.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "364b020cc255e14021f36f947e3c115eb6b51e799a52335a8581ef32cef4e97c",
"md5": "649d94aafb435a0e2371d540a3eb5ec6",
"sha256": "4fd1ec28f71ae6eb6e3e2dc869fc28b5e90f296610ade6eb9b048a6c1d5b0f18"
},
"downloads": -1,
"filename": "py_skipgram_24-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "649d94aafb435a0e2371d540a3eb5ec6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.9",
"size": 5723,
"upload_time": "2024-03-22T19:25:54",
"upload_time_iso_8601": "2024-03-22T19:25:54.712222Z",
"url": "https://files.pythonhosted.org/packages/36/4b/020cc255e14021f36f947e3c115eb6b51e799a52335a8581ef32cef4e97c/py_skipgram_24-0.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-03-22 19:25:54",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "py-skipgram-24"
}