# idiom
Access to and operations on word2vec data
To install: `pip install idiom`
## Overview
The `idiom` package provides access to word vector data, along with functions to manipulate and analyze it. These cover finding the words closest to a given word, accessing word frequencies, computing IDF values, and working with pre-trained word vector models.
## Features
- **Closest Words**: Find the closest words to a given word based on cosine similarity.
- **Word Frequencies**: Access and manipulate word frequency data.
- **Word Vector Models**: Work with pre-trained word vector models such as FastText.
- **IDF Calculations**: Compute different types of Inverse Document Frequency (IDF) values.
## Usage
### Finding Closest Words
You can find the closest words to a given word using the `closest_words` function:
```python
from idiom import closest_words
# Example: find the 10 words closest to 'mad' that start with 'l'
def starts_with_l(word):
    return word.startswith('l')
print(closest_words('mad', k=10, search_words=starts_with_l))
```
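To illustrate what a cosine-similarity search with a word filter does, here is a minimal sketch on a toy vocabulary. The vectors, the `toy_closest_words` helper, and its parameters are hypothetical and are not part of `idiom`; the package's own implementation may differ.

```python
import math

# Hypothetical toy embeddings; real word vectors have hundreds of dimensions.
TOY_VECTORS = {
    "mad": (0.9, 0.1, 0.0),
    "livid": (0.8, 0.2, 0.1),
    "angry": (0.85, 0.15, 0.05),
    "lamp": (0.0, 0.1, 0.9),
    "loud": (0.5, 0.5, 0.2),
}


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def toy_closest_words(word, k=10, search_words=None, vectors=TOY_VECTORS):
    """Return up to k words ranked by cosine similarity to `word`.

    `search_words` is an optional predicate that restricts the candidates.
    """
    target = vectors[word]
    candidates = [
        w for w in vectors
        if w != word and (search_words is None or search_words(w))
    ]
    # Highest similarity first.
    candidates.sort(key=lambda w: cosine_similarity(target, vectors[w]), reverse=True)
    return candidates[:k]


print(toy_closest_words("mad", k=2, search_words=lambda w: w.startswith("l")))
```

The filter is applied before ranking, so `k` counts only words that pass the predicate.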
### Accessing Word Frequencies
You can access the most frequent words using the `most_frequent_words` function:
```python
from idiom import most_frequent_words
# Get the top 100,000 most frequent words
frequent_words = most_frequent_words(max_n_words=100000)
print(frequent_words)
```
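For intuition, a frequency table of this kind can be derived from any tokenized text. The sketch below uses only the standard library; the `top_words` helper and the sample sentence are hypothetical, and it makes no claim about how `idiom` obtains or stores its own frequency data.

```python
from collections import Counter


def top_words(text, max_n_words=3):
    """Return the most frequent lowercase tokens mapped to their relative frequency."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = sum(counts.values())
    # most_common() orders by descending count.
    return {word: count / total for word, count in counts.most_common(max_n_words)}


sample = "the cat sat on the mat and the dog sat too"
print(top_words(sample, max_n_words=2))
```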
### Working with Word Vectors
You can load and work with pre-trained word vectors using the `WordVec` class:
```python
from idiom import WordVec
# Initialize WordVec with default word vectors
word_vec = WordVec()
# Calculate the distance between two queries
distance = word_vec.dist('france capital', 'paris')
print(distance)
```
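One common way to compare multi-word queries is to average the vectors of their words and take the cosine distance between the averages. The sketch below shows that approach on made-up 2-d vectors. It is an assumption for illustration only: `toy_dist`, `query_vector`, and the vectors are hypothetical, and `WordVec.dist` may combine words or measure distance differently.

```python
import math

# Hypothetical 2-d embeddings, chosen only for illustration.
TOY_VECTORS = {
    "france": (1.0, 0.0),
    "capital": (0.0, 1.0),
    "paris": (0.7, 0.7),
    "banana": (-1.0, 0.2),
}


def query_vector(query, vectors=TOY_VECTORS):
    """Average the vectors of the whitespace-separated words in `query`."""
    words = query.split()
    dims = len(next(iter(vectors.values())))
    return tuple(sum(vectors[w][i] for w in words) / len(words) for i in range(dims))


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def toy_dist(query_1, query_2):
    """Cosine distance between the averaged vectors of two queries."""
    return cosine_distance(query_vector(query_1), query_vector(query_2))


print(toy_dist("france capital", "paris"))   # close to 0 with these toy vectors
print(toy_dist("france capital", "banana"))  # noticeably larger
```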
### IDF Calculations
You can access different types of IDF values through the `idf` object, which the package exposes in place of the private `_IDF` class:
```python
from idiom import idf
# Access logarithmic IDF values
log_idf = idf.logarithmic
print(log_idf)
```
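As a reminder of what a logarithmic IDF measures, the sketch below computes the textbook form `idf(t) = log(N / df(t))` over a toy corpus, where `N` is the number of documents and `df(t)` is the number of documents containing term `t`. The `logarithmic_idf` helper and the corpus are hypothetical; the exact formula and any smoothing used by `idiom` are not stated here and may differ.

```python
import math


def logarithmic_idf(documents):
    """Map each term to log(N / df), with df the number of documents containing it."""
    docs = [set(doc.lower().split()) for doc in documents]
    n_docs = len(docs)
    vocabulary = set().union(*docs)
    return {
        term: math.log(n_docs / sum(term in doc for doc in docs))
        for term in vocabulary
    }


corpus = ["the cat sat", "the dog barked", "the cat purred"]
idf_values = logarithmic_idf(corpus)
# Terms found in every document score 0; rarer terms score higher.
print(idf_values["the"], idf_values["cat"], idf_values["dog"])
```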
## Contributing
Contributions are welcome! Submit a pull request or open an issue on [GitHub](https://github.com/thorwhalen/idiom).
## License
This project is licensed under the MIT License.
Raw data
{
"_id": null,
"home_page": "https://github.com/thorwhalen/idiom",
"name": "idiom",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Thor Whalen",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/25/58/0ef6df3ed985cca2bfefd8b35a2312600da22a7b52455293ccddd572e298/idiom-0.1.5.tar.gz",
"platform": "any",
"description": "# idiom\n\nAccess and operations with word2vec data\n\nTo install:\t```pip install idiom```\n\n## Overview\n\nThe `idiom` package provides access to word vector data and useful functions to manipulate and analyze it. It includes functionalities for finding the closest words to a given word, calculating word frequencies, and working with various word vector models.\n\n## Features\n\n- **Closest Words**: Find the closest words to a given word based on cosine similarity.\n- **Word Frequencies**: Access and manipulate word frequency data.\n- **Word Vector Models**: Work with pre-trained word vector models such as FastText.\n- **IDF Calculations**: Compute different types of Inverse Document Frequency (IDF) values.\n\n## Usage\n\n### Finding Closest Words\n\nYou can find the closest words to a given word using the `closest_words` function:\n\n```python\nfrom idiom import closest_words\n\n# Example: Find the closest words to 'mad' that start with 'l'\nstarts_with_L = lambda x: x.startswith('l')\nprint(closest_words('mad', k=10, search_words=starts_with_L))\n```\n\n### Accessing Word Frequencies\n\nYou can access the most frequent words using the `most_frequent_words` function:\n\n```python\nfrom idiom import most_frequent_words\n\n# Get the top 100,000 most frequent words\nfrequent_words = most_frequent_words(max_n_words=100000)\nprint(frequent_words)\n```\n\n### Working with Word Vectors\n\nYou can load and work with pre-trained word vectors using the `WordVec` class:\n\n```python\nfrom idiom import WordVec\n\n# Initialize WordVec with default word vectors\nword_vec = WordVec()\n\n# Calculate the distance between two queries\ndistance = word_vec.dist('france capital', 'paris')\nprint(distance)\n```\n\n### IDF Calculations\n\nYou can compute different types of IDF values using the `_IDF` class:\n\n```python\nfrom idiom import idf\n\n# Access logarithmic IDF values\nlog_idf = idf.logarithmic\nprint(log_idf)\n```\n\n## Contributing\n\nContributions are 
welcome! Please feel free to submit a pull request or open an issue on GitHub.\n\n## License\n\nThis project is licensed under the MIT License.\n\n\n",
"bugtrack_url": null,
"license": "mit",
"summary": "Access and operations with word2vec data",
"version": "0.1.5",
"project_urls": {
"Homepage": "https://github.com/thorwhalen/idiom"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "798d7f31109613b19fb6137a7a409bb653f8d8146de02c915ff9360cca8926b6",
"md5": "1ff6e797c9c53dbee3312105a21ed1dd",
"sha256": "0b2791a0d7b8e97aac0cc58f32e45f523f8325392d4778cca54aca3c8221acd9"
},
"downloads": -1,
"filename": "idiom-0.1.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1ff6e797c9c53dbee3312105a21ed1dd",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 2246883,
"upload_time": "2024-12-12T09:35:05",
"upload_time_iso_8601": "2024-12-12T09:35:05.638516Z",
"url": "https://files.pythonhosted.org/packages/79/8d/7f31109613b19fb6137a7a409bb653f8d8146de02c915ff9360cca8926b6/idiom-0.1.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "25580ef6df3ed985cca2bfefd8b35a2312600da22a7b52455293ccddd572e298",
"md5": "65d0c245f70059f73b4ab55ad9025125",
"sha256": "b241107284e256a5dd48ad4519f1404b8620241b0b34f0d3906f7c55cb889d61"
},
"downloads": -1,
"filename": "idiom-0.1.5.tar.gz",
"has_sig": false,
"md5_digest": "65d0c245f70059f73b4ab55ad9025125",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 2249452,
"upload_time": "2024-12-12T09:35:08",
"upload_time_iso_8601": "2024-12-12T09:35:08.460464Z",
"url": "https://files.pythonhosted.org/packages/25/58/0ef6df3ed985cca2bfefd8b35a2312600da22a7b52455293ccddd572e298/idiom-0.1.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-12 09:35:08",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "thorwhalen",
"github_project": "idiom",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "idiom"
}