# idiom
Access and operations with word2vec data
To install: `pip install idiom`
## Overview
The `idiom` package provides access to word vector data, along with functions to manipulate and analyze it. It supports finding the words closest to a given word, looking up word frequencies, and working with pre-trained word vector models.
## Features
- **Closest Words**: Find the closest words to a given word based on cosine similarity.
- **Word Frequencies**: Access and manipulate word frequency data.
- **Word Vector Models**: Work with pre-trained word vector models such as FastText.
- **IDF Calculations**: Compute different types of Inverse Document Frequency (IDF) values.
## Usage
### Finding Closest Words
You can find the closest words to a given word using the `closest_words` function:
```python
from idiom import closest_words
# Example: find the 10 words closest to 'mad' that start with 'l'
def starts_with_l(word):
    return word.startswith('l')

print(closest_words('mad', k=10, search_words=starts_with_l))
```
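To make the idea of a filtered cosine-similarity search concrete, here is a minimal sketch on hand-made toy vectors. It is not how `idiom` is implemented: `cosine_similarity`, `toy_closest_words`, and `toy_vectors` are hypothetical names invented for this illustration, and the real package works on pre-trained vectors rather than three-dimensional examples.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def toy_closest_words(query, vectors, k=3, word_filter=None):
    """Return up to k words most similar to `query`, optionally filtered.

    Hypothetical helper for illustration only; not part of idiom.
    """
    candidates = [
        w for w in vectors
        if w != query and (word_filter is None or word_filter(w))
    ]
    # Highest cosine similarity first
    candidates.sort(
        key=lambda w: cosine_similarity(vectors[query], vectors[w]),
        reverse=True,
    )
    return candidates[:k]


# Made-up 3-dimensional vectors, chosen only to show the ranking
toy_vectors = {
    "mad": [0.9, 0.1, 0.0],
    "livid": [0.8, 0.2, 0.1],
    "angry": [0.85, 0.15, 0.05],
    "lemon": [0.0, 0.1, 0.9],
    "calm": [0.1, 0.9, 0.1],
}

result = toy_closest_words(
    "mad", toy_vectors, k=2, word_filter=lambda w: w.startswith("l")
)
print(result)
```

The filter is applied before ranking, so words that fail it (such as "angry") never compete for one of the `k` slots.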
### Accessing Word Frequencies
You can access the most frequent words using the `most_frequent_words` function:
```python
from idiom import most_frequent_words
# Get the top 100,000 most frequent words
frequent_words = most_frequent_words(max_n_words=100000)
print(frequent_words)
```
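As a rough illustration of what "most frequent words" means, the sketch below counts words in a tiny made-up corpus with the standard library. The helper `toy_most_frequent_words` is hypothetical, and the return type of the real `most_frequent_words` is not shown in this README, so the dictionary used here is an assumption made for the example.

```python
from collections import Counter


def toy_most_frequent_words(text, max_n_words=3):
    """Count whitespace-separated, lowercased words and keep the top ones.

    Hypothetical helper for illustration only; not part of idiom.
    """
    counts = Counter(text.lower().split())
    return dict(counts.most_common(max_n_words))


corpus = "the cat sat on the mat and the cat slept"
top = toy_most_frequent_words(corpus, max_n_words=2)
print(top)
```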
### Working with Word Vectors
You can load and work with pre-trained word vectors using the `WordVec` class:
```python
from idiom import WordVec
# Initialize WordVec with default word vectors
word_vec = WordVec()
# Calculate the distance between two queries
distance = word_vec.dist('france capital', 'paris')
print(distance)
```
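One common way to compare multi-word queries such as `'france capital'` and `'paris'` is to average the vectors of the words in each query and take the cosine distance between the averages. Whether `WordVec.dist` works this way is an assumption; the sketch below only illustrates the general technique, using made-up two-dimensional vectors and hypothetical helper names.

```python
import math


def mean_vector(words, vectors):
    """Average the vectors of the words that are present in `vectors`."""
    found = [vectors[w] for w in words if w in vectors]
    if not found:
        raise ValueError("none of the query words have a vector")
    dims = len(found[0])
    return [sum(v[i] for v in found) / len(found) for i in range(dims)]


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def toy_query_distance(query_a, query_b, vectors):
    """Distance between two space-separated queries (illustrative only)."""
    return cosine_distance(
        mean_vector(query_a.split(), vectors),
        mean_vector(query_b.split(), vectors),
    )


# Made-up vectors: 'paris' points the same way as 'france' + 'capital'
toy_vectors = {
    "france": [1.0, 0.0],
    "capital": [0.0, 1.0],
    "paris": [1.0, 1.0],
    "banana": [-1.0, 0.5],
}

d_paris = toy_query_distance("france capital", "paris", toy_vectors)
d_banana = toy_query_distance("france capital", "banana", toy_vectors)
print(d_paris, d_banana)
```

In this toy setup the distance to "paris" is essentially zero, while the distance to "banana" is much larger.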
### IDF Calculations
You can access different types of IDF values through the `idf` object, which is backed by the `_IDF` class:
```python
from idiom import idf
# Access logarithmic IDF values
log_idf = idf.logarithmic
print(log_idf)
```
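For readers unfamiliar with IDF, the textbook logarithmic form is `log(N / df)`, where `N` is the number of documents and `df` is the number of documents containing the word. The sketch below computes it on a toy corpus. That `idf.logarithmic` uses exactly this formula is an assumption, and `toy_logarithmic_idf` is a hypothetical name used only here.

```python
import math


def toy_logarithmic_idf(documents):
    """Compute log(N / df) for every word in a list of documents.

    Hypothetical helper for illustration only; not part of idiom.
    """
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        # Use a set so each word counts at most once per document
        for word in set(doc.lower().split()):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


docs = ["the cat sat", "the dog barked", "the cat purred"]
idf_values = toy_logarithmic_idf(docs)
print(idf_values)
```

A word that appears in every document (here "the") gets an IDF of zero, while rarer words get larger values.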
## Contributing
Contributions are welcome! Please feel free to submit a pull request or open an issue on GitHub.
## License
This project is licensed under the MIT License.
Raw data
{
"_id": null,
"home_page": "https://github.com/thorwhalen/idiom",
"name": "idiom",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Thor Whalen",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/bd/31/44428393183593fb08c930066c4bd9a6d27d604f3c71d4ded8d29cf5730b/idiom-0.1.6.tar.gz",
"platform": "any",
"bugtrack_url": null,
"license": "mit",
"summary": "Access and operations with word2vec data",
"version": "0.1.6",
"project_urls": {
"Homepage": "https://github.com/thorwhalen/idiom"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "c5061673e34edc7a19c2a3cc3678ea48da75963d3b03375f860ab8fb8e2ed6c2",
"md5": "daba44cf8bcab5d8382583ff9743a0d2",
"sha256": "0563cba757fe52f2f7ee9478b4a0a85571b2a191caee84ac04a0dd46ccdb44fe"
},
"downloads": -1,
"filename": "idiom-0.1.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "daba44cf8bcab5d8382583ff9743a0d2",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 2248561,
"upload_time": "2025-02-01T12:06:08",
"upload_time_iso_8601": "2025-02-01T12:06:08.502015Z",
"url": "https://files.pythonhosted.org/packages/c5/06/1673e34edc7a19c2a3cc3678ea48da75963d3b03375f860ab8fb8e2ed6c2/idiom-0.1.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "bd3144428393183593fb08c930066c4bd9a6d27d604f3c71d4ded8d29cf5730b",
"md5": "16158c33c90b657dce7fd4924ac5308c",
"sha256": "572c0dbb2a082f4957e2152a4598858e7b4ae8851c09049a80f77ef86564240f"
},
"downloads": -1,
"filename": "idiom-0.1.6.tar.gz",
"has_sig": false,
"md5_digest": "16158c33c90b657dce7fd4924ac5308c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 2251216,
"upload_time": "2025-02-01T12:06:10",
"upload_time_iso_8601": "2025-02-01T12:06:10.746190Z",
"url": "https://files.pythonhosted.org/packages/bd/31/44428393183593fb08c930066c4bd9a6d27d604f3c71d4ded8d29cf5730b/idiom-0.1.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-01 12:06:10",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "thorwhalen",
"github_project": "idiom",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "idiom"
}