| Field | Value |
|-|-|
| Name | swordcloud |
| Version | 0.0.10 |
| Summary | Semantic word cloud package for Thai and English |
| upload_time | 2023-11-08 19:49:20 |
| docs_url | None |
| requires_python | >=3.8 |
| license | MIT |
| keywords | python, word cloud, t-sne, k-means |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# **Semantic Word Cloud for Thai and English**
`swordcloud`: A semantic word cloud generator that uses t-SNE and k-means clustering to visualize words in high-dimensional semantic space. Built on [A. Mueller's `wordcloud` module](https://github.com/amueller/word_cloud), `swordcloud` can generate semantic word clouds from Thai and English texts using any word vector model.
## **Content**
1. [Installation](#installation)
2. [Usage](#usage)\
2.1 [Initialize `SemanticWordCloud` instance](#initialize-semanticwordcloud-instance)\
2.2 [Generate from Raw Text](#generate-from-raw-text)\
2.3 [Generate from Word Frequencies](#generate-from-word-frequencies)\
2.4 [Generate k-means Cluster Clouds](#generate-k-means-cluster-clouds)\
2.5 [Recolor Words](#recolor-words)\
2.6 [Export Word Clouds](#export-word-clouds)
3. [Color "Functions"](#color-functions)
## **Installation**
`swordcloud` can be installed using `pip`:
```
pip install swordcloud
```
Optionally, if you want to be able to embed fonts directly into [the generated SVGs](#export-word-clouds), an `embedfont` extra can also be specified:
```
pip install swordcloud[embedfont]
```
As of **version 0.0.10**, the exact list of dependencies is as follows:
- `python >= 3.8`
- `numpy >= 1.21.0`
- `pillow`
- `matplotlib >= 1.5.3`
- `gensim >= 4.0.0`
- `pandas`
- `pythainlp >= 3.1.0`
- `k-means-constrained`
- `scikit-learn`
- (optional) `fonttools`
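
Whether the optional dependency is present can be checked at runtime before asking for font embedding. The following is a minimal sketch using only the standard library; it assumes `fonttools` is importable under its usual module name `fontTools`, and the helper name `has_module` is hypothetical and not part of `swordcloud`:

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be imported."""
    return find_spec(name) is not None


# The `fonttools` distribution installs the module `fontTools` (capital T).
can_embed_font = has_module("fontTools")
print("Font embedding available:", can_embed_font)
```

A flag like `can_embed_font` could then decide whether to request an embedded font when exporting to SVG.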
## **Usage**
All code below can also be found in [the example folder](https://github.com/nlp-chula/swordcloud/tree/main/example).
### **Initialize `SemanticWordCloud` instance**
For most use cases, the `SemanticWordCloud` class is the main API that users interact with.
```python
from swordcloud import SemanticWordCloud
# See the `Color "Functions"` section for detail about these color functions
from swordcloud.color_func import SingleColorFunc
wordcloud = SemanticWordCloud(
    language = 'TH',
    width = 1600,
    height = 800,
    max_font_size = 150,
    prefer_horizontal = 1,
    color_func = SingleColorFunc('black')
)
```
Please refer to the documentation in [src/swordcloud/wordcloud.py](https://github.com/nlp-chula/swordcloud/blob/main/src/swordcloud/wordcloud.py) or in your IDE for more detail about various options available for customizing the word cloud.
### **Generate from Raw Text**
```python
# Can also be one large string instead of a list of strings
raw_text = list(map(str.strip, open('raw_text.txt', encoding='utf-8')))
wordcloud.generate_from_text(raw_text, random_state=42)
```
![Word cloud generated from raw text](https://raw.githubusercontent.com/nlp-chula/swordcloud/main/example/generate_from_raw_text.png)
### **Generate from Word Frequencies**
```python
freq = {}
# Each line holds a word and its count, separated by a tab
with open("word_frequencies.tsv", encoding="utf-8") as f:
    for line in f:
        word, count = line.strip().split('\t')
        freq[word] = int(count)
wordcloud.generate_from_frequencies(freq, random_state=42)
```
![Word cloud generated from word frequencies](https://raw.githubusercontent.com/nlp-chula/swordcloud/main/example/generate_from_frequencies.png)
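
If no frequency file is at hand, a dictionary of the same shape can be built from text that is already tokenized. This is a minimal sketch using only the standard library; the whitespace tokenization and the helper name `count_words` are assumptions made for illustration, and they suit English far better than Thai, which needs a proper tokenizer:

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated, lower-cased tokens across an iterable of strings."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return dict(counts)


freq = count_words(["the cat sat", "The cat ran"])
print(freq)
```

The resulting dictionary maps each word to an integer count, the same shape as the `freq` dictionary read from the TSV file above.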
### **Generate k-means Cluster Clouds**
```python
from swordcloud.color_func import FrequencyColorFunc
wordcloud = SemanticWordCloud(
    language = 'TH',
    # make sure the canvas is appropriately large for the number of clusters
    width = 2400,
    height = 1200,
    max_font_size = 150,
    prefer_horizontal = 1
)
wordcloud.generate_from_text(raw_text, kmeans=6, random_state=42, plot_now=False)
# Or directly from `generate_kmeans_cloud` if you already have word frequencies
wordcloud.generate_kmeans_cloud(freq, n_clusters=6, random_state=42, plot_now=False)
# Each sub cloud can then be individually interacted with
# by accessing individual cloud in `sub_clouds` attribute
for cloud, color in zip(wordcloud.sub_clouds, ["red", "blue", "brown", "green", "black", "orange"]):
    cloud.recolor(FrequencyColorFunc(color), plot_now=False)
    cloud.show()
```
||||
-|-|-
![Word cloud 1 generated from k-means clustering](https://raw.githubusercontent.com/nlp-chula/swordcloud/main/example/generate_kmeans_cloud_1.png)|![Word cloud 2 generated from k-means clustering](https://raw.githubusercontent.com/nlp-chula/swordcloud/main/example/generate_kmeans_cloud_2.png)|![Word cloud 3 generated from k-means clustering](https://raw.githubusercontent.com/nlp-chula/swordcloud/main/example/generate_kmeans_cloud_3.png)
![Word cloud 4 generated from k-means clustering](https://raw.githubusercontent.com/nlp-chula/swordcloud/main/example/generate_kmeans_cloud_4.png)|![Word cloud 5 generated from k-means clustering](https://raw.githubusercontent.com/nlp-chula/swordcloud/main/example/generate_kmeans_cloud_5.png)|![Word cloud 6 generated from k-means clustering](https://raw.githubusercontent.com/nlp-chula/swordcloud/main/example/generate_kmeans_cloud_6.png)
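
Note that `zip` stops at the shorter of its inputs, so a color list shorter than the number of clusters would leave some sub clouds without a new color. One way to avoid that is `itertools.cycle`, sketched below; the helper name `pair_with_colors` is hypothetical, and plain strings stand in for the sub clouds:

```python
from itertools import cycle


def pair_with_colors(items, palette):
    """Pair every item with a color, repeating the palette when it runs out."""
    if not palette:
        raise ValueError("palette must contain at least one color")
    # zip stops when `items` is exhausted; cycle() never is.
    return list(zip(items, cycle(palette)))


pairs = pair_with_colors(["cloud 1", "cloud 2", "cloud 3"], ["red", "blue"])
for item, color in pairs:
    print(item, "->", color)
```

With real data, `items` would be `wordcloud.sub_clouds`, and each pair would be handled the same way as in the loop above.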
### **Recolor Words**
```python
# If the generated colors are not to your liking
# We can recolor them instead of re-generating the whole cloud
from swordcloud.color_func import RandomColorFunc
wordcloud.recolor(RandomColorFunc, random_state=42)
```
![Recolored word cloud](https://raw.githubusercontent.com/nlp-chula/swordcloud/main/example/recolor.png)
### **Export Word Clouds**
- As `pillow`'s `Image`
```python
img = wordcloud.to_image()
```
- As image file
```python
wordcloud.to_file('wordcloud.png')
```
- As SVG
```python
# Without embedded font
svg = wordcloud.to_svg()
# With embedded font
svg = wordcloud.to_svg(embed_font=True)
# Note that in order to be able to embed fonts
# the `fonttools` package needs to be installed
```
- As `numpy`'s image array
```python
array = wordcloud.to_array()
```
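
The SVG example above keeps the result in a variable. A minimal sketch for saving it to disk follows, assuming `to_svg()` returns the SVG markup as a `str` (as the upstream `wordcloud` module does); `save_svg` is a hypothetical helper, and the literal string is a stand-in for real output:

```python
from pathlib import Path


def save_svg(svg_text: str, path: str) -> int:
    """Write SVG markup to `path` as UTF-8 and return the number of characters written."""
    # An explicit encoding keeps Thai text intact regardless of the platform default.
    return Path(path).write_text(svg_text, encoding="utf-8")


# Stand-in for `wordcloud.to_svg()`
svg = '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"></svg>'
save_svg(svg, "wordcloud.svg")
```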
## **Color "Functions"**
A number of built-in color "functions" can be accessed from `swordcloud.color_func`:
```python
from swordcloud.color_func import <your_color_function_here>
```
The list of available functions is as follows:
- `RandomColorFunc` (Default)\
Return a random color.
- `ColorMapFunc`\
Return a random color from the user-specified [`matplotlib`'s colormap](https://matplotlib.org/stable/gallery/color/colormap_reference.html).
- `ImageColorFunc`\
Use a user-provided colored image array to determine word color at each position on the canvas.
- `SingleColorFunc`\
Always return the user-specified color every single time, resulting in every word having the same color.
- `ExactColorFunc`\
Use a user-provided color dictionary to determine exactly which word should have which color.
- `FrequencyColorFunc`\
Assign colors based on word frequencies, with less frequent words having lighter colors. The base color is specified by the user.
All the above functions, **except** `RandomColorFunc` which cannot be customized further, must be initialized before passing them to the `SemanticWordCloud` class. For example:
```python
from swordcloud import SemanticWordCloud
from swordcloud.color_func import ColorMapFunc
color_func = ColorMapFunc("magma")
wordcloud = SemanticWordCloud(
    # ... other options ...
    color_func = color_func,
    # ... other options ...
)
```
Users can also implement their own color functions, provided that they are callable with the following signature:
**Input**:
- `word: str`\
The word we are coloring
- `frequency: float`\
Frequency of the word on a scale from 0 to 1
- `font_size: int`\
Font size of the word
- `position: tuple[int, int]`\
Coordinate of the top-left point of the word's bounding box on the canvas
- `orientation: PIL.Image.Transpose | None`\
[`pillow`'s orientation](https://pillow.readthedocs.io/en/stable/reference/Image.html#transpose-methods).
- `font_path: str`\
Path to the font file (OTF or TTF)
- `random_state: random.Random`\
Python's `random.Random` object
**Return**:\
Any object that can be interpreted as a color by `pillow`. See [`pillow`'s documentation](https://pillow.readthedocs.io/en/stable/) for more detail.
Internally, arguments to color functions are always passed as keyword arguments so they can be in any order. However, if your functions only use some of them, make sure to include `**kwargs` at the end of your function headers so that other arguments do not cause an error.
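
The signature above can be illustrated with a small custom color function. This is a sketch only: the name `frequency_shade`, the hue, and the lightness range are arbitrary choices for illustration, and the return value relies on `pillow` accepting CSS-style `hsl(...)` color strings:

```python
def frequency_shade(word, frequency, **kwargs):
    """Hypothetical color function: more frequent words get a darker blue.

    Only `word` and `frequency` are used; `**kwargs` absorbs the remaining
    keyword arguments (font_size, position, orientation, font_path, random_state).
    """
    # `frequency` is documented as lying on a scale from 0 to 1.
    lightness = round(75 - 50 * frequency)  # 75% for rare words, 25% for the most frequent
    return f"hsl(220, 80%, {lightness}%)"


# Called the way the text above describes: keyword arguments only.
color = frequency_shade(
    word="cloud",
    frequency=0.5,
    font_size=40,
    position=(0, 0),
    orientation=None,
    font_path="font.ttf",
    random_state=None,
)
print(color)
```

Because it is a plain function, it needs no initialization and would be passed directly as the `color_func` argument.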
Raw data
{
"_id": null,
"home_page": "",
"name": "swordcloud",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "python,word cloud,t-SNE,K-means",
"author": "",
"author_email": "Attapol Thamrongrattanarit <profte@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/8c/9d/b6b49b4ca119b34722f362b23abff2eab9387fb57e9a610883e3b14e4130/swordcloud-0.0.10.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Semantic word cloud package for Thai and English",
"version": "0.0.10",
"project_urls": {
"Homepage": "https://github.com/nlp-chula/swordcloud"
},
"split_keywords": [
"python",
"word cloud",
"t-sne",
"k-means"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "3128cf5d6bc94a5a1a708d22b20f86a40761b61283514ca404518ea0b0ea4e7b",
"md5": "958ef946f42077001cce008f505d0130",
"sha256": "7e6d4a20780c77b000fb6549283430a5d7117f51dedeb370626753f2f1acfe07"
},
"downloads": -1,
"filename": "swordcloud-0.0.10-py3-none-any.whl",
"has_sig": false,
"md5_digest": "958ef946f42077001cce008f505d0130",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 71265,
"upload_time": "2023-11-08T19:49:18",
"upload_time_iso_8601": "2023-11-08T19:49:18.352705Z",
"url": "https://files.pythonhosted.org/packages/31/28/cf5d6bc94a5a1a708d22b20f86a40761b61283514ca404518ea0b0ea4e7b/swordcloud-0.0.10-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8c9db6b49b4ca119b34722f362b23abff2eab9387fb57e9a610883e3b14e4130",
"md5": "74fb879c21ec009d2022958343be67bc",
"sha256": "65a2ba6bef8ebec47b83bdf0e949a7fc0419e30cf517427227cd4d613c74d59b"
},
"downloads": -1,
"filename": "swordcloud-0.0.10.tar.gz",
"has_sig": false,
"md5_digest": "74fb879c21ec009d2022958343be67bc",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 73965,
"upload_time": "2023-11-08T19:49:20",
"upload_time_iso_8601": "2023-11-08T19:49:20.472221Z",
"url": "https://files.pythonhosted.org/packages/8c/9d/b6b49b4ca119b34722f362b23abff2eab9387fb57e9a610883e3b14e4130/swordcloud-0.0.10.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-11-08 19:49:20",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "nlp-chula",
"github_project": "swordcloud",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "swordcloud"
}