# myclustering package
## Description and Features
The MyClustering package is a Python library that provides implementations of various clustering algorithms, including K-means. It also includes utilities for visualizing clustering results and performing dimensionality reduction using PCA. The package aims to simplify the process of clustering and provide tools for analyzing and interpreting clustering results.
Key features of the myclustering package include:
- K-means clustering algorithm
- Silhouette score calculation and elbow method visualization
- PCA for dimensionality reduction
- Visualization of clustering results using scatter plots
## Installation
To install the myclustering package, you can use pip:
```bash
pip install myclustering
```
## Usage Examples
Here are some examples of how to use the myclustering package:
### K-means Clustering
```python
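import numpy as np
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
from myclustering.kmeans.kmeans import KMeans

# Generate 500 two-dimensional points grouped around 3 centers
X, y = make_blobs(centers=3, n_samples=500, n_features=2, shuffle=True, random_state=40)
print(X.shape)

# Use the number of distinct blob labels as the number of clusters
clusters = len(np.unique(y))
print(clusters)

# Fit K-means on the data, then plot the resulting clusters
k = KMeans(K=clusters, max_iters=150, plot_steps=False)
y_pred = k.fit(X)
k.plot()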
```
To choose the number of clusters for K-means, compare silhouette scores across candidate values and look for the "elbow" in the elbow method plot.
```python
from myclustering.kmeans.silhouette import silhouette_score
from myclustering.kmeans.elbow import elbow_method
silhouette_score(X, k)
elbow_method(X)
```
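For readers who want to see what these two diagnostics measure, here is a minimal NumPy sketch. It is not the package's implementation: the helper names `inertia` and `mean_silhouette` and the toy data are assumptions made for illustration. The elbow method plots the within-cluster sum of squares against the number of clusters, and the silhouette score compares each point's distance to its own cluster with its distance to the nearest other cluster.

```python
import numpy as np


def inertia(X, labels):
    """Within-cluster sum of squared distances to each cluster mean (the elbow-plot quantity)."""
    total = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return float(total)


def mean_silhouette(X, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    # Pairwise Euclidean distances between all points.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    unique = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton clusters score 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = dists[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(dists[i, labels == c].mean() for c in unique if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())


# Hypothetical toy data: two well-separated groups of three points each.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
good_labels = np.array([0, 0, 0, 1, 1, 1])
bad_labels = np.array([0, 1, 0, 1, 0, 1])

print(mean_silhouette(X_demo, good_labels))  # close to 1
print(mean_silhouette(X_demo, bad_labels))   # much lower
print(inertia(X_demo, good_labels), inertia(X_demo, bad_labels))
```

A labeling that matches the real grouping gives a silhouette near 1 and a low inertia. A labeling that mixes the groups scores worse on both.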
### PCA for dimensionality reduction
```python
from sklearn import datasets
import matplotlib.pyplot as plt
import numpy as np
from myclustering.pca.pca import PCA
data = datasets.load_iris()
X = data.data
y = data.target
# Project the data onto the 2 primary principal components
pca = PCA(2)
pca.fit(X)
X_projected = pca.transform(X)
print('Shape of X:', X.shape)
print('Shape of transformed X:', X_projected.shape)
x1 = X_projected[:, 0]
x2 = X_projected[:, 1]
plt.scatter(x1, x2,
            c=y, edgecolor='none', alpha=0.8,
            cmap=plt.get_cmap('viridis', 3))
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.colorbar()
plt.show()
```
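The projection performed by `fit` and `transform` can be sketched in plain NumPy. This is an illustration under assumptions, not necessarily how `myclustering` computes PCA: the function name `pca_project` and the synthetic data are hypothetical, and the sketch uses a singular value decomposition of the centered data.

```python
import numpy as np


def pca_project(X, n_components):
    """Project X onto its top principal components using an SVD of the centered data."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured along each kept direction.
    explained_variance = singular_values[:n_components] ** 2 / (len(X) - 1)
    return X_centered @ components.T, components, explained_variance


rng = np.random.default_rng(0)
# Hypothetical data: 200 points stretched mostly along the first axis.
X_demo = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])
X_2d, components, variance = pca_project(X_demo, 2)

print(X_2d.shape)  # (200, 2)
print(variance)    # the first value is the largest
```

The kept directions are orthonormal, so the projected coordinates are uncorrelated and ordered by how much variance they capture.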
### DBSCAN visualization with the help of PCA
You can also visualize the results of the DBSCAN algorithm and identify the outliers in your data.
```python
from myclustering.dbscan.visualization import plot_dbscan_pca
plot_dbscan_pca(X, epsilon=0.3, min_points=5)
```
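To make the roles of `epsilon` and `min_points` concrete, here is a small self-contained DBSCAN sketch in NumPy. The internals of `plot_dbscan_pca` are not shown in this README, so this is only an illustration of the general algorithm; the function name `dbscan_labels` and the toy data are assumptions. A point is a core point if at least `min_points` points (itself included) lie within `epsilon` of it, and points not reachable from any core point are labelled as noise.

```python
import numpy as np


def dbscan_labels(X, epsilon, min_points):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Each neighbourhood includes the point itself.
    neighbours = [np.flatnonzero(dists[i] <= epsilon) for i in range(n)]
    is_core = np.array([len(nb) >= min_points for nb in neighbours])

    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster outward from an unassigned core point.
        labels[i] = cluster
        frontier = [i]
        while frontier:
            p = frontier.pop()
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    # Only core points keep expanding the cluster.
                    if is_core[q]:
                        frontier.append(q)
        cluster += 1
    return labels


# Hypothetical toy data: two tight groups and one isolated point.
X_demo = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
                   [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
                   [10.0, 0.0]])
labels = dbscan_labels(X_demo, epsilon=0.3, min_points=3)
print(labels)  # the isolated last point is labelled -1
```

Points labelled `-1` are the outliers that a DBSCAN plot would typically highlight.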
## Customer segmentation
One of the common applications of clustering is customer segmentation, where customers are grouped into distinct segments based on their behavior, preferences, or characteristics. The MyClustering package can be used for customer segmentation tasks.
Here's an example of how the myclustering package can be used for customer segmentation:
```python
import pandas as pd
from myclustering.kmeans.kmeans import KMeans
from myclustering.pca.pca import PCA
# Load customer data
data = pd.read_csv('customer_data.csv')
# Preprocess the data (e.g., remove missing values, scale features)
# Apply PCA for dimensionality reduction
pca = PCA(n_components=2)
pca.fit(data)
X_pca = pca.transform(data)
# Apply K-means clustering to the PCA-reduced data
kmeans = KMeans(K=3, max_iters=150, plot_steps=False)
kmeans.fit(X_pca)
# Analyze the clustering results
# (e.g., visualize clusters, identify key features for each cluster)
kmeans.plot()
# Interpret and use the customer segments for targeted marketing, personalized recommendations, etc.
```
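The preprocessing step in the example above is left as a comment. One possible way to fill it in is sketched below, assuming all features are numeric; the helper name `clean_and_scale` and the sample values are hypothetical. Scaling matters because both PCA and K-means rely on distances, so a feature with a large numeric range would otherwise dominate the result.

```python
import numpy as np


def clean_and_scale(values):
    """Drop rows with missing values, then standardize each column to mean 0 and std 1."""
    values = np.asarray(values, dtype=float)
    complete = values[~np.isnan(values).any(axis=1)]
    mean = complete.mean(axis=0)
    std = complete.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (complete - mean) / std


# Hypothetical customer features: annual spend and number of visits.
raw = np.array([[1200.0, 4.0],
                [300.0, 1.0],
                [np.nan, 7.0],
                [5000.0, 12.0]])
scaled = clean_and_scale(raw)

print(scaled.shape)         # (3, 2): the incomplete row is dropped
print(scaled.mean(axis=0))  # approximately 0 in each column
```

A pandas DataFrame of numeric columns can be converted for this helper with `data.to_numpy(dtype=float)`.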
## Contributing
Contributions to the MyClustering package are welcome! If you find any issues, have suggestions for improvements, or would like to add new features, feel free to open an issue or submit a pull request on the GitHub repository.
## License
The myclustering package is licensed under the MIT License. See the [MIT License](https://opensource.org/license/mit/) for more information.
## Credits
This package was created with [Cookiecutter](https://github.com/audreyr/cookiecutter) and the [audreyr/cookiecutter-pypackage](https://github.com/audreyr/cookiecutter-pypackage) project template.
Raw data
{
"_id": null,
"home_page": "https://github.com/Natali-Hovhannisyan/DS233_Python_Package",
"name": "myclustering",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "myclustering",
"author": "Natali Hovhannisyan",
"author_email": "natalihovhannisyan00@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/38/21/187c70595197da900e0693dc67d749c9b6e37cab4421f6b06fbbe0b46b33/myclustering-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT license",
"summary": "Python package for clustering",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/Natali-Hovhannisyan/DS233_Python_Package"
},
"split_keywords": [
"myclustering"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "979e443cc059c7375498285896fe6c79f4f8ab12b237aa9cd62495f5e960b0ec",
"md5": "0ca5a1f8ef00fad35203960d4e078c0e",
"sha256": "8cd8a4a1a59c10fae79638b24c1ca2c6974cfa07c8b666f83db4b3435261f97b"
},
"downloads": -1,
"filename": "myclustering-0.1.0-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "0ca5a1f8ef00fad35203960d4e078c0e",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.6",
"size": 13292,
"upload_time": "2023-05-16T15:16:26",
"upload_time_iso_8601": "2023-05-16T15:16:26.996329Z",
"url": "https://files.pythonhosted.org/packages/97/9e/443cc059c7375498285896fe6c79f4f8ab12b237aa9cd62495f5e960b0ec/myclustering-0.1.0-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3821187c70595197da900e0693dc67d749c9b6e37cab4421f6b06fbbe0b46b33",
"md5": "bc675441b1cf474936ec205c3cf428a0",
"sha256": "d6f13469fdc23bc3bc75ed387470bcdcb4cf706b2e5c2b93bf4b2f6d6eb9f497"
},
"downloads": -1,
"filename": "myclustering-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "bc675441b1cf474936ec205c3cf428a0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 19537,
"upload_time": "2023-05-16T15:16:28",
"upload_time_iso_8601": "2023-05-16T15:16:28.964552Z",
"url": "https://files.pythonhosted.org/packages/38/21/187c70595197da900e0693dc67d749c9b6e37cab4421f6b06fbbe0b46b33/myclustering-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-05-16 15:16:28",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Natali-Hovhannisyan",
"github_project": "DS233_Python_Package",
"travis_ci": true,
"coveralls": false,
"github_actions": false,
"tox": true,
"lcname": "myclustering"
}