# Pandas-powered LabelEncoder
## Performance benchmark
Benchmark results compared with scikit-learn's LabelEncoder:
```
Total rows: 24,123,464
Scikit-learn's LabelEncoder - 13.35 seconds
Pandas-powered LabelEncoder - 2.44 seconds
```
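The benchmark script is not shown here. As a rough way to time an encoding call on your own data, the sketch below uses pandas' `Categorical` as a stand-in for the encoder; `time_encoding` is a hypothetical helper, the data is synthetic, and the timings it prints are not comparable to the figures above.

```python
import time

import numpy as np
import pandas as pd


def time_encoding(values, categories):
    """Encode `values` with pd.Categorical and return (codes, elapsed seconds)."""
    start = time.perf_counter()
    codes = pd.Categorical(values, categories=categories).codes
    elapsed = time.perf_counter() - start
    return codes, elapsed


# Synthetic data: 100,000 labels drawn from three categories.
rng = np.random.default_rng(0)
categories = ['Cat', 'Dog', 'Bird']
values = rng.choice(categories, size=100_000)

codes, seconds = time_encoding(values, categories)
print(f"Total rows: {len(values):,}")
print(f"pd.Categorical encoding - {seconds:.4f} seconds")
```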
## Installation
```shell
pip install pandas-label-encoder
```
## Usage
### Initialization and fitting
```python
import pandas_label_encoder as ec
from pandas_label_encoder import EncoderCategoryError

categories = ['Cat', 'Dog', 'Bird']  # can be a pd.Series, np.array, or list

# Fit at initialization
animal_encoder = ec.Encoder(categories)

# Or fit later
animal_encoder = ec.Encoder()
animal_encoder.fit(categories)

animal_encoder.categories  # ['Cat', 'Dog', 'Bird'], read-only

# Calling these methods before categories are assigned raises EncoderCategoryError
ec.Encoder().transform()  # raises EncoderCategoryError
ec.Encoder().inverse_transform()  # raises EncoderCategoryError
```
### Transform
- Unknown categories are encoded as -1.
- To raise an error instead, pass one of the two validation options:
  - `validation='all'`: every value must be known; raises `EncoderValidationError` if any result is -1.
  - `validation='any'`: at least one value must be known; raises `EncoderValidationError` only if all results are -1.
```python
from pandas_label_encoder import EncoderValidationError

animal_encoder.transform(['Cat'])  # [2]
animal_encoder.transform(['Fish'])  # [-1]

animal_encoder.transform(['Fish'], validation='all')  # raises EncoderValidationError
animal_encoder.transform(['Fish'], validation='any')  # raises EncoderValidationError

try:
    animal_encoder.transform(['Fish', 'Cat'], validation='all')  # raises EncoderValidationError
except EncoderValidationError:
    print('There is an unknown animal.')

animal_encoder.transform(['Fish', 'Cat'], validation='any')  # [-1, 2]
```
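The -1 convention and the two validation modes can be illustrated with plain pandas. The sketch below is not the package's implementation: `encode` and `UnknownCategoryError` are hypothetical names, and the integer codes it produces follow the order of `categories`, so they need not match the codes shown above.

```python
import pandas as pd


class UnknownCategoryError(ValueError):
    """Hypothetical error type used only in this sketch."""


def encode(values, categories, validation=None):
    """Map values to integer codes; values outside `categories` become -1."""
    codes = pd.Categorical(values, categories=categories).codes
    unknown = codes == -1
    # 'all': every value must be known, so a single -1 is an error.
    if validation == 'all' and unknown.any():
        raise UnknownCategoryError('at least one value is unknown')
    # 'any': at least one value must be known, so only all -1 is an error.
    if validation == 'any' and codes.size and unknown.all():
        raise UnknownCategoryError('every value is unknown')
    return codes


animals = ['Cat', 'Dog', 'Bird']
mixed = encode(['Fish', 'Cat'], animals, validation='any')  # passes: 'Cat' is known
```

With `validation='all'` the same input raises, because `'Fish'` is not among the categories.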
### Inverse transform
- Unknown codes are decoded as NaN.
- To raise an error instead, pass one of the two validation options:
  - `validation='all'`: every code must be known; raises `EncoderValidationError` if any result is NaN.
  - `validation='any'`: at least one code must be known; raises `EncoderValidationError` only if all results are NaN.
```python
from pandas_label_encoder import EncoderValidationError

animal_encoder.inverse_transform([2])  # ['Cat']
animal_encoder.inverse_transform([9])  # [NaN]

animal_encoder.inverse_transform([9], validation='all')  # raises EncoderValidationError
animal_encoder.inverse_transform([9], validation='any')  # raises EncoderValidationError

try:
    animal_encoder.inverse_transform([9, 2], validation='all')  # raises EncoderValidationError
except EncoderValidationError:
    print('There is an unknown animal.')

animal_encoder.inverse_transform([9, 2], validation='any')  # [NaN, 'Cat']
```
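The NaN behaviour for unknown codes can be reproduced with a pandas lookup. This is a minimal sketch under assumptions, not the package's code: `decode` is a hypothetical helper, and it treats a code as the position of a label in `categories`, which may differ from how the package assigns codes.

```python
import pandas as pd


def decode(codes, categories):
    """Map integer codes back to labels; codes with no label become NaN."""
    lookup = pd.Series(list(categories), dtype=object)  # index 0..n-1 -> label
    # reindex returns NaN for any code that is not in the lookup index.
    return lookup.reindex(list(codes)).to_numpy()


animals = ['Cat', 'Dog', 'Bird']
decoded = decode([9, 2], animals)  # code 9 has no label, code 2 is 'Bird' here
```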
### Save and load the encoder
Both `load_encoder` and `encoder.Encoder.load` load a saved encoder and check its version.
An encoder saved by a different version may contain changes that cause errors.
To check the current encoder version, use `encoder.Encoder.__version__`.
```python
from pandas_label_encoder import save_encoder, load_encoder

# `path` is the file location to write to or read from

# Save or load through the encoder instance
animal_encoder.save(path)  # save the current encoder
animal_encoder.load(path)  # load another encoder and assign it to the current one

# Save or load through the module-level functions
animal_encoder = load_encoder(path)
save_encoder(path)
```
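The idea of checking a version on load can be sketched with the standard library. The file format used by the package is not described here, so everything below is an assumption for illustration: `save_categories`, `load_categories`, `VersionMismatchError`, `FORMAT_VERSION`, and the JSON layout are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

FORMAT_VERSION = "1.0.1"  # hypothetical version tag stored alongside the data


class VersionMismatchError(Exception):
    """Hypothetical error raised when a file was written by another version."""


def save_categories(categories, path):
    """Write the categories together with the version that produced them."""
    payload = {"version": FORMAT_VERSION, "categories": list(categories)}
    Path(path).write_text(json.dumps(payload), encoding="utf-8")


def load_categories(path, expected_version=FORMAT_VERSION):
    """Read the categories back, refusing files from a different version."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    if payload["version"] != expected_version:
        raise VersionMismatchError(
            f"file version {payload['version']} != expected {expected_version}"
        )
    return payload["categories"]


with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "animals.json"
    save_categories(['Cat', 'Dog', 'Bird'], demo_path)
    restored = load_categories(demo_path)
```

Failing early on a version mismatch keeps an incompatible file from producing wrong codes later.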
Raw data
{
"_id": null,
"home_page": "https://github.com/benzerer/pandas-label-encoder",
"name": "pandas-label-encoder",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8,<4.0",
"maintainer_email": "",
"keywords": "pandas,label-encoder,label-encoding,label,encoding,encoder",
"author": "NOPDANAI DEJVORAKUL",
"author_email": "b.intm@hotmail.com",
"download_url": "https://files.pythonhosted.org/packages/c4/00/63ed3f15b935d652e616e74e49377f8d0c7f2d1d816ec547127f4ef1a7ab/pandas_label_encoder-1.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Label encoder backed by pandas",
"version": "1.0.1",
"split_keywords": [
"pandas",
"label-encoder",
"label-encoding",
"label",
"encoding",
"encoder"
],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "9b59726841a1c64e4b4d29954b5ac2c9",
"sha256": "12c8d9bdc5a1c3fdb3686a7f4ddde99776decf24f0339144793d0e308c24a94e"
},
"downloads": -1,
"filename": "pandas_label_encoder-1.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9b59726841a1c64e4b4d29954b5ac2c9",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8,<4.0",
"size": 4176,
"upload_time": "2022-12-16T03:52:11",
"upload_time_iso_8601": "2022-12-16T03:52:11.448643Z",
"url": "https://files.pythonhosted.org/packages/e8/a8/b8714cd50a60f7bf06bcd9befb2275adeeb48316e5ed4451a4b61be7d5f0/pandas_label_encoder-1.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "42cda95b6e52a530909d5c33ef759367",
"sha256": "5e21d36993b90fe85e7a679ac607c03506c4dfbd5e698521e9a22136350e73b3"
},
"downloads": -1,
"filename": "pandas_label_encoder-1.0.1.tar.gz",
"has_sig": false,
"md5_digest": "42cda95b6e52a530909d5c33ef759367",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8,<4.0",
"size": 4194,
"upload_time": "2022-12-16T03:52:13",
"upload_time_iso_8601": "2022-12-16T03:52:13.123234Z",
"url": "https://files.pythonhosted.org/packages/c4/00/63ed3f15b935d652e616e74e49377f8d0c7f2d1d816ec547127f4ef1a7ab/pandas_label_encoder-1.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-12-16 03:52:13",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "benzerer",
"github_project": "pandas-label-encoder",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "pandas-label-encoder"
}