pandas-label-encoder


- Name: pandas-label-encoder
- Version: 1.0.1
- Home page: https://github.com/benzerer/pandas-label-encoder
- Summary: Label encoder backed by pandas
- Upload time: 2022-12-16 03:52:13
- Author: NOPDANAI DEJVORAKUL
- Requires Python: >=3.8,<4.0
- License: MIT
- Keywords: pandas, label-encoder, label-encoding, label, encoding, encoder
- Requirements: No requirements were recorded.

# Pandas-powered LabelEncoder

## Performance benchmark
Timings from the author's test, compared with scikit-learn's LabelEncoder:
```
Total rows: 24,123,464
Scikit-learn's LabelEncoder - 13.35 seconds
Pandas-powered LabelEncoder - 2.44 seconds
```

## Installation
```shell
pip install pandas-label-encoder
```

## Usage
### Initiation and fitting
```python
import pandas_label_encoder as ec
from pandas_label_encoder import EncoderCategoryError

categories = ['Cat', 'Dog', 'Bird']  # can be pd.Series, np.array, list

# Fit at initialization
animal_encoder = ec.Encoder(categories)

# Fit later
animal_encoder = ec.Encoder()
animal_encoder.fit(categories)

animal_encoder.categories # ['Cat', 'Dog', 'Bird'], read-only

# Calling transform or inverse_transform before categories are assigned raises EncoderCategoryError
ec.Encoder().transform(['Cat'])  # Raises EncoderCategoryError
ec.Encoder().inverse_transform([0])  # Raises EncoderCategoryError
```

### Transform
- Unknown categories are encoded as -1.
- To raise an error instead, pass one of two validation options:
  - validation=`all` -- every value must be known; raises EncoderValidationError if any result is -1
  - validation=`any` -- at least one value must be known; raises EncoderValidationError only if all results are -1
```python
from pandas_label_encoder import EncoderValidationError

animal_encoder.transform(['Cat']) # [2]
animal_encoder.transform(['Fish']) # [-1]

animal_encoder.transform(['Fish'], validation='all') # Raise EncoderValidationError
animal_encoder.transform(['Fish'], validation='any') # Raise EncoderValidationError

try:
    animal_encoder.transform(['Fish', 'Cat'], validation='all')  # Raises EncoderValidationError
except EncoderValidationError:
    print('There is an unknown animal.')

animal_encoder.transform(['Fish', 'Cat'], validation='any') # [-1, 2]
```
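
The -1 convention and the two validation modes can be illustrated with plain pandas. This is a minimal sketch, not the package's implementation: the `encode` helper is hypothetical, `ValueError` stands in for the package's own exception, and codes here simply follow each category's position in the list, which may differ from the numbering the package assigns.

```python
import numpy as np
import pandas as pd


def encode(values, categories, validation=None):
    """Hypothetical stand-in for a pandas-backed transform (not the library's code)."""
    # pd.Categorical assigns the code -1 to every value outside `categories`.
    codes = np.asarray(pd.Categorical(values, categories=categories).codes)
    unknown = codes == -1
    if validation == "all" and unknown.any():
        # "all": every value must be known.
        raise ValueError("at least one value is not a known category")
    if validation == "any" and unknown.size > 0 and unknown.all():
        # "any": at least one value must be known.
        raise ValueError("no value is a known category")
    return codes.tolist()


categories = ["Bird", "Cat", "Dog"]  # illustrative ordering
print(encode(["Cat", "Fish"], categories))                    # [1, -1]
print(encode(["Fish", "Cat"], categories, validation="any"))  # [-1, 1]
```

With `validation="all"` the second call would raise, because `'Fish'` is not a known category.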

### Inverse transform
- Unknown codes are decoded as NaN.
- To raise an error instead, pass one of two validation options:
  - validation=`all` -- every code must be known; raises EncoderValidationError if any result is NaN
  - validation=`any` -- at least one code must be known; raises EncoderValidationError only if all results are NaN
```python
from pandas_label_encoder import EncoderValidationError

animal_encoder.inverse_transform([2]) # ['Cat']
animal_encoder.inverse_transform([9]) # [NaN]

animal_encoder.inverse_transform([9], validation='all') # Raise EncoderValidationError
animal_encoder.inverse_transform([9], validation='any') # Raise EncoderValidationError

try:
    animal_encoder.inverse_transform([9, 2], validation='all')  # Raises EncoderValidationError
except EncoderValidationError:
    print('There is an unknown animal.')

animal_encoder.inverse_transform([9, 2], validation='any') # [NaN, 'Cat']
```
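
The NaN behaviour for unknown codes can be reproduced with a position-indexed `pd.Series`. Again, this is only a sketch under assumptions: `decode` is a hypothetical helper, `ValueError` stands in for the package's exception, and codes are taken to be positions in the category list.

```python
import pandas as pd


def decode(codes, categories, validation=None):
    """Hypothetical stand-in for an inverse transform (not the library's code)."""
    # Reindexing by code yields NaN wherever no category has that position.
    labels = pd.Series(list(categories), dtype=object).reindex(list(codes))
    unknown = labels.isna()
    if validation == "all" and unknown.any():
        raise ValueError("at least one code has no category")
    if validation == "any" and len(unknown) > 0 and unknown.all():
        raise ValueError("no code has a category")
    return labels.tolist()


categories = ["Bird", "Cat", "Dog"]  # illustrative ordering
decoded = decode([9, 2], categories, validation="any")
print(decoded)  # [nan, 'Dog']
```

Because NaN never compares equal to itself, check the result with `pd.isna` rather than `==`.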

### Save and load the encoder
The `load_encoder` function and the `encoder.Encoder.load` method load a saved encoder and check its encoder version.

An encoder saved with a different encoder version may contain changes that cause errors.

To check the current encoder version, use `encoder.Encoder.__version__`.
```python
from pandas_label_encoder import save_encoder, load_encoder

# Save or load through the encoder instance
animal_encoder.save(path)  # save the current encoder
animal_encoder.load(path)  # load another encoder and assign it to the current encoder

# Save or load through the module-level functions
animal_encoder = load_encoder(path)
save_encoder(path)
```
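
The idea of checking a version on load can be sketched independently of the package. Everything below is an assumption made for illustration: the JSON file layout, the `FORMAT_VERSION` constant, and the `save_categories` / `load_categories` helpers are hypothetical and do not describe how the package actually stores encoders.

```python
import json
import os
import tempfile

FORMAT_VERSION = "1.0.1"  # hypothetical version tag written next to the data


def save_categories(categories, path):
    """Write the categories together with the version that produced them."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"version": FORMAT_VERSION, "categories": list(categories)}, fh)


def load_categories(path):
    """Read the categories back, refusing files written by another version."""
    with open(path, encoding="utf-8") as fh:
        payload = json.load(fh)
    if payload.get("version") != FORMAT_VERSION:
        raise RuntimeError(
            f"saved with version {payload.get('version')!r}, expected {FORMAT_VERSION!r}"
        )
    return payload["categories"]


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "animal_encoder.json")
    save_categories(["Cat", "Dog", "Bird"], demo_path)
    restored = load_categories(demo_path)

print(restored)  # ['Cat', 'Dog', 'Bird']
```

Failing early on a version mismatch gives a clear error at load time instead of a subtle one later during `transform`.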

            
