labelit


- Name: labelit
- Version: 0.1.5
- Home page: https://github.com/shibing624/labelit
- Summary: label text and image based on active learning.
- Upload time: 2022-12-09 04:17:04
- Author: XuMing (xuming624@qq.com)
- Requires Python: >=3.5
- License: Apache 2.0
- Keywords: labelit, active learning, label text, label image
- Requirements: jieba, loguru, cleanlab, scipy (>=0.19), pandas (>=0.20)
# labelit
Label text and images based on active learning.

# Active Learning Playground

## Introduction

This is a Python module for experimenting with different active learning
algorithms. There are a few key components to running active learning
experiments:

*   Main experiment script is
    [`run_experiment.py`](run_experiment.py)
    with many flags for different run options.

*   Supported datasets can be downloaded to a specified directory by running
    [`utils/create_data.py`](utils/create_data.py).

*   Supported active learning methods are in
    [`sampling_methods`](sampling_methods/).

Below I will go into each component in more detail.

DISCLAIMER: This is not an official Google product.

## Setup
The dependencies are listed in [`requirements.txt`](requirements.txt). Please make sure these packages are
installed before running experiments. If you need GPU-capable `tensorflow`, follow the
instructions [here](https://www.tensorflow.org/install/).

We strongly recommend installing all dependencies into a separate `virtualenv` for
easy package management.

## Getting benchmark datasets

By default, the datasets are saved to `/tmp/data`. You can specify another directory via the
`--save_dir` flag.

Downloading all the datasets is very time-consuming, so please be patient.
You can download a subset of the data by passing a comma-separated
string of dataset names to the `--datasets` flag.

## Running experiments

There are a few key flags for
[`run_experiment.py`](run_experiment.py):

*   `dataset`: name of the dataset; it must match the save name used in
    `create_data.py` and must exist in `data_dir`.

*   `sampling_method`: active learning method to use. Must be specified in
    [`sampling_methods/constants.py`](sampling_methods/constants.py).

*   `warmstart_size`: initial batch of uniformly sampled examples to use as seed
    data. A float indicates a proportion of the total training data and an
    integer indicates an absolute number of examples.

*   `batch_size`: number of datapoints to request in each batch. A float
    indicates a proportion of the total training data and an integer indicates
    an absolute number of examples.

*   `score_method`: model used to evaluate the performance of the sampling
    method. Must be defined in the `get_model` method of
    [`utils/utils.py`](utils/utils.py).

*   `data_dir`: directory with saved datasets.

*   `save_dir`: directory to save results.

This is only a subset of the flags. There are also options for
preprocessing, introducing label noise, subsampling the dataset, and using a
different model for selection than for scoring/evaluation.
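The float/integer convention shared by `warmstart_size` and `batch_size` can be illustrated with a small helper. This is a sketch, not code from `run_experiment.py`: `resolve_size` is a hypothetical name, and it assumes that a float below 1 is read as a fraction of the training set while any other value is an absolute count.

```python
def resolve_size(value, train_size):
    """Turn a `warmstart_size`/`batch_size`-style value into an example count.

    Assumption (hypothetical helper): a float below 1 is a fraction of the
    training data; any other value is an absolute number of examples.
    """
    if isinstance(value, float) and value < 1:
        return int(value * train_size)
    return int(value)


train_size = 5000
print(resolve_size(0.02, train_size))  # fraction: 2% of 5000 -> 100
print(resolve_size(250, train_size))   # integer: used as-is -> 250
```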

## Available active learning methods

All named active learning methods are in
[`sampling_methods/constants.py`](sampling_methods/constants.py).

You can also specify a mixture of active learning methods by following the
pattern `[sampling_method]-[mixture_weight]`, with the pairs separated by dashes;
for example, `mixture_of_samplers-margin-0.33-informative_diverse-0.33-uniform-0.34`.
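To make the naming pattern concrete, here is a sketch that splits such a string into sampler names and weights. `parse_mixture` is a hypothetical helper, not the repository's parser. It assumes sampler names never contain dashes (the names above use underscores), and the check that the weights sum to 1 is an added sanity check rather than a documented rule.

```python
def parse_mixture(spec):
    """Parse 'mixture_of_samplers-<name>-<weight>-...' into {name: weight}.

    Assumes sampler names use underscores, never dashes, so splitting on
    '-' separates names from weights unambiguously.
    """
    prefix = "mixture_of_samplers"
    parts = spec.split("-")
    if parts[0] != prefix or len(parts) % 2 != 1:
        raise ValueError("expected %s-<name>-<weight>-... pairs" % prefix)
    names = parts[1::2]
    weights = [float(w) for w in parts[2::2]]
    # Added sanity check (assumption): weights should form a distribution.
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("mixture weights should sum to 1, got %r" % sum(weights))
    return dict(zip(names, weights))


print(parse_mixture(
    "mixture_of_samplers-margin-0.33-informative_diverse-0.33-uniform-0.34"))
# {'margin': 0.33, 'informative_diverse': 0.33, 'uniform': 0.34}
```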

Some supported sampling methods include:

*   Uniform: samples are selected via uniform sampling.

*   Margin: uncertainty based sampling method.

*   Informative and diverse: margin and cluster based sampling method.

*   k-center greedy: representative strategy that greedily forms a batch of
    points to minimize the maximum distance from any point to its nearest
    labeled point.

*   Graph density: representative strategy that selects points in dense regions
    of the pool.

*   Exp3 bandit: meta-active learning method that tries to learn the optimal
    sampling method using a popular multi-armed bandit algorithm.
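Two of the strategies above, margin sampling and k-center greedy, can be sketched in plain Python to show the selection logic. These are not the implementations in `sampling_methods/`; the function names are hypothetical, margin is taken as the gap between the two highest class scores, and distances are assumed to be Euclidean (`math.dist` needs Python 3.8+).

```python
import math


def margin_select(scores, already_selected, n):
    """Margin sampling: the n unlabeled rows whose top-two class scores are closest.

    scores: one list of per-class scores (e.g. probabilities) per example,
    with at least two classes.
    """
    taken = set(already_selected)
    margins = []
    for i, row in enumerate(scores):
        if i in taken:
            continue
        first, second = sorted(row, reverse=True)[:2]
        margins.append((first - second, i))
    margins.sort()  # smallest margin = most uncertain
    return [i for _, i in margins[:n]]


def k_center_greedy(points, already_selected, n):
    """k-center greedy: repeatedly add the point farthest from its nearest center."""
    # Distance from each point to its closest already-selected point.
    min_dist = [min((math.dist(p, points[j]) for j in already_selected),
                    default=math.inf)
                for p in points]
    batch = []
    for _ in range(n):
        far = max(range(len(points)), key=min_dist.__getitem__)
        batch.append(far)
        # The new center may now be the closest center for other points.
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[far]))
    return batch


probs = [[0.9, 0.1], [0.55, 0.45], [0.5, 0.5], [0.7, 0.3]]
print(margin_select(probs, already_selected=[], n=2))  # [2, 1]
points = [(0, 0), (1, 0), (10, 0), (10, 1)]
print(k_center_greedy(points, already_selected=[0], n=2))  # [3, 1]
```

Margin sampling looks only at uncertainty, so it can pick near-duplicate points; k-center greedy looks only at coverage of the pool, which is why mixtures of the two kinds of strategy are useful.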

### Adding new active learning methods

Implement either a base sampler that inherits from
[`SamplingMethod`](sampling_methods/sampling_def.py)
or a meta-sampler that calls base samplers and inherits from
[`WrapperSamplingMethod`](sampling_methods/wrapper_sampler_def.py).

The only method that every sampler must implement is `select_batch_`,
which can take arbitrary named arguments. The only restriction is that the same
input must have the same argument name in every sampler (e.g. the indices of
already selected examples use the same name across samplers). Adding a
new named argument that no other sampling method uses requires passing it
into the `select_batch` call in
[`run_experiment.py`](run_experiment.py).

After implementing your sampler, be sure to add it to
[`constants.py`](sampling_methods/constants.py)
so that it can be called from
[`run_experiment.py`](run_experiment.py).
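The contract described above can be sketched as follows. Because the real base class lives in `sampling_methods/sampling_def.py`, this example defines a hypothetical stand-in that mirrors only what the text states (subclasses implement `select_batch_`; `select_batch` forwards named arguments). `RandomSubsetSampler` and the argument names `already_selected` and `N` are assumptions for illustration.

```python
import abc
import random


class SamplingMethod(abc.ABC):
    """Hypothetical stand-in for `SamplingMethod` in sampling_methods/sampling_def.py.

    It mirrors only the contract described above: subclasses implement
    `select_batch_`, and `select_batch` forwards named arguments to it.
    """

    def __init__(self, X, y, seed, **kwargs):
        self.X = X
        self.y = y
        self.seed = seed

    @abc.abstractmethod
    def select_batch_(self, **kwargs):
        """Return the indices of the examples to label next."""

    def select_batch(self, **kwargs):
        return self.select_batch_(**kwargs)


class RandomSubsetSampler(SamplingMethod):
    """Hypothetical base sampler: uniform choice among not-yet-selected indices."""

    def __init__(self, X, y, seed):
        super().__init__(X, y, seed)
        self.rng = random.Random(seed)

    # `already_selected` and `N` are the shared argument names assumed here;
    # **kwargs swallows arguments that only other samplers use.
    def select_batch_(self, already_selected, N, **kwargs):
        taken = set(already_selected)
        candidates = [i for i in range(len(self.X)) if i not in taken]
        return self.rng.sample(candidates, min(N, len(candidates)))


sampler = RandomSubsetSampler(X=list(range(10)), y=None, seed=0)
print(sampler.select_batch(already_selected=[0, 1, 2], N=3, model=None))
```

Accepting `**kwargs` lets a single `select_batch(...)` call pass the same named arguments to every sampler, which is what the naming-consistency requirement makes possible.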

## Available models

All available models are in the `get_model` method of
[`utils/utils.py`](utils/utils.py).

Supported models:

*   Linear SVM: scikit-learn model with a grid-search wrapper for the
    regularization parameter.

*   Kernel SVM: scikit-learn model with a grid-search wrapper for the
    regularization parameter.

*   Logistic Regression: scikit-learn model with a grid-search wrapper for the
    regularization parameter.

*   Small CNN: 4-layer CNN optimized with RMSprop, implemented in Keras with
    the TensorFlow backend.

*   Kernel Least Squares Classification: block gradient descent solver that can
    use multiple cores, so it is often faster than the scikit-learn kernel SVM.

### Adding new models

New models must follow the scikit-learn API and implement the following methods:

*   `fit(X, y[, sample_weight])`: fit the model to the input features and
    target.

*   `predict(X)`: predict the value of the input features.

*   `score(X, y)`: returns target metric given test features and test targets.

*   `decision_function(X)` (optional): return class probabilities, distance to
    decision boundaries, or other metric that can be used by margin sampler as a
    measure of uncertainty.

See
[`small_cnn.py`](utils/small_cnn.py)
for an example.

After implementing your new model, be sure to add it to the `get_model` method of
[`utils/utils.py`](utils/utils.py).

Currently, models must be added one at a time, and not all scikit-learn
classifiers are supported, because each one needs user input on whether and how
to tune its hyperparameters. However, any scikit-learn model can easily be added
as a supported model by wrapping it in a hyperparameter search.
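As an illustration of the interface listed above, here is a minimal model in plain Python. `NearestCentroidModel` is hypothetical and is not one of the models in `utils/utils.py`. Its `score` returns accuracy (an assumed target metric), and its `decision_function` returns negative distances to each class centroid, so larger values mean "closer to that class", a form a margin-style sampler could consume.

```python
import math


class NearestCentroidModel:
    """Hypothetical classifier exposing fit/predict/score/decision_function."""

    def fit(self, X, y, sample_weight=None):
        weights = [1.0] * len(X) if sample_weight is None else sample_weight
        sums, totals = {}, {}
        for x, label, w in zip(X, y, weights):
            acc = sums.setdefault(label, [0.0] * len(x))
            for d, value in enumerate(x):
                acc[d] += w * value
            totals[label] = totals.get(label, 0.0) + w
        self.classes_ = sorted(sums)
        # Weighted mean of each class's feature vectors.
        self.centroids_ = [[s / totals[c] for s in sums[c]] for c in self.classes_]
        return self

    def decision_function(self, X):
        # Negative distance to each centroid: larger means the class is closer.
        return [[-math.dist(x, c) for c in self.centroids_] for x in X]

    def predict(self, X):
        return [self.classes_[max(range(len(row)), key=row.__getitem__)]
                for row in self.decision_function(X)]

    def score(self, X, y):
        # Target metric assumed here: plain accuracy.
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


X = [[0, 0], [0, 1], [5, 5], [6, 5]]
y = [0, 0, 1, 1]
model = NearestCentroidModel().fit(X, y)
print(model.predict([[0, 0.5], [5.5, 5]]))  # [0, 1]
print(model.score(X, y))                     # 1.0
```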

## Collecting results and charting

The
[`utils/chart_data.py`](utils/chart_data.py)
script handles processing of data and charting for a specified dataset and
source directory.


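# Active learning
In some settings, unlabeled data is plentiful, labeled data is scarce, and labeling data by hand is expensive. In such cases, we can let the learning algorithm actively propose which data should be labeled. Those data are sent to experts for annotation and then added to the training set to train the algorithm. This process is called active learning.

An active learning method can generally be divided into two parts: a learning engine and a selection engine. The learning engine maintains a baseline classifier and uses a supervised learning algorithm to learn from the labeled examples, thereby improving the classifier's performance. The selection engine uses an example-selection algorithm to choose an unlabeled example, hands it to a human expert for labeling, and adds the newly labeled example to the labeled set. The two engines work alternately; over many iterations the baseline classifier's performance gradually improves, and the process terminates once a preset condition is met.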
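The alternation between the learning engine and the selection engine can be sketched as a pool-based loop. Everything here is an illustrative assumption: `run_active_learning`, the toy 1-D threshold learner, and the `oracle` function standing in for the human expert are hypothetical, and a fixed labeling budget plays the role of the preset stopping condition.

```python
def run_active_learning(pool, oracle, train, select, seed_idx, batch_size, budget):
    """Alternate the learning engine (`train`) and the selection engine (`select`).

    pool:   list of unlabeled examples (pool-based setting).
    oracle: callable index -> label, standing in for the human expert.
    Stops when `budget` labels have been collected or the pool is exhausted.
    """
    labeled = {i: oracle(i) for i in seed_idx}
    while len(labeled) < budget:
        # Learning engine: fit the baseline classifier on the labeled set.
        model = train([pool[i] for i in labeled], list(labeled.values()))
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Selection engine: choose examples and send them to the expert.
        k = min(batch_size, budget - len(labeled))
        for i in select(model, pool, unlabeled, k):
            labeled[i] = oracle(i)
    final_model = train([pool[i] for i in labeled], list(labeled.values()))
    return final_model, labeled


pool = [float(x) for x in range(10)]


def oracle(i):
    # Hidden ground truth: label 1 from 6.5 upwards.
    return int(pool[i] >= 6.5)


def train(xs, ys):
    # Toy learner: threshold midway between the largest 0 and the smallest 1
    # (the seed set must contain both classes).
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    ones = [x for x, y in zip(xs, ys) if y == 1]
    return (max(zeros) + min(ones)) / 2


def select(threshold, pool, unlabeled, k):
    # Uncertainty sampling: examples closest to the current threshold.
    return sorted(unlabeled, key=lambda i: abs(pool[i] - threshold))[:k]


threshold, labeled = run_active_learning(pool, oracle, train, select,
                                         seed_idx=[0, 9], batch_size=1, budget=5)
print(threshold, sorted(labeled))  # 6.5 [0, 4, 6, 7, 9]
```

With only five labels out of ten, the loop recovers the true boundary because each query lands next to the current threshold.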

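# Example-selection algorithms
Depending on how unlabeled examples are obtained, active learning can be divided into two types: stream-based and pool-based.

- Pool-based active learning maintains a set of unlabeled examples, from which the selection engine chooses the examples to label next.
- In stream-based active learning, unlabeled examples are presented to the selection engine one at a time, in sequence, and the selection engine decides whether to label the current example; if not, the example is discarded. Because a stream-based algorithm cannot compare unlabeled examples with one another, it must set a threshold on an evaluation metric and label an example when its metric exceeds the threshold. This threshold has to be tuned for each task, which makes the approach hard to use as a mature, general-purpose method, so it is not covered further here.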
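The thresholding rule that distinguishes the stream-based setting can be sketched in a few lines. The logistic `predict_proba`, the uncertainty measure `1 - max(p, 1 - p)`, and the threshold `0.3` are all hypothetical choices; as noted above, the threshold would have to be tuned for each task.

```python
import math


def stream_select(stream, uncertainty, threshold):
    """Stream-based selection: see each example once, query only if uncertain."""
    queried = []
    for x in stream:
        if uncertainty(x) > threshold:
            queried.append(x)  # sent to the expert for labeling
        # otherwise the example is discarded and never revisited
    return queried


def predict_proba(x):
    """Hypothetical current model: logistic P(y=1 | x)."""
    return 1 / (1 + math.exp(-x))


def uncertainty(x):
    p = predict_proba(x)
    return 1 - max(p, 1 - p)


print(stream_select([-3.0, 0.2, 4.0, -0.5, 1.0], uncertainty, threshold=0.3))
# [0.2, -0.5]
```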

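## Pool-based example-selection algorithms

1. Uncertainty-reduction methods

These methods select for labeling the examples whose class the current baseline classifier is least certain about. They use information entropy to measure how much information an example carries, and the example with the highest entropy is exactly the one the current classifier is least certain about. Geometrically, these methods prefer examples close to the decision boundary.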
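The entropy criterion can be sketched directly on a model's predicted class probabilities. The function names are hypothetical, and the probability rows stand in for whatever the baseline classifier outputs.

```python
import math


def entropy(probs):
    """Shannon entropy (nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def max_entropy_select(prob_rows, n):
    """Uncertainty sampling: indices of the n rows with the highest entropy."""
    ranked = sorted(range(len(prob_rows)),
                    key=lambda i: entropy(prob_rows[i]), reverse=True)
    return ranked[:n]


rows = [[0.98, 0.02], [0.5, 0.5], [0.7, 0.3], [1.0, 0.0]]
print(max_entropy_select(rows, 2))  # [1, 2]: the two least confident rows
```

For two classes, entropy peaks at a 50/50 prediction, which is the "closest to the decision boundary" reading given above.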

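2. Version-space-reduction methods

These methods select for labeling the examples that, once added to training, shrink the version space the most. In binary classification, the examples these methods select roughly bisect the version space.

Representative method: the QBC (Query by Committee) algorithm.

QBC randomly draws several hypotheses from the version space to form a committee, then selects for labeling the example on which the committee's predictions disagree the most. To improve the committee's composition, ensemble methods such as Bagging and AdaBoost can be used to generate the committee from the version space.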
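The QBC selection step can be sketched with vote entropy as the disagreement measure. Vote entropy is one common choice, assumed here because the text does not name one. The committee is a hand-built set of hypothetical threshold rules standing in for members that would, in practice, be trained with Bagging or AdaBoost.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Committee disagreement on one example: entropy of the vote shares."""
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in Counter(votes).values())


def qbc_select(committee, pool, unlabeled, n):
    """Query by Committee: the n unlabeled examples with the most disagreement."""
    disagreement = {i: vote_entropy([member(pool[i]) for member in committee])
                    for i in unlabeled}
    return sorted(unlabeled, key=disagreement.__getitem__, reverse=True)[:n]


def threshold_rule(t):
    """Hypothetical committee member: predicts 1 when x >= t."""
    return lambda x: int(x >= t)


committee = [threshold_rule(t) for t in (3.5, 4.5, 5.5, 6.5)]
pool = [float(x) for x in range(10)]
print(qbc_select(committee, pool, list(range(10)), 1))  # [5]: a 2-2 split
```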

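3. Generalization-error-reduction methods

These methods try to select the examples that will reduce future generalization error the most. The general procedure is as follows: first choose a loss function to estimate the future error rate; then, for each example in the unlabeled set, estimate how much it would reduce the baseline classifier's error, and select the example with the largest estimated reduction for labeling.

These methods directly target the final evaluation metric of the classifier, but they are computationally expensive, and the accuracy of the loss function strongly affects their performance.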
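The procedure above can be sketched for a toy 1-D problem. The model (a smoothed, Gaussian-weighted vote of labeled points) is a hypothetical choice made so that "retraining" is just appending a point, and the loss is the expected 0/1 loss over the remaining pool. Because the current expected error is the same for every candidate, picking the lowest post-labeling loss is equivalent to picking the largest estimated reduction.

```python
import math


def predict_proba(labeled, u, bandwidth=1.0):
    """P(y=1 | u): Gaussian-weighted vote of labeled (x, y) pairs, smoothed."""
    w0 = w1 = 0.5  # smoothing keeps probabilities strictly inside (0, 1)
    for x, y in labeled:
        w = math.exp(-((u - x) / bandwidth) ** 2)
        if y == 1:
            w1 += w
        else:
            w0 += w
    return w1 / (w0 + w1)


def expected_error(labeled, pool):
    """Estimated future 0/1 loss: sum over the pool of 1 - max class probability."""
    total = 0.0
    for u in pool:
        p = predict_proba(labeled, u)
        total += 1 - max(p, 1 - p)
    return total


def eer_select(labeled, unlabeled):
    """Pick the unlabeled point whose label is expected to cut the loss most."""
    best, best_loss = None, math.inf
    for x in unlabeled:
        p1 = predict_proba(labeled, x)
        rest = [u for u in unlabeled if u != x]
        # "Retrain" with each possible label, weighted by the current belief.
        loss = sum(prob * expected_error(labeled + [(x, y)], rest)
                   for y, prob in ((1, p1), (0, 1 - p1)))
        if loss < best_loss:
            best, best_loss = x, loss
    return best


labeled = [(0.0, 0), (9.0, 1)]
unlabeled = [float(x) for x in range(1, 9)]
print(eer_select(labeled, unlabeled))
```

The nested loops (every candidate, both labels, the whole remaining pool) show why these methods are described as computationally expensive.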

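4. Other methods

- COMB algorithm: combines three different learners and quickly switches to whichever one currently performs best, so that example selection is as efficient as possible.

- Multi-view active learning: for problems framed as multi-view learning; selects for learning the examples on which the different views' predicted classes disagree. This approach is very effective for high-dimensional active learning problems.

- Pre-clustering active learning: runs a clustering algorithm as a preprocessing step; when selecting examples, it gives priority to the examples closest to the decision boundary and the examples most representative of their cluster (i.e., the cluster centers).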

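# Applications
## Document classification and information extraction
A Bayesian method serves as the baseline classifier, combined with an uncertainty-reduction example-selection algorithm, for text classification.

The EM algorithm is combined with QBC-based active learning: EM effectively exploits the information in unlabeled examples to improve the baseline classifier's accuracy, while QBC rapidly shrinks the version space.

## Image retrieval
An active learning algorithm with an SVM as the baseline classifier is used for image retrieval. It uses the closest-to-boundary method as its example-selection algorithm and extracts image color, texture, and similar properties as part of the features it learns from.

## Intrusion detection
Because intrusion detection systems rely heavily on expert knowledge and effective datasets, active learning can be used to reduce this dependence.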


# Usage
1. Run `python3 label.py`.
            
