insight-extractor-packaage

Name	insight-extractor-packaage JSON
Version	0.0.1 JSON
	download
home_page
Summary	Insight Extractor Package
upload_time	2022-12-27 17:04:02
maintainer
docs_url	None
author	Research and Innovation
requires_python
license
keywords	insight extractor
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            # TakeBlipInsightExtractor Package
_Data & Analytics Research_

## Overview

Here is presented these content:

* [Intro](#intro)
* [Run](#run)
* [Example of initialization e usage](#Example of initialization e usage)


## Intro

The Insight Extractor offers a way to analyze huge volumes of textual data in order to identify, cluster and detail subjects. 
This project achieves this results by way of applying a proprietary Named Entity Recognition (NER) algorithm followed by a clustering algorithm. 
The IE Cloud also allows any person to use this tool without having too many computational resources available to themselves.

The package outputs four types of files:

- **Wordcloud**: It's an image file containing a wordcloud describing the most frequent subjects on the text. The colours represent the groups of similar subjects.
- **Wordtree**: It's an html file which contains the graphic relationship between the subjects and the examples of uses in sentences. It's an interactive graphic where the user can navigate along the tree.
- **Hierarchy**: It's a json file which contains the hierarchical relationship between subjects.
- **Table**: It's a csv file containing the following columns:


        Message                   |  Entities                                                                                    | Groups     | Structured Message
        sobre cobranca inexistente|[{'value': 'cobranÃ§a', 'lowercase_value': 'cobranÃ§a', 'postags': 'SUBS', 'type': 'financial'}]|['cobranÃ§a']|sobre cobranÃ§a inexistente



### Parameters

The following parameters need to be set by the user on the command line:
- **embedding_path**: path to the embedding model, the file should end with .kv;
- **postagging_model_path**: path to the postagging model, the file should end with .pkl;
- **postagging_label_path**: path to the postagging label file, the file should end with .pkl;
- **ner_model_path**: path to the ner model, the file should end with .pkl;
- **ner_label_path**: path to the ner label file, the file should end with .pkl;
- **file**: path to the csv file the user wants to analyze;
- **user_email**: user's Take Blip email where they want to receive the analysis;
- **bot_name**: bot ID.


The following parameters have default settings, but can be customized by the user;
- **node_messages_examples**: it is an int representing the number of examples outputed for each subject on the Wordtree file. The default value is 100;
- **similarity_threshold**: it is a float representing the similarity threshold between the subject groups. The default value is 0.65, we recommend that this parameter not be modified;
- **percentage_threshold**: it is a float representing the frequency percentile of subject from which they are not removed from the analysis. The default value is 0.9;
- **batch_size**: it is an int representing the batch size. The default value is 50;
- **chunk_size**: it is an int representing chunk file size for upload in storaged. The default value is 1024;
- **separator**: it is a str for the csv file delimiter character. The default value is '|'.


## Example of initialization e usage:
1) Import main packages;
2) Initialize main variables;   
3) Initialize eventhub logger;
4) Initialize Insight Extractor;
5) Insight Extractor usage.


An example of the above steps could be found in the python code below:

1) Import main packages
```
import uuid
from TakeBlipInsightExtractor.insight_extractor import InsightExtractor
from TakeBlipInsightExtractor.outputs.eventhub_log_sender import EventHubLogSender
``` 
2) Initialize main variables
```
embedding_path = '*.kv'
postag_model_path = '*.pkl'
postag_label_path = '*.pkl'
ner_model_path = '*.pkl'
ner_label_path = '*.pkl'

user_email = 'your_email@host.com'
bot_name = 'my_bot_for_insight_extractor'
application_name = 'your application'

eventhub_name = '*'
eventhub_connection_string = '*'

file_name = '*'
input_data = '*.csv'
separator = '|'

similarity_threshold = 0.65
node_messages_examples = 100
batch_size = 1024
percentage_threshold = 0.7
```

3) Initialize eventhub logger
```
correlation_id = str(uuid.uuid3(uuid.NAMESPACE_DNS, user_email + bot_name))
logger = EventHubLogSender(application_name=application_name,
                           user_email=user_email,
                           bot_name=bot_name,
                           file_name=file_name,
                           correlation_id=correlation_id,
                           connection_string=eventhub_connection_string,
                           eventhub_name=eventhub_name)
```
4) Initialize Insight Extractor
```
insight_extractor = InsightExtractor(input_data,
                                     separator=separator,
                                     similarity_threshold=similarity_threshold,
                                     embedding_path=embedding_path,
                                     postagging_model_path=postag_model_path,
                                     postagging_label_path=postag_label_path,
                                     ner_model_path=ner_model_path,
                                     ner_label_path=ner_label_path,
                                     user_email=user_email,
                                     bot_name=bot_name,
                                     logger=logger)
```   
5) Insight Extractor usage
```
insight_extractor.predict(percentage_threshold=percentage_threshold,
                          node_messages_examples=node_messages_examples,
                          batch_size=batch_size)
```

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "insight-extractor-packaage",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "insight extractor",
    "author": "Research and Innovation",
    "author_email": "insightextractor.dataanalytics@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/33/8d/65da0355b30dda53e41cf907675da7aed47e01b97e4becab586f5715d163/insight-extractor-packaage-0.0.1.tar.gz",
    "platform": null,
    "description": "# TakeBlipInsightExtractor Package\n_Data & Analytics Research_\n\n## Overview\n\nHere is presented these content:\n\n* [Intro](#intro)\n* [Run](#run)\n* [Example of initialization e usage](#Example of initialization e usage)\n\n\n## Intro\n\nThe Insight Extractor offers a way to analyze huge volumes of textual data in order to identify, cluster and detail subjects. \nThis project achieves this results by way of applying a proprietary Named Entity Recognition (NER) algorithm followed by a clustering algorithm. \nThe IE Cloud also allows any person to use this tool without having too many computational resources available to themselves.\n\nThe package outputs four types of files:\n\n- **Wordcloud**: It's an image file containing a wordcloud describing the most frequent subjects on the text. The colours represent the groups of similar subjects.\n- **Wordtree**: It's an html file which contains the graphic relationship between the subjects and the examples of uses in sentences. It's an interactive graphic where the user can navigate along the tree.\n- **Hierarchy**: It's a json file which contains the hierarchical relationship between subjects.\n- **Table**: It's a csv file containing the following columns:\n\n\n        Message                   |  Entities                                                                                    | Groups     | Structured Message\n        sobre cobranca inexistente|[{'value': 'cobran\u00c3\u00a7a', 'lowercase_value': 'cobran\u00c3\u00a7a', 'postags': 'SUBS', 'type': 'financial'}]|['cobran\u00c3\u00a7a']|sobre cobran\u00c3\u00a7a inexistente\n\n\n\n### Parameters\n\nThe following parameters need to be set by the user on the command line:\n- **embedding_path**: path to the embedding model, the file should end with .kv;\n- **postagging_model_path**: path to the postagging model, the file should end with .pkl;\n- **postagging_label_path**: path to the postagging label file, the file should end with .pkl;\n- **ner_model_path**: path to the ner model, the file should end with .pkl;\n- **ner_label_path**: path to the ner label file, the file should end with .pkl;\n- **file**: path to the csv file the user wants to analyze;\n- **user_email**: user's Take Blip email where they want to receive the analysis;\n- **bot_name**: bot ID.\n\n\nThe following parameters have default settings, but can be customized by the user;\n- **node_messages_examples**: it is an int representing the number of examples outputed for each subject on the Wordtree file. The default value is 100;\n- **similarity_threshold**: it is a float representing the similarity threshold between the subject groups. The default value is 0.65, we recommend that this parameter not be modified;\n- **percentage_threshold**: it is a float representing the frequency percentile of subject from which they are not removed from the analysis. The default value is 0.9;\n- **batch_size**: it is an int representing the batch size. The default value is 50;\n- **chunk_size**: it is an int representing chunk file size for upload in storaged. The default value is 1024;\n- **separator**: it is a str for the csv file delimiter character. The default value is '|'.\n\n\n## Example of initialization e usage:\n1) Import main packages;\n2) Initialize main variables;   \n3) Initialize eventhub logger;\n4) Initialize Insight Extractor;\n5) Insight Extractor usage.\n\n\nAn example of the above steps could be found in the python code below:\n\n1) Import main packages\n```\nimport uuid\nfrom TakeBlipInsightExtractor.insight_extractor import InsightExtractor\nfrom TakeBlipInsightExtractor.outputs.eventhub_log_sender import EventHubLogSender\n``` \n2) Initialize main variables\n```\nembedding_path = '*.kv'\npostag_model_path = '*.pkl'\npostag_label_path = '*.pkl'\nner_model_path = '*.pkl'\nner_label_path = '*.pkl'\n\nuser_email = 'your_email@host.com'\nbot_name = 'my_bot_for_insight_extractor'\napplication_name = 'your application'\n\neventhub_name = '*'\neventhub_connection_string = '*'\n\nfile_name = '*'\ninput_data = '*.csv'\nseparator = '|'\n\nsimilarity_threshold = 0.65\nnode_messages_examples = 100\nbatch_size = 1024\npercentage_threshold = 0.7\n```\n\n3) Initialize eventhub logger\n```\ncorrelation_id = str(uuid.uuid3(uuid.NAMESPACE_DNS, user_email + bot_name))\nlogger = EventHubLogSender(application_name=application_name,\n                           user_email=user_email,\n                           bot_name=bot_name,\n                           file_name=file_name,\n                           correlation_id=correlation_id,\n                           connection_string=eventhub_connection_string,\n                           eventhub_name=eventhub_name)\n```\n4) Initialize Insight Extractor\n```\ninsight_extractor = InsightExtractor(input_data,\n                                     separator=separator,\n                                     similarity_threshold=similarity_threshold,\n                                     embedding_path=embedding_path,\n                                     postagging_model_path=postag_model_path,\n                                     postagging_label_path=postag_label_path,\n                                     ner_model_path=ner_model_path,\n                                     ner_label_path=ner_label_path,\n                                     user_email=user_email,\n                                     bot_name=bot_name,\n                                     logger=logger)\n```   \n5) Insight Extractor usage\n```\ninsight_extractor.predict(percentage_threshold=percentage_threshold,\n                          node_messages_examples=node_messages_examples,\n                          batch_size=batch_size)\n``` \n\n\n",
    "bugtrack_url": null,
    "license": "",
    "summary": "Insight Extractor Package",
    "version": "0.0.1",
    "split_keywords": [
        "insight",
        "extractor"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "5450b14338abdbde32c7bf472a830bff",
                "sha256": "c3bc65fc368e1bbe572ee70b6e730532e09e6d067506d2d0f5c1f5eb6303a1c7"
            },
            "downloads": -1,
            "filename": "insight_extractor_packaage-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5450b14338abdbde32c7bf472a830bff",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 31532,
            "upload_time": "2022-12-27T17:04:00",
            "upload_time_iso_8601": "2022-12-27T17:04:00.469949Z",
            "url": "https://files.pythonhosted.org/packages/1e/ad/0685ea5e987d8982de35deee4d48a455f7e6ad5d5ba443dd47dd40d6cc46/insight_extractor_packaage-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "e0e4106e29546413dc0f8a2c591fa67d",
                "sha256": "89eb4be7fbabf432613f729d7bcfa4f63a1e5bc2acf5f571b5f811deb3940605"
            },
            "downloads": -1,
            "filename": "insight-extractor-packaage-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "e0e4106e29546413dc0f8a2c591fa67d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 24104,
            "upload_time": "2022-12-27T17:04:02",
            "upload_time_iso_8601": "2022-12-27T17:04:02.435340Z",
            "url": "https://files.pythonhosted.org/packages/33/8d/65da0355b30dda53e41cf907675da7aed47e01b97e4becab586f5715d163/insight-extractor-packaage-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-12-27 17:04:02",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "insight-extractor-packaage"
}

Research and Innovation