Name: Bani  
Version: 0.6.3  
Home page: https://github.com/captanlevi/FAQ-QnA-matching.git  
Upload time: 2020-12-27 12:55:01  
Author: Rushi Babaria  
Requires Python: >=3.6  

# Bani
This package aims to provide an easy way to set up a question answering system,
taking as input just raw text question-answer pairs. The principle used is question similarity, i.e. the question most similar to a
given query is found and the answer corresponding to that question is returned. For this purpose a KNN algorithm is used, and Batch Hard Triplet
Loss is used to train a sentence transformer model.
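
To make the retrieval idea concrete, here is a minimal sketch of a similarity-based KNN lookup. It is not Bani's implementation: the toy 2-d vectors, the `cosine` and `knn_answer` helpers, and the vote that sums similarities per label are all assumptions made for illustration. In Bani the vectors come from the sentence transformer model.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_answer(query_vec: List[float],
               indexed: List[Tuple[List[float], int]],
               k: int = 3) -> Tuple[int, float]:
    """indexed holds (question vector, answer label) pairs.

    Returns the label with the largest summed similarity among the
    k nearest questions, together with that summed similarity.
    """
    scored = sorted(((cosine(query_vec, vec), label) for vec, label in indexed),
                    reverse=True)
    votes: Dict[int, float] = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    best = max(votes, key=votes.get)
    return best, votes[best]


# Toy index: two questions (original plus a paraphrase) per answer label.
indexed = [
    ([1.0, 0.0], 0),
    ([0.9, 0.1], 0),
    ([0.0, 1.0], 1),
    ([0.1, 0.9], 1),
]
answers = {0: "Use the reset link on the login page.",
           1: "Delivery takes 3 to 5 days."}

label, score = knn_answer([0.8, 0.2], indexed, k=3)
print(answers[label])
```

Voting over several neighbours, rather than taking only the single closest question, makes the result less sensitive to one accidental near match.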

## Installation 
#### Install with pip
```
pip install Bani
python -m spacy download en_core_web_md
```
This will install all the necessary packages, including the correct versions of sentence transformers and transformers.
#### Copy the source code
Clone or download the source and then run:
```
python -m spacy download en_core_web_md
cd Bani ; pip install -r requirements
```


### Getting Started
See the [tutorial](https://github.com/captanlevi/Bani/blob/master/Tutorial.ipynb) notebook for a quick introduction to the usage of the package.

### Docs

#### FAQ
```
class FAQ (self,name : str,questions : List[str] = None, answers : List[str] = None)
```
All user-supplied FAQs are stored in the FAQ class. The class also runs sanity checks on the FAQs and provides an interface to
generate questions and assign vectors.

##### Parameters
     name : The name of the FAQ; all FAQs must have unique names.  
     questions : list of questions, or None.  
     answers : list of corresponding answers, or None.  
    (If questions is None, answers must also be None and the FAQ will be empty; you can then load a previously saved FAQ into this empty FAQ.)

##### Methods
     getAnswerWithLabel(self, label : int) -> Answer  
     getQuestionWithLabel(self, label : int) -> Question  
     buildFAQ(self, generator : GenerateManager, model = None) : Generates questions using the given generator; if a model is  
                                                             also provided, it assigns vectors to the questions as well.  
     isEmpty(self) -> bool : Returns True if the FAQ is empty.  
     isUsable(self) -> bool : Returns True if buildFAQ has been called and questions have been generated.  
     hasVectorsAssigned(self) -> bool : Returns True if all the questions have vectors assigned.  
     load(self, rootDirPath) -> None : Loads the FAQ named self.name from the root directory.  
     save(self, rootDirPath) -> None : Saves the current object (self) as (self.name).pkl in the root directory.  
     resetAssignedVectors -> None : Resets all the FAQ's assigned vectors to None.  
     resetFAQ -> None : Resets the FAQ to an empty FAQ.  
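
The save/load pair above names the file after the FAQ. The following is a minimal sketch of that naming convention using `pickle`. `TinyFAQ` is a hypothetical stand-in, not Bani's `FAQ` class, and the assumption that `load` copies the saved state into the current object is inferred from the method descriptions rather than from the source code.

```python
import os
import pickle
import tempfile
from typing import List, Optional


class TinyFAQ:
    """Hypothetical stand-in for illustration; not Bani's FAQ class."""

    def __init__(self, name: str,
                 questions: Optional[List[str]] = None,
                 answers: Optional[List[str]] = None):
        # Mirror the documented rule: both lists are given, or neither is.
        if (questions is None) != (answers is None):
            raise ValueError("questions and answers must both be given or both be None")
        self.name = name
        self.questions = questions or []
        self.answers = answers or []

    def isEmpty(self) -> bool:
        return not self.questions

    def save(self, rootDirPath: str) -> None:
        # Stored as <name>.pkl inside the root directory.
        with open(os.path.join(rootDirPath, self.name + ".pkl"), "wb") as f:
            pickle.dump(self.__dict__, f)

    def load(self, rootDirPath: str) -> None:
        # Finds the file by self.name and restores the saved state.
        with open(os.path.join(rootDirPath, self.name + ".pkl"), "rb") as f:
            self.__dict__.update(pickle.load(f))


with tempfile.TemporaryDirectory() as root:
    TinyFAQ("billing", ["How do I pay?"], ["By card."]).save(root)
    restored = TinyFAQ("billing")  # empty FAQ with the same name
    restored.load(root)

print(restored.isEmpty(), restored.questions)
```

Because the file name is derived from the FAQ name, two FAQs with the same name would overwrite each other in the same directory, which is one practical reason names must be unique.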

#### GenerateManager  
```
class GenerateManager (self , producers : List[Any], names : List[str] = None, nums : List[int] = None)
```
The GenerateManager is the interface where users register their own sentence producers. The class takes care of
how the producers are run (multiprocessing, multithreading, or a single process).

##### Parameters  
     producers : list of producers (a producer is an instance of any class that implements either a batch_generate or an exact_batch_generate method).  
     names : list of producer names; each producer must have a unique name.  
     nums : list of numbers; each number is the maximum number of questions to generate from the corresponding producer.  

##### Methods  
     addProducer(self, producer, name : str, toGenerate : int) : Adds a producer; the name must differ from those already registered.  
     producerList(self) -> Tuple[List[str],List[int],List[Any]] : Returns the names, nums and producers that are registered.  
     removeProducer(self, name) -> None : Removes a producer from the GenerateManager.  


#### Bani

```
class Bani(self,FAQs : List[FAQ], modelPath : str = None, assignVectors : bool = True):
```
The class that acts as the chatbot. It registers any number of FAQs, trains a model on them, and then answers questions over these FAQs.

##### Parameters  
     FAQs : list of instances of the FAQ class (each FAQ is given a unique id).  
     modelPath : The path to a pretrained model, or the name of any sentence transformers model; if None, the roberta model is pulled.  
     assignVectors : Whether to assign vectors with the new model. If True, every question in all FAQs is passed through the current model and new  
                     vectors are assigned; if False, all the FAQs should already have vectors assigned.  


##### Methods  
     train(self, outputPath : str, batchSize = 16, epochs : int = 1, **kwargs) : Trains the model; after training, the new model is loaded and  
                                                                               the FAQ vectors are reassigned using this model.  

     saveFAQs(self, rootDirPath : str) : Saves the FAQs, with vectors assigned, to rootDirPath, so that the next time you load these FAQs you can set  
                                         assignVectors to False (just to save time).  

     getFAQWithId(self, id : int) -> FAQ : Returns the FAQ with the given id; indexing starts from 0.  


     findClosestFromFAQ(self, faqId : int, query : str, K : int = 3, topSimilar : int = 5) -> FAQOutput : Takes a user query, runs the KNN algorithm over it  
                                         with K neighbours, and returns a FAQOutput object with the topSimilar closest questions. The query is processed only  
                                         over the FAQ with id faqId.  
     findClosest(self, query : str, K : int = 3, topSimilar : int = 5) -> List[FAQOutput] : The same as findClosestFromFAQ, but the query is run over all the  
                                        FAQs and the result is a list of FAQOutputs whose length equals the number of FAQs.  

     test(self, faqId : int, testData : List[Tuple[str,str]], K : int = 3) -> float : Interface to test any given FAQ. Expects a list of 2-tuples whose  
                                        first element is the original question and whose second is the paraphrased version. All the original questions should ideally match  
                                        questions in the FAQ; if not, you will be warned about it.  


#### FAQOutput
The user gets a FAQOutput, or a list of FAQOutputs, as the output of any query. It contains:

     answer : Answer               : The actual answer.
     question : Question           : The question that is being answered (a generated question may have matched, but only the original question is given here).
     faqName : str                 : Name of the FAQ the answer is from.
     faqId : int                   : Id of the FAQ with respect to the Bani object.
     score : float                 : Combined KNN score.
     similarQuestions : List[str]  : Questions from that FAQ that are similar to the query.
     maxScore : float              : The similarity score of the question most similar to the query.
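
Since a query over all FAQs yields one output per FAQ, a caller usually has to choose among them. The sketch below shows one possible way to do that. `FAQOutputSketch` is a hypothetical dataclass that only mirrors the field names listed above (with plain strings in place of `Answer` and `Question`), and `best_output` with its `min_score` threshold is an assumption, since the scale of `score` is not documented here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FAQOutputSketch:
    """Hypothetical mirror of the documented fields, for illustration only."""
    answer: str
    question: str
    faqName: str
    faqId: int
    score: float
    similarQuestions: List[str] = field(default_factory=list)
    maxScore: float = 0.0


def best_output(outputs: List[FAQOutputSketch],
                min_score: float = 0.0) -> Optional[FAQOutputSketch]:
    """Return the highest-scoring output, or None if nothing reaches min_score."""
    best = max(outputs, key=lambda out: out.score, default=None)
    if best is None or best.score < min_score:
        return None
    return best


# One output per registered FAQ, as a query over all FAQs is described to return.
outputs = [
    FAQOutputSketch("Use the reset link.", "How do I reset my password?", "accounts", 0, 2.4),
    FAQOutputSketch("3 to 5 days.", "How long is delivery?", "shipping", 1, 0.7),
]

best = best_output(outputs, min_score=1.0)
print(best.faqName if best else "no confident match")
```

Returning `None` below a threshold lets the chatbot fall back to a default reply instead of answering from an unrelated FAQ.
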
## Adding your own producers (sentence_generator)
The quality of the FAQ is directly related to the quality of the questions produced. Bani comes with a default
question generation pipeline, but also gives full freedom to customize it or add your own **producers**.
A producer is an instance of any class that implements either a batch_generate or an exact_batch_generate method.
```
from typing import Dict, List


class MyProducer1:
    def __init__(self):
        pass

    def batch_generate(self, questions : List[str]) -> Dict[str, List[str]]:
        """
        Takes a list of questions and returns a dict with each question
        mapped to the list of generated questions.
        """

        resultDict = dict()
        for question in questions:
            # Placeholder output; replace with real paraphrases of `question`.
            resultDict[question] = ["generated1", "generated2", "and so on"]

        return resultDict
```

Objects that implement exact_batch_generate produce at most **n** questions for each input question, where n is the `num` argument.

```
from typing import Dict, List


class MyProducer2:
    def __init__(self):
        pass

    def exact_batch_generate(self, questions : List[str], num : int) -> Dict[str, List[str]]:
        """
        Takes a list of questions and returns a dict with each question
        mapped to the list of generated questions; at most num questions
        are generated for each question.
        """

        resultDict = dict()
        for question in questions:
            # Placeholder output, truncated so that at most num items are returned.
            resultDict[question] = ["generated1", "generated2", "and so on"][:num]

        return resultDict
```
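
As a slightly more concrete illustration of the producer interface, here is a toy producer that returns real strings instead of placeholders. `PrefixProducer` and its prefix list are hypothetical and deliberately naive. A useful producer would rely on a paraphrasing model or other language resources, but the shape of the input and output is the same.

```python
from typing import Dict, List


class PrefixProducer:
    """Toy producer: rephrases a question by prepending fixed prefixes."""

    PREFIXES = ["Could you tell me", "I would like to know",
                "Please explain", "Do you know"]

    def exact_batch_generate(self, questions: List[str], num: int) -> Dict[str, List[str]]:
        result: Dict[str, List[str]] = {}
        for question in questions:
            core = question.rstrip("?").strip()
            # Lower-case the first letter so the sentence continues after the prefix.
            core = core[:1].lower() + core[1:]
            variants = [f"{prefix} {core}?" for prefix in self.PREFIXES]
            # Never return more than num variants for a question.
            result[question] = variants[:max(num, 0)]
        return result


generated = PrefixProducer().exact_batch_generate(["How do I reset my password?"], num=2)
print(generated)
```

The keys of the returned dict are the original questions, so the caller can attach every generated variant to the answer of the question it came from.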

Each producer is registered in a GenerateManager, together with its name and the maximum number of questions to generate from
it.

```
from Bani.core.generation import GenerateManager

names = ["myProducer1_name", "myProducer2_name"]
toGenerate = [3,5] # Generate at most 3 from the first producer and 5 from the second
producers = [MyProducer1(), MyProducer2()]

myGenerateManager = GenerateManager(producers = producers , names = names , nums = toGenerate)

# Or you can register producers one by one
# (myProducer3 is any other producer instance you have created)

myGenerateManager.addProducer(producer = myProducer3, name = "myProducer3Name", toGenerate = 5)
```
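
To show what the per-producer limit means in practice, the sketch below applies such a cap by hand. It makes no claim about how GenerateManager works internally: `run_producers`, `Echo` and `Capped` are hypothetical names, and the rule "call exact_batch_generate when available, otherwise truncate the batch_generate output" is an assumption.

```python
from typing import Any, Dict, List


class Echo:
    """Toy producer with batch_generate: always returns four variants."""

    def batch_generate(self, questions: List[str]) -> Dict[str, List[str]]:
        return {q: [f"{q} (v{i})" for i in range(4)] for q in questions}


class Capped:
    """Toy producer with exact_batch_generate: honours num itself."""

    def exact_batch_generate(self, questions: List[str], num: int) -> Dict[str, List[str]]:
        return {q: [f"{q} (exact {i})" for i in range(num)] for q in questions}


def run_producers(producers: List[Any], nums: List[int],
                  questions: List[str]) -> Dict[str, List[str]]:
    """Collect generated questions from every producer, at most nums[i] per question."""
    merged: Dict[str, List[str]] = {q: [] for q in questions}
    for producer, num in zip(producers, nums):
        if hasattr(producer, "exact_batch_generate"):
            out = producer.exact_batch_generate(questions, num)
        else:
            out = producer.batch_generate(questions)
        for q in questions:
            # Truncate in both cases so the cap holds even for batch_generate.
            merged[q].extend(out.get(q, [])[:num])
    return merged


result = run_producers([Echo(), Capped()], [3, 5], ["Where is my order?"])
print(len(result["Where is my order?"]))
```

With limits of 3 and 5, the first producer contributes three of its four variants and the second contributes five.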







            
