pyAutoSummarizer


- Name: pyAutoSummarizer
- Version: 1.1.8
- Home page: https://github.com/Valdecy/pyAutoSummarizer
- Summary: An Extractive and Abstractive Summarization Library Powered with Artificial Intelligence
- Upload time: 2023-12-03 18:35:25
- Author: Valdecy Pereira
- License: GNU
- Requirements: No requirements were recorded.

# pyAutoSummarizer

pyAutoSummarizer - An Extractive and Abstractive Summarization Library Powered with Artificial Intelligence. 

## Introduction

pyAutoSummarizer is a Python library for text summarization, a core task in NLP (Natural Language Processing). It implements both extractive and abstractive summarization algorithms. Extractive algorithms identify key sentences or phrases in the original text and assemble them into the summary. Among the techniques it implements are **TextRank**, **LexRank**, **LSA** (Latent Semantic Analysis), and **KL-Sum**. On the deep learning side, pyAutoSummarizer incorporates **BART** (Bidirectional and Auto-Regressive Transformers) and **T5** (Text-to-Text Transfer Transformer), a model known for its versatility across language tasks, including summarization. pyAutoSummarizer also uses **PEGASUS** (Pre-training with Extracted Gap-sentences for Abstractive Summarization) and OpenAI's GPT (Generative Pre-trained Transformer), specifically the **chatGPT** model, for abstractive summarization. Unlike extractive techniques, abstractive summarization generates new sentences, so the summary preserves the essence of the original text without necessarily reusing its exact wording.
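
To make the extractive idea concrete, here is a minimal, self-contained sketch of TextRank-style sentence ranking. It is not pyAutoSummarizer's implementation or API: the function names (`split_sentences`, `similarity`, `text_rank`, `summarize`) are hypothetical, and the word-overlap similarity, damping factor, and fixed iteration count are simplifying assumptions.

```python
import math
import re


def split_sentences(text):
    """Split text on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word-overlap similarity, normalized by the log of the sentence sizes."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # avoids a zero denominator for one-word sentences
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def text_rank(sentences, damping=0.85, iterations=50):
    """Score sentences with a PageRank-style iteration over a similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    weights = [
        [0.0 if i == j else similarity(sentences[i], sentences[j]) for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor j passes on a share of its score proportional to the edge weight.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def summarize(text, n_sentences=2):
    """Return the top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = text_rank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return " ".join(sentences[i] for i in sorted(top[:n_sentences]))


text = (
    "The cat sat on the mat. The cat chased the mouse on the mat. "
    "Stock prices fell sharply today. The mouse hid under the mat from the cat."
)
print(summarize(text, n_sentences=2))
```

Sentences that share vocabulary with many others accumulate score, while an unrelated sentence (the one about stock prices here) only receives the baseline `(1 - damping) / n`.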

pyAutoSummarizer also provides preprocessing features that prepare text for summarization. For text normalization, it can convert text to **lowercase**, **remove accents**, **remove special characters**, and **remove numbers**, all of which reduce noise in the text. It can also **remove custom words**, so users can tailor preprocessing to their needs. pyAutoSummarizer supports **stopwords** removal across various languages, including Arabic, Bengali, Bulgarian, Chinese, Czech, English, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Italian, Japanese, Korean, Marathi, Persian, Polish, Portuguese-br, Romanian, Russian, Slovak, Spanish, Swedish, Thai, and Ukrainian. Sentence segmentation is flexible: sentences can be split by **punctuation**, **character count**, or **word count**.
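
The preprocessing steps listed above can be sketched with the Python standard library alone. This is an illustration of the general techniques, not the library's own code: the function and parameter names (`strip_accents`, `clean_text`, `split_by_punctuation`, `split_by_word_count`) and the tiny stopword set are hypothetical.

```python
import re
import unicodedata

# Tiny illustrative stopword set; real lists are language-specific and much longer.
STOPWORDS_EN = {"the", "a", "an", "of", "and", "to", "in", "is", "it"}


def strip_accents(text):
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_text(text, stopwords=STOPWORDS_EN, custom_words=()):
    """Lowercase, strip accents, numbers and special characters, then drop words."""
    text = text.lower()
    text = strip_accents(text)
    text = re.sub(r"\d+", " ", text)      # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)  # remove special characters
    drop = set(stopwords) | {w.lower() for w in custom_words}
    return " ".join(w for w in text.split() if w not in drop)


def split_by_punctuation(text):
    """Split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def split_by_word_count(text, n_words):
    """Split text into consecutive chunks of at most n_words words."""
    words = text.split()
    return [" ".join(words[i:i + n_words]) for i in range(0, len(words), n_words)]


print(clean_text("The Café serves 3 crêpes, and it is GREAT!"))
print(split_by_punctuation("Hello world. How are you? Fine!"))
print(split_by_word_count("a b c d e", 2))
```

Splitting by character count follows the same pattern as `split_by_word_count`, slicing the string instead of the word list.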

To evaluate the quality of generated summaries, pyAutoSummarizer integrates metrics such as **Rouge-N**, **Rouge-L**, and **Rouge-S**, which measure the overlap of n-grams, the longest common subsequence, and skip-bigrams, respectively, between the generated summary and a reference summary. It also provides **BLEU** (Bilingual Evaluation Understudy) and **METEOR** (Metric for Evaluation of Translation with Explicit ORdering).
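
The three Rouge variants differ only in what they count as a match. The sketch below computes each one from scratch, assuming lowercase whitespace tokenization with no stemming. The function names (`rouge_n`, `rouge_l`, `rouge_s`) are hypothetical and are not the library's API, and published Rouge implementations may differ in tokenization and weighting details.

```python
from collections import Counter
from itertools import combinations


def _prf(overlap, cand_total, ref_total):
    """Turn an overlap count into (precision, recall, f1)."""
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """Clipped n-gram overlap between candidate and reference."""
    def grams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = grams(candidate), grams(reference)
    overlap = sum((cand & ref).values())
    return _prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Longest-common-subsequence overlap."""
    c, r = candidate.lower().split(), reference.lower().split()
    return _prf(lcs_length(c, r), len(c), len(r))


def rouge_s(candidate, reference):
    """Skip-bigram overlap: any ordered pair of words, with gaps allowed."""
    cand = Counter(combinations(candidate.lower().split(), 2))
    ref = Counter(combinations(reference.lower().split(), 2))
    overlap = sum((cand & ref).values())
    return _prf(overlap, sum(cand.values()), sum(ref.values()))


generated = "the cat sat on the mat"
reference = "the cat is on the mat"
print("Rouge-1:", rouge_n(generated, reference, n=1))
print("Rouge-2:", rouge_n(generated, reference, n=2))
print("Rouge-L:", rouge_l(generated, reference))
print("Rouge-S:", rouge_s(generated, reference))
```

For this pair, 5 of 6 unigrams and 3 of 5 bigrams match, the longest common subsequence has 5 words, and 10 of 15 skip-bigrams match.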

## Usage

1. Install
```bash
pip install pyAutoSummarizer
```

2. Try it in **Colab**:

Extractive Summarization
- Example 01: TextRank             ([ Colab Demo ](https://colab.research.google.com/drive/1m7mF4R7s6hakuVhrwymrgqNNJpTySUM4?usp=sharing#scrollTo=npuyBY596tJ5))
- Example 02: LexRank              ([ Colab Demo ](https://colab.research.google.com/drive/1gT9fV7hAE4mvwAHbfzolF6TN3TjGgJOF?usp=sharing#scrollTo=npuyBY596tJ5))
- Example 03: LSA                  ([ Colab Demo ](https://colab.research.google.com/drive/19fUslzp43_Owib9YDCb0Xfe9XZm1OKmB?usp=sharing#scrollTo=npuyBY596tJ5))
- Example 04: KL-Sum               ([ Colab Demo ](https://colab.research.google.com/drive/19zHjE0nR1GcAWi4NQmaJh1gjpqm4sqjP?usp=sharing#scrollTo=npuyBY596tJ5))
- Example 05: BART (Deep Learning) ([ Colab Demo ](https://colab.research.google.com/drive/1sAYBDQFxwlA16nBUozgE28_xZlNzUCg-?usp=sharing))
- Example 06: T5 (Deep Learning)   ([ Colab Demo ](https://colab.research.google.com/drive/1tyWu-19xA9QMrwl_kPcGJH0ZSS3r_rDZ?usp=sharing#scrollTo=npuyBY596tJ5))

Abstractive Summarization
- Example 01: chatGPT (Deep Learning) ([ Colab Demo ](https://colab.research.google.com/drive/1ipl6ZnyumJeuxsYelcmZEdsXDMIuM5WG?usp=sharing#scrollTo=npuyBY596tJ5)). Requires an OpenAI **API key** (https://platform.openai.com/account/api-keys)
- Example 02: PEGASUS (Deep Learning) ([ Colab Demo ](https://colab.research.google.com/drive/1RWIEm9WoZBPYA_p4A1LqKnFPaXhNsQcM?usp=sharing))

## Others

- [pyBibX](https://github.com/Valdecy/pyBibX) - A Bibliometric and Scientometric Python Library Powered with Artificial Intelligence Tools

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/Valdecy/pyAutoSummarizer",
    "name": "pyAutoSummarizer",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Valdecy Pereira",
    "author_email": "valdecy.pereira@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/60/60/c2649940805a774ffbf7dfda871d697f03f97072bd215d2cbdcffe643f76/pyAutoSummarizer-1.1.8.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GNU",
    "summary": "An Extractive and Abstractive Summarization Library Powered with Artificial Intelligence",
    "version": "1.1.8",
    "project_urls": {
        "Homepage": "https://github.com/Valdecy/pyAutoSummarizer"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "538fd5bdef951867010dc5deb457c030affd286d1c53d5f821dab691e9ff7c4f",
                "md5": "5b0c711617849e1a6a93995d82915fdd",
                "sha256": "f8b4424da6bcb7da177b8d89187d4859e268df631b6b282c5dcf137268f75501"
            },
            "downloads": -1,
            "filename": "pyAutoSummarizer-1.1.8-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5b0c711617849e1a6a93995d82915fdd",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 50612,
            "upload_time": "2023-12-03T18:35:23",
            "upload_time_iso_8601": "2023-12-03T18:35:23.835695Z",
            "url": "https://files.pythonhosted.org/packages/53/8f/d5bdef951867010dc5deb457c030affd286d1c53d5f821dab691e9ff7c4f/pyAutoSummarizer-1.1.8-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6060c2649940805a774ffbf7dfda871d697f03f97072bd215d2cbdcffe643f76",
                "md5": "67feebe2292dcf3c4b2fe1f7dfe1da00",
                "sha256": "b88e6878fd084659d1e1ffd437efe5fe31eb39cf761a92e78add737c0c40c781"
            },
            "downloads": -1,
            "filename": "pyAutoSummarizer-1.1.8.tar.gz",
            "has_sig": false,
            "md5_digest": "67feebe2292dcf3c4b2fe1f7dfe1da00",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 50364,
            "upload_time": "2023-12-03T18:35:25",
            "upload_time_iso_8601": "2023-12-03T18:35:25.628977Z",
            "url": "https://files.pythonhosted.org/packages/60/60/c2649940805a774ffbf7dfda871d697f03f97072bd215d2cbdcffe643f76/pyAutoSummarizer-1.1.8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-03 18:35:25",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Valdecy",
    "github_project": "pyAutoSummarizer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "pyautosummarizer"
}
        