musaddiquehussainlabs

Name	musaddiquehussainlabs
Version	0.0.2
Summary	MusaddiqueHussainLabs: Empowering text analytics with advanced tools for comprehensive Natural Language Processing (NLP) and Language Models (LLMs).
upload_time	2023-12-25 14:32:14
requires_python	>=3.9
keywords	nlp, llm, text preprocessing, document analysis, langchain, google gemini
requirements	No requirements were recorded.

# MusaddiqueHussainLabs NLP: State-of-the-Art Natural Language Processing & LLMs Library

MusaddiqueHussainLabs is a Python library for Natural Language Processing (NLP) that aims to offer state-of-the-art functionality. It provides tools for core NLP tasks, text preprocessing, and document analysis.

## Features

Currently the package is organized into three primary modules:

### 1. NLP Components

| Component Type | Description                 |
|----------------|-----------------------------|
| tokenize       | Text tokenization           |
| pos            | Part-of-Speech tagging      |
| lemma          | Word lemmatization          |
| morphology     | Morphological analysis      |
| dep            | Dependency parsing          |
| ner            | Named Entity Recognition    |
| norm           | Text normalization          |

### 2. Text Preprocessing

This module equips users with an extensive set of text preprocessing tools:

| Function                      | Description                                          |
|-------------------------------|------------------------------------------------------|
| to_lower                      | Convert text to lowercase                             |
| to_upper                      | Convert text to uppercase                             |
| remove_number                 | Remove numerical characters                           |
| remove_itemized_bullet_and_numbering | Eliminate itemized/bullet-point numbering |
| remove_url                    | Remove URLs from text                                 |
| remove_punctuation            | Remove punctuation marks                              |
| remove_special_character      | Remove special characters                             |
| keep_alpha_numeric            | Keep only alphanumeric characters                     |
| remove_whitespace             | Remove excess whitespace                              |
| normalize_unicode             | Normalize Unicode characters                          |
| remove_stopword               | Eliminate common stopwords                            |
| remove_freqwords              | Remove frequently occurring words                      |
| remove_rarewords              | Remove rare words                                     |
| remove_email                  | Remove email addresses                                |
| remove_phone_number           | Remove phone numbers                                  |
| remove_ssn                    | Remove Social Security Numbers (SSN)                  |
| remove_credit_card_number     | Remove credit card numbers                            |
| remove_emoji                  | Remove emojis                                         |
| remove_emoticons              | Remove emoticons                                      |
| convert_emoticons_to_words    | Convert emoticons to words                            |
| convert_emojis_to_words       | Convert emojis to words                               |
| remove_html                   | Remove HTML tags                                      |
| chat_words_conversion         | Convert chat language to standard English              |
| expand_contraction            | Expand contractions (e.g., "can't" to "cannot")        |
| tokenize_word                 | Tokenize words                                        |
| tokenize_sentence             | Tokenize sentences                                    |
| stem_word                     | Stem words                                            |
| lemmatize_word                | Lemmatize words                                       |
| preprocess_text               | Combine multiple preprocessing steps into one function|
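
To make the idea of combining steps concrete, here is a minimal, standard-library sketch of how a few of these operations might be chained. The functions below are hypothetical stand-ins that only share their names with entries in the table; they are not the package's implementations, and `apply_steps` is an illustrative name rather than the real `preprocess_text`.

```python
import re
from typing import Callable, List, Optional


# Hypothetical stand-ins for a few operations from the table above;
# the package's own versions may behave differently.
def to_lower(text: str) -> str:
    return text.lower()


def remove_url(text: str) -> str:
    return re.sub(r"(?:https?://|www\.)\S+", "", text)


def remove_number(text: str) -> str:
    return re.sub(r"\d+", "", text)


def remove_whitespace(text: str) -> str:
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())


def apply_steps(text: str, steps: Optional[List[Callable[[str], str]]] = None) -> str:
    """Apply each step in order, feeding the output of one into the next."""
    if steps is None:
        steps = [to_lower, remove_url, remove_number, remove_whitespace]
    for step in steps:
        text = step(text)
    return text


print(apply_steps("Visit https://example.com  for 3 FREE samples"))
# visit for free samples
```

Order matters in a chain like this: whitespace cleanup runs last here so that the gaps left by removed URLs and numbers are collapsed.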

### 3. Document Analysis

| Functionality     | Description                                  |
|-------------------|----------------------------------------------|
| Language          | Detect document language                     |
| Linguistic Analysis | Resolve ambiguities                        |
| Key phrases       | Retrieve relevant information from documents |
| NER               | Named Entity Recognition                     |
| Sentiment         | Analyze sentiment of text                    |
| PII Anonymization | Anonymize Personally Identifiable Information|
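
As a rough illustration of what PII anonymization means in practice, the sketch below masks US-style Social Security and phone numbers with regular expressions. This is an assumption-laden stand-in: the patterns, the placeholder tokens, and the `anonymize` function are invented for this example, and the package may take an entirely different approach.

```python
import re

# Illustrative patterns for US-style SSNs (3-2-4 digits) and
# phone numbers (3-3-4 digits) only; real PII detection needs far more.
PII_PATTERNS = [
    ("<SSN>", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("<PHONE>", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
]


def anonymize(text: str) -> str:
    """Replace every match of each pattern with its placeholder token."""
    for placeholder, pattern in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


sample = "The employee's SSN is 859-98-0987. The employee's phone number is 555-555-5555."
print(anonymize(sample))
# The employee's SSN is <SSN>. The employee's phone number is <PHONE>.
```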

## Prerequisites

- Python >= 3.9
- A `GOOGLE_API_KEY` from [Google AI Studio](https://makersuite.google.com)
- A `.env` file in the project root directory that contains the API key
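
A `.env` file is conventionally a list of `KEY=VALUE` lines, for example `GOOGLE_API_KEY=your-api-key`. How the package reads this file is not documented here, so the following is only a standard-library sketch for checking that the key is visible to your process; `load_env_file` is a hypothetical helper, not part of the package.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Variables that are already set in the environment are left untouched.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))


load_env_file()
if not os.environ.get("GOOGLE_API_KEY"):
    print("GOOGLE_API_KEY is not set; add it to .env or export it in your shell")
```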

## Installation

To install `musaddiquehussainlabs`, you can use `pip`:

```bash
pip install musaddiquehussainlabs
```

## Usage

```python
from musaddiquehussainlabs.nlp_components import nlp
from musaddiquehussainlabs.text_preprocessing import preprocess_text, preprocess_operations
from musaddiquehussainlabs.document_analysis import DocumentAnalysis

data_to_process = "The employee's SSN is 859-98-0987. The employee's phone number is 555-555-5555."

# Run a single NLP component (here: named entity recognition)
result = nlp.predict(component_type="ner", input_text=data_to_process)
print(result)

# Text preprocessing with the default set of operations
preprocessed_text = preprocess_text(data_to_process)
print(preprocessed_text)

# Custom text preprocessing: pass only the operations you want applied
preprocess_functions = [preprocess_operations.to_lower]
preprocessed_text = preprocess_text(data_to_process, preprocess_functions)
print(preprocessed_text)

# Document analysis
document_analysis = DocumentAnalysis()

# Option 1: full analysis
full_result = document_analysis.full_analysis(data_to_process)
print(full_result)

# Option 2: a single analysis (here: PII anonymization)
pii_result = document_analysis.pii_anonymization(data_to_process)
print(pii_result)
```

Feel free to explore more functionalities and customize the usage based on your requirements!

For detailed usage examples and API documentation, please refer to the [documentation](link_to_your_documentation) (docs link coming soon).

## Upcoming Features

We're continuously working on expanding MusaddiqueHussainLabs to provide more capabilities for NLP tasks. Stay tuned for upcoming enhancements!

## License

This project is licensed under the [MIT License](LICENSE).

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "musaddiquehussainlabs",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": "MusaddiqueHussain Jahagirdar <musaddiquehussainlabs@gmail.com>",
    "keywords": "nlp,llm,text preprocessing,document analysis,langchain,google gemini",
    "author": "",
    "author_email": "MusaddiqueHussain Jahagirdar <musaddiquehussainlabs@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/b1/2f/a94d600201cef13a1742462fbc55227067b3aa4b8ad7038da470d57a430f/musaddiquehussainlabs-0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "MusaddiqueHussainLabs: Empowering text analytics with advanced tools for comprehensive Natural Language Processing (NLP) and Language Models (LLMs).",
    "version": "0.0.2",
    "project_urls": {
        "Changelog": "https://github.com/MusaddiqueHussainLabs/musaddiquehussainlabs_package/blob/main/CHANGELOG.md",
        "Documentation": "https://musaddiquehussainlabs.github.io/musaddiquehussainlabs_package/",
        "Homepage": "https://demo-musaddiquehussainlabs.streamlit.app/",
        "Issues": "https://github.com/MusaddiqueHussainLabs/musaddiquehussainlabs_package/issues",
        "Repository": "https://github.com/MusaddiqueHussainLabs/musaddiquehussainlabs_package.git"
    },
    "split_keywords": [
        "nlp",
        "llm",
        "text preprocessing",
        "document analysis",
        "langchain",
        "google gemini"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "aa0d159d6f9afec86da036ee29ec73c6c63df66de073f3ba3b45be053feecc41",
                "md5": "94f170f6771c0d6af3312cac51a4edcb",
                "sha256": "9d466aab81f29c664202bbfd6cb1f5f7c6219620a8679f5db316d164c1b200e6"
            },
            "downloads": -1,
            "filename": "musaddiquehussainlabs-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "94f170f6771c0d6af3312cac51a4edcb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 50158,
            "upload_time": "2023-12-25T14:32:12",
            "upload_time_iso_8601": "2023-12-25T14:32:12.031678Z",
            "url": "https://files.pythonhosted.org/packages/aa/0d/159d6f9afec86da036ee29ec73c6c63df66de073f3ba3b45be053feecc41/musaddiquehussainlabs-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b12fa94d600201cef13a1742462fbc55227067b3aa4b8ad7038da470d57a430f",
                "md5": "faca7bd170f141ae366aaf5b92c36e2b",
                "sha256": "2c1554d68a792fa9d929b4f32c1930927c087bb6a5d9cd98c8b84ae81f2225c6"
            },
            "downloads": -1,
            "filename": "musaddiquehussainlabs-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "faca7bd170f141ae366aaf5b92c36e2b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 47724,
            "upload_time": "2023-12-25T14:32:14",
            "upload_time_iso_8601": "2023-12-25T14:32:14.217374Z",
            "url": "https://files.pythonhosted.org/packages/b1/2f/a94d600201cef13a1742462fbc55227067b3aa4b8ad7038da470d57a430f/musaddiquehussainlabs-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-25 14:32:14",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "MusaddiqueHussainLabs",
    "github_project": "musaddiquehussainlabs_package",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "musaddiquehussainlabs"
}