# MusaddiqueHussainLabs NLP: State-of-the-Art Natural Language Processing & LLMs Library
MusaddiqueHussainLabs is a Python library for Natural Language Processing (NLP) that aims to provide state-of-the-art functionality across a range of tasks. It offers tools for core NLP components, text preprocessing, and document analysis.
## Features
The package is currently organized into three modules:
### 1. NLP Components
| Component Type | Description |
|----------------|-----------------------------|
| tokenize | Text tokenization |
| pos | Part-of-Speech tagging |
| lemma | Word lemmatization |
| morphology | Morphological analysis (word forms) |
| dep | Dependency parsing |
| ner | Named Entity Recognition |
| norm | Text normalization |
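
Each component in the table is selected by passing its name as `component_type` to `nlp.predict`, as in the Usage section. The sketch below runs several components over one text and rejects unknown names early. The helper `run_components` is hypothetical, and `predict` is passed in as a parameter (so it can be `nlp.predict` or a stand-in); the only thing assumed about it is that it accepts the `component_type` and `input_text` keywords shown in Usage.

```python
from typing import Any, Callable, Dict, Iterable

# Component names taken from the NLP Components table.
COMPONENT_TYPES = ("tokenize", "pos", "lemma", "morphology", "dep", "ner", "norm")


def run_components(
    predict: Callable[..., Any],
    input_text: str,
    component_types: Iterable[str] = COMPONENT_TYPES,
) -> Dict[str, Any]:
    """Call `predict` once per component type and collect the results by name.

    Hypothetical helper: `predict` is assumed to accept the `component_type`
    and `input_text` keywords used with `nlp.predict` in the Usage section.
    """
    results: Dict[str, Any] = {}
    for component_type in component_types:
        # Fail fast on names that are not in the table.
        if component_type not in COMPONENT_TYPES:
            raise ValueError(f"Unknown component type: {component_type!r}")
        results[component_type] = predict(
            component_type=component_type, input_text=input_text
        )
    return results
```

With the package installed and the API key configured, `run_components(nlp.predict, text, ["ner", "pos"])` would return a dictionary keyed by component name.
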
### 2. Text Preprocessing
This module equips users with an extensive set of text preprocessing tools:
| Function | Description |
|-------------------------------|------------------------------------------------------|
| to_lower | Convert text to lowercase |
| to_upper | Convert text to uppercase |
| remove_number | Remove numerical characters |
| remove_itemized_bullet_and_numbering | Remove bullet points and itemized numbering |
| remove_url | Remove URLs from text |
| remove_punctuation | Remove punctuation marks |
| remove_special_character | Remove special characters |
| keep_alpha_numeric | Keep only alphanumeric characters |
| remove_whitespace | Remove excess whitespace |
| normalize_unicode | Normalize Unicode characters |
| remove_stopword | Eliminate common stopwords |
| remove_freqwords | Remove frequently occurring words |
| remove_rarewords | Remove rare words |
| remove_email | Remove email addresses |
| remove_phone_number | Remove phone numbers |
| remove_ssn | Remove Social Security Numbers (SSN) |
| remove_credit_card_number | Remove credit card numbers |
| remove_emoji | Remove emojis |
| remove_emoticons | Remove emoticons |
| convert_emoticons_to_words | Convert emoticons to words |
| convert_emojis_to_words | Convert emojis to words |
| remove_html | Remove HTML tags |
| chat_words_conversion | Convert chat language to standard English |
| expand_contraction | Expand contractions (e.g., "can't" to "cannot") |
| tokenize_word | Tokenize words |
| tokenize_sentence | Tokenize sentences |
| stem_word | Stem words |
| lemmatize_word | Lemmatize words |
| preprocess_text | Apply multiple preprocessing steps in a single call |
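
`preprocess_text` accepts an optional list of operations (see the Usage section), which suggests each operation is a text-to-text function applied in order. The sketch below illustrates that pipeline pattern with simplified, regex-based stand-ins for a few operations from the table. These are written for illustration only and are not the package's implementations; their exact behavior (for example, which URL or email forms they match) is an assumption.

```python
import re
from typing import Callable, Iterable

TextOp = Callable[[str], str]


def to_lower(text: str) -> str:
    return text.lower()


def remove_url(text: str) -> str:
    # Drop http(s):// and www. links up to the next whitespace.
    return re.sub(r"(?:https?://|www\.)\S+", "", text)


def remove_email(text: str) -> str:
    # Rough local-part@domain.tld match; not a full RFC 5322 parser.
    return re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b", "", text)


def remove_whitespace(text: str) -> str:
    # Collapse runs of whitespace to a single space and trim the ends.
    return " ".join(text.split())


def preprocess(text: str, operations: Iterable[TextOp]) -> str:
    """Apply each operation in order, feeding each output into the next."""
    for operation in operations:
        text = operation(text)
    return text


ops = [to_lower, remove_url, remove_email, remove_whitespace]
print(preprocess("Mail ME at a.b@example.com or see https://example.com  now", ops))
# prints: mail me at or see now
```

Order matters: running `remove_whitespace` last tidies the gaps left behind by the earlier removals.
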
### 3. Document Analysis
| Functionality | Description |
|-------------------|----------------------------------------------|
| Language | Detect document language |
| Linguistic Analysis | Resolve ambiguities |
| Key phrases | Extract key phrases that capture the relevant information in a document |
| NER | Named Entity Recognition |
| Sentiment | Analyze sentiment of text |
| PII Anonymization | Anonymize Personally Identifiable Information|
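
The package's keywords (LangChain, Google Gemini) and its API-key prerequisite suggest that document analysis is LLM-backed. For comparison, the sketch below shows a simple rule-based approach to the PII Anonymization row: it masks US-style SSNs and phone numbers with placeholder tags using regular expressions. The patterns and the `[SSN]`/`[PHONE]` tags are assumptions chosen for illustration; they miss many real-world formats and are not a description of how `DocumentAnalysis.pii_anonymization` works.

```python
import re

# Illustrative patterns for US-style identifiers only (assumed formats).
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def anonymize(text: str) -> str:
    """Replace each matched identifier with a bracketed placeholder tag."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "The employee's SSN is 859-98-0987. The employee's phone number is 555-555-5555."
print(anonymize(sample))
# prints: The employee's SSN is [SSN]. The employee's phone number is [PHONE].
```
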
## Prerequisites
- Python >= 3.9
- A `GOOGLE_API_KEY` from [Google AI Studio](https://makersuite.google.com)
- The API key stored in a `.env` file in the project root directory (for example, a line `GOOGLE_API_KEY=your-api-key`)
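
This README does not document how the package reads `.env`. If you want to fail fast when the key is missing, a minimal stdlib-only check might look like the sketch below. The function name `load_google_api_key` is hypothetical, and its simple `KEY=VALUE` parsing is an assumption that ignores dotenv features such as `export` prefixes or multi-line values; the `python-dotenv` package is the usual full-featured option.

```python
import os
from pathlib import Path


def load_google_api_key(env_path: str = ".env") -> str:
    """Return GOOGLE_API_KEY from the environment, falling back to a .env file.

    Hypothetical helper: only plain KEY=VALUE lines are understood.
    """
    key = os.environ.get("GOOGLE_API_KEY")
    if key:
        return key
    path = Path(env_path)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blank lines, comments, and anything that is not KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            if name.strip() == "GOOGLE_API_KEY":
                key = value.strip().strip('"').strip("'")
                # Export it so code that reads the environment can see it.
                os.environ["GOOGLE_API_KEY"] = key
                return key
    raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env in the project root.")
```
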
## Installation
To install `musaddiquehussainlabs`, you can use `pip`:
```bash
pip install musaddiquehussainlabs
```
## Usage
```python
from musaddiquehussainlabs.nlp_components import nlp
from musaddiquehussainlabs.text_preprocessing import preprocess_text, preprocess_operations
from musaddiquehussainlabs.document_analysis import DocumentAnalysis

data_to_process = "The employee's SSN is 859-98-0987. The employee's phone number is 555-555-5555."

# NLP component: pass a component type from the NLP Components table (here, NER)
result = nlp.predict(component_type="ner", input_text=data_to_process)
print(result)

# Text preprocessing without an explicit list of operations
preprocessed_text = preprocess_text(data_to_process)
print(preprocessed_text)

# Custom text preprocessing: apply only the operations you list
preprocess_functions = [preprocess_operations.to_lower]
preprocessed_text = preprocess_text(data_to_process, preprocess_functions)
print(preprocessed_text)

# Document analysis
document_analysis = DocumentAnalysis()

# Option 1: run the full analysis
result = document_analysis.full_analysis(data_to_process)
print(result)

# Option 2: run a single analysis, e.g. PII anonymization
result = document_analysis.pii_anonymization(data_to_process)
print(result)
```
Explore the other functions and adapt these examples to your requirements.

For detailed usage examples and API documentation, see the [documentation](https://musaddiquehussainlabs.github.io/musaddiquehussainlabs_package/).
## Upcoming Features
We are continuing to expand MusaddiqueHussainLabs with more NLP capabilities. Stay tuned for future enhancements!
## License
This project is licensed under the [MIT License](LICENSE).
Raw data
{
"_id": null,
"home_page": "",
"name": "musaddiquehussainlabs",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": "MusaddiqueHussain Jahagirdar <musaddiquehussainlabs@gmail.com>",
"keywords": "nlp,llm,text preprocessing,document analysis,langchain,google gemini",
"author": "",
"author_email": "MusaddiqueHussain Jahagirdar <musaddiquehussainlabs@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/b1/2f/a94d600201cef13a1742462fbc55227067b3aa4b8ad7038da470d57a430f/musaddiquehussainlabs-0.0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "MusaddiqueHussainLabs: Empowering text analytics with advanced tools for comprehensive Natural Language Processing (NLP) and Language Models (LLMs).",
"version": "0.0.2",
"project_urls": {
"Changelog": "https://github.com/MusaddiqueHussainLabs/musaddiquehussainlabs_package/blob/main/CHANGELOG.md",
"Documentation": "https://musaddiquehussainlabs.github.io/musaddiquehussainlabs_package/",
"Homepage": "https://demo-musaddiquehussainlabs.streamlit.app/",
"Issues": "https://github.com/MusaddiqueHussainLabs/musaddiquehussainlabs_package/issues",
"Repository": "https://github.com/MusaddiqueHussainLabs/musaddiquehussainlabs_package.git"
},
"split_keywords": [
"nlp",
"llm",
"text preprocessing",
"document analysis",
"langchain",
"google gemini"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "aa0d159d6f9afec86da036ee29ec73c6c63df66de073f3ba3b45be053feecc41",
"md5": "94f170f6771c0d6af3312cac51a4edcb",
"sha256": "9d466aab81f29c664202bbfd6cb1f5f7c6219620a8679f5db316d164c1b200e6"
},
"downloads": -1,
"filename": "musaddiquehussainlabs-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "94f170f6771c0d6af3312cac51a4edcb",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 50158,
"upload_time": "2023-12-25T14:32:12",
"upload_time_iso_8601": "2023-12-25T14:32:12.031678Z",
"url": "https://files.pythonhosted.org/packages/aa/0d/159d6f9afec86da036ee29ec73c6c63df66de073f3ba3b45be053feecc41/musaddiquehussainlabs-0.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b12fa94d600201cef13a1742462fbc55227067b3aa4b8ad7038da470d57a430f",
"md5": "faca7bd170f141ae366aaf5b92c36e2b",
"sha256": "2c1554d68a792fa9d929b4f32c1930927c087bb6a5d9cd98c8b84ae81f2225c6"
},
"downloads": -1,
"filename": "musaddiquehussainlabs-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "faca7bd170f141ae366aaf5b92c36e2b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 47724,
"upload_time": "2023-12-25T14:32:14",
"upload_time_iso_8601": "2023-12-25T14:32:14.217374Z",
"url": "https://files.pythonhosted.org/packages/b1/2f/a94d600201cef13a1742462fbc55227067b3aa4b8ad7038da470d57a430f/musaddiquehussainlabs-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-12-25 14:32:14",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "MusaddiqueHussainLabs",
"github_project": "musaddiquehussainlabs_package",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "musaddiquehussainlabs"
}