lingo-nlp-toolkit


| Field           | Value |
| --------------- | ----- |
| Name            | lingo-nlp-toolkit |
| Version         | 2.3 |
| Home page       | https://github.com/irfanalidv/lingo-nlp-toolkit |
| Summary         | Advanced NLP Toolkit - Lightweight, Fast, and Transformer-Ready |
| Upload time     | 2025-08-15 08:42:51 |
| Maintainer      | None |
| Docs URL        | None |
| Author          | Md Irfan Ali |
| Requires Python | >=3.9 |
| License         | None |
| Keywords        | nlp, natural-language-processing, transformers, bert, machine-learning, ai, text-processing, sentiment-analysis, named-entity-recognition, text-classification, embeddings |
| Requirements    | torch, transformers, tokenizers, spacy, nltk, scikit-learn, numpy, pandas, scipy, tqdm, pyyaml, requests, beautifulsoup4, lxml, fastapi, uvicorn, pydantic, sentence-transformers |

# **Lingo: Advanced NLP Toolkit**

**Lightweight, Fast, and Transformer-Ready**

[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![PyPI version](https://badge.fury.io/py/lingo-nlp-toolkit.svg)](https://badge.fury.io/py/lingo-nlp-toolkit)
[![PyPI Downloads](https://static.pepy.tech/badge/lingo-nlp-toolkit)](https://pepy.tech/projects/lingo-nlp-toolkit)

## **Overview**

Lingo is a **Natural Language Processing (NLP) toolkit** for researchers, data scientists, and developers building language-powered applications. It combines **ease of use**, **speed**, and **transformer support** in a single pipeline that runs from text preprocessing through to model inference.

It covers both traditional NLP techniques and transformer-based architectures such as **BERT**, **GPT**, and **LLaMA**, so classical preprocessing and model-backed tasks are available from one package.

---

## **🚀 Quick Start**

### **Installation**

```bash
# One-command installation (recommended)
pip install lingo-nlp-toolkit

# Full installation with all dependencies
# (extras are quoted so shells such as zsh do not expand the brackets)
pip install "lingo-nlp-toolkit[full]"

# Development installation
pip install "lingo-nlp-toolkit[dev]"

# GPU support
pip install "lingo-nlp-toolkit[gpu]"
```

**✨ Auto-Setup**: Lingo automatically downloads all required NLP data and models on first use!

**📦 PyPI Package**: [lingo-nlp-toolkit on PyPI](https://pypi.org/project/lingo-nlp-toolkit/#files)

### **Examples & Use Cases**

```bash
# Basic usage
python examples/basic_usage.py

# Advanced real-world applications
python examples/advanced_use_cases.py

# Enterprise-grade NLP workflows
python examples/enterprise_nlp.py

# Capability showcase
python examples/showcase.py

# Interactive demo
python demo.py
```

### **First Steps**

```python
from lingo import Pipeline

# Create a sentiment analysis pipeline
nlp = Pipeline(task="sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment-latest")

# Run inference
text = "I absolutely love the new product update!"
result = nlp(text)
print(result)
# Output: {'label': 'positive', 'score': 0.988}
```

### **Command Line Usage**

```bash
# Sentiment analysis
lingo run sentiment-analysis --model cardiffnlp/twitter-roberta-base-sentiment-latest --text "I love this product!"

# List available models
lingo list-models

# Download a model
lingo download-model --model bert-base-uncased
```

---

## **✨ Key Features**

### **1. Text Preprocessing & Normalization**

- ✅ Unicode normalization (NFC/NFD)
- ✅ Lowercasing, punctuation removal, special character stripping
- ✅ Stopword removal (multi-language support)
- ✅ Lemmatization & stemming
- ✅ Advanced tokenization (Word, Sentence, Subword)
- ✅ Spell correction & slang expansion

### **2. Core NLP Tasks**

- ✅ **Text Classification** - Multi-class & multi-label
- ✅ **Named Entity Recognition (NER)** - Domain-specific models
- ✅ **Sentiment Analysis** - Binary, ternary, fine-grained
- ✅ **Text Embeddings** - BERT, Sentence-BERT, LLaMA
- ✅ **Question Answering** - Extractive & generative
- ✅ **Text Summarization** - Abstractive & extractive

### **3. Hugging Face Integration**

- ✅ Load any model from Hugging Face Hub
- ✅ Fine-tune pre-trained transformers
- ✅ Export models to Hugging Face Hub
- ✅ Mixed precision training

### **4. Performance & Scalability**

- ✅ GPU & multi-core CPU support
- ✅ Asynchronous batch processing
- ✅ Memory-efficient tokenization
- ✅ Lightweight deployment mode

---

## **📚 Comprehensive Examples**

### **Text Classification**

```python
from lingo import Pipeline

# Create classifier
classifier = Pipeline(
    task="text-classification",
    model="bert-base-uncased"
)

# Classify texts
texts = [
    "This is a positive review about the product.",
    "I'm not satisfied with the service quality.",
    "The product meets my expectations."
]

results = classifier(texts)
for text, result in zip(texts, results):
    print(f"{text[:30]}... → {result['label']}")

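# Output:
# This is a positive review abou... → LABEL_0
# I'm not satisfied with the ser... → LABEL_0
# The product meets my expectati... → LABEL_0
#
# Note: bert-base-uncased is a base checkpoint with no fine-tuned
# classification head, so the labels are generic placeholders and the
# predictions are not meaningful. Use a fine-tuned classifier in practice.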
```

### **Named Entity Recognition**

```python
from lingo import Pipeline

# Create NER pipeline
ner = Pipeline(
    task="ner",
    model="dslim/bert-base-NER"
)

# Extract entities
text = "Apple Inc. is headquartered in Cupertino, California. Tim Cook is the CEO."
entities = ner(text)

for entity in entities:
    # 'word' holds the token text; 'entity' holds its BIO tag
    print(f"Entity: {entity['word']}, Type: {entity['entity']}, Score: {entity['score']:.3f}")

# Output (token-level, so words appear as subword pieces):
# Entity: cup, Type: B-LOC, Score: 0.940
# Entity: ##ert, Type: B-LOC, Score: 0.671
# Entity: ##ino, Type: I-LOC, Score: 0.437
# Entity: ca, Type: B-LOC, Score: 0.506
```
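
The output above is token-level: words are split into WordPiece fragments (`##ert`, `##ino`), each with its own BIO tag. The sketch below shows one way to merge such fragments into whole entities. `merge_entities` is a hypothetical helper, not part of Lingo, and it assumes each item is a dict with `entity`, `word`, and `score` keys as in the example. The Hugging Face token-classification pipeline has an `aggregation_strategy` option for the same purpose; whether Lingo's `Pipeline` forwards it is not stated here.

```python
def merge_entities(tokens):
    """Hypothetical helper: merge token-level predictions into entity spans."""
    merged = []
    for tok in tokens:
        tag = tok["entity"]
        label = tag.split("-", 1)[-1]  # "B-LOC" -> "LOC"
        word = tok["word"]
        if word.startswith("##") and merged:
            # WordPiece continuation: glue onto the previous piece, no space.
            merged[-1]["word"] += word[2:]
            merged[-1]["scores"].append(tok["score"])
        elif tag.startswith("I-") and merged and merged[-1]["type"] == label:
            # Inside tag: a new word that continues the current entity.
            merged[-1]["word"] += " " + word
            merged[-1]["scores"].append(tok["score"])
        else:
            merged.append({"word": word, "type": label, "scores": [tok["score"]]})
    # Report the mean token score for each merged entity.
    return [
        {"word": m["word"], "type": m["type"], "score": sum(m["scores"]) / len(m["scores"])}
        for m in merged
    ]


# Token-level predictions copied from the example output above.
tokens = [
    {"entity": "B-LOC", "word": "cup", "score": 0.940},
    {"entity": "B-LOC", "word": "##ert", "score": 0.671},
    {"entity": "I-LOC", "word": "##ino", "score": 0.437},
    {"entity": "B-LOC", "word": "ca", "score": 0.506},
]

merged_entities = merge_entities(tokens)
for e in merged_entities:
    print(f"{e['word']} ({e['type']}): {e['score']:.3f}")
```

Averaging the token scores is one choice among several; taking the minimum or the first token's score are equally reasonable.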

### **Sentiment Analysis**

```python
from lingo import Pipeline

# Create sentiment analyzer
sentiment = Pipeline(
    task="sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest"
)

# Analyze sentiment
texts = [
    "I love this amazing product!",
    "This is terrible, worst purchase ever.",
    "It's okay, nothing special."
]

results = sentiment(texts)
for text, result in zip(texts, results):
    print(f"{text[:30]}... → {result['label']} ({result['score']:.3f})")

# Output:
# I love this amazing product!... → positive (0.987)
# This is terrible, worst purcha... → negative (0.953)
# It's okay, nothing special.... → neutral (0.596)
```
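
Scores such as the 0.596 for the neutral example are worth treating differently from near-certain ones. The sketch below flags low-confidence predictions for manual review. `flag_uncertain` and the 0.7 threshold are illustrative assumptions, not part of Lingo; the result dicts are copied from the output above so the snippet runs without loading a model.

```python
def flag_uncertain(texts, results, threshold=0.7):
    """Hypothetical helper: mark predictions whose score falls below `threshold`."""
    report = []
    for text, result in zip(texts, results):
        report.append(
            {
                "text": text,
                "label": result["label"],
                "needs_review": result["score"] < threshold,
            }
        )
    return report


texts = [
    "I love this amazing product!",
    "This is terrible, worst purchase ever.",
    "It's okay, nothing special.",
]
# Results shaped like the pipeline output shown above.
results = [
    {"label": "positive", "score": 0.987},
    {"label": "negative", "score": 0.953},
    {"label": "neutral", "score": 0.596},
]

report = flag_uncertain(texts, results)
for row in report:
    marker = "review" if row["needs_review"] else "ok"
    print(f"[{marker}] {row['label']}: {row['text']}")
```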

### **Text Embeddings & Similarity**

```python
from lingo import Pipeline

# Create embedding pipeline
embeddings = Pipeline(
    task="embedding",
    model="sentence-transformers/all-MiniLM-L6-v2"
)

# Generate embeddings
texts = [
    "The cat is on the mat.",
    "A cat is sitting on the mat.",
    "The weather is beautiful today."
]

embeds = embeddings(texts)

# Calculate similarity
from lingo.models import EmbeddingModel
embedding_model = embeddings.model

similarity = embedding_model.similarity(texts[0], texts[1])
print(f"Similarity: {similarity:.3f}")

# Output:
# Similarity: 0.907
```
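
Similarity between sentence embeddings is most often measured with cosine similarity. Assuming that is what `similarity` returns here (this README does not confirm it), the computation can be sketched in plain Python on small hand-written vectors:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]   # same direction as u
w = [-3.0, 0.0, 1.0]  # orthogonal to u

sim_parallel = cosine_similarity(u, v)    # approximately 1.0
sim_orthogonal = cosine_similarity(u, w)  # 0.0
print(sim_parallel, sim_orthogonal)
```

Cosine similarity ignores vector length, so two sentences with embeddings pointing in the same direction score close to 1 regardless of magnitude.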

### **Question Answering**

```python
from lingo import Pipeline

# Create QA pipeline
qa = Pipeline(
    task="question-answering",
    model="deepset/roberta-base-squad2"
)

# Answer questions
context = """
Python is a high-level programming language created by Guido van Rossum in 1991.
It's known for its simplicity and readability, making it popular for beginners.
Python is widely used in data science, machine learning, and web development.
"""

question = "Who created Python?"
answer = qa(question=question, context=context)

print(f"Q: {question}")
print(f"A: {answer['answer']} (confidence: {answer['score']:.3f})")

# Output:
# Q: Who created Python?
# A: Guido van Rossum (confidence: 0.990)
```

### **Text Summarization**

```python
from lingo import Pipeline

# Create summarization pipeline
summarizer = Pipeline(
    task="summarization",
    model="facebook/bart-large-cnn"
)

# Summarize long text
long_text = """
Artificial Intelligence (AI) has emerged as one of the most transformative technologies of the 21st century.
It encompasses a wide range of capabilities including machine learning, natural language processing, computer vision,
and robotics. AI systems can now perform tasks that were once thought to be exclusively human, such as recognizing
speech, translating languages, making decisions, and solving complex problems.
"""

summary = summarizer(long_text)
print(f"Summary: {summary['summary_text']}")

# Output:
# Summary: artificial intelligence (ai) has emerged as one of the most transformative technologies of the 21st century. it encompasses a wide range of capabilities including machine learning, natural language processing, computer vision, and robotics. ai systems can now perform tasks that were once thought to be exclusively human, such as recognizing speech and translating languages.
```

---

## **🔧 Advanced Usage**

### **Custom Preprocessing**

```python
from lingo import TextPreprocessor

# Configure preprocessing
preprocessor = TextPreprocessor(
    config={
        "lowercase": True,
        "remove_punctuation": True,
        "remove_stopwords": True,
        "lemmatize": True,
        "use_spacy": True,
        "spacy_model": "en_core_web_sm"
    }
)

# Process text
text = "The quick brown foxes are jumping over the lazy dogs! 🦊🐕"
cleaned = preprocessor(text)
print(f"Cleaned: {cleaned}")

# Get detailed preprocessing results
pipeline_result = preprocessor.get_preprocessing_pipeline(text)
print(f"Words: {pipeline_result['words']}")
print(f"Lemmatized: {pipeline_result['lemmatized']}")

# Output:
# Cleaned: the quick brown foxes are jumping over the lazy dogs
# Words: ['the', 'quick', 'brown', 'foxes', 'are', 'jumping', 'over', 'the', 'lazy', 'dogs']
# Lemmatized: ['the', 'quick', 'brown', 'fox', 'are', 'jumping', 'over', 'the', 'lazy', 'dog']
```

### **Batch Processing**

```python
# Process large datasets efficiently
texts = ["Text 1", "Text 2", "Text 3", ...]  # Large list

# Batch processing (`pipeline` is a Pipeline instance, created as in the examples above)
results = pipeline.batch_predict(texts, batch_size=32)

# Or use utility function
from lingo.utils import batch_texts
batches = batch_texts(texts, batch_size=32)
```
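
Batching splits a long list into fixed-size chunks so that each chunk can be passed to the model in one call. The generator below illustrates the idea; `iter_batches` is a hypothetical stand-in, and the actual behavior of `batch_texts` may differ.

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


sample_texts = [f"Text {i}" for i in range(1, 8)]  # seven items
sample_batches = list(iter_batches(sample_texts, batch_size=3))
for batch in sample_batches:
    print(len(batch), batch)
```

The last batch is simply shorter when the list length is not a multiple of `batch_size`.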

### **Model Evaluation**

```python
from lingo.utils import evaluate_classification

# Evaluate model performance
y_true = ["positive", "negative", "positive", "neutral"]
y_pred = ["positive", "negative", "positive", "positive"]

metrics = evaluate_classification(y_true, y_pred)
print(f"Accuracy: {metrics['accuracy']:.3f}")
print(f"F1 Score: {metrics['f1']:.3f}")

# Output:
# Accuracy: 0.750
# F1 Score: 0.800
```
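
With more than two classes, a single F1 number depends on how per-class scores are averaged, and this README does not say which averaging `evaluate_classification` uses. The sketch below computes per-class, macro, and weighted F1 for the same labels in plain Python, independently of Lingo. On this data, the 0.800 shown above matches the per-class F1 for `positive`.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, using F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


y_true = ["positive", "negative", "positive", "neutral"]
y_pred = ["positive", "negative", "positive", "positive"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1_by_class = per_class_f1(y_true, y_pred)

# Macro: unweighted mean over classes. Weighted: mean weighted by class support.
macro_f1 = sum(f1_by_class.values()) / len(f1_by_class)
support = Counter(y_true)
weighted_f1 = sum(f1_by_class[c] * support[c] for c in f1_by_class) / len(y_true)

print(f"Accuracy:    {accuracy:.3f}")
print(f"Per class:   {f1_by_class}")
print(f"Macro F1:    {macro_f1:.3f}")
print(f"Weighted F1: {weighted_f1:.3f}")
```

The `neutral` class is never predicted, so its F1 is 0 and it pulls both averages below the `positive` score.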

### **Pipeline Configuration**

```python
# Load configuration from file
import yaml

with open("config.yaml", "r") as f:
    config = yaml.safe_load(f)

# Create pipeline with custom config
pipeline = Pipeline(
    task="sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
    config=config
)
```

---

## **📁 Project Structure**

```
lingo/
├── lingo/                    # Core package
│   ├── __init__.py          # Main imports
│   ├── core.py              # Pipeline class
│   ├── preprocessing.py      # Text preprocessing
│   ├── models.py            # NLP model classes
│   ├── utils.py             # Utility functions
│   ├── cli.py               # Command-line interface
│   └── configs/             # Configuration files
│       └── default.yaml     # Default config
├── examples/                 # Usage examples
│   └── basic_usage.py       # Basic examples
├── tests/                   # Test suite
├── setup.py                 # Package setup
├── requirements.txt          # Dependencies
└── README.md                # This file
```

---

## **⚡ Performance & Optimization**

### **Device Selection**

```python
# Automatic device detection
pipeline = Pipeline(task="sentiment-analysis", model="...", device="auto")

# Manual device selection
pipeline = Pipeline(task="sentiment-analysis", model="...", device="cuda")
pipeline = Pipeline(task="sentiment-analysis", model="...", device="mps")  # Apple Silicon
```
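
Automatic device detection usually comes down to asking PyTorch which backends are available. The sketch below shows a common order of preference. `pick_device` is a hypothetical helper, and the logic behind `device="auto"` in Lingo may differ.

```python
def pick_device() -> str:
    """Hypothetical helper: return "cuda", "mps", or "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is nothing to accelerate
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"Selected device: {device}")
```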

### **Batch Processing**

```python
# Optimize for large datasets
results = pipeline.batch_predict(texts, batch_size=64)
```

### **Memory Management**

```python
# Use mixed precision to reduce memory use and speed up inference
pipeline = Pipeline(
    task="sentiment-analysis",
    model="...",
    config={"use_mixed_precision": True}
)
```

---

## **🔌 Integration & Extensibility**

### **With Existing Libraries**

```python
# spaCy integration
import spacy
nlp = spacy.load("en_core_web_sm")

# NLTK integration
import nltk
from nltk.tokenize import word_tokenize

# scikit-learn integration
from sklearn.metrics import classification_report
```

### **Custom Models**

```python
# Extend base model class
from lingo.models import BaseModel

class CustomModel(BaseModel):
    def _load_model(self):
        # Custom model loading logic
        pass

    def __call__(self, inputs, **kwargs):
        # Custom inference logic
        pass
```

---

## **🚀 Deployment & Production**

### **Save & Load Pipelines**

```python
# Save pipeline
pipeline.save("./saved_pipeline")

# Load pipeline
loaded_pipeline = Pipeline.load("./saved_pipeline")
```

### **REST API Template**

```python
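from fastapi import FastAPI
from pydantic import BaseModel
from lingo import Pipeline

app = FastAPI()

# Load pipeline once at startup
pipeline = Pipeline.load("./saved_pipeline")


class AnalyzeRequest(BaseModel):
    text: str


@app.post("/analyze")
def analyze_text(request: AnalyzeRequest):
    # A request model makes FastAPI read `text` from the JSON body; a bare
    # `text: str` parameter would be treated as a query parameter.
    # A plain `def` runs the blocking inference call in FastAPI's thread pool.
    result = pipeline(request.text)
    return {"result": result}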
```

### **Docker Deployment**

```dockerfile
FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
EXPOSE 8000

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

---

## **📊 Benchmarks & Performance**

| Task                | Model         | Latency (CPU) | Latency (GPU) | Memory Usage |
| ------------------- | ------------- | ------------- | ------------- | ------------ |
| Sentiment Analysis  | RoBERTa-base  | 50 ms         | 15 ms         | 500 MB       |
| NER                 | BERT-base-NER | 80 ms         | 25 ms         | 400 MB       |
| Text Classification | DistilBERT    | 30 ms         | 10 ms         | 300 MB       |
| Embeddings          | MiniLM-L6     | 40 ms         | 12 ms         | 200 MB       |

_Benchmarks on Intel i7-10700K (CPU) and RTX 3080 (GPU)_
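
Numbers like these vary with hardware, input length, and batch size. The sketch below is a generic timing harness for measuring latency on your own machine; it is not the method used to produce the table. `measure_latency_ms` and `dummy_model` are hypothetical names, and the dummy should be replaced with a real pipeline call.

```python
import statistics
import time


def measure_latency_ms(fn, payload, warmup=3, runs=20):
    """Time `fn(payload)` over several runs and return summary statistics in ms."""
    for _ in range(warmup):
        fn(payload)  # warm-up calls are excluded from the measurement
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    return {"median_ms": statistics.median(samples), "max_ms": max(samples)}


def dummy_model(text):
    """Stand-in for a pipeline call so the harness runs without any model."""
    return {"label": "positive", "score": min(1.0, len(text) / 100)}


stats = measure_latency_ms(dummy_model, "I love this product!")
print(f"median: {stats['median_ms']:.4f} ms, max: {stats['max_ms']:.4f} ms")
```

Reporting the median alongside the maximum keeps a single slow run from distorting the picture.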

---

## **🤝 Contributing**

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.

### **Development Setup**

```bash
# Clone repository
git clone https://github.com/irfanalidv/lingo-nlp-toolkit.git
cd lingo-nlp-toolkit

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black lingo/
isort lingo/
```

---

## **📄 License**

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## **🙏 Acknowledgments**

- **Hugging Face** for the amazing transformers library
- **spaCy** for excellent NLP tools
- **NLTK** for foundational NLP capabilities
- **PyTorch** for the deep learning framework
- **scikit-learn** for machine learning utilities

---

## **📞 Support & Community**

- **Documentation**: [GitHub Repository](https://github.com/irfanalidv/lingo-nlp-toolkit)
- **Issues**: [GitHub Issues](https://github.com/irfanalidv/lingo-nlp-toolkit/issues)
- **Discussions**: [GitHub Discussions](https://github.com/irfanalidv/lingo-nlp-toolkit/discussions)

---

## **⭐ Star History**

[![Star History Chart](https://api.star-history.com/svg?repos=irfanalidv/lingo-nlp-toolkit&type=Date)](https://star-history.com/#irfanalidv/lingo-nlp-toolkit&Date)

---

**Made with ❤️ by Md Irfan Ali**

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/irfanalidv/lingo-nlp-toolkit",
    "name": "lingo-nlp-toolkit",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": "Md Irfan Ali <irfanali29@hotmail.com>",
    "keywords": "nlp, natural-language-processing, transformers, bert, machine-learning, ai, text-processing, sentiment-analysis, named-entity-recognition, text-classification, embeddings",
    "author": "Md Irfan Ali",
    "author_email": "Md Irfan Ali <irfanali29@hotmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/0b/72/82b4d88c86c0b036bd508659c6704b88e12455470744ab122ef331c7bebf/lingo_nlp_toolkit-2.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Advanced NLP Toolkit - Lightweight, Fast, and Transformer-Ready",
    "version": "2.3",
    "project_urls": {
        "Bug Tracker": "https://github.com/irfanalidv/lingo-nlp-toolkit/issues",
        "Changelog": "https://github.com/irfanalidv/lingo-nlp-toolkit/blob/main/CHANGELOG.md",
        "Documentation": "https://github.com/irfanalidv/lingo-nlp-toolkit",
        "Download": "https://pypi.org/project/lingo-nlp-toolkit/#files",
        "Homepage": "https://github.com/irfanalidv/lingo-nlp-toolkit",
        "Repository": "https://github.com/irfanalidv/lingo-nlp-toolkit.git",
        "Source Code": "https://github.com/irfanalidv/lingo-nlp-toolkit"
    },
    "split_keywords": [
        "nlp",
        " natural-language-processing",
        " transformers",
        " bert",
        " machine-learning",
        " ai",
        " text-processing",
        " sentiment-analysis",
        " named-entity-recognition",
        " text-classification",
        " embeddings"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "54b53d2986a928b5d9261d2eb43e65d981a4d08b722e5c85e5cdfb119d95efbb",
                "md5": "2cbbde86bd8b22d77dcccbcf6aa07439",
                "sha256": "ab492548bdfdeba123163c2028b05a9b29005c8c190e45cd872f5063df5d5f6c"
            },
            "downloads": -1,
            "filename": "lingo_nlp_toolkit-2.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2cbbde86bd8b22d77dcccbcf6aa07439",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 25534,
            "upload_time": "2025-08-15T08:42:48",
            "upload_time_iso_8601": "2025-08-15T08:42:48.227027Z",
            "url": "https://files.pythonhosted.org/packages/54/b5/3d2986a928b5d9261d2eb43e65d981a4d08b722e5c85e5cdfb119d95efbb/lingo_nlp_toolkit-2.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0b7282b4d88c86c0b036bd508659c6704b88e12455470744ab122ef331c7bebf",
                "md5": "cde7b1f1ddace945f3bf1c9661f27e6c",
                "sha256": "238623bbe803d3a2134ef5cc5f1be5f2db27df39b71164176cf48da7dda8c35f"
            },
            "downloads": -1,
            "filename": "lingo_nlp_toolkit-2.3.tar.gz",
            "has_sig": false,
            "md5_digest": "cde7b1f1ddace945f3bf1c9661f27e6c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 65895,
            "upload_time": "2025-08-15T08:42:51",
            "upload_time_iso_8601": "2025-08-15T08:42:51.202470Z",
            "url": "https://files.pythonhosted.org/packages/0b/72/82b4d88c86c0b036bd508659c6704b88e12455470744ab122ef331c7bebf/lingo_nlp_toolkit-2.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-15 08:42:51",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "irfanalidv",
    "github_project": "lingo-nlp-toolkit",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "torch",
            "specs": [
                [
                    ">=",
                    "1.12.0"
                ]
            ]
        },
        {
            "name": "transformers",
            "specs": [
                [
                    ">=",
                    "4.20.0"
                ]
            ]
        },
        {
            "name": "tokenizers",
            "specs": [
                [
                    ">=",
                    "0.13.0"
                ]
            ]
        },
        {
            "name": "spacy",
            "specs": [
                [
                    ">=",
                    "3.5.0"
                ]
            ]
        },
        {
            "name": "nltk",
            "specs": [
                [
                    ">=",
                    "3.8"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1.1.0"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.21.0"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.5.0"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    ">=",
                    "1.9.0"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    ">=",
                    "4.64.0"
                ]
            ]
        },
        {
            "name": "pyyaml",
            "specs": [
                [
                    ">=",
                    "6.0"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    ">=",
                    "2.28.0"
                ]
            ]
        },
        {
            "name": "beautifulsoup4",
            "specs": [
                [
                    ">=",
                    "4.11.0"
                ]
            ]
        },
        {
            "name": "lxml",
            "specs": [
                [
                    ">=",
                    "4.9.0"
                ]
            ]
        },
        {
            "name": "fastapi",
            "specs": [
                [
                    ">=",
                    "0.95.0"
                ]
            ]
        },
        {
            "name": "uvicorn",
            "specs": [
                [
                    ">=",
                    "0.20.0"
                ]
            ]
        },
        {
            "name": "pydantic",
            "specs": [
                [
                    ">=",
                    "1.10.0"
                ]
            ]
        },
        {
            "name": "sentence-transformers",
            "specs": [
                [
                    ">=",
                    "2.2.0"
                ]
            ]
        }
    ],
    "lcname": "lingo-nlp-toolkit"
}
        