# HybridVectorizer
**Unified embedding for tabular, text, and multimodal data with powerful similarity search.**
HybridVectorizer automatically handles mixed data types (numerical, categorical, text, dates) and creates high-quality vector representations for similarity search, recommendation systems, and machine learning pipelines.
**Late-fusion embeddings for mixed tabular + text data.** One line to vectorize text + numeric + categorical into a single search space with adjustable block weights.
**Quick links:**
- **PyPI**: https://pypi.org/project/hybrid-vectorizer/
- **Examples**: https://github.com/hariharaprabhu/hybrid-vectorizer/tree/main/Examples
- **Issues**: https://github.com/hariharaprabhu/hybrid-vectorizer/issues
# Quick Start
## Basic Usage
```python
import pandas as pd
from hybrid_vectorizer import HybridVectorizer
# Load financial data (S&P 500 companies)
df = pd.read_csv("sp500_companies.csv")
# Select relevant columns for analysis
df = df[['Symbol', 'Sector', 'Industry', 'Currentprice', 'Marketcap',
         'Fulltimeemployees', 'Longbusinesssummary']]
# Initialize with company symbol as index
hv = HybridVectorizer(index_column="Symbol")
vectors = hv.fit_transform(df)
print(f"Generated {vectors.shape[0]} vectors with {vectors.shape[1]} features")
```
## Finding Similar Companies
```python
# Find companies similar to Google (GOOGL)
query = df.loc[df['Symbol']=='GOOGL'].iloc[0].to_dict()
results = hv.similarity_search(query, ignore_exact_matches=True, top_n=5)
print(results[['Symbol', 'Sector', 'Marketcap', 'similarity']])
```
## Weight Tuning for Different Use Cases
### Focus on Business Description (Text Similarity)
```python
# Emphasize business model similarity
results = hv.similarity_search(
    query,
    block_weights={'text': 2.0, 'numerical': 0.5, 'categorical': 0.5},
    top_n=5
)
print("Companies with similar business models:")
print(results[['Symbol', 'Sector', 'Longbusinesssummary', 'similarity']])
```
### Focus on Financial Metrics
```python
# Emphasize financial similarity
results = hv.similarity_search(
    query,
    block_weights={'text': 0.3, 'numerical': 2.0, 'categorical': 0.5},
    top_n=5
)
print("Companies with similar financials:")
print(results[['Symbol', 'Currentprice', 'Marketcap', 'similarity']])
```
### Focus on Industry/Sector
```python
# Emphasize sector/industry similarity
results = hv.similarity_search(
    query,
    block_weights={'text': 0.3, 'numerical': 0.5, 'categorical': 2.0},
    top_n=5
)
print("Companies in similar sectors:")
print(results[['Symbol', 'Sector', 'Industry', 'similarity']])
```
## Custom Queries
```python
# Search for specific characteristics
custom_query = {
    'Sector': 'Technology',
    'Longbusinesssummary': 'cloud computing artificial intelligence',
    'Marketcap': 500000000000,  # $500B market cap
    'Fulltimeemployees': 100000
}
results = hv.similarity_search(custom_query, top_n=5)
print("Large tech companies with AI/cloud focus:")
print(results[['Symbol', 'Sector', 'Marketcap', 'similarity']])
```
## Real-World Use Cases
- **Investment Research**: Find companies with similar business models or financials
- **Competitor Analysis**: Identify direct and indirect competitors
- **Portfolio Construction**: Build diversified portfolios based on similarity
- **Market Research**: Understand sector clustering and relationships
## 📦 Installation
```bash
pip install hybrid-vectorizer
```
## GPU Support (Recommended for Better Performance)
For faster text embedding with large datasets, install GPU-accelerated PyTorch:
### Check Your CUDA Version
```bash
nvidia-smi
```
Look for "CUDA Version: X.X" in the output.
### Install GPU Support
```bash
# Install the package first
pip install hybrid-vectorizer

# Then install the PyTorch build that matches your CUDA version (run only one)
# CUDA 11.8
pip install torch --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch --index-url https://download.pytorch.org/whl/cu121
```
### Verify GPU Installation
```python
import torch
from hybrid_vectorizer import HybridVectorizer
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name()}")
# Test with your data
hv = HybridVectorizer()
# Should show "GPU detected" instead of "Using CPU"
```
### Performance Notes
- **CPU**: Works well for datasets <1000 rows
- **GPU**: 5-10x faster for text embedding, recommended for larger datasets
- Text columns benefit most from GPU acceleration
**Requirements:**
- Python 3.8+
- pandas, numpy, scikit-learn
- sentence-transformers, torch
## 🏗️ Architecture
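HybridVectorizer uses a late-fusion approach to multimodal similarity search: each block of columns (text, numerical, categorical) is encoded separately, and the blocks are combined with configurable weights at search time.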
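
To make the idea concrete, here is a minimal sketch of late fusion as a weighted average of per-block cosine similarities. It is an illustration under stated assumptions, not the library's internal code: the `BLOCKS` layout, the `cosine` and `late_fusion_score` helpers, and the toy vectors are all hypothetical, and HybridVectorizer may combine blocks differently.

```python
import numpy as np

# Hypothetical layout: which slice of the fused vector belongs to each block.
BLOCKS = {
    "text": slice(0, 4),
    "numerical": slice(4, 6),
    "categorical": slice(6, 9),
}

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def late_fusion_score(query, candidate, block_weights):
    """Weighted average of per-block cosine similarities (assumed scheme)."""
    total = sum(block_weights.values())
    weighted = sum(
        weight * cosine(query[BLOCKS[name]], candidate[BLOCKS[name]])
        for name, weight in block_weights.items()
    )
    return weighted / total

# Toy vectors: one candidate matches the query's text block only,
# the other matches its numerical and categorical blocks only.
query = np.array([1, 0, 0, 0, 1, 0, 1, 0, 0], dtype=float)
same_text = np.array([1, 0, 0, 0, 0, 1, 0, 1, 0], dtype=float)
same_tabular = np.array([0, 1, 0, 0, 1, 0, 1, 0, 0], dtype=float)

text_heavy = {"text": 2.0, "numerical": 0.5, "categorical": 0.5}
number_heavy = {"text": 0.3, "numerical": 2.0, "categorical": 0.5}

for label, weights in [("text-heavy", text_heavy), ("number-heavy", number_heavy)]:
    a = late_fusion_score(query, same_text, weights)
    b = late_fusion_score(query, same_tabular, weights)
    print(f"{label}: same_text={a:.3f}, same_tabular={b:.3f}")
```

With the text-heavy weights the text-matching candidate ranks first, and with the number-heavy weights the order flips. The stored vectors never change; only the query-time weights do.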

## ✨ Key Features
### 🔄 **Automatic Data Type Handling**
- **Numerical**: Auto-normalized with MinMaxScaler
- **Categorical**: One-hot or frequency encoding (chosen by the `onehot_threshold` setting)
- **Text**: SentenceTransformer embeddings
- **Dates**: Extract features or ignore
- **Mixed**: Handles missing values, inf, NaN gracefully
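
The categorical rule above can be sketched in plain Python. This is a rough illustration, assuming that columns at or below the threshold are one-hot encoded and that larger ones fall back to relative-frequency encoding; the `encode_categorical` helper is hypothetical and the library's actual frequency encoding may differ.

```python
from collections import Counter

def encode_categorical(values, onehot_threshold=10):
    """Return (strategy, rows) for one categorical column.

    Assumption: at most `onehot_threshold` distinct values -> one-hot,
    otherwise each value is replaced by its relative frequency.
    """
    counts = Counter(values)
    categories = sorted(counts)

    if len(categories) <= onehot_threshold:
        position = {category: i for i, category in enumerate(categories)}
        rows = []
        for value in values:
            row = [0.0] * len(categories)
            row[position[value]] = 1.0
            rows.append(row)
        return "onehot", rows

    n = len(values)
    return "frequency", [[counts[value] / n] for value in values]

sectors = ["Tech", "Energy", "Tech"]
print(encode_categorical(sectors))                      # few categories: one-hot
print(encode_categorical(sectors, onehot_threshold=1))  # forced frequency encoding
```

One-hot keeps categories equidistant but adds one dimension per category. Frequency encoding stays one-dimensional, at the cost of mapping equally common categories to the same value.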
### 🎯 **Powerful Similarity Search**
- **Late Fusion**: Combines modalities with configurable weights
- **Block-level Control**: Weight text vs. numerical vs. categorical separately
- **Explanation**: See which features drive similarity
### 🛠️ **Production Ready**
- **Memory Efficient**: Optimized for large datasets
- **GPU Support**: Automatic GPU detection for text encoding
- **Persistence**: Save/load trained models
- **Error Handling**: Informative custom exceptions
## 💡 Usage Examples
### Basic Usage
```python
# Fit and transform
hv = HybridVectorizer()
vectors = hv.fit_transform(df)
# Simple query
results = hv.similarity_search({'description': 'machine learning'})
```
### Advanced Configuration
```python
hv = HybridVectorizer(
    column_encodings={'description': 'text', 'category': 'categorical'},
    ignore_columns=['id', 'created_at'],
    index_column='id',
    onehot_threshold=15,
    text_batch_size=64
)
```
### Weighted Search
```python
# Emphasize text over numerical features
results = hv.similarity_search(
    query,
    block_weights={'text': 3, 'categorical': 2, 'numerical': 1}
)
```
### Text-Only Search
```python
results = hv.similarity_search(
    {'description': 'AI startup'}
)
```
## 🔧 Configuration Options
| Parameter | Description | Default |
|-----------|-------------|---------|
| `column_encodings` | Manual type overrides | `{}` |
| `ignore_columns` | Skip these columns | `[]` |
| `index_column` | ID column (preserved in results) | `None` |
| `onehot_threshold` | Max categories for one-hot encoding | `10` |
| `default_text_model` | SentenceTransformer model | `'all-MiniLM-L6-v2'` |
| `text_batch_size` | Batch size for text encoding | `128` |
## 📊 Data Type Detection
HybridVectorizer automatically detects:
- **Numerical**: `int64`, `float64`, etc. → MinMax normalization
- **Categorical**: `object` with ≤10 unique values (the default `onehot_threshold`) → One-hot encoding
- **Text**: `object` with >10 unique values → SentenceTransformer embeddings
- **Dates**: `datetime64` → Extract year/month/day or ignore
Override with `column_encodings={'col': 'text'}` if needed.
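
The detection rules above amount to a small dtype-based heuristic. The sketch below reproduces them with pandas to show how a column would be classified; `detect_column_type` is a hypothetical helper written for this README, not a function exported by the package, and the real detection logic may handle more cases.

```python
import pandas as pd

def detect_column_type(series, onehot_threshold=10):
    """Classify a column using the rules listed above (illustrative only)."""
    if pd.api.types.is_datetime64_any_dtype(series):
        return "date"
    if pd.api.types.is_numeric_dtype(series):
        return "numerical"
    # Remaining columns are treated as strings: few distinct values means
    # categorical, many distinct values means free text.
    if series.nunique(dropna=True) <= onehot_threshold:
        return "categorical"
    return "text"

# Placeholder data; a threshold of 2 keeps the example small.
df = pd.DataFrame({
    "price": [10.5, 20.0, 30.25],
    "sector": ["Tech", "Energy", "Tech"],
    "summary": ["cloud software", "oil and gas", "chip design"],
    "listed": pd.to_datetime(["2020-01-01", "2021-06-15", "2022-03-30"]),
})

detected = {col: detect_column_type(df[col], onehot_threshold=2) for col in df.columns}
print(detected)
```

A heuristic like this misclassifies short free-text columns with few rows, which is the case `column_encodings` is there to correct.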
## 🎛️ Advanced Features
### Model Persistence
```python
# Save trained model
hv.save('my_vectorizer.pkl')
# Load later
hv2 = HybridVectorizer.load('my_vectorizer.pkl')
results = hv2.similarity_search(query)
```
### Encoding Report
```python
# See how each column was processed
report = hv.get_encoding_report()
print(report)
```
### External Vector Database
```python
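import faiss
import numpy as np

# FAISS expects C-contiguous float32 arrays
vectors32 = np.ascontiguousarray(vectors, dtype=np.float32)

# Use an exact inner-product FAISS index for faster search
index = faiss.IndexFlatIP(vectors32.shape[1])
index.add(vectors32)
hv.set_vector_db(index)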
```
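
For intuition, `faiss.IndexFlatIP` performs an exact maximum inner-product search. The NumPy sketch below does the same brute-force search without FAISS, and normalizes rows first so that the inner product equals cosine similarity. The helper names and toy vectors are illustrative assumptions; whether HybridVectorizer normalizes vectors before handing them to an external index is not stated here.

```python
import numpy as np

def normalize_rows(x):
    """L2-normalize each row so inner product equals cosine similarity."""
    x = np.ascontiguousarray(x, dtype=np.float32)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)

def top_k_inner_product(index_vectors, query, k):
    """Exact search: score every stored vector and keep the k best."""
    scores = index_vectors @ query
    order = np.argsort(-scores)[:k]
    return order, scores[order]

# Toy index of three 2-d vectors and one query.
stored = normalize_rows(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
query = normalize_rows(np.array([[2.0, 0.1]]))[0]

ids, scores = top_k_inner_product(stored, query, k=2)
print(ids, scores)
```

A flat index scales linearly with the number of rows, so it is the simplest baseline before moving to approximate index types.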
## 🚨 Error Handling
```python
from hybrid_vectorizer import HybridVectorizerError, ModelNotFittedError
try:
    results = hv.similarity_search(query)
except ModelNotFittedError:
    # Raised when searching before the vectorizer has been fitted
    print("Call fit_transform() first!")
except HybridVectorizerError as e:
    print(f"HybridVectorizer error: {e}")
```
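
The order of the `except` clauses matters if the specific exception subclasses the general one. That relationship is an assumption about `ModelNotFittedError` and `HybridVectorizerError`, not something this README states. The sketch below uses stand-in classes (all `Toy*` names are hypothetical) to show why the narrower handler goes first.

```python
class ToyVectorizerError(Exception):
    """Stand-in base class (hypothetical, not the library's)."""

class ToyNotFittedError(ToyVectorizerError):
    """Stand-in for a 'not fitted yet' error."""

class ToyVectorizer:
    def __init__(self):
        self._fitted = False

    def fit(self, rows):
        if not rows:
            raise ToyVectorizerError("cannot fit on empty data")
        self._fitted = True
        return self

    def search(self, query):
        if not self._fitted:
            raise ToyNotFittedError("call fit() before search()")
        return []

def run(action):
    """Run an action and report which handler caught its error, if any."""
    try:
        action()
        return "ok"
    except ToyNotFittedError:
        # Must come before the base-class handler, or it would never run.
        return "not fitted"
    except ToyVectorizerError as exc:
        return f"other error: {exc}"

outcomes = [
    run(lambda: ToyVectorizer().search({})),
    run(lambda: ToyVectorizer().fit([])),
    run(lambda: ToyVectorizer().fit([1]).search({})),
]
print(outcomes)
```

If the two handlers were swapped, the base-class clause would catch the not-fitted case as well and the more specific message would never be shown.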
## 📈 Performance
Typical performance on modern hardware:
| Dataset Size | Fit Time | Search Time | Memory |
|--------------|----------|-------------|--------|
| 1K rows | <1s | <1ms | ~50MB |
| 10K rows | <10s | <10ms | ~200MB |
| 100K rows | <2min | <100ms | ~1GB |
*With mixed data types including text columns*
## 🛠️ Development
```bash
# Clone repository
git clone https://github.com/hariharaprabhu/hybrid-vectorizer
cd hybrid-vectorizer
# Install in development mode
pip install -e .
# Run tests
python tests/test_basic.py
```
## 📄 License
Apache-2.0 License - see LICENSE file for details.
## 📞 Support
- **Issues**: [GitHub Issues](https://github.com/hariharaprabhu/hybrid-vectorizer/issues)
- **Documentation**: See this README and docstrings
- **Questions**: Open an issue for questions or feature requests
---
**HybridVectorizer** - Making multimodal similarity search simple and powerful. 🚀
Raw data
{
"_id": null,
"home_page": null,
"name": "hybrid-vectorizer",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "machine-learning, embeddings, similarity-search, multimodal, vectorization",
"author": null,
"author_email": "Hari Narayanan <hari.dataprojects@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/83/59/aeb77a1fc9fdc6e916a0660424cc4c10c651b00a1ffbc154bdb0f18f3b6c/hybrid_vectorizer-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Unified embedding for tabular, text, and multimodal data",
"version": "0.1.2",
"project_urls": {
"Documentation": "https://github.com/hariharaprabhu/hybrid-vectorizer#readme",
"Example Notebooks": "https://github.com/hariharaprabhu/hybrid-vectorizer/tree/Examples",
"Homepage": "https://github.com/hariharaprabhu/hybrid-vectorizer",
"Issues": "https://github.com/hariharaprabhu/hybrid-vectorizer/issues",
"Repository": "https://github.com/hariharaprabhu/hybrid-vectorizer"
},
"split_keywords": [
"machine-learning",
" embeddings",
" similarity-search",
" multimodal",
" vectorization"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "6976bd6e8fbcc1cbc1b65fa5501c9912574261893a09cca11b83a03802223749",
"md5": "5ea0b39cacfb3d66ca65f6ff45431afb",
"sha256": "9443a4fc049ff32ce7cfd79ac6d008e3c22c67c8ac81c87b9bae14bd312cc307"
},
"downloads": -1,
"filename": "hybrid_vectorizer-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5ea0b39cacfb3d66ca65f6ff45431afb",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 20207,
"upload_time": "2025-08-19T23:17:42",
"upload_time_iso_8601": "2025-08-19T23:17:42.213344Z",
"url": "https://files.pythonhosted.org/packages/69/76/bd6e8fbcc1cbc1b65fa5501c9912574261893a09cca11b83a03802223749/hybrid_vectorizer-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "8359aeb77a1fc9fdc6e916a0660424cc4c10c651b00a1ffbc154bdb0f18f3b6c",
"md5": "0709d684fb4cc7b2798a07b625a5c907",
"sha256": "25652734cea77faac7b77373494966526d94df6eea49201a8219264693145e0c"
},
"downloads": -1,
"filename": "hybrid_vectorizer-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "0709d684fb4cc7b2798a07b625a5c907",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 25230,
"upload_time": "2025-08-19T23:17:43",
"upload_time_iso_8601": "2025-08-19T23:17:43.579475Z",
"url": "https://files.pythonhosted.org/packages/83/59/aeb77a1fc9fdc6e916a0660424cc4c10c651b00a1ffbc154bdb0f18f3b6c/hybrid_vectorizer-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-19 23:17:43",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hariharaprabhu",
"github_project": "hybrid-vectorizer#readme",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "pandas",
"specs": [
[
">=",
"1.3.0"
]
]
},
{
"name": "numpy",
"specs": [
[
">=",
"1.21.0"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "sentence-transformers",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "torch",
"specs": [
[
">=",
"1.9.0"
]
]
},
{
"name": "joblib",
"specs": [
[
">=",
"1.0.0"
]
]
}
],
"lcname": "hybrid-vectorizer"
}