# Vocabulous
A bootstrapping language detection system that builds high-quality dictionaries from noisy training data.
[Python 3.8+](https://www.python.org/downloads/)
[PyPI version](https://badge.fury.io/py/vocabulous)
[License: MIT](https://opensource.org/licenses/MIT)
[Tests](https://github.com/omarkamali/vocabulous/actions/workflows/pytest.yml)
## Overview
Vocabulous addresses a common challenge in NLP: building reliable language detection systems when you only have noisy, potentially mislabeled training data. Traditional approaches require either clean, manually curated datasets or sophisticated neural networks. Vocabulous takes a different approach, using iterative dictionary building and progressive data cleaning to bootstrap accurate language detection from imperfect data.
### Key Features
- **Bootstrapping from Noisy Data**: Starts with potentially mislabeled training data and iteratively improves
- **Dictionary-Based Detection**: Uses word frequency dictionaries for fast, interpretable language detection
- **Progressive Data Cleaning**: Removes ambiguous and mislabeled samples across training cycles
- **Multi-Script Support**: Handles both Latin and Arabic scripts with appropriate text normalization
- **Configurable Training**: Adjustable confidence threshold and margin parameters for different scenarios
- **Model Persistence**: Save and load trained models for reuse
## Installation
```bash
pip install vocabulous
```
### Development Installation
```bash
git clone https://github.com/omarkamali/vocabulous.git
cd vocabulous
pip install -e .
```
## Quick Start
```python
from vocabulous import Vocabulous
# Initialize model
model = Vocabulous()
# Prepare training data (list of dicts with 'text' and 'lang' keys)
train_data = [
    {'text': 'Hello world', 'lang': 'en'},
    {'text': 'Bonjour le monde', 'lang': 'fr'},
    {'text': 'Hola mundo', 'lang': 'es'},
    # ... more training examples
]
# Evaluation data for monitoring training progress
eval_data = [
    {'text': 'Good morning', 'lang': 'en'},
    {'text': 'Bon matin', 'lang': 'fr'},
    {'text': 'Buenos días', 'lang': 'es'},
]
# Train the model
model, report = model.train(
    train_data,
    eval_data,
    cycles=3,
    base_confidence=0.5,
    confidence_margin=0.3
)
# Use for language detection
scores = model._score_sentence("Hello there")
print(scores) # {'en': 1.0}
# Save the model
model.save('my_model.json')
# Load later
loaded_model = Vocabulous.load('my_model.json')
```
## Methodology
### The Bootstrapping Approach
Vocabulous implements a novel bootstrapping methodology for language detection (one cycle is sketched in code after this list):
1. **Initial Dictionary Building**: Creates word-language frequency dictionaries from all training data
2. **Scoring & Evaluation**: Scores evaluation data to measure current model performance
3. **Data Cleaning**: Removes training samples that contradict the current dictionaries
4. **Iteration**: Repeats the process with cleaned data to progressively improve quality
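A minimal sketch of one such cycle, assuming a whitespace tokenizer and hypothetical helper names (the library's actual internals may differ):

```python
# Toy version of one bootstrapping cycle; illustrative, not the real internals.
def bootstrap_cycle(samples, base_confidence=0.5, confidence_margin=0.3):
    # 1. Build word -> language frequency dictionaries from all samples
    word_lang_freq = {}
    for s in samples:
        for word in s['text'].lower().split():
            langs = word_lang_freq.setdefault(word, {})
            langs[s['lang']] = langs.get(s['lang'], 0) + 1

    def score(text):
        # Normalized per-language scores for a sentence
        totals = {}
        for word in text.lower().split():
            for lang, freq in word_lang_freq.get(word, {}).items():
                totals[lang] = totals.get(lang, 0) + freq
        norm = sum(totals.values()) or 1
        return {lang: f / norm for lang, f in totals.items()}

    # 2./3. Keep only samples whose own label wins clearly
    kept = []
    for s in samples:
        ranked = sorted(score(s['text']).items(), key=lambda kv: -kv[1])
        if not ranked:
            continue
        top_lang, top = ranked[0]
        second = ranked[1][1] if len(ranked) > 1 else 0.0
        if (top_lang == s['lang'] and top >= base_confidence
                and top - second >= confidence_margin):
            kept.append(s)
    return kept  # 4. Feed the kept samples into the next cycle
```

Each cycle shrinks the training set to the samples the current dictionaries can confidently confirm, so the dictionaries rebuilt in the next cycle are cleaner.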
### Why This Works
The approach is based on several key insights (a toy illustration of the first follows this list):
- **Majority Signal**: Even noisy datasets typically contain more correct than incorrect labels
- **Word Uniqueness**: Many words are language-specific and provide strong signals
- **Progressive Refinement**: Each iteration removes the most problematic samples first
- **Convergence**: The process naturally converges when no more samples can be confidently removed
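To make the majority-signal point concrete: even with 10% label noise, the correct language still dominates a word's frequency counts, so the dictionaries point the right way. A toy illustration:

```python
from collections import Counter

# 'bonjour' appears 100 times in the noisy data, 10 of them mislabeled 'en'
labels_for_bonjour = ['fr'] * 90 + ['en'] * 10
print(Counter(labels_for_bonjour).most_common(1))  # [('fr', 90)]
```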
### Training Parameters
- **`cycles`**: Number of training iterations (default: 2)
- **`base_confidence`**: Minimum score threshold for keeping samples (0-1)
- **`confidence_margin`**: Minimum difference between top two language scores (0-1)
Higher values of `base_confidence` and `confidence_margin` make the filtering more aggressive, while lower values are more permissive.
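For intuition on how the two thresholds interact, here is an illustrative check (not necessarily the library's exact rule):

```python
def passes(scores, base_confidence=0.5, confidence_margin=0.3):
    # Keep a sample only if the top score clears the base threshold
    # AND beats the runner-up by the required margin
    ranked = sorted(scores.values(), reverse=True)
    top = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0.0
    return top >= base_confidence and (top - second) >= confidence_margin

print(passes({'en': 0.62, 'fr': 0.30}))  # True: 0.62 >= 0.5, margin 0.32 >= 0.3
print(passes({'en': 0.55, 'fr': 0.45}))  # False: margin 0.10 < 0.3
```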
## Use Cases
### 1. Bootstrapping Language Detection
**Scenario**: You have a large dataset of multilingual text with potentially noisy language labels.
```python
# Start with noisy data
noisy_data = [
    {'text': 'Hello world', 'lang': 'en'},
    {'text': 'Bonjour', 'lang': 'en'},  # Mislabeled!
    {'text': 'Hello', 'lang': 'fr'},    # Mislabeled!
    {'text': 'Comment ça va?', 'lang': 'fr'},
    # ... thousands more with ~10% label noise
]
model = Vocabulous()
model, report = model.train(noisy_data, eval_data, cycles=3)
# The model learns to ignore mislabeled samples
print(f"Dictionary size: {len(model.word_lang_freq)}")
print(f"Final accuracy: {report['cycle_reports'][-1]['accuracy']:.3f}")
```
### 2. Data Cleaning Pipeline
**Scenario**: Clean a noisy multilingual dataset before using it for other NLP tasks.
```python
# Train model on subset of data
model, _ = model.train(sample_data, eval_data)
# Clean the full dataset
cleaned_dataset = model.clean(full_noisy_dataset)
# Now use cleaned_dataset for training other models
print(f"Kept {len(cleaned_dataset)}/{len(full_noisy_dataset)} samples")
```
### 3. Incremental Learning
**Scenario**: Continuously improve language detection as new data becomes available.
```python
# Initial training
model, _ = model.train(initial_data, eval_data)
# Later, integrate new data
model, updated_report = model.train(
    new_data + initial_data,  # Combine old and new
    eval_data,
    cycles=2
)
```
### 4. Cross-Domain Adaptation
**Scenario**: Adapt a model trained on one domain (e.g., news) to another (e.g., social media).
```python
# Train on news data
news_model = Vocabulous()
news_model, _ = news_model.train(news_data, news_eval)
# Adapt to social media by combining datasets
adapted_model = Vocabulous()
adapted_model, _ = adapted_model.train(
    social_media_data + news_data,
    social_media_eval,
    cycles=3,
    base_confidence=0.3  # Lower threshold for noisy social media text
)
```
## Advanced Usage
### Custom Text Preprocessing
```python
# Subclass to customize text cleaning
class CustomVocabulous(Vocabulous):
    def _clean_text(self, text):
        # Add custom preprocessing
        text = super()._clean_text(text)
        # Your custom logic here
        return text
```
### Training Monitoring
```python
model, report = model.train(train_data, eval_data, cycles=5)
# Analyze training progress
for i, cycle_report in enumerate(report['cycle_reports']):
    print(f"Cycle {i+1}:")
    print(f"  Accuracy: {cycle_report['accuracy']:.3f}")
    print(f"  F1 Score: {cycle_report['f1']:.3f}")
    print(f"  Samples removed: {cycle_report['removed_samples']}")
    print(f"  Confidence Margin: {cycle_report['confidence_margin']:.3f}")
```
### Confidence Scoring
```python
# Get detailed scores for a sentence
scores = model._score_sentence("Hello world")
# Returns: {'en': 0.75, 'fr': 0.25}
# For datasets
scored_df = model._score(test_data)
print(scored_df[['text', 'scores', 'lang']])
```
## Performance Tips
### Memory Optimization
```python
# For large datasets, disable training data storage
model = Vocabulous(store_training_data=False)
```
### Speed Optimization
```python
# Use fewer cycles for faster training
model, _ = model.train(data, eval_data, cycles=1)
# Lower confidence margin for less aggressive filtering
model, _ = model.train(data, eval_data, confidence_margin=0.1)
```
### Quality Optimization
```python
# More cycles for higher quality
model, _ = model.train(data, eval_data, cycles=5)
# Higher confidence threshold for cleaner dictionaries
model, _ = model.train(data, eval_data, base_confidence=0.7)
```
## Evaluation Metrics
Vocabulous provides a comprehensive set of evaluation metrics (a toy computation of two of them follows this list):
- **Accuracy**: Overall classification accuracy
- **Precision/Recall/F1**: Per-language and macro-averaged metrics
- **Confusion Score**: Measures how often languages are confused with each other
- **Confidence Margin**: Average difference between top two language scores (higher = more confident)
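A toy computation of accuracy and average confidence margin from per-sample scores; the data shapes here are illustrative assumptions, since the library returns these metrics in the training report:

```python
def summarize(predictions):
    """predictions: list of (scores_dict, true_lang) pairs."""
    correct, margins = 0, []
    for scores, true_lang in predictions:
        ranked = sorted(scores.items(), key=lambda kv: -kv[1])
        top_lang, top = ranked[0]
        second = ranked[1][1] if len(ranked) > 1 else 0.0
        correct += int(top_lang == true_lang)
        margins.append(top - second)
    return {'accuracy': correct / len(predictions),
            'confidence_margin': sum(margins) / len(margins)}

print(summarize([({'en': 0.8, 'fr': 0.2}, 'en'),
                 ({'es': 0.6, 'en': 0.4}, 'es')]))
# {'accuracy': 1.0, 'confidence_margin': 0.4}
```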
## Limitations
1. **Vocabulary-Based**: Works best with languages that have distinct vocabularies
2. **Training Data Size**: Requires sufficient training data for each language
3. **Script Mixing**: May struggle with code-switched text within sentences
4. **Short Text**: Performance degrades on very short texts (1-2 words)
## API Reference
### Core Classes
#### `Vocabulous(store_training_data=False)`
Main class for language detection and training.
**Parameters:**
- `store_training_data` (bool): Whether to store training data internally
#### Methods
##### `train(train_df, eval_df, cycles=2, base_confidence=0.5, confidence_margin=0.5)`
Train the model on provided data.
**Parameters:**
- `train_df`: Training data (list of dicts or DataFrame)
- `eval_df`: Evaluation data (list of dicts or DataFrame)
- `cycles` (int): Number of training cycles
- `base_confidence` (float): Minimum confidence threshold
- `confidence_margin` (float): Minimum score difference threshold
**Returns:**
- `(model, report)`: Updated model and training report
##### `clean(dataset)`
Clean a dataset by filtering confident predictions.
**Parameters:**
- `dataset`: DataFrame with 'text' and 'lang' columns
**Returns:**
- DataFrame with confident predictions only
##### `save(path)` / `load(path)`
Save/load model to/from JSON file.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
### Development Setup
```bash
git clone https://github.com/omarkamali/vocabulous.git
cd vocabulous
pip install -e ".[dev]"
pytest tests/
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
This project is supported by Omneity Labs, a research lab focused on building NLP and generative AI models for low-resource languages and techniques for cultural alignment.
## Citation
If you use Vocabulous in your research, please cite:
```bibtex
@software{vocabulous2025,
title={Vocabulous: Bootstrapping Language Detection from Noisy \& Ambiguous Data},
author={Omar Kamali},
year={2025},
url={https://github.com/omarkamali/vocabulous},
note={Project developed under Omneity Labs}
}
```
## Support
- **Issues**: [GitHub Issues](https://github.com/omarkamali/vocabulous/issues)
- **Discussions**: [GitHub Discussions](https://github.com/omarkamali/vocabulous/discussions)
- **Documentation**: [GitHub README](https://github.com/omarkamali/vocabulous#readme)