# VoxVersa: Few Shot Language Agnostic Keyword Spotting (FSLAKWS) System
## Overview
**VoxVersa** is an advanced system designed to efficiently detect and classify keywords across multiple languages using few training samples per keyword. The system leverages cutting-edge meta-learning techniques and audio signal processing to create a flexible, scalable, and adaptable keyword spotting model that works across diverse linguistic environments.
The system processes audio at sample rates from 8 kHz to 48 kHz and can quickly learn new keywords and adapt to different audio conditions, making it well suited to applications such as voice-controlled technologies and multilingual customer service.
## Features
- **Few-Shot Learning**: Efficient detection and classification of keywords using very few training samples.
- **Language Agnostic**: Capable of handling keywords in multiple languages without requiring extensive language-specific training data.
- **Audio Flexibility**: Processes audio at multiple sample rates (8 kHz to 48 kHz); a resampling sketch follows this list.
- **Meta-Learning**: Uses model-agnostic meta-learning techniques for rapid adaptation to new keywords and environments.
- **On-Device Processing**: Enhances user privacy and security by enabling on-device processing.
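Depending on the encoder you load, clips recorded at different rates may need to be brought to a common rate first. Below is a minimal sketch (not part of this package) using `torchaudio`; the 16 kHz target rate is an assumption, so substitute whatever rate your model actually expects.

```python
# Minimal resampling sketch. Assumption: torchaudio is installed and the
# encoder expects 16 kHz input; adjust target_sr to your model's real rate.
import torchaudio
import torchaudio.functional as F

def load_resampled(path, target_sr=16000):
    waveform, sr = torchaudio.load(path)   # waveform shape: (channels, samples)
    if sr != target_sr:
        waveform = F.resample(waveform, orig_freq=sr, new_freq=target_sr)
    return waveform, target_sr
```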
## Technologies Used
- **Programming Language**: Python
- **Framework**: PyTorch
## Installation
To set up the environment for **VoxVersa**, follow the steps below:
1. **Clone the repository**:
```bash
git clone https://github.com/Kou-shik2004/SIH-2024.git
cd SIH-2024
```
2. **Install dependencies**:
Create a virtual environment and install the required Python packages:
```bash
python3 -m venv venv
source venv/bin/activate # For Windows: venv\Scripts\activate
pip install -r requirements.txt
```
3. **Install the package**:
```bash
python setup.py install
```
## Usage
Once the environment is set up, you can start training the model on your dataset or testing it on new audio samples.
### 1. Training the Model
To train the model using a custom dataset, use the following command:
```bash
python test_model.py
```
### 2. Inference
To run inference with the model:
```bash
python inference.py
```
## Customizing for Your Own Few-Shot Data
To train the model on your own few-shot data and use it for inference, you'll need to make changes to the `test_model.py` and `inference.py` files. Here are specific instructions based on the current implementation:
### Modifying `test_model.py`:
1. Update the support set:
- Replace the file paths in `support_examples` with your own audio files.
- Update the `classes` list with your own keyword classes.
- Adjust the `int_indices` if necessary (one way to derive them is sketched below).
```python
# Paths to your support audio clips (one or more per keyword)
support_examples = ["./your_clips/keyword1.wav", "./your_clips/keyword2.wav", ...]
# Keyword class names
classes = ["keyword1", "keyword2", ...]
# Integer label of each support example, indexing into `classes`
int_indices = [0, 1, 2, ...]
```
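If writing `int_indices` by hand is error-prone, one option (purely illustrative, not part of the repository) is to derive them from a per-clip keyword list:

```python
# Hypothetical helper: derive integer labels from the keyword spoken in each clip.
clip_keywords = ["keyword1", "keyword2", "keyword1"]  # keyword for each support clip, in order
classes = sorted(set(clip_keywords))
int_indices = [classes.index(k) for k in clip_keywords]
# e.g. classes == ["keyword1", "keyword2"], int_indices == [0, 1, 0]
```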
2. Modify the model loading if needed:
- Change the `encoder_name` or `language` parameters to match your use case.
```python
fws_model = model.load(encoder_name="your_encoder", language="your_language", device="cpu")
```
3. Adjust audio processing parameters if necessary:
- Modify `sample_rate` and `frames_per_buffer` to match your audio data; a capture sketch follows.
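The `frames_per_buffer` name suggests the script reads audio through PyAudio. If that matches your setup, a minimal capture sketch looks like the following; both parameter values here are assumptions and should match what your model and hardware expect.

```python
# Sketch of a PyAudio input stream; sample_rate and frames_per_buffer are
# assumed values, not taken from the repository.
import pyaudio

sample_rate = 16000
frames_per_buffer = 1024

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16,
                 channels=1,
                 rate=sample_rate,
                 input=True,
                 frames_per_buffer=frames_per_buffer)
chunk = stream.read(frames_per_buffer)  # raw 16-bit PCM bytes for one buffer
stream.stop_stream()
stream.close()
pa.terminate()
```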
### Modifying `inference.py`:
1. Update the support set:
- Replace the file paths in `support["paths"]` with your own audio files.
- Update the `support["classes"]` list with your own keyword classes.
- Adjust the `support["labels"]` tensor if necessary.
```python
import torch

support = {
    "paths": ["./your_clips/keyword1.wav", "./your_clips/keyword2.wav", ...],
    "labels": torch.tensor([0, 1, 2, ...]),
    "classes": ["keyword1", "keyword2", ...],
}
```
2. Modify the model loading if needed:
- Change the `encoder_name` or `language` parameters to match your use case.
```python
fws_model = model.load(encoder_name="your_encoder", language="your_language", device="cpu")
```
3. Adjust the query processing:
- If you're using different test clips, update the paths in the `query` dictionary (or collect them automatically, as sketched after the code block).
```python
query = {
    "paths": ["./your_test_clips/query1.wav", "./your_test_clips/query2.wav"]
}
```
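If you have a whole folder of test clips, you can also build the `query` dictionary programmatically (illustrative only; the folder name is an assumption):

```python
# Collect every .wav file in a test folder instead of listing paths by hand.
from glob import glob

query = {"paths": sorted(glob("./your_test_clips/*.wav"))}
```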
4. Fine-tune the inference process:
- You may need to adjust the audio processing parameters or prediction threshold based on your specific use case; a thresholding sketch follows.
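How the scores come back depends on the model's inference call, which is not shown here. Assuming you end up with a `(num_queries, num_classes)` tensor of logits, a simple thresholded decoding could look like this (the function name and the 0.7 threshold are illustrative, not part of the package):

```python
import torch
import torch.nn.functional as F

def decode_predictions(logits, classes, threshold=0.7):
    """Map per-class scores to keyword names, rejecting low-confidence queries."""
    probs = F.softmax(logits, dim=-1)          # (num_queries, num_classes)
    confidences, indices = probs.max(dim=-1)
    return [
        classes[i] if c >= threshold else "unknown"
        for c, i in zip(confidences.tolist(), indices.tolist())
    ]
```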
Remember to thoroughly test your modifications to ensure they work correctly with your specific dataset and use case. You may also need to update the `requirements.txt` file if you introduce any new dependencies.
## Running the Customized Model
After making the necessary modifications:
1. To train and test the model:
```bash
python test_model.py
```
2. To run inference:
```bash
python inference.py
```
Make sure you have the required audio files in the correct directories before running these scripts.