# Transcribes audio files
## pip install audiotranser
#### Tested against Windows 10 / Python 3.10 / Anaconda
Uses the models from https://huggingface.co/ggerganov/whisper.cpp/tree/main
```python
# Args:
#     inputfile: path to the input audio file
#     small_large: model size ("small" or "large")
#     blas: use a BLAS library for faster decoding
#     silence_threshold: silence threshold (documented as milliseconds; the example value -30 suggests a level in dB); ignored if 0 or None
#     min_silence_len: minimum silence length in milliseconds
#     keep_silence: minimum silence length to keep after silence removal
#     threads: number of threads to use
#     processors: number of processors to use
#     offset_t: time offset in milliseconds
#     offset_n: segment index offset
#     duration: duration of audio to process in milliseconds
#     max_context: maximum number of text context tokens to store
#     max_len: maximum segment length in characters
#     best_of: number of best candidates to keep
#     beam_size: beam size for beam search
#     word_thold: word timestamp probability threshold
#     entropy_thold: entropy threshold for decoder failure
#     logprob_thold: log probability threshold for decoder failure
#     speed_up: speed up audio by x2 (reduced accuracy)
#     translate: translate from the source language to English
#     diarize: stereo audio diarization
#     language: spoken language ("auto" to auto-detect)
# Returns:
#     A pandas DataFrame with the inference results, or the path to the output CSV file if pd.read_csv fails.
from audiotranser import transcribe_audio
df = transcribe_audio(
    inputfile=r"C:\untitled.wav",
    small_large="large",
    blas=True,
    silence_threshold=-30,  # ignored if == 0 or None
    min_silence_len=500,  # ignored if silence_threshold == 0 or None
    keep_silence=1000,  # ignored if silence_threshold == 0 or None
    threads=3,  # number of threads to use during computation
    processors=1,  # number of processors to use during computation
    offset_t=0,  # time offset in milliseconds
    offset_n=0,  # segment index offset
    duration=0,  # duration of audio to process in milliseconds
    max_context=-1,  # maximum number of text context tokens to store
    max_len=0,  # maximum segment length in characters
    best_of=5,  # number of best candidates to keep
    beam_size=-1,  # beam size for beam search
    word_thold=0.01,  # word timestamp probability threshold
    entropy_thold=2.40,  # entropy threshold for decoder failure
    logprob_thold=-1.00,  # log probability threshold for decoder failure
    speed_up=True,  # speed up audio by x2 (reduced accuracy)
    translate=False,  # translate from the source language to English
    diarize=False,  # stereo audio diarization
    language="en",  # spoken language ("auto" to auto-detect)
)
print(df)
```
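Because `transcribe_audio` is documented to return either a DataFrame or, if `pd.read_csv` fails, the path to the output CSV file, calling code may want to handle both cases. The following is a minimal sketch, not part of the package: `result_to_records` is a hypothetical helper, and the CSV encoding and delimiter it uses are assumptions, since the README does not describe the file layout.

```python
import csv
import os


def result_to_records(result):
    """Return transcription rows as a list of dicts (hypothetical helper).

    `result` is whatever transcribe_audio returned: a DataFrame, or a path
    to the CSV file when the package could not load it with pd.read_csv.
    """
    if isinstance(result, (str, os.PathLike)):
        # Fallback case: read the CSV with the standard library.
        # Assumption: UTF-8 text with a comma delimiter and a header row.
        with open(result, newline="", encoding="utf-8", errors="replace") as handle:
            return list(csv.DictReader(handle))
    # DataFrame case: to_dict("records") yields one dict per row.
    return result.to_dict("records")
```

With this helper, `rows = result_to_records(df)` gives a uniform list of row dictionaries regardless of which form the call returned. If the file could not be parsed by pandas, the standard-library reader may also produce rough output, so inspecting the first few rows is advisable.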
Raw data
{
"_id": null,
"home_page": "https://github.com/hansalemaos/audiotranser",
"name": "audiotranser",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "audio,Transcribe",
"author": "Johannes Fischer",
"author_email": "aulasparticularesdealemaosp@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/29/95/9de2149217988d09d7836d3aa2f695f02ae85df1e003e671361083fece2b/audiotranser-0.10.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Transcribes audio files",
"version": "0.10",
"project_urls": {
"Homepage": "https://github.com/hansalemaos/audiotranser"
},
"split_keywords": [
"audio",
"transcribe"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "380f8c0a2ec09dd91caf445112310574ddd0e217598ccff9a166ff4c66ed37e1",
"md5": "36dbed10f37d5af710339c97600a26c9",
"sha256": "5e6d51355d5086f44ce3f2b0a43faffb7af58a2c3de06a960d21a6862dd7d765"
},
"downloads": -1,
"filename": "audiotranser-0.10-py3-none-any.whl",
"has_sig": false,
"md5_digest": "36dbed10f37d5af710339c97600a26c9",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 14182474,
"upload_time": "2023-08-06T08:39:37",
"upload_time_iso_8601": "2023-08-06T08:39:37.692337Z",
"url": "https://files.pythonhosted.org/packages/38/0f/8c0a2ec09dd91caf445112310574ddd0e217598ccff9a166ff4c66ed37e1/audiotranser-0.10-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "29959de2149217988d09d7836d3aa2f695f02ae85df1e003e671361083fece2b",
"md5": "258564d7a0a32b48ab05b29dd42089d3",
"sha256": "f60c1f2b32d281365efbcb1ee8a01f2788eb61f6b4e004e6ffb659952a2b4253"
},
"downloads": -1,
"filename": "audiotranser-0.10.tar.gz",
"has_sig": false,
"md5_digest": "258564d7a0a32b48ab05b29dd42089d3",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 14035543,
"upload_time": "2023-08-06T08:39:51",
"upload_time_iso_8601": "2023-08-06T08:39:51.735011Z",
"url": "https://files.pythonhosted.org/packages/29/95/9de2149217988d09d7836d3aa2f695f02ae85df1e003e671361083fece2b/audiotranser-0.10.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-06 08:39:51",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hansalemaos",
"github_project": "audiotranser",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "audiotranser"
}