# hum
Generate synthetic signals for ML pipelines
To install: `pip install hum`
# Functionality
This notebook gathers examples of `hum`'s functionality:
- Synthetic datasets
  - Sound-like datasets
  - Diagnosis datasets
  - Signal generation
- Plotting and visualization
  - Plot
  - Display
  - Melspectrograms
- Infinite waveform from spectrum
- Various sample sounds
- Voiced time
```python
from hum import (
    mk_sine_wf,
    freq_based_stationary_wf,
    BinarySound,
    WfGen,
    TimeSound,
    mk_some_buzz_wf,
    wf_with_timed_bleeps,
    Sound,
    plot_wf,
    disp_wf,
    InfiniteWaveform,
    Voicer,
    tell_time_continuously,
    random_samples,
    pure_tone,
    triangular_tone,
    square_tone,
    AnnotatedWaveform,
    gen_words,
    categorical_gen,
    bernoulli_gen,
    create_session,
    session_to_df,
)
import matplotlib.pyplot as plt
from numpy.random import randint
import numpy as np
```
## Synthetic datasets
`hum` can produce several forms of synthetic data for machine learning pipelines. The first is sound-like data, generally in the form of sine waves.
### Sound-like datasets
`mk_sine_wf` provides an easy way to generate a simple sine waveform for synthetic testing.
```python
DFLT_N_SAMPLES = 21 * 2048
DFLT_SR = 44100
wf = mk_sine_wf(freq=5, n_samples=DFLT_N_SAMPLES, sr=DFLT_SR, phase=0, gain=1)
plt.plot(wf);
```

```python
wf = mk_sine_wf(freq=20, n_samples=DFLT_N_SAMPLES, sr=DFLT_SR, phase=0.25, gain=3)
plt.plot(wf);
```
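As a rough mental model of what these parameters mean, the sketch below computes a sine waveform sample by sample. It is not `hum`'s implementation: `sine_wf` is a hypothetical helper, and it assumes `phase` is in radians and `gain` scales the amplitude; check `mk_sine_wf`'s docstring for the actual conventions.

```python
import math


def sine_wf(freq, n_samples, sr, phase=0.0, gain=1.0):
    """Hypothetical helper: gain * sin(2*pi*freq*t + phase), with t = n / sr.

    Assumes `phase` is in radians (an assumption, not necessarily hum's convention).
    """
    return [gain * math.sin(2 * math.pi * freq * n / sr + phase) for n in range(n_samples)]


wf = sine_wf(freq=5, n_samples=21 * 2048, sr=44100)
# 21 * 2048 samples at 44100 Hz last about 0.975 s, so a 5 Hz sine shows just under 5 cycles
print(len(wf), round(len(wf) / 44100, 3))
```

The duration arithmetic also applies to the plots above: with `DFLT_N_SAMPLES = 21 * 2048` and `DFLT_SR = 44100`, each waveform covers just under one second.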

`freq_based_stationary_wf` generates a more complex waveform by mixing sine waves of different frequencies, optionally with different weights.
```python
wf_mix = freq_based_stationary_wf(freqs=(2, 4, 6, 8), weights=None,
                                  n_samples=DFLT_N_SAMPLES, sr=DFLT_SR)
plt.plot(wf_mix);
```

```python
wf_mix = freq_based_stationary_wf(freqs=(2, 4, 6, 8), weights=(3, 3, 1, 1),
                                  n_samples=DFLT_N_SAMPLES, sr=DFLT_SR)
plt.plot(wf_mix);
```
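Conceptually, a frequency-based stationary waveform is a weighted sum of sines. The sketch below illustrates that idea and is not `hum`'s code: `mix_sines` is hypothetical, treats `weights=None` as equal weights, and normalizes the weights to sum to 1 so the mix stays within [-1, 1]; `hum` may scale the result differently.

```python
import math


def mix_sines(freqs, weights=None, n_samples=1024, sr=44100):
    """Hypothetical helper: weighted sum of unit sines, weights normalized to sum to 1."""
    if weights is None:
        weights = [1.0] * len(freqs)  # equal weighting when none are given
    if len(weights) != len(freqs):
        raise ValueError("freqs and weights must have the same length")
    total = float(sum(weights))
    norm = [w / total for w in weights]  # non-negative weights summing to 1 keep |x| <= 1
    return [
        sum(w * math.sin(2 * math.pi * f * n / sr) for f, w in zip(freqs, norm))
        for n in range(n_samples)
    ]


equal = mix_sines((2, 4, 6, 8), n_samples=21 * 2048)
skewed = mix_sines((2, 4, 6, 8), weights=(3, 3, 1, 1), n_samples=21 * 2048)
```

With weights `(3, 3, 1, 1)`, the 2 and 4 components each carry three times the weight of the 6 and 8 components, so the slower oscillations dominate the mix.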

`WfGen` is a class for generating sinusoidal waveforms, lookup tables used to generate waveforms, and frequency-weighted mixed waveforms.
```python
wfgen = WfGen(sr=44100, buf_size_frm=2048, amplitude=0.5)
lookup = np.array(wfgen.mk_lookup_table(freq=880))
wf = wfgen.mk_sine_wf(n_frm=100, freq=880)
```
```python
lookup  # already a 1-D NumPy array, so no conversion or transpose is needed
```
```
array([ 0. , 0.06252526, 0.12406892, 0.1836648 , 0.24037727,
0.293316 , 0.34164989, 0.38462013, 0.42155213, 0.45186607,
0.47508605, 0.49084754, 0.49890309, 0.49912624, 0.49151348,
0.47618432, 0.45337943, 0.42345682, 0.38688626, 0.34424188,
0.29619315, 0.24349441, 0.186973 , 0.12751624, 0.06605758,
0.00356187, -0.05898977, -0.12061531, -0.18034728, -0.23724793,
-0.29042397, -0.33904057, -0.38233448, -0.41962604, -0.45032977,
-0.47396367, -0.4901567 , -0.49865463, -0.49932406, -0.49215447,
-0.47725843, -0.45486979, -0.42534003, -0.38913276, -0.34681639,
-0.29905527, -0.2465992 , -0.19027171, -0.13095709, -0.06958655])
```
```python
plt.plot(wf);
```
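The printed lookup table is consistent with one period of a sine sampled at `sr`, namely `amplitude * sin(2π · freq · k / sr)` for `k = 0, ..., int(sr / freq) - 1`, which gives 50 entries for 880 Hz at 44100 Hz. The sketch below reproduces that table and then generates samples by cycling through it (basic wavetable synthesis). Both helpers are hypothetical, not `WfGen`'s methods, and cycling through a whole number of samples per period shifts the pitch slightly (44100 / 50 = 882 Hz), which `hum` may handle differently.

```python
import math


def mk_lookup_table(freq, sr=44100, amplitude=0.5):
    """Hypothetical helper: one period of a sine, truncated to int(sr / freq) samples."""
    n = int(sr / freq)  # 44100 / 880 = 50.11..., so 50 entries
    return [amplitude * math.sin(2 * math.pi * freq * k / sr) for k in range(n)]


def wavetable_wf(table, n_frm):
    """Hypothetical helper: read the table cyclically to produce n_frm samples."""
    return [table[i % len(table)] for i in range(n_frm)]


lookup = mk_lookup_table(freq=880)
wf = wavetable_wf(lookup, n_frm=100)
print(len(lookup), round(lookup[1], 8))
```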

```python
wf_weight = wfgen.mk_wf_from_freq_weight_array(n_frm=10000, freq_weight_array=(10,1,6))
plt.plot(wf_weight);
```

### Diagnosis datasets
`hum` can also produce diagnosis datasets for machine learning pipelines.

`BinarySound` is a class that generates binary waveforms. In the example below, a random 50-bit word is encoded with `mk_phrase` and then recovered with `decode`.
```python
bs = BinarySound(nbits=50, redundancy=142, repetition=3, header_size_words=1)
utc = randint(0,2,50)
wf = bs.mk_phrase(utc)
plt.plot(wf[:200]);
all(bs.decode(wf) == utc)
```
```
True
```
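The `decode` check above relies on redundancy: each bit is spread over many samples and the phrase is repeated, so the bits can be recovered even from a noisy signal. The following is a minimal sketch of that general idea, not `BinarySound`'s actual encoding: `encode_bits` and `decode_bits` are hypothetical helpers that map bits to ±1 levels held for `redundancy` samples, repeat the phrase `repetition` times, and decode by averaging.

```python
import random


def encode_bits(bits, redundancy=142, repetition=3):
    """Hypothetical encoder: hold each bit as +1/-1 for `redundancy` samples, repeat the phrase."""
    phrase = [1.0 if b else -1.0 for b in bits for _ in range(redundancy)]
    return phrase * repetition


def decode_bits(wf, nbits, redundancy=142, repetition=3):
    """Hypothetical decoder: sum every copy of each bit's samples and take the sign."""
    phrase_len = nbits * redundancy
    decoded = []
    for i in range(nbits):
        total = 0.0
        for r in range(repetition):
            start = r * phrase_len + i * redundancy
            total += sum(wf[start:start + redundancy])
        decoded.append(1 if total > 0 else 0)
    return decoded


rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(50)]
noisy = [x + rng.gauss(0, 2.0) for x in encode_bits(bits)]  # heavy additive Gaussian noise
print(decode_bits(noisy, nbits=50) == bits)
```

Averaging 142 × 3 = 426 samples per bit shrinks the noise standard deviation by a factor of about √426 ≈ 20.6, which is why the sign test survives noise twice as strong as the signal.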

`BinarySound` can also be instantiated from audio parameters with the `for_audio_params` class method.
```python
bs = BinarySound.for_audio_params(nbits=50, freq=6000, chk_size_frm=43008, sr=44100, header_size_words=1)
wf = bs.mk_phrase(utc)
plt.plot(wf[:200]);
all(bs.decode(wf) == utc)
```
```
True
```

When `BinarySound` is instantiated with audio parameters, `mk_utc_phrases` can generate utc phrases.
```python
plt.plot(bs.mk_utc_phrases()[:200]);
```

`TimeSound` is a class that generates timestamped waveform data.
```python
ts = TimeSound(sr=44100, buf_size_frm=2048, amplitude=0.5, n_ums_bits=30)  # avoid shadowing the `time` module
wf = ts.timestamped_wf()
plt.plot(wf[2000:2300]);
```
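The `n_ums_bits=30` parameter suggests that a timestamp is packed into a fixed number of bits before being turned into sound. The sketch below shows only that packing step under this assumption; `int_to_bits` and `bits_to_int` are hypothetical helpers, and the big-endian bit order and the choice to keep the low-order bits are also assumptions. With 30 bits, values wrap around every 2**30 = 1,073,741,824 units.

```python
import time


def int_to_bits(value, n_bits=30):
    """Hypothetical helper: keep the low n_bits of `value`, most significant bit first."""
    value %= 2 ** n_bits  # wrap values that do not fit in n_bits
    return [(value >> shift) & 1 for shift in range(n_bits - 1, -1, -1)]


def bits_to_int(bits):
    """Hypothetical inverse of int_to_bits for a big-endian bit list."""
    out = 0
    for b in bits:
        out = (out << 1) | b
    return out


stamp = int(time.time() * 1e6)  # e.g. microseconds since the epoch
bits = int_to_bits(stamp)
assert bits_to_int(bits) == stamp % 2 ** 30
```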

`mk_some_buzz_wf` and `wf_with_timed_bleeps` are two more options for generating synthetic diagnosis sounds.
```python
wf = mk_some_buzz_wf(sr=DFLT_SR)
plt.plot(wf[:500]);
```

```python
wf = wf_with_timed_bleeps(n_samples=DFLT_SR*2, bleep_loc=400, bleep_spec=100, sr=DFLT_SR)
plt.plot(wf[:150]);
```
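To illustrate the kind of signal `wf_with_timed_bleeps` produces, the sketch below inserts short sine bursts at regular intervals into an otherwise silent waveform. The helper `timed_bleeps` and its parameters `every`, `length`, and `freq` are hypothetical and do not correspond to `bleep_loc` or `bleep_spec`, whose exact meaning is defined by `hum` itself.

```python
import math


def timed_bleeps(n_samples, every, length, freq=1000, sr=44100, amplitude=1.0):
    """Hypothetical helper: silence with a `length`-sample sine burst starting every `every` samples."""
    wf = [0.0] * n_samples
    for start in range(0, n_samples, every):
        for n in range(start, min(start + length, n_samples)):
            wf[n] = amplitude * math.sin(2 * math.pi * freq * (n - start) / sr)
    return wf


wf = timed_bleeps(n_samples=2 * 44100, every=4410, length=441)  # 10 ms bleep every 100 ms
```

Known, regularly spaced events like these make it easy to check that a pipeline preserves timing.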

### Signal generation
`hum` can also create signals generated from sequences of symbols and perturbed by outliers injected at points chosen by an outlier generator (here `bernoulli_gen`).
```python
symb_res = categorical_gen(gen_words)
out_res = bernoulli_gen(p_out=0.01)
df = session_to_df(create_session(symb_res, out_res, alphabet=list('abcde'), session_length=500))
df.plot(subplots=True, figsize=(20,7));
```
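The session above combines two generators: one that draws symbols and one that decides, with probability `p_out`, whether to inject an outlier. The sketch below mimics that structure with the standard library. It is an assumption about the general idea rather than `hum`'s implementation; `make_session` and the outlier representation (a flag next to each symbol) are hypothetical.

```python
import random


def make_session(alphabet, session_length, p_out=0.01, seed=None):
    """Hypothetical helper: list of (symbol, is_outlier) pairs."""
    rng = random.Random(seed)
    session = []
    for _ in range(session_length):
        symbol = rng.choice(alphabet)      # categorical draw, uniform here
        is_outlier = rng.random() < p_out  # Bernoulli(p_out) draw
        session.append((symbol, is_outlier))
    return session


session = make_session(list("abcde"), session_length=500, p_out=0.01, seed=42)
n_outliers = sum(flag for _, flag in session)
```

With `p_out=0.01` and 500 steps, about 500 × 0.01 = 5 outliers are expected per session.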

## Plotting and visualization
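`hum` also provides several plotting and visualization options for the synthetic datasets it generates. The examples below use a frequency sweep built from 2048-sample sine chunks at frequencies 1, 21, 41, ..., 981.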
```python
wfgen = WfGen()
wf = list()
for i in range(1, 1000, 20):
    wf.extend(list(wfgen.mk_sine_wf(n_frm=2048, freq=i)))
wf = np.array(wf)
sr = 44100
```
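For reference, assuming each `mk_sine_wf` call returns `n_frm` samples, the sweep above has 50 chunks of 2048 samples, so `wf` holds 102,400 samples, or about 2.32 s at 44100 Hz. A quick check of that arithmetic:

```python
freqs = list(range(1, 1000, 20))
n_samples = len(freqs) * 2048
duration_s = n_samples / 44100
print(len(freqs), freqs[-1], n_samples, round(duration_s, 2))  # 50 981 102400 2.32
```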
### Plot waveform
```python
plot_wf(wf[:20000], sr)
```

### Display waveform
```python
disp_wf(wf, sr)
```

### Melspectrograms with `Sound`
```python
snd = Sound(wf=wf, sr=sr)
```
```python
snd.plot_wf(wf=wf[:20000], sr=sr)
```

```python
snd.melspectrogram(plot_it=False)
```
```
array([[-63.34856485, -45.14910401, -36.14726097, ..., -80. ,
-73.35788085, -60.58728436],
[-67.99632241, -74.80503122, -80. , ..., -80. ,
-72.1600597 , -60.16803079],
[-80. , -80. , -80. , ..., -80. ,
-72.90050429, -60.90871386],
...,
[-80. , -80. , -80. , ..., -80. ,
-80. , -80. ],
[-80. , -80. , -80. , ..., -80. ,
-80. , -80. ],
[-80. , -80. , -80. , ..., -80. ,
-80. , -80. ]])
```
```python
snd.display()
```
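The melspectrogram values above bottom out at exactly -80, which is consistent with converting power to decibels relative to the maximum and clipping to an 80 dB dynamic range (librosa's `power_to_db(S, ref=np.max, top_db=80.0)` does this, for example). Whether `Sound.melspectrogram` works this way is an assumption. The standard-library sketch below shows that conversion together with one common mel-scale formula, the HTK variant `mel = 2595 * log10(1 + f / 700)`; both helpers are hypothetical.

```python
import math


def hz_to_mel(f):
    """Hypothetical helper: HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def power_to_db(powers, top_db=80.0, amin=1e-10):
    """Hypothetical helper: dB relative to the maximum power, clipped to `top_db` below the peak."""
    ref = max(max(powers), amin)
    db = [10.0 * math.log10(max(p, amin) / ref) for p in powers]
    return [max(d, -top_db) for d in db]


print(round(hz_to_mel(1000.0), 1))
print(power_to_db([1.0, 0.1, 1e-12]))
```

The third power is 120 dB below the peak, so it is clipped to -80, which matches the floor visible in the array above.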

## Infinite waveform from spectrum
`hum` can also create an infinite waveform based on a given spectrum, with an optional noise amplifier.
```python
iwf = InfiniteWaveform(wf)
```
```python
wf = list(iwf.query(0,500000))
```
```python
disp_wf(wf)
```

```python
Sound(wf=wf).display()
```
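`InfiniteWaveform` builds its output from a spectrum. As a much simpler stand-in that only illustrates the query interface, the sketch below exposes an unbounded signal by tiling a finite buffer, so any index range can be requested. This is a different technique from `hum`'s: tiling repeats the buffer exactly, with no spectral resynthesis. `TiledWaveform` is a hypothetical class.

```python
class TiledWaveform:
    """Hypothetical stand-in: an endless signal made by repeating a finite buffer."""

    def __init__(self, wf):
        if not wf:
            raise ValueError("wf must be non-empty")
        self.wf = list(wf)

    def query(self, start, stop):
        """Return samples for indices start (inclusive) to stop (exclusive)."""
        n = len(self.wf)
        return [self.wf[i % n] for i in range(start, stop)]


iwf = TiledWaveform([0, 1, 2, 3])
print(iwf.query(2, 9))
```

Plain tiling produces audible seams when the buffer's end does not join smoothly with its start, which is one reason to resynthesize from a spectrum instead.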

## Sample sounds
`hum` also provides several functions for generating the sample sounds shown below.
### Random sample
```python
wf = random_samples(chk_size_frm=21*2048, max_amplitude=30000)
disp_wf(wf=wf, sr=sr)
Sound(wf=wf).display()
```


### Pure tone sample
```python
wf = pure_tone(chk_size_frm=21*2048, freq=440, sr=44100, max_amplitude=30000)
disp_wf(wf=wf, sr=sr)
Sound(wf=wf).display()
```


### Triangular tone sample
```python
wf = triangular_tone(chk_size_frm=21*2048, freq=440, sr=44100, max_amplitude=30000)
disp_wf(wf=wf, sr=sr)
Sound(wf=wf).display()
```


### Square tone sample
```python
wf = square_tone(chk_size_frm=21*2048, freq=440, sr=44100, max_amplitude=30000)
disp_wf(wf=wf, sr=sr)
Sound(wf=wf).display()
```
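For intuition about the three tone shapes, the sketch below generates pure (sine), triangular, and square tones scaled to `max_amplitude` and stored as signed 16-bit integers. It is a generic textbook construction, not `hum`'s code, so its sample values will not match the outputs shown in this notebook; `tone` is a hypothetical helper.

```python
import math
from array import array


def tone(kind, n_samples, freq=440, sr=44100, max_amplitude=30000):
    """Hypothetical helper: 'pure', 'triangular', or 'square' tone as int16 samples."""
    out = array("h")  # signed 16-bit integers, like dtype=int16
    for n in range(n_samples):
        cycle = (freq * n / sr) % 1.0  # position within the current period, in [0, 1)
        if kind == "pure":
            x = math.sin(2 * math.pi * cycle)
        elif kind == "triangular":
            x = 4 * abs(cycle - 0.5) - 1  # ramps from 1 down to -1 and back up
        elif kind == "square":
            x = 1.0 if cycle < 0.5 else -1.0
        else:
            raise ValueError(f"unknown tone kind: {kind!r}")
        out.append(round(max_amplitude * x))
    return out


chunk = 21 * 2048
tones = {kind: tone(kind, chunk) for kind in ("pure", "triangular", "square")}
```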


### Annotated Waveform
```python
awf = AnnotatedWaveform(chk_size_frm=21*2048, freq=440, sr=44100, max_amplitude=30000)
gen = awf.chk_and_tag_gen()
list(gen)
```
```
[(array([-14025, 11555, 22270, ..., 10243, -18225, 3874], dtype=int16),
  'random'),
 (array([ 0, 1902, 3797, ..., 9361, 11149, 12893], dtype=int16),
  'pure_tone'),
 (array([-30000, -29900, -29800, ..., 10500, 10600, 10700], dtype=int16),
  'triangular_tone'),
 (array([30000, 30000, 30000, ..., 30000, 30000, 30000], dtype=int16),
  'square_tone')]
```
```python
awf.get_wf_and_annots()
```
```
(array([ 5183, 10421, -21645, ..., 30000, 30000, 30000], dtype=int16),
 {'random': [(0, 43008)],
  'pure_tone': [(43008, 86016)],
  'triangular_tone': [(86016, 129024)],
  'square_tone': [(129024, 172032)]})
```
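The annotations returned by `get_wf_and_annots` are consecutive `(start, end)` sample intervals, one per chunk of `chk_size_frm = 21 * 2048 = 43008` samples. The sketch below reproduces that bookkeeping with plain lists; `concat_with_annots` is a hypothetical helper, not part of `hum`.

```python
def concat_with_annots(chunks_and_tags):
    """Hypothetical helper: concatenate (chunk, tag) pairs and record where each tag lives."""
    wf, annots, start = [], {}, 0
    for chunk, tag in chunks_and_tags:
        end = start + len(chunk)
        wf.extend(chunk)
        annots.setdefault(tag, []).append((start, end))  # half-open interval [start, end)
        start = end
    return wf, annots


chk = 21 * 2048
tags = ["random", "pure_tone", "triangular_tone", "square_tone"]
wf, annots = concat_with_annots([([0] * chk, tag) for tag in tags])
print(annots)
```

Storing a list of intervals per tag means a tag that appears in several chunks simply accumulates more intervals.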
## Voiced time
Finally, `hum` provides `tell_time_continuously`, which announces the time at regular intervals, with parameters controlling the frequency, speed, voice, volume, and time format of the announcements.
```python
tell_time_continuously(every_secs=5, verbose=True)
```
```
15 45 11
15 45 16
15 45 21
15 45 26
KeyboardInterrupt!!!
```
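The verbose output prints the time as `HH MM SS` every `every_secs` seconds. The sketch below produces the same sequence of strings for a fixed start time without sleeping or speaking; `time_announcements` is a hypothetical helper, and the speech synthesis in `tell_time_continuously` is not reproduced.

```python
from datetime import datetime, timedelta


def time_announcements(start, every_secs=5, count=4):
    """Hypothetical helper: the 'HH MM SS' strings a clock would announce."""
    return [
        (start + timedelta(seconds=i * every_secs)).strftime("%H %M %S")
        for i in range(count)
    ]


print(time_announcements(datetime(2024, 1, 1, 15, 45, 11)))
```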
## Raw data

```json
{
  "_id": null,
  "home_page": "https://github.com/thorwhalen/hum",
  "name": "hum",
  "maintainer": null,
  "docs_url": null,
  "requires_python": null,
  "maintainer_email": null,
  "keywords": "data, signal processing, audio",
  "author": null,
  "author_email": null,
  "download_url": "https://files.pythonhosted.org/packages/b0/1b/9ad601356a2e8c2f753995e47fbeb11759893b9ec7c4396fdbc7d9a0d421/hum-0.1.33.tar.gz",
  "platform": "any",
  "bugtrack_url": null,
  "license": "mit",
  "summary": "Generate synthetic signals for ML pipelines",
  "version": "0.1.33",
  "project_urls": {
    "Homepage": "https://github.com/thorwhalen/hum"
  },
  "split_keywords": [
    "data",
    " signal processing",
    " audio"
  ],
  "urls": [
    {
      "comment_text": "",
      "digests": {
        "blake2b_256": "7fd072157c418e027111ce0a77fa41fda94bb8de182ffb36ae0dac59af55583f",
        "md5": "214ef6ead30540126b201b22d6904b12",
        "sha256": "31126ed595f8e6a6a734cee940ec893a624a3c02af28ef455e2640b30d32a2e7"
      },
      "downloads": -1,
      "filename": "hum-0.1.33-py3-none-any.whl",
      "has_sig": false,
      "md5_digest": "214ef6ead30540126b201b22d6904b12",
      "packagetype": "bdist_wheel",
      "python_version": "py3",
      "requires_python": null,
      "size": 45971,
      "upload_time": "2024-10-28T10:48:07",
      "upload_time_iso_8601": "2024-10-28T10:48:07.519061Z",
      "url": "https://files.pythonhosted.org/packages/7f/d0/72157c418e027111ce0a77fa41fda94bb8de182ffb36ae0dac59af55583f/hum-0.1.33-py3-none-any.whl",
      "yanked": false,
      "yanked_reason": null
    },
    {
      "comment_text": "",
      "digests": {
        "blake2b_256": "b01b9ad601356a2e8c2f753995e47fbeb11759893b9ec7c4396fdbc7d9a0d421",
        "md5": "fb80dfc674e427778355a40af13122c4",
        "sha256": "868106bd04d500a4febc0995a1e5296c344816839566af81bcfccdc22763fe6c"
      },
      "downloads": -1,
      "filename": "hum-0.1.33.tar.gz",
      "has_sig": false,
      "md5_digest": "fb80dfc674e427778355a40af13122c4",
      "packagetype": "sdist",
      "python_version": "source",
      "requires_python": null,
      "size": 41905,
      "upload_time": "2024-10-28T10:48:09",
      "upload_time_iso_8601": "2024-10-28T10:48:09.209115Z",
      "url": "https://files.pythonhosted.org/packages/b0/1b/9ad601356a2e8c2f753995e47fbeb11759893b9ec7c4396fdbc7d9a0d421/hum-0.1.33.tar.gz",
      "yanked": false,
      "yanked_reason": null
    }
  ],
  "upload_time": "2024-10-28 10:48:09",
  "github": true,
  "gitlab": false,
  "bitbucket": false,
  "codeberg": false,
  "github_user": "thorwhalen",
  "github_project": "hum",
  "travis_ci": false,
  "coveralls": false,
  "github_actions": true,
  "lcname": "hum"
}
```