diff --git a/.gitignore b/.gitignore
index 2fa087e..2363dc8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -7,4 +7,5 @@ code/project/music/
 code/project/neopixel/
 code/project/radio/
 code/project/speech/
-*lock
\ No newline at end of file
+*lock
+adj-*.wav
\ No newline at end of file
diff --git a/README.md b/README.md
index b539e26..1690ef7 100644
--- a/README.md
+++ b/README.md
@@ -7,7 +7,7 @@ This portfolio repository tracks my regular progress and updates to correspond w
 
-### [`nodebook.md`](./notebook.md)
-Portfolio progress will primarily be noted in this notebook, with the intention that newest entries will appear at the top. Types of entires will be indicated as follows
-* 📝 Assignments
+### [`notebook.md`](./notebook.md)
+Portfolio progress will primarily be noted in this notebook, with the intention that newest entries will appear at the top. Types of entries will be indicated as follows
+* 📝 Objectives
 * 🤔 Lecture reflections
 * ⚙️ Clerical Changes
 
diff --git a/code/adaptive-tone-control/README.md b/code/adaptive-tone-control/README.md
index cc3b0a7..5b2804f 100644
--- a/code/adaptive-tone-control/README.md
+++ b/code/adaptive-tone-control/README.md
@@ -3,23 +3,47 @@ Included here is a simple python script to analyze and an manipulate an audio si
 
 ## Setup
 Potential libraries needed for debian-based gnu+linux
-```
+```bash
 sudo apt-get install libportaudio2
 ```
 
 Install python libraries
-```
-pip install -r requirnments.txt
+```bash
+pip install -r requirements.txt
 ```
 
 ## Run
-```
+```bash
 python3 main.py
+# or
+python3 main.py alien.wav
 ```
 
 ## View Source
 [main.py](./main.py)
 
+The overall flow of this objective is roughly as follows:
+* Load the wave file into memory, handling errors or stereo signals.
+* Figure out how many 1024-sample window frames will fit in the signal. Perform an FFT on one window to get a general idea of the energy levels.
+* Iterate over all of the frames and:
+  * Apply a Hann window to the frame
+  * Calculate the FFT values and frequencies of the frame
+  * Use those to get the three desired band energies
+  * Adjust each band toward the average energy
+  * Reconstruct the frame with an inverse FFT
+* Reconstruct the signal
+* Write the audio
+
+## Included Audio Sources
+[airplane.wav](./airplane.wav): Royalty-free wav from Daniel Simion on [soundbible.com](https://soundbible.com/2219-Airplane-Landing-Airport.html). My first impression of this one is that lower frequencies dominate
+
+[alien.wav](./alien.wav): Royalty-free wav from Daniel Simion on [soundbible.com](https://soundbible.com/2213-Alien-Spaceship-UFO.html). Higher frequencies seem to dominate here
 
 ## Reflections, Results, Analysis
-TODO
\ No newline at end of file
+This objective at first sounded slightly easier than it turned out to be. To be more specific, the biggest challenge was understanding the use of windowing and window size. While all of the other steps can be defined algorithmically, the window size and windowing method are less deterministic and can vary in effectiveness based on the input. The best window is often found only after trying a few methods at different sizes, until the frequencies you are interested in stand out in the result.
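+
+To make this concrete, here is a small sketch of the kind of comparison I mean (an illustration only, not part of `main.py`). It runs a tone that falls between FFT bins through a rectangular and a Hann window, then measures how much of the spectrum leaks away from the peak:
+
+```python
+import numpy as np
+
+sample_rate = 44100
+n = 1024
+t = np.arange(n) / sample_rate
+tone = np.sin(2 * np.pi * 440.5 * t)  # 440.5 Hz does not line up with an FFT bin here
+
+for name, window in [("rectangle", np.ones(n)), ("hann", np.hanning(n))]:
+    spectrum = np.abs(np.fft.rfft(tone * window))
+    peak = spectrum.argmax()
+    # Fraction of the total magnitude that lands outside the peak's immediate neighborhood
+    leak = 1 - spectrum[peak - 2 : peak + 3].sum() / spectrum.sum()
+    print(f"{name:9s} peak bin {peak}, leakage ~{leak:.2%}")
+```
+
+With the rectangular window a much larger share of the magnitude smears outside the peak's neighborhood, while the Hann window keeps the tone concentrated. That is the same effect visible in the Audacity screenshots below.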
+
+_A Hann window of size 1024 vs a rectangular window of size 256, on a small sample of `airplane.wav` in Audacity._
+
+![hann-1024.png](./hann-1024.png)
+
+![rectangle-256.png](./rectangle-256.png)
\ No newline at end of file
diff --git a/code/adaptive-tone-control/airplane.wav b/code/adaptive-tone-control/airplane.wav
new file mode 100644
index 0000000..ff09458
Binary files /dev/null and b/code/adaptive-tone-control/airplane.wav differ
diff --git a/code/adaptive-tone-control/alien.wav b/code/adaptive-tone-control/alien.wav
new file mode 100644
index 0000000..8ff0684
Binary files /dev/null and b/code/adaptive-tone-control/alien.wav differ
diff --git a/code/adaptive-tone-control/hann-1024.png b/code/adaptive-tone-control/hann-1024.png
new file mode 100644
index 0000000..f1d0a0e
Binary files /dev/null and b/code/adaptive-tone-control/hann-1024.png differ
diff --git a/code/adaptive-tone-control/main.py b/code/adaptive-tone-control/main.py
index 388a857..a2e9f0f 100644
--- a/code/adaptive-tone-control/main.py
+++ b/code/adaptive-tone-control/main.py
@@ -4,29 +4,42 @@ import sys
 
 print("Portfolio Object 2: Adaptive Tone Control")
 
-# Contants
-bands = {"low": (0, 300), "mid": (300, 2000), "high": (2000, 20000)}
+# Ingest the wave signal and convert it to mono if needed
+file = "alien.wav"
 
-file = "test.wav"
 if len(sys.argv) > 1:
     file = sys.argv[1]
-print("Input file: ", file)
 
-# Init
+
 sample_rate, wav_signal = wav.read(file)
-# window_size = 1024
-window_size = len(wav_signal)
+if wav_signal.ndim > 1:
+    wav_signal = np.mean(wav_signal, axis=1)
+
+# Constants
+bands = {"low": (0, 300), "mid": (300, 2000), "high": (2000, 20000)}
+window_size = (
+    1024  # This seems like a fine choice, but hard to know one way or the other
+)
+hop_size = window_size // 2  # Overlap each window by 1/2 of the previous one
+num_frames = (
+    len(wav_signal) - window_size
+) // hop_size + 1  # discrete window frames in the signal length
+
+reconstructed_signal = np.zeros(len(wav_signal))
+
+# FFT to get the energy of an arbitrary window
 fft_values = np.fft.fft(wav_signal[:window_size])
 fft_freqs = np.fft.fftfreq(window_size, 1 / sample_rate)
 
 
-def band_energy(band, fft_values):
+def band_energy(band, fft_values, fft_freqs):
     idx_band = np.where((fft_freqs >= band[0]) & (fft_freqs <= band[1]))[0]
     return np.sum(np.abs(fft_values[idx_band]) ** 2)
 
 
-energy_low = band_energy(bands["low"], fft_values)
-energy_mid = band_energy(bands["mid"], fft_values)
-energy_high = band_energy(bands["high"], fft_values)
+# Calculate and display the band energy results for this given window
+energy_low = band_energy(bands["low"], fft_values, fft_freqs)
+energy_mid = band_energy(bands["mid"], fft_values, fft_freqs)
+energy_high = band_energy(bands["high"], fft_values, fft_freqs)
 
 avg_energy = (energy_low + energy_mid + energy_high) / 3
 
@@ -36,29 +49,48 @@ print(f"high {energy_high:.2e}")
 print(f"avg {avg_energy:.2e}")
 
 
+# Adjust the fft_values of all frequencies in the given band by a gain factor.
+# The gain can come out greater or less than 1, boosting or cutting the band.
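+# Band energy is a sum of squared magnitudes, so it scales with the square of the
+# amplitudes; multiplying a band by sqrt(target/current) therefore moves its energy
+# to (approximately) the target. The 1e-6 below guards against division by zero.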
 def adjust(target_energy, current_energy, fft_values, band):
-    idx_band = np.where((fft_freqs >= band[0]) & (fft_freqs <= band[1]))[0]
+    idx_band = np.where((fft_freqs >= band[0]) & (fft_freqs < band[1]))[0]
     gain = np.sqrt(target_energy / (current_energy + 1e-6))
-    adjusted_fft_values = np.copy(fft_values)
-    adjusted_fft_values[idx_band] *= gain
-    return adjusted_fft_values
+    fft_values[idx_band] *= gain
 
 
-adjusted_low = adjust(avg_energy, energy_low, fft_values, bands["low"])
-adjusted_mid = adjust(avg_energy, energy_mid, fft_values, bands["mid"])
-adjusted_high = adjust(avg_energy, energy_high, fft_values, bands["high"])
+# For each window in the sample, calculate and then adjust the low, mid, and high band energies
+for i in range(num_frames):
 
-print(f'adj low {band_energy( bands["low"],adjusted_low):.2e}')
-print(f'adj mid {band_energy(bands["mid"],adjusted_mid):.2e}')
-print(f'adj high {band_energy( bands["high"],adjusted_high):.2e}')
+    # Window bounds and window frame contents
+    start_idx = i * hop_size
+    end_idx = start_idx + window_size
+    frame = wav_signal[start_idx:end_idx] * np.hanning(
+        min(window_size, end_idx - start_idx)
+    )
+
+    # Calculate FFT
+    fft_values = np.fft.fft(frame)
+    fft_freqs = np.fft.fftfreq(window_size, 1 / sample_rate)
+
+    energy_low = band_energy(bands["low"], fft_values, fft_freqs)
+    energy_mid = band_energy(bands["mid"], fft_values, fft_freqs)
+    energy_high = band_energy(bands["high"], fft_values, fft_freqs)
+    avg_energy = (energy_low + energy_mid + energy_high) / 3
+
+    adjust(avg_energy, energy_low, fft_values, bands["low"])
+    adjust(avg_energy, energy_mid, fft_values, bands["mid"])
+    adjust(avg_energy, energy_high, fft_values, bands["high"])
+
+    # The FFT values have now been modified in place, up or down. We can inverse FFT
+    adjusted_frame = np.fft.ifft(fft_values).real
+
+    # Put the signal back together, frame by frame
+    reconstructed_signal[start_idx:end_idx] += adjusted_frame * np.hanning(window_size)
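+    # One caveat worth noting: the Hann window is applied twice (before the FFT and
+    # again after the inverse FFT), and hann^2 does not sum to a constant at 50%
+    # overlap, so a little amplitude ripple in the reconstruction is expected here.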
I'll briefy di * [clap](https://crates.io/crates/clap): In order to not drop any functionality from the original source, I wanted to ensure my application could handle all of the command-line argument as is done in popgen.py. I noticed there, [argparse](https://docs.python.org/3/library/argparse.html) was doing some heaving lifting. I only briefy pondered rolling my own command line parsing, but then I remembered a past experience doing so with much simpler arguments. That made using `clap` a straightforward decision. * [rand](https://crates.io/crates/rand): Picking the sequence of notes requires some randomness * [regex](https://crates.io/crates/regex): The go-to for regex, validating and parsing the note from the command line arguments -* [rodio](https://crates.io/crates/rodio): Back in a portfolio objective rodio, cpal, portaudio-rs. I did a little research, and it seemed that rodio was the most abstract/easy to use, but still allowed me to manually set the sample rate as is necessary -* [hound](https://crates.io/crates/hound): This was recommened on a previous portfolio objective for saving wave files - -### Typing hell -The most tedious aspect of this objective was dealing with the typecasting needed for integer and float arithmetic. I won't discuss strong typing/rust arguments or any of that, but the crux of issues here came down to a few competing philosophies. +* [rodio](https://crates.io/crates/rodio): Back in a previous portfolio objective rodio, cpal, portaudio-rs were for audio playing in rust. I did a little research, and it seemed that `rodio` +s dealing with the typecasting needed for integer and float arithmetic. I won't discuss strong typing/rust arguments or any of that, but the crux of issues here came down to a few competing philosophies. * I want to use the smallest integer units possible to handle the basic parameters. For example, a `u8` should easily contain the `bpm` * Many parameters can be, and often are negative requiring the use of signed integers such as `melody_root`, `position`, etc. On one hand, I did not realize this in time, so some rework was required. On the other hand, there is a desire to capture initial values as unsigned if that seems appropriate, and cast them only as needed for signed arithmetic (Example: `chord_root` into `bass_note`). * I want to use the highest precision possible floating point representation for the wave data samples which go to the buffer, to ensure the most accurate audio. I initially chose `f64` for this before realizing that `rodio` seems to only work with `f32`. Aside from that, all sorts of casting was needed up from the base integer units. @@ -53,7 +50,7 @@ I brought in the provided unit tests for `note_to_key_offset` and `chord_to_note ## Access outputs -Two output files were generated. I must have a slight bug in the `save` function, because these sound of somewhat lower quality. +Two output files were generated. I must have a slight bug in the `save` function, because these sound are of somewhat lower quality. [out0.wav](out0.wav) diff --git a/notebook.md b/notebook.md index 0e1f3ed..d2eced0 100644 --- a/notebook.md +++ b/notebook.md @@ -1,3 +1,6 @@ +### Wednesday 11-Dec-2024 +📝 I reviewed all portfolio objectives for touch-ups like spelling mistakes, adding extra code comments, and double checking work + ### Tuesday 10-Dec-2024 📝 [Project](./code/project/README.md): Continued work