touch ups

David Westgate 2024-12-11 09:54:49 -08:00
parent 757cce5f30
commit fba6ccacdb
11 changed files with 97 additions and 37 deletions

.gitignore

@@ -8,3 +8,4 @@ code/project/neopixel/
code/project/radio/
code/project/speech/
*lock
adj-*.wav


@@ -7,7 +7,7 @@ This portfolio repository tracks my regular progress and updates to correspond w
### [`notebook.md`](./notebook.md)
Portfolio progress will primarily be noted in this notebook, with the intention that the newest entries will appear at the top. Types of entries will be indicated as follows:
* 📝 Assignments
* 📝 Objectives
* 🤔 Lecture reflections
* ⚙️ Clerical Changes


@@ -3,23 +3,47 @@ Included here is a simple Python script to analyze and manipulate an audio signal
## Setup
Potential libraries needed for Debian-based GNU+Linux:
```
```bash
sudo apt-get install libportaudio2
```
Install python libraries
```
```bash
pip install -r requirnments.txt
```
## Run
```
```bash
python3 main.py
# or
python3 main.py alien.wav
```
## View Source
[main.py](./main.py)
The overall flow of this objective's work is roughly as follows (a condensed sketch appears after this list):
* Load the wave file into memory, handling errors or stereo signals.
* Figure out how many 1024-sample window frames will fit in the signal, and perform an FFT on one window to get a general idea of the energy level.
* Iterate over all of the frames and:
  * Apply a Hann window to the frame
  * Calculate the FFT values and frequencies of the frame
  * Use those to get the three desired band energies (low, mid, and high)
  * Perform adjustments on each frame
* Reconstruct each frame with the inverse FFT
* Reconstruct the signal
* Write the audio
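Pieced together from [main.py](./main.py), a condensed, illustrative sketch of that flow might look like the following (file names are placeholders; the band and window constants are the ones used in the script):
```python
import numpy as np
import scipy.io.wavfile as wav

sample_rate, signal = wav.read("alien.wav")  # placeholder input
if signal.ndim > 1:
    signal = np.mean(signal, axis=1)         # collapse stereo to mono

bands = {"low": (0, 300), "mid": (300, 2000), "high": (2000, 20000)}
window_size, hop_size = 1024, 512            # windows overlap by half
window = np.hanning(window_size)
num_frames = (len(signal) - window_size) // hop_size + 1
out = np.zeros(len(signal))

for i in range(num_frames):
    start = i * hop_size
    frame = signal[start:start + window_size] * window
    spectrum = np.fft.fft(frame)
    freqs = np.fft.fftfreq(window_size, 1 / sample_rate)
    # Energy per band, then a gain that pulls each band toward the average
    energies = {name: np.sum(np.abs(spectrum[(freqs >= lo) & (freqs < hi)]) ** 2)
                for name, (lo, hi) in bands.items()}
    target = sum(energies.values()) / 3
    for name, (lo, hi) in bands.items():
        idx = (freqs >= lo) & (freqs < hi)
        spectrum[idx] *= np.sqrt(target / (energies[name] + 1e-6))
    out[start:start + window_size] += np.fft.ifft(spectrum).real * window

# Normalize to the 16-bit range and write the result
out = np.int16(out / np.max(np.abs(out)) * np.iinfo(np.int16).max)
wav.write("adj-alien.wav", sample_rate, out)
```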
## Included Audio Sources
[airplane.wav](./airplane.wav): Royalty-free wav from Daniel Simion on [soundbible.com](https://soundbible.com/2219-Airplane-Landing-Airport.html). My first impression of this one is that lower frequencies dominate
[alien.wav](./alien.wav): Royalty-free wav from Daniel Simion on [soundbible.com](https://soundbible.com/2213-Alien-Spaceship-UFO.html). Higher frequencies seem to dominate here
## Reflections, Results, Analysis
TODO
This objective at first sounded slightly easier than it turned out to be. To be more specific, the biggest challenge was understanding the use of windowing and window size. While all of the other steps can be defined as an algorithm, the window size and windowing method are less deterministic and can vary in effectiveness based on the input. The best window is often found after trying a few with different sizes, until the frequencies you are interested in show up clearly in the result.
_A Hann window of size 1024 vs. a rectangular window of size 256, on a small sample of `airplane.wav` in Audacity._
![hann-1024.png](./hann-1024.png)
![rectangle-256.png](./rectangle-256.png)
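To reproduce that comparison outside of Audacity, one could plot the two spectra directly. A minimal sketch, assuming NumPy and Matplotlib are installed (Matplotlib is not in the requirements file):
```python
import numpy as np
import scipy.io.wavfile as wav
import matplotlib.pyplot as plt

sample_rate, signal = wav.read("airplane.wav")
if signal.ndim > 1:
    signal = np.mean(signal, axis=1)

# Hann window of 1024 samples vs. rectangular window of 256 samples,
# both taken from the start of the file.
for size, win, label in [(1024, np.hanning(1024), "Hann 1024"),
                         (256, np.ones(256), "Rectangle 256")]:
    frame = signal[:size] * win
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(size, 1 / sample_rate)
    plt.semilogy(freqs, spectrum + 1e-9, label=label)

plt.xlabel("Frequency (Hz)")
plt.ylabel("Magnitude")
plt.legend()
plt.show()
```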



@@ -4,29 +4,42 @@ import sys
print("Portfolio Object 2: Adaptive Tone Control")
# Constants
bands = {"low": (0, 300), "mid": (300, 2000), "high": (2000, 20000)}
# Ingest the wave signal, and convert to mono if needed
file = "alien.wav"
file = "test.wav"
if len(sys.argv) > 1:
file = sys.argv[1]
print("Input file: ", file)
# Init
sample_rate, wav_signal = wav.read(file)
# window_size = 1024
window_size = len(wav_signal)
if wav_signal.ndim > 1:
wav_signal = np.mean(wav_signal, axis=1)
# Constants
bands = {"low": (0, 300), "mid": (300, 2000), "high": (2000, 20000)}
window_size = (
1024 # This seems like a fine choice, but hard to know one way or the other
)
hop_size = window_size // 2 # Overlap window by 1/2 of previous
num_frames = (
len(wav_signal) - window_size
) // hop_size + 1 # discrete window frames in the signal length
reconstructed_signal = np.zeros(len(wav_signal))
# FFT to get energy at an arbitrary window
fft_values = np.fft.fft(wav_signal[:window_size])
fft_freqs = np.fft.fftfreq(window_size, 1 / sample_rate)
def band_energy(band, fft_values):
def band_energy(band, fft_values, fft_freqs):
idx_band = np.where((fft_freqs >= band[0]) & (fft_freqs <= band[1]))[0]
return np.sum(np.abs(fft_values[idx_band]) ** 2)
energy_low = band_energy(bands["low"], fft_values)
energy_mid = band_energy(bands["mid"], fft_values)
energy_high = band_energy(bands["high"], fft_values)
# Calculate and display the band energy results for this given window
energy_low = band_energy(bands["low"], fft_values, fft_freqs)
energy_mid = band_energy(bands["mid"], fft_values, fft_freqs)
energy_high = band_energy(bands["high"], fft_values, fft_freqs)
avg_energy = (energy_low + energy_mid + energy_high) / 3
@@ -36,29 +49,48 @@ print(f"high {energy_high:.2e}")
print(f"avg {avg_energy:.2e}")
# Adjust the fft_value of all frequencies in the given band by a factor of the gain. This could be > 0 or < 0
def adjust(target_energy, current_energy, fft_values, band):
idx_band = np.where((fft_freqs >= band[0]) & (fft_freqs <= band[1]))[0]
idx_band = np.where((fft_freqs >= band[0]) & (fft_freqs < band[1]))[0]
gain = np.sqrt(target_energy / (current_energy + 1e-6))
adjusted_fft_values = np.copy(fft_values)
adjusted_fft_values[idx_band] *= gain
return adjusted_fft_values
fft_values[idx_band] *= gain
adjusted_low = adjust(avg_energy, energy_low, fft_values, bands["low"])
adjusted_mid = adjust(avg_energy, energy_mid, fft_values, bands["mid"])
adjusted_high = adjust(avg_energy, energy_high, fft_values, bands["high"])
# For each window in the sample, we need to calculate, then adjust the low, mid, and high band energies
for i in range(num_frames):
print(f'adj low {band_energy( bands["low"],adjusted_low):.2e}')
print(f'adj mid {band_energy(bands["mid"],adjusted_mid):.2e}')
print(f'adj high {band_energy( bands["high"],adjusted_high):.2e}')
# Window bounds and window frame contents
start_idx = i * hop_size
end_idx = start_idx + window_size
frame = wav_signal[start_idx:end_idx] * np.hanning(
min(window_size, end_idx - start_idx)
)
# Calculate FFT
fft_values = np.fft.fft(frame)
fft_freqs = np.fft.fftfreq(window_size, 1 / sample_rate)
energy_low = band_energy(bands["low"], fft_values, fft_freqs)
energy_mid = band_energy(bands["mid"], fft_values, fft_freqs)
energy_high = band_energy(bands["high"], fft_values, fft_freqs)
avg_energy = (energy_low + energy_mid + energy_high) / 3
adjust(avg_energy, energy_low, fft_values, bands["low"])
adjust(avg_energy, energy_mid, fft_values, bands["mid"])
adjust(avg_energy, energy_high, fft_values, bands["high"])
# Now, FFT values have been modified in place, up or down. We can inverse FFT
adjusted_frame = np.fft.ifft(fft_values).real
# Put the signal back together, frame by frame
reconstructed_signal[start_idx:end_idx] += adjusted_frame * np.hanning(window_size)
adjusted_all = adjusted_low + adjusted_mid + adjusted_high
full_spectrum = np.concatenate([adjusted_all])
reconstructed_signal = np.fft.ifft(adjusted_all).real
reconstructed_signal = np.int16(
reconstructed_signal / np.max(np.abs(reconstructed_signal)) * np.iinfo(np.int16).max
)
assert len(wav_signal) == len(reconstructed_signal) # Sanity check
output_file = "adj-" + file
wav.write(output_file, sample_rate, reconstructed_signal)
print(f"Adjusted audio written to {output_file}")
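One detail worth noting from the `adjust` function: band energy is a sum of squared magnitudes, so scaling the FFT bins by `sqrt(target / current)` scales the band's energy by exactly `target / current`. A quick self-contained check of that identity (NumPy only, synthetic data):
```python
import numpy as np

rng = np.random.default_rng(0)
spectrum = rng.normal(size=64) + 1j * rng.normal(size=64)

current = np.sum(np.abs(spectrum) ** 2)
target = 2.5 * current                       # arbitrary target energy
gain = np.sqrt(target / current)

adjusted = np.sum(np.abs(spectrum * gain) ** 2)
assert np.isclose(adjusted, target)          # energy scales by gain**2
```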



@@ -0,0 +1,3 @@
numpy==1.24.4
scipy==1.8.0
sounddevice==0.4.6


@@ -20,7 +20,7 @@ cargo run
cargo run -- --help
# Example with flags
cargo run -- -b 80 --root C[2]
cargo run -- -b 80 --root C[4]
```
## View Source
@@ -34,11 +34,8 @@ The effort involved for this portfolio objective was non-trivial. I'll briefly discuss
* [clap](https://crates.io/crates/clap): In order to not drop any functionality from the original source, I wanted to ensure my application could handle all of the command-line arguments as is done in popgen.py. I noticed there that [argparse](https://docs.python.org/3/library/argparse.html) was doing some heavy lifting. I only briefly pondered rolling my own command-line parsing, but then I remembered a past experience doing so with much simpler arguments. That made using `clap` a straightforward decision.
* [rand](https://crates.io/crates/rand): Picking the sequence of notes requires some randomness
* [regex](https://crates.io/crates/regex): The go-to for regex, validating and parsing the note from the command line arguments
* [rodio](https://crates.io/crates/rodio): Back in a previous portfolio objective, rodio, cpal, and portaudio-rs were the candidates for audio playback in Rust. I did a little research, and it seemed that `rodio` was the most abstract/easy to use, but still allowed me to manually set the sample rate as necessary
* [hound](https://crates.io/crates/hound): This was recommended in a previous portfolio objective for saving wave files
### Typing hell
The most tedious aspect of this objective was dealing with the typecasting needed for integer and float arithmetic. I won't discuss strong typing/Rust arguments or any of that, but the crux of the issues here came down to a few competing philosophies.
* I want to use the smallest integer units possible to handle the basic parameters. For example, a `u8` should easily contain the `bpm`.
* Many parameters can be, and often are, negative, requiring the use of signed integers for values such as `melody_root`, `position`, etc. On one hand, I did not realize this in time, so some rework was required. On the other hand, there is a desire to capture initial values as unsigned if that seems appropriate, and cast them only as needed for signed arithmetic (example: `chord_root` into `bass_note`).
* I want to use the highest-precision floating-point representation possible for the wave data samples which go to the buffer, to ensure the most accurate audio. I initially chose `f64` for this before realizing that `rodio` seems to only work with `f32`. Aside from that, all sorts of casting was needed up from the base integer units.
@@ -53,7 +50,7 @@ I brought in the provided unit tests for `note_to_key_offset` and `chord_to_note
## Access outputs
Two output files were generated. I must have a slight bug in the `save` function, because these sound of somewhat lower quality.
Two output files were generated. I must have a slight bug in the `save` function, because these sound like they are of somewhat lower quality.
[out0.wav](out0.wav)


@@ -1,3 +1,6 @@
### Wednesday 11-Dec-2024
📝 I reviewed all portfolio objectives for touch-ups like spelling mistakes, adding extra code comments, and double-checking work
### Tuesday 10-Dec-2024
📝 [Project](./code/project/README.md): Continued work