touch ups
This commit is contained in:
parent 757cce5f30
commit fba6ccacdb

.gitignore (vendored): 3 changes
@@ -7,4 +7,5 @@ code/project/music/
 code/project/neopixel/
 code/project/radio/
 code/project/speech/
 *lock
+adj-*.wav
@@ -7,7 +7,7 @@ This portfolio repository tracks my regular progress and updates to correspond w
 ### [`notebook.md`](./notebook.md)

 Portfolio progress will primarily be noted in this notebook, with the intention that the newest entries will appear at the top. Types of entries will be indicated as follows:
-* 📝 Assignments
+* 📝 Objectives
 * 🤔 Lecture reflections
 * ⚙️ Clerical Changes

@@ -3,23 +3,47 @@ Included here is a simple Python script to analyze and manipulate an audio si

## Setup
Potential libraries needed for Debian-based GNU+Linux:
-```
+```bash
sudo apt-get install libportaudio2
```
Install Python libraries:
-```
+```bash
pip install -r requirements.txt
```

## Run
-```
+```bash
python3 main.py
# or
python3 main.py alien.wav
```

## View Source
[main.py](./main.py)
The overall flow of this objective is roughly as follows:
* Load the wave file into memory, handling errors and stereo signals.
* Figure out how many 1024-sample window frames will fit in the signal. Perform an FFT to get a general idea of the energy level.
* Iterate over all of the frames and:
  * Apply a Hann window to the frame
  * Calculate the FFT values and frequencies of the frame
  * Use those to get the three desired band energies
  * Perform adjustments on each frame
  * Reconstruct the frames with an inverse FFT
* Reconstruct the signal
* Write the audio

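The steps above can be sketched as a minimal overlap-add loop. This is an illustrative skeleton, not the exact `main.py`: the band-adjustment step is left as a stub, and names are my own.

```python
import numpy as np


def process(signal, window_size=1024):
    """Overlap-add skeleton: window -> FFT -> (adjust) -> inverse FFT -> add back."""
    hop = window_size // 2  # 50% overlap
    num_frames = (len(signal) - window_size) // hop + 1
    out = np.zeros(len(signal))
    win = np.hanning(window_size)
    for i in range(num_frames):
        start = i * hop
        frame = signal[start : start + window_size] * win  # window the frame
        spectrum = np.fft.fft(frame)  # frame -> frequency domain
        # ... measure band energies and scale spectrum bins here ...
        out[start : start + window_size] += np.fft.ifft(spectrum).real
    return out


# With the adjustment step left empty, the middle of the output matches the
# input, because 50%-overlapped Hann windows sum to (approximately) a constant.
sr = 8000
sig = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
out = process(sig)
```

The edges of the output fade in and out because the first and last samples are covered by fewer overlapping windows.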
## Included Audio Sources
[airplane.wav](./airplane.wav): Royalty-free wav from Daniel Simion on [soundbible.com](https://soundbible.com/2219-Airplane-Landing-Airport.html). My first impression of this one is that lower frequencies dominate.

[alien.wav](./alien.wav): Royalty-free wav from Daniel Simion on [soundbible.com](https://soundbible.com/2213-Alien-Spaceship-UFO.html). Higher frequencies seem to dominate here.


## Reflections, Results, Analysis
-TODO
+This objective at first sounded slightly easier than it turned out to be. To be more specific, the biggest challenge was understanding the use of windowing and window size. While all of the other steps can be defined as an algorithm, the window size and windowing method are less deterministic and can vary in effectiveness based on the input. The best window is often found after trying a few different sizes, until the frequencies you are interested in show up in the result.

_A Hann window of size 1024 vs a rectangle window of size 256, on a small sample of `airplane.wav` in Audacity._

![Hann window, size 1024](hann-1024.png)

![Rectangle window, size 256](rectangle-256.png)
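The Hann-vs-rectangle contrast in the screenshots can also be reproduced numerically: for a tone that falls between FFT bins, the rectangular window leaks far more energy into distant bins. A small sketch (the tone frequency and sizes here are arbitrary illustrative choices):

```python
import numpy as np

sr, n = 8000, 1024
t = np.arange(n) / sr
tone = np.sin(2 * np.pi * 1003.9 * t)  # falls between FFT bins (worst-case leakage)

rect = np.abs(np.fft.rfft(tone))                  # rectangular window (no window)
hann = np.abs(np.fft.rfft(tone * np.hanning(n)))  # Hann window


def far_leakage_db(spec, margin=20):
    """Largest magnitude more than `margin` bins from the peak, in dB below the peak."""
    peak = int(np.argmax(spec))
    far = np.concatenate([spec[: peak - margin], spec[peak + margin :]])
    return 20 * np.log10(far.max() / spec.max())


print(f"rect: {far_leakage_db(rect):.1f} dB, hann: {far_leakage_db(hann):.1f} dB")
```

The Hann window's sidelobes fall off much faster, which is why distant frequencies stay out of the low/mid/high band-energy measurements.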
BIN code/adaptive-tone-control/airplane.wav (new file)
BIN code/adaptive-tone-control/alien.wav (new file)
BIN code/adaptive-tone-control/hann-1024.png (new file, 32 KiB)
@@ -4,29 +4,42 @@ import sys

print("Portfolio Object 2: Adaptive Tone Control")

-# Contants
-bands = {"low": (0, 300), "mid": (300, 2000), "high": (2000, 20000)}
-file = "alien.wav"
+# Ingest the wave signal, and convert to mono if needed
+file = "test.wav"
if len(sys.argv) > 1:
    file = sys.argv[1]
-# Init
+print("Input file: ", file)

sample_rate, wav_signal = wav.read(file)
-# window_size = 1024
-window_size = len(wav_signal)
if wav_signal.ndim > 1:
    wav_signal = np.mean(wav_signal, axis=1)

+# Constants
+bands = {"low": (0, 300), "mid": (300, 2000), "high": (2000, 20000)}
+window_size = (
+    1024  # This seems like a fine choice, but hard to know one way or the other
+)
+hop_size = window_size // 2  # Overlap window by 1/2 of previous
+num_frames = (
+    len(wav_signal) - window_size
+) // hop_size + 1  # discrete window frames in the signal length
+
+reconstructed_signal = np.zeros(len(wav_signal))
+
# FFT to get energy at an arbitrary window
fft_values = np.fft.fft(wav_signal[:window_size])
fft_freqs = np.fft.fftfreq(window_size, 1 / sample_rate)


-def band_energy(band, fft_values):
+def band_energy(band, fft_values, fft_freqs):
    idx_band = np.where((fft_freqs >= band[0]) & (fft_freqs <= band[1]))[0]
    return np.sum(np.abs(fft_values[idx_band]) ** 2)


-energy_low = band_energy(bands["low"], fft_values)
-energy_mid = band_energy(bands["mid"], fft_values)
-energy_high = band_energy(bands["high"], fft_values)
+# Calculate and display the band energy results for this given window
+energy_low = band_energy(bands["low"], fft_values, fft_freqs)
+energy_mid = band_energy(bands["mid"], fft_values, fft_freqs)
+energy_high = band_energy(bands["high"], fft_values, fft_freqs)
avg_energy = (energy_low + energy_mid + energy_high) / 3
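To make the band-energy bookkeeping concrete, here is a self-contained sketch on a synthetic frame. The `bands` dict and `band_energy` mirror the script above; the last step shows why scaling a band's bins by `sqrt(target/current)` lands the band at the target energy (bin magnitudes scale linearly, so energy scales by the square).

```python
import numpy as np

sample_rate, window_size = 44100, 1024
bands = {"low": (0, 300), "mid": (300, 2000), "high": (2000, 20000)}

# Synthetic frame: a loud 100 Hz tone (low band) plus a quiet 1 kHz tone (mid band)
t = np.arange(window_size) / sample_rate
frame = np.sin(2 * np.pi * 100 * t) + 0.1 * np.sin(2 * np.pi * 1000 * t)

fft_values = np.fft.fft(frame * np.hanning(window_size))
fft_freqs = np.fft.fftfreq(window_size, 1 / sample_rate)


def band_energy(band, fft_values, fft_freqs):
    idx = np.where((fft_freqs >= band[0]) & (fft_freqs <= band[1]))[0]
    return np.sum(np.abs(fft_values[idx]) ** 2)


energy_low = band_energy(bands["low"], fft_values, fft_freqs)
energy_mid = band_energy(bands["mid"], fft_values, fft_freqs)
print(f"low {energy_low:.2e}  mid {energy_mid:.2e}")  # the low band dominates

# Scaling a band's bins by sqrt(target/current) scales its energy by target/current
target = (energy_low + energy_mid) / 2
idx = np.where((fft_freqs >= 300) & (fft_freqs <= 2000))[0]
fft_values[idx] *= np.sqrt(target / energy_mid)
print(np.isclose(band_energy(bands["mid"], fft_values, fft_freqs), target))
```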
@@ -36,29 +49,48 @@ print(f"high {energy_high:.2e}")
print(f"avg {avg_energy:.2e}")


# Adjust the fft_values of all frequencies in the given band by a factor of the gain. This could be > 1 or < 1
def adjust(target_energy, current_energy, fft_values, band):
-    idx_band = np.where((fft_freqs >= band[0]) & (fft_freqs <= band[1]))[0]
+    idx_band = np.where((fft_freqs >= band[0]) & (fft_freqs < band[1]))[0]
    gain = np.sqrt(target_energy / (current_energy + 1e-6))
-    adjusted_fft_values = np.copy(fft_values)
-    adjusted_fft_values[idx_band] *= gain
-    return adjusted_fft_values
+    fft_values[idx_band] *= gain


-adjusted_low = adjust(avg_energy, energy_low, fft_values, bands["low"])
-adjusted_mid = adjust(avg_energy, energy_mid, fft_values, bands["mid"])
-adjusted_high = adjust(avg_energy, energy_high, fft_values, bands["high"])
-
-print(f'adj low {band_energy(bands["low"], adjusted_low):.2e}')
-print(f'adj mid {band_energy(bands["mid"], adjusted_mid):.2e}')
-print(f'adj high {band_energy(bands["high"], adjusted_high):.2e}')
+# For each window in the sample, we need to calculate, then adjust the low, mid, and high band energies
+for i in range(num_frames):
+    # Window bounds and window frame contents
+    start_idx = i * hop_size
+    end_idx = start_idx + window_size
+    frame = wav_signal[start_idx:end_idx] * np.hanning(
+        min(window_size, end_idx - start_idx)
+    )
+
+    # Calculate FFT
+    fft_values = np.fft.fft(frame)
+    fft_freqs = np.fft.fftfreq(window_size, 1 / sample_rate)
+
+    energy_low = band_energy(bands["low"], fft_values, fft_freqs)
+    energy_mid = band_energy(bands["mid"], fft_values, fft_freqs)
+    energy_high = band_energy(bands["high"], fft_values, fft_freqs)
+    avg_energy = (energy_low + energy_mid + energy_high) / 3
+
+    adjust(avg_energy, energy_low, fft_values, bands["low"])
+    adjust(avg_energy, energy_mid, fft_values, bands["mid"])
+    adjust(avg_energy, energy_high, fft_values, bands["high"])
+
+    # Now, the FFT values have been modified in place, up or down. We can inverse FFT
+    adjusted_frame = np.fft.ifft(fft_values).real
+
+    # Put the signal back together, frame by frame
+    reconstructed_signal[start_idx:end_idx] += adjusted_frame * np.hanning(window_size)

-adjusted_all = adjusted_low + adjusted_mid + adjusted_high
-full_spectrum = np.concatenate([adjusted_all])
-reconstructed_signal = np.fft.ifft(adjusted_all).real
reconstructed_signal = np.int16(
    reconstructed_signal / np.max(np.abs(reconstructed_signal)) * np.iinfo(np.int16).max
)

assert len(wav_signal) == len(reconstructed_signal)  # Sanity check

output_file = "adj-" + file
wav.write(output_file, sample_rate, reconstructed_signal)
print(f"Adjusted audio written to {output_file}")
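One reconstruction detail worth checking: the script applies the Hann window twice (once before the FFT and once during overlap-add), so the effective per-frame envelope is Hann squared. At 50% overlap that envelope does not sum to a constant, which can leave amplitude ripple in the output. A quick numeric check, using the same window and hop choices as above:

```python
import numpy as np

window_size, hop_size = 1024, 512
n_frames = 8
length = (n_frames - 1) * hop_size + window_size

# Envelope imposed by overlap-adding Hann-squared frames (window applied twice)
envelope = np.zeros(length)
w2 = np.hanning(window_size) ** 2
for i in range(n_frames):
    envelope[i * hop_size : i * hop_size + window_size] += w2

mid = envelope[window_size:-window_size]  # ignore the fade-in/fade-out edges
print(f"min {mid.min():.3f}  max {mid.max():.3f}")  # not constant: ripple remains
```

Dividing `reconstructed_signal` by this envelope (or windowing only once) would flatten the ripple; the final int16 normalization hides overall gain but not this frame-rate modulation.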
BIN code/adaptive-tone-control/rectangle-256.png (new file, 32 KiB)
code/adaptive-tone-control/requirements.txt (new file): 3 lines
@@ -0,0 +1,3 @@
+numpy==1.24.4
+scipy==1.8.0
+sounddevice==0.4.6
@@ -20,7 +20,7 @@ cargo run
cargo run -- --help

# Example with flags
-cargo run -- -b 80 --root C[2]
+cargo run -- -b 80 --root C[4]
```

## View Source
@@ -34,11 +34,8 @@ The effort involved for this portfolio objective was non-trivial. I'll briefly di
* [clap](https://crates.io/crates/clap): In order to not drop any functionality from the original source, I wanted to ensure my application could handle all of the command-line arguments as is done in popgen.py. I noticed there that [argparse](https://docs.python.org/3/library/argparse.html) was doing some heavy lifting. I only briefly pondered rolling my own command-line parsing, but then I remembered a past experience doing so with much simpler arguments. That made using `clap` a straightforward decision.
* [rand](https://crates.io/crates/rand): Picking the sequence of notes requires some randomness
* [regex](https://crates.io/crates/regex): The go-to for regex, validating and parsing the note from the command-line arguments
-* [rodio](https://crates.io/crates/rodio): Back in a portfolio objective rodio, cpal, portaudio-rs. I did a little research, and it seemed that rodio was the most abstract/easy to use, but still allowed me to manually set the sample rate as is necessary
+* [rodio](https://crates.io/crates/rodio): Back in a previous portfolio objective rodio, cpal, and portaudio-rs were options for audio playback in Rust. I did a little research, and it seemed that `rodio` was the most abstract/easy to use, but still allowed me to manually set the sample rate as necessary
* [hound](https://crates.io/crates/hound): This was recommended in a previous portfolio objective for saving wave files

### Typing hell
The most tedious aspect of this objective was dealing with the typecasting needed for integer and float arithmetic. I won't discuss strong typing/Rust arguments or any of that, but the crux of the issues here came down to a few competing philosophies.
* I want to use the smallest integer units possible to handle the basic parameters. For example, a `u8` should easily contain the `bpm`
* Many parameters can be, and often are, negative, requiring the use of signed integers such as `melody_root`, `position`, etc. On one hand, I did not realize this in time, so some rework was required. On the other hand, there is a desire to capture initial values as unsigned if that seems appropriate, and cast them only as needed for signed arithmetic (example: `chord_root` into `bass_note`).
* I want to use the highest-precision floating point representation possible for the wave data samples which go to the buffer, to ensure the most accurate audio. I initially chose `f64` for this before realizing that `rodio` seems to only work with `f32`. Aside from that, all sorts of casting was needed up from the base integer units.
@@ -53,7 +50,7 @@ I brought in the provided unit tests for `note_to_key_offset` and `chord_to_note


## Access outputs
-Two output files were generated. I must have a slight bug in the `save` function, because these sound of somewhat lower quality.
+Two output files were generated. I must have a slight bug in the `save` function, because these sound somewhat lower in quality.

[out0.wav](out0.wav)

|
@ -1,3 +1,6 @@
|
||||
### Wednesday 11-Dec-2024
|
||||
📝 I reviewed all portfolio objectives for touch-ups like spelling mistakes, adding extra code comments, and double checking work
|
||||
|
||||
### Tuesday 10-Dec-2024
|
||||
📝 [Project](./code/project/README.md): Continued work
|
||||
|
||||
|
Reference in New Issue
Block a user