L'lasons de Aether ver Beinags #3

  • Writer: Conlan Walker
  • Feb 17, 2022
  • 2 min read

Continuing the theme of ineffective YouTube tutorials and obtuse, esoteric research papers, I really only figured out two things. The bright side, however, is that those two things are pretty important.

Firstly, I wanted to solve the riddle of how I'd apply the filter to the source generator, even though I didn't quite know how the source generator would even work.

To test my previously held hypothesis that I could implement a filter using a dynamic equalizer, I needed something resembling a vocalization source. Reusing a technique from earlier, I recorded a small snippet of myself pronouncing the "schwa" phonetic, with its start and end samples at perfect neutral (zero) for ease of looping.
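The "start and end at perfect neutral" trick can be sketched in code. This is a hypothetical illustration (not the exact tool chain I used), assuming the snippet is a mono NumPy array: trim it so both ends land on zero crossings, and the loop point won't click.

```python
import numpy as np

def trim_to_zero_crossings(samples):
    """Trim a mono signal so it starts and ends at (near) zero crossings,
    which lets a short vocal snippet loop without an audible click."""
    signs = np.sign(samples)
    # Indices where the sign changes between neighboring samples.
    crossings = np.where(np.diff(signs) != 0)[0]
    if len(crossings) < 2:
        return samples  # nothing usable to trim to
    start, end = crossings[0], crossings[-1]
    return samples[start:end + 1]

# Example: a sine with a fractional number of cycles loops badly as-is.
t = np.linspace(0, 1, 8000, endpoint=False)
snippet = np.sin(2 * np.pi * 3.7 * t)          # 3.7 cycles
looped = trim_to_zero_crossings(snippet)
print(abs(looped[0]), abs(looped[-1]))          # both near zero
```

The trimmed snippet can then be tiled end-to-end without a discontinuity at the seam.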

Audacity's EQ effect is inconvenient because you can't see a spectrum of the audio in real time. So when I went to boost the two specific frequencies that are most prominent when pronouncing "ah", I effectively had to eyeball it, with no visual feedback to compare against an existing graph of those two frequencies. The result is a pretty crude, approximate representation:

[Image: the eyeballed EQ curve in Audacity]

The graph being logarithmic probably doesn't help things, either.
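For reference, the same kind of boost can be done programmatically with a peaking-EQ biquad (the well-known RBJ Audio EQ Cookbook design). The two target frequencies below are rough textbook formant values for "ah", used here purely for illustration, not necessarily the exact frequencies I boosted:

```python
import numpy as np
from scipy.signal import freqz

def peaking_eq(f0, gain_db, q, fs):
    """Biquad peaking-EQ coefficients per the RBJ Audio EQ Cookbook.
    Boosts (or cuts) gain_db around f0, with bandwidth set by q."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1 + alpha * a_lin, -2 * np.cos(w0), 1 - alpha * a_lin])
    a = np.array([1 + alpha / a_lin, -2 * np.cos(w0), 1 - alpha / a_lin])
    return b / a[0], a / a[0]

fs = 44100
# Hypothetical formant targets for "ah" (rough textbook values):
for f0 in (700, 1100):
    b, a = peaking_eq(f0, gain_db=12, q=4, fs=fs)
    w, h = freqz(b, a, worN=[f0], fs=fs)
    print(f"{f0} Hz boost: {20 * np.log10(abs(h[0])):.1f} dB")
```

Applying the two filters in series (e.g. with `scipy.signal.lfilter`) would boost both formant regions of the looped schwa at once.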


Here is the original "schwa" sound versus the "ah" sound I wanted to create from it:

[Audio: the "schwa" snippet, followed by the EQ'd "ah" version]

Considering that there's a noticeable change between the two, I'll call that a success.



Now on to the second thing.

I was doing the usual 'looking at bad YouTube explanations and research papers to hopefully connect the dots at some point' thing, and found a paper that contains this diagram:

[Image: the paper's architecture diagram]

Now, most of the stuff shown here is of little importance to me, because I'm not making an implementation that uses a neural network-based model. What caught my eye was this section, the "Source module":

[Image: the "Source module" section of the diagram]

At this point, I was fairly certain that the source could be made up of a bunch of sine waves that increase in pitch and proportionally decrease in volume, so my attention was immediately drawn to the word "harmonics". The 'Intro to Speech Acoustics' video series I watched only referred to the word in passing, without actually explaining its relevance (if I recall correctly, anyway). One Andrew Huang video and one Encyclopedia Britannica article later, I'm pretty sure I know what it is.

[Image]

A harmonic is any integer multiple of some frequency. For instance, if you have a base frequency of 100Hz, its 2nd (x2) harmonic would be 200Hz, its 3rd (x3) harmonic would be 300Hz, et cetera. The Andrew Huang video linked an interactive demo of this, which you can find here: https://alexanderchen.github.io/harmonics/
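In code, the harmonic series is nothing more than integer multiples of the fundamental:

```python
def harmonics(fundamental_hz, count):
    """Return the first `count` harmonics of a fundamental frequency.
    By convention, the 1st harmonic is the fundamental itself."""
    return [fundamental_hz * n for n in range(1, count + 1)]

print(harmonics(100, 5))  # [100, 200, 300, 400, 500]
```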

As for the important part of the Encyclopedia Britannica article:

[Image: excerpt from the Encyclopedia Britannica article]

I wanted to see how accurately this matches up with my voice, so I recorded my voice and viewed it using an actual, proper spectrum analyzer:

[Image: spectrum analyzer view of my recorded voice]

Please ignore the thinner blue curves, as I was also messing with the EQ at the time.

What's actually important are the magenta dots, and the bold cyan line.

The magenta dots denote the first six harmonics, with the first one being just under 100Hz.

The bold cyan line shows a rough relationship between the pitches of the harmonics and their volume. My hypothesis was confirmed.
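With the hypothesis confirmed, a source generator can be sketched as a sum of sine-wave harmonics with falling volume. This is a minimal illustrative sketch; the 1/n amplitude rolloff and the 100Hz fundamental are assumptions for demonstration, not values measured from my recording:

```python
import numpy as np

def harmonic_source(f0, n_harmonics, duration, fs):
    """Naive source signal: sine-wave harmonics of f0 whose amplitude
    falls off as 1/n (an illustrative rolloff, not a measured one)."""
    t = np.arange(int(duration * fs)) / fs
    out = np.zeros_like(t)
    for n in range(1, n_harmonics + 1):
        out += (1.0 / n) * np.sin(2 * np.pi * f0 * n * t)
    return out / np.max(np.abs(out))  # normalize to [-1, 1]

fs = 44100
src = harmonic_source(f0=100, n_harmonics=6, duration=0.5, fs=fs)

# Sanity check: the spectrum should peak at 100, 200, ... 600 Hz,
# with each harmonic quieter than the last.
spectrum = np.abs(np.fft.rfft(src))
freqs = np.fft.rfftfreq(len(src), 1 / fs)
peak_bins = [int(np.argmin(np.abs(freqs - 100 * n))) for n in range(1, 7)]
mags = [spectrum[b] for b in peak_bins]
print([round(m / mags[0], 2) for m in mags])  # roughly [1, 0.5, 0.33, ...]
```

Feeding a source like this through the formant-boosting EQ would be the next step toward an actual vowel sound.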


That's it for this week.

 
 
 
