PHYS 536: Introduction to Acoustics and Digital Signal Processing

DRAFT VERSION

Initial Proposal
As my final project, I would like to develop a basic audio descriptor that could serve as the starting point for long-term research that I want to pursue. For the last few years I have been doing research on human-computer interaction in music. One of my goals is to create a machine to improvise with, so I need tools that analyze some parameters of the audio in real time. I have been using tools developed by other people, but I would like to understand how they work. I would therefore like to study one or two basic audio descriptors that can work in real time, for example brightness or loudness.

Introduction
For a long time, I have been doing research in the field of computer music. In recent years, I started experimenting in the field usually called Machine Listening. In the project Improvisatory Music and Painting Interfase I explored the notion of gesture in the audiovisual domain. Later, in the project Understanding Collective Gestural Improvisation; a computation approach, I explored the field of computational musical analysis. The former did not use audio signals in any way. In the latter, all the audio analysis was done off-line; there was no reason to analyze the sources in real time.

However, the intended goal of all this previous work has been the creation of a virtual partner to play with. This constraint requires real-time analysis of the signal. My only previous exploration along this line was porting the MaxMSP brightness~ object to Pure Data (PD). The port was done without a deep understanding of the code, inferring much of its behavior from the comments within it.

After taking the Digital Sound sequence at the DXARTS department at UW with Juan Pampin, I realized the benefits of working with another popular audio programming language, SuperCollider (SC). SC is a great tool for audio synthesis and is optimized for real-time performance. Deciding to work with SC instead of PD involves researching the current state of Machine Listening in that language.

After some initial research on the web, I found that there are at least a few libraries and SC extensions that extract basic information from a signal. This project consists of:

  1. Review such libraries for comparison purposes
  2. Learn how to extend SuperCollider and create a new UGen
  3. Implement my own audio descriptors
  4. Test the audio descriptors in a musical context

Review machine listening libraries for SuperCollider
By searching the internet and the SuperCollider web pages and forums, I ended up with a list of plug-ins. Of this list, only a few have units for signal analysis. These are:

  1. The UGens by Dan Stowell, which include some extra chaos UGens (the Rössler attractor and some Finco/Sprott attractors) and frequency-domain signal analysis: various FFT-based analyses (flatness, spectral flux, power, subband power, spectral percentile, complex flux, modified Kullback-Leibler), plus FFTTriggered.
  2. The SC3 real-time Machine Listening plug-in pack, part of the research that Nick Collins has developed in recent years. It is cross-platform, uses FFTW, and includes the main bbcut2 UGens, Tartini, Qitch (a constant-Q pitch tracker), and Concat (a live concatenative synth). It comes with all the source code and is a great starting point for this project.

Besides these two examples, there does not seem to be any other Machine Listening code for SuperCollider.

Learn how to extend SuperCollider by creating new UGens
Fortunately, SC comes with a short explanation of how to create Unit Generators (UGens). Basically, it involves downloading the source code and creating a C++ project with the right parameters. This is a time-consuming task, but it is required only the first time; subsequent UGens should be easier to develop.

After some initial issues, a test UGen, a class for testing the UGen, and code that instantiates the class were successfully created. The initial code, which only copies the input to the output, is in the folder Initial Code of the zip file.
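As a rough illustration of what "copying the input to the output" means for block-based processing, here is a minimal standalone sketch. It is plain C++ and does not use the SuperCollider plug-in headers; the function name `passThroughNext` and its signature are hypothetical and are not taken from the zip file.

```cpp
#include <cassert>

// Hypothetical stand-in for a UGen calculation function: called once per
// audio block, it copies inNumSamples samples from the input buffer to the
// output buffer without modifying them.
void passThroughNext(const float* in, float* out, int inNumSamples)
{
    assert(in != nullptr && out != nullptr && inNumSamples >= 0);
    for (int i = 0; i < inNumSamples; ++i) {
        out[i] = in[i];  // identity processing: output equals input
    }
}
```

A real UGen would presumably perform the same loop inside the calculation function that the server calls for every block.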

Implement an audio descriptor
Based on the code by Nick Collins and Tristan Jehan, I extended the initial code and developed a brightness analyzer.

"The Spectral Centroid (here called brightness) is a low level audio descriptor that has certain relationship with the brightness and pitch region of a sound. It measures the average frequency, weighted by amplitude, of a spectrum. The centroid of a spectral frame is defined as the average frequency weighted by amplitudes, divided by the sum of the amplitudes." See DEA thesis.
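Written as a formula, the quoted definition can be sketched as follows. The symbols are chosen here for illustration and are not taken from the thesis: $A_k$ is the amplitude of FFT bin $k$, $N$ is the FFT size, and $F_s$ is the sampling rate. The summation range (skipping the DC bin) is an assumption that follows the loops in the code excerpts of this section.

```latex
C \;=\; \frac{\sum_{k=1}^{N/2-1} f_k \, A_k}{\sum_{k=1}^{N/2-1} A_k},
\qquad
f_k \;=\; k \, \frac{F_s}{N}
```

Because $F_s/N$ is the same for every bin, it can be factored out of the sum and applied once at the end.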

The important part (acoustically speaking) is presented here:

Brightness in Pure Data

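// brightness
// accumulate the bin-weighted sum and the plain sum of the spectrum (bin 0 is skipped)
for (i = 1; i < x->x_FFTSizeOver2; i++) {
    numerator += (i * x->BufFFT[i]);
    sumSpectrum += x->BufFFT[i];
}
// an empty spectrum gives 0; otherwise convert the mean bin index to Hz
if (sumSpectrum <= 0.0f) x->x_brightness = 0.0f;
else x->x_brightness = (numerator * FsOverFFTSize) / sumSpectrum;
outlet_float(x->x_outlet, x->x_brightness);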

Brightness in SuperCollider

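preparefft(unit, in, inNumSamples); // prepare and do fft
// power of each (real, imaginary) pair, stored at bin i>>1
// (loop bound and step assumed: N is the FFT size, N = 2 * NOVER2)
for (i = 0; i < N; i += 2) fftbuf[i>>1] = ((fftbuf[i] * fftbuf[i]) + (fftbuf[i+1] * fftbuf[i+1])) * 0.25; //cont..
// accumulate the bin-weighted sum and the plain sum of the spectrum (bin 0 is skipped)
for (int i = 1; i < NOVER2; ++i)
{
    numerator += i * fftbuf[i];
    sumSpectrum += fftbuf[i];
}
// an empty spectrum gives 0; otherwise convert the mean bin index to Hz
if (sumSpectrum <= 0.0f) brightness = 0.0f;
else brightness = (numerator * FsOverFFTSize) / sumSpectrum;
//printf("brightness %f", brightness);
for (int i = 0; i < inNumSamples; ++i)
{
    out[i] = brightness; // put the value in the output
}
}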
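Both excerpts share the same core computation. The following standalone sketch isolates it so that it can be checked without Pure Data or SuperCollider. The function name `spectralCentroid` and its parameters are hypothetical, and it assumes the caller has already computed a magnitude (or power) spectrum for bins 0 to N/2 - 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Spectral centroid in Hz of one spectral frame.
// spectrum[k] holds the weight (amplitude or power) of bin k; bin 0 (DC)
// is skipped, as in the excerpts above. Hypothetical helper for illustration.
double spectralCentroid(const std::vector<double>& spectrum,
                        double sampleRate, std::size_t fftSize)
{
    assert(fftSize > 0 && sampleRate > 0.0);
    double numerator = 0.0;    // sum of bin index times bin weight
    double sumSpectrum = 0.0;  // sum of bin weights
    for (std::size_t k = 1; k < spectrum.size(); ++k) {
        numerator += static_cast<double>(k) * spectrum[k];
        sumSpectrum += spectrum[k];
    }
    if (sumSpectrum <= 0.0) return 0.0;  // silent frame: report 0 Hz
    const double fsOverFftSize = sampleRate / static_cast<double>(fftSize);
    return numerator * fsOverFftSize / sumSpectrum;  // mean bin index in Hz
}
```

For instance, with a 16-point FFT at 16000 Hz (1000 Hz per bin), a spectrum with all its energy in bin 3 gives 3000 Hz, and two equal peaks in bins 2 and 4 also give 3000 Hz.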

Tests
Five examples were created to test the functionality of the Unit Generator: four experimental cases and one musical test. In all cases an input signal is played through the left speaker. The signal is passed through the brightness analyzer, and the output of the analyzer sets the frequency of a sinusoid that is played through the right speaker. In other words, we have the original signal in the left speaker and the representation of its brightness, as the frequency of a sinusoid, in the right speaker.
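The mapping from brightness to an audible tone can be sketched as a sine oscillator whose frequency is updated once per block. This is a standalone illustration, not the code used for the tests; the function name `sonifyBrightness` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Renders a sine wave whose frequency is set, block by block, from a
// sequence of brightness values in Hz (one value per block).
std::vector<double> sonifyBrightness(const std::vector<double>& brightnessHz,
                                     std::size_t blockSize, double sampleRate)
{
    assert(sampleRate > 0.0);
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<double> out;
    out.reserve(brightnessHz.size() * blockSize);
    double phase = 0.0;  // carried across blocks so the waveform stays continuous
    for (double hz : brightnessHz) {
        const double increment = twoPi * hz / sampleRate;  // phase step per sample
        for (std::size_t i = 0; i < blockSize; ++i) {
            out.push_back(std::sin(phase));
            phase += increment;
            if (phase >= twoPi) phase -= twoPi;  // keep the phase bounded
        }
    }
    return out;
}
```

Keeping the phase across blocks is a design choice of this sketch: resetting it at every block would introduce discontinuities whenever the brightness value changes.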

In the test swiftSin, a pure sine tone is swept from 20 to 10000 Hz in ten seconds. We can hear that the brightness follows the same sweep. Because the sine tone has no harmonics, all the power is in one region of the spectrum at any given time, and because the brightness is a weighted average over the whole spectrum, we end up with the same value as the input frequency. This test helps us verify that the code is correct.
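The same check can be reproduced offline for a single frame. The sketch below is an illustration under stated assumptions, not the UGen code: it uses a naive DFT instead of an FFT, weights by magnitude, and the function name `sineCentroid` is hypothetical. It also assumes the sine frequency falls exactly on a bin; otherwise spectral leakage spreads energy over neighboring bins and the centroid only approximates the input frequency.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Spectral centroid in Hz of one frame containing a sine at sineHz.
// Uses a naive O(N^2) DFT for clarity and skips bin 0 (DC).
double sineCentroid(double sineHz, double sampleRate, std::size_t fftSize)
{
    assert(fftSize >= 4 && sampleRate > 0.0);
    const double twoPi = 2.0 * std::acos(-1.0);

    // One frame of the test signal.
    std::vector<double> frame(fftSize);
    for (std::size_t t = 0; t < fftSize; ++t)
        frame[t] = std::sin(twoPi * sineHz * static_cast<double>(t) / sampleRate);

    double numerator = 0.0, sumSpectrum = 0.0;
    for (std::size_t k = 1; k < fftSize / 2; ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t t = 0; t < fftSize; ++t) {
            const double angle =
                twoPi * static_cast<double>(k * t) / static_cast<double>(fftSize);
            re += frame[t] * std::cos(angle);
            im -= frame[t] * std::sin(angle);
        }
        const double magnitude = std::sqrt(re * re + im * im);
        numerator += static_cast<double>(k) * magnitude;
        sumSpectrum += magnitude;
    }
    if (sumSpectrum <= 0.0) return 0.0;
    return numerator * (sampleRate / static_cast<double>(fftSize)) / sumSpectrum;
}
```

With a sampling rate of 8000 Hz and a 64-point frame, each bin is 125 Hz wide, so a 1000 Hz sine sits exactly on bin 8 and the centroid should come out at 1000 Hz up to rounding error.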

In the test swiftBlip, a sound with a fixed fundamental at 440 Hz gradually increases its number of harmonics from 1 to 20 in ten seconds. We can hear how the brightness of the sound gradually increases as the number of harmonics increases.
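The expected trend can be worked out for an idealized version of this test. The sketch below assumes that all harmonics have equal amplitude and that each harmonic is perfectly resolved; both are assumptions of this illustration, not measurements of the actual signal. The function name `equalHarmonicCentroid` is hypothetical. Under these assumptions the centroid of n harmonics of a fundamental f0 is f0 (n + 1) / 2.

```cpp
#include <cassert>

// Amplitude-weighted mean frequency of harmonics 1..numHarmonics of a
// fundamental, with every harmonic assumed to have the same amplitude.
double equalHarmonicCentroid(double fundamentalHz, int numHarmonics)
{
    assert(numHarmonics >= 1);
    double numerator = 0.0;      // sum of frequency times amplitude
    double sumAmplitudes = 0.0;  // sum of amplitudes
    for (int h = 1; h <= numHarmonics; ++h) {
        const double amplitude = 1.0;  // equal-amplitude assumption
        numerator += h * fundamentalHz * amplitude;
        sumAmplitudes += amplitude;
    }
    return numerator / sumAmplitudes;
}
```

For a 440 Hz fundamental this gives 440 Hz with one harmonic and 4620 Hz with twenty, which is consistent with the rising brightness heard in the test.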

In the test gradualNoise, white noise is passed through a second-order Butterworth bandpass filter with a center frequency of 3000 Hz. Q is gradually swept from 100 to 1.1. The result is a noise that gradually loses frequencies until it becomes a pure sound at 3000 Hz. In this case, we can hear that the brightness gradually increases. Brightness can be seen as a measure of how pure a sound is; a noisy sound is therefore an opaque sound.

In the test audioExample, a short fragment of electroacoustic music by the Juum Duet is used as the input signal. The source consists of an acoustic viola and electronic sounds. It is interesting to see how the global brightness of the fragment gradually increases. This corresponds to the change in bow pressure on the viola and to the gradual modification of the timbral quality of the electronic part.

Attachment         Size
InitialCode.zip    2.41 KB
Brightness.zip     13.84 KB
swiftSin.mp3       239.48 KB
swiftBlip.mp3      244.38 KB
gradualNoise.mp3   270.09 KB
audioExample.mp3   1.77 MB