In this lab we take a short C program that scales a set of random int16_t values representing the amplitude of sound waves for audio encoding, and multiplying them by 0.75 (lowering the volume). This type of operation occurs often when using audio since the computer must do these calculations in the background.

We begin with the naive algorithm (taking each value and performing a multiplication operation using floating-point arithmetic 0.75. Performing this over 10 000 000 samples on the aarch64 server supplied by our school gives gave our entire program a runtime of 514ms. Subtracting the setup and teardown, as well as unrelated code (which took 496ms to run), we get 18ms of runtime for the operations we care about.

Lets look at two alternative algorithms to see if we can speed things up:

Alternative Algorithm 1: Pre-scaled array

Since we are performing the same operation many times, and since we are likely to get repeat inputs and outputs, perhaps it would save time to do each possible calculation once (about 65,000 times, for all possible values of an int16_t). This way instead of doing a new calculation, we can just look up the output value that we already know tot be correct.

Unfortnately, this did not result in a speedup. It actually slowed things down to a total of 42ms (vs the original 18). I was expecting it to be faster, and I think it would have, except for the problem of our int16_t’s being unsigned. The subtracting and adding of the negative portions (since you cannot use a negative index in an array) may have slowed down my implementation here. This method also requires quite a bit of memory overhead (an array storing around 65000 16-bit integers). I tested this with 1 billion samples after 10 million seemed to be slower, and found that it was still slower. The payoff didn’t seem become better in the long game either.

Alternative Algorithm 2: Fixed-point arithmetic

Instead of using floating point arithmetic to multiple by 0.75, we can use regular integer multiplication and bit-shifting to get the same results. Here, my algorithm multiplied input sample values by 6, and shifted the result to the right by 3 bits (truncating the last 3 digits, thus dividing by 2 three times). The end result is that we get an integer value that is 3/4 of the original.

Now, this algorithm has given us a significant speedup (at least, around the specific operations being done) and is the best of the three. 13ms compared to the naive algorithm’s 18ms puts us at 28% faster with the same memory overhead (or lack of it, compared with the pre-calculated array).

*Are they fast enough to process CD data?* – At 44100 samples per second, and 2 channels, I think each algorithm is sufficient. If it can process 10 million samples in a few milliseconds, it should easily be able to keep up with a CD. I will note however, that we should use the best algorithm regardless, since it won’t be the only program running on the machine.

Further Optimizations

As stated before, I think treating the values as unsigned integers may prove to speed things up, especially in the pre-calculated array algorithm. The addition and subtraction waste processing of a repetitive and useless task that could be avoided.