HISE Lossless Audio Codec is ready
I spent the last few weeks writing a lossless audio compression codec which will be the default format for monolith sample files exported with HISE.
There are a lot of lossless audio codecs around, so why did I even bother to roll my own solution? The answer is simple: speed. Most codecs try to squeeze out the last bits of the audio signal by applying rather complex algorithms (most use Linear Predictive Coding with some sort of Rice coding for the residual). This is because for their intended purpose (playing back one audio file at a time) the performance is good enough even on low-power devices such as MP3 players etc.
But when streaming hundreds of voices from disk, things start to look different. If you load FLAC samples into HISE, playing 32 voices already yields ~40% disk usage (the disk usage here does not mean actual disk reading but already includes the decompression), which is way above an acceptable performance level (uncompressed files yield something like 2% - 3% on my system). It may be possible to tune the FLAC performance (I heard somewhere that UVI made some changes to this format which bring down the CPU usage).
Apple Lossless is slightly better, but it's still an order of magnitude away from the uncompressed performance (BTW, both codecs actually use the same algorithms).
The codec I had in mind has the following requirements:
- Keep the performance on par with uncompressed sample reading (it may be a few percent worse, but not orders of magnitude like the other ones)
- Squeeze out as many bits as possible as long as rule 1 is not violated.
- Don't bother about generic audio material - focus on samples.
Rule 3 is pretty important because it makes a big difference whether you try to compress a mastered recording or a decaying pianissimo sample.
The results are pretty pleasing so far: the compression ratio is lower than FLAC's (FLAC comes down to ~25% when applied to decaying samples, while HLAC yields ~45% file size). However, the decompression performance is almost equal to uncompressed PCM reading.
This is how HLAC works:
- Save every 4th sample
- Calculate a direct line between every fourth sample
- For samples 2, 3 and 4, just store the distance from the line value
- Store the full samples with bit depth compression.
- Store the error signal (2, 3 and 4) with bit depth compression.
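The first three steps can be sketched like this in Python (my own reconstruction from the description above, with made-up names - not the actual HISE code):

```python
import numpy as np

STRIDE = 4  # every 4th sample is kept in full


def split_signal(signal):
    """Split a signal into anchor samples (every 4th sample) and an error
    signal: the distance of the intermediate samples from the straight
    line drawn between neighbouring anchors."""
    x = np.asarray(signal, dtype=np.int32)
    n = (len(x) - 1) // STRIDE * STRIDE + 1      # trim to whole strides
    x = x[:n]
    anchors = x[::STRIDE]
    predicted = np.empty_like(x)
    for i in range(len(anchors) - 1):
        # The "direct line" between two neighbouring anchors
        line = np.linspace(anchors[i], anchors[i + 1], STRIDE + 1)
        predicted[i * STRIDE:(i + 1) * STRIDE + 1] = np.rint(line)
    error = x - predicted                        # zero at anchor positions
    return anchors, error
```

For well-behaved sample material the error values are much smaller than the original samples, which is what makes the bit depth compression in the next steps pay off.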
The bit depth compression just uses even bit depths - adding odd bit depth compressors decreases the decompression performance.
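The idea behind the bit depth compression can be pictured like this (an illustrative sketch of the concept, not the actual HLAC routine):

```python
def even_bit_depth(values):
    """Smallest even bit depth that can hold every signed value in a
    block. Illustrative helper, not the real HLAC implementation."""
    bits = 2
    while bits < 16:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        if all(lo <= v <= hi for v in values):
            return bits
        bits += 2
    return 16
```

A block of small error values then only needs a few bits per sample instead of the full 16.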
That's it. The decompression algorithm just reverses those steps (the step size of 4 makes it really easy to apply SSE optimizations). I also added support for memory mapped file reading (which is not 100% trivial because it needs to calculate the offsets).
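The decoder can be sketched as the mirror image (again my own illustration with made-up names): re-draw the line between the anchor samples and add the stored error signal back. With a fixed stride of 4, the four samples of each segment map naturally onto 4-wide SSE registers.

```python
import numpy as np

STRIDE = 4  # matches the encoder's stride


def rebuild_signal(anchors, error):
    """Reverse the encoding steps: interpolate between the stored anchor
    samples and add the error signal back on top."""
    anchors = np.asarray(anchors, dtype=np.int32)
    n = (len(anchors) - 1) * STRIDE + 1
    predicted = np.empty(n, dtype=np.int32)
    for i in range(len(anchors) - 1):
        line = np.linspace(anchors[i], anchors[i + 1], STRIDE + 1)
        predicted[i * STRIDE:(i + 1) * STRIDE + 1] = np.rint(line)
    return predicted + np.asarray(error[:n], dtype=np.int32)
```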
Feel free to play around with it:
- Load samples into HISE
- Choose "Save Samplemap as Monolith"
- Select the compression level (there is actually no reason to use anything other than "low file size", unless you're really pedantic about performance).
- Press OK
- Reload the samplemap and play with HLAC-compressed samples.
This sounds great, looking forward to trying it out.
I know nothing about compression algorithms - what is the reason for using every 4th sample and not every 10th or 50th or just the first and last?
If you use e.g. only the first and the last sample, then the error signal is the same as the original signal and you don't gain anything :)
4 samples yield the best balance between full signal bit depth and error signal bit depth.
I might improve the algorithm by using cubic splines and an 8-sample stride, but my gut feeling is that this would only yield a few percent better compression at the cost of a massive performance hit...
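To get a feel for why the stride matters, here's a little experiment of my own (not from the HLAC code): measure how far a smooth, decaying test tone drifts from the straight line drawn between every stride-th sample. The further it drifts, the more bits the error signal needs.

```python
import numpy as np


def max_linear_error(signal, stride):
    """Largest distance between the signal and the straight line drawn
    between every stride-th sample (illustrative helper)."""
    x = np.asarray(signal, dtype=np.float64)
    n = (len(x) - 1) // stride * stride + 1
    idx = np.arange(n)
    anchor_idx = idx[::stride]
    predicted = np.interp(idx, anchor_idx, x[anchor_idx])
    return float(np.max(np.abs(x[:n] - predicted)))


# A smooth, decaying test tone (roughly sample-like material)
t = np.arange(4096)
tone = np.rint(20000 * np.exp(-t / 2000) * np.sin(2 * np.pi * 20 * t / 4096))
```

On a signal like this the maximum error grows with the stride, so a larger stride saves full-resolution anchor samples but pushes up the bit depth of the error signal - which is the trade-off the 4-sample stride tries to balance.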
Can the same monolith be used by all plugin formats? So I can create my monolith on a PC and it can also be used on Linux and Mac - that way I only need to upload the samples once for all OSes.
Of course. I didn't test it on Linux yet, but the test suite passes on OSX, so I expect it to also pass on Linux...
Christoph, this format would actually be good for audio editing as well. I'll keep this in mind, it would be great to use this in other places.
Well, there are a few limitations that may make it less useful for this purpose:
- It's currently limited to 16 bit (but with an auto-normalization feature when converting 24-bit samples, so that you get the full 16-bit range and don't further reduce the bit depth when compressing unnormalized samples). While editing you might want to retain the full bit depth, though.
- It has an internal block size of 4096 samples and a lookup table at the beginning to seek to these positions. Seeking to intermediate positions comes with a slight performance penalty (it just decodes the whole block and throws away the unused samples), but editing may be more tricky because you usually want to cut / truncate samples anywhere.
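The seeking described above boils down to something like this (hypothetical helper names, just to illustrate the lookup and the wasted decode work):

```python
BLOCK_SIZE = 4096  # internal block size mentioned above


def seek_to_sample(position, block_offsets, decode_block):
    """Find the block containing `position` via the offset table, decode
    the whole block, and throw away the samples before the requested
    position. `decode_block` stands in for the real block decoder."""
    block_index = position // BLOCK_SIZE
    samples = decode_block(block_offsets[block_index])
    return samples[position % BLOCK_SIZE:]
```

The penalty is visible here: even if you only need one sample near the end of a block, the whole 4096-sample block gets decoded first.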