Audio frame loader faster execution (!185) · Merge requests · public_projects / ketos

Audio frame loader faster execution

This merge request adds:

- minor bug fixes in the base_audio and spectrogram modules (most notably, a fix for a bug that caused all spectrograms computed with the from_wav method to be slightly misaligned when offset was set to 0)
- a batch_size argument for the AudioFrameLoader class

The AudioFrameLoader now loads waveforms with duration frame_length + step_size * (batch_size - 1) and computes a single spectrogram for each one. If batch_size > 1, it then splits that spectrogram into segments with the specified frame length and step size. The advantage of this approach is that the spectrogram is not computed twice in the overlap region between consecutive frames, except at the boundary between batches. For example, with step_size equal to half of frame_length, a batch size of 8 can reduce the computation time by 30% on my machine.
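To illustrate the idea, here is a minimal sketch of the batching arithmetic and the splitting step. It is not the code in this merge request: the helper names batch_duration and split_into_frames are hypothetical, the spectrogram is stood in for by a plain NumPy array with time along the first axis, and frame length and step size are assumed to be given in whole time bins for the split.

```python
import numpy as np


def batch_duration(frame_length, step_size, batch_size):
    """Duration of audio loaded per batch (hypothetical helper)."""
    return frame_length + step_size * (batch_size - 1)


def split_into_frames(spec, frame_bins, step_bins, batch_size):
    """Split one long spectrogram into overlapping frames.

    spec has shape (time_bins, freq_bins). This is an illustrative
    stand-in for the splitting step, not the ketos implementation.
    """
    expected = frame_bins + step_bins * (batch_size - 1)
    if spec.shape[0] != expected:
        raise ValueError(f"expected {expected} time bins, got {spec.shape[0]}")
    return [
        spec[i * step_bins : i * step_bins + frame_bins]
        for i in range(batch_size)
    ]


# Example: 3 s frames, 1.5 s step (half the frame length), batch size 8.
frame_length, step_size, batch_size = 3.0, 1.5, 8
loaded = batch_duration(frame_length, step_size, batch_size)

# Fraction of audio transformed, relative to computing each frame separately.
ratio = loaded / (batch_size * frame_length)

# Toy spectrogram: 2-bin frames, 1-bin step, 4 frequency bins.
spec = np.arange(9 * 4).reshape(9, 4)
frames = split_into_frames(spec, frame_bins=2, step_bins=1, batch_size=8)
```

In this example the batched loader transforms 13.5 s of audio instead of 24 s. The measured speed-up is smaller than that ratio suggests, presumably because loading and other steps also contribute to the run time.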

Note that this branch was created from the detection_utils branch.

Should be ready to merge!

Edited Sep 18, 2020 by Oliver Kirsebom