Improving the design of the Spectrogram class (#76) · Issues · public_projects / ketos

Improving the design of the Spectrogram class

The current implementation of the Spectrogram class has several shortcomings on the developer side, most notably:

The handling of annotations and time/file data is clumsy
The conversion from physical values (time, frequency) to bin numbers is a constant source of trouble
Repeated application of the same operation (e.g. cropping) to a list of spectrograms is rather slow
The spectrogram shares several methods with the audio class, but these currently have separate implementations

To solve these issues, I suggest the following changes:

Store annotations and time/file data as numpy arrays with the same 0th dimension as the spectrogram image
Create a separate Axis class to handle conversion from physical values to bin numbers
Add an extra dimension to all numpy arrays to allow for multiple spectrograms and vectorized operations
Implement the Audio class as a special instance of the more general Spectrogram class with 1st dimension equal to 1
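
To make the Axis idea concrete, here is a minimal sketch of what such a class could look like. It assumes a linear axis with equally sized bins; the class name `LinearAxis`, its constructor arguments, and the methods `bin` and `low_edge` are hypothetical and not part of the current ketos code.

```python
import numpy as np


class LinearAxis:
    """Hypothetical linear axis mapping physical values to bin numbers."""

    def __init__(self, bins, x_min, x_max):
        self.bins = int(bins)
        self.x_min = float(x_min)
        # Width of a single bin in physical units (seconds or Hz)
        self.dx = (float(x_max) - self.x_min) / self.bins

    def bin(self, x):
        """Return the bin number(s) containing x, clipped to [0, bins - 1]."""
        b = np.floor((np.asarray(x, dtype=float) - self.x_min) / self.dx).astype(int)
        return np.clip(b, 0, self.bins - 1)

    def low_edge(self, b):
        """Return the lower edge of bin b in physical units."""
        return self.x_min + np.asarray(b) * self.dx


# Example values are made up: 100 time bins spanning 0 to 5 seconds
taxis = LinearAxis(bins=100, x_min=0.0, x_max=5.0)
```

With this in place, a call such as `taxis.bin([0.01, 4.99])` accepts scalars or arrays, so the conversion logic lives in one place instead of being repeated inside every Spectrogram method.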

Thus, the new Spectrogram class would have the following attributes:

image: 3D numpy array (L x M x N) of type float
annotation_matrix: 3D numpy array (L x K x N) of type bool
time_vector: 2D numpy array (L x N) of type float
file_vector: 2D numpy array (L x N) of type int
taxis: instance of the Axis class to handle conversion of time to bin numbers
faxis: instance of the Axis class to handle conversion of frequency to bin numbers

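where L is the number of time bins, M is the number of frequency bins, K is the number of annotation labels, and N is the number of spectrograms.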

The current Spectrogram class would then correspond to the special case N = 1, and the Audio class would correspond to the special case M = 1.
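
As a rough illustration of why the extra dimension helps, the sketch below builds arrays with the proposed shapes and crops all N spectrograms along the time axis with a single slice per array. The sizes, the 0.05 s bin width, the interpretation of K as the number of annotation labels, and the helper `crop_time` are assumptions made for this example only.

```python
import numpy as np

# Assumed sizes: time bins, frequency bins, spectrograms, annotation labels
L, M, N, K = 100, 64, 8, 3

rng = np.random.default_rng(0)
image = rng.random((L, M, N))                        # float, L x M x N
annotation_matrix = np.zeros((L, K, N), dtype=bool)  # bool, L x K x N
# Start time of each time bin, repeated for every spectrogram (L x N)
time_vector = np.tile(np.arange(L, dtype=float)[:, None] * 0.05, (1, N))
file_vector = np.zeros((L, N), dtype=int)            # int, L x N


def crop_time(image, annotation_matrix, time_vector, file_vector, b_lo, b_hi):
    """Crop all N spectrograms to time bins [b_lo, b_hi) in one pass.

    All arrays share the same 0th (time) dimension, so one slice applies to each.
    """
    s = slice(b_lo, b_hi)
    return image[s], annotation_matrix[s], time_vector[s], file_vector[s]


cropped = crop_time(image, annotation_matrix, time_vector, file_vector, 20, 60)

# Special case N = 1: a single spectrogram keeps the same 3D layout
single = image[:, :, :1]
```

Because every array shares the time dimension, no Python-level loop over spectrograms is needed; the slices return NumPy views rather than copies.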

@fsfrazao, any thoughts?
