Improving the design of the Spectrogram class
The current implementation of the Spectrogram class has several shortcomings on the developer side, most notably:
- The handling of annotations and time/file data is clumsy
- The conversion from physical values (time, frequency) to bin numbers is a constant source of trouble
- Repeated application of the same operation (e.g. cropping) to a list of spectrograms is rather slow
- The spectrogram shares several methods with the audio class, but these currently have separate implementations
To solve these issues, I suggest the following changes:
- Store annotations and time/file data as numpy arrays with the same 0th dimension as the spectrogram image
- Create a separate Axis class to handle conversion from physical values to bin numbers
- Add an extra dimension to all numpy arrays to allow for multiple spectrograms and vectorized operations
- Implement the Audio class as a special case of the more general Spectrogram class, with the 1st dimension (frequency) equal to 1
Thus, the new Spectrogram class would have the following attributes:
- image: 3D numpy array (L x M x N) of type float
- annotation_matrix: 3D numpy array (L x K x N) of type bool
- time_vector: 2D numpy array (L x N) of type float
- file_vector: 2D numpy array (L x N) of type int
- taxis: instance of the Axis class to handle conversion of time to bin numbers
- faxis: instance of the Axis class to handle conversion of frequency to bin numbers
where,
- L = number of time bins
- M = number of frequency bins
- N = number of spectrograms
- K = number of labels
The current Spectrogram class would then correspond to the special case N = 1, and the Audio class would correspond to the special case M = 1.
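
To make the proposal more concrete, here is a rough sketch of how the Axis class and a vectorized crop could look. None of this exists yet: the names `Axis`, `bin`, `low_edge` and `crop` are placeholders, and I am assuming linearly spaced bins with out-of-range values clipped to the first/last bin.

```python
import numpy as np


class Axis:
    """Linear axis mapping physical values to bin numbers (hypothetical sketch)."""

    def __init__(self, start, step, num_bins):
        self.start = start        # physical value at the lower edge of bin 0
        self.step = step          # bin width in physical units
        self.num_bins = num_bins  # total number of bins along this axis

    def bin(self, value):
        """Return the bin number(s) containing value, clipped to the valid range."""
        idx = np.floor((np.asarray(value) - self.start) / self.step).astype(int)
        return np.clip(idx, 0, self.num_bins - 1)

    def low_edge(self, bin_no):
        """Return the physical value at the lower edge of the given bin(s)."""
        return self.start + np.asarray(bin_no) * self.step


# Shapes follow the proposal: L time bins, M frequency bins,
# N spectrograms, K labels (the numbers are made up for illustration)
L, M, N, K = 100, 64, 3, 2
image = np.random.rand(L, M, N)
annotation_matrix = np.zeros((L, K, N), dtype=bool)
time_vector = np.zeros((L, N), dtype=float)
file_vector = np.zeros((L, N), dtype=int)
taxis = Axis(start=0.0, step=0.5, num_bins=L)    # 0.5 s per time bin
faxis = Axis(start=0.0, step=50.0, num_bins=M)   # 50 Hz per frequency bin


def crop(image, annotation_matrix, taxis, faxis, tmin, tmax, fmin, fmax):
    """Crop all N spectrograms at once; the bins containing tmax/fmax are kept."""
    t0, t1 = taxis.bin(tmin), taxis.bin(tmax)
    f0, f1 = faxis.bin(fmin), faxis.bin(fmax)
    # A single slice acts on every spectrogram along the last dimension
    return image[t0:t1 + 1, f0:f1 + 1, :], annotation_matrix[t0:t1 + 1, :, :]


cropped_image, cropped_annot = crop(
    image, annotation_matrix, taxis, faxis,
    tmin=10.0, tmax=24.9, fmin=500.0, fmax=999.0,
)
print(cropped_image.shape, cropped_annot.shape)
```

The point is that the physical-to-bin conversion lives in one place, and cropping N spectrograms becomes a single slicing operation instead of a Python loop. time_vector and file_vector would be sliced along the 0th dimension in the same way.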
@fsfrazao, any thoughts?