
Created Apr 04, 2019 by Oliver Kirsebom (@kirsebom, Owner)

Improving the design of the Spectrogram class

The current implementation of the Spectrogram class has several shortcomings on the developer side, most notably:

  1. The handling of annotations and time/file data is clumsy
  2. The conversion from physical values (time, frequency) to bin numbers is a constant source of trouble
  3. Repeated application of the same operation (e.g. cropping) to a list of spectrograms is rather slow
  4. The spectrogram shares several methods with the audio class, but these currently have separate implementations

To solve these issues, I suggest the following changes:

  1. Store annotations and time/file data as numpy arrays with the same 0th dimension as the spectrogram image
  2. Create a separate Axis class to handle conversion from physical values to bin numbers
  3. Add an extra dimension to all numpy arrays to allow for multiple spectrograms and vectorized operations
  4. Implement the Audio class as a special instance of the more general Spectrogram class, with the 1st dimension (frequency) equal to 1
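
To make point 2 concrete, here is a minimal sketch of what such an Axis class could look like. It assumes a linear axis with equally sized bins; the constructor arguments (`bins`, `extent`) and the method names (`bin`, `low_edge`) are hypothetical and not part of the current ketos code.

```python
import numpy as np


class Axis:
    """Hypothetical linear axis mapping physical values to bin numbers."""

    def __init__(self, bins, extent):
        # bins: number of bins; extent: (min, max) physical range of the axis
        self.bins = int(bins)
        self.x_min, self.x_max = float(extent[0]), float(extent[1])
        self.dx = (self.x_max - self.x_min) / self.bins  # bin width

    def bin(self, x):
        """Return bin number(s) for x; out-of-range values are clipped
        to the first or last bin."""
        x = np.asarray(x, dtype=float)
        b = np.floor((x - self.x_min) / self.dx).astype(int)
        return np.clip(b, 0, self.bins - 1)

    def low_edge(self, b):
        """Return the lower physical edge of bin(s) b."""
        return self.x_min + np.asarray(b) * self.dx


# Example axes (values chosen for illustration only)
taxis = Axis(bins=20, extent=(0.0, 10.0))   # 0.5 s per time bin
faxis = Axis(bins=64, extent=(0.0, 512.0))  # 8 Hz per frequency bin
```

With all conversions going through one class, choices such as rounding and the handling of out-of-range values are made in a single place instead of being repeated in every method that accepts a time or frequency.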

Thus, the new Spectrogram class would have the following attributes:

  • image: 3D numpy array (L x M x N) of type float
  • annotation_matrix: 3D numpy array (L x K x N) of type bool
  • time_vector: 2D numpy array (L x N) of type float
  • file_vector: 2D numpy array (L x N) of type int
  • taxis: instance of the Axis class to handle conversion of time to bin numbers
  • faxis: instance of the Axis class to handle conversion of frequency to bin numbers

where,

  • L = number of time bins
  • M = number of frequency bins
  • N = number of spectrograms
  • K = number of labels
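
As a rough sketch of how these shapes fit together, the container below holds the four arrays and checks that their dimensions agree. The class name `SpectrogramStack` is hypothetical, and the two Axis attributes are left out to keep the example self-contained.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SpectrogramStack:
    """Hypothetical container illustrating the proposed array shapes."""
    image: np.ndarray              # (L, M, N), float
    annotation_matrix: np.ndarray  # (L, K, N), bool
    time_vector: np.ndarray        # (L, N), float
    file_vector: np.ndarray        # (L, N), int

    def __post_init__(self):
        if self.image.ndim != 3:
            raise ValueError("image must be 3D (L x M x N)")
        L, _, N = self.image.shape
        a = self.annotation_matrix
        # Annotations must share the time (0th) and spectrogram (last) dimensions
        if a.ndim != 3 or a.shape[0] != L or a.shape[2] != N:
            raise ValueError("annotation_matrix must have shape (L, K, N)")
        # Time and file data carry one entry per time bin and spectrogram
        for name in ("time_vector", "file_vector"):
            if getattr(self, name).shape != (L, N):
                raise ValueError(f"{name} must have shape (L, N)")


# Example: L=4 time bins, M=3 frequency bins, N=2 spectrograms, K=5 labels
stack = SpectrogramStack(
    image=np.zeros((4, 3, 2), dtype=float),
    annotation_matrix=np.zeros((4, 5, 2), dtype=bool),
    time_vector=np.zeros((4, 2), dtype=float),
    file_vector=np.zeros((4, 2), dtype=int),
)
```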

The current Spectrogram class would then correspond to the special case N = 1, and the Audio class would correspond to the special case M = 1.
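
The following sketch illustrates the vectorized cropping and the two special cases with plain numpy arrays. It assumes that all spectrograms in a stack share the same L and M, and that an audio signal is stored with one time bin per sample; the variable names are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
L, M, N = 8, 4, 3

# A list of N separate 2D spectrograms (the current layout) ...
specs = [rng.random((L, M)) for _ in range(N)]
# ... stacked along a new last axis to give the proposed (L, M, N) layout
stack = np.stack(specs, axis=-1)

# Cropping time bins 2..5 is a single slice along the 0th dimension,
# applied to all N spectrograms at once
cropped = stack[2:6]

# The equivalent loop over individual spectrograms
looped = [s[2:6] for s in specs]

# Special case N = 1: a single spectrogram
single = specs[0][:, :, np.newaxis]          # shape (L, M, 1)

# Special case M = 1: an audio signal with no frequency dimension
waveform = rng.random(L)
audio = waveform[:, np.newaxis, np.newaxis]  # shape (L, 1, 1)
```

The same slice would be applied to annotation_matrix, time_vector and file_vector, since they share the 0th dimension with the image.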

@fsfrazao, any thoughts?
