diff --git a/docs/source/modules/audio/audio_loader.rst b/docs/source/modules/audio/audio_loader.rst index 17687f8fb1b933692969e716523e97dceb5e377f..735db7e6a64587e4e3f5fdaefb0fbc6262ab6751 100644 --- a/docs/source/modules/audio/audio_loader.rst +++ b/docs/source/modules/audio/audio_loader.rst @@ -1,3 +1,5 @@ +.. _audio_loader: + Audio Loader ============ diff --git a/docs/source/modules/audio/index.rst b/docs/source/modules/audio/index.rst index b95ed0b62d921c7d06b7e31864409e5c83470aba..4145768d83c044addf2c22a35025bced9936f346 100644 --- a/docs/source/modules/audio/index.rst +++ b/docs/source/modules/audio/index.rst @@ -1,6 +1,12 @@ Audio Processing ----------------- +The audio modules provide high-level interfaces for loading and manipulating audio data +and computing various spectral representations such as magnitude spectrograms and CQT spectrograms. +For the implementation of these functionalities, we rely extensively on +`LibROSA <https://librosa.github.io/librosa/>`_ and `SoundFile <https://pysoundfile.readthedocs.io/en/latest/index.html>`_ . + + .. toctree:: :maxdepth: 2 :glob: @@ -11,3 +17,88 @@ Audio Processing audio_loader annotation Utilities <utils/index> + + +Waveforms +~~~~~~~~~ +The :class:`Waveform <ketos.audio.waveform.Waveform>` class provides a convenient interface for working with +audio time series. 
For example, the following command will load a segment of a wav file into memory:: + + >>> from ketos.audio.waveform import Waveform + >>> audio = Waveform.from_wav('sound.wav', offset=3.0, duration=6.0) #load 6-s long segment starting 3 s from the beginning of the audio file + +The Waveform object thus created stores the audio data as a NumPy array along with the filename, offset, and some additional attributes:: + + >>> type(audio.get_data()) + <class 'numpy.ndarray'> + >>> audio.get_filename() + 'sound.wav' + >>> audio.get_offset() + 3.0 + >>> audio.get_attrs() + {'rate': 1000, 'type': 'Waveform'} + +The Waveform class has a number of useful methods for manipulating audio data, e.g., adding Gaussian noise to +an audio segment (:meth:`add_gaussian_noise() <ketos.audio.waveform.Waveform.add_gaussian_noise>`), or splitting an audio segment +into several shorter segments (:meth:`segment() <ketos.audio.waveform.Waveform.segment>`). Please consult the documentation of the +:ref:`waveform` module for the complete list. + + +Spectrograms +~~~~~~~~~~~~ +Four different types of spectrograms have been implemented in ketos: :class:`magnitude spectrogram <ketos.audio.spectrogram.MagSpectrogram>`, +:class:`power spectrogram <ketos.audio.spectrogram.PowSpectrogram>`, :class:`mel spectrogram <ketos.audio.spectrogram.MelSpectrogram>`, and +:class:`CQT spectrogram <ketos.audio.spectrogram.CQTSpectrogram>`. These are all derived from the same +:class:`Spectrogram <ketos.audio.spectrogram.Spectrogram>` parent class, which in turn derives from the +:class:`BaseAudio <ketos.audio.base_audio.BaseAudio>` base class. + +The spectrogram classes provide interfaces for computing and manipulating spectral representations of audio data. 
+Like a waveform, a spectrogram object can also be created directly from a wav file:: + + >>> from ketos.audio.spectrogram import MagSpectrogram + >>> spec = MagSpectrogram.from_wav('sound.wav', window=0.2, step=0.01, offset=3.0, duration=6.0) #spectrogram of a 6-s long segment starting 3 s from the beginning of the audio file + +The MagSpectrogram object thus created stores the spectral representation of the audio data as a (masked) 2D NumPy array along with the +filename, offset, and some additional attributes:: + + >>> type(spec.get_data()) + <class 'numpy.ma.core.MaskedArray'> + >>> spec.get_filename() + 'sound.wav' + >>> spec.get_offset() + 3.0 + >>> spec.get_attrs() + {'time_res': 0.01, 'freq_min': 0.0, 'freq_res': 4.9504950495049505, 'window_func': 'hamming', 'type': 'MagSpectrogram'} + +The spectrogram classes have a number of useful methods for manipulating spectrograms, e.g., cropping in either the time or +frequency dimension or both (:meth:`crop() <ketos.audio.spectrogram.Spectrogram.crop>`), or recovering +the original waveform (:meth:`recover_waveform() <ketos.audio.spectrogram.MagSpectrogram.recover_waveform>`). +Note that annotations can be added to both waveform and spectrogram objects using the +:meth:`annotate() <ketos.audio.base_audio.BaseAudio.annotate>` method. For example:: + + >>> spec.annotate(start=3.5, end=4.6, label=1) + >>> spec.get_annotations() + label start end freq_min freq_max + 0 1 3.5 4.6 NaN NaN + +See the documentation of the :ref:`spectrogram` module for the complete list of methods. + + +Loading Multiple Audio Segments +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The :class:`AudioSelectionLoader <ketos.audio.audio_loader.AudioSelectionLoader>` and +:class:`AudioFrameLoader <ketos.audio.audio_loader.AudioFrameLoader>` classes provide +convenient interfaces for loading a selection or sequence of audio segments into memory, +one at a time. 
For example:: + + >>> from ketos.audio.audio_loader import AudioFrameLoader + >>> # specify the audio representation + >>> audio_repres = {'type':'MagSpectrogram', 'window':0.2, 'step':0.01} + >>> # create an object for loading 3-s long segments with a step size of 1.5 s (50% overlap) + >>> loader = AudioFrameLoader(frame=3.0, step=1.5, filename='sound.wav', repres=audio_repres) + >>> # load the first two segments + >>> spec1 = next(loader) + >>> spec2 = next(loader) + +See the documentation of the :ref:`audio_loader` module for more examples and details. diff --git a/docs/source/modules/audio/spectrogram.rst b/docs/source/modules/audio/spectrogram.rst index 91b15851499a89f6db3cba4b66fe2a89f6b6c95d..54c6560bae05c8407c1b44addc7fdfc0f1f0efaa 100644 --- a/docs/source/modules/audio/spectrogram.rst +++ b/docs/source/modules/audio/spectrogram.rst @@ -1,3 +1,5 @@ +.. _spectrogram: + Spectrogram =========== diff --git a/docs/source/modules/audio/waveform.rst b/docs/source/modules/audio/waveform.rst index 09bb9d80b63ecae6174a00fe28e40e8c5c7d0c4e..af2b5e7278391fa569460a156f46d5971c435cf2 100644 --- a/docs/source/modules/audio/waveform.rst +++ b/docs/source/modules/audio/waveform.rst @@ -1,3 +1,5 @@ +.. _waveform: + Waveform ======== diff --git a/docs/source/modules/data_handling/database_interface.rst b/docs/source/modules/data_handling/database_interface.rst index 59fdf9f5190e40a98ea60c2a3b44c0defc8817c0..46456bce55bd467a83d004111a75c25f4433cd09 100644 --- a/docs/source/modules/data_handling/database_interface.rst +++ b/docs/source/modules/data_handling/database_interface.rst @@ -1,3 +1,5 @@ +.. 
_database_interface: + Database Interface ================== diff --git a/docs/source/modules/data_handling/index.rst b/docs/source/modules/data_handling/index.rst index 29095e311cf7913cdf654cb92c980f7a058206a2..cdbae96393e68a42c64ae60be682b9e226a868f9 100644 --- a/docs/source/modules/data_handling/index.rst +++ b/docs/source/modules/data_handling/index.rst @@ -1,6 +1,16 @@ Data Handling -------------- +The data handling modules provide high-level interfaces for storing audio samples in +databases along with relevant metadata and annotations, and for retrieving stored data +for efficient ingestion into neural networks. +Ketos uses the `HDF5 <https://en.wikipedia.org/wiki/Hierarchical_Data_Format>`_ database +format, a file format designed to store and organize large amounts of data which is +widely used in scientific computing. +The data handling modules also provide high-level functionalities for working with +annotation data and selection tables. + + .. toctree:: :maxdepth: 2 @@ -9,3 +19,150 @@ Data Handling database_interface parsing selection_table + + +Annotation and Selection Tables +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The :ref:`selection_table` module provides functions for manipulating annotation +tables and creating selection tables. The tables are saved in .csv format and +loaded into memory as `pandas DataFrames +<https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html>`_. + +A Ketos annotation table always has the column 'label'. +For call-level annotations, the table also contains the columns 'start' +and 'end', giving the start and end time of the call measured in seconds +since the beginning of the file. +The table may also contain the columns 'freq_min' and 'freq_max', giving the +minimum and maximum frequencies of the call in Hz, but this is not required. +The user may add any number of additional columns. 
+Note that the table uses two levels of indices, the first index being the +filename and the second index an annotation identifier. + +Here is a minimal example:: + + label + filename annot_id + file1.wav 0 2 + 1 1 + 2 2 + file2.wav 0 2 + 1 2 + 2 1 + + +And here is a table with call-level annotations and a few extra columns:: + + start end label freq_min freq_max file_time_stamp + filename annot_id + file1.wav 0 7.0 8.1 2 180.6 294.3 2019-02-24 13:15:00 + 1 8.5 12.5 1 174.2 258.7 2019-02-24 13:15:00 + 2 13.1 14.0 2 183.4 292.3 2019-02-24 13:15:00 + file2.wav 0 2.2 3.1 2 148.8 286.6 2019-02-24 13:30:00 + 1 5.8 6.8 2 156.6 278.3 2019-02-24 13:30:00 + 2 9.0 13.0 1 178.2 304.5 2019-02-24 13:30:00 + +Selection tables look similar to annotation tables, except that they are not +required to have a 'label' column. Instead, they typically only have the columns +'start' and 'end', supplemented by a filename index and a selection index. + +When working with annotation tables, the first step is typically to standardize the +table format to match the format expected by Ketos. 
For example, given the annotation +table:: + + >>> import pandas as pd + >>> annot = pd.read_csv('annotations.csv') + >>> annot + source start_time stop_time species time_stamp + 0 file1.wav 7.0 8.1 humpback 2019-02-24 13:15:00 + 1 file1.wav 8.5 12.5 killer whale 2019-02-24 13:15:00 + 2 file2.wav 2.2 3.1 killer whale 2019-02-24 13:30:00 + 3 file2.wav 5.8 6.8 boat 2019-02-24 13:30:00 + 4 file2.wav 9.0 13.0 humpback 2019-02-24 13:30:00 + +we apply the :meth:`standardize() <ketos.data_handling.selection_table.standardize>` +method to obtain:: + + >>> from ketos.data_handling.selection_table import standardize + >>> annot_std, label_dict = standardize(annot, mapper={'source':'filename', 'start_time':'start', 'stop_time':'end', 'species':'label'}, return_label_dict=True) + >>> label_dict + {'boat': 1, 'humpback': 2, 'killer whale': 3} + >>> annot_std + start end label time_stamp + filename annot_id + file1.wav 0 7.0 8.1 2 2019-02-24 13:15:00 + 1 8.5 12.5 3 2019-02-24 13:15:00 + file2.wav 0 2.2 3.1 3 2019-02-24 13:30:00 + 1 5.8 6.8 1 2019-02-24 13:30:00 + 2 9.0 13.0 2 2019-02-24 13:30:00 + +Having transformed the annotation table to the standard Ketos format, we can now +use it to create a selection table. The :ref:`selection_table` module provides +a few methods for this task such as :meth:`select() <ketos.data_handling.selection_table.select>`, +:meth:`select_by_segmenting() <ketos.data_handling.selection_table.select_by_segmenting>`, and +:meth:`create_rndm_backgr_selections() <ketos.data_handling.selection_table.create_rndm_backgr_selections>`. 
+Here, we will demonstrate a simple use case of the :meth:`select() <ketos.data_handling.selection_table.select>` method:: + + >>> from ketos.data_handling.selection_table import select + >>> st = select(annot_std, length=6.0, center=True) #create 6-s wide selection windows, centered on each annotation + >>> st + label time_stamp start end + filename sel_id + file1.wav 0 2 2019-02-24 13:15:00 4.55 10.55 + 1 3 2019-02-24 13:15:00 7.50 13.50 + file2.wav 0 3 2019-02-24 13:30:00 -0.35 5.65 + 1 1 2019-02-24 13:30:00 3.30 9.30 + 2 2 2019-02-24 13:30:00 8.00 14.00 + +Based on this selection table, one can create a database of sound clips using +:meth:`create_database() <ketos.data_handling.database_interface.create_database>`, +as discussed below. + +The :ref:`selection_table` module provides several other useful methods, e.g., for querying +annotation tables. See the documentation of the :ref:`selection_table` module for more information. + + +Database Interface +~~~~~~~~~~~~~~~~~~ +The :ref:`database_interface` module provides high-level functions for managing audio data +stored in `HDF5 <https://en.wikipedia.org/wiki/Hierarchical_Data_Format>`_ databases. +For the implementation of these functionalities, we rely extensively on the +`PyTables <https://www.pytables.org/index.html>`_ package. 
+ +The :class:`AudioWriter <ketos.data_handling.database_interface.AudioWriter>` class provides a convenient +interface for saving Ketos audio objects such as :class:`Waveform <ketos.audio.waveform.Waveform>` +or :class:`Spectrogram <ketos.audio.spectrogram.Spectrogram>` to a database:: + + >>> from ketos.data_handling.database_interface import AudioWriter + >>> aw = AudioWriter('db.h5') #create an audio writer instance + >>> from ketos.audio.spectrogram import MagSpectrogram + >>> spec = MagSpectrogram.from_wav('sound.wav', window=0.2, step=0.01) #load a spectrogram + >>> aw.write(spec) #save the spectrogram to the database (by default, the spectrogram is stored under /audio) + >>> aw.close() #close the database file + +The spectrogram is saved along with relevant metadata such as the filename, +the window and step sizes used, etc. Any annotations associated with the spectrogram +are also saved. + +The spectrogram can be loaded back into memory as follows:: + + >>> import ketos.data_handling.database_interface as dbi + >>> fil = dbi.open_file('db.h5', 'r') + >>> tbl = dbi.open_table(fil, '/audio') + >>> spec = dbi.load_audio(tbl)[0] + +The :ref:`database_interface` module provides several other useful methods, including +:meth:`create_database() <ketos.data_handling.database_interface.create_database>` +for creating a database of audio samples directly from a set of .wav files. + +See the documentation of the :ref:`database_interface` module for more information. + + + +Data Feeding +~~~~~~~~~~~~ + +The :class:`ketos.data_handling.data_feeding.BatchGenerator` class provides a high-level +interface for loading waveform and spectrogram objects stored in the Ketos HDF5 database +format and feeding them in batches to a machine learning model. +See the class documentation for more information. 
\ No newline at end of file diff --git a/docs/source/modules/data_handling/selection_table.rst b/docs/source/modules/data_handling/selection_table.rst index 10fa69ef782eb02cb2d1acb92a300346c52e803c..7e4112e6a7b1de0a57204c1aafc8fc74ecd9c8d7 100644 --- a/docs/source/modules/data_handling/selection_table.rst +++ b/docs/source/modules/data_handling/selection_table.rst @@ -1,3 +1,5 @@ +.. _selection_table: + Selection Table ================ diff --git a/docs/source/modules/index.rst b/docs/source/modules/index.rst index 211a0ed710f79fd05b07bf8b8b81f6c2332abe26..2cedb71b11fb90dc74222112a96b83b5e5e4e26b 100644 --- a/docs/source/modules/index.rst +++ b/docs/source/modules/index.rst @@ -6,4 +6,6 @@ ketos API Data Handling <data_handling/index> Audio Processing <audio/index> Neural networks <neural_networks/index> - Utilities <utils> \ No newline at end of file + Utilities <utils> + + diff --git a/docs/source/modules/wavfile.rst b/docs/source/modules/wavfile.rst deleted file mode 100644 index 2f0ab215d7599ba55a65015c08076e8de0f4cdee..0000000000000000000000000000000000000000 --- a/docs/source/modules/wavfile.rst +++ /dev/null @@ -1,7 +0,0 @@ -Wavfile -======= - -.. automodule:: ketos.external.wavfile - :members: - :undoc-members: - :show-inheritance: diff --git a/ketos/audio/base_audio.py b/ketos/audio/base_audio.py index fa5e763e26db53c2f6bbf676af1010e8a13af1de..a6d14f24702c5def7b712bbfac3fa71ef5eb230f 100644 --- a/ketos/audio/base_audio.py +++ b/ketos/audio/base_audio.py @@ -461,7 +461,7 @@ class BaseAudio(): def annotate(self, **kwargs): """ Add an annotation or a collection of annotations. - Input arguments are described in :method:`audio.annotation.AnnotationHandler.add` + Input arguments are described in :meth:`ketos.audio.annotation.AnnotationHandler.add` """ if self.annot is None: #if the object does not have an annotation handler, create one! 
self.annot = AnnotationHandler() @@ -558,7 +558,7 @@ class BaseAudio(): return d - def plot(self, id=0, figsize=(5,4)): + def plot(self, id=0, figsize=(5,4), label_in_title=True): """ Plot the data with proper axes ranges and labels. Optionally, also display annotations as boxes superimposed on the data. @@ -572,6 +572,8 @@ class BaseAudio(): contains multiple, stacked data arrays. figsize: tuple Figure size + label_in_title: bool + Include label (if available) in figure title Returns: fig: matplotlib.figure.Figure @@ -594,7 +596,7 @@ class BaseAudio(): # title title = "" if filename is not None: title += "{0}".format(filename) - if label is not None: + if label is not None and label_in_title: if len(title) > 0: title += ", " title += "{0}".format(label) diff --git a/ketos/audio/spectrogram.py b/ketos/audio/spectrogram.py index 4ff48a2d93f77e110b928c69170d9728ab0a8873..99f6938f20af2558bc2a32371f7e8f0b6848c021 100644 --- a/ketos/audio/spectrogram.py +++ b/ketos/audio/spectrogram.py @@ -644,7 +644,7 @@ class Spectrogram(BaseAudio): self.data = reduce_tonal_noise(self.data, method=method, time_const_len=time_const_len) - def plot(self, id=0, show_annot=False, figsize=(5,4)): + def plot(self, id=0, show_annot=False, figsize=(5,4), label_in_title=True): """ Plot the spectrogram with proper axes ranges and labels. Optionally, also display annotations as boxes superimposed on the spectrogram. @@ -660,6 +660,8 @@ class Spectrogram(BaseAudio): Display annotations figsize: tuple Figure size + label_in_title: bool + Include label (if available) in figure title Returns: fig: matplotlib.figure.Figure @@ -680,7 +682,7 @@ class Spectrogram(BaseAudio): .. 
image:: ../../../../ketos/tests/assets/tmp/spec_w_annot_box.png """ - fig, ax = super().plot(id, figsize) + fig, ax = super().plot(id, figsize, label_in_title) x = self.get_data(id) # select image data extent = (0., self.duration(), self.freq_min(), self.freq_max()) # axes ranges @@ -1507,7 +1509,7 @@ class CQTSpectrogram(Spectrogram): """ return self.freq_ax.bins_per_oct - def plot(self, id=0, show_annot=False, figsize=(5,4)): + def plot(self, id=0, show_annot=False, figsize=(5,4), label_in_title=True): """ Plot the spectrogram with proper axes ranges and labels. Optionally, also display annotations as boxes superimposed on the spectrogram. @@ -1521,12 +1523,16 @@ class CQTSpectrogram(Spectrogram): contains multiple, stacked spectrograms. show_annot: bool Display annotations + figsize: tuple + Figure size + label_in_title: bool + Include label (if available) in figure title Returns: fig: matplotlib.figure.Figure A figure object. """ - fig = super().plot(id, show_annot, figsize) + fig = super().plot(id, show_annot, figsize, label_in_title) ticks, labels = self.freq_ax.ticks_and_labels() plt.yticks(ticks, labels) return fig diff --git a/ketos/audio/waveform.py b/ketos/audio/waveform.py index 90cad6cb139dca9b94c1383d4317fd4ec023da50..71d3df4a528b85975be20d37bc90da5d89091423 100644 --- a/ketos/audio/waveform.py +++ b/ketos/audio/waveform.py @@ -24,7 +24,7 @@ # along with this program. If not, see <https://www.gnu.org/licenses/>. # # ================================================================================ # -""" audio module within the ketos library +""" Waveform module within the ketos library This module provides utilities to work with audio data. 
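Review note (not part of the patch): the title logic gated by the new `label_in_title` argument in the `BaseAudio.plot` hunk can be sketched in isolation. `build_title` is a hypothetical helper written only for illustration; it is not a ketos function, and only the branch structure is taken from the hunk.

```python
def build_title(filename=None, label=None, label_in_title=True):
    """Hypothetical helper mirroring the title branches in the BaseAudio.plot hunk."""
    title = ""
    if filename is not None:
        title += "{0}".format(filename)
    # The label is appended only when it exists and the new flag allows it
    if label is not None and label_in_title:
        if len(title) > 0:
            title += ", "
        title += "{0}".format(label)
    return title


print(build_title("sound.wav", 1))
print(build_title("sound.wav", 1, label_in_title=False))
```

With the default `label_in_title=True` the behaviour matches the pre-patch code, so existing callers of `plot()` should see no change in their figure titles.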
diff --git a/ketos/data_handling/selection_table.py b/ketos/data_handling/selection_table.py index d0fa3169e16792585b9e66c350c9562c826b6d9f..48d5e140cc5e7d007affe0cdbf22bdf7a346fda8 100644 --- a/ketos/data_handling/selection_table.py +++ b/ketos/data_handling/selection_table.py @@ -28,9 +28,7 @@ This module provides functions for handling annotation tables and creating selection tables. - A Ketos annotation table always uses two levels of indices, the first index - being the filename and the second index an annotation identifier, and always - has the column 'label'. + A Ketos annotation table always has the column 'label'. For call-level annotations, the table also contains the columns 'start' and 'end', giving the start and end time of the call measured in seconds since the beginning of the file.
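Review note (not part of the patch): the table layout described in this docstring, and the centered windows shown in the `select()` example of the docs hunk, can be reproduced with plain pandas. This is a minimal sketch, assuming a window is placed symmetrically around the midpoint of each annotation; `center_windows` is a hypothetical helper, not the ketos implementation, and the `time_stamp` column is omitted for brevity.

```python
import pandas as pd

# Annotation table in the standard layout: (filename, annot_id) index plus label/start/end
annot_std = pd.DataFrame(
    {
        "filename": ["file1.wav", "file1.wav", "file2.wav", "file2.wav", "file2.wav"],
        "annot_id": [0, 1, 0, 1, 2],
        "start": [7.0, 8.5, 2.2, 5.8, 9.0],
        "end": [8.1, 12.5, 3.1, 6.8, 13.0],
        "label": [2, 3, 3, 1, 2],
    }
).set_index(["filename", "annot_id"])


def center_windows(df, length):
    """Hypothetical helper: fixed-length windows centered on each annotation."""
    out = df.copy()
    midpoint = (df["start"] + df["end"]) / 2
    out["start"] = midpoint - length / 2
    out["end"] = out["start"] + length
    # Selection tables use a selection index in place of the annotation index
    out.index = out.index.set_names(["filename", "sel_id"])
    return out


st = center_windows(annot_std, length=6.0)
print(st.round(2))
```

The sketch does not clip windows at the start of the file, which is consistent with the negative start time (-0.35) shown for the first `file2.wav` selection in the documented output.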