Created May 18, 2022 by Fabio Frazao@fsfrazaoOwner

create_database indexes separate annotations table incorrectly

When using the database_interface.create_database function to create a database with a separate annotations table, the resulting annotations table has incorrect values in the "data_index" column.

This layout is used when each data sample in the data table can have multiple annotations in the data_annot table. The data_index field is supposed to act as a key linking each annotation to its corresponding row in the data table.
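To illustrate the intended relationship, here is a minimal sketch that uses plain Python lists as hypothetical stand-ins for the two tables (it does not use the ketos or PyTables API). Each annotation row carries a data_index pointing at the position of its row in the data table, so grouping by that value should recover the annotations that belong to each data sample:

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the "data" and "data_annot" tables.
data_rows = ["segment 0", "segment 1", "segment 2"]
annot_rows = [
    {"data_index": 0, "label": 1},
    {"data_index": 0, "label": 1},
    {"data_index": 2, "label": 1},
]


def group_annotations(annot_rows):
    """Group annotation rows by the data row they point to."""
    groups = defaultdict(list)
    for row in annot_rows:
        groups[row["data_index"]].append(row)
    return dict(groups)


by_row = group_annotations(annot_rows)
for i, name in enumerate(data_rows):
    # A data row with no matching key simply has no annotations.
    print(name, "->", len(by_row.get(i, [])), "annotation(s)")
```

If data_index is wrong, this kind of lookup attaches annotations to the wrong data sample, which is why the column values matter.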

However, I found that the values of this column don't actually match the data table.

Here is a minimal example to reproduce the issue:

import pandas as pd
from ketos.data_handling import selection_table as sl
import ketos.data_handling.database_interface as dbi
from ketos.audio.waveform import Waveform

sample_audio = Waveform.cosine(rate=1000, frequency=1000, duration=30.0)
sample_audio.to_wav("sample_audio.wav")
filelist = pd.DataFrame([{"filename":"sample_audio.wav", "duration":30.0}])


annot = pd.DataFrame([{"filename":"sample_audio.wav", "start":2.0, "end":3.0, "label":"A"},
                     {"filename":"sample_audio.wav", "start":5.0, "end":6.0, "label":"A"},
                     {"filename":"sample_audio.wav", "start":21.0, "end":22.0, "label":"A"},
                     {"filename":"sample_audio.wav", "start":25.0, "end":27.0, "label":"A"}])


annot_std = sl.standardize(table=annot, labels=["A"],trim_table=False, start_labels_at_1=True)

selection_table = sl.select_by_segmenting(files=filelist, length=10, annotations=annot_std, step=None, discard_empty=False, pad=False)

audio_repr = {'duration': 10.0,
              'rate': 1000,
              'window': 0.051,
              'step': 0.01955,
              'freq_min': 0,
              'freq_max': 500,
              'window_func': 'hamming',
              'type': 'MagSpectrogram'}

dbi.create_database(output_file="db.h5", data_dir="./",
                    dataset_name='train/', selections=selection_table[0], annotations=selection_table[1],
                    audio_repres=audio_repr, include_attrs=True, unique_labels=[1], include_label=False)

Once db.h5 is created, reading the `data_index` column returns:

import tables
db = tables.open_file("db.h5",'r')
tbl = db.get_node("/train/data_annot")
tbl.col("data_index")

array([0, 0, 0, 0], dtype=uint32)

But what I expected was:

array([0, 0, 2, 2], dtype=uint32)
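The expected values can be derived by hand from the example. The sketch below assumes a single file cut into consecutive, non-overlapping 10 s segments starting at 0 s, with empty segments kept (discard_empty=False) and every annotation lying entirely inside one segment. The helper name expected_data_index is hypothetical and is not part of ketos:

```python
def expected_data_index(starts, segment_length):
    """Return the index of the segment that contains each annotation start.

    Assumes consecutive, non-overlapping segments of equal length that
    begin at 0 s, so segment k covers [k * length, (k + 1) * length).
    """
    return [int(start // segment_length) for start in starts]


# Annotation start times from the example above, in seconds.
starts = [2.0, 5.0, 21.0, 25.0]
print(expected_data_index(starts, 10.0))  # prints [0, 0, 2, 2]
```

The first two annotations fall in the 0-10 s segment (row 0), the 10-20 s segment (row 1) has none, and the last two fall in the 20-30 s segment (row 2).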

Edited May 18, 2022 by Fabio Frazao