Update create_database function to read data from tar files
We could update the create_database function so that it can read audio data from a tar file and build the database from it. Python's tarfile package (https://docs.python.org/3/library/tarfile.html) can be used for this.
For reference: I modified the current create_database function (ketos version 2.4.1) to add this functionality. A working example, which I implemented for the HALLO DFO dataset, can be found here: https://github.com/coastal-science/detectors_and_classifiers/blob/main/KW_detector_multiclass/code/create_db/prepare_dfo_db.ipynb
This implementation extracts one audio file at a time, uses the usual functions to add it to the database, removes the extracted file, and then moves on to the next file.
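To make the extract, process, delete loop concrete, here is a minimal sketch using only the standard library. It is not the code from the notebook linked above: the function name process_tar_one_file_at_a_time, the process_file callback (a stand-in for whatever ketos code handles a single audio file), and the default ".wav" filter are all assumptions made for illustration.

```python
import os
import shutil
import tarfile
import tempfile


def process_tar_one_file_at_a_time(tar_path, process_file, extensions=(".wav",)):
    """Extract each matching member, pass its path to process_file, then delete it.

    process_file is a hypothetical callback standing in for the per-file
    database code; its return values are collected and returned as a list.
    """
    results = []
    with tarfile.open(tar_path, "r:*") as tar, tempfile.TemporaryDirectory() as tmp_dir:
        for member in tar:
            # Skip directories, links, and files that are not audio.
            if not member.isfile() or not member.name.lower().endswith(extensions):
                continue
            # Copy the member's bytes out by hand so the archive cannot
            # write outside tmp_dir through an unexpected member path.
            out_path = os.path.join(tmp_dir, os.path.basename(member.name))
            with tar.extractfile(member) as source, open(out_path, "wb") as target:
                shutil.copyfileobj(source, target)
            try:
                results.append(process_file(out_path))
            finally:
                # Only one extracted file exists on disk at any time.
                os.remove(out_path)
    return results
```

With this structure the extra disk space needed is bounded by the largest single audio file rather than by the size of the whole archive.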
We could either keep this modified create_database() as a separate function, perhaps named something like create_database_from_tar(), or add a flag-like argument to the existing function that acts as a condition for whether the user wants to create the database from a tar file.
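The two interface options are not mutually exclusive, as the sketch below tries to show. Everything in it is hypothetical: create_database_sketch is a toy stand-in that only lists the .wav files it would process, it does not mirror the real ketos signature, and the from_tar argument name is just one possible choice for the flag.

```python
import os
import tarfile


def create_database_sketch(data_source, from_tar=False):
    """Toy stand-in: return the sorted .wav file names that would be processed.

    from_tar is the hypothetical flag; the real function takes more arguments.
    """
    if from_tar:
        # Flag set: read the member names from the archive.
        with tarfile.open(data_source, "r:*") as tar:
            names = [os.path.basename(m.name) for m in tar if m.isfile()]
    else:
        # Default behaviour: read the file names from a directory.
        names = os.listdir(data_source)
    return sorted(n for n in names if n.lower().endswith(".wav"))


def create_database_from_tar_sketch(tar_path):
    """Separate-function option, implemented as a thin wrapper around the flag."""
    return create_database_sketch(tar_path, from_tar=True)
```

A wrapper like this would keep a single implementation while still offering the more discoverable function name.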