Option to specify relative sample occurrence in batch generator (#112) · Issues · public_projects / ketos

In an active-learning scenario where a human validates model outputs and identifies certain samples as particularly important, it would be useful to have a mechanism for over-sampling individual samples during training. One way to achieve this would be to assign each sample an integer indicating how many times that sample should appear in a training epoch. By default, the value would be 1, i.e., each sample appears once per epoch. This integer (which we could call `multiplicity`) could be saved to a column in the HDF5 table. We could then point the BatchGenerator to this column using an argument like `mult_field` or something similar.
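To make the idea concrete, here is a minimal sketch of how per-sample multiplicities could be turned into the row indices for one epoch. It uses plain NumPy and does not touch the ketos API; the function names `epoch_indices` and `batches` are hypothetical, and the multiplicity values are assumed to have already been read from the table column into an array.

```python
import numpy as np

def epoch_indices(multiplicity, shuffle=True, seed=None):
    """Return row indices for one epoch, repeating row i multiplicity[i] times.

    Hypothetical helper, not part of ketos. A multiplicity of 0 drops the row.
    """
    mult = np.asarray(multiplicity, dtype=int)
    if np.any(mult < 0):
        raise ValueError("multiplicity values must be non-negative")
    # np.repeat repeats each index according to its own count
    idx = np.repeat(np.arange(len(mult)), mult)
    if shuffle:
        rng = np.random.default_rng(seed)
        rng.shuffle(idx)
    return idx

def batches(indices, batch_size):
    """Yield consecutive slices of the index array; the last one may be shorter."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]

# Example: the second sample is flagged as important and appears 3 times
mult = [1, 3, 1, 2]
print(epoch_indices(mult, shuffle=False))  # [0 1 1 1 2 3 3]
```

With all multiplicities equal to 1 this reduces to the current behaviour, so the option could be added without changing the default. Whether repeated samples should be allowed to land in the same batch is a design choice the sketch leaves open.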

@fsfrazao, your thoughts?