Remove include_attr and attrs arguments from create_database
The create_database function has two arguments, include_attr and attrs, that I find completely unnecessary and redundant with other ketos functions.
These two arguments let us create a database and pass extra columns that are not mandatory in ketos. In short, optional columns.
However, I think it is extremely easy to filter out unwanted columns from the selection or annotation table, either with pandas or with ketos itself. For instance, the standardize function has a trim_table argument that removes extra fields. I think it is safe to say that someone who sets trim_table to True does not want these extra fields, which makes include_attr irrelevant. The reverse also holds: someone who sets trim_table to False either wants those extra columns or doesn't have any extra columns to start with. And in the rare case where the user wants some of the extra columns but not all, that is also extremely easy to do with pandas:
df = df.drop(columns=['columns', 'we', 'dont', 'want'])
This makes the attrs argument irrelevant as well. Effectively, this change would mean that the dataframe passed to the create_database function is treated as complete, and ketos creates a table with all the columns contained in the dataframe.
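To make the proposal concrete, here is a minimal pandas-only sketch of the three cases described above (keep everything, keep only the mandatory columns, keep some of the optional ones). The column names and the MANDATORY list are assumptions for illustration and are not taken from the ketos code. The idea is that whichever dataframe results would be passed as-is to create_database.

```python
import pandas as pd

# Columns treated as mandatory in this illustration (an assumption, not read from ketos).
MANDATORY = ["filename", "start", "end", "label"]

# Toy selection table with three optional columns (hypothetical names).
df = pd.DataFrame({
    "filename": ["a.wav", "b.wav"],
    "start": [0.0, 1.5],
    "end": [3.0, 4.5],
    "label": [1, 0],
    "snr": [12.3, 8.1],
    "comment": ["clear call", "noisy"],
    "analyst": ["x", "y"],
})

# Case 1: the user wants every column, so the table is used unchanged.
keep_all = df

# Case 2: the user wants no optional columns (similar in spirit to trim_table=True).
trimmed = df[MANDATORY]

# Case 3: the user wants some optional columns but not all.
# drop() returns a new dataframe, so the original table is left untouched.
partial = df.drop(columns=["comment", "analyst"])

print(list(partial.columns))
```

In all three cases the column selection happens before the database is created, so create_database would not need to know which columns are optional.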
I don't think every ketos function needs to do everything and cover all bases. I think it is perfectly reasonable to make the user responsible for some of the work of preparing and handling the data, as this will make our functions clearer, simpler to use, and simpler to maintain, with fewer conditions etc. In particular, I think we should encourage our users to rely on pandas.
The reason I am writing an issue rather than opening a merge request is that, even though this seems like a simple modification, I think the code runs a lot deeper and some care is required when changing it.
Thoughts?