Created Sep 01, 2022 by Bruno Padovese (@bpadovese, Owner)

Remove include_attr and attrs arguments from create_database

The create_database function has two arguments, include_attr and attrs, that I find completely unnecessary and redundant with other ketos functions.

These two arguments let us create a database and pass along extra columns that are not mandatory in ketos. In short, optional columns.

However, I think it is extremely easy to filter out unwanted columns from the selection or annotation table, either with pandas or with ketos itself. For instance, the standardize function has a trim_table argument that removes extra fields. It is safe to say that someone who sets trim_table to True does not want these extra fields, which makes include_attr irrelevant. The reverse also holds: someone who sets trim_table to False either wants those extra columns or doesn't have any to start with. And in the rare case where the user wants some of the extra columns but not all, that is also extremely easy to do with pandas:

df = df.drop(columns=['columns', 'we', 'dont', 'want'])

This makes the attrs argument irrelevant as well. Effectively, this change would mean that the dataframe passed to create_database is treated as complete, and ketos creates a table with all the columns it contains.
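To make the pandas side of this concrete, here is a minimal sketch of preparing a table before handing it to create_database. The table contents and the column names (including the extra "comment" and "snr" columns) are made up for illustration, and no ketos call is shown, since the exact signature after this change is still open.

```python
import pandas as pd

# Illustrative annotation table. The column names and values are
# assumptions made for this example, not a ketos requirement list.
annotations = pd.DataFrame({
    "filename": ["a.wav", "b.wav"],
    "start": [0.0, 1.5],
    "end": [1.0, 2.5],
    "label": [1, 0],
    "comment": ["faint", "clear"],  # extra column we do not want stored
    "snr": [3.2, 11.7],             # extra column we do want stored
})

# Option 1: drop the columns we do not want.
# drop() returns a new DataFrame, so the result must be assigned.
without_comment = annotations.drop(columns=["comment"])

# Option 2: keep an explicit list of columns instead.
wanted = ["filename", "start", "end", "label", "snr"]
selected = annotations[wanted]

# Either result is the "complete" table that would then be passed on
# as-is, with every remaining column written to the database.
print(list(without_comment.columns))
```

Both options leave the original dataframe untouched, so the user can keep the full table around and pass only the trimmed copy to ketos.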

I don't think every ketos function needs to do everything and cover all bases. It is perfectly reasonable to make the user responsible for some of the work of preparing and handling the data, as this makes our functions clearer, simpler to use, and simpler to maintain, with fewer conditional branches. In particular, I think we should encourage our users to rely on pandas.

The reason I am writing an issue rather than opening a merge request is that, even though this looks like a simple modification, I think the relevant code runs a lot deeper and some care is required when changing it.

Thoughts?

Edited Sep 01, 2022 by Bruno Padovese