repo size
Closed
Look into reducing the size of the Ketos gitlab repository
The `ketos` folder on my computer takes up 3.2G of disk space. The main culprits are the `.git` folder (2.7G), the `docs` folder (400M), and the `ketos` folder (150M). Within the `docs` folder, most of the disk space comes from the data used for the tutorials. So not much we can do about this. I am not sure what we can do to reduce the `.git` folder size. Perhaps getting rid of some of the old release_vX.Y.Z branches would help? Do we want to keep all of those?
We could put the .zip files somewhere else so that if a user wants to follow along they will still download this bundle with everything necessary to run the tutorial. The repo itself could just have the html that sphinx will use when building the docs. However, that would make the process of modifying the tutorial a little more cumbersome:
- download the .zip file from wherever it is hosted
- modify the notebook/files
- replace the html in the repo
- zip everything again
- upload the .zip to wherever it should be hosted, updating the available version
- delete the development copy of the .zip from the repo
- build the docs
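The "zip everything again" step could be scripted so it is less error-prone. Below is a minimal sketch, assuming each tutorial lives in its own folder; the function name `bundle_tutorial` and the folder layout are hypothetical, not something that exists in ketos.

```python
import zipfile
from pathlib import Path


def bundle_tutorial(tutorial_dir, zip_path):
    """Zip every file under tutorial_dir (hypothetical layout).

    Paths inside the archive are stored relative to the parent of
    tutorial_dir, so the archive unpacks into a single top-level folder.
    """
    tutorial_dir = Path(tutorial_dir).resolve()
    zip_path = Path(zip_path).resolve()
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(tutorial_dir.rglob("*")):
            # Skip directories, and skip the archive itself in case it is
            # being written inside the tutorial folder.
            if path.is_file() and path != zip_path:
                arcname = path.relative_to(tutorial_dir.parent).as_posix()
                zf.write(path, arcname)
    return zip_path
```

Usage would be something like `bundle_tutorial("docs/source/tutorials/my_tutorial", "my_tutorial.zip")`, where the path is only an example. Uploading the result and deleting the development copy would still be manual steps.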
Edited by Fabio Frazao
I could add this step to the new release checklist so we don't forget: https://gitlab.meridian.cs.dal.ca/public_projects/ketos/-/wikis/New-release-instructions
Here is the thing: I am not sure why Xuhui and I received errors when fetching everything. I think it might be some limitation in the git configuration that the user can change, but I played around with it for a long time and couldn't fix it. I don't know why cloning worked while fetching didn't.
Anyway, wouldn't moving to a different repo still have the same problem of being very large?
Good point, Bruno.
Then I would propose that we keep the ipynb file and the html file in the ketos gitlab repo, and keep the zip file and the data somewhere else.
The ipynb file will serve as the "single source of truth".
The html file and the zip file will have to be updated every time changes are made to the ipynb file.
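To avoid forgetting that update, a small check could flag derived files that are out of date. This is only a sketch: the function name `stale_outputs` is hypothetical, and it compares file modification times, which is an assumption that only holds in the working copy where the notebook was edited (git does not preserve modification times across clones).

```python
from pathlib import Path


def stale_outputs(notebook, derived):
    """Return the derived files that are missing or older than the notebook.

    `derived` is a list of paths (e.g. the html and zip files) that should
    be regenerated whenever the notebook changes.
    """
    nb_mtime = Path(notebook).stat().st_mtime
    stale = []
    for name in derived:
        p = Path(name)
        # A missing file counts as stale; so does one last written
        # before the notebook's most recent modification.
        if not p.exists() or p.stat().st_mtime < nb_mtime:
            stale.append(p)
    return stale
```

Something like this could be run before building the docs, for example `stale_outputs("tutorial.ipynb", ["tutorial.html", "tutorial.zip"])` with the real file names substituted.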
I have now created a new ketos1 archive repo (https://gitlab.meridian.cs.dal.ca/data_analytics_dal/packages/ketos1) which contains all the v1 release branches. These branches have been removed from the main ketos repo, which now only contains three branches:
- master
- development
- nn_interface_v2
I just did a clean pull of the repo and it still takes up more than 3 GB of disk space (3.1 GB to be precise). I tried deleting the master branch locally, but this did not help. So it seems removing all the old branches had no effect :-(
Suggestions as to what else we could try would be welcome. @matt_s , I seem to remember you had a suggestion about squashing/rebasing? How does this work and what does it do?
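One possible explanation (an assumption, not verified on this repo): deleting branches only frees objects that are no longer reachable from any remaining branch or tag, so large files committed at some point in the history of master or development are still downloaded with every clone. The sketch below lists the largest blobs anywhere in the history using `git rev-list --objects --all` piped into `git cat-file --batch-check`; the function names are hypothetical.

```python
import subprocess


def parse_batch_check(output, top=10):
    """Parse `git cat-file --batch-check` lines of the form
    '<type> <sha> <size> <path>' into (size, path) pairs for blobs,
    largest first. Sizes are uncompressed, not on-disk pack sizes."""
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 3)  # maxsplit keeps paths with spaces intact
        if len(parts) == 4 and parts[0] == "blob":
            blobs.append((int(parts[2]), parts[3]))
    return sorted(blobs, reverse=True)[:top]


def largest_blobs(repo=".", top=10):
    """List the largest blobs reachable from any ref in the repo."""
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    fmt = "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"
    checked = subprocess.run(
        ["git", "cat-file", fmt],
        cwd=repo, input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(checked, top)
```

Running `largest_blobs("path/to/ketos")` should show whether tutorial data or test assets that were committed in the past account for most of the 2.7G.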
Edited by Oliver Kirsebom
@kirsebom I think a git rebase can be used to flatten the commit history into a single commit containing the current snapshot, and remove the old commit history to reduce size. I've never tried it, but this Stack Overflow post looks like it might be relevant: https://stackoverflow.com/questions/24153548/how-do-i-reduce-the-size-of-a-bloated-git-repo-by-non-interactively-squashing-al
Thanks for sharing the link @matt_s .
I suggest we do as follows:
- Move the tutorials to a separate git repo (ketos_tutorials)
- Reduce the size of the test/assets/ folder
- Once ready for release of ketos v2.0, merge development branches into master
- Remove all branches but the master branch
- Squash all commits, as described here
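For the last step, one common way to squash everything is to create an orphan branch from the current snapshot and replace master with it. The sketch below is not necessarily the approach from the linked post; the function names and the temporary branch name are hypothetical. It only prints the commands unless `dry_run=False`, since the operation rewrites history and cannot easily be undone.

```python
import subprocess


def squash_commands(branch="master", tmp_branch="squashed",
                    message="Squash history"):
    """Build the git commands that replace `branch` with a single commit
    holding the current working tree (orphan-branch approach)."""
    return [
        ["git", "checkout", "--orphan", tmp_branch],  # new branch, no parents
        ["git", "add", "-A"],                         # stage current snapshot
        ["git", "commit", "-m", message],
        ["git", "branch", "-D", branch],              # drop the old history
        ["git", "branch", "-m", branch],              # rename current branch
        ["git", "reflog", "expire", "--expire=now", "--all"],
        ["git", "gc", "--prune=now", "--aggressive"], # delete unreachable objects
    ]


def run_all(commands, repo=".", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=repo, check=True)
```

Calling `run_all(squash_commands())` just shows what would be run. The result would still need a `git push --force origin master`, which a protected branch may reject, and the server-side repo only shrinks once the old objects have been garbage-collected there too. It would be safest to try this on a fresh clone first.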
@fsfrazao , @bpadovese any thoughts?