Storage and clean up
Implementing a storage quota and effectively managing capacity could be crucial for enhancing Codabench's scalability and ensuring long-term service availability in production.
Features
Deleting files from MinIO
@OhMaley: Add a button on the admin analytics page to manually start a script that removes all orphan files.
- Nothing is ever erased from MinIO. Add a button to remove all orphan files manually (#1523, #1575)
- To check: #1319
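The core of an orphan clean-up can be sketched as a set difference between the object keys present in storage and the keys still referenced by the database. This is only an illustration: the function name `find_orphan_keys` and the sample keys are hypothetical, and it deliberately avoids any real MinIO or Django call.

```python
from typing import Iterable, Set


def find_orphan_keys(stored_keys: Iterable[str], referenced_keys: Iterable[str]) -> Set[str]:
    """Return the keys present in storage that no database row references."""
    return set(stored_keys) - set(referenced_keys)


# Hypothetical data: in practice, `stored` would come from listing the bucket
# and `referenced` from the file fields of the database models.
stored = ["dataset/a.zip", "submission/b.zip", "submission/old.zip"]
referenced = ["dataset/a.zip", "submission/b.zip"]

orphans = find_orphan_keys(stored, referenced)
print(sorted(orphans))
```

A real script would probably want a dry-run mode that only reports the orphans, so that an admin can review them before anything is deleted.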
Quota
We need to:
- Impose a storage quota on each user (e.g. 15 GB, platform setting) #1254
  - Idea: separate quota for each joined competition
- Admins should be able to manually increase the quota of any user (in the Django admin interface) #1254
- Have a user interface to manage and delete submissions. Solved by: #918
  - Currently, submissions can't be removed if they are part of a benchmark (which makes sense, to avoid spamming submissions when there is a daily limit).
  - Solution: submissions count toward the participant's quota, but participants can remove a submission (even if it's part of a benchmark) after a delay of e.g. 7 days.
  - WARNING: disallow removing leaderboard submissions
  - Other idea: organizer phase setting to allow or disallow deleting submissions
- Be able to delete tasks when deleting a benchmark #810. Partially solved by: #918
- Be able to re-use past submissions #632 (closed)
- A failed submission, once deleted, still remains as Data (it is removed from the leaderboard/submission panel but can still be found under submissions in resources)
- The dumps should be counted too
- Refresh the quota figure automatically when a submission/dataset is uploaded in the resources interface
- Readable format for the quota in the user table in the admin interface, e.g. MB
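The quota rules listed above can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration, not Codabench code: the function names, the decimal units (1 GB = 1000^3 bytes), and the reading of the deletion rule (leaderboard submissions are never deletable, benchmark submissions only after the delay).

```python
from datetime import datetime, timedelta

DEFAULT_QUOTA_BYTES = 15 * 1000**3  # 15 GB, the example value from the list above
DELETION_DELAY = timedelta(days=7)  # example delay from the list above


def format_size(num_bytes: int) -> str:
    """Render a byte count with a decimal unit, for display in an admin table."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1000 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1000


def can_upload(used_bytes: int, upload_bytes: int,
               quota_bytes: int = DEFAULT_QUOTA_BYTES) -> bool:
    """True if the upload fits in the user's remaining quota."""
    return used_bytes + upload_bytes <= quota_bytes


def can_delete_submission(created_at: datetime, now: datetime,
                          on_leaderboard: bool, in_benchmark: bool) -> bool:
    """Apply the proposed rule: never delete leaderboard submissions, and
    delete benchmark submissions only once the delay has elapsed."""
    if on_leaderboard:
        return False
    if not in_benchmark:
        return True
    return now - created_at >= DELETION_DELAY


print(format_size(1_500_000))
print(can_upload(14 * 1000**3, 2 * 1000**3))
```

If the quota becomes adjustable per user, `quota_bytes` would be read from the user record rather than from the platform default.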
Light/archive mode after competition end
Option to clean up competitions once they are completed, for instance by keeping only leaderboard submissions.
Automatic cleaning
- Failed submissions, orphan datasets, useless benchmarks, etc. should be marked and cleaned automatically. Users could manually unmark an object to keep it (from the resources interface). Deleted objects should be kept in a trash can for one month. This feature needs to be discussed.
- #302 (closed) (schedule)
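Since this feature is still open for discussion, the following is only a sketch of how the trash-can retention could be evaluated. The mapping of object names to trashing dates, the function name `expired_trash`, and the approximation of "one month" as 30 days are all assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List

RETENTION = timedelta(days=30)  # "one month" approximated as 30 days (assumption)


def expired_trash(trash: Dict[str, datetime], now: datetime) -> List[str]:
    """Return the names of trashed objects whose retention period has elapsed."""
    return sorted(name for name, trashed_at in trash.items()
                  if now - trashed_at >= RETENTION)


# Hypothetical trash can content: object name -> date it was moved to the trash.
trash = {
    "failed_submission_12.zip": datetime(2024, 1, 1),
    "orphan_dataset_3.zip": datetime(2024, 2, 20),
}

print(expired_trash(trash, now=datetime(2024, 3, 1)))
```

A scheduled job could run such a check daily and permanently delete only the objects it returns, leaving anything the user has unmarked or restored untouched.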
Monitoring and statistics
- Fix analytics page download CSV buttons: #649 (closed). Solved by: #988
- Improve analytics page to include storage statistics. Migrate analytics features from CodaLab. Solved by: #1248
- Date of storage calculation, as in CodaLab. Solved by: #1265
Per benchmark submission size limit
Migrate the CodaLab feature to Codabench. This may be less important once the quota feature is implemented.
- Limit the size of submissions
- Django admin interface to define the limit independently for each benchmark
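A per-benchmark limit boils down to a lookup followed by a comparison. In this minimal sketch, the `limits` mapping, the benchmark names, and the convention that `None` (or a missing entry) means "no limit" are assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical per-benchmark limits in bytes; None means no limit.
limits: Dict[str, Optional[int]] = {
    "benchmark-a": 300 * 1000**2,  # 300 MB
    "benchmark-b": None,
}


def submission_allowed(benchmark: str, size_bytes: int,
                       limits: Dict[str, Optional[int]]) -> bool:
    """True if the submission size is within the benchmark's limit, if any."""
    limit = limits.get(benchmark)
    return limit is None or size_bytes <= limit


print(submission_allowed("benchmark-a", 500 * 1000**2, limits))
```

Checking the size before the upload is stored would avoid having to clean up oversized files afterwards.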
Interface
@ihsaan-ullah
Improve resources interface
The interface for managing datasets, submissions and tasks could be improved. Improving the resources interface also helps participants manage their storage space.
- Add the size of the files. Solved by #895
- Submission details: in which benchmark. Solved by #896
- Make benchmark name a clickable link. Solved by #896
- Submission: name it using the filename. Solved by #896
- Rename datasets tab to "Datasets and programs" #840. Solved by #898
- Have a separate tab for submissions #596 (closed) (and show the title given by the user for datasets and programs, and other considerations in the last comment of #896). Solved by #899
- Public datasets should have their own tab or a "show/hide" button (see #758). Solved by #906
- Public datasets should not have a delete button if they do not belong to us (see #758). Solved by #906
- Submission detail: make created_by a clickable link to the user profile. Solved by #910
- The search bar for the tab "submissions" should search by filename. Solved by #917
- We need more filtering options. For instance, in the tab "submissions", we should be able to filter by a specific competition. Solved by #917
- UI for quota management and cleanup (a "progress bar" for the storage quota). Solved by #918
- Delete unused tasks. Solved by #918
- Delete unused datasets and programs. Solved by #918
- Delete unused submissions. Solved by #918
- Delete failed submissions. Solved by #918
- Being able to order rows by clicking on the column name (e.g. size). Solved by #919
As shown in the screenshot below, we can manage and remove submissions from the "resources" interface. However, we do not track the benchmark on which each submission was made, so it is hard to know what we are removing.
Some issues
- #145 (closed)
- #637 (closed)
- #552 (closed)
- #862 (impose a limit on dumps for each benchmark)
- #163 (closed)