Eric Carmichael requested to merge 334-dataset-caching into develop Feb 22, 2020

@ mention of reviewers

@jimmykodes

A brief description of the purpose of the changes contained in this PR.

Adds dataset caching to speed up compute worker runs. (issue #334 (closed))

Known issues/discussion

~~This caches everything. We could probably skip caching submissions, and maybe some other things?~~ Now only caches input_data and reference_data.
Should I move these utility functions somewhere else? The structure is getting kind of gross. A simple "entry point" module (e.g. worker.py) with some peripheral modules (e.g. utils.py, run.py?) seems easier to grok.

A checklist for hand testing

submissions should process more quickly with large datasets; ideally test with large AutoDL datasets
cache is cleaned when its size exceeds MAX_CACHE_DIR_SIZE_GB
changing the MAX_CACHE_DIR_SIZE_GB env var works properly (the str value is converted to float for comparison)
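
For anyone hand testing the last two items, here is a rough sketch of the kind of size check they describe. It is an illustration only, not the code in this PR: the function names, the 10 GB default, and the "wipe the whole cache once over the limit" policy are all assumptions. Only the MAX_CACHE_DIR_SIZE_GB env var name comes from the description above.

```python
import os
import shutil

# Hypothetical default; the real default in the worker may differ.
DEFAULT_MAX_CACHE_DIR_SIZE_GB = 10.0


def get_max_cache_size_gb(environ=os.environ):
    """Read the limit from the environment.

    Env vars are always strings, so convert to float before comparing.
    """
    return float(environ.get("MAX_CACHE_DIR_SIZE_GB", DEFAULT_MAX_CACHE_DIR_SIZE_GB))


def get_dir_size_gb(path):
    """Sum the sizes of all regular files under path, in GB."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted.
            if not os.path.islink(file_path):
                total_bytes += os.path.getsize(file_path)
    return total_bytes / (1024 ** 3)


def clean_cache_if_needed(cache_dir, max_size_gb):
    """Empty cache_dir if it is over the limit. Returns True if cleaned."""
    if get_dir_size_gb(cache_dir) <= max_size_gb:
        return False
    # Assumed policy: remove every entry but keep the cache dir itself.
    for entry in os.listdir(cache_dir):
        entry_path = os.path.join(cache_dir, entry)
        if os.path.isdir(entry_path) and not os.path.islink(entry_path):
            shutil.rmtree(entry_path)
        else:
            os.remove(entry_path)
    return True
```

To exercise the str-to-float conversion by hand, set MAX_CACHE_DIR_SIZE_GB to something small like "0.001" and confirm the cache is cleared after a run with a dataset larger than about 1 MB.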

Checklist