Fix submission stuck at scoring
Issue solved
Summary
The problem was that these two actions start simultaneously:
- Uploading the output of the ingestion program
- Downloading it in the scoring program
If ingestion and scoring run on two separate workers and the output of the ingestion step is large, the scoring program downloads a file that is not completely uploaded yet, which results in a BadZipFile error.
This pull request:
- Adds comments to clarify compute_worker.py
- Adds retries and waiting
Note that this may not always work: the maximum wait is 10 minutes, so an upload that takes longer than that can still fail.
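For reviewers, the retry-and-wait idea can be sketched roughly as follows. This is not the code in compute_worker.py: `fetch_zip_with_retries`, its `fetch` callable, and the 10-second delay are hypothetical names and values chosen for illustration. Only the 10-minute cap is taken from this description.

```python
import io
import time
import zipfile


def fetch_zip_with_retries(fetch, max_wait=600.0, delay=10.0,
                           sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it returns bytes that form a valid zip archive.

    fetch is a hypothetical callable returning the bytes currently
    available for the ingestion output. max_wait defaults to 600 seconds
    (the 10-minute cap); delay is an assumed pause between attempts.
    """
    deadline = clock() + max_wait
    while True:
        data = fetch()
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                # testzip() returns None when every member passes its CRC check.
                if archive.testzip() is None:
                    return data
        except zipfile.BadZipFile:
            # Most likely the upload has not finished yet; try again later.
            pass
        if clock() >= deadline:
            raise zipfile.BadZipFile(
                "ingestion output still incomplete after the maximum wait"
            )
        sleep(delay)
```

A partially uploaded zip usually lacks its end-of-central-directory record, so opening it raises BadZipFile; the sketch treats that as "not ready yet" rather than as a failure until the deadline passes.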
Checklist
- Code review by me
- Hand tested by me
- I'm proud of my work
- CircleCi tests are passing
- Ready to merge