No error logs when the compute worker crashes
TODOs:
- Show logs when the compute worker crashes because of an invalid docker image
- Show logs when the compute worker crashes because of the time limit
Reproduce error:
- execution time limit exceeded
- wrong docker image (does not exist)
When the compute worker crashes, the submission stays stuck in the Running
state and no logs are forwarded to the UI.
When the worker hits the ExecutionTimeLimitExceeded error (see below), the program simply stops and no logs are returned to the platform for the users.
To reproduce, edit a benchmark to set a very low execution time limit, then make a submission.
Traceback (most recent call last):
  File "/compute_worker.py", line 687, in start
    loop.run_until_complete(gathered_tasks)
  File "/usr/local/lib/python3.8/asyncio/base_events.py", line 603, in run_until_complete
    self.run_forever()
  File "/usr/local/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
    self._run_once()
  File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1823, in _run_once
    event_list = self._selector.select(timeout)
  File "/usr/local/lib/python3.8/selectors.py", line 468, in select
    fd_event_list = self._selector.poll(timeout, max_ev)
  File "/compute_worker.py", line 158, in alarm_handler
    raise ExecutionTimeLimitExceeded
compute_worker.ExecutionTimeLimitExceeded
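The traceback shows the exception raised by the alarm handler escaping loop.run_until_complete with nothing catching it. One possible direction for a fix is sketched below: wrap the loop call, catch the exception, and report a final status before exiting. This is only a sketch and not the actual compute_worker.py code. The names run_with_time_limit and report, and the status strings "Failed" and "Finished", are assumptions made for illustration; only ExecutionTimeLimitExceeded and alarm_handler come from the traceback. It relies on SIGALRM, so it is Unix only and must run in the main thread.

```python
import asyncio
import signal


class ExecutionTimeLimitExceeded(Exception):
    """Raised by the alarm handler when the time limit is reached."""


def alarm_handler(signum, frame):
    raise ExecutionTimeLimitExceeded


def run_with_time_limit(coro_factory, time_limit, report):
    """Run a coroutine under a time limit and always report a final status.

    `report(status, message)` is a hypothetical callback standing in for
    whatever forwards the submission status and logs to the platform.
    Returns True on success, False if the run failed or timed out.
    """
    loop = asyncio.new_event_loop()
    previous_handler = signal.signal(signal.SIGALRM, alarm_handler)
    # setitimer accepts fractional seconds, unlike signal.alarm.
    signal.setitimer(signal.ITIMER_REAL, time_limit)
    try:
        loop.run_until_complete(coro_factory())
    except ExecutionTimeLimitExceeded:
        # Without this branch the exception escapes and nothing is reported.
        report("Failed", f"Execution time limit of {time_limit}s exceeded")
        return False
    except Exception as exc:
        # Any other crash (for example a missing docker image) is reported too.
        report("Failed", f"Worker crashed: {exc!r}")
        return False
    else:
        report("Finished", "")
        return True
    finally:
        # Disarm the timer and restore the previous signal handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous_handler)
        # Cancel tasks that were still pending so the loop closes cleanly.
        pending = asyncio.all_tasks(loop)
        for task in pending:
            task.cancel()
        if pending:
            loop.run_until_complete(
                asyncio.gather(*pending, return_exceptions=True)
            )
        loop.close()
```

With something along these lines, a submission that exceeds its limit would leave the Running state with an explicit failure message instead of hanging silently.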