Paper ID: 2209.13531

Analysis of Reinforcement Learning for determining task replication in workflows

Andrew Stephen McGough, Matthew Forshaw

Executing workflows on volunteer computing resources where individual tasks may be forced to relinquish their resource for the resource's primary use leads to unpredictability and often significantly increases execution time. Task replication is one approach that can ameliorate this challenge. This comes at the expense of a potentially significant increase in system load and energy consumption. We propose the use of Reinforcement Learning (RL) such that a system may `learn' the `best' number of replicas to run to increase the number of workflows which complete promptly whilst minimising the additional workload on the system when replicas are not beneficial. We show, through simulation, that we can save 34% of the energy consumption using RL compared to a fixed number of replicas with only a 4% decrease in workflows achieving a pre-defined overhead bound.

Submitted: Sep 14, 2022