Paper ID: 2408.04736

Towards Using Multiple Iterated, Reproduced, and Replicated Experiments with Robots (MIRRER) for Evaluation and Benchmarking

Adam Norton, Brian Flynn

The robotics research field lacks formalized definitions and frameworks for evaluating advanced capabilities, including generalizability (the ability of robots to perform tasks under varied contexts) and reproducibility (the performance of a reproduced robot capability in different labs under the same experimental conditions). This paper presents an initial conceptual framework, MIRRER, that unites the concepts of performance evaluation, benchmarking, and reproduced/replicated experimentation to facilitate comparable robotics research. Several open issues with the application of the framework are also presented.

Submitted: Aug 8, 2024