A LLM Benchmark based on the Minecraft Builder Dialog Agent Task [2407.12734]