Benchmarking Defeasible Reasoning with Large Language Models -- Initial Experiments and Future Directions [2410.12509]