Abstract: Designing reward functions for reinforcement learning (RL)-based quadruped locomotion often requires extensive trial and error, which limits efficiency and interpretability. Lack of ...