Run Evaluations

Evaluate agents with user simulation cases
