How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark | Flume