Goldilocks RL: Tuning Task Difficulty to Escape Sparse Rewards for Reasoning | Flume