Learning to Reason as Action Abstractions with Scalable Mid-Training RL