Optimizing Token Generation in PyTorch Decoder Models | Flume