Text-Conditional JEPA for Learning Semantically Rich Visual Representations | Flume