FrameSkip: Learning from Fewer but More Informative Frames in VLA Training
Abstract
Vision-Language-Action (VLA) policies are commonly trained from dense robot demonstration trajectories, often collected through teleoperation, by sampling every recorded frame as if it provided equally useful supervision. We argue that this convention creates a temporal supervision imbalance: long l…