Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning
Abstract
Spatial reasoning from egocentric videos is inherently challenging because the observable evidence is constrained by the camera trajectory. Existing methods rely on single-turn inference, forcing models to resolve geometric ambiguity through semantic priors rather than verifiable evidence. We argue …