Paper | Hugging Face | Published: 2026-06-10 | HF ↑23

HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers


Authors: Guozhen Zhang, Xuerui Qiu, Yutao Cui, Tianhui Song, Changlin Li, and 9 others

Summary

Holistic visual tokenizers are fundamental to unified multimodal models (UMMs) as they map diverse visual inputs into a unified representation space. In this paper, we present HYDRA-X, the first UMM that unifies image and video tokenization within a single Vision Transformer (ViT). Our design is dri…

#multimodal #llm #vision
