MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models
Abstract
Memory is essential for large vision-language models (LVLMs) to handle long, multimodal interactions, and two families of methods provide this capability: long-context LVLMs and memory-augmented agents. However, no existing benchmark systematically compares the two on questions that gen…