Multimodal Learning

* denotes equal contribution. Papers highlighted with a red dashed border are representative works.

TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models

Yao Xiao, Qiqian Fu, Heyi Tao, Yuqun Wu, Zhen Zhu, Derek Hoiem

Under Review.

[Paper] [Code]
Last Updated: 5/30/2025, 8:04:28 PM