Abstract: Visual question answering (VQA) systems face significant challenges when adapting to real-world data shifts, especially in multi-modal contexts. While robust fine-tuning strategies are ...