Show simple item record

Dynamic Multimodal Prompt Tuning: Boost Few-Shot Learning with VLM-Guided Point Cloud Models

dc.contributor.author	Gu, Xiang
dc.contributor.author	Pang, Shuchao
dc.contributor.author	Du, Anan
dc.contributor.author	Wang, Yifei
dc.contributor.author	Miao, Jixiang
dc.contributor.author	Díez Peláez, Jorge
dc.date.accessioned	2024-12-03T08:09:16Z
dc.date.available	2024-12-03T08:09:16Z
dc.date.issued	2024-10-19
dc.identifier.isbn	978-1-64368-548-9
dc.identifier.uri	https://hdl.handle.net/10651/75852
dc.description.abstract	Few-shot learning is crucial for downstream tasks involving point clouds, given the difficulty of obtaining sufficient data due to the extensive collection and labeling effort required. Pre-trained VLM-guided point cloud models, containing abundant knowledge, can compensate for the scarcity of training data, potentially leading to strong performance. However, adapting these pre-trained point cloud models to specific few-shot learning tasks is challenging due to their large number of parameters and high computational cost. To this end, we propose a novel Dynamic Multimodal Prompt Tuning method, named DMMPT, for boosting few-shot learning with pre-trained VLM-guided point cloud models. Specifically, we build a dynamic knowledge collector capable of gathering task- and data-related information from various modalities. Then, a multimodal prompt generator is constructed to integrate the collected dynamic knowledge and generate multimodal prompts, which efficiently direct pre-trained VLM-guided point cloud models toward few-shot learning tasks and address the issue of limited training data. Our method is evaluated on benchmark datasets not only in the standard N-way K-shot few-shot learning setting, but also in a more challenging setting with all classes and K shots per class. Notably, our method outperforms other prompt-tuning techniques, achieving results competitive with full fine-tuning methods while significantly improving computational efficiency.
dc.format.extent	761-768
dc.language.iso	eng
dc.publisher	Ulle Endriss, Francisco S. Melo, Kerstin Bach, Alberto Bugarín-Diz, José M. Alonso-Moral, Senén Barro, Fredrik Heintz
dc.relation.ispartof	European Conference on Artificial Intelligence
dc.rights	CC Attribution-NonCommercial 4.0 International
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/
dc.subject	multimodal
dc.subject	prompt tuning
dc.subject	few-shot learning
dc.subject	point cloud model
dc.subject	vision language model
dc.title	Dynamic Multimodal Prompt Tuning: Boost Few-Shot Learning with VLM-Guided Point Cloud Models
dc.type	conference output
dc.identifier.doi	10.3233/FAIA240559
dc.rights.accessRights	open access
dc.type.hasVersion	VoR


Files in this item


This item appears in the following collection(s)


CC Attribution-NonCommercial 4.0 International
This item is subject to a Creative Commons license