Show simple item record
Dynamic Multimodal Prompt Tuning: Boost Few-Shot Learning with VLM-Guided Point Cloud Models
dc.contributor.author | Gu, Xiang | |
dc.contributor.author | Pang, Shuchao | |
dc.contributor.author | Du, Anan | |
dc.contributor.author | Wang, Yifei | |
dc.contributor.author | Miao, Jixiang | |
dc.contributor.author | Díez Peláez, Jorge | |
dc.date.accessioned | 2024-12-03T08:09:16Z | |
dc.date.available | 2024-12-03T08:09:16Z | |
dc.date.issued | 2024-10-19 | |
dc.identifier.isbn | 978-1-64368-548-9 | |
dc.identifier.uri | https://hdl.handle.net/10651/75852 | |
dc.description.abstract | Few-shot learning is crucial for downstream tasks involving point clouds, given the challenge of obtaining sufficient datasets due to extensive collecting and labeling efforts. Pre-trained VLM-guided point cloud models, containing abundant knowledge, can compensate for the scarcity of training data, potentially yielding strong performance. However, adapting these pre-trained point cloud models to specific few-shot learning tasks is challenging due to their huge number of parameters and high computational cost. To this end, we propose a novel Dynamic Multimodal Prompt Tuning method, named DMMPT, for boosting few-shot learning with pre-trained VLM-guided point cloud models. Specifically, we build a dynamic knowledge collector capable of gathering task- and data-related information from various modalities. Then, a multimodal prompt generator is constructed to integrate the collected dynamic knowledge and generate multimodal prompts, which efficiently direct pre-trained VLM-guided point cloud models toward few-shot learning tasks and address the issue of limited training data. Our method is evaluated on benchmark datasets not only in the standard N-way K-shot few-shot learning setting, but also in a more challenging all-classes K-shot setting. Notably, our method outperforms other prompt-tuning techniques, achieving results competitive with full fine-tuning while significantly improving computational efficiency. | eng |
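The abstract describes prompt tuning: steering a large frozen pre-trained model toward a few-shot task by learning only a small prompt, leaving the backbone weights untouched. The following is a minimal, hypothetical NumPy sketch of that general idea — a learnable prompt vector concatenated to the input of a frozen linear "backbone" — not the DMMPT implementation; all names, shapes, and values are illustrative.

```python
import numpy as np

# Frozen "pre-trained" weights: rows 0-3 act on the input, rows 4-5 on the prompt.
D_IN, D_PROMPT = 4, 2
W = np.array([[0.5], [-0.3], [0.2], [0.1], [0.7], [-0.4]])  # never updated

prompt = np.zeros(D_PROMPT)  # the only trainable parameters

def forward(x, prompt):
    """Concatenate the prompt to the input, then apply the frozen backbone."""
    z = np.concatenate([x, prompt])
    return z @ W

# One toy training example, standing in for a few-shot sample.
x, y = np.array([1.0, 2.0, -1.0, 0.5]), np.array([1.0])

lr = 0.5
for _ in range(200):
    err = forward(x, prompt) - y    # d(loss)/d(pred) for 0.5 * squared error
    grad_prompt = W[D_IN:] @ err    # gradient flows only into the prompt
    prompt -= lr * grad_prompt      # W stays frozen throughout

print(float(forward(x, prompt)[0]))  # converges close to the target 1.0
```

The design point this illustrates is the one the abstract leans on: with the backbone frozen, only `D_PROMPT` parameters are optimized, which is what makes prompt tuning far cheaper than full fine-tuning when training data is scarce.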
dc.format.extent | 761-768 | spa |
dc.language.iso | eng | spa |
dc.publisher | Ulle Endriss, Francisco S. Melo, Kerstin Bach, Alberto Bugarín-Diz, José M. Alonso-Moral, Senén Barro, Fredrik Heintz | spa |
dc.relation.ispartof | European Conference on Artificial Intelligence | spa |
dc.rights | CC Attribution-NonCommercial 4.0 International | |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | |
dc.subject | multimodal | spa |
dc.subject | prompt tuning | spa |
dc.subject | few-shot learning | spa |
dc.subject | point cloud model | spa |
dc.subject | vision language model | spa |
dc.title | Dynamic Multimodal Prompt Tuning: Boost Few-Shot Learning with VLM-Guided Point Cloud Models | spa |
dc.type | conference output | spa |
dc.identifier.doi | 10.3233/FAIA240559 | |
dc.rights.accessRights | open access | spa |
dc.type.hasVersion | VoR | spa |
Files in this item
This item appears in the following collection(s)
- Informática [803]
- Ponencias, Discursos y Conferencias [4062]