Show simple item record
Dynamic Multimodal Prompt Tuning: Boost Few-Shot Learning with VLM-Guided Point Cloud Models
dc.contributor.author | Gu, Xiang | |
dc.contributor.author | Pang, Shuchao | |
dc.contributor.author | Du, Anan | |
dc.contributor.author | Wang, Yifei | |
dc.contributor.author | Miao, Jixiang | |
dc.contributor.author | Díez Peláez, Jorge | |
dc.date.accessioned | 2024-12-03T08:09:16Z | |
dc.date.available | 2024-12-03T08:09:16Z | |
dc.date.issued | 2024-10-19 | |
dc.identifier.isbn | 978-1-64368-548-9 | |
dc.identifier.uri | https://hdl.handle.net/10651/75852 | |
dc.description.abstract | Few-shot learning is crucial for downstream tasks involving point clouds, given the challenge of obtaining sufficient datasets due to extensive collecting and labeling efforts. Pre-trained VLM-guided point cloud models, containing abundant knowledge, can compensate for the scarcity of training data, potentially yielding strong performance. However, adapting these pre-trained point cloud models to specific few-shot learning tasks is challenging due to their huge number of parameters and high computational cost. To this end, we propose a novel Dynamic Multimodal Prompt Tuning method, named DMMPT, for boosting few-shot learning with pre-trained VLM-guided point cloud models. Specifically, we build a dynamic knowledge collector capable of gathering task- and data-related information from various modalities. Then, a multimodal prompt generator is constructed to integrate the collected dynamic knowledge and generate multimodal prompts, which efficiently direct pre-trained VLM-guided point cloud models toward few-shot learning tasks and address the issue of limited training data. Our method is evaluated on benchmark datasets not only in the standard N-way K-shot few-shot learning setting, but also in a more challenging all-classes K-shot setting. Notably, our method outperforms other prompt-tuning techniques, achieving results competitive with full fine-tuning while significantly improving computational efficiency. | eng |
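The abstract describes prompt tuning: steering a large frozen pre-trained model toward a few-shot task by learning only a small prompt, leaving the backbone weights untouched. The following is a minimal, hypothetical NumPy sketch of that general idea — a learnable prompt vector concatenated to the input of a frozen linear "backbone" — not the DMMPT implementation; all names, shapes, and values are illustrative.

```python
import numpy as np

# Frozen "pre-trained" weights: rows 0-3 act on the input, rows 4-5 on the prompt.
D_IN, D_PROMPT = 4, 2
W = np.array([[0.5], [-0.3], [0.2], [0.1], [0.7], [-0.4]])  # never updated

prompt = np.zeros(D_PROMPT)  # the only trainable parameters

def forward(x, prompt):
    """Concatenate the prompt to the input, then apply the frozen backbone."""
    z = np.concatenate([x, prompt])
    return z @ W

# One toy training example, standing in for a few-shot sample.
x, y = np.array([1.0, 2.0, -1.0, 0.5]), np.array([1.0])

lr = 0.5
for _ in range(200):
    err = forward(x, prompt) - y    # d(loss)/d(pred) for 0.5 * squared error
    grad_prompt = W[D_IN:] @ err    # gradient flows only into the prompt
    prompt -= lr * grad_prompt      # W stays frozen throughout

print(float(forward(x, prompt)[0]))  # converges close to the target 1.0
```

The design point this illustrates is the one the abstract leans on: with the backbone frozen, only `D_PROMPT` parameters are optimized, which is what makes prompt tuning far cheaper than full fine-tuning when training data is scarce.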
dc.format.extent | 761-768 | spa |
dc.language.iso | eng | spa |
dc.publisher | Ulle Endriss, Francisco S. Melo, Kerstin Bach, Alberto Bugarín-Diz, José M. Alonso-Moral, Senén Barro, Fredrik Heintz | spa |
dc.relation.ispartof | European Conference on Artificial Intelligence | spa |
dc.rights | CC Attribution-NonCommercial 4.0 International | |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | |
dc.subject | multimodal | spa |
dc.subject | prompt tuning | spa |
dc.subject | few-shot learning | spa |
dc.subject | point cloud model | spa |
dc.subject | vision language model | spa |
dc.title | Dynamic Multimodal Prompt Tuning: Boost Few-Shot Learning with VLM-Guided Point Cloud Models | spa |
dc.type | conference output | spa |
dc.identifier.doi | 10.3233/FAIA240559 | |
dc.rights.accessRights | open access | spa |
dc.type.hasVersion | VoR | spa |
Files in this item
This item appears in the following collection(s)
- Informática [803]
- Ponencias, Discursos y Conferencias [4062]