Dynamic Multimodal Prompt Tuning: Boost Few-Shot Learning with VLM-Guided Point Cloud Models
Author(s) and others:
Keyword(s):
multimodal
prompt tuning
few-shot learning
point cloud model
vision language model
Publication date:
Publisher:
Ulle Endriss, Francisco S. Melo, Kerstin Bach, Alberto Bugarín-Diz, José M. Alonso-Moral, Senén Barro, Fredrik Heintz
Physical description:
Abstract:
Few-shot learning is crucial for downstream tasks involving point clouds, since collecting and labeling sufficient point cloud data is costly and labor-intensive. Pre-trained VLM-guided point cloud models, which contain abundant knowledge, can compensate for the scarcity of training data and potentially deliver strong performance. However, adapting these pre-trained point cloud models to specific few-shot learning tasks is challenging due to their large number of parameters and high computational cost. To this end, we propose a novel Dynamic Multimodal Prompt Tuning method, named DMMPT, for boosting few-shot learning with pre-trained VLM-guided point cloud models. Specifically, we build a dynamic knowledge collector capable of gathering task- and data-related information from various modalities. Then, a multimodal prompt generator is constructed to integrate the collected dynamic knowledge and generate multimodal prompts, which efficiently direct pre-trained VLM-guided point cloud models toward few-shot learning tasks and address the issue of limited training data. Our method is evaluated on benchmark datasets not only in the standard N-way K-shot few-shot setting, but also in a more challenging all-class K-shot setting. Notably, our method outperforms other prompt-tuning techniques, achieving results competitive with full fine-tuning while significantly improving computational efficiency.
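The abstract does not disclose implementation details, but the general recipe it describes can be illustrated with a minimal PyTorch sketch: a small "knowledge collector" pools cues from point and text features, a "prompt generator" maps them to a few prompt tokens, and those tokens are prepended to the input of a frozen pre-trained backbone so that only the lightweight modules are trained. All class names, shapes, and the stand-in transformer encoder below are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn

class DynamicKnowledgeCollector(nn.Module):
    """Pools task- and data-related cues from point and text features (assumed design)."""
    def __init__(self, dim):
        super().__init__()
        self.point_proj = nn.Linear(dim, dim)
        self.text_proj = nn.Linear(dim, dim)

    def forward(self, point_feats, text_feats):
        # point_feats: (B, N, D) per-point tokens; text_feats: (B, D) text embedding
        point_summary = self.point_proj(point_feats.mean(dim=1))   # (B, D)
        text_summary = self.text_proj(text_feats)                  # (B, D)
        return torch.cat([point_summary, text_summary], dim=-1)    # (B, 2D)

class MultimodalPromptGenerator(nn.Module):
    """Maps the collected knowledge to a small set of prompt tokens."""
    def __init__(self, dim, num_prompts=4):
        super().__init__()
        self.num_prompts = num_prompts
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(),
                                 nn.Linear(dim, num_prompts * dim))

    def forward(self, knowledge):
        b = knowledge.shape[0]
        return self.mlp(knowledge).view(b, self.num_prompts, -1)   # (B, P, D)

class PromptTunedPointModel(nn.Module):
    """Frozen pre-trained encoder; only the collector, generator, and head are trained."""
    def __init__(self, encoder, dim, num_classes):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False                                 # keep the backbone frozen
        self.collector = DynamicKnowledgeCollector(dim)
        self.generator = MultimodalPromptGenerator(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, point_tokens, text_feats):
        prompts = self.generator(self.collector(point_tokens, text_feats))
        tokens = torch.cat([prompts, point_tokens], dim=1)          # prepend prompt tokens
        feats = self.encoder(tokens).mean(dim=1)                    # pooled representation
        return self.head(feats)

# Toy usage with a stand-in transformer encoder (hypothetical shapes).
dim, num_classes = 64, 5
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True), num_layers=2)
model = PromptTunedPointModel(encoder, dim, num_classes)
logits = model(torch.randn(2, 128, dim), torch.randn(2, dim))
print(logits.shape)  # torch.Size([2, 5])

Because the backbone's parameters stay frozen, the number of trainable parameters is limited to the two small modules and the classification head, which is what makes prompt tuning cheaper than full fine-tuning in this setting.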
ISBN:
DOI:
Collections
- Informática
- Ponencias, Discursos y Conferencias