Speech-to-text transcription using neural networks: training of a Spanish STT model using the DeepSpeech engine
Author:
Director:
Publication date:
Series:
Grado en Ingeniería Informática del Software
Physical description:
Abstract:
This project arose from the need to replace cloud-based transcription services with an in-house solution. Using several Machine Learning tools and techniques, it aims to build a model that transcribes Spanish speech audio into text. One of these tools is DeepSpeech, an open-source project developed by Mozilla that lets users train Artificial Intelligence models to transcribe audio in any language into text, a task known in the literature as Speech-To-Text (STT). Through a multi-phase process, a trained model is produced to demonstrate that such systems work in practice and have the potential to emulate the paid services currently on the market. After a theoretical introduction, this document explains each phase of the process in detail: data collection, preprocessing, training preparation, training, results collection, and evaluation. It also describes a series of possible extensions to the process that add value and functionality to the final product.
Collections
- Trabajos Fin de Grado [2056]