Full Text

Turn on search term navigation

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

The complex and specialized terminology of financial language in Portuguese-speaking markets create significant challenges for natural language processing (NLP) applications, which must capture nuanced linguistic and contextual information to support accurate analysis and decision-making. This paper presents DeB3RTa, a transformer-based model specifically developed through a mixed-domain pretraining strategy that combines extensive corpora from finance, politics, business management, and accounting to enable a nuanced understanding of financial language. DeB3RTa was evaluated against prominent models—including BERTimbau, XLM-RoBERTa, SEC-BERT, BusinessBERT, and GPT-based variants—and consistently achieved significant gains across key financial NLP benchmarks. To maximize adaptability and accuracy, DeB3RTa integrates advanced fine-tuning techniques such as layer reinitialization, mixout regularization, stochastic weight averaging, and layer-wise learning rate decay, which together enhance its performance across varied and high-stakes NLP tasks. These findings underscore the efficacy of mixed-domain pretraining in building high-performance language models for specialized applications. With its robust performance in complex analytical and classification tasks, DeB3RTa offers a powerful tool for advancing NLP in the financial sector and supporting nuanced language processing needs in Portuguese-speaking contexts.

Details

Title
DeB3RTa: A Transformer-Based Model for the Portuguese Financial Domain
Author
Pires, Higo 1   VIAFID ORCID Logo  ; Paucar, Leonardo 2   VIAFID ORCID Logo  ; Carvalho, Joao Paulo 3   VIAFID ORCID Logo 

 Education Department, Federal Institute of Maranhão, Pinheiro 65200-000, MA, Brazil 
 Department of Electrical Engineering, Federal University of Maranhão, São Luís 65080-805, MA, Brazil; [email protected] 
 Instituto de Engenharia de Sistemas e Computadores–Investigação e Desenvolvimento (INESC-ID)/Instituto Superior Técnico, Universidade de Lisboa, 1000-029 Lisbon, Portugal; [email protected] 
First page
51
Publication year
2025
Publication date
2025
Publisher
MDPI AG
e-ISSN
25042289
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3181353665
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.