The carbon trading system (ETS) is a crucial tool for addressing global climate change and warming. An ETS can effectively promote enterprises' green transformation to achieve emission reduction and is widely used in many countries and regions.1 Carbon prices greatly influence the decisions of enterprises and policymakers. A carbon price that is too high increases enterprises' operating costs and hinders their transformation, whereas a carbon price that is too low cannot drive emission reduction. Therefore, accurate carbon price forecasts help establish a long-term, stable, and efficient carbon market. With such forecasts, enterprises can formulate reasonable emission reduction and investment strategies to preserve and increase the value of their carbon assets and avoid investment risks, and policymakers can better understand the price fluctuation laws of the carbon market and establish an effective carbon price stability mechanism.2 However, the carbon market is a policy-driven market shaped by heterogeneous internal market mechanisms and external environmental factors,3 which make carbon price fluctuations nonlinear, nonstationary, and complex. Accurate prediction of carbon prices has therefore become a popular research topic.
In recent years, many data-driven carbon price forecasting models have been proposed, mainly divided into three categories: econometric, artificial intelligence (AI), and hybrid models. Table 1 lists representative studies of three types of carbon price-prediction models.
Table 1 Summary of selected carbon price forecasting studies
Classification | Application field | Input variables | Exogenous variable treatment | Decomposition method | Predictive model |
Econometric models | European Union Emission Trading System (EU ETS) price | Brent oil, oil, coal, natural gas, electricity | - | - | GARCH, EGARCH, TGARCH, GJR-GARCH4 |
 | European Union Allowance (EUA) price | Policy variable, future economic outlook, current economic activity | - | - | FIAPGARCH, APGARCH5 |
Artificial intelligence models | Shenzhen carbon price | Coal, temperature, air quality index | - | - | Combined mixed-data sampling regression model and back-propagation neural network6 |
 | EU ETS price | - | - | - | Combination of autoregressive integrated moving average (ARIMA) and least squares support vector machine (LSSVM)7 |
 | EU ETS price | - | - | - | Phase-space reconstruction and multilayer perceptron (MLP) neural network8 |
 | EUA price | Dow Jones Euro Stoxx 50 Index, Brent oil, Henry Hub natural gas futures price, Australian BJ thermal coal spot price, Australian Newcastle thermal coal spot price | Concatenation and direct input | - | Temporal convolutional network (TCN)9 |
 | EUA price | Online carbon market news | Concatenation and direct input | - | Long short-term memory network (LSTM)10 |
 | Hubei and Guangdong carbon prices | Brent oil, NYMEX natural gas, Newcastle coal | Concatenation and direct input | - | LSTM11 |
Hybrid models | Beijing, Shanghai, Guangdong, Shenzhen, Hubei, Tianjin, Chongqing, Fujian carbon prices | - | - | Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) | Extreme gradient boosting (XGBoost), random forest (RF), support vector machine (SVM), radial basis function neural network (RBFNN)12 |
 | Beijing, Guangdong, Hubei carbon prices | - | - | Adaptive variational mode decomposition (AVMD) | Extreme learning machine (ELM)13 |
 | Beijing, Shanghai, Guangdong, Shenzhen, Hubei, Tianjin, Chongqing, Fujian carbon prices | - | - | Complementary ensemble empirical mode decomposition (CEEMD) | LSTM14 |
 | Guangdong, Hubei, Shanghai carbon prices | - | - | Ensemble empirical mode decomposition (EEMD) | Wavelet least squares support vector machine (wLSSVM)15 |
 | Hubei, Shenzhen carbon prices | Energy factor, economic factor, international carbon price, environmental factor | Max-relevance min-redundancy | Improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) | Kernel-based extreme learning machine16 |
 | Beijing, Shanghai, Guangdong, Shenzhen, Hubei, Tianjin, Chongqing carbon prices | Similar products, energy structure, economic factors, environmental factors | Random forest (RF) and stacked autoencoder (SAE) | Variational mode decomposition (VMD) | Bidirectional long short-term memory (BiLSTM)17 |
 | EUA price | Brent oil, European ARA port power coal, IPE natural gas, S&P Clean Energy Index, Stoxx 50 Index, CAC 40 Index, DAX Index, FTSE 100 Index, S&P 500, Commodity Research Bureau Futures Index, certified emission reductions | Least absolute shrinkage and selection operator (LASSO) | Hodrick–Prescott filter | ELM18 |
 | Hubei carbon price | Coal price, oil price, natural gas price, electricity price, Baidu index, social media sentiment | Concatenation and direct input | Discrete wavelet transform (DWT), singular spectrum analysis (SSA), EMD, VMD | Holt's exponential smoothing (HOLT), support vector regression (SVR), back-propagation neural network (BPNN), ARIMA19 |
 | Hubei, Shenzhen, Beijing carbon prices | International carbon price, energy prices, exchange rate, macroeconomics, temperature change | Factor analysis | EMD | LSSVM20 |
Econometric models were the first used in carbon price forecasting, including autoregressive moving average (ARMA) and generalized autoregressive conditional heteroskedasticity (GARCH) models. Byun and Cho4 used GARCH-type models to predict European carbon price volatility and found that they outperform other models. Conrad et al.5 employed the FIAPGARCH model to predict carbon prices in the first and second phases of the EU ETS and showed that FIAPGARCH captures the heteroscedasticity and long memory of carbon price fluctuations well. However, econometric models rest on assumptions of linearity and stationarity; capturing the nonlinear and nonstationary characteristics of carbon prices and achieving high prediction accuracy with them are therefore challenging.
With the rapid development of AI, many AI models have been applied to carbon price forecasting. Han et al.6 used a backpropagation neural network (BPNN) to predict the weekly carbon market price in Shenzhen, China, and reported a 30%–40% improvement in prediction accuracy over the benchmark model. Zhu and Wei7 adopted the least squares support vector machine (LSSVM) to forecast European carbon prices and found that its prediction accuracy is significantly better than that of ARIMA. Fan et al.8 used a multilayer perceptron to forecast carbon prices and verified the forecast validity. Zhang and Wen9 proposed a new carbon price prediction model based on a temporal convolutional network (TCN) Seq2Seq architecture and argued that TCN is suitable for learning from small-sample carbon price data sets, outperforming traditional statistical prediction models. AI-based prediction models, such as machine learning and deep learning methods, exhibit strong data adaptability and feature extraction capabilities12 and handle the nonlinear and nonstationary nature of carbon prices well. In previous studies, AI models have consistently outperformed econometric models.
However, a single model has certain limitations, and achieving ideal prediction accuracy with one is difficult.21 To further improve forecasting accuracy, decomposition–integration-based hybrid carbon price forecasting models have been proposed. The central idea is to first decompose the carbon price into several subsequences with a decomposition algorithm, then forecast these subsequences separately, and finally integrate the forecast results. On the one hand, decomposing the original sequence effectively reduces the influence of noise on prediction; on the other hand, it captures the internal characteristics of the carbon price time series. Sun and Xu13 used adaptive variational mode decomposition and extreme learning machines to predict the Beijing, Guangdong, and Hubei carbon markets and gathered evidence of good performance. Sun and Li14 combined complementary ensemble empirical mode decomposition and the long short-term memory network (LSTM) to predict eight carbon market prices in China; the experimental results showed good stability and applicability. Sun and Xu15 proposed a carbon price prediction model combining ensemble empirical mode decomposition with an improved wavelet least squares support vector machine; the results indicated improved root mean square error (RMSE) relative to comparable models. Decomposition technology can thus further improve the prediction accuracy of carbon prices. However, the carbon market is a complex nonlinear system, and the carbon price is affected by many factors, such as energy, economic, and weather conditions;22 predicting it from historical carbon price information alone is insufficient. Therefore, some scholars have considered the effects of exogenous variables in carbon price forecasting.
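The decomposition–integration workflow described above can be sketched as follows. The moving-average trend/residual split and the persistence predictor are illustrative stand-ins for the decomposition algorithms (EMD, VMD, etc.) and forecasting models discussed in this section; all names and data are hypothetical:

```python
import numpy as np

def toy_decompose(x, window=5):
    """Split a series into a smooth trend and a residual.
    A stand-in for EMD/VMD-style decomposition; the split is lossless."""
    kernel = np.ones(window) / window
    trend = np.convolve(x, kernel, mode="same")
    return [trend, x - trend]

def persistence_forecast(sub, horizon=3):
    """Naive per-subsequence predictor (stand-in for LSTM, LSSVM, etc.)."""
    return np.full(horizon, sub[-1])

rng = np.random.default_rng(0)
price = np.cumsum(rng.normal(0, 0.5, 300)) + 30.0   # synthetic "carbon price"

subsequences = toy_decompose(price)
# Forecast each subsequence separately, then integrate by summation.
final_forecast = sum(persistence_forecast(s) for s in subsequences)
```

Because the decomposition is lossless, the subsequences sum back to the original series, and the integrated forecast is simply the sum of the per-subsequence forecasts.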
Hao and Tian16 proposed a hybrid carbon price prediction model that comprehensively considers energy, economic, international carbon price, and environmental factors, using max-relevance min-redundancy to determine the input features; they found that these external variables significantly improve the prediction accuracy of carbon prices. Xu et al.17 used two-stage feature reconstruction to rebuild and reduce the external factors affecting carbon prices and combined VMD with a bidirectional long short-term memory network (BiLSTM) to build a carbon price prediction model. Zhao et al.18 combined the Hodrick–Prescott filter, extreme learning machines, and feature selection to select multidimensional carbon price features; their experiments show that these exogenous variables are helpful in carbon price forecasting. Li et al.11 considered the influence of oil, natural gas, and coal on carbon prices, used LSTM for prediction, and experimentally verified the importance of these factors. Zhang and Xia10 incorporated online news into an LSTM, considering its impact on European Union Allowance price forecasting. Sun and Wang20 reduced the dimensionality of the selected exogenous variables by factor analysis and combined EMD with the least squares support vector machine to predict carbon prices. Wang et al.19 forecast carbon prices based on multisource information fusion (MSIF), hybrid multiscale decomposition (HMSD), and a combination forecasting method (CFM). These experimental results reveal that decomposition methods and the consideration of exogenous variables can substantially improve prediction accuracy.
As Table 1 shows for the hybrid forecasting methodology, EMD, discrete wavelet transform (DWT), and variational mode decomposition (VMD) are the commonly used decomposition methods, but each has defects. EMD is an empirically based decomposition method without strict mathematical derivation; guaranteeing its a priori convergence is difficult,23 and problems such as mode aliasing and over-decomposition occur during the decomposition process.24 Wavelet transform and VMD are not adaptive algorithms: the number of decomposed subsequences must be preset, and the wavelet transform additionally requires selecting a wavelet basis function. Moreover, when considering exogenous variables, existing studies mainly input them directly or after dimensionality reduction, so how these variables contribute to the carbon price forecast, that is, their importance for prediction, cannot be determined; in addition, usually only the carbon price itself is decomposed, ignoring the intrinsic connection between exogenous variables and carbon prices at different timescales. Compared with previous studies, our research proposes a new carbon price prediction model that combines multivariate fast iterative filtering (MFIF), sample entropy (SE), and the temporal fusion transformer (TFT) (MFIF–SE–TFT). First, the original multivariate time series, composed of the carbon price and its exogenous variables, is decomposed by MFIF. Second, the subsequences produced by MFIF are reconstructed using SE to further characterize their features. Finally, the reconstructed subsequences are predicted and integrated using TFT, which simultaneously produces interpretable results for different variables and temporal features. The innovations and contributions of this study are as follows:
This study uses TFT for carbon price forecasting for the first time. When considering multiple carbon price variables, most studies input them directly after dimension reduction, which not only fails to provide interpretable results but can also reduce prediction accuracy, increase computational cost, and cause overfitting. Compared with other deep learning models (such as LSTM and convolutional neural networks [CNNs]), TFT provides an end-to-end learning framework that adaptively learns which variables are important for carbon price prediction and analyzes persistent temporal patterns through attention over different lag orders, giving it higher stability and robustness to multivariate inputs.
A new carbon price prediction model, MFIF–SE–TFT, is proposed. MFIF has a strict mathematical derivation that guarantees its a priori convergence and effectively avoids the mode-mixing phenomenon of EMD, and, unlike DWT and VMD, it is an adaptive decomposition algorithm; MFIF can therefore achieve better decompositions. MFIF also ensures that every exogenous variable is decomposed into the same number of subsequences and that corresponding subsequences have similar time-frequency characteristics. In the hybrid model, MFIF is thus applied to decompose the multivariate carbon price time series and extract its linear and nonlinear characteristics; SE is used to reconstruct the subsequence features, thereby reducing cumulative prediction error and time cost; and TFT predicts and integrates the subsequences to obtain the final result. The experimental results show that the model generally outperforms the benchmark models, verifying its stability and reliability.
This study evaluates the importance of each exogenous variable to the forecasts of different subsequences through TFT and identifies the important variables in carbon price forecasting and their time dependencies. The results provide decision-makers with reliable carbon price forecast analysis and decision support.
The remaining sections of this paper are as follows: Sections 2 and 3 mainly introduce the basic theory and framework of our proposed model. Section 4 explains the experimental results and comparative analysis. Section 5 presents the conclusion and expounds on future work.
METHODS
This section introduces related techniques and methods, including MFIF, SE, and TFT. The construction of the whole model is also described.
MFIF
Cicone25 proposed fast iterative filtering (FIF), which quickly realizes the iterative filtering (IF) computation. IF, an alternative to the EMD family, is an adaptive, local iterative decomposition method. It decomposes a nonlinear and nonstationary time series S(t) into several intrinsic mode functions (IMFs) with similar oscillatory components, ordered from high frequency to low frequency, plus a residual term,26 and is widely used in the natural sciences.27–30 The essential difference between IF and EMD lies in the computation of the local mean: EMD computes the local mean of the sequence from the upper and lower envelopes determined by cubic splines, and its convergence and iteration-termination conditions are difficult to prove; IF computes the local mean by convolving the sequence with a preselected filter function, and its convergence and termination conditions have been strictly proven mathematically, guaranteeing a priori convergence and stability.26 Both EMD and IF generate IMFs through the iteration
$$M_n(S_n) = S_n - L_n(S_n), \qquad \mathrm{IMF} = \lim_{n \to \infty} M_n(S_n), \tag{1}$$
where $S_{n+1} = M_n(S_n)$, $S_1 = S$ is the original sequence, and $L_n(S_n)$ is the local mean of $S_n$. In EMD,
$$L_n(S_n) = \frac{E_U(S_n) + E_L(S_n)}{2}. \tag{2}$$
In Equation (2), $E_U$ is the upper envelope of the sequence and $E_L$ is the lower envelope. In IF,
$$L_n(S_n)(x) = \int_{-L}^{L} S_n(x + t)\, w(t)\, dt. \tag{3}$$
In Equation (3), $w(x)$ is a filter function and $L$ is the mask length, determined by the sequence length and the number of extreme points in the sequence.31 In FIF, the discrete Fourier transform (DFT) and inverse discrete Fourier transform (IDFT) are computed via the fast Fourier transform, and IF is then evaluated quickly as
$$\mathrm{IMF} = \mathrm{IDFT}\!\left[\big(I - \mathrm{diag}(\mathrm{DFT}(w))\big)^{N_0}\, \mathrm{DFT}(S)\right], \tag{4}$$
where $I$ is the identity matrix, $\mathrm{diag}(\cdot)$ forms a diagonal matrix, and $N_0$ is the number of iterations IF performs to compute an IMF. However, FIF cannot decompose multiple time series simultaneously; it processes only one series at a time, so the numbers of IMFs obtained from different variables may be unequal, which splits the internal relationships among the variables to a certain extent. Therefore, this study uses MFIF,32 which yields the same number of IMFs for every variable and aligns the IMFs of different variables on the time scale. Let $s = [v_1, v_2, \ldots, v_t]$ be a multivariate time series with $v_t = [v_i(t)]_{i=1,\ldots,n}$, where $n$ is the dimension of the series. The angle $\theta(t)$ by which the multidimensional vector rotates over time is
$$\theta(t) = \arccos\!\left(\frac{\langle v_t,\, v_{t+1} \rangle}{\lVert v_t \rVert\, \lVert v_{t+1} \rVert}\right). \tag{5}$$
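The Fourier-domain FIF computation above can be sketched in NumPy. The two-tone test signal, the triangular (Bartlett) filter, and the iteration count are illustrative assumptions, not values from the paper; raising the filtered spectrum to the power N0 acts as a high-pass operator that extracts the first (fastest) IMF:

```python
import numpy as np

def fif_imf(s, w, n0):
    """One FIF inner loop evaluated in the Fourier domain: the iterated
    local-mean subtraction (I - filter)^n0 applied to the signal spectrum.
    `w` must hold the filter taps arranged circularly (center at index 0)."""
    s_hat = np.fft.fft(s)
    w_hat = np.fft.fft(w)   # spectrum of the (symmetric) filter
    return np.real(np.fft.ifft((1.0 - w_hat) ** n0 * s_hat))

# Illustrative signal: a fast oscillation riding on a slow one.
n = 256
t = np.arange(n) / n
s = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 2 * t)

# Triangular filter of half-length L = 10, normalized and circularly centered.
L = 10
taps = np.bartlett(2 * L + 1)
taps /= taps.sum()
w = np.zeros(n)
w[: L + 1] = taps[L:]
w[-L:] = taps[:L]

imf1 = fif_imf(s, w, n0=20)   # approximately the fast 50-cycle component
```

The slow component is almost entirely attenuated because its filter response is close to 1, so (1 - response)^20 is near zero there, while the fast component passes nearly unchanged.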
According to θ(t), the calculation process of MFIF is shown in Table 2.
Table 2 Calculation process of MFIF
Algorithm: MFIF
IMF = {}
compute the rotation angle θ(t)
while the number of extrema of θ(t) ≥ 2:
    compute the filter length L of the filter function w
    set N0 = 0
    while the stopping criterion is not satisfied:
        for i = 1 to n:
            …
        end for
    end while
end while
When a decomposition algorithm is applied, the obtained IMFs are usually affected by the boundary effect; that is, spurious peaks appear at the boundaries of the IMFs. The boundary effect degrades the decomposition quality of each IMF and thus the overall prediction accuracy. In this study, the boundary-effect treatment proposed by Stallone et al.33 is adopted: the sequence is symmetrically extended before decomposition to eliminate the boundary effect. The processing steps are as follows:
Subtract the mean m of the original sequence s(t).
Symmetrically extend the sequence s(t) − m to both ends of the original sequence; the generated extended sequence sext(t) is v times the length of the original sequence.
Multiply the extended sequence sext(t) by a characteristic function λ, which equals 1 on the interval corresponding to the original sequence s(t) and smoothly approaches 0 at the new extended boundaries.
Finally, add the mean m of the original signal back to obtain the final processed sequence.
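The four preprocessing steps above can be sketched as follows. The cosine taper used for the characteristic function and the default extension factor v = 2 are our illustrative choices, not specifics from the paper:

```python
import numpy as np

def extend_for_decomposition(s, v=2.0):
    """Sketch of the four boundary-handling steps: demean, symmetric
    extension to v times the original length, taper with a characteristic
    function, and add the mean back. Returns the extended series and the
    pad length; the original span is ext[pad : pad + len(s)]."""
    s = np.asarray(s, dtype=float)
    m = s.mean()                          # step 1: subtract the mean
    d = s - m
    pad = int((v - 1.0) * len(s) / 2)     # step 2: symmetric extension
    ext = np.concatenate([d[pad - 1::-1], d, d[: -pad - 1 : -1]])
    lam = np.ones(len(ext))               # step 3: characteristic function
    ramp = 0.5 * (1.0 - np.cos(np.linspace(0.0, np.pi, pad)))
    lam[:pad] = ramp                      # rises smoothly from 0 to 1
    lam[-pad:] = ramp[::-1]               # falls smoothly from 1 to 0
    return lam * ext + m, pad             # step 4: add the mean back

s = np.sin(np.linspace(0, 6, 100)) + 2.0
ext, pad = extend_for_decomposition(s)
```

The original samples are untouched in the core of the extended series, while the padded ends decay smoothly toward the mean, suppressing spurious boundary peaks in the subsequent decomposition.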
SE
SE34 is often used to measure time series complexity. The computation resembles a template-matching search over the entire input signal; its main parameters, the embedding dimension m and the tolerance r, control the length of each search segment (template) and the similarity threshold among segments, respectively. The calculation proceeds as follows:
Step 1: Construct the embedded sequences $x_m(i)$ ($i = 1, 2, \ldots, n - m + 1$) from the original sequence $x(i)$ ($i = 1, 2, 3, \ldots, n$):
$$x_m(i) = \{x(i), x(i+1), \ldots, x(i+m-1)\}. \tag{6}$$
Step 2: Calculate the distance $d_m$ between $x_m(i)$ and $x_m(j)$ as the Chebyshev distance:
$$d_m[x_m(i), x_m(j)] = \max_{k = 0, \ldots, m-1} \lvert x(i+k) - x(j+k) \rvert. \tag{7}$$
Step 3: Calculate the matching probabilities. $A^m(r)$ is the probability that two sequences match at $m + 1$ points, and $B^m(r)$ is the probability that two sequences match at $m$ points. Let $v_m$ be the number of pairs with $d_m[x_m(i), x_m(j)] \le r$, $i \ne j$, and $w_{m+1}$ the number of pairs with $d_{m+1}[x_{m+1}(i), x_{m+1}(j)] \le r$, $i \ne j$:
$$B^m(r) = \frac{v_m}{(n - m)(n - m - 1)}, \tag{8}$$
$$A^m(r) = \frac{w_{m+1}}{(n - m)(n - m - 1)}. \tag{9}$$
Step 4: Calculate the value of SE, SampEn(m, r):
$$\mathrm{SampEn}(m, r) = \lim_{n \to \infty}\left\{-\ln\!\left[\frac{A^m(r)}{B^m(r)}\right]\right\}. \tag{10}$$
When n is finite, SE can be expressed as
$$\mathrm{SampEn}(m, r, n) = -\ln\!\left[\frac{A^m(r)}{B^m(r)}\right]. \tag{11}$$
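The SE computation above can be sketched in NumPy. The default tolerance r = 0.2 · std(x) is a common rule of thumb and our assumption, not a value fixed by this paper:

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Finite-n sample entropy SampEn(m, r, n) of a 1-D series (sketch).
    r defaults to 0.2 * std(x), a common rule of thumb (our assumption)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * x.std()

    def match_count(mm):
        # Embedded template vectors x_mm(i) as rows.
        emb = np.array([x[i : i + mm] for i in range(n - mm + 1)])
        # Chebyshev distances between all pairs of templates.
        d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=-1)
        # Number of ordered pairs i != j with distance <= r.
        return np.sum(d <= r) - len(emb)

    v_m = match_count(m)        # m-point matches
    w_m1 = match_count(m + 1)   # (m+1)-point matches
    # Ratio of match counts; the normalization constants nearly cancel.
    return -np.log(w_m1 / v_m)

# A regular sine wave is less complex than white noise.
rng = np.random.default_rng(0)
se_sine = sample_entropy(np.sin(np.linspace(0, 20 * np.pi, 400)))
se_noise = sample_entropy(rng.normal(size=400))
```

In the MFIF–SE–TFT context, IMFs whose SE values are close would be grouped for reconstruction.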
TFT
TFT, proposed by the Google Cloud AI team, is a multihorizon time series prediction model based on the attention mechanism and deep neural networks; it has been applied in many fields with excellent results.35,36 Compared with other neural networks, such as LSTM and CNN, TFT offers excellent interpretability and can help reveal the internal relationships between input features and prediction targets, while also performing strongly in time series prediction. Figure 1 illustrates the main structure of TFT, which comprises five main components. Gating mechanisms minimize the contributions of irrelevant variables. Variable selection networks learn the importance of different variables to the prediction at each time step. Static covariate encoders integrate static features. In temporal processing, the temporal self-attention decoder learns long- and short-term temporal dependencies in the data. The prediction interval component constructs a forecast range for the target value through quantile forecasting. We describe the gating mechanisms, variable selection networks, and temporal processing in detail. The static covariate encoder and prediction interval blocks are not used here because no static information related to carbon prices is available and this study focuses on deterministic forecasting; a detailed description of these two blocks is given in Lim et al.36
Gating mechanisms
To model the nonlinear relationship between exogenous variables and targets, TFT adopts the gated residual network (GRN). The GRN receives a primary input a and an optional context vector c and is computed as
$$\mathrm{GRN}_\omega(a, c) = \mathrm{LayerNorm}\big(a + \mathrm{GLU}_\omega(\eta_1)\big), \tag{12}$$
$$\eta_1 = W_{1,\omega}\, \eta_2 + b_{1,\omega}, \tag{13}$$
$$\eta_2 = \mathrm{ELU}\big(W_{2,\omega}\, a + W_{3,\omega}\, c + b_{2,\omega}\big). \tag{14}$$
In the above formulas, ELU is the exponential linear unit activation function, $\eta_1$ and $\eta_2$ are intermediate layers, LayerNorm is standard layer normalization, and $\omega$ indexes shared weights. Component gating layers based on gated linear units (GLUs) provide the flexibility to suppress any part of the architecture that is unnecessary for a given data set. The GLU takes the form
$$\mathrm{GLU}_\omega(\gamma) = \sigma\big(W_{4,\omega}\, \gamma + b_{4,\omega}\big) \odot \big(W_{5,\omega}\, \gamma + b_{5,\omega}\big), \tag{15}$$
where $\gamma \in \mathbb{R}^{d_{model}}$ is the given input, $\sigma(\cdot)$ is the sigmoid activation function, $W_{(\cdot)} \in \mathbb{R}^{d_{model} \times d_{model}}$ are weight matrices, $b_{(\cdot)} \in \mathbb{R}^{d_{model}}$ are bias vectors, $d_{model}$ is the hidden layer size, and $\odot$ denotes the Hadamard product. The GLU lets TFT control the degree to which the GRN transforms the original input a: to suppress the nonlinear contribution, the GLU output can approach 0, effectively skipping the layer entirely. If no context vector c exists, c is treated as a zero vector.
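The GRN and GLU definitions above can be sketched in NumPy. The randomly initialized weights stand in for learned parameters, and the hidden size is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size d_model (illustrative)

def elu(z):
    """Exponential linear unit."""
    return np.where(z > 0, z, np.exp(z) - 1)

def layer_norm(z, eps=1e-6):
    """Standard layer normalization over the feature dimension."""
    return (z - z.mean()) / (z.std() + eps)

# Random stand-ins for the learned weight matrices; biases set to zero.
W1, W2, W3 = (rng.normal(size=(d, d)) for _ in range(3))
W4, W5 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
b1 = b2 = b4 = b5 = np.zeros(d)

def glu(g):
    """GLU(g) = sigmoid(W4 g + b4) * (W5 g + b5), elementwise."""
    return (1.0 / (1.0 + np.exp(-(W4 @ g + b4)))) * (W5 @ g + b5)

def grn(a, c=None):
    """GRN(a, c) = LayerNorm(a + GLU(eta1)); absent c is a zero vector."""
    if c is None:
        c = np.zeros_like(a)
    eta2 = elu(W2 @ a + W3 @ c + b2)
    eta1 = W1 @ eta2 + b1
    return layer_norm(a + glu(eta1))

out = grn(rng.normal(size=d))
```

The residual path (a added back before normalization) is what allows the block to fall back to a near-identity mapping when the gate closes.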
Variable selection networks
Multiple feature variables are used in prediction, but their relationships with, and specific contributions to, the targets are unknown. The variable selection layer in TFT is designed to select the variables most important for prediction and to exclude extraneous noise features that can degrade model performance. Let $\Xi_t$ denote the flattened input at time t and $\xi_t^{(j)}$ the transformed input of the jth variable. Feeding $\Xi_t$ and an external context variable $c_s$ into a GRN and then a Softmax layer generates the variable selection weights
$$v_{\chi t} = \mathrm{Softmax}\big(\mathrm{GRN}_{v_\chi}(\Xi_t, c_s)\big). \tag{16}$$
At each time step, each $\xi_t^{(j)}$ is transformed nonlinearly by its own GRN, and the processed features are then weighted by the variable selection weights:
$$\tilde{\xi}_t^{(j)} = \mathrm{GRN}_{\tilde{\xi}(j)}\big(\xi_t^{(j)}\big), \tag{17}$$
$$\tilde{\xi}_t = \sum_{j=1}^{m_\chi} v_{\chi t}^{(j)}\, \tilde{\xi}_t^{(j)}. \tag{18}$$
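The variable selection mechanism can be sketched as follows; the GRNs are replaced by random linear maps purely for illustration, and all sizes are assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
m_vars, d = 5, 4   # number of input variables, embedding size (illustrative)

# xi[j] is the transformed input of variable j at one time step.
xi = rng.normal(size=(m_vars, d))

# Stand-in for the flat-input GRN: a linear map followed by Softmax
# producing one selection weight per variable.
W_sel = rng.normal(size=(m_vars, m_vars * d))
v = softmax(W_sel @ xi.reshape(-1))            # variable selection weights

# Stand-ins for the per-variable GRNs: one linear map per variable.
W_var = rng.normal(size=(m_vars, d, d))
xi_tilde = np.einsum('jde,je->jd', W_var, xi)  # per-variable processing

# Weighted combination: the selected feature vector at this time step.
combined = (v[:, None] * xi_tilde).sum(axis=0)
```

Because the weights v are a Softmax output, they sum to one and can be read directly as per-variable importances, which is the basis of TFT's variable-level interpretability.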
Multihead attention
TFT improves on the multihead attention mechanism, employing self-attention to learn long-term dependencies across different time steps. Given a query matrix $Q$, a key matrix $K$, and a value matrix $V$, attention is computed as
$$\mathrm{Attention}(Q, K, V) = A(Q, K)\, V. \tag{19}$$
Here, N is the number of time steps input into the attention layer and $A(\cdot)$ is a normalization function; the attention weights are computed by scaled dot-products:
$$A(Q, K) = \mathrm{Softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_{attn}}}\right). \tag{20}$$
To improve on the learning ability of a single attention head, TFT uses multihead attention, with different heads attending to different representation subspaces:
$$\mathrm{MultiHead}(Q, K, V) = [H_1, \ldots, H_{m_H}]\, W_H, \tag{21}$$
$$H_h = \mathrm{Attention}\big(Q W_Q^{(h)},\, K W_K^{(h)},\, V W_V^{(h)}\big). \tag{22}$$
Here, $W_Q^{(h)}$, $W_K^{(h)}$, and $W_V^{(h)}$ are the query, key, and value weight matrices of head h, and $W_H$ linearly combines the outputs of all heads.
Because each head uses different value weights, attention weights alone cannot indicate feature importance. Multihead attention is therefore modified so that all heads share a single value weight and the head outputs are additively aggregated:
$$\mathrm{InterpretableMultiHead}(Q, K, V) = \tilde{H}\, W_H, \tag{23}$$
$$\tilde{H} = \tilde{A}(Q, K)\, V W_V, \tag{24}$$
$$\tilde{A}(Q, K) = \frac{1}{m_H} \sum_{h=1}^{m_H} A\big(Q W_Q^{(h)},\, K W_K^{(h)}\big), \tag{25}$$
$$\tilde{H} = \frac{1}{m_H} \sum_{h=1}^{m_H} \mathrm{Attention}\big(Q W_Q^{(h)},\, K W_K^{(h)},\, V W_V\big). \tag{26}$$
Here, $W_V$ is the value weight shared by all heads and $W_H$ is the final linear mapping. By changing how the multihead attention weights are generated, each head can still learn different temporal patterns, effectively improving expressiveness, while simple interpretability analyses can be conducted on the single averaged set of attention weights.
Temporal processing
In temporal processing, TFT first uses an LSTM encoder–decoder to generate uniform temporal features, denoted $\phi(t, n)$, where n is the position index; a gated skip connection is employed in this layer:
$$\tilde{\phi}(t, n) = \mathrm{LayerNorm}\big(\tilde{\xi}_{t+n} + \mathrm{GLU}_{\tilde{\phi}}(\phi(t, n))\big). \tag{27}$$
A static enrichment layer is then introduced to enhance the temporal features:
$$\theta(t, n) = \mathrm{GRN}_\theta\big(\tilde{\phi}(t, n),\, c_e\big), \tag{28}$$
where $c_e$ is the encoded context vector. Self-attention is added after the static enrichment layer. All temporal features are combined into a matrix $\Theta(t)$, and the multihead attention of Section 2.3.3 is applied at each time step:
$$B(t) = \mathrm{InterpretableMultiHead}\big(\Theta(t), \Theta(t), \Theta(t)\big). \tag{29}$$
The self-attention mechanism allows TFT to extract long-term dependencies in the data. After self-attention, a gated skip connection is again added to simplify training:
$$\delta(t, n) = \mathrm{LayerNorm}\big(\theta(t, n) + \mathrm{GLU}_\delta(\beta(t, n))\big). \tag{30}$$
The output of self-attention is then processed nonlinearly by a GRN, similar to the static enrichment layer:
$$\psi(t, n) = \mathrm{GRN}_\psi\big(\delta(t, n)\big). \tag{31}$$
Afterward, a gated skip connection that bypasses the entire transformer block is added so that the model can adapt to the required complexity, followed by a fully connected layer that produces the predicted output:
$$\tilde{\psi}(t, n) = \mathrm{LayerNorm}\big(\tilde{\phi}(t, n) + \mathrm{GLU}_{\tilde{\psi}}(\psi(t, n))\big). \tag{32}$$
COMBINED MFIF–SE–TFT MODEL
In this study, a new hybrid carbon price forecasting model is proposed that combines the advanced multivariate data decomposition and reconstruction technique MFIF–SE, multiple influence factors, and the interpretable deep learning model TFT. Figure 2 describes the framework of the proposed method. The forecasting steps of this model are as follows:
Step 1: The advanced multivariate decomposition technique MFIF is used to decompose the multidimensional time series into several multidimensional intrinsic mode functions (IMFs). MFIF removes noise from the original sequences and extracts features at different time frequencies, which effectively improves the model's prediction accuracy. Not all series are input into MFIF during decomposition: relevant market data with time-frequency features are included, whereas nonmarket data are excluded from this process.
Step 2: Sample entropy evaluates the complexity of each IMF, and IMFs with similar complexity are reconstructed (merged) to reduce the computational burden, increase the model's inference speed, and avoid overfitting and error accumulation.
Step 3: The reconstructed multivariate subsequences are input into TFT for training. Because the numerical ranges of the different factors differ greatly, all data undergo max–min normalization before model training,37 which maps the data to [0, 1]; the model output is inversely normalized to obtain the prediction. The predictions of all subsequences are then aggregated to produce the final prediction result. TFT not only extracts the interrelations between the carbon price and other factors but also captures different temporal characteristics.
Step 4: The model performance and interpretability are analyzed. Data from China's five carbon markets (Guangdong, Beijing, Shanghai, Hubei, and Shenzhen) are used. Mean absolute error (MAE), RMSE, mean absolute percentage error (MAPE), the coefficient of determination (R2), and directional accuracy (DA) are used to evaluate the prediction results. The interpretability results include the importance ordering of the variables and the attention over different lag steps.
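The normalization and integration in Step 3 can be sketched as follows. Fitting the scaler on the training portion only (to avoid look-ahead) is our assumption of standard practice, and all numbers are illustrative:

```python
import numpy as np

def minmax_fit(x):
    """Fit [0, 1] max-min scaling parameters; here fitted on the training
    split only (our assumption) so the test set cannot leak into the scaler."""
    return float(x.min()), float(x.max())

def minmax_apply(x, lo, hi):
    return (x - lo) / (hi - lo)

def minmax_invert(y, lo, hi):
    return y * (hi - lo) + lo

rng = np.random.default_rng(3)
price = 20.0 + 5.0 * rng.random(100)     # synthetic subsequence
lo, hi = minmax_fit(price[:70])          # first 70% = training portion
scaled = minmax_apply(price, lo, hi)
restored = minmax_invert(scaled, lo, hi)

# Integration: the final forecast is the sum of the subsequence forecasts
# after inverse normalization (values below are purely illustrative).
sub_forecasts = [np.array([21.5, 21.7]), np.array([1.2, -0.8])]
final_forecast = sum(sub_forecasts)
```

Inverse normalization is exact, so scaling introduces no error into the aggregated forecast; it only conditions the inputs for training.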
DATA COLLECTION AND EVALUATION SYSTEM CONSTRUCTION
This section presents the sources of the carbon price data and their influence factors. The evaluation index system is also explained.
Data description
Carbon price data collection
In the empirical study, the closing prices of five carbon markets in China (Guangdong, Beijing, Shanghai, Hubei, and Shenzhen) are selected to test the proposed model. These carbon markets were established early enough to provide ample data for experiments. Among them, Hubei and Guangdong each account for about 30% of the trading volume of China's carbon market.38 The selected time range runs from the establishment of each carbon market to June 2, 2022, and dates with zero transaction volume are deleted. The data are obtained from the China Carbon Trading Network. Figure 3 shows each carbon price series: all are nonlinear, the fluctuation patterns are complex, and no clear regularity is observed. The fluctuation behaviors also differ across markets; for example, Guangdong's carbon price is more stable than the others, whereas Beijing and Shenzhen show great volatility. Table 3 presents the relevant statistics for each carbon price. The mean, maximum (max), median, minimum (min), and standard deviation indicate that the data fluctuate widely. The kurtosis, skewness, and Jarque–Bera test show that none of the series obeys a normal distribution. The ADF test shows that, except for Shenzhen's carbon price, the series fail the stationarity test and are thus nonstationary. As illustrated in Figure 3 and Table 3, each carbon price sequence is divided into training, validation, and test sets at a ratio of 7:1:2.
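The chronological 7:1:2 split can be sketched as follows; exact counts may differ from Table 3 by one sample depending on the rounding convention, which the paper does not specify:

```python
import numpy as np

def split_712(series):
    """Chronological 7:1:2 train/validation/test split used for each market.
    Counts may differ from Table 3 by one sample depending on rounding."""
    n = len(series)
    n_train = int(round(n * 0.7))
    n_val = int(round(n * 0.1))
    return (series[:n_train],
            series[n_train:n_train + n_val],
            series[n_train + n_val:])

x = np.arange(1755)   # e.g., the Guangdong series length from Table 3
train, val, test = split_712(x)
```

A chronological (rather than shuffled) split preserves the temporal order, which is essential for honest out-of-sample evaluation of time series models.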
Table 3 Statistics characteristics of the original data
Carbon market | Abbreviation | Size | Train samples | Validation samples | Test samples | Mean | Max | Median | Min | Std | Kurt | Skew | ADF (p) | JB (p) |
Guangdong | GD | 1755 | 1229 | 175 | 351 | 27.4223 | 95.26 | 22.71 | 8.1 | 17.2313 | 1.9684 | 1.5907 | 0.6264 | 1.0000 |
Beijing | BJ | 1250 | 875 | 125 | 250 | 60.0055 | 107.26 | 53.34 | 24 | 16.4310 | −0.5177 | 0.6650 | 0.2017 | 1.0000 |
Shanghai | SH | 1235 | 865 | 123 | 247 | 35.1317 | 63 | 38.1 | 4.2 | 10.5377 | 1.9004 | −0.8372 | 0.5567 | 1.0000 |
Hubei | HB | 1909 | 1338 | 190 | 381 | 25.8399 | 61.48 | 25.2 | 10.38 | 8.7343 | 0.4859 | 0.7406 | 0.6845 | 1.0000 |
Shenzhen | SZ | 1921 | 1345 | 192 | 384 | 32.8379 | 130.9 | 30.44 | 3.03 | 20.1615 | 0.8627 | 0.9261 | 1.0000 | 1.0000 |
According to previous studies, many factors impact carbon prices,39 producing uncertain and complex carbon price changes. Incorporating these factors into carbon price forecasts is therefore important for improving forecast accuracy. This study comprehensively considers historical carbon prices, carbon allowance trading volumes, energy prices, economic factors, international carbon prices, carbon-intensive product prices, and environmental factors.
Historical carbon prices and trading volumes
Historical carbon prices are a key feature in carbon price forecasting. Therefore, correlation analysis is carried out between the historical and predicted values of each carbon price series. Figure 3 illustrates the partial autocorrelation function (PACF) at different lag steps for each carbon price; historical prices are found to correlate strongly with predicted prices. Trading volumes not only directly reflect carbon market activity but also contain much information related to carbon market operations.
Energy prices
Changes in oil, natural gas, and coal prices lead to changes in their consumption, which, in turn, affects carbon price volatility.40 In this study, Brent and WTI crude oil prices are selected because these two major crude oil markets reflect the global crude oil market well. For natural gas, the New York Mercantile Exchange price is chosen because natural gas prices in China are subject to government price limits. For coal, the Rotterdam coal price and the domestic thermal coal and Qinhuangdao coal prices are selected.
Economic factors
Macroeconomic growth affects the demand and consumption of society as a whole, and the carbon trading market is closely tied to the broader economy. As a significant energy consumer, China is highly dependent on energy imports; therefore, changes in exchange rates affect domestic energy markets and thus carbon prices.20 This study selects the exchange rate of USD to RMB, the exchange rate of EUR to RMB, and the H&S300 index as economic factors.
International carbon price
The European Union Emissions Trading Scheme (EU-ETS) is the world's largest carbon trading market and plays a leading role in international carbon trading. Volatility in the EU-ETS may affect China's carbon trading market, and EUA futures account for over 85% of EU-ETS trading volume. Therefore, the EUA price is chosen as the international carbon price.
Carbon-intensive product prices
In addition to power companies, China's carbon markets also cover many other carbon-intensive enterprises, such as cement, chemical, and steel producers. The decisions of these enterprises are significantly affected by product and raw material prices, resulting in changes in carbon emissions and carbon prices. Therefore, this study introduces the cement price index, the iron ore price, and Chinese chemical product prices.
Environmental factors
Weather changes can affect energy consumption and CO2 emissions and thus affect carbon prices. For example, in winter, heating in northern China increases energy consumption and CO2 emissions, resulting in an increased demand for carbon allowances and a rise in carbon prices. We select the highest temperature, lowest temperature, and air quality index (AQI) in various places to measure weather factors.
The above data come from the Wind database and Yahoo Finance. The dates of the variable data are aligned with the trading dates of the selected carbon market. To ensure the integrity of the carbon price information, interpolation is used to fill in missing values in the impact factor data. To simplify the notation, S1–S18 represent, respectively, the carbon price, Brent crude oil, WTI crude oil, natural gas price, Rotterdam coal, thermal coal price, Qinhuangdao coal price, EUA, H&S300, USD/CNY, EUR/CNY, cement price index, iron ore, chemical industry index, AQI, minimum temperature, maximum temperature, and trading volume. Figure 4 displays the Pearson correlation coefficient between each factor and the carbon price in Guangdong, which reflects the strength of each factor's correlation with the carbon price to a certain extent.
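The missing-value filling and correlation screening described above can be sketched as follows (linear interpolation is assumed here; the paper does not specify the interpolation variant):

```python
import math

def interpolate_missing(values):
    """Fill None gaps by linear interpolation; ends take the nearest known value."""
    vals = list(values)
    known = [i for i, v in enumerate(vals) if v is not None]
    for i, v in enumerate(vals):
        if v is None:
            prev = max((k for k in known if k < i), default=None)
            nxt = min((k for k in known if k > i), default=None)
            if prev is None:
                vals[i] = vals[nxt]
            elif nxt is None:
                vals[i] = vals[prev]
            else:
                w = (i - prev) / (nxt - prev)
                vals[i] = vals[prev] * (1 - w) + vals[nxt] * w
    return vals

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```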
Evaluation metrics
First, several general evaluation criteria are used, including MAE, RMSE, MAPE, and R2. This study also adopts directional accuracy (DA),41 which measures the accuracy of the predicted direction of price movement. In practice, not only the forecast accuracy of carbon prices themselves must be considered but also the accuracy of forecasting carbon price rises and falls. The evaluation indices are calculated as follows:
\[ \mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right| \]
\[ \mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2} \]
\[ \mathrm{MAPE}=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|\times 100\% \]
\[ R^2=1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2} \]
\[ \mathrm{DA}=\frac{1}{n-1}\sum_{i=1}^{n-1}a_i,\qquad a_i=\begin{cases}1, & \left(y_{i+1}-y_i\right)\left(\hat{y}_{i+1}-y_i\right)\ge 0\\ 0, & \text{otherwise}\end{cases} \]
In the above formulas, yi represents the actual value, ŷi the predicted value, and n the number of predicted samples. The smaller the MAE, RMSE, and MAPE, and the closer R2 and DA are to 1, the better the prediction.
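The five criteria can be implemented directly from the formulas above (the DA convention here, comparing the predicted next value against the current actual value, is one common choice):

```python
import math

def evaluate(y, yhat):
    """Compute MAE, RMSE, MAPE (%), R2, and directional accuracy (DA)."""
    n = len(y)
    errs = [a - b for a, b in zip(y, yhat)]
    mae = sum(abs(e) for e in errs) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    mape = sum(abs(e / a) for e, a in zip(errs, y)) / n * 100
    y_bar = sum(y) / n
    r2 = 1 - sum(e * e for e in errs) / sum((a - y_bar) ** 2 for a in y)
    hits = sum(1 for i in range(n - 1)
               if (y[i + 1] - y[i]) * (yhat[i + 1] - y[i]) >= 0)
    da = hits / (n - 1)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2, "DA": da}
```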
However, different evaluation criteria rest on different theoretical principles and are therefore not directly comparable. Even under the same criterion, we cannot conclude that the proposed model outperforms a benchmark model from the numerical values alone, because the difference in prediction errors between the two models may not be statistically significant.42 Therefore, this study adopts two statistical methods to test the significance of the predictive power of different models. First, the Diebold–Mariano (DM) test43 is used to examine whether the differences in predictive ability between models are significant. The DM test compares the errors of different models against the actual values to judge whether their prediction effects differ significantly. The null hypothesis (H0) is that the two models have the same prediction accuracy, whereas the alternative hypothesis (H1) is the opposite. The DM statistic is calculated as follows:
\[ d_i=\left(y_i-\hat{y}_{j,i}\right)^2-\left(y_i-\hat{y}_{k,i}\right)^2 \]
\[ \mathrm{DM}=\frac{\bar{d}}{\sigma_d/\sqrt{n}} \]
where ŷj,i and ŷk,i are the predicted values of models j and k, d̄ is the mean of di, and σd is the standard deviation of di.
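A sketch of the DM statistic under squared-error loss (the loss function is an assumption; the asymptotic standard normal distribution is used for the two-sided p-value):

```python
import math
from statistics import mean, pstdev

def dm_test(y, pred_j, pred_k):
    """DM statistic comparing model j against model k (squared-error loss).

    A large positive value means model k's errors are significantly smaller.
    """
    d = [(a - b) ** 2 - (a - c) ** 2 for a, b, c in zip(y, pred_j, pred_k)]
    n = len(d)
    dm = mean(d) / (pstdev(d) / math.sqrt(n))
    p_value = math.erfc(abs(dm) / math.sqrt(2))  # two-sided normal p-value
    return dm, p_value
```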
Second, if the changes of two series in the prediction system are synchronous, their correlation is high; otherwise, it is low. Therefore, grey correlation analysis (GCA) is used to measure the geometric proximity between the predicted results of different models and the actual values,44 indicating the correlation between predicted and actual values. The GCA is calculated as follows:
\[ \xi_i=\frac{b+p\,a}{\left|y_i-\hat{y}_i\right|+p\,a} \]
\[ \mathrm{GCA}=\frac{1}{n}\sum_{i=1}^{n}\xi_i \]
where a and b are the maximum and minimum absolute errors over all models, respectively, and p is the resolution coefficient, generally set to 0.5.
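The grey correlation degree can be computed for a set of models as follows (a sketch following the formulas above, with p = 0.5):

```python
def grey_correlation(y, predictions, p=0.5):
    """Grey correlation degree of each model's predictions against the actual values.

    predictions: dict mapping model name -> list of predicted values.
    """
    abs_errors = {m: [abs(a - b) for a, b in zip(y, pred)]
                  for m, pred in predictions.items()}
    pooled = [e for errs in abs_errors.values() for e in errs]
    a, b = max(pooled), min(pooled)  # max/min absolute error over all models
    return {m: sum((b + p * a) / (e + p * a) for e in errs) / len(errs)
            for m, errs in abs_errors.items()}
```

A model whose predictions track the actual series more closely receives a degree nearer to 1.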
EXPERIMENTAL STUDY
In this section, samples from the five carbon markets of Guangdong, Beijing, Shanghai, Hubei, and Shenzhen are selected as experimental objects to verify the feasibility of the proposed model for carbon price prediction. First, the structure and basic parameters of the experimental models are described. Second, the experimental results are comprehensively analyzed and compared.
Experimental setup
To fairly compare the proposed model with the benchmark models, the model parameters must be set appropriately. TFT has many parameters, including the time step, batch size, learning rate, number of hidden layers, number of neuron nodes, and number of attention heads. Among them, the numbers of hidden layers and attention heads determine the main structure of TFT, which adopts the design in the original paper.36 In addition, since this study does not use static features of the carbon price, a uniform mask is used as the static input. The input time step is uniformly set to 10 according to the PACF results; that is, the price data of the previous 10 steps are used to predict the price of the next step. The learning rate affects the training time and convergence of the model: a large learning rate may cause the model to fail to converge and oscillate around the optimum, whereas a small learning rate leads to slow convergence and a tendency to fall into local optima. Therefore, this study adopts an adaptive learning-rate reduction and early stopping mechanism that adjusts the learning rate according to the validation set accuracy: a large learning rate is used at the initial stage of training and is gradually decreased as training proceeds, so the model converges rapidly. The batch size and the number of neuron nodes were determined experimentally following Yun et al.45 Taking the Guangdong carbon price as the experimental object, the validation error was computed for batch sizes of 16, 32, 64, and 128 and neuron node counts of 4, 8, 16, 32, 64, and 128 (as shown in Table 4). The MAE, RMSE, and MAPE on the validation set are 0.262, 0.329, and 0.939%, respectively, when the batch size is 16 and the number of neuron nodes is 64, which is optimal over the whole grid. Therefore, the batch size and number of neuron nodes in TFT are set to 16 and 64.
Table 4 Performance of the proposed model: batch sizes and neuron nodes
Batch_size | Neuron nodes | MAE | RMSE | MAPE (%) | Running_time | Batch_size | Neuron nodes | MAE | RMSE | MAPE (%) | Running_time |
16 | 4 | 0.368 | 0.451 | 1.317 | 342.194 | 32 | 4 | 0.405 | 0.501 | 1.447 | 184.582 |
8 | 0.279 | 0.359 | 1.000 | 422.423 | 8 | 0.272 | 0.348 | 0.972 | 220.950 | ||
16 | 0.353 | 0.421 | 1.260 | 480.679 | 16 | 0.430 | 0.543 | 1.535 | 150.531 | ||
32 | 0.273 | 0.352 | 0.979 | 413.102 | 32 | 0.272 | 0.351 | 0.973 | 187.407 | ||
64 | 0.262 | 0.329 | 0.939 | 428.417 | 64 | 0.311 | 0.384 | 1.112 | 167.341 | ||
128 | 0.276 | 0.356 | 0.989 | 390.950 | 128 | 0.271 | 0.351 | 0.972 | 177.006 | ||
64 | 4 | 0.408 | 0.492 | 1.463 | 96.477 | 128 | 4 | 0.366 | 0.467 | 1.305 | 59.296 |
8 | 0.319 | 0.390 | 1.143 | 110.177 | 8 | 0.394 | 0.466 | 1.410 | 72.550 | ||
16 | 0.279 | 0.345 | 0.999 | 101.942 | 16 | 0.333 | 0.421 | 1.189 | 70.822 | ||
32 | 0.269 | 0.347 | 0.964 | 120.436 | 32 | 0.267 | 0.345 | 0.957 | 53.778 | ||
64 | 0.279 | 0.359 | 1.000 | 115.425 | 64 | 0.275 | 0.355 | 0.984 | 55.588 | ||
128 | 0.274 | 0.354 | 0.982 | 96.581 | 128 | 0.279 | 0.360 | 0.999 | 67.752 |
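The adaptive learning-rate reduction and early stopping mechanism described above corresponds to the `ReduceLROnPlateau` and `EarlyStopping` callbacks in Keras; its control logic can be emulated in a few lines (the patience values here are illustrative assumptions, not the paper's settings):

```python
def schedule(val_losses, lr=1e-3, factor=0.5, lr_patience=3,
             stop_patience=8, min_lr=1e-6):
    """Halve the learning rate after lr_patience epochs without improvement;
    stop training after stop_patience epochs without improvement."""
    best = float("inf")
    wait_lr = wait_stop = 0
    epochs_run = 0
    for loss in val_losses:
        epochs_run += 1
        if loss < best:
            best = loss
            wait_lr = wait_stop = 0
        else:
            wait_lr += 1
            wait_stop += 1
            if wait_stop >= stop_patience:
                break  # early stopping
            if wait_lr >= lr_patience:
                lr = max(lr * factor, min_lr)  # reduce LR on plateau
                wait_lr = 0
    return lr, epochs_run
```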
To verify the effectiveness of the proposed model, several benchmark neural network models are selected for comparison, including the backpropagation (BP) network, TCN, recurrent neural network (RNN), LSTM, and gated recurrent unit (GRU) network. The LSTM is commonly used for carbon price prediction; its parameters are set according to Zhou et al.46 and verified by cross-validation. The LSTM has three layers with 128, 64, and 32 units, respectively. Because the RNN and GRU are recurrent networks like the LSTM, their structures are set identically. For TCN and BP, cross-validation was used to determine the parameter settings. The parameter setting of each comparison model is shown in Table 5. In addition, the time step, batch size, and learning rate are the same as for TFT.
Table 5 Hyperparameters for comparison models
Model | Hyperparameter |
BP | Hidden layers = 3 |
Hidden sizes = 256, 128, 64 | |
Activation function = "ReLU" | |
TCN | Hidden layers = 3 |
Kernel size = 2 | |
Dilation rate = 2 | |
Activation function = "ReLU" | |
RNN | Hidden layers = 3 |
Units = 128, 64, 32 | |
Activation function = "ReLU" | |
LSTM | Hidden layers = 3 |
Units = 128, 64, 32 | |
Activation function = "ReLU" | |
GRU | Hidden layers = 3 |
Units = 128, 64, 32 | |
Activation function = "ReLU" |
The input dimension of all models is (10, 18). Given that BP cannot take inputs with a time-step dimension, the time-step data are flattened before input. The output size is 1, and the loss function is MAE. To avoid overfitting, a dropout layer with a rate of 0.2 is added after each layer. The Adam optimization algorithm is used for training. All experiments are performed in Python 3.8.8 and TensorFlow 2.7.0. In addition, multivariate empirical mode decomposition (MEMD)47 is selected as the contrast decomposition technique, and the parameter settings of MFIF follow Cicone et al.32
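The flattening applied to BP's input can be sketched as follows: a (10, 18) window of 10 time steps × 18 variables becomes a single 180-dimensional vector (toy values for illustration):

```python
def flatten_window(window):
    """Flatten a (time_steps, features) window into one vector for the BP (MLP) model."""
    return [value for step in window for value in step]

window = [[float(t * 18 + f) for f in range(18)] for t in range(10)]  # toy (10, 18) window
flat = flatten_window(window)
```

The recurrent models and TFT consume the (10, 18) window directly, so only BP needs this step.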
Experimental results and analysis
First, the decomposition and reconstruction of the original sequence of each carbon price are presented. Second, the prediction results of the proposed and benchmark models are comprehensively compared. Finally, the importance of each factor and time step learned by the model is explained.
Decomposition and reconstruction of carbon price series
According to the hybrid carbon price forecasting structure proposed in this study, MFIF is used to decompose the carbon prices and their related factors into several IMFs. Note that the trading volumes, minimum and maximum temperatures, and AQI are excluded from the decomposition. Figure 5 takes the Guangdong carbon market as an example to illustrate the decomposition and reconstruction results. Many IMFs are produced, which not only increases the computational complexity but can also reduce prediction accuracy through error accumulation, so reconstructing these IMFs is indispensable. Given that sequences with similar complexities have similar prediction difficulties, SE is a good measure of sequence complexity.48 Therefore, reconstructing the decomposed sequence according to the SE values can reduce computational cost and error: IMFs with similar complexities are reconstructed into a new sequence. The carbon price decomposition–reconstruction results for the remaining markets are shown in Figure 6. As displayed in Figures 5 and 6, from the first part to the third part of each carbon market, the frequency and complexity decrease from high to low. For the data decomposed by MEMD, the resulting IMFs are reconstructed into three parts in the same manner. When calculating SE and reconstructing, SE values are not calculated for the other factors because their IMFs correspond one-to-one with the IMFs of the carbon price; the IMFs of the other factors are simply summed according to the reconstruction grouping of the carbon price IMFs.
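Sample entropy (SE), used above to group IMFs of similar complexity, can be sketched as follows (m = 2 and r = 0.2 × standard deviation are conventional choices; the paper does not state its exact parameters):

```python
import math
from statistics import pstdev

def sample_entropy(x, m=2, r_frac=0.2):
    """SampEn(m, r): lower values indicate a more regular, predictable series."""
    r = r_frac * pstdev(x)

    def matches(length):
        # Count template pairs whose Chebyshev distance is within tolerance r.
        templates = [x[i:i + length] for i in range(len(x) - length + 1)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b, a = matches(m), matches(m + 1)
    return -math.log(a / b) if a > 0 and b > 0 else float("inf")
```

IMFs whose SE values are close are then summed into one part, giving the high-, medium-, and low-complexity subsequences shown in Figures 5 and 6.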
Figure 5. Decomposition–reconstruction of the Guangdong market. (A) shows the SE of each IMF and how the IMFs are grouped into parts; (B) shows the result after reconstruction.
Figure 6. Reconstruction results. (A), (B), (C), and (D) represent the carbon markets of Beijing, Shanghai, Hubei, and Shenzhen, respectively.
The forecast results of the proposed MFIF–SE–TFT model and the benchmark models for the five carbon markets are shown in Figures 7, 8, 9, 10, and 11, representing Guangdong, Beijing, Shanghai, Hubei, and Shenzhen, respectively. The predicted values closest to the actual values come from MFIF–SE–TFT, whose fitted curve is closest to the actual curve. Even in strongly oscillating segments (e.g., the parts framed in Figures 7–11) or at extreme values (maxima and minima), the model's prediction accuracy remains satisfactory without major deviations. The single models do not capture the violent fluctuations and long-term trends of carbon prices well; for example, in Figures 7, 8, and 9, the trends predicted by single models, especially BP, for the latter half of the Guangdong, Shanghai, and Hubei markets differ considerably from the actual values. This illustrates the effectiveness of applying the decomposition algorithm to carbon price forecasting, which helps the model learn different parts of the carbon price series. Subplots (A)–(H) in Figures 7–11 further reflect the relationship between predicted and actual values, with the diagonal line indicating equality; the closer the scatter is to this line, the better the performance. The scatter points of MFIF–SE–TFT converge toward the diagonal line. These subgraphs show significant deviations in the predictions of the single models, while the models using decomposition technology improve prediction accuracy to a certain extent. The heatmap coloring of the scatter points represents the absolute error between predicted and actual values. Although the forecasting effects differ across carbon markets, most of the absolute errors of MFIF–SE–TFT are less than 5. For the single models, especially BP, the prediction results differ greatly from the actual values, and the absolute errors of some points exceed 20. The fitting curves, scatter plots, and heatmaps of each carbon market preliminarily confirm that the proposed forecasting model has good forecasting accuracy.
Table 6 lists the values of the evaluation criteria for the five carbon markets, with bold values indicating the best result under each criterion. In terms of MAE, RMSE, MAPE, R2, and DA, MFIF–SE–TFT has the best predictive performance in each carbon market, with the smallest MAE, RMSE, and MAPE and the largest R2 and DA. Comparing these results with the other benchmark models, the following key conclusions can be drawn:
Among the single models, TFT predicts better than the BP, TCN, RNN, LSTM, and GRU models, with smaller MAE, RMSE, and MAPE and larger R2 and DA. In the Guangdong market, TFT improved on the second-best model by 49.68%, 44.37%, 45.27%, 2.49%, and 19.76% in MAE, RMSE, MAPE, R2, and DA, respectively. The Beijing market improved by 4.28%, 6.28%, 7.51%, and 2.26% in MAE, RMSE, MAPE, and R2, respectively; the Shanghai market by 54.64%, 56.34%, 48.34%, and 30.80% in MAE, RMSE, MAPE, and R2; the Hubei market by 25.96%, 15.01%, 25.14%, and 1.25% in MAE, RMSE, MAPE, and R2; and the Shenzhen market by 5.61%, 1.21%, 3.75%, and 2.26% in MAE, RMSE, MAPE, and DA. These results are mainly attributed to the fact that TFT can fully learn the importance of different factors for carbon price prediction and suppress irrelevant variables, which would otherwise interfere with the prediction of a single model, while also learning the temporal characteristics of carbon prices.
After introducing a decomposition algorithm (MEMD or MFIF), the prediction accuracy of a single model improves significantly for two reasons: (a) the decomposition algorithm reduces the complexity of the original series, and SE accurately estimates the complexity of the resulting subsequences; (b) the TFT model captures the internal features of the different factors within each subsequence, so the model can learn the different characteristics well, thereby improving prediction accuracy. Comparing the two decomposition methods, MFIF outperforms MEMD. In the Guangdong market, MFIF–SE–TFT improved on MEMD–SE–TFT by 61.07%, 61.17%, 59.63%, 0.71%, and 52.71% in MAE, RMSE, MAPE, R2, and DA, respectively; the Beijing market improved by 32.93%, 29.33%, 33.61%, 6.19%, and 26.65%; the Shanghai market by 65.39%, 70.83%, 66.03%, 2.25%, and 79.50%; the Hubei market by 15.90%, 15.79%, 15.02%, 0.51%, and 17.80%; and the Shenzhen market by 70.93%, 71.17%, 75.77%, 65.81%, and 107.52% in MAE, RMSE, MAPE, R2, and DA, respectively. MFIF decomposes carbon prices in a more detailed manner than MEMD, so more regular subsequences are obtained after SE reconstruction.
As presented in Table 6, the forecast results differ across carbon markets. The Beijing and Shenzhen carbon markets are predicted less accurately than the others, with slightly higher MAE, RMSE, and MAPE. To determine the reasons, the forecast result of each subsequence in each carbon market is extracted, and MAE, RMSE, and MAPE are reported in Table 7. The results for each part reveal that the prediction error of Part I contributes most of the overall error, because Part I of the carbon price consists mainly of high-frequency IMFs, which contain large uncertainty, volatility, noise, and outliers. The Beijing and Shenzhen markets are more volatile in the short term than the other markets; therefore, the difference in Part I prediction leads to the difference in the final forecast result of each carbon market. For Parts II and III, however, the model fits almost perfectly.
Table 6 Prediction error evaluation metrics
Carbon market | Evaluation metrics | Forecasting Models | |||||||
BP | TCN | RNN | LSTM | GRU | TFT | MEMD-SE-TFT | MFIF-SE-TFT | ||
Guangdong | MAE | 11.459 | 5.236 | 4.869 | 3.453 | 2.043 | 1.028 | 0.971 | 0.378 |
RMSE | 16.940 | 8.331 | 6.826 | 4.492 | 3.205 | 1.783 | 1.558 | 0.605 | |
MAPE | 18.143% | 7.984% | 8.150% | 6.136% | 3.298% | 1.805% | 1.734% | 0.700% | |
R2 | 0.335 | 0.767 | 0.843 | 0.932 | 0.965 | 0.989 | 0.992 | 0.999 | |
DA | 0.504 | 0.493 | 0.504 | 0.507 | 0.507 | 0.521 | 0.516 | 0.788 | |
Beijing | MAE | 11.252 | 8.148 | 6.423 | 6.287 | 6.805 | 6.018 | 5.475 | 3.672 |
RMSE | 14.414 | 10.006 | 8.238 | 8.259 | 8.972 | 7.740 | 6.938 | 4.903 | |
MAPE | 21.789% | 13.488% | 11.133% | 10.675% | 11.820% | 9.873% | 8.786% | 5.833% | |
R2 | 0.519 | 0.768 | 0.843 | 0.842 | 0.814 | 0.861 | 0.889 | 0.944 | |
DA | 0.448 | 0.540 | 0.460 | 0.464 | 0.480 | 0.452 | 0.484 | 0.613 | |
Shanghai | MAE | 7.089 | 5.107 | 3.773 | 4.177 | 2.522 | 1.144 | 0.731 | 0.253 |
RMSE | 9.877 | 7.951 | 6.249 | 5.823 | 4.104 | 1.792 | 1.200 | 0.350 | |
MAPE | 14.234% | 9.998% | 7.232% | 8.465% | 4.954% | 2.559% | 1.666% | 0.566% | |
R2 | 0.300 | 0.337 | 0.360 | 0.444 | 0.724 | 0.947 | 0.976 | 0.998 | |
DA | 0.331 | 0.327 | 0.347 | 0.343 | 0.331 | 0.318 | 0.400 | 0.718 | |
Hubei | MAE | 4.363 | 3.845 | 1.967 | 1.341 | 1.144 | 0.847 | 0.648 | 0.545 |
RMSE | 5.606 | 5.458 | 2.765 | 1.850 | 1.566 | 1.331 | 0.950 | 0.800 | |
MAPE | 10.935% | 8.958% | 4.835% | 3.458% | 3.099% | 2.320% | 1.764% | 1.499% | |
R2 | 0.454 | 0.482 | 0.867 | 0.941 | 0.957 | 0.969 | 0.984 | 0.989 | |
DA | 0.497 | 0.513 | 0.487 | 0.461 | 0.455 | 0.479 | 0.545 | 0.642 | |
Shenzhen | MAE | 6.709 | 6.512 | 5.583 | 6.98 | 5.083 | 4.798 | 4.389 | 1.276 |
RMSE | 8.973 | 9.108 | 8.224 | 9.21 | 7.93 | 8.026 | 7.475 | 2.155 | |
MAPE | 53.105% | 48.676% | 45.182% | 59.331% | 43.718% | 42.079% | 39.162% | 9.489% | |
R2 | 0.398 | 0.379 | 0.494 | 0.365 | 0.53 | 0.518 | 0.582 | 0.965 | |
DA | 0.394 | 0.415 | 0.355 | 0.36 | 0.371 | 0.342 | 0.399 | 0.828 |
Table 7 Subpart prediction results
Carbon market | Evaluation metrics | Part I | Part II | Part III |
Guangdong | MAE | 0.2713 | 0.0256 | 0.0811 |
RMSE | 0.4672 | 0.0551 | 0.1251 | |
MAPE | 1845.58% | 0.035% | 3.73% | |
Beijing | MAE | 3.6694 | 0.1425 | 0.0528 |
RMSE | 4.8971 | 0.2383 | 0.0593 | |
MAPE | 9172.33% | 0.31% | 6.21% | |
Shanghai | MAE | 0.3295 | 0.0757 | 0.0413 |
RMSE | 0.5471 | 0.1207 | 0.0732 | |
MAPE | 4653.31% | 4.14% | 0.01% | |
Hubei | MAE | 0.5365 | 0.0235 | 0.0492 |
RMSE | 0.7917 | 0.0347 | 0.0714 | |
MAPE | 4856.25% | 13.71% | 0.12% | |
Shenzhen | MAE | 2.2623 | 0.0988 | 0.0498 |
RMSE | 3.0890 | 0.1697 | 0.0641 | |
MAPE | 5220.93% | 3.17% | 0.40% |
The above analysis considers traditional evaluation indicators, but the models' predictive abilities cannot be judged from the levels of these indicators alone: the differences may be insignificant or due to chance in model training. Therefore, the DM test and GCA are performed to determine whether the prediction results of the models differ significantly. Table 8 presents the DM test and GCA results of MFIF–SE–TFT against the benchmark models in the five carbon markets. From Table 8, the proposed MFIF–SE–TFT model is significantly better than the comparison models in each carbon market at the 5% significance level. Moreover, the GCA of the proposed model is the highest, meaning its predictions are the most correlated with the actual values. This indicates that MFIF–SE–TFT dramatically improves the accuracy of carbon price prediction compared with the other models.
Table 8 DM test and GCA results
Model | Guangdong | Beijing | Shanghai | Hubei | Shenzhen | |||||
DM | GCA | DM | GCA | DM | GCA | DM | GCA | DM | GCA | |
BP | 11.8355*** | 0.757 | 10.2276*** | 0.685 | 9.1531*** | 0.693 | 12.6734*** | 0.746 | 9.7301*** | 0.797 |
TCN | 10.2866*** | 0.860 | 9.1316*** | 0.734 | 8.1878*** | 0.768 | 11.1564*** | 0.782 | 9.4200*** | 0.807 |
RNN | 10.6461*** | 0.857 | 8.0565*** | 0.780 | 7.8540*** | 0.818 | 7.9341*** | 0.862 | 7.6040*** | 0.830 |
LSTM | 12.1364*** | 0.888 | 7.1674*** | 0.785 | 9.0745*** | 0.780 | 6.4625*** | 0.898 | 9.4589*** | 0.790 |
GRU | 7.5170*** | 0.932 | 7.3716*** | 0.774 | 8.0323*** | 0.86 | 7.4033*** | 0.911 | 6.1975*** | 0.845 |
TFT | 8.2677*** | 0.958 | 7.3350*** | 0.79 | 7.8590*** | 0.887 | 6.3421*** | 0.918 | 7.8805*** | 0.841 |
MEMD-SE-TFT | 4.4585*** | 0.965 | 7.1346*** | 0.802 | 3.3983*** | 0.947 | 5.0412*** | 0.947 | 5.4904*** | 0.866 |
MFIF-SE-TFT | - | 0.986 | - | 0.856 | - | 0.980 | - | 0.954 | - | 0.951
*** denotes significance at the 5% level.
The accuracy of the single benchmark models (BP, TCN, RNN, LSTM, and GRU) is low, and a large gap exists between their prediction accuracy and that of MFIF–SE–TFT, even though these deep learning models have strong nonlinear fitting capabilities and are often used in time series forecasting. This study therefore explores the reason for the gap: too many variables are selected to predict carbon prices. Many variables affect carbon prices through different mechanisms, and both the carbon price and these variables are highly uncertain and noisy. With many exogenous variables, it is difficult for a single model to extract the relationships between the highly uncertain variables and the original carbon price series, and overfitting is likely to occur. Three extended experiments were conducted to verify this analysis. First, only historical carbon price data are used as input. Second, the exogenous variables are reduced through factor analysis20 before being used as features. The Kaiser–Meyer–Olkin (KMO) measure and Bartlett's test of sphericity are used to determine whether the original variables are suitable for factor analysis; the statistical tests are shown in Table 9. According to Table 9, the KMO value of almost all samples is greater than 0.7 and the significance probability is below the 0.05 level, indicating that factor analysis can be applied to all samples. Third, irrelevant features are eliminated by random forest (RF)17 feature screening before being used as input; Figure 12 shows the top 10 RF scores. The threshold is set to 0.05, that is, variables with scores greater than 0.05 are selected as input features. The extended experiments use the same time step of 10, the parameter settings in Table 5, and the five indicators MAE, RMSE, MAPE, R2, and DA. The experimental results are shown in Tables 10–12.
Comparing Tables 10–12 with Table 6 shows that, apart from the DA indicator, the results obtained using only the carbon price itself as input are significantly better than those obtained when all exogenous variables are included. Dimensionality reduction and feature screening of the exogenous variables also improve the prediction accuracy of the single models, indicating that introducing exogenous variables is beneficial when handled appropriately. Each of the two treatments has its advantages and disadvantages, but both remain inferior to TFT. The extended experiments validate the analysis above: when there are many variable features, a single model struggles to learn valid information between features, which leads to overfitting, while dimensionality reduction and feature screening inevitably discard part of the data information. TFT, by contrast, has considerable advantages in handling multiple variables: it can reasonably model the complex relationship between the variables and carbon price fluctuations, capture their characteristics, and eliminate or weaken irrelevant interfering variables to improve forecasting. The three extended experiments further illustrate the superiority and stability of the proposed model.
Table 9 KMO and Bartlett test
Guangdong | Beijing | Shanghai | Hubei | Shenzhen | |||||
KMO | 0.688 | KMO | 0.704 | KMO | 0.704 | KMO | 0.71 | KMO | 0.696 |
Bartlett test of sphericity | Bartlett test of sphericity | Bartlett test of sphericity | Bartlett test of sphericity | Bartlett test of sphericity | |||||
Approximate chi-square value | 31,173.299 | Approximate chi-square value | 25,062.618 | Approximate chi-square value | 23,748.974 | Approximate chi-square value | 35,468.347 | Approximate chi-square value | 36,698.013 |
Degrees of freedom | 136 | Degrees of freedom | 136 | Degrees of freedom | 136 | Degrees of freedom | 136 | Degrees of freedom | 136 |
Significance | 0 | Significance | 0 | Significance | 0 | Significance | 0 | Significance | 0 |
Table 10 Single model results of using carbon price itself
Carbon market | Evaluation metrics | Forecasting models | ||||
BP | TCN | RNN | LSTM | GRU | ||
Guangdong | MAE | 1.185 | 1.428 | 1.432 | 2.536 | 1.913 |
RMSE | 1.897 | 2.258 | 2.285 | 3.454 | 2.612 | |
MAPE | 1.890% | 2.355% | 2.409% | 4.402% | 3.370% | |
R2 | 0.988 | 0.983 | 0.982 | 0.960 | 0.977 | |
DA | 0.605 | 0.542 | 0.499 | 0.513 | 0.527 | |
Beijing | MAE | 6.122 | 6.233 | 6.147 | 6.146 | 6.033 |
RMSE | 7.814 | 7.928 | 7.789 | 7.831 | 7.748 | |
MAPE | 10.013% | 10.380% | 9.910% | 9.871% | 9.738% | |
R2 | 0.859 | 0.855 | 0.860 | 0.858 | 0.861 | |
DA | 0.452 | 0.440 | 0.468 | 0.480 | 0.452 | |
Shanghai | MAE | 7.638 | 4.274 | 3.840 | 3.610 | 3.532 |
RMSE | 2.764 | 2.067 | 1.959 | 1.900 | 1.879 | |
MAPE | 3.697% | 2.871% | 2.725% | 2.654% | 2.624% | |
R2 | 0.875 | 0.930 | 0.937 | 0.941 | 0.942 | |
DA | 0.347 | 0.306 | 0.327 | 0.343 | 0.335 | |
Hubei | MAE | 1.036 | 1.189 | 1.022 | 0.904 | 0.889 |
RMSE | 1.451 | 1.661 | 1.454 | 1.396 | 1.402 | |
MAPE | 2.720% | 3.044% | 2.680% | 2.446% | 2.418% | |
R2 | 0.963 | 0.952 | 0.963 | 0.966 | 0.966 | |
DA | 0.492 | 0.479 | 0.468 | 0.482 | 0.482 | |
Shenzhen | MAE | 5.460 | 4.809 | 4.875 | 4.823 | 5.092 |
RMSE | 8.820 | 8.034 | 8.100 | 7.816 | 8.434 | |
MAPE | 41.385% | 42.118% | 42.457% | 42.645% | 40.121% | |
R2 | 0.418 | 0.517 | 0.509 | 0.543 | 0.468 | |
DA | 0.436 | 0.347 | 0.358 | 0.360 | 0.342 |
Table 11 Single model results of using factor analysis
Carbon market | Evaluation metrics | Forecasting models | ||||
BP | TCN | RNN | LSTM | GRU | ||
Guangdong | MAE | 1.118 | 1.107 | 1.497 | 1.825 | 1.706 |
RMSE | 1.637 | 1.725 | 2.291 | 3.045 | 2.930 | |
MAPE | 1.788% | 1.719% | 2.327% | 2.873% | 2.671% | |
R2 | 0.991 | 0.990 | 0.982 | 0.969 | 0.971 | |
DA | 0.547 | 0.536 | 0.553 | 0.519 | 0.504 | |
Beijing | MAE | 5.985 | 6.103 | 5.773 | 6.145 | 6.024 |
RMSE | 7.558 | 7.876 | 7.318 | 8.265 | 7.730 | |
MAPE | 9.742% | 10.593% | 9.611% | 11.210% | 9.852% | |
R2 | 0.868 | 0.856 | 0.876 | 0.842 | 0.862 | |
DA | 0.480 | 0.460 | 0.484 | 0.496 | 0.452 | |
Shanghai | MAE | 4.074 | 4.748 | 3.558 | 3.454 | 3.376 |
RMSE | 5.672 | 6.610 | 4.931 | 5.310 | 5.059 | |
MAPE | 8.268% | 9.636% | 7.209% | 6.885% | 6.756% | |
R2 | 0.473 | 0.284 | 0.601 | 0.538 | 0.580 | |
DA | 0.385 | 0.389 | 0.401 | 0.409 | 0.425 | |
Hubei | MAE | 1.020 | 1.184 | 1.019 | 0.894 | 0.877 |
RMSE | 1.417 | 1.687 | 1.325 | 1.309 | 1.190 | |
MAPE | 2.765% | 3.023% | 2.754% | 2.394% | 2.363% | |
R2 | 0.965 | 0.951 | 0.969 | 0.970 | 0.975 | |
DA | 0.478 | 0.454 | 0.480 | 0.488 | 0.467 | |
Shenzhen | MAE | 5.244 | 4.800 | 4.927 | 4.809 | 4.963 |
RMSE | 7.114 | 7.135 | 7.677 | 6.653 | 7.810 | |
MAPE | 46.846% | 42.131% | 42.982% | 46.468% | 43.078% | |
R2 | 0.621 | 0.619 | 0.559 | 0.669 | 0.544 | |
DA | 0.453 | 0.393 | 0.448 | 0.367 | 0.385 |
Table 12 Single model results of using RF feature screening
Carbon market | Evaluation metrics | Forecasting models | ||||
BP | TCN | RNN | LSTM | GRU | ||
Guangdong | MAE | 1.105 | 1.353 | 1.423 | 2.036 | 1.692 |
RMSE | 1.680 | 2.326 | 2.247 | 2.689 | 2.479 | |
MAPE | 1.779% | 1.983% | 2.203% | 3.601% | 2.796% | |
R2 | 0.991 | 0.982 | 0.983 | 0.976 | 0.979 | |
DA | 0.490 | 0.541 | 0.493 | 0.556 | 0.513 | |
Beijing | MAE | 6.107 | 6.053 | 6.035 | 6.078 | 6.036 |
RMSE | 8.080 | 7.919 | 7.833 | 7.812 | 7.740 | |
MAPE | 10.997% | 10.311% | 10.766% | 9.887% | 9.658% | |
R2 | 0.849 | 0.855 | 0.858 | 0.859 | 0.861 | |
DA | 0.456 | 0.432 | 0.464 | 0.456 | 0.468 | |
Shanghai | MAE | 2.481 | 2.786 | 2.949 | 2.811 | 2.010 |
RMSE | 4.214 | 4.568 | 5.102 | 4.638 | 3.305 | |
MAPE | 4.842% | 5.513% | 5.709% | 5.642% | 4.025% | |
R2 | 0.709 | 0.658 | 0.573 | 0.647 | 0.821 | |
DA | 0.356 | 0.377 | 0.352 | 0.364 | 0.397 | |
Hubei | MAE | 1.015 | 1.150 | 1.009 | 0.855 | 0.852 |
RMSE | 1.424 | 1.627 | 1.529 | 1.064 | 1.187 | |
MAPE | 2.409% | 2.795% | 2.705% | 2.395% | 2.050% | |
R2 | 0.965 | 0.954 | 0.959 | 0.980 | 0.976 | |
DA | 0.480 | 0.470 | 0.472 | 0.454 | 0.470 | |
Shenzhen | MAE | 5.221 | 4.859 | 4.844 | 4.822 | 4.995 |
RMSE | 6.545 | 6.940 | 7.293 | 6.532 | 6.792 | |
MAPE | 45.236% | 41.260% | 40.643% | 39.796% | 41.571% | |
R2 | 0.680 | 0.640 | 0.602 | 0.681 | 0.655 | |
DA | 0.370 | 0.349 | 0.339 | 0.383 | 0.380 |
Figure 13 shows the importance of the different input variables in Part I and the importance of the different lag orders in Parts I, II, and III across the five carbon price sample experiments. Through Equation (17), the importance of each past input variable is obtained by averaging the weights of the variables selected on the output test set over all lag orders. Similarly, the attention paid to the different lag orders is obtained from the attention scores in Equation (21). The weight-importance results are analyzed according to Figure 13.
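The averaging described above can be illustrated with a minimal numpy sketch. The arrays, shapes, and normalization below are illustrative assumptions, not the paper's implementation: `w` stands in for per-sample, per-lag variable-selection weights (Equation (17)), and `a` for per-sample attention over lag orders (Equation (21)).

```python
import numpy as np

# Hypothetical TFT-style interpretability outputs (random placeholders).
rng = np.random.default_rng(0)
n_samples, n_lags, n_vars = 50, 10, 6  # 10 lag steps, 6 input variables

# Variable-selection weights: one distribution over variables per (sample, lag).
w = rng.random((n_samples, n_lags, n_vars))
w /= w.sum(axis=-1, keepdims=True)

# Attention scores: one distribution over lag orders per sample.
a = rng.random((n_samples, n_lags))
a /= a.sum(axis=-1, keepdims=True)

# Variable importance: average each variable's weight over all test samples
# and all lag orders, as in the averaging around Equation (17).
var_importance = w.mean(axis=(0, 1))

# Lag importance: average the attention per lag order over all test samples,
# analogous to aggregating the attention scores of Equation (21).
lag_importance = a.mean(axis=0)
```

Because each per-step distribution is normalized, both aggregated importance vectors again sum to one, which makes them directly comparable across markets and parts.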
Differences are observed in the importance of the feature variables among the carbon price sample forecasts. In the Beijing and Shenzhen experiments, the carbon price itself has the highest importance (more than 0.7 and 0.6, respectively), while the other feature variables account for only a small part, indicating that in both carbon markets, the carbon price itself makes the most significant contribution to predicting the short-term part. In the remaining three carbon markets, other variables also show considerable importance in addition to the carbon price itself. The main focus is on energy prices (Brent, WTI, natural gas, coal), so changes in energy prices have a considerable impact on carbon prices, which is consistent with studies finding that energy prices play an important role in explaining the volatility of carbon prices.49 In addition to energy prices, the HS300 index, cement prices, and trading volumes also contribute considerable importance in these three carbon markets.
For the low- and medium-frequency parts of the forecast, the carbon price itself accounts for the entire importance. Figure 13 shows only the importance of the Part I feature variables because the models for the other two parts assign all importance weights to the carbon price itself. Parts II and III represent the mid- and long-term fluctuations and trends of carbon prices. The number of lag steps chosen for this study is 10, which makes capturing the mid- and long-term dependencies between carbon prices and other variables difficult. In addition, the medium- and long-term trends of carbon prices are mainly affected by major events and long-term supply and demand,50 which are mainly generated within the carbon market. The importance weights are likewise allocated to the carbon price in the experiments using a single TFT. Compared with Parts II and III, Part I has a high fluctuation frequency and low regularity, so a single model fails to learn the multilevel features in the carbon price sequence. Therefore, decomposition techniques are effective for forecasting carbon prices.
The importance of each lag order shows how attention changes over the temporal features. A general trend is observed in each part's forecast: the smaller the lag order, the more significant the contribution to the carbon price forecast. In the Part III predictions, except for the Shanghai carbon market, the first three lag orders occupy almost all the importance. Part III is relatively smooth and has low volatility, its behavior is very regular, and the first few lag steps contain most of the information. Carbon price predictions in Part II rely on information at larger lag orders. In Part I, the importance of the lag orders fluctuates severely and depends on a long lag time step and a wide time range; that is, the greater the volatility of the forecast series, the more information the model needs. Therefore, incorporating an attention mechanism can resolve the temporal characteristics of carbon price dependencies in the different parts.
The nonlinearity and nonstationarity of carbon prices and the influence of many external variables pose huge challenges for the accurate prediction of carbon prices. In this study, a new carbon price prediction model, MFIF–SE–TFT, is proposed, which considers not only carbon prices themselves but also the impacts of other variables on carbon prices. First, the multidimensional time series is decomposed using the advanced MFIF, which can effectively deal with the nonlinearity and nonstationarity of carbon prices. Second, to reduce the computational cost of the model, this study adopts SE to calculate the complexities of the IMFs and reconstructs the IMFs with similar complexities. Third, the reconstructed carbon price subsequences and external variable sequences are input into the TFT model to obtain the prediction result of each subsequence, and these results are aggregated to obtain the final prediction. Five carbon trading markets (Guangdong, Beijing, Shanghai, Hubei, and Shenzhen) and multiple benchmark models (BP, TCN, RNN, LSTM, GRU) are selected for the simulation experiments. According to the experimental results, the following conclusions are drawn:
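The SE-based reconstruction step in the pipeline above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `sample_entropy` is a straightforward O(n²) sample-entropy estimate, and `regroup_by_entropy` uses a simple adjacent-merge rule (threshold `tol`) as a hypothetical stand-in for the paper's criterion for grouping IMFs of similar complexity.

```python
import numpy as np

def sample_entropy(x, m=2, r_frac=0.2):
    """Sample entropy of series x: embedding dimension m,
    tolerance r = r_frac * std(x). O(n^2) reference sketch."""
    x = np.asarray(x, dtype=float)
    r = r_frac * np.std(x)
    n = len(x)

    def count_matches(dim):
        # Pairs of length-`dim` templates within Chebyshev distance r.
        templates = np.array([x[i:i + dim] for i in range(n - dim)])
        count = 0
        for i in range(len(templates)):
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(d <= r))
        return count

    b, a = count_matches(m), count_matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def regroup_by_entropy(imfs, tol=0.5):
    """Sum IMFs whose sample-entropy values lie within `tol` of their
    neighbor in sorted order (simplified SE-based reconstruction)."""
    ses = [sample_entropy(imf) for imf in imfs]
    order = np.argsort(ses)
    groups, current = [], [order[0]]
    for prev, idx in zip(order, order[1:]):
        if ses[idx] - ses[prev] <= tol:
            current.append(idx)
        else:
            groups.append(current)
            current = [idx]
    groups.append(current)
    return [np.sum([imfs[i] for i in g], axis=0) for g in groups]
```

A regular, low-complexity component (e.g., a sine) yields a much lower sample entropy than white noise, so high-frequency noisy IMFs and smooth trend IMFs naturally fall into different groups, while the regrouped subsequences still sum back to the original signal.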
TFT outperforms single BP, TCN, RNN, LSTM, and GRU models in carbon price prediction. When a single model incorporates many variable features, it not only fails to improve prediction accuracy but can also suffer a decrease in accuracy due to overfitting; thus, feature engineering methods must be used in a single model to reduce the dimensionality of the variables. TFT can adaptively learn the potential relationships between the variables and carbon prices, mine inherent features, extract effective information, and eliminate interference factors, thereby improving model efficiency and stability.
MFIF has a better decomposition effect than MEMD. Carbon prices exhibit high degrees of nonlinearity and uncertainty. After introducing the decomposition algorithm, the prediction performance of TFT is further improved; thus, decomposition technology can effectively extract the complex features in carbon price series. Compared with MEMD, MFIF has better decomposition efficiency: owing to its better algorithm design, it overcomes the defects of EMD-type methods.
External variables mainly contribute to the prediction of the high-frequency part of carbon prices. According to the importance weights that TFT assigns to external variables, in Part I, that is, the high-frequency part, external variables contribute to the carbon price predictions, among which crude oil prices play a significant role. For Parts II and III, external variables contribute almost nothing. Importance analysis of the different feature variables in carbon price forecasts can provide policymakers with valuable information.
The predictions of different subsequences have different time dependencies. According to the results of the attention mechanism, the prediction in Part III depends only on the previous day's information. As the subsequence complexity increases, Parts I and II gradually need to rely on information from longer time steps.
Future studies can first consider additional lag time steps. The number of lag steps chosen in this research is 10. If a larger number of time steps is used, then the long-term dependence of the carbon price on itself and other variables can be captured. At the same time, as the number of time steps increases, more external variables, such as policy factors and seasonal periodicity, can be considered. Second, the weights of the different variables learned by TFT are generated by the model; whether a deeper economic explanation for them exists is worth investigating. In addition, more multivariate decomposition techniques can be considered for carbon price forecasting.
ACKNOWLEDGMENTS
This research was funded by the Project of Sichuan Oil and Natural Gas Development Research Center (Grant No. SKB20-06), Strategic Research and Consulting Project of the Chinese Academy of Engineering (2022-28-33), Major Project of Sichuan Philosophy and Social Science Planning Research (Grant No. SC21ZDZT010), Key Project of Chengdu Water Ecological Civilization Construction Research Key Base (Grant No. SST2021-2022-03), Key Project of Mineral Resources Research Center in Sichuan Province (Grant No. SCKCZY2021-ZD002), Key Project of Chengdu Park City Demonstration Zone Construction Research Center (Grant No. GYCS2021-ZD001), General Project of Research Center for Science and Technology Innovation and New Economy in Chengdu-Chongqing Economic Circle (Grant No. CYCX2021YB08), Key Project of Sichuan Leisure Sports Industry Development and Research Center (Grant No. XXTYCY2021A01), Social Science Research of Sichuan Province for the 14th Five-year Plan (Grant No. SC21B007), Philosophy and Social Science Research Foundation of the Chengdu University of Technology (Grant No. YJ2021-YB002), and General Project of Research Center for Sichuan Disaster Economy (Grant No. ZHJJ2021-YB001).
CONFLICT OF INTEREST
The authors declare no conflict of interest.
© 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
The accurate forecasts of carbon prices can help policymakers and enterprises further understand the laws of carbon price fluctuations and formulate related policies and investment strategies. Many carbon price prediction models have been proposed. However, some models ignore the time–frequency relationship when considering exogenous variables and fail to measure their importance to the forecasting results, leading to unsatisfactory results. Therefore, this study proposes a novel hybrid model for carbon price forecasting on the basis of advanced multidimensional time series decomposition techniques and interpretable multifactor models. In the proposed model, multivariate fast iterative filtering is used to decompose the carbon price and its exogenous variable sequences into several intrinsic mode functions, which can overcome the nonlinearity and nonstationarity of carbon prices and capture their intrinsic characteristics. Meanwhile, the temporal fusion transformer (TFT) is used to interpret the predictions for multivariate time series. TFT is a new attention-based deep learning model combining high-performance multihorizon prediction and interpretability and can adaptively select the optimal features for carbon price prediction. Five carbon markets in Guangdong, Beijing, Shanghai, Hubei, and Shenzhen are selected for the experimental studies. Empirical results indicate that the proposed model outperforms the compared benchmark models in all performance metrics. In the interpretable output of TFT, the prediction of the high-frequency part requires the participation of exogenous variables and has a long time dependence; for the middle- and low-frequency parts, using only the carbon price itself and a short time step can lead to good results. This finding can inform future research on carbon price forecasting and help policymakers.