The carbon trading system (ETS) is a crucial tool for addressing global climate change and warming. An ETS can effectively promote enterprises' green transformation to achieve emission reduction and is widely used in many countries and regions.1 Carbon prices greatly influence the decisions of enterprises and policymakers. A carbon price that is too high increases enterprises' operating costs and hinders their transformation, whereas a carbon price that is too low cannot drive emission reduction. Therefore, accurate carbon price forecasts help establish a long-term, stable, and efficient carbon market. With such forecasts, enterprises can formulate reasonable emission reduction and investment strategies to preserve and increase the value of their carbon assets and avoid investment risks, and policymakers can better understand the price fluctuation laws of the carbon market and establish an effective carbon price stability mechanism.2 However, the carbon market is a policy-driven market shaped by heterogeneous internal market mechanisms and external environmental factors,3 which make carbon price fluctuations nonlinear, nonstationary, and complex. Accurate prediction of carbon prices has therefore become a popular research topic.
In recent years, many data-driven carbon price forecasting models have been proposed, mainly divided into three categories: econometric, artificial intelligence (AI), and hybrid models. Table 1 lists representative studies of three types of carbon price-prediction models.
Table 1 Summary of selected carbon price forecasting studies
Classification | Application field | Input variables | Exogenous variable treatment | Decomposition method | Predictive model |
Econometric models | European Union Emission Trading System (EU ETS) price | Brent oil, oil, coal, natural gas, electricity | - | - | GARCH, EGARCH, TGARCH, GJR-GARCH4 |
 | European Union Allowance (EUA) price | Policy variable, future economic outlook, current economic activity | - | - | FIAPGARCH, APGARCH5 |
Artificial intelligence models | Shenzhen carbon price | Coal, temperature, air quality index | - | - | Combined mixed-data sampling regression model and back-propagation neural network6 |
 | EU ETS price | - | - | - | Combination of autoregressive integrated moving average (ARIMA) and least squares support vector machine (LSSVM)7 |
 | EU ETS price | - | - | - | Phase-space reconstruction and multilayer perceptron (MLP) neural network8 |
 | EUA price | Dow Jones Euro Stoxx 50 Index, Brent oil, Henry Hub natural gas futures price, Australian BJ thermal coal spot price, Australian Newcastle thermal coal spot price | Concatenation and direct input | - | Temporal convolutional network (TCN)9 |
 | EUA price | Online carbon market news | Concatenation and direct input | - | Long short-term memory network (LSTM)10 |
 | Hubei and Guangdong carbon prices | Brent oil, NYMEX natural gas, Newcastle coal | Concatenation and direct input | - | LSTM11 |
Hybrid models | Beijing, Shanghai, Guangdong, Shenzhen, Hubei, Tianjin, Chongqing, Fujian carbon prices | - | - | Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) | Extreme gradient boosting (XGBoost), random forest (RF), support vector machine (SVM), radial basis function neural network (RBFNN)12 |
 | Beijing, Guangdong, Hubei carbon prices | - | - | Adaptive variational mode decomposition (AVMD) | Extreme learning machine (ELM)13 |
 | Beijing, Shanghai, Guangdong, Shenzhen, Hubei, Tianjin, Chongqing, Fujian carbon prices | - | - | Complementary ensemble empirical mode decomposition (CEEMD) | LSTM14 |
 | Guangdong, Hubei, Shanghai carbon prices | - | - | Ensemble empirical mode decomposition (EEMD) | Wavelet least squares support vector machine (wLSSVM)15 |
 | Hubei, Shenzhen carbon prices | Energy factor, economic factor, international carbon price, environmental factor | Max-relevance min-redundancy | Improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) | Kernel-based extreme learning machine16 |
 | Beijing, Shanghai, Guangdong, Shenzhen, Hubei, Tianjin, Chongqing carbon prices | Similar products, energy structure, economic factors, environmental factors | Random forest (RF) and stacked autoencoder (SAE) | Variational mode decomposition (VMD) | Bidirectional long short-term memory (BiLSTM)17 |
 | EUA price | Brent oil, European ARA port power coal, IPE natural gas, S&P Clean Energy Index, Stoxx 50 Index, CAC 40 Index, DAX Index, FTSE 100 Index, S&P 500, Commodity Research Bureau Futures Index, certified emission reductions | Least absolute shrinkage and selection operator (LASSO) | Hodrick–Prescott filter | ELM18 |
 | Hubei carbon price | Coal price, oil price, natural gas price, electricity price, Baidu index, social media sentiment | Concatenation and direct input | Discrete wavelet transform (DWT), singular spectrum analysis (SSA), EMD, VMD | Holt's exponential smoothing (HOLT), support vector regression (SVR), back-propagation neural network (BPNN), ARIMA19 |
 | Hubei, Shenzhen, Beijing carbon prices | International carbon price, energy prices, exchange rate, macroeconomics, temperature change | Factor analysis | EMD | LSSVM20 |
Econometric models were the first used in carbon price forecasting, including autoregressive moving average (ARMA) and generalized autoregressive conditional heteroskedasticity (GARCH) models. Byun and Cho4 used GARCH-type models to predict European carbon price volatility and found that they outperform other models. Conrad et al.5 employed the FIAPGARCH model to predict carbon prices in the first and second phases of the EU ETS and showed that FIAPGARCH captures the heteroscedasticity and long memory of carbon price fluctuations well. However, econometric models rest on assumptions of linearity and stationarity; capturing the nonlinear and nonstationary characteristics of carbon prices and achieving high prediction accuracy with them are therefore challenging.
With the rapid development of AI, many AI models have been applied to carbon price forecasting. Han et al.6 used a backpropagation neural network (BPNN) to predict the weekly carbon market price in Shenzhen, China, and reported a 30%–40% improvement in prediction accuracy over the benchmark model. Zhu and Wei7 adopted the least squares support vector machine (LSSVM) to forecast European carbon prices and found that its prediction accuracy is significantly better than that of ARIMA. Fan et al.8 used a multilayer perceptron to forecast carbon prices and verified the forecast validity. Zhang and Wen9 proposed a new carbon price prediction model based on a temporal convolutional network (TCN) Seq2Seq architecture and argued that TCN is suitable for learning from small-sample carbon price data sets, outperforming traditional statistical prediction models. AI-based prediction models, such as machine learning and deep learning methods, exhibit strong data adaptability and feature extraction capabilities12 and handle the nonlinear and nonstationary nature of carbon prices well. In previous studies, AI models have consistently outperformed econometric models.
However, a single model has certain limitations, and achieving ideal prediction accuracy with one is difficult.21 To further improve forecasting accuracy, decomposition–integration-based hybrid carbon price forecasting models have been proposed. The central idea is to first decompose the carbon price into several subsequences with a decomposition algorithm, then forecast these subsequences separately, and finally integrate the forecast results. On the one hand, decomposing the original sequence effectively reduces the influence of noise on prediction; on the other hand, it captures the internal characteristics of the carbon price time series. Sun and Xu13 used adaptive variational mode decomposition and extreme learning machines to predict the Beijing, Guangdong, and Hubei carbon markets and gathered evidence of good performance. Sun and Li14 combined complementary ensemble empirical mode decomposition and the long short-term memory network (LSTM) to predict eight carbon market prices in China; the experimental results showed good stability and applicability. Sun and Xu15 proposed a carbon price prediction model combining ensemble empirical mode decomposition with an improved wavelet least squares support vector machine; the results indicated improved root mean square error (RMSE) relative to comparable models. Decomposition technology can thus further improve the prediction accuracy of carbon prices. However, the carbon market is a complex nonlinear system, and the carbon price is affected by many factors, such as energy, economic, and weather conditions;22 predicting it from historical carbon price information alone is insufficient. Therefore, some scholars have considered the effects of exogenous variables in carbon price forecasting.
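The decomposition–integration workflow described above can be sketched as follows. The moving-average trend/residual split and the persistence predictor are illustrative stand-ins for the decomposition algorithms (EMD, VMD, etc.) and forecasting models discussed in this section; all names and data are hypothetical:

```python
import numpy as np

def toy_decompose(x, window=5):
    """Split a series into a smooth trend and a residual.
    A stand-in for EMD/VMD-style decomposition; the split is lossless."""
    kernel = np.ones(window) / window
    trend = np.convolve(x, kernel, mode="same")
    return [trend, x - trend]

def persistence_forecast(sub, horizon=3):
    """Naive per-subsequence predictor (stand-in for LSTM, LSSVM, etc.)."""
    return np.full(horizon, sub[-1])

rng = np.random.default_rng(0)
price = np.cumsum(rng.normal(0, 0.5, 300)) + 30.0   # synthetic "carbon price"

subsequences = toy_decompose(price)
# Forecast each subsequence separately, then integrate by summation.
final_forecast = sum(persistence_forecast(s) for s in subsequences)
```

Because the decomposition is lossless, the subsequences sum back to the original series, and the integrated forecast is simply the sum of the per-subsequence forecasts.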
Hao and Tian16 proposed a hybrid carbon price prediction model that comprehensively considers energy, economic, international carbon price, and environmental factors, using max-relevance min-redundancy to determine the input features; they found that these external variables significantly improve the prediction accuracy of carbon prices. Xu et al.17 used two-stage feature reconstruction to rebuild and reduce the external factors affecting carbon prices and combined VMD with a bidirectional long short-term memory network (BiLSTM) to build a carbon price prediction model. Zhao et al.18 combined the Hodrick–Prescott filter, extreme learning machines, and feature selection to select multidimensional carbon price features; their experiments show that these exogenous variables are helpful in carbon price forecasting. Li et al.11 considered the influence of oil, natural gas, and coal on carbon prices, used LSTM for prediction, and experimentally verified the importance of these factors. Zhang and Xia10 incorporated online news into an LSTM, considering its impact on European Union Allowance price forecasting. Sun and Wang20 reduced the dimensionality of the selected exogenous variables by factor analysis and combined EMD with the least squares support vector machine to predict carbon prices. Wang et al.19 forecast carbon prices based on multisource information fusion (MSIF), hybrid multiscale decomposition (HMSD), and a combination forecasting method (CFM). These experimental results reveal that decomposition methods and the consideration of exogenous variables can substantially improve prediction accuracy.
As Table 1 shows for the hybrid forecasting methodology, EMD, discrete wavelet transform (DWT), and variational mode decomposition (VMD) are the commonly used decomposition methods, but each has defects. EMD is an empirically based decomposition method without strict mathematical derivation; guaranteeing its a priori convergence is difficult,23 and problems such as mode aliasing and over-decomposition occur during the decomposition process.24 Wavelet transform and VMD are not adaptive algorithms: the number of decomposed subsequences must be preset, and the wavelet transform additionally requires selecting a wavelet basis function. Moreover, when considering exogenous variables, existing studies mainly input them directly or after dimensionality reduction, so how these variables contribute to the carbon price forecast, that is, their importance for prediction, cannot be determined; in addition, usually only the carbon price itself is decomposed, ignoring the intrinsic connection between exogenous variables and carbon prices at different timescales. Compared with previous studies, our research proposes a new carbon price prediction model that combines multivariate fast iterative filtering (MFIF), sample entropy (SE), and the temporal fusion transformer (TFT) (MFIF–SE–TFT). First, the original multivariate time series, composed of the carbon price and its exogenous variables, is decomposed by MFIF. Second, the subsequences produced by MFIF are reconstructed using SE to further characterize their features. Finally, the reconstructed subsequences are predicted and integrated using TFT, which simultaneously produces interpretable results for different variables and temporal features. The innovations and contributions of this study are as follows:
This study uses TFT for carbon price forecasting for the first time. When considering multiple carbon price variables, most studies input them directly after dimension reduction, which not only fails to provide interpretable results but can also reduce prediction accuracy, increase computational cost, and cause overfitting. Compared with other deep learning models (such as LSTM and convolutional neural networks [CNNs]), TFT provides an end-to-end learning framework that adaptively learns which variables are important for carbon price prediction and analyzes persistent temporal patterns through attention over different lag orders, giving it higher stability and robustness to multivariate inputs.
A new carbon price prediction model, MFIF–SE–TFT, is proposed. MFIF has a strict mathematical derivation that guarantees its a priori convergence and effectively avoids the mode-mixing phenomenon of EMD, and, unlike DWT and VMD, it is an adaptive decomposition algorithm; MFIF can therefore achieve better decompositions. MFIF also ensures that every exogenous variable is decomposed into the same number of subsequences and that corresponding subsequences have similar time-frequency characteristics. In the hybrid model, MFIF is thus applied to decompose the multivariate carbon price time series and extract its linear and nonlinear characteristics; SE is used to reconstruct the subsequence features, thereby reducing cumulative prediction error and time cost; and TFT predicts and integrates the subsequences to obtain the final result. The experimental results show that the model generally outperforms the benchmark models, verifying its stability and reliability.
This study evaluates the importance of each exogenous variable to the forecasts of different subsequences through TFT and identifies the important variables in carbon price forecasting and their time dependencies. The results provide decision-makers with reliable carbon price forecast analysis and decision support.
The remaining sections of this paper are as follows: Sections 2 and 3 mainly introduce the basic theory and framework of our proposed model. Section 4 explains the experimental results and comparative analysis. Section 5 presents the conclusion and expounds on future work.
METHODS
This section introduces related techniques and methods, including MFIF, SE, and TFT. The construction of the whole model is also described.
MFIF
Cicone25 proposed fast iterative filtering (FIF), which quickly realizes the iterative filtering (IF) computation. IF, an alternative to the EMD family, is an adaptive, local iterative decomposition method. It decomposes a nonlinear and nonstationary time series S(t) into several intrinsic mode functions (IMFs) with similar oscillatory components, ordered from high frequency to low frequency, plus a residual term,26 and is widely used in the natural sciences.27–30 The essential difference between IF and EMD lies in the computation of the local mean: EMD computes the local mean of the sequence from the upper and lower envelopes determined by cubic splines, and its convergence and iteration-termination conditions are difficult to prove; IF computes the local mean by convolving the sequence with a preselected filter function, and its convergence and termination conditions have been strictly proven mathematically, guaranteeing a priori convergence and stability.26 Both EMD and IF generate IMFs through the iteration
$$M_n(S_n) = S_n - L_n(S_n), \qquad \mathrm{IMF} = \lim_{n \to \infty} M_n(S_n), \tag{1}$$
where $S_{n+1} = M_n(S_n)$, $S_1 = S$ is the original sequence, and $L_n(S_n)$ is the local mean of $S_n$. In EMD,
$$L_n(S_n) = \frac{E_U(S_n) + E_L(S_n)}{2}. \tag{2}$$
In Equation (2), $E_U$ is the upper envelope of the sequence and $E_L$ is the lower envelope. In IF,
$$L_n(S_n)(x) = \int_{-L}^{L} S_n(x + t)\, w(t)\, dt. \tag{3}$$
In Equation (3), $w(x)$ is a filter function and $L$ is the mask length, determined by the sequence length and the number of extreme points in the sequence.31 In FIF, the discrete Fourier transform (DFT) and inverse discrete Fourier transform (IDFT) are computed via the fast Fourier transform, and IF is then evaluated quickly as
$$\mathrm{IMF} = \mathrm{IDFT}\!\left[\big(I - \mathrm{diag}(\mathrm{DFT}(w))\big)^{N_0}\, \mathrm{DFT}(S)\right], \tag{4}$$
where $I$ is the identity matrix, $\mathrm{diag}(\cdot)$ forms a diagonal matrix, and $N_0$ is the number of iterations IF performs to compute an IMF. However, FIF cannot decompose multiple time series simultaneously; it processes only one series at a time, so the numbers of IMFs obtained from different variables may be unequal, which splits the internal relationships among the variables to a certain extent. Therefore, this study uses MFIF,32 which yields the same number of IMFs for every variable and aligns the IMFs of different variables on the time scale. Let $s = [v_1, v_2, \ldots, v_t]$ be a multivariate time series with $v_t = [v_i(t)]_{i=1,\ldots,n}$, where $n$ is the dimension of the series. The angle $\theta(t)$ by which the multidimensional vector rotates over time is
$$\theta(t) = \arccos\!\left(\frac{\langle v_t,\, v_{t+1} \rangle}{\lVert v_t \rVert\, \lVert v_{t+1} \rVert}\right). \tag{5}$$
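The Fourier-domain FIF computation above can be sketched in NumPy. The two-tone test signal, the triangular (Bartlett) filter, and the iteration count are illustrative assumptions, not values from the paper; raising the filtered spectrum to the power N0 acts as a high-pass operator that extracts the first (fastest) IMF:

```python
import numpy as np

def fif_imf(s, w, n0):
    """One FIF inner loop evaluated in the Fourier domain: the iterated
    local-mean subtraction (I - filter)^n0 applied to the signal spectrum.
    `w` must hold the filter taps arranged circularly (center at index 0)."""
    s_hat = np.fft.fft(s)
    w_hat = np.fft.fft(w)   # spectrum of the (symmetric) filter
    return np.real(np.fft.ifft((1.0 - w_hat) ** n0 * s_hat))

# Illustrative signal: a fast oscillation riding on a slow one.
n = 256
t = np.arange(n) / n
s = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 2 * t)

# Triangular filter of half-length L = 10, normalized and circularly centered.
L = 10
taps = np.bartlett(2 * L + 1)
taps /= taps.sum()
w = np.zeros(n)
w[: L + 1] = taps[L:]
w[-L:] = taps[:L]

imf1 = fif_imf(s, w, n0=20)   # approximately the fast 50-cycle component
```

The slow component is almost entirely attenuated because its filter response is close to 1, so (1 - response)^20 is near zero there, while the fast component passes nearly unchanged.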
According to θ(t), the calculation process of MFIF is shown in Table 2.
Table 2 Calculation process of MFIF
Algorithm: MFIF
IMF = {}
compute the rotation angle θ(t)
while the number of extrema of θ(t) ≥ 2:
    compute the filter length L of the filter function w
    set N0 = 0
    while the stopping criterion is not satisfied:
        for i = 1 to n:
            …
        end for
    end while
end while
When a decomposition algorithm is applied, the obtained IMFs are usually affected by the boundary effect; that is, spurious peaks appear at the boundaries of the IMFs. The boundary effect degrades the decomposition quality of each IMF and thus the overall prediction accuracy. In this study, the boundary-effect treatment proposed by Stallone et al.33 is adopted: the sequence is symmetrically extended before decomposition to eliminate the boundary effect. The processing steps are as follows:
Subtract the mean m of the original sequence s(t).
Symmetrically extend the sequence s(t) − m to both ends of the original sequence; the generated extended sequence sext(t) is v times the length of the original sequence.
Multiply the extended sequence sext(t) by a characteristic function λ, which equals 1 on the interval corresponding to the original sequence s(t) and smoothly approaches 0 at the new extended boundaries.
Finally, add the mean m of the original signal back to obtain the final processed sequence.
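The four preprocessing steps above can be sketched as follows. The cosine taper used for the characteristic function and the default extension factor v = 2 are our illustrative choices, not specifics from the paper:

```python
import numpy as np

def extend_for_decomposition(s, v=2.0):
    """Sketch of the four boundary-handling steps: demean, symmetric
    extension to v times the original length, taper with a characteristic
    function, and add the mean back. Returns the extended series and the
    pad length; the original span is ext[pad : pad + len(s)]."""
    s = np.asarray(s, dtype=float)
    m = s.mean()                          # step 1: subtract the mean
    d = s - m
    pad = int((v - 1.0) * len(s) / 2)     # step 2: symmetric extension
    ext = np.concatenate([d[pad - 1::-1], d, d[: -pad - 1 : -1]])
    lam = np.ones(len(ext))               # step 3: characteristic function
    ramp = 0.5 * (1.0 - np.cos(np.linspace(0.0, np.pi, pad)))
    lam[:pad] = ramp                      # rises smoothly from 0 to 1
    lam[-pad:] = ramp[::-1]               # falls smoothly from 1 to 0
    return lam * ext + m, pad             # step 4: add the mean back

s = np.sin(np.linspace(0, 6, 100)) + 2.0
ext, pad = extend_for_decomposition(s)
```

The original samples are untouched in the core of the extended series, while the padded ends decay smoothly toward the mean, suppressing spurious boundary peaks in the subsequent decomposition.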
SE
SE34 is often used to measure time series complexity. The computation resembles a template-matching search over the entire input signal; its main parameters, the embedding dimension m and the tolerance r, control the length of each search segment (template) and the similarity threshold among segments, respectively. The calculation proceeds as follows:
Step 1: Construct the embedded sequences $x_m(i)$ ($i = 1, 2, \ldots, n - m + 1$) from the original sequence $x(i)$ ($i = 1, 2, 3, \ldots, n$):
$$x_m(i) = \{x(i), x(i+1), \ldots, x(i+m-1)\}. \tag{6}$$
Step 2: Calculate the distance $d_m$ between $x_m(i)$ and $x_m(j)$ as the Chebyshev distance:
$$d_m[x_m(i), x_m(j)] = \max_{k = 0, \ldots, m-1} \lvert x(i+k) - x(j+k) \rvert. \tag{7}$$
Step 3: Calculate the matching probabilities. $A^m(r)$ is the probability that two sequences match at $m + 1$ points, and $B^m(r)$ is the probability that two sequences match at $m$ points. Let $v_m$ be the number of pairs with $d_m[x_m(i), x_m(j)] \le r$, $i \ne j$, and $w_{m+1}$ the number of pairs with $d_{m+1}[x_{m+1}(i), x_{m+1}(j)] \le r$, $i \ne j$:
$$B^m(r) = \frac{v_m}{(n - m)(n - m - 1)}, \tag{8}$$
$$A^m(r) = \frac{w_{m+1}}{(n - m)(n - m - 1)}. \tag{9}$$
Step 4: Calculate the value of SE, SampEn(m, r):
$$\mathrm{SampEn}(m, r) = \lim_{n \to \infty}\left\{-\ln\!\left[\frac{A^m(r)}{B^m(r)}\right]\right\}. \tag{10}$$
When n is finite, SE can be expressed as
$$\mathrm{SampEn}(m, r, n) = -\ln\!\left[\frac{A^m(r)}{B^m(r)}\right]. \tag{11}$$
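The SE computation above can be sketched in NumPy. The default tolerance r = 0.2 · std(x) is a common rule of thumb and our assumption, not a value fixed by this paper:

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Finite-n sample entropy SampEn(m, r, n) of a 1-D series (sketch).
    r defaults to 0.2 * std(x), a common rule of thumb (our assumption)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * x.std()

    def match_count(mm):
        # Embedded template vectors x_mm(i) as rows.
        emb = np.array([x[i : i + mm] for i in range(n - mm + 1)])
        # Chebyshev distances between all pairs of templates.
        d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=-1)
        # Number of ordered pairs i != j with distance <= r.
        return np.sum(d <= r) - len(emb)

    v_m = match_count(m)        # m-point matches
    w_m1 = match_count(m + 1)   # (m+1)-point matches
    # Ratio of match counts; the normalization constants nearly cancel.
    return -np.log(w_m1 / v_m)

# A regular sine wave is less complex than white noise.
rng = np.random.default_rng(0)
se_sine = sample_entropy(np.sin(np.linspace(0, 20 * np.pi, 400)))
se_noise = sample_entropy(rng.normal(size=400))
```

In the MFIF–SE–TFT context, IMFs whose SE values are close would be grouped for reconstruction.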
TFT
TFT, proposed by the Google Cloud AI team, is a multihorizon time series prediction model based on the attention mechanism and deep neural networks; it has been applied in many fields with excellent results.35,36 Compared with other neural networks, such as LSTM and CNN, TFT offers excellent interpretability and can help reveal the internal relationships between input features and prediction targets, while also performing strongly in time series prediction. Figure 1 illustrates the main structure of TFT, which comprises five main components. Gating mechanisms minimize the contributions of irrelevant variables. Variable selection networks learn the importance of different variables to the prediction at each time step. Static covariate encoders integrate static features. In temporal processing, the temporal self-attention decoder learns long- and short-term temporal dependencies in the data. The prediction interval component constructs a forecast range for the target value through quantile forecasting. We describe the gating mechanisms, variable selection networks, and temporal processing in detail. The static covariate encoder and prediction interval blocks are not used here because no static information related to carbon prices is available and this study focuses on deterministic forecasting; a detailed description of these two blocks is given in Lim et al.36
Gating mechanisms
To model the nonlinear relationship between exogenous variables and targets, TFT adopts the gated residual network (GRN). The GRN receives a primary input a and an optional context vector c and is computed as
$$\mathrm{GRN}_\omega(a, c) = \mathrm{LayerNorm}\big(a + \mathrm{GLU}_\omega(\eta_1)\big), \tag{12}$$
$$\eta_1 = W_{1,\omega}\, \eta_2 + b_{1,\omega}, \tag{13}$$
$$\eta_2 = \mathrm{ELU}\big(W_{2,\omega}\, a + W_{3,\omega}\, c + b_{2,\omega}\big). \tag{14}$$
In the above formulas, ELU is the exponential linear unit activation function, $\eta_1$ and $\eta_2$ are intermediate layers, LayerNorm is standard layer normalization, and $\omega$ indexes shared weights. Component gating layers based on gated linear units (GLUs) provide the flexibility to suppress any part of the architecture that is unnecessary for a given data set. The GLU takes the form
$$\mathrm{GLU}_\omega(\gamma) = \sigma\big(W_{4,\omega}\, \gamma + b_{4,\omega}\big) \odot \big(W_{5,\omega}\, \gamma + b_{5,\omega}\big), \tag{15}$$
where $\gamma \in \mathbb{R}^{d_{model}}$ is the given input, $\sigma(\cdot)$ is the sigmoid activation function, $W_{(\cdot)} \in \mathbb{R}^{d_{model} \times d_{model}}$ are weight matrices, $b_{(\cdot)} \in \mathbb{R}^{d_{model}}$ are bias vectors, $d_{model}$ is the hidden layer size, and $\odot$ denotes the Hadamard product. The GLU lets TFT control the degree to which the GRN transforms the original input a: to suppress the nonlinear contribution, the GLU output can approach 0, effectively skipping the layer entirely. If no context vector c exists, c is treated as a zero vector.
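The GRN and GLU definitions above can be sketched in NumPy. The randomly initialized weights stand in for learned parameters, and the hidden size is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size d_model (illustrative)

def elu(z):
    """Exponential linear unit."""
    return np.where(z > 0, z, np.exp(z) - 1)

def layer_norm(z, eps=1e-6):
    """Standard layer normalization over the feature dimension."""
    return (z - z.mean()) / (z.std() + eps)

# Random stand-ins for the learned weight matrices; biases set to zero.
W1, W2, W3 = (rng.normal(size=(d, d)) for _ in range(3))
W4, W5 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
b1 = b2 = b4 = b5 = np.zeros(d)

def glu(g):
    """GLU(g) = sigmoid(W4 g + b4) * (W5 g + b5), elementwise."""
    return (1.0 / (1.0 + np.exp(-(W4 @ g + b4)))) * (W5 @ g + b5)

def grn(a, c=None):
    """GRN(a, c) = LayerNorm(a + GLU(eta1)); absent c is a zero vector."""
    if c is None:
        c = np.zeros_like(a)
    eta2 = elu(W2 @ a + W3 @ c + b2)
    eta1 = W1 @ eta2 + b1
    return layer_norm(a + glu(eta1))

out = grn(rng.normal(size=d))
```

The residual path (a added back before normalization) is what allows the block to fall back to a near-identity mapping when the gate closes.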
Variable selection networks
Multiple feature variables are used in prediction, but their relationships with, and specific contributions to, the targets are unknown. The variable selection layer in TFT is designed to select the variables most important for prediction and to exclude extraneous noise features that can degrade model performance. Let $\Xi_t$ denote the flattened input at time t and $\xi_t^{(j)}$ the transformed input of the jth variable. Feeding $\Xi_t$ and an external context variable $c_s$ into a GRN and then a Softmax layer generates the variable selection weights
$$v_{\chi t} = \mathrm{Softmax}\big(\mathrm{GRN}_{v_\chi}(\Xi_t, c_s)\big). \tag{16}$$
At each time step, each $\xi_t^{(j)}$ is transformed nonlinearly by its own GRN, and the processed features are then weighted by the variable selection weights:
$$\tilde{\xi}_t^{(j)} = \mathrm{GRN}_{\tilde{\xi}(j)}\big(\xi_t^{(j)}\big), \tag{17}$$
$$\tilde{\xi}_t = \sum_{j=1}^{m_\chi} v_{\chi t}^{(j)}\, \tilde{\xi}_t^{(j)}. \tag{18}$$
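The variable selection mechanism can be sketched as follows; the GRNs are replaced by random linear maps purely for illustration, and all sizes are assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
m_vars, d = 5, 4   # number of input variables, embedding size (illustrative)

# xi[j] is the transformed input of variable j at one time step.
xi = rng.normal(size=(m_vars, d))

# Stand-in for the flat-input GRN: a linear map followed by Softmax
# producing one selection weight per variable.
W_sel = rng.normal(size=(m_vars, m_vars * d))
v = softmax(W_sel @ xi.reshape(-1))            # variable selection weights

# Stand-ins for the per-variable GRNs: one linear map per variable.
W_var = rng.normal(size=(m_vars, d, d))
xi_tilde = np.einsum('jde,je->jd', W_var, xi)  # per-variable processing

# Weighted combination: the selected feature vector at this time step.
combined = (v[:, None] * xi_tilde).sum(axis=0)
```

Because the weights v are a Softmax output, they sum to one and can be read directly as per-variable importances, which is the basis of TFT's variable-level interpretability.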
Multihead attention
TFT improves on the multihead attention mechanism, employing self-attention to learn long-term dependencies across different time steps. Given a query matrix $Q$, a key matrix $K$, and a value matrix $V$, attention is computed as
$$\mathrm{Attention}(Q, K, V) = A(Q, K)\, V. \tag{19}$$
Here, N is the number of time steps input into the attention layer and $A(\cdot)$ is a normalization function; the attention weights are computed by scaled dot-products:
$$A(Q, K) = \mathrm{Softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_{attn}}}\right). \tag{20}$$
To improve on the learning ability of a single attention head, TFT uses multihead attention, with different heads attending to different representation subspaces:
$$\mathrm{MultiHead}(Q, K, V) = [H_1, \ldots, H_{m_H}]\, W_H, \tag{21}$$
$$H_h = \mathrm{Attention}\big(Q W_Q^{(h)},\, K W_K^{(h)},\, V W_V^{(h)}\big). \tag{22}$$
Here, $W_Q^{(h)}$, $W_K^{(h)}$, and $W_V^{(h)}$ are the query, key, and value weight matrices of head h, and $W_H$ linearly combines the outputs of all heads.
Because each head uses different value weights, attention weights alone cannot indicate feature importance. Multihead attention is therefore modified so that all heads share a single value weight and the head outputs are additively aggregated:
$$\mathrm{InterpretableMultiHead}(Q, K, V) = \tilde{H}\, W_H, \tag{23}$$
$$\tilde{H} = \tilde{A}(Q, K)\, V W_V, \tag{24}$$
$$\tilde{A}(Q, K) = \frac{1}{m_H} \sum_{h=1}^{m_H} A\big(Q W_Q^{(h)},\, K W_K^{(h)}\big), \tag{25}$$
$$\tilde{H} = \frac{1}{m_H} \sum_{h=1}^{m_H} \mathrm{Attention}\big(Q W_Q^{(h)},\, K W_K^{(h)},\, V W_V\big). \tag{26}$$
Here, $W_V$ is the value weight shared by all heads and $W_H$ is the final linear mapping. By changing how the multihead attention weights are generated, each head can still learn different temporal patterns, effectively improving expressiveness, while simple interpretability analyses can be conducted on the single averaged set of attention weights.
Temporal processing
In temporal processing, TFT first uses an LSTM encoder–decoder to generate uniform temporal features, denoted $\phi(t, n)$, where n is the position index; a gated skip connection is employed in this layer:
$$\tilde{\phi}(t, n) = \mathrm{LayerNorm}\big(\tilde{\xi}_{t+n} + \mathrm{GLU}_{\tilde{\phi}}(\phi(t, n))\big). \tag{27}$$
A static enrichment layer is then introduced to enhance the temporal features:
$$\theta(t, n) = \mathrm{GRN}_\theta\big(\tilde{\phi}(t, n),\, c_e\big), \tag{28}$$
where $c_e$ is the encoded context vector. Self-attention is added after the static enrichment layer. All temporal features are combined into a matrix $\Theta(t)$, and the multihead attention of Section 2.3.3 is applied at each time step:
$$B(t) = \mathrm{InterpretableMultiHead}\big(\Theta(t), \Theta(t), \Theta(t)\big). \tag{29}$$
The self-attention mechanism allows TFT to extract long-term dependencies in the data. After self-attention, a gated skip connection is again added to simplify training:
$$\delta(t, n) = \mathrm{LayerNorm}\big(\theta(t, n) + \mathrm{GLU}_\delta(\beta(t, n))\big). \tag{30}$$
The output of self-attention is then processed nonlinearly by a GRN, similar to the static enrichment layer:
$$\psi(t, n) = \mathrm{GRN}_\psi\big(\delta(t, n)\big). \tag{31}$$
Afterward, a gated skip connection that bypasses the entire transformer block is added so that the model can adapt to the required complexity, followed by a fully connected layer that produces the predicted output:
$$\tilde{\psi}(t, n) = \mathrm{LayerNorm}\big(\tilde{\phi}(t, n) + \mathrm{GLU}_{\tilde{\psi}}(\psi(t, n))\big). \tag{32}$$
COMBINED MFIF–SE–TFT MODEL
In this study, a new hybrid carbon price forecasting model is proposed that combines the advanced multivariate data decomposition and reconstruction technique MFIF–SE, multiple influence factors, and the interpretable deep learning model TFT. Figure 2 describes the framework of the proposed method. The forecasting steps of this model are as follows:
Step 1: The advanced multivariate decomposition technique MFIF is used to decompose the multidimensional time series into several multidimensional intrinsic mode functions (IMFs). MFIF removes noise from the original sequences and extracts features at different time frequencies, which effectively improves the model's prediction accuracy. Not all series are input into MFIF during decomposition: relevant market data with time-frequency features are included, whereas nonmarket data are excluded from this process.
Step 2: Sample entropy evaluates the complexity of each IMF, and IMFs with similar complexity are reconstructed (merged) to reduce the computational burden, increase the model's inference speed, and avoid overfitting and error accumulation.
Step 3: The reconstructed multivariate subsequences are input into TFT for training. Because the numerical ranges of the different factors differ greatly, all data undergo max–min normalization before model training,37 which maps the data to [0, 1]; the model output is inversely normalized to obtain the prediction. The predictions of all subsequences are then aggregated to produce the final prediction result. TFT not only extracts the interrelations between the carbon price and other factors but also captures different temporal characteristics.
Step 4: The model performance and interpretability are analyzed. Data from China's five carbon markets (Guangdong, Beijing, Shanghai, Hubei, and Shenzhen) are used. Mean absolute error (MAE), RMSE, mean absolute percentage error (MAPE), the coefficient of determination (R2), and directional accuracy (DA) are used to evaluate the prediction results. The interpretability results include the importance ordering of the variables and the attention over different lag steps.
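The normalization and integration in Step 3 can be sketched as follows. Fitting the scaler on the training portion only (to avoid look-ahead) is our assumption of standard practice, and all numbers are illustrative:

```python
import numpy as np

def minmax_fit(x):
    """Fit [0, 1] max-min scaling parameters; here fitted on the training
    split only (our assumption) so the test set cannot leak into the scaler."""
    return float(x.min()), float(x.max())

def minmax_apply(x, lo, hi):
    return (x - lo) / (hi - lo)

def minmax_invert(y, lo, hi):
    return y * (hi - lo) + lo

rng = np.random.default_rng(3)
price = 20.0 + 5.0 * rng.random(100)     # synthetic subsequence
lo, hi = minmax_fit(price[:70])          # first 70% = training portion
scaled = minmax_apply(price, lo, hi)
restored = minmax_invert(scaled, lo, hi)

# Integration: the final forecast is the sum of the subsequence forecasts
# after inverse normalization (values below are purely illustrative).
sub_forecasts = [np.array([21.5, 21.7]), np.array([1.2, -0.8])]
final_forecast = sum(sub_forecasts)
```

Inverse normalization is exact, so scaling introduces no error into the aggregated forecast; it only conditions the inputs for training.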
DATA COLLECTION AND EVALUATION SYSTEM CONSTRUCTION
This section presents the sources of the carbon price data and their influence factors. The evaluation index system is also explained.
Data description
Carbon price data collection
In the empirical study, the closing prices of five carbon markets in China (Guangdong, Beijing, Shanghai, Hubei, and Shenzhen) are selected to test the proposed model. These carbon markets were established early enough to provide ample data for experiments. Among them, Hubei and Guangdong each account for about 30% of the trading volume of China's carbon market.38 The selected time range runs from the establishment of each carbon market to June 2, 2022, and dates with zero transaction volume are deleted. The data are obtained from the China Carbon Trading Network. Figure 3 shows each carbon price series: all are nonlinear, the fluctuation patterns are complex, and no clear regularity is observed. The fluctuation behaviors also differ across markets; for example, Guangdong's carbon price is more stable than the others, whereas Beijing and Shenzhen show great volatility. Table 3 presents the relevant statistics for each carbon price. The mean, maximum (max), median, minimum (min), and standard deviation indicate that the data fluctuate widely. The kurtosis, skewness, and Jarque–Bera test show that none of the series obeys a normal distribution. The ADF test shows that, except for Shenzhen's carbon price, the series fail the stationarity test and are thus nonstationary. As illustrated in Figure 3 and Table 3, each carbon price sequence is divided into training, validation, and test sets at a ratio of 7:1:2.
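The chronological 7:1:2 split can be sketched as follows; exact counts may differ from Table 3 by one sample depending on the rounding convention, which the paper does not specify:

```python
import numpy as np

def split_712(series):
    """Chronological 7:1:2 train/validation/test split used for each market.
    Counts may differ from Table 3 by one sample depending on rounding."""
    n = len(series)
    n_train = int(round(n * 0.7))
    n_val = int(round(n * 0.1))
    return (series[:n_train],
            series[n_train:n_train + n_val],
            series[n_train + n_val:])

x = np.arange(1755)   # e.g., the Guangdong series length from Table 3
train, val, test = split_712(x)
```

A chronological (rather than shuffled) split preserves the temporal order, which is essential for honest out-of-sample evaluation of time series models.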
Table 3 Statistics characteristics of the original data
Carbon market | Abbreviation | Size | Train samples | Validation samples | Test samples | Mean | Max | Median | Min | Std | Kurt | Skew | ADF (p) | JB (p) |
Guangdong | GD | 1755 | 1229 | 175 | 351 | 27.4223 | 95.26 | 22.71 | 8.1 | 17.2313 | 1.9684 | 1.5907 | 0.6264 | 1.0000 |
Beijing | BJ | 1250 | 875 | 125 | 250 | 60.0055 | 107.26 | 53.34 | 24 | 16.4310 | −0.5177 | 0.6650 | 0.2017 | 1.0000 |
Shanghai | SH | 1235 | 865 | 123 | 247 | 35.1317 | 63 | 38.1 | 4.2 | 10.5377 | 1.9004 | −0.8372 | 0.5567 | 1.0000 |
Hubei | HB | 1909 | 1338 | 190 | 381 | 25.8399 | 61.48 | 25.2 | 10.38 | 8.7343 | 0.4859 | 0.7406 | 0.6845 | 1.0000 |
Shenzhen | SZ | 1921 | 1345 | 192 | 384 | 32.8379 | 130.9 | 30.44 | 3.03 | 20.1615 | 0.8627 | 0.9261 | 1.0000 | 1.0000 |
According to previous studies, many factors impact carbon prices,39 producing uncertain and complex carbon price changes. Incorporating these factors into carbon price forecasts is therefore important for improving forecast accuracy. This study comprehensively considers historical carbon prices, carbon allowance trading volumes, energy prices, economic factors, international carbon prices, carbon-intensive product prices, and environmental factors.
Historical carbon prices and trading volumes
Historical carbon prices are a key feature in carbon price forecasting. Therefore, correlation analysis is carried out between the historical and predicted values of each carbon price series. Figure 3 illustrates the partial autocorrelation function (PACF) at different lag steps for each carbon price; historical prices are found to correlate strongly with predicted prices. Trading volumes not only directly reflect carbon market activity but also contain much information related to carbon market operations.
Energy prices
Changes in oil, natural gas, and coal prices lead to changes in their consumption, which, in turn, affects carbon price volatility.40 In this study, Brent and WTI crude oil prices are selected because these two major crude oil markets reflect the global crude oil market well. For natural gas, the New York Mercantile Exchange price is chosen because natural gas prices in China are subject to government price limits. For coal, the Rotterdam coal price and the domestic thermal coal and Qinhuangdao coal prices are selected.
Economic factors
Macroeconomic growth affects the demand and consumption of society as a whole, and the carbon trading market is closely tied to the broader economy. As a significant energy consumer, China is highly dependent on energy imports; therefore, changes in exchange rates affect domestic energy markets and thus carbon prices.20 This study selects the exchange rate of USD to RMB, the exchange rate of EUR to RMB, and the H&S300 index as economic factors.
International carbon price
The European Union Emissions Trading Scheme (EU-ETS) is the world's largest carbon trading market and plays a leading role in international carbon trading. Volatility in the EU-ETS may affect China's carbon trading market, and EUA futures account for over 85% of EU-ETS trading volume. Therefore, the EUA price is chosen as the international carbon price.
Carbon-intensive product prices
In addition to power companies, China's carbon markets also cover many other carbon-intensive enterprises, such as cement, chemical, and steel producers. The decisions of these enterprises are significantly affected by product and raw material prices, resulting in changes in carbon emissions and carbon prices. Therefore, this study introduces the cement price index, the iron ore price, and Chinese chemical product prices.
Environmental factors
Weather changes can affect energy consumption and CO2 emissions and thus affect carbon prices. For example, in winter, heating in northern China increases energy consumption and CO2 emissions, resulting in an increased demand for carbon allowances and a rise in carbon prices. We select the highest temperature, lowest temperature, and air quality index (AQI) in various places to measure weather factors.
The above data come from the Wind database and Yahoo Finance. The dates of the variable data are aligned with the trading dates of the selected carbon market. To ensure the integrity of the carbon price information, interpolation is used to fill in missing values in the impact factor data. To simplify the notation, S1–S18 represent, respectively, the carbon price, Brent crude oil, WTI crude oil, natural gas price, Rotterdam coal, thermal coal price, Qinhuangdao coal price, EUA, H&S300, USD/CNY, EUR/CNY, cement price index, iron ore, chemical industry index, AQI, minimum temperature, maximum temperature, and trading volume. Figure 4 displays the Pearson correlation coefficient between each factor and the carbon price in Guangdong, which reflects the strength of each factor's correlation with the carbon price to a certain extent.
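The missing-value filling and correlation screening described above can be sketched as follows (linear interpolation is assumed here; the paper does not specify the interpolation variant):

```python
import math

def interpolate_missing(values):
    """Fill None gaps by linear interpolation; ends take the nearest known value."""
    vals = list(values)
    known = [i for i, v in enumerate(vals) if v is not None]
    for i, v in enumerate(vals):
        if v is None:
            prev = max((k for k in known if k < i), default=None)
            nxt = min((k for k in known if k > i), default=None)
            if prev is None:
                vals[i] = vals[nxt]
            elif nxt is None:
                vals[i] = vals[prev]
            else:
                w = (i - prev) / (nxt - prev)
                vals[i] = vals[prev] * (1 - w) + vals[nxt] * w
    return vals

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```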
Evaluation metrics
First, several general evaluation criteria are used, including MAE, RMSE, MAPE, and R2. This study also adopts directional accuracy (DA),41 which measures the accuracy of the predicted direction of price movement. In practice, not only the forecast accuracy of carbon prices themselves must be considered but also the accuracy of forecasting carbon price rises and falls. The evaluation indices are calculated as follows:
\[ \mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right| \]
\[ \mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2} \]
\[ \mathrm{MAPE}=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|\times 100\% \]
\[ R^2=1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2} \]
\[ \mathrm{DA}=\frac{1}{n-1}\sum_{i=1}^{n-1}a_i,\qquad a_i=\begin{cases}1, & \left(y_{i+1}-y_i\right)\left(\hat{y}_{i+1}-y_i\right)\ge 0\\ 0, & \text{otherwise}\end{cases} \]
In the above formulas, yi represents the actual value, ŷi the predicted value, and n the number of predicted samples. The smaller the MAE, RMSE, and MAPE, and the closer R2 and DA are to 1, the better the prediction.
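The five criteria can be implemented directly from the formulas above (the DA convention here, comparing the predicted next value against the current actual value, is one common choice):

```python
import math

def evaluate(y, yhat):
    """Compute MAE, RMSE, MAPE (%), R2, and directional accuracy (DA)."""
    n = len(y)
    errs = [a - b for a, b in zip(y, yhat)]
    mae = sum(abs(e) for e in errs) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    mape = sum(abs(e / a) for e, a in zip(errs, y)) / n * 100
    y_bar = sum(y) / n
    r2 = 1 - sum(e * e for e in errs) / sum((a - y_bar) ** 2 for a in y)
    hits = sum(1 for i in range(n - 1)
               if (y[i + 1] - y[i]) * (yhat[i + 1] - y[i]) >= 0)
    da = hits / (n - 1)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2, "DA": da}
```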
However, different evaluation criteria rest on different theoretical principles and are therefore not directly comparable. Even under the same criterion, we cannot conclude that the proposed model outperforms a benchmark model from the numerical values alone, because the difference in prediction errors between the two models may not be statistically significant.42 Therefore, this study adopts two statistical methods to test the significance of the predictive power of different models. First, the Diebold–Mariano (DM) test43 is used to examine whether the differences in predictive ability between models are significant. The DM test compares the errors of different models against the actual values to judge whether their prediction effects differ significantly. The null hypothesis (H0) is that the two models have the same prediction accuracy, whereas the alternative hypothesis (H1) is the opposite. The DM statistic is calculated as follows:
\[ d_i=\left(y_i-\hat{y}_{j,i}\right)^2-\left(y_i-\hat{y}_{k,i}\right)^2 \]
\[ \mathrm{DM}=\frac{\bar{d}}{\sigma_d/\sqrt{n}} \]
where ŷj,i and ŷk,i are the predicted values of models j and k, d̄ is the mean of di, and σd is the standard deviation of di.
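A sketch of the DM statistic under squared-error loss (the loss function is an assumption; the asymptotic standard normal distribution is used for the two-sided p-value):

```python
import math
from statistics import mean, pstdev

def dm_test(y, pred_j, pred_k):
    """DM statistic comparing model j against model k (squared-error loss).

    A large positive value means model k's errors are significantly smaller.
    """
    d = [(a - b) ** 2 - (a - c) ** 2 for a, b, c in zip(y, pred_j, pred_k)]
    n = len(d)
    dm = mean(d) / (pstdev(d) / math.sqrt(n))
    p_value = math.erfc(abs(dm) / math.sqrt(2))  # two-sided normal p-value
    return dm, p_value
```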
Second, if the changes of two series in the prediction system are synchronous, their correlation is high; otherwise, it is low. Therefore, grey correlation analysis (GCA) is used to measure the geometric proximity between the predicted results of different models and the actual values,44 indicating the correlation between predicted and actual values. The GCA is calculated as follows:
\[ \xi_i=\frac{b+p\,a}{\left|y_i-\hat{y}_i\right|+p\,a} \]
\[ \mathrm{GCA}=\frac{1}{n}\sum_{i=1}^{n}\xi_i \]
where a and b are the maximum and minimum absolute errors over all models, respectively, and p is the resolution coefficient, generally set to 0.5.
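The grey correlation degree can be computed for a set of models as follows (a sketch following the formulas above, with p = 0.5):

```python
def grey_correlation(y, predictions, p=0.5):
    """Grey correlation degree of each model's predictions against the actual values.

    predictions: dict mapping model name -> list of predicted values.
    """
    abs_errors = {m: [abs(a - b) for a, b in zip(y, pred)]
                  for m, pred in predictions.items()}
    pooled = [e for errs in abs_errors.values() for e in errs]
    a, b = max(pooled), min(pooled)  # max/min absolute error over all models
    return {m: sum((b + p * a) / (e + p * a) for e in errs) / len(errs)
            for m, errs in abs_errors.items()}
```

A model whose predictions track the actual series more closely receives a degree nearer to 1.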
EXPERIMENTAL STUDY
In this section, samples from the five carbon markets of Guangdong, Beijing, Shanghai, Hubei, and Shenzhen are selected as experimental objects to verify the feasibility of the proposed model for carbon price prediction. First, the structure and basic parameters of the experimental models are described. Second, the experimental results are comprehensively analyzed and compared.
Experimental setup
To fairly compare the proposed model with the benchmark models, the model parameters must be set appropriately. TFT has many parameters, including the time step, batch size, learning rate, number of hidden layers, number of neuron nodes, and number of attention heads. Among them, the numbers of hidden layers and attention heads determine the main structure of TFT, which adopts the design in the original paper.36 In addition, since this study does not use static features of the carbon price, a uniform mask is used as the static input. The input time step is uniformly set to 10 according to the PACF results; that is, the price data of the previous 10 steps are used to predict the price of the next step. The learning rate affects the training time and convergence of the model: a large learning rate may cause the model to fail to converge and oscillate around the optimum, whereas a small learning rate leads to slow convergence and a tendency to fall into local optima. Therefore, this study adopts an adaptive learning-rate reduction and early stopping mechanism that adjusts the learning rate according to the validation set accuracy: a large learning rate is used at the initial stage of training and is gradually decreased as training proceeds, so the model converges rapidly. The batch size and the number of neuron nodes were determined experimentally following Yun et al.45 Taking the Guangdong carbon price as the experimental object, the validation error was computed for batch sizes of 16, 32, 64, and 128 and neuron node counts of 4, 8, 16, 32, 64, and 128 (as shown in Table 4). The MAE, RMSE, and MAPE on the validation set are 0.262, 0.329, and 0.939%, respectively, when the batch size is 16 and the number of neuron nodes is 64, which is optimal over the whole grid. Therefore, the batch size and number of neuron nodes in TFT are set to 16 and 64.
Table 4 Performance of the proposed model: batch sizes and neuron nodes
Batch_size | Neuron nodes | MAE | RMSE | MAPE (%) | Running_time | Batch_size | Neuron nodes | MAE | RMSE | MAPE (%) | Running_time |
16 | 4 | 0.368 | 0.451 | 1.317 | 342.194 | 32 | 4 | 0.405 | 0.501 | 1.447 | 184.582 |
8 | 0.279 | 0.359 | 1.000 | 422.423 | 8 | 0.272 | 0.348 | 0.972 | 220.950 | ||
16 | 0.353 | 0.421 | 1.260 | 480.679 | 16 | 0.430 | 0.543 | 1.535 | 150.531 | ||
32 | 0.273 | 0.352 | 0.979 | 413.102 | 32 | 0.272 | 0.351 | 0.973 | 187.407 | ||
64 | 0.262 | 0.329 | 0.939 | 428.417 | 64 | 0.311 | 0.384 | 1.112 | 167.341 | ||
128 | 0.276 | 0.356 | 0.989 | 390.950 | 128 | 0.271 | 0.351 | 0.972 | 177.006 | ||
64 | 4 | 0.408 | 0.492 | 1.463 | 96.477 | 128 | 4 | 0.366 | 0.467 | 1.305 | 59.296 |
8 | 0.319 | 0.390 | 1.143 | 110.177 | 8 | 0.394 | 0.466 | 1.410 | 72.550 | ||
16 | 0.279 | 0.345 | 0.999 | 101.942 | 16 | 0.333 | 0.421 | 1.189 | 70.822 | ||
32 | 0.269 | 0.347 | 0.964 | 120.436 | 32 | 0.267 | 0.345 | 0.957 | 53.778 | ||
64 | 0.279 | 0.359 | 1.000 | 115.425 | 64 | 0.275 | 0.355 | 0.984 | 55.588 | ||
128 | 0.274 | 0.354 | 0.982 | 96.581 | 128 | 0.279 | 0.360 | 0.999 | 67.752 |
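The adaptive learning-rate reduction and early stopping mechanism described above corresponds to the `ReduceLROnPlateau` and `EarlyStopping` callbacks in Keras; its control logic can be emulated in a few lines (the patience values here are illustrative assumptions, not the paper's settings):

```python
def schedule(val_losses, lr=1e-3, factor=0.5, lr_patience=3,
             stop_patience=8, min_lr=1e-6):
    """Halve the learning rate after lr_patience epochs without improvement;
    stop training after stop_patience epochs without improvement."""
    best = float("inf")
    wait_lr = wait_stop = 0
    epochs_run = 0
    for loss in val_losses:
        epochs_run += 1
        if loss < best:
            best = loss
            wait_lr = wait_stop = 0
        else:
            wait_lr += 1
            wait_stop += 1
            if wait_stop >= stop_patience:
                break  # early stopping
            if wait_lr >= lr_patience:
                lr = max(lr * factor, min_lr)  # reduce LR on plateau
                wait_lr = 0
    return lr, epochs_run
```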
To verify the effectiveness of the proposed model, several benchmark neural network models are selected for comparison, including the backpropagation (BP) network, TCN, recurrent neural network (RNN), LSTM, and gated recurrent unit (GRU) network. The LSTM is commonly used for carbon price prediction; its parameters are set according to Zhou et al.46 and verified by cross-validation. The LSTM has three layers with 128, 64, and 32 units, respectively. Because the RNN and GRU are recurrent networks like the LSTM, their structures are set identically. For TCN and BP, cross-validation was used to determine the parameter settings. The parameter setting of each comparison model is shown in Table 5. In addition, the time step, batch size, and learning rate are the same as for TFT.
Table 5 Hyperparameters for comparison models
Model | Hyperparameter |
BP | Hidden layers = 3 |
Hidden sizes = 256, 128, 64 | |
Activation function = "ReLU" | |
TCN | Hidden layers = 3 |
Kernel size = 2 | |
Dilation rate = 2 | |
Activation function = "ReLU" | |
RNN | Hidden layers = 3 |
Units = 128, 64, 32 | |
Activation function = "ReLU" | |
LSTM | Hidden layers = 3 |
Units = 128, 64, 32 | |
Activation function = "ReLU" | |
GRU | Hidden layers = 3 |
Units = 128, 64, 32 | |
Activation function = "ReLU" |
The input dimension of all models is (10, 18). Given that BP cannot take inputs with a time-step dimension, the time-step data are flattened before input. The output size is 1, and the loss function is MAE. To avoid overfitting, a dropout layer with a rate of 0.2 is added after each layer. The Adam optimization algorithm is used for training. All experiments are performed in Python 3.8.8 and TensorFlow 2.7.0. In addition, multivariate empirical mode decomposition (MEMD)47 is selected as the contrast decomposition technique, and the parameter settings of MFIF follow Cicone et al.32
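The flattening applied to BP's input can be sketched as follows: a (10, 18) window of 10 time steps × 18 variables becomes a single 180-dimensional vector (toy values for illustration):

```python
def flatten_window(window):
    """Flatten a (time_steps, features) window into one vector for the BP (MLP) model."""
    return [value for step in window for value in step]

window = [[float(t * 18 + f) for f in range(18)] for t in range(10)]  # toy (10, 18) window
flat = flatten_window(window)
```

The recurrent models and TFT consume the (10, 18) window directly, so only BP needs this step.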
Experimental results and analysis
First, the decomposition and reconstruction of the original sequence of each carbon price are presented. Second, the prediction results of the proposed and benchmark models are comprehensively compared. Finally, the importance of each factor and time step learned by the model is explained.
Decomposition and reconstruction of carbon price series
According to the hybrid carbon price forecasting structure proposed in this study, MFIF is used to decompose the carbon prices and their related factors into several IMFs. Note that the trading volumes, minimum and maximum temperatures, and AQI are excluded from the decomposition. Figure 5 takes the Guangdong carbon market as an example to illustrate the decomposition and reconstruction results. Many IMFs are produced, which not only increases the computational complexity but can also reduce prediction accuracy through error accumulation, so reconstructing these IMFs is indispensable. Given that sequences with similar complexities have similar prediction difficulties, SE is a good measure of sequence complexity.48 Therefore, reconstructing the decomposed sequence according to the SE values can reduce computational cost and error: IMFs with similar complexities are reconstructed into a new sequence. The carbon price decomposition–reconstruction results for the remaining markets are shown in Figure 6. As displayed in Figures 5 and 6, from the first part to the third part of each carbon market, the frequency and complexity decrease from high to low. For the data decomposed by MEMD, the resulting IMFs are reconstructed into three parts in the same manner. When calculating SE and reconstructing, SE values are not calculated for the other factors because their IMFs correspond one-to-one with the IMFs of the carbon price; the IMFs of the other factors are simply summed according to the reconstruction grouping of the carbon price IMFs.
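Sample entropy (SE), used above to group IMFs of similar complexity, can be sketched as follows (m = 2 and r = 0.2 × standard deviation are conventional choices; the paper does not state its exact parameters):

```python
import math
from statistics import pstdev

def sample_entropy(x, m=2, r_frac=0.2):
    """SampEn(m, r): lower values indicate a more regular, predictable series."""
    r = r_frac * pstdev(x)

    def matches(length):
        # Count template pairs whose Chebyshev distance is within tolerance r.
        templates = [x[i:i + length] for i in range(len(x) - length + 1)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b, a = matches(m), matches(m + 1)
    return -math.log(a / b) if a > 0 and b > 0 else float("inf")
```

IMFs whose SE values are close are then summed into one part, giving the high-, medium-, and low-complexity subsequences shown in Figures 5 and 6.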
Figure 5. Decomposition–reconstruction of the Guangdong market. (A) shows the SE of each IMF and how the IMFs are grouped into parts; (B) shows the result after reconstruction.
Figure 6. Reconstruction results. (A), (B), (C), and (D) represent the carbon markets of Beijing, Shanghai, Hubei, and Shenzhen, respectively.
The forecast results of the proposed MFIF–SE–TFT model and the benchmark models for the five carbon markets are shown in Figures 7, 8, 9, 10, and 11, representing Guangdong, Beijing, Shanghai, Hubei, and Shenzhen, respectively. The predicted values closest to the actual values come from MFIF–SE–TFT, whose fitted curve is closest to the actual curve. Even in strongly oscillating segments (e.g., the parts framed in Figures 7–11) or at extreme values (maxima and minima), the model's prediction accuracy remains satisfactory without major deviations. The single models do not capture the violent fluctuations and long-term trends of carbon prices well; for example, in Figures 7, 8, and 9, the trends predicted by single models, especially BP, for the latter half of the Guangdong, Shanghai, and Hubei markets differ considerably from the actual values. This illustrates the effectiveness of applying the decomposition algorithm to carbon price forecasting, which helps the model learn different parts of the carbon price series. Subplots (A)–(H) in Figures 7–11 further reflect the relationship between predicted and actual values, with the diagonal line indicating equality; the closer the scatter is to this line, the better the performance. The scatter points of MFIF–SE–TFT converge toward the diagonal line. These subgraphs show significant deviations in the predictions of the single models, while the models using decomposition technology improve prediction accuracy to a certain extent. The heatmap coloring of the scatter points represents the absolute error between predicted and actual values. Although the forecasting effects differ across carbon markets, most of the absolute errors of MFIF–SE–TFT are less than 5. For the single models, especially BP, the prediction results differ greatly from the actual values, and the absolute errors of some points exceed 20. The fitting curves, scatter plots, and heatmaps of each carbon market preliminarily confirm that the proposed forecasting model has good forecasting accuracy.
Table 6 lists the values of the evaluation criteria for the five carbon markets, with bold values indicating the best result under each criterion. In terms of MAE, RMSE, MAPE, R2, and DA, MFIF–SE–TFT has the best predictive performance in each carbon market, with the smallest MAE, RMSE, and MAPE and the largest R2 and DA. Comparing these results with the other benchmark models, the following key conclusions can be drawn:
Among the single models, TFT predicts better than the BP, TCN, RNN, LSTM, and GRU models, with smaller MAE, RMSE, and MAPE and larger R2 and DA. In the Guangdong market, TFT improved on the second-best model by 49.68%, 44.37%, 45.27%, 2.49%, and 19.76% in MAE, RMSE, MAPE, R2, and DA, respectively. The Beijing market improved by 4.28%, 6.28%, 7.51%, and 2.26% in MAE, RMSE, MAPE, and R2, respectively; the Shanghai market by 54.64%, 56.34%, 48.34%, and 30.80% in MAE, RMSE, MAPE, and R2; the Hubei market by 25.96%, 15.01%, 25.14%, and 1.25% in MAE, RMSE, MAPE, and R2; and the Shenzhen market by 5.61%, 1.21%, 3.75%, and 2.26% in MAE, RMSE, MAPE, and DA. These results are mainly attributed to the fact that TFT can fully learn the importance of different factors for carbon price prediction and suppress irrelevant variables, which would otherwise interfere with the prediction of a single model, while also learning the temporal characteristics of carbon prices.
After introducing a decomposition algorithm (MEMD or MFIF), the prediction accuracy of a single model improves significantly for two reasons: (a) the decomposition algorithm reduces the complexity of the original series, and SE accurately estimates the complexity of the resulting subsequences; (b) the TFT model captures the internal features of the different factors within each subsequence, so the model can learn the different characteristics well, thereby improving prediction accuracy. Comparing the two decomposition methods, MFIF outperforms MEMD. In the Guangdong market, MFIF–SE–TFT improved on MEMD–SE–TFT by 61.07%, 61.17%, 59.63%, 0.71%, and 52.71% in MAE, RMSE, MAPE, R2, and DA, respectively; the Beijing market improved by 32.93%, 29.33%, 33.61%, 6.19%, and 26.65%; the Shanghai market by 65.39%, 70.83%, 66.03%, 2.25%, and 79.50%; the Hubei market by 15.90%, 15.79%, 15.02%, 0.51%, and 17.80%; and the Shenzhen market by 70.93%, 71.17%, 75.77%, 65.81%, and 107.52% in MAE, RMSE, MAPE, R2, and DA, respectively. MFIF decomposes carbon prices in a more detailed manner than MEMD, so more regular subsequences are obtained after SE reconstruction.
As presented in Table 6, the forecast results differ across carbon markets. The Beijing and Shenzhen carbon markets are predicted less accurately than the others, with slightly higher MAE, RMSE, and MAPE. To determine the reasons, the forecast result of each subsequence in each carbon market is extracted, and MAE, RMSE, and MAPE are reported in Table 7. The results for each part reveal that the prediction error of Part I contributes most of the overall error, because Part I of the carbon price consists mainly of high-frequency IMFs, which contain large uncertainty, volatility, noise, and outliers. The Beijing and Shenzhen markets are more volatile in the short term than the other markets; therefore, the difference in Part I prediction leads to the difference in the final forecast result of each carbon market. For Parts II and III, however, the model fits almost perfectly.
Table 6 Prediction error evaluation metrics
Carbon market | Evaluation metrics | Forecasting Models | |||||||
BP | TCN | RNN | LSTM | GRU | TFT | MEMD-SE-TFT | MFIF-SE-TFT | ||
Guangdong | MAE | 11.459 | 5.236 | 4.869 | 3.453 | 2.043 | 1.028 | 0.971 | 0.378 |
RMSE | 16.940 | 8.331 | 6.826 | 4.492 | 3.205 | 1.783 | 1.558 | 0.605 | |
MAPE | 18.143% | 7.984% | 8.150% | 6.136% | 3.298% | 1.805% | 1.734% | 0.700% | |
R2 | 0.335 | 0.767 | 0.843 | 0.932 | 0.965 | 0.989 | 0.992 | 0.999 | |
DA | 0.504 | 0.493 | 0.504 | 0.507 | 0.507 | 0.521 | 0.516 | 0.788 | |
Beijing | MAE | 11.252 | 8.148 | 6.423 | 6.287 | 6.805 | 6.018 | 5.475 | 3.672 |
RMSE | 14.414 | 10.006 | 8.238 | 8.259 | 8.972 | 7.740 | 6.938 | 4.903 | |
MAPE | 21.789% | 13.488% | 11.133% | 10.675% | 11.820% | 9.873% | 8.786% | 5.833% | |
R2 | 0.519 | 0.768 | 0.843 | 0.842 | 0.814 | 0.861 | 0.889 | 0.944 | |
DA | 0.448 | 0.540 | 0.460 | 0.464 | 0.480 | 0.452 | 0.484 | 0.613 | |
Shanghai | MAE | 7.089 | 5.107 | 3.773 | 4.177 | 2.522 | 1.144 | 0.731 | 0.253 |
RMSE | 9.877 | 7.951 | 6.249 | 5.823 | 4.104 | 1.792 | 1.200 | 0.350 | |
MAPE | 14.234% | 9.998% | 7.232% | 8.465% | 4.954% | 2.559% | 1.666% | 0.566% | |
R2 | 0.300 | 0.337 | 0.360 | 0.444 | 0.724 | 0.947 | 0.976 | 0.998 | |
DA | 0.331 | 0.327 | 0.347 | 0.343 | 0.331 | 0.318 | 0.400 | 0.718 | |
Hubei | MAE | 4.363 | 3.845 | 1.967 | 1.341 | 1.144 | 0.847 | 0.648 | 0.545 |
RMSE | 5.606 | 5.458 | 2.765 | 1.850 | 1.566 | 1.331 | 0.950 | 0.800 | |
MAPE | 10.935% | 8.958% | 4.835% | 3.458% | 3.099% | 2.320% | 1.764% | 1.499% | |
R2 | 0.454 | 0.482 | 0.867 | 0.941 | 0.957 | 0.969 | 0.984 | 0.989 | |
DA | 0.497 | 0.513 | 0.487 | 0.461 | 0.455 | 0.479 | 0.545 | 0.642 | |
Shenzhen | MAE | 6.709 | 6.512 | 5.583 | 6.98 | 5.083 | 4.798 | 4.389 | 1.276 |
RMSE | 8.973 | 9.108 | 8.224 | 9.21 | 7.93 | 8.026 | 7.475 | 2.155 | |
MAPE | 53.105% | 48.676% | 45.182% | 59.331% | 43.718% | 42.079% | 39.162% | 9.489% | |
R2 | 0.398 | 0.379 | 0.494 | 0.365 | 0.53 | 0.518 | 0.582 | 0.965 | |
DA | 0.394 | 0.415 | 0.355 | 0.36 | 0.371 | 0.342 | 0.399 | 0.828 |
Table 7 Subpart prediction results
Carbon market | Evaluation metrics | Part I | Part II | Part III |
Guangdong | MAE | 0.2713 | 0.0256 | 0.0811 |
RMSE | 0.4672 | 0.0551 | 0.1251 | |
MAPE | 1845.58% | 0.035% | 3.73% | |
Beijing | MAE | 3.6694 | 0.1425 | 0.0528 |
RMSE | 4.8971 | 0.2383 | 0.0593 | |
MAPE | 9172.33% | 0.31% | 6.21% | |
Shanghai | MAE | 0.3295 | 0.0757 | 0.0413 |
RMSE | 0.5471 | 0.1207 | 0.0732 | |
MAPE | 4653.31% | 4.14% | 0.01% | |
Hubei | MAE | 0.5365 | 0.0235 | 0.0492 |
RMSE | 0.7917 | 0.0347 | 0.0714 | |
MAPE | 4856.25% | 13.71% | 0.12% | |
Shenzhen | MAE | 2.2623 | 0.0988 | 0.0498 |
RMSE | 3.0890 | 0.1697 | 0.0641 | |
MAPE | 5220.93% | 3.17% | 0.40% |
The above analysis considers traditional evaluation indicators, but the models' predictive abilities cannot be judged from the levels of these indicators alone: the differences may be insignificant or due to chance in model training. Therefore, the DM test and GCA are performed to determine whether the prediction results of the models differ significantly. Table 8 presents the DM test and GCA results of MFIF–SE–TFT against the benchmark models in the five carbon markets. From Table 8, the proposed MFIF–SE–TFT model is significantly better than the comparison models in each carbon market at the 5% significance level. Moreover, the GCA of the proposed model is the highest, meaning its predictions are the most correlated with the actual values. This indicates that MFIF–SE–TFT dramatically improves the accuracy of carbon price prediction compared with the other models.
Table 8 DM test and GCA results
Model | Guangdong | Beijing | Shanghai | Hubei | Shenzhen | |||||
DM | GCA | DM | GCA | DM | GCA | DM | GCA | DM | GCA | |
BP | 11.8355*** | 0.757 | 10.2276*** | 0.685 | 9.1531*** | 0.693 | 12.6734*** | 0.746 | 9.7301*** | 0.797 |
TCN | 10.2866*** | 0.860 | 9.1316*** | 0.734 | 8.1878*** | 0.768 | 11.1564*** | 0.782 | 9.4200*** | 0.807 |
RNN | 10.6461*** | 0.857 | 8.0565*** | 0.780 | 7.8540*** | 0.818 | 7.9341*** | 0.862 | 7.6040*** | 0.830 |
LSTM | 12.1364*** | 0.888 | 7.1674*** | 0.785 | 9.0745*** | 0.780 | 6.4625*** | 0.898 | 9.4589*** | 0.790 |
GRU | 7.5170*** | 0.932 | 7.3716*** | 0.774 | 8.0323*** | 0.86 | 7.4033*** | 0.911 | 6.1975*** | 0.845 |
TFT | 8.2677*** | 0.958 | 7.3350*** | 0.79 | 7.8590*** | 0.887 | 6.3421*** | 0.918 | 7.8805*** | 0.841 |
MEMD-SE-TFT | 4.4585*** | 0.965 | 7.1346*** | 0.802 | 3.3983*** | 0.947 | 5.0412*** | 0.947 | 5.4904*** | 0.866 |
MFIF-SE-TFT | - | 0.986 | - | 0.856 | - | 0.980 | - | 0.954 | - | 0.951
*** denotes significance at the 5% level.
The accuracy of the single benchmark models (BP, TCN, RNN, LSTM, and GRU) is low, and a large gap exists between their prediction accuracy and that of MFIF–SE–TFT, even though these deep learning models have strong nonlinear fitting capabilities and are often used in time series forecasting. This study therefore explores the reason for the gap: too many variables are selected to predict carbon prices. Many variables affect carbon prices through different mechanisms, and both the carbon price and these variables are highly uncertain and noisy. With many exogenous variables, it is difficult for a single model to extract the relationships between the highly uncertain variables and the original carbon price series, and overfitting is likely to occur. Three extended experiments were conducted to verify this analysis. First, only historical carbon price data are used as input. Second, the exogenous variables are reduced through factor analysis20 before being used as features. The Kaiser–Meyer–Olkin (KMO) measure and Bartlett's test of sphericity are used to determine whether the original variables are suitable for factor analysis; the statistical tests are shown in Table 9. According to Table 9, the KMO value of almost all samples is greater than 0.7 and the significance probability is below the 0.05 level, indicating that factor analysis can be applied to all samples. Third, irrelevant features are eliminated by random forest (RF)17 feature screening before being used as input; Figure 12 shows the top 10 RF scores. The threshold is set to 0.05, that is, variables with scores greater than 0.05 are selected as input features. The extended experiments use the same time step of 10, the parameter settings in Table 5, and the five indicators MAE, RMSE, MAPE, R2, and DA. The experimental results are shown in Tables 10–12.
Comparing Tables 10–12 with Table 6 shows that, apart from the DA indicator, the results obtained using only the carbon price itself as input are significantly better than those obtained when all exogenous variables are included. Dimensionality reduction and feature screening of the exogenous variables also improve the prediction accuracy of the single models, indicating that introducing exogenous variables is beneficial when handled appropriately. Each of the two treatments has its advantages and disadvantages, but both remain inferior to TFT. The extended experiments validate the analysis above: when there are many variable features, a single model struggles to learn valid information between features, which leads to overfitting, while dimensionality reduction and feature screening inevitably discard part of the data information. TFT, by contrast, has considerable advantages in handling multiple variables: it can reasonably model the complex relationship between the variables and carbon price fluctuations, capture their characteristics, and eliminate or weaken irrelevant interfering variables to improve forecasting. The three extended experiments further illustrate the superiority and stability of the proposed model.
Table 9 KMO and Bartlett test
Guangdong | Beijing | Shanghai | Hubei | Shenzhen | |||||
KMO | 0.688 | KMO | 0.704 | KMO | 0.704 | KMO | 0.71 | KMO | 0.696 |
Bartlett test of sphericity | Bartlett test of sphericity | Bartlett test of sphericity | Bartlett test of sphericity | Bartlett test of sphericity | |||||
Approximate chi-square value | 31,173.299 | Approximate chi-square value | 25,062.618 | Approximate chi-square value | 23,748.974 | Approximate chi-square value | 35,468.347 | Approximate chi-square value | 36,698.013 |
Degrees of freedom | 136 | Degrees of freedom | 136 | Degrees of freedom | 136 | Degrees of freedom | 136 | Degrees of freedom | 136 |
Significance | 0 | Significance | 0 | Significance | 0 | Significance | 0 | Significance | 0 |
Table 10 Single model results of using carbon price itself
Carbon market | Evaluation metrics | Forecasting models | ||||
BP | TCN | RNN | LSTM | GRU | ||
Guangdong | MAE | 1.185 | 1.428 | 1.432 | 2.536 | 1.913 |
RMSE | 1.897 | 2.258 | 2.285 | 3.454 | 2.612 | |
MAPE | 1.890% | 2.355% | 2.409% | 4.402% | 3.370% | |
R2 | 0.988 | 0.983 | 0.982 | 0.960 | 0.977 | |
DA | 0.605 | 0.542 | 0.499 | 0.513 | 0.527 | |
Beijing | MAE | 6.122 | 6.233 | 6.147 | 6.146 | 6.033 |
RMSE | 7.814 | 7.928 | 7.789 | 7.831 | 7.748 | |
MAPE | 10.013% | 10.380% | 9.910% | 9.871% | 9.738% | |
R2 | 0.859 | 0.855 | 0.860 | 0.858 | 0.861 | |
DA | 0.452 | 0.440 | 0.468 | 0.480 | 0.452 | |
Shanghai | MAE | 7.638 | 4.274 | 3.840 | 3.610 | 3.532 |
RMSE | 2.764 | 2.067 | 1.959 | 1.900 | 1.879 | |
MAPE | 3.697% | 2.871% | 2.725% | 2.654% | 2.624% | |
R2 | 0.875 | 0.930 | 0.937 | 0.941 | 0.942 | |
DA | 0.347 | 0.306 | 0.327 | 0.343 | 0.335 | |
Hubei | MAE | 1.036 | 1.189 | 1.022 | 0.904 | 0.889 |
RMSE | 1.451 | 1.661 | 1.454 | 1.396 | 1.402 | |
MAPE | 2.720% | 3.044% | 2.680% | 2.446% | 2.418% | |
R2 | 0.963 | 0.952 | 0.963 | 0.966 | 0.966 | |
DA | 0.492 | 0.479 | 0.468 | 0.482 | 0.482 | |
Shenzhen | MAE | 5.460 | 4.809 | 4.875 | 4.823 | 5.092 |
RMSE | 8.820 | 8.034 | 8.100 | 7.816 | 8.434 | |
MAPE | 41.385% | 42.118% | 42.457% | 42.645% | 40.121% | |
R2 | 0.418 | 0.517 | 0.509 | 0.543 | 0.468 | |
DA | 0.436 | 0.347 | 0.358 | 0.360 | 0.342 |
Table 11 Single model results of using factor analysis
Carbon market | Evaluation metrics | Forecasting models | ||||
BP | TCN | RNN | LSTM | GRU | ||
Guangdong | MAE | 1.118 | 1.107 | 1.497 | 1.825 | 1.706 |
RMSE | 1.637 | 1.725 | 2.291 | 3.045 | 2.930 | |
MAPE | 1.788% | 1.719% | 2.327% | 2.873% | 2.671% | |
R2 | 0.991 | 0.990 | 0.982 | 0.969 | 0.971 | |
DA | 0.547 | 0.536 | 0.553 | 0.519 | 0.504 | |
Beijing | MAE | 5.985 | 6.103 | 5.773 | 6.145 | 6.024 |
RMSE | 7.558 | 7.876 | 7.318 | 8.265 | 7.730 | |
MAPE | 9.742% | 10.593% | 9.611% | 11.210% | 9.852% | |
R2 | 0.868 | 0.856 | 0.876 | 0.842 | 0.862 | |
DA | 0.480 | 0.460 | 0.484 | 0.496 | 0.452 | |
Shanghai | MAE | 4.074 | 4.748 | 3.558 | 3.454 | 3.376 |
RMSE | 5.672 | 6.610 | 4.931 | 5.310 | 5.059 | |
MAPE | 8.268% | 9.636% | 7.209% | 6.885% | 6.756% | |
R2 | 0.473 | 0.284 | 0.601 | 0.538 | 0.580 | |
DA | 0.385 | 0.389 | 0.401 | 0.409 | 0.425 | |
Hubei | MAE | 1.020 | 1.184 | 1.019 | 0.894 | 0.877 |
RMSE | 1.417 | 1.687 | 1.325 | 1.309 | 1.190 | |
MAPE | 2.765% | 3.023% | 2.754% | 2.394% | 2.363% | |
R2 | 0.965 | 0.951 | 0.969 | 0.970 | 0.975 | |
DA | 0.478 | 0.454 | 0.480 | 0.488 | 0.467 | |
Shenzhen | MAE | 5.244 | 4.800 | 4.927 | 4.809 | 4.963 |
RMSE | 7.114 | 7.135 | 7.677 | 6.653 | 7.810 | |
MAPE | 46.846% | 42.131% | 42.982% | 46.468% | 43.078% | |
R2 | 0.621 | 0.619 | 0.559 | 0.669 | 0.544 | |
DA | 0.453 | 0.393 | 0.448 | 0.367 | 0.385 |
Table 12 Single model results of using RF feature screening
Carbon market | Evaluation metrics | Forecasting models | ||||
BP | TCN | RNN | LSTM | GRU | ||
Guangdong | MAE | 1.105 | 1.353 | 1.423 | 2.036 | 1.692 |
RMSE | 1.680 | 2.326 | 2.247 | 2.689 | 2.479 | |
MAPE | 1.779% | 1.983% | 2.203% | 3.601% | 2.796% | |
R2 | 0.991 | 0.982 | 0.983 | 0.976 | 0.979 | |
DA | 0.490 | 0.541 | 0.493 | 0.556 | 0.513 | |
Beijing | MAE | 6.107 | 6.053 | 6.035 | 6.078 | 6.036 |
RMSE | 8.080 | 7.919 | 7.833 | 7.812 | 7.740 | |
MAPE | 10.997% | 10.311% | 10.766% | 9.887% | 9.658% | |
R2 | 0.849 | 0.855 | 0.858 | 0.859 | 0.861 | |
DA | 0.456 | 0.432 | 0.464 | 0.456 | 0.468 | |
Shanghai | MAE | 2.481 | 2.786 | 2.949 | 2.811 | 2.010 |
RMSE | 4.214 | 4.568 | 5.102 | 4.638 | 3.305 | |
MAPE | 4.842% | 5.513% | 5.709% | 5.642% | 4.025% | |
R2 | 0.709 | 0.658 | 0.573 | 0.647 | 0.821 | |
DA | 0.356 | 0.377 | 0.352 | 0.364 | 0.397 | |
Hubei | MAE | 1.015 | 1.150 | 1.009 | 0.855 | 0.852 |
RMSE | 1.424 | 1.627 | 1.529 | 1.064 | 1.187 | |
MAPE | 2.409% | 2.795% | 2.705% | 2.395% | 2.050% | |
R2 | 0.965 | 0.954 | 0.959 | 0.980 | 0.976 | |
DA | 0.480 | 0.470 | 0.472 | 0.454 | 0.470 | |
Shenzhen | MAE | 5.221 | 4.859 | 4.844 | 4.822 | 4.995 |
RMSE | 6.545 | 6.940 | 7.293 | 6.532 | 6.792 | |
MAPE | 45.236% | 41.260% | 40.643% | 39.796% | 41.571% | |
R2 | 0.680 | 0.640 | 0.602 | 0.681 | 0.655 | |
DA | 0.370 | 0.349 | 0.339 | 0.383 | 0.380 |
Figure 13 shows the importance of the different input variables in Part I and the importance of the different lag orders in Parts I, II, and III across the five carbon price sample experiments. Through Equation (17), the importance of each past input variable is obtained by averaging the weights of the variables selected on the output test set over all lag orders. Similarly, the attention paid to the different lag orders is obtained from the attention scores in Equation (21). The weight-importance results are analyzed according to Figure 13.
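The averaging described above can be illustrated with a minimal numpy sketch. The arrays, shapes, and normalization below are illustrative assumptions, not the paper's implementation: `w` stands in for per-sample, per-lag variable-selection weights (Equation (17)), and `a` for per-sample attention over lag orders (Equation (21)).

```python
import numpy as np

# Hypothetical TFT-style interpretability outputs (random placeholders).
rng = np.random.default_rng(0)
n_samples, n_lags, n_vars = 50, 10, 6  # 10 lag steps, 6 input variables

# Variable-selection weights: one distribution over variables per (sample, lag).
w = rng.random((n_samples, n_lags, n_vars))
w /= w.sum(axis=-1, keepdims=True)

# Attention scores: one distribution over lag orders per sample.
a = rng.random((n_samples, n_lags))
a /= a.sum(axis=-1, keepdims=True)

# Variable importance: average each variable's weight over all test samples
# and all lag orders, as in the averaging around Equation (17).
var_importance = w.mean(axis=(0, 1))

# Lag importance: average the attention per lag order over all test samples,
# analogous to aggregating the attention scores of Equation (21).
lag_importance = a.mean(axis=0)
```

Because each per-step distribution is normalized, both aggregated importance vectors again sum to one, which makes them directly comparable across markets and parts.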
Differences are observed in the importance of the feature variables among the carbon price sample forecasts. In the Beijing and Shenzhen experiments, the carbon price itself has the highest importance (more than 0.7 and 0.6, respectively), while the other feature variables account for only a small part, indicating that in both carbon markets, the carbon price itself makes the most significant contribution to predicting the short-term part. In the remaining three carbon markets, other variables also show considerable importance in addition to the carbon price itself. The main focus is on energy prices (Brent, WTI, natural gas, coal), so changes in energy prices have a considerable impact on carbon prices, which is consistent with studies finding that energy prices play an important role in explaining the volatility of carbon prices.49 In addition to energy prices, the HS300 index, cement prices, and trading volumes also contribute considerable importance in these three carbon markets.
For the low- and medium-frequency parts of the forecast, the carbon price itself accounts for the entire importance. Figure 13 shows only the importance of the Part I feature variables because the models for the other two parts assign all importance weights to the carbon price itself. Parts II and III represent the mid- and long-term fluctuations and trends of carbon prices. The number of lag steps chosen for this study is 10, which makes capturing the mid- and long-term dependencies between carbon prices and other variables difficult. In addition, the medium- and long-term trends of carbon prices are mainly affected by major events and long-term supply and demand,50 which are mainly generated within the carbon market. The importance weights are likewise allocated to the carbon price in the experiments using a single TFT. Compared with Parts II and III, Part I has a high fluctuation frequency and low regularity, so a single model fails to learn the multilevel features in the carbon price sequence. Therefore, decomposition techniques are effective for forecasting carbon prices.
The importance of each lag order shows how attention changes over the temporal features. A general trend is observed in each part's forecast: the smaller the lag order, the more significant the contribution to the carbon price forecast. In the Part III predictions, except for the Shanghai carbon market, the first three lag orders occupy almost all the importance. Part III is relatively smooth and has low volatility, its behavior is very regular, and the first few lag steps contain most of the information. Carbon price predictions in Part II rely on information at larger lag orders. In Part I, the importance of the lag orders fluctuates severely and depends on a long lag time step and a wide time range; that is, the greater the volatility of the forecast series, the more information the model needs. Therefore, incorporating an attention mechanism can resolve the temporal characteristics of carbon price dependencies in the different parts.
The nonlinearity and nonstationarity of carbon prices and the influence of many external variables pose huge challenges for the accurate prediction of carbon prices. In this study, a new carbon price prediction model, MFIF–SE–TFT, is proposed, which considers not only carbon prices themselves but also the impacts of other variables on carbon prices. First, the multidimensional time series is decomposed using the advanced MFIF, which can effectively deal with the nonlinearity and nonstationarity of carbon prices. Second, to reduce the computational cost of the model, this study adopts SE to calculate the complexities of the IMFs and reconstructs the IMFs with similar complexities. Third, the reconstructed carbon price subsequences and external variable sequences are input into the TFT model to obtain the prediction result of each subsequence, and these results are aggregated to obtain the final prediction. Five carbon trading markets (Guangdong, Beijing, Shanghai, Hubei, and Shenzhen) and multiple benchmark models (BP, TCN, RNN, LSTM, GRU) are selected for the simulation experiments. According to the experimental results, the following conclusions are drawn:
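The SE-based reconstruction step in the pipeline above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `sample_entropy` is a straightforward O(n²) sample-entropy estimate, and `regroup_by_entropy` uses a simple adjacent-merge rule (threshold `tol`) as a hypothetical stand-in for the paper's criterion for grouping IMFs of similar complexity.

```python
import numpy as np

def sample_entropy(x, m=2, r_frac=0.2):
    """Sample entropy of series x: embedding dimension m,
    tolerance r = r_frac * std(x). O(n^2) reference sketch."""
    x = np.asarray(x, dtype=float)
    r = r_frac * np.std(x)
    n = len(x)

    def count_matches(dim):
        # Pairs of length-`dim` templates within Chebyshev distance r.
        templates = np.array([x[i:i + dim] for i in range(n - dim)])
        count = 0
        for i in range(len(templates)):
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(d <= r))
        return count

    b, a = count_matches(m), count_matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def regroup_by_entropy(imfs, tol=0.5):
    """Sum IMFs whose sample-entropy values lie within `tol` of their
    neighbor in sorted order (simplified SE-based reconstruction)."""
    ses = [sample_entropy(imf) for imf in imfs]
    order = np.argsort(ses)
    groups, current = [], [order[0]]
    for prev, idx in zip(order, order[1:]):
        if ses[idx] - ses[prev] <= tol:
            current.append(idx)
        else:
            groups.append(current)
            current = [idx]
    groups.append(current)
    return [np.sum([imfs[i] for i in g], axis=0) for g in groups]
```

A regular, low-complexity component (e.g., a sine) yields a much lower sample entropy than white noise, so high-frequency noisy IMFs and smooth trend IMFs naturally fall into different groups, while the regrouped subsequences still sum back to the original signal.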
TFT outperforms single BP, TCN, RNN, LSTM, and GRU models in carbon price prediction. When a single model incorporates many variable features, it not only fails to improve prediction accuracy but can also suffer a decrease in accuracy due to overfitting; thus, feature engineering methods must be used in a single model to reduce the dimensionality of the variables. TFT can adaptively learn the potential relationships between the variables and carbon prices, mine inherent features, extract effective information, and eliminate interference factors, thereby improving model efficiency and stability.
MFIF has a better decomposition effect than MEMD. Carbon prices exhibit high degrees of nonlinearity and uncertainty. After introducing the decomposition algorithm, the prediction performance of TFT is further improved; thus, decomposition technology can effectively extract the complex features in carbon price series. Compared with MEMD, MFIF has better decomposition efficiency: owing to its better algorithm design, it overcomes the defects of EMD-type methods.
External variables mainly contribute to the prediction of the high-frequency part of carbon prices. According to the importance weights that TFT assigns to external variables, in Part I, that is, the high-frequency part, external variables contribute to the carbon price predictions, among which crude oil prices play a significant role. For Parts II and III, external variables contribute almost nothing. Importance analysis of the different feature variables in carbon price forecasts can provide policymakers with valuable information.
The predictions of different subsequences have different time dependencies. According to the results of the attention mechanism, the prediction in Part III depends only on the previous day's information. As the subsequence complexity increases, Parts I and II gradually need to rely on information from longer time steps.
Future studies can first consider additional lag time steps. The number of lag steps chosen in this research is 10. If a larger number of time steps is used, then the long-term dependence of the carbon price on itself and other variables can be captured. At the same time, as the number of time steps increases, more external variables, such as policy factors and seasonal periodicity, can be considered. Second, the weights of the different variables learned by TFT are generated by the model; whether a deeper economic explanation for them exists is worth investigating. In addition, more multivariate decomposition techniques can be considered for carbon price forecasting.
ACKNOWLEDGMENTS
This research was funded by the Project of Sichuan Oil and Natural Gas Development Research Center (Grant No. SKB20-06), Strategic Research and Consulting Project of the Chinese Academy of Engineering (2022-28-33), Major Project of Sichuan Philosophy and Social Science Planning Research (Grant No. SC21ZDZT010), Key Project of Chengdu Water Ecological Civilization Construction Research Key Base (Grant No. SST2021-2022-03), Key Project of Mineral Resources Research Center in Sichuan Province (Grant No. SCKCZY2021-ZD002), Key Project of Chengdu Park City Demonstration Zone Construction Research Center (Grant No. GYCS2021-ZD001), General Project of Research Center for Science and Technology Innovation and New Economy in Chengdu-Chongqing Economic Circle (Grant No. CYCX2021YB08), Key Project of Sichuan Leisure Sports Industry Development and Research Center (Grant No. XXTYCY2021A01), Social Science Research of Sichuan Province for the 14th Five-year Plan (Grant No. SC21B007), Philosophy and Social Science Research Foundation of the Chengdu University of Technology (Grant No. YJ2021-YB002), and General Project of Research Center for Sichuan Disaster Economy (Grant No. ZHJJ2021-YB001).
CONFLICT OF INTEREST
The authors declare no conflict of interest.
© 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
The accurate forecasts of carbon prices can help policymakers and enterprises further understand the laws of carbon price fluctuations and formulate related policies and investment strategies. Many carbon price prediction models have been proposed. However, some models ignore the time–frequency relationship when considering exogenous variables and fail to measure their importance to the forecasting results, leading to unsatisfactory results. Therefore, this study proposes a novel hybrid model for carbon price forecasting on the basis of advanced multidimensional time series decomposition techniques and interpretable multifactor models. In the proposed model, multivariate fast iterative filtering is used to decompose the carbon price and its exogenous variable sequences into several intrinsic mode functions, which can overcome the nonlinearity and nonstationarity of carbon prices and capture their intrinsic characteristics. Meanwhile, the temporal fusion transformer (TFT) is used to interpret the predictions for multivariate time series. TFT is a new attention-based deep learning model combining high-performance multihorizon prediction and interpretability and can adaptively select the optimal features for carbon price prediction. Five carbon markets in Guangdong, Beijing, Shanghai, Hubei, and Shenzhen are selected for the experimental studies. Empirical results indicate that the proposed model outperforms the compared benchmark models in all performance metrics. In the interpretable output of TFT, the prediction of the high-frequency part requires the participation of exogenous variables and has a long time dependence; for the middle- and low-frequency parts, using only the carbon price itself and a short time step can lead to good results. This finding can inform future research on carbon price forecasting and help policymakers.