A VMD and LSTM based hybrid model of load forecasting for power grid security

Abstract: As the basis for the static security of the power grid, power load forecasting directly affects the safety of grid operation, the rationality of grid planning, and the economy of the supply-demand balance. However, various factors cause drastic changes in short-term power consumption, making the data more complex and thus more difficult to forecast. To address this problem, this paper proposes a new hybrid model based on variational mode decomposition (VMD) and long short-term memory (LSTM) with seasonal factor elimination and error correction. Comprehensive case studies on four real-world load datasets from Singapore and the United States demonstrate the effectiveness and practicality of the proposed hybrid model. The experimental results show that the prediction accuracy of the proposed model is significantly higher than that of the comparison models.


I. Introduction
THE load of the power grid is closely related to the security of the power grid. Short-term load forecasting (STLF) is a necessary condition for the safe and economic operation of the power grid. It is also the basis for formulating power generation and supply plans and for balancing supply and demand. Only with accurate load forecasting can the operating mode of the grid be arranged reasonably and the benefits of power supply enterprises be maximized. Load forecasting also provides an effective basis for planning the maintenance and operation of electrical facilities. Therefore, the practicality and accuracy of STLF models are essential to the safe and stable operation of the smart grid [1].
Electric load data exhibit complex nonlinear relationships. Therefore, researchers use decomposition algorithms to split the load data into multiple subsequences and thereby reduce the difficulty of prediction. A hybrid method based on empirical mode decomposition (EMD) and LSTM with an extreme learning machine optimization algorithm was proposed for predicting biofuel in [2]. A feature-extraction STLF model based on EMD and an improved generalized regression neural network (GRNN) with minimum redundancy and maximum relevance (mRMR) was presented in [3]. A load state identification method based on modified ensemble empirical mode decomposition (MEEMD) was proposed in [4]. However, EMD-like decomposition algorithms have two technical problems. First, the number of modes they produce is not stable when decomposing complex time series, so the number of prediction models cannot be fixed in advance, which can interrupt the downstream program. Second, the ensemble variants add white noise during decomposition, which increases the difficulty of prediction and decreases prediction accuracy.
The experimental results in [5] show that VMD decomposes better than CEEMD. A new hybrid short-term wind speed prediction model based on VMD, wavelet transform (WT) and a radial basis function (RBF) network was proposed in [6]. In [7], VMD decomposes the load data into linear and nonlinear components, which contributes to good prediction accuracy. VMD can decompose many kinds of continuous time series, but the difficulty lies in finding the best number of decomposition modes [8]. To solve this problem, researchers have applied different evaluation methods to choose the number of modes. Permutation entropy (PE) was used to determine the number of MEEMD subseries in [9]. Multiscale permutation entropy (MPE) was used by the authors of [10] to select the intrinsic mode functions (IMFs) with the highest energy. Whether a VMD decomposition is effective depends largely on the purpose of the experiment; therefore, selecting an appropriate evaluation method is critical for the VMD model.
With the growth of computing performance, deep learning methods have been applied in many fields, such as emotion analysis [11], intelligent traffic [12], ensemble classification [13] and the smart grid. Hybrid methods improve accuracy considerably because they combine the strengths of the individual approaches [14]. A novel method based on multidirectional LSTM proposed by the authors of [1] achieves a prediction accuracy of 99.07%. An effective hybrid STLF model based on VMD and LSTM that considers temperature and humidity and uses a Bayesian optimization algorithm was proposed by the authors of [15]. A building STLF framework based on LSTM-like neural networks and attention mechanisms was developed in [16]; the LSTM-like models with attention predicted more accurately than those without attention. A composite model based on SVR that considers multiple features, especially real-time prices, with second-order oscillation and repulsion particle swarm optimization (SecRPSO) was proposed for electric load forecasting in [17]. The improved local mean decomposition (ILMD) method combined with an artificial neural network (ANN) was used to predict short-term wind speed in [18]. A hybrid wind speed prediction method based on improved CEEMDAN and ANN with a multi-objective optimization algorithm was presented in [19]. A forecasting model based on the Haar wavelet and LSTM with an improved sine cosine optimization algorithm was presented in [20] for predicting building energy consumption. A summary of these related hybrid forecasting methods is presented in Table I.

TABLE I
Summary of related hybrid forecasting work
Work | Deep learning method | Application | Data analysis | Error analysis
[1] | Multi-LSTM | Forecasting stability of grid | yes | no
[15] | VMD+LSTM | Forecasting electric load | yes | no
[16] | LSTM/BiLSTM with attention | Forecasting electric load | yes | no
[17] | SVM with SecRPSO | Forecasting electric load | yes | no
[18] | ILMD+ANN | Forecasting electric load | yes | no
[19] | ICEEMD+ARIMA/ELM et al. | Forecasting wind speed | yes | no
[20] | LSTM with Haar wavelet | Forecasting building energy consumption | yes | no
Our work | VMD+LSTM | Forecasting electric load | yes | yes

Although the above hybrid models achieve good prediction accuracy, they neither tailor the data pre-processing to the characteristics of the data nor validate the practicality of the models on the latest load data. To overcome these problems and achieve higher prediction accuracy, we propose a new data pre-processing method based on VMD, seasonal factor elimination and error correction. The main contributions of this paper are as follows:
1) A novel prediction method with a detailed mathematical analysis is presented for achieving high accuracy in short-term electric load prediction.
2) The proposed data pre-processing method determines the most suitable number of VMD modes, eliminates seasonal factors based on the holiday and weekday features of the original data, and detects and recovers the data features lost during pre-processing.
3) Twelve experiments are conducted on real-world data from the United States and Singapore covering the past few months to demonstrate the effectiveness and usefulness of the proposed approach.
The rest of this paper is organized as follows. Section II describes the framework of the proposed method. Section III introduces the whole process of the proposed prediction model, including VMD, PE, seasonal factor elimination, error correction and LSTM. Section IV presents the evaluation metrics. In Section V, case studies on real-world load data from different data sets are used to test the proposed method. Finally, Section VI concludes the study and discusses future work.

II. Main steps and framework
The main methods applied in this paper include VMD, PE, seasonal factor elimination, error correction and LSTM, as shown in Fig. 1. The procedure can be summarized in the following four steps:
1) VMD decomposes the load data into subsequences, and PE determines the number of subsequences produced by VMD.
2) Based on the characteristics of the raw load data, seasonal factors are eliminated separately for weekdays and holidays. Error correction then restores the data features lost during VMD decomposition and seasonal factor elimination.
3) Each subsequence is forecast with an LSTM. Because the VMD subsequences have different characteristics, each subsequence is predicted by an LSTM model with its own hyperparameters.
4) The final forecast is obtained by summing the predictions of all LSTM models and is compared with the real data.
Fig. 1. The whole process of the proposed forecasting model.
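Step 4 is a plain element-wise sum over the per-subsequence forecasts. The sketch below is our own minimal illustration of that aggregation, not the authors' code: `Forecaster`, `hybrid_forecast` and the `persistence` forecaster are hypothetical names, and `persistence` merely stands in for the per-mode LSTMs.

```python
from typing import Callable, List, Sequence

# A forecaster maps one subsequence's history to `horizon` future values.
# In the paper each one is an LSTM with its own hyperparameters; this type is
# a hypothetical stand-in used only to illustrate the aggregation step.
Forecaster = Callable[[Sequence[float], int], List[float]]


def hybrid_forecast(modes: Sequence[Sequence[float]],
                    forecasters: Sequence[Forecaster],
                    horizon: int) -> List[float]:
    """Forecast each subsequence with its own model and sum the results."""
    if len(modes) != len(forecasters):
        raise ValueError("need exactly one forecaster per subsequence")
    total = [0.0] * horizon
    for mode, model in zip(modes, forecasters):
        prediction = model(mode, horizon)
        # Step 4: the final forecast is the element-wise sum over subsequences.
        total = [t + p for t, p in zip(total, prediction)]
    return total


def persistence(history: Sequence[float], horizon: int) -> List[float]:
    """Toy forecaster (illustration only): repeat the last observed value."""
    return [float(history[-1])] * horizon
```

For example, `hybrid_forecast([[10.0, 11.0, 12.0], [1.0, -1.0, 0.5]], [persistence, persistence], 2)` returns `[12.5, 12.5]`. Keeping one forecaster object per subsequence is what allows each mode to carry different hyperparameters.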

A. VMD (Variational Mode Decomposition)
VMD is an adaptive, fully non-recursive method for decomposing a signal into modes. The original load data have complex nonlinear relationships, which makes them difficult to use directly as input for prediction. VMD decomposes the original load data into multiple simpler sequences, which reduces the difficulty of prediction and improves prediction accuracy. Different subsequences have different sparsity properties, so determining the center pulsation (frequency) and bandwidth of each subsequence is central to VMD. For each subsequence, the one-sided frequency spectrum is obtained with the Hilbert transform; the spectrum is then shifted to baseband by mixing with an exponential tuned to the estimated center frequency; finally, the bandwidth is estimated through the $H^1$ Gaussian smoothness of the demodulated signal [21]. The resulting constrained variational problem is given in Eq. (1):

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{1}$$

where $u_k$ is the $k$-th mode, $\omega_k$ is the center frequency of the $k$-th mode, $K$ is the total number of modes, $f(t)$ is the original signal, and $\delta(t)$ is the Dirac distribution.
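To convert the above constrained variational problem into an unconstrained one, a quadratic penalty term and a Lagrangian multiplier are introduced, giving the augmented Lagrangian in Eq. (2):

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{2}$$

where $\alpha$ is the penalty factor and $\lambda$ is the Lagrangian multiplier. The problem is solved iteratively with the alternating direction method of multipliers (ADMM), which updates $\hat{u}_k$ and $\omega_k$ as in Eq. (3) and Eq. (4):
1) Minimizing with respect to $u_k$:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \frac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha(\omega - \omega_k)^2} \tag{3}$$

2) Minimizing with respect to $\omega_k$:

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \, |\hat{u}_k^{n+1}(\omega)|^2 \, d\omega}{\int_0^{\infty} |\hat{u}_k^{n+1}(\omega)|^2 \, d\omega} \tag{4}$$

where $n$ is the iteration index, and $\hat{u}_k(\omega)$, $\hat{f}(\omega)$ and $\hat{\lambda}(\omega)$ are the Fourier transforms of $u_k(t)$, $f(t)$ and $\lambda(t)$, respectively.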
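A compact NumPy sketch of the ADMM loop in Eq. (3) and Eq. (4) may make the updates more concrete. It is a simplification under stated assumptions, not the implementation used in [21] or in this paper: it works on the non-negative half of the discrete spectrum (`np.fft.rfft`), skips the boundary mirroring used by reference implementations, adds the usual dual-ascent step on $\lambda$ with step size `tau` (`tau = 0` disables it), and uses illustrative defaults for `alpha`, `tol` and `max_iter`.

```python
import numpy as np


def vmd(signal, num_modes, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Minimal VMD sketch: ADMM updates of Eq. (3) and Eq. (4) on the one-sided spectrum.

    Frequencies are in cycles per sample (0 to 0.5). Returns the modes, sorted by
    centre frequency, and their centre frequencies.
    """
    f = np.asarray(signal, dtype=float)
    n = f.size
    f_hat = np.fft.rfft(f)                           # non-negative half of the spectrum
    freqs = np.fft.rfftfreq(n)                       # 0 ... 0.5 cycles per sample
    u_hat = np.zeros((num_modes, freqs.size), dtype=complex)
    omega = 0.5 / num_modes * np.arange(num_modes)   # evenly spaced initial centres
    lam_hat = np.zeros(freqs.size, dtype=complex)    # Lagrangian multiplier spectrum

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(num_modes):
            others = u_hat.sum(axis=0) - u_hat[k]    # latest values of the other modes
            # Eq. (3): Wiener-filter-like update centred on omega[k]
            u_hat[k] = (f_hat - others + lam_hat / 2) / (1.0 + 2.0 * alpha * (freqs - omega[k]) ** 2)
            # Eq. (4): centre of gravity of the mode's power spectrum
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / max(np.sum(power), np.finfo(float).tiny)
        # Dual ascent on the reconstruction constraint (no effect when tau == 0)
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if change < tol:
            break

    modes = np.fft.irfft(u_hat, n=n, axis=1)         # back to the time domain
    order = np.argsort(omega)
    return modes[order], omega[order]
```

Decomposing a sum of two cosines at 0.01 and 0.08 cycles per sample with `num_modes=2` should yield two modes centred near those frequencies whose sum reproduces the input; for the load data in this paper, `num_modes` would be 7 (Singapore) or 8 (PJM).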

B. Permutation Entropy
Permutation entropy (PE) measures the complexity of a time series. Describing a time series in a symbolic space clearly reflects the ordinal correlations between its values. The embedding dimension and the delay time are the key parameters in calculating PE. For a time series $\{x(i), i = 1, 2, \dots\}$, the embedded vectors are

$$X(i) = [x(i), x(i+\tau), \dots, x(i+(m-1)\tau)] \tag{5}$$

where $m$ is the embedding dimension and $\tau$ is the delay time. To capture the ordinal relationships between the values, the $m$ elements of each $X(i)$ are arranged in ascending order:

$$x(i+(j_1-1)\tau) \le x(i+(j_2-1)\tau) \le \dots \le x(i+(j_m-1)\tau) \tag{6}$$

where $j_1, j_2, \dots, j_m$ indicate the positions of the elements before sorting. Each $X(i)$ is thus mapped onto a symbol sequence $S(g) = (j_1, j_2, \dots, j_m)$, $g = 1, 2, \dots, k$, where $k \le m!$ because there are $m!$ possible orderings. Let $P_1, P_2, \dots, P_k$ denote the relative frequencies of occurrence of the $k$ symbol sequences. The PE of the time series $\{x(i), i = 1, 2, \dots\}$ is the Shannon entropy of this distribution:

$$H_p(m) = -\sum_{g=1}^{k} P_g \ln P_g \tag{7}$$

The more regular the time series, the smaller its PE; the more complex the time series, the larger its PE.
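The following pure-Python sketch computes PE as described above: embed the series, map each vector to its ordinal pattern, and take the Shannon entropy of the pattern frequencies. The defaults `m=3` and `delay=1` are illustrative assumptions, since the paper does not report its embedding dimension or delay; the natural logarithm is used, and `normalize=True` divides by $\ln(m!)$ so that PE lies in [0, 1].

```python
import math
from collections import Counter


def permutation_entropy(series, m=3, delay=1, normalize=False):
    """Shannon entropy of the ordinal patterns of length m with the given delay."""
    n_vectors = len(series) - (m - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series too short for this m and delay")
    patterns = Counter()
    for i in range(n_vectors):
        window = [series[i + j * delay] for j in range(m)]
        # Ordinal pattern: positions that sort the embedded vector ascending
        # (ties keep their original order because sorted() is stable).
        pattern = tuple(sorted(range(m), key=lambda j: window[j]))
        patterns[pattern] += 1
    probs = [count / n_vectors for count in patterns.values()]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(math.factorial(m)) if normalize else h
```

A monotonic series produces a single ordinal pattern and therefore PE = 0, consistent with the statement that more regular series have smaller PE.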
In this paper, for each candidate number of VMD modes, the resulting subsequences are first summed to obtain a reconstructed time series, and the permutation entropy of that series is then calculated. To avoid over-decomposition and under-decomposition, the number of VMD modes is limited to between 5 and 12. Table II lists the PE values on the Singapore datasets for 5 to 12 VMD subsequences. October, November and February clearly have the lowest PE with 7 subsequences. However, the PE of September is lowest with 12 subsequences, and the PE of January is lowest with 5. Over-decomposition occurs with 12 subsequences and under-decomposition with 5. To obtain a good decomposition for every month, the number of VMD subsequences is therefore set to 7, and the decomposition results of the original data are denoted VMD1, VMD2, VMD3, VMD4, VMD5, VMD6 and VMD7, where VMD1 carries the main characteristics of the original data.
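The minimum-PE criterion can be written as a small helper. This is a sketch under assumptions: `decompose(signal, k)` and `entropy(series)` are hypothetical callables (for instance a VMD routine and a permutation-entropy function), and the helper returns the per-series minimum-PE choice, whereas the paper fixes the number of modes at 7 after comparing all months.

```python
def select_num_modes(signal, decompose, entropy, candidates=range(5, 13)):
    """Return the number of modes whose summed reconstruction has minimum PE.

    `decompose(signal, k)` must return k subsequences of len(signal) and
    `entropy(series)` a scalar; both are hypothetical placeholders. The default
    5-12 range mirrors the bounds used in this section.
    """
    scores = {}
    for k in candidates:
        modes = decompose(signal, k)
        reconstructed = [sum(values) for values in zip(*modes)]  # sum the subsequences
        scores[k] = entropy(reconstructed)
    best = min(scores, key=scores.get)  # ties resolve to the smallest candidate
    return best, scores
```

Returning the full `scores` dictionary makes it easy to tabulate PE against the number of modes, as in Table II, before settling on a common value.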

C. Error analysis and processing
The data sets used in this paper come from the Singapore grid and the U.S. grid. Seasonal factors have a significant influence on STLF, so the question is how to analyze the original data and eliminate the seasonal components. Fig. 2 shows the real load data of Singapore in September 2020. These data are sampled every 30 minutes, giving 48 samples per day. Fig. 2 clearly shows that the maximum load on September 5 is smaller than that on September 4. The daily load curves of September 5 and September 6 are similar, but the maximum load on September 6 is also much smaller than that on September 7.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2021.3130237, IEEE Transactions on Industrial Informatics
The mean loads of September 5 and September 6 are also significantly smaller than those of September 4 and September 7. In addition, the load curves from September 1 to September 4 are similar, and the remaining daily curves repeat with a period of 7 days. This pattern is consistent with high electricity consumption by companies and factories on weekdays and low consumption on holidays. To verify that these seasonal factors are a general phenomenon, all data sets from September 2020 to February 2021 are summarized in Table III. Table III lists the minimum, average and maximum loads on holidays and weekdays for the Singapore data sets from September 2020 to February 2021. In the September 2020 data set, the weekday maximum of 6822 is 766 larger than the holiday maximum of 6056, and the weekday average of 5946 is 487 larger than the holiday average of 5459, but the weekday minimum of 4615 is 43 smaller than the holiday minimum of 4658. Among all data sets, the difference between weekday and holiday characteristics is largest for September 2020. Therefore, the deviation between the weekday and holiday minimum loads is small, whereas the deviations between their maximum and average loads are large.
In this paper, the data sets are separated into weekday data $W$ and holiday data $H$ to reduce the effect of seasonal factors. For the weekday data $W$, the load values at the same moment of each day are averaged as shown in Eq. (8):

$$\bar{w}_t = \frac{1}{D} \sum_{d=1}^{D} w_{d,t}, \quad t = 1, 2, \dots, T \tag{8}$$

where $D$ is the number of weekdays, $w_{d,t}$ is the load value at the $t$-th moment of day $d$, and $T$ is the number of load values recorded per day. The moment means $\bar{w}_t$ are normalized by their overall average $\bar{w}$, as shown in Eq. (9):

$$\bar{w} = \frac{1}{T} \sum_{t=1}^{T} \bar{w}_t \tag{9}$$

The seasonality index at the $t$-th moment is then obtained as

$$S_t = \frac{\bar{w}_t}{\bar{w}} \tag{10}$$

and the data set without the effect of seasonal factors, $W'$, is obtained as shown in Eq. (11):

$$w'_{d,t} = \frac{w_{d,t}}{S_t} \tag{11}$$

The holiday data $H$ are processed in the same way. After seasonal factor elimination, the processed weekday and holiday data are combined into a new data set $X'$. In the experiments, performing seasonal factor elimination before VMD decomposition gave unsatisfactory results; therefore, seasonal factor elimination is applied after VMD decomposition.
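The sketch below shows one way to implement this procedure, assuming a multiplicative seasonal index: per-moment means normalized by their overall mean, with each value divided by the index of its moment. Weekday and holiday days are handled separately and then returned in their original order. The function names and the `is_holiday` flag list are hypothetical.

```python
def seasonal_index(days):
    """Multiplicative seasonal index per moment of the day (assumed form).

    `days` is a list of equal-length daily profiles (e.g. 48 half-hourly values).
    """
    n_days, per_day = len(days), len(days[0])
    moment_means = [sum(day[t] for day in days) / n_days for t in range(per_day)]  # Eq. (8)
    overall_mean = sum(moment_means) / per_day                                    # Eq. (9)
    return [m / overall_mean for m in moment_means]                               # seasonality index


def remove_seasonality(days, is_holiday):
    """Deseasonalize weekday and holiday days separately, then recombine in order."""
    adjusted = [None] * len(days)
    indices = {}
    for group in (False, True):  # False: weekdays, True: holidays
        idx = [d for d, flag in enumerate(is_holiday) if flag == group]
        if not idx:
            continue
        s = seasonal_index([days[d] for d in idx])
        indices[group] = s
        for d in idx:
            adjusted[d] = [v / s_t for v, s_t in zip(days[d], s)]  # Eq. (11)
    return adjusted, indices
```

Keeping the two index vectors lets the same factors be reapplied later, for example to put forecasts back on the original scale.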

D. Error analysis and correction processing
VMD decomposes the data using the fast Fourier transform, which removes part of the noise in the original data. In addition, after the seasonal factor processing described above, the resulting data deviate considerably from the original data. Fig. 3 shows the percentage error between the data processed by VMD decomposition and seasonal factor elimination and the original Singapore data from September 2020 to February 2021. The error is calculated as shown in Eq. (12):

$$e[i] = \frac{\hat{x}[i] - x[i]}{x[i]} \times 100\% \tag{12}$$

where $x[i]$ is the $i$-th point of the original data $x$ and $\hat{x}[i]$ is the corresponding point of the processed data. Fig. 3 shows that, among the monthly maximum positive errors, the largest is 1.9% (December 2020) and the smallest is 1.4% (January 2021); among the monthly maximum negative errors, the largest is 2.8% (September 2020) and the smallest is 1.4% (February 2021). Even though the minimum absolute error in February 2021 is only 0.35%, multiplying it by the mean load of 5907 gives 20.67, which significantly biases the final forecast. Forecasting from a data set with such errors can lead to poor final accuracy, so an error correction process is needed to improve prediction accuracy.
Because the error varies from moment to moment, the error at each moment is calculated against the corresponding moment of the original data for error correction:

$$e^*[t] = \frac{\hat{x}[t] - x[t]}{\hat{x}[t]} \tag{13}$$

where $e^*[t]$ denotes the error between the original data $x$ and the data $\hat{x}$ processed by VMD decomposition and seasonal factor elimination. The data are corrected according to the error at each moment, as shown in Eq. (14):

$$\hat{x}_c[t] = \hat{x}[t] - e^*[t]\,\hat{x}[t] \tag{14}$$

where $\hat{x}[t]$ denotes the $t$-th processed data point and $e^*(\cdot)$ is the deviation of the processed data from the original data divided by the processed data. When $e^*[t]$ is negative (positive), the subtraction increases (decreases) the processed data, reducing its deviation from the original data.
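A minimal sketch of the per-moment error and the subtraction-based correction of Eq. (14) follows. One assumption needs to be stated plainly: it averages the relative deviation over all historical days for each moment of the day, so the correction can also be applied to new (e.g. forecast) values; the paper does not specify how $e^*[t]$ is aggregated. The sign convention (processed minus original, divided by processed) follows the description above, and the function names are hypothetical.

```python
def moment_errors(original_days, processed_days):
    """Relative deviation e*[t] = (processed - original) / processed per moment t.

    Averaging over days for each moment of the day is an assumption of this sketch.
    """
    n_days, per_day = len(original_days), len(original_days[0])
    return [
        sum((p[t] - o[t]) / p[t] for o, p in zip(original_days, processed_days)) / n_days
        for t in range(per_day)
    ]


def correct(processed_day, errors):
    """Eq. (14)-style correction: subtract e*[t] * processed[t] from each value."""
    return [x - e * x for x, e in zip(processed_day, errors)]
```

If processing inflates every value by the same factor, for instance 2%, each $e^*[t]$ equals 0.02/1.02 and the correction maps the processed values back onto the original ones.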

E. LSTM (Long Short-Term Memory)
RNNs have achieved good results on short time series, but their hidden layer has only one state, and on long time series they suffer from exploding or vanishing gradients, so they cannot meet practical needs. LSTM adds a cell state to the RNN to achieve long-term memory and overcome these problems. It introduces multiplicative gates: each memory cell contains an input gate, a forget gate and an output gate, which cooperate with the current cell state and the sigmoid and tanh activation functions so that long sequences can be predicted with good results. The sigmoid function maps its input to a value between 0 and 1 and therefore acts as a filter that controls how much information passes, while the hyperbolic tangent function scales the input information to a value between -1 and 1. The calculation procedure for each LSTM cell is as follows:
1) Forget gate: the sigmoid activation function determines how much of the previous cell state $C_{t-1}$ is retained in the current cell state $C_t$:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$

where $[h_{t-1}, x_t]$ denotes the concatenation of the two vectors into a single vector, $W_f$ and $b_f$ denote the weight matrix and bias term of the forget gate, and $\sigma(\cdot)$ is the sigmoid function.
2) Input gate: the previous output $h_{t-1}$ and the current input $x_t$ are passed through a sigmoid layer and a tanh layer to obtain two vectors, the input gate $i_t$ and the candidate cell state $\tilde{C}_t$, whose element-wise product controls the information written to the cell:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$

$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$

where $W_i$ and $W_C$ denote the weight matrices of the input gate and the candidate cell state, respectively, $b_i$ and $b_C$ denote their bias terms, and $\tanh(\cdot)$ is the hyperbolic tangent function.
3) Cell state: long-term memory is achieved by retaining previous information while introducing current information. The previous cell state is multiplied element-wise by the forget gate, and the element-wise product of the input gate and the candidate cell state is added to update the state:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
4) Output gate: selects the information in the cell state that is passed on as output. The previous output $h_{t-1}$ and the current input $x_t$ are passed through the sigmoid activation function, and the result is multiplied element-wise by the cell state passed through the hyperbolic tangent function:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$

$$h_t = o_t \odot \tanh(C_t)$$

where $W_o$ and $b_o$ denote the weight matrix and bias term of the output gate.
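For reference, one step of the LSTM cell described above can be written in NumPy. This is an illustrative sketch, not the model used in the experiments (which were built with TensorFlow); the dictionary keys `"f"`, `"i"`, `"c"`, `"o"` for the gate parameters are our own naming.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W[g] has shape (hidden, hidden + inputs); b[g] has shape (hidden,)."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])         # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])         # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])     # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde         # element-wise cell state update
    o_t = sigmoid(W["o"] @ z + b["o"])         # output gate
    h_t = o_t * np.tanh(c_t)                   # hidden state (cell output)
    return h_t, c_t
```

With all weights and biases set to zero, every gate outputs 0.5 and the candidate is 0, so the cell state simply halves; this makes a quick sanity check. In practice a framework layer such as `tf.keras.layers.LSTM` implements these gate equations internally.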

IV. Performance Indicators
In this paper, the mean absolute error (MAE), root mean square error (RMSE), adjusted R-squared ($R^2$) and mean absolute percentage error (MAPE) are used to evaluate the prediction results of the model. MAE reflects the absolute error of each forecast relative to the real data and is currently the most widely used performance index. RMSE gives an intuitive measure of the predictive effect of the model. $R^2$ indicates how well the predictions fit the data. MAPE expresses the prediction error as a percentage of the data, which facilitates the analysis of its economic value. These metrics are defined as follows:
1) MAE:

$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| Pdata[i] - data[i] \right|$$

2) RMSE:

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( Pdata[i] - data[i] \right)^2}$$

3) $R^2$:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left( data[i] - Pdata[i] \right)^2}{\sum_{i=1}^{N} \left( data[i] - \overline{data} \right)^2}$$

4) MAPE:

$$MAPE = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{Pdata[i] - data[i]}{data[i]} \right|$$

In the above formulas, $N$ denotes the data length, $Pdata$ denotes the final forecasting results, $data$ denotes the original data, $Pdata[i]$ and $data[i]$ denote the values of the data sets at moment $i$, and $\overline{data}$ denotes the average value of the original data set.
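The four metrics can be computed directly from their definitions; the sketch below is ours, not the authors' evaluation code. The adjusted $R^2$ needs the number of predictors $p$, which the paper does not report, so `n_predictors` is an optional, assumed parameter; without it the function returns only the plain $R^2$ defined above.

```python
import math


def evaluate(pdata, data, n_predictors=None):
    """MAE, RMSE, MAPE (%) and R^2 of forecasts `pdata` against observations `data`."""
    n = len(data)
    errors = [p - d for p, d in zip(pdata, data)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e / d) for e, d in zip(errors, data)) / n
    mean_data = sum(data) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((d - mean_data) ** 2 for d in data)
    r2 = 1.0 - ss_res / ss_tot
    result = {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}
    if n_predictors is not None:
        # Adjusted R^2; the number of predictors p is an assumption (not given in the paper).
        result["R2_adj"] = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return result
```

For forecasts `[1, 2, 3]` against observations `[1, 2, 4]`, MAE is 1/3, MAPE is about 8.33% and $R^2$ is 11/14.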

V. Experiment and analysis
In this section, to demonstrate the practicality of the proposed method, recent monthly load data from two different regions, Singapore and the United States, are used for forecasting. Four models, ARIMA, CEEMD+LSTM, VMD+LSTM and GRU+RNN [22], are selected for comparison with the proposed model. Since the IMFs produced by VMD and CEEMD have different characteristics, the corresponding LSTM hyperparameters also differ. After extensive experiments, the input sequence length of each LSTM was set between 22 and 70, and the maximum computational complexity of the proposed model is O(70). The validity and practicality of the proposed model are verified in the following case studies. All training and analysis were performed on a computer with an Intel Core i7-10700 CPU, 16 GB of memory and an Nvidia GeForce RTX 3070 GPU. All predictions were implemented in Python 3.8 using the GPU version of TensorFlow 2.4.0.
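The LSTM inputs are windows of past values. A minimal sketch of building one-step-ahead training pairs from a single subsequence is shown below; the one-step target and the function name `make_windows` are assumptions, and `window` would take values in the 22 to 70 range reported above.

```python
def make_windows(series, window):
    """Turn one subsequence into (input window, next value) training pairs."""
    if not 1 <= window < len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    inputs = [series[i:i + window] for i in range(len(series) - window)]
    targets = [series[i + window] for i in range(len(series) - window)]
    return inputs, targets
```

For Keras, the resulting inputs would typically be reshaped to `(samples, window, 1)` before being fed to an LSTM layer, since such layers expect 3-D input of batch, time steps and features.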

A. Experiments on Singapore data
To verify the practicality of the proposed model, data for the six months from September 2020 to February 2021 in Singapore are chosen to evaluate its STLF effectiveness. These data are sampled every 30 minutes, giving 48 samples per day. Because each monthly data set is small, the 144 data points of the last 3 days of each month are used as the test set. Table IV lists the values of the four error indicators, RMSE, MAE, MAPE and $R^2$, for the proposed and comparison methods. CEEMD+LSTM clearly outperforms the ARIMA, VMD+LSTM and GRU+RNN models, but it is worse than the proposed model. Compared with CEEMD+LSTM, the proposed model reduces the RMSE, MAE and MAPE in December by 22.1%, 25.3% and 26.2%, respectively, which is the smallest improvement among all data sets. Fig. 4 shows the MAPE and RMSE of the proposed and comparison methods on the Singapore data sets. As Fig. 4 shows, the MAPE of the proposed model stays steadily around 0.15%, lower than that of the other forecasting models. Fig. 5 shows the MAE and $R^2$ of the proposed and comparison methods on the Singapore data sets. It can be seen from Fig. 5 that the purple line is above

B. Experiments on US data
To verify that the proposed model remains valid on unstable load data sets, we conducted case studies on real load data from the dataminer2 website of PJM in the US. These data sets are difficult to analyze, as becomes clear when each data set is divided into 4 weeks. The data sets in this subsection are nevertheless processed and predicted in the same way as the Singapore data. When the original data sets are decomposed by VMD, the PE with 8 sub-models is lower than that with 7, so the number of VMD sub-models in this experiment is 8. The test set consists of the data from the last 6 days, with a length of 144. Table V lists the performance of the different methods on the US data sets, where the results of the proposed model are shown in bold. The proposed method achieves the smallest RMSE, MAE and MAPE and the largest $R^2$ among all methods on all data sets; its MAPE reaches 0.3% on the 2020 PJM data set. The error on the 2021 data set is larger than on the 2020 data set because of the anomalous electricity load caused by the COVID-19 pandemic in the US. In Fig. 7, the first three subplots show MAE, RMSE and MAPE, where the error of the proposed model (red line) encloses the smallest area, while the fourth subplot shows $R^2$, where the proposed model encloses the largest area. The experimental results show that the proposed model has excellent practicality and efficiency.

VI. Conclusion
To ensure the safety and economy of power grid operation, this paper proposes a new short-term load forecasting model based on VMD, seasonal factor elimination, error correction and LSTM. VMD reduces the prediction difficulty by decomposing the original complex data into simple sequences, and PE determines the number of VMD modes. Because electric load data show significant seasonal effects, a seasonal factor elimination process based on the characteristics of the original data is applied to the electric load data. Some data features are lost during pre-processing, so error correction is used to recover them. To fully verify the performance of the model, the data sets used in this paper cover the most recent months from different regions with different data characteristics. The results indicate that the proposed model produces power grid load forecasts that meet real demand. Industry 5.0 is the next industrial evolution, in which technologies such as digital twins, edge computing and blockchain play a key role [23]. In future studies, deep-learning-based STLF methods can be combined with edge computing and blockchain to implement smart grid 2.0.