A survey on rainfall forecasting using artificial neural network

Rainfall has a great impact on agriculture and on people's daily travel, so accurate precipitation prediction is well worth studying. Traditional methods such as numerical weather prediction (NWP) models and statistical models cannot forecast rainfall satisfactorily because precipitation is nonlinear and dynamic. An artificial neural network (ANN), however, can capture complicated nonlinear relationships between variables, which makes it suitable for predicting precipitation. This paper introduces background knowledge on ANNs and several neural network algorithms that have been applied to precipitation prediction in recent years. The surveyed results indicate that neural networks can greatly improve the accuracy and efficiency of prediction.


Introduction
According to a report (Report, 2017), the annual average precipitation of Jiangsu province in 2016 was 1528.5 mm, the highest value in the past 65 years. Too much or too little rainfall may cause severe meteorological disasters, such as floods or droughts, which greatly affect people's daily lives. It is therefore very important to forecast precipitation accurately. Traditional rainfall prediction methods fall into two main groups: numerical weather prediction (NWP) models and statistical models. Both have great limitations when rainfall changes dynamically and shifts nonlinearly (Mohini P. et al., 2015). To solve this problem, artificial neural networks (ANNs) have been applied to precipitation forecasting.
ANNs have several strengths for predicting rainfall. First, an ANN is a data-driven model, so there is no need to impose restrictive assumptions when modelling. Second, because of its learning ability, an ANN can accumulate a large amount of experience and thereby predict patterns that did not exist before. Third, the neurons in an ANN work in parallel, so they can process big data efficiently. Last but not least, an ANN can extract complicated nonlinear relationships between variables.
Researchers began using ANN techniques for precipitation prediction several decades ago, and a lot of progress has been made in this field since. Young et al. (2014) used a physically based model, HEC-HMS, together with an artificial neural network to predict hourly rainfall-runoff. The results showed that the hybrid model performed better than a single ANN model. Abhishek et al. (2012) combined the Back-Propagation Algorithm (BPA) with a multilayer neural network to predict average rainfall over the Udupi district of Karnataka; BPA worked better than CBP and LRN. Chai et al. (2015) proposed an EMD-LSSVM (empirical mode decomposition least squares support vector machine) model to analyse the CSI 300 index and concluded that the EMD-LSSVM model with the GS (grid search) algorithm is a promising option. Abbot et al. (2015) developed an independent artificial neural network model that achieved more skilful medium-term rainfall forecasts for the Bowen Basin in Queensland. Ahmed et al. (2015) proposed a multilayer perceptron neural network to downscale rainfall in an arid region. Nanda et al. (2013) used a Multilayer Perceptron (MLP), a Functional-link Artificial Neural Network (FLANN) and a Legendre Polynomial Equation (LPE) to predict time series data; FLANN obtained a lower absolute average percentage error (AAPE) than the other two models. Shrivastava et al. (2013) found that the BPN model was able to learn the features of monsoon rainfall time series. Rajkumar et al. (2015) combined a Fuzzy Neural Network (FNN) with the Hierarchical Particle Swarm Optimization (HPSO) technique to reduce the size of the network and improve its training speed. Mislan et al. (2015) tested rainfall data using BPNN architectures with two hidden layers and three different numbers of epochs; the BPNN algorithm provided a good model for predicting rainfall in Tenggarong, East Kalimantan, Indonesia. Namitha et al. (2015) proposed implementing an artificial neural network on the MapReduce framework for short-term rainfall prediction. The results showed that implementing this solution on Hadoop makes it faster and more scalable. Dubey (2015) used three different training algorithms to create ANNs: the feed-forward back propagation algorithm, the layer recurrent algorithm and the feed-forward distributed time delay algorithm. The best accuracy was obtained with the feed-forward distributed time delay algorithm. Hybrid learning of an MLP with the CAPSO algorithm provided higher rainfall forecasting accuracy, lower error and higher classification accuracy (Beheshti et al., 2015). Many optimization algorithms have also been proposed recently to improve performance in this field. For example, a prediction model based on an optimized Kernel-based Extreme Learning Machine algorithm was proposed for faster forecasting of job execution duration and space occupation (Liu et al., 2016).
This paper is organized as follows. The concept of the artificial neural network and popular training algorithms are elaborated in Section 2. Section 3 focuses on several applications of ANNs to precipitation forecasting. The analysis and comparison of these applications are presented in Section 4.

Preliminary
This section describes the concept of the artificial neural network and introduces the training algorithms that are indispensable to it.

The Concept of Artificial Neural Network
The structure of a neural network is inspired by the human brain. Like the brain, a neural network contains many neurons that process the information they receive. A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use (Nayak et al., 2013). The basic unit of an ANN is the artificial neuron, which accepts inputs, processes them and produces the corresponding outputs. The basic working process of an ANN is shown in Fig. 1. After the inputs are pre-processed, they are sent to the artificial neural network. The outputs of the model are then compared with the targets, and the error of this comparison is fed back to adjust the ANN's parameters. After a large number of iterations, the error approaches a minimum and the artificial neural network is adjusted to a suitable configuration. An artificial neural network has a structure and functions similar to those of a biological neural network, represented by a mathematical model. Every artificial neural network is composed of artificial neurons, each of which is a simple mathematical function with three basic operations: multiplication, summation and activation. In the first step, each input value is multiplied by its individual weight. In the next step, the summation function adds all weighted inputs and the bias. In the last step, the sum is passed through an activation function, also called a transfer function.
The working principle of a single artificial neuron is very simple. Its potential emerges when neurons are connected into an artificial neural network, which can then play a powerful role through the learning ability of its neurons.
To exploit the full power of artificial neural networks, neurons are usually not connected at random. Researchers have proposed several predefined topologies that help solve problems more efficiently and simply, but different problems call for different solutions. After determining the type of problem, one chooses a suitable topology and then takes measures to adjust it; usually the objects of adjustment are the structure and its parameters. Just as a biological neural network learns behaviours and responses from the inputs of its environment, an artificial neural network learns continuously from a large number of inputs so as to reach its best condition after training.
Researchers have put forward a variety of learning rules and algorithms to meet the needs of different network models. An effective learning algorithm enables the neural network to construct a representation of the objective world by adjusting its connection weights, and to form an information processing method that captures characteristic information. Information storage and processing are both reflected in the connections of the network. Depending on the learning environment, neural network learning can be divided into supervised learning and unsupervised learning. In supervised learning, the training samples are applied to the network inputs, and the network outputs are compared with the corresponding expected outputs to obtain an error signal. This signal controls the adjustment of the connection weights, which converge to fixed values after repeated training. When the samples change, further learning can modify the weights to adapt to the new environment. Many kinds of neural networks use supervised learning, such as the backpropagation network and the perceptron. In unsupervised learning, no standard samples are given in advance; the network is placed directly in the environment, so the learning phase coincides with the working phase. In this case, the learning rule follows the evolution equation of the connection weights. The simplest example of unsupervised learning is the Hebb learning rule.
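The three operations of a single neuron (multiplication, summation, activation) can be sketched in a few lines of Python. This is only an illustration: the logistic activation, the function names and the sample values are assumptions, not taken from the surveyed papers.

```python
import math


def sigmoid(z):
    """Logistic activation (transfer) function, chosen here as an example."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Compute one artificial neuron: multiply, sum, activate."""
    # Step 1: multiplication, each input is scaled by its own weight.
    weighted = [x * w for x, w in zip(inputs, weights)]
    # Step 2: summation of all weighted inputs plus the bias.
    total = sum(weighted) + bias
    # Step 3: activation, the sum passes through the transfer function.
    return activation(total)


if __name__ == "__main__":
    # Hypothetical inputs, e.g. two scaled meteorological readings.
    print(neuron_output([0.5, -1.0], [0.8, 0.2], bias=0.1))
```

Passing a different `activation` (for example the identity function) changes only the last step, which is why the same neuron template can serve in hidden and output layers.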
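As a small illustration of unsupervised learning, the Hebb rule strengthens a weight whenever its input and the neuron's output are active together, without any target value. The sketch below assumes a linear neuron and the common form w_i <- w_i + eta * x_i * y; the learning rate and the data are hypothetical.

```python
def linear_output(weights, inputs):
    """Output of a linear neuron: the weighted sum of its inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def hebb_update(weights, inputs, output, learning_rate=0.1):
    """One Hebbian step: w_i <- w_i + eta * x_i * y (no target is needed)."""
    return [w + learning_rate * x * output for w, x in zip(weights, inputs)]


if __name__ == "__main__":
    weights = [0.1, 0.1]
    pattern = [1.0, 0.0]  # only the first input is active
    for _ in range(3):
        y = linear_output(weights, pattern)
        weights = hebb_update(weights, pattern, y)
    print(weights)  # the first weight grows, the second stays unchanged
```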

Back Propagation Algorithm (BPA)
Back propagation algorithm (BPA) is a traditional training algorithm of ANN.

The Concept of Back Propagation Algorithm (BPA)
The Back Propagation Algorithm is one of the most popular training algorithms for neural networks (Rumelhart et al., 1986). BPA consists of two passes: a forward pass and a backward pass. In the forward pass, input vectors are fed to the neurons in the input layer and propagated through the layers to produce the actual responses at the output. The backward pass adjusts the synaptic weights to bring the actual responses closer to the expected responses. Specifically, the actual response is subtracted from the desired response to generate an error signal, which is then propagated backwards through the network, against the direction of the synaptic connections.
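The two passes can be sketched for a very small network. The code below is a minimal illustration, assuming one sigmoid hidden layer, a linear output neuron, a squared-error loss and per-sample updates; the layer sizes, learning rate and toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Forward pass: input -> sigmoid hidden layer -> linear output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sum(w * h for w, h in zip(w_out, hidden))
    return hidden, output


def backward(x, target, w_hidden, w_out, lr=0.1):
    """Backward pass: propagate the error signal and adjust the weights in place."""
    hidden, output = forward(x, w_hidden, w_out)
    error = target - output  # desired response minus actual response
    for j, h in enumerate(hidden):
        # Error signal sent back through synapse j, scaled by the sigmoid slope.
        delta_hidden = error * w_out[j] * h * (1.0 - h)
        w_out[j] += lr * error * h
        for i, xi in enumerate(x):
            w_hidden[j][i] += lr * delta_hidden * xi
    return 0.5 * error ** 2


def train(samples, n_hidden=3, epochs=2000, lr=0.1, seed=0):
    """Repeat forward and backward passes; return weights and loss per epoch."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    losses = []
    for _ in range(epochs):
        losses.append(sum(backward(x, t, w_hidden, w_out, lr) for x, t in samples))
    return w_hidden, w_out, losses


if __name__ == "__main__":
    # Toy data: the second input is a constant 1.0 that acts as a bias.
    data = [([0.0, 1.0], 0.0), ([0.5, 1.0], 0.25), ([1.0, 1.0], 1.0)]
    _, _, history = train(data)
    print(history[0], history[-1])
```

The hidden-layer error signal is computed before the output weight is changed, so both updates use the weights of the same forward pass.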

Optimization Methods of BPA
Five popular optimization methods applied to BPA are briefly introduced below.

Gradient Descent
Gradient descent is the simplest training algorithm. It requires only the gradient vector, so it is a first-order method. Let

$$f(w_i) = f_i, \qquad \nabla f(w_i) = g_i. \tag{1}$$

Starting from an initial point $w_0$, the method moves from $w_i$ to the next point $w_{i+1}$ along the training direction $d_i = -g_i$, and iterates in this way until a termination condition is satisfied. The recurrence formula of gradient descent is

$$w_{i+1} = w_i - \eta_i g_i, \tag{2}$$

where $\eta_i$ denotes the learning rate. This parameter can be set to a fixed value, or it can be recomputed at each step by a one-dimensional search along the training direction.
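The recurrence can be sketched as follows, assuming a fixed learning rate and a gradient-norm termination condition. The quadratic test loss is a hypothetical example, not a neural network loss.

```python
def gradient_descent(grad, w0, learning_rate=0.1, tol=1e-8, max_iter=10_000):
    """Iterate w_{i+1} = w_i - eta * g_i until the gradient is nearly zero."""
    w = list(w0)
    for _ in range(max_iter):
        g = grad(w)
        if sum(gi * gi for gi in g) ** 0.5 < tol:  # termination condition
            break
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w


def example_grad(w):
    """Gradient of the hypothetical loss f(w) = (w0 - 3)^2 + 2 * (w1 + 1)^2."""
    return [2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)]


if __name__ == "__main__":
    print(gradient_descent(example_grad, [0.0, 0.0]))  # approaches [3, -1]
```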

Levenberg-Marquardt Algorithm
The Levenberg-Marquardt algorithm is also known as the damped least-squares method. It does not need to compute the exact Hessian matrix; it works with the gradient vector and the Jacobian matrix instead. Assume that the loss function $f$ is a sum of squared errors,

$$f = \sum_{k=1}^{m} e_k^2, \tag{3}$$

where $m$ is the number of training samples. The Jacobian matrix of the loss function consists of the partial derivatives of the error terms with respect to the parameters,

$$J_{k,j} = \frac{\partial e_k}{\partial w_j}, \qquad k = 1, \dots, m, \quad j = 1, \dots, n, \tag{4}$$

where $m$ and $n$ denote the number of samples in the training set and the number of parameters of the neural network, respectively. The size of the Jacobian matrix is $m \times n$. The gradient vector of the loss function is

$$\nabla f = 2 J^{T} e, \tag{5}$$

where $e$ is the vector of error terms. Finally, the Hessian matrix can be estimated as

$$H \approx 2 J^{T} J + \lambda I, \tag{6}$$

where $\lambda$ is a damping factor that keeps the estimated Hessian positive definite and $I$ is the identity matrix. The parameter update formula of the algorithm is

$$w_{i+1} = w_i - \left(2 J^{T} J + \lambda I\right)^{-1} \left(2 J^{T} e\right), \tag{7}$$

where $J$, $e$ and $\lambda$ are evaluated at iteration $i$. If $\lambda$ is equal to zero, the algorithm reduces to Newton's method with the approximate Hessian. If $\lambda$ is very large, the algorithm behaves like gradient descent with a small learning rate.
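A minimal sketch of the damped update is given below for a two-parameter curve fit, so that the linear system can be solved by hand. The model a * exp(b * x), the rule of dividing or multiplying the damping factor by 10, and all names are illustrative assumptions; a neural network would supply its own errors and Jacobian.

```python
import math


def make_data(a, b, xs):
    """Noise-free samples of the hypothetical model y = a * exp(b * x)."""
    return [a * math.exp(b * x) for x in xs]


def residuals_and_jacobian(params, xs, ys):
    """Errors e_k and Jacobian rows [de_k/da, de_k/db] for a * exp(b * x)."""
    a, b = params
    errors, jac = [], []
    for x, y in zip(xs, ys):
        ex = math.exp(b * x)
        errors.append(a * ex - y)
        jac.append((ex, a * x * ex))
    return errors, jac


def loss(params, xs, ys):
    """Sum of squared errors."""
    return sum(e * e for e in residuals_and_jacobian(params, xs, ys)[0])


def levenberg_marquardt(params, xs, ys, lam=1e-3, max_iter=200, tol=1e-18):
    a, b = params
    current = loss((a, b), xs, ys)
    for _ in range(max_iter):
        if current < tol:
            break
        e, jac = residuals_and_jacobian((a, b), xs, ys)
        # Gradient g = 2 J^T e and damped Hessian estimate H = 2 J^T J + lam * I.
        g0 = 2.0 * sum(j[0] * ek for j, ek in zip(jac, e))
        g1 = 2.0 * sum(j[1] * ek for j, ek in zip(jac, e))
        h00 = 2.0 * sum(j[0] * j[0] for j in jac) + lam
        h01 = 2.0 * sum(j[0] * j[1] for j in jac)
        h11 = 2.0 * sum(j[1] * j[1] for j in jac) + lam
        det = h00 * h11 - h01 * h01
        # Solve H * step = g for the 2x2 system (Cramer's rule).
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        candidate = (a - step0, b - step1)
        try:
            trial = loss(candidate, xs, ys)
        except OverflowError:
            trial = float("inf")
        if trial < current:
            a, b = candidate      # accept the step and trust the model more
            current = trial
            lam /= 10.0
        else:
            lam *= 10.0           # reject the step and damp more strongly
    return a, b


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    ys = make_data(2.0, 0.5, xs)
    print(levenberg_marquardt((1.0, 0.1), xs, ys))  # should approach (2.0, 0.5)
```

Because a step is accepted only when it lowers the loss, a large damping factor makes the update a short gradient-descent step, while a small one makes it close to the Newton-type step, as described above.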

Newton's method
Newton's method is a second-order algorithm because it uses the Hessian matrix. Its goal is to find a better training direction by using the second-order partial derivatives of the loss function.
The loss function $f$ can be approximated around $w_0$ by a second-order Taylor expansion,

$$f(w) \approx f_0 + g_0^{T}(w - w_0) + \tfrac{1}{2}(w - w_0)^{T} H_0 (w - w_0),$$

where $g_0$ and $H_0$ are the gradient and the Hessian matrix of $f$ evaluated at $w_0$.
At a minimum of $f(w)$ the gradient vanishes, which gives the second equation

$$g = g_0 + H_0 (w - w_0) = 0.$$

Therefore, with the parameters initialized to $w_0$, the iterative formula of Newton's method is

$$w_{i+1} = w_i - H_i^{-1} g_i,$$

where the vector $H_i^{-1} g_i$ is called the Newton step. It is worth noting that if the Hessian matrix is not positive definite, the parameters may move towards a maximum rather than a minimum, so the loss is not guaranteed to decrease at each iteration. To avoid this problem, the update is usually modified to

$$w_{i+1} = w_i - \eta_i H_i^{-1} g_i.$$

The learning rate $\eta_i$ can be set to a fixed value or adjusted dynamically. This method has been shown to train neural network models more efficiently than gradient descent. However, because the conjugate gradient method does not need to compute the Hessian matrix, it is recommended instead when the neural network model is large.
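The Newton iteration can be sketched on a small two-parameter problem where the gradient and Hessian are available in closed form. The loss function below is a hypothetical example with a positive definite Hessian, chosen so that the 2x2 system can be solved directly.

```python
import math


def gradient(w):
    """Gradient of the hypothetical loss f(w) = exp(w0) - w0 + (w1 - w0)^2."""
    u = w[1] - w[0]
    return [math.exp(w[0]) - 1.0 - 2.0 * u, 2.0 * u]


def hessian(w):
    """Hessian of the same loss; it is positive definite for every w."""
    return [[math.exp(w[0]) + 2.0, -2.0], [-2.0, 2.0]]


def newton(w0, eta=1.0, tol=1e-10, max_iter=50):
    """Iterate w_{i+1} = w_i - eta * H_i^{-1} g_i."""
    w = list(w0)
    for _ in range(max_iter):
        g = gradient(w)
        if math.hypot(g[0], g[1]) < tol:
            break
        h = hessian(w)
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        # Newton step H^{-1} g, written out for the 2x2 case.
        step = [(h[1][1] * g[0] - h[0][1] * g[1]) / det,
                (h[0][0] * g[1] - h[1][0] * g[0]) / det]
        w = [w[0] - eta * step[0], w[1] - eta * step[1]]
    return w


if __name__ == "__main__":
    print(newton([1.0, -1.0]))  # approaches the minimum at [0, 0]
```

For a network with n parameters the Hessian has n * n entries, which is why the explicit inverse used here does not scale to large models.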

Conjugate gradient
The conjugate gradient method can be regarded as intermediate between gradient descent and Newton's method. It aims to accelerate the convergence of gradient descent while avoiding the evaluation, storage and inversion of the Hessian matrix that Newton's method requires.
In the conjugate gradient training algorithm, the search is performed along conjugate directions, which usually converges faster than searching along the gradient descent directions. The training directions are conjugate with respect to the Hessian matrix. Let $g_i$ denote the gradient of the loss function at $w_i$. Starting from $d_0 = -g_0$, the conjugate gradient method constructs the sequence of training directions as

$$d_{i+1} = -g_{i+1} + \gamma_i d_i.$$

Here $\gamma_i$ is called the conjugate parameter, and there are several ways to calculate it. The two most common are due to Fletcher and Reeves and to Polak and Ribière. In all conjugate gradient algorithms, the training direction is periodically reset to the negative gradient.
The parameters are then updated by

$$w_{i+1} = w_i + \eta_i d_i,$$

where the learning rate $\eta_i$ is usually obtained by univariate function optimization along the training direction.
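The direction update and the line search can be sketched on a quadratic loss, for which the optimal learning rate along each direction has a closed form. This is an illustration under that assumption, using the Fletcher-Reeves conjugate parameter; the matrix and vectors are hypothetical.

```python
def mat_vec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(a, b, w0, tol=1e-12):
    """Minimise f(w) = 0.5 * w^T A w - b^T w for symmetric positive definite A."""
    w = list(w0)
    g = [ai - bi for ai, bi in zip(mat_vec(a, w), b)]  # gradient A w - b
    d = [-gi for gi in g]                              # d_0 = -g_0
    for _ in range(len(b)):                            # n steps suffice in exact arithmetic
        if dot(g, g) < tol:
            break
        ad = mat_vec(a, d)
        eta = -dot(g, d) / dot(d, ad)                  # exact line search along d
        w = [wi + eta * di for wi, di in zip(w, d)]
        g_new = [gi + eta * adi for gi, adi in zip(g, ad)]
        gamma = dot(g_new, g_new) / dot(g, g)          # Fletcher-Reeves parameter
        d = [-gn + gamma * di for gn, di in zip(g_new, d)]
        g = g_new
    return w


if __name__ == "__main__":
    # Hypothetical 2x2 problem; the exact minimiser is [1/11, 7/11].
    print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [2.0, 1.0]))
```

For a general (non-quadratic) network loss the closed-form `eta` is replaced by a numerical one-dimensional search, and the direction is reset to the negative gradient from time to time.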
The conjugate gradient method has been shown to be much more effective than gradient descent for training neural networks. Because it does not require the Hessian matrix, it also performs well on large-scale neural networks.

Quasi-Newton method
Newton's method needs to compute the Hessian matrix and its inverse, which requires considerable computational resources. A variant, called the quasi-Newton method, reduces this computational cost. It does not compute the Hessian matrix and its inverse directly; instead, at each iteration it estimates the inverse of the Hessian matrix using only the first-order partial derivatives of the loss function.
The Hessian matrix is composed of the second derivatives of the loss function. The main idea of the quasi-Newton method is to approximate the inverse of the Hessian matrix by another matrix $G$, which requires only the first derivatives of the loss function. The update equation of the quasi-Newton method can be written as

$$w_{i+1} = w_i - \eta_i G_i g_i,$$

where $g_i$ is the gradient of the loss function at $w_i$. The learning rate $\eta_i$ can be set to a fixed value or adjusted dynamically. There are several formulas for the inverse Hessian estimate $G$. Two commonly used ones are the Davidon-Fletcher-Powell (DFP) formula and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula.
In many cases this is the default algorithm of choice: it is faster than gradient descent and the conjugate gradient method, and it does not need to compute the Hessian matrix and its inverse exactly.
In recent years, effective techniques such as GA, PSO and wavelet analysis have been combined with artificial neural networks to predict rainfall. The next section explains these techniques in detail.
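For reference, the standard BFGS update of the inverse Hessian estimate can be written as follows. The symbols $s_i$, $y_i$ and $\rho_i$ are introduced here for illustration only, and the common starting choice $G_0 = I$ is an assumption rather than something stated in the surveyed papers.

```latex
s_i = w_{i+1} - w_i, \qquad
y_i = g_{i+1} - g_i, \qquad
\rho_i = \frac{1}{y_i^{T} s_i},

G_{i+1} = \left(I - \rho_i\, s_i y_i^{T}\right) G_i \left(I - \rho_i\, y_i s_i^{T}\right) + \rho_i\, s_i s_i^{T}.
```

Only parameter differences and gradient differences appear in the update, which is why no second derivatives have to be evaluated.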

RBF-NN with GAPSO algorithm
Wu et al. (2015) proposed a hybrid optimization method, HPSOGA, which combines particle swarm optimization (PSO) with a genetic algorithm (GA) to construct a radial basis function neural network (RBF-NN) automatically. The objective was to predict Liuzhou's monthly precipitation from 2005 to 2011.

Radial Basis Function Neural Network (RBF-NN)
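The output of a radial basis function neural network is a weighted sum of the radial basis functions in its hidden layer:

$$y_t(x) = \sum_{i=1}^{N} w_{it}\, \phi_i(x),$$

where $x$ denotes an input vector, $\phi_i$ is the radial function of hidden neuron $i$, and $w_{it}$ is the weight of the link from hidden neuron $i$ to output neuron $t$ in the output layer. $N$ represents the number of neurons in the hidden layer.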
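The weighted sum over hidden neurons can be sketched as below. The Gaussian form of the radial function, with one centre and one radius per hidden neuron, is an assumption made for illustration; the source does not preserve the exact basis function used by Wu et al.

```python
import math


def gaussian_rbf(x, centre, radius):
    """Gaussian radial function of the distance between x and the centre (assumed form)."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-dist_sq / (2.0 * radius ** 2))


def rbf_network(x, centres, radii, weights):
    """Network outputs; weights[i][t] links hidden neuron i to output neuron t."""
    hidden = [gaussian_rbf(x, c, r) for c, r in zip(centres, radii)]
    n_outputs = len(weights[0])
    return [sum(hidden[i] * weights[i][t] for i in range(len(hidden)))
            for t in range(n_outputs)]


if __name__ == "__main__":
    # Hypothetical network: 2 inputs, 2 hidden neurons, 1 output.
    centres = [[0.0, 0.0], [1.0, 1.0]]
    radii = [1.0, 0.5]
    weights = [[2.0], [-1.0]]
    print(rbf_network([0.2, 0.1], centres, radii, weights))
```

The centres, radii and weights passed in here are exactly the quantities that a design procedure such as the hybrid PSO/GA search has to determine.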

Hybrid of PSO and GA for RBF-NN Design
To determine the parameters of the radial basis functions, i.e. the centres, radii and weights, PSO and GA are used in the RBF-NN design.
GA is a computational model derived from Darwin's theory of biological evolution and natural selection. It searches for the optimal solution by simulating the process of natural evolution. PSO is inspired by the social behaviour of animals, such as bird flocking and fish schooling. In this algorithm, a group (called a swarm) of random candidate solutions is created, and each solution is regarded as a particle. Every particle flies through the multi-dimensional search space with a random initial velocity to find the global minimum. Unlike the genetic algorithm, PSO remembers the good solutions found by all particles. The hybrid algorithm is displayed in Fig. 3.
This will ensure that the mutated individual is still within the search range.
4. PSO method: After the 4N individuals are divided into two parts, the 2N best individuals are used as the particle velocities for the PSO operators, while the remaining individuals are used as the particle positions.
5. A new population is produced.
6. Termination condition: If the new population with updated fitness values does not meet the termination condition, go back to step 2; otherwise the final result is obtained.
7. Observation: The validation accuracy curve is monitored throughout training in order to prevent overtraining. Once it shows the best validation accuracy, the training process is ended.
8. Recall: According to the termination condition of the training procedure in the previous step, the number of hidden neurons and the parameters of the RBF neural network are recalled.
9. Retrain and build: The RBF neural network is retrained on a larger amount of testing data, on the basis of the number of hidden neurons and the parameters of the RBF-NN recalled in the previous step.
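The particle behaviour described above can be sketched with a plain global-best PSO. This is not the HPSOGA hybrid of Wu et al.; the inertia weight, acceleration coefficients, swarm size and the sphere test function are common illustrative choices, assumed here.

```python
import random


def pso(objective, dim, bounds, n_particles=30, n_iter=200,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.1 * rng.uniform(lo - hi, hi - lo) for _ in range(dim)]
           for _ in range(n_particles)]
    # Each particle remembers its own best position; the swarm remembers the global best.
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Keep the particle inside the search range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val


def sphere(p):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(x * x for x in p)


if __name__ == "__main__":
    print(pso(sphere, dim=2, bounds=(-5.0, 5.0)))
```

In an RBF-NN design task the objective would be a validation error computed from a candidate set of centres, radii and weights encoded in the particle position.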

Experimental Result
The RBF-HPSOGA was developed on a computer with the following specifications: an Intel Pentium E3.20 GHz CPU, 1.96 GB of RAM, the Windows XP operating system and the Matlab 9.0 development environment. The parameters of the method were as follows: population size = 40, number of iterations = 100, crossover probability = 0.8 and mutation probability = 0.2. The data were monthly rainfall records from 24 stations of the Liuzhou Meteorology Administration rain gauge network from 1949 to 2011.
The training set consisted of 480 samples from January 1949 to December 1988, the validation set of 180 samples from January 1989 to December 2004, and the testing set of 84 samples from January 2005 to December 2011.
Three models were compared in the experiment: a single RBF-NN, an RBF-NN with pure GA and an RBF-NN with HPSOGA. The results after testing 80 samples are shown in Table 1. Three evaluation indexes, the average absolute relative error (AARE), the root mean square error (RMSE) and the correlation coefficient (CC), are used to assess the performance of the monthly rainfall forecasts. RBF-HPSOGA performs best among the three models, which indicates that the HPSOGA method helps to build an efficient radial basis function neural network architecture.
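The three evaluation indexes can be computed as sketched below. The definitions are the usual textbook ones (AARE as the mean of |predicted - observed| / |observed|, CC as the Pearson correlation); the exact normalisation used in the original study is not given in this survey, so treat these formulas as assumptions. The sample values are made up.

```python
import math


def aare(observed, predicted):
    """Average absolute relative error; observed values must be non-zero."""
    return sum(abs((p - o) / o) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def correlation(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (std_o * std_p)


if __name__ == "__main__":
    obs = [120.0, 80.0, 200.0, 150.0]   # hypothetical monthly rainfall in mm
    pred = [110.0, 90.0, 190.0, 160.0]  # hypothetical forecasts in mm
    print(aare(obs, pred), rmse(obs, pred), correlation(obs, pred))
```

A lower AARE and RMSE and a CC closer to 1 indicate a better forecast, which is how the three models in Table 1 are ranked.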

Wavelet Neural Network
An attempt has been made to forecast rainfall effectively by combining the wavelet technique with an ANN (Ramana et al., 2013). In this hybrid model, the input signal is processed using wavelet analysis. To predict precipitation at the Darjeeling rain gauge station, 74 years of monthly rainfall and temperature data were used.

Wavelet Analysis
As its name implies, a wavelet is a small wave: "small" means that it decays, and "wave" refers to its oscillation. Wavelets are widely applied in signal processing and have attracted much attention in many fields. Compared with the Fourier transform, the wavelet transform is local in both space (time) and frequency, so it can extract information from a signal effectively. Through scaling and translation operations, a function or signal can be refined and analysed at many scales, which solves many difficult problems that cannot be solved by the Fourier transform. By decomposing a time series into time-frequency space, one is able to determine both the dominant modes of variability and how those modes vary in time. Wavelets have proven to be a powerful tool for the analysis and synthesis of data from long memory processes (Ramana et al., 2013).

Discrete Wavelet Transform (DWT)
The Discrete Wavelet Transform (DWT) is useful in numerical analysis and time-frequency analysis. The first discrete wavelet transform was invented by the Hungarian mathematician Alfréd Haar. The discrete wavelet transform maps a discrete input to a discrete output, but the relationship between input and output has no simple closed-form formula; it can only be expressed through a hierarchical architecture.
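The hierarchical structure of the DWT can be illustrated with the Haar wavelet, the simplest case. The sketch below assumes the orthonormal Haar filters and a signal whose length is divisible by 2 at every level; it is not the specific wavelet used by Ramana et al., which this survey does not state.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT; the signal length must be even."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one level of the Haar DWT."""
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / s, (a - d) / s])
    return signal


def haar_decompose(signal, levels):
    """Hierarchical decomposition: each level re-transforms the previous approximation."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return approx, details


if __name__ == "__main__":
    series = [4.0, 6.0, 10.0, 12.0]  # hypothetical monthly values
    print(haar_decompose(series, levels=2))
```

The approximation coefficients capture the slow trend and the detail coefficients the fast fluctuations, which is the kind of decomposed input a wavelet neural network is fed.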

Continuous Wavelet Transform (CWT)
The CWT is used instead of the windowed Fourier transform (WFT) to overcome the problem that the resolution of the WFT cannot vary with time and frequency. Once the window function is selected, the shape of the time-frequency window of the WFT is fixed; it cannot adapt to whether the signal component being analysed is high-frequency or low-frequency information. Because non-stationary signals are rich in frequency components, the ability of the WFT to analyse them is very limited. The wavelet transform is similar to the WFT in that the signal is multiplied by a wavelet and the transform is computed separately for different segments of the time-domain signal. There are, however, two differences between the WFT and the wavelet transform: the windowed signal is not Fourier transformed, and, most importantly, the width of the window changes as the transform is computed for each frequency component.

Method of Combining Wavelet Analysis with ANN
A multilayer perceptron (MLP) neural network architecture is adopted. Fig. 4 shows the structure of the wavelet-based MLP. Note that the Levenberg-Marquardt algorithm described in Section 2 is used as the training algorithm of this wavelet neural network.

Experimental Result
Darjeeling, located at longitude 88°15′47″ E and latitude 27°2′30″ N, is a small town in West Bengal, India, and the headquarters of Darjeeling district. It lies in the Himalayan foothills of the Sivalik Hills at an average elevation of 2,134 meters. Darjeeling is also known as the "Land of the Thunderbolt". The hills of Darjeeling are part of the Mahabharat Range or Lesser Himalaya. Kanchenjunga, the world's third-highest peak, 8,598 m (28,209 ft) high, is the most prominent mountain visible from Darjeeling.
To demonstrate the superiority of WNN, an autoregressive (AR) model and an ANN model were compared with the WNN model. Rainfall and temperature data of 74 years from the Darjeeling rain gauge station were used.

Rainfall Forecasting using FLANN
The procedure of rainfall estimation using FLANN is as follows.
Step 3: Initialize the weights $w_i$, $i = 1, 2, \dots, l$, where $i$ indexes the $l$ functional elements.
Step 4: A functional block is produced according to Eq. (18).
Step 5: The output is calculated according to Eq. (19).
Step 6: The output error is calculated as

$$e(k) = d(k) - y(k), \tag{20}$$

where $d(k)$ denotes the desired output and $y(k)$ is the predicted output of the system.
Step 7: The weights are updated according to Eq. (21), where $k$ is the time index and $\alpha$ is the momentum parameter.
Step 8: If the stopping condition is met, go to the next step. Otherwise, go back to Step 3.
Step 9: The training procedure is completed, and testing can then be carried out.
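The steps above can be sketched in Python as follows. Because Eqs. (18), (19) and (21) are not reproduced in this survey, the trigonometric functional expansion, the linear output and the LMS-style weight update used below are assumptions, as are the parameter values and the toy samples; here `alpha` simply scales each update, and the loop repeats the pass over the data rather than re-initialising the weights.

```python
import math


def expand(x):
    """Functional block: a bias term plus a trigonometric expansion of each input (assumed form)."""
    features = [1.0]
    for xi in x:
        features.extend([xi, math.sin(math.pi * xi), math.cos(math.pi * xi)])
    return features


def predict(weights, x):
    """Single-layer output: weighted sum of the functional elements (assumed linear)."""
    return sum(w * f for w, f in zip(weights, expand(x)))


def train_flann(samples, alpha=0.1, max_epochs=500, threshold=1e-6):
    """Train the weights sample by sample; return weights and mean squared error per epoch."""
    weights = [0.0] * len(expand(samples[0][0]))   # initialise the weights
    history = []
    for _ in range(max_epochs):
        squared_error = 0.0
        for x, desired in samples:
            features = expand(x)                   # produce the functional block
            output = sum(w * f for w, f in zip(weights, features))  # calculate the output
            error = desired - output               # e(k) = d(k) - y(k)
            # Assumed update rule: move each weight along error * feature.
            weights = [w + alpha * error * f for w, f in zip(weights, features)]
            squared_error += error ** 2
        history.append(squared_error / len(samples))
        if history[-1] < threshold:                # stopping condition
            break
    return weights, history


if __name__ == "__main__":
    # Hypothetical normalised samples: (input vector, desired output).
    data = [([0.0], 0.0), ([0.5], 0.65), ([1.0], 0.3)]
    w, mse = train_flann(data)
    print(mse[0], mse[-1], predict(w, [0.5]))
```

Since there is no hidden layer, each update touches only one weight vector, which is consistent with the fast computing speed attributed to FLANN.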

Experimental Result
To validate the proposed method, the researchers built a simulation environment in MATLAB. The data were obtained from the India Meteorological Department (IMD).
The rainfall forecasts of the three models were compared with real data, with the absolute average percentage error (AAPE) as the evaluation index. The experimental results, displayed in Table 3, show that FLANN achieved a lower AAPE than MLP and LPE. FLANN is therefore an excellent neural network for precipitation prediction.

Conclusion
Because of its distributed storage and nonlinear data processing abilities, the artificial neural network is an excellent option for predicting rainfall. Combining effective algorithms with ANNs can greatly increase the accuracy and efficiency of precipitation forecasting. This paper introduces the background knowledge and techniques of artificial neural networks for rainfall forecasting and elaborates three methods in detail. First, the GAPSO algorithm is used to construct the architecture of an RBF-NN by determining the parameters of the radial basis functions (centres and radii) and the weights. Compared with the pure RBF-NN and RBF-GA, the RBF-NN with GAPSO is shown to predict precipitation more accurately. Second, WNN is also an excellent forecasting method, which focuses mainly on input signal processing. Compared with the ANN and AR models, WNN provides better experimental results. Finally, a single-layer ANN, namely FLANN, is presented. With its faster computing speed, it shows better capability than MLP and LPE. These three methods improve the artificial neural network from different aspects, and all of them achieve satisfactory results when applied to precipitation forecasting.

Fig. 1. Basic working process of artificial neural network


Table 1. Performance of Three Models

Table 3. Absolute Error Analysis of MLP, FLANN & LPE over Actual Rainfall Data