
Atmosphere | Free Full-Text | Enhancing Solar Radiation Forecasting in Diverse Moroccan Climate Zones: A Comparative Study of Machine Learning Models with Sugeno Integral Aggregation



1. Introduction

Solar radiation (SR), often defined as the energy emitted by the sun and reaching the Earth’s surface, plays an important role in various natural Earth processes, as noted by [1,2]. This type of energy finds applications in fields such as hydrology, climate science, irrigation planning, and the development of crop growth models, as demonstrated by the works of [3,4,5,6]. Moreover, solar radiation is a sustainable energy source and a viable alternative to fossil fuels, as highlighted by [7,8,9,10].
Nonetheless, obtaining direct measurements of SR remains a challenge on a global scale, as acknowledged by [11,12,13]. To address this limitation, scientists have endeavored to predict radiation using various modeling approaches. These models can be grouped into three main types:
Empirical Relationships: These models are based on established empirical equations and relationships. Examples include the works of [14,15,16,17,18,19]. Artificial Intelligence (AI)-Based Models: These models utilize artificial neural networks (ANN) and machine learning (ML) techniques to estimate radiation. Examples include the research of [20,21,22,23,24]. Satellite-Based Models: These models depend on satellite data and remote sensing technology to derive radiation estimates. They include the works of [25,26,27,28,29]. These approaches offer different methods for estimating radiation, catering to different data availability and modeling preferences.
Each of the models mentioned above has its own strengths and weaknesses. Therefore, when selecting a model for a study related to radiation, several key factors come into play, as noted by [2]. These factors include the desired level of accuracy, the desired spatial distribution of information, and the availability of meteorological data. For instance, satellite-based models, like the one discussed by [26], supply continuous data in terms of both spatial coverage and temporal resolution, which can give researchers a highly accurate and spatially distributed view of radiation across a large geographical region at any given point in time. This can be particularly valuable in studies where comprehensive coverage and up-to-date information are essential. There is a large body of research on the ML approach to radiation forecasting. Notably, a considerable portion of these studies emerged after 2018. These recent investigations demonstrate a growing interest in several key areas:

Climate Change: Many of these studies have a pronounced focus on climate change. This reflects the broader recognition of the importance of accurate radiation forecasting in understanding and mitigating the effects of climate change.

Deep Learning (DL): Researchers are increasingly exploring the potential of DL techniques for SR forecasting. Their architectures have shown promise in capturing complex patterns in radiation data.

New Machine Learning Models: Beyond traditional ML algorithms, like support vector machine (SVM) and extreme learning machine (ELM), newer and more advanced ML models are gaining attention. These models may offer improved accuracy and performance in SR forecasting.

Renewable Energy Development: The connection between radiation forecasting and the development of renewable energy generation is a significant area of interest. Accurate predictions of radiation are crucial for optimizing the efficiency of solar energy systems.

Several researchers have delved into modeling and forecasting radiation using various mathematical equations and ML approaches. Kumar et al. (2015) conducted a study comparing the performance of regression models with ANN models for SR prediction [30]. This research likely aimed to assess the effectiveness of ML techniques in this context [31]. This team employed a wavelet transform approach together with various ML techniques, including ANN, ELM, and radial basis function (RBF) networks, as well as their hybrid variations. This suggests an investigation into the potential benefits of combining wavelet analysis with ML for SR modeling. Şahin (2013) compared ANN-based methods with statistical methodologies for estimating radiation from satellite images [32]. This research explores the benefits of using ANN in remote sensing applications. Polo et al. (2014) investigated the sensitivity of satellite-based approaches for calculating radiation to different aerosol input parameters and model choices [33]. This research likely aimed to understand how varying factors affect the accuracy of satellite-derived radiation estimates. These studies collectively represent the varied approaches and methods employed by researchers to enhance our understanding of radiation and its prediction using both traditional mathematical techniques and modern machine learning methods. In an investigation conducted by [34], various models for SR estimation and forecasting were explored. Their research revealed that, among the models they assessed, the one modified by Gueymard from the Collares-Pereira and Rabl model demonstrated the highest level of accuracy in forecasting average hourly radiation. This means that, for their study, this specific model adaptation proved to be the most precise for predicting radiation under these specific circumstances.
Belmahdi et al. (2023) provide five approaches for forecasting daily global solar radiation (GSR) in two Moroccan cities, Tetouan and Tangier [35]. Autoregressive integrated moving average (ARIMA), autoregressive moving average (ARMA), feed forward back propagation neural networks (FFBP), hybrid ARIMA-FFBP, and hybrid ARMA-FFBP were selected to forecast the daily global radiation with different combinations of meteorological parameters, and the hybrid models improved accuracy and reduced errors in forecasting. Hybrid k-means and nonlinear autoregressive neural network models provide better hourly global solar radiation forecasting results than either method alone [36]. Machine learning techniques slightly improve hourly solar forecasting performance compared to linear autoregressive and scaled persistence models, with more pronounced improvements in unstable sky conditions [37].
Based on the literature available in the Web of Science database, more than 1000 research papers in the field of SR prediction with ML approaches were identified. These studies were analyzed bibliometrically using VOSviewer software (version 1.6.20) [38,39]. Figure 1 shows the connections between keywords.
Figure 1 shows the intensive use of ML approaches in studies of SR prediction, solar energy, solar irradiation, and renewable energy, and indicates that, in recent years, DL and remote sensing in particular have been among the topics researched by the authors.
SR is extensively researched worldwide, especially in sun-rich areas such as the Mediterranean and the Middle East [40,41]. Unfortunately, many locations lack access to reliable observed radiation data because of the high costs of acquiring, installing, and maintaining measurement devices. Challenges associated with calibrating radiation detection equipment further contribute to this data gap [31]. Consequently, researchers commonly resort to a range of methods for estimating SR, including location-based, temperature-based, remote sensing-based, day and month-number-based, cloudiness-based, sunshine-based, and hybrid models [42,43,44]. However, the complex relationships between independent and dependent variables often limit the accuracy of these models, especially in humid regions where inclement weather significantly affects radiation [42].

The aim of this study is to investigate SR forecasting using long short-term memory (LSTM), support vector machine (SVM) regression, and multilayer artificial neural network (MLANN) approaches, and to propose a novel aggregated model, named SLSM, that integrates the forecasting outputs of LSTM, SVM, and MLANN with the Sugeno λ-measure and Sugeno integral. For SR estimation, 10 hydro-meteorological parameters and various reflectance values obtained by remote sensing techniques from 6 stations located in Morocco (Tantan, Fes, Agadir, Marrakech, Ouarzazate, and Tangier) were used. The main contributions of the present research are:

(1) Combining information from remote sensing parameters and hydro-meteorological data to improve hourly SR forecast accuracy using input data from hourly timesteps.

(2) Capturing a wider variety of environmental variables and incorporating spatial components into the study by using the reflectance data from remote sensing.

(3) Investigating long short-term memory (LSTM), support vector machine (SVM) regression, and multilayer artificial neural networks (MLANN) as ML approaches to perform a comprehensive comparison of SR prediction models and provide valuable insights into their relative strengths and weaknesses.

(4) Using various weather dataset profiles and comparing different ML techniques to assess the stability of the offered approaches.

(5) Evaluating the efficacy of the proposed methodologies under various geographical and meteorological variables to validate the generalizability and reliability of SR prediction.

(6) Conducting statistical analysis using the Kruskal–Wallis test to see whether the forecast and observed data points have the same underlying distributions.

(7) Improving the forecasting accuracy by applying a fuzzy measure that combines the accurate prediction information of the three models.

6. Results and Discussion

The performance of the three proposed models was evaluated with the statistical indicators listed in this section. The RMSE and MAE values, along with R², are used to evaluate the models designed on irradiance and meteorological data for different locations in Morocco. The results indicate that the LSTM model has good forecasting performance, with generally small errors between predicted and actual values, as shown in Table 3. For instance, the LSTM model achieves RMSE values ranging between 25.38 W/m² and 41.09 W/m² for the data of the six sites. However, the proposed SVM and MLANN yield higher forecasting errors, with RMSE values ranging from 57.04 W/m² to 70.10 W/m² and from 75.85 W/m² to 80.64 W/m², respectively, for the data of the same sites. Further, the mean values of the LSTM model for the six sites were calculated and compared to those of the other two forecasting models. The comparison shows the LSTM’s superiority in predicting hourly irradiance. Comparing the LSTM model to a conventional ANN (i.e., MLANN) and an ML technique (i.e., SVM) in solar irradiance forecasting showed the capability of the LSTM model to learn nonlinear patterns in highly variable irradiance data by capturing long-range dependencies in the temporal sequence. Further, as shown in Table 3 and the scatter plots in Figure 8, Figure 9 and Figure 10, which cover all training and testing phases, the proposed LSTM model outperformed the SVM and MLANN, exhibiting better generalization capability and less overfitting, and predicted irradiance accurately, with strong correlations between the predicted and actual data points.
The findings regarding the performance evaluation of the models are included in Table 3 for all stations. The models used in the study are an example of the advancement of artificial intelligence techniques. The first ANN models were replaced by machine learning methods such as SVM, followed by deep learning methods such as LSTM architectures. When the results are examined in terms of average values, the highest R² values belong to the LSTM model, followed by SVM and ANN. The RMSE and MAE values are consistent with this ranking. The scatter plots of the models are shown in Figure 8, Figure 9 and Figure 10.
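The three error criteria named above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the sample values are hypothetical, and R² is assumed here to be the coefficient of determination (the paper may instead use the squared Pearson correlation).

```python
import math

def rmse(actual, predicted):
    """Root mean square error, in the same unit as the data (e.g. W/m^2)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean absolute error, in the same unit as the data."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

def r2(actual, predicted):
    """Coefficient of determination (assumed definition of R^2)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical hourly irradiance values in W/m^2 (illustration only).
actual = [0.0, 100.0, 300.0, 500.0]
predicted = [10.0, 90.0, 310.0, 490.0]

print("RMSE:", rmse(actual, predicted))
print("MAE:", mae(actual, predicted))
print("R2:", r2(actual, predicted))
```

RMSE penalizes large deviations more strongly than MAE, which is why both are usually reported together for highly variable irradiance data.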
As a means of validation, advanced statistical analyses were applied to the proposed LSTM, namely the violin plot, the Taylor diagram, and the Kruskal–Wallis (KW) test for predictive accuracy. In the violin plot in Figure 11, the correspondence between the distributions of the predicted and actual data was examined. The comparison showed the effectiveness of the proposed LSTM in mimicking the peaks, valleys, and tails of the density curve of the actual data. The proposed LSTM model is also validated by the Taylor diagram in Figure 11, which represents the correlation between the predicted and actual data. Figure 11 graphically summarizes how closely the patterns of the data predicted by the proposed LSTM (the blue cross sign) match the actual data (the red circle sign), with correlation coefficients of 98% to 99% for the six sites.
Violin diagrams are essentially based on the formal description of statistical quantities. In the study, normalization was applied so that differences between shapes could be seen. From this perspective, the best fit to the observed data is obtained with the LSTM model at all stations, while the average and median values of the SVM model are larger than those of the observed data. The results were also examined using the Taylor diagram (Figure 12), according to the relationship between correlation and RMSD for the observations and the models.
In Figure 12, each model is positioned on the Taylor diagram at a point determined by its standard deviation, correlation, and RMSD values, and comparisons are made by taking into account the proximity of this point to the observed data. The graphs show that the models give very close results at the Fes, Ouarzazate, and Tangier stations, but LSTM is more successful than SVM and MLANN in terms of closeness to the observed data at the other stations.
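The quantities a Taylor diagram places on one plot can be computed directly. The sketch below is illustrative only and assumes the RMSD shown on the diagram is the centered RMSD (means removed), which is the usual Taylor-diagram convention. The function name and the sample values are hypothetical.

```python
import math

def taylor_statistics(observed, predicted):
    """Return (std_obs, std_pred, correlation, centered_rmsd).

    Population statistics (division by n) are used so that the
    law-of-cosines relation behind the Taylor diagram holds exactly:
    rmsd^2 = std_obs^2 + std_pred^2 - 2 * std_obs * std_pred * corr
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    dev_o = [o - mean_o for o in observed]
    dev_p = [p - mean_p for p in predicted]
    std_o = math.sqrt(sum(d * d for d in dev_o) / n)
    std_p = math.sqrt(sum(d * d for d in dev_p) / n)
    cov = sum(a * b for a, b in zip(dev_o, dev_p)) / n
    corr = cov / (std_o * std_p)
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_p)) / n)
    return std_o, std_p, corr, crmsd

# Hypothetical irradiance values in W/m^2 (illustration only).
observed = [0.0, 100.0, 300.0, 500.0]
predicted = [10.0, 90.0, 310.0, 490.0]
std_o, std_p, corr, crmsd = taylor_statistics(observed, predicted)
print(std_o, std_p, corr, crmsd)
```

Because the three statistics are tied together by this relation, a single point per model encodes all of them, and the distance from that point to the observation point is the centered RMSD.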
To further validate the model, the proposed LSTM was examined with statistical indicators to obtain a proper evaluation of its performance. The Kruskal–Wallis (KW) test, a nonparametric test, was employed to compare the distributions of the predicted and actual data. The working hypotheses were formulated as follows [84].

H0: the two distributions are different; H1: the two distributions are identical.

The statistic value is calculated as follows:

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{C} \frac{R_i^2}{n_i} - 3(N+1)$$

where $C$ is the number of samples, $n_i$ is the number of observations in the $i$th sample, $R_i$ is the sum of the ranks in the $i$th sample, and $N$ is the total number of observations.

As shown in Table 4, the KW test was performed at the 95% confidence level. For all six sites, the p-values indicate that H0 is rejected in favor of the alternative hypothesis, i.e., that the distributions of the predicted and actual data are identical, which demonstrates the generalization capability of the proposed LSTM model.
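The H statistic defined above can be computed by hand as a check. The following is a minimal sketch with hypothetical function names and toy data: it omits the tie correction that statistical packages usually apply, and the p-value helper covers only the two-sample case (one degree of freedom), using the chi-square approximation, which is rough for small samples. That p-value follows the conventional Kruskal–Wallis setup, in which the null hypothesis is that the samples come from the same distribution, and should be read with that convention in mind.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(*samples):
    """H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), no tie correction."""
    pooled = [x for sample in samples for x in sample]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for sample in samples:
        rank_sum = sum(ranks[start:start + len(sample)])  # R_i
        weighted += rank_sum ** 2 / len(sample)           # R_i^2 / n_i
        start += len(sample)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)

def p_value_two_samples(h):
    """Chi-square survival function with 1 degree of freedom (C = 2 only)."""
    return math.erfc(math.sqrt(max(h, 0.0) / 2.0))

# Toy data (illustration only): "actual" versus "predicted" values.
h = kruskal_wallis_h([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print("H:", h, "p:", p_value_two_samples(h))
```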
The proposed aggregation-based model (SLSM) was implemented with a MATLAB function built on the fuzzy Sugeno integral. The RMSE values obtained with the proposed SLSM are shown in Table 5 for the six different solar irradiance profiles. The RMSE values range between 16.09 W/m² and 22.67 W/m² for forecasting the highly fluctuating irradiance. Compared with the individual models (i.e., LSTM, SVM, MLANN) for all sites, the proposed SLSM performed best, achieving high accuracy with an average RMSE of 20.16 W/m².
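To make the aggregation step concrete, the sketch below shows a generic Sugeno λ-measure and Sugeno integral in Python. It is not the authors' MATLAB function. The fuzzy densities, the normalization of predictions to [0, 1], and all function names are assumptions made for illustration; how the study derives its densities is not described in this section.

```python
def solve_lambda(densities):
    """Solve prod(1 + lam * g_i) = 1 + lam for lam > -1, lam != 0 (bisection)."""
    if len(densities) < 2 or not all(0.0 < g < 1.0 for g in densities):
        raise ValueError("need at least two densities, each strictly in (0, 1)")
    total = sum(densities)
    if abs(total - 1.0) < 1e-6:
        return 0.0  # densities are already additive

    def f(lam):
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)

    if total < 1.0:          # root lies in (0, +inf)
        lo, hi = 1e-8, 1.0
        while f(hi) < 0.0:
            hi *= 2.0
    else:                    # root lies in (-1, 0)
        lo, hi = -1.0, -1e-8
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def sugeno_integral(values, densities):
    """Sugeno integral of values in [0, 1] with respect to the lambda-measure."""
    lam = solve_lambda(densities)
    # Visit sources from the largest to the smallest value.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    measure = 0.0
    best = 0.0
    for i in order:
        g = densities[i]
        # g(A ∪ {x_i}) = g_i + g(A) + lam * g_i * g(A)
        measure = g + measure + lam * g * measure
        best = max(best, min(values[i], min(measure, 1.0)))
    return best

# Hypothetical example: three model forecasts in W/m^2, scaled by an assumed
# 1000 W/m^2 ceiling, with assumed densities for LSTM, SVM, and MLANN.
scale = 1000.0
forecasts = [620.0, 550.0, 700.0]
densities = [0.4, 0.3, 0.2]
fused = scale * sugeno_integral([v / scale for v in forecasts], densities)
print("fused forecast (W/m^2):", fused)
```

The result always lies between the smallest and largest input, so the integral acts as a weighted compromise in which a model with a higher density pulls the fused value more strongly toward its own forecast.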
To visualize the performance of the four proposed prediction models, Figure 13 shows the comparison of the predicted SR obtained by the proposed models with the real measurements of SR for the six sites for January 2014. The results show that the SLSM performance is significantly stable and superior for the different SR profiles. Most of the night-time irradiance measurements were zero or ideally close to zero; however, errors may occur, which are associated with the noise or failure in the sensor readings. Dealing with errors caused by noise or sensor failures in night-time irradiance measurements requires careful analysis and appropriate techniques.
Additionally, the proposed model was further tested with data covering different months. The predicted hourly SR for the month of April 2016 is shown in Figure 14. The months of January and April convey data and climate information for different seasons and cities in Morocco. The results showed that the proposed SLSM is superior to the other models. It also displayed good generalization and stability when applied to different data. This indicates that, in addition to aggregating data, SLSM also captured the patterns within different data and achieved lower prediction errors when faced with data from a variety of seasons. The proposed SLSM thus offers an effective strategy for reliable forecasting by capturing the high variability and seasonality patterns in the irradiance dataset. The SLSM model’s superior performance is likely due to combining multiple forecasting models and aggregating the interaction between the predicted values of these individual models. This is also clearly shown by the lower prediction errors obtained when the model was validated with data across various seasons.

7. Conclusions

In this study, SR estimation was carried out using LSTM, SVM, and MLANN approaches. For SR estimation, 10 hydro-meteorological parameters and various reflectance values obtained by remote sensing techniques from six stations in Morocco (Tantan, Fes, Agadir, Marrakech, Ouarzazate, and Tangier) were used, and the main findings of the current research are as follows:

  • The results were evaluated using Taylor diagrams, violin plots, and the error criteria of RMSE, MAE, and R², and it was determined that the method that best predicted the observed values was LSTM (mean RMSE: 41.05, MAE: 21.99, R²: 0.98), followed by SVM and ANN. The advantage of the LSTM model is that it makes predictions with less error owing to its learn-and-forget structure and optimization techniques; however, it is also more complex than the other methods because its structure involves many hyperparameters.

  • The robustness of the model’s performance was also assessed using Kruskal–Wallis (KW) tests, which were used to confirm the stability of the suggested LSTM. The KW test confirmed at the 95% confidence level that the distributions of the predicted and actual data were the same.

  • The investigation found that prediction accuracy can be greatly increased by combining the model outputs with aggregation techniques. The hybrid model, named SLSM, integrates the prediction outputs of LSTM, SVM, and MLANN with the Sugeno λ-measure and the Sugeno integral. SLSM improved prediction accuracy by 11.7 W/m², reducing the irregularities associated with SR data.

  • Finally, these results proved that the LSTM model is applicable, valid, and an alternative for SR prediction in Morocco, which has tropical and subtropical desert climate zones.

The six main limitations of this study can be listed as follows: (i) use of data obtained for six stations to represent Morocco, (ii) use of daily data from 2013 to 2020, (iii) use of correlation analysis in input selection, (iv) use of three different machine learning methods, (v) use of visual comparison criteria (Violin, Taylor) as well as performance metrics, and (vi) that KW testing was used to compare the distributions of predicted and actual data.


