A novel digital-twin approach based on transformer for photovoltaic power prediction
Scientific Reports volume 14, Article number: 26661 (2024)
The prediction of photovoltaic (PV) system performance has been intensively studied, as it plays an important role in the context of sustainability and renewable energy generation. In this paper, a digital twin (DT) model based on a domain-matched transformer is proposed, combining a convolutional neural network (CNN) for domain-invariant feature extraction, a transformer for PV performance prediction, and a domain-adversarial neural network (DANN) for domain adaptation. The effectiveness of the proposed framework is validated on a PV power prediction dataset. The results indicate an accuracy improvement of up to 39.99% in model performance. Additionally, experiments with varying numbers of timestamps demonstrate that PV power prediction improves as parameters are continuously updated within the DT framework, offering a reliable solution for real-time and adaptive PV power forecasting.
Photovoltaics (PV) plays an important role in the context of sustainability and renewable energy generation. As PV continues to grow in importance, accurate prediction of PV power becomes increasingly valuable: it helps energy managers optimize PV system operations and policy makers set effective energy goals and policies.
Approaches to predicting PV power fall mainly into model-based and data-driven methods. Representative physical models are characterized by the irradiance on the PV cell; however, since irradiance depends on many variables such as the environment and cell conditions, predicting the energy production of a PV module is difficult1. With the development of artificial intelligence, many machine learning models have been studied and applied to PV power prediction. Models such as Markov chains2, support vector machines (SVM)3, and artificial neural networks (ANN)4,5,6 have achieved good results without prior knowledge of the system. Nevertheless, conventional machine learning techniques usually require careful manual feature engineering for training, otherwise they yield unsatisfactory results. Deep learning, in contrast, uses deep neural networks to automatically learn hierarchical representations from raw data and progressively extract higher-level features.
Among the deep learning architectures that have attracted much attention in time series prediction are recurrent neural networks (RNN) and long short-term memory (LSTM) networks. Several researchers have predicted PV power using RNN- and LSTM-based models7,8,9,10,11. Wang et al.12 presented an LSTM-RNN model based on the principles of time correlation modification (TCM), together with partial daily pattern prediction (PDPP) to improve the TCM of the LSTM-RNN outputs. Liu et al.13 constructed a short-term PV power prediction model based on an LSTM optimized with particle swarm optimization (PSO). Although these methods achieve excellent performance, both RNN and LSTM struggle to capture long-term dependencies in sequences because of vanishing or exploding gradients: as sequences grow longer, they have difficulty retaining information from early time steps.
Recently, the transformer model, based on a self-attention mechanism14, has been introduced in various natural language processing (NLP) tasks to effectively capture dependencies between items in the input sequence. By attending to relevant parts of the input sequence during both encoding and decoding, the transformer better captures long-range dependencies, which is particularly beneficial for understanding long time series. While the transformer has been widely used for language processing, Yu et al.15 combined temporal attention, frequency attention, and Fourier attention with the transformer to extract deeper information from the data. Kim et al.16 and Phan et al.17 partially modified the transformer model for PV power prediction, and both reported very good predictive performance. However, dot-product self-attention prevents the transformer from learning contextual information about the time series, which is very important. Li et al.18 therefore proposed convolutional self-attention, using causal convolutions to first capture contextual information and connect regions with similar shapes. This is well suited to PV systems, since similar overall patterns in PV power are very helpful for prediction. Accordingly, this paper proposes a double-attention architecture combining CNN and transformer: the time series are first fed into a one-dimensional convolution, which extracts information around each node, and the relationships between the nodes are then modeled by the multi-head attention mechanism of the transformer.
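As a concrete illustration, the following is a minimal PyTorch sketch of this double-attention idea: a one-dimensional convolution gathers local context around each time step, and multi-head self-attention then models the relationships between the resulting nodes. Layer sizes and names here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ConvSelfAttention(nn.Module):
    """1-D convolution for local context, then multi-head self-attention."""
    def __init__(self, n_features: int, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        # padding=1 with kernel_size=3 preserves the sequence length
        self.conv = nn.Conv1d(n_features, d_model, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_features); Conv1d expects channels first
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.attn(h, h, h)  # self-attention: query = key = value
        return out

# A batch of 32 windows, 96 time steps (24 h at 15-min sampling), 8 channels
out = ConvSelfAttention(n_features=8)(torch.randn(32, 96, 8))
```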
In terms of long-range dependence and contextual learning, these models are still static: they cannot continuously update and evolve in real time to reflect changes in PV systems. Unlike static models or representations, digital twins, which consist of a physical and a digital space, are designed to stay synchronized with the real plant or system. They are thus dynamic and responsive to change, allowing for more accurate predictions19,20. Moreover, the performance of deep learning models can deteriorate significantly when the target data follow a different distribution than the training data. This inherent limitation poses a threat to real-world applications, where shifts in data distribution are frequently encountered. For PV power prediction, PV systems at different geographical locations have differing data distributions, so models trained on a source domain are difficult to deploy on a target domain. To address these issues, transfer learning is used in the digital twin model so that knowledge gained from one dataset can be transferred to the target task by mitigating the domain gap21,22. For example, Zhang et al.23 proposed an attention-LSTM and transfer learning method for PV power prediction. Such knowledge transfer is especially beneficial when the two datasets are related and share underlying patterns. One way to achieve it is domain-adversarial training24, where a domain discriminator is trained to distinguish between source and target domain data. The domain-adversarial neural network (DANN) approach forces predictions to be based on features that cannot discriminate between the source domain in the digital space of the digital twin and the target domain in the physical space. This ensures that the two domains are properly aligned, allowing knowledge from the former to be effectively and safely utilized in the latter25,26.
Since the concept of the digital twin was first introduced by Michael Grieves in 2002, it has spread to various industries such as manufacturing27,28 and aerospace29. In30,31, attempts were made to build predictive models of wind turbine performance using digital twins. Photovoltaic power output is highly erratic and variable due to the influence of temperature, solar irradiation, and other random factors. Because physical entities are synchronized and updated in real time, digital twins yield forecasting results that are more accurate than those of conventional approaches22. Considering this, a digital twin-based model is proposed in this paper that integrates real-time data and performs prediction through transfer learning. The effectiveness of the proposed approach was verified on a PV system, and the experimental results outperformed other state-of-the-art (SOTA) models. In summary, the main contributions of this work are as follows.
The digital twin scheme is built with a data-driven approach, enabling synchronization of a dynamic model with the actual physical PV system and realizing models with real-time parameter updates for more accurate PV power predictions.
The transformer architecture with its self-attention mechanism makes it possible to effectively capture long-range dependencies in the input sequences. A convolutional self-attention module is added in front of the multi-head attention, enabling the model to acquire contextual information, generate features, and thus learn similar overall PV power patterns.
A DANN approach is employed to realize real-time updates through the interaction within the digital twin scheme, allowing the model to adapt to PV power prediction under different working conditions and meet real application needs.
The remainder of this study is organized as follows. “Related work” provides the theoretical background of the architecture. The proposed digital twin scheme is described in “Formulation of the proposed digital twin (DT) scheme”. The experimental evaluation and analysis of the results are presented in “Experiment”. Finally, “Conclusions” concludes this work.
DTs are digital models generated via mechanistic or data-driven approaches, representing actual objects and systems in the real world. The National Aeronautics and Space Administration (NASA)’s Apollo mission first embraced the concept of the DT to advance space exploration and research. The implementation of DTs at NASA involves creating digital replicas of spacecraft, satellites, and even entire missions using sophisticated computational models. These digital twins enable NASA to monitor, analyze, and simulate the behavior of its space assets in real time, providing invaluable insights for mission planning, operation optimization, and problem-solving.
With recent advances in artificial intelligence and IoT technology, DTs are integrated into many predictive applications and employed in various fields. Xiong et al. proposed a DT-driven approach to predictive maintenance for aircraft, building an implicit DT (IDT) to evaluate and monitor the degradation process of the aero-engine32. Qiao et al. presented a five-dimensional DT model together with a hybrid model prediction method for machining tool condition prediction33. He et al. reviewed digital twin-driven remaining useful life (RUL) prediction methods for gear performance degradation, from the viewpoints of physical model-based and virtual model-based prediction34.
Considering the complexity of PV power systems and the many factors that can influence output power in real-life applications, physical model-based methods face significant limitations. This paper therefore proposes a data-driven DT scheme for PV power prediction.
Since the transformer has no built-in sense of word order or position, positional embeddings are introduced to provide the model with information about the positions of words in a sequence. Once the order information is acquired, the inputs are fed into the encoder and decoder layers, which are the fundamental building blocks that allow the model to process sequential data. The core principle within the encoder and decoder layers is the self-attention mechanism, which weighs the importance of each input token against the others in the same sequence. It calculates the attention values for each token using (1), allowing the model to focus on the more important information. This lets the model understand the dependencies and long-range relationships between different parts of the input.
$$\text{Attention}(Q,K,V)=\text{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V, \quad Q=X_fW_Q,\; K=X_fW_K,\; V=X_fW_V \qquad (1)$$

where \(X_f\) stands for the feature matrix, and \(W_Q\), \(W_K\), \(W_V\) stand for the three weight matrices of the query matrix Q, the key matrix K, and the value matrix V, respectively. T is the size of the input feature sequence, and \(d_k\) is the dimension of K.
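In code, (1) amounts to three linear projections followed by a scaled softmax. The following is a small PyTorch sketch; the dimensions are illustrative, not the paper's settings.

```python
import torch

def attention(Xf, WQ, WK, WV):
    """Eq. (1): softmax(Q K^T / sqrt(d_k)) V with Q = Xf WQ, K = Xf WK, V = Xf WV."""
    Q, K, V = Xf @ WQ, Xf @ WK, Xf @ WV
    d_k = K.shape[-1]
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # (T, T) token-to-token weights
    return torch.softmax(scores, dim=-1) @ V

# T = 96 tokens with feature dimension 16, projected to d_k = d_v = 8
Xf = torch.randn(96, 16)
WQ, WK, WV = torch.randn(16, 8), torch.randn(16, 8), torch.randn(16, 8)
out = attention(Xf, WQ, WK, WV)  # shape (96, 8)
```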
The feedforward network that follows in each encoder and decoder layer consists of fully connected layers and nonlinear activation functions, which let the model learn complex, nonlinear transformations of the input features. Multiple encoder and decoder layers together form the transformer model.
In this work, a transformer-based model is used to predict PV power. While positional embeddings are usually added before the encoders and decoders to encode the positions of tokens within the input sequence, they cannot capture the contextual information that is very useful for learning general PV power patterns. Therefore, the proposed transformer-based model replaces the positional embedding with a CNN architecture for feature generation and for learning the inherent sequential order of PV power.
This study introduces a digital twin model that facilitates domain adaptation from PV systems in the digital space to their physical counterparts. By reducing the domain gap, the model, initially trained on digital PV systems, performs effectively when applied to the physical PV system despite the differences between the two. Moreover, through ongoing real-time interaction between the physical and digital spaces, domain adaptation continually advances, ultimately leading to complete adaptation to the physical PV system and precise predictions of PV power.
Digital twin of a PV system.
The digital twin of the PV system in this study, shown in Fig. 1, consists of a physical entity, a solar panel system, and a digital counterpart that contains a large amount of PV power data from other PV systems.
The digital twin first collects data from the physical PV system, in which various sensors and devices are installed. These sensors collect parameters such as solar radiation, temperature, wind speed, humidity, and air pressure. The collected data and historical PV power data are then integrated and pre-processed to act as a digital representation of the physical PV system. A detailed description of the integration and data preprocessing can be found in Sect. 3.2. The integrated data of the physical PV system and the historical data of other PV systems in the digital space are then fed into the model to perform PV power prediction through a transformer-based DANN approach. The data pass sequentially through the feature generator, transformer, and discriminator to achieve domain matching and predict the PV power of the physical space.
Through the continuous interaction between the physical and digital spaces, real-time data are simultaneously mapped to the digital space to update the model parameters, and the predicted PV power from the latest updated model is fed back to the physical space. Through this continuous iterative process, the PV power prediction model is gradually optimized until it is fully domain-adapted.
Let X = {X1, X2, …, Xn}, where Xi is a d-dimensional row vector and n is the number of time steps of the PV system in the physical space. Let Y = {y1, y2, …, yn}, where yi is a scalar representing the PV power at the ith time step. Similarly, X′ = {X′1, X′2, …, X′n} is the \(n \times d\) input data and Y′ = {y′1, y′2, …, y′n} is the labelled PV power in the digital space. The goal is to find f(X) = Y by adapting from f′(X′) = Y′.
Figure 2 shows the flowchart of the data preprocessing. To properly analyze the raw PV data in both the physical and digital domains, the data must be normalized and processed through sliding windows.
Flowchart of data preprocessing.
The measurement ranges of the different sensors differ, so the data are converted to a standard scale to make them more suitable for comparison and processing. In this work, min-max scaling is used:

$$\tilde{x}_i = \frac{x_i - x_i^{min}}{x_i^{max} - x_i^{min}} \qquad (2)$$

where \(x_i\) is the ith sensor reading, and \(x_i^{max}\) and \(x_i^{min}\) are the maximum and minimum values of the ith sensor readings.
To analyze the relationship between adjacent data points, sliding window processing is used to integrate data points at different time steps within a time frame. The window size w equals the input-length hyperparameter. Figure 3 illustrates that the dimension of the processed input is \(w \times (n+1)\), where n is the number of sensors.
Sliding window processing.
While each of the n sensors corresponds to one of the parameters, the additional parameter is the historical PV power prior to the prediction time step, shown in the yellow box. If the partial raw data with time steps ranging from \(m-(1+w)\) to \(m+w\) are mathematically expressed as the matrix

$$\begin{bmatrix} x^{1}_{m-(1+w)} & \cdots & x^{n}_{m-(1+w)} & y_{m-(1+w)} \\ \vdots & \ddots & \vdots & \vdots \\ x^{1}_{m+w} & \cdots & x^{n}_{m+w} & y_{m+w} \end{bmatrix} \qquad (3)$$

where x denotes the parameters, y the PV power, and m the mth timestamp, then to predict the PV power from time step m to m + w, viz.

$$\left[\, y_{m},\; y_{m+1},\; \ldots,\; y_{m+w} \,\right] \qquad (4)$$

the corresponding processed input data should be

$$\begin{bmatrix} x^{1}_{m} & \cdots & x^{n}_{m} & y_{m-(1+w)} \\ \vdots & \ddots & \vdots & \vdots \\ x^{1}_{m+w} & \cdots & x^{n}_{m+w} & y_{m-1} \end{bmatrix} \qquad (5)$$
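A minimal NumPy sketch of this preprocessing follows, assuming the last column of the raw array holds the PV power and using one plausible alignment of the windows; the exact index convention in the matrices above is followed only approximately.

```python
import numpy as np

def make_windows(raw, w):
    """raw: (T, n+1) array, n sensor columns plus PV power in the last column.
    Returns inputs of shape (num_windows, w, n+1) and targets (num_windows, w)."""
    # Min-max scale every column to [0, 1] (Eq. 2)
    lo, hi = raw.min(axis=0), raw.max(axis=0)
    data = (raw - lo) / (hi - lo + 1e-12)

    X, Y = [], []
    for m in range(w, len(data) - w + 1):
        sensors = data[m:m + w, :-1]       # predicted parameters over the window
        hist_pv = data[m - w:m, -1:]       # PV power of the preceding w steps
        X.append(np.hstack([sensors, hist_pv]))  # (w, n+1), cf. Fig. 3
        Y.append(data[m:m + w, -1])        # PV power to be predicted
    return np.asarray(X), np.asarray(Y)

# 8 sensors + power sampled every 15 min; w = 96 gives 24-hour windows
X, Y = make_windows(np.random.rand(1000, 9), w=96)
```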
After the data are preprocessed, the proposed domain-adapted transformer model is trained on the training sets of the digital and physical domains and tested on the test set of the target domain. The tokens are configured using the sliding window approach, which combines data points from various time steps within a specific time frame. Specifically, real-time sensor data and historical PV power data are merged into a single feature set for each window, and the PV power values of future time steps are assigned as labels.
DANN (domain-adversarial neural network) is a machine learning technique developed to solve the problem of domain adaptation, in which a model trained on data from one domain is applied to another domain with a potentially different data distribution. The DANN structure consists of several key components that work together to facilitate domain adaptation: a feature generator, a PV power predictor, a domain classifier, and a gradient reversal layer (GRL).
Figure 4 shows how these four components (feature generator, PV power predictor, domain classifier, GRL) build a DANN model and cooperate in a joint optimization process to achieve domain adaptation from the digital domain to the physical domain of the digital twin.
Overall framework of the model.
The feature generator \(G_f(x;\theta_f)\) is a neural network responsible for converting the input data into a set of features, where \(\theta_f\) represents the parameters in \(G_f\). These features are intended to capture information relevant to both the prediction task and the domain adaptation task.
The common feature generator for the physical and digital domain data is the CNN architecture shown in Fig. 5. After preprocessing, the input data have the shape B × H × L, where B is the batch size, H is the number of feature parameters, and L is the input length. The data are fed into a convolutional layer, which outputs data of shape B × 2E × L, where E is the number of features fed into the encoders and decoders of the transformer model. To incorporate the spatial information of the feature maps, the convolutional layer is followed by a max-pooling layer and an average-pooling layer, which produce two descriptors, the average-pooling features and the max-pooling features35. A sigmoid function then introduces non-linearity into the model. However, since the sigmoid function can cause vanishing gradients and thus slow convergence during training, batch normalization is used to avoid this problem and improve the stability of the network. To integrate the information from the two branches, the output feature maps are combined by summation with softmax-normalized weights. A skip connection and a dropout layer are also introduced to further counter the vanishing gradient problem.
Flow of CNN architecture.
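A sketch of such a feature generator in PyTorch is given below. It follows the flow just described (convolution, dual pooling descriptors, sigmoid with batch normalization, softmax-weighted fusion, skip connection, dropout); the channel sizes, kernel sizes, and exact fusion details are assumptions rather than the paper's reported configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureGenerator(nn.Module):
    """CNN feature generator sketch: conv -> avg/max pooling branches ->
    sigmoid + batch norm -> softmax-weighted sum -> skip connection -> dropout."""
    def __init__(self, n_params: int, d_model: int = 64, p_drop: float = 0.1):
        super().__init__()
        self.conv = nn.Conv1d(n_params, 2 * d_model, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm1d(2 * d_model)
        self.skip = nn.Conv1d(n_params, 2 * d_model, kernel_size=1)  # match channels
        self.drop = nn.Dropout(p_drop)

    def forward(self, x):                                    # x: (B, H, L)
        h = self.conv(x)                                     # (B, 2E, L)
        # two pooled descriptors; stride 1 and padding keep the length L
        avg = torch.sigmoid(self.bn(F.avg_pool1d(h, 3, stride=1, padding=1)))
        mx = torch.sigmoid(self.bn(F.max_pool1d(h, 3, stride=1, padding=1)))
        w = torch.softmax(torch.stack([avg, mx]), dim=0)     # normalized weights
        fused = (w * torch.stack([avg, mx])).sum(dim=0)      # weighted summation
        return self.drop(fused + self.skip(x))               # residual + dropout

feats = FeatureGenerator(n_params=9)(torch.randn(32, 9, 96))  # (B, 2E, L)
```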
The domain classifier \(G_d(x;\theta_d)\) is a neural network with a binary output that aims to distinguish between the physical and digital domains based on the features produced by the feature generator, where \(\theta_d\) represents the parameters in \(G_d\). It is tasked with learning to recognize domain-specific patterns in the data. The GRL is a key component of DANN. It is applied after feature generation and before the domain classifier. During backpropagation it multiplies the gradients by a negative constant \(-\lambda\) before they are passed to the feature generator36. By reversing the gradients, the DANN model learns to maximize the discrepancy between the domain prediction and the actual domains of the data, while minimizing the discrepancy between the representations learned by the feature extractor for the digital and physical domains. Note that the GRL acts as an identity transformation during forward propagation.
Mathematically, the forward and backward propagation behavior of the GRL can be described by (6) and (7):

$$R_{\lambda}(x) = x \qquad (6)$$

$$\frac{dR_{\lambda}}{dx} = -\lambda I \qquad (7)$$

where \(R_{\lambda}(x)\) is the pseudo-function of the GRL and I is the identity matrix.
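In PyTorch, the GRL of (6) and (7) can be written as a custom autograd function; this is the standard implementation pattern from the DANN literature, shown here as a sketch.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass (Eq. 6); scales gradients by -lambda (Eq. 7)."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse and scale the gradient flowing back to the feature generator
        return -ctx.lam * grad_output, None  # None: no gradient for lambda

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)
```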
Thus, the loss function of DANN can be expressed as (8):

$$E(\theta_f,\theta_p,\theta_d) = \sum_{i} L_p^{(i)}(\theta_f,\theta_p) - \lambda \sum_{i} L_d^{(i)}(\theta_f,\theta_d) \qquad (8)$$
The optimal parameters of DANN can then be calculated with (9) and (10):

$$(\hat{\theta}_f, \hat{\theta}_p) = \arg\min_{\theta_f,\theta_p} E(\theta_f,\theta_p,\hat{\theta}_d) \qquad (9)$$

$$\hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f,\hat{\theta}_p,\theta_d) \qquad (10)$$
Note that \(\lambda\) should not be constant during training31. It varies with the training progress and is calculated as (11):

$$\lambda = \frac{2}{1+\exp(-10p)} - 1 \qquad (11)$$
where p is the ratio of the current number of iterations to the total number of iterations.
After the CNN captures the relevant information from the data, the features are passed to the discriminator to minimize domain-specific information. The domain classifier takes the learned representations from the CNN architecture and attempts to predict the domain of the input data (physical or digital space). By discriminating between domains, it pushes the feature generator to learn domain-invariant features that are indistinguishable between the physical and digital spaces.
The encoder and decoder layers of the transformer model serve as the PV power predictor \(G_p(x;\theta_p)\), where \(\theta_p\) represents the parameters in \(G_p\). Besides being fed into the discriminator, the CNN-generated features are also provided to the transformer for PV power prediction.
The loss between the domain classification result and the true domain label is referred to as \(L_d\), and the loss between the PV prediction and the ground truth as \(L_p\). \(L_p\) is the MSE loss and \(L_d\) is the cross-entropy loss:

$$L_p = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$

$$L_d = -\frac{1}{N}\sum_{i=1}^{N}\left[d_i \log \hat{d}_i + (1-d_i)\log\left(1-\hat{d}_i\right)\right]$$

where \(\hat{y}_i\) is the predicted PV power, \(d_i\) the true domain label, and \(\hat{d}_i\) the predicted domain probability.
During training, the total loss of the DANN model combines the PV power prediction losses from the digital domain \(L_{pd}\) and the physical domain \(L_{pp}\) with the loss of the domain classifier:

$$L = L_{pd} + L_{pp} - \lambda L_d$$
This joint optimization process causes the feature generator to learn domain-invariant features that are beneficial to the primary task while being independent of the specific data distribution of the physical and digital domains. The domain adaptation procedure is summarized in Algorithm 1.
Algorithm 1. Transformer-DANN algorithm.
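Algorithm 1 appears as an image in the published article; the sketch below reproduces the joint optimization it describes, under stated assumptions: feature_gen, predictor, and domain_clf are PyTorch modules (for instance the FeatureGenerator and grad_reverse sketches above, plus a transformer predictor and a small classifier), and digital_loader, physical_loader, and total_iters are assumed to be defined.

```python
import math
import torch
import torch.nn as nn

mse, bce = nn.MSELoss(), nn.BCEWithLogitsLoss()
opt = torch.optim.Adam([*feature_gen.parameters(),
                        *predictor.parameters(),
                        *domain_clf.parameters()])

for it, ((xd, yd), (xp, yp)) in enumerate(zip(digital_loader, physical_loader)):
    p = it / total_iters                               # training progress ratio
    lam = 2.0 / (1.0 + math.exp(-10.0 * p)) - 1.0      # Eq. (11)

    fd, fp = feature_gen(xd), feature_gen(xp)          # shared feature generator
    loss_pred = mse(predictor(fd), yd) + mse(predictor(fp), yp)  # L_pd + L_pp

    feats = torch.cat([fd.flatten(1), fp.flatten(1)])
    labels = torch.cat([torch.zeros(len(xd), 1), torch.ones(len(xp), 1)])
    loss_dom = bce(domain_clf(grad_reverse(feats, lam)), labels)  # L_d

    opt.zero_grad()
    (loss_pred + loss_dom).backward()  # GRL flips the domain gradient sign
    opt.step()
```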
Note that as the physical PV power system continuously interacts with the digital space by feeding the latest collected data into the model, the parameters \(\theta_f\), \(\theta_p\), \(\theta_d\) in the CNN, transformer, and domain classifier are updated automatically until the model is fully adapted.
This study uses authoritative open data from a PV power forecasting competition sponsored by State Power Rixin Tech. Co. Ltd. The data include environmental variables (temperature, wind speed, humidity, and barometric pressure), actual irradiance, and the power generated by four PV power plants after desensitization, covering 2016 to 2018. The environmental data are predicted rather than measured values, while the actual irradiance and the actual power generated by the PV power plants are measured values after desensitization. Note that the units of the regressors and response are not provided. For the purposes of the following analysis, the four stations are referred to as Station A, Station B, Station C, and Station D. The experiment involves two trials: trial 1 feeds only a few training samples into the digital space for training. Through the continuous interaction between the physical and digital spaces, the test set of trial 1 is transferred into the training set of trial 2, and trial 2 is tested on PV power prediction with the most recently measured signals. This increasing training sample size in the digital space demonstrates the improved domain adaptation achieved as the physical PV power system continuously updates its collected data.
Each trial designates one of the four stations as the physical PV power system, with the remaining three stations used to construct models in the digital space. For example, taking Station A as the physical PV system with Stations B, C, and D in the digital space, the numbers of timestamps in the training, validation, and test sets are described in Table 1. Samples are collected every 15 min with eight parameters: predicted irradiance, wind speed, wind direction, temperature, humidity, barometric pressure, actual irradiance, and PV power.
To obtain a comprehensive evaluation of the prediction performance, several metrics were used, including the root mean square error (RMSE) and the mean absolute error (MAE), respectively defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_i - Y_i'\right)^2}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|Y_i - Y_i'\right|$$

where \(Y_i\) and \(Y_i'\) denote the ground truth and the predicted PV value of the ith testing sample. In a final step, the trained model is applied to the test samples to predict PV power.
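Both metrics are straightforward to compute; a minimal NumPy sketch with a toy check:

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

y_true, y_pred = np.array([0.9, 0.5, 0.1]), np.array([0.8, 0.6, 0.2])
print(rmse(y_true, y_pred), mae(y_true, y_pred))  # 0.1 0.1
```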
The hyperparameters are determined using random search, as shown in Table 2.
In addition, the Adam optimization algorithm37 was used to update the weights of the network. The lengths of the input and output data in each sliding window are both set to 96 time steps. Since a measurement is acquired every 15 min, an input length of 96 means that the input in each sliding window covers the previous 24 hours; correspondingly, the output in each sliding window is the forecast for the next 24 hours. The first value in the output window is taken as the PV power prediction of the current time step and is used as historical data for the subsequent predictions. The sliding window is then moved to the next timestamp until all timestamps in the test set are predicted.

The model is compared with other SOTA models for performance evaluation. Given the exemplary predictive capabilities of LSTM in time series forecasting, this study compares the domain-adapted transformer with the non-domain-adapted transformer and an LSTM model. The outcomes of trial 1 and trial 2 are depicted in Figs. 6 and 7, respectively, and the quantitative results, including RMSE and MAE, are given in Tables 3, 4, 5 and 6. Evidently, LSTM performs worst, with the highest RMSE and MAE scores. In trial 1, the transformer-based domain-adapted model is effective in predicting Station A in the physical space, while it is surpassed by the non-domain-adapted transformer in predicting Stations B, C, and D. This discrepancy is attributed to the difficulty the physical PV power system faces in adapting to disparate domains when only a limited number of training samples is available. As more data are collected by the physical PV power system and propagated to the digital system, as shown in trial 2, the proposed model outperforms the other two SOTA models. For example, Station D → Station A initially has higher RMSE and MAE scores than the transformer without DA (RMSE: 0.932 > 0.909, MAE: 0.760 > 0.655) in trial 1, but achieves lower scores in trial 2 (RMSE: 0.559 < 0.829, MAE: 0.312 < 0.407). Station C → Station B initially has higher RMSE and MAE scores than the transformer without DA (RMSE: 0.739 > 0.734, MAE: 0.558 > 0.500) in trial 1, but outperforms it in trial 2 (RMSE: 0.959 < 1.027, MAE: 0.539 < 0.685). The same applies to Station D → Station C, Station A → Station D, and Station C → Station D.

Figure 8 illustrates the difference in RMSE between the proposed model and the transformer without DA in trial 1 and trial 2, where positive values denote superior performance of the proposed model. Despite some initially negative differences in trial 1, many tasks exhibit positive differences in trial 2, underscoring the adaptability and enhanced performance of the proposed model. Although a few tasks remain negative in trial 2, the substantial reduction of the gap between the trial 1 and trial 2 RMSE differences signifies an improvement of the model within the domain adaptation scheme. An exception is noted in the domain adaptation of Station A → Station C and Station C → Station A, possibly attributable to a significant domain gap between these stations. Overall, the DT scheme for domain adaptation proves effective.
Comparison of the PV power prediction of three models in trial 1. (Red line: PV power predicted by the transformer with DA. Blue line: ground truth of PV power. Yellow line: PV power predicted by the transformer without DA. Green line: PV power predicted by LSTM.)
Comparison of the PV power prediction of three models in trial 2. (Red line: PV power predicted by the transformer with DA. Blue line: ground truth of PV power. Yellow line: PV power predicted by the transformer without DA. Green line: PV power predicted by LSTM.)
RMSE value comparison between proposed model and transformer without DA.
The impact of ongoing updates in the digital twin on domain adaptation. (Red dots represent the feature distribution of the digital space. Blue dots represent the feature distribution of the physical space.)
Using t-distributed stochastic neighbor embedding (t-SNE) visualization38, Fig. 9 shows the significant effect of the continuous updates in the DT on domain adaptation. Specifically, the physical space encompasses Station C, while Stations A, B, and D are in the digital space. As shown in Fig. 9a,d,g, there is an obvious distinction between the feature distributions of the physical and digital spaces. After domain adaptation, depicted in Fig. 9b,e,h, the generated features from both spaces amalgamate, signifying a diminishing gap between the data distributions. As Station C in the physical space accrues data, the concomitant updates to the model parameters in the digital space lead to a thoroughly mixed data distribution between the two spaces of the DT scheme, evident in Fig. 9f,i.
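For reference, a t-SNE plot like Fig. 9 can be produced with scikit-learn; feats_digital and feats_physical below are placeholder feature matrices standing in for the (assumed) outputs of the feature generator for each space.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

feats_digital = np.random.rand(300, 64)   # placeholder generator features
feats_physical = np.random.rand(300, 64)

# Embed both feature sets jointly so the 2-D coordinates are comparable
emb = TSNE(n_components=2, perplexity=30).fit_transform(
    np.vstack([feats_digital, feats_physical]))
n = len(feats_digital)
plt.scatter(emb[:n, 0], emb[:n, 1], c="red", s=5, label="digital space")
plt.scatter(emb[n:, 0], emb[n:, 1], c="blue", s=5, label="physical space")
plt.legend()
plt.show()
```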
Quantitatively, Table 3c reveals that Station D → Station C exhibits a higher RMSE score than the transformer without DA in trial 1. In trial 2, however, these scores notably decrease, surpassing the performance of the transformer without DA. Similarly, while the RMSE value for Station B → Station C remains higher than that of the transformer without DA, the magnitude of the difference diminishes. This substantiates the effectiveness and robustness of the proposed DT scheme in enhancing domain adaptation, leading to more accurate predictions of PV power output. Figure 10 shows a radar chart that evaluates the proposed model against five criteria: accuracy, complexity, training time, robustness, and generalization. The chart shows that the transformer with DA achieves higher accuracy, generalization, and robustness thanks to the domain adaptation from the digital space to the physical space, which allows the DT system to adapt to varying operational conditions.
Radar chart for model evaluation. (Red area represents model evaluation of LSTM. Blue area represents model evaluation of transformer without DA. Green area represents model evaluation of transformer with DA.)
This work presents a comprehensive framework for accurate PV power prediction through the development of a DT scheme composed of a transformer-based architecture with a DA strategy. The DT approach ensures real-time synchronization with the physical PV system, enabling continuous model updates for more precise power forecasting. By incorporating a transformer architecture with self-attention and a convolutional self-attention module, the model effectively captures long-range dependencies and contextual patterns in PV power sequences, enhancing prediction accuracy. Additionally, the integration of DANN allows the DT system to adapt to varying operational conditions, ensuring its effectiveness in real-world applications. The effectiveness of the proposed method is evaluated on a PV performance prediction dataset; the results show an accuracy improvement of up to 39.99% in model performance. Experiments with different numbers of timestamps further demonstrate improving PV power prediction as the parameters are continuously updated within the DT scheme, providing a robust solution for real-time and adaptive PV power prediction.
The dataset used is an open-source dataset, so the author does not have permission to share it. Data will be made available upon request to the corresponding author.
Dolara, A., Leva, S. & Manzolini, G. Comparison of different physical models for PV power output prediction, Sol. Energy 119, 83–99. https://doi.org/10.1016/j.solener.2015.06.017 (2015).
Li, Y. Z., He, L. & Nie, R. Q. Short-term forecast of power generation for grid-connected photovoltaic system based on advanced Grey-Markov chain. In International Conference on Energy and Environment Technology, 2009, 275–278. https://doi.org/10.1109/ICEET.2009.305 (2009).
Shi, J., Lee, W. J., Liu, Y., Yang, Y. & Wang, P. Forecasting power output of photovoltaic systems based on weather classification and support vector machines. IEEE Trans. Ind. Appl. 48(3), 1064–1069. https://doi.org/10.1109/TIA.2012.2190816 (2012).
Khatib, T., Mohamed, A., Mahmoud, M. & Sopian, K. A new approach for meteorological variables prediction at Kuala Lumpur, Malaysia, using artificial neural networks: application for sizing and maintaining photovoltaic systems. J. Sol Energy Eng. 134 (021005). https://doi.org/10.1115/1.4005754 (2012).
Abuella, M. & Chowdhury, B. Solar power forecasting using artificial neural networks. In 2015 North American Power Symposium (NAPS), 1–5. https://doi.org/10.1109/NAPS.2015.7335176 (2015).
O’Leary, D. & Kubby, J. Feature selection and ANN solar power prediction. J. Renew. Energy. 2017, e2437387. https://doi.org/10.1155/2017/2437387 (2017).
Photovoltaic power prediction using a recurrent neural network RNN. IEEE Conference Publication, IEEE Xplore. https://ieeexplore.ieee.org/document/9236461 (accessed 16 Oct 2023).
Harrou, F. et al. Forecasting of photovoltaic solar power production using LSTM approach. In Advanced Statistical Modeling, Forecasting, and Fault Detection in Renewable Energy Systems. https://doi.org/10.5772/intechopen.91248 (IntechOpen, 2020).
Yang, J., Zhang, S., Liu, J., Xiang, Y. & Han, X. Short-term photovoltaic power prediction based on variational mode decomposition and long short-term memory with dual-stage attention mechanism, Dianli Xitong ZidonghuaAutomation Electr. Power Syst. 45, 174–182. https://doi.org/10.7500/AEPS20200226011 (2021).
Day-ahead nonparametric probabilistic forecasting of photovoltaic power generation based on the LSTM-QRA ensemble model. IEEE Journals & Magazine, IEEE Xplore. https://ieeexplore.ieee.org/document/9186100 (accessed 16 Oct 2023).
Li, X., Huang, Y. & Shi, Y. Ultra-short term power load prediction based on gated cycle neural network and XGBoost models, J. Phys. Conf. Ser. 2026(1), 012022. https://doi.org/10.1088/1742-6596/2026/1/012022 (2021).
Wang, F. et al. A day-ahead PV power forecasting method based on LSTM-RNN model and time correlation modification under partial daily pattern prediction framework. Energy Convers. Manag. 212, 112766. https://doi.org/10.1016/j.enconman.2020.112766 (2020).
Liu, Y. et al. Short-term prediction of photovoltaic power based on DBSCAN-SVM data cleaning and PSO-LSTM Model. Energy Eng. 121(10), 3019–3035. https://doi.org/10.32604/ee.2024.052594 (2024).
Vaswani, A. et al. Attention is all you need. arXiv:1706.03762. https://doi.org/10.48550/arXiv.1706.03762 (2023).
Yu, C., Qiao, J., Chen, C., Yu, C. & Mi, X. TFEformer: a new temporal frequency ensemble transformer for day-ahead photovoltaic power prediction. J. Clean. Prod. 448, 141690. https://doi.org/10.1016/j.jclepro.2024.141690 (2024).
Transformer-based prediction method for solar power generation data. IEEE Conference Publication, IEEE Xplore. https://ieeexplore.ieee.org/document/9620897 (accessed 16 Oct 2023).
Phan, Q. T., Wu, Y. K. & Phan, Q. D. An approach using transformer-based model for short-term PV generation forecasting. In 8th International Conference on Applied System Innovation (ICASI), 2022, 17–20. https://doi.org/10.1109/ICASI55125.2022.9774491 (2022).
Li, S. et al. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. arXiv:1907.00235. https://doi.org/10.48550/arXiv.1907.00235 (2020).
VanDerHorn, E. & Mahadevan, S. Digital Twin: generalization, characterization and implementation. Decis. Support Syst. 145, 113524. https://doi.org/10.1016/j.dss.2021.113524 (2021).
Jones, D., Snider, C., Nassehi, A., Yon, J. & Hicks, B. Characterising the Digital Twin: a systematic literature review. CIRP J. Manuf. Sci. Technol. 29, 36–52. https://doi.org/10.1016/j.cirpj.2020.02.002 (2020).
Ganin, Y. et al. Domain-adversarial training of neural networks. arXiv:1505.07818. https://doi.org/10.48550/arXiv.1505.07818 (2016).
Yang, H. & Wang, W. Prediction of photovoltaic power generation based on LSTM and transfer learning digital twin. J. Phys. Conf. Ser. 2467 (1), 012015. https://doi.org/10.1088/1742-6596/2467/1/012015 (2023).
Zhang, J., Hong, L., Ibrahim, S. N. & He, Y. Short-term prediction of behind-the-meter PV power based on attention-LSTM and transfer learning. IET Renew. Power Gener. 18 (3), 321–330. https://doi.org/10.1049/rpg2.12829 (2024).
Knapp, G. L. et al. Building blocks for a digital twin of additive manufacturing. Acta Mater. 135, 390–399. https://doi.org/10.1016/j.actamat.2017.06.039 (2017).
Long, M., Cao, Y., Wang, J. & Jordan, M. Learning transferable features with deep adaptation networks. In Proceedings of the 32nd International Conference on Machine Learning, PMLR, 97–105. (2015). https://proceedings.mlr.press/v37/long15.html (accessed 18 Sep 2024).
Chen, X., Wang, S., Wang, J. & Long, M. Representation subspace distance for domain adaptation regression. In Proceedings of the 38th International Conference on Machine Learning, PMLR, 1749–1759 (2021). https://proceedings.mlr.press/v139/chen21u.html (Accessed 18 Sep 2024).
Schleich, B., Anwer, N., Mathieu, L. & Wartzack, S. Shaping the digital twin for design and production engineering. CIRP Ann. 66 (1), 141–144. https://doi.org/10.1016/j.cirp.2017.04.040 (2017).
Transfer learning with neural networks for bearing fault diagnosis in changing working conditions. IEEE Journals & Magazine, IEEE Xplore. https://ieeexplore.ieee.org/document/7961149 (accessed 16 Oct 2023).
Tuegel, E. J., Ingraffea, A. R., Eason, T. G. & Spottswood, S. M. Reengineering aircraft structural life prediction using a digital twin, Int. J. Aerosp. Eng. 2011, e154798. https://doi.org/10.1155/2011/154798 (2011).
Sivalingam, K., Sepulveda, M., Spring, M. & Davies, P. A Review and methodology development for remaining useful life prediction of offshore fixed and floating wind turbine power converter with digital twin technology perspective. In 2nd International Conference on Green Energy and Applications (ICGEA) 2018, 197–204. https://doi.org/10.1109/ICGEA.2018.8356292 (2018).
Machine learning-based digital twin for predictive modeling in wind turbines. IEEE Journals & Magazine, IEEE Xplore. https://ieeexplore.ieee.org/document/9696318 (accessed 16 Oct 2023).
Xiong, M., Wang, H., Fu, Q. & Xu, Y. Digital twin–driven aero-engine intelligent predictive maintenance, Int. J. Adv. Manuf. Technol. 114(11–12), 3751–3761. https://doi.org/10.1007/s00170-021-06976-w (2021).
Qiao, Q., Wang, J., Ye, L. & Gao, R. X. Digital twin for machining tool condition prediction. Proc. CIRP. 81, 1388–1393. https://doi.org/10.1016/j.procir.2019.04.049 (2019).
He, B., Liu, L. & Zhang, D. Digital twin-driven remaining useful life prediction for gear performance degradation: a review. J. Comput. Inf. Sci. Eng. 21, 030801. https://doi.org/10.1115/1.4049537 (2021).
Liu, L., Song, X. & Zhou, Z. Aircraft engine remaining useful life estimation via a double attention-based data-driven architecture. Reliab. Eng. Syst. Saf. 221, 108330. https://doi.org/10.1016/j.ress.2022.108330 (2022).
Ganin, Y. & Lempitsky, V. Unsupervised domain adaptation by backpropagation. arXiv:1409.7495. https://doi.org/10.48550/arXiv.1409.7495 (2015).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. arXiv:1412.6980. https://doi.org/10.48550/arXiv.1412.6980 (2017).
van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9 (86), 2579–2605 (2008).
Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore
Xi Zhao
The author worked on this manuscript solely.
Correspondence to Xi Zhao.
The authors declare no competing interests.
The dataset used is an open-source dataset. There are no potential ethical issues to declare.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Zhao, X. A novel digital-twin approach based on transformer for photovoltaic power prediction. Sci Rep 14, 26661 (2024). https://doi.org/10.1038/s41598-024-76711-4
Received: 24 January 2024
Accepted: 16 October 2024
Published: 04 November 2024