IDE4L is a project co-funded by the European Commission
Project no: 608860
Project acronym: IDE4L
Project title: IDEAL GRID FOR ALL
Deliverable 5.1: State Estimation and Forecasting Algorithms on MV & LV Networks
Due date of deliverable: 01.03.2015
Actual submission date: 01.03.2015
Start date of project: 01.09.2013 Duration: 36 months
Lead beneficiary name: Dansk Energi, Denmark
Authors:
Dansk Energi (DE), Universidad Carlos III de Madrid (UC3M), Tampere University of Technology (TUT)
Project co-funded by the European Commission within the Seventh Framework Programme (2013-2016)
Dissemination level
PU Public X
PP Restricted to other programme participants (including the Commission Services)
RE Restricted to a group specified by the consortium (including the Commission Services)
CO Confidential, only for members of the consortium (including the Commission Services)
IDE4L Deliverable 5.1
Track Changes
Version  Date        Description    Revised             Approved
0.1      16.02.2015  First draft    Daniel Olmeda
0.2      17.02.2015  Second draft   Antti Mutanen
0.2      20.02.2015  Second draft   Fannar Thordarson
0.2      20.02.2015  Second draft   Sami Repo
0.2      20.02.2015  Second draft   Jasmin Mehmedalic
0.4      27.02.2015  Fourth draft   Daniel Olmeda
0.5      27.02.2015  Fifth draft    Antti Mutanen
0.5      27.02.2015  Fifth draft    Fannar Thordarson
0.5      27.02.2015  Fifth draft    Jasmin Mehmedalic
1.0      27.02.2015  Final version  Zaid Al-Jassim      Zaid Al-Jassim
2 State of the Art
2.1 State Estimation
2.2 Load Forecasting
Load forecasting is a topic of great interest for electric utilities. By making load forecasting an integral part
of planning and operation, utilities are able to address crucial decisions on generation and purchasing of
electric power and future infrastructure development.
Load forecasting involves the accurate prediction of the electric load in a geographical area within a
planning horizon. Based on this horizon, load forecasting is usually classified into three categories: short-term
load forecasting (STLF) for horizons up to one day ahead, medium-term load forecasting (MTLF) for horizons from
one day to one year ahead, and long-term load forecasting (LTLF) for planning horizons from one year to ten years
ahead. This review covers STLF methods only.
2.2.1 Methods
In their most general form load forecasting models can be classified into two broad categories: statistical
approaches and artificial intelligence-based (AI-based) models. Statistical models forecast the load based
on historical data and/or exogenous variables such as weather, day of the week, and the date. Classical
statistical approaches include similar-day, regression, exponential smoothing and time series based models
[Brockwell2002].
In contrast, AI-based (or non-parametric) models are more flexible and can cope with greater complexity.
These systems map the inputs of the model to its outputs by fitting a non-linear curve
through a learning process. Among the inputs, the most relevant exogenous variable is the temperature.
Other parameters that should also be taken into account include user behaviour, electricity price,
geographic location and whether or not the forecast horizon includes non-working days. Several AI
methods have been proposed for STLF, such as Artificial Neural Networks (ANNs) [Winters1960], fuzzy logic
[Peng1992] and expert systems [Hippert2001]. Support vector regression (SVR) [Ho1990] has been
proposed as a feasible alternative to ANNs.
In [Agrawal2013] an introductory study on time series modelling and forecasting is presented. An overview
of load forecasting methods may be found in [Papalexopoulos1994].
In this section, a brief overview of the most relevant methods for load demand forecasting is presented. For
a more detailed review of the state of the art, please refer to the state-of-the-art document of the IDE4L project.
The Similar-Day method, or Naive Approach, is based on searching the historical data for days that
share some common characteristics with the forecast day. These characteristics may
include day of the week, day of the year, kind of day (e.g. holiday) and weather. The forecast is either
the load of a single similar day or a linear combination of several similar days. Similar-day methods,
though simple, usually achieve low forecast errors and are used as a benchmark to compare new
forecasting approaches.
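As a minimal sketch of the similar-day idea, the following Python function picks the k historical days with the same weekday and the closest temperature, and averages their load profiles (an equally weighted linear combination). The record layout, the function name and the choice of weekday and temperature as the matching criteria are assumptions made for illustration; they are not taken from the IDE4L specification.

```python
from statistics import mean


def similar_day_forecast(history, target_weekday, target_temp, k=3):
    """Forecast a daily load profile from the k most similar historical days.

    history: list of dicts with the (assumed) keys
        'weekday' (0=Monday), 'temp' (daily mean temperature) and
        'load' (list of load values, same length for every day).
    """
    # Prefer days of the same weekday; fall back to all days if none exist.
    candidates = [d for d in history if d["weekday"] == target_weekday]
    if not candidates:
        candidates = list(history)
    if not candidates:
        raise ValueError("history is empty")

    # Rank the candidates by temperature distance to the forecast day.
    candidates.sort(key=lambda d: abs(d["temp"] - target_temp))
    chosen = candidates[:k]

    # Equally weighted linear combination of the chosen profiles.
    n_steps = len(chosen[0]["load"])
    return [mean(d["load"][h] for d in chosen) for h in range(n_steps)]


example_history = [
    {"weekday": 0, "temp": 10.0, "load": [1.0, 2.0]},
    {"weekday": 0, "temp": 12.0, "load": [3.0, 4.0]},
    {"weekday": 0, "temp": 20.0, "load": [9.0, 9.0]},
    {"weekday": 1, "temp": 11.0, "load": [5.0, 5.0]},
]
```

With k=1 the function returns the load of the single most similar day, which corresponds to the first variant mentioned in the text.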
Regression is one of the most widely used statistical techniques for predicting electricity demand.
Regression methods use weighted least squares techniques to model the statistical
relationship between the load and other factors such as temperature, light intensity, wind speed,
humidity, type of day, and demand response. The regression coefficients are calculated by equally
or exponentially weighted least squares, using a range of historical measurements.
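To illustrate the difference between equally and exponentially weighted least squares, the sketch below fits a straight line between a single explanatory variable (for example temperature) and the load. The forgetting factor, the single-regressor model and all names are assumptions for illustration only; a practical load model would use several regressors.

```python
def fit_weighted_line(x, y, forgetting=0.98):
    """Fit y ~ intercept + slope * x by exponentially weighted least squares.

    Samples are ordered oldest first. The newest sample has weight 1 and each
    older sample is discounted by the forgetting factor; forgetting=1.0 gives
    the equally weighted (ordinary) least squares fit.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least 2 samples")

    weights = [forgetting ** (n - 1 - i) for i in range(n)]
    w_sum = sum(weights)

    # Weighted means of the regressor and the response.
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum

    # Weighted variance and covariance around the weighted means.
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar)
               for w, xi, yi in zip(weights, x, y))
    if s_xx == 0.0:
        raise ValueError("regressor is constant; slope is undefined")

    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope
```

A forgetting factor below 1 lets the coefficients follow slow changes in consumption behaviour, at the cost of a noisier estimate.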
Load forecasting techniques based on stochastic time series assume, in their simplest form, that the
future load is only a function of the previous loads. The input of the algorithm is the load pattern of
historical data. The most common techniques in this field are the Auto-Regressive (AR) and
Moving Average (MA) models. These approaches can be combined (ARMA) and extended to non-stationary
processes by adding an integration (differencing) step (ARIMA). Not considering the
effect of weather or other external variables, such as sociological variables, may result in less
robust forecasts [Alfares2002]. If the load also depends on external variables, variations of the
previous techniques may be applied (e.g. ARMAX).
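For reference, one common way of writing an ARMAX model is given below. The notation is an assumption made here for illustration and is not taken from the project documents: y_t is the load, u_t an exogenous input such as temperature, ε_t a white-noise error, and p, q and r are the model orders.

```latex
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i}
        + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
        + \sum_{k=0}^{r} \beta_k \, u_{t-k}
        + \varepsilon_t
```

Setting all β_k to zero gives an ARMA model, additionally setting all θ_j to zero gives a pure AR model, and keeping β_k while dropping the θ_j terms gives an ARX model.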
Artificial Neural Networks (ANNs) are biologically inspired mathematical models, which learn a
possibly non-linear mapping between the inputs and outputs of a system. A typical network structure
consists of three layers of neurons. The first layer has as many neurons as there are
inputs. The second layer is hidden and contains an arbitrary number of neurons. The third
layer has as many neurons as there are outputs. Rather than explicitly modelling the
system as a mathematical function, ANNs learn the relation between input and output by example.
The result is a non-linear curve fit.
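The three-layer structure described above can be sketched as a single forward pass. The tanh activation in the hidden layer, the linear output layer and all names are assumptions for illustration; training (learning the weights by example) is not shown.

```python
import math


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Evaluate a three-layer feed-forward network.

    inputs:   list with one value per input neuron
    w_hidden: one weight row per hidden neuron (len(row) == len(inputs))
    b_hidden: one bias per hidden neuron
    w_out:    one weight row per output neuron (len(row) == len(b_hidden))
    b_out:    one bias per output neuron
    """
    # Hidden layer: weighted sum of the inputs followed by a non-linearity.
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: linear combination of the hidden activations.
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]
```

For load forecasting, the inputs could be recent load values and a temperature forecast, and the single output the load at the next time step.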
Support Vector Machines (SVMs) were introduced by Vapnik [Vapnik2000] as a supervised learning
method for solving classification, and later regression, problems. The original definition of the SVM
separated the feature space with a linear function and therefore gave poor results on
highly non-linear systems. With the addition of the kernel mapping concept, SVMs are able to map
the feature space into a higher-dimensional (possibly infinite-dimensional) space, where the samples can be
separated using simple linear decision boundaries.
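The kernel mapping concept can be illustrated with a degree-2 polynomial kernel on two-dimensional inputs: evaluating the kernel in the original space gives the same number as an ordinary inner product in a three-dimensional mapped space. The kernel choice and the function names are assumptions for illustration; they are not the kernel used in any cited work.

```python
import math


def poly2_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def poly2_feature_map(x):
    """Explicit feature map phi with phi(x) . phi(y) == k(x, y)."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Ordinary inner product of two equally long vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


x, y = (1.0, 2.0), (3.0, -1.0)
kernel_value = poly2_kernel(x, y)
mapped_value = dot(poly2_feature_map(x), poly2_feature_map(y))
```

Because only kernel values are needed, the mapped space never has to be constructed explicitly, which is what makes infinite-dimensional mappings practical.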
2.2.2 IDE4L Concept
In the IDE4L project load and production forecasting will be used in the following WPs:
WP5 (Congestion management), mainly in:
o Task 5.1 State estimation where load estimation algorithms will be applied to
forecast the load demand of non-telemetered customers (MV, LV). This information
will be used as pseudo-measurements in the state estimation algorithm.
o Task 5.2 Power Control Algorithm will use load forecasting in the secondary
controller (MV) and (LV) and also in the tertiary controller for performing
congestion management, voltage control and network reconfiguration.
2.2.3 Recommendations in regard to the IDE4L concept
Stochastic time series analysis has proven to be robust and to have low computational demands. For these
reasons, the adoption of a forecaster based on an autoregressive model with exogenous inputs is
suggested. Autoregressive models offer flexibility, since they can represent time series of different
orders. Moreover, in general, adding exogenous inputs, such as weather forecasts, improves
forecast results.
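A minimal sketch of an autoregressive forecaster with one exogenous input is given below. It assumes a first-order model y[t] = a*y[t-1] + b*u[t] without intercept, fitted by ordinary least squares; the model order, the absence of an intercept and all function names are illustrative assumptions and do not describe the IDE4L forecaster itself.

```python
def simulate_arx(a, b, y0, u):
    """Generate a noise-free series y[t] = a*y[t-1] + b*u[t], starting at y0."""
    y = [y0]
    for t in range(1, len(u)):
        y.append(a * y[-1] + b * u[t])
    return y


def fit_arx(y, u):
    """Least squares estimate of (a, b) in y[t] = a*y[t-1] + b*u[t]."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(1, len(y)):
        x1, x2, target = y[t - 1], u[t], y[t]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        r1 += x1 * target
        r2 += x2 * target
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear; cannot identify a and b")
    a = (r1 * s22 - s12 * r2) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b


def forecast_arx(a, b, last_y, u_future):
    """Iterate the model over forecast values of the exogenous input."""
    forecasts, previous = [], last_y
    for u_t in u_future:
        previous = a * previous + b * u_t
        forecasts.append(previous)
    return forecasts


u_hist = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 1.0, 2.0]
y_hist = simulate_arx(0.5, 2.0, 1.0, u_hist)
a_hat, b_hat = fit_arx(y_hist, u_hist)
```

In use, `u_future` would hold the weather forecast for the horizon, so that each forecast step feeds the previous load forecast and the next weather value back into the model.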
2.3 Production forecasting
Short term forecasting techniques of wind or photovoltaic energy can be classified into physical models,
statistical models and Artificial Intelligence-based models (AI).
2.3.1 Methods
In this section, a brief overview of the most relevant methods for generation forecasting is presented. For a
more detailed review of the state of the art, please refer to the state-of-the-art document of the IDE4L project.
Physical models are based on physically modelling the location of the renewable sources whose
power is to be forecast. In the case of wind turbines, the models try to forecast the wind reaching each
of the turbines, in order to give a forecast of wind power. The physical models use global or regional
weather forecasts (speed and direction of the wind, or solar
irradiance) as input. This data is adapted to the installation under study using a meso-scale or micro-scale
model to estimate the wind or irradiance at the position and height where the wind turbines or
photovoltaic panels are located. Subsequently, in the case of wind turbines, the wind speed is
transformed into a power value by using the power curves of the installed machines, and a similar
process is used for photovoltaic panels.
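The last step of a physical model, converting a forecast wind speed into power through a power curve, can be sketched as a piecewise linear interpolation. The curve points below are hypothetical values chosen for illustration and do not describe any real turbine.

```python
# Hypothetical power curve: (wind speed in m/s, power in kW), sorted by speed.
# The first point is the cut-in speed and the last point the cut-out speed.
EXAMPLE_CURVE = [(3.0, 0.0), (5.0, 100.0), (12.0, 1000.0), (25.0, 1000.0)]


def power_from_wind(speed, curve=EXAMPLE_CURVE):
    """Return the power for a wind speed by linear interpolation.

    Speeds below the first curve point or above the last one give zero power.
    """
    if speed < curve[0][0] or speed > curve[-1][0]:
        return 0.0
    for (v0, p0), (v1, p1) in zip(curve, curve[1:]):
        if v0 <= speed <= v1:
            return p0 + (p1 - p0) * (speed - v0) / (v1 - v0)
    return 0.0


def power_forecast(wind_forecast, curve=EXAMPLE_CURVE):
    """Apply the power curve to each forecast wind speed."""
    return [power_from_wind(v, curve) for v in wind_forecast]
```

In a real installation the curve would come from the manufacturer data of the installed machines, and wake effects would be handled separately.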
The Prediktor model, developed by Landberg at the Risø National Laboratory in Denmark for the Elkraft
electrical system operator, is an example of a physical model [Landberg1994]. It makes use of WAsP
(Wind Atlas Analysis and Application Program) to estimate the wind reaching the park turbines,
based on the weather forecasts of the atmospheric model HIRLAM. This information is then used to
forecast the power output through the use of the power curve. The effect of turbine wake is
considered through the use of PARK, which models this effect based on the relative position of the
turbines inside the park. Other effects that physical models do not consider are corrected with a
statistical model, or Model Output Statistics (MOS), which adjusts the results using historical power measurements.
Among statistical models the family of time series is especially relevant to the task of generation
forecasting. Generally those methods consider that the output of the forecast system depends only
on historical states of the variables, which are used as inputs to the model. In addition to historical
data, meteorological forecasts of atmospheric models may be used as exogenous inputs. Examples
of statistical models are: Seasonality analysis, Box-Jenkins or Autoregressive Integrated Moving
Average (ARIMA), Multiple Regressions and Exponential Smoothing [Agrawal2013].
A forecast model based on time series extrapolates future values of a variable through analysis of a
set of past values of that variable or of other descriptive variables. This is the approach followed in
ARIMA (Box-Jenkins) models, among others, which have proven to be useful for forecasting some
industrial processes. In the context of wind power forecasting, they provide reasonably good
results for horizons up to 6 hours. These models have been applied in [Boland2008], mainly
because of the ability of ARMA models to extract significant statistical properties, following the
Box-Jenkins methodology.
Several time series models of irradiance measurements are compared in [Reikard2009] to forecast
short-term PV production over horizons ranging from a few minutes to 6 h. Simple auto-regressive (AR) models are
used in [Bacher2009] to directly forecast the PV production, and their performance is compared with other
methods.
The main advantage of using ARMA models is their flexibility, since they may represent different
order time series. It has been shown, in addition, that ARMA models are able to robustly process
time series with an underlying linear correlation.
The physical and statistical methods discussed previously are computationally expensive, and large
amounts of data from weather forecasts are needed in order to provide accurate production
forecasts. Some authors propose forecast techniques based on Artificial Intelligence (AI), for
example [Mellit2008] [Yona2008], to model and forecast the solar irradiance. Results show that AI
methods, such as genetic algorithms (GA), fuzzy logic, expert systems or neural networks, do not
require any a priori knowledge of the internal parameters of the system, demand fewer
computational resources than traditional methods and are robust to multivariate problems. ANNs
have been applied with great success to the estimation of the solar irradiance in [Guarnieri2008],
where it is shown that ANNs reduce the average normalized quadratic error of global horizontal
irradiance (GHI) by 15% when compared with numerical weather prediction (NWP) forecasts,
within a 12-18 h horizon.
2.3.2 IDE4L Concept
In the IDE4L project the forecast of photovoltaic installations and small wind turbine generation can be
used in the following work packages:
WP5: State Estimation and Power Control
WP2: Planning Tools for Distribution Network Management
WP6: Distribution networks dynamics
2.3.3 Recommendations in regard to the IDE4L concept
Generation forecasting methods may be classified as physical, statistical or AI-based. The statistical models
provide better results for short-term horizons than physical models do and need less information about the
plant. Among statistical methods, autoregressive models present the advantage of flexibility, since they
may represent different order time series. Moreover, in general, adding exogenous inputs, such as weather
forecasts, improves forecast results.
3 Design Specifications
3.1 State Estimation
The state estimation algorithm will be designed so that the same algorithm can be used for both medium
voltage network state estimation (MVSE) and low voltage network state estimation (LVSE). Furthermore,
since the state estimator and the state forecaster have several similarities and common inputs, state
forecasting will also be done largely with the same algorithm.
The state estimator will be based on a state estimator core developed in earlier INTEGRIS and Smart Domo
Grid projects where TUT and A2A co-operated to create a state estimator suitable for decentralized
monitoring and control of smart grids. In the decentralized monitoring and control concept the MV and LV
network monitoring applications (e.g. state estimation) and control applications (e.g. congestion
management and fault management) are run in different physical locations. LV network applications are
run at the secondary substation automation unit (SSAU) and MV network applications are run at the
primary substation automation unit (PSAU). This reduces the data transfer need between smart meters and
upper level control systems. Only necessary information and alarms from the low voltage network are sent
to upper level systems.
The state estimator core is based on a weighted least squares (WLS) state estimator that uses branch
currents as state variables. In the IDE4L project, the earlier developed state estimator will be improved by
making it more automatic, as well as able to adapt to changing network configurations and measurement
setups. The improvements will be done largely by adding new support functions around the state estimator
core. Functions that can read the network topology from a CIM compatible database, adjust the network
topology based on switch status information and forecast the future load and DG production will be added.
The state estimator and state forecaster will receive load and production estimates and forecasts from the low and
medium voltage network load and production forecasters (LVF & MVF), which are also described in this
document. Another important relation is the connection to databases. Almost all the inputs for the state
estimator, including network topology and real-time measurements, are read from local SQL databases. In
this project, these databases are referred to as Data eXchange Platforms (DXPs) and they are used for
storing data and sharing data between different functions. The state estimator software accesses the databases
using the Octave Database package. The databases contain real-time measurements received from RTUs and
smart meters, and they are organized according to the IEC 61850 standard. The network topology contains
information about the structure of the grid, from the primary substation to the point of energy
delivery at the customer premises. A formal definition of the measurements placed over the network
topology is also provided. The topology data is based on a CIM model.
Table 3.1.1 contains step-by-step descriptions for an algorithm executing both LVSE and LVSF. Figure 3.1.1
further clarifies the connections between the different steps. The steps and flowcharts for MVSE and MVSF
are practically identical to the ones presented below and can be found in [IDE4L2014a]. The design
specification documents [IDE4L2014a] and [IDE4L2014b] also include detailed descriptions of all the
information that is either read from the database or written to the database. Later, in chapters 4.1.3.1 and
4.1.3.2, the inputs and outputs for step 6 (state estimation) are described in detail.
Table 3.1.1. LVSE and LVSF algorithm steps.
Step  Description
1     Network topology import function requests network topology information from the secondary substation database (LV DXP).
2     Switch status import function requests switch status information from the secondary substation database (LV DXP).
3     Topology information processing function reshapes the network topology information into a format understood by the state estimator.
4     Real-time measurement reading function reads the real-time measurements from the secondary substation database (LV DXP).
5     Real-time measurement filtering function evaluates the real-time measurements read in step 4, removes erroneous measurements and saves them into the secondary substation database (LV DXP).
6     State estimation function calculates the best possible estimates for the network states using the available information gathered in steps 1–5 and received from LVF.
7     Export function writes the state estimation results into the secondary substation database (LV DXP).
8     State forecasting function calculates forecasts for the network states using the available information gathered in steps 1–3 and received from LVF.
9     Export function writes the state forecasting results into the secondary substation database (LV DXP).
The state estimation algorithm will be run on a fixed schedule (e.g. once every minute, which is expected to
be the RTU measurement reading interval). If the status of the network switches or fuses changes, the
network topology is re-evaluated and the state estimation is re-calculated immediately. The Fault Location,
Isolation and Restoration algorithm (FLIR, WP4) and the Network Reconfiguration Algorithms (NRA, WP5)
supply information about the switch positions and blown fuses.
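The scheduling logic described above, where the topology is re-processed only when the topology or switch status has changed and the estimation itself runs on every cycle, can be sketched as follows. The function names, the cache layout and the callback interface are hypothetical; the project implementation is an Octave program reading from the DXP databases, and this Python fragment only illustrates the control flow.

```python
def run_cycle(cache, topology, switch_status, measurements,
              process_topology, estimate):
    """Run one estimation cycle, re-processing topology only when needed.

    cache:            dict kept between cycles (holds the processed model)
    topology:         hashable description of the network, e.g. a tuple
    switch_status:    hashable description of switch and fuse states
    process_topology: callback standing in for step 3 of Table 3.1.1
    estimate:         callback standing in for step 6 of Table 3.1.1
    """
    key = (topology, switch_status)
    if cache.get("key") != key:
        # Topology or switch status changed: rebuild the estimator model.
        cache["model"] = process_topology(topology, switch_status)
        cache["key"] = key
    return estimate(cache["model"], measurements)
```

A scheduler would call `run_cycle` once per measurement interval and additionally whenever FLIR or NRA reports a switch or fuse change.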
The state estimates are always made for the present time. The state forecasts are made for time steps t = 1 up to a
predefined number of steps n ahead. The length of the forecasting horizon depends on the forecasting moment and
can be anything between 24 and 48 hours. The forecasting resolution can vary: a higher temporal resolution
is used for the immediate future and a lower temporal resolution for time moments further away.
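A variable-resolution forecasting grid of this kind can be sketched as a list of time offsets from the forecasting moment. The default values below (10-minute steps for the first 30 minutes, then hourly steps up to 48 hours) are borrowed from the forecaster resolutions given in Section 3.2 and are an assumption about how the two resolutions might be combined, not a specification.

```python
def forecast_offsets(fine_step_min=10, fine_horizon_min=30,
                     coarse_step_min=60, horizon_min=48 * 60):
    """Return forecast time offsets in minutes from the forecasting moment.

    Fine steps cover the immediate future; coarse steps cover the rest of
    the horizon. Coarse offsets inside the fine window are skipped.
    """
    fine = list(range(fine_step_min, fine_horizon_min + 1, fine_step_min))
    coarse = [
        t for t in range(coarse_step_min, horizon_min + 1, coarse_step_min)
        if t > fine_horizon_min
    ]
    return fine + coarse
```

The state forecaster would then loop over these offsets, using the load and production forecast that matches each offset.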
The state estimation and state forecasting algorithm will be implemented as an Octave program. Octave
can be run on either a Windows or a Linux machine. Windows will be used during the development phase and
Linux during the testing and implementation phases.
[Flow chart. Elements: START; steps 1 to 9 of Table 3.1.1; decision nodes "Has someone modified network topology information?", "Has the network topology or switching status changed?", "Do the state forecasts need updating?", "Are the scheduled switch statuses same as in t-1?" and the forecasting loop condition "t <= forecasting horizon" with increment t = t+1; external inputs: load and production estimates from LVF, load and production forecasts from LVF.]
Figure 3.1.1. State estimation and state forecasting algorithm flow chart.
3.2 Load forecasting
3.2.1 Low Voltage Load Forecasting
The core of the LV load forecaster is a time series load forecaster that uses measurements (LV load
demand, weather data), weather forecasts (from a local weather station) and the flexible demand schedule.
The main program running the load forecaster can be divided into 4 distinct steps:
1) Real-time measurement reading
2) Real-time measurement filtering
3) Load forecasting
4) Exporting of load forecasting results
Depending on the resolution and time horizon for forecasts needed in the IDE4L project, the LV load
forecaster will be able to provide both 1) very short-term forecasts (up to 30 minutes ahead with 10
minutes resolution) and 2) short-term forecasts (up to 48h with hourly resolution).
1) Short-term load forecasts: The algorithm will give load forecasts for the next 24-48h and it will run
on demand (as required in the IDE4L project) or by schedule, e.g. every day at 00:00 GMT.
- Inputs:
a. Day-ahead weather forecast (24 h to 48 h) from a local weather station. Required data: wind speed
and direction, solar irradiance, temperature, humidity and pressure, at an hourly time step.
b. Weather measurements for the previous year from a local weather station. Required data: time
series of wind speed and direction, solar irradiance, temperature,
humidity and pressure, at an hourly time step.
c. Historical load measurements for every LV customer. The minimum amount of data needed is
the previous year at an hourly time step.
- Outputs:
a. Load forecasts for the next 24-48h for every LV customer.
b. Time resolution: 1h
c. Forecast update step: on demand (if needed in the IDE4L project) and also daily at a fixed
schedule hour (e.g. 00:00 GMT).
d. Forecast horizon: from 1h up to 24h – 48h
2) Very short-term load forecasts: The algorithm will generate load forecasts for the next 10-30
minutes and, as required in the IDE4L project, it will be run on demand.
- Inputs:
a. Historical load measurements for every LV customer. The minimum amount of data needed is the last 168 hours
at a 10-minute time step.
- Outputs:
a. Load forecasts for the next 10-30 minutes for every LV customer.
b. Time resolution: 10-minutes
c. Forecast update step: on demand (as required in the IDE4L project)
d. Forecast horizon: from 10 minutes up to 30 minutes.
3.2.2 Medium Voltage Load Forecasting
The core of the MV load forecaster is a time series load forecaster that uses measurements (MV load
demand, weather data), weather forecasts (from a local weather station) and the flexible demand schedule.
The main program running the load forecaster can be divided into 4 distinct steps:
1) Real-time measurement reading
2) Real-time measurement filtering
3) Load forecasting
4) Exporting of load forecasting results
Depending on the resolution and time horizon for forecasts needed in the IDE4L project, the MV load
forecaster will be able to provide both 1) very short-term forecasts (up to 30 minutes ahead with 10
minutes resolution) and 2) short-term forecasts (up to 48h with hourly resolution).
1) Short-term load forecasts: The algorithm will give load forecasts for the next 24-48h and it will run
on demand (as required in the IDE4L project) or by schedule, e.g. every day at 00:00 GMT.
- Inputs:
a. Day-ahead weather forecast (24 h to 48 h) from a local weather station. Required data:
wind speed and direction, solar irradiance, temperature, humidity and
pressure, at an hourly time step.
b. Weather measurements for the previous year from a local weather station. Required data: time
series of wind speed and direction, solar irradiance, temperature,
humidity and pressure, at an hourly time step.
c. Historical load measurements for every MV customer; aggregated load demand at the
MV/LV substation. The minimum amount of data needed is the previous year at an hourly time step.
- Outputs:
a. Load forecasts for the next 24-48h with hourly resolution for every MV customer;
aggregated load demand at the MV/LV substation.
b. Time resolution: 1h
c. Forecast update step: on demand (if needed in the IDE4L project) and also daily at a fixed
schedule hour (e.g. 00:00 GMT).
d. Forecast horizon: from 1h up to 24h – 48h
2) Very short-term load forecasts: The algorithm will generate load forecasts for the next 10-30
minutes and, as required in the IDE4L project, it will be run on demand.
- Inputs:
a. Historical load measurements for every MV customer; aggregated load demand at the
MV/LV substation. The minimum amount of data needed is the last 168 hours at a 10-minute
time step.
- Outputs:
a. Load forecasts for the next 10-30 minutes for every MV customer; aggregated load demand
at the MV/LV substation.
b. Time resolution: 10-minutes
c. Forecast update step: on demand (as required in the IDE4L project)
d. Forecast horizon: from 10 minutes up to 30 minutes.
3.3 Production Forecasting
3.3.1 Low Voltage Production Forecasting
The core of the LV production forecaster is a time series production forecaster that uses measurements
(LV DG production, weather data), weather forecasts (from a local weather station) and the flexible demand
schedule.
The main program running the production forecaster can be divided into 4 distinct steps:
1) Real-time measurement reading
2) Real-time measurement filtering
3) Production forecasting
4) Exporting of production forecasting results
Depending on the resolution and time horizon for forecasts needed in the IDE4L project, the LV production
forecaster will be able to provide both 1) very short-term forecasts (up to 30 minutes ahead with 10
minutes resolution) and 2) short-term forecasts (up to 48h with hourly resolution).
1) Short-term production forecasts: The algorithm will give production forecasts for the next 24-48h
and it will run on demand (as required in the IDE4L project) or by schedule, e.g. every day at 00:00
GMT.
- Inputs:
a. Day-ahead weather forecast (24 h to 48 h) from a local weather station. Required data: wind speed
and direction, solar irradiance, temperature, humidity and pressure, at an hourly time step.
b. Weather measurements for the previous year from a local weather station. Required data: time
series of wind speed and direction, solar irradiance, temperature,
humidity and pressure, at an hourly time step.
c. Historical production measurements for every LV customer; LV DG production. The minimum
amount of data needed is the previous year at an hourly time step.
- Outputs:
a. Production forecasts for the next 24-48h for every LV customer, LV DG production.
b. Time resolution: 1h
c. Forecast update step: on demand (if needed in the IDE4L project) and also daily at a fixed
schedule hour (e.g. 00:00 GMT).
d. Forecast horizon: from 1h up to 24h – 48h
2) Very short-term production forecasts: The algorithm will generate production forecasts for the next
10-30 minutes and, as required in the IDE4L project, it will be run on demand.
- Inputs:
a. Historical production measurements for every LV customer; LV DG production. The minimum
amount of data needed is the last 168 hours at a 10-minute time step.
- Outputs:
a. Production forecasts for the next 10-30 minutes for every LV customer; LV DG production.
b. Time resolution: 10-minutes
c. Forecast update step: on demand (as required in the IDE4L project)
d. Forecast horizon: from 10 minutes up to 30 minutes.
3.3.2 Medium Voltage Production Forecasting
The core of the MV production forecaster is a time series production forecaster that uses measurements
(MV DG production, weather data), weather forecasts (from a local weather station) and the flexible demand
schedule.
The main program running the production forecaster can be divided into 4 distinct steps:
1) Real-time measurement reading
2) Real-time measurement filtering
3) Production forecasting
4) Exporting of production forecasting results
Depending on the resolution and time horizon for forecasts needed in the IDE4L project, the MV production
forecaster will be able to provide both 1) very short-term forecasts (up to 30 minutes ahead with 10
minutes resolution) and 2) short-term forecasts (up to 48h with hourly resolution).
1) Short-term production forecasts: The algorithm will give production forecasts for the next 24-48h
and it will run on demand (as required in the IDE4L project) or by schedule, e.g. every day at 00:00
GMT.
- Inputs:
a. Day-ahead weather forecast (24 h to 48 h) from a local weather station. Required data:
wind speed and direction, solar irradiance, temperature, humidity and
pressure, at an hourly time step.
b. Weather measurements for the previous year from a local weather station. Required data: time
series of wind speed and direction, solar irradiance, temperature,
humidity and pressure, at an hourly time step.
c. Historical production measurements for every MV customer; aggregated production at the
MV/LV substation, MV DG production. The minimum amount of data needed is the previous year at an
hourly time step.
- Outputs:
a. Production forecasts for the next 24-48h with hourly resolution for every MV customer;
aggregated production at the MV/LV substation, MV DG production.
b. Time resolution: 1h
c. Forecast update step: on demand (if needed in the IDE4L project) and also daily at a fixed
schedule hour (e.g. 00:00 GMT).
d. Forecast horizon: from 1h up to 24h – 48h
2) Very short-term production forecasts: The algorithm will generate production forecasts for the next
10-30 minutes and, as required in the IDE4L project, it will be run on demand.
- Inputs:
a. Historical production measurements for every MV customer; aggregated production at the
MV/LV substation, MV DG production. Minimal amount of data needed is the last 168
hours with 10 minutes time step.
- Outputs:
a. Production forecasts for the next 10-30 minutes for every MV customer; aggregated
production at the MV/LV substation, MV DG production.
b. Time resolution: 10-minutes
c. Forecast update step: on demand (as required in the IDE4L project)
d. Forecast horizon: from 10 minutes up to 30 minutes.
4 Algorithms
4.1 State Estimation
This chapter describes the state estimation algorithm used in the IDE4L project. The same algorithm is used for
both MV and LV network state estimation. As recommended in Chapter 2.1, a WLS estimator which uses
branch currents as state variables is selected for this task.
4.1.1 Formulation for Branch Current Based State Estimation
Branch current based state estimators use line branch currents as state variables. If the network topology,
line parameters and source bus voltage are all known, the network state can be fully defined with complex
branch currents. The complex branch currents can be expressed either in polar or in rectangular form. In
this work the polar expression is chosen and thereby the state variables are the branch current magnitudes
and angles. All other measurable network variables (node voltages, power injections and line power flows)
can be calculated from these variables.
4.1.1.1 Basic WLS Formulas
In WLS estimation, the goal is to minimize the weighted differences between measured network variables
and their estimated values. The most likely state of the network can be calculated by solving equation
4.1.1.
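$$\min_{\boldsymbol{x}} J(\boldsymbol{x}) = \min_{\boldsymbol{x}} \sum_{i=1}^{N_m} \frac{[z_i - h_i(\boldsymbol{x})]^2}{\sigma_i^2}, \tag{4.1.1}$$
where $J(\boldsymbol{x})$ is the cost function to be minimized, a.k.a. the weighted least squares equation
$\boldsymbol{x}$ is the state vector that contains all state variables
$z_i$ is the value of measurement $i$
$h_i(\boldsymbol{x})$ is measured variable $i$ as a function of the state variables
$\sigma_i^2$ is the variance of measurement $i$
$N_m$ is the number of measurements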
If measurements and measurement functions are presented in vector form and measurement variances are
presented in a matrix form, the equation 4.1.1 can be expressed in a simpler form as is done in equation
4.1.2 [Abur2004].
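$$\min_{\boldsymbol{x}} J(\boldsymbol{x}) = [\boldsymbol{z} - \boldsymbol{h}(\boldsymbol{x})]^T \boldsymbol{R}^{-1} [\boldsymbol{z} - \boldsymbol{h}(\boldsymbol{x})], \tag{4.1.2}$$
where
$$\boldsymbol{z} = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_{N_m} \end{bmatrix} \quad \text{(measurement vector)}$$
$$\boldsymbol{h}(\boldsymbol{x}) = \begin{bmatrix} h_1(\boldsymbol{x}) \\ h_2(\boldsymbol{x}) \\ \vdots \\ h_{N_m}(\boldsymbol{x}) \end{bmatrix} \quad \text{(measurement functions)}$$
$$\boldsymbol{R} = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{N_m}^2 \end{bmatrix} \quad \text{(covariance matrix)}$$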
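The minimum of the cost function $J(\boldsymbol{x})$ can be found by differentiating it and searching for the zero point. The
derivative of the cost function with respect to the state vector $\boldsymbol{x}$ is its gradient. Therefore, the state vector
that minimizes the cost function forces the gradient to zero. For a measurement function linearized as
$\boldsymbol{h}(\boldsymbol{x}) \approx \boldsymbol{H}\boldsymbol{x}$, the gradient of $J(\boldsymbol{x})$ is given in equation 4.1.3.
$$\nabla J(\boldsymbol{x}) = -2\boldsymbol{H}^T\boldsymbol{R}^{-1}\boldsymbol{z} + 2\boldsymbol{H}^T\boldsymbol{R}^{-1}\boldsymbol{H}\boldsymbol{x}, \tag{4.1.3}$$
where $\boldsymbol{H} = \left[\frac{\partial \boldsymbol{h}(\boldsymbol{x})}{\partial \boldsymbol{x}}\right]$ (Jacobian matrix)
When the gradient is zero, $\boldsymbol{x}$ can be solved from equation 4.1.4.
$$\boldsymbol{x} = (\boldsymbol{H}^T\boldsymbol{R}^{-1}\boldsymbol{H})^{-1}\boldsymbol{H}^T\boldsymbol{R}^{-1}\boldsymbol{z} \tag{4.1.4}$$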
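Since the measurement functions are non-linear, the Jacobian matrix in equation 4.1.4 depends on the state vector itself
and solving the state vector $\boldsymbol{x}$ requires the use of iterative methods, such as the Newton-Raphson method. On
every iteration round, a linearized approximation of the state vector change $\Delta\boldsymbol{x}$, shown in equation 4.1.5, is
added to the current state vector value, starting from an initial value. The iteration is continued
until $\Delta\boldsymbol{x}$ is small enough [Abur2004].
$$\Delta\boldsymbol{x} = (\boldsymbol{H}^T\boldsymbol{R}^{-1}\boldsymbol{H})^{-1}\boldsymbol{H}^T\boldsymbol{R}^{-1}[\boldsymbol{z} - \boldsymbol{h}(\boldsymbol{x})] \tag{4.1.5}$$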
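The iteration of equation 4.1.5 can be illustrated with a minimal sketch. It is written in Python rather than in the Octave used by the project, and it is not the IDE4L implementation. It assumes a single branch with a known node voltage (1.0 p.u., angle 0), two state variables (branch current magnitude and angle) and three hypothetical measurements (P, Q and current magnitude); all function names and numerical values are illustrative only.

```python
import math


def h(x, v=1.0, delta=0.0):
    """Measurement functions [P, Q, I] for one branch; x = [I, alpha]."""
    i_mag, alpha = x
    return [v * i_mag * math.cos(delta - alpha),
            v * i_mag * math.sin(delta - alpha),
            i_mag]


def jacobian(x, v=1.0, delta=0.0):
    """Partial derivatives of [P, Q, I] with respect to [I, alpha]."""
    i_mag, alpha = x
    return [[v * math.cos(delta - alpha), v * i_mag * math.sin(delta - alpha)],
            [v * math.sin(delta - alpha), -v * i_mag * math.cos(delta - alpha)],
            [1.0, 0.0]]


def wls_step(x, z, sigma):
    """One correction dx = (H^T R^-1 H)^-1 H^T R^-1 [z - h(x)] for two states."""
    H = jacobian(x)
    r = [zi - hi for zi, hi in zip(z, h(x))]
    w = [1.0 / s ** 2 for s in sigma]  # diagonal of R^-1
    m = len(z)
    G = [[sum(w[k] * H[k][a] * H[k][b] for k in range(m)) for b in range(2)]
         for a in range(2)]
    g = [sum(w[k] * H[k][a] * r[k] for k in range(m)) for a in range(2)]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    # Solve the 2x2 system G dx = g with Cramer's rule.
    return [(g[0] * G[1][1] - G[0][1] * g[1]) / det,
            (G[0][0] * g[1] - G[1][0] * g[0]) / det]


def wls_estimate(z, sigma, x0, tol=1e-10, max_iter=20):
    """Repeat the correction step until the largest correction is below tol."""
    x = list(x0)
    for _ in range(max_iter):
        dx = wls_step(x, z, sigma)
        x = [x[0] + dx[0], x[1] + dx[1]]
        if max(abs(d) for d in dx) < tol:
            break
    return x


# Noise-free measurements generated from a known (hypothetical) state.
x_true = [0.8, -0.3]
z = h(x_true)
x_hat = wls_estimate(z, sigma=[0.01, 0.01, 0.02], x0=[1.0, 0.0])
print(x_hat)
```

Because the example measurements are noise-free, the estimate should return to the state from which they were generated; with noisy measurements the result would instead be the weighted compromise defined by equation 4.1.2.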
4.1.1.2 Equality Constrained WLS Estimation
In WLS estimation, measurements are weighted according to their accuracies. Load models are used as pseudo measurements and they are given low weights. Real-time measurements are given high weights and zero-injection measurements are given very high weights (if there is no load or production connected to a certain node, power injection on that node is known to be zero).
The combination of high and low weights can cause the gain matrix ($\boldsymbol{H}^T\boldsymbol{R}^{-1}\boldsymbol{H}$) to become ill-conditioned. Gain matrix ill-conditioning reduces state estimation accuracy and can in the worst case prevent gain matrix inversion. In order to avoid these ill-conditioning problems, we use equality constraints to force the zero-injection measurements to zero instead of giving them very high weights. The equality constrained WLS problem can be solved by using the method of Lagrange multipliers [Wu1988]. In the method of Lagrange multipliers the constrained minimization problem is solved by minimizing the Lagrangian function
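$$L(\boldsymbol{x}, \boldsymbol{\lambda}) = \frac{1}{2}[\boldsymbol{z} - \boldsymbol{h}(\boldsymbol{x})]^T\boldsymbol{R}^{-1}[\boldsymbol{z} - \boldsymbol{h}(\boldsymbol{x})] + \boldsymbol{\lambda}^T\boldsymbol{c}(\boldsymbol{x}) \tag{4.1.6}$$
where $\boldsymbol{x}$ is the state vector
$\boldsymbol{\lambda}$ is the Lagrange multiplier vector
$\boldsymbol{z}$ is the measurement vector
$\boldsymbol{h}(\boldsymbol{x})$ is the measurement function
$\boldsymbol{R}$ is the covariance matrix ($\boldsymbol{R} = \mathrm{diag}[\sigma_1^2 \; \sigma_2^2 \; \cdots \; \sigma_N^2]$, where $\sigma_i^2$ is the variance of measurement $i$)
$\boldsymbol{c}(\boldsymbol{x})$ is the zero-injection measurement function.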
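The minimization problem can be solved by differentiating $L(\boldsymbol{x}, \boldsymbol{\lambda})$ partially with respect to $\boldsymbol{x}$ and $\boldsymbol{\lambda}$ and setting the derivatives to zero. This yields the following equations:
$$\frac{\partial L(\boldsymbol{x}, \boldsymbol{\lambda})}{\partial \boldsymbol{x}} = -\boldsymbol{H}^T\boldsymbol{R}^{-1}[\boldsymbol{z} - \boldsymbol{h}(\boldsymbol{x})] + \boldsymbol{C}^T\boldsymbol{\lambda} = 0 \tag{4.1.7}$$
$$\frac{\partial L(\boldsymbol{x}, \boldsymbol{\lambda})}{\partial \boldsymbol{\lambda}} = \boldsymbol{c}(\boldsymbol{x}) = 0 \tag{4.1.8}$$
where $\boldsymbol{H} = \frac{\partial \boldsymbol{h}}{\partial \boldsymbol{x}}$ and $\boldsymbol{C} = \frac{\partial \boldsymbol{c}}{\partial \boldsymbol{x}}$ are the Jacobian matrices.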
Equations 4.1.7 and 4.1.8 form a system of equations which can be solved iteratively by the Newton–Raphson method. At each iteration, the incremental change to the state vector (Δ𝒙) is calculated with equation
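$$\begin{bmatrix} \Delta\boldsymbol{x} \\ \boldsymbol{\lambda} \end{bmatrix} = \begin{bmatrix} \boldsymbol{H}^T\boldsymbol{R}^{-1}\boldsymbol{H} & \boldsymbol{C}^T \\ \boldsymbol{C} & 0 \end{bmatrix}^{-1} \begin{bmatrix} \boldsymbol{H}^T\boldsymbol{R}^{-1}[\boldsymbol{z} - \boldsymbol{h}(\boldsymbol{x})] \\ -\boldsymbol{c}(\boldsymbol{x}) \end{bmatrix}. \tag{4.1.9}$$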
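A single correction step of equation 4.1.9 can be sketched as follows. This is an illustrative Python sketch under simplifying assumptions, not the project's Octave code: the toy problem is linear, with two real-valued branch currents as states, one accurate measurement of the first current, one inaccurate pseudo measurement of the second, and a zero-injection constraint that forces the two currents to be equal. The helper names `solve` and `constrained_step` and all numbers are hypothetical.

```python
def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (M[r][n] - sum(M[r][c] * y[c] for c in range(r + 1, n))) / M[r][r]
    return y


def constrained_step(H, C, sigma, residual, c_val):
    """Return (dx, lam) from [[H^T R^-1 H, C^T], [C, 0]] [dx; lam] = [H^T R^-1 r; -c]."""
    n, m, k = len(H[0]), len(C), len(H)
    w = [1.0 / s ** 2 for s in sigma]  # diagonal of R^-1
    G = [[sum(w[i] * H[i][a] * H[i][b] for i in range(k)) for b in range(n)]
         for a in range(n)]
    rhs = [sum(w[i] * H[i][a] * residual[i] for i in range(k)) for a in range(n)]
    F = [G[a] + [C[j][a] for j in range(m)] for a in range(n)]  # [G | C^T]
    F += [list(C[j]) + [0.0] * m for j in range(m)]             # [C | 0]
    sol = solve(F, rhs + [-cj for cj in c_val])
    return sol[:n], sol[n:]


# Toy data: z1 measures x1 accurately, z2 is a pseudo measurement of x2.
H = [[1.0, 0.0], [0.0, 1.0]]
C = [[1.0, -1.0]]              # zero injection: x1 - x2 = 0
z = [1.00, 0.90]
sigma = [0.01, 0.10]
x0 = [0.0, 0.0]
residual = [zi - sum(hij * xj for hij, xj in zip(row, x0)) for zi, row in zip(z, H)]
c_val = [sum(cij * xj for cij, xj in zip(row, x0)) for row in C]
dx, lam = constrained_step(H, C, sigma, residual, c_val)
print(dx, lam)
```

In this toy case the constraint is satisfied exactly without any very large weight in R, which is the motivation given above for the equality constrained formulation.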
4.1.1.3 Jacobian Matrices and Measurement Equations
Active power flow, reactive power flow, current flow, node voltage, current injection, active power
injection and reactive power injection measurements can all be used in the developed branch current
based state estimator. Measurements, their symbols and measurement equations are shown in table 4.1.1.
Table 4.1.1. Measurement equations
Symbol | Measurement description | Measurement equation
$P_{km}$ | Active power flow between nodes k and m | $P_{km} = \mathrm{real}(\bar{V}_k(\bar{I}_{km})^*)$
$Q_{km}$ | Reactive power flow between nodes k and m | $Q_{km} = \mathrm{imag}(\bar{V}_k(\bar{I}_{km})^*)$
$I_{km}$ | Current flow between nodes k and m | $I_{km} = \lvert\bar{I}_{km}\rvert$
$P_k$ | Active power injection at node k | $P_k = \mathrm{real}\left(\sum_{i \in B}\bar{V}_k(\bar{I}_{ik})^* - \sum_{j \in D}\bar{V}_k(\bar{I}_{kj})^*\right)$
$Q_k$ | Reactive power injection at node k | $Q_k = \mathrm{imag}\left(\sum_{i \in B}\bar{V}_k(\bar{I}_{ik})^* - \sum_{j \in D}\bar{V}_k(\bar{I}_{kj})^*\right)$
where B is the group of upper side nodes connected to node k
and D is the group of lower side nodes connected to node k
The Jacobian matrices H and C contain the partial derivatives of these measurements with respect to the state
variables $I_{km}$ and $\alpha_{km}$, as shown in equations 4.1.10 and 4.1.11.
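$$\boldsymbol{H} = \begin{bmatrix}
\frac{\partial \boldsymbol{P}_{km}}{\partial \boldsymbol{I}} & \frac{\partial \boldsymbol{P}_{km}}{\partial \boldsymbol{\alpha}} \\
\frac{\partial \boldsymbol{Q}_{km}}{\partial \boldsymbol{I}} & \frac{\partial \boldsymbol{Q}_{km}}{\partial \boldsymbol{\alpha}} \\
\frac{\partial \boldsymbol{I}_{km}}{\partial \boldsymbol{I}} & \frac{\partial \boldsymbol{I}_{km}}{\partial \boldsymbol{\alpha}} \\
\frac{\partial \boldsymbol{V}_{k}}{\partial \boldsymbol{I}} & \frac{\partial \boldsymbol{V}_{k}}{\partial \boldsymbol{\alpha}} \\
\frac{\partial \boldsymbol{I}_{k}}{\partial \boldsymbol{I}} & \frac{\partial \boldsymbol{I}_{k}}{\partial \boldsymbol{\alpha}} \\
\frac{\partial \boldsymbol{P}_{k}}{\partial \boldsymbol{I}} & \frac{\partial \boldsymbol{P}_{k}}{\partial \boldsymbol{\alpha}} \\
\frac{\partial \boldsymbol{Q}_{k}}{\partial \boldsymbol{I}} & \frac{\partial \boldsymbol{Q}_{k}}{\partial \boldsymbol{\alpha}}
\end{bmatrix} \tag{4.1.10}$$
Each block of equation 4.1.10 has one row per measurement and one column per state variable. The two blocks for the active power flow measurements, for example, are
$$\frac{\partial \boldsymbol{P}_{km}}{\partial \boldsymbol{I}} = \begin{bmatrix}
\frac{\partial P_{km,1}}{\partial I_1} & \cdots & \frac{\partial P_{km,1}}{\partial I_N} \\
\vdots & \ddots & \vdots \\
\frac{\partial P_{km,L}}{\partial I_1} & \cdots & \frac{\partial P_{km,L}}{\partial I_N}
\end{bmatrix}, \qquad
\frac{\partial \boldsymbol{P}_{km}}{\partial \boldsymbol{\alpha}} = \begin{bmatrix}
\frac{\partial P_{km,1}}{\partial \alpha_1} & \cdots & \frac{\partial P_{km,1}}{\partial \alpha_N} \\
\vdots & \ddots & \vdots \\
\frac{\partial P_{km,L}}{\partial \alpha_1} & \cdots & \frac{\partial P_{km,L}}{\partial \alpha_N}
\end{bmatrix}$$
and the blocks for $Q_{km}$, $I_{km}$, $V_k$, $I_k$, $P_k$ and $Q_k$ are formed in the same way.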
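$$\boldsymbol{C}(\boldsymbol{x}) = \begin{bmatrix}
\frac{\partial P_{k,1}}{\partial I_1} & \cdots & \frac{\partial P_{k,1}}{\partial I_N} & \frac{\partial P_{k,1}}{\partial \alpha_1} & \cdots & \frac{\partial P_{k,1}}{\partial \alpha_N} \\
\frac{\partial Q_{k,1}}{\partial I_1} & \cdots & \frac{\partial Q_{k,1}}{\partial I_N} & \frac{\partial Q_{k,1}}{\partial \alpha_1} & \cdots & \frac{\partial Q_{k,1}}{\partial \alpha_N} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial P_{k,M}}{\partial I_1} & \cdots & \frac{\partial P_{k,M}}{\partial I_N} & \frac{\partial P_{k,M}}{\partial \alpha_1} & \cdots & \frac{\partial P_{k,M}}{\partial \alpha_N} \\
\frac{\partial Q_{k,M}}{\partial I_1} & \cdots & \frac{\partial Q_{k,M}}{\partial I_N} & \frac{\partial Q_{k,M}}{\partial \alpha_1} & \cdots & \frac{\partial Q_{k,M}}{\partial \alpha_N}
\end{bmatrix} \tag{4.1.11}$$
where $L$ is the number of measurements of each type
$M$ is the number of zero-injection nodes
$N$ is the number of line sections (and corresponding line current flows)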
The partial derivatives shown in equations 4.1.10 and 4.1.11 are given below:
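1) Power flow measurements: When the branch power flow measurement is on the same line segment as
the state variable, the partial derivatives are:
$$\frac{\partial P_{km}}{\partial I_{km}} = V_k \cos(\delta_k - \alpha_{km}) = \frac{P_{km}}{I_{km}} \tag{4.1.12}$$
$$\frac{\partial P_{km}}{\partial \alpha_{km}} = V_k I_{km} \sin(\delta_k - \alpha_{km}) = Q_{km} \tag{4.1.13}$$
$$\frac{\partial Q_{km}}{\partial I_{km}} = V_k \sin(\delta_k - \alpha_{km}) = \frac{Q_{km}}{I_{km}} \tag{4.1.14}$$
$$\frac{\partial Q_{km}}{\partial \alpha_{km}} = -V_k I_{km} \cos(\delta_k - \alpha_{km}) = -P_{km} \tag{4.1.15}$$
where $V_k$ is the voltage at node k
$\delta_k$ is the voltage angle at node k
$\alpha_{km}$ is the current angle at line km.
Otherwise, when the measurement and the state variable are not on the same line segment, all the
partial derivatives are zero.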
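As a sanity check, equations 4.1.12 to 4.1.15 can be compared against central finite differences of the measurement equations in Table 4.1.1. The sketch below is in Python (not the project's Octave code) and holds the node voltage fixed while the branch current is perturbed, as the equations above do; the operating point and the function names are hypothetical.

```python
import cmath
import math


def branch_power(v_mag, delta, i_mag, alpha):
    """Return (P, Q) from S = V * conj(I), with V and I given in polar form."""
    s = cmath.rect(v_mag, delta) * cmath.rect(i_mag, alpha).conjugate()
    return s.real, s.imag


def analytic_partials(v_mag, delta, i_mag, alpha):
    """(dP/dI, dP/dalpha, dQ/dI, dQ/dalpha) from equations 4.1.12 to 4.1.15."""
    c = math.cos(delta - alpha)
    s = math.sin(delta - alpha)
    return (v_mag * c, v_mag * i_mag * s, v_mag * s, -v_mag * i_mag * c)


def numeric_partials(v_mag, delta, i_mag, alpha, eps=1e-6):
    """Same four derivatives from central finite differences."""
    p_hi, q_hi = branch_power(v_mag, delta, i_mag + eps, alpha)
    p_lo, q_lo = branch_power(v_mag, delta, i_mag - eps, alpha)
    p_ha, q_ha = branch_power(v_mag, delta, i_mag, alpha + eps)
    p_la, q_la = branch_power(v_mag, delta, i_mag, alpha - eps)
    return ((p_hi - p_lo) / (2 * eps), (p_ha - p_la) / (2 * eps),
            (q_hi - q_lo) / (2 * eps), (q_ha - q_la) / (2 * eps))


# Hypothetical operating point in per unit and radians.
point = dict(v_mag=1.02, delta=-0.05, i_mag=0.4, alpha=-0.35)
p_km, q_km = branch_power(**point)
analytic = analytic_partials(**point)
numeric = numeric_partials(**point)
print(analytic)
print(numeric)
```

The second and fourth analytic values should also equal $Q_{km}$ and $-P_{km}$, which is the shortcut stated on the right-hand side of equations 4.1.13 and 4.1.15.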
2) Current magnitude measurements: When the current magnitude measurement is on the same line
segment as the state variable, the partial derivative with respect to current magnitude is one. If the
current measurement and the state variable are on different line sections, the partial derivative is zero.
Partial derivatives with respect to current angles are always zero.
$$\frac{\partial I_{km}}{\partial I_{st}} = \begin{cases} 1, & \text{if } st = km \\ 0, & \text{if } st \neq km \end{cases} \tag{4.1.16}$$
$$\frac{\partial I_{km}}{\partial \alpha_{st}} = 0 \tag{4.1.17}$$
where $I_{st}$ is the state variable corresponding to current magnitude at line st
$\alpha_{st}$ is the state variable corresponding to current angle at line st.
3) Voltage magnitude measurements: When the source node voltage and line currents are known,
voltage at any point of the network can be calculated by subtracting the voltage losses that happen
between source node (node 1) and the studied node k from the source node voltage. In a radial feeder,
the voltage at node k can be calculated with equation 4.1.18 assuming the feeder nodes have been
numbered as in Figure 4.1.1.
$$\bar{V}_k = \bar{V}_1 - \sum_{i=2}^{k} \bar{I}_{i-1,i}\,\bar{Z}_{i-1,i} \tag{4.1.18}$$
Figure 4.1.1. Node numbering on a radial feeder (nodes 1, 2, 3, ..., k−1, k).
The Jacobian matrix elements related to voltage magnitude measurements can be divided into two
groups. The first group contains elements that are between node 1 and measured node k. Then the
partial derivatives are:
$$\frac{\partial V_k}{\partial I_{i-1,i}} = -\cos\delta_k \cdot Z_{i-1,i}\cos(\alpha_{i-1,i} + \theta_{i-1,i}) - \sin\delta_k \cdot Z_{i-1,i}\sin(\alpha_{i-1,i} + \theta_{i-1,i}) \tag{4.1.19}$$
$$\frac{\partial V_k}{\partial \alpha_{i-1,i}} = \cos\delta_k \cdot I_{i-1,i} Z_{i-1,i}\sin(\alpha_{i-1,i} + \theta_{i-1,i}) - \sin\delta_k \cdot I_{i-1,i} Z_{i-1,i}\cos(\alpha_{i-1,i} + \theta_{i-1,i}) \tag{4.1.20}$$
where $I_{i-1,i}$ is the current magnitude on the line that goes from node i−1 to node i, where i
belongs to a group of nodes that are between nodes 1 and k (node k belongs
to this group, node 1 is excluded from this group)
$\alpha_{i-1,i}$ is the current angle on the line that goes from node i−1 to node i
$Z_{i-1,i}$ is the impedance on the line that goes from node i−1 to node i
$\theta_{i-1,i}$ is the impedance angle on the line that goes from node i−1 to node i
$\delta_k$ is the voltage angle at node k.
The second group contains elements that are not between node 1 and the measured node. All partial derivatives in this group are zero.
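Equations 4.1.18 to 4.1.20 can be checked numerically in the same way. The following Python sketch (an illustration, not the project's Octave code) evaluates the node voltage on a small radial feeder and compares the analytic voltage magnitude derivatives with central finite differences. Line charging is ignored here, line sections are indexed from zero (section 0 is the line from node 1 to node 2), and the feeder data and function names are hypothetical.

```python
import cmath
import math


def node_voltage(v1, currents, impedances, k):
    """Equation 4.1.18: V_k = V_1 - sum of I*Z over the sections between node 1 and node k."""
    return v1 - sum(cmath.rect(*currents[i]) * impedances[i] for i in range(k - 1))


def analytic_partials(v1, currents, impedances, k, section):
    """(dV_k/dI, dV_k/dalpha) for one line section, equations 4.1.19 and 4.1.20."""
    if section >= k - 1:  # section is not between node 1 and node k
        return 0.0, 0.0
    delta_k = cmath.phase(node_voltage(v1, currents, impedances, k))
    i_mag, alpha = currents[section]
    z_mag, theta = cmath.polar(impedances[section])
    a = alpha + theta
    d_mag = -math.cos(delta_k) * z_mag * math.cos(a) - math.sin(delta_k) * z_mag * math.sin(a)
    d_ang = (math.cos(delta_k) * i_mag * z_mag * math.sin(a)
             - math.sin(delta_k) * i_mag * z_mag * math.cos(a))
    return d_mag, d_ang


def numeric_partials(v1, currents, impedances, k, section, eps=1e-6):
    """Central finite differences of |V_k| with respect to one branch current."""
    def v_mag(d_i, d_alpha):
        perturbed = list(currents)
        i_mag, alpha = perturbed[section]
        perturbed[section] = (i_mag + d_i, alpha + d_alpha)
        return abs(node_voltage(v1, perturbed, impedances, k))
    return ((v_mag(eps, 0.0) - v_mag(-eps, 0.0)) / (2 * eps),
            (v_mag(0.0, eps) - v_mag(0.0, -eps)) / (2 * eps))


# Hypothetical three-node feeder in per unit: (magnitude, angle) per line section.
v1 = 1.0 + 0.0j
currents = [(0.5, -0.20), (0.3, -0.25)]
impedances = [0.02 + 0.04j, 0.03 + 0.05j]
analytic_3 = [analytic_partials(v1, currents, impedances, 3, s) for s in range(2)]
numeric_3 = [numeric_partials(v1, currents, impedances, 3, s) for s in range(2)]
outside = analytic_partials(v1, currents, impedances, 2, 1)  # section 2-3 vs node 2
print(analytic_3, numeric_3, outside)
```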
4) Current injection measurements: When the state variable is connected to a line segment feeding the
measured node (t=k), the current injection measurement partial derivative with respect to branch
current magnitude is one. If the state variable is connected to a line segment below the measured node
(s=k), then the partial derivative is minus one. The partial derivative is always zero when the state
variable is connected to a line segment that does not connect to the measured node ($s \neq k$ and $t \neq k$).
Partial derivatives with respect to the current angle are always zero.
$$\frac{\partial I_k}{\partial I_{st}} = \begin{cases} 1, & \text{if } t = k \\ -1, & \text{if } s = k \\ 0, & \text{if } s \neq k \text{ and } t \neq k \end{cases} \tag{4.1.21}$$
$$\frac{\partial I_k}{\partial \alpha_{st}} = 0 \tag{4.1.22}$$
5) Power injection measurements: The Jacobian matrix elements related to power injection measurements can be divided into three groups. When a line segment is connected to the measured node and is feeding it, the partial derivatives are:
$$\frac{\partial P_k}{\partial I_{sk}} = V_k \cos(\delta_k - \alpha_{sk}) \tag{4.1.23}$$
$$\frac{\partial P_k}{\partial \alpha_{sk}} = V_k I_{sk} \sin(\delta_k - \alpha_{sk}) \tag{4.1.24}$$
$$\frac{\partial Q_k}{\partial I_{sk}} = V_k \sin(\delta_k - \alpha_{sk}) \tag{4.1.25}$$
$$\frac{\partial Q_k}{\partial \alpha_{sk}} = -V_k I_{sk} \cos(\delta_k - \alpha_{sk}) \tag{4.1.26}$$
When a line segment is connected to the measured node and is below it, the partial derivatives are:
$$\frac{\partial P_k}{\partial I_{km}} = -V_k \cos(\delta_k - \alpha_{km}) \tag{4.1.27}$$
$$\frac{\partial P_k}{\partial \alpha_{km}} = -V_k I_{km} \sin(\delta_k - \alpha_{km}) \tag{4.1.28}$$
$$\frac{\partial Q_k}{\partial I_{km}} = -V_k \sin(\delta_k - \alpha_{km}) \tag{4.1.29}$$
$$\frac{\partial Q_k}{\partial \alpha_{km}} = V_k I_{km} \cos(\delta_k - \alpha_{km}) \tag{4.1.30}$$
In equations 4.1.23 – 4.1.26, subscript s can be any node that is above the node k and is connected to it with a line. In equations 4.1.27 – 4.1.30, subscript m can be any node that is below the node k and is connected to it with a line. When a line segment is not connected to the measured node, all partial derivatives are zero [Wang2004].
4.1.1.4 Bad Data Detection
In the IDE4L project, all the input data given to the state estimator has gone through a filter that removes bad
measurements. However, as a second safeguard against bad measurements being used in state estimation,
bad data detection is also added to the state estimator. If undetected, bad data will corrupt the state
estimation results and can in some cases prevent the state estimator from converging.
Measurements may contain errors due to various reasons. Meters can have biases, drifts or wrong
connections. Telecommunication system failures can also lead to large deviations in recorded
measurements. Some measurement errors are easy to detect with simple logical rules. For example,
negative voltage and current magnitudes and measurements, which are several orders of magnitude larger
or smaller than expected, are easily recognized as bad data. In our state estimation algorithm, this kind of
rough bad data detection is done right in the beginning of the algorithm. Unfortunately, not all types of bad
data are detected that easily. However, in more indistinct cases, other detection methods can be utilized.
In WLS state estimation, bad data detection can be made by examining the measurement residuals. This
has to be done after the estimation process. The bad data detection is essentially based on the statistical
properties of the residuals. One of the most used bad data detection methods is the Largest Normalized
Residual ($r^N_{max}$) test. This test is composed of the following steps [Abur2004]:
1) Solve the WLS estimation and obtain the elements of the measurement residual vector ($\boldsymbol{r}$):
$$\boldsymbol{r} = \boldsymbol{z} - \boldsymbol{h}(\boldsymbol{x}) \tag{4.1.31}$$
2) Compute the normalized residuals ($\boldsymbol{r}^N$):
$$r_i^N = \frac{|r_i|}{\sqrt{\Omega_{ii}}}, \tag{4.1.32}$$
where $\Omega_{ii}$ is the i-th diagonal element of $\boldsymbol{\Omega}$
$\boldsymbol{\Omega}$ is $\mathrm{Cov}(\boldsymbol{r})$.
3) Find the largest normalized residual ($r^N_{max}$).
4) If $r^N_{max} > c$, then the corresponding measurement is erroneous. Here, c is the chosen detection
threshold, usually 3.0 (3σ threshold, i.e. all data that is more than three standard deviations away
from the expected value is labelled as bad data).
5) If bad data is detected, eliminate the faulty measurement from the measurement set and go back
to step 1.
The faulty measurements are eliminated one by one. After each elimination, WLS state estimation
procedure is repeated.
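The five steps above can be sketched with a deliberately small example. The Python sketch below is not the project's bad_data_detection.m; it assumes a single state variable that is measured directly by several meters, so that H is a column of ones and the WLS estimate reduces to a weighted mean. It also uses the residual covariance of unconstrained WLS, Ω = R − H(HᵀR⁻¹H)⁻¹Hᵀ, which is an assumption of this sketch. All names and numbers are hypothetical.

```python
import math


def weighted_mean_estimate(z, sigma):
    """WLS estimate of one directly measured state; returns (x_hat, gain)."""
    w = [1.0 / s ** 2 for s in sigma]
    gain = sum(w)  # G = H^T R^-1 H with H = [1, ..., 1]^T
    x_hat = sum(wi * zi for wi, zi in zip(w, z)) / gain
    return x_hat, gain


def largest_normalized_residual_test(z, sigma, threshold=3.0):
    """Remove measurements one by one while the largest normalized residual exceeds the threshold."""
    active = list(range(len(z)))
    removed = []
    while len(active) > 1:  # a single remaining measurement is critical
        zs = [z[i] for i in active]
        ss = [sigma[i] for i in active]
        x_hat, gain = weighted_mean_estimate(zs, ss)          # step 1
        r_norm = [abs(zi - x_hat) / math.sqrt(si ** 2 - 1.0 / gain)
                  for zi, si in zip(zs, ss)]                  # step 2
        worst = max(range(len(active)), key=r_norm.__getitem__)  # step 3
        if r_norm[worst] <= threshold:                        # step 4
            break
        removed.append(active.pop(worst))                     # step 5
    x_hat, _ = weighted_mean_estimate([z[i] for i in active],
                                      [sigma[i] for i in active])
    return x_hat, removed


# Four hypothetical voltage readings in per unit; the last one is grossly wrong.
z = [1.00, 1.02, 0.99, 1.60]
sigma = [0.02, 0.02, 0.02, 0.02]
x_hat, removed = largest_normalized_residual_test(z, sigma)
print(x_hat, removed)
```

The loop stops when only one measurement is left because its residual variance is then zero, which corresponds to the critical measurement case discussed below.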
The largest normalized residual test can detect bad data if the removal of the corresponding measurement
does not render the system unobservable. It is possible to identify all cases of single bad data where the
faulty measurement is not critical and does not belong to a critical pair or critical k-tuple. Critical measurements are
those measurements whose removal would cause the system to become unobservable. A critical pair and a critical
k-tuple contain two or more measurements, respectively, whose simultaneous removal would make the
system unobservable.
In the case of multiple bad data, only part of the measurement errors can be identified. Faulty
measurements with weakly correlated measurement residuals can be identified. If the measurement
residuals are strongly correlated, the bad data can be identified only in the case of non-conforming bad
data. If the identification of faulty measurement fails, the largest normalized residual test can incorrectly
remove a faultless measurement.
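Because our state estimator is based on equality constrained WLS estimation, the measurement residual
covariance matrix cannot be solved as shown in [Abur2004]. A solution to this problem is given in
[Wu1988]. In equality constrained state estimation the measurement residual covariance matrix $\boldsymbol{\Omega}$ is equal
to
$$\mathrm{Cov}(\boldsymbol{r}) = \boldsymbol{R} - \boldsymbol{H}\boldsymbol{E}_1\boldsymbol{H}^T, \tag{4.1.33}$$
where $\boldsymbol{E}_1$ is the upper left block of the inverse of $\boldsymbol{F}$, the coefficient matrix of equation 4.1.9:
$$\boldsymbol{F}^{-1} = \begin{bmatrix} \boldsymbol{H}^T\boldsymbol{R}^{-1}\boldsymbol{H} & \boldsymbol{C}^T \\ \boldsymbol{C} & 0 \end{bmatrix}^{-1} = \begin{bmatrix} \boldsymbol{E}_1 & \boldsymbol{E}_2^T \\ \boldsymbol{E}_2 & \boldsymbol{E}_3 \end{bmatrix}, \tag{4.1.34}$$
where $\boldsymbol{C}$ is the Jacobian matrix of the equality constraint function.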
The problem with measurement residual based bad data detection is that it requires a certain amount of
redundancy from the measurement configuration. In distribution networks, the number of measurements
and thus also the redundancy level is very limited. With a traditional distribution network
measurement setup, which consists of real-time power flow measurements only at the beginning of the feeder and of load
pseudo measurements, we cannot identify bad data unambiguously. We can only detect that bad data exists
and that it is either in the feeder power flow measurement or in one of the load pseudo measurements. In this
situation we use the pseudo measurements to set plausible limits for the real-time measurements, and if a
real-time measurement is outside these limits, it is interpreted as bad data. When using this approach,
the bad data detection threshold should be raised from the commonly used 3σ threshold and accurate load
models should be used as pseudo measurements [Mutanen2011]. In the IDE4L project the measurement
redundancy will be higher than in traditional distribution systems and bad data detection will be more
useful.
Sometimes bad data can prevent the state estimation algorithm from converging. If the algorithm does
not converge, the measurement residuals cannot be calculated. In this case the solution is to remove all
real-time measurements and calculate the state estimate using only pseudo measurements. After the pseudo
measurement based state estimates have been calculated, the largest normalized residual test is used to
evaluate the real-time measurements. The measurement with the largest normalized residual is identified as
bad data and removed from the measurement set. Then the state estimation is run again. This procedure is
repeated until all erroneous measurements are removed and the state estimator converges.
4.1.2 Algorithm Steps
The state estimation algorithm description can be divided into two levels: a general level and a detailed level.
The algorithm has been described briefly on the general level in chapter 3.1 and in more detail in the design
specification documents [IDE4L2014a] and [IDE4L2014b]. Here we focus on describing what happens
inside the state estimation block, which is step 6 in the general description (see Table 3.1.1 and Figure 3.1.1) and
contains the actual state estimator that does the WLS calculation.
The state estimation algorithm has been implemented as an Octave function. The algorithm flow chart is
given in Figure 4.1.3 and the 13 distinct steps are described below.
1) Input validity check: The state estimation algorithm inputs go through a simple validation process. The
purpose of input validation is to filter out coarse errors with simple logical rules. For example, negative
current magnitude measurements and node voltage measurements that are twice as large as the
nominal voltage are labelled as bad data and are removed.
2) Branch current calculation: Initial branch currents are calculated using the load pseudo measurements
provided as inputs. Backward sweep algorithm is used to calculate the branch currents. Backward
sweep starts from the end of the network and sums branch currents according to Kirchhoff’s current
law as it progresses upwards. This has been illustrated in Figure 4.1.2a.
The line charging capacitances are taken into account when calculating the branch currents. A π-model is
used to model the lines. As can be seen from Figure 4.1.3, the branch current Ikm has three different
values depending on whether it is measured from the beginning, middle or end of the line. In this work,
the current at the beginning of the line is chosen as the state variable. The charging currents are calculated
with equations 4.1.35 and 4.1.36.
Figure 4.1.3. Line π-model.
$$I_{C,km,k} = jV_k\frac{B_{C,km}}{2} = jV_k\pi f C_{km}, \tag{4.1.35}$$
$$I_{C,km,m} = jV_m\frac{B_{C,km}}{2} = jV_m\pi f C_{km}, \tag{4.1.36}$$
where $V_k$ is the node $k$ voltage
$V_m$ is the node $m$ voltage
$B_{C,km}$ is the capacitive susceptance on line section $k$–$m$
$f$ is the system frequency
$C_{km}$ is the capacitance on line section $k$–$m$
3) Node voltage calculation: Initial node voltages are calculated using the forward sweep method. Here
the previously calculated mid-branch currents are used to calculate the voltage losses on each line
section. The calculation starts from node number 1 and continues downwards as depicted in Figure
4.1.2b.
4) Covariance matrix formation: Measurement covariance matrix is formed from the input measurement
accuracies. The covariance matrix is a diagonal matrix and the diagonal elements correspond to the
accuracy of each measurement (pseudo measurements included).
5) Measurement vector formation: The provided measurements are collected into a measurement
vector. The order of measurements is the same as in Table 4.1.1 and the same order is applied throughout
this algorithm.
6) State variable vector formation: The state variable vector is formed from the previously calculated
branch currents. The N first elements are branch current magnitudes and the elements from N+1 to 2N
are branch current angles.
7) Jacobian matrix calculation: Subfunction Jacobian.m calculates the Jacobian matrices H and C
described in equations 4.1.10 and 4.1.11. Also the measurement function values h(x) and equality
constraint function values c(x) are calculated inside this subfunction.
8) Calculation of ∆x: Corrections to the state variable vector are calculated using the equation 4.1.9.
9) State variable vector update: The state variable vector is updated by adding the previously
calculated corrections ∆x to it.
10) Mid-line current calculation: First, the state variable vector is converted into a corresponding branch
current vector. Then the currents at the middle of each line are calculated by adding appropriate
charging currents.
11) Node voltage calculation: The node voltages are recalculated using the forward sweep.
12) Bad data detection: Once the largest value in correction vector ∆x falls below the pre-set threshold ε,
the algorithm exits from the first loop and starts bad data detection. The bad data detection is done
using the Largest Normalized Residual ($r^N_{max}$) test as described in chapter 4.1.1.4. If bad data is
detected, it is removed and the algorithm returns to step 4.
13) Output calculation: The other required state estimation function outputs are calculated from the
branch currents and node voltages.
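The backward and forward sweeps of steps 2 and 3 can be sketched for a simple chain feeder numbered as in Figure 4.1.1. The Python sketch below is illustrative only and is not the project's backward_sweep.m or forward_sweep.m: it ignores line charging, treats the load pseudo measurements as complex power drawn at each node, and uses a flat voltage profile for the initial load currents. All names and values are hypothetical.

```python
def backward_sweep(voltages, loads):
    """Sum load currents from the feeder end towards the source (Kirchhoff's current law).

    voltages[i] and loads[i] belong to node i+1; branch[i] feeds node i+2.
    """
    n = len(voltages)
    branch = [0j] * (n - 1)
    downstream = 0j
    for node in range(n - 1, 0, -1):
        # S = V * conj(I), so the load current is conj(S / V).
        downstream += (loads[node] / voltages[node]).conjugate()
        branch[node - 1] = downstream
    return branch


def forward_sweep(v_source, branch, impedances):
    """Subtract the voltage loss of each line section, starting from node 1."""
    voltages = [v_source]
    for i_branch, z in zip(branch, impedances):
        voltages.append(voltages[-1] - i_branch * z)
    return voltages


# Hypothetical three-node feeder in per unit.
v_source = 1.0 + 0.0j
loads = [0j, 0.10 + 0.03j, 0.05 + 0.02j]       # node 1 is the source node
impedances = [0.01 + 0.02j, 0.02 + 0.03j]      # lines 1-2 and 2-3
flat_start = [v_source] * len(loads)
branch = backward_sweep(flat_start, loads)
voltages = forward_sweep(v_source, branch, impedances)
print(branch)
print(voltages)
```

The resulting branch currents would then be converted to magnitudes and angles to form the initial state variable vector of step 6.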
[Flow chart: the main function runs steps 1 to 13 and calls the subfunctions backward_sweep.m (step 2), forward_sweep.m (steps 3 and 11), covariance_matrix.m (step 4), measurement_vector.m (step 5), Jacobian.m (step 7) and bad_data_detection.m (step 12). The iteration is repeated while the correction ∆x is not below ε. If bad data is detected, it is removed and the estimation is repeated; otherwise the outputs are calculated.]
Figure 4.1.3. State estimation algorithm flow chart.
4.1.3 Inputs and Outputs
This chapter describes the inputs and outputs for the state estimation algorithm core function
state_estimator.m that contains the functionalities described in chapter 4.1.1. Some additional support
functions are needed to format the information read from PSAU and SSAU databases into a format
understood by the state estimation core. Some of these support functions are described in Chapter 4.1.5.
Some are still under development and will be finalized during the algorithm testing phase.
This chapter describes inputs and outputs only for the state estimation block presented in chapter 4.1.2.2.
Inputs and outputs for the whole state estimation procedure have been described in chapter 3.1 and in the
design specification documents [IDE4L2014a] and [IDE4L2014b].
4.1.3.1 Inputs
The state estimation main function state_estimator.m shown in Figure 4.1.3 requires the following inputs:
1) Bus matrix, containing the bus numbers and load and production estimates or measurements for
each bus.
2) Line matrix, containing resistance, reactance and capacitive susceptance for each line section.
3) Line active power flow measurement matrix
4) Line reactive power flow measurement matrix
5) Line current flow measurement matrix
6) Bus current injection measurement matrix
7) Bus voltage measurement matrix.
The bus matrix contains as many rows as there are nodes in the network. The columns of the bus matrix
contain the information shown in table 4.1.2. The state estimation function state_estimator.m calculates
the system state using per unit values. Therefore, the inputs in bus matrix columns 5–12 are given in per
unit.
Table 4.1.2. Bus matrix columns.
Column | Column name | Column description
1 | Bus number | The bus numbering has to start from 1 and the numbering grows with increments of 1 towards the end of the line
2 | Phase A connection | 1 if phase A is connected, 0 if not connected
3 | Phase B connection | 1 if phase B is connected, 0 if not connected
4 | Phase C connection | 1 if phase C is connected, 0 if not connected
5 | 3-phase power | Estimate or measurement for the 3-phase power injection in this node
6 | 3-phase power std | Standard deviation for the estimated or measured 3-phase power injection
7 | Phase A power | Estimate or measurement for phase A power injection
8 | Phase B power | Estimate or measurement for phase B power injection
9 | Phase C power | Estimate or measurement for phase C power injection
10 | Phase A power std | Standard deviation for the estimated or measured power injection in phase A
11 | Phase B power std | Standard deviation for the estimated or measured power injection in phase B
12 | Phase C power std | Standard deviation for the estimated or measured power injection in phase C
The line matrix contains as many rows as there are line sections in the network. The columns of the line
matrix contain the information shown in table 4.1.3. The inputs in line matrix columns 3–5 are given in per
unit.
Table 4.1.3. Line matrix columns.
Column | Column name | Column description
1 | Start node | Node from which the line section starts
2 | End node | Node at which the line section ends
3 | Line resistance | Line resistance in per unit
4 | Line reactance | Line reactance in per unit
5 | Line capacitive susceptance | Line capacitive susceptance in per unit
6 | Phase A connection | 1 if line for phase A exists, 0 if not
7 | Phase B connection | 1 if line for phase B exists, 0 if not
8 | Phase C connection | 1 if line for phase C exists, 0 if not
The line active power, reactive power and current flow matrices contain as many rows as there are
measurements. The columns of these matrices contain the information shown in Table 4.1.4. The inputs in
columns 4–9 are given in per unit.
Table 4.1.4. Flow measurement matrix columns.
Column | Column name | Column description
1 | Start node | Node from which the measured line section starts
2 | End node | Node at which the measured line section ends
3 | Measurement location | Measurement location. If at the beginning of the line section, then Col 3 = Col 1. If at the end of the line section, then Col 3 = Col 2
4 | Phase A measurement | Phase A measurement value in per unit
5 | Phase B measurement | Phase B measurement value in per unit
6 | Phase C measurement | Phase C measurement value in per unit
7 | Phase A measurement std | Standard deviation for phase A measurement
8 | Phase B measurement std | Standard deviation for phase B measurement
9 | Phase C measurement std | Standard deviation for phase C measurement
The bus current injection and voltage measurement matrices contain as many rows as there are
measurements. The columns of these matrices contain the information shown in table 4.1.5. The inputs in
columns 2–7 are given in per unit.
Table 4.1.5. Bus current injection and voltage measurement matrix columns.
Column | Column name | Column description
1 | Bus | Measurement location
2 | Phase A measurement | Phase A measurement value in per unit
3 | Phase B measurement | Phase B measurement value in per unit
4 | Phase C measurement | Phase C measurement value in per unit
5 | Phase A measurement std | Standard deviation for phase A measurement
6 | Phase B measurement std | Standard deviation for phase B measurement
7 | Phase C measurement std | Standard deviation for phase C measurement
The minimum requirement for the state estimation to run is that both the bus and line matrices exist and that bus
power injection values, either load/production estimates or measurements, have been written to the bus
matrix. The load and production estimates or measurements can be either 3-phase or single-phase.
Single-phase estimates or measurements are used primarily if they exist. If single-phase values do not exist, the
phase-wise values are estimated based on the 3-phase values. The standard deviation connected to each
measurement reflects the accuracy of that measurement. During the WLS estimation, the measurements
are weighted according to their accuracies. The other measurement matrices are optional and contain data
only if measurements of that type exist.
4.1.3.2 Outputs
The state estimation function state_estimator.m gives out the following outputs:
1) Phase-wise bus voltages
2) Phase-wise line current flows in the beginning of each line section*
3) Phase-wise line current flows in the middle of each line section*
4) Phase-wise line current flows in the end of each line section*
5) Phase-wise line power flows on each line section
6) Phase-wise power losses on each line section
7) Phase-wise power injections on each bus
* Currents in the beginning, middle and end of the line are slightly different because the lines have been
modelled with a π-model (see Figure 4.1.3).
All these outputs are complex and it is possible to extract voltage and current magnitudes or angles or
active and reactive power components from these. The outputs are given in per unit and they are later
converted into corresponding volt, ampere and kilowatt values.
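The conversion from per unit outputs to volts, amperes and kilowatts can be sketched as below. The base values are not given in this section, so the sketch assumes hypothetical per-phase bases (a phase-to-neutral base voltage and a per-phase base power), from which the base current follows as the ratio of the two. The function name and the numbers are illustrative, and the sketch is in Python rather than Octave.

```python
import cmath
import math


def from_per_unit(v_pu, i_pu, s_pu, v_base, s_base):
    """Convert complex per unit outputs to physical magnitudes.

    v_base is an assumed phase-to-neutral base voltage in volts and
    s_base an assumed per-phase base power in volt-amperes.
    """
    i_base = s_base / v_base  # per-phase base current in amperes
    s_va = s_pu * s_base
    return {
        "voltage_V": abs(v_pu) * v_base,
        "voltage_angle_deg": math.degrees(cmath.phase(v_pu)),
        "current_A": abs(i_pu) * i_base,
        "active_power_kW": s_va.real / 1000.0,
        "reactive_power_kvar": s_va.imag / 1000.0,
    }


# Hypothetical phase values: 0.98 p.u. voltage at -2 degrees, 0.5 p.u. current.
result = from_per_unit(
    v_pu=cmath.rect(0.98, math.radians(-2.0)),
    i_pu=0.5 + 0.0j,
    s_pu=0.45 + 0.10j,
    v_base=230.0,
    s_base=100e3,
)
print(result)
```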
4.1.4 Coordination between MVSE and LVSE
MVSE and LVSE operate independently in PSAU and SSAU (primary and secondary substation automation
units). In normal operation mode, when the measurement setup corresponds to the IDE4L concept and the
secondary substation voltage is measured with an RTU, no information is exchanged directly between the state
estimators.
LVSE outputs the phase-wise power flows on the distribution transformer secondary. The medium voltage
network load and production forecaster (MVF) needs information on primary side power flows. To fill this
gap, a distribution transformer model is fitted between these algorithms. The calculation block containing
the distribution transformer model takes into account the distribution transformer winding configuration
and other parameters and calculates the phase-wise primary side power flows from the phase-wise
secondary side power flows and voltages. Figure 4.1.4 illustrates this relationship.
[Diagram blocks: LVSE, MV/LV transformer model, SSAU database, database synchronization, PSAU database, MVF. Data labels: MV/LV transformer secondary side power flows and voltages; MV/LV transformer primary side power flows; historical MV/LV transformer primary side power flows.]
Figure 4.1.4. Data transfer between LVSE and MVSE in normal operation mode.
If the secondary substation voltage is not measured, the LVSE uses MV/LV network connection point
voltage estimates given by the MVSE. The state estimators become interdependent and their relationship
can be illustrated with Figure 4.1.5.
[Figure 4.1.5: block diagram. Blocks: LVSE, MV/LV transformer model, MVSE, SSAU database. Data exchanged: MV/LV transformer secondary side voltages; MV/LV transformer primary side voltages; MV/LV transformer secondary side power flows.]
Figure 4.1.5. Data transfer between MVSE and LVSE when secondary substation voltage measurement is
missing.
4.1.5 Optional (not implemented) Functionalities
The following functionalities would be possible to add to a WLS estimator.
4.1.5.1 State Estimate Uncertainties
If needed, a WLS estimator can also output the state estimate uncertainties. Uncertainties for the state
variables can be extracted directly from the gain matrix: the state variable variances are the diagonal
elements of the inverted gain matrix.
The calculation of other uncertainties requires some additional work. An additional Jacobian matrix
containing the partial derivatives (with respect to the state variables) of all those quantities for which we wish to
calculate uncertainties must be formed. Using equations 4.1.12 – 4.1.30, it is possible to calculate the partial
derivatives of any branch power flow, branch current flow, node voltage, node current injection or node
power injection. Once the new Jacobian matrix $\mathbf{K}$ has been formed, the uncertainties (variances) can be
calculated as shown in equation 4.1.37 [Cobelo2007].
$$\operatorname{var}(X) = \operatorname{diag}\left(\mathbf{K}\,\mathbf{G}^{-1}\,\mathbf{K}^{T}\right), \tag{4.1.37}$$
where $\mathbf{K}$ is the Jacobian matrix and
$\mathbf{G} = \mathbf{H}^{T}\mathbf{R}^{-1}\mathbf{H}$ is the gain matrix.
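The variance calculation of equation 4.1.37 can be sketched with plain Python lists as shown here. This is a minimal illustration, not the project implementation: the helper names are hypothetical, the measurement covariance matrix R is assumed to be diagonal (given as the list `r_diag` of measurement variances), and the gain matrix is inverted with a simple Gauss-Jordan routine that is only suitable for small examples.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def invert(a):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [1.0 if i == j else 0.0 for j in range(n)]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("gain matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def derived_variances(k_mat, h_mat, r_diag):
    """Return diag(K G^-1 K^T) with G = H^T R^-1 H and R = diag(r_diag)."""
    h_t = transpose(h_mat)
    # Dividing column j of H^T by r_diag[j] gives H^T R^-1 for a diagonal R
    ht_rinv = [[h_t[i][j] / r_diag[j] for j in range(len(r_diag))]
               for i in range(len(h_t))]
    gain = mat_mul(ht_rinv, h_mat)
    cov = mat_mul(mat_mul(k_mat, invert(gain)), transpose(k_mat))
    return [cov[i][i] for i in range(len(cov))]
```

With H equal to the 2×2 identity matrix and measurement variances 0.01 and 0.04, a derived quantity that is the sum of both states (Jacobian row [1, 1]) gets the variance 0.01 + 0.04 = 0.05.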
4.1.5.2 Calculation of Meshed Networks
The branch current based WLS estimator can be modified to also handle (weakly) meshed networks
[Baran1995], [Lin et al. 2001], [Pau2013]. Each mesh adds a constraint on the branch currents because
Kirchhoff's voltage law must be satisfied in every loop. Kirchhoff's voltage law states that the sum of the
voltage drops around the loop must be zero:
$$\sum_{j \in \Lambda} \lambda_j z_j i_j = 0, \tag{4.1.38}$$
where $\Lambda$ is the set of branches forming the loop backbone and $z_j$ and $i_j$ are the impedance and current
phasor of the jth branch. $\lambda_j$ is +1 or −1 depending on the reference loop direction with respect to the
branch j direction. The loop reference direction can be chosen arbitrarily. The branch j direction is the
downward direction of the branch when the network is radial. Figure 4.1.6 visualizes this.
[Figure 4.1.6: diagram of a network loop showing the chosen reference direction and the loop break point.]
Figure 4.1.6. Example of λ values with respect to a chosen reference direction.
The conditions presented above can be added to the WLS problem either as virtual measurements or as
equality constraints. The left-hand side of equation 4.1.38 is complex, and both its real and imaginary parts
must be zero. This gives two measurement equations, one for the real part (4.1.39) and one for the
imaginary part (4.1.40). These are
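$$c^{(1)} = \sum_{j \in \Lambda} \lambda_j |z_j| |i_j| \cos(\alpha_j + \theta_j) \tag{4.1.39}$$
$$c^{(2)} = \sum_{j \in \Lambda} \lambda_j |z_j| |i_j| \sin(\alpha_j + \theta_j) \tag{4.1.40}$$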
where $z_j$ is the line j impedance and $i_j$ is the line j current. $\alpha_j$ and $\theta_j$ are the line current and impedance
angles, respectively. The corresponding Jacobian entries (partial derivatives with respect to the state
variables $|i_j|$ and $\alpha_j$) for equation 4.1.39 are
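$$\frac{\partial c^{(1)}}{\partial |i_j|} = \begin{cases} \lambda_j |z_j| \cos(\alpha_j + \theta_j), & \text{if } j \in \Lambda \\ 0, & \text{if } j \notin \Lambda \end{cases} \tag{4.1.41}$$
$$\frac{\partial c^{(1)}}{\partial \alpha_j} = \begin{cases} -\lambda_j |i_j| |z_j| \sin(\alpha_j + \theta_j), & \text{if } j \in \Lambda \\ 0, & \text{if } j \notin \Lambda \end{cases} \tag{4.1.42}$$
and the Jacobian entries corresponding to equation 4.1.40 are
$$\frac{\partial c^{(2)}}{\partial |i_j|} = \begin{cases} \lambda_j |z_j| \sin(\alpha_j + \theta_j), & \text{if } j \in \Lambda \\ 0, & \text{if } j \notin \Lambda \end{cases} \tag{4.1.43}$$
$$\frac{\partial c^{(2)}}{\partial \alpha_j} = \begin{cases} \lambda_j |i_j| |z_j| \cos(\alpha_j + \theta_j), & \text{if } j \in \Lambda \\ 0, & \text{if } j \notin \Lambda \end{cases} \tag{4.1.44}$$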
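The loop constraint equations 4.1.39 and 4.1.40 and their Jacobian entries 4.1.41 to 4.1.44 can be evaluated as in the following sketch. The data layout (one tuple `(lam, z, i_mag, alpha)` per branch in the loop) and the function names are assumptions made for this illustration; branches outside the loop simply do not appear in the list, which corresponds to the zero entries of the Jacobian.

```python
import cmath
import math


def loop_constraint(branches):
    """Return (c1, c2) for one loop.

    branches: list of (lam, z, i_mag, alpha), where lam is +1 or -1, z is the
    complex branch impedance, and i_mag and alpha are the branch current
    magnitude and angle (the state variables).
    """
    c1 = sum(lam * abs(z) * i_mag * math.cos(alpha + cmath.phase(z))
             for lam, z, i_mag, alpha in branches)
    c2 = sum(lam * abs(z) * i_mag * math.sin(alpha + cmath.phase(z))
             for lam, z, i_mag, alpha in branches)
    return c1, c2


def loop_jacobian(branches):
    """Return one tuple (dc1/d|i|, dc1/dalpha, dc2/d|i|, dc2/dalpha) per branch."""
    rows = []
    for lam, z, i_mag, alpha in branches:
        z_mag = abs(z)
        ang = alpha + cmath.phase(z)
        rows.append((lam * z_mag * math.cos(ang),            # (4.1.41)
                     -lam * i_mag * z_mag * math.sin(ang),   # (4.1.42)
                     lam * z_mag * math.sin(ang),            # (4.1.43)
                     lam * i_mag * z_mag * math.cos(ang)))   # (4.1.44)
    return rows
```

The pair (c1, c2) equals the real and imaginary parts of the complex sum in equation 4.1.38, which gives a simple way to check an implementation.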
When the network voltages are calculated, a temporary break point is added to each loop, so that radial
forward-sweep can be used to calculate the voltages.
4.1.6 Performance tests
First the validity of power flow equations used in the developed state estimator is verified by comparing it
to the power flow calculation algorithm found in the Power System Toolbox [PST2015]. The Power System
Toolbox was conceived and initially developed by Dr. Kwok W. Cheung and Prof. Joe Chow from Rensselaer
Polytechnic Institute in the early 1990s. From 1993 to 2009, it was marketed, and further developed, by
Graham Rogers (formerly Cherry Tree Scientific Software), and is in use by utilities, consultants and
universities worldwide. In this comparison the state estimator is used to calculate only the power flow by
omitting all measurements.
A modified IEEE 37-bus test feeder was used in this comparison. The modifications included:
- A single-phase equivalent of the original three-phase network was calculated.
- All unbalanced loads were changed into balanced loads.
- The voltage regulator between nodes 799 and 701 was removed.
- The unloaded transformer connected to node 709 and the unloaded node 775 were removed, and
the network was reduced to 36 buses.
The one-line diagram of the modified test feeder is shown in Figure 4.1.7.
Figure 4.1.7. One-line diagram of the modified test feeder.
The node voltages calculated with both PST and the state estimator (SE) are shown in Figure 4.1.8. The
differences in the calculated voltages are so small that they cannot be seen in this figure. The average
difference in the calculated node voltages was only 3.0410∙10^-8 and the maximum difference was 3.9154∙10^-8. The
dedicated load flow algorithm was approximately 10 times faster in load flow calculation than the state
estimator. When repeated 100 times, the average calculation times with this 36-bus test feeder were 2.5
milliseconds for PST and 23 milliseconds for the state estimator.
Figure 4.1.8. Comparison between PST load flow and developed state estimator. Axes: node (horizontal), voltage in p.u. (vertical); series: PST and SE.
The same 36-bus test feeder was used to compare the state estimation accuracy of the developed method
to a reference method. The commercially used state estimation method described in chapter 2.1.2 of the
state of the art document [IDE4L2014c] was used as a reference. The loads were assumed to be normally
distributed: the loads in area 1 were assumed to have a 50 % relative standard deviation and the loads in area 2
a 20 % relative standard deviation. A Monte Carlo simulation with 1000 repetitions was
run. Each repetition contained the following steps:
1) Draw random values for the normally distributed loads
2) Calculate load flow
3) Extract feeder active and reactive power flows (at the beginning of the feeder) from the load flow
results and use these as real-time measurements in state estimation
4) Add random measurement errors corresponding to ±1 % measurement accuracy to the power flow
measurements
5) Calculate state estimation with both studied methods
6) Calculate the difference between estimated node voltages and voltages calculated in step 2.
The comparison results are shown in Figure 4.1.9. The developed state estimator has 24 % smaller
estimation errors. With this measurement setup, also the reference method could achieve this same
estimation accuracy if the simple modifications proposed in chapter 2.1.3 of the state of the art document
[IDE4L2014c] were to be implemented. However, even then the reference method could not utilize
measurements as effectively as the developed WLS-based estimation method. The developed state
estimator can for example use voltage measurements at the end of the network to improve load and
voltage estimates also in other parts of the network. Figure 4.1.10a shows how voltage measurements at
nodes 11, 30 and 36 affect the voltage estimation accuracy. The benefit of voltage measurements
depends heavily on the voltage measurement accuracy. Figure 4.1.10b shows that state estimation
accuracy improves also when the load pseudo measurement accuracy is improved. Therefore, it is
important that the load and production forecaster is able to provide accurate load and production
estimates.
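Figure 4.1.9. Comparison between the reference method and the developed state estimation method. Axes: node (horizontal), average voltage estimation error in % (vertical); series: existing SE method and proposed SE method.
Figure 4.1.10. Estimation accuracy a) with different voltage measurement accuracies and b) with improved pseudo measurement accuracies. Axes in both panels: node (horizontal), average voltage estimation error in % (vertical). Series in a): base case and ±2 %, ±1 %, ±0.5 % and ±0.2 % accuracy. Series in b): base case, -25 % RSD and -50 % RSD.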
4.2 State Forecasting
The same algorithm that was used for state estimation is also used for state forecasting. When used for
state forecasting, the algorithm inputs are load and production forecasts instead of estimates and
measurements. The state estimator and state forecaster use the same algorithm core, but they have two
fundamental differences.
1) According to the IDE4L concept, only load and production forecasts will be made. This means that
the only inputs to the state forecaster are the bus and line matrices, where the bus matrix contains the
load and production forecasts and the line matrix contains the line parameters. The optional
measurement matrices are empty. The measurement redundancy is zero, and what the state
forecaster actually calculates is just a simple load flow calculation. At this level, the state forecasts
could be calculated with any power flow algorithm, but by using the algorithm described in chapter
4.1 we reserve the possibility to improve the state forecasting accuracy with redundant
(overlapping) forecasts. Also, with this method, it is possible to extract uncertainties for the
forecasted states.
2) The state forecasting is done for N different future time spans in the forecasting horizon. Basically,
the state forecasting is repeated N times with different inputs, where each input forecasts the load
and production on that specific time span. See Figure 3.1.1 to observe how the state estimator and
forecaster are connected and how the N different state forecasts are calculated.
The state forecaster input structure is similar to the state estimator and also the outputs are the same that
were presented in chapter 4.1.3.2.
4.3 Load Forecasting
For every node in the network at time t, the forecasting algorithm can provide a prediction of the load at
time t+1. In order to model the behaviour of the load over time, and thus be able to estimate the future
load, three kinds of information are used: historical measurements, calendar information and
meteorological forecasts. This procedure is extended to deliver h predictions into the future, with two
different resolutions, up to the forecasting horizon t+h.
The IDE4L project will follow two approaches for forecasting the load demand. Which one of the
approaches will be selected will depend on the quality of the recorded measurements of the consumer load
that is to be forecasted. In case the latest measurements are not available in the database, the future load
demand is estimated by querying a lookup table, containing a predefined load baseline. If every needed
measurement is stored in the data exchange platform, past measurements are used to infer the load
demand in the future. This load-forecasting algorithm has two separate parts. The training algorithm fits
the model based on historical data. This procedure is executed offline. The second part of the algorithm is
executed online, at every time step and for every node in the network. The forecaster is able to provide
several predictions in the future up to a forecasting horizon by recursively adding the new forecasts to the
feature vector.
Load forecasting for medium voltage and low voltage follows the same procedure, unless otherwise
indicated in the remainder of this section.
4.3.1 Real Time Measurement Reading and Filtering
The implementation of this algorithm will account for errors in the data collection step. Real
implementations of smart meter networks are subject to errors in measurements due to communication
failures, corruption of the data or temporary unavailability of the meter.
New and historic measurements are read by the forecasting algorithm from the low voltage data exchange
platform. New measurements are filtered based on the statistical properties of the training data set. The
training data set must comprise a sufficient range of historic values with the appropriate resolution. Two
issues might arise when querying the database for the most recent measurement. The first one is the lack
of such a measurement due to communication failures, data corruption or meter malfunction. In this case
the measurement is substituted by the last forecasted value for the particular moment. The second issue is
erroneous measurements. Based on the mentioned training data set, any new measurement not contained
within five times the standard deviation of the time series is disregarded, and the last forecast for
that particular time is used in its place. This procedure introduces uncertainty into the forecast and degrades its
behaviour. However, it allows for a forecast to be made in every situation. If the number of missing values
is above a predefined threshold the algorithm will automatically switch to the first scheme described in this
section, and future load demands will be estimated based on the predefined load baselines.
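The filtering rules described above can be sketched as follows. This is a minimal illustration under stated assumptions: a missing measurement is represented by `None`, "within five times the standard deviation" is interpreted as lying within five standard deviations of the training set mean, and the function names and the `max_replaced` threshold are hypothetical.

```python
import statistics


def filter_measurement(new_value, last_forecast, training_values, n_sigma=5.0):
    """Return (value_to_use, was_replaced) for the most recent measurement."""
    if new_value is None:
        # Missing measurement: fall back to the last forecast for this moment
        return last_forecast, True
    mean = statistics.fmean(training_values)
    std = statistics.pstdev(training_values)
    if abs(new_value - mean) > n_sigma * std:
        # Assumed outlier rule: more than n_sigma standard deviations from the mean
        return last_forecast, True
    return new_value, False


def should_use_baseline(replaced_flags, max_replaced):
    """True when too many recent values were replaced and the baseline scheme is needed."""
    return sum(replaced_flags) > max_replaced
```

Keeping the replacement flag alongside the value makes it possible to count how many inputs of the feature vector are substitutes rather than measurements.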
4.3.2 Load Baselines
Nodes with a similar load demand profile are grouped together, in order to create a load baseline that best
defines their behaviour. Similar nodes are clustered using the K-means algorithm.
The K-means algorithm clusters data by trying to separate the samples into K groups of equal variance,
minimizing a criterion known as the within-cluster sum of squares, which is a measure of how internally
coherent the clusters are:
$$\sum_{i=1}^{N} \min_{\mu_j \in C} \left( \lVert x_i - \mu_j \rVert^2 \right) \tag{4.3.1}$$
The K-means algorithm divides a set of N samples X into K disjoint clusters C, each described by the mean
$\mu_j$ of the samples in the cluster. These means are known as centroids.
The K-means algorithm used in this implementation randomly selects k samples from the training data set
as initial centroids. It then assigns each sample to the nearest of the k centroids and creates new centroids by
taking the mean value of all of the samples assigned to each previous centroid. The difference between the
old and the new centroids is computed, and the algorithm repeats these last two steps until this difference is
less than a threshold.
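The iteration described above can be sketched in plain Python as follows. This is not the project code: the function names, the squared Euclidean distance, the summed centroid shift used as the stopping measure, the default tolerance and the handling of empty clusters are all assumptions made for this illustration.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(samples, k, tol=1e-6, max_iter=100, seed=0):
    """Cluster samples (a list of equal-length lists); return (centroids, labels)."""
    rng = random.Random(seed)
    # Randomly select k samples as the initial centroids
    centroids = [list(s) for s in rng.sample(samples, k)]
    labels = []
    for _ in range(max_iter):
        # Assign each sample to its nearest centroid
        labels = [min(range(k), key=lambda j: sq_dist(s, centroids[j]))
                  for s in samples]
        # Recompute each centroid as the mean of its assigned samples
        new_centroids = []
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        shift = sum(sq_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, labels
```

In the load baseline application, each sample would be the load profile of one node, so that nodes with similar profiles end up with the same label.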
Once every node has been assigned to one of the k clusters, a lookup table is created for each cluster by averaging all
measurements from all samples in that cluster having the same time of the day and day of the week. The
length of this look-up table depends on the resolution of the measurements. As an example, a meter
recording measurements once every hour would produce a load baseline of length 168 (24 hours × 7 days).
Figure 4.3.1 shows the average load demand over a week of the consumers in a network, clustered into 15
separate groups. Consumers with similar load demand patterns are grouped together in order to create a load
profile that can be used to estimate future load demands for any of the members of that group.
Figure 4.3.1. Average load grouped by hour of the week. For this plot, weeks begin on Monday and end on Sunday. The shadowed area represents two times the standard deviation. The active energy is expressed in kWh.
Measurements in the historic time series are grouped based on the hour of the day (seconds since
midnight) and the day of the week. Future loads based on this load baseline can be estimated by querying
the resulting look-up table. The number of predictions to be queried would then depend on the forecasting
horizon. For the very short-term approach, three predictions are made, with 10-minute resolution up to a
forecasting horizon of 30 minutes. For the short-term approach, 24 or 48 predictions are made, up to a
horizon of 24h or 48h.
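Building and querying such a look-up table can be sketched as shown below. The key layout `(day of week, seconds since midnight)`, the function names and the use of Python `datetime` objects are assumptions for this example; the sketch also assumes that the table contains an entry for every queried time slot.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def slot_key(ts):
    """Key of a time slot: (day of week 0..6, seconds since midnight)."""
    return ts.weekday(), ts.hour * 3600 + ts.minute * 60 + ts.second


def build_baseline(measurements):
    """measurements: iterable of (timestamp, value) for all nodes of one cluster."""
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in measurements:
        entry = sums[slot_key(ts)]
        entry[0] += value
        entry[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def baseline_forecast(baseline, now, steps, resolution):
    """Query the table for the next `steps` time slots after `now`."""
    return [baseline[slot_key(now + h * resolution)] for h in range(1, steps + 1)]
```

For the very short-term approach, the query would use `steps=3` and a resolution of `timedelta(minutes=10)`; for the short-term approach, 24 or 48 steps at the resolution of the stored measurements.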
4.3.3 Short Term Model Training
The methodology used to predict the future load demand is based on training linear models with an array
of specially designed features. This vector comprises historic load measurements, calendar variables, such
as time of day and day of week, historic temperatures and forecasted temperatures.
The demand at time t is expressed as a non-parametric additive model
$$\hat{y}_t = c(t) + l(y_t) + m(T_t) + \varepsilon_t \tag{4.3.2}$$
where:
- $\hat{y}_t$ is the predicted demand at time t.
- $c(t)$ is the contextual information at time t. This information contains calendar effects, such as the hour of the day and the day of the week.
- $l(y_t)$ is a series of recent demand measurements, going backwards from t-1.
- $m(T_t)$ models the forecasted meteorological conditions at time t, as well as recent measurements, going backwards from t-1.
- $\varepsilon_t$ is the model error at time t.
The term $\varepsilon_t$ integrates all errors, explaining the difference between predicted and observed values of the
time series. These differences are due to process fluctuations, measurement errors, and model
misspecifications.
An additional term 𝑥𝑑 may be added to take into account the flexible demand schedule. This new term is
the desired load. A separate model has to be trained with this expanded feature vector. The fit of this
model is only possible if there are enough samples to constitute a training data set.
Meteorological factors are widely considered to have an influence on the active load demand. The feature
vector contains a varying number of historic temperature measurements, as well as the forecasted
temperature as provided by a third party service. The historic temperature measurements are denoted
$T_k$, where $T_k$ is the temperature k steps behind. The meteorological section of the feature
vector is $x_m(n) = \{T_1, \ldots, T_n\}$. Only the most influential variable, temperature, has been included in
the results, as no benefit was found from including the other meteorological variables in the feature vector.
Load follows a distinctive pattern throughout the day and also within the same week. Two calendar
features are included into the feature vector. The calendar section of the feature vector is 𝑥𝑐 = {𝑑, 𝑤},
where d is the hour of the day, encoded as seconds from midnight, and w is the day of the week, encoded
as a number between 0 and 6.
Arguably, the most important regressor of a time series is its own history. The feature
vector includes the historic active energy, expressed in kWh, as $x_y(l) = \{y_1, \ldots, y_l\}$, where $y_k$ is the load k
steps in the past.
The final feature vector is $x = x_y(l) \,\|\, x_c \,\|\, x_m(n)$, where $\|$ designates the concatenation operator. This
feature vector has two autoregressive parts: the latest l load measurements and the latest n temperature
values. The alternative feature vector, designed to take into account the demand schedule, is
$x = x_y(l) \,\|\, x_c \,\|\, x_m(n) \,\|\, x_d$.
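The concatenation of the load, calendar, temperature and optional demand schedule sections can be sketched as follows. The function name, the argument layout (histories ordered from oldest to newest) and the default lag counts are assumptions made for this illustration.

```python
from datetime import datetime


def build_feature_vector(loads, timestamp, temperatures=(), l=3, n=0, desired_load=None):
    """Return x = x_y(l) || x_c || x_m(n) || x_d as a flat list.

    loads and temperatures are ordered from oldest to newest, so that
    loads[-1] is y_1 (one step in the past) and loads[-l] is y_l.
    """
    if len(loads) < l or len(temperatures) < n:
        raise ValueError("not enough history for the requested number of lags")
    x_y = [loads[-k] for k in range(1, l + 1)]
    seconds = timestamp.hour * 3600 + timestamp.minute * 60 + timestamp.second
    x_c = [seconds, timestamp.weekday()]  # d (seconds from midnight), w (0..6)
    x_m = [temperatures[-k] for k in range(1, n + 1)]
    x_d = [] if desired_load is None else [desired_load]
    return x_y + x_c + x_m + x_d
```

Calling the function with `n=0` and no temperatures gives a vector without the meteorological section, and passing `desired_load` gives the alternative vector with the demand schedule term appended.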
The feature vector is then standardized, so that the individual features have zero mean and unit variance
and approximate a standard Gaussian distribution. The data is transformed by subtracting the mean value
$\mu_{tr}(x) = \{\mu_{x(1)}, \ldots, \mu_{x(p)}\}$ of each feature and then dividing the non-constant features by their standard
deviation $\sigma_{tr} = \{\sigma_1, \ldots, \sigma_p\}$, where $\mu_{tr}$ is the average value of each individual feature in the
training set and $\sigma_{tr}$ is the standard deviation of each individual feature in the training set.
The target variable $y$ follows the same standardization procedure: its mean value $\mu_{tr}(y)$ in the training
set is subtracted, and the result is divided by its standard deviation $\sigma_{tr}(y)$ in the training set.
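Standardization with parameters learned from the training set can be sketched as follows. The function names are hypothetical, and the population standard deviation is an assumption of this example; constant features (zero standard deviation) are only centred, in line with the description above.

```python
import statistics


def fit_scaler(rows):
    """Return per-feature (means, stds) computed from the training feature vectors."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds


def transform(row, means, stds):
    """Subtract the training mean and divide non-constant features by their std."""
    return [(v - m) / s if s > 0 else v - m for v, m, s in zip(row, means, stds)]


def inverse_transform_target(y_scaled, mean_y, std_y):
    """Undo the target standardization: multiply by the std, then add the mean."""
    return y_scaled * std_y + mean_y
```

The same `means` and `stds` would have to be stored with the fitted model, because any feature vector built later must be scaled with the training set parameters rather than with its own statistics.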
Assuming the future load can be expressed as a linear combination of the input values, the forecaster can be expressed as
$$\hat{y}(w, x) = w_0 + w_1 x_1 + \ldots + w_n x_n \tag{4.3.3}$$
where $\hat{y}(w, x)$ is the predicted value, $x = (x_1, \ldots, x_n)$ is the feature vector, made from n inputs,
$w = (w_1, \ldots, w_n)$ are its coefficients and $w_0$ is the intercept.
Ordinary Least Squares (OLS) fits a linear model with coefficients $w = (w_1, \ldots, w_n)$ in order to
minimize the residual sum of squares between the loads observed in the training set and the loads predicted by the linear model, expressed as
$$\min_{w} \lVert \hat{y}(w, x) - y \rVert_2^2 \tag{4.3.4}$$
where $\hat{y}(w, x)$ is the predicted value and y is the real load value. The fitted linear approximation is then used to
predict future data.
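A least-squares fit of the linear model can be sketched without external libraries by solving the normal equations. This is only an illustration of the technique: the function names are hypothetical, and solving the normal equations by Gaussian elimination is an assumption of this example rather than the method used in the project; a numerical library would normally be preferred for larger or ill-conditioned problems.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(features, targets):
    """Return [w0, w1, ..., wn] minimizing the residual sum of squares."""
    a = [[1.0] + list(row) for row in features]  # column of ones for the intercept w0
    cols = len(a[0])
    ata = [[sum(r[i] * r[j] for r in a) for j in range(cols)] for i in range(cols)]
    aty = [sum(r[i] * t for r, t in zip(a, targets)) for i in range(cols)]
    return solve(ata, aty)


def predict(w, x):
    """Evaluate w0 + w1*x1 + ... + wn*xn."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
```

In the forecasting context, `features` would hold the standardized training feature vectors and `targets` the standardized loads, with rows containing missing values removed beforehand.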
The training procedure assumes that every feature vector is complete. That is, the training set of feature
vectors have been pruned of samples with missing values.
The training algorithm may be called at any time and will provide an updated model of the time series. How
often this update is necessary, if at all, will depend on the variability of the time series and on the quality of
the recorded measurements. This update is an offline process and should not disrupt the forecasting
algorithm, which would keep using the current model until a new one is available.
4.3.4 Very Short Term Model Training
The methodology used to predict the future load demand is based on training linear models with an array
of specially designed features. This vector comprises historic load measurements and calendar variables,
such as time of day and day of week.
The demand at time t is expressed as a non-parametric additive model
$$\hat{y}_t = c(t) + l(y_t) + \varepsilon_t \tag{4.3.5}$$
where:
- $\hat{y}_t$ is the predicted demand at time t.
- $c(t)$ is the contextual information at time t. This information contains calendar effects, such as the hour of the day and the day of the week.
- $l(y_t)$ is a series of recent demand measurements, going backwards from t-1.
- $\varepsilon_t$ is the model error at time t.
The term $\varepsilon_t$ integrates all errors, explaining the difference between predicted and observed values of the
time series. These differences are due to process fluctuations, measurement errors, and model
misspecifications.
An additional term 𝑥𝑑 may be added to take into account the flexible demand schedule. This new term is
the desired load. A separate model has to be trained with this expanded feature vector. The fit of this
model is only possible if there are enough samples to constitute a training data set.
Load follows a distinctive pattern throughout the day and also within the same week. Two calendar
features are included into the feature vector. The calendar section of the feature vector is 𝑥𝑐 = {𝑑, 𝑤},
where d is the hour of the day, encoded as seconds from midnight, and w is the day of the week, encoded
as a number between 0 and 6.
Arguably, the most important regressor of a time series is its own history. The feature
vector includes the historic active energy, expressed in kWh, as $x_y(l) = \{y_1, \ldots, y_l\}$, where $y_k$ is the load k
steps in the past.
The final feature vector is $x = x_y(l) \,\|\, x_c$, where $\|$ designates the concatenation operator. This feature
vector has one autoregressive part: the latest l load measurements. The alternative feature vector, designed to
take into account the demand schedule, is $x = x_y(l) \,\|\, x_c \,\|\, x_d$.
The feature vector is then standardized, so that the individual features have zero mean and unit variance
and approximate a standard Gaussian distribution. The data is transformed by subtracting the mean value
$\mu_{tr}(x) = \{\mu_{x(1)}, \ldots, \mu_{x(p)}\}$ of each feature and then dividing the non-constant features by their standard
deviation $\sigma_{tr} = \{\sigma_1, \ldots, \sigma_p\}$, where $\mu_{tr}$ is the average value of each individual feature in the
training set and $\sigma_{tr}$ is the standard deviation of each individual feature in the training set.
The target variable $y$ follows the same standardization procedure: its mean value $\mu_{tr}(y)$ in the training
set is subtracted, and the result is divided by its standard deviation $\sigma_{tr}(y)$ in the training set.
Assuming the future load can be expressed as a linear combination of the input values, the forecaster can be expressed as
$$\hat{y}(w, x) = w_0 + w_1 x_1 + \ldots + w_n x_n \tag{4.3.6}$$
where $\hat{y}(w, x)$ is the predicted value, $x = (x_1, \ldots, x_n)$ is the feature vector, made from n inputs,
$w = (w_1, \ldots, w_n)$ are its coefficients and $w_0$ is the intercept.
Ordinary Least Squares (OLS) fits a linear model with coefficients $w = (w_1, \ldots, w_n)$ in order to minimize the residual sum of squares between the loads observed in the training set and the loads predicted by the linear model, expressed as
$$\min_{w} \lVert \hat{y}(w, x) - y \rVert_2^2 \tag{4.3.7}$$
where $\hat{y}(w, x)$ is the predicted value and y is the real load value. The fitted linear approximation is then used to
predict future data.
The training procedure assumes that every feature vector is complete. That is, the training set of feature
vectors have been pruned of samples with missing values.
The training algorithm may be called at any time and will provide an updated model of the time series. How
often this update is necessary, if at all, will depend on the variability of the time series and on the quality of
the recorded measurements. This update is an offline process and should not disrupt the forecasting
algorithm, which would keep using the current model until a new one is available.
4.3.5 Forecasting
The short-term and the very short-term forecasting follow the same procedure. The only
difference is the kind of information used to create the feature vector. The short-term feature vector
comprises historic load measurements, calendar variables, such as time of day and day of week, historic
temperatures and forecasted temperatures. The very short-term feature vector comprises historic load
measurements and calendar variables, such as time of day and day of week. As an optional input to the
algorithm, the feature vector may be expanded with the flexible demand. If it is available, the forecasting
algorithm will use the alternative model to provide the estimates.
The online forecast algorithm retrieves the latest measurements from the database in order to create a
feature vector following the same steps as in the training algorithm. This new feature vector needs to be
scaled using the same parameters that were used for the training data set: the mean value
$\mu_{tr}(x) = \{\mu_{x(1)}, \ldots, \mu_{x(p)}\}$ of each feature is subtracted, and the non-constant features are divided
by their standard deviation $\sigma_{tr} = \{\sigma_1, \ldots, \sigma_p\}$, where $\mu_{tr}$ is the average value of each
individual feature in the training set and $\sigma_{tr}$ is the standard deviation of each individual feature in the
training set.
The target variable $y$ follows the same standardization procedure, using its mean value $\mu_{tr}(y)$ and its
standard deviation $\sigma_{tr}(y)$ in the training set.
The estimated load at time t is the linear combination of the input feature vector x weighted by the fitted coefficients w:
$$\hat{y}(w, x) = w_0 + w_1 x_1 + \ldots + w_n x_n \tag{4.3.8}$$
where $\hat{y}(w, x)$ is the predicted value, $x = (x_1, \ldots, x_n)$ is the feature vector, made from n inputs,
$w = (w_1, \ldots, w_n)$ are its coefficients and $w_0$ is the intercept.
If the forecasting horizon is longer than one step, the algorithm is called recursively. Instead of using only historic
measurements, at least part of $x_y$ will then be a forecast of the load. For instance, the
load at time t+2 forecasted at time t will have the autoregressive part of its feature vector as
$x_y(l) = \{\hat{y}_{t+1}, y_t, \ldots, y_{l-1}\}$. For the very short-term approach, three predictions are made, with 10-minute
resolution up to a forecasting horizon of 30 minutes. For the short-term approach, 24 or 48 predictions are
made, up to a horizon of 24 h or 48 h.
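The recursive use of earlier forecasts can be sketched as follows. The function name and the argument layout are assumptions made for this illustration: `predict_one` stands for any fitted one-step model, `recent_loads` holds the load lags ordered from newest to oldest, and `calendar_features` holds one calendar section per future step. All values are assumed to be in the same (scaled) units as the training data.

```python
def recursive_forecast(predict_one, recent_loads, calendar_features, horizon):
    """Return `horizon` forecasts, feeding each forecast back as the newest lag."""
    history = list(recent_loads)  # [y_1, ..., y_l], newest first
    forecasts = []
    for step in range(horizon):
        x = history + list(calendar_features[step])
        y_hat = predict_one(x)
        forecasts.append(y_hat)
        # The new forecast becomes the most recent lag; the oldest lag drops out
        history = [y_hat] + history[:-1]
    return forecasts
```

After l steps, the autoregressive part of the feature vector consists entirely of forecasts, which is one reason why the forecast error tends to grow with the horizon.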
The final forecast is then descaled by multiplying it by the standard deviation $\sigma_{tr}(y)$ of the target variable and then
adding its mean value $\mu_{tr}(y)$, where $\mu_{tr}(y)$ is the average value and $\sigma_{tr}(y)$ is the standard deviation
of the target variable in the training set. This reverses the standardization applied to the target variable.
4.4 Production Forecasting
For every node in the distributed production network at time t, the production forecasting algorithm can
provide a prediction of its production at time t+1. In order to model the behaviour of the production over
time, and thus be able to estimate the future production, three kinds of information are used: historical
measurements, calendar information and meteorological information. This procedure is extended to
deliver h predictions into the future, with two different resolutions, up to the forecasting horizon t+h.
The IDE4L project will follow two approaches for forecasting the distributed production. Which one of the
approaches will be selected will depend on the quality of the recorded measurements and the availability
of accurate meteorological forecasts. In case the latest measurements are not available in the database,
the future production is estimated by querying a lookup table, containing a predefined production baseline.
If every needed measurement is stored in the data exchange platform, past measurements are used to
infer the production in the future. This production-forecasting algorithm has two separate parts. The
training algorithm fits the model based on historical data. This procedure is executed offline. The second
part of the algorithm is executed online, at every time step and for every node in the network. The
forecaster is able to provide several predictions in the future up to a forecasting horizon by recursively
adding the new forecasts to the feature vector.
Distributed production forecasting for medium voltage and low voltage follows the same procedure, unless
otherwise indicated in the remainder of this section.
4.4.1 Real Time Measurement Reading and Filtering
The implementation of this algorithm will account for errors in the data collection step. Real
implementations of smart meter networks are subject to errors in measurements due to communication
failures, corruption of the data or temporary unavailability of the meter.
New and historic measurements are read by the forecasting algorithm from the low voltage data exchange
platform. New measurements are filtered based on the statistical properties of the training dataset. The
training dataset must comprise a sufficient range of historic values with the appropriate resolution. Two
issues might arise when querying the database for the most recent measurement. The first one is the lack
of such a measurement due to communication failure, data corruption or meter malfunction. In this case
the measurement is substituted by the last forecasted value for the particular moment. The second issue is
erroneous measurements. Based on the mentioned training dataset, any new measurement not contained
within five times the standard deviation of the time series is disregarded and replaced by the last forecast for
that particular time. This procedure introduces uncertainty in the forecast and degrades its
behaviour. However, it allows for a forecast to be made in every situation. If the number of missing values
is above a predefined threshold the algorithm will automatically switch to the first scheme described in this
section, and future distributed production will be estimated based on the predefined production baselines.
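The filtering rule described above can be sketched in Python as follows. This is a minimal illustration, not the project implementation. It assumes that the five-standard-deviation band is centred on the mean of the training dataset, which the text does not state explicitly. The function names, the `n_sigma` argument and the `max_substituted` threshold are hypothetical.

```python
import math


def is_missing(value):
    """A reading counts as missing when it is None or NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def filter_measurement(measurement, last_forecast, train_mean, train_std, n_sigma=5.0):
    """Return (value, substituted).

    Missing readings, and readings further than n_sigma standard deviations
    from the training mean (assumed band centre), are replaced by the last
    forecast made for the same time stamp.
    """
    if is_missing(measurement) or abs(measurement - train_mean) > n_sigma * train_std:
        return last_forecast, True
    return measurement, False


def should_use_baseline(substituted_flags, max_substituted):
    """Switch to the production baseline when too many values were substituted."""
    return sum(substituted_flags) > max_substituted
```

For example, with a training mean of 1.0 kW and a standard deviation of 0.5 kW, a reading of 10.0 kW falls outside the band and is replaced by the last forecast, while a reading of 1.2 kW is kept.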
4.4.2 Production Baselines
Nodes with a similar production profile are grouped together, in order to create a production baseline that
best defines their behaviour. Similar nodes are clustered using the K-means algorithm.
The K-means algorithm clusters data by separating the samples into K groups of equal variance,
minimizing a criterion known as the within-cluster sum-of-squares, which is a measure of how internally
coherent the clusters are.
$$\sum_{i=0}^{n} \min_{\mu_j \in C} \left( \lVert x_i - \mu_j \rVert^2 \right) \tag{4.4.1}$$
The K-means algorithm divides a set of N samples X into K disjoint clusters C, each described by the mean
𝜇𝑗 of the samples in the cluster, known as its centroid.
The K-means algorithm used in this implementation randomly selects K samples from the training dataset
as centroids. It then assigns each sample to the nearest of the K centroids. It then creates new centroids by
taking the mean value of all of the samples assigned to each previous centroid. The difference between the
old and the new centroids is computed and the algorithm repeats these last two steps until this value is
less than a threshold.
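The iteration described above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in the project: the names `kmeans`, `tol`, `max_iter` and `seed` are hypothetical, the centroid shift is measured with the Euclidean norm, and a cluster that loses all of its samples simply keeps its previous centroid.

```python
import numpy as np


def kmeans(X, k, tol=1e-6, max_iter=100, seed=0):
    """Cluster the rows of X (shape N x d) into k groups.

    Returns (labels, centroids).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initial centroids: k distinct samples drawn at random from the data.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()

    def assign(cents):
        # Distance of every sample to every centroid, shape (N, k).
        dists = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(max_iter):
        labels = assign(centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # assumption: empty clusters keep their centroid
                new_centroids[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # Final assignment against the converged centroids.
    return assign(centroids), centroids
```

Each row of `X` would typically hold the production profile of one node, so that nodes with similar profiles receive the same label.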
Once every node has been assigned to one of the K clusters, a lookup table is created for each cluster by averaging all
measurements from all nodes in the cluster having the same time of the day and day of the week. The
length of this look-up table depends on the resolution of the measurements. As an example, a meter
recording measurements once every hour would produce a production baseline of length 168.
Measurements in the historic time series are grouped based on the hour of the day (seconds since
midnight) and the day of the week. Future production based on this production baseline can be estimated
by querying the resulting look-up table. For the very short-term approach, three predictions are made, with
10-minute resolution up to a forecasting horizon of 30 minutes. For the short-term approach, 24 or 48
predictions are made, up to a horizon of 24h or 48h.
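A minimal Python sketch of the look-up table follows. It assumes that the measurements of all nodes in one cluster are available as `(timestamp, value)` pairs. The function names and the dictionary layout keyed by `(weekday, seconds since midnight)` are illustrative choices, not taken from the project.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def slot(ts):
    """Key of a time stamp: (day of week, seconds since midnight)."""
    return ts.weekday(), ts.hour * 3600 + ts.minute * 60 + ts.second


def build_baseline(samples):
    """Average all (timestamp, value) pairs that share the same slot."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in samples:
        sums[slot(ts)] += value
        counts[slot(ts)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def baseline_forecast(baseline, now, steps, step):
    """Query the table for `steps` future time stamps spaced by `step`."""
    return [baseline[slot(now + i * step)] for i in range(1, steps + 1)]
```

With hourly data the resulting table has 7 x 24 = 168 entries, matching the length quoted above. A query for a slot that never occurred in the historic data raises a `KeyError` in this sketch.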
4.4.3 Short Term Model Training
The methodology used to predict the future distributed production is based on training linear models with
an array of specially designed features. This vector comprises historic production measurements, calendar
variables, such as time of day and day of week, historic temperatures and forecasted temperatures.
The production at time t is expressed as a non-parametric additive model
$$\hat{y}_t = c(t) + l(y_t) + m(S_t) + \varepsilon_t \tag{4.4.2}$$
Where:
ŷ𝑡 is the predicted production at time t.
𝑐(𝑡) is the contextual information at time t. This information contains calendar effects, such as the hour of the day.
𝑙(𝑦𝑡) is a series of recent production measurements, going backwards from t-1.
𝑚(𝑆𝑡) models the forecasted meteorological conditions at time t, as well as recent measurements, going backwards from t-1.
𝜀𝑡 is the model error at time t. The term 𝜀𝑡 integrates all errors, explaining the difference between predicted and observed values of the
time series. These differences are due to process fluctuations, measurement errors, and model
misspecifications.
Meteorological factors have a great influence on the distributed production, especially in photovoltaic
production and micro wind turbines. The feature vector contains a varying number of historic
measurements of solar radiation, wind speed and wind direction, as well as the forecasted solar radiation,
wind speed and wind direction as provided by a third party service. The historic meteorological
measurements are denoted 𝑆𝑛, where 𝑆𝑛 contains the meteorological variables n steps
backwards. The meteorological section of the feature vector is 𝑥𝑚(𝑛) = {𝑆1, … , 𝑆𝑛}.
Production follows a distinctive pattern throughout the day. One calendar feature is included in the
feature vector. The calendar section of the feature vector is 𝑥𝑐 = {𝑑}, where d is the time of the day,
encoded as seconds from midnight.
An important feature of a time series is its recent history. The feature vector includes the
historic production, expressed in kWh, as 𝑥𝑦(𝑙) = {𝑦1, … , 𝑦𝑙}, where 𝑦𝑙 is the production l steps in the
past.
The final feature vector is 𝑥 = 𝑥𝑦(𝑙)||𝑥𝑐||𝑥𝑚(𝑛), where || designates the concatenation operator. This
feature vector has two autoregressive parts: the latest l measurements and the latest n meteorological
values.
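The concatenation above can be illustrated with a short NumPy sketch. The function name and argument names are hypothetical. The sketch assumes that each meteorological entry 𝑆𝑖 is a small group of values (for instance solar radiation, wind speed and wind direction) that is flattened into the vector.

```python
import numpy as np


def build_feature_vector(production_lags, seconds_from_midnight, weather_lags=None):
    """Concatenate x_y(l) || x_c || x_m(n) into one flat vector.

    production_lags: [y_1, ..., y_l], most recent first, in kWh.
    seconds_from_midnight: calendar feature d.
    weather_lags: optional sequence [S_1, ..., S_n]; each S_i may hold
        several meteorological variables and is flattened.
    """
    parts = [
        np.asarray(production_lags, dtype=float),
        np.array([float(seconds_from_midnight)]),
    ]
    if weather_lags is not None:
        parts.append(np.asarray(weather_lags, dtype=float).ravel())
    return np.concatenate(parts)
```

With two production lags and two meteorological steps of three variables each, the resulting vector has 2 + 1 + 6 = 9 entries. Leaving `weather_lags` out gives the vector without the meteorological part.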
The feature vector is then standardized, so that individual features approximate a Gaussian with zero mean
and unit variance. The data is transformed by removing the mean value 𝜇𝑡𝑟(𝑥) = {𝜇𝑥(1), … , 𝜇𝑥(𝑝)} of
each feature, and then scaling it by dividing non-constant features by their standard deviation 𝜎𝑡𝑟 =
{𝜎1, … , 𝜎𝑝}, where 𝜇𝑡𝑟 is the average value of each individual feature in the training set and 𝜎𝑡𝑟 is the
standard deviation of each individual feature in the training set.
The target variable 𝑦 follows the same standardization procedure. The data is transformed by removing the
mean value 𝜇𝑡𝑟(𝑦) of the target variable, and then scaling it by dividing it by the standard deviation
𝜎𝑡𝑟(𝑦), where 𝜇𝑡𝑟(𝑦) is the average value of the target variable in the training set and 𝜎𝑡𝑟(𝑦) is its
standard deviation in the training set.
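The standardization step can be sketched as follows. The function names are hypothetical. Following the text, constant features (zero standard deviation) are only centred and not scaled, which is implemented here by replacing a zero standard deviation with one.

```python
import numpy as np


def fit_scaler(X):
    """Return (mu_tr, sigma_tr) computed column-wise on the training set."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    # Constant features are not scaled: divide them by 1 instead of 0.
    sigma = np.where(sigma > 0, sigma, 1.0)
    return mu, sigma


def standardize(X, mu, sigma):
    """Remove the training mean and divide by the training standard deviation."""
    return (np.asarray(X, dtype=float) - mu) / sigma
```

The same pair `(mu, sigma)` obtained from the training set is reused for any later feature vector, so that new data is expressed on the scale the model was trained on.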
Assuming the future production can be expressed as a linear combination of the input values, the forecaster can be expressed as
$$\hat{y}(w, x) = w_0 + w_1 x_1 + \dots + w_n x_n \tag{4.4.3}$$
Where ŷ(𝑤, 𝑥) is the predicted value, 𝑥 = (𝑥1, … , 𝑥𝑛) is the feature vector, made from n inputs, and
𝑤 = (𝑤1, … , 𝑤𝑛) are its coefficients.
Ordinary Least Squares (OLS) methods fit a linear model with coefficients 𝑤 = (𝑤1, … , 𝑤𝑛) in order to minimize the residual sum of squares between the predictions and the actual samples, expressed as
$$\min_{w} \lVert \hat{y}(w, x) - y \rVert_2^2 \tag{4.4.4}$$
Where ŷ(𝑤, 𝑥) is the predicted value and y is the real production value. The linear approximation is then
used to predict future data.
The training procedure assumes that every feature vector is complete. That is, the training set of feature
vectors has been pruned of samples with missing values.
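The training step can be sketched with NumPy's least-squares solver. This is an illustrative sketch, not the project code: the function names are hypothetical, and incomplete samples are assumed to be marked with NaN so that they can be pruned before fitting.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y ~ w0 + w1*x1 + ... + wn*xn by ordinary least squares.

    Rows with a missing (NaN) feature or target are pruned first.
    Returns the coefficient vector w, with w[0] the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    complete = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    # Prepend a column of ones so that the intercept w0 is estimated too.
    A = np.column_stack([np.ones(int(complete.sum())), X[complete]])
    w, *_ = np.linalg.lstsq(A, y[complete], rcond=None)
    return w


def predict(w, X):
    """Evaluate the fitted linear model on one or more feature vectors."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return w[0] + X @ w[1:]
```

In practice `X` and `y` would be the standardized feature vectors and targets of the training set.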
The training algorithm may be called at any time and will provide an updated model of the time series. How
often this update is necessary, if at all, will depend on the variability of the time series and on the quality of
the recorded measurements. This update is an offline process and should not disrupt the forecasting
algorithm, which would keep using the current model until a new one is available.
4.4.4 Very Short Term Model Training
The methodology used to predict the future distributed production is based on training linear models with
an array of specially designed features. This vector comprises historic production measurements and
calendar variables, such as time of the day.
The production at time t is expressed as a non-parametric additive model
$$\hat{y}_t = c(t) + l(y_t) + \varepsilon_t \tag{4.4.5}$$
Where:
ŷ𝑡 is the predicted production at time t.
𝑐(𝑡) is the contextual information at time t. This information contains calendar effects, such as the hour of the day.
𝑙(𝑦𝑡) is a series of recent production measurements, going backwards from t-1.
𝜀𝑡 is the model error at time t. The term 𝜀𝑡 integrates all errors, explaining the difference between predicted and observed values of the
time series. These differences are due to process fluctuations, measurement errors, and model
misspecifications.
Distributed production follows a distinctive pattern throughout the day. One calendar feature is included
in the feature vector. The calendar section of the feature vector is 𝑥𝑐 = {𝑑}, where d is the time of the
day, encoded as seconds from midnight.
The feature vector includes the lagged active energy, expressed in kWh, as 𝑥𝑦(𝑙) = {𝑦1, … , 𝑦𝑙}, where 𝑦𝑙
is the production measurement l steps in the past.
The final feature vector is 𝑥 = 𝑥𝑦(𝑙)||𝑥𝑐, where || designates the concatenation operator. This feature
vector has one autoregressive part: the latest 𝑙 measurements.
The feature vector is then standardized, so that individual features approximate a Gaussian with zero mean
and unit variance. The data is transformed by removing the mean value 𝜇𝑡𝑟(𝑥) = {𝜇𝑥(1), … , 𝜇𝑥(𝑝)} of
each feature, and then scaling it by dividing non-constant features by their standard deviation 𝜎𝑡𝑟 =
{𝜎1, … , 𝜎𝑝}, where 𝜇𝑡𝑟 is the average value of each individual feature in the training set and 𝜎𝑡𝑟 is the
standard deviation of each individual feature in the training set.
The target variable 𝑦 follows the same standardization procedure. The data is transformed by removing the
mean value 𝜇𝑡𝑟(𝑦) of the target variable, and then scaling it by dividing it by the standard deviation
𝜎𝑡𝑟(𝑦), where 𝜇𝑡𝑟(𝑦) is the average value of the target variable in the training set and 𝜎𝑡𝑟(𝑦) is its
standard deviation in the training set.
Assuming the future production can be expressed as a linear combination of the input values, the forecaster can be expressed as
$$\hat{y}(w, x) = w_0 + w_1 x_1 + \dots + w_p x_p \tag{4.4.6}$$
Where ŷ(𝑤, 𝑥) is the predicted value, 𝑥 = (𝑥1, … , 𝑥𝑝) is the feature vector, made from p inputs, and
𝑤 = (𝑤1, … , 𝑤𝑝) are its coefficients.
Ordinary Least Squares (OLS) methods fit a linear model with coefficients 𝑤 = (𝑤1, … , 𝑤𝑝) in order to
minimize the residual sum of squares between the predictions and the actual samples, expressed as
$$\min_{w} \lVert \hat{y}(w, x) - y \rVert_2^2 \tag{4.4.7}$$
Where ŷ(𝑤, 𝑥) is the predicted value and y is the real production value. The linear approximation is then
used to predict future data.
The training procedure assumes that every feature vector is complete. That is, the training set of feature
vectors has been pruned of samples with missing values.
The training algorithm may be called at any time and will provide an updated model of the time series. How
often this update is necessary, if at all, will depend on the variability of the time series and on the quality of
the recorded measurements. This update is an offline process and should not disrupt the forecasting
algorithm, which would keep using the current model until a new one is available.
4.4.5 Forecasting
The short term forecasting and the very short term forecasting follow the same procedure. The only
difference is the kind of information used to create the feature vector. The short-term forecaster vector
comprises historic production measurements, calendar variables, such as time of day, historic
measurements of solar radiation, wind speed and wind direction, as well as forecasted values of solar
radiation, wind speed and wind direction. The very short-term forecaster vector comprises historic
production measurements and calendar variables, such as time of day.
The online forecast algorithm retrieves the latest measurement from the database in order to create a
feature vector following the same steps as in the training algorithm. This new feature vector needs to be
scaled using the same parameters used in the training dataset. The feature vector is standardized, so that
individual features approximate a Gaussian with zero mean and unit variance. The data is transformed by
removing the mean value 𝜇𝑡𝑟(𝑥) = {𝜇𝑥(1), … , 𝜇𝑥(𝑝)} of each feature, and then scaling it by dividing
non-constant features by their standard deviation 𝜎𝑡𝑟 = {𝜎1, … , 𝜎𝑝}, where 𝜇𝑡𝑟 is the average value of each
individual feature in the training set and 𝜎𝑡𝑟 is the standard deviation of each individual feature in the
training set.
The target variable 𝑦 follows the same standardization procedure. The data is transformed by removing the
mean value 𝜇𝑡𝑟(𝑦) of the target variable, and then scaling it by dividing it by the standard deviation
𝜎𝑡𝑟(𝑦), where 𝜇𝑡𝑟(𝑦) is the average value of the target variable in the training set and 𝜎𝑡𝑟(𝑦) is its
standard deviation in the training set.
The estimated production at time t is the linear combination of the input feature vector x weighted by the fitted coefficients w.
$$\hat{y}(w, x) = w_0 + w_1 x_1 + \dots + w_p x_p \tag{4.4.8}$$
Where ŷ(𝑤, 𝑥) is the predicted value, 𝑥 = (𝑥1, … , 𝑥𝑝) is the feature vector, made from p inputs, and
𝑤 = (𝑤1, … , 𝑤𝑝) are its coefficients.
If the forecasting horizon is larger than one, the algorithm is called recursively. Instead of using only historic
values of the measurements, at least part of 𝑥𝑦 will consist of production forecasts made at time t. For instance,
once the production at time t+1 has been forecasted at time t, the autoregressive part of the feature vector for the following step is
𝑥𝑦(𝑙) = {ŷ𝑡+1, 𝑦𝑡, … , 𝑦𝑙−1}. For the very short-term approach, three predictions are made, with 10-minute
resolution up to a forecasting horizon of 30 minutes. For the short-term approach, 24 or 48 predictions are
made, up to a horizon of 24h or 48h.
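The recursive scheme can be sketched as follows for the very short-term feature layout (production lags followed by the calendar feature). The meteorological part is omitted for brevity. The function and argument names are hypothetical, and the scaling parameters and coefficients are assumed to come from a previously trained model.

```python
import numpy as np


def recursive_forecast(w, recent, seconds_from_midnight, step_seconds, horizon,
                       mu_x, sigma_x, mu_y, sigma_y):
    """Forecast `horizon` steps ahead, feeding each forecast back as a lag.

    w: coefficients, w[0] is the intercept.
    recent: [y_1, ..., y_l], most recent first, in original units.
    seconds_from_midnight: time of day of the first forecasted step.
    """
    w = np.asarray(w, dtype=float)
    lags = [float(v) for v in recent]
    second = seconds_from_midnight
    forecasts = []
    for _ in range(horizon):
        x = np.array(lags + [second], dtype=float)
        z = (x - mu_x) / sigma_x               # scale with training parameters
        y_std = w[0] + z @ w[1:]               # linear model on scaled features
        y_hat = float(y_std * sigma_y + mu_y)  # back to original units
        forecasts.append(y_hat)
        lags = [y_hat] + lags[:-1]             # newest forecast becomes lag 1
        second = (second + step_seconds) % 86400
    return forecasts
```

For the very short-term case this would be called with `horizon=3` and `step_seconds=600`, and for the short-term case with `horizon=24` or `48` and `step_seconds=3600`, once the meteorological features are added to the vector.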
The final forecast is then descaled by multiplying it by the standard deviation 𝜎𝑡𝑟(𝑦) of the target variable and then
adding its mean value 𝜇𝑡𝑟(𝑦), where 𝜇𝑡𝑟(𝑦) is the average value and 𝜎𝑡𝑟(𝑦) is the standard deviation of the target
variable in the training set. This inverts the standardization applied to the target variable during training.
4.5 Interfaces
4.5.1 Medium Voltage Network Load and Production Forecaster
Almost all the inputs for medium voltage load and production forecasting are read from a SQL database
located at the primary substation computer. In the database, real-time measurement data contains values
received from RTU and smart meters, and they are organized according to an IEC 61850 data model.
| Input | Data exchanged | Source | Local / Remote | Update schedule | Format |
|---|---|---|---|---|---|
| MV customer information | Contractual power or fuse size | MV DXP | Local | Once a day | Contractual power in kilowatts [kW] or fuse size in amps [A]; power demand [kW] |
| DG information | Nominal power; type (PV, wind etc.) | MV DXP | Local | Once a day | Table with integer and floating point numbers |
| Smart meter measurement time series, 10 min interval | Load and production time series from the last 168 hours with 10 minutes time step; time stamp for each measurement | MV DXP | Local | Once a day | Power demand [kW]; time stamp [YYYY-mm-dd HH:MM:SS] |
| Smart meter measurement time series, hourly resolution | Load and production time series from the last year with hourly time step; time stamp for each measurement | MV DXP | Local | Once a day | Power demand [kW]; time stamp [YYYY-mm-dd HH:MM:SS] |
| Weather measurements time series | Outdoor temperature, solar irradiance and wind speed and direction measurements (hourly time step) from the last year | Local weather station (via web service) | Remote | Once a day | Temperature [°C]; solar irradiance [W/m²]; wind speed and direction [m/s ; degrees]; time stamp [YYYY-mm-dd HH:MM:SS] |
| Weather forecasts | Outdoor temperature, solar irradiance and wind speed and direction forecasts for the next 24 – 48 hours; hourly time step | Local weather station (via web service) | Remote | On a fixed schedule and “on demand” | Temperature [°C]; solar irradiance [W/m²]; wind speed [m/s]; wind direction [°]; time stamp [YYYY-mm-dd HH:MM:SS] |
4.5.2 Low Voltage Network Load and Production Forecaster
Almost all the inputs for low voltage load and production forecasting are read from a SQL database located
at the secondary substation computer. In the database, real-time measurement data contains values
received from RTU and smart meters, and they are organized according to an IEC 61850 data model.
| Input | Data exchanged | Source | Local / Remote | Update schedule | Format |
|---|---|---|---|---|---|
| LV customer information | Contractual power or fuse size | LV DXP | Local | Once a day | Contractual power in kilowatts [kW] or fuse size in amps [A]; power demand [kW] |
| DG information | Nominal power; type (PV, wind etc.) | LV DXP | Local | Once a day | Table with integer and floating point numbers |
| Smart meter measurement time series, 10 min interval | Load and production time series from the last 168 hours with 10 minutes time step; time stamp for each measurement | LV DXP | Local | Once a day | Power demand [kW]; time stamp [YYYY-mm-dd HH:MM:SS] |
| Smart meter measurement time series, hourly resolution | Load and production time series from the last year with hourly time step; time stamp for each measurement | LV DXP | Local | Once a day | Power demand [kW]; time stamp [YYYY-mm-dd HH:MM:SS] |
| Weather measurements time series | Outdoor temperature, solar irradiance and wind speed and direction measurements (hourly time step) from the last year | Local weather station (web service via DXP) | Remote | Once a day | Temperature [°C]; solar irradiance [W/m²]; wind speed and direction [m/s ; degrees]; time stamp [YYYY-mm-dd HH:MM:SS] |
| Weather forecasts | Outdoor temperature, solar irradiance and wind speed and direction forecasts for the next 24 – 48 hours; hourly time step | Local weather station (web service via DXP) | Remote | On a fixed schedule and “on demand” | Temperature [°C]; solar irradiance [W/m²]; wind speed [m/s]; wind direction [°]; time stamp [YYYY-mm-dd HH:MM:SS] |
REFERENCES
[Abur2004] Abur, A. & Exposito, A.G. Power System State Estimation: Theory and
Implementation. New York, Marcel Dekker,2004, 324 p.
[Agrawal2013] Adhikari R. K. Agrawal. An Introductory Study on Time Series Modeling and