
TVE-MFE 19009
Examensarbete 30 hp (Degree Project, 30 credits), October 2019

Power Plant Operation Optimization
Unit Commitment of Combined Cycle Power Plants Using Machine Learning and MILP

Mohamed Elhafiz Hassan Ahmed

Masterprogram i förnybar elgenerering (Master Programme in Renewable Electricity Production)


Abstract

Unit Commitment of Combined Cycle Power Plants Using Machine Learning and MILP

Mohamed Elhafiz Hassan Ahmed

In modern electric power systems, the penetration of renewable resources and the introduction of free-market principles have led to new challenges for power producers and regulators. Renewable production is intermittent, which causes fluctuations in the grid and requires more control from regulators, while the free-market principle challenges power producers to operate their plants in the most profitable way given fluctuating prices. These problems are addressed in the literature as economic dispatch, and they have been discussed from both the regulator and the producer viewpoints.

Combined cycle power plants have the advantage of being dispatchable quickly and at low cost, which makes them a primary solution to power disturbances in the grid. This fast dispatchability also allows them to exploit price changes very efficiently to maximize their profit, which highlights the importance of price forecasting as an input to the profit optimization of power plants.

In this project, an integrated solution is introduced to optimize the dispatch of combined cycle power plants that bid in electricity markets. The solution is composed of two models: a forecasting model and an optimization model. The forecasting model is flexible enough to forecast electricity and fuel prices for different markets and with different forecasting horizons. Machine learning algorithms were used to build and validate the model, and data from different countries were used to test it.

The optimization model incorporates the forecasting model outputs as input parameters and uses other parameters and constraints from the operating conditions of the power plant as well as from the market in which the plant is selling. The power plant in this model is assumed to satisfy different demands, each with a corresponding electricity price and cost of energy not served. The model decides which units to dispatch at each timestamp to yield the maximum profit given all these constraints; it also decides whether to satisfy each demand fully or to produce only part of it.

TVE-MFE 19009
Examiner: Irina Temiz
Subject reader: Juan de Santiago
Supervisor: Edgar Bahilo Rodriguez


Acknowledgements

I would like to extend my gratitude and appreciation to everyone who helped me in producing this research project.

I am very thankful to Siemens Industrial Turbomachinery for their continuous support with everything I asked for to complete this project. Thanks to all the data analytics team members, especially my supervisor Edgar Bahilo Rodriguez, who guided me throughout the whole project. This project could not have been done without Siemens' resources and help.

My gratitude to Dr. Juan de Santiago at Uppsala University, who supervised me in this project and also helped me with many other projects and study decisions throughout my master's studies.

Last but not least, my gratitude to my family and friends who provided me with moral support through the tough times I had during the project work.


Table of Contents

Abstract
Acknowledgements
List of Tables
List of Figures

1 Introduction
1.1 Objectives
1.2 Scope and limitations
1.3 Literature review
1.3.1 Economic dispatch in literature
1.3.2 Electricity prices forecasting in literature
1.4 Thesis Structure

2 Theory
2.1 Introduction to Electricity Markets
2.1.1 Privatization of Electric power industry
2.1.2 Electricity markets
2.1.3 Importance of prices forecasting in electricity markets
2.2 Time series and forecasting
2.2.1 Features
2.2.2 Time series Forecasting using lag features
2.2.3 Time series forecasting using decomposition features
2.3 Machine learning and forecasting
2.3.1 Using machine learning regression for forecasting
2.3.2 Linear regression
2.3.3 Over-fitting and Regularization
2.3.4 Support Vector Regression
2.3.5 Decision Trees
2.3.6 Ensemble models
2.3.7 Random Forests
2.3.8 Gradient Boosting and XGBoost
2.3.9 Features importance
2.3.10 Forecasting accuracy measurement
2.4 Mathematical Optimization
2.4.1 Linear Optimization
2.4.2 Mixed Integer Linear Programming (MILP)
2.4.3 Stochastic Optimization

3 Methodology
3.1 Forecasting Model
3.1.1 Data Collection
3.1.2 Data Preparation
3.1.3 Feature Engineering
3.1.4 Model Training
3.1.5 Model Validation
3.1.6 Case studies for forecasting model
3.2 Optimization Model
3.2.1 Optimization Problem Formulation
3.2.2 Input parameters
3.2.3 Optimization Variables
3.2.4 Objective function
3.2.5 Optimization Constraints
3.2.6 Optimization Problem Implementation
3.2.7 Case studies for optimization Model

4 Results
4.1 Forecasting Model Case Studies
4.1.1 Comparison between ML and time series forecasting methods
4.1.2 Comparison between different ML forecasting methods
4.1.3 Comparison between models for different forecasting horizons
4.2 Optimization model Results
4.2.1 No specific demand
4.2.2 Demand satisfaction
4.2.3 Different demands and power not served
4.2.4 Sensitivity analysis of the optimization output to the forecast

5 Conclusion
5.1 Study results
5.2 Future work

Literature


List of Tables

Table 3.1: Information of markets used to build and validate forecasting model
Table 4.1: Comparison between time series methods and machine learning
Table 4.2: Results of different variables with different time intervals
Table 4.3: Sensitivity analysis of the optimization output due to forecasting change


List of Figures

Figure 2.1: Electricity market setup
Figure 2.2: Time series example
Figure 2.3: ACF plot example
Figure 2.4: PACF plot example
Figure 2.5: Example of decomposition of time series
Figure 2.6: Example of differencing
Figure 2.7: Example of linear regression
Figure 2.8: Example of overfitting
Figure 2.9: Support vector regression example
Figure 2.10: Example of regression trees
Figure 2.11: Example of random forest model
Figure 2.12: An example of branch and bound in solving MILP
Figure 3.1: Proposed model in the study
Figure 3.2: Machine learning framework
Figure 3.3: Sample of fuel prices in market-1
Figure 3.4: Samples from weather data used in building the model
Figure 3.5: Samples from market data used in building the model
Figure 3.6: Correlation between electricity price and some of its lags in market-3
Figure 3.7: Correlation matrix between electricity price and some external features
Figure 3.8: Correlation between the electricity price and its demand
Figure 3.9: Nested validation
Figure 3.10: Two different demands plotted for a period of two days
Figure 3.11: Prices corresponding to the demands in Figure 3.10
Figure 3.12: Example of forecasting output with 95% confidence intervals
Figure 4.1: Accuracy comparison between different ML algorithms
Figure 4.2: Performance comparison between three models with different intervals
Figure 4.3: Forecasting residuals distribution
Figure 4.4: Example of spike incidences
Figure 4.5: Example of probability distribution of MAPE results
Figure 4.6: Dispatch output and the corresponding profit and efficiency
Figure 4.7: Optimized dispatch vs customer's dispatch
Figure 4.8: Comparing dispatch with and without Pns


1 Introduction

Modern energy systems have been going through radical changes due to the introduction of new concepts to the structure of the electric power industry. Free market principles are reshaping the industry and shifting it into a totally privatized industry. Studies and real-life examples have shown that the conventional vertically integrated structure leads to unfairness and economic inefficiencies due to the monopolistic power of the vertically integrated companies [1]. The free market aims to ensure efficiency and fair competition in the system by making supply and demand the sole determinants of prices and generation. The job of system operators is to ensure that fair competition occurs without any disturbances in the satisfaction of the demand.

Renewable generation is another big factor that is changing the electric power industry. During the past 20 years the cost of renewables has decreased significantly, and they are more environmentally friendly, so their production capacity is increasing with time. However, renewable resources can cause instability problems in the grid because of their intermittent generation. This intermittency leads to discontinuity in supply and power outages as well as fluctuations and spikes in the prices.

Dispatchable production units are suitable for solving this intermittency problem. Despite the ongoing change in their role as primary generation resources, and the threat of being phased out in the future by environmental regulations since most of these dispatchable units depend on fossil fuels, they are the most reliable sources for stabilizing energy systems and compensating for problems caused by renewable resources, because of their fast response to grid changes: they can be ramped up to their maximum capacity and down to no load in a matter of minutes. They are also the best alternative for applications that require high reliability of supply, or for industries that are located far away from grid connections, such as offshore oil refineries.

One of the most proficient dispatchable generation units is the combined cycle power plant (CCPP). CCPPs have the benefit of being more efficient and more environmentally friendly than other plants that use fossil fuels; their main fuel, natural gas, has lower emissions and higher efficiency than coal and oil. The main block of a CCPP is composed of gas turbines and steam turbines: exhaust heat from the gas turbines is used to raise steam for the steam turbines and extract more energy, leading to even higher efficiency and fewer emissions. Some researchers are also working on more environmentally friendly solutions, such as the use of biogas or bio-oil in combined cycle plants [2].

Siemens Industrial Turbomachinery (SIT) is one of the leading companies in manufacturing CCPP units and providing integrated solutions for energy systems. The company has collected a large amount of information about best practices in turbine manufacturing and power plant assembly. Besides, SIT has collected repositories of data about power plant operation acquired from the sensors and probes of each of its fleet units. This information and data are important factors in placing the company in a top position in the industry if they are utilized well in research and development. The company continuously works on providing customers with better tools that help them optimize their plant operations, covering processes such as planning operations, projecting financial outcomes, and reducing costs; all of these processes can be placed under the umbrella of power plant operation optimization (PPOO).

PPOO is a broad field that generally aims to optimize plant operation by combining information from different knowledge areas about plant behavior. Theoretical aspects of the plants, such as the thermodynamic equations that govern the plant units, have long been taken into consideration by researchers, and many models have been built to give optimum theoretical operation guidelines. However, a large area of improvement is missed when only thermodynamic models are employed in the optimization, because real-life operation produces scenarios that cannot be predicted by theoretical models. These effects cannot be detected without looking into the historical data of the plant, for example by checking the relationship between certain sensor readings and the probability of failure of certain parts.

Data-driven models represent a new knowledge area within PPOO and in energy systems in general [3], as they provide new insights from the statistical analysis of data that can be included in the optimization models. A large amount of information can be acquired by statistically analyzing signals from the plant units, and many applications can make use of these analysis outputs.

SIT has been working on combining data-driven models with theoretical models to optimize certain parts of plant operation, such as optimizing replacement times for spare parts and assessing component states using sensor data. Following these research trends, SIT presents the dispatch optimizer, a project that aims to provide a decision support system that enables customers to dispatch their plants in an optimal way given the operating conditions. The first iteration of the dispatch optimizer was done by Bahilo Rodriguez [4] and dealt with the economic dispatch of gas turbines; it was followed by three projects that extend the problem to the scope of the whole combined cycle plant and take into consideration external factors such as electricity prices and market constraints. One of the projects [5] considers modeling the steam turbines and other related constraints and combining them with the gas turbine models from Bahilo Rodriguez [4] to build a consolidated model that can perform the economic dispatch for the whole plant; another project considers the maintenance constraints in optimizing the profit of the plants. This research project considers the problem from a profit viewpoint, by combining the operating constraints from the plant's operation with market constraints, with the aim of maximizing the profit of the power plants.

Economic dispatch is the problem of finding the optimal power output from specific generation units given considerations such as the demand to be satisfied, the cost of production, and grid status, as well as the operating conditions of the plant such as the available production capacity and ramp rates. Unit commitment is the extension of economic dispatch where the problem is solved over a period of time. Considering time produces more useful outputs, such as a dispatch plan for the whole period showing the status of each unit at each timestamp, which makes it possible to introduce other variables to better optimize the plant operation, such as maintenance costs, spinning reserve usage, and power-not-served costs. Economic dispatch and unit commitment have generally been considered from the perspective of power system operators, who try to operate the units in the grid in the optimal way to satisfy the demand as cost-efficiently as possible. However, it is also useful for Generation Companies (GENCOs) to consider the same methodology when solving the problem of operating their plants in the most profitable way; in the end, the variables and constraints are the same.

The outputs of unit commitment for GENCOs are the dispatch plans of each unit at each timestamp, which give a general operation plan for the plant as well as insights for other business operations such as fuel sourcing and managing human resources. The projected total profit is another important output; it shows the profit at each timestamp and the relationships between price, demand, and cost of production. All these projected outputs give intuition about future profit opportunities when historical data are used in the optimization model.

This project aims to provide additional considerations to optimize the power plants' unit commitment from the profit perspective, by considering the internal and external parameters that affect the plant profit, such as prices, demands, and operating costs.


1.1 Objectives

The goal of this study is to build an integrated optimization model that could be used by GENCOs to maximize their profit. The model is composed of two sub-models:

• A forecasting model, which is used to predict fuel prices and day-ahead electricity prices one week ahead.

• An optimization model, which incorporates the forecasted prices along with operating constraints and other market constraints to optimize the plant operation.

This model should be flexible enough to enable GENCOs to choose between different alternatives: whether the plant will be operated to satisfy a specific demand such as a long-term power purchasing agreement, produce electricity to participate in the day-ahead market, use the capacity as spinning reserve for the grid, or satisfy many demands, each with different prices and specifications.
The flexibility of the model also entails that it should be validated to work under different market conditions, so three markets from different continents were chosen to try the model. Each of these markets was analyzed in order to decide the proper way to forecast its prices, and each has different market settings which affect the optimization constraints.
The optimization model should be able to provide, at each timestamp for a week ahead and with high accuracy, each of the following:

• The unit commitment of each unit in the plant.

• The total cost, and the detailed cost associated with satisfying each of the demands.

• The power not served part from each demand, and its associated cost and revenue.

• The total profit, and the detailed profit from each demand.

1.2 Scope and limitations

The scope of this thesis considers profit maximization as a function of revenues and costs. As far as this thesis is concerned, two costs are considered: the cost of fuel and the cost of power not served. Costs such as maintenance costs, emission costs, and other operating costs are not incorporated into the model.

The forecasted prices of electricity and fuel are assumed to be definite parameters and not random variables. This means that stochastic optimization will not be considered as the method in this project. The uncertainty of the forecast will be incorporated in the sensitivity analysis, to check how strongly changes in prices would change the optimization outputs, and in calculating the confidence intervals of the profit output.


This optimization model is built on the optimization models presented by Rosso [5] and Bahilo Rodriguez [4], which aimed to optimize the fuel consumption in CCPPs given operating conditions such as maximum available capacity, ramp rates, efficiency curves, and other operating conditions. Consequently, this thesis focuses on the market-related constraints, so all the detailed operating constraints of the power plant will be considered as they were explained by Rosso [5] and Bahilo Rodriguez [4].

1.3 Literature review

This section is divided into two subsections. The first discusses economic dispatch in energy systems, looking at the problem from the viewpoints of system regulators and GENCOs; it also discusses the methods used in the literature for solving the problem and which one is suitable for this study. The second subsection discusses forecasting in electricity markets, the general methodologies used in the literature for forecasting electricity market parameters, and the most suitable ones for this study.

1.3.1 Economic dispatch in literature

Economic dispatch is defined as the short-term determination of the optimal output of some electricity generation facilities to meet the system load, at the lowest possible cost, subject to transmission and operational constraints [6]. The problem was first considered in conventional electric power systems with a vertically integrated structure, where the system operator had total responsibility for dispatching the units in the grid, so economic dispatch aimed to operate the units in the most efficient way to reduce the cost of production while satisfying the demand.

According to Chowdhury and Rahman [7], the economic dispatch problem can be traced back to the 1920s, when engineers were concerned with the division of load among the generation units. The first methods used to solve the problem were the baseload method and the best point method: in the baseload method, the units are ranked according to their efficiencies, and the top-ranked unit is operated up to its maximum capacity, followed by the next one, and so on; in the best point method, the units are successively operated at their lowest heat-rate points, starting with the most efficient ones.
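To make the baseload idea concrete, the following is a minimal sketch of that merit-order heuristic. The unit data, field names, and function are illustrative assumptions, not part of the cited works or of the thesis model.

```python
# Illustrative sketch of the "baseload" dispatch heuristic: rank units by efficiency
# and load each one to its maximum until the demand is covered.
# All names and numbers below are hypothetical examples.

def baseload_dispatch(units, demand):
    dispatch = {}
    remaining = demand
    # Most efficient units are committed first (merit order).
    for unit in sorted(units, key=lambda u: u["efficiency"], reverse=True):
        output = min(unit["max_mw"], max(remaining, 0.0))
        dispatch[unit["name"]] = output
        remaining -= output
    return dispatch, max(remaining, 0.0)  # a positive remainder means unmet demand

units = [
    {"name": "GT1", "max_mw": 50.0, "efficiency": 0.38},
    {"name": "GT2", "max_mw": 50.0, "efficiency": 0.36},
    {"name": "ST1", "max_mw": 40.0, "efficiency": 0.33},
]
print(baseload_dispatch(units, demand=95.0))
```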

Many developments have since been introduced to the problem, including adding more complexity in order to better optimize the solution, such as the introduction of dynamic economic dispatch as discussed by Carpet and J [8], which considers changing costs that come from time-related constraints such as maximum ramp rates.

Due to the introduction of privatization into the electricity industry, the generation units became owned by private companies instead of all being owned by electric grid operators. The aim of these private generation companies became to maximize their profit without regard to satisfying the total demand of the grid; the satisfaction of the demand is ensured by the system operator through setting up markets for ancillary services and the intraday market. Profit-based unit commitment (PBUC) has been introduced by many research works, such as Chandram et al. [9], which assumes that allocating the generating units to meet the power demand is not necessary; thus the power balance constraint is relaxed into a boundary condition stating that the sum of outputs of the allocated generating units should always be less than or equal to the forecasted power demand.

As discussed by Yamin [10], the unit commitment problem for GENCOs in privatized markets requires a formulation similar to PBUC that includes the electricity market constraints in the model. The main consideration is that the spot price or market clearing price is no longer predetermined; instead, it is set by open competition and determined by supply and demand.

In a perfectly competitive electricity market, the optimal bidding strategy for a supplier is simply to bid the marginal cost. When a generator bids other than the marginal cost to exploit imperfections in the market and increase profits, this behavior is called strategic bidding. The new electricity markets are certainly not perfectly competitive, and as a result a supplier can increase profits through strategic bidding, or in other words, through exercising market power. According to David and Wen [11], there are three general ways to develop optimal bidding strategies:

• Forecasting market-clearing prices.

• Estimating the bidding behaviors of the rivals.

• Using agent-based models, such as game-theoretic and mechanism design models.

More studies considered the first method than the others; the reason is that information about the clearing price is usually available, in contrast to information about rivals, which is normally unavailable. The proposed solutions combine a forecasting model with an optimization model; many of these studies tried to solve the problem with a model that combines two steps. The first step is to forecast electricity prices, and the second is to incorporate these prices into the optimization model. In Shrestha et al. [12], a stochastic scheduling technique was presented in which scenarios were created for the forecasted prices, and the probability distribution drawn from these scenarios was incorporated into a multi-stage stochastic optimization problem. The paper by Conejo et al. [13] uses the same principle to output optimal bidding curves for each timestamp; the sizes of the blocks of each curve are functions of the forecasted prices at that timestamp. In Song et al. [14], more variables were added to calculate the transitional probabilities, such as probabilistic bidding information of competitors. An optimization system for day-ahead market bidding combined with a bilateral contract was studied by Heredia et al. [15].

This study uses the same principles as the mentioned papers, in the sense that the proposed system determines, from a profit perspective, the optimal scheduling strategy of CCPPs that participate in different electricity markets at the same time. The difference in this study is that the optimization is deterministic rather than stochastic: the forecasted prices are incorporated as deterministic parameters into an optimization model. However, the uncertainty of the price forecast is investigated in the sensitivity analysis, by running the model with worst-case and best-case scenarios of the forecast drawn from the forecasting confidence intervals, in order to check whether price changes would make a difference in the dispatch and the total profit generated.

1.3.2 Electricity prices forecasting in literature

With the privatization of the electricity markets and the extreme volatility of the prices, forecasting has become a fundamental factor for an energy company in decision making and strategy planning. Weron [16] classifies the different forecasting solutions according to their planning horizon and their applied methodology. According to their planning horizon, they can be divided into:

• Short-term price forecasting (STPF).

• Medium-term price forecasting (MTPF).

• Long-term price forecasting (LTPF).

STPF, with lead times from a few hours to a few days, is concerned with day-to-day market operations and is used for optimizing dispatch scheduling. MTPF, with monthly time horizons, is generally employed for financial risk management and balance sheet estimations; normally the consideration is the distribution of future prices over time and not the actual prices. LTPF, measured in years, is used for profitability analysis and investment planning and decisions, such as determining future sites of power plants.

As for the methods applied for the forecasting, cost-based methods were employed in the conventional markets, where the goal was to satisfy the demand with the lowest cost, and the prices were dependent on that cost. Other methods, such as equilibrium methods, were cost-based models amended with strategic bidding considerations. These models are useful in predicting expected price levels in markets with no price history but known supply costs.


However, the most suitable methods for spot prices are the statistical methodologies, which forecast the current price by using a mathematical combination of the previous prices and previous or current values of exogenous factors that affect prices, such as weather data and fuel prices. According to Weron [17], statistical models used for electricity price forecasting include:

• Time series regression models such as AR, MA and ARIMA models, as discussed by Shumway and Stoffer [18] and Bordignon et al. [19].

• Data mining and machine learning regression models, such as support vectormodels [20], or tree-based models [21].

• Deep learning models, such as the recurrent neural network model by Anbazhagan and Kumarappan [22].

In this study, both time series and machine learning models are explored in the search for an optimum forecasting algorithm for day-ahead markets.

1.4 Thesis Structure

• In chapter 2, theoretical background about time series and machine learning forecasting methods is presented, followed by the theory of optimization and mixed-integer linear programming.

• In chapter 3, the methodology of the thesis is shown, divided into two parts concerning the two building blocks of the model: the first part is about the methodology followed for creating the forecasting model, and the second part is about the methodology followed to build up the optimization model.

• Chapter 4 shows and discusses the results of each sub-model and demonstrates their flexibility by performing sensitivity analyses with different scenarios.

• Chapter 5 wraps up with an outline of the results and future work that could be done along the lines of this work.


2 Theory

This chapter describes some principles and concepts that are important to know in the context of this study; it also discusses the intuition behind using these principles in the scope of our problem. The first section gives a general introduction to electricity markets and market types. The second section describes time series, their characteristics, and time series approaches to forecasting. The third section introduces the principles of machine learning and the different algorithms used in price forecasting. The fourth section introduces principles of mathematical optimization, such as linear programming, mixed integer linear programming, and stochastic programming.

2.1 Introduction to Electricity Markets

The electric power industry is mainly divided into different layers, including the generation, transmission, and distribution of electric power to households and industries [23].

The generation layer contains all the generation capacity of the grid, which comes from a combination of different generation resources. The different pros and cons of each production resource make for a better generation mix, which makes better use of the advantages and overcomes the problems of each source. There are several generation resources, including renewable sources such as wind and solar power; other sources include fossil fuel and nuclear resources. Generation can be run by private companies or government-owned companies.

The transmission layer represents the physical connections and setups between the generation sites and the local substations. The transmission system regulator is usually a single company that controls the transmission throughout the grid and maintains grid stability; if the frequency in the grid is higher or lower than certain limits, this regulator has the authority to dispatch some of the reserve that is available in the grid.

The distribution layer is the last stage of power delivery; it is responsible for transferring electricity from the transmission system to individual consumers. This layer is run by a larger number of companies, called distribution companies (DISCOMs), that provide different packages with different prices for their customers.


2.1.1 Privatization of Electric power industry

In early conventional electricity systems, the power utilities were run by individual, normally government-owned, companies. These companies were responsible for operating all layers of the industry, which led to inefficiencies in operation as well as monopolies. However, the electric power industry has been going through many restructuring changes that aim to deregulate the industry [24]. The main goals of these changes are to reduce energy charges through competition, increase the efficiency of the market, provide more choices and reliable services to customers, and create more business opportunities for new products and services. This restructuring aimed to reduce the control and monopoly of companies that operate simultaneously in many areas of the industry; it also targeted the electricity market, creating processes that lead to more competition by adopting free-market principles for exchanging electricity-related commodities between producers and customers.

2.1.2 Electricity markets

The privatization of the industry also entails setting up free markets for exchanging electricity-related commodities, which leads to increasing efficiency and reducing prices significantly. Such a market is a system that enables the exchange of purchases (bids) and sales (offers) of power generation capacity between suppliers and customers. The market clearing price (MCP) and the market clearing volume (MCV) in the electricity markets are determined by the supply (offering) and demand (bidding) curves created by the consolidated bids and offers from all customers and producers in the market, as shown in Figure 2.1. The system operator is an important player in the electricity market; it is responsible for ensuring efficient operation and fair competition in the market while at the same time ensuring the continuous satisfaction of the demand. This job is not easy, because the satisfaction of the demand is not as important to the power producers: the sole aim of the producers is to increase their profit, whether that means producing or stopping at some time regardless of the satisfaction of the demand.

There are several electricity markets according to the commodities traded in those markets. These commodities can be energy-related commodities, such as total energy output in a defined period of time, or power-related commodities, such as spinning reserve and ancillary services. Another way to differentiate electricity markets is by the interval of the market, such as day-ahead markets and intraday markets.


Figure 2.1: Electricity market setup

Day-ahead markets are the usual markets used in different countries to exchange bids and offers for the energy produced on an interval basis; the bids and offers for each interval are submitted before a predefined time in order to calculate the MCP and MCV for each interval. Hourly intervals are the usual interval in day-ahead markets, yet in some countries there are other intervals, such as half-hourly intervals.

Intraday markets are usually employed to trade power-related commodities or ancillary services, such as spinning reserve, non-spinning reserve, and regulation up and down. In some countries the ancillary services are traded in both the day-ahead market and the intraday market.

2.1.3 Importance of prices forecasting in electricity markets

Given the electricity market setup, and the dependence of the price and the production on supply and demand, it is very important for all market players to predict prices in order to make better use of their resources.

Suppliers benefit from price forecasting by maximizing their profits, producing more when prices are high and less when prices are low; they also use long-term forecasting in financial planning and reporting.


Customers or bidders, either retailers or individual consumers, would also make use of price forecasting to adjust their demand according to price patterns, by shifting power consumption away from times with price spikes to times when prices are lower.

System operators are concerned with the stability of the market and the whole electric grid, so they are also concerned with the prediction of electricity price spikes, the mitigation of spikes, and the balance between demand and the supplied capacity.


2.2 Time series and forecasting

A time series is a sequence of data points that are indexed in time order with equal spacing. Examples of time series are hourly temperature readings or yearly water consumption. They are applied widely in fields of science that involve temporal measurements, such as statistics, control systems, and econometrics.

Forecasting is one of the main applications of time series; it applies a model to predict future values based on the previous values of that time series. Because of their temporal order, time series are very useful for obtaining time attributes that increase forecasting accuracy, such as the trend and seasonality, which can be obtained by decomposing the series [25]. Figure 2.2 below gives an example of a time series.

2.2.1 Features

Features are individual measurable properties of a phenomenon being observed [26]. The features of a time series that are used to predict it can be either internal or external. Internal features are those that can be obtained from the time series itself; examples are the lag features and the decomposition features, such as the trend and seasonal parts of the series [27]. External features are other time series that have a relationship with the target time series, which makes them helpful in predicting its behaviour. Many internal and external features were investigated in this study in order to check their applicability to the forecasting model.

Figure 2.2: Time series example: Fuel prices in Australia 2016-2019.


Lag Features

Lag features represent the past values of the series. Some lags have high correlations with the current values of the series, and these highly correlated lags may be suitable as features for predicting future values of the series. Important lags can be determined using the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF), as described by Hyndman and George [28].

The Autocorrelation Function (ACF) plot is a representation of the correlation between the data and shifted versions of itself. The autocorrelation $\mathrm{acf}_k$ between the values $y_t$ and $y_{t-k}$ is given by equation 2.1:

$$\mathrm{acf}_k = \frac{\sum_{t=k+1}^{T}(y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=k+1}^{T}(y_t - \bar{y})^2} \qquad (2.1)$$

where $T$ is the length of the time series. The ACF plot gives an intuition about how many lags are important to add in the forecasting process; an example of the ACF for the data in Figure 2.2 is shown in Figure 2.3 below.

Figure 2.3: ACF plot for the data in Figure 2.2

The Partial Autocorrelation Function (PACF) plot is also a representation of the correlation between the data and shifted versions of itself; the difference from the ACF is that the effect of shorter lags is removed, i.e. the PACF between $x_t$ and $x_{t+h}$ does not depend on the values of the lags between them, in contrast with the ACF, which considers the points between them. According to Durbin [29], the estimated partial autocorrelations $r_j$ can be obtained from equation 2.2:

$$r_j = \phi_{k1} r_{j-1} + \phi_{k2} r_{j-2} + \cdots + \phi_{k(k-1)} r_{j-k+1} + \phi_{kk} r_{j-k} \qquad (2.2)$$


where $\phi_{kj}$ is the $j$th coefficient in an autoregressive representation of order $k$. An example of the PACF for the data in Figure 2.2 is shown in Figure 2.4 below.

Figure 2.4: PACF plot for the data in Figure 2.2
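As an illustration of how such plots can be produced, the sketch below computes the ACF and PACF of a price series with statsmodels. The file name, column names, and lag count are placeholder assumptions, not data from the thesis.

```python
# Sketch: ACF and PACF plots for an hourly price series using statsmodels.
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Hypothetical hourly price series loaded from a CSV file.
prices = pd.read_csv("prices.csv", index_col="timestamp", parse_dates=True)["price"]

plot_acf(prices, lags=48)    # correlation of the series with its own lags
plot_pacf(prices, lags=48)   # the same with the effect of shorter lags removed
```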

Decomposition features

There are several methods for decomposing time series and extracting the important features such as trend and seasonality. One of the widely used methods is STL, which stands for Seasonal-Trend decomposition using Loess [30]; it is a regression-based method for extracting the trend from data with nonlinear relationships by smoothing. The series is decomposed into three parts: the trend, the seasonality, and the residuals [28].

Trend is a representation of the long-term non-periodic variation; it shows the persistent increase or decrease in the data. The trend can generally be represented by the moving average of the series (or by other methods).

Seasonality is the component that represents the seasonal variation of the data, which is the variation that occurs over a fixed and known period (week, month, etc.) and repeats itself in each of these periods.

Residuals are the remainder of the time series after removal of the trend and seasonal components. They can be described as random noise and can be modeled using a normal distribution with a mean and standard deviation. Figure 2.5 below demonstrates the decomposition of a time series, taking as an example the series shown in Figure 2.2. The upper part of the plot represents the original time series, and the rest represents the decomposition components of that series.


Figure 2.5: Decomposition of the series in Figure 2.2 into weekly trend, seasonality, andresiduals
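A minimal sketch of an STL decomposition is shown below, assuming an hourly series with a weekly cycle (period = 168); the file and column names are placeholders.

```python
# Sketch: STL (Seasonal-Trend decomposition using Loess) with statsmodels.
import pandas as pd
from statsmodels.tsa.seasonal import STL

prices = pd.read_csv("prices.csv", index_col="timestamp", parse_dates=True)["price"]

result = STL(prices, period=168).fit()   # weekly cycle for hourly data (assumed)
trend, seasonal, resid = result.trend, result.seasonal, result.resid
result.plot()  # original series, trend, seasonal component, and residuals
```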

External features

External features can be any exogenous data with a relationship to the target variable that can help in better predicting that variable; this relationship can be determined through the correlation between the data and the target variable. Within the context of electricity price and load forecasting, the exogenous features that have been used in the literature include:

• Electricity load: both the historical and forecasted data of the load.

• Available generating capacity.

• Weather data: such as temperature, humidity, and wind speed.

• Time factors such as the hour, the day, and the year.

• Fuel prices: fuel cost represents the highest share of the total cost of generation.

2.2.2 Time series Forecasting using lag features

Time series forecasting employs the relationships between the time series and its lag features and uses them as a basis to predict future values of the series. Several models are used in time series forecasting, such as the Autoregressive (AR), Moving Average (MA), Autoregressive Integrated Moving Average (ARIMA), and Seasonal Autoregressive Integrated Moving Average (SARIMA) models [28].
It should be noted that the data are assumed to be stationary (their mean and standard deviation do not change with time) prior to using these time series methods. Stationarity can be achieved by removing the trend and seasonality, by differencing the data against a shifted version of itself; an example of differencing is shown in Figure 2.6, where a differencing process is applied to the data shown in Figure 2.2.

Figure 2.6: Differencing of time series in Figure 2.2.
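Differencing itself is a one-line operation; the sketch below shows first-order and seasonal differencing with pandas, assuming an hourly series (the daily lag of 24 is an assumption).

```python
# Sketch: removing trend and seasonality by differencing with pandas.
import pandas as pd

prices = pd.read_csv("prices.csv", index_col="timestamp", parse_dates=True)["price"]

diff1 = prices.diff().dropna()     # y_t - y_{t-1}: removes the trend
diff24 = prices.diff(24).dropna()  # y_t - y_{t-24}: removes a daily seasonal pattern
```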

Autoregressive Model (AR)

The autoregressive model relies on the assumption that the output variable depends linearly on its own historical values and a random term; thus it is formed by a linear regression between the series and shifted versions of itself, up to a certain number of lags denoted by p. An AR(p) model can be formulated as in equation 2.3:

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t \qquad (2.3)$$

where $\varepsilon_t$ is white noise and $\phi_1, \ldots, \phi_p$ are the parameters of the model. The number $p$ is determined using the Partial Autocorrelation Function (PACF) plot discussed earlier; the PACF plot gives an intuition about how many lags to include in the AR model according to their importance, which is determined by their correlation in the plot.

Moving Average Model (MA)

It is the same as the autoregressive model, except that the value of the series at every timestamp depends on the previous forecast error values. It can be defined as in equation 2.4:

$$y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q} \qquad (2.4)$$

where $\theta_1, \ldots, \theta_q$ are the weights of each forecast error and $\varepsilon_t, \varepsilon_{t-1}, \ldots, \varepsilon_{t-q}$ are white noise error terms.


The value of $q$ is called the order of the MA model, and it is determined by the Autocorrelation Function (ACF) plot, which depicts the correlation between the data and the error terms of the lag values (up to a certain number of lags). The ACF plot gives an intuition about how many lags to include in the MA model according to their importance, which is determined by their correlation in the plot.

Autoregressive Integrated Moving Average Model (ARIMA)

This model combines the autoregressive model with the moving average model, thus formulating the data points as functions of their lags and of the residual errors of their forecasts. It also adds another factor $d$, the differencing order, which denotes how many times the data should be differenced before calculating the AR and MA terms. Equation 2.5 shows an example of an ARIMA model:

$$y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t \qquad (2.5)$$

where $y'_t$ is the differenced series, $\phi_1, \ldots, \phi_p$ are the AR parameters of the model, $\theta_1, \ldots, \theta_q$ are the MA parameters of the model, and $\varepsilon_t, \ldots, \varepsilon_{t-q}$ are error terms. The notation ARIMA(p, d, q) represents the orders of the autoregressive, differencing, and moving average parts of the ARIMA model:
p: order of the autoregressive part.
d: order of the differencing part.
q: order of the moving average part.

Seasonal ARIMA Model (SARIMA)

The seasonal ARIMA model includes additional terms P, D, and Q, which are related to the seasonality of the data, and a parameter m that denotes the periodicity of the seasonal part of the model. The seasonal terms are like the non-seasonal components of the model but involve back-shifts of the seasonal period. Equation 2.6 below shows an example of a SARIMA(1,1,1)(1,1,1)$_4$ model:

$$(1 - \phi_1 B)(1 - \Phi_1 B^4)(1 - B)(1 - B^4)\, y_t = (1 + \theta_1 B)(1 + \Theta_1 B^4)\, \varepsilon_t \qquad (2.6)$$

The SARIMA method was used in this study to forecast electricity and fuel prices, due to its consideration of the seasonal components of the time series, which makes it the most suitable of the autoregressive moving average models for this application.
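As a minimal sketch of this approach (not the exact configuration used in the thesis), a seasonal ARIMA model can be fitted and used to forecast a day ahead with statsmodels; the orders and the daily seasonal period of 24 are illustrative assumptions.

```python
# Sketch: fitting SARIMA(1,1,1)(1,1,1,24) to an hourly price series and forecasting.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

prices = pd.read_csv("prices.csv", index_col="timestamp", parse_dates=True)["price"]

model = SARIMAX(prices, order=(1, 1, 1), seasonal_order=(1, 1, 1, 24))
fitted = model.fit(disp=False)

forecast = fitted.get_forecast(steps=24)
print(forecast.predicted_mean)         # point forecasts for the next 24 hours
print(forecast.conf_int(alpha=0.05))   # 95% confidence intervals
```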

2.2.3 Time series forecasting using decomposition features

Time series decomposition is very helpful in analyzing and forecasting data, as each of the component parts (trend, seasonality, residuals) has its own characteristics, and therefore each has a proper way to be forecasted according to those characteristics.


The trend, for example, represents the overall increase or decrease in the data, so it can be predicted by extrapolation. The seasonal part repeats itself with time, so it can be forecasted by repeating its pattern. The noise part can be modeled using a normal distribution, and predictions can then be obtained by sampling from that distribution. After getting the predictions for each of these parts, the additive or multiplicative model can be used to combine them into a single forecasting output.

Additive model

It is a simple model in which the decomposed parts are added together to model the data or predict its future values. It assumes that there is no relationship between the trend and the seasonality.

data = trend + seasonality + residuals

Multiplicative Model

This model assumes that the seasonal part changes with the trend of the data (it increases when the trend increases and decreases when it decreases). Thus the seasonal part needs to be multiplied by the trend to get the total value at each data point.

data = trend * seasonality * residuals

It can be noted that both models can be converted into each other using the logarithm and exponential functions. As an example, the equations below show how to change the multiplicative relation in the equation above into an additive one:

log(data) = log(trend * seasonality * residuals)

log(data) = log(trend) + log( seasonality) + log(residuals)
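The sketch below illustrates this decomposition-based forecasting idea under the additive model: the trend is extrapolated linearly, the seasonal pattern is repeated, and the residual noise is sampled from a normal distribution. The weekly period and forecast horizon are assumptions for illustration only.

```python
# Sketch: additive decomposition forecast (trend extrapolation + repeated seasonality
# + sampled residual noise), assuming an hourly series with a weekly cycle.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

prices = pd.read_csv("prices.csv", index_col="timestamp", parse_dates=True)["price"]
period, horizon = 168, 168

stl = STL(prices, period=period).fit()

# Trend: fit a straight line to the estimated trend and extrapolate it.
t = np.arange(len(prices))
slope, intercept = np.polyfit(t, stl.trend.values, 1)
trend_fc = slope * np.arange(len(prices), len(prices) + horizon) + intercept

# Seasonality: repeat the last full seasonal cycle.
season_fc = np.tile(stl.seasonal.values[-period:], horizon // period + 1)[:horizon]

# Residuals: sample white noise with the residuals' standard deviation.
noise_fc = np.random.normal(0.0, stl.resid.std(), horizon)

forecast = trend_fc + season_fc + noise_fc  # additive combination
```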

Decomposition Model Implementations

Some algorithms that use decomposition in forecasting try to extract more features from the decomposed time series. One of them is the Prophet forecasting library introduced by Taylor and Letham [31], which has been investigated as a forecasting method in this study. Prophet forecasts time series data using an additive model in which non-linear trends are fit together with different seasonalities; it also models holiday effects. Prophet works best with time series that have strong seasonal effects and several seasons of historical data, which is a good match for electricity and fuel prices. Prophet is tolerant of missing data and shifts in the trend, and it handles spikes and outliers as well.
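A minimal sketch of using Prophet for a week-ahead hourly price forecast is shown below. The package import is "prophet" in recent releases (older releases ship as "fbprophet"), and the file and column names are placeholder assumptions.

```python
# Sketch: week-ahead forecast of hourly prices with Prophet.
import pandas as pd
from prophet import Prophet  # in older releases: from fbprophet import Prophet

df = pd.read_csv("prices.csv", parse_dates=["timestamp"])
df = df.rename(columns={"timestamp": "ds", "price": "y"})  # Prophet expects ds/y columns

model = Prophet(daily_seasonality=True, weekly_seasonality=True)
model.fit(df)

future = model.make_future_dataframe(periods=168, freq="H")  # one week of hourly steps
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(168))
```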


2.3 Machine learning and forecasting

Machine learning is the field that studies the models and algorithms that are used by computer systems in order to perform specific tasks effectively without getting explicit instructions. We can simply say that machine learning is teaching machines how to learn on their own [32].

Machine learning algorithms are usually categorized into supervised and unsupervised methods. In supervised machine learning, labelled input and output data are used to train the model and serve as the basis for constructing it. In unsupervised machine learning, unlabeled data are used in the model training, without the outputs being explicitly labeled. For the purposes of forecasting, supervised methods are usually employed [26]. Machine learning algorithms are generally used to perform one of three operations on data: classification, regression, or clustering. In classification, the outputs are classified into categories and the input features are used to determine which category the output falls in. Regression is about obtaining a numerical value of the output according to the input values, and clustering is similar to classification but without clear labels that determine the classes.

2.3.1 Using machine learning regression for forecasting

Regression in supervised machine learning is a model that employs independent variables to predict a target value. It is mostly used for finding relationships between variables and forecasting those variables. Forecasting data using machine learning requires these input variables to be related to the target value in a predictable way. These variables are called features, and one of the most important processes in building a machine learning forecasting model is feature extraction (also called feature engineering). The more related those features are to the target variable (feature importance), the more accurate the regression forecast.
There are different regression algorithms according to the type of correlation in the data; for example, linear regression and its related techniques are suitable for data with linear relationships, while more complex relationships require nonlinear techniques such as decision trees, random forests, and support vector machines. All the regression algorithms mentioned below have been tested in this study in order to build the forecasting model and validate it.

2.3.2 Linear regression

In linear regression, a linear relationship is assumed between the variables, and thus the regression is performed to obtain the parameters of that linear relationship (i.e. the intercept and coefficients). The linear regression relationship between two variables $Y$ and $X$ can be written as in equation 2.7:

$$Y = \beta X + \beta_0 \qquad (2.7)$$


where $Y$ and $X$ are the target variable and the feature matrix respectively, and $\beta$ and $\beta_0$ are the coefficients and the intercept. Figure 2.7 below demonstrates linear regression between two variables $Y$ and $X$, showing the real data points and the fitted regression line with its characteristics.

Figure 2.7: Fitting a linear regression on a time series.

When more than one variable is used in the regression, it is called multiple regression; equation 2.8 shows an example of multiple linear regression with three variables $x_1$, $x_2$, and $x_3$:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 \qquad (2.8)$$

Fitting a linear regression is done with different algorithms, all of which aim to fit a linear model that reduces the errors of the estimates. One of the most common criteria is the residual sum of squares (RSS), which uses the sum of the squared estimation errors as the measure of the fit of the model. For a single explanatory variable, the RSS can be formulated as in equation 2.9:

$$\mathrm{RSS} = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \bigl(y_i - (\alpha + \beta x_i)\bigr)^2 \qquad (2.9)$$

where $y_i$ is the $i$th value of the target variable, $x_i$ is the $i$th value of the explanatory variable $x$, $\alpha$ is the intercept, and $\beta$ is the coefficient of $x$. The values of $\alpha$ and $\beta$ are optimized using techniques such as Ordinary Least Squares (OLS), which puts the parameters of all variables in the matrix representation $X\beta = y$, where $y$ is the output vector, $X$ is the input matrix, and $\beta$ is the parameter vector. The process is then to calculate the gradients of the output errors with respect to each of the parameters and modify the parameters accordingly; more explanation can be found in [33].
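The following is a minimal sketch of fitting equation 2.9 by ordinary least squares on synthetic data; the numbers are invented purely to illustrate the mechanics.

```python
# Sketch: ordinary least squares for a single explanatory variable (equation 2.9).
import numpy as np

x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 1.5 * x + np.random.normal(0.0, 0.5, x.size)  # noisy linear relationship

# Design matrix [1, x] so that the solution contains the intercept alpha and slope beta.
X = np.column_stack([np.ones_like(x), x])
(alpha, beta), rss, *_ = np.linalg.lstsq(X, y, rcond=None)
print(alpha, beta, rss)  # estimated intercept, slope, and residual sum of squares
```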


worth noting that polynomial and logarithmic features can also be used in a linear regression model, simply by adding them as additional input variables. For example, equation 2.10 below represents a multiple linear regression between a target variable y and features composed of a variable x, its square x² and its logarithm log(x).

y = β0x + β1x² + β2 log(x)   (2.10)

It could also be represented by the linear regression in equation 2.11

y = β0x1 + β1x2 + β2x3 (2.11)

Where: x1 = x, x2 = x², x3 = log(x)
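The idea of equations 2.10–2.11 can be reproduced with a few lines of scikit-learn, the library used later in this study. The sketch below is illustrative only: the data are synthetic and the coefficients are arbitrary, but it shows how nonlinear terms are simply added as extra columns of the feature matrix before fitting an ordinary linear regression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: a nonlinear target generated from a single variable x
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=200)
y = 2.0 * x + 0.5 * x**2 + 3.0 * np.log(x) + rng.normal(0.0, 1.0, size=200)

# Build the feature matrix [x, x^2, log(x)] as in equations 2.10-2.11
X = np.column_stack([x, x**2, np.log(x)])

model = LinearRegression()
model.fit(X, y)

print("coefficients (beta_0..beta_2):", model.coef_)
print("intercept:", model.intercept_)
print("prediction for x = 5:", model.predict([[5.0, 25.0, np.log(5.0)]]))
```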

2.3.3 Over-fitting and Regularization

Overfitting is the production of a model that fits a particular set of data too closely; it typically occurs when fitting biased or noisy data, and the resulting model therefore fails to fit additional data and to predict future values. Figure 2.8 below gives a general idea of the overfitting problem for random data. The left side of the figure shows a proper fit of a linear relationship between two variables, while the right side shows an overfitted model for two variables with a nonlinear relationship.

Figure 2.8: Overfitting linear regression

Overfitting occurs when the model is trained on biased data, or when the model is flexible enough to detect false patterns in the training data. Regularization is a way to avoid overfitting by penalizing large coefficients, which would otherwise allow some features to dominate the output values. In other words, regularization in regression constrains or shrinks the coefficient estimates towards zero in order to discourage learning an overly complex or flexible model, thus avoiding the risk of overfitting. Regularization in linear regression models adds a penalty term using different methods, such as:

• L1 regularization, used by LASSO regression [34].

• L2 regularization, used by Ridge regression [35].


In Lasso regularization, the penalty term (L1) added to the linear regression objective is proportional to the sum of the absolute values of the coefficients, λ ∑_{j=1}^{p} |βj|, so the regularized least-squares objective for a feature matrix X and target variable y is given by equation 2.12:

min_β ∑_{i=1}^{n} (yi − ∑_{j=1}^{p} xij βj)² + λ ∑_{j=1}^{p} |βj|   (2.12)

Ridge regularization penalizes the coefficients using an L2 term that depends on their squared magnitude, λ ∑_{j=1}^{p} βj², so the regularized objective is as in equation 2.13 below:

min_β ∑_{i=1}^{n} (yi − ∑_{j=1}^{p} xij βj)² + λ ∑_{j=1}^{p} βj²   (2.13)

In Lasso regression, the coefficients of weak features are penalized all the way down to zero, which makes it better than ridge regression for feature selection, yet it may assign lower importance to some variables than to others of equal relevance. Other regression techniques, such as Elastic Net [36], employ both the L1 and L2 regularization terms in order to overcome the drawbacks of using each of them alone; in Elastic Net the ratio between L1 and L2 is tuned to determine the weights of the penalty terms. More complex regression methods such as support vector machines and tree-based methods have many regularization parameters, and tuning these parameters leads to the optimum regularization of the weights and thus better avoidance of overfitting.
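As a minimal sketch of equations 2.12–2.13 in code (synthetic data, arbitrary penalty weights), the example below fits Lasso, Ridge and Elastic Net from scikit-learn on a dataset where only two of ten features are relevant; the printed coefficients show Lasso driving the irrelevant coefficients to exactly zero while Ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))                                        # 10 candidate features
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)   # only 2 are relevant

X_std = StandardScaler().fit_transform(X)   # regularization is sensitive to feature scale

# alpha is the penalty weight lambda; l1_ratio balances the L1 and L2 terms
models = {
    "lasso": Lasso(alpha=0.1),
    "ridge": Ridge(alpha=1.0),
    "elastic net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
for name, model in models.items():
    model.fit(X_std, y)
    print(name, np.round(model.coef_, 2))
```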

2.3.4 Support Vector Regression

A Support Vector Machine (SVM) is a classifier that discriminates between sets of data by a separating line (or a hyperplane for data with more than two dimensions). In other words, given labeled training data, it outputs an optimal hyperplane which predicts the categories of new data, with a margin parameter that determines the distance between this hyperplane and the closest points of both classes [37]. Support vector regressors (SVR) work on the same principles as the support vector machine, except that the regressor is interested in constructing a function whose deviations from the actual data stay within a margin, with the least cost (margin errors). The main concept of the support vector regressor is to minimize the norm of the weights w applied to the input variables x while keeping the errors of the predicted outputs y within the allowed tolerance [38]. Equation 2.14 below shows an example of support vector regression between variables y and x with y = f(x) = wx + b:

min_w ½ ||w||²
s.t. yi − wxi − b ≤ ε
     wxi + b − yi ≤ ε   (2.14)


Where xi is a training sample with target value yi. The inner product plus intercept, wxi + b, is the prediction for that sample, and ε is a free parameter that serves as a threshold. The simplest, linear-kernel SVR assumes a linear relationship between the data and the output, and thus draws a line that minimizes the errors between it and the data points. Nonlinear SVRs can further be used to capture more complex relationships that cannot be represented by linear regression. Figure 2.9 below gives a theoretical example of a support vector regressor.

Figure 2.9: Fitting Support Vector regression
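A small illustrative sketch of the two SVR variants mentioned above is given below, using scikit-learn on synthetic data; the kernel, C and epsilon values are arbitrary choices for the example, not the values used in this study.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 5.0, size=150)).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(scale=0.1, size=150)

# epsilon is the tolerated margin around the regression function;
# C trades off flatness of the function against margin violations.
linear_svr = SVR(kernel="linear", C=1.0, epsilon=0.1)
rbf_svr = SVR(kernel="rbf", C=10.0, epsilon=0.1)

linear_svr.fit(x, y)
rbf_svr.fit(x, y)

print("linear kernel prediction at x=2.5:", linear_svr.predict([[2.5]]))
print("RBF kernel prediction at x=2.5:", rbf_svr.predict([[2.5]]))
```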

2.3.5 Decision Trees

Decision trees are decision support tools that use a tree-shaped model composed of decisions and their consequences. At each level of the tree a test is carried out on an attribute (the splitting attribute) to determine into which branch the tree spreads, and each leaf node (terminal node) represents a class or a value of the output. Decision trees follow the same principles for classification and regression, except that in regression trees the target variable is assumed to be continuous, and the value returned by the regression is the mean of the training observations that fall in that leaf node. Figure 2.10 below shows the basic principle of a decision tree that predicts the variable y based on two predictors, x1 and x2; each leaf node represents a value of the variable y.


Figure 2.10: Regression trees example: determining the value of y based on attributes of two variables x1 and x2 [39]

2.3.6 Ensemble models

It has been found that improved performance of machine learning models can be obtained by combining multiple models to produce the output instead of using a single model.
One of the most powerful methods for combining models is bagging, which stands for bootstrap aggregation; it entails training the model several times on bootstrapped samples of the training data and taking the average of the outputs as the result. This method works well with estimators that have high sensitivity to fluctuations in the training set (high variance) and small estimator errors (low bias), such as trees. Another method is boosting, which converts weak learners into strong learners by training the weak learners sequentially, each one trying to correct its predecessor.

2.3.7 Random Forests

Random forest [26] is one of the ensemble model techniques; it uses the principles of bagging to build different trees and average their results. Trees are good candidates for this approach since they have much less bias when they are deep enough, yet they have high variance, so they benefit greatly from averaging over models. Figure 2.11 below demonstrates the basic shape of a random forest.


Figure 2.11: Simplified random forest model for regression.
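A minimal random forest regression sketch with scikit-learn is shown below; the data are synthetic and the forest size and depth are arbitrary example values. Each tree is fitted on a bootstrap sample (bagging) and the forest prediction is the average of the tree predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 5))
y = X[:, 0] ** 2 + np.where(X[:, 1] > 0, 2.0, -1.0) + rng.normal(scale=0.2, size=500)

# 200 trees, each trained on a bootstrap sample; predictions are averaged
forest = RandomForestRegressor(n_estimators=200, max_depth=8, random_state=0)
forest.fit(X, y)

print("predictions:", forest.predict(X[:3]))
print("feature importances:", np.round(forest.feature_importances_, 3))
```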

2.3.8 Gradient Boosting and XGBoost

XGBoost is a library, available for many programming languages, for building classification and regression models [40]. The library is based on the gradient boosting principle, which produces a prediction model in the form of an ensemble of weak prediction models, normally tree models. The idea of gradient boosting is that boosting can be regarded as an optimization algorithm on a suitable loss function [41].
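A minimal sketch using the XGBoost Python package (assuming the `xgboost` package is installed; the data and hyperparameter values are arbitrary example choices):

```python
import numpy as np
from xgboost import XGBRegressor  # requires the xgboost package

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 6))
y = 2.0 * X[:, 0] - X[:, 1] ** 2 + rng.normal(scale=0.3, size=500)

# Trees are added sequentially; each new tree fits the gradient of the loss
# with respect to the current ensemble prediction (gradient boosting).
booster = XGBRegressor(n_estimators=300, learning_rate=0.05, max_depth=4)
booster.fit(X, y)
print(booster.predict(X[:3]))
```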

2.3.9 Features importance

An important attribute of machine learning algorithms is that they provide feature importance results when they are used for model training, which helps in filtering the features and ignoring those with negligible importance. Reducing the number of features makes the model run faster, and it can also improve its output by reducing the high-dimensionality problem, referred to as the curse of dimensionality [42]: as the number of dimensions of the data increases, the space in which the data exist grows much faster, which makes the available data increasingly sparse for statistical analysis. Feature importance has been applied in this study to evaluate each of the features considered for the forecasting model development; feature importances from different machine learning algorithms were used, and the prevalent features were eventually kept in the model development.
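As an illustration of this filtering step, the sketch below ranks a set of hypothetical candidate features (the column names are invented for the example, not the features of the thesis model) by random forest importance and keeps only those above a small threshold.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
features = pd.DataFrame(rng.normal(size=(400, 6)),
                        columns=["lag_1", "lag_24", "demand", "temperature",
                                 "noise_a", "noise_b"])
target = (0.6 * features["lag_1"] + 0.3 * features["lag_24"]
          + 0.4 * features["demand"] + rng.normal(scale=0.1, size=400))

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(features, target)

# Rank the candidate features and keep only those above a small threshold
importance = pd.Series(model.feature_importances_, index=features.columns)
selected = importance[importance > 0.05].sort_values(ascending=False)
print(selected)
```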


2.3.10 Forecasting accuracy measurement

The accuracy of each forecasting model is measured in terms of the prediction errors with respect to the actual data. The distributions of these measures are aggregated from their values at each time stamp. The error measures used in this study are:

• Residuals.

• Root Mean Squared Errors (RMSE).

• Mean Absolute Percentage Errors (MAPE).

Residuals (or errors) measure how far the data points are from the regression line, i.e. the difference between the forecasted and the actual data point at each time stamp. The residual et at each time stamp t is calculated by equation 2.15, where yt is the actual value of y at time stamp t and ŷt is the forecasted value of y.

et = yt − ŷt   (2.15)

The Root Mean Square Error (RMSE) gives an idea of how concentrated the data are around the correct predictions. It represents the standard deviation of the residuals and how spread out they are. The calculation of RMSE is given by equation 2.16, where T is the total number of time stamps in the prediction period.

RMSE = √( ∑_{t=1}^{T} (yt − ŷt)² / T )   (2.16)

The Normalized RMSE (NRMSE) is used in this study to give a scale-independent measure of the errors, one that does not depend on the magnitude of the measured variable. The normalized RMSE can be calculated in different ways; equation 2.17 shows the one used in this study, where ȳ is the mean of the predictions over the prediction interval.

NRMSE = (RMSE / ȳ) × 100%   (2.17)

The Mean Absolute Percentage Error (MAPE) is another measure of the prediction accuracy of a forecasting model; it measures the absolute percentage errors and takes their mean over the data, as shown in equation 2.18.

MAPE = (100% / n) ∑_{t=1}^{n} |(yt − ŷt) / yt|   (2.18)
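The three error measures of equations 2.15–2.18 can be computed directly with NumPy; the sketch below uses a few made-up price values purely as a usage example, with the NRMSE normalized by the mean prediction as in equation 2.17.

```python
import numpy as np

def forecast_errors(actual, forecast):
    """Residuals, RMSE, NRMSE and MAPE as defined in equations 2.15-2.18."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    residuals = actual - forecast
    rmse = np.sqrt(np.mean(residuals ** 2))
    nrmse = 100.0 * rmse / np.mean(forecast)           # normalized by the mean prediction
    mape = 100.0 * np.mean(np.abs(residuals / actual))
    return residuals, rmse, nrmse, mape

actual = [52.0, 48.5, 61.0, 55.3]     # e.g. actual electricity prices
forecast = [50.1, 49.0, 58.7, 57.0]   # corresponding forecasts
_, rmse, nrmse, mape = forecast_errors(actual, forecast)
print(f"RMSE={rmse:.2f}, NRMSE={nrmse:.1f}%, MAPE={mape:.1f}%")
```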


2.4 Mathematical Optimization

Mathematical optimization (or mathematical programming) is the field of mathematics that explores the selection of the best element from a set of available alternatives with respect to some criterion. In its simplest case, it consists of minimizing (or maximizing) a real function by choosing the best input values from an allowed set and computing the value of that function [43]. Equation 2.19 describes the mathematical formulation of a minimization problem.

min_x f(x)
s.t. g(x) ≤ 0
     h(x) = 0   (2.19)

where x belongs to a subset of Rⁿ that is the domain of the functions f, g, and h; the functions g(x) and h(x) are called the constraints, and f(x) is called the objective function.

2.4.1 Linear Optimization

Linear optimization is a special case of convex optimization in which the objective function is linear and the constraints are expressed using linear equalities and inequalities. There are several algorithms for solving linear optimization problems, such as the SIMPLEX algorithm, which constructs a polytope from the constraints and moves along its vertices with non-decreasing values of the objective function until an optimum is reached [44].

2.4.2 Mixed Integer Linear Programming (MILP)

An optimization problem is an integer program if all of its decision variables are restricted to be integers. If some of the variables are restricted to integers while the rest are continuous, the problem is called a Mixed Integer Linear Program [45]. Due to these integrality restrictions, MILP problems take more time and computational effort than regular linear problems, so methods are used that approximate them by linear problems, such as linear relaxation and branch and bound.

Relaxation consists of approximating the integer variable domains as continuous, and thus solving the problem as a linear program. The problem with relaxation is that its solution is usually not feasible for the original integer problem, so additional techniques must be combined with relaxation in order to approach the global optimum of the integer problem.

Branch and bound is one of the methods used to approach the global optimum. In this method, a relaxed decision variable that should be integer is rounded up and down to obtain its two bounding integer values, and both are substituted into the problem to inspect whether an optimal solution can be obtained with either of them. The


problem is solved by constructing a binary tree of sub-problems; at each branch the value of the objective function is inspected, and if it yields a solution with integer values that is better than the solution in the branch before it, it is stored, otherwise the branch is pruned. An example of the branch and bound process is illustrated on the optimization problem in equation 2.20,

max 0.2x1 + 0.3x2 + 0.5x3 + 0.1x4
s.t. 0.5x1 + 1.0x2 + 1.5x3 + 0.1x4 ≤ 3.1
     0.3x1 + 0.8x2 + 1.5x3 + 0.4x4 ≤ 2.5
     0.2x1 + 0.2x2 + 0.3x3 + 0.1x4 ≤ 0.4
     xj ∈ {0, 1}  ∀ j ∈ {1, ..., 4}   (2.20)

with an optimal LP relaxation value of 0.65 and a value of x2 that is not an integer. The tree is started by using the variable x2 to split the first node, and branching and pruning continue as in Figure 2.12 until the solution P3 is reached with an objective value of 0.6 and all variables integer; the tree is stopped after that and P3 is taken as the optimal solution.

Figure 2.12: An example of branch and bounding in solving MILP [46]
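In practice the branching and bounding is delegated to a MILP solver. The sketch below encodes the problem of equation 2.20 with pyomo, the modeling library used later in this thesis; it assumes that some MILP solver (CBC, GLPK or CPLEX) is installed, and the CBC name used here is only an example. Solving it reproduces the example above: x3 = x4 = 1 with objective value 0.6.

```python
from pyomo.environ import (ConcreteModel, Var, Objective, Constraint,
                           Binary, maximize, SolverFactory)

m = ConcreteModel()
m.x = Var([1, 2, 3, 4], domain=Binary)

m.obj = Objective(expr=0.2*m.x[1] + 0.3*m.x[2] + 0.5*m.x[3] + 0.1*m.x[4],
                  sense=maximize)
m.c1 = Constraint(expr=0.5*m.x[1] + 1.0*m.x[2] + 1.5*m.x[3] + 0.1*m.x[4] <= 3.1)
m.c2 = Constraint(expr=0.3*m.x[1] + 0.8*m.x[2] + 1.5*m.x[3] + 0.4*m.x[4] <= 2.5)
m.c3 = Constraint(expr=0.2*m.x[1] + 0.2*m.x[2] + 0.3*m.x[3] + 0.1*m.x[4] <= 0.4)

# The branch and bound search happens inside the solver call
SolverFactory("cbc").solve(m)   # any MILP solver (CBC, GLPK, CPLEX) works here
print({j: int(m.x[j].value) for j in [1, 2, 3, 4]}, "objective =", m.obj())
```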

2.4.3 Stochastic Optimization

When some of the inputs to an optimization problem are random variables, they lead to a random objective function or random constraints, which makes ordinary optimization techniques fail to solve the problem; hence the concept of stochastic optimization. Stochastic optimization employs techniques that solve the problem by creating scenarios using Monte-Carlo simulations and then using statistical inference to obtain the expected output, for example by estimating the distribution of the data and calculating the mean at each time stamp [47].


In the scope of this thesis, the optimization problem is solved using MILP algorithms, since many of the variables are restricted to be integers, such as the unit commitment variables and the dummy variables used for production unit statuses.

Some of the variables in the optimization problem could be modelled as random variables, especially the electricity and fuel price forecasts, and the approach used by a considerable amount of research for solving dispatch problems with forecasted prices is stochastic optimization. In this study, however, the electricity and fuel prices are not treated as random variables but as input parameters; the uncertainty of the forecast is instead represented by its confidence intervals, which are used to create best-case and worst-case price scenarios, and these scenarios are used to assess the sensitivity of the optimization output to changes in the forecasting accuracy.


3 Methodology

This chapter gives an outline of the research methods followed in this study. It provides details of the forecasting model and the optimization model, which are the two building blocks for solving the economic dispatch problem, as described in Figure 3.1. The forecasting model was used to generate predicted electricity and fuel prices and the demand for the upcoming optimization period; the optimization model then used these parameters to maximize the profit of the combined cycle plant given the operating constraints and market constraints.

Figure 3.1: Proposed model in the study

The first section summarizes the machine learning framework used to build the forecasting model and the processes followed to validate the model and ensure its integrity. This section also specifies the case studies used to test the forecasting model. The second section outlines the optimization model, starting with the formulation of the problem as a MILP problem with its parameters, variables, constraints, and other considerations and assumptions taken during the formulation and solution of the problem. It also discusses alternative implementations of the problem using case studies for different scenarios under which the plants might be operating.

3.1 Forecasting Model

The framework used as a basis for building the forecasting model includes data handling and machine learning processes. The data handling processes comprise data collection and data preparation, while the machine learning processes include feature engineering,


training the model, and testing and validating the model. Figure 3.2 below shows these steps in order. The model was built in the Python programming language [48], which offers many libraries widely used in data science applications: libraries for acquiring data (Requests and Urllib), analyzing data (pandas, NumPy, and SciPy), plotting and presenting data (Matplotlib), and performing machine learning (scikit-learn).

Figure 3.2: Machine Learning Framework.

3.1.1 Data Collection

The first step in building the model was to acquire the raw data used in training the forecasting model. These data included historical prices and external features such as weather data and electricity demand. Data collection was done using the Python libraries Urllib and Requests to load data from the Application Programming Interfaces (APIs) of each data source; some of the data could also be acquired by downloading CSV files from the data sources using web browsers. The data were collected from three different markets, which were used to implement this model, in order to ensure that the model is applicable to different market environments; their details are depicted in Table 3.1 below.
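A minimal sketch of this kind of API call is shown below. The endpoint, parameters and response format are purely hypothetical placeholders: each real market operator has its own URL scheme, authentication and payload structure.

```python
import requests
import pandas as pd

# Hypothetical endpoint and parameters; real market APIs differ in URL,
# authentication and response format.
API_URL = "https://example-market-operator.org/api/day-ahead-prices"
params = {"start": "2016-01-01", "end": "2019-01-01", "format": "json"}

response = requests.get(API_URL, params=params, timeout=30)
response.raise_for_status()

# Assumes the response body is a JSON list with one record per settlement period
prices = pd.DataFrame(response.json())
prices.to_csv("market1_day_ahead_prices.csv", index=False)
```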

The amount and quality of the acquired datasets varied between markets, according to availability within the market resources. For example, market-1 and market-2 both have open websites from which all the historical data about day-ahead prices and demand can be downloaded, but they do not offer further details about intraday markets or ancillary services. On the other hand, market-3 has a large database containing more information about prices, demand, ancillary services, available production capacity and other valuable data. Figure 3.3 shows a sample from the dataset of prices acquired from market-1 for the period 2016-2019.


Figure 3.3: Sample of fuel prices in market-1

Other external data, such as weather data, were acquired from online services dedicated to these purposes; Figure 3.4 shows samples of some of the weather data features used in this study. Other useful market data were also collected to extract features from, such as the ancillary services and day-ahead projected demand shown in Figure 3.5.

Figure 3.4: Samples from weather data used in building the model.


Figure 3.5: Samples from market data used in building the model.

Table 3.1: Information about the markets used to build and validate the forecasting model

                               market-1                market-2   market-3
Continent                      Oceania                 Europe     America
Electricity price intervals    Half-hourly             Hourly     Hourly
Fuel price intervals           5 different intervals   Hourly     Hourly

3.1.2 Data Preparation

Data preparation incorporates data cleaning and data analysis, which are essential processes for making the data useful for any application. Data cleaning is the process of detecting and correcting inaccurate or corrupt values in order to make the data ready to be analyzed; such corrupt values can be detected with different analysis tools. A simple way is to plot the data series and check the points without values (the NaN, or not-a-number, values). In this study the NaNs of the time series were filled using interpolation, exploiting the relationships between neighboring data points.
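A minimal sketch of this filling step with pandas is shown below, on a short synthetic hourly price series with two missing values; time-based linear interpolation fills each gap from its neighbouring points.

```python
import numpy as np
import pandas as pd

# Hourly price series with two missing values (NaN)
index = pd.date_range("2018-01-01", periods=6, freq="H")
prices = pd.Series([30.0, np.nan, 34.0, 33.0, np.nan, 31.0], index=index)

# Time-based linear interpolation fills each gap from its neighbours
cleaned = prices.interpolate(method="time")
print(cleaned)
```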

Data analysis denotes the inspection of the time series with regard to its components and characteristics. These analyses were carried out for each of the price series in all markets. Characteristics such as lags, stationarity, trends, and seasonality were retrieved in order to prepare the time series for feature extraction (feature engineering). STL decomposition was used to plot the seasonal-trend decomposition of the time series, as depicted in Figure 2.4 in chapter 2. The ACF and PACF plots were also constructed for all price time series, as in Figure 2.2 and Figure 2.3 in chapter 2.


Other observations were also considered during data preparation, such as the time intervals of each series. Some datasets had coarser intervals than others and thus needed to be resampled to the same intervals as the other datasets. Table 3.1 shows the data intervals for electricity and fuel prices in the different markets. From this table it is clear that the fuel prices in market-1 follow different interval patterns than in the other markets, and they required further processing to harmonize the intervals. The solution was to split the 8-hour and 4-hour intervals into 1-hour intervals, under the assumption that the price is the same throughout the whole interval. Data preparation was done with the help of pandas, NumPy, SciPy, Matplotlib, and other data analysis libraries in Python.

3.1.3 Feature Engineering

Feature engineering refers to the analysis of the data in order to extract features that enable the machine learning model to better predict the targeted variables. Feature importance is a central attribute of machine learning algorithms: it shows which features are more important for predicting the output according to that algorithm.

For this model, tens of internal and external features were investigated, and the important features were eventually used to fit the model. Examples of internal features are the lags, moving averages, and seasonal factors of the time series itself. Time series analysis techniques were used to determine these features, as discussed in Chapter 2: the ACF and PACF plots were used to find the best lags to include among the features, and STL decomposition was used to determine the rolling periods of the moving averages and the dominant seasonality factors of each time series. Figure 3.6 depicts the correlation between the electricity price and some of its lags.


Figure 3.6: Correlation between electricity price and some of its lags in market-3.
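A minimal sketch of how such internal features can be built with pandas is shown below; the series is synthetic and the particular lags (24 h, 168 h) and the 24-hour rolling window are examples of the kind of features described above, not the exact feature set of the final model.

```python
import numpy as np
import pandas as pd

# Synthetic hourly electricity price series used as a placeholder
index = pd.date_range("2018-01-01", periods=24 * 60, freq="H")
price = pd.Series(50 + 10 * np.sin(np.arange(len(index)) * 2 * np.pi / 24),
                  index=index, name="price")

features = pd.DataFrame({
    "lag_24h": price.shift(24),                 # same hour on the previous day
    "lag_168h": price.shift(168),               # same hour one week earlier
    "rolling_mean_24h": price.rolling(24).mean(),
    "hour": index.hour,                         # calendar/time features
    "day_of_week": index.dayofweek,
})
dataset = features.join(price).dropna()          # drop rows where lags are undefined
print(dataset.head())
```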

External features were extracted from the external data that were acquired; the feature importance of ML algorithms such as LASSO was used to determine the most important ones, as described in chapter 2. Examples of selected external features are the electricity demand, spinning reserve and other ancillary services, renewables generation, supplied capacity, time variables, and weather data. The correlation matrix shown in Figure 3.7 represents the relationship between the electricity price and some external features. A more detailed correlation plot between electricity price and demand in market-3 is shown in Figure 3.8. A linear relationship is noticeable between the variables even though the data are scattered, and many outliers (spikes) are visible when the price is higher than 200 USD/MWh. The temperature data are used to color the scatter plot; slightly different correlation patterns between price and demand appear for different temperature ranges, which means that both variables are correlated with temperature.


Figure 3.7: Correlation matrix between electricity price and some external features.


Figure 3.8: Correlation between the electricity price and its demand.

The features allocated to each model differed according to the forecasting horizon. For example, a one-day-lag feature is not suitable for a model that forecasts a week ahead, since that lag is not known for the whole coming week in advance; a one-week lag is suitable instead. Features also differ according to the variable being forecast; for example, electricity demand is an important feature for the electricity price but not for the fuel price. The market modelled and the data available for that market also affect which features are used; for example, the scheduled spinning reserve could be used as a forecasting feature in market-3, but it is not available for market-1 and market-2.

3.1.4 Model Training

Training the model involves preparing the training datasets, choosing the proper training algorithm, performing hyperparameter tuning for the algorithm, and performing feature enhancement on the model.

Preparing training data

Preparing the training data involves splitting the data into a training set and a test set: the training data are used to fit the model, while the test data are run after training the model in


order to validate it. The default training/test split for machine learning models is 75%/25%, and it was used in this study for the hyperparameter tuning search grid. For training and testing the final model, however, a span of one year was used as training data and one week as test data, in order to validate the model against its actual application, which is to forecast one week into the future.

Training Algorithms

Several machine learning regression algorithms, discussed in Chapter 2, were explored in training the model in this study, such as Lasso, elastic net, support vector machines, random forest, and XGBoost. The more complex models such as random forest and XGBoost scored better accuracy, while the linear regression models such as Lasso and elastic net were faster to run in comparison with the other algorithms. All the mentioned algorithms were used through scikit-learn or scikit-learn-compatible Python libraries.

Hyper parameter tuning

A hyperparameter in machine learning is a parameter whose value is set prior to the training process, i.e. it does not change during training. Each of the machine learning algorithms used includes some hyperparameters that needed to be tuned in order to give optimum performance. An example of a hyperparameter for the Lasso algorithm is alpha, a number representing the penalty coefficient used to regularize the weights of the features. Examples of hyperparameters for a random forest are the maximum depth of the trees, or the maximum number of features to consider when splitting a node.

Hyperparameter tuning in this study was carried out using the grid search functionality of the scikit-learn library, which runs the model and performs an exhaustive search over a list of values assigned to each hyperparameter. This method tries all the combinations of these values and evaluates the model performance for each combination according to a chosen accuracy measure such as RMSE or MAPE, as discussed in chapter 2.
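A minimal sketch of such a search with scikit-learn's GridSearchCV is shown below; the data are synthetic and the grid values are arbitrary examples, not the grid used in this study. The time-ordered splitter keeps the temporal order of the folds, as required for time series data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(6)
X = rng.normal(size=(600, 8))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.2, size=600)

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [4, 8, None],
    "max_features": [0.5, 1.0],
}
# Every combination in the grid is fitted and scored by cross-validation
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid,
                      cv=TimeSeriesSplit(n_splits=4),
                      scoring="neg_root_mean_squared_error")
search.fit(X, y)
print("best parameters:", search.best_params_)
print("best RMSE:", -search.best_score_)
```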

Features selection

Feature selection is a continuous feedback loop between the feature engineering process and model training. After each training iteration, the feature importance results showed the performance of the features; the feature engineering process was then revised given the new information, and the weak features were removed.


3.1.5 Model Validation

Validating the model is important to ensure its integrity and applicability to different data sets. The validation method used in this study was a special case of cross-validation useful for time series forecasting, called nested cross-validation [49].

Cross-validation

Cross-validation is used in machine learning to estimate model performance on different sets of data. This is done by using a limited sample of the data to train the model and another sample to test it, thus ensuring an unbiased estimate of the model. Nested cross-validation is a method that employs two cross-validation loops: an inner loop during the training process for hyperparameter tuning, and an outer loop during the testing process. The inner loop employs K-fold validation on the training data in order to optimize the hyperparameters: the training set is split into folds while running the grid search, which uses some of the folds to fit the model and others to evaluate it, swapping the training and testing folds so that each set of parameters is evaluated by the average accuracy over all K folds. The outer loop is a walk-forward testing loop: the data are split into small batches, the training and testing processes are performed for each batch, and the accuracy measures are then calculated from the results of all batches in order to give a unified evaluation of the model. Walk-forward testing is well suited to time series forecasting because of the time dependence of the data and the importance of consecutive analysis for real-time data.

There are two types of walk-forward validation: anchored and unanchored. In anchored validation each batch of data is the previous batch with the new data appended, while in unanchored validation each batch drops the oldest part of the previous data and adds a portion of new data. In this study both the anchored and unanchored methods were investigated, and the unanchored method turned out to save time because it drops old data as it moves forward. It also captures all relevant information from the time series when set to a suitable length; for example, in the electricity price forecasting model a period of one year was enough to capture all the patterns in the data, and batches longer than one year did not give noticeably better results. Figure 3.9 below illustrates nested validation and its component loops; the walk-forward validation in this example is anchored, adding the test data to the training data for every new trial.


Figure 3.9: Nested validation model.
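A minimal sketch of the outer walk-forward loop is given below, on synthetic data and with a random forest standing in for the tuned model; the window lengths and the anchored/unanchored switch follow the description above, but the numbers are placeholders rather than the settings used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def walk_forward_rmse(X, y, train_size, test_size, anchored=False):
    """Walk-forward evaluation: train on a window, test on the following batch."""
    errors = []
    start, end = 0, train_size
    while end + test_size <= len(y):
        # anchored keeps all past data; unanchored slides the window forward
        train_slice = slice(0 if anchored else start, end)
        model = RandomForestRegressor(n_estimators=100, random_state=0)
        model.fit(X[train_slice], y[train_slice])
        pred = model.predict(X[end:end + test_size])
        errors.append(np.sqrt(np.mean((y[end:end + test_size] - pred) ** 2)))
        start += test_size
        end += test_size
    return float(np.mean(errors))

# Synthetic hourly stand-in for the price series and its features
rng = np.random.default_rng(7)
X = rng.normal(size=(24 * 90, 5))
y = X[:, 0] + rng.normal(scale=0.1, size=len(X))

print("unanchored RMSE:", walk_forward_rmse(X, y, 24 * 30, 24 * 7))
print("anchored RMSE:  ", walk_forward_rmse(X, y, 24 * 30, 24 * 7, anchored=True))
```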

3.1.6 Case studies for forecasting model

In order to test the validity of the forecasting model, several case studies were carried out, as follows:

1. Comparison between machine learning and time series forecasting models
The machine learning based forecasting model was benchmarked against time series forecasting methods. Two different time series models were used for this comparison, and the measures used to compare the methods were the mean and standard deviation of the normalized RMSE and the MAPE. The first model was built on autoregressive methods, implemented using the seasonal SARIMAX model, which takes into consideration the multiple seasonalities associated with electricity prices and demand; a search grid was constructed to find the optimum parameters p, d, q, P, D, Q, and m, and the two seasonality periods of the model were set to one day and one week. The second model used the Prophet library, which is based on an additive model in which the time series is decomposed into trend, seasonality, and residuals.


2. Comparison between different machine learning forecasting methods
The machine learning methods were further investigated in order to assess the performance and applicability of each for the optimization model. The methods used in this study are linear regression-based methods such as Lasso and Elastic Net, and more complex methods such as support vector machines, random forests, and boosting methods such as XGBoost. The performance measures for these methods include the distributions of the residuals, the normalized RMSEs and the MAPEs. The execution time is also a critical measure, because an algorithm that is more accurate but takes a long time might not be practical for the dispatch optimizer.

3. Comparison between models for different forecasting horizons
Three instances of the model were created with respect to forecasting horizon: one day ahead, two days ahead, and one week ahead. The features of each instance differed, as stated earlier in this chapter, because some features are only available for the one-day-ahead forecast and not for the two-day or one-week-ahead forecasts.


3.2 Optimization Model

3.2.1 Optimization Problem Formulation

This optimization model was developed to optimize the operation of combined cycle power plants by dispatching the plant units throughout a specific period, to satisfy different demands with different prices, in the way that yields the maximum profit. The formulation of the optimization problem in this study was based on the method used by Rosso [5], since both problems concern the same power plants and the same data sets.

The input parameters, variables, constraints and objective function are formulated for I generation units satisfying J demands at each time stamp t, with K parameters used for the fuel input calculation.

3.2.2 Input parameters

dj,t : electricity demand j at time t.
Epj,t : electricity price associated with demand j at time t.
Fpt : fuel price at time t.
Pns.costj,t : the cost of not satisfying part of demand j at time t.
Pmax i,t : maximum available capacity of unit i at time t.
Pmin i,t : minimum operating power of unit i at time t.
ak,i : coefficient of parameter k for unit i.
CITi,t : compressor inlet temperature of unit i at time t.
RHi,t : relative humidity at unit i at time t.

3.2.3 Optimization Variables

Pi,t : output power of unit i at time t.
ui,t : unit commitment (on/off) status of unit i at time t.
Pnsj,t : power not served out of demand j at time t.
Fi,t : fuel input of unit i at time t.

3.2.4 Objective function

The objective function includes the revenue and the costs within the problem scope, as shown in the equations below. The revenue is calculated by multiplying the electricity produced by its price, where the electricity produced equals the aggregate of the demands minus their power-not-served parts. The cost function is composed of two parts: the fuel cost and the power-not-served cost. The fuel cost depends on the fuel input and the fuel price, and the cost of power not served depends on the portion


of each demand that is not satisfied, multiplied by its corresponding cost at each time stamp. A simplified form of the objective is given in equation 3.1, and a more detailed formulation in equation 3.2.

max ∑_{t=1}^{T} ( revenue − cost of power not served − cost of fuel )   (3.1)

max ∑_{t=1}^{T} ( ∑_{j=1}^{J} Epj,t ∗ (dj,t − Pnsj,t) − ∑_{j=1}^{J} Pns.costj,t ∗ Pnsj,t − ∑_{i=1}^{I} Fpt ∗ Fi,t )   (3.2)

3.2.5 Optimization Constraints

Power output boundaries:
Pmin i,t ≤ Pi,t ≤ Pmax i,t   (3.3)

Unit commitment boundaries:
ui,t ∈ {0, 1}   (3.4)

Fuel input calculation:

Fi,t = a1,i ∗ CITi,t + a2,i ∗ RHi,t + a3,i ∗ Pi,t + a4,i ∗ Pi,t²   (3.5)

Demand satisfaction constraint:

∑_{i=1}^{I} Pi,t = ∑_{j=1}^{J} (dj,t − Pnsj,t)   (3.6)

The power produced at each time stamp satisfies different demands from the different customers to which the plant is selling electricity. Some of the demands may fulfil long-term power purchasing agreements with a fixed amount and/or fixed price, while others may correspond to a day-ahead or intraday market with its forecasted demand and prices. Figure 3.10 shows an example of two of the demands considered in the problem. The first demand represents a long-term power purchasing agreement (PPA) with the electricity grid, with a constant 20 MW dispatched over the whole period shown in the plot. The second demand denotes the forecasted day-ahead market demand, which varies over time. Figure 3.11 shows the price curves for the two demands separately: the price of the first demand is constant, corresponding to the PPA, while the second resembles the day-ahead price associated with the demand of the day-ahead market.


Figure 3.10: Two different demands plotted for a period of two days.

Figure 3.11: Prices corresponding to the demands in Figure 3.10

An important constraint to point out is the demand satisfaction constraint. It ensures that the total power produced equals the forecasted demands minus the portion of those demands that is not produced:

Satisfied demand = forecasted demand - power not produced

The demand satisfaction constraint and the objective function together determine the amount of power not produced, Pns. The objective function tries to minimize the cost due to Pns, while the demand satisfaction constraint balances this by allowing the power not produced to be as large as needed whenever that is less costly to the plant. In other words, the demand will always be satisfied unless it costs more to produce the power than to stop producing; in that case it is better to stop producing and pay the penalty cost related to Pns. This also gives the possibility of cutting some of the power produced for one demand and using it to satisfy another demand with a higher profit.


3.2.6 Optimization Problem Implementation

The optimization model was created in Python as an Algebraic Modeling Language (AML) object using the pyomo library. It was then solved as a MILP problem using the CPLEX solver, since some of the variables are constrained to be integers, such as the binary variable that denotes whether each unit is on or off at each time stamp. Some of the variables, such as the electricity price, could have been modelled as random variables, because they are the result of a forecasting process that is not perfectly accurate, and the approach used by many researchers for solving dispatch problems with random price variables is stochastic optimization. However, in this study the forecasted electricity and fuel prices are modelled as input parameters with known values instead of random variables, so the optimization is not a stochastic process. The uncertainty of the forecasted prices is taken into consideration by using the confidence intervals of the forecast in a sensitivity analysis, which checks how the optimization outputs change with the forecasting accuracy.
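A heavily simplified pyomo sketch of this formulation is shown below, not the model used in the study: the instance data are synthetic placeholders (2 units, 1 demand, 3 time stamps), the CIT/RH and quadratic terms of the fuel curve (equation 3.5) are dropped so that the problem stays a pure MILP, and the minimum/maximum power limits are multiplied by the commitment variable (a standard coupling assumed here, not spelled out in equation 3.3). Any installed MILP solver (CBC, GLPK or CPLEX, as in the study) can be used in the solve call.

```python
from pyomo.environ import (ConcreteModel, Var, Objective, Constraint,
                           NonNegativeReals, Binary, maximize, SolverFactory)

# Synthetic instance: 2 units, 1 demand, 3 time stamps (placeholder numbers)
units, demands, times = [1, 2], [1], [1, 2, 3]
d = {(1, t): 60.0 for t in times}                  # demand [MW]
Ep = {(1, 1): 45.0, (1, 2): 80.0, (1, 3): 20.0}    # electricity price [$/MWh]
Fp = {1: 20.0, 2: 20.0, 3: 20.0}                   # fuel price [$/MWh fuel]
Pns_cost = {(1, t): 15.0 for t in times}           # penalty for power not served
Pmin, Pmax = {1: 10.0, 2: 10.0}, {1: 40.0, 2: 40.0}
a3 = {1: 2.2, 2: 2.4}                              # linearized fuel curve slope

m = ConcreteModel()
m.P = Var(units, times, domain=NonNegativeReals)   # output power
m.u = Var(units, times, domain=Binary)             # unit commitment
m.Pns = Var(demands, times, domain=NonNegativeReals)
m.F = Var(units, times, domain=NonNegativeReals)   # fuel input

# Objective in the spirit of equation 3.2
m.obj = Objective(sense=maximize, expr=sum(
    sum(Ep[j, t] * (d[j, t] - m.Pns[j, t]) for j in demands)
    - sum(Pns_cost[j, t] * m.Pns[j, t] for j in demands)
    - sum(Fp[t] * m.F[i, t] for i in units)
    for t in times))

# Constraints 3.3-3.6 (power limits coupled with the on/off status)
m.p_lo = Constraint(units, times, rule=lambda m, i, t: m.P[i, t] >= Pmin[i] * m.u[i, t])
m.p_hi = Constraint(units, times, rule=lambda m, i, t: m.P[i, t] <= Pmax[i] * m.u[i, t])
m.fuel = Constraint(units, times, rule=lambda m, i, t: m.F[i, t] == a3[i] * m.P[i, t])
m.balance = Constraint(times, rule=lambda m, t:
                       sum(m.P[i, t] for i in units)
                       == sum(d[j, t] - m.Pns[j, t] for j in demands))

SolverFactory("cbc").solve(m)   # assumes a MILP solver is installed
for t in times:
    print(t, {i: (int(m.u[i, t].value), round(m.P[i, t].value, 1)) for i in units},
          "Pns:", round(m.Pns[1, t].value, 1))
```

With these placeholder prices the model serves the full demand when the electricity price exceeds the fuel cost, and prefers paying the power-not-served penalty at the low-price time stamp, which is the behavior discussed in the results chapter.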

3.2.7 Case studies for optimization Model

1. No specific demand
The optimization was run to dispatch the plant units without any specified demand, i.e. the plant can sell as much as it produces. The purpose of this test is to check whether the model suggests producing at maximum capacity when it is profitable and producing nothing otherwise.

2. Demand satisfaction
In this test, the optimization model is run under the constraint that at each time stamp there is a specific demand to satisfy, namely the historical demand satisfied by the plant. The idea behind this test is to demonstrate whether the model offers a better-optimized dispatch combination than the customer for the same demand, and to measure the difference in profit between them.

3. Different demands and power not served
In this case, the model is run with the assumption that it has different demands to satisfy at the same time. Each demand corresponds to a market or a PPA with a certain price at each time stamp, and each demand has a power-not-served variable (Pns), corresponding to the energy-not-served concept in the literature, which allows the plant not to satisfy part of a demand if paying the cost of not producing that part is less expensive. The demand satisfaction constraint is relaxed so that the plant can reduce the power produced as much as needed, and this reduction is penalized by introducing the power-not-served cost into the objective function. The goal of this test is to show the further profit optimization opportunity achieved by introducing the power-not-served term into the optimization model.


4. Sensitivity analysis of the forecast output on the optimization outputs
The forecast outputs of models with different forecasting horizons are fed to the optimization model in order to check the sensitivity of the optimization output to the forecast. The test scenarios are built from the confidence intervals of the forecast depicted in Figure 3.12, which shows electricity prices with 95% confidence intervals; the 95% indicates that the price falls within the interval with a probability of 95%. The best-case scenario is defined as the electricity price at its highest expected value, the upper limit of its confidence interval, while the fuel price is at its lowest expected value, the lower limit of its confidence interval. The worst-case scenario is the opposite: the electricity price at its lowest point and the fuel price at its highest. The sensitivity analysis is carried out for both the one-day-ahead and the one-week-ahead forecast models, and for each of them the normal forecast, the best-case and the worst-case scenarios are tested. The tested model is constrained to satisfy one demand with the same price in order to focus on the effects on the dispatch output of each unit.

Figure 3.12: Example of forecasting output with 95% confidence intervals.


4 Results

This chapter presents and discusses the results of this study. It is divided into two sections. The first shows the results of the forecasting model by comparing different models with respect to the algorithms used, the forecasting horizon, and the targeted market; it also discusses some phenomena, such as spikes, and their effect on model performance. The second section discusses the optimization model performance, presenting some testing scenarios and comparing the model behavior in each scenario with the customer's behavior, in order to show the margin of improvement in profit optimization that the model can provide.

4.1 Forecasting Model Case Studies

4.1.1 Comparison between ML and time series forecasting methods

The machine learning model was chosen as the base forecasting model after comparing its results with the time series models. The results of the comparison between some of the machine learning algorithms and the time series models are shown in Table 4.1 below; they are for forecasting fuel prices in market-1 over 45 weeks of 2018, given the historical fuel price data from January 2016 until the forecasting date.

The reason for the better performance of the machine learning model is that there are strong correlations with external variables that cannot be captured by time series models, and these correlations help in obtaining better forecasting accuracy. Machine learning models also employ more computational power in detecting complex relationships between the variable and its internal and external features.

Table 4.1: Comparison between time series methods and machine learning

Method          NRMSE [%]   MAPE [%]
SARIMAX         10.62       7.88
Prophet         13.15       10.25
Elastic net     9.32        7.63
Random Forest   8.88        6.97


4.1.2 Comparison between different ML forecasting methods

Linear models such as elastic net performed better with respect to speed and execution time, while more complex machine learning algorithms such as support vector machines and random forests provided better accuracy than the linear regression models. The reason for the better performance of the complex models is that the target variables have nonlinear relationships with some of the features, which cannot be captured by the linear models; the linear models are faster because they generally require fewer computations to fit. Figure 4.1 compares different ML algorithms for forecasting electricity prices over 18 consecutive weeks in market-3, in terms of RMSE, MAPE, and the residual distributions over the forecasting period shown as box plots. The figure shows that the elastic net algorithm, which is based on linear regression, has lower accuracy than the other models. Although the support vector machine scored slightly better accuracy than random forest and XGBoost in some of the models, XGBoost and random forest are faster and have consistent accuracy across the different models, which makes them more suitable for the dispatch optimizer.

4.1.3 Comparison between models for different forecasting horizons

The one-day-ahead model always achieved better accuracy than the other models, and the week-ahead models were the least accurate. Figure 4.2 shows snippets of forecasts from the three instances to give an example of the prediction accuracy of each model, and Figure 4.3 shows the forecasting residual distribution of each model; from both figures it is evident that the one-week model has a larger dispersion of forecast errors than the other models.


Figure 4.1: Accuracy comparison between different ML algorithms.


Figure 4.2: Performance comparison between three models with different intervals.

Figure 4.3: Forecasting residuals distribution.

The accuracy of the model was measured from the distribution of the errors throughout the walk-forward forecasting. However, the model performed poorly at some time stamps that contain spikes, as shown in Figure 4.4, which plots a sample of the model output in market-3 for the one-day-ahead forecast. The reason for this poor performance around spikes is that the features used in this study are not sufficient to capture spike incidences: spike occurrence depends on the real-time imbalance between supply and demand, so supply-side data are important for capturing spikes. Another factor that affects spike detection is the complexity of the forecasting system; the current system only uses regression algorithms to predict the target variable


according to its relationship with the features, but more complex deep learning models have the potential to better capture these behaviors.

Figure 4.4: Example of spikes incidences.

The spike incidences affect the distribution of the accuracy measures, as the distribution of the errors, represented as MAPE or RMSE, becomes skewed rather than normal. Figure 4.5 shows the probability density of the MAPE of the forecasting model during the spike time stamps; it clearly shows a skewness of the data to the right, indicating high errors.

Figure 4.5: Distribution of the MAPE of the one-week-ahead random forest model.


The overall comparison of the model performance for the different forecasted variables and forecasting horizons is shown in Table 4.2. It generally shows that:

• Fuel price forecasting has better accuracy than electricity price forecasting, due to the spikes and the relatively high volatility of electricity prices.

• The models with shorter forecasting horizons perform better, due to the higher correlation between closer data points.

Table 4.2: Results for different variables and forecasting horizons

Variable            Forecasting Horizon   MAPE    NRMSE
Fuel Price          one day               6.4%    7.7%
Fuel Price          two days              10.0%   13.0%
Fuel Price          one week              17.0%   20.0%
Electricity Price   one day               9.1%    11.1%
Electricity Price   two days              13.3%   14.0%
Electricity Price   one week              18.0%   20.3%


4.2 Optimization Model Results

The results of the optimization model were created using historical data from plant A in market-3, which contains one set of combined cycle units (two gas turbines, GT01 and GT02, and one steam turbine, ST). The model was run under different constraints in order to assess its flexibility in providing an optimized output for different operating configurations. It should be mentioned that all the graphs in this section were normalized in order to protect confidential information of the customers and the company. The optimization model case studies are as follows:

4.2.1 No specific demand

Figure 4.6 shows the results of this test, depicting the model dispatch output and the profit at each time stamp.

Figure 4.6: Dispatch output and the corresponding profit and efficiency of the plant.

The plant first dispatches all units at their maximum capacity. When the profit decreases and approaches zero, the plant reduces the power produced starting with the less efficient unit: it shuts down GT01 because its efficiency is slightly lower than that of GT02, as can be seen in the lower part of Figure 4.6. When the profit reaches zero the plant shuts down all units, and when the profit rises again it dispatches all its units at once. This is the anticipated behavior for profit optimization, as the plant:

• Works at maximum capacity when it is profitable, and at zero when it is not.


• Prioritizes shutting down the less efficient units.

4.2.2 Demand satisfaction

Figure 4.7 shows the comparison between the optimized dispatch and the customer's dispatch; it also shows the fuel saved by the optimization model in comparison with the customer's fuel consumption. From the figure it can be seen that the optimization model tends to reduce the load of GT01, the less efficient unit, and instead runs GT02 at its maximum available capacity. This change in dispatch reduces the fuel consumed by the plant, as shown in the lower part of the figure: the fuel consumption of the optimized dispatch is always lower than that of the customer's dispatch.
The optimized output of the steam turbine is within the same range as the customer's dispatch. This is due to the assumption in this model that the customer has been operating the steam turbine in the most efficient way, so the optimization model dispatches the gas turbines in the best available combination and the steam turbine output is modelled as a function of the sum of the exhaust heat of the optimized gas turbines.


Figure 4.7: Optimized dispatch vs customer’s dispatch.


4.2.3 Different demands and power not served

Figure 4.8 compares the dispatch output of two scenarios, one with the Pns term and one without it. The total demand in this test is decomposed into two separate demands, each with its own price and Pns cost.

Figure 4.8: Comparing dispatch with and without Pns.

The figure shows that the dispatch with the Pns term obtains more profit. The reason is that, at some time stamps, it gives the plant the chance to reduce the power produced and pay the cost of not producing that part of the demand, when this is less expensive than the operating cost. The figure also shows the electricity and fuel prices at each time stamp, which shows that the Pns events occur when the electricity price is lower, i.e. when the profit is comparatively small. Although the dispatch output for the second demand is the same in both scenarios, it shows a profit increase at the same time stamps at which demand 1 incurs some power not served. This profit increase means that the overall efficiency of the plant is higher at those time stamps, so the reduction and the efficiency increase affect all plant outputs: the fuel consumption is lower for all demands, and accordingly the profit increases.


4.2.4 Sensitivity analysis of the optimization output to the forecast

Table 4.3 shows the results of the sensitivity analysis explained in chapter 3. The main outcomes are:

• The total cost did not vary much with the change in prices in any of the scenarios, which means that the optimization output is not sensitive to changes in fuel prices. On the other hand, the total revenue changed remarkably with the changes in prices, which means that the electricity price forecast is crucial to the optimization model.

• The distribution of the unit commitment and production between the individual units of the plant was not affected by the change in forecasted prices. This is because maintenance costs and constraints were not included in the scope of this study; including those costs and constraints for each unit would have a larger effect on the distribution of the units' dispatch.

• The total revenue obtained in the best-case scenario was as close to the actual revenue as the revenue obtained with the normal forecast, which means that the upper limit of the electricity price forecast was closer to the actual price than the forecasted price itself. This happened because of the high spikes in the test period and the associated high forecasting errors, which pushed the actual price closer to the confidence interval limits. These results make it clear that confidence intervals are very important measures to include in the forecasting output, especially as a remedy for spikes in the forecast.

Table 4.3: Sensitivity analysis of the optimization output due to forecasting change


5 Conclusion

5.1 Study results

• This study presented knowledge in the areas of forecasting using machine learning and time series methods, and in mathematical optimization using mixed integer linear programming. It also provided experience in applying these principles to challenges in the energy industry.

• In this study, different methods were investigated and compared for forecasting electricity and fuel prices; the machine learning models outperformed the time series forecasting methods. The reason is that time series models only use internal features of the series, while machine learning methods can also exploit external features.

• Furthermore, more complex machine learning models such as support vector machines and ensemble models outperformed linear regression-based models. The rationale is that the more complex algorithms can detect nonlinear relationships that cannot be captured by linear regression-based models.

• The forecasting model was used to predict over different horizons in order to check the output accuracy of each. Forecasting one day ahead gave more accurate results than one week ahead, and slightly more accurate results than two days ahead. Forecasting over different horizons nevertheless proved important for different purposes in plant operation optimization: forecasting over short horizons gives a more accurate optimized output to be used for the actual dispatch of the plant units, while forecasting over long horizons is helpful for cash flow management, risk computations and financial projections.

• The data used to build and validate this model were acquired from three different markets; the results showed that the amount and type of data available in each market affect the accuracy of the forecast.

• The optimization model was tested without a limited demand to satisfy. It showed efficiency-oriented behavior: it dispatched the plant at maximum capacity when it was profitable and shut it down when it was not, and it committed the more efficient units before the less efficient ones and shut down the less efficient units first.


• The model was compared with the customer's dispatch using the same data and the same demand to satisfy. The dispatch produced by the model proved to be better optimized than the customer's, as its fuel consumption was lower.

• The model was tested on satisfying several demands with different prices and with a cost of energy not served for each demand. It showed flexible and profit-oriented behavior: it satisfied each demand when it was profitable to do so, and reduced the production for some demands when that was more optimal from a profit viewpoint.

• The sensitivity analysis of the optimization with respect to the forecast shows that when the demand to be satisfied is fixed, the price forecast affects the financial outcomes more than it affects the actual dispatch of the units, i.e. the units are dispatched in the most efficient way regardless of the forecast.

• The sensitivity analysis also shed light on the importance of including confidence intervals when forecasting prices, in order to increase the accuracy of the forecast and thereby the certainty of the optimization.
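As a minimal illustration of the internal- versus external-feature distinction referred to above, the following sketch compares a lag-only linear model (standing in for the time series methods) with a gradient boosting model that also receives exogenous features (standing in for the ensemble models). The data are synthetic and the feature set is an assumption for illustration only, not the data or features used in this study.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng   = np.random.default_rng(1)
n     = 24 * 120                                                  # 120 synthetic days, hourly
hour  = np.arange(n) % 24
load  = 60 + 20 * np.sin(2 * np.pi * hour / 24) + rng.normal(0, 3, n)    # system load
gas   = 20 + np.cumsum(rng.normal(0, 0.05, n))                            # fuel price path
price = 0.8 * load + 1.2 * gas + rng.normal(0, 4, n)                      # synthetic day-ahead price

lags = np.column_stack([np.roll(price, k) for k in (1, 2, 24)])   # internal features (lagged prices)
exog = np.column_stack([hour, load, gas])                         # external (exogenous) features
X_int, X_ext, y = lags[24:], np.hstack([lags, exog])[24:], price[24:]

split = len(y) - 24 * 7                                           # hold out the last week as test set
ar    = LinearRegression().fit(X_int[:split], y[:split])          # lag-only baseline
gbr   = GradientBoostingRegressor().fit(X_ext[:split], y[:split]) # lags plus exogenous drivers

print("lag-only MAE:   ", mean_absolute_error(y[split:], ar.predict(X_int[split:])))
print("lags + exog MAE:", mean_absolute_error(y[split:], gbr.predict(X_ext[split:])))

On this synthetic series the exogenous model achieves the lower error because the price is generated directly from load and fuel price; with real market data the size of the gap depends on how informative the available external features are, which is consistent with the market dependence noted above.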

5.2 Future work

• The accuracy of the forecasting model was strongly affected by the presence of price spikes, and the prediction of spikes was not effective with the features used in this study; studies have shown that electricity price spikes depend on the balance between instantaneous supply and demand and on features related to this balance [50]. Future work could therefore include more features that help in spike detection and classification. Employing more powerful tools to forecast spikes would also be valuable: deep learning methods such as recurrent neural networks have proved to have more potential for detecting time series patterns and seasonalities, and these methods could be investigated to address spike prediction in electricity prices.

• The differences between electricity markets in different countries affect the accuracy of the fitted model, depending on the features available in each market and their behavior. If an optimization decision-support tool is to be applicable to different markets, it has to be configurable through a set of parameters that describe the market characteristics, so that each market can be represented as a set of parameters. Working on specifying these parameters and making them as comprehensive as possible could be useful for designing universal forecasting frameworks for electricity and fuel prices.

• The electricity prices forecasted in this study are day-ahead market prices. The features used in this study are not suitable for forecasting intraday and ancillary services prices, so a possible continuation of this work could be to find more relevant features for forecasting intraday market prices. This would lead to better use of the optimization model at hand, which optimizes the profit of power plants bidding on different markets.


Literature

[1] International Energy Agency, Lessons from Liberalised Electricity Markets. 2005, p. 224.

[2] E. León and M. Martín, “Optimal production of power in a combined cycle from manure based biogas”, Energy Conversion and Management, vol. 114, pp. 89–99, 2016.

[3] E. H. Abed, N. S. Namachchivaya, T. J. Overbye, M. Pai, P. W. Sauer, and A. Sussman, “Data-driven power system operations”, in International Conference on Computational Science, Springer, 2006, pp. 448–455.

[4] E. Bahilo Rodriguez, “Unit commitment of gas turbines using machine learning and MILP programming”, Master's thesis, Högskolan i Gävle, unpublished, 2019.

[5] S. Rosso, “Power plant operation optimization - economic dispatch of combined cycle power plant”, Master's thesis, KTH Royal Institute of Technology, unpublished, 2019.

[6] US Congress, “Energy Policy Act of 2005”, 2005.

[7] B. H. Chowdhury and S. Rahman, “A review of recent advances in economic dispatch”, IEEE Transactions on Power Systems, vol. 5, no. 4, pp. 1248–1259, 1990.

[8] J. Carpent, “A link between short term scheduling and dispatching: Separability of dynamic dispatch”, in Proceedings of the Eighth Power Systems Computation Conference: Helsinki, 19-24 August 1984, Butterworth-Heinemann, 2014, p. 391.

[9] K. Chandram, N. Subrahmanyam, and M. Sydulu, “New approach with Muller method for profit based unit commitment”, in 2008 IEEE Power and Energy Society General Meeting - Conversion and Delivery of Electrical Energy in the 21st Century, IEEE, 2008, pp. 1–8.

[10] H. Y. Yamin, “Review on methods of generation scheduling in electric power systems”, Electric Power Systems Research, vol. 69, no. 2-3, pp. 227–248, 2004.

[11] A. K. David and F. Wen, “Strategic bidding in competitive electricity markets: A literature survey”, in 2000 Power Engineering Society Summer Meeting (Cat. No. 00CH37134), IEEE, vol. 4, 2000, pp. 2168–2173.

[12] G. Shrestha, S. Kai, and L. Goel, “An efficient stochastic self-scheduling technique for power producers in the deregulated power market”, Electric Power Systems Research, vol. 71, no. 1, pp. 91–98, 2004.


[13] A. J. Conejo, F. J. Nogales, and J. M. Arroyo, “Price-taker bidding strategy under price uncertainty”, IEEE Transactions on Power Systems, vol. 17, no. 4, pp. 1081–1088, 2002.

[14] H. Song, C.-C. Liu, J. Lawarrée, and R. W. Dahlgren, “Optimal electricity supply bidding by Markov decision process”, IEEE Transactions on Power Systems, vol. 15, no. 2, pp. 618–624, 2000.

[15] F.-J. Heredia, M. J. Rider, and C. Corchero, “A stochastic programming model for the optimal electricity market bid problem with bilateral contracts for thermal and combined cycle units”, Annals of Operations Research, vol. 193, no. 1, pp. 107–127, 2012.

[16] R. Weron, Modeling and forecasting electricity loads and prices: A statistical approach. John Wiley & Sons, 2007, vol. 403.

[17] R. Weron, “Electricity price forecasting: A review of the state-of-the-art with a look into the future”, International Journal of Forecasting, vol. 30, no. 4, pp. 1030–1081, 2014.

[18] R. H. Shumway and D. S. Stoffer, Time series analysis and its applications: With R examples. Springer, 2017.

[19] S. Bordignon, D. W. Bunn, F. Lisi, and F. Nan, “Combining day-ahead forecasts for British electricity prices”, Energy Economics, vol. 35, pp. 88–103, 2013.

[20] J. H. Zhao, Z. Y. Dong, Z. Xu, and K. P. Wong, “A statistical approach for interval forecasting of the electricity price”, IEEE Transactions on Power Systems, vol. 23, no. 2, pp. 267–276, 2008.

[21] H. Mori and A. Awata, “Data mining of electricity price forecasting with regression tree and normalized radial basis function network”, in 2007 IEEE International Conference on Systems, Man and Cybernetics, IEEE, 2007, pp. 3743–3748.

[22] S. Anbazhagan and N. Kumarappan, “Day-ahead deregulated electricity market price forecasting using recurrent neural network”, IEEE Systems Journal, vol. 7, no. 4, pp. 866–872, 2012.

[23] L. L. Grigsby, Electric power generation, transmission, and distribution. CRC Press, 2007.

[24] F. P. Sioshansi and W. Pfaffenberger, Electricity market reform: An international perspective. Elsevier, 2006.

[25] C. R. Nelson, “Applied time series analysis for managerial forecasting”, Holden-Day, San Francisco, Tech. Rep., 1973.

[26] C. M. Bishop, Pattern recognition and machine learning. Springer, 2006.


[27] G. E. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time series analysis: Forecasting and control. John Wiley & Sons, 2015.

[28] R. J. Hyndman and G. Athanasopoulos, Forecasting: Principles and practice. OTexts, 2018.

[29] J. Durbin, “The fitting of time-series models”, Revue de l'Institut International de Statistique, pp. 233–244, 1960.

[30] R. B. Cleveland, W. S. Cleveland, J. E. McRae, and I. Terpenning, “STL: A seasonal-trend decomposition”, Journal of Official Statistics, vol. 6, no. 1, pp. 3–73, 1990.

[31] S. J. Taylor and B. Letham, “Forecasting at scale”, The American Statistician, vol. 72, no. 1, pp. 37–45, 2018.

[32] T. M. Mitchell, Machine learning, 1997.

[33] F. Hayashi, Econometrics. Princeton University Press, 2000, sec. 1, pp. 60–69.

[34] R. Tibshirani, “Regression shrinkage and selection via the lasso”, Journal of the Royal Statistical Society: Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996.

[35] A. E. Hoerl and R. W. Kennard, “Ridge regression: Biased estimation for nonorthogonal problems”, Technometrics, vol. 12, no. 1, pp. 55–67, 1970.

[36] H. Zou and T. Hastie, “Regularization and variable selection via the elastic net”, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 2, pp. 301–320, 2005.

[37] T. Hastie, R. Tibshirani, J. Friedman, and J. Franklin, “The elements of statistical learning: Data mining, inference and prediction”, The Mathematical Intelligencer, vol. 27, no. 2, pp. 83–85, 2005.

[38] A. J. Smola and B. Schölkopf, “A tutorial on support vector regression”, Statistics and Computing, vol. 14, no. 3, pp. 199–222, 2004.

[39] Mathworks, “Train regression trees using regression learner”, retrieved from www.mathworks.com, 2019.

[40] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system”, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2016, pp. 785–794.

[41] L. Breiman, “Arcing the edge”, Technical Report 486, Statistics Department, University of California at . . ., 1997.


[42] R. E. Bellman, Adaptive control processes: A guided tour. Princeton University Press, 2015, vol. 2045.

[43] M. D. Intriligator, Mathematical optimization and economic theory. SIAM, 2002.

[44] G. B. Dantzig, Linear programming and extensions. Princeton University Press, 1998.

[45] A. Schrijver, Theory of linear and integer programming. John Wiley & Sons, 1998.

[46] J. E. Beasley, “OR-Library: Distributing test problems by electronic mail”, Journal of the Operational Research Society, vol. 41, no. 11, pp. 1069–1072, 1990.

[47] J. C. Spall, Introduction to stochastic search and optimization: Estimation, simulation, and control. John Wiley & Sons, 2005, vol. 65.

[48] G. Van Rossum and F. L. Drake Jr, Python reference manual. Centrum voor Wiskunde en Informatica, Amsterdam, 1995.

[49] S. Varma and R. Simon, “Bias in error estimation when using cross-validation for model selection”, BMC Bioinformatics, vol. 7, no. 1, p. 91, 2006.

[50] X. Lu, Z. Y. Dong, and X. Li, “Electricity market price spike forecast with data mining techniques”, Electric Power Systems Research, vol. 73, no. 1, pp. 19–29, 2005.