
Medium-Term Forecasting for Rain Amounts and Groundwater Production

(Dear El-Balah City as a Case Study)


By

Ihsan Abd Al-Majeed Solaiman Abu Amra

Supervised by

Dr. Ashraf Y. A. Maghari

A thesis submitted in partial fulfillment

of the requirements for the degree of

Master of Information Technology

September 2018


The Islamic University of Gaza

Deanship of Research and Postgraduate Affairs

Faculty of Information Technology

Master of Information Technology


Declaration

I, the undersigned, the author of the thesis entitled:

Medium-Term Forecasting for Rain Amounts and Groundwater Production

(Dear El-Balah City as A Case Study)

declare that the contents of this thesis are the product of my own work, except where otherwise referenced, and that neither the thesis as a whole nor any part of it has previously been submitted for any degree or academic or research title at any other educational or research institution.

I understand the nature of plagiarism, and I am aware of the University's policy on this. The work provided in this thesis, unless otherwise referenced, is the researcher's own work, and has not been submitted by others elsewhere for any other degree or qualification.

Student's name: Ihsan Abd Al-Majeed Solaiman Abu Amra

Signature:

Date:

Examination Committee Result

Abstract

Forecasting is a data mining technique that draws on numerous sources of time-series data to derive value from historical records, helping decision-makers plan effectively.

Groundwater is the main water source in Gaza, and it is declining as the population grows. The lack of rainfall has created a real water crisis. Moreover, increasing demand for groundwater combined with reduced rainfall, which is the main source of groundwater recharge, will lead to the depletion of groundwater wells. As a result, seawater mixes with groundwater and raises its salinity, especially in areas of Gaza where wells are close to the Mediterranean Sea. Digging wells without governmental control increases salinity further.

Therefore, it is necessary to focus on the relationship between rainfall, which feeds the groundwater reservoir and reduces its salinity, and groundwater production.

In this thesis, we applied forecasting techniques to two real datasets for Dear El-Balah City in the Gaza Strip: groundwater production amounts obtained from the Coastal Municipalities Water Utility (CMWU) and rain amounts obtained from the Ministry of Agriculture. The following forecasting algorithms were used: Auto-Regressive Integrated Moving Average (ARIMA); hybrid models combining ARIMA with a Neural Network (NN), with Exponential Smoothing (ETS), and with the State Space Model with Box-Cox Transformation, ARMA Errors, Trend and Seasonal Components (TBATS); and ETS on its own. According to the Mean Absolute Percentage Error (MAPE) measure, the best-performing algorithm on the rainfall data was the hybrid (ARIMA+NN), with MAPE = 21%. On the wells' production data, ARIMA performed best, with MAPE = 4.9%.

The results show that over the next five years, rainfall amounts and groundwater production will decrease by 8.4% and 1.05%, respectively, compared with the period 2013 to 2017. Based on these results, salinity is expected to increase in the coming years, making the groundwater unusable.

Keywords: Forecasting, Groundwater, Time series, ARIMA, Hybrid ARIMA, ETS, Rainfall.

Abstract (Arabic)

Forecasting is one of the data mining techniques. It draws on the many sources of time-series data to derive value from historical data and helps decision-makers plan effectively.

Groundwater is the main source of water in Gaza, and it is declining because of population growth. A real water crisis has arisen because of the lack of rainfall. Moreover, the increasing demand for groundwater and the decrease in rainfall, which is the main source of groundwater recharge, will lead to the depletion of groundwater wells. As a result, the mixing of seawater with groundwater increases salinity, especially in areas of Gaza where wells are close to the Mediterranean Sea. Digging wells without governmental control further increases salinity levels.

Therefore, it is necessary to focus on the relationship between rainfall, which feeds the groundwater reservoir and reduces its salinity, and groundwater production.

In this thesis, we applied forecasting techniques to two real datasets for Dear El-Balah City in the Gaza Strip: groundwater amounts obtained from the Coastal Municipalities Water Utility and rainfall amounts obtained from the Ministry of Agriculture. The following forecasting algorithms were used: Auto-Regressive Integrated Moving Average (ARIMA); hybrid ARIMA combined with NN, ETS, and TBATS; and ETS. According to the Mean Absolute Percentage Error (MAPE) measure, the best-performing algorithm on the rainfall data was (ARIMA + NN), with MAPE = 21%, whereas ARIMA was the best algorithm on the wells' production data, with MAPE = 4.9%.

The results showed that after five years, rainfall amounts and groundwater production will decrease by 8.4% and 1.05%, respectively, compared with the period from 2013 to 2017. Based on these results, salinity is expected to increase in the coming years, making the groundwater unusable.

Keywords: forecasting, groundwater, time series, ARIMA, hybrid ARIMA, ETS, rainfall.

Dedication

To my parents, who gave me everything they had so that I could achieve my dreams in life.

To my brothers and sisters, who shared with me the burdens of life.

To my dear husband and my children, my constant source of support.

To the Islamic University, which embraced us, helped us graduate, and enabled us to benefit others through the science of information technology.

To my second home and my students at Palestine Technical College.

To all my friends and those who accompanied me during my studies at the University.

To the Coastal Municipalities Water Utility and the General Directorate of Soil and Irrigation under the Ministry of Agriculture, which provided me with the necessary data and information.

To all of them, I dedicate this work.

Acknowledgment

Thanks to Almighty Allah for giving me the strength and ability to complete this MA research.

I would like to express my gratitude to my supervisor, Dr. Ashraf Maghari, for all the guidance and constructive observations he gave me throughout the research period. I am proud to be one of his students and to have had the opportunity to work under his supervision.

I also thank my parents for their great effort.

Special thanks to my dear husband for his boundless support in completing this research.

I would like to thank my darling children for being patient and supporting me during my studies.

Many sincere thanks to my sisters and brothers for supporting me throughout this experience.

Table of Contents

Declaration
Abstract
Abstract (Arabic)
Dedication
Acknowledgment
Table of Contents
List of Abbreviations
List of Figures
List of Tables
Chapter 1 Introduction
1.1. Background and Context
1.2. Statement of the problem
1.3. Objectives
1.3.1. Main objective
1.3.2. Specific objective
1.4. Importance of the project
1.5. Scope and limitations of the project
1.5.1. Main Scope
1.5.2. Main limitations
1.6. Methodology
1.7. Thesis Outlines
Chapter 2 Literature Review
2.1. Coastal Municipalities Water Utility (CMWU)
2.2. The Ministry of Agriculture (MoA)
2.3. Data Mining Overview
2.4. Forecasting Accuracy measures
2.5. Time Series Analysis
2.7. Forecasting techniques
2.7.1. Auto-Regressive Integrated Moving Average (ARIMA)
2.7.2. Hybrid ARIMA
2.7.2.1 Neural Networks Model (NNs)
2.7.2.1.1 Feed-forward Neural Networks
2.7.2.1.2 Recurrent Networks (RN)
2.7.2.1.3 Stochastic Neural Networks (SNN)
2.7.2.1.4 Modular Neural Networks
2.7.2.2 Exponential Smoothing: the state space models
2.7.2.3 TBATS
Chapter 3 Related Work
3.1 Groundwater and Rainfall forecasting
3.2 Other domains in forecasting research
3.3 Related Work Discussion
3.4 Conclusion
Chapter 4 Methodology and Model Development
4.1. Methodology steps
4.2. Data Collection
4.3. Data Preprocessing
4.4. Selected Models
4.5. Implementation
4.5.1. Tools
4.5.2. Steps of applying models
4.6. Summary
Chapter 5 Experimental Results and Discussion
5.1. Experiment sets
5.2. Data set
5.3. Evaluation of forecasting algorithms
5.3.1. Evaluating algorithms over Amount attribute in rain data set
5.3.1.1. ARIMA Evaluation
5.3.1.2. Hybrid ARIMA (ARIMA with Neural Network (NN)) Evaluation
5.3.1.3. Hybrid ARIMA (ARIMA with ETS) Evaluation
5.3.1.4. Hybrid ARIMA (ARIMA with TBATS) Evaluation
5.3.1.5. Exponential Smoothing ETS Evaluation
5.3.2. Comparing the Methods performance of 'rain amount'
5.3.3. Algorithms evaluation over groundwater production amounts
5.3.3.1. ARIMA Evaluation
5.3.3.2. Hybrid ARIMA (ARIMA+NN) Evaluation
5.3.3.3. Hybrid ARIMA (ARIMA+ETS) Evaluation
5.3.3.4. Hybrid ARIMA (ARIMA+TBATS) Evaluation
5.3.3.5. Exponential Smoothing ETS Evaluation
5.3.4. Comparing Methods accuracy over 'Wells' production'
5.4. Forecasting the rain amounts and groundwater production in Dear El-Balah
5.4.1. Deviations of rain amounts and wells' production
5.4.1.1. Deviations of rain and wells' production in comparison with the 2017 amounts
5.4.1.2. Deviations of rain and wells' production compared with the period (2013-2017)
5.5. Discussion
5.6. Summary
Chapter 6 Conclusion and Future Work
6.1. Summary
6.2. Conclusion
6.3. Future works
Bibliography
Appendix
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Appendix G
Appendix H
Appendix I
Appendix J

List of Abbreviations

AFD French Development Agency
ANN Artificial Neural Network
ARIMA Auto-Regressive Integrated Moving Average
CMWU Coastal Municipalities Water Utility
ETS Exponential Smoothing State Space Model
Hybrid ARIMA Auto-Regressive Integrated Moving Average combined with ANN, ETS, or TBATS
KDD Knowledge Discovery in Databases
KFW German Development Bank
MAD Mean Absolute Deviation
MAE Mean Absolute Error
MAPE Mean Absolute Percentage Error
MOA Ministry of Agriculture
MSE Mean Squared Error
SSA Singular Spectrum Analysis
SVM Support Vector Machine
TBATS State Space Model with Box-Cox Transformation, ARMA Errors, Trend and Seasonal Components
TSFF Time Series Forecasting Framework
UNRWA United Nations Relief and Works Agency

List of Figures

Figure (1–1): Data mining Techniques (Deepashri & Kamath, 2017)
Figure (2–1): Steps of Knowledge Discovery (Alasdi and Bhaya, 2017)
Figure (2–2): Time series components (Australian Transport Assessment and Planning, 2016)
Figure (2–3): The classifications of neural network architecture (Sibanda & Pretorius, 2012)
Figure (3–1): Summary of the Most Related Works to this Work
Figure (4–1): Steps of proposed forecasting methodology
Figure (4–2): Monthly wells production data of groundwater
Figure (4–3): Annual rainfall amounts
Figure (4–4): Splitting original datasets into training and testing
Figure (5–1): Applying ARIMA Process by RapidMiner tool
Figure (5–2): ARIMA Evaluation for rain amounts (Actual vs Predicted)
Figure (5–3): Evaluation of rain data using (ARIMA with NN) by R code
Figure (5–4): (ARIMA+NN) Evaluation for rain amount (Actual vs Predicted)
Figure (5–5): Evaluation of rain data using (ARIMA+ETS) R code
Figure (5–6): (ARIMA+ETS) evaluation for rain amount (Actual vs Predicted)
Figure (5–7): Evaluation of rain data using (ARIMA+TBATS) R code
Figure (5–8): (ARIMA+TBATS) evaluation for rain amount (Actual vs Predicted)
Figure (5–9): Evaluation of ETS for rain amounts (Actual vs Predicted)
Figure (5–10): MAPE percentages for forecasting algorithms
Figure (5–11): ARIMA evaluation for wells production (Actual vs Predicted)
Figure (5–12): Hybrid ARIMA (ARIMA+NN) Evaluation for wells production (Actual vs Predicted)
Figure (5–13): (ARIMA+ETS) Evaluation for wells Production (Actual vs Predicted)
Figure (5–14): (ARIMA+TBATS) evaluation for wells Production (Actual vs Predicted)
Figure (5–15): ETS evaluation for wells production (Actual vs Predicted)
Figure (5–16): MAPE percentages for forecasting algorithms for monthly amounts of wells' production
Figure (5–17): Data representation of groundwater (before re-dividing)
Figure (5–18): Semi-annual data representation of groundwater (after re-dividing)
Figure (5–19): MAPE percentages for forecasting algorithms for semi-annual amounts of wells' production
Figure (5–20): Five years forecasting Rain amounts for Dear El-Balah city using (ARIMA+NN)
Figure (5–21): Five years groundwater forecasting for semi-annual wells' production amounts using (ARIMA)
Figure (5–22): The Relationship between rain amounts and groundwater deviation in comparison with 2017
Figure (5–23): Rain amounts deviation (Actual + forecasted) from 2013 to 2022
Figure (5–24): Wells' production data (Actual + forecasted) from 2013 to 2022
Figure (5–25): Groundwater production data (Actual + forecasted) from 2008 to 2022
Figure (5–26): Rain amounts data (Actual + forecasted) from 1985 to 2022

List of Tables

Table (4-1): Sample of groundwater dataset before preprocessing
Table (4-2): Sample of Rainfall data before preprocessing
Table (4-3): Sample of combining wells production in new data set
Table (4-4): Sample of combining rain amounts of Dear El-Balah city in new data set
Table (4-5): Sample of groundwater time series data set
Table (4-6): Sample of Rain time series data set
Table (4-7): Filling missing value in JUN-13
Table (4-8): Training and testing data periods of datasets
Table (5-1): Sample of the training set of rain data from 1985 to 2010
Table (5-2): Sample of the testing set of rain data from 2011 to 2017
Table (5-3): Sample of the training set of wells data from Jan 2007 to Dec 2017
Table (5-4): Sample of testing set of wells data from Jan 2016 to Dec 2017
Table (5-5): Comparing Methods MAPE over 'Rain amount'
Table (5-6): Algorithms performance (accuracy) over wells Production
Table (5-7): Dividing the data into semi-annual data
Table (5-8): Algorithms MAPE over wells' production
Table (5-9): Deviation for groundwater and rain amounts comparing with 2017
Table (5-10): Deviation of rain amounts and groundwater amounts compared to last 5 years


Chapter 1

Introduction

1.1. Background and Context

Water is one of the scarcest resources in Palestine. An important reason for this is the restrictions imposed by Israel on Palestinian institutions regarding the extraction and use of water. Furthermore, the economic situation in the Gaza Strip plays a vital role: the shortage of financial resources for investment in the water sector is accompanied by high rates of poverty (Authority, 2018).

Water resources in Palestine include runoff, surface water, and groundwater. The Palestinians' share of these resources is around 11%, while the remaining 89% is exploited by Israel (Barakat & Heackock, 2013).

The Gaza Strip is one of the most densely populated regions on the planet. According to the latest population statistics, the Gaza Strip had 2,090,405 residents at the beginning of April 2018, including 294,531 in the central governorate (Ministry of Interior and National Security, 2018). In the 1970s, Israel physically controlled almost all available sources of fresh water from the Jordan River and its catchment (including the Golan Heights annexed from Syria), as well as the underground aquifer water (Al-Shalalfeh, Napier, & Scandrett, 2018).

According to the latest reports and national plans of the Palestinian Water Authority, the water situation in Gaza is critical (Aiash & Mogheir, 2017). The only source of water is the groundwater aquifer, and its water level is falling as demand for many different uses rises. The Coastal Aquifer receives a yearly recharge of 55 to 60 MCM (million cubic metres), mainly from rainfall, while yearly extraction from the aquifer is about 200 MCM. The groundwater level is falling because of these unsustainably high extraction rates, together with the gradual intrusion of seawater and the upconing of underlying saline groundwater. These conditions made the aquifer unusable in 2016, and the damage will be irreversible by 2022.
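The imbalance described above can be made concrete with simple arithmetic. The sketch below is only an illustration using the approximate figures quoted in this section (55 to 60 MCM/yr of recharge and about 200 MCM/yr of extraction). It deliberately ignores other inflows and outflows, such as lateral flow and return flow, which is a simplifying assumption; the function name `annual_deficit` is hypothetical.

```python
# Illustrative yearly water balance of the Gaza coastal aquifer using the
# approximate figures quoted above (MCM = million cubic metres).
# Simplifying assumption: only rainfall recharge and pumping are considered.

RECHARGE_MCM_PER_YEAR = (55.0, 60.0)  # low and high recharge estimates
EXTRACTION_MCM_PER_YEAR = 200.0       # approximate yearly extraction


def annual_deficit(recharge: float, extraction: float) -> float:
    """Return extraction minus recharge; a positive value means over-pumping."""
    return extraction - recharge


for recharge in RECHARGE_MCM_PER_YEAR:
    deficit = annual_deficit(recharge, EXTRACTION_MCM_PER_YEAR)
    ratio = EXTRACTION_MCM_PER_YEAR / recharge
    print(f"recharge {recharge:.0f} MCM: deficit {deficit:.0f} MCM, "
          f"extraction is {ratio:.1f}x recharge")
```

Under these assumptions the aquifer loses roughly 140 to 145 MCM more than it receives each year, and extraction runs at about 3.3 to 3.6 times the recharge.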

The coastal aquifer of Gaza is the sole water source of the Gaza Strip. It covers an area of 360 km² and receives a total recharge of nearly 60 MCM/yr. Because of over-pumping (Daibes-Murad, 2004), the Gaza aquifer is threatened by the intrusion of seawater and saline groundwater.

Rainwater is the only renewable source that feeds the Coastal Aquifer in the Gaza Strip, and this matters all the more because official institutions such as municipalities rely increasingly on groundwater. It is therefore very important to maintain water levels in the wells, taking the amount of rainfall into account, in order to keep salinity and other minerals in groundwater under control (Aiash & Mogheir, 2017).

Increasing groundwater consumption and decreasing rainfall will lead to the depletion of groundwater wells and thus to higher salinity, as seawater enters the aquifer and mixes with groundwater, especially in areas where wells are close to the Mediterranean Sea (Mushtaha & Al-Louh, 2013).

The Gaza Strip is one of the most water-scarce areas in the region. Groundwater is the main water resource for Palestinians in the Gaza Strip, providing more than 90% of all water supplies (Report & Water, 2013). The growing deficit in the water budget is reducing both the quantity and the quality of groundwater.

Data mining is the extraction and processing of the most important and relevant information from large data sets to serve particular objectives. Data mining produces either a descriptive model or a predictive model. A descriptive model summarizes information and finds patterns in the data and relationships between its attributes; it includes tasks such as sequence discovery, association rules, clustering, and summarization. A predictive model predicts data values taken from various datasets; it includes classification, prediction, regression, and time series analysis, as shown in Figure (1–1).

Figure (1–1): Data mining Techniques (Deepashri & Kamath, 2017)

The forecasting task involves understanding and recognizing the past behavior of an attribute in order to predict its future behavior or pattern. It is one of the oldest and best-known techniques of predictive analytics, and it produces forecasts and predictions from historical data. Forecasting is a well-known technique of time series analysis, used to predict future trends in retail sales, water demand, weather, economic indicators, stock markets, and many other applications (Aggarwal, 2015).
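To illustrate what predicting from historical data means in its simplest form, the sketch below implements a seasonal naive forecaster, which repeats the values observed in the last full season. This is not one of the algorithms evaluated in this thesis (ARIMA, hybrid ARIMA, ETS, and TBATS); it is an assumed baseline shown for illustration only, and the data values are hypothetical.

```python
from typing import List


def seasonal_naive(history: List[float], season_length: int, horizon: int) -> List[float]:
    """Forecast `horizon` steps ahead by repeating the last observed season.

    Each future step h takes the value seen at the same position in the
    most recent full season of the history.
    """
    if season_length <= 0 or len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]


# Hypothetical quarterly values covering two years (season length = 4).
history = [10.0, 20.0, 30.0, 40.0, 12.0, 22.0, 32.0, 42.0]
print(seasonal_naive(history, season_length=4, horizon=6))
# [12.0, 22.0, 32.0, 42.0, 12.0, 22.0]
```

Simple baselines like this are commonly used as a reference point: a more elaborate model is worth its extra complexity only if it beats them on held-out data.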

A reliable forecast of groundwater consumption is necessary for properly planning groundwater demand and managing water resources. Depending on the forecast horizon, forecasts can be classified, as suggested by Jones (2011), into: (1) long-term forecasts, covering decades; (2) medium-term forecasts, covering several years up to a decade; and (3) short-term forecasts, covering up to one year.
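The three horizon classes can be expressed as a small helper. The numeric cut-offs (one year and ten years) are an assumption chosen to match the wording above rather than exact boundaries quoted from Jones (2011), and the function name `horizon_class` is hypothetical. Under this reading, the five-year forecasts produced in this thesis fall in the medium-term class, consistent with its title.

```python
def horizon_class(horizon_years: float) -> str:
    """Classify a forecast horizon into the three terms described above.

    Assumed cut-offs: up to 1 year is short-term, more than 1 and up to
    10 years is medium-term, and more than 10 years is long-term.
    """
    if horizon_years <= 0:
        raise ValueError("horizon must be positive")
    if horizon_years <= 1:
        return "short-term"
    if horizon_years <= 10:
        return "medium-term"
    return "long-term"


print(horizon_class(5))  # medium-term
```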

In summary, the population of Gaza is growing over time, while water resources are limited by low annual rainfall and by Israeli policies toward the Palestinian Authority concerning underground water resources.

Handling these factors by applying forecasting to future water service projects is therefore a very important issue for municipal mayors and decision-makers. Forecasts allow groundwater consumption to be managed by planning it against groundwater salinity and the anticipated rainfall, which reduces salinity, and by coordinating the digging of wells by citizens.

CMWU holds time series data on the groundwater wells in Dear El-Balah, including their monthly production over the years. The Ministry of Agriculture (Palestine, 2017) also holds large time series datasets covering all its services in the city. Hundreds of records have been collected, and these data should be used scientifically in planning to determine the relationship between groundwater and rainfall in the future.

In our research, we collected two different historical time series: the first is a dataset of groundwater wells in Dear El-Balah from CMWU (10 years: 2008-2017), and the second is historical rainfall amounts since 1985 (32 years) from the Palestinian Ministry of Agriculture. The data were prepared for applying forecasting algorithms in order to choose the most accurate algorithm for predicting the next five years. The results are intended to give managers and specialists a clear view of water consumption in the near future.

1.2. Statement of the problem

The Coastal Municipalities Water Utility (CMWU) has monthly data on groundwater consumption, and the Ministry of Agriculture has monthly rainfall amounts. Both are historical data that have not been used for future planning of groundwater management. In addition, the Water Authority and the Ministry of Agriculture give citizens permits to dig underground wells, which increases groundwater consumption and leaves it uncontrolled. Moreover, Israel has acquired 89% of the water resources in the Gaza Strip.

Furthermore, the relationship between groundwater production and rainfall amounts, which feed the groundwater reservoir and thereby reduce its salinity, has not been estimated.

On the other hand, the Gaza Strip overlooks the Mediterranean Sea, which makes it more prone to salinity. Increased consumption of groundwater leads to seawater intrusion that replaces the depleted groundwater, and the resulting imbalance between freshwater and saline water is making the groundwater in this area unusable.

Forecasting science is not used by the municipalities or the Ministry of Agriculture to estimate the future need for groundwater, the amounts of rainfall, and their relation to salinity (if rainfall rates stay stable, more well production leads to a higher salinity percentage). Forecasting would give decision makers and funders of the water sector a vision of water demand, which matters particularly in the Gaza Strip because the political and economic situation is unstable.

1.3. Objectives

1.3.1. Main objective

To apply several forecasting algorithms to two time series datasets, groundwater production and rainfall amounts, for a given area (Dear El-Balah city as a case study), and then to select the most accurate algorithm to forecast groundwater production and rainfall amounts for the next 5 years.

1.3.2. Specific objectives

1. To aggregate data from different sources: the CMWU and the Ministry of Agriculture;

2. To analyze, understand and preprocess the data;

3. To transform the datasets into time series form;

4. To model the data by implementing data mining forecasting techniques;

5. To evaluate the algorithms by measuring the accuracy of their results;

6. To select the most accurate algorithm to forecast rainfall and groundwater consumption for our case study;

7. To give a view of the relationship between the groundwater reservoir and rainfall over the coming five years, based on the forecasting algorithms.

1.4. Importance of the study

Forecasting time series data, especially groundwater consumption and rainfall, is very important. It guides the annual pumping of groundwater. Moreover, it helps in making future plans based on groundwater production and rainfall rates; rainfall is the main source of groundwater recharge and thus reduces the salinity ratio. It also helps in managing groundwater consumption by coordinating the digging of wells by citizens. The importance of our proposed model stems from:

1. Being helpful for other cities, which can apply this approach to estimate their future groundwater consumption.

2. Being applicable to other utilities such as electricity, gas and oil.

3. Helping both the CMWU and the Ministry of Agriculture forecast groundwater and rainfall amounts for efficient decision making.

4. Providing means of reducing groundwater consumption, such as water desalination and getting the utmost benefit from rainfall.

5. Reducing the salinity of the groundwater reservoir resulting from increased pumping.

6. Making plans based on real historical data in order to monitor the future situation of groundwater and to coordinate the drilling of wells by citizens.

7. Raising the awareness of those funding water projects.

8. Avoiding the depletion of the available groundwater reservoir.

9. Taking into account that the groundwater reservoir is affected by temporal and spatial differences in precipitation.

10. Encouraging the search for new sources of groundwater recharge, such as collecting rainwater from the roofs of buildings and covered areas.

11. Presenting suitable suggestions and recommendations for decision makers, policy makers and supervisory bodies such as CMWU, MoA and the Water Authority.

1.5. Scope and limitations of the project

1.5.1. Main Scope

The main scope of this study is the analysis and forecasting of groundwater demand and rainfall amounts in Dear El-Balah city for 5 years in advance. Owing to the political and economic situation under the rule of the State of Israel, this research is limited to the management of groundwater consumption by customers and of rainfall. The research focuses on historical data of Dear El-Balah Municipality.

1.5.2. Main limitations

1. Monthly groundwater consumption data were aggregated from 2008 to 2017, and the rainfall dataset covers 1985 to 2017 (about 32 years).

2. Medium-term forecasting methods are used for the next 5 years.

3. Our work focuses only on forecasting groundwater production quantities in Dear El-Balah and rainfall quantities.

4. Wells dug by citizens are not included in the study.

5. Salinity equations, salinity percentages in wells and pollution percentages are not included in the study.

6. Evaporation rates of rainwater and population increase are also not included.

1.6. Methodology

The methodology of this research consists of the following five phases:

Phase 1: Data Collection and Acquisition:

This thesis depends on historical data collected from two institutions in Dear El-Balah city: CMWU and the MoA. The data collected from CMWU cover monthly groundwater production since 2007, and the data from the MoA cover rainfall amounts since 1985. We selected the tables and columns related to our study and prepared them for the next phase.

Phase 2: Data Preprocessing:

Required preprocessing tasks were applied to improve data quality before applying the forecasting algorithms. Preprocessing includes several techniques, e.g., data integration, data reduction and handling missing values. We used Microsoft Excel 2016 for preprocessing.
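The thesis performed these preprocessing steps in Excel 2016. Purely as an illustration, the sketch below shows two of them in Python under stated assumptions: missing monthly values (marked `None`) are filled by linear interpolation between the nearest observed months, and a monthly series is reduced to yearly totals. The function names and the choice of linear interpolation are hypothetical, not the procedure the author used.

```python
from typing import Dict, List, Optional


def fill_missing_linear(series: List[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation between the nearest observed
    neighbours; gaps at the start or end copy the nearest observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series contains no observed values")
    filled: List[float] = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(v)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:        # leading gap: copy the first observed value
            filled.append(series[right])
        elif right is None:     # trailing gap: copy the last observed value
            filled.append(series[left])
        else:                   # interior gap: straight line between neighbours
            frac = (i - left) / (right - left)
            filled.append(series[left] + frac * (series[right] - series[left]))
    return filled


def annual_totals(monthly: List[float], start_year: int) -> Dict[int, float]:
    """Reduce a monthly series that starts in January to yearly totals."""
    if len(monthly) % 12 != 0:
        raise ValueError("expected a whole number of years of monthly data")
    return {start_year + k: sum(monthly[12 * k:12 * (k + 1)])
            for k in range(len(monthly) // 12)}


print(fill_missing_linear([None, 4.0, None, 10.0]))  # [4.0, 4.0, 7.0, 10.0]
```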

Phase 3: Implementation:

After preprocessing the real datasets and preparing them as time series, we obtained two datasets: the first covers monthly well production from January 2008 to December 2017, and the second covers annual rainfall amounts from 1985 to 2017.

Five forecasting algorithms were run on these datasets: (1) ARIMA, (2) ETS, (3) ARIMA+NN, (4) ARIMA+ETS and (5) ARIMA+TBATS. Based on their results, the algorithm with the lowest computed MAPE value is selected for the actual forecast from 2018 to 2022.

Phase 4: Evaluating the algorithms:

The performance of the five selected forecasting algorithms was evaluated separately for each dataset using the MAPE measure. We computed the MAPE between the predicted results and the actual values in the testing set.

Then, we compared the MAPE values of the forecasting algorithms. The algorithm with the lowest MAPE value is considered the most accurate and is selected for the actual forecast task.
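A minimal sketch of this evaluation loop, assuming a chronological hold-out (the last `test_size` points form the testing set) and using two simple baseline forecasters, a naive forecast and a seasonal naive forecast with an assumed period of 12, as hypothetical stand-ins for the five algorithms actually compared in the thesis. The series in the usage lines is made up for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A forecaster takes the training data and a horizon and returns point forecasts.
Forecaster = Callable[[Sequence[float], int], List[float]]


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error (Equation 2-1), in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def naive(train: Sequence[float], horizon: int) -> List[float]:
    """Hypothetical baseline: repeat the last observed value."""
    return [train[-1]] * horizon


def seasonal_naive(train: Sequence[float], horizon: int, m: int = 12) -> List[float]:
    """Hypothetical baseline: repeat the last observed season of length m."""
    return [train[len(train) - m + (h % m)] for h in range(horizon)]


def select_best(series: Sequence[float], candidates: Dict[str, Forecaster],
                test_size: int) -> Tuple[str, Dict[str, float]]:
    """Fit on the earlier data, score each candidate on the hold-out by MAPE,
    and return the name with the lowest MAPE together with all scores."""
    train, test = list(series[:-test_size]), list(series[-test_size:])
    scores = {name: mape(test, f(train, test_size)) for name, f in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Made-up monthly series with a repeating 12-month pattern (illustration only).
pattern = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0]
best, scores = select_best(pattern * 3, {"naive": naive, "seasonal naive": seasonal_naive}, 12)
print(best)  # seasonal naive
```

The split is chronological rather than random because a forecaster must only see data that precede the period it is scored on.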

Phase 5: Forecasting:

After selecting the most accurate algorithm, we forecasted both groundwater production and rainfall amounts for the next five years in Dear El-Balah city.

1.7. Thesis Outlines

The thesis is divided into six chapters. Chapter one includes the introduction, the statement of the research problem, objectives, scope and limitations of the research, and the research methodology. Chapter two provides the literature review. Chapter three summarizes related works associated with our research. Chapter four explains the methodology and model development. Chapter five presents the experimental results, their analysis and discussion. The last chapter, chapter six, presents the conclusion and future work. The References list and the Appendices are at the end of the thesis.

Chapter 2

Literature Review

Many research and scientific papers discuss forecasting, which has received attention over time.

In this chapter, we describe CMWU and MoA as the sources of our datasets, and give an overview of data mining, forecasting accuracy measures, time series analysis, and data mining versus statistics. We also summarize the forecasting methods and algorithms used in this thesis: ARIMA, ETS, and hybrid ARIMA, i.e., ARIMA combined with ETS, TBATS or NN.

2.1. Coastal Municipalities Water Utility (CMWU)

The CMWU was established in 1995 to be the first national and professional institution

under which all the water and sewage directorates work in municipalities in the Gaza Strip.

It runs four water utilities; three in the West Bank and one in Gaza. CMWU is funded by

UNRWA, the Red Cross and Red Crescent, the German Development Bank (KFW), the

French Development Agency (AFD) and other national and international institutions.

CMWU has carried out several tasks to serve the water sector. It established 45 water wells in Gaza and desalination units on some wells, developed water services in the Gaza Strip, and spent 6 million dollars rebuilding what the war destroyed. It also repaired all water wells, sewage lines and the treatment plant in Gaza City.

Its purpose is to provide residents of the Gaza Strip with integrated,

environmentally sound and safe water and sanitation services through optimal utilization

of available resources and innovative solutions (Coastal Municipalities Water Utility,

2015).

2.2. The Ministry of Agriculture (MoA):

The MoA aims to achieve food security and to contribute to improving the quality of life of farmers and Palestinian citizens. In addition, it works to develop the agricultural sector through several tasks, the most important of which are:

• Using agricultural resources, especially land and water, efficiently and effectively, and contributing to food security.

• Increasing and improving the competitiveness of agricultural production in local and foreign markets, and introducing new agricultural varieties.

• Enabling the private sector to play its role in enhancing the agricultural and rural development process.

For our case study of Dear El-Balah city, we collected related data from the Coastal Municipalities Water Utility and from the General Directorate of Soil and Irrigation (GDSI), which is affiliated with the Ministry of Agriculture. The GDSI monitors precipitation and prepares rainfall reports. We applied the forecasting algorithms so that the results can support decision makers and the development of future plans to serve the water sector, especially with the increase in population.

2.3. Data Mining Overview

The process of extracting valuable information from different data sources is called Knowledge Discovery in Databases (KDD). Data mining is one step of KDD, in which models are built and large datasets are analyzed using data mining techniques, e.g., classification, clustering and association rules (Alasdi and Bhaya, 2017).

Figure (2–1): Steps of Knowledge Discovery (Alasdi and Bhaya, 2017)

Figure (2–1) illustrates the steps of the knowledge discovery process. First, the data are selected to determine the target data. The data are then transformed into a form suitable for mining by performing preprocessing or aggregation operations. After that, the core data mining step applies intelligent methods such as classification, clustering and association to the data to extract patterns. Pattern evaluation then identifies the truly interesting patterns based on specific measures. The final step, knowledge presentation, visualizes the mined knowledge in a form that humans can understand (Han, Kamber, & Pei, 2012).

2.4. Forecasting Accuracy measures:

The most common metrics used to measure accuracy in forecasting are Mean

Absolute Error (MAE), Root Mean Squared Error (RMSE) and the Mean Absolute

Percentage Error (MAPE). We used MAPE to measure the accuracy of the forecasting methods in groundwater and rainfall prediction. MAPE expresses accuracy as a percentage and is defined by the following formula:

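$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{\mathrm{Actual}_i - \mathrm{Predicted}_i}{\mathrm{Actual}_i}\right| \times 100\% \qquad \text{(Equation 2-1)}$$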

where $\mathrm{Actual}_i$ is the actual value and $\mathrm{Predicted}_i$ is the forecast value at point $i$.

The difference between $\mathrm{Actual}_i$ and $\mathrm{Predicted}_i$ is divided by the actual value. The absolute value of this ratio is summed over every forecast point in time and divided by the number of fitted points $N$. Multiplying by 100% turns it into a percentage error. Because each term divides by $\mathrm{Actual}_i$, MAPE is undefined when any actual value is zero.

MAPE is also known as the Mean Absolute Percentage Deviation (MAPD) (Kaytez et al., 2015).
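The three accuracy measures named in this section can be sketched in Python as follows; `mape` follows Equation 2-1 directly and rejects zero actual values, for which the ratio is undefined. The function names and the worked numbers are illustrative only.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> int:
    """Validate inputs and return the number of points N."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    return len(actual)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    n = _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    n = _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Equation 2-1: mean of |(Actual_i - Predicted_i) / Actual_i|, times 100%."""
    n = _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Worked example: relative errors of 10% and 10% give a MAPE of 10%.
actual, predicted = [100.0, 200.0], [110.0, 180.0]
print(mae(actual, predicted), mape(actual, predicted))  # 15.0 10.0
```

Note that MAE and RMSE are in the units of the data, while MAPE is scale-free, which is why it can be compared across the groundwater and rainfall series.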

2.5. Time Series Analysis

A time series is an ordered sequence of values recorded at specific time intervals. In time series analysis, we analyze the past behavior of a variable in order to predict its future behavior; in other words, past data points are used to forecast the future.

A common approach in analyzing time series is to decompose the series into four components (Seymour, 2014). These components are illustrated in Figure (2–2):

1. Trend component – the increase or decrease in the series that persists over a long period of time.

Example: the growth in births over time can be seen as an upward trend. The trend shows the general movement of the variable during the observation period, without taking irregularities and seasonality into account. If a time series shows neither a decreasing nor an increasing pattern, the series is stationary in the mean.

2. Seasonality component – occurs when the time series exhibits regular fluctuations during the same period (e.g., month or quarter) every year.

Example: sales of gifts increase during the Christmas season.

3. Random component – what remains after the trend-cycle and seasonal components have been removed: short-term fluctuations that are not systematic and, in some instances, not predictable.

4. Cyclic component – when the data exhibit rises and falls that are not of a fixed period, the series has a cyclic pattern. These fluctuations usually last at least 2 years.

Figure (2–2): Time series components (Australian Transport Assessment and Planning, 2016)
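The component view above is often operationalized as a classical additive decomposition, y_t = trend + seasonal + remainder. The sketch below is a minimal version under that assumption: the trend is a centred moving average over the seasonal period `m` (a 2×m moving average when `m` is even), the seasonal indices are the average detrended value for each season, shifted to sum to zero, and the remainder is what is left. It does not separate the cyclic component from the trend, and it is not necessarily the decomposition used in this thesis.

```python
from typing import List, Optional, Tuple


def centred_moving_average(y: List[float], m: int) -> List[Optional[float]]:
    """Trend estimate; None where the averaging window does not fit."""
    n, half = len(y), m // 2
    trend: List[Optional[float]] = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        if m % 2 == 1:   # odd m: plain m-term average
            trend[t] = sum(window) / m
        else:            # even m: 2 x m average, half weight on both end points
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / m
    return trend


def additive_decompose(y: List[float], m: int) -> Tuple[
        List[Optional[float]], List[float], List[Optional[float]]]:
    """Classical additive decomposition: y = trend + seasonal + remainder."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasonal periods")
    trend = centred_moving_average(y, m)
    sums, counts = [0.0] * m, [0] * m
    for t, (v, tr) in enumerate(zip(y, trend)):
        if tr is not None:               # average the detrended values per season
            sums[t % m] += v - tr
            counts[t % m] += 1
    raw = [sums[s] / counts[s] for s in range(m)]
    offset = sum(raw) / m                # centre the indices so they sum to zero
    index = [r - offset for r in raw]
    seasonal = [index[t % m] for t in range(len(y))]
    remainder = [None if tr is None else v - tr - s
                 for v, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, remainder
```

For monthly well production, `m` would be 12; the trend is undefined for the first and last `m // 2` points because the window does not fit there.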

Time series analysis can be classified as linear or non-linear, and as univariate or multivariate. A univariate time series contains a single variable observed sequentially over time, for example, hourly energy consumption. A multivariate time series is used when a group of time series variables is involved and their interactions are to be considered (Deb et al., 2017).

Time series forecasting differs from time series analysis in that it uses a model to predict future values based on previously observed values. Time series analysis, by contrast, comprises methods for analyzing time series data in order to extract useful statistics and other characteristics of the data.

Ullah (2014) classifies the major objectives of time series analysis as:

1. Description: e.g., plotting the data and looking for trends, seasonal fluctuations and so on;

2. Explanation: using observations of two or more variables, the variation in one time series can be used to explain the variation in another;

3. Prediction: predicting the future values of the series;

4. Control: control procedures of several kinds, e.g., quality control of a manufacturing process monitored as a time series.

2.6. Forecasting techniques:

2.6.1. Auto-Regressive Integrated Moving Average (ARIMA)

ARIMA stands for auto-regressive integrated moving average (Dalinina, 2017) and is specified by three order parameters (p, d, q). The process of fitting an ARIMA model is sometimes referred to as the Box-Jenkins method.

An autoregressive (AR(p)) component refers to the use of past values in the

regression equation for the series Y. The auto-regressive parameter p specifies the

number of lags used in the model. For example, AR(2) or,

equivalently, ARIMA(2,0,0) is represented as

$$y_t = c + \varphi_1 y_{t-1} + \varphi_2 y_{t-2} + e_t \qquad \text{(Equation 2-2)}$$

where $\varphi_1$ and $\varphi_2$ are model parameters. The parameter d is the degree of differencing in the integrated (I(d)) component. Differencing a series means subtracting each value's previous value from it, and this operation is repeated d times. Differencing is often used to stabilize the series when the stationarity assumption is not met. This is discussed below.
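Differencing as described above can be sketched in a few lines. Applying it d times to a polynomial trend of degree d leaves a constant series, which is why differencing removes trends. The function name is illustrative.

```python
from typing import List, Sequence


def difference(y: Sequence[float], d: int = 1) -> List[float]:
    """Apply first differencing d times: each pass replaces y_t by y_t - y_{t-1}
    and shortens the series by one value."""
    out = list(y)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


print(difference([3.0, 5.0, 7.0, 9.0]))            # linear trend -> [2.0, 2.0, 2.0]
print(difference([1.0, 4.0, 9.0, 16.0, 25.0], 2))  # quadratic trend -> [2.0, 2.0, 2.0]
```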

http://itfeature.com/time-series-analysis-and-forecasting/time-series-analysis-forecasting

http://itfeature.com/time-series-analysis-and-forecasting/objectives-of-time-series-analysis#timedescription

http://itfeature.com/time-series-analysis-and-forecasting/objectives-of-time-series-analysis#timeexplanation

http://itfeature.com/time-series-analysis-and-forecasting/objectives-of-time-series-analysis#timeprediction

http://itfeature.com/time-series-analysis-and-forecasting/objectives-of-time-series-analysis#timecontrol


A moving average (MA(q)) component represents the error of the model as a

combination of previous error terms et. The order q determines the number of terms

included in the model.

$$y_t = c + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \cdots + \theta_q e_{t-q} + e_t \qquad \text{(Equation 2-3)}$$

Differencing, autoregressive, and moving average components make up a non-

seasonal ARIMA model which can be written as a linear equation:

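$$y^{(d)}_t = c + \varphi_1 y^{(d)}_{t-1} + \cdots + \varphi_p y^{(d)}_{t-p} + \theta_1 e_{t-1} + \cdots + \theta_q e_{t-q} + e_t \qquad \text{(Equation 2-4)}$$

where $y^{(d)}$ is $y$ differenced d times and c is a constant.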
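To make Equation 2-4 concrete, the following sketch computes a one-step-ahead point forecast from a non-seasonal ARIMA(p, d, q) model whose coefficients c, φ and θ and past residuals are assumed to be already known (in practice they are estimated by software, which this sketch does not do). The unknown future error is set to zero, and the differencing is undone by adding back the last value of each differencing level. The function name and coefficient values are hypothetical.

```python
from typing import List, Sequence


def arima_one_step(y: Sequence[float], d: int, c: float,
                   phi: Sequence[float], theta: Sequence[float],
                   past_errors: Sequence[float]) -> float:
    """One-step-ahead point forecast of a non-seasonal ARIMA(p, d, q) with known
    coefficients (Equation 2-4). past_errors[-1] is the most recent residual e_t;
    the unknown future error is replaced by its expected value, 0."""
    # levels[k] holds y differenced k times, for k = 0..d.
    levels: List[List[float]] = [list(y)]
    for _ in range(d):
        prev = levels[-1]
        levels.append([prev[i] - prev[i - 1] for i in range(1, len(prev))])
    w = levels[d]
    if len(w) < len(phi) or len(past_errors) < len(theta):
        raise ValueError("not enough history for the requested orders")
    # AR part: phi_1 * w_t + phi_2 * w_{t-1} + ...
    ar = sum(p * w[-1 - i] for i, p in enumerate(phi))
    # MA part: theta_1 * e_t + theta_2 * e_{t-1} + ...
    ma = sum(th * past_errors[-1 - j] for j, th in enumerate(theta))
    w_next = c + ar + ma
    # Undo the differencing: add the last value of every level below d.
    return w_next + sum(levels[k][-1] for k in range(d))


# AR(2) = ARIMA(2,0,0), Equation 2-2 with hypothetical c = 1, phi = (0.5, 0.25):
print(arima_one_step([4.0, 8.0], d=0, c=1.0, phi=[0.5, 0.25], theta=[], past_errors=[]))  # 6.0
```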

ARIMA models can also include a seasonal structure. In this case, the model is specified by two sets of order parameters: (p, d, q) as described above, and (P, D, Q)m, describing the seasonal component with a period of m observations.

ARIMA methodology has its limitations. These models rely directly on past values and therefore work best on long and stable series. Moreover, ARIMA simply approximates historical patterns and does not aim to explain the structure of the underlying data mechanism.

2.6.2. Hybrid ARIMA

ARIMA models are important algorithms for time series data. They assume a linear relationship and need a lot of data to produce accurate results. They are capable of representing stationary as well as non-stationary time series (Gahirwal, 2013).

Forecasting performance can be enhanced by combining several different models. Combining models can increase the chance of discovering hidden patterns and relationships in the data, improve forecasting performance, reduce the risk of using an inappropriate model, reduce the risk of failure, obtain more accurate results and overcome the limitations of each component (Khashei & Bijari, 2011).

Combining ARIMA models with other models, often non-linear prediction algorithms, can therefore give better estimates than single time series models.
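One common way to build such a hybrid forecast is to average the component models' forecasts, with equal weights by default. The sketch below assumes this simple (optionally weighted) averaging scheme; it is illustrative and not necessarily the combination method used for the ARIMA+ETS, ARIMA+NN and ARIMA+TBATS models in this thesis. The model names and numbers in the usage line are placeholders, not results.

```python
from typing import Dict, List, Optional, Sequence


def combine_forecasts(forecasts: Dict[str, Sequence[float]],
                      weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Weighted average of several models' point forecasts, horizon by horizon.
    With weights=None every model gets the same weight (simple average)."""
    if not forecasts:
        raise ValueError("need at least one component forecast")
    names = list(forecasts)
    horizon = len(forecasts[names[0]])
    if any(len(forecasts[n]) != horizon for n in names):
        raise ValueError("all component forecasts must have the same horizon")
    w = {n: 1.0 for n in names} if weights is None else weights
    total = sum(w[n] for n in names)
    return [sum(w[n] * forecasts[n][h] for n in names) / total for h in range(horizon)]


# Placeholder numbers, not results from this thesis.
print(combine_forecasts({"arima": [10.0, 20.0], "ets": [14.0, 16.0]}))  # [12.0, 18.0]
```

Equal weights are a robust default when there is little data to estimate better weights; unequal weights could, for example, favor the component with the lower validation error.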

ARIMA models can also be hybridized with intelligent techniques for time series prediction, such as ETS, TBATS, NN and stlm.

Both artificial neural networks (ANNs) and ARIMA models have achieved success in their own nonlinear or linear domains, but neither is a universal model suitable for all circumstances.

2.6.2.1 Neural Networks Model (NNs)

NNs are nonlinear models that can be used for time series forecasting. They are a highly simplified model of the structure of biological neurons. NNs consist of: 1. processing units; 2. interconnections; 3. operations; 4. updates. An NN has several processing units that are interconnected according to some topology. In operation, each unit receives input signals either from the outside world or from the outputs of other units connected to it. There are two learning procedures for NNs, supervised learning and unsupervised learning, and many types of NNs use supervised or unsupervised learning methods (Egnanarayana, 2005). Figure (2–3) shows the classification of NN models into feed-forward NNs, recurrent NNs, stochastic NNs and modular NNs.

Figure (2–3): The classifications of neural network architecture (Sibanda & Pretorius, 2012)

2.6.2.1.1 Feed-forward Neural Networks

They are the simplest type of ANNs in which the information moves in only one

direction without loops in the network; forward from the input nodes through the hidden

nodes and to the output nodes. Single layer perceptron (SLP) and multi-layer perceptron

(MLP) are examples of feed-forward NNs.
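A forward pass through a one-hidden-layer feed-forward network (an MLP) illustrates the one-directional flow just described. In this sketch the inputs could be lagged values of a series, for example y_{t-1} and y_{t-2}; the tanh hidden activation, the single linear output and all weights are assumptions for illustration, and no training is performed.

```python
import math
from typing import Sequence


def mlp_forward(x: Sequence[float],
                w_hidden: Sequence[Sequence[float]], b_hidden: Sequence[float],
                w_out: Sequence[float], b_out: float) -> float:
    """Feed-forward pass: inputs -> tanh hidden layer -> single linear output.
    Information flows in one direction only; there are no loops."""
    if len(w_hidden) != len(b_hidden) or len(w_hidden) != len(w_out):
        raise ValueError("hidden-layer sizes do not match")
    if any(len(row) != len(x) for row in w_hidden):
        raise ValueError("each hidden unit needs one weight per input")
    # Each hidden unit: weighted sum of the inputs plus bias, then tanh.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # Output unit: weighted sum of the hidden activations plus bias.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical weights; the two inputs play the role of (y_{t-1}, y_{t-2}).
print(mlp_forward([1.0, 2.0], [[0.5, -0.25], [0.1, 0.2]], [0.0, 0.0], [1.0, 0.5], 0.3))
```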

2.6.2.1.2 Recurrent Networks (RN):

These models have a bi-directional data flow: an RN propagates data from later processing stages back to earlier stages (Bitzer & Kiebel, 2012), so the data do not propagate linearly from input to output as in a feed-forward network.

2.6.2.1.3 Stochastic Neural Networks (SNN):

An SNN differs from a typical neural network by introducing random variations into the network, which allows it to describe a practical system more accurately (Yang, Zhang, & Shi, 2010).

2.6.2.1.4 Modular Neural Networks

A Modular Neural Network (MNN) is a neural network (NN) that consists of several modules, each carrying out one sub-task of the NN's global task, with all modules functionally integrated. Each module can be either a sub-structure or a learning sub-procedure of the whole network. The network's global task can be any neural network application, e.g., mapping, function approximation, clustering or associative memory (Auda & Kamel, 1999). In an MNN, the modules do not interact with each other. A modular neural network has many benefits, one of which is the ability to reduce a large neural network to smaller, more manageable components (Azam, 2000).

Hybrid ARIMA (ARIMA+NN) is one of the best-known hybrid models for forecasting tasks. Its motivation comes from the following perspectives (Khashei & Bijari, 2011): (1) when it is difficult to determine whether a time series is generated by a linear or a nonlinear underlying process, combining linear ARIMA and nonlinear ANN models eases the problem of model selection; (2) real-world time series are rarely purely linear or purely nonlinear and often contain both linear and nonlinear patterns, so neither ANN nor ARIMA models alone are adequate, and combining them solves the problem of modeling combined linear and nonlinear autocorrelation structures; (3) it is widely agreed in forecasting science that no single model is best in every situation, because real-world problems are complex and a single model may not capture different patterns equally well. Therefore, different models are combined to capture the different patterns in the data.

2.6.2.2 Exponential Smoothing (ETS)- the state space models:

Exponential smoothing is also known in the forecasting community as ETS. The R smooth package, which implements exponential smoothing, aims to gather all the essential smoothing techniques used in forecasting. Statistical models underlie the exponential smoothing methods: each model consists of a measurement equation that describes the observed data, and some transition (state) equations that describe how the unobserved components or states (level, trend, seasonal) change over time. Hence, these are referred to as “state space models” (Rob J Hyndman, 2018).

The idea of exponential smoothing is to smooth the original series the way the

moving average does and to use the smoothed series in forecasting future values of the

variable of interest. In exponential smoothing, however, we want to allow the more recent

values of the series to have a greater influence on the forecast of future values than the

more distant observations.

A simple and pragmatic approach to forecasting is to construct an exponentially weighted average of past observations. The most recent observation is given the largest weight, the immediately preceding observation is given less weight, and the observation before that even less (exponential decay of the influence of past data).

Exponential smoothing of time series data (Horse, 2018) assigns exponentially decreasing weights from the newest to the oldest observations. In other words, the older the data, the less priority (“weight”) it is given; newer data is seen as more relevant and is assigned more weight. The weights are determined by a smoothing parameter (smoothing constant), usually denoted by α.

https://cran.r-project.org/web/packages/smooth/vignettes/smooth.html

http://www.statisticshowto.com/timeplot/

http://www.statisticshowto.com/observation-in-statistics/


In this thesis we consider the simplest form of exponential smoothing, called simple exponential smoothing (ETS). It is suitable for forecasting data with no clear trend or seasonal pattern. Its popularity can be attributed to its simplicity, its computational efficiency, the ease of adjusting its responsiveness to changes in the process being forecast, and its reasonable accuracy (Hyndman, Koehler, Snyder, & Grose, 2002).

Simple (single) exponential smoothing uses a weighted moving average with exponentially decreasing weights (Stephanie, 2018). The basic formula for simple exponential smoothing is:

$$S_t = \alpha y_{t-1} + (1 - \alpha) S_{t-1} \qquad \text{(Equation 2-5)}$$

where:

• $S_t$ is the smoothed value at period t, and $y_{t-1}$ is the observation at period t − 1.

• α is the smoothing constant, a value from 0 to 1. When α is close to zero, smoothing occurs more slowly. The best value for α is the one that results in the smallest mean squared error (MSE); various ways exist to find it.

• t is the time period.

Many alternative formulas exist. For instance, Roberts (1959) replaced $y_{t-1}$ with the latest observation, $y_t$. Another formula uses the forecasts for the preceding and present periods:

$$F_t = F_{t-1} + a\,(A_{t-1} - F_{t-1}) = a\,A_{t-1} + (1 - a)\,F_{t-1} \qquad \text{(Equation 2-6)}$$

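where:

• $F_t$ is the forecast for the current period and $F_{t-1}$ the forecast for the previous period,

• $A_{t-1}$ is the actual value (demand) for the previous period,

• $a$ is the weight (between 0 and 1). The closer $a$ is to zero, the smaller the weight given to the most recent actual value.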

Which formula to use is usually a moot point, as most exponential smoothing is performed using software. Whichever formula is used, an initial value has to be set. This is a judgment call: one could use an average of the first few observations, or set the second smoothed value equal to the original observation value to get the ball rolling.

http://www.statisticshowto.com/mean-squared-error/
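The following sketch implements Equation 2-6, initializing the first forecast with the first observation (one of the judgment calls mentioned above), and picks the weight by a simple grid search over 0.01 to 0.99 that minimizes the MSE of the one-step-ahead forecasts. The grid and the function names are assumptions; software packages typically use numerical optimization instead.

```python
from typing import List, Optional, Sequence, Tuple


def ses_forecasts(actual: Sequence[float], a: float,
                  initial: Optional[float] = None) -> List[float]:
    """Equation 2-6: F_t = a*A_{t-1} + (1 - a)*F_{t-1}.
    Returns [F_1, ..., F_{n+1}]; the last entry forecasts the next, unseen period.
    By default F_1 is set to the first observation."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie between 0 and 1")
    f = actual[0] if initial is None else initial
    forecasts = [f]
    for obs in actual:
        f = a * obs + (1 - a) * f   # blend the newest actual value with the old forecast
        forecasts.append(f)
    return forecasts


def best_alpha(actual: Sequence[float],
               grid: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Grid search for the weight with the smallest one-step-ahead MSE."""
    if len(actual) < 2:
        raise ValueError("need at least two observations")
    grid = [i / 100 for i in range(1, 100)] if grid is None else grid
    best: Optional[Tuple[float, float]] = None
    for a in grid:
        f = ses_forecasts(actual, a)
        # Skip t = 0: F_1 equals A_1 by construction.
        mse = sum((actual[t] - f[t]) ** 2 for t in range(1, len(actual))) / (len(actual) - 1)
        if best is None or mse < best[1]:
            best = (a, mse)
    return best


print(ses_forecasts([10.0, 20.0], 0.5))  # [10.0, 10.0, 15.0]
```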

ETS is available through the FORECAST.ETS function in Microsoft Excel 2016 for time series (Microsoft, 2018). FORECAST.ETS computes (predicts) a future value by using existing values. It has the following syntax:

FORECAST.ETS(target_date, values, timeline, [seasonality], [data_completion], [aggregation])

This function can be used to predict future sales, consumer trends and inventory requirements.

The FORECAST.ETS function syntax has the following arguments:

• Target_date (required): the data point for which you want to predict a value. The target date can be a date/time or numeric.

• Values (required): the historical values for which you want to forecast the next points.

• Timeline (required): the independent array or range of numeric data. The dates in the timeline must have a consistent, non-zero step between them. The timeline does not need to be sorted, as FORECAST.ETS sorts it implicitly for calculations.

• Seasonality (optional): a numeric value. The default value of 1 means Excel detects seasonality automatically; other positive whole numbers set the length of the seasonal pattern. 0 indicates no seasonality, meaning the prediction will be linear.

• Data_completion (optional): although the timeline requires a constant step between data points, FORECAST.ETS supports up to 30% missing data and adjusts for it automatically. 0 treats missing points as zeros; the default value of 1 completes missing points as the average of the neighboring points.

• Aggregation (optional): FORECAST.ETS aggregates multiple points that have the same time stamp. This numeric value indicates which method is used to aggregate several values with the same time stamp. The default value of 0 uses AVERAGE, while other options are SUM, COUNT, COUNTA, MIN, MAX and MEDIAN.

http://www.statisticshowto.com/average/


2.6.2.3 State Space Model with Box-Cox Transformation, ARMA Errors, Trend and Seasonal Components (TBATS)

The TBATS model (de Livera, 2011) is a time series model for series exhibiting multiple complex seasonalities. TBATS is an acronym for the techniques used to create the model:

• T for trigonometric regressors to model multiple seasonalities

• B for Box-Cox transformations

• A for ARMA errors

• T for trend

• S for seasonality

The BATS model is the same as the TBATS model except that it lacks the trigonometric regressors; TBATS is therefore a generalization of BATS. In the forecast package for R, a TBATS model can be fitted using the tbats() function.

TBATS allows modeling datasets with varying characteristics, such as linear and non-linear time series; single, multiple, high-period and non-integer seasonality; and dual calendar effects. These advantages make it a complex framework housing a large number of features, which are made easy to use for statisticians (Geetha & Maksood, 2017).

TBATS combined with ARIMA is an adaptive model that provided better prediction accuracy than the other models; hence, it is employed for forecasting the groundwater data.

Strengths and weaknesses of the TBATS model (Chou, 2017):

• Strengths:

1. Can deal with data with non-integer seasonal period, non-nested periods and high

frequency data.

2. Can model multiple seasonalities without adding too many parameters.

3. It also has all the following strengths of BATS:

▪ The Box-Cox transformation can deal with non-linear data and helps make the variance roughly constant.

▪ ARMA model on residuals can solve the autocorrelation problem.

https://en.wikipedia.org/wiki/Power_transform

https://yintingchou.com/posts/bats-and-tbats-model/#strweak1


▪ No need to worry about initial values.

▪ Can produce not only point predictions but also prediction intervals.

▪ The performance is better than that of the simple state space model.

• Weaknesses:

1. Cannot add explanatory variables.

2. The performance for long-term prediction is not very good.

3. The computation cost is high if the data size is large.
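The Box-Cox transformation listed among the BATS/TBATS strengths maps a positive value y to (y^λ − 1)/λ, or to log y when λ = 0. A minimal sketch of the transform and its inverse follows; the choice of λ, which modelling software estimates automatically, is outside this sketch, and the example series is made up.

```python
import math


def box_cox(y: float, lam: float) -> float:
    """Box-Cox transform of a positive value: log(y) if lam == 0, else (y**lam - 1)/lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive data")
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam


def inv_box_cox(z: float, lam: float) -> float:
    """Inverse transform, used to bring forecasts back to the original scale."""
    return math.exp(z) if lam == 0 else (lam * z + 1) ** (1 / lam)


series = [4.0, 9.0, 16.0]
transformed = [box_cox(v, 0.5) for v in series]
print(transformed)
print([inv_box_cox(z, 0.5) for z in transformed])
```

A model is fitted on the transformed scale, where the variance is more nearly constant, and its forecasts are mapped back with the inverse transform.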

Chapter 3

Related Work

Forecasting plays a vital role in the operations related to modern management. It is an

important and necessary aid to planning which is considered the backbone of effective

operations. Many organizations have failed because of the lack of forecasting or faulty

forecasting on which the planning was based.

The purpose of forecasting is to help scientists and decision makers make effective plans and experiments, and to translate the findings into principles that are easy to understand and apply (Armstrong & Fildes, 2006). Forecasting is widely used in many areas, such as sales forecasting, production yield forecasting, weather forecasting, demand forecasting and so on.

Recently, groundwater and rainfall forecasting have become essential components of effective water resources planning and management. Rainfall forecasting helps predict the wet and dry periods of the year in advance, and forecasting the annual rainfall intensity helps in managing disasters such as floods.

Groundwater forecasting provides a valuable trigger for determining the timing and capacity of new water resources development and for controlling water consumption. Therefore, there is an increasing need for groundwater demand forecasting, which can provide a simulated view of the future and contribute to identifying the suitable management alternative for balancing water supply and demand (Mohamed & Al-Mualla, 2010).

Because of the great importance of forecasting, many studies and papers have been published in this field. We present the related work in two sections. The first section is dedicated to groundwater and rainfall forecasting and covers three forecasting horizons: short-term, which forecasts up to one year ahead; medium-term, which forecasts from one to ten years ahead and is the main focus of this study; and long-term, which forecasts more than ten years ahead. The second section lists studies from other forecasting domains, for example oil and electricity.

3.1 Groundwater and Rainfall Forecasting

Khalil et al. (2015) examined five data-driven models, MLR, ANN, W-MLR, W-ANN and W-ENN, for the short-term forecasting of groundwater levels under mine-tailings recharge conditions at two observation wells. The ANN and MLR models aimed to establish a functional relationship between the groundwater levels and the predictors (tailings recharge, mean temperature and precipitation). In the wavelet-based models (W-MLR, W-ANN and W-ENN), wavelet analysis was employed to de-noise the mean temperature and precipitation variables, and MLR, ANN and ENN were then used to approximate the functional relationship between the de-noised predictors and the groundwater levels. The performance of the five models was evaluated using the 'leave-one-out' validation approach. According to their results, the W-ENN models outperformed the other models in forecasting groundwater levels at lead times of 1 day, 1 week and 1 month. The results also showed that the ANN-based models performed better than the MLR-based models, and that de-noising the predictors with wavelet analysis improved the performance of both the ANN and MLR models.
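The 'leave-one-out' validation approach can be illustrated with a short sketch: each observation is held out in turn, the model is refitted on the remaining points, and the held-out point is predicted. This is only an illustration under stated assumptions: a one-predictor least-squares line stands in for the MLR and ANN models of the study, and the data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def leave_one_out_rmse(xs, ys):
    """Refit the model n times, each time predicting the single held-out point."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        errors.append(ys[i] - (a + b * xs[i]))
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical predictor (precipitation) and target (groundwater level).
precip = [12.0, 30.0, 45.0, 20.0, 60.0, 5.0]
level = [101.2, 101.9, 102.6, 101.5, 103.1, 100.9]
print(round(leave_one_out_rmse(precip, level), 4))
```

Because every point is predicted by a model that never saw it, the resulting RMSE estimates out-of-sample error rather than goodness of fit.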

Rajaee et al. (2016) proposed a hybrid wavelet-artificial neural network (WANN) model and a geostatistical method for the spatiotemporal prediction of the groundwater level (GWL) one month ahead. They collected monthly observed GWL time series from September 2005 to April 2014 in 10 piezometers around Mashhad city in Iran. An artificial neural network (ANN) and a WANN were trained for each piezometer. The comparison of prediction accuracy showed that WANN was more effective in predicting the GWL one month ahead. The kriging method with a Gaussian model was selected to predict the GWL at the desired points, and the Gaussian model, with an RMSE of 0.253, was a suitable choice for spatiotemporal GWL forecasting. According to the obtained groundwater level map, the groundwater level was higher in the parts of the plain located near the mountains, which indicates that the results are plausible.

Mahmud et al. (2017) used the Seasonal Autoregressive Integrated Moving Average (SARIMA) model to forecast monthly rainfall with a twelve-month lead time for thirty rainfall stations in Bangladesh. The lowest R-squared value (0.672) was found for the Barishal station and the highest (0.868) for the Teknaf station. Only two stations had R-squared values below 0.70, so the forecasts were reasonably precise, which indicates that the developed SARIMA models can forecast rainfall adequately. Hence, these SARIMA models could be used as a convenient tool for nationwide, year-long rainfall forecasting.

Zhao et al. (2018) applied the Kalman filter method and ARIMA to rainfall forecasting. The Kalman filter method expresses a time series model in state space form to determine the future forecast, and it uses a recursive solution to minimize the error. The rainfall data were clustered by K-means clustering, and ARIMA(p,d,q) was used to construct the state space for the Kalman filter model. As a result, they had four groups of data and one model for each group. The study concluded that the Kalman filter method gave better results than the ARIMA model for rainfall forecasting in each group, since the RMSE of ARIMA was larger than that of the Kalman filter.
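The recursive idea behind the Kalman filter can be sketched for the simplest state space model, a local level (random walk plus noise). This is an assumption made for illustration: it is not the ARIMA-based state space that the study constructed, and the noise variances q and r, the starting values x0 and p0, and the rainfall figures are all hypothetical.

```python
def local_level_kalman(observations, q, r, x0, p0):
    """Scalar Kalman filter for x_t = x_(t-1) + w_t and y_t = x_t + v_t.

    q and r are the variances of the state noise w_t and the observation
    noise v_t; x0 and p0 are the initial state estimate and its variance.
    Returns the filtered state estimates and the one-step-ahead predictions.
    """
    x, p = x0, p0
    filtered, one_step = [], []
    for y in observations:
        # Predict: the random walk carries the previous estimate forward
        # while its uncertainty grows by q.
        x_pred, p_pred = x, p + q
        one_step.append(x_pred)
        # Update: the Kalman gain weights the new observation against the prediction.
        k = p_pred / (p_pred + r)
        x = x_pred + k * (y - x_pred)
        p = (1 - k) * p_pred
        filtered.append(x)
    return filtered, one_step


rain = [62.0, 55.0, 70.0, 48.0, 66.0]  # hypothetical monthly totals
est, pred = local_level_kalman(rain, q=4.0, r=25.0, x0=60.0, p0=100.0)
print([round(v, 2) for v in est])
```

Each step only needs the previous estimate and its variance, which is what makes the solution recursive; a small r pulls the estimate towards the observations, while a large r trusts the model's prediction more.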

Qiu et al. (2017) proposed a neural network-based approach that automatically extracts features from the time series measured at observation sites and leverages the correlation between multiple sites for weather prediction via multi-task learning. The authors described it as the first attempt to use deep learning and multi-task learning to predict short-term rainfall amounts from multi-site features. They formulated the learning task as an end-to-end multi-site neural network model, which allows the knowledge learned at one site to be leveraged at other correlated sites and models the correlations between different sites. Experiments showed that the proposed model outperformed a set of baseline models, including the European Centre for Medium-Range Weather Forecasts (ECMWF) system. They compared the proposed approach with the results of the public weather forecast center and demonstrated its effectiveness.

Bakker et al. (2014) studied three forecasting models: a transfer/noise model, an adaptive heuristic model and a multiple linear regression model. To assess the possible performance improvement from using weather input, the models were studied both with and without weather variables. Using weather input reduced the largest forecasting errors by 11% and the average errors by 7%. In their view, this reduction was important for applying the forecasting model to the control of water supply systems and to anomaly detection.

Bagirov et al. (2017) developed a clusterwise linear regression (CLR) model for monthly rainfall prediction. The algorithm was applied to predict monthly rainfall in Victoria, Australia, over the period 1889 to 2014, using rainfall data with five input variables from eight weather stations. The results revealed that the CLR model was efficient for monthly rainfall prediction and superior to the other models: CR-EM, MLR, SVMreg and ANNs.

Mohanty et al. (2015) applied an artificial neural network (ANN) approach to the weekly forecasting of groundwater levels in multiple wells located over a river basin. The gradient descent with momentum and adaptive learning rate backpropagation (GDX) algorithm was employed to predict groundwater levels 1 week ahead at 18 sites over the study area. The selected set of inputs for the ANN model consisted of weekly rainfall, pan evaporation, river stage, water level in the surface drain, the pumping rates of the 18 sites and the groundwater levels of the 18 sites in the previous week, which led to 40 input nodes (4 + 18 + 18) and 18 output nodes. The model forecast groundwater levels better at shorter lead times (up to 2 weeks) than at longer lead times.

Dhekale et al. (2015) aimed at forecasting groundwater fluctuations using time series analysis of the groundwater data for each station in Murshidabad district. The groundwater observations were collected for four months (January, May, August and November) over the period 2005 to 2013. Structural time series modelling was applied to model and forecast the behavior of the groundwater table in 2014. Data for 2005 to 2012 were used for analysis, and the 2013 data were used for validation. The residuals of the developed model for each station were tested for normality and randomness.

The results showed differences in groundwater depth among the sites and seasons. Variability among the sites within a season was highest in November, followed by August. This reflects that groundwater recharge differed by site and also by season. Some regions showed a fluctuating water table over the years, which could be due to varied rainfall in these regions.

Mukhairez (2018) conducted forecasting using four algorithms: ARIMA, hybrid ARIMA, singular spectrum analysis (SSA) and linear regression. The author applied these algorithms to a real dataset collected from the Department of Customer Services of Khan Younis Municipality (KHM). The best algorithm was hybrid ARIMA, which gave the lowest mean absolute percentage error (MAPE). The study covered three forecasting levels: the whole city, the sub-areas and the classes inside Khan Younis city.

The results of applying hybrid ARIMA for the next five years showed that the minimum water revenue will decrease by about 38% compared to 2017, while the minimum water consumption for the overall city will increase by about 8.4% compared to 2017.

Banerjee et al. (2011) evaluated artificial neural network (ANN) simulation against mathematical modeling for estimating the safe pumping rate that maintains groundwater salinity in island aquifers. To forecast the salinity under varied pumping rates, an ANN model with quick propagation (QP) as the training algorithm was used. The accuracy, reliability and generalization ability of the model were verified with real-time data. The model was trained with 2 years of real-time data, and the water quality under varying pumping rates was predicted over a span of 5 years. The results showed the superiority of ANN over the SUTRA model, and ANN emerged as a more accurate alternative to numerical techniques. The pumping rate should be kept below 13,000 L/day to stabilize the groundwater salinity within 2.5%. ANN was capable of capturing poorly understood relationships between hydrological attributes and was advantageous when a vaguely defined problem did not demand a specific solution.

Cui et al. (2017) employed entropy spectral analysis for the long-term forecasting of monthly groundwater levels. Entropy was defined in the frequency domain, and three types of entropy were considered: configurational entropy, Burg entropy and relative entropy. These led to three types of spectral analysis: (1) configurational entropy spectral analysis (CESA), (2) Burg entropy spectral analysis (BESA), and (3) relative entropy spectral analysis (RESA). CESA, BESA and RESA were employed to analyze spectra and forecast monthly groundwater levels, and they were then compared to determine which method forecasts the monthly groundwater level better. To verify the three methods, historical monthly and annual groundwater data were obtained from South Carolina. Both the monthly and annual groundwater level data showed significant decreasing trends at almost all stations. Relative entropy yielded the highest resolution in determining the spectral density, while for simulating groundwater levels all three methods fitted the observed values. Although entropy spectral analysis forecast the groundwater time series with reasonable accuracy, it had limitations when applied to other data.

Mekanik et al. (2013) focused on the long-term forecasting of seasonal spring rainfall in Victoria using artificial neural network (ANN) and multiple regression (MR) approaches. Three regions of Victoria (west, east and center) were chosen as case studies. Both the ANN and MR models were assessed statistically using the mean square error (MSE), mean absolute error (MAE), Pearson correlation (r) and the Willmott index of agreement (d), and they were tested on out-of-sample test sets.

The MR models showed very poor generalization ability for east Victoria, with correlation coefficients of -0.99 to -0.90, compared to ANN with correlation coefficients of 0.42 to 0.93. The ANN models also showed better generalization ability for central and west Victoria, with correlation coefficients of 0.68 to 0.85 and 0.58 to 0.97, respectively. The statistical analysis using large-scale climate modes suggested the potential of ANN over MR models for rainfall forecasting.
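Two of the agreement measures named above, Pearson's r and Willmott's index of agreement d, can be computed as in the following sketch, where d = 1 - sum((P_i - O_i)^2) / sum((|P_i - mean(O)| + |O_i - mean(O)|)^2) for observations O and predictions P. The example values are hypothetical.

```python
import math


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def willmott_d(obs, pred):
    """Willmott index of agreement: 1 is perfect agreement, 0 is none."""
    mo = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1 - num / den


obs = [12.0, 30.0, 18.0, 25.0]   # hypothetical seasonal rainfall
pred = [14.0, 28.0, 17.0, 27.0]  # hypothetical model output
print(round(pearson_r(obs, pred), 3), round(willmott_d(obs, pred), 3))
```

A strongly negative r means the predictions move against the observations; in that case d also falls towards 0, as it does for a prediction that simply reverses the observed series.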

Kisi & Sanikhani (2015) conducted long-term monthly precipitation prediction without climatic data. They examined the accuracy of four soft computing methods: adaptive neuro-fuzzy inference system (ANFIS) with grid partition (GP), ANFIS with subtractive clustering (SC), artificial neural networks (ANN) and support vector regression (SVR). The ANFIS-GP model gave the best accuracy in five of the ten stations and performed better than the other models in long-term monthly precipitation prediction. The ANN model had the best accuracy in four stations, while ANFIS-SC was the best model in only one station. The SVR model gave the worst results in all stations. Therefore, ANFIS-GP was employed to predict the long-term precipitation of any site without climate measurements, and it was also used to map and evaluate the annual and monthly precipitation. The precipitation maps showed that the highest precipitation amounts occurred in the western, southwestern and northern regions of Iran, while the lowest values were seen in its southeastern and eastern parts.

The abovementioned studies examined different models and proposed different methods for either groundwater or rainfall forecasting, using different evaluation metrics and short-, medium- or long-term horizons. In this thesis, we propose a new prediction method using well-known forecasting algorithms: ARIMA, hybrid ARIMA and ETS. The main purpose is to estimate the groundwater production and the rainfall quantities for a given area (Dear El-Balah as a case study) over a medium-term horizon of 1 to 5 years. The results are evaluated using the MAPE measure.
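Because MAPE is the evaluation measure used in this thesis, a minimal sketch of it may help: MAPE = (100/n) * sum(|(A_t - F_t) / A_t|) for actual values A_t and forecasts F_t. The guard against zero actual values is a design choice of this sketch, since MAPE is undefined there; this matters for rainfall series, where dry months can record zero. The numbers in the example are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty series of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100 * total / len(actual)


print(mape([100.0, 200.0], [110.0, 180.0]))  # two hypothetical forecasts
```

Some studies report MAPE as a fraction rather than a percentage, so a MAPE of 0.0938 and one of 9.38% describe the same error.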

3.2 Forecasting Research in Other Domains

Iqelan (2016) used singular spectrum analysis (SSA) to forecast the monthly electricity consumption of the Middle Province in the Gaza Strip, Palestine. He used 39 observations, from November 2005 to December 2015, as the training sample and the remaining 12 observations, from January 2016 to December 2016, as the testing dataset to evaluate the electricity consumption forecasts.

SSA was compared with the ARIMA and exponential smoothing state space (ETS) models. According to the mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE) and mean absolute percentage error (MAPE), the error of the SSA technique was smaller than those of the ARIMA and ETS models: SSA (MAPE 9.38%) outperformed both ARIMA (MAPE 14.99%) and ETS (MAPE 15.63%).
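The evaluation design described above, training on the earlier observations and testing on the most recent ones, can be sketched as a chronological split followed by the error measures MAE, MSE and RMSE. As an assumption for illustration, a naive forecast (repeating the last training value) stands in for SSA, ARIMA and ETS, and the 15-month series with a 3-month test set is hypothetical.

```python
import math


def train_test_split(series, test_size):
    """Chronological split: the last test_size observations are held out."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must leave at least one training point")
    return series[:-test_size], series[-test_size:]


def error_measures(actual, forecast):
    """MAE, MSE and RMSE of a forecast against held-out actual values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Hypothetical 15-month consumption series; the last 3 months form the test set.
consumption = [50, 52, 55, 53, 58, 60, 61, 59, 63, 65, 64, 66, 68, 67, 70]
train, test = train_test_split(consumption, 3)
naive = [train[-1]] * len(test)  # naive forecast: repeat the last training value
print(error_measures(test, naive))
```

The split is taken from the end of the series rather than at random, so that no test observation precedes any training observation.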


Aizenberg et al. (2016) discussed long-term time series forecasting using a multilayer neural network with multi-valued neurons (MLMVN). They evaluated the proposed approach using a dataset of an oilfield asset located in the Gulf of Mexico and showed that MLMVN can be efficiently applied to multivariate and univariate, one-step and multi-step ahead prediction of reservoir dynamics. The research aimed to study important aspects of applying ANN models to time series forecasting that could be of particular interest to the pattern recognition community.

Kaytez et al. (2015) implemented support vector machine (SVM) and least squares support vector machine (LS-SVM) models to predict the electricity energy consumption of Turkey. They used independent variables from historical data from 1970 to 2009, e.g. gross electricity generation, total subscribership, installed capacity and population, and compared the performance with multiple linear regression (MLR) and ANN models. The LS-SVM model performed better than the ANN and MLR models by 0.88% and 1.70%, respectively.

Based on these results, the LS-SVM model can be used effectively for Turkey's long-term electricity consumption forecasting.

Kejela (2012) forecasted short-term electricity consumption for a particular region in Norway using a novel approach based on Gaussian processes (GP). The best feature vector for forecasting the electricity consumption was designed from various factors, such as temperature, day of the week, previous consumption and hour of the day, using reduction and normalization methods. The feature space was scaled and reduced as different target variables were analyzed to obtain better accuracy. The GP was compared with two traditional forecasting techniques: multiple linear regression (MLR) and multiple back-propagation neural networks (MBPNN). The Gaussian processes performed as well as MBPNN for short-term electricity forecasting and far better than MLR.

Zhang et al. (2015) proposed a new hybrid method to forecast crude oil prices. First, they decomposed the international crude oil price into a series of independent intrinsic mode functions (IMFs) and a residual term using the ensemble empirical mode decomposition (EEMD) method. Then, the least squares support vector machine with particle swarm optimization (LSSVM-PSO) and the generalized autoregressive conditional heteroskedasticity (GARCH) model were developed to forecast the nonlinear and the time-varying components of crude oil prices, respectively. Finally, the component forecasts were summed to obtain the crude oil price forecast. They compared the new method with previously popular forecasting methods, and the results showed the superiority of the new hybrid method in crude oil price forecasting.

Thiyagarajan et al. (2017) proposed an ARIMA model for forecasting the failure of a sensor that measures surface temperature in an urban sewer. The proposed approach, based on the past time series data, was examined and compared with the ETS and TBATS models. The models were evaluated using MAE, MPE, MAPD and RMSE. The prediction performance of the TBATS model was better than that of the ETS model, and the prediction performance of the ARIMA model was better than that of both the ETS and TBATS models.

Panigrahi & Behera (2017) developed a new hybrid methodology combining the linear and nonlinear models of the innovation state space (ETS) family with ANN. Because both ETS and ANN models have linear and nonlinear modeling capability, the ETS-ANN model increases the chance of capturing different combinations of linear and/or nonlinear patterns in a time series. First, ETS was applied to the given time series to obtain predictions. Then, the residual error was calculated by subtracting the ETS predictions from the original series, and the resulting residual sequence was modeled by ANN. The final prediction was obtained by combining the ETS predictions with the ANN predictions. The results indicated the superiority of the proposed model, which achieved the best rank among all the models.
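The residual-modelling idea of this ETS-ANN hybrid can be sketched as follows. This is a simplified stand-in, not the authors' implementation: simple exponential smoothing (whose point forecasts match those of the ETS(A,N,N) model) plays the linear part, and a least-squares AR(1) model of the residuals replaces the ANN to keep the example short. The series and the smoothing parameter alpha are hypothetical.

```python
def ses_fitted(series, alpha):
    """Simple exponential smoothing.

    Returns the one-step-ahead fitted values (fitted[t] uses data up to t-1)
    and the forecast for the next, unseen period.
    """
    level = series[0]  # initialise the level with the first observation
    fitted = []
    for y in series:
        fitted.append(level)
        level = alpha * y + (1 - alpha) * level
    return fitted, level


def fit_ar1(residuals):
    """Least-squares coefficient phi of e_t = phi * e_(t-1), without intercept."""
    num = sum(cur * prev for cur, prev in zip(residuals[1:], residuals[:-1]))
    den = sum(prev * prev for prev in residuals[:-1])
    return num / den if den else 0.0


def hybrid_forecast(series, alpha):
    """Linear forecast plus a forecast of its own residuals, one step ahead."""
    fitted, linear_next = ses_fitted(series, alpha)
    residuals = [y - f for y, f in zip(series, fitted)]
    phi = fit_ar1(residuals)  # stand-in for the ANN of the original hybrid
    return linear_next + phi * residuals[-1]


sales = [20.0, 22.0, 21.0, 25.0, 27.0, 26.0, 30.0, 31.0]  # hypothetical series
print(round(hybrid_forecast(sales, alpha=0.5), 2))
```

The design separates the two parts cleanly: whatever structure the linear model leaves in its residuals is the only thing the second model has to learn, and the two forecasts are added at the end.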


Panigrahi & Behera (2017) proposed short-term load forecasting models based on linear regression and the variables of the load time series. The proposed methods were compared with ARIMA, exponential smoothing, neural network and SVM models.

Wongsathan & Seedadan (2016) improved the forecasting accuracy of ARIMA and NNs by employing a hybrid ARIMA-NNs model to forecast pollution time series data in Chiangmai city. Their results demonstrated that ARIMA-NNs performed better than the single NNs model by 65% on average and better than the ARIMA model by 50% on average.

Deb et al. (2017) presented a comprehensive review and analysis of nine major time series forecasting techniques: ANN, ARIMA, SVM, CBR, fuzzy, grey, MA & ES, NN and hybrid. They provided a summary of hybrid methods that combine two or more techniques so that each model complements the strengths of the other, e.g. ARIMA+ANN, ARIMA+SVMs and ARIMA+evolutionary algorithms.

Chhetri et al. (2017) proposed a new forecasting technique to predict Amazon EC2 Spot prices. Their approach was distinguished by applying separate training periods to the deterministic and non-deterministic time series components. They evaluated their method against ARIMA, ETS, STL and TBATS, as well as simple techniques such as seasonal naïve and naïve. The experimental results indicated that their proposed technique outperformed STL and ARIMA.

Kanchymalay et al. (2017) studied the relationship between the crude palm oil (CPO) price, selected vegetable oil prices, the crude oil price and the monthly exchange rate. Using machine learning techniques, they performed a comparative analysis of CPO price forecasting results. Monthly data on CPO prices, crude oil prices, exchange rates and selected vegetable oil prices from January 1987 to February 2017 were used. There was a high positive correlation between the CPO price and the other oil prices, and also between the CPO price and the crude oil price. Multilayer perceptron, support vector regression and Holt-Winters exponential smoothing were used to forecast the CPO price using multivariate time series. The prediction results showed that support vector regression had the lowest MAPE (7.8%), making it the most accurate method for forecasting the multivariate time series of the CPO price.

Ramos et al. (2015) compared the forecasting performance of ARIMA models and state space models (ETS). Both one-step and multi-step forecasts were produced by applying the models to a case study of retail sales of different categories of women's footwear. The forecasting performance of the ARIMA and state space models was evaluated via MAPE and RMSE. It was noticed that the MAE was similar for both one-step and multi-step forecasts. The results demonstrated that, when an automatic algorithm was applied, the overall out-of-sample forecasting performance of ARIMA models was not better than that of ETS models in predicting retail sales, and neither was best in all circumstances.
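The difference between one-step and multi-step forecasts can be shown with simple exponential smoothing, used here purely as an illustrative assumption rather than the models of the study. Multi-step forecasts are all made from one origin, so SES gives a flat path; one-step forecasts are re-made after each new actual value is observed. The data are hypothetical.

```python
def ses_level(series, alpha):
    """Final smoothed level of simple exponential smoothing (its next forecast)."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def multi_step(train, horizon, alpha):
    """All forecasts made from one origin; SES point forecasts are flat."""
    return [ses_level(train, alpha)] * horizon


def rolling_one_step(train, test, alpha):
    """Forecast one step, then add the observed value before the next forecast."""
    history = list(train)
    forecasts = []
    for actual in test:
        forecasts.append(ses_level(history, alpha))
        history.append(actual)
    return forecasts


train = [100.0, 104.0, 108.0, 115.0]  # hypothetical monthly sales
test = [120.0, 126.0, 131.0]
print(multi_step(train, len(test), 0.6))
print(rolling_one_step(train, test, 0.6))
```

Both variants give the same first forecast; they diverge afterwards because only the one-step variant learns from the newly observed values.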

Gong et al. (2016) developed and applied three models, artificial neural networks (ANN), support vector machines (SVM) and adaptive neuro-fuzzy inference system (ANFIS), to predict the groundwater level, taking the interaction between groundwater and surface water into consideration. The datasets covered 10 years of data from wells in Florida, United States. The performance of the models was evaluated using five measures: root mean squared error (RMSE), normalized mean square error (NMSE), Nash-Sutcliffe efficiency coefficient (NS), correlation coefficient (R) and Akaike information criterion (AIC). The conclusions demonstrated the necessity of considering the surface water-groundwater interaction in the management of water resources.
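The Nash-Sutcliffe efficiency coefficient (NS) listed above compares the squared errors of a model with the variance of the observations: NS = 1 - sum((O_i - S_i)^2) / sum((O_i - mean(O))^2) for observations O and simulated values S. A minimal sketch, with hypothetical values:

```python
def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency of simulated values against observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - sse / sst


levels = [3.1, 2.8, 2.5, 2.9, 3.4]     # hypothetical observed groundwater levels
simulated = [3.0, 2.9, 2.6, 2.8, 3.3]  # hypothetical model output
print(round(nash_sutcliffe(levels, simulated), 3))
```

NS = 1 is a perfect fit, 0 means the model is no better than predicting the observed mean, and negative values mean it is worse than that mean.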

Liu et al. (2010) proposed a new short-term forecasting method based on wavelet and classical time series analysis methods. The results demonstrated that the proposed method:
1) was suitable for forecasting both the wind power series and the wind speed series;
2) was robust in dealing with jumping data;
3) performed better than the BP network method and the classical time series method.

The researchers (Hill, Connor, & Remus, 2015) compared neural network forecasts with different forecasting methods on annual, monthly and quarterly time series. Some of these methods were the reference average, a naive forecasting model, graphical forecasting based on human judgment and deseasonalized Holt. The comparison aimed to determine the method with the best forecasting accuracy on the basis of the APE (absolute percentage error). According to the comparison results, neural networks were the best.

Pati & Shukla (2015) experimentally verified the predictive performance of three models, ANN, ARIMA and the hybrid model (ARIMA + ANN), applied to the bug numbers of Debian versions, and presented a comparative analysis of their forecasting performance. The hybrid (ARIMA + ANN) model was found to be the most appropriate for the Debian bug number series, while the ARIMA model performed poorly in predicting the non-linear patterns.

Garima et al. (2017) used ETS (exponential smoothing) and ARIMA (autoregressive integrated moving average) to analyze and predict weather parameters such as humidity, air temperature, wind speed and rainfall. The accuracy was estimated with different criteria, such as MAE (mean absolute error), MASE (mean absolute scaled error), MAPE and RMSE, using different packages in R. The methods that gave the best forecasts were to be used for prediction.
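MASE, one of the criteria listed above, scales the test-set MAE by the in-sample MAE of the naive one-step forecast, so a value below 1 means the forecasts err less, on average, than that naive forecast did in-sample. The sketch below uses the non-seasonal (lag-1) scaling; seasonal data are often scaled by a seasonal naive forecast instead. The numbers are hypothetical.

```python
def mase(train, actual, forecast):
    """Mean absolute scaled error with non-seasonal (lag-1) scaling."""
    if len(train) < 2:
        raise ValueError("need at least two training observations")
    # In-sample MAE of the naive forecast y_hat_t = y_(t-1).
    naive_mae = sum(abs(b - a) for a, b in zip(train[:-1], train[1:])) / (len(train) - 1)
    forecast_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return forecast_mae / naive_mae


humidity_train = [61.0, 64.0, 60.0, 66.0, 63.0]  # hypothetical readings
humidity_test = [65.0, 62.0]
print(round(mase(humidity_train, humidity_test, [63.0, 63.0]), 3))  # prints 0.375
```

Unlike MAPE, MASE stays defined when some actual values are zero, because it divides by a scale computed from the training data rather than by each observation.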

Hassani et al. (2017) compared the forecasts of several parametric and non-parametric techniques, e.g. ARIMA, ETS, NN, TBATS, ARFIMA, MA, WMA, SSA-R and SSA-V, for tourist arrival forecasting. The results suggested that no model's forecasting accuracy consistently outperformed that of all the other models for all of the countries and forecasting horizons investigated.

3.3 Related Work Discussion

Figure (3–1): Summary of the Most Related Works to this Work

Research | Techniques | Area / Datasets | Results | Evaluation Metrics | Shortcoming
Medium-Term Forecasting for Municipal Water Demand and Revenue (Khan Younis City as A Case Study) | ARIMA, ARIMA combined with NN, SSA and linear regression | Water revenue and water consumption datasets collected from the Department of Customer Services of Khan Younis Municipality | Water revenue will decrease by about 3.8%, while water consumption will increase to 8.4% | MAPE, used to select the most accurate algorithm | The research did not include improving the resulting MAPE values before the real forecasting
A Singular Spectrum Analysis Technique to Electricity Consumption | Comparison of the ARIMA model, the exponential smoothing state space (ETS) model and the SSA model; SSA outperformed the others for electricity consumption forecasting | Monthly electricity consumption of the Middle Province, Gaza Strip, Palestine | The error of the SSA technique was smaller than those of the ARIMA and ETS state space models, with MAE = 1.4158, MSE = 3.5604, RMSE = 1.8869 and MAPE = 0.0938 | MAE, MSE, RMSE and MAPE; after testing different parameters, the best SSA parameters for the electricity consumption time series were L = 40 and r = 7 | The study applied ARIMA and ETS separately, without hybridizing ARIMA with ETS
Structural Time Series Analysis towards Modeling and Forecasting of Ground Water Fluctuations in Murshidabad District of West Bengal | A structural time series model with trend, cyclical fluctuation, seasonal variation and irregular components; a Kalman filter was used as the optimal estimator of the state at any time | Groundwater data from 29 stations in Murshidabad district for four months (January, May, August and November) during the period from 2005 to 2013 | Groundwater differed among the measurement sites as well as among the seasons, owing to differences in groundwater recharge within a season | R square, RMSE, MAE and MAPE | The research did not correlate groundwater forecasting with rainfall forecasting as a source of groundwater recharge
A Comparison of ARIMA, Neural Network and a Hybrid Technique for Debian Bug Number Prediction | ARIMA, ANN and a hybrid model (ARIMA+ANN) | Monthly bug number data collected from January 2000 to December 2013 (168 monthly bug counts) | For Debian, the hybrid model was a good predictor of the bug number series | MAE, RMSE and Average Error per Mean (Em) | The comparison did not include the MAPE measure
The Development Rainfall Forecasting Using Kalman Filter | ARIMA and the Kalman filter method | Rainfall data of Kabupaten Jember, divided into two parts: January 2005 to December 2015, and January 2016 to December 2016 | The Kalman filter method is better than the ARIMA model for rainfall forecasting; its error is smaller than that of the ARIMA model | RMSE | The most appropriate algorithm was not selected by computing the MAPE of each algorithm's forecasts and then choosing the algorithm with the lowest MAPE value for the real forecast

3.4 Conclusion

From our review of related work in forecasting, none of the previous studies applied forecasting to groundwater and rainfall separately in Gaza and then used the forecasting results to estimate the relationship between the rainfall amounts, which feed the groundwater reservoir and reduce its salinity, and the groundwater production. We used well-known forecasting algorithms, ARIMA, hybrid ARIMA (ARIMA + ETS, ARIMA + TBATS, ARIMA + NN) and ETS, as popular algorithms that give powerful results.

The forecasting algorithms were applied to datasets from Dear El-Balah city to predict rainfall and groundwater for the coming years. Analyzing and linking the rainfall and groundwater predictions, which are based on the historical data of both, will help predict the amounts of rain and groundwater and, accordingly, the salinity state.

Chapter 4

Methodology and Model Development

This chapter proposes the methodology for forecasting both groundwater and rainfall, using forecasting as a data mining technique. The chapter is divided into five sections: section one introduces the methodology steps, section two describes the collection and acquisition of the data, section three covers data preprocessing, section four presents the selection of the models, and section five is dedicated to the implementation and evaluation of the models.

4.1. Methodology steps

The steps of our methodology are outlined as shown in Figure (4–1).

Figure (4–1): Steps of the proposed forecasting methodology
[Flowchart: Data Collection (two datasets: groundwater production and rainfall amounts) → Data Preprocessing, applied to each dataset separately (data integration, data reduction, missing values) → Data Splitting (training dataset, testing dataset) → Implementation (applying ETS, ARIMA, ARIMA+NN, ARIMA+ETS and ARIMA + TBATS to each dataset, then evaluating the algorithms for each dataset) → Forecasting (applying the most accurate algorithm for the real forecast of each of the groundwater and rainfall datasets)]
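The evaluation and forecasting steps of Figure (4–1) reduce to scoring each candidate algorithm on the testing dataset and keeping the one with the lowest MAPE. A minimal sketch, in which the test values and candidate forecasts are hypothetical placeholders for the outputs of the fitted models:

```python
def mape(actual, forecast):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def select_model(test, candidate_forecasts):
    """Return the candidate with the lowest test-set MAPE, plus all the scores."""
    scores = {name: mape(test, fc) for name, fc in candidate_forecasts.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical test-set values and forecasts; in the methodology these would come
# from ETS, ARIMA and the hybrid ARIMA models fitted on the training dataset.
test = [100.0, 120.0, 110.0]
forecasts = {
    "ETS": [90.0, 125.0, 100.0],
    "ARIMA": [105.0, 118.0, 112.0],
    "ARIMA+ETS": [98.0, 121.0, 109.0],
}
best, scores = select_model(test, forecasts)
print(best, {k: round(v, 2) for k, v in scores.items()})
```

Only the selected model is then refitted and used for the real forecast, which keeps the choice of algorithm tied to out-of-sample accuracy rather than in-sample fit.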

4.2. Data Collection

This study depends on two historical datasets from different institutions in Dear El-Balah city in the Gaza Strip. The first dataset, groundwater production data, was collected from the Coastal Municipalities Water Utility (CMWU) and covers the period since 2007. The second, rainfall data, was collected from the Ministry of Agriculture and covers the period since 1985. Table (4-1) and Table (4-2) show samples of the two datasets before preprocessing, from the CMWU and the Ministry, respectively.
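The rainfall records are daily readings per station (see the sample in Table (4-2)), so a series for modelling has to be aggregated first. The sketch below sums daily readings into monthly totals for one station. Aggregating to months is an assumption for illustration, and the thesis's preprocessing may use another level; the input reproduces the Dr-Elbalah column of the Table (4-2) sample, whose dates are in month/day/year form.

```python
from collections import defaultdict
from datetime import datetime


def monthly_totals(daily_records):
    """Sum (date_string, rainfall) readings into {(year, month): total} pairs.

    Dates are parsed as month/day/year, the form used in the rainfall sample.
    """
    totals = defaultdict(float)
    for date_text, amount in daily_records:
        day = datetime.strptime(date_text, "%m/%d/%Y")
        totals[(day.year, day.month)] += amount
    return dict(sorted(totals.items()))


# Dr-Elbalah column of the sample rows for the 1985-1986 season.
dr_elbalah = [
    ("10/11/1985", 0.0), ("10/19/1985", 0.0), ("10/20/1985", 0.0),
    ("11/9/1985", 0.0), ("11/10/1985", 0.0), ("11/11/1985", 0.0),
    ("11/30/1985", 0.0), ("12/3/1985", 2.5), ("12/15/1985", 1.0),
    ("12/17/1985", 0.0),
]
print(monthly_totals(dr_elbalah))
# prints {(1985, 10): 0.0, (1985, 11): 0.0, (1985, 12): 3.5}
```

Months with no recorded rain days would be absent from the result, so a complete monthly series also needs those months filled with zero.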

43

Table (4-1): Sample of groundwater dataset before preprocessing

Monthly operating records of water facilities in the Middle Area, month (1) of 2017

Area: Deir El-Balah

| No. | Facility name/No. | Generator capacity (KVA) | Generator type | Pump motor (hp) | Pump hours (hour) | Generator hours (hour) | Pump ampere (A) | Electricity consumption (kWh) | Water production (m3) | Days of break done (day) | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | J-146 (Abu Nasser) | 200 | PERKINS | 100 | 354 | 154 | 62 | 5069 | 33,660 | | |
| 2 | S-69 (Abu Marwan) | 20 | PERKINS | 100 | 312 | 172 | 65 | 3656 | 36,610 | | |
| 3 | Abu Hammam | | | | 275 | 0 | 42 | 6681 | 14,400 | | |
| 4 | Kfar Darom well | _ | _ | 40 | 300 | 0 | 45 | 7005 | 18,122 | | |
| 5 | AL Aqsa (Al-Aqsa) | 77 | PERKINS | 40 | 311 | 0 | 39 | 8510 | 18,660 | | |
| 6 | J-32 (Al-Tahlia) | 200 | PERKINS | 40 | 317 | 14 | 48 | 135 | 0 | | |
| 7 | K-20 (Birka 1) | 65 | PERKINS | 50 | 152 | 0 | 40 | 65 | 10,635 | | |
| 8 | K-21 (Birka 2) | 65 | PERKINS | 50 | 193 | 11 | 40 | 399 | 18,285 | | |
| 9 | Shawqi well | | | | 213 | | 25 | | 12,780 | | |
| 10 | Sahel4 | 80 | PERKINS | 30 | 366 | 32 | 39 | 7288 | 18,263 | | |
| 11 | Sahel5 | _ | _ | 30 | 259 | 55 | 32 | 7299 | 22,140 | | |
| 12 | Sahel6 | 80 | PERKINS | 30 | 340 | 4 | 40 | 6043 | 15,490 | | |


The table contains operating information for 12 wells that belong to Deir El-Balah city in the Middle Area of the Gaza Strip. Its columns are: facility name/No., generator (capacity and type), pump motor, pump hours, generator hours, pump ampere, electricity consumption, water production, days of break done, and comments. This research focuses mainly on the Water Production attribute, which represents the groundwater production of the wells.

Table (4-2): Sample of rainfall data before preprocessing

Season 1985-1986 (rainfall in mm at each station)

| Date | Year | B Hanon | B Lahia | Shati | Gaza city/Remal | Nussirate | Dr-Elbalah |
|---|---|---|---|---|---|---|---|
| 10/11/1985 | 1985 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 10/19/1985 | 1985 | 0.0 | 0.0 | 0.5 | 9.0 | 0.0 | 0.0 |
| 10/20/1985 | 1985 | 6.5 | 7.5 | 6.7 | 0.0 | 0.0 | 0.0 |
| 11/9/1985 | 1985 | 1.0 | 2.0 | 1.5 | 0.0 | 0.0 | 0.0 |
| 11/10/1985 | 1985 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 |
| 11/11/1985 | 1985 | 0.0 | 2.5 | 0.0 | 0.0 | 0.0 | 0.0 |
| 11/30/1985 | 1985 | 0.0 | 5.5 | 4.6 | 2.0 | 2.5 | 0.0 |
| 12/3/1985 | 1985 | 3.5 | 2.0 | 2.7 | 3.0 | 6.0 | 2.5 |
| 12/15/1985 | 1985 | 5.0 | 5.0 | 1.3 | 3.0 | 5.5 | 1.0 |
| 12/17/1985 | 1985 | 1.0 | 3.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 12/18/1985 | 1985 | 12.5 | 11.0 | 11.5 | 12.5 | 12.5 | 13.5 |
| 12/19/1985 | 1985 | 3.0 | 4.0 | 2.0 | 10.5 | 10.0 | 12.0 |
| 12/22/1985 | 1985 | 15.5 | 12.0 | 11.6 | 3.0 | 13.0 | 16.0 |
| 12/25/1985 | 1985 | 0.0 | 0.5 | 0.5 | 0.0 | 0.5 | 0.5 |
| 12/26/1985 | 1985 | 16.0 | 13.5 | 16.1 | 17.5 | 25.0 | 44.0 |
| 12/27/1985 | 1985 | 5.8 | 7.5 | 9.7 | 16.8 | 6.7 | 9.3 |
| 12/28/1985 | 1985 | 5.7 | 7.5 | 9.8 | 16.7 | 6.8 | 9.2 |
| 1/8/1986 | 1986 | 0.0 | 0.0 | 1.5 | 1.0 | 0.0 | 0.5 |
| 1/11/1986 | 1986 | 0.0 | 0.0 | 3.2 | 0.0 | 0.0 | 0.0 |
| 1/12/1986 | 1986 | 21.0 | 36.0 | 29.0 | 17.0 | 1.5 | 0.0 |
| 1/15/1986 | 1986 | 15.0 | 16.5 | 20.4 | 3.5 | 6.0 | 7.0 |
| 1/18/1986 | 1986 | 2.5 | 2.5 | 1.0 | 2.5 | 4.5 | 4.5 |
| 1/19/1986 | 1986 | 2.5 | 4.0 | 3.5 | 2.0 | 3.0 | 1.5 |
| 1/31/1986 | 1986 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 2/3/1986 | 1986 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.5 |
| 2/4/1986 | 1986 | 0.5 | 3.0 | 0.1 | 2.5 | 4.0 | 6.5 |
| 2/6/1986 | 1986 | 0.5 | 2.0 | 1.0 | 1.5 | 1.0 | 1.0 |
| 2/8/1986 | 1986 | 10.0 | 8.5 | 8.5 | 1.5 | 4.0 | 1.5 |
| 2/9/1986 | 1986 | 6.5 | 5.0 | 6.4 | 7.0 | 5.0 | 6.0 |
| 2/13/1986 | 1986 | 5.0 | 10.0 | 1.7 | 0.0 | 0.0 | 0.0 |
| 2/14/1986 | 1986 | 0.0 | 41.0 | 20.0 | 18.0 | 9.0 | 0.0 |
| 2/15/1986 | 1986 | 35.0 | 9.0 | 6.5 | 17.0 | 15.5 | 28.5 |
| 2/24/1986 | 1986 | 6.5 | 1.5 | 0.0 | 4.0 | 6.5 | 6.0 |
| 3/29/1986 | 1986 | 0.0 | 2.0 | 1.5 | 0.0 | 0.0 | 0.0 |
| 3/30/1986 | 1986 | 1.0 | 1.5 | 1.7 | 2.5 | 0.0 | 1.0 |
| 4/1/1986 | 1986 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4/2/1986 | 1986 | 11.0 | 5.5 | 6.5 | 15.5 | 16.0 | 18.0 |
| 4/7/1986 | 1986 | 16.0 | 9.5 | 13.5 | 30.5 | 30.5 | 35.5 |
| 5/2/1986 | 1986 | 0.0 | 0.0 | 1.0 | 2.5 | 1.0 | 4.0 |
| 5/3/1986 | 1986 | 7.0 | 13.0 | 23.0 | 6.0 | 8.5 | 9.0 |
| Sum of 85/86 | | 215.5 | 258 | 232 | 228.5 | 204.5 | 240 |

As shown in Table (4-2), the rainfall dataset contains the daily rainfall amounts recorded at several stations in the Gaza Strip (e.g. Beit Hanoun, Nussirate and Deir El-Balah); what matters in this research is the Deir El-Balah series.

The tables and columns relevant to our study were selected and prepared for the next phase.

4.3. Data Preprocessing

In this essential step, the required preprocessing tasks were applied to improve the quality of the data before applying the forecasting algorithms. Preprocessing includes several techniques, e.g. cleaning, reduction, transformation and integration. Microsoft Excel 2016 was used to perform the following preprocessing tasks:

4.3.1. Data integration

Data integration combines data from multiple data stores, obtained from either the CMWU or the MoA, into one consistent dataset. We merged the CMWU data through the common well ID into a single dataset, a sample of which is shown in Table (4-3). The CMWU operates 12 wells in the Deir El-Balah area, and these wells supply the city's water needs. The first attribute is the ID of each well, the second is the well's name, the next 12 columns hold the monthly production amounts of each well, and the last attribute is the well's annual production.

Likewise, we combined the MoA data obtained from multiple data stores into one consistent dataset, shown in Table (4-4). It contains the date, year and rainfall amount attributes.
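The integration step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Excel procedure used in this work: it assumes each monthly CMWU sheet has already been read into (well ID, well name, month, amount in m3) records, and the function name `integrate_monthly_records` is hypothetical. Records that share a well ID are merged into one row holding 12 monthly amounts and their annual total, mirroring the layout of Table (4-3).

```python
def integrate_monthly_records(records):
    """Merge (well_id, well_name, month, amount_m3) records into one row per well.

    Hypothetical helper: the record layout is an assumption, not the CMWU format.
    Months absent from the input are left at 0.
    """
    wells = {}  # dicts keep insertion order, so wells stay in first-seen order
    for well_id, well_name, month, amount in records:
        if not 1 <= month <= 12:
            raise ValueError(f"month out of range: {month}")
        row = wells.setdefault(
            well_id, {"name": well_name, "monthly": [0] * 12, "total": 0}
        )
        row["monthly"][month - 1] += amount
    for row in wells.values():
        row["total"] = sum(row["monthly"])  # annual production of the well
    return wells


# Illustrative input; the amounts are taken from Table (4-3).
records = [
    ("J-146", "Abu Nasser", 1, 57550),
    ("J-146", "Abu Nasser", 2, 47370),
    ("T-46", "Abu Hammam", 1, 30000),
]
table = integrate_monthly_records(records)
print(table["J-146"]["total"])  # 104920
```

Keying on the well ID rather than the well name matters because several wells in the source sheets have no consistent name spelling.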


Table (4-3): Sample of the combined wells production dataset (Prod. k is the production of month k, in m3)

| No. | ID | Well name | Prod. 1 | Prod. 2 | Prod. 3 | Prod. 4 | Prod. 5 | Prod. 6 | Prod. 7 | Prod. 8 | Prod. 9 | Prod. 10 | Prod. 11 | Prod. 12 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | J-146 | Abu Nasser | 57,550 | 47,370 | 55,100 | 58,340 | 60,980 | 65,100 | 56,050 | 55,670 | 58,380 | 60,350 | 52,540 | 53,370 | 680,800 |
| 2 | S-69 | Abu Marwan | 44,510 | 39,520 | 42,710 | 43,230 | 45,420 | 44,350 | 41,760 | 46,730 | 54,310 | 55,010 | 55,040 | 46,190 | 558,780 |
| 3 | T-46 | Abu Hammam | 30,000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30,000 |
| 4 | | Kfar Darom well | 18,210 | 26,780 | 40,510 | 34,286 | 34,519 | 38,213 | 42,668 | 39,951 | 35,376 | 34,566 | 33,267 | 36,182 | 414,528 |
| 5 | AL Aqsa | Al-Aqsa | 49,170 | 32,980 | 49,310 | 53,670 | 55,730 | 53,490 | 54,120 | 53,110 | 53,580 | 54,430 | 47,930 | 47,160 | 604,680 |
| 6 | J-32 | Al-Tahlia | 55,660 | 51,430 | 83,778 | 33,452 | 58,620 | 68,870 | 54,450 | 60,420 | 53,130 | 53,850 | 45,210 | 40,990 | 659,860 |
| 7 | K-20 | Birka 1 | 49,220 | 42,931 | 53,270 | 45,914 | 52,311 | 48,211 | 46,733 | 48,531 | 45,492 | 50,643 | 50,643 | 50,643 | 584,542 |
| 8 | K-21 | Birka 2 | 5,456 | 49,630 | 56,560 | 56,130 | 58,420 | 57,480 | 57,640 | 56,660 | 55,440 | 54,640 | 47,660 | 38,110 | 593,826 |
| 9 | Sahel3 | Sahel 3 | 41,097 | 35,384 | 40,466 | 34,916 | 35,981 | 39,604 | 38,740 | 40,196 | 27,974 | 39,597 | 16,466 | 2,716 | 393,137 |
| 10 | Sahel4 | Sahel 4 | 32,080 | 32,120 | 39,830 | 35,860 | 41,690 | 41,110 | 38,800 | 40,060 | 37,440 | 39,180 | 38,830 | 40,210 | 457,210 |
| 11 | Sahel5 | Sahel 5 | 25,020 | 28,400 | 29,540 | 27,540 | 30,710 | 29,450 | 27,850 | 30,290 | 29,340 | 12,970 | 12,826 | 12,804 | 296,740 |
| 12 | Sahel6 | Sahel 6 | 790 | 1,000 | 1,060 | 6,430 | 6,720 | 6,320 | 15,910 | 19,210 | 28,880 | 31,010 | 33,010 | 32,110 | 182,450 |


Table (4-4): Sample of the combined rain amounts of Deir El-Balah city

Season 1985-1986

Date Year Dr-Elbalah (mm)

10/11/1985 1985 0.0

10/19/1985 1985 0.0

10/20/1985 1985 0.0

11/9/1985 1985 0.0

11/10/1985 1985 0.0

11/11/1985 1985 0.0

11/30/1985 1985 0.0

12/3/1985 1985 2.5

12/15/1985 1985 1.0

12/17/1985 1985 0.0

12/18/1985 1985 13.5

12/19/1985 1985 12.0

12/22/1985 1985 16.0

12/25/1985 1985 0.5

12/26/1985 1985 44.0

12/27/1985 1985 9.3

12/28/1985 1985 9.2

1/8/1986 1986 0.5

1/11/1986 1986 0.0

1/12/1986 1986 0.0

1/15/1986 1986 7.0

1/18/1986 1986 4.5

1/19/1986 1986 1.5

1/31/1986 1986 0.0

2/3/1986 1986 1.5

2/4/1986 1986 6.5

2/6/1986 1986 1.0

2/8/1986 1986 1.5

2/9/1986 1986 6.0

2/13/1986 1986 0.0

2/14/1986 1986 0.0

2/15/1986 1986 28.5

2/24/1986 1986 6.0

3/29/1986 1986 0.0

3/30/1986 1986 1.0

4/1/1986 1986 0.0

4/2/1986 1986 18.0

4/7/1986 1986 35.5

5/2/1986 1986 4.0

5/3/1986 1986 9.0

Sum of 85/86 240


4.3.2. Data reduction

In this step we reduced the dataset to a smaller representation by removing irrelevant attributes (attribute subset selection). The result of this step is two datasets in the form of time series, one for groundwater and one for rainfall. Table (4-5) shows a sample of the groundwater dataset with two columns: the first, "Month", gives the time in mmm-yy format, and the second, "Amount (m3)", gives the amount of groundwater produced.

Table (4-5): Sample of groundwater time series data set

Month Amount (m3)

Jan-08 339957

Feb-08 371711

Mar-08 396483

Apr-08 398139

May-08 398812

Jun-08 390183

Jul-08 415331

Aug-08 415165

Nov-08 352150

Dec-08 317989

Jan-09 299446

Feb-09 319041

Mar-09 372687

Jul-09 399822

Aug-09 423123

Table (4-6) shows a sample of the rainfall dataset, with two columns: the year and the amount (quantity) of rainfall.

Table (4-6): Sample of the rain time series dataset

Year Amount (mm)

1985 108

1986 626.4

1987 130.2


1992 419.3

1993 209.6

1994 683

2002 435.7
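The annual rainfall values appear to be calendar-year sums of the daily Deir El-Balah readings. Adding up the October to December 1985 rows of Table (4-4) gives 108.0 mm, which is the 1985 value in Table (4-6). Below is a minimal Python sketch of that aggregation. The function name `annual_totals` is hypothetical, and the m/d/yyyy date format is taken from Table (4-4).

```python
from collections import defaultdict
from datetime import datetime


def annual_totals(daily_rows):
    """Sum (date 'm/d/yyyy', rainfall_mm) rows into calendar-year totals."""
    totals = defaultdict(float)
    for date_text, amount_mm in daily_rows:
        year = datetime.strptime(date_text, "%m/%d/%Y").year
        totals[year] += amount_mm
    # Round to one decimal place to remove floating-point noise from the sums.
    return {year: round(total, 1) for year, total in sorted(totals.items())}


# Deir El-Balah readings for October-December 1985, copied from Table (4-4).
rows_1985 = [
    ("10/11/1985", 0.0), ("10/19/1985", 0.0), ("10/20/1985", 0.0),
    ("11/9/1985", 0.0), ("11/10/1985", 0.0), ("11/11/1985", 0.0),
    ("11/30/1985", 0.0), ("12/3/1985", 2.5), ("12/15/1985", 1.0),
    ("12/17/1985", 0.0), ("12/18/1985", 13.5), ("12/19/1985", 12.0),
    ("12/22/1985", 16.0), ("12/25/1985", 0.5), ("12/26/1985", 44.0),
    ("12/27/1985", 9.3), ("12/28/1985", 9.2),
]
print(annual_totals(rows_1985))  # {1985: 108.0}
```

Calendar-year totals split each rainy season (October to May) across two years. A season-based total such as the "Sum of 85/86" row would need a different grouping key.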

4.3.3. Missing values

We estimated the missing value of June 2013 in the groundwater dataset as the average of the values for the same month in the three previous years (June 2010, June 2011 and June 2012), as shown in Table (4-7). We chose this approach because the rainfall amount in June 2013 was similar to the amounts in June of the three previous years.

Table (4-7): Filling the missing value of Jun-13

Month Amount (m3)

Feb-13 331622

Mar-13 405257

Apr-13 465403

May-13 417558

Jun-13 436912

Jul-13 474883

Aug-13 513654

Sep-13 484779
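The imputation rule (mean of the same month in the three previous years) can be sketched in Python as below. The function name and the June 2010 to 2012 amounts in the usage example are hypothetical placeholders, since those values are not listed in this section.

```python
def fill_same_month_mean(series, year, month, n_years=3):
    """Fill series[(year, month)] with the mean of that month over the previous n_years.

    `series` maps (year, month) -> amount in m3. Raises KeyError if any of the
    previous years is itself missing, rather than silently averaging fewer values.
    """
    previous = [series[(year - k, month)] for k in range(1, n_years + 1)]
    estimate = sum(previous) / n_years
    series[(year, month)] = estimate
    return estimate


# Hypothetical June amounts, for illustration only.
series = {(2010, 6): 430000, (2011, 6): 440000, (2012, 6): 450000}
print(fill_same_month_mean(series, 2013, 6))  # 440000.0
```

Averaging the same calendar month, rather than the neighbouring months, preserves the seasonal level of the series at the gap.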

After preprocessing, we obtained two real datasets in the form of time series. The first is the monthly production of the wells from January 2008 to December 2017. It consists of two columns (Month, Amount (m3)), is listed in Appendix B and is plotted in Figure (4–2). The second holds the annual rainfall amounts from 1985 to 2017. It consists of two columns (Year, Amount (mm)), is listed in Appendix A and is plotted in Figure (4–3).

Figure (4–2): Monthly wells production data of groundwater

Figure (4–3): Annual rainfall amounts

[Plot for Figure (4–2): y-axis Amount (m3); x-axis months from Jan-08 to Dec-17]

[Plot for Figure (4–3): y-axis Amount (mm); x-axis years from 1985 to 2017]

4.4. Selected Models

In this thesis we used five models built with different algorithms: ARIMA, ETS and three hybrid ARIMA models, namely ARIMA combined with a Neural Network, ARIMA combined with ETS, and ARIMA combined with TBATS. These models were described in detail in Section 2.7.

We built the hybrid models for prediction using the R package forecastHybrid. In this package, the component models are specified by a character string combining the letters a, e, n and t, which stand for auto.arima, ETS, nnetar and TBATS, respectively.

We selected these models because they are efficient and widely used (Dalinina, 2017). In addition, experimental results on real datasets have shown that combining models improves forecasting accuracy. The combined model forms a high-performing forecasting method, gives more accurate predictions than any individual model, and takes advantage of the unique strengths of ARIMA and of other linear and nonlinear models (Khashei & Bijari, 2011).
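To make the letter codes concrete, the sketch below does two things. It maps a forecastHybrid-style model string to its component models, and it combines their forecasts with equal weights. This is a simplified Python illustration, not the package itself. forecastHybrid is an R package; as far as we know it weights components equally by default, but it also supports error-based weights. The function names and the forecast numbers in the usage example are hypothetical.

```python
# Letter codes for the four components used in this thesis (forecastHybrid accepts more).
COMPONENTS = {"a": "auto.arima", "e": "ets", "n": "nnetar", "t": "tbats"}


def parse_models(code):
    """Translate a model string such as 'an' into component model names."""
    unknown = set(code) - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown model letters: {sorted(unknown)}")
    return [COMPONENTS[letter] for letter in code]


def equal_weight_combination(component_forecasts):
    """Average the h-step forecasts of several component models, point by point."""
    horizons = {len(values) for values in component_forecasts.values()}
    if len(horizons) != 1:
        raise ValueError("all component forecasts must share one horizon")
    n_models = len(component_forecasts)
    return [sum(step) / n_models for step in zip(*component_forecasts.values())]


print(parse_models("an"))  # ['auto.arima', 'nnetar']
print(equal_weight_combination({"auto.arima": [300.0, 310.0], "nnetar": [340.0, 330.0]}))
# [320.0, 320.0]
```

An equal-weight average is the simplest combination rule. It needs no extra fitting, which helps with short series such as the 26-year rain training set.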

4.5. Implementation

After the two datasets were prepared as time series, five forecasting algorithms were applied to each. Based on their results, the algorithm with the lowest MAPE value is selected to produce the actual forecast.

4.5.1. Tools

The following tools were used to build the models and test the five algorithms:

• RapidMiner Studio

• R project

• Microsoft Excel 2016

4.5.2. Steps of applying models

Each original dataset (wells and rain) is divided into two parts, as shown in Table (4-8) and Figure (4–4).

Table (4-8): Training and testing data periods of the datasets

| Dataset | Training set | Testing set |
|---|---|---|
| Wells data | Jan-2008 to Dec-2015 (96 months) | Jan-2016 to Dec-2017 (24 months) |
| Rain data | 1985 to 2010 (26 years) | 2011 to 2017 (7 years) |

Figure (4–4): Splitting the original datasets into training and testing sets

[Diagram: the rain dataset (annual, 1985 to 2017) and the groundwater dataset (monthly, 120 months, 10 years), each split into a training set and a testing set]

1- Each dataset is separated into a training set and a testing set. The wells training set covers the period from January 2008 to December 2015 (80% of the original data), and the testing set covers January 2016 to December 2017 (20% of the original data). Likewise, the rain training set covers the period from 1985 to 2010 (78.8% of the original data), and the testing set covers 2011 to 2017 (21.2% of the original data).

2- We applied the five algorithms to the wells training set with a horizon value of 24, chosen so that the forecasts reach the last record in the testing set (December 2017). Then we computed the MAPE between the actual values in the testing set and the predicted values.

3- We ran the five algorithms again on the rain training set with a horizon value of 7 to reach the last record in the testing set (2017). As in the previous step, we computed the MAPE between the actual values in the testing set and the predicted values.

4- We evaluated the algorithms by comparing their MAPE values, separately for rain amounts and for groundwater production. For each dataset, the algorithm with the lowest MAPE was chosen as the most accurate one for real forecasting (Agrawal et al., 2016).
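The chronological split and the choice of horizon in steps 1 to 3 can be sketched as follows (Python; the function name is hypothetical). The horizon equals the length of the testing set, which is why it is 24 for the wells data and 7 for the rain data. The split must keep time order rather than sample at random, so that the models never see observations from the testing period.

```python
def chronological_split(series, test_size):
    """Split an ordered series into (train, test); the last test_size points are the test set."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


years = list(range(1985, 2018))  # 33 annual rain observations, 1985 to 2017
train, test = chronological_split(years, test_size=7)
horizon = len(test)  # forecast far enough to reach the last test record (2017)
print(train[-1], test[0], horizon)  # 2010 2011 7
print(round(100 * len(train) / len(years), 1))  # 78.8

months = list(range(120))  # Jan 2008 to Dec 2017
train_m, test_m = chronological_split(months, test_size=24)
print(len(train_m), len(test_m))  # 96 24
```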

4.5.3. Evaluation

We evaluated the model accuracy by calculating the mean absolute percentage

error MAPE:


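$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{\mathrm{Actual}_i - \mathrm{Predicted}_i}{\mathrm{Actual}_i}\right| \times 100\%$$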




where Actual_i and Predicted_i are the actual value and the predicted n-step-ahead value of the i-th sample, and N is the total number of observations.
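The MAPE formula translates directly into code. The Python sketch below uses a hypothetical function name. It also guards against zero actual values, for which MAPE is undefined. This matters for any series that can contain zeros.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ZeroDivisionError("MAPE is undefined when an actual value is 0")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


print(mape([100.0, 200.0], [110.0, 180.0]))  # 10.0
```

Because each error is divided by its own actual value, MAPE is scale-free. This is what allows rain (mm) and groundwater (m3) results to be discussed on the same percentage scale.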

We compared the MAPE values obtained by applying ARIMA, the hybrid ARIMA models and ETS to the wells data and to the rain data separately. On the rain data, ARIMA+NN had the lowest MAPE, making it the most accurate algorithm for rain forecasting. On the wells data, ARIMA+TBATS had the lowest MAPE, making it the most accurate algorithm for groundwater forecasting.

4.6. Summary

This chapter presented the methodology of this work, from collecting the data from different institutions to determining the most accurate algorithm for the real forecast. After collection, the data were preprocessed into time series and divided into training and testing sets. Five forecasting algorithms were then applied, using several tools, to build the models, which were evaluated using MAPE.

The next chapter presents and discusses the results of our research.


Chapter 5

Experimental Results and Discussion


In this chapter we present the experimental environment and describe the datasets. We then evaluate the forecasting algorithms on the historical time series data. The evaluation aims to determine which of the following algorithms is the most accurate:

• AutoRegressive Integrated Moving Average (ARIMA)

• Hybrid ARIMA: ARIMA combined with Neural Networks, ARIMA combined with ETS, and ARIMA combined with TBATS

• ETS

According to previous research on time series forecasting, these algorithms are among the best known and most recent. They also suit rain and groundwater data, which are complex, nonlinear phenomena. We apply the five algorithms to our datasets and calculate the MAPE of each. The algorithm that achieves the lowest MAPE value is considered the most accurate and is used to forecast the next five years.

5.1. Experimental setup

We performed data preprocessing and forecasting on a PC with the following specifications:

1. Operating system: Windows 10 Education, 64-bit

2. Processor: Intel® Core™ i5-3230M CPU @ 2.40 GHz (4 CPUs)

3. RAM: 8 GB

We used three software tools:

1- Microsoft Excel 2016:

A spreadsheet application developed by Microsoft for Windows, Mac OS X and iOS.

2- RapidMiner Studio 7.6.001

A data science software platform, developed by the company of the same name, that provides an integrated environment for learning, data and text mining, deep learning, and predictive analytics.

3- RStudio


RStudio is a free and open-source integrated development environment (IDE) for R, a programming language for statistical computing and graphics. It is available in open-source editions and runs on the desktop (macOS, Linux and Windows) or in a browser connected to RStudio Server or RStudio Server Pro.

5.2. Data set

In this research, we used two real datasets. The first was collected from the CMWU in Deir El-Balah, Gaza, and gives the monthly groundwater production. The second was collected from the MoA and gives the annual rain amounts.

After the preprocessing described in the previous chapter, the rain dataset consists of two columns (Year and Amount (mm)) and contains 33 annual values covering 1985 to 2017. It was split into two parts:

• a training set containing the first 26 years (1985 to 2010);

• a testing set holding the remaining 7 years (2011 to 2017).

The models were built by running the algorithms on the training set, while the testing set was kept unseen. A sample of the rain training set with both attributes (Year and Amount (mm)) is given in Table (5-1), and a sample of the testing set with the same attributes is given in Table (5-2).

Table (5-1): Sample of the training set of rain data from 1985 to 2010

Year Amount (mm)

1985 108

1986 626.4

1987 130.2

1988 344.5

1989 264.5

1990 203.4

1991 497.7

1992 419.3

2006 389.5

2007 219.5

2008 281

2009 220

2010 131


Table (5-2): Sample of the testing set of rain data from 2011 to 2017

Year Amount (mm)

2011 385.5

2012 255.5

2013 375.5

2014 216

2015 514

2016 298.5

2017 241.2

The groundwater dataset consists of two columns (Month and Amount (m3)) and contains 120 months, from January 2008 to December 2017. It was split into two parts:

• a training set containing the first 96 months, from January 2008 to December 2015;

• a testing set holding the remaining 24 months, from January 2016 to December 2017.

The models were built by running the algorithms on the training set (actual data), while the testing data were kept unseen and compared with the predicted values. A sample of the training set with both attributes (Month and Amount (m3)) is given in Table (5-3), and a sample of the testing set of the wells data with the same attributes is given in Table (5-4).

Table (5-3): Sample of the training set of wells data from Jan 2008 to Dec 2015

Month Amount (m3)

Jan-08 339957

Feb-08 371711

Feb-15 265190

Mar-15 296940

May-15 371687

Jun-15 144988

Jul-15 374003

Aug-15 383085

Sep-15 385764

Oct-15 402146

Nov-15 334845

Dec-15 290019


Table (5-4): Sample of testing set of wells data from Jan 2016 to Dec 2017

Month Amount (m3)

Jan-16 293112

Feb-16 258263

Mar-16 339185

Apr-16 318414

May-16 383726

Jun-17 322346

Jul-17 498978

Aug-17 377795

Sep-17 391049

Oct-17 386677

Nov-17 328756

Dec-17 332391

5.3. Evaluating the forecasting algorithms

We applied five forecasting algorithms: ARIMA, the hybrid ARIMA models (ARIMA+NN), (ARIMA+ETS) and (ARIMA+TBATS), and ETS. They were applied to the training set of the rain data with a horizon value of 7, to reach 2017 (the last record in the rain testing set). They were also applied to the training set of the wells data with a horizon value of 24, to reach December 2017 (the last record in the groundwater testing set). The performance of these algorithms was then evaluated on the "amount" attribute of each dataset by computing the MAPE of each algorithm's results. The MAPE between the predicted and actual values was computed according to the following equation:


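$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{\mathrm{Actual}_i - \mathrm{Predicted}_i}{\mathrm{Actual}_i}\right| \times 100\%$$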




Finally, we compared the MAPE values of the forecasting algorithms. The algorithm with the lowest MAPE value is the most accurate one, so it is selected for the real forecast.

5.3.1. Evaluating the algorithms over the Amount attribute of the rain dataset

We evaluated the results of the five selected algorithms on the attribute ‘Amount (mm)’ and computed the MAPE of each algorithm.

5.3.1.1. ARIMA Evaluation

ARIMA, short for AutoRegressive Integrated Moving Average, is a basic forecasting model. An ARIMA model is specified by three order parameters (p, d, q):

p: the number of autoregressive terms in the model.

d: the number of differences applied to the time series values.

q: the number of moving average terms in the model.

To apply the ARIMA algorithm, we built the process shown in Figure (5–1) in RapidMiner using the “ARIMA Trainer” operator. We conducted several experiments with different values of the (p, d, q) parameters and computed the MAPE value in each experiment. The best prediction performance was obtained with:

(p = 1, d = 0, q = 1).

Figure (5–1): Applying ARIMA Process by RapidMiner tool

Figure (5–1) shows the process of applying the ARIMA algorithm. It consists of three components:

‘Read Excel’: to load data set from Microsoft Excel spreadsheets.

‘ARIMA Trainer’: to train the ARIMA model over the dataset.

‘Apply Forecast’: to forecast the selected attribute according to the determined

horizon value.


Figure (5–2): ARIMA Evaluation for rain amounts (Actual vs Predicted).

Figure (5–2) shows a clear gap between the actual and predicted values. ARIMA gives predicted values that form an almost horizontal line, which we attribute to the data being non-stationary and close to linear. The computed MAPE for ARIMA was 22.7%.
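A complementary explanation for the nearly flat ARIMA forecast follows from the model structure itself. With d = 0, ARIMA(1,0,1) is an ARMA(1,1) model, y_t = c + φ·y_{t−1} + θ·e_{t−1} + e_t. Future errors are unknown, so they are replaced by their expected value 0. After the first step only the autoregressive term remains, and the forecasts decay geometrically towards the mean c/(1 − φ) when |φ| < 1. The Python sketch below uses hypothetical parameter values and a constant-term parameterization. These are not the values fitted by RapidMiner, which may parameterize the model differently.

```python
def arma11_forecast(last_value, last_error, c, phi, theta, horizon):
    """Point forecasts from an ARIMA(1,0,1), i.e. ARMA(1,1), model.

    y_t = c + phi * y_{t-1} + theta * e_{t-1} + e_t, with future errors set to 0.
    """
    forecasts = []
    y_prev, e_prev = last_value, last_error
    for _ in range(horizon):
        y_next = c + phi * y_prev + theta * e_prev
        forecasts.append(y_next)
        y_prev, e_prev = y_next, 0.0  # errors after the forecast origin are unknown
    return forecasts


# Hypothetical parameters: long-run mean c / (1 - phi) = 150 / 0.5 = 300.
print(arma11_forecast(last_value=500.0, last_error=20.0, c=150.0, phi=0.5, theta=0.3, horizon=7))
# [406.0, 353.0, 326.5, 313.25, 306.625, 303.3125, 301.65625]
```

Over a seven-year horizon, such forecasts settle near the mean within a few steps. This produces a nearly horizontal line regardless of how much the actual values fluctuate.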

5.3.1.2. Hybrid ARIMA (ARIMA with Neural Network (NN)) Evaluation

The evaluation of hybrid ARIMA (ARIMA with a Neural Network (NN)) is shown in Figure (5–3).

Figure (5–3): Evaluation of rain data using (ARIMA with NN) by R code

As shown in Figure (5–3), the R code combines ARIMA with a neural network. In line 6, we used the argument models = ’an’, in which ‘a’ refers to the ‘auto.arima’ model and ‘n’ refers to the neural network. We set the horizon value to 7 in line 7 (h = 7).


Figure (5–4): (ARIMA+NN) Evaluation for rain amount (Actual vs Predicted)

Figure (5–4) shows that hybrid ARIMA gives predicted values that are close to the actual values from 2011 to 2012. After 2012, the predicted curve follows an almost linear pattern because of the nature of the data. The computed MAPE was 21.0%.

5.3.1.3. Hybrid ARIMA (ARIMA with (ETS)) Evaluation

In this evaluation task we combined ARIMA with the Exponential Smoothing State Space Model (ETS).

Figure (5–5): Evaluation of rain data using (ARIMA + ETS) R code


As Figure (5–5) shows, the R code combines ARIMA with ETS. In line 6 we used the argument models = ’ae’, in which ‘a’ refers to the ‘auto.arima’ model and ‘e’ refers to the exponential smoothing state space model. We set the horizon value to 7 in line 7 (h = 7).

Figure (5–6): (ARIMA+ ETS) evaluation for rain amount (Actual vs Predicted)

Figure (5–6) shows the values predicted by hybrid ARIMA (ARIMA+ETS). Clearly, the predicted values neither converge to nor mirror the actual values over the period 2011 to 2017. The predicted curve resembles that of ETS, which we attribute to the non-stationary data to which the algorithms were applied. The computed MAPE was 23.8%.

5.3.1.4. Hybrid ARIMA (ARIMA with (TBATS)) Evaluation

In this evaluation task we combined ARIMA with TBATS (an exponential smoothing state space model with Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend and Seasonal components).

Figure (5–7): Evaluation of rain data using (ARIMA +TBATS) R code


As Figure (5–7) shows, the R code combines ARIMA with the TBATS model through the argument models = ’at’ in line 6. Here ‘a’ refers to the ‘auto.arima’ model and ‘t’ refers to the TBATS model. We set the horizon value to 7 in line 7 (h = 7).

Figure (5–8): (ARIMA+ TBATS) evaluation for rain amount (Actual vs Predicted)

Figure (5–8) shows that the values predicted by hybrid ARIMA (ARIMA+TBATS) do not match the actual values over the period 2011 to 2017. The two curves differ, but the predicted curve resembles that of (ARIMA+NN). The computed MAPE was 21.2%.

5.3.1.5. Exponential Smoothing (ETS) Evaluation

ETS is a common, basic type of predictive analysis. It is a statistical function that calculates or predicts a future value from existing (historical) values using the Exponential Smoothing (ETS) algorithm. The predicted value is a continuation of the historical values at the specified target date, which should be a continuation of the timeline. This function can be used to predict future sales, inventory requirements, or consumer trends.

The function requires the timeline to be organized with a constant step between the different points. For example, this could be a monthly timeline with values on the 1st of every month, a yearly timeline, or a timeline of numerical indices. For this type of timeline, it is very useful to aggregate raw detailed data before applying the forecast, which also produces more accurate forecast results.


We used Microsoft Excel 2016 to run the ETS algorithm.
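Microsoft documents Excel's ETS forecast function as based on the AAA version of exponential smoothing, as we understand it meaning additive error, trend and seasonality. The Python sketch below shows only the simplest member of that family, simple exponential smoothing. It illustrates the core idea: each new level is a weighted average of the latest observation and the previous level. The smoothing factor alpha = 0.5, the level initialization and the function name are assumptions for illustration. The input values are the first three years of Table (5-1).

```python
def simple_exponential_smoothing(series, alpha, horizon):
    """Simple exponential smoothing (no trend, no seasonality).

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}; every h-step forecast equals
    the final level, so the forecast is flat.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise with the first observation (one common choice)
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


# First three annual rain amounts (mm) from Table (5-1); alpha is an assumption.
print(simple_exponential_smoothing([108.0, 626.4, 130.2], alpha=0.5, horizon=2))
```

A larger alpha makes the level follow recent years more closely; a smaller alpha smooths more. Excel's AAA variant additionally carries trend and seasonal components, which this sketch omits.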

Figure (5–9): Evaluation of ETS for rain amounts (Actual vs Predicted)

According to Figure (5–9), the values predicted by ETS are below the actual values. The computed MAPE was 24.4%.

5.3.2. Comparing the methods' performance on ‘rain amount’

As explained before, the algorithm with the lowest MAPE value is the most accurate one and is therefore selected for the real forecast. Table (5-5) compares the MAPE values.

Table (5-5): Comparing the methods' MAPE over ‘Rain amount’

Algorithm %MAPE

ARIMA 22.7

ARIMA+NN 21.0

ETS 24.4

ARIMA+ ETS 23.8

ARIMA+ TBATS 21.2


Figure (5–10): MAPE percentages for forecasting algorithms

Table (5-5) and Figure (5–10) give the MAPE results of the five applied algorithms. According to these results, hybrid ARIMA (ARIMA+NN), with a MAPE of 21.0%, is the most accurate algorithm and is selected for future forecasting. Its low MAPE means that its forecast values are close to the actual values in the testing set. ETS, in contrast, is ruled out because it has the highest MAPE (24.4%), meaning that its forecast values are the farthest from the testing data.

We conclude from evaluating the forecasting algorithms on rain amounts that ARIMA combined with NN is the appropriate algorithm for future forecasting of rain data in our case.
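The selection rule (lowest MAPE wins) is a simple ranking. The Python sketch below applies it to the values reported in Table (5-5) for rain and Table (5-6) for wells. The dictionary layout and function name are illustrative.

```python
def rank_by_mape(mape_by_algorithm):
    """Return (algorithm, MAPE) pairs sorted from most to least accurate."""
    return sorted(mape_by_algorithm.items(), key=lambda item: item[1])


# MAPE values (%) as reported in Table (5-5) for rain and Table (5-6) for wells.
rain = {"ARIMA": 22.7, "ARIMA+NN": 21.0, "ETS": 24.4, "ARIMA+ETS": 23.8, "ARIMA+TBATS": 21.2}
wells = {"ARIMA": 25.9, "ARIMA+NN": 11.4, "ETS": 12.6, "ARIMA+ETS": 9.1, "ARIMA+TBATS": 8.9}

print(rank_by_mape(rain)[0])   # ('ARIMA+NN', 21.0)
print(rank_by_mape(wells)[0])  # ('ARIMA+TBATS', 8.9)
```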

All of the above was repeated on the ‘amount’ attribute of the groundwater dataset to evaluate the five algorithms and choose the most appropriate one for future groundwater forecasting.

5.3.3. Algorithms evaluation over groundwater production amounts

Applying the five forecasting algorithms to the rain amounts attribute showed that hybrid ARIMA (ARIMA+NN) is the most accurate algorithm for real rain forecasting. The same work was repeated on the ‘amount’ attribute of the monthly wells production dataset.


5.3.3.1. ARIMA Evaluation

Figure (5–11): ARIMA evaluation for wells production (Actual vs Predicted)

The computed MAPE was 20.6% for this algorithm.

5.3.3.2. Hybrid ARIMA (ARIMA+NN) Evaluation

Figure (5–12): Hybrid ARIMA (ARIMA+NN) Evaluation for wells production (Actual vs

Predicted).

As Figure (5–12) shows, the predicted and actual curves converge. The computed MAPE was 10.3%.


5.3.3.3. Hybrid ARIMA (ARIMA+ETS) Evaluation

Figure (5–13): (ARIMA+ETS) Evaluation for wells Production (Actual vs Predicted)

The computed MAPE was 9.1%.

5.3.3.4. Hybrid ARIMA (ARIMA+TBATS) Evaluation

Figure (5–14): (ARIMA+TBATS) evaluation for wells Production (Actual vs

Predicted).

The computed MAPE was 8.9%.


5.3.3.5. Exponential Smoothing (ETS) Evaluation

Figure (5–15): ETS evaluation for wells production (Actual vs Predicted)

The computed MAPE was 12.6%.

5.3.4. Comparing the methods' accuracy over ‘Wells Production’

Table (5-6): Algorithms' performance (accuracy) over wells production

Algorithm %MAPE

ARIMA 25.9

ARIMA+NN 11.4

ETS 12.6

ARIMA+ ETS 9.1

ARIMA+ TBATS 8.9

Figure (5–16): MAPE percentages for forecasting algorithms for monthly amounts of

wells’ production.


As Table (5-6) and Figure (5–16) show, among the MAPE results of the five methods evaluated on the monthly wells data, ARIMA combined with TBATS is the most appropriate algorithm, with the lowest MAPE value (8.9%). This means that the values it forecasts from the training set are the closest to the data of the testing set (actual values), so this algorithm is the most accurate on the monthly data. ARIMA, by contrast, has the highest MAPE value (25.9%), so it is the weakest performer: its forecasts are the farthest from the actual data of the testing set.
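For reference, the MAPE values compared in this section average the absolute percentage errors between the actual testing-set values A_t and the forecasts F_t, that is MAPE = (100/n) × Σ |(A_t − F_t) / A_t|. The following is a minimal Python sketch; the function name and the sample numbers are illustrative and are not thesis data. Because each error is divided by the actual value, MAPE is undefined when an actual value is zero.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equally long")
    total = 0.0
    for a, f in zip(actual, forecast):
        if a == 0:
            # Each error is divided by the actual value, so zeros are not allowed.
            raise ZeroDivisionError("MAPE is undefined when an actual value is zero")
        total += abs((a - f) / a)
    return 100.0 * total / len(actual)


# Illustrative values only, not results from the thesis.
print(round(mape([100, 200, 400], [110, 180, 400]), 2))  # 6.67
```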

The MAPE values obtained by applying the forecasting algorithms in Table (5-6) are not small. This is because the volatility of the data is high, as shown in Figure (5–17). Therefore, we re-divided the monthly data into semi-annual data, as in Table (5-7), to reduce its volatility; the re-divided data are less volatile, as demonstrated in Figure (5–18). We then reapplied the algorithms to the re-divided data, splitting it into a 70% training set and a 30% testing set. The training set covers the period from 2008 to 2014 and the testing set covers the period from 2015 to 2017.
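The re-division can be checked against the appendices: the Jan-08 value in Table (5-7) equals the sum of the monthly amounts from Jan-08 to Jun-08 in Appendix B, so each semi-annual value appears to be a six-month total. The following Python sketch, with hypothetical helper names, aggregates a monthly series this way and then makes the chronological 70/30 split described above. With 20 half-years, it yields 14 training values (2008 to 2014) and 6 testing values (2015 to 2017).

```python
def to_semi_annual(monthly):
    """Sum each consecutive six-month block (Jan-Jun, Jul-Dec) of a monthly series."""
    if len(monthly) % 6 != 0:
        raise ValueError("expected a whole number of half-years")
    return [sum(monthly[i:i + 6]) for i in range(0, len(monthly), 6)]


def chronological_split(series, train_fraction=0.7):
    """Keep time order: the first part trains the model, the rest tests it."""
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Monthly amounts (m3), Jan-08 to Jun-08, from Appendix B.
first_half_2008 = [339957, 371711, 396483, 398139, 398812, 390183]
print(to_semi_annual(first_half_2008))  # [2295285], the Jan-08 row of Table (5-7)

# Semi-annual amounts (m3) from Table (5-7), Jan-08 to Jul-17.
semi_annual = [2295285, 2317733, 2273525, 2439755, 2175580, 2597673,
               2691509, 2765044, 2183688, 2490211, 2422760, 2525213,
               2098198, 1971355, 1725542, 2169862, 1970828, 2321393,
               2042940, 2315646]
train, test = chronological_split(semi_annual)
print(len(train), len(test))  # 14 6
```

The split is chronological rather than random, so the testing set always lies after the training set in time, as forecasting evaluation requires.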

Table (5-7): Dividing the data into semi-annual data

Month     Amount (m3)
Jan-08    2295285
Jul-08    2317733
Jan-09    2273525
Jul-09    2439755
Jan-10    2175580
Jul-10    2597673
Jan-11    2691509
Jul-11    2765044
Jan-12    2183688
Jul-12    2490211
Jan-13    2422760
Jul-13    2525213
Jan-14    2098198
Jul-14    1971355
Jan-15    1725542
Jul-15    2169862
Jan-16    1970828
Jul-16    2321393
Jan-17    2042940
Jul-17    2315646

Figure (5–17): Data representation of groundwater (before re-dividing)

[Chart: Amount (m3) before re-dividing; monthly, 2008 to 2017]

Figure (5–18): Semi-annual data representation of groundwater (after re-dividing)

[Chart: Amount (m3) after re-dividing; semi-annual, 2008 to 2017]

We reapplied the five forecasting algorithms to the (amount) attribute as in Appendix H, and then re-evaluated them by computing MAPE as in Appendix I.

After evaluating the semi-annual groundwater data, the MAPE values of all five algorithms improved, as shown in Table (5-8). Compared with the MAPE results of the same algorithms applied to the monthly groundwater data in Table (5-6), the semi-annual data gave smaller MAPE values and hence better forecasting results.

Table (5-8): Algorithms MAPE over wells’ production.

Algorithm        MAPE (%)
ARIMA            4.9
ARIMA+NN         9.2
ETS              5.2
ARIMA+ETS        7.7
ARIMA+TBATS      6.9

Table (5-8) presents the algorithms’ performance (accuracy) over wells’ production. ARIMA was the most accurate algorithm, with the lowest MAPE value (4.9%), so ARIMA was selected for the semi-annual forecast of groundwater production amounts. (ARIMA+NN) was excluded because it has the highest MAPE value (9.2%) and would therefore be the worst for forecasting. The MAPE value of ETS (5.2%) is close to that of ARIMA, but ETS was not selected because of the non-stationary nature of the groundwater data.
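The selection rule used in this section (take the algorithm with the lowest MAPE) and the claim that every algorithm improved after re-dividing can be checked directly from Tables (5-6) and (5-8). The following is a small Python sketch with illustrative names. It applies the MAPE criterion alone, whereas the thesis also considers the nature of the data when setting ETS aside.

```python
# MAPE (%) per algorithm, copied from Table (5-6) and Table (5-8).
monthly_mape = {"ARIMA": 25.9, "ARIMA+NN": 11.4, "ETS": 12.6,
                "ARIMA+ETS": 9.1, "ARIMA+TBATS": 8.9}
semi_annual_mape = {"ARIMA": 4.9, "ARIMA+NN": 9.2, "ETS": 5.2,
                    "ARIMA+ETS": 7.7, "ARIMA+TBATS": 6.9}


def most_accurate(mape_by_algorithm):
    """Return the algorithm name with the lowest MAPE."""
    return min(mape_by_algorithm, key=mape_by_algorithm.get)


# Reduction in MAPE (percentage points) after re-dividing into semi-annual data.
reduction = {name: round(monthly_mape[name] - semi_annual_mape[name], 1)
             for name in monthly_mape}

print(most_accurate(monthly_mape))      # ARIMA+TBATS
print(most_accurate(semi_annual_mape))  # ARIMA
print(reduction)
```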

Figure (5–19): MAPE percentages for forecasting algorithms for semi-annual amounts

of wells’ production.

[Bar chart: Algorithms MAPE; ARIMA 4.9, ARIMA+NN 9.2, ETS 5.2, ARIMA+ETS 7.7, ARIMA+TBATS 6.9]

5.4. Forecasting the rain amounts and groundwater production in Dear El-Balah

After the evaluation process, we chose (ARIMA combined with NN) and (ARIMA) for the real forecasting of rain amounts and of groundwater wells’ production, respectively. We forecasted the (amount) attribute of each dataset separately for Dear El-Balah city over the next 5 years. The rain forecast obtained with (ARIMA+NN) is illustrated in Figure (5–20), and the groundwater production amounts forecasted for the same period and area using (ARIMA) are shown in Figure (5–21).
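The thesis forecasts groundwater with ARIMA but does not show how an ARIMA forecast is carried forward step by step. The following is a minimal, dependency-free sketch of that mechanism for the simplest differenced case: an ARIMA(1,1,0) model with drift, fitted by conditional least squares. The order, the fitting method and the function name are assumptions for illustration only. They are not the model or software used in the thesis, and the output is not expected to reproduce Figure (5–21).

```python
def arima_110_forecast(series, steps):
    """Forecast `steps` values with an ARIMA(1,1,0) model with drift.

    Hypothetical illustration: the order (1, 1, 0) and the conditional
    least-squares fit are assumptions, not the model used in the thesis.
    """
    # d = 1: work with first differences to remove the non-stationary level.
    diffs = [b - a for a, b in zip(series, series[1:])]
    # AR(1) on the differences: diffs[t] = c + phi * diffs[t-1] + error.
    x, y = diffs[:-1], diffs[1:]
    if len(x) < 2:
        raise ValueError("need at least four observations")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("constant differences: AR coefficient is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx            # least-squares AR(1) coefficient
    c = mean_y - phi * mean_x  # least-squares intercept (drift)

    level, diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        diff = c + phi * diff  # forecast the next difference
        level = level + diff   # integrate back to the original scale
        forecasts.append(level)
    return forecasts


# Semi-annual groundwater amounts (m3) from Table (5-7), Jan-08 to Jul-17.
semi_annual = [2295285, 2317733, 2273525, 2439755, 2175580, 2597673,
               2691509, 2765044, 2183688, 2490211, 2422760, 2525213,
               2098198, 1971355, 1725542, 2169862, 1970828, 2321393,
               2042940, 2315646]
# Ten half-years ahead correspond to Jan-18 ... Jul-22.
print([round(v) for v in arima_110_forecast(semi_annual, steps=10)])
```

Each forecast difference is fed back in as the input for the next step, which is why ARIMA forecasts settle toward a steady drift as the horizon grows.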

Figure (5–20): Five-year forecast of rain amounts for Dear El-Balah city using (ARIMA+NN)

Figure (5–21): Five-year groundwater forecast of semi-annual wells’ production amounts using (ARIMA)

[Chart for Figure (5–21): Amount (m3); semi-annual forecasts, Jan-18 to Jul-22]

[Chart for Figure (5–20): Amount (mm); annual forecasts, 2018 to 2022]

5.4.1. Deviations of rain amounts and wells’ production

To give a clear view of the future amounts of rainfall and groundwater demand for Deir Al-Balah city, we compared the data of the last year of the original dataset (2017) with the forecasted data for both ‘rain amounts’ and ‘wells production’, as in Table (5-9). This comparison will help decision makers manage groundwater and search for alternative water resources.

5.4.1.1. Deviations of rain amounts and wells’ production in comparison with the 2017 amounts

Table (5-9): Deviation of groundwater and rain amounts compared with 2017

Year   Rain amounts (mm)   Rain amounts deviation (%)   Wells’ production (m3)   Wells’ production deviation (%)
2017   241                 0                            4358586                  0
2018   314                 +30                          4338393                  -0.5
2019   294                 +22                          4302683                  -1
2020   301                 +25                          4267267                  -2
2021   298                 +24                          4232142                  -3
2022   299                 +24                          4197307                  -4

On the one hand, the anticipated amounts of groundwater wells’ production will decrease over the next 5 years in comparison with 2017. This decrease will range between 0.5% and 4%.

On the other hand, the anticipated rain amounts will be between 22% and 30% higher than the 2017 amount over the next five years. The relationship between the rain amounts deviation and the wells’ production deviation is illustrated in Figure (5–22).
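The deviations in Table (5-9) are percentage differences from the 2017 baseline: deviation = 100 × (forecast − 2017 value) / 2017 value. The 2017 wells’ production (4358586 m3) is the sum of the Jan-17 and Jul-17 amounts in Table (5-7). The following is a minimal Python check, with an illustrative helper name; note that the table rounds to whole percentages except for the 2018 wells’ value.

```python
def deviation_percent(value, baseline):
    """Percentage deviation of `value` from `baseline`."""
    return 100.0 * (value - baseline) / baseline


rain_2017 = 241                  # mm, Table (5-9)
wells_2017 = 2042940 + 2315646   # m3, Jan-17 + Jul-17 from Table (5-7)

rain_forecast = {2018: 314, 2019: 294, 2020: 301, 2021: 298, 2022: 299}
wells_forecast = {2018: 4338393, 2019: 4302683, 2020: 4267267,
                  2021: 4232142, 2022: 4197307}

for year in sorted(rain_forecast):
    rain_dev = deviation_percent(rain_forecast[year], rain_2017)
    wells_dev = deviation_percent(wells_forecast[year], wells_2017)
    print(year, f"{rain_dev:+.1f}", f"{wells_dev:+.1f}")
```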

Figure (5–22): The Relationship between rain amounts and groundwater deviation in

comparison with 2017

5.4.1.2. Deviations of rain amounts and wells’ production compared with the period (2013-2017)

To compare rain and wells’ production over homogeneous periods, we computed the total amounts over two periods, the last five years (2013-2017) and the five forecasted years (2018-2022), for both rain amounts and groundwater wells’ production, and then calculated the deviation of the forecasted period from the earlier one for each dataset. The comparison showed that over 2018-2022 the anticipated rain amounts will decrease by 8.4% compared with the rain amounts of 2013-2017, while groundwater wells’ production will decrease by 1.05% compared with the same period (2013-2017), as shown in Table (5-10).

Table (5-10): Deviation of rain amounts and groundwater amounts compared with the last 5 years

Period          Rain amounts (mm)   Groundwater amount (m3)
2013-2017       1645                21563737
2018-2022       1507                21337792
Deviation (%)   -8.4                -1.05

[Chart for Figure (5–22): Relationship between rain percentages and groundwater production percentages (2017-2022); series: Rain, Well]

Figure (5–23): Rain amounts deviation (Actual + forecasted) from 2013 to 2022

Figure (5–23) shows the annual (actual + forecasted) rain amounts from 2013 to 2022. It also shows that the rain amounts for 2018 to 2022 are expected to be higher than the 2017 amount. In contrast, the rain amounts for 2018 to 2022 will be 8.4% lower than the amounts of the period from 2013 to 2017.

Figure (5–24): Wells’ production data (Actual + forecasted) from 2013 to 2022

[Chart for Figure (5–23): Rain amounts deviation 2013 - 2022]

[Chart for Figure (5–24): Wells Production Deviation 2013-2022]

According to the forecasted data for the next five years (2018-2022) shown in Figure (5–24), wells’ production trends downward in comparison with 2017. The anticipated wells’ production of groundwater over these 5 years will be 1.05% lower than in the period from 2013 to 2017.
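The Table (5-10) deviations can be reproduced from data already in this chapter. The 2013-2017 groundwater total is the sum of the ten semi-annual amounts in Table (5-7), and the 2018-2022 total is the sum of the five annual forecasts in Table (5-9). The rain baseline is the sum of the 2013-2017 amounts in Appendix A (about 1645.2 mm, shown as 1645 in the table). The rain forecast total is taken as reported, because the rounded yearly values in Table (5-9) sum to 1506 rather than 1507. The following is a minimal Python check under those assumptions:

```python
def deviation_percent(value, baseline):
    """Percentage deviation of `value` from `baseline`."""
    return 100.0 * (value - baseline) / baseline


# Groundwater: semi-annual actual amounts (m3) for 2013-2017, Table (5-7).
wells_2013_2017 = [2422760, 2525213, 2098198, 1971355, 1725542,
                   2169862, 1970828, 2321393, 2042940, 2315646]
# Groundwater: annual forecasts (m3) for 2018-2022, Table (5-9).
wells_2018_2022 = [4338393, 4302683, 4267267, 4232142, 4197307]

# Rain: annual actual amounts (mm) for 2013-2017, Appendix A.
rain_2013_2017 = [375.5, 216, 514, 298.5, 241.2]
rain_2018_2022_total = 1507  # forecast total (mm) as reported in Table (5-10)

wells_past, wells_future = sum(wells_2013_2017), sum(wells_2018_2022)
print(wells_past, wells_future)  # 21563737 21337792
print(round(deviation_percent(wells_future, wells_past), 2))  # -1.05
print(round(deviation_percent(rain_2018_2022_total, sum(rain_2013_2017)), 1))  # -8.4
```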

Figure (5–25): Groundwater production data(Actual+forecasted) from 2008 to 2022

Figure (5–26): Rain amounts data(Actual+forecasted) from 1985 to 2022

Figure (5–26) gives a general representation of the actual and forecasted rain data from 1985 to 2022, and Figure (5–25) gives the corresponding representation of groundwater production from 2008 to 2022.

[Chart for Figure (5–25): Wells Production of Groundwater (2008-2022)]

[Chart for Figure (5–26): Rain Amounts from 1985 to 2022]

5.5. Discussion

We applied the (ARIMA+NN) algorithm for the real rain forecast because it achieved the lowest MAPE value. This result is consistent with the result obtained by (Mukhairez, 2018), in which ARIMA combined with a neural network had the lowest MAPE value. For groundwater forecasting, however, the MAPE values obtained by applying the forecasting algorithms were not small because the volatility of the data was high. Re-dividing the monthly data into semi-annual data reduced this volatility, and reapplying and re-evaluating the algorithms on the re-divided data gave smaller MAPE values for all five algorithms. ARIMA was the most accurate algorithm, with the lowest MAPE value, and was selected for the semi-annual forecast of groundwater production amounts. This result is similar to that of (Udom & Phumchusri, 2014), in which the ARIMA model outperformed the other models and achieved the best MAPE value.

According to the forecasted results for the next 5 years (2018-2022), groundwater production will decrease in comparison with the 2017 amount, by between 0.5% and 4%. On the other hand, the anticipated rain amounts will be between 22% and 30% higher than the 2017 amount over the next five years.

Comparing the forecasted amounts with the years from 2013 to 2017 showed that, over the next 5 years, rain amounts will decrease by 8.4% and the anticipated wells’ production of groundwater will decrease by 1.05% relative to the period from 2013 to 2017.

5.6. Summary

In this chapter, forecasting was conducted on the amounts of groundwater production and rain for Dear El-Balah city. Compared with the period (2013 – 2017), the forecasts for the next 5 years show that groundwater production and rain amounts will decrease by 1.05% and 8.4%, respectively.

Chapter 6

Conclusion and Future Work

This chapter summarizes the thesis results, the conclusions of our experiments, and future work.

6.1. Summary

We used well-known algorithms whose efficiency has been demonstrated in different fields by previous research: (ARIMA, ARIMA+ETS, ARIMA+TBATS, ARIMA+NN and ETS). They are widely used in time series forecasting tasks. We observed that no previous research has forecast groundwater and rain in Gaza using forecasting methods, and none has discussed the relation between them, taking into account the relation of salinity to increased groundwater consumption and whether the relationship between groundwater demand and rain amounts is incremental or decremental.

We applied forecasting algorithms to the groundwater and rain datasets of Dear El-Balah city and tried to extract the relation between groundwater demand and rain amounts. We also considered the groundwater and rain forecasts in relation to population growth. This thesis focuses on medium-term forecasting and can have a positive impact on water resources management.

The study concluded that, over the next 5 years and compared with the period 2013-2017, the anticipated amounts of groundwater production in Dear El-Balah city will decrease by 1.05% and the average rain amounts will decrease by 8.4%.

Yet, the expected population growth over time should lead to an increase in groundwater production. Because of this paradox, an interview was conducted with a concerned official at the CMWU to identify the causes of the decrease in groundwater production by the CMWU. The results showed that:

• In reality, people dig private wells that are not monitored by the CMWU, so part of the water production comes from sources other than the CMWU itself. Therefore, total water production is believed to be increasing.

• Regular power cuts reduce groundwater pumping by the CMWU.

• The desalination plant in Deir El Balah, which desalinates Mediterranean Sea water, reduces the extraction of groundwater from the coastal aquifer. Although the city does not depend on this plant entirely, it is still a source of water. Its production capacity has reached about (2600×1000) liters since 2014.

The salinity of the groundwater is affected by two main factors: rain, which recharges the groundwater wells, and the amounts pumped from wells. Increased rainfall and reduced pumping from wells lead to lower salinity (Mushtaha & Al-Louh, 2013).

Although the data in this study are not sufficient to predict salinity, the thesis results (a decrease of 8.4% in rain amounts against a decrease of only 1.05% in groundwater production) indicate that recharge will decline faster than pumping, which leads us to expect an increase in groundwater salinity.

6.2. Conclusion

Our research applies medium-term forecasting to rainfall and groundwater data by evaluating the forecasting algorithms on our datasets and then selecting the most appropriate algorithm for the real forecast.

From the discussions in this study, the following conclusions can be drawn:

1. The groundwater and rain data are non-stationary time series.

2. Annual, not monthly, rainfall data were used because there is no rain in some months in Gaza.

3. The datasets must be in the form of time series data before the forecasting algorithms are applied.

4. The evaluation stage, which measures accuracy with MAPE on the testing set data, was carried out before the real forecast.

5. After the evaluation process for the semi-annual groundwater data, the most accurate algorithm for groundwater forecasting is ARIMA, with a MAPE value of 4.9%.

6. Re-dividing the groundwater data into semi-annual data improved the performance (MAPE values) of the algorithms. However, ETS was not the best choice because of the non-stationary nature of the data.

7. The most accurate algorithm for rain forecasting was Hybrid ARIMA (ARIMA+NN), with a MAPE value of 21.0%.

8. The selected, most accurate models were applied to the rain data (33 years, 1985 to 2017) and the groundwater data (120 months, 2008 to 2017).

9. The (ARIMA) model and the nonlinear models (ANN, ETS, TBATS) were used to capture different forms of relationship in the time series data and to benefit from the advantages of each model, making the proposed model more powerful and efficient.

10. The proposed model forecasted the groundwater production and rainfall amounts for a given area (Dear El-Balah city as a case study) over a given time (1-5 years). It is based on historical rain data from 1985 to 2017 obtained from the Ministry, and on 10 years of historical monthly groundwater pumping data (2008 to 2017) obtained from the CMWU.

11. Generally, over the next five years, groundwater production will decrease by approximately 1.05%, while rain amounts will decrease by approximately 8.4%, both relative to 2013-2017. That means rainfall will decrease at a much higher rate than groundwater production. Although the data are not sufficient, the results lead us to expect increasing salinity.

To sum up, this study is based on the official groundwater data available from the CMWU. It does not include data on illegally dug wells, nor data on the Dear El-Balah desalination plant.

Therefore, the study results must be generalized to Dear El-Balah city with great caution, and they should not be treated as the only source of data on the study problem.

In the light of these results, the study concluded with the following

recommendations:

• Recommendations for decision and policy makers:

[1] Filing a lawsuit against the occupation to benefit from the water of the River Jordan, which is blocked by the dam of Gaza Valley at the borders of the Gaza Strip in the middle region. This water can be used for irrigation and agriculture.

[2] Enacting effective water legislation through the Legislative Council and the Water Authority regarding the utilization of groundwater, and developing laws to prevent illegal consumption of groundwater.

[3] Establishing dams by the water authorities to collect rainwater, especially in places where rainfall is increasing, and then using it to recharge the underground reservoir.

• Recommendations for supervisory bodies such as (CMWU, MoA, the Water Authority):

[1] Managing water consumption to ensure a reasonable balance between precipitation and pumping from underground wells.

[2] Developing a future vision and alternative plans to compensate for the shortage of groundwater, and thus prevent salinization, and searching for alternative water sources.

[3] Increasing dependence on the seawater desalination plant under construction in

the southern Gaza Strip, instead of relying on groundwater pumping from wells.

[4] Utilizing the central processing plant in the southern Gaza Strip to use the

resulting water for irrigation and agriculture.

6.3. Future work

This study is limited in that it does not deal with unofficial, unregistered wells, which increase groundwater production. Future research is needed that uses the salinity rate as a more accurate indicator for determining the groundwater pumping rate.

Future work could also forecast salinity and the percentages of various minerals in groundwater based on the rates of rain, groundwater production and desalination.

More factors, e.g. (population growth, salinity, nitrates and minerals in well water), can be used for groundwater and rain forecasting tasks.

Different models can be applied to real datasets of other cities in the Gaza Strip to forecast groundwater production and rain amounts, in order to prevent the depletion of groundwater for the sake of coming generations.

Bibliography

Aggarwal, C. C. (2015). Data Mining: The Textbook.

Agrawal, V., Agrawal, S., Nag, S., Chakraborty, D., & Panigrahi, B. K. (2016). Knn Coal Mill,

11–18.

Aiash, M., & Mogheir, Y. (2017). Comprehensive Solutions for the Water Crisis in Gaza Strip,

25(3), 63–75.

Al-Shalalfeh, Z., Napier, F., & Scandrett, E. (2018). Water Nakba in Palestine: Sustainable

Development Goal 6 versus Israeli hydro-hegemony. Local Environment, 23(1), 117–124.

https://doi.org/10.1080/13549839.2017.1363728

Armstrong, J. S., & Fildes, R. (2006). Making progress in forecasting. International Journal of

Forecasting, 22(3), 433–441. https://doi.org/10.1016/j.ijforecast.2006.04.007

Auda, G., & Kamel, M. (1999). Modular Neural Networks: a Survey. International Journal of

Neural Systems, 9(2), 129–151.

Australian Transport Assessment and Planning. (2016). Forecasting and evaluation. Retrieved from https://atap.gov.au/tools-techniques/travel-demand-modelling/6-forecasting-evaluation.aspx

Palestinian Water Authority. (2018). Water Authority Strategic Plan 2016-2018, 1, 1–48.

Azam, F. (2000). Biologically inspired modular neural networks. Specialist, 149. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.6.1840&rep=rep1&type=pdf

Bagirov, A. M., Mahmood, A., & Barton, A. (2017). Prediction of monthly rainfall in Victoria,

Australia: Clusterwise linear regression approach. Atmospheric Research, 188, 20–29.

https://doi.org/10.1016/j.atmosres.2017.01.003

Banerjee, P., Singh, V. S., Chatttopadhyay, K., Chandra, P. C., & Singh, B. (2011). Artificial

neural network model as a potential alternative for groundwater salinity forecasting.

Journal of Hydrology, 398(3–4), 212–220. https://doi.org/10.1016/j.jhydrol.2010.12.016

Barakat, R., & Heackock, R. (2013). Water in Palestine ISBN 978-9950-316-47-8. The Birzeit

Strategic Studies Forum.The Ibrahim Abu-Lughod Institiue of International Studies. Birzeit

University.

Bitzer, S., & Kiebel, S. J. (2012). Recognizing recurrent neural networks (rRNN): Bayesian

inference for recurrent neural networks. Biological Cybernetics, 106(4–5), 201–217.

https://doi.org/10.1007/s00422-012-0490-x

Chhetri, M. B., Lumpe, M., Vo, Q. B., & Kowalczyk, R. (2017). On Forecasting Amazon EC2

Spot Prices Using Time-Series Decomposition with Hybrid Look-Backs. Proceedings -

2017 IEEE 1st International Conference on Edge Computing, EDGE 2017, 158–165.

https://doi.org/10.1109/IEEE.EDGE.2017.29

Chou, I.-T. (n.d.). BATS and TBATS Model. Retrieved from https://yintingchou.com/posts/bats-and-tbats-model/

Coastal Municipalities Water Utility. (2015). Retrieved from

http://www.cmwu.ps/Ar/ReadTopic.aspx?Static=20

Cui, H., & Singh, V. P. (2017). Entropy Spectral Analyses for Groundwater Forecasting, 22(7), 1–8. https://doi.org/10.1061/(ASCE)HE.1943-5584.0001512

Daibes-Murad, F. (2004). Water Resources in Palestine: A Fact Sheet and Basic Analysis of the

Legal Status.

Dalinina. (2017). Introduction to Forecasting with ARIMA in R. ORACLE. Retrieved from https://www.datascience.com/blog/introduction-to-forecasting-with-arima-in-r-learn-data-science-tutorials

de Livera, A. M., Hyndman, R. J., & Snyder, R. D. (2011). Forecasting time series with complex

seasonal patterns using exponential smoothing. Journal of the American Statistical

Association, 106(496), 1513–1527. https://doi.org/10.1198/jasa.2011.tm09771

Deb, C., Zhang, F., Yang, J., Lee, S. E., & Shah, K. W. (2017). A review on time series

forecasting techniques for building energy consumption. Renewable and Sustainable

Energy Reviews, 74(January), 902–924. https://doi.org/10.1016/j.rser.2017.02.085

Deepashri, & Kamath, A. (2017). Survey on Techniques of Data Mining and its Applications.

International Journal of Emerging Research in Management &Technology, ISSN(62),

2278–9359. Retrieved from

https://www.ermt.net/docs/papers/Special_Issue/2017/ICETE/33p.pdf

Dhekale, B. S., Sahu, P. K., Vishwajith, K. P., & Narsimhaiah, L. (2015). Structural Time Series

Analysis towards Modeling and Forecasting of Ground Water Fluctuations in Murshidabad

District of West Bengal, 5, 117–126. https://doi.org/10.5923/c.ije.201501.17

Egnanarayana, Y. (2005). ARTIFICIAL NEURAL.

Et.al, B. (2014). Improving the performance of water demand forecasting models by using

weather input. Procedia Engineering, 70, 93–102.

https://doi.org/10.1016/j.proeng.2014.02.012

Gahirwal, M. (2013). Inter Time Series Sales Forecasting. Retrieved from

http://arxiv.org/abs/1303.0117

Garima Jain, E., & Mallick, B. (2017). A Study of Time Series Models ARIMA and ETS.

International Journal of Modern Education and Computer Science, 9(4), 57–63.

https://doi.org/10.5815/ijmecs.2017.04.07

Geetha, A., & Maksood, F. Z. (2017). Sustainability in Oman: Energy Consumption Forecasting

using R. Indian Journal of Science and Technology (IJST), 10(10), 1–14.

https://doi.org/10.17485/ijst/2017/v10i10/97008

Gong, Y., Zhang, Y., Lan, S., & Wang, H. (2016). A Comparative Study of Artificial Neural

Networks, Support Vector Machines and Adaptive Neuro Fuzzy Inference System for

Forecasting Groundwater Levels near Lake Okeechobee, Florida. Water Resources

Management, 30(1), 375–391. https://doi.org/10.1007/s11269-015-1167-8

Han, J., Kamber, M., & Pei, J. (2012). Data Mining: Concepts and Techniques. San Francisco,

CA, itd: Morgan Kaufmann. https://doi.org/10.1016/B978-0-12-381479-1.00001-0


Hassani, H., Silva, E. S., Antonakakis, N., Filis, G., & Gupta, R. (2017). Forecasting accuracy

evaluation of tourist arrivals. Annals of Tourism Research, 63, 112–127.

https://doi.org/10.1016/j.annals.2017.01.008

Hill, T., Connor, M. O., & Remus, W. (2015). Neural Network Models for Time Series

Forecasts. Management Science, 42(7), 1082–1092.

https://doi.org/10.1287/mnsc.42.7.1082

Horse, H. (2018). Time Series Analysis - Exponential Smoothing.

Hosam Mukhairez, D. A. E.-H. (n.d.). Medium-Term Forecasting for Municipal Water Demand and Revenue (KhanYounis City as A Case Study).

Hyndman, R. J., Koehler, A. B., Snyder, R. D., & Grose, S. (2002). A state space framework for

automatic forecasting using exponential smoothing methods. International Journal of

Forecasting, 18(3), 439–454. https://doi.org/10.1016/S0169-2070(01)00110-8

Aizenberg, I., Sheremetov, L., Villa-Vargas, L., & J. M.-M. (2016). Multilayer

Neural Network with Multi-Valued Neurons in time series forecasting of oil production.

Neurocomputing, 175, 980–989. https://doi.org/10.1016/j.neucom.2015.06.092

Iqelan, B. M. (2016). A Singular Spectrum Analysis Technique to Electricity Consumption

Forecasting. Journal of Engineering Research and Applications Www.ijera.com ISSN,

6(36), 2248–962286. https://doi.org/10.9790/9622-

Lani, J. (2018). What is Linear Regression? CompleteDissertation-Website. Retrieved

from http://www.statisticssolutions.com/what-is-linear-regression/

Jones, R. B. B. V. (2011). Forecasting Urban Water Demand (2nd ed.). America: American Water Works Association.

Kanchymalay, K., Salim, N., Sukprasert, A., Krishnan, R., & Hashim, U. R. A. (2017).

Multivariate Time Series Forecasting of Crude Palm Oil Price Using Machine Learning

Techniques. IOP Conference Series: Materials Science and Engineering, 226(1).

https://doi.org/10.1088/1757-899X/226/1/012117

Kaytez, F., Taplamacioglu, M. C., Cam, E., & Hardalac, F. (2015). Forecasting electricity

consumption: A comparison of regression analysis, neural networks and least squares

support vector machines. International Journal of Electrical Power and Energy Systems,

67(February 2018), 431–438. https://doi.org/10.1016/j.ijepes.2014.12.036

Kejela, G. (2012). Short-term Forecasting of Electricity Consumption using Gaussian Processes.

Khali et.al. (n.d.). Short-term forecasting of groundwater levels under conditions of mine-tailings

recharge using wavelet ensemble neural network models. Hydrogeology Journal, 23(1),

121–141. https://doi.org/10.1007/s10040-014-1204-3

Khashei, M., & Bijari, M. (2011). A new hybrid methodology for nonlinear time series

forecasting. Modelling and Simulation in Engineering, 2011.

https://doi.org/10.1155/2011/379121

Kisi, O., & Sanikhani, H. (2015). Prediction of long-term monthly precipitation using several

soft computing methods without climatic data. International Journal of Climatology,


35(14), 4139–4150. https://doi.org/10.1002/joc.4273

Liu, H., Tian, H. Q., Chen, C., & Li, Y. fei. (2010). A hybrid statistical method to predict wind

speed and wind power. Renewable Energy, 35(8), 1857–1861.

https://doi.org/10.1016/j.renene.2009.12.011

Mahmud et.al. (2017). Monthly rainfall forecast of Bangladesh using autoregressive integrated

moving average method. Environmental Engineering Research, 22(2), 162–168.

https://doi.org/10.4491/eer.2016.075

Mekanik, F., Imteaz, M. A., Gato-Trinidad, S., & Elmahdi, A. (2013). Multiple regression and

Artificial Neural Network for long-term rainfall forecasting using large scale climate

modes. Journal of Hydrology, 503, 11–21. https://doi.org/10.1016/j.jhydrol.2013.08.035

Microsoft. (2018). FORECAST.LINEAR function. Retrieved from

https://support.office.com/en-us/article/forecast-linear-function-38e2a419-7415-4037-

8761-93f3992ace87

Ministry of Interior and National Security. (2018). No Title. Retrieved from

https://moi.gov.ps/Home/

Mohamed, M. M., & Al-Mualla, A. A. (2010). Water Demand Forecasting in Umm Al-Quwain

(UAE) Using the IWR-MAIN Specify Forecasting Model. Water Resources Management,

24(14), 4093–4120. https://doi.org/10.1007/s11269-010-9649-1

Mohanty, S., Jha, M. K., Raul, S. K., Panda, R. K., & Sudheer, K. P. (2015). Using Artificial

Neural Network Approach for Simultaneous Forecasting of Weekly Groundwater Levels at

Multiple Sites. Water Resources Management, 29(15), 5521–5532.

https://doi.org/10.1007/s11269-015-1132-6

Mushtaha, D. A.-A. K., & Al-Louh, D. M. N. (2013). The relationship between the water of rain,

groundwater and springs and population consumption In the West Bank and Gaza Strip In

the period from 1980 to 2010.

Palestine, M. of A. S. of. (2017). No Title.

Panigrahi, S., & Behera, H. S. (2017). A hybrid ETS–ANN model for time series forecasting.

Engineering Applications of Artificial Intelligence, 66(June), 49–59.

https://doi.org/10.1016/j.engappai.2017.07.007

Pati, J., & Shukla, K. K. (2015). A comparison of ARIMA, neural network and a hybrid

technique for Debian bug number prediction. Proceedings - 5th IEEE International

Conference on Computer and Communication Technology, ICCCT 2014, 47–53.

https://doi.org/10.1109/ICCCT.2014.7001468

Qiu, M., Zhao, P., Zhang, K., Huang, J., Shi, X., Wang, X., & Chu, W. (2017). A short-term

rainfall prediction model using multi-task convolutional neural networks. Proceedings -

IEEE International Conference on Data Mining, ICDM, 2017–Novem, 395–404.

https://doi.org/10.1109/ICDM.2017.49

Rajaee et.al. (2016). Groundwater Level Forecasting Using Wavelet and Kriging, II(Ii), 1–21.

https://doi.org/10.22055/jhs.2016.12848

Ramos, P., Santos, N., & Rebelo, R. (2015). Performance of state space and ARIMA models for


consumer retail sales forecasting. Robotics and Computer-Integrated Manufacturing, 34,

151–163. https://doi.org/10.1016/j.rcim.2014.12.015

Report, S., & Water, O. F. (2013). in the Occupied State of Palestine-2012, (October), 22.

Hyndman, R. J. (2018). Innovations state space models for exponential smoothing. Retrieved

from https://www.otexts.org/fpp/7/7

Seymour, L. (2014). Introduction to Time Series and Forecasting, by Peter J. Brockwell and Richard A. Davis [Review]. Journal of the American Statistical Association, 92(440) (Dec. 1997), 1647.

SHARMA. (2017). Data Mining vs. Statistics. Retrieved from https://upxacademy.com/data-

mining-vs-statistics/

Sibanda, W., & Pretorius, P. (2012). Artificial Neural Networks-A Review of Applications of

Neural Networks in the Modeling of HIV Epidemic. International Journal of Computer

Applications, 44(April), 975–8887.

Stephanie. (2018). Exponential Smoothing: Definition of Simple, Double and Triple. Retrieved

from http://www.statisticshowto.com/exponential-smoothing/

Alasdi, S. A., & Bhaya, W. S. (2017). Review of Data Preprocessing Techniques in Data

Mining. Retrieved from https://www.researchgate.net/publication/320161439

Ray, S. (2015). 7 Types of Regression Techniques you should know! Retrieved from

https://www.analyticsvidhya.com/blog/2015/08/comprehensive-guide-regression/

Thiyagarajan, K., Kodagoda, S., & Nguyen, L. V. (2017). Predictive Analytics for Detecting Sensor Failure Using Autoregressive Integrated Moving Average Model. 12th IEEE Conference on Industrial Electronics and Applications, 1923–1928.

Udom, P., & Phumchusri, N. (2014). A comparison study between time series model and ARIMA model for sales forecasting of distributor in plastic industry, 4(2), 32–38.

Ullah, M. I. (2014). Objectives of Time Series Analysis. Retrieved from http://itfeature.com/time-series-analysis-and-fore.

Wongsathan, R., & Seedadan, I. (2016). A Hybrid ARIMA and Neural Networks Model for PM-10 Pollution Estimation: The Case of Chiang Mai City Moat Area. Procedia Computer Science, 86(March), 273–276. https://doi.org/10.1016/j.procs.2016.05.057

Yang, R., Zhang, Z., & Shi, P. (2010). Exponential stability on stochastic neural networks with discrete interval and distributed delays. IEEE Trans. Neural Networks, 21(1), 169–175. https://doi.org/10.1109/TNN.2009.2036610

Zhang, J. L., Zhang, Y. J., & Zhang, L. (2015). A novel hybrid method for crude oil price forecasting. Energy Economics, 49(December), 649–659. https://doi.org/10.1016/j.eneco.2015.02.018

Zhao, C., Liu, G., Hermawan, E., Ruchjana, B. N., Siregar, F. A., & Makmur, T. (2018). The development rainfall forecasting using Kalman filter.


Appendix

Summary of Results

Appendix A

Time series rain dataset

Year   Amount (mm)
Training set (1985-2010)
1985   108
1986   626.4
1987   130.2
1988   344.5
1989   264.5
1990   203.4
1991   497.7
1992   419.3
1993   209.6
1994   683
1995   236
1996   352.7
1997   354.5
1998   107.5
1999   158.2
2000   527
2001   359.5
2002   435.7
2003   354.9
2004   311.5
2005   304.5
2006   389.5
2007   219.5
2008   281
2009   220
2010   131
Testing set (2011-2017)
2011   385.5
2012   255.5
2013   375.5
2014   216
2015   514
2016   298.5
2017   241.2
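The chronological split in this table (26 training years, 7 testing years) can be sketched as a simple slice on the year. This is a minimal illustration, assuming the series is held as a list of (year, amount) pairs; the helper name `split_by_year` is hypothetical and not taken from the thesis.

```python
# Annual rainfall for Dear El-Balah (mm), copied from Appendix A.
RAIN = [
    (1985, 108.0), (1986, 626.4), (1987, 130.2), (1988, 344.5), (1989, 264.5),
    (1990, 203.4), (1991, 497.7), (1992, 419.3), (1993, 209.6), (1994, 683.0),
    (1995, 236.0), (1996, 352.7), (1997, 354.5), (1998, 107.5), (1999, 158.2),
    (2000, 527.0), (2001, 359.5), (2002, 435.7), (2003, 354.9), (2004, 311.5),
    (2005, 304.5), (2006, 389.5), (2007, 219.5), (2008, 281.0), (2009, 220.0),
    (2010, 131.0), (2011, 385.5), (2012, 255.5), (2013, 375.5), (2014, 216.0),
    (2015, 514.0), (2016, 298.5), (2017, 241.2),
]


def split_by_year(series, first_test_year):
    """Hypothetical helper: chronological split with no shuffling.

    Observations before `first_test_year` form the training set and the
    rest form the testing set, so no future value is used while fitting.
    """
    train = [(year, amount) for year, amount in series if year < first_test_year]
    test = [(year, amount) for year, amount in series if year >= first_test_year]
    return train, test


train, test = split_by_year(RAIN, 2011)
print(len(train), len(test))  # 26 7
```

Keeping the split chronological rather than random matters for time series: a random split would let the models see years that lie after the ones they are asked to forecast.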

Appendix B

Time series groundwater dataset

Month    Amount (m3)
Training set (Jan-08 to Dec-15)
Jan-08   339957
Feb-08   371711
Mar-08   396483
Apr-08   398139
May-08   398812
Jun-08   390183
Jul-08   415331
Aug-08   415165
Sep-08   394400
Oct-08   422698
Nov-08   352150
Dec-08   317989
Jan-09   299446
Feb-09   319041
Mar-09   372687
Apr-09   409717
May-09   395727
Jun-09   476907
Jul-09   399822
Aug-09   423123
Sep-09   424026
Oct-09   392577
Nov-09   395365
Dec-09   404842
Jan-10   294146
Feb-10   275096
Mar-10   346008
Apr-10   424610
May-10   425295
Jun-10   410425
Jul-10   422653
Aug-10   385932
Sep-10   441907
Oct-10   473732
Nov-10   446231
Dec-10   427218
Jan-11   408763
Feb-11   387545
Mar-11   492134
Apr-11   429768
May-11   481101
Jun-11   492198
Jul-11   474721
Aug-11   490828
Sep-11   479342
Oct-11   486246
Nov-11   433422
Dec-11   400485
Jan-12   322137
Feb-12   315952
Mar-12   298945
Apr-12   414893
May-12   423647
Jun-12   408114
Jul-12   437694
Aug-12   422044
Sep-12   444590
Oct-12   439953
Nov-12   385702
Dec-12   360228
Jan-13   366008
Feb-13   331622
Mar-13   405257
Apr-13   465403
May-13   417558
Jun-13   436912
Jul-13   474883
Aug-13   513654
Sep-13   484779
Oct-13   449431
Nov-13   289844
Dec-13   312622
Jan-14   284846
Feb-14   348547
Mar-14   353087
Apr-14   379378
May-14   370794
Jun-14   361546
Jul-14   301269
Aug-14   301268
Sep-14   356924
Oct-14   325178
Nov-14   351438
Dec-14   335278
Jan-15   313784
Feb-15   265190
Mar-15   296940
Apr-15   332953
May-15   371687
Jun-15   144988
Jul-15   374003
Aug-15   383085
Sep-15   385764
Oct-15   402146
Nov-15   334845
Dec-15   290019
Testing set (Jan-16 to Dec-17)
Jan-16   293112
Feb-16   258263
Mar-16   339185
Apr-16   318414
May-16   383726
Jun-16   378128
Jul-16   418558
Aug-16   405105
Sep-16   404280
Oct-16   383765
Nov-16   390100
Dec-16   319585
Jan-17   306611
Feb-17   364681
Mar-17   380579
Apr-17   335951
May-17   332772
Jun-17   322346
Jul-17   498978
Aug-17   377795
Sep-17   391049
Oct-17   386677
Nov-17   328756
Dec-17   332391

Appendix C

The results of applying the different algorithms to the Amount (mm) attribute of the rain dataset.

Year   Amount (mm) (Y)   ARIMA (Y1)   ARIMA+NN (Y2)   ETS (Y3)   ARIMA+ETS (Y4)   ARIMA+TBATS (Y5)
2011   385.5             247          429             241        351              328
2012   255.5             246          288             239        307              284
2013   375.5             245          303             238        322              299
2014   216               244          298             236        317              294
2015   514               243          299             234        319              296
2016   298.5             242          299             233        318              295
2017   241.2             242          299             231        319              295


Appendix D

Computing MAPE for the results of applying the algorithms to the Amount (mm) attribute of the rain dataset.

Error deviation (%) for model k = (Abs(y-yk)/y)*100, where y is the actual amount and yk is the forecast of model k (y1 = ARIMA, y2 = ARIMA+NN, y3 = ETS, y4 = ARIMA+ETS, y5 = ARIMA+TBATS). MAPE is the mean of each column.

Year   ARIMA   ARIMA+NN   ETS    ARIMA+ETS   ARIMA+TBATS
2011   35.9    11.3       37.4   8.9         14.9
2012   3.7     12.6       6.3    20.2        11.1
2013   34.7    19.3       36.7   14.1        20.3
2014   13.1    37.8       9.3    46.8        36.1
2015   52.7    41.7       54.4   37.9        42.5
2016   18.8    0.1        22.1   6.6         1.1
2017   0.1     24.0       4.3    32.1        22.4
MAPE   22.7    21.0       24.4   23.8        21.2

The best (lowest MAPE): ARIMA+NN
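The MAPE row above is the mean of each column's error deviations. A minimal Python sketch that recomputes it from the rounded forecasts listed in Appendix C; the variable and function names are illustrative, not from the thesis. Because the inputs are rounded to whole millimetres, individual scores can differ from the table in the first decimal place (ARIMA comes out at about 22.8 rather than 22.7), but the ranking is the same and ARIMA+NN still has the lowest MAPE.

```python
# Actual rainfall (y) and forecasts (y1..y5) for 2011-2017, from Appendix C.
ACTUAL = [385.5, 255.5, 375.5, 216.0, 514.0, 298.5, 241.2]
FORECASTS = {
    "ARIMA":       [247, 246, 245, 244, 243, 242, 242],
    "ARIMA+NN":    [429, 288, 303, 298, 299, 299, 299],
    "ETS":         [241, 239, 238, 236, 234, 233, 231],
    "ARIMA+ETS":   [351, 307, 322, 317, 319, 318, 319],
    "ARIMA+TBATS": [328, 284, 299, 294, 296, 295, 295],
}


def error_deviations(actual, forecast):
    """Per-period error deviation in percent: abs(y - yk) / y * 100."""
    return [abs(y - f) / y * 100 for y, f in zip(actual, forecast)]


def mape(actual, forecast):
    """Mean absolute percentage error: the average error deviation."""
    devs = error_deviations(actual, forecast)
    return sum(devs) / len(devs)


scores = {name: mape(ACTUAL, f) for name, f in FORECASTS.items()}
# Print the models from lowest (best) to highest MAPE.
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} {score:5.1f}")
best = min(scores, key=scores.get)
print("best:", best)
```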


Appendix E

The results of applying the different algorithms to the Amount (m3) attribute of the wells dataset

Month    Amount (m3) (y)   ARIMA (y1)   ARIMA+NN (y2)   ETS (y3)   ARIMA+ETS (y4)   ARIMA+TBATS (y5)

Jan-16 293112 305902 334581 258608 290710 291948

Feb-16 258263 304219 321916 249970 296304 305128

Mar-16 339185 302545 347862 323864 321600 325142

Apr-16 318414 300880 340691 339615 344393 340567

May-16 383726 299224 289115 350471 347624 345582

Jun-16 378128 297577 358395 352011 337285 344810

Jul-16 418558 295940 371431 337166 347444 346998

Aug-16 405105 294311 358062 331701 349869 354624

Sep-16 404280 292692 399713 342598 355840 360440

Oct-16 383765 291081 350976 350785 353917 354769

Nov-16 390100 289479 327363 316659 330131 336302

Dec-16 319585 287886 363549 300961 324135 315320

Jan-17 306611 286302 337125 249799 304398 305636

Feb-17 364681 284727 347753 241161 303890 312715

Mar-17 380579 283160 357526 315055 325805 329346

Apr-17 335951 281602 312685 330806 346724 342897

May-17 332772 280052 334481 341662 348916 346873

Jun-17 322346 278511 347945 343202 338001 345526

Jul-17 498978 276978 350918 328357 347841 347395

Aug-17 377795 275454 418747 322892 350089 354844

Sep-17 391049 273938 372852 333789 355962 360562

Oct-17 386677 272431 334747 341976 353985 354837

Nov-17 328756 270932 358877 307850 330168 336339

Dec-17 332391 269441 319792 292152 324156 315341

Appendix G

The results of forecasting rain amounts for Dear El-Balah city using ARIMA+NN

Year   Forecasted amount (mm)
2018   314
2019   294
2020   301
2021   298
2022   299

Appendix H

The results of applying the algorithms to the testing set of the semi-annual groundwater data

Month    Amount (m3) (Z)   ARIMA (Z1)   ARIMA+NN (Z2)   ETS (Z3)   ARIMA+ETS (Z4)   ARIMA+TBATS (Z5)
Jan-15   1725542           2135017      2311065         2277485    2398878          2374316
Jul-15   2169862           2121727      2758355         2269306    2398878          2374316
Jan-16   1970828           2108520      2648037         2261127    2398878          2374316
Jul-16   2321393           2095395      2358245         2252949    2398878          2374316
Jan-17   2042940           2082352      2387903         2244770    2398878          2374316
Jul-17   2315646           2069390      2255529         2236592    2398878          2374316
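The actual semi-annual amounts (Z) in this table equal six-month sums of the monthly production in Appendix B; for example, Jan-16 through Jun-16 add up to 1,970,828 m3. A minimal sketch of that aggregation, assuming the monthly series starts in January and has no missing months. The helper `to_semi_annual` is a hypothetical name, and each half-year is labelled by its first month as in the table.

```python
# Monthly production (m3) for Jan-15 .. Dec-17, copied from Appendix B.
MONTHLY = [
    ("Jan-15", 313784), ("Feb-15", 265190), ("Mar-15", 296940),
    ("Apr-15", 332953), ("May-15", 371687), ("Jun-15", 144988),
    ("Jul-15", 374003), ("Aug-15", 383085), ("Sep-15", 385764),
    ("Oct-15", 402146), ("Nov-15", 334845), ("Dec-15", 290019),
    ("Jan-16", 293112), ("Feb-16", 258263), ("Mar-16", 339185),
    ("Apr-16", 318414), ("May-16", 383726), ("Jun-16", 378128),
    ("Jul-16", 418558), ("Aug-16", 405105), ("Sep-16", 404280),
    ("Oct-16", 383765), ("Nov-16", 390100), ("Dec-16", 319585),
    ("Jan-17", 306611), ("Feb-17", 364681), ("Mar-17", 380579),
    ("Apr-17", 335951), ("May-17", 332772), ("Jun-17", 322346),
    ("Jul-17", 498978), ("Aug-17", 377795), ("Sep-17", 391049),
    ("Oct-17", 386677), ("Nov-17", 328756), ("Dec-17", 332391),
]


def to_semi_annual(monthly):
    """Sum consecutive six-month blocks (Jan-Jun, Jul-Dec).

    Assumes the input begins in January or July and has no gaps;
    each block is labelled with the name of its first month.
    """
    if len(monthly) % 6:
        raise ValueError("expected whole half-years of monthly data")
    result = []
    for start in range(0, len(monthly), 6):
        block = monthly[start:start + 6]
        result.append((block[0][0], sum(amount for _, amount in block)))
    return result


for label, total in to_semi_annual(MONTHLY):
    print(label, total)  # first line: Jan-15 1725542
```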


Appendix I

Computing MAPE for the results of applying the algorithms to the Amount (m3) attribute of the semi-annual groundwater data

Error deviation (%) for model k = (Abs(z-zk)/z)*100, where z is the actual amount and zk is the forecast of model k (z1 = ARIMA, z2 = ARIMA+NN, z3 = ETS, z4 = ARIMA+ETS, z5 = ARIMA+TBATS).

Month    ARIMA   ARIMA+NN   ETS    ARIMA+ETS   ARIMA+TBATS
Jan-15   23.7    33.9       32.0   39.0        37.6
Jul-15   2.2     27.1       4.6    10.6        9.4
Jan-16   7.0     34.4       14.7   21.7        20.5
Jul-16   9.7     1.6        2.9    3.3         2.3
Jan-17   1.9     16.9       9.9    17.4        16.2
Jul-17   10.6    2.6        3.4    3.6         2.5
MAPE     4.9     9.2        5.2    7.7         6.9

The best (lowest MAPE): ARIMA
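The per-period rows above can be reproduced from the values in Appendix H with the same formula, (Abs(z-zk)/z)*100. A short Python sketch, with illustrative names that are not from the thesis, that rounds each deviation to one decimal place as the table does:

```python
# Actual semi-annual production (Z) and forecasts (Z1..Z5), from Appendix H.
ACTUAL = [1725542, 2169862, 1970828, 2321393, 2042940, 2315646]
FORECASTS = {
    "ARIMA":       [2135017, 2121727, 2108520, 2095395, 2082352, 2069390],
    "ARIMA+NN":    [2311065, 2758355, 2648037, 2358245, 2387903, 2255529],
    "ETS":         [2277485, 2269306, 2261127, 2252949, 2244770, 2236592],
    "ARIMA+ETS":   [2398878] * 6,  # constant forecast in Appendix H
    "ARIMA+TBATS": [2374316] * 6,  # constant forecast in Appendix H
}


def error_deviations(actual, forecast):
    """Error deviation per period, in percent, rounded to one decimal."""
    return [round(abs(z - f) / z * 100, 1) for z, f in zip(actual, forecast)]


deviations = {name: error_deviations(ACTUAL, f) for name, f in FORECASTS.items()}
for name, row in deviations.items():
    print(f"{name:12s}", row)
```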


Appendix J

The results of forecasting the semi-annual groundwater data using ARIMA

Month    Forecasted semi-annual amount (m3)

Jan-18 2173678

Jul-18 2164714

Jan-19 2155787

Jul-19 2146896

Jan-20 2138042

Jul-20 2129225

Jan-21 2120444

Jul-21 2111699

Jan-22 2102990

Jul-22 2094317
