Medium-Term Forecasting for Rain
Amounts and Groundwater Production
(Dear El-Balah City as A Case Study)
By
Ihsan Abd Al-Majeed Solaiman Abu Amra
Supervised by
Dr. Ashraf Y. A. Maghari
A thesis submitted in partial fulfillment
of the requirements for the degree of
Master of Information Technology
September/2018
The Islamic University of Gaza
Deanship of Research and Postgraduate Affairs
Faculty of Information Technology
Master of Information Technology
Declaration
I, the undersigned, am the author of the thesis entitled:
Medium-Term Forecasting for Rain Amounts and
Groundwater Production
(Dear El-Balah City as A Case Study)
I declare that the contents of this thesis are the product of my own effort, except where
otherwise referenced, and that this thesis, as a whole or in part, has not previously been
submitted by others for a degree or an academic or research title at any other educational or
research institution.
I understand the nature of plagiarism, and I am aware of the University’s policy on this. The work
provided in this thesis, unless otherwise referenced, is the researcher's own work, and has not been
submitted by others elsewhere for any other degree or qualification.
Student's name: Ihsan Abd Al Majeed Abu Amra
Signature:
Date:
Abstract
Forecasting is a data mining technique that draws on numerous sources of time-series
data to derive value from historical data and helps business decision-makers plan effectively.
Groundwater is the main water source in Gaza, and it is decreasing due to population
growth. The lack of rainfall has created a real water crisis. Moreover, increasing demand for
groundwater combined with reduced rainfall, which is the main source of groundwater recharge,
will lead to the depletion of groundwater wells. As a result, seawater mixes with groundwater and
raises the salinity rate, especially in areas of Gaza where wells are close to the Mediterranean
Sea. Digging wells without governmental control further increases salinity.
Therefore, it is necessary to focus on the relationship between rainfall, which feeds the
groundwater reservoir and reduces its salinity, and the percentages of groundwater production.
In this thesis, we applied forecasting techniques to two real data sets: the
groundwater production amounts obtained from the Ministry of Agriculture and the rain amounts
from the Coastal Municipalities Water Utility (CMWU) of Dear El-Balah City in the Gaza Strip.
The following forecasting algorithms were used: Auto-Regressive Integrated Moving Average
(ARIMA); hybrid ARIMA combined with a Neural Network (NN), with Exponential Smoothing (ETS),
or with the State Space Model with Box-Cox Transformation, ARMA Errors, Trend and Seasonal
Components (TBATS); and ETS on its own. According to the Mean Absolute Percentage Error
(MAPE) measure, the best-performing algorithm on the rainfall data was ARIMA+NN, with
MAPE = 21%. On the other hand, ARIMA was the best algorithm on the wells' production data,
achieving MAPE = 4.9%.
The results show that, after five years, the amounts of rainfall and groundwater
production will decrease by 8.4% and 1.05%, respectively, in comparison with the period from
2013 to 2017. Based on these results, salinity is expected to increase in the coming years, making
the groundwater unusable.
Keywords: Forecasting, Groundwater, Time series, ARIMA, Hybrid ARIMA, ETS, Rainfall.
Abstract (translated from the Arabic)
Forecasting is a data mining technique that draws on numerous sources of time-series data to
derive value from historical data and helps decision-makers plan effectively.
Groundwater is the main source of water in Gaza, and it is decreasing due to population growth.
The lack of rainfall has created a real water crisis. Moreover, rising demand for groundwater and
declining rainfall, which is the main source of groundwater recharge, will lead to the depletion of
groundwater wells. As a result, the mixing of seawater with groundwater raises the salinity rate,
especially in areas of Gaza where wells are close to the Mediterranean Sea. Digging wells without
governmental control increases salinity.
Therefore, it is necessary to focus on the relationship between rainfall, which feeds the
groundwater reservoir and reduces its salinity, and the percentages of groundwater production.
In this thesis, we applied forecasting techniques to two real data sets: the groundwater amounts
obtained from the Ministry of Agriculture and the rain amounts from the Coastal Municipalities
Water Utility of Dear El-Balah City in the Gaza Strip. The following forecasting algorithms were
used: Auto-Regressive Integrated Moving Average (ARIMA); hybrid ARIMA with NN, ETS, and
TBATS; and ETS. The best-performing algorithm on the rainfall data according to the Mean
Absolute Percentage Error (MAPE) measure was (ARIMA + NN), which gave MAPE = 21%. On the
other hand, (ARIMA) was the best algorithm on the wells' production data, achieving MAPE = 4.9%.
The results showed that after five years the amounts of rainfall and groundwater production will
decrease by 8.4% and 1.05%, respectively, compared with the period from 2013 to 2017. Based on
these results, groundwater salinity is expected to increase in the coming years, making the
groundwater unusable.
Keywords: Forecasting, Groundwater, Time series, ARIMA, Hybrid ARIMA, ETS, Rainfall.
Dedication
To my parents who gave me whatever they have to achieve my dreams in life.
To my brothers and sisters who shared with me the burden of life.
To my dear husband and my children, my permanent source of support.
To the Islamic University, which embraced us and helped us graduate and become able to benefit
others with the science of information technology.
To my second home and my students at Palestine Technical College.
To all my friends and those who accompanied me during my studies at the University.
To the Coastal Municipalities Water Utility and the General Directorate of Soil and Irrigation
under the Ministry of Agriculture, which supported me with the necessary data and
information.
To all those I dedicate this work.
Acknowledgment
Thanks to the Almighty Allah for giving me the strength and ability to complete this MA
research.
I would like to express my gratitude to my supervisor, Dr. Ashraf Maghari, for all the
guidance and constructive observations he has given me throughout the research period. I am
proud to be one of his students and to have had the opportunity to work under his supervision.
I also thank my parents for their great effort.
Special thanks to my dear husband for his boundless support to complete this research.
I would like to thank my darling children for being patient and supporting me during my
studies.
Many sincere thanks to my sisters and brothers for supporting me throughout this experience.
Table of Contents
Declaration
Abstract
Abstract (Arabic)
Dedication
Acknowledgment
Table of Contents
List of Abbreviations
List of Figures
List of Tables
Chapter 1 Introduction
1.1. Background and Context
1.2. Statement of the problem
1.3. Objectives
1.3.1. Main objective
1.3.2. Specific objectives
1.4. Importance of the project
1.5. Scope and limitations of the project
1.5.1. Main Scope
1.5.2. Main limitations
1.6. Methodology
1.7. Thesis Outlines
Chapter 2 Literature Review
2.1. Coastal Municipalities Water Utility (CMWU)
2.2. The Ministry of Agriculture (MoA)
2.3. Data Mining Overview
2.4. Forecasting Accuracy measures
2.5. Time Series Analysis
2.7. Forecasting techniques
2.7.1. Auto-Regressive Integrated Moving Average (ARIMA)
2.7.2. Hybrid ARIMA
2.7.2.1 Neural Networks Model (NNs)
2.7.2.1.1 Feed-forward Neural Networks
2.7.2.1.2 Recurrent Networks (RN)
2.7.2.1.3 Stochastic Neural Networks (SNN)
2.7.2.1.4 Modular Neural Networks
2.7.2.2 Exponential Smoothing: the state space models
2.7.2.3 TBATS
Chapter 3 Related Work
3.1 Groundwater and Rainfall forecasting
3.2 Other domains in forecasting research
3.3 Related Work Discussion
3.4 Conclusion
Chapter 4 Methodology and Model Development
4.1. Methodology steps
4.2. Data Collection
4.3. Data Preprocessing
4.4. Selected Models
4.5. Implementation
4.5.1. Tools
4.5.2. Steps of applying models
4.6. Summary
Chapter 5 Experimental Results and Discussion
5.1. Experiment sets
5.2. Data set
5.3. Evaluation forecasting algorithms
5.3.1. Evaluating algorithms over Amount attribute in rain data set
5.3.1.1. ARIMA Evaluation
5.3.1.2. Hybrid ARIMA (ARIMA with Neural Network (NN)) Evaluation
5.3.1.3. Hybrid ARIMA (ARIMA with ETS) Evaluation
5.3.1.4. Hybrid ARIMA (ARIMA with TBATS) Evaluation
5.3.1.5. Exponential Smoothing ETS Evaluation
5.3.2. Comparing the Methods performance of ‘rain amount’
5.3.3. Algorithms evaluation over groundwater production amounts
5.3.3.1. ARIMA Evaluation
5.3.3.2. Hybrid ARIMA (ARIMA+NN) Evaluation
5.3.3.3. Hybrid ARIMA (ARIMA+ETS) Evaluation
5.3.3.4. Hybrid ARIMA (ARIMA+TBATS) Evaluation
5.3.3.5. Exponential Smoothing ETS Evaluation
5.3.4. Comparing Methods accuracy over ‘Wells’ production’
5.4. Forecasting the rain amounts and groundwater production in Dear El-Balah
5.4.1. Deviations of rain amounts and wells’ production
5.4.1.1. Deviations of rain, wells’ production in comparison with the 2017 amounts
5.4.1.2. Deviations of rain, wells’ production comparing with the period (2013-2017)
5.5. Discussion
5.6. Summary
Chapter 6 Conclusion and Future Work
6.1. Summary
6.2. Conclusion
6.3. Future works
Bibliography
Appendix
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Appendix G
Appendix H
Appendix I
Appendix J
List of Abbreviations
AFD French Development Agency.
ANN Artificial Neural Network.
ARIMA Auto Regressive Integrated Moving Average.
CMWU Coastal Municipalities Water Utility.
ETS Exponential Smoothing State Space.
Hybrid ARIMA Auto Regressive Integrated Moving Average with (ANN/ETS/TBATS).
KDD Knowledge Discovery in Databases.
KFW The German Development Bank (KfW).
MAD Mean Absolute Deviation.
MAE Mean Absolute Error.
MAPE Mean Absolute Percentage Error.
MOA Ministry of Agriculture.
MSE Mean Squared Error.
SSA Singular Spectrum Analysis.
SVM Support Vector Machine.
TBATS State Space Model with Box-Cox Transformation, ARMA Errors, Trend and Seasonal Components.
TSFF Time Series Forecasting Framework.
UNRWA United Nations Relief and Works Agency.
List of Figures
Figure (1–1): Data mining Techniques (Deepashri & Kamath, 2017)
Figure (2–1): Steps of Knowledge Discovery (Alasdi and Bhaya, 2017)
Figure (2–2): Time series components (Australian Transport Assessment and Planning, 2016)
Figure (2–3): The classifications of neural network architecture (Sibanda & Pretorius, 2012)
Figure (3–1): Summary of the Most Related Works to this Work
Figure (4–1): Steps of proposed forecasting methodology
Figure (4–2): Monthly wells production data of groundwater
Figure (4–3): Annual rainfall amounts
Figure (4–4): Splitting original datasets into training and testing
Figure (5–1): Applying ARIMA Process by RapidMiner tool
Figure (5–2): ARIMA Evaluation for rain amounts (Actual vs Predicted)
Figure (5–3): Evaluation of rain data using (ARIMA with NN) by R code
Figure (5–4): (ARIMA+NN) Evaluation for rain amount (Actual vs Predicted)
Figure (5–5): Evaluation of rain data using (ARIMA+ETS) R code
Figure (5–6): (ARIMA+ETS) evaluation for rain amount (Actual vs Predicted)
Figure (5–7): Evaluation of rain data using (ARIMA+TBATS) R code
Figure (5–8): (ARIMA+TBATS) evaluation for rain amount (Actual vs Predicted)
Figure (5–9): Evaluation of ETS for rain amounts (Actual vs Predicted)
Figure (5–10): MAPE percentages for forecasting algorithms
Figure (5–11): ARIMA evaluation for wells production (Actual vs Predicted)
Figure (5–12): Hybrid ARIMA (ARIMA+NN) Evaluation for wells production (Actual vs Predicted)
Figure (5–13): (ARIMA+ETS) Evaluation for wells Production (Actual vs Predicted)
Figure (5–14): (ARIMA+TBATS) evaluation for wells Production (Actual vs Predicted)
Figure (5–15): ETS evaluation for wells production (Actual vs Predicted)
Figure (5–16): MAPE percentages for forecasting algorithms for monthly amounts of wells’ production
Figure (5–17): Data representation of groundwater (before re-dividing)
Figure (5–18): Semi-annual data representation of groundwater (after re-dividing)
Figure (5–19): MAPE percentages for forecasting algorithms for semi-annual amounts of wells’ production
Figure (5–20): Five years forecasting Rain amounts for Dear El-Balah city using (ARIMA+NN)
Figure (5–21): Five years groundwater forecasting for semi-annual wells’ production amounts using (ARIMA)
Figure (5–22): The Relationship between rain amounts and groundwater deviation in comparison with 2017
Figure (5–23): Rain amounts deviation (Actual + forecasted) from 2013 to 2022
Figure (5–24): Wells’ production data (Actual + forecasted) from 2013 to 2022
Figure (5–25): Groundwater production data (Actual + forecasted) from 2008 to 2022
Figure (5–26): Rain amounts data (Actual + forecasted) from 1985 to 2022
List of Tables
Table (4-1): Sample of groundwater dataset before preprocessing
Table (4-2): Sample of Rainfall data before preprocessing
Table (4-3): Sample of combining wells production in new data set
Table (4-4): Sample of combining rain amounts of Dear El-Balah city in new data set
Table (4-5): Sample of groundwater time series data set
Table (4-6): Sample of Rain time series data set
Table (4-7): Filling missing value in JUN-13
Table (4-8): Training and testing data periods of datasets
Table (5-1): Sample of the training set of rain data from 1985 to 2010
Table (5-2): Sample of the testing set of rain data from 2011 to 2017
Table (5-3): Sample of the training set of wells data from Jan 2007 to Dec 2017
Table (5-4): Sample of testing set of wells data from Jan 2016 to Dec 2017
Table (5-5): Comparing Methods MAPE over ‘Rain amount’
Table (5-6): Algorithms performance (accuracy) over wells Production
Table (5-7): Dividing the data into semi-annual data
Table (5-8): Algorithms MAPE over wells’ production
Table (5-9): Deviation for groundwater and rain amounts comparing with 2017
Table (5-10): Deviation of rain amounts and groundwater amounts compared to last 5 years
Chapter 1
Introduction
1.1. Background and Context
Water is one of the scarce resources in Palestine. An important reason for
this is the restrictions imposed by the Israeli side on Palestinian institutions regarding
the extraction and use of water. Furthermore, the economic situation in the Gaza Strip
plays a vital role, because financial resources for investing in the water sector are in
short supply, resulting in high rates of poverty (Authority, 2018).
Water resources in Palestine include runoff, surface water, and groundwater.
The share of the Palestinians is around 11%, while the remaining 89% is exploited by Israel
(Barakat & Heackock, 2013).
The Gaza Strip is one of the most densely populated regions on the planet. According
to the latest population statistics, the number of citizens in the Gaza Strip at the
beginning of April 2018 was 2,090,405, including 294,531 citizens in the central
governorate (Ministry of Interior and National Security, 2018). In the 1970s, Israel
physically controlled almost all available sources of fresh water from the Jordan River and
its catchment (including the Golan Heights annexed from Syria) as well as the underground
aquifer water (Al-Shalalfeh, Napier, & Scandrett, 2018).
According to the latest reports and national plans of the Palestinian Water Authority,
the water situation in Gaza is critical (Aiash & Mogheir, 2017). The only
source of water is the ground aquifer. The water level is decreasing as the
demand for water increases for many different uses. The Coastal Aquifer receives
a yearly recharge of 55-60 MCM/y, mainly from rainfall, while the yearly extraction
from the aquifer is about 200 MCM. The groundwater level is falling because of
unsustainably high rates of extraction, accompanied by the gradual intrusion of seawater
and the upconing of the underlying saline groundwater. These conditions made the aquifer
unusable in 2016, and the damage will be irreversible by 2022.
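The imbalance described above can be made concrete with a short calculation. The following is only an illustrative sketch that uses the recharge and extraction figures quoted in this section; the variable names are hypothetical and the code is not part of the thesis implementation.

```python
# Annual water balance of the Coastal Aquifer, using the figures quoted above.
recharge_low_mcm = 55.0    # lower bound of yearly recharge (MCM/y)
recharge_high_mcm = 60.0   # upper bound of yearly recharge (MCM/y)
extraction_mcm = 200.0     # approximate yearly extraction (MCM/y)

# Deficit = extraction minus recharge; its range follows from the recharge bounds.
deficit_min = extraction_mcm - recharge_high_mcm
deficit_max = extraction_mcm - recharge_low_mcm

# How many times larger the extraction is than the recharge.
ratio_min = extraction_mcm / recharge_high_mcm
ratio_max = extraction_mcm / recharge_low_mcm

print(f"Yearly deficit: {deficit_min:.0f} to {deficit_max:.0f} MCM")
print(f"Extraction is {ratio_min:.1f} to {ratio_max:.1f} times the recharge")
```

Under these figures the aquifer loses roughly 140 to 145 MCM more than it receives each year, that is, extraction is more than three times the natural recharge.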
The classical coastal aquifer of Gaza represents the sole water source of the Gaza
Strip, which covers an area of 360 km², with a total recharge of nearly 60 MCM/y. Due to
over-pumping (Daibes-Murad1, 2004), the Gaza aquifer is threatened by seawater and saline
groundwater intrusion.
Rainwater is the only renewable source that feeds the Coastal Aquifer in the Gaza Strip,
which matters all the more because of the increased reliance on groundwater by official
institutions such as municipalities. Maintaining water levels in the wells is very
important, taking into account the amounts of rainfall, in order to keep salinity and other
minerals in groundwater under control (Aiash & Mogheir, 2017).
Increasing consumption of groundwater and decreasing rainfall will lead to the
depletion of groundwater wells and thus increase the salinity rate, as seawater enters
and mixes with groundwater, especially in areas where wells are close to the
Mediterranean Sea (Mushtaha & Al-Louh, 2013).
The Gaza Strip is one of the most water-scarce areas in the region. Groundwater
is the only water resource for the Palestinians in the Gaza Strip and provides more
than 90% of all water supplies (Report & Water, 2013). The growing deficit in the water
budget leads to a decline in the quantity and quality of groundwater.
Data mining extracts the most important and relevant information from
large data sets and processes it to serve particular objectives. Data mining leads to either a
descriptive model or a predictive model. The descriptive model presents summary
information and finds patterns in the data and relationships between their attributes. It
includes tasks such as sequence discovery, association rules, clustering, and
summarization. The predictive model predicts data values based on values taken from
various datasets. It includes classification, prediction, regression, and time series
analysis, as shown in Figure (1–1).
Figure (1–1): Data mining Techniques (Deepashri & Kamath, 2017)
Forecasting means understanding and recognizing the past behavior of an
attribute in order to predict its future behavior or pattern. It is one of the oldest and
best-known techniques of predictive analytics, and it is based on making predictions from
historical data. Forecasting is also one of the well-known techniques of time series data
analysis. It is used to predict future trends in retail sales, water demand, weather,
economic indicators, stock markets, and many other applications (Aggarwal, 2015).
A reliable forecast of groundwater consumption is necessary for the proper
planning of groundwater demand and the management of water resources. Depending on the
forecasting time horizon, forecasts can be classified, as suggested by (Jones, 2011),
into: (1) long-term, which covers decades; (2) medium-term, which covers years to
a decade; and (3) short-term, which covers up to a year.
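As a small illustration, the classification above can be written as a function of the forecast horizon in years. This is a sketch only: the function name is hypothetical, and the exact cut-offs (1 year and 10 years, both inclusive) are assumptions chosen to match the wording above rather than boundaries stated by the cited source.

```python
def classify_horizon(years: float) -> str:
    """Classify a forecast horizon given in years.

    Assumed cut-offs: up to 1 year is short-term, more than 1 and up to
    10 years is medium-term, anything longer is long-term.
    """
    if years <= 0:
        raise ValueError("forecast horizon must be positive")
    if years <= 1:
        return "short-term"
    if years <= 10:
        return "medium-term"
    return "long-term"


# The five-year horizon used in this thesis falls in the medium-term class.
print(classify_horizon(5))
```

With these assumed boundaries, the five-year forecast targeted in this work is a medium-term forecast, which agrees with the thesis title.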
As the previous introduction shows, the population of Gaza in particular is growing
over time, while water resources are limited, either because of low annual rainfall amounts
or because of Israeli policies toward the Palestinian Authority regarding underground
water resources.
Therefore, accounting for these factors by using forecasting in future water service
projects is a very important issue for mayors of municipalities and decision makers. It
allows the consumption of groundwater to be directed by planning around the salinity of
groundwater and the anticipated rainfall amounts, which reduce the salinity. Moreover, it
supports managing the consumption of groundwater by coordinating the digging of wells by
citizens.
CMWU has time series data for the groundwater wells in Dear El-Balah and their
monthly production amounts over the years. The Ministry of Agriculture (Palestine, 2017)
also has large time series data sets covering all its services in the city. Hundreds of
records have been collected. These data should therefore be used scientifically in
planning, to determine the relationship between groundwater and rainfall in the future.
In our research, we collected two different historical data series: the first is a
dataset of groundwater wells from CMWU in Dear El-Balah (10 years: 2008-2017), and
the second is historical rainfall amounts since 1985 (32 years) from the Palestinian
Ministry of Agriculture. The data were prepared for applying forecasting algorithms in
order to choose the most accurate algorithm for predicting the next five years. Hence,
useful results will be presented to managers and specialists to give them a clear view of
water consumption in the near future.
1.2. Statement of the problem
The Coastal Municipalities Water Utility (CMWU) has monthly data on groundwater
consumption amounts, and the Ministry of Agriculture has monthly rainfall amounts.
Both are historical data and have not been used in future planning for
groundwater management. In addition, the Water Authority and the Ministry of Agriculture
give citizens permits to dig underground wells, which increases the consumption of
groundwater and leaves it uncontrolled. Moreover, Israel has acquired 89% of the water
resources in the Gaza Strip.
Furthermore, the relationship between the rainfall amounts, which feed the
groundwater reservoir and thereby reduce its salinity, and the groundwater production has
not been estimated.
On the other hand, the Gaza Strip overlooks the Mediterranean Sea, making it more prone
to salinity. The increased consumption of groundwater leads to the intrusion of
seawater to replace the depleted groundwater, and the resulting imbalance between freshwater
and saline water is making the groundwater in this area unusable.
Forecasting science is not used by municipalities or the Ministry of Agriculture
to estimate the future need for groundwater, the amounts of rainfall, and their relation
to salinity (more well production leads to a higher salinity percentage if rainfall rates
remain stable). Forecasting will give decision makers and funders of the water sector a vision
of water demand, particularly in the Gaza Strip, where the political and
economic situation is unstable.
1.3. Objectives
1.3.1. Main objective
To apply several forecasting algorithms to two time series datasets, groundwater
production and rainfall amounts, for a given area (Dear El-Balah city as a case
study), and then to select the most accurate algorithm to forecast groundwater production
and rainfall amounts over the next 5 years.
1.3.2. Specific objectives
1. To aggregate data from different resources: the CMWU and the Ministry of
Agriculture;
2. To analyze, understand and preprocess the data;
3. To convert the datasets into time series form;
4. To model the data by implementing data mining forecasting techniques;
5. To evaluate the algorithms by measuring the accuracy of the findings;
6. To select the most accurate algorithm to forecast rainfall and groundwater
consumption for our case study;
7. To describe the relationship between the groundwater reservoir and rainfall in the
coming five years based on the forecasting algorithms.
1.4. Importance of the study
Forecasting time series data, especially for groundwater consumption and rainfall, is
very important. It guides the annual pumping of groundwater. Moreover, it helps in making
future plans based on the forecasts of groundwater production and of rainfall rates, which
are the main source of groundwater recharge and thus reduce the salinity ratio. It also helps
in managing the consumption of groundwater through the coordination of the digging of wells
by citizens. The importance of our proposed model stems from:
1. Being helpful for other cities to benefit from this approach to estimating future
consumption of groundwater.
2. Being applicable to other utilities such as electricity, gas and oil.
3. Helping both the CMWU and the Ministry of Agriculture in forecasting groundwater
and rainfall amounts for efficient decision making.
4. Providing means of reducing groundwater consumption such as water desalination
and getting the utmost benefit from the rainfall.
5. Reducing the salinity of the groundwater reservoir resulting from increased
pumping.
6. Making plans that depend on real historical time series data in order to monitor
the future situation of the groundwater and to coordinate the drilling of wells by
citizens.
7. Raising the awareness of those funding water projects.
8. Avoiding the depletion of the available groundwater reservoir.
9. Taking into account that the groundwater reservoir is affected by temporal and
spatial differences in precipitation.
10. Encouraging the search for new sources of groundwater recharge, such as the
collection of rainwater from the roofs of buildings and covered areas.
11. Presenting suitable suggestions and recommendations for decision and policy
makers and supervisory bodies such as the CMWU, the MoA and the Water Authority.
1.5. Scope and limitations of the project
1.5.1. Main Scope
The main scope of this study is the analysis and forecasting of demand for
groundwater and rainwater in Dear El-Balah city for 5 years in advance, especially given
the political and economic situation under the rule of the State of Israel. The
research is limited to the management of groundwater consumption by customers and to
rainfall water, and it focuses on historical data of Dear El-Balah Municipality.
1.5.2. Main limitations
1. Monthly groundwater consumption data were aggregated from 2008 to 2017, and the
rainfall dataset covers 1985 to 2017 (about 32 years).
2. Medium-term forecasting methods are used for the next 5 years.
3. Our work focuses only on forecasting groundwater production quantities at Dear El-Balah
and forecasting rainfall quantities.
4. Wells dug by citizens are not included in the study.
5. Salinity equations, salinity percentages in wells and pollution percentages are not
included in the study.
6. Evaporation rates of rainwater and population increase are also not included.
1.6. Methodology
The methodology of this research consists of the following five phases:
Phase 1: Data Collection and Acquisition:
This thesis depends on historical data collected from two institutions: the CMWU
and the MoA in Dear El-Balah city. The data collected from the CMWU cover the
monthly groundwater production since 2007, and the data from the MoA cover the rainfall
amounts since 1985. We selected the tables and columns relevant to our study
and prepared them for the next phase.
Phase 2: Data Preprocessing:
The required preprocessing tasks were applied to improve data quality before applying
the forecasting algorithms. Preprocessing includes several techniques, e.g. integration,
reduction and the handling of missing values. We used Microsoft Excel 2016 for
preprocessing.
Phase 3: Implementation:
After preprocessing and preparing the real datasets in the form of time series data, we
obtained two datasets: the first contains the monthly well production from January 2008 to
December 2017 and the second contains the annual rainfall amounts from 1985 to 2017.
Five forecasting algorithms were run on these datasets: (1) ARIMA, (2) ETS,
(3) ARIMA+NN, (4) ARIMA+ETS and (5) ARIMA+TBATS. According to
their results, the algorithm with the lowest computed MAPE value is
selected for the real forecasting from 2018 to 2022.
Phase 4: Evaluating the algorithms:
The performance of the five selected forecasting algorithms was evaluated for
each dataset separately using the MAPE measure. We computed the MAPE value between the
predicted results and the actual values in the testing set.
Then, we compared the MAPE values of the forecasting algorithms. The algorithm with
the lowest MAPE value is considered the most accurate and is selected for the
actual forecasting task.
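The selection rule described in this phase can be sketched as follows. This is a minimal illustration only, not the implementation used in the thesis: the two candidate forecasters (`naive_last`, `naive_mean`) are hypothetical placeholders standing in for ARIMA, ETS and the hybrid models, and the series values are made up.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)

def naive_last(train, horizon):
    # Hypothetical placeholder model: repeat the last observed value.
    return np.full(horizon, train[-1], dtype=float)

def naive_mean(train, horizon):
    # Hypothetical placeholder model: repeat the mean of the training data.
    return np.full(horizon, np.mean(train), dtype=float)

def select_best(series, horizon, candidates):
    """Hold out the last `horizon` points as the testing set, score each
    candidate by MAPE on it, and return (name of best candidate, all scores)."""
    series = np.asarray(series, dtype=float)
    train, test = series[:-horizon], series[-horizon:]
    scores = {name: mape(test, fn(train, horizon)) for name, fn in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Made-up illustrative series, not the thesis data.
series = [100, 110, 120, 130, 140, 150, 160, 170]
best, scores = select_best(series, horizon=2,
                           candidates={"last": naive_last, "mean": naive_mean})
print(best, scores)
```

In practice each placeholder would be replaced by a function that fits the corresponding model on the training part and returns its forecasts for the held-out period.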
Phase 5: Forecasting:
After selecting the most accurate algorithm, we forecasted both the groundwater
production and rain amounts for the next five years in Dear El-Balah city.
1.7. Thesis Outline
The research is divided into six chapters. Chapter one includes the introduction, the
statement of the research problem, the objectives, the scope and limitations of the research
and the research methodology. Chapter two provides the literature review. Chapter three
summarizes related works associated with our research. Chapter four explains the methodology
and model development. Chapter five provides the experimental results together with their
analysis and discussion. The last one, chapter six, presents the conclusion and
future work. The reference list and the appendices are at the end of the thesis.
Chapter 2
Literature Review
Many studies and scientific papers discuss forecasting, which has received
attention over time.
In this chapter we describe the CMWU and the MoA as the sources of our datasets and give
an overview of data mining, forecasting accuracy measures, time series analysis, and data
mining versus statistics. We then summarize the forecasting methods and algorithms used in
the thesis: ARIMA, ETS, and hybrid ARIMA (ARIMA combined with ETS, TBATS or NN).
2.1. Coastal Municipalities Water Utility (CMWU)
The CMWU was established in 1995 to be the first national and professional institution
under which all the water and sewage directorates work in municipalities in the Gaza Strip.
It runs four water utilities; three in the West Bank and one in Gaza. CMWU is funded by
UNRWA, the Red Cross and Red Crescent, the German Development Bank (KFW), the
French Development Agency (AFD) and other national and international institutions.
The CMWU has carried out several tasks to serve the water sector. It established 45
water wells in Gaza and desalination units on some wells, developed water services in the
Gaza Strip and rebuilt what the war destroyed at a cost of 6 million dollars. It also
repaired all water wells, sewage lines and the treatment plant in Gaza City.
Its purpose is to provide residents of the Gaza Strip with integrated,
environmentally sound and safe water and sanitation services through optimal utilization
of available resources and innovative solutions (Coastal Municipalities Water Utility,
2015).
2.2. The Ministry of Agriculture (MoA)
The MoA aims to achieve food security and contribute to improving the quality of life for
farmers and Palestinian citizens. In addition, it works to develop the agricultural sector
through several tasks. The following are the most important ones:
• Optimizing the use of agricultural resources, especially land and water, efficiently and
effectively, and contributing to food security.
• Increasing and improving the competitiveness of agricultural production in
local and foreign markets and introducing new agricultural varieties.
• Enabling the private sector to play its role easily in enhancing the agricultural and
rural development process.
In our research on Dear El-Balah city as a case study, we collected related
data from the Coastal Municipalities Water Utility and from the General Directorate of Soil
and Irrigation (GDSI), which is affiliated with the Ministry of Agriculture. The GDSI monitors
precipitation and prepares rainfall reports. We applied the forecasting algorithms so that
the results can support decision makers and the development of future plans to serve
the water sector, especially given the increase in population.
2.3. Data Mining Overview
The process of extracting valuable information from different data sources is called
Knowledge Discovery in Databases (KDD). Data mining is one step of KDD, in which
models are built and huge datasets are analyzed using data mining techniques,
e.g. classification, clustering and association rules (Alasdi and Bhaya, 2017).
Figure (2–1):Steps of Knowledge Discovery (Alasdi and Bhaya, 2017)
Figure (2–1) illustrates the steps of the knowledge discovery process.
First, the data must be selected to determine the target data. Then the data are transformed
into a suitable form for mining by performing preprocessing or aggregation
operations on them. After that, the core mining step applies intelligent methods such as
classification, clustering and association to the data to extract patterns. Pattern
evaluation then identifies the truly interesting patterns based on some specific measures. The
final step is called knowledge presentation. It aims to visualize the mined knowledge in a
form that a human can understand (Han, Kamber, & Pei, 2012).
2.4. Forecasting Accuracy measures:
The most common metrics used to measure accuracy in forecasting are Mean
Absolute Error (MAE), Root Mean Squared Error (RMSE) and the Mean Absolute
Percentage Error (MAPE). We used MAPE as the measure of accuracy of the forecasting
methods in groundwater and rainfall prediction. MAPE expresses accuracy as a
percentage and is defined by the following formula:
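$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{\mathrm{Actual}_i - \mathrm{Predicted}_i}{\mathrm{Actual}_i}\right| \times 100\% \qquad \text{(Equation 2-1)}$$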
where $\mathrm{Actual}_i$ is the actual value and $\mathrm{Predicted}_i$ is the forecast value at point i.
The difference between the actual and the predicted value is divided by the actual value. The
absolute value of this ratio is summed over every forecasted point in time and divided by the
number of fitted points N. Multiplying by 100% makes it a percentage error.
MAPE is sometimes also known as the Mean Absolute Percentage Deviation (MAPD)
(Kaytez et al., 2015).
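As a small worked example of the MAPE calculation, assume three made-up test points (not thesis data) with actual values 100, 200 and 400 and predicted values 110, 180 and 400:

$$\mathrm{MAPE} = \frac{1}{3}\left(\left|\frac{100-110}{100}\right| + \left|\frac{200-180}{200}\right| + \left|\frac{400-400}{400}\right|\right)\times 100\% = \frac{0.10 + 0.10 + 0}{3}\times 100\% \approx 6.67\%$$

Each term is scaled by its own actual value, so an error of 10 units on a value of 100 counts as much as an error of 20 units on a value of 200. The measure is undefined when an actual value is zero.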
2.5. Time Series Analysis
Time series is an ordered sequence of values recorded at specific time intervals. In
time series analysis, we analyze the past behavior of a variable in order to predict its future
behavior. In other words, these data points are used to forecast the future.
A usual approach in analyzing time series is to decompose the series into four components
(Seymour, 2014). These components are illustrated in Figure (2–2):
1. Trend component – The trend is the decrease or increase in the series that persists over a
long period of time.
Example: Growth in births over time can be seen as an upward trend. The variable thus exhibits
its general movement during the observation period without taking irregularities and
seasonality into account. If a time series does not show a decreasing or an increasing pattern,
then the series is stationary in the mean.
2. Seasonality component – It occurs when the time series exhibits regular fluctuations during
the same period (month) every year, or during the same quarter every year.
Example: Sales of gifts increase during the Christmas season.
3. Random component – It is what remains after the trend-cycle and seasonal components have
been removed: short-term fluctuations in the series that are not systematic and, in some
instances, not predictable.
4. Cyclic component – When data exhibit rises and falls that are not of a fixed period, the
pattern is cyclic. These fluctuations usually last at least 2 years.
Figure (2–2): Time series components(Australian Transport- ASSessment and
Planning, 2016)
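The decomposition described above can be sketched in a few lines. This is a minimal illustration of classical additive decomposition under stated assumptions, not the procedure used in the thesis: the series is assumed to be the sum of its components, the trend and cyclic components are estimated together as one trend-cycle by a centered moving average, and the function name `decompose_additive` and the synthetic monthly series are hypothetical.

```python
import numpy as np

def decompose_additive(y, period):
    """Classical additive decomposition: y = trend + seasonal + remainder."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Centered moving average; an even period needs a 2 x period average
    # (half weight on the two end points) to stay centered.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # undefined at both ends of the series
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    # Seasonal component: mean detrended value at each position in the
    # cycle, shifted so that the seasonal effects sum to zero.
    detrended = y - trend
    effects = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    effects -= effects.mean()
    seasonal = np.tile(effects, n // period + 1)[:n]
    remainder = y - trend - seasonal
    return trend, seasonal, remainder

# Synthetic monthly series: linear trend plus a yearly seasonal pattern.
t = np.arange(48)
y = 10 + 0.5 * t + 3 * np.sin(2 * np.pi * t / 12)
trend, seasonal, remainder = decompose_additive(y, period=12)
```

For this noise-free series the estimated trend recovers the straight line and the remainder is essentially zero wherever the trend is defined; with real data the remainder holds the random component.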
Time series analysis can be classified as linear or non-linear and as univariate or
multivariate. A univariate time series contains a single
variable recorded sequentially over time, for example, hourly energy consumption.
A multivariate time series is used when a group of time series variables is involved and their
interactions are to be considered (Deb et al., 2017).
Time series forecasting differs from time series analysis in that it uses a model to
predict future values based on previously observed values. Time series analysis, in turn,
comprises methods for analyzing time series data in order to extract useful statistics
and other characteristics of the data.
Ullah (2014) classifies the major objectives of time series analysis as:
1. Description: plotting the data and looking for trends, seasonal fluctuations and so on;
2. Explanation: using the variation in one time series to explain the
variation in another series, when observations are taken on two or more variables;
3. Prediction: predicting the future values of the series;
4. Control: control procedures are of several different kinds, e.g. measuring the quality of
a manufacturing process through its time series.
2.6. Forecasting techniques:
2.6.1. Auto-Regressive Integrated Moving Average (ARIMA)
ARIMA stands for auto-regressive integrated moving average (Dalinina,
2017) and is specified by three order parameters: (p, d, q). The process of fitting
an ARIMA model is sometimes referred to as the Box-Jenkins method.
An autoregressive (AR(p)) component refers to the use of past values in the
regression equation for the series Y. The auto-regressive parameter p specifies the
number of lags used in the model. For example, AR(2) or,
equivalently, ARIMA(2,0,0) is represented as
$$y_t = c + \varphi_1 y_{t-1} + \varphi_2 y_{t-2} + e_t \qquad \text{(Equation 2-2)}$$
where $\varphi_1$ and $\varphi_2$ are parameters of the model, c is a constant and $e_t$ is the
error term. The d represents the degree of differencing in the integrated (I(d)) component.
Differencing a series simply involves subtracting the previous value from the current
value, repeated d times. Differencing is often used to
stabilize the series when the stationarity assumption is not met. This will be discussed
below.
A moving average (MA(q)) component represents the error of the model as a
combination of previous error terms $e_t$. The order q determines the number of terms
included in the model.
$$y_t = c + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \cdots + \theta_q e_{t-q} + e_t \qquad \text{(Equation 2-3)}$$
Differencing, autoregressive, and moving average components make up a non-seasonal
ARIMA model, which can be written as a linear equation:
$$y^{(d)}_t = c + \varphi_1 y^{(d)}_{t-1} + \cdots + \varphi_p y^{(d)}_{t-p} + \theta_1 e_{t-1} + \cdots + \theta_q e_{t-q} + e_t \qquad \text{(Equation 2-4)}$$
where $y^{(d)}$ is the series y differenced d times and c is a constant.
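The differencing and autoregressive parts of the model can be illustrated with a short sketch. It is deliberately partial and rests on stated assumptions: it covers only differencing and an AR(p) fit by ordinary least squares, and it omits the moving average terms, which require iterative estimation. The helper names `difference` and `fit_ar` and the numbers are hypothetical, not taken from the thesis.

```python
import numpy as np

def difference(y, d=1):
    """Difference the series d times: each pass replaces y_t by y_t - y_{t-1}."""
    return np.diff(np.asarray(y, dtype=float), n=d)

def fit_ar(y, p):
    """Least-squares estimate of c and phi_1..phi_p in
    y_t = c + phi_1*y_{t-1} + ... + phi_p*y_{t-p} + e_t."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # The row for time t holds [1, y_{t-1}, ..., y_{t-p}].
    lags = [y[p - k:n - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef[0], coef[1:]

# Differencing removes a polynomial trend: squares need two passes.
squares = [1, 4, 9, 16]
print(difference(squares, d=1))  # first differences
print(difference(squares, d=2))  # second differences are constant

# Noise-free AR(2) series generated with c = 0.5, phi = (0.6, -0.2).
ar = [1.0, 2.0]
for _ in range(10):
    ar.append(0.5 + 0.6 * ar[-1] - 0.2 * ar[-2])
c_hat, phi_hat = fit_ar(ar, p=2)
print(c_hat, phi_hat)
```

Because the example series has no noise, the fit recovers the generating coefficients; on real data the estimates would only approximate them.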
ARIMA models can also be specified through a seasonal structure. In this case, the
model is specified by two sets of order parameters: (p, d, q) as described above
and (P, D, Q)m parameters describing the seasonal component of m periods.
ARIMA methodology has its limitations. These models directly rely on past values,
and therefore work best on long and stable series. It is also noticed that ARIMA
simply approximates historical patterns and therefore does not aim to explain the
structure of the underlying data mechanism.
2.6.2. Hybrid ARIMA
ARIMA models are important algorithms for time series data. They assume a linear
relationship and need a lot of data to produce accurate results. They are capable of
representing stationary as well as non-stationary time series (Gahirwal, 2013).
Forecasting performance can be enhanced by combining several different models.
Combining different models can increase the chance of discovering more hidden
patterns and relationships in the data, improve forecasting performance, reduce the risk of
using an inappropriate model, reduce the risk of failure, yield more accurate results and
overcome the limitations of each component (Khashei & Bijari, 2011).
Combining ARIMA models with other models, often non-linear prediction
algorithms, can therefore give better estimates than single time series models.
ARIMA models can be hybridized with intelligent techniques for time series
prediction. These techniques are: ETS, TBATS, NN and stlm.
Both artificial neural networks (ANNs) and ARIMA models have achieved
success in their own nonlinear or linear domains. Neither of them is a universal model that is
suitable for all circumstances.
2.6.2.1 Neural Networks Model (NNs)
NNs are nonlinear models that can be used for time series forecasting. They are
a highly simplified model of the structure of the biological neuron. NNs consist of: 1.
processing units; 2. interconnections; 3. operations; 4. updates. An NN has several
processing units that are connected with each other according to some topology. Each
processing unit takes its input signal either from the outside world or from the outputs of
other processing units to which it is connected. There are two learning procedures for
training NNs: the first is supervised learning and the second is unsupervised learning, and
many types of NNs exist for each (Egnanarayana, 2005). Figure (2–3) shows the classification
of NN models into feed-forward NNs, recurrent NNs, stochastic NNs and modular NNs.
Figure (2–3): The classifications of neural network architecture(Sibanda & Pretorius,
2012)
2.6.2.1.1 Feed-forward Neural Networks
They are the simplest type of ANNs in which the information moves in only one
direction without loops in the network; forward from the input nodes through the hidden
nodes and to the output nodes. Single layer perceptron (SLP) and multi-layer perceptron
(MLP) are examples of feed-forward NNs.
2.6.2.1.2 Recurrent Networks (RN):
These models have a bi-directional data flow. An RN propagates data from later
processing stages back to earlier stages (Bitzer & Kiebel, 2012), so the data do not
propagate linearly from input to output as in a feed-forward network.
2.6.2.1.3 Stochastic Neural Networks (SNN):
The SNN differs from a typical neural network in describing a practical system
more accurately and introducing random variations into the network(Yang, Zhang, & Shi,
2010).
2.6.2.1.4 Modular Neural Networks
A Modular Neural Network (MNN) is a Neural Network (NN) that consists of
several modules, each module carrying out one sub-task of the NN's global task, and all
modules are functionally integrated. Each module can be either a sub-structure or a
learning sub-procedure of the whole network. The network's global task can be any neural
network application, e.g., mapping, function approximation, clustering or
associative memory application (Auda & Kamel, 1999). In the MNN the modules do not
interact with each other. A modular neural network has many benefits, one of which is the
ability to reduce a large neural network to smaller, more manageable components (Azam,
2000).
Hybrid ARIMA (ARIMA+NN) is one of the best-known hybrid models for
forecasting tasks. The motivation for it comes from the following perspectives
(Khashei & Bijari, 2011): (1) It is often difficult to determine whether a time
series is generated by a linear or a nonlinear underlying process; combining
linear ARIMA and nonlinear ANN models eases this model selection problem. (2) Real-world
time series are rarely purely linear or purely nonlinear and often contain both linear and
nonlinear patterns, so neither ARIMA nor ANN models alone may be adequate; combining
linear ARIMA and nonlinear ANN models offers a solution to the problem of modeling the
combined linear and nonlinear autocorrelation structures. (3) It is widely agreed in
forecasting science that no single model is best in every situation, because real-world
problems are complex in nature and any single model may not be able to capture
different patterns equally well. Combining different models therefore makes it possible to
capture different patterns in the data.
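One common way to formalize perspective (2) in the hybrid ARIMA-ANN literature is an additive split into a linear and a nonlinear part. The notation below ($L_t$, $N_t$, $f$, $n$) is introduced here for illustration and is an assumption; the hybrid models evaluated in this thesis may combine their components differently, for example by averaging the component forecasts.

$$y_t = L_t + N_t$$

An ARIMA model is first fitted to the series to obtain the linear forecast $\hat{L}_t$, which leaves the residuals

$$e_t = y_t - \hat{L}_t .$$

A neural network $f$ is then trained on the lagged residuals to capture the remaining nonlinear structure:

$$e_t = f(e_{t-1}, e_{t-2}, \ldots, e_{t-n}) + \varepsilon_t, \qquad \hat{N}_t = f(e_{t-1}, \ldots, e_{t-n})$$

and the combined forecast is the sum of the two parts:

$$\hat{y}_t = \hat{L}_t + \hat{N}_t .$$

If the ARIMA residuals contain no nonlinear pattern, $\hat{N}_t$ stays close to zero and the hybrid forecast reduces to the ARIMA forecast.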
2.6.2.2 Exponential Smoothing (ETS)- the state space models:
Exponential smoothing is also known in the forecasting community as ETS. ETS is part
of the smooth package, which aims to gather all the essential smoothing techniques used in
forecasting. There are statistical models that underlie the exponential smoothing methods.
Each model consists of a measurement equation that describes the observed data and
some transition equations that describe how the unobserved components or states (level,
trend, seasonal) change over time. Hence these are referred to as "state space models"
(Rob J Hyndman, 2018).
The idea of exponential smoothing is to smooth the original series the way the
moving average does and to use the smoothed series in forecasting future values of the
variable of interest. In exponential smoothing, however, we want to allow the more recent
values of the series to have a greater influence on the forecast of future values than the
more distant observations.
Exponential smoothing is a simple and pragmatic approach to forecasting in which the
forecast is constructed from an exponentially weighted average of past observations. The
most recent observation is given the largest weight, the immediately preceding observation
less weight, and the observation before that less still (exponential decay of the influence
of past data).
Exponential smoothing of time series data (Horse, 2018) assigns exponentially
decreasing weights from the newest to the oldest observations. In other words, the older the
data, the less priority ("weight") it is given; newer data is seen as more relevant and
is assigned more weight. Usually, smoothing parameters (smoothing constants), denoted
by α, determine the weights for the observations.
In this thesis we consider the simplest form of exponential smoothing, called
simple exponential smoothing (ETS), which is suitable for forecasting data with no clear
trend or seasonal pattern. Its popularity can be attributed to its simplicity, its
computational efficiency, the ease of adjusting its responsiveness to changes in the process
being forecast, and its reasonable accuracy (Hyndman, Koehler, Snyder, & Grose, 2002).
Simple (single) exponential smoothing uses (Stephanie, 2018) a weighted moving average
with exponentially decreasing weights. The basic formula for Simple Exponential
Smoothing is:
$$S_t = \alpha\, y_{t-1} + (1 - \alpha)\, S_{t-1} \qquad \text{(Equation 2-5)}$$
Where:
• α = the smoothing constant, a value from 0 to 1. When α is near to zero, smoothing
occurs more slowly. Following this, the best value for α is the one that results in the
smallest mean squared error (MSE). Various ways exist to do this.
• t = time period.
Many alternative formulas exist. For instance, Roberts (1959) replaced $y_{t-1}$ with the
latest observation, $y_t$. Another formula uses the forecast for the preceding and present
period:
$$F_t = F_{t-1} + a\,(A_{t-1} - F_{t-1}) = a\,A_{t-1} + (1 - a)\,F_{t-1} \qquad \text{(Equation 2-6)}$$
Where:
• $F_{t-1}$ = forecast for the previous period,
• $A_{t-1}$ = actual demand for the previous period,
• a = weight (between 0 and 1). The closer a is to zero, the smaller the weight given to
the latest actual value.
Which formula to use is usually a moot point, as most exponential smoothing is
performed using software. Whichever formula you use though, you’ll have to set an initial
observation. This is a judgment call. You could use an average of the first few
observations, or you could set the second smoothed value equal to the original observation
value to get the ball rolling.
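The recursion in Equation 2-5 can be sketched as follows. This is a minimal illustration, not the software used in the thesis; the function name `simple_exponential_smoothing`, the choice of the first observation as the default initial value and the demand figures are all assumptions made for the example.

```python
def simple_exponential_smoothing(y, alpha, initial=None):
    """Apply S_t = alpha * y_{t-1} + (1 - alpha) * S_{t-1}.

    Returns a list s of length len(y) + 1, where s[0] is the initial value
    and s[t] is the smoothed value (forecast) for period t. The last entry
    is the forecast for the first period after the data.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    y = [float(v) for v in y]
    # Judgment call: start from the first observation unless told otherwise.
    s = [y[0] if initial is None else float(initial)]
    for t in range(1, len(y) + 1):
        s.append(alpha * y[t - 1] + (1.0 - alpha) * s[t - 1])
    return s

# Made-up demand values for illustration.
demand = [10, 20, 30]
smoothed = simple_exponential_smoothing(demand, alpha=0.5)
print(smoothed)  # last entry is the forecast for the next period
```

With α = 1 each smoothed value equals the previous observation, while values of α near zero make the smoothed series react slowly to new data. In practice α would be chosen to minimize an error measure such as the MSE on the historical data.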
ETS is available for time series through the FORECAST.ETS function in Microsoft
Excel 2016 (Microsoft, 2018). The FORECAST.ETS function computes (predicts) a future value
by using existing values. It has the following syntax:
FORECAST.ETS(target_date, values, timeline, [seasonality], [data_completion],
[aggregation]). This function can be used to predict future sales, consumer trends and
inventory requirements.
The FORECAST.ETS function syntax has the following arguments:
• Target_date: Required. The data point for which you want to predict a value.
The target date can be date/time or numeric.
• Values: Required. The historical values for which you want to
forecast the next points.
• Timeline: Required. The independent array or range of numeric data. The
dates in the timeline must have a consistent step between them, and the step can't be zero.
The timeline isn't required to be sorted, as FORECAST.ETS will sort it implicitly for
calculations.
• Seasonality: Optional. A numeric value. The default value of 1 means Excel
detects seasonality automatically for the forecast and uses positive, whole numbers for the
length of the seasonal pattern. 0 indicates no seasonality, meaning the prediction will be
linear.
• Data_completion: Optional. Although the timeline requires a constant step
between data points, FORECAST.ETS supports up to 30% missing data and will
automatically adjust for it. A value of 0 makes the algorithm treat missing points as zeros;
the default value of 1 completes missing points as the average of the neighboring points.
• Aggregation: Optional. FORECAST.ETS will aggregate multiple points
which have the same time stamp. The aggregation parameter is a numeric value indicating
which method will be used to aggregate several values with the same time stamp. The
default value of 0 will use AVERAGE, while other options are SUM, COUNT, COUNTA,
MIN, MAX, MEDIAN.
2.6.2.3 State Space Model with Box-Cox Transformation, ARMA
Errors, Trend and Seasonal Components (TBATS)
The TBATS model (de Livera, 2011) is a time series model for series exhibiting
multiple complex seasonalities. TBATS is an acronym for all the techniques used to create
the model.
• T for trigonometric regressors to model multiple seasonalities
• B for Box-Cox transformations
• A for ARMA errors
• T for trend
• S for seasonality
The BATS model is similar to the TBATS model, which is a generalization of
BATS, except that it lacks the trigonometric regressors. In the forecast package for R, the
TBATS model can be fitted using the tbats() function.
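The "B" in TBATS refers to the Box-Cox transformation, which can be sketched on its own. This is only an illustration of that one ingredient under stated assumptions, not an implementation of TBATS: the helper names `box_cox` and `inverse_box_cox` are hypothetical, the data must be strictly positive, and the parameter lam (λ) is chosen by hand here, whereas fitting software typically estimates it.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform: log(y) if lam == 0, else (y**lam - 1) / lam."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam

def inverse_box_cox(z, lam):
    """Map transformed values back to the original scale."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        return np.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

# Made-up values for illustration; lam = 0.5 is a square-root-like transform.
values = np.array([1.0, 4.0, 9.0])
transformed = box_cox(values, lam=0.5)
restored = inverse_box_cox(transformed, lam=0.5)
print(transformed, restored)
```

Values of λ below 1 compress large observations more than small ones, which is how the transformation helps stabilize a variance that grows with the level of the series. Forecasts made on the transformed scale are mapped back with the inverse transform.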
TBATS allows modeling datasets with varying characteristics, such as non-linear and
linear time series; single, multiple, high-period and non-integer seasonality; as well as dual
calendar effects. These advantages make it a complex framework housing a large number
of features which are made easy for statisticians (Geetha & Maksood, 2017).
TBATS combined with the ARIMA model, being an adaptive model, provided
better prediction accuracy than the other models. Hence, it is
employed for forecasting the groundwater data.
Strengths and weaknesses of the TBATS model (Chou, 2017):
• Strengths:
1. Can deal with data with non-integer seasonal periods, non-nested periods and high
frequency data.
2. Can handle multiple seasonalities without adding too many parameters.
3. It also has all the following strengths of BATS:
▪ The Box-Cox transformation can deal with non-linearity in the data and
makes the variance more nearly constant.
▪ The ARMA model on the residuals can solve the autocorrelation problem.
▪ No need to worry about initial values.
▪ Can produce not only point predictions but also interval predictions.
▪ The performance is better than that of the simple state space model.
• Weaknesses:
1. Cannot include explanatory variables.
2. The performance for long-term prediction is not very good.
3. The computation cost is high if the data size is large.
Chapter 3
Related Work
Forecasting plays a vital role in the operations related to modern management. It is an
important and necessary aid to planning which is considered the backbone of effective
operations. Many organizations have failed because of the lack of forecasting or faulty
forecasting on which the planning was based.
The purpose of forecasting is to help scientists and decision makers make effective
plans and experiments and translate the findings into principles that are easy to
understand and apply (Armstrong & Fildes, 2006). Forecasting is widely used in many
areas, such as sales forecasting, production yield forecasting, weather
forecasting and demand forecasting.
Recently, groundwater forecasting has become an essential component of effective
water resources planning and management. Rainfall forecasting, in turn, helps
predict the wet and dry periods of the year in advance and helps estimate the annual rainfall
intensity in order to manage disasters or floods.
Groundwater forecasting provides a valuable trigger for determining the timing and capacity
of new water resources development and for controlling water consumption. Therefore, there
is an increased need for groundwater demand forecasting. It can provide a simulated view of
the future and contribute to identifying the suitable management alternative for balancing
water supply and demand (Mohamed & Al-Mualla, 2010).
Because of the great importance of forecasting, many studies and papers have been
published in this field. We present the related work in two sections. The first
section is dedicated to forecasting groundwater and rainfall and covers three levels of
forecasting: short-term, which forecasts data up to only one year; medium-term, which
forecasts data from one year to ten years and is the main focus of this study; and
long-term, which deals with forecasting data for more than ten years. The second section
lists some studies in other domains of forecasting, for example oil and electricity.
3.1 Groundwater and Rainfall forecasting
Khali et al. (2015) examined five data-driven models for the short-term forecasting
of groundwater levels under mine-tailings recharge conditions at two observation wells:
the MLR, ANN, W-MLR, W-ANN and W-ENN models. The ANN and MLR models aimed
to establish a functional relationship between the groundwater levels and the predictors
(tailings recharge, mean temperature and precipitation). In the wavelet-based models
(W-MLR, W-ENN and W-ANN), wavelet analysis was employed to de-noise the mean
temperature and precipitation variables. Then, MLR, ENN and ANN were used to
approximate the functional relationship between the de-noised predictors and the
groundwater levels. The performance of the five models was evaluated using the
'leave-one-out' validation approach. According to their results, the W-ENN models
outperformed the other models in forecasting the groundwater levels at lead times of 1 day,
1 week and 1 month. The results also showed that the performance of the ANN-based models
was better than that of the MLR-based models. De-noising the predictors using wavelet
analysis improved the performance of the ANN and MLR models.
Rajaee et al. (2016) proposed a hybrid wavelet-artificial neural network (WANN)
and a geostatistical method for spatiotemporal prediction of the groundwater level (GWL)
one month ahead. They collected monthly observed GWL time series from September 2005 to
April 2014 at 10 piezometers around Mashhad city in Iran. An artificial neural network
(ANN) and a WANN were trained for each piezometer. The comparison of prediction accuracy
showed that the WANN was more effective in predicting the GWL one month ahead. The
kriging method with a Gaussian model was selected to predict the GWL at desired points;
the Gaussian model, with an RMSE of 0.253, was a suitable choice for spatiotemporal GWL
forecasting. According to the obtained groundwater level map, the groundwater level was
higher in the parts of the plain located in the mountains, which indicates that the
outcomes were correct.
Mahmud et al. (2017) used the Seasonal Autoregressive Integrated Moving Average
(SARIMA) model to forecast monthly rainfall with a twelve-month lead time for thirty
rainfall stations in Bangladesh. The lowest R-squared value (0.672) was found for the
Barishal station and the highest (0.868) for the Teknaf station. Only two stations had
R-squared values below 0.70, which indicates that the SARIMA models developed to forecast
rainfall were reasonably precise. Hence, these SARIMA models could be used as a convenient
tool for nationwide, year-long rainfall forecasting.
Zhao et al. (2018) applied the Kalman Filter method and ARIMA to rainfall
forecasting. The Kalman Filter method is used to define a time series model for determining
future forecasts; it uses a recursive solution to minimize the error. The rainfall data were
clustered by K-means clustering, and ARIMA (p,d,q) was used to construct the state space
for the Kalman Filter model. As a result, they had four groups of data with one model for
each group. The study concluded that the Kalman Filter method gave better results than the
ARIMA model for rainfall forecasting in each group, because the RMSE of the Kalman Filter
method was smaller than that of the ARIMA model.
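The recursive, error-minimizing update mentioned above can be illustrated with a scalar Kalman filter. This is a minimal sketch and not the model of Zhao et al.: it assumes a random-walk state instead of an ARIMA-derived state space, and the function name, the noise variances `q` and `r`, and the initial values are hypothetical choices made for illustration.

```python
def kalman_1d(observations, q=1e-3, r=0.1, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed with noise.

    q: process-noise variance, r: measurement-noise variance (assumed values).
    Returns the filtered state estimate after each observation.
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Predict: a random-walk state keeps its value; uncertainty grows by q.
        x_pred, p_pred = x, p + q
        # Update: blend prediction and observation using the Kalman gain.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (z - x_pred)
        p = (1.0 - gain) * p_pred
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    # With a constant signal the estimate moves from x0 toward the signal.
    print(kalman_1d([5.0] * 10))
```

Each step reuses only the previous estimate and its variance, which is what makes the solution recursive.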
Qiu et al. (2017) proposed a neural network-based approach that automatically extracts
features from the time series measured at observation sites and leverages the correlation
between multiple sites for weather prediction via multi-tasking. It was the first attempt
to use deep learning techniques and multi-task learning to predict short-term rainfall
amounts based on multi-site features. They formulated the learning task as an end-to-end
multi-site neural network model, which allows the knowledge learned at one site to be
transferred to other correlated sites and models the correlations between different sites.
Experiments showed that the proposed model outperformed a set of baseline models,
including the European Centre for Medium-Range Weather Forecasts (ECMWF) system. They
also compared the proposed approach with the results of the public weather forecast
center and demonstrated its effectiveness.
Bakker et al. (2014) studied three different forecasting models: a transfer/noise
model, an adaptive heuristic model and a multiple linear regression model. To assess the
possible performance improvement due to weather input, the performance of the models
was studied both with and without weather variables. When weather input was used, the
largest forecasting errors were reduced by 11% and the average errors by 7%. In their view,
this reduction was important for applying the forecasting model to the control of water
supply systems and to anomaly detection.
Bagirov et al. (2017) developed a Clusterwise Linear Regression (CLR) model for
monthly rainfall prediction. The algorithm was applied to predict monthly rainfall in
Victoria, Australia, over the period 1889-2014, using rainfall data with five input
variables from eight weather stations. The results revealed that the CLR model was
efficient for monthly rainfall prediction and was superior to the other models considered:
CR-EM, MLR, SVMreg and ANNs.
Mohanty et al. (2015) applied an artificial neural network (ANN) approach to the weekly
forecasting of groundwater levels in multiple wells located over a river basin. The gradient
descent with momentum and adaptive learning rate backpropagation (GDX) algorithm was
employed to predict groundwater levels 1 week ahead at 18 sites over the study area. An
appropriate set of inputs for the ANN model was selected. It consisted of weekly rainfall,
pan evaporation, river stage, water level in the surface drain, pumping rates of 18 sites and
groundwater levels of 18 sites in the previous week. This led to 40 input nodes and 18
output nodes. The model forecast groundwater levels better at shorter lead times (up to
2 weeks) than at longer lead times.
Dhekale et al. (2015) aimed at forecasting groundwater fluctuations using time series
analysis of groundwater data for each station in Murshidabad district. The time series of
water observations was collected for four months (January, May, August and November)
over the period from 2005 to 2013. Structural time series modelling was applied to model
and forecast the behavior of the groundwater table in 2014. Data for 2005 to 2012 were
used for analysis and the 2013 data were used for validation. The residuals of the
developed model for each station were tested for normality and randomness.
The results showed differences in groundwater depth among the sites and seasons.
The variability among the sites within a particular season was greatest in November,
followed by August. This reflects that groundwater recharge also differed by site and
season. Some regions showed a fluctuating water table over the years, which could be due
to varied rainfall in these regions.
Mukhairez (2018) conducted forecasting using four algorithms: ARIMA, hybrid ARIMA,
singular spectrum analysis (SSA) and linear regression. The author applied these
algorithms to a real dataset collected from the Department of Customer Services at Khan
Younis municipality (KHM). The best algorithm was Hybrid ARIMA, which gave the lowest
mean absolute percentage error (MAPE). The study included three levels of forecast: the
whole city, the sub-areas and the classes inside Khan Younis city.
The results of applying Hybrid ARIMA for the next five years showed that the
minimum water revenue will decrease by about 38% compared to 2017, while the minimum
water consumption for the overall city will increase to about 8.4% compared to 2017.
Banerjee et al. (2011) evaluated artificial neural network (ANN) simulation against
mathematical modeling for estimating the safe pumping rate needed to maintain groundwater
salinity in island aquifers. To forecast the salinity under varied pumping rates, an ANN
model with quick propagation (QP) as the training algorithm was used. The accuracy,
reliability and generalization ability of the model were verified with real-time data. The
model was trained with 2 years of real-time data, and the water quality under varying
pumping rates was predicted for a span of 5 years. The results showed the superiority of
ANN over SUTRA models: the ANN model emerged as a more accurate alternative to
numerical techniques. The pumping rate should be kept below 13,000 L/day to stabilize the
groundwater salinity within 2.5%. ANN was capable of capturing the poorly understood
relations between hydrological attributes and was advantageous when the vaguely defined
problem did not demand any specific solution.
Cui et al. (2017) employed entropy spectral analysis for long-term forecasting of
monthly groundwater levels. Entropy was defined in the frequency domain. Three types of
entropy were considered: configurational entropy, Burg entropy and relative entropy.
They led to three types of spectral analysis: (1) configurational entropy spectral
analysis (CESA), (2) Burg entropy spectral analysis (BESA), and (3) relative entropy
spectral analysis (RESA). CESA, BESA and RESA were employed to analyze spectra and
forecast monthly groundwater levels, and were then compared to determine which method
better forecasts the monthly groundwater level. To verify the three methods, historical
monthly and annual groundwater data were obtained from South Carolina. Both monthly
and annual groundwater level data showed significant decreasing trends at almost all
stations. Relative entropy yielded the highest resolution in determining the spectral
density, while all three methods fitted the observed values when simulating groundwater
levels. Although entropy spectral analysis forecast the groundwater time series with
reasonable accuracy, it had limitations when applied to other, different data.
Mekanik et al. (2013) focused on long-term forecasting of seasonal spring rainfall
in Victoria. Artificial Neural Network (ANN) and Multiple Regression (MR) approaches
were used for this purpose. Three regions of Victoria (west, east and center) were chosen
as a case study. Both ANN and MR models were assessed statistically using mean
square error (MSE), mean absolute error (MAE), Pearson correlation (r) and the Willmott
index of agreement (d). The developed ANN and MR models were tested on out-of-sample
test sets.
The MR models showed very poor generalization ability for east Victoria, with correlation
coefficients of −0.99 to −0.90, compared to ANN with correlation coefficients of 0.42–0.93.
ANN models also showed better generalization ability for central and west Victoria, with
correlation coefficients of 0.68–0.85 and 0.58–0.97 respectively. The statistical analysis
suggested the potential of ANN over MR models for rainfall forecasting using large-scale
climate modes.
Kisi & Sanikhani (2015) conducted long-term monthly precipitation prediction without
climatic data. They examined the accuracy of four soft computing methods: adaptive
neuro-fuzzy inference system (ANFIS) with grid partition (GP), ANFIS with subtractive
clustering (SC), artificial neural network (ANN) and support vector regression (SVR). The
ANFIS-GP model gave the best accuracy at five out of ten stations and was better than the
other models in long-term monthly precipitation prediction. The ANN model had the best
accuracy at four stations, while ANFIS-SC was the best model at only one station. The SVR
model gave the worst results at all stations. ANFIS-GP was therefore employed to predict
the long-term precipitation of any site without climate measurements. The annual and
monthly precipitation was also mapped and evaluated using the ANFIS-GP model. The
precipitation maps showed that the highest amounts of precipitation occurred in the
western, southwestern and northern regions of Iran, while the lowest values were seen in
the southeastern and eastern parts.
The abovementioned research examined different models, proposed different
methods for either groundwater or rainfall forecasting, and used different evaluation
metrics. The forecasting was of a short-, medium- or long-term nature. In this thesis, we
propose a new method for prediction using well-known forecasting algorithms: ARIMA,
Hybrid ARIMA and ETS. The main purpose is to estimate the groundwater production and
the quantities of rainfall for a given area (Dear El-Balah as a case study) over a given
period (1-5 years), that is, medium-term forecasting. The results are evaluated using the
MAPE measure.
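Since MAPE is the measure used to compare the algorithms, a short sketch of its standard definition may help. The function name and the example numbers are illustrative only and are not taken from the thesis datasets; rejecting zero actual values is a design choice made here because the percentage error is undefined in that case.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    MAPE = (100 / n) * sum(|actual_t - forecast_t| / |actual_t|)
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Made-up values: each forecast is off by 10% of the actual value.
    print(mape([100.0, 200.0], [110.0, 180.0]))
```

The zero check matters for rainfall series, where periods with no rain would otherwise cause a division by zero.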
3.2 Other domains in forecasting researches
Iqelan (2016) used Singular Spectrum Analysis (SSA) to forecast the monthly
electricity consumption of the Middle Province in the Gaza Strip, Palestine.
The 39 observations of his dataset from November 2005 to December 2015 were used as a
training sample, while the remaining 12 observations from January 2016 to December
2016 were used as a testing dataset to evaluate the electricity consumption forecasts.
SSA was compared with the ARIMA and exponential smoothing state space (ETS)
models. The results showed that the error of the SSA technique was smaller than that
obtained by the ARIMA and ETS models according to mean absolute error (MAE), mean
square error (MSE), root mean square error (RMSE) and mean absolute percentage error
(MAPE). SSA (MAPE 9.38%) outperformed both ARIMA (MAPE 14.99%) and ETS (MAPE
15.63%).
Aizenberg et al. (2016) discussed long-term time series forecasting using a
Multilayer Neural Network with Multi-Valued Neurons (MLMVN). They evaluated the
proposed approach using a dataset of an oilfield asset located in the Gulf of Mexico. They
showed that MLMVN can be efficiently applied to univariate and multivariate one-step
and multi-step ahead prediction of reservoir dynamics. The research aimed to study some
important aspects of applying ANN models to time series forecasting that could be of
particular interest to the pattern recognition community.
Kaytez et al. (2015) implemented support vector machine (SVM) and least squares
support vector machine (LS-SVM) models to predict the electricity consumption of
Turkey. They used independent variables based on historical data from 1970 to 2009, e.g.
gross electricity generation, total subscribership, installed capacity and population.
They compared the performance with multiple linear regression (MLR) and ANN models. The
LS-SVM model performed better than the ANN and MLR models by 0.88% and 1.70%
respectively. Based on these results, the LS-SVM model can be used effectively for
Turkey's long-term electricity consumption forecasting.
Kejela (2012) forecast electricity consumption on a short-term basis for a particular
region in Norway using a novel approach, the Gaussian process (GP). The best feature
vector for forecasting the electricity consumption was designed from various factors such
as temperature, day of the week, previous consumption and hour of the day, using
reduction and normalization methods. The feature space was scaled and reduced as
different target variables were analyzed to obtain better accuracy. The GP was compared
with two traditional forecasting techniques: Multiple Linear Regression (MLR) and Multiple
Back Propagation Neural Networks (MBPNN). The Gaussian process performed as well as
MBPNN in short-term electricity forecasting and was far better than MLR.
Zhang et al. (2015) proposed a new hybrid method to forecast crude oil prices. First,
they decomposed the international crude oil price into a series of independent intrinsic
mode functions (IMFs) and a residual term by using the ensemble empirical mode
decomposition (EEMD) method. Then, the least squares support vector machine together
with particle swarm optimization (LSSVM–PSO) and the generalized autoregressive
conditional heteroskedasticity (GARCH) model were developed to forecast the nonlinear
and time-varying components of crude oil prices, respectively. Next, the forecasts of the
components were summed to give the final forecast of crude oil prices. They compared the
new method with previously popular forecasting methods. The results showed the
superiority of the new hybrid method in crude oil price forecasting.
Thiyagarajan et al. (2017) proposed an ARIMA model for forecasting the failure of a
sensor that measures surface temperature in an urban sewer. The proposed approach,
based on the past time series of data, was examined and compared with the ETS and TBATS
models. The models were evaluated using MAE, MPE, MAPD and RMSE. The prediction
performance of the TBATS model was better than that of the ETS model, and the prediction
performance of the ARIMA model was better than that of both the ETS and TBATS models.
Panigrahi & Behera (2017) developed a new hybrid methodology by combining
linear and nonlinear models from the innovations state space (ETS) framework with ANN.
Because both ETS and ANN models have linear and nonlinear modeling capability, the
ETS–ANN model improved the chances of capturing different combinations of linear and/or
nonlinear patterns in time series. First, ETS was applied to the given time series and
predictions were obtained. Then the residual error was calculated by subtracting the
ETS predictions from the original series. The residual error sequence obtained was modeled
by ANN. The final prediction was obtained by combining the ETS predictions with the ANN
predictions. The results indicated the superiority of the proposed model, which achieved
the best rank among all the models.
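The four steps described above (base prediction, residual calculation, residual modeling, combination) can be sketched as follows. This is not the ETS–ANN model of Panigrahi & Behera: simple exponential smoothing stands in for ETS, and the mean of the most recent residuals stands in for the ANN, purely to keep the example dependency-free. The function names and the `alpha` and `window` parameters are hypothetical.

```python
def ses_fitted(series, alpha=0.5):
    """One-step-ahead simple exponential smoothing (stand-in for ETS).

    Returns (fitted, next_forecast): fitted[t] is the forecast made for
    series[t] before observing it; next_forecast is for the step after the end.
    """
    level = series[0]
    fitted = []
    for y in series:
        fitted.append(level)
        level = alpha * y + (1 - alpha) * level
    return fitted, level


def hybrid_forecast(series, alpha=0.5, window=3):
    """One-step-ahead hybrid forecast: base prediction plus residual prediction."""
    # Step 1: apply the base model to the series and obtain its predictions.
    fitted, base_next = ses_fitted(series, alpha)
    # Step 2: residuals = original series minus base predictions.
    residuals = [y - f for y, f in zip(series, fitted)]
    # Step 3: model the residual sequence (stand-in for the ANN).
    recent = residuals[-window:]
    residual_next = sum(recent) / len(recent)
    # Step 4: combine the base prediction with the residual prediction.
    return base_next + residual_next


if __name__ == "__main__":
    trend = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(ses_fitted(trend)[1], hybrid_forecast(trend))
```

On a trending series the base model lags behind the data, so the residual term pushes the combined forecast upward, which is the kind of correction the second model is meant to supply.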
Panigrahi & Behera (2017) proposed models of short-term load forecasting based
on linear regression and variables of load time series. The proposed methods were
compared with ARIMA, exponential smoothing, neural networks and SVM models.
Wongsathan & Seedadan (2016) improved the forecasting accuracy of both ARIMA and
NNs by employing a hybrid ARIMA-NNs model to forecast pollution time series data in
Chiangmai city. Their results demonstrated that ARIMA-NNs performed better than a single
NNs model by 65% on average and better than the ARIMA model by 50% on average.
Deb et al. (2017) presented a comprehensive review of the 9 major time series
forecasting techniques: ANN, ARIMA, SVM, CBR, Fuzzy, Grey, MA & ES, NN and Hybrid.
They provided a summary of hybrid model methods that combine two or more techniques
in such a way that each model complements the strengths of the other, e.g.
ARIMA+ANN, ARIMA+SVMs and ARIMA+Evolutionary Algorithms.
Chhetri et al. (2017) proposed a new forecasting technique to predict Amazon EC2
Spot prices. Their approach was distinguished by the application of training periods for the
non-deterministic and deterministic time series components. They evaluated their method
against the ARIMA, ETS, STL and TBATS techniques as well as simple techniques such as
Naïve and Seasonal Naïve. Experimental results indicated that their proposed technique
outperformed STL or ARIMA as a forecasting technique.
Kanchymalay et al. (2017) studied the relation between the crude palm oil (CPO) price,
selected vegetable oil prices, the monthly exchange rate and the crude oil price. Using
machine learning techniques, they performed a comparative analysis of CPO price
forecasting results. Monthly data on CPO prices, crude oil prices, exchange rates and
selected vegetable oil prices from January 1987 to February 2017 were used. There was a
high positive relation between the CPO price and the other oil prices, and also between
the CPO price and the crude oil price. Multi-layer perceptron, Support Vector Regression
and Holt-Winters exponential smoothing techniques were used to forecast the CPO price
using multivariate time series. The prediction results showed that Support Vector
Regression had the lowest MAPE, 7.8%, so it was the most accurate in forecasting the
multivariate time series of the CPO price.
Ramos et al. (2015) compared the forecasting performance of ARIMA models and
state space models (ETS). Both one-step and multi-step forecasts were produced by
applying the models to a case study of retail sales of different categories of women's
footwear. The forecasting performance of the ARIMA and state space models was evaluated
via MAPE and RMSE. The MAE was similar for both one-step and multi-step forecasts. The
results demonstrated that, when an automatic algorithm was applied, the overall
out-of-sample forecasting performance of ARIMA models was not better than that of ETS
models in predicting retail sales, and neither was best in all circumstances.
Gong et al. (2016) developed and applied three models for predicting the groundwater
level: artificial neural networks (ANN), support vector machines (SVM) and adaptive
neuro-fuzzy inference system (ANFIS). The prediction took the interaction between
groundwater and surface water into consideration. The datasets covered 10 years of data
for wells in Florida, United States. The performance of the models was evaluated using
five measures: root mean squared error (RMSE), normalized mean square error (NMSE),
Nash-Sutcliffe efficiency coefficient (NS), correlation coefficient (R) and Akaike
information criterion (AIC). The conclusions showed the necessity and effect of
considering the surface water-groundwater interaction in the management of water
resources.
Liu et al. (2010) proposed a new short-term forecasting method based on classical
time series analysis and wavelets. The results demonstrated that the proposed method:
1) was suitable for forecasting both the wind power series and the wind speed;
2) was strong in dealing with jumping data;
3) was better than the BP network method and the classical time series method.
Hill, Connor & Remus (2015) compared neural network forecasts with those of
different forecasting methods on annual, quarterly and monthly time series. These methods
included a reference average, a naive forecasting model, a graphical method based on
human judgment and deseasonalized Holt. The comparison aimed to determine the method
with the best forecasting accuracy on the basis of APE (absolute percentage error).
According to the comparison results, neural networks performed best.
Pati & Shukla (2015) experimentally verified the predictive performance of
three models: ANN, ARIMA and a hybrid model (ARIMA + ANN). These models were
applied to the bug number series of Debian, and a comparative analysis of their
forecasting performance was presented. The ARIMA + ANN model was found to be the most
appropriate for the Debian bug number series, whereas the ARIMA model performed poorly
in predicting the non-linear patterns.
Garima et al. (2017) used ETS (Exponential Smoothing) and ARIMA
(Autoregressive Integrated Moving Average) for the analysis and prediction of weather
parameters, including humidity, air temperature, wind speed and rainfall. The accuracy
was estimated using different criteria such as MAE (Mean Absolute Error), MASE (Mean
Absolute Scaled Error), MAPE and RMSE, computed with different packages in R. The
methods that gave the best forecast were to be used for prediction.
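For reference, the accuracy criteria named above can be written out using their standard definitions. Garima et al. computed them with R packages; the Python functions below are only a sketch with hypothetical names, and the seasonal period `m` used to scale MASE is an assumed parameter.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mase(actual, forecast, training, m=1):
    """Mean absolute scaled error.

    The forecast MAE is divided by the in-sample MAE of the naive forecast
    that repeats the value observed m steps earlier (m=1: plain naive).
    """
    naive_errors = [abs(training[i] - training[i - m]) for i in range(m, len(training))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae(actual, forecast) / scale


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    print(mae([1, 2, 3], [2, 2, 5]), rmse([1, 2, 3], [2, 2, 5]))
    print(mase([1, 2, 3], [2, 2, 5], training=[1, 3, 5, 7]))
```

A MASE below 1 means the forecast errors are smaller, on average, than those of the naive method on the training data.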
Hassani et al. (2017) presented a forecasting comparison among several parametric
and non-parametric techniques, e.g. ARIMA, ETS, NN, TBATS, ARFIMA, MA, WMA, SSA-R
and SSA-V. They used the TBATS and SSA-R models for tourist arrival forecasting
purposes. The results suggested that no single model consistently outperformed all the
other models in forecasting accuracy for any of the countries under investigation or at
any of the forecasting horizons.
3.3 Related Work Discussion
Figure (3–1): Summary of the Most Related Works to this Work

| Research | Techniques | Area / datasets | Results | Evaluation metrics | Shortcoming |
| --- | --- | --- | --- | --- | --- |
| Medium-Term Forecasting for Municipal Water Demand and Revenue (KhanYounis City as A Case Study) | ARIMA, ARIMA combined with NN, SSA and Linear Regression | The datasets about water revenue and water consumption collected from Khan Younis municipality of the Department of Customer Services. | Water revenue will decrease about 3.8%, while water consumption will increase to 8.4%. | Used the MAPE measure for selecting the most accurate algorithm. | The research didn't include improving of resulted MAPE values before real forecasting. |
| A Singular Spectrum Analysis Technique to Electricity Consumption | The researcher compared the ARIMA model, the Exponential Smoothing State Space (ETS) model and the SSA model. SSA outperformed the others for electricity consumption forecasting. | The monthly electricity consumption of the Middle Province in Gaza Strip, Palestine. | The error of the SSA technique was smaller than those obtained by the ARIMA and ETS state space models, with MAE = 1.4158, MSE = 3.5604, RMSE = 1.8869 and MAPE = 0.0938. | MAE, MSE, RMSE and MAPE. Different parameters were tested; the best selection of SSA parameters for the electricity consumption time series is L = 40 and r = 7. | The study conducted ARIMA and ETS separately, without hybridization of ARIMA with ETS. |
| Structural Time Series Analysis towards Modeling and Forecasting of Ground Water Fluctuations in Murshidabad District of West Bengal | A "structural time series model" with trend, cyclical fluctuations, seasonal variations and irregular components. Then the Kalman filter was used as the optimal estimator of the state at any time. | Groundwater data from 29 stations under Murshidabad district for four months (January, May, August and November) during the period from 2005 to 2013. | The results indicated groundwater differences among the sites of measurement as well as among the seasons. This is due to the difference in groundwater recharge within a season also. | R square, RMSE, MAE and MAPE | The research did not correlate groundwater forecasting with rainfall forecasting as a source of groundwater recharge. |
| A comparison of ARIMA, Neural Network and a Hybrid Technique for Debian Bug Number Prediction | ARIMA, ANN and Hybrid Model (ARIMA+ANN) | The monthly bug number data collected from Jan 2000 to Dec 2013; they had 168 monthly bug counts. | For Debian, the hybrid model was a good predictor for the Debian bug number series. | MAE, RMSE and Average Error per Mean (Em) | The comparison did not include the MAPE measure. |
| The development rainfall forecasting using kalman filter | ARIMA and Kalman Filter method | They used rainfall data of Kabupaten Jember. The data is divided into two parts: the first from January 2005 to December 2015, the other from January 2016 until December 2016. | The Kalman Filter method is better than the ARIMA model for rainfall forecasting; the error of the Kalman Filter method is smaller than the error of the ARIMA model. | RMSE measure | The most appropriate algorithm was not selected by computing the MAPE for the forecasted results of the algorithms and then selecting the algorithm with the lowest MAPE value for the real forecast. |
3.4 Conclusion
From our review of related work in forecasting, none of the previous research applied
forecasting to groundwater and rainfall separately in Gaza and then used the forecasting
results to estimate the relationship between the groundwater production and the rainfall
amounts, which feed the groundwater reservoir and lead to a reduction in its salinity. We
used some well-known forecasting algorithms: ARIMA, Hybrid ARIMA (ARIMA + ETS,
ARIMA + TBATS, ARIMA + NN) and ETS, as they are among the most popular algorithms
and give powerful results.
The forecasting algorithms were applied to datasets from Dear El-Balah city to predict
rainfall and groundwater for the coming years. Analyzing and linking the predictions of
rainfall and groundwater, based on the historical data for both, will help in predicting
the amounts of rain and groundwater. Accordingly, this will help in predicting the
salinity state.
Chapter 4
Methodology and Model Development
This chapter presents the methodology for forecasting, as a data mining technique,
of both groundwater and rainfall. The chapter is divided into five sections: section one
introduces the methodology steps, section two describes data collection and acquisition,
section three covers data preprocessing, section four presents model selection, and
section five is dedicated to the implementation and evaluation of the models.
4.1. Methodology steps
The steps of our methodology are outlined as shown in Figure (4–1).
Figure (4–1): Steps of proposed forecasting methodology
[Flowchart; recoverable content listed below]
Data Collection: two datasets, for groundwater production and rainfall amounts.
Data Preprocessing (applied to each dataset separately): data integration, data reduction, missing values.
Data splitting: training dataset and testing dataset.
Implementation: applying the algorithms (ETS, ARIMA, ARIMA+NN, ARIMA+ETS, ARIMA+TBATS) to each dataset, then evaluating the algorithms for each dataset.
Forecasting: applying the most accurate algorithm for the real forecast of each dataset (groundwater dataset and rainfall dataset).
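The splitting, evaluation and selection steps of Figure (4–1) can be sketched in a few lines. This is a simplified illustration under stated assumptions: two trivial forecasters (naive and mean) stand in for ETS, ARIMA and the hybrid models, the series values are made up, and all function names are hypothetical.

```python
def split_series(series, test_size):
    """Chronological split: the last `test_size` points form the testing set."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


def naive_model(train, horizon):
    """Stand-in forecaster: repeat the last observed value."""
    return [train[-1]] * horizon


def mean_model(train, horizon):
    """Stand-in forecaster: repeat the training mean."""
    return [sum(train) / len(train)] * horizon


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def select_best(series, models, test_size):
    """Score every model on the testing set and return the lowest-MAPE one."""
    train, test = split_series(series, test_size)
    scores = {name: mape(test, model(train, len(test))) for name, model in models.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Illustrative numbers only, not data from the thesis.
series = [10.0, 12.0, 11.0, 13.0, 14.0, 15.0, 15.0, 16.0]
models = {"naive": naive_model, "mean": mean_model}
best, scores = select_best(series, models, test_size=2)
print(best, scores)
```

The split is chronological rather than random so that every model is scored on observations that come after its training data, as in a real forecast.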
4.2. Data Collection
This study depends on two sets of historical data from different institutions in Dear
El-Balah city in the Gaza Strip. The first dataset, groundwater production data, was
obtained from the Coastal Municipalities Water Utility (CMWU) and covers the period since
2007. The second, rainfall data, was obtained from the Ministry of Agriculture and covers
the period since 1985. Table (4-1) and Table (4-2) show samples of the datasets before
preprocessing from the CMWU and the Ministry, respectively.
Table (4-1): Sample of groundwater dataset before preprocessing
Monthly operating records - Water Facilities in Middle Area, month 1 of 2017
Dear El Balah

| No. | Facility name / No. | Generator capacity (KVA) | Generator type | Pump motor (hp) | Pump hours (hour) | Generator hours (hour) | Pump ampere (A) | Electricity consumption (kWh) | Water production (m3) | Days of break done (day) | Comments |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Abu Nasser / J-146 | 200 | PERKINS | 100 | 354 | 154 | 62 | 5069 | 33,660 | | |
| 2 | Abu Marwan / S-69 | 20 | PERKINS | 100 | 312 | 172 | 65 | 3656 | 36,610 | | |
| 3 | Abu Hamam | | | | 275 | 0 | 42 | 6681 | 14,400 | | |
| 4 | Kfar Darom well | _ | _ | 40 | 300 | 0 | 45 | 7005 | 18,122 | | |
| 5 | AL Aqsa | 77 | PERKINS | 40 | 311 | 0 | 39 | 8510 | 18,660 | | |
| 6 | Al-Tahlia (desalination) / J-32 | 200 | PERKINS | 40 | 317 | 14 | 48 | 135 | 0 | | |
| 7 | Baraka 1 / K-20 | 65 | PERKINS | 50 | 152 | 0 | 40 | 65 | 10,635 | | |
| 8 | Baraka 2 / K-21 | 65 | PERKINS | 50 | 193 | 11 | 40 | 399 | 18,285 | | |
| 9 | Shawqi well | | | | 213 | | 25 | | 12,780 | | |
| 10 | Sahel4 | 80 | PERKINS | 30 | 366 | 32 | 39 | 7288 | 18,263 | | |
| 11 | Sahel5 | _ | _ | 30 | 259 | 55 | 32 | 7299 | 22,140 | | |
| 12 | Sahel6 | 80 | PERKINS | 30 | 340 | 4 | 40 | 6043 | 15,490 | | |
Table (4-1) contains information for 12 wells that belong to Deir El-Balah
city in the middle area of Gaza. Its columns are: facility name/number, generator
(capacity and type), pump motor, pump hours, generator hours, pump ampere, electricity
consumption, water production, days of break done and comments. In this research the
main focus is on the Water Production attribute, which represents the wells' production
of groundwater.
Table (4-2): Sample of Rainfall data before preprocessing
1985-1986
Year
B Hanon
B LahiaL
Shati
Gaza city/Remal
Nussirate
Dr-Elbalah Date
10/11/1985 1985 0.0 0.0 1.0 0.0 0.0 0.0
10/19/1985 1985 0.0 0.0 0.5 9.0 0.0 0.0
10/20/1985 1985 6.5 7.5 6.7 0.0 0.0 0.0
11/9/1985 1985 1.0 2.0 1.5 0.0 0.0 0.0
11/10/1985 1985 0.0 0.0 0.5 0.0 0.0 0.0
11/11/1985 1985 0.0 2.5 0.0 0.0 0.0 0.0
11/30/1985 1985 0.0 5.5 4.6 2.0 2.5 0.0
12/3/1985 1985 3.5 2.0 2.7 3.0 6.0 2.5
12/15/1985 1985 5.0 5.0 1.3 3.0 5.5 1.0
12/17/1985 1985 1.0 3.0 1.0 0.0 0.0 0.0
12/18/1985 1985 12.5 11.0 11.5 12.5 12.5 13.5
12/19/1985 1985 3.0 4.0 2.0 10.5 10.0 12.0
12/22/1985 1985 15.5 12.0 11.6 3.0 13.0 16.0
12/25/1985 1985 0.0 0.5 0.5 0.0 0.5 0.5
12/26/1985 1985 16.0 13.5 16.1 17.5 25.0 44.0
12/27/1985 1985 5.8 7.5 9.7 16.8 6.7 9.3
12/28/1985 1985 5.7 7.5 9.8 16.7 6.8 9.2
1/8/1986 1986 0.0 0.0 1.5 1.0 0.0 0.5
1/11/1986 1986 0.0 0.0 3.2 0.0 0.0 0.0
1/12/1986 1986 21.0 36.0 29.0 17.0 1.5 0.0
1/15/1986 1986 15.0 16.5 20.4 3.5 6.0 7.0
1/18/1986 1986 2.5 2.5 1.0 2.5 4.5 4.5
1/19/1986 1986 2.5 4.0 3.5 2.0 3.0 1.5
1/31/1986 1986 0.0 0.0 1.0 0.0 0.0 0.0
2/3/1986 1986 0.0 1.0 0.0 0.0 0.0 1.5
2/4/1986 1986 0.5 3.0 0.1 2.5 4.0 6.5
2/6/1986 1986 0.5 2.0 1.0 1.5 1.0 1.0
2/8/1986 1986 10.0 8.5 8.5 1.5 4.0 1.5
2/9/1986 1986 6.5 5.0 6.4 7.0 5.0 6.0
2/13/1986 1986 5.0 10.0 1.7 0.0 0.0 0.0
2/14/1986 1986 0.0 41.0 20.0 18.0 9.0 0.0
2/15/1986 1986 35.0 9.0 6.5 17.0 15.5 28.5
2/24/1986 1986 6.5 1.5 0.0 4.0 6.5 6.0
3/29/1986 1986 0.0 2.0 1.5 0.0 0.0 0.0
3/30/1986 1986 1.0 1.5 1.7 2.5 0.0 1.0
4/1/1986 1986 0.0 3.0 0.0 0.0 0.0 0.0
4/2/1986 1986 11.0 5.5 6.5 15.5 16.0 18.0
4/7/1986 1986 16.0 9.5 13.5 30.5 30.5 35.5
5/2/1986 1986 0.0 0.0 1.0 2.5 1.0 4.0
5/3/1986 1986 7.0 13.0 23.0 6.0 8.5 9.0
Sum of 85/86 215.5 258 232 228.5 204.5 240
As shown in Table (4-2), the rainfall data set contains the rainfall amounts, recorded by
date, for all cities in Gaza (e.g. Jabalia, Nussirate, Dear El-Balah). What matters in this
research is the data of Dear El-Balah city.
The tables and columns related to our study were selected and prepared
for the next phase.
4.3. Data Preprocessing
At this essential step, the required preprocessing tasks were applied to improve the
quality of the data before applying the forecasting algorithms. Preprocessing includes several
techniques, e.g. cleaning, reduction, transformation and integration. Microsoft Excel 2016
was used to perform the following data preprocessing tasks:
4.3.1. Data integration
Data integration is achieved by combining data from multiple datastores,
obtained from either the CMWU or the MoA, into one consistent dataset. We merged
the CMWU data through the common ID to obtain a data set with 12 monthly production
attributes, as in Table (4-3). The CMWU operates 12 wells in the Deir El-Balah area,
and these wells supply the city's water needs. The first attribute is the
ID of each well, the second is the well's name, the next 12 columns hold the monthly
production amounts of each well, and the last attribute is the annual well production.
We also combined the MoA data from multiple datastores into one consistent
dataset, as in Table (4-4). It contains the date, year and rain amount attributes.
Table (4-3): Sample of combining wells production in new data set
No. | Well code | Well name | Production-1 | Production-2 | Production-3 | Production-4 | Production-5 | Production-6 | Production-7 | Production-8 | Production-9 | Production-10 | Production-11 | Production-12 | Total
1 | J-146 | Abu Nasser | 57,550 | 47,370 | 55,100 | 58,340 | 60,980 | 65,100 | 56,050 | 55,670 | 58,380 | 60,350 | 52,540 | 53,370 | 680,800
2 | S-69 | Abu Marwan | 44,510 | 39,520 | 42,710 | 43,230 | 45,420 | 44,350 | 41,760 | 46,730 | 54,310 | 55,010 | 55,040 | 46,190 | 558,780
3 | T-46 | Abu Hammam | 30,000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30,000
4 | _ | Kfar Darom well | 18,210 | 26,780 | 40,510 | 34,286 | 34,519 | 38,213 | 42,668 | 39,951 | 35,376 | 34,566 | 33,267 | 36,182 | 414,528
5 | AL Aqsa | Al-Aqsa | 49,170 | 32,980 | 49,310 | 53,670 | 55,730 | 53,490 | 54,120 | 53,110 | 53,580 | 54,430 | 47,930 | 47,160 | 604,680
6 | J-32 | Al-Tahlia | 55,660 | 51,430 | 83,778 | 33,452 | 58,620 | 68,870 | 54,450 | 60,420 | 53,130 | 53,850 | 45,210 | 40,990 | 659,860
7 | K-20 | Birka 1 | 49,220 | 42,931 | 53,270 | 45,914 | 52,311 | 48,211 | 46,733 | 48,531 | 45,492 | 50,643 | 50,643 | 50,643 | 584,542
8 | K-21 | Birka 2 | 5,456 | 49,630 | 56,560 | 56,130 | 58,420 | 57,480 | 57,640 | 56,660 | 55,440 | 54,640 | 47,660 | 38,110 | 593,826
9 | Sahel3 | Sahel 3 | 41,097 | 35,384 | 40,466 | 34,916 | 35,981 | 39,604 | 38,740 | 40,196 | 27,974 | 39,597 | 16,466 | 2,716 | 393,137
10 | Sahel4 | Sahel 4 | 32,080 | 32,120 | 39,830 | 35,860 | 41,690 | 41,110 | 38,800 | 40,060 | 37,440 | 39,180 | 38,830 | 40,210 | 457,210
11 | Sahel5 | Sahel 5 | 25,020 | 28,400 | 29,540 | 27,540 | 30,710 | 29,450 | 27,850 | 30,290 | 29,340 | 12,970 | 12,826 | 12,804 | 296,740
12 | Sahel6 | Sahel 6 | 790 | 1,000 | 1,060 | 6,430 | 6,720 | 6,320 | 15,910 | 19,210 | 28,880 | 31,010 | 33,010 | 32,110 | 182,450
Table (4-4): Sample of combining rain amounts of Dear El- Balah city in new data set
Season 1985-1986
Date Year Dr-Elbalah
10/11/1985 1985 0.0
10/19/1985 1985 0.0
10/20/1985 1985 0.0
11/9/1985 1985 0.0
11/10/1985 1985 0.0
11/11/1985 1985 0.0
11/30/1985 1985 0.0
12/3/1985 1985 2.5
12/15/1985 1985 1.0
12/17/1985 1985 0.0
12/18/1985 1985 13.5
12/19/1985 1985 12.0
12/22/1985 1985 16.0
12/25/1985 1985 0.5
12/26/1985 1985 44.0
12/27/1985 1985 9.3
12/28/1985 1985 9.2
1/8/1986 1986 0.5
1/11/1986 1986 0.0
1/12/1986 1986 0.0
1/15/1986 1986 7.0
1/18/1986 1986 4.5
1/19/1986 1986 1.5
1/31/1986 1986 0.0
2/3/1986 1986 1.5
2/4/1986 1986 6.5
2/6/1986 1986 1.0
2/8/1986 1986 1.5
2/9/1986 1986 6.0
2/13/1986 1986 0.0
2/14/1986 1986 0.0
2/15/1986 1986 28.5
2/24/1986 1986 6.0
3/29/1986 1986 0.0
3/30/1986 1986 1.0
4/1/1986 1986 0.0
4/2/1986 1986 18.0
4/7/1986 1986 35.5
5/2/1986 1986 4.0
5/3/1986 1986 9.0
Sum of 85/86 240
4.3.2. Data reduction
In this step, we reduced the data set to a smaller volume by removing
irrelevant attributes (attribute subset selection). The result of this step is two data
sets in the form of time series, one for groundwater and one for rainfall. Table (4-5) shows a
sample of the groundwater data set with two columns: the first column, "Month", represents
the time in the format (mmm-yy), and the second column, "Amount (m3)", represents
the amount of groundwater.
Table (4-5): Sample of groundwater time series data set
Month Amount (m3)
Jan-08 339957
Feb-08 371711
Mar-08 396483
Apr-08 398139
May-08 398812
Jun-08 390183
Jul-08 415331
Aug-08 415165
Nov-08 352150
Dec-08 317989
Jan-09 299446
Feb-09 319041
Mar-09 372687
Jul-09 399822
Aug-09 423123
Table (4-6) represents the rainfall data set in two columns: the year and the
amount (quantity of rainfall):
Table (4-6): Sample of Rain time series data set
Year Amount (mm)
1985 108
1986 626.4
1987 130.2
1992 419.3
1993 209.6
1994 683
2002 435.7
4.3.3. Missing values
We estimated the missing value of June 2013 in the groundwater dataset as the
average of the values of the same month in the three previous years (Jun 2010,
Jun 2011, Jun 2012), as shown in Table (4-7). This is justified because the monthly amount of
rain in June 2013 is similar to the amounts of rain in the same month of the three previous
years.
Table (4-7): Filling missing value in JUN-13
Month Amount (m3)
Feb-13 331622
Mar-13 405257
Apr-13 465403
May-13 417558
Jun-13 436912
Jul-13 474883
Aug-13 513654
Sep-13 484779
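The imputation rule described above can be sketched in a few lines of Python. This is only an illustration of the rule (same-month average over the three preceding years); the thesis performed this step in Microsoft Excel. The function name, the dictionary layout and the three June values below are hypothetical and are not taken from the real dataset.

```python
from statistics import mean


def fill_missing_month(series, month, year, lookback=3):
    """Estimate the value of (year, month) as the mean of the same month
    in the `lookback` preceding years.

    `series` maps (year, month) -> amount in m3.
    """
    previous = [series[(year - k, month)] for k in range(1, lookback + 1)]
    return mean(previous)


# Hypothetical June amounts (m3) for 2010-2012, for illustration only.
production = {(2010, 6): 400000, (2011, 6): 450000, (2012, 6): 410000}

# Fill the missing June 2013 entry with the same-month average.
production[(2013, 6)] = fill_missing_month(production, month=6, year=2013)
```

A lookup on a missing earlier year raises `KeyError`, which is a deliberate choice here: the rule is only meaningful when all three reference months exist.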
After preprocessing and preparing the real datasets in the form of time series data,
we obtained two data sets. The first is the monthly production of wells from January 2008
to December 2017. It consists of two columns (Month, Amount (m3)); it is presented
in Appendix B and plotted in Figure (4–2). The second is the data set of annual rainfall
amounts from 1985 to 2017. It consists of two columns (Year, Amount (mm)); it is presented in
Appendix A and plotted in Figure (4–3).
Figure (4–2): Monthly wells production data of groundwater
[Line chart. x-axis: month, Jan-08 to Aug-17; y-axis: Amount (m3), 100000 to 550000]
Figure (4–3): Annual rainfall amounts
[Line chart. x-axis: year, 1985 to 2017; y-axis: Amount (mm), 0 to 800]
4.4. Selected Models
In this thesis, we used five models based on different algorithms: ARIMA, ETS and three
hybrid ARIMA models (ARIMA combined with a Neural Network, ARIMA combined with
ETS, and ARIMA combined with TBATS). The models were described in detail in Section
2.7.
We built the hybrid models using the R package forecastHybrid.
Each hybrid model is specified by a character string made up of any combination of a, e, n,
and t, which stand for auto.arima, ETS, nnetar, and TBATS respectively.
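To make the string notation concrete, the following Python sketch decodes a specification such as "an" and combines component forecasts. It is not the forecastHybrid implementation: the function names are hypothetical, and the equal-weight averaging is an assumption made for illustration, since the weighting scheme is not stated in this section.

```python
# Letter codes as listed in the text above.
MODEL_CODES = {"a": "auto.arima", "e": "ETS", "n": "nnetar", "t": "TBATS"}


def decode_models(spec):
    """Translate a specification string such as "an" into component names."""
    unknown = [code for code in spec if code not in MODEL_CODES]
    if unknown:
        raise ValueError(f"unknown model code(s): {unknown}")
    return [MODEL_CODES[code] for code in spec]


def combine_equal_weights(component_forecasts):
    """Average the component forecasts point by point.

    Equal weights are an assumption of this sketch, not a statement
    about how the thesis weighted its hybrid models.
    """
    horizon = len(component_forecasts[0])
    if any(len(f) != horizon for f in component_forecasts):
        raise ValueError("all component forecasts must share one horizon")
    count = len(component_forecasts)
    return [sum(values) / count for values in zip(*component_forecasts)]
```

For example, `decode_models("at")` names the two components of the ARIMA+TBATS hybrid, and `combine_equal_weights` would then merge their two forecast vectors into one.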
We selected these models because they are efficient and widely used (Dalinina, 2017),
and because experimental results with real data sets have shown that combined
models improve forecasting accuracy, give more accurate predictions than any
individual model, and take advantage of the unique strengths of ARIMA and of
other linear and nonlinear models (Khashei & Bijari, 2011).
4.5. Implementation
After preparing the two datasets in the form of time series data, five
forecasting algorithms were applied. According to their results, the algorithm
with the lowest MAPE value is nominated to produce the actual forecast.
4.5.1. Tools
The tools used to build the models and test the five algorithms are:
• RapidMiner Studio
• R project
• Excel 2016.
4.5.2. Steps of applying models
The original data sets are divided into two parts, as shown in Table (4-8) and in Figure
(4–4).
Table (4-8): Training and testing data periods of datasets
Data sets | Training set | Testing set
Wells data | Jan-2008 to Dec-2015 (96 months) | Jan-2016 to Dec-2017 (24 months)
Rain data | 1985 to 2010 (26 years) | 2011 to 2017 (7 years)
Figure (4–4): Splitting original datasets into training and testing
[Diagram labels: Rain dataset (32 years-384 months); Groundwater dataset (120 months-10 years)]
1- Each data set is separated into training and testing sets. The training set of wells covers
the period from January 2008 to December 2015, which represents 80% of the
original data, and the testing set covers the period from January 2016 to December
2017, which represents 20% of the original data. The rain data are divided in the same
way: the training set covers the period from 1985 to 2010 (78.8% of the original data)
and the testing set covers the period from 2011 to 2017 (21.2% of the original data).
2- We applied the five algorithms to the wells training set with a horizon value of 24. We
chose this value to reach the last record in the testing set (December 2017). Then, we
computed the MAPE between the actual values in the testing set and the predicted values.
3- We ran the five algorithms again on the training set of rain data with a horizon value
of 7 to reach the last record in the testing set (2017). As in the previous step, we computed
the MAPE between the actual values in the testing set and the predicted values.
4- We evaluated the algorithms by comparing their MAPE values in order to choose the most
accurate algorithm for the real forecasting. The algorithm was chosen separately for rain
amounts and for groundwater production, according to the lowest MAPE value (Agrawal et al.,
2016).
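The chronological split used in these steps can be sketched as follows. This is a minimal Python illustration, not the RapidMiner or R process used in the thesis; the function name is hypothetical. The last `horizon` observations are held out, so the forecast horizon always reaches the last record of the testing set.

```python
def split_by_horizon(series, horizon):
    """Keep the last `horizon` observations as the testing set.

    The split is chronological: no shuffling, so the testing set is
    always the most recent part of the series.
    """
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return series[:-horizon], series[-horizon:]


# Wells data: 120 monthly records, horizon 24 (Jan-2016 to Dec-2017).
months = list(range(120))
wells_train, wells_test = split_by_horizon(months, 24)

# Rain data: annual records 1985-2017, horizon 7 (2011 to 2017).
years = list(range(1985, 2018))
rain_train, rain_test = split_by_horizon(years, 7)

# Share of the data used for training, in percent.
wells_share = round(100 * len(wells_train) / len(months), 1)
rain_share = round(100 * len(rain_train) / len(years), 1)
```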
4.5.3. Evaluation
We evaluated the model accuracy by calculating the mean absolute percentage
error (MAPE):
$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{\mathrm{Actual}_i - \mathrm{Predicted}_i}{\mathrm{Actual}_i} \right| \times 100\%$$ (Equation 4-1)
where $\mathrm{Actual}_i$ and $\mathrm{Predicted}_i$ are the actual value and the predicted
n-step-ahead value of the i-th sample, and $N$ is the total number of observations.
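Equation 4-1 translates directly into code. The sketch below is an illustrative Python version with a hypothetical function name; the numbers in the usage lines are made up to show the arithmetic and are not results from the thesis.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (Equation 4-1)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # The ratio is undefined when an actual value is zero.
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return total / len(actual) * 100


# Made-up values: both predictions are off by 10% of the actual value.
example = mape([100.0, 200.0], [110.0, 180.0])
```

Because each error is divided by the actual value, MAPE cannot be computed for periods in which the actual amount is zero, which is why the sketch rejects such input explicitly.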
According to our calculations, and after comparing the MAPE values obtained by applying
ARIMA, the hybrid ARIMA models and ETS to the wells data and the rain data separately, we
found that ARIMA+NN had the lowest MAPE value on the rain data. Accordingly, it is the most
accurate algorithm for rain forecasting. We also found that ARIMA+TBATS had the lowest MAPE
value on the wells data and hence is the most accurate algorithm for groundwater forecasting.
4.6. Summary
In this chapter, the methodology of this work was presented, from
collecting the data from different institutions to determining the most accurate algorithm
for the real forecast. After collection, the data were preprocessed into time
series data and then divided into training and testing sets. The models were built by
applying five forecasting algorithms using several tools and were tested using MAPE.
The next chapter presents and discusses the results of our research.
Chapter 5
Experimental Results and Discussion
In this chapter, we present the environment of the experiments and a description of
the datasets. We also evaluate the forecasting algorithms on the historical time
series data. The evaluation aims to decide which algorithm is the most accurate
among Auto Regressive Integrated Moving Average (ARIMA), three hybrid ARIMA models
(ARIMA combined with Neural Networks, ARIMA combined with ETS, and ARIMA
combined with TBATS), and ETS.
According to previous research on forecasting time series data, these algorithms are
among the best known and most recent ones, and they suit our case because both rain and
groundwater data are complex, nonlinear phenomena. Here, we apply the five algorithms to
our datasets and then calculate the MAPE. The algorithm that achieves the lowest MAPE
value is considered the most accurate one and is used for forecasting the next five years.
5.1. Experiment sets
We performed the data preprocessing and the forecasting process on a PC with
the following specifications:
1. Operating System: Windows 10 Education 64-Bit
2. Processor: Intel® Core™ i5-3230M CPU @2.40GHz (4 CPUs)
3. RAM: 8 GB
We used three software tools:
1- Microsoft Excel 2016
A spreadsheet application developed by Microsoft for Windows, Mac OS X, and iOS.
2- RapidMiner Studio 7.6.001
A data science software platform, developed by the company of the same name,
that provides an integrated environment for machine learning, data mining, text mining,
deep learning, and predictive analytics.
3- RStudio
A free and open-source integrated development environment (IDE) for R, a programming
language for statistical computing and graphics. RStudio runs on the desktop (macOS,
Linux, and Windows) or in a browser connected to RStudio Server or RStudio Server Pro.
5.2. Data set
In this research, we used two real data sets. The first one was collected from the
CMWU in Dear El-Balah, Gaza, and presents the monthly groundwater production.
The second one was collected from the MoA and presents the annual rain amounts.
After preprocessing the datasets as shown in the previous chapter, the rain
data set consists of two columns (Year and Amount (mm)). It contains data
of almost 32 years (about 384 months). It was split into a training set,
containing the data of the first 26 years (from 1985 to 2010), and a testing set, holding the
remaining 7 years of data (from 2011 to 2017; about 84 months).
The models were built by running the algorithms on the training set, while the testing set
remained unseen. A sample of the training set of rain data is given in Table (5-1), and a
sample of the testing set, with the same attributes (Year and Amount (mm)), is given in
Table (5-2).
Table (5-1): Sample of the training set of rain data from 1985 to 2010
Year Amount (mm)
1985 108
1986 626.4
1987 130.2
1988 344.5
1989 264.5
1990 203.4
1991 497.7
1992 419.3
2006 389.5
2007 219.5
2008 281
2009 220
2010 131
Table (5-2): Sample of the testing set of rain data from 2011 to 2017
Year Amount (mm)
2011 385.5
2012 255.5
2013 375.5
2014 216
2015 514
2016 298.5
2017 241.2
The second data set, of groundwater, consists of two columns (Month and Amount (m3)).
It contains 120 months, from January 2008 to December 2017. It was split into
a training set, containing the first 96 months of data (from January 2008 to December
2015), and a testing set, containing the remaining 24 months (from January 2016 to
December 2017). The models were built by running the algorithms on the
training set, while the testing data remained unseen and were used only for comparison
with the predicted values. A sample of the training set is given in Table (5-3), and a
sample of the testing set of wells data, with the same attributes (Month and
Amount (m3)), is given in Table (5-4).
Table (5-3): Sample of the training set of wells data from Jan 2008 to Dec 2015
Month Amount (m3)
Jan-08 339957
Feb-08 371711
Feb-15 265190
Mar-15 296940
May-15 371687
Jun-15 144988
Jul-15 374003
Aug-15 383085
Sep-15 385764
Oct-15 402146
Nov-15 334845
Dec-15 290019
Table (5-4): Sample of testing set of wells data from Jan 2016 to Dec 2017
Month Amount (m3)
Jan-16 293112
Feb-16 258263
Mar-16 339185
Apr-16 318414
May-16 383726
Jun-17 322346
Jul-17 498978
Aug-17 377795
Sep-17 391049
Oct-17 386677
Nov-17 328756
Dec-17 332391
5.3. Evaluation forecasting algorithms
We applied five forecasting algorithms, namely ARIMA, three hybrid ARIMA models
(ARIMA+NN, ARIMA+ETS and ARIMA+TBATS), and ETS, to the training set of rain data with a
horizon value of 7 to reach 2017 (the last record in the rain testing set), and to the training
set of wells data with a horizon value of 24 to reach December 2017 (the last record in the
groundwater testing set). The performance of these algorithms was then evaluated on the
"amount" attribute of each data set by computing the MAPE of the results of each algorithm.
The MAPE of the predicted and actual values was computed according to the following equation:
$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{\mathrm{Actual}_i - \mathrm{Predicted}_i}{\mathrm{Actual}_i} \right| \times 100\%$$ (Equation 5-2)
Finally, we compared the MAPE values of the forecasting algorithms.
The algorithm with the lowest MAPE value is the most accurate one, so it is
selected for the real forecast.
5.3.1. Evaluating algorithms over Amount attribute in rain data set
We evaluated the results of the five selected algorithms on the attribute 'Amount (mm)'
and computed the MAPE for each algorithm.
5.3.1.1. ARIMA Evaluation
ARIMA is a basic forecasting model; the name is an abbreviation of 'Auto
Regressive Integrated Moving Average'. An ARIMA model is specified by three
order parameters: (p, d, q).
p: the number of autoregressive terms in the model.
q: the number of moving average terms in the model.
d: the number of times the time series values are differenced.
To apply the ARIMA algorithm, we ran the following process in RapidMiner using the
"Arima Trainer" operator. We conducted several experiments with different values of the
(p, d, q) parameters and computed the MAPE value in each experiment. The best
prediction performance was obtained with the following values:
(p=1, d=0, q=1).
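The search over (p, d, q) described above can be pictured as a small grid search. The Python sketch below shows only the search logic; it does not fit any model. The scoring function is supplied by the caller (in practice it would train ARIMA with the given order and return the MAPE on the testing set), and `toy_score` is a made-up stand-in whose values carry no meaning beyond exercising the loop.

```python
from itertools import product


def best_order(score, p_values, d_values, q_values):
    """Return (order, mape) for the order with the lowest score.

    `score` is a caller-supplied function (p, d, q) -> MAPE in percent.
    """
    results = {
        order: score(*order)
        for order in product(p_values, d_values, q_values)
    }
    order = min(results, key=results.get)
    return order, results[order]


def toy_score(p, d, q):
    """Hypothetical scorer with made-up values, used only as a placeholder."""
    return 20.0 + abs(p - 1) + 2 * d + abs(q - 1)


order, value = best_order(toy_score, range(3), range(2), range(3))
```

Keeping the scorer as a parameter separates the search from the modelling tool, so the same loop could wrap a RapidMiner export, an R call, or any other fitting routine.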
Figure (5–1): Applying ARIMA Process by RapidMiner tool
Figure (5–1) shows the process of applying the ARIMA algorithm. It consists of three
components:
'Read Excel': loads the data set from a Microsoft Excel spreadsheet.
'ARIMA Trainer': trains the ARIMA model on the dataset.
'Apply Forecast': forecasts the selected attribute over the specified
horizon.
Figure (5–2): ARIMA Evaluation for rain amounts (Actual vs Predicted).
Figure (5–2) shows a clear difference between the
actual and the predicted values. ARIMA gives predicted values that lie on an almost
horizontal line because the data are non-stationary and close to linear. We computed the
MAPE for ARIMA; it was 22.7%.
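As a complementary reading of the flat forecast line, offered as a general property of the model rather than as an analysis made in this thesis: with d=0, the multi-step point forecasts of an ARMA(1,1) model decay geometrically towards the series mean, so beyond the first steps the forecast is nearly constant. The Python sketch below illustrates this; the function name and all coefficient and input values are hypothetical and are not the fitted values of our model.

```python
def arma11_forecast(mean, phi, theta, last_value, last_error, horizon):
    """Point forecasts of y_t = mean + phi*(y_{t-1} - mean) + e_t + theta*e_{t-1}."""
    forecasts = []
    # One step ahead: the last observed error still contributes.
    deviation = phi * (last_value - mean) + theta * last_error
    for _ in range(horizon):
        forecasts.append(mean + deviation)
        # Future errors have expectation zero, so only the AR part remains
        # and the deviation from the mean shrinks by a factor phi per step.
        deviation *= phi
    return forecasts


# Hypothetical coefficients and inputs, chosen only for illustration.
path = arma11_forecast(mean=300.0, phi=0.2, theta=0.1,
                       last_value=131.0, last_error=-50.0, horizon=7)
gaps = [abs(300.0 - value) for value in path]
```

With these illustrative numbers, the distance between the forecast and the mean shrinks by a factor of five at every step, which is why the plotted curve looks almost horizontal.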
5.3.1.2. Hybrid ARIMA (ARIMA with Neural Network (NN)) Evaluation
The evaluation of the hybrid ARIMA (ARIMA with Neural Network (NN)) is shown in
Figure (5–3).
Figure (5–3): Evaluation of rain data using (ARIMA with NN) by R code
As shown in Figure (5–3), the R code applies ARIMA combined with a neural
network. In line 6, we used the argument (models='an'), in which 'a' refers to the 'auto.arima'
model and 'n' refers to the Neural Network. We set the horizon value to 7 in line 7
(h=7).
Figure (5–4): (ARIMA+NN) Evaluation for rain amount (Actual vs Predicted)
Figure (5–4) shows that the hybrid ARIMA gives predicted values that are
close to the actual values in the period from 2011 to 2012. After 2012, the predictions
follow a nearly linear pattern because of the nature of the data. After computing the
MAPE, we found that it equals 21.0%.
5.3.1.3. Hybrid ARIMA (ARIMA with (ETS)) Evaluation
In this evaluation task we combined ARIMA with Exponential Smoothing State Space
Model (ETS).
Figure (5–5): Evaluation of rain data using (ARIMA + ETS) R code
According to Figure (5–5), the R code applies ARIMA combined with ETS. In line 6,
we used the argument (models='ae'), in which 'a' refers to the 'auto.arima' model and 'e' refers
to the Exponential Smoothing State Space Model. We set the horizon value to 7 in
line 7 (h=7).
Figure (5–6): (ARIMA+ ETS) evaluation for rain amount (Actual vs Predicted)
Figure (5–6) shows the values predicted by the hybrid ARIMA (ARIMA+ETS).
It is clear that the predicted values do not converge to or mirror
the actual values over the period from 2011 to 2017. The curve resembles that of ETS,
which is due to the non-stationary data to which the algorithms were applied. After
computing the MAPE, we found that it equals 23.8%.
5.3.1.4. Hybrid ARIMA (ARIMA with (TBATS)) Evaluation
In this evaluation task we combined ARIMA with (Exponential Smoothing State
Space Model with Box-Cox Transformation, ARMA Errors, Trend And Seasonal
Components) (TBATS).
Figure (5–7): Evaluation of rain data using (ARIMA +TBATS) R code
According to Figure (5–7), the R code applies ARIMA combined with the TBATS
model by using the argument (models='at') in line 6, in which 'a' refers to the 'auto.arima' model
and 't' refers to the TBATS model. We set the horizon value to 7 in line 7 (h=7).
Figure (5–8): (ARIMA+ TBATS) evaluation for rain amount (Actual vs Predicted)
Figure (5–8) shows that the hybrid ARIMA (ARIMA+TBATS) gives predicted
values that do not match the actual values in the period from 2011 to 2017. The two
curves do not agree, although the predicted curve resembles that of (ARIMA+NN). After
computing the MAPE, we found that it equals 21.2%.
5.3.1.5. Exponential Smoothing (ETS) Evaluation
ETS is a basic and commonly used type of predictive analysis. It calculates,
or predicts, a future value based on existing (historical)
values by using the Exponential Smoothing (ETS) algorithm. The predicted value is a
continuation of the historical values at the specified target date, which should be a
continuation of the timeline. This function can be used to predict future sales, inventory
requirements, or consumer trends.
This function requires the timeline to be organized with a constant step between
the different points. For example, that could be a monthly timeline with values on the 1st
of every month, a yearly timeline, or a timeline of numerical indices. For this type of
timeline, it is very useful to aggregate raw detailed data before applying the forecast, which
also produces more accurate forecast results.
We used MS Excel 2016 to run the ETS algorithm.
Figure (5–9): Evaluation of ETS for rain amounts (Actual vs Predicted)
According to Figure (5–9), the values predicted by ETS
are below the actual values. We computed the MAPE value and found it to be 24.4%.
5.3.2. Comparing the Methods performance of ‘rain amount’
As explained before, the algorithm with the lowest MAPE value is the most accurate
algorithm and is therefore selected for the real forecast. Table (5-5) compares the
MAPE values.
Table (5-5): Comparing Methods MAPE over ‘Rain amount'.
Algorithm %MAPE
ARIMA 22.7
ARIMA+NN 21.0
ETS 24.4
ARIMA+ ETS 23.8
ARIMA+ TBATS 21.2
Figure (5–10): MAPE percentages for forecasting algorithms
According to the MAPE results of the five applied algorithms in Table (5-5)
and Figure (5–10), the hybrid ARIMA (ARIMA+NN), with a MAPE of 21.0%, is
the most accurate algorithm and is selected for future forecasting. The lowest MAPE
value means that the forecast values are the closest to the actual values in
the testing set. ETS is ruled out because it has the highest MAPE
(24.4%), which means that its forecast values are the farthest from the data of the
testing set.
We conclude from the evaluation of the forecasting algorithms on rain
amounts that ARIMA combined with NN is the appropriate algorithm for
the future forecast of rain data in our case.
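The selection rule applied here (take the algorithm with the lowest MAPE) can be written out as a short Python sketch. The MAPE values are those reported in Table (5-5); the variable names are illustrative only.

```python
# MAPE values (%) for the rain data, as reported in Table (5-5).
rain_mape = {
    "ARIMA": 22.7,
    "ARIMA+NN": 21.0,
    "ETS": 24.4,
    "ARIMA+ETS": 23.8,
    "ARIMA+TBATS": 21.2,
}

# Rank the algorithms from most to least accurate (lowest MAPE first).
ranking = sorted(rain_mape.items(), key=lambda item: item[1])
best_name, best_value = ranking[0]
worst_name, worst_value = ranking[-1]
```

The ranking also makes visible how narrow the margin is: the two best algorithms differ by only 0.2 percentage points.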
All of the above work was repeated on the 'amount' attribute of the
groundwater dataset to evaluate the five algorithms and choose the most appropriate
algorithm for the future groundwater forecast.
5.3.3. Algorithms evaluation over groundwater production amounts
We applied the five forecasting algorithms to the rain amounts attribute, and the
evaluation showed that the hybrid ARIMA (ARIMA+NN) was the most accurate algorithm
for real rain amount forecasting. The same work was repeated on the
'amount' attribute of the monthly wells production dataset.
5.3.3.1. ARIMA Evaluation
Figure (5–11): ARIMA evaluation for wells production (Actual vs Predicted)
The computed MAPE was 20.6% for this algorithm.
5.3.3.2. Hybrid ARIMA(ARIMA+NN) Evaluation
Figure (5–12): Hybrid ARIMA (ARIMA+NN) Evaluation for wells production (Actual vs
Predicted).
As can be seen, the predicted and actual curves converge. After
computing the MAPE, we found that it is 10.3%.
5.3.3.3. Hybrid ARIMA(ARIMA+ETS) Evaluation
Figure (5–13): (ARIMA+ETS) Evaluation for wells Production (Actual vs Predicted)
After computing MAPE we found it is 9.1%.
5.3.3.4. Hybrid ARIMA(ARIMA+TBATS) Evaluation
Figure (5–14): (ARIMA+TBATS) evaluation for wells Production (Actual vs
Predicted).
After computing MAPE we found it is 8.9%.
5.3.3.5. Exponential Smoothing (ETS) Evaluation
Figure (5–15): ETS evaluation for wells production (Actual vs Predicted)
After computing the MAPE, we found it to be 12.6%.
5.3.4. Comparing Methods accuracy over ‘Wells Production’
Table (5-6): Algorithms' performance (accuracy) over wells production.
Algorithm MAPE%
ARIMA 25.9
ARIMA+NN 11.4
ETS 12.6
ARIMA+ ETS 9.1
ARIMA+ TBATS 8.9
Figure (5–16): MAPE percentages for forecasting algorithms for monthly amounts of
wells’ production.
As Table (5-6) and Figure (5–16) show, in the evaluation of the five methods on the
monthly wells data, ARIMA combined with TBATS is the most appropriate algorithm,
with the lowest MAPE value (8.9%). This means that the values forecast from the
training set are close to the data of the testing set (actual
values), and hence this algorithm is the most accurate. We also noticed that
ARIMA has a high MAPE value (20.6%), so it is the weakest in performance. This
means that the values forecast from the training set are far from the actual data of
the testing set.
The MAPE values obtained by applying the forecasting algorithms in Table
(5-6) are not small. This is because the volatility of the data is high, as shown in
Figure (5–17). We therefore aggregated the monthly data into semi-annual data, as in
Table (5-7), to reduce the volatility. The data became less volatile, as demonstrated in
Figure (5–18). Then we reapplied the algorithms to the semi-annual data, splitting it
into a 70% training set and a 30% testing set. The training set covers the period from
2008 to 2014 and the testing set covers the period from 2015 to 2017.
Table (5-7): Dividing the data into semi-annual data
Month Amount (m3)
Jan-08 2295285
Jul-08 2317733
Jan-09 2273525
Jul-09 2439755
Jan-10 2175580
Jul-10 2597673
Jan-11 2691509
Jul-11 2765044
Jan-12 2183688
Jul-12 2490211
Jan-13 2422760
Jul-13 2525213
Jan-14 2098198
Jul-14 1971355
Jan-15 1725542
Jul-15 2169862
Jan-16 1970828
Jul-16 2321393
Jan-17 2042940
Jul-17 2315646
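The aggregation behind Table (5-7) can be sketched in Python as follows. The sketch assumes that each semi-annual amount is the sum of six consecutive monthly amounts, which is consistent with the Jan-08 entry: the six monthly values for January to June 2008 in Table (4-5) add up to 2295285. The function name is hypothetical.

```python
def to_semi_annual(monthly):
    """Sum consecutive blocks of six monthly values."""
    if len(monthly) % 6 != 0:
        raise ValueError("expected a whole number of six-month blocks")
    return [sum(monthly[i:i + 6]) for i in range(0, len(monthly), 6)]


# January to June 2008 (m3), from Table (4-5).
first_half_2008 = [339957, 371711, 396483, 398139, 398812, 390183]
semi_annual = to_semi_annual(first_half_2008)
```

Applied to the full 120-month series, the same function would yield the 20 semi-annual records listed in Table (5-7).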
Figure (5–17): Data representation of groundwater (before re-dividing)
Figure (5–18): Semi-annual data representation of groundwater (after re-dividing)
We reapplied the five forecasting algorithms to the 'amount' attribute, as in
Appendix H, and then re-evaluated them by computing the MAPE, as in Appendix I.
After the evaluation on the semi-annual groundwater data, the MAPE
values of the five algorithms improved, as shown in Table (5-8). Compared with the
MAPE results of the same algorithms previously applied to the monthly groundwater data,
shown in Table (5-6), the semi-annual data gave smaller MAPE values and hence better
forecasting results.
Table (5-8): Algorithms MAPE over wells’ production.
Algorithm MAPE%
ARIMA 4.9
ARIMA +NN 9.2
ETS 5.2
ARIMA+ ETS 7.7
ARIMA+ TBATS 6.9
Table (5-8) presents the algorithms' performance (accuracy) over wells'
production. We found that ARIMA was the most accurate algorithm, with the lowest
MAPE value of 4.9%, so ARIMA was selected for the semi-annual forecast of groundwater
production amounts. (ARIMA+NN) was excluded because it has the highest MAPE value
and would be the worst in forecasting. ETS, with a MAPE of 5.2%, is close to ARIMA, but
because of the nature of the groundwater data we considered this algorithm inappropriate
for this time series work.
Figure (5–19): MAPE percentages for forecasting algorithms for semi-annual amounts
of wells’ production.
5.4. Forecasting the rain amounts and groundwater production in Dear El-Balah
After the evaluation process, we chose (ARIMA combined with NN) and
(ARIMA) for the real forecasting of rain amounts and wells’ production of groundwater,
respectively. We forecasted the (amount) attribute of each dataset separately for
Dear El-Balah city for the next 5 years. The rain forecast obtained with (ARIMA+NN) is
illustrated in Figure (5–20). We then forecasted the groundwater production amounts
for the same period and area using (ARIMA), as shown in Figure (5–21).
Figure (5–20): Five years forecasting Rain amounts for Dear El-Balah city using (ARIMA+NN)
[y-axis: Amount (mm); x-axis: 2018 to 2022]
Figure (5–21): Five years groundwater forecasting for semi-annual wells’ production
amounts using (ARIMA) [y-axis: Amount (m3); x-axis: Jan-18 to Jul-22]
5.4.1. Deviations of rain amounts and wells’ production
To give a clear picture of the future amounts of rainfall and of the demand for
groundwater in Deir Al-Balah city, we compared the data of the last year of the
original dataset (2017) with the forecasted data for both ‘rain amounts’ and
‘wells production’, as in Table (5-9). This will help decision makers in managing
groundwater and searching for alternative water resources.
5.4.1.1. Deviations of rain amounts, wells’ production in comparison with
the 2017 amounts
Table (5-9): Deviation of groundwater and rain amounts in comparison with 2017
Year   Rain amounts (mm)   Rain amounts deviation (%)   Wells’ production (m3)   Wells’ production deviation (%)
2017   241                 0                            4358586                  0
2018   314                 +30                          4338393                  -0.5
2019   294                 +22                          4302683                  -1
2020   301                 +25                          4267267                  -2
2021   298                 +24                          4232142                  -3
2022   299                 +24                          4197307                  -4
On the one hand, the anticipated wells’ production of groundwater will decrease
over the next 5 years in comparison with 2017. This decrease will range between
0.5% and 4%.
On the other hand, the anticipated rain amounts will be higher than the 2017 amount
throughout the next five years, by 22% to 30%. The relationship between the rain
amounts deviation and the wells’ production deviation is illustrated in Figure (5–22).
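The deviation percentages in Table (5-9) are consistent with a simple percentage change relative to the 2017 value. The sketch below assumes that definition; the helper names `deviation_pct` and `deviations` are illustrative and do not come from the thesis.

```python
# Annual values from Table (5-9): 2017 is the last observed year,
# 2018 to 2022 are forecasts.
rain_mm = {2017: 241, 2018: 314, 2019: 294, 2020: 301, 2021: 298, 2022: 299}
wells_m3 = {2017: 4358586, 2018: 4338393, 2019: 4302683,
            2020: 4267267, 2021: 4232142, 2022: 4197307}


def deviation_pct(value, baseline):
    """Percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


def deviations(series, base_year=2017):
    """Deviation of every year in the series from the base year, in percent."""
    base = series[base_year]
    return {year: deviation_pct(v, base) for year, v in series.items()}


rain_dev = deviations(rain_mm)
wells_dev = deviations(wells_m3)
for year in sorted(rain_mm):
    print(year, f"{rain_dev[year]:+.1f}%", f"{wells_dev[year]:+.2f}%")
```

Rounded to whole percentages (one decimal for 2018 wells’ production), the computed deviations reproduce the table entries.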
Figure (5–22): The Relationship between rain amounts and groundwater deviation in
comparison with 2017
5.4.1.2. Deviations of rain amounts and wells production comparing with
the period (2013-2017)
To have a homogeneous comparison period for rain and wells’ production, we
calculated the deviation between the totals of two periods: the last five observed
years (2013-2017) and the five forecasted years (2018-2022), for both rain amounts and
groundwater wells’ production. We then compared the deviation of the rain amounts
with the deviation of the groundwater wells’ production over the same forecasted
period. The comparison showed that the anticipated rain amounts in the period
(2018-2022) will decrease by 8.4% in comparison with the period (2013-2017), while
the wells’ production of groundwater will decrease by 1.05% in comparison with the
same period, as in Table (5-10).
Table (5-10): Deviation of rain amounts and groundwater amounts compared to last 5
years
Period        Rain amounts (mm)   Groundwater amount (m3)
2013-2017     1645                21563737
2018-2022     1507                21337792
Deviation %   -8.4                -1.05
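The period deviations in Table (5-10) can be checked with the same percentage-change rule applied to five-year totals. The sketch below assumes that definition; the function name `period_deviation_pct` is illustrative.

```python
# Five-year totals from Table (5-10).
rain_2013_2017_mm = 1645
rain_2018_2022_mm = 1507
wells_2013_2017_m3 = 21563737
wells_2018_2022_m3 = 21337792

# Annual forecasted wells' production (m3) for 2018 to 2022, from Table (5-9).
wells_forecast_m3 = [4338393, 4302683, 4267267, 4232142, 4197307]


def period_deviation_pct(forecast_total, observed_total):
    """Percentage change of the forecasted total relative to the observed total."""
    return (forecast_total - observed_total) / observed_total * 100.0


rain_dev = period_deviation_pct(rain_2018_2022_mm, rain_2013_2017_mm)
wells_dev = period_deviation_pct(wells_2018_2022_m3, wells_2013_2017_m3)
print(f"rain: {rain_dev:.1f}%  wells: {wells_dev:.2f}%")
print("forecast total matches table:", sum(wells_forecast_m3) == wells_2018_2022_m3)
```

The annual forecasts of Table (5-9) add up exactly to the 2018-2022 groundwater total used here.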
Figure (5–23): Rain amounts deviation (Actual + forecasted) from 2013 to 2022
Figure (5–23) shows the annual (actual + forecasted) rain amounts from
2013 to 2022. The forecasted rain amounts for 2018 to 2022 are all higher than the
2017 amount. In contrast, the total rain amount for 2018 to 2022 will decrease by
8.4% in comparison with the total for the period from 2013 to 2017.
Figure (5–24): Wells’ production data (Actual + forecasted) from 2013 to 2022
According to the forecasted data for the next five years (2018-2022) shown in
Figure (5–24), the trend is downward in comparison with 2017. The anticipated wells’
production of groundwater over these five years will decrease by 1.05% in comparison
with the period from 2013 to 2017.
A general representation of the actual and forecasted groundwater production from
2008 to 2022 is shown in Figure (5–25), and the actual and forecasted rain data from
1985 to 2022 are shown in Figure (5–26).
Figure (5–25): Groundwater production data (Actual + forecasted) from 2008 to 2022
Figure (5–26): Rain amounts data (Actual + forecasted) from 1985 to 2022
5.5. Discussion
We applied the (ARIMA+NN) algorithm for the real rain forecasting because it
achieved the lowest MAPE value. This result is consistent with the result obtained by
(Mukhairez, 2018), in which ARIMA combined with a neural network had the lowest
MAPE value. For the groundwater forecasting, however, the MAPE values obtained by
applying the forecasting algorithms to the monthly data were not small, because the
volatility of the data was high. Re-dividing the monthly data into semi-annual data
reduced this volatility, and reapplying and re-evaluating the algorithms on the
re-divided data gave smaller MAPE values for all five algorithms.
ARIMA was the most accurate algorithm, with the lowest MAPE value, and was selected
for the semi-annual forecast of groundwater production amounts. This result is similar
to that of (Udom & Phumchusri, 2014), in which the ARIMA model outperformed the other
models and achieved the best MAPE value.
According to the forecasted results for the next 5 years (2018-2022), the production
of groundwater will decrease in comparison with the amount of 2017. This decrease
will range between 0.5% and 4%. On the other hand, the anticipated rain amounts
will be 22% to 30% higher than the 2017 amount over the next five years.
Comparing the forecasted amounts with the years from 2013 to 2017 showed that the
rain amounts will decrease by 8.4% and the anticipated wells’ production of
groundwater will decrease by 1.05% in comparison with the amounts of the period
from 2013 to 2017.
5.6. Summary
In this chapter, forecasting was conducted on the amounts of groundwater production
and rain for Dear El-Balah city. Comparing the amounts of the period (2013 – 2017)
with the forecasting results for the next 5 years showed that groundwater production
and rain amounts will decrease by 1.05% and 8.4%, respectively.
Chapter 6
Conclusion and Future Work
This chapter summarizes the thesis results, the conclusions of our experiments, and
future work.
6.1. Summary
We used well-known algorithms whose efficiency has been proven in different
fields of previous research: (ARIMA, ARIMA+ETS, ARIMA+TBATS, ARIMA+NN and ETS).
They are widely used in time series forecasting tasks. We observed that no
researchers have forecast groundwater and rain in Gaza using forecasting techniques,
and none have discussed the relation between them, considering the relation of
salinity to the increased consumption of groundwater and whether the relationship
between groundwater demand and rain amounts is incremental or decremental.
We applied forecasting algorithms to the groundwater and rain datasets of Dear El-
Balah city and tried to extract the relation between groundwater demand and rain amounts.
Also, we forecasted the groundwater and rain by the growth of population. This thesis
focuses on medium-term forecasting and can contribute positively to water
resources management.
The study concluded that for the next 5 years, the anticipated amounts of groundwater
production in Dear El-Balah city will decrease by 1.05% and the average of rain amounts
will decrease by 8.4%.
Yet the expected increase in population over time should lead to an increase in
groundwater production. Because of this paradox, an interview with a concerned
official at the CMWU was conducted to identify the causes of the decrease in
groundwater production by the CMWU. The interview showed that:
• People dig private wells that go unmonitored by CMWU, so water is also produced
by sources other than CMWU itself. It is therefore believed that overall water
production is in fact increasing.
• Regular power cut-offs reduce the groundwater pumping by CMWU.
• The presence of a desalination plant in Deir El Balah, which desalinates water from
the Mediterranean Sea, reduces the extraction of groundwater from the coastal water
reservoir. Although the city does not depend fully on this plant, it is still
a source of water. Its production capacity has reached about (2600×1000) liters since
2014.
The salinity of the groundwater is affected by two main factors: the rain that
recharges groundwater wells and the amounts pumped from the wells. Increased rainfall
and reduced pumping from wells lead to lower salinity (Mushtaha & Al-Louh, 2013).
Although the data in this study are not sufficient to predict salinity, the thesis
results, a decrease in rain amounts by 8.4% against a decrease in groundwater
production by only 1.05%, give an expectation of an increase in the salinity of
groundwater.
6.2. Conclusion
Our research relies on medium-term forecasting of rainfall and groundwater data:
we evaluated the forecasting algorithms on our datasets and then selected the most
appropriate algorithm for the real forecast.
From the discussions in this study, the following conclusions can be drawn:
1. Groundwater and rain data are non-stationary time series.
2. Annual, not monthly, rainfall data were used because of the absence of rain in
some months in Gaza.
3. The datasets should be in the form of time series data before forecasting
algorithms are applied.
4. In the evaluation stage, accuracy was measured using the MAPE measure on the
testing set before the real forecast.
5. After the evaluation on the semi-annual groundwater data, the most accurate
algorithm for groundwater forecasting was ARIMA, with a MAPE value of 4.9%.
6. Re-dividing the groundwater data into semi-annual data improved the performance
(MAPE values) of the algorithms. However, the ETS algorithm is not the best
because of the non-stationary nature of the data.
7. The most accurate algorithm for rain forecasting was Hybrid ARIMA
(ARIMA+NN), with a MAPE value of 21.0%.
8. The selected, most accurate models were applied to (rain data of 32 years) and
(groundwater data of 120 months).
9. The (ARIMA) model and the nonlinear models (ANN, ETS, TBATS) were used
to capture different forms of relationship in the time series data and to benefit
from the advantages of each, so that the proposed model is more powerful and
efficient.
10. The proposed model forecasted the groundwater production and the amounts of
rainfall for a given area (Dear El-Balah city as a case study) over a given time (1-
5 years). The model is based on historical rain amounts from 1985 to 2017 taken
from the Ministry and on 10 years of historical monthly groundwater pumping data
(2008 to 2017) obtained from (CMWU).
11. Generally, over the five forecasted years groundwater production will decrease
by approximately 1.05%, while the rain amounts will decrease by approximately 8.4%.
That means rainfall will decrease at a higher rate than groundwater production.
Although the data are not enough, the results give us an expectation of
increasing salinity.
To sum up, this study is based on the official groundwater data available
from CMWU. It does not include data on illegally dug wells or on the
desalination plant of Dear El-Balah.
The study results for Dear El-Balah city must therefore be generalized with great
caution, and they should not be treated as the only source of data on the study
problem.
In light of these results, the study concluded with the following
recommendations:
• Recommendations for decision and policy makers:
[1] Filing a lawsuit against the occupation to benefit from the water of the River
Jordan, which is blocked by the dam of Gaza Valley at the borders of Gaza Strip
in the middle region. This water can be used for irrigation and agriculture.
[2] Mandating effective water legislation by the Legislative Council and the Water
Authority regarding the utilization of groundwater, and developing laws to
prevent illegal consumption of groundwater.
[3] Establishing dams by water authorities to collect rainwater, especially in
places where rainfall is increasing, and then using it to recharge the underground
reservoir.
• Recommendations for supervisory bodies such as (CMWU, MoA, the Water
Authority):
[1] Managing water consumption to ensure a reasonable balance between
precipitation and pumping from underground wells.
[2] Developing a future vision and alternative plans to compensate for the shortage
of groundwater, prevent salinization, and search for alternative water sources.
[3] Increasing dependence on the seawater desalination plant under construction in
the southern Gaza Strip, instead of relying on groundwater pumping from wells.
[4] Utilizing the central processing plant in the southern Gaza Strip so that the
resulting water is used for irrigation and agriculture.
6.3. Future works
This study is limited in that it does not cover unofficial and unregistered wells,
which increase the production of groundwater. Future research is needed that treats
the salinity rate as a more accurate indicator for determining the groundwater
pumping rate.
Also, we can study the forecasting of salinity and of the percentages of various
minerals in groundwater based on the rates of rain, groundwater production and
desalination.
More factors, e.g. (population growth, salinity, nitrates and minerals in wells’
water), can be used for forecasting tasks of groundwater and rain.
Different models can be applied to real datasets for different cities in the Gaza
Strip to forecast groundwater production and rain amounts, in order to prevent the
depletion of the groundwater for the sake of the coming generations.
Bibliography
Aggarwal, C. C. (2015). Data Mining The Textbook.
Agrawal, V., Agrawal, S., Nag, S., Chakraborty, D., & Panigrahi, B. K. (2016). Knn Coal Mill,
11–18.
Aiash, M., & Mogheir, Y. (2017). Comprehensive Solutions for the Water Crisis in Gaza Strip,
25(3), 63–75.
Al-Shalalfeh, Z., Napier, F., & Scandrett, E. (2018). Water Nakba in Palestine: Sustainable
Development Goal 6 versus Israeli hydro-hegemony. Local Environment, 23(1), 117–124.
https://doi.org/10.1080/13549839.2017.1363728
Armstrong, J. S., & Fildes, R. (2006). Making progress in forecasting. International Journal of
Forecasting, 22(3), 433–441. https://doi.org/10.1016/j.ijforecast.2006.04.007
Auda, G., & Kamel, M. (1999). Modular Neural Networks: a Survey. International Journal of
Neural Systems, 9(2), 129–151.
Australian Transport Assessment and Planning. (2016). Forecasting and evaluation. Australian
Transport Assessment and Planning. Retrieved from
https://atap.gov.au/tools-techniques/travel-demand-modelling/6-forecasting-evaluation.aspx
Authority, P. W. (2018). Water Authority Strategic Plan 2016-2018 1, 1–48.
Azam, F. (2000). Biologically inspired modular neural networks. Specialist, 149. Retrieved from
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.6.1840&rep=rep1&type=pdf
Bagirov, A. M., Mahmood, A., & Barton, A. (2017). Prediction of monthly rainfall in Victoria,
Australia: Clusterwise linear regression approach. Atmospheric Research, 188, 20–29.
https://doi.org/10.1016/j.atmosres.2017.01.003
Banerjee, P., Singh, V. S., Chatttopadhyay, K., Chandra, P. C., & Singh, B. (2011). Artificial
neural network model as a potential alternative for groundwater salinity forecasting.
Journal of Hydrology, 398(3–4), 212–220. https://doi.org/10.1016/j.jhydrol.2010.12.016
Barakat, R., & Heackock, R. (2013). Water in Palestine ISBN 978-9950-316-47-8. The Birzeit
Strategic Studies Forum.The Ibrahim Abu-Lughod Institiue of International Studies. Birzeit
University.
Bitzer, S., & Kiebel, S. J. (2012). Recognizing recurrent neural networks (rRNN): Bayesian
inference for recurrent neural networks. Biological Cybernetics, 106(4–5), 201–217.
https://doi.org/10.1007/s00422-012-0490-x
Chhetri, M. B., Lumpe, M., Vo, Q. B., & Kowalczyk, R. (2017). On Forecasting Amazon EC2
Spot Prices Using Time-Series Decomposition with Hybrid Look-Backs. Proceedings -
2017 IEEE 1st International Conference on Edge Computing, EDGE 2017, 158–165.
https://doi.org/10.1109/IEEE.EDGE.2017.29
Chou, I.-T. (n.d.). BATS and TBATS Model. Retrieved from
https://yintingchou.com/posts/bats-and-tbats-model/
Coastal Municipalities Water Utility. (2015). Retrieved from
http://www.cmwu.ps/Ar/ReadTopic.aspx?Static=20
Cui, H., Singh, V. P., & Asce, D. M. (2017). Entropy Spectral Analyses for Groundwater
Forecasting, 22(7), 1–8. https://doi.org/10.1061/(ASCE)HE.1943-5584.0001512.
Daibes-Murad, F. (2004). Water Resources in Palestine: A Fact Sheet and Basic Analysis of the
Legal Status.
Dalinina. (2017). Introduction to Forecasting with ARIMA in R. ORACLE. Retrieved from
https://www.datascience.com/blog/introduction-to-forecasting-with-arima-in-r-learn-data-science-tutorials
de Livera, A. M., Hyndman, R. J., & Snyder, R. D. (2011). Forecasting time series with complex
seasonal patterns using exponential smoothing. Journal of the American Statistical
Association, 106(496), 1513–1527. https://doi.org/10.1198/jasa.2011.tm09771
Deb, C., Zhang, F., Yang, J., Lee, S. E., & Shah, K. W. (2017). A review on time series
forecasting techniques for building energy consumption. Renewable and Sustainable
Energy Reviews, 74(January), 902–924. https://doi.org/10.1016/j.rser.2017.02.085
Deepashri, & Kamath, A. (2017). Survey on Techniques of Data Mining and its Applications.
International Journal of Emerging Research in Management &Technology, ISSN(62),
2278–9359. Retrieved from
https://www.ermt.net/docs/papers/Special_Issue/2017/ICETE/33p.pdf
Dhekale, B. S., Sahu, P. K., Vishwajith, K. P., & Narsimhaiah, L. (2015). Structural Time Series
Analysis towards Modeling and Forecasting of Ground Water Fluctuations in Murshidabad
District of West Bengal, 5, 117–126. https://doi.org/10.5923/c.ije.201501.17
Egnanarayana, Y. (2005). ARTIFICIAL NEURAL.
Et.al, B. (2014). Improving the performance of water demand forecasting models by using
weather input. Procedia Engineering, 70, 93–102.
https://doi.org/10.1016/j.proeng.2014.02.012
Gahirwal, M. (2013). Inter Time Series Sales Forecasting. Retrieved from
http://arxiv.org/abs/1303.0117
Garima Jain, E., & Mallick, B. (2017). A Study of Time Series Models ARIMA and ETS.
International Journal of Modern Education and Computer Science, 9(4), 57–63.
https://doi.org/10.5815/ijmecs.2017.04.07
Geetha, A., & Maksood, F. Z. (2017). Sustainability in Oman: Energy Consumption Forecasting
using R. Indian Journal of Science and Technology (IJST), 10(10), 1–14.
https://doi.org/10.17485/ijst/2017/v10i10/97008
Gong, Y., Zhang, Y., Lan, S., & Wang, H. (2016). A Comparative Study of Artificial Neural
Networks, Support Vector Machines and Adaptive Neuro Fuzzy Inference System for
Forecasting Groundwater Levels near Lake Okeechobee, Florida. Water Resources
Management, 30(1), 375–391. https://doi.org/10.1007/s11269-015-1167-8
Han, J., Kamber, M., & Pei, J. (2012). Data Mining: Concepts and Techniques. San Francisco,
CA: Morgan Kaufmann. https://doi.org/10.1016/B978-0-12-381479-1.00001-0
Hassani, H., Silva, E. S., Antonakakis, N., Filis, G., & Gupta, R. (2017). Forecasting accuracy
evaluation of tourist arrivals. Annals of Tourism Research, 63, 112–127.
https://doi.org/10.1016/j.annals.2017.01.008
Hill, T., Connor, M. O., & Remus, W. (2015). Neural Network Models for Time Series
Forecasts. Management Science, 42(7), 1082–1092.
https://doi.org/10.1287/mnsc.42.7.1082
Horse, H. (2018). Time Series Analysis - Exponential Smoothing.
Hosam Mukhairez, D. A. E.-H. (n.d.). Medium-Term Forecasting for Municipal Water Demand
and Revenue (KhanYounis City as A Case Study).
Hyndman, R.J.; Koehler, A.B.; Snyder, R.D.; Grose, S. (2002). A state space framework for
automatic forecasting using exponential smoothing methods. International Journal of
Forecasting, 18(3), 439–454. https://doi.org/10.1016/S0169-2070(01)00110-8
Igor Aizenberg a, Leonid Sheremetovb, n, Luis Villa-Vargas c, J. M.-M. (2016). Multilayer
Neural Network with Multi-Valued Neurons in time series forecasting of oil production.
Neurocomputing, 175, 980–989. https://doi.org/10.1016/j.neucom.2015.06.092
Iqelan, B. M. (2016). A Singular Spectrum Analysis Technique to Electricity Consumption
Forecasting. Journal of Engineering Research and Applications Www.ijera.com ISSN,
6(36), 2248–962286. https://doi.org/10.9790/9622-
James Lani, P. D. (2018). What is Linear Regression? CompleteDissertation-Website. Retrieved
from http://www.statisticssolutions.com/what-is-linear-regression/
Jones, R. B. B. V. (2011). Forecasting Urban Water Demand (second edi). America: american
water works association.
Kanchymalay, K., Salim, N., Sukprasert, A., Krishnan, R., & Hashim, U. R. A. (2017).
Multivariate Time Series Forecasting of Crude Palm Oil Price Using Machine Learning
Techniques. IOP Conference Series: Materials Science and Engineering, 226(1).
https://doi.org/10.1088/1757-899X/226/1/012117
Kaytez, F., Taplamacioglu, M. C., Cam, E., & Hardalac, F. (2015). Forecasting electricity
consumption: A comparison of regression analysis, neural networks and least squares
support vector machines. International Journal of Electrical Power and Energy Systems,
67(February 2018), 431–438. https://doi.org/10.1016/j.ijepes.2014.12.036
Kejela, G. (2012). Short-term Forecasting of Electricity Consumption using Gaussian Processes.
Khali et.al. (n.d.). Short-term forecasting of groundwater levels under conditions of mine-tailings
recharge using wavelet ensemble neural network models. Hydrogeology Journal, 23(1),
121–141. https://doi.org/10.1007/s10040-014-1204-3
Khashei, M., & Bijari, M. (2011). A new hybrid methodology for nonlinear time series
forecasting. Modelling and Simulation in Engineering, 2011.
https://doi.org/10.1155/2011/379121
Kisi, O., & Sanikhani, H. (2015). Prediction of long-term monthly precipitation using several
soft computing methods without climatic data. International Journal of Climatology,
35(14), 4139–4150. https://doi.org/10.1002/joc.4273
Liu, H., Tian, H. Q., Chen, C., & Li, Y. fei. (2010). A hybrid statistical method to predict wind
speed and wind power. Renewable Energy, 35(8), 1857–1861.
https://doi.org/10.1016/j.renene.2009.12.011
Mahmud et.al. (2017). Monthly rainfall forecast of Bangladesh using autoregressive integrated
moving average method. Environmental Engineering Research, 22(2), 162–168.
https://doi.org/10.4491/eer.2016.075
Mekanik, F., Imteaz, M. A., Gato-Trinidad, S., & Elmahdi, A. (2013). Multiple regression and
Artificial Neural Network for long-term rainfall forecasting using large scale climate
modes. Journal of Hydrology, 503, 11–21. https://doi.org/10.1016/j.jhydrol.2013.08.035
Microsoft. (2018). FORECAST.LINEAR function. Retrieved from
https://support.office.com/en-us/article/forecast-linear-function-38e2a419-7415-4037-8761-93f3992ace87
Ministry of Interior and National Security. (2018). No Title. Retrieved from
https://moi.gov.ps/Home/
Mohamed, M. M., & Al-Mualla, A. A. (2010). Water Demand Forecasting in Umm Al-Quwain
(UAE) Using the IWR-MAIN Specify Forecasting Model. Water Resources Management,
24(14), 4093–4120. https://doi.org/10.1007/s11269-010-9649-1
Mohanty, S., Jha, M. K., Raul, S. K., Panda, R. K., & Sudheer, K. P. (2015). Using Artificial
Neural Network Approach for Simultaneous Forecasting of Weekly Groundwater Levels at
Multiple Sites. Water Resources Management, 29(15), 5521–5532.
https://doi.org/10.1007/s11269-015-1132-6
Mushtaha, D. A.-A. K., & Al-Louh, D. M. N. (2013). The relationship between the water of rain,
groundwater and springs and population consumption In the West Bank and Gaza Strip In
the period from 1980 to 2010.
Palestine, M. of A. S. of. (2017). No Title.
Panigrahi, S., & Behera, H. S. (2017). A hybrid ETS–ANN model for time series forecasting.
Engineering Applications of Artificial Intelligence, 66(June), 49–59.
https://doi.org/10.1016/j.engappai.2017.07.007
Pati, J., & Shukla, K. K. (2015). A comparison of ARIMA, neural network and a hybrid
technique for Debian bug number prediction. Proceedings - 5th IEEE International
Conference on Computer and Communication Technology, ICCCT 2014, 47–53.
https://doi.org/10.1109/ICCCT.2014.7001468
Qiu, M., Zhao, P., Zhang, K., Huang, J., Shi, X., Wang, X., & Chu, W. (2017). A short-term
rainfall prediction model using multi-task convolutional neural networks. Proceedings -
IEEE International Conference on Data Mining, ICDM, 2017–Novem, 395–404.
https://doi.org/10.1109/ICDM.2017.49
Rajaee et.al. (2016). Groundwater Level Forecasting Using Wavelet and Kriging, II(Ii), 1–21.
https://doi.org/10.22055/jhs.2016.12848
Ramos, P., Santos, N., & Rebelo, R. (2015). Performance of state space and ARIMA models for
consumer retail sales forecasting. Robotics and Computer-Integrated Manufacturing, 34,
151–163. https://doi.org/10.1016/j.rcim.2014.12.015
Report, S., & Water, O. F. (2013). in the Occupied State of Palestine-2012, (October), 22.
Rob J Hyndman. (2018). Innovations state space models for exponential smoothing. Retrieved
from https://www.otexts.org/fpp/7/7
Seymour, L. (2014). Introduction to Time Series and Forecasting, by Peter J. Brockwell;
Richard A. Davis. Review by: Lynne Seymour. Journal of the American Statistical
Association, Vol. 92, No. 440 (Dec., 1997), p. 1647, 92(440).
SHARMA. (2017). Data Mining vs. Statistics. Retrieved from
https://upxacademy.com/data-mining-vs-statistics/
Sibanda, W., & Pretorius, P. (2012). Artificial Neural Networks-A Review of Applications of
Neural Networks in the Modeling of HIV Epidemic. International Journal of Computer
Applications, 44(April), 975–8887.
Stephanie. (2018). Exponential Smoothing: Definition of Simple, Double and Triple. Retrieved
from http://www.statisticshowto.com/exponential-smoothing/
Suad A.Alasdi and Wesam S.Bhaya. (2017). Review of Data Preprocessing Techniques in Data
Mining. Retrieved from https://www.researchgate.net/publication/320161439
Sunil Ray. (2015). 7Types of Regression Techniques you should know! Retrieved from
https://www.analyticsvidhya.com/blog/2015/08/comprehensive-guide-regression/
Thiyagarajan, K., Kodagoda, S., & Nguyen, L. V. (2017). Predictive Analytics for Detecting
Sensor Failure Using Autoregressive Integrated Moving Average Model. 12th IEEE
Conference on Industrial Electronics and Applications, 1923–1928.
Udom, P., & Phumchusri, N. (2014). A comparison study between time series model and
ARIMA model for sales forecasting of distributor in plastic industry, 4(2), 32–38.
Ullah, M. I. (2014). Objectives of Time Series Analysis, http://itfeature.com/time-series-
analysis-and-fore.
Wongsathan, R., & Seedadan, I. (2016). A Hybrid ARIMA and Neural Networks Model for PM-
10 Pollution Estimation: The Case of Chiang Mai City Moat Area. Procedia Computer
Science, 86(March), 273–276. https://doi.org/10.1016/j.procs.2016.05.057
Yang, R., Zhang, Z., & Shi, P. (2010). Exponential stability on stochastic neural networks with
discrete interval and distributed delays. IEEE Trans. Neural Networks, 21(1), 169–175.
https://doi.org/10.1109/TNN.2009.2036610
Zhang, J. L., Zhang, Y. J., & Zhang, L. (2015). A novel hybrid method for crude oil price
forecasting. Energy Economics, 49(December), 649–659.
https://doi.org/10.1016/j.eneco.2015.02.018
Zhao, C., Liu, G., Hermawan, E., Ruchjana, B. N., Siregar, F. A., & Makmur, T. (2018). The
development rainfall forecasting using kalman filter.
Appendix
Summary of Results
Appendix A
Time series rain dataset
Year Amount (mm)
Training Set (1985 to 2010)
1985 108
1986 626.4
1987 130.2
1988 344.5
1989 264.5
1990 203.4
1991 497.7
1992 419.3
1993 209.6
1994 683
1995 236
1996 352.7
1997 354.5
1998 107.5
1999 158.2
2000 527
2001 359.5
2002 435.7
2003 354.9
2004 311.5
2005 304.5
2006 389.5
2007 219.5
2008 281
2009 220
2010 131
Testing Set (2011 to 2017)
2011 385.5
2012 255.5
2013 375.5
2014 216
2015 514
2016 298.5
2017 241.2
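The split of the rain dataset into a training set and a testing set can be sketched as follows. The split year 2011 is read from the set labels in this appendix and from the testing years listed in Appendix C; the variable names are illustrative.

```python
# Annual rain amounts (mm) for Dear El-Balah, copied from Appendix A.
rain_mm = {
    1985: 108, 1986: 626.4, 1987: 130.2, 1988: 344.5, 1989: 264.5,
    1990: 203.4, 1991: 497.7, 1992: 419.3, 1993: 209.6, 1994: 683,
    1995: 236, 1996: 352.7, 1997: 354.5, 1998: 107.5, 1999: 158.2,
    2000: 527, 2001: 359.5, 2002: 435.7, 2003: 354.9, 2004: 311.5,
    2005: 304.5, 2006: 389.5, 2007: 219.5, 2008: 281, 2009: 220,
    2010: 131, 2011: 385.5, 2012: 255.5, 2013: 375.5, 2014: 216,
    2015: 514, 2016: 298.5, 2017: 241.2,
}

SPLIT_YEAR = 2011  # first year of the testing set

# Chronological hold-out: earlier years for fitting, later years for evaluation.
train = {year: mm for year, mm in rain_mm.items() if year < SPLIT_YEAR}
test = {year: mm for year, mm in rain_mm.items() if year >= SPLIT_YEAR}

print(len(train), "training years:", min(train), "to", max(train))
print(len(test), "testing years:", min(test), "to", max(test))
```

A chronological split of this kind keeps the testing years strictly after the training years, so the evaluation mimics forecasting into an unseen future.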
Appendix B
Time series groundwater dataset
Month Amount (m3)
Training Set (Jan-08 to Dec-15)
Jan-08 339957
Feb-08 371711
Mar-08 396483
Apr-08 398139
May-08 398812
Jun-08 390183
Jul-08 415331
Aug-08 415165
Sep-08 394400
Oct-08 422698
Nov-08 352150
Dec-08 317989
Jan-09 299446
Feb-09 319041
Mar-09 372687
Apr-09 409717
May-09 395727
Jun-09 476907
Jul-09 399822
Aug-09 423123
Sep-09 424026
Oct-09 392577
Nov-09 395365
Dec-09 404842
Jan-10 294146
Feb-10 275096
Mar-10 346008
Apr-10 424610
May-10 425295
Jun-10 410425
Jul-10 422653
Aug-10 385932
Sep-10 441907
Oct-10 473732
Nov-10 446231
Dec-10 427218
Jan-11 408763
Feb-11 387545
Mar-11 492134
Apr-11 429768
May-11 481101
Jun-11 492198
Jul-11 474721
Aug-11 490828
Sep-11 479342
Oct-11 486246
Nov-11 433422
Dec-11 400485
Jan-12 322137
Feb-12 315952
Mar-12 298945
Apr-12 414893
May-12 423647
Jun-12 408114
Jul-12 437694
Aug-12 422044
Sep-12 444590
Oct-12 439953
Nov-12 385702
Dec-12 360228
Jan-13 366008
Feb-13 331622
Mar-13 405257
Apr-13 465403
May-13 417558
Jun-13 436912
Jul-13 474883
Aug-13 513654
Sep-13 484779
Oct-13 449431
Nov-13 289844
Dec-13 312622
Jan-14 284846
Feb-14 348547
Mar-14 353087
Apr-14 379378
May-14 370794
Jun-14 361546
Jul-14 301269
Aug-14 301268
Sep-14 356924
Oct-14 325178
Nov-14 351438
Dec-14 335278
Jan-15 313784
Feb-15 265190
Mar-15 296940
Apr-15 332953
May-15 371687
Jun-15 144988
Jul-15 374003
Aug-15 383085
Sep-15 385764
Oct-15 402146
Nov-15 334845
Dec-15 290019
Testing Set (Jan-16 to Dec-17)
Jan-16 293112
Feb-16 258263
Mar-16 339185
Apr-16 318414
May-16 383726
Jun-16 378128
Jul-16 418558
Aug-16 405105
Sep-16 404280
Oct-16 383765
Nov-16 390100
Dec-16 319585
Jan-17 306611
Feb-17 364681
Mar-17 380579
Apr-17 335951
May-17 332772
Jun-17 322346
Jul-17 498978
Aug-17 377795
Sep-17 391049
Oct-17 386677
Nov-17 328756
Dec-17 332391
Appendix C
The results of applying different algorithms of Amount (mm) attribute on rain dataset.
Year  Amount (mm) (Y)  ARIMA (Y1)  ARIMA+NN (Y2)  ETS (Y3)  ARIMA+ETS (Y4)  ARIMA+TBATS (Y5)
2011 385.5 247 429 241 351 328
2012 255.5 246 288 239 307 284
2013 375.5 245 303 238 322 299
2014 216 244 298 236 317 294
2015 514 243 299 234 319 296
2016 298.5 242 299 233 318 295
2017 241.2 242 299 231 319 295
Appendix D
Computing MAPE for the algorithms' results on the Amount (mm) attribute of the rain dataset.
Error deviation (%) for each algorithm, where y is the observed amount and y1 to y5 are the forecasts listed in Appendix C:
ARIMA = (Abs(y-y1)/y)*100
ARIMA+NN = (Abs(y-y2)/y)*100
ETS = (Abs(y-y3)/y)*100
ARIMA+ETS = (Abs(y-y4)/y)*100
ARIMA+TBATS = (Abs(y-y5)/y)*100
Year  ARIMA  ARIMA+NN         ETS   ARIMA+ETS  ARIMA+TBATS
2011  35.9   11.3             37.4  8.9        14.9
2012  3.7    12.6             6.3   20.2       11.1
2013  34.7   19.3             36.7  14.1       20.3
2014  13.1   37.8             9.3   46.8       36.1
2015  52.7   41.7             54.4  37.9       42.5
2016  18.8   0.1              22.1  6.6        1.1
2017  0.1    24.0             4.3   32.1       22.4
MAPE  22.7   21.0 (the best)  24.4  23.8       21.2
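The error deviation formula used in this appendix, (Abs(y - yk) / y) * 100 averaged over the test years, can be sketched in a few lines of Python. This is an illustrative reconstruction rather than the code used in the thesis: the names `actual`, `forecasts` and `mape` are assumptions, and the inputs are the whole-millimetre forecasts printed in Appendix C, so the results may differ from the tabulated values by a few tenths of a percentage point.

```python
# Observed annual rainfall (mm) for 2011-2017, copied from Appendix C.
actual = [385.5, 255.5, 375.5, 216, 514, 298.5, 241.2]

# Forecasts (mm) per algorithm for the same years, copied from Appendix C.
forecasts = {
    "ARIMA": [247, 246, 245, 244, 243, 242, 242],
    "ARIMA+NN": [429, 288, 303, 298, 299, 299, 299],
    "ETS": [241, 239, 238, 236, 234, 233, 231],
    "ARIMA+ETS": [351, 307, 322, 317, 319, 318, 319],
    "ARIMA+TBATS": [328, 284, 299, 294, 296, 295, 295],
}


def mape(y, y_hat):
    """Mean absolute percentage error: mean of |y - y_hat| / y * 100."""
    errors = [abs(a - f) / a * 100 for a, f in zip(y, y_hat)]
    return sum(errors) / len(errors)


scores = {name: mape(actual, pred) for name, pred in forecasts.items()}
best = min(scores, key=scores.get)  # algorithm with the lowest MAPE

for name, score in scores.items():
    print(f"{name}: {score:.1f}")
print("lowest MAPE:", best)
```

Because each error is divided by the observed value, a year with low rainfall (such as 2014) weighs more heavily on the MAPE than a wet year with the same absolute error.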
Appendix E
Results of applying the different algorithms to the Amount (m3) attribute of the wells dataset.
Month  Amount (m3) (y)  ARIMA (y1)  ARIMA+NN (y2)  ETS (y3)  ARIMA+ETS (y4)  ARIMA+TBATS (y5)
Jan-16 293112 305902 334581 258608 290710 291948
Feb-16 258263 304219 321916 249970 296304 305128
Mar-16 339185 302545 347862 323864 321600 325142
Apr-16 318414 300880 340691 339615 344393 340567
May-16 383726 299224 289115 350471 347624 345582
Jun-16 378128 297577 358395 352011 337285 344810
Jul-16 418558 295940 371431 337166 347444 346998
Aug-16 405105 294311 358062 331701 349869 354624
Sep-16 404280 292692 399713 342598 355840 360440
Oct-16 383765 291081 350976 350785 353917 354769
Nov-16 390100 289479 327363 316659 330131 336302
Dec-16 319585 287886 363549 300961 324135 315320
Jan-17 306611 286302 337125 249799 304398 305636
Feb-17 364681 284727 347753 241161 303890 312715
Mar-17 380579 283160 357526 315055 325805 329346
Apr-17 335951 281602 312685 330806 346724 342897
May-17 332772 280052 334481 341662 348916 346873
Jun-17 322346 278511 347945 343202 338001 345526
Jul-17 498978 276978 350918 328357 347841 347395
Aug-17 377795 275454 418747 322892 350089 354844
Sep-17 391049 273938 372852 333789 355962 360562
Oct-17 386677 272431 334747 341976 353985 354837
Nov-17 328756 270932 358877 307850 330168 336339
Dec-17 332391 269441 319792 292152 324156 315341
Appendix G
Results of forecasting rain amounts for Dear El-Balah city using ARIMA+NN.
Year Forecasted amount (mm)
2018 314
2019 294
2020 301
2021 298
2022 299
Appendix H
Results of applying the algorithms to the testing set of the semi-annual groundwater data.
Month   Amount (m3) (Z)  ARIMA (Z1)  ARIMA+NN (Z2)  ETS (Z3)  ARIMA+ETS (Z4)  ARIMA+TBATS (Z5)
Jan-15  1725542          2135017     2311065        2277485   2398878         2374316
Jul-15  2169862          2121727     2758355        2269306   2398878         2374316
Jan-16  1970828          2108520     2648037        2261127   2398878         2374316
Jul-16  2321393          2095395     2358245        2252949   2398878         2374316
Jan-17  2042940          2082352     2387903        2244770   2398878         2374316
Jul-17  2315646          2069390     2255529        2236592   2398878         2374316
Appendix I
Computing MAPE for the results of applying the algorithms to Amount (m3) on the semi-annual groundwater data.
Error deviation (%) for each algorithm, where z is the observed amount and z1 to z5 are the forecasts listed in Appendix H:
ARIMA = (Abs(z-z1)/z)*100
ARIMA+NN = (Abs(z-z2)/z)*100
ETS = (Abs(z-z3)/z)*100
ARIMA+ETS = (Abs(z-z4)/z)*100
ARIMA+TBATS = (Abs(z-z5)/z)*100
Month   ARIMA           ARIMA+NN  ETS   ARIMA+ETS  ARIMA+TBATS
Jan-15  23.7            33.9      32.0  39.0       37.6
Jul-15  2.2             27.1      4.6   10.6       9.4
Jan-16  7.0             34.4      14.7  21.7       20.5
Jul-16  9.7             1.6       2.9   3.3        2.3
Jan-17  1.9             16.9      9.9   17.4       16.2
Jul-17  10.6            2.6       3.4   3.6        2.5
MAPE    4.9 (The Best)  9.2       5.2   7.7        6.9
Appendix J
Results of forecasting the semi-annual groundwater amounts using ARIMA.
Month  Forecasted semi-annual amount (m3)
Jan-18 2173678
Jul-18 2164714
Jan-19 2155787
Jul-19 2146896
Jan-20 2138042
Jul-20 2129225
Jan-21 2120444
Jul-21 2111699
Jan-22 2102990
Jul-22 2094317
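To make the trend in these forecasts easier to read, the percentage change between consecutive half-year values can be computed directly from the table. The sketch below is only illustrative arithmetic on the tabulated numbers. The names `forecast`, `pct_change`, `steps` and `total` are assumptions and not part of the thesis.

```python
# Semi-annual ARIMA forecasts (m3), copied from the table above.
forecast = [
    ("Jan-18", 2173678), ("Jul-18", 2164714),
    ("Jan-19", 2155787), ("Jul-19", 2146896),
    ("Jan-20", 2138042), ("Jul-20", 2129225),
    ("Jan-21", 2120444), ("Jul-21", 2111699),
    ("Jan-22", 2102990), ("Jul-22", 2094317),
]


def pct_change(previous, current):
    """Percentage change from the previous value to the current one."""
    return (current - previous) / previous * 100


# Change between each pair of consecutive half-year forecasts.
steps = [
    pct_change(forecast[i - 1][1], forecast[i][1])
    for i in range(1, len(forecast))
]

# Change over the whole horizon, from Jan-18 to Jul-22.
total = pct_change(forecast[0][1], forecast[-1][1])

for (month, _), step in zip(forecast[1:], steps):
    print(f"{month}: {step:.3f}%")
print(f"Jan-18 to Jul-22: {total:.2f}%")
```

Every step is negative and of nearly the same size, so the tabulated forecast describes a steady decline in the semi-annual amount over the five-year horizon.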