DEGREE PROJECT IN MECHANICAL ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2021
Gearbox fault detection, based on Machine Learning of multiple sensors
ARMANDS KRUMINS
KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF INDUSTRIAL ENGINEERING AND MANAGEMENT
TRITA 515
www.kth.se
Degree Project TRITA-ITM-EX 2021:515
Damage detection of gearboxes based on machine learning from multiple sensors
Armands Krumins
Approved
2021-09-07
Examiner
Ulf Olofsson
Supervisor
Ulf Olofsson
Commissioner
KTH
Contact person
Edwin Bergstedt
Summary (Sammanfattning)
The increasing demand for higher efficiency and lower environmental impact from gearboxes, which are used in cars and wind turbines among other applications, has created a need for more advanced technical solutions to meet these requirements. One such technical solution is condition monitoring, which can extend the life cycle of gearboxes and thereby save resources and time. Today, condition monitoring that applies machine learning contributes to a shift from reactive to predictive actions, so that even minor faults can be predicted and remedied before they become significant.
The purpose of this degree project is to develop a methodology that can be used to predict when fatigue damage grows too large during tests in the gear test rig (FZG rig) at the KTH Department of Machine Design. In this work, the standard sensors already installed on the rig for temperature, rotation speed and torque were used. Measurement data from tests of four different gear variants were analysed: two different materials (wrought steel or powder metal), each with two different surface treatments (ground or superfinished).
After literature studies on gear surface fatigue (pitting), condition indicators for these faults and machine learning, a statistical analysis was performed to see how the measurement data behave during ongoing testing and to enable comparison with the machine learning results. Two machine learning models, decision tree and Support Vector Machine, were selected and trained in two combinations: either with the root mean square (RMS) only, or also with crest factor, standard deviation and kurtosis.
In total, 64 models were trained: 32 for all tests and another 32 to investigate two specific tests that had a longer test time before damage occurred. New condition indicators such as standard deviation and signal-to-noise ratio were calculated to obtain more nuanced trends for monitoring the gear behaviour than changes in the gear surface profiles alone. After a comparison with the results of the statistical analysis and previously performed surface profile measurements, it was concluded that the new indicators can indicate changes in the gear behaviour before the first surface damage is detected by tooth profile measurement.
Keywords: gear transmission, machine learning, condition monitoring, surface fatigue
Master of Science Thesis TRITA-ITM-EX 2021:515
Gearbox fault detection, based on Machine Learning of multiple sensors
Armands Krumins
Approved
2021-09-07
Examiner
Ulf Olofsson
Supervisor
Ulf Olofsson
Commissioner
KTH
Contact person
Edwin Bergstedt
Abstract
The increasing demand for higher efficiency and lower environmental impact of transmissions used in the automotive and wind energy industries has created a need for more advanced technical solutions to fulfil those requirements. Condition monitoring plays an important role in the transmission life cycle, saving resources and time. Recently, condition monitoring using machine learning has enabled a shift from reactive to proactive action, predicting minor faults before they become significant.
This thesis aims to develop a methodology that can be used to predict faults such as pitting initiation before they propagate, in the FZG test rig available at the KTH Department of Machine Design. Measurements from the standard sensors already installed on the rig, for temperature, rotation speed and torque, are used in this project. Four kinds of gears were used: two made of wrought steel and two of powder metal steel, each with a ground or a superfinished surface.
After a literature review on pitting fatigue, condition indicators for these failures and machine learning, a statistical analysis was performed to see how the transmission behaves during testing and to provide reference material for comparison with the machine learning results. Two machine learning models, Decision Tree and Support Vector Machine, were selected and trained in two combinations: either with Root Mean Square only, or with Crest Factor, Standard Deviation and Kurtosis in addition.
As a result, 64 models were trained: 32 for all tests and another 32 to investigate two particular tests with a longer pitting propagation period. New condition indicators such as Standard Deviation and Signal-to-noise ratio were calculated to obtain more nuanced trends than a single measurement gives when monitoring the gearbox behavior. After comparison with the results of the statistical analysis and previously performed tooth profile measurements, it was concluded that the new indicators could indicate a change in gearbox operation before the first pitting initiation is detected by tooth profile measurement.
Keywords: condition monitoring, gear, machine learning, surface fatigue
FOREWORD
I would like to thank my supervisor Ulf Olofsson, who offered this thesis project, for guidance throughout the whole thesis process and for advice on various technical challenges. I would also like to thank advisors Edwin Bergstedt and Ellen Bergseth for discussions during the thesis writing process, for the materials they provided and for useful feedback.
Armands Krumins
Stockholm, August 2021
NOMENCLATURE
Notations
Symbol Description
t time (s)
τ torque (Nm)
N rotation speed (min⁻¹)
T temperature (°C)
Abbreviations
CF Crest Factor
CI Condition Indicator
DT Decision Tree
FZG Forschungsstelle für Zahnräder und Getriebebau
LS Load Stage
MAE Mean Absolute Error
ML Machine Learning
PID Proportional Integral Derivative controller
PMGR Powder Metal, Ground surface
PMSF Powder Metal, Superfinished surface
R² Coefficient of Determination
RMS Root Mean Square
RMSE Root Mean Square Error
SNR Signal-to-noise ratio
STD Standard Deviation
SVM Support Vector Machine
WGR Wrought Steel, Ground surface
WSF Wrought Steel, Superfinished surface
TABLE OF CONTENTS
SAMMANFATTNING (SWEDISH)
ABSTRACT
FOREWORD
NOMENCLATURE
TABLE OF CONTENTS
1 INTRODUCTION
1.1 Background
1.2 Purpose
1.3 Delimitations
1.4 Research questions
1.5 Method
2 FRAME OF REFERENCE
2.1 Pitting fatigue phenomena
2.2 Condition monitoring in gearboxes
2.3 The basics of Machine Learning
2.4 Test rig
2.4.1 Transmission
2.4.2 Sensors
2.5 The used data
3 IMPLEMENTATION
3.1 First view on data
3.2 Statistical analysis of the data
3.3 Machine learning method selection
3.4 Data preparation for machine learning
4 RESULTS
4.1 Statistical analysis results
4.2 Machine learning trained models
4.3 Condition Indicator calculations
5 DISCUSSION AND CONCLUSIONS
5.1 Discussion
5.2 Conclusions
6 RECOMMENDATIONS AND FUTURE WORK
6.1 Recommendation
6.2 Future work
7 REFERENCES
APPENDIX A: SUPPLEMENTARY INFORMATION
1 INTRODUCTION
1.1 Background
Gearboxes are used in various industries, such as automotive, manufacturing and energy production, to increase or reduce drivetrain speed and torque. In the energy industry, and especially in wind energy, the quality and maintenance of the gearbox play a significant role in the performance of the whole wind turbine. The reason is the cost, work and energy put into installation and maintenance; this argument often causes skepticism towards wind energy as a sustainable energy source (1). To mitigate these doubts and help society reach consensus, it is important to show that there are opportunities to make wind energy as material- and energy-efficient as possible.
Therefore, over the last two decades there has been great interest and development in predictive maintenance, which has mostly developed alongside the wind turbine industry as a complement to it. The latest trend is the transition to Industry 4.0, in which online data are used to monitor and predict conditions instead of offline or onsite measurements (2). In many research studies the most frequently observed data source is transmission vibration (3), measured at multiple locations. However, significant vibrations appear only when the damage has already started to propagate, and by then it is too late to plan maintenance work ahead. Different approaches are therefore possible, such as using transmission error, oil temperature or oil contamination as parameters that indicate changes in transmission operating conditions.
This master thesis project was offered by Professor Ulf Olofsson as a research-based project at KTH, with the goal of doing research using pre-recorded measurements from spur gear transmission tests. As there are signs that damage forms on the gear teeth during tests with increasing load, the question arises whether it is possible to predict the damage before it becomes so large that the entire transmission fails to operate. The proposed methodology is machine learning, in which multiple sources of data are used to predict the outcome by learning from previous cases with similar conditions. It will be investigated whether this methodology can be used in this situation and whether the measurements could be made in a way that is more suitable for it.
1.2 Purpose
The purpose of this thesis project is to continue a long-term research process at the Machine Design Department on gear contact characteristics, in this case particularly the pitting phenomenon in the FZG test rig. The novelty of the project is to predict the phenomenon before it becomes so severe that transmission operation must be interrupted immediately and servicing is required, causing undesirable downtime. This can be done by analyzing pre-recorded measurements to find patterns in them that show changes in gear contact behavior.
The approach selected for this project is Machine Learning (hereafter ML) algorithm training, using the previously mentioned measurements, so that when the algorithms recognize a pattern associated with an upcoming failure, they can estimate when the failure may happen. This proactive approach helps to make more precise decisions to do maintenance work just in time. However, this requires enough measurements to train algorithms as advanced as the task requires. This problem is discussed further in the research questions section.
1.3 Delimitations
It is desirable that the work fits within 20 weeks, so, of the many failure modes that are possible in gear contact, such as pitting, spalling, scuffing and wear, only pitting is analysed in this thesis project. The project relies on pre-recorded measurements from the test rig due to limited access to campus during the Spring semester 2021. The measurements were recorded during the work in (4), with gears manufactured from different materials, powder metal and wrought steel. Each material has two variants: the first with a ground surface, the second with a superfinished surface. The damage after each Load Stage was documented by measuring the tooth profile and comparing it to the default profile; the result is the deviation from that profile as a function of roll angle.
At the moment there are no plans to use the output of this project elsewhere than in the FZG test rig at KTH; as mentioned before, this knowledge would contribute to transmission research in the long term.
1.4 Research questions
After background research on gearbox industry requirements and discussions with supervisors, the following research questions were defined:
1. Is it possible to detect surface fatigue, and especially pitting, using the existing sensor system?
2. How early can pitting fatigue be detected?
3. Which measured parameters contribute most to predicting surface fatigue?
4. How can the test rig be upgraded to get earlier or more precise pitting detection?
1.5 Method
The method used in this thesis project consists of the following tasks:
1. Literature review of previously published studies – before going deeper into the thesis topic, it is important to understand the context of the problem and how often problems like premature pitting occur in transmissions used in industry. Moreover, the direction of the implementation phase has to be clarified, so examples of condition monitoring should be reviewed, including measurement sources, their sensors, and the amount and quality of the measured data.
2. Background study on machine learning – although the main idea of machine learning is simple, namely to use given data to find the patterns they form, the field of machine learning grows wider every year. Therefore, it is necessary to define how much of this field should be used, so that the machine learning model that answers the research questions is kept as simple as possible.
3. Measurement evaluation – first, a visual evaluation, in which the pre-recorded values, electric signals in the range 0 to 10 V multiplied by the corresponding scale factor, are plotted in the time domain. When each test run is plotted in its own subplot of a main figure, the runs can be compared visually to evaluate whether the curve shapes change during the whole test with a certain gear set.
4. Basic statistical analysis of the measurements – parameters such as root mean square, standard deviation, crest factor and kurtosis are calculated over the entire test to see what kind of changes in transmission behaviour occur.
5. Selection of an appropriate method to train algorithms with the given data – here an appropriate tool and method are selected, for example clustering, fitting, pattern recognition or something else.
6. Model training – the prepared data are passed to machine learning algorithms. The models are trained with different ML methods to see whether any of them is preferable to the others.
7. Use of the trained models – once a model is trained, its practical application should be defined. Indicators such as standard deviation and signal-to-noise ratio can be extracted from it and used to evaluate the behaviour of the transmission.
8. Evaluation of results – comparison between the statistical analysis, the tooth profile measurements from (5) and the machine learning outcome; conclusions about the whole process and its output, and how usable it is in cases other than the FZG test rig.
9. Suggestions for improvement – these have to be based on findings from the background study of gearbox condition monitoring.
10. Finalization of report – read through the whole report and update it with the latest findings.
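The conversion in task 3, from logged 0 to 10 V signals to physical values that can then be split into test runs and plotted, can be sketched as follows. This is a minimal sketch assuming a linear scale per channel; the channel names and full-scale values in `FULL_SCALE` and the helper names are hypothetical placeholders, not the calibration or code used for the KTH FZG rig.

```python
# Hypothetical full-scale values: the physical value that corresponds to 10 V.
# These are placeholders, not the calibration of the KTH FZG rig.
FULL_SCALE = {
    "torque_Nm": 500.0,
    "speed_rpm": 3000.0,
    "temperature_C": 150.0,
}
V_MAX = 10.0  # upper end of the logged voltage range


def volts_to_physical(volts, channel):
    """Convert a list of 0-10 V samples to physical units with a linear scale."""
    scale = FULL_SCALE[channel] / V_MAX
    out = []
    for v in volts:
        if not 0.0 <= v <= V_MAX:
            raise ValueError(f"sample {v} V is outside the 0-{V_MAX} V range")
        out.append(v * scale)
    return out


def split_runs(samples, run_length):
    """Split one long recording into consecutive runs of equal length,
    for example to plot each test run in its own subplot."""
    return [samples[i:i + run_length] for i in range(0, len(samples), run_length)]
```

For example, `volts_to_physical([0.0, 5.0, 10.0], "torque_Nm")` returns `[0.0, 250.0, 500.0]` with the placeholder scale. Rejecting out-of-range samples rather than clipping them makes sensor faults visible instead of hiding them in the plots.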
2 FRAME OF REFERENCE
2.1 Pitting fatigue phenomena
During operation, gears can experience two kinds of failure: surface damage or tooth breakage. As the focus of this thesis project is on detecting possible damage as early as possible, and tooth breakage means the end of the transmission's life, only surface damage is considered here. Pitting in particular was selected as the parameter to monitor because it is one of the earliest failure modes to appear. However, its origin is difficult to spot visually at the beginning, because the phenomenon consists of multiple stages: initial, micro and macro pitting (in some sources called spalling) (6).
Surface fatigue of spur and helical gears primarily appears in the form of pitting near the pitch line. From Hertz theory it is known that at the pitch line, where rolling dominates, the maximum shear stress occurs slightly below the surface. This stress distribution leads to subsurface crack initiation. The crack may originate at material inclusions that contribute to the stress concentration (7).
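The subsurface location of the maximum shear stress under rolling can be illustrated with the classical Hertz solution for two parallel cylinders in line contact. This is a minimal sketch assuming that a tooth pair near the pitch line can be idealized as two cylinders and that friction is negligible; the function name and the example input values are hypothetical and not taken from the FZG gears used in this thesis.

```python
import math


def hertz_line_contact(force_n, length_m, r1_m, r2_m, e1_pa, e2_pa, nu1, nu2):
    """Hertzian contact of two parallel cylinders (frictionless).

    Returns (b, p0, tau_max, depth): contact half-width, peak contact pressure,
    maximum shear stress and its depth below the surface.
    """
    r_eff = 1.0 / (1.0 / r1_m + 1.0 / r2_m)  # effective radius of curvature
    e_star = 1.0 / ((1.0 - nu1**2) / e1_pa + (1.0 - nu2**2) / e2_pa)  # contact modulus
    b = math.sqrt(4.0 * force_n * r_eff / (math.pi * length_m * e_star))
    p0 = 2.0 * force_n / (math.pi * b * length_m)
    # Classical line-contact result for pure rolling: the maximum shear stress
    # is about 0.30 * p0 and lies about 0.78 * b below the surface.
    tau_max = 0.30 * p0
    depth = 0.78 * b
    return b, p0, tau_max, depth
```

With two steel cylinders of 10 mm radius, 10 mm face width and 1 kN load, the half-width is roughly 74 µm and the peak pressure roughly 0.86 GPa, so the maximum shear lies a few tens of micrometres below the surface, consistent with subsurface crack initiation. Real gear contacts also have sliding and friction, which move the stress maximum towards the surface, as described in the next paragraph.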
Away from the pitch line, where rolling is combined with sliding, the tangential traction load can significantly affect the magnitude and location of the stress concentration. With a large traction load, the maximum stress concentration shifts to the surface and may initiate surface cracking. Contributing factors to surface crack initiation include intensified stresses around large asperity contacts or surface defects. As the pinion teeth are in contact more frequently than the wheel teeth, they suffer more severe pitting. At the addendum, sliding induces a positive surface stress, whereas on the dedendum flank the stress is oriented in the opposite direction. In the latter case tensile stress is induced, which pulls surface cracks open; lubricant is then pressed into them, which accelerates crack growth (8).
When cracks start to propagate, they form small, shallow pits in the micrometre range, resulting in grey patches. This stage can start right after the running-in phase, when gear surface irregularities are worn off. Finally, when crack propagation results in particles breaking off and leaving pits in the flanks, macro pitting starts to propagate. It usually leads to more vibration during transmission operation (8).
In (5) the pitting damage was investigated by applying increasingly higher loads in the FZG test rig. The damage indicator was the deviation from the standard gear profile, measured after each Load Stage. This is more accurate than plain weighing, which gives only a broad impression of the phenomenon instead of a local one. The results in that paper show that pitting propagates more after each load stage; it forms at the middle of the tooth, where the pure rolling point is, and right above the gear root, where the contact ends. The results of this thesis project and that paper are compared in chapter 5.
2.2 Failures and faults
Before going deeper into condition monitoring, it is important to distinguish between different conditions.
The concept of failure is fundamental to reliability. Failure is always related to a required function, which is often specified together with a performance requirement. A failure occurs when the function cannot be performed or its performance falls outside the performance requirement. It may develop gradually or suddenly. It can be revealed on demand, when it has already occurred; during a functional test that seeks to find out whether it is imminent; or by monitoring or diagnostics. A fault is the state of an item characterized by inability to perform a required function. While a failure is an event that occurs at a specific point in time, a fault is a state that lasts for a shorter or longer period. An error is present when the performance of a function deviates from the target performance (the theoretically correct performance) but still satisfies the performance requirement. An error will often, but not always, develop into a failure (9).
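The difference between an error (a deviation that still meets the requirement), a failure (an event) and a fault (a lasting state) can be made concrete with a monitored performance signal. This is a minimal sketch assuming a single scalar performance measure with a known target value and a lower performance requirement, and assuming no repair takes place within the record; the function name and the example numbers are hypothetical.

```python
def classify_states(performance, target, requirement, tol=1e-9):
    """Label each sample as 'ok', 'error' or 'fault' and return the failure index.

    - 'ok': performance equals the target (theoretically correct performance)
    - 'error': deviates from the target but still meets the requirement
    - 'fault': the state after the requirement has been violated
    The failure is the event: the first index where the requirement is violated.
    Assumption: no repair happens within the record, so the fault state persists.
    """
    labels = []
    failure_index = None
    for i, p in enumerate(performance):
        if failure_index is None and p < requirement:
            failure_index = i  # the failure event occurs once, at a point in time
        if failure_index is not None:
            labels.append("fault")  # the fault is a state that lasts
        elif abs(p - target) <= tol:
            labels.append("ok")
        else:
            labels.append("error")
    return labels, failure_index
```

For the sequence `[100, 98, 95, 89, 92]` with target 100 and requirement 90, the samples are labelled ok, error, error, fault, fault, and the failure occurs at index 3. The last sample stays a fault even though its value meets the requirement again, because the item is assumed not to have been repaired.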
2.3 Condition monitoring in gearboxes
Predictive maintenance is becoming increasingly interesting, and even necessary, for many industries to remain cost-effective. It can consist of three main parts: recording measurements, processing them and using them in machine learning algorithms (2); these steps are described further in this chapter. So far wind turbines are the most frequently represented application in predictive maintenance research articles, which can to some extent be explained by the exponential worldwide growth of the wind power industry over the last two decades (1). Moreover, in the wind turbine industry maintenance cannot be done spontaneously, as it is time- and resource-consuming due to the large scale, compared with maintenance of passenger or commercial vehicles. The large amount of data collected by industrial systems contains information about processes, events and alarms that occur along an industrial production line.
When processed and analyzed, these data can yield valuable information and knowledge about the manufacturing process and system dynamics. By applying data-driven analytic approaches, it is possible to obtain interpretable results for strategic decision-making, providing advantages such as reduced maintenance cost, fewer machine faults, fewer repair stops, a smaller spare parts inventory, longer spare part life, increased production, improved operator safety, repair verification and higher overall profit (10).
Maintenance policies can be categorized into the following main classes:
1. Run to Failure (R2F): also known as corrective or unplanned maintenance. It is the simplest maintenance technique and is performed only when the equipment has failed. It may lead to long equipment downtime and a high risk of secondary faults, and thus to a very large number of defective products in production.
2. Preventive Maintenance (PvM): also known as scheduled or time-based maintenance (TBM). PvM refers to maintenance performed periodically according to a planned schedule in order to anticipate failures. It sometimes leads to unnecessary maintenance, which increases operating costs. The main aim is to improve equipment efficiency by minimizing failures in production.
3. Condition-based Maintenance (CBM): this method is based on constant monitoring of the machine or equipment, or of its process health, so that maintenance is carried out only when it is actually necessary. Maintenance actions are taken only after one or more conditions indicate degradation of the process. CBM usually cannot be planned in advance.
4. Predictive Maintenance (PdM): also known as statistical-based maintenance, in which maintenance actions are taken only when needed. Like CBM, it is based on continuous monitoring of the equipment or machine. It uses prediction tools to determine when maintenance actions are necessary, so that maintenance can be scheduled. Furthermore, it allows early failure detection based on historical data by using prediction tools such as machine learning methods, integrity factors (such as visual aspects, coloration different from the original, wear), statistical inference approaches and engineering techniques (11).
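The practical difference between PvM, CBM and PdM lies in what triggers a maintenance action: elapsed time, a measured condition crossing a limit, or a prediction of when that limit will be crossed. This is a minimal sketch assuming a single degradation indicator, a fixed alarm threshold and a linear trend for the predictive case; all function names and numbers are hypothetical and do not represent a method from this thesis.

```python
def pvm_due(hours, interval):
    """Preventive: maintenance is due at fixed intervals, regardless of condition."""
    return hours > 0 and hours % interval == 0


def cbm_due(indicator, threshold):
    """Condition-based: act once the monitored indicator reaches the threshold."""
    return indicator >= threshold


def pdm_hours_to_threshold(times, values, threshold):
    """Predictive: fit a least-squares line to the indicator history and
    extrapolate when it reaches the threshold, so maintenance can be scheduled.
    Returns the remaining time after the last sample, or None for a flat or
    decreasing trend."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = v_mean - slope * t_mean
    t_cross = (threshold - intercept) / slope
    return max(0.0, t_cross - times[-1])
```

For an indicator growing from 1.0 to 2.5 over four hourly samples and a threshold of 4.0, `pdm_hours_to_threshold` predicts 3 more hours before the threshold is reached, while `cbm_due` would not yet trigger. A linear trend is the simplest possible prognosis model; real degradation often accelerates, so this estimate would be optimistic.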
The gearbox is one of the most important components in industrial processes. Its health and safety are vital to the reliable operation and improved efficiency of the relevant facilities in the whole system. However, gearboxes generally work in harsh operating environments, which may accelerate their degradation. Consequently, they are subject to different types of defects, such as gear fatigue cracks, gear pitting, bearing defects and bent shafts. Gearbox defects may even cause failure of the whole system, leading to significant economic losses, costly downtime and even catastrophic damage. Thus, fault diagnosis and prognosis of gearboxes are of great importance to achieve a high degree of availability, reliability and operational safety (12).
Condition monitoring systems deal with various types of input data, for instance vibration,
acoustic emission, temperature, oil debris analysis etc. Systems based on vibration analysis,
acoustic emission and oil debris are the most common and are very well established in industry.
Systems based on acoustic emission have a more obvious application in bearing monitoring than in gear monitoring. However, some applications for gearbox condition monitoring have
been introduced. Acoustic emission (AE) is usually defined as transient elastic waves generated
from a rapid release of strain energy caused by a deformation or by damage within or on the
surface of the material.
Oil debris analysis is a very reliable method for detecting gear damage in its early stages and allows the wear level to be estimated. During gearbox operation, the contacting surfaces of gearwheels and bearings are gradually abraded, and small pieces of material break off from the contact surfaces. These particles are carried away by the oil lubricating the gearwheels and bearings. By detecting the number and size of particles in the oil, gear pitting damage can be identified at an early stage, when it is not yet identifiable by vibration analysis. Oil debris sensors are usually based on a magnetic or an optical principle. Magnetic sensors measure the change in magnetic field caused by metal particles in a monitored sample of oil.
A disadvantage of oil debris analysis is that it does not localize the failure in complicated gearboxes. The oil used with an oil debris monitoring system should not dissolve the metal particles or spread a metal film onto the gearbox housing (13).
The gearbox lubrication oil temperature values are important from a condition monitoring perspective, as the most common failure modes in the gearbox can potentially manifest themselves as a deviation in these measurements. Hence, normal behavior models for the gearbox bearing and lubrication oil temperatures are used to achieve condition monitoring of the gearbox (14).
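A normal behavior model can be sketched as a regression fitted on data from a period known to be healthy; new measurements are then flagged when their residual leaves the band observed during training. This is a minimal sketch assuming that oil temperature depends linearly on torque only and using a three-standard-deviation alarm band. It is not necessarily the model used in (14), and the function names and example values are hypothetical.

```python
import statistics


def fit_normal_behaviour(torque, oil_temp):
    """Fit oil_temp = a + b * torque on healthy data; return (a, b, residual std)."""
    t_mean = statistics.fmean(torque)
    y_mean = statistics.fmean(oil_temp)
    sxx = sum((x - t_mean) ** 2 for x in torque)
    sxy = sum((x - t_mean) * (y - y_mean) for x, y in zip(torque, oil_temp))
    b = sxy / sxx
    a = y_mean - b * t_mean
    residuals = [y - (a + b * x) for x, y in zip(torque, oil_temp)]
    return a, b, statistics.pstdev(residuals)


def flag_anomalies(model, torque, oil_temp, k=3.0):
    """Return True for samples whose residual exceeds k residual standard deviations."""
    a, b, sigma = model
    return [abs(y - (a + b * x)) > k * sigma for x, y in zip(torque, oil_temp)]
```

Trained on five healthy samples between 100 and 300 Nm, a new sample at 200 Nm with an oil temperature 15 °C above the fitted line is flagged, while one 0.2 °C above it is not. In practice more inputs (speed, ambient temperature) and a longer healthy period would be needed for a reliable band.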
There are two ways to record measurements: directly or indirectly. Indirect sensing parameters are less accurate indicators of gearbox component conditions, but their rugged sensor design makes them more suitable for practical applications. Direct sensing techniques, on the other hand, measure actual quantities of gearbox component conditions and have a high degree of accuracy. Due to practical limitations during normal gearbox operation, direct sensing techniques are commonly used for offline measurement or as laboratory techniques (12).
In general, for condition monitoring, the more data are available the better, because the training and decision-making process can then be more nuanced. There is also always the option to sort out data that give inadequate results, for example due to test interruptions for maintenance work, malfunctioning sensors or mistakes during the experiment, which could otherwise cause false alarms.
For example, in (14) four wind turbines are inspected, and 10-min-average Supervisory Control and Data Acquisition (SCADA) data are used for monitoring. Hence, there are at most 144 measurements in 24 h. The anomaly detection results are presented as averages over 12-h periods. To increase confidence in the prediction, and in line with the missing-data filtering approach, at least 1 h of data must be available for an anomaly detection output to be considered. Where sufficient data are not available, the previous valid output is copied, and an indication of missing data is presented in the output.
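The aggregation rule described for (14) can be sketched as follows: 10-min values are grouped into 12-h periods (at most 72 samples each), a period output is accepted only when at least 1 h of data (six 10-min samples) is present, and otherwise the previous valid output is carried forward with a missing-data flag. This is a minimal sketch under these assumptions; representing missing samples as `None` and using the plain mean as the period output are simplifications, not the exact method of (14).

```python
SAMPLES_PER_PERIOD = 72  # 12 h of 10-min averages
MIN_VALID_SAMPLES = 6    # at least 1 h of data must be present


def period_outputs(scores):
    """Average 10-min anomaly scores over 12-h periods.

    `scores` is a list of floats, with None marking missing 10-min samples.
    Returns one (value, missing_flag) tuple per period.
    """
    outputs = []
    previous = None
    for start in range(0, len(scores), SAMPLES_PER_PERIOD):
        window = scores[start:start + SAMPLES_PER_PERIOD]
        valid = [s for s in window if s is not None]
        if len(valid) >= MIN_VALID_SAMPLES:
            previous = sum(valid) / len(valid)
            outputs.append((previous, False))
        else:
            # Not enough data: copy the previous valid output and flag it.
            outputs.append((previous, True))
    return outputs
```

A period with only two valid samples therefore repeats the previous period's value with the flag set, so a gap in the data does not produce a spurious alarm or a spurious all-clear.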
To obtain a more accurate answer after visual evaluation of the measurements, Condition Indicators (CI) are used, which describe the measurements from a statistical perspective. They can be either time-based or frequency-based. The time-based indicators can be further divided into Raw Signal, Time Synchronous Average Signal, Residual Signal, Difference Signal and Band-Pass Mesh Signal indicators (15). The Raw Signal subgroup includes parameters such as Root Mean Square (RMS), Crest Factor (CF), Energy Ratio (ER) and Kurtosis.
RMS is the simplest method for detecting and measuring defects in the time domain. It is good for tracking the general noise level and is also very useful for detecting unbalanced rotating elements. The root mean square, also called the quadratic mean, is a statistical measure of the magnitude of a varying quantity. RMS was initially developed to describe the temperature of a resistor subjected to a sinusoidal alternating current. It can be described by the equation:
$$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^{2}} \tag{1}$$
Where:
x – the original sample time signal
N – the number of samples
i – the sample index
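Eq. (1) translates directly into code. A minimal sketch in plain Python; the function name `rms` is chosen here for illustration. For a pure sinusoid of amplitude A sampled over whole periods, the result is A/√2, which is a convenient sanity check.

```python
import math


def rms(x):
    """Root mean square, Eq. (1): sqrt((1/N) * sum(x_i^2))."""
    if not x:
        raise ValueError("signal must contain at least one sample")
    return math.sqrt(sum(v * v for v in x) / len(x))
```

For example, `rms([3.0, -3.0, 3.0, -3.0])` returns `3.0`.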
CF performs better than RMS in detecting defects in rotating machinery (15). It is defined as the ratio of the positive peak value of the input signal to the RMS level. The crest factor is affected by the number of peaks in the time-series signal. In normal operation, the crest factor typically lies between 2 and 6, and a value higher than 6 is usually associated with machinery problems. A signal with a small number of high-amplitude peaks can produce a larger crest factor, because the numerator increases (high-amplitude peaks) while the denominator decreases (few peaks mean a lower RMS). CF is a normalized measure of the signal amplitude and increases even when only a small number of high-amplitude peaks occur, such as in a signal resulting from local tooth damage. The equation for CF is:
$$\mathrm{CF} = \frac{\text{Peak level}}{\mathrm{RMS}} \tag{2}$$
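Eq. (2) can be computed from the positive peak and the RMS of Eq. (1). A minimal sketch in plain Python; the function name and the test signals are illustrative. A few isolated high peaks, as from a local tooth defect, raise CF well above the rule-of-thumb limit of 6, whereas a pure sinusoid gives √2.

```python
import math


def crest_factor(x):
    """Crest factor, Eq. (2): positive peak value divided by the RMS level."""
    if not x:
        raise ValueError("signal must contain at least one sample")
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(x) / rms
```

For a low-level signal of 99 samples at 0.1 with a single 5.0 spike, the crest factor is about 9.8.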
ER is a useful technique for detecting heavy uniform wear. It is defined as the ratio of the RMS of the difference signal d to the RMS of the regular meshing component r. The energy in the difference signal d is compared to the energy in the regular component signal r. The idea behind this technique is that, as wear develops, energy moves from the regular signal to the difference signal (15). This parameter is more suitable for detecting severe damage, as it depends on large differences in acceleration that cause a significant change in energy. It can be described by the equation:
$$\mathrm{ER} = \frac{\mathrm{RMS}_d}{\mathrm{RMS}_r} \tag{3}$$
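Eq. (3) can be sketched as follows, assuming that the regular meshing component r has already been extracted from the time-synchronous average (for example by keeping only the mesh harmonics) and that the difference signal d is what remains after subtracting it sample by sample. That extraction step is outside this minimal sketch, and the function names are illustrative.

```python
import math


def _rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def energy_ratio(tsa, regular):
    """Energy ratio, Eq. (3): RMS of the difference signal d over RMS of the
    regular meshing component r, with d = tsa - r (sample by sample)."""
    if len(tsa) != len(regular):
        raise ValueError("signals must have the same length")
    difference = [s - r for s, r in zip(tsa, regular)]
    return _rms(difference) / _rms(regular)
```

If the averaged signal contains only the regular component, ER is 0; adding an extra component with half the amplitude of the mesh component gives an ER of 0.5.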
The next parameter is kurtosis, which measures the size of the tails of a distribution. It can be used as an indicator of major peaks in a signal. Kurtosis is defined as the fourth normalized moment of the signal and is a useful measure of how peaky the signal is. As a gear wears or breaks, this feature should reveal the fault through the increased vibration level. Put simply, kurtosis is a statistical measure of the number and amplitude of peaks in a signal: the more pronounced the peaks, the larger the kurtosis value. A Gaussian noise signal has a kurtosis close to 3. A gearbox in good condition is associated with a Gaussian distribution and has a kurtosis around 3. Note that some researchers subtract 3 from the calculated value and thus obtain a value near zero for a healthy gearbox (15). If more than one tooth is defective, the data distribution becomes flatter and the kurtosis value decreases (13). The kurtosis equation is:
$$\mathrm{Kurtosis} = \frac{N\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^{4}}{\left(\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^{2}\right)^{2}} \tag{4}$$
Where:
x – the signal
x̄ – mean value of the signal
i – the index of data points in time record
N – total number of data points in time record
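Eq. (4) can be implemented directly. A minimal sketch in plain Python; the optional `excess` flag is a name introduced here and subtracts 3 as described above, so that a healthy, Gaussian-like signal gives a value near zero.

```python
def kurtosis(x, excess=False):
    """Kurtosis, Eq. (4): N * sum((x_i - mean)^4) / (sum((x_i - mean)^2))^2.

    With excess=True, 3 is subtracted so that Gaussian noise gives about 0.
    """
    n = len(x)
    if n < 2:
        raise ValueError("signal must contain at least two samples")
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x)
    m4 = sum((v - mean) ** 4 for v in x)
    k = n * m4 / (m2 ** 2)
    return k - 3.0 if excess else k
```

A flat, two-level signal such as `[1.0, -1.0] * 50` gives exactly 1.0, Gaussian noise gives about 3, and a single isolated spike in an otherwise quiet signal gives a value far above 3, which matches the behaviour described above.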
2.4 The basics of Machine Learning
ML has become one of the leading areas of technology, driving Industry 4.0 and changing how many problems are approached, from self-driving cars to the treatment of diseases. ML is a branch of artificial intelligence (AI) focused on building applications that learn from data and improve their accuracy over time without being explicitly programmed to do so. In data science, an algorithm is a sequence of statistical processing steps. In ML, algorithms are "trained" to find patterns and features in massive amounts of data in order to make decisions and predictions based on new data. The better the algorithm, the more accurate the decisions and predictions become as it processes more data. As big data keeps growing, as computing becomes more powerful and affordable, and as data scientists keep developing more capable algorithms, ML will drive ever greater efficiency in personal and working life (16).
ML algorithms are categorized into three types: supervised, unsupervised and reinforcement learning. The categories are depicted in Figure 1. In this chapter the first two are discussed, as they are the most widely used in various industries and the simplest.
Figure 1. Classifications within Machine Learning techniques (11)
Supervised learning, also known as supervised machine learning, is a subcategory of machine learning and artificial intelligence. It is defined by its use of labeled datasets to train algorithms to classify data or predict outcomes accurately. As input data are fed into the model, it adjusts its weights during training until the model has been fitted appropriately. Supervised learning helps organizations solve a variety of real-world problems at scale, such as sorting spam into a separate folder from the inbox (16).
Supervised learning uses a training set to teach models to yield the desired output. This training data set includes inputs and correct outputs, which allow the model to learn over time. The algorithm measures its accuracy through a loss function and keeps adjusting until the error has been sufficiently minimized.
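The loss-driven training loop described above can be illustrated with a minimal Python sketch. It fits a straight line y ≈ w·x + b by gradient descent on the mean squared error. The function name `fit_linear`, the learning rate and the number of epochs are illustrative assumptions. The thesis itself trained its models in MATLAB, not with this code.

```python
def fit_linear(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error (the loss)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Errors of the current model on the labelled training examples
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of MSE = mean(error^2) with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient to reduce the loss
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = fit_linear([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])   # labels follow y = 2x + 1
```

Each pass compares predictions with the correct outputs, measures the error through the loss and adjusts the parameters. This repeats until the loss is small.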
In data mining, supervised learning problems can be separated into two types, classification and regression:
• Classification uses an algorithm to accurately assign test data to specific categories. It recognizes specific entities within the data set and attempts to draw conclusions about how those entities should be labeled or defined. Common classification algorithms are linear classifiers, support vector machines (SVM), decision trees, k-nearest neighbors and random forest.
• Regression is used to understand the relationship between dependent and independent variables. It is commonly used to make projections, such as the sales revenue of a given business. Linear regression, logistic regression and polynomial regression are popular regression algorithms (16).
Unsupervised learning uses machine learning algorithms to analyze and cluster unlabeled data sets. These algorithms discover hidden patterns or data groupings without the need for human intervention. Their ability to discover similarities and differences in information makes them the ideal solution for exploratory data analysis, cross-selling strategies, customer segmentation and image recognition.
Unsupervised learning models are used for three main tasks: clustering, association and dimensionality reduction. Clustering is a data mining technique that groups unlabeled data based on their similarities or differences. Clustering algorithms are used to process raw, unclassified data objects into groups represented by structures or patterns in the information (16).
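Clustering can be illustrated with k-means, one of the algorithms named in this chapter. The following is a minimal one-dimensional Python sketch. It assumes that the initial centroids are supplied by the user. `kmeans_1d` is a hypothetical helper, not part of any library.

```python
def kmeans_1d(points, centroids, iterations=20):
    """Plain k-means on scalar data: alternate assignment and centroid update."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: every point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster (kept if empty)
        centroids = [sum(c) / len(c) if c else centroids[k]
                     for k, c in enumerate(clusters)]
    return centroids, clusters


centroids, clusters = kmeans_1d([1.0, 1.2, 0.8, 10.0, 10.4, 9.6], [0.0, 5.0])
```

No labels are used anywhere. The two groups emerge purely from the distances between the unlabeled points.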
According to (10), the most employed ML algorithm in industry is Random Forest (RF) with 33%. It is followed by neural-network-based methods with 27%, i.e. artificial (ANN), convolutional (CNN) and long short-term memory (LSTM) networks and deep learning. Support Vector Machine (SVM) follows with 25% and k-means with 13%.
ANNs are intelligent computational techniques inspired by biological neurons (10). An ANN is composed of several processing units (nodes or neurons) with relatively simple operation. These units are usually connected by communication channels that have an associated weight, and each unit operates only on the local data it receives through its connections. The intelligent behavior of ANNs comes from the interactions between the processing units of the network. ANNs are among the most common and widely applied ML algorithms, and they have been proposed in many industrial applications, including soft and predictive control (10). Their main advantages are:
• no expert knowledge is needed to make decisions, since they are based only on historical data (like the k-means model);
• they do not suffer degradation even if the data are inconsistent (i.e., ANNs are robust);
• once an accurate ANN has been built for a particular application, it can be employed in real time without having to change its architecture at every update.
However, ANNs also have some disadvantages:
• networks can reach conclusions that contradict the rules and theories established for the application,
• training an ANN can be time-consuming,
• they are "black box" methods (that is, it is generally not possible to explain why the ANN model has reached a given output prediction),
• a huge data set is needed for an ANN to learn correctly.
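The processing units and weighted connections described above can be illustrated with a minimal forward pass of a small feed-forward network in Python. The sigmoid activation, the helper names `neuron` and `forward`, and the example weights are all assumptions made for illustration. Training, that is, adjusting the weights, is not shown.

```python
import math


def neuron(inputs, weights, bias):
    """One processing unit: weighted sum of its own inputs plus a bias, then a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, layers):
    """Propagate inputs layer by layer; a layer is a list of (weights, bias) pairs."""
    activations = list(inputs)
    for layer in layers:
        activations = [neuron(activations, w, b) for w, b in layer]
    return activations


# Hypothetical 2-input network: one hidden layer of two units, one output unit
layers = [
    [([0.5, -0.4], 0.1), ([0.3, 0.8], -0.2)],
    [([1.2, -0.7], 0.05)],
]
output = forward([1.0, 2.0], layers)
```

Each unit only sees the values arriving through its own weighted connections. The network's behavior comes from how the units are chained together.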
SVM is a well-known ML technique that is widely used for both classification and regression analysis because of its high accuracy (11). SVM is defined as a statistical learning concept with an adaptive computational learning method. The SVM learning algorithm is presented in Figure 2. The SVM learning technique maps the input vectors nonlinearly into a high-dimensional feature space.
SVM is a set of supervised learning methods that perform regression analysis and pattern recognition. SVMs were originally non-probabilistic binary classifiers that separate two classes with a maximum-margin hyperplane. They are now also employed in multi-class problems, where the SVM creates several hyperplanes that ideally divide the data into n groups/classes.
Figure 2. Support Vector Machine algorithm visualization (11)
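The binary case can be shown with a minimal sketch: a linear soft-margin SVM trained by subgradient descent on the regularized hinge loss. The predicted class is the side of the hyperplane w·x + b = 0 on which a point lies. The function names, the learning rate and the regularization weight `lam` are illustrative assumptions. The sketch is not meant to reproduce the solver that MATLAB uses. It only shows what a trained separating hyperplane does.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM (labels +1/-1) trained by full-batch subgradient descent
    on the objective lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w.x_i + b)))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]          # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:                      # hinge loss is active for this point
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """Class given by the side of the hyperplane w.x + b = 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1
```

Multi-class problems are usually handled by combining several such binary hyperplanes.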
A decision tree (DT) is a network structure composed primarily of nodes and branches, where the nodes comprise a root node, intermediate nodes and leaf nodes. The intermediate nodes represent a feature, and the leaf nodes represent a class label. DT classifiers have gained considerable popularity in a number of areas, such as character recognition, medical diagnosis and voice recognition. Most notably, the DT model can decompose a complicated decision-making mechanism into a series of simplified decisions by recursively splitting the covariate space into subspaces. This yields a solution that is easy to interpret (11).
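The recursive splitting can be sketched in Python for a single feature. Each node picks the threshold that minimizes the weighted Gini impurity. Pure nodes, and nodes at the maximum depth, become leaves that hold a class label. The helper names, the depth limit and the example labels ("healthy", "pitting") are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(xs, labels):
    """Threshold on one feature that minimizes the weighted Gini impurity,
    or None if no split improves on the parent node."""
    pairs = sorted(zip(xs, labels))
    best_threshold, best_score = None, gini(labels)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                              # no threshold between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for x, lab in pairs if x <= threshold]
        right = [lab for x, lab in pairs if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold


def build_tree(xs, labels, depth=0, max_depth=3):
    """Recursively split the feature range; a leaf stores the majority class label."""
    threshold = best_split(xs, labels) if depth < max_depth else None
    if threshold is None:
        return max(set(labels), key=labels.count)
    left = [(x, lab) for x, lab in zip(xs, labels) if x <= threshold]
    right = [(x, lab) for x, lab in zip(xs, labels) if x > threshold]
    return (threshold,
            build_tree([x for x, _ in left], [lab for _, lab in left], depth + 1, max_depth),
            build_tree([x for x, _ in right], [lab for _, lab in right], depth + 1, max_depth))


def tree_predict(tree, x):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree
```

The resulting tree is a chain of simple threshold questions, which is why a DT is considered easy to interpret.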
Random Forest creates a "forest" (ensemble) of multiple randomized decision trees and aggregates their predictions, for example by simple averaging. RF is a supervised learning algorithm for both classification and regression tasks. Although an RF is a collection of decision trees, some differences need to be emphasized. A single decision tree generates its rules and nodes from the calculation of information gain or the Gini index, whereas an RF builds its trees randomly, from random samples of the data and random subsets of the features. Additionally, deep decision trees may suffer from overfitting, while RFs avoid overfitting in most cases, because they work with random subsets of features and build smaller trees from such subsets.
RF is the most used and most often compared ML method in predictive maintenance (PdM) applications. The main motivations are that the decision trees allow a large number of observations to be part of the forecast and that, in some scenarios, RFs can reduce variance and improve generalization. However, the RF method also has drawbacks: it is complex and takes more computational time than other ML algorithms (10).
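The bagging-and-averaging idea behind RF can be sketched in Python. To keep the example short, each "tree" is a depth-one regression stump. It is fitted on a bootstrap sample and restricted to one randomly chosen feature. Real random forests grow deeper trees and draw a random feature subset at every split. All names and parameter values here are illustrative assumptions.

```python
import random


def fit_stump(X, y, feature):
    """Depth-one regression tree on one feature: the threshold with the lowest
    sum of squared errors, with the mean response stored in each leaf."""
    values = sorted(set(row[feature] for row in X))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for row, yi in zip(X, y) if row[feature] <= t]
        right = [yi for row, yi in zip(X, y) if row[feature] > t]
        m_left, m_right = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - m_left) ** 2 for v in left) + sum((v - m_right) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, m_left, m_right)
    if best is None:                       # constant feature: predict the overall mean
        mean = sum(y) / len(y)
        return (feature, float("inf"), mean, mean)
    return (feature, best[1], best[2], best[3])


def fit_forest(X, y, n_trees=50, seed=0):
    """Bagging: each stump sees a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        feature = rng.randrange(d)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Aggregate the individual predictions by simple averaging."""
    preds = [m_l if row[f] <= t else m_r for f, t, m_l, m_r in forest]
    return sum(preds) / len(preds)
```

Averaging many trees, each fitted to a slightly different random view of the data, is what lets the ensemble reduce variance compared with any single tree.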
2.5 Test rig
2.5.1 Transmission
The test rig used as the data source for this master thesis project (further referred to as the FZG test rig) is a back-to-back gear rig used to run a gear transmission at different loads. The measured result is the condition of the gear teeth and their surfaces, examined for the patterns of failure modes such as pitting, scuffing, sliding wear, erosion and tooth breakage. The rig was designed by the Forschungsstelle für Zahnräder und Getriebebau (Gear Research Centre) at the Technical University of Munich, which is where the abbreviation FZG comes from. It contains two gearboxes, the slave gearbox and the test gearbox, each filled with 1,5 liters of transmission oil.
When the motor is turned on, it accelerates the transmission until it reaches a constant speed. Since the energy circulates in a closed loop, the motor afterwards compensates for the power losses in the transmission. This layout requires a preload, which here is applied with a loading clutch connected to a lever arm. Weights are added at the end of the lever arm, generating a reaction torque that the motor has to overcome to maintain a constant rotation speed. The layout is shown in Figure 3, and the technical specifications are summarized in Table 1.
The test process is standardized in ISO 14635-1:2000. First, the transmission oil is heated to 80 °C, and the motor is turned on and rotates at 2250 rpm. The test starts with a run-in phase, which lasts for 1,3·10⁵ contacts of the pinion at the lowest load. The loads used in this process are compiled in Table 2. After the run-in phase, each load from Load Stage 3 to Load Stage 10 (further simply LS) is applied, running the test for 2,1·10⁶ contacts of the pinion. After the test at each load, the tooth profile is measured to evaluate how much material has been removed from the profile as a function of the gear roll angle. If no signs of pitting are noticed up to the final load, the test is repeated with the same load and the same number of cycles until pitting appears.
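As a rough consistency check, the contact counts can be converted into running time. This rests on two assumptions, neither stated in the standard text above: one contact of the pinion corresponds to one pinion revolution, and the 2250 rpm quoted above is the pinion speed. `run_hours` is a hypothetical helper.

```python
def run_hours(contacts, pinion_rpm):
    """Running time in hours, assuming one pinion contact per pinion revolution."""
    minutes = contacts / pinion_rpm
    return minutes / 60.0


print(round(run_hours(2.1e6, 2250), 2))   # one load stage: 15.56
print(round(run_hours(1.3e5, 2250), 2))   # run-in phase: 0.96
```

Under these assumptions, a load stage lasts about 15,6 h. This is of the same order as the 58230 one-second samples (about 16,2 h) recorded per load stage in Section 2.6; the difference is presumably the warm-up and shutdown periods.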
In (4), the tooth profile was measured using a Taylor Hobson stylus instrument mounted on a fixture (see Figure 4), so that the gear can remain in place. The instrument measures the profile from the root to the tip of the tooth.
Figure 3. FZG test rig layout (4)
Figure 4. Gear tooth profile measurement (18)
Table 1. FZG test rig technical specifications (17)
Axle distance a: 91,5 mm
Module mₙ: 2 to 5 mm
Face width b: 10 to 40 mm
Load torque T: 0 to 800 Nm
Rotation speed n: 10 to 3000 min⁻¹
Table 2. Test loads
LS | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Torque, Nm | 35,3 | 60,8 | 94,1 | 135,3 | 183,4 | 239,3 | 302,0 | 372,7
2.5.2 Sensors
The FZG test rig available at KTH is equipped with several sensors that measure the following eight parameters:
• Motor rotation speed
• Input torque
• Output torque
• Temperature in slave gearbox
• Temperature in test gearbox
• Temperature at coolant injection
• Ambient temperature
• Flow rate of coolant
The sensor positions in this pitting setup are the same as in the gear efficiency setup; these locations are shown in Figure 5. Rotation speed and torque are measured at the motor. The available documentation is limited, but it is known that the temperature is regulated and measured using a proportional-integral-derivative (PID) controller. The controller uses a feedback loop to either increase or decrease the temperature depending on the measurement of the previous sample.
Figure 5. FZG test rig sensor locations (17)
2.6 The used data
The data used in this project are all stored as .mat files. Each test, for example PMGR29, has its own folder, in which each LS is stored as a separate file. Each LS file contains all measurements and a separate vector of scale factors, which are multiplied with the corresponding signals to obtain the real measurement values. To make the files easier to work with, all LSs were merged into one 3D array. For each load stage, the array is of size 58230 x 8: the former number is the number of samples, recorded once per second, and the latter is the number of measured quantities. The signals are recorded as 0-10 V signals at a sampling rate of one sample per second and are later multiplied by the scaling factors to obtain the real values.
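The scaling and merging steps can be sketched in plain Python. In Python the .mat files would usually be read with `scipy.io.loadmat`, but the variable names inside the files are not given here, so loading is left out. The helper names `to_physical` and `stack_load_stages` are hypothetical, and the toy matrices use two signals instead of eight.

```python
def to_physical(raw, scales):
    """Convert a samples x signals matrix of 0-10 V readings to physical values
    by multiplying each signal column by its scale factor."""
    return [[v * s for v, s in zip(row, scales)] for row in raw]


def stack_load_stages(stages):
    """Merge per-LS matrices (samples x signals) into a samples x signals x LS array."""
    n_samples, n_signals = len(stages[0]), len(stages[0][0])
    if any(len(s) != n_samples or len(s[0]) != n_signals for s in stages):
        raise ValueError("all load stages must have the same shape")
    return [[[stage[i][j] for stage in stages] for j in range(n_signals)]
            for i in range(n_samples)]


# Toy example: two load stages, two samples, two signals
ls3 = to_physical([[1.0, 2.0], [3.0, 4.0]], [10.0, 0.5])
ls4 = to_physical([[5.0, 6.0], [7.0, 8.0]], [10.0, 0.5])
data = stack_load_stages([ls3, ls4])   # data[sample][signal][load stage]
```

The shape check reflects the requirement that every load stage has the same number of samples before stacking. This is also why shorter or 2 Hz records have to be adjusted first.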
3 IMPLEMENTATION
3.1 First view on data
To understand how the given measurements can be used for predictive maintenance, it is useful to look at them in graph form. Of all the parameters, the rotation speed, the input torque and the temperature in the test gearbox are evaluated further. The output torque is directly related to the input torque and shows no other difference, and none of the other parameters show any trend. To illustrate the measurements, the temperature, speed and torque figures from the WGR34 test are presented in this chapter, as this test has the most sub-tests and shows the most indicators of defects. The figures of the remaining tests can be found in Appendix A, as they would take up a lot of space in the report.
In total, four kinds of gears are tested. The powder metal gears, with ground and super-finished surfaces, are further called PMGR and PMSF. The wrought steel gears, also with ground and super-finished surfaces, are called WGR and WSF. Gears of each kind are tested twice, resulting in eight tests. Each test has a unique ID number to differentiate it from the other tests; for example, WGR33 and WGR34 are separate tests. The data of each gear test are plotted in MATLAB.
Rotation speed: The WGR33 run-in phase has some irregularities. After that, nothing stands out visually, and the fluctuations stay between 1495 and 1505 rpm during all sub-tests. In test WGR34, however, the situation is different: there are larger periodic irregularities every 500 seconds (around 8 minutes) during the run-in sub-test. Sub-tests LS3 to LS6 are stable, with an amplitude similar to that in WGR33, but from LS7 until LS10 more fluctuations appear, with periodic, striking spikes. When LS10 is repeated, the oscillations stay constant in the range from 1490 to 1510 rpm. Interestingly, from LS10v9 onward the amplitude decreases to the level seen in LS3.
The WSF35 test starts with long-period fluctuations. After that nothing notable happens, and the fluctuations, including periodic spikes, stay between 1490 and 1510 rpm. The same trend is seen in the WSF38 test.
In the PMGR29 test, the run-in phase is relatively stable compared to the wrought steel gear tests, with a very narrow amplitude between 1497 and 1503 rpm. In the rest of the sub-tests the amplitude is stable, without periodic spikes, between 1495 and 1505 rpm. A similar trend is seen in the PMGR30 test. However, in the LS10 sub-test, after around 9,7 hours the oscillations start to increase significantly, with some outstanding peaks such as 1460 rpm around 12,5 hours after the start. At the end of this sub-test the amplitude is between 1470 and 1520 rpm. This happened because the gear teeth broke off and the transmission failed.
The last two tests, with powder metal super-finished gears (PMSF31 and PMSF32), show the same trend as the ground gears, except for the episode with the broken teeth.
Input torque: In WGR33, the run-in phase shows long-period fluctuations, similar to the speed. From LS3 to LS6, the torque fluctuates between around 75 and 190 Nm. The fluctuations increase from LS7, ranging from 100 to 240 Nm, and finally top out at LS10v3 with a range from 120 to 300 Nm; this trend remains until the last run, LS10v10. In test WGR34, however, the observations are quite different. The periods of fluctuation during the run-in phase are shorter and the torque amplitudes are much higher: from LS3 to LS5 they are between 20 and 240 Nm, and a few peaks even fall below zero. In the following sub-tests the lower margin remains similar, but the upper margin increases steadily up to 440 Nm in the LS10 load case. The amplitude then remains the same until LS10v8. After that, the input torque amplitude decreases radically. In LS10v9 it is between around 110 and 280 Nm, with periodic peaks up to 300 Nm, and this trend remains until LS10v11.
In WSF35, the torque fluctuates periodically from -10 to 440 Nm during the run-in phase. Later the trend remains similar (from -50 to 440 Nm), and the upper margin gradually increases to 500 Nm during the test. A similar trend is seen in the WSF38 test. However, LS8 contains one outstanding low value of -513 Nm, which is most probably a random error, as all other measurements fit within their range.
The PMGR29 test shows similar trends: a run-in phase with periodic fluctuations, which later gradually increase. The amplitude grows from between 80 and 170 Nm in LS3 to between 130 and 280 Nm in LS9, with some periodic peaks up to 310 Nm. The same applies to test PMGR30. In its LS10, however, the gear teeth broke off, as mentioned in the speed paragraph. The motor was disrupted and worked at full load, which is easily noticeable in the last figure.
In the PMSF31 test the torque increase is also noticeable, from an amplitude between 80 and 180 Nm in LS3 to one between 120 and 240 Nm in LS8. Throughout all sub-tests there are also periodic peaks, whose values usually lie 20 to 30 Nm beyond these margins in both directions. In LS9, around 10,5 h after the beginning of this run, the torque started to increase faster than before, reaching an amplitude between 110 and 320 Nm two hours later.
The last test, PMSF32, runs smoothly, with a run-in phase similar to that of PMSF31. After that, the torque amplitude increases from between 70 and 180 Nm in LS3 to between 130 and 270 Nm at the end of the test. This test also has some periodic peaks, approximately 30 Nm beyond the amplitude margins mentioned above.
Temperature in test gearbox: All tests start with larger fluctuations, which can be explained by the heating of the transmission oil; it takes time to stabilize it at 80 °C. The subsequent temperature fluctuations occur because the heat from the gear contact raises the temperature. The PID controller then has to shut down the heater, and when the temperature falls below 80 °C, it switches the heater on again. In the WGR33 test, the temperature fluctuations increase until LS10, when the temperature rises to 84 °C. A similar trend remains for the rest of the measurements until pitting is detected. In its first 4 hours, LS10 has some peaks that stand out. Meanwhile, in the WGR34 test the fluctuations around the nominal temperature of 80 °C increase from LS3 to LS7 and then decrease in the following two runs. After that, the temperature increases parabolically, and this trend remains until the end of the test.
Both WSF35 and WSF38 show a similar trend, with fluctuations increasing until LS7 and then decreasing. In LS10, however, starting 4 hours after the beginning of the run, the temperature increases steadily, reaching 81 °C and 82 °C at the end of the run in WSF35 and WSF38, respectively.
The powder metal gears show the same trend: in the first four sub-tests the fluctuations increase, then decrease, and finally the temperature starts to increase. In PMGR29, LS10 also shows some long-period waviness. In the PMGR30 test, LS10 shows a dramatic temperature increase, from 89 °C at the start to 94 °C after 10 hours. The test rig was then turned off for around 20 minutes, after which the temperature stabilized around 80 °C. The PMSF31 and PMSF32 tests show trends and results similar to PMGR29. However, the temperature in the last load stage of both tests increases more: up to 92 °C in the former and 87,5 °C in the latter.
Figure 6. Rotation speed in slave gearbox
Figure 7. Temperature in test gearbox
Figure 8. Input torque in test gearbox.
3.2 Statistical analysis of the data
A statistical analysis is conducted to better understand the results and to provide a reference for comparison with the machine learning results. The parameters used in this analysis are the motor rotation speed, the input torque and the test gearbox temperature, for the reasons given in the previous sub-chapter. The features selected for the analysis are root mean square (RMS), standard deviation (STD), crest factor (CF) and kurtosis. RMS gives a general picture, minimizing the contribution of noise, and points out transitions in speed or torque during the test. STD describes how far the measurements spread around the mean value of the whole load stage. CF is useful for detecting early local damage on a tooth, as it highlights periodic peaks. Kurtosis indicates the severity of the damage, for example when its value decreases.
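The four features can be computed for one load stage with a minimal Python sketch, assuming common definitions:

* RMS of the raw signal.
* STD in its population form, dividing by N (MATLAB's `std` divides by N-1 by default).
* CF as the peak absolute value divided by the RMS.
* Kurtosis as in Eq. (4).

The helper name `condition_indicators` is hypothetical.

```python
import math


def condition_indicators(x):
    """RMS, STD, crest factor and kurtosis of one load stage record."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    second = sum((v - mean) ** 2 for v in x)
    std = math.sqrt(second / n)                      # population form (divide by N)
    crest = max(abs(v) for v in x) / rms             # peak value over RMS
    kurt = n * sum((v - mean) ** 4 for v in x) / second ** 2   # Eq. (4)
    return {"RMS": rms, "STD": std, "CF": crest, "kurtosis": kurt}


# A temperature-like record with a small ripple around 80 degC
ci = condition_indicators([80.0, 80.3, 79.7, 80.0])
```

For a signal with a large offset and a small ripple, such as the gearbox temperature, the CF stays only slightly above 1. This is consistent with the small CF values reported for temperature in Chapter 4.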
Each of these parameters is calculated for every LS, so that trends can be seen more clearly where the raw data are ambiguous. The run-in phase is neglected, because it is shorter and does not describe regular gear behavior. Some temperature measurements have to be filtered out. Every time the next load is applied, the temperature drops during the shutdown and then shoots up at the beginning of the run until the PID controller reaches the desired equilibrium at 80 °C. The same applies to the end of each sub-test, as the data also include the period when the test rig is shut down and its oil cools down from the last temperature.
At the beginning and end of each LS, all parameters need time to reach steady state, or to decrease when the test rig is shut down. Data are also recorded during this time. The first 3500 measurement samples, i.e. 3500 seconds, are therefore neglected, and at the end the last 2030 samples are neglected. As a result, 55000 samples, or a 15,28 h long sub-test, are used for the analysis. In the PMGR29 test, the LS8 temperature reaches steady state later than 1000 seconds. A longer span of 2000 samples was therefore neglected from the beginning of all load stages in this test, so that each sub-test is evaluated equally. WGR33 LS5 and WGR34 LS10v4, LS10v5 and LS10v7 were recorded at a sampling rate of 2 Hz, so by default their data sets were twice as long as the others. Sub-test LS5 in WGR33 was also interrupted in the middle. The resulting temperature drop seen in the plot was replaced with the overall mean temperature of this load stage, so that it does not show a false peak in the measurements. LS3 in the WSF38 test was run for a shorter time (47000 seconds). To make it compatible with the other load stages for the statistical analysis and the subsequent machine learning, it was resampled to the same length as the other load stages. However, a shorter range was used for the calculations: after resampling, the decrease of all parameters started earlier. The reason is that the transition from transmission operation to the complete end of the run (transmission shut down and measurement stopped) becomes longer due to the changed scale.
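The three preprocessing operations above can be sketched in Python: trimming warm-up and shutdown samples, reducing a 2 Hz record to 1 Hz, and stretching a shorter record to a common length. The thesis does not state which decimation or resampling method was used. Keeping every second sample and linear interpolation are assumptions, and the helper names are hypothetical.

```python
def trim(x, n_start, n_end):
    """Drop warm-up samples at the start and shutdown samples at the end."""
    return x[n_start:len(x) - n_end]


def decimate_by_2(x):
    """Keep every second sample, turning a 2 Hz record into a 1 Hz record."""
    return x[::2]


def resample_linear(x, new_len):
    """Stretch or shrink x to new_len points by linear interpolation."""
    if len(x) < 2 or new_len < 2:
        return [x[0]] * new_len
    step = (len(x) - 1) / (new_len - 1)
    out = []
    for k in range(new_len):
        pos = k * step
        i = min(int(pos), len(x) - 2)        # left neighbour, kept inside the record
        frac = pos - i
        out.append(x[i] * (1.0 - frac) + x[i + 1] * frac)
    return out
```

Stretching a record in time also stretches its shutdown transient. This is consistent with the observation above that the decrease at the end of the resampled WSF38 LS3 starts earlier, so a shorter range has to be used.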
3.3 Machine learning method selection
The same parameters and features are used for machine learning, so that both sets of results can be compared. For training, the Machine Learning and Deep Learning Toolboxes in MATLAB were selected. Supervised learning is used in this thesis, as some trends are already present in the raw data, especially in the temperature data. Of the classification and regression strategies, regression is selected, because the measurements are time-dependent continuous values and can be used to learn where the trend is heading.
The data are passed to the application when a new session is started (see Figure 9). From the data set, predictors and responses are selected; in this case the predictors are the measurement signals and the responses are the data labels. The next step is the choice of validation scheme, which is necessary to protect the model from overfitting. Cross-validation partitions the data set into a number of folds, selected with a slider control. This method gives a good estimate of the predictive accuracy of the final model trained on the full data set. It requires multiple fits but makes efficient use of all the data, so it works well for small data sets. Holdout validation uses a percentage of the data, selected with a slider control, as a validation set. The app trains a model on the training set and assesses its performance on the validation set. Because the model used for validation is based on only a portion of the data, holdout validation is appropriate only for large data sets. The final model is trained using the full data set (19). Here, holdout validation is used because the data set is large. The division between training and test data is 70:30, as recommended in (20).
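A 70:30 holdout split can be sketched in a few lines of Python. Shuffling the indices with a fixed seed is an assumption, since the exact partitioning used by the Regression Learner app is not described here. `holdout_split` is a hypothetical helper.

```python
import random


def holdout_split(samples, labels, train_fraction=0.7, seed=0):
    """Shuffle indices and split them into training and validation parts."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_train = round(train_fraction * len(idx))
    train, test = idx[:n_train], idx[n_train:]
    return ([samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in test], [labels[i] for i in test])


X_train, y_train, X_val, y_val = holdout_split(list(range(100)), [i / 99 for i in range(100)])
```

Only one model fit is needed per configuration, which is why holdout validation is cheaper than k-fold cross-validation on a large data set.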
After that, the interface offers multiple model types, such as linear regression models, regression trees, support vector machines, Gaussian process regression models and ensembles of trees. Of these, the decision tree and the support vector machine are selected. They were the most often mentioned in the literature review and the most investigated in connection with damage detection in rotating machines. The options include fine, medium and coarse decision trees and linear, quadratic, cubic and Gaussian support vector machines. The tree variants differ in minimum leaf size, which varies from 4 to 36, and the SVM variants differ in kernel function type. A kernel function enables the SVM to operate in a high-dimensional, implicit feature space without computing the coordinates of the data in that space, simply by computing the inner products between the data points in the feature space. Of the decision tree variants, the coarse one is selected, as it is the simplest. Of the SVM variants, the cubic one is selected to see whether a more sophisticated method than a decision tree brings any gain. SVM parameters such as kernel scale, box constraint and epsilon are set to automatic.
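The kernel idea can be made concrete for the cubic case. This assumes that the cubic SVM uses the polynomial kernel K(x, z) = (1 + x·z)³ and ignores kernel scaling. For a scalar input, this kernel equals an ordinary inner product in an explicit four-dimensional feature space, φ(x) = (1, √3·x, √3·x², x³), but the kernel never has to build that space. The helper names are hypothetical.

```python
import math


def cubic_kernel(x, z):
    """Polynomial kernel of order 3, (1 + x.z)^3, computed without any feature map."""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** 3


def phi_1d(x):
    """Explicit feature map for a scalar input, chosen so that phi(x).phi(z) = (1 + x z)^3."""
    s3 = math.sqrt(3.0)
    return [1.0, s3 * x, s3 * x * x, x ** 3]


implicit = cubic_kernel([2.0], [3.0])                               # (1 + 6)^3
explicit = sum(a * b for a, b in zip(phi_1d(2.0), phi_1d(3.0)))     # 1 + 18 + 108 + 216
```

Both values equal 343. With several predictors, the explicit feature space grows quickly, while the kernel evaluation stays a single inner product followed by a power.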
Figure 9. Data selection for training session
3.4 Data preparation for machine learning
The total length of a test is very long; for example, PMGR29 contains 7 used LSs, each 58230 samples long, which gives 407610 samples. Training with such a large amount of data on a budget notebook computer with a 1,8 GHz processor and 8 GB of random access memory would be very time- and energy-consuming. To reduce the load on the computer, the length of the data is therefore reduced 100 times. First, as in the statistical analysis, the run-in phase is neglected. The beginning and end of each LS are also neglected, so as not to spoil the trend in the remaining data. All selected ranges of samples are merged into one vector representing the whole test. After the beginning and end are removed, 52500 samples remain in each LS, and they are sliced into 525 parts of 100 samples each. The previously mentioned statistical parameters are then calculated for each part, and finally the new values from each LS are merged together. A window of 100 samples was chosen because a round number makes it easier to compare the raw and the reduced data.
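The 100-fold reduction can be sketched as a function that applies a feature to consecutive, non-overlapping windows of 100 samples. Non-overlapping windows are an assumption consistent with 52500 samples giving 525 parts. The helper names `reduce_signal` and `rms` are hypothetical.

```python
import math


def rms(chunk):
    """Root mean square of one window."""
    return math.sqrt(sum(v * v for v in chunk) / len(chunk))


def reduce_signal(x, feature=rms, window=100):
    """Apply `feature` to consecutive non-overlapping windows of `window` samples.
    Samples left over after the last full window are dropped."""
    n_windows = len(x) // window
    return [feature(x[k * window:(k + 1) * window]) for k in range(n_windows)]


reduced = reduce_signal([2.0] * 52500)   # 525 RMS values, one per 100-sample part
```

The same function can be called with STD, CF or kurtosis as the `feature` argument to produce the second, richer input set.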
To see whether it makes sense to make the machine learning model more sophisticated, the inputs are passed to it in two versions: first with RMS only, and second with RMS, STD, CF and kurtosis. The Regression Learner requires labeled data, so an additional vector with values increasing from 0 to 1 over the length of the data set is added to each data set of measurements. These labels indicate how far into the test each measurement is. To make the measurements comparable, the scaling is neglected, so that they are all signals in the range from 0 to 10 V again. However, the Regression Learner app normalizes the data to a scale between 0 and 1 over the whole range before they are used for learning.
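The labeling and normalization steps can be sketched as follows. Two points are assumptions: that the labels run exactly from 0 at the first sample to 1 at the last, and that the normalization is a min-max scaling. The internal scaling of the Regression Learner app may differ. The helper names are hypothetical.

```python
def progress_labels(n):
    """Response vector rising linearly from 0 (start of the test) to 1 (end of the test)."""
    if n == 1:
        return [0.0]
    return [i / (n - 1) for i in range(n)]


def min_max(x):
    """Rescale a signal to the range 0 to 1 (all zeros if the signal is constant)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0] * len(x)
    return [(v - lo) / (hi - lo) for v in x]


labels = progress_labels(5)            # one label per reduced sample
scaled = min_max([2.0, 4.0, 6.0])      # predictor rescaled to 0..1
```

With these labels, a regression model that predicts a value near 1 is effectively saying that the measurement looks like the end of the test, where the damage is most developed.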
4 RESULTS
4.1 Statistical analysis results
The results of the statistical analysis are plotted in Figures 10 to 21 for each of the selected parameters and condition indicators (CIs), namely the CF, kurtosis, RMS and STD of speed, test gearbox oil temperature and input torque. An additional cropped view of the results from the first to the eighth sub-test is included, as some results lie very close together in a narrow range and are easier to see this way.
Regarding speed, the CF remains relatively stable in all tests during all LSs. Both WSF gears stand above the other tests, perhaps because of the noisier signal also noticed in Chapter 3.1. The CF increases slightly in the WGR34 test after LS10, when micropitting and mild damage only start to form. The last value of PMGR30 is the highest, which can be explained by the massive gear failure described earlier.
Kurtoses in general also do not show significant changes: for all measurements they lie between around 2,75 and 2,85 at the beginning. Relatively larger changes occur in the WGR and WSF tests, while all powder metal gears show much smoother behavior during all LSs. RMS values do not vary much. They are slightly lower in the WSF35 test, and the WGR34 values later drop even lower until LS10v6 before increasing again. A similar trend is seen in the STD graph, where all gears except WSF35, WSF38 and WGR34 show increasing values during the test, although the values of both WSF gears tend to decrease after LS7.
For temperature, the results stand out more. In all cases the CF gradually increases from 1,004 at LS3 to 1,008 at LS7, after which it decreases. Later it increases rapidly in all tests except WGR, which can be explained by the significant temperature increase once macropitting starts to propagate. The WGR tests show a similar trend, but the CF increase during the repeated LS10 tests is smaller than in the last LSs of the other tests. Later it remains relatively stable, as the temperature during these tests is already higher than the steady temperature of 80 °C and does not change much in either direction.
Before pitting appears, the kurtoses tend to vary between 2,2 and 2,7. Unlike CF, they can go up or down when pitting starts to propagate. For example, both WSF kurtoses tend to go down from LS8, while in PMGR30 the kurtosis increases sharply to 6 and then rapidly decreases to 1,4. A similar scenario occurs in both WGR tests, where the kurtosis reaches a maximum at LS10v2 in WGR34 and at LS10v3 in WGR33. The kurtoses then decrease to 4 at LS10v7, after which they increase once again. In all tests, the RMS tends to decrease slightly after LS4, becomes stable at LS5 and increases significantly at the last LS of each test, except in the WGR tests. In those tests the RMS value increases, but not as much as in the other tests. It remains relatively stable during the remaining LSs, although still higher than the nominal steady-state temperature.
The CF values of torque show a picture very similar to the speed case. Here, however, in addition to the outstanding WSF tests, the values in the WGR34 test are also higher than in the other tests. In most tests the CF starts at around 1,7 and decreases to 1,6 before pitting. The WGR values start at 2,2 and increase slightly during the whole test, up to 2,4. Both WSF gears start with a CF of around 3, which decreases significantly to 2,4 at the last LS. In contrast, the kurtoses show less significant changes, but here too the WSF and WGR34 tests are set apart from the others: they vary between 2,1 and 2,5, while the other tests tend to be around 3. All tests show a clear trend in the RMS value, which increases until the last LS. In the WGR case this is LS10, where the same torque was applied multiple times until severe pitting damage was reached. The only feature that stands out again is the gap between the WSF and the other gears, which was also noticed in the raw measurements. The same can be seen in the STD graph. There, the other tests lie between 20 and 30 Nm, while WSF35 and WSF38 increase from 120 and 110 Nm to 140 and 145 Nm at LS7 and later decrease to very close to their starting values. As in the CF plot, the WGR34 test lies between the WSF tests and the remaining tests. Its STD starts at 45 Nm, peaks at 100 Nm at LS10v4 and remains there for the rest of the test.
Figure 10. Speed crest factors
Figure 11. Speed kurtoses
Figure 12. Speed RMS
Figure 13. Speed STD
Figure 14. Temperature CF
Figure 15. Temperature kurtoses
Figure 16. Temperature RMS
Figure 17. Temperature STD
Figure 18. Torque CF
Figure 19. Torque kurtoses
Figure 20. Torque RMS
Figure 21. Torque STD
4.2 Machine learning trained models
The results of the ML application are presented in figures 22 to 37. Each figure contains four graphs with results from models trained with DT and with cubic SVM, each trained either with RMS only or with all statistical parameters (RMS, CF, STD and kurtosis). The blue line represents the true values, the yellow dots the predicted values and the red lines the error, in other words the distance between the true and the predicted value. To make the results easier to follow, markers from the first to the last LS are added to the plots. The Root Mean Square Error (RMSE), Coefficient of Determination (R2) and Mean Absolute Error (MAE) are calculated for each test and compared in bar charts. As both the WGR33 and WGR34 tests were longer and the damage propagated in those gears much more slowly than in the other tests, additional models were trained on data truncated at successive LSes, starting from LS10v2 in WGR33 and LS10v3 in WGR34, to see the difference during damage propagation.
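The three performance parameters used in the bar charts follow standard definitions. A minimal plain-Python sketch, assuming the true and predicted responses of one test are available as two equally long lists (the function name is illustrative, not taken from this work):

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R2) for paired true and predicted responses."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else math.nan
    return rmse, mae, r2
```

For example, true responses [0.0, 0.5, 1.0] and predictions [0.1, 0.5, 0.7] give RMSE ≈ 0.183, MAE ≈ 0.133 and R2 = 0.8. R2 is the fraction of the variance of the true response explained by the model. It is not a percentage accuracy, and it becomes negative when the predictions are worse than simply predicting the mean.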
The PMGR29 models trained with RMS only give a stable prediction before the severe damage at LS9. The prediction becomes slightly more accurate at LS8, where the error decreases, but in the second half of LS8 the models tend to underestimate the response, as the predicted values lie below the blue line before LS9. During the last LS, SVM and DT behave very similarly, predicting the condition more accurately than during the previous LSes. The results with all inputs are different. In the DT case the predictions follow something like a step trend, perhaps because the torque is increased after every LS. However, the model also makes some outlying predictions in both the positive and negative directions: at LS4 some predictions reach 0,5, while the true values are between 0,15 and 0,25, and at LS7, where the true response is between 0,55 and 0,7, the predictions go down to 0,2. The prediction at the final LS is nearly identical to that of the model trained only with RMS. The cubic SVM model trained with all inputs shows smaller under- and overestimations of the true response, but its prediction at the final LS is vaguer than in the other PMGR results. Notably, in the models trained with all inputs the prediction error decreases at LS5 and LS6, whereas in the models trained only with RMS it is similar to the other LSes. As discussed in chapter 3.1, the temperature response changed more intensely in those LSes; perhaps the larger difference between adjacent measurements makes them easier for the models to distinguish.
In the PMGR30 test, where the gear was totally damaged at the last LS (multiple teeth were broken off), the results are quite ambiguous. In the models trained with RMS only, the prediction up to LS9 is relatively even, yet some random values stand out periodically, as in the PMGR29 test, again with DT. After LS8 both models show a similar trend: the damage starts to propagate significantly until the gear is destroyed in the middle of LS10. Interestingly, both ML methods strongly underestimate the real response in this situation: the true value is 0,9, but the predicted value can be as low as 0,1. The DT model trained with all inputs behaves much like the corresponding model in the PMGR29 test, with some outlying predictions. Like both simpler models, it underestimates the total failure of the gear, although the error is much lower: it predicts a signal response of 0,5 when the true response is 0,95. This means that at failure the model with all inputs predicts with roughly two times higher precision than the models trained only with RMS. However, the SVM model trained with all inputs is less satisfactory: it makes two severe random errors when the gear was destroyed, predicting signal responses of 25 and 22, which is physically impossible, so this model is not usable.
The PMSF31 models trained only with RMS behave very similarly to each other; the major difference is at the last LS, when the gear is rapidly damaged. The DT model predicts it nearly perfectly, whereas the SVM model is less accurate, with even one random underestimation (true response 0,95, predicted 0,8). This illustrates the difference between the two methods well: there was an obvious temperature increase, hence a larger difference between adjacent data points, which makes it easier for DT to split the data with certainty. SVM also benefits from a clearer data trend, but here it cannot be as accurate as DT, perhaps because SVM regression fits a smooth function (a hyperplane in the higher-dimensional space into which the cubic kernel projects the data), which may be too rigid to follow such an abrupt change. When both models are trained with all available inputs, the prediction error generally decreases. At LS9 the DT model gives exactly the same prediction as the simpler model, and the SVM model is quite stable during the whole test, but at the beginning of LS6 it makes one random error, predicting 1,7, which could cause a false alarm. At the end of LS9, when the damage has propagated, its prediction is less accurate than that of the simpler SVM model.
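The argument that a larger jump between adjacent data points makes a DT more certain can be illustrated with how a regression tree chooses a split. The sketch below is a simplified, CART-style single split that minimises the summed squared error; it is not the implementation used to train the models in this work, and the temperature and response values in the example are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (regression-tree impurity)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, impurity_reduction) of the best single split on x."""
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    total = sse(ys)
    best = (None, 0.0)
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate identical feature values
        threshold = (xs[i] + xs[i - 1]) / 2
        gain = total - sse(ys[:i]) - sse(ys[i:])
        if gain > best[1]:
            best = (threshold, gain)
    return best

# Hypothetical example: a clear temperature jump coincides with damage.
temps = [70.1, 70.3, 70.2, 70.4, 78.0, 78.3]
response = [0.2, 0.2, 0.25, 0.2, 0.95, 0.95]
threshold, gain = best_split(temps, response)
```

Here the split lands in the temperature jump (threshold ≈ 74.2), and this single split removes almost all of the response variance. A tree can therefore follow such a step almost exactly, whereas a smooth kernel model has to bend its whole fitted function to do so.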
The results from the PMSF32 test trained with RMS are similar to the PMSF31 results, although they are less symmetrical at LS6. The failure predictions are also similar to those of PMSF31. Unlike the PMSF31 DT model with all inputs, here the errors are larger from LS6 to LS8, but the failure prediction is as accurate as in PMSF31. The SVM model shows similar errors, yet its predictions follow a characteristic curved shape through the test, probably because a cubic kernel function is used.
The WSF35 test was already described in chapter 3.1 as very noisy: the torque signal had a very wide bandwidth, which made it difficult to notice any trends. When the RMS is passed to the DT and SVM, both models have relatively large errors compared with the powder metal gear tests. The errors become lower in both cases at LS6 and LS7, although this is less noticeable in the DT model. The SVM model starts to underestimate the signal response earlier than the DT model: the former at the end of LS7, the latter in the middle of LS9. The DT model also has more periodic predictions with large errors, especially at the beginning and the end. Nevertheless, in the DT model it is easier to notice where the gear failure starts, as the true and predicted values match nearly perfectly, as in the previously discussed results. Both models trained with all inputs show a similar picture, except that the DT model has lower errors in the middle and larger errors at the beginning and at the end, just before the failure. The SVM model looks relatively stable, except for a random error in LS4, where the predicted response of 3,5 is far outside the normal range of predicted values.
The same trend continues in the WSF38 results trained with RMS: the DT predictions have more densely distributed errors during the test, whereas in the SVM model the errors vary less and on a smaller scale. The failure is predicted more accurately by the DT model than by the SVM model, whose prediction follows a rather parabolic trend that gives no clear indication that something has been damaged rapidly. The models trained with all inputs are also similar to the WSF35 results, yet the DT model makes three significant underestimations, predicting 0,6 when the true response is 0,9. The SVM model makes some negative response estimations, for example -0,2 at the beginning of LS7. Compared to the simpler SVM model, its predictions are less smooth, and at the last LS, when failure occurs, the prediction is even vaguer.
The final gear set is the WGR set. During the test these gears did not show significant damage, so the highest load was applied repeatedly until the damage finally started to propagate. The WGR33 models were trained with data in the following configurations: up to LS10v2, LS10v4, LS10v6, LS10v8 and LS10v10, whereas the WGR34 models were trained with data up to LS10v3, LS10v5, LS10v7, LS10v9 and LS10v11.
The WGR33 models with measurements up to LS10v2 show quite a smooth and symmetric distribution of predicted values around the line of true values, especially the SVM models, trained with both RMS only and all inputs. LS10 and LS10v2 look very similar in all results, showing that something has happened in the gearbox behavior. The dataset up to LS10v4 shows different behavior: while the predictions until LS10 are very similar to those of the LS10v2 model, all models strongly overreact at LS10, followed by similar but smaller patterns at the next LSes. The predicted response at LS10 jumps straight to 1, while in reality it is just over 0,6. While both DT models show a similar shape, the SVM model trained with RMS only gives a smoother prediction transition than the one trained with all inputs. In the next step, up to LS10v6, the predictions up to LS10 remain similar, and the previously mentioned overreaction reduces to 0,85 in the SVM and 0,9 in the DT models. Later, starting from LS10v5, the prediction error increases again in all models, but this time in the opposite direction. The models with data up to LS10v8 again give similar predictions until LS10, after which they overestimate the response again, but this time, especially in the SVM models, the area between the predicted values and the true value line increases and the predicted responses form a parabolic shape. Finally, in the models with all LSes, significant damage can be detected only in the DT models, where the predicted and true responses differ much less than in the SVM models. The SVM models even tend to overestimate the true value at the end.
Unlike all WGR33 models, the WGR34 models with measurements up to LS10v3 show weaker symmetry, especially when trained with RMS only. Starting from LS10, both DT models tend to be more accurate than the SVM models, but at the beginning of LS10v3 all models show a sharp underestimation. Here the SVM model with all inputs has the lowest error; however, in the preceding LS10v2 it makes a false overestimation of the response, close to 1,4. Next, in the models with data up to LS10v5, the SVM model with all inputs makes an unreasonable underestimation down to -1,6 at LS8, although in the previous step this LS was predicted well. The rest of the models clearly become less accurate from LS10 onwards compared to the previous step, and all of them underestimate rapidly at the start of LS10v3 and LS10v5. In the models with data up to LS10v7 the same trend continues, but this time in the last two LSes the DT model predicts more accurately than the SVM models. Moreover, the SVM model with all inputs again makes an odd underestimation, this time at a different location, in the middle of LS10v2. In the next step, up to LS10v9, the same underestimation appears at the same location. At LS10v8 and LS10v9 the DT model starts to show larger errors, but still not as large as the SVM models. Finally, when the full datasets are used for training, the error at LS10v10 and LS10v11 increases significantly, especially in the models trained with SVM.
Figure 22. PMGR29 ML results
Figure 23. PMGR30 ML results
Figure 24. PMSF31 ML results
Figure 25. PMSF32 ML results
Figure 26. WSF35 ML results
Figure 27. WSF38 ML results
Figure 28. WGR33 up to LS10v2 ML results
Figure 29. WGR33 up to LS10v4 ML results
Figure 30. WGR33 up to LS10v6 ML results
Figure 31. WGR33 up to LS10v8 ML results
Figure 32. WGR33 ML results
Figure 33. WGR34 up to LS10v3 ML results
Figure 34. WGR34 up to LS10v5 ML results
Figure 35. WGR34 up to LS10v7 ML results
Figure 36. WGR34 up to LS10v9 ML results
Figure 37. WGR34 ML results
Figures 38 to 43 present the performance parameters; the results of the repeated WGR tests are shown in separate figures. As the PMGR30 model trained with SVM and all inputs had the severe errors described above, it also gives large values in the bar charts and will not be discussed further. To make the charts easier to read, the y axis was limited so that the rest of the tests are comparable; this model is mentioned separately.
In the PM gear tests the RMSE is approximately the same, around 0,1. However, it clearly tends to decrease with more input parameters, and the model trained with all inputs and DT has the lowest error in all cases. A similar trend is visible in the WSF tests, yet the errors are around two times higher than in the PM cases, reaching 0,26 in the WSF38 test trained with RMS only and SVM. Perhaps this can be linked to the noisier signal response. Although the WGR tests give relatively similar errors, they differ in a different way: the error tends to be lower in the models trained with DT. The MAE chart shows a similar trend, although here the PMGR30 model mentioned before shows a relatively low absolute error compared to the other PMGR30 training combinations.
The R2 values are between 0,8 and 0,92 in the PM tests, meaning that the models explain 80 to 92% of the variance of the true response. The difference between the training combinations is not large, but DT with all inputs again shows a slight advantage. In the WSF tests the data vary more, and consequently the coefficients are lower: around 0,5 for WSF35 trained with RMS only and DT, WSF35 trained with all inputs and SVM, and WSF38 trained with RMS only and DT. Better results, with coefficients around 0,65, are obtained for WSF35 trained with all parameters and DT, and for WSF38 trained with all parameters with both DT and SVM. The WSF tests trained with RMS only and SVM give the worst results: 0,38 for WSF35 and 0,18 for WSF38. The WGR tests are not far behind the PM tests, with quite good R2 values from 0,7 to 0,86, even though these tests were also quite noisy. Here too the difference between the training methods remains the same: DT models give better results, especially in the WGR34 test.
The results of the additional WGR33 models show that the further the training data extend into the test, the larger the error. There is no significant difference between the models up to LS10v2 and up to LS10v4, but the error starts to increase slightly in the following models. The opposite trend occurs in the WGR34 test, where the error decreases from 0,155 for the model up to LS10v3 trained with RMS only and SVM to 0,08 for the model up to LS10v9 trained with all parameters and DT. The relation between training combinations remains similar to the WGR models trained on the entire test: the WGR33 models trained with RMS only and SVM tend to be less accurate than those trained with all inputs and DT. The only exception is WGR33 up to LS10v8 trained with RMS only and DT, which gives a lower error than the other training combinations. In the WGR34 tests the DT models return a lower error in all cases. The MAE chart shows a similar trend. The R2 values vary very little in both the WGR33 and WGR34 tests: in the former they decrease slightly from 0,9 to 0,8, and in the latter they increase from 0,72 to 0,91. The relation between the training combinations here is similar to the one discussed for the RMSE figure.
Figure 38. RMSE of main tests
Figure 39. MAE of main tests
Figure 40. R2 of main tests
Figure 41. RMSE of additional tests
Figure 42. MAE of additional tests
Figure 43. R2 of additional tests
4.3 Condition indicator calculations
To give the ML results a practical application, new condition indicators (CIs), namely the STD and the Signal to Noise Ratio (SNR), were calculated from the trained models. The STD is a simple way to detect changes in transmission behavior, whereas the SNR compares the desired signal to the background noise. It can be calculated with the following formula:
$$\mathrm{SNR} = \frac{\mu}{\sigma} \qquad (5)$$
where:
µ – mean value of the signal
σ – standard deviation of the signal
Figures 44 to 51 present the mean value, STD and SNR of each gear test for each machine learning methodology. These parameters are calculated as moving values over a 30-minute time window. This window was chosen during supervision of the tests, so that the operator does not need to be present at the test rig all the time; the indicators can be checked occasionally to decide whether the rig should be shut down.
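The 30-minute moving indicators can be sketched as a rolling window over the model output, with SNR computed according to equation (5). The sketch below assumes a constant sample period; the function name and the sample period are hypothetical, and the population STD is used within each window. It is an illustration, not the code used in this work.

```python
import math
from statistics import mean, pstdev

def rolling_indicators(predictions, sample_period_s, window_s=30 * 60):
    """Rolling mean, STD and SNR = mean / STD over a fixed time window.

    predictions: model outputs sampled at a constant period (assumption).
    Returns one (mean, std, snr) tuple per complete window.
    """
    n = int(window_s // sample_period_s)  # samples per window
    if n < 2:
        raise ValueError("window must contain at least two samples")
    out = []
    for end in range(n, len(predictions) + 1):
        window = predictions[end - n:end]
        mu = mean(window)
        sigma = pstdev(window)  # population STD of the window
        # SNR is undefined for a constant window; report it as NaN
        snr = mu / sigma if sigma > 0 else math.nan
        out.append((mu, sigma, snr))
    return out
```

With one prediction per minute (`sample_period_s=60`), each window holds 30 samples. A constant window gives σ = 0, so SNR is reported as NaN instead of dividing by zero.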
The PMSF gear tests, modeled with both DT and SVM, show a similar STD trend: it starts at around 0,05, increases up to LS4, then decreases until LS8 and behaves completely differently at LS9, when the damage starts to propagate. The only difference is between the input variants: the model trained with RMS only forms a curved arc, whereas the model with all inputs shows a significant STD decrease at LS6 in PMSF31 and at LS5 in PMSF32. The DT and SVM models give fundamentally different SNR results in terms of shape and absolute values, although they peak at approximately the same time. The SVM models show an increase in SNR in the middle of the test, but in the wider picture there is a bigger increase at the end. SVM-based SNR should therefore probably be evaluated over a longer perspective than that of the DT models, which show a sudden, severe increase.
In the PMGR tests the differences between input variants and between ML methods are not as large as in the PMSF tests, and the STD also forms an arc-like curve that returns to its initial value. After that the STD decreases nearly to zero, and in the PMGR30 test it increases significantly when severe damage occurred, with gear teeth torn off. The only exception is the model with all inputs and SVM, which does not show this increase and could therefore miss the failure. Regarding the SNR, in the PMGR29 test trained with SVM there is a difference between the variant with all inputs and the one with RMS only: the latter seems to be more sensitive, with a rapid peak close to the beginning of LS9, while the model with all inputs shows only a gradual increase. As with the PMSF gears, the SNR behaves similarly here: the response increases rapidly at LS9, and the values of the models trained with RMS only and with all inputs overlap, so there is no difference between them.
The WSF gear tests, in contrast to all PM gears, show a different response. Among the DT models there is a difference between the model trained with all inputs and the one trained with RMS only: the STD of the former forms a downward arc during the test, whereas that of the latter forms an upward arc. Interestingly, they end up at the same value before the last LS, then decrease nearly to zero and show only a tiny response when the damage starts to form. In the SVM models both input variants behave more similarly, especially in the WSF38 case. They start at approximately the same value, increase during LS7 and LS8, then decrease and finally increase once again in the last LS. The only difference is that the model with all inputs peaks three times, in the middle of LS5 and LS8 and at the end of the test, so it is more sensitive to noise. The SNR values of the DT models behave similarly to the previously discussed models, and the SVM models also peak in the last LS, yet in the WSF35 test the model trained with all inputs increases more rapidly from LS5 to LS7 than the model trained with RMS only.
The WGR tests turn out to be the most difficult to evaluate, as the damage cannot be attributed to one exact LS. The STD response in both tests is very similar for both input variants. In both the WGR33 DT and SVM models it forms an upward arc, then decreases back to the starting value of around 0,02 at LS10, followed by a steep increase up to 0,15 at LS10v2 and a decrease in both models at LS10v3. After that the response increases gradually in the DT model, whereas in the SVM model it remains relatively constant, even at the end, when more significant damage started to form. The WGR34 DT and SVM models form a very similar trend and shape: the STD increases until LS10, dips for two LSes, increases significantly at LS10v2 and then decreases, until at the last LS it rises to the highest point of the whole test. The SNR values of the WGR33 DT models respond only at the last LS, missing the mild damage that was forming during the previous LSes, while in the WGR34 test the first response comes at LS6. Meanwhile, the SVM models of both tests are noisier, especially in the WGR33 test, which is difficult to evaluate from LS10 onwards. In the WGR34 model trained with RMS only, the SNR tends to peak gradually around LS10v7 and LS10v8, while the model trained with all inputs forms an arc-like upward curve in the same region; both behave in the same way at the last LS.
Figure 44. PMSF indicators, using DT
Figure 45. PMSF indicators, using SVM
Figure 46. PMGR indicators, using DT
Figure 47. PMGR indicators, using SVM
Figure 48. WSF indicators, using DT
Figure 49. WSF indicators, using SVM
Figure 50. WGR indicators, using DT
Figure 51. WGR indicators, using SVM
5 DISCUSSION AND CONCLUSIONS
5.1 Discussion
In total, 32 ML models were created in this thesis, four training combinations for each test. In parallel, statistical parameters of each input parameter at each LS were calculated as a reference for validating the models. The statistical analysis shows that speed contributes least to condition indication, because the most important factors, CF and kurtosis, showed nearly no change during the test.
Temperature showed a mild increase of those factors in the middle of the test and a significant increase at the end. This trend may be connected to a more intense response of the PID controller; however, so far there is no certain explanation of why it happened. One possibility is mild wear of the asperities on the gear teeth, although such wear is more common on wrought steel surfaces, while the super-finished gears also showed this behavior. Temperature gave more useful feedback in the WGR gear tests, when the highest load was applied multiple times until noteworthy damage started to form.
Torque is measured indirectly and depends on the resistance in the gear contact, so it gave the most descriptive data of all measurements. As mentioned in Chapter 2, CF indicates local damage, and in most measurements it started to increase slightly from LS4. When it decreases, the damage becomes more evenly spread between the teeth, and the PM gears showed this kind of behavior. However, the wrought steel gears did not follow the same trend; the CF of the WSF gears even decreased constantly until the end of the test. The torque measurement is connected to the speed measurement, which is also reflected in the STD plots, where both show the same trend.
The ML models of the PM gear tests were simpler to evaluate than those of the wrought steel gear tests. This can be linked to the fact that the PM gears experienced severe damage in all cases, especially in the PMGR30 test, while in the wrought steel tests, especially WGR, many teeth remained undamaged even after the last repeated LS. The training combinations in the PM tests show no significant difference in accuracy; however, DT models give a more accurate response once the damage has started to propagate. It was often noticed that when the model with all inputs has a low error in general, a few random errors spoil the whole picture. This is one of the biggest drawbacks of passing more data to ML model training, although DT models very often still show good overall performance.
The WGR gear tests, when all their data were used for training, did not show large differences, although the DT models, trained with RMS only and with all parameters, gave slightly higher accuracy than the SVM models. When these tests were investigated further by creating models with data after every two LSes, there were no large differences between the combinations, but in the WGR33 case the most accurate prediction came from the model trained with data up to LS10v2, after which the predictions became less accurate. Perhaps this was the LS from which damage slowly started to form, as the temperature response also changed then, as seen in the statistical analysis. When the WGR34 test data were trained in the same way, the highest accuracy was in the model with data up to LS10v3, especially when trained with DT; later the accuracy decreased and then increased once again in the model with data up to LS10v7, when significant damage started to form. The WSF gear tests contain a lot of noise coming from the torque measurements, yet the damage at the last LS can be noticed more easily with DT, as in the case of the PM gears.
The new CIs extracted from the ML models can be useful from a practical perspective, as they show clear trends in the STD and SNR plots. In both PMGR tests the model trained with RMS gives a more stable trend than the model trained with all parameters. Regardless of whether the model is trained with DT or SVM, both show significant changes already at LS8; the SNR of the SVM model gives a more detailed response than that of the DT model, but it can sometimes overreact. The situation is very similar for both PMSF tests, where the STD plot indicates significant changes at LS9. In the WGR tests there is no large difference between any combination in the STD plots, but the SNR values calculated from the SVM models, especially in the WGR33 test, are hard to interpret, as no clear changes are visible; here the SNR values from the DT models give sparser but clearer information. In the WSF condition indicators it is more difficult to spot trends; perhaps the clearest are those extracted from the DT models, which show a clear decrease of STD in the middle of LS10.
The tooth profile measurements in (5) show that in the PMGR test significant damage starts to form at LS8, close to the tooth root; in the PMSF test this happens at LS9 at the same location; in the WSF tests only mild damage formed at LS10; and in the WGR test early damage at the tooth root starts to form at LS9, micropitting at the pure rolling point appears at LS10v6, and the test ends with macropitting at LS10v10. The CIs in the PMGR tests also change their values at LS8, and the same applies to the PMSF tests, where STD and SNR changed significantly at LS9, and to the WSF tests. At the time of writing it is unclear whether the profile measurements were made on the WGR33 or the WGR34 gear, but the indicators in both WGR tests change their values significantly after LS9, and only the STD values calculated from the WGR34 DT models show a rapid decrease at LS10v6, when micropitting started to form.
When the CIs are compared with the statistical analysis, the most remarkable similarity is with the temperature parameters: the CF and STD of the temperature formed a trend similar to the CIs of the PM and WSF gear tests, rising at LS4, peaking at LS6 and decreasing at LS8. Meanwhile, the CIs of the WGR tests show a trend similar to the temperature kurtosis and RMS, which increased after LS10.
After comparing all combinations of ML training strategies, it can be concluded that using multiple input parameters very often gave a more precise signal response, but at the expense of random errors that could cause false alarms. The SVM method performed best in the wrought steel gear tests, especially WSF, where the measurements were the noisiest, but even then it gave a less accurate response at the end of the test than the DT method. The same happened in the WGR tests, especially when they were evaluated further. In all PM gear tests the DT method always showed higher accuracy. Although both methods have their advantages in particular situations, the last and one of the most important factors is computation time. SVM is the more time-consuming method and can overcomplicate a stable test, even more so when additional inputs are introduced. Thus, the most useful of the tested combinations is DT trained with RMS.
5.2 Conclusions
With the results obtained, the research questions defined earlier in this thesis can now be answered:
1. Is it possible to detect fatigue using the existing sensor system?
Yes, it is possible. After comparison with reference data such as tooth profile measurements and the statistical analysis, it can be stated that the CIs show changes in the transmission behavior when damage starts to propagate.
2. How early can the pitting fatigue be detected?
In the PM gear case, when the STD decreases back to the value it had at the start of the test, the damage will start to propagate very soon, so it is possible to act proactively. The most useful finding was the WGR34 STD indicator, which decreased immediately at LS10v6, when micropitting started to form in the pure rolling zone. This means that in this case it is possible to react early enough, before micropitting turns into macropitting.
3. Which measured parameters give the most useful contribution to predict the fatigue?
Temperature gives the most useful contribution due to its clear changes closer to the end of the test. However, it is more useful to pass at least one more measurement to let the algorithms make a more nuanced decision; in this project that would be the input torque. Rotation speed did not bring as much useful information as the other two measurements used in training; moreover, speed is indirectly related to torque, so the two repeat each other, as was noticeable in their kurtosis and STD plots.
4. How can the test rig be upgraded to get earlier/more precise pitting detection?
Introduce additional measurements; a higher sampling rate would also help to get more refined data from the existing sensors.
6 RECOMMENDATIONS AND FUTURE WORK
6.1 Recommendations
An increased sampling rate could provide data with higher resolution, but the economic factor should be kept in mind, finding a balance between the complexity and the expense of the model. Due to time limitations this was not done in this project, but for a better comparison the ML models should also be trained with temperature only or torque only, to better understand whether these inputs contribute more when combined.
For better decision making, more data sources would be desirable. Among the most widely used are vibration sensors, discussed in (5), (20) and (21). In (21) the vibrations were linked to transmission error (TE) and noise emissions. The noise and the vibrations were measured with three microphones and three accelerometers, respectively. The test gearbox and microphones were shielded from ambient noise by a box made of sound-absorbing material, as initial measurements showed that the noise from the electric motor was louder than the gear noise, at least at low RPM. Accelerometer 1 registers vibrations in the axial direction; accelerometer 2 registers vibrations in the radial direction, at an angle corresponding to the direction of the gear mesh contact force; and accelerometer 3 registers vibrations at a right angle to the direction of accelerometer 2.
6.2 Future work
The first step could be to put the developed methodology into practical use by writing software that checks the CI values and their trends every 30 minutes. Once this is done, the software should be tested in working conditions.
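As a starting point for such software, the check performed every 30 minutes could encode the PM gear observation from research question 2: after rising, the STD indicator falls back to its initial value shortly before the damage propagates. The rule below is a hypothetical sketch. The function name and the thresholds `min_rise` and `rel_tol` are illustrative assumptions that would need calibration against real tests, and the rule was not validated in this work.

```python
def std_returned_to_baseline(std_history, rel_tol=0.1, min_rise=0.5):
    """Hypothetical alarm rule for the 30-minute CI check.

    std_history: STD indicator values, one per check, oldest first.
    Returns True when the STD first rose by at least `min_rise`
    (relative to the first value) and the latest value has fallen
    back to within `rel_tol` of that first value.
    Both thresholds are illustrative assumptions, not calibrated values.
    """
    if len(std_history) < 3:
        return False  # too few checks to see a rise and a return
    baseline = std_history[0]
    peak = max(std_history[1:-1])  # exclude the first and the latest value
    rose = peak >= baseline * (1 + min_rise)
    back = abs(std_history[-1] - baseline) <= rel_tol * baseline
    return rose and back
```

For example, the history `[0.05, 0.07, 0.09, 0.08, 0.052]` triggers the rule, while `[0.05, 0.06, 0.09, 0.08]` does not, because the latest value has not yet returned to the baseline. In practice the function would be called by a scheduler each time a new 30-minute indicator value is computed.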
From a technical perspective, this fault detection could also be done by classification, sorting the gears into damaged and undamaged. This was not done here due to time limitations, as preparing the data for classification would require more time.
As this thesis project was done with limited information about and knowledge of AI, closer cooperation with a student with a Computer Science background should be established; perhaps the next master thesis connected to this topic should be done in cooperation with another department, for example the Department of Computer Science at KTH.
7 REFERENCES
1. The impact of wind energy on wildlife and environment. Peiser, B. s.l. : The Global Warming
Policy Foundation, 2019. ISBN 978-1-9160700-1-1.
2. A systematic literature review of machine learning methods applied to. Thyago P. Carvalho, Fabrízzio A. A. M. N. Soares, Roberto Vita, Roberto da P. Francisco. s.l. : Elsevier, 2019, Vol. 137. 106024.
3. A Review of Gearbox Condition Monitoring Based on vibration Analysis. Yahya I. Sharaf-
Eldeen, Abdulrahman S. Sait. Melbourne, FL : s.n., 2011, Vol. 5.
4. Bergstedt, Edwin. A Comparative Investigation of Gear Performance Between Wrought and
Sintered Powder Metallurgical Steel. Stockholm : KTH Royal Institute of Technology, 2021.
ISBN 978-91-7873-821-2.
5. A quantitatively distributed wear-measurement method for spur gears during micro-pitting
and pitting tests. Jiachun Lin, Chen Teng, Edwin Bergstedt, Hanxiao Li, Zhaoyao Shi, Ulf
Olofsson. s.l. : Elsevier, 2021, Vol. 157. ISSN 0301-679X.
6. Beek, Anton van. Advanced engineering design. Delft : TU Delft, 2009. p. 35. ISBN-10
9081040618.
7. —. Advanced engineering design. Delft : TU Delft, 2009. p. 37. ISBN-10 9081040618.
8. —. Advanced engineering design. Delft : TU Delft, 2009. p. 38. ISBN-10 9081040618.
9. Lundteigen, Mary Ann and Rausand, Marvin. Failures and Failure Classification.
Trondheim : NTNU.
10. A systematic literature review of machine learning methods applied to. Thyago P. Carvalho,
Fabrízzio A. A. M. N. Soares, Roberto Vita, Roberto da P. Francisco, João P. Basto, Symone G.
S. Alcalá. Computers & Industrial Engineering, s.l. : Elsevier, 2019, Vol. 137.
11. Machine Learning in Predictive Maintenance towards Sustainable Smart Manufacturing in
Industry 4.0. Zeki Murat Cinar, Abubakar Abdussalam Nuhu, Qasim Zeeshan, Orhan Korhan,
Mohammed Asmael, Babak Safaei. 19, s.l. : MDPI, 2020, Vol. 12.
12. Virtual sensing for gearbox condition monitoring based on extreme learning machine.
Jinjiang Wang, Yinghao Zheng, Lixiang Duan, Junyao Xie, Laibin Zhang. 2, Beijing : JVE
Journals, 2016, Vol. 19.
13. Condition Indicators for Gearbox Condition Monitoring Systems. P. Večeř, M. Kreidl, R.
Šmíd. Prague : Czech Technical University in Prague, 2005.
14. An artificial neural network-based condition monitoring method for wind turbines, with
application to the monitoring of the gearbox. P. Bangalore, S. Letzgus, D. Karlsson and M.
Patriksson. s.l. : Wiley Online Library, 2017, Vol. 20.
15. A Review of Gearbox Condition Monitoring Based on Vibration Analysis Techniques
Diagnostics and Prognostics. Sharaf-Eldeen, Yahya I. s.l. : Springer, 2011, Vol. 5.
ISSN 2191-5644.
16. Machine Learning. [Online] IBM, 15 July 2020. [Cited: 19 April 2021.]
https://www.ibm.com/cloud/learn/machine-learning.
17. Lehrstuhl für Maschinenelemente. [Online] Technische Universität München. [Cited: 23
April 2021.] https://www.mw.tum.de/fzg/forschung/ausstattung/.
18. Gear micropitting initiation of ground and superfinished gears: Wrought versus pressed and
sintered steel. Edwin Bergstedt, Jiachun Lin, Michael Andersson, Ellen Bergseth, Ulf Olofsson.
s.l. : Tribology International, 2021, Vol. 160. ISSN 0301-679X.
19. Select Data and Validation for Regression Problem. [Online] Matlab. [Cited: 4 June 2021.]
https://www.mathworks.com/help/stats/select-data-and-validation-for-regression-problem.html.
20. Wind turbine gearbox failure and remaining useful life prediction using machine learning
techniques. James Carroll, Sofia Koukoura, Alasdair McDonald, Anastasis Charalambous. s.l. :
Wiley Online Library, 2018, Vol. 22.
21. Åkerblom, M and Pärssinen, M. A study of gear noise and vibration. Stockholm : KTH,
2002. ISSN 1400-1179 ; 2002:8.
APPENDIX A: FIGURES OF RAW DATA
Figure 49. Temperature in test gearbox.
Figure 50. Temperature in test gearbox.
Figure 51. Temperature in test gearbox.
Figure 52. Temperature in test gearbox.
Figure 53. Temperature in test gearbox.
Figure 54. Temperature in test gearbox.
Figure 55. Temperature in test gearbox.
Figure 56. Rotation speed in test gearbox.
Figure 57. Rotation speed in test gearbox.
Figure 58. Rotation speed in test gearbox.
Figure 59. Rotation speed in test gearbox.
Figure 60. Rotation speed in test gearbox.
Figure 61. Rotation speed in test gearbox.
Figure 62. Rotation speed in test gearbox.
Figure 63. Input torque in test gearbox.
Figure 64. Input torque in test gearbox.
Figure 65. Input torque in test gearbox.
Figure 66. Input torque in test gearbox.
Figure 67. Input torque in test gearbox.
Figure 68. Input torque in test gearbox.
Figure 69. Input torque in test gearbox.