Page 1
© Nokia 2017
Nokia internal use
Machine Learning Assisted QoT Estimation
and Planning with Low Margins
Kostas Christodoulopoulos (1), I. Sartzetakis (2), P. Soumplis (2), and E. Varvarigos (2)
(1) Nokia Bell Labs, Stuttgart, Germany
(2) School of Electrical and Computer Engineering, National Technical University of Athens, Greece
2019
Outline
• Motivation
• QoT estimation and margins
• Accurate QoT estimation using ML
– Accuracy evaluation
• Provisioning with accurate QoT estimation
– Incremental multiperiod planning
– Case study
Motivation
Optical networks are planned to be operated statically
• Provision lightpaths, by estimating QoT at EOL (10 years)
• Ageing, increased interference, inaccuracies – EOL Margins
High margins lead to overprovisioning and high CAPEX & OPEX
• Other overprovisioning factors: inefficient failure handling, accounting for future traffic demands
Static network operation and overprovisioning will not work as traffic becomes more volatile (5G, Edge Cloud)
Call for increased efficiency, lower overprovisioning, reduced margins
Evolution of margins over time
• Define acceptable performance after accounting for fast penalties (~1 dB), operator’s margin (~1 dB), uncertainties (~2 dB), unallocated (transponder-reach mismatch)
• Traditional approach: target to be acceptable at EoL
• Reduced margins: target to be acceptable at intermediate periods (while we also reduce uncertainties)
QoT estimation
• QoT estimation is used by a planning or online algorithm
• QoT estimator (Qtool): a Physical Layer Model (PLM)
• Modelling inaccuracy
– Inputs (databases)
• Physical layer parameters: spans, fibers, amplifier parameters, etc.
– Uncertainty: measurement errors, outdated measurements
• Connection parameters: route, spectrum, baudrate, modulation, etc.
– Output: lightpaths’ QoT (SNR, BER, etc.)
• Design margin: accounts for inaccuracies (model and input parameters)
• System margin: accounts for QoT deterioration until EOL (equipment ageing, increased interference, fiber cut repairs, etc.)
[Figure: the provisioning algorithm (placement of equipment and configurations, routing and wavelength assignment, etc.) takes the demands as input and queries the Qtool / Physical Layer Model (PLM), which is fed with the physical layer parameters and the established connections. Design margin: PLM model inaccuracy, physical layer parameter uncertainty. System margin to reach EOL: interference, ageing.]
QoT estimation with Machine Learning
• Assume established connections / brownfield deployment
• Use monitoring and machine learning (ML)
– Understand actual network conditions
• Reduce design margin: improve accuracy of physical layer parameters
• Reduce system margins: no need to target EOL
• Monitoring data
– Power monitors
– Rx (e.g. dispersion, SNR, BER); focus on SNR (used by the developed models), but an Rx gives aggregated end-to-end information → account for routes & spectrum
• Lightpaths sharing common links share information
• Lightpaths’ relative spectrum positions carry information
• Target per link and wavelength/interference QoT parameters
[Figure: lightpaths crossing the same link share information.]
[Figure: monitored data from the established connections are used for ML training of the physical layer parameters of the Qtool / Physical Layer Model (PLM). Design margin (PLM model inaccuracy, physical layer parameter uncertainty): reduced, since accuracy is improved. System margins (interference, ageing): reduced, since there is no need to target EOL.]
1st method: Machine Learning - Physical Layer Model (ML-PLM)
• Physical Layer Model (PLM)
– Inputs
• Connection parameters $P$: routes, spectrum, TRx configuration (baudrate, modulation, etc.)
• Physical layer parameters $b$: spans, fiber attenuation, dispersion, nonlinear coefficients, amplifier parameters, ROADM parameters
– Output: lightpaths’ QoT (SNR) estimates $Q(b, P)$
• Parameters $b$ are not accurately known, which yields QoT estimation error
• Train the PLM using monitoring data $Y(P)$ → ML-PLM
[Figure: ML-PLM training loop. Monitoring of the established connections $P$ yields the monitored data $Y(P)$; ML training adjusts the physical layer parameters $b$ of the Qtool / Physical Layer Model (PLM), which outputs $Q(b, P)$.]
[1] E. Seve, J. Pesic, C. Delezoide, S. Bigo, and Y. Pointurier, “Learning Process for Reducing Uncertainties on Network Parameters and Design Margins”, JOCN 2018
1st method: Machine Learning - Physical Layer Model (ML-PLM)
• ML training
– Initialize physical layer parameters $b_0$ from datasheets or (outdated) measurements
– Fit (iteratively) parameters $b_i$ to minimize the error $Y(P) - Q(b_i, P)$
• The fitting algorithm depends on the PLM, and on whether $\partial Q / \partial b_j$ is known
• If $Q$ is nonlinear in some $b_j$ → nonlinear fitting
– Obtain fitted physical layer parameters $b^*$
• For a new connection $w$ ($P' = P \cup \{w\}$), use the learned parameters $b^*$ to estimate $Q(b^*, P')$ when deciding how to establish it
• Implementation: Qtool = GN model, fitting = nonlinear regression (Levenberg-Marquardt nonlinear least squares algorithm)
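The fitting loop can be illustrated with a minimal sketch. It does not use the GN model: `plm_snr_db` is a hypothetical toy PLM in which each link has one unknown noise-to-signal parameter and the noise adds up along the route, and `fit_ml_plm` is a small hand-written Levenberg-Marquardt iteration (accept a damped Gauss-Newton step if it lowers the squared error, otherwise increase the damping). All names, routes and values are assumptions chosen for illustration.

```python
import math

def plm_snr_db(b, routes):
    """Toy PLM (assumption): per-link noise-to-signal ratios b[l] add up along a route."""
    return [-10.0 * math.log10(sum(b[l] for l in route)) for route in routes]

def solve(A, y):
    """Solve the small linear system A x = y (Gaussian elimination, partial pivoting)."""
    n = len(y)
    M = [A[i][:] + [y[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_ml_plm(b0, routes, y, iters=50, lam=1e-3):
    """Levenberg-Marquardt fit of b so that plm_snr_db(b, routes) matches monitored y."""
    n, m = len(b0), len(routes)
    b = list(b0)

    def sse(bb):
        return sum((yi - qi) ** 2 for yi, qi in zip(y, plm_snr_db(bb, routes)))

    for _ in range(iters):
        r = [yi - qi for yi, qi in zip(y, plm_snr_db(b, routes))]
        # Jacobian dQ_i/db_j = -10 / (ln(10) * sum of b on route i) for links j on route i
        J = [[-10.0 / (math.log(10) * sum(b[l] for l in route)) if j in route else 0.0
              for j in range(n)] for route in routes]
        JTJ = [[sum(J[i][a] * J[i][c] for i in range(m)) for c in range(n)] for a in range(n)]
        JTr = [sum(J[i][a] * r[i] for i in range(m)) for a in range(n)]
        damped = [[JTJ[a][c] + (lam * JTJ[a][a] if a == c else 0.0) for c in range(n)]
                  for a in range(n)]
        cand = [bj + dj for bj, dj in zip(b, solve(damped, JTr))]
        if min(cand) > 0 and sse(cand) < sse(b):
            b, lam = cand, lam / 10   # accept the step and reduce the damping
        else:
            lam *= 10                 # reject the step and damp more
    return b

# Established connections P (as link tuples) and their monitored SNR Y(P)
routes = [(0,), (0, 1), (1, 2), (0, 1, 2), (2,)]
b_true = [0.010, 0.020, 0.015]           # "actual" network, unknown to the estimator
y = plm_snr_db(b_true, routes)
b0 = [0.012, 0.016, 0.018]               # datasheet-like initial values, 20% off
b_star = fit_ml_plm(b0, routes, y)

# Estimate the QoT of a new connection w before establishing it
new_route = [(0, 2)]
err_before = abs(plm_snr_db(b0, new_route)[0] - plm_snr_db(b_true, new_route)[0])
err_after = abs(plm_snr_db(b_star, new_route)[0] - plm_snr_db(b_true, new_route)[0])
```

In this toy setting the ground truth and the trained model are the same function, so the fit can recover the parameters almost exactly; with a real PLM and noisy monitoring the residual error would not vanish.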
2nd method: Machine Learning Model (ML-M)
• Without a Qtool
• ML-M: Machine Learning Model, functioning as Qtool
– Input
• Features $X = f(P)$; $X$ is a matrix with one row per lightpath
The features in a lightpath's row represent its QoT-related parameters
• ML model coefficients $\Theta$
depend on the particular ML model (linear regression, NN, SVM, etc.)
– Output: lightpaths’ QoT estimates $\tilde{Y}(X, \Theta)$
• Use monitoring data $Y(P)$ to train the model and obtain coefficients $\Theta^*$ that yield low estimation error
[Figure: ML-M training loop. Feature extraction $X = f(P)$ from the established connections $P$; the ML-M QoT estimator outputs $\tilde{Y}(X, \Theta)$; ML training fits the ML-M coefficients $\Theta$ to the monitored data $Y(P)$.]
2nd method: Machine Learning Model (ML-M)
• Choosing appropriate features
– Literature: end-to-end features (e.g. path length, #hops, #EDFAs)
• Cannot cope with network heterogeneity
→ Per link features
• Features matrix with link features: $X = [B\ A\ S\ W]$, of size $|P| \times (1 + 3|L|)$
Bias + 3 sets of link features for the 3 major impairment classes
– A: ASE, S: SCI, W: XCI
e.g. $S_{p,i} = PSD_p^3$ (PSD: power spectral density) if lightpath $p$ uses link $i$, else 0
The SCI noise contribution depends (linearly) on the lightpath’s $PSD^3$
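[Figure: ML-M training loop (feature extraction $X = f(P)$, ML-M QoT estimator $\tilde{Y}(X, \Theta)$, ML-M coefficients $\Theta$, monitored data $Y(P)$), annotated with the feature definitions:]
$B$: bias
$A_{p,l} = 1$
$S_{p,l} = PSD_p^3$
$W_{p,l} = \sum_{p'} PSD_p \cdot PSD_{p'}^2 \cdot \{\operatorname{asinh}[(d_{p,p'} + B_{p'}/2) \cdot B_p] - \operatorname{asinh}[(d_{p,p'} - B_{p'}/2) \cdot B_p]\}$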
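A minimal sketch of how such a per-link feature matrix could be assembled is given below. It assumes that $d_{p,p'}$ is the centre-frequency distance between lightpaths $p$ and $p'$, that $B_p$ is the bandwidth of lightpath $p$, and that the sum over $p'$ runs over the other lightpaths sharing the link; the slides do not spell these out. The function name, the dictionary keys and the numeric values are hypothetical, and units are arbitrary.

```python
import math

def link_features(lightpaths, num_links):
    """Return X with one row per lightpath: [bias, A (|L| cols), S (|L| cols), W (|L| cols)]."""
    rows = []
    for p in lightpaths:
        A = [0.0] * num_links   # ASE feature: 1 if the lightpath uses the link
        S = [0.0] * num_links   # SCI feature: PSD_p^3 on used links
        W = [0.0] * num_links   # XCI feature: sum over co-propagating lightpaths
        for l in p["links"]:
            A[l] = 1.0
            S[l] = p["psd"] ** 3
            for q in lightpaths:
                if q is p or l not in q["links"]:
                    continue
                d = q["freq"] - p["freq"]   # assumed meaning of d_{p,p'}
                W[l] += p["psd"] * q["psd"] ** 2 * (
                    math.asinh((d + q["bw"] / 2) * p["bw"])
                    - math.asinh((d - q["bw"] / 2) * p["bw"]))
        rows.append([1.0] + A + S + W)
    return rows

# Two lightpaths on a 3-link network; they overlap only on link 1
lightpaths = [
    {"links": (0, 1), "freq": 0.0, "bw": 32.0, "psd": 1.0},
    {"links": (1, 2), "freq": 50.0, "bw": 32.0, "psd": 2.0},
]
X = link_features(lightpaths, num_links=3)
```

Each row has $1 + 3|L| = 10$ entries here, and the W entry is non-zero only on the shared link, which is how the features encode both the route and the relative spectrum position.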
2nd method: Machine Learning Model (ML-M)
• Features are designed so that the impairment noise contribution depends (close to) linearly on them
• With $X_{p,j}$ the $j$th impairment/link feature of lightpath $p$, the noise contribution of that impairment on that link is approximated well by $n_j(X_{p,j}, \Theta) = X_{p,j} \cdot \Theta_j$
• Noise additivity assumption
– The total noise of lightpath $p$ is $\sum_j n_j(X_{p,j}, \Theta)$
• ML model: linear regression $\tilde{Y}(X, \Theta) = X \cdot \Theta$, trained with gradient descent
• Also tried NN and SVM, and obtained similar results
• NN and SVM would account better for nonlinear dependence on the features and for other, more complicated features
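The linear model and its training can be sketched as follows. This is a toy illustration under stated assumptions: the regression target is taken to be the noise-to-signal ratio in linear units (so that the additivity assumption holds), only the bias and the per-link A features are used, and the link values in `theta_true` are made up. Function names are hypothetical.

```python
import math

def predict(X, theta):
    """Linear model: one estimate per row, X . theta."""
    return [sum(x * t for x, t in zip(row, theta)) for row in X]

def fit_linear_gd(X, Y, lr=0.5, iters=2000):
    """Batch gradient descent on the mean squared error of predict(X, theta) vs Y."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        err = [yh - y for yh, y in zip(predict(X, theta), Y)]
        grad = [sum(err[i] * X[i][j] for i in range(m)) / m for j in range(n)]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

def row(route, num_links=3):
    """Feature row [bias, A_0, A_1, A_2] for a route given as a tuple of link indices."""
    return [1.0] + [1.0 if l in route else 0.0 for l in range(num_links)]

# Established lightpaths (routes) and their monitored noise-to-signal ratios
train_routes = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 1, 2)]
X_train = [row(r) for r in train_routes]
theta_true = [0.001, 0.010, 0.020, 0.015]      # made-up per-link noise contributions
Y_train = predict(X_train, theta_true)         # stands in for monitoring data Y(P)

theta_star = fit_linear_gd(X_train, Y_train)

# Estimate a lightpath that was never monitored: route over links 0 and 2
nsr_new = predict([row((0, 2))], theta_star)[0]
snr_new_db = -10.0 * math.log10(nsr_new)
```

The unseen route is estimated correctly because it shares links with monitored lightpaths, which is the information-sharing argument made earlier in the deck.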
ML QoT Estimation – accuracy evaluation
• Ground truth (create monitoring data and obtain estimation error): GN model
• 12-node Deutsche Telekom (DT) topology
• Physical layer parameters
– Span parameters (length, fiber coefficients): ±0%, ±10%, ±20% around default values
– Actual parameters assumed unknown → uncertainty: 0%, 10% and 20%
• Traffic
– 4 traffic loads 100, 200, 300, 400 connections, 80% training, 20% testing
– Uniform source-destination, uniform baudrate: 32, 43, 56 Gbaud
– 500 instances per load
• ML-PLM, ML-M
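The evaluation setup can be sketched roughly as below. It is only an illustration of the procedure described above: the default parameter values, the dictionary keys, the seed and the function names are hypothetical, the perturbation is assumed to be uniform within the stated range, and the ground-truth GN model itself is not included.

```python
import random

BAUDRATES_GBAUD = (32, 43, 56)

def perturb(defaults, uncertainty, rng):
    """Draw 'actual' parameters uniformly within +/- uncertainty around the defaults."""
    return {name: value * (1.0 + rng.uniform(-uncertainty, uncertainty))
            for name, value in defaults.items()}

def random_connections(nodes, count, rng):
    """Uniform source-destination pairs with a uniformly chosen baudrate."""
    conns = []
    for _ in range(count):
        src, dst = rng.sample(nodes, 2)
        conns.append({"src": src, "dst": dst, "baud": rng.choice(BAUDRATES_GBAUD)})
    return conns

def train_test_split(items, train_fraction, rng):
    """Shuffle a copy of the items and cut it into training and testing sets."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

rng = random.Random(0)
defaults = {"attenuation_db_per_km": 0.22, "span_length_km": 80.0}  # hypothetical defaults
actual = perturb(defaults, 0.10, rng)          # one instance with 10% uncertainty
conns = random_connections(list(range(12)), 200, rng)
train, test = train_test_split(conns, 0.8, rng)
```

One such instance would be generated per load and repeated (500 times per load in the evaluation above), with the estimator trained on `train` and its error measured on `test`.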
Mean Square Error
• Both ML-PLM and ML-M achieve excellent MSE
• ML-PLM is better (note: the ground truth and the trained PLM are the same)
• ML-PLM’s error is higher for higher uncertainty
– Starts from default / average parameters and learns
• ML-M’s accuracy is not sensitive to uncertainty since it does not assume any default parameters
[Figure: MSE of the SNR estimate (in dB, 0 to 0.008) vs. number of lightpaths (100, 200, 300, 400), for ML-PLM and ML-M at 0%, 10% and 20% uncertainty.]
Maximum Overestimation
• Similar findings for max overestimation
• Design margin = max overestimation
– $SNR_{over} = SNR_{est} - SNR_{real}$; for a threshold $SNR_{thr}$, a lightpath is safe if $SNR_{real} > SNR_{thr}$ ➔ $SNR_{est} - SNR_{over} > SNR_{thr}$
• ML-PLM design margin: 0.05 dB, ML-M design margin: 0.2 dB for 200 lightpaths
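The design-margin rule above amounts to a few lines of arithmetic. The sketch below uses made-up SNR values and hypothetical function names; it only illustrates how the maximum overestimation on the test set becomes the margin subtracted from later estimates.

```python
def design_margin_db(snr_est_db, snr_real_db):
    """Design margin = maximum overestimation SNR_est - SNR_real (never negative)."""
    return max(0.0, max(e - r for e, r in zip(snr_est_db, snr_real_db)))

def is_safe(snr_est_db, margin_db, snr_thr_db):
    """A lightpath is accepted if its estimate minus the design margin clears the threshold."""
    return snr_est_db - margin_db > snr_thr_db

# Made-up estimates and measured values for three test lightpaths (dB)
snr_est = [15.2, 12.1, 18.0]
snr_real = [15.0, 12.3, 17.95]
margin = design_margin_db(snr_est, snr_real)   # about 0.2 dB: the worst overestimation

accept_a = is_safe(16.0, margin, snr_thr_db=15.5)   # 16.0 - 0.2 > 15.5
accept_b = is_safe(15.6, margin, snr_thr_db=15.5)   # 15.6 - 0.2 is below 15.5
```

Underestimation (the second lightpath) does not enter the margin, since it can only make the decision more conservative.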
[Figure: maximum overestimation of the SNR (in dB, 0 to 0.9) vs. number of lightpaths (100, 200, 300, 400), for ML-PLM and ML-M at 0%, 10% and 20% uncertainty.]
(Untrained) PLM max overestimation: 0 dB for 0% uncertainty, 0.9 dB for 10% uncertainty, 2 dB for 20% uncertainty
Quantifying savings of accurate QoT estimation
• Multi-period/incremental planning (period=several months to years)
1. Traditional: provision with high margins to reach end-of-life (EOL) and account for inaccuracies
• System margin: equipment ageing, interference increases, maintenance operations
• Design margin: QoT estimation model inaccuracy
2. With reduced margins / accurate QoT estimation
• New connections: provision them with enough margins to reach next (or some targeted) period
• Established connections: check their QoT and reconfigure or add regenerators to reach next (or targeted) period
Incremental planning algorithm
• Input at the start of period τi
– Traffic described by the remaining and new demands
– TRx installed at previous periods / established lightpaths (up to τi)
– Equipment e.g. capabilities of Flex- (or fixed-) rate TRx
• Interface with QoT estimator
• Objective
– Serve traffic
• Cater for remaining lightpaths that run out of margins, serve new demands
– Minimize added cost
• Algorithm: heuristic, examines previous and new connections 1-by-1 [1][2]
[1] P. Soumplis, K. Christodoulopoulos, M. Quagliotti, A. Pagano, E. Varvarigos, "Network Planning with Actual Margins", Journal of Lightwave Technology (JLT), 2017
[2] P. Soumplis, K. Christodoulopoulos, M. Quagliotti, A. Pagano, E. Varvarigos, "Multi-Period Planning With Actual Physical and Traffic Conditions", Journal of Optical Communications & Networking (JOCN), 2018
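The overall shape of one planning period can be sketched as below. This is not the heuristic of [1][2]; it is a simplified stand-in under loud assumptions: the only remedy considered is adding regenerators (reconfiguration is omitted), a regenerator is assumed to cost two TRx, a new lightpath is assumed to need one TRx at each end, and `toy_snr_db` is a hypothetical QoT estimator. All names and numbers are illustrative.

```python
import math

def toy_snr_db(lp):
    """Hypothetical QoT estimator: SNR falls with the length of each regenerated segment."""
    segment_km = lp["length_km"] / (lp["regens"] + 1)
    return 30.0 - 10.0 * math.log10(segment_km / 100.0)

def ensure_qot(lp, estimate_snr_db, margin_db, trx_cost):
    """Add regenerators until the estimate minus the margin clears the threshold."""
    cost = 0.0
    while estimate_snr_db(lp) - margin_db <= lp["snr_thr_db"]:
        lp["regens"] += 1
        cost += 2 * trx_cost          # assumption: one regenerator = two TRx
    return cost

def plan_period(established, new_demands, estimate_snr_db, margin_db, trx_cost=1.0):
    """Examine previous and new connections one by one; return (lightpaths, added cost)."""
    added_cost = 0.0
    for lp in established:            # 1) lightpaths that may have run out of margin
        added_cost += ensure_qot(lp, estimate_snr_db, margin_db, trx_cost)
    for demand in new_demands:        # 2) new demands
        lp = dict(demand, regens=0)
        added_cost += 2 * trx_cost    # assumption: one TRx at each end
        added_cost += ensure_qot(lp, estimate_snr_db, margin_db, trx_cost)
        established.append(lp)
    return established, added_cost

demand = {"length_km": 800.0, "snr_thr_db": 18.0}
_, cost_reduced = plan_period([], [demand], toy_snr_db, margin_db=2.0)
_, cost_eol = plan_period([], [demand], toy_snr_db, margin_db=4.0)
```

With the 2 dB margin the 800 km demand is served without regeneration, while the 4 dB margin forces a regenerator in the same period; in this toy setting that is the mechanism by which reduced margins postpone equipment purchases.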
[Figure: the provisioning algorithm (placement of equipment and configurations, routing and wavelength assignment, etc.) takes the established connections, the new demands and the available equipment as input and queries the Qtool / Physical Layer Model (PLM).]
Incremental Planning and ML-PLM
[Figure: greenfield vs. brownfield / incremental planning. Greenfield: the provisioning algorithm (placement of equipment and configurations, routing and wavelength assignment, etc.) provisions connections for the initial demands using the Qtool / Physical Layer Model (PLM) and the physical layer parameters, with the design margin (PLM model inaccuracy, physical layer parameter uncertainty) and the system margins to reach EOL (interference, ageing). Brownfield / incremental planning: monitoring of the established connections drives ML training of the physical layer parameters, so accuracy is improved and the design margin is reduced, and the system margin is reduced to what is needed to reach the next period; the provisioning algorithm then provisions connections for the new demands with the available equipment.]
Case study – Topology, traffic, TRx
• DT network topology
• 11 periods, 1 period ≈ 1 year, incremental planning every 1 year
• Initial traffic: 200 initial connections, uniform src-dst, uniform [100-200] Gbps
• Traffic increases by 20% per period
• 2 types of TRx: TRx1 available at period 0 (τ0), TRx2 available at period 5 (τ5)
– TRx1: 32 Gbaud, DP-QPSK to DP-16QAM, SNRthr=0.01dB, cost= 1, at period 0 (τ0)
– TRx2: 64 Gbaud, DP-QPSK to DP-32QAM, SNRthr=0.01dB, cost= 1, at period 5 (τ5)
• Cost reduction 10% per period
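As a quick check of what these rates imply over the planning horizon, the sketch below compounds them per period. Compounding (rather than a fixed absolute step) is an assumption, as are the function names; the slides only give the percentages.

```python
def traffic_multiplier(period, growth=0.20):
    """Traffic relative to period 0, assuming 20% compound growth per period."""
    return (1.0 + growth) ** period

def trx_unit_cost(period, base_cost=1.0, reduction=0.10):
    """TRx cost relative to its initial cost, assuming a compound 10% reduction per period."""
    return base_cost * (1.0 - reduction) ** period

growth_at_10 = traffic_multiplier(10)   # roughly 6.2 times the initial traffic
cost_at_10 = trx_unit_cost(10)          # roughly 0.35 of the initial TRx cost
```

Under these assumptions, equipment bought late serves much more traffic per unit of cost, which is why postponing purchases matters in the comparison that follows.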
Margin scenarios (columns of the two tables below; the slide gives no unit for the values, presumably reach in km):
(a) BOL ageing & BOL interference & high design margin
(b) EOL ageing & EOL interference & low design margin
(c) EOL ageing & EOL interference & high design margin

TRx1
Data rate (Gbps)   Baud rate (Gbaud)   Mod. format   (a)     (b)     (c)
100                32                  DP-QPSK       4720    3600    2280
150                32                  DP-8QAM       2080    1600    1280
200                32                  DP-16QAM      1040    800     560

TRx2
Data rate (Gbps)   Baud rate (Gbaud)   Mod. format   (a)     (b)     (c)
200                64                  DP-QPSK       4160    2800    2240
300                64                  DP-8QAM       2720    1840    1440
400                64                  DP-16QAM      1840    1280    960
500                64                  DP-32QAM      1280    880     640
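A planning algorithm would typically consult such a table to pick the highest data rate that is still feasible for a given path under the chosen margin scenario. The sketch below does this for the TRx1 (32 Gbaud) rows; it assumes the table values are reach limits in km, which the slide does not state, and the function and scenario names are hypothetical.

```python
# TRx1 rows: (data rate Gbps, baud rate Gbaud, format, BOL/high, EOL/low, EOL/high)
TRX1 = [
    (100, 32, "DP-QPSK", 4720, 3600, 2280),
    (150, 32, "DP-8QAM", 2080, 1600, 1280),
    (200, 32, "DP-16QAM", 1040, 800, 560),
]

SCENARIO_COLUMN = {"bol_high_design": 3, "eol_low_design": 4, "eol_high_design": 5}

def best_config(table, path_km, scenario):
    """Highest-rate row whose value (assumed reach in km) covers the path, else None."""
    col = SCENARIO_COLUMN[scenario]
    feasible = [row for row in table if row[col] >= path_km]
    return max(feasible, key=lambda row: row[0]) if feasible else None

# A 1000 km path: 200 Gbps is feasible only with BOL system margins
with_bol = best_config(TRX1, 1000, "bol_high_design")     # DP-16QAM row
with_eol = best_config(TRX1, 1000, "eol_high_design")     # DP-8QAM row
too_long = best_config(TRX1, 5000, "bol_high_design")     # None: needs regeneration
```

The same path gets a higher-rate format when it only has to survive near-term conditions, which is the source of the savings quantified in the case study.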
Case study – Physical layer evolution & margins
• Initialize with heterogeneous spans and uncertainty
– Attenuation, dispersion, nonlinear coefficients
uniformly around default values ±10%
– Unknown to QoT estimator, requires ~1 dB margin
• Ageing: increase per period according to table
• 10 instances (load & physical layer), average results
• Planning with high margins
– EOL system margin (EOL ageing & full load interference), BOL design margin (2 dB)
• Planning with reduced margins - ML-M (or ML-PLM) and incremental planning algorithm
• Initial period: design margin = 2 dB, system margin = 2 periods of ageing & current interference
• Each period, train ML-M and obtain the new design margin (= 1 dB + 0.2 dB + training max overestimation); system margin = 2 periods of ageing & current interference
Physical layer parameters evolution (system margin)
Parameter                      Increase per period
Ageing:
  Transponder margin (dB)      0.05
  Attenuation (dB/km)          0.0015
  EDFA noise figure (dB)       0.1
  OXC loss (dB)                0.3
Interference                   According to load
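The ageing table and the design-margin rule can be turned into a small calculation. In the sketch below the per-period increments come from the table, while the beginning-of-life values, the choice of 10 periods as EOL and the function names are assumptions made for illustration.

```python
AGEING_PER_PERIOD = {
    "transponder_margin_db": 0.05,
    "attenuation_db_per_km": 0.0015,
    "edfa_noise_figure_db": 0.1,
    "oxc_loss_db": 0.3,
}

def aged(params, periods):
    """Parameter values after the given number of periods of linear ageing."""
    return {name: value + periods * AGEING_PER_PERIOD[name]
            for name, value in params.items()}

def design_margin_db(training_max_overest_db):
    """Design margin used each period: 1 dB + 0.2 dB + training max overestimation."""
    return 1.0 + 0.2 + training_max_overest_db

# Hypothetical beginning-of-life values
bol = {
    "transponder_margin_db": 0.0,
    "attenuation_db_per_km": 0.22,
    "edfa_noise_figure_db": 5.0,
    "oxc_loss_db": 20.0,
}
reduced_view = aged(bol, 2)    # reduced margins: plan for 2 periods of ageing
eol_view = aged(bol, 10)       # high margins: plan for ageing up to EOL (assumed 10 periods)
margin = design_margin_db(0.1)
```

Planning against `reduced_view` instead of `eol_view` is what the system-margin reduction means in practice: the estimator is fed parameters aged by two periods rather than by the whole network lifetime.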
Case study - basic comparison
• The reduction of the system margin postpones the purchase of equipment
• The reduction of the design margin (ML – learning) avoids the purchase, after the first period
• ~20% savings at the end of 10 periods
– Could be even higher if we optimize the power
[Figure: TRx cost (0 to 600) and savings (0 to 0.25) vs. period (0 to 10), for EOL planning and reduced margins planning.]
[Figure: design margin (dB, 0 to 2.5) vs. period (0 to 10), showing the used design margin (1 dB + 0.2 dB + training max overestimation) and the max estimation error (after establishing) + 1 dB.]
Conclusions
• Traditionally lightpaths are provisioned using a QoT estimator (PLM) and EOL margins
• Developed 2 ML QoT estimators (with a PLM and without)
– Use monitoring data, understand physical conditions and ageing, reduce system margins
– Very good accuracy, design margin reduced to 0.2 dB with a few hundred lightpaths
• Quantified savings of accurate QoT estimation
– Integrated ML-M with incremental planning algorithm
– Multiperiod planning case study
~20% savings with accurate QoT estimation/planning with reduced margins as opposed to EOL margins
QoT estimation – state of the art
[1] I. Sartzetakis, K. Christodoulopoulos, C. Tsekrekos, D. Syvridis, E. Varvarigos, "Quality of transmission estimation in WDM and elastic optical networks accounting for space–spectrum dependencies", JOCN, 2016
[2] C. Rottondi, L. Barletta, A. Giusti, M. Tornatore, "Machine learning method for quality of transmission prediction of unestablished lightpaths", JOCN, 2018
[3] E. Seve et al., "Learning Process for Reducing Uncertainties on Network Parameters and Design Margins", JOCN, 2018
[4] M. Bouda et al., "Accurate Prediction of Quality of Transmission Based on a Dynamically Configurable Optical Impairment Model", JOCN, 2018 (Fujitsu)
[5] P. Samadi et al., "Quality of Transmission Prediction with Machine Learning for Dynamic Operation of Optical WDM Networks", ECOC, 2017 (Bergman)
[6] G. Choudhury et al., "Two use cases of machine learning for SDN-Enabled IP/Optical Networks: traffic matrix prediction and optical path performance prediction", JOCN, 2018
Comparison (per paper: Qtool, ML method, features, simulations/experiments, comment)

[1]
Qtool: without Qtool
ML method: regression, Kriging (linear correlation)
Features: interference-aware links (1/SNR per link, different links according to adjacent lightpaths)
Simulations/Experiments: simulations, GN model as ‘ground truth’
Comment: ‘homogeneous’ spans

[2]
Qtool: without Qtool
ML method: classification, K-nearest neighbors, Random Forest
Features: #hops, path length, longest link, modulation format, network traffic volume, …
Simulations/Experiments: simulations, GN model with worst case interference as ‘ground truth’
Comment: worst case interference, ‘homogeneous’ spans

[3]
Qtool: with Qtool (in-house, GN model)
ML method: regression, gradient descent
Features: input parameters of Qtool (power at nodes, XXX)
Simulations/Experiments: simulations, GN model and in-house Qtool as ‘ground truth’, worst case interference
Comment: worst case interference

[4]
Qtool: with Qtool (in-house, close to GN)
ML method: calculate the derivatives, similar to gradient descent
Features: "length-dependent loss and nonlinear intensity (NLI) noise based on the GN model [15], computing in each fiber span the SPM-like and XPM-like noise contributions due to nonlinear effects in fiber based on frequency spacing between optical signals, their signal power levels, and the fiber nonlinear coefficient [2]. In this work we used only a single-mode fiber (SMF)."
Simulations/Experiments: experiments, 6 nodes!

[5]
Qtool: with Qtool (GN model)
ML method: regression, maximum likelihood / extended Kalman filter
Features: not clearly described, for sure SNR and wavelength
Simulations/Experiments: experiments (6 nodes, VOAs to emulate different link lengths), and simulations (to evaluate benefits)
Comment: accounts for non-homogeneous amplification

[6]
Qtool: without Qtool
ML method: ridge regression, LASSO regression, LASSO with quadratic features, multilayer perceptron, Gaussian process regression, gradient boosted regression trees, random forest regression trees
Features: "26 input features for each wavelength or data sample. These features include data rate, fiber type, frequency, length of path, margin, measured fiber loss, measurement date, number of amplifiers in the path, number of passthrough ROADMs, optical return loss (ORL), end-of-path optical signal-to-noise ratio (OSNR), and polarization mode dispersion (PMD). We estimate the OSNR of each fiber section based on launch power, amplifier noise, and measured span loss. We then combine these fiber section estimates to estimate the end-to-end path OSNR. In cases where regeneration is needed, we treat the sections between regeneration points as separate wavelengths."
Simulations/Experiments: experiments