Loss Function - Quick Notes
Dragan Samardzija, January 2020

Transcript
  • Loss Function - Quick Notes

    Dragan Samardzija

    January 2020


  • References

    1. Wikipedia

    2. Data Science: Deep Learning in Python

    https://www.youtube.com/watch?v=XeQBsidyhWE

    3. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow

    https://www.youtube.com/watch?v=ErfnhcEV1O8

    4. Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville


  • Likelihood Interpretation

    Information Theory Interpretation


  • Square Error Loss Function – Minimize

  • Likelihood – Gaussian Assumption – Maximize

    The same answer, since log() is a monotonically increasing function; a short sketch follows below.
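
    A minimal sketch of this equivalence (the notation y_n, x_n, w, f and the Gaussian noise model are illustrative assumptions, not taken from the slides): suppose y_n = f(x_n; w) + e_n with i.i.d. noise e_n ~ N(0, sigma^2), so that

    \[
    \mathcal{L}(w) = \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\sigma^{2}}}
    \exp\!\left(-\frac{\bigl(y_{n} - f(x_{n}; w)\bigr)^{2}}{2\sigma^{2}}\right),
    \qquad
    \log \mathcal{L}(w) = -\frac{N}{2}\log\bigl(2\pi\sigma^{2}\bigr)
    - \frac{1}{2\sigma^{2}}\sum_{n=1}^{N}\bigl(y_{n} - f(x_{n}; w)\bigr)^{2}.
    \]

    The only term that depends on w is the negative sum of squared errors, so maximizing the Gaussian likelihood and minimizing the square error loss select the same w.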

  • Cross Entropy Loss Function – Binary Classification – Minimize

  • Likelihood – Maximize

    Again the same answer, since log() is a monotonically increasing function; see the sketch below.
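
    A minimal sketch of the binary case (labels t_n in {0, 1} and model outputs p_n = P(t_n = 1 | x_n; w) are assumed notation, not taken from the slides): the Bernoulli likelihood and its negative logarithm are

    \[
    \mathcal{L}(w) = \prod_{n=1}^{N} p_{n}^{\,t_{n}}\bigl(1 - p_{n}\bigr)^{1 - t_{n}},
    \qquad
    -\log \mathcal{L}(w) = -\sum_{n=1}^{N}\Bigl[\, t_{n}\log p_{n} + \bigl(1 - t_{n}\bigr)\log\bigl(1 - p_{n}\bigr) \Bigr],
    \]

    and the right-hand side is exactly the binary cross entropy loss, so minimizing it maximizes the likelihood.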

  • Illustration

  • Likelihood Interpretation

    Information Theory Interpretation


  • Number of Bits Needed to Encode

    • Information entropy is the average bit rate at which information is produced by a stochastic source of data.

    (Pictured: Claude Shannon and Ludwig Boltzmann)
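
    As a standard illustration (the formula and the coin example are not taken from the slides), the entropy of a discrete distribution p, measured in bits, is

    \[
    H(p) = -\sum_{x} p(x)\,\log_{2} p(x).
    \]

    A fair coin gives H = 1 bit per symbol, while a biased coin with p = (0.9, 0.1) gives H ≈ 0.47 bits, so an optimal code needs fewer bits per symbol on average.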

  • Number of Bits when Mismatched

    • Cross entropy between two probability distributions p and q measures the average number of bits needed to identify an event drawn from the set when the coding scheme used for the set is optimized for an estimated probability distribution q, rather than for the true distribution p.

    • Minimal cross entropy is achieved when the p and q distributions are identical, i.e., when cross entropy becomes entropy.
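
    A standard way to state both points (notation assumed, not taken from the slides): encoding a source with true distribution p using a code optimized for q costs, on average,

    \[
    H(p, q) = -\sum_{x} p(x)\,\log_{2} q(x) = H(p) + D_{\mathrm{KL}}(p \,\|\, q) \;\ge\; H(p),
    \]

    with equality if and only if q = p, because the Kullback–Leibler divergence is non-negative and zero only for identical distributions. For example, with p = (0.9, 0.1) and q = (0.5, 0.5), H(p, q) = 1 bit versus H(p) ≈ 0.47 bits.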