Loss Function - Quick Notes
Dragan Samardzija, January 2020

Transcript
  • Loss Function - Quick Notes

    Dragan Samardzija

    January 2020


  • References

    1. Wikipedia

    2. Data Science: Deep Learning in Python

    https://www.youtube.com/watch?v=XeQBsidyhWE

    3. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow

    https://www.youtube.com/watch?v=ErfnhcEV1O8

    4. Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville


  • Likelihood Interpretation

    Information Theory Interpretation


  • Square Error Loss Function – Minimize

  • Likelihood – Gaussian Assumption – Maximize

    The same answer, since log() is a monotonically increasing function; a short sketch follows below.
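
    A minimal sketch of this equivalence (the notation y_n, x_n, w, f and the Gaussian noise model are illustrative assumptions, not taken from the slides): suppose y_n = f(x_n; w) + e_n with i.i.d. noise e_n ~ N(0, sigma^2), so that

    \[
    \mathcal{L}(w) = \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\sigma^{2}}}
    \exp\!\left(-\frac{\bigl(y_{n} - f(x_{n}; w)\bigr)^{2}}{2\sigma^{2}}\right),
    \qquad
    \log \mathcal{L}(w) = -\frac{N}{2}\log\bigl(2\pi\sigma^{2}\bigr)
    - \frac{1}{2\sigma^{2}}\sum_{n=1}^{N}\bigl(y_{n} - f(x_{n}; w)\bigr)^{2}.
    \]

    The only term that depends on w is the negative sum of squared errors, so maximizing the Gaussian likelihood and minimizing the square error loss select the same w.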

  • Cross Entropy Loss Function – Binary Classification – Minimize

  • Likelihood – Maximize

    Again the same answer, since log() is a monotonically increasing function; see the sketch below.
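
    A minimal sketch of the binary case (labels t_n in {0, 1} and model outputs p_n = P(t_n = 1 | x_n; w) are assumed notation, not taken from the slides): the Bernoulli likelihood and its negative logarithm are

    \[
    \mathcal{L}(w) = \prod_{n=1}^{N} p_{n}^{\,t_{n}}\bigl(1 - p_{n}\bigr)^{1 - t_{n}},
    \qquad
    -\log \mathcal{L}(w) = -\sum_{n=1}^{N}\Bigl[\, t_{n}\log p_{n} + \bigl(1 - t_{n}\bigr)\log\bigl(1 - p_{n}\bigr) \Bigr],
    \]

    and the right-hand side is exactly the binary cross entropy loss, so minimizing it maximizes the likelihood.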

  • Illustration

  • Likelihood Interpretation

    Information Theory Interpretation


  • Number of Bits Needed to Encode

    • Information entropy is the average bit rate at which information is produced by a stochastic source of data.

    (Pictured: Claude Shannon and Ludwig Boltzmann)
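
    As a standard illustration (the formula and the coin example are not taken from the slides), the entropy of a discrete distribution p, measured in bits, is

    \[
    H(p) = -\sum_{x} p(x)\,\log_{2} p(x).
    \]

    A fair coin gives H = 1 bit per symbol, while a biased coin with p = (0.9, 0.1) gives H ≈ 0.47 bits, so an optimal code needs fewer bits per symbol on average.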

  • Number of Bits when Mismatched

    • Cross entropy between two probability distributions p and q measures the average number of bits needed to identify an event drawn from the set when the coding scheme used for the set is optimized for an estimated probability distribution q, rather than for the true distribution p.

    • Minimal cross entropy is achieved when the p and q distributions are identical, i.e., when cross entropy becomes entropy.
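
    A standard way to state both points (notation assumed, not taken from the slides): encoding a source with true distribution p using a code optimized for q costs, on average,

    \[
    H(p, q) = -\sum_{x} p(x)\,\log_{2} q(x) = H(p) + D_{\mathrm{KL}}(p \,\|\, q) \;\ge\; H(p),
    \]

    with equality if and only if q = p, because the Kullback–Leibler divergence is non-negative and zero only for identical distributions. For example, with p = (0.9, 0.1) and q = (0.5, 0.5), H(p, q) = 1 bit versus H(p) ≈ 0.47 bits.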