  • Shane McLaughlin, PhD

    Center for Automotive Safety Research

  • [Process diagram with labels: Domain Application/Understanding, Identifying Goals, Selection and Addition, Data Preparation, Data Mining, Evaluation, Interpretation of Output, Make Conclusions, Application/Deployment]

  • Are there differences in driver following behavior in urban areas during clear weather versus severe rain?
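A hedged sketch of how such a comparison might be run once following episodes have been extracted; the time-headway values below are synthetic stand-ins, not data from the talk:

```python
# Compare time headway (s) in urban following episodes, clear vs. rain.
# Synthetic placeholder samples; real values would come from mined events.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
headway_clear = rng.normal(1.8, 0.5, 500)   # clear-weather episodes
headway_rain = rng.normal(2.1, 0.6, 300)    # severe-rain episodes

# Welch's t-test tolerates unequal variances and sample sizes.
t, p = stats.ttest_ind(headway_clear, headway_rain, equal_var=False)
print(f"Welch's t = {t:.2f}, p = {p:.4f}")
```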

  • Acquiring Samples
    • Understanding the data
    • Explore
    • Evaluate quality
    • Select interesting subsets
    • Plan integration of datasets
    • Selecting fields/attributes
    • Sampling design (a selection sketch follows this slide)

    [Figure labels: Speed, Radar, Latitude, Urban Areas, Precipitation, Time, Date, Demographics, Vehicle Type]
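A minimal sketch of subset selection and a simple sampling design, assuming hypothetical trip-summary fields named after the labels above (speed_mph, urban, precip, vehicle_type):

```python
# Select an interesting subset and apply a simple sampling design.
# The DataFrame and its field names are hypothetical stand-ins.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
trips = pd.DataFrame({
    "speed_mph": rng.uniform(5, 75, 1000),
    "urban": rng.choice([True, False], 1000),
    "precip": rng.choice(["clear", "rain"], 1000, p=[0.8, 0.2]),
    "vehicle_type": rng.choice(["car", "suv", "truck"], 1000),
})

# Select the interesting subset: urban driving episodes only.
subset = trips[trips["urban"]]

# Sampling design: draw equally per precipitation level so the rarer
# rain condition is not swamped by clear-weather data.
sample = subset.groupby("precip").sample(n=50, random_state=0)
print(sample["precip"].value_counts())
```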

  • Organizing
    – Accumulating files
    – Domain-specific applications
    – Connections to large datasets
    – Definitions, units, sign, coding

  • Storage/processing strategy
    – RAM vs. reduced for later
    – Flat table, mixed format, relational
    – Read/write speeds, subsequent analysis

  • Transforming
    – Format, creating composite variables, separating

  • Cleaning (sketched below)
    – Missing values, noise, outliers, incorrect values

  • Prepare data set from raw for use in all subsequent stages
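A minimal cleaning sketch for one hypothetical speed channel; the plausibility limits and window size are illustrative assumptions, not values from the talk:

```python
# Clean a speed channel: missing values, incorrect values, noise/outliers.
import numpy as np
import pandas as pd

speed = pd.Series([12.1, 12.3, np.nan, 12.8, 250.0, 13.0, -4.0, 13.2])

speed = speed.interpolate()                      # fill short gaps (missing values)
speed[(speed < 0) | (speed > 120)] = np.nan      # incorrect values -> missing
speed = speed.interpolate()                      # re-fill the flagged samples
smooth = speed.rolling(3, center=True, min_periods=1).median()  # damp noise/outliers
print(smooth.round(2).tolist())
```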

  • Three DM Algorithm Components
    • Event Parsing Component
    • Crunching

  • 1. Stream processing
    – Numerical methods
    – Filters
    – Splines
    – FFTs

    2. Event parsing
    – Triggers: boolean logic, thresholds and combinations
    – Algorithms
      • Custom scenario recognition code
      • Kinematic models
      • Neural nets
      • Machine vision

    3. Descriptive data capture (IVs and DVs)
    – Within-event counts, summaries, etc. (steering reversals)
    – Aggregation, trends, descriptive statistics (max, mean, dominant frequencies)
    – Classification (lead vehicle braking, intersection turn)
    – References used for subsequent stages (Target ID, road segment)
    – Temporal landmarks within data (sync of max brake, sync of glance up)

    (A sketch of all three components follows.)
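A compact sketch of the three components on a synthetic 10 Hz speed stream; the signal, filter width, and 0.3 g braking threshold are assumptions for illustration:

```python
# Stream processing -> event parsing -> descriptive capture, end to end.
import numpy as np

rng = np.random.default_rng(2)
hz = 10
speed = 20 + np.cumsum(rng.normal(0, 0.05, 600))  # m/s, gentle drift
drop = np.zeros(600)
drop[300:320] = np.linspace(0, 8, 20)             # inject a hard-braking ramp
drop[320:] = 8
speed = speed - drop

# 1. Stream processing: moving-average filter, then differentiate.
padded = np.pad(speed, 2, mode="edge")
smooth = np.convolve(padded, np.ones(5) / 5, mode="valid")
accel = np.gradient(smooth) * hz                  # m/s^2

# 2. Event parsing: boolean threshold trigger on deceleration.
trigger = accel < -0.3 * 9.81                     # assumed 0.3 g threshold
starts = np.flatnonzero(np.diff(trigger.astype(int)) == 1) + 1

# 3. Descriptive capture: one summary (DV) per parsed event.
for s in starts:
    offs = np.flatnonzero(~trigger[s:])
    e = s + (offs[0] if offs.size else trigger.size - s)
    print(f"event at t={s/hz:.1f} s: peak decel {accel[s:e].min():.2f} m/s^2")
```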

  • [Diagram: Raw Data → Data Preparation → Training Set → Model 0 → Validation Set → Model 1 → Test Set (unseen data), with video reduction feeding each stage]

    Generalize from a sample in a way that will identify a broad range.

    Tune: make decisions about narrowing, redirecting, or adding.
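A hedged sketch of the train/validate/test flow in the diagram, using made-up data and an arbitrary model in place of the video-reduction pipeline:

```python
# Fit Model 0 on the training set, tune against the validation set to get
# Model 1, and score once on the held-out test set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_hold, y_hold, test_size=0.5, random_state=0)

best = None
for depth in (2, 4, 8):                  # "tune": narrow, redirect, or add
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    score = model.fit(X_train, y_train).score(X_val, y_val)
    if best is None or score > best[0]:
        best = (score, model)

print(f"validation: {best[0]:.3f}  test (unseen): {best[1].score(X_test, y_test):.3f}")
```

The test score is read once, on data no tuning decision was based on, which is what keeps it an honest estimate.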

  • Confusion matrix for detecting "urban following" vs. "something else":

                                 Predicted:                 Predicted:
                                 Urban Following            Something else
    Actual: Urban Following      True Positives (Hits)      False Negatives (Misses, Type II)
    Actual: Something else       False Positives            True Negatives (Correct Rejections)
                                 (False Alarms, Type I)

    Sensitivity = TP/(TP+FN): the method finds x% of true events.
    Specificity = TN/(TN+FP): x% correct saying something is not of interest.
    Positive Predictive Value = TP/(TP+FP): strength of confirming a true indication.
    Negative Predictive Value = TN/(TN+FN): strength of confirming a false indication.

    (These are computed in the sketch below.)
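A small sketch computing the four quantities above from illustrative counts:

```python
# Confusion-matrix summary statistics; counts are made up for illustration.
tp, fn, fp, tn = 80, 20, 15, 885   # hits, misses, false alarms, correct rejections

sensitivity = tp / (tp + fn)   # share of true events the method finds
specificity = tn / (tn + fp)   # correct at saying "not of interest"
ppv = tp / (tp + fp)           # strength of confirming a true indication
npv = tn / (tn + fn)           # strength of confirming a false indication
print(f"Se={sensitivity:.2f}  Sp={specificity:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```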

  • [Diagram: data mining process architecture. The processing chain runs Stream Processing → Event Parsing → Event Processing → Capture of IVs and DVs, with Event Counting, Event Description, Exposure Computation, and Exposure Variable Storage alongside; exposure is counted when data are successfully processed. Supporting functions: Process Management (Process Tracking, Interruption Recovery, Data Recovery, Sampling Control, Metadata, Data set Integration, Data Addressing) and a Success/Failure Monitor.]
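A hedged sketch of the process-management ideas in the diagram (process tracking, interruption recovery, success/failure monitoring); the checkpoint-file layout and file names are assumptions, and a real pipeline might use a database instead:

```python
# Track per-file success/failure and resume after an interruption.
import json
import os

CHECKPOINT = "progress.json"

def load_state():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"done": [], "failed": []}

def run(files, process):
    state = load_state()
    for name in files:
        if name in state["done"]:
            continue                       # interruption recovery: skip finished work
        try:
            process(name)
            state["done"].append(name)     # success
        except Exception:
            state["failed"].append(name)   # failure, recorded for follow-up
        with open(CHECKPOINT, "w") as f:   # persist progress after every file
            json.dump(state, f)
    return state

print(run(["trip_001.dat", "trip_002.dat"], lambda name: None))
```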

  • [Process diagram repeated from the opening: Domain Application/Understanding, Identifying Goals, Selection and Addition, Data Preparation, Data Mining, Evaluation, Interpretation of Output, Make Conclusions, Application/Deployment]

  • Not familiarizing with the domain and details of the data
    – Faulty from the start
    – Embedding assumptions early: too narrow

  • Starting analysis before the data is clean
    – If detected, rework
    – If not detected, faulty conclusions
    – Data versioning difficulty

  • Not designing a DM sampling strategy and monitoring successes
    – Sampling bias
    – Incorrect exposure estimates
    – Insufficient data

  • Evaluating on the same data used for developing a model
    – Optimistic estimates of performance (demonstrated below)
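A short sketch demonstrating the last pitfall: on pure noise, a flexible model scores perfectly on its own training data, while cross-validation shows chance-level performance (data and model are illustrative):

```python
# Same-data evaluation vs. cross-validation on labels with no real signal.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 10))
y = rng.integers(0, 2, 300)              # labels with no real signal

model = DecisionTreeClassifier(random_state=0).fit(X, y)
print(f"same-data accuracy: {model.score(X, y):.2f}")                          # ~1.00
print(f"cross-validated:    {cross_val_score(model, X, y, cv=5).mean():.2f}")  # ~0.50
```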

  • [Plot: Error vs. Model Complexity, with underfit and overfit regions marked]

  • Mined events

    Adjustment: in a random sample, 31% were found to be false positives.

  • Stratified Evaluation Approach

    Bias present in the proportion of valid events across the variable of interest.

  • Mined events

    Adjustment: random sample, 31% found to be false positives; correcting for bias in the data mining code (sketched below).
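A small sketch of this adjustment: estimate false-positive rates from reviewed random samples, per stratum, and correct the mined counts. The counts are made up, chosen so the overall rate lands near the 31% quoted on the slide:

```python
# Stratified false-positive adjustment of mined event counts.
mined = {"low speed": 400, "high speed": 600}       # mined events per stratum
fp_rate = {"low speed": 0.45, "high speed": 0.22}   # FP rate from reviewed samples

overall = sum(mined[s] * fp_rate[s] for s in mined) / sum(mined.values())
print(f"overall false-positive rate: {overall:.0%}")        # ~31%

adjusted = {s: round(mined[s] * (1 - fp_rate[s])) for s in mined}
print("adjusted valid-event counts:", adjusted)             # bias-corrected per stratum
```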

  • [Plot: Speed; remaining figure content not recoverable]

  • [Process diagram repeated: Domain Application/Understanding, Identifying Goals, Selection and Addition, Data Preparation, Data Mining, Evaluation, Interpretation of Output, Make Conclusions, Application/Deployment]

  • Larose, D. T. (2005). Discovering Knowledge in Data: An Introduction to Data Mining. John Wiley & Sons, Hoboken, NJ.

  • Maimon, O., Rokach, L., Eds. (2005). Data Mining and Knowledge Discovery Handbook. Springer, New York, NY.

  • Witten, I., Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. Elsevier, San Francisco, CA.

  • http://en.wikipedia.org/wiki/Sensitivity_(tests)
  • http://www.sigkdd.org/
  • http://www.kdnuggets.com