Shazia Sadiq

Apr 05, 2018

Ark Group
Transcript
  • 8/2/2019 Shazia Sadiq

    1/31

    The University of Queensland

  • 8/2/2019 Shazia Sadiq

    2/31

    This talk is based on your data

    conducted by researchers

    ,

    participated in a survey in

    to identify key data and

    by industry

  • 8/2/2019 Shazia Sadiq

    3/31

    The University of Queensland (UQ) is one of Australia's pr[...] learning and research institutions. It has produced almost 197[...] graduates since opening in 1911 that have become leaders in [...] areas of society and industry.

    I work in the Data and Knowledge Engineering research [...]

    Big Data Management, Data Mining and Analytics

    Spatio-temporal, Multimedia, Text/Web, Data Streams

    Information Modelling and Semantics

  • 8/2/2019 Shazia Sadiq

    4/31

    Background and Rationale

    Research and Industry Innovations constitute a large body of knowledge, which is diversified, application specific, and cross-disciplinary

    [...]re but also its boundaries (Benbasat and Zmud, 2003)

    [...]eived lack of synergy between research community and Industry

  • 8/2/2019 Shazia Sadiq

    5/31

    Overall Objective of the Research

    Identify the key concepts/themes developed by the DQ research community over the past 20 years.
    Shazia Sadiq, Naiem Khodabandehloo Yeganeh and Marta Indulska. 20 years of data quality research: themes, trends and synergies. In Proceedings of: The 22nd Australasian Database Conference (ADC 2011), Perth, Australia, (1-10). 17-20 January 2011.

    Obtain industry feedback on these key concepts and enlighten the research community on adjusting the future research directions. Conducted at DQAsiaPacific 2011.
    Shazia Sadiq, Vimukthi Jayawardene, Marta Indulska. Research and Industry Synergies in Data Quality Management. International Conference on Information Quality (ICIQ2011), Adelaide, Australia, 18-2[...] November, 2011.

    Identify the key capability areas which contribute most towards improved data quality and enlighten the industry practitioners [...]

  • 8/2/2019 Shazia Sadiq

    6/31

    Study Methodology

    The study incorporates two separate components.

    Literature analysis: [...] the research community.

    Practitioner survey: evaluate the importance of these concepts/themes from the practitioner point of view, along with their implementation challenges.

  • 8/2/2019 Shazia Sadiq

    7/31

    Literature Analysis

    Conceptual analysis approach

    Selection of publication outlets by discipline rankings

    Over 30,000 publications (19[...]

    Relevance scanning

    Multiple levels of keywords

    Consideration of synonyms

    Two rounds of paper identification

    Full text content analysis

  • 8/2/2019 Shazia Sadiq

    8/31

    Literature Analysis

    Columns: Includes; Total; Data/Informati[...]

    [...] Conferences (Total 7535; Data/Informati[...] 476): BPM, CAiSE (Workshops), CIKM, DASFAA, ECOOP, EDBT, PODS, SIGIR, SIGMOD, VLDB, WIDM, WISE

    [...] Conferences (Total 13256; Data/Informati[...] 651): ACIS, AMCIS, CAiSE, ECIS, E[...], HICSS, ICIQ, ICIS, IFIP, IRMA, IS Foundations, PACIS

    [...] Journals (Total 8417; Data/Informati[...] 93): TODS, TOIS, CACM, DKE, DSS, ISJ (Elsevier), JDM, TKDE, VLDB Journal

    [...] Journals (Total 2493; Data/Informati[...] 144): BPM, CAIS, EJIS, Information and Management, ISF, ISJ (Black-well), ISJ (Sarasota), JAIS, JISR, [...]

    Number of publications considered -> [...]1701 [...]
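The totals in the outlet table can be cross-checked with a few lines of arithmetic. The sketch below only adds up the four row counts that survive in the transcript; the group labels are placeholders (the real row labels are cut off), and reading the second column as the number of data/information quality papers is an assumption based on the truncated header.

```python
# Counts copied from the outlet table: (publications scanned, papers in the
# second column). Group names are placeholders, not the original labels.
outlet_groups = {
    "conferences, first row": (7535, 476),
    "conferences, second row": (13256, 651),
    "journals, first row": (8417, 93),
    "journals, second row": (2493, 144),
}

total_scanned = sum(scanned for scanned, _ in outlet_groups.values())
total_relevant = sum(relevant for _, relevant in outlet_groups.values())

# Share of scanned publications that fall in the second column, in percent.
share_relevant = 100 * total_relevant / total_scanned

print(total_scanned, total_relevant, round(share_relevant, 1))
```

The first total comes to 31701, which agrees with both the "Over 30,000 publications" figure on the previous slide and the trailing digits "1701" that survive in the last row.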

  • 8/2/2019 Shazia Sadiq

    9/31

    Taxonomy of DQ Areas of Study

  • 8/2/2019 Shazia Sadiq

    10/31

  • 8/2/2019 Shazia Sadiq

    11/31


  • 8/2/2019 Shazia Sadiq

    12/31

    Research Networks

  • 8/2/2019 Shazia Sadiq

    13/31

    Research Networks

  • 8/2/2019 Shazia Sadiq

    14/31

    [...]r understanding of core of data quality research as well as synerg[...] [...]of between multiple communities contributing to data quality solutions

    Business Analysts, who focus on organizational solutions, that is the development of data quality objectives for the organization and strategies to establish the people, processes, and standards required to manage and ensure the data quality objectives are met

    Solution Architects, working on architectural solutions, that is the technology landscape required to deploy developed data quality management processes, standards and policies

    Database Experts and statisticians, contributing to computational solutions, that is ef[...] and efficient IT tools & computational techniques required to meet data quality objectives [...] semantic integrity constraints, and information trust and credibility

  • 8/2/2019 Shazia Sadiq

    15/31

    Study Methodology

    The study incorporates two separate components.

    Literature analysis: [...] the research community.

    Practitioner survey: evaluate the importance of these concepts/themes from the practitioner point of view.

  • 8/2/2019 Shazia Sadiq

    16/31

    Practitioner Survey (Design)

    Grouping the keywords identified in the taxonomy, we recognized [...] research themes (Data Quality Factors).

    Data Quality Assessment (statistical profiling, error detection, metrics, cost estimation methods)

    Data Quality Frameworks (governance, benchmarking, best practices, standards)

    Data Modelling and Design (schema quality, documentation/meta-data, managing legacy systems)

    Data Integration and Linkage (schema matching, duplicate detection/entity resolution, use of master [...]ent formats, ETL/Data Warehousing)

    Data Constraints and Rules (business rules, data standards, key/id management)

    Data Lineage (provenance, data tracking, source attribution, ownership)

    Data Acquisition and Presentation (data interfaces, data entry, data collection/upload e.g. sensor & R[...]media data)

    [...]vey questionnaire was designed based on the above themes

  • 8/2/2019 Shazia Sadiq

    17/31

    Practitioner Survey (Execution)

    Target audience was data quality professionals identified through various sources

    Active participation in data quality related online forums,

    Industry conferences

    Professional bodies

    More than 200 participants were reached using either a printed version of the questionnaire or through direct invitations in an online website.

    (Response rate was around 30%)

  • 8/2/2019 Shazia Sadiq

    18/31

    Analysis and Results

    Level of data quality management training possessed by the industry [...].

    General importance of the DQ factors from practitioners' point of view

    Implementation success of the DQ factors in organizations from practitioners' point of view.

    [...] statistical analysis to find out the most significant factors for data q[...]

  • 8/2/2019 Shazia Sadiq

    19/31

    Participant Demographics

    Background and demographic information about participating practitioners:

    32% of the respondents work for large or[...] (> [...] employees)

    27% of the respondents work for mediu[...]

    41% are from small sized organizations

    The average number of completed data quality projects per participant is 13.

    [...] of data quality training

    Majority of the data quality professionals have not received any formal training in d[...]

  • 8/2/2019 Shazia Sadiq

    20/31

    Sources of data quality problems

  • 8/2/2019 Shazia Sadiq

    21/31

    Importance of the DQ factors

    Data Quality concept                 Very Low   Low     Medium   High    Very High
    Data Quality Assessment              17.4%      2.2%    8.6%     19.6%   52.2%
    Data Quality Frameworks              6.5%       8.7%    10.9%    19.6%   54.3%
    Data Modelling and Design            4.4%       8.9%    20.0%    15.6%   5[...]
    Data Integration and Linkage         4.4%       0.0%    26.7%    24.4%   4[...]
    Data Constraints and Rules           4.4%       2.2%    15.6%    22.2%   55.6%
    Data Lineage                         4.7%       9.3%    18.6%    30.2%   3[...]
    Data Acquisition and Presentation    6.1%       2.0%    14.2%    20.4%   5[...]
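The last column of the importance table is cut off in the transcript, but it can be estimated from the other four. The sketch below assumes each row is a complete distribution that sums to 100% (an assumption, not something the slide states) and takes the Very High share as the remainder.

```python
# Visible percentages per factor: (Very Low, Low, Medium, High).
visible = {
    "Data Quality Assessment": (17.4, 2.2, 8.6, 19.6),
    "Data Quality Frameworks": (6.5, 8.7, 10.9, 19.6),
    "Data Modelling and Design": (4.4, 8.9, 20.0, 15.6),
    "Data Integration and Linkage": (4.4, 0.0, 26.7, 24.4),
    "Data Constraints and Rules": (4.4, 2.2, 15.6, 22.2),
    "Data Lineage": (4.7, 9.3, 18.6, 30.2),
    "Data Acquisition and Presentation": (6.1, 2.0, 14.2, 20.4),
}

# Assumption: rows sum to 100%, so the Very High share is what is left over.
implied_very_high = {
    name: round(100 - sum(shares), 1) for name, shares in visible.items()
}

for name, share in implied_very_high.items():
    print(f"{name}: {share}%")
```

For the three factors that have their own detail slides later in the deck, the remainder matches the values printed there (52.2%, 54.3% and 55.6%), which supports the assumption. The other four results all start with the digit that survives in the truncated column.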

  • 8/2/2019 Shazia Sadiq

    22/31

    Data Quality concept                 Very Poor   Low     Medium   Well    Very Well
    Data Quality Assessment              31.3%       19.5%   20.9%    17.4%   10.9%
    Data Quality Frameworks              26.1%       26.1%   23.9%    15.2%   8.7%
    Data Modelling and Design            11.1%       37.8%   28.9%    13.3%   8.9%
    Data Integration and Linkage         15.9%       38.6%   25.0%    9.1%    11.[...]
    Data Constraints and Rules           20.0%       15.6%   26.7%    31.1%   6.7%
    Data Lineage                         [...]
    Data Acquisition and Presentation    17.0%       16.0%   34.6%    24.4%   8.0%

  • 8/2/2019 Shazia Sadiq

    23/31

    Data Quality Assessment

    Data Modelling and Design

    Data Constraints & Rules

    Data Integration & Linkage

    F6-Data Lineage

    Data Acquisition & Present[...]

    Data Quality (Y)

    Correlation between each factor with Y: strong positive correlation > [...] was shown

    Fit the Multiple Linear Regression model

    VIF for each independent variable was well b[...]
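The VIF (variance inflation factor) check mentioned on this slide can be illustrated with a small sketch. For a given predictor, VIF = 1 / (1 - R^2), where R^2 comes from regressing that predictor on the remaining ones. With only two predictors, R^2 reduces to the squared Pearson correlation between them, which is the simplified case shown here. The ratings and function names are hypothetical; they are not survey data and not the authors' code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def vif_two_predictors(x1, x2):
    """VIF for the two-predictor case, where R^2 equals the squared correlation."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical 1-5 ratings of two factors by eight respondents (illustration only).
factor_a = [1, 2, 3, 4, 5, 3, 2, 4]
factor_b = [2, 1, 4, 3, 5, 3, 1, 5]

vif = vif_two_predictors(factor_a, factor_b)
print(round(vif, 2))
```

A VIF of 1 means the predictor is uncorrelated with the others; commonly quoted rules of thumb treat values above roughly 5 or 10 as a sign of problematic multicollinearity. With seven factors, as in the survey, each R^2 would come from a regression on the other six, typically computed with a statistics package.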


  • 8/2/2019 Shazia Sadiq

    24/31

    Significance of DQ Factors

    Factor                               Coefficients   Standard Error   P-value   Lower 95%   Upper 95%
    Intercept                            0.861          0.339            0.0155    0.173       [...]
    Data Quality Assessment (F1)         0.337          0.169            0.0536    -0.005      [...]
    Data Quality Framework (F2)          0.632          0.209            0.0046    0.207       [...]
    Data Modeling & Design (F3)          -0.126         0.200            0.5312    -0.532      [...]
    Data Integration & Linkage (F4)      -0.124         0.211            0.5578    -0.553      [...]
    Data Constraints & Rules (F5)        0.[...]30      0.184            0.0252    0.056       [...]
    Data Lineage (F6)                    -0.123         0.165            0.4597    -0.457      [...]
    Data Acquisition & Presentation      [...]
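The columns of the regression table are related by simple formulas, which also gives a way to sanity-check the values that survive in the transcript. The sketch below computes the t statistic (coefficient divided by standard error) and the critical t value implied by each reported lower bound, using lower = coefficient - t_crit * standard error. Only rows with a complete coefficient are included; the 5% threshold is a conventional choice assumed here, not one stated on the slide.

```python
# Rows copied from the table: (coefficient, standard error, p-value, lower 95% bound).
# F5 and the last factor are omitted because their coefficients are cut off.
rows = {
    "Intercept": (0.861, 0.339, 0.0155, 0.173),
    "F1": (0.337, 0.169, 0.0536, -0.005),
    "F2": (0.632, 0.209, 0.0046, 0.207),
    "F3": (-0.126, 0.200, 0.5312, -0.532),
    "F4": (-0.124, 0.211, 0.5578, -0.553),
    "F6": (-0.123, 0.165, 0.4597, -0.457),
}

# t statistic for the null hypothesis that the coefficient is zero.
t_stats = {name: coef / se for name, (coef, se, _, _) in rows.items()}

# Critical t value implied by the reported lower confidence bound.
implied_crit = {name: (coef - lower) / se for name, (coef, se, _, lower) in rows.items()}

# Terms whose reported p-value is below the assumed 5% threshold.
significant = [name for name, (_, _, p, _) in rows.items() if p < 0.05]

for name in rows:
    print(f"{name}: t = {t_stats[name]:.2f}, implied critical t = {implied_crit[name]:.2f}")
print("p < 0.05:", significant)
```

Every row implies a critical value of about 2.03, so the reported standard errors and lower bounds are mutually consistent. Among the complete rows only the intercept and F2 fall below 0.05. F5, which is left out of the code, has a reported p-value of 0.0252, and F1 sits just above the threshold at 0.0536.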

  • 8/2/2019 Shazia Sadiq

    25/31

    [...] Data Quality Frameworks [...] Data Rules and Constraints are the three top factors that [...] overall success of Data Quality Projects

  • 8/2/2019 Shazia Sadiq

    26/31

    Next steps

    Develop a deeper understanding of organizational characteristics with respect to these factors

    [...] Vimukthi Jayawardene conducting an exploratory study [...]

  • 8/2/2019 Shazia Sadiq

    27/31

    IAIDQ booth to

    iting newn s o n

    ia Pacific

    e to theneral meeting

    he founding

  • 8/2/2019 Shazia Sadiq

    28/31


  • 8/2/2019 Shazia Sadiq

    29/31

    Data Quality Assessment

                                        Very Low/Poor   Low     Medium   High    Very High/Well
    Importance                          17.4%           2.2%    8.6%     19.6%   52.2%
    How well has this been addressed    31.3%           19.5%   20.9%    17.4%   10.9%

    Over 70% indicated DQ assessment is a highly important concept.

    Less than 30% are satisfied about the initiatives taken towards a DQ assessment in their respective organizations.

    [...] based responses revealed:

    DQ assessment is still relatively a new concept to industry.

    Lack of knowledge, skills and organizational support has prevented them from [...]ng a successful approach to data quality assessment.

    [...]istent methodology for data quality assessment and addressing the root causes which [...]e poor quality data.

  • 8/2/2019 Shazia Sadiq

    30/31

    Data Quality Frameworks

                                        Very Low/Poorly   Low     Medium   High    Very High/Well
    General Importance                  6.5%              8.7%    10.9%    19.6%   54.3%
    How well has this been addressed    26.1%             26.1%   23.9%    15.2%   8.7%

    Around 75% indicated that DQ frameworks are highly important.

    Less than 25% have proper DQ frameworks in place.

    [...] based responses revealed:

    Need further guidance to resolve the conceptual level issues in deriving data quality frameworks and to address practical implementation challenges.

  • 8/2/2019 Shazia Sadiq

    31/31

    Data Constraints & Rules

                                        Very Low/Poorly   Low     Medium   High    Very High/Well
    General Importance                  4.4%              2.2%    15.6%    22.2%   55.6%
    How well has this been addressed    20.0%             15.6%   26.7%    31.1%   6.7%

    Over [...] agree on the importance of the concept.

    Around 37% are satisfied about the current implementation of the concept.

    [...] based responses revealed:

    Systems without a long term vision

    Inappropriate modelling tools
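The headline percentages on the last three slides ("over 70%", "less than 30%", "around 75%" and so on) appear to be the sum of the top two response categories. The sketch below recomputes them from the tables under that assumption; the dictionary layout and the `top_two` helper are illustrative names, not part of the original study.

```python
# Percentages copied from the three detail slides, in column order:
# (Very Low/Poor, Low, Medium, High or Well, Very High or Very Well).
factors = {
    "Data Quality Assessment": {
        "importance": (17.4, 2.2, 8.6, 19.6, 52.2),
        "addressed": (31.3, 19.5, 20.9, 17.4, 10.9),
    },
    "Data Quality Frameworks": {
        "importance": (6.5, 8.7, 10.9, 19.6, 54.3),
        "addressed": (26.1, 26.1, 23.9, 15.2, 8.7),
    },
    "Data Constraints & Rules": {
        "importance": (4.4, 2.2, 15.6, 22.2, 55.6),
        "addressed": (20.0, 15.6, 26.7, 31.1, 6.7),
    },
}


def top_two(row):
    """Share of respondents in the two highest categories, in percent."""
    return round(row[3] + row[4], 1)


summary = {
    name: (top_two(q["importance"]), top_two(q["addressed"]))
    for name, q in factors.items()
}

for name, (important, satisfied) in summary.items():
    gap = round(important - satisfied, 1)
    print(f"{name}: important {important}%, well addressed {satisfied}%, gap {gap} points")
```

Under this reading the figures come out as 71.8% and 28.3% for assessment, 73.9% and 23.9% for frameworks, and 77.8% and 37.8% for constraints and rules, which lines up with the statements on the slides and shows the gap between perceived importance and implementation for each factor.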