8/2/2019 Shazia Sadiq
University of Queensland
This talk is based on your data
conducted by researchers
participated in a survey in
to identify key data and
by industry
The University of Queensland (UQ) is one of Australia's premier learning and research institutions. It has produced almost 197... graduates since opening in 1911 that have become leaders in areas of society and industry.
I work in the Data and Knowledge Engineering research ...
Big Data Management
Data Mining and Analytics
Spatio-temporal, Multimedia, Text/Web, Data Streams
Information Modelling and Semantics
Background and Rationale
Research and Industry Innovations constitute a large body of knowledge, which is diversified, application specific, and cross-disciplinary
... core but also its boundaries (Benbasat and Zmud, 2003)
Perceived lack of synergy between research community and Industry
Overall Objective of the Research
Identify the key concepts/themes developed by DQ research community over the past 20 years.
Shazia Sadiq, Naiem Khodabandehloo Yeganeh and Marta Indulska. 20 years of data quality research: themes, trends and synergies. In Proceedings of: The 22nd Australasian Database Conference (ADC 2011) Perth, Australia, (1-10). 17-20 January 2011.
Obtain industry feedback on these key concepts and enlighten the research community on ... the future research directions.
Conducted at DQAsiaPacific 2011
Shazia Sadiq, Vimukthi Jayawardene, Marta Indulska. Research and Industry Synergies in Data Quality Management. International Conference on Information Quality (ICIQ2011), Adelaide, Australia, 18-2... November, 2011
Identify the key capability areas which contribute most towards improved data quality and enlighten the industry practitioners
Study Methodology
The study incorporates two separate components.
Literature analysis: ... the research community.
Practitioner survey: Evaluate the importance of these concepts/themes from practitioner point of view along with their implementation challenges.
Literature Analysis
Conceptual analysis approach
Selection of publication outlets by discipline rankings
Over 30,000 publications (19...
Relevance scanning
Multiple levels of keywords
Consideration of synonyms
Two rounds of paper identification
Full text content analysis
Literature Analysis
Includes                                                                Total   Data/Information Quality
Conferences  BPM, CAiSE (Workshops), CIKM, DASFAA, ECOOP, EDBT,          7535   476
             PODS, SIGIR, SIGMOD, VLDB, WIDM, WISE
Conferences  ACIS, AMCIS, CAiSE, ECIS, E..., HICSS, ICIQ,               13256   651
             ICIS, IFIP, IRMA, IS Foundations, PACIS
Journals     TODS, TOIS, CACM, DKE, DSS, ISJ (Elsevier),                 8417   93
             JDM, TKDE, VLDB Journal
Journals     BPM, CAIS, EJIS, Information and Management,                2493   144
             ISF, ISJ (Black-well), ISJ (Sarasota), JAIS, JISR, ...
Total number of publications considered ->                              31701   ...
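As a quick check on the outlet table, the counts can be summed and the share of data/information quality papers computed per outlet group. This is only an illustrative sketch: the group labels are shorthand introduced here, and the totals are derived from the four rows shown rather than quoted from the slide.

```python
# Counts copied from the outlet table:
# (total publications, data/information quality publications).
# The dictionary keys are shorthand labels introduced for this sketch.
outlet_counts = {
    "Conferences (BPM ... WISE)": (7535, 476),
    "Conferences (ACIS ... PACIS)": (13256, 651),
    "Journals (TODS ... VLDB Journal)": (8417, 93),
    "Journals (BPM ... JISR)": (2493, 144),
}

# Column totals derived from the four rows.
total_all = sum(total for total, _ in outlet_counts.values())
total_dq = sum(dq for _, dq in outlet_counts.values())

# Share of data/information quality publications per outlet group.
for group, (total, dq) in outlet_counts.items():
    print(f"{group}: {dq}/{total} = {100 * dq / total:.1f}%")
print(f"All outlets: {total_dq}/{total_all} = {100 * total_dq / total_all:.1f}%")
```

The derived total of the first column agrees with the "Over 30,000 publications" figure on the preceding slide.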
Taxonomy of DQ Areas of Study
Research Networks
... understanding of core of data quality research as well as synergies ... between multiple communities contributing to data quality solutions
Business Analysts, who focus on organizational solutions, that is the development of data quality objectives for the organization and strategies to establish people, processes, ... and standards required to manage and ensure the data quality objectives are met
Solution Architects, working on architectural solutions, that is the technology landscape required to deploy developed data quality management processes, standards and policies
Database Experts and statisticians, contributing to computational solutions, that is effective and efficient IT tools & computational techniques required to meet data quality objectives ... semantic integrity constraints, and information trust and credibility
Study Methodology
The study incorporates two separate components.
Literature analysis: ... the research community.
Practitioner survey: Evaluate the importance of these concepts/themes from practitioner point of view.
Practitioner Survey (Design)
Grouping the keywords identified in the taxonomy, we recognize the following research themes (Data Quality Factors).
Data Quality Assessment. (statistical profiling, error detection, metrics, cost estimation methods)
Data Quality Frameworks. (governance, benchmarking, best practices, standards)
Data Modelling and Design. (schema quality, documentation/meta-data, managing legacy systems)
Data Integration and Linkage. (schema matching, duplicate detection/entity resolution, use of master ..., ...ent formats, ETL/Data Warehousing)
Data Constraints and Rules. (business rules, data standards, key/id management)
Data Lineage. (provenance, data tracking, source attribution, ownership)
Data Acquisition and Presentation. (data interfaces, data entry, data collection/upload e.g. sensor & R..., ...media data)
Survey questionnaire was designed based on the above themes
Practitioner Survey (Execution)
Target audience was data quality professionals identified through various sources
Active participation in data quality related online forums,
Industry conferences
Professional bodies
... than 200 participants were reached using either printed version of the questionnaire or through direct invitations in an online website.
(Response rate was around 30%)
Analysis and Results
Level of data quality management training possessed by the industry ...
General Importance of the DQ factors from practitioners point of view.
Implementation success of the DQ factors in organizations from practitioners point of view.
Statistical analysis to find out the most significant factors for data quality ...
Participant Demographics
Background and demographic information about participating practitioners:
32% of the respondents work for large organizations (> ... employees)
27% of the respondents work for medium sized organizations ...
41% are from small sized organizations
The average number of completed data quality projects per participant is 13.
Level of data quality training
Majority of the data quality professionals have not received any formal training in data quality ...
Sources of data quality problems
Importance of the DQ factors
Data Quality concept                 Very Low   Low      Medium    High      Very High
Data Quality Assessment              17.4%      2.2%     8.6%      19.6%     52.2%
Data Quality Frameworks              6.5%       8.7%     10.9%     19.6%     54.3%
Data Modelling and Design            4.4%       8.9%     20.0%     15.6%     5...
Data Integration and Linkage         4.4%       0.0%     26.7%     24.4%     4...
Data Constraints and Rules           4.4%       2.2%     15.6%     22.2%     55.6%
Data Lineage                         4.7%       9.3%     18.6%     30.2%     3...
Data Acquisition and Presentation    6.10%      2.00%    14.20%    20.40%    5...
Data Quality concept                 Very Poor   Low       Medium    Well      Very Well
Data Quality Assessment              31.3%       19.5%     20.9%     17.4%     10.9%
Data Quality Frameworks              26.1%       26.1%     23.9%     15.2%     8.7%
Data Modelling and Design            11.1%       37.8%     28.9%     13.3%     8.9%
Data Integration and Linkage         15.9%       38.6%     25.0%     9.1%      11...
Data Constraints and Rules           20.0%       15.6%     26.7%     31.1%     6.7%
Data Lineage                         ...         ...       ...       ...       ...
Data Acquisition and Presentation    17.0...     16.00%    34.60%    24.40%    8.0...
[Figure: factors Data Quality Assessment, Data Modelling and Design, Data Constraints & Rules, Data Integration & Linkage, F6-Data Lineage, Data Acquisition & Presentation, each linked to Data Quality (Y)]
Correlation between each factor with Y: strong positive correlation > ... was shown
Fit the Multiple Linear Regression model
VIF for each independent variable was well below ...
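The slide does not say how the VIF values were obtained. For reference, the usual textbook definition of the variance inflation factor for the j-th independent variable is given below; the symbols are notation introduced here, not taken from the slides.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

Here $R_j^2$ is the coefficient of determination obtained when factor $F_j$ is regressed on all the other factors. A value close to 1 indicates that $F_j$ is nearly uncorrelated with the remaining factors.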
Significance of DQ Factors
                                   Coefficients   Standard Error   P-value   Lower 95%   Upper 95%
Intercept                          0.861          0.339            0.0155    0.173       ...
Assessment (F1)                    0.337          0.169            0.0536    -0.005      ...
Framework (F2)                     0.632          0.209            0.0046    0.207       ...
Modeling & Design (F3)             -0.126         0.200            0.5312    -0.532      ...
Integration & Linkage (F4)         -0.124         0.211            0.5578    -0.553      ...
Constraints & Rules (F5)           0.?30          0.184            0.0252    0.056       ...
Lineage (F6)                       -0.123         0.165            0.4597    -0.457      ...
Acquisition & Presentation (F7)    ...            ...              ...       ...         ...
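One way to read the regression table is to pick out the terms whose p-value falls below a chosen significance level. The sketch below uses the p-values as they appear in the table. The 0.05 and 0.10 thresholds and the helper name `significant` are assumptions made for illustration, and the F7 row is omitted because its values are unreadable in the source.

```python
# P-values copied from the regression table.
# The Acquisition & Presentation (F7) row is unreadable in the source and is left out.
p_values = {
    "Intercept": 0.0155,
    "Assessment (F1)": 0.0536,
    "Framework (F2)": 0.0046,
    "Modeling & Design (F3)": 0.5312,
    "Integration & Linkage (F4)": 0.5578,
    "Constraints & Rules (F5)": 0.0252,
    "Lineage (F6)": 0.4597,
}


def significant(p_values, alpha):
    """Return the labels whose p-value is strictly below alpha (hypothetical helper)."""
    return [name for name, p in p_values.items() if p < alpha]


# Thresholds are illustrative assumptions, not stated on the slide.
at_5_percent = significant(p_values, 0.05)
at_10_percent = significant(p_values, 0.10)

print("p < 0.05:", at_5_percent)
print("p < 0.10:", at_10_percent)
```

Under these assumed thresholds, Framework (F2) and Constraints & Rules (F5) pass at 0.05, while Assessment (F1) only passes at 0.10.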
Data Quality Frameworks, Data Rules and Constraints ... are the three top factors that ... overall success of Data Quality Projects
Next steps
Develop a deeper understanding of organizational characteristics with respect to these factors
... Vimukthi Jayawardene ... conducting an exploratory study
w.jayawarden...
+ 61 045... 20 3719
IAIDQ booth to
iting newn s o n
ia Pacific
e to theneral meeting
he founding
Data Quality Assessment
                                    Very Low/Poor   Low      Medium   High     Very High/Well
Importance                          17.4%           2.2%     8.6%     19.6%    52.2%
How well has this been addressed    31.3%           19.5%    20.9%    17.4%    10.9%
Over 70% indicated DQ assessment is a highly important concept.
Less than 30% are satisfied about the initiatives taken towards a DQ assessment in their respective organizations.
... based responses revealed:
DQ assessment is still relatively a new concept to ... industry.
Lack of knowledge, skills and organizational support has prevented them from ... a successful approach to data quality assessment.
...istent methodology for data quality assessment and addressing the root causes which ... poor quality data.
Data Quality Frameworks
                                    Very Low/Poorly   Low      Medium   High     Very High/Well
General Importance                  6.5%              8.7%     10.9%    19.6%    54.3%
How well has this been addressed    26.1%             26.1%    23.9%    15.2%    8.7%
Around 75% indicated that DQ frameworks are highly important.
Less than 25% have proper DQ frameworks in place.
... based responses revealed:
Need further guidance to resolve the conceptual level issues in deriving data Quality frameworks and to address practical implementation challenges.
Data Constraints & Rules
                                    Very Low/Poorly   Low      Medium   High     Very High/Well
General Importance                  4.4%              2.2%     15.6%    22.2%    55.6%
How well has this been addressed    20.0%             15.6%    26.7%    31.1%    6.7%
Over ... agree on the importance of the concept
Around 37% are satisfied about the current implementation of the concept
... based responses revealed:
Systems without a long term vision
Inappropriate modelling tools
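The headline figures on the last three slides ("over 70%", "less than 30%", "around 75%", and so on) appear to be the sum of the two highest response categories. The sketch below recomputes them from the slide tables under that assumption; the helper name `top_two` and the dictionary layout are introduced here for illustration only.

```python
# Percentages copied from the three factor slides, in column order:
# Very Low/Poor, Low, Medium, High, Very High/Well.
factors = {
    "Data Quality Assessment": {
        "importance": [17.4, 2.2, 8.6, 19.6, 52.2],
        "addressed": [31.3, 19.5, 20.9, 17.4, 10.9],
    },
    "Data Quality Frameworks": {
        "importance": [6.5, 8.7, 10.9, 19.6, 54.3],
        "addressed": [26.1, 26.1, 23.9, 15.2, 8.7],
    },
    "Data Constraints & Rules": {
        "importance": [4.4, 2.2, 15.6, 22.2, 55.6],
        "addressed": [20.0, 15.6, 26.7, 31.1, 6.7],
    },
}


def top_two(shares):
    """Sum of the two highest categories, rounded to one decimal (hypothetical helper)."""
    return round(shares[-2] + shares[-1], 1)


# Map each factor to (share rating it important, share rating it well addressed).
summary = {
    name: (top_two(rows["importance"]), top_two(rows["addressed"]))
    for name, rows in factors.items()
}

for name, (important, addressed) in summary.items():
    print(f"{name}: important {important}%, well addressed {addressed}%")
```

The gap between the two numbers for each factor is the point the slides make: each concept is rated important by well over two thirds of respondents, while far fewer consider it well addressed.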