The University of Queensland Faculty of Business, Economics & Law Department of Commerce Information Request Ambiguity and End User Query Performance: Theory and Empirical Evidence A Thesis submitted to the Department of Commerce, the University of Queensland, in partial fulfilment of the requirements for the degree of Master of Information Systems. By Micheal Axelsen 15th June 2000 Supervisor: Dr Paul Bowen
The University of Queensland
Faculty of Business, Economics & Law
Department of Commerce
Information Request Ambiguity and End User Query
Performance: Theory and Empirical Evidence
A Thesis submitted to the Department of Commerce, the University of Queensland, in partial fulfilment of the requirements for the degree of
Master of Information Systems.
By Micheal Axelsen
15th June 2000
Supervisor: Dr Paul Bowen
Acknowledgments
I wish to express my appreciation and thanks to my supervisor, Dr Paul Bowen, for his
assistance, advice, and patience in the preparation of this thesis. To my mother I offer thanks
for making it all possible. I also express sincere gratitude to my wife, Leeanne Klan, whose
obstinate patience continues to assist in putting the world in focus.
I also thank workshop participants at Nanyang Technological University in Singapore for
their comments and contributions to this thesis.
Abstract
The increasing reliance of organisations on information technology and the persistent
shortage of IT/IS professionals requires end users to satisfy many information requests by
querying complex information systems. Because many business decisions are now based on
the results of the end users' queries, information request ambiguity has extensive
ramifications for business practices. Where the queries do not match the requirements of the
information requests, the business decisions are likely to be fundamentally flawed.
This paper develops a theory of ambiguity in information requests and reports the results of
an initial empirical investigation of that theory. The theory identifies seven ambiguities:
lexical, syntactical, inflective, pragmatic, extraneous, emphatic, and suggestive. A laboratory
experiment with sixty-six participants was used to investigate the empirical effect of
ambiguity on end user query performance. End user query performance was measured by the
number of total errors in the proposed solution, the time taken to complete the solution, and
the end user's confidence in the solution.
The results indicate that ambiguity significantly degrades end user query performance. The
seven types of ambiguity were analysed to determine their individual effects on end user
query performance. Actual (pragmatic, extraneous) and imaginary (emphatic, suggestive)
ambiguities show significant relationships with total errors and duration. In general, potential
(lexical, syntactical, and inflective) ambiguities were not significantly associated with total
errors or end user confidence. The results should have important implications for consulting
firms, for organisations with ad hoc work groups, and for entities that make extensive use of
Appendix L: Internal Validity of the Experiment
Figures
Figure 1 Types of Ambiguity (adapted from Walton 1996)
Figure 2 The Theoretical Model of Ambiguity, Complexity, and End User Query Performance
Figure 3 Depicting graphically the relationship between the treatment received (ambiguous or clear information request) and the total errors in the participant's response.
Figure 4 Depicting graphically the relationship between the treatment received (ambiguous or clear information request) and the duration taken for the participant to prepare the response.
Figure 5 Depicting graphically the relationship between the treatment received (ambiguous or clear information request) and the participant's confidence in the response.
Tables
Table 1 Summary and Examples of the Seven Types of Ambiguity in Natural Language Information Requests
Table 2 Participant Demographic Information and Descriptive Statistics: Course Background of Group A and Group B
Table 3 Participant Demographic Information and Descriptive Statistics: Academic Record of Group A and Group B
Table 4 Participant Demographic Information and Descriptive Statistics: Participant Age in Group A and Group B
Table 5 Comparative Statistics for all Participant Responses Grouped by Question (Q) and Treatment (T). Note that for T, a = ambiguous, c = clear
Table 6 Confidence Rating Transformation to a Numerical Scale
Table 7 Ambiguity Assessment Scale for the analysis of the Seven Ambiguity Types Regression Model
Table 8 Regression Analysis Results for the General Ambiguity Regression Model
Table 9 Regression Analysis Results for the Seven Ambiguity Types Regression Model
Table 10 Summary of Analysis' Support for Hypotheses
Table 11 Participant Strata Classes
1. Introduction
Keen (1993) predicts that innovative applications of information technology will change the
competitive landscape to such an extent that fifty percent of companies in some industries
may not survive the next decade. This rise in the importance of information technology
innovation and application has led to an increased need for relevant, timely information at
the point where that information is used and understood (Conger 1994; Delligatta and
Umbaugh 1994; Nath and Lederer 1996).
The demand for information systems (IS) professionals vastly exceeds the available
supply, both now and for the foreseeable future (Freeman et al. 2000; Rosenthal and
Jategaonkar 1995; Australian Bureau of Statistics 1997). Hence, the use of computerised
information systems by end users has become compulsory in most business organisations
(Cardinali 1992; Athey and Wickham 1995-1996). To provide appropriate, relevant
information requires identifying and eliminating ambiguities in communication between the
stakeholders or managers requesting information, and the end users querying the information
systems.
Traditional structured methodologies reduce ambiguity at the expense of timeliness,
flexibility, and learning. The insights that end users can achieve during interactive, iterative
query sessions are also of benefit. The need for timeliness, flexibility, learning and end user
insights, as well as the shortage of IS professionals, has led to the general decline of
structured reports (Ryan 1993). The use of ad hoc and iterative end user reports has
increased (Tayntor 1994). Nonetheless, many end users now use more formalised processes
in developing their reports than previously (Conger 1994; Tayntor 1994).
Information request ambiguity has potentially real and large impacts on business
organisations. An ambiguous information request can result in a report that, although it
appears acceptable to the person making the information request, does not contain the desired
information. If that wrong report is then used to make business decisions that the correct
report would not have supported, then information request ambiguity can cause substantial
negative impacts.
This paper develops a theory of the impact of ambiguity in information requests on end user
query performance, and tests that theory empirically. It empirically examines the strength
and direction of the relationships between ambiguity types (lexical, syntactical, inflective,
pragmatic, extraneous, emphatic, and suggestive), complexity, and end user query
performance. The current study extends previous work (Suh and Jenkins 1992; Borthick et
al. 1997; Rho and March 1997; Borthick et al. 2000) and builds upon the theory of end users'
query performance in the tradition of Dubin (1978).
2. Information Request Ambiguity and End User Query Performance
Different forms of ambiguity can be present in a natural language information request. The
primary aim of this research is to explore the impact of ambiguity on end user query
performance. This chapter develops a theory of the relationship between information request
ambiguity and end user query performance.
2.1 A Theoretical Model of Information Request Ambiguity
The development of an accurate SQL query by an end user depends on the user's knowledge
of the information needed, the database structure, and the query language (Ogden et al. 1986).
A lack of skill in any of these three domains will lead to inaccurate SQL queries (Ogden et al.
1986).
A natural language information request requires end users to transform the natural language
constructs into the query components consisting of lexical items (Katzeff 1990). End users
must conceptualise the information requirement and then mentally map this conceptualisation
to their understanding of the database structure. Reisner (1977) proposed a template model
for the manner in which users create SQL queries from a natural language information
request. Each query's operator components (Halstead 1977) are drawn from a set of known
query language components to address the requirements of the natural language information
request.
Ambiguity affects the user's interpretation of the information needed. Because information
requests are expressed using a natural language, they are ambiguous and uncertain. End users
must interpret and analyse the information requests to develop queries that meet the
requestors' needs. The end users' uncertainty in determining the required response affects the
required cognitive effort because multiple interpretations of the actual information required
may be legitimately constructed (Almuallim et al. 1997).
The impact of natural language's seven types of ambiguity has not previously been examined
in the context of end user query performance. These seven types of ambiguity are lexical,
syntactical, inflective, pragmatic, extraneous, emphatic and suggestive (Walton 1996; Fowler
and Aaron 1998). These ambiguities affect the number of legitimate interpretations of the
natural language statement of the information request. The information request has
"multiplicity of meaning" (Walton 1996).
Tasks that are more complex require increased cognitive effort (Campbell 1988). In the
context of database queries, task complexity generally negatively impacts end user query
performance (Borthick et al. 1997; Borthick et al. 2000). Task complexity is included in this
research to control for complexity's established impact on end user query performance.
Query performance can be measured on a number of dimensions including correctness, time
required, and confidence.
Hence, the following hypotheses are proposed:
H1a: Higher ambiguity in the information request leads to an increase in the total errors
in the query formulation.
H1b: Higher ambiguity in the information request leads to an increase in the time taken
to complete the query formulation.
H1c: Higher ambiguity in the information request leads to lower end user confidence in
the accuracy of the query formulation.
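One way to read H1a to H1c together is as three linear models, one per performance measure, with task complexity entered as a control. The sketch below is an assumption made for illustration: the functional form, the variable names, and the subscript i (indexing a participant's response) are not taken from the thesis, and the specification actually estimated may differ.

```latex
\begin{align}
\text{TotalErrors}_i &= \beta_{0} + \beta_{1}\,\text{Ambiguity}_i + \beta_{2}\,\text{Complexity}_i + \varepsilon_i \\
\text{Duration}_i    &= \gamma_{0} + \gamma_{1}\,\text{Ambiguity}_i + \gamma_{2}\,\text{Complexity}_i + \eta_i \\
\text{Confidence}_i  &= \delta_{0} + \delta_{1}\,\text{Ambiguity}_i + \delta_{2}\,\text{Complexity}_i + \nu_i
\end{align}
```

Under this reading, H1a corresponds to a positive coefficient on ambiguity in the first equation, H1b to a positive coefficient in the second, and H1c to a negative coefficient in the third.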
2.2 The Nature of Ambiguity
Ambiguity is an inherent property of all natural languages, including English (Jespersen
1922; Williamson 1994). Absolute precision of a language is pragmatically undesirable,
because an absolutely precise language would be unable to adapt to new concepts (Williamson 1994). The
communication needed to ensure effective and efficient report production, however, requires
complete clarity. Hence, a tension exists between the natural language's need for flexibility
in the long term and the need for precision in the short term. Natural language is at once both
dysfunctional and poorly adapted to the functions language needs to perform, yet flexible and
broad-based such that it is useable in practice (Chomsky 1990).
Interest in linguistic ambiguity has an extensive history; ambiguity has been recognised as a
separate branch of study since at least Aristotle's time (Kooij 1971). Aristotle noted that
language must be ambiguous, as a language has limited words but an infinite number of
things and concepts to which those words must apply (Kooij 1971).
Russell (1923) recognised that all natural languages are vague and ambiguous. Excluding the
realm of mathematical symbolism, constructing completely unambiguous expressions is not
possible with the syntax and vocabulary tools available within natural languages (Williamson
1994). To endure and survive, language requires the flexibility to communicate new
concepts. Ambiguity necessarily derives from the flexibility of natural language.
Kooij (1971) states that ambiguity arises where a sentence can be interpreted in more than
one way. Similarly, Walton (1996) considers a sentence or statement to be more ambiguous
as the number of legitimate interpretations of the sentence (or paragraph) increases.
Ambiguity implies multiplicity of meaning (Walton 1996).
In classical analysis, the multiplex (Latin for "multiple meaning") categorisation of
Alexander of Aphrodisias (Hamblin 1970) suggests a basis for the identification of categories
of ambiguity. Alexander identified three categories of ambiguity: potential, actual, and
imaginary. Walton (1996) adapts this classical multiplex categorisation to his identified
types of ambiguity.
Walton (1996) identifies six classical types of ambiguity in natural language: lexical,
syntactical, inflective, pragmatic, emphatic, and suggestive. In addition to Walton's (1996)
taxonomy, extraneous information and noise in the communication can also be a source of
ambiguity. Extraneous ambiguity arises where the communication is not parsimonious, or
the communication includes information that is not directly relevant to the message being
communicated (Fowler and Aaron 1998). Extraneous ambiguity is an actual ambiguity
within the Walton (1996) taxonomy.
Each ambiguity type can be independently present within the communication. Walton's
(1996) modified taxonomy and model of ambiguity is presented in Figure 1.
[Figure: tree diagram. Ambiguity divides into three multiplex categories of ambiguity, each containing types of ambiguity: Potential (Lexical, Syntactical, Inflective); Actual (Pragmatic, Extraneous); Imaginary (Emphatic, Suggestive).]
Figure 1
Types of Ambiguity (adapted from Walton 1996)
2.2.1 Potential Ambiguity
Potential ambiguity arises when a term or a sentence is ambiguous in and of itself, that
is, before its use in the context of a sentence or paragraph. Three types of ambiguity
are categorised as potential ambiguity: lexical, syntactical, and inflective.
Lexical Ambiguity
Lexical ambiguity is the most commonly known form of ambiguity (Reilly 1991; Walton
1996). It occurs when words have more than one meaning as commonly defined and
understood. Considerable potential ambiguity arises when a word with various meanings is
used in a statement of information request. For example, "bank" may variously mean the
"bank" of a river (noun), to "bank" as related to an aeroplane or a roller-coaster (verb), a savings
"bank" (noun), to "bank" money (verb), or a "bank" of computer terminals (noun) (Turner
1987). Lexical ambiguity is often reduced or mitigated by the context of the sentence.
In the case of an information request, lexical ambiguity exists in the statement "A report of
our clients for our marketing brochure mail-out". The word "report" may have several
meanings, independent of its context. A gunshot report may echo across the hillside. A
student can report to the lecturer. A heavy report can be dropped on the foot. Although the
context may make the meaning clear, the lexical ambiguity contributes to the overall
ambiguity of the statement and increases cognitive effort.
The following hypotheses are proposed:
H2a: Higher lexical ambiguity in the information request leads to an increase in the total
errors in the query formulation.
H2b: Higher lexical ambiguity in the information request leads to an increase in the time
taken to complete the query formulation.
H2c: Higher lexical ambiguity in the information request leads to lower end user
confidence in the accuracy of the query formulation.
Syntactical Ambiguity
Syntactical ambiguity is a structural or grammatical ambiguity of a whole sentence that
occurs in a sub-part of a sentence (Reilly 1991; Walton 1996). Syntactical ambiguity is a
grammatical construct, and results from the difficulty of applying universal grammatical laws
to sentence structure. An example of syntactical ambiguity is "Bob hit the man with the
stick". This phrasing is unclear as to whether a man was hit with a stick, or whether a man
with a stick was struck by Bob. The context can substantially reduce syntactical ambiguity.
For example, knowing that either Bob, or the man, but not both, had a stick resolves the
syntactical ambiguity.
Comparing the phrase "Bob hit the man with the stick" to the analogous "Bob hit the man
with the scar" provides some insights. As a scar is little suited to physical, violent use, the
latter formulation clearly conveys that the man with the scar was struck by Bob (Kooij 1971).
In the case of an information request, syntactical ambiguity exists in the request "A report of
poor-paying clients and client managers. Determine their effect on our profitability for the
last twelve months." The request is syntactically ambiguous because the end user can
interpret "their" to mean the poor paying clients, the client managers, or both. Although the
context may reduce or negate the ambiguity, syntactically the request is ambiguous.
The following hypotheses are proposed:
H3a: Higher syntactical ambiguity in the information request leads to an increase in the
total errors in the query formulation.
H3b: Higher syntactical ambiguity in the information request leads to an increase in the
time taken to complete the query formulation.
H3c: Higher syntactical ambiguity in the information request leads to lower end user
confidence in the accuracy of the query formulation.
Inflective Ambiguity
As Walton (1996) notes, inflective ambiguity is a composite ambiguity, containing elements
of both lexical and syntactical ambiguity. Like syntactical ambiguity, inflective ambiguity is
grammatical in nature. Inflection arises where a word is used more than once in a sentence or
paragraph, but with different meanings each time (Walton 1996). An example of inflective
ambiguity is to use the word "scheme" with two different meanings in the fallacious
argument, "Bob has devised a scheme to save costs by recycling paper. Therefore, Bob is a
schemer, and should not be trusted" (Ryle 1971; Walton 1996).
In the case of an information request, inflective ambiguity exists in the example, "A report
showing the product of our marketing campaign for our accounting software product".
Ambiguity derives from using the word "product" in two different senses in the one statement
(Walton 1996; Fowler and Aaron 1998).
The following hypotheses are proposed:
H4a: Higher inflective ambiguity in the information request leads to an increase in the
total errors in the query formulation.
H4b: Higher inflective ambiguity in the information request leads to an increase in the
time taken to complete the query formulation.
H4c: Higher inflective ambiguity in the information request leads to lower end user
confidence in the accuracy of the query formulation.
2.2.2 Actual Ambiguity
Actual ambiguity refers to ambiguity that occurs in the act of speaking. It arises when a word
or phrase, without variation either in itself or in the way the word is put forward, has different
meanings. The statement does not contain adequate information to resolve the ambiguity,
resulting in a number of legitimate interpretations. Two distinct types of ambiguity are
categorised as actual ambiguity: pragmatic and extraneous.
Pragmatic Ambiguity
Pragmatic ambiguity arises when the statement is not specific, and the context does not
provide the information needed to clarify the statement. Information is missing, and must be
inferred. An example of pragmatic ambiguity is the story of King Croesus and the Oracle of
Delphi (adapted from Copi and Cohen 1990):
"King Croesus consulted the Oracle of Delphi before warring with Cyrus of
Persia. The Oracle replied that, "If Croesus went to war with Cyrus, he would
destroy a mighty kingdom". Delighted, Croesus attacked Persia, and Croesus'
army and kingdom were crushed. Croesus complained bitterly to the Oracle's
priests, who replied that the Oracle had been entirely right. By going to war with
Persia, Croesus had destroyed a mighty kingdom - his own."
In a pragmatically ambiguous statement, the information necessary to clearly understand
the message is omitted (Walton 1996). Due to the need to infer the missing
information, pragmatically ambiguous statements have multiple possible interpretations
(Walton 1996). Croesus interpreted the Oracle's statement as indicating his success in battle -
the response he desired. As noted by Hamblin (1970), Croesus' logical response to the
oracular reply would have been to immediately ask the Oracle, "Which kingdom?" Further
information is needed to resolve pragmatic ambiguity.
In the case of an information request, pragmatic ambiguity exists in the request for "A report
of all the clients for a department." The ambiguity is that the request does not refer to a
specific department. The end user could legitimately prepare a report for any department.
Further information is needed to resolve this actual ambiguity.
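As a minimal sketch of how this ambiguity reaches the query, the following Python example uses an in-memory SQLite database. The `client` table, its columns, and the sample rows are hypothetical and are not taken from the database used in this study. The point is only that each department yields a different, equally legitimate, result for the same request.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO client VALUES (?, ?)",
    [
        ("Acme Pty Ltd", "Tax and Business Services"),
        ("Bright Ideas", "Audit"),
        ("Coastal Traders", "Audit"),
    ],
)


def clients_for(department):
    """Return the client names for one specific department."""
    rows = conn.execute(
        "SELECT name FROM client WHERE department = ? ORDER BY name",
        (department,),
    )
    return [name for (name,) in rows]


# "A report of all the clients for a department" does not say which one,
# so every department in the table is a legitimate reading of the request.
departments = [
    d
    for (d,) in conn.execute(
        "SELECT DISTINCT department FROM client ORDER BY department"
    )
]
interpretations = {d: clients_for(d) for d in departments}
```

With these sample rows, `interpretations` holds one candidate report per department, and nothing in the request itself favours one over another.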
The following hypotheses are proposed:
H5a: Higher pragmatic ambiguity in the information request leads to an increase in the
total errors in the query formulation.
H5b: Higher pragmatic ambiguity in the information request leads to an increase in the
time taken to complete the query formulation.
H5c: Higher pragmatic ambiguity in the information request leads to lower end user
confidence in the accuracy of the query formulation.
Extraneous Ambiguity
In contrast to pragmatic ambiguity, in which information necessary to clearly understand the
message is omitted, extraneous ambiguity arises from an excess of information. Clearer
communication arises where the minimally sufficient words needed to convey the message of
the statement are used (Fowler and Aaron 1998). Where more words are used than
necessary, or where unnecessary detail is provided in the communication that is not part of
the message, ambiguity arises. The excess detail obscures the essential message and
contributes to different emphases or interpretations.
The use of passive voice, vacuous words, or the repetition of phrases with the same meaning
or of figures of speech adds volume to the statement, but adds little or no meaning. Pretentious
and indirect writing also adds to the bulk of the statement, but without adding meaning.
Fowler and Aaron (1998) provide the following comparative example:
Pretentious: To perpetuate our endeavour of providing funds for our elderly citizens as
we do at the present moment, we will face the exigency of enhanced
contributions from all our citizens.
Revised: We cannot continue to fund Social Security and Medicare for the elderly
unless we raise taxes.
The extra volume contributes to vagueness in the first statement, and adds to the multiplicity
of legitimate interpretations of the statement. The first statement exhibits extraneous
ambiguity. The second statement communicates forcefully and concisely.
An example of extraneous ambiguity in an information request is "A report of all clients (and
their names and addresses only) for the Tax and Business Services department. Some of
those clients are our biggest earners, you know". The last sentence is extraneous, and
contains detail that is redundant, uninformative, or misleading relative to the fundamental
message. In information theoretic terms, extraneous ambiguity is "noise" in the
communication (Axley 1984; Eisenberg and Phillips 1991; Severin and Tankard 1997).
The following hypotheses are proposed:
H6a: Higher extraneous ambiguity in the information request leads to an increase in the
total errors in the query formulation.
H6b: Higher extraneous ambiguity in the information request leads to an increase in the
time taken to complete the query formulation.
H6c: Higher extraneous ambiguity in the information request leads to lower end user
confidence in the accuracy of the query formulation.
2.2.3 Imaginary Ambiguity
Imaginary ambiguity occurs when a word with a fixed meaning seems to have a different one.
Imaginary ambiguity derives from the optional interpretation that the recipient of the
communication places on the information received. Two distinct types of ambiguity can be
categorised as imaginary ambiguity: emphatic and suggestive.
Emphatic Ambiguity
The question of ambiguity deriving from accent, or emphasis in speaking, is an ancient one
(Hamblin 1970). When a phrasing is rendered in the written form, the verbal emphasis may
only be crudely indicated. Significant meaning and context is lost. Rescher (1964) provides
the following example of emphatic ambiguity:
The intended meaning of the democratic credo "Men were created equal" can be
altered by stressing the word "created" (implying "that's how men started out, but
they are no longer so").
The verbal emphasis creates an inference of meaning that is a legitimate interpretation of the
phrasing. That is, changes in intonation can yield different interpretations.
In the case of an information request, emphatic ambiguity occurs in the example information
request of "A report of our good clients". Ambiguity can derive from placing different
emphases on the words. Depending on the context or on emphasis used, "good clients" could
be legitimately interpreted to be clients that pay on time or clients that have the highest
dollar-value sales. Indeed, with an ironic emphasis on the word "good", this request could be
interpreted as a list of our worst clients - those that do not pay. The information necessary to
resolve the ambiguity is often difficult to convey using only printed media.
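The competing readings of "good clients" can be sketched as different queries over the same data. The example below is illustrative only: the `client` table, its columns (`total_sales`, `days_overdue`), the sample rows, and the sales threshold are all assumptions introduced here, not details of the study.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE client (name TEXT, total_sales REAL, days_overdue INTEGER)"
)
conn.executemany(
    "INSERT INTO client VALUES (?, ?, ?)",
    [
        ("Acme Pty Ltd", 90000.0, 45),
        ("Bright Ideas", 12000.0, 0),
        ("Coastal Traders", 55000.0, 0),
    ],
)


def names(sql):
    """Run a query and return the client names it selects."""
    return [name for (name,) in conn.execute(sql)]


# Reading 1: "good" clients are those that pay on time.
pays_on_time = names(
    "SELECT name FROM client WHERE days_overdue = 0 ORDER BY name"
)

# Reading 2: "good" clients are those with the highest dollar-value sales
# (the threshold of 50000 is an arbitrary assumption).
high_sales = names(
    "SELECT name FROM client WHERE total_sales >= 50000 ORDER BY name"
)

# Ironic reading: "good" clients are the ones that do not pay.
does_not_pay = names(
    "SELECT name FROM client WHERE days_overdue > 0 ORDER BY name"
)
```

Each reading selects a different set of clients from the same table, so the written request alone does not determine which report is wanted.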
The following hypotheses are proposed:
H7a: Higher emphatic ambiguity in the information request leads to an increase in the
total errors in the query formulation.
H7b: Higher emphatic ambiguity in the information request leads to an increase in the
time taken to complete the query formulation.
H7c: Higher emphatic ambiguity in the information request leads to lower end user
confidence in the accuracy of the query formulation.
Suggestive Ambiguity
Despite the apparent clarity of the sentence in question, suggestive ambiguity creates diverse
implications and innuendos that can produce different interpretations (Walton 1996). Fischer
(1970) provides an example:
The First Mate of a ship docked in China returned drunk from shore leave, and
was unable to write up the ship's log. The displeased Captain completed the log,
adding, "The Mate was drunk all day". The next day, the now-sober Mate
challenged the Captain over the entry, as it would reflect poorly on him. The
Captain responded that the comment was true, and must stand. Whereupon the
mate added to that day's log, "The Captain was sober all day". In reply to the
Captain's challenge, the mate responded "the comment is true, and must stand"
(derived from Trow 1905, pp 14-15).
The phrase "The Captain was sober all day" contains suggestive ambiguity. As a further
example, the statement, "The President is now an honest man", is perfectly clear, and yet
considerable innuendo exists. The fact that the President's current honesty is worthy of
comment implies that the President was previously dishonest.
Both statements are perfectly clear and, indeed, true. Nonetheless, the fact that the
Captain's sobriety, or the President's honesty, is singled out for special comment implies
that such a state of affairs is unusual (Walton 1996). The statements are suggestively
ambiguous.
In the case of an information request, an example of this ambiguity is, "A report of the clients
of this accounting practice that have lodged taxation returns in the past five years in
accordance with the requirements of the Australian Taxation Office". The request for
information is quite clear. By definition, however, all taxation returns should be lodged in
accordance with the Australian Taxation Office's requirements. The extra phrase introduces
suggestive ambiguity into the information request by suggesting that the report will not
consist of all taxation clients, because some clients may not have complied with the Tax
Office's requirements.
The following hypotheses are proposed:
H8a: Higher suggestive ambiguity in the information request leads to an increase in the
total errors in the query formulation.
H8b: Higher suggestive ambiguity in the information request leads to an increase in the
time taken to complete the query formulation.
H8c: Higher suggestive ambiguity in the information request leads to lower end user
confidence in the accuracy of the query formulation.
2.2.4 Ambiguity in Practice
Table 1 summarises the seven types of ambiguity identified in this paper and provides an
example information request for each type.
Table 1
Summary and Examples of the Seven Types of Ambiguity in Natural Language Information Requests
Ambiguity Type: Lexical
Information Request: A report of our clients for our marketing brochure mail-out.
Explanation: The word "report" may have several meanings, independent of its context. For example, there may be: a gunshot report echoing through the hillside; the Lieutenant reported to the Captain; I dropped the heavy report on my toe, etc. Although the context may make the meaning clear, the lexical ambiguity adds to cognitive effort and contributes to ambiguity overall.

Ambiguity Type: Syntactical
Information Request: A report of poor-paying clients and client managers. Determine their effect on our profitability for the last twelve months.
Explanation: It is not clear whose effect on profitability is meant. Another example is "Bob hit the man with a stick". It is not clear, syntactically, whether the man with a stick was hit, or whether the man was hit, by Bob, with a stick.

Ambiguity Type: Inflective
Information Request: A report showing what the product of our last marketing campaign for sales of our accounting software product in the last month was.
Explanation: Ambiguity here derives from the use of the word "product" with two different meanings in the one information request.

Ambiguity Type: Pragmatic
Information Request: A report of all the clients for a department.
Explanation: The ambiguity here is that the department has not been specified. Information necessary to clearly understand the message is omitted. It would be legitimate to prepare a report for any department. Further information is needed to resolve this actual ambiguity.

Ambiguity Type: Extraneous
Information Request: A report of all clients (and their names and addresses only) for the Tax and Business Services department. Some of those clients are our biggest earners, you know.
Explanation: The last sentence is extraneous. Unlike pragmatic ambiguity, the sentence contains information that is redundant, uninformative, or not necessary to derive the statement's message. "Noise" in the communication exists. More words are used than are necessary to make the statement.

Ambiguity Type: Emphatic
Information Request: A report of our good clients.
Explanation: Ambiguity here could derive from the lack of ability to provide emphasis of the words in its written form. Depending on the emphasis used, "good clients" could be legitimately interpreted to be clients that pay on time, clients that have the most dollar-value sales, or even, with the correct ironic emphasis on the spoken word, our worst clients - those that do not pay.

Ambiguity Type: Suggestive
Information Request: A report of the clients of this accounting practice that have lodged taxation returns in the past five years in accordance with the requirements of the Australian Taxation Office.
Explanation: The request for information is quite clear until the phrase "in accordance with the requirements of the Australian Taxation Office". By definition, all taxation returns should be lodged in accordance with these requirements. The extra phrase introduces suggestive ambiguity into the information request by suggesting that the report will not necessarily consist of all taxation clients.
2.3 Task Complexity
More complex tasks require more cognitive effort and hence have a generally negative
impact on the user's performance in deriving database queries (Campbell 1988; Borthick et
al. 1997; Borthick et al. 2000). Task complexity, in the context of query development,
consists of the inherent task complexity associated with the query syntax, and the data
structure complexity associated with the organisation of the tables and attributes (Liew 1995).
Campbell (1988) and Wood (1986) document the general impact of task complexity. Jih et
al. (1989) studied task complexity and user performance in the context of the use of entity-
relationship diagrams and relational data models. Complexity in this context is generally
measured as a function of the total number of elementary mental discriminations required to
write a query (Halstead 1977).
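As an illustration of how a Halstead-style measure can be computed for a query, the following minimal sketch calculates the standard Halstead difficulty, D = (n1 / 2) x (N2 / n2), where n1 is the number of distinct operators, n2 the number of distinct operands, and N2 the total number of operand occurrences. The split of the example query into operators and operands is a hypothetical tokenisation chosen for illustration only; it is not necessarily the counting scheme applied to the experimental questions.

```python
def halstead_difficulty(operators, operands):
    """Return Halstead difficulty D = (n1 / 2) * (N2 / n2).

    operators: list of operator tokens as they occur in the query
    operands:  list of operand tokens as they occur in the query
    """
    n1 = len(set(operators))  # distinct operators
    n2 = len(set(operands))   # distinct operands
    total_operands = len(operands)  # N2, total operand occurrences
    if n2 == 0:
        raise ValueError("at least one operand is required")
    return (n1 / 2) * (total_operands / n2)


# Hypothetical tokenisation (an assumption) of:
#   select cust_no from customer where city = 'Paris'
operators = ["select", "from", "where", "="]
operands = ["cust_no", "customer", "city", "'Paris'"]

difficulty = halstead_difficulty(operators, operands)
print(difficulty)
```

Under this tokenisation, adding further joins or conditions raises both the operator vocabulary and the operand count, so the measure grows with the length and variety of the query.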
The following hypotheses are proposed:
H9a: Higher complexity in the information request leads to more total errors in the query
formulation.
H9b: Higher complexity in the information request leads to more time taken to complete
the query formulation.
H9c: Higher complexity in the information request leads to lower end user confidence in
the accuracy of the query formulation.
2.4 Theoretical Model Summary
Figure 2 summarises the theoretical model presented in this paper. Complexity and the seven
types of ambiguity have a negative impact on end user query performance as they increase.
Hypotheses 1 through 9 are derived from these hypothesised relationships.
[Figure 2 diagram: the seven ambiguity types (Pragmatic, Extraneous, Lexical, Syntactical,
Inflective, Emphatic, Suggestive) make up Ambiguity. Ambiguity and Information Request
Complexity each have a negative relationship with End User Query Performance.]
Figure 2
The Theoretical Model of Ambiguity, Complexity, and End User Query Performance
3. Methodology
3.1 Experimental Design
A laboratory experiment was conducted to test the hypotheses presented in this study. A two-
factor, within-groups experimental design was used (Huck et al. 1974). Participants were
randomly assigned to two groups (Group A and Group B). Each participant was presented
with up to sixteen questions. Each question was presented in either a clear or ambiguous
formulation.
Group A's question formulations were alternately ambiguous and clear. Group B's question
formulations were alternately clear and ambiguous. Using alternating formulations helped
promote equitable treatment of the two groups. That is, the alternating formulations ensured
that both groups would complete approximately the same number of questions during the
allotted time, expend approximately the same amount of cognitive effort, and would
experience approximately the same level of frustration in dealing with ambiguous
information requests. All participants spent two hours on the experiment. Appendix A
shows the questions presented to students together with the model answers.
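The alternating presentation described above can be sketched as follows. The function name and the string labels are illustrative assumptions; the only facts taken from the design are that Group A's sequence begins with an ambiguous formulation, Group B's with a clear one, and that each group alternates thereafter across up to sixteen questions.

```python
def formulation(group, question):
    """Return 'ambiguous' or 'clear' for a group ('A' or 'B') and a
    1-based question number, following the alternating design."""
    if group not in ("A", "B"):
        raise ValueError("group must be 'A' or 'B'")
    odd_question = question % 2 == 1
    # Group A starts ambiguous (odd questions); Group B starts clear.
    is_ambiguous = odd_question if group == "A" else not odd_question
    return "ambiguous" if is_ambiguous else "clear"


# Build the full sixteen-question schedule for both groups.
schedule = {
    group: [formulation(group, q) for q in range(1, 17)]
    for group in ("A", "B")
}

for group, sequence in schedule.items():
    print(group, sequence[:4])
```

For every question the two groups receive opposite formulations, and each group sees eight ambiguous and eight clear formulations if all sixteen questions are attempted.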
A set of instructions (Appendix B), including a synopsis of the query language syntax, was
provided to the participants. A Unix shell script (Appendix C) presented the questions
electronically to the participants and automatically captured their responses in text files. An
entity-relationship diagram describing the database is presented in Appendix D, and was
available to subjects. Further details regarding the experimental process are provided in
Appendix E.
3.2 Experiment Participants
Forty-seven undergraduate and nineteen postgraduate students participated in the experiment.
Participating students were enrolled either in an advanced undergraduate or in a postgraduate
database subject within the business school at the University of Queensland. All students
enrolled in the two database subjects participated in the experiment.
The motivation for student participation was the receipt of five percent of the students' final
mark for the subject (2.5% for participation, 2.5% for performance). Participants were aware
that they were participating in an experiment.
Participants had been previously trained in the use of the SQL query language, and had been
afforded the opportunity to practise SQL on the university systems. All practice took place
on databases different from the one used for the experiment. Generally, student expertise
with SQL was low to intermediate. The experiment, for most students, was the first practical
application of their SQL skills.
3.3 Assessment of Participant Responses
Participant responses were captured in text files that recorded each interactive response and
the start and end time of each question. These files were edited into a suitable format
for marking by two examiners. Each response was independently assessed by each examiner
to determine whether the response was the participant's final complete response. Responses
where participants did not finish the query formulation were removed from the study.
In some instances, the state of completion of the response was indeterminate. If the response
could only be corrected with substantial rework of the submitted response, the examiners
erred on the side of caution and removed these responses from the study.
Examiners then corrected the answers according to the model answers (Appendix A), using
the Semantic Error Counting, SQL Challenge Error Counting, and Intermediate Error
Counting Forms shown in Appendix F. Each examiner independently assessed the
participant responses and corrected the response. Each discrete alteration (addition or
deletion of a query component) counted as one "micro error" in the Semantic Error Counting
Form (Appendix F).
The corrected response that determined the total error count was the response that required
the fewest changes to the participant's response and still produced the required result set.
This approach yielded an error count no higher than would result from strictly modifying the
response to match the model answer exactly. Appendix G provides an example corrected response.
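The counting rule can be illustrated with a minimal sketch. It assumes, for illustration only, that a query can be represented as a list of discrete components and that each addition or deletion of a component is one micro error; the component strings and the two acceptable formulations below are hypothetical and are not drawn from the experimental materials. The examiners' actual marking used the forms in Appendix F.

```python
from collections import Counter


def micro_errors(response, target):
    """Count additions plus deletions needed to turn the response's
    components into the target's components (order is ignored)."""
    response_counts = Counter(response)
    target_counts = Counter(target)
    deletions = sum((response_counts - target_counts).values())
    additions = sum((target_counts - response_counts).values())
    return additions + deletions


def total_errors(response, acceptable_formulations):
    """Use the correction that requires the fewest changes."""
    return min(micro_errors(response, t) for t in acceptable_formulations)


# Hypothetical participant response and two hypothetical correct formulations.
response = ["select cust_no", "from customer", "where city = 'Paris'"]
acceptable = [
    ["select cust_no", "from customer", "where country = 'France'"],
    ["select cust_no", "from customer", "where city = 'Paris'",
     "order by cust_no"],
]

errors = total_errors(response, acceptable)
print(errors)
```

Here the first formulation would need one deletion and one addition, while the second needs a single addition, so the smaller count is the one recorded.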
The examiners then compared their independent assessments to ensure that all errors had
been found and corrected and that the proposed formulations or corrected formulations
produced the correct output. If more than one correction method was found to produce a
correct query, the correction method that produced the smallest number of errors was used.
A diary of common errors and their corrections was kept to ensure consistency throughout the
assessment process. The final, moderated, error sheets were transcribed to a relational
database for analysis.
4. Results and Discussion
4.1 Overview of Experimental Results
Participant demographic information and statistics are presented in Tables 2, 3, and 4. The
demographic information indicates that the assignment of participants to ensure homogeneity
between Group A and Group B was successful. The groups are relatively homogeneous in
terms of course background, grade point average (GPA), and age. In any case, both Group A
and Group B received the treatment effect of ambiguity on alternate questions, mitigating
concerns about the effect of selection bias on the experimental results.
Table 2
Participant Demographic Information and Descriptive Statistics: Course Background of Group A and Group B
Enrolled Degree                                      Group A  Group B  Total
Undergraduate Arts                                   3        3        6
Undergraduate Business                               20       18       38
Undergraduate Computer Science/Information Systems   3        0        3
Postgraduate Business                                2        1        3
Postgraduate Computer Science/Information Systems    5        11       16
Total Participants:                                  33       33       66
Table 3
Participant Demographic Information and Descriptive Statistics:
Academic Record of Group A and Group B
Academic Record                                    Average  Standard Deviation  Min   Max
GPA (65 students with academic records)            4.94     0.90                3.26  7.00
GPA (Group A: 33 students with academic records)   5.04     0.83                3.26  6.84
GPA (Group B: 32 students with academic records)   4.83     0.97                3.29  7.00
Table 4
Participant Demographic Information and Descriptive Statistics:
Participant Age in Group A and Group B
Age (in Years)                                                   Average  Standard Deviation  Min    Max
Average Age (65 Students with date of birth available)           24.94    7.72                18.74  61.25
Average Age (Group A, 33 Students with date of birth available)  24.76    7.29                19.50  48.53
Average Age (Group B, 32 Students with date of birth available)  25.13    8.26                18.74  61.25
Participants completed 425 responses in the experiment. The experiment contained sixteen
questions for both ambiguous and clear information requests. Due to the two-hour time
constraint, no participant completed more than twelve questions. Forty participants (60.61%
of the sample population) completed six questions. On average, participants completed 6.44
questions, with a standard deviation of 1.75.
Table 5 provides an overview of the participants' results in the experiment. Total errors is
calculated as the average of the micro errors counted using the Semantic Error Counting
Form shown in Appendix F. Appendix H provides a Pearson correlation matrix of the
dependent and independent variables measured in the experiment. Appendix I provides
detailed reports of the errors participants made on each individual question.
Table 5
Comparative Statistics for all Participant Responses
Grouped by Question (Q) and Treatment (T). Note that for T, a = ambiguous, c = clear
Q   T  Halstead's  Group  Response  Attempts  Attempts  Confidence  Confidence  Duration  Duration  Total Errors  Total Errors
       Complexity         Count     Average   Std Dev   Average     Std Dev     Average   Std Dev   Average       Std Dev
1   a  1.6927      A      32        3.31      1.99      6.22        1.36        10.51     4.63      1.59          3.66
1   c  1.6927      B      33        3.18      2.16      6.42        0.87        11.63     6.60      1.12          2.48
2   a  5.4186      B      33        9.21      8.88      5.21        1.47        20.74     11.30     4.27          8.18
2   c  5.4186      A      33        3.61      3.43      6.30        1.05        9.03      6.89      0.30          0.81
3   a  6.8908      A      33        7.94      6.04      5.91        1.57        11.84     7.72      3.97          3.50
3   c  6.8908      B      33        5.09      6.18      6.27        1.42        8.63      5.29      1.03          2.86
4   a  4.4697      B      32        7.31      4.75      5.38        1.64        15.57     8.95      4.03          5.54
4   c  4.4697      A      33        6.52      7.36      6.21        1.47        10.95     8.46      0.67          2.23
5   a  12.2917     A      33        9.24      6.63      5.24        2.21        18.54     11.06     9.42          10.39
5   c  12.2917     B      30        7.07      5.98      5.37        2.16        15.65     9.74      5.20          7.70
6   a  18.8000     B      17        11.41     7.21      5.59        1.33        23.59     7.93      32.94         13.21
6   c  18.8000     A      23        14.91     9.36      4.87        1.91        25.63     10.13     8.00          10.49
7   a  16.0076     A      15        11.07     6.10      5.07        1.49        18.78     5.46      7.27          8.65
7   c  16.0076     B      15        7.67      4.20      5.07        1.98        15.31     7.86      6.13          7.41
8   a  16.2684     B      6         6.83      8.42      5.83        1.60        13.24     8.36      2.33          4.08
8   c  16.2684     A      10        6.40      2.46      5.00        1.94        12.53     5.35      6.40          6.52
9   a  23.8970     A      3         12.33     2.08      3.00        1.73        16.43     7.77      18.00         10.54
9   c  23.8970     B      2         6.50      3.54      6.50        0.71        15.36     2.51      15.50         21.92
10  a  19.4819     B      1         7.00      -         5.00        -           9.93      -         20.00         -
10  c  19.4819     A      4         7.25      3.20      4.25        2.50        9.56      1.40      5.00          2.58
11  a  22.4000     A      2         7.00      4.24      5.00        2.83        8.53      2.13      22.50         13.44
11  c  22.4000     B      1         4.00      -         7.00        -           9.45      -         8.00          -
12  c  29.1633     B      1         14.00     -         4.00        -           10.10     -         8.00          -
The relationships between the dependent variables (duration, confidence, and total errors) and
the independent variables (complexity, ambiguity) are graphically depicted in Figures 3, 4,
and 5. These figures illustrate that the hypothesised relationships for complexity and
ambiguity were supported for most measures by most queries.
[Figure 3 chart: "Questions by Treatment and Error". Horizontal axis: Question (1 to 12). Vertical axis: Average Errors (0.00 to 35.00). Series: Ambiguous, Clear.]
Figure 3
Depicting graphically the relationship between the treatment received (ambiguous or clear information request)
and the total errors in the participant's response.
[Figure 4 chart: "Questions by Treatment and Duration". Horizontal axis: Question (1 to 12). Vertical axis: Average Duration in minutes (0.00 to 30.00). Series: Ambiguous, Clear.]
Figure 4
Depicting graphically the relationship between the treatment received (ambiguous or clear information request)
and the duration taken for the participant to prepare the response.
[Figure 5 chart: "Questions by Treatment and Confidence". Horizontal axis: Question (1 to 12). Vertical axis: Average Confidence (0.00 to 8.00). Series: Ambiguous, Clear.]
Figure 5
Depicting graphically the relationship between the treatment received (ambiguous or clear
information request) and the participant's confidence in the response.
Question Six, with an average of 32.94 errors (standard deviation of 13.21), caused the most
problems for participants in its ambiguous formulation. Nonetheless, the seventeen
respondents to Question Six in its ambiguous formulation took on average slightly less time
to complete the response (an average of 23.59 minutes, standard deviation of 7.93) than the
twenty-three respondents for the clear formulation (an average of 25.63 minutes, standard
deviation of 10.13).
Participants who completed Question Eight in the clear formulation made more errors on average
(6.40, standard deviation of 6.52) than those with the ambiguous formulation (average of 2.33
and standard deviation of 4.08). Participants also exhibited higher average confidence ratings
for the ambiguous formulation of this question (5.83, standard deviation of 1.60) than
participants receiving the clear formulation (5.00, standard deviation of 1.94).
A reason for these results may be that extraneous ambiguity is apparent in the clear
formulation due to the formulation's length. However, Question Eight had only sixteen
completed responses (six respondents for the ambiguous formulation, ten respondents for the
clear formulation), which limits the weight that can be placed on this question's result.
Because of the small number of participants completing Questions Nine through Twelve,
analysis of differences in these individual questions is not appropriate.
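The question-level pattern for Questions One through Eight can be checked directly from the averages in Table 5. The sketch below is a simple direction count rather than a significance test: for each question it records whether the ambiguous formulation showed lower average confidence, longer average duration, and more average errors than the clear formulation. The data structure and function name are illustrative assumptions; the figures are the Table 5 averages.

```python
# (confidence, duration in minutes, total errors) averages from Table 5,
# keyed by question number and treatment (a = ambiguous, c = clear).
TABLE5 = {
    1: {"a": (6.22, 10.51, 1.59), "c": (6.42, 11.63, 1.12)},
    2: {"a": (5.21, 20.74, 4.27), "c": (6.30, 9.03, 0.30)},
    3: {"a": (5.91, 11.84, 3.97), "c": (6.27, 8.63, 1.03)},
    4: {"a": (5.38, 15.57, 4.03), "c": (6.21, 10.95, 0.67)},
    5: {"a": (5.24, 18.54, 9.42), "c": (5.37, 15.65, 5.20)},
    6: {"a": (5.59, 23.59, 32.94), "c": (4.87, 25.63, 8.00)},
    7: {"a": (5.07, 18.78, 7.27), "c": (5.07, 15.31, 6.13)},
    8: {"a": (5.83, 13.24, 2.33), "c": (5.00, 12.53, 6.40)},
}


def count_supported(table):
    """Count questions whose averages move in the hypothesised direction."""
    support = {"confidence": 0, "duration": 0, "errors": 0}
    for row in table.values():
        conf_a, dur_a, err_a = row["a"]
        conf_c, dur_c, err_c = row["c"]
        support["confidence"] += conf_a < conf_c  # lower confidence if ambiguous
        support["duration"] += dur_a > dur_c      # more time if ambiguous
        support["errors"] += err_a > err_c        # more errors if ambiguous
    return support


support = count_supported(TABLE5)
print(support)
```

On these averages the hypothesised direction holds for errors in seven of the eight questions, for duration in six, and for confidence in five, with Questions Six and Eight accounting for most of the exceptions.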
4.2 Regression Analysis
Two multiple linear regression models were used to analyse the experimental results. The
model used to test H1a-c and H9a-c, for the effects of ambiguity and complexity respectively,
was:
(1) Performance = Ambiguity + Complexity
where ambiguity was a dichotomous variable and complexity was measured using the
Halstead (1977) complexity measure for difficulty.
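To make the form of model (1) concrete, the following sketch fits an ordinary least squares regression of a performance measure on a dichotomous ambiguity variable and a continuous complexity variable. It assumes an intercept term, which model (1) does not state explicitly, and it uses synthetic, noise-free data generated from known coefficients; none of the figures are experimental results. The normal equations are solved with plain Gaussian elimination so that the example needs no external libraries.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """rows: (ambiguity, complexity) pairs; y: performance values.
    Returns [intercept, ambiguity coefficient, complexity coefficient]."""
    design = [[1.0, float(a), float(c)] for a, c in rows]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data (an assumption, not thesis data):
#   errors = 1.0 + 3.0 * ambiguity + 0.5 * complexity
rows = [(a, c) for a in (0, 1) for c in (2.0, 5.0, 9.0, 14.0)]
y = [1.0 + 3.0 * a + 0.5 * c for a, c in rows]

coefficients = fit_ols(rows, y)
print([round(value, 6) for value in coefficients])
```

With real responses the same design would be fitted once for each dependent variable (total errors, duration, and confidence), and the signs of the ambiguity and complexity coefficients would be compared with the hypothesised directions.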
The model used to test the seven individual types of ambiguity in H2a-c to H8a-c was:
- deliver_days) from carrier, invoice, customer, delivdays where carrier.carrier_code =
invoice.carrier_code and invoice.cust_no = customer.cust_no and carrier.carrier_code =
delivdays.carrier_code and customer.city = delivdays.city and customer.state =
delivdays.state and customer.country = delivdays.country group by carrier.carrier_code, carrier_name, delivdays.country having avg((deliver_date - ship_date) - deliver_days) > 1;
Appendix B: Experiment Instruction Sheet
INSTRUCTIONS
This laboratory session requires you to execute command files and query a database.
Please follow the instructions carefully.
Part 1 - Scenario
George Harford Wine Merchant distributes wines throughout the world. They predominantly
trade with customers in France, Japan, the USA, and the UK. Customers place orders for
wines which employees process, pack, and ship to the customers via an appropriate carrier.
The packers attach an invoice created by the Accounts Receivable department to the goods
when shipped. These invoices contain all relevant information generated from the invoice and
inventory databases. The data structures for the relevant tables are attached.
Part 2 - SQL Syntax Reminder
The SQL syntax for SELECT commands follows. Items in square brackets [ ] are optional,
and items in braces { } can be repeated zero or more times: