
Validity and Reliability

Feb 24, 2016



Validity
Validity asks whether the translation from concept to operationalization accurately represents the underlying concept: do your variables measure the abstract concept you intend? This is more familiarly called construct validity. An empirical study with high construct validity ensures that the studied parameters are relevant to the research questions. Without a valid design, valid scientific conclusions cannot be drawn.

Types of construct validity
- Translation validity (Trochim's term)
  - Face validity
  - Content validity
- Criterion-related validity
  - Predictive validity
  - Concurrent validity
  - Convergent validity
  - Discriminant validity

Translation validity
Is the operationalization a good reflection of the construct? This approach is definitional in nature: it assumes you have a good, detailed definition of the construct, so you can check the operationalization against it. Example: software success. Is your definition representative of the software-success construct? E.g., "application software is software used to assist end users."

Face validity
"On its face," does the operationalization seem like a good translation of the construct? If respondents know what information we are looking for, they can use that context to interpret the questions and provide more useful, accurate answers.

Content validity
Check the operationalization against the relevant content domain for the construct. For example, a depression measure should cover the checklist of depression symptoms; a world-history test should include major histories from all continents or countries; an interface-usability measure should include all valid usability measures: learnability, efficiency, memorability (low cognitive overload), error recovery, and the like.

Criterion-related validity
Check the performance of the operationalization against some criterion; that is, compare the test with other measures or outcomes (the criteria) already held to be valid. For example, employee selection tests are often validated against measures of job performance (the criterion), and IQ tests are often validated against measures of academic performance (the criterion).

Predictive validity
Assess the operationalization's ability to predict something it should theoretically be able to predict; a high correlation provides evidence of predictive validity. Examples:
- A job-applicant measure is supposed to predict the applicant's later performance at work. If applicants who scored well also perform well when measured after one year, the applicant measure is a good predictive measure.
- Measures of interface usability can predict later software utilization; a high correlation indicates predictive validity.

Concurrent validity
Assess the operationalization's ability to distinguish between groups that it should theoretically be able to distinguish between. It is similar to predictive validity, but the two measures are taken at the same time, and the results of the two measures are compared. If subordinate ratings and supervisor ratings of job performance correlate positively, the measure has high concurrent validity.

Convergent validity
Examine the degree to which the operationalization is similar to (converges on) other operationalizations it theoretically should be similar to; this compares two or more attributes of the same construct. To show the convergent validity of a test of arithmetic skills, one might correlate its scores with scores on other tests (e.g., problem-solving ability) that purport to measure basic math ability. A learnability measure should have a high correlation with efficiency, memorability, errors, and satisfaction if all of these measures measure the same construct. There is also instrument convergence: if an interview and a questionnaire produce the same result, the instruments are convergent.

Discriminant validity
Examine the degree to which the operationalization is not similar to (diverges from) other operationalizations it theoretically should not be similar to. A test of a concept should not be highly correlated with other tests designed to measure theoretically different concepts. The overlap between two scales can be measured with a formula.

Discriminant validity (continued)
Where r_xy is the correlation between x and y, r_xx is the reliability of x, and r_yy is the reliability of y, compute r_xy / sqrt(r_xx * r_yy). A result less than 0.85 indicates discriminant validity; above 0.85, the two constructs overlap greatly and are likely measuring the same thing.

Example: measuring narcissism and self-esteem
Narcissism is a term with a wide range of meanings, usually used to describe some kind of problem in a person's or group's relationships with self and others. Self-esteem is a term in psychology reflecting a person's overall evaluation or appraisal of his or her own worth; it encompasses beliefs (for example, "I am competent," "I am worthy") and emotions such as triumph, despair, pride, and shame. The researchers want to show that their new scale measures narcissism and not simply self-esteem. First, we calculate the average inter-item correlations within and between the two scales:

- Narcissism with Narcissism: 0.47
- Narcissism with Self-esteem: 0.30
- Self-esteem with Self-esteem: 0.52

We then apply the correction-for-attenuation formula:

0.30 / sqrt(0.47 * 0.52) = 0.607

Since 0.607 is less than 0.85, we can conclude that discriminant validity exists between the scale measuring narcissism and the scale measuring self-esteem. Similarly, a new measure of depression should have negative correlations with measures of happiness and minimal correlations with tests of physical health.
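As a sanity check, the calculation above can be reproduced in a few lines of Python. This is a minimal sketch: the function name is ours, the input values come from the slide's example, and the 0.85 cutoff is the rule of thumb stated earlier.

```python
import math

def correction_for_attenuation(r_xy, r_xx, r_yy):
    """Disattenuated correlation between scales x and y:
    r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Average inter-item correlations from the example
r_xx = 0.47  # Narcissism with Narcissism
r_yy = 0.52  # Self-esteem with Self-esteem
r_xy = 0.30  # Narcissism with Self-esteem

corrected = correction_for_attenuation(r_xy, r_xx, r_yy)
print(round(corrected, 3))  # 0.607

# Below the 0.85 rule-of-thumb cutoff => discriminant validity
discriminant = corrected < 0.85
print(discriminant)  # True
```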

Internal and external validity

Internal validity
Inferences possess internal validity if a causal relation between two variables is properly demonstrated. A causal inference may be based on a relation when three criteria are satisfied:
- the "cause" precedes the "effect" in time (temporal precedence),
- the "cause" and the "effect" are related (covariation), and
- there are no plausible alternative explanations for the observed covariation.

Example: internal validity
The researcher hypothesizes that computer training will increase software usability: training is the independent variable (IV) and usability the dependent variable (DV). A positive correlation between the two indicates high internal validity. This can be tested with Spearman rank correlation or Pearson correlation, and can easily be done with SPSS software.
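The Pearson correlation the example calls for can also be computed in plain Python rather than SPSS. A minimal sketch; the training-hours and usability scores below are hypothetical, invented purely for illustration:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: hours of computer training (IV)
# and a usability score for the trained users (DV)
training = [1, 2, 3, 5, 6, 8, 9, 10]
usability = [52, 55, 61, 70, 72, 80, 85, 88]

r = pearson(training, usability)
print(round(r, 2))  # close to +1: strong positive correlation
```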

Internal validity (continued)
In many cases, however, the magnitude of effects found in the dependent variable may depend not only on variations in the independent variable, but also on the power of the instruments and statistical procedures used to measure and detect the effects, and on the choice of statistical methods. Other variables or circumstances, uncontrolled for (or uncontrollable), may lead to additional or alternative explanations (a) for the effects found and/or (b) for the magnitude of the effects found.

Internal validity (continued)
Highly controlled true experimental designs, i.e. random selection, random assignment to control or experimental groups, reliable instruments, reliable manipulation processes, and safeguards against confounding factors, may be the "gold standard" of scientific research. Yet the very strategies employed to control these factors may also limit the generalizability, or external validity, of the findings.

External validity
External validity refers to the applicability of study or experimental results to realms beyond those under immediate observation, i.e. the generalizability of research findings to other similar cases. Is the software solution for one case also applicable to similar cases in another organization or country? Does the solution have wider application, audience, or acceptance? Researchers prize studies with external validity, since the results can be widely applied to other scenarios.

External validity for a given study has several aspects:
- whether the study generalizes to other subjects in the domain,
- whether there is enough evidence and argument to support the claimed generalizability, and
- whether the study outcomes validate predicted theories.

Reliability
Reliability means "repeatability" or "consistency." A measure is considered reliable if it gives the same result over and over again (assuming that what we are measuring isn't changing!). Measuring the same distance at different times should give the same result if the instrument (e.g. a meter stick) is reliable. There are four general classes of reliability estimates, each of which estimates reliability in a different way.
Types of reliability estimation
- Inter-rater or inter-observer reliability: used to assess the consistency of different raters.
- Test-retest reliability: used to assess the consistency of a measure from one time to another.
- Parallel-forms reliability: used to assess the consistency of the results of two tests constructed in the same way from the same content domain.
- Internal consistency reliability: used to assess the consistency of results across items within a test.

Inter-rater or inter-observer reliability
Used to assess the degree to which different raters/observers give consistent estimates of the same phenomenon. Establish reliability on pilot data or a subsample of data, and retest often throughout. For categorical data a chi-square (χ²) test can be used, and for continuous data a correlation coefficient (r) can be calculated.
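As an illustration of inter-rater agreement on categorical data, here is Cohen's kappa, a common chance-corrected agreement statistic for two raters (a different statistic from the chi-square test mentioned above, but a standard choice for this purpose). The pass/fail ratings are hypothetical:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement between two raters,
    corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical categorical ratings from two observers
a = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "fail"]
b = ["pass", "pass", "fail", "fail", "fail", "pass", "fail", "pass"]

kappa = cohens_kappa(a, b)
print(round(kappa, 2))  # 0.5: moderate agreement beyond chance
```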

Test-retest reliability
Used to assess the consistency of a measure from one time to another. This approach assumes that there is no substantial change in the construct being measured between the two occasions. The amount of time allowed between measures is critical: the shorter the time gap, the higher the correlation; the longer the time gap, the lower the correlation.

Parallel-forms reliability
Used to assess the consistency of the results of two tests constructed in the same way from the same content domain. Create a large set of questions that address the same construct, randomly divide the questions into two sets, and administer both instruments to the same sample of people. The correlation between the two parallel forms is the estimate of reliability. One major problem with this approach is that you have to be able to generate lots of items that reflect the same construct.

Split-half reliability
Collect your data with the instrument that measures your construct, split the items into halves, and correlate the two halves. A high positive correlation indicates high reliability.
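A rough sketch of the split-half procedure in Python. The odd/even item split and the Spearman-Brown step-up (which adjusts the half-test correlation up to full-test length) are standard choices we have added here; the slide itself mentions only the raw correlation, and the respondent scores are hypothetical:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    """item_scores: one list of item scores per respondent.
    Split the items into odd/even halves, correlate the half totals,
    then apply the Spearman-Brown step-up: 2r / (1 + r)."""
    half1 = [sum(row[0::2]) for row in item_scores]  # odd-numbered items
    half2 = [sum(row[1::2]) for row in item_scores]  # even-numbered items
    r = pearson(half1, half2)
    return 2 * r / (1 + r)

# Hypothetical data: 5 respondents x 6 items, each scored 1-5
scores = [
    [4, 5, 4, 4, 5, 4],
    [2, 2, 3, 2, 2, 3],
    [5, 4, 5, 5, 4, 5],
    [3, 3, 2, 3, 3, 2],
    [1, 2, 1, 1, 2, 2],
]

reliability = split_half_reliability(scores)
print(round(reliability, 2))  # close to 1: the halves agree strongly
```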

Reliability and Validity

Research ethics
(Research Methodology, 24 Nov 2008)

Ethics: a definition
Research should avoid causing harm, distress, anxiety, pain, or any other negative feeling to participants. Participants should be fully informed about all relevant aspects of the research before they agree to take part [1].

ARE YOU HOMOSEXUAL?
(THIS IS A HYPOTHETICAL QUESTION - DO NOT ANSWER IT.)

Research questions: ethical or not?
Research may ask a taboo or personal question. What if you were asked whether you are homosexual? How would you feel if you were asked this? Would you feel awkward? Would you lie? Would you answer truthfully? Why are we asking this question anyway? Could we phrase the question better?

Pause for thought
Is it morally correct to carry out research by any means whatsoever, provided that the end result increases the sum of human knowledge or provides some tangible benefit to mankind?

Does the end justify the means?

DISCUSS

Ethics before research begins
- Inform all participants fully. What about children, mentally deficient people, or those with poor language skills?
- Obtain consent.
- Define the gatekeeper.
- Craft your research methods carefully.
- Do not distort the data.

Ethics during research
- Field notes: what are they? Do we need them? Can we use them in our research?
- Consent issues
- Content issues
- Moral issues: you have heard about a crime; do you report it?

DISCUSS

Confidentiality of respondent data
- How do we keep track of respondents? Should we keep track of them at all?
- How do we de-personalise gathered data?
- If data are depersonalised, is it morally correct to reuse them for a new research project?

DISCUSS

Ethics after research
- Disposal of data: paper or digital?
- The Freedom of Information Act
- Reuse of data: is this ethical? Are there occasions where reusing gathered data for another purpose is acceptable?
- Requesting permission from respondents, and the difficulty of contacting original respondents

Engineering and ethics
- Confidentiality of data
- Ownership of research results
- Consider the research results themselves: is a cure for a disease as the direct result of research good? Is the creation of a powerful bomb (e.g. the atom bomb) as the direct result of research good?

DISCUSS

Research ethics committees
- Monitor ethical issues in research programmes before, during, and after the research
- Make decisions and enforce them
- Give researchers organisational support
- Reassure researchers about moral issues related to a particular research project

Plagiarism
- What is plagiarism?
- How do we avoid plagiarism?
- What dangers does plagiarism cause?
- State some examples of plagiarism.

DISCUSS

Summary: ethics
- Ethics are moral issues relating to the design of research and the gathering and usage of data for research purposes.
- Think before, during, and after the research.
- Consult gatekeepers and respondents.
- Never act alone: consult your supervisor if in doubt.