The Pennsylvania State University
The Graduate School
College of Education
FACTOR STRUCTURE OF WECHSLER PRESCHOOL AND PRIMARY SCALE
OF INTELLIGENCE (THIRD EDITION) – SPANISH VERSION SCORES
Submitted in Partial Fulfillment of the Requirements
for the Degree of
Doctor of Philosophy
August, 2016
The dissertation of Abigail E. Crimmins was reviewed and approved* by the following:

Barbara A. Schaefer
Associate Professor of Education
Dissertation Advisor
Chair of Committee
Professor-in-Charge, School Psychology Program

Peter M. Nelson
Assistant Professor of School Psychology

Richard M. Kubina, Jr.
Professor of Education

Laura E. Murray-Kolb
Assistant Professor of Nutritional Sciences

*Signatures are on file in the Graduate School.
Abstract
A critical component in the adaptation of measures across culturally different populations
is the validation of the adapted measure for use in the new population. Validation
requires evidence that the scores from the new tool measure the same qualities or aspects
of the construct in the new population as they purport to measure in the original
population. This study examined the reliability and validity of scores for an adapted
version of the Spanish-language form of the Wechsler Preschool and Primary Scale of
Intelligence-Third Edition (WPPSI-III-SP; Wechsler, 2009) as a measure of cognitive
ability among a cohort of children in rural Peru. Using confirmatory factor analyses
(CFA), a series of models were fit to data from a cohort of children age 36 months (n =
147) and the same cohort at age 48 months (n = 167). These models represented the
theoretical factor structure established by the publisher for the normative data as well as
models derived from other studies of cross-cultural intelligence test adaptation in the
region. It was hypothesized that the models derived from prior South American studies
would yield a better fit for the data as compared to the normative sample model.
Convergent validity was also assessed based on the hypothesis that the scores from the
adapted WPPSI-III-SP would be positively and strongly correlated with cognitive scores
from a similarly adapted Bayley Scales of Infant and Toddler Development-Third Edition
(Bayley-III; Bayley, 2005) administered with these children at age 24 months. CFA
results support a one-factor model for both the 36- and 48-month time points for the
adapted WPPSI-III-SP measure; however, evidence for convergent validity with prior
estimates of cognitive ability using the adapted Bayley-III was minimal. Implications for
cross-cultural test adaptation and the use of the adapted WPPSI-III-SP are discussed.
Table of Contents
List of Tables ................................................................ vi
List of Figures ............................................................... vii
Chapter 1: INTRODUCTION ....................................................... 1
    Cultural Context of Peru .................................................. 4
    Purpose and Proposed Models ............................................... 7
Chapter 2: LITERATURE REVIEW .................................................. 11
    Cross-Cultural Test Adaptation ............................................ 11
    Cross-Cultural Intelligence Testing ....................................... 18
    Adaptation of Preschool Cognitive Assessment .............................. 22
    Description of the WPPSI-III .............................................. 25
    Present Study ............................................................. 32
Chapter 3: METHOD ............................................................. 42
    Sample .................................................................... 42
    Measures .................................................................. 43
    Procedure ................................................................. 45
    Data Analyses ............................................................. 46
Chapter 4: RESULTS ............................................................ 52
    Younger Cohort ............................................................ 52
        Preliminary Analyses and Descriptive Statistics ....................... 52
        Model 1-NormYoung ..................................................... 54
        Model 2-OneFactorYoung ................................................ 56
        Model Comparison ...................................................... 58
    Older Cohort .............................................................. 59
        Preliminary Analyses and Descriptive Statistics ....................... 59
        Model 3-NormOlder ..................................................... 61
        Model 4-OneFactorOlder ................................................ 62
        Model 5-AltOlder ...................................................... 64
    Convergent Validity Analyses .............................................. 65
Chapter 5: DISCUSSION ......................................................... 67
    Fit of Hypothesized Models ................................................ 68
    Convergent Validity Evidence .............................................. 75
    Limitations and Future Research ........................................... 77
    Implications .............................................................. 79
    Conclusions ............................................................... 81
References .................................................................... 82
Appendix A. Overview of WPPSI Content ......................................... 93
Appendix B. Parameter Estimates and Standard Errors for Models 3, 4, and 5 ... 95
List of Tables
Table 1. Model Identification Rules for Standard CFA Models................................. 48
Table 2. Free Parameters and Observations for Specified Models............................. 48
Table 3. Intercorrelations, Descriptive Statistics, and Reliability Estimates for Adapted WPPSI-III-SP Subtest Scores (Younger Cohort)............................ 53
Table 4. Parameter and Standard Error Estimates for Model 1-NormYoung........... 55
Table 5. Parameter and Standard Error Estimates for Model 2-OneFactorYoung.. 57
Table 6. Selected Fit Indices for Younger Cohort Models......................................... 58
Table 8. Intercorrelations, Descriptive Statistics, and Reliability Estimates for Adapted WPPSI-III-SP Subtest Scores (Older Cohort).................................. 60
Table 9. Parameter and Standard Error Estimates for Model 4-OneFactorOlder with Modifications.......................................................................................... 63
Table 10. Selected Fit Indices for Older Cohort Models............................................ 65
Table 11. Descriptive Statistics and Reliability Estimates for Adapted WPPSI-III-SP Total Scores and Adapted Bayley-III Scores (Younger and Older Cohorts)........................................................................................................... 66
Table A1. Subtests and Composite Scores of the WPPSI, WPPSI-R, and WPPSI-III by Age........................................................................................... 94
Table B1. Parameter and Standard Error Estimates for Model 3-NormOlder............ 95
Table B2. Parameter and Standard Error Estimates for Model 4-OneFactorOlder.... 96
Table B3. Parameter and Standard Error Estimates for Model 5-AltOlder.......... 97
List of Figures
Figure 1. Hypothesized model of the adapted WPPSI-III-SP scores among the Peruvian cohort at age 36 months based on the factor structure of the normative data (Wechsler, 2002b), referred to as Model 1-NormYoung for analyses........................................................................................................... 37
Figure 2. Hypothesized single-factor model of the adapted WPPSI-III-SP for the
Peruvian cohort at age 36 months, referred to as Model 2-OneFactorYoung model for analyses................................................................................... 38
Figure 3. Hypothesized model of the adapted WPPSI-III-SP for the Peruvian
cohort at age 48 months based on the factor structure of the normative data (Wechsler, 2002b), referred to as Model 3-NormOlder for analyses............ 39
Figure 4. Hypothesized single-factor model of the adapted WPPSI-III-SP for the
Peruvian cohort at age 48 months, referred to as Model 4-OneFactorOlder model for analyses.......................................................................................... 40
Figure 5. Hypothesized model of the adapted WPPSI-III-SP for the Peruvian
cohort at age 48 months, referred to as Model 5-AltOlder model for analyses........................................................................................................... 41
Figure 6. Completely standardized factor loadings for Model 1-NormYoung...........
Figure 7. Completely standardized factor loadings for Model 2-OneFactorYoung..
Figure 8. Completely standardized factor loadings for modified Model 4-
Quotient. The VIQ and PIQ aim to measure the same abilities as described for the
younger cohort. The subtests of Information and Block Design carry over from the
younger age range to the older range. Information, Vocabulary, and Word Reasoning
subtests combine to provide the VIQ. The Vocabulary subtest asks children to verbally
define a series of words, whereas the Word Reasoning subtest asks the children to
identify the common concept being described in a series of increasingly specific clues.
The Block Design, Matrix Reasoning, and Picture Concepts subtests combine to provide
the PIQ composite. During the Matrix Reasoning subtest, the child looks at an incomplete
matrix and selects the missing portion from response options. The child chooses pictures
from a series of rows that form a group with a common characteristic during the Picture
Concepts subtest. The scores from these core subtests, in addition to the Coding subtest,
are then combined to provide an overall measure of general intellectual functioning
(FSIQ; Wechsler, 2002a). During the Coding subtest, the child is presented with a series
of geometric shapes (e.g., star, circle, square). The child uses a key to copy symbols (e.g.,
line, cross) into each shape within a certain time limit.
Reliability and validity. The WPPSI-III was first developed and standardized in
the United States using a normative sample of 1,700 children. The sample was stratified
on the characteristics of age, sex, ethnicity, geographic region, and parental education
(Wechsler, 2002b). The scores from the test were found to have good reliability with
internal consistency indices ranging from .94 to .96 for the VIQ, from .89 to .95 for the
PIQ, and from .95 to .97 for the FSIQ. Split-half reliability estimates for the core subtests
ranged from .83 (Symbol Search) to .91 (Word Reasoning; Wechsler, 2002b). As
reported in the test’s technical manual (Wechsler, 2002b) and through a principal axis
factor analysis by Sattler (2008), the WPPSI-III is comprised of two factors for the
younger age range and four factors for the older age range. These factors align with the
composite and subtest groups described previously. For the younger children, significant
factor loadings for the subtests onto their respective factors ranged from .59 (Object
Assembly on PIQ) to .85 (Receptive Vocabulary on VIQ). For the older children,
significant factor loadings for the subtests onto their respective factors ranged from .38
(Matrix Reasoning on PIQ) to .88 (Vocabulary on VIQ).
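Split-half reliability coefficients like those reported above are typically obtained by correlating two half-test scores and applying the Spearman-Brown correction. The sketch below illustrates the computation in Python under a common odd/even item split; the data and split rule are illustrative assumptions, not the publisher's procedure.

```python
import numpy as np

def split_half_reliability(item_scores: np.ndarray) -> float:
    """Estimate split-half reliability with the Spearman-Brown correction.

    item_scores: 2D array, rows = examinees, columns = items.
    Splits items into odd/even halves (one common convention), correlates
    the two half-test scores, then steps the correlation up to full length.
    """
    odd_half = item_scores[:, 0::2].sum(axis=1)
    even_half = item_scores[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd_half, even_half)[0, 1]
    # Spearman-Brown prophecy formula for doubling the test length
    return 2 * r_half / (1 + r_half)
```

With reliable items that share a common ability signal, the corrected coefficient approaches the .83 to .91 range reported for the WPPSI-III core subtests.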
However, Gordon (2004) notes in his review of the WPPSI-III that
intercorrelational relationships among the subtests point to a potential one-factor
structure, with all subtests loading on a general intellectual factor. A two-factor
structure would require the VIQ subtests to correlate more highly with each
other (convergent validity) than with the PIQ subtests (discriminant validity). Across
both age bands, the VIQ subtests do correlate more highly with each other than
with the subtests from the PIQ composite. This pattern, however, does not hold for the
PIQ subtests, which correlated as highly with each other as with the
subtests from the VIQ factor. The WPPSI-III test manual (Wechsler, 2002b) addresses
these validity concerns. The authors posit that this lack of discriminant validity may be a
reflection of less differentiation between cognitive abilities evidenced among young
children and may be due to the high g (general intelligence) loadings of all subtests.
Based on these arguments and the intercorrelations presented, Gordon (2004) questions
whether a one-factor structure may be more appropriate, particularly for the
younger age band. The factor structure presented in the manual was replicated through
principal axis factor analysis (Sattler, 2008) and item response theory (Price, Raju, Lurie,
Wilkins, & Zhu, 2006). Beyond the intercorrelational evidence highlighted by Gordon
(2004), no further studies were found to either confirm or dismiss the superiority of a
one-factor structure.
The WPPSI-III has been translated, adapted, and standardized for use in the
following languages: Spanish (normed in Spain), French (normed in France), French
Canadian, German, Italian, Swedish, Korean, Japanese, and Dutch. Standardization also
occurred in Australia, the United Kingdom, and Canada (Visser, Ruiter, van der Meulen,
Ruijssenaars, & Timmerman, 2012). Few further adaptations of the WPPSI-III for use in
a culture different from the original normative culture were found within the literature.
While Bagdonas, Pociute, Rimkute, and Valickas (2008) refer to the adaptation of the
WPPSI-III for use in Lithuania, no studies confirming this adaptation process were found.
Furthermore, Wasserman and colleagues (2004) outlined the translation and adaptation of
the WPPSI-III for use among young children in Bangladesh. However, no description of
reliability or validity evidence for scores from the adapted measure was provided.
Similarly, Karino, Laros, and Ribeiro de Jesus (2011) used an adapted version of the
WPPSI-III in a study with no mention of the adaptation and validation process. It
should be noted that the languages in which such studies are published limit any
search for evidence of cross-cultural validation of the WPPSI-III. Studies providing this
evidence may exist and be accessible to researchers or clinicians within the countries
that use the adapted measures. For the purposes of this study, however, it
remains unclear as to whether or not the factor structures provided within the
standardization samples of WPPSI-III are replicated when these assessments are adapted
for use in a dissimilar culture.
Spanish adaptation. In 2009, a Spanish-language version of the WPPSI-III, the
Escala de Inteligencia de Wechsler para Preescolar y Primaria – III (hereafter identified
as the WPPSI-III-SP; Wechsler, 2009) was adapted and normed for use in Spain. The test
was normed on a sample of 1,220 Spanish children (Rodriguez & Miguel, 2012). This
test contains the same subtests and composite scores as the English-language version.
Through the adaptation process, however, items were changed or adapted to be culturally
appropriate for use in Spain. The order of items was necessarily changed to ensure
that items became increasingly difficult for the Spanish children.
Present Study
The question remains, however, as to whether an adapted version of the
WPPSI-III-SP reliably and fairly measures intelligence in a cohort of children in rural
Peru. The primary purpose of this study, therefore, is to examine the extent to which a
model based on the factor structure derived from scores on the WPPSI-III-SP completed
by the Spanish normative sample fits the scores from an adapted WPPSI-III-SP completed
by children from rural communities in Peru (see Figures 1 and 3). Given the cultural,
developmental, construct, and adaptation considerations, however, the present study also
proposes to determine if another factor structure would provide a better fit. As such, the
study attempts to answer the following questions:
1. Is the factor structure of scores from the original version of the WPPSI-III
replicated with children living in rural Peruvian communities?
2. If not, does another model fit the data better than the model outlined within the
normative sample?
In addition, this study aims to assess the construct validity of the scores from the
adapted measure by assessing the relationship of these scores to the scores of another
measure of cognitive development. The final research question addresses the convergent
validity of the adapted measure’s scores:
3. To what extent do the scores from the adapted WPPSI-III-SP correlate with the
scores from another, previously administered adapted measure of cognitive
development?
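Research question 3 reduces to a bivariate correlation between two sets of scores. As an illustrative sketch (with invented scores, not study data), the correlation could be computed as:

```python
import numpy as np
from scipy import stats

# Hypothetical illustration: paired scores for the same children on the
# adapted WPPSI-III-SP and the earlier adapted Bayley-III (values invented).
wppsi_total = np.array([38, 45, 41, 52, 47, 60, 55, 49, 43, 58])
bayley_cog = np.array([80, 92, 85, 98, 90, 110, 105, 95, 88, 108])

r, p = stats.pearsonr(wppsi_total, bayley_cog)
print(f"r = {r:.2f}, p = {p:.4f}")
# A strong positive, statistically significant r would support convergent validity.
```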
Possible alternative factor structures. Looking to other instances of intelligence
test adaptation in the region may help predict possible factor structures of the adapted
WPPSI-III-SP, outside the factor structure of the normative sample. As noted earlier,
no studies were found to examine the validity of scores from a cross-culturally
adapted version of the WPPSI-III. However, two studies of the adaptation of an
intelligence measure for older children were conducted with children in Colombia and
Chile. While the cultures of Colombia and Chile are not the same as the culture in rural
Peru, the cultures may be more similar than a comparison between Spain and rural Peru.
As such, the standardization of intelligence tests in these two countries may offer possible
alternative factor structures to consider, despite outlining the psychometric properties of
an assessment aimed at older children.
Contreras and Rodriquez (2013) studied the reliability and validity of scores from
the Spanish version of the Wechsler Intelligence Scale for Children – Fourth Edition
(WISC-IV-SP; Wechsler, 2005) in a sample of children and adolescents from
Bucaramanga, Colombia. Similar to the WPPSI-III, the WISC-IV-SP's 15
subtests are organized into a four-factor structure. Regarding reliability, the WISC-IV scores had
similar reliability estimates for the Colombian sample as they had for the Spanish
version’s normative sample. For the overall assessment, Contreras and Rodriquez (2013)
calculated a split-half alpha coefficient of .95 and a Cronbach’s alpha coefficient of .98.
Regarding validity, the data did not support a four-factor structure as presented in the
original version of the test. Instead, the researchers found evidence for a single factor that
accounted for 70.26% of the total variance in the test scores (Contreras & Rodriquez,
2013). In addition, Baron and Leonberger (2012) argue that the intellectual functioning of
the preschool-aged child is more homogeneous than is the cognitive ability of an older
individual. As such, the alternative models described in Figures 2 and 4 may prove a
better fit for the data from the Peruvian sample of children.
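The 70.26% figure reported by Contreras and Rodriquez (2013) is a proportion of total variance attributed to a single factor. As a rough illustration (using principal components on a hypothetical correlation matrix, not the authors' exact factor-analytic method), such a proportion can be computed from the largest eigenvalue of the subtest correlation matrix:

```python
import numpy as np

def first_component_variance(corr: np.ndarray) -> float:
    """Proportion of total variance carried by the first principal
    component of a correlation matrix (largest eigenvalue / number of
    variables)."""
    eigenvalues = np.linalg.eigvalsh(corr)  # returned in ascending order
    return eigenvalues[-1] / corr.shape[0]

# Hypothetical 4-subtest correlation matrix with a strong general factor
corr = np.array([
    [1.0, 0.7, 0.7, 0.7],
    [0.7, 1.0, 0.7, 0.7],
    [0.7, 0.7, 1.0, 0.7],
    [0.7, 0.7, 0.7, 1.0],
])
print(first_component_variance(corr))  # ≈ 0.775, i.e., 77.5% of total variance
```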
In a second study of an intelligence test adaptation in South America, Ramirez
and Rosas (2007) adapted the Argentinian version of the Wechsler Intelligence Scale for
Children – Third Edition (WISC-III; Wechsler, 1991/1997) for use in Chile. The
researchers administered the adapted test to a stratified sample of 1,914 children, divided
into 11 age categories. The internal consistency of the subscale and composite scores and
the factorial structure of the test are reported. Regarding internal consistency, Cronbach’s
alpha coefficients ranged from .65 to .91 for the subscale scores and from .75 to .87 for
the composite scores. Using factor analysis, Ramirez and Rosas (2007) analyzed the
factor structure of the sample as a whole and for four age ranges (6 – 7, 8 – 10, 11 – 13,
and 14 – 16 years). Overall, the individual subtests loaded onto four distinct factors in a
manner consistent with the original test’s factor structure. In looking at the results for the
four age ranges, however, the Coding subtest loaded significantly on the factor
representing Perceptual Organization and not on the supplemental Processing Speed
factor, as was Coding’s loading in the Argentinian sample. The authors argue that this
result perhaps demonstrates children’s cognitive skills have not fully differentiated and is
a reflection of preoperational thought. In other words, the Coding subtest may require
more abstract reasoning than previously theorized (Ramirez & Rosas, 2007).
This interpretation may be especially relevant for a population in which children
may not have had extensive exposure to paper-and-pencil educational activities. To be
able to quickly complete the Coding subtest, a child relies on fluent skills in shape and
symbol identification. A child who has not had formal exposure to this type
of paper-and-pencil test may rely more on his or her perceptual reasoning to
interpret the larger shape and then identify which symbol goes into that shape. As such,
the task becomes more a performance task than a processing speed task. Therefore, a third
proposed model for the older cohort of children posits the loading of Coding on the PIQ
factor (see Figure 5).
In summary, for each age group of children (i.e., 36-month and 48-month
cohorts), two potential models are proposed: (1) a model identical to the factor structure
demonstrated in the normative data (see Figures 1 and 3) and (2) a one-factor model, in
which all subtests load on one general latent factor of intelligence (i.e., no separate
composite scores; see Figures 2 and 4; Contreras & Rodriquez, 2013). For the 48-month-
old children, a third model is proposed in which the Coding subtest loads on the
Performance factor (see Figure 5; Ramirez & Rosas, 2007).
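For readers who work with SEM software, the competing structures can be written in lavaan-style model syntax (used by R's lavaan and Python's semopy). The snippet below is an illustrative sketch with assumed subtest labels, not the exact specification files used in this study:

```python
# Lavaan-style model syntax (as used by R's lavaan or Python's semopy).
# Subtest names follow the document; treat these as illustrative sketches.

# Models 2 and 4: one general factor for all core subtests (older cohort shown)
one_factor = """
g =~ Information + Vocabulary + WordReasoning +
     BlockDesign + MatrixReasoning + PictureConcepts + Coding
"""

# Model 5: the normative two-factor structure, but with Coding moved to PIQ
alt_older = """
VIQ =~ Information + Vocabulary + WordReasoning
PIQ =~ BlockDesign + MatrixReasoning + PictureConcepts + Coding
"""
```

In lavaan-style syntax, `=~` reads "is measured by," so each line lists the indicators hypothesized to load on a latent factor.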
Hypotheses. It was hypothesized that, if the adapted WPPSI-III-SP measures the
construct of intelligence in the same manner as the original version, its theorized factor
structure would provide an adequate fit for the Peruvian cohort data.
However, given that the cultures of Colombia and Chile may more closely resemble the
Peruvian culture than does the culture of Spain, it was also hypothesized that the models based on
the research in these South American countries would provide a better fit to the data than
the model based on the normative data. Finally, it was hypothesized that the scores from
an adapted measure of intelligence completed by the children at age 24 months would be
positively correlated with scores from the adapted WPPSI-III-SP.
Figure 1. Hypothesized model of the adapted WPPSI-III-SP scores among the Peruvian cohort at age 36 months based on the factor structure of the normative data (Wechsler, 2002b), referred to as Model 1-NormYoung for analyses. GLC = General Language Composite; VIQ = Verbal Intelligence Quotient; PIQ = Performance Intelligence Quotient; FSIQ = Full Scale Intelligence Quotient.
Figure 2. Hypothesized single-factor model of the adapted WPPSI-III-SP for the Peruvian cohort at age 36 months, referred to as Model 2-OneFactorYoung model for analyses. This model proposes a single overall ability as found by Contreras and Rodriguez (2013).
Figure 3. Hypothesized model of the adapted WPPSI-III-SP for the Peruvian cohort at age 48 months based on the factor structure of the normative data (Wechsler, 2002b), referred to as Model 3-NormOlder for analyses. VIQ = Verbal Intelligence Quotient; PIQ = Performance Intelligence Quotient; FSIQ = Full Scale Intelligence Quotient.
Figure 4. Hypothesized single-factor model of the adapted WPPSI-III-SP for the Peruvian cohort at age 48 months, referred to as Model 4-OneFactorOlder model for analyses. This model proposes a single overall ability as found by Contreras and Rodriguez (2013).
Figure 5. Hypothesized model of the adapted WPPSI-III-SP for the Peruvian cohort at age 48 months, referred to as Model 5-AltOlder model for analyses. This model proposes the loading of Coding on the factor representing the Performance Intelligence Quotient (PIQ), based on the findings of Ramirez and Rosas (2007). VIQ = Verbal Intelligence Quotient; FSIQ = Full Scale Intelligence Quotient.
Chapter 3: Method
Sample

A total of 188 children (101 boys) completed the younger version only (10.63%
of children), older version only (18.09%), or both versions (71.28%) of the adapted
WPPSI-III-SP. Three children were missing gender data. Among this cohort of children,
on average, the children’s mothers completed 7.77 years of education (SD = 2.68). One
hundred fifty-six children, aged 35 to 47 months (M = 36.38, SD = 3.04), completed the
younger version of the assessment. For the older version, 168 children completed the
test, all aged 48 months, with the exception of one child age 49 months and one child age
36 months. For brevity, these will be referred to as the younger and older cohorts,
respectively; however, the cohorts overlap substantially (i.e., the 71.28% noted above).
These children were drawn from a larger sample of children participating in the
Interactions of Malnutrition and Enteric Infections: Consequences for Child Health and
Development (MAL-ED) project overseen by the Foundation for the National Institutes
of Health and the Fogarty International Center. All participants lived in rural
communities in northeastern Peru. A review of this cultural context is provided in the
Introduction. Children were eligible to participate in the longitudinal study if their mother
was older than 16 years of age, if no other child in the household participated in the
study, if they were healthy (e.g., no congenital diseases or severe neonatal diseases
requiring prolonged hospitalization), if the family had no plans to move away from the
community within 6 months, and if the child was not part of a multiple pregnancy.
Measures
Adapted Wechsler Preschool and Primary Scale of Intelligence - Spanish
Version (adapted WPPSI-III-SP). The cognitive ability of each child was measured
through an adapted version of the Wechsler Preschool and Primary Scale of Intelligence -
Spanish Version (Wechsler, 2009). A review of the original WPPSI-III and the WPPSI-
III-SP is presented in Chapter 2. For use with the Peruvian sample, adaptations were
made to the WPPSI-III-SP by researchers from the Department of International Health at
Johns Hopkins University. These adaptations included altering pictures to be culturally
appropriate and rewording instructions to be appropriate for the dialect of Spanish spoken
in Peru.
For example, for an item on the Picture Concepts subtest, the picture of a
capybara replaced the picture of a squirrel. As squirrels are not native to the Peruvian
jungle environment, the children would have been unfamiliar with the animal and the
item may have been unfairly difficult for them to answer. In another example on this
subtest, pictures of dogs that were more realistic and recognizable to the children
replaced the original cartoon pictures. In an example of changes made for Peruvian Spanish, the item
asking children to define "Swing" on the Vocabulary subtest was altered. The word used
on the WPPSI-III-SP signifies both the object (e.g., a playground swing) and a movement
(e.g., to swing back and forth) in Castilian Spanish. In Peruvian Spanish, however, the
word only represents the object. Another word was substituted asking the child to
describe the movement of swinging (A. Orbe, personal communication, July 14, 2014).
For the Block Design subtest, children earned a score of 0, 1, or 2 for
each item. Scoring for this subtest depended not only on whether the child
completed the construction within a specified time limit but also on whether the
child required one or two trials to do so. Children could earn either a score of 0 (incorrect
answer) or 1 (correct answer) on each item of the Information, Receptive Vocabulary,
Word Reasoning, Matrix Reasoning, and Picture Concepts subtests. Scores on the Object
Assembly subtest were based on the number of junctures (i.e., the place where two
adjacent puzzle pieces meet) correctly joined, with a possible per item score ranging from
0 to 5 points. For the Coding subtest, children received one point for each correctly
paired symbol and shape. Finally, for each item on the Vocabulary subtest, children could
earn 0, 1, or 2 points, with more sophisticated and specific definitions earning a higher
score. Possible score ranges for the subtests were as follows: (1) Block Design: 0 - 40; (2)

[Table 3 note: n = 147. WPPSI-III-SP = Wechsler Preschool and Primary Scale of Intelligence (Third Edition) - Spanish Version.]

The simultaneous test of multivariate skewness and kurtosis was statistically
significant, χ²(2) = 52.28, p < .001. However, the relative multivariate kurtosis was 1.08,
indicating that the multivariate kurtosis was 8% larger than expected under a multivariate
normal distribution. Multivariate kurtosis, therefore, was considered mildly non-normal (per
Kline, 1998). The χ² tests of simultaneous univariate skewness and kurtosis were also
statistically significant for all variables at the .05 level, with the exception of the Block
Design subtest. All variables with statistically significant simultaneous univariate
skewness and kurtosis had skewness values that were significantly different
from normal (p < .05). Object Assembly was the only subtest with a kurtosis value
significantly different from normal. In analyzing the skewness and kurtosis values
presented in Table 3, univariate skewness fell in the moderate range for Object Assembly
and in the mild range for all other variables. Univariate kurtosis was considered mild for
all variables. In sum, the data were considered to be mildly non-normal, justifying the use
of robust tests in the analyses.
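The relative multivariate kurtosis used above is Mardia's multivariate kurtosis statistic divided by its expected value under multivariate normality, p(p + 2). A sketch of this computation (assuming the standard Mardia formula; SEM software may apply small-sample corrections that differ slightly):

```python
import numpy as np

def relative_multivariate_kurtosis(X: np.ndarray) -> float:
    """Mardia's multivariate kurtosis divided by its expectation p(p + 2)
    under multivariate normality; values near 1 indicate normal kurtosis."""
    n, p = X.shape
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False, bias=True)  # ML (biased) covariance
    inv_cov = np.linalg.inv(cov)
    # Squared Mahalanobis distance of each observation
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    b2p = np.mean(d2 ** 2)  # Mardia's b_{2,p}
    return b2p / (p * (p + 2))
```

For multivariate normal data the ratio is close to 1.0; the 1.08 reported above corresponds to kurtosis 8% above that baseline.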
Intercorrelations were computed to determine the relationships among the various
subtest scores; all values are reported in Table 3. All subtests were positively
correlated with each other. The correlation between Receptive Vocabulary and
Information fell in the moderate range; all other correlations were weak. Scores from the
Receptive Vocabulary subtest were found to have good reliability, and Block Design
subtest scores demonstrated acceptable reliability, whereas the Information
and Object Assembly subtest scores revealed low and poor reliability, respectively (see
Table 3). Overall, the average reliability coefficient across all four subtests was low
(Cronbach's α = 0.61).
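The reported coefficient treats the four subtest scores as items in a single scale. A minimal sketch of Cronbach's alpha, assuming a complete examinee-by-subtest score matrix:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a score matrix
    (rows = examinees, columns = subtests or items)."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```

When the columns are perfectly parallel (identical scores), alpha equals exactly 1.0; weakly intercorrelated subtests, as in Table 3, pull the coefficient down toward values like the .61 reported here.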
Model 1-NormYoung. The model based on the normative structure appears to
provide a reasonable fit to the data. Selected fit indices are presented in Table 6. The fit
indices of CFI, NNFI, and IFI fall above .95, indicating good fit. The RMSEA falls below
.06, also indicating good fit. Finally, the Satorra-Bentler Scaled Chi-Square is
nonsignificant, indicating good overall fit, χ2SB = 0.092, p = 0.762, df = 1.
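The global-fit criteria applied throughout this chapter (CFI, NNFI, and IFI above .95 and RMSEA below .06 for good fit) can be expressed as a small helper. This is purely illustrative of the decision rule, not part of the original analyses:

```python
def good_global_fit(cfi, nnfi, ifi, rmsea):
    """True when all incremental indices exceed .95 and RMSEA is
    below .06 -- the 'good fit' criteria used in this study."""
    return all(v > 0.95 for v in (cfi, nnfi, ifi)) and rmsea < 0.06
```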
Regarding component fit, parameter and standard error estimates are presented in
Table 4. All completely standardized factor loadings are within range and statistically
significant (z-test statistics > 1.96; see Figure 6). Furthermore, as expected, all
path coefficients were positive (i.e., a positive relationship between indicator and latent
variable). The standard errors are reasonable as they are smaller than the standard
deviations of the indicator variables. All standardized residuals are acceptable (< |2.58|).
No modification indices were greater than 3.84, and all standardized expected change
values were small, suggesting that no paths should have been freed. Measurement model
R2 values were poor (< .36) for Object Assembly (R2 = .21), Block Design (R2 = .33), and
Receptive Vocabulary (R2 = .32), and moderate for Information (R2 = .52).
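These R2 values follow directly from the completely standardized loadings: with a single loading per indicator and no correlated uniquenesses, an indicator's R2 is its squared loading. A quick illustrative check (small discrepancies from the tabled values reflect rounding of the reported loadings):

```python
# Completely standardized loadings reported for Model 1-NormYoung
loadings = {
    "Receptive Vocabulary": 0.57,
    "Information": 0.73,
    "Block Design": 0.57,
    "Object Assembly": 0.46,
}
# R^2 = squared standardized loading
r_squared = {name: round(val ** 2, 2) for name, val in loadings.items()}
```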
Table 4

Parameter and Standard Error Estimates for Model 1-NormYoung

Model Parameters    Standardized Estimate    Unstandardized Estimate    Standard Error
Loadings on VIQ
Receptive Vocabulary .57 2.33a 2.02
Information .73 1.67* 0.63
Loadings on PIQ
Block Design .57 2.25a 1.99
Object Assembly .46 0.78* 0.41
Loadings on FSIQ
VIQ .96 0.43a 0.21
PIQ .99 0.45a 0.33
Note. Table values are Maximum Likelihood estimates. VIQ = Verbal Intelligence Quotient; PIQ = Performance Intelligence Quotient; FSIQ = Full Scale Intelligence Quotient. *p < .05. a fixed factor loading.
Figure 6. Completely standardized factor loadings for Model 1-NormYoung. *p < .05. a fixed factor loading.

Model 2-OneFactorYoung. The one-factor model based on the research of
Contreras and Rodriquez (2013) appears to provide a reasonable fit to the data. Selected
fit indices are presented in Table 6. The fit indices of CFI, NNFI, and IFI fall above .95,
indicating good fit. The RMSEA falls below .06, also indicating good fit. Finally, the
Satorra-Bentler Scaled Chi-Square is nonsignificant, indicating good overall fit, χ2SB =
0.191, p = .91, df = 2.
Regarding component fit, parameter and standard error estimates are presented in
Table 5. All completely standardized factor loadings are within range and statistically
significant (z-test statistics > 1.96; see Figure 7). As expected, all path coefficients were
positive (i.e., a positive relationship between indicator and latent variable). The standard
errors are reasonable as they are all smaller than the standard deviations of the indicator
variables. All standardized residuals are acceptable (< |2.58|). No modification indices
were greater than 3.84, and all standardized expected change values were small,
suggesting that no paths should have been freed. Measurement model R2 values were
poor (< .36) for Receptive Vocabulary (R2 = .32), Block Design (R2 = .31), and Object
Assembly (R2 = .20), and moderate for Information (R2 = .52).
Table 5
Parameter and Standard Error Estimates for Model 2-OneFactorYoung
Model Parameters    Standardized Estimate    Unstandardized Estimate    Standard Error
Loadings on FSIQ
Receptive Vocabulary .57 2.33* 0.32
Information .72 1.65* 0.25
Block Design .55 2.17* 0.35
Object Assembly .45 0.76* 0.14
Note. Table values are Maximum Likelihood estimates. FSIQ = Full Scale Intelligence Quotient. *p < .05
Figure 7. Completely standardized factor loadings for Model 2-OneFactorYoung. *p < .05.

Table 6

Selected Fit Indices for Younger Cohort Models

Note. χ2SB = Satorra-Bentler Chi-Square; df = degrees of freedom; RMSEA = Root Mean Square Error of Approximation; CI90 = 90% Confidence Interval for RMSEA; CFI = Comparative Fit Index; NNFI = Non-Normed Fit Index; IFI = Incremental Fit Index.

Model comparisons. To compare the overall fit of Model 1-NormYoung to
Model 2-OneFactorYoung, a Satorra-Bentler Chi-Square difference test was conducted.
As summarized in Table 7, results suggest that Model 1-NormYoung and Model 2-
OneFactorYoung are equivalent.
Table 7
Satorra-Bentler Chi-Square Difference Test
χ2SB    df
Model 2-OneFactorYoung    0.188    2
Model 1-NormYoung    0.092    1
Difference    0.096    1

Note. Difference is statistically significant if greater than 3.84.
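When scaling correction factors are available, the Satorra-Bentler scaled difference statistic is computed from the ordinary ML chi-squares of the nested models; with equal correction factors it reduces to the simple difference of scaled values used in Table 7. The sketch below is illustrative; the correction factors c0 and c1 are hypothetical inputs, as they are not reported here:

```python
def sb_scaled_chi_square_difference(t0, c0, df0, t1, c1, df1):
    """Satorra-Bentler scaled chi-square difference test.

    t0, t1: ordinary ML chi-squares for the nested (more constrained)
    and comparison models; c0, c1: their scaling correction factors;
    df0 > df1. Returns the scaled difference statistic and its df."""
    delta_df = df0 - df1
    # correction factor for the difference test
    cd = (df0 * c0 - df1 * c1) / delta_df
    return (t0 - t1) / cd, delta_df
```

The resulting statistic is referred to a chi-square distribution with df equal to the difference in model degrees of freedom (here, 1), with 3.84 as the .05 critical value.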
Older Cohort
Preliminary analyses and descriptive statistics. Preliminary analyses were
conducted to examine the scores from the older cohort for outliers and missing data. The
data from one child aged 36 months were deleted listwise because this age fell outside the
range of the assessment. Although some cases had scores greater than three standard
deviations from the mean, no Mahalanobis distance scores were significant (CV = 24.32
at p = .001). The final sample size for data analysis was 167. Table 8 presents the
descriptive statistics (i.e., intercorrelations, means, skew, and kurtosis) for this dataset.
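The multivariate outlier screen described above flags cases whose squared Mahalanobis distance from the sample centroid exceeds a chi-square critical value with df equal to the number of variables (with seven subtests, the p = .001 critical value is approximately 24.32). A minimal sketch, assuming the subtest scores are in a NumPy array; it is illustrative only:

```python
import numpy as np

def flag_multivariate_outliers(X, critical_value=24.32):
    """Flag rows whose squared Mahalanobis distance exceeds the
    chi-square critical value (here the df = 7, p = .001 value
    of roughly 24.32)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    s_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centered, s_inv, centered)
    return d2 > critical_value
```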
Note. n = 167. WPPSI-III-SP = Wechsler Preschool and Primary Scale of Intelligence (Third Edition) - Spanish Version.

The simultaneous test of multivariate skewness and kurtosis was statistically
significant, χ2 (2) = 474.35, p < .001. The relative multivariate kurtosis was 1.52,
indicating that multivariate kurtosis was 52% larger than that of a multivariate normal
distribution. The χ2 tests of simultaneous univariate skewness and kurtosis were also
statistically significant for all variables at the .05 level. Regarding univariate skewness
and kurtosis, all variables had skewness and kurtosis values that were significantly
different from normal (p < .05), with two exceptions. The skewness value for Information
and the kurtosis value for Vocabulary were not significant. In analyzing the skewness and
kurtosis values presented in Table 8, univariate skewness fell in the moderate range for
all variables. Univariate kurtosis was considered mild for all variables, with the exception
of the Word Reasoning subtest. This subtest demonstrated moderate kurtosis. In sum, the
data were considered to be non-normal, justifying the use of robust tests in the analyses.
Intercorrelations were conducted to determine the relationships among the various
subtest scores and all values are reported in Table 8. All subtests were positively
correlated with each other. Correlations between Matrix Reasoning and Picture Concepts,
Information and Word Reasoning, Information and Vocabulary, and Vocabulary and
Word Reasoning fell in the moderate range, whereas all other correlations fell in the
weak to very weak ranges. Scores from the Vocabulary, Picture Concepts, and Coding
subtests were found to have good reliability. Scores from the Block Design, Information,
Matrix Reasoning, and Word Reasoning subtests were found to have acceptable
reliability. Overall, the average reliability estimate across subtest scores fell in the
acceptable range (Cronbach's α = .75).
Model 3-NormOlder. The model based on the normative structure appears to
provide a poor fit to the data. Initial analyses yielded a non-positive definite matrix for
latent variables, with a negative error variance for the PIQ factor. Gerbing and
Anderson (1987) studied three methods for respecification of initial models with one
negative estimate and small sample sizes. The authors suggested fixing the variance of
the improper parameter to a negligible number. This method is also noted by Brown
(2015). As such, the variance of PIQ was set to 0.001. Selected fit indices are presented
in Table 10. The fit indices of CFI, NNFI, and IFI fall below .95, indicating inadequate
fit. The RMSEA falls above .08, also indicating inadequate fit. Finally, the Satorra-
Bentler Scaled Chi-Square is significant, indicating poor overall fit, χ2SB = 38.79, p =
.007, df = 14. Parameter and standard error estimates are presented in Table B1 in
Appendix B. While some modification indices were greater than 3.84, these
modifications to the model were not theoretically supported.
Model 4-OneFactorOlder. The one-factor model based on the research of
Contreras and Rodriquez (2013) appears to provide a poor fit to the data. Selected fit
indices are presented in Table 10. The fit indices of CFI, NNFI, and IFI fall below .95,
indicating inadequate fit. The RMSEA falls above .08, also indicating inadequate fit.
Finally, the Satorra-Bentler Scaled Chi-Square is significant, indicating poor overall fit,
χ2SB = 37.87, p < .001, df = 14. Parameter and standard error estimates are presented in
Table B2 in Appendix B.
Of the modification indices (MI) greater than 3.84, the MIs suggesting correlated
errors between Coding and Block Design and between Vocabulary and Word Reasoning
were the largest (20.78 and 8.62, respectively) and the most theoretically supported.
Coding and Block Design rely on recognizing and matching shapes. Vocabulary and
Word Reasoning draw on word knowledge and definition skills. The selected fit statistics
for this reduced model are also presented in Table 10. While the IFI and CFI
fall at the threshold of .95, the NNFI falls below it. The RMSEA is equal to .08,
indicating acceptable fit. However, the Satorra-Bentler Scaled Chi-Square is significant,
indicating potentially unacceptable overall fit, χ2SB = 21.534, p = .04, df = 12. Taking into
account all fit evidence, it appears that, overall, correlating these errors provided an
adequate fitting model.
Regarding component fit, parameter and standard error estimates are presented in
Table 9. All completely standardized factor loadings are within range and statistically
significant (z-test statistics > 1.96; see Figure 8). As expected, all path coefficients were
positive (i.e., a positive relationship between indicator and latent variable). The standard
errors are reasonable as they are all smaller than the standard deviations of the indicator
variables. All standardized residuals are acceptable (<|2.58|). Measurement model R2
values were poor (< 0.36) for Information (R2 = 0.21), Block Design (R2 = 0.24), Coding
(R2 = 0.18), and Vocabulary (R2 = 0.18), and moderate for Picture Concepts (R2 = 0.42),
Word Reasoning (R2 = 0.49), and Matrix Reasoning (R2 = 0.51).
Table 9
Parameter and Standard Error Estimates for Model 4-OneFactorOlder with Modifications
Model Parameters    Standardized Estimate    Unstandardized Estimate    Standard Error
Loadings on FSIQ
Picture Concepts .65 2.08* 1.07
Information .46 1.57* 1.41
Block Design .49 1.05* 0.58
Word Reasoning .70 1.26* 0.49
Vocabulary .42 1.07* 1.04
Matrix Reasoning .72 2.88* 1.53
Coding .42 0.86* 0.49
Note. Table values are Maximum Likelihood estimates. FSIQ = Full Scale Intelligence Quotient. *p < .05.
Figure 8. Completely standardized factor loadings for modified Model 4-OneFactorOlder. *p < .05.

Model 5-AltOlder. The alternative second-order model based on the research of
Ramirez and Rosas (2007) appears to provide a poor fit to the data. Similar to Model 3-
NormOlder, initial analyses yielded a non-positive definite matrix for latent
variables, with a negative error variance for the VIQ factor. As such, the variance of
VIQ was set to 0.001 (Gerbing & Anderson, 1987). Selected fit indices are presented in
Table 10. The fit indices of CFI, NNFI, and IFI fall below .95, indicating inadequate fit.
The RMSEA falls above .08, also indicating inadequate fit. Finally, the Satorra-Bentler
Scaled Chi-Square is significant, indicating poor overall fit, χ2SB = 36.83, p < .001, df =
13. Parameter and standard error estimates are presented in Table B3 in Appendix B.
While some modification indices were greater than 3.84, these modifications to the model
were not theoretically supported.
Table 10
Selected Fit Indices for Older Cohort Models
χ2SB df RMSEA (CI90) CFI NNFI IFI
Model 3-NormOlder 38.70 14 0.11(0.08 - 0.15) .88 .83 .89
Note. χ2SB = Satorra-Bentler Chi-Square; df = degrees of freedom; RMSEA = Root Mean Square Error of Approximation; CI90 = 90% Confidence Interval for RMSEA; CFI = Comparative Fit Index; NNFI = Non-Normed Fit Index; IFI = Incremental Fit Index.

Convergent Validity Analyses
Of the 147 children in the younger cohort who were included in the sample for
confirmatory factor analyses, 141 of these children completed the adapted Bayley-III
Cognitive subtest at 24 months. Among the older cohort, 158 children completed the
adapted Bayley-III and the adapted WPPSI-III-SP. Table 11 presents the descriptive
statistics (i.e., means, skew, kurtosis, and coefficient alphas) for these samples. In
addition, 134 children completed the adapted WPPSI-III-SP at 36 and 48 months. Data
for eight children from this sample were deleted due to the child's age being outside the
range of the assessment. As such, the final sample size for examining the direction and
strength of the relationship between scores at each time point was 126.
Cognitive subtest scores from the adapted Bayley-III at 24 months were
significantly and positively correlated (r = .21; p < .05) with scores from the adapted
WPPSI-III-SP at 36 months. This correlation was weak. For the older cohort, cognitive
subtest scores from the adapted Bayley-III at 24 months were significantly and positively
correlated (r = .28; p < .05) with scores from the adapted WPPSI-III-SP at 48 months.
This relationship was also weak. Regarding the cohort of children who completed the
adapted WPPSI-III-SP at 36 and 48 months, scores from these time points were
significantly, positively, and moderately correlated (r = .55; p < .05).
Table 11
Descriptive Statistics and Reliability Estimates for Adapted WPPSI-III-SP Total Scores and Adapted Bayley-III Scores (Younger and Older Cohorts)

            Bayley-III (Younger Cohort)    Bayley-III (Older Cohort)    WPPSI-III-SP (Younger Cohort)    WPPSI-III-SP (Older Cohort)
M 4.66 6.37 26.06 39.11
SD 2.98 2.35 8.64 12.66
Skew 0.45 0.01 -0.19 0.82
Kurtosis -0.57 -0.39 -0.03 1.11
Note. n =141 for younger cohort; n = 158 for older cohort.
Chapter 5: Discussion
The primary purpose of this study was to examine the construct validity of the
WPPSI-III-SP adapted for use among children in rural Peru. Using confirmatory factor
analyses, data from a younger and older cohort of children were fitted to models based on
the normative structure of the WPPSI-III-SP. If the adapted version measured
intelligence in the same manner as the original version, it was hypothesized that these
normative models would provide an adequate fit for the scores from the Peruvian cohorts.
The process of adapting a test for use in a different culture, however, is complicated
and must take into account many cultural, developmental, construct, and adaptation
considerations. For example, the process considers differences as to how the construct in
question is expressed across cultures, differences in language that may affect how items
are interpreted across cultures, and differences in exposure to the assessment's tasks (e.g.,
completing pencil-and-paper tasks) across cultures.
Given these adaptation considerations, the present study also proposed to
determine if another factor structure would provide a better fit to the scores. Based on the
research of others (Contreras & Rodriquez, 2013; Ramirez & Rosas, 2007) conducting
test adaptation research in Colombia and Chile, three additional factor structures were
proposed. For both the younger and older cohorts, a one-factor model was proposed. For
the older cohort, an alternative to the normative structure was proposed in which the
Coding subtest loaded on the Performance factor instead of on the overall Full Scale
Intelligence Quotient factor. As the cultures of Colombia and Chile may more closely
resemble the Peruvian culture (in comparison to Spanish culture), it was hypothesized
that the models based on the research in South America would provide a better fit to the
data than the models based on the normative data. Finally, to examine evidence of
convergent validity the total subtest scores from the younger cohort and from the older
cohort were correlated with scores from another adapted cognitive ability measure
completed by the children at 24 months. It was expected that the scores from these
measures of cognitive ability would be positively correlated. A supplemental analysis
was conducted to examine the direction and strength of the relationship between scores
on the adapted WPPSI-III-SP from children who completed the assessment at 36 and 48
months.
Fit of the Hypothesized Models
Younger cohort. For the younger cohort, both the model based on the normative
data (Model 1-NormYoung) and the one-factor model based on the research of Contreras
and Rodriquez (2013; Model 2-OneFactorYoung) appear to provide an adequate fit to the
data. Furthermore, Model 1-NormYoung appeared to fit the data as well as Model 2-
OneFactorYoung. As such, the first hypothesis was supported as Model 1-NormYoung
provided an adequate fit for the data. The second hypothesis, however, was not supported
based on the confirmatory factor analyses. In comparing the fit of Model 2-
OneFactorYoung and Model 1-NormYoung, the former did not provide a better fit
to the data than the latter.
Regarding component fit, all standardized estimates were significant across
models. Scores from the Information subtest demonstrated the strongest relationship with
VIQ and with the latent global intelligence factor, while Receptive Vocabulary, Block
Design, and Object Assembly were moderately related to the latent factors. It should be
noted that measurement model R2 values were poor for three out of four subtests
(Receptive Vocabulary, Block Design, and Object Assembly), indicating that much of the
variance associated with these subtests was left unexplained. Finally, the second-order
factor loadings were larger than the subtest loadings on the first-order factors,
suggesting the verbal and performance factors were strongly influenced by overall
cognitive ability.
The relationships between subtest scores demonstrated potential support for the
one-factor structure. In looking at evidence for convergent validity, the scores from the
VIQ subtests (Receptive Vocabulary and Information) were moderately correlated.
However, the relationship between scores from the PIQ subtests (Object Assembly and
Block Design) was weak. Furthermore, little evidence was present for discriminant
validity. Information subtest scores demonstrated similarly strong relationships with
scores from the Block Design subtest and scores from the Receptive Vocabulary subtest.
Scores from the Block Design subtest were more strongly related to scores from the VIQ
subtests as compared to Object Assembly subtest scores. In other words, scores did not
demonstrate consistently stronger relationships among subtest scores from the same
factor (convergent validity) and consistently weaker relationships among subtest scores
from differing factors (discriminant validity). As such, this pattern may provide evidence
of the superiority of a one-factor structure with an overall cognitive ability factor.
Reliability. In exploring the validity of an assessment it is imperative to also look
at the test's scores' reliability, as reliability is the foundation of validity (Sattler, 2008).
The reliability estimate of Object Assembly subtest scores is questionable. As noted
earlier, assessments of young children often struggle to produce highly reliable scores
(Alfonso & Flanagan, 1999; Sattler, 2008). In addition, the test-taking behavior
of young children adds potential error and inconsistency to scores (Frisby, 1999b). Young
children have shorter attention spans, less expressive language, and less exposure to the
Note. WPPSI = Wechsler Preschool and Primary Scale of Intelligence. VIQ = Verbal Intelligence Quotient; PIQ = Performance Intelligence Quotient; FSIQ = Full Scale Intelligence Quotient; GLC = General Language Composite; PSQ = Processing Speed Quotient. Check marks indicate the inclusion of a specific subtest into the composite score. Parentheses indicate a supplemental subtest. Reviewed tests are based on US normative sample.
Appendix B
Parameter Estimates and Standard Errors for Models 3, 4, and 5
Table B1
Parameter and Standard Error Estimates for Model 3-NormOlder
Model Parameters    Standardized Estimate    Unstandardized Estimate    Standard Error
Loadings on VIQ
Vocabulary .56 1.38* 0.81
Information .39 1.27a 1.45
Word Reasoning .81 1.46* 0.38
Loadings on PIQ
Block Design .56 1.19* 0.54
Matrix Reasoning .72 2.86* 1.28
Picture Concepts .61 1.94* 1.10
Loadings on FSIQ
VIQ .79 -- 0.32
PIQ -- -- --
Coding .52 1.06* 0.44
Note. Table values are Maximum Likelihood estimates. VIQ = Verbal Intelligence Quotient; PIQ = Performance Intelligence Quotient; FSIQ = Full Scale Intelligence Quotient. *p < .05. a fixed factor loading.
Table B2

Parameter and Standard Error Estimates for Model 4-OneFactorOlder

Model Parameters    Standardized Estimate    Unstandardized Estimate    Standard Error
Loadings on FSIQ
Picture Concepts .61 1.96* 0.29
Information .45 1.56* 1.43
Block Design .54 1.16* 0.21
Word Reasoning .72 1.30* 0.42
Vocabulary .51 1.28* 0.87
Matrix Reasoning .68 1.07* 0.99
Coding .49 1.00* 0.45
Note. Table values are Maximum Likelihood estimates. FSIQ = Full Scale Intelligence Quotient. *p < .05.
Table B3
Parameter and Standard Error Estimates for Model 5-AltOlder
Model Parameters    Standardized Estimate    Unstandardized Estimate    Standard Error
Loadings on VIQ
Vocabulary .56 0.87* 0.80
Information .46 0.97* 1.42
Word Reasoning .80 0.89* 0.36
Loadings on PIQ
Block Design .56 1.20* 0.53
Matrix Reasoning .72 2.88* 1.25
Picture Concepts .62 1.97a 1.09
Coding .52 1.06* 0.44
Loadings on FSIQ
VIQ -- -- --
PIQ .51 -- 0.22
Note. Table values are Maximum Likelihood estimates. VIQ = Verbal Intelligence Quotient; PIQ = Performance Intelligence Quotient; FSIQ = Full Scale Intelligence Quotient. *p < .05. a fixed factor loading.
VITA
Abigail E. Crimmins 96 Sunnyside Dr. Elmira, NY 14905 [email protected]
607-215-3252
Education 2011 – present The Pennsylvania State University M.S. (August 2013), Ph.D. (exp. August, 2016) School Psychology 2005 – 2009 Hamilton College B.A. (May 2009; GPA – 3.85) Psychology and Hispanic Studies
Research Experience • Research Assistant, Penn State Department of Special Education, Summer 2014 • Research Assistant, LEGACY Project, Summer 2013 • Predissertation Research Project, Student-Teacher Relationships among Children with Autism:
Contribution of Students’ Social Skills, August 2013 • Honors Thesis, The Use of Thought Suppression to Cope with Ego-Threat among Those with
Fragile Self-Esteem, May 2009 • Research Assistant, Department of Psychology, Hamilton College, Summer 2007
Clinical Experience • Doctoral School Psychology Intern, Letchworth Central School District, 2015 – present • CEDAR Clinic Mobile Clinician, Juniata County School District, Spring 2015 • CEDAR Clinic Student Supervisor, Penn State CEDAR Clinic, 2014 – 2015 • School Psychology Practicum Intern, State College Area School District, 2013 - 2014 • School Psychology Student Clinician, Penn State CEDAR Clinic, 2011 – 2014
Teaching Experience • Graduate Teaching Assistant, Human Development and Family Studies, 2011 – 2014 • Clinical Graduate Assistant, School Psychology Program, 2012 – 2014 • Statistics Teaching Assistant, Psychology Department, 2007 – 2009
Work Experience • Respite Care Specialist, 2011 – present • Level II Teacher, New England Center for Children, 2009 – 2011 • Undergraduate Counselor Intern for Children with ADHD, Center for Children and Families, 2008
Publications and Presentations Woika, S. A., & Crimmins, A. E. (2014, October). Practical Guidance for Supervisors of School Psychologists. Presentation at the meeting of the Association of School Psychologists of Pennsylvania, State College, PA. Crimmins, A. E. (2014, February). Student-Teacher Relationships among Children with Autism: Contribution of Students’ Social Skills. Poster presented at the meeting of the National Association of School Psychologists, Washington DC. Clark, T. C., Crimmins, A. E., & Leposa, B. (2012, February). Reading first, or is it? Paper presented at the meeting of the National Association of School Psychologists, Philadelphia, PA. Borton, J. L. S., Crimmins, A. E., Ashby, R. S., & Ruddiman, J. F. (2012). How do individuals with fragile self-esteem cope with intrusive thoughts following ego threat? Self and Identity, 11, 16 – 35.
Awards and Honors • Membership Award, Pennsylvania Psychologists Association, June 2013