

Evaluating Effect of Reflex® on Math Fact Fluency in Grades 2 & 3

David I. Rudel*

March 13, 2017

1 Study Characteristics

1.1 Intervention Condition

Reflex is an online, game-based system for developing math fact fluency in schoolchildren. It is provided by ExploreLearning, a division of Cambium Learning. Reflex maintains an internal student model to facilitate adaptive instruction and individualized practice on math facts. It uses a fact-family approach, teaching groups of related facts together. For example, a student may receive coaching on 2+6, 6+2, 8-2, and 8-6 on the same day. A student's daily work in Reflex generally comprises 4 phases:

1. An assessment component monitoring progress, posed in a game environment that minimizes distraction

2. A coaching session where the student learns a new set of related facts or receives remedial work on a previously learned set

3. A practice game combining newly learned facts with facts the student is developing

4. Intense practice under time pressure on facts for which the student has demonstrated at least partial fluency
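The fact-family grouping mentioned before the list of phases can be illustrated with a small sketch. The function name `fact_family` is hypothetical and is not taken from Reflex; it only shows how the four related facts in the 2+6 example arise from one pair of addends.

```python
def fact_family(a, b):
    """Return the addition/subtraction fact family for the addends a and b."""
    total = a + b
    facts = [f"{a}+{b}", f"{b}+{a}", f"{total}-{a}", f"{total}-{b}"]
    # Doubles such as 4+4 would otherwise repeat; keep first occurrences only.
    return list(dict.fromkeys(facts))


print(fact_family(2, 6))
```

For the addends 2 and 6 this yields the same four facts named in the text.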

* Senior Principal Data Scientist, ExploreLearning

The assessment component uses a combination of several games, some of which present facts aligned vertically while others present facts aligned horizontally. The coaching session uses a cover-copy-compare strategy to introduce facts, followed by a fill-in-the-blank session where the student completes an open fact sentence with one or two missing terms. The third component uses horizontally aligned facts and provides interactive feedback on missed facts. The intense practice component differs from the rest in that the student is given multiple facts and chooses one to answer. This choice provides agency to the student, as it affects outcomes in the game (e.g., the fact chosen determines which direction an on-screen character moves).

Reflex has individualized practice recommendations. The median total time in the system for second and third graders to complete these recommendations is 15-16 minutes per day, with earlier days generally taking longer than later ones. Students do not always meet the daily practice target due to lack of time or limited technological resources. Once the recommended practice is complete for a day, an on-screen indicator illuminates and the student is allowed to spend time on non-practice, motivational aspects of the system, such as using tokens to buy new clothes for their avatar.

Reflex has been sold commercially since 2011. It is delivered on an annual subscription basis to thousands of schools. A time-limited free trial is available, and interested teachers can apply for grants providing free access for one year. Subscriptions are sold at teacher-, site-, and district-wide levels.

Participating teachers assigned to the intervention condition undertook a standard 90-minute training webinar acquainting them with the system and best practices. Approximately 50% of all new Reflex subscriptions included such training in spring 2016.

Students use Reflex directly; no teacher involvement occurs within a Reflex session. Teachers support students indirectly by encouraging students and cultivating their enthusiasm, including the distribution of milestone certificates provided by the system. Teachers also, of course, need to schedule time for students to play Reflex and supervise student usage. Reflex provides teachers with reports showing the progress and usage of each student.

Reflex provides three options for the pool of facts a student learns:

• Addition and Subtraction 0-10: Addition facts whose terms are within 0-10 and their associated subtraction facts

• Multiplication and Division 0-10: Multiplication facts whose factors are in the range 0-10 and their associated division facts

• Multiplication and Division 0-12: Multiplication facts whose factors are in the range 0-12 and their associated division facts

Students assigned to the intervention condition began in the addition/subtraction assignment if they were in second grade and in the multiplication/division 0-10 assignment if they were in third grade. Teachers had the ability to switch students on an individual basis to other assignments at their own discretion. Sixteen of the 37 second graders using Reflex were switched into multiplication/division before the posttest. Thus, some of their time spent in Reflex was dedicated to above-grade-level items that were not on the posttest.

The recommended usage for Reflex is 3 days per week. The four teachers achieved weekly usages of 2.6, 3.3, 3.4, and 1.5 days. These values include all days on which a login was made, even if the student was practicing facts outside the range of testing.

The average usage across all students was 2.7 days/week. The median time spent in Reflex during the study was 59 minutes a week, which includes time spent in non-instructional aspects of the system, such as browsing an in-product store to buy virtual items using tokens earned in games, or cases where a student logged in from home and forgot to log off.

Reflex requires individual accounts with individual passwords. A user in the comparison group could only have used Reflex by logging into the account of another student.

Post-survey questionnaires were given to all teachers. Two teachers from the intervention condition returned questionnaires, both indicating they relied on Reflex as their primary means of developing math fact fluency during the course of the study.

1.2 Comparison Condition

This study used a business-as-usual comparison condition. Math fluency in general, and math fact fluency in particular, are required by the Florida Math Standards and Common Core State Standards for grades 2 and 3. Florida Math Standard MAFS.2.OA.2.2 and Common Core State Standard 2.OA.B.2 have identical wordings: "Fluently add and subtract within 20 using mental strategies. By end of Grade 2, know from memory all sums of two one-digit numbers." Similarly, Florida Math Standard MAFS.3.OA.3.7 and Common Core State Standard 3.OA.C.7 read: "Fluently multiply and divide within 100, using strategies such as the relationship between multiplication and division (e.g., knowing that 8 × 5 = 40, one knows 40 ÷ 5 = 8) or properties of operations. By the end of Grade 3, know from memory all products of two one-digit numbers."

Additionally, Common Core State Standards specify a number of general computational fluency requirements for which facility with math facts is foundational (Standards 2.NBT.B.5, 2.NBT.B.6, 2.NBT.B.7, 3.OA.A.3, 3.NBT.A.2). Florida's standards retain these requirements.

Post-survey questionnaires were given to all teachers. Teachers in the comparison condition were asked to describe the methods they used to develop math fact fluency and the time they spent on this goal. Two of the four teachers in the comparison condition returned these questionnaires. Their comments are provided below verbatim. We have also included data on the average fluency gain for each comparison class, including the two that did not return questionnaires.

The survey asked teachers how many hours a month were spent on developing math fact fluency. One teacher specified her answer in terms of minutes per day. The other wrote "20 hours" in the blank.

Table 1: Post-Study Comparison Group Responses

Grade   Average Gain   Strategies                                             Time Spent (Hours per month)
3       0.88           (Did not return survey)                                NA
3       0.81           flash cards, timed tests, repetition, math fact raps   20 hours
2       0.53           (Did not return survey)                                NA
2       -0.01          ten marks, flash cards, fast facts, center work        (time everyday) 10 minutes

Given the average fluency gains, we surmise that the other two teachers likely spent considerably more than 10 minutes a day on math fact fluency. The grade 3 responder had a group of high-achieving students, so it is possible homework was assigned on math fact fluency, as it is hard to imagine that 20 hours of class time a month was spent on the topic.

1.3 Setting

Teachers from a Florida school in a metropolitan area participated in this study. The demographic data provided by the school indicate it is a majority-minority school: 57% of its second- and third-grade students are Hispanic or Latino and 31% are Caucasian. The data provided indicate that 28% have low English proficiency and 17% are on free or reduced lunch.

1.4 Participants

The participating students are generally demographically similar to the full population of second- and third-grade students in terms of exceptional student status, race, gender, and economic status. In all cases, we relied on information received from the school.

2 Study Design and Analysis

2.1 Sample Formation

The school was identified by project personnel owing to its previous interest in Reflex. The school was offered a discount on a later subscription in exchange for participation. After logistical discussions to ensure that the school had sufficient technical resources to allow usage of a computer-delivered intervention, teachers were asked to volunteer for participation. Nine teachers initially volunteered to have their homeroom students take part. One of these homeroom classes was taught by another teacher who also taught her own homeroom, so the 9 classes were taught by 8 teachers.

The study was intended as a cluster randomized controlled trial, with the teachers from each grade randomly assigned to condition. Unfortunately, the design was compromised across grade 3 teachers. One teacher assigned to the comparison condition did not participate at all, and project personnel did not administer pretests to her class. Another teacher assigned to the treatment never used the intervention. There was zero uptake across her entire class. Review of email exchanges suggests three possible causes:

• The liaison between the head researcher and the school may have misrepresented the constraints of the study to the school. He reports that the school may have thought that an even number of teachers was required.

• Two gifted/high-achieving classes participated in the study. They were both inadvertently randomly assigned to the intervention. It was our intention to split these through block randomization, but we only received the pertinent data after selection and, due to a misreading of the correspondence, failed to catch the error, so no re-assignment was done. The school may have rectified our error themselves.

• It is possible that one of the teachers simply did not want to use the intervention. Project personnel doing the training reported that she attended but "had to leave early on".

Given the above, we are analyzing our study as a quasi-experimental design (QED) where the intact groups are the 8 classes for whom we have pretest data, and the intervention group comprises those classes where any uptake occurred prior to posttest.

Teachers were provided the opportunity to indicate any students who were not prepared for fact fluency instruction. Four third-grade students were identified, 3 from the intervention group and 1 from the comparison group. These students' data were not considered as part of the study.

One of the teachers taught two classes, one within the intervention group and another in the comparison group. All other teachers taught a single class.

Group Descriptions

Table 2 provides a description of the demographic character of the groups as well as their pretest results. The fluency score on the pretest combines both speed and accuracy, as described in the Fluency Score Calculation subsection.

2.2 Outcome Measures

2.2.1 Outcomes

One outcome was measured in the study: math fact fluency, which is both a key component of general math achievement and has been shown to be predictive of students' performance on general math achievement tests (see Validity subsection below). Fluency was measured using timed probes.

Table 2: Baseline Demographic Information

                                   Full Sample   Comparison Group   Intervention Group
Sample Size                        129           64                 65
Grade 3 Students (%)               48.1          48.4               47.7
Hispanic (%)                       54.2          53.1               55.3
Asian (%)                          17.1          18.8               15.4
White (%)                          22.5          21.9               23.1
Black (%)                          4.7           6.2                3.1
Multiracial (%)                    1.6           0.0                3
Low English Proficiency (%)        20.2          25.0               15.4
Exceptional Student (Gifted) (%)   25.6          25                 26.1
Free/Reduced Lunch (%)             20.2          21.9               18.4
Male (%)                           46.5          53.1               40
Age at pretest (years)             8.43          8.44               8.41
Pretest Accuracy (%)               92.3          93.2               91.4
Pretest Speed                      4.29          4.26               4.31
Pretest Score                      4.58          4.57               4.58

• Grade 2 students were tested on facts with terms, minuends, and subtrahends from 0 to 10 inclusive (i.e., from 0 + 0 up to 10 + 10 and their associated subtraction facts)

• Grade 3 students were tested on facts with factors, divisors, and quotients from 0 to 10 inclusive (i.e., from 0 × 0 to 10 × 10 and their associated division facts)

These match the requirements in the Common Core State Standards except that, owing to that document's idiosyncratic definition of "within X" (as in "addition within 20"), a literal reading of the work indicates that facts such as 20 - 17 and 91 ÷ 13 are considered within grade level. The Florida Math Standards do not provide a glossary, so it is unclear whether such facts would be in the scope of the wording of its standards.

2.2.2 Probes

Probes had a format similar to those in other Curriculum Based Measurement (CBM) studies (Hintze, Christ, & Keller, 2002; Burns, VanDerHeyden, & Jiban, 2006; Stevens & Leigh, 2012), as described below.

Each probe was a single sheet of paper with 10 rows of vertically oriented problems. Probes given to grade 2 students contained addition and subtraction facts. Probes given to grade 3 students contained multiplication and division facts. The problems were printed in extra-large type, so only 7 facts fit on each row. The first two rows contained only 6 facts to make room for a geometric shape placed in the upper right-hand corner to help students and monitors quickly identify which page the students were on. The problems were computer-generated with the constraint that the problems in a given row be as balanced as possible between the two operations. The facts were chosen randomly from the appropriate fact pool, with each having an identical selection likelihood.

An example is provided in the Appendix.
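The probe layout described above can be sketched in a few lines. This is an illustration only, not the generator used in the study: the names `build_pools` and `make_probe` are hypothetical, the example uses the grade 2 pool, and sampling with replacement (so a fact may repeat on a sheet) is an assumption, since the text only states that every fact had an identical selection likelihood.

```python
import random


def build_pools(max_term=10):
    """Grade 2 pools: addition facts with terms 0..max_term and their subtraction facts."""
    terms = range(max_term + 1)
    addition = [(a, "+", b) for a in terms for b in terms]
    subtraction = [(a + b, "-", a) for a in terms for b in terms]
    return addition, subtraction


def make_probe(pool_a, pool_b, rng, rows=10, full_width=7, short_width=6, short_rows=2):
    """Build a probe whose rows are as balanced as possible between two operations."""
    probe = []
    for r in range(rows):
        width = short_width if r < short_rows else full_width
        # For an odd width, alternate which operation receives the extra fact.
        n_a = width // 2 + (width % 2 if r % 2 == 0 else 0)
        row = [rng.choice(pool_a) for _ in range(n_a)]
        row += [rng.choice(pool_b) for _ in range(width - n_a)]
        rng.shuffle(row)  # mix the two operations within the row
        probe.append(row)
    return probe


addition, subtraction = build_pools()
probe = make_probe(addition, subtraction, random.Random(0))
print([len(row) for row in probe])
```

Under these assumptions each pool holds 121 facts, and a sheet holds 2 × 6 + 8 × 7 = 68 facts.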

2.2.3 Administrations

Three administrations were given. A pretest administration was conducted on February 12th, 2016. An interim administration was conducted on April 14th, timed to occur before heavy preparation for end-of-year testing began. A final administration was conducted on May 24th. Students were told to answer the items in order and not to skip items. The administrator used a script and was witnessed by the classroom teacher, who used a checklist to confirm each of several key points of instruction. This form also provided space for indicating any unusual occurrences.

The first and second administrations each comprised 4 one-minute fact fluency probes. Students were instructed that the first probe was a warm-up in each case. The final administration did not have a warm-up probe. It contained 3 math fact probes.

Grade 2 students also took a multi-digit computation probe, but the results of that probe were not analyzed as part of this combined report because third-grade students did not take a multi-digit probe. Multi-digit multiplication/division is not a core topic for third-grade students in Florida, and the distribution of scores on the multi-digit addition/subtraction probe was known to be fundamentally different from the distribution of scores on math fact probes, so there is no clear way to combine the two.

All students in a given grade took the same probes using the same administrative script, regardless of condition. The probes that were described as "warm-up" tests were not counted in any analysis.

Five students, all in comparison classes, were noted by test administrators as working on their quizzes significantly beyond the called time limit. These students were not formally considered part of the study. Posttests were taken by these students. Three of the five students scored higher on their posttest than on their pretest.

2.2.4 Fluency Score Calculation

For each student, raw fluency scores were calculated as the average number of digits correct per min (dc/min) minus the number of digits incorrect per min (di/min), as this was the method found by Stevens & Leigh (2012) to have the greatest criterion validity.
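A minimal sketch of the raw score, assuming each scored probe is summarized as a pair of digit counts and that the average is taken across the scored one-minute probes of an administration. The function name `raw_fluency` and the sample counts are hypothetical.

```python
def raw_fluency(probes, minutes_per_probe=1.0):
    """Average of (digits correct - digits incorrect) per minute across probes.

    probes: list of (digits_correct, digits_incorrect), one pair per scored probe.
    """
    rates = [(dc - di) / minutes_per_probe for dc, di in probes]
    return sum(rates) / len(rates)


# Hypothetical student with three scored one-minute probes.
print(raw_fluency([(20, 1), (22, 0), (18, 2)]))
```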

Previous CBM researchers have combined grade 2 and grade 3 students (Burns et al., 2006), but to justify the pooling of their outcomes in a single analysis, we conducted an analysis of the distribution of raw pretest scores for each grade separately to show similarity of distribution.

Table 3: Raw Fluency Pretest Score Distributions by Grade

Measure                               Grade 2   Grade 3
Mean                                  20.26     20.27
Standard Deviation                    10.53     11.35
Median                                19        19
Kurtosis                              2.37      1.91
Skewness                              1.17      1.01
Range                                 54.67     58.33
Optimal Box-Cox (anchored at 1) λ     0.50      0.56

A Kolmogorov-Smirnov test corroborated the premise that these two distributions were quite similar. It failed to reject homogeneity (critical D-stat was 0.233; calculated D-stat was 0.063; p-value = 0.99).
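For readers unfamiliar with the two-sample Kolmogorov-Smirnov comparison, the sketch below computes the D statistic and the usual large-sample critical value. It is not the authors' code; the function names are hypothetical, and the critical-value formula is the standard asymptotic approximation, which may differ slightly from whatever software produced the figures quoted above.

```python
import math
from bisect import bisect_right


def ks_statistic(x, y):
    """Largest gap between the empirical distribution functions of x and y."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)  # share of x at or below v
        fy = bisect_right(ys, v) / len(ys)  # share of y at or below v
        d = max(d, abs(fx - fy))
    return d


def ks_critical(n, m, alpha=0.05):
    """Asymptotic critical D for sample sizes n and m at significance level alpha."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))
```

Homogeneity is rejected only when the calculated D exceeds the critical D, which is why a calculated value well below the critical value is read as "failed to reject".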

The distributions of these raw scores were significantly skewed and leptokurtic, as has been reported in similar studies (Burns et al., 2006), so we normalized them using a Box-Cox transformation to arrive at a final fluency score. Following the recommendation of Osborne (2005), we anchored the full distribution at a minimum value of 1 by adding 2 to all raw fluency scores. A search for an optimum λ returned 0.525, so we chose λ = 0.5 for simplicity of inversion. Thus, the calculation for the final score is $\sqrt{C - I + 2}$, where C is the average digits correct per min and I is the average digits incorrect per min. The resulting distribution of pretest scores was not significantly skewed (skew = 0.08, SES = 0.21) but was still slightly leptokurtic (kurtosis = 0.85, SEK = 0.42). D'Agostino-Pearson (p-value = 0.13) and Jarque-Bera tests (p-value = 0.13) failed to reject normality.
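The sketch below restates the final score and two related quantities. Everything beyond the square-root formula is an assumption made for illustration: `box_cox` is the textbook Box-Cox form, which at λ = 0.5 is a linear rescaling of the square root, and the standard-error functions use the common small-sample formulas, evaluated at n = 129 (the analytic sample size in Table 2) only to check that they are consistent with the reported SES and SEK.

```python
import math


def final_fluency_score(c, i):
    """Final score sqrt(C - I + 2), with C and I in digits per minute."""
    return math.sqrt(c - i + 2)


def box_cox(x, lam):
    """Textbook Box-Cox transform of a positive value x."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam


def se_skewness(n):
    """Standard error of skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of excess kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


print(round(se_skewness(129), 2), round(se_kurtosis(129), 2))
```

Because the λ = 0.5 Box-Cox value equals 2(√x − 1), using the plain square root changes only the scale and origin of the score, not its shape.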

2.3 Validity

The criterion validity for CBM-based measures in elementary math has been established by Stevens & Leigh (2012) and VanDerHeyden & Burns (2008). These studies showed math fact fluency was predictive of general math achievement on the Oklahoma Core Curriculum test and Stanford Achievement Test, respectively.

2.4 Reliability

Several researchers have confirmed the reliability of CBM for math fluency.

Table 4: Previous Research on CBM Reliability for Math Fluency

Metric                                   Scoring Method                                                Source                   Value
Inter-scorer Agreement                   Correct Digits per Minute                                     Burns et al., 2006       0.96+
Inter-scorer Agreement                   Correct Digits per Minute                                     Hintze et al., 2002      0.955
Inter-scorer Agreement                   Correct Digits per Minute minus Incorrect Digits per Minute   Stevens & Leigh, 2012    0.99+
Delayed Alternate-form Reliability       Correct Digits per Minute                                     Burns et al., 2006       0.84
Absolute Generalizability                Correct Digits per Minute                                     Hintze et al., 2002      0.75
Relative Generalizability                Correct Digits per Minute                                     Hintze et al., 2002      0.95
Test-Retest Alternate Form Reliability   Correct Digits per Minute minus Incorrect Digits per Minute   Stevens & Leigh, 2012    0.87

Our study gave 3 separate fact probes on the same day, allowing us to measure internal consistency of the raw fluency score (correct digits minus incorrect digits) using Cronbach's α. The α values across the six tests are described in Table 5.

Table 5: Internal Consistency of Raw Fluency Score

               Addition/Subtraction   Multiplication/Division
Pretest        0.95                   0.94
Interim Test   0.96                   0.94
Posttest       0.97                   0.95
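As a reminder of how Cronbach's α treats the three same-day probes as "items", here is a minimal sketch using the standard formula. The function name `cronbach_alpha` and the toy scores are hypothetical; this is not the authors' computation.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    items: list of k lists; items[j][s] is student s's score on probe j.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-student total score
    item_variance = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Toy data: two probes, four students.
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))
```

Values near 1, such as those in Table 5, indicate that the separate probes rank students almost identically.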

We also calculated delayed alternate-form reliability of the final fluency score across each grade × condition cohort and found an average value of 0.71.

Table 6: Delayed Alternate-Form Reliability (14 weeks)

               Addition/Subtraction   Multiplication/Division
Intervention   0.77                   0.47
Comparison     0.72                   0.89

The relatively poor value for the 3rd grade intervention group may be due to large variation in dosage. The standard deviation in weekly usage across 3rd grade intervention groups was 1.24 days/week, nearly twice that of the 2nd grade intervention group, where the standard deviation was 0.65 days/week.

When dosage was added to the model predicting posttest score from pretest score, the agreement between the two intervention groups improved considerably. The coefficients of multiple correlation were R = 0.81 and R = 0.77 for the 2nd and 3rd grade intervention groups, respectively.

2.5 Analytic Approach

Since randomized assignment occurred at the class level, we used an HLM approach to account for cluster effects when analyzing the relationship between condition and posttest fluency. The model has two levels: grade and condition are level-2 variables, and all other covariates are level-1 variables. We used grand-mean-centered values for the lower-level variables and a maximum-likelihood method for determining the random effects. If the search for a model did not converge using maximum likelihood, restricted maximum likelihood was used instead.

Models were constructed using R's lmer function, part of the lme4 library, using the methodology for two-tier HLM models documented in a technical report from the Department of Statistics and Data Sciences, The University of Texas at Austin (UTA, 2015), which showed the similarity in results to those given by SPSS, SAS, Mplus, and HLM.

We formed 3 models of decreasing complexity and calculated an effect size and statistical significance based on each.

The first model uses the same structure as that used in the original version of this report. In this model, all dichotomous and numeric covariates were used (i.e., all covariates other than race, which was polynominal), including the pretest accuracy and pretest speed. This model is the most inclusive and allows for continuity between the original version of this report and the current version. It is denoted as the Full Model.

For the data available at the time of the original report, the pretest speed and pretest accuracy were both highly significant (p < 0.001). But after removing students who did not respect the time limits on the pretest or were designated as being below grade level before the study began, these additional pretest features were no longer statistically significant. A nested model χ-squared test comparing change in deviance to change in degrees of freedom did not show a statistically significant improvement upon adding either of these terms. Thus, we generated a new model lacking these two pretest features but retaining all the demographic covariates of the original. This model is denoted in the sequel as the Demographic Model.

In an effort to simplify the model further, we assessed the relevance of each of the demographic variables by generating an HLM with the following characteristics:

• No Level-2 variables

• Two Level-1 variables: the covariate in question and pretest score

• Group-mean-centered values

• Data scaled to be univariate

This method was selected for determining the relevance of a given level-1 factor based on Woltman, Feldstain, MacKay, & Rocchi's (2012) presentation. Note that this was the only analysis using group-mean-centered data. The models used for determining intervention effect and statistical significance used grand-mean-centered level-1 variables.

The results of this analysis are shown in Table 7. Given their very low coefficients and t-scores, we removed gender and ESE. Upon forming the full HLM using the remaining covariates, it was found that LEP had very little impact (coefficient = 0.03) and low significance (t = 0.24), so it was dropped as well. In the resulting model, all covariates had t-scores greater than 0.9 in magnitude and standardized coefficients greater than 0.1. There was a nearly statistically significant interaction (t = 1.94) between condition and whether the student was on free or reduced lunch.

Table 7: Impact and significance of demographic covariates

Covariate   Coefficient   t-score
age         0.028         0.506
gender      -0.005        -0.096
LEP         0.042         0.612
Lunch       0.080         1.249
ESE         -0.006        -0.100

This final model is denoted as the Reduced Model.

All three models are provided in the Appendix.

Effect sizes were calculated from the coefficient for the intervention effect from each HLM model and the pooled within-group standard deviation of unadjusted posttest scores.

Statistical significance was determined based on the t-score of the multilevel model.
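The effect-size calculation described above can be sketched as follows. The function names are hypothetical and the numbers in the example are placeholders, not results from this study; the pooled within-group standard deviation uses the usual degrees-of-freedom weighting, which is an assumption about the exact formula the authors applied.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled within-group standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def effect_size(coefficient, n1, sd1, n2, sd2):
    """Intervention coefficient expressed in pooled-SD units."""
    return coefficient / pooled_sd(n1, sd1, n2, sd2)


# Placeholder values for illustration only.
print(effect_size(0.5, 64, 2.0, 65, 2.0))
```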

2.6 Statistical Adjustments

We used all demographic information provided except race, which was non-binary and correlated significantly with other demographic information (R between 0.36 and 0.46 for the three most prevalent races in our sample).

Grade was coded as grade3, a variable equal to 1 if the student was in grade 3 and 0 otherwise.

Age was measured in years as of the pretest administration.

Gender was coded as a variable male, equal to 1 if the student was male and 0 if the student was female.

Low English proficiency was coded as a variable LEP, equal to 1 if the school indicated the student had low English proficiency.

Exceptional Student Status was determined based on the school's designation of the student as being within an Exceptional Student Education program. It was coded as a variable ESE, equal to 1 if the school specified the student as belonging to an ESE program. The state of Florida specifies several ESE programs, one of which is a program for gifted students. For our study, it appears this program furnished the large majority of ESE designations, as 29 of the 36 students designated as ESE were concentrated in two high-achieving classes. In grade 2, every ESE-designated student was in a single class.

Eligibility for free or reduced lunch was coded as a variable lunch, equal to 1 if the student was eligible.

As described in the Fluency Score Calculation subsection, fluency was evaluated based on a research-supported combination of speed and accuracy, normalized to reduce skewness via a Box-Cox transformation. This means that a student's fluency score depends on personal characteristics such as confidence, sense of urgency on a pen-and-paper test, and attention to accuracy, so students differ markedly in potential for improvement.

Pretest accuracy is the ratio of correct digits to the sum of correct and incorrect digits.

Pretest score is defined as $\sqrt{C - I + 2}$, where C is digits correct per minute and I is digits incorrect per minute.

Pretest speed is defined in a manner analogous to pretest score: $\sqrt{C - 2}$, where C is digits correct per minute. In this expression, 2 is subtracted rather than added so that the expression is anchored at 1, conforming to best practices (Osborne, 2005).
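The two pretest covariates defined above can be written out as a short sketch. The function names and the sample counts are hypothetical; the formulas are taken directly from the definitions in the text.

```python
import math


def pretest_accuracy(correct_digits, incorrect_digits):
    """Share of all attempted digits that were correct."""
    return correct_digits / (correct_digits + incorrect_digits)


def pretest_speed(c):
    """Speed sqrt(C - 2), where C is digits correct per minute (C of 3 maps to 1)."""
    return math.sqrt(c - 2)


print(pretest_accuracy(46, 4), pretest_speed(18))
```

The anchoring is visible in the second function: a student with 3 digits correct per minute receives a speed of exactly 1.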

All student-level covariates were scaled to be univariate and grand-mean centered for improved interpretability and model convergence.
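The difference between grand-mean and group-mean centering can be shown with a small sketch. It reads "scaled to be univariate" as scaling to unit variance, which is an interpretation rather than something the text states; the function names and toy values are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Grand-mean center and scale to unit variance (assumed reading of the text)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def group_mean_center(values, groups):
    """Subtract each group's own mean, e.g. the class mean, from its members."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    means = {g: mean(vs) for g, vs in by_group.items()}
    return [v - means[g] for v, g in zip(values, groups)]


print(standardize([1, 2, 3]))
print(group_mean_center([1, 3, 10, 20], ["a", "a", "b", "b"]))
```

Grand-mean centering keeps between-class differences in the covariate, whereas group-mean centering removes them.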

Speed, score, and accuracy on the interim administration were considered during the regression process used to impute missing data, as described in the Missing Data section. These metrics are calculated exactly as for the pretest, using the same formulas (i.e., the data were not re-anchored for the Box-Cox transformation).

An HLM was used to calculate statistical significance for the entire sample, so no adjustment for cluster effects was necessary. We analyzed only one outcome for this study, so no adjustment was made for multiple outcomes.

2.7 Students Removed from Study

Ten students, 4 from the intervention condition and 6 from the comparison condition, were excluded from the analysis. In all cases, the decision to exclude was based on information obtained on the day of the pretest.

Four of these ten (3 from intervention, 1 from comparison) were excluded because their teacher indicated they were sufficiently below grade level that they would not receive typical instruction in math fact fluency. This determination was provided on the day of the pretest.

Table 8: Descriptive Statistics of Control Variables

Control Variable        Mean   SD     Skew    Kurtosis
Grade3                  0.48   0.50   0.08    -2.03
Age                     8.42   0.57   0.03    -0.95
Male                    0.47   0.50   0.14    -2.01
LEP                     0.20   0.40   1.51    0.27
ESE                     0.26   0.44   1.13    -0.73
Lunch                   0.20   0.40   1.51    0.27
Pretest Accuracy        0.92   0.09   -2.63   8.23
Pretest Speed           4.29   1.18   0.18    0.98
Pretest Score           4.58   1.15   0.08    0.85
Interim Accuracy        0.94   0.06   -1.84   3.59
Interim Speed           5.07   1.20   0.55    0.42
Interim Fluency Score   5.31   1.18   0.46    0.39

Five of these ten (all from comparison) were excluded because they did not stop when time was called on the pretest. In three cases, these students had higher values on their pretest than on their posttest.

One grade 2 student from the intervention condition was noted as appearing frustrated and not working on the pretest. He had the fourth-lowest fluency score of all 2nd-grade participants on the pretest and showed dramatic improvement by the interim assessment, on which he scored at the 33rd percentile within his grade. According to Reflex's internal initial testing, the student had pre-existing automaticity for 17.1% of the addition facts within 20 and had basic recall ability with 59.9%. This suggests his pretest score underestimated his actual ability, and he was removed from the analysis for fear of artificially inflating the impact of the intervention. Note that this student was absent from the final administration.

2.8 Missing Data

Eight students, 5 from the treatment group and 3 from the comparison group, were absent for the final administration. Seven of the students had taken the interim assessment. No values were imputed for the student who missed both the interim and the final assessment. For the seven who had attended the interim test, we imputed posttest values using a multilinear regression based on students in the same instructional level group, using the thresholds established by Burns et al. (2006).

Table 9: Categorization of Students

Fluency (dc/min)   Category              N
Less than 14       Frustration Level     29 (22%)
14-31              Instructional Level   81 (63%)
Greater than 31    Mastery Level         19 (15%)
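The thresholds in Table 9 translate into a simple classification rule, sketched below. The function name is hypothetical, and treating both 14 and 31 as belonging to the instructional level follows the table's wording ("Less than 14", "Greater than 31").

```python
def instructional_category(dc_per_min):
    """Classify a fluency value (digits correct per minute) using Table 9."""
    if dc_per_min < 14:
        return "Frustration Level"
    if dc_per_min <= 31:
        return "Instructional Level"
    return "Mastery Level"


# Group sizes reported in Table 9, used to recompute the percentages.
counts = {"Frustration Level": 29, "Instructional Level": 81, "Mastery Level": 19}
total = sum(counts.values())
shares = {name: round(100 * n / total) for name, n in counts.items()}
print(total, shares)
```

The three group sizes sum to 129, matching the analytic sample, and the recomputed shares agree with the percentages in the table.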

All available data (demographic data, pretest data, and interim test data) were used to impute posttest scores using an OLS regression that retained only statistically significant regressors.

2.8.1 Frustration Level

One of the seven students for whom posttest scores were imputed was in the frustration level. For that group, age (t = 2.6), pretest accuracy (t = -3.2), and interim fluency score (t = 6.2) were the statistically significant regressors.

2.8.2 Instructional Level

Six of the seven students for whom posttest scores were imputed were in the instructional level. Among students in that level, grade (t = 4.2), interim accuracy (t = -2.6), and interim fluency score (t = 9.5) were statistically significant.

2.8.3 Mastery Level

There were no students in the mastery level for whom imputation was necessary.

3 Study Data

Tables compose the large majority of this section. They are organized by table title and subsection title rather than by use of numbers.

The tables in this section report unscaled, uncentered values for ease of interpretability.

3.1 Pre-Intervention Data: All Pretest Takers

This section provides data on all students who took the pretest, including those who were formally removed from the analysis.

Outcome Data

                 Comparison Group                             Intervention Group
Measure          Unit of      Unit of    Mean    Standard     Unit of      Unit of    Mean    Standard
                 Assignment   Analysis           Deviation    Assignment   Analysis           Deviation
Fluency Score    4            70         4.44    1.17         4            70         4.49    1.26

Background Data

                    Comparison         Intervention
Variable            Mean     SD        Mean     SD
Age                 8.440    0.598     8.442    0.602
ESE                 0.271    0.448     0.243    0.432
Male                0.500    0.504     0.400    0.493
Grade3              0.486    0.503     0.486    0.503
LEP                 0.243    0.432     0.171    0.380
Lunch               0.229    0.423     0.186    0.392
Pretest accuracy    0.919    0.098     0.909    0.113
Pretest speed       4.145    1.186     4.220    1.257

3.2 Pre-Intervention Data: Baseline Sample

This section includes all students who were formally part of the analysis, including those who were absent for the posttest.

Outcome Data

                 Comparison Group                             Intervention Group
Measure          Unit of      Unit of    Mean    Standard     Unit of      Unit of    Mean    Standard
                 Assignment   Analysis           Deviation    Assignment   Analysis           Deviation
Fluency Score    4            64         4.57    1.12         4            66         4.60    1.20

Background Data

                    Comparison         Intervention
Variable            Mean     SD        Mean     SD
Age                 8.444    0.556     8.405    0.581
ESE                 0.250    0.436     0.258    0.441
Grade3              0.484    0.504     0.470    0.503
LEP                 0.250    0.436     0.152    0.361
Lunch               0.219    0.417     0.182    0.389
Male                0.531    0.503     0.394    0.492
Pretest accuracy    0.932    0.087     0.916    0.102
Pretest speed       4.268    1.147     4.324    1.213

3.3 Pre-intervention Data: Analytic Sample

Outcome Data: Analytic Sample

                 Comparison Group                             Intervention Group
Measure          Unit of      Unit of    Mean     Standard    Unit of      Unit of    Mean     Standard
                 Assignment   Analysis            Deviation   Assignment   Analysis            Deviation
Fact Fluency     4            64         4.573    1.121       4            65         4.580    1.195

Background Data: Analytic Sample

                    Comparison         Intervention
Variable            Mean     SD        Mean     SD
Age                 8.444    0.556     8.414    0.580
ESE                 0.250    0.436     0.262    0.443
Grade3              0.484    0.504     0.477    0.503
LEP                 0.250    0.436     0.154    0.364
Lunch               0.219    0.417     0.185    0.391
Male                0.531    0.503     0.400    0.494
Pretest accuracy    0.932    0.087     0.914    0.102
Pretest speed       4.268    1.147     4.307    1.214

Outcome Data: Analytic Sample with No Imputation

                 Comparison Group                             Intervention Group
Measure          Unit of      Unit of    Mean     Standard    Unit of      Unit of    Mean     Standard
                 Assignment   Analysis            Deviation   Assignment   Analysis            Deviation
Fact Fluency     4            61         4.557    1.143       4            61         4.640    1.142

Background Data: Analytic Sample with No Imputation

                    Comparison         Intervention
Variable            Mean     SD        Mean     SD
Age                 8.436    0.556     8.394    0.568
ESE                 0.246    0.434     0.279    0.452
Grade3              0.492    0.504     0.459    0.502
LEP                 0.262    0.444     0.148    0.358
Lunch               0.230    0.424     0.180    0.388
Male                0.541    0.502     0.410    0.496
Pretest accuracy    0.930    0.088     0.922    0.084
Pretest speed       4.254    1.170     4.360    1.177

3.4 Post-intervention Data and Findings

3.4.1 Analytic Sample

As grand-centered means were used for all Level-1 covariates and grade was the only Level-2 covariate other than condition, adjusted means for each group were estimated from the Constant term of the HLM model, the grade3 coefficient of the HLM model, the average value of the grade3 variable across all students, and (in the case of the intervention group) the treatment coefficient of the HLM model. Standard deviations are unadjusted.
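As a worked check of this procedure, the Full Model values can be combined as follows. This is a sketch, not the original computation: it assumes the grade3 average equals the 48.1% share of grade 3 students reported in Table 2 and uses the rounded coefficients from Appendix A, so the last digit may differ from a calculation on unrounded values.

```latex
\begin{align*}
\bar{y}_{\text{comparison}}
  &= \beta_{0} + \beta_{\text{grade3}} \cdot \overline{\text{grade3}}
   = 4.709 + 0.695 \times 0.481 \approx 5.04 \\
\bar{y}_{\text{intervention}}
  &= \bar{y}_{\text{comparison}} + \beta_{\text{treatment}}
   \approx 5.043 + 0.927 \approx 5.97
\end{align*}
```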

Outcome Data and Statistical Significance: Analytic Sample

                     Comparison Group                  Intervention Group                Estimated Effect
Model                Students   adj.    unadj.         Students   adj.    unadj.         adj. Mean     adj.
                                Mean    Std. Dev.                 Mean    Std. Dev.      Difference    t-score
Full Model           64         5.04    1.099          65         5.97    1.093          0.927***      5.753
Demographic Model    64         5.13    1.099          65         5.97    1.093          0.836***      4.966
Reduced Model        64         5.08    1.099          65         5.95    1.093          0.867***      4.343

Note: *p<0.1; **p<0.05; ***p<0.001

Effect size was calculated based on adjusted means, unadjusted pooled within-group standard deviations, and a correction $\omega = 1 - \frac{3}{4N - 9}$ for small sample size.

Estimation of Effect Size: Analytic Sample

Model                N      Adjusted Mean    (unadj.) Pooled      Effect Size
                            Difference       Within-Group SD      (adj. Hedges' g)
Full Model           129    0.927***         1.096                0.84
Demographic Model    129    0.836***         1.096                0.76
Reduced Model        129    0.867***         1.096                0.79

Note: *p<0.1; **p<0.05; ***p<0.001
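The effect size calculation can be reproduced from the reported group sizes, standard deviations, and adjusted mean difference. The following is a minimal sketch with hypothetical function names, assuming N in the correction is the total number of students across both groups and that the pooled standard deviation uses the usual degrees-of-freedom weighting.

```python
import math


def pooled_sd(n1: int, sd1: float, n2: int, sd2: float) -> float:
    """Pooled within-group standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def adjusted_hedges_g(mean_diff: float, n1: int, sd1: float, n2: int, sd2: float) -> float:
    """Adjusted mean difference over pooled SD, times omega = 1 - 3 / (4N - 9)."""
    n_total = n1 + n2
    omega = 1 - 3 / (4 * n_total - 9)
    return omega * mean_diff / pooled_sd(n1, sd1, n2, sd2)


# Full Model, analytic sample: 64 comparison and 65 intervention students
sd_pooled = pooled_sd(64, 1.099, 65, 1.093)
g_full = adjusted_hedges_g(0.927, 64, 1.099, 65, 1.093)
print(round(sd_pooled, 3), round(g_full, 2))  # 1.096 0.84
```

With these inputs the sketch matches the pooled standard deviation and the Full Model effect size reported in the table.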

3.4.2 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absent for the post assessment indicated that a complete-case analysis would substantially understate the effect of the intervention. The covariate-adjusted effect of the treatment on interim test scores was greater among students who missed the posttest than among those who were present for all three tests. This is borne out in the results of an analysis limited to those students where no imputation occurred.

Values for adjusted means for this subgroup were calculated by recentering all Level-1 covariates and generating a new HLM with the same structure as for the full analytic sample, but using only those participants with no missing data.

Outcome Data and Statistical Significance: Analytic Sample with No Imputation

                     Comparison Group                  Intervention Group                Estimated Effect
Model                Students   adj.    unadj.         Students   adj.    unadj.         adj. Mean     adj.
                                Mean    Std. Dev.                 Mean    Std. Dev.      Difference    t-score
Full Model           61         5.07    1.12           61         5.94    1.11           0.867***      5.255
Demographic Model    61         5.15    1.12           61         5.94    1.11           0.787***      4.669
Reduced Model        61         5.09    1.12           61         5.92    1.11           0.828***      4.249

Note: *p<0.1; **p<0.05; ***p<0.001

Estimation of Effect Size: Analytic Sample with No Imputation

Model                N      Adjusted Mean    (unadj.) Pooled      Effect Size
                            Difference       Within-Group SD      (adj. Hedges' g)
Full Model           122    0.867***         1.113                0.77
Demographic Model    122    0.787***         1.113                0.70
Reduced Model        122    0.828***         1.113                0.74

Note: *p<0.1; **p<0.05; ***p<0.001

3.5 Subpopulation Analyses

We analyzed sub-populations by grade. We also analyzed the sub-population of students not designated as exceptional students. Due to the smaller sample sizes, the Reduced Model was used for the analyses, except grade was removed as a variable for subpopulations of constant grade.

Statistical Significance and Estimation of Effect Size

Group                       N      Adjusted Mean    (unadj.) Pooled      Effect Size          Adjusted
                                   Difference       Within-Group SD      (adj. Hedges' g)     t-score
Grade 2                     68     0.739**          0.94                 0.78                 2.46
Grade 3                     63     0.877**          1.05                 0.82                 2.47
Non-Exceptional Students    102    0.904***         1.101                0.89                 4.63

Note: *p<0.1; **p<0.05; ***p<0.001

4 Acknowledgment

We used R (R Core Team 2013) for some of the analysis in this report, especially the lme4 package for fitting mixed models (Bates, Mächler, Bolker & Walker 2015). Other libraries utilized were dplyr, tidyr, and magrittr (Wickham & Francois 2016, Bache & Wickham 2014, Wickham 2016).

This document was typeset using LaTeX and makes use of the harvard, booktabs, multirow, graphicx, and url packages.

The stargazer package was used to generate LaTeX for several of the tables (Hlavac 2013).

References

Bache, S. M. & Wickham, H. (2014), magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr

Bates, D., Mächler, M., Bolker, B. & Walker, S. (2015), 'Fitting linear mixed-effects models using lme4', Journal of Statistical Software 67(1), 1-48.

Burns, M. K., VanDerHeyden, A. M. & Jiban, C. L. (2006), 'Assessing the instructional level for mathematics: A comparison of methods', School Psychology Review 35(3), 401.

Hintze, J. M., Christ, T. J. & Keller, L. A. (2002), 'The generalizability of CBM survey-level mathematics assessments: Just how many samples do we need?', School Psychology Review 31(4), 514.

Hlavac, M. (2013), 'stargazer: LaTeX code and ASCII text for well-formatted regression and summary statistics tables'. http://CRAN.R-project.org/package=stargazer

Osborne, J. (2005), 'Notes on the use of data transformations', Practical Assessment, Research and Evaluation 9(1), 42-50.

Stevens, O. & Leigh, E. (2012), 'Mathematics curriculum based measurement to predict state test performance: A comparison of measures and methods', ProQuest LLC.

R Core Team (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

UTA (2015), Multilevel modeling tutorial using SAS, Stata, HLM, R, SPSS, and Mplus, Technical report, The Department of Statistics and Data Sciences, The University of Texas at Austin. http://stat.utexas.edu/images/SSC/documents/SoftwareTutorials/MultilevelModeling.pdf

VanDerHeyden, A. M. & Burns, M. K. (2008), 'Examination of the utility of various measures of mathematics proficiency', Assessment for Effective Intervention.

Wickham, H. (2016), tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. https://CRAN.R-project.org/package=tidyr

Wickham, H. & Francois, R. (2016), dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr

Woltman, H., Feldstain, A., MacKay, J. C. & Rocchi, M. (2012), 'An introduction to hierarchical linear modeling', Tutorials in Quantitative Methods for Psychology 8(1), 52-69.

Appendix A Full Model

The table below describes the fixed-effects data of the full HLM for the analytic sample. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                      Coefficient    t-score
Intercept                   4.709***       25.719
cage                        0.136          0.887
cLEP                        -0.015         -0.111
cLunch                      -0.122         -0.981
cprescore                   -2.220         -0.932
cgender                     -0.148         -1.518
cESE                        0.054          0.272
cprespeed                   2.471          1.177
cpreaccuracy                0.750          1.313
treatment                   0.927***       5.753
grade3                      0.695***       3.335
cagey:treatment             0.076          0.469
cagey:grade3                -0.224         -1.134
cLEP:treatment              -0.021         -0.123
cLEP:grade3                 -0.027         -0.156
cLunch:treatment            0.195          1.440
cLunch:grade3               0.174          1.269
cprescore:treatment         1.881          1.056
cprescore:grade3            2.097          0.885
cgender:treatment           0.096          0.836
cgender:grade3              0.180          1.564
cESE:treatment              -0.036         -0.204
cESE:grade3                 -0.013         -0.073
cprespeed:treatment         -1.327         -0.849
cprespeed:grade3            -1.581         -0.753
cpreaccuracy:treatment      -0.712*        -1.657
cpreaccuracy:grade3         -0.608         -1.101

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B Demographic Model

The table below describes the fixed-effects data of the full HLM for the analytic sample but excludes the pretest features of speed and accuracy. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                   Coefficient    t-score
Intercept                4.906***       27.157
cage                     0.230          1.554
cLEP                     0.042          0.315
cLunch                   -0.128         -1.225
cprescore                0.756***       5.872
cgender                  -0.124         -1.283
cESE                     0.172          0.855
treatment                0.836***       4.966
grade3                   0.480**        2.235
cagey:treatment          -0.003         -0.022
cagey:grade3             -0.270         -1.378
cLEP:treatment           -0.059         -0.346
cLEP:grade3              -0.071         -0.408
cLunch:treatment         0.313**        2.631
cLunch:grade3            0.174          1.459
cprescore:treatment      0.003          0.020
cprescore:grade3         0.094          0.685
cgender:treatment        0.122          1.062
cgender:grade3           0.120          1.039
cESE:treatment           -0.131         -0.734
cESE:grade3              -0.117         -0.642

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix C Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sample retaining only age. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

This model used Restricted Maximum Likelihood, as there were convergence problems when using maximum likelihood.

Factor                   Coefficient    t-score
Intercept                4.827***       25.683
cage                     0.227          1.146
cLunch                   -0.111         -0.952
cprescore                0.758***       5.488
treatment                0.867***       4.343
grade3                   0.527**        2.219
cagey:treatment          -0.031         -0.154
cagey:grade3             -0.168         -0.692
cLunch:treatment         0.268*         1.936
cLunch:grade3            0.153          1.109
cprescore:treatment      -0.042         -0.286
cprescore:grade3         0.100          0.677

Note: *p<0.1; **p<0.05; ***p<0.001

  9      5     18      9      8     13
+ 8    + 9    -10    - 6    + 3    - 5
---    ---    ---    ---    ---    ---

 10      2      3     10     12      9
- 3    + 7    + 8    + 1    -10    - 1
---    ---    ---    ---    ---    ---

  5      8      3     19      7     16      3
+ 0    + 4    + 6    - 9    - 1    -10    + 0
---    ---    ---    ---    ---    ---    ---

  3      7     15      0      4     14      7
- 1    + 5    - 5    + 9    + 3    - 5    - 5
---    ---    ---    ---    ---    ---    ---

  9     20     11      4      9      6      1
+ 9    -10    - 3    - 4    + 0    - 1    +10
---    ---    ---    ---    ---    ---    ---

  9     12     12      2      5      9      5
+10    - 3    - 9    + 8    - 0    - 4    + 0
---    ---    ---    ---    ---    ---    ---

  4     14      7     11      7      4      6
+ 4    - 9    - 0    - 8    + 0    - 1    + 5
---    ---    ---    ---    ---    ---    ---

  2     12     14      4     10      1      7
+ 3    - 5    - 5    - 4    +10    + 0    + 2
---    ---    ---    ---    ---    ---    ---

 13     10      3      9     17     10      3
- 6    +10    + 6    - 6    - 7    +10    + 6
---    ---    ---    ---    ---    ---    ---

  4     10     10      3      5      5     10
+ 9    + 2    +10    - 0    + 3    - 5    -10
---    ---    ---    ---    ---    ---    ---

Appendix B Sample Addition/Subtraction Probe

  • Study Characteristics
    • Intervention Condition
    • Comparison Condition
    • Setting
    • Participants
  • Study Design and Analysis
    • Sample Formation
    • Outcome Measures
      • Outcomes
      • Probes
      • Administrations
      • Fluency Score Calculation
    • Validity
    • Reliability
    • Analytic Approach
    • Statistical Adjustments
    • Students Removed from Study
    • Missing Data
      • Frustration Level
      • Instructional Level
    • Mastery Level
  • Study Data
    • Pre-Intervention Data: All Pretest Takers
    • Pre-Intervention Data: Baseline Sample
    • Pre-intervention Data: Analytic Sample
    • Post-intervention Data and Findings
      • Analytic Sample
      • Analytic Sample with No Imputation
    • Subpopulation Analyses
  • Acknowledgment
  • Appendices
    • Appendix: Full Model
    • Appendix: Demographic Model
    • Appendix: Reduced Model

The assessment component uses a combination of several games, some of which present facts aligned vertically while others present facts aligned horizontally. The coaching session uses a cover-copy-compare strategy to introduce facts, followed by a fill-in-the-blank session where the student completes an open fact sentence with one or two missing terms. The third component uses horizontally aligned facts and provides interactive feedback on missed facts. The intense practice component differs from the rest in that the student is given multiple facts and chooses one to answer. This choice provides agency to the student, as it affects outcomes in the game (e.g., the fact chosen determines which direction an on-screen character moves).

Reflex has individualized practice recommendations. The median total time in the system for second and third graders to complete these recommendations is 15-16 minutes per day, with earlier days generally taking longer than later ones. Students do not always meet the daily practice target due to lack of time or limited technological resources. Once the recommended practice is complete for a day, an on-screen indicator illuminates and the student is allowed to spend time on non-practice motivational aspects of the system, such as using tokens to buy new clothes for his avatar.

Reflex has been sold commercially since 2011. It is delivered on an annual subscription basis to thousands of schools. A time-limited free trial is available, and interested teachers can apply for grants providing free access for one year. Subscriptions are sold at teacher-, site-, and district-wide levels.

Participating teachers assigned to the intervention condition undertook a standard 90-minute training webinar acquainting them with the system and best practices. Approximately 50% of all new Reflex subscriptions included such training in spring 2016.

Students use Reflex directly; no teacher involvement occurs within a Reflex session. Teachers support students indirectly by encouraging students and cultivating their enthusiasm, including the distribution of milestone certificates provided by the system. Teachers also, of course, need to schedule time for students to play Reflex and supervise student usage. Reflex provides teachers reports showing progress and usage of each student.

Reflex provides three options for the pool of facts a student learns:

• Addition and Subtraction 0-10: Addition facts whose terms are within 0-10 and their associated subtraction facts

• Multiplication and Division 0-10: Multiplication facts whose factors are in the range 0-10 and their associated division facts

• Multiplication and Division 0-12: Multiplication facts whose factors are in the range 0-12 and their associated division facts

Students assigned to the intervention condition began in the addition/subtraction assignment if they were in second grade and in the multiplication/division 0-10 assignment if they were in third grade. Teachers had the ability to switch students on an individual basis to other assignments at their own discretion. Sixteen of the 37 second graders using Reflex were switched into multiplication/division before the posttest. Thus, some of their time spent in Reflex was dedicated to above-grade-level items that were not on the posttest.

The recommended usage for Reflex is 3 days per week. The four teachers achieved weekly usages of 2.6, 3.3, 3.4, and 1.5 days. These values include all days on which a login was made, even if the student was practicing facts outside the range of testing.

The average usage across all students was 2.7 days/week.

The median time spent in Reflex during the study was 59 minutes a week, which includes time spent in non-instructional aspects of the system, such as browsing an in-product store to buy virtual items using tokens earned in games, or cases where a student logged in from home and forgot to log off.

Reflex requires individual accounts with individual passwords. A user in the comparison group could only have used Reflex by logging into the account of another student.

Post-survey questionnaires were given to all teachers. Two teachers from the intervention condition returned questionnaires, both indicating they relied on Reflex as their primary means of developing math fact fluency during the course of the study.

1.2 Comparison Condition

This study used a business-as-usual comparison condition. Math fluency in general, and math fact fluency in particular, are required by the Florida Math Standards and Common Core State Standards for grades 2 and 3. Florida Math Standard MAFS.2.OA.2.2 and Common Core State Standard 2.OA.B.2 have identical wordings: "Fluently add and subtract within 20 using mental strategies. By end of Grade 2, know from memory all sums of two one-digit numbers." Similarly, Florida Math Standard MAFS.3.OA.3.7 and Common Core State Standard 3.OA.C.7 read: "Fluently multiply and divide within 100, using strategies such as the relationship between multiplication and division (e.g., knowing that 8 × 5 = 40, one knows 40 ÷ 5 = 8) or properties of operations. By the end of Grade 3, know from memory all products of two one-digit numbers."

Additionally, Common Core State Standards specify a number of general computational fluency requirements for which facility with math facts is foundational (Standards 2.NBT.B.5, 2.NBT.B.6, 2.NBT.B.7, 3.OA.A, 3.NBT.A.2). Florida's standards retain these requirements.

Post-survey questionnaires were given to all teachers. Teachers in the comparison condition were asked to describe methods they used to develop math fact fluency and the time they spent on this goal. Two of the four teachers in the comparison condition returned these questionnaires. Their comments are provided below verbatim. We have also included data on the average fluency gain for each comparison class, including the two that did not return questionnaires.

The survey asked teachers how many hours a month were spent on developing math fact fluency. One teacher specified her answer in terms of minutes per day. The other wrote "20 hours" in the blank.

Table 1: Post-Study Comparison Group Responses

Grade    Average    Strategies                                              Time Spent
         Gain                                                               (Hours per month)
3        0.88       (Did not return survey)                                 NA
3        0.81       flash cards, timed tests, repetition, math fact raps    20 hours
2        0.53       (Did not return survey)                                 NA
2        -0.01      ten marks, flash cards, fast facts, center work         (time everyday) 10 minutes

Given the average fluency gains, we surmise that the other two teachers likely spent considerably more than 10 minutes a day on math fact fluency. The grade 3 responder had a group of high-achieving students, so it is possible homework was assigned on math fact fluency, as it is hard to imagine that 20 hours of class time a month was spent on the topic.

1.3 Setting

Teachers from a Florida school in a metropolitan area participated in this study. The demographic data provided by the school indicate it is a majority-minority school: 57% of its second- and third-grade students are Hispanic or Latino and 31% are Caucasian. The data provided indicate that 28% have low English proficiency and 17% are on free or reduced lunch.

1.4 Participants

The participating students are generally demographically similar to the full population of second- and third-grade students in terms of exceptional student status, race, gender, and economic status. In all cases, we relied on information received from the school.

2 Study Design and Analysis

2.1 Sample Formation

The school was identified by project personnel owing to its previous interest in Reflex. The school was offered a discount on a later subscription in exchange for participation. After logistical discussions to ensure that the school had sufficient technical resources to allow usage of a computer-delivered intervention, teachers were asked to volunteer for participation. Nine teachers initially volunteered to have their homeroom students take part. One of these homeroom classes was taught by another teacher who also taught her own homeroom, so the 9 classes were taught by 8 teachers.

The study was intended as a cluster randomized controlled trial, with the teachers from each grade randomly assigned to condition. Unfortunately, the design was compromised across grade 3 teachers. One teacher assigned to the comparison did not participate at all; project personnel did not administer pretests. Another teacher assigned to the treatment never used the intervention. There was zero uptake across her entire class. Review of email exchanges suggests three possible causes:

• The liaison between the head researcher and the school may have misrepresented the constraints of the study to the school. He reports that the school may have thought that an even number of teachers were required.

• Two gifted/high-achieving classes participated in the study. They were both inadvertently randomly assigned to the intervention. It was our intention to split these through block randomization, but we only received the pertinent data after selection and, due to a misreading of the correspondence, failed to catch the error, so no re-assignment was done. The school may have rectified our error themselves.

• It is possible that one of the teachers simply did not want to use the intervention. Project personnel doing the training reported that she attended but "had to leave early on".

Given the above, we are analyzing our study as a QED where the intact groups are the 8 classes for whom we have pretest data, and the intervention group comprises those classes where any uptake occurred prior to posttest.

Teachers were provided the opportunity to indicate any students who were not prepared for fact fluency instruction. Four third-grade students were identified: 3 from the intervention group and 1 from the comparison group. These students' data were not considered as part of the study.

One of the teachers taught two classes, one within the intervention group and another in the comparison group. All other teachers taught a single class.

Group Descriptions

Table 2 provides a description of the demographic character of the groups as well as their pretest score results. The fluency score on the pretest combines both speed and accuracy, as described in the Fluency Score Calculation subsection.

2.2 Outcome Measures

2.2.1 Outcomes

One outcome was measured in the study: math fact fluency, which is both a key component of general math achievement and has been shown to be predictive of students' performance on general math achievement tests (see Validity subsection below). Fluency was measured using timed probes.

Table 2: Baseline Demographic Information

                                Full Sample    Comparison Group    Intervention Group
Sample Size                     129            64                  65
Grade 3 Students                48.1%          48.4%               47.7%
Hispanic                        54.2%          53.1%               55.3%
Asian                           17.1%          18.8%               15.4%
White                           22.5%          21.9%               23.1%
Black                           4.7%           6.2%                3.1%
Multiracial                     1.6%           0.0%                3%
Low English Proficiency         20.2%          25.0%               15.4%
Exceptional Student (Gifted)    25.6%          25%                 26.1%
Free/Reduced Lunch              20.2%          21.9%               18.4%
Male                            46.5%          53.1%               40%
age-at-pretest (years)          8.43           8.44                8.41
pre-test Accuracy               92.3%          93.2%               91.4%
pre-test Speed                  4.29           4.26                4.31
pre-test Score                  4.58           4.57                4.58

• Grade 2 students were tested on facts with terms, minuends, and subtrahends from 0 to 10 inclusive (i.e., from 0 + 0 up to 10 + 10 and their associated subtraction facts).

• Grade 3 students were tested on facts with factors, divisors, and quotients from 0 to 10 inclusive (i.e., from 0 × 0 to 10 × 10 and their associated division facts).

These match the requirements in the Common Core State Standards, except that, owing to that document's idiosyncratic definition of "within X" (as in "addition within 20"), a literal reading of the work indicates that facts such as 20 − 17 and 91 ÷ 13 are considered within grade level. The Florida Math Standards do not provide a glossary, so it is unclear whether such facts would be in the scope of the wording of its standards.

2.2.2 Probes

Probes had a format similar to those in other Curriculum Based Measurement (CBM) studies (Hintze, Christ & Keller 2002, Burns, VanDerHeyden & Jiban 2006, Stevens & Leigh 2012), as described below.

Each probe was a single sheet of paper with 10 rows of vertically oriented problems. Probes given to grade 2 students contained addition and subtraction facts. Probes given to grade 3 students contained multiplication and division facts. The problems were printed in extra large type, so only 7 facts fit on each row. The first two rows only contained 6 facts to make room for a geometric shape placed in the upper right-hand corner to help students and monitors quickly identify which page the students were on. The problems were computer-generated with the constraint that the problems in a given row be as balanced as possible between the two operations. The facts were chosen randomly from the appropriate fact pool, with each having an identical selection likelihood.

An example is provided in the Appendix.
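The generation procedure described above could be sketched as follows for the grade 2 probes. This is an illustration under stated assumptions, not the generator used in the study: the function names are hypothetical, facts are drawn with replacement, and the extra fact in an odd-length row is given to a randomly chosen operation.

```python
import random


def addition_subtraction_pool(max_term=10):
    """Addition facts with terms 0..max_term plus their associated subtraction facts."""
    pool = []
    for a in range(max_term + 1):
        for b in range(max_term + 1):
            pool.append((a, "+", b))
            pool.append((a + b, "-", b))
    return pool


def generate_probe(pool, row_lengths=(6, 6, 7, 7, 7, 7, 7, 7, 7, 7), seed=None):
    """Build probe rows that are as balanced as possible between the two operations."""
    rng = random.Random(seed)
    by_op = {
        "+": [fact for fact in pool if fact[1] == "+"],
        "-": [fact for fact in pool if fact[1] == "-"],
    }
    rows = []
    for length in row_lengths:
        counts = {"+": length // 2, "-": length // 2}
        if length % 2:
            # Assumption: the odd fact goes to a randomly chosen operation.
            counts[rng.choice("+-")] += 1
        # Each fact within an operation is equally likely (sampling with replacement).
        row = [rng.choice(by_op[op]) for op in "+-" for _ in range(counts[op])]
        rng.shuffle(row)
        rows.append(row)
    return rows


probe = generate_probe(addition_subtraction_pool(), seed=2016)
for top, op, bottom in probe[0]:
    print(f"{top} {op} {bottom}")
```

Because the addition and subtraction sub-pools have the same size, drawing uniformly within each operation keeps every fact equally likely while still honoring the row-balance constraint.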

2.2.3 Administrations

Three administrations were given. A pretest administration was conducted on February 12th, 2016. An interim administration was conducted on April 14th, timed to occur before heavy preparation for end-of-year testing began. A final administration was conducted on May 24th. Students were told to answer the items in order and not to skip items. The administrator used a script and was witnessed by the classroom teacher, who used a checklist to confirm each of several key points of instruction. This form also provided space for indicating any unusual occurrences.

The first and second administrations each comprised 4 one-minute fact fluency probes. Students were instructed that the first probe was a warm-up in each case. The final administration did not have a warm-up probe. It contained 3 math fact probes.

Grade 2 students also took a multi-digit computation probe, but the results of that probe were not analyzed as part of this combined report, for third grade students did not take a multi-digit probe. Multi-digit multiplication/division is not a core topic for third grade students in Florida, and the distribution of scores on the multi-digit addition/subtraction probe was known to be fundamentally different from the distribution of scores on math fact probes, so there is no clear way to combine the two.

All students in a given grade took the same probes using the same administrative script, regardless of condition. The probes that were described as "warm-up" tests were not counted in any analysis.

Five students, all in comparison classes, were noted by test administrators as working on their quizzes significantly beyond the called time limit. These students were not formally considered part of the study. Posttests were taken by these students. Three of the five students scored higher on their posttest than on their pretest.

2.2.4 Fluency Score Calculation

For each student, raw fluency scores were calculated as the average number of digits correct per min (dc/min) minus the number of digits incorrect per min (di/min), as this was the method found by Stevens & Leigh (2012) to have the greatest criterion validity.

Previous CBM researchers have combined grade 2 and grade 3 students (Burns et al. 2006), but to justify the pooling of their outcomes in a single analysis, we conducted an analysis of the distribution of raw pretest scores for each grade separately to show similarity of distribution.

Table 3: Raw Fluency Pretest Score Distributions by Grade

Measure                                 Grade 2    Grade 3
Mean                                    20.26      20.27
Standard Deviation                      10.53      11.35
Median                                  19         19
Kurtosis                                2.37       1.91
Skewness                                1.17       1.01
Range                                   54.67      58.33
Optimal Box-Cox (anchored at 1) λ       0.50       0.56

A Kolmogorov-Smirnov test corroborated the premise that these two distributions were quite similar. It failed to reject homogeneity (critical D-stat was 0.233, calculated D-stat was 0.063, p-value = 0.99).

The distribution of these raw scores was significantly skewed and leptokurtic, as has been reported in similar studies (Burns et al. 2006), so we normalized them using a Box-Cox transformation to arrive at a final fluency score. Following the recommendation of Osborne (2005), we anchored the full distribution at a minimum value of 1 by adding 2 to all raw fluency scores. A search for an optimum λ returned 0.525, so we chose λ = 0.5 for simplicity of inversion. Thus the calculation for final score is $\sqrt{C - I + 2}$, where C is the average digits correct per min and I is the average digits incorrect per min. The resulting distribution of pretest scores was not significantly skewed (skew = 0.08, SES = 0.21) but was still slightly leptokurtic (kurtosis = 0.85, SEK = 0.42). D'Agostino-Pearson (p-value = 0.13) and Jarque-Bera tests (p-value = 0.13) failed to reject normality.
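The score calculation and its inverse are simple enough to state in code. This is a minimal sketch with hypothetical function names; it assumes the shift of 2 and the square-root transform described in this subsection, and that inputs are per-minute averages across the scored probes.

```python
import math


def final_fluency_score(correct_per_min: float, incorrect_per_min: float, shift: float = 2.0) -> float:
    """Square-root transform of the shifted raw score: sqrt(C - I + 2)."""
    return math.sqrt(correct_per_min - incorrect_per_min + shift)


def raw_from_final(score: float, shift: float = 2.0) -> float:
    """Invert the transform to recover the raw score C - I."""
    return score**2 - shift


# A student averaging 23 digits correct and 0 incorrect per minute
print(final_fluency_score(23, 0))  # 5.0
```

Choosing λ = 0.5 is what makes the inverse a plain squaring, which is the "simplicity of inversion" mentioned in the text.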


23 Validity

The criterion validity for CBM based measures in elementary math hasbeen established by Stevens amp Leigh (2012) and VanDerHeyden amp Burns(2008) These studies showed math fact fluency was predictive of generalmath achievement on the Oklahoma Core Curriculum test and StanfordAchievement Test respectively

2.4 Reliability

Several researchers have confirmed the reliability of CBM for math fluency.

Table 4: Previous Research on CBM Reliability for Math Fluency

Metric                                   Scoring Method                                               Source                    Value
Inter-scorer Agreement                   Correct Digits per Minute                                    Burns et al. (2006)       0.96+
Inter-scorer Agreement                   Correct Digits per Minute                                    Hintze et al. (2002)      0.955
Inter-scorer Agreement                   Correct Digits per Minute minus Incorrect Digits per Minute  Stevens & Leigh (2012)    0.99+
Delayed Alternate-form Reliability       Correct Digits per Minute                                    Burns et al. (2006)       0.84
Absolute Generalizability                Correct Digits per Minute                                    Hintze et al. (2002)      0.75
Relative Generalizability                Correct Digits per Minute                                    Hintze et al. (2002)      0.95
Test-Retest Alternate Form Reliability   Correct Digits per Minute minus Incorrect Digits per Minute  Stevens & Leigh (2012)    0.87

Our study gave 3 separate fact probes on the same day, allowing us to measure the internal consistency of the raw fluency score (correct digits minus incorrect digits) using Cronbach's α. The α values across the six tests are given in Table 5.
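For readers who want to reproduce this kind of internal-consistency figure, the following is a minimal sketch of Cronbach's α for several same-day probes. The function name and the example scores are hypothetical; this is the textbook formula, not the report's own script.

```python
from statistics import variance


def cronbach_alpha(scores_by_student):
    """Cronbach's alpha for a list of per-student score lists.

    Each inner list holds one raw fluency score per probe
    (e.g. three probes given on the same day).
    """
    k = len(scores_by_student[0])
    if k < 2:
        raise ValueError("need at least two probes")
    # Variance of each probe across students.
    probe_vars = [variance([row[j] for row in scores_by_student]) for j in range(k)]
    # Variance of each student's total across probes.
    total_var = variance([sum(row) for row in scores_by_student])
    return (k / (k - 1)) * (1 - sum(probe_vars) / total_var)


# Hypothetical raw fluency scores for four students on three probes.
example = [[10, 12, 11], [20, 19, 21], [15, 15, 14], [30, 28, 31]]
alpha = cronbach_alpha(example)
```

Values near 1 indicate that the probes rank students almost identically, which is the pattern Table 5 reports.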

Table 5: Internal Consistency of Raw Fluency Score

               Addition/Subtraction   Multiplication/Division
Pretest        0.95                   0.94
Interim Test   0.96                   0.94
Posttest       0.97                   0.95


We also calculated delayed alternate-form reliability of the final fluency score across each grade × condition cohort and found an average value of 0.71.

Table 6: Delayed Alternate-Form Reliability (14 weeks)

               Addition/Subtraction   Multiplication/Division
Intervention   0.77                   0.47
Comparison     0.72                   0.89

The relatively poor value for the 3rd grade intervention group may be due to large variation in dosage. The standard deviation in weekly usage across the 3rd grade intervention group was 1.24 days/week, nearly twice that of the 2nd grade intervention group, where the standard deviation was 0.65 days/week.

When dosage was added to the model predicting posttest score from pretest score, the agreement between the two intervention groups improved considerably. The coefficients of multiple correlation were R = 0.81 and R = 0.77 for the 2nd and 3rd grade intervention groups, respectively.

2.5 Analytic Approach

Since randomized assignment occurred at the class level, we used an HLM modeling approach to account for cluster effects when analyzing the relationship between condition and posttest fluency. The model has two levels: grade and condition are level-2 variables, and all other covariates are level-1 variables. We used grand-mean-centered values for the lower-level variables and a maximum-likelihood method for determining the random effects. If the search for a model did not converge using maximum likelihood, restricted maximum likelihood was used instead.

Models were constructed using R's lmer function, part of the lme4 library, using the methodology for two-tier HLM models documented in a technical report from the Department of Statistics and Data Sciences, The University of Texas at Austin (UTA 2015), which showed the similarity in results to those given by SPSS, SAS, Mplus, and HLM.
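One way to write down a two-level model of this kind is sketched below. The notation (γ, u, r) is introduced here only for illustration, and the random-intercept-only structure is an assumption; the report does not state which random effects were included. The cross-level terms mirror the covariate-by-treatment and covariate-by-grade3 rows listed in the appendix tables.

```latex
% Level 1: student i in class j; X_{kij} are grand-mean-centered level-1 covariates
Y_{ij} = \beta_{0j} + \sum_{k} \beta_{kj} X_{kij} + r_{ij},
\qquad r_{ij} \sim N(0, \sigma^{2})

% Level 2: class j; treatment and grade3 are the level-2 variables
\beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{treatment}_{j}
           + \gamma_{02}\,\mathrm{grade3}_{j} + u_{0j},
\qquad u_{0j} \sim N(0, \tau_{00})

\beta_{kj} = \gamma_{k0} + \gamma_{k1}\,\mathrm{treatment}_{j}
           + \gamma_{k2}\,\mathrm{grade3}_{j}
```

Under this sketch, γ01 plays the role of the intervention effect on which the effect sizes and t-scores are based.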

We formed 3 models of decreasing complexity and calculated an effect size and statistical significance based on each.

The first model uses the same structure as that used in the original version of this report. In this model, all dichotomous and numeric covariates were used (i.e., all covariates other than race, which was polynominal), including the pretest accuracy and pretest speed. This model is the most inclusive and allows for continuity between the original version of this report and the current version. It is denoted as the Full Model.

For the data available at the time of the original report, the pretest speed and pretest accuracy were both highly significant (p < 0.001). But after removing students who did not respect the time limits on the pretest or were designated as being below grade level before the study began, these additional pretest features were no longer statistically significant. A nested-model χ-squared test, comparing change in deviance to change in degrees of freedom, did not show a statistically significant improvement upon adding either of these terms. Thus we generated a new model lacking these two pretest features but retaining all the demographic covariates of the original. This model is denoted in the sequel as the Demographic Model.
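The nested-model comparison described above can be sketched as follows. The deviance values in the example are hypothetical placeholders (the report does not list them), and the closed-form tail probabilities cover only 1 or 2 added degrees of freedom, which is enough for adding one or two terms.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail probability of a chi-squared variable (df = 1 or 2 only)."""
    if x < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")


def nested_model_test(deviance_reduced, deviance_full, df_added):
    """Compare the drop in deviance to a chi-squared with df_added degrees of freedom."""
    drop = max(deviance_reduced - deviance_full, 0.0)
    return drop, chi2_upper_tail(drop, df_added)


# Hypothetical deviances (-2 log-likelihood) from two maximum-likelihood fits.
drop, p_value = nested_model_test(deviance_reduced=352.4, deviance_full=350.9, df_added=2)
```

A large p-value, as with these placeholder numbers, means the added terms do not improve fit enough to justify keeping them. Both fits need to use maximum likelihood rather than restricted maximum likelihood for the deviances to be comparable when fixed effects differ.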

In an effort to simplify the model further, we assessed the relevance of each of the demographic variables by generating an HLM with the following characteristics:

• No Level-2 variables

• Two Level-1 variables: the covariate in question and pretest score

• Group-mean-centered values

• Data scaled to be univariate
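The difference between the group-mean centering used for this screening step and the grand-mean centering used elsewhere can be sketched as below. The function names are hypothetical, and the scaling helper assumes that "scaled to be univariate" means scaled to unit variance, which is an interpretation rather than something the report states.

```python
from collections import defaultdict
from statistics import mean, pstdev


def grand_mean_center(values):
    """Subtract the overall mean from every value."""
    m = mean(values)
    return [v - m for v in values]


def group_mean_center(values, groups):
    """Subtract each observation's own group (class) mean."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    return [v - group_means[g] for v, g in zip(values, groups)]


def scale_unit_variance(values):
    """Center and divide by the (population) standard deviation."""
    sd = pstdev(values)
    return [v / sd for v in grand_mean_center(values)]


# Hypothetical pretest scores for two classes "a" and "b".
scores = [1.0, 3.0, 10.0, 20.0]
classes = ["a", "a", "b", "b"]
within_class = group_mean_center(scores, classes)
```

Group-mean centering removes between-class differences from the covariate, so its coefficient reflects only within-class variation, whereas grand-mean centering keeps them.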

This method was selected for determining the relevance of a given level-1 factor based on Woltman, Feldstain, MacKay & Rocchi's (2012) presentation. Note that this was the only analysis using group-mean-centered data. The models used for determining intervention effect and statistical significance used grand-mean-centered level-1 variables.

The results of this analysis are shown in Table 7. Given their very low coefficients and t-scores, we removed gender and ESE. Upon forming the full HLM using the remaining covariates, it was found that LEP had very little impact (coefficient = 0.03) and low significance (t = 0.24), so it was dropped as well. In the resulting model, all covariates had t-scores greater than 0.9 in magnitude and standardized coefficients greater than 0.1. There was a nearly statistically significant interaction (t = 1.94) between condition and whether the student was on free or reduced lunch.

Table 7: Impact and significance of demographic covariates

Covariate   Coefficient   t-score
age          0.028         0.506
gender      -0.005        -0.096
LEP          0.042         0.612
Lunch        0.080         1.249
ESE         -0.006        -0.100

This final model is denoted as the Reduced Model.

All three models are provided in the Appendix.

Effect sizes were calculated from the coefficient for the intervention effect from each HLM model and the pooled within-group standard deviation of unadjusted posttest scores.

Statistical significance was determined based on the t-score of the multilevel model.

2.6 Statistical Adjustments

We used all demographic information provided except race, which was non-binary and correlated significantly with other demographic information (R between 0.36 and 0.46 for the three most prevalent races in our sample).

Grade was coded as grade3, a variable equal to 1 if the student was in grade 3 and 0 otherwise.

Age was measured in years as of the pretest administration.

Gender was coded as a variable, male, equal to 1 if the student was male and 0 if the student was female.

Low English proficiency was coded as a variable, LEP, equal to 1 if the school indicated the student had low English proficiency.

Exceptional Student Status was determined based on the school's designation of the student as being within an Exceptional Student Education program. It was coded as a variable, ESE, equal to 1 if the school specified the student as belonging to an ESE program. The state of Florida specifies several ESE programs, one of which is a program for gifted students. For our study, it appears this program furnished the large majority of ESE designations, as 29 of the 36 students designated as ESE were concentrated in two high-achieving classes. In grade 2, every ESE-designated student was in a single class.

Eligibility for free or reduced lunch was coded as a variable, lunch, equal to 1 if the student was eligible.

As described in the Fluency Score Calculation subsection, fluency was evaluated based on a research-supported combination of speed and accuracy, normalized to reduce skewness via a Box-Cox transformation. This means that a student's fluency score depends on personal characteristics such as confidence, sense of urgency on a pen-and-paper test, and attention to accuracy, so students differ markedly in potential for improvement.

Pretest accuracy is the ratio of correct digits to the sum of correct and incorrect digits.

Pretest score is defined as $\sqrt{C - I + 2}$, where C is digits correct per minute and I is digits incorrect per minute.

Pretest speed is defined in a manner analogous to pretest score, $\sqrt{C - 2}$, where C is digits correct per minute. In this expression, 2 is subtracted rather than added so that the expression is anchored at 1, conforming to best practices (Osborne 2005).
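The three pretest measures defined above can be sketched as small functions. The function and argument names are hypothetical; the formulas are the ones stated in the text, and the domain checks are an added safeguard rather than part of the report.

```python
import math


def pretest_accuracy(correct_digits, incorrect_digits):
    """Ratio of correct digits to all digits attempted."""
    return correct_digits / (correct_digits + incorrect_digits)


def fluency_score(dc_per_min, di_per_min):
    """Final fluency score: sqrt(C - I + 2)."""
    anchored = dc_per_min - di_per_min + 2
    if anchored < 0:
        raise ValueError("anchored raw score must be non-negative")
    return math.sqrt(anchored)


def speed_score(dc_per_min):
    """Speed score: sqrt(C - 2), as defined for pretest speed."""
    if dc_per_min < 2:
        raise ValueError("digits correct per minute must be at least 2")
    return math.sqrt(dc_per_min - 2)


# Hypothetical student: 18 digits correct and 2 incorrect per minute.
accuracy = pretest_accuracy(18, 2)
score = fluency_score(18, 2)
speed = speed_score(18)
```

The fluency score blends speed and accuracy in one number, while the accuracy and speed measures separate them, which is why the Full Model can include all three.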

All student-level covariates were scaled to be univariate and grand-mean centered for improved interpretability and model convergence.

Speed, score, and accuracy on the interim administration were considered during the regression process used to impute missing data, as described in the Missing Data section. These metrics are calculated exactly as for the pretest, using the same formula (i.e., the data was not re-anchored for the Box-Cox transformation).

An HLM model was used to calculate statistical significance for the entire sample, so no adjustment for cluster effects was necessary. We only analyzed one outcome for this study, so no adjustment was made for multiple outcomes.

2.7 Students Removed from Study

Ten students, 4 from the intervention condition and 6 from the comparison condition, were excluded from the analysis. In all cases, the decision to exclude was based on information obtained on the day of the pretest.

Four of these ten (3 from intervention, 1 from comparison) were excluded because their teacher indicated they were sufficiently below grade level that they would not receive typical instruction in math fact fluency. This determination was provided on the day of the pretest.

Five of these ten (all from comparison) were excluded because they did not stop when time was called on the pretest. In three cases, these students had higher values on their pretest than on their posttest.

One grade 2 student from the intervention condition was noted as appearing frustrated and not working on the pretest. He had the fourth lowest fluency score of all 2nd-grade participants on the pretest and showed dramatic improvement by the interim assessment, on which he scored at the 33rd percentile within his grade. According to Reflex's internal initial testing, the student had pre-existing automaticity for 17.1% of the addition facts within 20 and had basic recall ability with 59.9%. This suggests his pretest score underestimated his actual ability, and he was removed from the analysis for fear of artificially inflating the impact of the intervention. Note that this student was absent from the final administration.

Table 8: Descriptive Statistics of Control Variables

Control Variable        Mean   SD     Skew    Kurtosis
Grade3                  0.48   0.50    0.08   -2.03
Age                     8.42   0.57    0.03   -0.95
Male                    0.47   0.50    0.14   -2.01
LEP                     0.20   0.40    1.51    0.27
ESE                     0.26   0.44    1.13   -0.73
Lunch                   0.20   0.40    1.51    0.27
Pretest Accuracy        0.92   0.09   -2.63    8.23
Pretest Speed           4.29   1.18    0.18    0.98
Pretest Score           4.58   1.15    0.08    0.85
Interim Accuracy        0.94   0.06   -1.84    3.59
Interim Speed           5.07   1.20    0.55    0.42
Interim Fluency Score   5.31   1.18    0.46    0.39

2.8 Missing Data

Eight students, 5 from the treatment group and 3 from the comparison group, were absent for the final administration. Seven of the students had taken the interim assessment. No values were imputed for the student who missed both the interim and the final assessment. For the seven who had attended the interim test, we imputed posttest values using a multiple linear regression based on students in the same instructional-level group, using the thresholds established by Burns et al. (2006).

Table 9: Categorization of Students

Fluency (dc/min)   Category              N
Less than 14       Frustration Level     29 (22%)
14-31              Instructional Level   81 (63%)
Greater than 31    Mastery Level         19 (15%)
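The thresholds in Table 9 amount to a simple lookup, sketched below. The function name is hypothetical, and the handling of the exact boundary values (14 and 31 counted as Instructional Level) follows the table's "14-31" row.

```python
def instructional_category(dc_per_min):
    """Map digits correct per minute to the Table 9 category."""
    if dc_per_min < 14:
        return "Frustration Level"
    if dc_per_min <= 31:
        return "Instructional Level"
    return "Mastery Level"


# Hypothetical students at 9.5, 20, and 40 digits correct per minute.
categories = [instructional_category(x) for x in (9.5, 20, 40)]
```

Grouping students this way lets the imputation regression be fitted separately within each level.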

All available data (demographic data, pretest data, and interim test data) were used to impute posttest scores using an OLS regression that retained only statistically significant regressors.

2.8.1 Frustration Level

One of the seven students for whom posttest scores were imputed was in the frustration level. For that group, age (t = 2.6), pretest accuracy (t = −3.2), and interim fluency score (t = 6.2) were the statistically significant regressors.

2.8.2 Instructional Level

Six of the seven students for whom posttest scores were imputed were in the instructional level. Among students in that level, grade (t = 4.2), interim accuracy (t = −2.6), and interim fluency score (t = 9.5) were statistically significant.

2.8.3 Mastery Level

There were no students in the mastery level for whom imputation was necessary.

3 Study Data

Tables compose the large majority of this section. They are organized by table title and subsection title rather than by use of numbers.

The tables in this section report unscaled, uncentered values for ease of interpretability.

3.1 Pre-Intervention Data: All Pretest Takers

This section provides data on all students who took the pretest, including those who were formally removed from the analysis.

Outcome Data

Measure: Fluency Score

Group          Units of Assignment   Units of Analysis   Mean   Standard Deviation
Comparison     4                     70                  4.44   1.17
Intervention   4                     70                  4.49   1.26

Background Data

                   Comparison        Intervention
Variable           Mean     SD       Mean     SD
Age                8.440    0.598    8.442    0.602
ESE                0.271    0.448    0.243    0.432
Male               0.500    0.504    0.400    0.493
Grade3             0.486    0.503    0.486    0.503
LEP                0.243    0.432    0.171    0.380
Lunch              0.229    0.423    0.186    0.392
Pretest accuracy   0.919    0.098    0.909    0.113
Pretest speed      4.145    1.186    4.220    1.257

3.2 Pre-Intervention Data: Baseline Sample

This section includes all students who were formally part of the analysis, including those who were absent for the posttest.

Outcome Data

Measure: Fluency Score

Group          Units of Assignment   Units of Analysis   Mean   Standard Deviation
Comparison     4                     64                  4.57   1.12
Intervention   4                     66                  4.60   1.20

Background Data

                   Comparison        Intervention
Variable           Mean     SD       Mean     SD
Age                8.444    0.556    8.405    0.581
ESE                0.250    0.436    0.258    0.441
Grade3             0.484    0.504    0.470    0.503
LEP                0.250    0.436    0.152    0.361
Lunch              0.219    0.417    0.182    0.389
Male               0.531    0.503    0.394    0.492
Pretest accuracy   0.932    0.087    0.916    0.102
Pretest speed      4.268    1.147    4.324    1.213

3.3 Pre-intervention Data: Analytic Sample

Outcome Data: Analytic Sample

Measure: Fact Fluency

Group          Units of Assignment   Units of Analysis   Mean    Standard Deviation
Comparison     4                     64                  4.573   1.121
Intervention   4                     65                  4.580   1.195

Background Data: Analytic Sample

                   Comparison        Intervention
Variable           Mean     SD       Mean     SD
Age                8.444    0.556    8.414    0.580
ESE                0.250    0.436    0.262    0.443
Grade3             0.484    0.504    0.477    0.503
LEP                0.250    0.436    0.154    0.364
Lunch              0.219    0.417    0.185    0.391
Male               0.531    0.503    0.400    0.494
Pretest accuracy   0.932    0.087    0.914    0.102
Pretest speed      4.268    1.147    4.307    1.214

Outcome Data: Analytic Sample with No Imputation

Measure: Fact Fluency

Group          Units of Assignment   Units of Analysis   Mean    Standard Deviation
Comparison     4                     61                  4.557   1.143
Intervention   4                     61                  4.640   1.142

Background Data: Analytic Sample with No Imputation

                   Comparison        Intervention
Variable           Mean     SD       Mean     SD
Age                8.436    0.556    8.394    0.568
ESE                0.246    0.434    0.279    0.452
Grade3             0.492    0.504    0.459    0.502
LEP                0.262    0.444    0.148    0.358
Lunch              0.230    0.424    0.180    0.388
Male               0.541    0.502    0.410    0.496
Pretest accuracy   0.930    0.088    0.922    0.084
Pretest speed      4.254    1.170    4.360    1.177

3.4 Post-intervention Data and Findings

3.4.1 Analytic Sample

As grand-mean-centered values were used for all Level-1 covariates, and grade was the only Level-2 covariate other than condition, adjusted means for each group were estimated from the constant term of the HLM model, the grade3 coefficient of the HLM model, the average value of the grade3 variable across all students, and (in the case of the intervention group) the treatment coefficient of the HLM model. Standard deviations are unadjusted.
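The adjusted-mean calculation described above can be sketched as follows. The function name is hypothetical; the example inputs are the Full Model intercept, grade3 coefficient, and treatment coefficient from Appendix A, with the sample-wide grade3 mean taken from Table 8 (rounded to 0.48, which is an approximation).

```python
def adjusted_means(intercept, grade3_coef, grade3_mean, treatment_coef):
    """Adjusted group means with all centered level-1 covariates at zero."""
    comparison = intercept + grade3_coef * grade3_mean
    intervention = comparison + treatment_coef
    return comparison, intervention


# Full Model coefficients (Appendix A) and the grade3 mean (Table 8).
comparison_mean, intervention_mean = adjusted_means(
    intercept=4.709, grade3_coef=0.695, grade3_mean=0.48, treatment_coef=0.927
)
```

With these inputs the sketch gives roughly 5.04 and 5.97, in line with the Full Model row of the outcome table. Because the level-1 covariates are grand-mean centered, they contribute nothing at their mean, which is why only the intercept, grade3, and treatment terms appear.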

Outcome Data and Statistical Significance: Analytic Sample

                    Comparison Group                Intervention Group              Estimated Effect
Model               Students  Adj. Mean  Unadj. SD  Students  Adj. Mean  Unadj. SD  Adj. Mean Difference  Adj. t-score
Full Model          64        5.04       1.099      65        5.97       1.093      0.927***              5.753
Demographic Model   64        5.13       1.099      65        5.97       1.093      0.836***              4.966
Reduced Model       64        5.08       1.099      65        5.95       1.093      0.867***              4.343

Note: *p<0.1; **p<0.05; ***p<0.001

Effect size was calculated based on adjusted means, unadjusted pooled within-group standard deviations, and a correction $\omega = 1 - \frac{3}{4N - 9}$ for small sample size.
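The effect-size calculation can be sketched as below, using the standard pooled within-group standard deviation and the correction factor ω = 1 − 3/(4N − 9). The function names are hypothetical; the example inputs are the Full Model figures for the analytic sample reported in this section.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled within-group standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def hedges_g(adjusted_mean_difference, sd_pooled, n_total):
    """Adjusted Hedges' g with the correction 1 - 3 / (4N - 9)."""
    omega = 1 - 3 / (4 * n_total - 9)
    return omega * adjusted_mean_difference / sd_pooled


# Full Model, analytic sample: 64 comparison and 65 intervention students.
sd = pooled_sd(64, 1.099, 65, 1.093)
g = hedges_g(0.927, sd, 64 + 65)
```

With these inputs the pooled standard deviation comes to about 1.096 and g to about 0.84, consistent with the Full Model row of the effect-size table.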


Estimation of Effect Size: Analytic Sample

Model               N     Adjusted Mean Difference   (Unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges' g)
Full Model          129   0.927***                   1.096                             0.84
Demographic Model   129   0.836***                   1.096                             0.76
Reduced Model       129   0.867***                   1.096                             0.79

Note: *p<0.1; **p<0.05; ***p<0.001

3.4.2 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absent for the post assessment indicated that a complete-case analysis would substantially understate the effect of the intervention. The covariate-adjusted effect of the treatment on interim test scores was greater among students who missed the posttest than among those who were present for all three tests. This is borne out in the results of an analysis limited to those students for whom no imputation occurred.

Values for adjusted means for this subgroup were calculated by recentering all Level-1 covariates and generating a new HLM with the same structure as for the full analytic sample, but using only those participants with no missing data.

Outcome Data and Statistical Significance: Analytic Sample with No Imputation

                    Comparison Group                Intervention Group              Estimated Effect
Model               Students  Adj. Mean  Unadj. SD  Students  Adj. Mean  Unadj. SD  Adj. Mean Difference  Adj. t-score
Full Model          61        5.07       1.12       61        5.94       1.11       0.867***              5.255
Demographic Model   61        5.15       1.12       61        5.94       1.11       0.787***              4.669
Reduced Model       61        5.09       1.12       61        5.92       1.11       0.828***              4.249

Note: *p<0.1; **p<0.05; ***p<0.001

Estimation of Effect Size: Analytic Sample with No Imputation

Model               N     Adjusted Mean Difference   (Unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges' g)
Full Model          122   0.867***                   1.113                             0.77
Demographic Model   122   0.787***                   1.113                             0.70
Reduced Model       122   0.828***                   1.113                             0.74

Note: *p<0.1; **p<0.05; ***p<0.001

3.5 Subpopulation Analyses

We analyzed subpopulations by grade. We also analyzed the subpopulation of students not designated as exceptional students. Due to the smaller sample sizes, the Reduced Model was used for the analyses, except that grade was removed as a variable for subpopulations of constant grade.

Statistical Significance and Estimation of Effect Size

Group                      N     Adjusted Mean Difference   (Unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges' g)   Adjusted t-score
Grade 2                    68    0.739**                    0.94                              0.78                           2.46
Grade 3                    63    0.877**                    1.05                              0.82                           2.47
Non-Exceptional Students   102   0.904***                   1.101                             0.89                           4.63

Note: *p<0.1; **p<0.05; ***p<0.001

4 Acknowledgment

We used R (R Core Team 2013) for some of the analysis in this report, especially the lme4 package for fitting mixed models (Bates, Mächler, Bolker & Walker 2015). Other libraries utilized were dplyr, tidyr, and magrittr (Wickham & Francois 2016, Bache & Wickham 2014, Wickham 2016).

This document was typeset using LaTeX and makes use of the harvard, booktabs, multirow, graphicx, and url packages.

The stargazer package was used to generate LaTeX for several of the tables (Hlavac 2013).

References

Bache, S. M. & Wickham, H. (2014), magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr

Bates, D., Mächler, M., Bolker, B. & Walker, S. (2015), 'Fitting linear mixed-effects models using lme4', Journal of Statistical Software 67(1), 1–48.

Burns, M. K., VanDerHeyden, A. M. & Jiban, C. L. (2006), 'Assessing the instructional level for mathematics: A comparison of methods', School Psychology Review 35(3), 401.

Hintze, J. M., Christ, T. J. & Keller, L. A. (2002), 'The generalizability of CBM survey-level mathematics assessments: Just how many samples do we need?', School Psychology Review 31(4), 514.

Hlavac, M. (2013), 'stargazer: LaTeX code and ASCII text for well-formatted regression and summary statistics tables'. http://CRAN.R-project.org/package=stargazer

Osborne, J. (2005), 'Notes on the use of data transformations', Practical Assessment, Research and Evaluation 9(1), 42–50.

Stevens, O. & Leigh, E. (2012), 'Mathematics curriculum based measurement to predict state test performance: A comparison of measures and methods', ProQuest LLC.

R Core Team (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

UTA (2015), Multilevel modeling tutorial using SAS, Stata, HLM, R, SPSS, and Mplus, Technical report, The Department of Statistics and Data Sciences, The University of Texas at Austin. http://stat.utexas.edu/images/SSC/documents/SoftwareTutorials/MultilevelModeling.pdf

VanDerHeyden, A. M. & Burns, M. K. (2008), 'Examination of the utility of various measures of mathematics proficiency', Assessment for Effective Intervention.

Wickham, H. (2016), tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. https://CRAN.R-project.org/package=tidyr

Wickham, H. & Francois, R. (2016), dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr

Woltman, H., Feldstain, A., MacKay, J. C. & Rocchi, M. (2012), 'An introduction to hierarchical linear modeling', Tutorials in Quantitative Methods for Psychology 8(1), 52–69.

Appendix A: Full Model

The table below describes the fixed-effects data of the full HLM for the analytic sample. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                   Coefficient   t-score
Intercept                 4.709***     25.719
cage                      0.136         0.887
cLEP                     -0.015        -0.111
cLunch                   -0.122        -0.981
cprescore                -2.220        -0.932
cgender                  -0.148        -1.518
cESE                      0.054         0.272
cprespeed                 2.471         1.177
cpreaccuracy              0.750         1.313
treatment                 0.927***      5.753
grade3                    0.695***      3.335
cagey:treatment           0.076         0.469
cagey:grade3             -0.224        -1.134
cLEP:treatment           -0.021        -0.123
cLEP:grade3              -0.027        -0.156
cLunch:treatment          0.195         1.440
cLunch:grade3             0.174         1.269
cprescore:treatment       1.881         1.056
cprescore:grade3          2.097         0.885
cgender:treatment         0.096         0.836
cgender:grade3            0.180         1.564
cESE:treatment           -0.036        -0.204
cESE:grade3              -0.013        -0.073
cprespeed:treatment      -1.327        -0.849
cprespeed:grade3         -1.581        -0.753
cpreaccuracy:treatment   -0.712*       -1.657
cpreaccuracy:grade3      -0.608        -1.101

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B: Demographic Model

The table below describes the fixed-effects data of the full HLM for the analytic sample but excludes the pretest features of speed and accuracy. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                Coefficient   t-score
Intercept              4.906***     27.157
cage                   0.230         1.554
cLEP                   0.042         0.315
cLunch                -0.128        -1.225
cprescore              0.756***      5.872
cgender               -0.124        -1.283
cESE                   0.172         0.855
treatment              0.836***      4.966
grade3                 0.480**       2.235
cagey:treatment       -0.003        -0.022
cagey:grade3          -0.270        -1.378
cLEP:treatment        -0.059        -0.346
cLEP:grade3           -0.071        -0.408
cLunch:treatment       0.313**       2.631
cLunch:grade3          0.174         1.459
cprescore:treatment    0.003         0.020
cprescore:grade3       0.094         0.685
cgender:treatment      0.122         1.062
cgender:grade3         0.120         1.039
cESE:treatment        -0.131        -0.734
cESE:grade3           -0.117        -0.642

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix C: Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sample, retaining only age, lunch status, and pretest score as level-1 covariates. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

This model used restricted maximum likelihood, as there were convergence problems when using maximum likelihood.

Factor                Coefficient   t-score
Intercept              4.827***     25.683
cage                   0.227         1.146
cLunch                -0.111        -0.952
cprescore              0.758***      5.488
treatment              0.867***      4.343
grade3                 0.527**       2.219
cagey:treatment       -0.031        -0.154
cagey:grade3          -0.168        -0.692
cLunch:treatment       0.268*        1.936
cLunch:grade3          0.153         1.109
cprescore:treatment   -0.042        -0.286
cprescore:grade3       0.100         0.677

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B: Sample Addition/Subtraction Probe

 9 + 8      5 + 9     18 - 10     9 - 6      8 + 3     13 - 5
10 - 3      2 + 7      3 + 8     10 + 1     12 - 10     9 - 1
 5 + 0      8 + 4      3 + 6     19 - 9      7 - 1     16 - 10     3 + 0
 3 - 1      7 + 5     15 - 5      0 + 9      4 + 3     14 - 5      7 - 5
 9 + 9     20 - 10    11 - 3      4 - 4      9 + 0      6 - 1      1 + 10
 9 + 10    12 - 3     12 - 9      2 + 8      5 - 0      9 - 4      5 + 0
 4 + 4     14 - 9      7 - 0     11 - 8      7 + 0      4 - 1      6 + 5
 2 + 3     12 - 5     14 - 5      4 - 4     10 + 10     1 + 0      7 + 2
13 - 6     10 + 10     3 + 6      9 - 6     17 - 7     10 + 10     3 + 6
 4 + 9     10 + 2     10 + 10     3 - 0      5 + 3      5 - 5     10 - 10


• Multiplication and Division 0-12: Multiplication facts whose factors are in the range 0-12 and their associated division facts

Students assigned to the intervention condition began in the addition/subtraction assignment if they were in second grade and in the multiplication/division 0-10 assignment if they were in third grade. Teachers had the ability to switch students on an individual basis to other assignments at their own discretion. Sixteen of the 37 second graders using Reflex were switched into multiplication/division before the posttest. Thus some of their time spent in Reflex was dedicated to above-grade-level items that were not on the posttest.

The recommended usage for Reflex is 3 days per week. The four teachers achieved weekly usages of 2.6, 3.3, 3.4, and 1.5 days per week. These values include all days on which a login was made, even if the student was practicing facts outside the range of testing.

The average usage across all students was 2.7 days/week.

The median time spent in Reflex during the study was 59 minutes a week, which includes time spent in non-instructional aspects of the system, such as browsing an in-product store to buy virtual items using tokens earned in games, or cases where a student logged in from home and forgot to log off.

Reflex requires individual accounts with individual passwords. A user in the comparison group could only have used Reflex by logging into the account of another student.

Post-survey questionnaires were given to all teachers. Two teachers from the intervention condition returned questionnaires, both indicating they relied on Reflex as their primary means of developing math fact fluency during the course of the study.

1.2 Comparison Condition

This study used a business-as-usual comparison condition. Math fluency in general, and math fact fluency in particular, are required by the Florida Math Standards and Common Core State Standards for grades 2 and 3. Florida Math Standard MAFS.2.OA.2.2 and Common Core State Standard 2.MD.2 have identical wordings: "Fluently add and subtract within 20 using mental strategies. By end of Grade 2, know from memory all sums of two one-digit numbers." Similarly, Florida Math Standard MAFS.OA.C.7 and Common Core State Standard 3.OA.C.7 read: "Fluently multiply and divide within 100, using strategies such as the relationship between multiplication and division (e.g., knowing that 8 × 5 = 40, one knows 40 ÷ 5 = 8) or properties of operations. By the end of Grade 3, know from memory all products of two one-digit numbers."

Additionally, Common Core State Standards specify a number of general computational fluency requirements for which facility with math facts is foundational (Standards 2.NBT.B.5, 2.NBT.B.6, 2.NBT.B.7, 3.OA.A, 3.NBT.A.2). Florida's standards retain these requirements.

Post-survey questionnaires were given to all teachers. Teachers in the comparison condition were asked to describe methods they used to develop math fact fluency and the time they spent on this goal. Two of the four teachers in the comparison condition returned these questionnaires. Their comments are provided below verbatim. We have also included data on the average fluency gain for each comparison class, including the two that did not return questionnaires.

The survey asked teachers how many hours a month were spent on developing math fact fluency. One teacher specified her answer in terms of minutes per day. The other wrote "20 hours" in the blank.

Table 1: Post-Study Comparison Group Responses

Grade   Average Gain   Strategies                                             Time Spent (Hours per month)
3        0.88          (Did not return survey)                                NA
3        0.81          flash cards, timed tests, repetition, math fact raps   20 hours
2        0.53          (Did not return survey)                                NA
2       -0.01          ten marks, flash cards, fast facts, center work        (time everyday) 10 minutes

Given the average fluency gains, we surmise that the other two teachers likely spent considerably more than 10 minutes a day on math fact fluency. The grade 3 responder had a group of high-achieving students, so it is possible homework was assigned on math fact fluency, as it is hard to imagine that 20 hours of class time a month was spent on the topic.

1.3 Setting

Teachers from a Florida school in a metropolitan area participated in this study. The demographic data provided by the school indicate it is a majority-minority school: 57% of its second- and third-grade students are Hispanic or Latino and 31% are Caucasian. The data provided indicate that 28% have low English proficiency and 17% are on free or reduced lunch.

14 Participants

The participating students are generally demographically similar to the fullpopulation of second- and third-grade students in terms of exceptional stu-dent status race gender and economic status In all cases we relied oninformation received from the school

2 Study Design and Analysis

2.1 Sample Formation

The school was identified by project personnel owing to its previous interest in Reflex. The school was offered a discount on a later subscription in exchange for participation. After logistical discussions to ensure that the school had sufficient technical resources to allow usage of a computer-delivered intervention, teachers were asked to volunteer for participation. Nine teachers initially volunteered to have their homeroom students take part. One of these homeroom classes was taught by another teacher who also taught her own homeroom, so the 9 classes were taught by 8 teachers.

The study was intended as a cluster randomized controlled trial, with the teachers from each grade randomly assigned to condition. Unfortunately, the design was compromised across grade 3 teachers. One teacher assigned to the comparison condition did not participate at all (project personnel did not administer pretests). Another teacher, assigned to the treatment, never used the intervention; there was zero uptake across her entire class. Review of email exchanges suggests three possible causes:

• The liaison between the head researcher and the school may have misrepresented the constraints of the study to the school. He reports that the school may have thought that an even number of teachers were required.

• Two gifted/high-achieving classes participated in the study. They were both inadvertently randomly assigned to the intervention. It was our intention to split these through block randomization, but we only received the pertinent data after selection and, due to a misreading of the correspondence, failed to catch the error, so no re-assignment was done. The school may have rectified our error themselves.

• It is possible that one of the teachers simply did not want to use the intervention. Project personnel doing the training reported that she attended but “had to leave early on.”

Given the above, we are analyzing our study as a quasi-experimental design (QED) where the intact groups are the 8 classes for whom we have pretest data and the intervention group comprises those classes where any uptake occurred prior to the posttest.

Teachers were provided the opportunity to indicate any students who were not prepared for fact fluency instruction. Four third-grade students were identified: 3 from the intervention group and 1 from the comparison group. These students’ data were not considered as part of the study.

One of the teachers taught two classes, one within the intervention group and another in the comparison group. All other teachers taught a single class.

Group Descriptions

Table 2 provides a description of the demographic character of the groups as well as their pretest results. The fluency score on the pretest combines both speed and accuracy, as described in the Fluency Score Calculation subsection.

2.2 Outcome Measures

2.2.1 Outcomes

One outcome was measured in the study: math fact fluency, which is a key component of general math achievement and has been shown to be predictive of students’ performance on general math achievement tests (see the Validity subsection below). Fluency was measured using timed probes.


Table 2: Baseline Demographic Information

                                  Full Sample   Comparison Group   Intervention Group
Sample Size                       129           64                 65
% Grade 3 Students                48.1          48.4               47.7
% Hispanic                        54.2          53.1               55.3
% Asian                           17.1          18.8               15.4
% White                           22.5          21.9               23.1
% Black                           4.7           6.2                3.1
% Multiracial                     1.6           0.0                3
% Low English Proficiency         20.2          25.0               15.4
% Exceptional Student (Gifted)    25.6          25                 26.1
% Free/Reduced Lunch              20.2          21.9               18.4
% Male                            46.5          53.1               40
age-at-pretest (years)            8.43          8.44               8.41
pre-test Accuracy (%)             92.3          93.2               91.4
pre-test Speed                    4.29          4.26               4.31
pre-test Score                    4.58          4.57               4.58

• Grade 2 students were tested on facts with addends, subtrahends, and differences from 0 to 10 inclusive (i.e., from 0 + 0 up to 10 + 10 and their associated subtraction facts).

• Grade 3 students were tested on facts with factors, divisors, and quotients from 0 to 10 inclusive (i.e., from 0 × 0 to 10 × 10 and their associated division facts).

These match the requirements in the Common Core State Standards, except that, owing to that document’s idiosyncratic definition of “within X” (as in “addition within 20”), a literal reading of the work indicates that facts such as 20 − 17 and 91 ÷ 13 are considered within grade level. The Florida Math Standards do not provide a glossary, so it is unclear whether such facts would be in the scope of the wording of its standards.

2.2.2 Probes

Probes had a format similar to those in other Curriculum Based Measurement (CBM) studies (Hintze, Christ & Keller 2002; Burns, VanDerHeyden & Jiban 2006; Stevens & Leigh 2012), as described below.

Each probe was a single sheet of paper with 10 rows of vertically oriented problems. Probes given to grade 2 students contained addition and subtraction facts. Probes given to grade 3 students contained multiplication and division facts. The problems were printed in extra-large type, so only 7 facts fit on each row. The first two rows contained only 6 facts to make room for a geometric shape placed in the upper right-hand corner to help students and monitors quickly identify which page the students were on. The problems were computer-generated with the constraint that the problems in a given row be as balanced as possible between the two operations. The facts were chosen randomly from the appropriate fact pool, with each having an identical selection likelihood.

An example is provided in the Appendix.
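The probe layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the generator used in the study: the function names (`build_fact_pool`, `make_probe`) are hypothetical, facts are drawn with replacement, division by zero is excluded from the grade 3 pool, and on rows with an odd number of facts the extra fact is given to either operation at random.

```python
import random

def build_fact_pool(grade):
    """Return the fact pool for a grade as {operation: [(left, right), ...]}."""
    pool = {"+": [], "-": []} if grade == 2 else {"x": [], "/": []}
    for a in range(11):
        for b in range(11):
            if grade == 2:
                pool["+"].append((a, b))          # a + b
                pool["-"].append((a + b, b))      # associated fact (a + b) - b
            else:
                pool["x"].append((a, b))          # a x b
                if b != 0:                        # assumption: no division by zero
                    pool["/"].append((a * b, b))  # associated fact (a x b) / b
    return pool

def make_probe(grade, rng, rows=10, full_row=7, short_row=6, short_rows=2):
    """Build one probe: a list of rows, each a list of (left, op, right)."""
    pool = build_fact_pool(grade)
    op_a, op_b = pool.keys()
    probe = []
    for r in range(rows):
        size = short_row if r < short_rows else full_row
        n_a = size // 2
        if size % 2 and rng.random() < 0.5:
            n_a += 1  # odd row: the extra fact goes to either operation
        ops = [op_a] * n_a + [op_b] * (size - n_a)
        rng.shuffle(ops)
        row = []
        for op in ops:
            # Assumption: uniform draw, with replacement, within the operation's pool.
            left, right = rng.choice(pool[op])
            row.append((left, op, right))
        probe.append(row)
    return probe

if __name__ == "__main__":
    for row in make_probe(2, random.Random(0)):
        print("   ".join(f"{l} {op} {r}" for l, op, r in row))
```

Passing a seeded `random.Random` keeps a generated probe reproducible, which matters when every student in a grade must receive the same sheet.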

2.2.3 Administrations

Three administrations were given. A pretest administration was conducted on February 12th, 2016. An interim administration was conducted on April 14th, timed to occur before heavy preparation for end-of-year testing began. A final administration was conducted on May 24th. Students were told to answer the items in order and not to skip items. The administrator used a script and was witnessed by the classroom teacher, who used a checklist to confirm each of several key points of instruction. This form also provided space for indicating any unusual occurrences.

The first and second administrations each comprised 4 one-minute fact fluency probes. Students were instructed that the first probe was a warm-up in each case. The final administration did not have a warm-up probe. It contained 3 math fact probes.

Grade 2 students also took a multi-digit computation probe, but the results of that probe were not analyzed as part of this combined report because third-grade students did not take a multi-digit probe. Multi-digit multiplication/division is not a core topic for third-grade students in Florida, and the distribution of scores on the multi-digit addition/subtraction probe was known to be fundamentally different from the distribution of scores on math fact probes, so there is no clear way to combine the two.

All students in a given grade took the same probes using the same administrative script, regardless of condition. The probes that were described as “warm-up” tests were not counted in any analysis.

Five students, all in comparison classes, were noted by test administrators as working on their quizzes significantly beyond the called time limit. These students were not formally considered part of the study. Posttests were taken by these students. Three of the five students scored higher on their posttest than on their pretest.

2.2.4 Fluency Score Calculation

For each student, raw fluency scores were calculated as the average number of digits correct per minute (dc/min) minus the average number of digits incorrect per minute (di/min), as this was the method found by Stevens & Leigh (2012) to have the greatest criterion validity.

Previous CBM researchers have combined grade 2 and grade 3 students (Burns et al. 2006), but to justify the pooling of their outcomes in a single analysis we analyzed the distribution of raw pretest scores for each grade separately to show similarity of distribution.

Table 3: Raw Fluency Pretest Score Distributions by Grade

Measure                                Grade 2   Grade 3
Mean                                   20.26     20.27
Standard Deviation                     10.53     11.35
Median                                 19        19
Kurtosis                               2.37      1.91
Skewness                               1.17      1.01
Range                                  54.67     58.33
Optimal Box-Cox (anchored at 1) λ      0.50      0.56

A Kolmogorov-Smirnov test corroborated the premise that these two distributions were quite similar. It failed to reject homogeneity (critical D-stat was 0.233, calculated D-stat was 0.063, p-value = 0.99).
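For readers unfamiliar with the test, the two-sample D statistic and its large-sample critical value can be sketched as follows. The function names and the group sizes in the usage line are hypothetical (the per-grade sample sizes behind the reported critical value are not stated here), and the constant 1.358 is the standard asymptotic coefficient for α = 0.05.

```python
import math

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    d = 0.0
    for v in sorted(set(x) | set(y)):
        fx = sum(1 for a in x if a <= v) / len(x)  # empirical CDF of x at v
        fy = sum(1 for b in y if b <= v) / len(y)  # empirical CDF of y at v
        d = max(d, abs(fx - fy))
    return d

def ks_critical(n, m, c_alpha=1.358):
    """Large-sample critical D; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * math.sqrt((n + m) / (n * m))

if __name__ == "__main__":
    # Hypothetical group sizes, for illustration only.
    print(round(ks_critical(70, 70), 3))
```

Homogeneity is rejected only when the calculated D exceeds the critical D, which is the comparison reported in the paragraph above.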

The distribution of these raw scores was significantly skewed and leptokurtic, as has been reported in similar studies (Burns et al. 2006), so we normalized them using a Box-Cox transformation to arrive at a final fluency score. Following the recommendation of Osborne (2005), we anchored the full distribution at a minimum value of 1 by adding 2 to all raw fluency scores. A search for an optimum λ returned 0.525, so we chose λ = 0.5 for simplicity of inversion. Thus the calculation for the final score is

$\sqrt{C - I + 2}$

where C is the average digits correct per minute and I is the average digits incorrect per minute. The resulting distribution of pretest scores was not significantly skewed (skew = 0.08, SES = 0.21) but was still slightly leptokurtic (kurtosis = 0.85, SEK = 0.42). D'Agostino-Pearson (p-value = 0.13) and Jarque-Bera (p-value = 0.13) tests failed to reject normality.
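As a worked illustration of the score calculation, the sketch below computes the raw score, the final (square-root) score, and accuracy from per-probe digit counts. The function names and the example counts are hypothetical. The plain square root used here differs from the textbook λ = 0.5 Box-Cox form, (x^0.5 − 1)/0.5, only by a linear rescaling.

```python
import math

def raw_fluency(probes):
    """probes: list of (digits_correct, digits_incorrect), one per one-minute probe."""
    dc = sum(c for c, _ in probes) / len(probes)  # average digits correct per minute
    di = sum(i for _, i in probes) / len(probes)  # average digits incorrect per minute
    return dc - di

def final_fluency(probes, shift=2.0):
    """Square-root score; the shift of 2 anchors the distribution minimum at 1."""
    return math.sqrt(raw_fluency(probes) + shift)

def accuracy(probes):
    """Ratio of correct digits to all digits written."""
    correct = sum(c for c, _ in probes)
    incorrect = sum(i for _, i in probes)
    return correct / (correct + incorrect)

if __name__ == "__main__":
    probes = [(20, 1), (24, 0), (22, 2)]  # hypothetical counts for three probes
    print(raw_fluency(probes), final_fluency(probes), accuracy(probes))
```

With these counts the raw score is 22 − 1 = 21 and the final score is the square root of 23. A raw score of −1, the minimum implied by the anchoring, maps to a final score of exactly 1.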


2.3 Validity

The criterion validity of CBM-based measures in elementary math has been established by Stevens & Leigh (2012) and VanDerHeyden & Burns (2008). These studies showed math fact fluency was predictive of general math achievement on the Oklahoma Core Curriculum test and the Stanford Achievement Test, respectively.

2.4 Reliability

Several researchers have confirmed the reliability of CBM for math fluency.

Table 4: Previous Research on CBM Reliability for Math Fluency

Metric                                   Scoring Method                                                 Source                    Value
Inter-scorer Agreement                   Correct Digits per Minute                                      (Burns et al. 2006)       0.96+
Inter-scorer Agreement                   Correct Digits per Minute                                      (Hintze et al. 2002)      0.955
Inter-scorer Agreement                   Correct Digits per Minute minus Incorrect Digits per Minute    (Stevens & Leigh 2012)    0.99+
Delayed Alternate-form Reliability       Correct Digits per Minute                                      (Burns et al. 2006)       0.84
Absolute Generalizability                Correct Digits per Minute                                      (Hintze et al. 2002)      0.75
Relative Generalizability                Correct Digits per Minute                                      (Hintze et al. 2002)      0.95
Test-Retest Alternate Form Reliability   Correct Digits per Minute minus Incorrect Digits per Minute    (Stevens & Leigh 2012)    0.87

Our study gave 3 separate fact probes on the same day, allowing us to measure the internal consistency of the raw fluency score (correct digits minus incorrect digits) using Cronbach’s α. The α values across the six tests are given in Table 5.

Table 5: Internal Consistency of Raw Fluency Score

               Addition/Subtraction   Multiplication/Division
Pretest        0.95                   0.94
Interim Test   0.96                   0.94
Posttest       0.97                   0.95
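Cronbach's α treats each same-day probe as an "item" and compares the sum of the per-probe variances with the variance of the students' total scores. A minimal sketch, with a hypothetical function name and made-up scores for four students:

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of k lists (one per probe), each holding one score per student."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each student's total across probes
    item_variance = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))

if __name__ == "__main__":
    # Hypothetical raw fluency scores on three probes for four students.
    probe_1 = [10, 20, 30, 40]
    probe_2 = [12, 18, 33, 41]
    probe_3 = [9, 22, 28, 43]
    print(round(cronbach_alpha([probe_1, probe_2, probe_3]), 3))
```

When students rank almost identically on every probe, as in this toy data, α approaches 1, which is the pattern Table 5 reports.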


We also calculated delayed alternate-form reliability of the final fluency score across each grade × condition cohort and found an average value of 0.71.

Table 6: Delayed Alternate-Form Reliability (14 weeks)

               Addition/Subtraction   Multiplication/Division
Intervention   0.77                   0.47
Comparison     0.72                   0.89

The relatively poor value for the 3rd grade intervention group may be due to large variation in dosage. The standard deviation in weekly usage across the 3rd grade intervention group was 1.24 days/week, nearly twice that of the 2nd grade intervention group, where the standard deviation was 0.65 days/week.

When dosage was added to the model predicting posttest score from pretest score, the agreement between the two intervention groups improved considerably. The coefficients of multiple correlation were R = 0.81 and R = 0.77 for the 2nd and 3rd grade intervention groups, respectively.

2.5 Analytic Approach

Since randomized assignment occurred at the class level, we used a hierarchical linear modeling (HLM) approach to account for cluster effects when analyzing the relationship between condition and posttest fluency. The model has two levels: grade and condition are level-2 variables, and all other covariates are level-1 variables. We used grand-mean-centered values for the lower-level variables and a maximum-likelihood method for determining the random effects. If the search for a model did not converge using maximum likelihood, restricted maximum likelihood was used instead.

Models were constructed using R’s lmer function, part of the lme4 library, using the methodology for two-tier HLM models documented in a technical report from the Department of Statistics and Data Sciences, The University of Texas at Austin (UTA 2015), which showed the similarity in results to those given by SPSS, SAS, Mplus, and HLM.

We formed 3 models of decreasing complexity and calculated an effect size and statistical significance based on each.


The first model uses the same structure as that used in the original version of this report. In this model, all dichotomous and numeric covariates were used (i.e., all covariates other than race, which was a multi-category nominal variable), including the pretest accuracy and pretest speed. This model is the most inclusive and allows for continuity between the original version of this report and the current version. It is denoted as the Full Model.

For the data available at the time of the original report, the pretest speed and pretest accuracy were both highly significant (p < 0.001). But after removing students who did not respect the time limits on the pretest or were designated as being below grade level before the study began, these additional pretest features were no longer statistically significant. A nested-model χ² test comparing change in deviance to change in degrees of freedom did not show a statistically significant improvement upon adding either of these terms. Thus we generated a new model lacking these two pretest features but retaining all the demographic covariates of the original. This model is denoted below as the Demographic Model.

In an effort to simplify the model further, we assessed the relevance of each of the demographic variables by generating an HLM with the following characteristics:

• No Level-2 variables

• Two Level-1 variables: the covariate in question and pretest score

• Group-mean-centered values

• Data scaled to unit variance

This method was selected for determining the relevance of a given level-1 factor based on the presentation of Woltman, Feldstain, MacKay & Rocchi (2012). Note that this was the only analysis using group-mean-centered data. The models used for determining intervention effect and statistical significance used grand-mean-centered level-1 variables.

Table 7: Impact and significance of demographic covariates

Covariate   Coefficient   t-score
age          0.028         0.506
gender      -0.005        -0.096
LEP          0.042         0.612
Lunch        0.080         1.249
ESE         -0.006        -0.100

The results of this analysis are shown in Table 7. Given their very low coefficients and t-scores, we removed gender and ESE. Upon forming the full HLM using the remaining covariates, it was found that LEP had very little impact (coefficient = 0.03) and low significance (t = 0.24), so it was dropped as well. In the resulting model, all covariates had t-scores greater than 0.9 in magnitude and standardized coefficients greater than 0.1. There was a nearly statistically significant interaction (t = 1.94) between condition and whether the student was on free or reduced lunch.

This final model is denoted as the Reduced Model. All three models are provided in the Appendix.

Effect sizes were calculated from the coefficient for the intervention effect from each HLM model and the pooled within-group standard deviation of unadjusted posttest scores.

Statistical significance was determined based on the t-score of the multilevel model.

2.6 Statistical Adjustments

We used all demographic information provided except race, which was non-binary and correlated significantly with other demographic information (R between 0.36 and 0.46 for the three most prevalent races in our sample).

Grade was coded as grade3, a variable equal to 1 if the student was in grade 3 and 0 otherwise.

Age was measured in years as of the pretest administration.

Gender was coded as a variable male, equal to 1 if the student was male and 0 if the student was female.

Low English proficiency was coded as a variable LEP, equal to 1 if the school indicated the student had low English proficiency.

Exceptional Student Status was determined based on the school’s designation of the student as being within an Exceptional Student Education program. It was coded as a variable ESE, equal to 1 if the school specified the student as belonging to an ESE program. The state of Florida specifies several ESE programs, one of which is a program for gifted students. For our study, it appears this program furnished the large majority of ESE designations, as 29 of the 36 students designated as ESE were concentrated in two high-achieving classes. In grade 2, every ESE-designated student was in a single class.

Eligibility for free or reduced lunch was coded as a variable lunch, equal to 1 if the student was eligible.

As described in the Fluency Score Calculation subsection, fluency was evaluated based on a research-supported combination of speed and accuracy, normalized to reduce skewness via a Box-Cox transformation. This means that a student’s fluency score depends on personal characteristics such as confidence, sense of urgency on a pen-and-paper test, and attention to accuracy, so students differ markedly in potential for improvement.

Pretest accuracy is the ratio of correct digits to the sum of correct and incorrect digits.

Pretest score is defined as $\sqrt{C - I + 2}$, where C is digits correct per minute and I is digits incorrect per minute.

Pretest speed is defined in a manner analogous to pretest score, as $\sqrt{C - 2}$, where C is digits correct per minute. In this expression, 2 is subtracted rather than added so that the expression is anchored at 1, conforming to best practices (Osborne 2005).

All student-level covariates were scaled to unit variance and grand-mean centered for improved interpretability and model convergence.
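The difference between grand-mean centering (used for the main models) and group-mean centering (used only for the covariate-relevance analysis) can be illustrated with a short sketch. The function names and the data are hypothetical, and the scaling is assumed to divide by the sample standard deviation so that each covariate ends up with unit variance.

```python
from statistics import mean, stdev

def standardize(values):
    """Grand-mean center and scale to unit (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def group_mean_center(values, groups):
    """Center each value on the mean of its own group (e.g. its class)."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    means = {g: mean(vs) for g, vs in by_group.items()}
    return [v - means[g] for v, g in zip(values, groups)]

if __name__ == "__main__":
    ages = [8.1, 8.4, 8.9, 7.9]        # hypothetical ages in years
    classes = ["A", "A", "B", "B"]     # hypothetical class labels
    print(standardize(ages))
    print(group_mean_center(ages, classes))
```

After grand-mean centering, the intercept of a model describes a student at the overall sample average; after group-mean centering, a coefficient reflects only within-class differences.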

Speed, score, and accuracy on the interim administration were considered during the regression process used to impute missing data, as described in the Missing Data section. These metrics are calculated exactly as for the pretest, using the same formula (i.e., the data were not re-anchored for the Box-Cox transformation).

An HLM model was used to calculate statistical significance for the entire sample, so no adjustment for cluster effects was necessary. We only analyzed one outcome for this study, so no adjustment was made for multiple outcomes.

Table 8: Descriptive Statistics of Control Variables

Control Variable         Mean    SD      Skew     Kurtosis
Grade3                   0.48    0.50     0.08    -2.03
Age                      8.42    0.57     0.03    -0.95
Male                     0.47    0.50     0.14    -2.01
LEP                      0.20    0.40     1.51     0.27
ESE                      0.26    0.44     1.13    -0.73
Lunch                    0.20    0.40     1.51     0.27
Pretest Accuracy         0.92    0.09    -2.63     8.23
Pretest Speed            4.29    1.18     0.18     0.98
Pretest Score            4.58    1.15     0.08     0.85
Interim Accuracy         0.94    0.06    -1.84     3.59
Interim Speed            5.07    1.20     0.55     0.42
Interim Fluency Score    5.31    1.18     0.46     0.39

2.7 Students Removed from Study

Ten students, 4 from the intervention condition and 6 from the comparison condition, were excluded from the analysis. In all cases, the decision to exclude was based on information obtained on the day of the pretest.

Four of these ten (3 from intervention, 1 from comparison) were excluded because their teacher indicated they were sufficiently below grade level that they would not receive typical instruction in math fact fluency. This determination was provided on the day of the pretest.

Five of these ten (all from comparison) were excluded because they did not stop when time was called on the pretest. In three cases these students had higher values on their pretest than on their posttest.

One grade 2 student from the intervention condition was noted as appearing frustrated and not working on the pretest. He had the fourth-lowest fluency score of all 2nd-grade participants on the pretest and showed dramatic improvement by the interim assessment, on which he scored at the 33rd percentile within his grade. According to Reflex’s internal initial testing, the student had pre-existing automaticity for 17.1% of the addition facts within 20 and had basic recall ability with 59.9%. This suggests his pretest score underestimated his actual ability, and he was removed from the analysis for fear of artificially inflating the impact of the intervention. Note that this student was absent from the final administration.

2.8 Missing Data

Eight students, 5 from the treatment group and 3 from the comparison group, were absent for the final administration. Seven of the students had taken the interim assessment. No values were imputed for the student who missed both the interim and the final assessment. For the seven who had attended the interim test, we imputed posttest values using a multiple linear regression based on students in the same instructional-level group, using the thresholds established by Burns et al. (2006).

Table 9: Categorization of Students

Fluency (dc/min)   Category              N
Less than 14       Frustration Level     29 (22%)
14-31              Instructional Level   81 (63%)
Greater than 31    Mastery Level         19 (15%)
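The thresholds in Table 9 amount to a simple lookup on digits correct per minute. The sketch below is illustrative only: the function names are hypothetical, and a value of exactly 14 or 31 is assumed to fall in the instructional level, as the table's ranges suggest.

```python
def instructional_category(dc_per_min):
    """Map digits correct per minute to the Table 9 category."""
    if dc_per_min < 14:
        return "Frustration Level"
    if dc_per_min <= 31:
        return "Instructional Level"
    return "Mastery Level"

def category_shares(counts):
    """Convert category counts to whole-number percentages of the sample."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}

if __name__ == "__main__":
    counts = {"Frustration Level": 29, "Instructional Level": 81, "Mastery Level": 19}
    print(category_shares(counts))
```

Applied to the counts in Table 9, the shares come out to 22%, 63%, and 15% of the 129 students.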

All available data (demographic data, pretest data, and interim test data) were used to impute posttest scores using an OLS regression that retained only statistically significant regressors.

2.8.1 Frustration Level

One of the seven students for whom posttest scores were imputed was in the frustration level. For that group, age (t = 2.6), pretest accuracy (t = −3.2), and interim fluency score (t = 6.2) were the statistically significant regressors.

2.8.2 Instructional Level

Six of the seven students for whom posttest scores were imputed were in the instructional level. Among students in that level, grade (t = 4.2), interim accuracy (t = −2.6), and interim fluency score (t = 9.5) were statistically significant.

2.8.3 Mastery Level

There were no students in the mastery level for whom imputation was necessary.


3 Study Data

Tables compose the large majority of this section. They are organized by table title and subsection title rather than by use of numbers.

The tables in this section report unscaled, uncentered values for ease of interpretability.

3.1 Pre-Intervention Data: All Pretest Takers

This section provides data on all students who took the pretest, including those that were formally removed from the analysis.

Outcome Data

                 Comparison Group                                   Intervention Group
Measure          Units of     Units of   Mean   Standard           Units of     Units of   Mean   Standard
                 Assignment   Analysis          Deviation          Assignment   Analysis          Deviation
Fluency Score    4            70         4.44   1.17               4            70         4.49   1.26

Background Data

                    Comparison        Intervention
Variable            Mean     SD       Mean     SD
Age                 8.440    0.598    8.442    0.602
ESE                 0.271    0.448    0.243    0.432
Male                0.500    0.504    0.400    0.493
Grade3              0.486    0.503    0.486    0.503
LEP                 0.243    0.432    0.171    0.380
Lunch               0.229    0.423    0.186    0.392
Pretest accuracy    0.919    0.098    0.909    0.113
Pretest speed       4.145    1.186    4.220    1.257


3.2 Pre-Intervention Data: Baseline Sample

This section includes all students who were formally part of the analysis, including those who were absent for the posttest.

Outcome Data

                 Comparison Group                                   Intervention Group
Measure          Units of     Units of   Mean   Standard           Units of     Units of   Mean   Standard
                 Assignment   Analysis          Deviation          Assignment   Analysis          Deviation
Fluency Score    4            64         4.57   1.12               4            66         4.60   1.20

Background Data

                    Comparison        Intervention
Variable            Mean     SD       Mean     SD
Age                 8.444    0.556    8.405    0.581
ESE                 0.250    0.436    0.258    0.441
Grade3              0.484    0.504    0.470    0.503
LEP                 0.250    0.436    0.152    0.361
Lunch               0.219    0.417    0.182    0.389
Male                0.531    0.503    0.394    0.492
Pretest accuracy    0.932    0.087    0.916    0.102
Pretest speed       4.268    1.147    4.324    1.213


3.3 Pre-Intervention Data: Analytic Sample

Outcome Data: Analytic Sample

                 Comparison Group                                    Intervention Group
Measure          Units of     Units of   Mean    Standard           Units of     Units of   Mean    Standard
                 Assignment   Analysis           Deviation          Assignment   Analysis           Deviation
Fact Fluency     4            64         4.573   1.121              4            65         4.580   1.195

Background Data: Analytic Sample

                    Comparison        Intervention
Variable            Mean     SD       Mean     SD
Age                 8.444    0.556    8.414    0.580
ESE                 0.250    0.436    0.262    0.443
Grade3              0.484    0.504    0.477    0.503
LEP                 0.250    0.436    0.154    0.364
Lunch               0.219    0.417    0.185    0.391
Male                0.531    0.503    0.400    0.494
Pretest accuracy    0.932    0.087    0.914    0.102
Pretest speed       4.268    1.147    4.307    1.214

Outcome Data: Analytic Sample with No Imputation

                 Comparison Group                                    Intervention Group
Measure          Units of     Units of   Mean    Standard           Units of     Units of   Mean    Standard
                 Assignment   Analysis           Deviation          Assignment   Analysis           Deviation
Fact Fluency     4            61         4.557   1.143              4            61         4.640   1.142


Background Data: Analytic Sample with No Imputation

                    Comparison        Intervention
Variable            Mean     SD       Mean     SD
Age                 8.436    0.556    8.394    0.568
ESE                 0.246    0.434    0.279    0.452
Grade3              0.492    0.504    0.459    0.502
LEP                 0.262    0.444    0.148    0.358
Lunch               0.230    0.424    0.180    0.388
Male                0.541    0.502    0.410    0.496
Pretest accuracy    0.930    0.088    0.922    0.084
Pretest speed       4.254    1.170    4.360    1.177

3.4 Post-Intervention Data and Findings

3.4.1 Analytic Sample

As grand-mean-centered values were used for all Level-1 covariates, and grade was the only Level-2 covariate other than condition, adjusted means for each group were estimated from the constant term of the HLM model, the grade3 coefficient of the HLM model, the average value of the grade3 variable across all students, and (in the case of the intervention group) the treatment coefficient of the HLM model. Standard deviations are unadjusted.

Outcome Data and Statistical Significance: Analytic Sample

                     Comparison Group                        Intervention Group                      Estimated Effect
Model                Students   adj. Mean   unadj. SD       Students   adj. Mean   unadj. SD       adj. Mean Difference   adj. t-score
Full Model           64         5.04        1.099           65         5.97        1.093           0.927***               5.753
Demographic Model    64         5.13        1.099           65         5.97        1.093           0.836***               4.966
Reduced Model        64         5.08        1.099           65         5.95        1.093           0.867***               4.343

Note: * p<0.1; ** p<0.05; *** p<0.001
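The adjusted means follow directly from the fixed-effect coefficients. The sketch below reproduces the Full Model row using the intercept (4.709), grade3 coefficient (0.695), and treatment coefficient (0.927) listed in Appendix A, together with the share of grade 3 students in the analytic sample (48.1%, from Table 2). The function name is hypothetical, and small rounding differences are possible for the other models because the published coefficients are themselves rounded.

```python
def adjusted_means(intercept, grade3_coef, grade3_mean, treatment_coef):
    """Return (comparison, intervention) adjusted posttest means.

    With grand-mean-centered level-1 covariates, the covariate terms vanish
    at the sample average, leaving only the intercept, grade, and treatment.
    """
    comparison = intercept + grade3_coef * grade3_mean
    intervention = comparison + treatment_coef
    return comparison, intervention

if __name__ == "__main__":
    comparison, intervention = adjusted_means(4.709, 0.695, 0.481, 0.927)
    print(round(comparison, 2), round(intervention, 2))
```

The difference between the two adjusted means is, by construction, the treatment coefficient itself.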

Effect size was calculated based on adjusted means, unadjusted pooled within-group standard deviations, and a correction $\omega = 1 - \frac{3}{4N - 9}$ for small sample size.


Estimation of Effect Size: Analytic Sample

Model                N     Adjusted Mean   (unadj.) Pooled     Effect Size
                           Difference      Within-Group SD     (adj. Hedges’ g)
Full Model           129   0.927***        1.096               0.84
Demographic Model    129   0.836***        1.096               0.76
Reduced Model        129   0.867***        1.096               0.79

Note: * p<0.1; ** p<0.05; *** p<0.001
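The effect sizes in this table can be checked by hand from the group sizes and unadjusted standard deviations reported for the analytic sample (64 and 1.099 for comparison, 65 and 1.093 for intervention). The sketch below is a plain restatement of that arithmetic; the function names are hypothetical.

```python
import math

def pooled_sd(n1, sd1, n2, sd2):
    """Pooled within-group standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))

def hedges_g(mean_diff, n1, sd1, n2, sd2):
    """Standardized mean difference with the correction omega = 1 - 3 / (4N - 9)."""
    n = n1 + n2
    omega = 1 - 3 / (4 * n - 9)
    return omega * mean_diff / pooled_sd(n1, sd1, n2, sd2)

if __name__ == "__main__":
    for label, diff in [("Full", 0.927), ("Demographic", 0.836), ("Reduced", 0.867)]:
        print(label, round(hedges_g(diff, 64, 1.099, 65, 1.093), 2))
```

With N = 129 the correction factor is about 0.994, so it lowers each effect size by less than 0.01.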

3.4.2 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absent for the post assessment indicated that a complete-case analysis would substantially understate the effect of the intervention. The covariate-adjusted effect of the treatment on interim test scores was greater among students who missed the posttest than among those who were present for all three tests. This is borne out in the results of an analysis limited to those students for whom no imputation occurred.

Values for adjusted means for this subgroup were calculated by recentering all Level-1 covariates and generating a new HLM with the same structure as for the full analytic sample, but using only those participants with no missing data.

Outcome Data and Statistical Significance: Analytic Sample with No Imputation

                     Comparison Group                        Intervention Group                      Estimated Effect
Model                Students   adj. Mean   unadj. SD       Students   adj. Mean   unadj. SD       adj. Mean Difference   adj. t-score
Full Model           61         5.07        1.12            61         5.94        1.11            0.867***               5.255
Demographic Model    61         5.15        1.12            61         5.94        1.11            0.787***               4.669
Reduced Model        61         5.09        1.12            61         5.92        1.11            0.828***               4.249

Note: * p<0.1; ** p<0.05; *** p<0.001

Estimation of Effect Size: Analytic Sample with No Imputation

Model                N     Adjusted Mean   (unadj.) Pooled     Effect Size
                           Difference      Within-Group SD     (adj. Hedges’ g)
Full Model           122   0.867***        1.113               0.77
Demographic Model    122   0.787***        1.113               0.70
Reduced Model        122   0.828***        1.113               0.74

Note: * p<0.1; ** p<0.05; *** p<0.001


3.5 Subpopulation Analyses

We analyzed sub-populations by grade. We also analyzed the sub-population of students not designated as exceptional students. Due to the smaller sample sizes, the Reduced Model was used for the analyses, except that grade was removed as a variable for subpopulations of constant grade.

Statistical Significance and Estimation of Effect Size

Group                       N     Adjusted Mean   (unadj.) Pooled     Effect Size         Adjusted
                                  Difference      Within-Group SD     (adj. Hedges’ g)    t-score
Grade 2                     68    0.739**         0.94                0.78                2.46
Grade 3                     63    0.877**         1.05                0.82                2.47
Non-Exceptional Students    102   0.904***        1.101               0.89                4.63

Note: * p<0.1; ** p<0.05; *** p<0.001

4 Acknowledgment

We used R (Team 2013) for some of the analysis in this report, especially the lme4 package for fitting mixed models (Bates, Mächler, Bolker & Walker 2015). Other libraries utilized were dplyr, tidyr, and magrittr (Wickham & Francois 2016; Bache & Wickham 2014; Wickham 2016).

This document was typeset using LaTeX and makes use of the harvard, booktabs, multirow, graphicx, and url packages.

The stargazer package was used to generate LaTeX for several of the tables (Hlavac 2013).


References

Bache, S. M. & Wickham, H. (2014), magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr

Bates, D., Mächler, M., Bolker, B. & Walker, S. (2015), ‘Fitting linear mixed-effects models using lme4’, Journal of Statistical Software 67(1), 1–48.

Burns, M. K., VanDerHeyden, A. M. & Jiban, C. L. (2006), ‘Assessing the instructional level for mathematics: A comparison of methods’, School Psychology Review 35(3), 401.

Hintze, J. M., Christ, T. J. & Keller, L. A. (2002), ‘The generalizability of CBM survey-level mathematics assessments: Just how many samples do we need?’, School Psychology Review 31(4), 514.

Hlavac, M. (2013), ‘stargazer: LaTeX code and ASCII text for well-formatted regression and summary statistics tables’. http://CRAN.R-project.org/package=stargazer

Osborne, J. (2005), ‘Notes on the use of data transformations’, Practical Assessment, Research and Evaluation 9(1), 42–50.

Stevens, O. & Leigh, E. (2012), ‘Mathematics curriculum based measurement to predict state test performance: A comparison of measures and methods’, ProQuest LLC.

Team, R. C. (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

UTA (2015), Multilevel modeling tutorial using SAS, Stata, HLM, R, SPSS, and Mplus, Technical report, The Department of Statistics and Data Sciences, The University of Texas at Austin. http://stat.utexas.edu/images/SSC/documents/SoftwareTutorials/MultilevelModeling.pdf

VanDerHeyden, A. M. & Burns, M. K. (2008), ‘Examination of the utility of various measures of mathematics proficiency’, Assessment for Effective Intervention.

Wickham, H. (2016), tidyr: Easily Tidy Data with ‘spread()’ and ‘gather()’ Functions. https://CRAN.R-project.org/package=tidyr

Wickham, H. & Francois, R. (2016), dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr

Woltman, H., Feldstain, A., MacKay, J. C. & Rocchi, M. (2012), ‘An introduction to hierarchical linear modeling’, Tutorials in Quantitative Methods for Psychology 8(1), 52–69.


Appendix A: Full Model

The table below describes the fixed-effects data of the full HLM for the analytic sample. This model uses grand-mean-centered values for all level-1 variables, scaled to unit variance. These variables are prefixed with “c” to indicate this.

Factor                    Coefficient   t-score
Intercept                  4.709***     25.719
cage                       0.136         0.887
cLEP                      -0.015        -0.111
cLunch                    -0.122        -0.981
cprescore                 -2.220        -0.932
cgender                   -0.148        -1.518
cESE                       0.054         0.272
cprespeed                  2.471         1.177
cpreaccuracy               0.750         1.313
treatment                  0.927***      5.753
grade3                     0.695***      3.335
cagey:treatment            0.076         0.469
cagey:grade3              -0.224        -1.134
cLEP:treatment            -0.021        -0.123
cLEP:grade3               -0.027        -0.156
cLunch:treatment           0.195         1.440
cLunch:grade3              0.174         1.269
cprescore:treatment        1.881         1.056
cprescore:grade3           2.097         0.885
cgender:treatment          0.096         0.836
cgender:grade3             0.180         1.564
cESE:treatment            -0.036        -0.204
cESE:grade3               -0.013        -0.073
cprespeed:treatment       -1.327        -0.849
cprespeed:grade3          -1.581        -0.753
cpreaccuracy:treatment    -0.712*       -1.657
cpreaccuracy:grade3       -0.608        -1.101

Note: * p<0.1; ** p<0.05; *** p<0.001


Appendix B Demographic Model

The table below describes the fixed-effects data of the full HLM for the analytic sample but excludes the pretest features of speed and accuracy. This model uses grand-mean-centered values for all level-1 variables, scaled to unit variance. These variables are prefixed with “c” to indicate this.

Factor                      Coefficient   t-score

Intercept                   4.906***      27.157
cage                        0.230         1.554
cLEP                        0.042         0.315
cLunch                      −0.128        −1.225
cprescore                   0.756***      5.872
cgender                     −0.124        −1.283
cESE                        0.172         0.855
treatment                   0.836***      4.966
grade3                      0.480**       2.235
cage × treatment            −0.003        −0.022
cage × grade3               −0.270        −1.378
cLEP × treatment            −0.059        −0.346
cLEP × grade3               −0.071        −0.408
cLunch × treatment          0.313**       2.631
cLunch × grade3             0.174         1.459
cprescore × treatment       0.003         0.020
cprescore × grade3          0.094         0.685
cgender × treatment         0.122         1.062
cgender × grade3            0.120         1.039
cESE × treatment            −0.131        −0.734
cESE × grade3               −0.117        −0.642

Note: *p<0.1; **p<0.05; ***p<0.001


Appendix C Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sample, retaining only age, lunch status, and pretest score as level-1 covariates. This model uses grand-mean-centered values for all level-1 variables, scaled to unit variance. These variables are prefixed with “c” to indicate this.

This model used Restricted Maximum Likelihood, as there were convergence problems when using maximum likelihood.

Factor                      Coefficient   t-score

Intercept                   4.827***      25.683
cage                        0.227         1.146
cLunch                      −0.111        −0.952
cprescore                   0.758***      5.488
treatment                   0.867***      4.343
grade3                      0.527**       2.219
cage × treatment            −0.031        −0.154
cage × grade3               −0.168        −0.692
cLunch × treatment          0.268*        1.936
cLunch × grade3             0.153         1.109
cprescore × treatment       −0.042        −0.286
cprescore × grade3          0.100         0.677

Note: *p<0.1; **p<0.05; ***p<0.001


Appendix B Sample Addition/Subtraction Probe

   9     5    18     9     8    13
 + 8   + 9   −10   − 6   + 3   − 5
 ---   ---   ---   ---   ---   ---

  10     2     3    10    12     9
 − 3   + 7   + 8   + 1   −10   − 1
 ---   ---   ---   ---   ---   ---

   5     8     3    19     7    16     3
 + 0   + 4   + 6   − 9   − 1   −10   + 0
 ---   ---   ---   ---   ---   ---   ---

   3     7    15     0     4    14     7
 − 1   + 5   − 5   + 9   + 3   − 5   − 5
 ---   ---   ---   ---   ---   ---   ---

   9    20    11     4     9     6     1
 + 9   −10   − 3   − 4   + 0   − 1   +10
 ---   ---   ---   ---   ---   ---   ---

   9    12    12     2     5     9     5
 +10   − 3   − 9   + 8   − 0   − 4   + 0
 ---   ---   ---   ---   ---   ---   ---

   4    14     7    11     7     4     6
 + 4   − 9   − 0   − 8   + 0   − 1   + 5
 ---   ---   ---   ---   ---   ---   ---

   2    12    14     4    10     1     7
 + 3   − 5   − 5   − 4   +10   + 0   + 2
 ---   ---   ---   ---   ---   ---   ---

  13    10     3     9    17    10     3
 − 6   +10   + 6   − 6   − 7   +10   + 6
 ---   ---   ---   ---   ---   ---   ---

   4    10    10     3     5     5    10
 + 9   + 2   +10   − 0   + 3   − 5   −10
 ---   ---   ---   ---   ---   ---   ---


using strategies such as the relationship between multiplication and division (e.g., knowing that 8 × 5 = 40, one knows 40 ÷ 5 = 8) or properties of operations. By the end of Grade 3, know from memory all products of two one-digit numbers.”

Additionally, Common Core State Standards specify a number of general computational fluency requirements for which facility with math facts is foundational (Standards 2.NBT.B.5, 2.NBT.B.6, 2.NBT.B.7, 3.OA.A, 3.NBT.A.2). Florida’s standards retain these requirements.

Post-survey questionnaires were given to all teachers. Teachers in the comparison condition were asked to describe the methods they used to develop math fact fluency and the time they spent on this goal. Two of the four teachers in the comparison condition returned these questionnaires. Their comments are provided below verbatim. We have also included data on the average fluency gain for each comparison class, including the two that did not return questionnaires.

The survey asked teachers how many hours a month were spent on developing math fact fluency. One teacher specified her answer in terms of minutes per day. The other wrote “20 hours” in the blank.

Table 1: Post-Study Comparison Group Responses

Grade   Average Gain   Strategies                                              Time Spent (Hours per month)

3       0.88           (Did not return survey)                                 NA
3       0.81           flash cards, timed tests, repetition, math fact raps    20 hours
2       0.53           (Did not return survey)                                 NA
2       −0.01          ten marks, flash cards, fast facts, center work         (time everyday) 10 minutes

Given the average fluency gains, we surmise that the other two teachers likely spent considerably more than 10 minutes a day on math fact fluency. The grade 3 responder had a group of high-achieving students, so it is possible homework was assigned on math fact fluency, as it is hard to imagine that 20 hours of class time a month was spent on the topic.


1.3 Setting

Teachers from a Florida school in a metropolitan area participated in this study. The demographic data provided by the school indicate it is a majority-minority school: 57% of its second- and third-grade students are Hispanic or Latino, and 31% are Caucasian. The data provided indicate that 28% have low English proficiency and 17% are on free or reduced lunch.

1.4 Participants

The participating students are generally demographically similar to the full population of second- and third-grade students in terms of exceptional student status, race, gender, and economic status. In all cases we relied on information received from the school.

2 Study Design and Analysis

2.1 Sample Formation

The school was identified by project personnel owing to its previous interest in Reflex. The school was offered a discount on a later subscription in exchange for participation. After logistical discussions to ensure that the school had sufficient technical resources to allow usage of a computer-delivered intervention, teachers were asked to volunteer for participation. Nine teachers initially volunteered to have their homeroom students take part. One of these homeroom classes was taught by another teacher who also taught her own homeroom, so the 9 classes were taught by 8 teachers.

The study was intended as a cluster randomized controlled trial, with the teachers from each grade randomly assigned to condition. Unfortunately, the design was compromised across grade 3 teachers. One teacher assigned to the comparison did not participate at all; project personnel did not administer pretests. Another teacher assigned to the treatment never used the intervention. There was zero uptake across her entire class. Review of email exchanges suggests three possible causes:

• The liaison between the head researcher and the school may have misrepresented the constraints of the study to the school. He reports that the school may have thought that an even number of teachers were required.

• Two gifted/high-achieving classes participated in the study. They were both inadvertently randomly assigned to the intervention. It was our intention to split these through block randomization, but we only received the pertinent data after selection and, due to a misreading of the correspondence, failed to catch the error, so no re-assignment was done. The school may have rectified our error themselves.

• It is possible that one of the teachers simply did not want to use the intervention. Project personnel doing the training reported that she attended but “had to leave early on.”

Given the above, we are analyzing our study as a QED where the intact groups are the 8 classes for whom we have pretest data and the intervention group comprises those classes where any uptake occurred prior to posttest.

Teachers were provided the opportunity to indicate any students who were not prepared for fact fluency instruction. Four third-grade students were identified, 3 from the intervention group and 1 from the comparison group. These students’ data were not considered as part of the study.

One of the teachers taught two classes, one within the intervention group and another in the comparison group. All other teachers taught a single class.

Group Descriptions

Table 2 provides a description of the demographic character of the groups as well as their pretest scores. The fluency score on the pretest combines both speed and accuracy, as described in the Fluency Score Calculation subsection.

2.2 Outcome Measures

2.2.1 Outcomes

One outcome was measured in the study: math fact fluency, which is both a key component of general math achievement and has been shown to be predictive of students’ performance on general math achievement tests (see Validity subsection below). Fluency was measured using timed probes.


Table 2: Baseline Demographic Information

                                   Full Sample   Comparison Group   Intervention Group

Sample Size                        129           64                 65
% Grade 3 Students                 48.1          48.4               47.7
% Hispanic                         54.2          53.1               55.3
% Asian                            17.1          18.8               15.4
% White                            22.5          21.9               23.1
% Black                            4.7           6.2                3.1
% Multiracial                      1.6           0.0                3
% Low English Proficiency          20.2          25.0               15.4
% Exceptional Student (Gifted)     25.6          25                 26.1
% Free/Reduced Lunch               20.2          21.9               18.4
% Male                             46.5          53.1               40
age-at-pretest (years)             8.43          8.44               8.41
pre-test Accuracy (%)              92.3          93.2               91.4
pre-test Speed                     4.29          4.26               4.31
pre-test Score                     4.58          4.57               4.58

• Grade 2 students were tested on facts with terms, subtrahends, and differences from 0 to 10 inclusive (i.e., from 0 + 0 up to 10 + 10 and their associated subtraction facts).

• Grade 3 students were tested on facts with factors, divisors, and quotients from 0 to 10 inclusive (i.e., from 0 × 0 to 10 × 10 and their associated division facts).

These match the requirements in the Common Core State Standards except that, owing to that document’s idiosyncratic definition of “within X” (as in “addition within 20”), a literal reading of the work indicates that facts such as 20 − 17 and 91 ÷ 13 are considered within grade level. The Florida Math Standards do not provide a glossary, so it is unclear whether such facts would be in the scope of the wording of its standards.

2.2.2 Probes

Probes had a format similar to those in other Curriculum Based Measurement (CBM) studies (Hintze, Christ & Keller 2002, Burns, VanDerHeyden & Jiban 2006, Stevens & Leigh 2012), as described below.

Each probe was a single sheet of paper with 10 rows of vertically oriented problems. Probes given to grade 2 students contained addition and subtraction facts. Probes given to grade 3 students contained multiplication and division facts. The problems were printed in extra large type, so only 7 facts fit on each row. The first two rows only contained 6 facts to make room for a geometric shape placed in the upper right-hand corner to help students and monitors quickly identify which page the students were on. The problems were computer-generated with the constraint that the problems in a given row be as balanced as possible between the two operations. The facts were chosen randomly from the appropriate fact pool, with each having an identical selection likelihood.
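The generation procedure described above can be sketched in Python. This is an illustrative sketch only, not the generator used in the study: the function names, the seed, the choice to sample with replacement, and the rule for odd-length rows (the extra slot goes to a randomly chosen operation) are all assumptions.

```python
import random

def fact_pools():
    """Grade 2 pools: 0 + 0 through 10 + 10 and the associated subtraction facts."""
    addition = [(a, "+", b) for a in range(11) for b in range(11)]
    # Each addition fact a + b = s has the associated subtraction fact s - b = a.
    subtraction = [(a + b, "-", b) for a in range(11) for b in range(11)]
    return addition, subtraction

def generate_probe(rng, row_lengths=(6, 6, 7, 7, 7, 7, 7, 7, 7, 7)):
    """Return rows of (top, operator, bottom) problems, balanced within each row."""
    addition, subtraction = fact_pools()
    rows = []
    for length in row_lengths:
        n_add = length // 2
        # An odd row cannot be split evenly; assumption: a coin flip picks
        # which operation receives the extra problem.
        if length % 2 == 1 and rng.random() < 0.5:
            n_add += 1
        # rng.choice gives every fact in a pool the same selection likelihood.
        row = [rng.choice(addition) for _ in range(n_add)]
        row += [rng.choice(subtraction) for _ in range(length - n_add)]
        rng.shuffle(row)
        rows.append(row)
    return rows

probe = generate_probe(random.Random(2016))  # hypothetical seed
```

Swapping in multiplication and division pools built the same way would give a grade 3 probe under the same assumptions.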

An example is provided in the Appendix.

2.2.3 Administrations

Three administrations were given. A pretest administration was conducted on February 12th, 2016. An interim administration was conducted on April 14th, timed to occur before heavy preparation for end-of-year testing began. A final administration was conducted on May 24th. Students were told to answer the items in order and not to skip items. The administrator used a script and was witnessed by the classroom teacher, who used a checklist to confirm each of several key points of instruction. This form also provided space for indicating any unusual occurrences.

The first and second administrations each comprised 4 one-minute fact fluency probes. Students were instructed that the first probe was a warm-up in each case. The final administration did not have a warm-up probe. It contained 3 math fact probes.

Grade 2 students also took a multi-digit computation probe, but the results of that probe were not analyzed as part of this combined report because third-grade students did not take a multi-digit probe. Multi-digit multiplication/division is not a core topic for third-grade students in Florida, and the distribution of scores on the multi-digit addition/subtraction probe was known to be fundamentally different from the distribution of scores on math fact probes, so there is no clear way to combine the two.

All students in a given grade took the same probes using the same administrative script, regardless of condition. The probes that were described as “warm-up” tests were not counted in any analysis.

Five students, all in comparison classes, were noted by test administrators as working on their quizzes significantly beyond the called time limit. These students were not formally considered part of the study. Posttests were taken by these students. Three of the five students scored higher on their posttest than on their pretest.

2.2.4 Fluency Score Calculation

For each student, raw fluency scores were calculated as the average number of digits correct per min (dc/min) minus the number of digits incorrect per min (di/min), as this was the method found by Stevens & Leigh (2012) to have the greatest criterion validity.
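As a minimal sketch of this calculation, assuming each scored one-minute probe is recorded as a pair of digit counts (the function name, the data layout, and the example student are hypothetical):

```python
def raw_fluency(probes):
    """Average digits correct per minute minus average digits incorrect per minute.

    `probes` is a list of (digits_correct, digits_incorrect) pairs, one per
    scored one-minute probe; warm-up probes are assumed to be excluded already.
    """
    if not probes:
        raise ValueError("at least one scored probe is required")
    dc_per_min = sum(dc for dc, _ in probes) / len(probes)
    di_per_min = sum(di for _, di in probes) / len(probes)
    return dc_per_min - di_per_min

# Hypothetical student with three scored probes.
example = raw_fluency([(22, 1), (19, 0), (21, 2)])
```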

Previous CBM researchers have combined grade 2 and grade 3 students (Burns et al. 2006), but to justify the pooling of their outcomes in a single analysis, we conducted an analysis of the distribution of raw pretest scores for each grade separately to show similarity of distribution.

Table 3: Raw Fluency Pretest Score Distributions by Grade

Measure                                Grade 2   Grade 3

Mean                                   20.26     20.27
Standard Deviation                     10.53     11.35
Median                                 19        19
Kurtosis                               2.37      1.91
Skewness                               1.17      1.01
Range                                  54.67     58.33
Optimal Box-Cox (anchored at 1) λ      0.50      0.56

A Kolmogorov-Smirnov test corroborated the premise that these two distributions were quite similar. It failed to reject homogeneity (critical D-stat was 0.233, calculated D-stat was 0.063, p-value = 0.99).
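For readers unfamiliar with the test, the two-sample D statistic is the largest vertical gap between the two empirical distribution functions, and it is compared with a critical value. The sketch below is a standard-library illustration, not the study's code. The significance level and sample sizes behind the reported critical value are not stated in this report, so the large-sample critical-value formula and the example scores are assumptions for illustration only.

```python
import math

def ks_two_sample_d(x, y):
    """Largest gap between the two empirical CDFs (the two-sample D statistic)."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    for value in sorted(set(xs) | set(ys)):
        # Advance each pointer past all observations <= value.
        while i < n and xs[i] <= value:
            i += 1
        while j < m and ys[j] <= value:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_critical_d(n, m, alpha=0.05):
    """Large-sample critical value: c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))

# Hypothetical raw pretest scores for two small groups.
grade2 = [5.0, 12.0, 19.0, 19.3, 27.0, 41.0]
grade3 = [6.0, 11.0, 18.0, 20.0, 29.0, 44.0]
d_stat = ks_two_sample_d(grade2, grade3)
reject = d_stat > ks_critical_d(len(grade2), len(grade3))
```

Homogeneity is rejected only when the calculated D exceeds the critical D, which is the comparison reported above.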

The distribution of these raw scores was significantly skewed and leptokurtic, as has been reported in similar studies (Burns et al. 2006), so we normalized them using a Box-Cox transformation to arrive at a final fluency score. Following the recommendation of Osborne (2005), we anchored the full distribution at a minimum value of 1 by adding 2 to all raw fluency scores. A search for an optimum λ returned 0.525, so we chose λ = 0.5 for simplicity of inversion. Thus the calculation for final score is $\sqrt{C - I + 2}$, where C is the average digits correct per min and I is the average digits incorrect per min. The resulting distribution of pretest scores was not significantly skewed (skew = 0.08, SES = 0.21) but was still slightly leptokurtic (kurtosis = 0.85, SEK = 0.42). D’Agostino-Pearson (p-value = 0.13) and Jarque-Bera tests (p-value = 0.13) failed to reject normality.
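The link between a Box-Cox transformation with λ = 0.5 and the square-root form of the final score can be made explicit. The short derivation below assumes the standard one-parameter Box-Cox definition, which this report does not print; $y$ denotes the anchored raw score $C - I + 2$.

```latex
\[
y^{(\lambda)} = \frac{y^{\lambda} - 1}{\lambda}, \qquad \lambda \neq 0
\]
\[
\lambda = 0.5: \quad y^{(0.5)} = 2\left(\sqrt{y} - 1\right) = 2\sqrt{C - I + 2} - 2
\]
```

Because this is an increasing linear function of $\sqrt{C - I + 2}$, using the square root alone leaves the skewness and kurtosis of the transformed scores unchanged, and a final score $s$ inverts simply as $C - I = s^2 - 2$.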


2.3 Validity

The criterion validity for CBM-based measures in elementary math has been established by Stevens & Leigh (2012) and VanDerHeyden & Burns (2008). These studies showed math fact fluency was predictive of general math achievement on the Oklahoma Core Curriculum test and Stanford Achievement Test, respectively.

2.4 Reliability

Several researchers have confirmed the reliability of CBM for math fluency.

Table 4: Previous Research on CBM Reliability for Math Fluency

Metric                                   Scoring Method                                            Source                    Value

Inter-scorer Agreement                   Correct Digits per Minute                                 (Burns et al. 2006)       0.96+
Inter-scorer Agreement                   Correct Digits per Minute                                 (Hintze et al. 2002)      0.955
Inter-scorer Agreement                   Correct Digits per Minute − Incorrect Digits per Minute   (Stevens & Leigh 2012)    0.99+
Delayed Alternate-form Reliability       Correct Digits per Minute                                 (Burns et al. 2006)       0.84
Absolute Generalizability                Correct Digits per Minute                                 (Hintze et al. 2002)      0.75
Relative Generalizability                Correct Digits per Minute                                 (Hintze et al. 2002)      0.95
Test-Retest Alternate Form Reliability   Correct Digits per Minute − Incorrect Digits per Minute   (Stevens & Leigh 2012)    0.87

Our study gave 3 separate fact probes on the same day, allowing us to measure internal consistency of raw fluency score (correct digits minus incorrect digits) using Cronbach’s α. The α values across the six tests are described in Table 5.

Table 5: Internal Consistency of Raw Fluency Score

                 Addition/Subtraction   Multiplication/Division

Pretest          0.95                   0.94
Interim Test     0.96                   0.94
Posttest         0.97                   0.95
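Cronbach’s α treats the three same-day probes as items of one scale. A minimal sketch of the standard formula follows; the function name and the example scores are hypothetical, and the study's own computation may have differed in detail.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for per-student score lists (one entry per probe)."""
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("each student needs the same number (>= 2) of probe scores")
    # Sample variance of each probe across students.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each student's total across probes.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical raw fluency scores: four students, three probes each.
alpha = cronbach_alpha([[18, 20, 19], [30, 28, 31], [9, 12, 10], [22, 21, 25]])
```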


We also calculated delayed alternate-form reliability of the final fluency score across each grade × condition cohort and found an average value of 0.71.

Table 6: Delayed Alternate-Form Reliability (14 weeks)

                 Addition/Subtraction   Multiplication/Division

Intervention     0.77                   0.47
Comparison       0.72                   0.89

The relatively poor value for the 3rd grade intervention group may be due to large variation in dosage. The standard deviation in weekly usage across 3rd grade intervention groups was 1.24 days/week, nearly twice that of the 2nd grade intervention group, where the standard deviation was 0.65 days/week.

When dosage was added to the model predicting posttest score from pretest score, the agreement between the two intervention groups improved considerably. The coefficients of multiple correlation were R = 0.81 and R = 0.77 for the 2nd and 3rd grade intervention groups, respectively.

2.5 Analytic Approach

Since randomized assignment occurred at the class level, we used an HLM modeling approach to account for cluster effects when analyzing the relationship between condition and posttest fluency. The model has two levels: grade and condition are level-2 variables, and all other covariates are level-1 variables. We used grand-mean-centered values for the lower level variables and a maximum-likelihood method for determining the random effects. If the search for a model did not converge using maximum likelihood, restricted maximum likelihood was used instead.

Models were constructed using R’s lmer function, part of the lme4 library, using the methodology for two-tier HLM models documented in a technical report from the Department of Statistics and Data Sciences, The University of Texas at Austin (UTA 2015), which showed the similarity in results to those given by SPSS, SAS, Mplus, and HLM.

We formed 3 models of decreasing complexity and calculated an effect size and statistical significance based on each.


The first model uses the same structure as that used in the original version of this report. In this model all dichotomous and numeric covariates were used (i.e., all covariates other than race, which was polynominal), including the pretest accuracy and pretest speed. This model is most inclusive and allows for continuity between the original version of this report and the current version. It is denoted as the Full Model.

For the data available at the time of the original report, the pretest speed and pretest accuracy were both highly significant (p < 0.001). But after removing students who did not respect the time limits on the pretest or were designated as being below grade level before the study began, these additional pretest features were no longer statistically significant. A nested model χ-squared test comparing change in deviance to change in degrees of freedom did not show a statistically significant improvement upon adding either of these terms. Thus we generated a new model lacking these two pretest features but retaining all the demographic covariates of the original. This model is denoted in the sequel as the Demographic Model.
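The nested-model comparison works by referring the drop in deviance to a χ-squared distribution whose degrees of freedom equal the number of parameters added. The sketch below is a standard-library illustration of that test, not the study's R code; the function names and the deviance values in the example are hypothetical, since the report does not list the deviances.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x < 0 or df < 1:
        raise ValueError("x must be non-negative and df a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x).
    root = math.sqrt(x)
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = 0.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        term = root if r == 1 else term * x / (2 * r - 1)
        total += term
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * density * total

def nested_deviance_test(deviance_smaller, deviance_larger, df_added):
    """p-value for the improvement from adding df_added parameters."""
    drop = max(0.0, deviance_smaller - deviance_larger)
    return chi2_sf(drop, df_added)

# Hypothetical deviances: the larger model adds three parameters.
p_value = nested_deviance_test(412.6, 409.9, 3)
```

A p-value above the chosen threshold, as in this made-up example, would mean the added terms do not significantly improve the fit.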

In an effort to simplify the model further, we assessed the relevance of each of the demographic variables by generating an HLM with the following characteristics:

• No Level-2 variables

• Two Level-1 variables: the covariate in question and pretest score

• Group-mean-centered values

• Data scaled to unit variance

This method was selected for determining the relevance of a given level-1 factor based on Woltman, Feldstain, MacKay & Rocchi’s (2012) presentation. The results are shown in Table 7. Note that this was the only analysis using group-mean-centered data. The models used for determining intervention effect and statistical significance used grand-mean-centered level-1 variables.
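The difference between the two centering schemes can be shown with a small sketch. The function names and the example data are hypothetical, and reading the scaling step as division by the sample standard deviation is an assumption about the preprocessing, not something this report spells out.

```python
from statistics import mean, stdev

def grand_mean_center(values):
    """Subtract the mean of the whole sample from every value."""
    overall = mean(values)
    return [v - overall for v in values]

def group_mean_center(values, groups):
    """Subtract each class's own mean from the values of its students."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    group_means = {group: mean(vals) for group, vals in by_group.items()}
    return [value - group_means[group] for value, group in zip(values, groups)]

def scale_to_unit_sd(values):
    """Divide by the sample standard deviation (assumed meaning of the scaling step)."""
    spread = stdev(values)
    return [v / spread for v in values]

# Hypothetical pretest scores for four students in two classes.
scores = [3.0, 5.0, 4.0, 6.0]
classes = ["A", "A", "B", "B"]
grand = grand_mean_center(scores)
group = group_mean_center(scores, classes)
```

Grand-mean centering keeps differences between classes in the covariate, whereas group-mean centering removes them, which is why the two schemes answer different questions.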

Given their very low coefficients and t-scores, we removed gender and ESE. Upon forming the full HLM using the remaining covariates, it was found that LEP had very little impact (coefficient = 0.03) and low significance (t = 0.24), so it was dropped as well. In the resulting model, all covariates had t-scores greater than 0.9 in magnitude and standardized coefficients greater than 0.1. There was a nearly statistically significant interaction (t = 1.94) between condition and whether the student was on free or reduced lunch.

Table 7: Impact and significance of demographic covariates

Covariate   Coefficient   t-score

age         0.028         0.506
gender      −0.005        −0.096
LEP         0.042         0.612
Lunch       0.080         1.249
ESE         −0.006        −0.100

This final model is denoted as the Reduced Model.

All three models are provided in the Appendix.

Effect sizes were calculated from the coefficient for the intervention effect from each HLM model and the pooled within-group standard deviation of unadjusted posttest scores.

Statistical significance was determined based on the t-score of the multilevel model.

2.6 Statistical Adjustments

We used all demographic information provided except race, which was non-binary and correlated significantly with other demographic information (R between 0.36 and 0.46 for the three most prevalent races in our sample).

Grade was coded as grade3, a variable equal to 1 if the student was in grade 3 and 0 otherwise.

Age was measured in years as of the pretest administration.

Gender was coded as a variable male, equal to 1 if the student was male and 0 if the student was female.

Low English proficiency was coded as a variable LEP, equal to 1 if the school indicated the student had low English proficiency.

Exceptional Student Status was determined based on the school’s designation of the student as being within an Exceptional Student Education program. It was coded as a variable ESE, equal to 1 if the school specified the student as belonging to an ESE program. The state of Florida specifies several ESE programs, one of which is a program for gifted students. For our study, it appears this program furnished the large majority of ESE designations, as 29 of the 36 students designated as ESE were concentrated in two high-achieving classes. In grade 2, every ESE-designated student was in a single class.

Eligibility for free or reduced lunch was coded as a variable lunch, equal to 1 if the student was eligible.

As described in the Fluency Score Calculation subsection, fluency was evaluated based on a research-supported combination of speed and accuracy, normalized to reduce skewness via a Box-Cox transformation. This means that a student’s fluency score depends on personal characteristics such as confidence, sense of urgency on a pen-and-paper test, and attention to accuracy, so students differ markedly in potential for improvement.

Pretest accuracy is the ratio of correct digits to the sum of correct and incorrect digits.

Pretest score is defined as $\sqrt{C - I + 2}$, where C is digits correct per minute and I is digits incorrect per minute.

Pretest speed is defined in a manner analogous to pretest score, $\sqrt{C - 2}$, where C is digits correct per minute. In this expression, 2 is subtracted rather than added so that the expression is anchored at 1, conforming to best practices (Osborne 2005).
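The three pretest features can be computed together from a student's per-minute digit rates. This is a minimal sketch under the definitions above; the function name, the error handling, and the example student are assumptions for illustration.

```python
import math

def pretest_features(digits_correct_per_min, digits_incorrect_per_min):
    """Return (accuracy, speed, score) from average per-minute digit rates."""
    c, i = digits_correct_per_min, digits_incorrect_per_min
    if c - 2 < 0 or c - i + 2 < 0:
        raise ValueError("rates fall outside the range the anchoring assumes")
    # The ratio of rates equals the ratio of digit totals over the same probes.
    accuracy = c / (c + i)
    speed = math.sqrt(c - 2)       # anchored so the slowest student maps to 1
    score = math.sqrt(c - i + 2)   # final fluency score
    return accuracy, speed, score

# Hypothetical student: 18 digits correct and 2 incorrect per minute.
accuracy, speed, score = pretest_features(18, 2)
```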

All student-level covariates were scaled to unit variance and grand-mean centered for improved interpretability and model convergence.

Speed, score, and accuracy on the interim administration were considered during the regression process used to impute missing data, as described in the Missing Data section. These metrics are calculated exactly as for the pretest using the same formula (i.e., the data was not re-anchored for the Box-Cox transformation).

An HLM model was used to calculate statistical significance for the entire sample, so no adjustment for cluster effects was necessary. We only analyzed one outcome for this study, so no adjustment was made for multiple outcomes.

2.7 Students Removed from Study

Ten students, 4 from the intervention condition and 6 from the comparison condition, were excluded from the analysis. In all cases the decision to exclude was based on information obtained on the day of the pretest.

Four of these ten (3 from intervention, 1 from comparison) were excluded because their teacher indicated they were sufficiently below grade level that they would not receive typical instruction in math fact fluency. This determination was provided on the day of the pretest.

Table 8: Descriptive Statistics of Control Variables

Control Variable         Mean   SD     Skew    Kurtosis

Grade3                   0.48   0.50   0.08    −2.03
Age                      8.42   0.57   0.03    −0.95
Male                     0.47   0.50   0.14    −2.01
LEP                      0.20   0.40   1.51    0.27
ESE                      0.26   0.44   1.13    −0.73
Lunch                    0.20   0.40   1.51    0.27
Pretest Accuracy         0.92   0.09   −2.63   8.23
Pretest Speed            4.29   1.18   0.18    0.98
Pretest Score            4.58   1.15   0.08    0.85
Interim Accuracy         0.94   0.06   −1.84   3.59
Interim Speed            5.07   1.20   0.55    0.42
Interim Fluency Score    5.31   1.18   0.46    0.39

Five of these ten (all from comparison) were excluded because they did not stop when time was called on the pretest. In three cases these students had higher values on their pretest than on their posttest.

One grade 2 student from the Intervention condition was noted as appearing frustrated and not working on the pretest. He had the fourth lowest fluency score of all 2nd-grade participants on the pretest and showed dramatic improvement by the interim assessment, on which he scored at the 33rd percentile within his grade. According to Reflex’s internal initial testing, the student had pre-existing automaticity for 17.1% of the addition facts within 20 and had basic recall ability with 59.9%. This suggests his pretest score under-estimated his actual ability, and he was removed from the analysis for fear of artificially inflating the impact of the intervention. Note that this student was absent from the final administration.

2.8 Missing Data

Eight students, 5 from the treatment group and 3 from the comparison group, were absent for the final administration. Seven of the students had taken the interim assessment. No values were imputed for the student who missed both the interim and the final assessment. For the seven who had attended the interim test, we imputed posttest values using a multilinear regression based on students in the same instructional level group, using the threshold established by Burns et al. (2006).

Table 9: Categorization of Students

Fluency (dc/min)    Category               N

Less than 14        Frustration Level      29 (22%)
14–31               Instructional Level    81 (63%)
Greater than 31     Mastery Level          19 (15%)
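The thresholds in Table 9 translate directly into a small classifier. The function name is hypothetical; the cut points are those shown in the table.

```python
def instructional_category(dc_per_min):
    """Map digits correct per minute to the Table 9 category."""
    if dc_per_min < 14:
        return "Frustration Level"
    if dc_per_min <= 31:
        return "Instructional Level"
    return "Mastery Level"

# Hypothetical students at 9, 20, and 35 digits correct per minute.
categories = [instructional_category(rate) for rate in (9, 20, 35)]
```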

All available data (demographic data, pretest data, and interim test data) were used to impute posttest scores using an OLS regression that retained only statistically significant regressors.

2.8.1 Frustration Level

One of the seven students for whom posttest scores were imputed was in the frustration level. For that group, age (t = 2.6), pretest accuracy (t = −3.2), and interim fluency score (t = 6.2) were the statistically significant regressors.

2.8.2 Instructional Level

Six of the seven students for whom posttest scores were imputed were in the instructional level. Among students in that level, grade (t = 4.2), interim accuracy (t = −2.6), and interim fluency score (t = 9.5) were statistically significant.

2.9 Mastery Level

There were no students in the mastery level for whom imputation was necessary.


3 Study Data

Tables compose the large majority of this section. They are organized by table title and subsection title rather than by use of numbers.

The tables in this section report unscaled, uncentered values for ease of interpretability.

3.1 Pre-Intervention Data: All Pretest Takers

This section provides data on all students who took the pretest, including those that were formally removed from the analysis.

Outcome Data (Measure: Fluency Score)

                                              Comparison Group   Intervention Group

Sample Sizes: Unit of Assignment              4                  4
Sample Sizes: Unit of Analysis                70                 70
Sample Characteristics: Mean                  4.44               4.49
Sample Characteristics: Standard Deviation    1.17               1.26

Background Data

                    Comparison         Intervention
Variable            Mean     SD        Mean     SD

Age                 8.440    0.598     8.442    0.602
ESE                 0.271    0.448     0.243    0.432
Male                0.500    0.504     0.400    0.493
Grade3              0.486    0.503     0.486    0.503
LEP                 0.243    0.432     0.171    0.380
Lunch               0.229    0.423     0.186    0.392
Pretest accuracy    0.919    0.098     0.909    0.113
Pretest speed       4.145    1.186     4.220    1.257


3.2 Pre-Intervention Data: Baseline Sample

This section includes all students who were formally part of the analysis, including those who were absent for the posttest.

Outcome Data (Measure: Fluency Score)

                                              Comparison Group   Intervention Group

Sample Sizes: Unit of Assignment              4                  4
Sample Sizes: Unit of Analysis                64                 66
Sample Characteristics: Mean                  4.57               4.60
Sample Characteristics: Standard Deviation    1.12               1.20

Background Data

                    Comparison         Intervention
Variable            Mean     SD        Mean     SD

Age                 8.444    0.556     8.405    0.581
ESE                 0.250    0.436     0.258    0.441
Grade3              0.484    0.504     0.470    0.503
LEP                 0.250    0.436     0.152    0.361
Lunch               0.219    0.417     0.182    0.389
Male                0.531    0.503     0.394    0.492
Pretest accuracy    0.932    0.087     0.916    0.102
Pretest speed       4.268    1.147     4.324    1.213


3.3 Pre-Intervention Data: Analytic Sample

Outcome Data: Analytic Sample (Measure: Fact Fluency)

                                              Comparison Group   Intervention Group

Sample Sizes: Unit of Assignment              4                  4
Sample Sizes: Unit of Analysis                64                 65
Sample Characteristics: Mean                  4.573              4.580
Sample Characteristics: Standard Deviation    1.121              1.195

Background Data: Analytic Sample

                    Comparison         Intervention
Variable            Mean     SD        Mean     SD

Age                 8.444    0.556     8.414    0.580
ESE                 0.250    0.436     0.262    0.443
Grade3              0.484    0.504     0.477    0.503
LEP                 0.250    0.436     0.154    0.364
Lunch               0.219    0.417     0.185    0.391
Male                0.531    0.503     0.400    0.494
Pretest accuracy    0.932    0.087     0.914    0.102
Pretest speed       4.268    1.147     4.307    1.214

Outcome Data: Analytic Sample with No Imputation (Measure: Fact Fluency)

                                              Comparison Group   Intervention Group

Sample Sizes: Unit of Assignment              4                  4
Sample Sizes: Unit of Analysis                61                 61
Sample Characteristics: Mean                  4.557              4.640
Sample Characteristics: Standard Deviation    1.143              1.142

Background Data: Analytic Sample with No Imputation

                    Comparison         Intervention
Variable            Mean     SD        Mean     SD

Age                 8.436    0.556     8.394    0.568
ESE                 0.246    0.434     0.279    0.452
Grade3              0.492    0.504     0.459    0.502
LEP                 0.262    0.444     0.148    0.358
Lunch               0.230    0.424     0.180    0.388
Male                0.541    0.502     0.410    0.496
Pretest accuracy    0.930    0.088     0.922    0.084
Pretest speed       4.254    1.170     4.360    1.177

3.4 Post-intervention Data and Findings

3.4.1 Analytic Sample

Because grand-mean-centered values were used for all Level-1 covariates and grade was the only Level-2 covariate other than condition, adjusted means for each group were estimated from the Constant term of the HLM model, the grade3 coefficient of the HLM model, the average value of the grade3 variable across all students, and (in the case of the intervention group) the treatment coefficient of the HLM model. Standard deviations are unadjusted.

Outcome Data and Statistical Significance: Analytic Sample

                     Comparison Group                  Intervention Group                Estimated Effect
Model                Students  adj. Mean  unadj. SD    Students  adj. Mean  unadj. SD    adj. Mean Difference  adj. t-score
Full Model           64        5.04       1.099        65        5.97       1.093        0.927***              5.753
Demographic Model    64        5.13       1.099        65        5.97       1.093        0.836***              4.966
Reduced Model        64        5.08       1.099        65        5.95       1.093        0.867***              4.343

Note: *p<0.1; **p<0.05; ***p<0.001
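As an illustration of the adjusted-mean calculation described in this subsection, the following Python sketch recomputes the Full Model row from the fixed-effect coefficients reported in Appendix A. It assumes the intercept (4.709), grade3 coefficient (0.695), and treatment coefficient (0.927) are read with the decimal placement implied by the tables, and it takes the grade3 mean (0.481) from Table 2. The function name is hypothetical and is not part of the original analysis, which was carried out in R.

```python
def adjusted_means(intercept, grade3_coef, grade3_mean, treatment_coef):
    """Return (comparison, intervention) adjusted means.

    With grand-mean-centered level-1 covariates, the centered terms
    average to zero, so only the intercept, the grade3 term, and the
    treatment term contribute to the group means.
    """
    comparison = intercept + grade3_coef * grade3_mean
    intervention = comparison + treatment_coef
    return comparison, intervention


# Full Model coefficients (assumed decimal placement, see lead-in).
comparison, intervention = adjusted_means(4.709, 0.695, 0.481, 0.927)
print(round(comparison, 2), round(intervention, 2))
```

The rounded values agree with the Full Model row of the table; small differences for other models can arise from rounding in the reported coefficients.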

Effect size was calculated based on adjusted means, unadjusted pooled within-group standard deviations, and a small-sample correction $\omega = 1 - \frac{3}{4N - 9}$.

Estimation of Effect Size: Analytic Sample

Model                N      Adjusted Mean    (unadj.) Pooled     Effect Size
                            Difference       Within-Group SD     (adj. Hedges' g)
Full Model           129    0.927***         1.096               0.84
Demographic Model    129    0.836***         1.096               0.76
Reduced Model        129    0.867***         1.096               0.79

Note: *p<0.1; **p<0.05; ***p<0.001
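The effect-size calculation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the usual pooled within-group standard deviation formula and applies the correction ω = 1 − 3/(4N − 9) to the adjusted mean difference. The function names are hypothetical; the inputs are the Full Model values for the analytic sample.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled within-group standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def hedges_g(mean_diff, sd_pooled, n_total):
    """Standardized mean difference with the correction 1 - 3 / (4N - 9)."""
    omega = 1 - 3 / (4 * n_total - 9)
    return omega * mean_diff / sd_pooled


# Full Model, analytic sample: 64 comparison and 65 intervention students.
sd = pooled_sd(64, 1.099, 65, 1.093)
g = hedges_g(0.927, sd, 129)
print(round(sd, 3), round(g, 2))
```

With these inputs the sketch gives a pooled SD of about 1.096 and an adjusted Hedges' g of about 0.84, matching the Full Model row.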

3.4.2 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absent for the post assessment indicated that a complete-case analysis would substantially understate the effect of the intervention. The covariate-adjusted effect of the treatment on interim test scores was greater among students who missed the posttest than among those who were present for all three tests. This is borne out in the results of an analysis limited to those students for whom no imputation occurred.

Values for adjusted means for this subgroup were calculated by recentering all Level-1 covariates and generating a new HLM with the same structure as for the full analytic sample, but using only those participants with no missing data.

Outcome Data and Statistical Significance: Analytic Sample with No Imputation

                     Comparison Group                  Intervention Group                Estimated Effect
Model                Students  adj. Mean  unadj. SD    Students  adj. Mean  unadj. SD    adj. Mean Difference  adj. t-score
Full Model           61        5.07       1.12         61        5.94       1.11         0.867***              5.255
Demographic Model    61        5.15       1.12         61        5.94       1.11         0.787***              4.669
Reduced Model        61        5.09       1.12         61        5.92       1.11         0.828***              4.249

Note: *p<0.1; **p<0.05; ***p<0.001

Estimation of Effect Size: Analytic Sample with No Imputation

Model                N      Adjusted Mean    (unadj.) Pooled     Effect Size
                            Difference       Within-Group SD     (adj. Hedges' g)
Full Model           122    0.867***         1.113               0.77
Demographic Model    122    0.787***         1.113               0.70
Reduced Model        122    0.828***         1.113               0.74

Note: *p<0.1; **p<0.05; ***p<0.001

3.5 Subpopulation Analyses

We analyzed subpopulations by grade. We also analyzed the subpopulation of students not designated as exceptional students. Due to the smaller sample sizes, the Reduced Model was used for the analyses, except that grade was removed as a variable for subpopulations of constant grade.

Statistical Significance and Estimation of Effect Size

Group                       N      Adjusted Mean    (unadj.) Pooled     Effect Size         Adjusted
                                   Difference       Within-Group SD     (adj. Hedges' g)    t-score
Grade 2                     68     0.739**          0.94                0.78                2.46
Grade 3                     63     0.877**          1.05                0.82                2.47
Non-Exceptional Students    102    0.904***         1.101               0.89                4.63

Note: *p<0.1; **p<0.05; ***p<0.001

4 Acknowledgment

We used R (Team 2013) for some of the analysis in this report, especially the lme4 package for fitting mixed models (Bates, Mächler, Bolker & Walker 2015). Other libraries utilized were dplyr, tidyr, and magrittr (Wickham & Francois 2016, Bache & Wickham 2014, Wickham 2016).

This document was typeset using LaTeX and makes use of the harvard, booktabs, multirow, graphicx, and url packages.

The stargazer package was used to generate LaTeX for several of the tables (Hlavac 2013).

References

Bache, S. M. & Wickham, H. (2014), magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr

Bates, D., Mächler, M., Bolker, B. & Walker, S. (2015), 'Fitting linear mixed-effects models using lme4', Journal of Statistical Software 67(1), 1-48.

Burns, M. K., VanDerHeyden, A. M. & Jiban, C. L. (2006), 'Assessing the instructional level for mathematics: A comparison of methods', School Psychology Review 35(3), 401.

Hintze, J. M., Christ, T. J. & Keller, L. A. (2002), 'The generalizability of CBM survey-level mathematics assessments: Just how many samples do we need?', School Psychology Review 31(4), 514.

Hlavac, M. (2013), 'stargazer: LaTeX code and ASCII text for well-formatted regression and summary statistics tables'. http://CRAN.R-project.org/package=stargazer

Osborne, J. (2005), 'Notes on the use of data transformations', Practical Assessment, Research and Evaluation 9(1), 42-50.

Stevens, O. & Leigh, E. (2012), 'Mathematics curriculum based measurement to predict state test performance: A comparison of measures and methods', ProQuest LLC.

Team, R. C. (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

UTA (2015), Multilevel modeling tutorial using SAS, Stata, HLM, R, SPSS, and Mplus, Technical report, The Department of Statistics and Data Sciences, The University of Texas at Austin. http://stat.utexas.edu/images/SSC/documents/SoftwareTutorials/MultilevelModeling.pdf

VanDerHeyden, A. M. & Burns, M. K. (2008), 'Examination of the utility of various measures of mathematics proficiency', Assessment for Effective Intervention.

Wickham, H. (2016), tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. https://CRAN.R-project.org/package=tidyr

Wickham, H. & Francois, R. (2016), dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr

Woltman, H., Feldstain, A., MacKay, J. C. & Rocchi, M. (2012), 'An introduction to hierarchical linear modeling', Tutorials in Quantitative Methods for Psychology 8(1), 52-69.

Appendix A Full Model

The table below describes the fixed-effects data of the full HLM for the analytic sample. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                    Coefficient    t-score
Intercept                 4.709***       25.719
cage                      0.136          0.887
cLEP                      -0.015         -0.111
cLunch                    -0.122         -0.981
cprescore                 -2.220         -0.932
cgender                   -0.148         -1.518
cESE                      0.054          0.272
cprespeed                 2.471          1.177
cpreaccuracy              0.750          1.313
treatment                 0.927***       5.753
grade3                    0.695***       3.335
cagey:treatment           0.076          0.469
cagey:grade3              -0.224         -1.134
cLEP:treatment            -0.021         -0.123
cLEP:grade3               -0.027         -0.156
cLunch:treatment          0.195          1.440
cLunch:grade3             0.174          1.269
cprescore:treatment       1.881          1.056
cprescore:grade3          2.097          0.885
cgender:treatment         0.096          0.836
cgender:grade3            0.180          1.564
cESE:treatment            -0.036         -0.204
cESE:grade3               -0.013         -0.073
cprespeed:treatment       -1.327         -0.849
cprespeed:grade3          -1.581         -0.753
cpreaccuracy:treatment    -0.712*        -1.657
cpreaccuracy:grade3       -0.608         -1.101

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B Demographic Model

The table below describes the fixed-effects data of the full HLM for the analytic sample but excludes the pretest features of speed and accuracy. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                 Coefficient    t-score
Intercept              4.906***       27.157
cage                   0.230          1.554
cLEP                   0.042          0.315
cLunch                 -0.128         -1.225
cprescore              0.756***       5.872
cgender                -0.124         -1.283
cESE                   0.172          0.855
treatment              0.836***       4.966
grade3                 0.480**        2.235
cagey:treatment        -0.003         -0.022
cagey:grade3           -0.270         -1.378
cLEP:treatment         -0.059         -0.346
cLEP:grade3            -0.071         -0.408
cLunch:treatment       0.313**        2.631
cLunch:grade3          0.174          1.459
cprescore:treatment    0.003          0.020
cprescore:grade3       0.094          0.685
cgender:treatment      0.122          1.062
cgender:grade3         0.120          1.039
cESE:treatment         -0.131         -0.734
cESE:grade3            -0.117         -0.642

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix C Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sample, retaining only age, free or reduced lunch status, and pretest score as level-1 covariates. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

This model used Restricted Maximum Likelihood, as there were convergence problems when using maximum likelihood.

Factor                 Coefficient    t-score
Intercept              4.827***       25.683
cage                   0.227          1.146
cLunch                 -0.111         -0.952
cprescore              0.758***       5.488
treatment              0.867***       4.343
grade3                 0.527**        2.219
cagey:treatment        -0.031         -0.154
cagey:grade3           -0.168         -0.692
cLunch:treatment       0.268*         1.936
cLunch:grade3          0.153          1.109
cprescore:treatment    -0.042         -0.286
cprescore:grade3       0.100          0.677

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B Sample Addition/Subtraction Probe

    9      5     18      9      8     13
  + 8    + 9    -10    - 6    + 3    - 5
  ___    ___    ___    ___    ___    ___

   10      2      3     10     12      9
  - 3    + 7    + 8    + 1    -10    - 1
  ___    ___    ___    ___    ___    ___

    5      8      3     19      7     16      3
  + 0    + 4    + 6    - 9    - 1    -10    + 0
  ___    ___    ___    ___    ___    ___    ___

    3      7     15      0      4     14      7
  - 1    + 5    - 5    + 9    + 3    - 5    - 5
  ___    ___    ___    ___    ___    ___    ___

    9     20     11      4      9      6      1
  + 9    -10    - 3    - 4    + 0    - 1    +10
  ___    ___    ___    ___    ___    ___    ___

    9     12     12      2      5      9      5
  +10    - 3    - 9    + 8    - 0    - 4    + 0
  ___    ___    ___    ___    ___    ___    ___

    4     14      7     11      7      4      6
  + 4    - 9    - 0    - 8    + 0    - 1    + 5
  ___    ___    ___    ___    ___    ___    ___

    2     12     14      4     10      1      7
  + 3    - 5    - 5    - 4    +10    + 0    + 2
  ___    ___    ___    ___    ___    ___    ___

   13     10      3      9     17     10      3
  - 6    +10    + 6    - 6    - 7    +10    + 6
  ___    ___    ___    ___    ___    ___    ___

    4     10     10      3      5      5     10
  + 9    + 2    +10    - 0    + 3    - 5    -10
  ___    ___    ___    ___    ___    ___    ___


1.3 Setting

Teachers from a Florida school in a metropolitan area participated in this study. The demographic data provided by the school indicate it is a majority-minority school: 57% of its second- and third-grade students are Hispanic or Latino, and 31% are Caucasian. The data provided indicate that 28% have low English proficiency and 17% are on free or reduced lunch.

1.4 Participants

The participating students are generally demographically similar to the full population of second- and third-grade students in terms of exceptional student status, race, gender, and economic status. In all cases, we relied on information received from the school.

2 Study Design and Analysis

2.1 Sample Formation

The school was identified by project personnel owing to its previous interest in Reflex. The school was offered a discount on a later subscription in exchange for participation. After logistical discussions to ensure that the school had sufficient technical resources to allow usage of a computer-delivered intervention, teachers were asked to volunteer for participation. Nine teachers initially volunteered to have their homeroom students take part. One of these homeroom classes was taught by another teacher who also taught her own homeroom, so the 9 classes were taught by 8 teachers.

The study was intended as a cluster randomized controlled trial, with the teachers from each grade randomly assigned to condition. Unfortunately, the design was compromised across grade 3 teachers. One teacher assigned to the comparison condition did not participate at all; project personnel did not administer pretests. Another teacher assigned to the treatment never used the intervention. There was zero uptake across her entire class. Review of email exchanges suggests three possible causes:

• The liaison between the head researcher and the school may have misrepresented the constraints of the study to the school. He reports that the school may have thought that an even number of teachers was required.

• Two gifted/high-achieving classes participated in the study. They were both inadvertently randomly assigned to the intervention. It was our intention to split these through block randomization, but we only received the pertinent data after selection and, due to a misreading of the correspondence, failed to catch the error, so no re-assignment was done. The school may have rectified our error themselves.

• It is possible that one of the teachers simply did not want to use the intervention. Project personnel doing the training reported that she attended but "had to leave early on".

Given the above, we are analyzing our study as a quasi-experimental design (QED) in which the intact groups are the 8 classes for which we have pretest data, and the intervention group comprises those classes where any uptake occurred prior to posttest.

Teachers were provided the opportunity to indicate any students who were not prepared for fact fluency instruction. Four third-grade students were identified, 3 from the intervention group and 1 from the comparison group. These students' data were not considered as part of the study.

One of the teachers taught two classes, one within the intervention group and another in the comparison group. All other teachers taught a single class.

Group Descriptions

Table 2 provides a description of the demographic character of the groups as well as their pretest results. The fluency score on the pretest combines both speed and accuracy, as described in the Fluency Score Calculation subsection.

2.2 Outcome Measures

2.2.1 Outcomes

One outcome was measured in the study: math fact fluency, which is a key component of general math achievement and has been shown to be predictive of students' performance on general math achievement tests (see the Validity subsection below). Fluency was measured using timed probes.

Table 2: Baseline Demographic Information

                                    Full Sample    Comparison Group    Intervention Group
Sample Size                         129            64                  65
Grade 3 Students (%)                48.1           48.4                47.7
Hispanic (%)                        54.2           53.1                55.3
Asian (%)                           17.1           18.8                15.4
White (%)                           22.5           21.9                23.1
Black (%)                           4.7            6.2                 3.1
Multiracial (%)                     1.6            0.0                 3
Low English Proficiency (%)         20.2           25.0                15.4
Exceptional Student (Gifted) (%)    25.6           25                  26.1
Free/Reduced Lunch (%)              20.2           21.9                18.4
Male (%)                            46.5           53.1                40
Age at pretest (years)              8.43           8.44                8.41
Pretest Accuracy (%)                92.3           93.2                91.4
Pretest Speed                       4.29           4.26                4.31
Pretest Score                       4.58           4.57                4.58

• Grade 2 students were tested on facts with terms, minuends, and subtrahends from 0 to 10 inclusive (i.e., from 0 + 0 up to 10 + 10 and their associated subtraction facts).

• Grade 3 students were tested on facts with factors, divisors, and quotients from 0 to 10 inclusive (i.e., from 0 × 0 to 10 × 10 and their associated division facts).

These match the requirements in the Common Core State Standards, except that, owing to that document's idiosyncratic definition of "within X" (as in "addition within 20"), a literal reading of the work indicates that facts such as 20 − 17 and 91 ÷ 13 are considered within grade level. The Florida Math Standards do not provide a glossary, so it is unclear whether such facts would be in the scope of the wording of its standards.

2.2.2 Probes

Probes had a format similar to those in other Curriculum Based Measurement (CBM) studies (Hintze, Christ & Keller 2002, Burns, VanDerHeyden & Jiban 2006, Stevens & Leigh 2012), as described below.

Each probe was a single sheet of paper with 10 rows of vertically oriented problems. Probes given to grade 2 students contained addition and subtraction facts. Probes given to grade 3 students contained multiplication and division facts. The problems were printed in extra-large type, so only 7 facts fit on each row. The first two rows contained only 6 facts to make room for a geometric shape placed in the upper right-hand corner to help students and monitors quickly identify which page the students were on. The problems were computer-generated with the constraint that the problems in a given row be as balanced as possible between the two operations. The facts were chosen randomly from the appropriate fact pool, with each having an identical selection likelihood.

An example is provided in the Appendix.
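The probe layout described above can be sketched as a short Python generator for the grade 2 (addition/subtraction) case. This is only an illustration under stated assumptions: the report does not describe the actual generator, so the function names, the choice to sample with replacement, and the coin flip that decides which operation gets the extra fact in a 7-fact row are all hypothetical.

```python
import random


def fact_pools():
    """Grade 2 pools: a + b for a, b in 0..10 and the associated subtractions."""
    addition = [(a, "+", b) for a in range(11) for b in range(11)]
    subtraction = [(a + b, "-", a) for a in range(11) for b in range(11)]
    return addition, subtraction


def make_probe(seed=None, rows=10):
    """Return a list of rows; the first two rows hold 6 facts, the rest 7."""
    rng = random.Random(seed)
    addition, subtraction = fact_pools()
    probe = []
    for r in range(rows):
        size = 6 if r < 2 else 7
        n_add = size // 2
        # For odd-sized rows, assign the extra fact to a random operation
        # so each row is as balanced as possible (assumed tie-break rule).
        if size % 2 == 1 and rng.random() < 0.5:
            n_add += 1
        row = [rng.choice(addition) for _ in range(n_add)]
        row += [rng.choice(subtraction) for _ in range(size - n_add)]
        rng.shuffle(row)
        probe.append(row)
    return probe


probe = make_probe(seed=2016)
for row in probe:
    print("   ".join(f"{top}{op}{bottom}" for top, op, bottom in row))
```

Within each operation, every fact is drawn with equal probability, which is one reading of the "identical selection likelihood" requirement.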

2.2.3 Administrations

Three administrations were given. A pretest administration was conducted on February 12th, 2016. An interim administration was conducted on April 14th, timed to occur before heavy preparation for end-of-year testing began. A final administration was conducted on May 24th. Students were told to answer the items in order and not to skip items. The administrator used a script and was witnessed by the classroom teacher, who used a checklist to confirm each of several key points of instruction. This form also provided space for indicating any unusual occurrences.

The first and second administrations each comprised 4 one-minute fact fluency probes. Students were instructed that the first probe was a warm-up in each case. The final administration did not have a warm-up probe. It contained 3 math fact probes.

Grade 2 students also took a multi-digit computation probe, but the results of that probe were not analyzed as part of this combined report because third-grade students did not take a multi-digit probe. Multi-digit multiplication/division is not a core topic for third-grade students in Florida, and the distribution of scores on the multi-digit addition/subtraction probe was known to be fundamentally different from the distribution of scores on math fact probes, so there is no clear way to combine the two.

All students in a given grade took the same probes using the same administrative script, regardless of condition. The probes that were described as "warm-up" tests were not counted in any analysis.

Five students, all in comparison classes, were noted by test administrators as working on their quizzes significantly beyond the called time limit. These students were not formally considered part of the study. Posttests were taken by these students. Three of the five students scored higher on their posttest than on their pretest.

2.2.4 Fluency Score Calculation

For each student, raw fluency scores were calculated as the average number of digits correct per minute (dc/min) minus the number of digits incorrect per minute (di/min), as this was the method found by Stevens & Leigh (2012) to have the greatest criterion validity.

Previous CBM researchers have combined grade 2 and grade 3 students (Burns et al. 2006), but to justify the pooling of their outcomes in a single analysis, we conducted an analysis of the distribution of raw pretest scores for each grade separately to show similarity of distribution.

Table 3: Raw Fluency Pretest Score Distributions by Grade

Measure                                Grade 2    Grade 3
Mean                                   20.26      20.27
Standard Deviation                     10.53      11.35
Median                                 19         19
Kurtosis                               2.37       1.91
Skewness                               1.17       1.01
Range                                  54.67      58.33
Optimal Box-Cox (anchored at 1) λ      0.50       0.56

A Kolmogorov-Smirnov test corroborated the premise that these two distributions were quite similar. It failed to reject homogeneity (critical D-stat was 0.233; calculated D-stat was 0.063; p-value = 0.99).

The distribution of these raw scores was significantly skewed and leptokurtic, as has been reported in similar studies (Burns et al. 2006), so we normalized them using a Box-Cox transformation to arrive at a final fluency score. Following the recommendation of Osborne (2005), we anchored the full distribution at a minimum value of 1 by adding 2 to all raw fluency scores. A search for an optimum λ returned 0.525, so we chose λ = 0.5 for simplicity of inversion. Thus, the calculation for the final score is $\sqrt{C - I + 2}$, where C is the average digits correct per minute and I is the average digits incorrect per minute. The resulting distribution of pretest scores was not significantly skewed (skew = 0.08, SES = 0.21) but was still slightly leptokurtic (kurtosis = 0.85, SEK = 0.42). D'Agostino-Pearson (p-value = 0.13) and Jarque-Bera (p-value = 0.13) tests failed to reject normality.
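A brief Python sketch of the scoring rule may make the calculation concrete. It assumes the raw score is the mean, over the counted probes, of digits correct minus digits incorrect per minute, and that the final score is the square root of the raw score plus the anchor of 2. The function names and the probe values are made up for illustration.

```python
import math


def raw_fluency(correct_per_min, incorrect_per_min):
    """Mean of (digits correct - digits incorrect) per minute across probes."""
    diffs = [c - i for c, i in zip(correct_per_min, incorrect_per_min)]
    return sum(diffs) / len(diffs)


def fluency_score(raw, anchor=2.0):
    """Final score sqrt(C - I + 2): a lambda = 0.5 power transform after anchoring."""
    return math.sqrt(raw + anchor)


# Hypothetical student: three one-minute probes.
raw = raw_fluency([22, 24, 23], [1, 0, 2])
score = fluency_score(raw)
print(raw, round(score, 3))
```

A raw score of -1, the minimum implied by the anchoring, maps to a final score of exactly 1.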


2.3 Validity

The criterion validity for CBM-based measures in elementary math has been established by Stevens & Leigh (2012) and VanDerHeyden & Burns (2008). These studies showed math fact fluency was predictive of general math achievement on the Oklahoma Core Curriculum test and the Stanford Achievement Test, respectively.

2.4 Reliability

Several researchers have confirmed the reliability of CBM for math fluency.

Table 4: Previous Research on CBM Reliability for Math Fluency

Metric                                    Scoring Method                                              Source                    Value
Inter-scorer Agreement                    Correct Digits per Minute                                   (Burns et al. 2006)       0.96+
Inter-scorer Agreement                    Correct Digits per Minute                                   (Hintze et al. 2002)      0.955
Inter-scorer Agreement                    Correct Digits per Minute − Incorrect Digits per Minute     (Stevens & Leigh 2012)    0.99+
Delayed Alternate-form Reliability        Correct Digits per Minute                                   (Burns et al. 2006)       0.84
Absolute Generalizability                 Correct Digits per Minute                                   (Hintze et al. 2002)      0.75
Relative Generalizability                 Correct Digits per Minute                                   (Hintze et al. 2002)      0.95
Test-Retest Alternate Form Reliability    Correct Digits per Minute − Incorrect Digits per Minute     (Stevens & Leigh 2012)    0.87

Our study gave 3 separate fact probes on the same day, allowing us to measure internal consistency of the raw fluency score (correct digits minus incorrect digits) using Cronbach's α. The α values across the six tests are given in Table 5.

Table 5: Internal Consistency of Raw Fluency Score

                Addition/Subtraction    Multiplication/Division
Pretest         0.95                    0.94
Interim Test    0.96                    0.94
Posttest        0.97                    0.95
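For readers unfamiliar with Cronbach's α, the following Python sketch shows the standard formula applied to three probes treated as items. The scores are invented for illustration and the function name is hypothetical; the report does not state how α was computed in practice.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: list of k lists, one per probe, each holding one raw
    fluency score per student (same student order in every list).
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical raw fluency scores for five students on three probes.
probes = [
    [12, 20, 31, 8, 25],
    [14, 19, 29, 10, 27],
    [11, 22, 33, 9, 24],
]
alpha = cronbach_alpha(probes)
print(round(alpha, 3))
```

Values near 1, like those in Table 5, indicate that the three probes rank students very consistently.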

We also calculated delayed alternate-form reliability of the final fluency score across each grade × condition cohort and found an average value of 0.71.

Table 6: Delayed Alternate-Form Reliability (14 weeks)

                Addition/Subtraction    Multiplication/Division
Intervention    0.77                    0.47
Comparison      0.72                    0.89

The relatively poor value for the 3rd grade intervention group may be due to large variation in dosage. The standard deviation in weekly usage across the 3rd grade intervention group was 1.24 days/week, nearly twice that of the 2nd grade intervention group, where the standard deviation was 0.65 days/week.

When dosage was added to the model predicting posttest score from pretest score, the agreement between the two intervention groups improved considerably. The coefficients of multiple correlation were R = 0.81 and R = 0.77 for the 2nd and 3rd grade intervention groups, respectively.

2.5 Analytic Approach

Since randomized assignment occurred at the class level, we used an HLM modeling approach to account for cluster effects when analyzing the relationship between condition and posttest fluency. The model has two levels: grade and condition are level-2 variables, and all other covariates are level-1 variables. We used grand-mean-centered values for the lower-level variables and a maximum-likelihood method for determining the random effects. If the search for a model did not converge using maximum likelihood, restricted maximum likelihood was used instead.

Models were constructed using R's lmer function, part of the lme4 library, using the methodology for two-tier HLM models documented in a technical report from the Department of Statistics and Data Sciences, The University of Texas at Austin (UTA 2015), which showed the similarity in results to those given by SPSS, SAS, Mplus, and HLM.

We formed 3 models of decreasing complexity and calculated an effect size and statistical significance based on each.

The first model uses the same structure as that used in the original version of this report. In this model, all dichotomous and numeric covariates were used (i.e., all covariates other than race, which was not binary), including the pretest accuracy and pretest speed. This model is the most inclusive and allows for continuity between the original version of this report and the current version. It is denoted as the Full Model.

For the data available at the time of the original report, the pretest speed and pretest accuracy were both highly significant (p < 0.001). But after removing students who did not respect the time limits on the pretest or were designated as being below grade level before the study began, these additional pretest features were no longer statistically significant. A nested-model chi-squared test comparing change in deviance to change in degrees of freedom did not show a statistically significant improvement upon adding either of these terms. Thus, we generated a new model lacking these two pretest features but retaining all the demographic covariates of the original. This model is denoted in the sequel as the Demographic Model.

In an effort to simplify the model further, we assessed the relevance of each of the demographic variables by generating an HLM with the following characteristics:

• No Level-2 variables

• Two Level-1 variables: the covariate in question and pretest score

• Group-mean-centered values

• Data scaled to be univariate

This method was selected for determining the relevance of a given level-1 factor based on the presentation of Woltman, Feldstain, MacKay & Rocchi (2012). The results are shown in Table 7. Note that this was the only analysis using group-mean-centered data. The models used for determining intervention effect and statistical significance used grand-mean-centered level-1 variables.
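The distinction between the two centering schemes can be illustrated with a small Python sketch. The ages and class labels below are invented, and the function names are hypothetical; the point is only that grand-mean centering subtracts one overall mean, whereas group-mean centering subtracts each class's own mean.

```python
from collections import defaultdict
from statistics import mean


def grand_mean_center(values):
    """Subtract the overall mean from every value."""
    overall = mean(values)
    return [v - overall for v in values]


def group_mean_center(values, groups):
    """Subtract from each value the mean of its own group (class)."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    return [v - group_means[g] for v, g in zip(values, groups)]


# Hypothetical ages for six students in two classes.
ages = [8.0, 8.5, 9.0, 7.5, 8.0, 8.5]
classes = ["A", "A", "A", "B", "B", "B"]

grand = grand_mean_center(ages)
grouped = group_mean_center(ages, classes)
print(grand)
print(grouped)
```

Under group-mean centering, between-class differences in a covariate are removed from the level-1 variable, which is why it suits the covariate-screening step but differs from the centering used in the main models.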

Given their very low coefficients and t-scores, we removed gender and ESE. Upon forming the full HLM using the remaining covariates, it was found that LEP had very little impact (coefficient = 0.03) and low significance (t = 0.24), so it was dropped as well. In the resulting model, all covariates had t-scores greater than 0.9 in magnitude and standardized coefficients greater than 0.1. There was a nearly statistically significant interaction (t = 1.94) between condition and whether the student was on free or reduced lunch.

Table 7: Impact and significance of demographic covariates

Covariate    Coefficient    t-score
age          0.028          0.506
gender       -0.005         -0.096
LEP          0.042          0.612
Lunch        0.080          1.249
ESE          -0.006         -0.100

This final model is denoted as the Reduced Model.

All three models are provided in the Appendix.

Effect sizes were calculated from the coefficient for the intervention effect from each HLM model and the pooled within-group standard deviation of unadjusted posttest scores.

Statistical significance was determined based on the t-score of the multilevel model.

2.6 Statistical Adjustments

We used all demographic information provided except race, which was non-binary and correlated significantly with other demographic information (R between 0.36 and 0.46 for the three most prevalent races in our sample).

Grade was coded as grade3, a variable equal to 1 if the student was in grade 3 and 0 otherwise.

Age was measured in years as of the pretest administration.

Gender was coded as a variable, male, equal to 1 if the student was male and 0 if the student was female.

Low English proficiency was coded as a variable, LEP, equal to 1 if the school indicated the student had low English proficiency.

Exceptional Student Status was determined based on the school's designation of the student as being within an Exceptional Student Education program. It was coded as a variable, ESE, equal to 1 if the school specified the student as belonging to an ESE program. The state of Florida specifies several ESE programs, one of which is a program for gifted students. For our study, it appears this program furnished the large majority of ESE designations, as 29 of the 36 students designated as ESE were concentrated in two high-achieving classes. In grade 2, every ESE-designated student was in a single class.

Eligibility for free or reduced lunch was coded as a variable, lunch, equal to 1 if the student was eligible.

As described in the Fluency Score Calculation subsection, fluency was evaluated based on a research-supported combination of speed and accuracy, normalized to reduce skewness via a Box-Cox transformation. This means that a student's fluency score depends on personal characteristics such as confidence, sense of urgency on a pen-and-paper test, and attention to accuracy, so students differ markedly in potential for improvement.

Pretest accuracy is the ratio of correct digits to the sum of correct and incorrect digits.

Pretest score is defined as $\sqrt{C - I + 2}$, where C is digits correct per minute and I is digits incorrect per minute.

Pretest speed is defined in a manner analogous to pretest score, as $\sqrt{C - 2}$, where C is digits correct per minute. In this expression, 2 is subtracted rather than added so that the expression is anchored at 1, conforming to best practices (Osborne 2005).
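The two additional pretest covariates can be written out as a short Python sketch. The function names and the example inputs are hypothetical; the formulas follow the definitions above, with accuracy as a proportion of digits and speed as the square root of digits correct per minute minus 2.

```python
import math


def pretest_accuracy(correct_digits, incorrect_digits):
    """Proportion of attempted digits that were correct."""
    return correct_digits / (correct_digits + incorrect_digits)


def pretest_speed(correct_per_min):
    """sqrt(C - 2); anchored so that C = 3 maps to a speed of 1."""
    return math.sqrt(correct_per_min - 2)


# Hypothetical student: 46 digits correct and 4 incorrect overall,
# averaging 18 digits correct per minute.
print(pretest_accuracy(46, 4), pretest_speed(18))
```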

All student-level covariates were scaled to be univariate and grand-mean centered for improved interpretability and model convergence.

Speed, score, and accuracy on the interim administration were considered during the regression process used to impute missing data, as described in the Missing Data section. These metrics are calculated exactly as for the pretest, using the same formula (i.e., the data were not re-anchored for the Box-Cox transformation).

An HLM model was used to calculate statistical significance for the entire sample, so no adjustment for cluster effects was necessary. We only analyzed one outcome for this study, so no adjustment was made for multiple outcomes.

27 Students Removed from Study

Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

Four of these ten (3 from intervention, 1 from comparison) were excluded because their teacher indicated they were sufficiently below grade level that they would not receive typical instruction in math fact fluency. This determination was provided on the day of the pretest.

Table 8: Descriptive Statistics of Control Variables

Control Variable        Mean   SD     Skew    Kurtosis
Grade3                  0.48   0.50   0.08    -2.03
Age                     8.42   0.57   0.03    -0.95
Male                    0.47   0.50   0.14    -2.01
LEP                     0.20   0.40   1.51    0.27
ESE                     0.26   0.44   1.13    -0.73
Lunch                   0.20   0.40   1.51    0.27
Pretest Accuracy        0.92   0.09   -2.63   8.23
Pretest Speed           4.29   1.18   0.18    0.98
Pretest Score           4.58   1.15   0.08    0.85
Interim Accuracy        0.94   0.06   -1.84   3.59
Interim Speed           5.07   1.20   0.55    0.42
Interim Fluency Score   5.31   1.18   0.46    0.39

Five of these ten (all from comparison) were excluded because they did not stop when time was called on the pretest. In three cases, these students had higher values on their pretest than on their posttest.

One grade 2 student from the intervention condition was noted as appearing frustrated and not working on the pretest. He had the fourth lowest fluency score of all 2nd-grade participants on the pretest and showed dramatic improvement by the interim assessment, on which he scored at the 33rd percentile within his grade. According to Reflex's internal initial testing, the student had pre-existing automaticity for 17.1% of the addition facts within 20 and had basic recall ability with 59.9%. This suggests his pretest score underestimated his actual ability, and he was removed from the analysis for fear of artificially inflating the impact of the intervention. Note that this student was absent from the final administration.

2.8 Missing Data

Eight students, 5 from the treatment group and 3 from the comparison group, were absent for the final administration. Seven of the students had taken the interim assessment. No values were imputed for the student who missed both the interim and the final assessment. For the seven who had attended the interim test, we imputed posttest values using a multilinear regression based on students in the same instructional level group, using the thresholds established by Burns et al. (2006).

Table 9: Categorization of Students

Fluency (dc/min)   Category              N
Less than 14       Frustration Level     29 (22%)
14-31              Instructional Level   81 (63%)
Greater than 31    Mastery Level         19 (15%)

All available data (demographic data, pretest data, and interim test data) were used to impute posttest scores using an OLS regression that retained only statistically significant regressors.
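The Table 9 thresholds that determine which group's regression is used for imputation can be sketched as follows. The function name and the sample rates are hypothetical; only the cut points (14 and 31 digits correct per minute) come from the table.

```python
def instructional_category(dc_per_min: float) -> str:
    """Classify a fluency rate (digits correct per minute) using the Table 9 cut points."""
    if dc_per_min < 14:
        return "Frustration Level"
    if dc_per_min <= 31:
        return "Instructional Level"
    return "Mastery Level"

# Hypothetical rates, including both boundary values.
for rate in (9.5, 14, 31, 40):
    print(rate, instructional_category(rate))
```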

2.8.1 Frustration Level

One of the seven students for whom posttest scores were imputed was in the frustration level. For that group, age (t = 2.6), pretest accuracy (t = -3.2), and interim fluency score (t = 6.2) were the statistically significant regressors.

2.8.2 Instructional Level

Six of the seven students for whom posttest scores were imputed were in the instructional level. Among students in that level, grade (t = 4.2), interim accuracy (t = -2.6), and interim fluency score (t = 9.5) were statistically significant.

2.9 Mastery Level

There were no students in the mastery level for whom imputation was necessary.

3 Study Data

Tables compose the large majority of this section. They are organized by table title and subsection title rather than by use of numbers.

The tables in this section report unscaled, uncentered values for ease of interpretability.

3.1 Pre-Intervention Data: All Pretest Takers

This section provides data on all students who took the pretest, including those who were formally removed from the analysis.

Outcome Data

Group          Units of Assignment   Units of Analysis   Fluency Score Mean   Standard Deviation
Comparison     4                     70                  4.44                 1.17
Intervention   4                     70                  4.49                 1.26

Background Data

Variable           Comparison Mean   Comparison SD   Intervention Mean   Intervention SD
Age                8.440             0.598           8.442               0.602
ESE                0.271             0.448           0.243               0.432
Male               0.500             0.504           0.400               0.493
Grade3             0.486             0.503           0.486               0.503
LEP                0.243             0.432           0.171               0.380
Lunch              0.229             0.423           0.186               0.392
Pretest accuracy   0.919             0.098           0.909               0.113
Pretest speed      4.145             1.186           4.220               1.257

3.2 Pre-Intervention Data: Baseline Sample

This section includes all students who were formally part of the analysis, including those who were absent for the posttest.

Outcome Data

Group          Units of Assignment   Units of Analysis   Fluency Score Mean   Standard Deviation
Comparison     4                     64                  4.57                 1.12
Intervention   4                     66                  4.60                 1.20

Background Data

Variable           Comparison Mean   Comparison SD   Intervention Mean   Intervention SD
Age                8.444             0.556           8.405               0.581
ESE                0.250             0.436           0.258               0.441
Grade3             0.484             0.504           0.470               0.503
LEP                0.250             0.436           0.152               0.361
Lunch              0.219             0.417           0.182               0.389
Male               0.531             0.503           0.394               0.492
Pretest accuracy   0.932             0.087           0.916               0.102
Pretest speed      4.268             1.147           4.324               1.213

3.3 Pre-intervention Data: Analytic Sample

Outcome Data: Analytic Sample

Group          Units of Assignment   Units of Analysis   Fact Fluency Mean   Standard Deviation
Comparison     4                     64                  4.573               1.121
Intervention   4                     65                  4.580               1.195

Background Data: Analytic Sample

Variable           Comparison Mean   Comparison SD   Intervention Mean   Intervention SD
Age                8.444             0.556           8.414               0.580
ESE                0.250             0.436           0.262               0.443
Grade3             0.484             0.504           0.477               0.503
LEP                0.250             0.436           0.154               0.364
Lunch              0.219             0.417           0.185               0.391
Male               0.531             0.503           0.400               0.494
Pretest accuracy   0.932             0.087           0.914               0.102
Pretest speed      4.268             1.147           4.307               1.214

Outcome Data: Analytic Sample with No Imputation

Group          Units of Assignment   Units of Analysis   Fact Fluency Mean   Standard Deviation
Comparison     4                     61                  4.557               1.143
Intervention   4                     61                  4.640               1.142

Background Data: Analytic Sample with No Imputation

Variable           Comparison Mean   Comparison SD   Intervention Mean   Intervention SD
Age                8.436             0.556           8.394               0.568
ESE                0.246             0.434           0.279               0.452
Grade3             0.492             0.504           0.459               0.502
LEP                0.262             0.444           0.148               0.358
Lunch              0.230             0.424           0.180               0.388
Male               0.541             0.502           0.410               0.496
Pretest accuracy   0.930             0.088           0.922               0.084
Pretest speed      4.254             1.170           4.360               1.177

3.4 Post-intervention Data and Findings

3.4.1 Analytic Sample

As grand-mean-centered values were used for all Level-1 covariates and grade was the only Level-2 covariate other than condition, adjusted means for each group were estimated from the constant term of the HLM model, the grade3 coefficient of the HLM model, the average value of the grade3 variable across all students, and (in the case of the intervention group) the treatment coefficient of the HLM model. Standard deviations are unadjusted.
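The adjusted-mean calculation described above can be sketched as follows, using the Full Model fixed effects reported in Appendix A. The function name is hypothetical, and the grade 3 share of 62 out of 129 students is an assumption inferred from the reported 48.1%.

```python
def adjusted_means(constant, grade3_coef, grade3_mean, treatment_coef):
    """Return (comparison, intervention) adjusted means from HLM fixed effects.

    All grand-mean-centered level-1 covariates are zero at their means,
    so only the constant, grade3, and treatment terms contribute.
    """
    comparison = constant + grade3_coef * grade3_mean
    intervention = comparison + treatment_coef
    return comparison, intervention

# Full Model: intercept 4.709, grade3 0.695, treatment 0.927.
# Assumed: 62 of the 129 analytic-sample students were in grade 3.
comparison, intervention = adjusted_means(4.709, 0.695, 62 / 129, 0.927)
print(round(comparison, 2), round(intervention, 2))  # 5.04 5.97
```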

Outcome Data and Statistical Significance: Analytic Sample

                    Comparison Group                  Intervention Group                Estimated Effect
Model               Students  adj. Mean  unadj. SD    Students  adj. Mean  unadj. SD    adj. Mean Difference  adj. t-score
Full Model          64        5.04       1.099        65        5.97       1.093        0.927***              5.753
Demographic Model   64        5.13       1.099        65        5.97       1.093        0.836***              4.966
Reduced Model       64        5.08       1.099        65        5.95       1.093        0.867***              4.343

Note: *p<0.1; **p<0.05; ***p<0.001

Effect size was calculated based on adjusted means, unadjusted pooled within-group standard deviations, and a correction $\omega = 1 - \frac{3}{4N - 9}$ for small sample size.

Estimation of Effect Size: Analytic Sample

Model               N     Adjusted Mean Difference   (unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges' g)
Full Model          129   0.927***                   1.096                             0.84
Demographic Model   129   0.836***                   1.096                             0.76
Reduced Model       129   0.867***                   1.096                             0.79

Note: *p<0.1; **p<0.05; ***p<0.001
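As a check on the arithmetic, the sketch below recomputes the pooled standard deviation and the adjusted Hedges' g for the Full Model from the group sizes, unadjusted standard deviations, and adjusted mean difference reported above. The function names are illustrative, and N is taken to be the total sample size, which is an assumption consistent with the reported values.

```python
import math

def pooled_sd(n1, sd1, n2, sd2):
    """Pooled within-group standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def hedges_g(mean_diff, n1, sd1, n2, sd2):
    """Mean difference over pooled SD, scaled by omega = 1 - 3 / (4N - 9)."""
    n_total = n1 + n2
    omega = 1 - 3 / (4 * n_total - 9)
    return omega * mean_diff / pooled_sd(n1, sd1, n2, sd2)

# Full Model, analytic sample: 64 comparison and 65 intervention students.
print(round(pooled_sd(64, 1.099, 65, 1.093), 3))        # 1.096
print(round(hedges_g(0.927, 64, 1.099, 65, 1.093), 2))  # 0.84
```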

3.4.2 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absent for the post assessment indicated that a complete-case analysis would substantially understate the effect of the intervention. The covariate-adjusted effect of the treatment on interim test scores was greater among students who missed the post test than among those who were present for all three tests. This is borne out in the results of an analysis limited to those students for whom no imputation occurred.

Values for adjusted means for this subgroup were calculated by recentering all Level-1 covariates and generating a new HLM with the same structure as for the full analytic sample, but using only those participants with no missing data.

Outcome Data and Statistical Significance: Analytic Sample with No Imputation

                    Comparison Group                  Intervention Group                Estimated Effect
Model               Students  adj. Mean  unadj. SD    Students  adj. Mean  unadj. SD    adj. Mean Difference  adj. t-score
Full Model          61        5.07       1.12         61        5.94       1.11         0.867***              5.255
Demographic Model   61        5.15       1.12         61        5.94       1.11         0.787***              4.669
Reduced Model       61        5.09       1.12         61        5.92       1.11         0.828***              4.249

Note: *p<0.1; **p<0.05; ***p<0.001

Estimation of Effect Size: Analytic Sample with No Imputation

Model               N     Adjusted Mean Difference   (unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges' g)
Full Model          122   0.867***                   1.113                             0.77
Demographic Model   122   0.787***                   1.113                             0.70
Reduced Model       122   0.828***                   1.113                             0.74

Note: *p<0.1; **p<0.05; ***p<0.001

3.5 Subpopulation Analyses

We analyzed sub-populations by grade. We also analyzed the sub-population of students not designated as exceptional students. Due to the smaller sample sizes, the Reduced Model was used for the analyses, except that grade was removed as a variable for subpopulations of constant grade.

Statistical Significance and Estimation of Effect Size

Group                      N     Adjusted Mean Difference   (unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges' g)   Adjusted t-score
Grade 2                    68    0.739**                    0.94                              0.78                           2.46
Grade 3                    63    0.877**                    1.05                              0.82                           2.47
Non-Exceptional Students   102   0.904***                   1.101                             0.89                           4.63

Note: *p<0.1; **p<0.05; ***p<0.001

4 Acknowledgment

We used R (Team 2013) for some of the analysis in this report, especially the lme4 package for fitting mixed models (Bates, Machler, Bolker & Walker 2015). Other libraries utilized were dplyr, tidyr, and magrittr (Wickham & Francois 2016, Bache & Wickham 2014, Wickham 2016).

This document was typeset using LaTeX and makes use of the harvard, booktabs, multirow, graphicx, and url packages.

The stargazer package was used to generate LaTeX for several of the tables (Hlavac 2013).

References

Bache, S. M. & Wickham, H. (2014), magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr

Bates, D., Machler, M., Bolker, B. & Walker, S. (2015), 'Fitting linear mixed-effects models using lme4', Journal of Statistical Software 67(1), 1-48.

Burns, M. K., VanDerHeyden, A. M. & Jiban, C. L. (2006), 'Assessing the instructional level for mathematics: A comparison of methods', School Psychology Review 35(3), 401.

Hintze, J. M., Christ, T. J. & Keller, L. A. (2002), 'The generalizability of cbm survey-level mathematics assessments: Just how many samples do we need?', School Psychology Review 31(4), 514.

Hlavac, M. (2013), 'stargazer: Latex code and ascii text for well-formatted regression and summary statistics tables'. http://CRAN.R-project.org/package=stargazer

Osborne, J. (2005), 'Notes on the use of data transformations', Practical Assessment, Research and Evaluation 9(1), 42-50.

Stevens, O. & Leigh, E. (2012), 'Mathematics curriculum based measurement to predict state test performance: A comparison of measures and methods', ProQuest LLC.

Team, R. C. (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

UTA (2015), Multilevel modeling tutorial using sas, stata, hlm, r, spss, and mplus, Technical report, The Department of Statistics and Data Sciences, The University of Texas at Austin. http://stat.utexas.edu/images/SSC/documents/SoftwareTutorials/MultilevelModeling.pdf

VanDerHeyden, A. M. & Burns, M. K. (2008), 'Examination of the utility of various measures of mathematics proficiency', Assessment for Effective Intervention.

Wickham, H. (2016), tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. https://CRAN.R-project.org/package=tidyr

Wickham, H. & Francois, R. (2016), dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr

Woltman, H., Feldstain, A., MacKay, J. C. & Rocchi, M. (2012), 'An introduction to hierarchical linear modeling', Tutorials in Quantitative Methods for Psychology 8(1), 52-69.

Appendix A Full Model

The table below describes the fixed-effects data of the full HLM for the analytic sample. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                   Coefficient   t-score
Intercept                4.709***      25.719
cage                     0.136         0.887
cLEP                     -0.015        -0.111
cLunch                   -0.122        -0.981
cprescore                -2.220        -0.932
cgender                  -0.148        -1.518
cESE                     0.054         0.272
cprespeed                2.471         1.177
cpreaccuracy             0.750         1.313
treatment                0.927***      5.753
grade3                   0.695***      3.335
cagey:treatment          0.076         0.469
cagey:grade3             -0.224        -1.134
cLEP:treatment           -0.021        -0.123
cLEP:grade3              -0.027        -0.156
cLunch:treatment         0.195         1.440
cLunch:grade3            0.174         1.269
cprescore:treatment      1.881         1.056
cprescore:grade3         2.097         0.885
cgender:treatment        0.096         0.836
cgender:grade3           0.180         1.564
cESE:treatment           -0.036        -0.204
cESE:grade3              -0.013        -0.073
cprespeed:treatment      -1.327        -0.849
cprespeed:grade3         -1.581        -0.753
cpreaccuracy:treatment   -0.712*       -1.657
cpreaccuracy:grade3      -0.608        -1.101

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B Demographic Model

The table below describes the fixed-effects data of the full HLM for the analytic sample but excludes the pretest features of speed and accuracy. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                Coefficient   t-score
Intercept             4.906***      27.157
cage                  0.230         1.554
cLEP                  0.042         0.315
cLunch                -0.128        -1.225
cprescore             0.756***      5.872
cgender               -0.124        -1.283
cESE                  0.172         0.855
treatment             0.836***      4.966
grade3                0.480**       2.235
cagey:treatment       -0.003        -0.022
cagey:grade3          -0.270        -1.378
cLEP:treatment        -0.059        -0.346
cLEP:grade3           -0.071        -0.408
cLunch:treatment      0.313**       2.631
cLunch:grade3         0.174         1.459
cprescore:treatment   0.003         0.020
cprescore:grade3      0.094         0.685
cgender:treatment     0.122         1.062
cgender:grade3        0.120         1.039
cESE:treatment        -0.131        -0.734
cESE:grade3           -0.117        -0.642

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix C Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sample retaining only age, free or reduced lunch eligibility, and pretest score as level-1 covariates. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

This model used Restricted Maximum Likelihood, as there were convergence problems when using maximum likelihood.

Factor                Coefficient   t-score
Intercept             4.827***      25.683
cage                  0.227         1.146
cLunch                -0.111        -0.952
cprescore             0.758***      5.488
treatment             0.867***      4.343
grade3                0.527**       2.219
cagey:treatment       -0.031        -0.154
cagey:grade3          -0.168        -0.692
cLunch:treatment      0.268*        1.936
cLunch:grade3         0.153         1.109
cprescore:treatment   -0.042        -0.286
cprescore:grade3      0.100         0.677

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B Sample Addition/Subtraction Probe

(Problems were printed vertically on the probe sheet. Each line below is one row of the sheet.)

Row 1:    9 + 8     5 + 9     18 - 10   9 - 6     8 + 3     13 - 5
Row 2:    10 - 3    2 + 7     3 + 8     10 + 1    12 - 10   9 - 1
Row 3:    5 + 0     8 + 4     3 + 6     19 - 9    7 - 1     16 - 10   3 + 0
Row 4:    3 - 1     7 + 5     15 - 5    0 + 9     4 + 3     14 - 5    7 - 5
Row 5:    9 + 9     20 - 10   11 - 3    4 - 4     9 + 0     6 - 1     1 + 10
Row 6:    9 + 10    12 - 3    12 - 9    2 + 8     5 - 0     9 - 4     5 + 0
Row 7:    4 + 4     14 - 9    7 - 0     11 - 8    7 + 0     4 - 1     6 + 5
Row 8:    2 + 3     12 - 5    14 - 5    4 - 4     10 + 10   1 + 0     7 + 2
Row 9:    13 - 6    10 + 10   3 + 6     9 - 6     17 - 7    10 + 10   3 + 6
Row 10:   4 + 9     10 + 2    10 + 10   3 - 0     5 + 3     5 - 5     10 - 10


• Two gifted/high-achieving classes participated in the study. They were both inadvertently randomly assigned to the intervention. It was our intention to split these through block randomization, but we only received the pertinent data after selection and, due to a misreading of the correspondence, failed to catch the error, so no re-assignment was done. The school may have rectified our error themselves.

• It is possible that one of the teachers simply did not want to use the intervention. Project personnel doing the training reported that she attended but "had to leave early on".

Given the above, we are analyzing our study as a QED where the intact groups are the 8 classes for whom we have pretest data, and the intervention group comprises those classes where any uptake occurred prior to posttest.

Teachers were provided the opportunity to indicate any students who were not prepared for fact fluency instruction. Four third-grade students were identified, 3 from the intervention group and 1 from the comparison group. These students' data were not considered as part of the study.

One of the teachers taught two classes, one within the intervention group and another in the comparison group. All other teachers taught a single class.

Group Descriptions

Table 2 provides a description of the demographic character of the groups as well as their pretest results. The fluency score on the pretest combines both speed and accuracy, as described in the Fluency Score Calculation subsection.

Table 2: Baseline Demographic Information

                               Full Sample   Comparison Group   Intervention Group
Sample Size                    129           64                 65
Grade 3 Students               48.1%         48.4%              47.7%
Hispanic                       54.2%         53.1%              55.3%
Asian                          17.1%         18.8%              15.4%
White                          22.5%         21.9%              23.1%
Black                          4.7%          6.2%               3.1%
Multiracial                    1.6%          0.0%               3%
Low English Proficiency        20.2%         25.0%              15.4%
Exceptional Student (Gifted)   25.6%         25%                26.1%
Free/Reduced Lunch             20.2%         21.9%              18.4%
Male                           46.5%         53.1%              40%
Age at pretest (years)         8.43          8.44               8.41
Pretest Accuracy               92.3%         93.2%              91.4%
Pretest Speed                  4.29          4.26               4.31
Pretest Score                  4.58          4.57               4.58

2.2 Outcome Measures

2.2.1 Outcomes

One outcome was measured in the study: math fact fluency, which is both a key component of general math achievement and has been shown to be predictive of students' performance on general math achievement tests (see Validity subsection below). Fluency was measured using timed probes:

• Grade 2 students were tested on facts with terms, minuends, and subtrahends from 0 to 10 inclusive (i.e., from 0 + 0 up to 10 + 10 and their associated subtraction facts).

• Grade 3 students were tested on facts with factors, divisors, and quotients from 0 to 10 inclusive (i.e., from 0 × 0 to 10 × 10 and their associated division facts).

These match the requirements in the Common Core State Standards, except that, owing to that document's idiosyncratic definition of "within X" (as in "addition within 20"), a literal reading of the work indicates that facts such as 20 - 17 and 91 ÷ 13 are considered within grade level. The Florida Math Standards do not provide a glossary, so it is unclear whether such facts would be in the scope of the wording of its standards.

2.2.2 Probes

Probes had a format similar to those in other Curriculum Based Measurement (CBM) studies (Hintze, Christ & Keller 2002, Burns, VanDerHeyden & Jiban 2006, Stevens & Leigh 2012), as described below.

Each probe was a single sheet of paper with 10 rows of vertically oriented problems. Probes given to grade 2 students contained addition and subtraction facts. Probes given to grade 3 students contained multiplication and division facts. The problems were printed in extra large type, so only 7 facts fit on each row. The first two rows contained only 6 facts to make room for a geometric shape placed in the upper right-hand corner to help students and monitors quickly identify which page the students were on. The problems were computer-generated with the constraint that the problems in a given row be as balanced as possible between the two operations. The facts were chosen randomly from the appropriate fact pool, with each having an identical selection likelihood.

An example is provided in the Appendix.
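The probe layout described above can be sketched as a small generator. This is not the study's actual generator: the function names are hypothetical, and sampling with replacement and the random tie-break for odd-length rows are assumptions.

```python
import random

def addition_subtraction_pool():
    """Facts a + b for 0 <= a, b <= 10 and their associated subtraction facts."""
    adds = [(a, "+", b) for a in range(11) for b in range(11)]
    subs = [(a + b, "-", b) for a in range(11) for b in range(11)]
    return adds, subs

def make_probe(rng, row_sizes=(6, 6, 7, 7, 7, 7, 7, 7, 7, 7)):
    """Build 10 rows, each as balanced as possible between the two operations."""
    adds, subs = addition_subtraction_pool()
    probe = []
    for size in row_sizes:
        n_add = size // 2
        # For an odd-length row, pick at random which operation gets the extra fact.
        if size % 2 and rng.random() < 0.5:
            n_add += 1
        # Uniform selection from each pool (with replacement, by assumption).
        row = [rng.choice(adds) for _ in range(n_add)]
        row += [rng.choice(subs) for _ in range(size - n_add)]
        rng.shuffle(row)
        probe.append(row)
    return probe

probe = make_probe(random.Random(0))
for row in probe:
    print("   ".join(f"{x} {op} {y}" for x, op, y in row))
```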

2.2.3 Administrations

Three administrations were given. A pretest administration was conducted on February 12th, 2016. An interim administration was conducted on April 14th, timed to occur before heavy preparation for end-of-year testing began. A final administration was conducted on May 24th. Students were told to answer the items in order and not to skip items. The administrator used a script and was witnessed by the classroom teacher, who used a checklist to confirm each of several key points of instruction. This form also provided space for indicating any unusual occurrences.

The first and second administrations each comprised 4 one-minute fact fluency probes. Students were instructed that the first probe was a warm-up in each case. The final administration did not have a warm-up probe. It contained 3 math fact probes.

Grade 2 students also took a multi-digit computation probe, but the results of that probe were not analyzed as part of this combined report, because third grade students did not take a multi-digit probe. Multi-digit multiplication/division is not a core topic for third grade students in Florida, and the distribution of scores on the multi-digit addition/subtraction probe was known to be fundamentally different from the distribution of scores on math fact probes, so there is no clear way to combine the two.

All students in a given grade took the same probes using the same administrative script, regardless of condition. The probes that were described as "warmup" tests were not counted in any analysis.

Five students, all in comparison classes, were noted by test administrators as working on their quizzes significantly beyond the called time limit. These students were not formally considered part of the study. Posttests were taken by these students. Three of the five students scored higher on their posttest than on their pretest.

2.2.4 Fluency Score Calculation

For each student, raw fluency scores were calculated as the average number of digits correct per min (dc/min) minus the number of digits incorrect per min (di/min), as this was the method found by Stevens & Leigh (2012) to have the greatest criterion validity.

Previous CBM researchers have combined grade 2 and grade 3 students (Burns et al. 2006), but to justify the pooling of their outcomes in a single analysis, we conducted an analysis of the distribution of raw pretest scores for each grade separately to show similarity of distribution.

Table 3: Raw Fluency Pretest Score Distributions by Grade

Measure                              Grade 2   Grade 3
Mean                                 20.26     20.27
Standard Deviation                   10.53     11.35
Median                               19        19
Kurtosis                             2.37      1.91
Skewness                             1.17      1.01
Range                                54.67     58.33
Optimal Box-Cox (anchored at 1) λ    0.50      0.56

A Kolmogorov-Smirnov test corroborated the premise that these two distributions were quite similar. It failed to reject homogeneity (critical D-stat was 0.233, calculated D-stat was 0.063, p-value = 0.99).

The distribution of these raw scores was significantly skewed and leptokurtic, as has been reported in similar studies (Burns et al. 2006), so we normalized them using a Box-Cox transformation to arrive at a final fluency score. Following the recommendation of Osborne (2005), we anchored the full distribution at a minimum value of 1 by adding 2 to all raw fluency scores. A search for an optimum λ returned 0.525, so we chose λ = 0.5 for simplicity of inversion. Thus the calculation for final score is $\sqrt{C - I + 2}$, where $C$ is the average digits correct per min and $I$ is the average digits incorrect per min. The resulting distribution of pretest scores was not significantly skewed (skew = 0.08, SES = 0.21) but was still slightly leptokurtic (kurtosis = 0.85, SEK = 0.42). D'Agostino-Pearson (p-value = 0.13) and Jarque-Bera tests (p-value = 0.13) failed to reject normality.
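To illustrate the anchoring step and why a plain square root can stand in for the λ = 0.5 Box-Cox transform, here is a minimal sketch. The helper names and the raw scores are hypothetical; the only facts taken from the text are the anchor value of 1, the shift of 2, and λ = 0.5.

```python
def anchor_at_one(raw_scores):
    """Shift scores so the minimum equals 1 (Box-Cox needs positive input)."""
    shift = 1 - min(raw_scores)
    return [x + shift for x in raw_scores], shift

def box_cox(x, lam):
    """Standard Box-Cox transform for lam != 0."""
    return (x**lam - 1) / lam

# Hypothetical raw scores (dc/min minus di/min) whose minimum is -1.
raw = [-1.0, 7.0, 14.0, 23.0]
anchored, shift = anchor_at_one(raw)
print(shift)  # 2.0

# With lam = 0.5, Box-Cox equals 2 * (sqrt(x) - 1): a linear rescaling of sqrt(x).
for x in anchored:
    assert abs(box_cox(x, 0.5) - 2 * (x**0.5 - 1)) < 1e-12
```

Because the two differ only by a linear rescaling, the shape of the distribution is the same either way, and the square root is easier to invert.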


2.3 Validity

The criterion validity for CBM-based measures in elementary math has been established by Stevens & Leigh (2012) and VanDerHeyden & Burns (2008). These studies showed math fact fluency was predictive of general math achievement on the Oklahoma Core Curriculum test and Stanford Achievement Test, respectively.

2.4 Reliability

Several researchers have confirmed the reliability of CBM for math fluency.

Table 4: Previous Research on CBM Reliability for Math Fluency

Metric                                   Scoring Method                                             Source                   Value
Inter-scorer Agreement                   Correct Digits per Minute                                  Burns et al. (2006)      0.96+
Inter-scorer Agreement                   Correct Digits per Minute                                  Hintze et al. (2002)     0.955
Inter-scorer Agreement                   Correct Digits per Minute - Incorrect Digits per Minute    Stevens & Leigh (2012)   0.99+
Delayed Alternate-form Reliability       Correct Digits per Minute                                  Burns et al. (2006)      0.84
Absolute Generalizability                Correct Digits per Minute                                  Hintze et al. (2002)     0.75
Relative Generalizability                Correct Digits per Minute                                  Hintze et al. (2002)     0.95
Test-Retest Alternate Form Reliability   Correct Digits per Minute - Incorrect Digits per Minute    Stevens & Leigh (2012)   0.87

Our study gave 3 separate fact probes on the same day, allowing us to measure internal consistency of raw fluency score (correct digits minus incorrect digits) using Cronbach's α. The α values across the six tests are described in Table 5.

Table 5: Internal Consistency of Raw Fluency Score

               Addition/Subtraction   Multiplication/Division
Pretest        0.95                   0.94
Interim Test   0.96                   0.94
Posttest       0.97                   0.95
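For readers unfamiliar with the statistic, Cronbach's α over three same-day probes can be computed as in the sketch below. The function name and the five students' scores are made up for illustration and are not study data.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one score list per probe, aligned by student."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical raw fluency scores for five students on three probes.
probe1 = [12, 18, 25, 9, 30]
probe2 = [14, 17, 27, 8, 28]
probe3 = [13, 20, 24, 10, 31]
print(round(cronbach_alpha([probe1, probe2, probe3]), 3))
```

Values near 1 indicate that the three probes rank students almost identically.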


We also calculated delayed alternate-form reliability of the final fluency score across each grade × condition cohort and found an average value of 0.71.

Table 6: Delayed Alternate-Form Reliability (14 weeks)

               Addition/Subtraction   Multiplication/Division
Intervention   0.77                   0.47
Comparison     0.72                   0.89

The relatively poor value for the 3rd grade intervention group may be due to large variation in dosage. The standard deviation in weekly usage across 3rd grade intervention groups was 1.24 days/week, nearly twice that of the 2nd grade intervention group, where the standard deviation was 0.65 days/week.

When dosage was added to the model predicting posttest score from pretest score, the agreement between the two intervention groups improved considerably. The coefficients of multiple correlation were R = 0.81 and R = 0.77 for the 2nd and 3rd grade intervention groups, respectively.

2.5 Analytic Approach

Since randomized assignment occurred at the class level, we used an HLM modeling approach to account for cluster effects when analyzing the relationship between condition and posttest fluency. The model has two levels: grade and condition are level-2 variables, and all other covariates are level-1 variables. We used grand-mean-centered values for the lower level variables and a maximum-likelihood method for determining the random effects. If the search for a model did not converge using maximum likelihood, restricted maximum likelihood was used instead.

Models were constructed using R's lmer function, part of the lme4 library, using the methodology for two-tier HLM models documented in a technical report from the Department of Statistics and Data Sciences, The University of Texas at Austin (UTA 2015), which showed the similarity in results to those given by SPSS, SAS, Mplus, and HLM.

We formed 3 models of decreasing complexity and calculated an effect size and statistical significance based on each.

The first model uses the same structure as that used in the original version of this report. In this model, all dichotomous and numeric covariates were used (i.e., all covariates other than race, which was polynominal), including the pretest accuracy and pretest speed. This model is most inclusive and allows for continuity between the original version of this report and the current version. It is denoted as the Full Model.

For the data available at the time of the original report, the pretest speed and pretest accuracy were both highly significant (p < 0.001). But after removing students who did not respect the time limits on the pretest or were designated as being below grade level before the study began, these additional pretest features were no longer statistically significant. A nested-model χ-squared test comparing change in deviance to change in degrees of freedom did not show a statistically significant improvement upon adding either of these terms. Thus we generated a new model lacking these two pretest features but retaining all the demographic covariates of the original. This model is denoted in the sequel as the Demographic Model.

In an effort to simplify the model further, we assessed the relevance of each of the demographic variables by generating an HLM with the following characteristics:

• No Level-2 variables

• Two Level-1 variables: the covariate in question and pretest score

• Group-mean-centered values

• Data scaled to be univariate

This method was selected for determining the relevance of a given level-1 factor based on Woltman, Feldstain, MacKay & Rocchi's (2012) presentation. The results are shown in Table 7. Note that this was the only analysis using group-mean-centered data. The models used for determining intervention effect and statistical significance used grand-mean-centered level-1 variables.

Given their very low coefficients and t-scores in Table 7, we removed gender and ESE. Upon forming the full HLM using the remaining covariates, it was found that LEP had very little impact (coefficient = 0.03) and low significance (t = 0.24), so it was dropped as well. In the resulting model, all covariates had t-scores greater than 0.9 in magnitude and standardized coefficients greater than 0.1. There was a nearly statistically significant interaction (t = 1.94) between condition and whether the student was on free or reduced lunch.

Table 7: Impact and significance of demographic covariates

Covariate   Coefficient   t-score
age         0.028         0.506
gender      -0.005        -0.096
LEP         0.042         0.612
Lunch       0.080         1.249
ESE         -0.006        -0.100

This final model is denoted as the Reduced Model.

All three models are provided in the Appendix.

Effect sizes were calculated from the coefficient for the intervention effect from each HLM model and the pooled within-group standard deviation of unadjusted post-test scores.

Statistical significance was determined based on the t-score of the multilevel model.

2.6 Statistical Adjustments

We used all demographic information provided except race, which was non-binary and correlated significantly with other demographic information (R between 0.36 and 0.46 for the three most prevalent races in our sample).

Grade was coded as grade3, a variable equal to 1 if the student was in grade 3 and 0 otherwise.

Age was measured in years as of the pretest administration.

Gender was coded as a variable male, equal to 1 if the student was male and 0 if the student was female.

Low English proficiency was coded as a variable LEP, equal to 1 if the school indicated the student had low English proficiency.

Exceptional Student status was determined based on the school's designation of the student as being within an Exceptional Student Education program. It was coded as a variable ESE, equal to 1 if the school specified the student as belonging to an ESE program. The state of Florida specifies several ESE programs, one of which is a program for gifted students. For our study, it appears this program furnished the large majority of ESE designations, as 29 of the 36 students designated as ESE were concentrated in two high-achieving classes. In grade 2, every ESE-designated student was in a single class.

Eligibility for free or reduced lunch was coded as a variable lunch, equal to 1 if the student was eligible.

As described in the Fluency Score Calculation subsection, fluency was evaluated based on a research-supported combination of speed and accuracy, normalized to reduce skewness via a Box-Cox transformation. This means that a student's fluency score depends on personal characteristics such as confidence, sense of urgency on a pen-and-paper test, and attention to accuracy, so students differ markedly in potential for improvement.

Pretest accuracy is the ratio of correct digits to the sum of correct and incorrect digits.

Pretest score is defined as $\sqrt{C - I + 2}$, where C is digits correct per minute and I is digits incorrect per minute.

Pretest speed is defined in a manner analogous to pretest score, $\sqrt{C - 2}$, where C is digits correct per minute. In this expression, 2 is subtracted rather than added so that the expression is anchored at 1, conforming to best practices (Osborne 2005).
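The three pretest features defined above can be computed as in the following sketch. The function names are invented for illustration, the input rates are hypothetical, and the reading of the speed formula as the square root of C − 2 is an interpretation of the extracted text rather than a confirmed formula.

```python
import math


def pretest_accuracy(correct_digits, incorrect_digits):
    """Ratio of correct digits to all digits written."""
    total = correct_digits + incorrect_digits
    if total == 0:
        raise ValueError("no digits were attempted")
    return correct_digits / total


def pretest_score(dc_per_min, di_per_min):
    """Square root of (digits correct - digits incorrect + 2), all per minute."""
    return math.sqrt(dc_per_min - di_per_min + 2)


def pretest_speed(dc_per_min):
    """Square root of (digits correct per minute - 2); assumed reading of the text."""
    return math.sqrt(dc_per_min - 2)


# Hypothetical student: 27 digits correct and 4 digits incorrect per minute.
accuracy = pretest_accuracy(27, 4)
score = pretest_score(27, 4)
speed = pretest_speed(27)
```

`math.sqrt` raises `ValueError` for a negative argument, so inputs outside the anchored range fail loudly instead of producing a silent bad value.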

All student-level covariates were scaled to be univariate and grand-mean centered for improved interpretability and model convergence.

Speed, score, and accuracy on the interim administration were considered during the regression process used to impute missing data, as described in the Missing Data section. These metrics are calculated exactly as for the pretest, using the same formula (i.e., the data were not re-anchored for the Box-Cox transformation).

An HLM was used to calculate statistical significance for the entire sample, so no adjustment for cluster effects was necessary. We analyzed only one outcome for this study, so no adjustment was made for multiple outcomes.

Table 8: Descriptive Statistics of Control Variables

Control Variable        Mean   SD     Skew    Kurtosis
Grade3                  0.48   0.50    0.08   -2.03
Age                     8.42   0.57    0.03   -0.95
Male                    0.47   0.50    0.14   -2.01
LEP                     0.20   0.40    1.51    0.27
ESE                     0.26   0.44    1.13   -0.73
Lunch                   0.20   0.40    1.51    0.27
Pretest Accuracy        0.92   0.09   -2.63    8.23
Pretest Speed           4.29   1.18    0.18    0.98
Pretest Score           4.58   1.15    0.08    0.85
Interim Accuracy        0.94   0.06   -1.84    3.59
Interim Speed           5.07   1.20    0.55    0.42
Interim Fluency Score   5.31   1.18    0.46    0.39

2.7 Students Removed from Study

Ten students, 4 from the intervention condition and 6 from the comparison condition, were excluded from the analysis. In all cases, the decision to exclude was based on information obtained on the day of the pretest.

Four of these ten (3 from intervention, 1 from comparison) were excluded because their teacher indicated they were sufficiently below grade level that they would not receive typical instruction in math fact fluency. This determination was provided on the day of the pretest.

Five of these ten (all from comparison) were excluded because they did not stop when time was called on the pretest. In three cases, these students had higher values on their pretest than on their posttest.

One grade 2 student from the intervention condition was noted as appearing frustrated and not working on the pretest. He had the fourth-lowest fluency score of all 2nd-grade participants on the pretest and showed dramatic improvement by the interim assessment, on which he scored at the 33rd percentile within his grade. According to Reflex's internal initial testing, the student had pre-existing automaticity for 17.1% of the addition facts within 20 and had basic recall ability with 59.9%. This suggests his pretest score underestimated his actual ability, and he was removed from the analysis for fear of artificially inflating the impact of the intervention. Note that this student was absent from the final administration.

2.8 Missing Data

Eight students, 5 from the treatment group and 3 from the comparison group, were absent for the final administration. Seven of these students had taken the interim assessment. No values were imputed for the student who missed both the interim and the final assessment. For the seven who had attended the interim test, we imputed posttest values using a multiple linear regression based on students in the same instructional-level group, using the thresholds established by Burns et al. (2006).

Table 9: Categorization of Students

Fluency (dc/min)   Category              N
Less than 14       Frustration Level     29 (22%)
14-31              Instructional Level   81 (63%)
Greater than 31    Mastery Level         19 (15%)

All available data (demographic data, pretest data, and interim test data) were used to impute posttest scores using an OLS regression that retained only statistically significant regressors.
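The grouping step that precedes imputation can be sketched as follows, using the thresholds listed in Table 9. The function name and the sample rates are hypothetical, and the sketch assumes the thresholds apply to digits correct per minute with both boundary values (14 and 31) falling in the instructional level.

```python
def instructional_category(dc_per_min):
    """Map digits correct per minute onto the Table 9 categories."""
    if dc_per_min < 14:
        return "Frustration Level"
    if dc_per_min <= 31:
        return "Instructional Level"
    return "Mastery Level"


# Hypothetical rates: group students so that a separate regression
# could be fitted within each category.
rates = {"s1": 9.5, "s2": 14.0, "s3": 22.3, "s4": 31.0, "s5": 40.7}
groups = {}
for student, rate in rates.items():
    groups.setdefault(instructional_category(rate), []).append(student)
```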

2.8.1 Frustration Level

One of the seven students for whom posttest scores were imputed was in the frustration level. For that group, age (t = 2.6), pretest accuracy (t = −3.2), and interim fluency score (t = 6.2) were the statistically significant regressors.

2.8.2 Instructional Level

Six of the seven students for whom posttest scores were imputed were in the instructional level. Among students in that level, grade (t = 4.2), interim accuracy (t = −2.6), and interim fluency score (t = 9.5) were statistically significant.

2.9 Mastery Level

There were no students in the mastery level for whom imputation was necessary.

3 Study Data

Tables compose the large majority of this section. They are organized by table title and subsection title rather than by use of numbers.

The tables in this section report unscaled, uncentered values for ease of interpretability.

3.1 Pre-Intervention Data: All Pretest Takers

This section provides data on all students who took the pretest, including those who were formally removed from the analysis.

Outcome Data (Measure: Fluency Score)

Group          Units of Assignment   Units of Analysis   Mean   Standard Deviation
Comparison     4                     70                  4.44   1.17
Intervention   4                     70                  4.49   1.26

Background Data

Variable           Comparison Mean   Comparison SD   Intervention Mean   Intervention SD
Age                8.440             0.598           8.442               0.602
ESE                0.271             0.448           0.243               0.432
Male               0.500             0.504           0.400               0.493
Grade3             0.486             0.503           0.486               0.503
LEP                0.243             0.432           0.171               0.380
Lunch              0.229             0.423           0.186               0.392
Pretest accuracy   0.919             0.098           0.909               0.113
Pretest speed      4.145             1.186           4.220               1.257

3.2 Pre-Intervention Data: Baseline Sample

This section includes all students who were formally part of the analysis, including those who were absent for the posttest.

Outcome Data (Measure: Fluency Score)

Group          Units of Assignment   Units of Analysis   Mean   Standard Deviation
Comparison     4                     64                  4.57   1.12
Intervention   4                     66                  4.60   1.20

Background Data

Variable           Comparison Mean   Comparison SD   Intervention Mean   Intervention SD
Age                8.444             0.556           8.405               0.581
ESE                0.250             0.436           0.258               0.441
Grade3             0.484             0.504           0.470               0.503
LEP                0.250             0.436           0.152               0.361
Lunch              0.219             0.417           0.182               0.389
Male               0.531             0.503           0.394               0.492
Pretest accuracy   0.932             0.087           0.916               0.102
Pretest speed      4.268             1.147           4.324               1.213

3.3 Pre-intervention Data: Analytic Sample

Outcome Data: Analytic Sample (Measure: Fact Fluency)

Group          Units of Assignment   Units of Analysis   Mean    Standard Deviation
Comparison     4                     64                  4.573   1.121
Intervention   4                     65                  4.580   1.195

Background Data: Analytic Sample

Variable           Comparison Mean   Comparison SD   Intervention Mean   Intervention SD
Age                8.444             0.556           8.414               0.580
ESE                0.250             0.436           0.262               0.443
Grade3             0.484             0.504           0.477               0.503
LEP                0.250             0.436           0.154               0.364
Lunch              0.219             0.417           0.185               0.391
Male               0.531             0.503           0.400               0.494
Pretest accuracy   0.932             0.087           0.914               0.102
Pretest speed      4.268             1.147           4.307               1.214

Outcome Data: Analytic Sample with No Imputation (Measure: Fact Fluency)

Group          Units of Assignment   Units of Analysis   Mean    Standard Deviation
Comparison     4                     61                  4.557   1.143
Intervention   4                     61                  4.640   1.142

Background Data: Analytic Sample with No Imputation

Variable           Comparison Mean   Comparison SD   Intervention Mean   Intervention SD
Age                8.436             0.556           8.394               0.568
ESE                0.246             0.434           0.279               0.452
Grade3             0.492             0.504           0.459               0.502
LEP                0.262             0.444           0.148               0.358
Lunch              0.230             0.424           0.180               0.388
Male               0.541             0.502           0.410               0.496
Pretest accuracy   0.930             0.088           0.922               0.084
Pretest speed      4.254             1.170           4.360               1.177

3.4 Post-intervention Data and Findings

3.4.1 Analytic Sample

Because grand-mean-centered values were used for all Level-1 covariates and grade was the only Level-2 covariate other than condition, adjusted means for each group were estimated from the Constant term of the HLM, the grade3 coefficient of the HLM, the average value of the grade3 variable across all students, and (in the case of the intervention group) the treatment coefficient of the HLM. Standard deviations are unadjusted.
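The adjusted-mean calculation described above can be sketched in a few lines. The inputs are read from the Full Model appendix (intercept and coefficients, with decimal points restored) and from the baseline table (share of grade 3 students), so they should be treated as a best reading of the extracted digits; the function name is invented for the example.

```python
def adjusted_means(intercept, grade3_coef, grade3_share, treatment_coef):
    """Evaluate the fixed effects at the grand mean of the level-1 covariates.

    Centered level-1 covariates are zero at the grand mean, so they and
    their interactions drop out of the calculation.
    """
    comparison = intercept + grade3_coef * grade3_share
    intervention = comparison + treatment_coef
    return comparison, intervention


# Full Model: intercept 4.709, grade3 0.695, treatment 0.927;
# 48.1% of the analytic sample was in grade 3.
comparison_mean, intervention_mean = adjusted_means(4.709, 0.695, 0.481, 0.927)
```

With these inputs the sketch gives approximately 5.04 and 5.97, matching the Full Model row of the outcome table. The other models may differ in the last digit because the published coefficients are rounded.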

Outcome Data and Statistical Significance: Analytic Sample

                    Comparison Group                  Intervention Group                Estimated Effect
Model               Students   Adj. Mean   Unadj. SD  Students   Adj. Mean   Unadj. SD  Adj. Mean Difference   Adj. t-score
Full Model          64         5.04        1.099      65         5.97        1.093      0.927***               5.753
Demographic Model   64         5.13        1.099      65         5.97        1.093      0.836***               4.966
Reduced Model       64         5.08        1.099      65         5.95        1.093      0.867***               4.343

Note: *p<0.1; **p<0.05; ***p<0.001

Effect size was calculated based on adjusted means, unadjusted pooled within-group standard deviations, and a correction $\omega = 1 - \frac{3}{4N - 9}$ for small sample size.

Estimation of Effect Size: Analytic Sample

Model               N     Adjusted Mean Difference   (Unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges' g)
Full Model          129   0.927***                   1.096                             0.84
Demographic Model   129   0.836***                   1.096                             0.76
Reduced Model       129   0.867***                   1.096                             0.79

Note: *p<0.1; **p<0.05; ***p<0.001
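The effect sizes in the table above can be reproduced from the group sizes, the unadjusted standard deviations, and the adjusted mean differences. The sketch below is an illustration with invented function names, using the correction 1 − 3/(4N − 9) stated in the text; the numeric inputs are read from the tables with decimal points restored.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled within-group standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def hedges_g(mean_difference, sd_pooled, n_total):
    """Standardized mean difference with the small-sample correction omega."""
    omega = 1 - 3 / (4 * n_total - 9)
    return omega * mean_difference / sd_pooled


# Comparison: 64 students, SD 1.099; intervention: 65 students, SD 1.093.
sd = pooled_sd(64, 1.099, 65, 1.093)
g_full = hedges_g(0.927, sd, 129)
g_demographic = hedges_g(0.836, sd, 129)
g_reduced = hedges_g(0.867, sd, 129)
```

Rounded to the table's precision, these give a pooled SD of 1.096 and effect sizes of 0.84, 0.76, and 0.79.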

3.4.2 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absent for the post assessment indicated that a complete-case analysis would substantially understate the effect of the intervention. The covariate-adjusted effect of the treatment on interim test scores was greater among students who missed the posttest than among those who were present for all three tests. This is borne out in the results of an analysis limited to those students for whom no imputation occurred.

Values for adjusted means for this subgroup were calculated by recentering all Level-1 covariates and generating a new HLM with the same structure as for the full analytic sample, but using only those participants with no missing data.

Outcome Data and Statistical Significance: Analytic Sample with No Imputation

                    Comparison Group                  Intervention Group                Estimated Effect
Model               Students   Adj. Mean   Unadj. SD  Students   Adj. Mean   Unadj. SD  Adj. Mean Difference   Adj. t-score
Full Model          61         5.07        1.12       61         5.94        1.11       0.867***               5.255
Demographic Model   61         5.15        1.12       61         5.94        1.11       0.787***               4.669
Reduced Model       61         5.09        1.12       61         5.92        1.11       0.828***               4.249

Note: *p<0.1; **p<0.05; ***p<0.001

Estimation of Effect Size: Analytic Sample with No Imputation

Model               N     Adjusted Mean Difference   (Unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges' g)
Full Model          122   0.867***                   1.113                             0.77
Demographic Model   122   0.787***                   1.113                             0.70
Reduced Model       122   0.828***                   1.113                             0.74

Note: *p<0.1; **p<0.05; ***p<0.001

3.5 Subpopulation Analyses

We analyzed subpopulations by grade. We also analyzed the subpopulation of students not designated as exceptional students. Due to the smaller sample sizes, the Reduced Model was used for the analyses, except that grade was removed as a variable for subpopulations of constant grade.

Statistical Significance and Estimation of Effect Size

Group                      N     Adjusted Mean Difference   (Unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges' g)   Adjusted t-score
Grade 2                    68    0.739**                    0.94                              0.78                           2.46
Grade 3                    63    0.877**                    1.05                              0.82                           2.47
Non-Exceptional Students   102   0.904***                   1.101                             0.89                           4.63

Note: *p<0.1; **p<0.05; ***p<0.001

4 Acknowledgment

We used R (R Core Team 2013) for some of the analysis in this report, especially the lme4 package for fitting mixed models (Bates, Mächler, Bolker & Walker 2015). Other libraries utilized were dplyr, tidyr, and magrittr (Wickham & Francois 2016, Bache & Wickham 2014, Wickham 2016).

This document was typeset using LaTeX and makes use of the harvard, booktabs, multirow, graphicx, and url packages.

The stargazer package was used to generate LaTeX for several of the tables (Hlavac 2013).

References

Bache, S. M. & Wickham, H. (2014), magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr

Bates, D., Mächler, M., Bolker, B. & Walker, S. (2015), 'Fitting linear mixed-effects models using lme4', Journal of Statistical Software 67(1), 1-48.

Burns, M. K., VanDerHeyden, A. M. & Jiban, C. L. (2006), 'Assessing the instructional level for mathematics: A comparison of methods', School Psychology Review 35(3), 401.

Hintze, J. M., Christ, T. J. & Keller, L. A. (2002), 'The generalizability of CBM survey-level mathematics assessments: Just how many samples do we need?', School Psychology Review 31(4), 514.

Hlavac, M. (2013), 'stargazer: LaTeX code and ASCII text for well-formatted regression and summary statistics tables'. http://CRAN.R-project.org/package=stargazer

Osborne, J. (2005), 'Notes on the use of data transformations', Practical Assessment, Research and Evaluation 9(1), 42-50.

Stevens, O. & Leigh, E. (2012), 'Mathematics curriculum based measurement to predict state test performance: A comparison of measures and methods', ProQuest LLC.

R Core Team (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

UTA (2015), Multilevel modeling tutorial using SAS, Stata, HLM, R, SPSS, and Mplus, Technical report, The Department of Statistics and Data Sciences, The University of Texas at Austin. http://stat.utexas.edu/images/SSC/documents/SoftwareTutorials/MultilevelModeling.pdf

VanDerHeyden, A. M. & Burns, M. K. (2008), 'Examination of the utility of various measures of mathematics proficiency', Assessment for Effective Intervention.

Wickham, H. (2016), tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. https://CRAN.R-project.org/package=tidyr

Wickham, H. & Francois, R. (2016), dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr

Woltman, H., Feldstain, A., MacKay, J. C. & Rocchi, M. (2012), 'An introduction to hierarchical linear modeling', Tutorials in Quantitative Methods for Psychology 8(1), 52-69.

Appendix A Full Model

The table below describes the fixed-effects data of the full HLM for the analytic sample. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                   Coefficient   t-score
Intercept                 4.709***     25.719
cage                      0.136         0.887
cLEP                     -0.015        -0.111
cLunch                   -0.122        -0.981
cprescore                -2.220        -0.932
cgender                  -0.148        -1.518
cESE                      0.054         0.272
cprespeed                 2.471         1.177
cpreaccuracy              0.750         1.313
treatment                 0.927***      5.753
grade3                    0.695***      3.335
cage:treatment            0.076         0.469
cage:grade3              -0.224        -1.134
cLEP:treatment           -0.021        -0.123
cLEP:grade3              -0.027        -0.156
cLunch:treatment          0.195         1.440
cLunch:grade3             0.174         1.269
cprescore:treatment       1.881         1.056
cprescore:grade3          2.097         0.885
cgender:treatment         0.096         0.836
cgender:grade3            0.180         1.564
cESE:treatment           -0.036        -0.204
cESE:grade3              -0.013        -0.073
cprespeed:treatment      -1.327        -0.849
cprespeed:grade3         -1.581        -0.753
cpreaccuracy:treatment   -0.712*       -1.657
cpreaccuracy:grade3      -0.608        -1.101

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B Demographic Model

The table below describes the fixed-effects data of the full HLM for the analytic sample but excludes the pretest features of speed and accuracy. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                Coefficient   t-score
Intercept              4.906***     27.157
cage                   0.230         1.554
cLEP                   0.042         0.315
cLunch                -0.128        -1.225
cprescore              0.756***      5.872
cgender               -0.124        -1.283
cESE                   0.172         0.855
treatment              0.836***      4.966
grade3                 0.480**       2.235
cage:treatment        -0.003        -0.022
cage:grade3           -0.270        -1.378
cLEP:treatment        -0.059        -0.346
cLEP:grade3           -0.071        -0.408
cLunch:treatment       0.313**       2.631
cLunch:grade3          0.174         1.459
cprescore:treatment    0.003         0.020
cprescore:grade3       0.094         0.685
cgender:treatment      0.122         1.062
cgender:grade3         0.120         1.039
cESE:treatment        -0.131        -0.734
cESE:grade3           -0.117        -0.642

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix C Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sample, retaining only age, lunch eligibility, and pretest score as level-1 covariates. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

This model used Restricted Maximum Likelihood, as there were convergence problems when using maximum likelihood.

Factor                Coefficient   t-score
Intercept              4.827***     25.683
cage                   0.227         1.146
cLunch                -0.111        -0.952
cprescore              0.758***      5.488
treatment              0.867***      4.343
grade3                 0.527**       2.219
cage:treatment        -0.031        -0.154
cage:grade3           -0.168        -0.692
cLunch:treatment       0.268*        1.936
cLunch:grade3          0.153         1.109
cprescore:treatment   -0.042        -0.286
cprescore:grade3       0.100         0.677

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B Sample Addition/Subtraction Probe

(Problems were printed vertically on the probe; each line below is one row of the sheet.)

Row 1:   9 + 8     5 + 9     18 − 10   9 − 6     8 + 3     13 − 5
Row 2:   10 − 3    2 + 7     3 + 8     10 + 1    12 − 10   9 − 1
Row 3:   5 + 0     8 + 4     3 + 6     19 − 9    7 − 1     16 − 10   3 + 0
Row 4:   3 − 1     7 + 5     15 − 5    0 + 9     4 + 3     14 − 5    7 − 5
Row 5:   9 + 9     20 − 10   11 − 3    4 − 4     9 + 0     6 − 1     1 + 10
Row 6:   9 + 10    12 − 3    12 − 9    2 + 8     5 − 0     9 − 4     5 + 0
Row 7:   4 + 4     14 − 9    7 − 0     11 − 8    7 + 0     4 − 1     6 + 5
Row 8:   2 + 3     12 − 5    14 − 5    4 − 4     10 + 10   1 + 0     7 + 2
Row 9:   13 − 6    10 + 10   3 + 6     9 − 6     17 − 7    10 + 10   3 + 6
Row 10:  4 + 9     10 + 2    10 + 10   3 − 0     5 + 3     5 − 5     10 − 10

  • Study Characteristics
    • Intervention Condition
    • Comparison Condition
    • Setting
    • Participants
  • Study Design and Analysis
    • Sample Formation
    • Outcome Measures
      • Outcomes
      • Probes
      • Administrations
      • Fluency Score Calculation
    • Validity
    • Reliability
    • Analytic Approach
    • Statistical Adjustments
    • Students Removed from Study
    • Missing Data
      • Frustration Level
      • Instructional Level
    • Mastery Level
  • Study Data
    • Pre-Intervention Data: All Pretest Takers
    • Pre-Intervention Data: Baseline Sample
    • Pre-intervention Data: Analytic Sample
    • Post-intervention Data and Findings
      • Analytic Sample
      • Analytic Sample with No Imputation
    • Subpopulation Analyses
  • Acknowledgment
  • Appendices
    • Appendix: Full Model
    • Appendix: Demographic Model
    • Appendix: Reduced Model

Table 2: Baseline Demographic Information

Measure                            Full Sample   Comparison Group   Intervention Group
Sample Size                        129           64                 65
% Grade 3 Students                 48.1          48.4               47.7
% Hispanic                         54.2          53.1               55.3
% Asian                            17.1          18.8               15.4
% White                            22.5          21.9               23.1
% Black                            4.7           6.2                3.1
% Multiracial                      1.6           0.0                3
% Low English Proficiency          20.2          25.0               15.4
% Exceptional Student (Gifted)     25.6          25                 26.1
% Free/Reduced Lunch               20.2          21.9               18.4
% Male                             46.5          53.1               40
age-at-pretest (years)             8.43          8.44               8.41
pre-test Accuracy (%)              92.3          93.2               91.4
pre-test Speed                     4.29          4.26               4.31
pre-test Score                     4.58          4.57               4.58

• Grade 2 students were tested on facts with terms, minuends, and subtrahends from 0 to 10 inclusive (i.e., from 0 + 0 up to 10 + 10 and their associated subtraction facts)

• Grade 3 students were tested on facts with factors, divisors, and quotients from 0 to 10 inclusive (i.e., from 0 × 0 to 10 × 10 and their associated division facts)

These match the requirements in the Common Core State Standards, except that, owing to that document's idiosyncratic definition of "within X" (as in "addition within 20"), a literal reading of the work indicates that facts such as 20 − 17 and 91 ÷ 13 are considered within grade level. The Florida Math Standards do not provide a glossary, so it is unclear whether such facts would be in the scope of the wording of its standards.

2.2.2 Probes

Probes had a format similar to those in other Curriculum Based Measurement (CBM) studies (Hintze, Christ & Keller 2002, Burns, VanDerHeyden & Jiban 2006, Stevens & Leigh 2012), as described below.

Each probe was a single sheet of paper with 10 rows of vertically oriented problems. Probes given to grade 2 students contained addition and subtraction facts. Probes given to grade 3 students contained multiplication and division facts. The problems were printed in extra-large type, so only 7 facts fit on each row. The first two rows contained only 6 facts, to make room for a geometric shape placed in the upper right-hand corner to help students and monitors quickly identify which page the students were on. The problems were computer-generated with the constraint that the problems in a given row be as balanced as possible between the two operations. The facts were chosen randomly from the appropriate fact pool, with each having an identical selection likelihood.

An example is provided in the Appendix.
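The probe layout described above can be sketched as a small generator for a grade 2 sheet. This is an illustration only, not the generator used in the study: the function names are invented, sampling with replacement is an assumption, and giving the extra fact in a 7-fact row to a randomly chosen operation is one possible reading of "as balanced as possible".

```python
import random


def addition_subtraction_pool(max_term=10):
    """Addition facts 0+0 .. 10+10 and their associated subtraction facts."""
    terms = range(max_term + 1)
    additions = [(a, "+", b) for a in terms for b in terms]
    subtractions = [(a + b, "-", b) for a in terms for b in terms]
    return additions, subtractions


def make_row(additions, subtractions, n_facts, rng):
    """Draw one row whose operation counts differ by at most one."""
    n_add = n_facts // 2
    if n_facts % 2 == 1 and rng.random() < 0.5:
        n_add += 1  # the odd fact goes to addition half of the time
    n_sub = n_facts - n_add
    row = [rng.choice(additions) for _ in range(n_add)]
    row += [rng.choice(subtractions) for _ in range(n_sub)]
    rng.shuffle(row)
    return row


def make_probe(seed=0):
    """Ten rows: the first two hold 6 facts, the remaining eight hold 7."""
    rng = random.Random(seed)
    additions, subtractions = addition_subtraction_pool()
    row_lengths = [6, 6] + [7] * 8
    return [make_row(additions, subtractions, n, rng) for n in row_lengths]


probe = make_probe(seed=1)
```

Each fact is a tuple such as `(18, "-", 10)`; every fact in a pool has the same chance of being drawn, in line with the uniform selection described in the text.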

2.2.3 Administrations

Three administrations were given. A pretest administration was conducted on February 12th, 2016. An interim administration was conducted on April 14th, timed to occur before heavy preparation for end-of-year testing began. A final administration was conducted on May 24th. Students were told to answer the items in order and not to skip items. The administrator used a script and was witnessed by the classroom teacher, who used a checklist to confirm each of several key points of instruction. This form also provided space for indicating any unusual occurrences.

The first and second administrations each comprised 4 one-minute fact fluency probes. Students were instructed that the first probe was a warm-up in each case. The final administration did not have a warm-up probe. It contained 3 math fact probes.

Grade 2 students also took a multi-digit computation probe, but the results of that probe were not analyzed as part of this combined report because third-grade students did not take a multi-digit probe. Multi-digit multiplication/division is not a core topic for third-grade students in Florida, and the distribution of scores on the multi-digit addition/subtraction probe was known to be fundamentally different from the distribution of scores on math fact probes, so there is no clear way to combine the two.

All students in a given grade took the same probes, using the same administrative script, regardless of condition. The probes that were described as "warm-up" tests were not counted in any analysis.

Five students, all in comparison classes, were noted by test administrators as working on their quizzes significantly beyond the called time limit. These students were not formally considered part of the study. Posttests were taken by these students. Three of the five students scored higher on their posttest than on their pretest.

2.2.4 Fluency Score Calculation

For each student, raw fluency scores were calculated as the average number of digits correct per min (dc/min) minus the number of digits incorrect per min (di/min), as this was the method found by Stevens & Leigh (2012) to have the greatest criterion validity.

Previous CBM researchers have combined grade 2 and grade 3 students (Burns et al. 2006), but to justify the pooling of their outcomes in a single analysis, we conducted an analysis of the distribution of raw pretest scores for each grade separately to show similarity of distribution.

Table 3: Raw Fluency Pretest Score Distributions by Grade

Measure                               Grade 2   Grade 3
Mean                                  20.26     20.27
Standard Deviation                    10.53     11.35
Median                                19        19
Kurtosis                              2.37      1.91
Skewness                              1.17      1.01
Range                                 54.67     58.33
Optimal Box-Cox (anchored at 1) λ     0.50      0.56

A Kolmogorov-Smirnov test corroborated the premise that these two distributions were quite similar. It failed to reject homogeneity (critical D-stat was 0.233, calculated D-stat was 0.063, p-value = 0.99).
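A two-sample Kolmogorov-Smirnov comparison of this kind can be sketched as below. The scores are hypothetical, the function names are invented for the example, and the critical value uses the common large-sample approximation with the constant 1.358 for a 5% level, which may differ from the procedure behind the figures reported above.

```python
import math
from bisect import bisect_right


def ks_two_sample_d(sample_a, sample_b):
    """Largest gap between the two empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical_value(n_a, n_b, c_alpha=1.358):
    """Asymptotic critical D; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * math.sqrt((n_a + n_b) / (n_a * n_b))


# Hypothetical raw fluency scores for two small groups.
grade2_scores = [12.0, 15.3, 19.0, 22.7, 30.0]
grade3_scores = [11.3, 16.0, 19.0, 24.3, 28.7]
d_stat = ks_two_sample_d(grade2_scores, grade3_scores)
same_distribution_plausible = d_stat < ks_critical_value(5, 5)
```

Homogeneity is not rejected when the observed D falls below the critical value, which is the pattern the report describes.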

The distribution of these raw scores was significantly skewed and leptokurtic, as has been reported in similar studies (Burns et al. 2006), so we normalized them using a Box-Cox transformation to arrive at a final fluency score. Following the recommendation of Osborne (2005), we anchored the full distribution at a minimum value of 1 by adding 2 to all raw fluency scores. A search for an optimum λ returned 0.525, so we chose λ = 0.5 for simplicity of inversion. Thus, the calculation for the final score is $\sqrt{C - I + 2}$, where C is the average digits correct per min and I is the average digits incorrect per min. The resulting distribution of pretest scores was not significantly skewed (skew = 0.08, SES = 0.21) but was still slightly leptokurtic (kurtosis = 0.85, SEK = 0.42). D'Agostino-Pearson (p-value = 0.13) and Jarque-Bera tests (p-value = 0.13) failed to reject normality.
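For reference, the link between the Box-Cox family and the square-root score can be written out. The general form below is the standard textbook definition of the one-parameter Box-Cox transform, not a formula quoted from the report, and x denotes the anchored raw score.

```latex
y(\lambda) =
\begin{cases}
  \dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[1ex]
  \ln x, & \lambda = 0
\end{cases}
\qquad\Longrightarrow\qquad
y(0.5) = 2\left(\sqrt{x} - 1\right), \quad x = C - I + 2 .
```

Because y(0.5) is an increasing linear function of √x, using √x directly leaves skewness and kurtosis unchanged, and the transformation is inverted simply by squaring and subtracting 2.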


2.3 Validity

The criterion validity of CBM-based measures in elementary math has been established by Stevens & Leigh (2012) and VanDerHeyden & Burns (2008). These studies showed that math fact fluency was predictive of general math achievement on the Oklahoma Core Curriculum test and the Stanford Achievement Test, respectively.

2.4 Reliability

Several researchers have confirmed the reliability of CBM for math fluency.

Table 4: Previous Research on CBM Reliability for Math Fluency

Metric                                   Scoring Method      Source                    Value
Inter-scorer Agreement                   CD/min              (Burns et al. 2006)       0.96+
Inter-scorer Agreement                   CD/min              (Hintze et al. 2002)      0.955
Inter-scorer Agreement                   CD/min − ID/min     (Stevens & Leigh 2012)    0.99+
Delayed Alternate-form Reliability       CD/min              (Burns et al. 2006)       0.84
Absolute Generalizability                CD/min              (Hintze et al. 2002)      0.75
Relative Generalizability                CD/min              (Hintze et al. 2002)      0.95
Test-Retest Alternate Form Reliability   CD/min − ID/min     (Stevens & Leigh 2012)    0.87

CD/min = Correct Digits per Minute; ID/min = Incorrect Digits per Minute.

Our study gave 3 separate fact probes on the same day, allowing us to measure internal consistency of the raw fluency score (correct digits minus incorrect digits) using Cronbach's α. The α values across the six tests are described in Table 5.

Table 5: Internal Consistency of Raw Fluency Score

               Addition/Subtraction   Multiplication/Division
Pretest        0.95                   0.94
Interim Test   0.96                   0.94
Posttest       0.97                   0.95
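Cronbach's α for three same-day probes can be computed as in the sketch below, which treats each probe as one "item" and uses sample variances. The function name and the scores are hypothetical; they are not data from the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-probe score lists aligned by student."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two probes are required")
    n_students = len(item_scores[0])
    if any(len(item) != n_students for item in item_scores):
        raise ValueError("every probe needs one score per student")
    totals = [sum(student) for student in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical raw fluency scores for five students on three probes.
probe_1 = [18, 25, 12, 30, 22]
probe_2 = [20, 24, 14, 29, 21]
probe_3 = [19, 27, 11, 31, 24]
alpha = cronbach_alpha([probe_1, probe_2, probe_3])
```

Values close to 1 mean the three probes rank students almost identically, which is what the figures in Table 5 indicate.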

We also calculated delayed alternate-form reliability of the final fluency score across each grade × condition cohort and found an average value of 0.71.

Table 6: Delayed Alternate-Form Reliability (14 weeks)

               Addition/Subtraction   Multiplication/Division
Intervention   0.77                   0.47
Comparison     0.72                   0.89

The relatively poor value for the 3rd grade intervention group may be due to large variation in dosage. The standard deviation in weekly usage across the 3rd grade intervention group was 1.24 days/week, nearly twice that of the 2nd grade intervention group, where the standard deviation was 0.65 days/week.

When dosage was added to the model predicting posttest score from pretest score, the agreement between the two intervention groups improved considerably. The coefficients of multiple correlation were R = 0.81 and R = 0.77 for the 2nd and 3rd grade intervention groups, respectively.

2.5 Analytic Approach

Since randomized assignment occurred at the class level, we used an HLM approach to account for cluster effects when analyzing the relationship between condition and posttest fluency. The model has two levels: grade and condition are level-2 variables, and all other covariates are level-1 variables. We used grand-mean-centered values for the lower-level variables and a maximum-likelihood method for determining the random effects. If the search for a model did not converge using maximum likelihood, restricted maximum likelihood was used instead.
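One way to write the two-level structure described above is given below, for student i in class j with centered level-1 covariates x_k. The notation is introduced here for illustration, and the choice of a random intercept with no random slopes is an assumption, since the report does not spell out the random-effects specification. The cross-level terms follow the interaction rows listed in the appendix tables.

```latex
\begin{aligned}
\text{Level 1:}\quad
  Y_{ij} &= \beta_{0j} + \sum_{k} \beta_{kj}\, x_{kij} + r_{ij} \\
\text{Level 2:}\quad
  \beta_{0j} &= \gamma_{00} + \gamma_{01}\,\mathrm{treatment}_j
              + \gamma_{02}\,\mathrm{grade3}_j + u_{0j} \\
  \beta_{kj} &= \gamma_{k0} + \gamma_{k1}\,\mathrm{treatment}_j
              + \gamma_{k2}\,\mathrm{grade3}_j
\end{aligned}
```

Under this reading, γ01 is the intervention effect reported as the treatment coefficient, and γk1 and γk2 correspond to the covariate-by-treatment and covariate-by-grade3 interaction rows.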

Models were constructed using R's lmer function, part of the lme4 library, using the methodology for two-level HLMs documented in a technical report from the Department of Statistics and Data Sciences, The University of Texas at Austin (UTA 2015), which showed the similarity in results to those given by SPSS, SAS, Mplus, and HLM.

We formed 3 models of decreasing complexity and calculated an effect size and statistical significance based on each.

The first model uses the same structure as that used in the original version of this report. In this model, all dichotomous and numeric covariates were used (i.e., all covariates other than race, which was polynominal), including pretest accuracy and pretest speed. This model is the most inclusive and allows for continuity between the original version of this report and the current version. It is denoted as the Full Model.

For the data available at the time of the original report the pretest speedand pretest accuracy were both highly significant (p lt 0001) But afterremoving students who did not respect the time limits on the pretest orwere designated as being below grade level before the study began theseadditional pretest features were no longer statistically significant A nestedmodel χ-squared test comparing change in deviance to change in degrees offreedom did not show a statistically significant improvement upon addingeither of these terms Thus we generated a new model lacking these twopretest features but retaining all the demographic covariates of the originalThis model is denoted in the sequel as the Demographic Model

In an effort to simplify the model further, we assessed the relevance of each of the demographic variables by generating an HLM with the following characteristics:

• No Level-2 variables

• Two Level-1 variables: the covariate in question and pretest score

• Group-mean-centered values

• Data scaled to unit variance

This method was selected for determining the relevance of a given level-1 factor based on the presentation of Woltman, Feldstain, MacKay & Rocchi (2012). Note that this was the only analysis using group-mean-centered data. The models used for determining intervention effect and statistical significance used grand-mean-centered level-1 variables.
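The difference between the two centering schemes mentioned above can be illustrated with a short Python sketch. The function names and the sample ages and class labels are hypothetical and serve only to show the arithmetic.

```python
from collections import defaultdict
from statistics import fmean


def grand_mean_center(values):
    """Subtract the mean of the whole sample from every value."""
    overall = fmean(values)
    return [v - overall for v in values]


def group_mean_center(values, groups):
    """Subtract from every value the mean of its own group (here, its class)."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    group_means = {g: fmean(vs) for g, vs in by_group.items()}
    return [v - group_means[g] for v, g in zip(values, groups)]


# Hypothetical ages for four students in two classes.
ages = [8.0, 8.5, 9.0, 9.5]
classes = ["A", "A", "B", "B"]
grand = grand_mean_center(ages)            # deviations from the sample mean 8.75
grouped = group_mean_center(ages, classes)  # deviations from class means 8.25 and 9.25
```

Grand-mean centering keeps differences between classes in the covariate, while group-mean centering removes them and leaves only within-class variation.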

The results of this analysis are shown in Table 7. Given their very low coefficients and t-scores, we removed gender and ESE. Upon forming the full HLM using the remaining covariates, it was found that LEP had very little impact (coefficient = 0.03) and low significance (t = 0.24), so it was dropped as well. In the resulting model, all covariates had t-scores greater than 0.9 in magnitude and standardized coefficients greater than 0.1. There was a nearly statistically significant interaction (t = 1.94) between condition and whether the student was on free or reduced lunch.

Table 7: Impact and significance of demographic covariates

Covariate   Coefficient   t-score
age          0.028         0.506
gender      -0.005        -0.096
LEP          0.042         0.612
Lunch        0.080         1.249
ESE         -0.006        -0.100

This final model is denoted as the Reduced Model.

All three models are provided in the Appendix.

Effect sizes were calculated from the coefficient for the intervention effect from each HLM and the pooled within-group standard deviation of unadjusted posttest scores.

Statistical significance was determined based on the t-score of the multilevel model.

2.6 Statistical Adjustments

We used all demographic information provided except race, which was non-binary and correlated significantly with other demographic information (R between 0.36 and 0.46 for the three most prevalent races in our sample).

Grade was coded as grade3, a variable equal to 1 if the student was in grade 3 and 0 otherwise.

Age was measured in years as of the pretest administration.

Gender was coded as a variable male, equal to 1 if the student was male and 0 if the student was female.

Low English proficiency was coded as a variable LEP, equal to 1 if the school indicated the student had low English proficiency.

Exceptional Student status was determined based on the school's designation of the student as being within an Exceptional Student Education program. It was coded as a variable ESE, equal to 1 if the school specified the student as belonging to an ESE program. The state of Florida specifies several ESE programs, one of which is a program for gifted students. For our study, it appears this program furnished the large majority of ESE designations, as 29 of the 36 students designated as ESE were concentrated in two high-achieving classes. In grade 2, every ESE-designated student was in a single class.

Eligibility for free or reduced lunch was coded as a variable lunch, equal to 1 if the student was eligible.

As described in the Fluency Score Calculation subsection, fluency was evaluated based on a research-supported combination of speed and accuracy, normalized to reduce skewness via a Box-Cox transformation. This means that a student's fluency score depends on personal characteristics such as confidence, sense of urgency on a pen-and-paper test, and attention to accuracy, so students differ markedly in potential for improvement.

Pretest accuracy is the ratio of correct digits to the sum of correct and incorrect digits.

Pretest score is defined as $\sqrt{C - I + 2}$, where C is digits correct per minute and I is digits incorrect per minute.

Pretest speed is defined in a manner analogous to pretest score, as $\sqrt{C - 2}$, where C is digits correct per minute. In this expression 2 is subtracted rather than added so that the expression is anchored at 1, conforming to best practices (Osborne 2005).
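The three pretest metrics defined above can be written out as a short Python sketch. The function and parameter names are illustrative and are not taken from the report's own code.

```python
import math


def accuracy(correct_digits: float, incorrect_digits: float) -> float:
    """Ratio of correct digits to all digits attempted."""
    return correct_digits / (correct_digits + incorrect_digits)


def fluency_score(correct_per_min: float, incorrect_per_min: float) -> float:
    """Square-root transform of (C - I + 2); the +2 anchors the raw scale at 1."""
    # math.sqrt raises ValueError if C - I + 2 is negative.
    return math.sqrt(correct_per_min - incorrect_per_min + 2)


def speed_score(correct_per_min: float) -> float:
    """Square-root transform of (C - 2); the -2 anchors the raw scale at 1."""
    return math.sqrt(correct_per_min - 2)
```

For instance, a student with 16 digits correct and 2 digits incorrect per minute has a fluency score of `fluency_score(16, 2)`, which is 4.0.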

All student-level covariates were scaled to unit variance and grand-mean centered for improved interpretability and model convergence.

Speed, score, and accuracy on the interim administration were considered during the regression process used to impute missing data, as described in the Missing Data section. These metrics are calculated exactly as for the pretest, using the same formula (i.e., the data were not re-anchored for the Box-Cox transformation).

An HLM was used to calculate statistical significance for the entire sample, so no adjustment for cluster effects was necessary. We analyzed only one outcome for this study, so no adjustment was made for multiple outcomes.

2.7 Students Removed from Study

Ten students, 4 from the intervention condition and 6 from the comparison condition, were excluded from the analysis. In all cases the decision to exclude was based on information obtained on the day of the pretest.

Four of these ten (3 from intervention, 1 from comparison) were excluded because their teacher indicated they were sufficiently below grade level that they would not receive typical instruction in math fact fluency. This determination was provided on the day of the pretest.

Five of these ten (all from comparison) were excluded because they did not stop when time was called on the pretest. In three cases these students had higher values on their pretest than on their posttest.

One grade 2 student from the intervention condition was noted as appearing frustrated and not working on the pretest. He had the fourth-lowest fluency score of all 2nd-grade participants on the pretest and showed dramatic improvement by the interim assessment, on which he scored at the 33rd percentile within his grade. According to Reflex's internal initial testing, the student had pre-existing automaticity for 17.1% of the addition facts within 20 and had basic recall ability with 59.9%. This suggests his pretest score underestimated his actual ability, and he was removed from the analysis for fear of artificially inflating the impact of the intervention. Note that this student was absent from the final administration.

Table 8: Descriptive Statistics of Control Variables

Control Variable        Mean   SD     Skew    Kurtosis
Grade3                  0.48   0.50    0.08   -2.03
Age                     8.42   0.57    0.03   -0.95
Male                    0.47   0.50    0.14   -2.01
LEP                     0.20   0.40    1.51    0.27
ESE                     0.26   0.44    1.13   -0.73
Lunch                   0.20   0.40    1.51    0.27
Pretest Accuracy        0.92   0.09   -2.63    8.23
Pretest Speed           4.29   1.18    0.18    0.98
Pretest Score           4.58   1.15    0.08    0.85
Interim Accuracy        0.94   0.06   -1.84    3.59
Interim Speed           5.07   1.20    0.55    0.42
Interim Fluency Score   5.31   1.18    0.46    0.39

2.8 Missing Data

Eight students, 5 from the treatment group and 3 from the comparison group, were absent for the final administration. Seven of the students had taken the interim assessment. No values were imputed for the student who missed both the interim and the final assessment. For the seven who had attended the interim test, we imputed posttest values using a multilinear regression based on students in the same instructional-level group, using the thresholds established by Burns et al. (2006).

Table 9: Categorization of Students

Fluency (dc/min)   Category              N
Less than 14       Frustration Level     29 (22%)
14-31              Instructional Level   81 (63%)
Greater than 31    Mastery Level         19 (15%)
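The thresholds in Table 9 amount to a simple lookup, sketched below in Python. The function name is hypothetical, and treating both 14 and 31 as belonging to the instructional level is an assumption based on the "14-31" row.

```python
def instructional_category(digits_correct_per_min: float) -> str:
    """Map a fluency value in digits correct per minute to its Table 9 category."""
    if digits_correct_per_min < 14:
        return "Frustration Level"
    if digits_correct_per_min <= 31:
        return "Instructional Level"
    return "Mastery Level"
```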

All available data (demographic data, pretest data, and interim test data) were used to impute posttest scores using an OLS regression that retained only statistically significant regressors.

2.8.1 Frustration Level

One of the seven students for whom posttest scores were imputed was in the frustration level. For that group, age (t = 2.6), pretest accuracy (t = -3.2), and interim fluency score (t = 6.2) were the statistically significant regressors.

2.8.2 Instructional Level

Six of the seven students for whom posttest scores were imputed were in the instructional level. Among students in that level, grade (t = 4.2), interim accuracy (t = -2.6), and interim fluency score (t = 9.5) were statistically significant.

2.9 Mastery Level

There were no students in the mastery level for whom imputation was necessary.

3 Study Data

Tables compose the large majority of this section. They are organized by table title and subsection title rather than by use of numbers.

The tables in this section report unscaled, uncentered values for ease of interpretability.

3.1 Pre-Intervention Data: All Pretest Takers

This section provides data on all students who took the pretest, including those that were formally removed from the analysis.

Outcome Data (Measure: Fluency Score)

Group          Units of Assignment   Units of Analysis   Mean   Standard Deviation
Comparison     4                     70                  4.44   1.17
Intervention   4                     70                  4.49   1.26

Background Data

Variable           Comparison Mean   Comparison SD   Intervention Mean   Intervention SD
Age                8.440             0.598           8.442               0.602
ESE                0.271             0.448           0.243               0.432
Male               0.500             0.504           0.400               0.493
Grade3             0.486             0.503           0.486               0.503
LEP                0.243             0.432           0.171               0.380
Lunch              0.229             0.423           0.186               0.392
Pretest accuracy   0.919             0.098           0.909               0.113
Pretest speed      4.145             1.186           4.220               1.257

3.2 Pre-Intervention Data: Baseline Sample

This section includes all students who were formally part of the analysis, including those who were absent for the posttest.

Outcome Data (Measure: Fluency Score)

Group          Units of Assignment   Units of Analysis   Mean   Standard Deviation
Comparison     4                     64                  4.57   1.12
Intervention   4                     66                  4.60   1.20

Background Data

Variable           Comparison Mean   Comparison SD   Intervention Mean   Intervention SD
Age                8.444             0.556           8.405               0.581
ESE                0.250             0.436           0.258               0.441
Grade3             0.484             0.504           0.470               0.503
LEP                0.250             0.436           0.152               0.361
Lunch              0.219             0.417           0.182               0.389
Male               0.531             0.503           0.394               0.492
Pretest accuracy   0.932             0.087           0.916               0.102
Pretest speed      4.268             1.147           4.324               1.213

3.3 Pre-intervention Data: Analytic Sample

Outcome Data: Analytic Sample (Measure: Fact Fluency)

Group          Units of Assignment   Units of Analysis   Mean    Standard Deviation
Comparison     4                     64                  4.573   1.121
Intervention   4                     65                  4.580   1.195

Background Data: Analytic Sample

Variable           Comparison Mean   Comparison SD   Intervention Mean   Intervention SD
Age                8.444             0.556           8.414               0.580
ESE                0.250             0.436           0.262               0.443
Grade3             0.484             0.504           0.477               0.503
LEP                0.250             0.436           0.154               0.364
Lunch              0.219             0.417           0.185               0.391
Male               0.531             0.503           0.400               0.494
Pretest accuracy   0.932             0.087           0.914               0.102
Pretest speed      4.268             1.147           4.307               1.214

Outcome Data: Analytic Sample with No Imputation (Measure: Fact Fluency)

Group          Units of Assignment   Units of Analysis   Mean    Standard Deviation
Comparison     4                     61                  4.557   1.143
Intervention   4                     61                  4.640   1.142

Background Data: Analytic Sample with No Imputation

Variable           Comparison Mean   Comparison SD   Intervention Mean   Intervention SD
Age                8.436             0.556           8.394               0.568
ESE                0.246             0.434           0.279               0.452
Grade3             0.492             0.504           0.459               0.502
LEP                0.262             0.444           0.148               0.358
Lunch              0.230             0.424           0.180               0.388
Male               0.541             0.502           0.410               0.496
Pretest accuracy   0.930             0.088           0.922               0.084
Pretest speed      4.254             1.170           4.360               1.177

3.4 Post-intervention Data and Findings

3.4.1 Analytic Sample

As grand-mean-centered values were used for all Level-1 covariates and grade was the only Level-2 covariate other than condition, adjusted means for each group were estimated from the Constant term of the HLM, the grade3 coefficient of the HLM, the average value of the grade3 variable across all students, and (in the case of the intervention group) the treatment coefficient of the HLM. Standard deviations are unadjusted.
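The estimate of adjusted means described above can be sketched in Python. The function name is illustrative. The example uses the Full Model coefficients from Appendix A (intercept 4.709, grade3 0.695, treatment 0.927); the grade-3 share of 62/129 is an assumption inferred from the analytic-sample background data, not a figure stated in the report.

```python
def adjusted_means(intercept: float, grade3_coef: float,
                   mean_grade3: float, treatment_coef: float):
    """Return (comparison, intervention) adjusted means from HLM fixed effects."""
    # With grand-mean-centered level-1 covariates, those terms vanish at the mean.
    comparison = intercept + grade3_coef * mean_grade3
    intervention = comparison + treatment_coef
    return comparison, intervention


# Assumed grade-3 share: roughly 31 of 64 comparison and 31 of 65 intervention students.
comparison, intervention = adjusted_means(4.709, 0.695, 62 / 129, 0.927)
```

Rounded to two decimals, these inputs give 5.04 for the comparison group and 5.97 for the intervention group.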

Outcome Data and Statistical Significance: Analytic Sample

                    Comparison Group                      Intervention Group                    Estimated Effect
Model               Students   Adj. Mean   Unadj. SD      Students   Adj. Mean   Unadj. SD      Adj. Mean Difference   Adj. t-score
Full Model          64         5.04        1.099          65         5.97        1.093          0.927***               5.753
Demographic Model   64         5.13        1.099          65         5.97        1.093          0.836***               4.966
Reduced Model       64         5.08        1.099          65         5.95        1.093          0.867***               4.343

Note: *p<0.1; **p<0.05; ***p<0.001

Effect size was calculated based on adjusted means, unadjusted pooled within-group standard deviations, and a correction $\omega = 1 - \frac{3}{4N - 9}$ for small sample size.

Estimation of Effect Size: Analytic Sample

Model               N     Adjusted Mean Difference   (Unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges' g)
Full Model          129   0.927***                   1.096                             0.84
Demographic Model   129   0.836***                   1.096                             0.76
Reduced Model       129   0.867***                   1.096                             0.79

Note: *p<0.1; **p<0.05; ***p<0.001
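The effect-size calculation can be checked with a few lines of Python. This is a sketch with illustrative function names; the inputs are the group sizes, unadjusted standard deviations, and adjusted mean differences reported for the analytic sample.

```python
import math


def pooled_sd(n1: int, sd1: float, n2: int, sd2: float) -> float:
    """Pooled within-group standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def hedges_g(adj_mean_difference: float, pooled: float, n_total: int) -> float:
    """Adjusted mean difference over pooled SD, times omega = 1 - 3 / (4N - 9)."""
    omega = 1 - 3 / (4 * n_total - 9)
    return omega * adj_mean_difference / pooled


sd = pooled_sd(64, 1.099, 65, 1.093)
g_full = hedges_g(0.927, 1.096, 129)
g_demographic = hedges_g(0.836, 1.096, 129)
g_reduced = hedges_g(0.867, 1.096, 129)
```

With these inputs the pooled SD rounds to 1.096 and the three effect sizes round to 0.84, 0.76, and 0.79, matching the reported values.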

3.4.2 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absent for the post assessment indicated that a complete-case analysis would substantially understate the effect of the intervention. The covariate-adjusted effect of the treatment on interim test scores was greater among students who missed the posttest than among those who were present for all three tests. This is borne out in the results of an analysis limited to those students for whom no imputation occurred.

Values for adjusted means for this subgroup were calculated by recentering all Level-1 covariates and generating a new HLM with the same structure as for the full analytic sample but using only those participants with no missing data.

Outcome Data and Statistical Significance: Analytic Sample with No Imputation

                    Comparison Group                      Intervention Group                    Estimated Effect
Model               Students   Adj. Mean   Unadj. SD      Students   Adj. Mean   Unadj. SD      Adj. Mean Difference   Adj. t-score
Full Model          61         5.07        1.12           61         5.94        1.11           0.867***               5.255
Demographic Model   61         5.15        1.12           61         5.94        1.11           0.787***               4.669
Reduced Model       61         5.09        1.12           61         5.92        1.11           0.828***               4.249

Note: *p<0.1; **p<0.05; ***p<0.001

Estimation of Effect Size: Analytic Sample with No Imputation

Model               N     Adjusted Mean Difference   (Unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges' g)
Full Model          122   0.867***                   1.113                             0.77
Demographic Model   122   0.787***                   1.113                             0.70
Reduced Model       122   0.828***                   1.113                             0.74

Note: *p<0.1; **p<0.05; ***p<0.001

3.5 Subpopulation Analyses

We analyzed subpopulations by grade. We also analyzed the subpopulation of students not designated as exceptional students. Due to the smaller sample sizes, the Reduced Model was used for the analyses, except that grade was removed as a variable for subpopulations of constant grade.

Statistical Significance and Estimation of Effect Size

Group                      N     Adjusted Mean Difference   (Unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges' g)   Adjusted t-score
Grade 2                    68    0.739**                    0.94                              0.78                           2.46
Grade 3                    63    0.877**                    1.05                              0.82                           2.47
Non-Exceptional Students   102   0.904***                   1.101                             0.89                           4.63

Note: *p<0.1; **p<0.05; ***p<0.001

4 Acknowledgment

We used R (R Core Team 2013) for some of the analysis in this report, especially the lme4 package for fitting mixed models (Bates, Mächler, Bolker & Walker 2015). Other libraries utilized were dplyr, tidyr, and magrittr (Wickham & Francois 2016; Bache & Wickham 2014; Wickham 2016).

This document was typeset using LaTeX and makes use of the harvard, booktabs, multirow, graphicx, and url packages.

The stargazer package was used to generate LaTeX for several of the tables (Hlavac 2013).

References

Bache, S. M. & Wickham, H. (2014), magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr

Bates, D., Mächler, M., Bolker, B. & Walker, S. (2015), 'Fitting linear mixed-effects models using lme4', Journal of Statistical Software 67(1), 1-48.

Burns, M. K., VanDerHeyden, A. M. & Jiban, C. L. (2006), 'Assessing the instructional level for mathematics: A comparison of methods', School Psychology Review 35(3), 401.

Hintze, J. M., Christ, T. J. & Keller, L. A. (2002), 'The generalizability of CBM survey-level mathematics assessments: Just how many samples do we need?', School Psychology Review 31(4), 514.

Hlavac, M. (2013), 'stargazer: LaTeX code and ASCII text for well-formatted regression and summary statistics tables'. http://CRAN.R-project.org/package=stargazer

Osborne, J. (2005), 'Notes on the use of data transformations', Practical Assessment, Research and Evaluation 9(1), 42-50.

R Core Team (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

Stevens, O. & Leigh, E. (2012), 'Mathematics curriculum based measurement to predict state test performance: A comparison of measures and methods', ProQuest LLC.

UTA (2015), Multilevel modeling tutorial using SAS, Stata, HLM, R, SPSS, and Mplus, Technical report, The Department of Statistics and Data Sciences, The University of Texas at Austin. http://stat.utexas.edu/images/SSC/documents/SoftwareTutorials/MultilevelModeling.pdf

VanDerHeyden, A. M. & Burns, M. K. (2008), 'Examination of the utility of various measures of mathematics proficiency', Assessment for Effective Intervention.

Wickham, H. (2016), tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. https://CRAN.R-project.org/package=tidyr

Wickham, H. & Francois, R. (2016), dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr

Woltman, H., Feldstain, A., MacKay, J. C. & Rocchi, M. (2012), 'An introduction to hierarchical linear modeling', Tutorials in Quantitative Methods for Psychology 8(1), 52-69.

Appendix A Full Model

The table below describes the fixed-effects data of the full HLM for the analytic sample. This model uses grand-mean-centered values for all level-1 variables, scaled to unit variance. These variables are prefixed with "c" to indicate this.

Factor                   Coefficient   t-score
Intercept                 4.709***     25.719
cage                      0.136         0.887
cLEP                     -0.015        -0.111
cLunch                   -0.122        -0.981
cprescore                -2.220        -0.932
cgender                  -0.148        -1.518
cESE                      0.054         0.272
cprespeed                 2.471         1.177
cpreaccuracy              0.750         1.313
treatment                 0.927***      5.753
grade3                    0.695***      3.335
cagey:treatment           0.076         0.469
cagey:grade3             -0.224        -1.134
cLEP:treatment           -0.021        -0.123
cLEP:grade3              -0.027        -0.156
cLunch:treatment          0.195         1.440
cLunch:grade3             0.174         1.269
cprescore:treatment       1.881         1.056
cprescore:grade3          2.097         0.885
cgender:treatment         0.096         0.836
cgender:grade3            0.180         1.564
cESE:treatment           -0.036        -0.204
cESE:grade3              -0.013        -0.073
cprespeed:treatment      -1.327        -0.849
cprespeed:grade3         -1.581        -0.753
cpreaccuracy:treatment   -0.712*       -1.657
cpreaccuracy:grade3      -0.608        -1.101

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B Demographic Model

The table below describes the fixed-effects data of the full HLM for the analytic sample but excludes the pretest features of speed and accuracy. This model uses grand-mean-centered values for all level-1 variables, scaled to unit variance. These variables are prefixed with "c" to indicate this.

Factor                Coefficient   t-score
Intercept              4.906***     27.157
cage                   0.230         1.554
cLEP                   0.042         0.315
cLunch                -0.128        -1.225
cprescore              0.756***      5.872
cgender               -0.124        -1.283
cESE                   0.172         0.855
treatment              0.836***      4.966
grade3                 0.480**       2.235
cagey:treatment       -0.003        -0.022
cagey:grade3          -0.270        -1.378
cLEP:treatment        -0.059        -0.346
cLEP:grade3           -0.071        -0.408
cLunch:treatment       0.313**       2.631
cLunch:grade3          0.174         1.459
cprescore:treatment    0.003         0.020
cprescore:grade3       0.094         0.685
cgender:treatment      0.122         1.062
cgender:grade3         0.120         1.039
cESE:treatment        -0.131        -0.734
cESE:grade3           -0.117        -0.642

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix C Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sample, retaining only the age, lunch, and pretest score covariates at level 1. This model uses grand-mean-centered values for all level-1 variables, scaled to unit variance. These variables are prefixed with "c" to indicate this.

This model used restricted maximum likelihood, as there were convergence problems when using maximum likelihood.

Factor                Coefficient   t-score
Intercept              4.827***     25.683
cage                   0.227         1.146
cLunch                -0.111        -0.952
cprescore              0.758***      5.488
treatment              0.867***      4.343
grade3                 0.527**       2.219
cagey:treatment       -0.031        -0.154
cagey:grade3          -0.168        -0.692
cLunch:treatment       0.268*        1.936
cLunch:grade3          0.153         1.109
cprescore:treatment   -0.042        -0.286
cprescore:grade3       0.100         0.677

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B Sample Addition/Subtraction Probe

 9 + 8     5 + 9    18 - 10    9 - 6     8 + 3    13 - 5
10 - 3     2 + 7     3 + 8    10 + 1    12 - 10    9 - 1
 5 + 0     8 + 4     3 + 6    19 - 9     7 - 1    16 - 10    3 + 0
 3 - 1     7 + 5    15 - 5     0 + 9     4 + 3    14 - 5     7 - 5
 9 + 9    20 - 10   11 - 3     4 - 4     9 + 0     6 - 1     1 + 10
 9 + 10   12 - 3    12 - 9     2 + 8     5 - 0     9 - 4     5 + 0
 4 + 4    14 - 9     7 - 0    11 - 8     7 + 0     4 - 1     6 + 5
 2 + 3    12 - 5    14 - 5     4 - 4    10 + 10    1 + 0     7 + 2
13 - 6    10 + 10    3 + 6     9 - 6    17 - 7    10 + 10    3 + 6
 4 + 9    10 + 2    10 + 10    3 - 0     5 + 3     5 - 5    10 - 10


division facts. The problems were printed in extra-large type, so only 7 facts fit on each row. The first two rows contained only 6 facts to make room for a geometric shape placed in the upper right-hand corner to help students and monitors quickly identify which page the students were on. The problems were computer-generated with the constraint that the problems in a given row be as balanced as possible between the two operations. The facts were chosen randomly from the appropriate fact pool, with each having an identical selection likelihood.

An example is provided in the Appendix.

2.2.3 Administrations

Three administrations were given. A pretest administration was conducted on February 12th, 2016. An interim administration was conducted on April 14th, timed to occur before heavy preparation for end-of-year testing began. A final administration was conducted on May 24th. Students were told to answer the items in order and not to skip items. The administrator used a script and was witnessed by the classroom teacher, who used a checklist to confirm each of several key points of instruction. This form also provided space for indicating any unusual occurrences.

The first and second administrations each comprised 4 one-minute fact fluency probes. Students were instructed that the first probe was a warm-up in each case. The final administration did not have a warm-up probe. It contained 3 math fact probes.

Grade 2 students also took a multi-digit computation probe, but the results of that probe were not analyzed as part of this combined report because third grade students did not take a multi-digit probe. Multi-digit multiplication/division is not a core topic for third grade students in Florida, and the distribution of scores on the multi-digit addition/subtraction probe was known to be fundamentally different from the distribution of scores on math fact probes, so there is no clear way to combine the two.

All students in a given grade took the same probes using the same administrative script, regardless of condition. The probes that were described as "warm-up" tests were not counted in any analysis.

Five students, all in comparison classes, were noted by test administrators as working on their quizzes significantly beyond the called time limit. These students were not formally considered part of the study. Posttests were taken by these students. Three of the five students scored higher on their posttest than on their pretest.

2.2.4 Fluency Score Calculation

For each student, raw fluency scores were calculated as the average number of digits correct per min (dc/min) minus the number of digits incorrect per min (di/min), as this was the method found by Stevens & Leigh (2012) to have the greatest criterion validity.

Previous CBM researchers have combined grade 2 and grade 3 students (Burns et al. 2006), but to justify the pooling of their outcomes in a single analysis, we analyzed the distribution of raw pretest scores for each grade separately to show similarity of distribution.

Table 3: Raw Fluency Pretest Score Distributions by Grade

Measure                              Grade 2   Grade 3
Mean                                 20.26     20.27
Standard Deviation                   10.53     11.35
Median                               19        19
Kurtosis                             2.37      1.91
Skewness                             1.17      1.01
Range                                54.67     58.33
Optimal Box-Cox (anchored at 1) λ    0.50      0.56

A Kolmogorov-Smirnov test corroborated the premise that these two distributions were quite similar. It failed to reject homogeneity (critical D-stat was 0.233, calculated D-stat was 0.063, p-value = 0.99).
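A two-sample Kolmogorov-Smirnov comparison of this kind can be sketched in Python as below. The function names are illustrative, the critical value uses the standard large-sample approximation, and the group sizes in the usage note are hypothetical because the per-grade sizes used for this test are not stated in the report.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b) -> float:
    """Largest gap between the empirical CDFs of two samples (the D statistic)."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical_value(n_a: int, n_b: int, alpha: float = 0.05) -> float:
    """Approximate critical D for a two-sample test at significance level alpha."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n_a + n_b) / (n_a * n_b))
```

Homogeneity is not rejected when the computed D falls below the critical D. With two hypothetical groups of 68 students each, `ks_critical_value(68, 68)` is roughly 0.233.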

The distribution of these raw scores was significantly skewed and leptokurtic, as has been reported in similar studies (Burns et al. 2006), so we normalized them using a Box-Cox transformation to arrive at a final fluency score. Following the recommendation of Osborne (2005), we anchored the full distribution at a minimum value of 1 by adding 2 to all raw fluency scores. A search for an optimum λ returned 0.525, so we chose λ = 0.5 for simplicity of inversion. Thus the calculation for final score is $\sqrt{C - I + 2}$, where C is the average digits correct per min and I is the average digits incorrect per min. The resulting distribution of pretest scores was not significantly skewed (skew = 0.08, SES = 0.21) but was still slightly leptokurtic (kurtosis = 0.85, SEK = 0.42). D'Agostino-Pearson (p-value = 0.13) and Jarque-Bera tests (p-value = 0.13) failed to reject normality.
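The standard errors of skewness (SES) and kurtosis (SEK) quoted above depend only on the sample size. The Python sketch below uses the usual textbook formulas; the function names are illustrative, and n = 130 (the size of the baseline sample) is an assumption, since the report does not state which n was used.

```python
import math


def standard_error_of_skewness(n: int) -> float:
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def standard_error_of_kurtosis(n: int) -> float:
    """Standard error of sample excess kurtosis for a sample of size n."""
    ses = standard_error_of_skewness(n)
    return 2 * ses * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


ses = standard_error_of_skewness(130)  # assumed n
sek = standard_error_of_kurtosis(130)
```

Under that assumption the values round to 0.21 and 0.42. A skew of 0.08 is well under twice its standard error, while a kurtosis of 0.85 is about twice its standard error, which is consistent with the description of the distribution as slightly leptokurtic.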

2.3 Validity

The criterion validity for CBM-based measures in elementary math has been established by Stevens & Leigh (2012) and VanDerHeyden & Burns (2008). These studies showed math fact fluency was predictive of general math achievement on the Oklahoma Core Curriculum test and Stanford Achievement Test, respectively.

2.4 Reliability

Several researchers have confirmed the reliability of CBM for math fluency.

Table 4: Previous Research on CBM Reliability for Math Fluency

Metric                                   Scoring Method                                                 Source                  Value
Inter-scorer Agreement                   Correct Digits per Minute                                      Burns et al. 2006       0.96+
Inter-scorer Agreement                   Correct Digits per Minute                                      Hintze et al. 2002      0.955
Inter-scorer Agreement                   Correct Digits per Minute minus Incorrect Digits per Minute    Stevens & Leigh 2012    0.99+
Delayed Alternate-form Reliability       Correct Digits per Minute                                      Burns et al. 2006       0.84
Absolute Generalizability                Correct Digits per Minute                                      Hintze et al. 2002      0.75
Relative Generalizability                Correct Digits per Minute                                      Hintze et al. 2002      0.95
Test-Retest Alternate Form Reliability   Correct Digits per Minute minus Incorrect Digits per Minute    Stevens & Leigh 2012    0.87

Our study gave 3 separate fact probes on the same day, allowing us to measure internal consistency of raw fluency score (correct digits minus incorrect digits) using Cronbach's α. The α values across the six tests are described in Table 5.

Table 5: Internal Consistency of Raw Fluency Score

               Addition/Subtraction   Multiplication/Division
Pretest        0.95                   0.94
Interim Test   0.96                   0.94
Posttest       0.97                   0.95
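Cronbach's α for a set of same-day probes can be computed as in the Python sketch below. The function name and the four rows of scores are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(scores) -> float:
    """Cronbach's alpha; scores holds one row per student, one column per probe."""
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical raw fluency scores for four students on three probes.
example_scores = [
    [18, 20, 19],
    [25, 24, 27],
    [12, 14, 13],
    [30, 29, 31],
]
alpha = cronbach_alpha(example_scores)
```

Values near 1 indicate that the probes rank students almost identically, which is the pattern reported in Table 5.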

We also calculated delayed alternate-form reliability of the final fluency score across each grade × condition cohort and found an average value of 0.71.

Table 6: Delayed Alternate-Form Reliability (14 weeks)

               Addition/Subtraction   Multiplication/Division
Intervention   0.77                   0.47
Comparison     0.72                   0.89

The relatively poor value for the 3rd grade intervention group may be due to large variation in dosage. The standard deviation in weekly usage across the 3rd grade intervention group was 1.24 days/week, nearly twice that of the 2nd grade intervention group, where the standard deviation was 0.65 days/week.

When dosage was added to the model predicting posttest score from pretest score, the agreement between the two intervention groups improved considerably. The coefficients of multiple correlation were R = 0.81 and R = 0.77 for the 2nd and 3rd grade intervention groups, respectively.

Since randomized assignment occurred at the class level we used an HLMmodeling approach to account for cluster effects when analyzing the rela-tionship between condition and posttest fluency The model has two levelsmdashgrade and condition are level-2 variables and all other covariates are level-1variables We used grand-mean-centered values for the lower level variablesand a maximum-likelihood method for determining the random effects Ifthe search for a model did not converge using maximum likelihood restrictedmaximum likelihood was used instead

Models were constructed using Rrsquos lmer function part of the lme4 libraryusing the methodology for two-tier HLM models documented in a technicalreport from the Department of Statistics and Data Sciences The Universityof Texas at Austin (UTA 2015) which showed the similarity in results tothose given by SPSS SAS Mplus and HLM

We formed 3 models of decreasing complexity and calculated an effectsize and statistical significance based on each

11

The first model uses the same structure as that used in the original ver-sion of this report In this model all dichotomous and numeric covariateswere used (ie all covariates other than race which was polynominal) in-cluding the pretest accuracy and pretest speed This model is most inclusiveand allows for continuity between the original version of this report and thecurrent version It is denoted as the Full Model

For the data available at the time of the original report the pretest speedand pretest accuracy were both highly significant (p lt 0001) But afterremoving students who did not respect the time limits on the pretest orwere designated as being below grade level before the study began theseadditional pretest features were no longer statistically significant A nestedmodel χ-squared test comparing change in deviance to change in degrees offreedom did not show a statistically significant improvement upon addingeither of these terms Thus we generated a new model lacking these twopretest features but retaining all the demographic covariates of the originalThis model is denoted in the sequel as the Demographic Model

In an effort to simplify the model further, we assessed the relevance of each of the demographic variables by generating an HLM with the following characteristics:

• No Level-2 variables

• Two Level-1 variables: the covariate in question and pretest score

• Group-mean-centered values

• Data scaled to be univariate

This method was selected for determining the relevance of a given level-1 factor based on Woltman, Feldstain, MacKay & Rocchi's (2012) presentation. The results are shown in Table 7. Note that this was the only analysis using group-mean-centered data. The models used for determining intervention effect and statistical significance used grand-mean-centered level-1 variables.
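The difference between the two centering schemes can be illustrated with a minimal sketch. The scores and class labels below are hypothetical, and the function names are illustrative rather than taken from the analysis code.

```python
from statistics import mean


def grand_mean_center(values):
    """Subtract the overall mean from every value."""
    overall = mean(values)
    return [v - overall for v in values]


def group_mean_center(values, groups):
    """Subtract from each value the mean of its own group (e.g. its class)."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    return [v - group_means[g] for v, g in zip(values, groups)]


# Hypothetical pretest scores for students in two classes.
scores = [3.0, 5.0, 6.0, 10.0]
classes = ["A", "A", "B", "B"]
print(grand_mean_center(scores))           # [-3.0, -1.0, 0.0, 4.0]
print(group_mean_center(scores, classes))  # [-1.0, 1.0, -2.0, 2.0]
```

Group-mean centering removes between-class differences from the covariate, whereas grand-mean centering keeps them, which is why the two choices answer different questions.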

Given the very low coefficients and t-scores for gender and ESE in Table 7, we removed both. Upon forming the full HLM using the remaining covariates, it was found that LEP had very little impact (coefficient = 0.03) and low significance (t = 0.24), so it was dropped as well. In the resulting model, all covariates had t-scores greater than 0.9 in magnitude and standardized coefficients greater than 0.1. There was a nearly statistically significant interaction (t = 1.94) between condition and whether the student was on free or reduced lunch.

Table 7: Impact and significance of demographic covariates

Covariate   Coefficient   t-score
age          0.028         0.506
gender      -0.005        -0.096
LEP          0.042         0.612
Lunch        0.080         1.249
ESE         -0.006        -0.100

This final model is denoted as the Reduced Model.

All three models are provided in the Appendix.

Effect sizes were calculated from the coefficient for the intervention effect from each HLM model and the pooled within-group standard deviation of unadjusted posttest scores.

Statistical significance was determined based on the t-score of the multilevel model.

2.6 Statistical Adjustments

We used all demographic information provided except race, which was non-binary and correlated significantly with other demographic information (R between 0.36 and 0.46 for the three most prevalent races in our sample).

Grade was coded as grade3, a variable equal to 1 if the student was in grade 3 and 0 otherwise.

Age was measured in years as of the pretest administration.

Gender was coded as a variable male, equal to 1 if the student was male and 0 if the student was female.

Low English proficiency was coded as a variable LEP, equal to 1 if the school indicated the student had low English proficiency.

Exceptional Student Status was determined based on the school's designation of the student as being within an Exceptional Student Education program. It was coded as a variable ESE, equal to 1 if the school specified the student as belonging to an ESE program. The state of Florida specifies several ESE programs, one of which is a program for gifted students. For our study, it appears this program furnished the large majority of ESE designations, as 29 of the 36 students designated as ESE were concentrated in two high-achieving classes. In grade 2, every ESE-designated student was in a single class.

Eligibility for free or reduced lunch was coded as a variable lunch, equal to 1 if the student was eligible.

As described in the Fluency Score Calculation subsection, fluency was evaluated based on a research-supported combination of speed and accuracy, normalized to reduce skewness via a Box-Cox transformation. This means that a student's fluency score depends on personal characteristics such as confidence, sense of urgency on a pen-and-paper test, and attention to accuracy, so students differ markedly in potential for improvement.

Pretest accuracy is the ratio of correct digits to the sum of correct and incorrect digits.

Pretest score is defined as $\sqrt{C - I + 2}$, where $C$ is digits correct per minute and $I$ is digits incorrect per minute.

Pretest speed is defined in a manner analogous to pretest score, as $\sqrt{C - 2}$, where $C$ is digits correct per minute. In this expression, 2 is subtracted rather than added so that the expression is anchored at 1, conforming to best practices (Osborne 2005).
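The three pretest measures defined above can be computed as in the following minimal sketch. The function names and the example rates are hypothetical; only the formulas come from the definitions in this section.

```python
import math


def pretest_accuracy(correct_digits: float, incorrect_digits: float) -> float:
    """Ratio of correct digits to all digits written."""
    return correct_digits / (correct_digits + incorrect_digits)


def fluency_score(dc_per_min: float, di_per_min: float) -> float:
    """Square-root (lambda = 0.5) transform of C - I, anchored by adding 2."""
    return math.sqrt(dc_per_min - di_per_min + 2)


def speed_score(dc_per_min: float) -> float:
    """Square-root transform of C, anchored by subtracting 2."""
    return math.sqrt(dc_per_min - 2)


# Hypothetical student: 18 digits correct and 4 digits incorrect per minute.
print(pretest_accuracy(18, 4))  # 18 / 22
print(fluency_score(18, 4))     # sqrt(16) = 4.0
print(speed_score(18))          # sqrt(16) = 4.0
```

Both square-root expressions assume their argument is at least 1 after anchoring; `math.sqrt` raises a `ValueError` for a negative argument.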

All student-level covariates were scaled to be univariate and grand-mean-centered for improved interpretability and model convergence.

Speed, score, and accuracy on the interim administration were considered during the regression process used to impute missing data, as described in the Missing Data section. These metrics are calculated exactly as for the pretest, using the same formula (i.e., the data was not re-anchored for the Box-Cox transformation).

An HLM model was used to calculate statistical significance for the entire sample, so no adjustment for cluster effects was necessary. We only analyzed one outcome for this study, so no adjustment was made for multiple outcomes.

2.7 Students Removed from Study

Ten students, 4 from the intervention condition and 6 from the comparison condition, were excluded from the analysis. In all cases, the decision to exclude was based on information obtained on the day of the pretest.

Four of these ten (3 from intervention, 1 from comparison) were excluded because their teacher indicated they were sufficiently below grade level that they would not receive typical instruction in math fact fluency. This determination was provided on the day of the pretest.

Table 8: Descriptive Statistics of Control Variables

Control Variable        Mean   SD     Skew    Kurtosis
Grade3                  0.48   0.50    0.08   -2.03
Age                     8.42   0.57    0.03   -0.95
Male                    0.47   0.50    0.14   -2.01
LEP                     0.20   0.40    1.51    0.27
ESE                     0.26   0.44    1.13   -0.73
Lunch                   0.20   0.40    1.51    0.27
Pretest Accuracy        0.92   0.09   -2.63    8.23
Pretest Speed           4.29   1.18    0.18    0.98
Pretest Score           4.58   1.15    0.08    0.85
Interim Accuracy        0.94   0.06   -1.84    3.59
Interim Speed           5.07   1.20    0.55    0.42
Interim Fluency Score   5.31   1.18    0.46    0.39

Five of these ten (all from comparison) were excluded because they did not stop when time was called on the pretest. In three cases, these students had higher values on their pretest than on their posttest.

One grade 2 student from the Intervention condition was noted as appearing frustrated and not working on the pretest. He had the fourth lowest fluency score of all 2nd-grade participants on the pretest and showed dramatic improvement by the interim assessment, on which he scored at the 33rd percentile within his grade. According to Reflex's internal initial testing, the student had pre-existing automaticity for 17.1% of the addition facts within 20 and had basic recall ability with 59.9%. This suggests his pretest score underestimated his actual ability, and he was removed from the analysis for fear of artificially inflating the impact of the intervention. Note that this student was absent from the final administration.

2.8 Missing Data

Eight students, 5 from the treatment group and 3 from the comparison group, were absent for the final administration. Seven of the students had taken the interim assessment. No values were imputed for the student who missed both the interim and the final assessment. For the seven who had attended the interim test, we imputed posttest values using a multilinear regression based on students in the same instructional level group, using the thresholds established by Burns et al. (2006).

Table 9: Categorization of Students

Fluency (dc/min)   Category              N
Less than 14       Frustration Level     29 (22%)
14-31              Instructional Level   81 (63%)
Greater than 31    Mastery Level         19 (15%)
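The thresholds in Table 9 amount to a simple lookup, sketched below. The function name is hypothetical; the cut points are those shown in the table, attributed there to Burns et al. (2006).

```python
def instructional_category(dc_per_min: float) -> str:
    """Classify a student by digits correct per minute using the Table 9 cut points."""
    if dc_per_min < 14:
        return "Frustration Level"
    if dc_per_min <= 31:
        return "Instructional Level"
    return "Mastery Level"


# Hypothetical fluency values, one per category.
for rate in (9, 14, 31, 40):
    print(rate, instructional_category(rate))
```

Imputation models were then fit separately within each category, so a student's category determines which regression supplies the imputed posttest value.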

All available data (demographic data, pretest data, and interim test data) were used to impute posttest scores using an OLS regression that retained only statistically significant regressors.

2.8.1 Frustration Level

One of the seven students for whom posttest scores were imputed was in the frustration level. For that group, age (t = 2.6), pretest accuracy (t = -3.2), and interim fluency score (t = 6.2) were the statistically significant regressors.

2.8.2 Instructional Level

Six of the seven students for whom posttest scores were imputed were in the instructional level. Among students in that level, grade (t = 4.2), interim accuracy (t = -2.6), and interim fluency score (t = 9.5) were statistically significant.

2.8.3 Mastery Level

There were no students in the mastery level for whom imputation was necessary.

3 Study Data

Tables compose the large majority of this section. They are organized by table title and subsection title rather than by use of numbers.

The tables in this section report unscaled, uncentered values for ease of interpretability.

3.1 Pre-Intervention Data: All Pretest Takers

This section provides data on all students who took the pretest, including those that were formally removed from the analysis.

Outcome Data

Measure: Fluency Score

                Sample Sizes                              Sample Characteristics
Group           Unit of Assignment   Unit of Analysis     Mean    Standard Deviation
Comparison      4                    70                   4.44    1.17
Intervention    4                    70                   4.49    1.26

Background Data

                     Comparison          Intervention
Variable             Mean     SD         Mean     SD
Age                  8.440    0.598      8.442    0.602
ESE                  0.271    0.448      0.243    0.432
Male                 0.500    0.504      0.400    0.493
Grade3               0.486    0.503      0.486    0.503
LEP                  0.243    0.432      0.171    0.380
Lunch                0.229    0.423      0.186    0.392
Pretest accuracy     0.919    0.098      0.909    0.113
Pretest speed        4.145    1.186      4.220    1.257

3.2 Pre-Intervention Data: Baseline Sample

This section includes all students who were formally part of the analysis, including those who were absent for the posttest.

Outcome Data

Measure: Fluency Score

                Sample Sizes                              Sample Characteristics
Group           Unit of Assignment   Unit of Analysis     Mean    Standard Deviation
Comparison      4                    64                   4.57    1.12
Intervention    4                    66                   4.60    1.20

Background Data

                     Comparison          Intervention
Variable             Mean     SD         Mean     SD
Age                  8.444    0.556      8.405    0.581
ESE                  0.250    0.436      0.258    0.441
Grade3               0.484    0.504      0.470    0.503
LEP                  0.250    0.436      0.152    0.361
Lunch                0.219    0.417      0.182    0.389
Male                 0.531    0.503      0.394    0.492
Pretest accuracy     0.932    0.087      0.916    0.102
Pretest speed        4.268    1.147      4.324    1.213

3.3 Pre-intervention Data: Analytic Sample

Outcome Data: Analytic Sample

Measure: Fact Fluency

                Sample Sizes                              Sample Characteristics
Group           Unit of Assignment   Unit of Analysis     Mean     Standard Deviation
Comparison      4                    64                   4.573    1.121
Intervention    4                    65                   4.580    1.195

Background Data: Analytic Sample

                     Comparison          Intervention
Variable             Mean     SD         Mean     SD
Age                  8.444    0.556      8.414    0.580
ESE                  0.250    0.436      0.262    0.443
Grade3               0.484    0.504      0.477    0.503
LEP                  0.250    0.436      0.154    0.364
Lunch                0.219    0.417      0.185    0.391
Male                 0.531    0.503      0.400    0.494
Pretest accuracy     0.932    0.087      0.914    0.102
Pretest speed        4.268    1.147      4.307    1.214

Outcome Data: Analytic Sample with No Imputation

Measure: Fact Fluency

                Sample Sizes                              Sample Characteristics
Group           Unit of Assignment   Unit of Analysis     Mean     Standard Deviation
Comparison      4                    61                   4.557    1.143
Intervention    4                    61                   4.640    1.142

Background Data: Analytic Sample with No Imputation

                     Comparison          Intervention
Variable             Mean     SD         Mean     SD
Age                  8.436    0.556      8.394    0.568
ESE                  0.246    0.434      0.279    0.452
Grade3               0.492    0.504      0.459    0.502
LEP                  0.262    0.444      0.148    0.358
Lunch                0.230    0.424      0.180    0.388
Male                 0.541    0.502      0.410    0.496
Pretest accuracy     0.930    0.088      0.922    0.084
Pretest speed        4.254    1.170      4.360    1.177

3.4 Post-intervention Data and Findings

3.4.1 Analytic Sample

As grand-mean-centered values were used for all Level-1 covariates and grade was the only Level-2 covariate other than condition, adjusted means for each group were estimated from the Constant term of the HLM model, the grade3 coefficient of the HLM model, the average value of the grade3 variable across all students, and (in the case of the intervention group) the treatment coefficient of the HLM model. Standard deviations are unadjusted.
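The adjusted-mean calculation described above can be sketched as follows. The function name is hypothetical. The inputs are the Full Model intercept, grade3 coefficient, and treatment coefficient from Appendix A, and the grade3 mean of 0.48 is the rounded value from Table 8, assumed here to apply to the analytic sample.

```python
def adjusted_means(intercept: float, grade3_coef: float,
                   mean_grade3: float, treatment_coef: float):
    """Return (comparison, intervention) adjusted means.

    Centered level-1 covariates average to zero, so only the intercept,
    the grade3 term, and the treatment term contribute.
    """
    comparison = intercept + grade3_coef * mean_grade3
    intervention = comparison + treatment_coef
    return comparison, intervention


# Full Model coefficients (Appendix A) and the grade3 mean from Table 8.
comparison, intervention = adjusted_means(4.709, 0.695, 0.48, 0.927)
print(round(comparison, 2), round(intervention, 2))  # 5.04 5.97
```

These values agree with the Full Model row of the outcome table for the analytic sample.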

Outcome Data and Statistical Significance: Analytic Sample

                     Comparison Group                    Intervention Group                  Estimated Effect
Model                Students  adj. Mean  unadj. SD      Students  adj. Mean  unadj. SD      adj. Mean Difference  adj. t-score
Full Model           64        5.04       1.099          65        5.97       1.093          0.927***              5.753
Demographic Model    64        5.13       1.099          65        5.97       1.093          0.836***              4.966
Reduced Model        64        5.08       1.099          65        5.95       1.093          0.867***              4.343

Note: *p<0.1; **p<0.05; ***p<0.001

Effect size was calculated based on adjusted means, unadjusted pooled within-group standard deviations, and a correction $\omega = 1 - \frac{3}{4N - 9}$ for small sample size.

Estimation of Effect Size: Analytic Sample

Model                N      Adjusted Mean Difference   (unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges' g)
Full Model           129    0.927***                   1.096                             0.84
Demographic Model    129    0.836***                   1.096                             0.76
Reduced Model        129    0.867***                   1.096                             0.79

Note: *p<0.1; **p<0.05; ***p<0.001
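The effect sizes in the table above can be reproduced from the reported quantities with a short calculation. This is a sketch with hypothetical function names; the standard pooled-SD formula is assumed, and the inputs are the group sizes, unadjusted standard deviations, and adjusted mean differences reported for the analytic sample.

```python
import math


def pooled_sd(n1: int, sd1: float, n2: int, sd2: float) -> float:
    """Pooled within-group standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def small_sample_correction(n: int) -> float:
    """Correction factor omega = 1 - 3 / (4N - 9)."""
    return 1 - 3 / (4 * n - 9)


def adjusted_hedges_g(mean_difference: float, sd: float, n: int) -> float:
    """Adjusted mean difference in pooled-SD units, with the correction applied."""
    return small_sample_correction(n) * mean_difference / sd


sd = pooled_sd(64, 1.099, 65, 1.093)
print(round(sd, 3))  # 1.096
for model, diff in [("Full", 0.927), ("Demographic", 0.836), ("Reduced", 0.867)]:
    print(model, round(adjusted_hedges_g(diff, sd, 129), 2))
# Full 0.84, Demographic 0.76, Reduced 0.79
```

With N = 129 the correction factor is about 0.994, so it changes the effect sizes only in the third decimal place.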

3.4.2 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absent for the post assessment indicated that a full-case analysis (one limited to students with complete data) would substantially understate the effect of the intervention. The covariate-adjusted effect of the treatment on interim test scores was greater among students who missed the posttest than among those who were present for all three tests. This is borne out in the results of an analysis limited to those students where no imputation occurred.

Values for adjusted means for this subgroup were calculated by recentering all Level-1 covariates and generating a new HLM with the same structure as for the full analytic sample, but using only those participants with no missing data.

Outcome Data and Statistical Significance: Analytic Sample with No Imputation

                     Comparison Group                    Intervention Group                  Estimated Effect
Model                Students  adj. Mean  unadj. SD      Students  adj. Mean  unadj. SD      adj. Mean Difference  adj. t-score
Full Model           61        5.07       1.12           61        5.94       1.11           0.867***              5.255
Demographic Model    61        5.15       1.12           61        5.94       1.11           0.787***              4.669
Reduced Model        61        5.09       1.12           61        5.92       1.11           0.828***              4.249

Note: *p<0.1; **p<0.05; ***p<0.001

Estimation of Effect Size: Analytic Sample with No Imputation

Model                N      Adjusted Mean Difference   (unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges' g)
Full Model           122    0.867***                   1.113                             0.77
Demographic Model    122    0.787***                   1.113                             0.70
Reduced Model        122    0.828***                   1.113                             0.74

Note: *p<0.1; **p<0.05; ***p<0.001

3.5 Subpopulation Analyses

We analyzed subpopulations by grade. We also analyzed the subpopulation of students not designated as exceptional students. Due to the smaller sample sizes, the Reduced Model was used for the analyses, except that grade was removed as a variable for subpopulations of constant grade.

Statistical Significance and Estimation of Effect Size

Group                      N      Adjusted Mean Difference   (unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges' g)   Adjusted t-score
Grade 2                    68     0.739**                    0.94                              0.78                           2.46
Grade 3                    63     0.877**                    1.05                              0.82                           2.47
Non-Exceptional Students   102    0.904***                   1.101                             0.89                           4.63

Note: *p<0.1; **p<0.05; ***p<0.001

4 Acknowledgment

We used R (Team 2013) for some of the analysis in this report, especially the lme4 package for fitting mixed models (Bates, Machler, Bolker & Walker 2015). Other libraries utilized were dplyr, tidyr, and magrittr (Wickham & Francois 2016, Bache & Wickham 2014, Wickham 2016).

This document was typeset using LaTeX and makes use of the harvard, booktabs, multirow, graphicx, and url packages.

The stargazer package was used to generate LaTeX for several of the tables (Hlavac 2013).

References

Bache, S. M. & Wickham, H. (2014), magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr

Bates, D., Machler, M., Bolker, B. & Walker, S. (2015), 'Fitting linear mixed-effects models using lme4', Journal of Statistical Software 67(1), 1-48.

Burns, M. K., VanDerHeyden, A. M. & Jiban, C. L. (2006), 'Assessing the instructional level for mathematics: A comparison of methods', School Psychology Review 35(3), 401.

Hintze, J. M., Christ, T. J. & Keller, L. A. (2002), 'The generalizability of CBM survey-level mathematics assessments: Just how many samples do we need?', School Psychology Review 31(4), 514.

Hlavac, M. (2013), 'stargazer: LaTeX code and ASCII text for well-formatted regression and summary statistics tables'. http://CRAN.R-project.org/package=stargazer

Osborne, J. (2005), 'Notes on the use of data transformations', Practical Assessment, Research and Evaluation 9(1), 42-50.

Stevens, O. & Leigh, E. (2012), 'Mathematics curriculum based measurement to predict state test performance: A comparison of measures and methods', ProQuest LLC.

Team, R. C. (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

UTA (2015), Multilevel modeling tutorial using SAS, Stata, HLM, R, SPSS, and Mplus, Technical report, The Department of Statistics and Data Sciences, The University of Texas at Austin. http://stat.utexas.edu/images/SSC/documents/SoftwareTutorials/MultilevelModeling.pdf

VanDerHeyden, A. M. & Burns, M. K. (2008), 'Examination of the utility of various measures of mathematics proficiency', Assessment for Effective Intervention.

Wickham, H. (2016), tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. https://CRAN.R-project.org/package=tidyr

Wickham, H. & Francois, R. (2016), dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr

Woltman, H., Feldstain, A., MacKay, J. C. & Rocchi, M. (2012), 'An introduction to hierarchical linear modeling', Tutorials in Quantitative Methods for Psychology 8(1), 52-69.

Appendix A Full Model

The table below describes the fixed-effects data of the full HLM for the analytic sample. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                    Coefficient   t-score
Intercept                  4.709***     25.719
cage                       0.136         0.887
cLEP                      -0.015        -0.111
cLunch                    -0.122        -0.981
cprescore                 -2.220        -0.932
cgender                   -0.148        -1.518
cESE                       0.054         0.272
cprespeed                  2.471         1.177
cpreaccuracy               0.750         1.313
treatment                  0.927***      5.753
grade3                     0.695***      3.335
cagey:treatment            0.076         0.469
cagey:grade3              -0.224        -1.134
cLEP:treatment            -0.021        -0.123
cLEP:grade3               -0.027        -0.156
cLunch:treatment           0.195         1.440
cLunch:grade3              0.174         1.269
cprescore:treatment        1.881         1.056
cprescore:grade3           2.097         0.885
cgender:treatment          0.096         0.836
cgender:grade3             0.180         1.564
cESE:treatment            -0.036        -0.204
cESE:grade3               -0.013        -0.073
cprespeed:treatment       -1.327        -0.849
cprespeed:grade3          -1.581        -0.753
cpreaccuracy:treatment    -0.712*       -1.657
cpreaccuracy:grade3       -0.608        -1.101

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B Demographic Model

The table below describes the fixed-effects data of the full HLM for the analytic sample but excludes the pretest features of speed and accuracy. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                 Coefficient   t-score
Intercept               4.906***     27.157
cage                    0.230         1.554
cLEP                    0.042         0.315
cLunch                 -0.128        -1.225
cprescore               0.756***      5.872
cgender                -0.124        -1.283
cESE                    0.172         0.855
treatment               0.836***      4.966
grade3                  0.480**       2.235
cagey:treatment        -0.003        -0.022
cagey:grade3           -0.270        -1.378
cLEP:treatment         -0.059        -0.346
cLEP:grade3            -0.071        -0.408
cLunch:treatment        0.313**       2.631
cLunch:grade3           0.174         1.459
cprescore:treatment     0.003         0.020
cprescore:grade3        0.094         0.685
cgender:treatment       0.122         1.062
cgender:grade3          0.120         1.039
cESE:treatment         -0.131        -0.734
cESE:grade3            -0.117        -0.642

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix C Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sample, retaining only age, lunch eligibility, and pretest score as level-1 covariates. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

This model used Restricted Maximum Likelihood, as there were convergence problems when using maximum likelihood.

Factor                 Coefficient   t-score
Intercept               4.827***     25.683
cage                    0.227         1.146
cLunch                 -0.111        -0.952
cprescore               0.758***      5.488
treatment               0.867***      4.343
grade3                  0.527**       2.219
cagey:treatment        -0.031        -0.154
cagey:grade3           -0.168        -0.692
cLunch:treatment        0.268*        1.936
cLunch:grade3           0.153         1.109
cprescore:treatment    -0.042        -0.286
cprescore:grade3        0.100         0.677

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B Sample Addition/Subtraction Probe

(Each problem appeared in vertical format on the probe; the problems are listed here row by row.)

9 + 8     5 + 9      18 - 10    9 - 6     8 + 3      13 - 5
10 - 3    2 + 7      3 + 8      10 + 1    12 - 10    9 - 1
5 + 0     8 + 4      3 + 6      19 - 9    7 - 1      16 - 10    3 + 0
3 - 1     7 + 5      15 - 5     0 + 9     4 + 3      14 - 5     7 - 5
9 + 9     20 - 10    11 - 3     4 - 4     9 + 0      6 - 1      1 + 10
9 + 10    12 - 3     12 - 9     2 + 8     5 - 0      9 - 4      5 + 0
4 + 4     14 - 9     7 - 0      11 - 8    7 + 0      4 - 1      6 + 5
2 + 3     12 - 5     14 - 5     4 - 4     10 + 10    1 + 0      7 + 2
13 - 6    10 + 10    3 + 6      9 - 6     17 - 7     10 + 10    3 + 6
4 + 9     10 + 2     10 + 10    3 - 0     5 + 3      5 - 5      10 - 10


their posttest than on their pretest

2.2.4 Fluency Score Calculation

For each student, raw fluency scores were calculated as the average number of digits correct per min (dc/min) minus the number of digits incorrect per min (di/min), as this was the method found by Stevens & Leigh (2012) to have the greatest criterion validity.

Previous CBM researchers have combined grade 2 and grade 3 students (Burns et al. 2006), but to justify the pooling of their outcomes in a single analysis, we conducted an analysis of the distribution of raw pretest scores for each grade separately to show similarity of distribution.

Table 3: Raw Fluency Pretest Score Distributions by Grade

Measure                               Grade 2   Grade 3
Mean                                  20.26     20.27
Standard Deviation                    10.53     11.35
Median                                19        19
Kurtosis                              2.37      1.91
Skewness                              1.17      1.01
Range                                 54.67     58.33
Optimal Box-Cox (anchored at 1) λ     0.50      0.56

A Kolmogorov-Smirnov test corroborated the premise that these two distributions were quite similar. It failed to reject homogeneity (critical D-stat was 0.233, calculated D-stat was 0.063, p-value = 0.99).

The distribution of these raw scores was significantly skewed and leptokurtic, as has been reported in similar studies (Burns et al. 2006), so we normalized them using a Box-Cox transformation to arrive at a final fluency score. Following the recommendation of Osborne (2005), we anchored the full distribution at a minimum value of 1 by adding 2 to all raw fluency scores. A search for an optimum λ returned 0.525, so we chose λ = 0.5 for simplicity of inversion. Thus the calculation for final score is $\sqrt{C - I + 2}$, where $C$ is the average digits correct per min and $I$ is the average digits incorrect per min. The resulting distribution of pretest scores was not significantly skewed (skew = 0.08, SES = 0.21) but was still slightly leptokurtic (kurtosis = 0.85, SEK = 0.42). D'Agostino-Pearson (p-value = 0.13) and Jarque-Bera tests (p-value = 0.13) failed to reject normality.
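The standard errors quoted above are consistent with the usual large-sample formulas for the standard error of skewness (SES) and of kurtosis (SEK). The sketch below assumes those formulas and a sample size of n = 130 (the baseline sample); neither the formulas nor the sample size used for this check are stated in the report.

```python
import math


def standard_error_skewness(n: int) -> float:
    """SES = sqrt(6n(n - 1) / ((n - 2)(n + 1)(n + 3)))."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def standard_error_kurtosis(n: int) -> float:
    """SEK = 2 * SES * sqrt((n^2 - 1) / ((n - 3)(n + 5)))."""
    ses = standard_error_skewness(n)
    return 2 * ses * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


n = 130  # assumed sample size (baseline sample)
print(round(standard_error_skewness(n), 2))  # 0.21
print(round(standard_error_kurtosis(n), 2))  # 0.42
```

Under this reading, the skew of 0.08 is well under twice its standard error, while the kurtosis of 0.85 is roughly twice its standard error, matching the description of the transformed scores as not significantly skewed but slightly leptokurtic.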


2.3 Validity

The criterion validity for CBM-based measures in elementary math has been established by Stevens & Leigh (2012) and VanDerHeyden & Burns (2008). These studies showed math fact fluency was predictive of general math achievement on the Oklahoma Core Curriculum test and Stanford Achievement Test, respectively.

2.4 Reliability

Several researchers have confirmed the reliability of CBM for math fluency.

Table 4: Previous Research on CBM Reliability for Math Fluency

Metric                                   Scoring Method                                             Source                   Value
Inter-scorer Agreement                   Correct Digits per Minute                                  (Burns et al. 2006)      0.96+
Inter-scorer Agreement                   Correct Digits per Minute                                  (Hintze et al. 2002)     0.955
Inter-scorer Agreement                   Correct Digits per Minute - Incorrect Digits per Minute    (Stevens & Leigh 2012)   0.99+
Delayed Alternate-form Reliability       Correct Digits per Minute                                  (Burns et al. 2006)      0.84
Absolute Generalizability                Correct Digits per Minute                                  (Hintze et al. 2002)     0.75
Relative Generalizability                Correct Digits per Minute                                  (Hintze et al. 2002)     0.95
Test-Retest Alternate Form Reliability   Correct Digits per Minute - Incorrect Digits per Minute    (Stevens & Leigh 2012)   0.87

Our study gave 3 separate fact probes on the same day, allowing us to measure internal consistency of raw fluency score (correct digits minus incorrect digits) using Cronbach's α. The α values across the six tests are described in Table 5.

Table 5: Internal Consistency of Raw Fluency Score

                Addition/Subtraction   Multiplication/Division
Pretest         0.95                   0.94
Interim Test    0.96                   0.94
Posttest        0.97                   0.95
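Cronbach's α for a set of same-day probes can be computed as in this minimal sketch, which treats each probe as one item. The function name and the scores are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    items: list of k lists, one per probe, each holding one score per student
    (students in the same order in every list).
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical raw fluency scores for five students on three probes.
probe_1 = [12, 18, 25, 9, 30]
probe_2 = [14, 17, 27, 10, 28]
probe_3 = [11, 20, 24, 8, 31]
print(round(cronbach_alpha([probe_1, probe_2, probe_3]), 3))
```

Values near 1 indicate that the probes rank students almost identically, which is the pattern reported in Table 5.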


We also calculated delayed alternate-form reliability of the final fluency score across each grade × condition cohort and found an average value of 0.71.

Table 6: Delayed Alternate-Form Reliability (14 weeks)

                Addition/Subtraction   Multiplication/Division
Intervention    0.77                   0.47
Comparison      0.72                   0.89

The relatively poor value for the 3rd grade intervention group may be due to large variation in dosage. The standard deviation in weekly usage across the 3rd grade intervention group was 1.24 days/week, nearly twice that of the 2nd grade intervention group, where the standard deviation was 0.65 days/week.

When dosage was added to the model predicting posttest score from pretest score, the agreement between the two intervention groups improved considerably. The coefficients of multiple correlation were R = 0.81 and R = 0.77 for the 2nd and 3rd grade intervention groups, respectively.

25 Analytic Approach

Since randomized assignment occurred at the class level we used an HLMmodeling approach to account for cluster effects when analyzing the rela-tionship between condition and posttest fluency The model has two levelsmdashgrade and condition are level-2 variables and all other covariates are level-1variables We used grand-mean-centered values for the lower level variablesand a maximum-likelihood method for determining the random effects Ifthe search for a model did not converge using maximum likelihood restrictedmaximum likelihood was used instead

Models were constructed using Rrsquos lmer function part of the lme4 libraryusing the methodology for two-tier HLM models documented in a technicalreport from the Department of Statistics and Data Sciences The Universityof Texas at Austin (UTA 2015) which showed the similarity in results tothose given by SPSS SAS Mplus and HLM

We formed 3 models of decreasing complexity and calculated an effectsize and statistical significance based on each

11

The first model uses the same structure as that used in the original ver-sion of this report In this model all dichotomous and numeric covariateswere used (ie all covariates other than race which was polynominal) in-cluding the pretest accuracy and pretest speed This model is most inclusiveand allows for continuity between the original version of this report and thecurrent version It is denoted as the Full Model

For the data available at the time of the original report the pretest speedand pretest accuracy were both highly significant (p lt 0001) But afterremoving students who did not respect the time limits on the pretest orwere designated as being below grade level before the study began theseadditional pretest features were no longer statistically significant A nestedmodel χ-squared test comparing change in deviance to change in degrees offreedom did not show a statistically significant improvement upon addingeither of these terms Thus we generated a new model lacking these twopretest features but retaining all the demographic covariates of the originalThis model is denoted in the sequel as the Demographic Model

In an effort to simplify the model further we assessed the relevance ofeach of the demographic variables by generating a HLM with the followingcharacteristics

bull No Level-2 variables

bull Two Level-1 variables the covariate in question and pretest score

bull Group-mean-centered values

bull Data scaled to be univariate

This method was selected for determining the relevance of a given level-1factor based on Woltman Feldstain MacKay amp Rocchirsquos (2012) presentationThe results are shown in Table 7 Note that this was the only analysis usinggroup-mean centered data The modelrsquos used for determining interventioneffect and statistical significance used grand-mean centered level-1 variables

The results of this analysis are shown in Table 7 Given their very lowcoefficients and t-scores we removed gender and ESE Upon forming the fullHLM using the remaining covariates it was found that LEP had very littleimpact (coefficient = 003) and low significance (t = 024) so it was droppedas well In the resulting model all covariates had t-scores greater than 09

12

Table 7 Impact and significance of demographic covariates

Covariate Coefficient t-score

age 0028 0506gender -0005 -0096LEP 0042 0612Lunch 0080 1249ESE -0006 -0100

in magnitude and standardized coefficients greater than 01 There was anearly statistically significant interaction (t = 194) between condition andwhether the student was on free or reduced lunch

This final model is denoted as the Reduced ModelAll three models are provided in the AppendixEffect sizes were calculated from the coefficient for the intervention effect

from each HLM-model and the pooled-within-group standard deviation ofunadjusted post-test scores

Statistical significance was determined based on the t-score of the multi-level model

2.6 Statistical Adjustments

We used all demographic information provided except race, which was non-binary and correlated significantly with other demographic information (R between 0.36 and 0.46 for the three most prevalent races in our sample).

Grade was coded as grade3, a variable equal to 1 if the student was in grade 3 and 0 otherwise.

Age was measured in years as of the pretest administration.

Gender was coded as a variable male, equal to 1 if the student was male and 0 if the student was female.

Low English proficiency was coded as a variable LEP, equal to 1 if the school indicated the student had low English proficiency.

Exceptional Student status was determined based on the school’s designation of the student as being within an Exceptional Student Education program. It was coded as a variable ESE, equal to 1 if the school specified the student as belonging to an ESE program. The state of Florida specifies several ESE programs, one of which is a program for gifted students. For our study, it appears this program furnished the large majority of ESE designations, as 29 of the 36 students designated as ESE were concentrated in two high-achieving classes. In grade 2, every ESE-designated student was in a single class.

Eligibility for free or reduced lunch was coded as a variable lunch, equal to 1 if the student was eligible.

As described in the Fluency Score Calculation subsection, fluency was evaluated based on a research-supported combination of speed and accuracy, normalized to reduce skewness via a Box-Cox transformation. This means that a student’s fluency score depends on personal characteristics such as confidence, sense of urgency on a pen-and-paper test, and attention to accuracy, so students differ markedly in potential for improvement.

Pretest accuracy is the ratio of correct digits to the sum of correct and incorrect digits.

Pretest score is defined as $\sqrt{C - I + 2}$, where C is digits correct per minute and I is digits incorrect per minute.

Pretest speed is defined in a manner analogous to pretest score, as

$\sqrt{C - 2}$,

where C is digits correct per minute. In this expression, 2 is subtracted rather than added so that the expression is anchored at 1, conforming to best practices (Osborne 2005).
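The three pretest metrics defined above can be written out as a short sketch. The function names and the example student are hypothetical, and the square-root forms are taken from the definitions in this subsection with the anchoring constants placed inside the radical, which is an interpretation of the text rather than confirmed code from the study.

```python
from math import sqrt


def accuracy(correct_digits, incorrect_digits):
    """Ratio of correct digits to all digits written."""
    return correct_digits / (correct_digits + incorrect_digits)


def fluency_score(c_per_min, i_per_min):
    """sqrt(C - I + 2), where the constant anchors the transformed scale at 1."""
    return sqrt(c_per_min - i_per_min + 2)


def speed_score(c_per_min):
    """sqrt(C - 2), where the constant anchors the transformed scale at 1."""
    return sqrt(c_per_min - 2)


# Hypothetical student: 20 digits correct and 2 digits incorrect per minute.
acc = accuracy(20, 2)         # 20 / 22
score = fluency_score(20, 2)  # sqrt(20)
speed = speed_score(20)       # sqrt(18)
```

Under this reading, a raw difference C - I of -1 and a raw speed C of 3 both map to a transformed value of exactly 1.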

All student-level covariates were scaled to unit variance and grand-mean centered for improved interpretability and model convergence.
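As a rough illustration of this preprocessing step, the sketch below grand-mean centers a covariate and scales it by its standard deviation. The helper name `standardize`, the use of the sample (n - 1) standard deviation, and the pretest values are assumptions for the example, not details given in the report.

```python
from statistics import mean, stdev


def standardize(values):
    """Grand-mean center a covariate and scale it to unit (sample) variance."""
    m = mean(values)
    sd = stdev(values)
    return [(v - m) / sd for v in values]


# Hypothetical pretest scores for five students.
pretest = [3.2, 4.1, 4.6, 5.0, 5.9]
c_pretest = standardize(pretest)
```

With covariates prepared this way, the model intercept can be read as the expected outcome for a student who is average on every level-1 covariate, and coefficients are comparable across covariates measured in different units.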

Speed, score, and accuracy on the interim administration were considered during the regression process used to impute missing data, as described in the Missing Data section. These metrics were calculated exactly as for the pretest, using the same formulas (i.e., the data were not re-anchored for the Box-Cox transformation).

An HLM was used to calculate statistical significance for the entire sample, so no adjustment for cluster effects was necessary. We analyzed only one outcome for this study, so no adjustment was made for multiple outcomes.

Table 8: Descriptive Statistics of Control Variables

Control Variable        Mean   SD     Skew    Kurtosis
Grade3                  0.48   0.50    0.08   -2.03
Age                     8.42   0.57    0.03   -0.95
Male                    0.47   0.50    0.14   -2.01
LEP                     0.20   0.40    1.51    0.27
ESE                     0.26   0.44    1.13   -0.73
Lunch                   0.20   0.40    1.51    0.27
Pretest Accuracy        0.92   0.09   -2.63    8.23
Pretest Speed           4.29   1.18    0.18    0.98
Pretest Score           4.58   1.15    0.08    0.85
Interim Accuracy        0.94   0.06   -1.84    3.59
Interim Speed           5.07   1.20    0.55    0.42
Interim Fluency Score   5.31   1.18    0.46    0.39

2.7 Students Removed from Study

Ten students, 4 from the intervention condition and 6 from the comparison condition, were excluded from the analysis. In all cases, the decision to exclude was based on information obtained on the day of the pretest.

Four of these ten (3 from intervention, 1 from comparison) were excluded because their teacher indicated they were sufficiently below grade level that they would not receive typical instruction in math fact fluency. This determination was provided on the day of the pretest.

Five of these ten (all from comparison) were excluded because they did not stop when time was called on the pretest. In three cases, these students had higher values on their pretest than on their posttest.

One grade 2 student from the intervention condition was noted as appearing frustrated and not working on the pretest. He had the fourth-lowest fluency score of all 2nd-grade participants on the pretest and showed dramatic improvement by the interim assessment, on which he scored at the 33rd percentile within his grade. According to Reflex’s internal initial testing, the student had pre-existing automaticity for 17.1% of the addition facts within 20 and had basic recall ability with 59.9%. This suggests his pretest score underestimated his actual ability, and he was removed from the analysis for fear of artificially inflating the impact of the intervention. Note that this student was absent from the final administration.

2.8 Missing Data

Eight students, 5 from the treatment group and 3 from the comparison group, were absent for the final administration. Seven of these students had taken the interim assessment. No values were imputed for the student who missed both the interim and the final assessment. For the seven who had attended the interim test, we imputed posttest values using a multiple linear regression based on students in the same instructional-level group, using the thresholds established by Burns et al. (2006).

Table 9: Categorization of Students

Fluency (dc/min)   Category              N
Less than 14       Frustration Level     29 (22%)
14-31              Instructional Level   81 (63%)
Greater than 31    Mastery Level         19 (15%)
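The grouping in Table 9 amounts to a simple threshold rule on digits correct per minute. The sketch below is illustrative only: the function name `instructional_category` and the sample rates are hypothetical, and treating both 14 and 31 as belonging to the instructional level is an assumption read from the "14-31" row.

```python
def instructional_category(dc_per_min):
    """Classify a digits-correct-per-minute rate using the Table 9 cut points."""
    if dc_per_min < 14:
        return "Frustration Level"
    if dc_per_min <= 31:
        return "Instructional Level"
    return "Mastery Level"


# Hypothetical digits-correct-per-minute rates for five students.
rates = [9, 14, 22.5, 31, 40]
labels = [instructional_category(r) for r in rates]
```

Imputation models were then fit separately within each of these groups, so a student's category determines which regression supplies the imputed posttest value.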

All available data (demographic data, pretest data, and interim test data) were used to impute posttest scores using an OLS regression that retained only statistically significant regressors.

2.8.1 Frustration Level

One of the seven students for whom posttest scores were imputed was in the frustration level. For that group, age (t = 2.6), pretest accuracy (t = -3.2), and interim fluency score (t = 6.2) were the statistically significant regressors.

2.8.2 Instructional Level

Six of the seven students for whom posttest scores were imputed were in the instructional level. Among students in that level, grade (t = 4.2), interim accuracy (t = -2.6), and interim fluency score (t = 9.5) were statistically significant.

2.8.3 Mastery Level

There were no students in the mastery level for whom imputation was necessary.

3 Study Data

Tables compose the large majority of this section. They are organized by table title and subsection title rather than by use of numbers.

The tables in this section report unscaled, uncentered values for ease of interpretability.

3.1 Pre-Intervention Data: All Pretest Takers

This section provides data on all students who took the pretest, including those who were formally removed from the analysis.

Outcome Data

Measure         Group          N (Unit of Assignment)   N (Unit of Analysis)   Mean   Standard Deviation
Fluency Score   Comparison     4                        70                     4.44   1.17
Fluency Score   Intervention   4                        70                     4.49   1.26

Background Data

Variable           Comparison Mean   Comparison SD   Intervention Mean   Intervention SD
Age                8.440             0.598           8.442               0.602
ESE                0.271             0.448           0.243               0.432
Male               0.500             0.504           0.400               0.493
Grade3             0.486             0.503           0.486               0.503
LEP                0.243             0.432           0.171               0.380
Lunch              0.229             0.423           0.186               0.392
Pretest accuracy   0.919             0.098           0.909               0.113
Pretest speed      4.145             1.186           4.220               1.257

3.2 Pre-Intervention Data: Baseline Sample

This section includes all students who were formally part of the analysis, including those who were absent for the posttest.

Outcome Data

Measure         Group          N (Unit of Assignment)   N (Unit of Analysis)   Mean   Standard Deviation
Fluency Score   Comparison     4                        64                     4.57   1.12
Fluency Score   Intervention   4                        66                     4.60   1.20

Background Data

Variable           Comparison Mean   Comparison SD   Intervention Mean   Intervention SD
Age                8.444             0.556           8.405               0.581
ESE                0.250             0.436           0.258               0.441
Grade3             0.484             0.504           0.470               0.503
LEP                0.250             0.436           0.152               0.361
Lunch              0.219             0.417           0.182               0.389
Male               0.531             0.503           0.394               0.492
Pretest accuracy   0.932             0.087           0.916               0.102
Pretest speed      4.268             1.147           4.324               1.213

3.3 Pre-intervention Data: Analytic Sample

Outcome Data: Analytic Sample

Measure        Group          N (Unit of Assignment)   N (Unit of Analysis)   Mean    Standard Deviation
Fact Fluency   Comparison     4                        64                     4.573   1.121
Fact Fluency   Intervention   4                        65                     4.580   1.195

Background Data: Analytic Sample

Variable           Comparison Mean   Comparison SD   Intervention Mean   Intervention SD
Age                8.444             0.556           8.414               0.580
ESE                0.250             0.436           0.262               0.443
Grade3             0.484             0.504           0.477               0.503
LEP                0.250             0.436           0.154               0.364
Lunch              0.219             0.417           0.185               0.391
Male               0.531             0.503           0.400               0.494
Pretest accuracy   0.932             0.087           0.914               0.102
Pretest speed      4.268             1.147           4.307               1.214

Outcome Data: Analytic Sample with No Imputation

Measure        Group          N (Unit of Assignment)   N (Unit of Analysis)   Mean    Standard Deviation
Fact Fluency   Comparison     4                        61                     4.557   1.143
Fact Fluency   Intervention   4                        61                     4.640   1.142

Background Data: Analytic Sample with No Imputation

Variable           Comparison Mean   Comparison SD   Intervention Mean   Intervention SD
Age                8.436             0.556           8.394               0.568
ESE                0.246             0.434           0.279               0.452
Grade3             0.492             0.504           0.459               0.502
LEP                0.262             0.444           0.148               0.358
Lunch              0.230             0.424           0.180               0.388
Male               0.541             0.502           0.410               0.496
Pretest accuracy   0.930             0.088           0.922               0.084
Pretest speed      4.254             1.170           4.360               1.177

3.4 Post-intervention Data and Findings

3.4.1 Analytic Sample

As grand-mean-centered values were used for all Level-1 covariates and grade was the only Level-2 covariate other than condition, adjusted means for each group were estimated from the constant term of the HLM, the grade3 coefficient of the HLM, the average value of the grade3 variable across all students, and (in the case of the intervention group) the treatment coefficient of the HLM. Standard deviations are unadjusted.
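The adjusted-mean calculation described above can be checked with a few lines of arithmetic. This is a sketch under assumptions: the function name `adjusted_means` is hypothetical, the coefficients are read from the Full Model table in the Appendix, and the grade3 average of 0.48 is taken from Table 8 rather than recomputed from student data.

```python
def adjusted_means(intercept, grade3_coef, grade3_mean, treatment_coef):
    """Return (comparison, intervention) adjusted means.

    Level-1 covariates are grand-mean centered, so they contribute zero at
    the sample average and only the grade and treatment terms remain.
    """
    comparison = intercept + grade3_coef * grade3_mean
    intervention = comparison + treatment_coef
    return comparison, intervention


# Full Model: intercept 4.709, grade3 coefficient 0.695, treatment
# coefficient 0.927; average grade3 value 0.48 (assumed from Table 8).
comp, interv = adjusted_means(4.709, 0.695, 0.48, 0.927)
print(round(comp, 2), round(interv, 2))  # prints 5.04 5.97
```

These values agree with the adjusted means reported for the Full Model in the outcome table for the analytic sample.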

Outcome Data and Statistical Significance: Analytic Sample

Model               Comparison Group                     Intervention Group                   Estimated Effect
                    Students   adj. Mean   unadj. SD     Students   adj. Mean   unadj. SD     adj. Mean Difference   adj. t-score
Full Model          64         5.04        1.099         65         5.97        1.093         0.927***               5.753
Demographic Model   64         5.13        1.099         65         5.97        1.093         0.836***               4.966
Reduced Model       64         5.08        1.099         65         5.95        1.093         0.867***               4.343

Note: *p<0.1; **p<0.05; ***p<0.001

Effect size was calculated based on adjusted means, unadjusted pooled within-group standard deviations, and a correction $\omega = 1 - \frac{3}{4N - 9}$ for small sample size.

Estimation of Effect Size: Analytic Sample

Model               N     Adjusted Mean Difference   (unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges’ g)
Full Model          129   0.927***                   1.096                             0.84
Demographic Model   129   0.836***                   1.096                             0.76
Reduced Model       129   0.867***                   1.096                             0.79

Note: *p<0.1; **p<0.05; ***p<0.001
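The effect-size arithmetic can be reproduced from the reported figures. The sketch below is not the study's code: the function names are hypothetical, and it assumes that N in the correction factor is the total analytic sample size and that the pooled standard deviation uses the usual degrees-of-freedom weighting.

```python
from math import sqrt


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled within-group standard deviation of two groups."""
    return sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def hedges_g(mean_diff, sd_pooled, n_total):
    """Standardized mean difference with the 1 - 3/(4N - 9) correction."""
    omega = 1 - 3 / (4 * n_total - 9)
    return omega * mean_diff / sd_pooled


# Full Model, analytic sample: group sizes and unadjusted SDs from the
# outcome table, adjusted mean difference from the HLM.
sd = pooled_sd(64, 1.099, 65, 1.093)
g = hedges_g(0.927, sd, 64 + 65)
print(round(sd, 3), round(g, 2))  # prints 1.096 0.84
```

Under these assumptions the calculation matches the pooled SD and the adjusted Hedges’ g reported for the Full Model.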

3.4.2 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absent for the post assessment indicated that a complete-case analysis would substantially understate the effect of the intervention. The covariate-adjusted effect of the treatment on interim test scores was greater among students who missed the posttest than among those who were present for all three tests. This is borne out in the results of an analysis limited to those students for whom no imputation occurred.

Values for adjusted means for this subgroup were calculated by recentering all Level-1 covariates and generating a new HLM with the same structure as for the full analytic sample, but using only those participants with no missing data.

Outcome Data and Statistical Significance: Analytic Sample with No Imputation

Model               Comparison Group                     Intervention Group                   Estimated Effect
                    Students   adj. Mean   unadj. SD     Students   adj. Mean   unadj. SD     adj. Mean Difference   adj. t-score
Full Model          61         5.07        1.12          61         5.94        1.11          0.867***               5.255
Demographic Model   61         5.15        1.12          61         5.94        1.11          0.787***               4.669
Reduced Model       61         5.09        1.12          61         5.92        1.11          0.828***               4.249

Note: *p<0.1; **p<0.05; ***p<0.001

Estimation of Effect Size: Analytic Sample with No Imputation

Model               N     Adjusted Mean Difference   (unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges’ g)
Full Model          122   0.867***                   1.113                             0.77
Demographic Model   122   0.787***                   1.113                             0.70
Reduced Model       122   0.828***                   1.113                             0.74

Note: *p<0.1; **p<0.05; ***p<0.001

3.5 Subpopulation Analyses

We analyzed subpopulations by grade. We also analyzed the subpopulation of students not designated as exceptional students. Due to the smaller sample sizes, the Reduced Model was used for the analyses, except that grade was removed as a variable for subpopulations of constant grade.

Statistical Significance and Estimation of Effect Size

Group                      N     Adjusted Mean Difference   (unadj.) Pooled Within-Group SD   Effect Size (adj. Hedges’ g)   Adjusted t-score
Grade 2                    68    0.739**                    0.94                              0.78                           2.46
Grade 3                    63    0.877**                    1.05                              0.82                           2.47
Non-Exceptional Students   102   0.904***                   1.101                             0.89                           4.63

Note: *p<0.1; **p<0.05; ***p<0.001

4 Acknowledgment

We used R (R Core Team 2013) for some of the analysis in this report, especially the lme4 package for fitting mixed models (Bates, Mächler, Bolker & Walker 2015). Other libraries utilized were dplyr, tidyr, and magrittr (Wickham & Francois 2016, Bache & Wickham 2014, Wickham 2016).

This document was typeset using LaTeX and makes use of the harvard, booktabs, multirow, graphicx, and url packages.

The stargazer package was used to generate LaTeX for several of the tables (Hlavac 2013).

References

Bache, S. M. & Wickham, H. (2014), magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr

Bates, D., Mächler, M., Bolker, B. & Walker, S. (2015), ‘Fitting linear mixed-effects models using lme4’, Journal of Statistical Software 67(1), 1–48.

Burns, M. K., VanDerHeyden, A. M. & Jiban, C. L. (2006), ‘Assessing the instructional level for mathematics: A comparison of methods’, School Psychology Review 35(3), 401.

Hintze, J. M., Christ, T. J. & Keller, L. A. (2002), ‘The generalizability of CBM survey-level mathematics assessments: Just how many samples do we need?’, School Psychology Review 31(4), 514.

Hlavac, M. (2013), ‘stargazer: LaTeX code and ASCII text for well-formatted regression and summary statistics tables’. http://CRAN.R-project.org/package=stargazer

Osborne, J. (2005), ‘Notes on the use of data transformations’, Practical Assessment, Research and Evaluation 9(1), 42–50.

R Core Team (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

Stevens, O. & Leigh, E. (2012), ‘Mathematics curriculum based measurement to predict state test performance: A comparison of measures and methods’, ProQuest LLC.

UTA (2015), Multilevel modeling tutorial using SAS, Stata, HLM, R, SPSS, and Mplus, Technical report, The Department of Statistics and Data Sciences, The University of Texas at Austin. http://stat.utexas.edu/images/SSC/documents/SoftwareTutorials/MultilevelModeling.pdf

VanDerHeyden, A. M. & Burns, M. K. (2008), ‘Examination of the utility of various measures of mathematics proficiency’, Assessment for Effective Intervention.

Wickham, H. (2016), tidyr: Easily Tidy Data with ‘spread()’ and ‘gather()’ Functions. https://CRAN.R-project.org/package=tidyr

Wickham, H. & Francois, R. (2016), dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr

Woltman, H., Feldstain, A., MacKay, J. C. & Rocchi, M. (2012), ‘An introduction to hierarchical linear modeling’, Tutorials in Quantitative Methods for Psychology 8(1), 52–69.

Appendix A: Full Model

The table below describes the fixed-effects data of the full HLM for the analytic sample. This model uses grand-mean-centered values for all level-1 variables, scaled to unit variance. These variables are prefixed with “c” to indicate this.

Factor                   Coefficient   t-score
Intercept                 4.709***     25.719
cage                      0.136         0.887
cLEP                     -0.015        -0.111
cLunch                   -0.122        -0.981
cprescore                -2.220        -0.932
cgender                  -0.148        -1.518
cESE                      0.054         0.272
cprespeed                 2.471         1.177
cpreaccuracy              0.750         1.313
treatment                 0.927***      5.753
grade3                    0.695***      3.335
cagey:treatment           0.076         0.469
cagey:grade3             -0.224        -1.134
cLEP:treatment           -0.021        -0.123
cLEP:grade3              -0.027        -0.156
cLunch:treatment          0.195         1.440
cLunch:grade3             0.174         1.269
cprescore:treatment       1.881         1.056
cprescore:grade3          2.097         0.885
cgender:treatment         0.096         0.836
cgender:grade3            0.180         1.564
cESE:treatment           -0.036        -0.204
cESE:grade3              -0.013        -0.073
cprespeed:treatment      -1.327        -0.849
cprespeed:grade3         -1.581        -0.753
cpreaccuracy:treatment   -0.712*       -1.657
cpreaccuracy:grade3      -0.608        -1.101

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B: Demographic Model

The table below describes the fixed-effects data of the full HLM for the analytic sample but excludes the pretest features of speed and accuracy. This model uses grand-mean-centered values for all level-1 variables, scaled to unit variance. These variables are prefixed with “c” to indicate this.

Factor                Coefficient   t-score
Intercept              4.906***     27.157
cage                   0.230         1.554
cLEP                   0.042         0.315
cLunch                -0.128        -1.225
cprescore              0.756***      5.872
cgender               -0.124        -1.283
cESE                   0.172         0.855
treatment              0.836***      4.966
grade3                 0.480**       2.235
cagey:treatment       -0.003        -0.022
cagey:grade3          -0.270        -1.378
cLEP:treatment        -0.059        -0.346
cLEP:grade3           -0.071        -0.408
cLunch:treatment       0.313**       2.631
cLunch:grade3          0.174         1.459
cprescore:treatment    0.003         0.020
cprescore:grade3       0.094         0.685
cgender:treatment      0.122         1.062
cgender:grade3         0.120         1.039
cESE:treatment        -0.131        -0.734
cESE:grade3           -0.117        -0.642

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix C: Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sample, retaining only age, eligibility for free or reduced lunch, and pretest score as level-1 covariates. This model uses grand-mean-centered values for all level-1 variables, scaled to unit variance. These variables are prefixed with “c” to indicate this.

This model used restricted maximum likelihood, as there were convergence problems when using maximum likelihood.

Factor                Coefficient   t-score
Intercept              4.827***     25.683
cage                   0.227         1.146
cLunch                -0.111        -0.952
cprescore              0.758***      5.488
treatment              0.867***      4.343
grade3                 0.527**       2.219
cagey:treatment       -0.031        -0.154
cagey:grade3          -0.168        -0.692
cLunch:treatment       0.268*        1.936
cLunch:grade3          0.153         1.109
cprescore:treatment   -0.042        -0.286
cprescore:grade3       0.100         0.677

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B: Sample Addition/Subtraction Probe

  9      5     18      9      8     13
+ 8    + 9    -10    - 6    + 3    - 5
---    ---    ---    ---    ---    ---

 10      2      3     10     12      9
- 3    + 7    + 8    + 1    -10    - 1
---    ---    ---    ---    ---    ---

  5      8      3     19      7     16      3
+ 0    + 4    + 6    - 9    - 1    -10    + 0
---    ---    ---    ---    ---    ---    ---

  3      7     15      0      4     14      7
- 1    + 5    - 5    + 9    + 3    - 5    - 5
---    ---    ---    ---    ---    ---    ---

  9     20     11      4      9      6      1
+ 9    -10    - 3    - 4    + 0    - 1    +10
---    ---    ---    ---    ---    ---    ---

  9     12     12      2      5      9      5
+10    - 3    - 9    + 8    - 0    - 4    + 0
---    ---    ---    ---    ---    ---    ---

  4     14      7     11      7      4      6
+ 4    - 9    - 0    - 8    + 0    - 1    + 5
---    ---    ---    ---    ---    ---    ---

  2     12     14      4     10      1      7
+ 3    - 5    - 5    - 4    +10    + 0    + 2
---    ---    ---    ---    ---    ---    ---

 13     10      3      9     17     10      3
- 6    +10    + 6    - 6    - 7    +10    + 6
---    ---    ---    ---    ---    ---    ---

  4     10     10      3      5      5     10
+ 9    + 2    +10    - 0    + 3    - 5    -10
---    ---    ---    ---    ---    ---    ---


2.3 Validity

The criterion validity of CBM-based measures in elementary math has been established by Stevens & Leigh (2012) and VanDerHeyden & Burns (2008). These studies showed math fact fluency was predictive of general math achievement on the Oklahoma Core Curriculum Test and the Stanford Achievement Test, respectively.

2.4 Reliability

Several researchers have confirmed the reliability of CBM for math fluency.

Table 4: Previous Research on CBM Reliability for Math Fluency

Metric                                   Scoring Method                                                 Source                   Value
Inter-scorer Agreement                   Correct Digits per Minute                                      (Burns et al. 2006)      0.96+
Inter-scorer Agreement                   Correct Digits per Minute                                      (Hintze et al. 2002)     0.955
Inter-scorer Agreement                   Correct Digits per Minute minus Incorrect Digits per Minute    (Stevens & Leigh 2012)   0.99+
Delayed Alternate-form Reliability       Correct Digits per Minute                                      (Burns et al. 2006)      0.84
Absolute Generalizability                Correct Digits per Minute                                      (Hintze et al. 2002)     0.75
Relative Generalizability                Correct Digits per Minute                                      (Hintze et al. 2002)     0.95
Test-Retest Alternate Form Reliability   Correct Digits per Minute minus Incorrect Digits per Minute    (Stevens & Leigh 2012)   0.87

Our study gave 3 separate fact probes on the same day, allowing us to measure internal consistency of raw fluency score (correct digits minus incorrect digits) using Cronbach’s α. The α values across the six tests are described in Table 5.

Table 5: Internal Consistency of Raw Fluency Score

               Addition/Subtraction   Multiplication/Division
Pretest        0.95                   0.94
Interim Test   0.96                   0.94
Posttest       0.97                   0.95
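For readers unfamiliar with the statistic, Cronbach’s α for a set of probes given on the same day can be computed as in the sketch below. The function name `cronbach_alpha` and the five students’ scores are hypothetical; the formula is the standard one, k/(k - 1) times one minus the ratio of summed item variances to the variance of the total score, and is not taken from the study’s own code.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items, each a list of per-student scores."""
    k = len(items)
    # Total score for each student across all items (probes).
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical raw fluency scores for five students on three probes.
probe1 = [12, 18, 25, 30, 41]
probe2 = [14, 17, 27, 29, 43]
probe3 = [11, 20, 24, 33, 40]
alpha = cronbach_alpha([probe1, probe2, probe3])
```

Values near 1, like those in Table 5, indicate that the three probes rank students almost identically.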

We also calculated delayed alternate-form reliability of the final fluency score across each grade × condition cohort and found an average value of 0.71.

Table 6: Delayed Alternate-Form Reliability (14 weeks)

               Addition/Subtraction   Multiplication/Division
Intervention   0.77                   0.47
Comparison     0.72                   0.89

The relatively poor value for the 3rd grade intervention group (0.47) may be due to large variation in dosage. The standard deviation in weekly usage across the 3rd grade intervention group was 1.24 days/week, nearly twice that of the 2nd grade intervention group, where the standard deviation was 0.65 days/week.

When dosage was added to the model predicting posttest score from pretest score, the agreement between the two intervention groups improved considerably. The coefficients of multiple correlation were R = 0.81 and R = 0.77 for the 2nd and 3rd grade intervention groups, respectively.

2.5 Analytic Approach

Since randomized assignment occurred at the class level, we used an HLM approach to account for cluster effects when analyzing the relationship between condition and posttest fluency. The model has two levels: grade and condition are level-2 variables, and all other covariates are level-1 variables. We used grand-mean-centered values for the lower-level variables and a maximum-likelihood method for determining the random effects. If the search for a model did not converge using maximum likelihood, restricted maximum likelihood was used instead.

Models were constructed using R’s lmer function, part of the lme4 library, using the methodology for two-tier HLMs documented in a technical report from the Department of Statistics and Data Sciences, The University of Texas at Austin (UTA 2015), which showed the similarity in results to those given by SPSS, SAS, Mplus, and HLM.

We formed 3 models of decreasing complexity and calculated an effect size and statistical significance based on each.

The first model uses the same structure as that used in the original version of this report. In this model, all dichotomous and numeric covariates were used (i.e., all covariates other than race, which was polynominal, that is, nominal with more than two categories), including the pretest accuracy and pretest speed. This model is the most inclusive and allows for continuity between the original version of this report and the current version. It is denoted as the Full Model.

For the data available at the time of the original report, the pretest speed and pretest accuracy were both highly significant (p < 0.001). But after removing students who did not respect the time limits on the pretest or were designated as being below grade level before the study began, these additional pretest features were no longer statistically significant. A nested-model chi-squared test comparing change in deviance to change in degrees of freedom did not show a statistically significant improvement upon adding either of these terms. Thus, we generated a new model lacking these two pretest features but retaining all the demographic covariates of the original. This model is denoted in the sequel as the Demographic Model.

In an effort to simplify the model further we assessed the relevance ofeach of the demographic variables by generating a HLM with the followingcharacteristics

bull No Level-2 variables

bull Two Level-1 variables the covariate in question and pretest score

bull Group-mean-centered values

bull Data scaled to be univariate

This method was selected for determining the relevance of a given level-1factor based on Woltman Feldstain MacKay amp Rocchirsquos (2012) presentationThe results are shown in Table 7 Note that this was the only analysis usinggroup-mean centered data The modelrsquos used for determining interventioneffect and statistical significance used grand-mean centered level-1 variables

The results of this analysis are shown in Table 7 Given their very lowcoefficients and t-scores we removed gender and ESE Upon forming the fullHLM using the remaining covariates it was found that LEP had very littleimpact (coefficient = 003) and low significance (t = 024) so it was droppedas well In the resulting model all covariates had t-scores greater than 09

12

Table 7 Impact and significance of demographic covariates

Covariate Coefficient t-score

age 0028 0506gender -0005 -0096LEP 0042 0612Lunch 0080 1249ESE -0006 -0100

in magnitude and standardized coefficients greater than 01 There was anearly statistically significant interaction (t = 194) between condition andwhether the student was on free or reduced lunch

This final model is denoted as the Reduced ModelAll three models are provided in the AppendixEffect sizes were calculated from the coefficient for the intervention effect

from each HLM-model and the pooled-within-group standard deviation ofunadjusted post-test scores

Statistical significance was determined based on the t-score of the multi-level model

26 Statistical Adjustments

We used all demographic information provided except race which was non-binary and correlated significantly with other demographic information (Rbetween 036 and 046 for the three most prevalent races in our sample)

Grade was coded as grade3 a variable equal to 1 if the student was ingrade 3 and 0 otherwise

Age was measured in years as of the pretest administrationGender was coded as a variable male equal to 1 if the student was male

and 0 if the student was femaleLow-English proficiency was coded as a variable LEP equal to 1 if school

indicated the student had low English proficiencyExceptional Student Status was determined based on the schoolrsquos des-

ignation of the student as being within an Exceptional Student Educationprogram It was coded as a variable ESE equal to 1 if the school specifiedthe student as belonging to an ESE program The state of Florida specifiesseveral ESE programs one of which is a program for gifted students Forour study it appears this program furnished the large majority of ESE des-ignations as 29 of the 36 students designated as ESE were concentrated in

13

two high-achieving classes In grade 2 every ESE-designated student was ina single class

Eligibility for free or reduced lunch was coded as a variable lunch equalto 1 if the student was eligible

As described in the Fluency Score Calculation subsection fluency wasevaluated based on research-supported combination of speed and accuracynormalized to reduce skewness via a Box-Cox transformation This meansthat a studentrsquos fluency score depends on personal characteristics such asconfidence sense of urgency on a pen-and-paper test and attention to accu-racy so students differ markedly in potential for improvement

Pretest accuracy is the ratio of correct digits to the sum of correct andincorrect digits

Pretest score is defined asradicC minus I + 2 where C is digits correct per

minute and I is digits incorrect per minutePretest speed is defined in a manner analogous to pretest score

radicC minus 2

where C is digits correct per minute In this expression 2 is subtractedrather than added so that the expression is anchored at 1 conforming tobest practices (Osborne 2005)

All student-level covariates were scaled to be univariate and grand-meancentered for improved interpretability and model convergence

Speed score and accuracy on the interim administration were consideredduring the regression process used to impute missing data as described inthe Missing Data section These metrics are calculated exactly as for thepretest using the same formula (ie the data was not re-anchored for theBox-Cox transformation)

An HLM model was used to calculate statistical significance for the entiresample so no adjustment for cluster effects were necessary We only analyzedone outcome for this study so no adjustment was made for multiple outcomes

27 Students Removed from Study

Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

Four of these ten (3 from intervention 1 from comparison) were excludedbecause their teacher indicated they were sufficiently below grade level that

14

Table 8 Descriptive Statistics of Control Variables

Control Variable Mean SD Skew Kurtosis

Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

Five of these ten (all from comparison) were excluded because they did not stop when time was called on the pretest. In three cases, these students had higher values on their pretest than on their posttest.

One grade 2 student from the intervention condition was noted as appearing frustrated and not working on the pretest. He had the fourth lowest fluency score of all 2nd-grade participants on the pretest and showed dramatic improvement by the interim assessment, on which he scored at the 33rd percentile within his grade. According to Reflex's internal initial testing, the student had pre-existing automaticity for 17.1% of the addition facts within 20 and had basic recall ability with 59.9%. This suggests his pretest score underestimated his actual ability, and he was removed from the analysis for fear of artificially inflating the impact of the intervention. Note that this student was absent from the final administration.

2.8 Missing Data

Eight students, 5 from the treatment group and 3 from the comparison group, were absent for the final administration. Seven of these students had taken the interim assessment. No values were imputed for the student who missed both the interim and the final assessment. For the seven who had attended the interim test, we imputed posttest values using a multiple linear regression based on students in the same instructional level group, using the thresholds established by Burns et al. (2006).

Table 9: Categorization of Students

Fluency (dc/min)    Category               N
Less than 14        Frustration Level      29 (22%)
14-31               Instructional Level    81 (63%)
Greater than 31     Mastery Level          19 (15%)
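The cut points in Table 9 can be sketched as a small classifier. This is a minimal illustration, not the study's code: the function name is hypothetical, and treating scores of exactly 14 and exactly 31 dc/min as Instructional Level is an assumption read from the "14-31" row.

```python
def instructional_category(dc_per_min: float) -> str:
    """Classify a fluency rate (digits correct per minute) using the Table 9 cut points.

    Assumption: the boundary values 14 and 31 fall in the Instructional Level.
    """
    if dc_per_min < 14:
        return "Frustration Level"
    if dc_per_min <= 31:
        return "Instructional Level"
    return "Mastery Level"


# Counts reported in Table 9; the shares are each group's percentage of the total.
counts = {"Frustration Level": 29, "Instructional Level": 81, "Mastery Level": 19}
total = sum(counts.values())
shares = {name: round(100 * n / total) for name, n in counts.items()}
```

The rounded shares reproduce the percentages shown beside each count in the table.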

All available data (demographic data, pretest data, and interim test data) were used to impute posttest scores using an OLS regression that retained only statistically significant regressors.

2.8.1 Frustration Level

One of the seven students for whom posttest scores were imputed was in the frustration level. For that group, age (t = 2.6), pretest accuracy (t = -3.2), and interim fluency score (t = 6.2) were the statistically significant regressors.

2.8.2 Instructional Level

Six of the seven students for whom posttest scores were imputed were in the instructional level. Among students in that level, grade (t = 4.2), interim accuracy (t = -2.6), and interim fluency score (t = 9.5) were statistically significant.

2.9 Mastery Level

There were no students in the mastery level for whom imputation was necessary.
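The exclusions and absences described above can be tallied to show how the sample shrinks from all pretest takers to the students with no imputed values. The sketch below is only bookkeeping on figures stated in this report. Placing the one student who missed both later assessments in the intervention group is an inference from the reported group sizes, not something the text states directly, and the variable names are hypothetical.

```python
# Students who took the pretest, by condition (Section 3.1).
pretest_takers = {"comparison": 70, "intervention": 70}

# Students removed based on pretest-day information (Section 2.7).
removed = {"comparison": 6, "intervention": 4}
baseline = {g: pretest_takers[g] - removed[g] for g in pretest_takers}

# Students absent for the final administration (Section 2.8).
absent_final = {"comparison": 3, "intervention": 5}

# One absent student also missed the interim test, so nothing could be imputed.
# Assumption: that student was in the intervention group (inferred from group sizes).
not_imputable = {"comparison": 0, "intervention": 1}

analytic = {g: baseline[g] - not_imputable[g] for g in baseline}
imputed = {g: absent_final[g] - not_imputable[g] for g in baseline}
no_imputation = {g: analytic[g] - imputed[g] for g in baseline}
```

Under these assumptions the tallies give 130 baseline students, 129 in the analytic sample, 7 imputed posttest scores, and 122 students with complete data.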

3 Study Data

Tables compose the large majority of this section. They are organized by table title and subsection title rather than by use of numbers.

The tables in this section report unscaled, uncentered values for ease of interpretability.

3.1 Pre-Intervention Data: All Pretest Takers

This section provides data on all students who took the pretest, including those who were formally removed from the analysis.

Outcome Data

Measure          Group           Units of Assignment    Units of Analysis    Mean    Standard Deviation
Fluency Score    Comparison      4                      70                   4.44    1.17
Fluency Score    Intervention    4                      70                   4.49    1.26

Background Data

                     Comparison          Intervention
Variable             Mean     SD         Mean     SD
Age                  8.440    0.598      8.442    0.602
ESE                  0.271    0.448      0.243    0.432
Male                 0.500    0.504      0.400    0.493
Grade3               0.486    0.503      0.486    0.503
LEP                  0.243    0.432      0.171    0.380
Lunch                0.229    0.423      0.186    0.392
Pretest accuracy     0.919    0.098      0.909    0.113
Pretest speed        4.145    1.186      4.220    1.257

3.2 Pre-Intervention Data: Baseline Sample

This section includes all students who were formally part of the analysis, including those who were absent for the posttest.

Outcome Data

Measure          Group           Units of Assignment    Units of Analysis    Mean    Standard Deviation
Fluency Score    Comparison      4                      64                   4.57    1.12
Fluency Score    Intervention    4                      66                   4.60    1.20

Background Data

                     Comparison          Intervention
Variable             Mean     SD         Mean     SD
Age                  8.444    0.556      8.405    0.581
ESE                  0.250    0.436      0.258    0.441
Grade3               0.484    0.504      0.470    0.503
LEP                  0.250    0.436      0.152    0.361
Lunch                0.219    0.417      0.182    0.389
Male                 0.531    0.503      0.394    0.492
Pretest accuracy     0.932    0.087      0.916    0.102
Pretest speed        4.268    1.147      4.324    1.213

3.3 Pre-Intervention Data: Analytic Sample

Outcome Data: Analytic Sample

Measure         Group           Units of Assignment    Units of Analysis    Mean     Standard Deviation
Fact Fluency    Comparison      4                      64                   4.573    1.121
Fact Fluency    Intervention    4                      65                   4.580    1.195

Background Data: Analytic Sample

                     Comparison          Intervention
Variable             Mean     SD         Mean     SD
Age                  8.444    0.556      8.414    0.580
ESE                  0.250    0.436      0.262    0.443
Grade3               0.484    0.504      0.477    0.503
LEP                  0.250    0.436      0.154    0.364
Lunch                0.219    0.417      0.185    0.391
Male                 0.531    0.503      0.400    0.494
Pretest accuracy     0.932    0.087      0.914    0.102
Pretest speed        4.268    1.147      4.307    1.214

Outcome Data: Analytic Sample with No Imputation

Measure         Group           Units of Assignment    Units of Analysis    Mean     Standard Deviation
Fact Fluency    Comparison      4                      61                   4.557    1.143
Fact Fluency    Intervention    4                      61                   4.640    1.142

Background Data: Analytic Sample with No Imputation

                     Comparison          Intervention
Variable             Mean     SD         Mean     SD
Age                  8.436    0.556      8.394    0.568
ESE                  0.246    0.434      0.279    0.452
Grade3               0.492    0.504      0.459    0.502
LEP                  0.262    0.444      0.148    0.358
Lunch                0.230    0.424      0.180    0.388
Male                 0.541    0.502      0.410    0.496
Pretest accuracy     0.930    0.088      0.922    0.084
Pretest speed        4.254    1.170      4.360    1.177

3.4 Post-intervention Data and Findings

3.4.1 Analytic Sample

Because grand-mean-centered values were used for all Level-1 covariates and grade was the only Level-2 covariate other than condition, adjusted means for each group were estimated from the constant term of the HLM model, the grade3 coefficient of the HLM model, the average value of the grade3 variable across all students, and (in the case of the intervention group) the treatment coefficient of the HLM model. Standard deviations are unadjusted.

Outcome Data and Statistical Significance: Analytic Sample

                     Comparison Group                      Intervention Group                    Estimated Effect
Model                Students   adj. Mean   unadj. SD      Students   adj. Mean   unadj. SD      adj. Mean Difference   adj. t-score
Full Model           64         5.04        1.099          65         5.97        1.093          0.927***               5.753
Demographic Model    64         5.13        1.099          65         5.97        1.093          0.836***               4.966
Reduced Model        64         5.08        1.099          65         5.95        1.093          0.867***               4.343

Note: *p<0.1; **p<0.05; ***p<0.001
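As a rough check on how the adjusted means follow from the model terms, the sketch below combines the Full Model intercept, grade3 coefficient, and treatment coefficient reported in Appendix A. The grade3 share of 62/129 is an assumption inferred from the Grade3 means in the analytic-sample background table; the function name is hypothetical.

```python
def adjusted_means(intercept, grade3_coef, treatment_coef, grade3_mean):
    """Return (comparison, intervention) adjusted means.

    With grand-mean-centered level-1 covariates, only the intercept, the grade3
    term evaluated at its sample mean, and the treatment term contribute.
    """
    comparison = intercept + grade3_coef * grade3_mean
    intervention = comparison + treatment_coef
    return comparison, intervention


# Assumption: 62 of the 129 analytic-sample students were in grade 3.
grade3_mean = 62 / 129

# Full Model coefficients as reported in Appendix A.
comp_full, interv_full = adjusted_means(4.709, 0.695, 0.927, grade3_mean)
```

Rounded to two decimals, these values agree with the Full Model row of the table. Because the published coefficients are themselves rounded, other rows may differ from this calculation in the last digit.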

Effect size was calculated based on adjusted means, unadjusted pooled within-group standard deviations, and a correction $\omega = 1 - \frac{3}{4N - 9}$ for small sample size.

Estimation of Effect Size: Analytic Sample

Model                N      Adjusted Mean Difference    (unadj.) Pooled Within-Group SD    Effect Size (adj. Hedges' g)
Full Model           129    0.927***                    1.096                              0.84
Demographic Model    129    0.836***                    1.096                              0.76
Reduced Model        129    0.867***                    1.096                              0.79

Note: *p<0.1; **p<0.05; ***p<0.001
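The effect-size calculation can be sketched in a few lines. This is an illustrative reconstruction rather than the author's code: it assumes the standard pooled within-group standard deviation formula and applies the correction factor with N taken as the total sample size. The function names are hypothetical.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled within-group standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def hedges_g(mean_diff, sd_pooled, n_total):
    """Standardized mean difference with the correction 1 - 3 / (4N - 9)."""
    omega = 1 - 3 / (4 * n_total - 9)
    return omega * mean_diff / sd_pooled


# Unadjusted posttest SDs and group sizes for the analytic sample.
sd = pooled_sd(64, 1.099, 65, 1.093)

# Adjusted mean differences from the three models.
g_full = hedges_g(0.927, sd, 129)
g_demographic = hedges_g(0.836, sd, 129)
g_reduced = hedges_g(0.867, sd, 129)
```

With these inputs the pooled SD rounds to 1.096 and the three effect sizes round to the values in the table.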

3.4.2 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absent for the post assessment indicated that a complete-case analysis would substantially understate the effect of the intervention. The covariate-adjusted effect of the treatment on interim test scores was greater among students who missed the posttest than among those who were present for all three tests. This is borne out in the results of an analysis limited to those students for whom no imputation occurred.

Values for adjusted means for this subgroup were calculated by recentering all Level-1 covariates and generating a new HLM with the same structure as for the full analytic sample but using only those participants with no missing data.

Outcome Data and Statistical Significance: Analytic Sample with No Imputation

                     Comparison Group                      Intervention Group                    Estimated Effect
Model                Students   adj. Mean   unadj. SD      Students   adj. Mean   unadj. SD      adj. Mean Difference   adj. t-score
Full Model           61         5.07        1.12           61         5.94        1.11           0.867***               5.255
Demographic Model    61         5.15        1.12           61         5.94        1.11           0.787***               4.669
Reduced Model        61         5.09        1.12           61         5.92        1.11           0.828***               4.249

Note: *p<0.1; **p<0.05; ***p<0.001

Estimation of Effect Size: Analytic Sample with No Imputation

Model                N      Adjusted Mean Difference    (unadj.) Pooled Within-Group SD    Effect Size (adj. Hedges' g)
Full Model           122    0.867***                    1.113                              0.77
Demographic Model    122    0.787***                    1.113                              0.70
Reduced Model        122    0.828***                    1.113                              0.74

Note: *p<0.1; **p<0.05; ***p<0.001

3.5 Subpopulation Analyses

We analyzed subpopulations by grade. We also analyzed the subpopulation of students not designated as exceptional students. Due to the smaller sample sizes, the Reduced Model was used for these analyses, except that grade was removed as a variable for subpopulations of constant grade.

Statistical Significance and Estimation of Effect Size

Group                       N      Adjusted Mean Difference    (unadj.) Pooled Within-Group SD    Effect Size (adj. Hedges' g)    Adjusted t-score
Grade 2                     68     0.739**                     0.94                               0.78                            2.46
Grade 3                     63     0.877**                     1.05                               0.82                            2.47
Non-Exceptional Students    102    0.904***                    1.101                              0.89                            4.63

Note: *p<0.1; **p<0.05; ***p<0.001

4 Acknowledgment

We used R (Team 2013) for some of the analysis in this report, especially the lme4 package for fitting mixed models (Bates, Mächler, Bolker & Walker 2015). Other libraries utilized were dplyr, tidyr, and magrittr (Wickham & Francois 2016, Bache & Wickham 2014, Wickham 2016).

This document was typeset using LaTeX and makes use of the harvard, booktabs, multirow, graphicx, and url packages.

The stargazer package was used to generate LaTeX for several of the tables (Hlavac 2013).

References

Bache, S. M. & Wickham, H. (2014), magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr

Bates, D., Mächler, M., Bolker, B. & Walker, S. (2015), 'Fitting linear mixed-effects models using lme4', Journal of Statistical Software 67(1), 1-48.

Burns, M. K., VanDerHeyden, A. M. & Jiban, C. L. (2006), 'Assessing the instructional level for mathematics: A comparison of methods', School Psychology Review 35(3), 401.

Hintze, J. M., Christ, T. J. & Keller, L. A. (2002), 'The generalizability of CBM survey-level mathematics assessments: Just how many samples do we need?', School Psychology Review 31(4), 514.

Hlavac, M. (2013), 'stargazer: LaTeX code and ASCII text for well-formatted regression and summary statistics tables'. http://CRAN.R-project.org/package=stargazer

Osborne, J. (2005), 'Notes on the use of data transformations', Practical Assessment, Research and Evaluation 9(1), 42-50.

Stevens, O. & Leigh, E. (2012), 'Mathematics curriculum based measurement to predict state test performance: A comparison of measures and methods', ProQuest LLC.

Team, R. C. (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

UTA (2015), Multilevel modeling tutorial using SAS, Stata, HLM, R, SPSS, and Mplus, Technical report, The Department of Statistics and Data Sciences, The University of Texas at Austin. http://stat.utexas.edu/images/SSC/documents/SoftwareTutorials/MultilevelModeling.pdf

VanDerHeyden, A. M. & Burns, M. K. (2008), 'Examination of the utility of various measures of mathematics proficiency', Assessment for Effective Intervention.

Wickham, H. (2016), tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. https://CRAN.R-project.org/package=tidyr

Wickham, H. & Francois, R. (2016), dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr

Woltman, H., Feldstain, A., MacKay, J. C. & Rocchi, M. (2012), 'An introduction to hierarchical linear modeling', Tutorials in Quantitative Methods for Psychology 8(1), 52-69.

Appendix A Full Model

The table below describes the fixed-effects data of the full HLM for the analytic sample. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                     Coefficient    t-score
Intercept                   4.709***      25.719
cage                        0.136          0.887
cLEP                       -0.015         -0.111
cLunch                     -0.122         -0.981
cprescore                  -2.220         -0.932
cgender                    -0.148         -1.518
cESE                        0.054          0.272
cprespeed                   2.471          1.177
cpreaccuracy                0.750          1.313
treatment                   0.927***       5.753
grade3                      0.695***       3.335
cagey:treatment             0.076          0.469
cagey:grade3               -0.224         -1.134
cLEP:treatment             -0.021         -0.123
cLEP:grade3                -0.027         -0.156
cLunch:treatment            0.195          1.440
cLunch:grade3               0.174          1.269
cprescore:treatment         1.881          1.056
cprescore:grade3            2.097          0.885
cgender:treatment           0.096          0.836
cgender:grade3              0.180          1.564
cESE:treatment             -0.036         -0.204
cESE:grade3                -0.013         -0.073
cprespeed:treatment        -1.327         -0.849
cprespeed:grade3           -1.581         -0.753
cpreaccuracy:treatment     -0.712*        -1.657
cpreaccuracy:grade3        -0.608         -1.101

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B Demographic Model

The table below describes the fixed-effects data of the full HLM for the analytic sample but excludes the pretest features of speed and accuracy. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                  Coefficient    t-score
Intercept                4.906***      27.157
cage                     0.230          1.554
cLEP                     0.042          0.315
cLunch                  -0.128         -1.225
cprescore                0.756***       5.872
cgender                 -0.124         -1.283
cESE                     0.172          0.855
treatment                0.836***       4.966
grade3                   0.480**        2.235
cagey:treatment         -0.003         -0.022
cagey:grade3            -0.270         -1.378
cLEP:treatment          -0.059         -0.346
cLEP:grade3             -0.071         -0.408
cLunch:treatment         0.313**        2.631
cLunch:grade3            0.174          1.459
cprescore:treatment      0.003          0.020
cprescore:grade3         0.094          0.685
cgender:treatment        0.122          1.062
cgender:grade3           0.120          1.039
cESE:treatment          -0.131         -0.734
cESE:grade3             -0.117         -0.642

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix C Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sample, retaining only age, lunch status, and pretest score as level-1 covariates. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

This model used restricted maximum likelihood, as there were convergence problems when using maximum likelihood.

Factor                  Coefficient    t-score
Intercept                4.827***      25.683
cage                     0.227          1.146
cLunch                  -0.111         -0.952
cprescore                0.758***       5.488
treatment                0.867***       4.343
grade3                   0.527**        2.219
cagey:treatment         -0.031         -0.154
cagey:grade3            -0.168         -0.692
cLunch:treatment         0.268*         1.936
cLunch:grade3            0.153          1.109
cprescore:treatment     -0.042         -0.286
cprescore:grade3         0.100          0.677

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B: Sample Addition/Subtraction Probe

9 + 8     5 + 9     18 - 10   9 - 6     8 + 3     13 - 5
10 - 3    2 + 7     3 + 8     10 + 1    12 - 10   9 - 1
5 + 0     8 + 4     3 + 6     19 - 9    7 - 1     16 - 10   3 + 0
3 - 1     7 + 5     15 - 5    0 + 9     4 + 3     14 - 5    7 - 5
9 + 9     20 - 10   11 - 3    4 - 4     9 + 0     6 - 1     1 + 10
9 + 10    12 - 3    12 - 9    2 + 8     5 - 0     9 - 4     5 + 0
4 + 4     14 - 9    7 - 0     11 - 8    7 + 0     4 - 1     6 + 5
2 + 3     12 - 5    14 - 5    4 - 4     10 + 10   1 + 0     7 + 2
13 - 6    10 + 10   3 + 6     9 - 6     17 - 7    10 + 10   3 + 6
4 + 9     10 + 2    10 + 10   3 - 0     5 + 3     5 - 5     10 - 10


We also calculated delayed alternate-form reliability of the final fluency score across each grade-by-condition cohort and found an average value of 0.71.

Table 6: Delayed Alternate-Form Reliability (14 weeks)

                Addition/Subtraction    Multiplication/Division
Intervention    0.77                    0.47
Comparison      0.72                    0.89

The relatively poor value for the 3rd grade intervention group may be due to large variation in dosage. The standard deviation in weekly usage across 3rd grade intervention groups was 1.24 days/week, nearly twice that of the 2nd grade intervention group, where the standard deviation was 0.65 days/week.

When dosage was added to the model predicting posttest score from pretest score, the agreement between the two intervention groups improved considerably. The coefficients of multiple correlation were R = 0.81 and R = 0.77 for the 2nd and 3rd grade intervention groups, respectively.

2.5 Analytic Approach

Since randomized assignment occurred at the class level, we used an HLM modeling approach to account for cluster effects when analyzing the relationship between condition and posttest fluency. The model has two levels: grade and condition are level-2 variables, and all other covariates are level-1 variables. We used grand-mean-centered values for the lower-level variables and a maximum-likelihood method for determining the random effects. If the search for a model did not converge using maximum likelihood, restricted maximum likelihood was used instead.
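One way to write down a two-level model of this kind is sketched below. This is an illustrative form, not the report's stated specification: the random intercept for class and the absence of random slopes are assumptions, and the cross-level terms are included only because the appendix tables list interactions of each level-1 covariate with treatment and grade3. The symbols are introduced here for illustration.

```latex
% Level 1: student i in class j; x_{kij} are grand-mean-centered level-1 covariates
y_{ij} = \beta_{0j} + \sum_{k} \beta_{kj}\, x_{kij} + e_{ij}

% Level 2: class-level predictors of the intercept (random intercept u_{0j} assumed)
\beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{treatment}_j + \gamma_{02}\,\mathrm{grade3}_j + u_{0j}

% Level 2: class-level predictors of each slope (no random slopes assumed)
\beta_{kj} = \gamma_{k0} + \gamma_{k1}\,\mathrm{treatment}_j + \gamma_{k2}\,\mathrm{grade3}_j
```

In this notation, $\gamma_{01}$ corresponds to the treatment coefficient used for effect-size estimation, and $\gamma_{00}$ and $\gamma_{02}$ correspond to the intercept and grade3 coefficient.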

Models were constructed using R's lmer function, part of the lme4 library, using the methodology for two-tier HLM models documented in a technical report from the Department of Statistics and Data Sciences, The University of Texas at Austin (UTA 2015), which showed the similarity in results to those given by SPSS, SAS, Mplus, and HLM.

We formed 3 models of decreasing complexity and calculated an effect size and statistical significance based on each.

The first model uses the same structure as that used in the original version of this report. In this model, all dichotomous and numeric covariates were used (i.e., all covariates other than race, which was polynominal), including the pretest accuracy and pretest speed. This model is the most inclusive and allows for continuity between the original version of this report and the current version. It is denoted as the Full Model.

For the data available at the time of the original report, the pretest speed and pretest accuracy were both highly significant (p < 0.001). But after removing students who did not respect the time limits on the pretest or were designated as being below grade level before the study began, these additional pretest features were no longer statistically significant. A nested-model chi-squared test comparing change in deviance to change in degrees of freedom did not show a statistically significant improvement upon adding either of these terms. Thus, we generated a new model lacking these two pretest features but retaining all the demographic covariates of the original. This model is denoted in the sequel as the Demographic Model.
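A nested-model test of this kind compares the drop in deviance to a chi-squared distribution whose degrees of freedom equal the number of added parameters. The sketch below uses only the standard library and therefore supports just 1 or 2 degrees of freedom, where closed forms exist. The deviance values are hypothetical placeholders, not figures from this study, and the function names are illustrative.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable (closed forms for df = 1, 2)."""
    x = max(x, 0.0)
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def nested_model_p_value(deviance_reduced, deviance_full, extra_params):
    """p-value for the improvement of the fuller model over the nested one."""
    return chi2_upper_tail(deviance_reduced - deviance_full, extra_params)


# Hypothetical deviances for a model without and with two extra terms.
p_value = nested_model_p_value(400.0, 398.2, 2)
```

A p-value above the chosen threshold, as in this hypothetical case, would indicate that the added terms do not significantly improve fit.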

In an effort to simplify the model further, we assessed the relevance of each of the demographic variables by generating an HLM with the following characteristics:

• No Level-2 variables

• Two Level-1 variables: the covariate in question and pretest score

• Group-mean-centered values

• Data scaled to be univariate
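Group-mean centering, as listed above, subtracts each group's own mean from its members' values, in contrast to grand-mean centering, which subtracts the overall mean. A minimal sketch follows, assuming the grouping unit is the class (the level at which assignment occurred); the function name and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_mean_center(values, groups):
    """Subtract from each value the mean of the group it belongs to."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    group_means = {group: mean(vals) for group, vals in by_group.items()}
    return [value - group_means[group] for value, group in zip(values, groups)]


# Hypothetical pretest scores for students in two classes.
centered = group_mean_center([1, 3, 10, 14], ["class_a", "class_a", "class_b", "class_b"])
```

After centering, each class has mean zero, so the covariate carries only within-class variation.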

This method was selected for determining the relevance of a given level-1 factor based on Woltman, Feldstain, MacKay & Rocchi's (2012) presentation. The results are shown in Table 7. Note that this was the only analysis using group-mean-centered data. The models used for determining intervention effect and statistical significance used grand-mean-centered level-1 variables.

Table 7: Impact and significance of demographic covariates

Covariate    Coefficient    t-score
age           0.028          0.506
gender       -0.005         -0.096
LEP           0.042          0.612
Lunch         0.080          1.249
ESE          -0.006         -0.100

Given their very low coefficients and t-scores, we removed gender and ESE. Upon forming the full HLM using the remaining covariates, it was found that LEP had very little impact (coefficient = 0.03) and low significance (t = 0.24), so it was dropped as well. In the resulting model, all covariates had t-scores greater than 0.9 in magnitude and standardized coefficients greater than 0.1. There was a nearly statistically significant interaction (t = 1.94) between condition and whether the student was on free or reduced lunch.

This final model is denoted as the Reduced Model. All three models are provided in the Appendix.

Effect sizes were calculated from the coefficient for the intervention effect from each HLM model and the pooled within-group standard deviation of unadjusted posttest scores.

Statistical significance was determined based on the t-score of the multilevel model.

2.6 Statistical Adjustments

We used all demographic information provided except race, which was non-binary and correlated significantly with other demographic information (R between 0.36 and 0.46 for the three most prevalent races in our sample).

Grade was coded as grade3, a variable equal to 1 if the student was in grade 3 and 0 otherwise.

Age was measured in years as of the pretest administration.

Gender was coded as a variable male, equal to 1 if the student was male and 0 if the student was female.

Low English proficiency was coded as a variable LEP, equal to 1 if the school indicated the student had low English proficiency.

Exceptional Student Status was determined based on the school's designation of the student as being within an Exceptional Student Education program. It was coded as a variable ESE, equal to 1 if the school specified the student as belonging to an ESE program. The state of Florida specifies several ESE programs, one of which is a program for gifted students. For our study, it appears this program furnished the large majority of ESE designations, as 29 of the 36 students designated as ESE were concentrated in two high-achieving classes. In grade 2, every ESE-designated student was in a single class.

Eligibility for free or reduced lunch was coded as a variable lunch, equal to 1 if the student was eligible.

As described in the Fluency Score Calculation subsection, fluency was evaluated based on a research-supported combination of speed and accuracy, normalized to reduce skewness via a Box-Cox transformation. This means that a student's fluency score depends on personal characteristics such as confidence, sense of urgency on a pen-and-paper test, and attention to accuracy, so students differ markedly in potential for improvement.

Pretest accuracy is the ratio of correct digits to the sum of correct and incorrect digits.

Pretest score is defined as $\sqrt{C - I + 2}$, where $C$ is digits correct per minute and $I$ is digits incorrect per minute.

Pretest speed is defined in a manner analogous to pretest score, $\sqrt{C - 2}$, where $C$ is digits correct per minute. In this expression, 2 is subtracted rather than added so that the expression is anchored at 1, conforming to best practices (Osborne 2005).
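The three pretest metrics can be sketched directly from the definitions above. The function names are hypothetical, the inputs in the example are made-up values chosen to give round results, and the speed formula follows the text as written (digits correct per minute minus 2).

```python
import math


def pretest_accuracy(correct_digits: float, incorrect_digits: float) -> float:
    """Ratio of correct digits to all digits written."""
    return correct_digits / (correct_digits + incorrect_digits)


def pretest_score(dc_per_min: float, di_per_min: float) -> float:
    """Square-root-transformed fluency score: sqrt(C - I + 2)."""
    return math.sqrt(dc_per_min - di_per_min + 2)


def pretest_speed(dc_per_min: float) -> float:
    """Square-root-transformed speed: sqrt(C - 2)."""
    return math.sqrt(dc_per_min - 2)


# Hypothetical student: 25 digits correct and 2 digits incorrect per minute.
example_score = pretest_score(25, 2)
example_speed = pretest_speed(18)
example_accuracy = pretest_accuracy(46, 4)
```

The square root compresses the upper end of the scale, which is the skew-reducing role the text assigns to the Box-Cox transformation.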

All student-level covariates were scaled to be univariate and grand-mean centered for improved interpretability and model convergence.
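A minimal sketch of this preprocessing step follows. It reads "scaled to be univariate" as scaling to unit variance, which is an assumption about the intended meaning, and it uses the sample standard deviation; the function name and the example ages are hypothetical.

```python
from statistics import mean, stdev


def grand_mean_standardize(values):
    """Center values on the overall mean and scale them to unit standard deviation."""
    grand_mean = mean(values)
    spread = stdev(values)  # sample standard deviation (assumed choice)
    return [(value - grand_mean) / spread for value in values]


# Hypothetical ages in years for three students.
c_age = grand_mean_standardize([8.0, 8.5, 9.0])
```

With covariates on this scale, each coefficient describes the change in the outcome per standard deviation of the covariate, and the intercept describes a student at the sample average.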

Speed, score, and accuracy on the interim administration were considered during the regression process used to impute missing data, as described in the Missing Data section. These metrics are calculated exactly as for the pretest, using the same formula (i.e., the data were not re-anchored for the Box-Cox transformation).

An HLM model was used to calculate statistical significance for the entiresample so no adjustment for cluster effects were necessary We only analyzedone outcome for this study so no adjustment was made for multiple outcomes

27 Students Removed from Study

Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

Four of these ten (3 from intervention 1 from comparison) were excludedbecause their teacher indicated they were sufficiently below grade level that

14

Table 8 Descriptive Statistics of Control Variables

Control Variable Mean SD Skew Kurtosis

Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

Five of these ten (all from comparison) were excluded because they didnot stop when time was called on the pretest In three cases these studentshad higher values on their pretest than on their posttest

One grade 2 student from the Intervention condition was noted as ap-pearing frustrated and not working on the pretest He had the fourth lowestfluency score of all 2nd-grade participants on the pretest and showed dra-matic improvement by the interim assessment on which he scored at the 33rdpercentile within his grade According to Reflexrsquo internal initial testing thestudent had pre-existing automaticity for 171 of the addition facts within20 and had basic recall ability with 599 This suggests his pretest scoreunder-estimated his actual ability and he was removed from the analysis forfear of artificially inflating the impact of the intervention Note that thisstudent was absent from the final administration

28 Missing Data

Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

15

established by Burns et al (2006)

Table 9 Categorization of Students

Fluency (dcmin) Category N

Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

281 Frustration Level

One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

282 Instructional Level

Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

29 Mastery Level

There were no students in the mastery level for whom imputation was nec-essary

16

3 Study Data

Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

The tables in this section report unscaled uncentered values for ease ofinterpretability

31 Pre-Intervention DatamdashAll Pretest Takers

This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

Outcome Data

Measure Comparison Group Intervention Group

Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

Fluency Score 4 70 444 117 4 70 449 126

Background Data

VariableComparison Intervention

Mean SD Mean SD

Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

17

32 Pre-Intervention DatamdashBaseline Sample

This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

Outcome Data

Measure Comparison Group Intervention Group

Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

Fluency Score 4 64 457 112 4 66 460 120

Background Data

VariableComparison Intervention

Mean SD Mean SD

Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

18

33 Pre-intervention Data Analytic Sample

Outcome DatamdashAnalytic Sample

Measure Comparison Group Intervention Group

Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

Fact Fluency 4 64 4573 1121 4 65 4580 1195

Background DatamdashAnalytic Sample

VariableComparison Intervention

Mean SD Mean SD

Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

Outcome DatamdashAnalytic Sample with No Imputation

Measure Comparison Group Intervention Group

Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

Fact Fluency 4 61 4557 1143 4 61 4640 1142

19

Background DatamdashAnalytic Sample with No Imputation

VariableComparison Intervention

Mean SD Mean SD

Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

34 Post-intervention Data and Findings

341 Analytic Sample

As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

Effect size was calculated based on adjusted means, unadjusted pooled within-group standard deviations, and a small-sample correction $\omega = 1 - \frac{3}{4N - 9}$ applied to the effect size.

Estimation of Effect Size: Analytic Sample

Model                N      Adjusted Mean   (unadj.) Pooled    Effect Size
                            Difference      Within-Group SD    (adj. Hedges' g)
Full Model           129    0.927***        1.096              0.84
Demographic Model    129    0.836***        1.096              0.76
Reduced Model        129    0.867***        1.096              0.79

Note: *p<0.1; **p<0.05; ***p<0.001
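The effect-size arithmetic can be checked with a short Python sketch (illustrative only; the study itself used R). The function names are hypothetical. The pooled standard deviation uses the standard pooled within-group formula, which is an assumption about how the reported 1.096 was obtained, and the correction is the ω = 1 − 3/(4N − 9) factor described above.

```python
import math


def pooled_within_group_sd(n1, sd1, n2, sd2):
    """Pooled within-group SD from two group sizes and unadjusted SDs."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


def adjusted_hedges_g(mean_difference, pooled_sd, n_total):
    """Hedges' g with the small-sample correction omega = 1 - 3 / (4N - 9)."""
    omega = 1 - 3 / (4 * n_total - 9)
    return omega * mean_difference / pooled_sd


# Values from the analytic-sample tables (64 comparison, 65 intervention).
sd = pooled_within_group_sd(64, 1.099, 65, 1.093)
for label, diff in [("Full", 0.927), ("Demographic", 0.836), ("Reduced", 0.867)]:
    print(label, round(adjusted_hedges_g(diff, sd, 129), 2))
```

Keeping the correction as a separate factor makes it easy to see how little it changes the estimate at N = 129.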

3.4.2 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absent for the post assessment indicated that a complete-case analysis would substantially understate the effect of the intervention. The covariate-adjusted effect of the treatment on interim test scores was greater among students who missed the post test than among those who were present for all three tests. This is borne out in the results of an analysis limited to those students for whom no imputation occurred.

Values for adjusted means for this subgroup were calculated by recentering all Level-1 covariates and generating a new HLM with the same structure as for the full analytic sample but using only those participants with no missing data.

Outcome Data and Statistical Significance: Analytic Sample with No Imputation

                     Comparison Group             Intervention Group           Estimated Effect
Model                Students  adj.    unadj.     Students  adj.    unadj.     adj. Mean     adj.
                               Mean    SD                   Mean    SD         Difference    t-score
Full Model           61        5.07    1.12       61        5.94    1.11       0.867***      5.255
Demographic Model    61        5.15    1.12       61        5.94    1.11       0.787***      4.669
Reduced Model        61        5.09    1.12       61        5.92    1.11       0.828***      4.249

Note: *p<0.1; **p<0.05; ***p<0.001

Estimation of Effect Size: Analytic Sample with No Imputation

Model                N      Adjusted Mean   (unadj.) Pooled    Effect Size
                            Difference      Within-Group SD    (adj. Hedges' g)
Full Model           122    0.867***        1.113              0.77
Demographic Model    122    0.787***        1.113              0.70
Reduced Model        122    0.828***        1.113              0.74

Note: *p<0.1; **p<0.05; ***p<0.001

3.5 Subpopulation Analyses

We analyzed sub-populations by grade. We also analyzed the sub-population of students not designated as exceptional students. Due to the smaller sample sizes, the Reduced Model was used for the analyses, except that grade was removed as a variable for subpopulations of constant grade.

Statistical Significance and Estimation of Effect Size

Group                      N      Adjusted Mean   (unadj.) Pooled    Effect Size        Adjusted
                                  Difference      Within-Group SD    (adj. Hedges' g)   t-score
Grade 2                    68     0.739**         0.94               0.78               2.46
Grade 3                    63     0.877**         1.05               0.82               2.47
Non-Exceptional Students   102    0.904***        1.101              0.89               4.63

Note: *p<0.1; **p<0.05; ***p<0.001

4 Acknowledgment

We used R (R Core Team 2013) for some of the analysis in this report, especially the lme4 package for fitting mixed models (Bates, Mächler, Bolker & Walker 2015). Other libraries utilized were dplyr, tidyr, and magrittr (Wickham & Francois 2016, Bache & Wickham 2014, Wickham 2016).

This document was typeset using LaTeX and makes use of the harvard, booktabs, multirow, graphicx, and url packages.

The stargazer package was used to generate LaTeX for several of the tables (Hlavac 2013).

References

Bache, S. M. & Wickham, H. (2014), magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr

Bates, D., Mächler, M., Bolker, B. & Walker, S. (2015), 'Fitting linear mixed-effects models using lme4', Journal of Statistical Software 67(1), 1–48.

Burns, M. K., VanDerHeyden, A. M. & Jiban, C. L. (2006), 'Assessing the instructional level for mathematics: A comparison of methods', School Psychology Review 35(3), 401.

Hintze, J. M., Christ, T. J. & Keller, L. A. (2002), 'The generalizability of CBM survey-level mathematics assessments: Just how many samples do we need?', School Psychology Review 31(4), 514.

Hlavac, M. (2013), 'stargazer: LaTeX code and ASCII text for well-formatted regression and summary statistics tables'. http://CRAN.R-project.org/package=stargazer

Osborne, J. (2005), 'Notes on the use of data transformations', Practical Assessment, Research and Evaluation 9(1), 42–50.

Stevens, O. & Leigh, E. (2012), 'Mathematics curriculum based measurement to predict state test performance: A comparison of measures and methods', ProQuest LLC.

R Core Team (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

UTA (2015), Multilevel modeling tutorial using SAS, Stata, HLM, R, SPSS, and Mplus, Technical report, The Department of Statistics and Data Sciences, The University of Texas at Austin. http://stat.utexas.edu/images/SSC/documents/SoftwareTutorials/MultilevelModeling.pdf

VanDerHeyden, A. M. & Burns, M. K. (2008), 'Examination of the utility of various measures of mathematics proficiency', Assessment for Effective Intervention.

Wickham, H. (2016), tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. https://CRAN.R-project.org/package=tidyr

Wickham, H. & Francois, R. (2016), dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr

Woltman, H., Feldstain, A., MacKay, J. C. & Rocchi, M. (2012), 'An introduction to hierarchical linear modeling', Tutorials in Quantitative Methods for Psychology 8(1), 52–69.

Appendix A: Full Model

The table below describes the fixed-effects data of the full HLM for the analytic sample. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                    Coefficient   t-score
Intercept                  4.709***     25.719
cage                       0.136         0.887
cLEP                      -0.015        -0.111
cLunch                    -0.122        -0.981
cprescore                 -2.220        -0.932
cgender                   -0.148        -1.518
cESE                       0.054         0.272
cprespeed                  2.471         1.177
cpreaccuracy               0.750         1.313
treatment                  0.927***      5.753
grade3                     0.695***      3.335
cagey:treatment            0.076         0.469
cagey:grade3              -0.224        -1.134
cLEP:treatment            -0.021        -0.123
cLEP:grade3               -0.027        -0.156
cLunch:treatment           0.195         1.440
cLunch:grade3              0.174         1.269
cprescore:treatment        1.881         1.056
cprescore:grade3           2.097         0.885
cgender:treatment          0.096         0.836
cgender:grade3             0.180         1.564
cESE:treatment            -0.036        -0.204
cESE:grade3               -0.013        -0.073
cprespeed:treatment       -1.327        -0.849
cprespeed:grade3          -1.581        -0.753
cpreaccuracy:treatment    -0.712*       -1.657
cpreaccuracy:grade3       -0.608        -1.101

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B: Demographic Model

The table below describes the fixed-effects data of the full HLM for the analytic sample but excludes the pretest features of speed and accuracy. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                    Coefficient   t-score
Intercept                  4.906***     27.157
cage                       0.230         1.554
cLEP                       0.042         0.315
cLunch                    -0.128        -1.225
cprescore                  0.756***      5.872
cgender                   -0.124        -1.283
cESE                       0.172         0.855
treatment                  0.836***      4.966
grade3                     0.480**       2.235
cagey:treatment           -0.003        -0.022
cagey:grade3              -0.270        -1.378
cLEP:treatment            -0.059        -0.346
cLEP:grade3               -0.071        -0.408
cLunch:treatment           0.313**       2.631
cLunch:grade3              0.174         1.459
cprescore:treatment        0.003         0.020
cprescore:grade3           0.094         0.685
cgender:treatment          0.122         1.062
cgender:grade3             0.120         1.039
cESE:treatment            -0.131        -0.734
cESE:grade3               -0.117        -0.642

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix C: Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sample retaining only age. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

This model used Restricted Maximum Likelihood, as there were convergence problems when using maximum likelihood.

Factor                    Coefficient   t-score
Intercept                  4.827***     25.683
cage                       0.227         1.146
cLunch                    -0.111        -0.952
cprescore                  0.758***      5.488
treatment                  0.867***      4.343
grade3                     0.527**       2.219
cagey:treatment           -0.031        -0.154
cagey:grade3              -0.168        -0.692
cLunch:treatment           0.268*        1.936
cLunch:grade3              0.153         1.109
cprescore:treatment       -0.042        -0.286
cprescore:grade3           0.100         0.677

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B: Sample Addition/Subtraction Probe

 9 + 8      5 + 9     18 - 10     9 - 6      8 + 3     13 - 5
10 - 3      2 + 7      3 + 8     10 + 1     12 - 10     9 - 1
 5 + 0      8 + 4      3 + 6     19 - 9      7 - 1     16 - 10     3 + 0
 3 - 1      7 + 5     15 - 5      0 + 9      4 + 3     14 - 5      7 - 5
 9 + 9     20 - 10    11 - 3      4 - 4      9 + 0      6 - 1      1 + 10
 9 + 10    12 - 3     12 - 9      2 + 8      5 - 0      9 - 4      5 + 0
 4 + 4     14 - 9      7 - 0     11 - 8      7 + 0      4 - 1      6 + 5
 2 + 3     12 - 5     14 - 5      4 - 4     10 + 10     1 + 0      7 + 2
13 - 6     10 + 10     3 + 6      9 - 6     17 - 7     10 + 10     3 + 6
 4 + 9     10 + 2     10 + 10     3 - 0      5 + 3      5 - 5     10 - 10


The first model uses the same structure as that used in the original version of this report. In this model, all dichotomous and numeric covariates were used (i.e., all covariates other than race, which was polynominal), including the pretest accuracy and pretest speed. This model is most inclusive and allows for continuity between the original version of this report and the current version. It is denoted as the Full Model.

For the data available at the time of the original report, the pretest speed and pretest accuracy were both highly significant (p < 0.001). But after removing students who did not respect the time limits on the pretest or were designated as being below grade level before the study began, these additional pretest features were no longer statistically significant. A nested-model χ-squared test comparing change in deviance to change in degrees of freedom did not show a statistically significant improvement upon adding either of these terms. Thus we generated a new model lacking these two pretest features but retaining all the demographic covariates of the original. This model is denoted in the sequel as the Demographic Model.
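The nested-model test described above compares the drop in deviance against a χ-squared distribution whose degrees of freedom equal the number of added terms. A minimal Python sketch follows, for illustration only: the report does not give the deviances, so the numbers below are hypothetical, the function names are invented for this example, and the tail probability is implemented only for 1 or 2 added parameters to keep the code dependency-free.

```python
import math


def chi2_upper_tail(statistic, df):
    """Upper-tail probability of a chi-squared variable (df = 1 or 2 only)."""
    if statistic < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        return math.exp(-statistic / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")


def nested_model_p_value(deviance_smaller, deviance_larger, added_params):
    """p-value for the deviance drop when added_params terms are added."""
    drop = max(deviance_smaller - deviance_larger, 0.0)
    return chi2_upper_tail(drop, added_params)


# Hypothetical deviances (not from the study): adding one pretest feature.
p = nested_model_p_value(deviance_smaller=402.6, deviance_larger=400.9,
                         added_params=1)
print(f"p = {p:.3f}")
```

A large p-value here would mean the added term does not improve fit enough to justify the extra parameter, which is the situation the paragraph above describes for pretest speed and pretest accuracy.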

In an effort to simplify the model further, we assessed the relevance of each of the demographic variables by generating an HLM with the following characteristics:

• No Level-2 variables

• Two Level-1 variables: the covariate in question and pretest score

• Group-mean-centered values

• Data scaled to be univariate

This method was selected for determining the relevance of a given level-1 factor based on Woltman, Feldstain, MacKay & Rocchi's (2012) presentation. The results are shown in Table 7. Note that this was the only analysis using group-mean-centered data. The models used for determining intervention effect and statistical significance used grand-mean-centered level-1 variables.

Table 7: Impact and significance of demographic covariates

Covariate    Coefficient   t-score
age           0.028         0.506
gender       -0.005        -0.096
LEP           0.042         0.612
Lunch         0.080         1.249
ESE          -0.006        -0.100

Given their very low coefficients and t-scores, we removed gender and ESE. Upon forming the full HLM using the remaining covariates, it was found that LEP had very little impact (coefficient = 0.03) and low significance (t = 0.24), so it was dropped as well. In the resulting model, all covariates had t-scores greater than 0.9 in magnitude and standardized coefficients greater than 0.1. There was a nearly statistically significant interaction (t = 1.94) between condition and whether the student was on free or reduced lunch.

This final model is denoted as the Reduced Model.

All three models are provided in the Appendix.

Effect sizes were calculated from the coefficient for the intervention effect from each HLM model and the pooled within-group standard deviation of unadjusted post-test scores.

Statistical significance was determined based on the t-score of the multilevel model.

2.6 Statistical Adjustments

We used all demographic information provided except race, which was non-binary and correlated significantly with other demographic information (R between 0.36 and 0.46 for the three most prevalent races in our sample).

Grade was coded as grade3, a variable equal to 1 if the student was in grade 3 and 0 otherwise.

Age was measured in years as of the pretest administration.

Gender was coded as a variable male, equal to 1 if the student was male and 0 if the student was female.

Low English proficiency was coded as a variable LEP, equal to 1 if the school indicated the student had low English proficiency.

Exceptional Student Status was determined based on the school's designation of the student as being within an Exceptional Student Education program. It was coded as a variable ESE, equal to 1 if the school specified the student as belonging to an ESE program. The state of Florida specifies several ESE programs, one of which is a program for gifted students. For our study, it appears this program furnished the large majority of ESE designations, as 29 of the 36 students designated as ESE were concentrated in two high-achieving classes. In grade 2, every ESE-designated student was in a single class.

Eligibility for free or reduced lunch was coded as a variable lunch, equal to 1 if the student was eligible.

As described in the Fluency Score Calculation subsection, fluency was evaluated based on a research-supported combination of speed and accuracy, normalized to reduce skewness via a Box-Cox transformation. This means that a student's fluency score depends on personal characteristics such as confidence, sense of urgency on a pen-and-paper test, and attention to accuracy, so students differ markedly in potential for improvement.

Pretest accuracy is the ratio of correct digits to the sum of correct and incorrect digits.

Pretest score is defined as $\sqrt{C - I + 2}$, where $C$ is digits correct per minute and $I$ is digits incorrect per minute.

Pretest speed is defined in a manner analogous to pretest score, as $\sqrt{C - 2}$, where $C$ is digits correct per minute. In this expression, 2 is subtracted rather than added so that the expression is anchored at 1, conforming to best practices (Osborne 2005).
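The three pretest metrics defined above are simple enough to state as code. This is a minimal Python sketch for illustration; the function names are hypothetical and the sample inputs are made up rather than taken from the study data.

```python
import math


def pretest_accuracy(correct_digits, incorrect_digits):
    """Ratio of correct digits to all digits written."""
    total = correct_digits + incorrect_digits
    if total == 0:
        raise ValueError("no digits were written")
    return correct_digits / total


def pretest_score(correct_per_min, incorrect_per_min):
    """Fluency score: sqrt(C - I + 2)."""
    return math.sqrt(correct_per_min - incorrect_per_min + 2)


def pretest_speed(correct_per_min):
    """Speed score: sqrt(C - 2)."""
    return math.sqrt(correct_per_min - 2)


# Made-up example: 24 digits correct and 1 incorrect per minute.
print(pretest_score(24, 1))      # sqrt(25)
print(pretest_speed(18))         # sqrt(16)
print(pretest_accuracy(46, 4))   # 46 / 50
```

Note that `math.sqrt` raises `ValueError` for a negative argument, so inputs that fall below the anchoring assumed by the formulas are rejected rather than silently mapped to a score.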

All student-level covariates were scaled to be univariate and grand-mean centered for improved interpretability and model convergence.
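The difference between the grand-mean centering used for the main models and the group-mean centering used for the covariate screening can be sketched in Python as follows. This is an illustration under stated assumptions: "scaled to be univariate" is read here as scaling to unit variance (using the sample standard deviation), the function names are hypothetical, and the ages and class labels are invented.

```python
from collections import defaultdict
from statistics import mean, stdev


def grand_mean_standardize(values):
    """Center on the overall mean and scale to unit sample variance."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def group_mean_center(values, groups):
    """Center each value on the mean of its own group (e.g. its class)."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    return [v - group_means[g] for v, g in zip(values, groups)]


# Invented data: ages of four students in two classes.
ages = [8.0, 8.5, 9.0, 7.5]
classes = ["a", "a", "b", "b"]
print(grand_mean_standardize(ages))
print(group_mean_center(ages, classes))
```

Grand-mean centering keeps between-class differences in the covariate, while group-mean centering removes them, which is why the two choices answer different questions in a multilevel model.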

Speed, score, and accuracy on the interim administration were considered during the regression process used to impute missing data, as described in the Missing Data section. These metrics are calculated exactly as for the pretest, using the same formula (i.e., the data was not re-anchored for the Box-Cox transformation).

An HLM model was used to calculate statistical significance for the entire sample, so no adjustment for cluster effects was necessary. We only analyzed one outcome for this study, so no adjustment was made for multiple outcomes.

2.7 Students Removed from Study

Ten students, 4 from the intervention condition and 6 from the comparison condition, were excluded from the analysis. In all cases, the decision to exclude was based on information obtained on the day of the pretest.

Four of these ten (3 from intervention, 1 from comparison) were excluded because their teacher indicated they were sufficiently below grade level that they would not receive typical instruction in math fact fluency. This determination was provided on the day of the pretest.

Five of these ten (all from comparison) were excluded because they did not stop when time was called on the pretest. In three cases, these students had higher values on their pretest than on their posttest.

One grade 2 student from the intervention condition was noted as appearing frustrated and not working on the pretest. He had the fourth-lowest fluency score of all 2nd-grade participants on the pretest and showed dramatic improvement by the interim assessment, on which he scored at the 33rd percentile within his grade. According to Reflex's internal initial testing, the student had pre-existing automaticity for 17.1% of the addition facts within 20 and had basic recall ability with 59.9%. This suggests his pretest score underestimated his actual ability, and he was removed from the analysis for fear of artificially inflating the impact of the intervention. Note that this student was absent from the final administration.

Table 8: Descriptive Statistics of Control Variables

Control Variable         Mean    SD      Skew     Kurtosis
Grade3                   0.48    0.50     0.08    -2.03
Age                      8.42    0.57     0.03    -0.95
Male                     0.47    0.50     0.14    -2.01
LEP                      0.20    0.40     1.51     0.27
ESE                      0.26    0.44     1.13    -0.73
Lunch                    0.20    0.40     1.51     0.27
Pretest Accuracy         0.92    0.09    -2.63     8.23
Pretest Speed            4.29    1.18     0.18     0.98
Pretest Score            4.58    1.15     0.08     0.85
Interim Accuracy         0.94    0.06    -1.84     3.59
Interim Speed            5.07    1.20     0.55     0.42
Interim Fluency Score    5.31    1.18     0.46     0.39

2.8 Missing Data

Eight students, 5 from the treatment group and 3 from the comparison group, were absent for the final administration. Seven of the students had taken the interim assessment. No values were imputed for the student who missed both the interim and the final assessment. For the seven who had attended the interim test, we imputed posttest values using a multilinear regression based on students in the same instructional level group, using the thresholds established by Burns et al. (2006).

Table 9: Categorization of Students

Fluency (dc/min)    Category               N
Less than 14        Frustration Level      29 (22%)
14-31               Instructional Level    81 (63%)
Greater than 31     Mastery Level          19 (15%)
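The grouping in Table 9 amounts to a threshold lookup on digits correct per minute. A minimal Python sketch, for illustration only: the function name is hypothetical, the example rates are invented, and treating both 14 and 31 as belonging to the Instructional Level follows the table's "14-31" row.

```python
from collections import Counter


def instructional_category(dc_per_min):
    """Map digits correct per minute to the Table 9 category."""
    if dc_per_min < 14:
        return "Frustration Level"
    if dc_per_min <= 31:
        return "Instructional Level"
    return "Mastery Level"


# Invented fluency rates for a handful of students.
rates = [9.5, 14, 22.0, 31, 35.5]
print(Counter(instructional_category(r) for r in rates))
```

Grouping students this way before imputation means each regression is fitted only on students at a comparable level of fluency.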

All available data (demographic data, pretest data, and interim test data) were used to impute posttest scores using an OLS regression that retained only statistically significant regressors.

2.8.1 Frustration Level

One of the seven students for whom posttest scores were imputed was in the frustration level. For that group, age (t = 2.6), pretest accuracy (t = -3.2), and interim fluency score (t = 6.2) were the statistically significant regressors.

2.8.2 Instructional Level

Six of the seven students for whom posttest scores were imputed were in the instructional level. Among students in that level, grade (t = 4.2), interim accuracy (t = -2.6), and interim fluency score (t = 9.5) were statistically significant.

2.8.3 Mastery Level

There were no students in the mastery level for whom imputation was necessary.

3 Study Data

Tables compose the large majority of this section. They are organized by table title and subsection title rather than by use of numbers.

The tables in this section report unscaled, uncentered values for ease of interpretability.

3.1 Pre-Intervention Data: All Pretest Takers

This section provides data on all students who took the pretest, including those that were formally removed from the analysis.

Outcome Data

                                Sample Sizes                Sample Characteristics
Measure         Group           Unit of       Unit of       Mean     Standard
                                Assignment    Analysis               Deviation
Fluency Score   Comparison      4             70            4.44     1.17
Fluency Score   Intervention    4             70            4.49     1.26

Background Data

Variable            Comparison        Intervention
                    Mean     SD       Mean     SD
Age                 8.440    0.598    8.442    0.602
ESE                 0.271    0.448    0.243    0.432
Male                0.500    0.504    0.400    0.493
Grade3              0.486    0.503    0.486    0.503
LEP                 0.243    0.432    0.171    0.380
Lunch               0.229    0.423    0.186    0.392
Pretest accuracy    0.919    0.098    0.909    0.113
Pretest speed       4.145    1.186    4.220    1.257

3.2 Pre-Intervention Data: Baseline Sample

This section includes all students who were formally part of the analysis, including those who were absent for the posttest.

Outcome Data

                                Sample Sizes                Sample Characteristics
Measure         Group           Unit of       Unit of       Mean     Standard
                                Assignment    Analysis               Deviation
Fluency Score   Comparison      4             64            4.57     1.12
Fluency Score   Intervention    4             66            4.60     1.20

Background Data

Variable            Comparison        Intervention
                    Mean     SD       Mean     SD
Age                 8.444    0.556    8.405    0.581
ESE                 0.250    0.436    0.258    0.441
Grade3              0.484    0.504    0.470    0.503
LEP                 0.250    0.436    0.152    0.361
Lunch               0.219    0.417    0.182    0.389
Male                0.531    0.503    0.394    0.492
Pretest accuracy    0.932    0.087    0.916    0.102
Pretest speed       4.268    1.147    4.324    1.213

3.3 Pre-intervention Data: Analytic Sample

Outcome Data: Analytic Sample

                               Sample Sizes                Sample Characteristics
Measure        Group           Unit of       Unit of       Mean     Standard
                               Assignment    Analysis               Deviation
Fact Fluency   Comparison      4             64            4.573    1.121
Fact Fluency   Intervention    4             65            4.580    1.195

Background Data: Analytic Sample

Variable            Comparison        Intervention
                    Mean     SD       Mean     SD
Age                 8.444    0.556    8.414    0.580
ESE                 0.250    0.436    0.262    0.443
Grade3              0.484    0.504    0.477    0.503
LEP                 0.250    0.436    0.154    0.364
Lunch               0.219    0.417    0.185    0.391
Male                0.531    0.503    0.400    0.494
Pretest accuracy    0.932    0.087    0.914    0.102
Pretest speed       4.268    1.147    4.307    1.214

Outcome DatamdashAnalytic Sample with No Imputation

Measure Comparison Group Intervention Group

Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

Fact Fluency 4 61 4557 1143 4 61 4640 1142

19

Background DatamdashAnalytic Sample with No Imputation

VariableComparison Intervention

Mean SD Mean SD

Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

34 Post-intervention Data and Findings

341 Analytic Sample

As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

4Nminus9for small

effect size

20

Estimation of Effect SizemdashAnalytic Sample

Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

342 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

Model Comparison Group Intervention Group Estimated Effect

Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

Estimation of Effect SizemdashAnalytic Sample with No Imputation

Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

21

35 Subpopulation Analyses

We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

Difference Within-Group SD (adj Hedgesrsquo g) t-score

Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

4 Acknowledgment

We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

22

References

Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

23

Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

24

Appendix A Full Model

The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

Factor Coefficient t-score

Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

25

Appendix B Demographic Model

The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

Factor Coefficient t-score

Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

26

Appendix C Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sample retaining only age. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

This model used Restricted Maximum Likelihood, as there were convergence problems when using maximum likelihood.

Factor                   Coefficient   t-score
Intercept                  4.827***     25.683
cage                       0.227         1.146
cLunch                    -0.111        -0.952
cprescore                  0.758***      5.488
treatment                  0.867***      4.343
grade3                     0.527**       2.219
cagey:treatment           -0.031        -0.154
cagey:grade3              -0.168        -0.692
cLunch:treatment           0.268*        1.936
cLunch:grade3              0.153         1.109
cprescore:treatment       -0.042        -0.286
cprescore:grade3           0.100         0.677

Note: *p<0.1; **p<0.05; ***p<0.001

  9      5     18      9      8     13
+ 8    + 9    -10    - 6    + 3    - 5
---    ---    ---    ---    ---    ---

 10      2      3     10     12      9
- 3    + 7    + 8    + 1    -10    - 1
---    ---    ---    ---    ---    ---

  5      8      3     19      7     16      3
+ 0    + 4    + 6    - 9    - 1    -10    + 0
---    ---    ---    ---    ---    ---    ---

  3      7     15      0      4     14      7
- 1    + 5    - 5    + 9    + 3    - 5    - 5
---    ---    ---    ---    ---    ---    ---

  9     20     11      4      9      6      1
+ 9    -10    - 3    - 4    + 0    - 1    +10
---    ---    ---    ---    ---    ---    ---

  9     12     12      2      5      9      5
+10    - 3    - 9    + 8    - 0    - 4    + 0
---    ---    ---    ---    ---    ---    ---

  4     14      7     11      7      4      6
+ 4    - 9    - 0    - 8    + 0    - 1    + 5
---    ---    ---    ---    ---    ---    ---

  2     12     14      4     10      1      7
+ 3    - 5    - 5    - 4    +10    + 0    + 2
---    ---    ---    ---    ---    ---    ---

 13     10      3      9     17     10      3
- 6    +10    + 6    - 6    - 7    +10    + 6
---    ---    ---    ---    ---    ---    ---

  4     10     10      3      5      5     10
+ 9    + 2    +10    - 0    + 3    - 5    -10
---    ---    ---    ---    ---    ---    ---

Appendix B Sample Addition/Subtraction Probe


Table 7: Impact and significance of demographic covariates

Covariate   Coefficient   t-score
age            0.028        0.506
gender        -0.005       -0.096
LEP            0.042        0.612
Lunch          0.080        1.249
ESE           -0.006       -0.100

in magnitude and standardized coefficients greater than 0.1. There was a nearly statistically significant interaction (t = 1.94) between condition and whether the student was on free or reduced lunch.

This final model is denoted as the Reduced Model. All three models are provided in the Appendix.

Effect sizes were calculated from the coefficient for the intervention effect from each HLM and the pooled within-group standard deviation of unadjusted post-test scores.

Statistical significance was determined based on the t-score of the multi-level model.

2.6 Statistical Adjustments

We used all demographic information provided except race, which was non-binary and correlated significantly with other demographic information (R between 0.36 and 0.46 for the three most prevalent races in our sample).

Grade was coded as grade3, a variable equal to 1 if the student was in grade 3 and 0 otherwise.

Age was measured in years as of the pretest administration.

Gender was coded as a variable, male, equal to 1 if the student was male and 0 if the student was female.

Low English proficiency was coded as a variable, LEP, equal to 1 if the school indicated the student had low English proficiency.

Exceptional Student Status was determined based on the school's designation of the student as being within an Exceptional Student Education program. It was coded as a variable, ESE, equal to 1 if the school specified the student as belonging to an ESE program. The state of Florida specifies several ESE programs, one of which is a program for gifted students. For our study, it appears this program furnished the large majority of ESE designations, as 29 of the 36 students designated as ESE were concentrated in two high-achieving classes. In grade 2, every ESE-designated student was in a single class.

Eligibility for free or reduced lunch was coded as a variable, lunch, equal to 1 if the student was eligible.

As described in the Fluency Score Calculation subsection, fluency was evaluated based on a research-supported combination of speed and accuracy, normalized to reduce skewness via a Box-Cox transformation. This means that a student's fluency score depends on personal characteristics such as confidence, sense of urgency on a pen-and-paper test, and attention to accuracy, so students differ markedly in potential for improvement.

Pretest accuracy is the ratio of correct digits to the sum of correct and incorrect digits.

Pretest score is defined as $\sqrt{C - I + 2}$, where $C$ is digits correct per minute and $I$ is digits incorrect per minute.

Pretest speed is defined in a manner analogous to pretest score, as $\sqrt{C - 2}$, where $C$ is digits correct per minute. In this expression, 2 is subtracted rather than added so that the expression is anchored at 1, conforming to best practices (Osborne 2005).
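The three pretest metrics defined above can be sketched in a few lines of Python. This is only an illustration of the stated formulas, not the code used in the study; the function names are hypothetical, and the inputs are assumed to be per-minute digit counts for the score and speed and raw digit counts for accuracy.

```python
import math


def pretest_accuracy(correct_digits: float, incorrect_digits: float) -> float:
    """Ratio of correct digits to all attempted digits (correct + incorrect)."""
    attempted = correct_digits + incorrect_digits
    if attempted == 0:
        raise ValueError("accuracy is undefined when no digits were attempted")
    return correct_digits / attempted


def pretest_score(c: float, i: float) -> float:
    """Fluency score sqrt(C - I + 2); C and I are digits correct/incorrect per minute."""
    # math.sqrt raises ValueError if C - I + 2 is negative.
    return math.sqrt(c - i + 2)


def pretest_speed(c: float) -> float:
    """Speed sqrt(C - 2); equals 1 when C = 3, the anchoring described in the text."""
    return math.sqrt(c - 2)


# Hypothetical student: 18 digits correct and 2 incorrect in one minute.
print(pretest_accuracy(18, 2), pretest_score(18, 2), pretest_speed(18))
```

The square root is the form the text gives for the transformed score; shifting the argument by a constant before taking the root is what "anchoring" refers to here.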

All student-level covariates were scaled to be univariate and grand-mean centered for improved interpretability and model convergence.
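A minimal sketch of this preprocessing step follows. It assumes that "scaled to be univariate" means dividing by the sample standard deviation so that each covariate has unit variance; the report does not spell this out, so treat that reading, and the helper name `center_and_scale`, as assumptions.

```python
from statistics import mean, stdev


def center_and_scale(values: list[float]) -> list[float]:
    """Grand-mean center a covariate, then divide by its sample standard deviation.

    Assumption: unit-variance scaling with the (n - 1) sample SD.
    """
    grand_mean = mean(values)
    sd = stdev(values)
    if sd == 0:
        raise ValueError("cannot scale a constant covariate")
    return [(v - grand_mean) / sd for v in values]


# Hypothetical ages in years for three students.
print(center_and_scale([8.0, 8.5, 9.0]))
```

After this step a coefficient on a centered covariate describes the change in outcome per standard deviation of that covariate, and the model intercept refers to a student at the grand mean.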

Speed, score, and accuracy on the interim administration were considered during the regression process used to impute missing data, as described in the Missing Data section. These metrics are calculated exactly as for the pretest, using the same formula (i.e., the data was not re-anchored for the Box-Cox transformation).

An HLM was used to calculate statistical significance for the entire sample, so no adjustment for cluster effects was necessary. We only analyzed one outcome for this study, so no adjustment was made for multiple outcomes.

2.7 Students Removed from Study

Ten students, 4 from the intervention condition and 6 from the comparison condition, were excluded from the analysis. In all cases, the decision to exclude was based on information obtained on the day of the pretest.

Four of these ten (3 from intervention, 1 from comparison) were excluded because their teacher indicated they were sufficiently below grade level that they would not receive typical instruction in math fact fluency. This determination was provided on the day of the pretest.

Five of these ten (all from comparison) were excluded because they did not stop when time was called on the pretest. In three cases, these students had higher values on their pretest than on their posttest.

One grade 2 student from the intervention condition was noted as appearing frustrated and not working on the pretest. He had the fourth-lowest fluency score of all 2nd-grade participants on the pretest and showed dramatic improvement by the interim assessment, on which he scored at the 33rd percentile within his grade. According to Reflex's internal initial testing, the student had pre-existing automaticity for 17.1% of the addition facts within 20 and had basic recall ability with 59.9%. This suggests his pretest score underestimated his actual ability, and he was removed from the analysis for fear of artificially inflating the impact of the intervention. Note that this student was absent from the final administration.

Table 8: Descriptive Statistics of Control Variables

Control Variable        Mean    SD     Skew    Kurtosis
Grade3                  0.48    0.50    0.08    -2.03
Age                     8.42    0.57    0.03    -0.95
Male                    0.47    0.50    0.14    -2.01
LEP                     0.20    0.40    1.51     0.27
ESE                     0.26    0.44    1.13    -0.73
Lunch                   0.20    0.40    1.51     0.27
Pretest Accuracy        0.92    0.09   -2.63     8.23
Pretest Speed           4.29    1.18    0.18     0.98
Pretest Score           4.58    1.15    0.08     0.85
Interim Accuracy        0.94    0.06   -1.84     3.59
Interim Speed           5.07    1.20    0.55     0.42
Interim Fluency Score   5.31    1.18    0.46     0.39

2.8 Missing Data

Eight students, 5 from the treatment group and 3 from the comparison group, were absent for the final administration. Seven of the students had taken the interim assessment. No values were imputed for the student who missed both the interim and the final assessment. For the seven who had attended the interim test, we imputed posttest values using a multiple linear regression based on students in the same instructional level group, using the thresholds established by Burns et al. (2006).

Table 9: Categorization of Students

Fluency (dc/min)   Category              N
Less than 14       Frustration Level     29 (22%)
14-31              Instructional Level   81 (63%)
Greater than 31    Mastery Level         19 (15%)

All available data (demographic data, pretest data, and interim test data) were used to impute posttest scores using an OLS regression that retained only statistically significant regressors.
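The grouping step that precedes imputation can be illustrated as follows. This sketch only encodes the digits-correct-per-minute thresholds from Table 9; the function name is hypothetical, and the treatment of the boundary values (14 and 31 both counted as instructional) is read off the table's row labels rather than taken from the study's code.

```python
def instructional_category(dc_per_min: float) -> str:
    """Assign a level from digits correct per minute, using the Table 9 thresholds."""
    if dc_per_min < 14:
        return "Frustration Level"
    if dc_per_min <= 31:
        return "Instructional Level"
    return "Mastery Level"


# Group sizes reported in Table 9, with each group's share of the sample.
counts = {"Frustration Level": 29, "Instructional Level": 81, "Mastery Level": 19}
total = sum(counts.values())
shares = {level: round(100 * n / total) for level, n in counts.items()}
print(total, shares)
```

A separate regression would then be fit within each level, so that a missing posttest score is predicted only from students at a comparable starting point.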

2.8.1 Frustration Level

One of the seven students for whom posttest scores were imputed was in the frustration level. For that group, age (t = 2.6), pretest accuracy (t = -3.2), and interim fluency score (t = 6.2) were the statistically significant regressors.

2.8.2 Instructional Level

Six of the seven students for whom posttest scores were imputed were in the instructional level. Among students in that level, grade (t = 4.2), interim accuracy (t = -2.6), and interim fluency score (t = 9.5) were statistically significant.

2.9 Mastery Level

There were no students in the mastery level for whom imputation was necessary.

3 Study Data

Tables compose the large majority of this section. They are organized by table title and subsection title rather than by use of numbers.

The tables in this section report unscaled, uncentered values for ease of interpretability.

3.1 Pre-Intervention Data: All Pretest Takers

This section provides data on all students who took the pretest, including those who were formally removed from the analysis.

Outcome Data (measure: Fluency Score)

Group          Units of Assignment   Units of Analysis   Mean    Standard Deviation
Comparison     4                     70                  4.44    1.17
Intervention   4                     70                  4.49    1.26

Background Data

                     Comparison         Intervention
Variable             Mean     SD        Mean     SD
Age                  8.440    0.598     8.442    0.602
ESE                  0.271    0.448     0.243    0.432
Male                 0.500    0.504     0.400    0.493
Grade3               0.486    0.503     0.486    0.503
LEP                  0.243    0.432     0.171    0.380
Lunch                0.229    0.423     0.186    0.392
Pretest accuracy     0.919    0.098     0.909    0.113
Pretest speed        4.145    1.186     4.220    1.257

3.2 Pre-Intervention Data: Baseline Sample

This section includes all students who were formally part of the analysis, including those who were absent for the posttest.

Outcome Data (measure: Fluency Score)

Group          Units of Assignment   Units of Analysis   Mean    Standard Deviation
Comparison     4                     64                  4.57    1.12
Intervention   4                     66                  4.60    1.20

Background Data

                     Comparison         Intervention
Variable             Mean     SD        Mean     SD
Age                  8.444    0.556     8.405    0.581
ESE                  0.250    0.436     0.258    0.441
Grade3               0.484    0.504     0.470    0.503
LEP                  0.250    0.436     0.152    0.361
Lunch                0.219    0.417     0.182    0.389
Male                 0.531    0.503     0.394    0.492
Pretest accuracy     0.932    0.087     0.916    0.102
Pretest speed        4.268    1.147     4.324    1.213

3.3 Pre-Intervention Data: Analytic Sample

Outcome Data: Analytic Sample (measure: Fact Fluency)

Group          Units of Assignment   Units of Analysis   Mean     Standard Deviation
Comparison     4                     64                  4.573    1.121
Intervention   4                     65                  4.580    1.195

Background Data: Analytic Sample

                     Comparison         Intervention
Variable             Mean     SD        Mean     SD
Age                  8.444    0.556     8.414    0.580
ESE                  0.250    0.436     0.262    0.443
Grade3               0.484    0.504     0.477    0.503
LEP                  0.250    0.436     0.154    0.364
Lunch                0.219    0.417     0.185    0.391
Male                 0.531    0.503     0.400    0.494
Pretest accuracy     0.932    0.087     0.914    0.102
Pretest speed        4.268    1.147     4.307    1.214

Outcome Data: Analytic Sample with No Imputation (measure: Fact Fluency)

Group          Units of Assignment   Units of Analysis   Mean     Standard Deviation
Comparison     4                     61                  4.557    1.143
Intervention   4                     61                  4.640    1.142

Background Data: Analytic Sample with No Imputation

                     Comparison         Intervention
Variable             Mean     SD        Mean     SD
Age                  8.436    0.556     8.394    0.568
ESE                  0.246    0.434     0.279    0.452
Grade3               0.492    0.504     0.459    0.502
LEP                  0.262    0.444     0.148    0.358
Lunch                0.230    0.424     0.180    0.388
Male                 0.541    0.502     0.410    0.496
Pretest accuracy     0.930    0.088     0.922    0.084
Pretest speed        4.254    1.170     4.360    1.177

3.4 Post-Intervention Data and Findings

3.4.1 Analytic Sample

As grand-mean-centered values were used for all Level-1 covariates and grade was the only Level-2 covariate other than condition, adjusted means for each group were estimated from the constant term of the HLM, the grade3 coefficient of the HLM, the average value of the grade3 variable across all students, and (in the case of the intervention group) the treatment coefficient of the HLM. Standard deviations are unadjusted.

Outcome Data and Statistical Significance: Analytic Sample

                    Comparison Group                   Intervention Group                 Estimated Effect
Model               Students  adj. Mean  unadj. SD     Students  adj. Mean  unadj. SD     adj. Mean Difference  adj. t-score
Full Model          64        5.04       1.099         65        5.97       1.093         0.927***              5.753
Demographic Model   64        5.13       1.099         65        5.97       1.093         0.836***              4.966
Reduced Model       64        5.08       1.099         65        5.95       1.093         0.867***              4.343

Note: *p<0.1; **p<0.05; ***p<0.001
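The adjusted-mean calculation described for the analytic sample can be reproduced, at least approximately, from the published coefficients. The sketch below is an illustration rather than the study's own code: it plugs in the Full Model intercept, grade3 coefficient, and treatment coefficient from the appendix table, and it assumes the rounded Grade3 mean of 0.48 from Table 8 stands in for "the average value of the grade3 variable across all students". The function name is hypothetical.

```python
def adjusted_means(intercept: float, grade3_coef: float,
                   treatment_coef: float, mean_grade3: float) -> tuple[float, float]:
    """Return (comparison, intervention) adjusted means.

    All centered Level-1 covariates sit at zero for an average student,
    so only the intercept, the grade3 term, and the treatment term remain.
    """
    comparison = intercept + grade3_coef * mean_grade3
    intervention = comparison + treatment_coef
    return comparison, intervention


# Full Model coefficients as reported; 0.48 is the rounded Grade3 mean (assumption).
comparison, intervention = adjusted_means(4.709, 0.695, 0.927, 0.48)
print(round(comparison, 2), round(intervention, 2))
```

Because the inputs are rounded to three decimals, values recomputed this way can differ from the published table in the last digit for some models.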

Effect size was calculated based on adjusted means, unadjusted pooled within-group standard deviations, and a correction $\omega = 1 - \frac{3}{4N - 9}$ for small sample size.

Estimation of Effect Size: Analytic Sample

Model               N     Adjusted Mean   (unadj.) Pooled     Effect Size
                          Difference      Within-Group SD     (adj. Hedges' g)
Full Model          129   0.927***        1.096               0.84
Demographic Model   129   0.836***        1.096               0.76
Reduced Model       129   0.867***        1.096               0.79

Note: *p<0.1; **p<0.05; ***p<0.001
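As a check on the arithmetic, the effect sizes in this table can be recomputed from the group sizes, the unadjusted standard deviations, and the adjusted mean differences reported for the analytic sample. The sketch below assumes the standard pooled within-group standard deviation formula and the correction factor 1 - 3/(4N - 9); the function names are hypothetical and this is not the study's own code.

```python
import math


def pooled_sd(n1: int, sd1: float, n2: int, sd2: float) -> float:
    """Pooled within-group SD, weighting each group's variance by (n - 1)."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def hedges_g(mean_difference: float, sd: float, n_total: int) -> float:
    """Standardized mean difference multiplied by the correction 1 - 3/(4N - 9)."""
    omega = 1 - 3 / (4 * n_total - 9)
    return omega * mean_difference / sd


# Analytic sample: 64 comparison students (SD 1.099), 65 intervention students (SD 1.093).
sd = pooled_sd(64, 1.099, 65, 1.093)
effects = {
    "Full Model": hedges_g(0.927, sd, 129),
    "Demographic Model": hedges_g(0.836, sd, 129),
    "Reduced Model": hedges_g(0.867, sd, 129),
}
print(round(sd, 3), {name: round(g, 2) for name, g in effects.items()})
```

With 129 students the correction factor is about 0.994, so it changes the effect sizes only in the third decimal place.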

3.4.2 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absent for the post assessment indicated that a complete-case analysis would substantially understate the effect of the intervention. The covariate-adjusted effect of the treatment on interim test scores was greater among students who missed the post test than among those who were present for all three tests. This is borne out in the results of an analysis limited to those students for whom no imputation occurred.

Values for adjusted means for this subgroup were calculated by recentering all Level-1 covariates and generating a new HLM with the same structure as for the full analytic sample, but using only those participants with no missing data.

Outcome Data and Statistical Significance: Analytic Sample with No Imputation

                    Comparison Group                   Intervention Group                 Estimated Effect
Model               Students  adj. Mean  unadj. SD     Students  adj. Mean  unadj. SD     adj. Mean Difference  adj. t-score
Full Model          61        5.07       1.12          61        5.94       1.11          0.867***              5.255
Demographic Model   61        5.15       1.12          61        5.94       1.11          0.787***              4.669
Reduced Model       61        5.09       1.12          61        5.92       1.11          0.828***              4.249

Note: *p<0.1; **p<0.05; ***p<0.001

Estimation of Effect Size: Analytic Sample with No Imputation

Model               N     Adjusted Mean   (unadj.) Pooled     Effect Size
                          Difference      Within-Group SD     (adj. Hedges' g)
Full Model          122   0.867***        1.113               0.77
Demographic Model   122   0.787***        1.113               0.70
Reduced Model       122   0.828***        1.113               0.74

Note: *p<0.1; **p<0.05; ***p<0.001

3.5 Subpopulation Analyses

We analyzed sub-populations by grade. We also analyzed the sub-population of students not designated as exceptional students. Due to the smaller sample sizes, the Reduced Model was used for the analyses, except that grade was removed as a variable for subpopulations of constant grade.

Statistical Significance and Estimation of Effect Size

Group                      N     Adjusted Mean   (unadj.) Pooled     Effect Size         Adjusted
                                 Difference      Within-Group SD     (adj. Hedges' g)    t-score
Grade 2                    68    0.739**         0.94                0.78                2.46
Grade 3                    63    0.877**         1.05                0.82                2.47
Non-Exceptional Students   102   0.904***        1.101               0.89                4.63

Note: *p<0.1; **p<0.05; ***p<0.001

4 Acknowledgment

We used R (Team 2013) for some of the analysis in this report, especially the lme4 package for fitting mixed models (Bates, Mächler, Bolker & Walker 2015). Other libraries utilized were dplyr, tidyr, and magrittr (Wickham & Francois 2016, Bache & Wickham 2014, Wickham 2016).

This document was typeset using LaTeX and makes use of the harvard, booktabs, multirow, graphicx, and url packages.

The stargazer package was used to generate LaTeX for several of the tables (Hlavac 2013).

References

Bache, S. M. & Wickham, H. (2014), magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr

Bates, D., Mächler, M., Bolker, B. & Walker, S. (2015), 'Fitting linear mixed-effects models using lme4', Journal of Statistical Software 67(1), 1–48.

Burns, M. K., VanDerHeyden, A. M. & Jiban, C. L. (2006), 'Assessing the instructional level for mathematics: A comparison of methods', School Psychology Review 35(3), 401.

Hintze, J. M., Christ, T. J. & Keller, L. A. (2002), 'The generalizability of CBM survey-level mathematics assessments: Just how many samples do we need?', School Psychology Review 31(4), 514.

Hlavac, M. (2013), 'stargazer: LaTeX code and ASCII text for well-formatted regression and summary statistics tables'. http://CRAN.R-project.org/package=stargazer

Osborne, J. (2005), 'Notes on the use of data transformations', Practical Assessment, Research and Evaluation 9(1), 42–50.

Stevens, O. & Leigh, E. (2012), 'Mathematics curriculum based measurement to predict state test performance: A comparison of measures and methods', ProQuest LLC.

Team, R. C. (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

UTA (2015), Multilevel modeling tutorial using SAS, Stata, HLM, R, SPSS, and Mplus, Technical report, The Department of Statistics and Data Sciences, The University of Texas at Austin. http://stat.utexas.edu/images/SSC/documents/SoftwareTutorials/MultilevelModeling.pdf

VanDerHeyden, A. M. & Burns, M. K. (2008), 'Examination of the utility of various measures of mathematics proficiency', Assessment for Effective Intervention.

Wickham, H. (2016), tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. https://CRAN.R-project.org/package=tidyr

Wickham, H. & Francois, R. (2016), dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr

Woltman, H., Feldstain, A., MacKay, J. C. & Rocchi, M. (2012), 'An introduction to hierarchical linear modeling', Tutorials in Quantitative Methods for Psychology 8(1), 52–69.



two high-achieving classes In grade 2 every ESE-designated student was ina single class

Eligibility for free or reduced lunch was coded as a variable lunch equalto 1 if the student was eligible

As described in the Fluency Score Calculation subsection fluency wasevaluated based on research-supported combination of speed and accuracynormalized to reduce skewness via a Box-Cox transformation This meansthat a studentrsquos fluency score depends on personal characteristics such asconfidence sense of urgency on a pen-and-paper test and attention to accu-racy so students differ markedly in potential for improvement

Pretest accuracy is the ratio of correct digits to the sum of correct andincorrect digits

Pretest score is defined asradicC minus I + 2 where C is digits correct per

minute and I is digits incorrect per minutePretest speed is defined in a manner analogous to pretest score

radicC minus 2

where C is digits correct per minute In this expression 2 is subtractedrather than added so that the expression is anchored at 1 conforming tobest practices (Osborne 2005)

All student-level covariates were scaled to be univariate and grand-meancentered for improved interpretability and model convergence

Speed score and accuracy on the interim administration were consideredduring the regression process used to impute missing data as described inthe Missing Data section These metrics are calculated exactly as for thepretest using the same formula (ie the data was not re-anchored for theBox-Cox transformation)

An HLM model was used to calculate statistical significance for the entiresample so no adjustment for cluster effects were necessary We only analyzedone outcome for this study so no adjustment was made for multiple outcomes

27 Students Removed from Study

Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

Four of these ten (3 from intervention 1 from comparison) were excludedbecause their teacher indicated they were sufficiently below grade level that

14

Table 8 Descriptive Statistics of Control Variables

Control Variable Mean SD Skew Kurtosis

Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

Five of these ten (all from comparison) were excluded because they didnot stop when time was called on the pretest In three cases these studentshad higher values on their pretest than on their posttest

One grade 2 student from the Intervention condition was noted as ap-pearing frustrated and not working on the pretest He had the fourth lowestfluency score of all 2nd-grade participants on the pretest and showed dra-matic improvement by the interim assessment on which he scored at the 33rdpercentile within his grade According to Reflexrsquo internal initial testing thestudent had pre-existing automaticity for 171 of the addition facts within20 and had basic recall ability with 599 This suggests his pretest scoreunder-estimated his actual ability and he was removed from the analysis forfear of artificially inflating the impact of the intervention Note that thisstudent was absent from the final administration

28 Missing Data

Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

15

established by Burns et al (2006)

Table 9 Categorization of Students

Fluency (dcmin) Category N

Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

281 Frustration Level

One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

282 Instructional Level

Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

29 Mastery Level

There were no students in the mastery level for whom imputation was nec-essary

16

3 Study Data

Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

The tables in this section report unscaled uncentered values for ease ofinterpretability

31 Pre-Intervention DatamdashAll Pretest Takers

This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

Outcome Data

Group          Measure         N (Unit of Assignment)   N (Unit of Analysis)   Mean    SD
Comparison     Fluency Score   4                        70                     4.44    1.17
Intervention   Fluency Score   4                        70                     4.49    1.26

Background Data

Variable           Comparison          Intervention
                   Mean     SD         Mean     SD
Age                8.440    0.598      8.442    0.602
ESE                0.271    0.448      0.243    0.432
Male               0.500    0.504      0.400    0.493
Grade3             0.486    0.503      0.486    0.503
LEP                0.243    0.432      0.171    0.380
Lunch              0.229    0.423      0.186    0.392
Pretest accuracy   0.919    0.098      0.909    0.113
Pretest speed      4.145    1.186      4.220    1.257

3.2 Pre-Intervention Data: Baseline Sample

This section includes all students who were formally part of the analysis, including those who were absent for the posttest.

Outcome Data

Group          Measure         N (Unit of Assignment)   N (Unit of Analysis)   Mean    SD
Comparison     Fluency Score   4                        64                     4.57    1.12
Intervention   Fluency Score   4                        66                     4.60    1.20

Background Data

Variable           Comparison          Intervention
                   Mean     SD         Mean     SD
Age                8.444    0.556      8.405    0.581
ESE                0.250    0.436      0.258    0.441
Grade3             0.484    0.504      0.470    0.503
LEP                0.250    0.436      0.152    0.361
Lunch              0.219    0.417      0.182    0.389
Male               0.531    0.503      0.394    0.492
Pretest accuracy   0.932    0.087      0.916    0.102
Pretest speed      4.268    1.147      4.324    1.213

3.3 Pre-intervention Data: Analytic Sample

Outcome Data: Analytic Sample

Group          Measure        N (Unit of Assignment)   N (Unit of Analysis)   Mean     SD
Comparison     Fact Fluency   4                        64                     4.573    1.121
Intervention   Fact Fluency   4                        65                     4.580    1.195

Background Data: Analytic Sample

Variable           Comparison          Intervention
                   Mean     SD         Mean     SD
Age                8.444    0.556      8.414    0.580
ESE                0.250    0.436      0.262    0.443
Grade3             0.484    0.504      0.477    0.503
LEP                0.250    0.436      0.154    0.364
Lunch              0.219    0.417      0.185    0.391
Male               0.531    0.503      0.400    0.494
Pretest accuracy   0.932    0.087      0.914    0.102
Pretest speed      4.268    1.147      4.307    1.214

Outcome Data: Analytic Sample with No Imputation

Group          Measure        N (Unit of Assignment)   N (Unit of Analysis)   Mean     SD
Comparison     Fact Fluency   4                        61                     4.557    1.143
Intervention   Fact Fluency   4                        61                     4.640    1.142

Background Data: Analytic Sample with No Imputation

Variable           Comparison          Intervention
                   Mean     SD         Mean     SD
Age                8.436    0.556      8.394    0.568
ESE                0.246    0.434      0.279    0.452
Grade3             0.492    0.504      0.459    0.502
LEP                0.262    0.444      0.148    0.358
Lunch              0.230    0.424      0.180    0.388
Male               0.541    0.502      0.410    0.496
Pretest accuracy   0.930    0.088      0.922    0.084
Pretest speed      4.254    1.170      4.360    1.177

3.4 Post-intervention Data and Findings

3.4.1 Analytic Sample

Because grand-mean-centered values were used for all Level-1 covariates and grade was the only Level-2 covariate other than condition, adjusted means for each group were estimated from the Constant term of the HLM model, the grade3 coefficient of the HLM model, the average value of the grade3 variable across all students, and (in the case of the intervention group) the treatment coefficient of the HLM model. Standard deviations are unadjusted.
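The adjusted-mean calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the author's code: the function name is hypothetical, the coefficients are the Full Model values from Appendix A, and the grade3 average of 0.48 is an approximation taken from the sample descriptives.

```python
def adjusted_means(intercept, grade3_coef, grade3_mean, treatment_coef):
    """Estimate covariate-adjusted group means from HLM fixed effects.

    With all Level-1 covariates grand-mean centered, their contribution
    at the sample average is zero, so only the intercept, the grade3
    term, and (for the intervention group) the treatment term remain.
    """
    comparison = intercept + grade3_coef * grade3_mean
    intervention = comparison + treatment_coef
    return comparison, intervention


# Full Model fixed effects as reported in Appendix A;
# grade3_mean = 0.48 is an assumed approximate sample average.
comparison, intervention = adjusted_means(
    intercept=4.709,
    grade3_coef=0.695,
    grade3_mean=0.48,
    treatment_coef=0.927,
)
print(round(comparison, 2), round(intervention, 2))
```

Under these inputs the rounded results agree with the Full Model row of the outcome table (5.04 and 5.97).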

Outcome Data and Statistical Significance: Analytic Sample

Model               Comparison Group                  Intervention Group                Estimated Effect
                    Students  Adj. Mean  Unadj. SD    Students  Adj. Mean  Unadj. SD    Adj. Mean Difference  Adj. t-score
Full Model          64        5.04       1.099        65        5.97       1.093        0.927***              5.753
Demographic Model   64        5.13       1.099        65        5.97       1.093        0.836***              4.966
Reduced Model       64        5.08       1.099        65        5.95       1.093        0.867***              4.343

Note: *p<0.1; **p<0.05; ***p<0.001

Effect size was calculated based on adjusted means, unadjusted pooled within-group standard deviations, and a small-sample correction $\omega = 1 - \frac{3}{4N - 9}$ applied to the effect size.
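As a worked check of this calculation, the sketch below recomputes the pooled standard deviation and the corrected effect size from the values reported for the Full Model. The function names are hypothetical, and the pooling formula is the standard degrees-of-freedom-weighted one, assumed here because the report does not spell it out.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled within-group SD, weighting each variance by its degrees of freedom."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def hedges_g(mean_difference, sd_pooled, n_total):
    """Standardized mean difference with the correction 1 - 3 / (4N - 9)."""
    omega = 1 - 3 / (4 * n_total - 9)
    return omega * mean_difference / sd_pooled


# Full Model, analytic sample: 64 comparison and 65 intervention students.
sd = pooled_sd(64, 1.099, 65, 1.093)
g = hedges_g(0.927, 1.096, 129)
print(round(sd, 3), round(g, 2))
```

The same function applied to the Demographic and Reduced Model differences (0.836 and 0.867) gives 0.76 and 0.79 after rounding.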

Estimation of Effect Size: Analytic Sample

Model               N     Adjusted Mean   (Unadj.) Pooled     Effect Size
                          Difference      Within-Group SD     (adj. Hedges' g)
Full Model          129   0.927***        1.096               0.84
Demographic Model   129   0.836***        1.096               0.76
Reduced Model       129   0.867***        1.096               0.79

Note: *p<0.1; **p<0.05; ***p<0.001

3.4.2 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absent for the post assessment indicated that a complete-case analysis would substantially understate the effect of the intervention. The covariate-adjusted effect of the treatment on interim test scores was greater among students who missed the posttest than among those who were present for all three tests. This is borne out in the results of an analysis limited to those students for whom no imputation occurred.

Values for adjusted means for this subgroup were calculated by recentering all Level-1 covariates and generating a new HLM with the same structure as for the full analytic sample, but using only those participants with no missing data.

Outcome Data and Statistical Significance: Analytic Sample with No Imputation

Model               Comparison Group                  Intervention Group                Estimated Effect
                    Students  Adj. Mean  Unadj. SD    Students  Adj. Mean  Unadj. SD    Adj. Mean Difference  Adj. t-score
Full Model          61        5.07       1.12         61        5.94       1.11         0.867***              5.255
Demographic Model   61        5.15       1.12         61        5.94       1.11         0.787***              4.669
Reduced Model       61        5.09       1.12         61        5.92       1.11         0.828***              4.249

Note: *p<0.1; **p<0.05; ***p<0.001

Estimation of Effect Size: Analytic Sample with No Imputation

Model               N     Adjusted Mean   (Unadj.) Pooled     Effect Size
                          Difference      Within-Group SD     (adj. Hedges' g)
Full Model          122   0.867***        1.113               0.77
Demographic Model   122   0.787***        1.113               0.70
Reduced Model       122   0.828***        1.113               0.74

Note: *p<0.1; **p<0.05; ***p<0.001

3.5 Subpopulation Analyses

We analyzed sub-populations by grade. We also analyzed the sub-population of students not designated as exceptional students. Due to the smaller sample sizes, the Reduced Model was used for the analyses, except that grade was removed as a variable for subpopulations of constant grade.

Statistical Significance and Estimation of Effect Size

Group                      N     Adjusted Mean   (Unadj.) Pooled     Effect Size         Adjusted
                                 Difference      Within-Group SD     (adj. Hedges' g)    t-score
Grade 2                    68    0.739**         0.94                0.78                2.46
Grade 3                    63    0.877**         1.05                0.82                2.47
Non-Exceptional Students   102   0.904***        1.101               0.89                4.63

Note: *p<0.1; **p<0.05; ***p<0.001

4 Acknowledgment

We used R (Team 2013) for some of the analysis in this report, especially the lme4 package for fitting mixed models (Bates, Mächler, Bolker & Walker 2015). Other libraries utilized were dplyr, tidyr, and magrittr (Wickham & François 2016, Bache & Wickham 2014, Wickham 2016).

This document was typeset using LaTeX and makes use of the harvard, booktabs, multirow, graphicx, and url packages.

The stargazer package was used to generate LaTeX for several of the tables (Hlavac 2013).

References

Bache, S. M. & Wickham, H. (2014), magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr

Bates, D., Mächler, M., Bolker, B. & Walker, S. (2015), 'Fitting linear mixed-effects models using lme4', Journal of Statistical Software 67(1), 1–48.

Burns, M. K., VanDerHeyden, A. M. & Jiban, C. L. (2006), 'Assessing the instructional level for mathematics: A comparison of methods', School Psychology Review 35(3), 401.

Hintze, J. M., Christ, T. J. & Keller, L. A. (2002), 'The generalizability of cbm survey-level mathematics assessments: Just how many samples do we need?', School Psychology Review 31(4), 514.

Hlavac, M. (2013), 'stargazer: Latex code and ascii text for well-formatted regression and summary statistics tables'. http://CRAN.R-project.org/package=stargazer

Osborne, J. (2005), 'Notes on the use of data transformations', Practical Assessment, Research and Evaluation 9(1), 42–50.

Stevens, O. & Leigh, E. (2012), 'Mathematics curriculum based measurement to predict state test performance: A comparison of measures and methods', ProQuest LLC.

Team, R. C. (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

UTA (2015), Multilevel modeling tutorial using sas, stata, hlm, r, spss, and mplus, Technical report, The Department of Statistics and Data Sciences, The University of Texas at Austin. http://stat.utexas.edu/images/SSC/documents/SoftwareTutorials/MultilevelModeling.pdf

VanDerHeyden, A. M. & Burns, M. K. (2008), 'Examination of the utility of various measures of mathematics proficiency', Assessment for Effective Intervention.

Wickham, H. (2016), tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. https://CRAN.R-project.org/package=tidyr

Wickham, H. & François, R. (2016), dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr

Woltman, H., Feldstain, A., MacKay, J. C. & Rocchi, M. (2012), 'An introduction to hierarchical linear modeling', Tutorials in Quantitative Methods for Psychology 8(1), 52–69.

Appendix A Full Model

The table below describes the fixed-effects data of the full HLM for the analytic sample. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                   Coefficient   t-score
Intercept                4.709***      25.719
cage                     0.136         0.887
cLEP                     -0.015        -0.111
cLunch                   -0.122        -0.981
cprescore                -2.220        -0.932
cgender                  -0.148        -1.518
cESE                     0.054         0.272
cprespeed                2.471         1.177
cpreaccuracy             0.750         1.313
treatment                0.927***      5.753
grade3                   0.695***      3.335
cagey:treatment          0.076         0.469
cagey:grade3             -0.224        -1.134
cLEP:treatment           -0.021        -0.123
cLEP:grade3              -0.027        -0.156
cLunch:treatment         0.195         1.440
cLunch:grade3            0.174         1.269
cprescore:treatment      1.881         1.056
cprescore:grade3         2.097         0.885
cgender:treatment        0.096         0.836
cgender:grade3           0.180         1.564
cESE:treatment           -0.036        -0.204
cESE:grade3              -0.013        -0.073
cprespeed:treatment      -1.327        -0.849
cprespeed:grade3         -1.581        -0.753
cpreaccuracy:treatment   -0.712*       -1.657
cpreaccuracy:grade3      -0.608        -1.101

Note: *p<0.1; **p<0.05; ***p<0.001
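The preparation of the "c"-prefixed variables can be sketched as below. This is only an illustration: it assumes the scaling mentioned in the appendix means dividing by the sample standard deviation (unit variance), which the report does not state explicitly, and both the function name and the ages are hypothetical.

```python
from statistics import mean, stdev


def center_and_scale(values):
    """Grand-mean center a covariate and scale it to unit sample SD.

    Assumption: the appendix's scaling is read here as standardizing to
    unit variance; the report does not define it further.
    """
    grand_mean = mean(values)
    spread = stdev(values)
    return [(v - grand_mean) / spread for v in values]


# Hypothetical ages in years for five students (not study data).
ages = [7.9, 8.4, 8.1, 9.0, 8.6]
cage = center_and_scale(ages)
```

After this transformation the covariate has mean zero, which is what allows the intercept to be read as the outcome for a student at the sample average of every level-1 variable.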

Appendix B Demographic Model

The table below describes the fixed-effects data of the full HLM for the analytic sample but excludes the pretest features of speed and accuracy. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                Coefficient   t-score
Intercept             4.906***      27.157
cage                  0.230         1.554
cLEP                  0.042         0.315
cLunch                -0.128        -1.225
cprescore             0.756***      5.872
cgender               -0.124        -1.283
cESE                  0.172         0.855
treatment             0.836***      4.966
grade3                0.480**       2.235
cagey:treatment       -0.003        -0.022
cagey:grade3          -0.270        -1.378
cLEP:treatment        -0.059        -0.346
cLEP:grade3           -0.071        -0.408
cLunch:treatment      0.313**       2.631
cLunch:grade3         0.174         1.459
cprescore:treatment   0.003         0.020
cprescore:grade3      0.094         0.685
cgender:treatment     0.122         1.062
cgender:grade3        0.120         1.039
cESE:treatment        -0.131        -0.734
cESE:grade3           -0.117        -0.642

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix C Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sample retaining only age. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

This model used Restricted Maximum Likelihood, as there were convergence problems when using maximum likelihood.

Factor                Coefficient   t-score
Intercept             4.827***      25.683
cage                  0.227         1.146
cLunch                -0.111        -0.952
cprescore             0.758***      5.488
treatment             0.867***      4.343
grade3                0.527**       2.219
cagey:treatment       -0.031        -0.154
cagey:grade3          -0.168        -0.692
cLunch:treatment      0.268*        1.936
cLunch:grade3         0.153         1.109
cprescore:treatment   -0.042        -0.286
cprescore:grade3      0.100         0.677

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B Sample Addition/Subtraction Probe

   9     5    18     9     8    13
 + 8   + 9   -10   - 6   + 3   - 5
 ---   ---   ---   ---   ---   ---

  10     2     3    10    12     9
 - 3   + 7   + 8   + 1   -10   - 1
 ---   ---   ---   ---   ---   ---

   5     8     3    19     7    16     3
 + 0   + 4   + 6   - 9   - 1   -10   + 0
 ---   ---   ---   ---   ---   ---   ---

   3     7    15     0     4    14     7
 - 1   + 5   - 5   + 9   + 3   - 5   - 5
 ---   ---   ---   ---   ---   ---   ---

   9    20    11     4     9     6     1
 + 9   -10   - 3   - 4   + 0   - 1   +10
 ---   ---   ---   ---   ---   ---   ---

   9    12    12     2     5     9     5
 +10   - 3   - 9   + 8   - 0   - 4   + 0
 ---   ---   ---   ---   ---   ---   ---

   4    14     7    11     7     4     6
 + 4   - 9   - 0   - 8   + 0   - 1   + 5
 ---   ---   ---   ---   ---   ---   ---

   2    12    14     4    10     1     7
 + 3   - 5   - 5   - 4   +10   + 0   + 2
 ---   ---   ---   ---   ---   ---   ---

  13    10     3     9    17    10     3
 - 6   +10   + 6   - 6   - 7   +10   + 6
 ---   ---   ---   ---   ---   ---   ---

   4    10    10     3     5     5    10
 + 9   + 2   +10   - 0   + 3   - 5   -10
 ---   ---   ---   ---   ---   ---   ---

Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

15

established by Burns et al (2006)

Table 9 Categorization of Students

Fluency (dcmin) Category N

Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

281 Frustration Level

One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

282 Instructional Level

Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

29 Mastery Level

There were no students in the mastery level for whom imputation was nec-essary

16

3 Study Data

Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

The tables in this section report unscaled uncentered values for ease ofinterpretability

31 Pre-Intervention DatamdashAll Pretest Takers

This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

Outcome Data

Measure Comparison Group Intervention Group

Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

Fluency Score 4 70 444 117 4 70 449 126

Background Data

VariableComparison Intervention

Mean SD Mean SD

Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

17

32 Pre-Intervention DatamdashBaseline Sample

This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

Outcome Data

Measure Comparison Group Intervention Group

Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

Fluency Score 4 64 457 112 4 66 460 120

Background Data

VariableComparison Intervention

Mean SD Mean SD

Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

18

33 Pre-intervention Data Analytic Sample

Outcome DatamdashAnalytic Sample

Measure Comparison Group Intervention Group

Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

Fact Fluency 4 64 4573 1121 4 65 4580 1195

Background DatamdashAnalytic Sample

VariableComparison Intervention

Mean SD Mean SD

Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

Outcome DatamdashAnalytic Sample with No Imputation

Measure Comparison Group Intervention Group

Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

Fact Fluency 4 61 4557 1143 4 61 4640 1142

19

Background DatamdashAnalytic Sample with No Imputation

VariableComparison Intervention

Mean SD Mean SD

Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

34 Post-intervention Data and Findings

341 Analytic Sample

As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

4Nminus9for small

effect size

20

Estimation of Effect SizemdashAnalytic Sample

Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

342 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

Model Comparison Group Intervention Group Estimated Effect

Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

Estimation of Effect SizemdashAnalytic Sample with No Imputation

Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

21

35 Subpopulation Analyses

We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

Difference Within-Group SD (adj Hedgesrsquo g) t-score

Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

4 Acknowledgment

We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

22

References

Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

23

Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

24

Appendix A Full Model

The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

Factor Coefficient t-score

Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

25

Appendix B Demographic Model

The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

Factor Coefficient t-score

Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

26

Appendix C Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

Factor Coefficient t-score

Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

27

9 5 18 9 8 13

+ 8 + 9 minus10 minus 6 + 3 minus 5

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

10 2 3 10 12 9

minus 3 + 7 + 8 + 1 minus10 minus 1

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

5 8 3 19 7 16 3

+ 0 + 4 + 6 minus 9 minus 1 minus10 + 0

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

3 7 15 0 4 14 7

minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

9 20 11 4 9 6 1

+ 9 minus10 minus 3 minus 4 + 0 minus 1 +10

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

9 12 12 2 5 9 5

+10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

4 14 7 11 7 4 6

+ 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

2 12 14 4 10 1 7

+ 3 minus 5 minus 5 minus 4 +10 + 0 + 2

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

13 10 3 9 17 10 3

minus 6 +10 + 6 minus 6 minus 7 +10 + 6

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

4 10 10 3 5 5 10

+ 9 + 2 +10 minus 0 + 3 minus 5 minus10

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

Appendix B Sample AdditionSubtraction Probe


established by Burns et al. (2006).

Table 9: Categorization of Students

Fluency (dc/min)    Category               N
Less than 14        Frustration Level      29 (22%)
14-31               Instructional Level    81 (63%)
Greater than 31     Mastery Level          19 (15%)
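The cut points in Table 9 can be written as a small lookup. The Python sketch below is illustrative only: the function name `fluency_category` and the sample rates are hypothetical, and it assumes that both 14 and 31 dc/min fall in the instructional level, as the "14-31" row suggests.

```python
from collections import Counter


def fluency_category(dc_per_min: float) -> str:
    """Map a fluency rate (digits correct per minute) to a Table 9 category."""
    if dc_per_min < 14:
        return "Frustration Level"
    if dc_per_min <= 31:  # assumption: both boundaries belong to this level
        return "Instructional Level"
    return "Mastery Level"


# Hypothetical rates for illustration; these are not study data.
rates = [9.5, 14.0, 22.0, 31.0, 40.0]
counts = Counter(fluency_category(r) for r in rates)
print(counts)
```

Tallying the categories this way gives the kind of count column shown in Table 9.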

All available data (demographic data, pretest data, and interim test data) were used to impute posttest scores using an OLS regression that retained only statistically significant regressors.

2.8.1 Frustration Level

One of the seven students for whom posttest scores were imputed was in the frustration level. For that group, age (t = 2.6), pretest accuracy (t = −3.2), and interim fluency score (t = 6.2) were the statistically significant regressors.

2.8.2 Instructional Level

Six of the seven students for whom posttest scores were imputed were in the instructional level. Among students in that level, grade (t = 4.2), interim accuracy (t = −2.6), and interim fluency score (t = 9.5) were statistically significant.

2.9 Mastery Level

There were no students in the mastery level for whom imputation was necessary.

3 Study Data

Tables compose the large majority of this section. They are organized by table title and subsection title rather than by number.

The tables in this section report unscaled, uncentered values for ease of interpretation.

3.1 Pre-Intervention Data: All Pretest Takers

This section provides data on all students who took the pretest, including those who were formally removed from the analysis.

Outcome Data (Measure: Fluency Score)

Group           Units of Assignment    Units of Analysis    Mean    SD
Comparison      4                      70                   4.44    1.17
Intervention    4                      70                   4.49    1.26

Background Data

Variable            Comparison         Intervention
                    Mean     SD        Mean     SD
Age                 8.440    0.598     8.442    0.602
ESE                 0.271    0.448     0.243    0.432
Male                0.500    0.504     0.400    0.493
Grade3              0.486    0.503     0.486    0.503
LEP                 0.243    0.432     0.171    0.380
Lunch               0.229    0.423     0.186    0.392
Pretest accuracy    0.919    0.098     0.909    0.113
Pretest speed       4.145    1.186     4.220    1.257

3.2 Pre-Intervention Data: Baseline Sample

This section includes all students who were formally part of the analysis, including those who were absent for the posttest.

Outcome Data (Measure: Fluency Score)

Group           Units of Assignment    Units of Analysis    Mean    SD
Comparison      4                      64                   4.57    1.12
Intervention    4                      66                   4.60    1.20

Background Data

Variable            Comparison         Intervention
                    Mean     SD        Mean     SD
Age                 8.444    0.556     8.405    0.581
ESE                 0.250    0.436     0.258    0.441
Grade3              0.484    0.504     0.470    0.503
LEP                 0.250    0.436     0.152    0.361
Lunch               0.219    0.417     0.182    0.389
Male                0.531    0.503     0.394    0.492
Pretest accuracy    0.932    0.087     0.916    0.102
Pretest speed       4.268    1.147     4.324    1.213

3.3 Pre-intervention Data: Analytic Sample

Outcome Data: Analytic Sample (Measure: Fact Fluency)

Group           Units of Assignment    Units of Analysis    Mean     SD
Comparison      4                      64                   4.573    1.121
Intervention    4                      65                   4.580    1.195

Background Data: Analytic Sample

Variable            Comparison         Intervention
                    Mean     SD        Mean     SD
Age                 8.444    0.556     8.414    0.580
ESE                 0.250    0.436     0.262    0.443
Grade3              0.484    0.504     0.477    0.503
LEP                 0.250    0.436     0.154    0.364
Lunch               0.219    0.417     0.185    0.391
Male                0.531    0.503     0.400    0.494
Pretest accuracy    0.932    0.087     0.914    0.102
Pretest speed       4.268    1.147     4.307    1.214

Outcome Data: Analytic Sample with No Imputation (Measure: Fact Fluency)

Group           Units of Assignment    Units of Analysis    Mean     SD
Comparison      4                      61                   4.557    1.143
Intervention    4                      61                   4.640    1.142

Background Data: Analytic Sample with No Imputation

Variable            Comparison         Intervention
                    Mean     SD        Mean     SD
Age                 8.436    0.556     8.394    0.568
ESE                 0.246    0.434     0.279    0.452
Grade3              0.492    0.504     0.459    0.502
LEP                 0.262    0.444     0.148    0.358
Lunch               0.230    0.424     0.180    0.388
Male                0.541    0.502     0.410    0.496
Pretest accuracy    0.930    0.088     0.922    0.084
Pretest speed       4.254    1.170     4.360    1.177

3.4 Post-intervention Data and Findings

3.4.1 Analytic Sample

Because all Level-1 covariates were grand-mean centered and grade was the only Level-2 covariate other than condition, adjusted means for each group were estimated from the constant term of the HLM, the grade3 coefficient of the HLM, the average value of the grade3 variable across all students, and (in the case of the intervention group) the treatment coefficient of the HLM. Standard deviations are unadjusted.
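The estimation of adjusted means described above amounts to a short calculation. The Python sketch below is illustrative rather than the report's own code: the function name `adjusted_means` is hypothetical, the coefficients are the Full Model values listed in Appendix A, and the grade3 average of 62/129 is an assumption inferred from the group proportions reported for the analytic sample (roughly 31 of 64 and 31 of 65 students in grade 3).

```python
def adjusted_means(intercept, grade3_coef, grade3_mean, treatment_coef):
    """Return (comparison, intervention) adjusted means.

    Grand-mean-centered Level-1 covariates contribute zero at their means,
    so only the constant, the grade3 term, and the treatment term remain.
    """
    comparison = intercept + grade3_coef * grade3_mean
    intervention = comparison + treatment_coef
    return comparison, intervention


# Full Model coefficients from Appendix A; grade3_mean is an inferred assumption.
grade3_mean = 62 / 129
comparison, intervention = adjusted_means(4.709, 0.695, grade3_mean, 0.927)
print(round(comparison, 2), round(intervention, 2))
```

With these inputs the sketch gives about 5.04 and 5.97, in line with the adjusted means reported for the Full Model.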

Outcome Data and Statistical Significance: Analytic Sample

Model                Comparison Group                     Intervention Group                   Estimated Effect
                     Students   Adj. Mean   Unadj. SD     Students   Adj. Mean   Unadj. SD     Adj. Mean Difference   Adj. t-score
Full Model           64         5.04        1.099         65         5.97        1.093         0.927***               5.753
Demographic Model    64         5.13        1.099         65         5.97        1.093         0.836***               4.966
Reduced Model        64         5.08        1.099         65         5.95        1.093         0.867***               4.343

Note: *p<0.1; **p<0.05; ***p<0.001

Effect size was calculated from the adjusted means, the unadjusted pooled within-group standard deviations, and the small-sample correction factor $\omega = 1 - \frac{3}{4N - 9}$.
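As a worked check of this calculation, the Python sketch below computes a corrected effect size as ω times the adjusted mean difference divided by the pooled standard deviation, with ω = 1 − 3/(4N − 9). The function names are hypothetical, and the pooling formula is the usual degrees-of-freedom-weighted one, which is assumed here because the report does not spell it out. The inputs are the Full Model figures for the analytic sample.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled within-group SD, weighting each variance by its degrees of freedom."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def hedges_g(mean_diff, sd_pooled, n_total):
    """Standardized mean difference with the correction omega = 1 - 3/(4N - 9)."""
    omega = 1 - 3 / (4 * n_total - 9)
    return omega * mean_diff / sd_pooled


# Full Model, analytic sample: group sizes, unadjusted SDs, adjusted difference.
sd = pooled_sd(64, 1.099, 65, 1.093)
g = hedges_g(0.927, sd, 129)
print(round(sd, 3), round(g, 2))
```

Under these assumptions the sketch yields a pooled SD of about 1.096 and an effect size of about 0.84, matching the values the report gives for the Full Model.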


Estimation of Effect Size: Analytic Sample

Model                N      Adjusted Mean Difference    (Unadj.) Pooled Within-Group SD    Effect Size (adj. Hedges' g)
Full Model           129    0.927***                    1.096                              0.84
Demographic Model    129    0.836***                    1.096                              0.76
Reduced Model        129    0.867***                    1.096                              0.79

Note: *p<0.1; **p<0.05; ***p<0.001

3.4.2 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absent for the post assessment indicated that a complete-case analysis would substantially understate the effect of the intervention. The covariate-adjusted effect of the treatment on interim test scores was greater among students who missed the posttest than among those who were present for all three tests. This is borne out in the results of an analysis limited to those students for whom no imputation occurred.

Values for adjusted means for this subgroup were calculated by recentering all Level-1 covariates and generating a new HLM with the same structure as for the full analytic sample, but using only those participants with no missing data.
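Recentering is needed because the grand mean of a covariate changes once students with missing data are dropped. The Python sketch below illustrates the idea with made-up records; the function name `grand_mean_center` and the data are hypothetical, and it covers only the centering step, not the scaling or the HLM fit.

```python
from statistics import mean


def grand_mean_center(values):
    """Subtract the overall (grand) mean from every value."""
    grand_mean = mean(values)
    return [v - grand_mean for v in values]


# Hypothetical records: (age, posttest score); None marks a missing posttest.
records = [(7.9, 5.1), (8.4, None), (8.8, 6.0), (8.1, 4.7)]

# Centering on the full sample uses the mean age of all four students.
full_centered = grand_mean_center([age for age, _ in records])

# For the no-imputation subgroup, the mean is recomputed on complete cases only.
complete = [(age, post) for age, post in records if post is not None]
subgroup_centered = grand_mean_center([age for age, _ in complete])

print(full_centered)
print(subgroup_centered)
```

After recentering, the covariate again averages zero within the subgroup, so the constant term of the refitted model can be read the same way as in the full analytic sample.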

Outcome Data and Statistical Significance: Analytic Sample with No Imputation

Model                Comparison Group                     Intervention Group                   Estimated Effect
                     Students   Adj. Mean   Unadj. SD     Students   Adj. Mean   Unadj. SD     Adj. Mean Difference   Adj. t-score
Full Model           61         5.07        1.12          61         5.94        1.11          0.867***               5.255
Demographic Model    61         5.15        1.12          61         5.94        1.11          0.787***               4.669
Reduced Model        61         5.09        1.12          61         5.92        1.11          0.828***               4.249

Note: *p<0.1; **p<0.05; ***p<0.001

Estimation of Effect Size: Analytic Sample with No Imputation

Model                N      Adjusted Mean Difference    (Unadj.) Pooled Within-Group SD    Effect Size (adj. Hedges' g)
Full Model           122    0.867***                    1.113                              0.77
Demographic Model    122    0.787***                    1.113                              0.70
Reduced Model        122    0.828***                    1.113                              0.74

Note: *p<0.1; **p<0.05; ***p<0.001

3.5 Subpopulation Analyses

We analyzed subpopulations by grade. We also analyzed the subpopulation of students not designated as exceptional students. Due to the smaller sample sizes, the Reduced Model was used for these analyses, except that grade was removed as a variable for subpopulations of constant grade.

Statistical Significance and Estimation of Effect Size

Group                       N      Adjusted Mean Difference    (Unadj.) Pooled Within-Group SD    Effect Size (adj. Hedges' g)    Adjusted t-score
Grade 2                     68     0.739**                     0.94                               0.78                            2.46
Grade 3                     63     0.877**                     1.05                               0.82                            2.47
Non-Exceptional Students    102    0.904***                    1.101                              0.89                            4.63

Note: *p<0.1; **p<0.05; ***p<0.001

4 Acknowledgment

We used R (R Core Team 2013) for some of the analysis in this report, especially the lme4 package for fitting mixed models (Bates, Mächler, Bolker & Walker 2015). Other libraries utilized were dplyr, tidyr, and magrittr (Wickham & Francois 2016, Bache & Wickham 2014, Wickham 2016).

This document was typeset using LaTeX and makes use of the harvard, booktabs, multirow, graphicx, and url packages.

The stargazer package was used to generate LaTeX for several of the tables (Hlavac 2013).

References

Bache, S. M. & Wickham, H. (2014), magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr

Bates, D., Mächler, M., Bolker, B. & Walker, S. (2015), 'Fitting linear mixed-effects models using lme4', Journal of Statistical Software 67(1), 1–48.

Burns, M. K., VanDerHeyden, A. M. & Jiban, C. L. (2006), 'Assessing the instructional level for mathematics: A comparison of methods', School Psychology Review 35(3), 401.

Hintze, J. M., Christ, T. J. & Keller, L. A. (2002), 'The generalizability of CBM survey-level mathematics assessments: Just how many samples do we need?', School Psychology Review 31(4), 514.

Hlavac, M. (2013), 'stargazer: LaTeX code and ASCII text for well-formatted regression and summary statistics tables'. http://CRAN.R-project.org/package=stargazer

Osborne, J. (2005), 'Notes on the use of data transformations', Practical Assessment, Research and Evaluation 9(1), 42–50.

Stevens, O. & Leigh, E. (2012), 'Mathematics curriculum based measurement to predict state test performance: A comparison of measures and methods', ProQuest LLC.

R Core Team (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

UTA (2015), Multilevel modeling tutorial using SAS, Stata, HLM, R, SPSS, and Mplus, Technical report, The Department of Statistics and Data Sciences, The University of Texas at Austin. http://stat.utexas.edu/images/SSC/documents/SoftwareTutorials/MultilevelModeling.pdf

VanDerHeyden, A. M. & Burns, M. K. (2008), 'Examination of the utility of various measures of mathematics proficiency', Assessment for Effective Intervention.

Wickham, H. (2016), tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. https://CRAN.R-project.org/package=tidyr

Wickham, H. & Francois, R. (2016), dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr

Woltman, H., Feldstain, A., MacKay, J. C. & Rocchi, M. (2012), 'An introduction to hierarchical linear modeling', Tutorials in Quantitative Methods for Psychology 8(1), 52–69.

Appendix A: Full Model

The table below describes the fixed-effects data of the full HLM for the analytic sample. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                    Coefficient    t-score
Intercept                 4.709***       25.719
cage                      0.136          0.887
cLEP                      −0.015         −0.111
cLunch                    −0.122         −0.981
cprescore                 −2.220         −0.932
cgender                   −0.148         −1.518
cESE                      0.054          0.272
cprespeed                 2.471          1.177
cpreaccuracy              0.750          1.313
treatment                 0.927***       5.753
grade3                    0.695***       3.335
cagey:treatment           0.076          0.469
cagey:grade3              −0.224         −1.134
cLEP:treatment            −0.021         −0.123
cLEP:grade3               −0.027         −0.156
cLunch:treatment          0.195          1.440
cLunch:grade3             0.174          1.269
cprescore:treatment       1.881          1.056
cprescore:grade3          2.097          0.885
cgender:treatment         0.096          0.836
cgender:grade3            0.180          1.564
cESE:treatment            −0.036         −0.204
cESE:grade3               −0.013         −0.073
cprespeed:treatment       −1.327         −0.849
cprespeed:grade3          −1.581         −0.753
cpreaccuracy:treatment    −0.712*        −1.657
cpreaccuracy:grade3       −0.608         −1.101

Note: *p<0.1; **p<0.05; ***p<0.001
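The star notation in the note can be related to the t-scores in the table with a short calculation. The Python sketch below is a rough illustration, not the report's procedure: it assumes a two-sided test under a standard normal approximation, which the report does not state, and the function names `two_sided_p` and `stars` are hypothetical. The thresholds are the ones given in the note.

```python
import math


def two_sided_p(t_score):
    """Two-sided p-value under a standard normal approximation (an assumption)."""
    return math.erfc(abs(t_score) / math.sqrt(2))


def stars(p):
    """Significance marker using the thresholds from the table note."""
    if p < 0.001:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""


# A few t-scores taken from the Full Model table.
for name, t in [("treatment", 5.753), ("grade3", 3.335),
                ("cpreaccuracy x treatment", -1.657), ("cage", 0.887)]:
    print(f"{name}: t = {t}, marker = '{stars(two_sided_p(t))}'")
```

For the four rows shown, this approximation reproduces the markers printed in the table; with small samples a t distribution with the appropriate degrees of freedom would give somewhat larger p-values.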


Appendix B: Demographic Model

The table below describes the fixed-effects data of the full HLM for the analytic sample but excludes the pretest features of speed and accuracy. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                 Coefficient    t-score
Intercept              4.906***       27.157
cage                   0.230          1.554
cLEP                   0.042          0.315
cLunch                 −0.128         −1.225
cprescore              0.756***       5.872
cgender                −0.124         −1.283
cESE                   0.172          0.855
treatment              0.836***       4.966
grade3                 0.480**        2.235
cagey:treatment        −0.003         −0.022
cagey:grade3           −0.270         −1.378
cLEP:treatment         −0.059         −0.346
cLEP:grade3            −0.071         −0.408
cLunch:treatment       0.313**        2.631
cLunch:grade3          0.174          1.459
cprescore:treatment    0.003          0.020
cprescore:grade3       0.094          0.685
cgender:treatment      0.122          1.062
cgender:grade3         0.120          1.039
cESE:treatment         −0.131         −0.734
cESE:grade3            −0.117         −0.642

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix C: Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sample retaining only age. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

This model used Restricted Maximum Likelihood, as there were convergence problems when using maximum likelihood.

Factor                 Coefficient    t-score
Intercept              4.827***       25.683
cage                   0.227          1.146
cLunch                 −0.111         −0.952
cprescore              0.758***       5.488
treatment              0.867***       4.343
grade3                 0.527**        2.219
cagey:treatment        −0.031         −0.154
cagey:grade3           −0.168         −0.692
cLunch:treatment       0.268*         1.936
cLunch:grade3          0.153          1.109
cprescore:treatment    −0.042         −0.286
cprescore:grade3       0.100          0.677

Note: *p<0.1; **p<0.05; ***p<0.001

Page 18: Evaluating E ect of Re ex R Math Fact Fluency in Grades 2 & 3...Evaluating E ect of Re ex R on Math Fact Fluency in Grades 2 & 3 David I. Rudel March 13, 2017 1 Study Characteristics

3.2 Pre-Intervention Data: Baseline Sample

This section includes all students who were formally part of the analysis, including those who were absent for the posttest.

Outcome Data

                                Sample Sizes                Sample Characteristics
Measure        Group            Unit of       Unit of       Mean     Standard
                                Assignment    Analysis               Deviation
Fluency Score  Comparison       4             64            4.57     1.12
Fluency Score  Intervention     4             66            4.60     1.20

Background Data

                      Comparison          Intervention
Variable              Mean     SD         Mean     SD
Age                   8.444    0.556      8.405    0.581
ESE                   0.250    0.436      0.258    0.441
Grade3                0.484    0.504      0.470    0.503
LEP                   0.250    0.436      0.152    0.361
Lunch                 0.219    0.417      0.182    0.389
Male                  0.531    0.503      0.394    0.492
Pretest accuracy      0.932    0.087      0.916    0.102
Pretest speed         4.268    1.147      4.324    1.213

3.3 Pre-Intervention Data: Analytic Sample

Outcome Data: Analytic Sample

                                Sample Sizes                Sample Characteristics
Measure        Group            Unit of       Unit of       Mean     Standard
                                Assignment    Analysis               Deviation
Fact Fluency   Comparison       4             64            4.573    1.121
Fact Fluency   Intervention     4             65            4.580    1.195

Background Data: Analytic Sample

                      Comparison          Intervention
Variable              Mean     SD         Mean     SD
Age                   8.444    0.556      8.414    0.580
ESE                   0.250    0.436      0.262    0.443
Grade3                0.484    0.504      0.477    0.503
LEP                   0.250    0.436      0.154    0.364
Lunch                 0.219    0.417      0.185    0.391
Male                  0.531    0.503      0.400    0.494
Pretest accuracy      0.932    0.087      0.914    0.102
Pretest speed         4.268    1.147      4.307    1.214

Outcome Data: Analytic Sample with No Imputation

                                Sample Sizes                Sample Characteristics
Measure        Group            Unit of       Unit of       Mean     Standard
                                Assignment    Analysis               Deviation
Fact Fluency   Comparison       4             61            4.557    1.143
Fact Fluency   Intervention     4             61            4.640    1.142

Background Data: Analytic Sample with No Imputation

                      Comparison          Intervention
Variable              Mean     SD         Mean     SD
Age                   8.436    0.556      8.394    0.568
ESE                   0.246    0.434      0.279    0.452
Grade3                0.492    0.504      0.459    0.502
LEP                   0.262    0.444      0.148    0.358
Lunch                 0.230    0.424      0.180    0.388
Male                  0.541    0.502      0.410    0.496
Pretest accuracy      0.930    0.088      0.922    0.084
Pretest speed         4.254    1.170      4.360    1.177

3.4 Post-intervention Data and Findings

3.4.1 Analytic Sample

Because grand-mean centering was used for all Level-1 covariates and grade was the only Level-2 covariate other than condition, adjusted means for each group were estimated from four quantities: the constant term of the HLM model, the grade3 coefficient of the HLM model, the average value of the grade3 variable across all students, and (in the case of the intervention group) the treatment coefficient of the HLM model. Standard deviations are unadjusted.
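The calculation described above can be sketched in a few lines of Python. This is only an illustration, not the report's own code: the function name adjusted_means is hypothetical, the coefficients are the Full Model values from Appendix A, and the grade-3 share of 62/129 is an assumption inferred from the rounded group proportions in Section 3.3 (0.484 of 64 students and 0.477 of 65 students).

```python
def adjusted_means(intercept, grade3_coef, grade3_share, treatment_coef):
    """Return (comparison, intervention) adjusted means.

    With grand-mean-centered Level-1 covariates, those covariates contribute
    nothing at their mean, so only the intercept, the grade3 term evaluated at
    the average grade3 value, and the treatment term remain.
    """
    comparison = intercept + grade3_coef * grade3_share
    intervention = comparison + treatment_coef
    return comparison, intervention


# Full Model coefficients as reported in Appendix A.
# ASSUMPTION: 62 of the 129 analytic-sample students are in grade 3.
comparison, intervention = adjusted_means(
    intercept=4.709,
    grade3_coef=0.695,
    grade3_share=62 / 129,
    treatment_coef=0.927,
)
print(f"comparison: {comparison:.2f}, intervention: {intervention:.2f}")
```

Under these assumptions the sketch gives roughly 5.04 and 5.97, which agrees with the Full Model adjusted means reported for the analytic sample.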

Outcome Data and Statistical Significance: Analytic Sample

                     Comparison Group             Intervention Group           Estimated Effect
Model                Students  adj.    unadj.     Students  adj.    unadj.     adj. Mean     adj.
                               Mean    SD                   Mean    SD         Difference    t-score
Full Model           64        5.04    1.099      65        5.97    1.093      0.927***      5.753
Demographic Model    64        5.13    1.099      65        5.97    1.093      0.836***      4.966
Reduced Model        64        5.08    1.099      65        5.95    1.093      0.867***      4.343

Note: *p<0.1; **p<0.05; ***p<0.001

Effect size was calculated based on adjusted means, unadjusted pooled within-group standard deviations, and the small-sample correction $\omega = 1 - \frac{3}{4N - 9}$.

Estimation of Effect Size: Analytic Sample

Model                N      Adjusted Mean    (unadj.) Pooled      Effect Size
                            Difference       Within-Group SD      (adj. Hedges' g)
Full Model           129    0.927***         1.096                0.84
Demographic Model    129    0.836***         1.096                0.76
Reduced Model        129    0.867***         1.096                0.79

Note: *p<0.1; **p<0.05; ***p<0.001
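The effect sizes for the analytic sample can be reproduced with a short Python sketch. It assumes the usual pooled within-group standard deviation formula, weighting each group's variance by its degrees of freedom, which the report does not spell out. The function names pooled_sd and hedges_g are hypothetical; the inputs are the reported group sizes, unadjusted standard deviations, and adjusted mean difference for the Full Model.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled within-group SD, weighting each variance by its degrees of freedom."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def hedges_g(mean_diff, sd_pooled, n_total):
    """Standardized mean difference with the correction omega = 1 - 3/(4N - 9)."""
    omega = 1 - 3 / (4 * n_total - 9)
    return omega * mean_diff / sd_pooled


# Analytic sample, Full Model: 64 comparison and 65 intervention students.
sd = pooled_sd(64, 1.099, 65, 1.093)
g = hedges_g(0.927, sd, 64 + 65)
print(f"pooled SD: {sd:.3f}, Hedges' g: {g:.2f}")
```

With these inputs the sketch yields a pooled SD of about 1.096 and g of about 0.84, in line with the reported Full Model row. Swapping in 0.836 or 0.867 for the mean difference gives about 0.76 and 0.79.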

3.4.2 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absent for the post assessment indicated that a complete-case analysis (one limited to students with no missing data) would substantially understate the effect of the intervention. The covariate-adjusted effect of the treatment on interim test scores was greater among students who missed the posttest than among those who were present for all three tests. This is borne out in the results of an analysis limited to those students for whom no imputation occurred.

Values for adjusted means for this subgroup were calculated by recentering all Level-1 covariates and generating a new HLM with the same structure as for the full analytic sample, but using only those participants with no missing data.
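The recentering step can be illustrated with a minimal Python sketch. The records, field names, and the helper grand_mean_center below are all hypothetical and stand in for the study data. The point is only that covariates are centered again on the mean of the retained students, so that the new model's intercept refers to the average complete-case student.

```python
from statistics import mean


def grand_mean_center(values):
    """Subtract the overall mean so the centered values average to zero."""
    m = mean(values)
    return [v - m for v in values]


# HYPOTHETICAL records: "post" is None when the posttest was missed.
students = [
    {"age": 8.1, "post": 5.2},
    {"age": 8.6, "post": None},
    {"age": 7.9, "post": 6.0},
    {"age": 8.8, "post": 4.7},
]

# Keep only students with no missing data, then recenter on that subset.
complete = [s for s in students if s["post"] is not None]
centered_age = grand_mean_center([s["age"] for s in complete])
print(len(complete), [round(a, 3) for a in centered_age])
```

Centering on the subset rather than reusing the full-sample means keeps the centered covariates averaging to zero within the data the new model is fitted to.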

Outcome Data and Statistical Significance: Analytic Sample with No Imputation

                     Comparison Group             Intervention Group           Estimated Effect
Model                Students  adj.    unadj.     Students  adj.    unadj.     adj. Mean     adj.
                               Mean    SD                   Mean    SD         Difference    t-score
Full Model           61        5.07    1.12       61        5.94    1.11       0.867***      5.255
Demographic Model    61        5.15    1.12       61        5.94    1.11       0.787***      4.669
Reduced Model        61        5.09    1.12       61        5.92    1.11       0.828***      4.249

Note: *p<0.1; **p<0.05; ***p<0.001

Estimation of Effect Size: Analytic Sample with No Imputation

Model                N      Adjusted Mean    (unadj.) Pooled      Effect Size
                            Difference       Within-Group SD      (adj. Hedges' g)
Full Model           122    0.867***         1.113                0.77
Demographic Model    122    0.787***         1.113                0.70
Reduced Model        122    0.828***         1.113                0.74

Note: *p<0.1; **p<0.05; ***p<0.001

3.5 Subpopulation Analyses

We analyzed sub-populations by grade. We also analyzed the sub-population of students not designated as exceptional students. Due to the smaller sample sizes, the Reduced Model was used for these analyses, except that grade was removed as a variable for subpopulations of constant grade.

Statistical Significance and Estimation of Effect Size

Group                       N      Adjusted Mean    (unadj.) Pooled      Effect Size         Adjusted
                                   Difference       Within-Group SD      (adj. Hedges' g)    t-score
Grade 2                     68     0.739**          0.94                 0.78                2.46
Grade 3                     63     0.877**          1.05                 0.82                2.47
Non-Exceptional Students    102    0.904***         1.101                0.89                4.63

Note: *p<0.1; **p<0.05; ***p<0.001

4 Acknowledgment

We used R (Team 2013) for some of the analysis in this report, especially the lme4 package for fitting mixed models (Bates, Machler, Bolker & Walker 2015). Other libraries utilized were dplyr, tidyr, and magrittr (Wickham & Francois 2016, Bache & Wickham 2014, Wickham 2016).

This document was typeset using LaTeX and makes use of the harvard, booktabs, multirow, graphicx, and url packages.

The stargazer package was used to generate LaTeX for several of the tables (Hlavac 2013).

References

Bache, S. M. & Wickham, H. (2014). magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr

Bates, D., Machler, M., Bolker, B. & Walker, S. (2015). 'Fitting linear mixed-effects models using lme4'. Journal of Statistical Software 67(1), 1–48.

Burns, M. K., VanDerHeyden, A. M. & Jiban, C. L. (2006). 'Assessing the instructional level for mathematics: A comparison of methods'. School Psychology Review 35(3), 401.

Hintze, J. M., Christ, T. J. & Keller, L. A. (2002). 'The generalizability of CBM survey-level mathematics assessments: Just how many samples do we need?'. School Psychology Review 31(4), 514.

Hlavac, M. (2013). 'stargazer: LaTeX code and ASCII text for well-formatted regression and summary statistics tables'. http://CRAN.R-project.org/package=stargazer

Osborne, J. (2005). 'Notes on the use of data transformations'. Practical Assessment, Research and Evaluation 9(1), 42–50.

Stevens, O. & Leigh, E. (2012). 'Mathematics curriculum based measurement to predict state test performance: A comparison of measures and methods'. ProQuest LLC.

Team, R. C. (2013). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

UTA (2015). Multilevel modeling tutorial using SAS, Stata, HLM, R, SPSS, and Mplus. Technical report, The Department of Statistics and Data Sciences, The University of Texas at Austin. http://stat.utexas.edu/images/SSC/documents/SoftwareTutorials/MultilevelModeling.pdf

VanDerHeyden, A. M. & Burns, M. K. (2008). 'Examination of the utility of various measures of mathematics proficiency'. Assessment for Effective Intervention.

Wickham, H. (2016). tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. https://CRAN.R-project.org/package=tidyr

Wickham, H. & Francois, R. (2016). dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr

Woltman, H., Feldstain, A., MacKay, J. C. & Rocchi, M. (2012). 'An introduction to hierarchical linear modeling'. Tutorials in Quantitative Methods for Psychology 8(1), 52–69.

Appendix A Full Model

The table below describes the fixed-effects data of the full HLM for the analytic sample. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                      Coefficient    t-score
Intercept                   4.709***       25.719
cage                        0.136          0.887
cLEP                        -0.015         -0.111
cLunch                      -0.122         -0.981
cprescore                   -2.220         -0.932
cgender                     -0.148         -1.518
cESE                        0.054          0.272
cprespeed                   2.471          1.177
cpreaccuracy                0.750          1.313
treatment                   0.927***       5.753
grade3                      0.695***       3.335
cagey:treatment             0.076          0.469
cagey:grade3                -0.224         -1.134
cLEP:treatment              -0.021         -0.123
cLEP:grade3                 -0.027         -0.156
cLunch:treatment            0.195          1.440
cLunch:grade3               0.174          1.269
cprescore:treatment         1.881          1.056
cprescore:grade3            2.097          0.885
cgender:treatment           0.096          0.836
cgender:grade3              0.180          1.564
cESE:treatment              -0.036         -0.204
cESE:grade3                 -0.013         -0.073
cprespeed:treatment         -1.327         -0.849
cprespeed:grade3            -1.581         -0.753
cpreaccuracy:treatment      -0.712*        -1.657
cpreaccuracy:grade3         -0.608         -1.101

Note: *p<0.1; **p<0.05; ***p<0.001
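The significance markers in these tables use the thresholds p<0.1, p<0.05, and p<0.001. The report does not say how p-values were obtained from the t-scores, so the Python sketch below assumes a two-sided normal approximation; that assumption, and the function names two_sided_p and stars, are illustrative rather than the author's method.

```python
import math


def two_sided_p(t):
    """Two-sided p-value for a t-score, ASSUMING a standard normal reference."""
    return math.erfc(abs(t) / math.sqrt(2))


def stars(t):
    """Map a t-score to the markers used in the tables (*, **, ***)."""
    p = two_sided_p(t)
    if p < 0.001:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""


# A few t-scores taken from the Full Model table.
for name, t in [("treatment", 5.753), ("grade3", 3.335),
                ("cpreaccuracy:treatment", -1.657), ("cage", 0.887)]:
    print(f"{name:24s} {t:7.3f} {stars(t)}")
```

Under the normal approximation, the markers produced for these four rows match those shown in the Full Model table.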

Appendix B Demographic Model

The table below describes the fixed-effects data of the full HLM for the analytic sample but excludes the pretest features of speed and accuracy. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

Factor                      Coefficient    t-score
Intercept                   4.906***       27.157
cage                        0.230          1.554
cLEP                        0.042          0.315
cLunch                      -0.128         -1.225
cprescore                   0.756***       5.872
cgender                     -0.124         -1.283
cESE                        0.172          0.855
treatment                   0.836***       4.966
grade3                      0.480**        2.235
cagey:treatment             -0.003         -0.022
cagey:grade3                -0.270         -1.378
cLEP:treatment              -0.059         -0.346
cLEP:grade3                 -0.071         -0.408
cLunch:treatment            0.313**        2.631
cLunch:grade3               0.174          1.459
cprescore:treatment         0.003          0.020
cprescore:grade3            0.094          0.685
cgender:treatment           0.122          1.062
cgender:grade3              0.120          1.039
cESE:treatment              -0.131         -0.734
cESE:grade3                 -0.117         -0.642

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix C Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sample retaining only age. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

This model used Restricted Maximum Likelihood, as there were convergence problems when using maximum likelihood.

Factor                      Coefficient    t-score
Intercept                   4.827***       25.683
cage                        0.227          1.146
cLunch                      -0.111         -0.952
cprescore                   0.758***       5.488
treatment                   0.867***       4.343
grade3                      0.527**        2.219
cagey:treatment             -0.031         -0.154
cagey:grade3                -0.168         -0.692
cLunch:treatment            0.268*         1.936
cLunch:grade3               0.153          1.109
cprescore:treatment         -0.042         -0.286
cprescore:grade3            0.100          0.677

Note: *p<0.1; **p<0.05; ***p<0.001

Appendix B Sample Addition/Subtraction Probe

    9     5    18     9     8    13
  + 8   + 9   -10   - 6   + 3   - 5
  ---   ---   ---   ---   ---   ---

   10     2     3    10    12     9
  - 3   + 7   + 8   + 1   -10   - 1
  ---   ---   ---   ---   ---   ---

    5     8     3    19     7    16     3
  + 0   + 4   + 6   - 9   - 1   -10   + 0
  ---   ---   ---   ---   ---   ---   ---

    3     7    15     0     4    14     7
  - 1   + 5   - 5   + 9   + 3   - 5   - 5
  ---   ---   ---   ---   ---   ---   ---

    9    20    11     4     9     6     1
  + 9   -10   - 3   - 4   + 0   - 1   +10
  ---   ---   ---   ---   ---   ---   ---

    9    12    12     2     5     9     5
  +10   - 3   - 9   + 8   - 0   - 4   + 0
  ---   ---   ---   ---   ---   ---   ---

    4    14     7    11     7     4     6
  + 4   - 9   - 0   - 8   + 0   - 1   + 5
  ---   ---   ---   ---   ---   ---   ---

    2    12    14     4    10     1     7
  + 3   - 5   - 5   - 4   +10   + 0   + 2
  ---   ---   ---   ---   ---   ---   ---

   13    10     3     9    17    10     3
  - 6   +10   + 6   - 6   - 7   +10   + 6
  ---   ---   ---   ---   ---   ---   ---

    4    10    10     3     5     5    10
  + 9   + 2   +10   - 0   + 3   - 5   -10
  ---   ---   ---   ---   ---   ---   ---

  • Study Characteristics
    • Intervention Condition
    • Comparison Condition
    • Setting
    • Participants
      • Study Design and Analysis
        • Sample Formation
        • Outcome Measures
          • Outcomes
          • Probes
          • Administrations
          • Fluency Score Calculation
            • Validity
            • Reliability
            • Analytic Approach
            • Statistical Adjustments
            • Students Removed from Study
            • Missing Data
              • Frustration Level
              • Instructional Level
                • Mastery Level
                  • Study Data
                    • Pre-Intervention DatamdashAll Pretest Takers
                    • Pre-Intervention DatamdashBaseline Sample
                    • Pre-intervention Data Analytic Sample
                    • Post-intervention Data and Findings
                      • Analytic Sample
                      • Analytic Sample with No Imputation
                        • Subpopulation Analyses
                          • Acknowledgment
                          • Appendices
                          • Appendix Full Model
                          • Appendix Demographic Model
                          • Appendix Reduced Model
Page 19: Evaluating E ect of Re ex R Math Fact Fluency in Grades 2 & 3...Evaluating E ect of Re ex R on Math Fact Fluency in Grades 2 & 3 David I. Rudel March 13, 2017 1 Study Characteristics

33 Pre-intervention Data Analytic Sample

Outcome DatamdashAnalytic Sample

Measure Comparison Group Intervention Group

Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

Fact Fluency 4 64 4573 1121 4 65 4580 1195

Background DatamdashAnalytic Sample

VariableComparison Intervention

Mean SD Mean SD

Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

Outcome DatamdashAnalytic Sample with No Imputation

Measure Comparison Group Intervention Group

Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

Fact Fluency 4 61 4557 1143 4 61 4640 1142

19

Background DatamdashAnalytic Sample with No Imputation

VariableComparison Intervention

Mean SD Mean SD

Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

34 Post-intervention Data and Findings

341 Analytic Sample

As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

4Nminus9for small

effect size

20

Estimation of Effect SizemdashAnalytic Sample

Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

342 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

Model Comparison Group Intervention Group Estimated Effect

Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

Estimation of Effect SizemdashAnalytic Sample with No Imputation

Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

21

35 Subpopulation Analyses

We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

Difference Within-Group SD (adj Hedgesrsquo g) t-score

Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

4 Acknowledgment

We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

22

References

Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

23

Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

24

Appendix A Full Model

The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

Factor Coefficient t-score

Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

25

Appendix B Demographic Model

The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

Factor Coefficient t-score

Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

26

Appendix C Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

Factor Coefficient t-score

Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

27

9 5 18 9 8 13

+ 8 + 9 minus10 minus 6 + 3 minus 5

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

10 2 3 10 12 9

minus 3 + 7 + 8 + 1 minus10 minus 1

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

5 8 3 19 7 16 3

+ 0 + 4 + 6 minus 9 minus 1 minus10 + 0

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

3 7 15 0 4 14 7

minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

9 20 11 4 9 6 1

+ 9 minus10 minus 3 minus 4 + 0 minus 1 +10

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

9 12 12 2 5 9 5

+10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

4 14 7 11 7 4 6

+ 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

2 12 14 4 10 1 7

+ 3 minus 5 minus 5 minus 4 +10 + 0 + 2

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

13 10 3 9 17 10 3

minus 6 +10 + 6 minus 6 minus 7 +10 + 6

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

4 10 10 3 5 5 10

+ 9 + 2 +10 minus 0 + 3 minus 5 minus10

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

Appendix B Sample AdditionSubtraction Probe

  • Study Characteristics
    • Intervention Condition
    • Comparison Condition
    • Setting
    • Participants
      • Study Design and Analysis
        • Sample Formation
        • Outcome Measures
          • Outcomes
          • Probes
          • Administrations
          • Fluency Score Calculation
            • Validity
            • Reliability
            • Analytic Approach
            • Statistical Adjustments
            • Students Removed from Study
            • Missing Data
              • Frustration Level
              • Instructional Level
                • Mastery Level
                  • Study Data
                    • Pre-Intervention DatamdashAll Pretest Takers
                    • Pre-Intervention DatamdashBaseline Sample
                    • Pre-intervention Data Analytic Sample
                    • Post-intervention Data and Findings
                      • Analytic Sample
                      • Analytic Sample with No Imputation
                        • Subpopulation Analyses
                          • Acknowledgment
                          • Appendices
                          • Appendix Full Model
                          • Appendix Demographic Model
                          • Appendix Reduced Model
Page 20: Evaluating E ect of Re ex R Math Fact Fluency in Grades 2 & 3...Evaluating E ect of Re ex R on Math Fact Fluency in Grades 2 & 3 David I. Rudel March 13, 2017 1 Study Characteristics

Background DatamdashAnalytic Sample with No Imputation

VariableComparison Intervention

Mean SD Mean SD

Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

34 Post-intervention Data and Findings

341 Analytic Sample

As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

4Nminus9for small

effect size

20

Estimation of Effect SizemdashAnalytic Sample

Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

342 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

Outcome Data and Statistical Significance: Analytic Sample with No Imputation

| Model | Comparison: Students | Comparison: Adj. Mean | Comparison: Unadj. SD | Intervention: Students | Intervention: Adj. Mean | Intervention: Unadj. SD | Adj. Mean Difference | Adj. t-score |
|---|---|---|---|---|---|---|---|---|
| Full Model | 61 | 5.07 | 1.12 | 61 | 5.94 | 1.11 | 0.867*** | 5.255 |
| Demographic Model | 61 | 5.15 | 1.12 | 61 | 5.94 | 1.11 | 0.787*** | 4.669 |
| Reduced Model | 61 | 5.09 | 1.12 | 61 | 5.92 | 1.11 | 0.828*** | 4.249 |

Note: *p<0.1; **p<0.05; ***p<0.001
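The table reports each estimate alongside its t-score, so a reader can recover an approximate standard error and interval. The sketch below assumes the t-score is the estimate divided by its standard error, as in typical mixed-model output, and uses a normal approximation (z = 1.96) for the interval. Both are assumptions made for illustration, and the report itself gives neither standard errors nor intervals.

```python
def standard_error(estimate, t_score):
    """Recover the standard error, assuming t = estimate / SE."""
    return estimate / t_score


def approx_95_interval(estimate, t_score, z=1.96):
    """Rough 95% interval using a normal approximation (assumed, not from the report)."""
    se = standard_error(estimate, t_score)
    return estimate - z * se, estimate + z * se


# Full Model row of the no-imputation table.
se_full = standard_error(0.867, 5.255)
low, high = approx_95_interval(0.867, 5.255)
print(round(se_full, 3), round(low, 2), round(high, 2))
```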

Estimation of Effect Size: Analytic Sample with No Imputation

| Model | N | Adjusted Mean Difference | (Unadj.) Pooled Within-Group SD | Effect Size (Adj. Hedges' g) |
|---|---|---|---|---|
| Full Model | 122 | 0.867*** | 1.113 | 0.77 |
| Demographic Model | 122 | 0.787*** | 1.113 | 0.70 |
| Reduced Model | 122 | 0.828*** | 1.113 | 0.74 |

Note: *p<0.1; **p<0.05; ***p<0.001


3.5 Subpopulation Analyses

We analyzed sub-populations by grade. We also analyzed the sub-population of students not designated as exceptional students. Due to the smaller sample sizes, the Reduced Model was used for these analyses, except that grade was removed as a variable for sub-populations of constant grade.

Statistical Significance and Estimation of Effect Size

| Group | N | Adjusted Mean Difference | (Unadj.) Pooled Within-Group SD | Effect Size (Adj. Hedges' g) | Adjusted t-score |
|---|---|---|---|---|---|
| Grade 2 | 68 | 0.739** | 0.94 | 0.78 | 2.46 |
| Grade 3 | 63 | 0.877** | 1.05 | 0.82 | 2.47 |
| Non-Exceptional Students | 102 | 0.904*** | 1.101 | 0.89 | 4.63 |

Note: *p<0.1; **p<0.05; ***p<0.001
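The rule for the single-grade sub-populations (use the Reduced Model but drop grade) can be sketched as a small term-building helper. The covariate names follow the Reduced Model appendix table. The helper name and the R-style `covariate:moderator` interaction labels are assumptions for illustration, and the random-effects structure is not represented.

```python
def fixed_effect_terms(covariates, moderators):
    """List main effects, then every covariate-by-moderator interaction."""
    terms = list(covariates) + list(moderators)
    for covariate in covariates:
        for moderator in moderators:
            terms.append(f"{covariate}:{moderator}")
    return terms


# Level-1 covariates retained in the Reduced Model appendix table.
REDUCED_COVARIATES = ["cage", "cLunch", "cprescore"]

# Mixed-grade samples keep both treatment and grade3.
combined_terms = fixed_effect_terms(REDUCED_COVARIATES, ["treatment", "grade3"])

# Within a single grade, grade3 is constant and is dropped.
single_grade_terms = fixed_effect_terms(REDUCED_COVARIATES, ["treatment"])

print(len(combined_terms), len(single_grade_terms))
```

Dropping grade removes the `grade3` main effect and its three interactions, leaving 7 terms besides the intercept instead of 11.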

4 Acknowledgment

We used R (R Core Team 2013) for some of the analysis in this report, especially the lme4 package for fitting mixed models (Bates, Mächler, Bolker & Walker 2015). Other libraries utilized were dplyr, tidyr, and magrittr (Wickham & Francois 2016, Bache & Wickham 2014, Wickham 2016).

This document was typeset using LaTeX and makes use of the harvard, booktabs, multirow, graphicx, and url packages.

The stargazer package was used to generate LaTeX for several of the tables (Hlavac 2013).


References

Bache, S. M. & Wickham, H. (2014), magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr

Bates, D., Mächler, M., Bolker, B. & Walker, S. (2015), 'Fitting linear mixed-effects models using lme4', Journal of Statistical Software 67(1), 1-48.

Burns, M. K., VanDerHeyden, A. M. & Jiban, C. L. (2006), 'Assessing the instructional level for mathematics: A comparison of methods', School Psychology Review 35(3), 401.

Hintze, J. M., Christ, T. J. & Keller, L. A. (2002), 'The generalizability of CBM survey-level mathematics assessments: Just how many samples do we need?', School Psychology Review 31(4), 514.

Hlavac, M. (2013), 'stargazer: LaTeX code and ASCII text for well-formatted regression and summary statistics tables'. http://CRAN.R-project.org/package=stargazer

Osborne, J. (2005), 'Notes on the use of data transformations', Practical Assessment, Research and Evaluation 9(1), 42-50.

Stevens, O. & Leigh, E. (2012), 'Mathematics curriculum based measurement to predict state test performance: A comparison of measures and methods', ProQuest LLC.

R Core Team (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org

UTA (2015), Multilevel modeling tutorial using SAS, Stata, HLM, R, SPSS, and Mplus, Technical report, The Department of Statistics and Data Sciences, The University of Texas at Austin. http://stat.utexas.edu/images/SSC/documents/SoftwareTutorials/MultilevelModeling.pdf

VanDerHeyden, A. M. & Burns, M. K. (2008), 'Examination of the utility of various measures of mathematics proficiency', Assessment for Effective Intervention.

Wickham, H. (2016), tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. https://CRAN.R-project.org/package=tidyr

Wickham, H. & Francois, R. (2016), dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr

Woltman, H., Feldstain, A., MacKay, J. C. & Rocchi, M. (2012), 'An introduction to hierarchical linear modeling', Tutorials in Quantitative Methods for Psychology 8(1), 52-69.

Appendix A Full Model

The table below describes the fixed-effects data of the full HLM for the analytic sample. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

| Factor | Coefficient | t-score |
|---|---|---|
| Intercept | 4.709*** | 25.719 |
| cage | 0.136 | 0.887 |
| cLEP | -0.015 | -0.111 |
| cLunch | -0.122 | -0.981 |
| cprescore | -2.220 | -0.932 |
| cgender | -0.148 | -1.518 |
| cESE | 0.054 | 0.272 |
| cprespeed | 2.471 | 1.177 |
| cpreaccuracy | 0.750 | 1.313 |
| treatment | 0.927*** | 5.753 |
| grade3 | 0.695*** | 3.335 |
| cagey:treatment | 0.076 | 0.469 |
| cagey:grade3 | -0.224 | -1.134 |
| cLEP:treatment | -0.021 | -0.123 |
| cLEP:grade3 | -0.027 | -0.156 |
| cLunch:treatment | 0.195 | 1.440 |
| cLunch:grade3 | 0.174 | 1.269 |
| cprescore:treatment | 1.881 | 1.056 |
| cprescore:grade3 | 2.097 | 0.885 |
| cgender:treatment | 0.096 | 0.836 |
| cgender:grade3 | 0.180 | 1.564 |
| cESE:treatment | -0.036 | -0.204 |
| cESE:grade3 | -0.013 | -0.073 |
| cprespeed:treatment | -1.327 | -0.849 |
| cprespeed:grade3 | -1.581 | -0.753 |
| cpreaccuracy:treatment | -0.712* | -1.657 |
| cpreaccuracy:grade3 | -0.608 | -1.101 |

Note: *p<0.1; **p<0.05; ***p<0.001
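One way to read the table: because every level-1 covariate is grand-mean centered, a student at the grand mean has all "c" variables equal to zero, so the covariate main effects and their interactions drop out of the fixed-effects prediction. The sketch below uses only the intercept, treatment, and grade3 coefficients from the table. The function name is hypothetical, and random effects are ignored, so these are illustrative fixed-part predictions rather than the adjusted means reported in the body of the report.

```python
# Fixed-effect coefficients copied from the Full Model table.
FULL_MODEL = {"intercept": 4.709, "treatment": 0.927, "grade3": 0.695}


def predicted_score_at_grand_mean(treated, in_grade3, coefficients=FULL_MODEL):
    """Fixed-part prediction for a student whose centered covariates are all zero."""
    score = coefficients["intercept"]
    if treated:
        score += coefficients["treatment"]
    if in_grade3:
        score += coefficients["grade3"]
    return score


for treated in (False, True):
    for in_grade3 in (False, True):
        label = ("intervention" if treated else "comparison",
                 "grade 3" if in_grade3 else "grade 2")
        print(label, round(predicted_score_at_grand_mean(treated, in_grade3), 3))
```

The gap between the intervention and comparison predictions is the treatment coefficient, 0.927, in either grade, since the table lists no treatment-by-grade interaction.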


Appendix B Demographic Model

The table below describes the fixed-effects data of the full HLM for the analytic sample but excludes the pretest features of speed and accuracy. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

| Factor | Coefficient | t-score |
|---|---|---|
| Intercept | 4.906*** | 27.157 |
| cage | 0.230 | 1.554 |
| cLEP | 0.042 | 0.315 |
| cLunch | -0.128 | -1.225 |
| cprescore | 0.756*** | 5.872 |
| cgender | -0.124 | -1.283 |
| cESE | 0.172 | 0.855 |
| treatment | 0.836*** | 4.966 |
| grade3 | 0.480** | 2.235 |
| cagey:treatment | -0.003 | -0.022 |
| cagey:grade3 | -0.270 | -1.378 |
| cLEP:treatment | -0.059 | -0.346 |
| cLEP:grade3 | -0.071 | -0.408 |
| cLunch:treatment | 0.313** | 2.631 |
| cLunch:grade3 | 0.174 | 1.459 |
| cprescore:treatment | 0.003 | 0.020 |
| cprescore:grade3 | 0.094 | 0.685 |
| cgender:treatment | 0.122 | 1.062 |
| cgender:grade3 | 0.120 | 1.039 |
| cESE:treatment | -0.131 | -0.734 |
| cESE:grade3 | -0.117 | -0.642 |

Note: *p<0.1; **p<0.05; ***p<0.001


Appendix C Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sample retaining only age. This model uses grand-mean-centered values for all level-1 variables, scaled to be univariate. These variables are prefixed with "c" to indicate this.

This model used Restricted Maximum Likelihood, as there were convergence problems when using maximum likelihood.

| Factor | Coefficient | t-score |
|---|---|---|
| Intercept | 4.827*** | 25.683 |
| cage | 0.227 | 1.146 |
| cLunch | -0.111 | -0.952 |
| cprescore | 0.758*** | 5.488 |
| treatment | 0.867*** | 4.343 |
| grade3 | 0.527** | 2.219 |
| cagey:treatment | -0.031 | -0.154 |
| cagey:grade3 | -0.168 | -0.692 |
| cLunch:treatment | 0.268* | 1.936 |
| cLunch:grade3 | 0.153 | 1.109 |
| cprescore:treatment | -0.042 | -0.286 |
| cprescore:grade3 | 0.100 | 0.677 |

Note: *p<0.1; **p<0.05; ***p<0.001


 9 + 8     5 + 9    18 - 10    9 - 6     8 + 3    13 - 5

10 - 3     2 + 7     3 + 8    10 + 1    12 - 10    9 - 1

 5 + 0     8 + 4     3 + 6    19 - 9     7 - 1    16 - 10    3 + 0

 3 - 1     7 + 5    15 - 5     0 + 9     4 + 3    14 - 5     7 - 5

 9 + 9    20 - 10   11 - 3     4 - 4     9 + 0     6 - 1     1 + 10

 9 + 10   12 - 3    12 - 9     2 + 8     5 - 0     9 - 4     5 + 0

 4 + 4    14 - 9     7 - 0    11 - 8     7 + 0     4 - 1     6 + 5

 2 + 3    12 - 5    14 - 5     4 - 4    10 + 10    1 + 0     7 + 2

13 - 6    10 + 10    3 + 6     9 - 6    17 - 7    10 + 10    3 + 6

 4 + 9    10 + 2    10 + 10    3 - 0     5 + 3     5 - 5    10 - 10

Appendix B: Sample Addition/Subtraction Probe

  • Study Characteristics
    • Intervention Condition
    • Comparison Condition
    • Setting
    • Participants
      • Study Design and Analysis
        • Sample Formation
        • Outcome Measures
          • Outcomes
          • Probes
          • Administrations
          • Fluency Score Calculation
            • Validity
            • Reliability
            • Analytic Approach
            • Statistical Adjustments
            • Students Removed from Study
            • Missing Data
              • Frustration Level
              • Instructional Level
                • Mastery Level
                  • Study Data
                    • Pre-Intervention DatamdashAll Pretest Takers
                    • Pre-Intervention DatamdashBaseline Sample
                    • Pre-intervention Data Analytic Sample
                    • Post-intervention Data and Findings
                      • Analytic Sample
                      • Analytic Sample with No Imputation
                        • Subpopulation Analyses
                          • Acknowledgment
                          • Appendices
                          • Appendix Full Model
                          • Appendix Demographic Model
                          • Appendix Reduced Model
Page 21: Evaluating E ect of Re ex R Math Fact Fluency in Grades 2 & 3...Evaluating E ect of Re ex R on Math Fact Fluency in Grades 2 & 3 David I. Rudel March 13, 2017 1 Study Characteristics

Estimation of Effect SizemdashAnalytic Sample

Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

342 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

Model Comparison Group Intervention Group Estimated Effect

Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

Estimation of Effect SizemdashAnalytic Sample with No Imputation

Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

21

35 Subpopulation Analyses

We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

Difference Within-Group SD (adj Hedgesrsquo g) t-score

Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

4 Acknowledgment

We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

22

References

Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

23

Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

24

Appendix A Full Model

The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

Factor Coefficient t-score

Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

25

Appendix B Demographic Model

The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

Factor Coefficient t-score

Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

26

Appendix C Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

Factor Coefficient t-score

Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

27

9 5 18 9 8 13

+ 8 + 9 minus10 minus 6 + 3 minus 5

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

10 2 3 10 12 9

minus 3 + 7 + 8 + 1 minus10 minus 1

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

5 8 3 19 7 16 3

+ 0 + 4 + 6 minus 9 minus 1 minus10 + 0

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

3 7 15 0 4 14 7

minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

9 20 11 4 9 6 1

+ 9 minus10 minus 3 minus 4 + 0 minus 1 +10

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

9 12 12 2 5 9 5

+10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

4 14 7 11 7 4 6

+ 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

2 12 14 4 10 1 7

+ 3 minus 5 minus 5 minus 4 +10 + 0 + 2

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

13 10 3 9 17 10 3

minus 6 +10 + 6 minus 6 minus 7 +10 + 6

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

4 10 10 3 5 5 10

+ 9 + 2 +10 minus 0 + 3 minus 5 minus10

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

Appendix B Sample AdditionSubtraction Probe

  • Study Characteristics
    • Intervention Condition
    • Comparison Condition
    • Setting
    • Participants
      • Study Design and Analysis
        • Sample Formation
        • Outcome Measures
          • Outcomes
          • Probes
          • Administrations
          • Fluency Score Calculation
            • Validity
            • Reliability
            • Analytic Approach
            • Statistical Adjustments
            • Students Removed from Study
            • Missing Data
              • Frustration Level
              • Instructional Level
                • Mastery Level
                  • Study Data
                    • Pre-Intervention DatamdashAll Pretest Takers
                    • Pre-Intervention DatamdashBaseline Sample
                    • Pre-intervention Data Analytic Sample
                    • Post-intervention Data and Findings
                      • Analytic Sample
                      • Analytic Sample with No Imputation
                        • Subpopulation Analyses
                          • Acknowledgment
                          • Appendices
                          • Appendix Full Model
                          • Appendix Demographic Model
                          • Appendix Reduced Model
Page 22: Evaluating E ect of Re ex R Math Fact Fluency in Grades 2 & 3...Evaluating E ect of Re ex R on Math Fact Fluency in Grades 2 & 3 David I. Rudel March 13, 2017 1 Study Characteristics

35 Subpopulation Analyses

We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

Difference Within-Group SD (adj Hedgesrsquo g) t-score

Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

4 Acknowledgment

We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

22

References

Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

23

Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

24

Appendix A Full Model

The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

Factor Coefficient t-score

Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

25

Appendix B Demographic Model

The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

Factor Coefficient t-score

Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

26

Appendix C Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

Factor Coefficient t-score

Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

27

9 5 18 9 8 13

+ 8 + 9 minus10 minus 6 + 3 minus 5

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

10 2 3 10 12 9

minus 3 + 7 + 8 + 1 minus10 minus 1

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

5 8 3 19 7 16 3

+ 0 + 4 + 6 minus 9 minus 1 minus10 + 0

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

3 7 15 0 4 14 7

minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

9 20 11 4 9 6 1

+ 9 minus10 minus 3 minus 4 + 0 minus 1 +10

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

9 12 12 2 5 9 5

+10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

4 14 7 11 7 4 6

+ 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

2 12 14 4 10 1 7

+ 3 minus 5 minus 5 minus 4 +10 + 0 + 2

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

13 10 3 9 17 10 3

minus 6 +10 + 6 minus 6 minus 7 +10 + 6

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

4 10 10 3 5 5 10

+ 9 + 2 +10 minus 0 + 3 minus 5 minus10

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

Appendix B Sample AdditionSubtraction Probe

  • Study Characteristics
    • Intervention Condition
    • Comparison Condition
    • Setting
    • Participants
      • Study Design and Analysis
        • Sample Formation
        • Outcome Measures
          • Outcomes
          • Probes
          • Administrations
          • Fluency Score Calculation
            • Validity
            • Reliability
            • Analytic Approach
            • Statistical Adjustments
            • Students Removed from Study
            • Missing Data
              • Frustration Level
              • Instructional Level
                • Mastery Level
                  • Study Data
                    • Pre-Intervention DatamdashAll Pretest Takers
                    • Pre-Intervention DatamdashBaseline Sample
                    • Pre-intervention Data Analytic Sample
                    • Post-intervention Data and Findings
                      • Analytic Sample
                      • Analytic Sample with No Imputation
                        • Subpopulation Analyses
                          • Acknowledgment
                          • Appendices
                          • Appendix Full Model
                          • Appendix Demographic Model
                          • Appendix Reduced Model
Page 23: Evaluating E ect of Re ex R Math Fact Fluency in Grades 2 & 3...Evaluating E ect of Re ex R on Math Fact Fluency in Grades 2 & 3 David I. Rudel March 13, 2017 1 Study Characteristics

References

Appendix A: Full Model

The table below reports the fixed-effects estimates of the full HLM for the analytic sample. This model uses grand-mean-centered values for all level-1 variables, scaled to unit variance. These variables are prefixed with "c" to indicate this.

Factor                    Coefficient    t-score

Intercept                    4.709***     25.719
cage                         0.136         0.887
cLEP                        −0.015        −0.111
cLunch                      −0.122        −0.981
cprescore                   −2.220        −0.932
cgender                     −0.148        −1.518
cESE                         0.054         0.272
cprespeed                    2.471         1.177
cpreaccuracy                 0.750         1.313
treatment                    0.927***      5.753
grade3                       0.695***      3.335
cagey:treatment              0.076         0.469
cagey:grade3                −0.224        −1.134
cLEP:treatment              −0.021        −0.123
cLEP:grade3                 −0.027        −0.156
cLunch:treatment             0.195         1.440
cLunch:grade3                0.174         1.269
cprescore:treatment          1.881         1.056
cprescore:grade3             2.097         0.885
cgender:treatment            0.096         0.836
cgender:grade3               0.180         1.564
cESE:treatment              −0.036        −0.204
cESE:grade3                 −0.013        −0.073
cprespeed:treatment         −1.327        −0.849
cprespeed:grade3            −1.581        −0.753
cpreaccuracy:treatment      −0.712*       −1.657
cpreaccuracy:grade3         −0.608        −1.101

Note: *p<0.1; **p<0.05; ***p<0.001
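The centering and scaling applied to the level-1 variables can be sketched in a few lines. This is a minimal illustration and not the study's own code: the function name `center_and_scale`, the example ages, and the use of the sample standard deviation over the whole sample are assumptions made here.

```python
from statistics import mean, stdev


def center_and_scale(values):
    """Grand-mean-center a level-1 variable and scale it to unit variance.

    Assumption: "scaled" means dividing by the sample standard deviation
    computed over all students, not within each classroom.
    """
    grand_mean = mean(values)
    spread = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - grand_mean) / spread for v in values]


# Hypothetical ages in years for four students (not study data).
ages = [7.5, 8.0, 8.5, 9.0]
c_age = center_and_scale(ages)
print([round(z, 3) for z in c_age])
```

With covariates transformed this way, and if treatment and grade3 are 0/1 indicators, the intercept can be read as the expected outcome for a comparison-group grade 2 student who sits at the grand mean of every covariate.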

Appendix B: Demographic Model

The table below reports the fixed-effects estimates of the full HLM for the analytic sample, but excludes the pretest features of speed and accuracy. This model uses grand-mean-centered values for all level-1 variables, scaled to unit variance. These variables are prefixed with "c" to indicate this.

Factor                    Coefficient    t-score

Intercept                    4.906***     27.157
cage                         0.230         1.554
cLEP                         0.042         0.315
cLunch                      −0.128        −1.225
cprescore                    0.756***      5.872
cgender                     −0.124        −1.283
cESE                         0.172         0.855
treatment                    0.836***      4.966
grade3                       0.480**       2.235
cagey:treatment             −0.003        −0.022
cagey:grade3                −0.270        −1.378
cLEP:treatment              −0.059        −0.346
cLEP:grade3                 −0.071        −0.408
cLunch:treatment             0.313**       2.631
cLunch:grade3                0.174         1.459
cprescore:treatment          0.003         0.020
cprescore:grade3             0.094         0.685
cgender:treatment            0.122         1.062
cgender:grade3               0.120         1.039
cESE:treatment              −0.131        −0.734
cESE:grade3                 −0.117        −0.642

Note: *p<0.1; **p<0.05; ***p<0.001
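The star notation in the note can be related to the reported t-scores with a short sketch. How the p-values were actually computed is not stated in this appendix, so the standard normal approximation used below is an assumption, as are the function names `two_sided_p` and `stars`. A t reference distribution with finite degrees of freedom would give somewhat larger p-values.

```python
import math


def two_sided_p(t_score):
    """Two-sided p-value for a t-score under a standard normal approximation."""
    return math.erfc(abs(t_score) / math.sqrt(2.0))


def stars(t_score):
    """Map a t-score to the star notation given in the table note."""
    p = two_sided_p(t_score)
    if p < 0.001:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""


# t-scores read from the Appendix B table (three-decimal values).
examples = [
    ("cprescore", 5.872),
    ("grade3", 2.235),
    ("cLunch:treatment", 2.631),
    ("cage", 1.554),
]
for name, t in examples:
    print(name, round(two_sided_p(t), 4), stars(t))
```

Under this approximation the four example rows receive the same stars as in the table, which suggests the thresholds in the note are applied to two-sided p-values.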

Appendix C: Reduced Model

The table below reports the fixed-effects estimates of the HLM for the analytic sample retaining only age. This model uses grand-mean-centered values for all level-1 variables, scaled to unit variance. These variables are prefixed with "c" to indicate this.

This model used restricted maximum likelihood (REML), as there were convergence problems when using maximum likelihood.

Factor                    Coefficient    t-score

Intercept                    4.827***     25.683
cage                         0.227         1.146
cLunch                      −0.111        −0.952
cprescore                    0.758***      5.488
treatment                    0.867***      4.343
grade3                       0.527**       2.219
cagey:treatment             −0.031        −0.154
cagey:grade3                −0.168        −0.692
cLunch:treatment             0.268*        1.936
cLunch:grade3                0.153         1.109
cprescore:treatment         −0.042        −0.286
cprescore:grade3             0.100         0.677

Note: *p<0.1; **p<0.05; ***p<0.001
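A fixed-effects table of this kind can be read as a prediction formula. The sketch below uses the Appendix C coefficients, read as three-decimal values, under several assumptions that are not confirmed by the text: treatment and grade3 are 0/1 indicators, the "cagey" interaction rows involve the same centered age variable as "cage", and random effects are set to zero. The function name `predict` is hypothetical, and the result is on whatever outcome scale the HLM used.

```python
# Fixed-effect estimates read from the Appendix C table (three-decimal values).
COEF = {
    "Intercept": 4.827,
    "cage": 0.227,
    "cLunch": -0.111,
    "cprescore": 0.758,
    "treatment": 0.867,
    "grade3": 0.527,
    "cagey:treatment": -0.031,
    "cagey:grade3": -0.168,
    "cLunch:treatment": 0.268,
    "cLunch:grade3": 0.153,
    "cprescore:treatment": -0.042,
    "cprescore:grade3": 0.100,
}


def predict(treatment, grade3, cage=0.0, cLunch=0.0, cprescore=0.0):
    """Fixed-effects prediction; covariates are centered and scaled values."""
    total = COEF["Intercept"]
    total += COEF["treatment"] * treatment + COEF["grade3"] * grade3
    covariates = {"cage": cage, "cLunch": cLunch, "cprescore": cprescore}
    for name, value in covariates.items():
        # Assumption: the "cagey" interaction rows use the same age variable.
        prefix = "cagey" if name == "cage" else name
        total += COEF[name] * value
        total += COEF[prefix + ":treatment"] * value * treatment
        total += COEF[prefix + ":grade3"] * value * grade3
    return total


# Grade 2 students at the grand mean of every covariate.
control = predict(treatment=0, grade3=0)
treated = predict(treatment=1, grade3=0)
print(round(control, 3), round(treated, 3), round(treated - control, 3))
```

Because every covariate is centered, the difference between the two predictions for an average student is simply the treatment coefficient. For students away from the mean, the interaction rows shift that difference.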

Appendix B: Sample Addition/Subtraction Probe

  9      5     18      9      8     13
+ 8    + 9    −10    − 6    + 3    − 5
___    ___    ___    ___    ___    ___

 10      2      3     10     12      9
− 3    + 7    + 8    + 1    −10    − 1
___    ___    ___    ___    ___    ___

  5      8      3     19      7     16      3
+ 0    + 4    + 6    − 9    − 1    −10    + 0
___    ___    ___    ___    ___    ___    ___

  3      7     15      0      4     14      7
− 1    + 5    − 5    + 9    + 3    − 5    − 5
___    ___    ___    ___    ___    ___    ___

  9     20     11      4      9      6      1
+ 9    −10    − 3    − 4    + 0    − 1    +10
___    ___    ___    ___    ___    ___    ___

  9     12     12      2      5      9      5
+10    − 3    − 9    + 8    − 0    − 4    + 0
___    ___    ___    ___    ___    ___    ___

  4     14      7     11      7      4      6
+ 4    − 9    − 0    − 8    + 0    − 1    + 5
___    ___    ___    ___    ___    ___    ___

  2     12     14      4     10      1      7
+ 3    − 5    − 5    − 4    +10    + 0    + 2
___    ___    ___    ___    ___    ___    ___

 13     10      3      9     17     10      3
− 6    +10    + 6    − 6    − 7    +10    + 6
___    ___    ___    ___    ___    ___    ___

  4     10     10      3      5      5     10
+ 9    + 2    +10    − 0    + 3    − 5    −10
___    ___    ___    ___    ___    ___    ___
