Evaluating E ect of Re ex R Math Fact Fluency in Grades 2 & 3...Evaluating E ect of Re ex R on Math Fact Fluency in Grades 2 & 3 David I. Rudel March 13, 2017 1 Study Characteristics

Post on 28-Feb-2021

2 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

Transcript

Evaluating Effect of Reflex Rcopy onMath Fact Fluency in Grades 2 amp 3

David I Rudellowast

March 13 2017

1 Study Characteristics

11 Intervention Condition

Reflex is an online game-based system for developing math fact fluency inschoolchildren It is provided by ExploreLearning a division of Cambium-Learning Reflex maintains an internal student model to facilitate adaptiveinstruction and individualized practice on math facts It uses a fact-familyapproach teaching groups of related facts together For example a studentmay receive coaching on 2+6 6+2 8-2 and 8-6 on the same day A studentrsquosdaily work in Reflex generally comprises 4 phases

1 An assessment component monitoring progress posed in a game envi-ronment that minimizes distraction

2 A coaching session where the student learns a new set of related factsor receives remedial work on a previously learned set

3 A practice game combining newly learned facts with facts the studentis developing

4 Intense practice under time pressure on facts the student has demon-strated at least partial fluency

lowastSenior Principal Data ScientistmdashExploreLearning

1

The assessment component uses a combination of several games someof which present facts aligned vertically while others present facts alignedhorizontally The coaching session uses a cover-copy-compare strategy tointroduce facts followed by a fill-in-the-blank session where the student com-pletes an open fact sentence with one or two missing terms The third com-ponent uses horizontally aligned facts and provides interactive feedback tomissed facts The intense practice component differs from the rest in thatthe student is given multiple facts and chooses one to answer This choiceprovides agency to the student as it affects outcomes in the game (eg thefact chosen determines which direction an on-screen character moves)

Reflex has individualized practice recommendations The median totaltime in the system for second and third graders to complete these recommen-dations is 15-16 minutes per day with earlier days generally taking longerthan later ones Students do not always meet the daily practice target dueto lack of time or limited technological resources Once the recommendedpractice is complete for a day an on-screen indicator illuminates and thestudent is allowed to spend time on non-practice motivational aspects of thesystem such as using tokens to buy new clothes for his avatar

Reflex has been sold commercially since 2011 It is delivered on an annualsubscription basis to thousands of schools A time-limited free trial is avail-able and interested teachers can apply for grants providing free access forone year Subscriptions are sold at teacher- site- and district-wide levels

Participating teachers assigned to the intervention condition undertook astandard 90-minute training webinar acquainting them with the system andbest practices Approximately 50 of all new Reflex subscriptions includedsuch training in spring 2016

Students use Reflex directly no teacher involvement occurs within a Re-flex session Teachers support students indirectly by encouraging studentsand cultivating their enthusiasm including the distribution of milestone cer-tificates provided by the system Teachers also of course need to scheduletime for students to play Reflex and supervise student usage Reflex providesteachers reports showing progress and usage of each student

Reflex provides three options for the pool of facts a student learns

bull Addition and Subtraction 0-10 Addition facts whose terms are within0-10 and their associated subtraction facts

bull Multiplication and Division 0-10 Multiplication facts whose factorsare in the range 0-10 and their associated division facts

2

bull Multiplication and Division 0-12 Multiplication facts whose factorsare in the range 0-12 and their associated division facts

Students assigned to the intervention condition began in the addition subtraction assignment if they were in second grade and in the multiplication division 0-10 assignment if they were in third grade Teachers had theability to switch students on an individual basis to other assignments at theirown discretion Sixteen of the 37 second grader using Reflex were switchedinto multiplicationdivision before the posttest Thus some of their timespent in Reflex was dedicated to above-grade-level items that were not onthe posttest

The recommended usage for Reflex is 3 days per week The four teachersachieved weekly usages of 26 33 34 and 15 These values include all dayson which a login was made even if the student was practicing facts outsidethe range of testing

The average usage across all students was 27 daysweekThe median time spent in Reflex during the studyrsquos was 59 minutes a

week which includes time spent in non-instructional aspects of the systemsuch as browsing an in-product store to buy virtual items using tokens earnedin games or cases where a student logged in from home and forgot to log off

Reflex requires individual accounts with individual passwords A user inthe comparison group could only have used Reflex by logging into the accountof another student

Post-survey questionnaires were given to all teachers Two teachers fromthe intervention condition returned questionnaires both indicating they re-lied on Reflex as their primary means of developing math fact fluency duringthe course of the study

12 Comparison Condition

This study used a business as usual comparison condition Math fluency ingeneral and math fact fluency in particular are required by the Florida MathStandards and Common Core State Standards for grades 2 and 3 FloridaMath Standard MAFS2OA22 and Common Core State Standard 2MD2have identical wordings ldquoFluently add and subtract within 20 using mentalstrategies By end of Grade 2 know from memory all sums of two one-digitnumbersrdquo Similarly Florida Math Standard MAFSOAC7 and CommonCore State Standard3OAC7 read ldquoFluently multiply and divide within 100

3

using strategies such as the relationship between multiplication and division(eg knowing that 8 5 = 40 one knows 40 5 = 8) or properties of operationsBy the end of Grade 3 know from memory all products of two one-digitnumbersrdquo

Additionally Common Core State Standards specify a number of gen-eral computational fluency requirements for which facility with math factsare foundational (Standards 2NBTB5 2NBTB6 2NBTB7 3OAA3NBTA2) Floridarsquos standards retain these requirements

Post-survey questionnaires were given to all teachers Teachers in thecomparison condition were asked to describe methods they used to developmath fact fluency and the time they spent on this goal Two of the fourteachers in the comparison condition returned these questionnaires Theircomments are provided below verbatim We have also included data on theaverage fluency gain for each comparison class including the two that didnot return questionnaires

The survey asked teachers how many hours a month were spent on de-veloping math fact fluency One teacher specified her answer in terms ofminutes per day The wrote ldquo20 hoursrdquo in the blank

Table 1 Post-Study Comparison Group Responses

Grade Average Strategies Time SpentGain (Hours per month)

3 088 (Did not return survey) NA

3 081 flash cards timed tests repeti-tion math fact raps

20 hours

2 053 (Did not return survey) NA

2 -001 ten marks flash cards fast factscenter work

(time everyday)10 minutes

Given the average fluency gains we surmise that the other two teacherslikely spent considerably more than 10 minutes a day on math fact fluencyThe grade 3 responder had a group of high-achieving students so it is possiblehomework was assigned on math fact fluency as it is hard to imagine that20 hours of class time a month was spent on the topic

4

13 Setting

Teachers from a Florida school in a metropolitan area participated in thisstudy The demographic data provided by the school indicate it is a majority-minority school 57 of its second- and third-grade students are Hispanic orLatino and 31 are Caucasian The data provided indicate that 28 havelow English proficiency and 17 are on free or reduced lunch

14 Participants

The participating students are generally demographically similar to the fullpopulation of second- and third-grade students in terms of exceptional stu-dent status race gender and economic status In all cases we relied oninformation received from the school

2 Study Design and Analysis

21 Sample Formation

The school was identified by project personnel owing to its previous interestin Reflex The school was offered a discount on a later subscription in ex-change for participation After logistical discussions to ensure that the schoolhad sufficient technical resources to allow usage of a computer-delivered in-tervention teachers were asked to volunteer for participation Nine teachersinitially volunteered to have their homeroom students take part One of thesehomeroom classes was taught by another teacher who also taught her ownhomeroom so the 9 classes were taught by 8 teachers

The study was intended as a cluster random control trial with the teach-ers from each grade randomly assigned to condition Unfortunately thedesign was compromised across grade 3 teachers One teacher assigned tothe comparison did not participate at allmdashproject personnel did not admin-ister pretests Another teacher assigned to the treatment never used theintervention There was zero uptake across her entire class Review of emailexchanges suggest three possible causes

bull The liaison between the head researcher and the school may have mis-represented the constraints of the study to the school He reports thatthe school may have thought that an even number of teachers wererequired

5

bull Two giftedhigh-achieving classes participated in the study They wereboth inadvertently randomly assigned to the intervention It was ourintention to split these through block randomization but we only re-ceived the pertinent data after selection and due to a misreading ofthe correspondence failed to catch the error so no re-assignment wasdone The school may have rectified our error themselves

bull It is possible that one of the teachers simply did not want to use theintervention Project personnel doing the training reported that sheattended but ldquohad to leave early onrdquo

Given the above we our analyzing our study as a QED where the intactgroups are the 8 classes for whom we have pretest data and the interventiongroup comprises those classes where any uptake occurred prior to posttest

Teachers were provided the opportunity to indicate any students whowere not prepared for fact fluency instruction Four third-grade studentswere identified 3 from the intervention group and 1 from the comparisongroup These studentsrsquo data were not considered as part of the study

One of the teachers taught two classes one within the intervention groupand another in the comparison group All other teachers taught a singleclass

Group Descriptions

Table 2 provides a description of the demographic character of the groupsas well as their pretest scores results The fluency score on the pretest com-bines both speed and accuracy as described in the Fluency Score Calculationsubsection

22 Outcome Measures

221 Outcomes

One outcome were measured in the study math fact fluency which is botha key component of general math achievement and has been shown to bepredictive of studentsrsquo performance on general math achievement tests (seeValidity subsection below) Fluency was measured using timed probes

6

Table 2 Baseline Demographic Information

Full Sample Comparison Group Intervention Group

Sample Size 129 64 65Grade 3 Students 481 484 477 Hispanic 542 531 553 Asian 171 188 154 White 225 219 231 Black 47 62 31 Multiracial 16 00 3 Low English Proficiency 202 250 154 Exceptional Student (Gifted) 256 25 261 FreeReduced Lunch 202 219 184 Male 465 531 40age-at-pretest (years) 843 844 841pre-test Accuracy 923 932 914pre-test Speed 429 426 431pre-test Score 458 457 458

bull Grade 2 students were testing on facts with terms minuend and sub-trahends from 0 to 10 inclusive (ie from 0 + 0 up to 10 + 10 and theirassociated subtraction facts)

bull Grade 3 students were tested on facts with factors divisors and quo-tients from 0 to 10 inclusive (ie from 0 times 0 to 10 times 10 and theirassociated division facts)

These match the requirements in the Common Core State Standards ex-cept that owing to that documentrsquos idiosyncratic definition of ldquowithin Xrdquo(as in rdquoaddition within 20rdquo) a literal reading of the work indicates that factssuch as 20 minus 17 and 91 divide 13 are considered within grade level The FloridaMath Standards do not provide a glossary so it is unclear whether such factswould be in the scope of the wording of its standards

222 Probes

Probes had a format similar to those in other Curriculum Based Measurement(CBM) studies (Hintze Christ amp Keller 2002 Burns VanDerHeyden amp Jiban2006 Stevens amp Leigh 2012) as described below

Each probe was a single-sheet of paper with 10 rows of vertically orientedproblems Probes given to grade 2 students contained addition and subtrac-tion facts Probes given to grade 3 students contained multiplication and

7

division facts The problems were printed in extra large type so only 7 factsfit on each row The first two rows only contained 6 facts to make room fora geometric shape placed in the upper-righthand corner to help students andmonitors quickly identify which page the students were on The problemswere computer-generated with the constraint that the problems in a givenrow be as balanced as possible between the two operations The facts werechosen randomly from the appropriate fact pool with each having an identicalselection likelihood

An example is provided in the Appendix

223 Administrations

Three administrations were given A pretest administration was conductedon February 12th 2016 An interim administration was conducted on April14th timed to occur before heavy preparation for end-of-year testing beganA final administration was conducted on May 24th Students were told toanswer the items in order and not to skip items The administrator used ascript and was witnessed by the classroom teacher who used a checklist toconfirm each of several key points of instruction This form also providedspace for indicating any unusual occurrences

The first and second administrations each comprised 4 one-minute factfluency probes Students were instructed that the first probe was a warm-upin each case The final administration did not have a warm-up probe Itcontained 3 math fact probes

Grade 2 students also took a multi-digit computation probe but the re-sults of that probe were not analyzed as part of this combined report forthird grade students did not take a multidigit probe Multidigit multipli-cationdivision is not a core topic for third grade students in Florida andthe distribution of scores on the multi-digit additionsubtraction probe wereknown to be fundamentally different from the distribution of scores on mathfact probes so there is no clear way to combine the two

All students in a given grade took the same probes using the same ad-ministrative script regardless of condition The probes that were describedas ldquowarmuprdquo tests were not counted in any analysis

Five studentsmdashall in comparison classesmdashwere noted by test administra-tors as working on their quizzes significantly beyond the called time limitThese students were not formally considered part of the study Posttestswere taken by these students Three of the five students scored higher on

8

their posttest than on their pretest

224 Fluency Score Calculation

For each student raw fluency scores were calculated as the average numberof digits correct per min (dcmin) minus the number of digits incorrect permin (dimin) as this was the method found by Stevens amp Leigh (2012) tohave the greatest criterion validity

Previous CBM researchers have combined grade 2 and grade 3 students(Burns et al 2006) but to justify the pooling of their outcomes in a singleanalysis we conducted an analysis of the distribution of raw pretest scoresfor each grade separately to show similarity of distribution

Table 3 Raw Fluency Pretest Score Distributions by Grade

Measure Grade 2 Grade 3

Mean 2026 2027Standard Deviation 1053 1135Median 19 19Kurtosis 237 191Skewness 117 101Range 5467 5833Optimal Box-Cox (anchored at 1) λ 050 056

A Kolmogorov-Smirnov corroborated the premise that these two distri-butions were quite similar It failed to reject homogeneity (critical D-statwas 0233 calculated D-stat was 0063 p-value = 099)

The distribution of these raw scores were significantly skewed and lep-tokurtic as has been reported in similar studies (Burns et al 2006) so wenormalized them using a Box-Cox transformation to arrive at a final fluencyscore Following the recommendation of Osborne (2005) we anchored the fulldistribution at a minimum value of 1 by adding 2 to all raw fluency scoresA search for an optimum λ returned 0525 so we chose λ = 05 for simplicity

of inversion Thus the calculation for final score isradic

(C minus I + 2) where Cis the average digits correct per min and I is the average digits incorrect permin The resulting distribution of pretest scores was not significantly skewed(skew = 008 SES 021) but was still slightly leptokurtic (Kurtosis = 085SEK = 042) DAgostino-Pearson (p-value = 013) and Jarque-Barre tests(p-value = 013) failed to reject normality

9

23 Validity

The criterion validity for CBM based measures in elementary math hasbeen established by Stevens amp Leigh (2012) and VanDerHeyden amp Burns(2008) These studies showed math fact fluency was predictive of generalmath achievement on the Oklahoma Core Curriculum test and StanfordAchievement Test respectively

24 Reliability

Several researchers have confirmed the reliability of CBM for math fluency

Table 4 Previous Research on CBM Reliability for Math Fluency

Metric Scoring Method Source Value

Inter-scorer Agreement Correct Digits per Minute (Burns et al 2006) 096+Inter-scorer Agreement Correct Digits per Minute (Hintze et al 2002) 0955Inter-scorer Agreement Correct Digits per Minute (Stevens amp Leigh 2012) 099+

minus Incorrect Digitsper Minute

Delayed Alternate-form Correct Digits per Minute (Burns et al 2006) 084ReliabilityAbsolute Generalizability Correct Digits per Minute (Hintze et al 2002) 075Relative Generalizability Correct Digits per Minute (Hintze et al 2002) 095Test-Retest Alternate Correct Digits per minute (Stevens amp Leigh 2012) 087Form Reliability minus Incorrect Digits

per Minute

Our study gave 3 separate fact probes on the same day allowing us tomeasure internal consistency of raw fluency score (correct digits minus incor-rect digits) using Cronbachrsquos α The α values across the six test are describedin Table 5

Table 5 Internal Consistency of Raw Fluency Score

AdditionSubtraction MultiplicationDivision

Pretest 095 094Interim Test 096 094Posttest 097 095

10

We also calculated delayed alternate-form reliability of the final fluencyscore across each grade times condition cohort and found an average value of071

Table 6 Delayed Alternate-Form Reliability (14 weeks)

AdditionSubtraction MultiplicationDivision

Intervention 077 047Comparison 072 089

The relatively poor value for the 3rd grade intervention group may bedue to large variation in dosage The standard variation in weekly usageacross 3rd grade intervention groups was 124 daysweek nearly twice thatof the 2nd grade intervention group where the standard deviation was 065daysweek

When dosage was added to the model predicting posttest score frompretest score the agreement between the two intervention groups improvedconsiderably The coefficients of multiple correlation were R = 081 andR = 077 for the 2nd and 3rd grade intervention groups respectively

25 Analytic Approach

Since randomized assignment occurred at the class level we used an HLMmodeling approach to account for cluster effects when analyzing the rela-tionship between condition and posttest fluency The model has two levelsmdashgrade and condition are level-2 variables and all other covariates are level-1variables We used grand-mean-centered values for the lower level variablesand a maximum-likelihood method for determining the random effects Ifthe search for a model did not converge using maximum likelihood restrictedmaximum likelihood was used instead

Models were constructed using Rrsquos lmer function part of the lme4 libraryusing the methodology for two-tier HLM models documented in a technicalreport from the Department of Statistics and Data Sciences The Universityof Texas at Austin (UTA 2015) which showed the similarity in results tothose given by SPSS SAS Mplus and HLM

We formed 3 models of decreasing complexity and calculated an effectsize and statistical significance based on each

11

The first model uses the same structure as that used in the original ver-sion of this report In this model all dichotomous and numeric covariateswere used (ie all covariates other than race which was polynominal) in-cluding the pretest accuracy and pretest speed This model is most inclusiveand allows for continuity between the original version of this report and thecurrent version It is denoted as the Full Model

For the data available at the time of the original report the pretest speedand pretest accuracy were both highly significant (p lt 0001) But afterremoving students who did not respect the time limits on the pretest orwere designated as being below grade level before the study began theseadditional pretest features were no longer statistically significant A nestedmodel χ-squared test comparing change in deviance to change in degrees offreedom did not show a statistically significant improvement upon addingeither of these terms Thus we generated a new model lacking these twopretest features but retaining all the demographic covariates of the originalThis model is denoted in the sequel as the Demographic Model

In an effort to simplify the model further we assessed the relevance ofeach of the demographic variables by generating a HLM with the followingcharacteristics

bull No Level-2 variables

bull Two Level-1 variables the covariate in question and pretest score

bull Group-mean-centered values

bull Data scaled to be univariate

This method was selected for determining the relevance of a given level-1factor based on Woltman Feldstain MacKay amp Rocchirsquos (2012) presentationThe results are shown in Table 7 Note that this was the only analysis usinggroup-mean centered data The modelrsquos used for determining interventioneffect and statistical significance used grand-mean centered level-1 variables

The results of this analysis are shown in Table 7 Given their very lowcoefficients and t-scores we removed gender and ESE Upon forming the fullHLM using the remaining covariates it was found that LEP had very littleimpact (coefficient = 003) and low significance (t = 024) so it was droppedas well In the resulting model all covariates had t-scores greater than 09

12

Table 7 Impact and significance of demographic covariates

Covariate Coefficient t-score

age 0028 0506gender -0005 -0096LEP 0042 0612Lunch 0080 1249ESE -0006 -0100

in magnitude and standardized coefficients greater than 01 There was anearly statistically significant interaction (t = 194) between condition andwhether the student was on free or reduced lunch

This final model is denoted as the Reduced ModelAll three models are provided in the AppendixEffect sizes were calculated from the coefficient for the intervention effect

from each HLM-model and the pooled-within-group standard deviation ofunadjusted post-test scores

Statistical significance was determined based on the t-score of the multi-level model

26 Statistical Adjustments

We used all demographic information provided except race which was non-binary and correlated significantly with other demographic information (Rbetween 036 and 046 for the three most prevalent races in our sample)

Grade was coded as grade3 a variable equal to 1 if the student was ingrade 3 and 0 otherwise

Age was measured in years as of the pretest administrationGender was coded as a variable male equal to 1 if the student was male

and 0 if the student was femaleLow-English proficiency was coded as a variable LEP equal to 1 if school

indicated the student had low English proficiencyExceptional Student Status was determined based on the schoolrsquos des-

ignation of the student as being within an Exceptional Student Educationprogram It was coded as a variable ESE equal to 1 if the school specifiedthe student as belonging to an ESE program The state of Florida specifiesseveral ESE programs one of which is a program for gifted students Forour study it appears this program furnished the large majority of ESE des-ignations as 29 of the 36 students designated as ESE were concentrated in

13

two high-achieving classes In grade 2 every ESE-designated student was ina single class

Eligibility for free or reduced lunch was coded as a variable lunch equalto 1 if the student was eligible

As described in the Fluency Score Calculation subsection fluency wasevaluated based on research-supported combination of speed and accuracynormalized to reduce skewness via a Box-Cox transformation This meansthat a studentrsquos fluency score depends on personal characteristics such asconfidence sense of urgency on a pen-and-paper test and attention to accu-racy so students differ markedly in potential for improvement

Pretest accuracy is the ratio of correct digits to the sum of correct andincorrect digits

Pretest score is defined asradicC minus I + 2 where C is digits correct per

minute and I is digits incorrect per minutePretest speed is defined in a manner analogous to pretest score

radicC minus 2

where C is digits correct per minute In this expression 2 is subtractedrather than added so that the expression is anchored at 1 conforming tobest practices (Osborne 2005)

All student-level covariates were scaled to be univariate and grand-meancentered for improved interpretability and model convergence

Speed score and accuracy on the interim administration were consideredduring the regression process used to impute missing data as described inthe Missing Data section These metrics are calculated exactly as for thepretest using the same formula (ie the data was not re-anchored for theBox-Cox transformation)

An HLM model was used to calculate statistical significance for the entiresample so no adjustment for cluster effects were necessary We only analyzedone outcome for this study so no adjustment was made for multiple outcomes

27 Students Removed from Study

Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

Four of these ten (3 from intervention 1 from comparison) were excludedbecause their teacher indicated they were sufficiently below grade level that

14

Table 8 Descriptive Statistics of Control Variables

Control Variable Mean SD Skew Kurtosis

Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

Five of these ten (all from comparison) were excluded because they didnot stop when time was called on the pretest In three cases these studentshad higher values on their pretest than on their posttest

One grade 2 student from the Intervention condition was noted as ap-pearing frustrated and not working on the pretest He had the fourth lowestfluency score of all 2nd-grade participants on the pretest and showed dra-matic improvement by the interim assessment on which he scored at the 33rdpercentile within his grade According to Reflexrsquo internal initial testing thestudent had pre-existing automaticity for 171 of the addition facts within20 and had basic recall ability with 599 This suggests his pretest scoreunder-estimated his actual ability and he was removed from the analysis forfear of artificially inflating the impact of the intervention Note that thisstudent was absent from the final administration

28 Missing Data

Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

15

established by Burns et al (2006)

Table 9 Categorization of Students

Fluency (dcmin) Category N

Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

281 Frustration Level

One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

282 Instructional Level

Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

29 Mastery Level

There were no students in the mastery level for whom imputation was nec-essary

16

3 Study Data

Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

The tables in this section report unscaled uncentered values for ease ofinterpretability

31 Pre-Intervention DatamdashAll Pretest Takers

This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

Outcome Data

Measure Comparison Group Intervention Group

Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

Fluency Score 4 70 444 117 4 70 449 126

Background Data

VariableComparison Intervention

Mean SD Mean SD

Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

17

32 Pre-Intervention DatamdashBaseline Sample

This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

Outcome Data

Measure Comparison Group Intervention Group

Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

Fluency Score 4 64 457 112 4 66 460 120

Background Data

VariableComparison Intervention

Mean SD Mean SD

Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

18

33 Pre-intervention Data Analytic Sample

Outcome DatamdashAnalytic Sample

Measure Comparison Group Intervention Group

Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

Fact Fluency 4 64 4573 1121 4 65 4580 1195

Background DatamdashAnalytic Sample

VariableComparison Intervention

Mean SD Mean SD

Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

Outcome DatamdashAnalytic Sample with No Imputation

Measure Comparison Group Intervention Group

Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

Fact Fluency 4 61 4557 1143 4 61 4640 1142

19

Background DatamdashAnalytic Sample with No Imputation

VariableComparison Intervention

Mean SD Mean SD

Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

34 Post-intervention Data and Findings

341 Analytic Sample

As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

4Nminus9for small

effect size

20

Estimation of Effect SizemdashAnalytic Sample

Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

342 Analytic Sample with No Imputation

Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

Model Comparison Group Intervention Group Estimated Effect

Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

Estimation of Effect SizemdashAnalytic Sample with No Imputation

Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

21

35 Subpopulation Analyses

We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

Difference Within-Group SD (adj Hedgesrsquo g) t-score

Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

4 Acknowledgment

We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

22

References

Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

23

Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

24

Appendix A Full Model

The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

Factor Coefficient t-score

Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

25

Appendix B Demographic Model

The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

Factor Coefficient t-score

Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

26

Appendix C Reduced Model

The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

Factor Coefficient t-score

Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

27

9 5 18 9 8 13

+ 8 + 9 minus10 minus 6 + 3 minus 5

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

10 2 3 10 12 9

minus 3 + 7 + 8 + 1 minus10 minus 1

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

5 8 3 19 7 16 3

+ 0 + 4 + 6 minus 9 minus 1 minus10 + 0

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

3 7 15 0 4 14 7

minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

9 20 11 4 9 6 1

+ 9 minus10 minus 3 minus 4 + 0 minus 1 +10

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

9 12 12 2 5 9 5

+10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

4 14 7 11 7 4 6

+ 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

2 12 14 4 10 1 7

+ 3 minus 5 minus 5 minus 4 +10 + 0 + 2

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

13 10 3 9 17 10 3

minus 6 +10 + 6 minus 6 minus 7 +10 + 6

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

4 10 10 3 5 5 10

+ 9 + 2 +10 minus 0 + 3 minus 5 minus10

macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

Appendix B Sample AdditionSubtraction Probe

  • Study Characteristics
    • Intervention Condition
    • Comparison Condition
    • Setting
    • Participants
      • Study Design and Analysis
        • Sample Formation
        • Outcome Measures
          • Outcomes
          • Probes
          • Administrations
          • Fluency Score Calculation
            • Validity
            • Reliability
            • Analytic Approach
            • Statistical Adjustments
            • Students Removed from Study
            • Missing Data
              • Frustration Level
              • Instructional Level
                • Mastery Level
                  • Study Data
                    • Pre-Intervention DatamdashAll Pretest Takers
                    • Pre-Intervention DatamdashBaseline Sample
                    • Pre-intervention Data Analytic Sample
                    • Post-intervention Data and Findings
                      • Analytic Sample
                      • Analytic Sample with No Imputation
                        • Subpopulation Analyses
                          • Acknowledgment
                          • Appendices
                          • Appendix Full Model
                          • Appendix Demographic Model
                          • Appendix Reduced Model

    The assessment component uses a combination of several games someof which present facts aligned vertically while others present facts alignedhorizontally The coaching session uses a cover-copy-compare strategy tointroduce facts followed by a fill-in-the-blank session where the student com-pletes an open fact sentence with one or two missing terms The third com-ponent uses horizontally aligned facts and provides interactive feedback tomissed facts The intense practice component differs from the rest in thatthe student is given multiple facts and chooses one to answer This choiceprovides agency to the student as it affects outcomes in the game (eg thefact chosen determines which direction an on-screen character moves)

    Reflex has individualized practice recommendations The median totaltime in the system for second and third graders to complete these recommen-dations is 15-16 minutes per day with earlier days generally taking longerthan later ones Students do not always meet the daily practice target dueto lack of time or limited technological resources Once the recommendedpractice is complete for a day an on-screen indicator illuminates and thestudent is allowed to spend time on non-practice motivational aspects of thesystem such as using tokens to buy new clothes for his avatar

    Reflex has been sold commercially since 2011 It is delivered on an annualsubscription basis to thousands of schools A time-limited free trial is avail-able and interested teachers can apply for grants providing free access forone year Subscriptions are sold at teacher- site- and district-wide levels

    Participating teachers assigned to the intervention condition undertook astandard 90-minute training webinar acquainting them with the system andbest practices Approximately 50 of all new Reflex subscriptions includedsuch training in spring 2016

    Students use Reflex directly no teacher involvement occurs within a Re-flex session Teachers support students indirectly by encouraging studentsand cultivating their enthusiasm including the distribution of milestone cer-tificates provided by the system Teachers also of course need to scheduletime for students to play Reflex and supervise student usage Reflex providesteachers reports showing progress and usage of each student

    Reflex provides three options for the pool of facts a student learns

    bull Addition and Subtraction 0-10 Addition facts whose terms are within0-10 and their associated subtraction facts

    bull Multiplication and Division 0-10 Multiplication facts whose factorsare in the range 0-10 and their associated division facts

    2

    bull Multiplication and Division 0-12 Multiplication facts whose factorsare in the range 0-12 and their associated division facts

    Students assigned to the intervention condition began in the addition subtraction assignment if they were in second grade and in the multiplication division 0-10 assignment if they were in third grade Teachers had theability to switch students on an individual basis to other assignments at theirown discretion Sixteen of the 37 second grader using Reflex were switchedinto multiplicationdivision before the posttest Thus some of their timespent in Reflex was dedicated to above-grade-level items that were not onthe posttest

    The recommended usage for Reflex is 3 days per week The four teachersachieved weekly usages of 26 33 34 and 15 These values include all dayson which a login was made even if the student was practicing facts outsidethe range of testing

    The average usage across all students was 27 daysweekThe median time spent in Reflex during the studyrsquos was 59 minutes a

    week which includes time spent in non-instructional aspects of the systemsuch as browsing an in-product store to buy virtual items using tokens earnedin games or cases where a student logged in from home and forgot to log off

    Reflex requires individual accounts with individual passwords A user inthe comparison group could only have used Reflex by logging into the accountof another student

    Post-survey questionnaires were given to all teachers Two teachers fromthe intervention condition returned questionnaires both indicating they re-lied on Reflex as their primary means of developing math fact fluency duringthe course of the study

    12 Comparison Condition

    This study used a business as usual comparison condition Math fluency ingeneral and math fact fluency in particular are required by the Florida MathStandards and Common Core State Standards for grades 2 and 3 FloridaMath Standard MAFS2OA22 and Common Core State Standard 2MD2have identical wordings ldquoFluently add and subtract within 20 using mentalstrategies By end of Grade 2 know from memory all sums of two one-digitnumbersrdquo Similarly Florida Math Standard MAFSOAC7 and CommonCore State Standard3OAC7 read ldquoFluently multiply and divide within 100

    3

    using strategies such as the relationship between multiplication and division(eg knowing that 8 5 = 40 one knows 40 5 = 8) or properties of operationsBy the end of Grade 3 know from memory all products of two one-digitnumbersrdquo

    Additionally Common Core State Standards specify a number of gen-eral computational fluency requirements for which facility with math factsare foundational (Standards 2NBTB5 2NBTB6 2NBTB7 3OAA3NBTA2) Floridarsquos standards retain these requirements

    Post-survey questionnaires were given to all teachers Teachers in thecomparison condition were asked to describe methods they used to developmath fact fluency and the time they spent on this goal Two of the fourteachers in the comparison condition returned these questionnaires Theircomments are provided below verbatim We have also included data on theaverage fluency gain for each comparison class including the two that didnot return questionnaires

    The survey asked teachers how many hours a month were spent on de-veloping math fact fluency One teacher specified her answer in terms ofminutes per day The wrote ldquo20 hoursrdquo in the blank

    Table 1 Post-Study Comparison Group Responses

    Grade Average Strategies Time SpentGain (Hours per month)

    3 088 (Did not return survey) NA

    3 081 flash cards timed tests repeti-tion math fact raps

    20 hours

    2 053 (Did not return survey) NA

    2 -001 ten marks flash cards fast factscenter work

    (time everyday)10 minutes

    Given the average fluency gains we surmise that the other two teacherslikely spent considerably more than 10 minutes a day on math fact fluencyThe grade 3 responder had a group of high-achieving students so it is possiblehomework was assigned on math fact fluency as it is hard to imagine that20 hours of class time a month was spent on the topic

    4

    13 Setting

    Teachers from a Florida school in a metropolitan area participated in thisstudy The demographic data provided by the school indicate it is a majority-minority school 57 of its second- and third-grade students are Hispanic orLatino and 31 are Caucasian The data provided indicate that 28 havelow English proficiency and 17 are on free or reduced lunch

    14 Participants

    The participating students are generally demographically similar to the fullpopulation of second- and third-grade students in terms of exceptional stu-dent status race gender and economic status In all cases we relied oninformation received from the school

    2 Study Design and Analysis

    21 Sample Formation

    The school was identified by project personnel owing to its previous interestin Reflex The school was offered a discount on a later subscription in ex-change for participation After logistical discussions to ensure that the schoolhad sufficient technical resources to allow usage of a computer-delivered in-tervention teachers were asked to volunteer for participation Nine teachersinitially volunteered to have their homeroom students take part One of thesehomeroom classes was taught by another teacher who also taught her ownhomeroom so the 9 classes were taught by 8 teachers

    The study was intended as a cluster random control trial with the teach-ers from each grade randomly assigned to condition Unfortunately thedesign was compromised across grade 3 teachers One teacher assigned tothe comparison did not participate at allmdashproject personnel did not admin-ister pretests Another teacher assigned to the treatment never used theintervention There was zero uptake across her entire class Review of emailexchanges suggest three possible causes

    bull The liaison between the head researcher and the school may have mis-represented the constraints of the study to the school He reports thatthe school may have thought that an even number of teachers wererequired

    5

    bull Two giftedhigh-achieving classes participated in the study They wereboth inadvertently randomly assigned to the intervention It was ourintention to split these through block randomization but we only re-ceived the pertinent data after selection and due to a misreading ofthe correspondence failed to catch the error so no re-assignment wasdone The school may have rectified our error themselves

    bull It is possible that one of the teachers simply did not want to use theintervention Project personnel doing the training reported that sheattended but ldquohad to leave early onrdquo

    Given the above we our analyzing our study as a QED where the intactgroups are the 8 classes for whom we have pretest data and the interventiongroup comprises those classes where any uptake occurred prior to posttest

    Teachers were provided the opportunity to indicate any students whowere not prepared for fact fluency instruction Four third-grade studentswere identified 3 from the intervention group and 1 from the comparisongroup These studentsrsquo data were not considered as part of the study

    One of the teachers taught two classes one within the intervention groupand another in the comparison group All other teachers taught a singleclass

    Group Descriptions

    Table 2 provides a description of the demographic character of the groupsas well as their pretest scores results The fluency score on the pretest com-bines both speed and accuracy as described in the Fluency Score Calculationsubsection

    22 Outcome Measures

    221 Outcomes

    One outcome were measured in the study math fact fluency which is botha key component of general math achievement and has been shown to bepredictive of studentsrsquo performance on general math achievement tests (seeValidity subsection below) Fluency was measured using timed probes

    6

    Table 2 Baseline Demographic Information

    Full Sample Comparison Group Intervention Group

    Sample Size 129 64 65Grade 3 Students 481 484 477 Hispanic 542 531 553 Asian 171 188 154 White 225 219 231 Black 47 62 31 Multiracial 16 00 3 Low English Proficiency 202 250 154 Exceptional Student (Gifted) 256 25 261 FreeReduced Lunch 202 219 184 Male 465 531 40age-at-pretest (years) 843 844 841pre-test Accuracy 923 932 914pre-test Speed 429 426 431pre-test Score 458 457 458

    bull Grade 2 students were testing on facts with terms minuend and sub-trahends from 0 to 10 inclusive (ie from 0 + 0 up to 10 + 10 and theirassociated subtraction facts)

    bull Grade 3 students were tested on facts with factors divisors and quo-tients from 0 to 10 inclusive (ie from 0 times 0 to 10 times 10 and theirassociated division facts)

    These match the requirements in the Common Core State Standards ex-cept that owing to that documentrsquos idiosyncratic definition of ldquowithin Xrdquo(as in rdquoaddition within 20rdquo) a literal reading of the work indicates that factssuch as 20 minus 17 and 91 divide 13 are considered within grade level The FloridaMath Standards do not provide a glossary so it is unclear whether such factswould be in the scope of the wording of its standards

    222 Probes

    Probes had a format similar to those in other Curriculum Based Measurement(CBM) studies (Hintze Christ amp Keller 2002 Burns VanDerHeyden amp Jiban2006 Stevens amp Leigh 2012) as described below

    Each probe was a single-sheet of paper with 10 rows of vertically orientedproblems Probes given to grade 2 students contained addition and subtrac-tion facts Probes given to grade 3 students contained multiplication and

    7

    division facts The problems were printed in extra large type so only 7 factsfit on each row The first two rows only contained 6 facts to make room fora geometric shape placed in the upper-righthand corner to help students andmonitors quickly identify which page the students were on The problemswere computer-generated with the constraint that the problems in a givenrow be as balanced as possible between the two operations The facts werechosen randomly from the appropriate fact pool with each having an identicalselection likelihood

    An example is provided in the Appendix

    223 Administrations

    Three administrations were given A pretest administration was conductedon February 12th 2016 An interim administration was conducted on April14th timed to occur before heavy preparation for end-of-year testing beganA final administration was conducted on May 24th Students were told toanswer the items in order and not to skip items The administrator used ascript and was witnessed by the classroom teacher who used a checklist toconfirm each of several key points of instruction This form also providedspace for indicating any unusual occurrences

    The first and second administrations each comprised 4 one-minute factfluency probes Students were instructed that the first probe was a warm-upin each case The final administration did not have a warm-up probe Itcontained 3 math fact probes

    Grade 2 students also took a multi-digit computation probe but the re-sults of that probe were not analyzed as part of this combined report forthird grade students did not take a multidigit probe Multidigit multipli-cationdivision is not a core topic for third grade students in Florida andthe distribution of scores on the multi-digit additionsubtraction probe wereknown to be fundamentally different from the distribution of scores on mathfact probes so there is no clear way to combine the two

    All students in a given grade took the same probes using the same ad-ministrative script regardless of condition The probes that were describedas ldquowarmuprdquo tests were not counted in any analysis

    Five studentsmdashall in comparison classesmdashwere noted by test administra-tors as working on their quizzes significantly beyond the called time limitThese students were not formally considered part of the study Posttestswere taken by these students Three of the five students scored higher on

    8

    their posttest than on their pretest

    224 Fluency Score Calculation

    For each student raw fluency scores were calculated as the average numberof digits correct per min (dcmin) minus the number of digits incorrect permin (dimin) as this was the method found by Stevens amp Leigh (2012) tohave the greatest criterion validity

    Previous CBM researchers have combined grade 2 and grade 3 students(Burns et al 2006) but to justify the pooling of their outcomes in a singleanalysis we conducted an analysis of the distribution of raw pretest scoresfor each grade separately to show similarity of distribution

    Table 3 Raw Fluency Pretest Score Distributions by Grade

    Measure Grade 2 Grade 3

    Mean 2026 2027Standard Deviation 1053 1135Median 19 19Kurtosis 237 191Skewness 117 101Range 5467 5833Optimal Box-Cox (anchored at 1) λ 050 056

    A Kolmogorov-Smirnov corroborated the premise that these two distri-butions were quite similar It failed to reject homogeneity (critical D-statwas 0233 calculated D-stat was 0063 p-value = 099)

    The distribution of these raw scores were significantly skewed and lep-tokurtic as has been reported in similar studies (Burns et al 2006) so wenormalized them using a Box-Cox transformation to arrive at a final fluencyscore Following the recommendation of Osborne (2005) we anchored the fulldistribution at a minimum value of 1 by adding 2 to all raw fluency scoresA search for an optimum λ returned 0525 so we chose λ = 05 for simplicity

    of inversion Thus the calculation for final score isradic

    (C minus I + 2) where Cis the average digits correct per min and I is the average digits incorrect permin The resulting distribution of pretest scores was not significantly skewed(skew = 008 SES 021) but was still slightly leptokurtic (Kurtosis = 085SEK = 042) DAgostino-Pearson (p-value = 013) and Jarque-Barre tests(p-value = 013) failed to reject normality

    9

    23 Validity

    The criterion validity for CBM based measures in elementary math hasbeen established by Stevens amp Leigh (2012) and VanDerHeyden amp Burns(2008) These studies showed math fact fluency was predictive of generalmath achievement on the Oklahoma Core Curriculum test and StanfordAchievement Test respectively

    24 Reliability

    Several researchers have confirmed the reliability of CBM for math fluency

    Table 4 Previous Research on CBM Reliability for Math Fluency

    Metric Scoring Method Source Value

    Inter-scorer Agreement Correct Digits per Minute (Burns et al 2006) 096+Inter-scorer Agreement Correct Digits per Minute (Hintze et al 2002) 0955Inter-scorer Agreement Correct Digits per Minute (Stevens amp Leigh 2012) 099+

    minus Incorrect Digitsper Minute

    Delayed Alternate-form Correct Digits per Minute (Burns et al 2006) 084ReliabilityAbsolute Generalizability Correct Digits per Minute (Hintze et al 2002) 075Relative Generalizability Correct Digits per Minute (Hintze et al 2002) 095Test-Retest Alternate Correct Digits per minute (Stevens amp Leigh 2012) 087Form Reliability minus Incorrect Digits

    per Minute

    Our study gave 3 separate fact probes on the same day allowing us tomeasure internal consistency of raw fluency score (correct digits minus incor-rect digits) using Cronbachrsquos α The α values across the six test are describedin Table 5

    Table 5 Internal Consistency of Raw Fluency Score

    AdditionSubtraction MultiplicationDivision

    Pretest 095 094Interim Test 096 094Posttest 097 095

    10

    We also calculated delayed alternate-form reliability of the final fluencyscore across each grade times condition cohort and found an average value of071

    Table 6 Delayed Alternate-Form Reliability (14 weeks)

    AdditionSubtraction MultiplicationDivision

    Intervention 077 047Comparison 072 089

    The relatively poor value for the 3rd grade intervention group may bedue to large variation in dosage The standard variation in weekly usageacross 3rd grade intervention groups was 124 daysweek nearly twice thatof the 2nd grade intervention group where the standard deviation was 065daysweek

    When dosage was added to the model predicting posttest score frompretest score the agreement between the two intervention groups improvedconsiderably The coefficients of multiple correlation were R = 081 andR = 077 for the 2nd and 3rd grade intervention groups respectively

    25 Analytic Approach

    Since randomized assignment occurred at the class level we used an HLMmodeling approach to account for cluster effects when analyzing the rela-tionship between condition and posttest fluency The model has two levelsmdashgrade and condition are level-2 variables and all other covariates are level-1variables We used grand-mean-centered values for the lower level variablesand a maximum-likelihood method for determining the random effects Ifthe search for a model did not converge using maximum likelihood restrictedmaximum likelihood was used instead

    Models were constructed using Rrsquos lmer function part of the lme4 libraryusing the methodology for two-tier HLM models documented in a technicalreport from the Department of Statistics and Data Sciences The Universityof Texas at Austin (UTA 2015) which showed the similarity in results tothose given by SPSS SAS Mplus and HLM

    We formed 3 models of decreasing complexity and calculated an effectsize and statistical significance based on each

    11

    The first model uses the same structure as that used in the original ver-sion of this report In this model all dichotomous and numeric covariateswere used (ie all covariates other than race which was polynominal) in-cluding the pretest accuracy and pretest speed This model is most inclusiveand allows for continuity between the original version of this report and thecurrent version It is denoted as the Full Model

    For the data available at the time of the original report the pretest speedand pretest accuracy were both highly significant (p lt 0001) But afterremoving students who did not respect the time limits on the pretest orwere designated as being below grade level before the study began theseadditional pretest features were no longer statistically significant A nestedmodel χ-squared test comparing change in deviance to change in degrees offreedom did not show a statistically significant improvement upon addingeither of these terms Thus we generated a new model lacking these twopretest features but retaining all the demographic covariates of the originalThis model is denoted in the sequel as the Demographic Model

    In an effort to simplify the model further we assessed the relevance ofeach of the demographic variables by generating a HLM with the followingcharacteristics

    bull No Level-2 variables

    bull Two Level-1 variables the covariate in question and pretest score

    bull Group-mean-centered values

    bull Data scaled to be univariate

    This method was selected for determining the relevance of a given level-1factor based on Woltman Feldstain MacKay amp Rocchirsquos (2012) presentationThe results are shown in Table 7 Note that this was the only analysis usinggroup-mean centered data The modelrsquos used for determining interventioneffect and statistical significance used grand-mean centered level-1 variables

    The results of this analysis are shown in Table 7 Given their very lowcoefficients and t-scores we removed gender and ESE Upon forming the fullHLM using the remaining covariates it was found that LEP had very littleimpact (coefficient = 003) and low significance (t = 024) so it was droppedas well In the resulting model all covariates had t-scores greater than 09

    12

    Table 7 Impact and significance of demographic covariates

    Covariate Coefficient t-score

    age 0028 0506gender -0005 -0096LEP 0042 0612Lunch 0080 1249ESE -0006 -0100

    in magnitude and standardized coefficients greater than 01 There was anearly statistically significant interaction (t = 194) between condition andwhether the student was on free or reduced lunch

    This final model is denoted as the Reduced ModelAll three models are provided in the AppendixEffect sizes were calculated from the coefficient for the intervention effect

    from each HLM-model and the pooled-within-group standard deviation ofunadjusted post-test scores

    Statistical significance was determined based on the t-score of the multi-level model

    26 Statistical Adjustments

    We used all demographic information provided except race which was non-binary and correlated significantly with other demographic information (Rbetween 036 and 046 for the three most prevalent races in our sample)

    Grade was coded as grade3 a variable equal to 1 if the student was ingrade 3 and 0 otherwise

    Age was measured in years as of the pretest administrationGender was coded as a variable male equal to 1 if the student was male

    and 0 if the student was femaleLow-English proficiency was coded as a variable LEP equal to 1 if school

    indicated the student had low English proficiencyExceptional Student Status was determined based on the schoolrsquos des-

    ignation of the student as being within an Exceptional Student Educationprogram It was coded as a variable ESE equal to 1 if the school specifiedthe student as belonging to an ESE program The state of Florida specifiesseveral ESE programs one of which is a program for gifted students Forour study it appears this program furnished the large majority of ESE des-ignations as 29 of the 36 students designated as ESE were concentrated in

    13

    two high-achieving classes In grade 2 every ESE-designated student was ina single class

    Eligibility for free or reduced lunch was coded as a variable lunch equalto 1 if the student was eligible

    As described in the Fluency Score Calculation subsection fluency wasevaluated based on research-supported combination of speed and accuracynormalized to reduce skewness via a Box-Cox transformation This meansthat a studentrsquos fluency score depends on personal characteristics such asconfidence sense of urgency on a pen-and-paper test and attention to accu-racy so students differ markedly in potential for improvement

    Pretest accuracy is the ratio of correct digits to the sum of correct andincorrect digits

    Pretest score is defined asradicC minus I + 2 where C is digits correct per

    minute and I is digits incorrect per minutePretest speed is defined in a manner analogous to pretest score

    radicC minus 2

    where C is digits correct per minute In this expression 2 is subtractedrather than added so that the expression is anchored at 1 conforming tobest practices (Osborne 2005)

    All student-level covariates were scaled to be univariate and grand-meancentered for improved interpretability and model convergence

    Speed score and accuracy on the interim administration were consideredduring the regression process used to impute missing data as described inthe Missing Data section These metrics are calculated exactly as for thepretest using the same formula (ie the data was not re-anchored for theBox-Cox transformation)

    An HLM model was used to calculate statistical significance for the entiresample so no adjustment for cluster effects were necessary We only analyzedone outcome for this study so no adjustment was made for multiple outcomes

    27 Students Removed from Study

    Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

    Four of these ten (3 from intervention 1 from comparison) were excludedbecause their teacher indicated they were sufficiently below grade level that

    14

    Table 8 Descriptive Statistics of Control Variables

    Control Variable Mean SD Skew Kurtosis

    Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

    they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

    Five of these ten (all from comparison) were excluded because they didnot stop when time was called on the pretest In three cases these studentshad higher values on their pretest than on their posttest

    One grade 2 student from the Intervention condition was noted as ap-pearing frustrated and not working on the pretest He had the fourth lowestfluency score of all 2nd-grade participants on the pretest and showed dra-matic improvement by the interim assessment on which he scored at the 33rdpercentile within his grade According to Reflexrsquo internal initial testing thestudent had pre-existing automaticity for 171 of the addition facts within20 and had basic recall ability with 599 This suggests his pretest scoreunder-estimated his actual ability and he was removed from the analysis forfear of artificially inflating the impact of the intervention Note that thisstudent was absent from the final administration

    28 Missing Data

    Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

    15

    established by Burns et al (2006)

    Table 9 Categorization of Students

    Fluency (dcmin) Category N

    Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

    All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

    281 Frustration Level

    One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

    282 Instructional Level

    Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

    29 Mastery Level

    There were no students in the mastery level for whom imputation was nec-essary

    16

    3 Study Data

    Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

    The tables in this section report unscaled uncentered values for ease ofinterpretability

    31 Pre-Intervention DatamdashAll Pretest Takers

    This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

    Outcome Data

    Measure Comparison Group Intervention Group

    Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

    Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

    Fluency Score 4 70 444 117 4 70 449 126

    Background Data

    VariableComparison Intervention

    Mean SD Mean SD

    Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

    17

    32 Pre-Intervention DatamdashBaseline Sample

    This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

    Outcome Data

    Measure Comparison Group Intervention Group

    Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

    Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

    Fluency Score 4 64 457 112 4 66 460 120

    Background Data

    VariableComparison Intervention

    Mean SD Mean SD

    Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

    18

    33 Pre-intervention Data Analytic Sample

    Outcome DatamdashAnalytic Sample

    Measure Comparison Group Intervention Group

    Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

    Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

    Fact Fluency 4 64 4573 1121 4 65 4580 1195

    Background DatamdashAnalytic Sample

    VariableComparison Intervention

    Mean SD Mean SD

    Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

    Outcome DatamdashAnalytic Sample with No Imputation

    Measure Comparison Group Intervention Group

    Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

    Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

    Fact Fluency 4 61 4557 1143 4 61 4640 1142

    19

    Background DatamdashAnalytic Sample with No Imputation

    VariableComparison Intervention

    Mean SD Mean SD

    Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

    34 Post-intervention Data and Findings

    341 Analytic Sample

    As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

    Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

    Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

    Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

    Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

    4Nminus9for small

    effect size

    20

    Estimation of Effect SizemdashAnalytic Sample

    Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

    Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

    342 Analytic Sample with No Imputation

    Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

    Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

    Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

    Model Comparison Group Intervention Group Estimated Effect

    Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

    Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

    Estimation of Effect SizemdashAnalytic Sample with No Imputation

    Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

    Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

    21

    35 Subpopulation Analyses

    We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

    Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

    Difference Within-Group SD (adj Hedgesrsquo g) t-score

    Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

    4 Acknowledgment

    We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

    This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

    The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

    22

    References

    Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

    Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

    Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

    Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

    Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

    Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

    Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

    Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

    UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

    VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

    Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

    Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

    23

    Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

    24

    Appendix A Full Model

    The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

    Factor Coefficient t-score

    Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

    25

    Appendix B Demographic Model

    The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

    Factor Coefficient t-score

    Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

    26

    Appendix C Reduced Model

    The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

    This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

    Factor Coefficient t-score

    Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

    27

    9 5 18 9 8 13

    + 8 + 9 minus10 minus 6 + 3 minus 5

    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

    10 2 3 10 12 9

    minus 3 + 7 + 8 + 1 minus10 minus 1

    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

    5 8 3 19 7 16 3

    + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

    3 7 15 0 4 14 7

    minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

    9 20 11 4 9 6 1

    + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

    9 12 12 2 5 9 5

    +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

    4 14 7 11 7 4 6

    + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

    2 12 14 4 10 1 7

    + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

    13 10 3 9 17 10 3

    minus 6 +10 + 6 minus 6 minus 7 +10 + 6

    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

    4 10 10 3 5 5 10

    + 9 + 2 +10 minus 0 + 3 minus 5 minus10

    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

    Appendix B Sample AdditionSubtraction Probe

    • Study Characteristics
      • Intervention Condition
      • Comparison Condition
      • Setting
      • Participants
        • Study Design and Analysis
          • Sample Formation
          • Outcome Measures
            • Outcomes
            • Probes
            • Administrations
            • Fluency Score Calculation
              • Validity
              • Reliability
              • Analytic Approach
              • Statistical Adjustments
              • Students Removed from Study
              • Missing Data
                • Frustration Level
                • Instructional Level
                  • Mastery Level
                    • Study Data
                      • Pre-Intervention DatamdashAll Pretest Takers
                      • Pre-Intervention DatamdashBaseline Sample
                      • Pre-intervention Data Analytic Sample
                      • Post-intervention Data and Findings
                        • Analytic Sample
                        • Analytic Sample with No Imputation
                          • Subpopulation Analyses
                            • Acknowledgment
                            • Appendices
                            • Appendix Full Model
                            • Appendix Demographic Model
                            • Appendix Reduced Model

      bull Multiplication and Division 0-12 Multiplication facts whose factorsare in the range 0-12 and their associated division facts

      Students assigned to the intervention condition began in the addition subtraction assignment if they were in second grade and in the multiplication division 0-10 assignment if they were in third grade Teachers had theability to switch students on an individual basis to other assignments at theirown discretion Sixteen of the 37 second grader using Reflex were switchedinto multiplicationdivision before the posttest Thus some of their timespent in Reflex was dedicated to above-grade-level items that were not onthe posttest

      The recommended usage for Reflex is 3 days per week The four teachersachieved weekly usages of 26 33 34 and 15 These values include all dayson which a login was made even if the student was practicing facts outsidethe range of testing

      The average usage across all students was 27 daysweekThe median time spent in Reflex during the studyrsquos was 59 minutes a

      week which includes time spent in non-instructional aspects of the systemsuch as browsing an in-product store to buy virtual items using tokens earnedin games or cases where a student logged in from home and forgot to log off

      Reflex requires individual accounts with individual passwords A user inthe comparison group could only have used Reflex by logging into the accountof another student

      Post-survey questionnaires were given to all teachers Two teachers fromthe intervention condition returned questionnaires both indicating they re-lied on Reflex as their primary means of developing math fact fluency duringthe course of the study

      12 Comparison Condition

      This study used a business as usual comparison condition Math fluency ingeneral and math fact fluency in particular are required by the Florida MathStandards and Common Core State Standards for grades 2 and 3 FloridaMath Standard MAFS2OA22 and Common Core State Standard 2MD2have identical wordings ldquoFluently add and subtract within 20 using mentalstrategies By end of Grade 2 know from memory all sums of two one-digitnumbersrdquo Similarly Florida Math Standard MAFSOAC7 and CommonCore State Standard3OAC7 read ldquoFluently multiply and divide within 100

      3

      using strategies such as the relationship between multiplication and division(eg knowing that 8 5 = 40 one knows 40 5 = 8) or properties of operationsBy the end of Grade 3 know from memory all products of two one-digitnumbersrdquo

      Additionally Common Core State Standards specify a number of gen-eral computational fluency requirements for which facility with math factsare foundational (Standards 2NBTB5 2NBTB6 2NBTB7 3OAA3NBTA2) Floridarsquos standards retain these requirements

      Post-survey questionnaires were given to all teachers Teachers in thecomparison condition were asked to describe methods they used to developmath fact fluency and the time they spent on this goal Two of the fourteachers in the comparison condition returned these questionnaires Theircomments are provided below verbatim We have also included data on theaverage fluency gain for each comparison class including the two that didnot return questionnaires

      The survey asked teachers how many hours a month were spent on de-veloping math fact fluency One teacher specified her answer in terms ofminutes per day The wrote ldquo20 hoursrdquo in the blank

      Table 1 Post-Study Comparison Group Responses

      Grade Average Strategies Time SpentGain (Hours per month)

      3 088 (Did not return survey) NA

      3 081 flash cards timed tests repeti-tion math fact raps

      20 hours

      2 053 (Did not return survey) NA

      2 -001 ten marks flash cards fast factscenter work

      (time everyday)10 minutes

      Given the average fluency gains we surmise that the other two teacherslikely spent considerably more than 10 minutes a day on math fact fluencyThe grade 3 responder had a group of high-achieving students so it is possiblehomework was assigned on math fact fluency as it is hard to imagine that20 hours of class time a month was spent on the topic

      4

      13 Setting

      Teachers from a Florida school in a metropolitan area participated in thisstudy The demographic data provided by the school indicate it is a majority-minority school 57 of its second- and third-grade students are Hispanic orLatino and 31 are Caucasian The data provided indicate that 28 havelow English proficiency and 17 are on free or reduced lunch

      14 Participants

      The participating students are generally demographically similar to the fullpopulation of second- and third-grade students in terms of exceptional stu-dent status race gender and economic status In all cases we relied oninformation received from the school

      2 Study Design and Analysis

      21 Sample Formation

      The school was identified by project personnel owing to its previous interestin Reflex The school was offered a discount on a later subscription in ex-change for participation After logistical discussions to ensure that the schoolhad sufficient technical resources to allow usage of a computer-delivered in-tervention teachers were asked to volunteer for participation Nine teachersinitially volunteered to have their homeroom students take part One of thesehomeroom classes was taught by another teacher who also taught her ownhomeroom so the 9 classes were taught by 8 teachers

      The study was intended as a cluster random control trial with the teach-ers from each grade randomly assigned to condition Unfortunately thedesign was compromised across grade 3 teachers One teacher assigned tothe comparison did not participate at allmdashproject personnel did not admin-ister pretests Another teacher assigned to the treatment never used theintervention There was zero uptake across her entire class Review of emailexchanges suggest three possible causes

      bull The liaison between the head researcher and the school may have mis-represented the constraints of the study to the school He reports thatthe school may have thought that an even number of teachers wererequired

      5

      bull Two giftedhigh-achieving classes participated in the study They wereboth inadvertently randomly assigned to the intervention It was ourintention to split these through block randomization but we only re-ceived the pertinent data after selection and due to a misreading ofthe correspondence failed to catch the error so no re-assignment wasdone The school may have rectified our error themselves

      bull It is possible that one of the teachers simply did not want to use theintervention Project personnel doing the training reported that sheattended but ldquohad to leave early onrdquo

      Given the above we our analyzing our study as a QED where the intactgroups are the 8 classes for whom we have pretest data and the interventiongroup comprises those classes where any uptake occurred prior to posttest

      Teachers were provided the opportunity to indicate any students whowere not prepared for fact fluency instruction Four third-grade studentswere identified 3 from the intervention group and 1 from the comparisongroup These studentsrsquo data were not considered as part of the study

      One of the teachers taught two classes one within the intervention groupand another in the comparison group All other teachers taught a singleclass

      Group Descriptions

      Table 2 provides a description of the demographic character of the groupsas well as their pretest scores results The fluency score on the pretest com-bines both speed and accuracy as described in the Fluency Score Calculationsubsection

      22 Outcome Measures

      221 Outcomes

      One outcome were measured in the study math fact fluency which is botha key component of general math achievement and has been shown to bepredictive of studentsrsquo performance on general math achievement tests (seeValidity subsection below) Fluency was measured using timed probes

      6

      Table 2 Baseline Demographic Information

      Full Sample Comparison Group Intervention Group

      Sample Size 129 64 65Grade 3 Students 481 484 477 Hispanic 542 531 553 Asian 171 188 154 White 225 219 231 Black 47 62 31 Multiracial 16 00 3 Low English Proficiency 202 250 154 Exceptional Student (Gifted) 256 25 261 FreeReduced Lunch 202 219 184 Male 465 531 40age-at-pretest (years) 843 844 841pre-test Accuracy 923 932 914pre-test Speed 429 426 431pre-test Score 458 457 458

      bull Grade 2 students were testing on facts with terms minuend and sub-trahends from 0 to 10 inclusive (ie from 0 + 0 up to 10 + 10 and theirassociated subtraction facts)

      bull Grade 3 students were tested on facts with factors divisors and quo-tients from 0 to 10 inclusive (ie from 0 times 0 to 10 times 10 and theirassociated division facts)

      These match the requirements in the Common Core State Standards ex-cept that owing to that documentrsquos idiosyncratic definition of ldquowithin Xrdquo(as in rdquoaddition within 20rdquo) a literal reading of the work indicates that factssuch as 20 minus 17 and 91 divide 13 are considered within grade level The FloridaMath Standards do not provide a glossary so it is unclear whether such factswould be in the scope of the wording of its standards

      222 Probes

      Probes had a format similar to those in other Curriculum Based Measurement(CBM) studies (Hintze Christ amp Keller 2002 Burns VanDerHeyden amp Jiban2006 Stevens amp Leigh 2012) as described below

      Each probe was a single-sheet of paper with 10 rows of vertically orientedproblems Probes given to grade 2 students contained addition and subtrac-tion facts Probes given to grade 3 students contained multiplication and

      7

      division facts The problems were printed in extra large type so only 7 factsfit on each row The first two rows only contained 6 facts to make room fora geometric shape placed in the upper-righthand corner to help students andmonitors quickly identify which page the students were on The problemswere computer-generated with the constraint that the problems in a givenrow be as balanced as possible between the two operations The facts werechosen randomly from the appropriate fact pool with each having an identicalselection likelihood

      An example is provided in the Appendix

      223 Administrations

      Three administrations were given A pretest administration was conductedon February 12th 2016 An interim administration was conducted on April14th timed to occur before heavy preparation for end-of-year testing beganA final administration was conducted on May 24th Students were told toanswer the items in order and not to skip items The administrator used ascript and was witnessed by the classroom teacher who used a checklist toconfirm each of several key points of instruction This form also providedspace for indicating any unusual occurrences

      The first and second administrations each comprised 4 one-minute factfluency probes Students were instructed that the first probe was a warm-upin each case The final administration did not have a warm-up probe Itcontained 3 math fact probes

      Grade 2 students also took a multi-digit computation probe but the re-sults of that probe were not analyzed as part of this combined report forthird grade students did not take a multidigit probe Multidigit multipli-cationdivision is not a core topic for third grade students in Florida andthe distribution of scores on the multi-digit additionsubtraction probe wereknown to be fundamentally different from the distribution of scores on mathfact probes so there is no clear way to combine the two

      All students in a given grade took the same probes using the same ad-ministrative script regardless of condition The probes that were describedas ldquowarmuprdquo tests were not counted in any analysis

      Five studentsmdashall in comparison classesmdashwere noted by test administra-tors as working on their quizzes significantly beyond the called time limitThese students were not formally considered part of the study Posttestswere taken by these students Three of the five students scored higher on

      8

      their posttest than on their pretest

      224 Fluency Score Calculation

      For each student raw fluency scores were calculated as the average numberof digits correct per min (dcmin) minus the number of digits incorrect permin (dimin) as this was the method found by Stevens amp Leigh (2012) tohave the greatest criterion validity

      Previous CBM researchers have combined grade 2 and grade 3 students(Burns et al 2006) but to justify the pooling of their outcomes in a singleanalysis we conducted an analysis of the distribution of raw pretest scoresfor each grade separately to show similarity of distribution

      Table 3 Raw Fluency Pretest Score Distributions by Grade

      Measure Grade 2 Grade 3

      Mean 2026 2027Standard Deviation 1053 1135Median 19 19Kurtosis 237 191Skewness 117 101Range 5467 5833Optimal Box-Cox (anchored at 1) λ 050 056

      A Kolmogorov-Smirnov corroborated the premise that these two distri-butions were quite similar It failed to reject homogeneity (critical D-statwas 0233 calculated D-stat was 0063 p-value = 099)

      The distribution of these raw scores were significantly skewed and lep-tokurtic as has been reported in similar studies (Burns et al 2006) so wenormalized them using a Box-Cox transformation to arrive at a final fluencyscore Following the recommendation of Osborne (2005) we anchored the fulldistribution at a minimum value of 1 by adding 2 to all raw fluency scoresA search for an optimum λ returned 0525 so we chose λ = 05 for simplicity

      of inversion Thus the calculation for final score isradic

      (C minus I + 2) where Cis the average digits correct per min and I is the average digits incorrect permin The resulting distribution of pretest scores was not significantly skewed(skew = 008 SES 021) but was still slightly leptokurtic (Kurtosis = 085SEK = 042) DAgostino-Pearson (p-value = 013) and Jarque-Barre tests(p-value = 013) failed to reject normality

      9

      23 Validity

      The criterion validity for CBM based measures in elementary math hasbeen established by Stevens amp Leigh (2012) and VanDerHeyden amp Burns(2008) These studies showed math fact fluency was predictive of generalmath achievement on the Oklahoma Core Curriculum test and StanfordAchievement Test respectively

      24 Reliability

      Several researchers have confirmed the reliability of CBM for math fluency

      Table 4 Previous Research on CBM Reliability for Math Fluency

      Metric Scoring Method Source Value

      Inter-scorer Agreement Correct Digits per Minute (Burns et al 2006) 096+Inter-scorer Agreement Correct Digits per Minute (Hintze et al 2002) 0955Inter-scorer Agreement Correct Digits per Minute (Stevens amp Leigh 2012) 099+

      minus Incorrect Digitsper Minute

      Delayed Alternate-form Correct Digits per Minute (Burns et al 2006) 084ReliabilityAbsolute Generalizability Correct Digits per Minute (Hintze et al 2002) 075Relative Generalizability Correct Digits per Minute (Hintze et al 2002) 095Test-Retest Alternate Correct Digits per minute (Stevens amp Leigh 2012) 087Form Reliability minus Incorrect Digits

      per Minute

      Our study gave 3 separate fact probes on the same day allowing us tomeasure internal consistency of raw fluency score (correct digits minus incor-rect digits) using Cronbachrsquos α The α values across the six test are describedin Table 5

      Table 5 Internal Consistency of Raw Fluency Score

      AdditionSubtraction MultiplicationDivision

      Pretest 095 094Interim Test 096 094Posttest 097 095

      10

      We also calculated delayed alternate-form reliability of the final fluencyscore across each grade times condition cohort and found an average value of071

      Table 6 Delayed Alternate-Form Reliability (14 weeks)

      AdditionSubtraction MultiplicationDivision

      Intervention 077 047Comparison 072 089

      The relatively poor value for the 3rd grade intervention group may bedue to large variation in dosage The standard variation in weekly usageacross 3rd grade intervention groups was 124 daysweek nearly twice thatof the 2nd grade intervention group where the standard deviation was 065daysweek

      When dosage was added to the model predicting posttest score frompretest score the agreement between the two intervention groups improvedconsiderably The coefficients of multiple correlation were R = 081 andR = 077 for the 2nd and 3rd grade intervention groups respectively

      25 Analytic Approach

      Since randomized assignment occurred at the class level we used an HLMmodeling approach to account for cluster effects when analyzing the rela-tionship between condition and posttest fluency The model has two levelsmdashgrade and condition are level-2 variables and all other covariates are level-1variables We used grand-mean-centered values for the lower level variablesand a maximum-likelihood method for determining the random effects Ifthe search for a model did not converge using maximum likelihood restrictedmaximum likelihood was used instead

      Models were constructed using Rrsquos lmer function part of the lme4 libraryusing the methodology for two-tier HLM models documented in a technicalreport from the Department of Statistics and Data Sciences The Universityof Texas at Austin (UTA 2015) which showed the similarity in results tothose given by SPSS SAS Mplus and HLM

      We formed 3 models of decreasing complexity and calculated an effectsize and statistical significance based on each

      11

      The first model uses the same structure as that used in the original ver-sion of this report In this model all dichotomous and numeric covariateswere used (ie all covariates other than race which was polynominal) in-cluding the pretest accuracy and pretest speed This model is most inclusiveand allows for continuity between the original version of this report and thecurrent version It is denoted as the Full Model

      For the data available at the time of the original report the pretest speedand pretest accuracy were both highly significant (p lt 0001) But afterremoving students who did not respect the time limits on the pretest orwere designated as being below grade level before the study began theseadditional pretest features were no longer statistically significant A nestedmodel χ-squared test comparing change in deviance to change in degrees offreedom did not show a statistically significant improvement upon addingeither of these terms Thus we generated a new model lacking these twopretest features but retaining all the demographic covariates of the originalThis model is denoted in the sequel as the Demographic Model

      In an effort to simplify the model further we assessed the relevance ofeach of the demographic variables by generating a HLM with the followingcharacteristics

      bull No Level-2 variables

      bull Two Level-1 variables the covariate in question and pretest score

      bull Group-mean-centered values

      bull Data scaled to be univariate

      This method was selected for determining the relevance of a given level-1factor based on Woltman Feldstain MacKay amp Rocchirsquos (2012) presentationThe results are shown in Table 7 Note that this was the only analysis usinggroup-mean centered data The modelrsquos used for determining interventioneffect and statistical significance used grand-mean centered level-1 variables

      The results of this analysis are shown in Table 7 Given their very lowcoefficients and t-scores we removed gender and ESE Upon forming the fullHLM using the remaining covariates it was found that LEP had very littleimpact (coefficient = 003) and low significance (t = 024) so it was droppedas well In the resulting model all covariates had t-scores greater than 09

      12

      Table 7 Impact and significance of demographic covariates

      Covariate Coefficient t-score

      age 0028 0506gender -0005 -0096LEP 0042 0612Lunch 0080 1249ESE -0006 -0100

      in magnitude and standardized coefficients greater than 01 There was anearly statistically significant interaction (t = 194) between condition andwhether the student was on free or reduced lunch

      This final model is denoted as the Reduced ModelAll three models are provided in the AppendixEffect sizes were calculated from the coefficient for the intervention effect

      from each HLM-model and the pooled-within-group standard deviation ofunadjusted post-test scores

      Statistical significance was determined based on the t-score of the multi-level model

      26 Statistical Adjustments

      We used all demographic information provided except race which was non-binary and correlated significantly with other demographic information (Rbetween 036 and 046 for the three most prevalent races in our sample)

      Grade was coded as grade3 a variable equal to 1 if the student was ingrade 3 and 0 otherwise

      Age was measured in years as of the pretest administrationGender was coded as a variable male equal to 1 if the student was male

      and 0 if the student was femaleLow-English proficiency was coded as a variable LEP equal to 1 if school

      indicated the student had low English proficiencyExceptional Student Status was determined based on the schoolrsquos des-

      ignation of the student as being within an Exceptional Student Educationprogram It was coded as a variable ESE equal to 1 if the school specifiedthe student as belonging to an ESE program The state of Florida specifiesseveral ESE programs one of which is a program for gifted students Forour study it appears this program furnished the large majority of ESE des-ignations as 29 of the 36 students designated as ESE were concentrated in

      13

      two high-achieving classes In grade 2 every ESE-designated student was ina single class

      Eligibility for free or reduced lunch was coded as a variable lunch equalto 1 if the student was eligible

      As described in the Fluency Score Calculation subsection fluency wasevaluated based on research-supported combination of speed and accuracynormalized to reduce skewness via a Box-Cox transformation This meansthat a studentrsquos fluency score depends on personal characteristics such asconfidence sense of urgency on a pen-and-paper test and attention to accu-racy so students differ markedly in potential for improvement

      Pretest accuracy is the ratio of correct digits to the sum of correct andincorrect digits

      Pretest score is defined asradicC minus I + 2 where C is digits correct per

      minute and I is digits incorrect per minutePretest speed is defined in a manner analogous to pretest score

      radicC minus 2

      where C is digits correct per minute In this expression 2 is subtractedrather than added so that the expression is anchored at 1 conforming tobest practices (Osborne 2005)

      All student-level covariates were scaled to be univariate and grand-meancentered for improved interpretability and model convergence

      Speed score and accuracy on the interim administration were consideredduring the regression process used to impute missing data as described inthe Missing Data section These metrics are calculated exactly as for thepretest using the same formula (ie the data was not re-anchored for theBox-Cox transformation)

      An HLM model was used to calculate statistical significance for the entiresample so no adjustment for cluster effects were necessary We only analyzedone outcome for this study so no adjustment was made for multiple outcomes

      27 Students Removed from Study

      Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

      Four of these ten (3 from intervention 1 from comparison) were excludedbecause their teacher indicated they were sufficiently below grade level that

      14

      Table 8 Descriptive Statistics of Control Variables

      Control Variable Mean SD Skew Kurtosis

      Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

      they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

      Five of these ten (all from comparison) were excluded because they didnot stop when time was called on the pretest In three cases these studentshad higher values on their pretest than on their posttest

      One grade 2 student from the Intervention condition was noted as ap-pearing frustrated and not working on the pretest He had the fourth lowestfluency score of all 2nd-grade participants on the pretest and showed dra-matic improvement by the interim assessment on which he scored at the 33rdpercentile within his grade According to Reflexrsquo internal initial testing thestudent had pre-existing automaticity for 171 of the addition facts within20 and had basic recall ability with 599 This suggests his pretest scoreunder-estimated his actual ability and he was removed from the analysis forfear of artificially inflating the impact of the intervention Note that thisstudent was absent from the final administration

      28 Missing Data

      Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

      15

      established by Burns et al (2006)

      Table 9 Categorization of Students

      Fluency (dcmin) Category N

      Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

      All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

      281 Frustration Level

      One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

      282 Instructional Level

      Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

      29 Mastery Level

      There were no students in the mastery level for whom imputation was nec-essary

      16

      3 Study Data

      Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

      The tables in this section report unscaled uncentered values for ease ofinterpretability

      31 Pre-Intervention DatamdashAll Pretest Takers

      This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

      Outcome Data

      Measure Comparison Group Intervention Group

      Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

      Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

      Fluency Score 4 70 444 117 4 70 449 126

      Background Data

      VariableComparison Intervention

      Mean SD Mean SD

      Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

      17

      32 Pre-Intervention DatamdashBaseline Sample

      This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

      Outcome Data

      Measure Comparison Group Intervention Group

      Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

      Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

      Fluency Score 4 64 457 112 4 66 460 120

      Background Data

      VariableComparison Intervention

      Mean SD Mean SD

      Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

      18

      33 Pre-intervention Data Analytic Sample

      Outcome DatamdashAnalytic Sample

      Measure Comparison Group Intervention Group

      Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

      Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

      Fact Fluency 4 64 4573 1121 4 65 4580 1195

      Background DatamdashAnalytic Sample

      VariableComparison Intervention

      Mean SD Mean SD

      Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

      Outcome DatamdashAnalytic Sample with No Imputation

      Measure Comparison Group Intervention Group

      Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

      Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

      Fact Fluency 4 61 4557 1143 4 61 4640 1142

      19

      Background DatamdashAnalytic Sample with No Imputation

      VariableComparison Intervention

      Mean SD Mean SD

      Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

      34 Post-intervention Data and Findings

      341 Analytic Sample

      As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

      Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

      Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

      Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

      Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

      4Nminus9for small

      effect size

      20

      Estimation of Effect SizemdashAnalytic Sample

      Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

      Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

      342 Analytic Sample with No Imputation

      Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

      Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

      Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

      Model Comparison Group Intervention Group Estimated Effect

      Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

      Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

      Estimation of Effect SizemdashAnalytic Sample with No Imputation

      Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

      Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

      21

      35 Subpopulation Analyses

      We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

      Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

      Difference Within-Group SD (adj Hedgesrsquo g) t-score

      Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

      4 Acknowledgment

      We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

      This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

      The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

      22

      References

      Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

      Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

      Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

      Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

      Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

      Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

      Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

      Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

      UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

      VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

      Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

      Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

      23

      Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

      24

      Appendix A Full Model

      The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

      Factor Coefficient t-score

      Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

      25

      Appendix B Demographic Model

      The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

      Factor Coefficient t-score

      Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

      26

      Appendix C Reduced Model

      The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

      This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

      Factor Coefficient t-score

      Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

      27

      9 5 18 9 8 13

      + 8 + 9 minus10 minus 6 + 3 minus 5

      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

      10 2 3 10 12 9

      minus 3 + 7 + 8 + 1 minus10 minus 1

      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

      5 8 3 19 7 16 3

      + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

      3 7 15 0 4 14 7

      minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

      9 20 11 4 9 6 1

      + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

      9 12 12 2 5 9 5

      +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

      4 14 7 11 7 4 6

      + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

      2 12 14 4 10 1 7

      + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

      13 10 3 9 17 10 3

      minus 6 +10 + 6 minus 6 minus 7 +10 + 6

      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

      4 10 10 3 5 5 10

      + 9 + 2 +10 minus 0 + 3 minus 5 minus10

      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

      Appendix B Sample AdditionSubtraction Probe

      • Study Characteristics
        • Intervention Condition
        • Comparison Condition
        • Setting
        • Participants
          • Study Design and Analysis
            • Sample Formation
            • Outcome Measures
              • Outcomes
              • Probes
              • Administrations
              • Fluency Score Calculation
                • Validity
                • Reliability
                • Analytic Approach
                • Statistical Adjustments
                • Students Removed from Study
                • Missing Data
                  • Frustration Level
                  • Instructional Level
                    • Mastery Level
                      • Study Data
                        • Pre-Intervention DatamdashAll Pretest Takers
                        • Pre-Intervention DatamdashBaseline Sample
                        • Pre-intervention Data Analytic Sample
                        • Post-intervention Data and Findings
                          • Analytic Sample
                          • Analytic Sample with No Imputation
                            • Subpopulation Analyses
                              • Acknowledgment
                              • Appendices
                              • Appendix Full Model
                              • Appendix Demographic Model
                              • Appendix Reduced Model

        using strategies such as the relationship between multiplication and division(eg knowing that 8 5 = 40 one knows 40 5 = 8) or properties of operationsBy the end of Grade 3 know from memory all products of two one-digitnumbersrdquo

        Additionally Common Core State Standards specify a number of gen-eral computational fluency requirements for which facility with math factsare foundational (Standards 2NBTB5 2NBTB6 2NBTB7 3OAA3NBTA2) Floridarsquos standards retain these requirements

        Post-survey questionnaires were given to all teachers Teachers in thecomparison condition were asked to describe methods they used to developmath fact fluency and the time they spent on this goal Two of the fourteachers in the comparison condition returned these questionnaires Theircomments are provided below verbatim We have also included data on theaverage fluency gain for each comparison class including the two that didnot return questionnaires

        The survey asked teachers how many hours a month were spent on de-veloping math fact fluency One teacher specified her answer in terms ofminutes per day The wrote ldquo20 hoursrdquo in the blank

        Table 1 Post-Study Comparison Group Responses

        Grade Average Strategies Time SpentGain (Hours per month)

        3 088 (Did not return survey) NA

        3 081 flash cards timed tests repeti-tion math fact raps

        20 hours

        2 053 (Did not return survey) NA

        2 -001 ten marks flash cards fast factscenter work

        (time everyday)10 minutes

        Given the average fluency gains we surmise that the other two teacherslikely spent considerably more than 10 minutes a day on math fact fluencyThe grade 3 responder had a group of high-achieving students so it is possiblehomework was assigned on math fact fluency as it is hard to imagine that20 hours of class time a month was spent on the topic

        4

        13 Setting

        Teachers from a Florida school in a metropolitan area participated in thisstudy The demographic data provided by the school indicate it is a majority-minority school 57 of its second- and third-grade students are Hispanic orLatino and 31 are Caucasian The data provided indicate that 28 havelow English proficiency and 17 are on free or reduced lunch

        14 Participants

        The participating students are generally demographically similar to the fullpopulation of second- and third-grade students in terms of exceptional stu-dent status race gender and economic status In all cases we relied oninformation received from the school

        2 Study Design and Analysis

        21 Sample Formation

        The school was identified by project personnel owing to its previous interestin Reflex The school was offered a discount on a later subscription in ex-change for participation After logistical discussions to ensure that the schoolhad sufficient technical resources to allow usage of a computer-delivered in-tervention teachers were asked to volunteer for participation Nine teachersinitially volunteered to have their homeroom students take part One of thesehomeroom classes was taught by another teacher who also taught her ownhomeroom so the 9 classes were taught by 8 teachers

        The study was intended as a cluster random control trial with the teach-ers from each grade randomly assigned to condition Unfortunately thedesign was compromised across grade 3 teachers One teacher assigned tothe comparison did not participate at allmdashproject personnel did not admin-ister pretests Another teacher assigned to the treatment never used theintervention There was zero uptake across her entire class Review of emailexchanges suggest three possible causes

        bull The liaison between the head researcher and the school may have mis-represented the constraints of the study to the school He reports thatthe school may have thought that an even number of teachers wererequired

        5

        bull Two giftedhigh-achieving classes participated in the study They wereboth inadvertently randomly assigned to the intervention It was ourintention to split these through block randomization but we only re-ceived the pertinent data after selection and due to a misreading ofthe correspondence failed to catch the error so no re-assignment wasdone The school may have rectified our error themselves

        bull It is possible that one of the teachers simply did not want to use theintervention Project personnel doing the training reported that sheattended but ldquohad to leave early onrdquo

        Given the above we our analyzing our study as a QED where the intactgroups are the 8 classes for whom we have pretest data and the interventiongroup comprises those classes where any uptake occurred prior to posttest

        Teachers were provided the opportunity to indicate any students whowere not prepared for fact fluency instruction Four third-grade studentswere identified 3 from the intervention group and 1 from the comparisongroup These studentsrsquo data were not considered as part of the study

        One of the teachers taught two classes one within the intervention groupand another in the comparison group All other teachers taught a singleclass

        Group Descriptions

        Table 2 provides a description of the demographic character of the groupsas well as their pretest scores results The fluency score on the pretest com-bines both speed and accuracy as described in the Fluency Score Calculationsubsection

        22 Outcome Measures

        221 Outcomes

        One outcome were measured in the study math fact fluency which is botha key component of general math achievement and has been shown to bepredictive of studentsrsquo performance on general math achievement tests (seeValidity subsection below) Fluency was measured using timed probes

        6

        Table 2 Baseline Demographic Information

        Full Sample Comparison Group Intervention Group

        Sample Size 129 64 65Grade 3 Students 481 484 477 Hispanic 542 531 553 Asian 171 188 154 White 225 219 231 Black 47 62 31 Multiracial 16 00 3 Low English Proficiency 202 250 154 Exceptional Student (Gifted) 256 25 261 FreeReduced Lunch 202 219 184 Male 465 531 40age-at-pretest (years) 843 844 841pre-test Accuracy 923 932 914pre-test Speed 429 426 431pre-test Score 458 457 458

        bull Grade 2 students were testing on facts with terms minuend and sub-trahends from 0 to 10 inclusive (ie from 0 + 0 up to 10 + 10 and theirassociated subtraction facts)

        bull Grade 3 students were tested on facts with factors divisors and quo-tients from 0 to 10 inclusive (ie from 0 times 0 to 10 times 10 and theirassociated division facts)

        These match the requirements in the Common Core State Standards ex-cept that owing to that documentrsquos idiosyncratic definition of ldquowithin Xrdquo(as in rdquoaddition within 20rdquo) a literal reading of the work indicates that factssuch as 20 minus 17 and 91 divide 13 are considered within grade level The FloridaMath Standards do not provide a glossary so it is unclear whether such factswould be in the scope of the wording of its standards

        222 Probes

        Probes had a format similar to those in other Curriculum Based Measurement(CBM) studies (Hintze Christ amp Keller 2002 Burns VanDerHeyden amp Jiban2006 Stevens amp Leigh 2012) as described below

        Each probe was a single-sheet of paper with 10 rows of vertically orientedproblems Probes given to grade 2 students contained addition and subtrac-tion facts Probes given to grade 3 students contained multiplication and

        7

        division facts The problems were printed in extra large type so only 7 factsfit on each row The first two rows only contained 6 facts to make room fora geometric shape placed in the upper-righthand corner to help students andmonitors quickly identify which page the students were on The problemswere computer-generated with the constraint that the problems in a givenrow be as balanced as possible between the two operations The facts werechosen randomly from the appropriate fact pool with each having an identicalselection likelihood

        An example is provided in the Appendix

        223 Administrations

        Three administrations were given A pretest administration was conductedon February 12th 2016 An interim administration was conducted on April14th timed to occur before heavy preparation for end-of-year testing beganA final administration was conducted on May 24th Students were told toanswer the items in order and not to skip items The administrator used ascript and was witnessed by the classroom teacher who used a checklist toconfirm each of several key points of instruction This form also providedspace for indicating any unusual occurrences

        The first and second administrations each comprised 4 one-minute factfluency probes Students were instructed that the first probe was a warm-upin each case The final administration did not have a warm-up probe Itcontained 3 math fact probes

        Grade 2 students also took a multi-digit computation probe but the re-sults of that probe were not analyzed as part of this combined report forthird grade students did not take a multidigit probe Multidigit multipli-cationdivision is not a core topic for third grade students in Florida andthe distribution of scores on the multi-digit additionsubtraction probe wereknown to be fundamentally different from the distribution of scores on mathfact probes so there is no clear way to combine the two

        All students in a given grade took the same probes using the same ad-ministrative script regardless of condition The probes that were describedas ldquowarmuprdquo tests were not counted in any analysis

        Five studentsmdashall in comparison classesmdashwere noted by test administra-tors as working on their quizzes significantly beyond the called time limitThese students were not formally considered part of the study Posttestswere taken by these students Three of the five students scored higher on

        8

        their posttest than on their pretest

        224 Fluency Score Calculation

        For each student raw fluency scores were calculated as the average numberof digits correct per min (dcmin) minus the number of digits incorrect permin (dimin) as this was the method found by Stevens amp Leigh (2012) tohave the greatest criterion validity

        Previous CBM researchers have combined grade 2 and grade 3 students(Burns et al 2006) but to justify the pooling of their outcomes in a singleanalysis we conducted an analysis of the distribution of raw pretest scoresfor each grade separately to show similarity of distribution

        Table 3 Raw Fluency Pretest Score Distributions by Grade

        Measure Grade 2 Grade 3

        Mean 2026 2027Standard Deviation 1053 1135Median 19 19Kurtosis 237 191Skewness 117 101Range 5467 5833Optimal Box-Cox (anchored at 1) λ 050 056

        A Kolmogorov-Smirnov corroborated the premise that these two distri-butions were quite similar It failed to reject homogeneity (critical D-statwas 0233 calculated D-stat was 0063 p-value = 099)

        The distribution of these raw scores were significantly skewed and lep-tokurtic as has been reported in similar studies (Burns et al 2006) so wenormalized them using a Box-Cox transformation to arrive at a final fluencyscore Following the recommendation of Osborne (2005) we anchored the fulldistribution at a minimum value of 1 by adding 2 to all raw fluency scoresA search for an optimum λ returned 0525 so we chose λ = 05 for simplicity

        of inversion Thus the calculation for final score isradic

        (C minus I + 2) where Cis the average digits correct per min and I is the average digits incorrect permin The resulting distribution of pretest scores was not significantly skewed(skew = 008 SES 021) but was still slightly leptokurtic (Kurtosis = 085SEK = 042) DAgostino-Pearson (p-value = 013) and Jarque-Barre tests(p-value = 013) failed to reject normality

        9

        23 Validity

        The criterion validity for CBM based measures in elementary math hasbeen established by Stevens amp Leigh (2012) and VanDerHeyden amp Burns(2008) These studies showed math fact fluency was predictive of generalmath achievement on the Oklahoma Core Curriculum test and StanfordAchievement Test respectively

        24 Reliability

        Several researchers have confirmed the reliability of CBM for math fluency

        Table 4 Previous Research on CBM Reliability for Math Fluency

        Metric Scoring Method Source Value

        Inter-scorer Agreement Correct Digits per Minute (Burns et al 2006) 096+Inter-scorer Agreement Correct Digits per Minute (Hintze et al 2002) 0955Inter-scorer Agreement Correct Digits per Minute (Stevens amp Leigh 2012) 099+

        minus Incorrect Digitsper Minute

        Delayed Alternate-form Correct Digits per Minute (Burns et al 2006) 084ReliabilityAbsolute Generalizability Correct Digits per Minute (Hintze et al 2002) 075Relative Generalizability Correct Digits per Minute (Hintze et al 2002) 095Test-Retest Alternate Correct Digits per minute (Stevens amp Leigh 2012) 087Form Reliability minus Incorrect Digits

        per Minute

        Our study gave 3 separate fact probes on the same day allowing us tomeasure internal consistency of raw fluency score (correct digits minus incor-rect digits) using Cronbachrsquos α The α values across the six test are describedin Table 5

        Table 5 Internal Consistency of Raw Fluency Score

        AdditionSubtraction MultiplicationDivision

        Pretest 095 094Interim Test 096 094Posttest 097 095

        10

        We also calculated delayed alternate-form reliability of the final fluencyscore across each grade times condition cohort and found an average value of071

        Table 6 Delayed Alternate-Form Reliability (14 weeks)

        AdditionSubtraction MultiplicationDivision

        Intervention 077 047Comparison 072 089

        The relatively poor value for the 3rd grade intervention group may bedue to large variation in dosage The standard variation in weekly usageacross 3rd grade intervention groups was 124 daysweek nearly twice thatof the 2nd grade intervention group where the standard deviation was 065daysweek

        When dosage was added to the model predicting posttest score frompretest score the agreement between the two intervention groups improvedconsiderably The coefficients of multiple correlation were R = 081 andR = 077 for the 2nd and 3rd grade intervention groups respectively

        25 Analytic Approach

        Since randomized assignment occurred at the class level we used an HLMmodeling approach to account for cluster effects when analyzing the rela-tionship between condition and posttest fluency The model has two levelsmdashgrade and condition are level-2 variables and all other covariates are level-1variables We used grand-mean-centered values for the lower level variablesand a maximum-likelihood method for determining the random effects Ifthe search for a model did not converge using maximum likelihood restrictedmaximum likelihood was used instead

        Models were constructed using Rrsquos lmer function part of the lme4 libraryusing the methodology for two-tier HLM models documented in a technicalreport from the Department of Statistics and Data Sciences The Universityof Texas at Austin (UTA 2015) which showed the similarity in results tothose given by SPSS SAS Mplus and HLM

        We formed 3 models of decreasing complexity and calculated an effectsize and statistical significance based on each

        11

        The first model uses the same structure as that used in the original ver-sion of this report In this model all dichotomous and numeric covariateswere used (ie all covariates other than race which was polynominal) in-cluding the pretest accuracy and pretest speed This model is most inclusiveand allows for continuity between the original version of this report and thecurrent version It is denoted as the Full Model

        For the data available at the time of the original report the pretest speedand pretest accuracy were both highly significant (p lt 0001) But afterremoving students who did not respect the time limits on the pretest orwere designated as being below grade level before the study began theseadditional pretest features were no longer statistically significant A nestedmodel χ-squared test comparing change in deviance to change in degrees offreedom did not show a statistically significant improvement upon addingeither of these terms Thus we generated a new model lacking these twopretest features but retaining all the demographic covariates of the originalThis model is denoted in the sequel as the Demographic Model

        In an effort to simplify the model further we assessed the relevance ofeach of the demographic variables by generating a HLM with the followingcharacteristics

        bull No Level-2 variables

        bull Two Level-1 variables the covariate in question and pretest score

        bull Group-mean-centered values

        bull Data scaled to be univariate

        This method was selected for determining the relevance of a given level-1factor based on Woltman Feldstain MacKay amp Rocchirsquos (2012) presentationThe results are shown in Table 7 Note that this was the only analysis usinggroup-mean centered data The modelrsquos used for determining interventioneffect and statistical significance used grand-mean centered level-1 variables

        The results of this analysis are shown in Table 7 Given their very lowcoefficients and t-scores we removed gender and ESE Upon forming the fullHLM using the remaining covariates it was found that LEP had very littleimpact (coefficient = 003) and low significance (t = 024) so it was droppedas well In the resulting model all covariates had t-scores greater than 09

        12

        Table 7 Impact and significance of demographic covariates

        Covariate Coefficient t-score

        age 0028 0506gender -0005 -0096LEP 0042 0612Lunch 0080 1249ESE -0006 -0100

        in magnitude and standardized coefficients greater than 01 There was anearly statistically significant interaction (t = 194) between condition andwhether the student was on free or reduced lunch

        This final model is denoted as the Reduced ModelAll three models are provided in the AppendixEffect sizes were calculated from the coefficient for the intervention effect

        from each HLM-model and the pooled-within-group standard deviation ofunadjusted post-test scores

        Statistical significance was determined based on the t-score of the multi-level model

        26 Statistical Adjustments

        We used all demographic information provided except race which was non-binary and correlated significantly with other demographic information (Rbetween 036 and 046 for the three most prevalent races in our sample)

        Grade was coded as grade3 a variable equal to 1 if the student was ingrade 3 and 0 otherwise

        Age was measured in years as of the pretest administrationGender was coded as a variable male equal to 1 if the student was male

        and 0 if the student was femaleLow-English proficiency was coded as a variable LEP equal to 1 if school

        indicated the student had low English proficiencyExceptional Student Status was determined based on the schoolrsquos des-

        ignation of the student as being within an Exceptional Student Educationprogram It was coded as a variable ESE equal to 1 if the school specifiedthe student as belonging to an ESE program The state of Florida specifiesseveral ESE programs one of which is a program for gifted students Forour study it appears this program furnished the large majority of ESE des-ignations as 29 of the 36 students designated as ESE were concentrated in

        13

        two high-achieving classes In grade 2 every ESE-designated student was ina single class

        Eligibility for free or reduced lunch was coded as a variable lunch equalto 1 if the student was eligible

        As described in the Fluency Score Calculation subsection fluency wasevaluated based on research-supported combination of speed and accuracynormalized to reduce skewness via a Box-Cox transformation This meansthat a studentrsquos fluency score depends on personal characteristics such asconfidence sense of urgency on a pen-and-paper test and attention to accu-racy so students differ markedly in potential for improvement

        Pretest accuracy is the ratio of correct digits to the sum of correct andincorrect digits

        Pretest score is defined asradicC minus I + 2 where C is digits correct per

        minute and I is digits incorrect per minutePretest speed is defined in a manner analogous to pretest score

        radicC minus 2

        where C is digits correct per minute In this expression 2 is subtractedrather than added so that the expression is anchored at 1 conforming tobest practices (Osborne 2005)

        All student-level covariates were scaled to be univariate and grand-meancentered for improved interpretability and model convergence

        Speed score and accuracy on the interim administration were consideredduring the regression process used to impute missing data as described inthe Missing Data section These metrics are calculated exactly as for thepretest using the same formula (ie the data was not re-anchored for theBox-Cox transformation)

        An HLM model was used to calculate statistical significance for the entiresample so no adjustment for cluster effects were necessary We only analyzedone outcome for this study so no adjustment was made for multiple outcomes

        27 Students Removed from Study

        Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

        Four of these ten (3 from intervention 1 from comparison) were excludedbecause their teacher indicated they were sufficiently below grade level that

        14

        Table 8 Descriptive Statistics of Control Variables

        Control Variable Mean SD Skew Kurtosis

        Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

        they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

        Five of these ten (all from comparison) were excluded because they didnot stop when time was called on the pretest In three cases these studentshad higher values on their pretest than on their posttest

        One grade 2 student from the Intervention condition was noted as ap-pearing frustrated and not working on the pretest He had the fourth lowestfluency score of all 2nd-grade participants on the pretest and showed dra-matic improvement by the interim assessment on which he scored at the 33rdpercentile within his grade According to Reflexrsquo internal initial testing thestudent had pre-existing automaticity for 171 of the addition facts within20 and had basic recall ability with 599 This suggests his pretest scoreunder-estimated his actual ability and he was removed from the analysis forfear of artificially inflating the impact of the intervention Note that thisstudent was absent from the final administration

        28 Missing Data

        Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

        15

        established by Burns et al (2006)

        Table 9 Categorization of Students

        Fluency (dcmin) Category N

        Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

        All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

        281 Frustration Level

        One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

        282 Instructional Level

        Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

        29 Mastery Level

        There were no students in the mastery level for whom imputation was nec-essary

        16

        3 Study Data

        Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

        The tables in this section report unscaled uncentered values for ease ofinterpretability

        31 Pre-Intervention DatamdashAll Pretest Takers

        This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

        Outcome Data

        Measure Comparison Group Intervention Group

        Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

        Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

        Fluency Score 4 70 444 117 4 70 449 126

        Background Data

        VariableComparison Intervention

        Mean SD Mean SD

        Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

        17

        32 Pre-Intervention DatamdashBaseline Sample

        This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

        Outcome Data

        Measure Comparison Group Intervention Group

        Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

        Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

        Fluency Score 4 64 457 112 4 66 460 120

        Background Data

        VariableComparison Intervention

        Mean SD Mean SD

        Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

        18

        33 Pre-intervention Data Analytic Sample

        Outcome DatamdashAnalytic Sample

        Measure Comparison Group Intervention Group

        Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

        Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

        Fact Fluency 4 64 4573 1121 4 65 4580 1195

        Background DatamdashAnalytic Sample

        VariableComparison Intervention

        Mean SD Mean SD

        Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

        Outcome DatamdashAnalytic Sample with No Imputation

        Measure Comparison Group Intervention Group

        Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

        Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

        Fact Fluency 4 61 4557 1143 4 61 4640 1142

        19

        Background DatamdashAnalytic Sample with No Imputation

        VariableComparison Intervention

        Mean SD Mean SD

        Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

        34 Post-intervention Data and Findings

        341 Analytic Sample

        As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

        Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

        Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

        Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

        Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

        4Nminus9for small

        effect size

        20

        Estimation of Effect SizemdashAnalytic Sample

        Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

        Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

        342 Analytic Sample with No Imputation

        Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

        Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

        Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

        Model Comparison Group Intervention Group Estimated Effect

        Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

        Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

        Estimation of Effect SizemdashAnalytic Sample with No Imputation

        Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

        Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

        21

        35 Subpopulation Analyses

        We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

        Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

        Difference Within-Group SD (adj Hedgesrsquo g) t-score

        Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

        4 Acknowledgment

        We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

        This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

        The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

        22

        References

        Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

        Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

        Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

        Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

        Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

        Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

        Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

        Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

        UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

        VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

        Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

        Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

        23

        Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

        24

        Appendix A Full Model

        The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

        Factor Coefficient t-score

        Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

        25

        Appendix B Demographic Model

        The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

        Factor Coefficient t-score

        Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

        26

        Appendix C Reduced Model

        The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

        This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

        Factor Coefficient t-score

        Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

        27

        9 5 18 9 8 13

        + 8 + 9 minus10 minus 6 + 3 minus 5

        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

        10 2 3 10 12 9

        minus 3 + 7 + 8 + 1 minus10 minus 1

        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

        5 8 3 19 7 16 3

        + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

        3 7 15 0 4 14 7

        minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

        9 20 11 4 9 6 1

        + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

        9 12 12 2 5 9 5

        +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

        4 14 7 11 7 4 6

        + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

        2 12 14 4 10 1 7

        + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

        13 10 3 9 17 10 3

        minus 6 +10 + 6 minus 6 minus 7 +10 + 6

        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

        4 10 10 3 5 5 10

        + 9 + 2 +10 minus 0 + 3 minus 5 minus10

        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

        Appendix B Sample AdditionSubtraction Probe

        • Study Characteristics
          • Intervention Condition
          • Comparison Condition
          • Setting
          • Participants
            • Study Design and Analysis
              • Sample Formation
              • Outcome Measures
                • Outcomes
                • Probes
                • Administrations
                • Fluency Score Calculation
                  • Validity
                  • Reliability
                  • Analytic Approach
                  • Statistical Adjustments
                  • Students Removed from Study
                  • Missing Data
                    • Frustration Level
                    • Instructional Level
                      • Mastery Level
                        • Study Data
                          • Pre-Intervention DatamdashAll Pretest Takers
                          • Pre-Intervention DatamdashBaseline Sample
                          • Pre-intervention Data Analytic Sample
                          • Post-intervention Data and Findings
                            • Analytic Sample
                            • Analytic Sample with No Imputation
                              • Subpopulation Analyses
                                • Acknowledgment
                                • Appendices
                                • Appendix Full Model
                                • Appendix Demographic Model
                                • Appendix Reduced Model

          13 Setting

          Teachers from a Florida school in a metropolitan area participated in thisstudy The demographic data provided by the school indicate it is a majority-minority school 57 of its second- and third-grade students are Hispanic orLatino and 31 are Caucasian The data provided indicate that 28 havelow English proficiency and 17 are on free or reduced lunch

          14 Participants

          The participating students are generally demographically similar to the fullpopulation of second- and third-grade students in terms of exceptional stu-dent status race gender and economic status In all cases we relied oninformation received from the school

          2 Study Design and Analysis

          21 Sample Formation

          The school was identified by project personnel owing to its previous interestin Reflex The school was offered a discount on a later subscription in ex-change for participation After logistical discussions to ensure that the schoolhad sufficient technical resources to allow usage of a computer-delivered in-tervention teachers were asked to volunteer for participation Nine teachersinitially volunteered to have their homeroom students take part One of thesehomeroom classes was taught by another teacher who also taught her ownhomeroom so the 9 classes were taught by 8 teachers

          The study was intended as a cluster random control trial with the teach-ers from each grade randomly assigned to condition Unfortunately thedesign was compromised across grade 3 teachers One teacher assigned tothe comparison did not participate at allmdashproject personnel did not admin-ister pretests Another teacher assigned to the treatment never used theintervention There was zero uptake across her entire class Review of emailexchanges suggest three possible causes

          bull The liaison between the head researcher and the school may have mis-represented the constraints of the study to the school He reports thatthe school may have thought that an even number of teachers wererequired

          5

          bull Two giftedhigh-achieving classes participated in the study They wereboth inadvertently randomly assigned to the intervention It was ourintention to split these through block randomization but we only re-ceived the pertinent data after selection and due to a misreading ofthe correspondence failed to catch the error so no re-assignment wasdone The school may have rectified our error themselves

          bull It is possible that one of the teachers simply did not want to use theintervention Project personnel doing the training reported that sheattended but ldquohad to leave early onrdquo

          Given the above we our analyzing our study as a QED where the intactgroups are the 8 classes for whom we have pretest data and the interventiongroup comprises those classes where any uptake occurred prior to posttest

          Teachers were provided the opportunity to indicate any students whowere not prepared for fact fluency instruction Four third-grade studentswere identified 3 from the intervention group and 1 from the comparisongroup These studentsrsquo data were not considered as part of the study

          One of the teachers taught two classes one within the intervention groupand another in the comparison group All other teachers taught a singleclass

          Group Descriptions

          Table 2 provides a description of the demographic character of the groupsas well as their pretest scores results The fluency score on the pretest com-bines both speed and accuracy as described in the Fluency Score Calculationsubsection

          22 Outcome Measures

          221 Outcomes

          One outcome were measured in the study math fact fluency which is botha key component of general math achievement and has been shown to bepredictive of studentsrsquo performance on general math achievement tests (seeValidity subsection below) Fluency was measured using timed probes

          6

          Table 2 Baseline Demographic Information

          Full Sample Comparison Group Intervention Group

          Sample Size 129 64 65Grade 3 Students 481 484 477 Hispanic 542 531 553 Asian 171 188 154 White 225 219 231 Black 47 62 31 Multiracial 16 00 3 Low English Proficiency 202 250 154 Exceptional Student (Gifted) 256 25 261 FreeReduced Lunch 202 219 184 Male 465 531 40age-at-pretest (years) 843 844 841pre-test Accuracy 923 932 914pre-test Speed 429 426 431pre-test Score 458 457 458

          bull Grade 2 students were testing on facts with terms minuend and sub-trahends from 0 to 10 inclusive (ie from 0 + 0 up to 10 + 10 and theirassociated subtraction facts)

          bull Grade 3 students were tested on facts with factors divisors and quo-tients from 0 to 10 inclusive (ie from 0 times 0 to 10 times 10 and theirassociated division facts)

          These match the requirements in the Common Core State Standards ex-cept that owing to that documentrsquos idiosyncratic definition of ldquowithin Xrdquo(as in rdquoaddition within 20rdquo) a literal reading of the work indicates that factssuch as 20 minus 17 and 91 divide 13 are considered within grade level The FloridaMath Standards do not provide a glossary so it is unclear whether such factswould be in the scope of the wording of its standards

          222 Probes

          Probes had a format similar to those in other Curriculum Based Measurement(CBM) studies (Hintze Christ amp Keller 2002 Burns VanDerHeyden amp Jiban2006 Stevens amp Leigh 2012) as described below

          Each probe was a single-sheet of paper with 10 rows of vertically orientedproblems Probes given to grade 2 students contained addition and subtrac-tion facts Probes given to grade 3 students contained multiplication and

          7

          division facts The problems were printed in extra large type so only 7 factsfit on each row The first two rows only contained 6 facts to make room fora geometric shape placed in the upper-righthand corner to help students andmonitors quickly identify which page the students were on The problemswere computer-generated with the constraint that the problems in a givenrow be as balanced as possible between the two operations The facts werechosen randomly from the appropriate fact pool with each having an identicalselection likelihood

          An example is provided in the Appendix

          223 Administrations

          Three administrations were given A pretest administration was conductedon February 12th 2016 An interim administration was conducted on April14th timed to occur before heavy preparation for end-of-year testing beganA final administration was conducted on May 24th Students were told toanswer the items in order and not to skip items The administrator used ascript and was witnessed by the classroom teacher who used a checklist toconfirm each of several key points of instruction This form also providedspace for indicating any unusual occurrences

          The first and second administrations each comprised 4 one-minute factfluency probes Students were instructed that the first probe was a warm-upin each case The final administration did not have a warm-up probe Itcontained 3 math fact probes

          Grade 2 students also took a multi-digit computation probe but the re-sults of that probe were not analyzed as part of this combined report forthird grade students did not take a multidigit probe Multidigit multipli-cationdivision is not a core topic for third grade students in Florida andthe distribution of scores on the multi-digit additionsubtraction probe wereknown to be fundamentally different from the distribution of scores on mathfact probes so there is no clear way to combine the two

          All students in a given grade took the same probes using the same ad-ministrative script regardless of condition The probes that were describedas ldquowarmuprdquo tests were not counted in any analysis

          Five studentsmdashall in comparison classesmdashwere noted by test administra-tors as working on their quizzes significantly beyond the called time limitThese students were not formally considered part of the study Posttestswere taken by these students Three of the five students scored higher on

          8

          their posttest than on their pretest

          224 Fluency Score Calculation

          For each student raw fluency scores were calculated as the average numberof digits correct per min (dcmin) minus the number of digits incorrect permin (dimin) as this was the method found by Stevens amp Leigh (2012) tohave the greatest criterion validity

          Previous CBM researchers have combined grade 2 and grade 3 students(Burns et al 2006) but to justify the pooling of their outcomes in a singleanalysis we conducted an analysis of the distribution of raw pretest scoresfor each grade separately to show similarity of distribution

          Table 3 Raw Fluency Pretest Score Distributions by Grade

          Measure Grade 2 Grade 3

          Mean 2026 2027Standard Deviation 1053 1135Median 19 19Kurtosis 237 191Skewness 117 101Range 5467 5833Optimal Box-Cox (anchored at 1) λ 050 056

          A Kolmogorov-Smirnov corroborated the premise that these two distri-butions were quite similar It failed to reject homogeneity (critical D-statwas 0233 calculated D-stat was 0063 p-value = 099)

          The distribution of these raw scores were significantly skewed and lep-tokurtic as has been reported in similar studies (Burns et al 2006) so wenormalized them using a Box-Cox transformation to arrive at a final fluencyscore Following the recommendation of Osborne (2005) we anchored the fulldistribution at a minimum value of 1 by adding 2 to all raw fluency scoresA search for an optimum λ returned 0525 so we chose λ = 05 for simplicity

          of inversion Thus the calculation for final score isradic

          (C minus I + 2) where Cis the average digits correct per min and I is the average digits incorrect permin The resulting distribution of pretest scores was not significantly skewed(skew = 008 SES 021) but was still slightly leptokurtic (Kurtosis = 085SEK = 042) DAgostino-Pearson (p-value = 013) and Jarque-Barre tests(p-value = 013) failed to reject normality

          9

          23 Validity

          The criterion validity for CBM based measures in elementary math hasbeen established by Stevens amp Leigh (2012) and VanDerHeyden amp Burns(2008) These studies showed math fact fluency was predictive of generalmath achievement on the Oklahoma Core Curriculum test and StanfordAchievement Test respectively

          24 Reliability

          Several researchers have confirmed the reliability of CBM for math fluency

          Table 4 Previous Research on CBM Reliability for Math Fluency

          Metric Scoring Method Source Value

          Inter-scorer Agreement Correct Digits per Minute (Burns et al 2006) 096+Inter-scorer Agreement Correct Digits per Minute (Hintze et al 2002) 0955Inter-scorer Agreement Correct Digits per Minute (Stevens amp Leigh 2012) 099+

          minus Incorrect Digitsper Minute

          Delayed Alternate-form Correct Digits per Minute (Burns et al 2006) 084ReliabilityAbsolute Generalizability Correct Digits per Minute (Hintze et al 2002) 075Relative Generalizability Correct Digits per Minute (Hintze et al 2002) 095Test-Retest Alternate Correct Digits per minute (Stevens amp Leigh 2012) 087Form Reliability minus Incorrect Digits

          per Minute

          Our study gave 3 separate fact probes on the same day allowing us tomeasure internal consistency of raw fluency score (correct digits minus incor-rect digits) using Cronbachrsquos α The α values across the six test are describedin Table 5

          Table 5 Internal Consistency of Raw Fluency Score

          AdditionSubtraction MultiplicationDivision

          Pretest 095 094Interim Test 096 094Posttest 097 095

          10

          We also calculated delayed alternate-form reliability of the final fluencyscore across each grade times condition cohort and found an average value of071

          Table 6 Delayed Alternate-Form Reliability (14 weeks)

          AdditionSubtraction MultiplicationDivision

          Intervention 077 047Comparison 072 089

          The relatively poor value for the 3rd grade intervention group may bedue to large variation in dosage The standard variation in weekly usageacross 3rd grade intervention groups was 124 daysweek nearly twice thatof the 2nd grade intervention group where the standard deviation was 065daysweek

          When dosage was added to the model predicting posttest score frompretest score the agreement between the two intervention groups improvedconsiderably The coefficients of multiple correlation were R = 081 andR = 077 for the 2nd and 3rd grade intervention groups respectively

          25 Analytic Approach

          Since randomized assignment occurred at the class level we used an HLMmodeling approach to account for cluster effects when analyzing the rela-tionship between condition and posttest fluency The model has two levelsmdashgrade and condition are level-2 variables and all other covariates are level-1variables We used grand-mean-centered values for the lower level variablesand a maximum-likelihood method for determining the random effects Ifthe search for a model did not converge using maximum likelihood restrictedmaximum likelihood was used instead

          Models were constructed using Rrsquos lmer function part of the lme4 libraryusing the methodology for two-tier HLM models documented in a technicalreport from the Department of Statistics and Data Sciences The Universityof Texas at Austin (UTA 2015) which showed the similarity in results tothose given by SPSS SAS Mplus and HLM

          We formed 3 models of decreasing complexity and calculated an effectsize and statistical significance based on each

          11

          The first model uses the same structure as that used in the original ver-sion of this report In this model all dichotomous and numeric covariateswere used (ie all covariates other than race which was polynominal) in-cluding the pretest accuracy and pretest speed This model is most inclusiveand allows for continuity between the original version of this report and thecurrent version It is denoted as the Full Model

          For the data available at the time of the original report the pretest speedand pretest accuracy were both highly significant (p lt 0001) But afterremoving students who did not respect the time limits on the pretest orwere designated as being below grade level before the study began theseadditional pretest features were no longer statistically significant A nestedmodel χ-squared test comparing change in deviance to change in degrees offreedom did not show a statistically significant improvement upon addingeither of these terms Thus we generated a new model lacking these twopretest features but retaining all the demographic covariates of the originalThis model is denoted in the sequel as the Demographic Model

          In an effort to simplify the model further we assessed the relevance ofeach of the demographic variables by generating a HLM with the followingcharacteristics

          bull No Level-2 variables

          bull Two Level-1 variables the covariate in question and pretest score

          bull Group-mean-centered values

          bull Data scaled to be univariate

          This method was selected for determining the relevance of a given level-1factor based on Woltman Feldstain MacKay amp Rocchirsquos (2012) presentationThe results are shown in Table 7 Note that this was the only analysis usinggroup-mean centered data The modelrsquos used for determining interventioneffect and statistical significance used grand-mean centered level-1 variables

          The results of this analysis are shown in Table 7 Given their very lowcoefficients and t-scores we removed gender and ESE Upon forming the fullHLM using the remaining covariates it was found that LEP had very littleimpact (coefficient = 003) and low significance (t = 024) so it was droppedas well In the resulting model all covariates had t-scores greater than 09

          12

          Table 7 Impact and significance of demographic covariates

          Covariate Coefficient t-score

          age 0028 0506gender -0005 -0096LEP 0042 0612Lunch 0080 1249ESE -0006 -0100

          in magnitude and standardized coefficients greater than 01 There was anearly statistically significant interaction (t = 194) between condition andwhether the student was on free or reduced lunch

          This final model is denoted as the Reduced ModelAll three models are provided in the AppendixEffect sizes were calculated from the coefficient for the intervention effect

          from each HLM-model and the pooled-within-group standard deviation ofunadjusted post-test scores

          Statistical significance was determined based on the t-score of the multi-level model

          26 Statistical Adjustments

          We used all demographic information provided except race which was non-binary and correlated significantly with other demographic information (Rbetween 036 and 046 for the three most prevalent races in our sample)

          Grade was coded as grade3 a variable equal to 1 if the student was ingrade 3 and 0 otherwise

          Age was measured in years as of the pretest administrationGender was coded as a variable male equal to 1 if the student was male

          and 0 if the student was femaleLow-English proficiency was coded as a variable LEP equal to 1 if school

          indicated the student had low English proficiencyExceptional Student Status was determined based on the schoolrsquos des-

          ignation of the student as being within an Exceptional Student Educationprogram It was coded as a variable ESE equal to 1 if the school specifiedthe student as belonging to an ESE program The state of Florida specifiesseveral ESE programs one of which is a program for gifted students Forour study it appears this program furnished the large majority of ESE des-ignations as 29 of the 36 students designated as ESE were concentrated in

          13

          two high-achieving classes In grade 2 every ESE-designated student was ina single class

          Eligibility for free or reduced lunch was coded as a variable lunch equalto 1 if the student was eligible

          As described in the Fluency Score Calculation subsection fluency wasevaluated based on research-supported combination of speed and accuracynormalized to reduce skewness via a Box-Cox transformation This meansthat a studentrsquos fluency score depends on personal characteristics such asconfidence sense of urgency on a pen-and-paper test and attention to accu-racy so students differ markedly in potential for improvement

          Pretest accuracy is the ratio of correct digits to the sum of correct andincorrect digits

          Pretest score is defined asradicC minus I + 2 where C is digits correct per

          minute and I is digits incorrect per minutePretest speed is defined in a manner analogous to pretest score

          radicC minus 2

          where C is digits correct per minute In this expression 2 is subtractedrather than added so that the expression is anchored at 1 conforming tobest practices (Osborne 2005)

          All student-level covariates were scaled to be univariate and grand-meancentered for improved interpretability and model convergence

          Speed score and accuracy on the interim administration were consideredduring the regression process used to impute missing data as described inthe Missing Data section These metrics are calculated exactly as for thepretest using the same formula (ie the data was not re-anchored for theBox-Cox transformation)

          An HLM model was used to calculate statistical significance for the entiresample so no adjustment for cluster effects were necessary We only analyzedone outcome for this study so no adjustment was made for multiple outcomes

          27 Students Removed from Study

          Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

          Four of these ten (3 from intervention 1 from comparison) were excludedbecause their teacher indicated they were sufficiently below grade level that

          14

          Table 8 Descriptive Statistics of Control Variables

          Control Variable Mean SD Skew Kurtosis

          Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

          they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

          Five of these ten (all from comparison) were excluded because they didnot stop when time was called on the pretest In three cases these studentshad higher values on their pretest than on their posttest

          One grade 2 student from the Intervention condition was noted as ap-pearing frustrated and not working on the pretest He had the fourth lowestfluency score of all 2nd-grade participants on the pretest and showed dra-matic improvement by the interim assessment on which he scored at the 33rdpercentile within his grade According to Reflexrsquo internal initial testing thestudent had pre-existing automaticity for 171 of the addition facts within20 and had basic recall ability with 599 This suggests his pretest scoreunder-estimated his actual ability and he was removed from the analysis forfear of artificially inflating the impact of the intervention Note that thisstudent was absent from the final administration

          28 Missing Data

          Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

          15

          established by Burns et al (2006)

          Table 9 Categorization of Students

          Fluency (dcmin) Category N

          Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

          All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

          281 Frustration Level

          One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

          282 Instructional Level

          Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

          29 Mastery Level

          There were no students in the mastery level for whom imputation was nec-essary

          16

          3 Study Data

          Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

          The tables in this section report unscaled uncentered values for ease ofinterpretability

          31 Pre-Intervention DatamdashAll Pretest Takers

          This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

          Outcome Data

          Measure Comparison Group Intervention Group

          Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

          Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

          Fluency Score 4 70 444 117 4 70 449 126

          Background Data

          VariableComparison Intervention

          Mean SD Mean SD

          Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

          17

          32 Pre-Intervention DatamdashBaseline Sample

          This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

          Outcome Data

          Measure Comparison Group Intervention Group

          Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

          Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

          Fluency Score 4 64 457 112 4 66 460 120

          Background Data

          VariableComparison Intervention

          Mean SD Mean SD

          Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

          18

          33 Pre-intervention Data Analytic Sample

          Outcome DatamdashAnalytic Sample

          Measure Comparison Group Intervention Group

          Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

          Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

          Fact Fluency 4 64 4573 1121 4 65 4580 1195

          Background DatamdashAnalytic Sample

          VariableComparison Intervention

          Mean SD Mean SD

          Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

          Outcome DatamdashAnalytic Sample with No Imputation

          Measure Comparison Group Intervention Group

          Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

          Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

          Fact Fluency 4 61 4557 1143 4 61 4640 1142

          19

          Background DatamdashAnalytic Sample with No Imputation

          VariableComparison Intervention

          Mean SD Mean SD

          Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

          34 Post-intervention Data and Findings

          341 Analytic Sample

          As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

          Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

          Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

          Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

          Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

          4Nminus9for small

          effect size

          20

          Estimation of Effect SizemdashAnalytic Sample

          Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

          Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

          342 Analytic Sample with No Imputation

          Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

          Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

          Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

          Model Comparison Group Intervention Group Estimated Effect

          Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

          Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

          Estimation of Effect SizemdashAnalytic Sample with No Imputation

          Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

          Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

          21

          35 Subpopulation Analyses

          We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

          Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

          Difference Within-Group SD (adj Hedgesrsquo g) t-score

          Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

          4 Acknowledgment

          We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

          This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

          The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

          22

          References

          Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

          Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

          Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

          Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

          Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

          Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

          Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

          Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

          UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

          VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

          Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

          Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

          23

          Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

          24

          Appendix A Full Model

          The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

          Factor Coefficient t-score

          Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

          25

          Appendix B Demographic Model

          The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

          Factor Coefficient t-score

          Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

          26

          Appendix C Reduced Model

          The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

          This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

          Factor Coefficient t-score

          Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

          27

          9 5 18 9 8 13

          + 8 + 9 minus10 minus 6 + 3 minus 5

          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

          10 2 3 10 12 9

          minus 3 + 7 + 8 + 1 minus10 minus 1

          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

          5 8 3 19 7 16 3

          + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

          3 7 15 0 4 14 7

          minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

          9 20 11 4 9 6 1

          + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

          9 12 12 2 5 9 5

          +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

          4 14 7 11 7 4 6

          + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

          2 12 14 4 10 1 7

          + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

          13 10 3 9 17 10 3

          minus 6 +10 + 6 minus 6 minus 7 +10 + 6

          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

          4 10 10 3 5 5 10

          + 9 + 2 +10 minus 0 + 3 minus 5 minus10

          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

          Appendix B Sample AdditionSubtraction Probe

          • Study Characteristics
            • Intervention Condition
            • Comparison Condition
            • Setting
            • Participants
              • Study Design and Analysis
                • Sample Formation
                • Outcome Measures
                  • Outcomes
                  • Probes
                  • Administrations
                  • Fluency Score Calculation
                    • Validity
                    • Reliability
                    • Analytic Approach
                    • Statistical Adjustments
                    • Students Removed from Study
                    • Missing Data
                      • Frustration Level
                      • Instructional Level
                        • Mastery Level
                          • Study Data
                            • Pre-Intervention DatamdashAll Pretest Takers
                            • Pre-Intervention DatamdashBaseline Sample
                            • Pre-intervention Data Analytic Sample
                            • Post-intervention Data and Findings
                              • Analytic Sample
                              • Analytic Sample with No Imputation
                                • Subpopulation Analyses
                                  • Acknowledgment
                                  • Appendices
                                  • Appendix Full Model
                                  • Appendix Demographic Model
                                  • Appendix Reduced Model

            bull Two giftedhigh-achieving classes participated in the study They wereboth inadvertently randomly assigned to the intervention It was ourintention to split these through block randomization but we only re-ceived the pertinent data after selection and due to a misreading ofthe correspondence failed to catch the error so no re-assignment wasdone The school may have rectified our error themselves

            bull It is possible that one of the teachers simply did not want to use theintervention Project personnel doing the training reported that sheattended but ldquohad to leave early onrdquo

            Given the above we our analyzing our study as a QED where the intactgroups are the 8 classes for whom we have pretest data and the interventiongroup comprises those classes where any uptake occurred prior to posttest

            Teachers were provided the opportunity to indicate any students whowere not prepared for fact fluency instruction Four third-grade studentswere identified 3 from the intervention group and 1 from the comparisongroup These studentsrsquo data were not considered as part of the study

            One of the teachers taught two classes one within the intervention groupand another in the comparison group All other teachers taught a singleclass

            Group Descriptions

            Table 2 provides a description of the demographic character of the groupsas well as their pretest scores results The fluency score on the pretest com-bines both speed and accuracy as described in the Fluency Score Calculationsubsection

            22 Outcome Measures

            221 Outcomes

            One outcome were measured in the study math fact fluency which is botha key component of general math achievement and has been shown to bepredictive of studentsrsquo performance on general math achievement tests (seeValidity subsection below) Fluency was measured using timed probes

            6

            Table 2 Baseline Demographic Information

            Full Sample Comparison Group Intervention Group

            Sample Size 129 64 65Grade 3 Students 481 484 477 Hispanic 542 531 553 Asian 171 188 154 White 225 219 231 Black 47 62 31 Multiracial 16 00 3 Low English Proficiency 202 250 154 Exceptional Student (Gifted) 256 25 261 FreeReduced Lunch 202 219 184 Male 465 531 40age-at-pretest (years) 843 844 841pre-test Accuracy 923 932 914pre-test Speed 429 426 431pre-test Score 458 457 458

            bull Grade 2 students were testing on facts with terms minuend and sub-trahends from 0 to 10 inclusive (ie from 0 + 0 up to 10 + 10 and theirassociated subtraction facts)

            bull Grade 3 students were tested on facts with factors divisors and quo-tients from 0 to 10 inclusive (ie from 0 times 0 to 10 times 10 and theirassociated division facts)

            These match the requirements in the Common Core State Standards ex-cept that owing to that documentrsquos idiosyncratic definition of ldquowithin Xrdquo(as in rdquoaddition within 20rdquo) a literal reading of the work indicates that factssuch as 20 minus 17 and 91 divide 13 are considered within grade level The FloridaMath Standards do not provide a glossary so it is unclear whether such factswould be in the scope of the wording of its standards

            222 Probes

            Probes had a format similar to those in other Curriculum Based Measurement(CBM) studies (Hintze Christ amp Keller 2002 Burns VanDerHeyden amp Jiban2006 Stevens amp Leigh 2012) as described below

            Each probe was a single-sheet of paper with 10 rows of vertically orientedproblems Probes given to grade 2 students contained addition and subtrac-tion facts Probes given to grade 3 students contained multiplication and

            7

            division facts The problems were printed in extra large type so only 7 factsfit on each row The first two rows only contained 6 facts to make room fora geometric shape placed in the upper-righthand corner to help students andmonitors quickly identify which page the students were on The problemswere computer-generated with the constraint that the problems in a givenrow be as balanced as possible between the two operations The facts werechosen randomly from the appropriate fact pool with each having an identicalselection likelihood

            An example is provided in the Appendix

            223 Administrations

            Three administrations were given A pretest administration was conductedon February 12th 2016 An interim administration was conducted on April14th timed to occur before heavy preparation for end-of-year testing beganA final administration was conducted on May 24th Students were told toanswer the items in order and not to skip items The administrator used ascript and was witnessed by the classroom teacher who used a checklist toconfirm each of several key points of instruction This form also providedspace for indicating any unusual occurrences

            The first and second administrations each comprised 4 one-minute factfluency probes Students were instructed that the first probe was a warm-upin each case The final administration did not have a warm-up probe Itcontained 3 math fact probes

            Grade 2 students also took a multi-digit computation probe but the re-sults of that probe were not analyzed as part of this combined report forthird grade students did not take a multidigit probe Multidigit multipli-cationdivision is not a core topic for third grade students in Florida andthe distribution of scores on the multi-digit additionsubtraction probe wereknown to be fundamentally different from the distribution of scores on mathfact probes so there is no clear way to combine the two

            All students in a given grade took the same probes using the same ad-ministrative script regardless of condition The probes that were describedas ldquowarmuprdquo tests were not counted in any analysis

            Five studentsmdashall in comparison classesmdashwere noted by test administra-tors as working on their quizzes significantly beyond the called time limitThese students were not formally considered part of the study Posttestswere taken by these students Three of the five students scored higher on

            8

            their posttest than on their pretest

            224 Fluency Score Calculation

            For each student raw fluency scores were calculated as the average numberof digits correct per min (dcmin) minus the number of digits incorrect permin (dimin) as this was the method found by Stevens amp Leigh (2012) tohave the greatest criterion validity

            Previous CBM researchers have combined grade 2 and grade 3 students(Burns et al 2006) but to justify the pooling of their outcomes in a singleanalysis we conducted an analysis of the distribution of raw pretest scoresfor each grade separately to show similarity of distribution

            Table 3 Raw Fluency Pretest Score Distributions by Grade

            Measure Grade 2 Grade 3

            Mean 2026 2027Standard Deviation 1053 1135Median 19 19Kurtosis 237 191Skewness 117 101Range 5467 5833Optimal Box-Cox (anchored at 1) λ 050 056

            A Kolmogorov-Smirnov corroborated the premise that these two distri-butions were quite similar It failed to reject homogeneity (critical D-statwas 0233 calculated D-stat was 0063 p-value = 099)

            The distribution of these raw scores were significantly skewed and lep-tokurtic as has been reported in similar studies (Burns et al 2006) so wenormalized them using a Box-Cox transformation to arrive at a final fluencyscore Following the recommendation of Osborne (2005) we anchored the fulldistribution at a minimum value of 1 by adding 2 to all raw fluency scoresA search for an optimum λ returned 0525 so we chose λ = 05 for simplicity

            of inversion Thus the calculation for final score isradic

            (C minus I + 2) where Cis the average digits correct per min and I is the average digits incorrect permin The resulting distribution of pretest scores was not significantly skewed(skew = 008 SES 021) but was still slightly leptokurtic (Kurtosis = 085SEK = 042) DAgostino-Pearson (p-value = 013) and Jarque-Barre tests(p-value = 013) failed to reject normality

            9

            23 Validity

            The criterion validity for CBM based measures in elementary math hasbeen established by Stevens amp Leigh (2012) and VanDerHeyden amp Burns(2008) These studies showed math fact fluency was predictive of generalmath achievement on the Oklahoma Core Curriculum test and StanfordAchievement Test respectively

            24 Reliability

            Several researchers have confirmed the reliability of CBM for math fluency

            Table 4 Previous Research on CBM Reliability for Math Fluency

            Metric Scoring Method Source Value

            Inter-scorer Agreement Correct Digits per Minute (Burns et al 2006) 096+Inter-scorer Agreement Correct Digits per Minute (Hintze et al 2002) 0955Inter-scorer Agreement Correct Digits per Minute (Stevens amp Leigh 2012) 099+

            minus Incorrect Digitsper Minute

            Delayed Alternate-form Correct Digits per Minute (Burns et al 2006) 084ReliabilityAbsolute Generalizability Correct Digits per Minute (Hintze et al 2002) 075Relative Generalizability Correct Digits per Minute (Hintze et al 2002) 095Test-Retest Alternate Correct Digits per minute (Stevens amp Leigh 2012) 087Form Reliability minus Incorrect Digits

            per Minute

            Our study gave 3 separate fact probes on the same day allowing us tomeasure internal consistency of raw fluency score (correct digits minus incor-rect digits) using Cronbachrsquos α The α values across the six test are describedin Table 5

            Table 5 Internal Consistency of Raw Fluency Score

            AdditionSubtraction MultiplicationDivision

            Pretest 095 094Interim Test 096 094Posttest 097 095

            10

            We also calculated delayed alternate-form reliability of the final fluencyscore across each grade times condition cohort and found an average value of071

            Table 6 Delayed Alternate-Form Reliability (14 weeks)

            AdditionSubtraction MultiplicationDivision

            Intervention 077 047Comparison 072 089

            The relatively poor value for the 3rd grade intervention group may bedue to large variation in dosage The standard variation in weekly usageacross 3rd grade intervention groups was 124 daysweek nearly twice thatof the 2nd grade intervention group where the standard deviation was 065daysweek

            When dosage was added to the model predicting posttest score frompretest score the agreement between the two intervention groups improvedconsiderably The coefficients of multiple correlation were R = 081 andR = 077 for the 2nd and 3rd grade intervention groups respectively

            25 Analytic Approach

            Since randomized assignment occurred at the class level we used an HLMmodeling approach to account for cluster effects when analyzing the rela-tionship between condition and posttest fluency The model has two levelsmdashgrade and condition are level-2 variables and all other covariates are level-1variables We used grand-mean-centered values for the lower level variablesand a maximum-likelihood method for determining the random effects Ifthe search for a model did not converge using maximum likelihood restrictedmaximum likelihood was used instead

            Models were constructed using Rrsquos lmer function part of the lme4 libraryusing the methodology for two-tier HLM models documented in a technicalreport from the Department of Statistics and Data Sciences The Universityof Texas at Austin (UTA 2015) which showed the similarity in results tothose given by SPSS SAS Mplus and HLM

            We formed 3 models of decreasing complexity and calculated an effectsize and statistical significance based on each

            11

            The first model uses the same structure as that used in the original ver-sion of this report In this model all dichotomous and numeric covariateswere used (ie all covariates other than race which was polynominal) in-cluding the pretest accuracy and pretest speed This model is most inclusiveand allows for continuity between the original version of this report and thecurrent version It is denoted as the Full Model

            For the data available at the time of the original report the pretest speedand pretest accuracy were both highly significant (p lt 0001) But afterremoving students who did not respect the time limits on the pretest orwere designated as being below grade level before the study began theseadditional pretest features were no longer statistically significant A nestedmodel χ-squared test comparing change in deviance to change in degrees offreedom did not show a statistically significant improvement upon addingeither of these terms Thus we generated a new model lacking these twopretest features but retaining all the demographic covariates of the originalThis model is denoted in the sequel as the Demographic Model

            In an effort to simplify the model further we assessed the relevance ofeach of the demographic variables by generating a HLM with the followingcharacteristics

            bull No Level-2 variables

            bull Two Level-1 variables the covariate in question and pretest score

            bull Group-mean-centered values

            bull Data scaled to be univariate

            This method was selected for determining the relevance of a given level-1factor based on Woltman Feldstain MacKay amp Rocchirsquos (2012) presentationThe results are shown in Table 7 Note that this was the only analysis usinggroup-mean centered data The modelrsquos used for determining interventioneffect and statistical significance used grand-mean centered level-1 variables

            The results of this analysis are shown in Table 7 Given their very lowcoefficients and t-scores we removed gender and ESE Upon forming the fullHLM using the remaining covariates it was found that LEP had very littleimpact (coefficient = 003) and low significance (t = 024) so it was droppedas well In the resulting model all covariates had t-scores greater than 09

            12

            Table 7 Impact and significance of demographic covariates

            Covariate Coefficient t-score

            age 0028 0506gender -0005 -0096LEP 0042 0612Lunch 0080 1249ESE -0006 -0100

            in magnitude and standardized coefficients greater than 01 There was anearly statistically significant interaction (t = 194) between condition andwhether the student was on free or reduced lunch

            This final model is denoted as the Reduced ModelAll three models are provided in the AppendixEffect sizes were calculated from the coefficient for the intervention effect

            from each HLM-model and the pooled-within-group standard deviation ofunadjusted post-test scores

            Statistical significance was determined based on the t-score of the multi-level model

            26 Statistical Adjustments

            We used all demographic information provided except race which was non-binary and correlated significantly with other demographic information (Rbetween 036 and 046 for the three most prevalent races in our sample)

            Grade was coded as grade3 a variable equal to 1 if the student was ingrade 3 and 0 otherwise

            Age was measured in years as of the pretest administrationGender was coded as a variable male equal to 1 if the student was male

            and 0 if the student was femaleLow-English proficiency was coded as a variable LEP equal to 1 if school

            indicated the student had low English proficiencyExceptional Student Status was determined based on the schoolrsquos des-

            ignation of the student as being within an Exceptional Student Educationprogram It was coded as a variable ESE equal to 1 if the school specifiedthe student as belonging to an ESE program The state of Florida specifiesseveral ESE programs one of which is a program for gifted students Forour study it appears this program furnished the large majority of ESE des-ignations as 29 of the 36 students designated as ESE were concentrated in

            13

            two high-achieving classes In grade 2 every ESE-designated student was ina single class

            Eligibility for free or reduced lunch was coded as a variable lunch equalto 1 if the student was eligible

            As described in the Fluency Score Calculation subsection fluency wasevaluated based on research-supported combination of speed and accuracynormalized to reduce skewness via a Box-Cox transformation This meansthat a studentrsquos fluency score depends on personal characteristics such asconfidence sense of urgency on a pen-and-paper test and attention to accu-racy so students differ markedly in potential for improvement

            Pretest accuracy is the ratio of correct digits to the sum of correct andincorrect digits

            Pretest score is defined asradicC minus I + 2 where C is digits correct per

            minute and I is digits incorrect per minutePretest speed is defined in a manner analogous to pretest score

            radicC minus 2

            where C is digits correct per minute In this expression 2 is subtractedrather than added so that the expression is anchored at 1 conforming tobest practices (Osborne 2005)

            All student-level covariates were scaled to be univariate and grand-meancentered for improved interpretability and model convergence

            Speed score and accuracy on the interim administration were consideredduring the regression process used to impute missing data as described inthe Missing Data section These metrics are calculated exactly as for thepretest using the same formula (ie the data was not re-anchored for theBox-Cox transformation)

            An HLM model was used to calculate statistical significance for the entiresample so no adjustment for cluster effects were necessary We only analyzedone outcome for this study so no adjustment was made for multiple outcomes

            27 Students Removed from Study

            Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

            Four of these ten (3 from intervention 1 from comparison) were excludedbecause their teacher indicated they were sufficiently below grade level that

            14

            Table 8 Descriptive Statistics of Control Variables

            Control Variable Mean SD Skew Kurtosis

            Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

            they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

            Five of these ten (all from comparison) were excluded because they didnot stop when time was called on the pretest In three cases these studentshad higher values on their pretest than on their posttest

            One grade 2 student from the Intervention condition was noted as ap-pearing frustrated and not working on the pretest He had the fourth lowestfluency score of all 2nd-grade participants on the pretest and showed dra-matic improvement by the interim assessment on which he scored at the 33rdpercentile within his grade According to Reflexrsquo internal initial testing thestudent had pre-existing automaticity for 171 of the addition facts within20 and had basic recall ability with 599 This suggests his pretest scoreunder-estimated his actual ability and he was removed from the analysis forfear of artificially inflating the impact of the intervention Note that thisstudent was absent from the final administration

            28 Missing Data

            Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

            15

            established by Burns et al (2006)

            Table 9 Categorization of Students

            Fluency (dcmin) Category N

            Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

            All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

            281 Frustration Level

            One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

            282 Instructional Level

            Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

            29 Mastery Level

            There were no students in the mastery level for whom imputation was nec-essary

            16

            3 Study Data

            Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

            The tables in this section report unscaled uncentered values for ease ofinterpretability

            31 Pre-Intervention DatamdashAll Pretest Takers

            This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

            Outcome Data

            Measure Comparison Group Intervention Group

            Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

            Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

            Fluency Score 4 70 444 117 4 70 449 126

            Background Data

            VariableComparison Intervention

            Mean SD Mean SD

            Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

            17

            32 Pre-Intervention DatamdashBaseline Sample

            This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

            Outcome Data

            Measure Comparison Group Intervention Group

            Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

            Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

            Fluency Score 4 64 457 112 4 66 460 120

            Background Data

            VariableComparison Intervention

            Mean SD Mean SD

            Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

            18

            33 Pre-intervention Data Analytic Sample

            Outcome DatamdashAnalytic Sample

            Measure Comparison Group Intervention Group

            Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

            Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

            Fact Fluency 4 64 4573 1121 4 65 4580 1195

            Background DatamdashAnalytic Sample

            VariableComparison Intervention

            Mean SD Mean SD

            Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

            Outcome DatamdashAnalytic Sample with No Imputation

            Measure Comparison Group Intervention Group

            Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

            Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

            Fact Fluency 4 61 4557 1143 4 61 4640 1142

            19

            Background DatamdashAnalytic Sample with No Imputation

            VariableComparison Intervention

            Mean SD Mean SD

            Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

            34 Post-intervention Data and Findings

            341 Analytic Sample

            As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

            Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

            Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

            Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

            Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

            4Nminus9for small

            effect size

            20

            Estimation of Effect SizemdashAnalytic Sample

            Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

            Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

            342 Analytic Sample with No Imputation

            Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

            Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

            Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

            Model Comparison Group Intervention Group Estimated Effect

            Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

            Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

            Estimation of Effect SizemdashAnalytic Sample with No Imputation

            Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

            Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

            21

            35 Subpopulation Analyses

            We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

            Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

            Difference Within-Group SD (adj Hedgesrsquo g) t-score

            Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

            4 Acknowledgment

            We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

            This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

            The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

            22

            References

            Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

            Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

            Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

            Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

            Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

            Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

            Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

            Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

            UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

            VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

            Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

            Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

            23

            Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

            24

            Appendix A Full Model

            The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

            Factor Coefficient t-score

            Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

            25

            Appendix B Demographic Model

            The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

            Factor Coefficient t-score

            Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

            26

            Appendix C Reduced Model

            The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

            This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

            Factor Coefficient t-score

            Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

            27

            9 5 18 9 8 13

            + 8 + 9 minus10 minus 6 + 3 minus 5

            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

            10 2 3 10 12 9

            minus 3 + 7 + 8 + 1 minus10 minus 1

            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

            5 8 3 19 7 16 3

            + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

            3 7 15 0 4 14 7

            minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

            9 20 11 4 9 6 1

            + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

            9 12 12 2 5 9 5

            +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

            4 14 7 11 7 4 6

            + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

            2 12 14 4 10 1 7

            + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

            13 10 3 9 17 10 3

            minus 6 +10 + 6 minus 6 minus 7 +10 + 6

            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

            4 10 10 3 5 5 10

            + 9 + 2 +10 minus 0 + 3 minus 5 minus10

            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

            Appendix B Sample AdditionSubtraction Probe

            • Study Characteristics
              • Intervention Condition
              • Comparison Condition
              • Setting
              • Participants
                • Study Design and Analysis
                  • Sample Formation
                  • Outcome Measures
                    • Outcomes
                    • Probes
                    • Administrations
                    • Fluency Score Calculation
                      • Validity
                      • Reliability
                      • Analytic Approach
                      • Statistical Adjustments
                      • Students Removed from Study
                      • Missing Data
                        • Frustration Level
                        • Instructional Level
                          • Mastery Level
                            • Study Data
                              • Pre-Intervention DatamdashAll Pretest Takers
                              • Pre-Intervention DatamdashBaseline Sample
                              • Pre-intervention Data Analytic Sample
                              • Post-intervention Data and Findings
                                • Analytic Sample
                                • Analytic Sample with No Imputation
                                  • Subpopulation Analyses
                                    • Acknowledgment
                                    • Appendices
                                    • Appendix Full Model
                                    • Appendix Demographic Model
                                    • Appendix Reduced Model

              Table 2 Baseline Demographic Information

              Full Sample Comparison Group Intervention Group

              Sample Size 129 64 65Grade 3 Students 481 484 477 Hispanic 542 531 553 Asian 171 188 154 White 225 219 231 Black 47 62 31 Multiracial 16 00 3 Low English Proficiency 202 250 154 Exceptional Student (Gifted) 256 25 261 FreeReduced Lunch 202 219 184 Male 465 531 40age-at-pretest (years) 843 844 841pre-test Accuracy 923 932 914pre-test Speed 429 426 431pre-test Score 458 457 458

              bull Grade 2 students were testing on facts with terms minuend and sub-trahends from 0 to 10 inclusive (ie from 0 + 0 up to 10 + 10 and theirassociated subtraction facts)

              bull Grade 3 students were tested on facts with factors divisors and quo-tients from 0 to 10 inclusive (ie from 0 times 0 to 10 times 10 and theirassociated division facts)

              These match the requirements in the Common Core State Standards ex-cept that owing to that documentrsquos idiosyncratic definition of ldquowithin Xrdquo(as in rdquoaddition within 20rdquo) a literal reading of the work indicates that factssuch as 20 minus 17 and 91 divide 13 are considered within grade level The FloridaMath Standards do not provide a glossary so it is unclear whether such factswould be in the scope of the wording of its standards

              222 Probes

              Probes had a format similar to those in other Curriculum Based Measurement(CBM) studies (Hintze Christ amp Keller 2002 Burns VanDerHeyden amp Jiban2006 Stevens amp Leigh 2012) as described below

              Each probe was a single-sheet of paper with 10 rows of vertically orientedproblems Probes given to grade 2 students contained addition and subtrac-tion facts Probes given to grade 3 students contained multiplication and

              7

              division facts The problems were printed in extra large type so only 7 factsfit on each row The first two rows only contained 6 facts to make room fora geometric shape placed in the upper-righthand corner to help students andmonitors quickly identify which page the students were on The problemswere computer-generated with the constraint that the problems in a givenrow be as balanced as possible between the two operations The facts werechosen randomly from the appropriate fact pool with each having an identicalselection likelihood

              An example is provided in the Appendix

              223 Administrations

              Three administrations were given A pretest administration was conductedon February 12th 2016 An interim administration was conducted on April14th timed to occur before heavy preparation for end-of-year testing beganA final administration was conducted on May 24th Students were told toanswer the items in order and not to skip items The administrator used ascript and was witnessed by the classroom teacher who used a checklist toconfirm each of several key points of instruction This form also providedspace for indicating any unusual occurrences

              The first and second administrations each comprised 4 one-minute factfluency probes Students were instructed that the first probe was a warm-upin each case The final administration did not have a warm-up probe Itcontained 3 math fact probes

              Grade 2 students also took a multi-digit computation probe but the re-sults of that probe were not analyzed as part of this combined report forthird grade students did not take a multidigit probe Multidigit multipli-cationdivision is not a core topic for third grade students in Florida andthe distribution of scores on the multi-digit additionsubtraction probe wereknown to be fundamentally different from the distribution of scores on mathfact probes so there is no clear way to combine the two

              All students in a given grade took the same probes using the same ad-ministrative script regardless of condition The probes that were describedas ldquowarmuprdquo tests were not counted in any analysis

              Five studentsmdashall in comparison classesmdashwere noted by test administra-tors as working on their quizzes significantly beyond the called time limitThese students were not formally considered part of the study Posttestswere taken by these students Three of the five students scored higher on

              8

              their posttest than on their pretest

              224 Fluency Score Calculation

              For each student raw fluency scores were calculated as the average numberof digits correct per min (dcmin) minus the number of digits incorrect permin (dimin) as this was the method found by Stevens amp Leigh (2012) tohave the greatest criterion validity

              Previous CBM researchers have combined grade 2 and grade 3 students(Burns et al 2006) but to justify the pooling of their outcomes in a singleanalysis we conducted an analysis of the distribution of raw pretest scoresfor each grade separately to show similarity of distribution

              Table 3 Raw Fluency Pretest Score Distributions by Grade

              Measure Grade 2 Grade 3

              Mean 2026 2027Standard Deviation 1053 1135Median 19 19Kurtosis 237 191Skewness 117 101Range 5467 5833Optimal Box-Cox (anchored at 1) λ 050 056

              A Kolmogorov-Smirnov corroborated the premise that these two distri-butions were quite similar It failed to reject homogeneity (critical D-statwas 0233 calculated D-stat was 0063 p-value = 099)

              The distribution of these raw scores were significantly skewed and lep-tokurtic as has been reported in similar studies (Burns et al 2006) so wenormalized them using a Box-Cox transformation to arrive at a final fluencyscore Following the recommendation of Osborne (2005) we anchored the fulldistribution at a minimum value of 1 by adding 2 to all raw fluency scoresA search for an optimum λ returned 0525 so we chose λ = 05 for simplicity

              of inversion Thus the calculation for final score isradic

              (C minus I + 2) where Cis the average digits correct per min and I is the average digits incorrect permin The resulting distribution of pretest scores was not significantly skewed(skew = 008 SES 021) but was still slightly leptokurtic (Kurtosis = 085SEK = 042) DAgostino-Pearson (p-value = 013) and Jarque-Barre tests(p-value = 013) failed to reject normality

              9

              23 Validity

              The criterion validity for CBM based measures in elementary math hasbeen established by Stevens amp Leigh (2012) and VanDerHeyden amp Burns(2008) These studies showed math fact fluency was predictive of generalmath achievement on the Oklahoma Core Curriculum test and StanfordAchievement Test respectively

              24 Reliability

              Several researchers have confirmed the reliability of CBM for math fluency

              Table 4 Previous Research on CBM Reliability for Math Fluency

              Metric Scoring Method Source Value

              Inter-scorer Agreement Correct Digits per Minute (Burns et al 2006) 096+Inter-scorer Agreement Correct Digits per Minute (Hintze et al 2002) 0955Inter-scorer Agreement Correct Digits per Minute (Stevens amp Leigh 2012) 099+

              minus Incorrect Digitsper Minute

              Delayed Alternate-form Correct Digits per Minute (Burns et al 2006) 084ReliabilityAbsolute Generalizability Correct Digits per Minute (Hintze et al 2002) 075Relative Generalizability Correct Digits per Minute (Hintze et al 2002) 095Test-Retest Alternate Correct Digits per minute (Stevens amp Leigh 2012) 087Form Reliability minus Incorrect Digits

              per Minute

              Our study gave 3 separate fact probes on the same day allowing us tomeasure internal consistency of raw fluency score (correct digits minus incor-rect digits) using Cronbachrsquos α The α values across the six test are describedin Table 5

              Table 5 Internal Consistency of Raw Fluency Score

              AdditionSubtraction MultiplicationDivision

              Pretest 095 094Interim Test 096 094Posttest 097 095

              10

              We also calculated delayed alternate-form reliability of the final fluencyscore across each grade times condition cohort and found an average value of071

              Table 6 Delayed Alternate-Form Reliability (14 weeks)

              AdditionSubtraction MultiplicationDivision

              Intervention 077 047Comparison 072 089

              The relatively poor value for the 3rd grade intervention group may bedue to large variation in dosage The standard variation in weekly usageacross 3rd grade intervention groups was 124 daysweek nearly twice thatof the 2nd grade intervention group where the standard deviation was 065daysweek

              When dosage was added to the model predicting posttest score frompretest score the agreement between the two intervention groups improvedconsiderably The coefficients of multiple correlation were R = 081 andR = 077 for the 2nd and 3rd grade intervention groups respectively

              25 Analytic Approach

              Since randomized assignment occurred at the class level we used an HLMmodeling approach to account for cluster effects when analyzing the rela-tionship between condition and posttest fluency The model has two levelsmdashgrade and condition are level-2 variables and all other covariates are level-1variables We used grand-mean-centered values for the lower level variablesand a maximum-likelihood method for determining the random effects Ifthe search for a model did not converge using maximum likelihood restrictedmaximum likelihood was used instead

              Models were constructed using Rrsquos lmer function part of the lme4 libraryusing the methodology for two-tier HLM models documented in a technicalreport from the Department of Statistics and Data Sciences The Universityof Texas at Austin (UTA 2015) which showed the similarity in results tothose given by SPSS SAS Mplus and HLM

              We formed 3 models of decreasing complexity and calculated an effectsize and statistical significance based on each

              11

              The first model uses the same structure as that used in the original ver-sion of this report In this model all dichotomous and numeric covariateswere used (ie all covariates other than race which was polynominal) in-cluding the pretest accuracy and pretest speed This model is most inclusiveand allows for continuity between the original version of this report and thecurrent version It is denoted as the Full Model

              For the data available at the time of the original report the pretest speedand pretest accuracy were both highly significant (p lt 0001) But afterremoving students who did not respect the time limits on the pretest orwere designated as being below grade level before the study began theseadditional pretest features were no longer statistically significant A nestedmodel χ-squared test comparing change in deviance to change in degrees offreedom did not show a statistically significant improvement upon addingeither of these terms Thus we generated a new model lacking these twopretest features but retaining all the demographic covariates of the originalThis model is denoted in the sequel as the Demographic Model

              In an effort to simplify the model further we assessed the relevance ofeach of the demographic variables by generating a HLM with the followingcharacteristics

              bull No Level-2 variables

              bull Two Level-1 variables the covariate in question and pretest score

              bull Group-mean-centered values

              bull Data scaled to be univariate

              This method was selected for determining the relevance of a given level-1factor based on Woltman Feldstain MacKay amp Rocchirsquos (2012) presentationThe results are shown in Table 7 Note that this was the only analysis usinggroup-mean centered data The modelrsquos used for determining interventioneffect and statistical significance used grand-mean centered level-1 variables

              The results of this analysis are shown in Table 7 Given their very lowcoefficients and t-scores we removed gender and ESE Upon forming the fullHLM using the remaining covariates it was found that LEP had very littleimpact (coefficient = 003) and low significance (t = 024) so it was droppedas well In the resulting model all covariates had t-scores greater than 09

              12

              Table 7 Impact and significance of demographic covariates

              Covariate Coefficient t-score

              age 0028 0506gender -0005 -0096LEP 0042 0612Lunch 0080 1249ESE -0006 -0100

              in magnitude and standardized coefficients greater than 01 There was anearly statistically significant interaction (t = 194) between condition andwhether the student was on free or reduced lunch

              This final model is denoted as the Reduced ModelAll three models are provided in the AppendixEffect sizes were calculated from the coefficient for the intervention effect

              from each HLM-model and the pooled-within-group standard deviation ofunadjusted post-test scores

              Statistical significance was determined based on the t-score of the multi-level model

              26 Statistical Adjustments

              We used all demographic information provided except race which was non-binary and correlated significantly with other demographic information (Rbetween 036 and 046 for the three most prevalent races in our sample)

              Grade was coded as grade3 a variable equal to 1 if the student was ingrade 3 and 0 otherwise

              Age was measured in years as of the pretest administrationGender was coded as a variable male equal to 1 if the student was male

              and 0 if the student was femaleLow-English proficiency was coded as a variable LEP equal to 1 if school

              indicated the student had low English proficiencyExceptional Student Status was determined based on the schoolrsquos des-

              ignation of the student as being within an Exceptional Student Educationprogram It was coded as a variable ESE equal to 1 if the school specifiedthe student as belonging to an ESE program The state of Florida specifiesseveral ESE programs one of which is a program for gifted students Forour study it appears this program furnished the large majority of ESE des-ignations as 29 of the 36 students designated as ESE were concentrated in

              13

              two high-achieving classes In grade 2 every ESE-designated student was ina single class

              Eligibility for free or reduced lunch was coded as a variable lunch equalto 1 if the student was eligible

              As described in the Fluency Score Calculation subsection fluency wasevaluated based on research-supported combination of speed and accuracynormalized to reduce skewness via a Box-Cox transformation This meansthat a studentrsquos fluency score depends on personal characteristics such asconfidence sense of urgency on a pen-and-paper test and attention to accu-racy so students differ markedly in potential for improvement

              Pretest accuracy is the ratio of correct digits to the sum of correct andincorrect digits

              Pretest score is defined asradicC minus I + 2 where C is digits correct per

              minute and I is digits incorrect per minutePretest speed is defined in a manner analogous to pretest score

              radicC minus 2

              where C is digits correct per minute In this expression 2 is subtractedrather than added so that the expression is anchored at 1 conforming tobest practices (Osborne 2005)

              All student-level covariates were scaled to be univariate and grand-meancentered for improved interpretability and model convergence

              Speed score and accuracy on the interim administration were consideredduring the regression process used to impute missing data as described inthe Missing Data section These metrics are calculated exactly as for thepretest using the same formula (ie the data was not re-anchored for theBox-Cox transformation)

              An HLM model was used to calculate statistical significance for the entiresample so no adjustment for cluster effects were necessary We only analyzedone outcome for this study so no adjustment was made for multiple outcomes

              27 Students Removed from Study

              Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

              Four of these ten (3 from intervention 1 from comparison) were excludedbecause their teacher indicated they were sufficiently below grade level that

              14

              Table 8 Descriptive Statistics of Control Variables

              Control Variable Mean SD Skew Kurtosis

              Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

              they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

              Five of these ten (all from comparison) were excluded because they didnot stop when time was called on the pretest In three cases these studentshad higher values on their pretest than on their posttest

              One grade 2 student from the Intervention condition was noted as ap-pearing frustrated and not working on the pretest He had the fourth lowestfluency score of all 2nd-grade participants on the pretest and showed dra-matic improvement by the interim assessment on which he scored at the 33rdpercentile within his grade According to Reflexrsquo internal initial testing thestudent had pre-existing automaticity for 171 of the addition facts within20 and had basic recall ability with 599 This suggests his pretest scoreunder-estimated his actual ability and he was removed from the analysis forfear of artificially inflating the impact of the intervention Note that thisstudent was absent from the final administration

              28 Missing Data

              Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

              15

              established by Burns et al (2006)

              Table 9 Categorization of Students

              Fluency (dcmin) Category N

              Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

              All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

              281 Frustration Level

              One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

              282 Instructional Level

              Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

              29 Mastery Level

              There were no students in the mastery level for whom imputation was nec-essary

              16

              3 Study Data

              Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

              The tables in this section report unscaled uncentered values for ease ofinterpretability

              31 Pre-Intervention DatamdashAll Pretest Takers

              This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

              Outcome Data

              Measure Comparison Group Intervention Group

              Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

              Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

              Fluency Score 4 70 444 117 4 70 449 126

              Background Data

              VariableComparison Intervention

              Mean SD Mean SD

              Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

              17

              32 Pre-Intervention DatamdashBaseline Sample

              This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

              Outcome Data

              Measure Comparison Group Intervention Group

              Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

              Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

              Fluency Score 4 64 457 112 4 66 460 120

              Background Data

              VariableComparison Intervention

              Mean SD Mean SD

              Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

              18

              33 Pre-intervention Data Analytic Sample

              Outcome DatamdashAnalytic Sample

              Measure Comparison Group Intervention Group

              Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

              Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

              Fact Fluency 4 64 4573 1121 4 65 4580 1195

              Background DatamdashAnalytic Sample

              VariableComparison Intervention

              Mean SD Mean SD

              Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

              Outcome DatamdashAnalytic Sample with No Imputation

              Measure Comparison Group Intervention Group

              Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

              Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

              Fact Fluency 4 61 4557 1143 4 61 4640 1142

              19

              Background DatamdashAnalytic Sample with No Imputation

              VariableComparison Intervention

              Mean SD Mean SD

              Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

              34 Post-intervention Data and Findings

              341 Analytic Sample

              As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

              Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

              Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

              Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

              Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

              4Nminus9for small

              effect size

              20

              Estimation of Effect SizemdashAnalytic Sample

              Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

              Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

              342 Analytic Sample with No Imputation

              Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

              Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

              Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

              Model Comparison Group Intervention Group Estimated Effect

              Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

              Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

              Estimation of Effect SizemdashAnalytic Sample with No Imputation

              Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

              Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

              21

              35 Subpopulation Analyses

              We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

              Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

              Difference Within-Group SD (adj Hedgesrsquo g) t-score

              Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

              4 Acknowledgment

              We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

              This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

              The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

              22

              References

              Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

              Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

              Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

              Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

              Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

              Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

              Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

              Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

              UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

              VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

              Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

              Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

              23

              Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

              24

              Appendix A Full Model

              The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

              Factor Coefficient t-score

              Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

              25

              Appendix B Demographic Model

              The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

              Factor Coefficient t-score

              Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

              26

              Appendix C Reduced Model

              The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

              This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

              Factor Coefficient t-score

              Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

              27

              9 5 18 9 8 13

              + 8 + 9 minus10 minus 6 + 3 minus 5

              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

              10 2 3 10 12 9

              minus 3 + 7 + 8 + 1 minus10 minus 1

              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

              5 8 3 19 7 16 3

              + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

              3 7 15 0 4 14 7

              minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

              9 20 11 4 9 6 1

              + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

              9 12 12 2 5 9 5

              +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

              4 14 7 11 7 4 6

              + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

              2 12 14 4 10 1 7

              + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

              13 10 3 9 17 10 3

              minus 6 +10 + 6 minus 6 minus 7 +10 + 6

              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

              4 10 10 3 5 5 10

              + 9 + 2 +10 minus 0 + 3 minus 5 minus10

              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

              Appendix B Sample AdditionSubtraction Probe

              • Study Characteristics
                • Intervention Condition
                • Comparison Condition
                • Setting
                • Participants
                  • Study Design and Analysis
                    • Sample Formation
                    • Outcome Measures
                      • Outcomes
                      • Probes
                      • Administrations
                      • Fluency Score Calculation
                        • Validity
                        • Reliability
                        • Analytic Approach
                        • Statistical Adjustments
                        • Students Removed from Study
                        • Missing Data
                          • Frustration Level
                          • Instructional Level
                            • Mastery Level
                              • Study Data
                                • Pre-Intervention DatamdashAll Pretest Takers
                                • Pre-Intervention DatamdashBaseline Sample
                                • Pre-intervention Data Analytic Sample
                                • Post-intervention Data and Findings
                                  • Analytic Sample
                                  • Analytic Sample with No Imputation
                                    • Subpopulation Analyses
                                      • Acknowledgment
                                      • Appendices
                                      • Appendix Full Model
                                      • Appendix Demographic Model
                                      • Appendix Reduced Model

                division facts The problems were printed in extra large type so only 7 factsfit on each row The first two rows only contained 6 facts to make room fora geometric shape placed in the upper-righthand corner to help students andmonitors quickly identify which page the students were on The problemswere computer-generated with the constraint that the problems in a givenrow be as balanced as possible between the two operations The facts werechosen randomly from the appropriate fact pool with each having an identicalselection likelihood

                An example is provided in the Appendix

                223 Administrations

                Three administrations were given A pretest administration was conductedon February 12th 2016 An interim administration was conducted on April14th timed to occur before heavy preparation for end-of-year testing beganA final administration was conducted on May 24th Students were told toanswer the items in order and not to skip items The administrator used ascript and was witnessed by the classroom teacher who used a checklist toconfirm each of several key points of instruction This form also providedspace for indicating any unusual occurrences

                The first and second administrations each comprised 4 one-minute factfluency probes Students were instructed that the first probe was a warm-upin each case The final administration did not have a warm-up probe Itcontained 3 math fact probes

                Grade 2 students also took a multi-digit computation probe but the re-sults of that probe were not analyzed as part of this combined report forthird grade students did not take a multidigit probe Multidigit multipli-cationdivision is not a core topic for third grade students in Florida andthe distribution of scores on the multi-digit additionsubtraction probe wereknown to be fundamentally different from the distribution of scores on mathfact probes so there is no clear way to combine the two

                All students in a given grade took the same probes using the same ad-ministrative script regardless of condition The probes that were describedas ldquowarmuprdquo tests were not counted in any analysis

                Five studentsmdashall in comparison classesmdashwere noted by test administra-tors as working on their quizzes significantly beyond the called time limitThese students were not formally considered part of the study Posttestswere taken by these students Three of the five students scored higher on

                8

                their posttest than on their pretest

                224 Fluency Score Calculation

                For each student raw fluency scores were calculated as the average numberof digits correct per min (dcmin) minus the number of digits incorrect permin (dimin) as this was the method found by Stevens amp Leigh (2012) tohave the greatest criterion validity

                Previous CBM researchers have combined grade 2 and grade 3 students(Burns et al 2006) but to justify the pooling of their outcomes in a singleanalysis we conducted an analysis of the distribution of raw pretest scoresfor each grade separately to show similarity of distribution

                Table 3 Raw Fluency Pretest Score Distributions by Grade

                Measure Grade 2 Grade 3

                Mean 2026 2027Standard Deviation 1053 1135Median 19 19Kurtosis 237 191Skewness 117 101Range 5467 5833Optimal Box-Cox (anchored at 1) λ 050 056

                A Kolmogorov-Smirnov corroborated the premise that these two distri-butions were quite similar It failed to reject homogeneity (critical D-statwas 0233 calculated D-stat was 0063 p-value = 099)

                The distribution of these raw scores were significantly skewed and lep-tokurtic as has been reported in similar studies (Burns et al 2006) so wenormalized them using a Box-Cox transformation to arrive at a final fluencyscore Following the recommendation of Osborne (2005) we anchored the fulldistribution at a minimum value of 1 by adding 2 to all raw fluency scoresA search for an optimum λ returned 0525 so we chose λ = 05 for simplicity

                of inversion Thus the calculation for final score isradic

                (C minus I + 2) where Cis the average digits correct per min and I is the average digits incorrect permin The resulting distribution of pretest scores was not significantly skewed(skew = 008 SES 021) but was still slightly leptokurtic (Kurtosis = 085SEK = 042) DAgostino-Pearson (p-value = 013) and Jarque-Barre tests(p-value = 013) failed to reject normality

                9

                23 Validity

                The criterion validity for CBM based measures in elementary math hasbeen established by Stevens amp Leigh (2012) and VanDerHeyden amp Burns(2008) These studies showed math fact fluency was predictive of generalmath achievement on the Oklahoma Core Curriculum test and StanfordAchievement Test respectively

                24 Reliability

                Several researchers have confirmed the reliability of CBM for math fluency

                Table 4 Previous Research on CBM Reliability for Math Fluency

                Metric Scoring Method Source Value

                Inter-scorer Agreement Correct Digits per Minute (Burns et al 2006) 096+Inter-scorer Agreement Correct Digits per Minute (Hintze et al 2002) 0955Inter-scorer Agreement Correct Digits per Minute (Stevens amp Leigh 2012) 099+

                minus Incorrect Digitsper Minute

                Delayed Alternate-form Correct Digits per Minute (Burns et al 2006) 084ReliabilityAbsolute Generalizability Correct Digits per Minute (Hintze et al 2002) 075Relative Generalizability Correct Digits per Minute (Hintze et al 2002) 095Test-Retest Alternate Correct Digits per minute (Stevens amp Leigh 2012) 087Form Reliability minus Incorrect Digits

                per Minute

                Our study gave 3 separate fact probes on the same day allowing us tomeasure internal consistency of raw fluency score (correct digits minus incor-rect digits) using Cronbachrsquos α The α values across the six test are describedin Table 5

                Table 5 Internal Consistency of Raw Fluency Score

                AdditionSubtraction MultiplicationDivision

                Pretest 095 094Interim Test 096 094Posttest 097 095

                10

                We also calculated delayed alternate-form reliability of the final fluencyscore across each grade times condition cohort and found an average value of071

                Table 6 Delayed Alternate-Form Reliability (14 weeks)

                AdditionSubtraction MultiplicationDivision

                Intervention 077 047Comparison 072 089

                The relatively poor value for the 3rd grade intervention group may bedue to large variation in dosage The standard variation in weekly usageacross 3rd grade intervention groups was 124 daysweek nearly twice thatof the 2nd grade intervention group where the standard deviation was 065daysweek

                When dosage was added to the model predicting posttest score frompretest score the agreement between the two intervention groups improvedconsiderably The coefficients of multiple correlation were R = 081 andR = 077 for the 2nd and 3rd grade intervention groups respectively

                25 Analytic Approach

                Since randomized assignment occurred at the class level we used an HLMmodeling approach to account for cluster effects when analyzing the rela-tionship between condition and posttest fluency The model has two levelsmdashgrade and condition are level-2 variables and all other covariates are level-1variables We used grand-mean-centered values for the lower level variablesand a maximum-likelihood method for determining the random effects Ifthe search for a model did not converge using maximum likelihood restrictedmaximum likelihood was used instead

                Models were constructed using Rrsquos lmer function part of the lme4 libraryusing the methodology for two-tier HLM models documented in a technicalreport from the Department of Statistics and Data Sciences The Universityof Texas at Austin (UTA 2015) which showed the similarity in results tothose given by SPSS SAS Mplus and HLM

                We formed 3 models of decreasing complexity and calculated an effectsize and statistical significance based on each

                11

                The first model uses the same structure as that used in the original ver-sion of this report In this model all dichotomous and numeric covariateswere used (ie all covariates other than race which was polynominal) in-cluding the pretest accuracy and pretest speed This model is most inclusiveand allows for continuity between the original version of this report and thecurrent version It is denoted as the Full Model

                For the data available at the time of the original report the pretest speedand pretest accuracy were both highly significant (p lt 0001) But afterremoving students who did not respect the time limits on the pretest orwere designated as being below grade level before the study began theseadditional pretest features were no longer statistically significant A nestedmodel χ-squared test comparing change in deviance to change in degrees offreedom did not show a statistically significant improvement upon addingeither of these terms Thus we generated a new model lacking these twopretest features but retaining all the demographic covariates of the originalThis model is denoted in the sequel as the Demographic Model

                In an effort to simplify the model further we assessed the relevance ofeach of the demographic variables by generating a HLM with the followingcharacteristics

                bull No Level-2 variables

                bull Two Level-1 variables the covariate in question and pretest score

                bull Group-mean-centered values

                bull Data scaled to be univariate

                This method was selected for determining the relevance of a given level-1factor based on Woltman Feldstain MacKay amp Rocchirsquos (2012) presentationThe results are shown in Table 7 Note that this was the only analysis usinggroup-mean centered data The modelrsquos used for determining interventioneffect and statistical significance used grand-mean centered level-1 variables

                The results of this analysis are shown in Table 7 Given their very lowcoefficients and t-scores we removed gender and ESE Upon forming the fullHLM using the remaining covariates it was found that LEP had very littleimpact (coefficient = 003) and low significance (t = 024) so it was droppedas well In the resulting model all covariates had t-scores greater than 09

                12

                Table 7 Impact and significance of demographic covariates

                Covariate Coefficient t-score

                age 0028 0506gender -0005 -0096LEP 0042 0612Lunch 0080 1249ESE -0006 -0100

                in magnitude and standardized coefficients greater than 01 There was anearly statistically significant interaction (t = 194) between condition andwhether the student was on free or reduced lunch

                This final model is denoted as the Reduced ModelAll three models are provided in the AppendixEffect sizes were calculated from the coefficient for the intervention effect

                from each HLM-model and the pooled-within-group standard deviation ofunadjusted post-test scores

                Statistical significance was determined based on the t-score of the multi-level model

                26 Statistical Adjustments

                We used all demographic information provided except race which was non-binary and correlated significantly with other demographic information (Rbetween 036 and 046 for the three most prevalent races in our sample)

                Grade was coded as grade3 a variable equal to 1 if the student was ingrade 3 and 0 otherwise

                Age was measured in years as of the pretest administrationGender was coded as a variable male equal to 1 if the student was male

                and 0 if the student was femaleLow-English proficiency was coded as a variable LEP equal to 1 if school

                indicated the student had low English proficiencyExceptional Student Status was determined based on the schoolrsquos des-

                ignation of the student as being within an Exceptional Student Educationprogram It was coded as a variable ESE equal to 1 if the school specifiedthe student as belonging to an ESE program The state of Florida specifiesseveral ESE programs one of which is a program for gifted students Forour study it appears this program furnished the large majority of ESE des-ignations as 29 of the 36 students designated as ESE were concentrated in

                13

                two high-achieving classes In grade 2 every ESE-designated student was ina single class

                Eligibility for free or reduced lunch was coded as a variable lunch equalto 1 if the student was eligible

                As described in the Fluency Score Calculation subsection fluency wasevaluated based on research-supported combination of speed and accuracynormalized to reduce skewness via a Box-Cox transformation This meansthat a studentrsquos fluency score depends on personal characteristics such asconfidence sense of urgency on a pen-and-paper test and attention to accu-racy so students differ markedly in potential for improvement

                Pretest accuracy is the ratio of correct digits to the sum of correct andincorrect digits

                Pretest score is defined asradicC minus I + 2 where C is digits correct per

                minute and I is digits incorrect per minutePretest speed is defined in a manner analogous to pretest score

                radicC minus 2

                where C is digits correct per minute In this expression 2 is subtractedrather than added so that the expression is anchored at 1 conforming tobest practices (Osborne 2005)

                All student-level covariates were scaled to be univariate and grand-meancentered for improved interpretability and model convergence

                Speed score and accuracy on the interim administration were consideredduring the regression process used to impute missing data as described inthe Missing Data section These metrics are calculated exactly as for thepretest using the same formula (ie the data was not re-anchored for theBox-Cox transformation)

                An HLM model was used to calculate statistical significance for the entiresample so no adjustment for cluster effects were necessary We only analyzedone outcome for this study so no adjustment was made for multiple outcomes

                27 Students Removed from Study

                Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

                Four of these ten (3 from intervention 1 from comparison) were excludedbecause their teacher indicated they were sufficiently below grade level that

                14

                Table 8 Descriptive Statistics of Control Variables

                Control Variable Mean SD Skew Kurtosis

                Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

                they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

                Five of these ten (all from comparison) were excluded because they didnot stop when time was called on the pretest In three cases these studentshad higher values on their pretest than on their posttest

                One grade 2 student from the Intervention condition was noted as ap-pearing frustrated and not working on the pretest He had the fourth lowestfluency score of all 2nd-grade participants on the pretest and showed dra-matic improvement by the interim assessment on which he scored at the 33rdpercentile within his grade According to Reflexrsquo internal initial testing thestudent had pre-existing automaticity for 171 of the addition facts within20 and had basic recall ability with 599 This suggests his pretest scoreunder-estimated his actual ability and he was removed from the analysis forfear of artificially inflating the impact of the intervention Note that thisstudent was absent from the final administration

                28 Missing Data

                Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

                15

                established by Burns et al (2006)

                Table 9 Categorization of Students

                Fluency (dcmin) Category N

                Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

                All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

                281 Frustration Level

                One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

                282 Instructional Level

                Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

                29 Mastery Level

                There were no students in the mastery level for whom imputation was nec-essary

                16

                3 Study Data

                Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

                The tables in this section report unscaled uncentered values for ease ofinterpretability

                31 Pre-Intervention DatamdashAll Pretest Takers

                This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

                Outcome Data

                Measure Comparison Group Intervention Group

                Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                Fluency Score 4 70 444 117 4 70 449 126

                Background Data

                VariableComparison Intervention

                Mean SD Mean SD

                Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

                17

                32 Pre-Intervention DatamdashBaseline Sample

                This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

                Outcome Data

                Measure Comparison Group Intervention Group

                Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                Fluency Score 4 64 457 112 4 66 460 120

                Background Data

                VariableComparison Intervention

                Mean SD Mean SD

                Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

                18

                33 Pre-intervention Data Analytic Sample

                Outcome DatamdashAnalytic Sample

                Measure Comparison Group Intervention Group

                Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                Fact Fluency 4 64 4573 1121 4 65 4580 1195

                Background DatamdashAnalytic Sample

                VariableComparison Intervention

                Mean SD Mean SD

                Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

                Outcome DatamdashAnalytic Sample with No Imputation

                Measure Comparison Group Intervention Group

                Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                Fact Fluency 4 61 4557 1143 4 61 4640 1142

                19

                Background DatamdashAnalytic Sample with No Imputation

                VariableComparison Intervention

                Mean SD Mean SD

                Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

                34 Post-intervention Data and Findings

                341 Analytic Sample

                As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

                Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

                Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

                4Nminus9for small

                effect size

                20

                Estimation of Effect SizemdashAnalytic Sample

                Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                342 Analytic Sample with No Imputation

                Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

                Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

                Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

                Model Comparison Group Intervention Group Estimated Effect

                Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                Estimation of Effect SizemdashAnalytic Sample with No Imputation

                Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                21

                35 Subpopulation Analyses

                We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

                Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

                Difference Within-Group SD (adj Hedgesrsquo g) t-score

                Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                4 Acknowledgment

                We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

                This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

                The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

                22

                References

                Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

                Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

                Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

                Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

                Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

                Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

                Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

                Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

                UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

                VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

                Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

                Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

                23

                Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

                24

                Appendix A Full Model

                The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                Factor Coefficient t-score

                Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                25

                Appendix B Demographic Model

                The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                Factor Coefficient t-score

                Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                26

                Appendix C Reduced Model

                The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                Factor Coefficient t-score

                Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                27

                9 5 18 9 8 13

                + 8 + 9 minus10 minus 6 + 3 minus 5

                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                10 2 3 10 12 9

                minus 3 + 7 + 8 + 1 minus10 minus 1

                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                5 8 3 19 7 16 3

                + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                3 7 15 0 4 14 7

                minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                9 20 11 4 9 6 1

                + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                9 12 12 2 5 9 5

                +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                4 14 7 11 7 4 6

                + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                2 12 14 4 10 1 7

                + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                13 10 3 9 17 10 3

                minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                4 10 10 3 5 5 10

                + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                Appendix B Sample AdditionSubtraction Probe

                • Study Characteristics
                  • Intervention Condition
                  • Comparison Condition
                  • Setting
                  • Participants
                    • Study Design and Analysis
                      • Sample Formation
                      • Outcome Measures
                        • Outcomes
                        • Probes
                        • Administrations
                        • Fluency Score Calculation
                          • Validity
                          • Reliability
                          • Analytic Approach
                          • Statistical Adjustments
                          • Students Removed from Study
                          • Missing Data
                            • Frustration Level
                            • Instructional Level
                              • Mastery Level
                                • Study Data
                                  • Pre-Intervention DatamdashAll Pretest Takers
                                  • Pre-Intervention DatamdashBaseline Sample
                                  • Pre-intervention Data Analytic Sample
                                  • Post-intervention Data and Findings
                                    • Analytic Sample
                                    • Analytic Sample with No Imputation
                                      • Subpopulation Analyses
                                        • Acknowledgment
                                        • Appendices
                                        • Appendix Full Model
                                        • Appendix Demographic Model
                                        • Appendix Reduced Model

                  their posttest than on their pretest

                  224 Fluency Score Calculation

                  For each student raw fluency scores were calculated as the average numberof digits correct per min (dcmin) minus the number of digits incorrect permin (dimin) as this was the method found by Stevens amp Leigh (2012) tohave the greatest criterion validity

                  Previous CBM researchers have combined grade 2 and grade 3 students(Burns et al 2006) but to justify the pooling of their outcomes in a singleanalysis we conducted an analysis of the distribution of raw pretest scoresfor each grade separately to show similarity of distribution

                  Table 3 Raw Fluency Pretest Score Distributions by Grade

                  Measure Grade 2 Grade 3

                  Mean 2026 2027Standard Deviation 1053 1135Median 19 19Kurtosis 237 191Skewness 117 101Range 5467 5833Optimal Box-Cox (anchored at 1) λ 050 056

                  A Kolmogorov-Smirnov corroborated the premise that these two distri-butions were quite similar It failed to reject homogeneity (critical D-statwas 0233 calculated D-stat was 0063 p-value = 099)

                  The distribution of these raw scores were significantly skewed and lep-tokurtic as has been reported in similar studies (Burns et al 2006) so wenormalized them using a Box-Cox transformation to arrive at a final fluencyscore Following the recommendation of Osborne (2005) we anchored the fulldistribution at a minimum value of 1 by adding 2 to all raw fluency scoresA search for an optimum λ returned 0525 so we chose λ = 05 for simplicity

                  of inversion Thus the calculation for final score isradic

                  (C minus I + 2) where Cis the average digits correct per min and I is the average digits incorrect permin The resulting distribution of pretest scores was not significantly skewed(skew = 008 SES 021) but was still slightly leptokurtic (Kurtosis = 085SEK = 042) DAgostino-Pearson (p-value = 013) and Jarque-Barre tests(p-value = 013) failed to reject normality

                  9

                  23 Validity

                  The criterion validity for CBM based measures in elementary math hasbeen established by Stevens amp Leigh (2012) and VanDerHeyden amp Burns(2008) These studies showed math fact fluency was predictive of generalmath achievement on the Oklahoma Core Curriculum test and StanfordAchievement Test respectively

                  24 Reliability

                  Several researchers have confirmed the reliability of CBM for math fluency

                  Table 4 Previous Research on CBM Reliability for Math Fluency

                  Metric Scoring Method Source Value

                  Inter-scorer Agreement Correct Digits per Minute (Burns et al 2006) 096+Inter-scorer Agreement Correct Digits per Minute (Hintze et al 2002) 0955Inter-scorer Agreement Correct Digits per Minute (Stevens amp Leigh 2012) 099+

                  minus Incorrect Digitsper Minute

                  Delayed Alternate-form Correct Digits per Minute (Burns et al 2006) 084ReliabilityAbsolute Generalizability Correct Digits per Minute (Hintze et al 2002) 075Relative Generalizability Correct Digits per Minute (Hintze et al 2002) 095Test-Retest Alternate Correct Digits per minute (Stevens amp Leigh 2012) 087Form Reliability minus Incorrect Digits

                  per Minute

                  Our study gave 3 separate fact probes on the same day allowing us tomeasure internal consistency of raw fluency score (correct digits minus incor-rect digits) using Cronbachrsquos α The α values across the six test are describedin Table 5

                  Table 5 Internal Consistency of Raw Fluency Score

                  AdditionSubtraction MultiplicationDivision

                  Pretest 095 094Interim Test 096 094Posttest 097 095

                  10

                  We also calculated delayed alternate-form reliability of the final fluencyscore across each grade times condition cohort and found an average value of071

                  Table 6 Delayed Alternate-Form Reliability (14 weeks)

                  AdditionSubtraction MultiplicationDivision

                  Intervention 077 047Comparison 072 089

                  The relatively poor value for the 3rd grade intervention group may bedue to large variation in dosage The standard variation in weekly usageacross 3rd grade intervention groups was 124 daysweek nearly twice thatof the 2nd grade intervention group where the standard deviation was 065daysweek

                  When dosage was added to the model predicting posttest score frompretest score the agreement between the two intervention groups improvedconsiderably The coefficients of multiple correlation were R = 081 andR = 077 for the 2nd and 3rd grade intervention groups respectively

                  25 Analytic Approach

                  Since randomized assignment occurred at the class level we used an HLMmodeling approach to account for cluster effects when analyzing the rela-tionship between condition and posttest fluency The model has two levelsmdashgrade and condition are level-2 variables and all other covariates are level-1variables We used grand-mean-centered values for the lower level variablesand a maximum-likelihood method for determining the random effects Ifthe search for a model did not converge using maximum likelihood restrictedmaximum likelihood was used instead

                  Models were constructed using Rrsquos lmer function part of the lme4 libraryusing the methodology for two-tier HLM models documented in a technicalreport from the Department of Statistics and Data Sciences The Universityof Texas at Austin (UTA 2015) which showed the similarity in results tothose given by SPSS SAS Mplus and HLM

                  We formed 3 models of decreasing complexity and calculated an effectsize and statistical significance based on each

                  11

                  The first model uses the same structure as that used in the original ver-sion of this report In this model all dichotomous and numeric covariateswere used (ie all covariates other than race which was polynominal) in-cluding the pretest accuracy and pretest speed This model is most inclusiveand allows for continuity between the original version of this report and thecurrent version It is denoted as the Full Model

                  For the data available at the time of the original report the pretest speedand pretest accuracy were both highly significant (p lt 0001) But afterremoving students who did not respect the time limits on the pretest orwere designated as being below grade level before the study began theseadditional pretest features were no longer statistically significant A nestedmodel χ-squared test comparing change in deviance to change in degrees offreedom did not show a statistically significant improvement upon addingeither of these terms Thus we generated a new model lacking these twopretest features but retaining all the demographic covariates of the originalThis model is denoted in the sequel as the Demographic Model

                  In an effort to simplify the model further we assessed the relevance ofeach of the demographic variables by generating a HLM with the followingcharacteristics

                  bull No Level-2 variables

                  bull Two Level-1 variables the covariate in question and pretest score

                  bull Group-mean-centered values

                  bull Data scaled to be univariate

                  This method was selected for determining the relevance of a given level-1factor based on Woltman Feldstain MacKay amp Rocchirsquos (2012) presentationThe results are shown in Table 7 Note that this was the only analysis usinggroup-mean centered data The modelrsquos used for determining interventioneffect and statistical significance used grand-mean centered level-1 variables

                  The results of this analysis are shown in Table 7 Given their very lowcoefficients and t-scores we removed gender and ESE Upon forming the fullHLM using the remaining covariates it was found that LEP had very littleimpact (coefficient = 003) and low significance (t = 024) so it was droppedas well In the resulting model all covariates had t-scores greater than 09

                  12

                  Table 7 Impact and significance of demographic covariates

                  Covariate Coefficient t-score

                  age 0028 0506gender -0005 -0096LEP 0042 0612Lunch 0080 1249ESE -0006 -0100

                  in magnitude and standardized coefficients greater than 01 There was anearly statistically significant interaction (t = 194) between condition andwhether the student was on free or reduced lunch

                  This final model is denoted as the Reduced ModelAll three models are provided in the AppendixEffect sizes were calculated from the coefficient for the intervention effect

                  from each HLM-model and the pooled-within-group standard deviation ofunadjusted post-test scores

                  Statistical significance was determined based on the t-score of the multi-level model

                  26 Statistical Adjustments

                  We used all demographic information provided except race which was non-binary and correlated significantly with other demographic information (Rbetween 036 and 046 for the three most prevalent races in our sample)

                  Grade was coded as grade3 a variable equal to 1 if the student was ingrade 3 and 0 otherwise

                  Age was measured in years as of the pretest administrationGender was coded as a variable male equal to 1 if the student was male

                  and 0 if the student was femaleLow-English proficiency was coded as a variable LEP equal to 1 if school

                  indicated the student had low English proficiencyExceptional Student Status was determined based on the schoolrsquos des-

                  ignation of the student as being within an Exceptional Student Educationprogram It was coded as a variable ESE equal to 1 if the school specifiedthe student as belonging to an ESE program The state of Florida specifiesseveral ESE programs one of which is a program for gifted students Forour study it appears this program furnished the large majority of ESE des-ignations as 29 of the 36 students designated as ESE were concentrated in

                  13

                  two high-achieving classes In grade 2 every ESE-designated student was ina single class

                  Eligibility for free or reduced lunch was coded as a variable lunch equalto 1 if the student was eligible

                  As described in the Fluency Score Calculation subsection fluency wasevaluated based on research-supported combination of speed and accuracynormalized to reduce skewness via a Box-Cox transformation This meansthat a studentrsquos fluency score depends on personal characteristics such asconfidence sense of urgency on a pen-and-paper test and attention to accu-racy so students differ markedly in potential for improvement

                  Pretest accuracy is the ratio of correct digits to the sum of correct andincorrect digits

                  Pretest score is defined asradicC minus I + 2 where C is digits correct per

                  minute and I is digits incorrect per minutePretest speed is defined in a manner analogous to pretest score

                  radicC minus 2

                  where C is digits correct per minute In this expression 2 is subtractedrather than added so that the expression is anchored at 1 conforming tobest practices (Osborne 2005)

                  All student-level covariates were scaled to be univariate and grand-meancentered for improved interpretability and model convergence

                  Speed score and accuracy on the interim administration were consideredduring the regression process used to impute missing data as described inthe Missing Data section These metrics are calculated exactly as for thepretest using the same formula (ie the data was not re-anchored for theBox-Cox transformation)

                  An HLM model was used to calculate statistical significance for the entiresample so no adjustment for cluster effects were necessary We only analyzedone outcome for this study so no adjustment was made for multiple outcomes

                  27 Students Removed from Study

                  Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

                  Four of these ten (3 from intervention 1 from comparison) were excludedbecause their teacher indicated they were sufficiently below grade level that

                  14

                  Table 8 Descriptive Statistics of Control Variables

                  Control Variable Mean SD Skew Kurtosis

                  Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

                  they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

                  Five of these ten (all from comparison) were excluded because they didnot stop when time was called on the pretest In three cases these studentshad higher values on their pretest than on their posttest

                  One grade 2 student from the Intervention condition was noted as ap-pearing frustrated and not working on the pretest He had the fourth lowestfluency score of all 2nd-grade participants on the pretest and showed dra-matic improvement by the interim assessment on which he scored at the 33rdpercentile within his grade According to Reflexrsquo internal initial testing thestudent had pre-existing automaticity for 171 of the addition facts within20 and had basic recall ability with 599 This suggests his pretest scoreunder-estimated his actual ability and he was removed from the analysis forfear of artificially inflating the impact of the intervention Note that thisstudent was absent from the final administration

                  28 Missing Data

                  Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

                  15

                  established by Burns et al (2006)

                  Table 9 Categorization of Students

                  Fluency (dcmin) Category N

                  Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

                  All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

                  281 Frustration Level

                  One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

                  282 Instructional Level

                  Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

                  29 Mastery Level

                  There were no students in the mastery level for whom imputation was nec-essary

                  16

                  3 Study Data

                  Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

                  The tables in this section report unscaled uncentered values for ease ofinterpretability

                  31 Pre-Intervention DatamdashAll Pretest Takers

                  This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

                  Outcome Data

                  Measure Comparison Group Intervention Group

                  Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                  Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                  Fluency Score 4 70 444 117 4 70 449 126

                  Background Data

                  VariableComparison Intervention

                  Mean SD Mean SD

                  Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

                  17

                  32 Pre-Intervention DatamdashBaseline Sample

                  This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

                  Outcome Data

                  Measure Comparison Group Intervention Group

                  Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                  Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                  Fluency Score 4 64 457 112 4 66 460 120

                  Background Data

                  VariableComparison Intervention

                  Mean SD Mean SD

                  Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

                  18

                  33 Pre-intervention Data Analytic Sample

                  Outcome DatamdashAnalytic Sample

                  Measure Comparison Group Intervention Group

                  Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                  Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                  Fact Fluency 4 64 4573 1121 4 65 4580 1195

                  Background DatamdashAnalytic Sample

                  VariableComparison Intervention

                  Mean SD Mean SD

                  Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

                  Outcome DatamdashAnalytic Sample with No Imputation

                  Measure Comparison Group Intervention Group

                  Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                  Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                  Fact Fluency 4 61 4557 1143 4 61 4640 1142

                  19

                  Background DatamdashAnalytic Sample with No Imputation

                  VariableComparison Intervention

                  Mean SD Mean SD

                  Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

                  34 Post-intervention Data and Findings

                  341 Analytic Sample

                  As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

                  Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

                  Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                  Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                  Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

                  4Nminus9for small

                  effect size

                  20

                  Estimation of Effect SizemdashAnalytic Sample

                  Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                  Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                  342 Analytic Sample with No Imputation

                  Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

                  Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

                  Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

                  Model Comparison Group Intervention Group Estimated Effect

                  Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                  Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                  Estimation of Effect SizemdashAnalytic Sample with No Imputation

                  Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                  Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                  21

                  35 Subpopulation Analyses

                  We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

                  Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

                  Difference Within-Group SD (adj Hedgesrsquo g) t-score

                  Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                  4 Acknowledgment

                  We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

                  This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

                  The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

                  22

                  References

                  Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

                  Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

                  Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

                  Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

                  Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

                  Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

                  Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

                  Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

                  UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

                  VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

                  Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

                  Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

                  23

                  Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

                  24

                  Appendix A Full Model

                  The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                  Factor Coefficient t-score

                  Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                  25

                  Appendix B Demographic Model

                  The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                  Factor Coefficient t-score

                  Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                  26

                  Appendix C Reduced Model

                  The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                  This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                  Factor Coefficient t-score

                  Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                  27

                  9 5 18 9 8 13

                  + 8 + 9 minus10 minus 6 + 3 minus 5

                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                  10 2 3 10 12 9

                  minus 3 + 7 + 8 + 1 minus10 minus 1

                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                  5 8 3 19 7 16 3

                  + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                  3 7 15 0 4 14 7

                  minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                  9 20 11 4 9 6 1

                  + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                  9 12 12 2 5 9 5

                  +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                  4 14 7 11 7 4 6

                  + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                  2 12 14 4 10 1 7

                  + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                  13 10 3 9 17 10 3

                  minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                  4 10 10 3 5 5 10

                  + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                  Appendix B Sample AdditionSubtraction Probe

                  • Study Characteristics
                    • Intervention Condition
                    • Comparison Condition
                    • Setting
                    • Participants
                      • Study Design and Analysis
                        • Sample Formation
                        • Outcome Measures
                          • Outcomes
                          • Probes
                          • Administrations
                          • Fluency Score Calculation
                            • Validity
                            • Reliability
                            • Analytic Approach
                            • Statistical Adjustments
                            • Students Removed from Study
                            • Missing Data
                              • Frustration Level
                              • Instructional Level
                                • Mastery Level
                                  • Study Data
                                    • Pre-Intervention DatamdashAll Pretest Takers
                                    • Pre-Intervention DatamdashBaseline Sample
                                    • Pre-intervention Data Analytic Sample
                                    • Post-intervention Data and Findings
                                      • Analytic Sample
                                      • Analytic Sample with No Imputation
                                        • Subpopulation Analyses
                                          • Acknowledgment
                                          • Appendices
                                          • Appendix Full Model
                                          • Appendix Demographic Model
                                          • Appendix Reduced Model

                    23 Validity

                    The criterion validity for CBM based measures in elementary math hasbeen established by Stevens amp Leigh (2012) and VanDerHeyden amp Burns(2008) These studies showed math fact fluency was predictive of generalmath achievement on the Oklahoma Core Curriculum test and StanfordAchievement Test respectively

                    24 Reliability

                    Several researchers have confirmed the reliability of CBM for math fluency

                    Table 4 Previous Research on CBM Reliability for Math Fluency

                    Metric Scoring Method Source Value

                    Inter-scorer Agreement Correct Digits per Minute (Burns et al 2006) 096+Inter-scorer Agreement Correct Digits per Minute (Hintze et al 2002) 0955Inter-scorer Agreement Correct Digits per Minute (Stevens amp Leigh 2012) 099+

                    minus Incorrect Digitsper Minute

                    Delayed Alternate-form Correct Digits per Minute (Burns et al 2006) 084ReliabilityAbsolute Generalizability Correct Digits per Minute (Hintze et al 2002) 075Relative Generalizability Correct Digits per Minute (Hintze et al 2002) 095Test-Retest Alternate Correct Digits per minute (Stevens amp Leigh 2012) 087Form Reliability minus Incorrect Digits

                    per Minute

                    Our study gave 3 separate fact probes on the same day allowing us tomeasure internal consistency of raw fluency score (correct digits minus incor-rect digits) using Cronbachrsquos α The α values across the six test are describedin Table 5

                    Table 5 Internal Consistency of Raw Fluency Score

                    AdditionSubtraction MultiplicationDivision

                    Pretest 095 094Interim Test 096 094Posttest 097 095

                    10

                    We also calculated delayed alternate-form reliability of the final fluencyscore across each grade times condition cohort and found an average value of071

                    Table 6 Delayed Alternate-Form Reliability (14 weeks)

                    AdditionSubtraction MultiplicationDivision

                    Intervention 077 047Comparison 072 089

                    The relatively poor value for the 3rd grade intervention group may bedue to large variation in dosage The standard variation in weekly usageacross 3rd grade intervention groups was 124 daysweek nearly twice thatof the 2nd grade intervention group where the standard deviation was 065daysweek

                    When dosage was added to the model predicting posttest score frompretest score the agreement between the two intervention groups improvedconsiderably The coefficients of multiple correlation were R = 081 andR = 077 for the 2nd and 3rd grade intervention groups respectively

                    25 Analytic Approach

                    Since randomized assignment occurred at the class level we used an HLMmodeling approach to account for cluster effects when analyzing the rela-tionship between condition and posttest fluency The model has two levelsmdashgrade and condition are level-2 variables and all other covariates are level-1variables We used grand-mean-centered values for the lower level variablesand a maximum-likelihood method for determining the random effects Ifthe search for a model did not converge using maximum likelihood restrictedmaximum likelihood was used instead

                    Models were constructed using Rrsquos lmer function part of the lme4 libraryusing the methodology for two-tier HLM models documented in a technicalreport from the Department of Statistics and Data Sciences The Universityof Texas at Austin (UTA 2015) which showed the similarity in results tothose given by SPSS SAS Mplus and HLM

                    We formed 3 models of decreasing complexity and calculated an effectsize and statistical significance based on each

                    11

                    The first model uses the same structure as that used in the original ver-sion of this report In this model all dichotomous and numeric covariateswere used (ie all covariates other than race which was polynominal) in-cluding the pretest accuracy and pretest speed This model is most inclusiveand allows for continuity between the original version of this report and thecurrent version It is denoted as the Full Model

                    For the data available at the time of the original report the pretest speedand pretest accuracy were both highly significant (p lt 0001) But afterremoving students who did not respect the time limits on the pretest orwere designated as being below grade level before the study began theseadditional pretest features were no longer statistically significant A nestedmodel χ-squared test comparing change in deviance to change in degrees offreedom did not show a statistically significant improvement upon addingeither of these terms Thus we generated a new model lacking these twopretest features but retaining all the demographic covariates of the originalThis model is denoted in the sequel as the Demographic Model

                    In an effort to simplify the model further we assessed the relevance ofeach of the demographic variables by generating a HLM with the followingcharacteristics

                    bull No Level-2 variables

                    bull Two Level-1 variables the covariate in question and pretest score

                    bull Group-mean-centered values

                    bull Data scaled to be univariate

                    This method was selected for determining the relevance of a given level-1factor based on Woltman Feldstain MacKay amp Rocchirsquos (2012) presentationThe results are shown in Table 7 Note that this was the only analysis usinggroup-mean centered data The modelrsquos used for determining interventioneffect and statistical significance used grand-mean centered level-1 variables

                    The results of this analysis are shown in Table 7 Given their very lowcoefficients and t-scores we removed gender and ESE Upon forming the fullHLM using the remaining covariates it was found that LEP had very littleimpact (coefficient = 003) and low significance (t = 024) so it was droppedas well In the resulting model all covariates had t-scores greater than 09

                    12

                    Table 7 Impact and significance of demographic covariates

                    Covariate Coefficient t-score

                    age 0028 0506gender -0005 -0096LEP 0042 0612Lunch 0080 1249ESE -0006 -0100

                    in magnitude and standardized coefficients greater than 01 There was anearly statistically significant interaction (t = 194) between condition andwhether the student was on free or reduced lunch

                    This final model is denoted as the Reduced ModelAll three models are provided in the AppendixEffect sizes were calculated from the coefficient for the intervention effect

                    from each HLM-model and the pooled-within-group standard deviation ofunadjusted post-test scores

                    Statistical significance was determined based on the t-score of the multi-level model

                    26 Statistical Adjustments

                    We used all demographic information provided except race which was non-binary and correlated significantly with other demographic information (Rbetween 036 and 046 for the three most prevalent races in our sample)

                    Grade was coded as grade3 a variable equal to 1 if the student was ingrade 3 and 0 otherwise

                    Age was measured in years as of the pretest administrationGender was coded as a variable male equal to 1 if the student was male

                    and 0 if the student was femaleLow-English proficiency was coded as a variable LEP equal to 1 if school

                    indicated the student had low English proficiencyExceptional Student Status was determined based on the schoolrsquos des-

                    ignation of the student as being within an Exceptional Student Educationprogram It was coded as a variable ESE equal to 1 if the school specifiedthe student as belonging to an ESE program The state of Florida specifiesseveral ESE programs one of which is a program for gifted students Forour study it appears this program furnished the large majority of ESE des-ignations as 29 of the 36 students designated as ESE were concentrated in

                    13

                    two high-achieving classes In grade 2 every ESE-designated student was ina single class

                    Eligibility for free or reduced lunch was coded as a variable lunch equalto 1 if the student was eligible

                    As described in the Fluency Score Calculation subsection fluency wasevaluated based on research-supported combination of speed and accuracynormalized to reduce skewness via a Box-Cox transformation This meansthat a studentrsquos fluency score depends on personal characteristics such asconfidence sense of urgency on a pen-and-paper test and attention to accu-racy so students differ markedly in potential for improvement

                    Pretest accuracy is the ratio of correct digits to the sum of correct andincorrect digits

                    Pretest score is defined asradicC minus I + 2 where C is digits correct per

                    minute and I is digits incorrect per minutePretest speed is defined in a manner analogous to pretest score

                    radicC minus 2

                    where C is digits correct per minute In this expression 2 is subtractedrather than added so that the expression is anchored at 1 conforming tobest practices (Osborne 2005)

                    All student-level covariates were scaled to be univariate and grand-meancentered for improved interpretability and model convergence

                    Speed score and accuracy on the interim administration were consideredduring the regression process used to impute missing data as described inthe Missing Data section These metrics are calculated exactly as for thepretest using the same formula (ie the data was not re-anchored for theBox-Cox transformation)

                    An HLM model was used to calculate statistical significance for the entiresample so no adjustment for cluster effects were necessary We only analyzedone outcome for this study so no adjustment was made for multiple outcomes

                    27 Students Removed from Study

                    Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

                    Four of these ten (3 from intervention 1 from comparison) were excludedbecause their teacher indicated they were sufficiently below grade level that

                    14

                    Table 8 Descriptive Statistics of Control Variables

                    Control Variable Mean SD Skew Kurtosis

                    Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

                    they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

                    Five of these ten (all from comparison) were excluded because they didnot stop when time was called on the pretest In three cases these studentshad higher values on their pretest than on their posttest

                    One grade 2 student from the Intervention condition was noted as ap-pearing frustrated and not working on the pretest He had the fourth lowestfluency score of all 2nd-grade participants on the pretest and showed dra-matic improvement by the interim assessment on which he scored at the 33rdpercentile within his grade According to Reflexrsquo internal initial testing thestudent had pre-existing automaticity for 171 of the addition facts within20 and had basic recall ability with 599 This suggests his pretest scoreunder-estimated his actual ability and he was removed from the analysis forfear of artificially inflating the impact of the intervention Note that thisstudent was absent from the final administration

                    28 Missing Data

                    Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

                    15

                    established by Burns et al (2006)

                    Table 9 Categorization of Students

                    Fluency (dcmin) Category N

                    Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

                    All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

                    281 Frustration Level

                    One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

                    282 Instructional Level

                    Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

                    29 Mastery Level

                    There were no students in the mastery level for whom imputation was nec-essary

                    16

                    3 Study Data

                    Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

                    The tables in this section report unscaled uncentered values for ease ofinterpretability

                    31 Pre-Intervention DatamdashAll Pretest Takers

                    This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

                    Outcome Data

                    Measure Comparison Group Intervention Group

                    Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                    Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                    Fluency Score 4 70 444 117 4 70 449 126

                    Background Data

                    VariableComparison Intervention

                    Mean SD Mean SD

                    Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

                    17

                    32 Pre-Intervention DatamdashBaseline Sample

                    This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

                    Outcome Data

                    Measure Comparison Group Intervention Group

                    Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                    Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                    Fluency Score 4 64 457 112 4 66 460 120

                    Background Data

                    VariableComparison Intervention

                    Mean SD Mean SD

                    Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

                    18

                    33 Pre-intervention Data Analytic Sample

                    Outcome DatamdashAnalytic Sample

                    Measure Comparison Group Intervention Group

                    Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                    Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                    Fact Fluency 4 64 4573 1121 4 65 4580 1195

                    Background DatamdashAnalytic Sample

                    VariableComparison Intervention

                    Mean SD Mean SD

                    Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

                    Outcome DatamdashAnalytic Sample with No Imputation

                    Measure Comparison Group Intervention Group

                    Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                    Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                    Fact Fluency 4 61 4557 1143 4 61 4640 1142

                    19

                    Background DatamdashAnalytic Sample with No Imputation

                    VariableComparison Intervention

                    Mean SD Mean SD

                    Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

                    34 Post-intervention Data and Findings

                    341 Analytic Sample

                    As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

                    Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

                    Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                    Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                    Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

                    4Nminus9for small

                    effect size

                    20

                    Estimation of Effect SizemdashAnalytic Sample

                    Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                    Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                    342 Analytic Sample with No Imputation

                    Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

                    Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

                    Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

                    Model Comparison Group Intervention Group Estimated Effect

                    Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                    Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                    Estimation of Effect SizemdashAnalytic Sample with No Imputation

                    Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                    Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                    21

                    35 Subpopulation Analyses

                    We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

                    Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

                    Difference Within-Group SD (adj Hedgesrsquo g) t-score

                    Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                    4 Acknowledgment

                    We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

                    This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

                    The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

                    22

                    References

                    Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

                    Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

                    Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

                    Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

                    Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

                    Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

                    Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

                    Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

                    UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

                    VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

                    Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

                    Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

                    23

                    Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

                    24

                    Appendix A Full Model

                    The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                    Factor Coefficient t-score

                    Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                    25

                    Appendix B Demographic Model

                    The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                    Factor Coefficient t-score

                    Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                    26

                    Appendix C Reduced Model

                    The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                    This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                    Factor Coefficient t-score

                    Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                    27

                    9 5 18 9 8 13

                    + 8 + 9 minus10 minus 6 + 3 minus 5

                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                    10 2 3 10 12 9

                    minus 3 + 7 + 8 + 1 minus10 minus 1

                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                    5 8 3 19 7 16 3

                    + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                    3 7 15 0 4 14 7

                    minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                    9 20 11 4 9 6 1

                    + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                    9 12 12 2 5 9 5

                    +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                    4 14 7 11 7 4 6

                    + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                    2 12 14 4 10 1 7

                    + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                    13 10 3 9 17 10 3

                    minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                    4 10 10 3 5 5 10

                    + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                    Appendix B Sample AdditionSubtraction Probe

                    • Study Characteristics
                      • Intervention Condition
                      • Comparison Condition
                      • Setting
                      • Participants
                        • Study Design and Analysis
                          • Sample Formation
                          • Outcome Measures
                            • Outcomes
                            • Probes
                            • Administrations
                            • Fluency Score Calculation
                              • Validity
                              • Reliability
                              • Analytic Approach
                              • Statistical Adjustments
                              • Students Removed from Study
                              • Missing Data
                                • Frustration Level
                                • Instructional Level
                                  • Mastery Level
                                    • Study Data
                                      • Pre-Intervention DatamdashAll Pretest Takers
                                      • Pre-Intervention DatamdashBaseline Sample
                                      • Pre-intervention Data Analytic Sample
                                      • Post-intervention Data and Findings
                                        • Analytic Sample
                                        • Analytic Sample with No Imputation
                                          • Subpopulation Analyses
                                            • Acknowledgment
                                            • Appendices
                                            • Appendix Full Model
                                            • Appendix Demographic Model
                                            • Appendix Reduced Model

                      We also calculated delayed alternate-form reliability of the final fluencyscore across each grade times condition cohort and found an average value of071

                      Table 6 Delayed Alternate-Form Reliability (14 weeks)

                      AdditionSubtraction MultiplicationDivision

                      Intervention 077 047Comparison 072 089

                      The relatively poor value for the 3rd grade intervention group may bedue to large variation in dosage The standard variation in weekly usageacross 3rd grade intervention groups was 124 daysweek nearly twice thatof the 2nd grade intervention group where the standard deviation was 065daysweek

                      When dosage was added to the model predicting posttest score frompretest score the agreement between the two intervention groups improvedconsiderably The coefficients of multiple correlation were R = 081 andR = 077 for the 2nd and 3rd grade intervention groups respectively

                      25 Analytic Approach

                      Since randomized assignment occurred at the class level we used an HLMmodeling approach to account for cluster effects when analyzing the rela-tionship between condition and posttest fluency The model has two levelsmdashgrade and condition are level-2 variables and all other covariates are level-1variables We used grand-mean-centered values for the lower level variablesand a maximum-likelihood method for determining the random effects Ifthe search for a model did not converge using maximum likelihood restrictedmaximum likelihood was used instead

                      Models were constructed using Rrsquos lmer function part of the lme4 libraryusing the methodology for two-tier HLM models documented in a technicalreport from the Department of Statistics and Data Sciences The Universityof Texas at Austin (UTA 2015) which showed the similarity in results tothose given by SPSS SAS Mplus and HLM

                      We formed 3 models of decreasing complexity and calculated an effectsize and statistical significance based on each

                      11

                      The first model uses the same structure as that used in the original ver-sion of this report In this model all dichotomous and numeric covariateswere used (ie all covariates other than race which was polynominal) in-cluding the pretest accuracy and pretest speed This model is most inclusiveand allows for continuity between the original version of this report and thecurrent version It is denoted as the Full Model

                      For the data available at the time of the original report the pretest speedand pretest accuracy were both highly significant (p lt 0001) But afterremoving students who did not respect the time limits on the pretest orwere designated as being below grade level before the study began theseadditional pretest features were no longer statistically significant A nestedmodel χ-squared test comparing change in deviance to change in degrees offreedom did not show a statistically significant improvement upon addingeither of these terms Thus we generated a new model lacking these twopretest features but retaining all the demographic covariates of the originalThis model is denoted in the sequel as the Demographic Model

                      In an effort to simplify the model further we assessed the relevance ofeach of the demographic variables by generating a HLM with the followingcharacteristics

                      bull No Level-2 variables

                      bull Two Level-1 variables the covariate in question and pretest score

                      bull Group-mean-centered values

                      bull Data scaled to be univariate

                      This method was selected for determining the relevance of a given level-1factor based on Woltman Feldstain MacKay amp Rocchirsquos (2012) presentationThe results are shown in Table 7 Note that this was the only analysis usinggroup-mean centered data The modelrsquos used for determining interventioneffect and statistical significance used grand-mean centered level-1 variables

                      The results of this analysis are shown in Table 7 Given their very lowcoefficients and t-scores we removed gender and ESE Upon forming the fullHLM using the remaining covariates it was found that LEP had very littleimpact (coefficient = 003) and low significance (t = 024) so it was droppedas well In the resulting model all covariates had t-scores greater than 09

                      12

                      Table 7 Impact and significance of demographic covariates

                      Covariate Coefficient t-score

                      age 0028 0506gender -0005 -0096LEP 0042 0612Lunch 0080 1249ESE -0006 -0100

                      in magnitude and standardized coefficients greater than 01 There was anearly statistically significant interaction (t = 194) between condition andwhether the student was on free or reduced lunch

                      This final model is denoted as the Reduced ModelAll three models are provided in the AppendixEffect sizes were calculated from the coefficient for the intervention effect

                      from each HLM-model and the pooled-within-group standard deviation ofunadjusted post-test scores

                      Statistical significance was determined based on the t-score of the multi-level model

                      26 Statistical Adjustments

                      We used all demographic information provided except race which was non-binary and correlated significantly with other demographic information (Rbetween 036 and 046 for the three most prevalent races in our sample)

                      Grade was coded as grade3 a variable equal to 1 if the student was ingrade 3 and 0 otherwise

                      Age was measured in years as of the pretest administrationGender was coded as a variable male equal to 1 if the student was male

                      and 0 if the student was femaleLow-English proficiency was coded as a variable LEP equal to 1 if school

                      indicated the student had low English proficiencyExceptional Student Status was determined based on the schoolrsquos des-

                      ignation of the student as being within an Exceptional Student Educationprogram It was coded as a variable ESE equal to 1 if the school specifiedthe student as belonging to an ESE program The state of Florida specifiesseveral ESE programs one of which is a program for gifted students Forour study it appears this program furnished the large majority of ESE des-ignations as 29 of the 36 students designated as ESE were concentrated in

                      13

                      two high-achieving classes In grade 2 every ESE-designated student was ina single class

                      Eligibility for free or reduced lunch was coded as a variable lunch equalto 1 if the student was eligible

                      As described in the Fluency Score Calculation subsection fluency wasevaluated based on research-supported combination of speed and accuracynormalized to reduce skewness via a Box-Cox transformation This meansthat a studentrsquos fluency score depends on personal characteristics such asconfidence sense of urgency on a pen-and-paper test and attention to accu-racy so students differ markedly in potential for improvement

                      Pretest accuracy is the ratio of correct digits to the sum of correct andincorrect digits

                      Pretest score is defined asradicC minus I + 2 where C is digits correct per

                      minute and I is digits incorrect per minutePretest speed is defined in a manner analogous to pretest score

                      radicC minus 2

                      where C is digits correct per minute In this expression 2 is subtractedrather than added so that the expression is anchored at 1 conforming tobest practices (Osborne 2005)

                      All student-level covariates were scaled to be univariate and grand-meancentered for improved interpretability and model convergence

                      Speed score and accuracy on the interim administration were consideredduring the regression process used to impute missing data as described inthe Missing Data section These metrics are calculated exactly as for thepretest using the same formula (ie the data was not re-anchored for theBox-Cox transformation)

                      An HLM model was used to calculate statistical significance for the entiresample so no adjustment for cluster effects were necessary We only analyzedone outcome for this study so no adjustment was made for multiple outcomes

                      27 Students Removed from Study

                      Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

                      Four of these ten (3 from intervention 1 from comparison) were excludedbecause their teacher indicated they were sufficiently below grade level that

                      14

                      Table 8 Descriptive Statistics of Control Variables

                      Control Variable Mean SD Skew Kurtosis

                      Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

                      they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

                      Five of these ten (all from comparison) were excluded because they didnot stop when time was called on the pretest In three cases these studentshad higher values on their pretest than on their posttest

                      One grade 2 student from the Intervention condition was noted as ap-pearing frustrated and not working on the pretest He had the fourth lowestfluency score of all 2nd-grade participants on the pretest and showed dra-matic improvement by the interim assessment on which he scored at the 33rdpercentile within his grade According to Reflexrsquo internal initial testing thestudent had pre-existing automaticity for 171 of the addition facts within20 and had basic recall ability with 599 This suggests his pretest scoreunder-estimated his actual ability and he was removed from the analysis forfear of artificially inflating the impact of the intervention Note that thisstudent was absent from the final administration

                      28 Missing Data

                      Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

                      15

                      established by Burns et al (2006)

                      Table 9 Categorization of Students

                      Fluency (dcmin) Category N

                      Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

                      All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

                      281 Frustration Level

                      One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

                      282 Instructional Level

                      Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

                      29 Mastery Level

                      There were no students in the mastery level for whom imputation was nec-essary

                      16

                      3 Study Data

                      Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

                      The tables in this section report unscaled uncentered values for ease ofinterpretability

                      31 Pre-Intervention DatamdashAll Pretest Takers

                      This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

                      Outcome Data

                      Measure Comparison Group Intervention Group

                      Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                      Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                      Fluency Score 4 70 444 117 4 70 449 126

                      Background Data

                      VariableComparison Intervention

                      Mean SD Mean SD

                      Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

                      17

                      32 Pre-Intervention DatamdashBaseline Sample

                      This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

                      Outcome Data

                      Measure Comparison Group Intervention Group

                      Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                      Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                      Fluency Score 4 64 457 112 4 66 460 120

                      Background Data

                      VariableComparison Intervention

                      Mean SD Mean SD

                      Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

                      18

                      33 Pre-intervention Data Analytic Sample

                      Outcome DatamdashAnalytic Sample

                      Measure Comparison Group Intervention Group

                      Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                      Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                      Fact Fluency 4 64 4573 1121 4 65 4580 1195

                      Background DatamdashAnalytic Sample

                      VariableComparison Intervention

                      Mean SD Mean SD

                      Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

                      Outcome DatamdashAnalytic Sample with No Imputation

                      Measure Comparison Group Intervention Group

                      Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                      Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                      Fact Fluency 4 61 4557 1143 4 61 4640 1142

                      19

                      Background DatamdashAnalytic Sample with No Imputation

                      VariableComparison Intervention

                      Mean SD Mean SD

                      Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

                      34 Post-intervention Data and Findings

                      341 Analytic Sample

                      As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

                      Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

                      Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                      Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

                      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                      Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

                      4Nminus9for small

                      effect size

                      20

                      Estimation of Effect SizemdashAnalytic Sample

                      Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                      Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

                      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                      342 Analytic Sample with No Imputation

                      Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

                      Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

                      Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

                      Model Comparison Group Intervention Group Estimated Effect

                      Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                      Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

                      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                      Estimation of Effect SizemdashAnalytic Sample with No Imputation

                      Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                      Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

                      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                      21

                      35 Subpopulation Analyses

                      We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

                      Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

                      Difference Within-Group SD (adj Hedgesrsquo g) t-score

                      Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

                      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                      4 Acknowledgment

                      We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

                      This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

                      The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

                      22

                      References

                      Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

                      Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

                      Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

                      Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

                      Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

                      Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

                      Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

                      Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

                      UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

                      VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

                      Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

                      Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

                      23

                      Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

                      24

                      Appendix A Full Model

                      The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                      Factor Coefficient t-score

                      Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                      25

                      Appendix B Demographic Model

                      The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                      Factor Coefficient t-score

                      Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                      26

                      Appendix C Reduced Model

                      The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                      This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                      Factor Coefficient t-score

                      Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                      27

                      9 5 18 9 8 13

                      + 8 + 9 minus10 minus 6 + 3 minus 5

                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                      10 2 3 10 12 9

                      minus 3 + 7 + 8 + 1 minus10 minus 1

                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                      5 8 3 19 7 16 3

                      + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                      3 7 15 0 4 14 7

                      minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                      9 20 11 4 9 6 1

                      + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                      9 12 12 2 5 9 5

                      +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                      4 14 7 11 7 4 6

                      + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                      2 12 14 4 10 1 7

                      + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                      13 10 3 9 17 10 3

                      minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                      4 10 10 3 5 5 10

                      + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                      Appendix B Sample AdditionSubtraction Probe

                      • Study Characteristics
                        • Intervention Condition
                        • Comparison Condition
                        • Setting
                        • Participants
                          • Study Design and Analysis
                            • Sample Formation
                            • Outcome Measures
                              • Outcomes
                              • Probes
                              • Administrations
                              • Fluency Score Calculation
                                • Validity
                                • Reliability
                                • Analytic Approach
                                • Statistical Adjustments
                                • Students Removed from Study
                                • Missing Data
                                  • Frustration Level
                                  • Instructional Level
                                    • Mastery Level
                                      • Study Data
                                        • Pre-Intervention DatamdashAll Pretest Takers
                                        • Pre-Intervention DatamdashBaseline Sample
                                        • Pre-intervention Data Analytic Sample
                                        • Post-intervention Data and Findings
                                          • Analytic Sample
                                          • Analytic Sample with No Imputation
                                            • Subpopulation Analyses
                                              • Acknowledgment
                                              • Appendices
                                              • Appendix Full Model
                                              • Appendix Demographic Model
                                              • Appendix Reduced Model

                        The first model uses the same structure as that used in the original ver-sion of this report In this model all dichotomous and numeric covariateswere used (ie all covariates other than race which was polynominal) in-cluding the pretest accuracy and pretest speed This model is most inclusiveand allows for continuity between the original version of this report and thecurrent version It is denoted as the Full Model

                        For the data available at the time of the original report the pretest speedand pretest accuracy were both highly significant (p lt 0001) But afterremoving students who did not respect the time limits on the pretest orwere designated as being below grade level before the study began theseadditional pretest features were no longer statistically significant A nestedmodel χ-squared test comparing change in deviance to change in degrees offreedom did not show a statistically significant improvement upon addingeither of these terms Thus we generated a new model lacking these twopretest features but retaining all the demographic covariates of the originalThis model is denoted in the sequel as the Demographic Model

                        In an effort to simplify the model further we assessed the relevance ofeach of the demographic variables by generating a HLM with the followingcharacteristics

                        bull No Level-2 variables

                        bull Two Level-1 variables the covariate in question and pretest score

                        bull Group-mean-centered values

                        bull Data scaled to be univariate

                        This method was selected for determining the relevance of a given level-1factor based on Woltman Feldstain MacKay amp Rocchirsquos (2012) presentationThe results are shown in Table 7 Note that this was the only analysis usinggroup-mean centered data The modelrsquos used for determining interventioneffect and statistical significance used grand-mean centered level-1 variables

                        The results of this analysis are shown in Table 7 Given their very lowcoefficients and t-scores we removed gender and ESE Upon forming the fullHLM using the remaining covariates it was found that LEP had very littleimpact (coefficient = 003) and low significance (t = 024) so it was droppedas well In the resulting model all covariates had t-scores greater than 09

                        12

                        Table 7 Impact and significance of demographic covariates

                        Covariate Coefficient t-score

                        age 0028 0506gender -0005 -0096LEP 0042 0612Lunch 0080 1249ESE -0006 -0100

                        in magnitude and standardized coefficients greater than 01 There was anearly statistically significant interaction (t = 194) between condition andwhether the student was on free or reduced lunch

                        This final model is denoted as the Reduced ModelAll three models are provided in the AppendixEffect sizes were calculated from the coefficient for the intervention effect

                        from each HLM-model and the pooled-within-group standard deviation ofunadjusted post-test scores

                        Statistical significance was determined based on the t-score of the multi-level model

                        26 Statistical Adjustments

                        We used all demographic information provided except race which was non-binary and correlated significantly with other demographic information (Rbetween 036 and 046 for the three most prevalent races in our sample)

                        Grade was coded as grade3 a variable equal to 1 if the student was ingrade 3 and 0 otherwise

                        Age was measured in years as of the pretest administrationGender was coded as a variable male equal to 1 if the student was male

                        and 0 if the student was femaleLow-English proficiency was coded as a variable LEP equal to 1 if school

                        indicated the student had low English proficiencyExceptional Student Status was determined based on the schoolrsquos des-

                        ignation of the student as being within an Exceptional Student Educationprogram It was coded as a variable ESE equal to 1 if the school specifiedthe student as belonging to an ESE program The state of Florida specifiesseveral ESE programs one of which is a program for gifted students Forour study it appears this program furnished the large majority of ESE des-ignations as 29 of the 36 students designated as ESE were concentrated in

                        13

                        two high-achieving classes In grade 2 every ESE-designated student was ina single class

                        Eligibility for free or reduced lunch was coded as a variable lunch equalto 1 if the student was eligible

                        As described in the Fluency Score Calculation subsection fluency wasevaluated based on research-supported combination of speed and accuracynormalized to reduce skewness via a Box-Cox transformation This meansthat a studentrsquos fluency score depends on personal characteristics such asconfidence sense of urgency on a pen-and-paper test and attention to accu-racy so students differ markedly in potential for improvement

                        Pretest accuracy is the ratio of correct digits to the sum of correct andincorrect digits

                        Pretest score is defined asradicC minus I + 2 where C is digits correct per

                        minute and I is digits incorrect per minutePretest speed is defined in a manner analogous to pretest score

                        radicC minus 2

                        where C is digits correct per minute In this expression 2 is subtractedrather than added so that the expression is anchored at 1 conforming tobest practices (Osborne 2005)

                        All student-level covariates were scaled to be univariate and grand-meancentered for improved interpretability and model convergence

                        Speed score and accuracy on the interim administration were consideredduring the regression process used to impute missing data as described inthe Missing Data section These metrics are calculated exactly as for thepretest using the same formula (ie the data was not re-anchored for theBox-Cox transformation)

                        An HLM model was used to calculate statistical significance for the entiresample so no adjustment for cluster effects were necessary We only analyzedone outcome for this study so no adjustment was made for multiple outcomes

                        27 Students Removed from Study

                        Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

                        Four of these ten (3 from intervention 1 from comparison) were excludedbecause their teacher indicated they were sufficiently below grade level that

                        14

                        Table 8 Descriptive Statistics of Control Variables

                        Control Variable Mean SD Skew Kurtosis

                        Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

                        they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

                        Five of these ten (all from comparison) were excluded because they didnot stop when time was called on the pretest In three cases these studentshad higher values on their pretest than on their posttest

                        One grade 2 student from the Intervention condition was noted as ap-pearing frustrated and not working on the pretest He had the fourth lowestfluency score of all 2nd-grade participants on the pretest and showed dra-matic improvement by the interim assessment on which he scored at the 33rdpercentile within his grade According to Reflexrsquo internal initial testing thestudent had pre-existing automaticity for 171 of the addition facts within20 and had basic recall ability with 599 This suggests his pretest scoreunder-estimated his actual ability and he was removed from the analysis forfear of artificially inflating the impact of the intervention Note that thisstudent was absent from the final administration

                        28 Missing Data

                        Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

                        15

                        established by Burns et al (2006)

                        Table 9 Categorization of Students

                        Fluency (dcmin) Category N

                        Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

                        All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

                        281 Frustration Level

                        One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

                        282 Instructional Level

                        Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

                        29 Mastery Level

                        There were no students in the mastery level for whom imputation was nec-essary

                        16

                        3 Study Data

                        Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

                        The tables in this section report unscaled uncentered values for ease ofinterpretability

                        31 Pre-Intervention DatamdashAll Pretest Takers

                        This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

                        Outcome Data

                        Measure Comparison Group Intervention Group

                        Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                        Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                        Fluency Score 4 70 444 117 4 70 449 126

                        Background Data

                        VariableComparison Intervention

                        Mean SD Mean SD

                        Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

                        17

                        32 Pre-Intervention DatamdashBaseline Sample

                        This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

                        Outcome Data

                        Measure Comparison Group Intervention Group

                        Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                        Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                        Fluency Score 4 64 457 112 4 66 460 120

                        Background Data

                        VariableComparison Intervention

                        Mean SD Mean SD

                        Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

                        18

                        33 Pre-intervention Data Analytic Sample

                        Outcome DatamdashAnalytic Sample

                        Measure Comparison Group Intervention Group

                        Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                        Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                        Fact Fluency 4 64 4573 1121 4 65 4580 1195

                        Background DatamdashAnalytic Sample

                        VariableComparison Intervention

                        Mean SD Mean SD

                        Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

                        Outcome DatamdashAnalytic Sample with No Imputation

                        Measure Comparison Group Intervention Group

                        Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                        Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                        Fact Fluency 4 61 4557 1143 4 61 4640 1142

                        19

                        Background DatamdashAnalytic Sample with No Imputation

                        VariableComparison Intervention

                        Mean SD Mean SD

                        Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

                        34 Post-intervention Data and Findings

                        341 Analytic Sample

                        As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

                        Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

                        Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                        Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

                        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                        Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

                        4Nminus9for small

                        effect size

                        20

                        Estimation of Effect SizemdashAnalytic Sample

                        Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                        Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

                        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                        342 Analytic Sample with No Imputation

                        Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

                        Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

                        Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

                        Model Comparison Group Intervention Group Estimated Effect

                        Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                        Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

                        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                        Estimation of Effect SizemdashAnalytic Sample with No Imputation

                        Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                        Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

                        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                        21

                        35 Subpopulation Analyses

                        We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

                        Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

                        Difference Within-Group SD (adj Hedgesrsquo g) t-score

                        Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

                        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                        4 Acknowledgment

                        We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

                        This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

                        The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

                        22

                        References

                        Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

                        Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

                        Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

                        Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

                        Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

                        Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

                        Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

                        Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

                        UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

                        VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

                        Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

                        Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

                        23

                        Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

                        24

                        Appendix A Full Model

                        The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                        Factor Coefficient t-score

                        Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                        25

                        Appendix B Demographic Model

                        The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                        Factor Coefficient t-score

                        Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                        26

                        Appendix C Reduced Model

                        The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                        This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                        Factor Coefficient t-score

                        Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                        27

                        9 5 18 9 8 13

                        + 8 + 9 minus10 minus 6 + 3 minus 5

                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                        10 2 3 10 12 9

                        minus 3 + 7 + 8 + 1 minus10 minus 1

                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                        5 8 3 19 7 16 3

                        + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                        3 7 15 0 4 14 7

                        minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                        9 20 11 4 9 6 1

                        + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                        9 12 12 2 5 9 5

                        +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                        4 14 7 11 7 4 6

                        + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                        2 12 14 4 10 1 7

                        + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                        13 10 3 9 17 10 3

                        minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                        4 10 10 3 5 5 10

                        + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                        Appendix B Sample AdditionSubtraction Probe

                        • Study Characteristics
                          • Intervention Condition
                          • Comparison Condition
                          • Setting
                          • Participants
                            • Study Design and Analysis
                              • Sample Formation
                              • Outcome Measures
                                • Outcomes
                                • Probes
                                • Administrations
                                • Fluency Score Calculation
                                  • Validity
                                  • Reliability
                                  • Analytic Approach
                                  • Statistical Adjustments
                                  • Students Removed from Study
                                  • Missing Data
                                    • Frustration Level
                                    • Instructional Level
                                      • Mastery Level
                                        • Study Data
                                          • Pre-Intervention DatamdashAll Pretest Takers
                                          • Pre-Intervention DatamdashBaseline Sample
                                          • Pre-intervention Data Analytic Sample
                                          • Post-intervention Data and Findings
                                            • Analytic Sample
                                            • Analytic Sample with No Imputation
                                              • Subpopulation Analyses
                                                • Acknowledgment
                                                • Appendices
                                                • Appendix Full Model
                                                • Appendix Demographic Model
                                                • Appendix Reduced Model

                          Table 7 Impact and significance of demographic covariates

                          Covariate Coefficient t-score

                          age 0028 0506gender -0005 -0096LEP 0042 0612Lunch 0080 1249ESE -0006 -0100

                          in magnitude and standardized coefficients greater than 01 There was anearly statistically significant interaction (t = 194) between condition andwhether the student was on free or reduced lunch

                          This final model is denoted as the Reduced ModelAll three models are provided in the AppendixEffect sizes were calculated from the coefficient for the intervention effect

                          from each HLM-model and the pooled-within-group standard deviation ofunadjusted post-test scores

                          Statistical significance was determined based on the t-score of the multi-level model

                          26 Statistical Adjustments

                          We used all demographic information provided except race which was non-binary and correlated significantly with other demographic information (Rbetween 036 and 046 for the three most prevalent races in our sample)

                          Grade was coded as grade3 a variable equal to 1 if the student was ingrade 3 and 0 otherwise

                          Age was measured in years as of the pretest administrationGender was coded as a variable male equal to 1 if the student was male

                          and 0 if the student was femaleLow-English proficiency was coded as a variable LEP equal to 1 if school

                          indicated the student had low English proficiencyExceptional Student Status was determined based on the schoolrsquos des-

                          ignation of the student as being within an Exceptional Student Educationprogram It was coded as a variable ESE equal to 1 if the school specifiedthe student as belonging to an ESE program The state of Florida specifiesseveral ESE programs one of which is a program for gifted students Forour study it appears this program furnished the large majority of ESE des-ignations as 29 of the 36 students designated as ESE were concentrated in

                          13

                          two high-achieving classes In grade 2 every ESE-designated student was ina single class

                          Eligibility for free or reduced lunch was coded as a variable lunch equalto 1 if the student was eligible

                          As described in the Fluency Score Calculation subsection fluency wasevaluated based on research-supported combination of speed and accuracynormalized to reduce skewness via a Box-Cox transformation This meansthat a studentrsquos fluency score depends on personal characteristics such asconfidence sense of urgency on a pen-and-paper test and attention to accu-racy so students differ markedly in potential for improvement

                          Pretest accuracy is the ratio of correct digits to the sum of correct andincorrect digits

                          Pretest score is defined asradicC minus I + 2 where C is digits correct per

                          minute and I is digits incorrect per minutePretest speed is defined in a manner analogous to pretest score

                          radicC minus 2

                          where C is digits correct per minute In this expression 2 is subtractedrather than added so that the expression is anchored at 1 conforming tobest practices (Osborne 2005)

                          All student-level covariates were scaled to be univariate and grand-meancentered for improved interpretability and model convergence

                          Speed score and accuracy on the interim administration were consideredduring the regression process used to impute missing data as described inthe Missing Data section These metrics are calculated exactly as for thepretest using the same formula (ie the data was not re-anchored for theBox-Cox transformation)

                          An HLM model was used to calculate statistical significance for the entiresample so no adjustment for cluster effects were necessary We only analyzedone outcome for this study so no adjustment was made for multiple outcomes

                          27 Students Removed from Study

                          Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

                          Four of these ten (3 from intervention 1 from comparison) were excludedbecause their teacher indicated they were sufficiently below grade level that

                          14

                          Table 8 Descriptive Statistics of Control Variables

                          Control Variable Mean SD Skew Kurtosis

                          Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

                          they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

                          Five of these ten (all from comparison) were excluded because they didnot stop when time was called on the pretest In three cases these studentshad higher values on their pretest than on their posttest

                          One grade 2 student from the Intervention condition was noted as ap-pearing frustrated and not working on the pretest He had the fourth lowestfluency score of all 2nd-grade participants on the pretest and showed dra-matic improvement by the interim assessment on which he scored at the 33rdpercentile within his grade According to Reflexrsquo internal initial testing thestudent had pre-existing automaticity for 171 of the addition facts within20 and had basic recall ability with 599 This suggests his pretest scoreunder-estimated his actual ability and he was removed from the analysis forfear of artificially inflating the impact of the intervention Note that thisstudent was absent from the final administration

                          28 Missing Data

                          Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

                          15

                          established by Burns et al (2006)

                          Table 9 Categorization of Students

                          Fluency (dcmin) Category N

                          Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

                          All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

                          281 Frustration Level

                          One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

                          282 Instructional Level

                          Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

                          29 Mastery Level

                          There were no students in the mastery level for whom imputation was nec-essary

                          16

                          3 Study Data

                          Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

                          The tables in this section report unscaled uncentered values for ease ofinterpretability

                          31 Pre-Intervention DatamdashAll Pretest Takers

                          This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

                          Outcome Data

                          Measure Comparison Group Intervention Group

                          Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                          Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                          Fluency Score 4 70 444 117 4 70 449 126

                          Background Data

                          VariableComparison Intervention

                          Mean SD Mean SD

                          Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

                          17

                          32 Pre-Intervention DatamdashBaseline Sample

                          This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

                          Outcome Data

                          Measure Comparison Group Intervention Group

                          Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                          Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                          Fluency Score 4 64 457 112 4 66 460 120

                          Background Data

                          VariableComparison Intervention

                          Mean SD Mean SD

                          Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

                          18

                          33 Pre-intervention Data Analytic Sample

                          Outcome DatamdashAnalytic Sample

                          Measure Comparison Group Intervention Group

                          Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                          Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                          Fact Fluency 4 64 4573 1121 4 65 4580 1195

                          Background DatamdashAnalytic Sample

                          VariableComparison Intervention

                          Mean SD Mean SD

                          Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

                          Outcome DatamdashAnalytic Sample with No Imputation

                          Measure Comparison Group Intervention Group

                          Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                          Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                          Fact Fluency 4 61 4557 1143 4 61 4640 1142

                          19

                          Background DatamdashAnalytic Sample with No Imputation

                          VariableComparison Intervention

                          Mean SD Mean SD

                          Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

                          34 Post-intervention Data and Findings

                          341 Analytic Sample

                          As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

                          Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

                          Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                          Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

                          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                          Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

                          4Nminus9for small

                          effect size

                          20

                          Estimation of Effect SizemdashAnalytic Sample

                          Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                          Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

                          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                          342 Analytic Sample with No Imputation

                          Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

                          Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

                          Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

                          Model Comparison Group Intervention Group Estimated Effect

                          Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                          Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

                          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                          Estimation of Effect SizemdashAnalytic Sample with No Imputation

                          Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                          Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

                          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                          21

                          35 Subpopulation Analyses

                          We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

                          Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

                          Difference Within-Group SD (adj Hedgesrsquo g) t-score

                          Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

                          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                          4 Acknowledgment

                          We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

                          This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

                          The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

                          22

                          References

                          Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

                          Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

                          Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

                          Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

                          Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

                          Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

                          Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

                          Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

                          UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

                          VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

                          Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

                          Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

                          23

                          Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

                          24

                          Appendix A Full Model

                          The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                          Factor Coefficient t-score

                          Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                          25

                          Appendix B Demographic Model

                          The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                          Factor Coefficient t-score

                          Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                          26

                          Appendix C Reduced Model

                          The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                          This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                          Factor Coefficient t-score

                          Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                          27

                          9 5 18 9 8 13

                          + 8 + 9 minus10 minus 6 + 3 minus 5

                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                          10 2 3 10 12 9

                          minus 3 + 7 + 8 + 1 minus10 minus 1

                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                          5 8 3 19 7 16 3

                          + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                          3 7 15 0 4 14 7

                          minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                          9 20 11 4 9 6 1

                          + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                          9 12 12 2 5 9 5

                          +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                          4 14 7 11 7 4 6

                          + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                          2 12 14 4 10 1 7

                          + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                          13 10 3 9 17 10 3

                          minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                          4 10 10 3 5 5 10

                          + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                          Appendix B Sample AdditionSubtraction Probe

                          • Study Characteristics
                            • Intervention Condition
                            • Comparison Condition
                            • Setting
                            • Participants
                              • Study Design and Analysis
                                • Sample Formation
                                • Outcome Measures
                                  • Outcomes
                                  • Probes
                                  • Administrations
                                  • Fluency Score Calculation
                                    • Validity
                                    • Reliability
                                    • Analytic Approach
                                    • Statistical Adjustments
                                    • Students Removed from Study
                                    • Missing Data
                                      • Frustration Level
                                      • Instructional Level
                                        • Mastery Level
                                          • Study Data
                                            • Pre-Intervention DatamdashAll Pretest Takers
                                            • Pre-Intervention DatamdashBaseline Sample
                                            • Pre-intervention Data Analytic Sample
                                            • Post-intervention Data and Findings
                                              • Analytic Sample
                                              • Analytic Sample with No Imputation
                                                • Subpopulation Analyses
                                                  • Acknowledgment
                                                  • Appendices
                                                  • Appendix Full Model
                                                  • Appendix Demographic Model
                                                  • Appendix Reduced Model

                            two high-achieving classes In grade 2 every ESE-designated student was ina single class

                            Eligibility for free or reduced lunch was coded as a variable lunch equalto 1 if the student was eligible

                            As described in the Fluency Score Calculation subsection fluency wasevaluated based on research-supported combination of speed and accuracynormalized to reduce skewness via a Box-Cox transformation This meansthat a studentrsquos fluency score depends on personal characteristics such asconfidence sense of urgency on a pen-and-paper test and attention to accu-racy so students differ markedly in potential for improvement

                            Pretest accuracy is the ratio of correct digits to the sum of correct andincorrect digits

                            Pretest score is defined asradicC minus I + 2 where C is digits correct per

                            minute and I is digits incorrect per minutePretest speed is defined in a manner analogous to pretest score

                            radicC minus 2

                            where C is digits correct per minute In this expression 2 is subtractedrather than added so that the expression is anchored at 1 conforming tobest practices (Osborne 2005)

                            All student-level covariates were scaled to be univariate and grand-meancentered for improved interpretability and model convergence

                            Speed score and accuracy on the interim administration were consideredduring the regression process used to impute missing data as described inthe Missing Data section These metrics are calculated exactly as for thepretest using the same formula (ie the data was not re-anchored for theBox-Cox transformation)

                            An HLM model was used to calculate statistical significance for the entiresample so no adjustment for cluster effects were necessary We only analyzedone outcome for this study so no adjustment was made for multiple outcomes

                            27 Students Removed from Study

                            Ten students 4 from the intervention condition and 6 from the compari-son condition were excluded from the analysis In all cases the decision toexclude was based on information attained from the day of the pretest

                            Four of these ten (3 from intervention 1 from comparison) were excludedbecause their teacher indicated they were sufficiently below grade level that

                            14

                            Table 8 Descriptive Statistics of Control Variables

                            Control Variable Mean SD Skew Kurtosis

                            Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

                            they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

                            Five of these ten (all from comparison) were excluded because they didnot stop when time was called on the pretest In three cases these studentshad higher values on their pretest than on their posttest

                            One grade 2 student from the Intervention condition was noted as ap-pearing frustrated and not working on the pretest He had the fourth lowestfluency score of all 2nd-grade participants on the pretest and showed dra-matic improvement by the interim assessment on which he scored at the 33rdpercentile within his grade According to Reflexrsquo internal initial testing thestudent had pre-existing automaticity for 171 of the addition facts within20 and had basic recall ability with 599 This suggests his pretest scoreunder-estimated his actual ability and he was removed from the analysis forfear of artificially inflating the impact of the intervention Note that thisstudent was absent from the final administration

                            28 Missing Data

                            Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

                            15

                            established by Burns et al (2006)

                            Table 9 Categorization of Students

                            Fluency (dcmin) Category N

                            Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

                            All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

                            281 Frustration Level

                            One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

                            282 Instructional Level

                            Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

                            29 Mastery Level

                            There were no students in the mastery level for whom imputation was nec-essary

                            16

                            3 Study Data

                            Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

                            The tables in this section report unscaled uncentered values for ease ofinterpretability

                            31 Pre-Intervention DatamdashAll Pretest Takers

                            This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

                            Outcome Data

                            Measure Comparison Group Intervention Group

                            Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                            Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                            Fluency Score 4 70 444 117 4 70 449 126

                            Background Data

                            VariableComparison Intervention

                            Mean SD Mean SD

                            Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

                            17

                            32 Pre-Intervention DatamdashBaseline Sample

                            This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

                            Outcome Data

                            Measure Comparison Group Intervention Group

                            Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                            Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                            Fluency Score 4 64 457 112 4 66 460 120

                            Background Data

                            VariableComparison Intervention

                            Mean SD Mean SD

                            Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

                            18

                            33 Pre-intervention Data Analytic Sample

                            Outcome DatamdashAnalytic Sample

                            Measure Comparison Group Intervention Group

                            Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                            Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                            Fact Fluency 4 64 4573 1121 4 65 4580 1195

                            Background DatamdashAnalytic Sample

                            VariableComparison Intervention

                            Mean SD Mean SD

                            Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

                            Outcome DatamdashAnalytic Sample with No Imputation

                            Measure Comparison Group Intervention Group

                            Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                            Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                            Fact Fluency 4 61 4557 1143 4 61 4640 1142

                            19

                            Background DatamdashAnalytic Sample with No Imputation

                            VariableComparison Intervention

                            Mean SD Mean SD

                            Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

                            34 Post-intervention Data and Findings

                            341 Analytic Sample

                            As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

                            Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

                            Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                            Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

                            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                            Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

                            4Nminus9for small

                            effect size

                            20

                            Estimation of Effect SizemdashAnalytic Sample

                            Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                            Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

                            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                            342 Analytic Sample with No Imputation

                            Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

                            Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

                            Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

                            Model Comparison Group Intervention Group Estimated Effect

                            Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                            Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

                            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                            Estimation of Effect SizemdashAnalytic Sample with No Imputation

                            Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                            Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

                            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                            21

                            35 Subpopulation Analyses

                            We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

                            Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

                            Difference Within-Group SD (adj Hedgesrsquo g) t-score

                            Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

                            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                            4 Acknowledgment

                            We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

                            This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

                            The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

                            22

                            References

                            Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

                            Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

                            Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

                            Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

                            Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

                            Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

                            Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

                            Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

                            UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

                            VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

                            Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

                            Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

                            23

                            Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

                            24

                            Appendix A Full Model

                            The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                            Factor Coefficient t-score

                            Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                            25

                            Appendix B Demographic Model

                            The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                            Factor Coefficient t-score

                            Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                            26

                            Appendix C Reduced Model

                            The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                            This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                            Factor Coefficient t-score

                            Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                            27

                            9 5 18 9 8 13

                            + 8 + 9 minus10 minus 6 + 3 minus 5

                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                            10 2 3 10 12 9

                            minus 3 + 7 + 8 + 1 minus10 minus 1

                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                            5 8 3 19 7 16 3

                            + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                            3 7 15 0 4 14 7

                            minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                            9 20 11 4 9 6 1

                            + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                            9 12 12 2 5 9 5

                            +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                            4 14 7 11 7 4 6

                            + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                            2 12 14 4 10 1 7

                            + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                            13 10 3 9 17 10 3

                            minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                            4 10 10 3 5 5 10

                            + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                            Appendix B Sample AdditionSubtraction Probe

                            • Study Characteristics
                              • Intervention Condition
                              • Comparison Condition
                              • Setting
                              • Participants
                                • Study Design and Analysis
                                  • Sample Formation
                                  • Outcome Measures
                                    • Outcomes
                                    • Probes
                                    • Administrations
                                    • Fluency Score Calculation
                                      • Validity
                                      • Reliability
                                      • Analytic Approach
                                      • Statistical Adjustments
                                      • Students Removed from Study
                                      • Missing Data
                                        • Frustration Level
                                        • Instructional Level
                                          • Mastery Level
                                            • Study Data
                                              • Pre-Intervention DatamdashAll Pretest Takers
                                              • Pre-Intervention DatamdashBaseline Sample
                                              • Pre-intervention Data Analytic Sample
                                              • Post-intervention Data and Findings
                                                • Analytic Sample
                                                • Analytic Sample with No Imputation
                                                  • Subpopulation Analyses
                                                    • Acknowledgment
                                                    • Appendices
                                                    • Appendix Full Model
                                                    • Appendix Demographic Model
                                                    • Appendix Reduced Model

                              Table 8 Descriptive Statistics of Control Variables

                              Control Variable Mean SD Skew Kurtosis

                              Grade3 048 050 008 -203Age 842 057 003 -095Male 047 050 014 -201LEP 020 040 151 027ESE 026 044 113 -073Lunch 020 040 151 027Pretest Accuracy 092 009 -263 823Pretest Speed 429 118 018 098Pretest Score 458 115 008 085Interim Accuracy 094 006 -184 359Interim Speed 507 120 055 042Interim Fluency Score 531 118 046 039

                              they would not receive typical instruction in math fact fluency This deter-mination was provided on the day of the pretest

                              Five of these ten (all from comparison) were excluded because they didnot stop when time was called on the pretest In three cases these studentshad higher values on their pretest than on their posttest

                              One grade 2 student from the Intervention condition was noted as ap-pearing frustrated and not working on the pretest He had the fourth lowestfluency score of all 2nd-grade participants on the pretest and showed dra-matic improvement by the interim assessment on which he scored at the 33rdpercentile within his grade According to Reflexrsquo internal initial testing thestudent had pre-existing automaticity for 171 of the addition facts within20 and had basic recall ability with 599 This suggests his pretest scoreunder-estimated his actual ability and he was removed from the analysis forfear of artificially inflating the impact of the intervention Note that thisstudent was absent from the final administration

                              28 Missing Data

                              Eight students 5 from the treatment group and 3 from the comparison groupwere absent for the final administration Seven of the students had taken theinterim assessment No values were imputed for the student who missedboth the interim and the final assessment For the seven who had attendedthe interim test we imputed posttest values using a multilinear regressionbased on students in the same instructional level group using the threshold

                              15

                              established by Burns et al (2006)

                              Table 9 Categorization of Students

                              Fluency (dcmin) Category N

                              Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

                              All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

                              281 Frustration Level

                              One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

                              282 Instructional Level

                              Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

                              29 Mastery Level

                              There were no students in the mastery level for whom imputation was nec-essary

                              16

                              3 Study Data

                              Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

                              The tables in this section report unscaled uncentered values for ease ofinterpretability

                              31 Pre-Intervention DatamdashAll Pretest Takers

                              This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

                              Outcome Data

                              Measure Comparison Group Intervention Group

                              Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                              Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                              Fluency Score 4 70 444 117 4 70 449 126

                              Background Data

                              VariableComparison Intervention

                              Mean SD Mean SD

                              Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

                              17

                              32 Pre-Intervention DatamdashBaseline Sample

                              This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

                              Outcome Data

                              Measure Comparison Group Intervention Group

                              Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                              Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                              Fluency Score 4 64 457 112 4 66 460 120

                              Background Data

                              VariableComparison Intervention

                              Mean SD Mean SD

                              Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

                              18

                              33 Pre-intervention Data Analytic Sample

                              Outcome DatamdashAnalytic Sample

                              Measure Comparison Group Intervention Group

                              Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                              Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                              Fact Fluency 4 64 4573 1121 4 65 4580 1195

                              Background DatamdashAnalytic Sample

                              VariableComparison Intervention

                              Mean SD Mean SD

                              Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

                              Outcome DatamdashAnalytic Sample with No Imputation

                              Measure Comparison Group Intervention Group

                              Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                              Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                              Fact Fluency 4 61 4557 1143 4 61 4640 1142

                              19

                              Background DatamdashAnalytic Sample with No Imputation

                              VariableComparison Intervention

                              Mean SD Mean SD

                              Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

                              34 Post-intervention Data and Findings

                              341 Analytic Sample

                              As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

                              Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

                              Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                              Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

                              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                              Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

                              4Nminus9for small

                              effect size

                              20

                              Estimation of Effect SizemdashAnalytic Sample

                              Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                              Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

                              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                              342 Analytic Sample with No Imputation

                              Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

                              Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

                              Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

                              Model Comparison Group Intervention Group Estimated Effect

                              Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                              Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

                              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                              Estimation of Effect SizemdashAnalytic Sample with No Imputation

                              Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                              Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

                              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                              21

                              35 Subpopulation Analyses

                              We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

                              Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

                              Difference Within-Group SD (adj Hedgesrsquo g) t-score

                              Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

                              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                              4 Acknowledgment

                              We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

                              This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

                              The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

                              22

                              References

                              Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

                              Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

                              Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

                              Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

                              Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

                              Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

                              Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

                              Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

                              UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

                              VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

                              Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

                              Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

                              23

                              Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

                              24

                              Appendix A Full Model

                              The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                              Factor Coefficient t-score

                              Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                              25

                              Appendix B Demographic Model

                              The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                              Factor Coefficient t-score

                              Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                              26

                              Appendix C Reduced Model

                              The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                              This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                              Factor Coefficient t-score

                              Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                              27

                              9 5 18 9 8 13

                              + 8 + 9 minus10 minus 6 + 3 minus 5

                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                              10 2 3 10 12 9

                              minus 3 + 7 + 8 + 1 minus10 minus 1

                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                              5 8 3 19 7 16 3

                              + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                              3 7 15 0 4 14 7

                              minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                              9 20 11 4 9 6 1

                              + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                              9 12 12 2 5 9 5

                              +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                              4 14 7 11 7 4 6

                              + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                              2 12 14 4 10 1 7

                              + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                              13 10 3 9 17 10 3

                              minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                              4 10 10 3 5 5 10

                              + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                              Appendix B Sample AdditionSubtraction Probe

                              • Study Characteristics
                                • Intervention Condition
                                • Comparison Condition
                                • Setting
                                • Participants
                                  • Study Design and Analysis
                                    • Sample Formation
                                    • Outcome Measures
                                      • Outcomes
                                      • Probes
                                      • Administrations
                                      • Fluency Score Calculation
                                        • Validity
                                        • Reliability
                                        • Analytic Approach
                                        • Statistical Adjustments
                                        • Students Removed from Study
                                        • Missing Data
                                          • Frustration Level
                                          • Instructional Level
                                            • Mastery Level
                                              • Study Data
                                                • Pre-Intervention DatamdashAll Pretest Takers
                                                • Pre-Intervention DatamdashBaseline Sample
                                                • Pre-intervention Data Analytic Sample
                                                • Post-intervention Data and Findings
                                                  • Analytic Sample
                                                  • Analytic Sample with No Imputation
                                                    • Subpopulation Analyses
                                                      • Acknowledgment
                                                      • Appendices
                                                      • Appendix Full Model
                                                      • Appendix Demographic Model
                                                      • Appendix Reduced Model

                                established by Burns et al (2006)

                                Table 9 Categorization of Students

                                Fluency (dcmin) Category N

                                Less than 14 Frustration Level 29 (22)14-31 Instructional Level 81 (63)Greater than 31 Mastery Level 19 (15)

                                All available data (demographic data pretest data and interim test data)were used to impute posttest scores using a OLS regression that retained onlystatistically significant regressors

                                281 Frustration Level

                                One of the seven students for whom posttest scores were imputed was in thefrustration level For that group age (t = 26) pretest accuracy (t = minus32)and interim fluency score (t = 62) were the statistically significant regressors

                                282 Instructional Level

                                Six of the seven students for whom posttest scores were imputed were in theinstructional level Among students in that level grade (t = 42) interimaccuracy (t = minus26) and interim fluency score (t = 95) were statisticallysignificant

                                29 Mastery Level

                                There were no students in the mastery level for whom imputation was nec-essary

                                16

                                3 Study Data

                                Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

                                The tables in this section report unscaled uncentered values for ease ofinterpretability

                                31 Pre-Intervention DatamdashAll Pretest Takers

                                This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

                                Outcome Data

                                Measure Comparison Group Intervention Group

                                Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                                Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                                Fluency Score 4 70 444 117 4 70 449 126

                                Background Data

                                VariableComparison Intervention

                                Mean SD Mean SD

                                Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

                                17

                                32 Pre-Intervention DatamdashBaseline Sample

                                This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

                                Outcome Data

                                Measure Comparison Group Intervention Group

                                Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                                Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                                Fluency Score 4 64 457 112 4 66 460 120

                                Background Data

                                VariableComparison Intervention

                                Mean SD Mean SD

                                Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

                                18

                                33 Pre-intervention Data Analytic Sample

                                Outcome DatamdashAnalytic Sample

                                Measure Comparison Group Intervention Group

                                Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                                Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                                Fact Fluency 4 64 4573 1121 4 65 4580 1195

                                Background DatamdashAnalytic Sample

                                VariableComparison Intervention

                                Mean SD Mean SD

                                Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

                                Outcome DatamdashAnalytic Sample with No Imputation

                                Measure Comparison Group Intervention Group

                                Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                                Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                                Fact Fluency 4 61 4557 1143 4 61 4640 1142

                                19

                                Background DatamdashAnalytic Sample with No Imputation

                                VariableComparison Intervention

                                Mean SD Mean SD

                                Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

                                34 Post-intervention Data and Findings

                                341 Analytic Sample

                                As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

                                Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

                                Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                                Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

                                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

                                4Nminus9for small

                                effect size

                                20

                                Estimation of Effect SizemdashAnalytic Sample

                                Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                                Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

                                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                342 Analytic Sample with No Imputation

                                Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

                                Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

                                Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

                                Model Comparison Group Intervention Group Estimated Effect

                                Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                                Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

                                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                Estimation of Effect SizemdashAnalytic Sample with No Imputation

                                Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                                Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

                                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                21

                                35 Subpopulation Analyses

                                We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

                                Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

                                Difference Within-Group SD (adj Hedgesrsquo g) t-score

                                Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

                                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                4 Acknowledgment

                                We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

                                This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

                                The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

                                22

                                References

                                Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

                                Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

                                Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

                                Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

                                Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

                                Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

                                Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

                                Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

                                UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

                                VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

                                Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

                                Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

                                23

                                Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

                                24

                                Appendix A Full Model

                                The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                                Factor Coefficient t-score

                                Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                25

                                Appendix B Demographic Model

                                The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                                Factor Coefficient t-score

                                Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                26

                                Appendix C Reduced Model

                                The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                                This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                                Factor Coefficient t-score

                                Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                27

                                9 5 18 9 8 13

                                + 8 + 9 minus10 minus 6 + 3 minus 5

                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                10 2 3 10 12 9

                                minus 3 + 7 + 8 + 1 minus10 minus 1

                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                5 8 3 19 7 16 3

                                + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                3 7 15 0 4 14 7

                                minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                9 20 11 4 9 6 1

                                + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                9 12 12 2 5 9 5

                                +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                4 14 7 11 7 4 6

                                + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                2 12 14 4 10 1 7

                                + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                13 10 3 9 17 10 3

                                minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                4 10 10 3 5 5 10

                                + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                Appendix B Sample AdditionSubtraction Probe

                                • Study Characteristics
                                  • Intervention Condition
                                  • Comparison Condition
                                  • Setting
                                  • Participants
                                    • Study Design and Analysis
                                      • Sample Formation
                                      • Outcome Measures
                                        • Outcomes
                                        • Probes
                                        • Administrations
                                        • Fluency Score Calculation
                                          • Validity
                                          • Reliability
                                          • Analytic Approach
                                          • Statistical Adjustments
                                          • Students Removed from Study
                                          • Missing Data
                                            • Frustration Level
                                            • Instructional Level
                                              • Mastery Level
                                                • Study Data
                                                  • Pre-Intervention DatamdashAll Pretest Takers
                                                  • Pre-Intervention DatamdashBaseline Sample
                                                  • Pre-intervention Data Analytic Sample
                                                  • Post-intervention Data and Findings
                                                    • Analytic Sample
                                                    • Analytic Sample with No Imputation
                                                      • Subpopulation Analyses
                                                        • Acknowledgment
                                                        • Appendices
                                                        • Appendix Full Model
                                                        • Appendix Demographic Model
                                                        • Appendix Reduced Model

                                  3 Study Data

                                  Tables compose the large majority of this section They are organized bytable title and subsection title rather than by use of numbers

                                  The tables in this section report unscaled uncentered values for ease ofinterpretability

                                  31 Pre-Intervention DatamdashAll Pretest Takers

                                  This section provides data on all students who took the pretest includingthose that were formally removed from the analysis

                                  Outcome Data

                                  Measure Comparison Group Intervention Group

                                  Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                                  Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                                  Fluency Score 4 70 444 117 4 70 449 126

                                  Background Data

                                  VariableComparison Intervention

                                  Mean SD Mean SD

                                  Age 8440 0598 8442 0602ESE 0271 0448 0243 0432Male 0500 0504 0400 0493Grade3 0486 0503 0486 0503LEP 0243 0432 0171 0380Lunch 0229 0423 0186 0392Pretest accuracy 0919 0098 0909 0113Pretest speed 4145 1186 4220 1257

                                  17

                                  32 Pre-Intervention DatamdashBaseline Sample

                                  This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

                                  Outcome Data

                                  Measure Comparison Group Intervention Group

                                  Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                                  Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                                  Fluency Score 4 64 457 112 4 66 460 120

                                  Background Data

                                  VariableComparison Intervention

                                  Mean SD Mean SD

                                  Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

                                  18

                                  33 Pre-intervention Data Analytic Sample

                                  Outcome DatamdashAnalytic Sample

                                  Measure Comparison Group Intervention Group

                                  Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                                  Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                                  Fact Fluency 4 64 4573 1121 4 65 4580 1195

                                  Background DatamdashAnalytic Sample

                                  VariableComparison Intervention

                                  Mean SD Mean SD

                                  Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

                                  Outcome DatamdashAnalytic Sample with No Imputation

                                  Measure Comparison Group Intervention Group

                                  Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                                  Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                                  Fact Fluency 4 61 4557 1143 4 61 4640 1142

                                  19

                                  Background DatamdashAnalytic Sample with No Imputation

                                  VariableComparison Intervention

                                  Mean SD Mean SD

                                  Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

                                  34 Post-intervention Data and Findings

                                  341 Analytic Sample

                                  As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

                                  Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

                                  Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                                  Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

                                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                  Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

                                  4Nminus9for small

                                  effect size

                                  20

                                  Estimation of Effect SizemdashAnalytic Sample

                                  Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                                  Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

                                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                  342 Analytic Sample with No Imputation

                                  Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

                                  Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

                                  Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

                                  Model Comparison Group Intervention Group Estimated Effect

                                  Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                                  Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

                                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                  Estimation of Effect SizemdashAnalytic Sample with No Imputation

                                  Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                                  Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

                                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                  21

                                  35 Subpopulation Analyses

                                  We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

                                  Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

                                  Difference Within-Group SD (adj Hedgesrsquo g) t-score

                                  Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

                                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                  4 Acknowledgment

                                  We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

                                  This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

                                  The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

                                  22

                                  References

                                  Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

                                  Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

                                  Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

                                  Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

                                  Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

                                  Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

                                  Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

                                  Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

                                  UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

                                  VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

                                  Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

                                  Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

                                  23

                                  Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

                                  24

                                  Appendix A Full Model

                                  The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                                  Factor Coefficient t-score

                                  Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                  25

                                  Appendix B Demographic Model

                                  The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                                  Factor Coefficient t-score

                                  Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                  26

                                  Appendix C Reduced Model

                                  The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                                  This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                                  Factor Coefficient t-score

                                  Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                  27

                                  9 5 18 9 8 13

                                  + 8 + 9 minus10 minus 6 + 3 minus 5

                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                  10 2 3 10 12 9

                                  minus 3 + 7 + 8 + 1 minus10 minus 1

                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                  5 8 3 19 7 16 3

                                  + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                  3 7 15 0 4 14 7

                                  minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                  9 20 11 4 9 6 1

                                  + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                  9 12 12 2 5 9 5

                                  +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                  4 14 7 11 7 4 6

                                  + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                  2 12 14 4 10 1 7

                                  + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                  13 10 3 9 17 10 3

                                  minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                  4 10 10 3 5 5 10

                                  + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                  Appendix B Sample AdditionSubtraction Probe

                                  • Study Characteristics
                                    • Intervention Condition
                                    • Comparison Condition
                                    • Setting
                                    • Participants
                                      • Study Design and Analysis
                                        • Sample Formation
                                        • Outcome Measures
                                          • Outcomes
                                          • Probes
                                          • Administrations
                                          • Fluency Score Calculation
                                            • Validity
                                            • Reliability
                                            • Analytic Approach
                                            • Statistical Adjustments
                                            • Students Removed from Study
                                            • Missing Data
                                              • Frustration Level
                                              • Instructional Level
                                                • Mastery Level
                                                  • Study Data
                                                    • Pre-Intervention DatamdashAll Pretest Takers
                                                    • Pre-Intervention DatamdashBaseline Sample
                                                    • Pre-intervention Data Analytic Sample
                                                    • Post-intervention Data and Findings
                                                      • Analytic Sample
                                                      • Analytic Sample with No Imputation
                                                        • Subpopulation Analyses
                                                          • Acknowledgment
                                                          • Appendices
                                                          • Appendix Full Model
                                                          • Appendix Demographic Model
                                                          • Appendix Reduced Model

                                    32 Pre-Intervention DatamdashBaseline Sample

                                    This section includes all students who were formally part of the analysisincluding those who were absent for the posttest

                                    Outcome Data

                                    Measure Comparison Group Intervention Group

                                    Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                                    Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                                    Fluency Score 4 64 457 112 4 66 460 120

                                    Background Data

                                    VariableComparison Intervention

                                    Mean SD Mean SD

                                    Age 8444 0556 8405 0581ESE 0250 0436 0258 0441Grade3 0484 0504 0470 0503LEP 0250 0436 0152 0361Lunch 0219 0417 0182 0389Male 0531 0503 0394 0492Pretest accuracy 0932 0087 0916 0102Pretest speed 4268 1147 4324 1213

                                    18

                                    33 Pre-intervention Data Analytic Sample

                                    Outcome DatamdashAnalytic Sample

                                    Measure Comparison Group Intervention Group

                                    Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                                    Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                                    Fact Fluency 4 64 4573 1121 4 65 4580 1195

                                    Background DatamdashAnalytic Sample

                                    VariableComparison Intervention

                                    Mean SD Mean SD

                                    Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

                                    Outcome DatamdashAnalytic Sample with No Imputation

                                    Measure Comparison Group Intervention Group

                                    Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                                    Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                                    Fact Fluency 4 61 4557 1143 4 61 4640 1142

                                    19

                                    Background DatamdashAnalytic Sample with No Imputation

                                    VariableComparison Intervention

                                    Mean SD Mean SD

                                    Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

                                    34 Post-intervention Data and Findings

                                    341 Analytic Sample

                                    As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

                                    Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

                                    Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                                    Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

                                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                    Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

                                    4Nminus9for small

                                    effect size

                                    20

                                    Estimation of Effect SizemdashAnalytic Sample

                                    Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                                    Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

                                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                    342 Analytic Sample with No Imputation

                                    Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

                                    Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

                                    Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

                                    Model Comparison Group Intervention Group Estimated Effect

                                    Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                                    Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

                                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                    Estimation of Effect SizemdashAnalytic Sample with No Imputation

                                    Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                                    Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

                                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                    21

                                    35 Subpopulation Analyses

                                    We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

                                    Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

                                    Difference Within-Group SD (adj Hedgesrsquo g) t-score

                                    Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

                                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                    4 Acknowledgment

                                    We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

                                    This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

                                    The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

                                    22

                                    References

                                    Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

                                    Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

                                    Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

                                    Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

                                    Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

                                    Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

                                    Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

                                    Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

                                    UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

                                    VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

                                    Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

                                    Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

                                    23

                                    Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

                                    24

                                    Appendix A Full Model

                                    The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                                    Factor Coefficient t-score

                                    Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                    25

                                    Appendix B Demographic Model

                                    The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                                    Factor Coefficient t-score

                                    Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                    26

                                    Appendix C Reduced Model

                                    The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                                    This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                                    Factor Coefficient t-score

                                    Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                    27

                                    9 5 18 9 8 13

                                    + 8 + 9 minus10 minus 6 + 3 minus 5

                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                    10 2 3 10 12 9

                                    minus 3 + 7 + 8 + 1 minus10 minus 1

                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                    5 8 3 19 7 16 3

                                    + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                    3 7 15 0 4 14 7

                                    minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                    9 20 11 4 9 6 1

                                    + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                    9 12 12 2 5 9 5

                                    +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                    4 14 7 11 7 4 6

                                    + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                    2 12 14 4 10 1 7

                                    + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                    13 10 3 9 17 10 3

                                    minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                    4 10 10 3 5 5 10

                                    + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                    Appendix B Sample AdditionSubtraction Probe

                                    • Study Characteristics
                                      • Intervention Condition
                                      • Comparison Condition
                                      • Setting
                                      • Participants
                                        • Study Design and Analysis
                                          • Sample Formation
                                          • Outcome Measures
                                            • Outcomes
                                            • Probes
                                            • Administrations
                                            • Fluency Score Calculation
                                              • Validity
                                              • Reliability
                                              • Analytic Approach
                                              • Statistical Adjustments
                                              • Students Removed from Study
                                              • Missing Data
                                                • Frustration Level
                                                • Instructional Level
                                                  • Mastery Level
                                                    • Study Data
                                                      • Pre-Intervention DatamdashAll Pretest Takers
                                                      • Pre-Intervention DatamdashBaseline Sample
                                                      • Pre-intervention Data Analytic Sample
                                                      • Post-intervention Data and Findings
                                                        • Analytic Sample
                                                        • Analytic Sample with No Imputation
                                                          • Subpopulation Analyses
                                                            • Acknowledgment
                                                            • Appendices
                                                            • Appendix Full Model
                                                            • Appendix Demographic Model
                                                            • Appendix Reduced Model

                                      33 Pre-intervention Data Analytic Sample

                                      Outcome DatamdashAnalytic Sample

                                      Measure Comparison Group Intervention Group

                                      Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                                      Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                                      Fact Fluency 4 64 4573 1121 4 65 4580 1195

                                      Background DatamdashAnalytic Sample

                                      VariableComparison Intervention

                                      Mean SD Mean SD

                                      Age 8444 0556 8414 0580ESE 0250 0436 0262 0443Grade3 0484 0504 0477 0503LEP 0250 0436 0154 0364Lunch 0219 0417 0185 0391Male 0531 0503 0400 0494Pretest accuracy 0932 0087 0914 0102Pretest speed 4268 1147 4307 1214

                                      Outcome DatamdashAnalytic Sample with No Imputation

                                      Measure Comparison Group Intervention Group

                                      Sample Sizes Sample Characteristics Sample Sizes Sample Characteristics

                                      Unit of Unit of Mean Standard Unit of Unit of Mean StandardAssignment Analysis Deviation Assignment Analysis Deviation

                                      Fact Fluency 4 61 4557 1143 4 61 4640 1142

                                      19

                                      Background DatamdashAnalytic Sample with No Imputation

                                      VariableComparison Intervention

                                      Mean SD Mean SD

                                      Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

                                      34 Post-intervention Data and Findings

                                      341 Analytic Sample

                                      As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

                                      Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

                                      Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                                      Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

                                      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                      Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

                                      4Nminus9for small

                                      effect size

                                      20

                                      Estimation of Effect SizemdashAnalytic Sample

                                      Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                                      Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

                                      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                      342 Analytic Sample with No Imputation

                                      Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

                                      Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

                                      Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

                                      Model Comparison Group Intervention Group Estimated Effect

                                      Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                                      Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

                                      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                      Estimation of Effect SizemdashAnalytic Sample with No Imputation

                                      Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                                      Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

                                      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                      21

                                      35 Subpopulation Analyses

                                      We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

                                      Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

                                      Difference Within-Group SD (adj Hedgesrsquo g) t-score

                                      Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

                                      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                      4 Acknowledgment

                                      We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

                                      This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

                                      The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

                                      22

                                      References

                                      Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

                                      Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

                                      Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

                                      Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

                                      Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

                                      Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

                                      Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

                                      Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

                                      UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

                                      VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

                                      Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

                                      Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

                                      23

                                      Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

                                      24

                                      Appendix A Full Model

                                      The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                                      Factor Coefficient t-score

                                      Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                                      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                      25

                                      Appendix B Demographic Model

                                      The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                                      Factor Coefficient t-score

                                      Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                                      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                      26

                                      Appendix C Reduced Model

                                      The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                                      This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                                      Factor Coefficient t-score

                                      Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                                      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                      27

                                      9 5 18 9 8 13

                                      + 8 + 9 minus10 minus 6 + 3 minus 5

                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                      10 2 3 10 12 9

                                      minus 3 + 7 + 8 + 1 minus10 minus 1

                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                      5 8 3 19 7 16 3

                                      + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                      3 7 15 0 4 14 7

                                      minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                      9 20 11 4 9 6 1

                                      + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                      9 12 12 2 5 9 5

                                      +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                      4 14 7 11 7 4 6

                                      + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                      2 12 14 4 10 1 7

                                      + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                      13 10 3 9 17 10 3

                                      minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                      4 10 10 3 5 5 10

                                      + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                      Appendix B Sample AdditionSubtraction Probe

                                      • Study Characteristics
                                        • Intervention Condition
                                        • Comparison Condition
                                        • Setting
                                        • Participants
                                          • Study Design and Analysis
                                            • Sample Formation
                                            • Outcome Measures
                                              • Outcomes
                                              • Probes
                                              • Administrations
                                              • Fluency Score Calculation
                                                • Validity
                                                • Reliability
                                                • Analytic Approach
                                                • Statistical Adjustments
                                                • Students Removed from Study
                                                • Missing Data
                                                  • Frustration Level
                                                  • Instructional Level
                                                    • Mastery Level
                                                      • Study Data
                                                        • Pre-Intervention DatamdashAll Pretest Takers
                                                        • Pre-Intervention DatamdashBaseline Sample
                                                        • Pre-intervention Data Analytic Sample
                                                        • Post-intervention Data and Findings
                                                          • Analytic Sample
                                                          • Analytic Sample with No Imputation
                                                            • Subpopulation Analyses
                                                              • Acknowledgment
                                                              • Appendices
                                                              • Appendix Full Model
                                                              • Appendix Demographic Model
                                                              • Appendix Reduced Model

                                        Background DatamdashAnalytic Sample with No Imputation

                                        VariableComparison Intervention

                                        Mean SD Mean SD

                                        Age 8436 0556 8394 0568ESE 0246 0434 0279 0452Grade3 0492 0504 0459 0502LEP 0262 0444 0148 0358Lunch 0230 0424 0180 0388Male 0541 0502 0410 0496Pretest accuracy 0930 0088 0922 0084Pretest speed 4254 1170 4360 1177

                                        34 Post-intervention Data and Findings

                                        341 Analytic Sample

                                        As grand-centered means were used for all Level-1 covariates and grade wasthe only Level-2 covariate other than condition adjusted means for eachgroup were estimated from the Constant term of the HLM model the grade3coefficient of the HLM model the average value of the grade3 variable acrossall students and (in the case of the intervention group) the treatment coef-ficient of the HLM model Standard Deviations are unadjusted

                                        Outcome Data and Statistical SignificancemdashAnalytic SampleModel Comparison Group Intervention Group Estimated Effect

                                        Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                                        Full Model 64 504 1099 65 597 1093 0927lowastlowastlowast 5753Demographic Model 64 513 1099 65 597 1093 0836lowastlowastlowast 4966Reduced Model 64 508 1099 65 595 1093 0867lowastlowastlowast 4343

                                        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                        Effect size was calculated based on adjusted means unadjusted pooledwithin-group standard deviations and a correction ω = 1 minus 3

                                        4Nminus9for small

                                        effect size

                                        20

                                        Estimation of Effect SizemdashAnalytic Sample

                                        Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                                        Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

                                        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                        342 Analytic Sample with No Imputation

                                        Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

                                        Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

                                        Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

                                        Model Comparison Group Intervention Group Estimated Effect

                                        Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                                        Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

                                        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                        Estimation of Effect SizemdashAnalytic Sample with No Imputation

                                        Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                                        Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

                                        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                        21

                                        35 Subpopulation Analyses

                                        We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

                                        Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

                                        Difference Within-Group SD (adj Hedgesrsquo g) t-score

                                        Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

                                        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                        4 Acknowledgment

                                        We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

                                        This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

                                        The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

                                        22

                                        References

                                        Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

                                        Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

                                        Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

                                        Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

                                        Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

                                        Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

                                        Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

                                        Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

                                        UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

                                        VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

                                        Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

                                        Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

                                        23

                                        Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

                                        24

                                        Appendix A Full Model

                                        The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                                        Factor Coefficient t-score

                                        Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                                        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                        25

                                        Appendix B Demographic Model

                                        The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                                        Factor Coefficient t-score

                                        Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                                        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                        26

                                        Appendix C Reduced Model

                                        The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                                        This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                                        Factor Coefficient t-score

                                        Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                                        Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                        27

                                        9 5 18 9 8 13

                                        + 8 + 9 minus10 minus 6 + 3 minus 5

                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                        10 2 3 10 12 9

                                        minus 3 + 7 + 8 + 1 minus10 minus 1

                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                        5 8 3 19 7 16 3

                                        + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                        3 7 15 0 4 14 7

                                        minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                        9 20 11 4 9 6 1

                                        + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                        9 12 12 2 5 9 5

                                        +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                        4 14 7 11 7 4 6

                                        + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                        2 12 14 4 10 1 7

                                        + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                        13 10 3 9 17 10 3

                                        minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                        4 10 10 3 5 5 10

                                        + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                        Appendix B Sample AdditionSubtraction Probe

                                        • Study Characteristics
                                          • Intervention Condition
                                          • Comparison Condition
                                          • Setting
                                          • Participants
                                            • Study Design and Analysis
                                              • Sample Formation
                                              • Outcome Measures
                                                • Outcomes
                                                • Probes
                                                • Administrations
                                                • Fluency Score Calculation
                                                  • Validity
                                                  • Reliability
                                                  • Analytic Approach
                                                  • Statistical Adjustments
                                                  • Students Removed from Study
                                                  • Missing Data
                                                    • Frustration Level
                                                    • Instructional Level
                                                      • Mastery Level
                                                        • Study Data
                                                          • Pre-Intervention DatamdashAll Pretest Takers
                                                          • Pre-Intervention DatamdashBaseline Sample
                                                          • Pre-intervention Data Analytic Sample
                                                          • Post-intervention Data and Findings
                                                            • Analytic Sample
                                                            • Analytic Sample with No Imputation
                                                              • Subpopulation Analyses
                                                                • Acknowledgment
                                                                • Appendices
                                                                • Appendix Full Model
                                                                • Appendix Demographic Model
                                                                • Appendix Reduced Model

                                          Estimation of Effect SizemdashAnalytic Sample

                                          Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                                          Full Model 129 0927lowastlowastlowast 1096 084Demographic Model 129 0836lowastlowastlowast 1096 076Reduced Model 129 0867lowastlowastlowast 1096 079

                                          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                          342 Analytic Sample with No Imputation

                                          Analysis of students who were present for the interim assessment but absentfor post assessment indicated that a full case study would substantially un-derstate the effect of the intervention The covariate-adjusted effect of thetreatment on interim test scores was greater among students who missedthe post test than among those who were present for all three tests Thisis born out in the results of an analysis limited to those students where noimputation occurred

                                          Values for adjusted means for this subgroup were calculated by recenter-ing all Level-1 covariates and generating a new HLM with the same structureas for the full analytic sample but using only those participants with no miss-ing data

                                          Outcome Data and Statistical SignificancemdashAnalytic Sample with NoImputation

                                          Model Comparison Group Intervention Group Estimated Effect

                                          Students adj Mean unadj Standard Students adj Mean unadj Standard adj Mean Difference adj t-scoreDeviation Deviation

                                          Full Model 61 507 112 61 594 111 0867lowastlowastlowast 5255Demographic Model 61 515 112 61 594 111 0787lowastlowastlowast 4669Reduced Model 61 509 112 61 592 111 0828lowastlowastlowast 4249

                                          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                          Estimation of Effect SizemdashAnalytic Sample with No Imputation

                                          Model N Adjusted Mean (unadj) Pooled Effect SizeDifference Within-Group SD (adj Hedgesrsquo g)

                                          Full Model 122 0867lowastlowastlowast 1113 077Demographic Model 122 0787lowastlowastlowast 1113 070Reduced Model 122 0828lowastlowastlowast 1113 074

                                          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                          21

                                          35 Subpopulation Analyses

                                          We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

                                          Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

                                          Difference Within-Group SD (adj Hedgesrsquo g) t-score

                                          Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

                                          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                          4 Acknowledgment

                                          We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

                                          This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

                                          The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

                                          22

                                          References

                                          Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

                                          Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

                                          Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

                                          Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

                                          Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

                                          Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

                                          Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

                                          Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

                                          UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

                                          VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

                                          Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

                                          Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

                                          23

                                          Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

                                          24

                                          Appendix A Full Model

                                          The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                                          Factor Coefficient t-score

                                          Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                                          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                          25

                                          Appendix B Demographic Model

                                          The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                                          Factor Coefficient t-score

                                          Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                                          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                          26

                                          Appendix C Reduced Model

                                          The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                                          This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                                          Factor Coefficient t-score

                                          Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                                          Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                          27

                                          9 5 18 9 8 13

                                          + 8 + 9 minus10 minus 6 + 3 minus 5

                                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                          10 2 3 10 12 9

                                          minus 3 + 7 + 8 + 1 minus10 minus 1

                                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                          5 8 3 19 7 16 3

                                          + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                          3 7 15 0 4 14 7

                                          minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                          9 20 11 4 9 6 1

                                          + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                          9 12 12 2 5 9 5

                                          +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                          4 14 7 11 7 4 6

                                          + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                          2 12 14 4 10 1 7

                                          + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                          13 10 3 9 17 10 3

                                          minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                          4 10 10 3 5 5 10

                                          + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                                          macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                          Appendix B Sample AdditionSubtraction Probe

                                          • Study Characteristics
                                            • Intervention Condition
                                            • Comparison Condition
                                            • Setting
                                            • Participants
                                              • Study Design and Analysis
                                                • Sample Formation
                                                • Outcome Measures
                                                  • Outcomes
                                                  • Probes
                                                  • Administrations
                                                  • Fluency Score Calculation
                                                    • Validity
                                                    • Reliability
                                                    • Analytic Approach
                                                    • Statistical Adjustments
                                                    • Students Removed from Study
                                                    • Missing Data
                                                      • Frustration Level
                                                      • Instructional Level
                                                        • Mastery Level
                                                          • Study Data
                                                            • Pre-Intervention DatamdashAll Pretest Takers
                                                            • Pre-Intervention DatamdashBaseline Sample
                                                            • Pre-intervention Data Analytic Sample
                                                            • Post-intervention Data and Findings
                                                              • Analytic Sample
                                                              • Analytic Sample with No Imputation
                                                                • Subpopulation Analyses
                                                                  • Acknowledgment
                                                                  • Appendices
                                                                  • Appendix Full Model
                                                                  • Appendix Demographic Model
                                                                  • Appendix Reduced Model

                                            35 Subpopulation Analyses

                                            We analyzed sub-populations by grade We also analyzed the sub-populationof students not designated as exceptional students Due to the smaller samplesizes the Reduced Model was used for the analyses except grade was removedas a variable for subpopulations of constant grade

                                            Statistical Significance and Estimation of Effect SizeGroup N Adjusted Mean (unadj) Pooled Effect Size Adjusted

                                            Difference Within-Group SD (adj Hedgesrsquo g) t-score

                                            Grade 2 68 0739lowastlowast 094 078 246Grade 3 63 0877lowastlowast 105 082 247Non-Exceptional Students 102 0904lowastlowastlowast 1101 089 463

                                            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                            4 Acknowledgment

                                            We used R (Team 2013) for some of the analysis in this report especiallythe lme4 package for fitting mixed models (Bates Machler Bolker amp Walker2015) Other libraries utilized were dplyr tidyr and magrittr (Wickham ampFrancois 2016 Bache amp Wickham 2014 Wickham 2016)

                                            This document was typeset using LATEX and makes use of the harvardbooktabs multirow graphicx and url packages

                                            The stargazer package was used to generate LATEX for several of the tables(Hlavac 2013)

                                            22

                                            References

                                            Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

                                            Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

                                            Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

                                            Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

                                            Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

                                            Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

                                            Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

                                            Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

                                            UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

                                            VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

                                            Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

                                            Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

                                            23

                                            Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

                                            24

                                            Appendix A Full Model

                                            The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                                            Factor Coefficient t-score

                                            Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                                            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                            25

                                            Appendix B Demographic Model

                                            The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                                            Factor Coefficient t-score

                                            Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                                            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                            26

                                            Appendix C Reduced Model

                                            The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                                            This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                                            Factor Coefficient t-score

                                            Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                                            Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                            27

                                            9 5 18 9 8 13

                                            + 8 + 9 minus10 minus 6 + 3 minus 5

                                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                            10 2 3 10 12 9

                                            minus 3 + 7 + 8 + 1 minus10 minus 1

                                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                            5 8 3 19 7 16 3

                                            + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                            3 7 15 0 4 14 7

                                            minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                            9 20 11 4 9 6 1

                                            + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                            9 12 12 2 5 9 5

                                            +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                            4 14 7 11 7 4 6

                                            + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                            2 12 14 4 10 1 7

                                            + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                            13 10 3 9 17 10 3

                                            minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                            4 10 10 3 5 5 10

                                            + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                                            macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                            Appendix B Sample AdditionSubtraction Probe

                                            • Study Characteristics
                                              • Intervention Condition
                                              • Comparison Condition
                                              • Setting
                                              • Participants
                                                • Study Design and Analysis
                                                  • Sample Formation
                                                  • Outcome Measures
                                                    • Outcomes
                                                    • Probes
                                                    • Administrations
                                                    • Fluency Score Calculation
                                                      • Validity
                                                      • Reliability
                                                      • Analytic Approach
                                                      • Statistical Adjustments
                                                      • Students Removed from Study
                                                      • Missing Data
                                                        • Frustration Level
                                                        • Instructional Level
                                                          • Mastery Level
                                                            • Study Data
                                                              • Pre-Intervention DatamdashAll Pretest Takers
                                                              • Pre-Intervention DatamdashBaseline Sample
                                                              • Pre-intervention Data Analytic Sample
                                                              • Post-intervention Data and Findings
                                                                • Analytic Sample
                                                                • Analytic Sample with No Imputation
                                                                  • Subpopulation Analyses
                                                                    • Acknowledgment
                                                                    • Appendices
                                                                    • Appendix Full Model
                                                                    • Appendix Demographic Model
                                                                    • Appendix Reduced Model

                                              References

                                              Bache S M amp Wickham H (2014) magrittr A Forward-Pipe Operator for RhttpsCRANR-projectorgpackage=magrittr

                                              Bates D Machler M Bolker B amp Walker S (2015) lsquoFitting linear mixed-effects models using lme4rsquo Journal of Statistical Software 67(1) 1ndash48

                                              Burns M K VanDerHeyden A M amp Jiban C L (2006) lsquoAssessing the instruc-tional level for mathematics A comparison of methodsrsquo School PsychologyReview 35(3) 401

                                              Hintze J M Christ T J amp Keller L A (2002) lsquoThe generalizability of cbmsurvey-level mathematics assessments Just how many samples do we needrsquoSchool Psychology Review 31(4) 514

                                              Hlavac M (2013) lsquostargazer Latex code and ascii text for well-formatted regres-sion and summary statistics tablesrsquo http CRANR-projectorg package=stargazer

                                              Osborne J (2005) lsquoNotes on the use of data transformationsrsquo Practical Assess-ment Research and Evaluation 9(1) 42ndash50

                                              Stevens O amp Leigh E (2012) lsquoMathematics curriculum based measurement topredict state test performance A comparison of measures and methodsrsquoProQuest LLC

                                              Team R C (2013) R A Language and Environment for Statistical Comput-ing R Foundation for Statistical Computing Vienna Austria httpwwwR-projectorg

                                              UTA (2015) Multilevel modeling tutorial using sas stata hlm r spss and mplusTechnical report The Department of Statistics and Data Sciences The Uni-versity of Texas at Austin httpstatutexaseduimagesSSCdocumentsSoftwareTutorialsMultilevelModelingpdf

                                              VanDerHeyden A M amp Burns M K (2008) lsquoExamination of the utility ofvarious measures of mathematics proficiencyrsquo Assessment for Effective In-tervention

                                              Wickham H (2016) tidyr Easily Tidy Data with lsquospread()lsquo and lsquogather()lsquo Func-tions httpsCRANR-projectorgpackage=tidyr

                                              Wickham H amp Francois R (2016) dplyr A Grammar of Data ManipulationhttpsCRANR-projectorgpackage=dplyr

                                              23

                                              Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

                                              24

                                              Appendix A Full Model

                                              The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                                              Factor Coefficient t-score

                                              Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                                              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                              25

                                              Appendix B Demographic Model

                                              The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                                              Factor Coefficient t-score

                                              Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                                              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                              26

                                              Appendix C Reduced Model

                                              The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                                              This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                                              Factor Coefficient t-score

                                              Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                                              Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                              27

                                              9 5 18 9 8 13

                                              + 8 + 9 minus10 minus 6 + 3 minus 5

                                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                              10 2 3 10 12 9

                                              minus 3 + 7 + 8 + 1 minus10 minus 1

                                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                              5 8 3 19 7 16 3

                                              + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                              3 7 15 0 4 14 7

                                              minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                              9 20 11 4 9 6 1

                                              + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                              9 12 12 2 5 9 5

                                              +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                              4 14 7 11 7 4 6

                                              + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                              2 12 14 4 10 1 7

                                              + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                              13 10 3 9 17 10 3

                                              minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                              4 10 10 3 5 5 10

                                              + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                                              macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                              Appendix B Sample AdditionSubtraction Probe

                                              • Study Characteristics
                                                • Intervention Condition
                                                • Comparison Condition
                                                • Setting
                                                • Participants
                                                  • Study Design and Analysis
                                                    • Sample Formation
                                                    • Outcome Measures
                                                      • Outcomes
                                                      • Probes
                                                      • Administrations
                                                      • Fluency Score Calculation
                                                        • Validity
                                                        • Reliability
                                                        • Analytic Approach
                                                        • Statistical Adjustments
                                                        • Students Removed from Study
                                                        • Missing Data
                                                          • Frustration Level
                                                          • Instructional Level
                                                            • Mastery Level
                                                              • Study Data
                                                                • Pre-Intervention DatamdashAll Pretest Takers
                                                                • Pre-Intervention DatamdashBaseline Sample
                                                                • Pre-intervention Data Analytic Sample
                                                                • Post-intervention Data and Findings
                                                                  • Analytic Sample
                                                                  • Analytic Sample with No Imputation
                                                                    • Subpopulation Analyses
                                                                      • Acknowledgment
                                                                      • Appendices
                                                                      • Appendix Full Model
                                                                      • Appendix Demographic Model
                                                                      • Appendix Reduced Model

                                                Woltman H Feldstain A MacKay J C amp Rocchi M (2012) lsquoAn introduc-tion to hierarchical linear modelingrsquo Tutorials in Quantitative Methods forPsychology 8(1) 52ndash69

                                                24

                                                Appendix A Full Model

                                                The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                                                Factor Coefficient t-score

                                                Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                                                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                                25

                                                Appendix B Demographic Model

                                                The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                                                Factor Coefficient t-score

                                                Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                                                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                                26

                                                Appendix C Reduced Model

                                                The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                                                This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                                                Factor Coefficient t-score

                                                Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                                                Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                                27

                                                9 5 18 9 8 13

                                                + 8 + 9 minus10 minus 6 + 3 minus 5

                                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                10 2 3 10 12 9

                                                minus 3 + 7 + 8 + 1 minus10 minus 1

                                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                5 8 3 19 7 16 3

                                                + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                3 7 15 0 4 14 7

                                                minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                9 20 11 4 9 6 1

                                                + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                9 12 12 2 5 9 5

                                                +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                4 14 7 11 7 4 6

                                                + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                2 12 14 4 10 1 7

                                                + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                13 10 3 9 17 10 3

                                                minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                4 10 10 3 5 5 10

                                                + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                                                macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                Appendix B Sample AdditionSubtraction Probe

                                                • Study Characteristics
                                                  • Intervention Condition
                                                  • Comparison Condition
                                                  • Setting
                                                  • Participants
                                                    • Study Design and Analysis
                                                      • Sample Formation
                                                      • Outcome Measures
                                                        • Outcomes
                                                        • Probes
                                                        • Administrations
                                                        • Fluency Score Calculation
                                                          • Validity
                                                          • Reliability
                                                          • Analytic Approach
                                                          • Statistical Adjustments
                                                          • Students Removed from Study
                                                          • Missing Data
                                                            • Frustration Level
                                                            • Instructional Level
                                                              • Mastery Level
                                                                • Study Data
                                                                  • Pre-Intervention DatamdashAll Pretest Takers
                                                                  • Pre-Intervention DatamdashBaseline Sample
                                                                  • Pre-intervention Data Analytic Sample
                                                                  • Post-intervention Data and Findings
                                                                    • Analytic Sample
                                                                    • Analytic Sample with No Imputation
                                                                      • Subpopulation Analyses
                                                                        • Acknowledgment
                                                                        • Appendices
                                                                        • Appendix Full Model
                                                                        • Appendix Demographic Model
                                                                        • Appendix Reduced Model

                                                  Appendix A Full Model

                                                  The table below describes the fixed-effects data of the full HLM for the analyticsample This model uses grand-mean-centered values for all level-1 variables scaledto be univariate These variables are prefixed with ldquocrdquo to indicate this

                                                  Factor Coefficient t-score

                                                  Intercept 4709lowastlowastlowast 25719cage 0136 0887cLEP minus0015 minus0111cLunch minus0122 minus0981cprescore minus2220 minus0932cgender minus0148 minus1518cESE 0054 0272cprespeed 2471 1177cpreaccuracy 0750 1313treatment 0927lowastlowastlowast 5753grade3 0695lowastlowastlowast 3335cageytreatment 0076 0469cageygrade3 minus0224 minus1134cLEPtreatment minus0021 minus0123cLEPgrade3 minus0027 minus0156cLunchtreatment 0195 1440cLunchgrade3 0174 1269cprescoretreatment 1881 1056cprescoregrade3 2097 0885cgendertreatment 0096 0836cgendergrade3 0180 1564cESEtreatment minus0036 minus0204cESEgrade3 minus0013 minus0073cprespeedtreatment minus1327 minus0849cprespeedgrade3 minus1581 minus0753cpreaccuracytreatment minus0712lowast minus1657cpreaccuracygrade3 minus0608 minus1101

                                                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                                  25

                                                  Appendix B Demographic Model

                                                  The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                                                  Factor Coefficient t-score

                                                  Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                                                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                                  26

                                                  Appendix C Reduced Model

                                                  The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                                                  This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                                                  Factor Coefficient t-score

                                                  Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                                                  Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                                  27

                                                  9 5 18 9 8 13

                                                  + 8 + 9 minus10 minus 6 + 3 minus 5

                                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                  10 2 3 10 12 9

                                                  minus 3 + 7 + 8 + 1 minus10 minus 1

                                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                  5 8 3 19 7 16 3

                                                  + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                  3 7 15 0 4 14 7

                                                  minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                  9 20 11 4 9 6 1

                                                  + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                  9 12 12 2 5 9 5

                                                  +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                  4 14 7 11 7 4 6

                                                  + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                  2 12 14 4 10 1 7

                                                  + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                  13 10 3 9 17 10 3

                                                  minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                  4 10 10 3 5 5 10

                                                  + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                                                  macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                  Appendix B Sample AdditionSubtraction Probe

                                                  • Study Characteristics
                                                    • Intervention Condition
                                                    • Comparison Condition
                                                    • Setting
                                                    • Participants
                                                      • Study Design and Analysis
                                                        • Sample Formation
                                                        • Outcome Measures
                                                          • Outcomes
                                                          • Probes
                                                          • Administrations
                                                          • Fluency Score Calculation
                                                            • Validity
                                                            • Reliability
                                                            • Analytic Approach
                                                            • Statistical Adjustments
                                                            • Students Removed from Study
                                                            • Missing Data
                                                              • Frustration Level
                                                              • Instructional Level
                                                                • Mastery Level
                                                                  • Study Data
                                                                    • Pre-Intervention DatamdashAll Pretest Takers
                                                                    • Pre-Intervention DatamdashBaseline Sample
                                                                    • Pre-intervention Data Analytic Sample
                                                                    • Post-intervention Data and Findings
                                                                      • Analytic Sample
                                                                      • Analytic Sample with No Imputation
                                                                        • Subpopulation Analyses
                                                                          • Acknowledgment
                                                                          • Appendices
                                                                          • Appendix Full Model
                                                                          • Appendix Demographic Model
                                                                          • Appendix Reduced Model

                                                    Appendix B Demographic Model

                                                    The table below describes the fixed-effects data of the full HLM for the analyticsample but excludes the pretest features of speed and accuracy This model usesgrand-mean-centered values for all level-1 variables scaled to be univariate Thesevariables are prefixed with ldquocrdquo to indicate this

                                                    Factor Coefficient t-score

                                                    Intercept 4906lowastlowastlowast 27157cage 0230 1554cLEP 0042 0315cLunch minus0128 minus1225cprescore 0756lowastlowastlowast 5872cgender minus0124 minus1283cESE 0172 0855treatment 0836lowastlowastlowast 4966grade3 0480lowastlowast 2235cageytreatment minus0003 minus0022cageygrade3 minus0270 minus1378cLEPtreatment minus0059 minus0346cLEPgrade3 minus0071 minus0408cLunchtreatment 0313lowastlowast 2631cLunchgrade3 0174 1459cprescoretreatment 0003 0020cprescoregrade3 0094 0685cgendertreatment 0122 1062cgendergrade3 0120 1039cESEtreatment minus0131 minus0734cESEgrade3 minus0117 minus0642

                                                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                                    26

                                                    Appendix C Reduced Model

                                                    The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                                                    This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                                                    Factor Coefficient t-score

                                                    Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                                                    Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                                    27

                                                    9 5 18 9 8 13

                                                    + 8 + 9 minus10 minus 6 + 3 minus 5

                                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                    10 2 3 10 12 9

                                                    minus 3 + 7 + 8 + 1 minus10 minus 1

                                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                    5 8 3 19 7 16 3

                                                    + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                    3 7 15 0 4 14 7

                                                    minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                    9 20 11 4 9 6 1

                                                    + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                    9 12 12 2 5 9 5

                                                    +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                    4 14 7 11 7 4 6

                                                    + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                    2 12 14 4 10 1 7

                                                    + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                    13 10 3 9 17 10 3

                                                    minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                    4 10 10 3 5 5 10

                                                    + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                                                    macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                    Appendix B Sample AdditionSubtraction Probe

                                                    • Study Characteristics
                                                      • Intervention Condition
                                                      • Comparison Condition
                                                      • Setting
                                                      • Participants
                                                        • Study Design and Analysis
                                                          • Sample Formation
                                                          • Outcome Measures
                                                            • Outcomes
                                                            • Probes
                                                            • Administrations
                                                            • Fluency Score Calculation
                                                              • Validity
                                                              • Reliability
                                                              • Analytic Approach
                                                              • Statistical Adjustments
                                                              • Students Removed from Study
                                                              • Missing Data
                                                                • Frustration Level
                                                                • Instructional Level
                                                                  • Mastery Level
                                                                    • Study Data
                                                                      • Pre-Intervention DatamdashAll Pretest Takers
                                                                      • Pre-Intervention DatamdashBaseline Sample
                                                                      • Pre-intervention Data Analytic Sample
                                                                      • Post-intervention Data and Findings
                                                                        • Analytic Sample
                                                                        • Analytic Sample with No Imputation
                                                                          • Subpopulation Analyses
                                                                            • Acknowledgment
                                                                            • Appendices
                                                                            • Appendix Full Model
                                                                            • Appendix Demographic Model
                                                                            • Appendix Reduced Model

                                                      Appendix C Reduced Model

                                                      The table below describes the fixed-effects data of the HLM for the analytic sampleretaining only age This model uses grand-mean-centered values for all level-1 variables scaled to be univariate These variables are prefixed with ldquocrdquo toindicate this

                                                      This model used Restricted Maximum Likelihood as there were convergenceproblems when using maximum likelihood

                                                      Factor Coefficient t-score

                                                      Intercept 4827lowastlowastlowast 25683cage 0227 1146cLunch minus0111 minus0952cprescore 0758lowastlowastlowast 5488treatment 0867lowastlowastlowast 4343grade3 0527lowastlowast 2219cageytreatment minus0031 minus0154cageygrade3 minus0168 minus0692cLunchtreatment 0268lowast 1936cLunchgrade3 0153 1109cprescoretreatment minus0042 minus0286cprescoregrade3 0100 0677

                                                      Note lowastplt01 lowastlowastplt005 lowastlowastlowastplt0001

                                                      27

                                                      9 5 18 9 8 13

                                                      + 8 + 9 minus10 minus 6 + 3 minus 5

                                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                      10 2 3 10 12 9

                                                      minus 3 + 7 + 8 + 1 minus10 minus 1

                                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                      5 8 3 19 7 16 3

                                                      + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                      3 7 15 0 4 14 7

                                                      minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                      9 20 11 4 9 6 1

                                                      + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                      9 12 12 2 5 9 5

                                                      +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                      4 14 7 11 7 4 6

                                                      + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                      2 12 14 4 10 1 7

                                                      + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                      13 10 3 9 17 10 3

                                                      minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                      4 10 10 3 5 5 10

                                                      + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                                                      macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                      Appendix B Sample AdditionSubtraction Probe

                                                      • Study Characteristics
                                                        • Intervention Condition
                                                        • Comparison Condition
                                                        • Setting
                                                        • Participants
                                                          • Study Design and Analysis
                                                            • Sample Formation
                                                            • Outcome Measures
                                                              • Outcomes
                                                              • Probes
                                                              • Administrations
                                                              • Fluency Score Calculation
                                                                • Validity
                                                                • Reliability
                                                                • Analytic Approach
                                                                • Statistical Adjustments
                                                                • Students Removed from Study
                                                                • Missing Data
                                                                  • Frustration Level
                                                                  • Instructional Level
                                                                    • Mastery Level
                                                                      • Study Data
                                                                        • Pre-Intervention DatamdashAll Pretest Takers
                                                                        • Pre-Intervention DatamdashBaseline Sample
                                                                        • Pre-intervention Data Analytic Sample
                                                                        • Post-intervention Data and Findings
                                                                          • Analytic Sample
                                                                          • Analytic Sample with No Imputation
                                                                            • Subpopulation Analyses
                                                                              • Acknowledgment
                                                                              • Appendices
                                                                              • Appendix Full Model
                                                                              • Appendix Demographic Model
                                                                              • Appendix Reduced Model

                                                        9 5 18 9 8 13

                                                        + 8 + 9 minus10 minus 6 + 3 minus 5

                                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                        10 2 3 10 12 9

                                                        minus 3 + 7 + 8 + 1 minus10 minus 1

                                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                        5 8 3 19 7 16 3

                                                        + 0 + 4 + 6 minus 9 minus 1 minus10 + 0

                                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                        3 7 15 0 4 14 7

                                                        minus 1 + 5 minus 5 + 9 + 3 minus 5 minus 5

                                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                        9 20 11 4 9 6 1

                                                        + 9 minus10 minus 3 minus 4 + 0 minus 1 +10

                                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                        9 12 12 2 5 9 5

                                                        +10 minus 3 minus 9 + 8 minus 0 minus 4 + 0

                                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                        4 14 7 11 7 4 6

                                                        + 4 minus 9 minus 0 minus 8 + 0 minus 1 + 5

                                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                        2 12 14 4 10 1 7

                                                        + 3 minus 5 minus 5 minus 4 +10 + 0 + 2

                                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                        13 10 3 9 17 10 3

                                                        minus 6 +10 + 6 minus 6 minus 7 +10 + 6

                                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                        4 10 10 3 5 5 10

                                                        + 9 + 2 +10 minus 0 + 3 minus 5 minus10

                                                        macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr macrmacrmacr

                                                        Appendix B Sample AdditionSubtraction Probe

                                                        • Study Characteristics
                                                          • Intervention Condition
                                                          • Comparison Condition
                                                          • Setting
                                                          • Participants
                                                            • Study Design and Analysis
                                                              • Sample Formation
                                                              • Outcome Measures
                                                                • Outcomes
                                                                • Probes
                                                                • Administrations
                                                                • Fluency Score Calculation
                                                                  • Validity
                                                                  • Reliability
                                                                  • Analytic Approach
                                                                  • Statistical Adjustments
                                                                  • Students Removed from Study
                                                                  • Missing Data
                                                                    • Frustration Level
                                                                    • Instructional Level
                                                                      • Mastery Level
                                                                        • Study Data
                                                                          • Pre-Intervention DatamdashAll Pretest Takers
                                                                          • Pre-Intervention DatamdashBaseline Sample
                                                                          • Pre-intervention Data Analytic Sample
                                                                          • Post-intervention Data and Findings
                                                                            • Analytic Sample
                                                                            • Analytic Sample with No Imputation
                                                                              • Subpopulation Analyses
                                                                                • Acknowledgment
                                                                                • Appendices
                                                                                • Appendix Full Model
                                                                                • Appendix Demographic Model
                                                                                • Appendix Reduced Model

                                                          top related