Chinese/English Journal of Educational Measurement and Evaluation | 教育测量与评估双语季刊
Volume 1, Issue 1, Article 6 (2020)

An Intellectual History of Parametric Item Response Theory Models in the Twentieth Century (二十世纪参数项目反应理论模型思想史)
David Thissen
Lynne Steinberg

Follow this and additional works at: https://www.ce-jeme.org/journal

Recommended Citation
Thissen, David and Steinberg, Lynne (2020) "二十世纪参数项目反应理论模型思想史," Chinese/English Journal of Educational Measurement and Evaluation | 教育测量与评估双语季刊: Vol. 1 : Iss. 1 , Article 6. Available at: https://www.ce-jeme.org/journal/vol1/iss1/6

This Article is brought to you for free and open access by Chinese/English Journal of Educational Measurement and Evaluation | 教育测量与评估双语季刊. It has been accepted for inclusion in Chinese/English Journal of Educational Measurement and Evaluation | 教育测量与评估双语季刊 by an authorized editor of Chinese/English Journal of Educational Measurement and Evaluation | 教育测量与评估双语季刊.
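Percival Symonds further refined the suggestion made by E. L. Thorndike (Thorndike et al., 1926); by analyzing Ayres's (1915) A Measuring Scale for Ability in Spelling, he offered another line of thinking for the development of IRT.² In his study, Ayres (1915) first collected the "1,000 most commonly used words" in English writing and, with the help of many elementary school teachers, gathered spelling test data from children in every grade in 84 cities across the United States. Ayres then divided the 1,000 words into 26 lists, labeled with the letters A through Z. The lists were assembled by the following rule: a standard normal deviate was computed from the percentage of children who spelled each word correctly, and words with similar standard normal deviates were assigned to the same list. For example, list A contains the two words me and do; list M contains trust, extra, dress, beside, and so on; list V contains principal, testimony, discussion, arrangement, and other words of the same difficulty; and list Z contains judgment, recommend, and allege. Ayres (1915, p. 36) wrote that the words in each list are "of approximately equal spelling difficulty," and he published the word lists and scoring tables so that other researchers could compare their results with his normative sample.

² In The Measurement of Intelligence, E. L. Thorndike (1926) called sets of intelligence test items of similar difficulty "composites," and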
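The grouping rule described above (percent correct, then standard normal deviate, then lists of words with similar deviates) can be sketched in a few lines of Python. This is only an illustrative sketch, not a reconstruction of Ayres's actual computations: the percentages below are invented for illustration, the function names and the bin width of 0.5 are hypothetical, and the sign convention (harder words receive larger deviates) is an assumption.

```python
from statistics import NormalDist


def normal_deviate(pct_correct: float) -> float:
    """Convert a percent-correct value into a standard normal deviate.

    Assumed convention: harder words (lower percent correct) get larger
    deviates. Values of exactly 0 or 100 percent have no finite deviate,
    and inv_cdf raises StatisticsError for them.
    """
    proportion_correct = pct_correct / 100.0
    return NormalDist().inv_cdf(1.0 - proportion_correct)


def group_by_deviate(pct_by_word: dict, width: float = 0.5) -> dict:
    """Place words whose deviates fall in the same bin into one list.

    `width` is a hypothetical bin width on the deviate scale; it is not
    taken from Ayres (1915).
    """
    groups = {}
    for word, pct in pct_by_word.items():
        bin_index = round(normal_deviate(pct) / width)
        groups.setdefault(bin_index, []).append(word)
    # Sort from easiest bin (most negative deviate) to hardest.
    return dict(sorted(groups.items()))


# Invented percentages, for illustration only (not Ayres's data).
pct_by_word = {
    "me": 99.2, "do": 99.0,
    "trust": 73.0, "dress": 76.0,
    "judgment": 8.0, "allege": 6.0,
}
groups = group_by_deviate(pct_by_word)
for bin_index, words in groups.items():
    print(bin_index, words)
```

With these made-up inputs the words fall into three bins ordered from easiest to hardest, which mirrors the idea that each list holds words of roughly equal difficulty on a normal-deviate scale.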
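Before the publication of Statistical Theories of Mental Test Scores, IRT was mainly a conceptual model for test theorists (as opposed to test practitioners). Lord and Novick (1968) integrated much of the earlier work; their book, together with the electronic computers that had just become available, marked the beginning of a new era in test theory.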
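⁵ When Lazarsfeld's chapter was published in The American Soldier, Lord was completing his doctoral dissertation in New York City. However, it is not clear how much direct influence Lazarsfeld's work had on Lord. Lord (1952) did not cite Lazarsfeld's work, and Lord (1953a, 1953b) mentioned Lazarsfeld (1950) only briefly.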
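Lord and Novick (1968, p. 366) summarized the theoretical development of IRT up to that time; they described a psychological theory, based on the normal ogive model, that had (almost) all of the necessary elements. The lower part of Figure 4, inspired by Figure 16.6.1 of Lord and Novick (1968, p. 371), shows, for a particular item, the relations among the latent variable θ (usually called "ability"), the unobservable response process variable Y, the threshold parameter γ, and the probability T of answering the item correctly. The idea expressed in Figure 4 is that there is a latent response process variable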
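Over the past thirty to forty years, specific applications of IRT models have grown explosively. The recently published Handbook of Item Response Theory, Volume One: Models (van der Linden, 2016a)¹³ alone comprises 33 chapters and nearly 600 pages. This article has mentioned only a small fraction of the models covered in that book, most of which appeared in the past few decades. Large general classes of models include extensions of IRT for multidimensional latent variables (multidimensional IRT, or MIRT; Reckase, 2009) and hierarchical or multilevel item response models (e.g., Fox & Glas, 2001). Within the scope of generalized latent variable models, a modern synthesis that departs from tradition merges IRT with the factor analytic framework (Skrondal & Rabe-Hesketh, 2004; Rabe-Hesketh, Skrondal, & Pickles, 2004; Bock & Moustaki, 2007). Cognitive diagnostic models are applied to structured educational assessments to determine whether examinees have mastered specific skills (von Davier & Lee, 2019). More specific models include non-compensatory multidimensional models, which can be used for achievement or ability tests to measure the multiple abilities involved in processing (e.g., Embretson & Yang, 2013), or for personality or attitude scales to measure response sets (e.g., Thissen-Roe & Thissen, 2013). There are also models for less commonly used response formats and processes (e.g., Mellenbergh, 1994; Roberts, Donoghue, & Laughlin, 2000), and models for the response times now collected in computerized testing (e.g., van der Linden, 2016b). Explanatory item response models are special IRT models built to explain and test psychological hypotheses about processing (De Boeck & Wilson, 2004). There are also nonparametric analytic traditions that aim to provide data-analytic results similar or complementary to those of parametric IRT models (e.g., Sijtsma & Molenaar, 2002; Ramsay, 2016). But this is the contemporary development of IRT rather than its history. In sum, IRT is an active field of research that will continue to expand and develop.

¹³ Space permits citing only representative literature on these topics.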
References
Albert, J. H. (1992). Bayesian estimation of normal ogive item response curves using Gibbs sampling. Journal of Educational Statistics, 17, 251–269. Retrieved from https://doi.org/10.2307/1165149
Andersen, E. B. (1970). Asymptotic properties of conditional maximum-likelihood estimators. Journal of the Royal Statistical Society: Series B (Methodological), 32(2), 283–301. Retrieved from https://doi.org/10.1111/j.2517-6161.1970.tb00842.x
Andersen, E. B. (1972). The numerical solution of a set of conditional estimation equations. Journal of the Royal Statistical Society: Series B (Methodological), 34, 42–54. Retrieved from https://doi.org/10.1111/j.2517-6161.1972.tb00887.x
Andersen, E. B. (1973). Conditional inference and models for measuring. Copenhagen: Mentalhygiejnisk forlag.
Andersen, E. B. (1977). Sufficient statistics and latent trait models. Psychometrika, 42, 69–81. Retrieved from https://doi.org/10.1007/BF02293746
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561–573. Retrieved from https://doi.org/10.1007/BF02293814
Andrich, D. (2016). Rasch rating-scale model. In W. J. van der Linden (Ed.), Handbook of item response theory, volume one: Models (pp. 75–94). Boca Raton, FL: Chapman & Hall/CRC.
Ayres, L. P. (1915). A measuring scale for ability in spelling. N.Y.: Russell Sage Foundation.
Berkson, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. Journal of the American Statistical Association, 48, 565–599. Retrieved from https://doi.org/10.1080/01621459.1953.10483494
Berkson, J. (1957). Tables for the maximum likelihood estimate of the logistic function. Biometrics, 13, 28–34. Retrieved from https://doi.org/10.2307/3001900
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 392–479). Reading, MA: Addison-Wesley.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51. Retrieved from https://doi.org/10.1007/BF02291411
Bock, R. D. (1983). The mental growth curve reexamined. In D. J. Weiss (Ed.), New horizons in testing (pp. 205–219). N.Y.: Academic Press.
Bock, R. D. (1997). A brief history of item response theory. Educational Measurement: Issues and Practice, 16, 21–33. Retrieved from https://doi.org/10.1111/j.1745-3992.1997.tb00605.x
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459. Retrieved from https://doi.org/10.1007/BF02293801
Bock, R. D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35, 179–197. Retrieved from https://doi.org/10.1007/BF02291262
Bock, R. D., & Moustaki, I. (2007). Item response theory in a general framework. In C. R. Rao & S. Sinharay (Eds.), Handbook of Statistics Volume 26: Psychometrics (pp. 469–513). Amsterdam: North-Holland.
Bock, R. D., Thissen, D., & Zimowski, M. F. (1997). IRT estimation of domain scores. Journal of Educational Measurement, 34, 197–211. Retrieved from https://doi.org/10.1111/j.1745-3984.1997.tb00515.x
Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168. Retrieved from https://doi.org/10.1007/BF02294533
Burt, C. (1922). Mental and scholastic tests. London: P. S. King.
Cai, L. (2017). flexMIRT® version 3.51: Flexible multilevel multidimensional item analysis and test scoring [Computer software]. Chapel Hill, NC: Vector Psychometric Group.
Cai, L., Thissen, D., & du Toit, S. H. C. (2011). IRTPRO for Windows [Computer software]. Lincolnwood, IL: Scientific Software International.
Camilli, G. (1994). Origin of the scaling constant d=1.7 in item response theory. Journal of Educational and Behavioral Statistics, 19, 293–295. Retrieved from https://doi.org/10.2307/1165298
Chalmers, R. P. (2012). mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software, 48, 1–29. Retrieved from https://doi.org/10.18637/jss.v048.i06
Cressie, N., & Holland, P. W. (1983). Characterizing the manifest probabilities of latent trait models. Psychometrika, 48, 129–141. Retrieved from https://doi.org/10.1007/BF02314681
De Boeck, P., & Wilson, M. (Eds.). (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer.
de Leeuw, J., & Verhelst, N. (1986). Maximum likelihood estimation in generalized Rasch models. Journal of Educational Statistics, 11, 183–196. Retrieved from https://doi.org/10.3102/10769986011003183
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39, 1–38. Retrieved from https://doi.org/10.1111/j.2517-6161.1977.tb01600.x
Duncan, O. D. (1984). Rasch measurement: Further examples and discussion. In C. F. Turner & E. Martin (Eds.), Surveying subjective phenomena, volume 2 (pp. 367–403). New York, NY: Russell Sage Foundation.
du Toit, M. (Ed.). (2003). IRT from SSI: BILOG-MG, MULTILOG, PARSCALE, TESTFACT. Lincolnwood, IL: Scientific Software International.
Embretson, S. E., & Yang, X. (2013). A Multicomponent Latent Trait Model for Diagnosis. Psychometrika, 78, 14–36. Retrieved from https://doi.org/10.1007/s11336-012-9296-y
Ferguson, G. A. (1943). Item selection by the constant process. Psychometrika, 7, 19–29. Retrieved from https://doi.org/10.1007/BF02288601
Fischer, G. H. (1974). Einführung in die Theorie psychologischer Tests. Bern: Huber.
Fischer, G. H. (1985). Some consequences of specific objectivity for the measurement of change. In E. E. Roskam (Ed.), Measurement and personality assessment (pp. 39–55). Amsterdam: North-Holland.
Fischer, G. H. (2007). Rasch models. In C. R. Rao & S.
Fox, J.-P., & Glas, C. A. W. (2001). Bayesian estimation of a multilevel IRT model using Gibbs sampling. Psychometrika, 66, 269–286. Retrieved from https://doi.org/10.1007/BF02294839
Guilford, J. P. (1936). Psychometric methods. N.Y.: McGraw-Hill. Retrieved from https://doi.org/10.1007/BF02287877
Haberman, S. (in press). Statistical theory and assessment practice. Journal of Educational Measurement.
Haley, D. C. (1952). Estimation of the dosage mortality relationship when the dose is subject to error. Stanford: Applied Mathematics and Statistics Laboratory, Stanford University, Technical Report 15.
Hart, H. N. (1923). Progress report on a test of social attitudes and interests. In B. T. Baldwin (Ed.), University of Iowa Studies in Child Welfare (Vol. 2) (pp. 1–40). Iowa City: The University.
Holland, P. W. (1990). On the sampling theory foundations of item response theory models. Psychometrika, 55, 577–601. Retrieved from https://doi.org/10.1007/BF02294609
Kelderman, H. (1984). Loglinear Rasch model tests. Psychometrika, 49, 223–245. Retrieved from https://doi.org/10.1007/BF02294174
Lawley, D. N. (1943). On problems connected with item selection and test construction. Proceedings of the Royal Society of Edinburgh, 62-A, Part I, 74–82. Retrieved from https://doi.org/10.1017/S0080454100006282
Lazarsfeld, P. F. (1950). The logical and mathematical foundation of latent structure analysis. In S. A. Stouffer, L. Guttman, E. A. Suchman, P. F. Lazarsfeld, S. A. Star, & J. A. Clausen (Eds.), Measurement and Prediction (pp. 362–412). New York: Wiley.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 4–55.
Likert scale. (2020, June 11). Retrieved June 16, 2020, from https://en.wikipedia.org/wiki/Likert_scale#Pronunciation
Lord, F. M. (1951). A maximum likelihood approach to test scores (ETS Research Bulletin Series No. RB-51-19). Educational Testing Service. Retrieved from https://doi.org/10.1002/j.2333-8504.1951.tb00219.x
Lord, F. M. (1952). A theory of test scores. Psychometric Monographs, Whole No. 7.
Lord, F. M. (1953a). An application of confidence intervals and of maximum likelihood to the estimation of an examinee’s ability. Psychometrika, 18, 57–76. Retrieved from https://doi.org/10.1007/BF02289028
Lord, F. M. (1953b). The relation of test score to the trait underlying the test. Educational and Psychological Measurement, 13, 517–548. Retrieved from https://doi.org/10.1177/001316445301300401
Lord, F. M., & Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174. Retrieved from https://doi.org/10.1007/BF02296272
Masters, G. N. (2016). Partial credit model. In W. J. van der Linden (Ed.), Handbook of item response theory, volume one: Models (pp. 109–126). Boca Raton, FL: Chapman & Hall/CRC.
Mellenbergh, G. J. (1994). A unidimensional latent trait model for continuous item responses. Multivariate Behavioral Research, 29, 223–236. Retrieved from https://doi.org/10.1207/s15327906mbr2903_2
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176. Retrieved from https://doi.org/10.1177/014662169201600206
Muraki, E., & Muraki, M. (2016). Generalized partial credit model. In W. J. van der Linden (Ed.), Handbook of item response theory, volume one: Models (pp. 127–137). Boca Raton, FL: Chapman & Hall/CRC.
Neumann, G. B. (1926). A study of international attitudes of high school students. New York, NY: Teachers College, Columbia University, Bureau of Publications.
Neyman, J., & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16, 1–32. Retrieved from https://doi.org/10.2307/1914288
Patz, R. J., & Junker, B. W. (1999a). Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24, 342–366. Retrieved from https://doi.org/10.3102/10769986024004342
Patz, R. J., & Junker, B. W. (1999b). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24, 146–178. Retrieved from https://doi.org/10.3102/10769986024002146
Patz, R. J., & Yao, L. (2007). Vertical scaling: Statistical models for measuring growth and achievement. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics volume 26: Psychometrics (pp. 955–975). Amsterdam: North-Holland. Retrieved from https://doi.org/10.1016/S0169-7161(06)26030-9
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2004). GLLAMM Manual (Second Edition). Berkeley, CA: U.C. Berkeley Division of Biostatistics Working Paper Series, University of California, Working Paper 160.
Ramsay, J. O. (2016). Functional approaches to modeling response data. In W. J. van der Linden (Ed.), Handbook of item response theory, volume one: Models (pp. 337–350). Boca Raton, FL: Chapman & Hall/CRC.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danmarks Paedagogiske Institut.
Rasch, G. (1961). On General Laws and the Meaning of Measurement in Psychology. Proceedings of the IV Berkeley Symposium on Mathematical Statistics and Probability, 4, 321–333.
Rasch, G. (1966). An individualistic approach to item analysis. In P. Lazarsfeld & N. V. Henry (Eds.), Readings in mathematical social science (pp. 89–108). Chicago: Science Research Associates.
Rasch, G. (1977). On specific objectivity: An attempt at formalizing the request for generality and validity of scientific statements. In M. Blegvad (Ed.), The Danish yearbook of philosophy. Copenhagen: Munksgaard.
Reckase, M. D. (2009). Multidimensional item response theory models. N.Y.: Springer. Retrieved from https://doi.org/10.1007/978-0-387-89976-3
Richardson, M. W. (1936). The relationship between the difficulty and the differential validity of a test. Psychometrika, 1, 33–49. Retrieved from https://doi.org/10.1007/BF02288003
Roberts, J. S., Donoghue, J. R., & Laughlin, J. E. (2000). A General Item Response Theory Model for Unfolding Unidimensional Polytomous Responses. Applied Psychological Measurement, 24, 3–32. Retrieved from https://doi.org/10.1177/01466216000241001
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, No. 17, 34, Part 2. Retrieved from https://doi.org/10.1007/BF03372160
Samejima, F. (2016). Graded response models. In W. J. van der Linden (Ed.), Handbook of item response theory, volume one: Models (pp. 95–107). Boca Raton, FL: Chapman & Hall/CRC.
Sijtsma, K., & Molenaar, I. W. (2002). Measurement Methods for the Social Science: Introduction to nonparametric item response theory. Thousand Oaks, CA: Sage Publications, Inc. Retrieved from https://doi.org/10.4135/9781412984676
Sitgreaves, R. (1961a). Further contributions to the theory of test design. In H. Solomon (Ed.), Studies in item analysis and prediction (pp. 46–63). Stanford, CA: Stanford University Press.
Sitgreaves, R. (1961b). Optimal test design in a special testing situation. In H. Solomon (Ed.), Studies in item analysis and prediction (pp. 29–45). Stanford, CA: Stanford University Press.
Sitgreaves, R. (1961c). A statistical formulation of the attenuation paradox in test theory. In H. Solomon (Ed.), Studies in item analysis and prediction (pp. 17–28). Stanford, CA: Stanford University Press.
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. Boca Raton, FL: Chapman & Hall/CRC. Retrieved from https://doi.org/10.1201/9780203489437
Solomon, H. (1956). Probability and statistics in psychometric research: Item analysis and classification techniques. In J. Neyman (Ed.), Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability (Vol. 5, pp. 169–184). Berkeley, CA: University of California Press.
Solomon, H. (1961). Classification procedures based on dichotomous response vectors. In H. Solomon (Ed.), Studies in item analysis and prediction (pp. 177–186). Stanford, CA: Stanford University Press.
Symonds, P. M. (1929). Choice of items for a test on the basis of difficulty. Journal of Educational Psychology, 20, 481–493. Retrieved from https://doi.org/10.1037/h0075650
Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association, 82, 528–540. Retrieved from https://doi.org/10.1080/01621459.1987.10478458
Thissen, D., & Cai, L. (2016). Nominal categories models. In W. J. van der Linden (Ed.), Handbook of item response theory, volume one: Models (pp. 51–73). Boca Raton, FL: Chapman & Hall/CRC.
Thissen, D., Cai, L., & Bock, R. D. (2010). The nominal categories item response model. In M. L. Nering & R. Ostini (Eds.), Handbook of polytomous item response theory models (pp. 43–75). New York, NY: Routledge.
Thissen, D., Reeve, B. B., Bjorner, J. B., & Chang, C.-H. (2007). Methodological issues for building item banks and computerized adaptive scales. Quality of Life Research, 16, 109–116. Retrieved from https://doi.org/10.1007/s11136-007-9169-5
Thissen, D., & Steinberg, L. (1984). A response model for multiple choice items. Psychometrika, 49, 501–519. Retrieved from https://doi.org/10.1007/BF02302588
Thissen, D., & Steinberg, L. (1997). A response model for multiple choice items. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 51–65). New York: Springer-Verlag. Retrieved from https://doi.org/10.1007/978-1-4757-2691-6_3
Thissen-Roe, A., & Thissen, D. (2013). A two-decision model for responses to Likert-type items. Journal of Educational and Behavioral Statistics, 38, 522–547. Retrieved from https://doi.org/10.3102/1076998613481500
Thorndike, E. L. (1913). An introduction to the theory of mental and social measurements (Second ed.). New York, NY: Teachers College, Columbia University. Retrieved from https://doi.org/10.1037/10866-000
Thorndike, E. L., Bregman, E. O., Cobb, M. V., Woodyard, E., & Institute of Educational Research, Division of Psychology, Teachers College, Columbia University. (1926). The measurement of intelligence. Teachers College Bureau of Publications. Retrieved from https://doi.org/10.1037/11240-000
Thurstone, L. L. (1925). A method of scaling psychological and educational tests. Journal of Educational Psychology, 16, 433–449. Retrieved from https://doi.org/10.1037/h0073357
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273–286. Retrieved from https://doi.org/10.1037/h0070288
Thurstone, L. L. (1928). Attitudes can be measured. American Journal of Sociology, 33, 529–554. Retrieved from https://doi.org/10.1086/214483
Thurstone, L. L. (1938). Primary mental abilities. Chicago: University of Chicago Press.
Thurstone, L. L., & Chave, E. J. (1929). The Measurement of Attitude. Chicago, IL: University of Chicago Press.
Tjur, T. (1982). A connection between Rasch’s item analysis model and a multiplicative Poisson model. Scandinavian Journal of Statistics, 9, 23–30.
Tucker, L. R. (1946). Maximum validity of a test with equivalent items. Psychometrika, 11, 1–13. Retrieved from https://doi.org/10.1007/BF02288894
van der Linden, W. J. (2016a). Handbook of item response theory, volume one: Models. Boca Raton, FL: Chapman & Hall/CRC. Retrieved from https://doi.org/10.1201/9781315374512
van der Linden, W. J. (2016b). Lognormal response time model. In W. J. van der Linden (Ed.), Handbook of item response theory, volume one: Models (pp. 261–282). Boca Raton, FL: Chapman & Hall/CRC. Retrieved from https://doi.org/10.1201/9781315374512
von Davier, M., & Lee, Y.-S. (Eds.). (2019). Handbook of Diagnostic Classification Models. New York, NY: Springer. Retrieved from https://doi.org/10.1007/978-3-030-05584-4
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York: Cambridge University Press. Retrieved from https://doi.org/10.1017/CBO9780511618765
Williams, V. S. L., Pommerich, M., & Thissen, D. (1998). A comparison of developmental scales based on Thurstone methods and item response theory. Journal of Educational Measurement, 35, 93–107. Retrieved from https://doi.org/10.1111/j.1745-3984.1998.tb00529.x
Wimmer, R. (2012). Likert Scale - Dr. Rensis Likert Pronunciation - Net Talk. Retrieved June 16, 2020, from https://www.allaccess.com/forum/viewtopic.php?t=24251
Wingersky, M. S., Barton, M. A., & Lord, F. M. (1982). LOGIST user’s guide. Princeton, NJ: Educational Testing Service.
Wright, B. D., & Douglas, G. A. (1977). Best procedures for sample free item analysis. Applied Psychological Measurement, 1, 281–295.
Wright, B. D., & Panchapakesan, N. (1969). A procedure for sample-free item analysis. Educational and Psychological Measurement, 29, 23–48. Retrieved from https://doi.org/10.1177/001316446902900102
Yen, W. M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30(3), 187–213. Retrieved from https://doi.org/10.1111/j.1745-3984.1993.tb00423.x
Yen, W. M., & Burket, G. R. (1997). Comparison of item response theory and Thurstone methods of vertical scaling. Journal of Educational Measurement, 34, 293–313. Retrieved from https://doi.org/10.1111/j.1745-3984.1997.tb00520.x