Data Simulation Using Unidimensional Item Response Theory
Abstract
This chapter addresses data simulation within the framework of unidimensional Item Response Theory (IRT), integrating theoretical and applied perspectives. It presents the mathematical foundations of IRT models for dichotomously and polytomously scored items, and explains the fundamental assumptions of IRT, parameter estimation techniques, and the steps to follow in Monte Carlo simulation studies. In the application section, an example research question is used to demonstrate, stage by stage, how an IRT-based Monte Carlo simulation study can be conducted in the R programming language. The stages covered, each accompanied by the relevant R code, are: constructing the simulation design for the research question, generating datasets under different test lengths and sample sizes, carrying out simulation validity checks on the generated datasets, estimating item and ability parameters, calculating bias and root mean square error (RMSE) for the parameters, and visualizing the findings. The chapter thus offers researchers an applied methodological guide to IRT-based Monte Carlo simulation studies.
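For orientation, the evaluation criteria named in the abstract can be written out as follows. This is a sketch under assumed notation, not the chapter's own: the symbols are introduced here for illustration, and the two-parameter logistic (2PL) model is shown only as one common example of a dichotomous IRT model. Here \theta_j is the ability of examinee j, a_i and b_i are the discrimination and difficulty of item i, \pi is the true (generating) value of any parameter, and \hat{\pi}_r is its estimate in replication r of R replications.

\[
P(X_{ij} = 1 \mid \theta_j) = \frac{\exp\{a_i(\theta_j - b_i)\}}{1 + \exp\{a_i(\theta_j - b_i)\}}
\]

\[
\mathrm{Bias}(\hat{\pi}) = \frac{1}{R}\sum_{r=1}^{R}\left(\hat{\pi}_r - \pi\right),
\qquad
\mathrm{RMSE}(\hat{\pi}) = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\left(\hat{\pi}_r - \pi\right)^{2}}
\]

With the variance of the estimates across replications also computed with divisor R, the two criteria are linked by \mathrm{RMSE}^2 = \mathrm{Bias}^2 + \mathrm{Var}(\hat{\pi}), so RMSE reflects both systematic error and sampling variability.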