Comparative Evaluation of Estimation Methods in Item Response Theory (IRT) for High-Stakes Economics Examinations among SSS III Students in Oyo State, Nigeria

*¹Abass Adekunle Sulaiman and ²Dorcas S. Daramola

¹Department of Social Sciences Education, University of Ilorin, Ilorin, Kwara State, Nigeria. Email: princeatilola4u@gmail.com

²Department of Social Sciences Education, University of Ilorin, Ilorin, Kwara State, Nigeria. Email: olatunji.ds@unilorin.edu.ng

Abstract

This study compared Maximum Likelihood (ML) and Bayesian estimation methods in Item Response Theory (IRT) to determine their implications for high-stakes testing, using WAEC Economics multiple-choice items as a case study. The study adopted a descriptive survey and correlational research design. The population comprised all Senior Secondary School III (SSS III) students offering Economics in public secondary schools in Oyo State, Nigeria. A multi-stage sampling procedure was used to select 1,200 students from 20 secondary schools. The 2021 WAEC Economics multiple-choice items served as the research instrument. Data were analysed with descriptive and inferential statistics using Xcalibre 4.2 and SPSS version 25. The mean, standard deviation, skewness, and kurtosis were used to describe examinees’ ability distribution and item parameter estimates, while the Pearson Product Moment Correlation (PPMC) was used to test the hypotheses at the 0.05 level of significance. Findings revealed that examinees’ ability estimates obtained through ML and Bayesian methods were moderately and positively correlated, indicating consistency between the two estimation techniques. Similarly, item difficulty and discrimination indices showed strong and significant positive correlations between the two methods, suggesting comparability in parameter estimation. However, the guessing parameter exhibited a weak and non-significant correlation, implying notable differences in the estimation of pseudo-guessing between ML and Bayesian approaches. Overall, the results indicate that ML and Bayesian estimation methods can be used interchangeably for estimating examinees’ ability, item difficulty, and discrimination in high-stakes testing, but caution is required in interpreting guessing parameters. The study concludes that the choice of estimation method has important implications for the fairness, accuracy, and credibility of high-stakes examinations and recommends the combined or complementary use of both methods in large-scale assessment programmes.

Keywords: Item Response Theory, Maximum Likelihood Estimation, Bayesian Estimation, Ability Estimation, Item Difficulty, Item Discrimination, Guessing Parameter
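The kind of comparison summarised in the abstract can be illustrated with a minimal Python sketch. It is not the study's procedure: the study used Xcalibre 4.2 and SPSS on real WAEC responses, whereas this sketch simulates responses under an assumed three-parameter logistic (3PL) model, treats the item parameters as known, and estimates only ability. The grid search for ML, the expected a posteriori (EAP) estimator with a standard normal prior as the Bayesian method, and all names, sample sizes, and parameter ranges are illustrative assumptions.

```python
import math
import random


def p_3pl(theta, a, b, c):
    """3PL probability of a correct response: c + (1 - c) * logistic(a * (theta - b))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses, items):
    """Log-likelihood of one examinee's 0/1 responses at ability theta."""
    total = 0.0
    for u, (a, b, c) in zip(responses, items):
        p = p_3pl(theta, a, b, c)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


# Assumed ability grid from -4 to 4 in steps of 0.1.
GRID = [-4.0 + 0.1 * k for k in range(81)]


def ml_estimate(responses, items):
    """Grid-search ML estimate of theta (bounded by the grid limits)."""
    return max(GRID, key=lambda t: log_likelihood(t, responses, items))


def eap_estimate(responses, items):
    """Bayesian EAP estimate of theta under an assumed N(0, 1) prior."""
    log_post = [log_likelihood(t, responses, items) - 0.5 * t * t for t in GRID]
    top = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - top) for lp in log_post]
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Simulated data only (hypothetical values, not the WAEC data).
rng = random.Random(0)
items = [(rng.uniform(0.8, 2.0), rng.uniform(-2.0, 2.0), 0.2) for _ in range(40)]
true_thetas = [rng.gauss(0.0, 1.0) for _ in range(200)]
data = [
    [1 if rng.random() < p_3pl(t, a, b, c) else 0 for (a, b, c) in items]
    for t in true_thetas
]

ml = [ml_estimate(resp, items) for resp in data]
eap = [eap_estimate(resp, items) for resp in data]
r = pearson_r(ml, eap)
print(f"Pearson r between ML and EAP ability estimates: {r:.3f}")
```

The same `pearson_r` helper could be applied to paired item difficulty, discrimination, or guessing estimates exported from any calibration software, which is the form of comparison the hypotheses describe.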

Rima International Journal of Education (RIJE)