Analysis of the national examination exercises quality on natural science subjects in Indonesian elementary schools

Nurul Hamidah, Edi Istiyono


An analysis of item characteristics is needed to determine the quality of the tests used in national examination tryouts. This research aims to analyze the quality of the national examination questions for the natural science subject in elementary schools. The sample was drawn by simple random sampling. Response data were obtained from 250 grade VI elementary school students who took the tryout. The data were collected with a test instrument consisting of 30 multiple-choice items. The analysis was carried out with the BILOG-MG program to estimate the item discrimination and item difficulty parameters. Under classical test theory (CTT), 37% of the items fell in the easy category, 43% in the medium category, and 20% in the difficult category. Under the two-parameter logistic (2PL) item response theory (IRT) model, 11% of the items were very easy, 26% easy, 29% medium, 20% difficult, and 14% very difficult. For item discrimination, CTT classified 8 items as poor, whereas the 2PL IRT model classified only 2 items as poor. The model fit analysis showed that the 2PL IRT model was the most appropriate model for analyzing the elementary school science tryout questions. The study concluded that the national examination science questions for elementary schools met the criteria for difficulty level and discrimination power and can therefore be considered of good quality. This study reminds developers of school examination questions in educational units to check the quality of items before they are used.
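The abstract reports item statistics from two frameworks but does not give the formulas or the category cut-offs. The following is a minimal Python sketch, not the authors' procedure, assuming 0/1-scored responses, the CTT difficulty index p (proportion correct), the point-biserial item-total correlation as the CTT discrimination index, and the 2PL item characteristic curve P(θ) = 1 / (1 + exp(-D·a·(θ - b))) with the conventional scaling constant D = 1.7. The cut-offs (p > 0.70 easy, 0.30 to 0.70 medium, p < 0.30 difficult; b cut at -2, -1, 1 and 2 for the five IRT categories; r < 0.20 poor discrimination) and all function names are hypothetical illustrations. The sketch does not reproduce BILOG-MG, which estimates a and b by marginal maximum likelihood.

```python
import math

# Minimal sketch. Cut-offs and names below are illustrative assumptions,
# not values reported by the study.
CTT_EASY, CTT_HARD = 0.70, 0.30   # assumed p-value cut-offs
POOR_DISCRIMINATION = 0.20        # assumed point-biserial cut-off


def ctt_difficulty(responses, item):
    """CTT difficulty index p: proportion of examinees answering `item` correctly."""
    return sum(row[item] for row in responses) / len(responses)


def ctt_category(p):
    """Map a p-value to easy / medium / difficult (assumed cut-offs)."""
    if p > CTT_EASY:
        return "easy"
    if p >= CTT_HARD:
        return "medium"
    return "difficult"


def point_biserial(responses, item):
    """Pearson correlation between the 0/1 item score and the total score.

    The total includes the item itself (uncorrected item-total correlation).
    """
    n = len(responses)
    scores = [row[item] for row in responses]
    totals = [sum(row) for row in responses]
    mean_x, mean_y = sum(scores) / n, sum(totals) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, totals)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in scores) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in totals) / n)
    if sd_x == 0 or sd_y == 0:
        return 0.0  # no variance: correlation undefined, report 0
    return cov / (sd_x * sd_y)


def p_correct_2pl(theta, a, b, D=1.7):
    """2PL item characteristic curve: P(correct | theta) for discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def irt_category(b):
    """Map a 2PL difficulty b to five categories (assumed cut-offs at -2, -1, 1, 2)."""
    if b < -2:
        return "very easy"
    if b < -1:
        return "easy"
    if b <= 1:
        return "medium"
    if b <= 2:
        return "difficult"
    return "very difficult"


if __name__ == "__main__":
    # Toy 0/1 response matrix: 4 examinees x 3 items (made-up data).
    responses = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
    for i in range(3):
        p = ctt_difficulty(responses, i)
        r = point_biserial(responses, i)
        flag = "poor" if r < POOR_DISCRIMINATION else "acceptable"
        print(f"item {i + 1}: p={p:.2f} ({ctt_category(p)}), r_pb={r:.2f} ({flag})")
    print(p_correct_2pl(theta=0.5, a=1.2, b=0.5))  # theta == b gives 0.5
```

The point-biserial is used here as one common CTT discrimination index; the upper-lower group index is an equally common alternative, and the abstract does not state which one the authors used.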


item difficulty level; item discrimination power; item quality; national examination exercises



Copyright (c) 2021 Institute of Advanced Engineering and Science

International Journal of Evaluation and Research in Education (IJERE)
p-ISSN: 2252-8822, e-ISSN: 2620-5440


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.