This study evaluates the reliability of chemistry laboratory practice assessments for Grade XII senior high school students by applying Generalizability Theory (G-theory) and a Decision Study (D-study). A total of 79 students took part, each randomly assigned to perform one of six chemistry laboratory practice tasks: electrolytes and nonelectrolytes, exothermic and endothermic reactions, enthalpy of neutralization between HCl and NaOH, acid-base titration, identification of acidic-basic properties, and electrolysis of CuSO₄. Each student was assessed independently by two chemistry teachers on seven performance criteria: equipment selection, procedure, data reading, analysis, conclusion, cleanliness, and time efficiency. The G-study used a nested-crossed model in which students were nested within laboratory practice tasks and crossed with raters. The results showed that variance due to raters (43.2%) and residual error (42.2%) dominated the total score variance, while the variance attributed to students nested within laboratory practice tasks was relatively low (14.6%). The D-study produced a generalizability coefficient (Eρ²) of 0.41 and a dependability index (Φ) of 0.26, indicating low reliability for both relative and absolute decisions. A D-study simulation showed that increasing the number of raters and laboratory practice tasks improved reliability, and that a configuration of six tasks assessed by nine raters is needed to achieve Eρ² ≥ 0.80. These findings underscore the importance of well-designed assessment systems, consistent rater training, and diverse task coverage for fair and dependable laboratory practice scoring. G-theory and D-study prove to be valuable tools for improving the quality of performance-based assessments in science education.
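The link between the reported variance percentages and the two coefficients can be illustrated with a minimal sketch. It assumes the standard single-facet D-study formulas, in which only the rater facet is averaged over: relative error is the residual variance divided by the number of raters, and absolute error also includes the rater variance. The function name `g_coefficients` and its parameters are hypothetical, and the task facet from the study's D-study simulation is not modeled here, so this sketch is not the authors' computation and is not expected to reproduce the six-task, nine-rater result.

```python
def g_coefficients(var_person, var_rater, var_residual, n_raters):
    """Return (E-rho-squared, Phi) for a design with raters as the only averaged facet.

    Assumption: standard single-facet D-study formulas; variance components
    may be given as percentages because only their ratios matter.
    """
    # Relative error: only the residual (person-by-rater) term affects rank order.
    rel_error = var_residual / n_raters
    # Absolute error: rater main effect also shifts the absolute score level.
    abs_error = (var_rater + var_residual) / n_raters

    e_rho2 = var_person / (var_person + rel_error)
    phi = var_person / (var_person + abs_error)
    return e_rho2, phi


# Variance percentages taken from the abstract: students within tasks 14.6,
# raters 43.2, residual 42.2, with two raters per student.
e_rho2, phi = g_coefficients(14.6, 43.2, 42.2, n_raters=2)
print(f"E-rho^2 = {e_rho2:.3f}, Phi = {phi:.3f}")
```

With two raters, this sketch gives values close to the reported Eρ² of 0.41 and Φ of 0.26; small differences are expected because the inputs are rounded percentages. The gap between the two coefficients comes entirely from the rater variance, which enters only the absolute error term.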
Copyrights © 2025