Sampling design is crucial for large-scale assessments. International surveys such as PISA, TIMSS, and PIRLS use stratified random sampling (StRS) to improve estimation accuracy, ensure that all subpopulations are represented, and make administration efficient. Indonesia's National Assessment (AN) likewise applies StRS, stratifying the population by school size, class size, and gender. However, the reliability and validity of the AN sampling method have not been tested since its implementation in 2021. This study compares the reliability and validity of the AN sampling method with those of simple random sampling (SRS). Reliability is assessed by the consistency of estimates across repeated sampling, indicated by small standard errors (SE) and narrow confidence intervals (CI). Validity is how accurately sample estimates reflect population parameters, evaluated through mean squared error (MSE). Using AN data from 1.9 million of the 4.2 million junior high school students, the analysis shows no significant differences between StRS and SRS in estimates of national population parameters: both methods produce similar means (55) and standard deviations (10.7). However, StRS shows greater variability in sampling weights, reflecting its ability to account for the sampling structure. At the school level, StRS outperforms SRS, yielding narrower CI and MSE ranges and thus higher reliability. Although the MSE differences are statistically significant, their practical impact is minor because the effect size is small and the dataset is large. These results suggest that StRS is more reliable for school-level reporting.
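The reliability and validity criteria described above can be illustrated with a minimal simulation sketch. It does not use AN data and does not reproduce the study's results: the three strata, their sizes, means, and spreads are hypothetical and deliberately well separated, proportional allocation is an assumption, and the names `make_population`, `srs_mean`, `strs_mean`, and `evaluate` are invented for illustration. The sketch draws repeated samples under SRS and StRS, then reports the SE of the mean estimate, an approximate 95% CI half-width (1.96 × SE), and the MSE against the known synthetic population mean.

```python
import random
import statistics


def make_population(seed=0):
    """Build a synthetic population with three hypothetical strata."""
    rng = random.Random(seed)
    return {
        "small": [rng.gauss(40, 5) for _ in range(2000)],
        "medium": [rng.gauss(55, 5) for _ in range(5000)],
        "large": [rng.gauss(70, 5) for _ in range(3000)],
    }


def srs_mean(strata, n, rng):
    """Simple random sampling: ignore strata and sample from the pooled units."""
    pooled = [x for units in strata.values() for x in units]
    return statistics.fmean(rng.sample(pooled, n))


def strs_mean(strata, n, rng):
    """Stratified sampling with proportional allocation (an assumption here).

    Each stratum mean is weighted by the stratum's share of the population.
    """
    total = sum(len(units) for units in strata.values())
    estimate = 0.0
    for units in strata.values():
        share = len(units) / total
        n_h = max(1, round(n * share))  # sample size allocated to this stratum
        estimate += share * statistics.fmean(rng.sample(units, n_h))
    return estimate


def evaluate(estimator, strata, n=200, reps=300, seed=1):
    """Repeat the sampling and summarise reliability (SE, CI) and validity (MSE)."""
    rng = random.Random(seed)
    pooled = [x for units in strata.values() for x in units]
    true_mean = statistics.fmean(pooled)
    estimates = [estimator(strata, n, rng) for _ in range(reps)]
    se = statistics.stdev(estimates)  # spread of estimates across repetitions
    mse = statistics.fmean([(e - true_mean) ** 2 for e in estimates])
    return {"se": se, "ci_half_width": 1.96 * se, "mse": mse}


strata = make_population()
srs = evaluate(srs_mean, strata)
strs = evaluate(strs_mean, strata)

for name, result in (("SRS", srs), ("StRS", strs)):
    print(name, {key: round(value, 4) for key, value in result.items()})
```

In this synthetic setting the stratified estimator benefits because most of the variance lies between strata. How large the gain is in practice depends on how strongly the stratification variables relate to the outcome.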
                        
                        
                        
                        
                            
Copyright © 2025