Euis Yanah Mulyanah
Universitas Sultan Ageng Tirtayasa

Psychometric Evidence for Using FSI Speaking Ratings in Indonesian Primary EFL Classrooms: Content Validity and Inter-Rater Reliability
Euis Yanah Mulyanah; Yudi Juniardi; Lukman Nulhakim
AL-ISHLAH: Jurnal Pendidikan Vol 18, No 1 (2026): MARCH 2026
Publisher : STAI Hubbulwathan Duri

DOI: 10.35445/alishlah.v18i1.9642

Abstract

Reliable and valid speaking assessment is crucial for accurately interpreting young learners’ communicative competence in English as a Foreign Language (EFL). Although the Foreign Service Institute (FSI) Speaking Ratings are widely used with adult learners, empirical evidence supporting their adaptation for primary school learners, particularly in Indonesia, remains limited. This study used a quantitative psychometric validation design to examine the content validity and inter-rater reliability of an adapted FSI scale. Participants were six expert validators (two media, two language, and two material experts), two trained raters, and 30 Grade V students from a public primary school. The scale was modified to suit young learners’ characteristics while retaining its five domains. Each student performed a 2–3 minute monologue based on visual prompts; the performances were video-recorded and scored independently by the two raters. Content validity was assessed using Aiken’s V, and inter-rater reliability was analyzed using a two-way random-effects Intraclass Correlation Coefficient (ICC) with absolute agreement. Aiken’s V coefficients ranged from 0.50 to 1.00, with a mean of 0.87 across 54 indicators, indicating strong content validity. The ICC results demonstrated consistent scoring between raters, suggesting satisfactory inter-rater reliability. The findings provide initial psychometric support for the adapted FSI Speaking Ratings in primary EFL contexts, enhancing assessment objectivity and standardization. However, limitations include a small sample size, a limited number of raters, single-site data, and the absence of construct validity analysis. Future studies should address these constraints to strengthen generalizability and validation.
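
For readers unfamiliar with the two statistics named in the abstract, the following is a minimal sketch of how they are commonly computed. It is not the authors' analysis code. The abstract does not state the validators' rating scale or whether the ICC was a single-measure or average-measure coefficient, so the sketch assumes a 1 to 5 scale and the single-measure ICC(2,1) form of Shrout and Fleiss. The function names and all example ratings are hypothetical.

```python
import numpy as np


def aikens_v(ratings, lo, hi):
    """Aiken's V for one indicator.

    ratings: one rating per expert validator.
    lo, hi:  lowest and highest points of the rating scale (assumed here).
    V = sum(r - lo) / (n * (hi - lo)), which lies between 0 and 1.
    """
    r = np.asarray(ratings, dtype=float)
    return (r - lo).sum() / (r.size * (hi - lo))


def icc_2_1(scores):
    """Single-measure ICC, two-way random effects, absolute agreement.

    scores: 2-D array, rows = students, columns = raters.
    Uses the mean squares of a two-way ANOVA without replication.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape                       # n students, k raters
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between students
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical ratings from six validators on a 1-5 scale for one indicator.
v = aikens_v([5, 4, 5, 4, 5, 5], lo=1, hi=5)   # 22 / 24, about 0.917
print(f"Aiken's V: {v:.3f}")

# Hypothetical scores: five students (rows) rated by two raters (columns).
demo_scores = [[3, 3], [4, 5], [2, 2], [5, 4], [3, 4]]
print(f"ICC(2,1): {icc_2_1(demo_scores):.3f}")
```

Absolute agreement means a rater who is consistently harsher than the other lowers the coefficient, because the between-rater mean square enters the denominator; a consistency-type ICC would ignore that systematic offset.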