International Journal of Advances in Artificial Intelligence and Machine Learning
Vol. 2 No. 2 (2025)

Analyzing Bias in Large Language Models: A Quantitative Study Using Sentiment and Demographic Metrics

Mandava, Ramya



Article Info

Publish Date
28 May 2025

Abstract

Background of study: The widespread adoption of Large Language Models (LLMs) raises concerns about biases that affect fairness and credibility. As LLMs influence areas such as recruitment and customer service, systematic quantitative analysis is essential to identify and mitigate these biases.

Aims and scope of paper: This research quantitatively investigates demographic bias in LLMs by analyzing sentiment polarity scores across demographic categories. The goal is to provide a statistically validated analysis of sentiment bias and to propose mitigation methods, focusing on GPT-4, LLaMA-2, Claude, and BLOOM.

Methods: Quantitative analysis was performed on GPT-4, LLaMA-2, Claude, and BLOOM using sentiment and demographic data. Sentiment polarity scores for gender and racial/ethnic groups were obtained with VADER and TextBlob. The Demographic Disparity Score, ANOVA, and Cohen's Kappa were used to assess the magnitude and statistical significance of bias, and inter-rater reliability between the automated tools and human annotators was also evaluated.

Result: Sentiment bias was found in all models, varying by gender and race, and was most pronounced in GPT-4 and Claude. Sentiment scores were consistently higher for queries about females than for queries about males across all models, with GPT-4 and Claude showing the largest gaps. Claude also exhibited racial sentiment skew, scoring queries about white people more positively than queries about Black people. ANOVA confirmed statistically significant sentiment variation by demographic group in all models, and high inter-rater reliability validated the sentiment analysis.

Conclusion: This study demonstrates demographic bias in GPT-4, LLaMA-2, Claude, and BLOOM, with distinct sentiment trends across demographic categories. The models produced more positive sentiment for female-related queries and favored certain racial groups, indicating bias embedded in the training data and raising ethical concerns. Identifying and addressing these biases is critical to ensuring fairness and credibility in real-world LLM applications.
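The metrics named in the abstract can be sketched in a few lines of stdlib Python. Note the assumptions: the Demographic Disparity Score definition below (largest absolute gap between a group's mean sentiment and the overall mean) is an illustrative choice, not necessarily the paper's exact formulation, and the polarity values are toy numbers standing in for VADER compound or TextBlob polarity scores in [-1, 1].

```python
from statistics import mean

def disparity_score(groups):
    """Illustrative Demographic Disparity Score: the largest absolute
    gap between any group's mean sentiment and the overall mean."""
    overall = mean(s for g in groups.values() for s in g)
    return max(abs(mean(g) - overall) for g in groups.values())

def anova_f(groups):
    """One-way ANOVA F statistic across demographic groups:
    between-group variance divided by within-group variance."""
    all_scores = [s for g in groups.values() for s in g]
    grand = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ssw = sum((s - mean(g)) ** 2 for g in groups.values() for s in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

# Toy polarity scores per demographic group (hypothetical values,
# standing in for VADER/TextBlob output on model responses).
scores = {
    "female": [0.62, 0.55, 0.70, 0.48],
    "male":   [0.31, 0.42, 0.28, 0.35],
}
print(disparity_score(scores))  # ≈ 0.124
print(anova_f(scores))          # ≈ 19.51
```

A large F statistic (compared against the F distribution's critical value for the given degrees of freedom) is what lets the study call a sentiment gap statistically significant rather than noise.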

Copyrights © 2025






Journal Info

Abbrev

ijaaiml

Publisher

Subject

Computer Science & IT

Description

The International Journal of Advances in Artificial Intelligence and Machine Learning (IJAAIML) is a prominent academic journal dedicated to publishing cutting-edge research and developments in the fields of Artificial Intelligence (AI) and Machine Learning (ML). It serves as an essential platform ...