The binary classification of truth and lies is often inadequate in criminal investigations, as statements are frequently intentionally neither entirely true nor entirely false. This ambiguity in the veracity of a statement demands more nuanced methods, such as explainable models. Explainable models, particularly SHapley Additive exPlanations (SHAP), can help dissect statements and narrow down the information requiring more thorough investigation. Data from the Miami University Deception Database, comprising various statements and their veracity, were analyzed for linguistic features. This research uses Bidirectional Encoder Representations from Transformers (BERT) embeddings to provide contextual understanding of statements and sentiment lexicons to provide domain-specific knowledge. Results show that the 2-gram embedding performed best, with an R² (coefficient of determination) of 0.39, capturing more context than the 1-gram embedding while remaining more general than the 3-gram and 4-gram embeddings. Each BERT embedding variant proved substantially more effective than general word embeddings such as GloVe, Word2Vec, and FastText. SHAP values captured key points of interest within a statement by isolating pivotal, decision-driving tokens. These results highlight potential indicators of deceptive or truthful language, such as the words ‘something’ and ‘our’. Such points of interest can help human investigators focus their attention and intervention.
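As a rough illustration of the workflow described above, the following is a minimal sketch, not the authors' implementation: it regresses veracity scores on 2-gram features and uses SHAP to attribute predictions to individual features. The toy statements, veracity scale, Ridge regressor, and TF-IDF featurization are all illustrative assumptions; the paper's actual pipeline uses BERT 2-gram embeddings on the Miami University Deception Database.

```python
"""Minimal sketch (assumed setup, not the authors' pipeline):
regress continuous veracity scores on 2-gram features, report R²,
and use SHAP to surface influential features per statement."""
import shap
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# Hypothetical stand-in for the deception database: statements paired
# with continuous veracity scores in [0, 1] (assumed scale).
statements = [
    "our team finished the report together",
    "i did something after work i guess",
    "we reviewed the figures before submitting",
    "maybe something came up that evening",
]
veracity = [0.9, 0.2, 0.8, 0.1]

# 2-gram features, analogous to the best-performing embedding variant
# in the paper (which used BERT embeddings rather than TF-IDF).
vectorizer = TfidfVectorizer(ngram_range=(2, 2))
X = vectorizer.fit_transform(statements).toarray()

model = Ridge().fit(X, veracity)
print("R^2 (training):", r2_score(veracity, model.predict(X)))

# SHAP values attribute each prediction to individual 2-gram features,
# surfacing candidate indicators of deceptive or truthful language.
explainer = shap.LinearExplainer(model, X)
shap_values = explainer(X)
for name, value in zip(vectorizer.get_feature_names_out(),
                       shap_values.values[0]):
    if abs(value) > 1e-9:
        print(f"{name}: {value:+.3f}")
```

A BERT-based classifier could be substituted by wrapping a Hugging Face text-classification pipeline in `shap.Explainer`, which yields the token-level highlights described in the results.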