Luo, Xiaofei
Unknown Affiliation

Published: 2 Documents
Explainable Multi-Hop Question Answering for QA Assistants: Two-Hop Evidence Retrieval, Sentence-Level Supporting Facts, and Explicit Reasoning Paths
Luo, Xiaofei
JTIE: Journal of Technology Informatics and Engineering, Vol. 5 No. 1 (2026): April
Publisher: University of Science and Computer Technology

DOI: 10.51903/jtie.v5i1.504

Abstract

Multi-hop question answering (QA) for customer-facing assistants requires not only accurate answers but also an auditable evidence trail that explains how the system arrived at each answer. We present a fully interpretable multi-hop QA pipeline that decomposes inference into three explicit modules—Retriever → Evidence Selector → Reasoner—and produces an explanation consisting of sentence-level supporting facts and an explicit two-hop evidence path. The retriever ranks candidate paragraphs using lexical IDF-weighted token overlap; the evidence selector chooses a small set of high-scoring sentences; and the reasoner extracts a final answer using weighted candidate phrase scoring and deterministic rules for date/number and constrained yes/no comparisons. We conduct full experimental evaluations on the complete development splits of HotpotQA (7,405 questions, distractor setting) and 2WikiMultihopQA (12,576 questions). On HotpotQA, sentence-level evidence selection improves Supporting Fact F1 from 0.334 to 0.419, and adding an explicit two-hop retrieval path further increases Supporting Fact F1 to 0.426 while raising paragraph recall@2 to 0.603. Answer F1 increases from 0.084 to 0.088. On 2WikiMultihopQA, evidence selection improves Supporting Fact F1 from 0.328 to 0.429 and Answer F1 from 0.071 to 0.075. These results quantify the contribution of explicit evidence selection and path-constrained retrieval to explainability and provide a practical, reproducible baseline for knowledge assistants that must justify answers with supporting facts.
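The first stage of the pipeline described above, ranking candidate paragraphs by lexical IDF-weighted token overlap, can be sketched as follows. This is an illustrative reconstruction rather than the paper's released code; the whitespace tokenization, smoothed IDF formula, and function names are assumptions.

```python
import math
from collections import Counter

def idf_weights(paragraphs):
    # Document frequency over the candidate paragraphs, smoothed so
    # every observed token receives a finite positive weight.
    n = len(paragraphs)
    df = Counter()
    for p in paragraphs:
        df.update(set(p.lower().split()))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}

def rank_paragraphs(question, paragraphs, k=2):
    # Score each paragraph by the summed IDF weight of the tokens it
    # shares with the question; return the indices of the top-k.
    idf = idf_weights(paragraphs)
    q_tokens = set(question.lower().split())
    scored = []
    for i, p in enumerate(paragraphs):
        overlap = q_tokens & set(p.lower().split())
        scored.append((sum(idf[t] for t in overlap), i))
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]
```

With k=2, the paragraph recall@2 metric from the abstract corresponds to checking whether both gold paragraphs appear among the returned indices; the evidence selector would then score sentences within these top-ranked paragraphs.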
Natural-Language Policy Reasoning with Proof Generation: Turning Platform Rules into Verifiable Knowledge
Luo, Xiaofei
JTIE: Journal of Technology Informatics and Engineering, Vol. 4 No. 2 (2025): August
Publisher: University of Science and Computer Technology

DOI: 10.51903/jtie.v4i2.505

Abstract

Policy and compliance systems increasingly express rules in natural language, yet enforcement requires deterministic decisions and auditable explanations. This paper studies a practical pipeline that converts natural-language facts and rules into a verifiable knowledge base, answers queries with three-valued semantics (True/False/Unknown), and produces machine-checkable proofs. The contribution is system-level rather than a new reasoning formalism: we integrate controlled-language parsing, symbolic proof extraction, independent proof checking, and proof-based supervision in a single auditable framework. We evaluate the pipeline on two natural-language rule-reasoning benchmarks: (i) a balanced subset of ProofWriter’s open-world-assumption tasks (360 train, 360 test), and (ii) a RuleTaker-style dataset generated from its grammar and label semantics (1,800 train, 900 test), both balanced across reasoning depths 0–5. We compare a text-only logistic regression baseline, a retrieval-based “proof” baseline, a symbolic forward-chaining reasoner with proof extraction, and a proof-trained classifier using generated proofs. To ensure fairness, LR-text and LR-proof share the same TF-IDF/logistic-regression setup, and the retrieval baseline uses the same representation with a fixed top-4 configuration. On ProofWriter-Balanced, the symbolic reasoner achieves 0.803 accuracy (0.808 macro-F1), while proof-trained classification reaches 0.825 accuracy (0.825 macro-F1). On RuleTaker-Rep, both methods achieve 1.000 accuracy. Proof verifiability clearly separates faithful from post-hoc explanations: symbolic proofs are verifiable for all predictions, whereas retrieval-based proofs are verifiable for only 31.4%. Sensitivity analyses varying reasoning depth, distractors, and proof corruption show that proof-based methods remain robust to noise but depend on proof integrity. These findings demonstrate the feasibility of auditable natural-language policy reasoning in controlled settings, while highlighting limitations in parser coverage and benchmark regularity.
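The core loop described in the abstract, forward chaining with proof extraction, three-valued answers, and an independent proof checker, can be sketched as below. This is a simplified propositional illustration of the general approach, not the paper's implementation: the actual benchmarks involve negation and richer rule forms, and all names here are hypothetical.

```python
def forward_chain(facts, rules):
    # Derive all reachable conclusions from Horn-style rules
    # (premise-tuple, conclusion), recording how each was derived
    # so a proof tree can be reconstructed afterwards.
    known = set(facts)
    proof = {f: ("fact", []) for f in facts}
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                proof[conclusion] = ("rule", list(premises))
                changed = True
    return known, proof

def extract_proof(goal, proof):
    # Recursively unfold the derivation record into a proof tree.
    kind, premises = proof[goal]
    return (goal, kind, [extract_proof(p, proof) for p in premises])

def answer(query, known, proof):
    # Three-valued semantics: True if derived, False if the explicit
    # negation is derived, Unknown otherwise (open-world style).
    neg = query[4:] if query.startswith("not ") else "not " + query
    if query in known:
        return "True", extract_proof(query, proof)
    if neg in known:
        return "False", extract_proof(neg, proof)
    return "Unknown", None

def check_proof(tree, facts, rules):
    # Independent checker: accept a proof only if every leaf is a given
    # fact and every internal step matches a rule in the rule base.
    goal, kind, subs = tree
    if kind == "fact":
        return goal in facts
    premises = tuple(s[0] for s in subs)
    return (premises, goal) in set(rules) and all(
        check_proof(s, facts, rules) for s in subs)
```

The checker is deliberately separate from the reasoner; this separation is what lets verifiability distinguish faithful proofs (every step re-checks against the rule base) from post-hoc retrieval-based "proofs", as measured in the abstract.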