Mauro Castelli
NOVA Information Management School (NOVA IMS), Universidade Nova de Lisboa, Campus de Campolide, 1070-312 Lisboa,

Published : 7 Documents

Journal : Emerging Science Journal

Machine Learning Bias in Predicting High School Grades: A Knowledge Perspective
Ricardo Costa-Mendes; Frederico Cruz-Jesus; Tiago Oliveira; Mauro Castelli
Emerging Science Journal Vol 5, No 5 (2021): October
Publisher : Ital Publication

DOI: 10.28991/esj-2021-01298

Abstract

This study focuses on the machine learning bias when predicting teacher grades. The experimental phase consists of predicting the grades of 11th- and 12th-grade Portuguese high school students and computing the bias and variance decomposition. In the base implementation, only the academic achievement critical factors are considered. In the second implementation, the preceding year’s grade is appended as an input variable. The machine learning algorithms in use are random forest, support vector machine, and extreme boosting machine. The reasons behind the poor performance of the machine learning algorithms are either the poor precision of the input space or the lack of a sound record of student performance. We introduce the new concept of knowledge bias and a new predictive model classification. Precision education would reduce bias by providing low-bias intensive-knowledge models. To avoid bias, it is not necessary to add knowledge to the input space. Low-bias extensive-knowledge models are achievable simply by appending the student’s earlier performance record to the model. The low-bias intensive-knowledge learning models promoted by precision education are suited to designing new policies and actions toward academic attainments. If the aim is solely prediction, opting for a low-bias knowledge-extensive model can be appropriate and correct.

Mathematics and Mother Tongue Academic Achievement: A Machine Learning Approach
Catarina Nunes; Ana Beatriz-Afonso; Frederico Cruz-Jesus; Tiago Oliveira; Mauro Castelli
Emerging Science Journal Vol 6 (2022): Special Issue "Current Issues, Trends, and New Ideas in Education"
Publisher : Ital Publication

DOI: 10.28991/ESJ-2022-SIED-010

Abstract

Academic achievement is of great interest to education researchers and practitioners. Several academic achievement determinants have been described in the literature, mostly identified by analyzing primary (sample) data with classic statistical methods. Despite their superiority, only recently have machine learning methods started to be applied systematically in this context. However, even when this is the case, the ability to draw conclusions is greatly hampered by the "black-box" effect these methods entail. We contribute to the literature by combining the efficiency of machine learning methods, trained with data from virtually every public upper-secondary student of a European country, with the ability to quantify exactly how much each driver impacts academic achievement in Mathematics and mother tongue, through the use of prototypes. Our results indicate that the most important general academic achievement inhibitor is the previous retainment. Legal guardian's education is a critical driver, especially in Mathematics, whereas gender is especially important for mother tongue, as female students perform better. Implications for research and practice are presented.

Deep Learning in Predicting High School Grades: A Quantum Space of Representation
Ricardo Costa-Mendes; Frederico Cruz-Jesus; Tiago Oliveira; Mauro Castelli
Emerging Science Journal Vol 6 (2022): Special Issue "Current Issues, Trends, and New Ideas in Education"
Publisher : Ital Publication

DOI: 10.28991/ESJ-2022-SIED-012

Abstract

This paper applies deep learning to the prediction of Portuguese high school grades. A deep multilayer perceptron and a multiple linear regression implementation are undertaken. The objective is to demonstrate the adequacy of deep learning as a quantitative explanatory paradigm when compared with the classical econometrics approach. The results encompass point predictions, prediction intervals, variable gradients, and the impact of an increase in the class size on grades. Deep learning’s generalization error is lower in the student grade prediction, and its prediction intervals are more accurate. The deep multilayer perceptron gradient empirical distributions largely align with the regression coefficient estimates, indicating a satisfactory regression fit. Based on gradient discrepancies, a student’s mother being an employer does not seem to be a positive factor. A benign paradigm shift concerning the balance between home and career affairs for both genders should be reinforced. The deep multilayer perceptron broadens the spectrum of possibilities, providing a quantum solution hinged on a universal approximator. In the case of an academic achievement-critical factor such as class size, where the literature is neither unanimous on its importance nor its direction, the multilayer perceptron formed three distinct clusters per the individual gradient signals.

The Benefits of Automated Machine Learning in Hospitality: A Step-By-Step Guide and AutoML Tool
Mauro Castelli; Diego Costa Pinto; Saleh Shuqair; Davide Montali; Leonardo Vanneschi
Emerging Science Journal Vol 6, No 6 (2022): December
Publisher : Ital Publication

DOI: 10.28991/ESJ-2022-06-06-02

Abstract

The manuscript presents a tool to estimate and predict data accuracy in hospitality by means of automated machine learning (AutoML). It uses a tree-based pipeline optimization tool (TPOT) as a methodological framework. The TPOT is an AutoML framework based on genetic programming, and it is particularly useful for generating classification models, performing regression analysis, and determining the most accurate algorithms and hyperparameters in hospitality. To demonstrate the presented tool’s real usefulness, we show that the TPOT findings provide further improvement, using a real-world dataset to convert key hospitality variables (customer satisfaction, loyalty) to revenue, with up to 93% prediction accuracy on unseen data.

A Genetic Programming Based Heuristic to Simplify Rugged Landscapes Exploration
Gloria Pietropolli; Giuliamaria Menara; Mauro Castelli
Emerging Science Journal Vol 7, No 4 (2023): August
Publisher : Ital Publication

DOI: 10.28991/ESJ-2023-07-04-01

Abstract

Some optimization problems are difficult to solve due to a considerable number of local optima, which may result in premature convergence of the optimization process. To address this problem, we propose a novel heuristic method for constructing a smooth surrogate model of the original function. The surrogate function is easier to optimize but maintains a fundamental property of the original rugged fitness landscape: the location of the global optimum. To create such a surrogate model, we consider a linear genetic programming approach coupled with a self-tuning fitness function. More specifically, to evaluate the fitness of the produced surrogate functions, we employ Fuzzy Self-Tuning Particle Swarm Optimization, a setting-free version of particle swarm optimization. To assess the performance of the proposed method, we considered a set of benchmark functions characterized by high noise and ruggedness. Moreover, the method is evaluated over different problem dimensionalities. The proposed approach reveals its suitability for performing the proposed task. In particular, experimental results confirm its capability to find the global argminimum for all the considered benchmark problems and all the domain dimensions taken into account, thus providing an innovative and promising strategy for dealing with challenging optimization problems.

Learning Curves Prediction for a Transformers-Based Model
Francisco Cruz; Mauro Castelli
Emerging Science Journal Vol 7, No 5 (2023): October
Publisher : Ital Publication

DOI: 10.28991/ESJ-2023-07-05-03

Abstract

One of the main challenges when training or fine-tuning a machine learning model concerns the number of observations necessary to achieve satisfactory performance. While, in general, more training observations result in a better-performing model, collecting more data can be time-consuming, expensive, or even impossible. For this reason, investigating the relationship between the dataset's size and the performance of a machine learning model is fundamental to deciding, with a certain likelihood, the minimum number of observations that are necessary to ensure a satisfactory-performing model is obtained as a result of the training process. The learning curve represents the relationship between the dataset’s size and the performance of the model and is especially useful when choosing a model for a specific task or planning the annotation work of a dataset. Thus, the purpose of this paper is to find the functions that best fit the learning curves of a Transformers-based model (LayoutLM) when fine-tuned to extract information from invoices. Two new datasets of invoices are made available for such a task. Combined with a third dataset already available online, 22 sub-datasets are defined, and their learning curves are plotted based on cross-validation results. The functions are fit using a non-linear least squares technique. The results show that both a bi-asymptotic and a Morgan-Mercer-Flodin function fit the learning curves extremely well. Also, an empirical relation is presented to predict the learning curve from a single parameter that may be easily obtained in the early stage of the annotation process.

Artificial Intelligence for Impact Assessment of Administrative Burdens
Victor Costa; Pedro Coelho; Mauro Castelli
Emerging Science Journal Vol 8, No 1 (2024): February
Publisher : Ital Publication

DOI: 10.28991/ESJ-2024-08-01-019

Abstract

This study proposes the use of Artificial Intelligence (AI) to automatize part of the legislative impact assessment process. In particular, the focus of this study is the automatic identification of administrative burdens from legislative documents. The goal of impact assessment for administrative burdens is to apply an evidence-based approach toward compliance costs generated by regulation. Employing advanced Natural Language Processing (NLP) techniques based on a transformer architecture, a system was specifically developed and tested using Portuguese legislation. The experimental phase assessed the system's ability to accurately and comprehensively identify administrative burdens. Experimental results demonstrated the system's effectiveness, showing its suitability for supporting the legislative impact assessment process by automating a time-consuming task. To the best of our knowledge, this is the first attempt concerning the use of AI for automatizing the identification of administrative burdens. The proposed system may provide governments and policymakers with a tool to speed up the legislative impact assessment process, thereby streamlining decision-making processes. Moreover, the use of AI can make the legislative impact assessment process less subjective, thus increasing its transparency and making citizens more confident about the impartiality of the process that leads to new legislation.