Articles

Found 5 Documents

Analyzing Customer Spending Based on Transactional Data Using the Random Forest Algorithm
Siddique, Quba; Wahid, Arif Muamar
International Journal for Applied Information Management Vol. 5 No. 2 (2025): Regular Issue: July 2025
Publisher : Bright Institute

DOI: 10.47738/ijaim.v5i2.103

Abstract

This study explores customer spending behavior using transactional data from a retail dataset, employing a Random Forest Regressor to predict the total amount spent by customers. The dataset includes various customer attributes such as age, gender, and product category, alongside transactional details including quantity purchased and price per unit. Through Exploratory Data Analysis (EDA), it was found that Price and Quantity were the most significant factors influencing total spending, with other features like Age, Gender, and Product Category playing a minimal role in predicting spending behavior. The model achieved perfect accuracy, with an R-squared value of 1.000, indicating that it explained all the variance in customer spending. The findings suggest that transactional features, particularly Price and Quantity, are the key drivers of customer spending, and retailers can optimize their marketing and sales strategies by focusing on these factors. This study also highlights the importance of data preprocessing and feature engineering in enhancing model performance, though the results were limited by the lack of external and behavioral features. Future research could further explore the impact of customer loyalty, external factors, and more complex algorithms to improve predictive accuracy.
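
The R-squared value reported above can be read as follows. This is a minimal sketch on hypothetical transactions, not the authors' pipeline: it assumes, purely for illustration, that total spending equals quantity times price per unit (the abstract does not state this), and it uses a hand-written `r_squared` helper in place of any library call.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical transactions: (quantity, price per unit).
transactions = [(1, 50.0), (2, 30.0), (3, 25.0), (4, 500.0), (2, 300.0)]

# Assumption for illustration only: total = quantity * price.
total = [q * p for q, p in transactions]

# A predictor that has fully recovered that relationship leaves no residual.
predicted = [q * p for q, p in transactions]
score = r_squared(total, predicted)

# A baseline that always predicts the mean explains none of the variance.
baseline = [sum(total) / len(total)] * len(total)
baseline_score = r_squared(total, baseline)

print(score, baseline_score)
```

Under that assumption the target is a deterministic function of two input features, so an R-squared at or near 1 would say more about how the target was constructed than about customer behavior.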
A Comparative Analysis of Linear Regression and XGBoost Algorithms for Predicting GPU Prices Using Technical Specifications
Prakoso, Dendi Putra; Irfan, Muhammad; Siddique, Quba
International Journal of Informatics and Information Systems Vol 7, No 4: December 2024
Publisher : International Journal of Informatics and Information Systems

DOI: 10.47738/ijiis.v7i4.228

Abstract

This study investigates and compares the predictive performance of Linear Regression and XGBoost algorithms in estimating Graphics Processing Unit (GPU) prices based on their technical specifications. GPU prices are known for their high volatility, influenced not only by hardware characteristics—such as memory capacity, clock speed, and bandwidth—but also by external market factors including demand from the gaming industry, machine learning applications, and cryptocurrency mining activities. The dataset used in this research comprises 475 GPU units from three leading manufacturers—NVIDIA, AMD, and Intel Arc—featuring 15 technical attributes obtained from publicly accessible data sources. Adopting an experimental quantitative approach, the dataset was divided into training and testing subsets using an 80:20 ratio. The data preprocessing phase involved handling missing values, detecting outliers through the Interquartile Range (IQR) method, performing data normalization, and encoding categorical features. The models were evaluated using four performance metrics: the Coefficient of Determination (R²), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). The results demonstrate that XGBoost outperforms Linear Regression, achieving an R² of 0.8129, MAE of 85.07 USD, RMSE of 122.03 USD, and MAPE of 35.23%. In comparison, the Linear Regression model recorded an R² of 0.7629, MAE of 106.59 USD, RMSE of 137.38 USD, and MAPE of 56.04%. The superior performance of XGBoost can be attributed to its ability to model non-linear relationships and capture complex feature interactions among GPU specifications.
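
The four evaluation metrics named above can be sketched as below. The function name `regression_metrics` and the price values are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the GPU dataset.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R², MAE, RMSE and MAPE (in percent) for paired values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE divides each error by the true value, so it is undefined when
    # a true value is zero and grows quickly for low-priced items.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    return {"r2": r2, "mae": mae, "rmse": rmse, "mape": mape}

# Hypothetical prices in USD.
actual = [100.0, 200.0, 400.0, 800.0]
predicted = [110.0, 190.0, 380.0, 840.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Because MAPE is a relative error, the same absolute miss counts for more on a cheap item than on an expensive one, which is one way a model can show a moderate R² alongside a large MAPE.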
Comparative Analysis of Sentiment Classification Techniques on Flipkart Product Reviews: A Study Using Logistic Regression, SVC, Random Forest, and Gradient Boosting
Henderi; Siddique, Quba
Journal of Digital Market and Digital Currency Vol. 1 No. 1 (2024): Regular Issue June 2024
Publisher : Bright Publisher

DOI: 10.47738/jdmdc.v1i1.4

Abstract

Sentiment analysis plays a crucial role in e-commerce, providing valuable insights from customer reviews on platforms like Flipkart. This study aims to compare the effectiveness of various sentiment classification techniques, specifically Logistic Regression, Support Vector Classifier (SVC), Random Forest, and Gradient Boosting. The dataset, collected from Flipkart, consists of 205,052 product reviews spanning various categories. Key data preprocessing steps included handling missing values, removing duplicates, normalizing text, and applying TF-IDF vectorization for feature extraction. We implemented and tuned the hyperparameters for each algorithm using grid search and randomized search. The data was divided into training and testing sets with an 80-20 split, and cross-validation techniques ensured robust model evaluation. The performance of each model was assessed using several metrics: accuracy, precision, recall, F1-score, and ROC-AUC. The results revealed that Logistic Regression achieved an accuracy of 0.8995, precision of 0.8773, recall of 0.8995, an F1 score of 0.8736, and a ROC AUC score of 0.9105. The SVC model showed slightly higher accuracy at 0.8997, precision of 0.8619, recall of 0.8997, and an F1 score of 0.8738. The Random Forest model, while robust, had lower accuracy (0.7953) and struggled with precision (0.6326), recall (0.7953), and an F1 score of 0.7047, but achieved a ROC AUC score of 0.9037. Gradient Boosting performed comparably to Logistic Regression with an accuracy of 0.8993, precision of 0.8512, recall of 0.8993, an F1-score of 0.8735, and a ROC AUC score of 0.9098. Comparative analysis identified SVC and Logistic Regression as top performers, balancing accuracy and computational efficiency. These findings suggest that implementing these models can significantly enhance sentiment analysis in e-commerce, improving customer insights and business strategies. 
Future research should explore advanced deep learning techniques and address class imbalances to further refine sentiment analysis capabilities.
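
In the results above, each model's recall equals its accuracy. That pattern is what support-weighted averaging produces, since weighted recall is mathematically identical to accuracy. The sketch below assumes weighted averaging (the abstract does not say which averaging was used) and runs on hypothetical labels; `weighted_scores` is an illustrative helper, not a library function.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label in sorted(support):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_predicted = sum(p == label for p in y_pred)
        prec = tp / n_predicted if n_predicted else 0.0
        rec = tp / support[label]
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support[label] / n  # share of true samples in this class
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical imbalanced labels: 8 positive and 2 negative reviews,
# scored against a classifier that always predicts the majority class.
y_true = ["pos"] * 8 + ["neg"] * 2
y_pred = ["pos"] * 10
scores = weighted_scores(y_true, y_pred)
print(scores)
```

On imbalanced data such as this toy example, accuracy and weighted recall can stay high even when the minority class is never predicted, so per-class figures are worth reporting alongside them.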
Optimizing Pricing Strategies for Female Fashion Products Using Regression Analysis to Maximize Revenue and Profit in Digital Marketing
Siddique, Quba
Journal of Digital Market and Digital Currency Vol. 2 No. 1 (2025): Regular Issue March
Publisher : Bright Publisher

DOI: 10.47738/jdmdc.v2i1.28

Abstract

This study explores optimal pricing strategies for the female fashion sector through the application of advanced data science methodologies. Utilizing a dataset of 4,272 entries, comprising various attributes such as original prices, promotional prices, and discount percentages, we employed regression models to predict promotional pricing. The research highlights Ridge Regression as the most effective model, balancing high accuracy with reduced overfitting. The model achieved an R-squared (R²) value of 0.9999999999999678, a Mean Absolute Error (MAE) of 4.31×10⁻⁶, and a Mean Squared Error (MSE) of 4.89×10⁻¹¹, demonstrating its robustness and reliability. The study's findings indicate that dynamic pricing and tailored discount strategies can significantly enhance revenue and profitability. High-value items are best priced with moderate discounts, maintaining higher promotional prices, while low-value items benefit from aggressive discounting to drive sales volume. Sensitivity analysis further supported these strategies by showing that a 10% increase in original prices proportionally increased promotional prices, while a 10% increase in discount percentages led to lower promotional prices, affecting sales performance differently across product categories. Practical implications for e-commerce businesses include implementing dynamic pricing, developing targeted discount strategies, and timing promotions strategically. Regular sensitivity analysis and continuous model validation are recommended to adapt to market changes effectively. Future research should consider broader datasets, advanced modeling techniques, external market factors, and customer segmentation to enhance the generalizability and applicability of pricing strategies across different sectors.
This research underscores the importance of data-driven approaches in optimizing digital marketing strategies, offering actionable insights that can significantly boost revenue and profitability in the female fashion sector.
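
The sensitivity analysis described above can be illustrated with a small calculation. The formula in `promotional_price` is an assumption made for this sketch (promotional price as original price reduced by the discount percentage); the abstract does not give the relationship, and the prices are hypothetical.

```python
def promotional_price(original_price, discount_percent):
    """Assumed relationship: original price reduced by the discount."""
    return original_price * (1.0 - discount_percent / 100.0)

# Hypothetical item: original price 200, discount 25%.
base = promotional_price(200.0, 25.0)

# Scenario 1: raise the original price by 10%, discount unchanged.
higher_original = promotional_price(200.0 * 1.10, 25.0)

# Scenario 2: raise the discount percentage by 10% (25% becomes 27.5%).
higher_discount = promotional_price(200.0, 25.0 * 1.10)

print(base, higher_original, higher_discount)
print(higher_original / base)  # proportional response to the original price
```

Under this assumed formula the promotional price scales one-for-one with the original price and falls as the discount rises, which matches the direction of both effects reported in the abstract.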
Anomaly Detection in Blockchain Transactions within the Metaverse Using Anomaly Detection Techniques
Henderi; Siddique, Quba
Journal of Current Research in Blockchain Vol. 1 No. 2 (2024): Regular Issue September
Publisher : Bright Institute

DOI: 10.47738/jcrb.v1i2.17

Abstract

The rapid expansion of blockchain technology and its integration into the Metaverse has brought about significant opportunities, but also new challenges, particularly in ensuring the security and integrity of transactions. This study explores the application of anomaly detection techniques, specifically the Isolation Forest algorithm, to identify unusual and potentially fraudulent transactions within a blockchain dataset. The analysis focuses on detecting anomalies across various transaction types, such as sales and scams, and regions including Asia and Africa. The dataset, comprising 78,600 transactions, revealed that 3,930 (approximately 5%) were flagged as anomalies. "Sale" and "Scam" transactions were found to be particularly vulnerable, accounting for the majority of anomalies. Geographical analysis highlighted that Asia and Africa had the highest average risk scores, indicating a higher prevalence of high-risk transactions in these regions. Visualizations further emphasized the distribution of anomalous activities, providing valuable insights into regional and transaction-specific risks. The study demonstrates the effectiveness of Isolation Forest in detecting anomalies within blockchain transactions and underscores the importance of targeted security measures. The findings suggest that focusing on high-risk transaction types and regions can enhance blockchain security. Future research is encouraged to explore additional anomaly detection methods and integrate network analysis to further refine the detection of suspicious activities in decentralized networks. This research contributes to the growing body of knowledge on blockchain security, offering practical insights for improving the detection and mitigation of risks in the increasingly complex and interconnected world of the Metaverse.
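
The reported count of 3,930 anomalies is exactly 5% of 78,600 transactions. One way such a figure can arise is when a fixed share of the highest-scoring records is flagged. The sketch below shows only that thresholding step: the 0.05 contamination rate is an assumption (the abstract does not state the setting), the random scores are stand-ins for Isolation Forest anomaly scores, and `flag_top_fraction` is a hypothetical helper.

```python
import random

def flag_top_fraction(scores, contamination):
    """Flag the given fraction of records with the highest scores."""
    n_flagged = round(len(scores) * contamination)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flagged])
    return [i in flagged for i in range(len(scores))]

# Stand-in anomaly scores for 78,600 transactions (not real model output).
rng = random.Random(0)
scores = [rng.random() for _ in range(78_600)]

flags = flag_top_fraction(scores, contamination=0.05)
n_anomalies = sum(flags)
print(n_anomalies, n_anomalies / len(scores))
```

With a fixed contamination rate the number of flagged records is set in advance, so the count alone says little about how much fraud the data actually contains; the makeup of the flagged set by transaction type and region is the more informative part.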