Deciphering Digital Social Dynamics: A Comparative Study of Logistic Regression and Random Forest in Predicting E-Commerce Customer Behavior
Sunarya, Po Abas; Rahardja, Untung; Chen, Shih Chih; Lic, Yung-Ming; Hardini, Marviola
Journal of Applied Data Sciences Vol 5, No 1: JANUARY 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i1.155

Abstract

This study compares Logistic Regression and Random Forest in predicting e-commerce customer churn. Utilizing the E-commerce Customer dataset, it navigates the complexities of customer interactions and behaviors, offering a rich context for analysis. The methodology focuses on meticulous data preprocessing to ensure data integrity, setting the stage for applying and evaluating Logistic Regression and Random Forest. Both models were assessed using accuracy, precision, recall, F1-Score, and AUC-ROC. Logistic Regression showed an accuracy of 90%, precision of 91% for class 0 and 82% for class 1, recall of 98% for class 0 and 50% for class 1, F1-Score of 94% for class 0 and 62% for class 1, and AUC-ROC of 0.88. Random Forest, with its ability to handle complex patterns, demonstrated higher overall performance with an accuracy of 95%, precision of 95% for class 0 and 93% for class 1, recall of 99% for class 0 and 74% for class 1, F1-Score of 97% for class 0 and 82% for class 1, and an AUC-ROC of 0.97. This comparative analysis offers insights into each model's strengths and suitability for predicting customer churn. The findings contribute to a deeper understanding of machine learning applications in e-commerce, guiding stakeholders in enhancing customer retention strategies. This research provides a foundation for further exploration into the digital social dynamics that shape customer behavior in the evolving digital marketplace.
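
The per-class F1-Scores in the abstract above can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal sketch of that arithmetic only; the function and variable names are illustrative, and it does not reproduce the study's models or data.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract, per model and class.
reported = {
    ("Logistic Regression", 0): (0.91, 0.98),
    ("Logistic Regression", 1): (0.82, 0.50),
    ("Random Forest", 0): (0.95, 0.99),
    ("Random Forest", 1): (0.93, 0.74),
}

# Recompute F1 for each pair, rounded to two decimals like the abstract.
f1_by_model = {key: round(f1_score(p, r), 2) for key, (p, r) in reported.items()}

for (model, cls), f1 in f1_by_model.items():
    print(f"{model}, class {cls}: F1 = {f1:.2f}")
```

The recomputed values (0.94 and 0.62 for Logistic Regression, 0.97 and 0.82 for Random Forest) agree with the F1-Scores stated in the abstract.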

Customer Segmentation and Targeted Retail Pricing in Digital Advertising using Gaussian Mixture Models for Maximizing Gross Income
Hariguna, Taqwa; Chen, Shih Chih
Journal of Digital Market and Digital Currency Vol. 1 No. 2 (2024): Regular Issue September
Publisher : Bright Publisher

DOI: 10.47738/jdmdc.v1i2.11

Abstract

This study investigates the application of Gaussian Mixture Models (GMM) for customer segmentation and targeted pricing strategies in the retail industry to maximize gross income. Using a dataset of 1000 transaction records, the analysis focused on attributes such as unit price, quantity, total amount, and payment methods. The dataset was preprocessed to handle missing values, encode categorical features, and scale numerical features. The optimal number of components for the GMM was determined using the Bayesian Information Criterion (BIC), resulting in the selection of 10 clusters. Model training was conducted using the Expectation-Maximization (EM) algorithm, achieving convergence after 18 iterations. Customer segments were identified and analyzed based on their purchasing behaviors and demographic traits. For instance, Segment 0 preferred bulk purchases of lower-priced items, while Segment 1 favored higher-priced items in smaller quantities, resulting in a higher average purchase value of 2274.19. Conversely, Segment 2 showed a high frequency of returns, indicated by a negative average purchase value of -2608.40. Targeted pricing strategies were developed for each segment, aiming to maximize gross income. The effectiveness of the segmentation and pricing strategies was evaluated using metrics such as the silhouette score, with training and testing scores of 0.175 and 0.015 respectively, highlighting areas for improvement in clustering quality. This study underscores the potential of GMM in uncovering distinct customer segments and tailoring pricing strategies to enhance profitability. Future research should explore alternative clustering techniques and extend the analysis to other retail domains and larger datasets to validate and improve the findings. The practical implications for retail businesses include the need for iterative testing and refinement of pricing strategies based on customer segmentation to achieve sustainable growth and customer satisfaction.
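
For readers unfamiliar with how the Bayesian Information Criterion trades fit against model size, the sketch below shows the standard formula, BIC = k ln(n) - 2 ln(L), together with the free-parameter count of a Gaussian mixture. It assumes full covariance matrices, which the abstract does not specify, and the feature count in the usage line is a hypothetical value chosen for illustration. The study's log-likelihoods are not reported, so none are filled in.

```python
import math


def gmm_free_parameters(n_components: int, n_features: int) -> int:
    """Free parameters of a Gaussian mixture, ASSUMING full covariance matrices."""
    weights = n_components - 1                 # mixing weights sum to 1
    means = n_components * n_features          # one mean vector per component
    covariances = n_components * n_features * (n_features + 1) // 2  # symmetric matrices
    return weights + means + covariances


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """BIC = k * ln(n) - 2 * ln(L); the candidate with the lowest value is preferred."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


# Hypothetical illustration: 10 components (as in the abstract) over 4 features
# (the feature count is an assumption, not a figure from the study).
n_params_example = gmm_free_parameters(n_components=10, n_features=4)
print(n_params_example)
```

Because the penalty term grows with the parameter count, a 10-component model is selected only when its gain in log-likelihood outweighs that penalty on the 1000 records.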

Analyzing Sentiment Trends and Patterns in Bitcoin-Related Tweets Using TF-IDF Vectorization and K-Means Clustering
Wahyuningsih, Tri; Chen, Shih Chih
Journal of Current Research in Blockchain Vol. 1 No. 1 (2024): Regular Issue June
Publisher : Bright Institute

DOI: 10.47738/jcrb.v1i1.11

Abstract

This study conducts a comprehensive analysis of Bitcoin-related tweets to understand sentiment trends and patterns using TF-IDF vectorization and K-means clustering. The dataset, comprising 1,544 unique tweets, was collected via the Twitter API and preprocessed to remove duplicates and clean the text. Sentiment analysis revealed a distribution of 53.7% neutral, 29.7% positive, and 16.6% negative tweets, indicating a predominant neutral sentiment in the discourse. Keyword analysis identified frequent terms such as 'bitcoin' (479 occurrences), 'new' (46), 'good' (43), 'crypto' (39), and 'trade' (39). Visualizations through word clouds highlighted the specific language associated with each sentiment category, with positive tweets focusing on opportunities and innovation, while negative tweets emphasized risks and scams. Cluster analysis using K-means, with the optimal number of clusters determined by the elbow method, resulted in three distinct clusters. Cluster 0, comprising 1,346 tweets, was characterized by neutral and informative content, focusing on market updates and trading strategies. Cluster 1, with 163 tweets, contained a higher concentration of positive sentiment, highlighting positive developments and investment opportunities. Cluster 2, the smallest with 35 tweets, focused on negative sentiment, reflecting concerns about market volatility and fraudulent activities. These clusters provided a nuanced understanding of the thematic composition of Bitcoin-related tweets. The study's findings have practical implications for investors, traders, and market analysts by providing insights into market mood and sentiment trends. The integration of these findings into predictive models can enhance market prediction accuracy and develop more effective trading strategies. 
Despite the study's contributions, limitations such as the dataset's language and scope suggest areas for future research, including real-time sentiment analysis and the incorporation of multimodal data sources. This research advances the field of sentiment analysis in financial markets, particularly within the context of cryptocurrencies, by offering a detailed and longitudinal examination of social media sentiment.
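
As a rough illustration of the TF-IDF vectorization step named in the abstract above, the sketch below weights each term by its within-document frequency times the log of its inverse document frequency. This is the textbook variant, written in plain Python; the abstract does not say which implementation the authors used (library versions such as scikit-learn's apply a smoothed formula by default), and the three toy documents are invented for the example, not taken from the dataset.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Textbook TF-IDF: (count / doc length) * ln(n_docs / document frequency)."""
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Invented toy documents, already tokenized (NOT from the study's tweets).
toy_docs = [
    "bitcoin price new high".split(),
    "bitcoin scam warning".split(),
    "bitcoin trade good".split(),
]
vectors = tf_idf(toy_docs)
print(vectors[1])
```

In this toy corpus "bitcoin" occurs in every document and so receives a weight of zero, which shows why a ubiquitous term contributes little to separating clusters even when it is by far the most frequent keyword.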

Determinants of Virtual Property Prices in Decentraland an Empirical Analysis of Market Dynamics and Cryptocurrency Influence
Wahyuningsih, Tri; Chen, Shih Chih
International Journal Research on Metaverse Vol. 1 No. 2 (2024): Regular Issue September
Publisher : Bright Publisher

DOI: 10.47738/ijrm.v1i2.12

Abstract

This study explores the emerging virtual property market within the digital world, with a focus on identifying the key factors influencing property prices, market activity, and sales volume. Using a dataset of 2,000 virtual property transactions, the research provides a comprehensive analysis of market dynamics in this new frontier of digital real estate. The findings reveal significant volatility in transaction activity, with a peak of 1,222 transactions in January 2022 followed by a sharp decline to 539 in February 2022 and just 24 in March 2022, indicative of a nascent and speculative market. The analysis identifies land price as the most significant determinant of virtual property values, showing a near-perfect correlation of 0.992 with sales prices. This highlights the critical role of location and land value, similar to traditional real estate markets. Additionally, the study finds that properties attracting more bids tend to sell at higher prices, with a moderate correlation of 0.380 between bids count and sales price, reflecting the impact of competitive bidding in driving up values. However, the market is relatively illiquid, with a mean sales count of just 1.79, indicating that most properties are held as long-term investments rather than frequently traded assets. Interestingly, the research also uncovers a weak negative correlation of -0.051 between sales price and the underlying cryptocurrency, MANA, suggesting that the value of virtual properties may be increasingly decoupled from cryptocurrency volatility as the market matures. These insights provide valuable guidance for investors, developers, and policymakers navigating the evolving landscape of virtual real estate. The study concludes with a discussion of the implications for future market stability and potential areas for further research.
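
The correlation figures quoted in the abstract above (0.992, 0.380, -0.051) are the kind of values a correlation coefficient produces on paired columns. The sketch below computes Pearson's r in plain Python, assuming that is the coefficient the authors used, which the abstract does not state. The two short lists are invented placeholder values, not data from the study.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Invented placeholder values (NOT from the study): sale price exactly
# proportional to land price, so r comes out at 1.
land_price = [1.0, 2.0, 3.0, 4.0]
sale_price = [2.0, 4.0, 6.0, 8.0]
r_example = pearson_r(land_price, sale_price)
print(r_example)
```

A value near 1, like the reported 0.992, means the two columns move almost in lockstep, while a value near 0, like -0.051, indicates almost no linear relationship.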