The rapid growth of urban transportation systems generates massive volumes of data, commonly referred to as big data. This study analyzes transportation patterns in the NYC Taxi Trip Records, a large-scale dataset that exhibits the key big data characteristics of volume, velocity, and variety. The K-Means clustering algorithm is applied to group taxi trips based on features such as trip distance, fare amount, and trip duration. Preprocessing includes data cleaning, feature engineering, sampling, and normalization. The optimal number of clusters is determined using the Elbow Method and the Silhouette Score. The results show that the dataset can be effectively grouped into three clusters representing distinct transportation patterns. These findings demonstrate that clustering techniques can extract meaningful insights from large-scale datasets and highlight their potential application in urban transportation planning.
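The pipeline summarized above (normalization, K-Means, and an Elbow-style comparison of inertia across candidate cluster counts) can be illustrated with a minimal sketch. This is not the authors' implementation: the trips are synthetic stand-ins for (trip distance, fare amount, trip duration), the helper names `make_synthetic_trips`, `min_max_normalize`, and `kmeans` are hypothetical, min-max scaling is assumed as the normalization step, and only the standard library is used so the example stays self-contained. The Silhouette Score is omitted for brevity.

```python
import random

def make_synthetic_trips(n_per_group=100, seed=42):
    """Synthetic (distance, fare, duration) rows; NOT real NYC taxi data."""
    rng = random.Random(seed)
    # Assumed group centers: short, medium, and long trips.
    groups = [(1.5, 8.0, 8.0), (5.0, 20.0, 20.0), (15.0, 50.0, 40.0)]
    trips = []
    for dist, fare, dur in groups:
        for _ in range(n_per_group):
            trips.append((
                max(0.1, rng.gauss(dist, 0.5)),
                max(1.0, rng.gauss(fare, 2.0)),
                max(1.0, rng.gauss(dur, 3.0)),
            ))
    return trips

def min_max_normalize(rows):
    """Scale each feature column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    inertia = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, inertia

trips = make_synthetic_trips()
scaled = min_max_normalize(trips)

# Elbow Method input: within-cluster sum of squares for each candidate k.
inertias = {k: kmeans(scaled, k)[2] for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 3))
```

Plotting the printed inertia values against k and looking for the point where the curve flattens is the usual way to read the Elbow Method. On real trip records, a library implementation such as scikit-learn's `KMeans` would likely be a more practical choice than this hand-written loop.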
Copyrights © 2026