The growing use of technology in society has increased the volume of online buying and selling, and with it the competition between e-commerce companies. To compete, SIRCLO, an e-commerce company, needs to analyze the data generated by the transaction activity in its online shops, but that analysis requires a system that can read the raw data. This research therefore designs an infrastructure that can read those data. The design uses Apache Drill as the query engine, HDFS as the file system, and a script written in Python to convert data from MySQL to JSON. Data are first converted from the data source (MySQL in this research) to JSON, then stored in HDFS, where Apache Drill queries the files. Apache Drill was chosen for its flexibility: it can query plain-text files using MySQL-style syntax and follows a schema-free concept. HDFS was chosen in the expectation that reading data from a distributed file system would be more effective and offer better data management. The research was conducted under several scenarios that vary the number of servers used and the size of the file; the parameters measured are resource usage and the processing time of each activity. The research produced a design and a set of components that can read SIRCLO's data: data can be acquired from MySQL and normalized to JSON, and once the design is implemented, the infrastructure can process SIRCLO's data.
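The MySQL-to-JSON conversion step could look roughly like the sketch below. It is illustrative only and is not the script used in this research: the function names, the column names, and the sample rows are all hypothetical, and the rows are hard-coded dictionaries standing in for what a MySQL cursor would return. It writes one JSON object per line, a layout chosen here on the assumption that it suits files later queried from HDFS.

```python
import json
from datetime import date, datetime
from decimal import Decimal


def to_json_safe(value):
    """Convert column types that the json module cannot serialize by default."""
    if isinstance(value, Decimal):
        # Keep exact decimal digits by emitting a string (an assumption; a float
        # would also work if rounding is acceptable).
        return str(value)
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    raise TypeError(f"Unsupported type: {type(value).__name__}")


def rows_to_json_lines(rows):
    """Serialize an iterable of row dictionaries as one JSON object per line."""
    return "\n".join(
        json.dumps(row, default=to_json_safe, sort_keys=True) for row in rows
    )


# Hypothetical rows standing in for the result of a MySQL query.
sample_rows = [
    {"order_id": 1, "total": Decimal("150000.00"),
     "created_at": datetime(2018, 1, 5, 10, 30)},
    {"order_id": 2, "total": Decimal("89000.50"),
     "created_at": datetime(2018, 1, 6, 14, 0)},
]

json_lines = rows_to_json_lines(sample_rows)
print(json_lines)
```

In a setup like the one described, the resulting text would be written to a file and copied into HDFS before being queried; those steps are omitted here because they depend on the cluster configuration.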
Copyrights © 2018