In recent years, efforts to detect traffic events from social media platforms have accelerated because of their extensive coverage and low cost. In studies conducted to date, tweets have been converted into numerical vectors using bag-of-words representations. However, bag-of-words representations do not take word order into account and suffer from several problems, such as sparsity. More recent studies have used supervised deep learning architectures with generic word embeddings obtained from sources such as Wikipedia. Word embeddings trained on such formally written corpora succeed in representing the general meanings of words, but they are limited both in coping with the noise in user-generated text and in representing domain-specific meanings of words. To overcome these problems, this study creates a domain-specific word embedding for the traffic domain from approximately 1.5 million tweets and concatenates it with a generic word embedding. In addition, two datasets were created, with 2 and 8 classes respectively. The concatenated word embedding was then tested on these datasets using convolutional neural network (CNN) and long short-term memory (LSTM) architectures. Experimental results show that the proposed approach provides a significant improvement over state-of-the-art methods on the generated data.
Keywords: Traffic event detection, Domain-specific word embedding, Twitter, Deep learning
Copyright © 2019
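The concatenation of a generic and a domain-specific word embedding mentioned in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical and is not taken from the study: the function name `concat_embeddings`, the toy vectors, their dimensions, and in particular the choice to fill a missing word with a zero vector are assumptions made only for illustration.

```python
from typing import Dict, List

Vector = List[float]


def concat_embeddings(
    vocab: List[str],
    generic: Dict[str, Vector],
    domain: Dict[str, Vector],
    generic_dim: int,
    domain_dim: int,
) -> Dict[str, Vector]:
    """Build one vector per word by joining its generic and domain vectors.

    Assumption: a word missing from either table gets a zero vector for
    that part, so every output vector has generic_dim + domain_dim entries.
    """
    table: Dict[str, Vector] = {}
    for word in vocab:
        g = generic.get(word, [0.0] * generic_dim)
        d = domain.get(word, [0.0] * domain_dim)
        table[word] = g + d  # list concatenation, not element-wise addition
    return table


# Toy data (hypothetical values): 2-dimensional generic vectors and
# 3-dimensional domain vectors.
generic_vectors = {"jam": [0.1, 0.2], "road": [0.3, 0.4]}
domain_vectors = {"jam": [0.9, 0.8, 0.7], "trfc": [0.5, 0.5, 0.5]}

table = concat_embeddings(
    ["jam", "road", "trfc"], generic_vectors, domain_vectors, 2, 3
)
print(table["jam"])   # [0.1, 0.2, 0.9, 0.8, 0.7]
print(table["trfc"])  # [0.0, 0.0, 0.5, 0.5, 0.5]
```

In this sketch a noisy token such as "trfc", absent from the generic table, still receives a usable representation from the domain part, which is the kind of gap a domain-specific embedding is meant to fill. The resulting table could serve as the embedding layer input of a CNN or LSTM classifier.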