- KPI raw data (5,855,201 rows)
- CDR raw data (9,232,275 rows)
- Twitter API (44,989 tweets)

- Keras: Open-source neural network library. The deep learning engine used to train the models.
- BERT: Pre-trained language model. The embedding layer.
- VaderSentiment: Sentiment analysis algorithm for social media content. The sentiment analysis tool.
- PostgreSQL: Open-source relational database management system. Stores data in psql tables.
- Grafana: Open-source metric analytics and visualization suite. Visualizes the anomaly results and the Twitter sentiment analysis results.
- Docker: A tool for building and running distributed applications. Application deployment and environment version control.

[Figure: model comparison. BERT Embedding + LSTM: 87.1%; Keras Embedding + LSTM: 51.33%.]

[Figure: Twitter pipeline. Labels: Twitter API, live streaming data, VaderSentiment, BERT Embedding, Customer Issue Content (CSV file, 3 labels, 42,267 rows).]

[Figure: KPI pipeline. Labels: KPI STD data, LSTM, reconstruction error calculation, dimension reduction, AutoEncoder (200, 39, 36, 24, 36, 39), input shape 39 dimensions, output shape 200 dimensions, output shape 39 dimensions, data, anomalies.]

As telecommunication companies seek to improve network coverage and maintain customer satisfaction, they need to constantly monitor customer complaints and the status of their networks. This project, in collaboration with Tupl Inc., seeks to reduce the latency between the time a network coverage area experiences issues and the time the network operator notices them. Deep learning models process several types of telecom data to predict which data points are anomalies.
- Call detail record (CDR) data from customer complaints indicates whether customers have recurring cellular issues.
- Key performance indicator (KPI) data indicates which base stations are down at a given time.
- Tweets reveal customer dissatisfaction on social media that isn’t directly reported to the company.
[Figure: visualization panels. Panel titles: Sentiment Trends; Tweets location with sentiment value; Sentiment Distribution; Issue label Distribution; Detail of tweets; Anomaly station on topology map; Twitter Sentiment Analysis; Anomaly & Standard Deviation Heatmap; Anomalies.]

[Figure: classification model. Labels: LSTM layer (256 units), dropout = 0.1; Dense layer (100 units, relu), dropout = 0.1; Dense layer (3 units, softmax).]

[Figure: CDR pipeline. Labels: CDR raw data (41 features); 2/3G data; 4G VoLTE data; KPI-Like Data (5 features); KPI-Like Data (8 features); autoencoder schematic with input nodes x1 to x7, hidden nodes a1 to a5 (two layers) and a1 to a3, and output nodes x1 to x7; reconstruction error; 2/3G Anomaly Result; 4G Anomaly Result.]

[Figure: autoencoder for anomaly detection. Labels: Input Layer 1432; Hidden Layer 256; Output Layer 1432; 716; 716; Encoder; Decoder; Find Anomalies; Otsu Threshold.]

[Figure: tweet classification flow. Labels: Training Data; Input Shape (30, 768); Classification Result; Telecom-related tweet; Tweet with sentiment value.]

For the Upload Traffic Volume feature, we plotted the input data (blue) against the model's predictions (orange). We calculated the reconstruction error between the two series and set an anomaly threshold for this feature to identify the anomalies.

[Figure: IMSFaultCodeRate vs. Prediction IMSFaultCodeRate. Axis: Index of Site ID.]
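The reconstruction-error thresholding described above can be illustrated with a short NumPy sketch. It is not the project's code: the function names `reconstruction_error` and `otsu_threshold` are hypothetical, the use of per-row mean squared error is an assumption (the poster does not say how the error is computed), and the data is synthetic, with noisy copies standing in for an autoencoder's predictions. Otsu's method is used because the poster names an "Otsu Threshold"; the histogram-based version below is one common way to compute it.

```python
import numpy as np

def reconstruction_error(x, x_hat):
    """Per-row mean squared error between inputs and reconstructions (assumed metric)."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return np.mean((x - x_hat) ** 2, axis=1)

def otsu_threshold(values, bins=256):
    """Otsu's method on 1-D values: choose the cut that maximizes between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist).astype(float)        # samples in bins 0..k
    w1 = w0[-1] - w0                          # samples in bins k+1..end
    s0 = np.cumsum(hist * centers)            # running sum of values (bin-center approx.)
    m0 = s0 / np.maximum(w0, 1.0)             # mean of the lower class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1.0)  # mean of the upper class
    between = w0 * w1 * (m0 - m1) ** 2        # between-class variance (unnormalized)
    k = int(np.argmax(between))
    return edges[k + 1]                       # upper edge of the last lower-class bin

# Synthetic stand-in for model output: small noise everywhere,
# plus a large offset on the first five rows to mimic poor reconstruction.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))
x_hat = x + rng.normal(scale=0.05, size=x.shape)
x_hat[:5] += 3.0

errors = reconstruction_error(x, x_hat)
threshold = otsu_threshold(errors)
anomalies = np.flatnonzero(errors >= threshold)  # row indices flagged as anomalies
print(anomalies, threshold)
```

In practice the threshold would be computed separately for each feature or each model output, as the caption above describes for Upload Traffic Volume; Otsu's method works best when the error distribution is clearly bimodal.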