AI Sentiment Analysis using Google AI and Google Cloud
I built this automated real-time sentiment analysis system to explore better ways of building an automated data pipeline that monitors customer sentiment. I tried a lot of different big data technologies for this project, including Hortonworks products, MapR products, Microsoft Azure data products, and Google Cloud data products. For the final project, I chose Google AI and Google Cloud products. In this article, I explain how I used Google AI and GCP to build an automated analytics system that analyzes popular products from both technical and business perspectives. In this case study, I analyzed Apple products using the pipeline and made product recommendations based on the analysis.
The data pipeline is made up of several components: a data input stream, a natural language processing engine, real-time database storage, long-term database storage, a real-time messaging queue, and a real-time dashboard.
1. The Twitter real-time streaming API provides the data input stream that feeds the pipeline.
2. The real-time data stream is then processed by a natural language processing engine.
3. The processed data is written to the Google Firebase real-time database and a Google BigQuery table, and is also published to the Eon Messaging Queue.
4. The Eon dashboard project subscribes to the Eon Messaging Queue and displays the real-time tweet analysis on the dashboard.
The keywords that the Twitter real-time streaming API filters on are "Apple Watch, Apple Music, iPhone 8, iPhone X, IOS, iPhone 7, Macbook Pro, Apple TV, Apple pay, iMac, macOS High Sierra". Those keywords are only for the minimum viable product (MVP). In future projects, I plan to use different keywords for a specific product line such as Apple Music. See the Roadmap section for more information.
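To make the keyword filter concrete, here is a minimal local sketch of matching tweet text against that keyword list. It is an illustration only: the function names are hypothetical, it runs on plain strings rather than calling the Twitter API, and the streaming API's own matching rules may differ from the whole-phrase, case-insensitive matching assumed here.

```python
import re

# Keywords copied from the MVP filter list above.
KEYWORDS = [
    "Apple Watch", "Apple Music", "iPhone 8", "iPhone X", "IOS",
    "iPhone 7", "Macbook Pro", "Apple TV", "Apple pay", "iMac",
    "macOS High Sierra",
]


def build_keyword_pattern(keywords):
    """Compile one case-insensitive regex that matches any keyword as a whole phrase."""
    alternatives = "|".join(re.escape(keyword) for keyword in keywords)
    # \b keeps short keywords such as "IOS" from matching inside longer words.
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


KEYWORD_PATTERN = build_keyword_pattern(KEYWORDS)


def matched_keywords(text):
    """Return the distinct keywords found in text, lowercased, in order of first appearance."""
    found = []
    for match in KEYWORD_PATTERN.finditer(text):
        keyword = match.group(0).lower()
        if keyword not in found:
            found.append(keyword)
    return found
```

The word-boundary check matters for short keywords: without it, "IOS" would also match inside a word like "curiosity".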
After the Twitter API pushes the real-time stream of tweets about Apple products to the Google Natural Language API through the app I built, the Natural Language API analyzes them in real time, parses them into tokens, and gives every tweet a score and a magnitude. The app then takes the analyzed results from the Natural Language API and sends the data to the Firebase real-time database and Google BigQuery.
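The flow in this paragraph could be sketched roughly as follows. This is not the app's actual code: `analyze` is a hypothetical stand-in for a call to the Natural Language API, and each entry in `sinks` is a hypothetical stand-in for a writer to Firebase or BigQuery. Passing them in as plain callables keeps the sketch runnable without any cloud credentials.

```python
from dataclasses import dataclass, asdict
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class AnalyzedTweet:
    text: str
    score: float      # sentiment polarity; the Natural Language API reports -1.0 to 1.0
    magnitude: float  # sentiment strength; 0.0 and up
    tokens: List[str]


def process_tweet(
    text: str,
    analyze: Callable[[str], Tuple[float, float, List[str]]],
    sinks: Iterable[Callable[[Dict], None]],
) -> AnalyzedTweet:
    """Analyze one tweet, then hand the same record to every storage sink."""
    # `analyze` is assumed to return (score, magnitude, tokens) for the text.
    score, magnitude, tokens = analyze(text)
    record = AnalyzedTweet(text, score, magnitude, list(tokens))
    for sink in sinks:
        # Each sink gets its own copy so one writer cannot alter another's data.
        sink(asdict(record))
    return record
```

In a real deployment, each sink would wrap the corresponding client library call, and failures in one sink would need their own handling so that a slow or failing store does not block the others.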
This is what the Google dashboard looked like while the Google Natural Language API and the Google BigQuery API were processing the real-time tweet data for Apple products.
Google Natural Language API + BigQuery API Processing the Data
You can check all the messages you send to the Eon messaging queue in their console.
In addition, Firebase has some cool A/B testing features that you can use to design an A/B testing experiment. We can customize this pipeline to grab user behavior data directly from Apple products such as the Apple Music app, and then run A/B tests easily by reusing this pipeline with the Firebase A/B testing tools. You can set up goals and multivariate test cells.
You can also measure A/B testing results with statistical analysis.
Roadmap
● This is the first phase of the sentiment analysis project for Apple products. It filters on all Apple products to analyze people's attitude towards Apple products in general.
● In the second phase of the project, we can filter by a specific product line or a specific event, such as Apple Music or WWDC, to identify customers' attitudes towards that product line or event.
● In the third phase, we can use machine learning algorithms such as clustering to improve the analytics: cluster the issues that customers complain about and rank those issues for further examination. By adding this functionality, we can be more proactive and understand the top product issues in real time.
● In the fourth phase, this system can be expanded to all social media platforms. We can grab information from all social media to get a larger, more random sample of data.
● In the fifth phase, we can run real-time analysis on competitors' products to identify the strengths and weaknesses of their products in order to identify opportunities that we can tackle as well as threats we need to be prepared for.
● In the sixth phase, we can expand this automated real-time data analytics system to the call center to analyze the reasons why people call and to predict call volumes and ways to reduce calls.
● In the seventh phase, we can use production log information as input data to find out production issues in our applications and predict issues in production.
● In the eighth phase, we can use this pipeline to grab user behavior data from Apple devices and apps to design A/B tests and a recommendation system.
Prioritization
We can prioritize the roadmap using a couple of frameworks, along with experience and intuition. The first framework that I love to use is the importance versus satisfaction framework.
For example, say we need to decide the priority between phase 5 and phase 6. Should we expand the functionality to analyze competitors' products first, or analyze call center issues first? Since this is an internal analytical system, our customer is Apple, including Apple employees and stakeholders. We can first collect information about Apple's current solutions for competitor analysis and call center analysis. Does Apple already have a good data analytics solution for one or both of them? Which issue is more important to Apple? Then we can pick the more important issue that Apple is not currently satisfied with as the first priority.
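One simple way to turn this comparison into arithmetic is to rate each candidate on importance and on satisfaction with the current solution, then rank by the gap between the two. The sketch below assumes both ratings share the same scale and uses importance to break ties; the function name and this particular scoring rule are illustrative assumptions, not part of the framework's definition.

```python
def rank_by_opportunity(candidates):
    """Rank candidate names by (importance - satisfaction), largest gap first.

    `candidates` maps a name to an (importance, satisfaction) pair on the
    same scale. Ties on the gap are broken by higher importance.
    """
    def sort_key(name):
        importance, satisfaction = candidates[name]
        return (importance - satisfaction, importance)

    return sorted(candidates, key=sort_key, reverse=True)
```

For instance, with purely hypothetical ratings of importance 8 and satisfaction 6 for competitor analysis, and importance 7 and satisfaction 3 for call center analysis, the call center work ranks first because its gap is larger.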
The second framework that I like is the Kano model.
We can prioritize features based on whether they are must-have, nice to have, or delightful to have. We can build a comparison matrix against competing products according to the Kano model to see which strategies and features are worth investing in to gain a competitive advantage.
Another way to prioritize is return on investment (ROI).
In this return on investment chart, Ideas A and G clearly have a higher ROI than any other idea on the chart, so those two can be prioritized.
Prioritization can't rely solely on those frameworks. Experience and intuition are important as well. The business world is not perfect, and we can't always get all the information we need to make an absolutely correct decision. We need to be able to make decisions faster and with fewer errors, even with little or no information. That decision-making ability, or intuition, can be built in many ways, such as work experience, building side projects, running a side business, and reading books.
Measure the Success
How do we know if the data pipeline project works for Apple? We can measure the result with a couple of metrics. One of my favorite KPI frameworks is AARRR (Acquisition, Activation, Retention, Referral, Revenue).
At first glance, you might not think those metrics apply to an internal project. I challenge you to think about running an internal project as a real business. For example, we kick off the project by building it for one particular team, or just as a POC, and then we socialize it inside the company. We gather acquisition and activation data as we sell it to more product lines and departments. The same goes for retention and referral rates. We can use revenue metrics as well: for example, we can measure the cost savings plus the revenue from new business opportunities identified by this real-time analytics system. We should also simplify measurement and data collection to keep the cost low, because we don't want to consume too many resources while serving internal customers.
This pipeline has a lot of potential to help popular product lines. As you can see in this case study, Apple could save costs and identify new product opportunities by analyzing customers’ feelings towards products. For the MVP, we only need one engineer for a week to make it production-ready. Let’s work together to put the MVP into production first and then make the roadmap a reality!
Appendix
● You can check this video to see real-time dashboard changes.
This is how the database looked while processing real-time tweet data for Apple products that had already been analyzed by the Google Natural Language API. You can see the magnitude, score, text, and tokens associated with every tweet here.
Before the application sends the processed data to the Eon Messaging Queue, I added some calculations to produce 3 values: the counts of positive, neutral, and negative tweets. First, I multiplied score and magnitude. A positive score means the customer demonstrates a positive attitude in the tweet, a negative score means a negative attitude, and zero means a neutral attitude. Magnitude indicates how strongly the customer demonstrates that attitude. Multiplying the two gives a single value that reflects both the direction and the strength of the sentiment. Second, I compared the result of the multiplication with zero to identify positive, neutral, and negative attitudes. Third, I used 3 counters to track the total number of positive, neutral, and negative tweets. Only those 3 values are pushed into the Eon Messaging Queue for every tweet.
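The three steps above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the names `classify` and `SentimentCounter` are hypothetical, and the dictionary returned by `update` stands in for the message that would be published to the queue.

```python
from collections import Counter


def classify(score, magnitude):
    """Label a tweet by the sign of score * magnitude."""
    weighted = score * magnitude
    if weighted > 0:
        return "positive"
    if weighted < 0:
        return "negative"
    return "neutral"


class SentimentCounter:
    """Running totals of positive, neutral, and negative tweets."""

    def __init__(self):
        # Start all three counters at zero so every message has all three keys.
        self.counts = Counter(positive=0, neutral=0, negative=0)

    def update(self, score, magnitude):
        """Count one tweet and return the three totals to publish."""
        self.counts[classify(score, magnitude)] += 1
        return dict(self.counts)
```

Because magnitude is never negative, the sign of the product follows the sign of the score. The product only changes the label when the magnitude is zero, in which case the tweet is counted as neutral.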
I used the Eon open source project to build a simple dashboard that subscribes to the queue I created in the Eon Messaging system. The dashboard shows real-time changes in the total counts of positive, neutral, and negative sentiments for Apple products. Here is the final dashboard!
● You can check out the project from my here.
If you want to know more, you can shoot me an email at .
● You can check out my here.
● Images in prioritization and measure the success sections are from this book -
● I learned some of the data pipeline ideas from