In previous posts we covered the role that Google Analytics and Facebook play in BI projects focused on Social Media Analysis, so it was only a matter of time before we covered Twitter, the most popular micro-blogging network on the web. Many analysts and reviewers also consider Twitter the social network that can potentially deliver the largest amount of meaningful information to analyze.
At this point, I guess everyone has a general idea of what Twitter is and what it delivers, so the objective of this article is to give an overview of a Sentiment Analysis showcase that we built by extracting data from Twitter with SAP BusinessObjects tools. In future articles we will cover each phase of the development in more detail. Generally speaking, we consider Sentiment Analysis to be the process of identifying, extracting and measuring data from a subjective information source, such as customer surveys, opinion polls or, as in our case, tweets.
Data Extraction
As in any BI project, the first step is to define the data that you need and how to get it. With SAP BusinessObjects tools, the best way to do this is to develop an Adapter for Data Integrator using the SDK that the tool includes in its installation folders (check this article from SAP SDN, which proved to be very helpful).
However, to do the demo as quickly as possible, we used another approach:
We developed a Java program that used Twitter’s getSearch API to extract tweets and place them in text files. Note that for demo purposes this is more than enough, but for a broader project flat files are not a satisfactory solution.
With Data Integrator, we configured an ETL flow to extract the data from the files and store it in database tables, accumulating enough tweets to make the demo meaningful.
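To illustrate the flat-file step, here is a minimal sketch of how raw tweets could be appended to a text file, one tweet per line, so that an ETL flow can read them line by line. It is not the program we used in the demo: the class name `TweetFileWriter` and its methods are hypothetical, and the one-tweet-per-line layout is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: appends raw tweet payloads to a flat file, one per line. */
final class TweetFileWriter {

    private TweetFileWriter() {}

    /** Collapses line breaks so each tweet occupies exactly one line in the file. */
    static String toSingleLine(String rawTweet) {
        return rawTweet.replaceAll("\\R", " ").trim();
    }

    /**
     * Appends the given tweets to the file, creating it if needed.
     * Blank entries are skipped. Returns the number of lines written.
     */
    static int append(Path file, List<String> rawTweets) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String tweet : rawTweets) {
            String line = toSingleLine(tweet);
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        Files.write(file, lines, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return lines.size();
    }
}
```

Calling `TweetFileWriter.append(Paths.get("tweets.txt"), tweets)` after each search request would let the file grow over time, which is roughly what the demo needs before the ETL flow picks it up.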
Also keep in mind that in this phase it is very important to get comfortable with Twitter’s API and the parameters it accepts, so that you can take full advantage of it.
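As a rough sketch of what working with search parameters looks like in Java, the builder below assembles a URL-encoded query string. The base URL is a placeholder and the parameter names (`q`, `lang`) are only illustrative assumptions; check the current Twitter API documentation for the real endpoint and parameters. The class `SearchQueryBuilder` is hypothetical and requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder for a search request URL with encoded parameters. */
final class SearchQueryBuilder {

    private final String baseUrl;
    // LinkedHashMap keeps parameters in insertion order.
    private final Map<String, String> params = new LinkedHashMap<>();

    SearchQueryBuilder(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    /** Adds or overwrites a parameter and returns this builder for chaining. */
    SearchQueryBuilder param(String name, String value) {
        params.put(name, value);
        return this;
    }

    /** Joins the base URL and the URL-encoded parameters. */
    String build() {
        StringBuilder url = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            url.append(separator)
               .append(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8))
               .append('=')
               .append(URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
            separator = '&';
        }
        return url.toString();
    }
}
```

For example, `new SearchQueryBuilder("https://api.example.com/search").param("q", "#sap bi").param("lang", "en").build()` encodes the hashtag and the space, so the search terms survive the trip to the server.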
Data Parsing and Sentiment Analysis
Once we were able to place the tweets in text files and customize the extraction parameters as desired, we could start analysing the tweets to deliver insight from them. To do so, we followed these steps:
Get the raw tweets that we had stored in the database and parse them with Data Integrator to strip the JSON format that the Twitter API uses, so that we could manipulate the tweets as text strings.
Use the Text Analysis feature included in Data Integrator to perform the “Sentiment Analysis” process and classify each tweet into one of the sentiment categories that we used. For demo purposes, there is an SAP Blueprint called Text Data Processing Data Quality that contains Data Integrator jobs with a Voice of Customer implementation, which already includes a set of extraction rules for the English language. You can therefore use this blueprint and its rules to develop the Sentiment Analysis phase.
Build a universe on top of the tables with the analyzed data to make it available for reporting with any of the SAP BusinessObjects tools that take a universe as a data source, e.g. Xcelsius, WebIntelligence, Explorer, etc. In this step, we also made use of a universe included in the same Text Data Processing Data Quality blueprint that we used for the previous point.
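To give a feel for what classifying a tweet into a sentiment category means, here is a deliberately naive keyword-based classifier. It is a toy stand-in, not the Voice of Customer extraction rules from the blueprint, which are far richer. The class `KeywordSentiment`, the three categories and the word lists are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy keyword classifier; a stand-in for real text analysis rules. */
final class KeywordSentiment {

    /** Illustrative categories; the demo's actual categories may differ. */
    enum Category { POSITIVE, NEGATIVE, NEUTRAL }

    // Tiny, hand-picked word lists used only for this example.
    private static final Set<String> POSITIVE_WORDS = new HashSet<>(
            Arrays.asList("good", "great", "love", "excellent", "happy"));
    private static final Set<String> NEGATIVE_WORDS = new HashSet<>(
            Arrays.asList("bad", "terrible", "hate", "awful", "slow"));

    private KeywordSentiment() {}

    /** Scores +1 per positive word and -1 per negative word, then maps the sign. */
    static Category classify(String tweetText) {
        int score = 0;
        // Lowercase, then split on anything that is not a letter a-z.
        for (String token : tweetText.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE_WORDS.contains(token)) {
                score++;
            } else if (NEGATIVE_WORDS.contains(token)) {
                score--;
            }
        }
        if (score > 0) {
            return Category.POSITIVE;
        }
        if (score < 0) {
            return Category.NEGATIVE;
        }
        return Category.NEUTRAL;
    }
}
```

A word-counting approach like this ignores negation, sarcasm and context, which is exactly why a rule set built for the language, such as the one shipped with the blueprint, is the better choice for anything beyond a quick experiment.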
Data Visualization
Finally comes the eye-catching part: presenting all the hard work you have done. To show users how flexible this solution can be, we decided to present the data with Explorer and some Exploration Views built on top of its Information Spaces. However, as mentioned before, if you build a universe on top of the tables that result from the Text Analysis process, you will have a wide range of tools to choose from to build the presentation that fits your requirements and objectives.
In future articles, we will cover each one of these sections in further detail. However, with this general layout we hope you get a good idea of what you need to do to make your Sentiment Analysis demo happen!
If you have any questions or anything to add to help improve this post, please feel free to leave your comments.