Using NoSQL for Analytics, or the Data Scientist's Frankenstein
It is no secret that NoSQL is a thing, and adoption rates are rising as companies try to figure out how and why they should adopt, or even switch over to, this “new” approach to storing data.
Since we are always keen on innovation, we set out to discover whether we could use a NoSQL database in our analytics applications. Unlike our title’s namesake, however, we also wanted to find out whether we should.
Why did we do this?
Well, like in Mary Shelley’s novel, for science! Data Science, to be precise.
And precision is indeed the topic here: analytics fundamentally requires precision and structure, so that reporting models can be built and shown, aggregated and drilled into, filtered and manipulated. Models must be strictly defined in order to make sense of the underlying business logic that produces the data and of the KPIs we derive from it.
For example, one of our analytics products here at Clariba is Genie, which provides mobile analytics for executives on the go.
Like any analytics application, it is built on tables and models, because it has to be. However, this requirement is also a constraint: each new endeavor needs a new model, new tables, newly stipulated semantics and new charts. Custom analytics needs precision.
Could we circumvent this requirement, break the mold set by enterprise analytics and defy the status quo?
1. Breaking the Enterprise Analytics Status Quo
Just as the story behind our alternate title used lightning to defy the natural order of life, we wanted to defy the status quo of analytics. Sadly, modern computers still aren’t very fond of lightning, so we used NoSQL as a base and powered it up with the next best thing to lightning, which, apparently, usually turns out to be JavaScript.
We implemented a JavaScript backend on top of MongoDB. The resulting creation performs pretty well, leveraging the speed and usability of NoSQL, with Mongoose providing a “model” so that, in the end, analytics are still possible. In internal performance benchmarks, it even performed slightly better than our own SAP Cloud Platform implementation in terms of data availability latency. In code review, however, the author’s code drew comments much like those that greeted Frankenstein’s creation.
Hold on: faster (if only slightly) than a leading provider of cloud database and analytics solutions? Yes! For the KPI that we custom-built… on top of a backend we custom-built… on top of a database whose selling points are speed and scalability, and which is only eventually consistent. Oh right, we’re in the real world, not in Frankenstein! So, when can you really expect a significant improvement from NoSQL?
Denormalized data models with few joins
While you can usually perform a join in a NoSQL environment, that doesn’t mean you should: each join often trades away part of the performance benefit that made NoSQL attractive in the first place.
This is easily the biggest hurdle for most, since most enterprises already store their data in a normalized form where joins are unavoidable.
In stark contrast, even at the “Semantics Level” of SAP Cloud Platform, calculation views readily support joins and unions, usually with negligible performance overhead.
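To make the join trade-off concrete, here is a minimal in-memory sketch in JavaScript that computes a hypothetical revenue-by-region KPI two ways: from a denormalized collection where the needed customer fields are embedded in each order, and from a normalized pair of collections that must be joined first. The collections, field names and figures are invented for illustration, and plain arrays stand in for MongoDB collections. The `lookupPipeline` constant shows how the same join would typically be expressed with MongoDB's `$lookup` aggregation stage; it is only constructed here, not executed against a database.

```javascript
// Hypothetical sample data. Plain arrays stand in for MongoDB collections.

// Normalized: orders reference customers by id, as in a relational schema.
const customers = [
  { _id: 1, name: "Acme", region: "EMEA" },
  { _id: 2, name: "Globex", region: "APAC" },
];
const orders = [
  { _id: 101, customerId: 1, amount: 250 },
  { _id: 102, customerId: 2, amount: 400 },
  { _id: 103, customerId: 1, amount: 150 },
];

// Denormalized: the customer fields the KPI needs are embedded in each order.
const ordersDenormalized = [
  { _id: 101, customer: { name: "Acme", region: "EMEA" }, amount: 250 },
  { _id: 102, customer: { name: "Globex", region: "APAC" }, amount: 400 },
  { _id: 103, customer: { name: "Acme", region: "EMEA" }, amount: 150 },
];

// Denormalized: one pass over one collection, no lookups.
function revenueByRegionDenormalized(docs) {
  const totals = {};
  for (const order of docs) {
    const region = order.customer.region;
    totals[region] = (totals[region] ?? 0) + order.amount;
  }
  return totals;
}

// Normalized: every order must first be matched to its customer
// (an application-side join, here via a Map index on customer _id).
function revenueByRegionJoined(orderDocs, customerDocs) {
  const customersById = new Map(customerDocs.map((c) => [c._id, c]));
  const totals = {};
  for (const order of orderDocs) {
    const customer = customersById.get(order.customerId);
    if (!customer) continue; // no foreign keys guarantee the reference exists
    totals[customer.region] = (totals[customer.region] ?? 0) + order.amount;
  }
  return totals;
}

// The same join as a MongoDB aggregation pipeline (constructed, not executed here).
const lookupPipeline = [
  { $lookup: { from: "customers", localField: "customerId", foreignField: "_id", as: "customer" } },
  { $unwind: "$customer" },
  { $group: { _id: "$customer.region", revenue: { $sum: "$amount" } } },
];

console.log(revenueByRegionDenormalized(ordersDenormalized)); // { EMEA: 400, APAC: 400 }
console.log(revenueByRegionJoined(orders, customers)); // { EMEA: 400, APAC: 400 }
```

Both functions return the same totals; the difference is that the denormalized version reads a single collection, while the normalized one has to build an index and consult a second collection for every order. The price of denormalization is duplication: if Acme changed region, every embedded copy would need updating.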
New capabilities, not changes
To take full advantage of NoSQL, you must consider it as early as possible in the design process. It is not feasible to “change” an existing solution by applying NoSQL to it; the word you would be looking for is rebuilding. NoSQL is simply different from traditional SQL approaches from an architectural standpoint, not only in data modelling.
Consistency
Lastly, this would not be a NoSQL comparison without mentioning consistency:
When you INSERT into a SQL database and COMMIT, the data is either written and made available, or you get an error. This is an immediate yes/no outcome.
In NoSQL it’s a bit more complicated. The data is staged immediately, but it is only eventually written and made available. While this is great for speed, it also means that if something goes wrong while the data is being written, there is no guarantee that it will ever become available. For some applications this is simply not acceptable, however large the speed gain.
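A toy simulation can make this difference tangible. The sketch below, in plain JavaScript with no database involved, models a primary that acknowledges writes immediately and a secondary that only catches up later. The `ToyReplicaSet` class, its methods and the `waitForReplica` option are hypothetical and deliberately simplified; this is not MongoDB's replication protocol. In MongoDB itself, the trade-off is tuned per operation through write concern (for example `{ w: "majority" }`) and read concern, whose exact guarantees are defined in MongoDB's documentation rather than here.

```javascript
// A toy, in-memory model of one primary and one secondary replica.
// Hypothetical and deliberately simplified: not MongoDB's actual protocol.
class ToyReplicaSet {
  constructor() {
    this.primary = new Map();
    this.secondary = new Map();
    this.pendingLog = []; // writes accepted by the primary but not yet replicated
  }

  // By default the write is acknowledged as soon as the primary has it.
  // waitForReplica: true applies it to the secondary before acknowledging,
  // loosely analogous to requesting a stronger write concern.
  write(key, value, { waitForReplica = false } = {}) {
    this.primary.set(key, value);
    this.pendingLog.push([key, value]);
    if (waitForReplica) this.replicate();
    return { acknowledged: true };
  }

  // Ship pending writes to the secondary (asynchronous in a real system).
  replicate() {
    for (const [key, value] of this.pendingLog) this.secondary.set(key, value);
    this.pendingLog = [];
  }

  // The primary fails before replicating: the secondary takes over,
  // and any acknowledged-but-unreplicated writes are lost.
  failover() {
    this.primary = new Map(this.secondary);
    this.pendingLog = [];
  }

  readFromPrimary(key) {
    return this.primary.get(key);
  }

  readFromSecondary(key) {
    return this.secondary.get(key);
  }
}

const db = new ToyReplicaSet();

db.write("kpi:revenue", 800);
console.log(db.readFromSecondary("kpi:revenue")); // undefined: acknowledged, not yet visible
db.replicate();
console.log(db.readFromSecondary("kpi:revenue")); // 800: consistent, eventually

db.write("kpi:orders", 3, { waitForReplica: true });
console.log(db.readFromSecondary("kpi:orders")); // 3: visible as soon as acknowledged

db.write("kpi:margin", 0.2); // acknowledged...
db.failover();
console.log(db.readFromPrimary("kpi:margin")); // undefined: ...but lost
```

The last three statements show the failure case described under Consistency: the write was acknowledged, yet it never became available. Waiting for replication closes that gap, but only by spending some of the latency that the fast path was saving.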
2. Summing up
NoSQL implementations are very promising and have shown great results, but they are not a panacea for modern analytics.
The bottom line is that if you are an enterprise customer who needs a very specialized solution for a single problem, a suite of tools like SAP Cloud Platform might be considered overkill, but it would still work, and it would give you a foundation on which to build more. Most enterprises, on the other hand, would probably not be able to do the same with a custom NoSQL solution unless they had specialized resources to administer and maintain it.
NoSQL is not a catch-all, and it requires developers and business users to adjust to it, since they are likely used to working with a traditional RDBMS. But if speed and scale are what you really need for a specific application, then absolutely go for a NoSQL-based solution.
As always, your mileage (and JavaScript) may vary; what matters is that there are no stops on the road to progress. The good news is that SAP is offering MongoDB as part of SAP Cloud Platform, so existing customers can get the best of both worlds. Just get in touch with us to find out more.