Unboxing SAP Data Warehouse Cloud – Part 7
In this article, we complete our series on SAP Data Warehouse Cloud, looking at how best to make use of dataflows using the inbuilt editor, to import, transform and consolidate data from multiple sources.
If you have been following this series of blogs, you’ll already be familiar with SAP Data Warehouse Cloud. In this final episode, we are going to dive deeper into the Dataflow editor, to fully analyse its capabilities and give you some hints and tips for developing ETL (extract, transform, and load) dataflows in SAP DWC.
1. What is a dataflow
First, we need to understand what a dataflow is. In this case, we are talking specifically about an ETL dataflow. We can think about it as a linear process, that extracts data from one or more systems, performs transformations to the data, and consolidates it into a single structure (or table). Dataflows can be independent or can be grouped into small chains, depending on your exact requirements.
2. Use cases
When using dataflows in SAP Data Warehouse Cloud, there are two main use cases to consider:
Importing data into our DWC system: If we have multiple independent external systems, the data within them might not be aligned. The first step would be to load the data from the various systems into one single system, in this case, SAP Data Warehouse Cloud. This will improve the performance of any subsequent transformations/queries performed via the application, as all the data would be loaded into a single location, avoiding potential delays due to connection constraints.
Transforming existing data in our system to obtain the desired information: Once we have the data loaded into our system, the next step would be to create a dataflow that integrates the information we requre. Let’s say we have information about the master data regarding products coming from one system, and from another system we receive information about sales. We can combine said data using a dataflow, and we can perform analytics on the result of that transformation. Since the data is consolidated into a single table, this process will be faster and more efficient than accessing the different systems containing the original data.
3. Dataflows
Let’s dive straight in, open the Dataflow editor in SAP Data Warehouse Cloud and have a look at the layout we are presented with. We must navigate to the Data Builder tab, and within that, the option to create a new dataflow will appear. We can also edit existing dataflows.
Two main panels are visible:
3.1. Data source selection
In this panel we see all the different data sources we have available, with two tabs - Repository and Sources.
3.1.1. Repository
In this area, we see all the tables and views we have imported into our SAP Data Warehouse Cloud system. We encourage you to use the data from here if you want to create complex data flows.
3.1.2. Sources
Here, we can see all the different data sources we have connected into our system, and the content inside them. We want to extract the desired data from those sources and insert it into our system, so we can use it as a staging area, where further calculations will be performed on the data.
3.2. Data transformation
In this area, we can find the tools we will use in order to transform the data, based on our requirements.
Here’s a brief explanation of each feature:
3.2.1. Join
This feature allows us to join two different data sources by a set of fields of our choice. When performing a join we can select the kind of join we want to do (Inner, Left Outer, etc…), and we can set the mappings of the fields that will perform the join action.
3.2.2. Union
When performing a union, we are combining data from two different sources into one. This action is done without any constraint, so, for example, if we want to combine product information from two different sources, we can do so using a union clause, and decide which fields map into the result set.
3.2.3. Projection
When using the projection function, we can decide which fields of a data source we are projecting forward into the flow. This is useful when we have a very large data source in terms of fields, and we only want some of them for our process. We can also perform data filtering in this area, in order to restrict the output of the flow. We can think of this function as the ‘SELECT’ and ‘WHERE’ part of a SQL query.
3.2.4. Aggregation
When using this feature, we can perform data aggregation, based on our requirements. We can set which fields we want to aggregate (and the type of aggregation) and which fields we want to set to ‘static’. We can think about this as the ‘GROUP BY’ clause.
3.2.5. Script
When using SAP Data Warehouse Cloud, we can run small python scripts, (if we are familiar with programming in python) in order to perform more complex calculations “on the fly” within the Dataflow editor.
3.2.6. Add table
This option allows us to add a new data source - or alternatively we can drag and drop a source from the data source selection panel.
4. Conclusions
Following the points above should give you a good initial overview of what the Dataflow editor in SAP Data Warehouse Cloud can do for you. The tool is powerful enough to perform considerable and complex data consolidation from external sources, and facilitate the required transformations to prepare your data for the next step into your analytic layer.
This concludes our series of 7 articles on “Unboxing SAP Data Warehouse Cloud”. If you’ve joined us partway through, we’d recommend you check back through the previous articles for more information on the functionality and features of the platform.
If you’re interested in learning more about setting up SAP Data Warehouse Cloud as your enterprise data warehouse, please get in touch, and we’ll be happy to give you all the information you need.