Developing an Advanced Auditing Strategy Using Data Services
How often do you ask yourself why there isn't anything similar to the audit universe, which comes as a default feature in the SAP BusinessObjects Business Intelligence platform 4.0, for auditing the ETL? How useful would it be to have an audit table that stores information about the Data Services jobs that have run? We all know that you can audit some things via the Data Services Management Console (DSMC), but can you do it at the row level? To answer these questions, this article explains a workaround for auditing the ETL at the row level, making your life easier when errors occur, when you need an analysis, or when you are simply tracking data.
The idea is quite simple, and its main component is a table (Load_Audit_Table). This table contains a unique identifier called “Load_Key” and a number of other columns containing:
- the name of the user who ran the job
- the name of the job ran
- the starting date and time
- the ending date and time
- a Boolean field to identify whether or not the job finished in error
- the error message
- the location of the log file.
Of course you can add more columns depending on your needs.
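As a reference, here is a minimal sketch of what such a table could look like, assuming an Oracle-style database (the scripts later in this article use sysdate). The exact data types, and any column names not appearing in those scripts, are assumptions you should adapt to your own conventions:
create table load_audit (
    load_key        number not null,     -- unique identifier for each ETL run
    job_name        varchar2(100),       -- name of the job that ran
    job_user        varchar2(100),       -- user who ran the job
    job_start_date  date,                -- date and time the job started
    job_end_date    date,                -- date and time the job finished
    is_error        number(1) default 0, -- 1 if the job finished in error, 0 otherwise
    error_num       number,              -- error number returned by Data Services
    error_desc      varchar2(4000),      -- error message
    log_location    varchar2(500),       -- log file location
    constraint pk_load_audit primary key (load_key)
);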
The other important part of the solution is the “Load_Key” column itself, which should appear in all the tables we use in the ETL (ODS tables, FACT tables, and so on).
The purpose of this solution is to give us the possibility not just to identify an error easily, but also to audit at the row level. This means we can also analyze or track the data loaded on a certain date throughout the whole process, because all the rows in all the tables loaded on that date will have the same “Load_Key”. Have a look at the example below.
Every time the ETL runs, a new load key is generated automatically and is loaded into the Load_Audit_Table and, if there are no errors, into all the other system tables.
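For instance, a query along the following lines would retrieve every row loaded by a given run together with its audit information (the fact table and the load key value are hypothetical; only the load_key join column comes from the article):
select f.*, a.job_start_date, a.job_end_date, a.is_error
from fact_sales f                            -- hypothetical fact table carrying the Load_Key column
join load_audit a on a.load_key = f.load_key
where a.load_key = 42;                       -- the specific run you want to analyze or track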
Now that we understand how it works and the value of this implementation, we need to explain how to develop this solution in terms of objects, code, and so on.
In the image below you find the job structure developed to implement the auditing method:
As you can see in the image, the job is ALWAYS composed of two workflows: one at the beginning called “Audit_Start” and a second one at the end of the job called “Audit_End”.
The part of the job that we want to audit is located in the middle; in our example we are auditing the workflow that loads the dimension tables. Moreover, the main workflow should be surrounded by the “Try” and “Catch” blocks in order to capture the error information at the end of the job in case of failure.
Now let's see what is inside each of these blocks in order to understand how the auditing system works.
In the image below we are showing what is inside the first block, “Audit_Start”.
In this step we generate the unique identifier for the audit table. It is very simple: in the plain code (script) block we add a SQL function that retrieves the last key from the audit table and adds 1 to it, storing the result in the global variable $G_Job_Load_key that is used throughout the job.
$G_Job_Load_key = sql('DataWarehouse', 'select max(load_key) + 1 from load_audit');
Then, inside the data flow, we generate a new row in order to insert the new key into the audit table.
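In SQL terms, that data flow effectively performs an insert like the one below. This is only a sketch: in the article the row is created inside a data flow, while here the same effect is shown as a script statement; job_name() is a built-in Data Services function, and the job_start_date and is_error columns follow the assumed table definition above.
sql('DataWarehouse', 'insert into load_audit (load_key, job_name, job_start_date, is_error) values (' || $G_Job_Load_key || ', \'' || job_name() || '\', sysdate, 0)');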
The next block is the Try. This block cannot be modified, because it simply marks the point from which error information is recorded. The only thing we need to do is add it to the job and create the links.
After the main workflow comes the Catch block. In this case, Catch is editable: by opening the block with a double-click we can add objects into it.
In the image below you can see what is inside the Catch block.
First of all, we have the Audit_End workflow, and you might ask why. The reason is that when an error occurs, the job finishes in the Catch block. This means that if we did not have the Audit_End block inside the Catch block, we would lose important information every time there was an error.
Then comes a plain code block; inside it there is code that captures all the error information and updates the row created earlier for the new load_key.
sql('DataWarehouse', 'update load_audit set is_error = 1, error_num = ' || error_number() || ', error_desc = \'' || error_message() || '\' where load_key = ' || $G_Job_Load_key);
Finally, the last step is adding the “Audit_End” block, which records the date and time at which the job finished. To do this we write SQL code that adds this information to the audit table.
sql('DataWarehouse', 'update load_audit set job_end_date = sysdate where load_key = ' || $G_Job_Load_key);
Now you have implemented the audit method developed by Clariba consultants! In addition, you can use this auditing table to trigger an email when an error occurs: read the “Is_Error” column and, if its value is 1, send an email to the BO administrator with the error information saved in the same table.
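As a rough sketch of that idea, a script like the one below could run as the last step of the job (or in a separate monitoring job). It assumes the job server has SMTP configured, uses the built-in smtp_to() function, and the recipient address is of course hypothetical:
# Minimal sketch: notify the BO administrator if the current run is flagged as failed.
if (sql('DataWarehouse', 'select is_error from load_audit where load_key = ' || $G_Job_Load_key) = '1')
begin
    smtp_to('bo.admin@example.com',
            'ETL failure in job ' || job_name(),
            sql('DataWarehouse', 'select error_desc from load_audit where load_key = ' || $G_Job_Load_key),
            0, 0);
end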
We hope this will be of use to you. If you have any questions or observations, please leave a comment below.