Monday, August 20, 2012

Source System Analysis

Step – 1 - Go through the BRS or business requirement specification and then:-
1. Determine whatever attributes business is asking to be in datawarehouse is already existing in datawarehouse or not. Sometimes it happens that you might be extracting that attribute in the datawarehouse.
2. Determine what can be your most relevant source. Most of the times, you are supposed to get the required attributes from a single source or some from source A and some from source B. But sometimes you may find that some attributes can be extracted from source A as well as source B.At that time you need to consider which source you are going to use. You may consider that some source is already existing in your dawarehouse to use that.

Step – 2 – Meet the source system to know the following:-
1. What are the source tables needed. Primary key of the table. If more than one tables needed to fetch data of a specfic flow what is the relation between the table ( PK-FK).
2. The total data volume the source get in the source tables. Is there any update of not. What is determining column to know whether a record in that table is an update/insert. What is the data field to know when that insertion/updation happened.
3. What will be volume of data in datawarehouse in case of incremental load or in case of initial load.
4. What is the retention period of the table data for source system?
5. Know the business meanings of the entities. Collect the data dictionary of they have. Find our the list of values if any.
6. Do they get batch.

Step – 3 – Check back with your requirement which are :-
1. Did you get all the attributes from source system as per the business requirement specification? If not, complete the step -1 once again and when you are satisfied with the meeting with the source system that there is nothing more information to extract from source system you can send your gap analysis to the business team. Gap analysis tells about some attributes which are not present in the source system so that business personals are aware what they are expecting to get.

Step – 4 – Hands on the table and attributes.
1. Get the access of the source tables from where you are to extract from source system DBAs.
2. Do some random check on data like duplicates in date columns.
3. How much nulls or junks in some attributes.Are the attributes from the LOVs or not etc.
4. Relationship among tables. It is basically verifying whatever information you have got from the source system is correct or not. In case of any doubt anywhere perform step 2 and step 3 again.

Step -5 – Start High Level design of Staging Tables:-
1. At first, determine whether you require new staging table or you can accomodate the new attributes to some existing tables.
2. If you are going the ammend the new attributes in some existing table
3. In case of new staging table:-
a. Are you about to truncate and re-load?
b. Or the table will be append only?
c. Which datatype you prefer for the fields?
d. Do you want any constraint to there?
e. What about pk/fk or indexes?
f. Are you going to pull all columns as is from source or only fields you need as per BRS.
g. Did you think about surrogate key?
Recommended: Keep you staging table as similar as the source flat data/table.

Step -6 – Start High Level design of DW Tables:-
1. Put all the columns which will be more accessed at the beginning.
2. Try to put the date columns together in date format.
3. Try always to assign some default values instead of leaving a column value as NULL.
4. Determine your partitioning and index strategy as per the query to be performed on this table.
5. Determine the retention time for the table.
6. To make DW table with richer information and easier reporting/BI, replace any coded values to LOV.

Labels: