Analyzing ETL artifacts (for example, in Oracle Data Integrator), mainly mappings, can be challenging.
The more time you invest in analyzing the business requirements up front, the less you will need to reinvestigate and rework what your team has developed later.
In this document, I summarize the steps that can make your analyst role easier.
Workshop Name: Data Sources – ETL Requirements Analysis
Proposed Audience:
- From the client side: ETL developers, business analysts
- From the Oracle team: ETL and data warehousing consultants
Proposed Duration:
5 days per external data source
Questionnaire / Info Sheet:
The following information can serve as guidance for analyzing the ETL requirements of the source system above.
1: Know the Data Requirements (For each interface / source table)
Note that in some cases columns from many source tables can be mapped into one target table. The following are the basic initial requirements to collect for each target table; a filled-in example follows the sheet below.
Data Source Name | |
Data Source Type (types supported in ODI, e.g., CSV, DB table, XML) | |
Target Table Type (dimension, fact, etc.) | |
Source Table Name | |
Source Table Function Description | |
Source Column Name | |
Source Column Description | |
Source Column Type | |
Target Column Name | |
Target Column Type | |
Mapping (transformation expression, if any) | |
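To make the sheet concrete, here is a minimal sketch of one filled-in entry captured as a Python dictionary, so it can be reviewed and diffed like any other project artifact. All system, table, and column names (CRM_DB, CUSTOMERS, and so on) are hypothetical, and the transformation expression is only an illustration:

```python
# One filled-in info-sheet entry as a plain Python dictionary.
# All names below are hypothetical placeholders.
mapping_entry = {
    "data_source_name": "CRM_DB",
    "data_source_type": "DB Table",      # one of the source types ODI supports
    "target_table_type": "Dimension",
    "source_table_name": "CUSTOMERS",
    "source_table_description": "Master list of customers in the CRM system",
    "columns": [
        {
            "source_column": "CUST_NAME",
            "source_description": "Customer full name",
            "source_type": "VARCHAR2(100)",
            "target_column": "CUSTOMER_NAME",
            "target_type": "VARCHAR2(100)",
            "mapping": "UPPER(TRIM(CUST_NAME))",  # transformation, if any
        },
    ],
}
```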
2: Map the Source [via the Staging Area] to the Warehouse
There are datatypes to consider, security permissions to grant, and naming conventions to implement. Mapping changes must be managed and tracked in much the same way as source code changes are.
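Because the mapping sheet is versioned, it can also be checked mechanically. Below is a minimal sketch of such a check in Python; the naming convention (upper case with underscores) and the column-dictionary structure are assumptions carried over from the hypothetical entry above, not ODI requirements:

```python
import re

def check_columns(columns: list[dict]) -> list[str]:
    """Return a list of problems found in a mapping's column list."""
    problems = []
    for col in columns:
        # Naming convention (assumed): UPPER_CASE_WITH_UNDERSCORES targets.
        if not re.fullmatch(r"[A-Z][A-Z0-9_]*", col["target_column"]):
            problems.append(f"{col['target_column']}: breaks naming convention")
        # Datatype check: a silent type change deserves a documented mapping.
        if col["source_type"] != col["target_type"] and not col.get("mapping"):
            problems.append(f"{col['source_column']}: undocumented type change")
    return problems

# A column with a lower-case target name is flagged:
print(check_columns([{"source_column": "CUST_NAME",
                      "source_type": "VARCHAR2(100)",
                      "target_column": "customer name",
                      "target_type": "VARCHAR2(100)"}]))
# -> ['customer name: breaks naming convention']
```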
3: Dependencies
Collect the dependencies of each interface (source / target table mapping).
It is customary to load data in parallel when possible; even medium-sized data warehouses load many gigabytes of data every day. Yet the data model imposes dependencies: dimensions must typically be loaded before the facts that reference them, so some ETL packages or jobs must complete before others can begin.
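Once the dependencies are collected, a safe load order can be derived mechanically. Here is a minimal sketch using Python's standard-library topological sorter; the table names are hypothetical, and in practice the graph would be built from the collected dependency sheets rather than hard-coded:

```python
from graphlib import TopologicalSorter

# target table -> tables that must be fully loaded first (hypothetical names)
dependencies = {
    "FACT_SALES":   {"DIM_CUSTOMER", "DIM_PRODUCT", "DIM_DATE"},
    "DIM_CUSTOMER": set(),
    "DIM_PRODUCT":  set(),
    "DIM_DATE":     set(),
}

ts = TopologicalSorter(dependencies)
ts.prepare()
while ts.is_active():
    batch = list(ts.get_ready())   # everything in a batch can load in parallel
    print("load in parallel:", batch)
    ts.done(*batch)
```

Each printed batch can run in parallel, and the next batch starts only when the previous one has finished, which is exactly the constraint described above.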
4: Sizing
Collect the sizing of each source / target table.
Estimate the volume of expected data, its growth rate, and the time it will take to load the increasing volume. If the ETL processes are expected to run during a three-hour window, be certain that all processes can complete in that timeframe, now and in the future.
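A quick back-of-the-envelope check makes the window question concrete. In the sketch below, the daily row count, growth rate, and sustained throughput are invented figures; substitute the numbers you actually measure:

```python
# Assumed figures for illustration only; measure your own environment.
rows_today = 50_000_000        # rows loaded per day today
annual_growth = 0.30           # 30% growth per year (assumed)
throughput = 10_000            # rows per second the ETL sustains (measured)
window_hours = 3               # the agreed load window

for year in range(0, 4):
    rows = rows_today * (1 + annual_growth) ** year
    hours = rows / throughput / 3600
    fits = "OK" if hours <= window_hours else "EXCEEDS WINDOW"
    print(f"year +{year}: {rows / 1e6:6.1f}M rows -> {hours:4.1f}h  {fits}")
```

With these assumed figures the load fits comfortably today (about 1.4 hours) but breaks the three-hour window by year three, which is exactly the kind of finding this step should surface early.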
5: Scheduling
Knowing the volumes and dependencies is critical to ensuring the infrastructure can perform the ETL processes reliably. A schedule is required for every interface.
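As a final cross-check, the schedule itself can be kept as a small, reviewable artifact. The sketch below assumes hypothetical interface names, start times, and windows; its only real job is to flag interfaces that were collected in step 1 but never given a schedule:

```python
# Per-interface schedule sheet (all names and times are hypothetical).
schedules = {
    "LOAD_DIM_CUSTOMER": {"frequency": "daily", "start": "01:00", "window_h": 1},
    "LOAD_DIM_PRODUCT":  {"frequency": "daily", "start": "01:00", "window_h": 1},
    "LOAD_FACT_SALES":   {"frequency": "daily", "start": "02:00", "window_h": 2},
}

# Every interface collected in step 1 must appear here; flag any that do not.
interfaces = ["LOAD_DIM_CUSTOMER", "LOAD_DIM_PRODUCT", "LOAD_FACT_SALES"]
missing = [i for i in interfaces if i not in schedules]
print("unscheduled interfaces:", missing or "none")
```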