Data Sources – ETL Requirements Analysis

Analyzing ETL artifacts (e.g., in Oracle Data Integrator), mainly mappings, can be challenging.

The more time you invest in analyzing the business requirements, the less you will need to reinvestigate and rework what your team has developed later.

In this document, I summarize the steps that can make your analyst role easier.

Workshop Name: Data Sources – ETL Requirements Analysis

Proposed Audience:

- From the client side: ETL developers, business analysts

- From the Oracle team: ETL and data warehousing consultants

Proposed Duration:

5 days per external data source

Questionnaire / Info Sheet:

The following information can serve as guidance when analyzing the ETL requirements for the source system above.

1: Know the Data Requirements (For each interface / source table)

Note that in some cases columns from many source tables are mapped into a single target table. The following are the basic initial requirements to collect for each target table (a minimal sketch of one info-sheet row follows the list).

- Data source name
- Data source type (types supported in ODI, e.g. CSV, DB table, XML, ...)
- Target table type (dimension, fact, ...)
- Source table name
- Source table function description
- Source column name
- Source column description
- Source column type
- Target column name
- Target column type
- Mapping (required transformation expression, if any)
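As a starting point, the info sheet can be captured in a machine-readable form so it can later drive documentation or completeness checks. Below is a minimal Python sketch; the field names and sample values are illustrative assumptions, not an ODI API.

```python
# One row of the ETL info sheet. Field names mirror the checklist above;
# the sample values are hypothetical, for illustration only.
mapping_row = {
    "data_source_name": "HR_APP",
    "data_source_type": "DB Table",        # e.g. CSV, DB Table, XML
    "target_table_type": "Dimension",      # dimension, fact, ...
    "source_table_name": "EMPLOYEES",
    "source_table_description": "Employee master data",
    "source_column_name": "FIRST_NAME",
    "source_column_description": "Employee first name",
    "source_column_type": "VARCHAR2(50)",
    "target_column_name": "EMP_FIRST_NAME",
    "target_column_type": "VARCHAR2(50)",
    "mapping_expression": "INITCAP(EMPLOYEES.FIRST_NAME)",  # transformation, if any
}

REQUIRED_FIELDS = set(mapping_row)

def validate_row(row: dict) -> list:
    """Return the checklist fields that are missing or empty."""
    return [f for f in REQUIRED_FIELDS if not row.get(f)]

if __name__ == "__main__":
    print(validate_row(mapping_row))  # [] means the row is complete
```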

2: Map the Source (via the Staging Area) to the Warehouse

There are data types, security permissions, and naming conventions to consider. The mappings must be managed and versioned in much the same way as source code changes are tracked.
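Naming conventions in particular are easy to check automatically. The sketch below assumes hypothetical layer prefixes (STG_, DIM_, FCT_); your project's conventions will differ, so treat the patterns as placeholders.

```python
import re

# Illustrative naming conventions (assumptions, not ODI defaults):
# staging tables prefixed STG_, dimensions DIM_, facts FCT_.
CONVENTIONS = {
    "staging": re.compile(r"^STG_[A-Z0-9_]+$"),
    "dimension": re.compile(r"^DIM_[A-Z0-9_]+$"),
    "fact": re.compile(r"^FCT_[A-Z0-9_]+$"),
}

def check_name(layer: str, table_name: str) -> bool:
    """Check a table name against the convention for its layer."""
    return bool(CONVENTIONS[layer].match(table_name))

print(check_name("staging", "STG_EMPLOYEES"))   # True
print(check_name("dimension", "EMPLOYEE_DIM"))  # False: suffix, not prefix
```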

3: Dependencies

Collect the dependencies of each interface (source-to-target table mapping).

It is customary to load data in parallel when possible; even medium-sized data warehouses load many gigabytes of data every day. Yet the data model imposes load-order dependencies: dimensions must be loaded before the facts that reference them, so some ETL packages or jobs must complete before others can begin.
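Once the dependencies are collected, a topological sort shows which jobs can run in parallel and which must wait. Here is a minimal Python sketch using the standard-library graphlib module, with hypothetical job names:

```python
from graphlib import TopologicalSorter

# Illustrative load dependencies: each job maps to the jobs it depends on.
# Job names are hypothetical; dimensions load before the fact that uses them.
dependencies = {
    "load_dim_customer": set(),
    "load_dim_product": set(),
    "load_fact_sales": {"load_dim_customer", "load_dim_product"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()
while sorter.is_active():
    batch = sorter.get_ready()   # jobs whose dependencies are all satisfied
    print("run in parallel:", sorted(batch))
    sorter.done(*batch)          # mark the batch complete
```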

4: Sizing

Collect the sizing of each source and target table.

Estimate the expected data volume, its growth rate, and the time it will take to load that increasing volume. If the ETL processes are expected to run during a three-hour window, be certain that all processes can complete in that timeframe, now and in the future.
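A quick back-of-the-envelope projection makes the window check concrete. All numbers below are assumptions for illustration; substitute the volumes and throughput you actually measure:

```python
# Will the daily load still fit a three-hour window after a year of growth?
rows_per_day = 50_000_000     # assumed current daily volume
monthly_growth = 0.05         # assumed 5% growth per month
rows_per_second = 8_000       # assumed measured load throughput
window_hours = 3

rows_in_12_months = rows_per_day * (1 + monthly_growth) ** 12
load_hours = rows_in_12_months / rows_per_second / 3600

print(f"projected daily volume: {rows_in_12_months:,.0f} rows")
print(f"projected load time:    {load_hours:.1f} h "
      f"({'fits' if load_hours <= window_hours else 'exceeds'} the "
      f"{window_hours} h window)")
```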

5: Scheduling

Knowing the volumes and dependencies is critical to ensuring the infrastructure can run the ETL processes reliably. A schedule is required for every interface.
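The schedule itself can be recorded per interface alongside the dependency information. A minimal sketch, with hypothetical interface names and times:

```python
from dataclasses import dataclass

# Illustrative per-interface schedule entry (field names are assumptions).
@dataclass
class InterfaceSchedule:
    interface: str
    frequency: str        # e.g. "daily", "hourly"
    start_time: str       # load window start, HH:MM
    depends_on: tuple = ()

schedules = [
    InterfaceSchedule("load_dim_customer", "daily", "01:00"),
    InterfaceSchedule("load_fact_sales", "daily", "02:00",
                      depends_on=("load_dim_customer",)),
]

for s in schedules:
    print(f"{s.interface}: {s.frequency} at {s.start_time}, "
          f"after {list(s.depends_on) or 'nothing'}")
```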
