What is ETL?
ETL stands for Extract, Transform, Load. It combines three steps (extracting, transforming and loading) to move data from one source to another.
In simple words, ETL pulls data out of one database, reshapes it as needed and puts it into another. The ETL process is often used in data warehousing.
Let’s now briefly discuss the ETL process.
What is the ETL Process?
The first step of the ETL process is Data Extraction. In this step, data is extracted from the source database; there can be more than one data source.
In the second step, Data Transformation, the extracted data is transformed by applying various rules and functions so that it can be stored in the destination database in the proper format. Because the data is extracted from different sources, it is very likely to have problems, such as the same object being given different names or the same name being given to different objects.
In the final step, Data Loading, the transformed and uniformly formatted data is loaded into the destination database.
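To make the three steps concrete, here is a minimal Python sketch, not a production pipeline. It assumes two hypothetical sources, `A` and `B`, that use different field names for the same data (`cust_name` vs `name`), and an in-memory SQLite database standing in for the destination. All table and column names are illustrative only.

```python
import sqlite3

# Hypothetical source rows: the two systems name the same fields differently,
# the kind of inconsistency the Transform step has to resolve.
SOURCE_A = [{"cust_id": 1, "cust_name": " Alice ", "country": "us"}]
SOURCE_B = [{"id": 7, "name": "Bob", "country_code": "GB"}]


def extract():
    """Extract: pull raw rows from every source (here, in-memory lists)."""
    return {"A": SOURCE_A, "B": SOURCE_B}


def transform(raw):
    """Transform: map each source's fields onto one uniform target schema."""
    rows = []
    for r in raw["A"]:
        rows.append((f"A-{r['cust_id']}", r["cust_name"].strip(), r["country"].upper()))
    for r in raw["B"]:
        rows.append((f"B-{r['id']}", r["name"].strip(), r["country_code"].upper()))
    return rows


def load(rows, conn):
    """Load: insert the uniformly formatted rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id TEXT PRIMARY KEY, full_name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    load(transform(extract()), conn)


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    run_etl(target)
    # [('A-1', 'Alice', 'US'), ('B-7', 'Bob', 'GB')]
    print(target.execute("SELECT * FROM customers ORDER BY customer_id").fetchall())
```

Prefixing each key with its source (`A-1`, `B-7`) is just one simple way, chosen here for illustration, to keep IDs from different sources from colliding in the destination table.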
What is ETL Testing?
ETL testing is a type of testing performed to ensure that the data migrated from the source to the destination database is accurate and that the valid transformation rules have been applied.
Let’s take the example of a merger between two companies, Company A and Company B. After the merger, their operations will be combined, and their customer, employee and other details will be stored in a single centralized database.
Assume that Company A uses an Oracle database to store all its information and Company B uses MySQL. To merge their information, the two companies can use the ETL process to migrate data from their individual databases to one agreed-upon database.
Because the two databases are different, the two companies’ data will be in different formats, with different naming conventions, different table structures and so on. Due to these differences, the companies need to make sure that, before the data is loaded into the destination database, it has been properly cleaned and converted into the desired format.
In ETL testing, testers need to make sure that the data from both databases has been converted into the format of the destination database, that the required transformation functions have been applied, that no data has been lost during the process and that the data is accurate.
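The checks described above can be sketched in Python as follows. This is an illustration under stated assumptions: the column names (`EMP_NO`, `full_name`, `employee_key` and so on) and the transformation rules (title-case names, source-prefixed keys) are hypothetical, and the three lists stand in for query results from Company A's Oracle database, Company B's MySQL database and the destination database. The idea is to derive the expected target rows independently of the ETL tool and compare them with what was actually loaded.

```python
# Hypothetical extracts; in practice these would be query results from
# Company A's Oracle schema, Company B's MySQL schema and the target database.
source_a = [{"EMP_NO": 101, "EMP_NAME": "ANA LOPEZ"}]
source_b = [{"employee_id": 55, "full_name": "Raj Patel"}]
target = [
    {"employee_key": "A-101", "name": "Ana Lopez"},
    {"employee_key": "B-55", "name": "Raj Patel"},
]


def expected_target_rows():
    """Apply the (assumed) transformation rules independently of the ETL tool."""
    rows = [{"employee_key": f"A-{r['EMP_NO']}", "name": r["EMP_NAME"].title()}
            for r in source_a]
    rows += [{"employee_key": f"B-{r['employee_id']}", "name": r["full_name"]}
             for r in source_b]
    return rows


def check_etl(target_rows):
    """Return a list of failures; an empty list means every check passed."""
    failures = []
    expected = {r["employee_key"]: r for r in expected_target_rows()}
    actual = {r["employee_key"]: r for r in target_rows}
    # 1. Completeness: row counts must reconcile (also catches duplicates).
    if len(target_rows) != len(expected):
        failures.append(f"row count: expected {len(expected)}, got {len(target_rows)}")
    # 2. No data lost: every source record must reach the target.
    for key in sorted(expected.keys() - actual.keys()):
        failures.append(f"missing in target: {key}")
    # 3. No unexpected records in the target.
    for key in sorted(actual.keys() - expected.keys()):
        failures.append(f"unexpected in target: {key}")
    # 4. Accuracy: loaded values must match the transformation rules.
    for key in sorted(expected.keys() & actual.keys()):
        if expected[key] != actual[key]:
            failures.append(f"mismatch for {key}: {actual[key]} != {expected[key]}")
    return failures


print(check_etl(target))  # [] when the load is correct
```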
Steps of the ETL Testing Process
The ETL testing process is similar to other conventional testing processes and is conducted in stages. The following are the stages of the ETL testing process:
- Identify and understand Business Requirements – In this step, the testing team tries to understand the requirements of the ETL process, usually with the help of a Business Analyst. Testers need to understand the structure of the source databases and the transformation rules that will be used. They also need to find any issues or gaps in the requirements or data sources.
- Validate Data Sources – In this step, once the requirements have been understood, testers validate them to check that their understanding matches the actual requirements.
- Test Estimation – In this step, the effort and time required for testing are estimated, based on the complexity of the data sources and transformation rules and on resource availability.
- Test Planning – In this step, detailed test planning is done based on the previous steps. A test plan document is prepared covering the test scope, testing approach, potential risks and risk mitigation steps.
- Test Design – This step involves designing test scenarios and test cases, which should cover all the requirements and all the transformation rules.
- Pre-execution Check – The prepared test cases and scenarios are reviewed with the business analyst to make sure that everything in the business requirements is covered.
- Test Data Creation – In this step, testers should create dummy data for all the applicable source tables.
- Test Execution – After the test data is created, all the test cases are executed. The test cases cover all three steps of the ETL process, i.e. extracting, transforming and loading.
- Defect Reporting and Fixing – Any defect found during test execution is sent for fixing, and once it has been fixed, retesting is done. This cycle is repeated until the exit criteria are met.
- Test Reporting – In this step, a test report is prepared that lists the test cases and their final status (passed or failed). The report is shared with the stakeholders and the process is closed.
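Test execution and reporting can be illustrated with a small Python harness. It is only a sketch: the dummy data, the `extract`/`transform`/`load` functions and the rule that amounts are stored as integer cents are all hypothetical, and real projects typically rely on a test framework or an ETL testing tool. Each case is tagged with the ETL step it covers, and the run produces the passed/failed list that goes into the test report.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical dummy source rows from the "Test Data Creation" step.
DUMMY_SOURCE = [{"id": 1, "amount": "10.50"}, {"id": 2, "amount": "3.25"}]


def extract():
    """Stand-in extract step: copy the dummy source rows."""
    return [dict(r) for r in DUMMY_SOURCE]


def transform(rows):
    """Assumed rule under test: string amounts become integer cents."""
    return [{"id": r["id"], "amount_cents": round(float(r["amount"]) * 100)} for r in rows]


def load(rows):
    """Load the transformed rows into a throwaway in-memory target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount_cents INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (:id, :amount_cents)", rows)
    return conn


def loaded_row_count():
    conn = load(transform(extract()))
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]


@dataclass
class EtlTestCase:
    name: str
    stage: str  # which ETL step the case covers: "extract", "transform" or "load"
    check: Callable[[], bool]


def run(cases: List[EtlTestCase]):
    """Execute every case and record its final status for the test report."""
    report = []
    for case in cases:
        try:
            status = "passed" if case.check() else "failed"
        except Exception as exc:  # an unexpected error is reported as a failure too
            status = f"failed ({exc!r})"
        report.append((case.stage, case.name, status))
    return report


cases = [
    EtlTestCase("all dummy rows extracted", "extract",
                lambda: len(extract()) == len(DUMMY_SOURCE)),
    EtlTestCase("amounts converted to cents", "transform",
                lambda: [r["amount_cents"] for r in transform(extract())] == [1050, 325]),
    EtlTestCase("row count matches in target", "load",
                lambda: loaded_row_count() == len(DUMMY_SOURCE)),
]

for stage, name, status in run(cases):
    print(f"{stage:<10} {name:<30} {status}")
```

Failed cases from such a report are what feed the defect reporting and retesting loop.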
ETL Testing Types
The following are the four types of ETL testing:
- New Data Warehouse Testing – In this type of ETL testing, everything is done from scratch. The data input information is collected from the customer. Source and destination databases are freshly built and checked using ETL tools.
- Migration Testing – In this type of ETL testing, the customer already has a working data warehouse and an ETL tool. Migration testing is required when data is loaded from the existing database into a new database. The old database is called the legacy or source database, and the new one is called the target or destination database.
- Change Request – In this type, data is fetched from different sources and loaded into the existing warehouse; no new database is used. Apart from loading the new data, the customer might need to change an existing business rule or add a new one.
- Report Testing – Once the data warehouse is built, the system allows users to generate various reports. This testing verifies layout, data accuracy and user access restrictions of the reports.
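For the data-accuracy part of report testing, one approach is to recompute each figure shown in a report directly from the warehouse and compare the two. The Python sketch below assumes a hypothetical `fact_sales` table and hard-codes the totals read from the report; it does not cover the layout or access-restriction checks.

```python
import sqlite3

# Hypothetical warehouse fact table and the figures a generated report displays.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_sales (region TEXT, amount_cents INTEGER)")
warehouse.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [("North", 1000), ("North", 250), ("South", 700)],
)
report_totals = {"North": 1250, "South": 700}  # values read from the report


def verify_report(conn, report):
    """Recompute each report figure from the warehouse and list discrepancies."""
    expected = dict(conn.execute(
        "SELECT region, SUM(amount_cents) FROM fact_sales GROUP BY region"
    ).fetchall())
    problems = []
    # Check every region that appears in either the warehouse or the report.
    for region in sorted(expected.keys() | report.keys()):
        if expected.get(region) != report.get(region):
            problems.append(
                f"{region}: report shows {report.get(region)}, "
                f"warehouse has {expected.get(region)}"
            )
    return problems


print(verify_report(warehouse, report_totals))  # [] means the report matches
```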
In today’s market, every business runs on data. Data warehousing and business intelligence processes are extremely helpful for organizing that data effectively and accurately, and the ETL process is an important part of data warehousing projects.
ETL testing is an important process whenever data is transferred from one or more databases to another database, especially when large volumes of data are involved. It verifies the accuracy of the data loaded into the destination database. Because of this importance, ETL testing has good career scope and ETL testers are in high demand.
We hope that this tutorial was helpful in giving you a detailed understanding of the ETL testing process. Do check our ETL testing interview questions post for some of the most commonly asked ETL interview questions.
Kuldeep is the founder and lead author of ArtOfTesting. He is skilled in test automation, performance testing, big data, and CI/CD. He brings a decade of experience to his current role, where he is dedicated to educating QA professionals. You can connect with him on LinkedIn.