ETL testing is used to verify the correctness of data migration from the source database to the destination database, along with the validation of the transformation rules.
In this article, we have compiled a list of some of the frequently asked ETL Testing Interview Questions. So, let’s begin.
ETL Testing Interview Questions
Ques.1. What is ETL?
Ans. ETL stands for Extract, Transform, and Load. These three are database functions and they are performed on the data to pull it out from one or more sources (database) to another (database).
Ques.2. What is the ETL Process?
Ans. ETL process has three steps-
- Extraction – In this step, data is extracted from one or more source databases.
- Transformation – In this second step, data is transformed into a format that is appropriate for the destination database.
- Load – Finally, transformed data is loaded into the destination/target database.
Ques.3. What is ETL Testing? Or What is the importance of ETL Testing?
Ans. ETL testing is performed to ensure the accuracy of the data that is loaded into the destination database. It also makes sure that required data transformation rules are followed and there isn’t any loss of data during the ETL process.
Ques.4. What are the various challenges of ETL Testing?
Ans. The various challenges of ETL Testing are-
- Unavailability of test data since ETL testing requires a large amount of test data.
- Lack of skilled resources since ETL testing requires complex SQL queries.
- Designing test cases is complex since ETL testing requires working with large volumes of data.
- Available test data may not cover all the possible scenarios.
- Data loss during the ETL testing.
Ques.5. What are the four ETL Testing Types? Explain each of them in brief.
Ans. This is one of the most commonly asked ETL testing interview questions. Following are the four ETL testing types-
- New Data Warehouse Testing – In this type of testing, a new data warehouse is built from scratch. Data inputs are collected from the customer.
- Migration Testing – This type of testing is required when data is migrated from the old database to the new database.
- Change Request – In this type of testing, data is fetched from different sources and loaded into the existing database.
- Report Testing – In report testing, reports generated in the data warehouse system are tested for data accuracy, access privileges, and layout.
Ques.6. What is a Star Schema?
Ans. A star schema is a multidimensional schema used to model data warehouse systems. It contains one or more fact tables and multiple dimension tables. The fact table is at the center and multiple dimension tables are associated with it which resembles the star shape.
Ques.7. What is a Snowflake Schema?
Ans. A snowflake schema is a multidimensional schema used to model data warehouse systems. A snowflake schema contains one or more fact tables, multiple dimension tables, and sub dimension tables. It is an extension of star schema and it resembles the snowflake shape.
Ques.8. What is a Fact Table?
Ans. A fact table in the dimensional model contains quantitative information about the facts important to the business. Fact tables are used for analysis and they are often denormalized.
A fact table consists of two types of columns: ‘Foreign key’ columns and ‘Measures’ columns. ‘Foreign Key’ columns are used to refer to the dimension tables and ‘Measures’ columns have data to be analyzed.
Some examples of facts are the number of products sold, the number of orders placed, etc.
Ques.9. What are the three types of Facts?
Ans. The three types of Facts are –
- Additive – Additive facts are the facts for which each dimension can be summed up.
- Semi-additive – Semi-additive facts are the facts for which some of the dimensions can be summed up but not all.
- Non-additive – Non-additive facts are the facts for which none of the dimensions can be summed up.
Ques.10. What is a Transaction Fact Table?
Ans. A transaction fact table is one of the three types of the fact table and the most basic one. In this type of fact table, each event is stored only once, and it contains the data of the smallest level. Also, the number of rows in this fact table is similar to the number of rows in the source table.
Ques.11. What is a Periodic Snapshot Fact Table?
Ans. In this type of fact table, an event can be stored multiple times and it captures the state of the facts at predetermined periods.
Ques.12. What is an Accumulated or Accumulating Snapshot Fact Table?
Ans. An accumulated fact table is the most challenging of all the fact table types and it can be used when a business process has a definite beginning and end.
Ques.13. What is Grain in the context of a Fact Table?
Ans. The grain in the fact table represents the level of detail for a single row.
Ques.14. What is a factless fact table?
Ans. A factless fact table does not have any measures or facts. Also, it only contains ‘Foreign Key’ columns for establishing the relationship between dimensions.
Ques.15. What is a dimension table?
Ans. A dimension table is one of the two types of tables used in dimensional modeling, other being fact table. A dimension table describes the dimensions or the descriptive criteria of objects in the fact table e.g. Location dimension can include street, city, zip code, state, etc.
Ques.16. Name some of the types of dimensions.
Ans. Following are the types of dimensions-
- Slowly Changing Dimension
- Conformed Dimension
- Junk Dimension/Dirty Dimension
- De-generated Dimension
- Bridge Dimension
Ques.17. What is SCD?
Ans. SCD stands for Slowly Changing Dimensions. These are one of the types of dimensions. SCDs are the dimensions whose attributes do not change regularly but change over a period.
For example – Customer Dimension. The attributes of customer dimension such as Address and Name of the customer do not change often.
Ques.18. What are the types of SCD? Explain each of them in brief.
Ans. Types of SCD-
- Types 0 – In this type, the dimension attributes never change. e.g. DoB.
- Type 1 – In this type of SCD, the new information overwrites the old data and the track of historical changes are not kept. Such as a change in the ‘City’ attribute of Customer Dimension if the customer changes the city.
- Type 2 – In this type of SCD, the new information is added in a new row and the track of historical changes is kept. The new record will be assigned a new primary key. e.g. In the above example of city change, a new record will be added to add a new city.
- Type 3 – In this type, instead of creating a new row like Type 2, a new column is added e.g. In the above example of city change, two more columns, ‘New City’ and ‘Year’ or ‘Date’ will be added.
- Type 4 – In this type of SCD, a separate ‘History table’ or mini-dimension is created to track the changes. Current values of the attributes are displayed in the dimension table and all the changes or old values are displayed in the history table.
- Type 5 – In this type, the type 4 mini dimension is used, and type 1 reference is used to refer to the mini-dimension table in the base dimension table.
- Type 6 – This type of SCD uses a combination of types 1, 2, and 3. It is also known as a ‘Hybrid’ method.
Ques.19. What is Conformed Dimension?
Ans. These types of dimensions can be used in multiple locations with different fact tables either in a single database or in multiple data marts.
Some of the examples of conformed dimensions are Customer dimension, Product dimension, Time dimension, etc. e.g. Product dimension can be used in calculating the number of products sold and revenue generated.
Ques.20. What is Junk Dimension or Dirty Dimension?
Ans. As the name suggests, these dimensions contain junk or random attributes that do not belong to any particular dimension. Such dimensions are created so that in the fact table many foreign keys can be avoided.
Ques.21. What is Degenerated Dimension?
Ans. Degenerated dimension is a dimension that does not have its own separate dimension, but it is a part of the fact table e.g. receipt or Invoice number does not have a separate dimension table. But as it is very important from a business perspective, it is stored in the fact table.
Ques.22. What is a Data Mart?
Ans. The data mart is a database that contains data stored in one of the segments of the data warehouse so in other words, the data mart is a subset of the data warehouse. These segments of the data warehouse usually belong to individual organizational business areas such as finance, marketing, sales, etc.
Ques.23. What is BUS Schema?
Ans. BUS schema contains conformed dimensions. In other words, it is used to identify the common dimensions across all the data marts of the organization.
Ques.24. What are the three layers of an ETL cycle? Explain them in brief.
Ans. The three layers of ETL cycle are-
- Staging Layer – It is also known as the source layer. This layer is used to store the data fetched from different data sources.
- Integration Layer – After the source data is transformed, it is stored in the integration layer. In this layer, data is stored in a database.
- Access Layer – It is also known as the dimension layer. It is a front-end layer used by the end-user to generate the reports.
Ques.25. What is a staging area?
Ans. A staging area is also known as the landing zone where data from various sources is stored temporarily. A staging area is important as all the data should be available in a single area before it is loaded into the data warehouse or data mart. It is used for storing and cleaning the data before moving it to the target database.
Ques.26. What is ODS?
Ans. ODS stands for Operational Data Store. It is used for operational reporting on the data stored from various data stores. Also, it contains the snapshot of the latest data from these sources.
Ques.27. What are active and passive transformations? Explain them in brief.
Ans. Active Transformation – In this type of transformation, when the data is passed to the target database from source, the number of rows is changed.
Passive Transformation – In this type of transformation, when the data is passed to the target database, the number of rows is not changed.
Ques.28. What is Data Purging?
Ans. Data purging is the process of removing or erasing junk data from the data warehouse. It is usually done to free-up the database space.
This completes our post on top ETL testing interview questions for freshers and experienced professionals. I hope these questions will help you in your interviews. Do let us know if you think we have missed any of the common ETL interview questions.
Kuldeep is the founder and lead author of ArtOfTesting. He is skilled in test automation, performance testing, big data, and CI-CD. He brings his decade of experience to his current role where he is dedicated to educating the QA professionals. You can connect with him on LinkedIn.