Certified Data Engineer Associate by Databricks - Page 1 | ExamCademy

A data organization leader is upset about the data analysis team’s reports being different from the data engineering team’s reports. The leader believes the siloed nature of their organization’s data engineering and data analysis architectures is to blame.
Which of the following describes how a data lakehouse could alleviate this issue?

A Both teams would autoscale their work as data size evolves
B Both teams would use the same source of truth for their work
C Both teams would reorganize to report to the same department
D Both teams would be able to collaborate on projects in real-time
E Both teams would respond more quickly to ad-hoc requests

A data engineer needs to determine whether to use the built-in Databricks Notebooks versioning or version their project using Databricks Repos.
Which of the following is an advantage of using Databricks Repos over the Databricks Notebooks versioning?

A Databricks Repos automatically saves development progress
B Databricks Repos supports the use of multiple branches
C Databricks Repos allows users to revert to previous versions of a notebook
D Databricks Repos provides the ability to comment on specific changes
E Databricks Repos is wholly housed within the Databricks Lakehouse Platform

A data engineer has been given a new record of data:

id STRING = 'a1'
rank INTEGER = 6
rating FLOAT = 9.4

Which SQL command can be used to append the new record to an existing Delta table my_table?

A INSERT INTO my_table VALUES ('a1', 6, 9.4)
B INSERT VALUES ('a1', 6, 9.4) INTO my_table
C UPDATE my_table VALUES ('a1', 6, 9.4)
D UPDATE VALUES ('a1', 6, 9.4) my_table
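
As a rough illustration of append semantics, the sketch below runs a standard INSERT INTO ... VALUES statement against an in-memory SQLite table. SQLite stands in for a Delta table here purely as an assumption so the example runs anywhere; the pre-existing row is made up.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the Delta table my_table.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE my_table (id TEXT, "rank" INTEGER, rating REAL)')
conn.execute("INSERT INTO my_table VALUES ('z9', 1, 7.2)")  # made-up existing row

# Append the new record; rows already in the table are left untouched.
conn.execute("INSERT INTO my_table VALUES ('a1', 6, 9.4)")

rows = conn.execute("SELECT * FROM my_table ORDER BY id").fetchall()
print(rows)
```

The table ends up with both the original row and the appended one, which is the behavior the question describes.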

A data engineer has realized that the data files associated with a Delta table are incredibly small. They want to compact the small files to form larger files to improve performance.

Which keyword can be used to compact the small files?

A OPTIMIZE
B VACUUM
C COMPACTION
D REPARTITION

A data engineer wants to create a data entity from a couple of tables. The data entity must be used by other data engineers in other sessions. It also must be saved to a physical location.

Which of the following data entities should the data engineer create?

A Table
B Function
C View
D Temporary view

A data engineer runs a statement every day to copy the previous day’s sales into the table transactions. Each day’s sales are in their own file in the location "/transactions/raw".

Today, the data engineer runs the following command to complete this task:

After running the command today, the data engineer notices that the number of records in table transactions has not changed.

What explains why the statement might not have copied any new records into the table?

A The format of the files to be copied was not included with the FORMAT_OPTIONS keyword.
B The COPY INTO statement requires the table to be refreshed to view the copied rows.
C The previous day’s file has already been copied into the table.
D The PARQUET file format does not support COPY INTO.
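
COPY INTO is documented as an idempotent operation: source files that were already loaded are skipped on later runs. The plain-Python model below only sketches that bookkeeping; `copy_into`, `loaded_files`, and the file name are hypothetical and not part of any Databricks API.

```python
# Hypothetical model: the table remembers which source files it has loaded.
loaded_files = set()
table_rows = []

def copy_into(files):
    """Load only files not seen before; return the number of rows added."""
    added = 0
    for name, rows in files.items():
        if name in loaded_files:
            continue  # already ingested, so a re-run skips it
        table_rows.extend(rows)
        loaded_files.add(name)
        added += len(rows)
    return added

source = {"2024-01-01.parquet": [("order-1", 10.0), ("order-2", 5.5)]}
first_run = copy_into(source)   # loads the file
second_run = copy_into(source)  # same file again, nothing new to load
print(first_run, second_run)
```

Running the same load twice leaves the row count unchanged after the first run.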

Which command can be used to write data into a Delta table while avoiding the writing of duplicate records?

A DROP
B INSERT
C MERGE
D APPEND
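
To make the idea of a duplicate-avoiding write concrete, here is a small plain-Python model of upsert logic: rows whose key already exists are updated, and the rest are inserted. The `merge` helper and the sample rows are hypothetical; this sketches the semantics only and is not how Delta Lake implements it.

```python
def merge(target, updates, key):
    """Upsert: update rows whose key matches, insert the rest."""
    by_key = {row[key]: dict(row) for row in target}
    for row in updates:
        # Matched keys are overwritten; unmatched keys become new rows.
        by_key[row[key]] = {**by_key.get(row[key], {}), **row}
    return list(by_key.values())

target = [{"id": "a1", "rating": 9.4}]
updates = [{"id": "a1", "rating": 9.4}, {"id": "b2", "rating": 7.0}]
merged = merge(target, updates, key="id")
print(merged)
```

A plain append of `updates` would have produced three rows, with `a1` duplicated.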

A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL.

Which command could the data engineering team use to access sales in PySpark?

A SELECT * FROM sales
B spark.table("sales")
C spark.sql("sales")
D spark.delta.table("sales")

A data engineer has created a new database using the following command:

CREATE DATABASE IF NOT EXISTS customer360;

In which location will the customer360 database be located?

A dbfs:/user/hive/database/customer360
B dbfs:/user/hive/warehouse
C dbfs:/user/hive/customer360
D dbfs:/user/hive/database

A data engineer is attempting to drop a Spark SQL table my_table and runs the following command:

DROP TABLE IF EXISTS my_table;

After running this command, the engineer notices that the data files and metadata files have been deleted from the file system.

What is the reason behind the deletion of all these files?

A The table was managed
B The table's data was smaller than 10 GB
C The table did not have a location
D The table was external

A data engineer needs to create a table in Databricks using data from a CSV file at location /path/to/csv.

They run the following command:

Which of the following lines of code fills in the above blank to successfully complete the task?

A FROM "path/to/csv"
B USING CSV
C FROM CSV
D USING DELTA

What is a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?

A Parquet files can be partitioned
B Parquet files will become Delta tables
C Parquet files have a well-defined schema
D Parquet files have the ability to be optimized

A data engineer has left the organization. The data team needs to transfer ownership of the data engineer’s Delta tables to a new data engineer. The new data engineer is the lead engineer on the data team.
Assuming the original data engineer no longer has access, which of the following individuals must be the one to transfer ownership of the Delta tables in Data Explorer?

A Databricks account representative
B This transfer is not possible
C Workspace administrator
D New lead data engineer
E Original data engineer

Which SQL keyword can be used to convert a table from a long format to a wide format?

A TRANSFORM
B PIVOT
C SUM
D CONVERT
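
For readers unsure what "long" and "wide" mean here, the following plain-Python sketch reshapes made-up sales rows from one row per (store, quarter) into one entry per store with a column per quarter. The data and names are assumptions for illustration only.

```python
from collections import defaultdict

# Long format: one row per (store, quarter) pair.
long_rows = [
    ("store_a", "Q1", 100),
    ("store_a", "Q2", 120),
    ("store_b", "Q1", 80),
    ("store_b", "Q2", 95),
]

# Wide format: one entry per store, one key per quarter.
wide = defaultdict(dict)
for store, quarter, sales in long_rows:
    wide[store][quarter] = sales

wide = dict(wide)
print(wide)
```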

A data engineer has a Python variable table_name that they would like to use in a SQL query. They want to construct a Python code block that will run the query using table_name.

They have the following incomplete code block:

____(f"SELECT customer_id, spend FROM {table_name}")

What can be used to fill in the blank to successfully complete the task?

A spark.delta.sql
B spark.sql
C spark.table
D dbutils.sql
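
The f-string in the question simply builds a SQL string with the table name substituted in. The sketch below shows the same pattern against an in-memory SQLite database, used here as a stand-in (an assumption) so it runs without a Spark session; the `customers` table and its row are made up.

```python
import sqlite3

# Assumption: SQLite stands in for the SQL engine; the table and row are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id TEXT, spend REAL)")
conn.execute("INSERT INTO customers VALUES ('c1', 42.5)")

table_name = "customers"
# The f-string substitutes the Python variable into the query text.
query = f"SELECT customer_id, spend FROM {table_name}"
result = conn.execute(query).fetchall()
print(query)
print(result)
```

Interpolating an identifier like this is only reasonable when `table_name` comes from trusted code, since the value becomes part of the SQL text.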

A data engineer is working with two tables. Each of these tables is displayed below in its entirety.

The data engineer runs the following query to join these tables together:

A
B
C
D

A data engineer needs to apply custom logic to identify employees with more than 5 years of experience in array column employees in table stores. The custom logic should create a new column exp_employees that is an array of all of the employees with more than 5 years of experience for each row. In order to apply this custom logic at scale, the data engineer wants to use the FILTER higher-order function.

Which code block successfully completes this task?

A
B
C
D

A data engineer who is new to Python needs to create a Python function that adds two integers together and returns the sum.

Which code block can the data engineer use to complete this task?

A
B
C
D
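
For reference, a minimal function of the kind the question describes might look like the sketch below. The name `add_integers` and its parameter names are hypothetical.

```python
def add_integers(x: int, y: int) -> int:
    """Return the sum of two integers."""
    return x + y

total = add_integers(2, 3)
print(total)
```

The key points are the `def` keyword, the two parameters, and an explicit `return` rather than a `print` inside the function.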

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

Which line of code should the data engineer use to fill in the blank if the data engineer only wants the query to execute a micro-batch to process data every 5 seconds?

A trigger("5 seconds")
B trigger(continuous="5 seconds")
C trigger(once="5 seconds")
D trigger(processingTime="5 seconds")

A data engineer is maintaining a data pipeline. Upon data ingestion, the data engineer notices that the source data is starting to have a lower level of quality. The data engineer would like to automate the process of monitoring the quality level.

Which of the following tools can the data engineer use to solve this problem?

A Auto Loader
B Unity Catalog
C Delta Lake
D Delta Live Tables

A data engineer has three tables in a Delta Live Tables (DLT) pipeline. They have configured the pipeline to drop invalid records at each table. They notice that some data is being dropped due to quality concerns at some point in the DLT pipeline. They would like to determine at which table in their pipeline the data is being dropped.

Which approach can the data engineer take to identify the table that is dropping the records?

A They can set up separate expectations for each table when developing their DLT pipeline.
B They can navigate to the DLT pipeline page, click on the “Error” button, and review the present errors.
C They can set up DLT to notify them via email when records are dropped.
D They can navigate to the DLT pipeline page, click on each table, and view the data quality statistics.

What is used by Spark to record the offset range of the data being processed in each trigger in order for Structured Streaming to reliably track the exact progress of the processing so that it can handle any kind of failure by restarting and/or reprocessing?

A Checkpointing and Write-ahead Logs
B Replayable Sources and Idempotent Sinks
C Write-ahead Logs and Idempotent Sinks
D Checkpointing and Idempotent Sinks

What describes the relationship between Gold tables and Silver tables?

A Gold tables are more likely to contain aggregations than Silver tables.
B Gold tables are more likely to contain valuable data than Silver tables.
C Gold tables are more likely to contain a less refined view of data than Silver tables.
D Gold tables are more likely to contain truthful data than Silver tables.

A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL.
Which of the following commands could the data engineering team use to access sales in PySpark?

A SELECT * FROM sales
B There is no way to share data between PySpark and SQL.
C spark.sql("sales")
D spark.delta.table("sales")
E spark.table("sales")

What describes when to use the CREATE STREAMING LIVE TABLE (formerly CREATE INCREMENTAL LIVE TABLE) syntax over the CREATE LIVE TABLE syntax when creating Delta Live Tables (DLT) tables using SQL?

A CREATE STREAMING LIVE TABLE should be used when the subsequent step in the DLT pipeline is static.
B CREATE STREAMING LIVE TABLE should be used when data needs to be processed incrementally.
C CREATE STREAMING LIVE TABLE should be used when data needs to be processed through complicated aggregations.
D CREATE STREAMING LIVE TABLE should be used when the previous step in the DLT pipeline is static.
