Free preview mode

Enjoy the free questions and consider upgrading to gain full access!

Certified Data Engineer Associate

By Databricks
Aug, 2025

Verified


Question 26

A Delta Live Tables pipeline includes two datasets defined using STREAMING LIVE TABLE. Three other datasets are defined against Delta Lake table sources using LIVE TABLE.

The pipeline is configured to run in Production mode using Continuous Pipeline Mode.

What is the expected outcome after clicking Start to update the pipeline, assuming previously unprocessed data exists and all definitions are valid?

  • A: All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.
  • B: All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.
  • C: All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.
  • D: All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.
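For context, here is a hedged sketch of how such a pipeline might be defined in DLT SQL (dataset and source names are hypothetical). The scenario's two streaming datasets would use the first form; its three datasets defined against Delta Lake sources would use the second:

```sql
-- Hypothetical streaming dataset: processes newly arrived records incrementally.
CREATE OR REFRESH STREAMING LIVE TABLE events_stream
AS SELECT * FROM STREAM(LIVE.events_source);

-- Hypothetical dataset defined against a Delta Lake source:
-- recomputed when the pipeline updates.
CREATE OR REFRESH LIVE TABLE events_summary
AS SELECT event_date, count(*) AS event_count
FROM LIVE.events_stream
GROUP BY event_date;
```

In Continuous mode the pipeline keeps updating its datasets as data arrives until it is stopped, whereas Triggered mode processes available data once and then stops; Production mode releases the pipeline's compute when the pipeline is stopped, while Development mode keeps it alive for interactive debugging.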

Question 27

Which types of workloads are compatible with Auto Loader?

  • A: Streaming workloads
  • B: Machine learning workloads
  • C: Serverless workloads
  • D: Batch workloads

Question 28

A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but has not provided any type inference or schema hints in the pipeline. Upon reviewing the data, the engineer notices that all of the columns in the target table are of the string type, despite some of the fields containing only float or boolean values.

Why has Auto Loader inferred all of the columns to be of the string type?

  • A: Auto Loader cannot infer the schema of ingested data
  • B: JSON data is a text-based format
  • C: Auto Loader only works with string data
  • D: All of the fields had at least one null value
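As background for this question: when ingesting JSON, Auto Loader infers every column as a string by default, and typed inference must be opted into via the `cloudFiles.inferColumnTypes` option. A minimal DLT SQL sketch (the source path is a placeholder):

```sql
-- Hypothetical ingestion of JSON with typed schema inference enabled.
CREATE OR REFRESH STREAMING LIVE TABLE raw_json
AS SELECT * FROM cloud_files(
  '/mnt/landing/json',                          -- hypothetical source path
  'json',
  map('cloudFiles.inferColumnTypes', 'true')    -- opt in to float/boolean/etc. inference
);
```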

Question 29

Which statement regarding the relationship between Silver tables and Bronze tables is always true?

  • A: Silver tables contain a less refined, less clean view of data than Bronze data.
  • B: Silver tables contain aggregates while Bronze data is unaggregated.
  • C: Silver tables contain more data than Bronze tables.
  • D: Silver tables contain less data than Bronze tables.

Question 30

Which query is performing a streaming hop from raw data to a Bronze table?

  • A:
  • B:
  • C:
  • D:
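The answer options for this question are code snippets that are not reproduced in this preview. For reference, a typical streaming hop from raw cloud files into a Bronze table, sketched in DLT SQL (path and table names are hypothetical), looks like:

```sql
-- Hypothetical streaming read of raw JSON files into a Bronze table,
-- adding an ingestion timestamp for auditability.
CREATE OR REFRESH STREAMING LIVE TABLE bronze_orders
AS SELECT *, current_timestamp() AS ingest_time
FROM cloud_files('/raw/orders', 'json');
```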

Question 31

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW

What is the expected behavior when a batch of data containing data that violates these constraints is processed?

  • A: Records that violate the expectation cause the job to fail.
  • B: Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.
  • C: Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.
  • D: Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.
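In context, the constraint above appears in the table definition's constraint clause. A hedged sketch of the full dataset definition (table and source names are hypothetical; the constraint is quoted from the question):

```sql
CREATE OR REFRESH STREAMING LIVE TABLE validated_events (
  CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW
)
AS SELECT * FROM STREAM(LIVE.bronze_events);
```

With `ON VIOLATION DROP ROW`, DLT excludes violating records from the target and reports the violation counts as data-quality metrics; the alternatives are omitting the `ON VIOLATION` clause (retain the rows, record the metric) and `ON VIOLATION FAIL UPDATE` (fail the update).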

Question 32

A data engineer has a Job with multiple tasks that runs nightly. Each of the tasks runs slowly because the clusters take a long time to start.

Which action can the data engineer perform to improve the start up time for the clusters used for the Job?

  • A: They can use endpoints available in Databricks SQL
  • B: They can use jobs clusters instead of all-purpose clusters
  • C: They can configure the clusters to autoscale for larger data sizes
  • D: They can use clusters that are from a cluster pool
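As an illustration of the cluster-pool option, a job cluster draws pre-warmed instances from a pool via `instance_pool_id` in its cluster specification. A hedged config fragment (the pool ID and runtime version are hypothetical):

```json
{
  "new_cluster": {
    "spark_version": "13.3.x-scala2.12",
    "instance_pool_id": "pool-1234-abcd",
    "num_workers": 4
  }
}
```

Because pool instances are kept idle and ready, clusters created from the pool skip most of the instance-acquisition time during startup.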

Question 33

A data engineer has a single-task Job that runs each morning before they begin working. After identifying an upstream data issue, they need to set up another task to run a new notebook prior to the original task.

Which approach can the data engineer use to set up the new task?

  • A: They can clone the existing task in the existing Job and update it to run the new notebook.
  • B: They can create a new task in the existing Job and then add it as a dependency of the original task.
  • C: They can create a new task in the existing Job and then add the original task as a dependency of the new task.
  • D: They can create a new job from scratch and add both tasks to run concurrently.
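Task ordering in a Databricks Job is expressed through each task's `depends_on` list. A hedged sketch of the scenario in Jobs API-style JSON (task keys and notebook paths are hypothetical): the original task depends on the new one, so the new notebook runs first.

```json
{
  "tasks": [
    {
      "task_key": "fix_upstream_data",
      "notebook_task": { "notebook_path": "/Workspace/etl/new_check" }
    },
    {
      "task_key": "morning_load",
      "depends_on": [ { "task_key": "fix_upstream_data" } ],
      "notebook_task": { "notebook_path": "/Workspace/etl/original" }
    }
  ]
}
```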

Question 34

A single Job runs two notebooks as two separate tasks. A data engineer has noticed that one of the notebooks is running slowly in the Job’s current run. The data engineer asks a tech lead for help in identifying why this might be the case.

Which approach can the tech lead use to identify why the notebook is running slowly as part of the Job?

  • A: They can navigate to the Runs tab in the Jobs UI to immediately review the processing notebook.
  • B: They can navigate to the Tasks tab in the Jobs UI and click on the active run to review the processing notebook.
  • C: They can navigate to the Runs tab in the Jobs UI and click on the active run to review the processing notebook.
  • D: They can navigate to the Tasks tab in the Jobs UI to immediately review the processing notebook.

Question 35

Which of the following commands will return the location of database customer360?

  • A: DESCRIBE LOCATION customer360;
  • B: DROP DATABASE customer360;
  • C: DESCRIBE DATABASE customer360;
  • D: ALTER DATABASE customer360 SET DBPROPERTIES ('location' = '/user');
  • E: USE DATABASE customer360;
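For reference, `DESCRIBE DATABASE` (equivalently `DESCRIBE SCHEMA`) returns metadata rows about a database, which typically include its location:

```sql
DESCRIBE DATABASE customer360;
-- The result set typically includes rows such as Catalog Name,
-- Namespace Name, Comment, Location, and Owner.
-- DESCRIBE DATABASE EXTENDED customer360; additionally lists properties.
```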

Page 2 of 7 • Questions 26-50 of 171
