You are building a model to make clothing recommendations. You know a user's fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available. How should you use this data to train the model?
A. Continuously retrain the model on just the new data.
B. Continuously retrain the model on a combination of existing data and the new data.
C. Train on the existing data while using the new data as your test set.
D. Train on the new data while using the existing data as your test set.
Your infrastructure includes a set of YouTube channels. You have been tasked with creating a process for sending the YouTube channel data to Google Cloud for analysis. You want to design a solution that allows your worldwide marketing teams to perform ANSI SQL and other types of analysis on up-to-date YouTube channel log data. How should you set up the log data transfer into Google Cloud?
A. Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.
B. Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Regional bucket as a final destination.
C. Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.
D. Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Regional storage bucket as a final destination.
You store historic data in Cloud Storage. You need to perform analytics on the historic data. You want to use a solution to detect invalid data entries and perform data transformations that will not require programming or knowledge of SQL.
What should you do?
A. Use Cloud Dataflow with Beam to detect errors and perform transformations.
B. Use Cloud Dataprep with recipes to detect errors and perform transformations.
C. Use Cloud Dataproc with a Hadoop job to detect errors and perform transformations.
D. Use federated tables in BigQuery with queries to detect errors and perform transformations.
You work for an economic consulting firm that helps companies identify economic trends as they happen. As part of your analysis, you use Google BigQuery to correlate customer data with the average prices of the 100 most common goods sold, including bread, gasoline, milk, and others. The average prices of these goods are updated every 30 minutes. You want to make sure this data stays up to date so you can combine it with other data in BigQuery as cheaply as possible.
What should you do?
A. Load the data every 30 minutes into a new partitioned table in BigQuery.
B. Store and update the data in a regional Google Cloud Storage bucket and create a federated data source in BigQuery.
C. Store the data in Google Cloud Datastore. Use Google Cloud Dataflow to query BigQuery and combine the data programmatically with the data stored in Cloud Datastore.
D. Store the data in a file in a regional Google Cloud Storage bucket. Use Cloud Dataflow to query BigQuery and combine the data programmatically with the data stored in Google Cloud Storage.
You want to process payment transactions in a point-of-sale application that will run on Google Cloud Platform. Your user base could grow exponentially, but you do not want to manage infrastructure scaling.
Which Google database service should you use?
A. Cloud SQL
B. BigQuery
C. Cloud Bigtable
D. Cloud Datastore
You want to use a database of information about tissue samples to classify future tissue samples as either normal or mutated. You are evaluating an unsupervised anomaly detection method for classifying the tissue samples. Which two characteristics support this method? (Choose two.)
A. There are very few occurrences of mutations relative to normal samples.
B. There are roughly equal occurrences of both normal and mutated samples in the database.
C. You expect future mutations to have different features from the mutated samples in the database.
D. You expect future mutations to have similar features to the mutated samples in the database.
E. You already have labels for which samples are mutated and which are normal in the database.
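To make the unsupervised-detection idea concrete, here is a minimal, hypothetical sketch (not tied to any particular ML library) of why rarity of mutations matters: the detector is fitted only on the abundant normal samples and flags anything far from that profile, so it needs no labels and can catch mutations unlike any seen before. The feature values and the 3-sigma threshold are illustrative assumptions.

```python
from statistics import mean, stdev

def fit_normal_profile(normal_samples):
    """Unsupervised anomaly-detection sketch: model only the abundant
    normal samples, then flag anything far from that profile. This works
    precisely because mutations are rare and may not resemble past ones."""
    mu, sigma = mean(normal_samples), stdev(normal_samples)
    return lambda x: abs(x - mu) / sigma > 3  # True means anomalous

# Hypothetical single-feature measurements from known-normal tissue samples.
normals = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
is_anomaly = fit_normal_profile(normals)
print(is_anomaly(10.1), is_anomaly(14.0))  # False True
```

A supervised classifier, by contrast, would need labeled examples of both classes and would only recognize mutations resembling those in its training set.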
You work for a car manufacturer and have set up a data pipeline using Google Cloud Pub/Sub to capture anomalous sensor events. You are using a push subscription in Cloud Pub/Sub that calls a custom HTTPS endpoint that you have created to take action on these anomalous events as they occur. Your custom HTTPS endpoint keeps receiving an inordinate number of duplicate messages. What is the most likely cause of these duplicate messages?
A. The message body for the sensor event is too large.
B. Your custom endpoint has an out-of-date SSL certificate.
C. The Cloud Pub/Sub topic has too many messages published to it.
D. Your custom endpoint is not acknowledging messages within the acknowledgement deadline.
You have spent a few days loading data from comma-separated values (CSV) files into the Google BigQuery table CLICK_STREAM. The column DT stores the epoch time of click events. For convenience, you chose a simple schema where every field is treated as the STRING type. Now, you want to compute web session durations of users who visit your site, and you want to change the data type of DT to TIMESTAMP. You want to minimize the migration effort without making future queries computationally expensive. What should you do?
A. Delete the table CLICK_STREAM, and then re-create it such that the column DT is of the TIMESTAMP type. Reload the data.
B. Add a column TS of the TIMESTAMP type to the table CLICK_STREAM, and populate the numeric values from the column DT for each row. Reference the column TS instead of the column DT from now on.
C. Create a view CLICK_STREAM_V, where strings from the column DT are cast into TIMESTAMP values. Reference the view CLICK_STREAM_V instead of the table CLICK_STREAM from now on.
D. Add two columns to the table CLICK_STREAM: TS of the TIMESTAMP type and IS_NEW of the BOOLEAN type. Reload all data in append mode. For each appended row, set the value of IS_NEW to true. For future queries, reference the column TS instead of the column DT, with the WHERE clause ensuring that the value of IS_NEW is true.
E. Construct a query to return every row of the table CLICK_STREAM, while using the built-in function to cast strings from the column DT into TIMESTAMP values. Run the query into a destination table NEW_CLICK_STREAM, in which the column TS is of the TIMESTAMP type. Reference the table NEW_CLICK_STREAM instead of the table CLICK_STREAM from now on. In the future, new data is loaded into the table NEW_CLICK_STREAM.
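The core conversion all of these options perform is casting an epoch-seconds string to a timestamp. As an illustrative sketch only (plain Python standing in for BigQuery's `TIMESTAMP_SECONDS(SAFE_CAST(DT AS INT64))`; the row data and function name are hypothetical), the cast and its failure mode on a corrupt row look like this:

```python
from datetime import datetime, timezone

def cast_epoch_string(dt_string):
    """Convert an epoch-seconds STRING to a timezone-aware timestamp,
    returning None for unparseable rows (the SAFE_CAST behavior)."""
    try:
        return datetime.fromtimestamp(int(dt_string), tz=timezone.utc)
    except (TypeError, ValueError):
        return None

rows = [{"DT": "1700000000"}, {"DT": "not-a-number"}]
casted = [cast_epoch_string(r["DT"]) for r in rows]
print(casted[0].isoformat())  # 2023-11-14T22:13:20+00:00
print(casted[1])              # None
```

Note the trade-off the question probes: a view re-runs this cast on every query, while a one-time rewrite into a new table pays the conversion cost once.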
You want to use Google Stackdriver Logging to monitor Google BigQuery usage. You need an instant notification to be sent to your monitoring tool when new data is appended to a certain table using an insert job, but you do not want to receive notifications for other tables. What should you do?
A. Make a call to the Stackdriver API to list all logs, and apply an advanced filter.
B. In the Stackdriver logging admin interface, enable a log sink export to BigQuery.
C. In the Stackdriver logging admin interface, enable a log sink export to Google Cloud Pub/Sub, and subscribe to the topic from your monitoring tool.
D. Using the Stackdriver API, create a project sink with an advanced log filter to export to Pub/Sub, and subscribe to the topic from your monitoring tool.
You are working on a sensitive project involving private user data. You have set up a project on Google Cloud Platform to house your work internally. An external consultant is going to assist with coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should you maintain users' privacy?
A. Grant the consultant the Viewer role on the project.
B. Grant the consultant the Cloud Dataflow Developer role on the project.
C. Create a service account and allow the consultant to log on with it.
D. Create an anonymized sample of the data for the consultant to work with in a different project.
You are training a spam classifier. You notice that you are overfitting the training data. Which three actions can you take to resolve this problem? (Choose three.)
A. Get more training examples.
B. Reduce the number of training examples.
C. Use a smaller set of features.
D. Use a larger set of features.
E. Increase the regularization parameters.
F. Decrease the regularization parameters.
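The regularization options can be made concrete with a small sketch. This is an illustrative, from-scratch example (a one-feature linear model with an L2 penalty trained by gradient descent; the data, learning rate, and function name are all assumptions, not part of the question): a larger regularization parameter pulls weights toward zero, which is what combats overfitting.

```python
def fit_ridge(xs, ys, lam, lr=0.01, steps=2000):
    """Fit y ~ w*x by gradient descent on squared error plus an
    L2 penalty lam * w**2. Larger lam shrinks the learned weight."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n + 2 * lam * w
        w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]            # true slope is 2
w_weak = fit_ridge(xs, ys, lam=0.0)    # no regularization: w near 2.0
w_strong = fit_ridge(xs, ys, lam=10.0) # strong penalty shrinks w toward 0
print(abs(w_strong) < abs(w_weak))     # True
```

Fewer features and more training examples act in the same direction: both reduce the model's capacity to memorize noise in the training set.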
You are building a data pipeline on Google Cloud. You need to prepare data using a casual method for a machine-learning process. You want to support a logistic regression model. You also need to monitor and adjust for null values, which must remain real-valued and cannot be removed. What should you do?
A. Use Cloud Dataprep to find null values in sample source data. Convert all nulls to 'none' using a Cloud Dataproc job.
B. Use Cloud Dataprep to find null values in sample source data. Convert all nulls to 0 using a Cloud Dataprep job.
C. Use Cloud Dataflow to find null values in sample source data. Convert all nulls to 'none' using a Cloud Dataprep job.
D. Use Cloud Dataflow to find null values in sample source data. Convert all nulls to 0 using a custom script.
You are selecting services to write and transform JSON messages from Cloud Pub/Sub to BigQuery for a data pipeline on Google Cloud. You want to minimize service costs. You also want to monitor and accommodate input data volume that will vary in size with minimal manual intervention. What should you do?
A. Use Cloud Dataproc to run your transformations. Monitor CPU utilization for the cluster. Resize the number of worker nodes in your cluster via the command line.
B. Use Cloud Dataproc to run your transformations. Use the diagnose command to generate an operational output archive. Locate the bottleneck and adjust cluster resources.
C. Use Cloud Dataflow to run your transformations. Monitor the job system lag with Stackdriver. Use the default autoscaling setting for worker instances.
D. Use Cloud Dataflow to run your transformations. Monitor the total execution time for a sampling of jobs. Configure the job to use non-default Compute Engine machine types when needed.
You've migrated a Hadoop job from an on-prem cluster to Dataproc and GCS. Your Spark job is a complicated analytical workload consisting of many shuffle operations, and the initial data are Parquet files (on average 200-400 MB each). You see some performance degradation after the migration to Dataproc, so you'd like to optimize it. Keep in mind that your organization is very cost-sensitive, so you'd like to continue using Dataproc on preemptible VMs (with only 2 non-preemptible workers) for this workload.
What should you do?
A. Increase the size of your Parquet files so that each is at least 1 GB.
B. Switch to TFRecord format (approx. 200 MB per file) instead of Parquet files.
C. Switch from HDDs to SSDs, copy the initial data from GCS to HDFS, run the Spark job, and copy the results back to GCS.
D. Switch from HDDs to SSDs, and override the preemptible VM configuration to increase the boot disk size.
Your company has hired a new data scientist who wants to perform complicated analyses across very large datasets stored in Google Cloud Storage and in a Cassandra cluster on Google Compute Engine. The scientist primarily wants to create labelled data sets for machine learning projects, along with some visualization tasks. She reports that her laptop is not powerful enough to perform her tasks and it is slowing her down. You want to help her perform her tasks.
What should you do?
A. Run a local version of Jupyter on the laptop.
B. Grant the user access to Google Cloud Shell.
C. Host a visualization tool on a VM on Google Compute Engine.
D. Deploy Google Cloud Datalab to a virtual machine (VM) on Google Compute Engine.
Your company has a hybrid cloud initiative. You have a complex data pipeline that moves data between cloud provider services and leverages services from each of the cloud providers. Which cloud-native service should you use to orchestrate the entire pipeline?
A. Cloud Dataflow
B. Cloud Composer
C. Cloud Dataprep
D. Cloud Dataproc
You are designing a data processing pipeline. The pipeline must be able to scale automatically as load increases. Messages must be processed at least once and must be ordered within windows of 1 hour. How should you design the solution?
A. Use Apache Kafka for message ingestion and use Cloud Dataproc for streaming analysis.
B. Use Apache Kafka for message ingestion and use Cloud Dataflow for streaming analysis.
C. Use Cloud Pub/Sub for message ingestion and Cloud Dataproc for streaming analysis.
D. Use Cloud Pub/Sub for message ingestion and Cloud Dataflow for streaming analysis.
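The "ordered within windows of 1 hour" requirement is worth unpacking: Pub/Sub does not guarantee delivery order, so ordering is imposed downstream when the stream is windowed. As a minimal, library-free sketch of that idea (the message tuples and function name are illustrative assumptions, not Dataflow API calls):

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # fixed 1-hour windows

def window_and_order(messages):
    """Group (timestamp, payload) messages into fixed 1-hour windows,
    then sort by timestamp within each window — imposing order on an
    unordered stream, as windowed streaming pipelines do."""
    windows = defaultdict(list)
    for ts, payload in messages:
        windows[ts // WINDOW_SECONDS].append((ts, payload))
    return {w: sorted(msgs) for w, msgs in sorted(windows.items())}

# Messages arrive out of order, as with Pub/Sub delivery.
msgs = [(3700, "b"), (100, "a"), (3650, "c"), (50, "d")]
result = window_and_order(msgs)
# Window 0 holds (50, 'd'), (100, 'a'); window 1 holds (3650, 'c'), (3700, 'b').
print(result)
```

In a real pipeline, a fixed-window transform plus event-time sorting achieves this; the autoscaling and at-least-once delivery come from the managed services themselves.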
You are designing storage for very large text files for a data pipeline on Google Cloud. You want to support ANSI SQL queries. You also want to support compression and parallel load from the input locations using Google recommended practices. What should you do?
A. Transform text files to compressed Avro using Cloud Dataflow. Use BigQuery for storage and query.
B. Transform text files to compressed Avro using Cloud Dataflow. Use Cloud Storage and BigQuery permanent linked tables for query.
C. Compress text files to gzip using the Grid Computing Tools. Use BigQuery for storage and query.
D. Compress text files to gzip using the Grid Computing Tools. Use Cloud Storage, and then import into Cloud Bigtable for query.
You designed a database for patient records as a pilot project to cover a few hundred patients in three clinics. Your design used a single database table to represent all patients and their visits, and you used self-joins to generate reports. The server resource utilization was at 50%. Since then, the scope of the project has expanded. The database must now store 100 times more patient records. You can no longer run the reports, because they either take too long or they encounter errors with insufficient compute resources. How should you adjust the database design?
A. Add capacity (memory and disk space) to the database server by a factor of 200.
B. Shard the tables into smaller ones based on date ranges, and only generate reports with prespecified date ranges.
C. Normalize the master patient-record table into the patient table and the visits table, and create other necessary tables to avoid self-joins.
D. Partition the table into smaller tables, with one for each clinic. Run queries against the smaller table pairs, and use unions for consolidated reports.
An external customer provides you with a daily dump of data from their database. The data flows into Google Cloud Storage (GCS) as comma-separated values (CSV) files. You want to analyze this data in Google BigQuery, but the data could have rows that are formatted incorrectly or corrupted. How should you build this pipeline?
A. Use federated data sources, and check data in the SQL query.
B. Enable BigQuery monitoring in Google Stackdriver and create an alert.
C. Import the data into BigQuery using the gcloud CLI and set max_bad_records to 0.
D. Run a Google Cloud Dataflow batch pipeline to import the data into BigQuery, and push errors to another dead-letter table for analysis.
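The dead-letter pattern in option D is the key idea: each row is validated in flight, good rows go to the main table, and malformed rows are diverted to a separate table instead of failing the load. A minimal, hypothetical sketch of that routing step in plain Python (the schema width and function name are assumptions; a real pipeline would express this as a Beam transform with side outputs):

```python
import csv
import io

EXPECTED_COLUMNS = 3  # hypothetical schema width for the daily dump

def split_good_and_bad(csv_text):
    """Route well-formed rows to the main output and malformed rows to
    a dead-letter list, mirroring a Dataflow pipeline that writes bad
    rows to a dead-letter BigQuery table for later analysis."""
    good, dead_letter = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) == EXPECTED_COLUMNS and all(field.strip() for field in row):
            good.append(row)
        else:
            dead_letter.append(row)
    return good, dead_letter

sample = "1,alice,10\n2,bob\n3,carol,30\n4,,40\n"
good, bad = split_good_and_bad(sample)
print(len(good), len(bad))  # 2 2
```

The payoff is that a handful of corrupt rows never blocks the daily load, and the dead-letter table preserves them for debugging with the customer.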
Your weather app queries a database every 15 minutes to get the current temperature. The frontend is powered by Google App Engine and serves millions of users. How should you design the frontend to respond to a database failure?
A. Issue a command to restart the database servers.
B. Retry the query with exponential backoff, up to a cap of 15 minutes.
C. Retry the query every second until it comes back online to minimize staleness of data.
D. Reduce the query frequency to once every hour until the database comes back online.
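Exponential backoff with a cap (option B) is easy to sketch. The following is an illustrative helper, not production retry code; the function name, base delay, and jitter range are assumptions. The wait roughly doubles on each failed attempt but never exceeds 15 minutes (900 s), and a small random jitter keeps millions of clients from retrying in lockstep:

```python
import random

def backoff_schedule(max_retries, base=1.0, cap=900.0, seed=0):
    """Compute capped exponential-backoff delays (in seconds) with
    jitter: delay = min(cap, base * 2**attempt) + uniform jitter."""
    rng = random.Random(seed)  # fixed seed for a reproducible example
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        delays.append(delay + rng.uniform(0, 1))  # jitter avoids thundering herds
    return delays

schedule = backoff_schedule(12)
print(all(d <= 901.0 for d in schedule))  # True: capped near 15 minutes
```

Retrying every second (option C) would hammer a struggling database; the doubling-with-cap schedule backs off quickly while still recovering promptly once the database returns.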
You are creating a model to predict housing prices. Due to budget constraints, you must run it on a single resource-constrained virtual machine. Which learning algorithm should you use?
A. Linear regression
B. Logistic classification
C. Recurrent neural network
D. Feedforward neural network
You are building a new real-time data warehouse for your company and will use Google BigQuery streaming inserts. There is no guarantee that data will only be sent in once, but you do have a unique ID for each row of data and an event timestamp. You want to ensure that duplicates are not included while interactively querying data. Which query type should you use?
A. Include ORDER BY DESC on the timestamp column and LIMIT to 1.
B. Use GROUP BY on the unique ID column and timestamp column and SUM on the values.
C. Use the LAG window function with PARTITION BY unique ID along with WHERE LAG IS NOT NULL.
D. Use the ROW_NUMBER window function with PARTITION BY unique ID along with WHERE row equals 1.
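The logic behind option D — `ROW_NUMBER() OVER (PARTITION BY unique_id ORDER BY event_ts DESC)` filtered to `row_num = 1` — keeps exactly one row per unique ID, the one with the latest timestamp. As an illustrative plain-Python equivalent (row fields and function name are assumptions for the sketch):

```python
def dedupe_latest(rows):
    """Keep one row per unique ID — the one with the greatest event
    timestamp — mirroring ROW_NUMBER() partitioned by ID and ordered
    by timestamp DESC, filtered to row number 1."""
    latest = {}
    for row in rows:
        key = row["id"]
        if key not in latest or row["event_ts"] > latest[key]["event_ts"]:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: r["id"])

rows = [
    {"id": 1, "event_ts": 100, "value": "first"},
    {"id": 1, "event_ts": 200, "value": "retry"},  # duplicate send, later ts
    {"id": 2, "event_ts": 150, "value": "only"},
]
print(dedupe_latest(rows))  # one row per ID, the latest for ID 1
```

GROUP BY with SUM (option B) would instead combine the duplicates' values, silently double-counting the re-sent rows.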
Your company is in a highly regulated industry. One of your requirements is to ensure individual users have access only to the minimum amount of information required to do their jobs. You want to enforce this requirement with Google BigQuery. Which three approaches can you take? (Choose three.)
A. Disable writes to certain tables.
B. Restrict access to tables by role.
C. Ensure that the data is encrypted at all times.
D. Restrict BigQuery API access to approved users.
E. Segregate data across multiple tables or databases.
F. Use Google Stackdriver Audit Logging to determine policy violations.
Your company handles data processing for a number of different clients. Each client prefers to use their own suite of analytics tools, with some allowing direct query access via Google BigQuery. You need to secure the data so that clients cannot see each other's data. You want to ensure appropriate access to the data.
Which three steps should you take? (Choose three.)
A. Load data into different partitions.
B. Load data into a different dataset for each client.
C. Put each client's BigQuery dataset into a different table.
D. Restrict a client's dataset to approved users.
E. Only allow a service account to access the datasets.
F. Use the appropriate identity and access management (IAM) roles for each client's users.