You are an ML engineer at a manufacturing company. You are creating a classification model for a predictive maintenance use case. You need to predict whether a crucial machine will fail in the next three days so that the repair crew has enough time to fix the machine before it breaks. Regular maintenance of the machine is relatively inexpensive, but a failure would be very costly. You have trained several binary classifiers to predict whether the machine will fail, where a prediction of 1 means that the ML model predicts a failure.
You are now evaluating each model on an evaluation dataset. You want to choose a model that prioritizes detection while ensuring that more than 50% of the maintenance jobs triggered by your model address an imminent machine failure. Which model should you choose?
A. The model with the highest area under the receiver operating characteristic curve (AUC ROC) and precision greater than 0.5.
B. The model with the lowest root mean squared error (RMSE) and recall greater than 0.5.
C. The model with the highest recall where precision is greater than 0.5.
D. The model with the highest precision where recall is greater than 0.5.
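The selection rule described in option C can be sketched in code: filter out models that fail the precision constraint, then maximize recall among the rest. The model names and metric values below are hypothetical, invented for the example.

```python
# Pick the model with the highest recall among those whose precision exceeds
# 0.5, so that most triggered maintenance jobs address a real failure.
# The candidate metrics below are hypothetical evaluation results.
candidates = {
    "model_a": {"precision": 0.80, "recall": 0.60},
    "model_b": {"precision": 0.55, "recall": 0.85},
    "model_c": {"precision": 0.40, "recall": 0.95},  # too many false alarms
}

# Constraint: more than 50% of triggered jobs must address a real failure.
eligible = {name: m for name, m in candidates.items() if m["precision"] > 0.5}

# Objective: prioritize detection, i.e. maximize recall.
best = max(eligible, key=lambda name: eligible[name]["recall"])
print(best)  # model_b: highest recall with precision still above 0.5
```

Note that model_c, despite the best recall, is excluded because fewer than half of its alerts would correspond to an imminent failure.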
You built a custom ML model using scikit-learn. Training time is taking longer than expected. You decide to migrate your model to Vertex AI Training, and you want to improve the model’s training time. What should you try out first?
A. Train your model in a distributed mode using multiple Compute Engine VMs.
B. Train your model using Vertex AI Training with CPUs.
C. Migrate your model to TensorFlow, and train it using Vertex AI Training.
D. Train your model using Vertex AI Training with GPUs.
You are an ML engineer at a retail company. You have built a model that predicts a coupon to offer an ecommerce customer at checkout based on the items in their cart. When a customer goes to checkout, your serving pipeline, which is hosted on Google Cloud, joins the customer's existing cart with a row in a BigQuery table that contains the customers' historic purchase behavior and uses that as the model's input. The web team is reporting that your model is returning predictions too slowly to load the coupon offer with the rest of the web page. How should you speed up your model's predictions?
A. Attach an NVIDIA P100 GPU to your deployed model’s instance.
B. Use a low-latency database for the customers’ historic purchase behavior.
C. Deploy your model to more instances behind a load balancer to distribute traffic.
D. Create a materialized view in BigQuery with the necessary data for predictions.
You work for a small company that has deployed an ML model with autoscaling on Vertex AI to serve online predictions in a production environment. The current model receives about 20 prediction requests per hour with an average response time of one second. You have retrained the same model on a new batch of data, and now you are canary testing it, sending ~10% of production traffic to the new model. During this canary test, you notice that prediction requests for your new model are taking between 30 and 180 seconds to complete. What should you do?
A. Submit a request to raise your project quota to ensure that multiple prediction services can run concurrently.
B. Turn off autoscaling for the online prediction service of your new model. Use manual scaling with one node always available.
C. Remove your new model from the production environment. Compare the new model and existing model codes to identify the cause of the performance bottleneck.
D. Remove your new model from the production environment. For a short trial period, send all incoming prediction requests to BigQuery. Request batch predictions from your new model, and then use the Data Labeling Service to validate your model’s performance before promoting it to production.
You want to train an AutoML model to predict house prices by using a small public dataset stored in BigQuery. You need to prepare the data and want to use the simplest, most efficient approach. What should you do?
A. Write a query that preprocesses the data by using BigQuery and creates a new table. Create a Vertex AI managed dataset with the new table as the data source.
B. Use Dataflow to preprocess the data. Write the output in TFRecord format to a Cloud Storage bucket.
C. Write a query that preprocesses the data by using BigQuery. Export the query results as CSV files, and use those files to create a Vertex AI managed dataset.
D. Use a Vertex AI Workbench notebook instance to preprocess the data by using the pandas library. Export the data as CSV files, and use those files to create a Vertex AI managed dataset.
You work for a hospital that wants to optimize how it schedules operations. You need to create a model that uses the relationship between the number of surgeries scheduled and beds used. You want to predict how many beds will be needed for patients each day in advance based on the scheduled surgeries. You have one year of data for the hospital organized in 365 rows.
The data includes the following variables for each day:
• Number of scheduled surgeries
• Number of beds occupied
• Date
You want to maximize the speed of model development and testing. What should you do?
A. Create a BigQuery table. Use BigQuery ML to build a regression model, with number of beds as the target variable, and number of scheduled surgeries and date features (such as day of week) as the predictors.
B. Create a BigQuery table. Use BigQuery ML to build an ARIMA model, with number of beds as the target variable, and date as the time variable.
C. Create a Vertex AI tabular dataset. Train an AutoML regression model, with number of beds as the target variable, and number of scheduled minor surgeries and date features (such as day of the week) as the predictors.
D. Create a Vertex AI tabular dataset. Train a Vertex AI AutoML Forecasting model, with number of beds as the target variable, number of scheduled surgeries as a covariate, and date as the time variable.
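For context on option B, BigQuery ML's time-series model type is `ARIMA_PLUS`; a sketch of what such a statement could look like is below, built as a string (the dataset, table, and column names are hypothetical):

```python
# Sketch of a BigQuery ML time-series model statement similar to option B.
# All identifiers (hospital.bed_forecast, hospital.daily_stats, column names)
# are hypothetical examples, not from the question.
def arima_create_model_sql(source_table: str) -> str:
    return f"""
CREATE OR REPLACE MODEL `hospital.bed_forecast`
OPTIONS(
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'date',
  time_series_data_col = 'beds_occupied'
) AS
SELECT date, beds_occupied
FROM `{source_table}`
""".strip()

sql = arima_create_model_sql("hospital.daily_stats")
print(sql)
```

Running the generated statement would train the model entirely inside BigQuery, with no data movement.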
You work for a rapidly growing social media company. Your team builds TensorFlow recommender models in an on-premises CPU cluster. The data contains billions of historical user events and 100,000 categorical features. You notice that as the data increases, the model training time increases. You plan to move the models to Google Cloud. You want to use the most scalable approach that also minimizes training time. What should you do?
A. Deploy the training jobs by using TPU VMs with TPUv3 Pod slices, and use the TPUEmbedding API.
B. Deploy the training jobs in an autoscaling Google Kubernetes Engine cluster with CPUs.
C. Deploy a matrix factorization model training job by using BigQuery ML.
D. Deploy the training jobs by using Compute Engine instances with A100 GPUs, and use the tf.nn.embedding_lookup API.
You work for a semiconductor manufacturing company. You need to create a real-time application that automates the quality control process. High-definition images of each semiconductor are taken at the end of the assembly line in real time. The photos are uploaded to a Cloud Storage bucket along with tabular data that includes each semiconductor’s batch number, serial number, dimensions, and weight. You need to configure model training and serving while maximizing model accuracy. What should you do?
A. Use Vertex AI Data Labeling Service to label the images, and train an AutoML image classification model. Deploy the model, and configure Pub/Sub to publish a message when an image is categorized into the failing class.
B. Use Vertex AI Data Labeling Service to label the images, and train an AutoML image classification model. Schedule a daily batch prediction job that publishes a Pub/Sub message when the job completes.
C. Convert the images into an embedding representation. Import this data into BigQuery, and train a BigQuery ML K-means clustering model with two clusters. Deploy the model, and configure Pub/Sub to publish a message when a semiconductor’s data is categorized into the failing cluster.
D. Import the tabular data into BigQuery, use Vertex AI Data Labeling Service to label the data, and train an AutoML tabular classification model. Deploy the model, and configure Pub/Sub to publish a message when a semiconductor’s data is categorized into the failing class.
You developed a Vertex AI ML pipeline that consists of preprocessing and training steps, each of which runs on a separate custom Docker image. Your organization uses GitHub, with GitHub Actions as CI/CD to run unit and integration tests. You need to automate the model retraining workflow so that it can be initiated both manually and when a new version of the code is merged in the main branch. You want to minimize the steps required to build the workflow while also allowing for maximum flexibility. How should you configure the CI/CD workflow?
A. Trigger a Cloud Build workflow to run tests, build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines.
B. Trigger GitHub Actions to run the tests, launch a job on Cloud Run to build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines.
C. Trigger GitHub Actions to run the tests, build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines.
D. Trigger GitHub Actions to run the tests, launch a Cloud Build workflow to build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines.
You are working with a dataset that contains customer transactions. You need to build an ML model to predict customer purchase behavior. You plan to develop the model in BigQuery ML, and export it to Cloud Storage for online prediction. You notice that the input data contains a few categorical features, including product category and payment method. You want to deploy the model as quickly as possible. What should you do?
A. Use the TRANSFORM clause with the ML.ONE_HOT_ENCODER function on the categorical features at model creation, and select the categorical and non-categorical features.
B. Use the ML.ONE_HOT_ENCODER function on the categorical features, and select the encoded categorical features and non-categorical features as inputs to create your model.
C. Use the CREATE MODEL statement and select the categorical and non-categorical features.
D. Use the ML.MULTI_HOT_ENCODER function on the categorical features, and select the encoded categorical features and non-categorical features as inputs to create your model.
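To make option A concrete, the sketch below builds the kind of CREATE MODEL statement it describes, with a TRANSFORM clause applying ML.ONE_HOT_ENCODER so the encoding is carried along with the exported model. All table, column, and model names are hypothetical. (Note that for many BigQuery ML model types, categorical features are also one-hot encoded automatically at model creation, which is the basis of option C.)

```python
# Hypothetical BigQuery ML statement illustrating option A's TRANSFORM clause.
# Identifiers (shop.purchase_model, shop.transactions, column names) are
# invented for the example.
sql = """
CREATE OR REPLACE MODEL `shop.purchase_model`
TRANSFORM(
  ML.ONE_HOT_ENCODER(product_category) OVER() AS product_category_enc,
  ML.ONE_HOT_ENCODER(payment_method) OVER() AS payment_method_enc,
  purchase_amount,
  label
)
OPTIONS(model_type = 'logistic_reg', input_label_cols = ['label']) AS
SELECT product_category, payment_method, purchase_amount, label
FROM `shop.transactions`
""".strip()

print(sql)
```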
You need to develop an image classification model by using a large dataset that contains labeled images in a Cloud Storage bucket. What should you do?
A. Use Vertex AI Pipelines with the Kubeflow Pipelines SDK to create a pipeline that reads the images from Cloud Storage and trains the model.
B. Use Vertex AI Pipelines with TensorFlow Extended (TFX) to create a pipeline that reads the images from Cloud Storage and trains the model.
C. Import the labeled images as a managed dataset in Vertex AI, and use AutoML to train the model.
D. Convert the image dataset to a tabular format using Dataflow. Load the data into BigQuery, and use BigQuery ML to train the model.
You are developing a model to detect fraudulent credit card transactions. You need to prioritize detection, because missing even one fraudulent transaction could severely impact the credit card holder. You used AutoML to train a model on users' profile information and credit card transaction data. After training the initial model, you notice that the model is failing to detect many fraudulent transactions. How should you adjust the training parameters in AutoML to improve model performance? (Choose two.)
A. Increase the score threshold.
B. Decrease the score threshold.
C. Add more positive examples to the training set.
D. Add more negative examples to the training set.
E. Reduce the maximum number of node hours for training.
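The mechanism behind the score-threshold options can be illustrated in a few lines: lowering the threshold means more transactions clear the bar and are flagged, which raises recall (at the cost of precision). The scores and labels below are made up for the example.

```python
# Illustration of how the score threshold affects recall.
# Hypothetical model scores and ground-truth labels (1 = fraudulent).
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

def recall_at(threshold):
    # Fraction of fraudulent transactions whose score clears the threshold.
    caught = sum(y for s, y in zip(scores, labels) if s >= threshold)
    return caught / sum(labels)

print(recall_at(0.70))  # 2 of 3 fraud cases caught
print(recall_at(0.35))  # 3 of 3 fraud cases caught: lower threshold, higher recall
```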
You need to deploy a scikit-learn classification model to production. The model must be able to serve requests 24/7, and you expect millions of requests per second to the production application from 8 am to 7 pm. You need to minimize the cost of deployment. What should you do?
A. Deploy an online Vertex AI prediction endpoint. Set the max replica count to 1.
B. Deploy an online Vertex AI prediction endpoint. Set the max replica count to 100.
C. Deploy an online Vertex AI prediction endpoint with one GPU per replica. Set the max replica count to 1.
D. Deploy an online Vertex AI prediction endpoint with one GPU per replica. Set the max replica count to 100.
You work with a team of researchers to develop state-of-the-art algorithms for financial analysis. Your team develops and debugs complex models in TensorFlow. You want to maintain the ease of debugging while also reducing the model training time. How should you set up your training environment?
A. Configure a v3-8 TPU VM. SSH into the VM to train and debug the model.
B. Configure a v3-8 TPU node. Use Cloud Shell to SSH into the Host VM to train and debug the model.
C. Configure an n1-standard-4 VM with 4 NVIDIA P100 GPUs. SSH into the VM and use ParameterServerStrategy to train the model.
D. Configure an n1-standard-4 VM with 4 NVIDIA P100 GPUs. SSH into the VM and use MultiWorkerMirroredStrategy to train the model.
You created an ML pipeline with multiple input parameters. You want to investigate the tradeoffs between different parameter combinations. The parameter options are:
• Input dataset
• Max tree depth of the boosted tree regressor
• Optimizer learning rate
You need to compare the pipeline performance of the different parameter combinations measured in F1 score, time to train, and model complexity. You want your approach to be reproducible, and track all pipeline runs on the same platform. What should you do?
A.
1. Use BigQuery ML to create a boosted tree regressor, and use the hyperparameter tuning capability.
2. Configure the hyperparameter syntax to select different input datasets, max tree depths, and optimizer learning rates. Choose the grid search option.
B.
1. Create a Vertex AI pipeline with a custom model training job as part of the pipeline. Configure the pipeline’s parameters to include those you are investigating.
2. In the custom training step, use the Bayesian optimization method with F1 score as the target to maximize.
C.
1. Create a Vertex AI Workbench notebook for each of the different input datasets.
2. In each notebook, run different local training jobs with different combinations of the max tree depth and optimizer learning rate parameters.
3. After each notebook finishes, append the results to a BigQuery table.
D.
1. Create an experiment in Vertex AI Experiments.
2. Create a Vertex AI pipeline with a custom model training job as part of the pipeline. Configure the pipeline’s parameters to include those you are investigating.
3. Submit multiple runs to the same experiment, using different values for the parameters.
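The run-per-combination pattern described in option D can be sketched without the Vertex AI SDK. Here a stand-in `submit_run` function simply records what would be submitted as one experiment run; the dataset names and parameter values are hypothetical.

```python
import itertools

# Hypothetical parameter options being investigated.
datasets = ["sales_2022", "sales_2023"]
max_depths = [4, 8]
learning_rates = [0.05, 0.1]

runs = []

def submit_run(params):
    # Stand-in for launching one pipeline run logged to the same experiment;
    # the real call would go through the Vertex AI SDK.
    runs.append(params)

# One run per parameter combination, all tracked on the same platform.
for ds, depth, lr in itertools.product(datasets, max_depths, learning_rates):
    submit_run({"input_dataset": ds, "max_tree_depth": depth, "learning_rate": lr})

print(len(runs))  # 8 runs: 2 datasets x 2 depths x 2 learning rates
```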
You received a training-serving skew alert from a Vertex AI Model Monitoring job running in production. You retrained the model with more recent training data, and deployed it back to the Vertex AI endpoint, but you are still receiving the same alert. What should you do?
A. Update the model monitoring job to use a lower sampling rate.
B. Update the model monitoring job to use the more recent training data that was used to retrain the model.
C. Temporarily disable the alert. Enable the alert again after a sufficient amount of new production traffic has passed through the Vertex AI endpoint.
D. Temporarily disable the alert until the model can be retrained again on newer training data. Retrain the model again after a sufficient amount of new production traffic has passed through the Vertex AI endpoint.
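The key intuition here is that training-serving skew is measured between the serving distribution and whatever training baseline the monitoring job was configured with, so retraining alone does not silence the alert. A toy illustration with a simple L1 distance over hypothetical category frequencies for one feature:

```python
# Skew is computed against the *configured* baseline, not the model's actual
# training data. All frequencies below are hypothetical.
def l1_distance(p, q):
    return sum(abs(p[k] - q[k]) for k in p)

old_baseline = {"cat_a": 0.70, "cat_b": 0.30}  # original training data
new_baseline = {"cat_a": 0.45, "cat_b": 0.55}  # recent retraining data
serving      = {"cat_a": 0.40, "cat_b": 0.60}  # production traffic

print(l1_distance(serving, old_baseline))  # large: alert keeps firing
print(l1_distance(serving, new_baseline))  # small: updating the baseline resolves it
```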
You work for a delivery company. You need to design a system that stores and manages features such as parcels delivered and truck locations over time. The system must retrieve the features with low latency and feed those features into a model for online prediction. The data science team will retrieve historical data at a specific point in time for model training. You want to store the features with minimal effort. What should you do?
A. Store features in Bigtable as key/value data.
B. Store features in Vertex AI Feature Store.
C. Store features as a Vertex AI dataset, and use those features to train the models hosted in Vertex AI endpoints.
D. Store features in BigQuery timestamp-partitioned tables, and use the BigQuery Storage Read API to serve the features.
You have created a Vertex AI pipeline that automates custom model training. You want to add a pipeline component that enables your team to most easily collaborate when running different executions and comparing metrics both visually and programmatically. What should you do?
A. Add a component to the Vertex AI pipeline that logs metrics to a BigQuery table. Query the table to compare different executions of the pipeline. Connect BigQuery to Looker Studio to visualize metrics.
B. Add a component to the Vertex AI pipeline that logs metrics to a BigQuery table. Load the table into a pandas DataFrame to compare different executions of the pipeline. Use Matplotlib to visualize metrics.
C. Add a component to the Vertex AI pipeline that logs metrics to Vertex ML Metadata. Use Vertex AI Experiments to compare different executions of the pipeline. Use Vertex AI TensorBoard to visualize metrics.
D. Add a component to the Vertex AI pipeline that logs metrics to Vertex ML Metadata. Load the Vertex ML Metadata into a pandas DataFrame to compare different executions of the pipeline. Use Matplotlib to visualize metrics.
You developed a custom model by using Vertex AI to forecast the sales of your company’s products based on historical transactional data. You anticipate changes in the feature distributions and the correlations between the features in the near future. You also expect to receive a large volume of prediction requests. You plan to use Vertex AI Model Monitoring for drift detection and you want to minimize the cost. What should you do?
A. Use the features for monitoring. Set a monitoring-frequency value that is higher than the default.
B. Use the features for monitoring. Set a prediction-sampling-rate value that is closer to 1 than 0.
C. Use the features and the feature attributions for monitoring. Set a monitoring-frequency value that is lower than the default.
D. Use the features and the feature attributions for monitoring. Set a prediction-sampling-rate value that is closer to 0 than 1.
You recently used XGBoost to train a model in Python that will be used for online serving. Your model prediction service will be called by a backend service implemented in Golang running on a Google Kubernetes Engine (GKE) cluster. Your model requires pre and postprocessing steps. You need to implement the processing steps so that they run at serving time. You want to minimize code changes and infrastructure maintenance, and deploy your model into production as quickly as possible. What should you do?
A. Use FastAPI to implement an HTTP server. Create a Docker image that runs your HTTP server, and deploy it on your organization’s GKE cluster.
B. Use FastAPI to implement an HTTP server. Create a Docker image that runs your HTTP server. Upload the image to Vertex AI Model Registry, and deploy it to a Vertex AI endpoint.
C. Use the Predictor interface to implement a custom prediction routine. Build the custom container, upload the container to Vertex AI Model Registry, and deploy it to a Vertex AI endpoint.
D. Use the XGBoost prebuilt serving container when importing the trained model into Vertex AI. Deploy the model to a Vertex AI endpoint. Work with the backend engineers to implement the pre- and postprocessing steps in the Golang backend service.
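The custom prediction routine in option C centers on a class exposing preprocess/predict/postprocess hooks. The pure-Python sketch below mimics that shape without the Vertex AI SDK or XGBoost: the scaling constant, the stand-in "model", and the output format are all invented for the example.

```python
class SketchPredictor:
    """Illustrates the preprocess/predict/postprocess shape of a custom
    prediction routine. The 'model' is a stand-in, not a real XGBoost booster."""

    def preprocess(self, instances):
        # e.g. scale raw features before they reach the model (factor is hypothetical)
        return [[x / 100.0 for x in row] for row in instances]

    def predict(self, instances):
        # Stand-in for booster.predict(): sum of features as a 'score'.
        return [sum(row) for row in instances]

    def postprocess(self, predictions):
        # e.g. map raw scores to the labels the backend service expects
        return [{"label": "high" if p > 1.0 else "low", "score": p}
                for p in predictions]

predictor = SketchPredictor()
out = predictor.postprocess(
    predictor.predict(predictor.preprocess([[120, 30], [10, 5]]))
)
print(out)  # first instance scores 1.5 -> "high"; second scores 0.15 -> "low"
```

Because all three steps live inside the served container, the Golang backend only needs to send raw instances and read back labels.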
You work for a retail company. You have a managed tabular dataset in Vertex AI that contains sales data from three different stores. The dataset includes several features, such as store name and sale timestamp. You want to use the data to train a model that makes sales predictions for a new store that will open soon. You need to split the data between the training, validation, and test sets. What approach should you use to split the data?
A. Use Vertex AI manual split, using the store name feature to assign one store for each set.
B. Use Vertex AI default data split.
C. Use Vertex AI chronological split, and specify the sales timestamp feature as the time variable.
D. Use Vertex AI random split, assigning 70% of the rows to the training set, 10% to the validation set, and 20% to the test set.
You have developed a BigQuery ML model that predicts customer churn, and deployed the model to Vertex AI Endpoints. You want to automate the retraining of your model by using minimal additional code when model feature values change. You also want to minimize the number of times that your model is retrained to reduce training costs. What should you do?
A.
1. Enable request-response logging on Vertex AI Endpoints.
2. Schedule a TensorFlow Data Validation job to monitor prediction drift.
3. Execute model retraining if there is significant distance between the distributions.
B.
1. Enable request-response logging on Vertex AI Endpoints.
2. Schedule a TensorFlow Data Validation job to monitor training/serving skew.
3. Execute model retraining if there is significant distance between the distributions.
C.
1. Create a Vertex AI Model Monitoring job configured to monitor prediction drift.
2. Configure alert monitoring to publish a message to a Pub/Sub queue when a monitoring alert is detected.
3. Use a Cloud Function to monitor the Pub/Sub queue, and trigger retraining in BigQuery.
D.
1. Create a Vertex AI Model Monitoring job configured to monitor training/serving skew.
2. Configure alert monitoring to publish a message to a Pub/Sub queue when a monitoring alert is detected.
3. Use a Cloud Function to monitor the Pub/Sub queue, and trigger retraining in BigQuery.
You have been tasked with deploying prototype code to production. The feature engineering code is in PySpark and runs on Dataproc Serverless. The model training is executed by using a Vertex AI custom training job. The two steps are not connected, and the model training must currently be run manually after the feature engineering step finishes. You need to create a scalable and maintainable production process that runs end-to-end and tracks the connections between steps. What should you do?
A. Create a Vertex AI Workbench notebook. Use the notebook to submit the Dataproc Serverless feature engineering job. Use the same notebook to submit the custom model training job. Run the notebook cells sequentially to tie the steps together end-to-end.
B. Create a Vertex AI Workbench notebook. Initiate an Apache Spark context in the notebook, and run the PySpark feature engineering code. Use the same notebook to run the custom model training job in TensorFlow. Run the notebook cells sequentially to tie the steps together end-to-end.
C. Use the Kubeflow Pipelines SDK to write code that specifies two components:
- The first is a Dataproc Serverless component that launches the feature engineering job.
- The second is a custom component wrapped in the create_custom_training_job_from_component utility that launches the custom model training job.
Create a Vertex AI Pipelines job to link and run both components.
D. Use the Kubeflow Pipelines SDK to write code that specifies two components:
- The first component initiates an Apache Spark context that runs the PySpark feature engineering code.
- The second component runs the TensorFlow custom model training code.
Create a Vertex AI Pipelines job to link and run both components.
You work for an online grocery store. You recently developed a custom ML model that recommends a recipe when a user arrives at the website. You chose the machine type on the Vertex AI endpoint to optimize costs by using the queries per second (QPS) that the model can serve, and you deployed it on a single machine with 8 vCPUs and no accelerators.
A holiday season is approaching and you anticipate four times more traffic during this time than the typical daily traffic. You need to ensure that the model can scale efficiently to the increased demand. What should you do?
A.
1. Maintain the same machine type on the endpoint.
2. Set up a monitoring job and an alert for CPU usage.
3. If you receive an alert, add a compute node to the endpoint.
B.
1. Change the machine type on the endpoint to have 32 vCPUs.
2. Set up a monitoring job and an alert for CPU usage.
3. If you receive an alert, scale the vCPUs further as needed.
C.
1. Maintain the same machine type on the endpoint. Configure the endpoint to enable autoscaling based on vCPU usage.
2. Set up a monitoring job and an alert for CPU usage.
3. If you receive an alert, investigate the cause.
D.
1. Change the machine type on the endpoint to have a GPU. Configure the endpoint to enable autoscaling based on GPU usage.
2. Set up a monitoring job and an alert for GPU usage.
3. If you receive an alert, investigate the cause.
You work at a bank. You need to develop a credit risk model to support loan application decisions. You decide to implement the model by using a neural network in TensorFlow. Due to regulatory requirements, you need to be able to explain the model’s predictions based on its features. When the model is deployed, you also want to monitor the model’s performance over time. You decided to use Vertex AI for both model development and deployment. What should you do?
A. Use Vertex Explainable AI with the sampled Shapley method, and enable Vertex AI Model Monitoring to check for feature distribution drift.
B. Use Vertex Explainable AI with the sampled Shapley method, and enable Vertex AI Model Monitoring to check for feature distribution skew.
C. Use Vertex Explainable AI with the XRAI method, and enable Vertex AI Model Monitoring to check for feature distribution drift.
D. Use Vertex Explainable AI with the XRAI method, and enable Vertex AI Model Monitoring to check for feature distribution skew.
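The sampled Shapley method referenced in options A and B estimates each feature's contribution by averaging its marginal effect on the model output over random feature orderings. A minimal sketch for a toy additive model, where the estimate matches the feature's own term exactly (the model and inputs are invented for the example):

```python
import random

random.seed(0)

# Toy additive model: f(x) = 2*x0 + 1*x1. For additive models, each feature's
# Shapley value equals its own term relative to the baseline.
def model(x):
    return 2 * x[0] + 1 * x[1]

def sampled_shapley(x, baseline, feature, n_samples=200):
    # Average the marginal contribution of `feature` over random orderings.
    total = 0.0
    idx = list(range(len(x)))
    for _ in range(n_samples):
        order = random.sample(idx, len(idx))
        current = list(baseline)
        for i in order:
            before = model(current)
            current[i] = x[i]   # switch feature i from baseline to actual value
            after = model(current)
            if i == feature:
                total += after - before
    return total / n_samples

phi0 = sampled_shapley([1.0, 1.0], [0.0, 0.0], feature=0)
print(round(phi0, 6))  # 2.0: feature 0's coefficient in the additive model
```

A deep TensorFlow credit-risk model is not additive, which is exactly why the sampling over orderings is needed there rather than a single marginal.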