Professional Machine Learning Engineer
Question 51
You are developing an image recognition model using PyTorch based on ResNet50 architecture. Your code is working fine on your local laptop on a small subsample. Your full dataset has 200k labeled images. You want to quickly scale your training workload while minimizing cost. You plan to use 4 V100 GPUs. What should you do?
- A: Create a Google Kubernetes Engine cluster with a node pool that has 4 V100 GPUs. Prepare and submit a TFJob operator to this node pool.
- B: Create a Vertex AI Workbench user-managed notebooks instance with 4 V100 GPUs, and use it to train your model.
- C: Package your code with Setuptools, and use a pre-built container. Train your model with Vertex AI using a custom tier that contains the required GPUs.
- D: Configure a Compute Engine VM with all the dependencies that launches the training. Train your model with Vertex AI using a custom tier that contains the required GPUs.
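For reference, the setuptools-plus-pre-built-container workflow described in option C could be submitted with the Vertex AI Python SDK roughly as in the sketch below. This is a hedged illustration only: the project, bucket, package, module, and container image names are placeholders, not values from the question.

```python
# Hypothetical sketch: submit a setuptools-packaged PyTorch trainer to
# Vertex AI using a pre-built training container and 4 V100 GPUs.
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",
)

job = aiplatform.CustomPythonPackageTrainingJob(
    display_name="resnet50-training",
    python_package_gcs_uri="gs://my-bucket/trainer-0.1.tar.gz",  # built with setuptools
    python_module_name="trainer.task",
    # Check the docs for the current pre-built PyTorch GPU image; this URI is illustrative.
    container_uri="us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-13:latest",
)

job.run(
    replica_count=1,
    machine_type="n1-standard-16",
    accelerator_type="NVIDIA_TESLA_V100",
    accelerator_count=4,
)
```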
Question 52
You have trained a DNN regressor with TensorFlow to predict housing prices using a set of predictive features. Your default precision is tf.float64, and you use a standard TensorFlow estimator.
Your model performs well, but just before deploying it to production, you discover that your current serving latency is 10 ms at the 90th percentile, and you currently serve on CPUs. Your production requirements expect a model latency of 8 ms at the 90th percentile. You are willing to accept a small decrease in performance in order to reach the latency requirement.
Your plan is therefore to improve latency while evaluating how much the model's prediction quality decreases. What should you try first to quickly lower the serving latency?
- A: Switch from CPU to GPU serving.
- B: Apply quantization to your SavedModel by reducing the floating point precision to tf.float16.
- C: Increase the dropout rate to 0.8 and retrain your model.
- D: Increase the dropout rate to 0.8 in _PREDICT mode by adjusting the TensorFlow Serving parameters.
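Float16 quantization of a SavedModel, as described in option B, can be applied with post-training quantization. The sketch below uses the TF Lite converter as one way to do this and assumes a SavedModel exported to ./saved_model (the path is a placeholder).

```python
import tensorflow as tf

# Post-training float16 quantization of an exported SavedModel.
# Weights are stored in float16, roughly halving model size and
# typically reducing serving latency with little accuracy loss.
converter = tf.lite.TFLiteConverter.from_saved_model("./saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
quantized_model = converter.convert()

with open("model_fp16.tflite", "wb") as f:
    f.write(quantized_model)
```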
Question 53
You work on the data science team at a manufacturing company. You are reviewing the company’s historical sales data, which has hundreds of millions of records. For your exploratory data analysis, you need to calculate descriptive statistics such as mean, median, and mode; conduct complex statistical tests for hypothesis testing; and plot variations of the features over time. You want to use as much of the sales data as possible in your analyses while minimizing computational resources. What should you do?
- A: Visualize the time plots in Google Data Studio. Import the dataset into Vertex AI Workbench user-managed notebooks. Use this data to calculate the descriptive statistics and run the statistical analyses.
- B: Spin up a Vertex AI Workbench user-managed notebooks instance and import the dataset. Use this data to create statistical and visual analyses.
- C: Use BigQuery to calculate the descriptive statistics. Use Vertex AI Workbench user-managed notebooks to visualize the time plots and run the statistical analyses.
- D: Use BigQuery to calculate the descriptive statistics, and use Google Data Studio to visualize the time plots. Use Vertex AI Workbench user-managed notebooks to run the statistical analyses.
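As an illustration of pushing the descriptive statistics into BigQuery (the approach in options C and D), a notebook could run an aggregation such as the one below; the table and column names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Push the heavy aggregation to BigQuery; only the summary row comes back
# to the notebook. Table and column names are placeholders.
query = """
SELECT
  AVG(sale_amount) AS mean_amount,
  APPROX_QUANTILES(sale_amount, 2)[OFFSET(1)] AS median_amount,
  APPROX_TOP_COUNT(sale_amount, 1)[OFFSET(0)].value AS mode_amount
FROM `my_project.sales.transactions`
"""
summary = client.query(query).to_dataframe()
print(summary)
```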
Question 54
Your data science team needs to rapidly experiment with various features, model architectures, and hyperparameters. They need to track the accuracy metrics for various experiments and use an API to query the metrics over time. What should they use to track and report their experiments while minimizing manual effort?
- A: Use Vertex AI Pipelines to execute the experiments. Query the results stored in MetadataStore using the Vertex AI API.
- B: Use Vertex AI Training to execute the experiments. Write the accuracy metrics to BigQuery, and query the results using the BigQuery API.
- C: Use Vertex AI Training to execute the experiments. Write the accuracy metrics to Cloud Monitoring, and query the results using the Monitoring API.
- D: Use Vertex AI Workbench user-managed notebooks to execute the experiments. Collect the results in a shared Google Sheets file, and query the results using the Google Sheets API.
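For context, metrics recorded against Vertex AI's metadata can be logged and queried programmatically through the Vertex AI SDK. The minimal sketch below uses Vertex AI Experiments; the project, experiment, run names, and metric values are placeholders.

```python
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",
    location="us-central1",
    experiment="feature-experiments",  # experiment name is a placeholder
)

# Log parameters and accuracy for one experiment run.
aiplatform.start_run("run-001")
aiplatform.log_params({"architecture": "resnet50", "lr": 1e-3})
aiplatform.log_metrics({"accuracy": 0.91})
aiplatform.end_run()

# Query all runs and their metrics over time via the API.
runs_df = aiplatform.get_experiment_df()
print(runs_df)
```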
Question 55
You are training an ML model using data stored in BigQuery that contains several values that are considered Personally Identifiable Information (PII). You need to reduce the sensitivity of the dataset before training your model. Every column is critical to your model. How should you proceed?
- A: Using Dataflow, ingest the columns with sensitive data from BigQuery, and then randomize the values in each sensitive column.
- B: Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow with the DLP API to encrypt sensitive values with Format Preserving Encryption.
- C: Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow to replace all sensitive data by using the encryption algorithm AES-256 with a salt.
- D: Before training, use BigQuery to select only the columns that do not contain sensitive data. Create an authorized view of the data so that sensitive values cannot be accessed by unauthorized individuals.
Question 56
You recently deployed an ML model. Three months after deployment, you notice that your model is underperforming on certain subgroups, thus potentially leading to biased results. You suspect that the inequitable performance is due to class imbalances in the training data, but you cannot collect more data. What should you do? (Choose two.)
- A: Remove training examples of high-performing subgroups, and retrain the model.
- B: Add an additional objective to penalize the model more for errors made on the minority class, and retrain the model.
- C: Remove the features that have the highest correlations with the majority class.
- D: Upsample or reweight your existing training data, and retrain the model.
- E: Redeploy the model, and provide a label explaining the model's behavior to users.
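Options B and D both amount to making minority-class errors cost more during training. A minimal Keras sketch using inverse-frequency class weights on synthetic data is shown below; the data and model are placeholders.

```python
import numpy as np
import tensorflow as tf

# Synthetic binary data where class 1 is the minority (~10% of examples).
x_train = np.random.rand(1000, 20).astype("float32")
y_train = (np.random.rand(1000) < 0.1).astype("int32")

# Inverse-frequency weights: errors on the minority class are penalized more.
neg, pos = np.bincount(y_train)
class_weight = {0: len(y_train) / (2.0 * neg), 1: len(y_train) / (2.0 * pos)}

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, class_weight=class_weight, verbose=0)
```

Upsampling (duplicating or resampling minority examples) achieves a similar effect on the loss without changing the training API call.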
Question 57
You have been asked to develop an input pipeline for an ML training model that processes images from disparate sources at low latency. You discover that your input data does not fit in memory. How should you create a dataset following Google-recommended best practices?
- A: Create a tf.data.Dataset.prefetch transformation.
- B: Convert the images to tf.Tensor objects, and then run Dataset.from_tensor_slices().
- C: Convert the images to tf.Tensor objects, and then run tf.data.Dataset.from_tensors().
- D: Convert the images into TFRecords, store the images in Cloud Storage, and then use the tf.data API to read the images for training.
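A minimal sketch of the TFRecord-based tf.data pipeline described in option D; the Cloud Storage path, feature names, and image size are placeholders.

```python
import tensorflow as tf

# Feature spec for the serialized examples (names are illustrative).
feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(serialized):
    example = tf.io.parse_single_example(serialized, feature_spec)
    image = tf.io.decode_jpeg(example["image"], channels=3)
    image = tf.image.resize(image, [224, 224]) / 255.0
    return image, example["label"]

# Stream shards from Cloud Storage so the dataset never has to fit in memory.
files = tf.data.Dataset.list_files("gs://my-bucket/train/*.tfrecord")
dataset = (
    files.interleave(tf.data.TFRecordDataset, num_parallel_calls=tf.data.AUTOTUNE)
         .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
         .shuffle(10_000)
         .batch(64)
         .prefetch(tf.data.AUTOTUNE)
)
```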
Question 58
You are working on a binary classification ML algorithm that detects whether an image of a classified scanned document contains a company’s logo. In the dataset, 96% of examples don’t have the logo, so the dataset is very skewed. Which metric would give you the most confidence in your model?
- A: Precision
- B: Recall
- C: RMSE
- D: F1 score
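A quick synthetic illustration, computed with scikit-learn, of why plain accuracy is misleading at 96% skew and how the candidate metrics behave; the labels below are made up.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Synthetic labels: 96% negatives. A model that always predicts "no logo"
# scores 96% accuracy while never finding a single logo.
y_true = np.array([0] * 96 + [1] * 4)
y_always_negative = np.zeros_like(y_true)

print("accuracy :", accuracy_score(y_true, y_always_negative))                      # 0.96
print("precision:", precision_score(y_true, y_always_negative, zero_division=0))    # 0.0
print("recall   :", recall_score(y_true, y_always_negative, zero_division=0))       # 0.0
print("f1       :", f1_score(y_true, y_always_negative, zero_division=0))           # 0.0
```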
Question 59
While running a model training pipeline on Vertex AI, you discover that the evaluation step is failing because of an out-of-memory error. You are currently using TensorFlow Model Analysis (TFMA) with a standard Evaluator TensorFlow Extended (TFX) pipeline component for the evaluation step. You want to stabilize the pipeline without downgrading the evaluation quality while minimizing infrastructure overhead. What should you do?
- A: Include the flag --runner=DataflowRunner in beam_pipeline_args to run the evaluation step on Dataflow.
- B: Move the evaluation step out of your pipeline and run it on custom Compute Engine VMs with sufficient memory.
- C: Migrate your pipeline to Kubeflow hosted on Google Kubernetes Engine, and specify the appropriate node parameters for the evaluation step.
- D: Add tfma.MetricsSpec() to limit the number of metrics in the evaluation step.
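A hedged sketch of what option A looks like in a TFX pipeline definition. The project, region, and bucket values are placeholders, and the component list is elided; a real pipeline would include ExampleGen, Trainer, Evaluator, and so on.

```python
from tfx.orchestration import pipeline

# Standard Apache Beam / Dataflow options (values are placeholders).
beam_pipeline_args = [
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/tmp",
]

components = []  # placeholder; the real list holds ExampleGen, Trainer, Evaluator, ...

my_pipeline = pipeline.Pipeline(
    pipeline_name="training-pipeline",
    pipeline_root="gs://my-bucket/pipeline-root",
    components=components,
    # Beam-based components such as the Evaluator run on Dataflow workers
    # instead of the local runner, sidestepping the local memory limit.
    beam_pipeline_args=beam_pipeline_args,
)
```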
Question 60
You are developing an ML model using a dataset with categorical input variables. You have randomly split half of the data into training and test sets. After applying one-hot encoding on the categorical variables in the training set, you discover that one categorical variable is missing from the test set. What should you do?
- A: Use sparse representation in the test set.
- B: Randomly redistribute the data, with 70% for the training set and 30% for the test set.
- C: Apply one-hot encoding on the categorical variables in the test data.
- D: Collect more data representing all categories.
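In practice, applying one-hot encoding consistently across splits means fitting one encoder on the training data and reusing it on the test data so both sets share the same columns. A scikit-learn sketch with hypothetical data:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})
test = pd.DataFrame({"color": ["red", "red", "blue"]})  # "green" never appears

# Fit on the training split only, then reuse the fitted encoder on the test
# split; categories missing from the test set simply become all-zero columns.
# (sparse_output requires scikit-learn >= 1.2; older versions use sparse=False.)
encoder = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
train_encoded = encoder.fit_transform(train[["color"]])
test_encoded = encoder.transform(test[["color"]])

print(encoder.get_feature_names_out())  # ['color_blue' 'color_green' 'color_red']
print(test_encoded)
```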
Question 61
You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are identified as fraudulent. Which data transformation strategy would likely improve the performance of your classifier?
- A: Modify the target variable using the Box-Cox transformation.
- B: Z-normalize all the numeric features.
- C: Oversample the fraudulent transactions 10 times.
- D: Log transform all numeric features.
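A pandas sketch of oversampling the fraudulent rows 10 times, on a toy frame with a hypothetical is_fraud label:

```python
import pandas as pd

# Toy transactions frame with a binary is_fraud label (fraud is rare).
transactions = pd.DataFrame({
    "amount":   [12.0, 250.0, 8.5, 9999.0, 42.0],
    "is_fraud": [0,    0,     0,   1,      0],
})

fraud = transactions[transactions["is_fraud"] == 1]
non_fraud = transactions[transactions["is_fraud"] == 0]

# Replicate the fraudulent rows 10 times so the classifier sees them more often,
# then shuffle before training.
oversampled = pd.concat([non_fraud] + [fraud] * 10, ignore_index=True)
oversampled = oversampled.sample(frac=1.0, random_state=42)

print(oversampled["is_fraud"].value_counts())
```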
Question 62
You are developing a classification model to support predictions for your company’s various products. The dataset you were given for model development has class imbalance. You need to minimize false positives and false negatives. What evaluation metric should you use to properly train the model?
- A: F1 score
- B: Recall
- C: Accuracy
- D: Precision
Question 63
You are training an object detection machine learning model on a dataset that consists of three million X-ray images, each roughly 2 GB in size. You are using Vertex AI Training to run a custom training application on a Compute Engine instance with 32 cores, 128 GB of RAM, and 1 NVIDIA P100 GPU. You notice that model training is taking a very long time. You want to decrease training time without sacrificing model performance. What should you do?
- A: Increase the instance memory to 512 GB, and increase the batch size.
- B: Replace the NVIDIA P100 GPU with a K80 GPU in the training job.
- C: Enable early stopping in your Vertex AI Training job.
- D: Use the tf.distribute.Strategy API and run a distributed training job.
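A generic Keras sketch of option D using tf.distribute.MirroredStrategy; the classification backbone and commented-out dataset stand in for the real detection model and input pipeline.

```python
import tensorflow as tf

# Wrap model construction in a MirroredStrategy scope so training replicates
# across all GPUs attached to the VM.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Placeholder backbone; the real job would build the detection network here.
    model = tf.keras.applications.ResNet50(weights=None, classes=10)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# Scale the global batch size with the number of replicas, e.g.:
# global_batch = per_replica_batch * strategy.num_replicas_in_sync
# model.fit(train_dataset.batch(global_batch), epochs=10)
```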
Question 64
You need to build classification workflows over several structured datasets currently stored in BigQuery. Because you will be performing the classification several times, you want to complete the following steps without writing code: exploratory data analysis, feature selection, model building, training, hyperparameter tuning, and serving. What should you do?
- A: Train a TensorFlow model on Vertex AI.
- B: Train a classification Vertex AutoML model.
- C: Run a logistic regression job on BigQuery ML.
- D: Use scikit-learn in Vertex AI Workbench user-managed notebooks with pandas library.
Question 65
You recently developed a deep learning model. To test your new model, you trained it for a few epochs on a large dataset. You observe that the training and validation losses barely changed during the training run. You want to quickly debug your model. What should you do first?
- A: Verify that your model can obtain a low loss on a small subset of the dataset.
- B: Add handcrafted features to inject your domain knowledge into the model.
- C: Use the Vertex AI hyperparameter tuning service to identify a better learning rate.
- D: Use hardware accelerators and train your model for more epochs.
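A sketch of the sanity check in option A: try to drive the loss near zero on a tiny subset before worrying about scale. The data shapes and model here are placeholders for the real ones.

```python
import numpy as np
import tensorflow as tf

# A handful of examples (shapes are illustrative).
x_small = np.random.rand(32, 64).astype("float32")
y_small = np.random.randint(0, 2, size=(32,))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="binary_crossentropy")

history = model.fit(x_small, y_small, epochs=200, verbose=0)
print("final loss on the tiny subset:", history.history["loss"][-1])
# A healthy setup should memorize 32 examples. If the loss does not drop,
# suspect the loss function, learning rate, label wiring, or frozen layers
# rather than the amount of data or hardware.
```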
Question 66
You are a data scientist at an industrial equipment manufacturing company. You are developing a regression model to estimate the power consumption in the company’s manufacturing plants based on sensor data collected from all of the plants. The sensors collect tens of millions of records every day. You need to schedule daily training runs for your model that use all the data collected up to the current date. You want your model to scale smoothly and require minimal development work. What should you do?
- A: Develop a custom TensorFlow regression model, and optimize it using Vertex AI Training.
- B: Develop a regression model using BigQuery ML.
- C: Develop a custom scikit-learn regression model, and optimize it using Vertex AI Training.
- D: Develop a custom PyTorch regression model, and optimize it using Vertex AI Training.
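A sketch of retraining a BigQuery ML regression model from Python, as a daily job would; the dataset, table, and column names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Retrain on everything collected so far with a single SQL statement.
# Dataset, table, and column names are placeholders.
create_model_sql = """
CREATE OR REPLACE MODEL `my_project.energy.power_regressor`
OPTIONS (model_type = 'linear_reg', input_label_cols = ['power_consumption']) AS
SELECT
  sensor_1, sensor_2, plant_id, power_consumption
FROM `my_project.energy.sensor_readings`
WHERE DATE(reading_time) <= CURRENT_DATE()
"""
client.query(create_model_sql).result()  # blocks until training completes
```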
Question 67
Your organization manages an online message board. A few months ago, you discovered an increase in toxic language and bullying on the message board. You deployed an automated text classifier that flags certain comments as toxic or harmful. Now some users are reporting that benign comments referencing their religion are being misclassified as abusive. Upon further inspection, you find that your classifier's false positive rate is higher for comments that reference certain underrepresented religious groups. Your team has a limited budget and is already overextended. What should you do?
- A: Add synthetic training data where those phrases are used in non-toxic ways.
- B: Remove the model and replace it with human moderation.
- C: Replace your model with a different text classifier.
- D: Raise the threshold for comments to be considered toxic or harmful.
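A toy numeric illustration of option D: raising the decision threshold flags fewer borderline comments, trading some recall for a lower false positive rate. The scores below are made-up toxicity probabilities.

```python
import numpy as np

# Hypothetical toxicity scores from the existing classifier.
scores = np.array([0.35, 0.55, 0.62, 0.80, 0.97])

flagged_default = scores >= 0.5  # default threshold
flagged_strict = scores >= 0.7   # stricter threshold spares borderline comments

print("flagged at 0.5:", int(flagged_default.sum()))  # 4
print("flagged at 0.7:", int(flagged_strict.sum()))   # 2
```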
Question 68
You are an ML engineer at a large grocery retailer with stores in multiple regions. You have been asked to create an inventory prediction model. Your model's features include region, location, historical demand, and seasonal popularity. You want the algorithm to learn from new inventory data on a daily basis. Which algorithm should you use to build the model?
- A: Classification
- B: Reinforcement Learning
- C: Recurrent Neural Networks (RNN)
- D: Convolutional Neural Networks (CNN)