Loading questions...
Updated
Want a break from the ads?
Become a Supporter and enjoy a completely ad-free experience, plus unlock Learn Mode, Exam Mode, AstroTutor AI, and more.
Which of the following issues should a data scientist be most concerned about when generating a synthetic data set?
A data scientist is performing a linear regression and wants to construct a model that explains the most variation in the data. Which of the following should the data scientist maximize when evaluating the regression performance metrics?
A data scientist is building an inferential model with a single predictor variable. A scatter plot of the independent variable against the real-number dependent variable shows a strong relationship between them. The predictor variable is normally distributed with very few outliers. Which of the following algorithms is the best fit for this model, given the data scientist wants the model to be easily interpreted?
A data scientist wants to evaluate the performance of various nonlinear models. Which of the following is best suited for this task?
Which of the following is the layer that is responsible for the depth in deep learning?
Which of the following modeling tools is appropriate for solving a scheduling problem?
Which of the following environmental changes is most likely to resolve a memory constraint error when running a complex model using distributed computing?
A data analyst wants to save a newly analyzed data set to a local storage option. The data set must meet the following requirements:
Be minimal in size -
Have the ability to be ingested quickly
Have the associated schema, including data types, stored with it
Which of the following file types is the best to use?
Which of the following is a key difference between KNN and k-means machine-learning techniques?
A data scientist needs to:
Build a predictive model that gives the likelihood that a car will get a flat tire.
Provide a data set of cars that had flat tires and cars that did not.
All the cars in the data set had sensors taking weekly measurements of tire pressure similar to the sensors that will be installed in the cars consumers drive. Which of the following is the most immediate data concern?
The term "greedy algorithms" refers to machine-learning algorithms that:
A data scientist is deploying a model that needs to be accessed by multiple departments with minimal development effort by the departments. Which of the following APIs would be best for the data scientist to use?
Which of the following compute delivery models allows packaging of only critical dependencies while developing a reusable asset?
A data analyst is analyzing data and would like to build conceptual associations. Which of the following is the best way to accomplish this task?
Which of the following belong in a presentation to the senior management team and/or C-suite executives? (Choose two.)
During EDA, a data scientist wants to look for patterns, such as linearity, in the data. Which of the following plots should the data scientist use?
Which of the following distribution methods or models can most effectively represent the actual arrival times of a bus that runs on an hourly schedule?
A data scientist has constructed a model that meets the minimum performance requirements specified in the proposal for a prediction project. The data scientist thinks the model's accuracy should be improved, but the proposed deadline is approaching. Which of the following actions should the data scientist take first?
Which of the following best describes the minimization of the residual term in a ridge linear regression?
A statistician notices gaps in data associated with age-related illnesses and wants to further aggregate these observations. Which of the following is the best technique to achieve this goal?
A data scientist needs to analyze a company's chemical businesses and is using the master database of the conglomerate company. Nothing in the data differentiates the data observations for the different businesses. Which of the following is the most efficient way to identify the chemical businesses' observations?
Which of the following distance metrics for KNN is best described as a straight line?
A data scientist is building a forecasting model for the price of copper. The only input in this model is the daily price of copper for the last ten years. Which of the following forecasting techniques is the most appropriate for the data scientist to use?
An analyst wants to show how the component pieces of a company's business units contribute to the company's overall revenue. Which of the following should the analyst use to best demonstrate this breakdown?
Which of the following does k represent in the k-means model?