You are using MADlib for Linear Regression analysis. Which value does the statement return?
SELECT (linregr(depvar, indepvar)).r2 FROM zeta1;
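For reference, the `r2` field selected in the statement above is the coefficient of determination (R-squared) of the fitted model. The sketch below computes that quantity by hand for a single predictor in plain Python; it does not call MADlib, and the function name `r_squared` and the sample data are hypothetical, chosen only for illustration.

```python
def r_squared(x, y):
    """Fit y = a + b*x by ordinary least squares and return R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical sample data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(r_squared(x, y), 4))
```

A value close to 1 means the fitted line explains most of the variance in the dependent variable; a value close to 0 means it explains very little.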
In which lifecycle stage are test and training data sets created?
When creating a presentation for a technical audience, what is the main objective?
Your company has 3 different sales teams. Each team's sales manager has developed incentive offers to increase the size of each sales transaction. Any sales manager whose incentive program can be shown to increase the size of the average sales transaction will receive a bonus.
Data are available for the number and average sale amount for transactions offering one of the incentives as well as transactions offering no incentive.
The VP of Sales has asked you to determine analytically if any of the incentive programs has resulted in a demonstrable increase in the average sale amount.
Which analytical technique would be appropriate in this situation?
In data visualization, what is used to focus the audience on a key part of a chart?
Which word or phrase completes the statement? Data-ink ratio is to data visualization as __________ .
You are using the Apriori algorithm to determine the likelihood that a person who owns a home has a good credit score. You have determined that the confidence for the rules used in the algorithm is > 75%. You calculate lift = 1.011 for the rule, "People with good credit are homeowners". What can you determine from the lift calculation?
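As background for the question above, the lift of a rule X -> Y is usually defined as confidence(X -> Y) divided by support(Y). A value near 1 suggests X and Y co-occur about as often as they would if they were independent. The following is a minimal sketch of those three measures, assuming transactions are represented as Python sets; the function name `rule_metrics` and the ten sample records are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent in t)
    n_c = sum(1 for t in transactions if consequent in t)
    n_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = n_both / n            # P(antecedent and consequent)
    confidence = n_both / n_a       # P(consequent | antecedent)
    lift = confidence / (n_c / n)   # confidence relative to P(consequent)
    return support, confidence, lift


# Hypothetical records: 8 with good credit, 5 homeowners, 4 with both.
records = (
    [{"good_credit", "homeowner"}] * 4
    + [{"good_credit"}] * 4
    + [{"homeowner"}]
    + [set()]
)
support, confidence, lift = rule_metrics(records, "good_credit", "homeowner")
print(support, confidence, lift)
```

In this made-up data set the lift comes out at exactly 1, the value expected when the two attributes are statistically independent.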
Under which circumstance do you need to implement N-fold cross-validation after creating a regression model?
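To make the term concrete, N-fold cross-validation partitions the data into N folds and holds each fold out once for testing while the model is fit on the rest. Below is a minimal sketch of the index bookkeeping only, assuming contiguous folds with no shuffling; the helper name `k_fold_indices` is hypothetical, and a real workflow would typically shuffle the rows first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

Every row appears in exactly one test fold, so each observation is used for validation exactly once.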
What is an appropriate data visualization to use in a presentation for an analyst audience?
When would you use GROUP BY ROLLUP clause in your OLAP query?
Which type of numeric value does a logistic regression model estimate?
Your colleague, who is new to Hadoop, approaches you with a question. They want to know how best to access their data. This colleague has a strong background in data flow languages and programming.
Which query interface would you recommend?
The web analytics team uses Hadoop to process access logs. They now want to correlate this data with structured user data residing in a production single-instance JDBC database. They collaborate with the production team to import the data into Hadoop. Which tool should they use?
What does the following R code do?
z <- f[1:10, ]
In R, functions like plot() and hist() are known as what?
Review the following code:
SELECT pn, vn, sum(prc*qty)
FROM sale
GROUP BY CUBE(pn, vn)
ORDER BY 1, 2, 3;
Which combination of subtotals do you expect to be returned by the query?
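As a reminder of the mechanics, `GROUP BY CUBE` over n columns aggregates over every subset of those columns, which gives 2^n grouping sets. The sketch below emulates that in plain Python for the two columns in the query above; the `cube` helper and the three sample rows are hypothetical, and a real database would return these groups as rows with NULLs in the rolled-up columns.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Return {grouping_set: {key_tuple: total}} for every subset of dims."""
    result = {}
    # Walk from the full set of dimensions down to the empty set (grand total).
    for r in range(len(dims), -1, -1):
        for group in combinations(dims, r):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in group)
                totals[key] += measure(row)
            result[group] = dict(totals)
    return result


# Hypothetical rows standing in for the "sale" table.
sales = [
    {"pn": 100, "vn": 10, "prc": 5.0, "qty": 2},
    {"pn": 100, "vn": 20, "prc": 5.0, "qty": 1},
    {"pn": 200, "vn": 10, "prc": 3.0, "qty": 4},
]
subtotals = cube(sales, ("pn", "vn"), lambda row: row["prc"] * row["qty"])
for group, totals in subtotals.items():
    print(group, totals)
```

The empty grouping set `()` holds the grand total, while `("pn",)` and `("vn",)` hold the per-column subtotals.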
The web analytics team uses Hadoop to process access logs. They now want to correlate this data with structured user data residing in their massively parallel database. Which tool should they use to export the structured data from Hadoop?
In MADlib what does MAD stand for?
When would you prefer a Naive Bayes model to a logistic regression model for classification?
Before you build an ARMA model, how can you tell if your time series is weakly stationary?
What is an example of a null hypothesis?
You have fit a decision tree classifier using 12 input variables. The resulting tree used 7 of the 12 variables, and is 5 levels deep. Some of the nodes contain only 3 data points. The AUC of the model is 0.85. What is your evaluation of this model?
If your intention is to show trends over time, which chart type is the most appropriate way to depict the data?