These are mostly open-ended questions, meant to assess the breadth of a senior candidate's technical knowledge at a high level.
- What is the biggest data set you have processed? How did you process it, and what were the results?
- Tell me two success stories about your analytics or computer science projects. How was lift (or success) measured?
- What is: lift, KPI, robustness, model fitting, design of experiments, 80/20 rule?
- What is: collaborative filtering, n-grams, MapReduce, cosine distance?
- How would you optimize a web crawler to run much faster, extract better information, and summarize data better to produce cleaner databases?
- How would you come up with a solution to identify plagiarism?
- How would you detect individual paid accounts shared by multiple users?
- Should click data be handled in real time? Why? In which contexts?
- Which is better: good data or good models? And how do you define "good"? Is there a universal good model? Are there any models that are definitely not so good?
- What is probabilistic merging (AKA fuzzy merging)? Is it easier to handle with SQL or with other languages? Which languages would you choose for semi-structured text data reconciliation?
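
To make two of the terms above concrete, here is a minimal sketch of character n-grams and cosine distance, the kind of building blocks a candidate might mention for plagiarism detection or fuzzy merging. It is an illustration, not a model answer: the function names (`char_ngrams`, `cosine_similarity`, `cosine_distance`), the choice of character trigrams, and the sample strings are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two n-gram count vectors (Counters)."""
    # A Counter returns 0 for missing keys, so unshared n-grams contribute nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty inputs
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance, defined as 1 minus cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


if __name__ == "__main__":
    # Hypothetical records that a fuzzy merge might need to reconcile.
    near = cosine_distance(char_ngrams("Jon Smith"), char_ngrams("John Smith"))
    far = cosine_distance(char_ngrams("Jon Smith"), char_ngrams("Acme Corp"))
    print(f"near-duplicate distance: {near:.3f}")
    print(f"unrelated distance:      {far:.3f}")
```

In a fuzzy-merge setting, pairs whose distance falls below some threshold would be treated as candidate matches; the threshold itself has to be tuned on real data and is not implied by anything above.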