Could using big data for recruiting hide your purple squirrels?

Last night, I attended Moneyball for Talent Acquisition, a great presentation by LinkedIn’s James Raybould on how to use data to drive your talent search. Raybould pointed out some of the ways in which LinkedIn can help you narrow down your candidate pool, identifying the supply of qualified candidates for a given position based on particular characteristics. And where there’s lots of competition for qualified people–hiring for casino operations positions in Las Vegas, for example–LinkedIn can also suggest alternative markets where the supply might not be quite as big, but the demand is lower as well (Denver turns out to be that comparable supply market for Las Vegas–who knew?).

The talk did a great job of showcasing many advantages (supply and demand modeling) and disadvantages (lack of target market identification, filtering of endorsement providers) of some LinkedIn tools. LinkedIn certainly knows a lot, and it’s a valuable resource in finding and connecting with potential employees, as well as potential employers (not surprising, but if you follow a company you’re more likely to apply for a job there). Raybould also touched on how LinkedIn can help employers predict not only the skills they currently need, but the skills they will need, based in part on trends in the market and among competitors. Still, the presentation, and some discussions I had with people at the event, also made me a little uneasy about the potential of big data for recruiting. Here’s why.

Part of the idea behind Moneyball is that what really matters are the unidentified factors in success. You don’t want the home run hitters, you want the left-handed batters with the best on-base percentage at away games. The magic of the model was its ability to take seemingly irrelevant factors into account, thanks to a plethora of data inputs. But can LinkedIn, and tools like it, do the same, especially when they’re armed primarily with where you went to school and where you worked before? (For example, I went to the same undergraduate school as Steve Jobs–does that make me just like him? Or was wearing a turtleneck the major factor in Steve’s success?)

If you create your models based on where people were educated and where they’ve worked, it only makes sense that those factors become your predictors of success. But data modeling, if done with narrow inputs like these, has a problem: it can’t (always) find the outliers. As Malcolm Gladwell has taught us, unexpected factors are often what shape outliers: cultural background is a huge determinant of airplane pilot performance, for example, or your grandparents’ occupations may actually be what drive your success in math and science. The big factor in determining success could even be something as simple as working from home for an hour a day. So if you’re constantly trolling LinkedIn only for Caltech-trained engineers from Oklahoma who have at least 20 endorsements for C++, because that’s what all your other successful engineers were, you might be missing out on your real purple squirrel: someone with the same (unidentified) characteristics that matter, but the wrong (identified) characteristics that don’t.

This isn’t to say you shouldn’t use big data for recruiting. You absolutely should. And I don’t mean to pick on LinkedIn specifically–it’s probably one of the best tools for modeling employment data out there. But keep in mind that data models are only as good as the inputs they have. If models focus on ultimately irrelevant criteria such as degree or previous employer, and ignore the real source of performance distinction, like whether you liked eating bananas in sixth grade or whether you can play the saxophone left-handed (hey, you never know–until you model it), the models have the potential to reinforce the wrong criteria. That’s not to say they can’t change–but only if the real source of your outliers is later identified and incorporated.

LinkedIn may be heading in this direction with endorsements, which may allow people with proven skills to trump people with flashy credentials. And I’d be very interested to see the impact of LinkedIn’s volunteer experience and causes on hiring. Do people with greater volunteer involvement make better employees? Do you want to focus your recruiting on those who help out at the United Way or with the American Cancer Society? Data models may tell–but only if they take these factors into consideration.

So what are your purple squirrel predictors–and how do you know they’re the right ones? Tell me here or over at my company site.