How Google Combines Hiring Steps

31 January 2017

For a lot of companies, the hiring process looks similar: review CVs, hold one or more phone screens or Skype or Hangout sessions, then some in-person interviews, and finally make a hiring decision. A work sample test (e.g., write some code that does this or that) might be requested at some point along the way. This sequence of steps has a couple of advantages:

Before spending a lot of time and money on flying a candidate in, you have a remote conversation first, and thus can avoid situations where the candidate is hopeless in the on-site interviews.
A work sample test gives you an actual impression of the candidate’s problem-solving ability, coding skills, and the quality of their work.
On-site interviews give the interviewers a feeling for whether they can imagine working with this candidate.


Superficially, Google is no different

All of these methods can also be found at Google. What is interesting, though, is the data they gathered on each of the methods, and the way they vary and combine them. In Work Rules!, Laszlo Bock, Senior Vice President of People Operations at Google, shares some interesting facts about Google’s recruiting process and how they arrived there.

Specifically, Bock gives some numbers on the predictive power of various stages of the hiring process. “Predictive power” in this context means: How strongly does performance at a certain stage of the hiring process go along with actual performance on the job (correlation)? The book reports these figures as r², the share of the variation in job performance that a method explains. For example, if a candidate does well in a technical interview, how likely is it that she will also perform well on the job? What if she does well in a work sample test?

Handwriting analysis (0.07% predictive power), number of years of work experience (3%), and checks of references provided by the candidate (7%) can rather safely be discarded as a waste of time. Luckily, other methods prove more useful.

Work sample tests

The greatest predictive power, at 29%, comes from work sample tests. These are case studies, take-home exercises, or technical tasks solved during an interview. This makes sense to me, because I have conducted interviews both with and without technical tasks, and always felt a lot more confident about those candidates who were faced with at least some small programming tasks. Simple technical challenges are especially useful for weeding out hopeless candidates: If it takes you five minutes to write down a simple iteration over an array, I will probably not admit you to the next stage.

By the way: Maybe you are surprised that 29% is the greatest predictive power there is. It would be nice if it were greater, wouldn’t it? Say, 80%. However, we are out of luck there: In a dynamic environment, a technical exercise is unlikely to match the variety and the complexity of tasks the new hire will face in her day-to-day work. Moreover, given the highly interactive and interdisciplinary teams you often find today, technical skills are far from the only (and maybe not even the most important) skills a software engineer needs.
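One way to put the percentages into perspective is to convert them into plain correlations. The sketch below assumes that the figures quoted in this post are squared correlations (r²), so the correlation r is simply the square root. The dictionary keys and the helper name `to_correlation` are made up for illustration.

```python
import math

# Percentages quoted in this post, read here as r^2 values (an assumption).
explained_variance = {
    "work sample test": 0.29,
    "cognitive ability test": 0.26,
    "structured interview": 0.26,
    "unstructured interview": 0.14,
    "reference checks": 0.07,
    "years of work experience": 0.03,
}


def to_correlation(r_squared: float) -> float:
    """Convert a share of explained variance (r^2) into a correlation r."""
    return math.sqrt(r_squared)


correlations = {
    method: round(to_correlation(value), 2)
    for method, value in explained_variance.items()
}

for method, r in correlations.items():
    print(f"{method:26s} r^2 = {explained_variance[method]:.2f}  r = {r:.2f}")
```

Under that assumption, a work sample test comes out at a correlation of roughly 0.54 and an unstructured interview at roughly 0.37, so the methods sit closer together on the correlation scale than the raw percentages suggest.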

This relatively low predictive power of each individual method forces us to combine several methods in an intelligent way to reach a higher overall predictive power of the hiring process. In other words, you want to find the recruiting mix that works best for your organization. We will look at combinations of techniques further down.

IQ tests and interviews


Second place in predictive power is a tie between tests of general cognitive ability and structured interviews. Both score 26%. Tests of general cognitive ability are similar to IQ tests: There is a pool of questions, and each question comes with a set of right and wrong answers from which you choose. The fact that these tests have high predictive power can be explained by the complex work environment and problems employees have to cope with, and the multitude of information they have to process. High cognitive ability helps with that. The downside is that these kinds of tests are suspected of discriminating against women and (at least in the U.S.) certain ethnic groups.

Structured interviews, also at 26%, are an interesting case, once you consider that unstructured interviews have a predictive power of only 14% — only about half as much as the structured ones. What is a structured interview, then? The two defining criteria are:

They use one or several consistent set(s) of questions
There are clear criteria for assessing responses

The consistent set of questions makes it possible to compare different answers over time, and collect data on what makes a good answer, a mediocre answer, or a poor one. Also, having a list of questions ensures that you cover everything that is important to you, and don’t wander off course too much.
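The two criteria can be made concrete with a small data structure: a fixed question set, and a rubric per question that spells out what each score means. This is only a minimal sketch. The class names, the three-point scale, and the sample questions are all hypothetical and not taken from the book.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    # Maps each allowed score to a description of an answer earning it.
    rubric: dict[int, str]


@dataclass
class InterviewRecord:
    candidate: str
    questions: list[Question]
    scores: dict[str, int] = field(default_factory=dict)
    notes: dict[str, str] = field(default_factory=dict)

    def record(self, question: Question, score: int, note: str) -> None:
        """Store a score and the noted answer for one prepared question."""
        if question not in self.questions:
            raise ValueError("question is not part of the prepared set")
        if score not in question.rubric:
            raise ValueError(f"score must be one of {sorted(question.rubric)}")
        self.scores[question.text] = score
        self.notes[question.text] = note

    def unanswered(self) -> list[str]:
        """Questions not yet covered, so that nothing important is skipped."""
        return [q.text for q in self.questions if q.text not in self.scores]


# Hypothetical question set; every candidate for the role gets the same one.
QUESTIONS = [
    Question(
        "Tell me about a time you had to debug a problem in unfamiliar code.",
        {1: "no concrete example", 2: "example, but vague on own contribution",
         3: "clear example with method and result"},
    ),
    Question(
        "Describe a disagreement with a teammate and how it was resolved.",
        {1: "blames others", 2: "describes outcome only",
         3: "explains both views and the resolution"},
    ),
]

record = InterviewRecord("Candidate A", QUESTIONS)
record.record(QUESTIONS[0], 3, "Traced a memory leak with a profiler; fixed in two days.")
print(record.unanswered())
```

Because every record uses the same questions and the same scale, answers from different candidates and different interviewers can be compared later, which is the point of the consistent question set.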

Beware of your gut

The second point, “clear criteria for assessing responses”, can protect against interviewers’ gut feeling. Everybody has it, but unfortunately, it is often wrong (the corresponding chapter in the book is aptly named “Don’t Trust Your Gut”). I consider myself a somewhat experienced interviewer (>150 in-person interviews, >300 remote ones), and I know my gut feeling has often been wrong. Some mistakes were averted by a colleague who advocated for a candidate I wanted to reject, and, in some cases, I should thank them on my knees that they convinced me. In other cases, I let my gut feeling cover up some pretty obvious weaknesses in a candidate whom we hired and subsequently had to let go again.

All too often in interviews, we form an opinion during the first minute, and spend the rest of the time unconsciously trying to confirm this opinion. This phenomenon is called confirmation bias.

However, if your questions have been carefully chosen beforehand, you can no longer pick them during the interview to feed your confirmation bias. This is what the structured interview is about.

Additionally, the clear criteria for assessing responses to these questions add objectivity to the evaluation phase and protect against a potentially wrong gut feeling and your current mood. We all have good days and not-so-good days, and our mood can influence our judgement. Even being hungry can make us overly harsh. If you note down a candidate’s answers to a set of proven and good questions, you can make a decision based on facts instead of your current mood, because you can go over the answers again at a later time (don’t let too much time pass, though), and clearly see if they were good or not.

Having said all that, there is, of course, a limit to how far you should take objectivity. A candidate can give all the best answers in the world. However, if he constantly cuts you short, brags a lot, or otherwise presents himself as a jerk, that’s still a no-hire.

What Google uses

Of the techniques described above, Google uses the following:

Structured interviews
Assessments of cognitive ability
Assessments of conscientiousness (Do you work to completion? Do you act like an owner?)
Assessments of leadership

Unfortunately, Bock does not go into detail about the assessments of conscientiousness and leadership. I guess we cannot expect them to tell us all of their secrets, right? My guess would be that they use behavioural questions to do it.

He does tell us about some interesting additions to the “standard” hiring process, though. The ones I found most interesting are:

Resumes are screened by somebody who has an overview of all open positions, not just the ones within one team or department. The reasoning is that sometimes, a candidate might not be a great fit for a certain role, but will be excellent or should at least be considered for a different one. The person screening the resumes can then re-route the application to the appropriate hiring manager.
The remote interview assesses a candidate’s problem-solving and learning ability. This is no big surprise per se, but it confirms that it is important to test for some analytical skill before inviting somebody. Otherwise, you might waste time on somebody who is a great person, but does not have the skills required for the position.
For team lead positions, candidates have in-person interviews with future subordinates. This makes perfect sense, because those employees are most affected by the hiring decision.
For all positions, candidates have an in-person interview with a cross-functional interviewer. This means that a prospective software engineer gets interviewed by somebody from the legal department, or a product manager gets interviewed by a salesperson. This practice adds objectivity, because the cross-functional interviewer has no stake in whether the position gets filled. They might also notice something that is a blind spot for the interviewers coming from the same domain of expertise.
Google invests time and effort in a formal and very structured feedback compilation, consisting of interviewer score, feedback for each interview question, detailed feedback on each candidate answer based on Google’s four hiring attributes (general cognitive ability, leadership, role-related knowledge, and “Googleyness”), and other information. Average scores across all interviewers are calculated and emphasized.
To make the final decisions, Google employs hiring committees, which typically consist of directors and VPs. They review the packets of information compiled in the previous steps, and might reject a candidate based on that information. Otherwise, they add their own feedback, and pass the hiring recommendation on to a group of Senior Leaders. From there, Larry Page himself is sent all the recommended hires each week, and has to approve them.
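The averaging step in the feedback compilation can be sketched in a few lines. The attribute names come from the list above. The interviewer names, the 1 to 4 scale, and the scores themselves are invented for illustration, and how Google actually weights or combines scores is not described here.

```python
# Hypothetical feedback packet: interviewer -> score per hiring attribute.
# The 1-4 scale and all numbers are assumptions made for this example.
feedback = {
    "interviewer_a": {"general cognitive ability": 3, "leadership": 2,
                      "role-related knowledge": 4, "Googleyness": 3},
    "interviewer_b": {"general cognitive ability": 4, "leadership": 3,
                      "role-related knowledge": 3, "Googleyness": 4},
    "interviewer_c": {"general cognitive ability": 3, "leadership": 3,
                      "role-related knowledge": 4, "Googleyness": 2},
}


def average(values) -> float:
    """Plain arithmetic mean of a non-empty collection of numbers."""
    values = list(values)
    return sum(values) / len(values)


def average_by_attribute(packet: dict) -> dict:
    """Average each attribute across all interviewers."""
    by_attribute = {}
    for scores in packet.values():
        for attribute, score in scores.items():
            by_attribute.setdefault(attribute, []).append(score)
    return {attr: average(vals) for attr, vals in by_attribute.items()}


def overall_average(packet: dict) -> float:
    """Average of each interviewer's own mean score."""
    return average(average(scores.values()) for scores in packet.values())


by_attr = average_by_attribute(feedback)
overall = overall_average(feedback)

for attribute, value in by_attr.items():
    print(f"{attribute:28s} {value:.2f}")
print(f"{'overall':28s} {overall:.2f}")
```

Averaging across interviewers dampens the effect of any single interviewer's mood or bias, which fits the theme of not trusting one person's gut feeling.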

What you can do


These variations show that even with a relatively standard process, small changes here and there can create a lot of value. Are you tired of how your interviews always drag on the same way? Ask a colleague from a different department (cross-functional) to join you for a couple of interviews, and exchange ideas. Ask if you can sit in on interviews in other departments to find out what they do differently.

Do you sometimes have the feeling after an interview that you did not really learn the relevant things about the candidate that would let you make an informed decision? Prepare one or more structured interviews with pre-determined sets of questions. Bock points to a list of sample questions from the U.S. Department of Veterans Affairs, and with good reason. It has great questions that are relevant across a wide range of jobs, and it is structured according to career level and soft skill category.

My own biggest takeaway was to eliminate or at least reduce the chance of my gut feeling taking over and making the decision for me. Biases can hurt you, and it simply feels stupid when you reject a candidate, but cannot give a good reason why. This means I am investing more time in structured interviews, trying to establish clear criteria for what makes a good answer and what doesn’t, and comparing a candidate’s answers against those criteria. I will have to iterate on the interviews before they are perfect, but I feel that it is a step in the right direction.

Your greatest potential for improving your hiring process might be somewhere entirely different, but I think eliminating randomness — especially across a large number of interviewers — is a good place to start. Whatever you do, just don’t resort to handwriting analysis.

Time investment

This blog post took me about 4.5 hours to write.