I have written a few posts regarding semantic search on www.booleanblackbelt.com, and I get many inquiries about the concept and how it can be applied effectively to sourcing and recruiting.

When I talk about semantic search, I don't mean the semantic web (which I still think is a LONG way off); I am referring to user-generated and user-defined semantic search. In other words, I mean a sourcer or recruiter creating Boolean search strings that go beyond simply trying to match the words themselves and attempt to delve into the meaning implied by the words.

In linguistics, semantics refers to the study of meaning, as inherent at the levels of words, phrases, and sentences.

The vast majority of sourcers and recruiters create Boolean search strings that simply return a collection of words - words that do not have any associative meaning and that are not guaranteed to be relevant with regard to the intent of the search. Relevance can be defined as the extent to which a search result matches the information need based on the intent of the person executing the search. Highly "relevant" results = results that match exactly what the searcher is looking for.

Most sourcers and recruiters are actually trying to find people who have specific skills and experience. Just because certain words appear in a person's resume or profile does not mean that the person has been primarily responsible for working with what those words describe (typically skills, technologies, etc.). For example, a knowledgeable sourcer or recruiter knows that documents with the word "account" mentioned close to the word "executive" will often have a different meaning and relevance than documents that simply mention the words "account" and "executive" located anywhere within them.

This is the critical difference between the semantic similarity between a search and its results vs. the lexical similarity between a search and its results. In other words - when the search results match the intended MEANING of the search, there is a semantic similarity between the search and its results. When search results simply match the search terms but not the intended meaning of the search, there is a lexical similarity (the words match) between the search and its results.

Semantic search can best be achieved through the use of search interfaces and engines that support proximity searching. Proximity search functionality allows a sourcer or recruiter to control how close specific words are mentioned in relation to other words.

When you are able to control the proximity of words to each other, you can take advantage of linguistics and sentence structure to look for verbs mentioned in close proximity to nouns, which can imply taking action. If a resume mentions (configure OR configured OR configuration), which are verbs or verb-derived words, in close proximity to (router OR routers), which are nouns, and within the same sentence, it is highly likely that the writer is talking about being responsible for configuring routers.
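To make the idea concrete, here is a minimal Python sketch of what a proximity check does with the router example. It is only an illustration, not how any particular search engine is implemented: the function names, the sample resume sentences, and the rule that "within N words" means the word positions differ by at most N are all my own assumptions.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def within_distance(text, group_a, group_b, max_words):
    """Return True if any term from group_a appears within max_words
    tokens of any term from group_b, in either order.
    Assumption: distance is the difference between token positions."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t in group_a]
    pos_b = [i for i, t in enumerate(tokens) if t in group_b]
    return any(abs(i - j) <= max_words for i in pos_a for j in pos_b)

# The two OR groups from the example above.
verbs = {"configure", "configured", "configuration"}
nouns = {"router", "routers"}

# Hypothetical resume snippets, invented for illustration.
hands_on = "Configured and maintained Cisco routers for 40 branch offices."
incidental = ("Configured payroll software for the finance team. "
              "Ordered laptops, printers, cables, and replacement routers.")

print(within_distance(hands_on, verbs, nouns, 5))    # True
print(within_distance(incidental, verbs, nouns, 5))  # False
```

Both snippets contain the same search terms, so a plain keyword match would return both. Only the first mentions them close enough together to suggest the person actually configured routers.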

A sourcer or recruiter should not be satisfied to merely scan and read resumes of people who simply mention the words "configure" and "routers" somewhere in the resume - there are many people who can mention those words somewhere in their resume who have never been specifically responsible for configuring routers. The issue is that just because these words are found in a resume - the presence of the words themselves does not MEAN anything with regard to what the candidate has specifically been responsible for.

With the appropriate search interface/engine, sourcers and recruiters can craft semantic searches to find people who not only mention specific words such as "configure" and "routers", but who have actually had experience configuring routers. Being able to control the proximity of words can enable recruiters to quickly get more results that are semantically relevant to what the recruiter is actually trying to find.

There are 3 main types of proximity searching - I will focus on what I think are the two most powerful - fixed proximity search and configurable proximity search.

Fixed proximity search functionality such as the "extended Boolean" NEAR operator enables users to search for words or phrases that are mentioned close to other specific words or phrases. The range of the NEAR operator is fixed, typically at 1-10 words.
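As a rough sketch of how a fixed-range NEAR differs from a plain AND, consider the "account" and "executive" example from earlier. The Python below is illustrative only: the 10-word window is an assumed value (each engine sets its own), and the function names and sample text are hypothetical.

```python
import re

NEAR_RANGE = 10  # assumed fixed window; real engines choose their own value

def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_and(text, term_a, term_b):
    """Plain Boolean AND: both terms appear anywhere in the document."""
    found = set(words(text))
    return term_a in found and term_b in found

def matches_near(text, term_a, term_b):
    """Fixed proximity: the terms appear within NEAR_RANGE tokens of
    each other, in either order. The caller cannot change the range."""
    w = words(text)
    pos_a = [i for i, t in enumerate(w) if t == term_a]
    pos_b = [i for i, t in enumerate(w) if t == term_b]
    return any(abs(i - j) <= NEAR_RANGE for i in pos_a for j in pos_b)

# Hypothetical resume snippets, invented for illustration.
sales = "Senior account executive responsible for enterprise software sales."
admin = ("Reconciled every customer account monthly. Coordinated travel, "
         "prepared expense reports, scheduled board meetings, and "
         "supported the executive team.")

for resume in (sales, admin):
    print(matches_and(resume, "account", "executive"),
          matches_near(resume, "account", "executive"))
# True True
# True False
```

The AND search returns both documents (a lexical match), while NEAR keeps only the one in which the two words are close enough to plausibly belong together.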

Did you know that Monster supports the NEAR operator? Many people aren't aware of this - but it's the only major job board resume database that I am aware of to do so. Kudos to Monster! It is unfortunate that there are very few people who even know about the NEAR operator, and even fewer still who know how to utilize it to achieve semantic search.

Among Internet search engines - Google, Yahoo, Live, and Ask do not support proximity searching of any kind - only Exalead does, to my knowledge. As for Applicant Tracking Systems, I am aware that Bullhorn has integrated Lucene, a free and open source text search engine that supports configurable proximity, into their search interface.

Configurable proximity search goes one step further than fixed proximity, allowing a sourcer or recruiter to precisely control the maximum distance between specific search terms and to return even more relevant results than the NEAR operator. This is because the NEAR operator’s maximum range of 10 words can allow for some non-relevant results to be returned. The farther words are mentioned apart from each other, the less likely it is that they are semantically related. In fact, when two search terms are separated by 10 words, each could be mentioned in separate bullet points or sentences on a resume and be completely unrelated.

However, with configurable proximity, a sourcer or recruiter can choose the maximum distance between search terms. Although search engines supporting configurable proximity vary in their exact syntax, here is an example of a search looking for someone who has been responsible for administering Exchange servers: Windows AND Exchange w/5 admin* AND server*. That search can ONLY return resumes or profiles that mention Exchange within 5 words of any word starting with the root admin (administrator, administration, administer, administered, etc.), regardless of order. A maximum distance of 5 words will dramatically increase the semantic similarity between the search's intent and the search results, because mentioning those 2 search terms at such close range makes it more likely that they appear in the same bullet point or sentence and are thus semantically related. Essentially, this search is far more likely to return people who specifically mention being responsible for administering Exchange in their resume.
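Here is a small Python sketch of how the Exchange search above could be evaluated, to show why the distance setting matters. It is a simplified illustration under my own assumptions, not the behavior of any specific product: I assume the proximity operator binds Exchange and admin* together before the ANDs are applied, that a trailing * is a prefix wildcard, and that distance is measured in word positions. The helper names and resume text are hypothetical.

```python
import re

def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def positions(toks, pattern):
    """Positions of tokens matching a term; a trailing * is a prefix match."""
    pattern = pattern.lower()
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return [i for i, t in enumerate(toks) if t.startswith(stem)]
    return [i for i, t in enumerate(toks) if t == pattern]

def within(toks, left, right, max_distance):
    """True if left and right occur within max_distance tokens, any order."""
    return any(abs(i - j) <= max_distance
               for i in positions(toks, left)
               for j in positions(toks, right))

def matches(resume, max_distance):
    """Assumed grouping: Windows AND (Exchange w/N admin*) AND server*"""
    toks = tokens(resume)
    return (bool(positions(toks, "windows"))
            and within(toks, "exchange", "admin*", max_distance)
            and bool(positions(toks, "server*")))

# Hypothetical resume snippets, invented for illustration.
same_bullet = "Administered Exchange 2003 mailboxes on Windows servers."
split_bullets = ("Migrated mailboxes from Exchange to a hosted platform. "
                 "Supported office administrators. "
                 "Patched Windows servers.")

print(matches(same_bullet, 5))     # True
print(matches(split_bullets, 10))  # True:  terms sit in separate sentences
print(matches(split_bullets, 5))   # False: tighter range filters it out
```

In the second snippet, Exchange and administrators are 7 words apart and belong to unrelated sentences. A 10-word range still lets that resume through, while a 5-word limit excludes it.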

Many sourcers and recruiters employing basic search tactics and strategies may unfortunately be simply throwing a bunch of keywords in a search - and as a result, end up reviewing large volumes of irrelevant results that simply match the search terms they entered (lexical match) in order to "get lucky" to find the few results buried among them that are relevant to what they are seeking. This is a huge time drain, is inefficient, and is low yield.

Experts at talent mining seek to craft Boolean search strings designed to reduce irrelevant "false positive" results, eliminating those of people who simply mention the words they are searching for somewhere in their resumes or profiles, and go beyond the simple lexical match to achieve semantic search - finding people whose experience and skills match the essence of their search.

If you don't already take advantage of the power of semantic search to quickly find more relevant results when creating your Boolean search strings, now is the perfect time to set it as a resolution for 2009. Make it a goal to move beyond simple buzzword matching and create Boolean searches that target people more based on what they DO, rather than just the words they use in their resume or profile.


Replies to This Discussion

Wonderful piece Glen and makes me think about the "semantic" component of telephone names sourcing - to use some of your words (in italics) mixed with mine (in non-italics): "In other words, a sourcer or recruiter...creating communication with a Gatekeeper, uses words... that go beyond simply trying to match the words themselves and attempting to delve into the meaning implied by the words."

I have long wondered how many different ways people interpret simple phrases/requests for information. It is a key understanding of successful phone sourcers.

Glen did an amazing one hour class on his Boolean techniques on December 16 over on the MagicMethod network. If you missed it, you can read the entire chat log here.

Glen, this is fantastic stuff. Thanks for posting.

Hi Glen,

Thanks for the post. Would you agree that in fact we are all trying to do a semantic search when we look for candidates who match a job description? You are right that simply picking a few keywords and pasting them into a search string will end up costing us a lot of reviewing of irrelevant results.

The NEAR operator is one way to narrow down the results. Would you consider the following to be "semantic" searches?

* Try to add keywords that may not be on the job description but (since you know the subject area) are often found on the right pages/resumes. These keywords may sometimes be assumed by the person who wrote the description but not explicitly said. These may be certifications, industry-specific words etc. As an example, words like Swing and Eclipse need to be used along with Java to point us in the right direction.
* Try to modify your keywords so that they are likely to point to the right stuff. Looking for a User Interface engineer for Windows? Name UI packages for Windows such as MFC.
* Add keywords with a - sign on Google (NOT on databases) if they are likely to take you away from the right results. We're all taught to add -job -jobs etc. to our Google resume search strings.
* On Google, look for your results using a site: command pointing to sites that are likely to have what you are looking for.

There are some requirements/keywords that make semantic search very hard. If you are looking for an engineer from Microsoft or Yahoo, or (as it was posted on the LinkedIn "Boolean Strings" group) for a person in Oregon (OR), that may be hard.
Sometime ago I got a recommendation to consult with two eye doctors with the last names Good and Day in San Francisco (true story). Boy, was it a project to find them online. But with the right Boolean Strings even this is doable.

-Irina

Irina,
I would agree that most sourcers and recruiters are trying to find people who have had specific experience performing the role and responsibilities required (and desired) of a job description – but they are NOT actually performing semantic search (not specifically leveraging semantics in their search tactics and strategy). A collection of search terms is just that – a collection of words.

The NEAR operator and configurable proximity functionality of search applications such as Lucene and dtSearch are the best ways to leverage semantics when searching because they allow you to target sentence structure, such as when people talk about doing X with Y (configuring routers, reconciling reports, administering a server cluster, implementing SAP, customizing interfaces, performing SOX audits, etc.).

In response to your question of whether or not I would consider the following to be "semantic" searches:

* Try to add keywords that may not be on the job description but (since you know the subject area) are often found on the right pages/resumes. These keywords may sometimes be assumed by the person who wrote the description but not explicitly said. These may be certifications, industry-specific words etc. As an example, words like Swing and Eclipse need to be used along with Java to point us in the right direction.
* Try to modify your keywords so that they are likely to point to the right stuff. Looking for a User Interface engineer for Windows? Name UI packages for Windows such as MFC.
* Add keywords with a - sign on Google (NOT on databases) if they are likely to take you away from the right results. We're all taught to add -job -jobs etc. to our Google resume search strings.
* On Google, look for your results using a site: command pointing to sites that are likely to have what you are looking for.

Answer - while these are certainly search best practices, they are not intrinsically semantic search.

Adding additional keywords of any type to a search, or using the NOT/- operator may (or may not) help narrow search results, but adding or selectively removing keywords/search terms in many cases simply produces results with the search terms (and without the removed terms) without implying any responsibility with the search terms.

I’ll use your UI Engineer as an example. There are many people who can mention (UI or user interface or GUI) and MFC in their resume who in fact do not have any significant experience with interface design, even though the words are somewhere in the resume. Even if we added other UI-related terms such as (wireframe or human factors or cognitive or heuristic), we can still return many resumes that match the keywords but belong to people who have not been primarily responsible for interface design. Sourcers and recruiters encounter this all the time – the words are in the resume, but the person has not actually DONE what they need them to have done in their career. That is an excellent example of a high lexical similarity between a search and the results (the words match) and low semantic similarity of the search and the results (the person’s experience does NOT match).

This can also be seen in the “technical skills summary” of most resumes, where a laundry list of skills and technologies is present, but simply being mentioned does not imply any level of expertise or paid experience. Hence someone could mention Java, Eclipse, and Swing in their resume, but not have any paid experience developing applications with them (for example, having used them only in school or at home). This is a normal experience for sourcers and recruiters and so they assume this is simply “the way it is.”

However, if we use the NEAR command (or even better - a more powerful proximity search operator such as dtSearch’s w/x), we could add this to a search string: (develop* or design*) NEAR (Java or Eclipse or Swing), and the results MUST mention Java or Eclipse or Swing within 10 words of develop or design, increasing the likelihood that the results will include resumes that have sentences specifically stating development or design-level responsibility with Java/Eclipse/Swing. This is tapping into semantics – the presence of words does not necessarily imply any meaning, but words in the same sentence do imply meaning in most (but certainly not all) cases.
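To illustrate the difference, here is a short Python sketch that treats each parenthesized OR group as a set of terms and checks (develop* or design*) NEAR (Java or Eclipse or Swing) against two made-up resumes. It is only a sketch under my own assumptions: the 10-word range, the prefix handling of *, the helper names, and the resume text are all hypothetical and not taken from any real search product.

```python
import re

def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def hits(toks, terms):
    """Positions matching any term in an OR group; trailing * = prefix."""
    found = []
    for i, tok in enumerate(toks):
        for term in terms:
            term = term.lower()
            if term.endswith("*"):
                matched = tok.startswith(term[:-1])
            else:
                matched = tok == term
            if matched:
                found.append(i)
                break
    return found

def mentions_both(text, group_a, group_b):
    """Keyword-only match: each group appears somewhere in the text."""
    toks = tokens(text)
    return bool(hits(toks, group_a)) and bool(hits(toks, group_b))

def near(text, group_a, group_b, max_distance=10):
    """Proximity match: a term from each group within max_distance tokens."""
    toks = tokens(text)
    return any(abs(i - j) <= max_distance
               for i in hits(toks, group_a)
               for j in hits(toks, group_b))

actions = ["develop*", "design*"]
skills = ["Java", "Eclipse", "Swing"]

# Hypothetical resume snippets, invented for illustration.
skills_list = ("Technical skills: Java, Eclipse, Swing, SQL, XML. "
               "Experience: Help desk technician who answered calls, reset "
               "passwords, replaced hardware, and trained new staff. "
               "Designed the team newsletter.")
hands_on = "Developed desktop applications in Java using Swing and Eclipse."

for resume in (skills_list, hands_on):
    print(mentions_both(resume, actions, skills),
          near(resume, actions, skills))
# True False
# True True
```

Both resumes pass the keyword-only test, but only the second one mentions a development verb close to the technologies, which is the kind of result the NEAR clause is meant to favor.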

I wholeheartedly agree that there are some keywords that make semantic search difficult, but as you stated, creative application of search tactics and strategies can in most cases solve these challenges.

Glen-

Hidden in your response - at least to me - is the notion that many recruiters/sourcers simply do not understand the job they're being paid to fill. Semantic searching requires intimate knowledge of the position in question and how work is accomplished - rather than tossing in a few keywords in a Mad Libs attempt to find a person.

Sorry I'm just getting around to reading this... good stuff... but I agree with Steve... sounds like as recruiters and sourcers we need to truly understand the roles for which we seek candidates... Look forward to reading more.
