I have written a few posts regarding semantic search on www.booleanblackbelt.com, and I get many inquiries about the concept and how it can be applied effectively to sourcing and recruiting.
When I talk about semantic search, I don't mean the semantic web (which I still think is a LONG way off); I am referring to user-generated and user-defined semantic search. In other words, I mean a sourcer or recruiter creating Boolean search strings that go beyond simply matching the words themselves and attempt to get at the meaning implied by the words.
In linguistics, semantics refers to the study of meaning, as inherent at the levels of words, phrases, and sentences.
The vast majority of sourcers and recruiters create Boolean search strings that simply match a collection of words: words that have no associative meaning and that are not guaranteed to be relevant to the intent of the search. Relevance can be defined as the extent to which a search result matches the information need of the person executing the search, based on that person's intent. Highly "relevant" results are results that match exactly what the searcher is looking for.
Most sourcers and recruiters are actually trying to find people who have specific skills and experience. Just because certain words appear in a person's resume or profile does not mean that the person has been primarily responsible for working with what those words describe (typically skills, technologies, etc.). For example, a knowledgeable sourcer or recruiter knows that documents with the word "account" mentioned close to the word "executive" will often have a different meaning and relevance than documents that simply mention the words "account" and "executive" anywhere within them.
This is the critical difference between semantic similarity and lexical similarity between a search and its results. When the search results match the intended MEANING of the search, there is semantic similarity between the search and its results. When search results simply match the search terms but not the intended meaning of the search, there is only lexical similarity (the words match) between the search and its results.
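To make the lexical vs. semantic distinction concrete, here is a toy Python sketch (not how any job board or search engine actually works) that contrasts a plain Boolean AND ("both words appear somewhere") with a simple proximity check on the "account"/"executive" example. The two resume snippets are made up for illustration, and the naive tokenizer and word-count distance (adjacent words are 1 apart) are assumptions; real engines tokenize and count distance in their own ways.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase the text and split it into word tokens (deliberately naive).
    return re.findall(r"[a-z0-9']+", text.lower())


def lexical_match(text: str, *terms: str) -> bool:
    # Plain Boolean AND: every term appears somewhere in the document.
    tokens = set(tokenize(text))
    return all(term.lower() in tokens for term in terms)


def proximity_match(text: str, a: str, b: str, max_distance: int) -> bool:
    # Both terms appear, in either order, at most max_distance words apart.
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


# Hypothetical resume snippets, invented for this example.
resume_a = "Account Executive at Acme Corp, managing 40 enterprise clients."
resume_b = "Executive assistant to the CFO. Reconciled every vendor account each month."

for name, text in [("resume_a", resume_a), ("resume_b", resume_b)]:
    print(name, lexical_match(text, "account", "executive"),
          proximity_match(text, "account", "executive", 2))
# resume_a True True
# resume_b True False
```

Both snippets are lexical matches, but only the first one, where the words sit next to each other, is likely to be an actual account executive.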
Semantic search can best be achieved through the use of search interfaces and engines that support proximity searching. Proximity search functionality allows a sourcer or recruiter to control how close to each other specific words must be mentioned.
When you are able to control the proximity of words to each other, you can take advantage of linguistics and sentence structure to look for verbs mentioned in close proximity to nouns, which can imply taking action. If a resume mentions (configure OR configured OR configuration), which are forms of the verb "configure," in close proximity to (router OR routers), which are nouns, and within the same sentence, it is highly likely that the writer is talking about being responsible for configuring routers.
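The "verb near noun, within the same sentence" idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the behavior of any particular search engine: it assumes a naive sentence splitter (periods, question marks, exclamation marks, semicolons, and line breaks), a word-count distance of 5, and two invented resume snippets.

```python
import re

# OR groups from the text: (configure OR configured OR configuration), (router OR routers)
CONFIGURE_FORMS = {"configure", "configured", "configuration"}
ROUTER_FORMS = {"router", "routers"}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def sentences(text: str) -> list[str]:
    # Naive splitter: treat . ! ? ; and line breaks (e.g. bullet points) as boundaries.
    return [s for s in re.split(r"[.!?;\n]+", text) if s.strip()]


def boolean_and(text: str, *groups: set[str]) -> bool:
    # Lexical match: at least one word from each OR group appears anywhere.
    tokens = set(tokenize(text))
    return all(tokens & group for group in groups)


def same_sentence_near(text: str, group_a: set[str], group_b: set[str],
                       max_distance: int) -> bool:
    # Semantic-leaning match: a word from each group in the SAME sentence,
    # at most max_distance words apart, in either order.
    for sentence in sentences(text):
        tokens = tokenize(sentence)
        pos_a = [i for i, t in enumerate(tokens) if t in group_a]
        pos_b = [i for i, t in enumerate(tokens) if t in group_b]
        if any(abs(i - j) <= max_distance for i in pos_a for j in pos_b):
            return True
    return False


# Hypothetical resume snippets.
hands_on = "Configured Cisco routers and switches for 12 branch offices."
mentions_only = ("Supported sales of routers to enterprise customers.\n"
                 "Helped configure the new CRM for the sales team.")

for name, text in [("hands_on", hands_on), ("mentions_only", mentions_only)]:
    print(name, boolean_and(text, CONFIGURE_FORMS, ROUTER_FORMS),
          same_sentence_near(text, CONFIGURE_FORMS, ROUTER_FORMS, 5))
# hands_on True True
# mentions_only True False
```

A plain Boolean AND returns both resumes; requiring the words to share a sentence and sit close together keeps only the person who actually configured routers.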
A sourcer or recruiter should not be satisfied to merely scan and read the resumes of people who simply mention the words "configure" and "routers" somewhere in the resume; many people who mention those words somewhere in their resume have never been specifically responsible for configuring routers. The issue is that the mere presence of these words in a resume does not MEAN anything with regard to what the candidate has specifically been responsible for.
With the appropriate search interface/engine, sourcers and recruiters can craft semantic searches to find people who not only mention specific words such as "configure" and "routers", but who have actually had experience configuring routers. Being able to control the proximity of words can enable recruiters to quickly get more results that are semantically relevant to what the recruiter is actually trying to find.
There are 3 main types of proximity searching. I will focus on the two that I think are the most powerful: fixed proximity search and configurable proximity search.
Fixed proximity search functionality such as the "extended Boolean" NEAR operator enables users to search for words or phrases that are mentioned close to other specific words or phrases. The range of the NEAR operator is fixed, typically at 1-10 words.
Did you know that Monster supports the NEAR operator? Many people aren't aware of this - but it's the only major job board resume database that I am aware of to do so. Kudos to Monster! It is unfortunate that there are very few people who even know about the NEAR operator, and even fewer still who know how to utilize it to achieve semantic search.
Among Internet search engines, Google, Yahoo, Live, and Ask do not support proximity searching of any kind; to my knowledge, only Exalead does. As for applicant tracking systems, I am aware that Bullhorn has integrated Lucene, a free and open source text search engine that supports configurable proximity, into its search interface.
Configurable proximity search goes one step further than fixed proximity, allowing a sourcer or recruiter to precisely control the maximum distance between specific search terms and thereby get even more relevant results than the NEAR operator provides. This is because the NEAR operator's maximum range of 10 words can allow some non-relevant results to be returned. The farther apart two words are mentioned, the less likely it is that they are semantically related. In fact, when two search terms are separated by 10 words, they could be mentioned in separate bullet points or sentences on a resume and be completely unrelated.
However, with configurable proximity, a sourcer or recruiter can choose the maximum distance between search terms. Although search engines that support configurable proximity vary in their exact syntax, here is an example of a search looking for someone who has been responsible for administering Exchange servers: Windows AND Exchange w/5 admin* AND server*. That search can ONLY return resumes or profiles that mention Exchange within 5 words of any word starting with the root admin (administrator, administration, administer, administered, etc.), regardless of order. A maximum distance of 5 words dramatically increases the semantic similarity between the search's intent and the search results, because two search terms mentioned at such close range are more likely to appear in the same bullet point or sentence, and thus more likely to be semantically related. Essentially, this search will only return people who specifically mention something about being responsible for administering Exchange in their resume.
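Here is a rough Python sketch of how a search like Windows AND Exchange w/5 admin* AND server* could be evaluated, and why a 10-word window (the NEAR operator's maximum range) can let in a false positive that a 5-word window rejects. It is only an illustration: it assumes w/N binds more tightly than AND, treats a trailing * as "any word starting with this root," counts distance in words regardless of order, and uses two invented resume snippets. Lucene, Exalead, and Monster each have their own syntax and matching rules.

```python
import re


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def term_matches(pattern: str, token: str) -> bool:
    # "admin*" matches any word starting with "admin"; other patterns must match exactly.
    pattern = pattern.lower()
    if pattern.endswith("*"):
        return token.startswith(pattern[:-1])
    return token == pattern


def contains(text: str, pattern: str) -> bool:
    return any(term_matches(pattern, t) for t in tokenize(text))


def within_n(text: str, a: str, b: str, n: int) -> bool:
    # True if a word matching `a` is at most n words from a word matching `b`, either order.
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if term_matches(a, t)]
    pos_b = [i for i, t in enumerate(tokens) if term_matches(b, t)]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


def exchange_admin_query(text: str, n: int = 5) -> bool:
    # Windows AND (Exchange w/n admin*) AND server*
    return (contains(text, "windows")
            and within_n(text, "exchange", "admin*", n)
            and contains(text, "server*"))


# Hypothetical resume snippets.
admin_resume = "Administered Exchange 2007 servers on Windows Server 2003 for 800 users."
user_resume = ("Migrated mailboxes to Exchange for the sales office.\n"
               "Assisted the system administrator with Windows server backups.")

print(exchange_admin_query(admin_resume, 5))   # True
print(exchange_admin_query(user_resume, 10))   # True: "Exchange" and "administrator" are 8 words apart
print(exchange_admin_query(user_resume, 5))    # False
```

In the second resume, "Exchange" and "administrator" sit in different bullet points, 8 words apart. A 10-word window still matches it, while tightening the window to 5 words filters it out.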
Many sourcers and recruiters who employ basic search tactics and strategies may unfortunately be simply throwing a bunch of keywords into a search. As a result, they end up reviewing large volumes of irrelevant results that simply match the search terms they entered (a lexical match), in the hope of "getting lucky" and finding the few relevant results buried among them. This is a huge time drain: inefficient and low yield.
Experts at talent mining craft Boolean search strings designed to reduce irrelevant "false positive" results (people who simply mention the searched-for words somewhere in their resumes or profiles) and to go beyond the simple lexical match to achieve semantic search: finding people whose experience and skills match the essence of the search.
If you don't already take advantage of the power of semantic search to quickly find more relevant results when creating your Boolean search strings, now is the perfect time to make it a resolution for 2009. Make it a goal to move beyond simple buzzword matching and create Boolean searches that target people based more on what they DO than on the words they use in their resume or profile.