8 tools to find semantic similarity between words
Semantic net
The big news this week has been Google announcing its use of semantics to enhance the performance of its search engine. This will not come as a surprise to computer scientists working in the language field (IR, NLP, etc.). There are also already quite a few semantic search engines around, Cognition for example. I think we had been waiting a while for Google to take this step, and now that it has, it's really interesting. Obviously this does not replace the keyword approach; it is an enhancement.
What the announcement means:
This announcement has led to questions about what the difference is between the semantic web and semantic search. The announcement does not relate to the semantic web in any shape or form. Google is not announcing that it is adding support for RDFa, OWL, microformats or anything else to allow for structured browsing. Their improvement means that by looking at relationships between the words in queries (and in documents, I imagine) they can find a better spread of relevant results.
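To give a feel for what "looking at relationships between the words in queries" can mean in practice, here is a minimal sketch of query expansion. It is not Google's method, which the announcement does not describe. The `RELATED_TERMS` table, the function names and the scoring are all hypothetical and written by hand for illustration; a real system would take its relationships from a lexical resource or from usage data.

```python
# Hypothetical hand-written table of related terms (an assumption for this
# sketch, not data from any real search engine or lexical resource).
RELATED_TERMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query):
    """Return the query terms plus any related terms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + RELATED_TERMS.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def score(document, terms):
    """Crude relevance score: how many of the terms occur in the document."""
    words = set(document.lower().split())
    return sum(1 for term in terms if term in words)


if __name__ == "__main__":
    doc = "An affordable automobile for sale"
    print(score(doc, "cheap car".split()))      # plain keyword match
    print(score(doc, expand_query("cheap car")))  # match after expansion
```

A plain keyword match misses this document entirely because it shares no words with the query, while the expanded query matches on two related terms. That is the "better spread of relevant results" idea in miniature.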
4 useful definitions:
- Concept: an abstract or general idea inferred or derived from specific instances
- Data: a collection of facts from which conclusions may be drawn
- Information: knowledge acquired through study or experience or instruction
- Semantics: the study of language meaning
Putting it all together:
Semantics identifies concepts, which allow information to be extracted from data. If you are looking for the meaning of documents or queries, concepts need to be captured.
The semantic web/search:
I have covered this at length on this blog; take a look at the semantic web section, particularly “What is semantic search”. In that post you will find the difference between semantic search and the semantic web explained in an easy-to-digest way.
Instead of repeating myself I will list a number of tools that I have been using for quite a while to find semantically related concepts using keywords as a starting point. I think these might give a bit more insight into what is involved and what kind of thing is output as a result.
The tools:
- WordNet::Similarity (Perl module that implements a variety of semantic similarity and relatedness measures based on information found in WordNet)
- MSR (You can find how semantically related words are using Google, Wikipedia and many others - or all of them at once)
- SenseRelate (uses measures of semantic similarity and relatedness to perform word sense disambiguation)
- UMLS::Similarity::path (Perl module for computing semantic similarity of concepts in the UMLS by simple edge counting)
- SenseBot (Search engine but it will display a number of semantically related terms to your query)
- SenseLearner (A Tool for All-Words Word Sense Disambiguation)
- GWSD (Unsupervised Graph-based Word Sense Disambiguation)
- FrameNet (Visualise relationships)
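Several of the tools above score similarity by counting edges between concepts in a hierarchy. Here is a minimal sketch of that idea, assuming a tiny hand-made is-a hierarchy rather than WordNet or the UMLS. The `IS_A` table and the function names are hypothetical, and the score used, 1 / (1 + number of edges on the shortest path), is one common form of a path measure, not necessarily the exact formula any of the listed modules implement.

```python
from collections import deque

# Hypothetical toy is-a hierarchy (child -> parent), written by hand for
# illustration. Real tools read these links from WordNet or the UMLS.
IS_A = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "bird": "animal",
}


def build_graph(is_a):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in is_a.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_edges(graph, start, goal):
    """Breadth-first search; returns the edge count, or None if no path."""
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


def path_similarity(graph, a, b):
    """Score in (0, 1]: fewer edges between concepts means a higher score."""
    edges = shortest_path_edges(graph, a, b)
    return 0.0 if edges is None else 1.0 / (1 + edges)


if __name__ == "__main__":
    graph = build_graph(IS_A)
    for pair in [("dog", "wolf"), ("dog", "cat"), ("dog", "dog")]:
        print(pair, path_similarity(graph, *pair))
```

In this toy hierarchy "dog" and "wolf" are two edges apart and "dog" and "cat" are four edges apart, so the first pair scores higher. The output of the real tools looks much the same: a pair of concepts and a number.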
Reading:
Semantic Networks: Visualizations of Knowledge (Roger Hartley and John Barnden)
Keyphrase Extraction using Semantic Networks Structure Analysis (Chong Huang, Yonghong Tian, Zhi Zhou, Charles X. Ling, Tiejun Huang)
Semantic Search (Guha, McCool, Miller)
How can semantic technology facilitate candidate research?
Over the past 5 years the recruiting world has adopted social networks and online communities to attract potential candidates. A problem began to surface as more and more online communities and services began to exploit such markets while recruiters scrambled to join as many as they could. The problem was no longer where to find potential candidates; it was “how do I keep track of all my networks, hundreds of emails, invites, networking requests, and would-you-like-to-join-my-group messages?” Total saturation and counterproductive effort became familiar topics around water coolers and in recruiting circles.
In 2007 we began the design of an open source platform that generated results from all major social networks and could crawl the deepest areas of the web, with capabilities for cross-referencing targets, email generation, SMS texting and more. With boolean black-belt style filtering and automated updating processes, our recruiters began harnessing the most up-to-date results. The time saved meant our recruiters and sourcers were spending more time speaking with candidates.
It's time to get semantic. If you want to harness this incredible time-saving technology give me a call at 203-648-9638 ext 103 to guide you in the right direction.