Facebook

Semantic search is a feature of SQL Server that allows you to perform advanced searches on unstructured text data. With semantic search, you can use natural language search queries to find relevant information in documents and other text-based data stored in SQL Server.

In this article I have explained, how you can implement semantic search using wordnet in SQL Server.

  1. Download the files
  2. Restore wordnet and TileDB on SQL Server 2019 or above
  3. Run below queries to test
SELECT *	FROM   titles
exec [dbo].[get_noun_verb] 'tension',null
exec [dbo].[sp_search_items]  'someone is in tension',null
select wordnet.[dbo].[fn_get_noun_verb]('foot',null)
exec [dbo].[sp_search_items]  'psycho',null
select wordnet.[dbo].[fn_get_noun_verb]('psycho',null)
select * FROM sys.dm_fts_parser  ('"you are in tension"', 1033, 0, 0) 
SELECT [wordid]
      ,[lemma]
  FROM [wordnet].[dbo].[words] where lemma like 'psycho%'

Script for function and Stored procedure:

--Function in wordnet database to get related words
CREATE   FUNCTION [dbo].[fn_get_noun_verb] (
	   @word NVARCHAR(200)
	,@type CHAR(1) = NULL
	)
RETURNS VARCHAR(max) AS

BEGIN
	DECLARE @val VARCHAR(MAX);

	SELECT  @val = STRING_AGG(ISNULL(lemma, ' '), ',')
	FROM (
		SELECT DISTINCT  subq.lemma
		FROM   words a        
		INNER JOIN     senses b ON a.wordid = b.wordid        
		LEFT JOIN     synsets c ON b.synsetid = c.synsetid        
		LEFT JOIN     samples d ON b.synsetid = d.synsetid        
		INNER JOIN    (SELECT         a1.lemma, b1.synsetid     FROM         senses b1     INNER JOIN words a1 ON a1.wordid = b1.wordid) subq ON subq.synsetid = c.synsetid        
			AND subq.lemma <> a.lemma
		WHERE     a.lemma = @word
			AND  (@type IS NULL
				OR c.pos = cast(@type AS VARCHAR))
		) t

	RETURN @val
END
GO


 

ALTER PROC [dbo].[sp_search_items]   (
	  @phrase NVARCHAR(200)
	,  @type VARCHAR(1)  
	)  
AS
 

BEGIN
	 SELECT ROW_NUMBER() OVER (
			ORDER BY (
					SELECT 1
					)
			) rownum
		,occurrence
		,display_term
		,CONCAT (
			display_term
			,','
			,isnull(wordnet.dbo.fn_get_noun_verb(display_term, @type),  display_term)
			) words
	INTO #serchwords
	FROM sys.dm_fts_parser  ('"' + @phrase +   '"', 1033, 0, 0)  where special_term = 'Exact Match'  declare @row INT = 1  declare @max INT

	CREATE TABLE #items (
		[title_id] [varchar](100) NOT NULL
		,[title] [varchar](80) NOT NULL
		,[type] [char](12) NOT NULL
		,[pub_id] [char](4) NULL
		,[price] [money] NULL
		,[advance] [money] NULL
		,[royalty] [int] NULL
		,[ytd_sales] [int] NULL
		,[notes] [varchar](200) NULL
		,[pubdate] [datetime] NOT NULL
		,
		)  select @max = max (rownum)
	FROM #serchwords  while @row <= @max  begin

	DECLARE @searchphrase NVARCHAR(2000)

	SELECT @searchphrase = CONCAT (
			'"'
			,words
			,'"'
			)
	FROM #serchwords
	WHERE rownum = @row

	INSERT INTO #items
	SELECT *
	FROM   titles
	WHERE FREETEXT (
			notes
			,@searchphrase
			)
		OR FREETEXT (
			title
			,@searchphrase
			);

	 

	SET @row = @row + 1  end  select *
	FROM #items  drop TABLE #items  drop TABLE #serchwords  
END
GO

Wordnet table information

adjpositiontypes – defines three positions that adjectives can take in English language, predicate, attributive and immediatelly postnominal.

adjpositions – links concrete words (adjectives) with their allowed position types in adjpositiontypes table.

linktypes – defines all relation (link) types used in wordnet, about two dozen of them. Both lexlinks and semlinks tables use this table to define the type of each link. Some link types are marked as recursive, meaning that if “furniture” is, for example, a hypernim to a “chair”, then a “chair” is a hyponym to “furniture”.

lexlinks – lexical links, i.e., relations between words. Example:
sad – saddness (derivation)

semlinks – semantic links, i.e. relations between synsets. Example:
chair – furniture (hypernym)

morphs – connected to “words” table, contains irregular word forms. One word can have multiple morphs, and one morph can be an irregular form for multiple words, so you additionally have the morphmaps table. Examples:
abacus (word) – abaci (morph)
abhor (word) – abhorred, abhorring (morphs)

postypes – defines “parts of speech”. Contains only following values:
n – noun, v –verb, a – adjective, r – adverb, s – adjective satellite.

samples – sample sentences for synsets. One synset can have multiple samples.

vframemaps & vframes – vframes define a kind of standard “verb templates”. Vframemaps links words (verbs) with corresponding vframes in which they can appear.

vframesentencemaps & vframesentences – similar to previous two tables, just here you have entire sentences as verb templates.

Leave a Reply

Your email address will not be published. Required fields are marked *