Daniel Flower Dot Com Banner


The heart of the case based reasoner was programmed using SQL Server functions. These functions are detailed below:

Get Similar Cases Function

-- This function takes as input the options as entered by the user. All input values may be null, and are either numeric values, or the IDs of symbols.
-- This returns a table with each row being the ID of a similar case, and the similarity level of that case.
( @holidayType int, @price int, @numberOfPersons int, @region int, @transportationType int,
@duration int, @season int, @hotelType int )
RETURNS @idTable table (caseId int PRIMARY KEY, similarity real)
-- When we calculate the similarity of numeric values, we need to know the maximum allowed value. We take this to be the maximum values in the database. These are calculated here, and passed to the function that calculates similarity.
DECLARE @maxPrice int
DECLARE @maxNumberOfPersons int
DECLARE @maxDuration int
SET @maxPrice = (SELECT MAX(price) FROM [Case])
SET @maxNumberOfPersons = (SELECT MAX(numberOfPersons) FROM [Case])
SET @maxDuration = (SELECT MAX(duration) FROM [Case])
-- This next statement inserts the results into the table. The select statement loops through each case in the database, passing the user's input values along with that cases' information to the getCaseSimilarity function, which returns a value between zero and one.
SELECT [id], dbo.getCaseSimilarity(
@maxPrice, @maxNumberOfPersons, @maxDuration,
@holidayType, @price, @numberOfPersons, @region, @transportationType,
@duration, @season, @hotelType,
holidayTypeId, price, numberOfPersons, regionId, transportationTypeId,
duration, seasonId, accommodationTypeId
) AS similarity FROM [Case] ORDER BY similarity DESC
-- Return the table

Get Case Similarity Function

-- This function calculates the similarity between two cases returning a value between zero and one.
CREATE FUNCTION getCaseSimilarity
@maxPrice int, @maxNumberOfPersons int, @maxDuration int,
@holidayType int, @price int, @numberOfPersons int, @region int, @transportationType int,
@duration int, @season int, @hotelType int,
@otherholidayType int, @otherprice int, @othernumberOfPersons int, @otherregion int, @othertransportationType int,
@otherduration int, @otherseason int, @otherhotelType int
-- Declare the similarity variable which we will calculate
DECLARE @similarity REAL
-- We build up the similarity value by adding the individual similarities of each dimension. There are two functions that calculate the similarity depending on whether it is a numeric or symbolic value. We also add a weighting to each dimension to signify the different importances of each dimension. Each weighting is between zero and one and the weightings were chosen and tweaked by estimating the relative importances of each dimension, and testing to make sure it gave reasonable results. A better approach would be to pass these values into the function and also allow the user to specify their personal importances for each dimension.
SET @similarity =
1.0 * dbo.getNumericSimilarity( @maxPrice, @price, @otherPrice, 1 ) +
0.4 * dbo.getNumericSimilarity( @maxNumberOfPersons, @numberOfPersons, @otherNumberOfPersons, 0 ) +
0.7 * dbo.getNumericSimilarity( @maxDuration, @duration, @otherDuration, 0 ) +
0.7 * dbo.getSimilarity( @holidayType, @otherholidayType )
-- At this point we check to see whether we can discard the current case. While we potentially throw away useful cases here, we get significant performance gains. This is because calculating numeric similarity is a lot quicker than calculating symbolic similarity (this is explained later) . Because the holiday type is the most important symbol, we include it in the first calculation to increase the chance that we keep good cases.
-- To choose the cut-off point, I timed a test case and compared the case results. In my test case, without this step it took 16 seconds to find similar cases. Setting the cut-off to 2.0 only cut 3 seconds off the time. Using 3.0 as the cutt-off, the query took only 4 seconds, however the results were significantly different indicating that some good cases had been thrown away. I then tried 2.5 which took 9 seconds to run and the results were exactly the same as having no cut-off. So this line may almost double the speed of queries, however undoubtedly some good cases will be rejected by it.
IF @similarity < 2.5 AND @holidayType IS NOT NULL AND @price IS NOT NULL RETURN @similarity / 4.4
-- If the current case is promising, we add the remaining 4 Symbol dimensions.
SET @similarity = @similarity +
0.4 * dbo.getSimilarity( @region, @otherregion ) +
0.4 * dbo.getSimilarity( @transportationType, @othertransportationType ) +
0.4 * dbo.getSimilarity( @season, @otherSeason ) +
0.4 * dbo.getSimilarity( @hotelType, @otherhotelType )
-- We then return the overall similarity. Rather than dividing the similarity by the number of dimensions (i.e. 8), we must divide by the sum of the weightings, which happens to be 4.4.
RETURN @similarity / 4.4

Get Numeric Similarity Function

-- This function calculates the similarity of numerically based values. It takes the maximum possible value, and uses that to calculate the similarity of the 2 input values returning a number between zero and one. If less is perfect, such as cost, then the similarity is 1.0 if the user's input is greater than the case being compared to.
CREATE FUNCTION getNumericSimilarity
( @maximumValue real, @inputValue real, @otherValue real, @lessIsPerfect bit)
-- If the input has not been specified then we just return a small amount
IF @inputValue IS NULL RETURN 0.25
DECLARE @similarity REAL
-- If less is perfect then see if we should return one
IF @lessIsPerfect = 1 AND @otherValue < @inputValue
SET @similarity = 1.0
-- Calculate the difference between the two input values as a percentage of the maximum possible value
SET @similarity = (@maximumValue - ABS( @inputValue - @otherValue)) / @maximumValue
-- Return the result
RETURN @similarity

Get Symbol Similarity Function

-- This function calculates the similarity between two symbols. It takes the IDs of two symbols as input and returns a value between 0.0 and 1.0. Although this is the most simple function, it is relatively computationally expensive as it finds the similarity by looking up the value in the SymbolSimilarity table. This can slow the queries down as it must be called five times per case, which explains the importance of rejecting bad looking cases early.
CREATE FUNCTION getSimilarity (@symbol1 int, @symbol2 int)
DECLARE @similarity REAL
-- If the input has not been specified then we just return a small amount
IF @symbol1 IS NULL
SET @similarity = 0.25
-- Get the result from the SymbolSimilarity table.
SET @similarity = (SELECT similarity FROM symbolsimilarity WHERE symbol1ID = @symbol1 AND symbol2ID = @symbol2)
-- Return the result
RETURN @similarity

With those functions in place, the rest of the system was basic HTML and ASP.NET. I simply populated some drop-down boxes with the data from the Symbol table, had a couple more text boxes for the numerical inputs, and used the user's input to call the following SQL query:

String sql = "SELECT TOP " + Constants.MaximumCasesToRetrieve + " caseId, similarity FROM dbo.getSimilarCases( " + holidayType + ", " + price + ", " + numberOfPersons + ", " + region + ", " + transportationType + ", " + duration + ", " + season + ", " + accommodationType + ") WHERE similarity > 0.25 ORDER BY similarity DESC";

You will see that I only receive the top Constants.MaximumCasesToRetrieve (currently set to 10) and I only retrieve those cases with a similarity greater than 25%. Each of the strings (e.g. holidayType) hold the user's choice: a number or symbol ID (both integers) or null if the user did not enter anything for that dimension. I then fed this SQL string to the database and I could then retrieve the cases and display them on the screen.

Go to the next page to see the system in action.
Comments for this page
posted by Ashley on 1/10/2012 2:17:22 a.m. (NZ time)
Stands back from the kybaeord in amazement! Thanks!
Add your comment below
Your Name:
Comment Title: