Confidence Level of Collaborative Filtering

 
 
egercek
 
Tue 28 Feb, 2017 05:36 am
I implemented item-based collaborative filtering in R and have some questions about it.

1- How can I know the confidence level of the results? For example, the results may show that item x is similar to item y with 50% probability. How far can I rely on this result?

2- I see many duplicated similarity values in the similarity matrix (see some example rows below).

1 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5

1 0.707106781186547 0.707106781186547 0.707106781186547 0.707106781186547 0.707106781186547 0.707106781186547 0.5 0.5 0.5 0.5

1 1 1 1 1 0 0 0 0 0 0

And so on.

Since almost all of my data looks like this, I find it hard to rely on the results. I believe my data set is large and varied enough. What could be the reason for this kind of result? Could you please help me understand?
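One thing that may be worth checking, as a minimal sketch rather than a definitive diagnosis: the code below sets every Orders value to 1, so each restaurant column is a 0/1 vector. For two such vectors the cosine reduces to (number of users who ordered from both) / sqrt(n_x * n_y), where n_x and n_y are the order counts of the two restaurants. With small counts only a handful of distinct values are possible. The vectors and names here are made up for illustration and are not taken from the real data.

```r
# Hypothetical 0/1 order vectors (one element per user), for illustration only.
a  <- c(1, 0, 0, 0)   # ordered by one user
b  <- c(1, 1, 0, 0)   # ordered by two users, one shared with a
c4 <- c(1, 1, 1, 1)   # ordered by four users, one shared with a
d  <- c(1, 0, 0, 0)   # ordered by the same single user as a

# Same formula as getCosine in the post.
cosine <- function(x, y) sum(x * y) / (sqrt(sum(x * x)) * sqrt(sum(y * y)))

cos_ab <- cosine(a, b)    # 1 / sqrt(1 * 2)
cos_ac <- cosine(a, c4)   # 1 / sqrt(1 * 4)
cos_ad <- cosine(a, d)    # 1 / sqrt(1 * 1)
```

Under these assumptions the three results are 1/sqrt(2), 0.5 and 1, which are the same kinds of values that repeat in the rows above.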

Here is my code:

RestaurantData1 <- read.csv(paste(getwd(),"/Restaurant/Restaurant.csv",sep = ""), stringsAsFactors = FALSE)

names(RestaurantData1)[1:2] <- c("UserName","ResName")
RestaurantData1 <- RestaurantData1[,!names(RestaurantData1) %in% "Visits"]
RestaurantData1 <- subset(RestaurantData1, RestaurantData1$Orders > 0 & RestaurantData1$UserName != "")
RestaurantData1$Orders <- 1
gc()

# Cosine similarity between two numeric vectors (or one-column matrices).
getCosine <- function(x, y) {
  this.cosine <- sum(x * y) / (sqrt(sum(x * x)) * sqrt(sum(y * y)))
  return(this.cosine)
}

# Pivot to one row per user and one column per restaurant.
ColumnBasedData <- reshape(
  RestaurantData1,
  idvar = "UserName", timevar = "ResName", direction = "wide"
)
rm(RestaurantData1)
gc()
ColumnBasedData[is.na(ColumnBasedData)] <- 0

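# Drop the user column so that only the restaurant columns remain.
ResData <- ColumnBasedData[, !(names(ColumnBasedData) %in% c("UserName"))]
rm(ColumnBasedData)
gc()
# Empty restaurant-by-restaurant matrix to hold the similarities.
holder <- matrix(
  NA, nrow = ncol(ResData), ncol = ncol(ResData),
  dimnames = list(colnames(ResData), colnames(ResData))
)
ResData.similarity <- as.data.frame(holder)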

# Compare restaurant column i with restaurant column j.
for (i in 1:ncol(ResData)) {
  for (j in 1:ncol(ResData)) {
    ResData.similarity[i, j] <- getCosine(as.matrix(ResData[i]), as.matrix(ResData[j]))
  }
}
# For each restaurant, keep the names of the 11 most similar restaurants (itself included).
ResDataNeighbour <- matrix(
  NA, nrow = ncol(ResData.similarity), ncol = 11,
  dimnames = list(colnames(ResData.similarity))
)

for (i in 1:ncol(ResData)) {
  ResDataNeighbour[i, ] <- t(head(n = 11, rownames(ResData.similarity[order(ResData.similarity[, i], decreasing = TRUE), ])))
}
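
As a rough sketch of an alternative to the double loop, the whole cosine matrix can be computed at once with crossprod, which also yields the number of users shared by each pair of restaurants. The small matrix m is a hypothetical stand-in for ResData (rows are users, columns are restaurants, values are 0/1), and the names co_counts, norms and similarity are illustrative only.

```r
# Hypothetical stand-in for ResData: 4 users x 3 restaurants, values 0/1.
m <- matrix(c(1, 1, 0,
              1, 0, 0,
              0, 1, 1,
              0, 1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("R1", "R2", "R3")))

# t(m) %*% m: for 0/1 data, entry [i, j] is the number of users who ordered
# from both restaurant i and restaurant j; the diagonal is each restaurant's
# own order count.
co_counts <- crossprod(m)

# Euclidean norm of each column (assumes no all-zero column, which would give NaN).
norms <- sqrt(diag(co_counts))

# Cosine similarity for every pair of columns.
similarity <- co_counts / outer(norms, norms)
```

The co_counts matrix gives a rough sense of how much data stands behind each similarity: a value of 1 that rests on a single shared user is much weaker evidence than the same value resting on hundreds of users. This is a heuristic for judging reliability, not a formal confidence level.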