Confidence Level of Collaborative Filtering

I implemented item-based collaborative filtering in R and have some questions about it.

1- How can I know the confidence level of the results? For example, the results may show that item x is similar to item y with a score of 0.5 (50%). How far can I rely on such a result?

2- I see many repeated similarity values in the similarity matrix (see some example rows below).

1 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5

1 0.707106781186547 0.707106781186547 0.707106781186547 0.707106781186547 0.707106781186547 0.707106781186547 0.5 0.5 0.5 0.5

1 1 1 1 1 0 0 0 0 0 0

And so on.

Since almost all of my results look like this, I find them hard to rely on. I believe my data set is large and varied enough. What could be the reason for this kind of result? Could you please help me understand it?
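One possible contributor, offered as a sketch rather than a diagnosis: the code below sets every Orders value to 1, so each restaurant column is a 0/1 vector. For such vectors the cosine reduces to (shared users) / sqrt(users of x * users of y), which can only take a limited set of values when the counts are small. The vectors a, b, c4 and d below are hypothetical toy data, not taken from the real data set.

```r
# Same cosine formula as in the posted code.
getCosine <- function(x, y) {
  sum(x * y) / (sqrt(sum(x * x)) * sqrt(sum(y * y)))
}

# Hypothetical 0/1 order flags for four users (toy data).
a  <- c(1, 0, 0, 0)  # ordered by 1 user
b  <- c(1, 1, 0, 0)  # ordered by 2 users, 1 of them shared with a
c4 <- c(1, 1, 1, 1)  # ordered by all 4 users
d  <- c(0, 1, 1, 0)  # ordered by 2 users, 1 of them shared with b

simAB <- getCosine(a, b)   # 1 / sqrt(1 * 2)
simAC <- getCosine(a, c4)  # 1 / sqrt(1 * 4)
simBD <- getCosine(b, d)   # 1 / sqrt(2 * 2)

print(c(simAB, simAC, simBD))
```

If most restaurants have only a handful of ordering users, values such as 1/sqrt(2), 1/2, 1 and 0 would be expected to repeat often; checking colSums(ResData) would show whether that assumption holds for the real data.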

Here is my code:

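# Load the raw data (one row per user/restaurant pair).
RestaurantData1 <- read.csv(paste(getwd(),"/Restaurant/Restaurant.csv",sep = ""), stringsAsFactors = FALSE)

# Rename the key columns, drop "Visits", and keep only real orders by named users.
names(RestaurantData1)[1:2] <- c("UserName","ResName")
RestaurantData1 <- RestaurantData1[,!names(RestaurantData1) %in% "Visits"]
RestaurantData1 <- subset(RestaurantData1, RestaurantData1$Orders > 0 & RestaurantData1$UserName != "")
# Binarize: every remaining user/restaurant pair counts as 1, regardless of order count.
RestaurantData1$Orders <- 1
gc()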

# Cosine similarity between two numeric vectors.
getCosine <- function(x, y) {
  this.cosine <- sum(x * y) / (sqrt(sum(x * x)) * sqrt(sum(y * y)))
  return(this.cosine)
}

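# Reshape to wide format: one row per user, one column per restaurant.
ColumnBasedData <- reshape(
  RestaurantData1,
  idvar = "UserName", timevar = "ResName", direction = "wide"
)
rm(RestaurantData1)
gc()
# A user who never ordered from a restaurant gets 0.
ColumnBasedData[is.na(ColumnBasedData)] <- 0

# Drop the user column so that only restaurant columns remain.
ResData <<- ColumnBasedData[, !(names(ColumnBasedData) %in% c("UserName"))]
rm(ColumnBasedData)
gc()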
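# Empty restaurant-by-restaurant matrix to hold the similarities.
holder <- matrix(
  NA, nrow = ncol(ResData), ncol = ncol(ResData),
  dimnames = list(colnames(ResData), colnames(ResData))
)
ResData.similarity <<- as.data.frame(holder)

# Compare every restaurant column i with every restaurant column j.
for (i in seq_len(ncol(ResData))) {
  for (j in seq_len(ncol(ResData))) {
    ResData.similarity[i, j] <- getCosine(as.matrix(ResData[i]), as.matrix(ResData[j]))
  }
}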
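# For each restaurant, keep the names of the 11 most similar restaurants
# (this normally includes the restaurant itself, whose similarity is 1).
ResDataNeighbour <- matrix(
  NA, nrow = ncol(ResData.similarity), ncol = 11,
  dimnames = list(colnames(ResData.similarity))
)

for (i in seq_len(ncol(ResData))) {
  ResDataNeighbour[i, ] <- t(head(n = 11, rownames(ResData.similarity[order(ResData.similarity[, i], decreasing = TRUE), ])))
}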
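
As a cross-check on the nested loop above, the same pairwise cosines can be computed in one step with matrix algebra. This is a minimal sketch, assuming a numeric matrix with no all-zero columns; cosineMatrix and the toy matrix are hypothetical names introduced here for illustration and are not part of the original code.

```r
# Sketch: all pairwise column cosines at once.
# Assumes m is numeric and has no all-zero columns (those would give NaN).
cosineMatrix <- function(m) {
  m <- as.matrix(m)
  dots <- crossprod(m)        # t(m) %*% m: dot product of every column pair
  norms <- sqrt(diag(dots))   # Euclidean norm of each column
  dots / outer(norms, norms)  # divide each dot product by both norms
}

# Hypothetical toy data: 4 users (rows) by 3 restaurants (columns), 0/1 flags.
toy <- matrix(
  c(1, 0, 0, 0,   # column A
    1, 1, 0, 0,   # column B
    0, 1, 1, 0),  # column C
  nrow = 4,
  dimnames = list(NULL, c("A", "B", "C"))
)

sim <- cosineMatrix(toy)
print(round(sim, 3))
```

On the real data this would be called as cosineMatrix(ResData); if its output matches ResData.similarity, the loop is doing what is intended, and the vectorized form should also run much faster on a large number of restaurants.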
