I've had a bit of a search, but didn't find anything quite like what I'm trying to achieve.
Basically, I'm trying to find a similarity between two users' voting habits.
I have a table storing each individual vote made, which stores:
voteID
itemID (the item the vote is attached to)
userID (the user who voted)
direction (whether the user voted the post up, or down)
I'm aiming to calculate the similarity between, say, users A and B, by finding out two things:
- The number of votes they have in common. That is, the number of times they've both voted on the same post (the direction does not matter at this point).
- The number of times they've voted in the same direction, on common votes.
(Then simply to calculate #2 as a percentage of #1, to achieve a crude similarity rating).
My question is, how do I find the intersection between the two users' sets of votes? (i.e. how do I calculate point #1 adequately, without looping over every vote in a highly inefficient way.) If they were in different tables, an INNER JOIN would suffice, I'd imagine... but that obviously won't work on the same table (or will it?).
Any ideas would be greatly appreciated.
-
You most certainly can join a table to itself. In fact, that's what you're going to have to do. You must use aliasing when joining a table to itself. If your table doesn't have a PK or FK, you'll have to use Union instead. Union will remove duplicates and Union All will not.
James Burgess : But how do you go about joining it on the same column with the differing user IDs? Any chance you could give an example - as I'm obviously overlooking something, here...GregD : I was going to, but ninesided beat me too it.ninesided : my bad, your post appeared halfway through me typing mine!GregD : no problems. Your post was good! :) -
Something like this:
SELECT COUNT(*) FROM votes v1 INNER JOIN votes v2 ON (v1.item_id = v2.item_id) WHERE v1.userID = 'userA' AND v2.userUD = 'userB'James Burgess : Just what I needed, thanks. -
Here's an example that should get you closer:
SELECT COUNT(*) FROM ( SELECT u1.userID FROM vote u1, vote u2 WHERE u1.itemID = u2.itemID AND u1.userID = user1 AND u2.userID = user2)ninesided : old school join syntaxLance Roberts : That could be, you're welcome to point me out to some article on joins that explains why one way is better than the other. I don't have tons of SQL experience.ninesided : It's all about readability and ease of comprehension. Check out this answer: http://bit.ly/ansijoinLance Roberts : Thanks, that was a good link. I'll upvote your answer. -
Assuming userID 1 being compared to userID 2
For finding how many votes they have in common:
SELECT COUNT(*) FROM Votes AS v1 INNER JOIN Votes AS v2 ON (v2.userID = 2 AND v2.itemID = v1.itemID) WHERE v1.userID = 1;For finding when they also voted the same:
SELECT COUNT(*) FROM Votes AS v1 INNER JOIN Votes AS v2 ON (v2.userID = 2 AND v2.itemID = v1.itemID AND v2.direction = v1.direction) WHERE v1.userID = 1; -
A self join is in order. Here it is with all you asked:
SELECT v1.userID user1, v2.userID user2, count(*) n_votes_in_common, sum(case when v1.direction = v2.direction then 1 else 0 end) n_votes_same_direction, (n_votes_same_direction * 100.0 / n_votes_in_common) crude_similarity_percent FROM votes v1 INNER JOIN votes v2 ON v1.item_id = v2.item_id -
In case you want to do this for a single user (rather than knowing both users at the start) to find to whom they are the closest match:
SELECT v2.userID, COUNT(*) AS matching_items, SUM(CASE WHEN v2.direction = v1.direction THEN 1 ELSE 0 END) AS matching_votes FROM Votes v1 INNER JOIN Votes v2 ON v2.userID <> v1.userID AND v2.itemID = v1.itemID WHERE v1.userID = @userID GROUP BY v2.userIDYou can then limit that however you see fit (return the top 10, top 20, all, etc.)
I haven't tested this yet, so let me know if it doesn't act as expected.
James Burgess : That's actually really helpful, thanks :)
0 comments:
Post a Comment