Thursday, March 3, 2011

MySQL Joining Problems

The following query returns strange results for me:

SELECT
    `Statistics`.`StatisticID`,
    COUNT(`Votes`.`StatisticID`) AS `Score`,
    COUNT(`Views`.`StatisticID`) AS `Views`,
    COUNT(`Comments`.`StatisticID`) AS `Comments`
FROM `Statistics`
LEFT JOIN `Votes` ON `Votes`.`StatisticID` = `Statistics`.`StatisticID`
LEFT JOIN `Views` ON `Views`.`StatisticID` = `Statistics`.`StatisticID`
LEFT JOIN `Comments` ON `Comments`.`StatisticID` = `Statistics`.`StatisticID`
GROUP BY `Statistics`.`StatisticID`
LIMIT 0, 10

I'm querying this on a table structure like the following:

(only data relevant to Statistics.StatisticID = 8)

Votes

StatisticID
    8

Views

StatisticID
    8
    8

Comments

StatisticID
    8
    8
    8
    8
    8

Now if I run this query I get the following result set:

StatisticID    Score    Views   Comments
     8           5        5        5

I knwo where the 5 comes from - the number of Comments - and this works if I take the comments statement out. Can anyone debug this as this is out of my reach (I'm relatively new with SQL).

Thanks, Ross

From stackoverflow
  • When joining like this, you will duplicate the data as many times as you find mathing rows in the other tables. This is fine, if you only have 1 corresponding row in each table.

    Run this query without the grouping, and you will get an idea on why you get the same result on all counts. I would however have guessed that you'll get 10 as the count for every field (1*2*5) If you want to solve this, you need to call a subselect for every count.

    SELECT
        s.`StatisticID`,
        (SELECT COUNT(*) FROM Votes WHERE Votes.StatisticID = s.StatisticID) AS Score,
        (SELECT COUNT(*) FROM Views WHERE Views.StatisticID = s.StatisticID) AS Views,
        (SELECT COUNT(*) FROM Comments WHERE Comments.StatisticID = s.StatisticID) AS Comments,
    FROM `Statistics` s
    LIMIT 0, 10
    

    There are certain performance-issues with this if the outher result is big. You might optimize it a little by joining one of the tables however I'm uncertain if the queryparser would be smart enough to only run 1 single time for each grouped item. Hopefully it would. Otherwise you could split it into different queries.

    Kieveli : I wouldn't worry about performance issues - this sub-query solution will execute perfectly, and not perform any spurious actions!
    jishi : I was more thinking if the outer result would have like 1000 rows, then it would need to run 3000 sub-queries aswell. In that case it would probably be faster to join one table at the time, and call the query 3 times.
    benlumley : agreed on the subquery thing - if there are a lot of outer results. But we do have limit 0, 10 in it at the moment i suppose. Also, you've selected from votes twice, third suybquery should be from comments
    jishi : Hm, I did change that, somehow my edit was reverted, or it was a glitch in the update.
    Ross : Thanks, I wouldn't have been aware of the perf issues if you hadn't pointed that out to me :)
  • Assuming you have an id field or similar on the votes/views/comments:

    SELECT
        `Statistics`.`StatisticID`,
        COUNT(DISTINCT `Votes`.`VoteID`) AS `Score`,
        COUNT(DISTINCT `Views`.`ViewID`) AS `Views`,
        COUNT(DISTINCT `Comments`.`CommentID`) AS `Comments`
    FROM `Statistics`
    LEFT JOIN `Votes` ON `Votes`.`StatisticID` = `Statistics`.`StatisticID`
    LEFT JOIN `Views` ON `Views`.`StatisticID` = `Statistics`.`StatisticID`
    LEFT JOIN `Comments` ON `Comments`.`StatisticID` = `Statistics`.`StatisticID`
    GROUP BY `Statistics`.`StatisticID`
    LIMIT 0, 10
    

    Not tested it, but think it should work. (We have to use the different field because the statisticID will always be the same within a given group ...)

    jishi : Actually, this is a better solution. Didn't know that you could use DISTINCT within a count, just tested it and it works.

0 comments:

Post a Comment