Optimizing Subqueries in MySQL

Warning! Some information on this page is older than 5 years now. I keep it for reference, but it probably doesn't reflect my current knowledge and beliefs.

# Optimizing Subqueries in MySQL

Wed
08
Feb 2012

I'm not SQL guru, but I have to use databases from time to time, whether at home or at work, for desktop or (mostly) web applications. Some time ago I discovered a trick which greatly optimizes performance of complex queries that contain subqueries. I used it again recently when coding new comments administration panel for this website.

In my previous job I had a case when after several months of work my system gathered so much data that it couldn't generate report in any reasonable time. After applying this optimization reports from same data were generated instantly.

I know for these who deal with databases every day this "trick" or "optimization" is probably very basic, but if you - just like me - do only simple database tasks, it can be new to you. Here are the details:

Let's say we code a blog CMS and we have two tables:

CREATE TABLE posts (
    post_id int(11) NOT NULL AUTO_INCREMENT,
    title varchar(128) NOT NULL,
    content text NOT NULL,
    time datetime NOT NULL,
    PRIMARY KEY (post_id),
    KEY time_k (time)
);

CREATE TABLE comments (
    comment_id int(11) NOT NULL AUTO_INCREMENT,
    post_id int(11) NOT NULL,
    author varchar(32) NOT NULL,
    text text NOT NULL,
    time datetime NOT NULL,
    PRIMARY KEY (comment_id),
    KEY post_id_fk (post_id),
    CONSTRAINT post_id_fk FOREIGN KEY (post_id)
        REFERENCES posts (post_id) ON DELETE CASCADE ON UPDATE CASCADE
);

We want to show all comments (regardless of referred post), from newest to oldest.

select comment_id, post_id, author, text, time
from comments
order by time desc;

Now we want to show also some information about the post each comment refers to. We have to join table comments with table posts.

select
    c.comment_id comment_id, c.post_id post_id, c.author comment_author,
    c.text comment_text, c.time comment_time,
    p.title post_title, p.time post_time
from comments c
    join posts p on p.post_id = c.post_id
order by c.time desc;

But then things get complicated as we want some more information about each post: number of comments, time of its first and last comment. Such information for a single post can be fetched with following query:

select count(*) count, min(time) first_time, max(time) last_time
from comments
where post_id = 1;

Now, how to join this with the query above? First idea is to put subqueries in "select" section. That will fetch desired information for each record:

select
    c.comment_id comment_id, c.post_id post_id, c.author comment_author,
    c.text comment_text, c.time comment_time,
    p.title post_title, p.time post_time,
    ( select count(*)
        from comments c2
        where c2.post_id = p.post_id ) comment_count,
    ( select min(time) first_comment_time
        from comments c2
        where c2.post_id = p.post_id ) first_comment_time,
    ( select max(time) last_comment_time
        from comments c2
        where c2.post_id = p.post_id ) last_comment_time
from comments c
    join posts p on p.post_id = c.post_id
order by c.time desc;

This is the great killer of the performance. I've found that such subqueries in select section, which logically issue a new query for each returned row, execute very slowly in MySQL. It means the database probably doesn't optimize it as well as it could.

But we can optimize it manually to retrieve exactly same data much faster. The trick involves writing a subquery in "from" section instead of "select" section. This way the subquery will be logically executed only once and joined with the rest of the query like it was just another database table.

To implement this idea, we have to first code a query that will return same data like our subqueries above, but this time for all posts we are interested in, not for a single one:

select p.post_id post_id, count(*) comment_count,
    min(c.time) first_comment_time, max(c.time) last_comment_time
from comments c
    join posts p on p.post_id = c.post_id
group by c.post_id;

The final query is:

select
    c.comment_id comment_id, c.post_id post_id, c.author comment_author,
    c.text comment_text, c.time comment_time,
    p.title post_title, p.time post_time,
    sub.comment_count comment_count,
    sub.first_comment_time first_comment_time,
    sub.last_comment_time last_comment_time
from comments c
    join posts p on p.post_id = c.post_id
    join (
        select p.post_id post_id, count(*) comment_count,
            min(c.time) first_comment_time, max(c.time) last_comment_time
        from comments c
            join posts p on p.post_id = c.post_id
        group by c.post_id ) sub on sub.post_id = p.post_id
order by c.time desc;

Comments | #sql Share

Comments

STAT NO AD
[Stat] [STAT NO AD] [Download] [Dropbox] [pub] [Mirror] [Privacy policy]
Copyright © 2004-2019