Recently I’ve stumbled upon a few similar cases of inefficient queries that I want to share with you.
An example use case is as follows: we have an online store, and each order contains some products with their respective quantities. We want to show the last few orders in the user’s personal account, with a top-ranked item for each order. Let’s show the item with the largest quantity.
Your last orders:
1. Order #136 on June 07: Hankook All Season Tire x 48, … Click for details
2. Order #135 on June 02: Yokohama Winter Tire x 36, … Click for details
Set up the tables and indexes…
CREATE TABLE product (
    id serial NOT NULL,
    name TEXT,
    CONSTRAINT product_pkey PRIMARY KEY (id)
);

CREATE TABLE "order" (
    id serial NOT NULL,
    name TEXT,
    CONSTRAINT order_pkey PRIMARY KEY (id)
);

CREATE TABLE item (
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT NOT NULL,
    CONSTRAINT item_order_id_fk FOREIGN KEY (order_id) REFERENCES "order"(id),
    CONSTRAINT item_product_id_fk FOREIGN KEY (product_id) REFERENCES product(id)
);

CREATE INDEX item_order_id ON item USING btree (order_id);
CREATE INDEX item_product_id ON item USING btree (product_id);
And add some data…
INSERT INTO product (id, name)
SELECT s, 'product_' || s
FROM generate_series(1, 1000000) AS s;

INSERT INTO "order" (id, name)
SELECT s, 'order_' || s
FROM generate_series(1, 100000) AS s;

INSERT INTO item (order_id, product_id, quantity)
SELECT floor(random() * 100000) + 1,
       floor(random() * 1000000) + 1,
       floor(random() * 100) + 1
FROM generate_series(1, 1000000);
Now let’s fetch those top-ranked items for each order. We can use a CTE and a window function for that.
WITH top_order_item AS (
    SELECT * FROM (
        SELECT *,
               dense_rank() OVER (PARTITION BY i.order_id
                                  ORDER BY i.quantity DESC) r
        FROM item i
        JOIN product p ON p.id = i.product_id
    ) x
    WHERE r = 1
)
SELECT *
FROM "order" o
JOIN top_order_item i ON i.order_id = o.id
ORDER BY o.id DESC
LIMIT 10;
Here we rank the items within each order by descending quantity and then keep only the top-ranked item for each order. Finally, we fetch only the last 10 orders.
On my machine this query runs in 1800 ms, which is way too long. The problem is that Postgres does not know in advance that we need ranked items for only a few orders, so it ranks all one million items first.
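You can confirm this by looking at the query plan (a sketch; the exact plan output depends on your Postgres version and statistics, but the WindowAgg node should report processing on the order of a million rows):

```sql
-- Inspect the plan of the slow query: the WindowAgg step runs over
-- the entire item/product join before the LIMIT is ever applied.
EXPLAIN (ANALYZE, BUFFERS)
WITH top_order_item AS (
    SELECT * FROM (
        SELECT *,
               dense_rank() OVER (PARTITION BY i.order_id
                                  ORDER BY i.quantity DESC) r
        FROM item i
        JOIN product p ON p.id = i.product_id
    ) x
    WHERE r = 1
)
SELECT *
FROM "order" o
JOIN top_order_item i ON i.order_id = o.id
ORDER BY o.id DESC
LIMIT 10;
```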
The solution is to filter first and rank afterwards.
WITH top_order_item AS (
    SELECT * FROM (
        SELECT *,
               dense_rank() OVER (PARTITION BY i.order_id
                                  ORDER BY i.quantity DESC) r
        FROM item i
        JOIN product p ON p.id = i.product_id
        WHERE order_id IN (
            SELECT o.id
            FROM "order" o
            ORDER BY o.id DESC
            LIMIT 10
        )
    ) x
    WHERE r = 1
)
SELECT *
FROM "order" o
JOIN top_order_item i ON i.order_id = o.id
ORDER BY o.id DESC
LIMIT 10;
Here we filter the orders in advance, and the query takes only 0.5 ms to run, roughly 3000 times faster than the previous version.
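If this pattern is hot, a composite index can help further by letting Postgres read each order’s items already sorted by quantity (a sketch; this index is an assumption on my part and was not part of the benchmark above):

```sql
-- Hypothetical composite index: serves the per-order scan and the
-- ORDER BY quantity DESC in one index traversal, replacing the
-- plain item_order_id index for this access pattern.
CREATE INDEX item_order_id_quantity
    ON item USING btree (order_id, quantity DESC);
```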
Note that applying the order filter to the outer query (after "WHERE r = 1") won’t help, and neither will replacing the CTE with a subquery. We should apply the filtering as early as possible.
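Another way to express “filter first” is a LATERAL join, which fetches the top item only for the 10 orders we actually display (a sketch against the same schema; note that unlike dense_rank, LIMIT 1 returns a single item even when several items tie for the largest quantity):

```sql
-- For each of the last 10 orders, run the inner query once:
-- pick that order's single highest-quantity item.
SELECT o.*, t.*
FROM "order" o
CROSS JOIN LATERAL (
    SELECT i.quantity, p.name AS product_name
    FROM item i
    JOIN product p ON p.id = i.product_id
    WHERE i.order_id = o.id
    ORDER BY i.quantity DESC
    LIMIT 1
) t
ORDER BY o.id DESC
LIMIT 10;
```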
In real life, when queries are complex and involve many CTEs, tables, and conditions, it’s easy to miss optimizations like this.
EXPLAIN ANALYZE is not a silver bullet either: for inefficient queries Postgres may restructure the plan so heavily that it becomes hard to pinpoint the source of the problem.