A hint on PostgreSQL ranking query optimization

Recently I’ve stumbled upon a few similar cases of ineffective queries which I want to share with you.

An example use case is as follows: we have an online store, each order contains some products with respective quantity. We want to show a few last orders in user’s personal account with some top-ranked items for each order. Let’s show the item with maximal quantity.

Your last orders:
1. Order #136 on June 07: Hankook All Season Tire x 48, … Click for details
2. Order #135 on June 02: Yokohama Winter Tire x 36, … Click for details

Set up the tables and indexes…

CREATE TABLE product (
  id serial NOT NULL,
  name TEXT,
  CONSTRAINT product_pkey PRIMARY KEY (id)
);

CREATE TABLE "order" (
  id serial NOT NULL,
  name TEXT,
  CONSTRAINT order_pkey PRIMARY KEY (id)
);

CREATE TABLE item (
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT NOT NULL,
    CONSTRAINT item_order_id_fk FOREIGN KEY (order_id) REFERENCES "order"(id),
    CONSTRAINT item_product_id_fk FOREIGN KEY (product_id) REFERENCES product(id)
);

CREATE INDEX item_order_id ON item USING btree (order_id);
CREATE INDEX item_product_id ON item USING btree (product_id);

And add some data…

INSERT INTO product (id, name)
SELECT s, 'product_' || s FROM generate_series(1, 1000000) AS s;

INSERT INTO "order" (id, name)
SELECT s, 'order_' || s FROM generate_series(1, 100000) AS s;

INSERT INTO item (order_id, product_id, quantity)
SELECT
  floor(random() * 100000) + 1,
  floor(random() * 1000000) + 1,
  floor(random() * 100) + 1
FROM generate_series(1, 1000000);

Now let’s fetch those top-ranked items for each order. We can use a CTE and a window function for that.

WITH top_order_item AS (
  SELECT * FROM (
    SELECT
      *,
      dense_rank() OVER (PARTITION BY i.order_id ORDER BY i.quantity DESC) r
    FROM item i
      JOIN product p ON p.id = i.product_id
  ) x
  WHERE r = 1
)
SELECT *
FROM "order" o
JOIN top_order_item i ON i.order_id = o.id
ORDER BY o.id DESC
LIMIT 10

Here we introduce a rank on each order item sorting them by descending quantity and then leave only the first item for each order. We are fetching only last 10 orders.

On my machine this query runs 1800 ms which is way too long. The problem here is that Postgres does not know in advance that we need ranked items for only a few orders. Thus Postgres ranks all the million items first.

The solution is to first filter then sort.

WITH top_order_item AS (
  SELECT * FROM (
    SELECT
      *,
      dense_rank() OVER (PARTITION BY i.order_id ORDER BY i.quantity DESC) r
    FROM item i
      JOIN product p ON p.id = i.product_id
    WHERE 
      order_id IN (
        SELECT o.id
        FROM "order" o
        ORDER BY o.id DESC
        LIMIT 10
      )
  ) x
  WHERE r = 1
)
SELECT *
FROM "order" o
JOIN top_order_item i ON i.order_id = o.id
ORDER BY o.id DESC
LIMIT 10

Here we filter orders in advance and this query only takes 0.5 ms to run which is ~3000 times faster than the previous version.

Note that applying the order filter to the outer query (after "WHERE r = 1") won’t help as well as replacing CTE with a subquery. We should apply filtering as early as possible.

In real life when the queries are quite complex and involve a lot of CTEs, tables and conditions it’s easy to miss the point. EXPLAIN ANALYZE is not a silver bullet either as Postgres shuffles the query plan for ineffective queries making it hard to find the source of the problem.

Kotlin coroutines real life example

I love Kotlin and it’s concept of coroutines attracts me. But first I didn’t understand two simple things about using coroutines on backend.

Coroutines are useless without non-blocking API

People usually think: “OK, I put this Spring RestTemplate request within async coroutine and everything becomes non-blocking and efficient!”

Alas, coroutines documentation is a bit misleading. As soon as RestTemplate calls are blocking all your coroutines will also block. You won’t gain anything from calling blocking methods be it HTTP or database requests.

Let’s mentally trace the call – a thread is started, another coroutine is started inside the thread, coroutine makes HTTP call and blocks the thread until the response comes. There’s no magic here. If you have only one thread the whole application halts.

There’s a simple rule: non-blocking + blocking = blocking

To gain from coroutines you should only use non-blocking API: Spring Reactive Web Client or Ktor Http Client for HTTP requests, R2DBC for database queries.

Coroutines only reduce number of threads

— OK, I use Ktor HTTP Client with coroutines. Does everything run very fast now?
— No, it’s not faster than if you start a separate thread for every call. Just use multi-threading.

But if you are making a really large number of simultaneous requests the coroutines are here to serve. Remember that threads are expensive. They cost memory as well as setup and context switching time. Hundred of threads is already a large number.

If your application makes a lot of calls and most of the time waits for the response you should definitely stick to coroutines.

A real life example

Consider you are parsing data from 1000 different sites. You need to make a call to every site wait for the response and then do something with aggregated data.

With multi-threading you will start 200 threads in a pool each of them eating memory, quickly sending a request and then sleeping. As some threads are finished they return to thread pool, make another call and sleep again.

With coroutines you could make all 1000 simultaneous calls in one thread!

fun parseAll(urls: List<String>) = runBlocking {
    val client = HttpClient { }

    val results = urls
            .map { url ->
                async {
                    client.get<String>(url)
                }
            }
            .awaitAll()

    // Do something with results
}

All calls are started in a single thread one by one. After a request is sent thread switches to another coroutine. When all requests are sent thread sleeps until the results start coming. After all responses are received thread continues with results collected.

Is that all about coroutines?

Coroutines are more than just a tool to reduce number of threads. They suggest a rather clean and error-prone concept for dealing with asynchronous operations. But this is mostly useful on frontend where number of threads is limited to a few or even one. At server-side you could use coroutines when making many calls to external services which are waiting most of the time.

Additional benefit is that when you skip multi-threading you avoid all the concurrency problems.

Providing error response in RESTful API. Part 3


In the previous article we discussed error response for each specific situation. Now I’ll show you how this could be easily implemented in a Spring web application.

Convert Exception to HTTP response

We have a web layer where our controllers reside and a service layer where business logic runs. When our service runs into an exceptional situation it throws an exception which should be converted to a proper error response returned by web layer. At the same time we want to separate the concepts so that service layer does not operate with http response or status codes directly.

Spring offers a convenient solution for that — ExceptionHandler

Let’s take an example use case — a user submits a new contact form providing a companyId the contact should be tied to but our service cannot find that company in the DB. This time we should return a 400 Bad Request.

@Service
class ContactService(
  val companyRepository: CompanyRepository
) {
  fun createContact(name: String, companyId: Int) {
    val company = companyRepository.findById(companyId)
      .orElseThrow { IllegalArgumentException("Company not found: ID=$companyId") }
    ... // Save new contact 
  }
}

This service method is just throwing an exception, let’s handle it within the web layer. That could be performed with @ControllerAdvice and @ExceptionHandler annotations.

@ControllerAdvice
class RestAPIExceptionHandler {
  ...

  @ExceptionHandler(IllegalArgumentException::class)
  fun handleIllegalArgumentException(e: IllegalArgumentException, request: WebRequest) {
    val errorResponse = ErrorResponse("invalid_request", e.message)

    return ResponseEntity(errorResponse, HttpHeaders(), HttpStatus.BAD_REQUEST)
  }

  ...
}

Where ErrorResponse is a DTO for response body holding error code, message, solution and other information a client can use.

class ErrorResponse(
  val code: String,
  val message: String,
  val solution: String?
)

This way any time we throw an IllegalArgumentException in the service, data or web layer it is intercepted by RestAPIExceptionHandler and returned to client as an HTTP response with 400 Bad Request status code, specific "invalid_request" application code and a user friendly message.

Provide error specific data

"invalid_request" error code is too broad. For a client it would be more convenient if he knows which form field was incorrect and which value.

For that it makes sense to inherit specific responses from a generic ErrorResponse.

abstract class ErrorResponse(
  val code: String,
  val message: String,
  val solution: String?
)

open class InvalidRequestResponse(message: String): 
  ErrorResponse(
    code = "invalid_request",
    message = message
  )

class CompanyNotFoundResponse(val companyId: Int):
  ErrorResponse(
    code = "company_not_found",
    message ="Company not found: ID=$companyId",
    solution = "Provide another ID or create a new company"
  )

Note that we also provide companyId property within the specific response.

To return such response we need to also create a specific exception and a handler.

class CompanyNotFoundException(val companyId: Int):
  RuntimeException("Company not found: ID=$companyId")
@ControllerAdvice
class RestAPIExceptionHandler {
  ...

  @ExceptionHandler(CompanyNotFoundException::class)
  fun handleCompanyNotFoundException(e: CompanyNotFoundException, request: WebRequest) {
    val errorResponse = CompanyNotFoundReponse(e.companyId)

    return ResponseEntity(errorResponse, HttpHeaders(), HttpStatus.BAD_REQUEST)
  }

  ...
}

Return 404 instead of 400

If a company is requested as a resource by a direct URL we should return 404 instead. I suggest having additional exception and response classes for those cases.

class CompanyResourceNotFoundResponse(val companyId: Int):
  ErrorResponse(
    code = "company_not_found",
    message ="Company not found: ID=$companyId",
    solution = "Provide another ID or create a new company"
  )
class CompanyResourceNotFoundException(val companyId: Int):
  RuntimeException("Company not found: ID=$companyId")
@ControllerAdvice
class RestAPIExceptionHandler {
  ...

  @ExceptionHandler(CompanyResourceNotFoundException::class)
  fun handleCompanyResourceNotFoundException(e: CompanyResourceNotFoundException, request: WebRequest) {
    val errorResponse = CompanyResourceNotFoundReponse(e.companyId)

    return ResponseEntity(errorResponse, HttpHeaders(),   HttpStatus.NOT_FOUND)
  }

  ...
}

To keep things simple you can create generic classes for this two types of errors and extend them for specific types:

abstract class EntityNotFoundException(
  val entityType: String,
  val entityId: Int
): RuntimeException("$entityType with id $entityId not found")
class EntityNotFoundResponse(
  val entityType: String,
  val entityId: Int
):
ErrorResponse(
  code = "$entityType_not_found",
  message ="$entityType not found: ID=$entityId",
  solution = "Provide another ID or create a new $entityType"
)

And the same for EntityResourceNotFoundException


That’s all about error handling in RESTful web applications. Hope this helps!

Links

  1. Providing error response in RESTful API. Part 1
  2. Providing error response in RESTful API. Part 2
  3. Providing error response in RESTful API. Part 3

Providing error response in RESTful API. Part 2

Last time we talked about how a proper error response should be structured. Now let’s analyze when we should use each of the HTTP status codes.

401 Unauthorized and 403 Forbidden

There’s some confusion about these codes because 401 Unauthorized is actually intended for “unauthenticated”, i.e. user cannot log in, error while 403 Forbidden is what we usually call “unauthorized”, i.e. not enough permissions.

For 401 I distinguish two cases:

  • Username or password invalid
  • Session expired

Example responses:

{
  "status": 401,
  "code": "bad_credentials",
  "message": "Username or password invalid",
  "solution": "Check provided credentials or register a new user"
}

{
  "status": 401,
  "code": "session_expired",
  "message": "Your session token is expired",
  "solution": "Log in with your username and password"
}

403 is a very common error which is usually thrown by security layer. It should be returned when a user lacks permission to perform requested action. I suggest adding two fields in the response:

  1. permission — lacking permission name
  2. logged_in — is user logged in or not, as problem may be caused by user being unauthenticated

Example responses:

{
  "status": 403,
  "code": "access_denied",
  "permission": "create_contact",
  "logged_in": true,
  "message": "Access denied: you don't have necessary permission to perform this action",
  "solution": "Ask the administrator for appropriate permission"
}

{
  "status": 403,
  "code": "access_denied",
  "permission": "create_contact",
  "logged_in": false,
  "message": "Access denied: you must be logged in to perform this action",
  "solution": "Log in with your username and password"
}

There’s one edge case though. Consider an intruder is scanning our private contacts API GET /api/contact/ID with different ids. We should not give him a hint whether an ID exists or not. That’s why I advise to return 404 instead of 403 if a resource requested by URL is not authorized.

400 Bad request

This is the largest group as it covers multiple problems with invalid data provided by client. Let’s consider most common.

Missing required request parameter

When a required query or path parameter value is not specified.

{
  "status": 400,
  "code": "missing_parameter",
  "parameter_name": "contact_id",
  "message": "Required request parameter value is missing: request_id",
  "solution": "Provide the request_id value"
}

Record not found

This should be thrown when some complex request references non-existing entity with a request parameter or in request body, e.g. GET /api/income-report?company_id=100&contact_id=123

{
  "status": 400,
  "code": "record_not_found",
  "record_type": "contact",
  "record_id": 123,
  "message": "A contact with id 123 not found",
  "solution": "Check the id provided or create a new record"
}

If we throw 404 a client may think that he misspelled income-report URL.

But if a client requests a specific record by a direct URL like GET /api/contacts/123 we should throw 404 Not found instead.

Validation error

Different validation errors occurring in business layer when requested parameters are processed. A specific message should be provided in each case.

{
  "status": 400,
  "code": "invalid_property_value", 
  "property": "age",
  "message": "Age must be a positive number",
  "solution": "Check value provided"
}

Invalid JSON

There could be syntax error in request JSON which should also be covered with a proper response.

404 Resource not found

When a non-existing entity is requested by a direct link we should return 404. I also suggest returning this status code in case a resource is non-authorized (see 403 section above)

{
  "status": 404,
  "code": "record_not_found", 
  "record_type": "contact",
  "record_id": 123,
  "message": "Requested contact with id 123 was not found",
  "solution": "Check the id provided or create a new record"
}

409 Conflict

There comes a broad range of different database conflicts.

Duplicate id

Thrown when an attempt to create an entity with existing unique id is made.

{
  "status": 409,
  "code": "duplicate_id", 
  "record_type": "contact",
  "record_id": 123,
  "message": "A contact with id 123 already exists",
  "solution": "Provide another id value"
}

Foreign key violation

An attempt to violate foreign key reference, e.g. delete a referenced record or reference a non-existing record.

{
  "status": 409,
  "code": "foreign_key_violation", 
  "record_type": "contact",
  "record_id": 123,
  "referenced_type": "company",
  "referenced_id": 25,
  "message": "Cannot delete contact#123 as it is still referenced by company#25",
  "solution": "Delete company#25 first"
}

This is just an example as there could be many different types of constraint violations.

402 Payment Required — an exotic one

This code is not widely used but it could be used in case when for example a user wants to perform an action not covered by his subscription plan. Thus this action could be performed after the plan is upgraded.

{
  "status": 402,
  "code": "listen_limit_exceeded", 
  "message": "Limit of 1000 audio records per day is exceeded",
  "solution": "Upgrade your subscription"
}

You could also use 409 Conflict in this case but it usually implies you should do something with the data, not with the money 🙂

5xx Server errors

And finally to the server errors.

As you always write high quality server code (aren’t you?) I cannot imagine a situation when you would throw a server error explicitly (your HTTP server could of course return 503 Service Unavailable).

The only server code I use is 502 Bad Gateway. It is helpful if our server is interacting with some external API (e.g. Facebook / Twitter) thus serving as a gateway. If we just want to resend external API error to our client without processing it we can return 502 server error code.

Next time I will show how to implement such proper error handling in Spring web application with a minimal effort.

You could also check the previous article for general ideas on proper error handling in REST architecture.

Links

  1. Providing error response in RESTful API. Part 1
  2. Providing error response in RESTful API. Part 2
  3. Providing error response in RESTful API. Part 3

Providing error response in RESTful API. Part 1

Providing comprehensive error response is an important part of a well-designed REST architecture. It is very annoying to get “500 Internal Server Error” every time something goes wrong being you a frontend developer, public API user or end customer.

Never return 500

When your endpoint returns “500 Internal Server Error” it means just that — there’s an error in your code. Your service should never return this code in case of an invalid request parameters, missing database record or null pointer exception. Every time you detect code 500 dig into your code and make the response user friendly.

Client and Server errors

There are actually two groups of error codes — 4xx Client errors and 5xx Server errors.

The first group is much larger because we tend to fix server errors but can’t fix incorrect client requests.

Nevertheless there are only 5 client error codes that are really useful to us:

  • 401 Unauthorized — user is not logged in
  • 403 Forbidden — user does not have sufficient permissions
  • 400 Bad Request — invalid user input, missing data referenced
  • 404 Not Found — requested resource (DB record) not found or not accessible
  • 409 Conflict — data conflict, e.g. constraint violation or unique id duplicated

It is unlikely that you ever return server error codes yourself. Errors like 502 Bad Gateway, 503 Service Unavailable or 504 Gateway Timeout are usually thrown by your HTTP server before it reaches your application layer.

Error response structure

What a client expects to see in case of an error? I believe this is sufficient:

  1. HTTP status code — this is a must have and the code should match the real error cause.
  2. Error message — a comprehensive error message for a human, but not necessary for the end user. This is mostly intended for the frontend developer or the public API user to immediately detect the source of the problem. E.g. : "Company not found: Rabbit Ltd."
  3. Solution — this is not always necessary but it’s a good practice to give an advice, e.g.: "Check that you entered valid company name or create a new company"
  4. Application specific error code — consider we throw a 400 Bad Request in case of invalid form data. How do we know which form field to highlight with a red border? We must always provide some short constant error code, e.g. "company_name_invalid" or "contact_not_found" so that the frontend can automatically provide a friendly response for the end user. The same for the programs that use our public API and want to resolve errors automatically (e.g. create company if it does not exist)
  5. Additional application specific data — in addition to an application error code we should usually provide some specific values, e.g. "contact_id: 123" for "contact_not_found" error.

Example response

Consider we post some form data including non existing company name. This is what I call an ideal error response:

Status code: 400 Bad Request
Content-type: application/json
Response-body:
{
  "status": 400,
  "message": "Company not found: name=Rabbit Ltd.",
  "solution": "Check that you provided an existing company name or create a new company",
  "code": "company_not_found",
  "company_name": "Rabbit Ltd."
}

Another example. We are trying to close a customer support request:

Status code: 409 Conflict
Content-type: application/json
Response-body:
{
  "status": 409,
  "message": "Cannot close request as it’s not marked resolved by customer",
  "solution": "Ask customer to mark request resolved or wait until auto close occurs",
  "code": "cannot_close_unresolved_request",
  "request_id": "100514"
}

The next story covers different HTTP status codes and their use cases.

Links

  1. Providing error response in RESTful API. Part 1
  2. Providing error response in RESTful API. Part 2
  3. Providing error response in RESTful API. Part 3