Tag: JVM

  • When Postgres is the wrong place to traverse the graph

    Some performance problems are not query-optimization problems. You can index, rewrite, tune parameters, throw hardware at it — and still lose, because the work simply doesn’t belong in the database. The fix isn’t a better query. It’s moving the work somewhere else.

    The task

    Imagine you perform a deep search in a huge graph — think of finding level-N friends in a social network or tier-N product parts in a bill of materials. You start from some node, visit all its nearest neighbours and recursively continue from there (breadth-first search) or recursively traverse every path (depth-first search).

    With Postgres the standard approach is to store (node A, node B) edge tuples in the database and apply recursive CTEs to query the graph.

    SQL
    WITH RECURSIVE r(node, depth, path) AS (
      SELECT       -- Start with the first node
        :first_node AS node,
        0 AS depth,
        ARRAY[:first_node] AS path
      UNION ALL
      SELECT       -- Recursively find next node
        g.b AS node,
        r.depth + 1 AS depth,
        r.path || g.b AS path
      FROM graph g
      JOIN r ON r.node = g.a
      WHERE g.b <> ALL(r.path)    -- Guard against cycles
    )
    SELECT *
    FROM r;

    The problem

    There are two main parameters affecting the performance of graph traversal: the branching factor b — average number of node connections, and the depth of traversal D.

    Whatever you do you eventually hit the O(b^D) wall. The average number of nodes to traverse grows exponentially with D. There’s no algorithmic way you can do deep search asymptotically faster (no, quantum computing would not help either).

    As your database grows the b will most probably increase, and as your business grows there will certainly be a demand to increase D. This means you would be constantly stuck with search time regressing from milliseconds to minutes. You could fight this by first using materialized CTEs, then switching to temporary tables with indexes, but all those efforts would be immediately swallowed by a slight increase of search depth.

    Another important issue is that Postgres cannot perform depth-first search, the SEARCH DEPTH FIRST does not change the algorithm under the hood. This means that at depth N you need to store ~b^N frontier nodes.

    The solution

    First of all you need to switch to the depth-first search to only store the current path in the memory, not the whole frontier. Then, as you cannot work around the general O(b^D) limitations of the algorithm, you should optimize each basic operation for speed and memory consumption to its limits.

    We should load the whole graph into the memory and make it effectively accessible. The main operation is finding the neighbours of the current node. For the most optimal access we need a map whose key is the node id and whose value is the list of connected nodes.

    Standard approach assuming node id is Int would be

    Kotlin
    val graph: Map<Int, List<Int>> = hashMapOf()

    But here comes the huge inefficiency. Maps and lists in Java operate on boxed values. Every id becomes a boxed Integer: a separate heap object — a 12-byte header plus the 4-byte value, ~16 bytes total — plus a 4–8 byte reference to reach it. Compare that to 4 bytes for a raw int. Accessing such a map is also slower: each lookup chases a pointer to a scattered heap object — a likely cache miss — and then unboxes it. 

    This could be overcome by using specialized Java collection libraries such as HPPC (High Performance Primitive Collections) which operate over unboxed values instead. See Reducing memory usage 10 times with High-Performance Primitive Collections

    In our example the graph would become:

    Kotlin
    val graph: IntObjectMap<IntList> = IntObjectHashMap()

    And as shown in the referenced article this leads to up to 10x less memory consumption. The latency improvement is usually even more substantial.

    One additional optimization is to prematurely filter the nodes, throwing away whole subgraphs if the node or edge does not satisfy certain conditions.

    You should also decide whether you actually need all paths to each node, or just the reachable nodes. If only distinct target nodes matter, a visited-set collapses the exponential blow-up toward linear! This is also something much easier to implement outside of SQL.

    A note on concurrency

    If you are going to update the graph in real time you need to make it writable. Here comes the trade-off — making the whole graph immutable and replacing it every time is too expensive, especially if the changes are minor. But making every element concurrently writable is also expensive.

    A possible solution might be to combine the two approaches:

    Kotlin
    val graph: ConcurrentIntObjectMap<IntList> = ConcurrentIntObjectHashMap()

    Keep the relatively small IntList values immutable and swap them wholesale, while making the outer map concurrently accessible via optimistic locking. This is not natively supported by HPPC, I will show the exact implementation in one of the next articles.

    The lesson

    The useful skill here isn’t using primitive collections or optimistic locking. The value comes from seeing the architectural flaw and moving application parts to where they actually belong.

    When you optimize SQL queries you see the solution as a better query plan, not as a conceptual shift. Sometimes it helps to stop fighting, zoom out and start from scratch.

  • Behavioral drift: silent bugs in LLM workflows

    I am working on an AI workflow, which, like many others, has a classifier step. It looks at a user prompt and routes to a proper specialist agent — support, sales, feedback. The respective system prompt had this line:

    Respond only with the exact query type. No explanation. No formatting.

    During daily editing this line was accidentally lost.

    After deployment, the classifier started returning markdown instead of a clean enum value:

    **feedback**

    At that point there was neither validation nor safe fallback value, so the workflow started to fail every time at that point.

    Why you cannot just “add validation”?

    First of all — this is a runtime issue. It will fail when a real LLM is requested, not in unit tests.

    Second. The change leading to the bug is not in the code, it’s in the system prompts.

    Third. The most important. This exact issue could be caught by validation, but what if the change happened within a complex json response? Or a node always returns markdown, but now the meaning is completely different? And even more, what if the form is still correct, but the enum distribution changed from 50:50 to 1:99?

    In practice most AI work lives in the prototyping and research phase. Prompts change weekly. Models get swapped. Workflows are restructured. The output shapes are being discovered. Under such conditions proper validation is a big challenge in itself.

    What actually happens here is that implicit contract between the LLM and the code is silently broken.

    I’d call this behavioral drift. It has three properties that make it especially nasty:

    • It looks correct to humans. **feedback** next to feedback doesn’t read as a bug.
    • It often passes automated checks. Both feedback and **feedback** are valid strings. Schemas describe what’s allowed, not what was normal.
    • Safe fallbacks make things worse. If **feedback** value is silently classified as "General request" days might pass before you notice something is wrong.

    Don’t fight it, but monitor

    How do we deal with such bugs then?

    Instead of trying to add more and more layers of sophisticated validation, one can instead compare the behaviour after any changes to the code or to the system prompts. This is kind of git diff applied to the workflow behaviour.

    Imagine after dropping the aforementioned line from the system prompt we get a notification

    Classification result format changed from "scalar string" to "markdown".

    This is a serious alert worth immediate checking.

    In a more complex case the monitoring system might warn you that

    Distribution of node responses changed from 30/30/40 to 10/0/90

    which is a behavioural drift probably concealing a serious problem.

    Anomaly Detector tool

    To obtain some peace of mind during rapid AI workflow prototyping I’ve implemented a small Kotlin/JVM tool which (in its current version) does the following:

    1. captures most important characteristics of each step output as “workflow node profiles”;
    2. upon request compares two workflow versions, flagging possible anomalies ordered by severity.

    Example findings:

    1. a node is missing (never visited) in the new version;
    2. the node is here, but its output form changed drastically;
    3. node started to route to a different target most of the time.

    I tried to keep the tool minimally invasive. To collect profiling information you just need to add checkpoints like this to your code:

    Kotlin
    detector.checkpoint(step = "classify-query", output = response.content)

    You can find the tool with full documentation here:

    https://github.com/minogin/ai-anomaly-detector

    The bigger point

    Non-deterministic software is still software. It’s time to expand our toolkit for these kinds of systems. Not just answering deterministic-world questions, such as “is this node output correct?”, but also questions like “did this node’s output distribution change significantly?”

    If you’ve shipped LLM workflows in production, you’ve probably seen something like this. Please share your experience, what non-deterministic problems you faced and what the cure was?

  • Reducing memory usage 10 times with High-Performance Primitive Collections

    Kotlin basic types such as Int or Double correspond to high-performance Java primitive types such as int or double. But nullable (Int?) and generic (<Int>) versions of those types are mapped to boxed Java types such as Integer or Double.

    Boxed types are memory heavy. Let’s make a simple comparison.

    Kotlin
    @Test
    fun `memory occupied by primitive int`() {
        data class A(
            val x: Int
        )
    
        val N = 100_000_000
    
        val mem1 = calculateOccupiedMemoryMB()
    
        val list = List(N) { A(it) }
    
        val mem2 = calculateOccupiedMemoryMB()
    
        println("Occupied memory: ${mem2 - mem1} MB")
    
        list
    }
    
    > Occupied memory: 1910 MB
    Kotlin
    @Test
    fun `memory occupied by boxed Int`() {
        data class A(
            val x: Int?
        )
    
        val N = 100_000_000
    
        val mem1 = calculateOccupiedMemoryMB()
    
        val list = List(N) { A(it) }
    
        val mem2 = calculateOccupiedMemoryMB()
    
        println("Occupied memory: ${mem2 - mem1} MB")
    
        list
    }
    
    > Occupied memory: 3436 MB

    We already see almost 2x difference, but actually it’s more serious as our test is not accurate enough.

    Code explained

    calculateOccupiedMemoryMB measures the diff between total and occupied memory running garbage collection for at least 3 seconds in advance to reduce the garbage footprint.

    Kotlin
    fun calculateOccupiedMemoryMB(): Int {
        getRuntime().gc()
        Thread.sleep(3000)
        return ((getRuntime().totalMemory() - getRuntime().freeMemory()) / (1024 * 1024)).toInt()
    }

    list reference at the end of the block is a trick to avoid JVM optimization. If JVM sees an object is not used it might wipe it off the RAM.

    What if we need a huge Set of Int‘s or a huge Map of Int to Object? Unfortunately standard Java Collections are based on generics which means all of the objects will be autoboxed.

    Here HPPC: High Performance Primitive Collections comes to the rescue. This library has predefined collection for all the primitive types.

    Let’s compare memory footprints of a normal Java HashSet<Int> and a corresponding HPPC IntHashSet.

    Kotlin
    @Test
    fun `memory occupied by HashSet`() {
        val N = 100_000_000
    
        val mem1 = calculateOccupiedMemoryMB()
    
        val set = hashSetOf<Int>()
        repeat(N) { set.add(it) }
    
        val mem2 = calculateOccupiedMemoryMB()
    
        println("Occupied memory: ${mem2 - mem1} MB")
    
        1 in set
    }
    
    > Occupied memory: 5098 MB
    Kotlin
    @Test
    fun `memory occupied by HashSet`() {
        val N = 100_000_000
    
        val mem1 = calculateOccupiedMemoryMB()
    
        val set = IntHashSet()
        repeat(N) { set.add(it) }
    
        val mem2 = calculateOccupiedMemoryMB()
    
        println("Occupied memory: ${mem2 - mem1} MB")
    
        1 in set
    }
    
    > Occupied memory: 518 MB

    10 times less memory used!