
  • Always use a Surrogate Primary Key

    Many years ago, when I was still in possession of my hair, I entered into a subcontract to develop a system to replace paper-based case files for a quango in South Africa.

    The main contract was held by a large development shop that specialised in a related but different field. They had given one of their systems analysts the task of analysing the system (duh!) but had no spare capacity to code it, so they farmed it out to me.

    dbre database design normalising Created Sat, 26 Apr 2025 10:49:47 +0100
  • Ansible Antipatterns

    Lots of people tell you how to do things properly. Here are some things I’ve inherited in our Ansible that break things in horrible ways.

    Workshop

    Coupling

    Coupling is (roughly) how interconnected different components of a system are. One of the most common patterns in modern software engineering is loose coupling.

    The idea of separating things across an API so that you can change things behind the API without having to change other components is very powerful.
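    To make that concrete in database terms (my own sketch, not from the post, and all names are hypothetical): a view can act as the API, so the tables behind it can be refactored without touching any client.

    -- Hypothetical schema, for illustration only.
    CREATE TABLE customer (id int PRIMARY KEY, name text, email text);

    -- The view is the API: clients query customer_api, never the base table.
    CREATE VIEW customer_api AS
    SELECT id, name, email FROM customer;

    -- Behind the API we are free to refactor, e.g. split contact details out...
    CREATE TABLE customer_contact (
        customer_id int PRIMARY KEY REFERENCES customer(id),
        email       text
    );

    -- ...and repoint the view. Callers of customer_api never notice.
    CREATE OR REPLACE VIEW customer_api AS
    SELECT c.id, c.name, cc.email
    FROM customer c
    LEFT JOIN customer_contact cc ON cc.customer_id = c.id;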

    dbre ansible old posts automation Created Sat, 26 Apr 2025 10:32:52 +0100

  • The Forgotten Aspect of DBRE

    The horror… the horror…

    I am a great fan of the DBRE book, and an avid follower of the authors’ podcasts, videos and blog posts.

    But there is one glaring omission that I can’t find discussed properly anywhere: upgrades.

    You need to patch to fix bugs or address vulnerabilities. You need to upgrade for support and new features that developers want.

    I understand that site-specific needs make it hard to proclaim detailed edicts. But the DBRE book isn’t about detailed edicts. It’s about principles and a way of working.

    dbre upgrades old posts Created Sat, 26 Apr 2025 09:51:11 +0100
  • Or, “a litany of errors”

    The lean model for continuous improvement looks something like this:

    [Figure: the Lean Build-Measure-Learn loop]

    Where you start is a matter of some debate, as is the direction of flow. However, most people start by building a Minimum Viable Product (MVP) as an experiment; measuring the results; and then learning from those measurements, which feed into the next build of the product.

    What is an MVP for a Database Reliability Engineer (DBRE)? A good, old-fashioned manual install. A best-practices manual install.

    dbre toil starting Created Fri, 25 Apr 2025 20:06:00 +0100
  • Overview

    Part 1 looks at some principles and introduces some of the tools in your arsenal.

    Part 2 covers the dreaded PostgreSQL bloat issue as well as how TOAST can help performance, or hinder it.

    Part 3 dives deeper into identifying, investigating and mitigating slow queries.

    Part 4 covers indexes.

    Part 5 (this post) covers rewriting queries.

    TL;DR

    • Break your query down using CTEs (see the sketch just after this list)
    • Optimise each step
    • Rewrite using what you’ve learned (if you want to)
    • Visualising an explain plan can really make problems obvious, compared to sifting through a sea of text
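    As a minimal sketch of that decomposition, using the pgbench tables whose definitions appear under "Rewriting queries" below (the predicates are illustrative, chosen to line up with the partial index i_pa2; this is my example, not the post's final query):

    -- Step 1 in its own CTE, so it can be EXPLAINed and optimised in isolation.
    WITH negative_accounts AS (
        SELECT aid, bid, abalance
        FROM pgbench_accounts
        WHERE aid > 0 AND aid < 194999
          AND abalance > -5000 AND abalance < -4500  -- matches partial index i_pa2
    ),
    -- Step 2: aggregate the filtered rows per branch.
    branch_totals AS (
        SELECT bid, sum(abalance) AS total_negative
        FROM negative_accounts
        GROUP BY bid
    )
    -- Step 3: join the small aggregate back to the branches.
    SELECT b.bid, b.bbalance, bt.total_negative
    FROM pgbench_branches b
    JOIN branch_totals bt ON bt.bid = b.bid;

    Once each step is fast on its own, you can decide whether to keep the CTEs or fold them back into a single query.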

    Rewriting queries

    bench=# \d pgbench_accounts
                  Table "public.pgbench_accounts"
      Column  |     Type      | Collation | Nullable | Default
    ----------+---------------+-----------+----------+---------
     aid      | integer       |           | not null |
     bid      | integer       |           |          |
     abalance | integer       |           |          |
     filler   | character(84) |           |          |
    Indexes:
        "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
        "i_pa2" btree (aid, abalance, bid) WHERE aid > 0 AND aid < 194999 AND abalance < '-4500'::integer AND abalance > '-5000'::integer
    Foreign-key constraints:
        "pgbench_accounts_bid_fkey" FOREIGN KEY (bid) REFERENCES pgbench_branches(bid)
    Referenced by:
        TABLE "pgbench_history" CONSTRAINT "pgbench_history_aid_fkey" FOREIGN KEY (aid) REFERENCES pgbench_accounts(aid)
    
    bench=# \d pgbench_branches
                  Table "public.pgbench_branches"
      Column  |     Type      | Collation | Nullable | Default
    ----------+---------------+-----------+----------+---------
     bid      | integer       |           | not null |
     bbalance | integer       |           |          |
     filler   | character(88) |           |          |
    Indexes:
        "pgbench_branches_pkey" PRIMARY KEY, btree (bid)
    Referenced by:
        TABLE "pgbench_accounts" CONSTRAINT "pgbench_accounts_bid_fkey" FOREIGN KEY (bid) REFERENCES pgbench_branches(bid)
        TABLE "pgbench_history" CONSTRAINT "pgbench_history_bid_fkey" FOREIGN KEY (bid) REFERENCES pgbench_branches(bid)
        TABLE "pgbench_tellers" CONSTRAINT "pgbench_tellers_bid_fkey" FOREIGN KEY (bid) REFERENCES pgbench_branches(bid)
    
    bench=# \d pgbench_history
                        Table "public.pgbench_history"
     Column |            Type             | Collation | Nullable | Default
    --------+-----------------------------+-----------+----------+---------
     tid    | integer                     |           |          |
     bid    | integer                     |           |          |
     aid    | integer                     |           |          |
     delta  | integer                     |           |          |
     mtime  | timestamp without time zone |           |          |
     filler | character(22)               |           |          |
    Indexes:
        "i1" btree (aid, delta) WHERE delta < 0
    Foreign-key constraints:
        "pgbench_history_aid_fkey" FOREIGN KEY (aid) REFERENCES pgbench_accounts(aid)
        "pgbench_history_bid_fkey" FOREIGN KEY (bid) REFERENCES pgbench_branches(bid)
        "pgbench_history_tid_fkey" FOREIGN KEY (tid) REFERENCES pgbench_tellers(tid)
    
    bench=# \d pgbench_tellers
                  Table "public.pgbench_tellers"
      Column  |     Type      | Collation | Nullable | Default
    ----------+---------------+-----------+----------+---------
     tid      | integer       |           | not null |
     bid      | integer       |           |          |
     tbalance | integer       |           |          |
     filler   | character(84) |           |          |
    Indexes:
        "pgbench_tellers_pkey" PRIMARY KEY, btree (tid)
    Foreign-key constraints:
        "pgbench_tellers_bid_fkey" FOREIGN KEY (bid) REFERENCES pgbench_branches(bid)
    Referenced by:
        TABLE "pgbench_history" CONSTRAINT "pgbench_history_tid_fkey" FOREIGN KEY (tid) REFERENCES pgbench_tellers(tid)
    

    Joins

    We have already covered one aspect of query rewriting, namely how to rewrite joins, in Query Performance Improvement Part 3.

    postgresql performance Created Wed, 23 Apr 2025 12:07:07 +0100
  • Overview

    Part 1 looks at some principles and introduces some of the tools in your arsenal.

    Part 2 covers the dreaded PostgreSQL bloat issue as well as how TOAST can help performance, or hinder it.

    Part 3 dives deeper into identifying, investigating and mitigating slow queries.

    Part 4 (this post) covers indexing.

    Part 5 covers rewriting queries.

    TL;DR

    • If you see “Filter:” or “Rows Removed” in your explain plan, you are likely to be using the wrong index
    • Composite indexes can be much faster than multiple single indexes
    • Partial indexes reduce the height and size of the index, making them faster
    • Covering indexes can speed queries even further
    • The bigger the table, the more likely it is that an index will add value
    • Small tables are often sequentially scanned, whether indexed or not
    • Small to medium tables tend to benefit from single column indexes, less from more complex indexes
    • Large tables tend to benefit more from complex indexes
    • Always test before pushing to production!

    Indexes

    Indexes are one of the most effective tools for improving query performance. They work best when they select a small subset of the data and when the indexed value is small compared to the row. So an 8-byte integer index on a 2kB row is more efficient than a 1kB text index on a 1.5kB row.
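    For instance, here are sketches of the three index styles from the TL;DR, written against the pgbench_accounts table shown earlier (the index names and column choices are mine, purely illustrative):

    -- Composite index: one index serving predicates on (bid, abalance) together.
    CREATE INDEX idx_accounts_bid_abalance ON pgbench_accounts (bid, abalance);

    -- Partial index: indexes only the rows of interest, so it is smaller and shallower.
    CREATE INDEX idx_accounts_overdrawn ON pgbench_accounts (aid) WHERE abalance < 0;

    -- Covering index (PostgreSQL 11+): INCLUDE means an index-only scan can answer
    -- e.g. SELECT abalance ... WHERE bid = 42 without touching the heap.
    CREATE INDEX idx_accounts_bid_cover ON pgbench_accounts (bid) INCLUDE (abalance);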

    postgresql performance Created Wed, 23 Apr 2025 12:07:01 +0100
  • Overview

    Part 1 looks at some principles and introduces some of the tools in your arsenal.

    Part 2 covers the dreaded PostgreSQL bloat issue as well as how TOAST can help performance, or hinder it.

    Part 3 (this post) dives deeper into identifying, investigating and mitigating slow queries.

    Part 4 covers indexing.

    Part 5 covers rewriting queries.

    TL;DR

    • Use logging to identify slow queries
    • Use EXPLAIN to examine an explain plan
    • EXPLAIN (ANALYZE) can be dangerous!
    • Use depesz, dalibo or PEV to visualize your explain plan
    • Red flags in your explain plan include filters, Rows Removed, and/or costs in excess of 100,000
    • Data may not be evenly distributed: check it! Especially test data!
    • Only select the rows and columns you need

    Identifying slow queries

    The most reliable way of identifying slow queries is to enable slow query logging using the parameter log_min_duration_statement. The pain threshold varies by environment but, for example, if you wanted to log all queries that take longer than 50ms, you would issue the following command in psql:
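    A minimal sketch of that command, assuming the standard ALTER SYSTEM route (the exact form in the original post may differ):

    -- Log every statement that takes 50ms or longer (the value is in milliseconds).
    ALTER SYSTEM SET log_min_duration_statement = 50;
    -- Reload the configuration so the change takes effect without a restart.
    SELECT pg_reload_conf();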

    postgresql performance Created Sun, 20 Apr 2025 13:55:43 +0100
  • Overview

    Part 1 looks at some principles and introduces some of the tools in your arsenal.

    Part 2 (this post) covers the dreaded PostgreSQL bloat issue as well as how TOAST can help performance, or hinder it.

    Part 3 dives deeper into identifying, investigating and mitigating slow queries.

    Part 4 covers indexing.

    Part 5 covers rewriting queries.

    TL;DR

    • Bloat occurs because PostgreSQL does not update rows in place; it creates a new row and marks the old one as obsolete
    • VACUUM fixes bloat, in the sense that it marks obsolete rows as re-usable space
    • VACUUM FULL fixes bloat by compactly rewriting the data file and rebuilding indexes
    • You should never run VACUUM FULL because it requires a full table lock
    • pg_repack is an option … possibly
    • Use pg_repack with care - it can have problematic side effects
    • TOAST compresses wide columns so that the resulting row is < 2kB
    • If a row is still too wide, the compressed wide columns are split into 2kB chunks and moved to a TOAST table
    • A table whose rows are around 2kB wide can be problematic

    Bloat

    What is bloat?

    Strictly speaking, bloat is the unused space left behind by a VACUUM that is not a VACUUM FULL.
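    A rough way to see where that unused space is accumulating (my sketch; the pgstattuple extension gives more precise numbers):

    -- Dead-tuple counts from the statistics collector: a cheap bloat smell test.
    SELECT relname,
           n_live_tup,
           n_dead_tup,
           round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;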

    postgresql performance bloat toast Created Sun, 20 Apr 2025 10:53:36 +0100
  • Overview

    Part 1 (this post) looks at some principles and introduces some of the tools in your arsenal.

    Part 2 covers the dreaded PostgreSQL bloat issue as well as how TOAST can help performance, or not.

    Part 3 dives deeper into identifying, investigating and mitigating slow queries.

    Part 4 covers indexing.

    Part 5 covers rewriting queries.

    TL;DR

    • Scaling up or out is more expensive than spending a couple of hours optimising
    • Get answers, not data
    • Lazy load lookups
    • Eliminate non-essential queries
    • Don’t use SERIALIZABLE isolation
    • Make sure auto-analyze is running and have a manual ANALYZE schedule as a safety net (see the sketch just after this list)
    • Denormalising is usually the wrong solution to performance issues
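    As a sketch of that safety net (the table name is taken from the pgbench schema shown earlier on this page, purely for illustration):

    -- Check that auto-analyze is keeping up; never-analyzed tables sort first.
    SELECT relname, last_autoanalyze, last_analyze
    FROM pg_stat_user_tables
    ORDER BY greatest(last_autoanalyze, last_analyze) NULLS FIRST
    LIMIT 10;

    -- The manual safety net: ANALYZE is cheap and does not block reads or writes.
    ANALYZE pgbench_accounts;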

    Introduction

    Welcome to this series of blog posts which will hopefully walk you through everything you need to know about improving query performance, specifically in PostgreSQL. The basic principles apply to all databases, though, even if implementation differences mean that different databases find different things painful.

    postgresql performance serializable analyze Created Sat, 19 Apr 2025 01:30:19 +0100