Understanding PostgreSQL’s Write-Ahead Logging (WAL)

May 22, 2025 • postgres

PostgreSQL’s Write-Ahead Logging (WAL) is at the heart of its durability and crash recovery. If you’ve ever wondered how PostgreSQL ensures your data is safe—even in the event of a crash—this post will walk you through the architecture, flow, and the actual source code that makes it all work.

High-Level Architecture & Flow of WAL #

What is WAL? #

WAL is a mechanism that ensures all changes to the database are first recorded in a log before being applied to the data files. This guarantees that, even if the system crashes, PostgreSQL can recover to a consistent state.

Key Components #

WAL Buffers:
In-memory buffers that temporarily hold WAL records before they’re written to disk.
WAL Files:
On-disk files (in pg_wal/), typically 16MB each, storing the WAL records.
WAL Writer Process:
A background process that flushes WAL buffers to disk.
Checkpointer:
Ensures data files are consistent with WAL.
Archiver:
Optionally archives completed WAL segments for point-in-time recovery (PITR).

Sequence of Events #

Change Initiation:
A transaction modifies data (e.g., an INSERT).
WAL Record Creation:
The change is encoded as a WAL record in memory.
WAL Buffering:
The WAL record is placed in the WAL buffers.
WAL Flush:
Before a transaction commits, its WAL records are flushed to disk.
WAL File Management:
WAL files are rotated, archived, and recycled as needed.
Crash Recovery:
On restart after a crash, WAL is replayed to bring the database to a consistent state.

Mapping WAL to the PostgreSQL Source Code #

Let’s walk through the main code files and functions for each component.

1. WAL Record Creation #

File: xlog.c
Key Struct: XLogRecord
Key Functions:
- XLogInsert(): Called whenever a change is made (e.g., tuple insert/update/delete). Constructs a WAL record and appends it to the WAL buffers.
- XLogRegisterData(), XLogRegisterBuffer(): Used by lower-level code to register data and buffers that should be included in the WAL record.

2. WAL Buffering and Flushing #

File: xlog.c
Key Struct: XLogCtlData (shared memory control structure for WAL)
Key Functions:
- XLogWrite(): Flushes WAL buffers to disk.
- XLogFlush(): Ensures that WAL up to a certain point is safely on disk (called before commit).
- XLogBackgroundFlush(): Used by the WAL writer background process.

3. WAL Writer Process #

File: walwriter.c
Key Function:
- WalWriterMain(): Main loop for the WAL writer background process, periodically flushing WAL buffers.

4. WAL File Management #

File: xlog.c
Key Functions:
- XLogFileInit(), XLogFileOpen(), XLogFileClose(): Manage creation, opening, and closing of WAL segment files.
- XLogFileName(): Generates the filename for a given WAL segment.

5. Crash Recovery #

File: xlog.c
Key Function:
- StartupXLOG(): Main function for crash recovery; reads and replays WAL records.

6. Archiving #

File: xlogarchive.c
Key Functions:
- XLogArchiveNotify(), XLogArchiveCheckDone()

Important Structs, Macros, and Configurations #

XLogRecPtr: 64-bit pointer to a WAL location.
XLogRecord: Struct representing a single WAL record.
XLogCtlData: Shared memory structure for WAL state.

Configuration Parameters (in postgresql.conf):

wal_level
wal_buffers
wal_writer_delay
archive_mode, archive_command
max_wal_size, min_wal_size

Step-by-Step Exploration #

WAL Record Creation:
Start in xlog.c with XLogInsert(). See how a WAL record is constructed and added to the buffer. Explore how XLogRegisterData() and XLogRegisterBuffer() are used to build the record.
WAL Buffering and Flushing:
Follow XLogWrite() and XLogFlush(). See how WAL buffers are managed and flushed to disk. Understand the role of XLogCtlData.
WAL Writer Process:
Look at WalWriterMain() in walwriter.c. See how the background process periodically flushes WAL.
WAL File Management:
Explore XLogFileInit(), XLogFileOpen(), etc. See how WAL files are created, opened, and rotated.
Crash Recovery:
Study StartupXLOG(). See how WAL is replayed after a crash.

Conclusion #

PostgreSQL’s WAL system is a robust, well-architected mechanism that ensures your data is safe and recoverable. By understanding both the high-level flow and the underlying source code, you gain insight into one of the most critical parts of PostgreSQL’s architecture.