IBM Support

Understanding Fuzzy Checkpoints

Question & Answer


Question

Explain the concept of fuzzy checkpoints.

Answer

INTRODUCTION

Checkpoints provide a place in the system to recover to in case of failure. The purpose of a checkpoint is to write all of the modified pages out to disk from the buffer cache and thus provide a physically consistent point for recovery. A "new" type of checkpoint, called a fuzzy checkpoint, was introduced to reduce the time it takes to checkpoint the database. The "old" type of checkpoint is now referred to as a sync checkpoint.


Some scenarios under which a checkpoint would occur:

    1. The physical log becomes 75% full.
    2. The time specified by the CKPTINTVL parameter in the ONCONFIG file is reached.
    3. The onmode -c command is used to request a sync checkpoint.
    4. During an archive.
    5. During a rollback.
    6. When the next-to-last available logical log contains the last checkpoint.


Main events that occur during a sync checkpoint:

    1. The database server prevents user threads from entering the critical section.
    2. Modified partition partition entries are written to disk.
    3. All modified pages in the buffer pool are flushed to disk by the page cleaners. The physical log and logical log buffers are also flushed to disk as a side effect of this operation.
    4. A checkpoint record is written to the logical log and the checkpoint reserved page is updated on disk.
    5. The physical log is logically emptied.
    6. Restriction on entering the critical section is released.

Sync checkpoints are still used for special operations after the introduction of fuzzy checkpoints, and one can still be requested with onmode -c.

Fuzzy checkpoints were introduced to reduce checkpoint time. The side effect is that fast recovery now takes longer. A certain subset of database operations is designated as fuzzy. Buffers in the shared memory buffer pool that are modified by these fuzzy operations are flagged and are not written to disk when a checkpoint occurs. In 9.2, the fuzzy operations include only inserts, updates, and deletes to data or remainder pages, and the database must use logging.


The main events that occur during a fuzzy checkpoint are as follows:

    1. The database server prevents user threads from entering the critical section.
    2. Modified partition partition entries are written to disk.
    3. All modified pages in the buffer pool that were not modified by fuzzy operations are flushed to disk by the page cleaners. A Dirty Page Table (DPT) is constructed. The physical log and logical log buffers are also flushed to disk as a side effect of this operation.
    4. The DPT is logged.
    5. A checkpoint record is written to the logical log and the checkpoint reserved page is updated on disk.
    6. The physical log is logically emptied.
    7. Restriction on entering the critical section is released.
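
The flush and DPT construction in step 3 can be sketched in a few lines of Python. This is an illustration only, not the server's implementation: the Buffer class, its fields, and the fuzzy_checkpoint function are hypothetical names, and the fuzzy attribute merely stands in for the LM_FUZZY flag described in this document.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    page_addr: int
    dirty: bool = False
    fuzzy: bool = False  # hypothetical stand-in for the LM_FUZZY flag


def fuzzy_checkpoint(buffers):
    """Flush dirty non-fuzzy buffers; return (flushed addresses, DPT)."""
    flushed = []
    dpt = []
    for buf in buffers:
        if not buf.dirty:
            continue
        if buf.fuzzy:
            # Fuzzy pages stay dirty in memory; only the address is recorded.
            dpt.append(buf.page_addr)
        else:
            buf.dirty = False  # models the page cleaner writing the page
            flushed.append(buf.page_addr)
    return flushed, sorted(dpt)


pool = [
    Buffer(0x100, dirty=True),
    Buffer(0x101, dirty=True, fuzzy=True),
    Buffer(0x102),
    Buffer(0x103, dirty=True, fuzzy=True),
]
flushed, dpt = fuzzy_checkpoint(pool)
print("flushed:", [hex(a) for a in flushed])
print("DPT:", [hex(a) for a in dpt])
```

In this toy pool only page 0x100 is written, while pages 0x101 and 0x103 remain dirty and appear in the DPT that would be logged in step 4.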

Dirty Page Table (DPT)



The DPT is not a table but a structure in memory that records the addresses of dirty pages that were modified by fuzzy operations. The structure is created at the time of a fuzzy checkpoint: a list of these dirty pages is compiled and written to the logical log in a separate record just prior to the checkpoint record. If fast recovery occurs, this list helps determine which records in the logical log need to be applied.
Example: onlog output.
addr     len   type      xid   id   link
14c018   1772  DPT       1     5    0        217
14c704   32    CKPOINT   1     0    14c018   0
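
The two records in the onlog sample can be checked against each other with a short script. This sketch assumes that addr and link are hexadecimal and that len is a decimal byte count; the sample values are consistent with that reading, but it is an inference from this example rather than a documented format. The parse_onlog_line helper is a hypothetical name.

```python
ONLOG_SAMPLE = """\
14c018   1772 DPT       1       5   0         217
14c704   32   CKPOINT   1       0   14c018   0
"""


def parse_onlog_line(line):
    """Split one onlog row into the fields used below (assumed layout)."""
    fields = line.split()
    return {
        "addr": int(fields[0], 16),  # assumed hexadecimal
        "len": int(fields[1]),       # assumed decimal bytes
        "type": fields[2],
        "link": int(fields[5], 16),  # assumed hexadecimal
    }


records = [parse_onlog_line(line) for line in ONLOG_SAMPLE.splitlines()]
dpt_rec, ckpt_rec = records

# The DPT record ends exactly where the checkpoint record begins.
next_addr = dpt_rec["addr"] + dpt_rec["len"]
print(hex(next_addr))              # 0x14c704
print(hex(ckpt_rec["link"]))       # 0x14c018
```

Under these assumptions, the DPT record at 14c018 plus its length of 1772 bytes lands on 14c704, the address of the CKPOINT record, and the link field of the CKPOINT record points back at the DPT record.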

The buffer header for a page contains the LM_FUZZY flag, which indicates that a fuzzy operation has taken place on that page, and the Log Sequence Number (LSN) for that page.



Log Sequence Number (LSN)

The LSN is a position in the logical log. It consists of a logical log number and a logpos (page offset into the log and byte offset into the page).

The LSN is tracked in two ways:
    1. When a fuzzy operation has occurred the LSN is stored in the buffer’s header and this signifies when it was last modified.
    2. The oldest LSN is also tracked because it represents where the oldest fuzzy operation has taken place.


Restriction for LSN:

The oldest LSN can never be further back in the log than one logical log and 2 checkpoints. At each checkpoint the oldest LSN is checked and moved forward after any writes that are required are done.

Along with the LSNs, the time stamp associated with the modification of the fuzzy buffer is also stored in the buffer’s header. The oldest time stamp is tracked in the same way as the oldest LSN.

Any pages with a time stamp prior to the oldest LSN are written to disk at the time of the checkpoint. Because the oldest LSN is bounded by the size of the log, systems with larger logical logs gain more from fuzzy checkpoints than systems with smaller logs.

The LSN is stored in memory, but the last slot of the checkpoint reserved page also contains the LSN that is recorded at the time of the checkpoint.

Example:

oncheck -pr

Validating PAGE_1CKPT & PAGE_2CKPT...
Using check point page PAGE_2CKPT.
 
    Time stamp of checkpoint                    0x1176c
    Time of checkpoint                          08/24/2000 12:48:43
    Physical log begin address                  0x100107
    Physical log size                           1000 (p)
    Physical log position at Ckpt               130
    Logical log unique identifier               5
    Logical log position at Ckpt                0x14d704 (Page 333, byte 1796)
    DBspace descriptor page                     0x100004
    Chunk descriptor page                       0x100007
    Mirror chunk descriptor page                0x100008
   
Fast recovery starting point:  Log ID 4, Position 0x28b23c
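
The logpos values in this output can be decoded with a little bit arithmetic. The sketch below assumes that the low 12 bits of a logpos hold the byte offset and the remaining bits hold the page number; that split is inferred from the line "0x14d704 (Page 333, byte 1796)" above and is not stated elsewhere in this document. The names PAGE_SHIFT, split_logpos, and lsn_is_older are hypothetical.

```python
PAGE_SHIFT = 12  # assumption inferred from "0x14d704 (Page 333, byte 1796)"


def split_logpos(logpos):
    """Return (page offset into log, byte offset into page)."""
    return logpos >> PAGE_SHIFT, logpos & ((1 << PAGE_SHIFT) - 1)


def lsn_is_older(a, b):
    """Compare two LSNs given as (log unique id, logpos) tuples."""
    return a < b


ckpt_lsn = (5, 0x14D704)        # logical log 5, position at checkpoint
recovery_start = (4, 0x28B23C)  # fast recovery starting point

print(split_logpos(ckpt_lsn[1]))               # (333, 1796)
print(lsn_is_older(recovery_start, ckpt_lsn))  # True
```

The comparison shows the point made by this example: the fast recovery starting point (log 4) lies before the checkpoint itself (log 5), because pages modified by fuzzy operations were not flushed at the checkpoint.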


Recovery Issues

The changes in the checkpoint procedure have also affected fast recovery.

Physical recovery: This remains the same as with a sync checkpoint. The before images contained in the physical log are read and written back out to disk. This ensures that the data on disk reflects the state of the pages in question at the time of the last checkpoint.

Logical recovery: Committed transactions can no longer be rolled forward starting from the last checkpoint, because pages modified by fuzzy operations may not have been flushed to disk at that checkpoint. Hence logical recovery now has two phases:
    1. If the oldest LSN is prior to the last checkpoint, Phase A recovery is performed first.
    2. If the oldest LSN is equal to the last checkpoint, only Phase B recovery is performed.

    Phase A Logical recovery:

    Log records are selectively redone. Steps include:
      1. Locate the DPT and read the DPT into memory.
      2. Roll forward starting with the oldest log record associated with a page in the DPT.
      3. Apply a log record that meets all of the following criteria:
        i. The buffer header has the LM_FUZZY flag turned on.
        ii. The record applies to a page in the DPT.
        iii. The page on disk is older than the log record time stamp, which means the page has not been flushed and the log record needs to be applied.

    Phase A is complete when the last checkpoint in the log is reached, and Phase B starts at that point. Note that if a page is newer than the log record time stamp, the change was probably already written to disk by a page cleaner thread.
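
The three Phase A criteria can be expressed as a single predicate. The following is a simplified sketch, not the server's recovery code: LogRecord, its fields, and phase_a_should_redo are hypothetical names, the fuzzy attribute stands in for the LM_FUZZY check, and time stamps are modeled as plain integers.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    page_addr: int  # page the record applies to
    timestamp: int  # time stamp carried by the log record
    fuzzy: bool     # hypothetical stand-in for the LM_FUZZY check


def phase_a_should_redo(rec, dpt, disk_timestamps):
    """Return True if Phase A would reapply this log record."""
    if not rec.fuzzy:
        return False
    if rec.page_addr not in dpt:
        return False
    # A page on disk that is older than the record was never flushed.
    return disk_timestamps[rec.page_addr] < rec.timestamp


dpt = {0x101, 0x103}
disk_timestamps = {0x100: 50, 0x101: 40, 0x103: 90}
log = [
    LogRecord(0x100, 60, fuzzy=False),  # not fuzzy: skipped in Phase A
    LogRecord(0x101, 70, fuzzy=True),   # in DPT, disk page older: redo
    LogRecord(0x103, 80, fuzzy=True),   # in DPT, disk page newer: skip
    LogRecord(0x104, 85, fuzzy=True),   # fuzzy but not in DPT: skip
]
redo = [hex(r.page_addr) for r in log
        if phase_a_should_redo(r, dpt, disk_timestamps)]
print(redo)  # ['0x101']
```

Only the record for page 0x101 is reapplied here. The record for page 0x103 is skipped because the page on disk already carries a newer time stamp, which corresponds to the page cleaner case noted above.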


    Phase B Logical Recovery:

    This is the same as the previous logical recovery. All logical log records are reapplied, and at the end all open transactions are rolled back. The only change is that a log record is applied only if the time stamp on the page is older than the log record. A page can be newer than a log record when, for example, a foreground write flushed a page that contained fuzzy operations.

    The checkpoint reserved page now also contains:
      1. Checkpoint information.
      2. Logical log information.
      3. LSN.

    Prior to 9.2, ONCONFIG could be configured so that fast recovery ran in parallel. In 9.2, fast recovery is a single-threaded operation and any such ONCONFIG setting is ignored. This is because running UDRs in parallel threads during fast recovery could cause problems.

    A new logical log entry now precedes the checkpoint record in the logical logs. It is of type LG_DPT and is used to reference the DPT.

    The last phase of logical recovery is the dropping of temporary tables, after scanning the bitmap page for the tblspace tblspace in each dbspace for a temp flag.

    The original onmode -c performs a regular sync checkpoint.

    onmode -c fuzzy can be used to perform a fuzzy checkpoint.

    A sync checkpoint will always occur during the following operations:
      1. Archive checkpoints.
      2. IDS shutdown.
      3. Forced checkpoints.

Debugging fuzzy checkpoints:

The TRACEFUZZYCKPT environment variable must be set before shared memory initialization. It traces the number of buffers that remain dirty after the checkpoint has completed and lists the location of the oldest LSN.

Example:

17:59:25  Dataskip is now OFF for all dbspaces
17:59:25  On-Line Mode
17:59:25  2 buffers dirty
17:59:25  oldest lsn loguniq 5, logpos 0x14e018
17:59:25  dskflush() took 0 seconds
17:59:25  wait4critex() took 0 seconds
17:59:25  0 buffers dirty
17:59:25  oldest lsn loguniq 5, logpos 0x14e018
17:59:25  safe_dskflush() took 0 seconds
17:59:25  Checkpoint Completed:  duration was 0 seconds.
17:59:25  Checkpoint loguniq 5, logpos 0x14f018
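
Trace output like this can be summarized with a small script when many checkpoints need to be reviewed. The sketch below is written against the message wording of the sample above only; other releases may format these lines differently, and the summarize function is a hypothetical helper.

```python
import re

TRACE = """\
17:59:25  2 buffers dirty
17:59:25  oldest lsn loguniq 5, logpos 0x14e018
17:59:25  dskflush() took 0 seconds
17:59:25  wait4critex() took 0 seconds
17:59:25  0 buffers dirty
17:59:25  oldest lsn loguniq 5, logpos 0x14e018
17:59:25  safe_dskflush() took 0 seconds
17:59:25  Checkpoint Completed:  duration was 0 seconds.
17:59:25  Checkpoint loguniq 5, logpos 0x14f018
"""

DIRTY_RE = re.compile(r"(\d+) buffers dirty$")
OLDEST_RE = re.compile(r"oldest lsn loguniq (\d+), logpos (0x[0-9a-fA-F]+)$")
CKPT_RE = re.compile(r"Checkpoint loguniq (\d+), logpos (0x[0-9a-fA-F]+)$")


def summarize(trace):
    """Return (dirty buffer counts, last oldest LSN, checkpoint LSN)."""
    dirty_counts = []
    oldest_lsn = None
    ckpt_lsn = None
    for line in trace.splitlines():
        line = line.strip()
        m = DIRTY_RE.search(line)
        if m:
            dirty_counts.append(int(m.group(1)))
            continue
        m = OLDEST_RE.search(line)
        if m:
            oldest_lsn = (int(m.group(1)), int(m.group(2), 16))
            continue
        m = CKPT_RE.search(line)
        if m:
            ckpt_lsn = (int(m.group(1)), int(m.group(2), 16))
    return dirty_counts, oldest_lsn, ckpt_lsn


dirty_counts, oldest_lsn, ckpt_lsn = summarize(TRACE)
print(dirty_counts)  # [2, 0]
print(oldest_lsn)    # (5, 1368088), i.e. logpos 0x14e018
print(ckpt_lsn)      # (5, 1372184), i.e. logpos 0x14f018
```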


If old-style checkpointing is always desired, setting NOFUZZYCKPT to 1 before shared memory initialization disables fuzzy checkpoints.

Product: Informix Servers
Version: 10.0; 9.3; 9.4
Edition: Enterprise
Platform: Windows, AIX, Platform Independent, DYNIX/ptx, HP-UX, IRIX, Linux, Reliant UNIX, Solaris

Document Information

Modified date:
16 June 2018

UID

swg21140289