Worm with Glasses

Coding • DevOps • Personal

Oct 18, 2017

🔗 Efficient pagination of a table with 100M records

This Chapter is focused on efficient scanning a large table using pagination with offset on the primary key. This is also known as keyset pagination.

Efficient pagination of a table with 100M records

The key insight is to use the primary key as the offset to avoid missing records (if they’re deleted between invocations) and to increase performance by using a RANGE join type, which is much faster (constant time.)

Simplified algorithm:

  1. We get PAGE_SIZE number of records from the table. Starting offset value is 0.
  2. Use the max returned value for the primary key (i.e., user_id) in the batch as the offset for the next page.
  3. Get the next batch from the records which have the primary key (i.e., user_id) value higher than current offset.

For example:

SELECT user_id, external_id, name, metadata, date_created
FROM users
WHERE user_id > 51 234 123 --- value of user_id for 50 000 000th record
ORDER BY user_id ASC
LIMIT 10 000;