Efficient pagination of a table with 100M records
This Chapter is focused on efficient scanning a large table using pagination with offset on the primary key. This is also known as keyset pagination.
The key insight is to use the primary key as the offset to avoid missing records (if they’re deleted between invocations) and to increase performance by using a RANGE join type, which is much faster (constant time.)
- We get
PAGE_SIZEnumber of records from the table. Starting offset value is 0.
- Use the max returned value for the primary key (i.e.,
user_id) in the batch as the offset for the next page.
- Get the next batch from the records which have the primary key (i.e.,
user_id) value higher than current offset.
SELECT user_id, external_id, name, metadata, date_created FROM users WHERE user_id > 51 234 123 --- value of user_id for 50 000 000th record ORDER BY user_id ASC LIMIT 10 000;