Dead Tuples In Focus — Vol II: Enhancing PostgreSQL Performance 🚀

firattamur
8 min readJan 14, 2024

--

Photo by Nam Anh on Unsplash

In the vast universe of open-source, object-relational databases, PostgreSQL stands out as a reliable and efficient cornerstone 🌍. Celebrating over 25 years of a rich legacy, PostgreSQL is revered for its steadfast commitment to data integrity and accuracy. Its remarkable performance capabilities are what set it apart in a crowded field of database systems 💡.

Our exploration into PostgreSQL’s nuances has already shed light on the concept of dead tuples, as detailed in our initial article. These elements, often overlooked, play a crucial role in the efficiency of database operations 📊.

As we continue our journey, this article aims to delve deeper into PostgreSQL’s performance-enhancing features. Our focal point will be the VACUUM and VACUUM FULL operations, which are instrumental in maintaining and optimizing database performance.

Setup:

Photo by Venti Views on Unsplash

Here’s what our setup looks like:

  • We’ve chosen Docker 🐳 as our environment for this demonstration, running PostgreSQL within a Docker container.

For those new to Docker, it’s a breeze to get started with the official documentation. 📖 Docker Compose is our tool of choice for managing multi-container Docker applications, using a simple YAML file for configuration.

To deploy PostgreSQL with Docker, you’ll be working with a docker-compose.yml file like this:

version: '3.8'
services:
postgres_db:
image: postgres:latest
container_name: postgres_db
restart: always
environment:
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=postgres
ports:
- 5432:5432
volumes:
- postgres_db:/var/lib/postgresql/data
volumes:
postgres_db:
driver: local

After installing Docker, kickstart your PostgreSQL container with these commands:

  • To create the container:
$ docker-compose up -d
  • To check its status:
$ docker-compose ps
NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS
postgres_db postgres:latest "docker-entrypoint.s…" postgres_db About an hour ago Up 7 seconds 0.0.0.0:5432->5432/tcp

To dive into the container and start interacting with the database, these are your magic spells:

  • Enter into the container:
$ docker exec -it postgres_db bash
  • Connect to the database:
$ psql -h localhost -p 5432 -U postgres
  • And to set up a basic table, here’s a simple SQL command:
CREATE TABLE IF NOT EXISTS users (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL
) with (autovacuum_enabled = false);
  • Now, we can check the table using the following command:
postgres=# \dt
List of relations
Schema | Name | Type | Owner
--------+-------+-------+--------
public | users | table | postgres
(1 row)

Recap: The Phenomenon of Dead Tuples in PostgreSQL 📚

In our journey through the captivating world of PostgreSQL, we've encountered the intriguing concept of 'dead tuples' - a topic that resonates with database enthusiasts. These 'dead tuples' are far from being eerie remnants of the database; they're actually key components of PostgreSQL's Multi-version Concurrency Control (MVCC) system 🌐.

MVCC is a sophisticated architectural approach 🏛️ that allows multiple users to interact with the same database record simultaneously, ensuring seamless and conflict-free transactions. When a row in PostgreSQL is updated 🔄 or deleted ❌, it doesn't disappear. Instead, it evolves into a 'dead tuple' - think of it as the row's ghost, invisible yet present within the database's realm 👻.

These 'dead tuples' signify the complexity and elegance of PostgreSQL's MVCC, serving as invisible but essential cogs in the database’s lifecycle. They play a crucial role in maintaining harmony and efficiency, allowing multiple users to coexist and operate without interference.

In this article, we'll delve deeper into how PostgreSQL manages these 'dead tuples' and explore the nuances of VACUUM and VACUUM FULL operations, which are pivotal in optimizing database performance.

While these dead tuples are invisible to transactions, they still occupy space, leading to bloated tables and sluggish queries. To monitor them, we use a simple command:

postgres=# SELECT relname, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'users';

relname | n_dead_tup
--------+------------
users | 0
(1 row)
  • To demonstrate the dead tuples, we will insert some data into the table:
-- Insert 3 rows into the table

postgres=# INSERT INTO users (name) VALUES ('Kaan');
INSERT 0 1
postgres=# INSERT INTO users (name) VALUES ('Ali');
INSERT 0 1
postgres=# INSERT INTO users (name) VALUES ('Hakan');
INSERT 0 1

postgres=# SELECT * FROM users;
id | name
----+-------
1 | Kaan
2 | Ali
3 | Hakan
(3 rows)
— Insert 3 rows into the table

VACUUM:

Think of VACUUM as your daily tidy-up routine. It goes through your database and cleans out ‘dead tuples’ — those old data pieces that are no longer needed, kind of like sweeping up dust bunnies under your bed. 🐰

VACUUM doesn’t shrink your database size. It’s more about housekeeping, making sure the space taken up by dead tuples is now free for new data. Imagine it as reorganizing your closet to make space without actually getting rid of the closet! 🚪

The best part? It does all this without interrupting your database’s daily life. There’s no locking of tables, meaning your database activities continue uninterrupted. 🎉

  • We’ll proceed by deleting one row and updating another.
postgres=# DELETE FROM users WHERE id = 2;
DELETE 1
postgres=# UPDATE users SET name = 'Hakan - v2' WHERE id = 3;
UPDATE 1
  • When will check the dead tuples in the table:
postgres=# SELECT relname, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'users';
relname | n_dead_tup
---------+------------
users | 2
(1 row)
  • To remove the dead tuples from the table, we will use the following command:
postgres=# VACUUM users;
VACUUM
VACUUM doesn’t shrink your database size!
  • Now, we will check the dead tuples in the table:
postgres=# SELECT relname, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'users';
relname | n_dead_tup
---------+------------
users | 0
(1 row)

VACUUM FULL:

VACUUM FULL is like that deep, thorough cleaning you do once in a while. It’s more intensive, removing dead tuples and also compacting the database to actually reduce its physical size. Think of it as not only clearing out your closet but also resizing it to fit your current needs. 🧳

This command rewrites the table to include only the live tuples, which can significantly shrink the table size. It’s like repacking your suitcase more efficiently for a trip. ✈️

VACUUM FULL requires exclusive access to the table it’s cleaning. This means it locks the table, so no one can read or write to it during the process. It’s like closing your shop for a day to do a major renovation. 🚫🛠️

  • Following the similar steps outlined above, once we verify the count of dead tuples, we can proceed to run VACUUM FULL this time.
VACUUM does shrink your database size!
  • If you check dead tuples count again, you will see that it is 0 now.
postgres=# SELECT relname, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'users';
relname | n_dead_tup
---------+------------
users | 0
(1 row)

Demonstrating Dead Tuples with a Larger Dataset 📊

To better illustrate the impact of dead tuples, let’s scale up our example by working with a larger dataset:

  • First, we populate our ‘users’ table with a substantial number of rows:
-- inserts 10,000 rows named 'User 1' to 'User 10000'.
postgres=# INSERT INTO users (name) SELECT 'User ' || i FROM generate_series(1, 10000) AS i;
INSERT 0 10000
  • Let’s perform an update on all these rows:
postgres=# UPDATE users SET name = name || ' - updated';
UPDATE 10002
postgres=# SELECT relname, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'users';
relname | n_dead_tup
---------+------------
users | 10002
(1 row)

This update appends ‘ — updated’ to the name of each user, effectively creating 10,000 dead tuples.

  • Now, we apply the VACUUM command:
postgres=# SELECT pg_size_pretty(pg_total_relation_size('users'));
pg_size_pretty
----------------
1440 kB
(1 row)

postgres=# VACUUM users;
VACUUM

postgres=# SELECT pg_size_pretty(pg_total_relation_size('users'));
pg_size_pretty
----------------
1440 kB
(1 row)

VACUUM will clean up the dead tuples, making space available for new data. However, it won’t reduce the physical file size of the ‘users’ table.

  • Subsequently, we employ VACUUM FULL:
postgres=# SELECT pg_size_pretty(pg_total_relation_size('users'));
pg_size_pretty
----------------
1440 kB
(1 row)

postgres=# VACUUM FULL users;
VACUUM

postgres=# SELECT pg_size_pretty(pg_total_relation_size('users'));
pg_size_pretty
----------------
760 kB
(1 row)

VACUUM FULL not only removes dead tuples but also compacts the table, which should reduce its physical size.

This size should be notably smaller than after just running VACUUM, demonstrating how VACUUM FULL can effectively reduce the physical footprint of a table.

Wrapping Up:

As we reach the end of our PostgreSQL exploration, it’s clear that both VACUUM and VACUUM FULL play vital roles in database management, each with its unique strengths and limitations. Let’s summarize:

VACUUM: The Daily Maintenance Hero 🛠️

Best Sides:

  • 🚀 Non-Disruptive: VACUUM doesn’t lock the table, allowing regular database operations to continue smoothly.
  • 🧹 Efficient Cleanup: It effectively removes dead tuples, freeing up space for new data without downtime.

Drawbacks:

  • 📏 No Size Reduction: VACUUM doesn’t shrink the physical size of the database.
  • 🔄 Frequent Need: Regular runs are necessary to maintain optimal performance.

VACUUM FULL: The Deep Cleaner 🌊

Best Sides:

  • 📉 Size Reduction: It significantly reduces the physical size of the database by compacting tables.
  • 👌 One-Time Fix: Ideal for occasional intensive cleanups to reclaim space.

Drawbacks:

  • 🔒 Exclusive Lock: Requires locking the table, which can disrupt access during the operation.
  • Time-Consuming: Often slower and not ideal for databases that require constant availability.

Conclusion: Choosing the Right Tool for the Job

In the grand scheme of things, both VACUUM and VACUUM FULL have their moments to shine. It’s about choosing the right tool for the right situation:

  • Go for VACUUM when you need regular maintenance without disrupting the database’s flow. It’s like keeping your engine oiled and running smoothly.
  • Opt for VACUUM FULL when you have the luxury of time and need to aggressively reclaim disk space. Think of it like a full garage cleanup on a quiet weekend.

Remember, the key to efficient PostgreSQL database management lies in understanding and appropriately using these powerful tools. Happy optimizing!

References:

--

--

firattamur
firattamur

Written by firattamur

Hey there, I'm firattamur! I have a passion for learning and explaining things.

No responses yet