Postgres Versioning with Hidden Fields

Versioning in Postgres you say? Multi Version Concurrency Control (MVCC) controls what information can be seen by storing both the past image and the latest image with hidden fields in the table itself.  When you create a table you only see the columns you have added and respectively, only care about them.  But there is a little more to the table structure than you think. 

Take this table for example.

\d member.account

 

Postgres cmd Image

 

 

As you can see this is a simple table with a few columns, Postgres hides the system columns from general view.  However, by looking at pg_attribute, you can see all these columns.

select attname, format_type (atttypid, atttypmod) from pg_attribute where attrelid::regclass::text=member.account' order by attnum;

 

Postgres cmd Image

 

 

 Let’s define these hidden fields and see what they are used for.

Xmin – Stores the transactionId when an insert commit occurs.  If an update is issued, the value will change to the current transactionId. 

The following example will show when we insert data the xmin value is of the transactionId.

Get the id select txid_current();

 

Postgres cmd Image

 

 

Insert into member.account (accountname) values (‘dbshed’);

select xmin,xmax,cmin,cmax,* from member.account where accountname = ‘dbshed’;

 

Postgres cmd Image

 

 

xmax – The value of this column will be zero if a delete never occurred.  However, when a delete occurs, this value will be changed to the transaction id right before the delete is committed.  The following example will demonstrate the xmax values.

begin; delete from member.account where accountname = ‘google’;

 

Postgres cmd Image

 

 

select xmin,xmax,cmin,cmax,* from member.account;

 

Postgres cmd Image

 

 

I’m sure you are wondering if the row is deleted and rightfully so.  We can dig a little deeper and see if this is the case.

create extension pageinspect;

select t_xmin, t_xmax, tuple_data_split(member.account::regclass, t_data, t_infomask, t_infomask2, t_bits) from heap_page_items(get_raw_page('member.account', 0));  

 

Postgres cmd Image

 

 

Looking at the results from this, you can see all the rows are there including the ones we deleted.  However, notice t_xmax has the value of the transactionId the deleted them.  This information is kept for reference

cmin – the identifier within the insert transaction or the value of zero.

cmax – the identifier within the delete transaction or the value of zero.  

Lets take a look what occurs when we do an update.

update member.account set accountname = 'dbshed.com' where accountname = ‘dbshed’;

select t_xmin, t_xmax, tuple_data_split('member.account'::regclass, t_data, t_infomask, t_infomask2, t_bits) from heap_page_items(get_raw_page('member.account', 0));

 

Postgres cmd Image

 

 

As you can see, it deletes all the rows we put in and then inserted them back with the new values.  All the rows that have been deleted will be marked in t_xmax with a zero.

 

Postgres cmd Image

 

 

Summary

Data storage can be plain, simple and some may say a little misleading.  If you don’t have the deep understanding on how something works, the rest then is just a guess.  We now know that Postgres data is not removed out of the table, just hidden from view thus keeping your database growing in size.  With that statement, we are going to go over how to clean these hidden rows so you can keep your storage space.