merge_by_key

merge_by_key provides a means for a user to merge a result set into a table based on that table's primary key. It provides interfaces for simple merges, and conditional ones.

This is not a replacement for the MERGE command. It is a use-case of merge that works right now with versions 8.1 and greater of PostgreSQL.

Installation

To install MBK, get the source from: http://mbk.projects.postgresql.org/files/merge_by_key-1.0.tar.gz

Fetch:

fetch http://mbk.projects.postgresql.org/files/merge_by_key-1.0.tar.gz

Or

curl -O http://mbk.projects.postgresql.org/files/merge_by_key-1.0.tar.gz

Or

wget http://mbk.projects.postgresql.org/files/merge_by_key-1.0.tar.gz

Extract:

tar zxf merge_by_key-1.0.tar.gz

Install:

cd merge_by_key-1.0
psql -f source.sql

That's it. merge_by_key is now available for use in the target database.

Usage

In the following examples, the dst table will be used as the target:

CREATE TABLE dst (k int PRIMARY KEY, c1 text, c2 int);

Simple Merge

In this case, a row where k = 1 is merged:

INSERT INTO dst VALUES (1, 'foo', 1);
INSERT 0 1
SELECT * FROM merge_by_key(
        'dst'::text,
        'select 1 AS k, ''bleh''::text AS c1, 123 AS g'::text
);
NOTICE:  ignoring absent columns for merge, {c2}
CONTEXT:  SQL function "merge_by_key" statement 1
SQL function "merge_by_key" statement 1
-[ RECORD 1 ]------+---------------------------
start_time         | 2008-05-07 10:30:37.035424
end_time           | 2008-05-07 10:30:37.064345
create_count       | 1
create_time        | 00:00:00.004621
update_using_count | 1
update_using_time  | 00:00:00.000217
delete_using_count | 1
delete_using_time  | 00:00:00.000177
insert_count       | 0
insert_time        | 00:00:00.00009

Notably, this shows merge_by_key's tolerance to missing columns. While this does allow potential errors to occur, it does provide you with an indication of that possibility everytime it happens. With the presence of the notice, the provided convenience out-weighs the hazard of an erroneous merge.

Conditional Merge

In this case, a row where k = 1 is merged iff the new c2 is less than the old, existing, c2:

INSERT INTO dst VALUES (1, 'foo', 5);
INSERT 0 1
SELECT * FROM merge_by_key(
        'public', -- table schema
        'dst', -- table name
        'mnew.c2 < mold.c2', -- merge condition
        'select 1 as k, ''bleh''::text as c1, 2 AS c2'
);

The merge created one and updated one:

-[ RECORD 1 ]------+---------------------------
start_time         | 2008-05-08 13:56:46.693153
end_time           | 2008-05-08 13:56:46.723416
create_count       | 1
create_time        | 00:00:00.005627
update_using_count | 1
update_using_time  | 00:00:00.000356
delete_using_count | 1
delete_using_time  | 00:00:00.000179
insert_count       | 0
insert_time        | 00:00:00.000093

The effect, c1 is now 'bleh':

SELECT * FROM dst;
-[ RECORD 1 ]
k  | 1
c1 | bleh
c2 | 2

Merge again, but where the condition fails:

SELECT * FROM merge_by_key(
        'public', -- table schema
        'dst', -- table name
        'mnew.c2 < mold.c2', -- merge condition
        'select 1 as k, ''foo''::text as c1, 2 AS c2'
);

No update occurred, and no new records were inserted:

-[ RECORD 1 ]------+---------------------------
start_time         | 2008-05-08 14:03:12.581652
end_time           | 2008-05-08 14:03:12.62839
create_count       | 1
create_time        | 00:00:00.022227
update_using_count | 0
update_using_time  | 00:00:00.000226
delete_using_count | 1
delete_using_time  | 00:00:00.000183
insert_count       | 0
insert_time        | 00:00:00.000092

The effect, nothing changed:

SELECT * FROM dst;
-[ RECORD 1 ]
k  | 1
c1 | bleh
c2 | 2

Returns

merge_by_key returns timing and quantity information about the merge process. It uses OUT parameters to provide this data. OUT parameter names and meaning:

start_time
The timestamp when the merge was started.
end_time
The timestamp when the merge was completed.
create_count
The number of rows generated by the source; the number of rows materialized by the given query.
create_time
The amount time it took to fully materialize the source.
update_using_count
The number of rows that were updated due to a key match.
update_using_time
The amount of time it took to update the rows whose keys were matched in the source.
delete_using_count
The number of rows that were removed from the source due to being processed by the previous step(update using).
delete_using_time
The amount of time it took to remove key matched rows.
insert_count
The number of rows inserted from the source; ie, entirely new rows.
insert_time
The amount of time it took to insert the new rows.

Compatibility

PostgreSQL 8.1 and greater is required for merge_by_key to work properly. Notably, UPDATE statements must have the ability for the target to be re-labeled("MOLD"). Also, in 8.0 and earlier, key information was not available via the information_schema unless the current user owned the table being referenced. With most metadata being extracted from the information_schema, this yields a problem with versions that have that restriction. Also, of course, OUT parameter support came in 8.1.

Greenplum Support

In addition to PostgreSQL, merge_by_key includes support for Greenplum. While most of the functionality need not change for Greenplum support, table distribution needed to be specified in order to achieve a performat implementation. This was required to due the creation of the temporary table using the source query, as opposed to using a CREATE TEMP TABLE x (LIKE y) statement.

TODO (Maybe)

LICENSE

Copyright 2008, James William Pye. http://jwp.name

Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.

IN NO EVENT SHALL James William Pye BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF James William Pye HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

James William Pye SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND James William Pye HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.