
Foreign Data Wrapper, based on SQL/MED, is one of the coolest features of PostgreSQL, and its feature set has been expanding since version 9.1. The PostgreSQL 14 beta is out and GA will be available shortly, so it is a good time to study the upcoming features, which include some improvements to foreign data wrappers. A new performance feature, "Bulk Insert", has been added in PostgreSQL 14. The FDW API has been extended to allow inserting rows into a foreign table in batches, so any foreign data wrapper can now implement bulk insert. It is definitely more efficient than inserting rows one at a time.
The API adds two new callback functions, GetForeignModifyBatchSize and ExecForeignBatchInsert, which a foreign data wrapper can implement to support bulk insert.
I won't explain these functions in detail here; they are mainly of interest to developers who want to add bulk insert support to their own foreign data wrappers, such as mysql_fdw, mongo_fdw, and oracle_fdw, and they are covered in the PostgreSQL documentation. The good news is that postgres_fdw already implements them in PostgreSQL 14.
A new option, batch_size, has been added for postgres_fdw; you can specify it when creating the foreign server or when creating a foreign table.
- Create a postgres_fdw extension
CREATE EXTENSION postgres_fdw;
- Create a foreign server without batch_size
CREATE SERVER postgres_svr
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host '127.0.0.1');

CREATE USER MAPPING FOR vagrant
    SERVER postgres_svr
    OPTIONS (user 'postgres', password 'pass');

CREATE FOREIGN TABLE foo_remote (a INTEGER, b CHAR, c TEXT, d VARCHAR(255))
    SERVER postgres_svr
    OPTIONS (table_name 'foo_local');

EXPLAIN (VERBOSE, COSTS OFF)
INSERT INTO foo_remote VALUES (generate_series(1, 1), 'c', 'text', 'varchar');

                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Insert on public.foo_remote
   Remote SQL: INSERT INTO public.foo_local(a, b, c, d) VALUES ($1, $2, $3, $4)
   Batch Size: 1
   ->  ProjectSet
         Output: generate_series(1, 1), 'c'::character(1), 'text'::text, 'varchar'::character varying(255)
         ->  Result
(6 rows)
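These examples assume a plain table named foo_local already exists on the remote server that postgres_fdw connects to (and, for the bulk example later on, a similar table named foo_local_bulk). A minimal sketch of that remote-side setup, matching the column list of the foreign table above:

-- On the remote server: the local tables that the foreign tables point to (assumed setup)
CREATE TABLE foo_local (a INTEGER, b CHAR, c TEXT, d VARCHAR(255));
CREATE TABLE foo_local_bulk (a INTEGER, b CHAR, c TEXT, d VARCHAR(255));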
- Execution time with batch_size not specified
EXPLAIN ANALYZE
INSERT INTO foo_remote VALUES (generate_series(1, 100000000), 'c', 'text', 'varchar');

                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Insert on foo_remote  (cost=0.00..500000.02 rows=0 width=0) (actual time=4591443.250..4591443.250 rows=0 loops=1)
   ->  ProjectSet  (cost=0.00..500000.02 rows=100000000 width=560) (actual time=0.006..31749.132 rows=100000000 loops=1)
         ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1)
 Planning Time: 4.988 ms
 Execution Time: 4591447.101 ms     -- timing is important
(5 rows)
- Create a foreign table with batch_size = 10; a table-level batch_size is used when none was specified at server creation (and overrides the server-level setting if both are given)
CREATE FOREIGN TABLE foo_remote (a INTEGER, b CHAR, c TEXT, d VARCHAR(255))
    SERVER postgres_svr
    OPTIONS (table_name 'foo_local', batch_size '10');
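You don't have to recreate objects to change this setting; batch_size can also be added or changed later with ALTER commands. A quick sketch (the value 100 is just an illustrative choice):

-- Add batch_size to an existing server or foreign table
-- (use SET instead of ADD if the option is already present)
ALTER SERVER postgres_svr OPTIONS (ADD batch_size '100');
ALTER FOREIGN TABLE foo_remote OPTIONS (ADD batch_size '100');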
- Create a foreign server with batch_size = 10; every foreign table defined on that server will then use a batch size of 10 by default
CREATE SERVER postgres_svr_bulk
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host '127.0.0.1', batch_size '10');   -- new option batch_size

CREATE USER MAPPING FOR vagrant
    SERVER postgres_svr_bulk
    OPTIONS (user 'postgres', password 'pass');

CREATE FOREIGN TABLE foo_remote_bulk (a INTEGER, b CHAR, c TEXT, d VARCHAR(255))
    SERVER postgres_svr_bulk
    OPTIONS (table_name 'foo_local_bulk');

EXPLAIN (VERBOSE, COSTS OFF)
INSERT INTO foo_remote_bulk VALUES (generate_series(1, 1), 'c', 'text', 'varchar');

                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Insert on public.foo_remote_bulk
   Remote SQL: INSERT INTO public.foo_local_bulk(a, b, c, d) VALUES ($1, $2, $3, $4)
   Batch Size: 10
   ->  ProjectSet
         Output: generate_series(1, 1), 'c'::character(1), 'text'::text, 'varchar'::character varying(255)
         ->  Result
(6 rows)
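If you want to confirm which batch_size is stored on a server or on an individual foreign table, the options are visible in the system catalogs. A quick check, assuming the objects created above:

-- Server-level options (batch_size shows up in srvoptions)
SELECT srvname, srvoptions FROM pg_foreign_server;

-- Table-level options (batch_size, if set, shows up in ftoptions)
SELECT ftrelid::regclass AS table_name, ftoptions FROM pg_foreign_table;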
- Execution time with batch_size = 10:
EXPLAIN ANALYZE
INSERT INTO foo_remote_bulk VALUES (generate_series(1, 100000000), 'c', 'text', 'varchar');

                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Insert on foo_remote_bulk  (cost=0.00..500000.02 rows=0 width=0) (actual time=822224.678..822224.678 rows=0 loops=1)
   ->  ProjectSet  (cost=0.00..500000.02 rows=100000000 width=560) (actual time=0.005..10543.845 rows=100000000 loops=1)
         ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)
 Planning Time: 0.250 ms
 Execution Time: 822239.178 ms     -- timing is important
(5 rows)
Conclusion
PostgreSQL keeps expanding the feature set of foreign data wrappers, and Bulk Insert is another good addition. In the simple test above, inserting 100 million rows with batch_size = 10 took roughly 822 seconds, compared with about 4,591 seconds when rows were sent one at a time, around a 5.6x improvement. Now that this feature is part of the core FDW API, I hope all the other foreign data wrappers will implement it as well.