Thread: Couple of questions on functions
Hello,
Reading the manual recently I came across this ( http://www.postgresql.org/docs/8.3/interactive/xfunc-volatility.html ):
> Because of the snapshotting behavior of MVCC (see Chapter 13) a function containing only SELECT commands can safely be marked
> STABLE, even if it selects from tables that might be undergoing modifications by concurrent queries. PostgreSQL will execute a STABLE
> function using the snapshot established for the calling query, and so it will see a fixed view of the database throughout that query. Also
> note that the current_timestamp family of functions qualify as stable, since their values do not change within a transaction.
It struck me that it might not be all that safe to mark a SELECT-only function STABLE vs. VOLATILE (or vice versa). Consider an example:
create table t1(id int);
create or replace function f1() returns void as
$$
declare
i int;
begin
select count(*) into i from t1;
raise notice '%', i;
-- waste some time
for i in 1..700000000 loop
end loop;
select count(*) into i from t1;
raise notice '%', i;
end;
$$
language 'plpgsql';
Now, in the first connection, do:
select f1();
While the execution is in the loop (which takes a while), do in another connection:
insert into t1 values (1);
The function returns with the following notices:
NOTICE: 0
NOTICE: 1
If I change the volatility of f1() to STABLE and run the above again, I get:
NOTICE: 1
NOTICE: 1
It looks like plpgsql functions (at least) use the most recent snapshot on each call to the SPI manager rather than the snapshot of the calling query, so since the default transaction isolation level in Postgres is READ COMMITTED, concurrent transactions may affect the result of a read-only VOLATILE function. I wonder whether functions in any language (including SQL and C) behave the same way?
Another thing I've recently discovered is that SQL functions seem to be unexpectedly slow to call. Example:
create or replace function f2sql(int) returns int as
$$
select case when $1 < 100000 then 1
when 100000 <= $1 and $1 < 500000 then 2
when $1 >= 500000 then 3
end;
$$
language 'sql' immutable;
create or replace function f2plpgsql(int) returns int as
$$
begin
return case when $1 < 100000 then 1
when 100000 <= $1 and $1 < 500000 then 2
when $1 >= 500000 then 3
end;
end;
$$
language 'plpgsql' immutable;
These two functions do exactly the same calculation on their input and differ only in the language used. Now I write a query involving them and wrap it in another function (so that I can use PERFORM to avoid possible overhead of fetching results to the client, cache the plan, and measure the time more precisely):
create or replace function f3() returns void as
$$
declare
st timestamp;
begin
st := clock_timestamp();
perform f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int)
from generate_series(1, 100000);
raise notice '%', clock_timestamp() - st;
end;
$$
language 'plpgsql' ;
create or replace function f4() returns void as
$$
declare
st timestamp;
begin
st := clock_timestamp();
perform f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int)
from generate_series(1, 100000);
raise notice '%', clock_timestamp() - st;
end;
$$
language 'plpgsql' ;
Now f4() reports 4.2 sec of runtime on average while f3() reports 10.3 sec, which is a notable difference, especially considering that the SQL function is likely to be inlined. Am I missing something?
"Vyacheslav Kalinin" <vka@mgcp.com> writes: > Reading the manual recently I came across this: ( > http://www.postgresql.org/docs/8.3/interactive/xfunc-volatility.html ) >> Because of the snapshotting behavior of MVCC (see Chapter 13<http://www.postgresql.org/docs/8.3/interactive/mvcc.html>) > a function containing only SELECT commands can safely be marked >> STABLE, even if it selects from tables that might be undergoing > modifications by concurrent queries. PostgreSQL will execute a STABLE >> function using the snapshot established for the calling query, and so it > will see a fixed view of the database throughout that query. > It stroke me that it might be not all that safe to mark SELECTing only > function STABLE vs VOLATILE (or vice versa). What it says is that you *can* mark such a function stable, without violating the rules for a stable function. It doesn't say that this choice doesn't affect the results. Feel free to propose better wording... > Another thing I've recently discover is that SQL function seem to be > unexpectedly slow to call. Example: > ... > perform f2sql(trunc(1000000*random())::int) + > ... > Now f4() reports 4.2 sec of runtime on average while f3() - 10.3 sec, that > is a notable difference especially considering that SQL function is likely > to be inlined. Do i miss something? You might think it's "likely", but in fact no inlining will happen because you have a volatile argument that's used more than once inside the function body. Inlining would change the results. regards, tom lane
> You might think it's "likely", but in fact no inlining will happen
> because you have a volatile argument that's used more than once inside
> the function body. Inlining would change the results.
Yeah, right, apparently I overestimated the chances of inlining; thanks for the hint, Tom. In fact, in my project the performance problem was caused by the fact that the analogue of f2sql() was declared SECURITY DEFINER and thus could not be inlined :-/. Uhm, does anyone else think inlining tips are worth mentioning in the docs, or is it just me?
On Tue, 13 May 2008 22:51:00 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What it says is that you *can* mark such a function stable, without
> violating the rules for a stable function. It doesn't say that this
> choice doesn't affect the results. Feel free to propose better
> wording...
I'm confused... Actually f1 seems to be stable: the insert is executed "outside" the function. My understanding is that IMMUTABLE, STABLE and VOLATILE are hints for the optimizer.
Results from an immutable function could be cached across the whole life of the DB if the input parameters are the same:
insert into t (a,b) values (5, fi(3));
insert into t (a,b) values (7, fi(3));
fi *could* be executed just one time.
Results from a stable function could be cached across a statement:
insert into t (a,b) values (fs(3), fs(3));
fs *could* be executed just one time.
Inside *any* function, SELECTs will see the snapshot and the modifications made inside the function, since functions are executed inside an implicit transaction.
Now I read: http://searchwarp.com/swa9860.htm
> Read Committed Isolation Level
> Read Committed is the default isolation level in PostgreSQL. When a
> transaction runs on this isolation level, a SELECT query sees only data
> committed before the query began; it never sees either uncommitted data
> or changes committed during query execution by concurrent transactions.
create table t1(a int);
insert into t1 values (1);
create or replace function ft(out a1 int, out a2 int) as
$$
begin
select into a1 a from t1 limit 1;
for i in 1..700000000 loop
end loop;
select into a2 a from t1 limit 1;
return;
end;
$$
language plpgsql;
select * from ft();
-- meanwhile, in another connection:
update t1 set a=5;
So I'd expect ft() to always return (1,1) or (5,5): since "select * from ft();" is one statement, it should see only data that were committed when the select started. But actually I can obtain (1,5)???
-- Ivan Sergio Borgonovo http://www.webthatworks.it
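[Editor's note: following the earlier f1() experiment in this thread, one way to check this expectation is to re-declare ft() with the STABLE marking and nothing else changed. Per the documentation quoted at the top of the thread, both inner SELECTs should then run under the snapshot established for the calling query, so the mixed (1,5) result should no longer appear. This is a sketch, not a tested claim:]

```sql
-- Same function as above, re-declared STABLE: per the docs quoted
-- earlier, both SELECTs should then use the snapshot of the calling
-- query, yielding (1,1) or (5,5) but never (1,5).
create or replace function ft(out a1 int, out a2 int) as
$$
begin
  select into a1 a from t1 limit 1;
  -- waste some time while the other session commits its update
  for i in 1..700000000 loop
  end loop;
  select into a2 a from t1 limit 1;
  return;
end;
$$
language plpgsql stable;
```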