Thread: Couple of questions on functions
Hello,
Reading the manual recently I came across this ( http://www.postgresql.org/docs/8.3/interactive/xfunc-volatility.html ):
> Because of the snapshotting behavior of MVCC (see Chapter 13) a function containing only SELECT commands can safely be marked
> STABLE, even if it selects from tables that might be undergoing modifications by concurrent queries. PostgreSQL will execute a STABLE
> function using the snapshot established for the calling query, and so it will see a fixed view of the database throughout that query. Also
> note that the current_timestamp family of functions qualify as stable, since their values do not change within a transaction.
It struck me that it might not be all that safe to mark a SELECT-only function STABLE vs. VOLATILE (or vice versa). Consider an example:
create table t1(id int);
create or replace function f1() returns void as
$$
declare
i int;
begin
select count(*) into i from t1;
raise notice '%', i;
-- waste some time
for i in 1..700000000 loop
end loop;
select count(*) into i from t1;
raise notice '%', i;
end;
$$
language 'plpgsql';
Now, in the first connection, do:
select f1();
While the execution is in the loop (which takes a while), do in another connection:
insert into t1 values (1);
The function returns with the following notices:
NOTICE: 0
NOTICE: 1
If I change the volatility of f1() to STABLE and run the above again, I get:
NOTICE: 1
NOTICE: 1
It looks like plpgsql functions (at least) use the most recent snapshot on each call to the SPI manager rather than the snapshot of the calling query, so since the default transaction isolation level in Postgres is READ COMMITTED, concurrent transactions may affect the result of a read-only VOLATILE function. I wonder whether functions in any language (including SQL and C) behave the same way?
Another thing I've recently discovered is that SQL functions seem to be unexpectedly slow to call. Example:
create or replace function f2sql(int) returns int as
$$
select case when $1 < 100000 then 1
when 100000 <= $1 and $1 < 500000 then 2
when $1 >= 500000 then 3
end;
$$
language 'sql' immutable;
create or replace function f2plpgsql(int) returns int as
$$
begin
return case when $1 < 100000 then 1
when 100000 <= $1 and $1 < 500000 then 2
when $1 >= 500000 then 3
end;
end;
$$
language 'plpgsql' immutable;
These two functions do exactly the same calculation on their input and differ only in the language used. Now I write a query involving them and wrap it in another function (so that I can use PERFORM to avoid possible overhead of fetching results to the client, cache the plan, and measure the time more precisely):
create or replace function f3() returns void as
$$
declare
st timestamp;
begin
st := clock_timestamp();
perform f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int)
from generate_series(1, 100000);
raise notice '%', clock_timestamp() - st;
end;
$$
language 'plpgsql' ;
create or replace function f4() returns void as
$$
declare
st timestamp;
begin
st := clock_timestamp();
perform f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int)
from generate_series(1, 100000);
raise notice '%', clock_timestamp() - st;
end;
$$
language 'plpgsql' ;
Now f4() reports 4.2 sec of runtime on average while f3() reports 10.3 sec, which is a notable difference, especially considering that the SQL function is likely to be inlined. Am I missing something?
"Vyacheslav Kalinin" <vka@mgcp.com> writes: > Reading the manual recently I came across this: ( > http://www.postgresql.org/docs/8.3/interactive/xfunc-volatility.html ) >> Because of the snapshotting behavior of MVCC (see Chapter 13<http://www.postgresql.org/docs/8.3/interactive/mvcc.html>) > a function containing only SELECT commands can safely be marked >> STABLE, even if it selects from tables that might be undergoing > modifications by concurrent queries. PostgreSQL will execute a STABLE >> function using the snapshot established for the calling query, and so it > will see a fixed view of the database throughout that query. > It stroke me that it might be not all that safe to mark SELECTing only > function STABLE vs VOLATILE (or vice versa). What it says is that you *can* mark such a function stable, without violating the rules for a stable function. It doesn't say that this choice doesn't affect the results. Feel free to propose better wording... > Another thing I've recently discover is that SQL function seem to be > unexpectedly slow to call. Example: > ... > perform f2sql(trunc(1000000*random())::int) + > ... > Now f4() reports 4.2 sec of runtime on average while f3() - 10.3 sec, that > is a notable difference especially considering that SQL function is likely > to be inlined. Do i miss something? You might think it's "likely", but in fact no inlining will happen because you have a volatile argument that's used more than once inside the function body. Inlining would change the results. regards, tom lane
> You might think it's "likely", but in fact no inlining will happen
> because you have a volatile argument that's used more than once inside
> the function body. Inlining would change the results.
Yeah, right, apparently I overestimated the chances of inlining; thanks for the hint, Tom. In fact, in my project the performance problem was caused by the fact that the analogue of f2sql() was declared SECURITY DEFINER and thus could not be inlined :-/. Uhm, does anyone else think inlining tips are worth mentioning in the docs, or is it just me?
On Tue, 13 May 2008 22:51:00 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What it says is that you *can* mark such a function stable, without
> violating the rules for a stable function. It doesn't say that this
> choice doesn't affect the results. Feel free to propose better
> wording...
I'm confused... Actually f1 seems to be stable: the insert is executed "outside" the function. My understanding is that IMMUTABLE, STABLE and VOLATILE are hints for the optimizer.
Results from an immutable function could be cached across the whole life of the DB if the input parameters are the same:
insert into t (a,b) values (5, fi(3));
insert into t (a,b) values (7, fi(3));
fi *could* be executed just one time.
Results from a stable function could be cached across a statement:
insert into t (a,b) values (fs(3), fs(3));
fs *could* be executed just one time.
Inside *any* function, SELECTs will see the snapshot and the modifications made inside the function, since functions are executed inside an implicit transaction.
Now I read: http://searchwarp.com/swa9860.htm
> Read Committed Isolation Level
> Read Committed is the default isolation level in PostgreSQL. When a
> transaction runs on this isolation level, a SELECT query sees only data
> committed before the query began; it never sees either uncommitted data
> or changes committed during query execution by concurrent transactions.
create table t1(a int);
insert into t1 values (1);
create or replace function ft(out a1 int, out a2 int) as
$$
begin
select into a1 a from t1 limit 1;
for i in 1..700000000 loop
end loop;
select into a2 a from t1 limit 1;
return;
end;
$$
language plpgsql;
select * from ft();
-- meanwhile, in another connection:
update t1 set a=5;
So I'd expect ft() to always return (1,1) or (5,5): since "select * from ft();" is one statement, it should see only data that were committed when the select started. But actually I can obtain (1,5)???
-- Ivan Sergio Borgonovo http://www.webthatworks.it
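[Editor's note: following the earlier f1() experiment in this thread, one way to check this expectation is to re-declare ft() with the STABLE marking and nothing else changed. Per the documentation quoted at the top of the thread, both inner SELECTs should then run under the snapshot established for the calling query, so the mixed (1,5) result should no longer appear. This is a sketch, not a tested claim:]

```sql
-- Same function as above, re-declared STABLE: per the docs quoted
-- earlier, both SELECTs should then use the snapshot of the calling
-- query, yielding (1,1) or (5,5) but never (1,5).
create or replace function ft(out a1 int, out a2 int) as
$$
begin
  select into a1 a from t1 limit 1;
  -- waste some time while the other session commits its update
  for i in 1..700000000 loop
  end loop;
  select into a2 a from t1 limit 1;
  return;
end;
$$
language plpgsql stable;
```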