Re: RE : RE: Postgresql vs SQLserver for this application ?

Lists: pgsql-performance
From: bsimon(at)loxane(dot)com
To: pgsql-performance(at)postgresql(dot)org
Cc: "Mohan, Ross" <RMohan(at)arbinet(dot)com>
Subject: RE : RE: Postgresql vs SQLserver for this application ?
Date: 2005-04-06 07:17:15
Message-ID: OF9F44D17F.4458ADD0-ONC1256FDB.0027AF25-C1256FDB.002792BB@beauchamp.loxane.fr

Unfortunately.

But we are in the process of choosing PostgreSQL with pgcluster. I'm
currently running some tests (performance, stability...).
Save the money on license fees and put it into your hardware ;-)

I still welcome any advice or comments, and I'll let you know how the
project is going.

Benjamin.

"Mohan, Ross" <RMohan(at)arbinet(dot)com>
05/04/2005 20:48


To: <bsimon(at)loxane(dot)com>
Cc:
Subject: RE: [PERFORM] Postgresql vs SQLserver for this application ?

You never got answers on this? Apologies, I don't have one, but I'd be
curious to hear about any you did get....

thx

Ross
-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org
[mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of bsimon(at)loxane(dot)com
Sent: Monday, April 04, 2005 4:02 AM
To: pgsql-performance(at)postgresql(dot)org
Subject: [PERFORM] Postgresql vs SQLserver for this application ?

Hi all,

We are designing a fairly large application that requires a high-performance
database backend.
The rates we need to sustain are at least 5000 inserts per second and 15
selects per second on a single connection. There should only be 3 or 4
simultaneous connections.
I think our main concern is handling the constant flow of data coming
from the inserts, which must be available for selection as quickly as possible
(a kind of real-time access).

As a consequence, the database should rapidly grow to more than one
hundred gigabytes. We still have to determine how and when we should back up
old data to prevent the application from suffering a performance drop. We
intend to develop some kind of real-time partitioning on our main table to
keep the flow of data manageable.

At first, we were planning to use SQL Server, as it has features that in my
opinion could help us a lot:
- replication
- clustering

Recently we started to study PostgreSQL as a solution for our project:
- it also has replication
- the PostGIS module can handle geographic datatypes (which would
facilitate our development)
- we have strong knowledge of PostgreSQL administration (we
use it for production processes)
- it is free (!) so we could save money for hardware purchases.

Is SQL Server clustering a real asset? How reliable are PostgreSQL
replication tools? Should I trust PostgreSQL performance for this kind
of workload?

My question is a bit fuzzy, but any advice is most welcome...
hardware, tuning or design tips as well :))

Thanks a lot.

Benjamin.


From: Alex Turner <armtuk(at)gmail(dot)com>
To: "bsimon(at)loxane(dot)com" <bsimon(at)loxane(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, "Mohan, Ross" <RMohan(at)arbinet(dot)com>
Subject: Re: RE : RE: Postgresql vs SQLserver for this application ?
Date: 2005-04-06 15:37:30
Message-ID: 33c6269f05040608377dfc9272@mail.gmail.com

I think everyone was scared off by the 5000 inserts per second number.

I've never seen even Oracle do this on a top-end Dell system with
copious SCSI-attached storage.

Alex Turner
netEconomist

On Apr 6, 2005 3:17 AM, bsimon(at)loxane(dot)com <bsimon(at)loxane(dot)com> wrote:
>
> Unfortunately.
>
> But we are in the process of choosing PostgreSQL with pgcluster. I'm
> currently running some tests (performance, stability...).
> Save the money on license fees and put it into your hardware ;-)
>
> I still welcome any advice or comments, and I'll let you know how the
> project is going.
>
> Benjamin.
>
>
>
> "Mohan, Ross" <RMohan(at)arbinet(dot)com>
>
> 05/04/2005 20:48
>
> To: <bsimon(at)loxane(dot)com>
> Cc:
> Subject: RE: [PERFORM] Postgresql vs SQLserver for this
> application ?
>
>
> You never got answers on this? Apologies, I don't have one, but I'd be curious
> to hear about any you did get....
>
> thx
>
> Ross
>
> -----Original Message-----
> From: pgsql-performance-owner(at)postgresql(dot)org
> [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf
> Of bsimon(at)loxane(dot)com
> Sent: Monday, April 04, 2005 4:02 AM
> To: pgsql-performance(at)postgresql(dot)org
> Subject: [PERFORM] Postgresql vs SQLserver for this application ?
>
>
> Hi all,
>
> We are designing a fairly large application that requires a high-performance
> database backend.
> The rates we need to sustain are at least 5000 inserts per second and 15
> selects per second on a single connection. There should only be 3 or 4
> simultaneous connections.
> I think our main concern is handling the constant flow of data coming
> from the inserts, which must be available for selection as quickly as possible
> (a kind of real-time access).
>
> As a consequence, the database should rapidly grow to more than one
> hundred gigabytes. We still have to determine how and when we should back up
> old data to prevent the application from suffering a performance drop. We
> intend to develop some kind of real-time partitioning on our main table to
> keep the flow of data manageable.
>
> At first, we were planning to use SQL Server, as it has features that in my
> opinion could help us a lot:
> - replication
> - clustering
>
> Recently we started to study PostgreSQL as a solution for our project:
> - it also has replication
> - the PostGIS module can handle geographic datatypes (which would
> facilitate our development)
> - we have strong knowledge of PostgreSQL administration (we use
> it for production processes)
> - it is free (!) so we could save money for hardware purchases.
>
> Is SQL Server clustering a real asset? How reliable are PostgreSQL
> replication tools? Should I trust PostgreSQL performance for this kind of
> workload?
>
> My question is a bit fuzzy, but any advice is most welcome...
> hardware, tuning or design tips as well :))
>
> Thanks a lot.
>
> Benjamin.
>
>
>


From: Mischa <mischa(dot)Sandberg(at)telus(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: COPY Hacks (WAS: RE: Postgresql vs SQLserver for this application ?)
Date: 2005-04-06 18:46:39
Message-ID: 1112813199.42542e8f17b4d@webmail.telus.net

This thread seems to be focusing on COPY efficiency, so I'd like to ask
something I got no answer to a few months ago.

Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
I accidentally strung together several \n-terminated input lines,
and sent them to the server with a single "putline".

To my (happy) surprise, I ended up with exactly that number of rows
in the target table.

Is this a bug? Is this fundamental to the protocol?

Since it hasn't been documented (but then, "endcopy" isn't documented either),
I've been shy of investing in perf-testing such mass copy calls.
But if it DOES work, it should reduce the number of network
round trips.

So. Is it a feechur? Worth stress-testing? Could be VERY cool.

--
"Dreams come true, not free."
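For concreteness, the "accidental concatenation" Mischa describes can be sketched as below. This is a Python illustration of the data being built, not a DBD::Pg API; the helper names are hypothetical:

```python
# Sketch: several \n-terminated COPY input lines strung together into one
# buffer, suitable for sending as a single putline-style payload.
# Illustrative only; these helpers are not part of any driver.

def copy_lines(rows):
    """Format each row as one tab-separated, newline-terminated COPY text line."""
    return ["\t".join(str(v) for v in row) + "\n" for row in rows]

def combined_buffer(rows):
    """Concatenate many COPY lines into a single string (one network write)."""
    return "".join(copy_lines(rows))

buf = combined_buffer([(1, "alice"), (2, "bob"), (3, "carol")])
# buf now holds three complete COPY rows; the server parses the stream by
# newlines, so it inserts three rows regardless of how the bytes were sent.
```

The key property is that each row stays newline-terminated, so the combined buffer is byte-for-byte the same stream the server would have received from three separate calls.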


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mischa <mischa(dot)Sandberg(at)telus(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: COPY Hacks (WAS: RE: Postgresql vs SQLserver for this application ?)
Date: 2005-04-06 20:27:13
Message-ID: 5831.1112819233@sss.pgh.pa.us

Mischa <mischa(dot)Sandberg(at)telus(dot)net> writes:
> Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
> I accidentally strung together several \n-terminated input lines,
> and sent them to the server with a single "putline".

> To my (happy) surprise, I ended up with exactly that number of rows
> in the target table.

> Is this a bug?

No, it's the way it's supposed to work. "putline" really just sends a
stream of data ... there's no semantic significance to the number of
putline calls you use to send the stream, only to the contents of the
stream. (By the same token, it's unlikely that deliberately aggregating
such calls would be much of a win.)

regards, tom lane
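Tom's point, that only the contents of the stream matter and not the call boundaries, can be shown in miniature. This is a Python sketch with a stand-in for the server side; `send_stream` is hypothetical, not a real libpq or DBD::Pg function:

```python
# Sketch: the COPY data stream is just bytes. Splitting the same stream
# across any number of "putline"-style calls yields an identical result,
# even if a split falls in the middle of a row.

def send_stream(chunks):
    """Stand-in for the server accumulating whatever chunks the client sends."""
    return "".join(chunks)

data = "1\talice\n2\tbob\n3\tcarol\n"

one_call = send_stream([data])                                   # one call
per_row = send_stream(["1\talice\n", "2\tbob\n", "3\tcarol\n"])  # one call per row
odd_splits = send_stream([data[:5], data[5:11], data[11:]])      # splits mid-row

# All three deliver an identical stream, hence identical rows server-side.
```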


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Mischa <mischa(dot)Sandberg(at)telus(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: COPY Hacks (WAS: RE: Postgresql vs SQLserver for this
Date: 2005-04-07 02:04:26
Message-ID: 4254952A.7090705@familyhealth.com.au

> Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
> I accidentally strung together several \n-terminated input lines,
> and sent them to the server with a single "putline".
>
> To my (happy) surprise, I ended up with exactly that number of rows
> in the target table.
>
> Is this a bug? Is this fundamental to the protocol?
>
> Since it hasn't been documented (but then, "endcopy" isn't documented),
> I've been shy of investing in perf testing such mass copy calls.
> But, if it DOES work, it should be reducing the number of network
> roundtrips.

I think it's documented in the libpq docs...

Chris


From: Harald Fuchs <use_reply_to(at)protecting(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: COPY Hacks (WAS: RE: Postgresql vs SQLserver for this application ?)
Date: 2005-04-07 12:21:53
Message-ID: puwtreg4ke.fsf@srv.protecting.net

In article <1112813199(dot)42542e8f17b4d(at)webmail(dot)telus(dot)net>,
Mischa <mischa(dot)Sandberg(at)telus(dot)net> writes:

> This thread seems to be focusing in on COPY efficiency,
> I'd like to ask something I got no answer to, a few months ago.

> Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
> I accidentally strung together several \n-terminated input lines,
> and sent them to the server with a single "putline".

> To my (happy) surprise, I ended up with exactly that number of rows
> in the target table.

> Is this a bug? Is this fundamental to the protocol?

> Since it hasn't been documented (but then, "endcopy" isn't documented),
> I've been shy of investing in perf testing such mass copy calls.
> But, if it DOES work, it should be reducing the number of network
> roundtrips.

> So. Is it a feechur? Worth stress-testing? Could be VERY cool.

Using COPY from DBD::Pg _is_ documented - provided you use DBD::Pg
version 1.41, released just today.


From: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: COPY Hacks (WAS: RE: Postgresql vs SQLserver for this application ?)
Date: 2005-04-08 02:03:18
Message-ID: 067c42864e24cd39b83c335da6adc8b5@biglumber.com


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
> I accidentally strung together several \n-terminated input lines,
> and sent them to the server with a single "putline".
...
> So. Is it a feechur? Worth stress-testing? Could be VERY cool.

As explained elsewhere, not really a feature, more of a side-effect.
Keep in mind, however, that any network round-trip time saved has to
be balanced against some additional overhead of constructing the
combined strings in Perl before sending them over. Most times COPY
is used to parse a newline-separated file anyway. If you have a slow
network connection to the database, it *might* be a win, but my
limited testing shows that it is not an advantage for a "normal"
connection: I added 1 million rows via COPY using the normal way
(1 million pg_putline calls), via pg_putline of 1000 rows at a
time, and via 10,000 rows at a time. They all ran in 22 seconds,
with no statistical difference between them. (This was the "real" time,
the system time was actually much lower for the combined calls).

It can't hurt to test things out on your particular system and see
if it makes a real difference: it certainly does no harm as long as
you make sure the string you send always *ends* in a newline.

- --
Greg Sabino Mullane greg(at)turnstep(dot)com
PGP Key: 0x14964AC8 200504072201
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iD8DBQFCVeZrvJuQZxSWSsgRAoP+AJ9jTNetePMwKv9rdyu6Lz+BjSiDOQCguoSU
ie9TaeIxUuvd5fhjFueacvM=
=1hWn
-----END PGP SIGNATURE-----
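The batching Greg measured (1,000 and 10,000 rows per call) amounts to a simple chunking step before sending. A minimal Python sketch of that idea, not his actual Perl test harness:

```python
# Sketch: group newline-terminated COPY lines into fixed-size batches,
# so each batch can be sent with a single putline-style call instead of
# one call per row. Illustrative; the batching helper is hypothetical.

def chunked(lines, size):
    """Yield strings of up to `size` COPY lines joined together."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == size:
            yield "".join(batch)
            batch = []
    if batch:
        yield "".join(batch)  # final partial batch still ends in a newline

lines = [f"{i}\trow{i}\n" for i in range(10)]
batches = list(chunked(lines, 4))
# 10 rows in batches of 4 -> 3 network writes instead of 10, with the
# concatenated stream identical to sending the lines one by one.
```

As Greg's numbers suggest, on a fast local connection this buys little; it matters mainly when each write incurs a full network round trip.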


From: Mischa Sandberg <mischa(dot)sandberg(at)telus(dot)net>
To: Greg Sabino Mullane <greg(at)turnstep(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: multi-line copy (was: Re: COPY Hacks)
Date: 2005-04-08 04:53:22
Message-ID: 1112936002.42560e420adf7@webmail.telus.net


Quoting Greg Sabino Mullane <greg(at)turnstep(dot)com>:

> > Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
> > I accidentally strung together several \n-terminated input lines,
> > and sent them to the server with a single "putline".
> ...
> > So. Is it a feechur? Worth stress-testing? Could be VERY cool.
>
> As explained elsewhere, not really a feature, more of a side-effect.
> Keep in mind, however, that any network round-trip time saved has to
> be balanced against some additional overhead of constructing the
> combined strings in Perl before sending them over. Most times COPY
> is used to parse a newline-separated file anyway. If you have a slow
> network connection to the database, it *might* be a win, but my
> limited testing shows that it is not an advantage for a "normal"
> connection: I added 1 million rows via COPY using the normal way
> (1 million pg_putline calls), via pg_putline of 1000 rows at a
> time, and via 10,000 rows at a time. They all ran in 22 seconds,
> with no statistical difference between them. (This was the "real" time,
> the system time was actually much lower for the combined calls).
>
> It can't hurt to test things out on your particular system and see
> if it makes a real difference: it certainly does no harm as long as
> you make sure the string you send always *ends* in a newline.

Many thanks for digging into it.

For the app I'm working with, the time delay between rows being posted
is /just/ enough to exceed the TCP Nagle delay, so every row goes across
in its own packet :-( Reducing the number of network round trips
by a factor of 40 is enough to cut elapsed time in half.
The cost of join("",@FortyRows), which produces a 1-4K string,
is negligible in this case.

--
"Dreams come true, not free" -- S.Sondheim, ITW