Home > mailing lists

Re: Databases vs Schemas - Mailing list pgsql-performance

From	Merlin Moncure
Subject	Re: Databases vs Schemas
Date	October 10, 2009 10:26:38
Msg-id	b42b73150910100626u695179edq98b2f21ea3b4c533@mail.gmail.com Whole thread Raw
In response to	Re: Databases vs Schemas (Scott Carey <scott@richrelevance.com>)
List	pgsql-performance

Tree view

On Fri, Oct 9, 2009 at 10:50 PM, Scott Carey <scott@richrelevance.com> wrote:
> On 10/9/09 2:02 PM, "Merlin Moncure" <mmoncure@gmail.com> wrote:
>
>> On Fri, Oct 9, 2009 at 1:46 PM, Scott Otis <scott.otis@intand.com> wrote:
>>> Over the next couple of months we will be creating an instance of our
>>> solution for each public school district in the US which is around 18,000.
>>> That means currently we would be creating 18,000 databases (all on one server
>>> right now  which is running 8.4).  I am assuming this is probably not the
>>> best way of doing things.
>>
>> Schema advantages:
>> *) maintenance advantages; all functions/trigger functions can be
>> shared.  HUGE help if you use them
>> *) can query shared data between schemas without major headaches
>> *) a bit more efficiency especially if private data areas are small.
>> kinda analogous to processes vs threads
>> *) Can manage the complete system without changing database sessions.
>> This is the kicker IMO.
>>
>> Database Advantages:
>> *) More discrete.  Easier to distinctly create, dump, drop, or move to
>> separate server
>> *) Smaller system catalogs might give efficiency benefits
>>
>
> I'm concerned how a system with 57 * 18000 > 1M tables will function.
>
> I've got 200,000 tables in one db (8.4), and some tools barely work.  The
> system catalogs get inefficient when large and psql especially has trouble.
> Tab completion takes forever, even if I make a schema "s" with one table in
> it and type "s." and try and tab complete -- its as if its scanning all
> without a schema qualifier or using an index.  Sometimes it does not match
> valid tables at all, and sometimes regex matching fails too ('\dt
> schema.*_*_*' intermittently flakes out if it returns a lot of matches).
> Other than that the number of tables doesn't seem to cause much performance
> trouble.  The only exception is constraint exclusion which is fundamentally
> broken with too many tables on the performance and memory consumption side.
>
> Having a lot of tables really makes me wish VACUUM, ANALYZE, and other
> maintenance tools could partially matched object names with regex though.
>
> On the other hand, lots of databases probably has performance drawbacks too.
> And its maintenance drawbacks are even bigger.
>
> I certainly don't see any reason at all to try and put all of these in one
> schema.  The only useful choices are schemas vs databases.  I'd go for
> schemas unless the performance issues there are a problem.   Schemas can be
> dumped/restored/backed up independent of one another easily too.

They can, but: drop schema foo cascade; is a different operation than:
drop database foo;  The first is kinda surgical and the second is a
rocket launcher.  What would you rather have in battle?

For the record, just about every database I've ever designed has had
some of what I call 'de facto table partitioning' using
schemas/search_path tricks.  I'm working on a system right now that is
going to get very large and if I started to run into psql problems I'd
probably look at patching it, maybe \set an option  to simplify some
of the queries.

merlin

pgsql-performance by date:

From: tsuraan
Date: 10 October 2009, 02:14:41
Subject: Re: UUID as primary key

From: Chris Kratz
Date: 10 October 2009, 12:17:46
Subject: Re: Databases vs Schemas

Re: Databases vs Schemas - Mailing list pgsql-performance

Previous

Next