Re: Databases vs Schemas - Mailing list pgsql-performance
| From | Merlin Moncure |
|---|---|
| Subject | Re: Databases vs Schemas |
| Date | |
| Msg-id | b42b73150910100626u695179edq98b2f21ea3b4c533@mail.gmail.com Whole thread Raw |
| In response to | Re: Databases vs Schemas (Scott Carey <scott@richrelevance.com>) |
| List | pgsql-performance |
On Fri, Oct 9, 2009 at 10:50 PM, Scott Carey <scott@richrelevance.com> wrote:
> On 10/9/09 2:02 PM, "Merlin Moncure" <mmoncure@gmail.com> wrote:
>
>> On Fri, Oct 9, 2009 at 1:46 PM, Scott Otis <scott.otis@intand.com> wrote:
>>> Over the next couple of months we will be creating an instance of our
>>> solution for each public school district in the US which is around 18,000.
>>> That means currently we would be creating 18,000 databases (all on one server
>>> right now which is running 8.4). I am assuming this is probably not the
>>> best way of doing things.
>>
>> Schema advantages:
>> *) maintenance advantages; all functions/trigger functions can be
>> shared. HUGE help if you use them
>> *) can query shared data between schemas without major headaches
>> *) a bit more efficiency especially if private data areas are small.
>> kinda analogous to processes vs threads
>> *) Can manage the complete system without changing database sessions.
>> This is the kicker IMO.
>>
>> Database Advantages:
>> *) More discrete. Easier to distinctly create, dump, drop, or move to
>> separate server
>> *) Smaller system catalogs might give efficiency benefits
>>
>
> I'm concerned how a system with 57 * 18000 > 1M tables will function.
>
> I've got 200,000 tables in one db (8.4), and some tools barely work. The
> system catalogs get inefficient when large and psql especially has trouble.
> Tab completion takes forever, even if I make a schema "s" with one table in
> it and type "s." and try and tab complete -- its as if its scanning all
> without a schema qualifier or using an index. Sometimes it does not match
> valid tables at all, and sometimes regex matching fails too ('\dt
> schema.*_*_*' intermittently flakes out if it returns a lot of matches).
> Other than that the number of tables doesn't seem to cause much performance
> trouble. The only exception is constraint exclusion which is fundamentally
> broken with too many tables on the performance and memory consumption side.
>
> Having a lot of tables really makes me wish VACUUM, ANALYZE, and other
> maintenance tools could partially matched object names with regex though.
>
> On the other hand, lots of databases probably has performance drawbacks too.
> And its maintenance drawbacks are even bigger.
>
> I certainly don't see any reason at all to try and put all of these in one
> schema. The only useful choices are schemas vs databases. I'd go for
> schemas unless the performance issues there are a problem. Schemas can be
> dumped/restored/backed up independent of one another easily too.
They can, but: drop schema foo cascade; is a different operation than:
drop database foo; The first is kinda surgical and the second is a
rocket launcher. What would you rather have in battle?
For the record, just about every database I've ever designed has had
some of what I call 'de facto table partitioning' using
schemas/search_path tricks. I'm working on a system right now that is
going to get very large and if I started to run into psql problems I'd
probably look at patching it, maybe \set an option to simplify some
of the queries.
merlin
pgsql-performance by date: