Thursday, April 28, 2011

Which table structure is better

We have to create a few tables for the sole purpose of reporting in Oracle.

Option 1

Receivable Table

  • RefNo
  • Date
  • TrnType, e.g. Tax, Fee, Premium
  • Amount

Option 2

Receivable Table

  • RefNo
  • Date
  • Tax
  • Fee
  • Premium

Note: For a given RefNo, all of the types Tax, Fee and Premium, or any subset of them, can exist.

What would be the optimal structure? (The tables will have over 100k records.)
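The two options can be sketched as DDL. Below is a minimal runnable sketch using Python's sqlite3 (table and column names are illustrative, and the types would differ in Oracle); it stores the same receivable both ways and shows the totals agree:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Option 1: one row per (RefNo, TrnType) pair
conn.execute("""
    CREATE TABLE receivable_rows (
        RefNo   INTEGER,
        TrnDate TEXT,
        TrnType TEXT,    -- 'Tax', 'Fee' or 'Premium'
        Amount  REAL
    )""")

# Option 2: one row per RefNo, one column per type
conn.execute("""
    CREATE TABLE receivable_cols (
        RefNo   INTEGER,
        TrnDate TEXT,
        Tax     REAL,
        Fee     REAL,
        Premium REAL
    )""")

# The same receivable stored both ways (RefNo 1 has no Premium)
conn.executemany(
    "INSERT INTO receivable_rows VALUES (?, ?, ?, ?)",
    [(1, "2011-04-28", "Tax", 10.0),
     (1, "2011-04-28", "Fee", 5.0)])
conn.execute(
    "INSERT INTO receivable_cols VALUES (1, '2011-04-28', 10.0, 5.0, 0.0)")

# The total per RefNo comes out the same either way
row_total = conn.execute(
    "SELECT SUM(Amount) FROM receivable_rows WHERE RefNo = 1").fetchone()[0]
col_total = conn.execute(
    "SELECT Tax + Fee + Premium FROM receivable_cols WHERE RefNo = 1").fetchone()[0]
print(row_total, col_total)  # 15.0 15.0
```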

From Stack Overflow
  • If your transaction types are explicitly defined and unlikely to be sparsely populated (ie, most records will have values for all 3) then the latter is more likely to be your best option. It also represents the data in a format that's closer to how you'd think about it in reality.

    Even if the values can be sparse, "gut instinct" still makes me lean toward the column-based approach rather than the row-based one.

  • Neither of those is actually the best (in terms of how DBAs think). The best would be (assuming RefNo is unique and therefore the primary key):

    Receivables:
        RefNo
        Date
    ReceivableDollarVals:
        RefNo
        Type
        Amount
    

    If RefNo/Date is the primary key, add date to the second table as well.

    This allows you to minimize storage space for those rows that don't have all three types (although the savings are minimal). Then you use WHERE clauses combining the two tables (or JOINs) to do your queries.

    It also allows you to add other types at will without restructuring the database.

    However, you need to keep in mind that third normal form is an ideal. It's quite acceptable to violate the rules to gain performance as long as you understand the implications.

    100,000 records is actually quite small so, unless you think you're going to be adding more types in the near future, I'd go for your option 2 and use zeros for those values that don't exist. NULLs would most likely make your queries a little more complicated.

    mghie : +1. It also doesn't have the potential to give different dates for the same RefNo.
    Adam Robinson : Storing the data in discrete fields rather than individual rows is not necessarily a violation of 3NF. Violating 3NF would mean that the values are not directly related to the primary key (RefNo). I'd be very surprised if this was the case.
    paxdiablo : Good point, @Adam, since 3NF is just column depends on key, whole key and nothing but key. We still design our tables to avoid the ways given in the question but that's because we tend to have LOTS of 'columns' which may be NULL. Maybe I should call my approach 3.5NF :-)
    Eranga : Clarification: as I've mentioned, this table is included just for reporting, so denormalization is acceptable (since lots of joins would be required to get summaries without this type of table).
    paxdiablo : @SecretWiz, your expected table size (both columns [<10] and rows [100K]) will not cause any serious degradation in storage, and the speed you will gain from a non-normalized table should be more than noticeable. I'd still go for option 2 in your particular case.
    Adam Robinson : @PAX: The advisability of that depends on a number of factors (how sparse it really is, etc.). In this scenario--while I'm obviously not entirely familiar--it doesn't really seem like you're going to have a majority of rows with only one or two of the values. This matters more with many fields.
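The normalized two-table design above can still produce reports in the Option 2 shape with a JOIN plus conditional aggregation. A runnable sketch using Python's sqlite3 (names follow the answer above; the pivot pattern is the standard one, not something from the original post):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Receivables (RefNo INTEGER PRIMARY KEY, TrnDate TEXT)")
conn.execute("""
    CREATE TABLE ReceivableDollarVals (
        RefNo  INTEGER REFERENCES Receivables(RefNo),
        Type   TEXT CHECK (Type IN ('Tax', 'Fee', 'Premium')),
        Amount REAL
    )""")

conn.execute("INSERT INTO Receivables VALUES (1, '2011-04-28')")
# Sparse data: RefNo 1 has Tax and Premium but no Fee
conn.executemany("INSERT INTO ReceivableDollarVals VALUES (?, ?, ?)",
                 [(1, 'Tax', 10.0), (1, 'Premium', 20.0)])

# Pivot the rows back into the column-per-type report shape,
# filling missing types with zero
report = conn.execute("""
    SELECT r.RefNo, r.TrnDate,
           COALESCE(SUM(CASE WHEN v.Type = 'Tax'     THEN v.Amount END), 0) AS Tax,
           COALESCE(SUM(CASE WHEN v.Type = 'Fee'     THEN v.Amount END), 0) AS Fee,
           COALESCE(SUM(CASE WHEN v.Type = 'Premium' THEN v.Amount END), 0) AS Premium
    FROM Receivables r
    LEFT JOIN ReceivableDollarVals v ON v.RefNo = r.RefNo
    GROUP BY r.RefNo, r.TrnDate""").fetchone()
print(report)  # (1, '2011-04-28', 10.0, 0, 20.0)
```

The CHECK constraint plays the role of the "check constraint to control the types" mentioned in the answers; Oracle also has a dedicated PIVOT clause that can replace the CASE expressions.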
  • The real advantage of the fully normalized version in the first answer comes when the requirements change -- when someone changes the specs so you have to add types beyond the 3 you've identified.

    Like discount, refund, whatever. Those changes do happen.

    The normalized structure should let you do that more easily, without needing to change either table structure or most of the programs that use the data.

    But the normalized structure does require more investment in the beginning -- every new transaction involves inserting into 2 tables, you need to have a check constraint to control the types, etc.

    Generally, you'll do better in the long term with the normalized structure. However, with a simple case like this, you can sometimes get away without normalizing and not have to pay the consequences (at least, nobody has to pay until you're long gone and it's somebody else's problem).

    Professionally, reasonable levels of normalization should be your standard strategy and you should require yourself to have very good reasons for denormalizing.
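The "requirements change" point can be made concrete: in the row-based layout a new type such as Discount is just a new value, while the column-per-type layout needs DDL first. A sqlite3 sketch (names are illustrative; note that if the row-based table carried a CHECK constraint on the type column, that constraint would also need updating):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE receivable_rows (
    RefNo INTEGER, TrnType TEXT, Amount REAL)""")
conn.execute("""CREATE TABLE receivable_cols (
    RefNo INTEGER, Tax REAL, Fee REAL, Premium REAL)""")

# Row-based: a new 'Discount' type is just another row, no DDL needed
conn.execute("INSERT INTO receivable_rows VALUES (2, 'Discount', -3.0)")

# Column-based: the schema itself must change before the data can land
conn.execute("ALTER TABLE receivable_cols ADD COLUMN Discount REAL DEFAULT 0")
conn.execute("INSERT INTO receivable_cols VALUES (2, 0, 0, 0, -3.0)")

cols = [c[1] for c in conn.execute("PRAGMA table_info(receivable_cols)")]
print(cols)  # ['RefNo', 'Tax', 'Fee', 'Premium', 'Discount']
```

Every existing query against the widened table that uses `SELECT *` or sums the type columns also has to be revisited, which is the maintenance cost the answer above is warning about.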

  • If the table will have over 100k records, then Option 2 is the better choice, because Option 1 slows down data access.
