Friday, April 8, 2011

How do you determine if a database table relationship merits enforcing referential integrity?

I have an application where the majority of the database tables have a strong relationship to one other table. Currently I am enforcing referential integrity with foreign keys, but I'm wondering if this is really the best approach. Data in the primary table can be deleted from an admin interface by business users, which means doing a cascading delete (or writing several delete statements), but I'm not sure I really want to remove all that other data at the same time. It could be a lot of data that *might* be useful at a later date (reporting, maybe?). However, the data in the secondary tables is basically useless to the application itself unless the relationship with the primary table exists.
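To make the trade-off concrete, here is a minimal sketch of the situation described above, using Python's built-in sqlite3 module. The `customers`/`orders` tables and their names are hypothetical stand-ins for the primary and secondary tables; the point is that a foreign key declared with `ON DELETE RESTRICT` enforces referential integrity without cascading, so a parent row cannot be deleted while dependent rows still reference it:

```python
import sqlite3

# Hypothetical schema: "orders" rows depend on a "customers" row.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
            REFERENCES customers(id) ON DELETE RESTRICT
    )
""")
conn.execute("INSERT INTO customers (id) VALUES (1)")
conn.execute("INSERT INTO orders (id, customer_id) VALUES (10, 1)")

# With RESTRICT, deleting the parent while children exist fails,
# forcing the application to decide what happens to the child rows first.
try:
    conn.execute("DELETE FROM customers WHERE id = 1")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True
```

Swapping `ON DELETE RESTRICT` for `ON DELETE CASCADE` would instead silently remove the dependent `orders` rows along with the customer, which is exactly the behavior the question is weighing.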

From Stack Overflow
  • You can use foreign keys and relationships to enforce referential integrity without using cascading deletes. I seldom use cascading deletes; I've found it's usually better to keep the data and manage/archive it well than to delete it.

    Just write your own delete logic to support your own business rules.

    Logical deletes also work well, and I use them extensively.

    Mitch Wheat: +1. I also avoid cascade deletes.
  • Given the option, I always keep data around. And since you already have foreign keys in place, you have some built-in protection from integrity violations.

    If what your users want is to "delete" a record, therefore hiding it from the application, consider the "virtual delete" strategy -- mark a record as inactive, instead of physically removing it from the database.

    As for implementation, depending on your database, add a boolean/bit column to the table. All rows get true/1 by default; "deleted" rows are marked false/0.

  • I would say use foreign key constraints as a rule - this safeguards your database design long-term, as well as the data integrity itself. Constraints also serve to explicitly document a designer's decisions.

    I've seen constraints dropped on extremely large databases - that is one reason not to use them, but only if you measure the performance and find that the foreign key checks impose significant overhead.

  • I'd use logical/soft delete. This basically means adding one more column (possibly bit column Deleted) to the table in question, which would mark a particular row as deleted.

    That said, "deleted" data is just that: deleted. Thus it cannot logically be used in reporting and similar stuff. In order to overcome this, I'd also introduce Hidden column to hide certain rows retaining their logical meaning.

  • You don't want to delete only some of the data - you'll likely end up with orphaned data that you can no longer trace back to where it belonged. It's either all or nothing.

    Soft delete - i.e. having a bit field on every row that determines whether the record is "deleted" - is the way to go. That way, you simply check whether the record is marked deleted in the API and hide it from the application.

    You keep the data, but no one can retrieve it through the application.

  • Never do physical deletes. You can add a BOOL flag IsDeleted to indicate the record is deleted. When you want to "Delete" a record, simply set the flag to True.
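Several of the answers above converge on the same soft-delete pattern. A minimal sketch of it, again using Python's sqlite3 module with a hypothetical `widgets` table, looks like this: the flag column proposed in the answers becomes `is_deleted`, a "delete" is just an update, and the application filters on the flag while reporting queries can still see everything:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE widgets (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        is_deleted INTEGER NOT NULL DEFAULT 0  -- 0 = active, 1 = logically deleted
    )
""")
conn.executemany("INSERT INTO widgets (id, name) VALUES (?, ?)",
                 [(1, "a"), (2, "b")])

# "Deleting" a row is just flipping the flag; the data stays put.
conn.execute("UPDATE widgets SET is_deleted = 1 WHERE id = 2")

# The application only ever reads active rows...
active = conn.execute(
    "SELECT id FROM widgets WHERE is_deleted = 0").fetchall()

# ...while reporting/archival queries can still see every row.
everything = conn.execute("SELECT id FROM widgets").fetchall()
```

A common refinement is to wrap the filtered query in a view (e.g. `CREATE VIEW active_widgets AS SELECT ... WHERE is_deleted = 0`) so application code never has to remember the `WHERE` clause.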
