I would like to know from the experienced programmers what they consider to be the most important considerations when designing a new database.
-
- Enforcing the integrity of the data. Foreign keys and primary keys are important and should be defined in the database (not solely enforced by the application).
- No duplication; store data in one place. EDIT: In the case of data warehousing, duplication may be acceptable.
- Primary keys should be one column, not multiple. EDIT: Multiple column primary keys are acceptable when foreign key columns make up part of the primary key. However, I have issues with, for example, a Person table that has a first name, last name, and address column that all make up the primary key. In this case, there should be a surrogate primary key.
DancesWithBamboo : #3 is absolutely wrong. Tables holding many-to-many relationships will always have at least 2 columns as part of the key.chburd : depending on your use, #2 may be wrong. datawarehouse often denormalize some data to save time on big long queriesMitch Wheat : #3 is totally wrong (as pointed out already) -
Fidelity with the real world entities that the database is supposed to model.
Steven A. Lowe : +1 if you don't have that, you will have a mess that will never work right. -
I'd personally suggest picking up or borrowing a copy of "Database Design for Mere Mortals". Everything you'd ever need to consider in designing a database would be listed in that book, and it's in a very methodical and logical order in which you can build out the database. The Table and Column definitions are tedious, but worth every minute used in the end.
I believe you might be able to read the first chapter if the book via Google Books or via the page preview on Amazon.com.
There are some tidbits you can learn over time or from this site as 'best practices', but nothing beats designing it from the ground up the correct way on the first try.
-
First off and most important learn and understand the business domain.
1) Are you looking at a high transaction rate like a busy web site, or low use like a a small company HR system
2) Is security a big issue - are you handling personal details, or financial data. Or is it just a product catalogue
3) Will your users be doing many updates/inserts or is it mainly read only
4) How many users, what are the usage patterns (peak load or evenly distributed)
5) Do you need 24x7, 16x5 or other uptime, 24x7 is much harder to do as you have no down time for maintenance
6) How big is the DB going to go? If it's really big you'll have to design your tables to take account of that and/or partition
7) Do you need to look at enterprise cluster with hot fail over, or just normal hosting
8) How will the DB be adminstered, in most DB projects 95% of the effort is spent developing for the users and their applications, DB admin is forgotten
9) DB Admin, from previous includes backups, changes to other systems, integration to other systems, data loading
10) Actually Data loading and using existing data is another big issue in its own right.
That's it for a start
-
The database is secondary to your business process design and should cleanly support your business process in a direct and simple way. You will gain far more benefit from a well-formed, clean, entity model than you will from an index here and there. So once your process is defined, you take it and split it up into "entities" as cleanly as possible with relations that make sense. Once you know your entities, they translate into database tables.
One of the most important things to do is to not overarchitect.
To give you an answer with some teeth, let's take a "vehicle" entity as an example. A vehicle has multiple wheels. You have a critical decision to make knowing that there will be multiple wheels attached to the vehicle. You have 2 choices to make - You can make "wheels" a separate entity, or you can make "number of wheels" an integer field in the "Vehicle" entity.
If you absolutely know that you will need to store lots of changing information about each wheel, then create a "Wheel" entity. You now have a relationship between entities (the car and the wheel).
If not, a simple field will do just fine.
Thinking through these critical decisions and making things as simple as possible is by far the most important thing for me when designing a database. It can make the difference between things being really easy and really difficult when you build the next layer(s) of your application.
-
A basic set of points:
- Determine the purpose of your system.
- Identify the entities that your system will need.
- Identify which information each entity should provide.
- Identify the exiting relationships between your entities
- What would the user want to know about and do with your data.
- Conceptual and Logical Database Design
- Normalization and ERD
- Identify fields with unique values.
- Select the appropriate data types for your fields.
- Database refactoring.
-
you also have to understand what the database will be used for. if it's for transactions (OLTP), it should be as normalised as possible, and the goal is for transactions to be completed as quickly as possible. if it's for analysis and/or reporting (OLAP), it should be denormalised in many places where you'll be performing aggregation. design considerations for OLTP databases are not applicable to OLAP databases, and vice-versa.
-
Know your data.
-
who's going to build and maintain it, where , how and with what. DO you have methods and procedures and processes for doing this or just winging it. Certainly the Business needs drive the data needed which should be captured in an implementation independent ERD. But, you also have to think about who will maintain the data over time. As well as who "owns" the data. Who owns entity and attribute definitions.
-
The information requirements are the most important part.
This is another way of saying "determine the purpose of your system", from a response provided by CMS.
Conceptual data modeling is just an organized way of collecting and presenting the information requirements. Every value stored and served up by the database is connected to an attribute, and every attribute to a domain. Attributes, in turn, describe either entities or relationships among entities. Subject matter entities and relationships make up the conceptual structure of the "real world" that the data describes. Building an ERD from a conceptual model is easy, although tedious.
After that, you can pick a DBMS, design the logical database, design the physical database, and build. At each step, the decisions you make are more reversible, due to data independence. Data independence encapsulates design decisions inside the database, except for performance consequences. You have to know your tool, of course.
Having a tool for managing models, and converting them to diagrams and scripts can be very helpful at speeding this process up, and cutting down on errors.
But if you have serious errors or omissions in your information requirments, no amount of clever design or implementation is going to make up for that.
-
1 - Consistency
Over time your database will change and other people will need to work with it. Do yourself and them a favor and make sure that the structures are named in such a way that any reasonable person with basic domain knowledge will be able to anticipate the contents of the table. Take the time to write down (could be a simple as notepad) some basic constructs that you use.
Examples:
- Primary keys all start with IdTableName
- Casing of table names is Pascal
- Foreign keys are all TableNameId
- ext...
Whether you choose to use underscores or not (substitute any other conversion for underscores) doesn't really matter at the end of the day as long as you are consistent in the way that you use or don't use them.
Your database is the last line of defense for data integrity. Do all of your data access through stored procedures and enforce the integrity of the data by using check constraints, foreign keys and so on. Type the data correctly, don't use VARCHAR(50) when CHAR(5) is more specific and accurate.
Someone else mentioned something about keeping it simple. Last but not least don't build something because you "think" you will need it next month. Things change quickly and you will end up doing more maintenance on stuff you "thought" you were going to use rather than things that you are using if you fill your database will stuff that serves no purpose.
-
A good database can be judged as follows:
If a database is properly designed, you should be able to understand how a business operates by looking at nothing more then the schema.
In other words, the database is the business. If the database isn't reflecting how the business is run, either the database is wrong or the business is wrong.
The database is also one of the few things that you really, really need to nail upfront. You can always fix bad code, but rarely can you back out of a bad schema change. Make sure to do it right.
0 comments:
Post a Comment