Navigation
Search
|
Database design tips for developers
Wednesday July 16, 2025. 11:00 AM , from InfoWorld
It is a universal truth that everything in software eventually rots. Things slowly get worse. What starts out as cool and clean becomes “what the hell were they thinking” before you know it.
But that doesn’t mean we sit back and let it happen. We should fight code rot like a cat that just realized it has socks on. One area of code rot that doesn’t get enough attention is database design. Now, there are some big and important rules for designing databases, but I’m not talking about those here. Here I’ll focus on some lesser database design tips that might not be on your radar. Let’s call them small and important rules. However, I will say this about those big rules: If you are designing a database and you don’t know what database normalization is, stop right now and go learn. Don’t even consider designing a database until you understand second normal form and third normal form. And try for third normal form. Every table has an ID field I know this is debatable, but I believe that every single table in your database should have a primary key named ID. That’s it, just ID. Not CustomerID, not OrderID, just ID. It should be an auto-incremented integer value (or maybe a UUID if you have a really, really good reason, like a distributed system). There should, of course, be an index on the ID field. It should be a very, very rare case that you would need a multiple-field key for a table that isn’t a cross-reference for a many-to-many relationship. Never put spaces in table or field names I hereby heap eternal scorn and shame upon the person who thought it would be a good idea to include spaces in table or field names. Just don’t. Spaces in names make you use quotation marks, which you forget, and they make you ask, “Is that with or without spaces?” when you are writing a query. It’s a pain in the ass. Never use spaces and you’ll never have to wonder about it again. And for the sake of sweet baby Elvis, please don’t use underscores in names, either. I have no idea how people can stand typing names_like_this. My pinky wants to file a workman’s comp claim just thinking about it. Table names are plural Again, this is a great debate, but I believe that tables represent many things, not one thing. Thus, table names should always be plural. Customers, not Customer. This way, when you see the word Orders, you know it is referring to the table. If you name the table Order, you create an ambiguity around the word “order.” Are you talking about the table or a row in the table? I know a lot of ink has been spilled on this topic. I strongly prefer plural names. Whatever you do, pick one system and stick with it. Foreign keys should be clearly labeled Remember what I said above about the ID field? Here’s where it comes into play. If you have a row in the Orders table that references a customer (i.e., a foreign key), give it the name CustomerID. Any field named ID will always be a foreign key to the table. Do this consistently throughout your schema, and it will always be clear which fields are foreign keys and which table those fields refer back to. Index what you query Put an index on every field that shows up in a WHERE, JOIN, or ORDER BY clause. Do this religiously and you’ll avoid many performance problems down the road. There may be exceptions, but you should find them by over-indexing, not under-indexing. Assume an index is needed and then let your query analyzer convince you to remove any indices that are causing problems. Referential integrity is not optional Ensuring that the relations between tables remain intact and that there are no orphaned records in your database is essential for data integrity. All modern relational databases have referential integrity. Use it and enforce it ruthlessly from the very start. Do not rely on your code to maintain those relationships. The database has the capability, and you should employ it. Don’t embed SQL in your code If you ever embed SQL in your code, even “just this one time,” you will live to regret it. Never mind that it opens the door for “just one more time.” Embedded SQL tangles your code and couples it to your database in ways that will result in a huge pile of spaghetti. Remember, let the database do the work. If you need to use SQL in your code, maintain it apart from the code, and don’t require the compiler to process it. Store it in separate files that can be embedded or used outside of the code and can be updated without altering code logic. Some additional thoughts As a general rule, if the database can do it for you, let the database do it for you. Databases know how to handle data about 453.7 times better than you do. Don’t try to do their job. If you ever feel like adding fields ending in 1, 2, 3, etc., don’t. Read up on normalization. Use the correct data type for a column. Don’t use a number for a boolean or a string for a date. Strongly consider adding CreatedAt and UpdatedAt timestamp fields to every table. You’ll be surprised at how often you end up being glad that you did. Automate these timestamps with triggers, and they will become useful and painless. Parameterized stored procedures are your friends. Use them as much as you can. Your query analyzer is an order of magnitude better at deciding what is the best way to query data than you are. Beware of booleans. Null turns booleans into quantum states—neither true nor false until someone runs a query. Don’t use a boolean unless you know exactly what null means in that context. Don’t depend on string values to define state. Use an enumerated value instead, ensuring that the data is never wrong. Don’t let status = 'bananna' cause an error because someone fat-fingered a field. It is a universal truth that everything in software eventually rots. Things slowly get worse. What starts out as cool and clean becomes “what the hell were they thinking” before you know it. But that doesn’t mean we sit back and let it happen. We should fight code rot like a cat that just realized it has socks on. One area of code rot that doesn’t get enough attention is database design. Now, there are some big and important rules for designing databases, but I’m not talking about those here. Here I’ll focus on some lesser database design tips that might not be on your radar. Let’s call them small and important rules. However, I will say this about those big rules: If you are designing a database and you don’t know what database normalization is, stop right now and go learn. Don’t even consider designing a database until you understand second normal form and third normal form. And try for third normal form. Every table has an ID field I know this is debatable, but I believe that every single table in your database should have a primary key named ID. That’s it, just ID. Not CustomerID, not OrderID, just ID. It should be an auto-incremented integer value (or maybe a UUID if you have a really, really good reason, like a distributed system). There should, of course, be an index on the ID field. It should be a very, very rare case that you would need a multiple-field key for a table that isn’t a cross-reference for a many-to-many relationship. Never put spaces in table or field names I hereby heap eternal scorn and shame upon the person who thought it would be a good idea to include spaces in table or field names. Just don’t. Spaces in names make you use quotation marks, which you forget, and they make you ask, “Is that with or without spaces?” when you are writing a query. It’s a pain in the ass. Never use spaces and you’ll never have to wonder about it again. And for the sake of sweet baby Elvis, please don’t use underscores in names, either. I have no idea how people can stand typing names_like_this. My pinky wants to file a workman’s comp claim just thinking about it. Table names are plural Again, this is a great debate, but I believe that tables represent many things, not one thing. Thus, table names should always be plural. Customers, not Customer. This way, when you see the word Orders, you know it is referring to the table. If you name the table Order, you create an ambiguity around the word “order.” Are you talking about the table or a row in the table? I know a lot of ink has been spilled on this topic. I strongly prefer plural names. Whatever you do, pick one system and stick with it. Foreign keys should be clearly labeled Remember what I said above about the ID field? Here’s where it comes into play. If you have a row in the Orders table that references a customer (i.e., a foreign key), give it the name CustomerID. Any field named ID will always be a foreign key to the table. Do this consistently throughout your schema, and it will always be clear which fields are foreign keys and which table those fields refer back to. Index what you query Put an index on every field that shows up in a WHERE, JOIN, or ORDER BY clause. Do this religiously and you’ll avoid many performance problems down the road. There may be exceptions, but you should find them by over-indexing, not under-indexing. Assume an index is needed and then let your query analyzer convince you to remove any indices that are causing problems. Referential integrity is not optional Ensuring that the relations between tables remain intact and that there are no orphaned records in your database is essential for data integrity. All modern relational databases have referential integrity. Use it and enforce it ruthlessly from the very start. Do not rely on your code to maintain those relationships. The database has the capability, and you should employ it. Don’t embed SQL in your code If you ever embed SQL in your code, even “just this one time,” you will live to regret it. Never mind that it opens the door for “just one more time.” Embedded SQL tangles your code and couples it to your database in ways that will result in a huge pile of spaghetti. Remember, let the database do the work. If you need to use SQL in your code, maintain it apart from the code, and don’t require the compiler to process it. Store it in separate files that can be embedded or used outside of the code and can be updated without altering code logic. Some additional thoughts As a general rule, if the database can do it for you, let the database do it for you. Databases know how to handle data about 453.7 times better than you do. Don’t try to do their job. If you ever feel like adding fields ending in 1, 2, 3, etc., don’t. Read up on normalization. Use the correct data type for a column. Don’t use a number for a boolean or a string for a date. Strongly consider adding CreatedAt and UpdatedAt timestamp fields to every table. You’ll be surprised at how often you end up being glad that you did. Automate these timestamps with triggers, and they will become useful and painless. Parameterized stored procedures are your friends. Use them as much as you can. Your query analyzer is an order of magnitude better at deciding what is the best way to query data than you are. Beware of booleans. Null turns booleans into quantum states—neither true nor false until someone runs a query. Don’t use a boolean unless you know exactly what null means in that context. Don’t depend on string values to define state. Use an enumerated value instead, ensuring that the data is never wrong. Don’t let status = 'bananna' cause an error because someone fat-fingered a field. I’ve left you with a lot of dos and don’ts here. Again, the most important takeaway is to pick a set of rules and enforce them ruthlessly. Do that today, and you’ll save yourself many headaches down the road. Future you will thank you, believe me.
https://www.infoworld.com/article/4022772/database-design-tips-for-developers.html
Related News |
25 sources
Current Date
Jul, Thu 17 - 01:01 CEST
|