In both the Relational Model, and its practical realisation by SQL, null is used to represent “missing”, “inapplicable” or “unknown” information. For example, a record might exist for a given Part Number, but there is no attribute/value given for its weight – it is not zero grams, it is unknown. In terms of Three-Value Logic, the result of a predictate is one of: True, False or Unknown. In the Relational Model and SQL, null is not a value, but is a flag or marker, independent of any data type, that there is no value. However, in practice it is often incorrectly treated as a value.
Null is represented in the Relational Model and SQL by the keyword NULL, and as it is not a value it cannot be specified as the result of a search condition, so thus requires a mechanism to refer to it specifically: SQL uses IS NULL, a Relational Model tuple containing null values would be: <David, Jones, null, 35, null>.
Codd (1985) published a set of 13 rules to determine whether a DBMS can be considered Relational. Codd’s Rule 3 specified The Systematic Treatment of Null Values, distinct from any empty character string, zero, or any other number.
C.J. Date on the other hand, argues that nulls are a mistake because they are not values, and have no place in the Relational Model, instead he proposes using Special Values in their place, claiming that this is what we do in the real world, and that there is no such thing as null in the real world (Date, 2004). In this context, a special value would be a value (of the correct data type) inserted when no other value is available/appropriate. For example, a record might exist to indicate that a student has submitted an assignment, which records such information as the student’s name and date of submission. However, if the assignment has not yet been marked, the student’s score would be recorded as a value outside of the range of normal values to indicate this, such as -1 for a paper with possible scores of 1 – 100 marks. When using special values, it is imperative that the special value cannot represent any valid value for the attribute being represented.
A distinct disadvantage with the use of special values becomes apparent with SQL functions such as MAX(), MIN() and AVG(). SQL will skip any rows containing null in a field used as part of a search expression, therefore not including the row/value in any mathematical calculations. However, returning to the above example of student assignments, MIN(score) would return the special value of -1 as it’s a legitimate value for the data type. Similarly, AVG(score) would also include the -1, even though it clearly should not as the -1 is being used to indicate that the assignment hasn’t yet been marked. Neither the -1 nor the row itself should be included in the calculation of the average.
This means that the special value must be known and catered for in the formulation of the query and/or interpretation of the results, eg:
SELECT AVG(score)
FROM student_assignments
WHERE score > -1
In contrast to the explicit inclusion of the search condition shown above, SQL would automatically ignore any rows where the score was null. Special values require the end-user to have an intimate knowledge of the database and the special values in use, which can vary from database to database and even table to table.
Special values can lead to inconsistencies and anomalies in the data, especially where they are input by the end user. For example, a record may have fields for FirstName, MiddleName, Surname. Not everyone has a middle name so a special value is required to take the place of the null, this could be: N/A, na, none, etc. This special value must then be specifically taken into account to prevent issues such as letters addressed to “Dear David None Jones”, or worse still “Dear Mr Deceased”, which has happened on numerous occasions where end users have inserted a special value which is meaningful to themselves but not the DBMS! These issues can however be mitigated to a certain extent by the use of SQL Default values, e.g.:
CREATE TABLE customer (
…
MiddleName VARCHAR(30) DEFAULT “None”,
… );
The use of special values in place of nulls also works against standardisation, integration and interoperability. Consider ODBC which provides a standardised interface to any DBMS with a suitable ODBC Driver. The end user application can make a call to the ODBC interface to find out whether a column value is null (eg SQL’s IS NULL), irrespective of the backend DBMS, and the driver will handle the ‘translation’ of the call. The DBMS could be replaced with an entirely different implementation but the call remains unaffected – it’s the job of the DBMS manufacturer’s ODBC driver to handle the communication between the ODBC interface and the DBMS. This DBMS independence would be lost, or at least severely impaired, through the use of special values instead of nulls.
CJ Date argues that nulls have no place in the Relational Model because they do not exist in the real world. However, the Relational Model is conceptual whereas relational databases do exist and play a mission-critical role in the real world and the concept/use of nulls aid DBMS in this role, along with the people who use them – one of the essential components of any Information System.
References
Webopedia, n.d.. What is Codd's Rules? [Online] Available at http://www.webopedia.com/TERM/C/Codds_Rules.html [Accessed 6 June 2010].
Date, C.J., 2004. “An Introduction to Database Systems” Eighth Edition, page 591, Addison-Wesley