First normal form

From HandWiki
{{short description|Level of database normalization}}
{{citation style|date=May 2025}}


'''First normal form''' ('''1NF''') is the most basic level of [[Database normalization|database normalization]] defined by English computer scientist [[Biography:Edgar F. Codd|Edgar F. Codd]], the inventor of the [[Relational database|relational database]]. A relation (or a [[Software:Table (database)|''table'']], in [[SQL]]) can be said to be in first normal form if each field is ''atomic'', containing a single value rather than a set of values or a nested table. In other words, a relation complies with first normal form if no [[Attribute domain|attribute domain]] (the set of values allowed in a given column) has relations as elements.<ref>Codd, E. F. (1972). "Further Normalization of the Data Base Relational Model". p. 27</ref>


Most relational database management systems, including standard SQL, do not support creating or using table-valued columns, which means most relational databases will be in first normal form by necessity. Otherwise, normalization to 1NF involves eliminating nested relations by breaking them up into separate relations associated with each other using [[Foreign key|foreign key]]s.<ref name="Codd 1970">{{Cite journal |title=A relational model of data for large shared data banks |journal=Communications of the ACM |last=Codd |first=E. F. |volume=13 |issue=6 |pages=377&ndash;387 |year=1970 |doi=10.1145/362384.362685}}</ref>{{rp|pages=381}} This process is a necessary step when moving data from a non-relational (or [[NoSQL]]) database, such as one using a hierarchical or [[Document-oriented database|document-oriented]] model, to a relational database.


A database must satisfy 1NF before it can satisfy further "[[Database normalization#Normal forms|normal forms]]", such as [[Second normal form|2NF]] and [[Third normal form|3NF]], which reduce redundancy and anomalies. Other benefits of adopting 1NF include greater [[Data independence|data independence]] and flexibility (including support for [[Many-to-many (data model)|many-to-many]] relationships) and a simpler [[Software:Relational algebra|relational algebra]] and [[Query language|query language]] for describing operations on the database.


Codd considered 1NF mandatory for relational databases, while the other normal forms were merely guidelines for database design.<ref>{{Cite journal |title=Extending the database relational model to capture more meaning |journal=ACM Transactions on Database Systems |last=Codd |first=E. F. |volume=4 |issue=4 |pages=397&ndash;434 |year=1979 |doi=10.1145/320107.320109}}</ref>{{rp|page=439}}


== Background ==
First normal form was introduced in 1970 by [[Biography:Edgar F. Codd|Edgar F. Codd]] in his paper "A relational model of data for large shared data banks",{{r|Codd 1970}} although initially it was simply referred to as "normalization" or "normal form". It was renamed to "first normal form" when Codd introduced additional normal forms in his paper "Further Normalization of the Data Base Relational Model" in 1971.<ref>Codd, E. F. (1971). "Further Normalization of the Data Base Relational Model". ''Data Base Systems. Courant Computer Science Symposium 6'' edited by Rustin, R.</ref>


The relational model was proposed as an improvement over hierarchical databases, which were prevalent at the time.{{r|Codd 1970|p=377}} A key difference lies in how relationships between records are represented. In a hierarchical database, one-to-many relationships are represented through containment: a single record may contain sets of records (known as repeating groups) as attribute values. Codd argued, however, that hierarchy is not flexible and expressive enough for more complex data models; for example, many-to-many relationships cannot be represented through hierarchy.{{r|Codd 1970|p=378}} He therefore suggested eliminating nested records and instead representing relationships through [[Foreign key|foreign key]]s. This allows richer relationships to be expressed, since a record can now participate in multiple relationships.{{r|Codd 1970|p=378}}


A direct translation of a hierarchical database into relations would represent repeating groups as nested relations. Normalization is therefore defined as eliminating nested relations and instead representing the one-to-many relationships through foreign keys.{{r|Codd 1970|p=381}}


Codd distinguishes between "atomic" and "compound" data. Atomic (or "nondecomposable") data includes basic types such as numbers and [[String (computer science)|strings]] – broadly speaking, it "''cannot'' be decomposed into smaller pieces by the DBMS (excluding certain special functions)". Compound data is made up of structures such as relations (or ''[[Software:Table (database)|tables]]'', in [[SQL]]) which contain several pieces of atomic data and thus "''can'' be decomposed by the DBMS".<ref name="Codd 1990">{{Cite book |last=Codd |first=E. F. |title=The relational model for database management: version 2 |publisher=Addison-Wesley |isbn=978-0-201-14192-4 |publication-date=1 January 1990}}</ref>{{rp|page=6}}


In a relation, each attribute (or [[Software:Column (database)|''column'']]) has a set of allowed values known as its [[Attribute domain|domain]] (e.g., a "Price" attribute's domain may be the set of non-negative numbers with up to 2 fractional digits). Each tuple (or [[Software:Row (database)|''row'']]) in the relation contains one value per attribute, and each must be an element in that attribute's domain. Codd distinguishes attributes which have "simple domains" containing only atomic data from attributes with "nonsimple domains" containing at least some forms of compound data.{{r|Codd 1970}}{{rp|pages=380}} Nonsimple domains introduce a degree of structural complexity that can be difficult to navigate, query and update; for instance, operating across several nested relations (that is, tables containing further tables), as found in some non-relational databases, can be time-consuming.


First normal form therefore requires all attribute domains to be ''simple'' domains, such that the data in each field is atomic and no relation has relation-valued attributes. Precisely, Codd states that, in the relational model, "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS."<ref name="Codd 1990" />{{rp|page=6}} Normalization to 1NF is thus a process of eliminating nonsimple domains from all relations.
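As a rough illustration of simple versus nonsimple domains, the following Python sketch tests membership in the hypothetical "Price" domain described above and flags attributes whose values are not atomic. The names `in_price_domain`, `is_simple_value` and `nonsimple_attributes` are invented for this example. Treating any `str`, `int`, `float` or `Decimal` as atomic is an assumption of the sketch, since what counts as atomic is itself open to interpretation.

```python
from decimal import Decimal

# Hypothetical "Price" domain: non-negative numbers with at most two
# fractional digits, represented here as Decimal values.
def in_price_domain(value) -> bool:
    if not isinstance(value, Decimal) or not value.is_finite():
        return False
    # normalize() drops trailing zeros, so Decimal("9.90") has exponent -1.
    return value >= 0 and value.normalize().as_tuple().exponent >= -2

# Assumption of this sketch: scalar values count as atomic (simple);
# anything else (lists, dicts, sets of rows) does not.
def is_simple_value(value) -> bool:
    return isinstance(value, (str, int, float, Decimal))

def nonsimple_attributes(rows):
    """Return the sorted names of attributes holding a non-atomic value in any row."""
    return sorted({attr for row in rows
                   for attr, value in row.items()
                   if not is_simple_value(value)})

# A customer row whose Transactions attribute holds a nested relation.
customers = [
    {"CustomerID": 1, "Name": "Abraham",
     "Transactions": [{"TransactionID": 12890, "Date": "2003-10-14", "Amount": -87}]},
]

print(in_price_domain(Decimal("9.99")))
print(in_price_domain(Decimal("9.999")))
print(nonsimple_attributes(customers))
```

Normalizing to 1NF would mean removing every attribute that `nonsimple_attributes` reports, for example by extracting it into its own relation.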
 


==Examples==
The following scenarios first illustrate how a database design might violate first normal form, followed by examples that comply.
===Design that violates 1NF===
This table of customers' credit card transactions does not conform to first normal form, as each customer corresponds to a repeating group of transactions. Such a design can be represented in a hierarchical database, but not in an SQL database, since SQL does not support nested tables.


{{Table alignment}}
{| class="wikitable col1right"
|+ Customer
! <u>CustomerID</u> !! Name !! Transactions
|-
| 1 || Abraham
||
{{Table alignment}}
{| class="wikitable col1right col3right"
! <u>TransactionID</u> !! Date !! Amount
|-
| 12890 || 2003-10-14 || &minus;87
|-
| 12904 || 2003-10-15 || &minus;50
|}
|-
| 2 || Isaac
||
{{Table alignment}}
{| class="wikitable col1right col3right"
! <u>TransactionID</u> !! Date !! Amount
|-
| 12898 || 2003-10-14 || &minus;21
|}
|-
| 3 || Jacob
||
{{Table alignment}}
{| class="wikitable col1right col3right"
! <u>TransactionID</u> !! Date !! Amount
|-
| 12907 || 2003-10-15 || &minus;18
|-
| 14920 || 2003-11-20 || &minus;70
|-
| 15003 || 2003-11-27 || &minus;60
|}
|}


The evaluation of any query relating to customers' transactions would broadly involve two stages:
# unpacking one or more customers' groups of transactions, allowing the individual transactions in a group to be examined, and
# deriving a query result from the results of the first stage.

For example, to find the monetary sum of all transactions that occurred in October 2003 for all customers, the database management system (DBMS) would first have to unpack the Transactions field of each customer, then sum the Amount of each transaction thus obtained whose Date falls in October 2003.
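The two-stage evaluation can be sketched in Python, with the unnormalized table above held as nested lists of dictionaries. This is only an illustration under that assumed representation: the function name `october_2003_total` is hypothetical, and a real hierarchical DBMS would unpack repeating groups in its own way.

```python
from datetime import date

# The unnormalized Customer relation as nested data: each customer
# carries its own repeating group of transactions.
customers = [
    {"CustomerID": 1, "Name": "Abraham", "Transactions": [
        {"TransactionID": 12890, "Date": date(2003, 10, 14), "Amount": -87},
        {"TransactionID": 12904, "Date": date(2003, 10, 15), "Amount": -50},
    ]},
    {"CustomerID": 2, "Name": "Isaac", "Transactions": [
        {"TransactionID": 12898, "Date": date(2003, 10, 14), "Amount": -21},
    ]},
    {"CustomerID": 3, "Name": "Jacob", "Transactions": [
        {"TransactionID": 12907, "Date": date(2003, 10, 15), "Amount": -18},
        {"TransactionID": 14920, "Date": date(2003, 11, 20), "Amount": -70},
        {"TransactionID": 15003, "Date": date(2003, 11, 27), "Amount": -60},
    ]},
]

def october_2003_total(customers):
    # Stage 1: unpack every customer's nested group of transactions.
    unpacked = [t for c in customers for t in c["Transactions"]]
    # Stage 2: derive the result from the unpacked transactions.
    return sum(t["Amount"] for t in unpacked
               if t["Date"].year == 2003 and t["Date"].month == 10)

print(october_2003_total(customers))  # -176
```

The query logic has to know that Transactions is a nested collection and must be unpacked before any Date or Amount can be examined.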


===Design that complies with 1NF===
Codd described how a database like this could be made less structurally complex and more flexible by transforming it into a relational database in first normal form. To normalize the table so it complies with first normal form, attributes with nonsimple domains must be extracted to separate, stand-alone relations. Each extracted relation gains a [[Foreign key|foreign key]] referencing the [[Primary key|primary key]] of the relation which initially contained it. This process can be applied recursively to nonsimple domains nested in multiple levels (i.e., domains containing tables within tables within tables, and so on).{{r|Codd 1970}}{{rp|pages=380&ndash;381}}
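Below is a minimal sketch of this extraction. It assumes that rows are Python dictionaries and that any list-valued attribute is a nested relation. The function `normalize` and its `own_keys` parameter, which names the key attributes of each nested relation's own rows, are hypothetical. Each extracted row receives the parent's key as a foreign key. The parent key plus the child's own key becomes the new relation's key, so the same step can recurse into deeper levels of nesting.

```python
def normalize(name, rows, key, own_keys):
    """Split nested rows into flat relations.

    name     -- name for the relation built from `rows`
    rows     -- list of dicts; list-valued attributes are nested relations
    key      -- tuple of primary-key attribute names for `rows`
    own_keys -- maps each nested attribute to its rows' own key attributes
    Returns a dict mapping relation names to lists of flat rows.
    """
    flat_rows = []
    extracted = {}  # nested attribute name -> child rows tagged with a foreign key
    for row in rows:
        flat = {}
        for attr, value in row.items():
            if isinstance(value, list):
                # Copy the parent's key into every child row (the foreign key).
                parent_key = {k: row[k] for k in key}
                extracted.setdefault(attr, []).extend(
                    {**parent_key, **child} for child in value)
            else:
                flat[attr] = value
        flat_rows.append(flat)

    relations = {name: flat_rows}
    for attr, children in extracted.items():
        # Recurse in case the extracted rows contain further nested relations.
        relations.update(normalize(attr, children, key + own_keys[attr], own_keys))
    return relations


customers = [
    {"CustomerID": 1, "Name": "Abraham", "Transactions": [
        {"TransactionID": 12890, "Date": "2003-10-14", "Amount": -87},
        {"TransactionID": 12904, "Date": "2003-10-15", "Amount": -50}]},
    {"CustomerID": 2, "Name": "Isaac", "Transactions": [
        {"TransactionID": 12898, "Date": "2003-10-14", "Amount": -21}]},
    {"CustomerID": 3, "Name": "Jacob", "Transactions": [
        {"TransactionID": 12907, "Date": "2003-10-15", "Amount": -18},
        {"TransactionID": 14920, "Date": "2003-11-20", "Amount": -70},
        {"TransactionID": 15003, "Date": "2003-11-27", "Amount": -60}]},
]

tables = normalize("Customer", customers, ("CustomerID",),
                   {"Transactions": ("TransactionID",)})
for relation, relation_rows in tables.items():
    print(relation, relation_rows)
```

The sketch names each extracted relation after the attribute it came from ("Transactions"). Apart from that name, the output has the same shape as the Customer and Transaction relations of the normalized design.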


In this example, CustomerID is the primary key of the containing relation and will therefore be appended as a foreign key to the new relation:


{{Col-float}}
{{Col-float-break|style=margin-right: 20px;}}
{{Table alignment}}
{| class="wikitable col1right"
|+ Customer
! <u>CustomerID</u> !! Name
|-
| 1 || Abraham
|-
| 2 || Isaac
|-
| 3 || Jacob
|}
{{Col-float-break}}
{{Table alignment}}
{| class="wikitable col1right col2right col4right"
|+ Transaction
! <u>CustomerID</u> !! <u>TransactionID</u> !! Date !! Amount
|-
| 1 || 12890 || 2003-10-14 || &minus;87
|-
| 1 || 12904 || 2003-10-15 || &minus;50
|-
| 2 || 12898 || 2003-10-14 || &minus;21
|-
| 3 || 12907 || 2003-10-15 || &minus;18
|-
| 3 || 14920 || 2003-11-20 || &minus;70
|-
| 3 || 15003 || 2003-11-27 || &minus;60
|}
{{Col-float-end}}
In this modified design, the primary key is {CustomerID} in the first relation and {CustomerID, TransactionID} in the second relation.
Now that a single, "top-level" relation contains all transactions, it is simpler to run queries on the database. To find the monetary sum of all October transactions, the DBMS would simply find all rows with a Date falling in October and sum their Amount fields. All values are now directly exposed to the DBMS, whereas previously some values were embedded in lower-level structures that had to be handled specially. Accordingly, the normalized design lends itself well to general-purpose query processing, whereas the unnormalized design does not.
It is worth noting that the revised design also meets the additional requirements for [[Second normal form|second]] and [[Third normal form|third normal form]].
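You can try the same October query against the normalized design with Python's built-in `sqlite3` module. This is a sketch rather than a reference schema. The column types, the NOT NULL constraints and the storage of dates as ISO 8601 text are assumptions. The table name "Transaction" has to be quoted because TRANSACTION is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
CREATE TABLE "Transaction" (
    CustomerID    INTEGER NOT NULL REFERENCES Customer (CustomerID),
    TransactionID INTEGER NOT NULL,
    Date          TEXT    NOT NULL,  -- ISO 8601 dates compare correctly as text
    Amount        INTEGER NOT NULL,
    PRIMARY KEY (CustomerID, TransactionID)
);
""")
conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                 [(1, "Abraham"), (2, "Isaac"), (3, "Jacob")])
conn.executemany('INSERT INTO "Transaction" VALUES (?, ?, ?, ?)', [
    (1, 12890, "2003-10-14", -87),
    (1, 12904, "2003-10-15", -50),
    (2, 12898, "2003-10-14", -21),
    (3, 12907, "2003-10-15", -18),
    (3, 14920, "2003-11-20", -70),
    (3, 15003, "2003-11-27", -60),
])

# One flat scan: select the October rows and sum their amounts.
(october_total,) = conn.execute(
    'SELECT SUM(Amount) FROM "Transaction" '
    "WHERE Date BETWEEN '2003-10-01' AND '2003-10-31'"
).fetchone()
print(october_total)  # -176
```

The query never has to unpack anything. It filters and aggregates the rows of one relation, as the paragraph above describes.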


== Rationale ==


Normalization to 1NF is the major theoretical component of transferring a database to the [[Relational model|relational model]]. Use of a relational database in 1NF brings certain advantages:


* It enables data to be stored in regular two-dimensional arrays; supporting nested relations would require more complex data structures.{{r|Codd 1970}}{{rp|page=381}}
* It allows for the use of a simpler [[Query language|query language]], like [[SQL]], since any data item can be identified using only a relation name, attribute name and key; addressing nested data items would require a more complex language with support for hierarchical data paths.
* Representing relationships using foreign keys is more flexible and allows for features such as [[Many-to-many (data model)|many-to-many]] relationships, while a hierarchical model can represent only [[One-to-one (data model)|one-to-one]] or [[One-to-many (data model)|one-to-many]] relationships.
* Since locating data items is not coupled to a parent–child hierarchy, a database in 1NF creates greater [[Data independence|data independence]] and is more resilient to structural changes over time.{{Clarify|date=May 2025}}
* From 1NF, further normalization becomes possible (for example to [[Second normal form|2NF]] or [[Third normal form|3NF]]), which can reduce data redundancy and anomalies.
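The Python sketch below makes the second point concrete by contrasting the two addressing styles. It assumes a toy in-memory layout in which each relation maps primary-key tuples to rows. `get_value` and `get_nested` are hypothetical helpers and are not part of any DBMS API.

```python
# Toy 1NF store: relation name -> {primary-key tuple -> row}.
relations = {
    "Customer": {(1,): {"Name": "Abraham"}, (2,): {"Name": "Isaac"}},
    "Transaction": {
        (1, 12890): {"Date": "2003-10-14", "Amount": -87},
        (2, 12898): {"Date": "2003-10-14", "Amount": -21},
    },
}

def get_value(store, relation, key, attribute):
    # Every data item is located by the same three coordinates.
    return store[relation][key][attribute]

# The same data as a hierarchy: transactions nested inside customers.
hierarchy = {"Customer": [
    {"CustomerID": 1, "Name": "Abraham",
     "Transactions": [{"TransactionID": 12890, "Date": "2003-10-14", "Amount": -87}]},
    {"CustomerID": 2, "Name": "Isaac",
     "Transactions": [{"TransactionID": 12898, "Date": "2003-10-14", "Amount": -21}]},
]}

def get_nested(document, path):
    # A hierarchical path whose length grows with the nesting depth.
    for step in path:
        document = document[step]
    return document

print(get_value(relations, "Transaction", (2, 12898), "Amount"))
print(get_nested(hierarchy, ["Customer", 1, "Transactions", 0, "Amount"]))
```

Both calls return the same amount. The nested path, however, depends on the shape of the hierarchy and on list positions, so any restructuring of the tree also changes how the item must be addressed.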


== Controversy about compound values ==
There is some discussion about the extent to which compound or complex values other than relations (such as [[Array (data structure)|arrays]] or [[XML]] data) are permitted in 1NF. Codd states that relations are the only type of compound data allowed within the relational model (though not within attribute domains), since any additional type of compound data would add complexity without adding power; nevertheless, the model specifically allows "certain special functions" like <code>SUBSTRING</code> to decompose values otherwise considered atomic.{{r|Codd 1990}}{{rp|pages=6, 340}}


Hugh Darwen and Christopher J. Date have suggested that Codd's concept of an "atomic value" is ambiguous, and that this ambiguity has led to widespread confusion about how 1NF should be understood.<ref>Darwen, Hugh. "Relation-Valued Attributes; or, Will the Real First Normal Form Please Stand Up?", in C. J. Date and Hugh Darwen, ''Relational Database Writings 1989-1991'' (Addison-Wesley, 1992).</ref><ref>{{cite book |last=Date |first=C. J. |chapter=Chapter 8: What First Normal Form Really Means |date=2007 |title=Date on Database: Writings 2000–2006 |publisher=Apress |isbn=978-1-4842-2029-0 |page=108 |quote='[F]or many years,' writes Date, 'I was as confused as anyone else. What's worse, I did my best (worst?) to spread that confusion through my writings, seminars, and other presentations.'}}</ref> In particular, the notion of an atomic value as a "value that cannot be decomposed" is problematic, as it would seem to imply that few, if any, data types are atomic:
*A [[String (computer science)|string]] would seem not to be atomic, as an RDBMS typically provides operators to decompose it into [[Substring|substring]]s.
*A [[Fixed-point arithmetic|fixed-point]] number would seem not to be atomic, as an RDBMS typically provides operators to decompose it into integer and fractional components.
* An [[ISBN]] would seem not to be atomic, as it includes various parts, including the ''registration group'', ''registrant'' and ''publication'' elements.
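The following Python lines illustrate the point. Each value could sit in a single column, yet ordinary operators can split it into parts. The ISBN is the one cited in this article for Codd's 1990 book. Splitting it on hyphens assumes a hyphenated ISBN-13, because hyphen positions vary between ISBNs and hyphens are not always present.

```python
from decimal import Decimal

# A character string decomposed into a substring.
name = "Abraham"
first_three = name[:3]                     # "Abr"

# A fixed-point number decomposed into integer and fractional components.
amount = Decimal("123.45")
integer_part = int(amount)                 # 123
fractional_part = amount - integer_part    # Decimal("0.45")

# A hyphenated ISBN-13 decomposed into its elements.
isbn = "978-0-201-14192-4"
ean_prefix, group, registrant, publication, check_digit = isbn.split("-")

print(first_three, integer_part, fractional_part)
print(group, registrant, publication)
```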
Date suggests that "the notion of atomicity ''has no absolute meaning''":<ref name="Date 2007">{{cite book |last=Date| first=C. J. |chapter=Chapter 8: What First Normal Form Really Means |date=2007 |title=Date on Database: Writings 2000–2006 |publisher=Apress |isbn=978-1-4842-2029-0}}</ref>{{rp|page=112}}<ref>{{cite book |last=Date |first=C. J. |author-link=Christopher J. Date |url=https://books.google.com/books?id=BCjkCgAAQBAJ&pg=PA50 |title=SQL and Relational Theory: How to Write Accurate SQL Code |date=6 November 2015 |publisher=O'Reilly Media |isbn=978-1-4919-4115-7 |pages=50– |access-date=31 October 2018}}</ref>{{Pages needed|date=May 2025}} a value may be considered atomic for some purposes, but may be considered an assemblage of more basic elements for other purposes. If this position is accepted, 1NF cannot be defined with reference to atomicity. Columns containing any conceivable data type (from strings and numeric types to arrays and tables) are then acceptable in a 1NF table, although perhaps not always desirable – for example, it may be desirable to separate a CustomerName column into two columns, FirstName and Surname.


==Christopher J. Date's definition of 1NF==
{{Importance section|date=May 2025}}
According to Christopher J. Date's definition, a table is in first normal form if and only if it is "[[Isomorphism|isomorphic]] to some relation", which means, specifically, that it satisfies the following five conditions:{{r|Date 2007}}{{rp|pages=127&ndash;128}}


# There is no specific top-to-bottom ordering of the rows.
# There is no specific left-to-right ordering of the columns.
# There are no duplicate rows.
# Every field (or intersection of a row and a column) contains exactly one value from the applicable domain and nothing else.
# All columns are regular (i.e., rows have no hidden components such as row IDs, object IDs, or hidden timestamps).


Violation of any of these conditions would mean that the table is not strictly relational, and therefore that it is not in first normal form.
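Conditions 1 to 4 can be illustrated in Python by converting a table into a set of tuples, in which each tuple is a set of (column, value) pairs. The function `as_relation` and its `domains` argument, which maps each column name to a membership test, are hypothetical. Condition 5 concerns hidden components that do not appear in the visible data, so the sketch cannot check it.

```python
def as_relation(header, rows, domains):
    """Map a table onto a relation (a frozenset of tuples) or raise ValueError.

    header  -- column names
    rows    -- sequences of values, in header order
    domains -- column name -> predicate testing domain membership
    """
    if len(set(header)) != len(header):
        raise ValueError("duplicate column names")
    tuples = set()
    for row in rows:
        if len(row) != len(header):
            raise ValueError(f"row {row!r} does not have one value per column")
        for col, value in zip(header, row):
            # Condition 4: exactly one value from the column's domain (no nulls).
            if value is None or not domains[col](value):
                raise ValueError(f"{value!r} is not in the domain of {col}")
        # Pairing each value with its column name makes column order irrelevant (condition 2).
        t = frozenset(zip(header, row))
        if t in tuples:
            raise ValueError(f"duplicate row {row!r}")  # condition 3
        tuples.add(t)
    # A set has no top-to-bottom order (condition 1).
    return frozenset(tuples)


domains = {"CustomerID": lambda v: isinstance(v, int) and v > 0,
           "Name": lambda v: isinstance(v, str)}
a = as_relation(["CustomerID", "Name"], [(1, "Abraham"), (2, "Isaac")], domains)
b = as_relation(["Name", "CustomerID"], [("Isaac", 2), ("Abraham", 1)], domains)
print(a == b)  # True: same relation despite different row and column order
```

Passing a table with a repeated row or a `None` value raises `ValueError`. In this sketch, that corresponds to a table which is not isomorphic to any relation.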


This definition of 1NF permits relation-valued attributes (tables within tables), which Date argues are useful in rare cases.{{r|Date 2007}}{{rp|pages=121&ndash;126}} Examples of tables (or views) that would not meet this definition of first normal form are:
 
*A table that lacks a [[Unique key|unique key]] constraint. Such a table would be able to accommodate duplicate rows, in violation of condition 3.
*A view whose definition mandates that results be returned in a particular order, so that the row-ordering is an intrinsic and meaningful aspect of the view, in violation of condition 1. The [[Tuple|tuple]]s in true relations are not ordered with respect to each other (such views cannot be created using [[SQL]] that conforms to the [[SQL:2003|2003]] standard).
*A table with at least one nullable attribute. A nullable attribute would be in violation of condition 4, which requires every column to contain exactly one value from its column's domain. This aspect of condition 4 is controversial; it marks an important departure from Codd's later vision of the [[Relational model|relational model]],<ref>{{cite book |last=Date |first=C. J. |year=2009 |title=SQL and Relational Theory |publisher=O'Reilly |chapter=Appendix A.2 |quote=Codd first defined the relational model in 1969 and didn't introduce nulls until 1979}}</ref> which made explicit provision for nulls.<ref>{{cite magazine |last=Date |first=C. J. |author-link=Christopher J. Date |date=14 October 1985 |title=Is Your DBMS Really Relational? |magazine=Computerworld |quote=Null values ... [must be] supported in a fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type.}} (the third of Codd's 12 rules)</ref>
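The first and third examples can be reproduced with Python's built-in `sqlite3` module. In this sketch, which uses table names invented for the example, a table without any unique key accepts the same row twice. A primary key and a NOT NULL constraint, by contrast, make SQLite reject both a duplicate row and a null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No unique key: the same row can be stored twice (condition 3 violated).
conn.execute("CREATE TABLE LooseCustomer (CustomerID INTEGER, Name TEXT)")
conn.executemany("INSERT INTO LooseCustomer VALUES (?, ?)",
                 [(1, "Abraham"), (1, "Abraham")])
loose_count = conn.execute("SELECT COUNT(*) FROM LooseCustomer").fetchone()[0]

# A primary key rules out duplicates; NOT NULL rules out nulls in Name.
conn.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute("INSERT INTO Customer VALUES (1, 'Abraham')")

rejected = []
for row in [(1, "Abraham"), (2, None)]:
    try:
        conn.execute("INSERT INTO Customer VALUES (?, ?)", row)
    except sqlite3.IntegrityError as error:
        rejected.append((row, str(error)))

print(loose_count)  # 2
for row, message in rejected:
    print(row, message)
```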


==See also==
*[[Attribute–value system]]
*[[Second normal form]] (2NF)
*[[Third normal form]] (3NF)
*[[Boyce–Codd normal form]] (BCNF or 3.5NF)
*[[Fourth normal form]] (4NF)
*[[Fifth normal form]] (5NF)
*[[Sixth normal form]] (6NF)


==References==
==References==
Line 169: Line 169:
* Date, C. J., & Lorentzos, N., & Darwen, H. (2002). ''[https://archive.today/20121209052842/http://www.elsevier.com/wps/product/cws_home/680662 Temporal Data & the Relational Model]'' (1st ed.). Morgan Kaufmann. {{ISBN|1-55860-855-9}}.
* Date, C. J., & Lorentzos, N., & Darwen, H. (2002). ''[https://archive.today/20121209052842/http://www.elsevier.com/wps/product/cws_home/680662 Temporal Data & the Relational Model]'' (1st ed.). Morgan Kaufmann. {{ISBN|1-55860-855-9}}.
* Date, C. J. (1999), ''[https://web.archive.org/web/20050404010227/http://www.aw-bc.com/catalog/academic/product/0,1144,0321197844,00.html An Introduction to Database Systems]'' (8th ed.). Addison-Wesley Longman. {{ISBN|0-321-19784-4}}.
* Date, C. J. (1999), ''[https://web.archive.org/web/20050404010227/http://www.aw-bc.com/catalog/academic/product/0,1144,0321197844,00.html An Introduction to Database Systems]'' (8th ed.). Addison-Wesley Longman. {{ISBN|0-321-19784-4}}.
* Kent, W. (1983) ''[http://www.bkent.net/Doc/simple5.htm A Simple Guide to Five Normal Forms in Relational Database Theory]'', Communications of the ACM, vol. 26, pp.&nbsp;120–125
* Kent, W. (1983) ''[http://www.bkent.net/Doc/simple5.htm A Simple Guide to Five Normal Forms in Relational Database Theory]'', ''Communications of the ACM'', vol. 26, p.&nbsp;120–125.
* Codd, E.F. (1970). A Relational Model of Data for. Large Shared Data Banks. IBM Research Laboratory, San Jose, California.
* Codd, E. F. (1971). Further Normalization of the Relational Model. Courant Computer Science Symposium 6 in Data Base Systems edited by Rustin, R.
{{Refend}}
{{Refend}}



Latest revision as of 00:43, 23 May 2026

Short description: Level of database normalization

First normal form (1NF) is the most basic level of database normalization defined by English computer scientist Edgar F. Codd, the inventor of the relational database. A relation (or a table, in SQL) can be said to be in first normal form if each field is atomic, containing a single value rather than a set of values or a nested table. In other words, a relation complies with first normal form if no attribute domain (the set of values allowed in a given column) has relations as elements.[1]

Most relational database management systems, as well as standard SQL, do not support creating or using table-valued columns, so most relational databases are in first normal form by necessity. Where nested relations do exist, normalization to 1NF involves eliminating them by breaking them up into separate relations associated with each other using foreign keys.[2]: 381  This process is a necessary step when moving data from a non-relational (or NoSQL) database, such as one using a hierarchical or document-oriented model, to a relational database.

A database must satisfy 1NF to satisfy further "normal forms", such as 2NF and 3NF, which enable the reduction of redundancy and anomalies. Other benefits of adopting 1NF include increased data independence and flexibility (including features such as many-to-many relationships) and a simpler relational algebra and query language for describing operations on the database.

Codd considered 1NF mandatory for relational databases, while the other normal forms were merely guidelines for database design.[3]: 439 

Background

First normal form was introduced in 1970 by Edgar F. Codd in his paper "A relational model of data for large shared data banks",[2] although initially it was simply referred to as "normalization" or "normal form". It was renamed to "first normal form" when Codd introduced additional normal forms in his paper "Further Normalization of the Data Base Relational Model" in 1971.[4]

The relational model was proposed as an improvement over hierarchical databases, which were prevalent at the time.[2]: 377  A key difference lies in how relationships between records are represented. In a hierarchical database, one-to-many relationships are represented through containment: a single record may contain sets of records (known as repeating groups) as attribute values. Codd argued, however, that hierarchy is not flexible or expressive enough for more complex data models; for example, many-to-many relationships cannot be represented through hierarchy.[2]: 378  He therefore suggested eliminating nested records and instead representing relationships through foreign keys. This allows richer relationships to be expressed, since a record can now participate in multiple relationships.[2]: 378 
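As a minimal sketch of the many-to-many case, assuming a hypothetical schema of students and courses (none of these names come from Codd's paper), a third relation holding one foreign key to each side lets every student relate to several courses and every course to several students, which a single containment hierarchy cannot express without duplicating records. Python's built-in `sqlite3` module is used only for convenience.

```python
import sqlite3

# Hypothetical schema: Student and Course are related many-to-many.
# Enrollment holds one foreign key to each side instead of nesting records.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
    CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
    CREATE TABLE Course  (CourseID  INTEGER PRIMARY KEY, Title TEXT NOT NULL);
    CREATE TABLE Enrollment (
        StudentID INTEGER NOT NULL REFERENCES Student (StudentID),
        CourseID  INTEGER NOT NULL REFERENCES Course (CourseID),
        PRIMARY KEY (StudentID, CourseID)
    );
    INSERT INTO Student VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO Course  VALUES (10, 'Databases'), (20, 'Logic');
    INSERT INTO Enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

def courses_of(student_id):
    """Follow the relationship from one student to all of their courses."""
    rows = conn.execute(
        "SELECT c.Title FROM Enrollment AS e JOIN Course AS c ON c.CourseID = e.CourseID "
        "WHERE e.StudentID = ? ORDER BY c.Title", (student_id,))
    return [title for (title,) in rows]

def students_in(course_id):
    """Follow the same relationship in the opposite direction."""
    rows = conn.execute(
        "SELECT s.Name FROM Enrollment AS e JOIN Student AS s ON s.StudentID = e.StudentID "
        "WHERE e.CourseID = ? ORDER BY s.Name", (course_id,))
    return [name for (name,) in rows]

print(courses_of(1))    # ['Databases', 'Logic']
print(students_in(10))  # ['Ann', 'Ben']
```

Neither Student nor Course contains the other, so the relationship can be navigated from either side with the same kind of query.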

A direct translation of a hierarchical database into relations would represent repeating groups as nested relations. Normalization is thus defined as eliminating nested relations and instead representing the one-to-many relationship through foreign keys.[2]: 381 

Codd distinguishes between "atomic" and "compound" data. Atomic (or "nondecomposable") data includes basic types such as numbers and strings – broadly speaking, it "cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions)". Compound data is made up of structures such as relations (or tables, in SQL) which contain several pieces of atomic data and thus "can be decomposed by the DBMS".[5]: 6 

In a relation, each attribute (or column) has a set of allowed values known as its domain (e.g., a "Price" attribute's domain may be the set of non-negative numbers with up to 2 fractional digits). Each tuple (or row) in the relation contains one value per attribute, and each must be an element in that attribute's domain. Codd distinguishes attributes which have "simple domains" containing only atomic data from attributes with "nonsimple domains" containing at least some forms of compound data.[2]: 380  Nonsimple domains introduce a degree of structural complexity which can be difficult to navigate, to query and to update – for instance, it will be time-consuming to operate across several nested relations (that is, tables containing further tables), which can be found in some non-relational databases.
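The "Price" domain above can be made concrete with a small sketch. It assumes prices are held as Python `Decimal` values and pairs each attribute of a hypothetical Product relation with a membership test for its domain; a tuple conforms when it has exactly one value per attribute and every value lies in the corresponding domain.

```python
from decimal import Decimal

def in_price_domain(value):
    """Hypothetical "Price" domain from the text: non-negative numbers
    with at most 2 fractional digits, represented here as Decimal."""
    if not isinstance(value, Decimal) or not value.is_finite():
        return False
    # normalize() strips trailing zeros, so Decimal("19.990") counts as 2 digits.
    return value >= 0 and value.normalize().as_tuple().exponent >= -2

# Each attribute of a hypothetical Product relation paired with its domain.
DOMAINS = {
    "Name": lambda v: isinstance(v, str) and v != "",
    "Price": in_price_domain,
}

def tuple_conforms(row, domains=DOMAINS):
    """One value per attribute, each drawn from that attribute's domain."""
    return set(row) == set(domains) and all(domains[a](v) for a, v in row.items())

print(tuple_conforms({"Name": "Pen", "Price": Decimal("1.50")}))   # True
print(tuple_conforms({"Name": "Pen", "Price": Decimal("1.505")}))  # False
```

Floats are rejected outright because binary floating point cannot represent most two-digit decimal fractions exactly.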

First normal form therefore requires all attribute domains to be simple domains, such that the data in each field is atomic and no relation has relation-valued attributes. Precisely, Codd states that, in the relational model, "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS."[5]: 6  Normalization to 1NF is thus a process of eliminating nonsimple domains from all relations.
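A minimal sketch of detecting nonsimple domains, assuming a relation is represented as a Python list of dict rows and that a relation-valued field is itself a list of dicts. It checks only for relation-valued attributes, the case in Codd's definition; arrays and other compound values are a separate question.

```python
def is_relation_value(value):
    """Assumed representation: a relation-valued field is a list of dict rows."""
    return isinstance(value, list) and all(isinstance(row, dict) for row in value)

def nonsimple_attributes(rows):
    """Names of attributes that hold a nested relation in at least one row."""
    return sorted({attr for row in rows for attr, value in row.items()
                   if is_relation_value(value)})

def is_in_1nf(rows):
    """In 1NF (in Codd's sense) when no attribute has relations as values."""
    return not nonsimple_attributes(rows)

nested = [{"CustomerID": 1, "Name": "Abraham",
           "Transactions": [{"TransactionID": 12890, "Amount": -87}]}]
flat = [{"CustomerID": 1, "Name": "Abraham"}]
print(nonsimple_attributes(nested))  # ['Transactions']
print(is_in_1nf(flat))               # True
```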

Examples

Design that violates 1NF

This table of customers' credit card transactions does not conform to first normal form, as each customer corresponds to a repeating group of transactions. Such a design can be represented in a hierarchical database, but not in an SQL database, since SQL does not support nested tables.

Customer
CustomerID  Name     Transactions
1           Abraham  TransactionID  Date        Amount
                     12890          2003-10-14  −87
                     12904          2003-10-15  −50
2           Isaac    TransactionID  Date        Amount
                     12898          2003-10-14  −21
3           Jacob    TransactionID  Date        Amount
                     12907          2003-10-15  −18
                     14920          2003-11-20  −70
                     15003          2003-11-27  −60

The evaluation of any query relating to customers' transactions would broadly involve two stages:

  1. unpacking one or more customers' groups of transactions, allowing the individual transactions in a group to be examined, and
  2. deriving a query result from the results of the first stage.

For example, to find the monetary sum of all transactions that occurred in October 2003 for all customers, the database management system (DBMS) would first have to unpack the Transactions field of each customer, then sum the Amount of each transaction thus obtained whose Date falls in October 2003.
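The two stages can be sketched in Python, assuming the unnormalized Customer relation is held as a list of dicts whose Transactions field is a nested list of transaction rows, with dates stored as ISO 8601 strings.

```python
# The unnormalized Customer relation: each customer carries a nested
# relation (a repeating group) of transactions.
customers = [
    {"CustomerID": 1, "Name": "Abraham", "Transactions": [
        {"TransactionID": 12890, "Date": "2003-10-14", "Amount": -87},
        {"TransactionID": 12904, "Date": "2003-10-15", "Amount": -50}]},
    {"CustomerID": 2, "Name": "Isaac", "Transactions": [
        {"TransactionID": 12898, "Date": "2003-10-14", "Amount": -21}]},
    {"CustomerID": 3, "Name": "Jacob", "Transactions": [
        {"TransactionID": 12907, "Date": "2003-10-15", "Amount": -18},
        {"TransactionID": 14920, "Date": "2003-11-20", "Amount": -70},
        {"TransactionID": 15003, "Date": "2003-11-27", "Amount": -60}]},
]

def october_2003_total(customers):
    # Stage 1: unpack every customer's group so individual transactions are visible.
    transactions = [t for c in customers for t in c["Transactions"]]
    # Stage 2: derive the result; ISO dates let a prefix test select October 2003.
    return sum(t["Amount"] for t in transactions if t["Date"].startswith("2003-10"))

print(october_2003_total(customers))  # -176
```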

Design that complies with 1NF

Codd described how a database like this could be made less structurally complex and more flexible by transforming it into a relational database in first normal form. To normalize the table so it complies with first normal form, attributes with nonsimple domains must be extracted to separate, stand-alone relations. Each extracted relation gains a foreign key referencing the primary key of the relation which initially contained it. This process can be applied recursively to nonsimple domains nested in multiple levels (i.e., domains containing tables within tables within tables, and so on).[2]: 380–381 
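The extraction procedure can be sketched as a recursive Python function. This is an illustration under stated assumptions, not Codd's own formulation: relations are lists of dicts, the caller names each nested attribute and supplies the extracted relation's own key through a hypothetical `nested` mapping, and each extracted relation's primary key is taken to be the container's primary key plus that local key.

```python
def normalize(name, rows, pk, nested, out=None):
    """Split relation `name` into flat relations.

    rows:   list of dicts; attributes listed in `nested` hold lists of dict rows.
    pk:     tuple of this relation's primary-key attribute names.
    nested: maps a nested attribute to (new relation name, that relation's own key).
    Returns {relation name: (primary key, list of flat rows)}.
    """
    if out is None:
        out = {}
    _, flat_rows = out.setdefault(name, (tuple(pk), []))
    for row in rows:
        flat = {}
        for attr, value in row.items():
            if attr in nested:
                child_name, child_key = nested[attr]
                # The extracted relation gains a foreign key to its former container.
                foreign_key = {k: row[k] for k in pk}
                child_rows = [{**foreign_key, **child} for child in value]
                # Recurse so that relations nested several levels deep are flattened too.
                normalize(child_name, child_rows, tuple(pk) + tuple(child_key), nested, out)
            else:
                flat[attr] = value
        flat_rows.append(flat)
    return out

customers = [
    {"CustomerID": 1, "Name": "Abraham", "Transactions": [
        {"TransactionID": 12890, "Date": "2003-10-14", "Amount": -87},
        {"TransactionID": 12904, "Date": "2003-10-15", "Amount": -50}]},
    {"CustomerID": 2, "Name": "Isaac", "Transactions": [
        {"TransactionID": 12898, "Date": "2003-10-14", "Amount": -21}]},
]
relations = normalize("Customer", customers, ("CustomerID",),
                      {"Transactions": ("Transaction", ("TransactionID",))})
print(relations["Transaction"][0])  # ('CustomerID', 'TransactionID')
```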

In this example, CustomerID is the primary key of the containing relation and will therefore be appended as a foreign key to the new relation:

Customer
CustomerID  Name
1           Abraham
2           Isaac
3           Jacob

Transaction
CustomerID  TransactionID  Date        Amount
1           12890          2003-10-14  −87
1           12904          2003-10-15  −50
2           12898          2003-10-14  −21
3           12907          2003-10-15  −18
3           14920          2003-11-20  −70
3           15003          2003-11-27  −60

In this modified design, the primary key is {CustomerID} in the first relation and {CustomerID, TransactionID} in the second relation.
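The two relations and their keys can be declared in SQL. The sketch below uses SQLite through Python's `sqlite3` module as an assumed target; SQLite checks `REFERENCES` clauses only when `PRAGMA foreign_keys` is enabled, and the table name Transaction is quoted because TRANSACTION is an SQL keyword. The helper `violates_constraints` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks REFERENCES only when this is on
conn.executescript("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL
    );
    -- TRANSACTION is an SQL keyword, so the table name is quoted.
    CREATE TABLE "Transaction" (
        CustomerID    INTEGER NOT NULL REFERENCES Customer (CustomerID),
        TransactionID INTEGER NOT NULL,
        Date          TEXT    NOT NULL,
        Amount        INTEGER NOT NULL,
        PRIMARY KEY (CustomerID, TransactionID)
    );
""")
conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                 [(1, "Abraham"), (2, "Isaac"), (3, "Jacob")])
conn.executemany('INSERT INTO "Transaction" VALUES (?, ?, ?, ?)', [
    (1, 12890, "2003-10-14", -87), (1, 12904, "2003-10-15", -50),
    (2, 12898, "2003-10-14", -21), (3, 12907, "2003-10-15", -18),
    (3, 14920, "2003-11-20", -70), (3, 15003, "2003-11-27", -60),
])

def violates_constraints(row):
    """Try to insert a transaction row; True if a key constraint rejects it."""
    try:
        conn.execute('INSERT INTO "Transaction" VALUES (?, ?, ?, ?)', row)
    except sqlite3.IntegrityError:
        return True
    return False

print(violates_constraints((1, 12890, "2003-10-14", -87)))  # True: duplicate primary key
print(violates_constraints((4, 20000, "2003-12-01", -5)))   # True: no customer 4
```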

Now that a single, "top-level" relation contains all transactions, it is simpler to run queries on the database. To find the monetary sum of all October transactions, the DBMS would simply find all rows with a Date falling in October and sum the Amount fields. All values are now easily exposed to the DBMS, whereas previously some values were embedded in lower-level structures that had to be handled specially. Accordingly, the normalized design lends itself well to general-purpose query processing, whereas the unnormalized design does not.
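Against the flat Transaction relation, the October query becomes a single selection and aggregation. This sketch again assumes SQLite via Python's `sqlite3` module and dates stored as ISO 8601 text, so string comparison orders them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Transaction" (CustomerID INTEGER, TransactionID INTEGER, '
             'Date TEXT, Amount INTEGER, PRIMARY KEY (CustomerID, TransactionID))')
conn.executemany('INSERT INTO "Transaction" VALUES (?, ?, ?, ?)', [
    (1, 12890, "2003-10-14", -87), (1, 12904, "2003-10-15", -50),
    (2, 12898, "2003-10-14", -21), (3, 12907, "2003-10-15", -18),
    (3, 14920, "2003-11-20", -70), (3, 15003, "2003-11-27", -60),
])

# One flat pass: select the October rows and sum them; no unpacking stage is needed.
(october_total,) = conn.execute(
    """SELECT SUM(Amount) FROM "Transaction"
       WHERE Date >= '2003-10-01' AND Date < '2003-11-01'""").fetchone()
print(october_total)  # -176
```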

The revised design also meets the additional requirements for second and third normal form.

Rationale

Normalization to 1NF is the major theoretical component of transferring a database to the relational model. Use of a relational database in 1NF brings certain advantages:

  • It enables data to be stored in regular two-dimensional arrays; supporting nested relations would require more complex data structures.[2]: 381 
  • It allows for the use of a simpler query language, like SQL, since any data item can be identified using only a relation name, attribute name and key; addressing nested data items would require a more complex language with support for hierarchical data paths.
  • Representing relationships using foreign keys is more flexible and allows for features such as many-to-many relationships, while a hierarchical model can represent only one-to-one or one-to-many relationships.
  • Since locating data items is not coupled to a parent–child hierarchy, a database in 1NF creates greater data independence and is more resilient to structural changes over time.[clarification needed]
  • From 1NF, further normalization becomes possible (for example to 2NF or 3NF), which can reduce data redundancy and anomalies.
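The addressing point in the second item can be illustrated with a toy in-memory layout, assumed purely for illustration, in which each relation maps primary-key values to rows. Any single data item is then reached by a relation name, a key value and an attribute name. In the nested design, the same amount would instead need a path through a customer record and into its Transactions group.

```python
# Hypothetical layout: each relation maps primary-key values to rows.
db = {
    "Customer": {(1,): {"Name": "Abraham"}, (2,): {"Name": "Isaac"}},
    "Transaction": {
        (1, 12890): {"Date": "2003-10-14", "Amount": -87},
        (2, 12898): {"Date": "2003-10-14", "Amount": -21},
    },
}

def lookup(db, relation, key, attribute):
    """Address a data item by relation name, key value and attribute name."""
    return db[relation][key][attribute]

# Roughly: SELECT Amount FROM "Transaction" WHERE CustomerID = 2 AND TransactionID = 12898
print(lookup(db, "Transaction", (2, 12898), "Amount"))  # -21
```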

Controversy about compound values

There is some discussion about the extent to which compound or complex values other than relations (such as arrays or XML data) are permitted in 1NF. Codd states that relations are the only type of compound data allowed within the relational model (though not in attribute domains), since any additional type of compound data would add complexity without adding power; nevertheless, the model specifically allows "certain special functions" like SUBSTRING to decompose values otherwise considered atomic.[5]: 6,340 

Hugh Darwen and Christopher J. Date have suggested that Codd's concept of an "atomic value" is ambiguous, and that this ambiguity has led to widespread confusion about how 1NF should be understood.[6][7] In particular, the notion of an atomic value as a "value that cannot be decomposed" is problematic, as it would seem to imply that few, if any, data types are atomic:

  • A string would seem not to be atomic, as an RDBMS typically provides operators to decompose it into substrings.
  • A fixed-point number would seem not to be atomic, as an RDBMS typically provides operators to decompose it into integer and fractional components.
  • An ISBN would seem not to be atomic, as it includes various parts, including the registration group, registrant and publication elements.
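Each of the three decompositions is easy to perform in practice, which is the heart of the objection. The sketch below uses SQLite's `SUBSTR` function for the string, Python's `Decimal` for a hypothetical fixed-point amount, and splits an ISBN-13, assuming it is written in hyphenated form. The ISBN is the one the references give for *Date on Database*.

```python
import sqlite3
from decimal import Decimal, ROUND_DOWN

conn = sqlite3.connect(":memory:")

# A string: the DBMS's own SUBSTR operator decomposes it into a substring.
(first_three,) = conn.execute("SELECT SUBSTR('Abraham', 1, 3)").fetchone()

# A fixed-point number: split into integer and fractional components.
amount = Decimal("123.45")  # hypothetical value
integer_part = amount.to_integral_value(rounding=ROUND_DOWN)
fractional_part = amount - integer_part

# An ISBN-13, assuming its hyphenated form: prefix element, registration group,
# registrant, publication element and check digit.
prefix, group, registrant, publication, check_digit = "978-1-4842-2029-0".split("-")

print(first_three, integer_part, fractional_part)  # Abr 123 0.45
print(group, registrant, publication)              # 1 4842 2029
```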

Date suggests that "the notion of atomicity has no absolute meaning":[8]: 112 [9][pages needed] a value may be considered atomic for some purposes, but may be considered an assemblage of more basic elements for other purposes. If this position is accepted, 1NF cannot be defined with reference to atomicity. Columns containing any conceivable data type (from strings and numeric types to arrays and tables) are then acceptable in a 1NF table, although perhaps not always desirable – for example, it may be desirable to separate a CustomerName column into two columns, FirstName and Surname.

Christopher J. Date's definition of 1NF

According to Christopher J. Date's definition, a table is in first normal form if and only if it is "isomorphic to some relation", which means, specifically, that it satisfies the following five conditions:[8]: 127–128 

  1. There is no specific top-to-bottom ordering of the rows.
  2. There is no specific left-to-right ordering of the columns.
  3. There are no duplicate rows.
  4. Every field (or intersection of a row and a column) contains exactly one value from the applicable domain and nothing else.
  5. All columns are regular (i.e., rows have no hidden components such as row IDs, object IDs, or hidden timestamps).
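Of the five conditions, only 3 and 4 can be tested from a table's data alone; conditions 1, 2 and 5 concern how the table is represented. A minimal sketch of such a check, assuming rows are Python dicts (so no column order is implied), each column's domain is given as a membership predicate, and `None` stands for an SQL null:

```python
def date_1nf_violations(rows, domains):
    """Report breaches of Date's conditions 3 and 4 in a table given as dict rows.

    domains maps each column name to a predicate for membership in its domain.
    None stands for an SQL null. Conditions 1, 2 and 5 concern how the table is
    represented (row order, column order, hidden columns) and are not checked.
    """
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        # Condition 4: exactly one value per column, drawn from that column's domain.
        if set(row) != set(domains):
            problems.append(f"row {i}: wrong set of columns (condition 4)")
            continue
        for column, value in row.items():
            if value is None or not domains[column](value):
                problems.append(f"row {i}, {column}: no value from the domain (condition 4)")
        # Condition 3: no duplicate rows.
        key = frozenset(row.items())
        if key in seen:
            problems.append(f"row {i}: duplicate row (condition 3)")
        seen.add(key)
    return problems

domains = {"CustomerID": lambda v: isinstance(v, int) and v > 0,
           "Name": lambda v: isinstance(v, str) and v != ""}
rows = [{"CustomerID": 1, "Name": "Abraham"},
        {"CustomerID": 2, "Name": None},
        {"CustomerID": 1, "Name": "Abraham"}]
for problem in date_1nf_violations(rows, domains):
    print(problem)
# row 1, Name: no value from the domain (condition 4)
# row 2: duplicate row (condition 3)
```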

Violation of any of these conditions would mean that the table is not strictly relational, and therefore that it is not in first normal form.

This definition of 1NF permits relation-valued attributes (tables within tables), which Date argues are useful in rare cases.[8]: 121–126  Examples of tables (or views) that would not meet this definition of first normal form are:

  • A table that lacks a unique key constraint. Such a table would be able to accommodate duplicate rows, in violation of condition 3.
  • A view whose definition mandates that results be returned in a particular order, so that the row-ordering is an intrinsic and meaningful aspect of the view, in violation of condition 1. The tuples in true relations are not ordered with respect to each other (such views cannot be created using SQL that conforms to the 2003 standard).
  • A table with at least one nullable attribute. A nullable attribute would be in violation of condition 4, which requires every column to contain exactly one value from its column's domain. This aspect of condition 4 is controversial; it marks an important departure from Codd's later vision of the relational model,[10] which made explicit provision for nulls.[11]
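The first and third examples can be reproduced in SQLite through Python's `sqlite3` module, used here as an assumed stand-in for any SQL DBMS. A table with no key and nullable columns accepts both a duplicate row and a null, while a table with a primary key and `NOT NULL` rejects them. The table names `Loose` and `Strict` and the helper `accepts` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- No key and a nullable column: duplicates and nulls are accepted.
    CREATE TABLE Loose (CustomerID INTEGER, Name TEXT);
    -- A primary key and NOT NULL rule both out; WITHOUT ROWID drops SQLite's hidden rowid.
    CREATE TABLE Strict (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL) WITHOUT ROWID;
""")

def accepts(table, row):
    """Try to insert a row; report whether the table's constraints allowed it."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

for table in ("Loose", "Strict"):
    results = [accepts(table, (1, "Abraham")),  # first insert
               accepts(table, (1, "Abraham")),  # exact duplicate row
               accepts(table, (2, None))]       # missing Name
    print(table, results)
# Loose [True, True, True]
# Strict [True, False, False]
```

`WITHOUT ROWID` is used because an ordinary SQLite table carries a hidden rowid, arguably the kind of hidden component that condition 5 excludes.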

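See also

  • Attribute–value system
  • Second normal form (2NF)
  • Third normal form (3NF)
  • Boyce–Codd normal form (BCNF or 3.5NF)
  • Fourth normal form (4NF)
  • Fifth normal form (5NF)
  • Sixth normal form (6NF)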

References

  1. Codd, E. F. (1972). "Further Normalization of the Data Base Relational Model". p. 27.
  2. Codd, E. F. (1970). "A relational model of data for large shared data banks". Communications of the ACM 13 (6): 377–387. doi:10.1145/362384.362685.
  3. Codd, E. F. (1979). "Extending the database relational model to capture more meaning". ACM Transactions on Database Systems 4 (4): 397–434. doi:10.1145/320107.320109.
  4. Codd, E. F. (1971). "Further Normalization of the Data Base Relational Model". Data Base Systems. Courant Computer Science Symposium 6, edited by Rustin, R.
  5. Codd, E. F. (1 January 1990). The relational model for database management: version 2. Addison-Wesley. ISBN 978-0-201-14192-4.
  6. Darwen, Hugh. "Relation-Valued Attributes; or, Will the Real First Normal Form Please Stand Up?", in C. J. Date and Hugh Darwen, Relational Database Writings 1989-1991 (Addison-Wesley, 1992).
  7. Date, C. J. (2007). "Chapter 8: What First Normal Form Really Means". Date on Database: Writings 2000–2006. Apress. p. 108. ISBN 978-1-4842-2029-0. "'[F]or many years,' writes Date, 'I was as confused as anyone else. What's worse, I did my best (worst?) to spread that confusion through my writings, seminars, and other presentations.'"
  8. Date, C. J. (2007). "Chapter 8: What First Normal Form Really Means". Date on Database: Writings 2000–2006. Apress. ISBN 978-1-4842-2029-0.
  9. Date, C. J. (6 November 2015). SQL and Relational Theory: How to Write Accurate SQL Code. O'Reilly Media. pp. 50–. ISBN 978-1-4919-4115-7. https://books.google.com/books?id=BCjkCgAAQBAJ&pg=PA50. Retrieved 31 October 2018.
  10. Date, C. J. (2009). "Appendix A.2". SQL and Relational Theory. O'Reilly. "Codd first defined the relational model in 1969 and didn't introduce nulls until 1979"
  11. Date, C. J. (14 October 1985). "Is Your DBMS Really Relational?". Computerworld. "Null values ... [must be] supported in a fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type." (the third of Codd's 12 rules)

Further reading

  • Date, C. J., & Lorentzos, N., & Darwen, H. (2002). Temporal Data & the Relational Model (1st ed.). Morgan Kaufmann. ISBN 1-55860-855-9.
  • Date, C. J. (1999), An Introduction to Database Systems (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
  • Kent, W. (1983). A Simple Guide to Five Normal Forms in Relational Database Theory. Communications of the ACM, vol. 26, pp. 120–125.
