[[File:Response surface metodology.jpg|thumb|Design of experiments with full [[factorial design]] (left), response surface with second-degree polynomial (right)]]
The '''design of experiments''' ('''DOE'''),<ref>{{cite web |title=What Is Design of Experiments (DOE)? |url=https://asq.org/quality-resources/design-of-experiments?srsltid=AfmBOoqGNe13QlU1WGcx1ABznp_0sVoAdwVX3jHd_Hq_a9iaqVTQ9p1u |website=asq.org |publisher=American Society for Quality |access-date=20 February 2025}}</ref> also known as '''experiment design''' or '''experimental design''', is the design of any task that aims to describe and explain the variation of information under conditions that are [[Hypothesis|hypothesized]] to reflect the variation. The term is generally associated with experiments in which the design introduces conditions that directly affect the variation, but may also refer to the design of [[Social:Quasi-experiment|quasi-experiment]]s, in which [[Naturalistic observation|natural]] conditions that influence the variation are selected for observation.
In its simplest form, an experiment aims to predict the outcome of introducing a change in the preconditions, which is represented by one or more [[Dependent and independent variables|independent variables]], also referred to as "input variables" or "predictor variables." The change in one or more independent variables is generally hypothesized to result in a change in one or more [[Dependent and independent variables|dependent variables]], also referred to as "output variables" or "response variables." The experimental design may also identify [[Control variable|control variable]]s that must be held constant to prevent external factors from affecting the results. Experimental design involves not only the selection of suitable independent, dependent, and control variables, but also planning the delivery of the experiment under statistically optimal conditions given the constraints of available resources. There are multiple approaches for determining the set of design points (unique combinations of the settings of the independent variables) to be used in the experiment.
Main concerns in experimental design include the establishment of [[Validity (statistics)|validity]], [[Reliability (statistics)|reliability]], and [[Reproducibility|replicability]]. For example, these concerns can be partially addressed by carefully choosing the independent variable, reducing the risk of measurement error, and ensuring that the documentation of the method is sufficiently detailed. Related concerns include achieving appropriate levels of statistical power and [[Sensitivity and specificity|sensitivity]].
===Statistical experiments, following Charles S. Peirce===
{{See also|Randomization}}
A theory of [[Statistical inference|statistical inference]] was developed by [[Biography:Charles Sanders Peirce|Charles S. Peirce]] in "Illustrations of the Logic of Science" (1877–1878)<ref>Peirce, Charles Sanders (1887). "Illustrations of the Logic of Science". Open Court (10 June 2014). {{ISBN|0812698495}}.</ref> and "A Theory of Probable Inference" (1883),<ref>Peirce, Charles Sanders (1883). "A Theory of Probable Inference". In C. S. Peirce (Ed.), Studies in logic by members of the Johns Hopkins University (p. 126–181). Little, Brown and Co (1883)</ref> two publications that emphasized the importance of randomization-based inference in statistics.<ref name=Stigler78>{{cite journal |last1=Stigler |first1=Stephen M. |year=1978 |title=Mathematical statistics in the early States |url=http://projecteuclid.org/euclid.aos/1176344123 |journal=Annals of Statistics |volume=6 |issue= 2|pages=239–65 [248] |quote="Indeed, Pierce's work contains one of the earliest explicit endorsements of mathematical randomization as a basis for inference of which I am aware (Peirce, 1957, pages 216–219" | doi=10.1214/aos/1176344123 |jstor=2958876 |mr=483118|doi-access=free }}</ref>
====Randomized experiments====
{{Main|Random assignment}}
{{See also|Repeated measures design}}
Charles S. Peirce randomly assigned volunteers to a blinded, [[Repeated measures design|repeated-measures design]] to evaluate their ability to discriminate weights.<ref name="smalldiff">{{Cite journal| last1= Peirce|first1=Charles Sanders|last2=Jastrow|first2=Joseph |author-link2=Joseph Jastrow|year=1885|title=On Small Differences in Sensation|url=http://psychclassics.yorku.ca/Peirce/small-diffs.htm| journal=Memoirs of the National Academy of Sciences|volume=3|pages=73–83}}</ref><ref name="telepathy">
{{Cite journal|first=Ian |last=Hacking| title=Telepathy: Origins of Randomization in Experimental Design|journal=Isis|issue=3|volume=79|date=September 1988 |pages=427–451|jstor=234674|mr=1013489 | doi=10.1086/354775|s2cid=52201011}}</ref><ref name="stigler">
====Optimal designs for regression models====
{{Main|Response surface methodology}}
{{See also|Optimal design}}
[[Biography:Charles Sanders Peirce|Charles S. Peirce]] also contributed the first English-language publication on an [[Optimal design|optimal design]] for [[Regression analysis|regression]] [[Statistical model|models]] in 1876.<ref>{{cite journal| author=Peirce, C. S. | year=1876| title=Note on the Theory of the Economy of Research | journal=Coast Survey Report | pages=197–201| author-link=Charles Sanders Peirce}}, actually published 1879, NOAA [http://docs.lib.noaa.gov/rescue/cgs/001_pdf/CSC-0025.PDF#page=222 PDF Eprint] {{Webarchive|url=https://web.archive.org/web/20170302071239/https://docs.lib.noaa.gov/rescue/cgs/001_pdf/CSC-0025.PDF#page=222 |date=2 March 2017 }}.<br /> Reprinted in ''Collected Papers'' '''7''', paragraphs 139–157, also in ''Writings'' '''4''', pp. 72–78, and in {{cite journal| author=Peirce, C. S. |date=July–August 1967
| title=Note on the Theory of the Economy of Research
|volume=15 | issue=4|pages=643–648
| jstor=168276|doi=10.1287/opre.15.4.643
}}</ref> A pioneering [[Optimal design|optimal design]] for [[Polynomial regression|polynomial regression]] was suggested by Gergonne in 1815. In 1918, Kirstine Smith published optimal designs for polynomials of degree six (and less).<ref name=GL2009>{{cite journal |last1=Guttorp |first1=P. |last2=Lindgren |first2=G. |title= Karl Pearson and the Scandinavian school of statistics |journal= International Statistical Review |volume=77 |year=2009 |page=64 |doi=10.1111/j.1751-5823.2009.00069.x|citeseerx=10.1.1.368.8328 |s2cid=121294724 }}</ref><ref name="polynomials">{{Cite journal| last1= Smith| first1=Kirstine| year=1918| title=On the standard deviations of adjusted and interpolated values of an observed polynomial function and its constants and the guidance they give towards a proper choice of the distribution of observations.| url=https://books.google.com/books?id=UMNLAAAAYAAJ | journal=Biometrika|volume=12| issue=1–2|pages=1–85| doi=10.1093/biomet/12.1-2.1| url-access=subscription}}</ref>
:Measurements are usually subject to variation and [[Measurement uncertainty|measurement uncertainty]]; thus they are repeated and full experiments are replicated to help identify the sources of variation, to better estimate the true effects of treatments, to further strengthen the experiment's reliability and validity, and to add to the existing knowledge of the topic.<ref>{{cite web|last=Dr. Hani|title=Replication study|url=http://www.experiment-resources.com/replication-study.html|access-date=27 October 2011|year=2009|archive-url=https://web.archive.org/web/20120602061136/http://www.experiment-resources.com/replication-study.html|archive-date=2 June 2012|url-status=dead}}</ref> However, certain conditions must be met before the replication of the experiment is commenced: the original research question has been published in a peer-reviewed journal or widely cited, the researcher is independent of the original experiment, the researcher must first try to replicate the original findings using the original data, and the write-up should state that the study conducted is a replication study that tried to follow the original study as strictly as possible.<ref>{{citation|last=Burman|first=Leonard E.|title=A call for replication studies|url=http://pfr.sagepub.com|journal=[[Finance:Public Finance Review|Public Finance Review]] | volume=38 |issue=6|access-date=27 October 2011|author2=Robert W. Reed |author3=James Alm |pages=787–793|doi=10.1177/1091142110385210|year=2010|s2cid=27838472|url-access=subscription}}</ref>
;[[Blocking (statistics)|Blocking]]
:[[File:No block vs block chart.jpg|thumb|150x150px|Blocking (right)]]Blocking is the non-random arrangement of experimental units into groups (blocks) consisting of units that are similar to one another. Blocking reduces known but irrelevant sources of variation between units and thus allows greater precision in the estimation of the source of variation under study.
;[[Orthogonality#Statistics.2C econometrics.2C and economics|Orthogonality]]
[[File:Factorial Design.svg|thumb|Example of orthogonal factorial design]]
:Orthogonality concerns the forms of comparison (contrasts) that can be legitimately and efficiently carried out. Contrasts can be represented by vectors and sets of orthogonal contrasts are uncorrelated and independently distributed if the data are normal. Because of this independence, each orthogonal treatment provides different information to the others. If there are ''T'' treatments and ''T'' − 1 orthogonal contrasts, all the information that can be captured from the experiment is obtainable from the set of contrasts.
;Multifactorial experiments
:Use of multifactorial experiments instead of the [[One-factor-at-a-time method|one-factor-at-a-time method]]. These are efficient at evaluating the effects and possible [[Interaction (statistics)|interactions]] of several factors (independent variables). Analysis of [[Experiment|experiment]] design is built on the foundation of the [[Analysis of variance|analysis of variance]], a collection of models that partition the observed variance into components, according to what factors the experiment must estimate or test.
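The orthogonality and multifactorial principles above can be illustrated with a small sketch. It assumes a two-level full factorial design with three factors in coded units (−1 for the low level, +1 for the high level); the factor names are hypothetical and serve only as labels.

```python
from itertools import product

import numpy as np

# Hypothetical factor names; each factor has two coded levels, -1 (low) and +1 (high).
factors = ["temperature", "pressure", "catalyst"]

# Full factorial design: every combination of levels, 2**3 = 8 runs.
design = np.array(list(product([-1, 1], repeat=len(factors))))

# The factor columns are mutually orthogonal when their pairwise dot products
# are zero, i.e. when X^T X is a multiple of the identity matrix.
gram = design.T @ design
is_orthogonal = np.array_equal(gram, len(design) * np.eye(len(factors), dtype=int))

# An interaction contrast (elementwise product of two factor columns) is
# orthogonal to every main-effect column as well.
interaction_ab = design[:, 0] * design[:, 1]
interaction_vs_main = design.T @ interaction_ab

print(design)
print("orthogonal main effects:", is_orthogonal)
print("interaction vs. main effects:", interaction_vs_main)
```

Because the columns are orthogonal, each main effect and the interaction can be estimated from the same eight runs without the estimates interfering with one another, which a one-factor-at-a-time plan does not offer.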
==Example==
}}</ref>
Weights of eight objects are measured using a pan balance and set of standard weights. Each weighing measures the weight difference between objects in the left pan and any objects in the right pan by adding calibrated weights to the lighter pan until the balance is in equilibrium. Each measurement has a random error <math>\epsilon </math>. The average error is zero; the [[Standard deviation|standard deviation]] of the [[Probability distribution|probability distribution]] of the errors is the same number σ on different weighings; errors on different weighings are independent. Denote the true weights by
:: <math>\theta_1, \dots, \theta_8.</math>
We consider two different experiments with the same number of measurements:
<ol type="A">
<li>
Weigh each of the eight objects individually.
:: <math>
\begin{array}{lcc}
& \text{left pan} & \text{right pan} \\
\hline
\text{1st weighing:} & 1\ & \text{(empty)} \\
\text{2nd weighing:} & 2\ & \text{(empty)} \\
\text{3rd weighing:} & 3\ & \text{(empty)} \\
... & ... & ...
\end{array}
</math>
</li>
<li>
Do the eight weighings according to the following schedule:
:: <math>
\begin{array}{lcc}
& \text{left pan} & \text{right pan} \\
\hline
... & ... & ...
\end{array}
</math>
</li>
</ol>
Let ''y''<sub>''i''</sub> be the measured difference for ''i'' = 1, ..., 8. The relationship between the true weights and experimental measurements may be represented with a [[General linear model|general linear model]], with the [[Design matrix|design matrix]] <math> W </math> having entries from <math> \{-1, 0, 1\} </math>:
:: <math> \mathbf{y} = W \boldsymbol{\theta} + \boldsymbol{\epsilon} </math>
The first design is represented by an [[Identity matrix|identity matrix]] while the second design is represented by an 8×8 [[Hadamard matrix]], <math> H </math>, both examples of weighing matrices.
The weights are typically estimated using the method of least squares. Using a weighing matrix, this is equivalent to applying the inverse of <math> W </math> to the measurements:
:: <math> \hat{\boldsymbol{\theta}} = W^{-1} \mathbf{y} </math>
The variance of the estimate of ''θ''<sub>1</sub> is ''σ''<sup>2</sup> if we use the first experiment, but ''σ''<sup>2</sup>/8 if we use the second. A similar result follows for the remaining weight estimates. Thus, the second experiment gives us 8 times as much precision for the estimate of a single item, despite using the same number of weighings. What the second experiment achieves with eight weighings would require 64 weighings if the items were weighed separately.
Many problems of the design of experiments involve [[Combinatorial design|combinatorial design]]s, as in this example and others.<ref name="yout_Howt"/>
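The precision comparison in this example can be checked numerically. The sketch below assumes the linear model ''y'' = ''Wθ'' + ''ε'' with independent errors of variance ''σ''<sup>2</sup>, for which the least-squares estimates have covariance ''σ''<sup>2</sup>(''W''<sup>T</sup>''W'')<sup>−1</sup>. The Hadamard matrix is built with Sylvester's construction, which is an assumption made for illustration and may differ from the weighing schedule shown in the article by a reordering or sign change of rows and columns; the function names are hypothetical.

```python
import numpy as np


def sylvester_hadamard(order):
    """Build a Hadamard matrix of size `order` (a power of two) by Sylvester's construction."""
    h = np.array([[1]])
    while h.shape[0] < order:
        h = np.kron(np.array([[1, 1], [1, -1]]), h)
    return h


def estimator_variances(w, sigma=1.0):
    """Variances of the least-squares estimates for y = W theta + eps, Var(eps_i) = sigma**2."""
    cov = sigma**2 * np.linalg.inv(w.T @ w)
    return np.diag(cov)


identity_design = np.eye(8)              # one object per weighing
hadamard_design = sylvester_hadamard(8)  # every object takes part in every weighing

var_a = estimator_variances(identity_design)
var_b = estimator_variances(hadamard_design)
print(var_a)  # each entry equals sigma**2
print(var_b)  # each entry equals sigma**2 / 8
```

In the Hadamard design an entry of +1 places the object in the left pan and −1 in the right pan, so each of the eight weighings carries information about all eight objects.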
==Avoiding false positives==
{{see also|Metascience}}
False positive conclusions, often resulting from the [[Publish or perish|pressure to publish]] or the author's own [[Confirmation bias|confirmation bias]], are an inherent hazard in many fields.<ref>{{Cite journal |last1=Forstmeier |first1=Wolfgang |last2=Wagenmakers |first2=Eric-Jan |last3=Parker |first3=Timothy H. |date=23 November 2016 |title=Detecting and avoiding likely false-positive findings – a practical guide |journal=Biological Reviews |language=en |volume=92 |issue=4 |pages=1941–1968 |doi=10.1111/brv.12315 |pmid=27879038 |s2cid=26793416 |issn=1464-7931|doi-access=free |hdl=11245.1/31f84a5b-4439-4a4c-a690-6e98354199f5 |hdl-access=free }}</ref>
Use of double-blind designs can prevent [[Bias|bias]]es potentially leading to false positives in the [[Data collection|data collection]] phase. When a double-blind design is used, participants are randomly assigned to experimental groups but the researcher is unaware of which participants belong to which group. Therefore, the researcher cannot affect the participants' response to the intervention.<ref name=":0">{{Cite journal |last1=David |first1=Sharoon |last2=Khandhar |first2=Paras B. |date=July 17, 2023 |title=Double-Blind Study |url=https://www.ncbi.nlm.nih.gov/books/NBK546641/ |journal=StatPearls Publishing|pmid=31536248 }}</ref>
Experimental designs with undisclosed [[Degrees of freedom|degrees of freedom]]{{Technical inline|date=August 2023}} are a problem,<ref>{{cite journal| last = Simmons| first = Joseph|author2=Leif Nelson |author3=Uri Simonsohn | title = False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant| journal = Psychological Science| volume = 22| issue = 11| pages = 1359–1366| date = November 2011| issn = 0956-7976| doi = 10.1177/0956797611417632| pmid = 22006061| doi-access = }}
| date=2014-06-04
| access-date=2014-06-12 }}
</ref>
P-hacking can be prevented by [[Preregistration (science)|preregistering]] studies, in which researchers submit their data analysis plan to the journal they wish to publish in before they start collecting data, so that the analysis cannot be adjusted after the data have been seen.<ref>{{Cite journal |last1=Nosek |first1=Brian A. |last2=Ebersole |first2=Charles R. |last3=DeHaven |first3=Alexander C. |last4=Mellor |first4=David T. |date=2018-03-13 |title=The preregistration revolution |journal=Proceedings of the National Academy of Sciences |language=en |volume=115 |issue=11 |pages=2600–2606 |doi=10.1073/pnas.1708274114 |issn=0027-8424 |pmc=5856500 |pmid=29531091 |bibcode=2018PNAS..115.2600N |doi-access=free }}</ref><ref>{{Cite web |title=Pre-Registering Studies – What Is It, How Do You Do It, and Why? |url=https://www.acf.hhs.gov/opre/blog/2022/08/pre-registering-studies-what-it-how-do-you-do-it-and-why |archive-url=https://web.archive.org/web/20220829212456/https://www.acf.hhs.gov/opre/blog/2022/08/pre-registering-studies-what-it-how-do-you-do-it-and-why |url-status=dead |archive-date=29 August 2022 |access-date=2023-08-29 |website=www.acf.hhs.gov |language=en}}</ref>
Another way to prevent this is to extend the double-blind design to the data-analysis phase, making the study triple-blind: the data are sent to a data analyst unrelated to the research, who scrambles the data so that there is no way to know which group participants belong to before they are potentially removed as outliers.<ref name=":0" />
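One way to see why undisclosed flexibility in analysis inflates false positives is to compute how quickly the chance of a spurious "significant" result grows with the number of analyses tried. The sketch below is a simplification: it assumes the tests are independent and each uses the same significance level, which real analysis variants on the same data usually are not. The function name is hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Chance of at least one false positive when trying several analyses at alpha = 0.05.
for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

Under these assumptions, ten undisclosed analysis variants already give roughly a 40% chance of at least one nominally significant result when no true effect exists.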
==Discussion topics when setting up an experimental design==
An experimental design or randomized [[Engineering:Clinical trial|clinical trial]] requires careful consideration of several factors before actually doing the experiment.<ref>Ader, Mellenberg & Hand (2008) "Advising on Research Methods: A consultant's companion"</ref> An experimental design is the laying out of a detailed experimental plan in advance of doing the experiment. Some of the following topics have already been discussed in the principles of experimental design section:
# How many factors does the design have, and are the levels of these factors fixed or random?
# How feasible is repeated administration of the same measurement instruments to the same units at different occasions, with a post-test and follow-up tests?
# What about using a proxy pretest?
# What about using a proxy pretest?
# Are there lurking variables?
# Are there [[Confounding|confounding variables]]?
# Should the client/patient, researcher or even the analyst of the data be blind to conditions?
# Should the client/patient, researcher or even the analyst of the data be blind to conditions?
# What is the feasibility of subsequent application of different conditions to the same units?
# What is the feasibility of subsequent application of different conditions to the same units?
Line 181:
Line 204:
==Causal attributions==
==Causal attributions==
In a pure experimental design, the researcher manipulates the independent (predictor) variable: every participant is chosen randomly from the population, and each participant chosen is assigned randomly to a condition of the independent variable. Only when this is done is it possible to conclude with high probability that the differences in the outcome variables are caused by the different conditions. Therefore, researchers should choose the experimental design over other design types whenever possible. However, the nature of the independent variable does not always allow for manipulation. In those cases, researchers must take care not to make causal attributions that their design does not support. For example, in observational designs, participants are not assigned randomly to conditions, so if differences in outcome variables are found between conditions, it is likely that something other than the differences between the conditions, that is, a third variable, causes the differences in outcomes. The same goes for studies with a correlational design (Adér & Mellenbergh, 2008).
In a pure experimental design, the researcher manipulates the independent (predictor) variable: every participant is chosen randomly from the population, and each participant chosen is assigned randomly to a condition of the independent variable. Only when this is done is it possible to conclude with high probability that the differences in the outcome variables are caused by the different conditions. Therefore, researchers should choose the experimental design over other design types whenever possible. However, the nature of the independent variable does not always allow for manipulation. In those cases, researchers must take care not to make causal attributions that their design does not support. For example, in observational designs, participants are not assigned randomly to conditions, so if differences in outcome variables are found between conditions, it is likely that something other than the differences between the conditions, that is, a third variable, causes the differences in outcomes. The same goes for studies with a correlational design.
==Statistical control==
==Statistical control==
Line 198:
Line 221:
As with other branches of statistics, experimental design is pursued using both frequentist and [[Bayesian experimental design|Bayesian]] approaches: In evaluating statistical procedures like experimental designs, frequentist statistics studies the [[Sampling distribution|sampling distribution]] while [[Bayesian statistics]] updates a [[Bayesian probability|probability distribution]] on the parameter space.
As with other branches of statistics, experimental design is pursued using both frequentist and [[Bayesian experimental design|Bayesian]] approaches: In evaluating statistical procedures like experimental designs, frequentist statistics studies the [[Sampling distribution|sampling distribution]] while [[Bayesian statistics]] updates a [[Bayesian probability|probability distribution]] on the parameter space.
Some important contributors to the field of experimental designs are [[Biography:Charles Sanders Peirce|C. S. Peirce]], R. A. Fisher, [[Biography:Frank Yates|F. Yates]], R. C. Bose, A. C. Atkinson, [[Biography:Rosemary A. Bailey|R. A. Bailey]], D. R. Cox, G. E. P. Box, W. G. Cochran, W. T. Federer, V. V. Fedorov, A. S. Hedayat, J. Kiefer, [[Biography:Oscar Kempthorne|O. Kempthorne]], J. A. Nelder, Andrej Pázman, Friedrich Pukelsheim, [[Biography:D. Raghavarao|D. Raghavarao]], [[Biography:C. R. Rao|C. R. Rao]], Shrikhande S. S., J. N. Srivastava, William J. Studden, G. Taguchi and H. P. Wynn.<ref>{{cite book | last1 = Giri | first1 = Narayan C. | last2 = Das | first2 = M. N. | title = Design and Analysis of Experiments | publisher = Wiley | location = New York, N.Y | year = 1979 | isbn = 9780852269145 | url = https://books.google.com/books?id=-vGlnx-ZVvEC | pages=53, 159, 264 }}</ref>
Some important contributors to the field of experimental designs are [[Biography:Charles Sanders Peirce|C. S. Peirce]], R. A. Fisher, [[Biography:Frank Yates|F. Yates]], R. C. Bose, A. C. Atkinson, [[Biography:Rosemary A. Bailey|R. A. Bailey]], D. R. Cox, G. E. P. Box, W. G. Cochran, [[Biography:Walter T. Federer|Walter T. Federer]], V. V. Fedorov, A. S. Hedayat, J. Kiefer, [[Biography:Oscar Kempthorne|O. Kempthorne]], J. A. Nelder, Andrej Pázman, Friedrich Pukelsheim, [[Biography:D. Raghavarao|D. Raghavarao]], [[Biography:C. R. Rao|C. R. Rao]], Shrikhande S. S., J. N. Srivastava, William J. Studden, G. Taguchi and H. P. Wynn.<ref>{{cite book | last1 = Giri | first1 = Narayan C. | last2 = Das | first2 = M. N. | title = Design and Analysis of Experiments | publisher = Wiley | location = New York, N.Y | year = 1979 | isbn = 9780852269145 | url = https://books.google.com/books?id=-vGlnx-ZVvEC | pages=53, 159, 264 }}</ref>
The textbooks of D. Montgomery, R. Myers, and G. Box/W. Hunter/J.S. Hunter have reached generations of students and practitioners.
The textbooks of D. Montgomery, R. Myers, and G. Box/W. Hunter/J.S. Hunter have reached generations of students and practitioners.<ref>{{cite book | last = Montgomery | first = Douglas
<ref>{{cite book | last = Montgomery | first = Douglas
| year = 2013 | isbn = 9781118146927 }}</ref><ref>
<ref>
{{cite book
{{cite book
| last1 = Walpole | first1 = Ronald E.
| last1 = Walpole | first1 = Ronald E.
Line 215:
Line 236:
| publisher = Pearson Prentice Hall | location = Upper Saddle River, NJ
| publisher = Pearson Prentice Hall | location = Upper Saddle River, NJ
| edition = 8
| edition = 8
| year = 2007 | isbn = 978-0131877115 }}</ref>
| year = 2007 | isbn = 978-0131877115 }}</ref><ref>
<ref>
{{cite book
{{cite book
| last1 = Myers | first1 = Raymond H.
| last1 = Myers | first1 = Raymond H.
Line 225:
Line 245:
| publisher = Wiley | location = Hoboken, N.J.
| publisher = Wiley | location = Hoboken, N.J.
| edition = 2
| edition = 2
| year = 2010 | isbn = 978-0470454633 }}</ref>
| year = 2010 | isbn = 978-0470454633 }}</ref><ref>
<ref>
{{cite book | last1 = Box | first1 = George E.P. | last2 = Hunter | first2 = William G. | last3 = Hunter | first3 = J. Stuart | title = Statistics for Experimenters : An Introduction to Design, Data Analysis, and Model Building | publisher = Wiley | location = New York | year = 1978 | isbn = 978-0-471-09315-2 | url = https://archive.org/details/statisticsforexp00geor }}</ref><ref>
{{cite book | last1 = Box | first1 = George E.P. | last2 = Hunter | first2 = William G. | last3 = Hunter | first3 = J. Stuart | title = Statistics for Experimenters : An Introduction to Design, Data Analysis, and Model Building | publisher = Wiley | location = New York | year = 1978 | isbn = 978-0-471-09315-2 | url = https://archive.org/details/statisticsforexp00geor }}</ref>
<ref>
{{cite book
{{cite book
| last1 = Box | first1 = George E.P.
| last1 = Box | first1 = George E.P.
Line 236:
Line 254:
| publisher = Wiley | location = Hoboken, N.J.
| publisher = Wiley | location = Hoboken, N.J.
| edition = 2
| edition = 2
| year = 2005 | isbn = 978-0471718130 }}</ref>
| year = 2005 | isbn = 978-0471718130 }}</ref> Furthermore, there is ongoing discussion of experimental design in the context of model building for models either static or dynamic models, also known as [[System identification|system identification]].<ref>{{cite journal | last1 = Spall | first1 = J. C. | year = 2010 | title = Factorial Design for Efficient Experimentation: Generating Informative Data for System Identification | journal = IEEE Control Systems Magazine | volume = 30 | issue = 5| pages = 38–53 | doi=10.1109/MCS.2010.937677| s2cid = 45813198 }}</ref><ref>{{cite journal | last1 = Pronzato | first1 = L | year = 2008 | title = Optimal experimental design and some related control problems | journal = Automatica | volume = 44 | issue = 2| pages = 303–325 | doi=10.1016/j.automatica.2007.05.016| arxiv = 0802.4381| s2cid = 1268930 }}</ref>
Some discussion of experimental design in the context of [[System identification|system identification]] (model building for static or dynamic models) is given in<ref>{{cite journal | last1 = Spall | first1 = J. C. | year = 2010 | title = Factorial Design for Efficient Experimentation: Generating Informative Data for System Identification | journal = IEEE Control Systems Magazine | volume = 30 | issue = 5| pages = 38–53 | doi=10.1109/MCS.2010.937677| s2cid = 45813198 }}</ref> and.<ref>{{cite journal | last1 = Pronzato | first1 = L | year = 2008 | title = Optimal experimental design and some related control problems | journal = Automatica | volume = 44 | issue = 2| pages = 303–325 | doi=10.1016/j.automatica.2007.05.016| arxiv = 0802.4381| s2cid = 1268930 }}</ref>
==Human participant constraints==
==Human participant constraints==
Line 247:
Line 263:
and [[Confidentiality|confidentiality]] affecting both clinical (medical) trials and
and [[Confidentiality|confidentiality]] affecting both clinical (medical) trials and
behavioral and social science experiments.<ref>
behavioral and social science experiments.<ref>
{{cite book | last1 = Moore | first1 = David S.
{{cite book | last1 = Moore | first1 = David S. | author-link1=David S. Moore
| last2 = Notz | first2 = William I.
| last2 = Notz | first2 = William I.
| title = Statistics : concepts and controversies
| title = Statistics : concepts and controversies
Line 265:
Line 281:
==See also==
==See also==
{{Div col}}
* {{annotated link|Adversarial collaboration}}
* [[Adversarial collaboration]]
* {{annotated link|Bayesian experimental design}}
* [[Bayesian experimental design]]
* {{annotated link|Block design}}
* [[Block design]]
* {{annotated link|Box–Behnken design}}
* [[Box–Behnken design]]
* {{annotated link|Central composite design}}
* [[Central composite design]]
* {{annotated link|Medicine:Clinical study design}}
* [[Engineering:Clinical trial|Clinical trial]]
* {{annotated link|Computer experiment}}
* [[Medicine:Clinical study design|Clinical study design]]
* [[Philosophy:Royal Commission on Animal Magnetism|Royal Commission on Animal Magnetism]]
* [[Survey sampling]]
* [[System identification]]
* [[Taguchi methods]]
{{div col end}}
== References ==
== References ==
Line 326:
Line 332:
|label = Experimental design
|label = Experimental design
}}
}}
* A [http://www.itl.nist.gov/div898/handbook/pri/section1/pri1.htm chapter] from a [http://www.itl.nist.gov/div898/handbook/ "NIST/SEMATECH Handbook on Engineering Statistics"] at [[Organization:National Institute of Standards and Technology|NIST]]
* A [http://www.itl.nist.gov/div898/handbook/pri/section1/pri1.htm chapter] from a [http://www.itl.nist.gov/div898/handbook/ "NIST/SEMATECH Handbook on Engineering Statistics"] at [[National Institute of Standards and Technology|NIST]]
* [http://www.itl.nist.gov/div898/handbook/pri/section3/pri3362.htm Box–Behnken designs] from a [http://www.itl.nist.gov/div898/handbook/ "NIST/SEMATECH Handbook on Engineering Statistics"] at [[Organization:National Institute of Standards and Technology|NIST]]
* [http://www.itl.nist.gov/div898/handbook/pri/section3/pri3362.htm Box–Behnken designs] from a [http://www.itl.nist.gov/div898/handbook/ "NIST/SEMATECH Handbook on Engineering Statistics"] at [[National Institute of Standards and Technology|NIST]]
* [https://archive.org/details/OperaMagistris Detailed mathematical developments of most common DoE] in the Opera Magistris v3.6 online reference Chapter 15, section 7.4, {{ISBN|978-2-8399-0932-7}}.
Design of experiments with full factorial design (left), response surface with second-degree polynomial (right)
The design of experiments (DOE),[1] also known as experiment design or experimental design, is the design of any task that aims to describe and explain the variation of information under conditions that are hypothesized to reflect the variation. The term is generally associated with experiments in which the design introduces conditions that directly affect the variation, but may also refer to the design of quasi-experiments, in which natural conditions that influence the variation are selected for observation.
In its simplest form, an experiment aims at predicting the outcome by introducing a change of the preconditions, which is represented by one or more independent variables, also referred to as "input variables" or "predictor variables." The change in one or more independent variables is generally hypothesized to result in a change in one or more dependent variables, also referred to as "output variables" or "response variables." The experimental design may also identify control variables that must be held constant to prevent external factors from affecting the results. Experimental design involves not only the selection of suitable independent, dependent, and control variables, but also planning the delivery of the experiment under statistically optimal conditions given the constraints of available resources. There are multiple approaches for determining the set of design points (unique combinations of the settings of the independent variables) to be used in the experiment.
Main concerns in experimental design include the establishment of validity, reliability, and replicability. For example, these concerns can be partially addressed by carefully choosing the independent variable, reducing the risk of measurement error, and ensuring that the documentation of the method is sufficiently detailed. Related concerns include achieving appropriate levels of statistical power and sensitivity.
Correctly designed experiments advance knowledge in the natural and social sciences and engineering, with design of experiments methodology recognised as a key tool in the successful implementation of a Quality by Design (QbD) framework.[2] Other applications include marketing and policy making. The study of the design of experiments is an important topic in metascience.
A theory of statistical inference was developed by Charles S. Peirce in "Illustrations of the Logic of Science" (1877–1878)[3] and "A Theory of Probable Inference" (1883),[4] two publications that emphasized the importance of randomization-based inference in statistics.[5]
Charles S. Peirce randomly assigned volunteers to a blinded, repeated-measures design to evaluate their ability to discriminate weights.[6][7][8]
Peirce's experiment inspired other researchers in psychology and education, who developed a research tradition of randomized experiments in laboratories and specialized textbooks in the 1800s.[6][7][8][9]
The use of a sequence of experiments, where the design of each may depend on the results of previous experiments, including the possible decision to stop experimenting, is within the scope of sequential analysis, a field that was pioneered[13] by Abraham Wald in the context of sequential tests of statistical hypotheses.[14] Herman Chernoff wrote an overview of optimal sequential designs,[15] while adaptive designs have been surveyed by S. Zacks.[16] One specific type of sequential design is the "two-armed bandit", generalized to the multi-armed bandit, on which early work was done by Herbert Robbins in 1952.[17]
Fisher's principles
A methodology for designing experiments was proposed by Ronald Fisher, in his innovative books: The Arrangement of Field Experiments (1926) and The Design of Experiments (1935). Much of his pioneering work dealt with agricultural applications of statistical methods. As a mundane example, he described how to test the lady tasting tea hypothesis, that a certain lady could distinguish by flavour alone whether the milk or the tea was first placed in the cup. These methods have been broadly adapted in biological, psychological, and agricultural research.[18]
Comparison
In some fields of study it is not possible to have independent measurements to a traceable metrology standard. Comparisons between treatments are much more valuable and are usually preferable; treatments are often compared against a scientific control or traditional treatment that acts as a baseline.
Random assignment is the process of assigning individuals at random to groups, or to different conditions within a group, in an experiment, so that each individual has the same chance of being placed in any given group. The random assignment of individuals to groups (or conditions within a group) distinguishes a rigorous, "true" experiment from an observational study or "quasi-experiment".[19] There is an extensive body of mathematical theory that explores the consequences of making the allocation of units to treatments by means of some random mechanism (such as tables of random numbers, or the use of randomization devices such as playing cards or dice). Assigning units to treatments at random tends to mitigate confounding, which makes effects due to factors other than the treatment appear to result from the treatment.
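Random allocation of units to treatments can be illustrated with a short sketch. The function name, the unit labels and the two group names below are hypothetical, and a seeded pseudorandom shuffle stands in for the tables of random numbers, cards or dice mentioned above.

```python
import random

def randomly_assign(units, groups, seed=None):
    """Shuffle the units, then deal them out to the groups in turn."""
    rng = random.Random(seed)          # seeded only to make the sketch repeatable
    shuffled = list(units)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, unit in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(unit)
    return assignment

# Hypothetical example: 12 units split between two groups
units = [f"unit_{i}" for i in range(12)]
assignment = randomly_assign(units, ["treatment", "control"], seed=0)
```

Dealing the shuffled units out in turn keeps the group sizes equal (or within one unit of each other), while the shuffle ensures that no characteristic of a unit influences which group it lands in.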
The risks associated with random allocation (such as having a serious imbalance in a key characteristic between a treatment group and a control group) are calculable and hence can be managed down to an acceptable level by using enough experimental units. However, if the population is divided into several subpopulations that somehow differ, and the research requires each subpopulation to be equal in size, stratified sampling can be used. In that way, the units in each subpopulation are randomized, but not the whole sample. The results of an experiment can be generalized reliably from the experimental units to a larger statistical population of units only if the experimental units are a random sample from the larger population; the probable error of such an extrapolation depends on the sample size, among other things.
Measurements are usually subject to variation and measurement uncertainty; thus they are repeated and full experiments are replicated to help identify the sources of variation, to better estimate the true effects of treatments, to further strengthen the experiment's reliability and validity, and to add to the existing knowledge of the topic.[20] However, certain conditions must be met before the replication of the experiment is commenced: the original research question has been published in a peer-reviewed journal or widely cited, the researcher is independent of the original experiment, the researcher must first try to replicate the original findings using the original data, and the write-up should state that the study conducted is a replication study that tried to follow the original study as strictly as possible.[21]
Blocking is the non-random arrangement of experimental units into groups (blocks) consisting of units that are similar to one another. Blocking reduces known but irrelevant sources of variation between units and thus allows greater precision in the estimation of the source of variation under study.
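A randomized block layout combines the two ideas: units are first grouped into blocks, and treatments are then randomized separately within each block. The following is a minimal sketch under that assumption; the block names, unit labels and treatment names are hypothetical.

```python
import random

def randomized_block_assignment(blocks, treatments, seed=None):
    """Randomize treatments within each block, keeping every block balanced."""
    rng = random.Random(seed)
    plan = {}
    for block_name, members in blocks.items():
        members = list(members)
        rng.shuffle(members)           # randomization happens inside the block only
        for i, unit in enumerate(members):
            plan[unit] = (block_name, treatments[i % len(treatments)])
    return plan

# Hypothetical example: two blocks of four similar units each
blocks = {
    "block_1": ["u1", "u2", "u3", "u4"],
    "block_2": ["u5", "u6", "u7", "u8"],
}
plan = randomized_block_assignment(blocks, ["A", "B"], seed=0)
```

Because every treatment appears equally often in every block, differences between blocks cancel out of the treatment comparison.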
Orthogonality concerns the forms of comparison (contrasts) that can be legitimately and efficiently carried out. Contrasts can be represented by vectors and sets of orthogonal contrasts are uncorrelated and independently distributed if the data are normal. Because of this independence, each orthogonal treatment provides different information to the others. If there are T treatments and T − 1 orthogonal contrasts, all the information that can be captured from the experiment is obtainable from the set of contrasts.
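As an illustration, the sketch below checks a set of T − 1 = 3 contrasts for T = 4 treatments. The particular (Helmert-type) contrast vectors are an assumption chosen for the example; any set of mutually orthogonal vectors whose entries sum to zero would serve.

```python
import numpy as np

# Hypothetical contrasts for T = 4 treatments (one row per contrast)
contrasts = np.array([
    [1, -1,  0,  0],   # treatment 1 vs treatment 2
    [1,  1, -2,  0],   # mean of treatments 1-2 vs treatment 3
    [1,  1,  1, -3],   # mean of treatments 1-3 vs treatment 4
])

# Pairwise dot products: orthogonal contrasts give a diagonal Gram matrix
gram = contrasts @ contrasts.T
off_diagonal = gram - np.diag(np.diag(gram))

is_orthogonal = not off_diagonal.any()
sums_to_zero = not contrasts.sum(axis=1).any()   # each row is a valid contrast
rank = np.linalg.matrix_rank(contrasts)          # T - 1 independent comparisons
```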
Multifactorial experiments
Multifactorial experiments are used instead of the one-factor-at-a-time method. They are efficient at evaluating the effects and possible interactions of several factors (independent variables). Analysis of experiment design is built on the foundation of the analysis of variance, a collection of models that partition the observed variance into components, according to what factors the experiment must estimate or test.
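A two-level full factorial design makes this concrete. The sketch below enumerates the design points for three factors and estimates main effects and one interaction from a response. The factor names and the noise-free response function are hypothetical, chosen only so that the recovered effects can be checked by hand.

```python
from itertools import product
import numpy as np

factors = ["A", "B", "C"]
# Every combination of low (-1) and high (+1) settings: 2**3 = 8 runs
design = np.array(list(product([-1, 1], repeat=len(factors))))

a, b, c = design.T
# Hypothetical noise-free response with an A-by-B interaction and no C effect
y = 10 + 2 * a - 3 * b + 1.5 * a * b

def effect(contrast, response):
    """Mean response at the high level minus mean response at the low level."""
    return response[contrast == 1].mean() - response[contrast == -1].mean()

effects = {
    "A": effect(a, y),
    "B": effect(b, y),
    "C": effect(c, y),
    "AB": effect(a * b, y),   # interaction contrast is the product column
}
```

A one-factor-at-a-time plan varying A and B separately from a single baseline could not estimate the AB term, whereas the factorial design estimates every effect from all eight runs.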
Weights of eight objects are measured using a pan balance and a set of standard weights. Each weighing measures the weight difference between objects in the left pan and any objects in the right pan by adding calibrated weights to the lighter pan until the balance is in equilibrium. Each measurement has a random error <math>\varepsilon_i</math>. The average error is zero; the standard deviation of the probability distribution of the errors is the same number σ on different weighings; errors on different weighings are independent. Denote the true weights by
<math>\theta_1, \dots, \theta_8</math>.
We consider two different experiments with the same number of measurements:
Weigh each of the eight objects individually.
Do the eight weighings according to a schedule in which every object is placed in either the left or the right pan at each weighing, as given by the rows of an 8×8 Hadamard matrix.
Let y<sub>i</sub> be the measured difference for i = 1, ..., 8. The relationship between the true weights and experimental measurements may be represented with a general linear model, with the design matrix <math>X</math> having entries from <math>\{-1, 0, 1\}</math>:
<math>y = X\theta + \varepsilon</math>
The first design is represented by an identity matrix, <math>I</math>, while the second design is represented by an 8×8 Hadamard matrix, <math>H</math>, both examples of weighing matrices.
The weights are typically estimated using the method of least squares. Using a weighing matrix, this is equivalent to inverting <math>X</math> on the measurements:
<math>\hat{\theta} = (X^{\mathsf T} X)^{-1} X^{\mathsf T} y = X^{-1} y</math>
The question of design of experiments is: which experiment is better?
Investigating the estimate of the first weight under the first experiment (A) and the second experiment (B):
<math>\operatorname{Var}(\hat{\theta}_{1,A}) = \sigma^2, \qquad \operatorname{Var}(\hat{\theta}_{1,B}) = \frac{\sigma^2}{8}</math>
A similar result follows for the remaining weight estimates. Thus, the second experiment gives us 8 times as much precision for the estimate of a single item, despite using the same resources (number of weighings).
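The comparison of the two weighing designs can be checked numerically. The sketch below assumes the Sylvester construction for the 8×8 Hadamard matrix, which need not match the schedule used in the example, and uses the standard result that the least-squares estimator has covariance σ²(XᵀX)⁻¹. The function names are hypothetical.

```python
import numpy as np

def sylvester_hadamard(n):
    """Build an n x n Hadamard matrix by Sylvester's doubling (n a power of 2)."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.kron(np.array([[1, 1], [1, -1]]), H)
    return H

def estimator_variances(X, sigma=1.0):
    """Variances of the least-squares estimates: diagonal of sigma^2 (X^T X)^-1."""
    cov = sigma**2 * np.linalg.inv(X.T @ X)
    return np.diag(cov)

var_individual = estimator_variances(np.eye(8))              # one object per weighing
var_hadamard = estimator_variances(sylvester_hadamard(8))    # all objects in every weighing

ratio = var_individual[0] / var_hadamard[0]
```

With σ = 1 the first design gives variance 1 for every weight and the Hadamard design gives 1/8, because each weight estimate averages information from all eight weighings.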
Many problems of the design of experiments involve combinatorial designs, as in this example and others.[24]
Use of double-blind designs can prevent biases potentially leading to false positives in the data collection phase. When a double-blind design is used, participants are randomly assigned to experimental groups but the researcher is unaware of which participants belong to which group. Therefore, the researcher cannot affect the participants' response to the intervention.[26]
Experimental designs with undisclosed degrees of freedom are a problem,[27] in that they can lead to conscious or unconscious "p-hacking": trying multiple analyses until the desired result is obtained. It typically involves the manipulation, perhaps unconscious, of the process of statistical analysis and the degrees of freedom until they return a figure below the p < .05 level of statistical significance.[28][29]
P-hacking can be prevented by preregistering studies, in which researchers send their data analysis plan to the journal in which they wish to publish before they even start their data collection, so that no data manipulation is possible.[30][31]
Another way to prevent this is to extend a double-blind design to the data-analysis phase, making the study triple-blind: the data are sent to a data analyst unrelated to the research, who scrambles the data so that there is no way to know which group participants belong to before they are potentially removed as outliers.[26]
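The inflation of false positives from trying several analyses can be shown with a small simulation. This is a sketch under simplifying assumptions that are not taken from the cited sources: there is no true effect, each study records ten independent outcome measures, and each outcome is tested with a two-sided z-test at the .05 level.

```python
import numpy as np

rng = np.random.default_rng(0)
n_studies, n_outcomes, n_per_study = 5000, 10, 30

# Null data: every outcome is pure noise with mean 0 and standard deviation 1
data = rng.standard_normal((n_studies, n_outcomes, n_per_study))

# z statistic for "mean differs from 0" with known unit variance
z = data.mean(axis=2) * np.sqrt(n_per_study)
significant = np.abs(z) > 1.96

# Planned analysis: only the first outcome is tested
honest_rate = significant[:, 0].mean()
# Undisclosed flexibility: report a finding if any outcome is significant
hacked_rate = significant.any(axis=1).mean()
```

Under these assumptions the planned analysis is wrong about 5% of the time, while reporting whichever outcome happens to cross the threshold is wrong roughly 1 − 0.95¹⁰ ≈ 40% of the time.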
Clear and complete documentation of the experimental methodology is also important in order to support replication of results.[32]
Discussion topics when setting up an experimental design
An experimental design or randomized clinical trial requires careful consideration of several factors before actually doing the experiment.[33] An experimental design is the laying out of a detailed experimental plan in advance of doing the experiment. Some of the following topics have already been discussed in the principles of experimental design section:
How many factors does the design have, and are the levels of these factors fixed or random?
Are control conditions needed, and what should they be?
Manipulation checks: did the manipulation really work?
What are the background variables?
What is the sample size? How many units must be collected for the experiment to be generalisable and have enough power?
What is the relevance of interactions between factors?
What is the influence of delayed effects of substantive factors on outcomes?
How do response shifts affect self-report measures?
How feasible is repeated administration of the same measurement instruments to the same units at different occasions, with a post-test and follow-up tests?
Should the client/patient, researcher or even the analyst of the data be blind to conditions?
What is the feasibility of subsequent application of different conditions to the same units?
How many of each control and noise factors should be taken into account?
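The sample-size question in the list above can be given a rough numerical form. The sketch below uses the usual normal-approximation formula for comparing two group means, n = 2((z₁₋α/₂ + z_power)/d)² per group, where d is the standardized effect size. The function name and default values are assumptions for illustration, and an exact calculation based on the t distribution would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate units needed per group for a two-sided, two-sample comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value of the test
    z_power = NormalDist().inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

n_medium = n_per_group(0.5)   # standardized difference of half a standard deviation
n_large = n_per_group(1.0)    # standardized difference of one standard deviation
```

Halving the effect size roughly quadruples the required sample, which is why the expected size of the effect has to be discussed before the experiment is run.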
The independent variable of a study often has many levels or different groups. In a true experiment, researchers can have an experimental group, which is where their intervention testing the hypothesis is implemented, and a control group, which has all the same elements as the experimental group except the interventional element. Thus, when everything else except for one intervention is held constant, researchers can conclude with some certainty that this one element is what caused the observed change. In some instances, having a control group is not ethical. This is sometimes solved using two different experimental groups. In some cases, independent variables cannot be manipulated, for example when testing the difference between two groups who have a different disease, or testing the difference between genders (variables to which it would be hard or unethical to assign participants). In these cases, a quasi-experimental design may be used.
Causal attributions
In a pure experimental design, the researcher manipulates the independent (predictor) variable: every participant is chosen randomly from the population, and each participant chosen is assigned randomly to a condition of the independent variable. Only when this is done is it possible to conclude with high probability that the differences in the outcome variables are caused by the different conditions. Therefore, researchers should choose the experimental design over other design types whenever possible. However, the nature of the independent variable does not always allow for manipulation. In those cases, researchers must take care not to make causal attributions that their design does not support. For example, in observational designs, participants are not assigned randomly to conditions, so if differences in outcome variables are found between conditions, it is likely that something other than the differences between the conditions, that is, a third variable, causes the differences in outcomes. The same goes for studies with a correlational design.
Statistical control
It is best that a process be in reasonable statistical control prior to conducting designed experiments. When this is not possible, proper blocking, replication, and randomization allow for the careful conduct of designed experiments.[34]
To control for nuisance variables, researchers institute control checks as additional measures. Investigators should ensure that uncontrolled influences (e.g., source credibility perception) do not skew the findings of the study. A manipulation check is one example of a control check. Manipulation checks allow investigators to isolate the chief variables to strengthen support that these variables are operating as planned.
One of the most important requirements of experimental research designs is the necessity of eliminating the effects of spurious, intervening, and antecedent variables. In the most basic model, cause (X) leads to effect (Y). But there could be a third variable (Z) that influences (Y), and X might not be the true cause at all. Z is said to be a spurious variable and must be controlled for. The same is true for intervening variables (a variable in between the supposed cause (X) and the effect (Y)), and anteceding variables (a variable prior to the supposed cause (X) that is the true cause). When a third variable is involved and has not been controlled for, the relation is said to be a zero order relationship. In most practical applications of experimental research designs there are several causes (X1, X2, X3). In most designs, only one of these causes is manipulated at a time.
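The effect of an uncontrolled third variable can be demonstrated with simulated data. In the sketch below, which is an illustrative assumption rather than a method from the cited sources, Z drives both X and Y while X has no effect on Y. The raw correlation between X and Y is substantial, but it vanishes once the linear influence of Z is removed from both.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

z = rng.standard_normal(n)        # the third variable
x = z + rng.standard_normal(n)    # supposed cause: driven by Z, plus noise
y = z + rng.standard_normal(n)    # effect: driven by Z only, not by X

def residualize(v, control):
    """Remove the best linear prediction of v from the control variable."""
    slope = np.cov(v, control)[0, 1] / np.var(control, ddof=1)
    return v - v.mean() - slope * (control - control.mean())

raw_corr = np.corrcoef(x, y)[0, 1]
partial_corr = np.corrcoef(residualize(x, z), residualize(y, z))[0, 1]
```

The raw correlation here is close to 0.5 and the correlation after controlling for Z is close to zero, which is the pattern expected when Z is a spurious variable.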
Experimental designs after Fisher
Some efficient designs for estimating several main effects were found independently and in near succession by Raj Chandra Bose and K. Kishen in 1940 at the Indian Statistical Institute, but remained little known until the Plackett–Burman designs were published in Biometrika in 1946. About the same time, C. R. Rao introduced the concept of orthogonal arrays as experimental designs. This concept played a central role in the development of Taguchi methods by Genichi Taguchi, which took place during his visit to the Indian Statistical Institute in the early 1950s. His methods were successfully applied and adopted by Japanese and Indian industries and were subsequently also embraced by US industry, albeit with some reservations.
In 1950, Gertrude Mary Cox and William Gemmell Cochran published the book Experimental Designs, which became the major reference work on the design of experiments for statisticians for years afterwards.
Developments of the theory of linear models have encompassed and surpassed the cases that concerned early writers. Today, the theory rests on advanced topics in linear algebra, algebra and combinatorics.
As with other branches of statistics, experimental design is pursued using both frequentist and Bayesian approaches: In evaluating statistical procedures like experimental designs, frequentist statistics studies the sampling distribution while Bayesian statistics updates a probability distribution on the parameter space.
Some important contributors to the field of experimental designs are C. S. Peirce, R. A. Fisher, F. Yates, R. C. Bose, A. C. Atkinson, R. A. Bailey, D. R. Cox, G. E. P. Box, W. G. Cochran, Walter T. Federer, V. V. Fedorov, A. S. Hedayat, J. Kiefer, O. Kempthorne, J. A. Nelder, Andrej Pázman, Friedrich Pukelsheim, D. Raghavarao, C. R. Rao, Shrikhande S. S., J. N. Srivastava, William J. Studden, G. Taguchi and H. P. Wynn.[35]
The textbooks of D. Montgomery, R. Myers, and G. Box/W. Hunter/J.S. Hunter have reached generations of students and practitioners.[36][37][38][39][40] Furthermore, there is ongoing discussion of experimental design in the context of model building for either static or dynamic models, also known as system identification.[41][42]
Human participant constraints
Laws and ethical considerations preclude some carefully designed experiments with human subjects. Legal constraints are dependent on jurisdiction. Constraints may involve institutional review boards, informed consent and confidentiality affecting both clinical (medical) trials and behavioral and social science experiments.[43] In the field of toxicology, for example, experimentation is performed on laboratory animals with the goal of defining safe exposure limits for humans.[44] Balancing the constraints are views from the medical field.[45] Regarding the randomization of patients, "... if no one knows which therapy is better, there is no ethical imperative to use one therapy or another." (p. 380) Regarding experimental design, "...it is clearly not ethical to place subjects at risk to collect data in a poorly designed study when this situation can be easily avoided...". (p. 393)
↑ Peirce, Charles Sanders (1887). "Illustrations of the Logic of Science". Open Court (10 June 2014). ISBN 0812698495.
↑ Peirce, Charles Sanders (1883). "A Theory of Probable Inference". In C. S. Peirce (Ed.), Studies in logic by members of the Johns Hopkins University (pp. 126–181). Little, Brown and Co (1883).
↑ Peirce, C. S. (1876). "Note on the Theory of the Economy of Research". Coast Survey Report: 197–201, actually published 1879, NOAA PDF Eprint. Reprinted in Collected Papers 7, paragraphs 139–157, also in Writings 4, pp. 72–78, and in Peirce, C. S. (July–August 1967). "Note on the Theory of the Economy of Research". Operations Research 15 (4): 643–648. doi:10.1287/opre.15.4.643.
↑ Guttorp, P.; Lindgren, G. (2009). "Karl Pearson and the Scandinavian school of statistics". International Statistical Review 77: 64. doi:10.1111/j.1751-5823.2009.00069.x.
↑ Zacks, S. (1996) "Adaptive Designs for Parametric Models". In: Ghosh, S. and Rao, C. R., (Eds) (1996). "Design and Analysis of Experiments," Handbook of Statistics, Volume 13. North-Holland. ISBN 0-444-82061-2. (pages 151–180)
↑ Robbins, H. (1952). "Some Aspects of the Sequential Design of Experiments". Bulletin of the American Mathematical Society 58 (5): 527–535. doi:10.1090/S0002-9904-1952-09620-8.
↑ Miller, Geoffrey (2000). The Mating Mind: how sexual choice shaped the evolution of human nature, London: Heinemann, ISBN 0-434-00741-2 (also Doubleday, ISBN 0-385-49516-1) "To biologists, he was an architect of the 'modern synthesis' that used mathematical models to integrate Mendelian genetics with Darwin's selection theories. To psychologists, Fisher was the inventor of various statistical tests that are still supposed to be used whenever possible in psychology journals. To farmers, Fisher was the founder of experimental agricultural research, saving millions from starvation through rational crop breeding programs." p. 54.
↑ Creswell, J. W. (2008), Educational research: Planning, conducting, and evaluating quantitative and qualitative research (3rd edition), Upper Saddle River, NJ: Prentice Hall, p. 300. ISBN 0-13-613550-1.
↑ Montgomery, Douglas (2013). Design and analysis of experiments (8th ed.). Hoboken, NJ: John Wiley & Sons, Inc. ISBN 9781118146927.
↑ Walpole, Ronald E.; Myers, Raymond H.; Myers, Sharon L.; Ye, Keying (2007). Probability & statistics for engineers & scientists (8th ed.). Upper Saddle River, NJ: Pearson Prentice Hall. ISBN 978-0131877115.
↑ Myers, Raymond H.; Montgomery, Douglas C.; Vining, G. Geoffrey; Robinson, Timothy J. (2010). Generalized linear models: with applications in engineering and the sciences (2nd ed.). Hoboken, NJ: Wiley. ISBN 978-0470454633.
↑ Box, George E. P.; Hunter, William G.; Hunter, J. Stuart (2005). Statistics for Experimenters: Design, Innovation, and Discovery (2nd ed.). Hoboken, NJ: Wiley. ISBN 978-0471718130.
↑ Spall, J. C. (2010). "Factorial Design for Efficient Experimentation: Generating Informative Data for System Identification". IEEE Control Systems Magazine 30 (5): 38–53. doi:10.1109/MCS.2010.937677.
↑ Pronzato, L. (2008). "Optimal experimental design and some related control problems". Automatica 44 (2): 303–325. doi:10.1016/j.automatica.2007.05.016.
↑ Moore, David S.; Notz, William I. (2006). Statistics: concepts and controversies (6th ed.). New York: W. H. Freeman. Chapter 7: Data ethics. ISBN 9780716786368.
Peirce, C. S. (1877–1878), "Illustrations of the Logic of Science" (series), Popular Science Monthly, vols. 12–13. Relevant individual papers:
(1878 March), "The Doctrine of Chances", Popular Science Monthly, v. 12, March issue, pp. 604–615. Internet Archive Eprint.
(1878 April), "The Probability of Induction", Popular Science Monthly, v. 12, pp. 705–718. Internet Archive Eprint.
(1878 June), "The Order of Nature", Popular Science Monthly, v. 13, pp. 203–217. Internet Archive Eprint.
(1878 August), "Deduction, Induction, and Hypothesis", Popular Science Monthly, v. 13, pp. 470–482. Internet Archive Eprint.
(1883), "A Theory of Probable Inference", Studies in Logic, pp. 126–181, Little, Brown, and Company. (Reprinted 1983, John Benjamins Publishing Company, ISBN 90-272-3271-7)