Stata
Stata 18 on Windows  
Original author(s)  William Gould^{[1]} 

Developer(s)  StataCorp 
Initial release  1985 
Stable release  18.0 / April 25, 2023 
Written in  C 
Operating system  Windows, macOS, Linux 
Type  Statistical analysis, numerical analysis 
License  Proprietary 
Website  www 
Stata (/ˈsteɪtə/,^{[2]} STAY-ta, alternatively /ˈstætə/, occasionally stylized as STATA^{[3]}^{[4]}) is a general-purpose statistical software package developed by StataCorp for data manipulation, visualization, statistics, and automated reporting. It is used by researchers in many fields, including biomedicine, economics, epidemiology, and sociology.^{[5]}
Stata was initially developed by Computing Resource Center in California, and the first version was released in 1985.^{[6]} In 1993, the company moved to College Station, Texas, and was renamed Stata Corporation, now known as StataCorp.^{[1]} A major release in 2003 included a new graphics system and dialog boxes for all commands.^{[6]} Since then, a new version has been released once every two years.^{[7]} The current version is Stata 18, released in April 2023.^{[8]}
Technical overview and terminology
User interface
Stata has employed an integrated command-line interface since its creation. Starting with version 8.0, Stata has included a graphical user interface based on the Qt framework, which uses menus and dialog boxes to give access to many built-in commands. The dataset can be viewed or edited in spreadsheet format. From version 11 on, other commands can be executed while the data browser or editor is open.
Data structure and storage
Until the release of version 16,^{[9]} Stata could only open a single dataset at any one time. Stata is flexible in how storage types are assigned to data. Its compress command automatically reassigns data to storage types that take up less memory without loss of information. Stata offers integer storage types that occupy only one or two bytes rather than four, and single precision (4 bytes) rather than double precision (8 bytes) is the default for floating-point numbers.
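A minimal sketch of how compress behaves, using the auto dataset that ships with Stata. The variable name rep_copy is a hypothetical name introduced only for this illustration.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Deliberately store small integers (1-5) as 8-byte doubles;
* rep_copy is a hypothetical variable used only for this demonstration
generate double rep_copy = rep78
describe rep_copy

* compress demotes each variable to the smallest storage type
* that still holds every value exactly
compress
describe rep_copy
```

Comparing the two describe listings shows the storage type before and after. Because the values are small integers, compress can move them to a one-byte integer type without losing information.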
Stata datasets are always tabular. Stata refers to the columns of tabular data as variables and to the rows as observations.
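The following sketch illustrates holding more than one dataset in memory with data frames, which requires Stata 16 or later. The frame name census is an arbitrary choice for this example; auto and census are datasets installed with Stata.

```stata
* The default frame holds the auto dataset
sysuse auto, clear

* Create a second, empty frame (the name "census" is an arbitrary choice)
frame create census

* Run a command inside the other frame without leaving the current one
frame census: sysuse census, clear

* List all frames currently in memory
frames dir

* Summarize a variable that lives in the second frame
frame census: summarize pop

* Switch the working frame, inspect it, then switch back
frame change census
describe, short
frame change default
```

Each frame is its own table of variables and observations, so commands issued in one frame do not alter the data in another.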
Data format compatibility
Stata can import data in a variety of formats. These include ASCII data formats (such as CSV or databank formats) and spreadsheet formats (including various Excel formats).
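As a hedged illustration of the text and spreadsheet import paths, the sketch below writes the auto dataset out as CSV and Excel files and reads each back in. The file names auto_copy.csv and auto_copy.xlsx are hypothetical, and the example assumes a writable working directory and a release recent enough to include import delimited and import excel.

```stata
* Start from the example dataset shipped with Stata
sysuse auto, clear

* Write a CSV copy (hypothetical file name), then read it back
export delimited using "auto_copy.csv", replace
import delimited using "auto_copy.csv", varnames(1) clear
describe, short

* Same round trip through an Excel workbook
sysuse auto, clear
export excel using "auto_copy.xlsx", firstrow(variables) replace
import excel using "auto_copy.xlsx", firstrow clear
describe, short
```

The clear option is needed on import because a dataset is already in memory; without it Stata refuses to overwrite unsaved data.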
Stata's proprietary file formats have changed over time, although not every Stata release includes a new dataset format. Every version of Stata can read all older dataset formats, and can write both the current and most recent previous dataset format, using the saveold command.^{[10]} Thus, the current Stata release can always open datasets that were created with older versions, but older versions cannot read newer format datasets.
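A short sketch of saving in the current and in an older format. The file names are hypothetical, and the version(13) option is an assumption about the release in use: recent releases accept a version() option on saveold, but the range of permitted values depends on the Stata version, so check help saveold before relying on it.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Save in the current release's format (hypothetical file name)
save "auto_current.dta", replace

* Save a copy that older releases can open; version(13) is assumed
* to be accepted by the installed release (see -help saveold-)
saveold "auto_v13.dta", version(13) replace

* The current release can read the older format back in
use "auto_v13.dta", clear
describe, short
```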
Stata can read and write SAS XPORT format datasets natively, using the fdause and fdasave commands.
Some other econometric applications, including gretl, can directly import Stata file formats.
History
Origins
The development of Stata began in 1984, initially by William (Bill) Gould and later by Sean Becketti. The software was originally intended to compete with statistical programs for personal computers such as SYSTAT and MicroTSP.^{[6]} Stata was written, then as now, in the C programming language, initially for PCs running the DOS operating system. The first version was released in 1985 with 44 commands.^{[6]}
append  dir  infile  plot  spool 
beep  do  input  query  summarize 
by  drop  label  regress  tabulate 
capture  erase  list  rename  test 
confirm  exit  macro  replace  type 
convert  expand  merge  run  use 
correlate  format  modify  save  
count  generate  more  set  
describe  help  outfile  sort 
Development
There were 17 major releases of Stata between 1985 and 2021, with additional code and documentation updates between major releases.^{[7]} In its early years, extra sets of Stata programs were sometimes sold as "kits" or distributed as Support Disks. With the release of Stata 6 in 1999, updates began to be delivered to users via the web.^{[6]} The initial release of Stata was for the DOS operating system. Since then, versions of Stata have been released for Windows, macOS, and Unix variants such as Linux distributions.^{[6]} All Stata files are platform-independent.
Hundreds of commands have been added to Stata in its 37-year history.^{[11]}^{[12]} Certain developments have proved to be particularly important and continue to shape the user experience today, including extensibility, platform independence, and the active user community.^{[6]}
Extensibility
The program command was implemented in Stata 1.2, giving users the ability to add their own commands.^{[6]}^{[13]} Ado-files followed in Stata 2.1, allowing a user-written program to be automatically loaded into memory. Many user-written ado-files are submitted to the Statistical Software Components (SSC) Archive hosted by Boston College. StataCorp added an ssc command to allow community-contributed programs to be installed directly from within Stata.^{[14]} More recent editions of Stata allow users to call Python and R scripts using commands, as well as allowing Python IDEs like Jupyter Notebooks to import Stata commands.^{[15]}^{[16]}
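To make the extensibility mechanism concrete, here is a minimal sketch of a user-defined command. The command name meanrange and its output layout are hypothetical and chosen only for illustration; saved as meanrange.ado on the ado-path, the same definition would be loaded automatically on first use.

```stata
* Drop any earlier definition so this file can be re-run
capture program drop meanrange

* meanrange (hypothetical name): report the mean and range of one numeric variable
program define meanrange, rclass
    version 16
    syntax varname(numeric) [if] [in]

    * Mark the observations selected by the if/in qualifiers
    marksample touse

    quietly summarize `varlist' if `touse'
    display as text "mean:  " as result %9.2f r(mean)
    display as text "range: " as result %9.2f (r(max) - r(min))

    * Hand results back to the caller in r()
    return scalar mean  = r(mean)
    return scalar range = r(max) - r(min)
end

* Use the new command like any built-in one
sysuse auto, clear
meanrange mpg if foreign == 1
return list
```

Community-contributed commands are installed in a similar spirit with ssc install followed by a package name, which requires an internet connection.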
User community
A number of important developments were initiated by Stata's active user community.^{[6]} The Stata Technical Bulletin, which often contained user-created commands, was introduced in 1991 and issued six times a year. It was relaunched in 2001 as the peer-reviewed Stata Journal, a quarterly publication containing descriptions of community-contributed commands and tips for the effective use of Stata. In 1994, a listserv began as a hub for users to collaboratively solve coding and technical issues; in 2014, it was converted into a web forum. In 1995, StataCorp began organizing user and developer conferences that meet annually. User group meetings are held annually in the United States (the Stata Conference), the UK, Germany, and Italy, and less frequently in several other countries. Only the Stata Conference in the United States is hosted by StataCorp; elsewhere, local Stata distributors host user group meetings in their own countries.
Version  Release date 
1.0  January 1985 
1.1  February 1985 
1.2  May 1985 
1.3  August 1985 
1.4  August 1986 
1.5  February 1987 
2.0  June 1988 
2.1  September 1990 
3.0  March 1992 
3.1  August 1993 
4.0  January 1995 
5.0  October 1996 
6.0  January 1999 
7.0  December 2000 
8.0  January 2003 
8.1  July 2003 
8.2  October 2003 
9.0  April 2005 
9.1  September 2005 
9.2  April 2006 
10.0  June 2007 
10.1  August 2008 
11.0  July 2009 
11.1  June 2010 
11.2  March 2011 
12.0  July 2011 
12.1  January 2012 
13.0  June 2013 
13.1  October 2013 
14.0  April 2015 
14.1  October 2015 
14.2  September 2016 
15.0  June 2017 
15.1  November 2017 
16.0  June 2019 
16.1  February 2020 
17.0  April 2021 
18.0  April 2023 
Software products
There are four builds of Stata: Stata/MP, Stata/SE, Stata/BE, and Numerics by Stata.^{[17]} Whereas Stata/MP provides built-in parallel processing for certain commands, Stata/SE and Stata/BE are limited to a single core.^{[18]} On four CPU cores, Stata/MP runs certain commands about 2.4 times faster than the SE or BE versions, roughly 60% of the theoretical maximum speedup.^{[18]} Numerics by Stata allows for web integration of Stata commands.
The builds also differ in the size of the datasets they can handle. Stata/MP can store 10 to 20 billion observations and up to 120,000 variables, whereas Stata/SE and Stata/BE store up to 2.14 billion observations and handle up to 32,767 and 2,048 variables, respectively. The maximum number of independent variables in a model is 65,532 in Stata/MP, 10,998 in Stata/SE, and 798 in Stata/BE.^{[17]}
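A running session can report which build it is and what limits currently apply through Stata's c() system values. The sketch below is illustrative only; the values displayed depend on the installed build and its settings, and the last line simply reproduces the efficiency arithmetic quoted above (a 2.4-fold speedup on four cores).

```stata
* Which release and build is running?
display "Stata version:       " c(stata_version)
display "Flavor:              " c(flavor)
display "Stata/MP (1 = yes):  " c(MP)
display "Stata/SE (1 = yes):  " c(SE)

* Resources and limits as currently set
display "Processors in use:   " c(processors)
display "Max variables now:   " c(maxvar)

* Efficiency implied by a 2.4x speedup on 4 cores: 2.4 / 4 = 0.6
display "Parallel efficiency: " 2.4/4
```

The full set of system values can be listed with creturn list.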
The pricing and licensing of Stata depend on its intended use: business, government/nonprofit, education, or student. Single-user licenses are either renewable annually or perpetual. Other license types include a single license for use by concurrent users, a site license, volume single-user licenses for bulk pricing, and a student lab license.^{[19]}
Example code
The following set of commands revolves around simple data management.^{[20]}
    sysuse auto                 // Open the included auto dataset
    browse                      // Browse the dataset (opens the Data Editor window)
    describe                    // Describe the dataset and associated variables
    summarize                   // Summary information about numerical variables
    codebook make foreign       // Summary information about the make (string) and foreign (numeric) variables
    browse if missing(rep78)    // Browse only observations with missing data for variable rep78
    list make if missing(rep78) // List makes of the cars with missing data for variable rep78
The next set of commands moves on to descriptive statistics.
    summarize price, detail          // Detailed summary statistics for variable price
    tabulate foreign                 // One-way frequency table for variable foreign
    tabulate rep78 foreign, row      // Two-way frequency table for variables rep78 and foreign
    summarize mpg if foreign == 1    // Summary information about mpg if the car is foreign (the "==" sign tests for equality)
    by foreign, sort: summarize mpg  // As above, but using the "by" prefix
    tabulate foreign, summarize(mpg) // As above, but using the tabulate command
A simple hypothesis test:
    ttest mpg, by(foreign) // t-test for difference in means for domestic vs. foreign cars
Graphing data:
    twoway (scatter mpg weight)                     // Scatter plot showing relationship between mpg and weight
    twoway (scatter mpg weight), by(foreign, total) // Three graphs for domestic, foreign, and all cars
Linear regression:
    generate wtsq = weight^2                       // Create a new variable for weight squared
    regress mpg weight wtsq foreign, vce(robust)   // Linear regression of mpg on weight, wtsq, and foreign
    predict mpghat                                 // Create a new variable containing the predicted values of mpg
    twoway (scatter mpg weight) (line mpghat weight, sort), by(foreign) // Graph data and fitted line
References
 1. Newton, H. Joseph (2005). "A conversation with William Gould". The Stata Journal 5 (1): 19–31. doi:10.1177/1536867X0500500103. https://journals.sagepub.com/doi/pdf/10.1177/1536867X0500500103.
 2. Cox, Nicholas J. "Statalist FAQ". https://www.statalist.org/forums/help#spelling.
 3. "STATA Data Manipulation: Basics and Applications 7" (PDF). https://www.iuj.ac.jp/faculty/kucc625/documents/DM1.pdf.
 4. Suárez, Erick; Pérez, Cynthia; Nogueras, Graciela; Moreno-Gorrín, Camille (2016). Biostatistics in Public Health Using STATA. https://www.stata.com/bookstore/biostatisticsinpublichealthusingstata/.
 5. "Disciplines". https://www.stata.com/disciplines/.
 6. Cox, Nicholas J. (2005). "A brief history of Stata on its 20th anniversary". The Stata Journal 5 (1): 2–18. doi:10.1177/1536867X0500500102. https://journals.sagepub.com/doi/pdf/10.1177/1536867X0500500102. Retrieved 22 April 2021.
 7. Gould, William W.; Cox, Nicholas J. "When was Stata first released? When were later versions released?". https://www.stata.com/support/faqs/resources/historyofstata/.
 8. "What's new in Stata?". StataCorp. https://www.stata.com/newinstata/.
 9. "Data frames: multiple datasets in memory". https://www.stata.com/newinstata/multipledatasetsinmemory/.
 10. "Stata 16 help for save". https://www.stata.com/help.cgi?save.
 11. Stata Glossary and Index: Release 17. College Station, TX: Stata Press. pp. 1–50. ISBN 1597182834. https://www.stata.com/manuals/icombinedsubjecttableofcontents.pdf.
 12. "Stata features". StataCorp. https://www.stata.com/features/.
 13. "program - Define and manipulate programs". Stata Press. https://www.stata.com/manuals/pprogram.pdf.
 14. "ssc - Install and uninstall packages from SSC". Stata Press. https://www.stata.com/manuals/rssc.pdf.
 15. "Use Python and Stata together - Stata". https://www.stata.com/python/.
 16. "How to Switch Your Workflow from Stata to R, One Bit at a Time · Frederick Solt". https://fsolt.org/blog/2018/08/15/switchtor.html.
 17. "Which Stata is right for me?". https://www.stata.com/products/whichstataisrightforme/.
 18. "Parallel Stata". Harvard Business School. https://grid.rcs.hbs.org/parallelstata.
 19. "Order Stata software". StataCorp. https://www.stata.com/order/dl/.
 20. Getting Started with Stata for Windows (Release 17 ed.). College Station, TX: Stata Press. pp. 1–19. ISBN 1597183342. https://www.stata.com/manuals/gsw.pdf. Retrieved 25 April 2021.
Further reading
 Stata - A Really Short Introduction. Boston: De Gruyter Oldenbourg. 2019. ISBN 9783110617290. https://books.google.com/books?id=nGOMDwAAQBAJ&q=stata+bittmann&pg=PP1.
 Pinzon, Enrique, ed. (2015). Thirty Years with Stata: A Retrospective. College Station, Texas: Stata Press. ISBN 9781597181723. https://books.google.com/books?id=9ARswEACAAJ.
 Statistics with STATA. Boston: Cengage. 2013. ISBN 9780840064639. https://books.google.com/books?id=pUELAAAAQBAJ&pg=PP1.
External links
Original source: https://en.wikipedia.org/wiki/Stata.