DScience:Start

From HandWiki
Revision as of 20:47, 27 March 2020 by Jworkorg (talk | contribs)


Online Encyclopedia on Data Science using Python and Jython
Contributors: names will be filled in later. Editor: S. Chekanov


The goal of this project is to create a free online encyclopedia on data science. Each theoretical principle and each description of a numerical algorithm will be supported by program code implemented in Python (+ external C/C++ libraries) or Jython (+ external Java libraries).



Everyone is welcome to contribute to this wiki. You can log in or request an account using the top-right menu. Read the section How to edit wiki text. Please use your real name and email address when registering, since this will help generate the author list (everyone who contributes to this book will be included in the author list).

Here are some tips:

  • Use Python or Jython code for real-world examples to illustrate theoretical principles;
  • Do not use complex templates that cannot easily be converted to LaTeX using the top-right action menu. Always use imported BibTeX files for citations;
  • Your wiki text can link to the 234,307 wiki articles that already exist in HandWiki. You can upload images, or reference existing images from Wikimedia Commons or DataMelt projects;
  • If you see a red box at the top of this page, then access to some examples and Java class documentation is still restricted. Read this section.


Table of contents

Descriptive statistics

  1. Introduction to data science
  2. Introduction to statistics
  3. Random numbers
  4. Histograms
  5. Discrete probability distributions and their characteristics
  6. Continuous probability distributions and their characteristics

Saving and restoring data

  1. Flat files
  2. Spreadsheets
  3. Databases
    1. File-based databases
    2. SQL databases

Data visualization

  1. 2D representation of data
  2. 3D representation of data

Data mining

  1. Finding regularities
  2. Correlation analysis
  3. Unsupervised machine learning
  4. Supervised machine learning

Regression analysis

  1. Linear regression
  2. Non-linear regression

Statistical tests

  1. Statistical inference
  2. Confidence levels
  3. Statistical limits

Forecasting and finding missing data

  1. Extrapolation
  2. Interpolation
  3. Forecasting

This tutorial is provided under this license agreement.
