DScience:Discrete probability distributions and their characteristics

From HandWiki




What is a discrete probability distribution?

A discrete probability distribution is the probability distribution of a discrete random variable X. Each possible value of X is associated with a non-zero probability, and these probabilities sum to 1. Discrete probability distributions are often presented as tables.
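A discrete distribution given as a table maps each value of X to its probability. The plain-Python sketch below uses a hypothetical table (the values and probabilities are invented for illustration). It checks that the probabilities are non-negative and sum to 1, then computes the mean E[X] = Σ x·p(x) and the variance Var(X) = Σ (x - E[X])²·p(x).

```python
from fractions import Fraction

# Hypothetical probability table for a discrete random variable X
# (for example, the number of students arriving late on a given day).
# The numbers are invented for illustration only.
pmf = {
    0: Fraction(1, 10),
    1: Fraction(3, 10),
    2: Fraction(4, 10),
    3: Fraction(2, 10),
}

def check_pmf(table):
    """Raise ValueError unless all probabilities are non-negative and sum to exactly 1."""
    if any(p < 0 for p in table.values()):
        raise ValueError("probabilities must be non-negative")
    if sum(table.values()) != 1:
        raise ValueError("probabilities must sum to 1")

def mean(table):
    """E[X] = sum of x * p(x) over all values x."""
    return sum(x * p for x, p in table.items())

def variance(table):
    """Var(X) = sum of (x - E[X])**2 * p(x) over all values x."""
    mu = mean(table)
    return sum((x - mu) ** 2 * p for x, p in table.items())

check_pmf(pmf)
print("mean     =", mean(pmf))
print("variance =", variance(pmf))
```

Using `Fraction` keeps the arithmetic exact, so the sum-to-one check can compare with `==` and needs no floating-point tolerance.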

Examples:

  • The number of people going to a given shop per day.
  • The number of students that come to class on a given day.

Implementations

A number of Java libraries can generate random numbers that follow discrete distributions:

The following example shows how to fill histograms using a number of popular distributions, including discrete ones:

DMelt example: Various random distributions

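The next sketch is a plain-Python illustration of the same idea. It is not DataMelt code and does not use the DataMelt API. It draws 1000 values from each of two discrete distributions, Poisson and binomial, which were chosen only as examples. It then fills a histogram with one bin per integer value and prints a crude text plot. Poisson values are generated with Knuth's multiplication method, and binomial values are generated by counting successes in Bernoulli trials.

```python
import math
import random
from collections import Counter

def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def binomial(rng, n, p):
    """Draw one Binomial(n, p) variate as the number of successes in n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def histogram(samples):
    """Count how often each integer value occurs, i.e. a histogram with unit-width bins."""
    return Counter(samples)

rng = random.Random(42)  # fixed seed so that runs are reproducible
poisson_hist = histogram(poisson(rng, 3.0) for _ in range(1000))
binomial_hist = histogram(binomial(rng, 10, 0.5) for _ in range(1000))

for name, hist in [("Poisson(3)", poisson_hist), ("Binomial(10, 0.5)", binomial_hist)]:
    print(name)
    for value in sorted(hist):
        # one '#' per 10 entries gives a rough text plot of the bin contents
        print(f"{value:3d} {hist[value]:4d} " + "#" * (hist[value] // 10))
```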

Description of discrete distributions

DataMelt can be used to determine statistical characteristics of an arbitrary frequency distribution, with moments calculated up to the 6th order. Read more about Moment (mathematics).

<pycode>
from jhplot import *
from jhplot.math.StatisticSample import *

# generate 1000 random numbers from a LogNormal distribution with parameters 0 and 10
a = randomLogNormal(1000, 0, 10)
p0 = P0D(a)               # convert the array into a P0D object
print p0.getStatString()  # print detailed statistical characteristics
</pycode>

Run this script and you will get very detailed, rather self-explanatory information about this distribution.

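The moments mentioned above can be illustrated in plain Python. This is not DataMelt code. The k-th raw moment of a sample x_1, ..., x_n is (1/n) Σ x_i^k, and the k-th central moment is m_k = (1/n) Σ (x_i - x̄)^k. The sketch below computes both up to the 6th order for a hypothetical sample. It uses the 1/n (population) normalisation, which may differ from the conventions DataMelt applies internally. It also prints the moment ratios m_3/m_2^1.5 (skewness) and m_4/m_2² (kurtosis).

```python
def raw_moment(data, k):
    """k-th raw moment: the mean of x**k."""
    return sum(x ** k for x in data) / len(data)

def central_moment(data, k):
    """k-th central moment: the mean of (x - mean)**k, with 1/n normalisation."""
    mu = raw_moment(data, 1)
    return sum((x - mu) ** k for x in data) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

for k in range(1, 7):
    print(f"order {k}: raw = {raw_moment(data, k):.4g}, central = {central_moment(data, k):.4g}")

m2, m3, m4 = (central_moment(data, k) for k in (2, 3, 4))
print("skewness =", m3 / m2 ** 1.5)  # standardized 3rd central moment
print("kurtosis =", m4 / m2 ** 2)    # standardized 4th central moment (not excess kurtosis)
```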

Let us continue with this example and return all statistical characteristics of the sample as a dictionary. This can be done by appending lines that 1) create a dictionary "stat" with key/value pairs, and 2) retrieve the variance of the sample using the key "Variance".

This prints "Variance= 757.3" (the exact value depends on the random sample). If you are not sure about the names of the keys, simply print the whole dictionary with "print stat".

One can create histograms that capture the most basic characteristics of the data. This is especially useful when there is no particular reason to keep the complete data arrays. We can easily do this with the above Fibonacci sequence as follows:

The code converts the array into a histogram with 10 equidistant bins in the range 0-100, and then it prints the map with statistical characteristics.
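The binning step can be sketched in plain Python. This is not the DataMelt histogram API. The sketch assumes that the sequence is the Fibonacci numbers below 100, because the original sequence is not given on this page. It fills 10 equal-width bins over [0, 100) and then estimates the mean and variance from the bin centres, weighted by the bin contents.

```python
def fibonacci_below(limit):
    """Return the Fibonacci numbers 1, 1, 2, 3, ... that are smaller than limit."""
    seq, a, b = [], 1, 1
    while a < limit:
        seq.append(a)
        a, b = b, a + b
    return seq

def fill_histogram(data, nbins, xmin, xmax):
    """Count data into nbins equal-width bins covering [xmin, xmax); out-of-range values are dropped."""
    width = (xmax - xmin) / nbins
    counts = [0] * nbins
    for x in data:
        if xmin <= x < xmax:
            counts[int((x - xmin) // width)] += 1
    return counts, width

def histogram_stats(counts, width, xmin):
    """Entries, mean and variance estimated from bin centres weighted by bin counts."""
    centres = [xmin + (i + 0.5) * width for i in range(len(counts))]
    n = sum(counts)
    mean = sum(c * x for c, x in zip(counts, centres)) / n
    var = sum(c * (x - mean) ** 2 for c, x in zip(counts, centres)) / n
    return {"entries": n, "mean": mean, "variance": var}

data = fibonacci_below(100)
counts, width = fill_histogram(data, 10, 0.0, 100.0)
print(counts)
print(histogram_stats(counts, width, 0.0))
```

Statistics computed from a histogram are approximations. In this case the binned mean (245/11, about 22.3) differs from the exact mean of the 11 numbers (232/11, about 21.1). Binning trades this loss of detail for a compact summary.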

You can also visualize the random numbers as a histogram, as shown in the detailed example above: random numbers are generated, converted to histograms, and plotted.

Statistics with 2D arrays

You can get detailed statistics on data described by the jhplot.P1D class using the method getStat(axis), where axis=0 for X and axis=1 for Y. It returns a map (for Java) or a Python dictionary (for Jython) in which each statistical characteristic can be accessed using a key, such as the mean, RMS, variance, error on the mean, etc. Assuming that the P1D is represented by the object "p1", try this code:

This will print the following values:

error      0.996592835069
rms        5.05682000584
mean       4.42857142857
variance   6.95238095238
stddev     2.63673679998
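The plain-Python sketch below is a hypothetical stand-in for getStat(axis). It is not the jhplot implementation. It returns a dictionary with the same five keys for one coordinate of a list of (x, y) points. The definitions are assumptions:

- rms is the square root of the mean of squares.
- variance uses an n-1 denominator.
- stddev is the square root of the variance.
- error is the error on the mean, stddev/sqrt(n).

These definitions are at least consistent with the output above, where the square of stddev equals the printed variance. The points are invented for illustration.

```python
import math

def axis_stats(values):
    """Statistics of one coordinate under the assumed definitions described above."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)  # n-1 denominator (assumption)
    stddev = math.sqrt(variance)
    return {
        "mean": mean,
        "rms": math.sqrt(sum(v * v for v in values) / n),  # root of the mean of squares
        "variance": variance,
        "stddev": stddev,
        "error": stddev / math.sqrt(n),                     # error on the mean
    }

def get_stat(points, axis):
    """Hypothetical helper mimicking getStat(axis): axis=0 selects X, axis=1 selects Y."""
    return axis_stats([p[axis] for p in points])

# Hypothetical (x, y) points standing in for a P1D object
points = [(1, 2.0), (2, 4.0), (3, 4.0), (4, 4.0), (5, 5.0), (6, 5.0), (7, 7.0), (8, 9.0)]

for key, value in sorted(get_stat(points, 1).items()):
    print(f"{key:10s} {value}")
```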

Here is a more detailed example:

This tutorial is provided under this license agreement.