DScience:Discrete probability distributions and their characteristics
What is a discrete probability distribution?
A discrete probability distribution is the probability distribution of a discrete random variable X. Each possible value of X is associated with a non-zero probability, and these probabilities sum to 1. A discrete probability distribution is often presented as a table.
Examples:
- The number of people going to a given shop per day.
- The number of students that come to class on a given day.
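As a minimal sketch of the "table" view, the block below stores a discrete distribution as a plain Python dictionary that maps each value to its probability. The probabilities are made up for illustration (a hypothetical number of customers per day), and the helper names `check_pmf`, `pmf_mean` and `pmf_variance` are not part of any library mentioned here.

```python
# Hypothetical distribution: number of customers per day -> probability.
# The numbers are invented for illustration only.
pmf = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.2, 4: 0.1}

def check_pmf(pmf, tol=1e-9):
    """Return True if every probability lies in [0, 1] and they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in pmf.values())
    return in_range and abs(sum(pmf.values()) - 1.0) < tol

def pmf_mean(pmf):
    """Expected value: sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def pmf_variance(pmf):
    """Variance: sum of (x - mean)**2 * P(X = x)."""
    m = pmf_mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

# Print the distribution as a two-column table.
for x in sorted(pmf):
    print("%d  %.2f" % (x, pmf[x]))
```

Calling `pmf_mean(pmf)` and `pmf_variance(pmf)` on this table gives the expected value and the variance directly from the tabulated probabilities, without generating any random numbers.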
Implementations
A number of Java libraries are available to create random discrete distributions:
- jhplot.math.num.pdf
- javanpst.distributions.common.discrete
- org.apache.commons.math3.distribution.IntegerDistribution
- smile.stat.distribution.Distribution
The following code shows how to fill histograms using a number of popular distributions, including discrete ones:
from java.awt import Color
from jhplot import *
from cern.jet.random.engine import *
from cern.jet.random import *

# build the canvas with 3x2 pads
c1=HPlot("Canvas",650,500,3,2)
c1.visible()
c1.setAutoRange()
c1.setGTitle("Random Distributions")

engine = MersenneTwister()
Events=10000

# Gamma distribution
c1.cd(1,1)
r=Gamma(1,0.5,engine)
h1=H1D("Gamma",25,0,10)
h1.setFill(1)
h1.setFillColor(Color.red)
w=1.0/(Events*h1.getBinSize())  # weight that normalizes the histogram
for i in range(Events):
    h1.fill(r.nextDouble(),w)
c1.draw(h1)

# Binomial distribution
c1.cd(2,1)
c1.setAutoRange()
r=Binomial(10,0.5,engine)
h1=H1D("Binomial",20,0,10)
h1.setFill(1)
h1.setFillColor(Color.blue)
w=1.0/(Events*h1.getBinSize())
for i in range(Events):
    h1.fill(r.nextDouble(),w)
c1.draw(h1)

# Poisson distribution
c1.cd(3,1)
c1.setAutoRange()
r=Poisson(5,engine)
h1=H1D("Poisson",20,0,10)
h1.setFill(1)
h1.setFillColor(Color.green)
w=1.0/(Events*h1.getBinSize())
for i in range(Events):
    h1.fill(r.nextDouble(),w)
c1.draw(h1)

# Student's t distribution
c1.cd(1,2)
c1.setAutoRange()
r=StudentT(5,engine)
h1=H1D("Student",20,0,5)
h1.setFill(1)
h1.setFillColor(Color.green)
w=1.0/(Events*h1.getBinSize())
for i in range(Events):
    h1.fill(r.nextDouble(),w)
c1.draw(h1)

# Negative binomial distribution
c1.cd(2,2)
c1.setAutoRange()
r=NegativeBinomial(10,0.5,engine)
h1=H1D("NBD",30,0,30)
h1.setFill(1)
h1.setFillColor(Color.red)
w=1.0/(Events*h1.getBinSize())
for i in range(Events):
    h1.fill(r.nextDouble(),w)
c1.draw(h1)

# Logarithmic distribution
c1.cd(3,2)
c1.setAutoRange()
r=Logarithmic(0.5,engine)
h1=H1D("Logarithmic",20,0,10)
h1.setFill(1)
h1.setFillColor(Color.blue)
w=1.0/(Events*h1.getBinSize())
for i in range(Events):
    h1.fill(r.nextDouble(),w)
c1.draw(h1)
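The weight w=1.0/(Events*h1.getBinSize()) used in the script makes the area under each histogram approximately 1. The sketch below illustrates the same normalization in plain Python, without DataMelt. The functions `fill_normalized` and `binomial_sample` are hypothetical helpers written for this illustration, and the treatment of values at the upper edge as overflow is an assumption of the sketch, not a statement about H1D.

```python
import random

def fill_normalized(samples, nbins, xmin, xmax):
    """Fill a fixed-bin histogram, weighting each entry by 1/(N * bin size)."""
    bin_size = (xmax - xmin) / float(nbins)
    w = 1.0 / (len(samples) * bin_size)
    heights = [0.0] * nbins
    for x in samples:
        # Assumption of this sketch: values outside [xmin, xmax) are dropped.
        if xmin <= x < xmax:
            heights[int((x - xmin) / bin_size)] += w
    return heights, bin_size

def binomial_sample(n, p, rng):
    """Draw one binomial variate as the number of successes in n trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(12345)  # fixed seed so that runs are repeatable
events = 10000
samples = [binomial_sample(10, 0.5, rng) for _ in range(events)]

heights, bin_size = fill_normalized(samples, 20, 0.0, 10.0)
area = sum(h * bin_size for h in heights)
print("bin size = %s, area under histogram = %.4f" % (bin_size, area))
```

Because a binomial variable takes only integer values and the bins are 0.5 wide, every second bin stays empty. The area is slightly below 1 here because the rare value 10 falls on the upper edge and is not counted.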
Description of discrete distributions
DataMelt can be used to determine statistical characteristics of an arbitrary frequency distribution, with moments calculated up to the 6th order. Read more about Moment (mathematics).
<pycode>
from jhplot import *
from jhplot.math.StatisticSample import *
a=randomLogNormal(1000,0,10)  # 1000 random numbers from a log-normal distribution (parameters 0 and 10)
p0=P0D(a)                     # convert the array to a P0D object
print p0.getStatString()      # print detailed characteristics
</pycode>
Running this script prints very detailed, largely self-explanatory information about the distribution:
Size: 1000
Sum: 2.0795326321690155E11
SumOfSquares: 1.722072831288292E22
Min: 4.3681673233597326E-14
Max: 1.187289072883721E11
Mean: 2.0795326321690154E8
RMS: 4.1497865382309628E9
Variance: 1.7194678431631995E19
Standard deviation: 4.14664664899627E9
Standard error: 1.3112848062732975E8
Geometric mean: 0.7193930848008395
Product: 9.252494313364321E-144
Harmonic mean: 2.2976022239249118E-11
Sum of inversions: 4.352363475222163E13
Skew: 25.65476598759878
Kurtosis: 694.7699433878664
Sum of powers(3): 1.839916709064571E33
Sum of powers(4): 2.0782654881247146E44
Sum of powers(5): 2.4093597349729484E55
Sum of powers(6): 2.8286717081193334E66
Moment(0,0): 1.0
Moment(1,0): 2.0795326321690154E8
Moment(2,0): 1.722072831288292E19
Moment(3,0): 1.839916709064571E30
Moment(4,0): 2.0782654881247147E41
Moment(5,0): 2.409359734972948E52
Moment(6,0): 2.8286717081193336E63
Moment(0,mean()): 1.0
Moment(1,mean()): 4.931390285491944E-7
Moment(2,mean()): 1.7177483753200437E19
Moment(3,mean()): 1.8291913748162454E30
Moment(4,mean()): 2.0630054468429083E41
Moment(5,mean()): 2.3878300421487077E52
Moment(6,mean()): 2.798744135044988E63
25%, 50%, 75% Quantiles: 0.0012310644573427145, 0.9530465118707188, 535.0653267374155
quantileInverse(median): 0.5005
Distinct elements & frequencies not printed (too many).
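The Moment(k,c) entries in this listing can be read as the average of (x - c)**k over the sample; for instance, Moment(2,0) equals SumOfSquares divided by Size. The plain-Python sketch below computes such moments up to the 6th order for a small made-up sample. The functions `moment`, `skew` and `kurtosis` are hypothetical helpers, and the skew and kurtosis conventions (central moment divided by a power of the sample standard deviation, with 3 subtracted for kurtosis) are assumptions chosen because they are consistent with the numbers printed above.

```python
import math

def moment(data, k, c=0.0):
    """k-th moment about c: the average of (x - c)**k."""
    return sum((x - c) ** k for x in data) / float(len(data))

def sample_stddev(data):
    """Standard deviation with the n - 1 denominator."""
    n = len(data)
    mean = moment(data, 1)
    return math.sqrt(moment(data, 2, mean) * n / (n - 1.0))

def skew(data):
    """Assumed convention: third central moment / stddev**3."""
    return moment(data, 3, moment(data, 1)) / sample_stddev(data) ** 3

def kurtosis(data):
    """Assumed convention: fourth central moment / stddev**4, minus 3."""
    return moment(data, 4, moment(data, 1)) / sample_stddev(data) ** 4 - 3.0

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample
mean = moment(data, 1)
for k in range(7):
    print("Moment(%d,0): %g   Moment(%d,mean): %g"
          % (k, moment(data, k), k, moment(data, k, mean)))
print("Skew: %g  Kurtosis: %g" % (skew(data), kurtosis(data)))
```

The zeroth moment is always 1 and the first moment about the mean is 0 up to rounding, which matches the Moment(0,...) and Moment(1,mean()) lines of the listing.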
Continuing with this example, all statistical characteristics of the sample can also be returned as a dictionary. Appending a few lines to the script 1) creates a dictionary "stat" with key/value pairs and 2) retrieves the variance of the sample using the key "Variance", which prints a line such as "Variance= 757.3". If you are not sure about the names of the keys, simply print the dictionary with "print stat".
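The dictionary pattern can be tried outside DataMelt as well. The following is a minimal sketch for standard Python 3 (it relies on the `statistics` module, which is not available in Jython 2.7). The function `get_stat` and its key names are hypothetical and only mimic the labels used in this section; they are not the DataMelt API.

```python
import statistics

def get_stat(sample):
    """Collect a few characteristics of a sample in a dictionary.

    The key names are hypothetical and chosen for this illustration.
    """
    return {
        "Size": len(sample),
        "Mean": statistics.mean(sample),
        "Median": statistics.median(sample),
        "Variance": statistics.variance(sample),  # sample variance (n - 1)
        "Standard deviation": statistics.stdev(sample),
    }

stat = get_stat([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up sample
print("Variance= %s" % stat["Variance"])
print(sorted(stat))  # list the available keys when unsure of their names
```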
Histograms capture the most basic characteristics of data. This is especially useful when there is no particular reason to deal with complete data arrays. For example, an array holding a Fibonacci sequence can be converted into a histogram with 10 equidistant bins in the range 0-100, after which the map with its statistical characteristics is printed.
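A plain-Python sketch of this idea is given below. It assumes a Fibonacci sequence that starts with 1, 1 and stops below 100 (the original input array is not shown here), and the helpers `fibonacci`, `histogram` and `histogram_stat` are hypothetical names written for this illustration rather than DataMelt calls.

```python
def fibonacci(limit):
    """Fibonacci numbers 1, 1, 2, 3, ... that are smaller than limit."""
    seq, a, b = [], 1, 1
    while a < limit:
        seq.append(a)
        a, b = b, a + b
    return seq

def histogram(values, nbins, xmin, xmax):
    """Count values in nbins equidistant bins covering [xmin, xmax)."""
    width = (xmax - xmin) / float(nbins)
    counts = [0] * nbins
    for v in values:
        if xmin <= v < xmax:
            counts[int((v - xmin) / width)] += 1
    return counts, width

def histogram_stat(counts, width, xmin):
    """Mean and RMS width estimated from bin centers and bin contents."""
    entries = sum(counts)
    centers = [xmin + (i + 0.5) * width for i in range(len(counts))]
    mean = sum(c * n for c, n in zip(centers, counts)) / float(entries)
    var = sum(n * (c - mean) ** 2 for c, n in zip(centers, counts)) / float(entries)
    return {"entries": entries, "mean": mean, "rms_width": var ** 0.5}

values = fibonacci(100)
counts, width = histogram(values, 10, 0.0, 100.0)
stat = histogram_stat(counts, width, 0.0)
print(counts)
print(stat)
```

The histogram keeps only 10 numbers instead of the full array, so its mean is an approximation based on bin centers and differs somewhat from the mean of the original values.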
You can also visualize the random numbers in the form of a histogram: create the random numbers, fill a histogram with them, and plot it.
Statistics with 2D arrays
You can get detailed statistics on data described by the jhplot.P1D class using the method getStat(axis), where axis=0 for X and axis=1 for Y. It returns a map (in Java) or a Python dictionary (in Jython) in which each statistical characteristic can be accessed using a key, such as the mean, RMS, variance, or error on the mean. Assuming that a P1D is represented by the object "p1", retrieving this dictionary for one axis and printing each key with its value gives output like the following:
error 0.996592835069
rms 5.05682000584
mean 4.42857142857
variance 6.95238095238
stddev 2.63673679998
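The relationship between these five keys can be sketched in plain Python. The function `get_stat` below is a hypothetical stand-in, not the P1D method, and it assumes the sample variance (n - 1 denominator) with the error defined as stddev divided by the square root of the number of points. The (x, y) pairs are invented for illustration; their X values were picked so that, under these assumptions, the results agree with the numbers printed above.

```python
import math

def get_stat(points, axis):
    """Statistics of one coordinate of (x, y) pairs; axis=0 for X, axis=1 for Y."""
    values = [p[axis] for p in points]
    n = len(values)
    mean = sum(values) / float(n)
    variance = sum((v - mean) ** 2 for v in values) / (n - 1.0)  # sample variance
    stddev = math.sqrt(variance)
    return {
        "mean": mean,
        "rms": math.sqrt(sum(v * v for v in values) / float(n)),
        "variance": variance,
        "stddev": stddev,
        "error": stddev / math.sqrt(n),  # standard error of the mean
    }

# Hypothetical data, not taken from the original example.
points = [(1, 10), (2, 20), (3, 30), (4, 40), (6, 60), (7, 70), (8, 80)]

stat = get_stat(points, 0)
for key in sorted(stat):
    print("%s %s" % (key, stat[key]))
```

Calling `get_stat(points, 1)` returns the same set of keys for the Y values.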
Here is a more detailed example:
from jhplot import *
from jhplot.math.StatisticSample import *

a=randomLogNormal(1000,0,1)

# get statistics
p0=P0D(a)
print p0.getStat()

# make histogram
h=H1D("LogNormal",30,0,1)
h.fill(a)
c1 = HPlot("LogNormal")
c1.setGTitle("LogNormal")
c1.setRange(0,1,0,100)
c1.visible()
c1.draw(h)