DMelt:DataAnalysis/5 Fitting Data


Fitting data with functions

DataMelt offers many rich classes for linear and non-linear regression. Read Curve fitting. In particular, DataMelt supports:

  • Linear-regression fit (fitting with a straight line)
  • Non-linear regression (fitting with more complex functions)
  • Fitting data with shapes (circles, ellipses).

For fits with analytic functions, the minimisation can include statistical errors on the input data points. In addition, several approaches to the minimization procedure can be used.
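The quantity that such error-aware minimisers reduce is the chi2: the sum of squared residuals, each weighted by the measurement error of its point. The sketch below shows this in plain Python (no DMelt classes; the function names and data points are invented for illustration):

```python
def chi2(model, params, xs, ys, sigmas):
    """Sum of squared residuals, each weighted by its measurement error."""
    total = 0.0
    for x, y, s in zip(xs, ys, sigmas):
        r = (y - model(x, params)) / s
        total += r * r
    return total

# toy straight-line model y = a + b*x and three invented data points
line = lambda x, p: p[0] + p[1] * x
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.1, 4.9]
sigmas = [0.1, 0.1, 0.2]

print(chi2(line, [1.0, 2.0], xs, ys, sigmas))  # close to 1.25
```

Shrinking an error bar increases the weight of that point, so precise points pull the fitted curve harder than imprecise ones.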

Linear regression

To perform a linear regression, use the class jhplot.stat.LinReg. The example below shows how to use this class to perform a linear regression:

from jhplot import *
from jhplot.stat  import LinReg
from java.awt import Color

# create input data
data="""2 3 4 2
8 2 3 1
7 1 2 3
7 2 3 4
2 3 4 5
1 2 2 4
3 3 4 5
2 2 1 3
8 8 8 3
1 3 5 1
2 3 1 3
4 5 6 1
2 1 0 0
2 3 4 5
6 7 8 2
1 2 3 5
2 3 1 2
1 2 4 2
3 5 5 5
1 2 3 4
"""
file = open("pnd.d", "w")
file.write(data)
file.close()


# read data from external file
pn=PND('data','pnd.d')
print pn.toString()

p0=pn.getP0D(1)    # extract the 2nd column into a 1D array
print p0.getStat() # print statistics
print p0.variance()
h1=p0.getH1D(10)   # make a histogram with 10 bins


p1=pn.getP1D(1,2)   # extract columns 2 and 3
c1=HPlot("Plot",600,400,2,1)
c1.visible()
c1.cd(1,1)          # go to the first drawing region 
c1.setAutoRange()
c1.draw(h1)


c1.cd(2,1)       # go to second drawing region
c1.setAutoRange()
r = LinReg(p1)
print "Intercept=",r.getIntercept(), "+/-",r.getInterceptError() 
print "Slope=",r.getSlope(),"+/-",r.getSlopeError() 

# create a string with a*x+b function
func='%4.2f*x+%4.2f' % (r.getSlope(),r.getIntercept())
f1=F1D( func, p1.getMin(0), p1.getMax(0))    # define a function in the range of the data

p=r.getPredictionBand(Color.green) # calculate the prediction band
p.setLegend(False)                 # do not show the legend
p.setErrColor(Color.green)         # color for error bars
c1.draw(p)
c1.draw(p1)                        # redraw data and the function 
c1.draw(f1)                                    
c1.export("tutorial_dmin1.eps")    # make image file (EPS) 


p2=pn.getP2D(0,1,2)  # extract 1,2,3 columns
p3=pn.getP2D(0,2,3)  # extract 1,3,4 columns
c2=HPlot3D("Plot",600,400,2,1)
c2.visible()
c2.setAutoRange()
c2.cd(1,1)
c2.setBoxColor(Color.white)
c2.draw(p2)

c2.cd(2,1)
c2.setBoxColor(Color(200,210,210))
c2.setAutoRange()
c2.draw(p3)
c2.export("tutorial_dmin2.eps")

The result of this fit is shown below:

DMelt example: Data mining using a linear regression and predictions
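The intercept, slope and their errors reported by LinReg follow the standard ordinary-least-squares formulas. As a plain-Python illustration (not the DMelt implementation; the function name and the five data points are our own), the same quantities can be computed in closed form:

```python
def lin_reg(xs, ys):
    """Ordinary least-squares fit y = intercept + slope*x (textbook formulas)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # residual variance and standard errors of the two estimates
    resid2 = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    s2 = resid2 / (n - 2)
    slope_err = (s2 / sxx) ** 0.5
    intercept_err = (s2 * (1.0 / n + mx * mx / sxx)) ** 0.5
    return intercept, slope, intercept_err, slope_err

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 2.0, 5.0, 5.0, 7.0]
a, b, da, db = lin_reg(xs, ys)
print("Intercept=", a, "+/-", da)
print("Slope=", b, "+/-", db)
```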

Linear regression with HFitter

Now we can perform a more flexible fit by defining the linear function analytically. We will use jhplot.HFitter, which allows one to define any fit function. We simulate a linear dependence of the variable Y on X using random numbers:

from jhplot import *
from java.util import Random
from java.awt import Color

tTpoints = P1D("Data")
rand = Random()
for i in range(10):
     x=rand.nextGaussian()
     y=2*x+10*(1+0.01*rand.nextGaussian())
     tTpoints.add(x,y)
          
fitter = HFitter("leastsquares")
fitter.setFunc("linear", 1, "a+b*x[0]","a,b")
fitter.setPar("a", 10)
fitter.setPar("b", 10)
fitter.fit(tTpoints)
fitresult = fitter.getResult()
a = fitresult.fittedParameter("a")  
b = fitresult.fittedParameter("b")
print("a = {0}, b = {1}".format(a,b))
 
c1=HPlot("Canvas")
c1.visible()
c1.setAutoRange()
ff=fitter.getFittedFunc()
max_X = max(tTpoints.getArrayX())
min_X = min(tTpoints.getArrayX())
c1.draw(tTpoints)
f1=F1D("fit",ff,min_X,max_X)
f1.setColor(Color.red)
c1.draw(f1)

DMelt example: Fitting data using defined linear function

The fit returns a and b values very close to those used to generate the data (a=10, b=2):

a = 9.99526486454, b = 1.99636040719

Non-linear fitting

Data fits can be done using the Java class jhplot.HFitter. First, one needs to construct a function and set initial values for its free parameters. Fitting can be done either in interactive mode or from a script.

By default, HFitter fits data using the chi2 method. In this case it is important to specify errors on the Y-values of the input data. You can print the available methods as follows:

from jhplot import *
f=HFitter()
print f.getFitMethod()

which prints the "leastsquares", "cleverchi2", "chi2" and "bml" fitting methods. You can set the fit method when initializing the fitter:

from jhplot import *
f=HFitter("leastsquares")

In this case, the "leastsquares" method is used instead of "chi2". With it, you can fit "X-Y" data arrays without specifying errors on Y.
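The practical difference between the two methods is the error weighting. A minimal plain-Python sketch (a toy example of our own, fitting a single constant): "leastsquares" ignores the errors and its minimum is the plain mean, while "chi2" weights each point by 1/sigma^2 and its minimum is the error-weighted mean:

```python
# Toy example: fit a constant a to measurements y_i with errors s_i.
ys = [10.0, 20.0]
ss = [1.0, 3.0]

# "leastsquares" ignores the errors: the chi2 minimum is the plain mean.
plain = sum(ys) / len(ys)

# "chi2" weights each point by 1/s_i^2: the minimum is the weighted mean.
w = [1.0 / s ** 2 for s in ss]
weighted = sum(wi * y for wi, y in zip(w, ys)) / sum(w)

print(plain)     # 15.0
print(weighted)  # close to 11.0: pulled toward the more precise point
```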

Let's make a simple chi2 fit of experimental data using the analytic function [math]\displaystyle{ Tu + (Ta - Tu) * exp(-kk * x) }[/math], where the parameters Tu, Ta and kk are determined by fitting data stored in an array. The data have experimental (statistical or systematic) uncertainties, which are mandatory for chi2 minimization. The resulting fit is shown below (example provided by Klaus Rohe).

DMelt example: Fitting data using a complex function (non-linear regression)

This image is generated by the code below, where we use the jhplot.P1D container to store the input data. You can access the fit errors and the fit quality (chi2/ndf) as described in the jhplot.HFitter class.

from jhplot import *

tData = P1D("time-temperature points") 
tData.add(0,  88, 8) # fill X, Y and error on Y 
tData.add(2,  85, 8)
tData.add(5,  80, 7)
tData.add(8,  75, 6)
tData.add(14, 68, 6)
tData.add(22, 60, 6)
tData.add(29, 55, 5)
tData.add(43, 48, 6)
tData.add(49, 45, 6)
tData.add(62, 40, 4)
tData.add(81, 35, 3)
tData.add(130,30, 2) 

fitter = HFitter() # build fitter 
fitter.setFunc("coolingfunction",1,"Tu + (Ta - Tu) * exp(-kk * x[0])","kk,Ta,Tu")
fitter.setPar("kk", 0.02)
fitter.setPar("Ta", 88.0)
fitter.setPar("Tu", 20.0)
fitter.setRange(0, 140)
func = fitter.getFunc()
print("{0} has parameters {1}".format(func.title(), func.parameterNames()))
print("{0}".format(func.codeletString()))
fitter.fit(tData)

fitresult = fitter.getResult()
kk = fitresult.fittedParameter("kk")  
Ta = fitresult.fittedParameter("Ta")
Tu = fitresult.fittedParameter("Tu") 
print("kk = {0}, Ta = {1}, Tu = {2}".format(kk, Ta, Tu))
 
c1=HPlot("Fitting")
c1.visible()
c1.setAutoRange()
max_X = max(tData.getArrayX())

c1.draw(tData)
f1=F1D("fit",func,0,max_X)
c1.draw(f1)

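To see what the fitter is doing numerically, here is a plain-Python sketch that minimises the same chi2 for the cooling function on the same data points, using a crude greedy coordinate descent (a toy minimiser for illustration only, not the algorithm HFitter actually uses):

```python
import math

# the same (x, y, error) points as in the script above
data = [(0, 88, 8), (2, 85, 8), (5, 80, 7), (8, 75, 6), (14, 68, 6),
        (22, 60, 6), (29, 55, 5), (43, 48, 6), (49, 45, 6),
        (62, 40, 4), (81, 35, 3), (130, 30, 2)]

def model(x, p):
    kk, Ta, Tu = p
    return Tu + (Ta - Tu) * math.exp(-kk * x)

def chi2(p):
    return sum(((y - model(x, p)) / e) ** 2 for x, y, e in data)

p = [0.02, 88.0, 20.0]          # same starting values as in the script above
steps = [0.005, 5.0, 5.0]
best = chi2(p)
for _ in range(300):            # greedy search with shrinking step sizes
    for i in range(3):
        for d in (steps[i], -steps[i]):
            trial = list(p)
            trial[i] += d
            c = chi2(trial)
            if c < best:
                p, best = trial, c
    steps = [s * 0.9 for s in steps]

print("kk=%.4f Ta=%.2f Tu=%.2f chi2=%.2f" % (p[0], p[1], p[2], best))
```

Each pass nudges one parameter at a time and keeps any change that lowers the chi2; a real minimiser uses gradients and converges far faster, but the objective function is the same.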

Below we illustrate how to perform a more complicated fit using the chi2 method, carried out in several steps. In this example we fit data described by multiple Gaussians, where each successive Gaussian fit takes its starting values from the previous fit:

from jhplot  import *
from jhplot.io import *
from java.awt import Color
from java.util import Random
from jhplot.math.StatisticSample  import *

xmin=0
xmax=20
h1 = H1D('Data',100,xmin,xmax)
r= Random()

f=F1D('10+10*x',xmin,xmax)
p=f.getParse()
max=f.eval(xmax)

for i in range(10000):
  a=randomRejection(10,p,max,xmin,xmax)
  h1.fill(a)
  h1.fill(0.3*r.nextGaussian()+4)
  h1.fill(0.6*r.nextGaussian()+10)
  h1.fill(0.6*r.nextGaussian()+15)


c1 = HPlot('Canvas')
c1.setRange(0,20,0,4000)
c1.visible()
c1.draw(h1) 

f=HFitter()
f.setFunc('p1+g')
func=f.getFunc()
f.setPar('mean',4); f.setPar('amplitude',100)
print func.parameterNames()
f.setRange(0,7)
f.fit(h1)


ff=f.getFittedFunc()
r=f.getResult()
fPars  = r.fittedParameters()


# next Gaussian
f.setFunc('p1+g+g')
func=f.getFunc()
func.setParameters(fPars.tolist()+[500,10,0.5])
f.setRange(0,13)
f.fit(h1)
ff=f.getFittedFunc()
r=f.getResult()
fPars  = r.fittedParameters()
print func.parameterNames(), func.parameters()


# next gaussian 
f.setFunc('p1+g+g+g')
func=f.getFunc()
func.setParameters(fPars.tolist()+[500,15,0.5])
print func.parameterNames(), func.parameters()
f.setRange(0,20)
f.fit(h1)
ff=f.getFittedFunc()


# plot all 
f2 = F1D('Gaussians+background',ff,0,20)
f2.setPenWidth(1)
f2.setColor(Color.blue)
c1.draw(f2)


The output is shown here:

DMelt example: Fitting data using several Gaussian distributions added together
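The staged strategy above amounts to building a model of a linear background plus a growing number of Gaussians, seeding each new stage with the previous stage's fitted parameters plus a rough guess for the next peak. A plain-Python sketch of that composition (the parameterisation and seed values here are our own; DMelt's 'p1+g' parameter order may differ):

```python
import math

def background(x, p):            # p = [p0, p1]: linear background
    return p[0] + p[1] * x

def gauss(x, a, mean, sigma):    # one Gaussian term
    return a * math.exp(-0.5 * ((x - mean) / sigma) ** 2)

def model(x, pars):
    """Linear background plus any number of Gaussians.
    pars = [p0, p1, a1, m1, s1, a2, m2, s2, ...]"""
    y = background(x, pars[:2])
    for i in range(2, len(pars), 3):
        y += gauss(x, *pars[i:i + 3])
    return y

# staged seeding: keep the parameters from the previous fit and
# append a rough guess (amplitude, mean, sigma) for the next Gaussian
stage1 = [10.0, 10.0, 100.0, 4.0, 0.3]
stage2 = stage1 + [500.0, 10.0, 0.5]
print(model(4.0, stage2))        # close to 150.0 near the first peak
```

Seeding each stage from the previous result keeps the already-fitted peaks near their minima, so the minimiser only has to locate the newly added Gaussian.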

See [https://datamelt.org/code/index.php?keyword=hfitter HFitter code examples].

Numerical interpolation

One can also describe data numerically using interpolation and smoothing. One example is shown in this figure:

DMelt example: Data smoothing (interpolation), Cubic and SPline

where we attempted to smooth the data using a non-analytic approach. Such an approach is often used for predictions when finding an appropriate analytic function is difficult or impossible. DataMelt provides several flexible methods to smooth data and perform interpolation. The code of this example is given below:

# Here we use weighted smoothing as given by a histogram
 
from java.awt import Color,Font
from java.util import Random
from jhplot import *
from jhplot.stat import Interpolator

c1 = HPlot("Canvas",600,400)
c1.setGTitle("Cubic Spline interpolation")
c1.visible()
c1.setRange(-2,3,0,500)

h2 = H1D("Background",60, -2.0, 3.0)
h2.setFill(1)
h2.setErrX(0)
h2.setErrY(1)
h2.setFillColor(Color.yellow)
h2.setColor(Color.green)
h2.setErrColorY(Color.blue)

r=Random()
for i in range(10000): 
      h2.fill(r.nextGaussian()-2)
      if (i<3000): h2.fill(r.nextGaussian()*0.3+1)
      if (i<2000): h2.fill(r.nextGaussian()*0.7+2)

c1.draw(h2)


k1=Interpolator(h2)
k2=Interpolator(h2)
s1=k1.interpolateCubicSpline(0.9999,2) 
s2=k2.interpolateCubicSpline(0.9,2) 
p1=P1D("CubicSpline. s=0.9999")
p2=P1D("CubicSpline. s=0.9 ")
fit=s1.getSplinePolynomials() 
print "Length=",len(fit)
nbins=100
xmax=3
xmin=-2
h=(xmax-xmin)/float(nbins)
print "Step=",h
for i in range(nbins):
    z1=xmin+i*h
    z2=s1.evaluate(z1)
    z3=s2.evaluate(z1)
    print i,z1,z2
    p1.add(z1,z2)
    p2.add(z1,z3)
p1.setStyle("l")
p1.setSymbolSize(1)
p1.setColor(Color.red)
c1.draw(p1)

p2.setStyle("l")
p2.setSymbolSize(1)
p2.setColor(Color.blue)
c1.draw(p2)
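The idea behind such non-analytic smoothing is to evaluate a piecewise function built from the data rather than a single global formula. As a minimal plain-Python illustration (piecewise-linear rather than cubic spline, and with invented knot values):

```python
def interp_linear(knots, x):
    """Piecewise-linear interpolation through (x, y) knots sorted by x."""
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x outside the knot range")

# invented knots roughly mimicking the histogram shape above
knots = [(-2.0, 0.0), (0.0, 400.0), (1.0, 250.0), (3.0, 50.0)]
print(interp_linear(knots, 0.5))   # halfway between 400 and 250 -> 325.0
```

A cubic spline replaces each straight segment with a cubic polynomial chosen so that the curve and its first two derivatives are continuous at the knots, which is what interpolateCubicSpline returns as spline polynomials.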

Interactive fit

One simple way to fit data interactively is to use the class jhplot.HFit. This fitter creates the dialog shown below:

DMelt example: Interactive fit of P1D array and HFit


The code that creates this dialog window is shown here:

from java.awt import Color
from jhplot  import *

c1 = HPlot()
c1.visible(); c1.setAutoRange()
c1.setGTitle("&chi;^{2} curve fitting",Color.blue)
c1.setGrid(0,0)
c1.setGrid(1,0)

p1 = P1D("Linear fit")
p1.add(1, 2, 1) # x, y and dy  
p1.add(2, 2, 1)
p1.add(3, 5, 1)
p1.add(4, 5, 1)
p1.add(5, 7, 2)
p1.setErr(1)
p1.setErrFillColor(Color.yellow,0.3)
c1.draw(p1)

a=HFit(c1,p1) # run interactive curve fitter 
a.addFunc("User1","My fit function", "a*x[0]+b","a,b")

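For a straight-line user function such as "a*x[0]+b", the chi2 minimum that the interactive fitter searches for actually has a closed form. A plain-Python sketch (the helper name is ours) applied to the same five points with their errors:

```python
def chi2_line_fit(pts):
    """Closed-form chi2 minimum for y = a*x + b with per-point errors dy.
    pts is a list of (x, y, dy) tuples."""
    S = Sx = Sy = Sxx = Sxy = 0.0
    for x, y, dy in pts:
        w = 1.0 / (dy * dy)
        S += w; Sx += w * x; Sy += w * y
        Sxx += w * x * x; Sxy += w * x * y
    delta = S * Sxx - Sx * Sx
    a = (S * Sxy - Sx * Sy) / delta     # slope
    b = (Sxx * Sy - Sx * Sxy) / delta   # intercept
    return a, b

# the same five (x, y, dy) points as in the script above
pts = [(1, 2, 1), (2, 2, 1), (3, 5, 1), (4, 5, 1), (5, 7, 2)]
a, b = chi2_line_fit(pts)
print("a =", a, " b =", b)
```

Note how the last point, with its larger error, carries only a quarter of the weight of the others.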
Data can be fitted with many predefined functions in a more interactive way. Let's create a few data containers (1D arrays and a 1D histogram) and start the jhplot.HPlotJas plotter, which is based on the JAS2 program from SLAC:

from java.util import Random
from jhplot  import * 

h1 = H1D("1D  histogram",100, -2, 2.0)
p1=P1D("data with errors")

rand = Random()
for i in range(600):
      h1.fill(rand.nextGaussian())     
p0=P0D("Normal distribution")
p0.randomNormal(1000,0.0,10.0)
p1.add(1,10,3)
p1.add(2,5,1)
p1.add(3,12,2)
from java.util import ArrayList
c=HPlotJas("JAS",ArrayList([h1,p1,p0]))

This will bring up a "JAS" program with all objects shown on the left side.

DMelt example: Interactive fit using Jas

You can expand the tree (left) and click on each object to plot it on the canvas. Then try to fit the data: using the mouse pop-up dialog, press "Add function" (for example, a Gaussian function) and click "Fit". The program performs a chi2 minimisation and you will see a Gaussian line on top of the data. You can adjust the initial values of the function by dragging its 3 control points.

In the above example we used pre-built functions to perform fits. You can also add your own custom fit function to the menu and use it for fitting. You can do this directly in the Jython script and pass the function to the HPlotJas canvas. Below we show an example in which we create two custom functions (a new Gaussian and a parabola) and pass them to HPlotJas for interactive fitting.

from hep.aida import IAnalysisFactory
from java.util import Random
from jhplot  import * 

h1 = H1D("1D  histogram",100, -2, 2.0)
p1=P1D("data with errors")

rand = Random()
for i in range(600):
      h1.fill(rand.nextGaussian())     
p0=P0D("Normal distribution")
p0.randomNormal(1000,0.0,10.0)
p1.add(1,10,3)
p1.add(2,5,1)
p1.add(3,12,2)

aFactory = IAnalysisFactory.create()
tFactory = aFactory.createTreeFactory()
tree = tFactory.create();
fFactory = aFactory.createFunctionFactory(tree)
gauss=fFactory.createFunctionFromScript("gauss", 1, "a*exp(-(x[0]-mean)*(x[0]-mean)/sigma/sigma)","a,mean,sigma","NewGaussian")
parabola=fFactory.createFunctionFromScript("parabola",1,"background + (a*x[0]*x[0]+b*x[0]+c)","a,b,c,background","Parabola")
 
from java.util import ArrayList
c=HPlotJas("JAS",ArrayList([h1,p1,p0,gauss,parabola]))

DMelt example: Interactive fit of H1D using Jas and a defined function

Advanced fitting of data

Section Advanced data fitting discusses how to perform a non-linear fit in Java using data in many dimensions and any complex function, defined not as a string but fully programmatically.