DMelt:IO/8 Cross Platform IO

From HandWiki

Cross platform I/O

HBook class

Many DMelt objects (data arrays, histograms, functions) can be saved and restored using XML files without Java serialization. The Java class which writes data in XML is HBook; a related class, which uses slightly different XML tags, can read XML files with histograms created in C++ or Fortran programs.

Unlike the standard Java serialization (including its XML form) discussed in Section Java Serialization, these classes follow an approach very similar to the "AIDA" implementation. Notably, their format:

  • keeps only the information content of objects (values, titles, labels), without graphical attributes (colors, fonts)
  • does not wrap each data value in its own XML tag, so the output is substantially smaller. This is especially important for large data volumes, 2D matrices and histograms.
  • is better suited for cross-platform use. In particular, C++ and Fortran packages are available to read and write data in this format; see the package CFBook.
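To see why dropping per-value tags matters, the sketch below builds the same list of (x, y) points twice with Python's standard xml.etree.ElementTree: once with every number in its own element, and once as whitespace-separated text inside a single element. The tag names (p1d, point, x, y, data) are hypothetical and are not meant to reproduce the actual JDAT schema.

```python
import xml.etree.ElementTree as ET

def tagged_xml(values):
    """Verbose layout (hypothetical tags): every number sits in its own element."""
    root = ET.Element("p1d")
    for x, y in values:
        point = ET.SubElement(root, "point")
        ET.SubElement(point, "x").text = repr(x)
        ET.SubElement(point, "y").text = repr(y)
    return ET.tostring(root, encoding="unicode")

def compact_xml(values):
    """Compact layout (hypothetical tags): all numbers as text in one element."""
    root = ET.Element("p1d")
    data = ET.SubElement(root, "data")
    data.text = "\n".join("%r %r" % (x, y) for x, y in values)
    return ET.tostring(root, encoding="unicode")

points = [(float(i), float(i) ** 2) for i in range(1000)]
verbose = tagged_xml(points)
compact = compact_xml(points)
print("per-value tags:", len(verbose), "characters")
print("one data block:", len(compact), "characters")
```

Both strings carry the same numbers; the difference is pure tag overhead, which grows linearly with the number of values.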

Here is a small example of how to write a few DataMelt objects:

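from jhplot  import *
from jhplot.io import *
p1 = P1D("test")         # a data array titled "test"
p1.add(1, 2)
p1.add(2, 3)
f1 = F1D("x*x")          # a function
hb = HBook("output.jdat","w") # HBook object opened for writing
hb.write("data",p1)      # we add data using "data" key
hb.write(10,f1)          # we can also add a function using numeric key (integer)
hb.close()               # write and close the file.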

The JDAT output file starts like this:

<created-by>JDAT file. Work.ORG: @S.Chekanov</created-by>
<created-on>Sat Jan 26 20:43:42 CST 2013</created-on>


All XML tags of JDAT are self-explanatory. Now we will read both objects using their keys:

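from jhplot  import *
from jhplot.io import *
hb = HBook("output.jdat")  # open the HBook file for reading
print hb.getKeys()         # print all the keys
p1 = hb.get("data")        # retrieve the data array by its key
print p1.toString()
f1 = hb.get("10")          # numeric keys are listed as strings
print f1.getName()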

The expected output is:

array(java.lang.String, [u'10', u'data'])
# title=test  dimension=2
#  X       Y 
1.0  2.0
2.0  3.0


You can browse the data as explained in Section Input and Output. Similarly, you can save histograms, e.g. jhplot.H1D and jhplot.H2D, and other objects.

Here is a more detailed example. We write data arrays, a histogram and a function to 3 types of files with the extensions .jdat (HBook), .jser (HFile) and .jpbu (PFile, protocol buffer). Then we automatically open the DataBrowser for each file and plot the objects with mouse clicks.

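from jhplot.io import *
from jhplot  import *
import time

start = time.clock()
f = PFile("output.jpbu","w")    # protocol-buffer file opened for writing
for i in range(10):
  p0 = P0D('Random='+str(i))    # an array with a unique title
  f.write(p0)
f.close()
print 'PFile  time (s)=',time.clock()-start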

The data can be organized in "directories" inside HBook XML files and viewed in the browser as trees. To make a directory inside an HBook file, simply use "/" in the keys. The example below creates two folders, "folder" and "folder2", and puts objects into them; the data browser shows them as a tree:

# hb is an HBook opened with "w" as above; h1, f1, p, pn are existing objects
hb.write("folder/histo",h1)     # put histogram h1 into the folder "folder"
hb.write("folder/func",f1)      # also a function
hb.write("folder2/1d array",p)  # put data "p" into another folder
hb.write("folder2/2d array",pn)
hb.close()
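The browser's tree view can be understood as grouping keys by their "/"-separated parts. The helper keys_to_tree below is a hypothetical sketch in plain Python of that grouping, using the keys from the example above; it is not the DataMelt browser's implementation.

```python
def keys_to_tree(keys):
    """Group '/'-separated keys into nested dicts: folders map to sub-dicts,
    and each leaf name maps to the full key it came from."""
    tree = {}
    for key in keys:
        parts = key.split("/")
        node = tree
        for folder in parts[:-1]:
            node = node.setdefault(folder, {})  # create the folder on first use
        node[parts[-1]] = key                   # leaf entry
    return tree

keys = ["folder/histo", "folder/func", "folder2/1d array", "folder2/2d array"]
tree = keys_to_tree(keys)
print(sorted(tree))            # ['folder', 'folder2']
print(sorted(tree["folder"]))  # ['func', 'histo']
```

Keys without "/" end up at the top level of the tree.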

Parse HBook files in CPython

HBook files with the extension ".jdat" can be read in CPython (the Python implementation written in C) using the "xml.dom" Python module. In this case, the output can be passed on to platform-specific code (for example, CPython can be interfaced with C/C++ libraries). Here is a small example of how to parse jdat files in CPython:

from xml.dom import minidom

def get_text(parent, tag):
    """Return the stripped text of the first <tag> element under parent."""
    node = parent.getElementsByTagName(tag)[0]
    return "".join(n.data for n in node.childNodes
                   if n.nodeType in (n.TEXT_NODE, n.CDATA_SECTION_NODE)).strip()

xmldoc = minidom.parse('data.jdat')
itemlist = xmldoc.getElementsByTagName('p1d')
print(len(itemlist))                 # number of P1D objects in the file
alldata = {}                         # map: object ID -> list of rows
for item in itemlist:
    sid = get_text(item, "id")
    title = get_text(item, "title")
    size = get_text(item, "size")
    print("Read: id=", sid, "title=", title, " size=", size)
    rows = []
    for line in get_text(item, "data").splitlines():
        line = line.strip()
        if not line:
            continue
        rows.append([float(x) for x in line.split()])
    alldata[sid] = rows
print(alldata["eta1"])               # data of the object whose ID is "eta1"

In this example, the ".jdat" file includes several P1D objects. We read all of them and create a map where the ID of each object is the key and the value is the object's data array.

The CFBook class can visualize histograms created by Fortran or C++ code. For this, use a light stand-alone library, also called CFBook, which can be linked to C++ or Fortran code. You can compile it using gcc (for C++ programs) or gfortran (for Fortran programs). This library creates XML files with 1D and 2D histograms that can be read by Jas4pp. Here is an example of reading a 1D histogram from the fortran.xml file:

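from jhplot  import *
from jhplot.io import *
hb = CFBook("fortran.xml")  # open the XML file written by the CFBook library
print hb.listAll()
print hb.getKeysH1D()       # list keys
h1 = hb.getH1D(1)           # use the key 1 to retrieve H1D
c1 = HPlot("Canvas")        # create a canvas for plotting
c1.setGTitle("Histograms from a file")
c1.visible()
c1.draw(h1)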

PFile class

This class is an attempt to build a multi-platform (and multi-language) I/O format based on Google's protocol buffer package. With it, files can be written by a C++ program (or a program in another language) and read using Java, or written in Java and read by C/C++ programs.
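Protocol buffers keep output small partly by encoding integers as base-128 "varints": each byte carries 7 bits of the value, and its high bit signals that more bytes follow, so small numbers take a single byte. The sketch below illustrates this wire encoding in plain Python; it is a standalone illustration, not code taken from PFile or CBook.

```python
def encode_varint(n):
    """Encode a non-negative int as a protobuf base-128 varint."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    out = bytearray()
    while True:
        byte = n & 0x7F              # low 7 bits of the remaining value
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set continuation bit
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)

def decode_varint(data):
    """Decode a varint from the start of data; return (value, bytes_used)."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("truncated varint")

print(encode_varint(300).hex())  # ac02
```

A value such as 1 costs one byte, whereas a fixed-width 32-bit integer always costs four.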

The DataMelt Java class which implements Google's protocol buffer format is called PFile. The class is designed to store mainly DataMelt containers (arrays) described in Data Structures, and data projected to 1, 2 and 3 dimensions (histograms) described in the Histograms section.

A C/C++ package used to write data arrays, histograms and structured data (ntuples) in C++, to be read by the Java class, is called CBook.

Once data are written in C/C++ with the help of the CBook package, one can read them with the PFile class or even open them in an interactive browser.

Here is a more complicated example: we write a 2D histogram and a 1D array into a file, then read the objects back and plot them. The HFile class can be used as well (just replace PFile by HFile). But with PFile, you can also write histograms and arrays from C++ code and read the data back using Java/Jython!

Unlike the HFile class, only pre-defined data containers can be stored in PFile files (all jhplot arrays, functions, histograms). PFile class is optimized for write/read speed and small output. In addition, one can read/write such files in C++.

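from jhplot.io import *
from jhplot import *
from java.util import Random

c1 = HPlot3D("Canvas",600,400,2,1)
c1.visible()

h1 = H2D("input2D",5,-2, 2.0, 5, -3,3)
r = Random(33)
for i in range(100):
  h1.fill(r.nextGaussian(), r.nextGaussian())
p0 = P0D("input1D")            # a small 1D array
p0.add(1.0)

f = PFile("output.jpbu","w")   # write both objects
f.write(h1)
f.write(p0)
f.close()

f = PFile("output.jpbu","r")   # open the file again for reading
print f.size()        # how many objects are stored
print f.listEntries() # list all entries, the size of objects and the compression level
c1.draw( f.read("input2D") )   # retrieve the histogram by its key (its title)
f.close()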
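Retrieving an object by key, rather than reading entries sequentially, requires an index from keys to positions in the file. The sketch below illustrates the idea with a hypothetical length-prefixed record layout built on Python's struct and io modules; it is not PFile's actual file format.

```python
import io
import struct

def write_records(stream, records):
    """Write (key, payload) pairs as length-prefixed records; return {key: offset}."""
    index = {}
    for key, payload in records:
        index[key] = stream.tell()                 # where this record starts
        k = key.encode("utf-8")
        stream.write(struct.pack("<II", len(k), len(payload)))  # two lengths
        stream.write(k)
        stream.write(payload)
    return index

def read_at(stream, offset):
    """Read the record starting at offset without scanning earlier records."""
    stream.seek(offset)
    klen, plen = struct.unpack("<II", stream.read(8))
    key = stream.read(klen).decode("utf-8")
    return key, stream.read(plen)

buf = io.BytesIO()
index = write_records(buf, [("input1D", b"1.0"), ("input2D", b"histogram bytes")])
print(read_at(buf, index["input2D"]))  # ('input2D', b'histogram bytes')
```

Only the index lookup and one seek are needed to reach "input2D"; the earlier record is never read.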

In this example, instead of reading the objects sequentially one by one, we retrieve an object using its key "input2D" (the title of the histogram). Open the created file in a browser for easy plotting:

from jhplot.io import *
from jhplot  import *

You will see a pop-up window listing all data entries. Click on an entry and the object will be plotted on the canvas.

Using DataBrowser to open ".jpbu" files

All DataMelt objects stored in compressed Java-serialized files can be viewed using a browser. For example, if a serialized file contains P1D, P0D, H1D, etc. objects, one can view them and plot them using a mouse-click approach.

If you have a file with the extension ".jpbu", you can view it using the DataBrowser. Go to the toolbar and select [Plot]->[HPlot canvas]->[File]->[Open data file].

Working with C++ external programs

Here we discuss how to create files with DataMelt data containers (arrays, histograms, functions) using C++ code, and then how to read them using DataMelt's pure Java code.