DMelt:IO/Databases

From HandWiki
Member

Working with databases

DataMelt includes several database engines:


HDataBase class

jhplot.io.HDataBase jhplot.io.HDataBase allows to write and read data in a form of disk-based database using the keys. This is handy for non-sequential I/O. The class is based on the original code of D.Hamner (JavaWorld.com).

This raw-level database allows to store and retrieve objects. It associates a key of type String with each record. The keys will be limited to a maximum length, although the record data will not be limited. The record will consist of only one "blob" of binary data. The number of records is not fixed at creation time. The file can grow and shrink as records are inserted and removed. The database operations not depend on the number of records in the file. In other words, they'll be of constant order time with respect to file accesses. The index should be small enough to load into memory.

Here is a small example. We use two keys "0" and "1" (both are strings) and associate a string "Test" and 2D list with each key. Then we retrieve data from the file. In this approach, we use "persistent" database which keeps the data on the disk.

from jhplot.io import *

file="output.db"
f=HDataBase(file,"w")
f.insert("0","Test")                # store "Test" using the key "0"
f.insert("1",[ [1,2,3,3],[2,3,4]])  # store 2D array using the key "1"
f.close()
f=HDataBase(file,"r")               # open and retrieve data using the keys
print f.get("0"),f.get("1")

The output of this code is shown below:

Test [[1, 2, 3, 3], [2, 3, 4]]

We can also write complicated objects "classes" into the files as in this example:

from jhplot  import *
from jhplot.io import *
import os.path


# build an event from 3 objects
def makeEvent(entry):
    label="Event="+str(entry)
    p=P0D(label)
    p.randomUniform(10,0,1)
    h=H1D(label,10,-1,1)
    h.fill(i)
    return [label,p,h]

# write to a raw-level database.
# use string as keys
file="output.db"
f=HDataBase(file,"w")
Events=100
for i in range(Events):
       event=makeEvent(i)
       if (i%100 == 0):  
           print event[0]+" size=",os.path.getsize(file)
       f.insert(str(i),event)
f.close()

# open database 
f=HDataBase(file,"r")
# print "Remove event 25" 
# f.remove("25")    

# print "extract event 26"
event=f.get("26")
print "=== "+event[0]+"===="
print event[1].toString()
print event[2].toString()
f.close()

Here we sequentially write a complicated data structure "event" using several objects, and later retrieve it. This class allows only "string" type as the keys. We will show later how to use arbitrary types foe the keys.

File-based map

Objects can be stored in HashMap which is kept in the memory. One can work with a huge amount of data using org.clapper.util.misc.FileHashMap org.clapper.util.misc.FileHashMap which keeps the keys in memory, but stores the values as serialized objects in a random access disk file. This approach is very similar to the previous case, but the implementation is different.

In this example we create a file "/tmp/mytest" (in the temporary directory tmp) and put some data (1D,2D arrays and histogram):

from jhplot import *
from org.clapper.util.misc import FileHashMap

# make some massive java objects
p1=P0D(); p1.randomNormal(10000,0,2)
p2=P0D(); p2.randomNormal(10000,1,2)
pp=P1D("test",p1,p2);  h1=H1D("OK",100,0,10)
ppp=PND(); ppp.add([1,2,3,4,5]); ppp.add([1,2,3,4,3])

######### store all objects in the file 
print "Write  to the database"
map=FileHashMap("/tmp/mytest",FileHashMap.FORCE_OVERWRITE);
map.put("d1",p1)
map.put("d2",p2)
map.put("d3",ppp)
map.put("4",h1)
print map.size()
map.save()
map.close()
print "Closed   database"

Note that the database will have 2 files: mytest.ix (index file) and mytest.db (actual data) Now we will read this database and extract stored objects. Then we print them:

from jhplot import *
from org.clapper.util.misc import FileHashMap
print "Read   to the database"
map=FileHashMap("/tmp/mytest",FileHashMap.RECLAIM_FILE_GAPS);
p1=map.get("d1")
p2=map.get("d2")
ppp=map.get("d3")
hh=map.get("4")
print hh
print ppp
print map.size()
map.close()
print "Closed   database"

Object databases

A massive data can be stored a a map, where each key corresponds to some object (can be arbitrary Java objects). You can store collections backed up by disk storage and you can store and handle billions of data objects. Unlike the previous approach, we will consider the true database where you can rollback changes, encrypt the data, etc. As before, only serialize object can be stored. In this section we will show example of using this database. As before, no SQL is required and the key can have arbitrary types:

Below is an example in which we write many values using their keys, close the database, and read the values back from the database using their keys.

from org.mapdb import *
from java.io import File
db =DBMaker.newFileDB(File("/tmp/db")).journalDisable().make()
map = db.getTreeMap("map");
max=1000
for i in range(max):
          map.put(i,"test"+str(i))
print "closing.."
db.close()

db =DBMaker.newFileDB(File("/tmp/db")).journalDisable().make() # reading the data
step = max/1000
for i in range(max):
        x=map.get(i)
        print x
db.close()

Note that the database creates several files in the /tmp/ directory (db,db.p).

Here is an example were we write 100 histograms into the database using integer keys, and then we read all histograms back.

from java.io import *
from jhplot import *
from org.mapdb import *
print "Write  to the database"
db =DBMaker.newFileDB(File("/tmp/dbtest")).journalDisable().make()
# db.encryptionEnable("password")
map = db.getTreeMap("mydata");
for i in range(100):
          map.put(i,h1)    
db.close()
print "Closed   database"


An then me read histograms back:


from java.io import *
from jhplot import *
from org.mapdb import *;
h1=H1D("OK",100,0,10)
print "Write  to the database"
db =DBMaker.newFileDB(File("/tmp/dbtest")).journalDisable().make()
map = db.getTreeMap("mydata");
for i in range(100):
          h=get.put(i)    
db.close()

Neodatis Object databases

NeoDatis is another simple Java-based Object Database. Below we show a rather complete example on how to use the NeoDatis database to store objects of the jhplot package:

# This example is based on http://www.neodatis.org/  (NeoDatis database)

from java.awt import Color,Font
from jhplot  import * 
from java.util import Random,ArrayList
import os 

# import from NeoDatis DB
from  org.neodatis.odb import *;
from  org.neodatis.odb.impl.core.query.criteria import *;
from  org.neodatis.odb.core.query.criteria import *;
from  org.neodatis.odb.xml import *;


c1 = HPlot("Canvas",600,400)
c1.setGTitle("Writing and reading objects from a NeoDatis database");
c1.visible(1)
c1.setAutoRange()

p1=P1D("X-Y points")
p1.add(10,20)
p1.add(30,40)

h1 = H1D("GaussianHistogram",20, -2, 2.0)
h1.fillGauss(100,10,2)

lab=HLabel("Label", 0.15, 0.7, "NDC")
lab.setColor(Color.blue)

# remove database  if exists
if os.path.exists("database.db"):
    os.remove("database.db")

print "Store all objects in a database"
odb = ODBFactory.open("database.db")
odb.store(p1)
odb.store(h1)
odb.store(lab)
odb.close()



print "######### open the database ################"

odb = ODBFactory.open("database.db")
query = CriteriaQuery(P0D);
obj = odb.getObjects(query)
print  "Number of P0D objects=",obj.size()
query = CriteriaQuery(H1D);
obj = odb.getObjects(query)
print  "Number of H1D  objects=",obj.size()


# find an P0D object with the name data1
query =  CriteriaQuery(P0D, Where.equal("title", "data1"));
#p0d=odb.getObjects(query).getFirst()
#print p0d.toString()
   
# find an H1D object with the name GaussianHistogram
query =  CriteriaQuery(H1D, Where.equal("title", "GaussianHistogram"));
h1d=odb.getObjects(query).getFirst()
h1d.setTitle("GaussianHistogram from a database")
c1.draw( h1d)
            


odb.close()

SQL databases

DataMelt supports SQL-type of databases:

Below we will show Jython scripts how to work with such databases.

SQLite database

# This example shows how to read SQLite database
# created using the native C++ code. Look at SQLjet for more examples. 
# S.Chekanov 
from org.tmatesoft.sqljet.core.table import SqlJetDb
from org.tmatesoft.sqljet.core import SqlJetTransactionMode
from java.io import File

dbFile=File("data.sqlite")
dbFile.delete()
db=SqlJetDb.open(dbFile, 1) # write database in "read mode"
db.getOptions().setAutovacuum(1)

db.beginTransaction(SqlJetTransactionMode.WRITE)
db.getOptions().setUserVersion(1)
query = "CREATE TABLE  ajax (name, value);"
db.createTable(query) 
table = db.getTable("ajax")
table.insert("Kitaev", "Alexander")
table.insert("Test", "Alexander")
table.insert("Test2", "Alexander")
db.commit()
db.close()

# read this databse
db = SqlJetDb.open(dbFile, 0);
print db.getSchema()
db.beginTransaction(SqlJetTransactionMode.READ_ONLY)
table=db.getTable("ajax")  # read one table
# open table
cursor=table.open()
# print number of fields
print cursor.getFieldsCount()
# print number of rows
print cursor.getRowCount()
db.close()

Now we read it:

# This example shows how to read SQLite database
# created using the native C++ code. Look at SQLjet for more examples. 
# S.Chekanov 
from org.tmatesoft.sqljet.core.table import SqlJetDb
from org.tmatesoft.sqljet.core import SqlJetTransactionMode
from java.io import File

dbFile=File("data.sqlite")
dbFile.delete()
db=SqlJetDb.open(dbFile, 1) # write database in "read mode"
db.getOptions().setAutovacuum(1)

db.beginTransaction(SqlJetTransactionMode.WRITE)
db.getOptions().setUserVersion(1)
query = "CREATE TABLE  ajax (name, value);"
db.createTable(query) 
table = db.getTable("ajax")
table.insert("Kitaev", "Alexander")
table.insert("Test", "Alexander")
table.insert("Test2", "Alexander")
db.commit()
db.close()

# read this databse
db = SqlJetDb.open(dbFile, 0);
print db.getSchema()
db.beginTransaction(SqlJetTransactionMode.READ_ONLY)
table=db.getTable("ajax")  # read one table
# open table
cursor=table.open()
# print number of fields
print cursor.getFieldsCount()
# print number of rows
print cursor.getRowCount()
db.close()


Here is a browser:

file="data.sqlite"
br=SQLiteBrowser(file)

HyperSQL database

Derby database

Starting from v3.8, Derby is excluded from the package since it comes with JDK as JavaDB (see the JAVAHOME/db/ directory). Here is example of how to write and read data using Derby:

This script writes the data:

from openDB import *

scon=protocol+dbName+';create=true'
conn = DriverManager.getConnection(scon, props)
s = conn.createStatement()

try:
  s.execute('drop table '+table) 
except SQLException:
  print 'no need to remove table'

s.execute('create table '+table+'(num int, addr varchar(40))')
s.execute('insert into '+table+' values (1956,\'Webster St.\')')
s.execute('insert into '+table+' values (1910,\'Union St.\')')


s1='update '+table+' set num=?, addr=? where num=?'
ps = conn.prepareStatement(s1)
ps.setInt(1,180)
ps.setString(2, 'Grand Ave.')
ps.setInt(3, 1956)
ps.executeUpdate()
print 'Updated 1956 Webster to 180 Grand'


s.close()
conn.commit()

try: 
 DriverManager.getConnection('jdbc:derby:;shutdown=true')
except SQLException:
 print 'all done'

Now we read data:

from openDB import *

scon=protocol+dbName+';create=false'
conn = DriverManager.getConnection(scon,props)
s = conn.createStatement()

s1='SELECT num, addr FROM '+table+' ORDER BY num'
rs = s.executeQuery(s1)
while rs.next():
  print 'sorted='+rs.getString(1),rs.getString(2)
     
s2 = 'SELECT * FROM '+table+' WHERE addr=\'Union St.\''
rs = s.executeQuery(s2)
while rs.next():
  print 'Found=',rs.getString(1),rs.getString(2)

s.close()
conn.commit()

try:
 DriverManager.getConnection('jdbc:derby:;shutdown=true')
except SQLException:
 print 'all done'