DMelt:DataAnalysis/Using Weka
Using Weka for data analysis
DataMelt, as a computational environment, can call third-party libraries. Most open-source libraries are already bundled with DataMelt. Other jar libraries, which have more restrictive licenses, can be loaded dynamically, as discussed in DMelt:Programming/9_External_libraries.
In the case of the Weka data mining software, you can use its classes directly inside DataMelt and mix DataMelt Java classes with those from Weka.
You can also use Weka in GUI mode via the menu "Tools - Neural Networks: Weka". Note that Weka scans only the jar files inside the directories "user", "weka" and "math"; other DataMelt Java libraries are not visible to Weka.
Here is an example of how to use Weka to classify data with the J48 algorithm:
    from java.io import FileReader
    from weka.core import Instances
    from weka.classifiers.trees import J48
    from jhplot import Web

    # download the data file
    xf = "iris.arff"
    url = "https://datamelt.org/examples/data/weka/" + xf
    print "Loading ", xf
    print Web.get(url)

    # read the ARFF file and use the last attribute as the class
    ifile = FileReader(xf)
    data = Instances(ifile)
    data.setClassIndex(data.numAttributes() - 1)

    # build the J48 classifier and print it
    j = J48()
    j.buildClassifier(data)
    print(j)
Here we download the iris.arff data from the web and run the Weka algorithm weka.classifiers.trees.J48. The trained classifier is then printed to the screen.
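The call data.setClassIndex(data.numAttributes() - 1) in the example tells Weka that the last attribute of the ARFF file is the class to predict. The following minimal sketch illustrates that convention in plain Python, without Weka. It is only an illustration: the function read_simple_arff and the SAMPLE_ARFF text are hypothetical, the sample is a hand-written fragment in the style of iris.arff (not the downloaded file), and the reader handles only the simplest unquoted ARFF layout.

```python
# Illustration only: a tiny reader for the simplest ARFF files.
# It is NOT Weka's parser; quoting, sparse data and dates are not handled.

# Hand-written sample shaped like iris.arff (an assumption, not the real file).
SAMPLE_ARFF = """\
@relation iris-sample
@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}
@data
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def read_simple_arff(text):
    """Return (attribute_names, rows) for a simple, unquoted ARFF string."""
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append(line.split(","))
        elif lower.startswith("@attribute"):
            # "@attribute <name> <type>": keep only the name
            attributes.append(line.split(None, 2)[1])
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


attributes, rows = read_simple_arff(SAMPLE_ARFF)
class_index = len(attributes) - 1  # same rule as numAttributes() - 1
class_name = attributes[class_index]
labels = [row[class_index] for row in rows]

print("class attribute: %s (index %d)" % (class_name, class_index))
print("labels: %s" % ", ".join(labels))
```

If the class attribute of a data set is not the last column, the index passed to setClassIndex has to be changed accordingly.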
Here is another example, this time clustering data. We pass options to the weka.clusterers.EM algorithm using the method "setOptions"; the option "-N 2" sets the number of clusters to two:
    from java.io import FileReader
    from weka.core import Instances
    from weka.core import Utils
    from weka.clusterers import EM
    from jhplot import Web

    xf = "weather.arff"
    url = "https://datamelt.org/examples/data/weka/" + xf
    print "Loading ", xf
    print Web.get(url)
    data = Instances(FileReader(xf))

    cls = EM()                      # new instance of clusterer
    cls.setOptions(["-N", "2"])
    cls.buildClusterer(data)        # build the clusterer
    print cls

    # print cluster membership
    for i in range(data.numInstances()):
        cluster = cls.clusterInstance(data.instance(i))
        dist = cls.distributionForInstance(data.instance(i))
        print(str(i+1) + " - " + str(cluster) + " - " + Utils.arrayToString(dist))
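The loop in the example prints, for each instance, its number, the assigned cluster, and the cluster membership probabilities returned by distributionForInstance. The sketch below mimics that output in plain Python, without Weka, to show how the two quantities relate. It assumes that the assigned cluster is the one with the highest membership probability, which we believe matches Weka's default clusterInstance behavior but have not verified for every clusterer. The helper names and the probabilities are hypothetical and are not real EM output.

```python
# Illustration only: plain Python, no Weka required.
# The probabilities below are made up for the example, not real EM output.


def most_likely_cluster(dist):
    """Index of the largest membership probability (first one wins on ties)."""
    best = 0
    for k in range(1, len(dist)):
        if dist[k] > dist[best]:
            best = k
    return best


def format_row(i, dist):
    """Build 'number - cluster - p0,p1,...' in the style of the loop above."""
    cluster = most_likely_cluster(dist)
    probs = ",".join(str(p) for p in dist)
    return "%d - %d - %s" % (i + 1, cluster, probs)


# Hypothetical distributions for three instances and two clusters ("-N 2").
distributions = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
]

lines = [format_row(i, d) for i, d in enumerate(distributions)]
for line in lines:
    print(line)
```

With two clusters, each distribution has two entries that sum to one, so an instance with nearly equal probabilities sits close to the boundary between the clusters.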
More examples that use Weka can be found in the DataMelt examples.
Weka can be used inside Java code, or inside Jython, Groovy, BeanShell and JRuby. Look at this tutorial.