Perl Data Language

Short description: Array programming library for Perl
Perl Data Language (PDL)
Paradigm: Array
Developer: Karl Glazebrook, Jarle Brinchmann, Tuomas Lukka, and Christian Soeller
First appeared: 1996
OS: Cross-platform
License: GNU General Public License, Artistic License
Website: pdl.perl.org
Influenced by: APL, IDL, Perl

Perl Data Language (abbreviated PDL) is a set of free software array programming extensions to the Perl programming language. PDL extends the data structures built into Perl to include large multidimensional arrays, and adds functionality to manipulate those arrays as vector objects. It also provides tools for image processing, machine learning, computer modeling of physical systems, and graphical plotting and presentation. Simple operations are automatically vectorized across complete arrays, and higher-dimensional operations (such as matrix multiplication) are supported.

Language design

PDL is a vectorized array programming language: the expression syntax is a variation on standard mathematical vector notation, so that the user can combine and operate on large arrays with simple expressions. In this respect, PDL follows in the footsteps of the APL programming language, and it has been compared to commercial languages such as MATLAB and Interactive Data Language (IDL), and to free alternatives such as NumPy and Octave.[1] Unlike MATLAB and IDL, PDL allows great flexibility in indexing and vectorization: for example, if a subroutine normally operates on a 2-D matrix array, passing it a 3-D data cube will generally cause the same operation to be applied to each 2-D layer of the cube.[2]
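As a rough illustration of this behavior, the sketch below applies matrix multiplication, which is defined on 2-D operands, to a 3-D cube. The variable names and sample data are made up for the example; assuming PDL's usual broadcasting rules, the same product is computed for each 2-D layer.

```perl
use strict;
use warnings;
use PDL;

# Illustrative data: a cube of three 2x2 layers holding the values 0..11.
my $cube = sequence(2, 2, 3);

# A 2x2 matrix that swaps the two columns when multiplied from the right.
my $m = pdl [[0, 1], [1, 0]];

# 'x' is matrix multiplication; the extra third dimension of $cube is
# looped over automatically, so each layer is multiplied by $m in turn.
my $out = $cube x $m;

print join(',', $out->dims), "\n";   # dimensions of the result
print $out;                          # every layer has its columns swapped
```

The first layer of `$cube` is `[[0 1] [2 3]]`, so the first layer of the result is `[[1 0] [3 2]]`; no explicit loop over layers is written.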

PDL borrows at least three basic types of program structure from Perl: imperative, functional, and pipeline programming, and these forms may be combined. Subroutines may be loaded either via a built-in autoload mechanism or via the usual Perl module mechanism.

Graphics

A plot generated using PDL

True to the glue language roots of Perl, PDL borrows from several different modules for graphics and plotting support. NetPBM provides image file I/O (though FITS is supported natively). Gnuplot, PLplot, PGPLOT, and Prima modules are supported for 2-D graphics and plotting applications, and Gnuplot and OpenGL are supported for 3-D plotting and rendering.

I/O

PDL provides facilities to read and write many open data formats, including JPEG, PNG, GIF, PPM, MPEG, FITS, NetCDF, GRIB, raw binary files, and delimited ASCII tables. PDL programmers can use the CPAN Perl I/O libraries to read and write data in hundreds of standard and niche file formats.
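For delimited ASCII tables, a minimal round trip might look like the following sketch. It assumes the `wcols` and `rcols` routines from PDL::IO::Misc and uses a temporary file; the data and variable names are illustrative only.

```perl
use strict;
use warnings;
use PDL;
use PDL::IO::Misc;              # rcols / wcols for column-oriented ASCII
use File::Temp qw(tempfile);

my $x = sequence(5);            # 0 1 2 3 4
my $y = $x ** 2;                # 0 1 4 9 16

# Temporary file for the example; removed automatically at exit.
my ($fh, $file) = tempfile(SUFFIX => '.dat', UNLINK => 1);
close $fh;

wcols $x, $y, $file;                  # write the two arrays as two columns
my ($x2, $y2) = rcols $file, 0, 1;    # read columns 0 and 1 back in

print "$x2\n$y2\n";
```

Other formats mentioned above (FITS, images, and so on) are handled by their own reader and writer routines, which are not shown here.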

Machine learning

PDL can be used for machine learning. It includes modules for classic k-means clustering and for general and generalized linear modeling methods such as ANOVA, linear regression, PCA, and logistic regression. Examples of PDL usage for regression modelling tasks include evaluating the association between educational attainment and ancestry differences of parents,[3] comparing RNA-protein interaction profiles that need regression-based normalization,[4] and analyzing the spectra of galaxies.[5]
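To give a flavor of regression in PDL, the sketch below fits a straight line by ordinary least squares using the normal equations. It deliberately uses only core matrix operations rather than a dedicated statistics module, so it is an illustration of the idea and not the interface those modules expose; the data are synthetic.

```perl
use strict;
use warnings;
use PDL;
use PDL::MatrixOps;             # inv (matrix inverse)

my $x = sequence(5);            # predictor: 0 1 2 3 4
my $y = 2 * $x + 1;             # synthetic response on the line y = 1 + 2x

# Design matrix with one row [1, x_i] per observation (2 columns, 5 rows).
my $A = cat(ones($x->nelem), $x)->transpose;

# Response as a column vector (1 column, 5 rows).
my $Y = $y->dummy(0);

# Normal equations: beta = (A^T A)^-1 A^T y
my $At   = $A->transpose;
my $beta = inv($At x $A) x ($At x $Y);

# Intercept and slope, which should come out close to 1 and 2.
print $beta->flat, "\n";
```

Solving the normal equations by explicit inversion is fine for a small, well-conditioned example like this one; for real data a numerically more robust solver would normally be preferred.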

perldl

An installation of PDL usually comes with an interactive shell known as perldl, which can be used to perform simple calculations without requiring the user to create a Perl program file. A typical session of perldl would look something like the following:

perldl> $x = pdl [[1, 2], [3, 4]];

perldl> $y = pdl [[5, 6, 7], [8, 9, 0]];

perldl> $z = $x x $y;

perldl> p $z;

[
 [21 24  7]
 [47 54 21]
]

The commands used in the shell are ordinary Perl statements that can be used in any program that loads the PDL module. x is an overloaded operator for matrix multiplication, and p in the last command is a shortcut for print.
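The same calculation can be written as a standalone script. The following is a minimal sketch, with `print` in place of the shell's `p` shortcut:

```perl
use strict;
use warnings;
use PDL;

my $x = pdl [[1, 2], [3, 4]];           # 2x2 matrix
my $y = pdl [[5, 6, 7], [8, 9, 0]];     # 2 rows, 3 columns

my $z = $x x $y;                        # matrix product: 2 rows, 3 columns

print $z;
```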

Implementation

The core of PDL is written in C. Most of the functionality is written in PP, a PDL-specific metalanguage that handles the vectorization of simple C snippets and interfaces them with the Perl host language via Perl's XS compiler. Some modules are written in Fortran, with a C/PP interface layer. Many of the supplied functions are written in PDL itself. PP is also available to users who want to write C-language extensions to PDL. In addition, an Inline module (Inline::Pdlpp) allows PP function definitions to be inserted directly into a Perl script; the relevant code is compiled to native code and made available as a Perl subroutine.
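A small sketch of the Inline::Pdlpp route is shown below. It assumes that the Inline and Inline::Pdlpp modules and a working C compiler are installed; the function name `plus_one` is hypothetical and chosen only for the example.

```perl
use strict;
use warnings;
use PDL;

# The heredoc holds a PP definition. 'Pars' is the signature (one scalar
# input a, one scalar output b) and 'Code' is the C snippet that PP wraps
# in the element-by-element loop. It is compiled when the script first runs.
use Inline Pdlpp => <<'EOPP';
pp_def('plus_one',
    Pars => 'a(); [o] b()',
    Code => '$b() = $a() + 1;',
);
EOPP

my $x = sequence(3);          # [0 1 2]
my $y = $x->plus_one;         # compiled function, called as a PDL method

print $y, "\n";
```

Because the signature declares scalar parameters, the same definition applies to arrays of any dimensionality: PP generates the surrounding loops.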

The PDL API uses the basic Perl 5 object-oriented functionality: PDL defines a new type of Perl scalar object (eponymously called a "PDL", or "ndarray") that acts as a Perl scalar, but that contains a conventional typed array of numeric or character values. All of the standard Perl operators are overloaded so that they can be used on PDL objects transparently, and PDLs can be mixed-and-matched with normal Perl scalars. Several hundred object methods for operating on PDLs are supplied by the core modules.
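The overloading described above can be seen in a short sketch that mixes a PDL object with an ordinary Perl scalar (the variable names are illustrative):

```perl
use strict;
use warnings;
use PDL;

my $v     = pdl(1, 2, 3);       # a PDL object held in a Perl scalar
my $scale = 10;                 # a plain Perl number

# Standard Perl operators are overloaded and act element by element.
my $w    = $v * $scale + 1;     # [11 21 31]
my $mask = $w > 15;             # [0 1 1]

# Object methods operate on the whole array.
my $total = $w->sum;            # 63

print ref($v), "\n";            # class name of the object
print "$w $mask $total\n";
```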
