Autovivification

From HandWiki

In the Perl programming language, autovivification is the automatic creation of new arrays and hashes as required every time an undefined value is dereferenced. Perl autovivification allows a programmer to refer to a structured variable, and arbitrary sub-elements of that structured variable, without expressly declaring the existence of the variable and its complete structure beforehand.[1] In contrast, other programming languages either: 1) require a programmer to expressly declare an entire variable structure before using or referring to any part of it; or 2) require a programmer to declare a part of a variable structure before referring to any part of it; or 3) create an assignment to a part of a variable before referring, assigning to or composing an expression that refers to any part of it.

Perl autovivification can be contrasted against languages such as Python, PHP, Ruby, and many of the C style languages, where dereferencing null or undefined values is not generally permitted.[lower-alpha 1] It can be compared to the HTML standard's "named access on the window object"[2] which results in corresponding globally scoped variables being automatically accessible to browser-based JavaScript.

Hashes

It is important to remember that autovivification happens when an undefined value is dereferenced. An assignment is not necessary. The debugger session below illustrates autovivification of a hash just from examining it:

DB<1> x \%h
0  HASH(0x2f1a248)
     empty hash
  DB<2> x $h{1}{2}{3}{4}
0  undef
  DB<3> x \%h
0  HASH(0x2f1a248)
   1 => HASH(0x2f1a260)
      2 => HASH(0x29a3c68)
         3 => HASH(0x2dc3038)
              empty hash
  DB<4>

The debugger session below illustrates autovivification of a hash from assigning to an inner hash:

DB<1> $h{A}{B}{C}{D}=1
  DB<2> x \%h
   0  HASH(0x83c71ac)
   'A' => HASH(0x837d50c)
      'B' => HASH(0x83c71e8)
         'C' => HASH(0x83c7218)
            'D' => 1
  DB<3>

Hashes several layers deep were created automatically without any declarations. Autovivification can prevent excessive typing. If Perl did not support autovivification, the structure above would have to be created as follows:

DB<1> %h = (A => {B => {C => {D => 1}}})
  DB<2> x \%h
  0  HASH(0x83caba4)
   'A' => HASH(0x83cfc28)
      'B' => HASH(0x83cab74)
         'C' => HASH(0x83b6110)
            'D' => 1
  DB<3>

File and directory handles

Perl 5.6.1 and newer support autovivification of file and directory handles.[3] Calling open() on an undefined variable will set it to a filehandle. According to perl561delta, "[t]his largely eliminates the need for typeglobs when opening filehandles that must be passed around, as in the following example:

for my $file ( qw(this.conf that.conf) ) {
    my $fin = open_or_throw('<', $file);
    process_conf($fin);
    # no close() needed
}
use Carp;
sub open_or_throw {
    my ($mode, $filename) = @_;
    open my $h, $mode, $filename
        or croak "Could not open '$filename': $!";
    return $h;
}

Emulation in other programming languages

C++

The C++ Standard Library's associative containers (std::unordered_map and std::map) use operator[] to get the value associated to a key. If there is nothing associated to this key, it will construct it and value initialize [4] the value. For simple types like int or float, the value initialization will be zero initialization.

std::unordered_map<std::string, std::vector<int>> a;
a["the answer"].push_back(42); // Autovivifies the a["the answer"] vector and then appends to it.

Another example of counting occurrences of strings:

std::unordered_map<std::string, int> counts;
while (auto& s = GetNextString()) {
  counts[s]++; // creates counts[s] if it doesn't exist and set to zero, then increment.
}

A similar trick can be achieved with the insert() method, which returns an iterator to the element associated to the key, even if it already exists.

Python

Python's built-in dict class can be subclassed to implement autovivificious dictionaries simply by overriding the __missing__() method that was added to the class in Python v2.5.[5] There are other ways of implementing the behavior,[6][7] but the following is one of the simplest and instances of the class print just like normal Python dictionary objects.

>>> class Tree(dict):
...     def __missing__(self, key):
...         value = self[key] = type(self)()
...         return value

>>> # Common names by class, order, genus, and type-species
>>> common_names = Tree()
>>> common_names['Mammalia']['Primates']['Homo']['H. sapiens'] = 'human being'
>>> common_names
{'Mammalia': {'Primates': {'Homo': {'H. sapiens': 'human being'}}}}

>>> # Famous quotes by play, act, scene, and page
>>> quotes = Tree()
>>> quotes['Hamlet'][1][3][3] = 'This above all: to thine own self be true.'
>>> quotes
{'Hamlet': {1: {3: {3: 'This above all: to thine own self be true.'}}}}

Ruby

Ruby hashes can take a block specifying an object to be returned for non-existing indexes. These can be used to implement autovivificious maps.

irb(main):001:0> tree = proc { Hash.new { |hash, key| hash[key] = tree.call } }
=> #<Proc:0x007fda528749a0@(irb):1>
irb(main):002:0> lupin = tree.call
=> {}
irb(main):003:0> lupin["express"][3] = "stand and deliver"
=> "stand and deliver"
irb(main):004:0> lupin
=> {"express"=>{3=>"stand and deliver"}}

Java

Java Map has a method computeIfAbsent[8] that can be used to emulate autovivificous maps.

public static <K,V> Function<K, V> defaultDict(Map<K, V> map, Supplier<? extends V> supplier) {
    return key -> map.computeIfAbsent(key, k -> supplier.get());
}

public static void main(String[] args) {
    Function<String, List<String>> dict = defaultDict(new HashMap<>(), ArrayList::new);
    dict.apply("foo").add("bar");
}

PHP

PHP arrays are natively autovivificious.

$arr = array();
$arr["express"][3] = "stand and deliver";

However, this only applies to assignment, and not array access.

JavaScript

ES6 introduces a new Proxy class that can be used to implement autovivification. With other features of JavaScript, this can be reduced to a single line of code:

var tree = () => new Proxy({}, { get: (target, name) => name in target ? target[name] : target[name] = tree() });

// Test:
var t = tree();
t.first.second.third = 'text';
console.log(t.first.second.third); // or t['first']['second']['third']

C#

C#, using indexers and C# 4.0 dynamics,

class Tree
{
    private IDictionary<string, object> _dict = new Dictionary<string, object>();

    public dynamic this[string key]
    {
        get { return _dict.ContainsKey(key) ? _dict[key] : _dict[key] = new Tree(); }
        set { _dict[key] = value; }
    }
}

// Test:
var t = new Tree();
t["first"]["second"]["third"] = "text";
Console.WriteLine(t["first"]["second"]["third"]);

DynamicObject can be used for implementing different syntaxes also,

using System;
using System.Collections.Generic;
using System.Dynamic;

class Tree : DynamicObject
{
    private IDictionary<object, object> dict = new Dictionary<object, object>();

    // for t.first.second.third syntax
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        var key = binder.Name;

        if (dict.ContainsKey(key))
            result = dict[key];
        else
            dict[key] = result = new Tree();

        return true;
    }
    
    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        dict[binder.Name] = value;
        return true;
    }

    // for t["first"]["second"]["third"] syntax
    public override bool TryGetIndex(GetIndexBinder binder, object[] indexes, out object result)
    {
        var key = indexes[0];

        if (dict.ContainsKey(key))
            result = dict[key];
        else
            dict[key] = result = new Tree();

        return true;
    }

    public override bool TrySetIndex(SetIndexBinder binder, object[] indexes, object value)
    {
        dict[indexes[0]] = value;
        return true;
    }
}

// Test:
dynamic t = new Tree();
t.first.second.third = "text";
Console.WriteLine(t.first.second.third);

// or,
dynamic t = new Tree();
t["first"]["second"]["third"] = "text";
Console.WriteLine(t["first"]["second"]["third"]);

See also

Notes

  1. For example, Python raises an TypeError if None.__getitem__ is called. Dereferencing a null pointer in C results in undefined behavior; many C implementations choose to raise a segmentation fault.

References

External links