Dope vector

From HandWiki

In computer programming, a dope vector is a data structure used to hold information about a data object,[1] especially its memory layout.

Purpose

Dope vectors are most commonly used to describe arrays, which typically store multiple instances of a particular datatype as a contiguous block of memory. For example, an array containing 100 elements, each of which occupies 32 bytes, requires 100 × 32 = 3200 bytes. By itself, such a memory block has no place to keep track of how large the array (or other object) is overall, how large each element within it is, or how many elements it contains. A dope vector is a place to store such information. Dope vectors can also describe structures which may contain arrays or variable elements.

If such an array is stored contiguously, with the first byte at memory location M, then its last byte is at location M + 3199. A major advantage of this arrangement is that locating item N (counting from zero) is easy: it begins at location M + (N × 32). Of course, the value 32 must be known (this value is commonly called the "stride" of the array or the "width" of the array's elements). Navigating an array data structure using an index is called dead reckoning.
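The address arithmetic described above can be sketched in C as follows. The function names element_offset and element_address are hypothetical, chosen only for this illustration, and the sketch assumes zero-based indexing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of item n (zero-based) in a contiguous
   array whose elements are each `stride` bytes wide. */
size_t element_offset(size_t n, size_t stride)
{
    return n * stride;
}

/* Hypothetical helper: address of item n, given the base address M of the
   array. No bounds check is possible here, because neither the base pointer
   nor the stride says how many elements exist. */
unsigned char *element_address(unsigned char *base, size_t n, size_t stride)
{
    assert(base != NULL);
    return base + element_offset(n, stride);
}
```

With a stride of 32, item 99 begins at offset 3168 and its last byte sits at offset 3199, matching the figures in the text.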

Without a dope vector, however, this arrangement means that having the location of item N is not enough to discover the index N itself, the stride, or whether there are elements at N − 1 or N + 1. For example, a function or method may iterate over all the items in an array and pass each one to another function or method, which does not know the item is part of an array at all, much less where or how large the array is.

Without a dope vector, even knowing the address of the entire array does not tell you how big it is. This is important because writing to element N + 1 of an array that contains only N elements will likely destroy some other data. Because many programming languages treat character strings as a kind of array, this leads directly to the infamous buffer overflow problem.

A dope vector reduces these problems by storing a small amount of metadata along with an array (or other object). With dope vectors, a compiler can easily (and optionally) insert code that prevents accidentally writing beyond the end of an array or other object. Alternatively, the programmer can access the dope vector when desired, for safety or other purposes.
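As a minimal sketch of the kind of check a compiler or programmer could perform, the following C fragment refuses any store whose index lies outside the array. The struct dope layout and the checked_store function are assumptions made for this example, not the format used by any particular compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal descriptor for a one-dimensional array. */
struct dope {
    unsigned char *data;   /* where the elements begin */
    size_t         count;  /* number of elements */
    size_t         stride; /* bytes per element */
};

/* Copies one element's worth of bytes from `value` into slot `index`.
   Returns false, and writes nothing, if the index is out of range. */
bool checked_store(const struct dope *dv, size_t index, const void *value)
{
    assert(dv != NULL && value != NULL);
    if (index >= dv->count)
        return false;
    memcpy(dv->data + index * dv->stride, value, dv->stride);
    return true;
}
```

The bounds test costs one comparison per access, which is why such checks are often made optional.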

Description

The exact set of metadata included in a dope vector varies from one language and/or operating system to another, but a dope vector for an array might contain:

  • a pointer to the location in memory where the array elements begin. This is normally identical to the location of the zeroth element of the array (the element with all subscripts 0), which might not be the first actual element if subscripts do not start at zero.
  • the type of each array element (integer, Boolean, a particular class, etc.).
  • the rank of an array.
  • the extent of an array (its range of indices). (In many languages the starting index for arrays is fixed at zero, or one, but the ending index is set when the array is (re-)allocated.)
  • for arrays where the extent in use at a given time may change, the maximum and current extents may both be stored.
  • the stride of an array, or the amount of memory occupied by each element of the array.
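The fields listed above could be gathered into a C structure along the following lines. This is a sketch only: the names dope_vector, dv_dim, dv_type, DV_MAX_RANK and dv_element are hypothetical, and the sketch stores the address of the first actual element together with a per-dimension lower bound, rather than the address of a zeroth element.

```c
#include <assert.h>
#include <stddef.h>

#define DV_MAX_RANK 4  /* arbitrary limit chosen for this sketch */

enum dv_type { DV_INT, DV_DOUBLE };  /* hypothetical element type tags */

struct dv_dim {
    long   lower;   /* first valid subscript in this dimension */
    size_t extent;  /* number of valid subscripts */
    size_t stride;  /* bytes between consecutive subscripts */
};

struct dope_vector {
    unsigned char *base;       /* address of the first stored element */
    enum dv_type   type;       /* type of each element */
    size_t         elem_size;  /* bytes occupied by one element */
    int            rank;       /* number of dimensions in use */
    struct dv_dim  dim[DV_MAX_RANK];
};

/* Returns the address of the element with the given subscripts (one per
   dimension), or NULL if any subscript is outside its extent. */
void *dv_element(const struct dope_vector *dv, const long *subs)
{
    assert(dv != NULL && subs != NULL);
    assert(dv->rank >= 1 && dv->rank <= DV_MAX_RANK);
    size_t offset = 0;
    for (int d = 0; d < dv->rank; d++) {
        long rel = subs[d] - dv->dim[d].lower;
        if (rel < 0 || (size_t)rel >= dv->dim[d].extent)
            return NULL;
        offset += (size_t)rel * dv->dim[d].stride;
    }
    return dv->base + offset;
}
```

For a 3 × 4 array of int stored row by row with subscripts starting at 1, the first dimension would have a stride of 4 × sizeof(int) and the second a stride of sizeof(int).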

A program can then refer to the array (or other dope-vector-using object) by referring to the dope vector. This is commonly automatic in high-level languages. Getting to an element of the array costs slightly more (commonly one extra instruction, which fetches the pointer to the actual data from the dope vector). On the other hand, many other common operations become easier and/or faster:

  • Without a dope vector, the number of elements in the array cannot be determined from the memory block alone. Thus it is common to add an extra element to the end of an array, with a "reserved" value (such as NULL). The length can then be determined by scanning forward through the array, counting elements until this "end-marker" is reached. Of course, this makes length-checking much slower than looking up the length directly in a dope vector.
  • Without knowing the extent of an array, it is not possible to free() (deallocate) that memory when it is no longer needed. Thus, without dope vectors, something must store that length somewhere else. For example, asking a particular OS to allocate space for a 3200-byte array might cause it to allocate 3204 bytes at some location M; it would then store the size in the first 4 bytes, and tell the requesting program that the allocated space starts at M + 4 (so that the caller will not treat the extra 4 bytes as part of the array proper). This extra data is not considered a dope vector, but achieves some of the same goals.
  • Without dope vectors, extra information must also be kept about the stride (or width) of array elements. In C, this information is handled by the compiler, which must keep track of a datatype distinction between "pointer to an array of 20-byte-wide elements", and "pointer to an array of 1000-byte-wide elements". This means that a pointer to an element in either kind of array can be incremented or decremented in order to reach the next or previous element; but it also means that array widths must be fixed at an earlier stage.
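The first two points above can be illustrated with a short C sketch. All names here (sentinel_length, str_array, stored_length, sized_alloc, sized_size, sized_free) are hypothetical. The hidden header uses a size_t rather than the 4 bytes of the example above, and a production allocator would also pad the header to preserve the strictest alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sentinel approach: walk the array until the reserved NULL end-marker.
   The cost grows with the number of elements. */
size_t sentinel_length(const char *const *items)
{
    assert(items != NULL);
    size_t n = 0;
    while (items[n] != NULL)
        n++;
    return n;
}

/* Descriptor approach: the count is stored next to the data pointer,
   so finding the length is a single lookup. */
struct str_array {
    const char *const *items;
    size_t             count;
};

size_t stored_length(const struct str_array *a)
{
    assert(a != NULL);
    return a->count;
}

/* Allocator-style hidden header: reserve extra room in front of the block,
   record the requested size there, and hand back the address just past it. */
void *sized_alloc(size_t nbytes)
{
    if (nbytes > SIZE_MAX - sizeof(size_t))
        return NULL;                      /* request would overflow */
    size_t *raw = malloc(sizeof(size_t) + nbytes);
    if (raw == NULL)
        return NULL;
    raw[0] = nbytes;
    return raw + 1;
}

/* Reads the size back from the hidden header. */
size_t sized_size(const void *p)
{
    assert(p != NULL);
    return ((const size_t *)p)[-1];
}

/* Releases the whole block, including the hidden header. */
void sized_free(void *p)
{
    if (p != NULL)
        free((size_t *)p - 1);
}
```

The caller of sized_alloc never sees the header, which is what distinguishes this bookkeeping from a dope vector that the program itself can consult.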

Even with a dope vector, having (only) a pointer to a particular member of an array does not enable finding the position in the array, or the location of the array or the dope vector itself. If that is desired, such information can be added to each element within the array. Such per-element information can be useful, but is not part of the dope vector.

Dope vectors can be a general facility, shared across multiple datatypes (not just arrays and/or strings).[2]

References

  1. Pratt, T.; Zelkowitz, M. (1996). Programming Languages: Design and Implementation (3rd ed.). Upper Saddle River, NJ: Prentice-Hall. p. 114. ISBN 978-0-13-678012-0. 
  2. Claybrook, Billy G. (October 13–15, 1976). "The design of a template structure for a generalized data structure definition facility". ICSE '76: 2nd international conference on Software engineering. San Francisco, California, USA: IEEE Computer Society Press. pp. 408–413. http://dl.acm.org/citation.cfm?id=807713&CFID=799276124&CFTOKEN=88822581.