Comparison of Object Pascal and C

From HandWiki

The computer programming languages C and Pascal have similar times of origin, influences, and purposes. Both were used to design (and compile) their own compilers early in their lifetimes.

Both C and Pascal are old programming languages: the original Pascal definition appeared in 1969 and a first compiler in 1970, while the first version of C appeared in 1972. C has changed comparatively little since then, whereas Pascal has evolved considerably, and today the vast majority of Pascal programming is done in modern Object Pascal rather than in the original procedural Pascal. The old procedural Pascal is now largely confined to microcontroller programming with tools such as mikroPascal, while Object Pascal is the main dialect and is used with tools such as Delphi, Lazarus and Free Pascal.

What is documented here is the modern Object Pascal used in Free Pascal and Delphi. The C documented is C99, as standardized in 1999.

Syntax

Syntactically, Object Pascal is much more Algol-like than C. Pascal retains English keywords where C uses punctuation symbols; for example, Pascal has and, or, and mod where C uses &&, ||, and %. However, C is actually more Algol-like than Pascal regarding (simple) declarations, retaining the type-name variable-name syntax. Also like Algol, C accepts declarations at the start of any block, not just the outer block of a function.

Semicolon use

Another, more subtle, difference is the role of the semicolon. In Pascal semicolons separate individual statements within a compound statement whereas they terminate the statement in C. They are also syntactically part of the statement itself in C (transforming an expression into a statement). This difference manifests itself primarily in two situations:

  • there can never be a semicolon directly before else in Pascal whereas it is mandatory in C (unless a block statement is used)
  • the last statement before an end is not required to be followed by a semicolon in Pascal

A superfluous semicolon can be put on the last line before end, thereby formally inserting an empty statement.
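In C the semicolon is part of the statement, so one appears in front of else whenever the if branch is a simple statement; only a block statement's closing brace may stand directly before else. A minimal sketch (the function name sign_of is purely illustrative); the Pascal equivalent would read if x < 0 then r := -1 else ..., with no semicolon before else:

```c
/* Returns -1, 0 or 1 depending on the sign of x. */
int sign_of(int x)
{
    int r;
    if (x < 0)
        r = -1;      /* this ';' ends the statement and must precede else */
    else if (x == 0) {
        r = 0;
    }                /* block statement: no ';' between '}' and else */
    else
        r = 1;
    return r;
}
```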

Comments

In traditional C, there are only /* block comments */. Since C99, there are also // line comments. In Object Pascal, there are { block comments }, (* block comments *), and // line comments.

Identifiers and keywords

C and Pascal differ in their interpretation of upper and lower case. C is case sensitive while Pascal is not, so MyLabel and mylabel are distinct names in C but identical in Pascal. In both languages, identifiers consist of letters and digits, with the rule that the first character may not be a digit. In C, the underscore counts as a letter, so even _abc is a valid name. Names with a leading underscore are often used to mark special system identifiers in C. Object Pascal also accepts the underscore as part of identifiers, just as C does.

Both C and Pascal use keywords (words reserved for use by the language itself). Examples are if, while, const, for and goto, which are keywords that happen to be common to both languages. In C, the basic built-in type names are also keywords (e.g. int, char) or combinations of keywords (e.g. unsigned char), while in Pascal the built-in type names are predefined normal identifiers.

Recent Object Pascal compilers, however, allow keywords to be escaped with &. This feature is mainly needed when communicating directly with foreign object systems such as COM and Cocoa, whose fields and methods may be named after Pascal keywords. C has no way to escape keywords.

Definitions, declarations, and blocks

In Pascal, procedure definitions start with keywords procedure or function and type definitions with type. In C, function definitions are determined by syntactical context while type definitions use the keyword typedef. Both languages use a mix of keywords and punctuation for definitions of complex types; for instance, arrays are defined by the keyword array in Pascal and by punctuation in C, while enumerations are defined by the keyword enum in C but by punctuation in Pascal.

In Pascal functions, begin and end delimit a block of statements (proper), while C functions use "{" and "}" to delimit a block of statements optionally preceded by declarations. C before C99 strictly required all declarations in a block to precede its statements, but allowed blocks to be nested within blocks, which offered a way around this restriction; C99 lifted the restriction and allows declarations and statements to be mixed. Pascal is strict that declarations must occur before statements, but allows definitions of types and functions (not only variable declarations) to be nested inside function definitions to any depth.
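The C99 rules can be seen in a small sketch (sum_of_squares is a hypothetical example function). Note that, unlike Pascal, standard C has no nested function definitions; only declarations can appear in inner blocks:

```c
/* Sums the squares 1*1 + 2*2 + ... + n*n (0 for n <= 0). */
int sum_of_squares(int n)
{
    int total = 0;                      /* declaration first, as C89 requires */
    if (n < 0)
        n = 0;                          /* a statement ... */
    int limit = n;                      /* ... then a declaration: valid only since C99 */
    for (int i = 1; i <= limit; i++) {  /* C99: declaration in the for header */
        int sq = i * i;                 /* declaration at the start of a nested block */
        total += sq;
    }
    return total;
}
```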

Implementation

The grammars of both languages are of a similar size. From an implementation perspective, the main difference between the two languages is that a C parser needs access to a symbol table to know which identifiers name types, whereas Pascal marks declarations explicitly with keywords such as var and type, so no such lookup is needed to tell a declaration from a statement. For instance, the C fragment X * Y; could be a declaration of Y as an object whose type is pointer to X, or an expression statement that multiplies X and Y. The corresponding Pascal fragment var Y:^X; is unambiguous without a symbol table.
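The ambiguity can be made concrete. In the sketch below (X and Y are placeholder names), the same token sequence X * Y; is a declaration in one function and an expression statement in another, and the compiler can only tell which by looking X up in its symbol table. A compiler may warn that the second statement has no effect:

```c
typedef int X;          /* at file scope, X names a type */

int declares_pointer(void)
{
    X * Y;              /* X is a type here, so this declares Y as "pointer to int" */
    int target = 42;
    Y = &target;
    return *Y;
}

int multiplies(void)
{
    int X = 6, Y = 7;   /* in this block X is a variable that hides the typedef */
    X * Y;              /* expression statement: the product is computed and discarded */
    return X * Y;
}
```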

Simple types

Integers

Pascal requires all variable and function declarations to specify their type explicitly. In traditional C, a type name may be omitted in most contexts, and the default type int (which corresponds to integer in Pascal) is then implicitly assumed. Such defaults are considered bad practice and are often flagged by compiler warnings; C99 removed this implicit int rule from the language.

C accommodates different sizes and signed and unsigned modes for integers by using modifiers such as long, short, signed and unsigned. The exact size of each resulting integer type is implementation-dependent; what is guaranteed is that long int is no shorter than int, and int is no shorter than short int. The C standard also specifies minimum ranges: char occupies a single byte of at least 8 bits, short int and int are at least 16 bits wide, and long int is at least 32 bits wide.
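These guarantees can be checked with the limit macros from <limits.h>. The following is a hedged sketch with an illustrative function name; it compares the range macros rather than sizeof values, because ranges are what the standard actually promises:

```c
#include <limits.h>

/* Returns nonzero if the minimum guarantees hold on this implementation. */
int integer_guarantees_hold(void)
{
    return sizeof(char) == 1            /* char is exactly one byte ... */
        && CHAR_BIT >= 8                /* ... of at least 8 bits */
        && SHRT_MAX <= INT_MAX          /* short no wider than int */
        && INT_MAX <= LONG_MAX          /* int no wider than long */
        && SHRT_MAX >= 32767            /* short and int: at least 16 bits */
        && INT_MAX >= 32767
        && LONG_MAX >= 2147483647L;     /* long: at least 32 bits */
}
```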

Subranges

In Pascal, a similar end is achieved by declaring a subrange of integer (a compiler may then choose to allocate a smaller amount of storage for the declared variable):

type a = 1..100;
     b = -20..20;
     c = 0..100000;

This subrange feature is not supported by C.
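Since C lacks subranges, a range such as 1..100 has to be enforced by hand. The helper below (assign_in_range is a hypothetical name, not a library function) shows one way to do it; for storage, the closest C equivalent is choosing a narrower type by hand, for example signed char for -20..20:

```c
#include <stdbool.h>

/* Stores v into *dst only if lo <= v <= hi; reports whether it did. */
bool assign_in_range(int *dst, int v, int lo, int hi)
{
    if (v < lo || v > hi)
        return false;   /* Pascal would raise a range-check error here when checks are on */
    *dst = v;
    return true;
}
```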

A major, if subtle, difference between C and Pascal is how they promote integer operations. In Pascal, all operations on integers or integer subranges have the same effect, as if all of the operands were promoted to a full integer. In C, there are defined rules as to how to promote different types of integers, typically with the resultant type of an operation between two integers having a precision that is greater than or equal to the precisions of the operands. This can make machine code generated from C efficient on many processors. A highly optimizing Pascal compiler can reduce, but not eliminate, this effect under standard Pascal rules.
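C's integer promotions can be illustrated with a short sketch. It assumes int is wider than unsigned char, which holds on all mainstream platforms; the function names are illustrative only:

```c
/* Operands narrower than int are promoted to int before arithmetic,
   so this sum is computed in int and does not wrap at 255. */
int promoted_sum(void)
{
    unsigned char a = 200, b = 100;
    return a + b;             /* both promoted to int: 300 */
}

/* Storing the result back into an unsigned char reduces it modulo 256. */
unsigned char narrowed_sum(void)
{
    unsigned char a = 200, b = 100;
    unsigned char c = a + b;  /* 300 converted to unsigned char: 44 */
    return c;
}
```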

The (only) pre-Standard implementation of C as well as Small-C et al. allowed integer and pointer types to be relatively freely intermixed.

Character types

In C the character type is char, which is an integer type no wider than short int. Expressions such as 'x'+1 are therefore perfectly legal, as are declarations such as int i='i'; and char c=74;.

This integer nature of char (an eight-bit byte on most machines) is clearly illustrated by declarations such as

unsigned char uc = 255;  /* common limit */
signed char sc = -128;   /* common negative limit */

Whether the char type should be regarded as signed or unsigned by default is up to the implementation.

In Pascal, characters and integers are distinct types. The built-in functions ord() and chr() convert a single character to the corresponding integer value in the character set in use, and vice versa. For example, on systems using the ASCII character set, ord('1') = 49 and chr(9) is a TAB character.
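In C no such functions are needed, because char is already an integer type; ord() and chr() correspond to plain conversions. A sketch with hypothetical helper names, assuming an ASCII-compatible character set (C itself only guarantees that the digits '0' to '9' are consecutive):

```c
/* Character arithmetic is ordinary integer arithmetic. */
int next_char_code(char c)
{
    return c + 1;                      /* 'x' + 1 == 'y' in ASCII */
}

/* Counterparts of Pascal's ord() and chr(). The cast through
   unsigned char avoids negative codes where char is signed. */
int  ord_of(char c)    { return (unsigned char) c; }
char chr_of(int code)  { return (char) code; }
```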

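In addition to the Char type, Object Pascal also has WideChar to represent Unicode characters. The closest C counterpart is wchar_t, a typedef for an integer type declared in <stddef.h> (among other headers); its width is implementation-defined, typically 16 bits on Windows and 32 bits on most Unix-like systems.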

Boolean types

In Pascal, boolean is an enumerated type. The possible values of boolean are false and true, with ordinal value of false=0 and true=1, other values are undefined. For conversion to integer, ord is used:

i := ord(b);

There is no standard function for integer to boolean, however, the conversion is simple in practice:

b := boolean(i); // Will raise proper rangecheck errors for undefined values with range checks on.

C has binary valued relational operators (<, >, ==, !=, <=, >=) which may be regarded as boolean in the sense that they always give results which are either zero or one. As all tests (&&, ||, ?:, if, while, etc.) are performed by zero-checks, false is represented by zero, while true is represented by any other value.
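The C truth rules can be sketched as follows. The double negation !! is a common idiom for normalizing any value to 0 or 1; the C99 type bool from <stdbool.h> (a macro for _Bool) performs the same normalization on conversion. The function names are illustrative:

```c
#include <stdbool.h>

/* Any nonzero value counts as true; !! maps it to exactly 1. */
int normalize_truth(int v)
{
    return !!v;        /* 0 stays 0, any other value becomes 1 */
}

/* Converting a nonzero value to bool (_Bool) also yields 1. */
bool to_bool(int v)
{
    return v;          /* implicit conversion to _Bool */
}
```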

To interface with COM, Object Pascal has added the ByteBool, WordBool and LongBool types, whose sizes match their prefixes (1, 2 and 4 bytes) and which follow the C truth table.

Free Pascal has added proper Pascal boolean types with a size suffix (Boolean8, Boolean16, Boolean32, Boolean64) to interface with GLib, which uses gboolean, a 32-bit boolean type with the Pascal truth table.

Bitwise operations

The C programmer may sometimes use bitwise operators to perform boolean operations. Care needs to be taken because the semantics are different when operands make use of more than one bit to represent a value.
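A minimal sketch of the pitfall: 2 and 1 are both true in C, but they share no set bit, so their bitwise AND is false while their logical AND is true. The helper names are illustrative:

```c
int logical_and(int a, int b) { return a && b; }  /* 1 if both are nonzero */
int bitwise_and(int a, int b) { return a & b; }   /* bit-by-bit AND */
```

Normalizing the operands first (!!a & !!b) makes the bitwise form agree with &&.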

Pascal has another, more abstract, high-level method of dealing with bitwise data: sets. Sets allow the programmer to set, clear, intersect, and unite bitwise data values, rather than using direct bitwise operators. Example:

Pascal:

Status := Status + [StickyFlag]; // or Include(Status,StickyFlag);
Status := Status - [StickyFlag]; // or Exclude(Status,StickyFlag);
if (StickyFlag in Status) then ...

C:

Status |= StickyFlag;
Status &= ~StickyFlag;
if (Status & StickyFlag) { ...

Although bit operations on integers and operations on sets can be considered similar if the sets are implemented using bits, there is no direct parallel between their uses unless a non-standard conversion between integers and sets is possible.

Pascal can also perform bitwise operations exactly the same way as C through the and, or, not and xor operators. These operators normally work on booleans, but when the operands are integers, they behave as bitwise operators. This is possible because boolean and integer are distinct, incompatible types, so the operand types determine the meaning unambiguously. Therefore, the C code above could be written in Pascal as:

Status := Status or StickyFlag;
Status := Status and not StickyFlag;
if Status and StickyFlag <> 0 then ...
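Conversely, a Pascal set of a small enumerated type can be emulated in C by using each enumerator's ordinal value as a bit position. This is an illustration under stated assumptions: the enumerators HiddenFlag and ReadOnlyFlag, the FlagSet type, the SET_OF macro and the helper functions are all hypothetical, and StickyFlag here is an ordinal rather than the bit mask used in the C fragment earlier in this section:

```c
/* Hypothetical C emulation of a Pascal "set of Flag". */
typedef enum { StickyFlag, HiddenFlag, ReadOnlyFlag } Flag;
typedef unsigned FlagSet;                 /* one bit per enumerator */

#define SET_OF(f) (1u << (f))             /* like the singleton set [f] */

FlagSet include_flag(FlagSet s, Flag f)        { return s | SET_OF(f); }        /* Include(s, f) */
FlagSet exclude_flag(FlagSet s, Flag f)        { return s & ~SET_OF(f); }       /* Exclude(s, f) */
int     flag_in(FlagSet s, Flag f)             { return (s & SET_OF(f)) != 0; } /* f in s */
FlagSet set_union(FlagSet a, FlagSet b)        { return a | b; }                /* a + b */
FlagSet set_intersection(FlagSet a, FlagSet b) { return a & b; }                /* a * b */
```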

Advanced types

String type

In C, a string is still a pointer to the first element of a null-terminated array of char, as it was in 1972. Manipulating strings still requires library functions, such as those declared in <string.h>.
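For contrast with Object Pascal's s := a + b, a minimal C sketch of string concatenation (concat is a hypothetical helper, not a standard function) has to measure, allocate and copy, and leaves freeing the result to the caller:

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding a followed by b,
   or NULL if memory is exhausted. The caller must free() it. */
char *concat(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    char *r = malloc(la + lb + 1);      /* +1 for the terminating '\0' */
    if (r == NULL)
        return NULL;
    memcpy(r, a, la);
    memcpy(r + la, b, lb + 1);          /* copies b including its '\0' */
    return r;
}
```

Comparison likewise goes through a library call, strcmp(a, b) < 0, rather than the < operator.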

Object Pascal has many string types because, whenever a new type was introduced, the old one was kept for backward compatibility. This happened twice: with Delphi 2 (introduction of AnsiString) and with Delphi 2009 (UnicodeString). Besides the main string types (ShortString, AnsiString, WideString, UnicodeString) and the corresponding character types (AnsiChar, WideChar = UnicodeChar), all types derived from the character type have some string properties too (pointer to char, array of char, dynamic array of char, pointer to array of char, etc.).

In Object Pascal, string is a compiler-managed type and is reference-counted where applicable; i.e., its storage management is handled by the compiler (or more accurately, by the runtime code inserted by the compiler into the executable). String concatenation is done with the + operator, and strings can be compared case-sensitively with the standard relational operators: < <= = <> >= >.

Object Pascal also provides C-compatible strings under the type PAnsiChar, with manipulation routines defined in the Strings unit. Moreover, Object Pascal provides a wide variety of string types:

  • ShortString, which internally is an array [0 .. N] of Char, with N as the maximum number of characters that can be stored and the 0th element holding the string length. At most 255 characters can be stored in a ShortString, because the length is held in that single byte, whose maximum value is 255. N is given at either type definition or variable declaration (see example below)
  • AnsiString, a dynamically allocated, reference-counted version of ShortString with (practically) unlimited length. Since Delphi 2009, it also records the code page (encoding) of its contents.
  • WideString, which on Windows (Win32/Win64/WinCE) is compatible with the COM BSTR type: UTF-16 (originally UCS-2) encoded, and managed through the COM allocator rather than reference-counted by the Pascal runtime. On systems other than Windows, it is equal to UnicodeString.
  • UnicodeString, UTF-16 encoded like WideString, but reference-counted by the Pascal runtime

For convenience, the plain String type is provided, which, depending on a compiler switch, could mean ShortString, AnsiString or UnicodeString. An additional convention is that if a maximum length is given (as in String[80]), the type is always a ShortString; otherwise it is whichever type the compiler switch selects.

ShortString and AnsiString values can be freely intermixed when manipulating strings; the compiler converts silently when required. Note that if the target is a ShortString, silent truncation may happen because of its maximum length.

Example:

type
  TString80 = String[80];
var
  ss  : ShortString;
  s80 : String[80]; // declare a (short-)string of maximum length 80
  s80t: TString80; // same as above
  astr: AnsiString;
  s   : String; // could mean String[255], AnsiString or UnicodeString
begin
  ss := astr + s80; // YES, this is possible and conversion is done transparently by the compiler
end;
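To make the ShortString layout concrete, here is a hypothetical C model of it (ShortStr, shortstr_assign and shortstr_length are invented names for illustration; this is not how any Pascal compiler exposes the type to C). The max parameter plays the role of N in String[N], and assignment truncates silently, as described above:

```c
#include <string.h>

/* Byte 0 holds the length; bytes 1..255 hold the characters (no '\0'). */
typedef unsigned char ShortStr[256];

/* Assigns a C string, silently truncating to max (at most 255) characters. */
void shortstr_assign(ShortStr dst, const char *src, size_t max)
{
    size_t n = strlen(src);
    if (max > 255)
        max = 255;                  /* the length byte cannot exceed 255 */
    if (n > max)
        n = max;                    /* silent truncation */
    dst[0] = (unsigned char) n;     /* length prefix */
    memcpy(dst + 1, src, n);
}

size_t shortstr_length(const ShortStr s) { return s[0]; }
```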

Array type

Static array

In C, arrays are a thin construct: an array declaration reserves storage for multiple variables of the same type, but the array does not know its own length, and in most expressions it is converted to a pointer to its first element. Indexing is defined through that pointer (a[i] means *(a + i)), which is why C arrays are always 0-based. Example:

// declare int "array" named a of length 10
int a[10];
// print the first element, or more precisely the element at address a + 0
printf("%d",a[0]);
// print the second element, or more precisely the element at address a + 1
printf("%d",a[1]);
// pass array to a function, or more precisely pass the pointer to the first element
somefunction(a);
// same as above
somefunction(&a[0]);

To get the array length, one has to calculate sizeof(<array_variable>) / sizeof(<base_type>). Therefore, to count the length of an integer array, use: sizeof(intarr) / sizeof(int). It is a common mistake to calculate this in a function expecting an array as an argument. Despite its appearance, a function can only accept a pointer as an argument, not the real array. Therefore, inside the function, the array is treated as plain pointer. Example:

// This function does NOT accept array, but a pointer to int
// Semantically, it's the same as: int *a
void func(int a[]) {
  // WRONG! Would return sizeof(pointer) / sizeof(int)
  int len = sizeof(a) / sizeof(int);
}

int main() {
  int a[5];
  // correct, would return 5
  int len = sizeof(a) / sizeof(int);
  func(a);
  return 0;
}

A common solution to the problem above is to always pass the array length as a function argument, and functions that expect an array argument should also provide a placeholder for its length.
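A minimal sketch of that convention: the ARRAY_LEN macro is a widespread idiom rather than a standard macro, and sum_ints is a hypothetical example function. The macro is applied where the real array is in scope, and the result is passed along explicitly:

```c
#include <stddef.h>

/* Element count of a true array; wrong if applied to a pointer. */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Receives only a pointer, so the caller supplies the length. */
long sum_ints(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```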

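Despite its treatment as a pointer, an array is not a pointer variable and cannot be assigned to: in main below, a = malloc(...) would be a compile-time error. Inside func, however, the parameter is an ordinary local pointer, so the assignment compiles, but it only changes that local copy. The caller's array is untouched and the newly allocated memory is leaked:

#include <stdlib.h>

void func(int *a) {
  // Compiles, but only rebinds the local parameter a:
  // main's array is unaffected and this block is never freed (memory leak)
  a = (int*) malloc(sizeof(int) * 10);
}

int main() {
  int a[5];
  func(a);
  // a = (int*) malloc(sizeof(int) * 10); // compile-time error: arrays are not assignable
  return 0;
}

Care should be taken when designing such code: a function that needs to hand new storage back to its caller must return the pointer or take a pointer to a pointer (int **), and its documentation should state this explicitly.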

Assignment between static arrays isn't allowed and one must use the memcpy function and its variants to copy data between arrays.
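A sketch of both options, with hypothetical names: copying with memcpy (or memmove when the storage may overlap), and the common workaround of wrapping the array in a struct, since structs, unlike arrays, can be assigned:

```c
#include <string.h>

#define ARR5_LEN 5

/* The count is explicit because, inside the function, dst and src are
   pointers. memcpy requires that the two regions do not overlap. */
void copy_array5(int dst[ARR5_LEN], const int src[ARR5_LEN])
{
    memcpy(dst, src, ARR5_LEN * sizeof src[0]);
}

/* Struct assignment copies the whole struct, including the embedded array. */
typedef struct { int v[ARR5_LEN]; } IntArray5;
```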

In Pascal, an array is declared using the array keyword, specifying its index type and its element (base) type. The index type is usually a subrange giving the lower and upper bounds. For example:

type
  T10IntegerArray = array [1 .. 10] of Integer;
  TNegativeLowerBoundArray = array [-5 .. 5] of Integer;
  TNamedIndexTypeArray = array [Low(Char) .. High(Char)] of Integer;
var
  IntegerArray: T10IntegerArray;
  NegArray: TNegativeLowerBoundArray;
  NamedIndexTypeArray: TNamedIndexTypeArray;

Arrays know their upper and lower bounds (and implicitly their length), and the bounds are passed along when a function expects an array as argument. The functions Low(), High() and Length() retrieve the lower bound, upper bound and array length, respectively, in any context.

Without an explicit cast, an array cannot be converted to a pointer; attempting it is a compile-time error. This is a property of type-safe programming.

Assignment between static arrays is allowed and copies all elements from the source array to the destination. The two arrays must have compatible types (in particular, the same bounds). If they differ, Move can be used to copy part of the data. However, since Move is a low-level function, it must be used with care: it is the programmer's responsibility to ensure that the data movement exceeds neither the destination nor the source boundary. Example:

type
  TArray1 = array [1 .. 10] of Integer;
  TArray2 = array [1 .. 5] of Integer;
var
  a,b: TArray1;
  c: TArray2;
begin
  a := b; // OK
  // Copy all elements from c to a, overwriting a[1] .. a[5]
  Move(c,a,Length(c) * SizeOf(Integer));
  // Copy all elements from c to a, starting at index 5 of a
  Move(c,a[5],Length(c) * SizeOf(Integer));
  // Copy first 5 elements from b to c
  Move(b,c,5 * SizeOf(Integer));
end.

Dynamic array

C has no language support for declaring and using dynamic arrays. However, because indexing works through any pointer, a dynamic array can be implemented with memory management functions, usually those from <stdlib.h>. Example:

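#include <stdio.h>
#include <stdlib.h>

int main(void) {
  size_t size = 10;
  int *a = malloc(sizeof(int) * size); // allocate dynamic array of integer with size 10
  if (a == NULL)
    return EXIT_FAILURE;

  for (size_t i = 0; i < size; i++)
    a[i] = (int) i; // do something with a[i]

  size *= 2;
  int *temp = realloc(a, sizeof(int) * size); // double the space, retaining the existing elements
  if (temp == NULL) {
    fprintf(stderr, "Not enough memory!\n");
    free(a); // the original block is still valid when realloc fails
    return EXIT_FAILURE;
  }
  a = temp;
  // ... do something with a
  free(a); // free the storage
  return EXIT_SUCCESS;
}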

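As can be seen, the length is again not maintained automatically. The result of realloc is stored in a temporary variable because realloc returns NULL on failure while leaving the original block allocated; assigning the result directly to a would lose the only pointer to that block. Assigning one such pointer to another copies only the pointer, so both then refer to the same elements.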

Object Pascal provides language-level support for dynamic arrays. A dynamic array is declared with the lower and upper bounds omitted, and SetLength() must then be called to allocate its storage. Dynamic arrays in Object Pascal are reference counted, so one doesn't have to worry about freeing the storage. Dynamic arrays are always zero-based. The three functions Low(), High() and Length() still retrieve the lower bound, upper bound and array length correctly. Example:

type
  TIntArray = array of Integer;
  T2DimIntArray = array of array of Integer;
var
  a  : TIntArray;
  a2 : T2DimIntArray;
  i,j: Integer;
begin
  SetLength(a,10); // allocate 10 elements
  for i := Low(a) to High(a) do
    a[i] := i; // do something with a[i]
  SetLength(a2,10,10); // allocate 10 x 10 elements
  for i := Low(a2) to High(a2) do
    for j := Low(a2[i]) to High(a2[i]) do
      a2[i,j] := i * j; // do something with a2[i,j]
end;
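For comparison, C code wanting Length() and SetLength()-like behaviour typically bundles the pointer with its length. The following is a sketch with hypothetical names (IntDynArray, set_length); unlike an Object Pascal dynamic array, it is not reference-counted, so the caller must release it, here with set_length(&arr, 0):

```c
#include <stdlib.h>

typedef struct {
    int   *data;
    size_t length;      /* travels with the data, like Length() */
} IntDynArray;

/* Resizes the array, zero-filling any new elements (a choice of this
   sketch). Returns 0 on success, -1 if memory is exhausted, in which
   case the array is left unchanged. */
int set_length(IntDynArray *a, size_t n)
{
    if (n == 0) {
        free(a->data);
        a->data = NULL;
        a->length = 0;
        return 0;
    }
    int *p = realloc(a->data, n * sizeof *p);   /* realloc(NULL, ...) acts like malloc */
    if (p == NULL)
        return -1;
    for (size_t i = a->length; i < n; i++)
        p[i] = 0;
    a->data = p;
    a->length = n;
    return 0;
}
```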

Assignment between dynamic arrays copies the reference of the source array to the destination. If a real copy is required, one can use the Copy function. Example:

type
  TIntegerArray = array of Integer;
var
  a,b: TIntegerArray;
begin
  SetLength(b,10); // give b ten elements (fill them as needed)
  a := b; // a now points to the same array pointed by b
  a[1] := 0; // b[1] should be 0 as well after this
  a := Copy(b,3,5); // Copy 5 elements from b starting from index 3
                    // a would access it from 0 to 4 however
end.
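In C, assigning one pointer to another (int *alias = b;) corresponds to the reference copy above, while a real copy must be made by hand. A rough C analogue of Copy(b,3,5), sketched with a hypothetical copy_slice function; out-of-range arguments are simply rejected with NULL here, without attempting to mirror how Copy itself handles them:

```c
#include <stdlib.h>
#include <string.h>

/* Returns a new array holding count elements of src starting at index
   start, so the copy is indexed from 0. The caller must free() it. */
int *copy_slice(const int *src, size_t src_len, size_t start, size_t count)
{
    if (start > src_len || count > src_len - start)
        return NULL;                                   /* out of range */
    int *r = malloc((count ? count : 1) * sizeof *r);  /* avoid malloc(0) */
    if (r == NULL)
        return NULL;
    memcpy(r, src + start, count * sizeof *r);
    return r;
}
```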

Further reading

  • Free Pascal: Language Reference