Why does the indexing of Array start with Zero in C?

Martin Richards, creator of the BCPL language (a precursor of C), designed its arrays to start at 0 because that is the natural position from which to begin accessing the array contents: the value of a pointer p, used as an address, accesses position p + 0 in memory.

The name of an array is essentially a pointer, a reference to a memory location, so the expression array[n] refers to a memory location n elements away from the starting element; the index is used as an offset. The first element of the array sits exactly at the memory location the array name refers to (0 elements away), so it is denoted array[0]. Most programming languages have been designed this way, so indexing from 0 is pretty much inherent to them.
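A small C sketch (the array name nums and its values are just for illustration) makes the offset interpretation concrete: the array name and the address of its first element are the same, so index 0 is simply the element at zero offset.

#include <stdio.h>

int main(void)
{
    int nums[3] = {10, 20, 30};

    /* The array name decays to a pointer to its first element,
       so these two addresses are identical. */
    printf("%p %p\n", (void *)nums, (void *)&nums[0]);

    /* Index 0 is the element at zero offset from that address. */
    printf("%d\n", nums[0]);   /* prints 10 */
    return 0;
}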

So a 0-based index allows array[index] to be implemented as *(array + index). If indexing were 1-based, the compiler would need to generate *(array + index - 1), and this "- 1" would hurt performance. (Rather than subtracting 1 at run time, a compiler could use the address array - 1 as the base address, which eliminates the run-time subtraction.) When you're writing a compiler, those extra instructions matter a lot: the compiler will be used to generate thousands of programs, each of which may be run thousands of times, and that one extra instruction may occur on several lines inside an n-squared loop. It can add up to billions of wasted cycles.
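As a rough sketch (the array a and its values are hypothetical), the 0-based access and its pointer form are interchangeable, while a simulated 1-based scheme pays for the extra subtraction on every access, as the paragraph above describes.

#include <assert.h>

int main(void)
{
    int a[4] = {1, 2, 3, 4};

    for (int i = 0; i < 4; i++) {
        /* 0-based: the index is the offset, no adjustment needed. */
        assert(a[i] == *(a + i));
    }

    for (int j = 1; j <= 4; j++) {
        /* Simulated 1-based indexing: every access needs the "- 1". */
        assert(*(a + j - 1) == j);
    }
    return 0;
}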

However, E. W. Dijkstra later wrote a pertinent note on why numbering should start at zero ("Why numbering should start at zero", EWD831, 1982). To denote the subsequence of natural numbers 1, 2, ..., 10 without the pernicious three dots, four conventions are open to us:

a. 1 ≤ i < 11
b. 0 < i ≤ 10
c. 1 ≤ i ≤ 10
d. 0 < i < 11

Dijkstra argues that the proper notation should be able to denote the following two cases naturally:
1. The subsequence includes the smallest natural number, 0
2. The subsequence is empty

He rules out b (0 < i ≤ 10) and d (0 < i < 11): to include the smallest natural number, 0, their lower bound would have to be written as -1 < i, which uses a number lying outside the natural numbers (Dijkstra calls this ugly). So we are left with a and c.

He then rules out c (1 ≤ i ≤ 10): when a subsequence starting at 0 shrinks to the empty one, c takes the form 0 ≤ i ≤ -1, which is again ugly, so for the upper bound we should prefer <.

Hence we are left with a (1 ≤ i < 11). As a bonus, subtracting the bounds in a gives the length of the sequence. And if the indices of an N-element sequence start at 0, convention a describes them as the tidy half-open range 0 ≤ i < N. This is by far the most widely used notation in programming now.
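A small illustration of these conveniences (the function name sum_range is made up for this sketch): with half-open ranges the length is simply end - begin, and adjacent ranges share a bound without overlapping or leaving a gap.

#include <stdio.h>

/* Sum of the half-open range [begin, end) of an int array. */
static int sum_range(const int *a, int begin, int end)
{
    int sum = 0;
    for (int i = begin; i < end; i++)
        sum += a[i];
    return sum;
}

int main(void)
{
    int a[6] = {1, 2, 3, 4, 5, 6};

    /* The length of [0, 6) is 6 - 0; the adjacent ranges [0, 3) and
       [3, 6) cover the array exactly once, with no overlap and no gap. */
    printf("%d == %d + %d\n",
           sum_range(a, 0, 6),
           sum_range(a, 0, 3),
           sum_range(a, 3, 6));
    return 0;
}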

So, each time you write something like

for(i=0; i<n; i++)
{
   sum += array[i];
}

You are not just following the rules of the language's notation. You are also promoting mathematical beauty!

