Saturday, January 24, 2009

pieces of it......

ARRAYS

An array is a container object that holds a fixed number of values of a single type. The length of an array is established when the array is created. After creation, its length is fixed. You've seen an example of arrays already, in the main method of the "Hello World!" application. This section discusses arrays in greater detail.

It is a data structure that contains a number of variables called the elements of the array. The array elements are accessed through computed indexes. C# arrays are zero indexed; that is, the array indexes start at zero. All of the array elements must be of the same type, which is called the element type of the array. Array elements can be of any type, including an array type. An array can be a single-dimensional array, or a multidimensional array.

Illustration of an array as 10 boxes numbered 0 through 9; an index of 0 indicates the first element in the array

An array of ten elements

Each item in an array is called an element, and each element is accessed by its numerical index. As shown in the above illustration, numbering begins with 0. The 9th element, for example, would therefore be accessed at index 8.



Indexed Versus Associative Arrays

There are two kinds of arrays in PHP: indexed and associative. The keys of an indexed array are integers, beginning at 0. Indexed arrays are used when you identify things by their position. Associative arrays have strings as keys and behave more like two-column tables. The first column is the key, which is used to access the value.

PHP internally stores all arrays as associative arrays, so the only difference between associative and indexed arrays is what the keys happen to be. Some array features are provided mainly for use with indexed arrays, because they assume that you have or want keys that are consecutive integers beginning at 0. In both cases, the keys are unique--that is, you can't have two elements with the same key, regardless of whether the key is a string or an integer.

PHP arrays have an internal order to their elements that is independent of the keys and values, and there are functions that you can use to traverse the arrays based on this internal order. The order is normally that in which values were inserted into the array, but the sorting functions described later let you change the order to one based on keys, values, or anything else you choose.

Array

An array is a data structure consisting of a group of elements that are accessed by indexing. In most programming languages each element has the same data type and the array occupies a contiguous area of storage.

Array is sometimes really called an "associative array". Some programming languages support array programming which generalises operations and functions to work transparently over arrays as they do with scalars, instead of requiring looping over array members.

Multi-dimensional arrays are accessed using more than one index: one for each dimension. Multidimensional indexing reduced to a lesser number of dimensions, for example, a two-dimensional array with consisting of 6 and 5 elements respectively could be represented using a one-dimensional array of 30 elements.
Arrays can be classified as fixed-sized arrays (sometimes known as static arrays) whose size cannot change once their storage has been allocated, or dynamic arrays, which can be resized.


Properties of arrays

There are properties of arrays such as permitting constant time (O(1)) random access to individual elements, which is optimal, but moving elements requires time proportional to the number of elements moved. On actual hardware, the presence of e.g. caches can make sequential iteration over an array noticeably faster than random access — a consequence of arrays having good locality of reference because their elements occupy contiguous memory locations — but this does not change the asymptotic complexity of access. Likewise, there are often facilities (such as memcpy) which can be used to move contiguous blocks of array elements faster than one can do through individual element access, but that does not change the asymptotic complexity either.

Properties in Comparison of arrays

Dynamic arrays have similar characteristics to arrays, but can grow. The price for this is a memory overhead, due to elements being allocated but not used. With a constant per-element bound on the memory overhead, dynamic arrays can grow in constant amortized time per element.

Associative arrays provide a mechanism for array-like functionality without huge storage overheads when the index values are sparse. Specialized associative arrays with integer keys include Patricia tries and Judy arrays.

Balanced trees require O(log n) time for index access, but also permit inserting or deleting elements in Θ(log n) time. Arrays require O(n) time for insertion and deletion of elements.

Uses of arrays

There are different kinds of usage of arrays, such as implementing mathematical vectors and matrices, as well as other kinds of rectangular tables. In early programming languages, these were often the applications that motivated having arrays.
Because of their performance characteristics, arrays are used to implement other data structures, such as heaps, hash tables, deques, queues, stacks, strings, and VLists.
One or more large arrays are sometimes used to emulate in-program dynamic memory allocation, particularly memory pool allocation. Historically, this has sometimes been the only way to allocate "dynamic memory" portably.

Array accesses with statically predictable access patterns are a major source of data parallelism.
Some algorithms store a variable number of elements in part of a fixed-size array, which is equivalent to using dynamic array with a fixed capacity; the so-called Pascal strings are examples of this.

Multi-dimensional arrays

Ordinary arrays are indexed by a single integer. Also useful, particularly in numerical and graphics applications, is the concept of a multi-dimensional array, in which we index into the array using an ordered list of integers, such as in a[3,1,5]. The number of integers in the list used to index into the multi-dimensional array is always the same and is referred to as the array's dimensionality, and the bounds on each of these are called the array's dimensions. An array with dimensionality k is often called k-dimensional. One-dimensional arrays correspond to the simple arrays discussed thus far; two-dimensional arrays are a particularly common representation for matrices. In practice, the dimensionality of an array rarely exceeds three. Mapping a one-dimensional array into memory is obvious, since memory is logically itself a (very large) one-dimensional array. When we reach higher-dimensional arrays, however, the problem is no longer obvious. Suppose we want to represent this simple two-dimensional array:

It is most common to index this array using the RC-convention, where elements are referred in row, column fashion or , such as:

Common ways to index into multi-dimensional arrays include:

Row-major order. Used most notably by statically-declared arrays in C. The elements of each row are stored in order.
1
2
3
4
5
6
7
8
9

Column-major order. Used most notably in Fortran. The elements of each column are stored in order.
1
4
7
2
5
8
3
6
9
Arrays of arrays. Multi-dimensional arrays are typically represented by one-dimensional arrays of references (Iliffe vectors) to other one-dimensional arrays. The subarrays can be either the rows or columns.

The first two forms are more compact and have potentially better locality of reference, but are also more limiting; the arrays must be rectangular, meaning that no row can contain more elements than any other. Arrays of arrays, on the other hand, allow the creation of ragged arrays, also called jagged arrays, in which the valid range of one index depends on the value of another, or in this case, simply that different rows can be different sizes. Arrays of arrays are also of value in programming languages that only supply one-dimensional arrays as primitives.

In many applications, such as numerical applications working with matrices, we iterate over rectangular two-dimensional arrays in predictable ways. For example, computing an element of the matrix product AB involves iterating over a row of A and a column of B simultaneously. In mapping the individual array indexes into memory, we wish to exploit locality of reference as much as we can. A compiler can sometimes automatically choose the layout for an array so that sequentially accessed elements are stored sequentially in memory; in our example, it might choose row-major order for A, and column-major order for B. Even more exotic orderings can be used, for example if we iterate over the main diagonal of a matrix.

Parallel Array

In computing, a parallel array is a data structure for representing arrays of records. It keeps a separate, homogeneous array for each field of the record, each having the same number of elements. Then, objects located at the same index in each array are implicitly the fields of a single record. Pointers from one object to another are replaced by array indices. This contrasts with the normal approach of storing all fields of each record together in memory. For example, one might declare an array of 100 names, each a string, and 100 ages, each an integer, associating each name with the age that has the same index.

An example in C using parallel arrays:

int ages[] = {0, 17, 2, 52, 25};
char *names[] = {"None", "Mike", "Billy", "Tom", "Stan"};
int parent[] = {0 /*None*/, 3 /*Tom*/, 1 /*Mike*/, 0 /*None*/, 3 /*Tom*/};

for(i = 1; i <= 4; i++) { printf("Name: %s, Age: %d, Parent: %s \n", names[i], ages[i], names[parent[i]]); } Or, in Python:

firstName = ['Joe', 'Bob', 'Frank', 'Hans' ]
lastName = ['Smith','Seger','Sinatra','Schultze']
heightInCM = [169, 158, 201, 199 ]

for i in xrange(len(firstName)):
print "Name: %s %s" % (firstName[i], LastName[i])
print "Height in CM: %s" % heightInCM[i]

Parallel arrays have a number of practical advantages over the normal approach:
They can be used in languages which support only arrays of primitive types and not of records (or perhaps don't support records at all).
Parallel arrays are simple to understand and use, and are often used where declaring a record is more trouble than it's worth.

They can save a substantial amount of space in some cases by avoiding alignment issues. For example, one of the fields of the record can be a single bit, and its array would only need to reserve one bit for each record, whereas in the normal approach many more bits would "pad" the field so that it consumes an entire byte or a word.

If the number of items is small, array indices can occupy significantly less space than full pointers, particularly on architectures with large words.

Sequentially examining a single field of each record in the array is very fast on modern machines, since this amounts to a linear traversal of a single array, exhibiting ideal locality of reference and cache behavior.

However, parallel arrays also have several strong disadvantages, which serves to explain why they are not generally preferred:

  • They have significantly worse locality of reference when visiting the records sequentially and examining multiple fields of each record, which is the norm.

  • They obscure the relationship between fields of a single record.

  • They have little direct language support (the language and its syntax typically express no relationship between the arrays in the parallel array.)

  • They are expensive to grow or shrink, since each of several arrays must be reallocated.

  • The bad locality of reference is the worst issue. However, a compromise can be made in some cases: if a structure can be divided into groups of fields that are generally accessed together, an array can be constructed for each group, and its elements are records containing only these subsets of the larger structure's fields. This is a valuable way of speeding up access to very large structures with many members, while keeping the portions of the structure tied together. An alternative to tying them together using array indexes is to use references to tie the portions together, but this can be less efficient in time and space. Some compiler optimizations, particularly for vector processors, are able to perform this transformation automatically when arrays of structures are created in the program.

Variable-length arrays

In programming, a variable length array (or VLA) is an array data structure of automatic storage duration whose length is determined at run time (instead of at compile time).
Programming languages that support VLAs include APL, COBOL, and C (added in C99).

Examples


The following C99 function allocates a variable-length array of a specified size, fills it with floating-point values, then passes it to another function for processing. Because the array is declared as an automatic variable, its lifetime ends when the read_and_process function returns.

float read_and_process(int n)
{
float vals[n];

for (int i = 0; i <>

The following COBOL fragment declares a variable-length array of records.

DATA DIVISION.
WORKING-STORAGE SECTION.
01 VAR-ITEM.
05 VAR-CNT PIC S9(4) BINARY.
05 VAR-PERSON OCCURS 0 TO 20 TIMES DEPENDING ON VAR-CNT.
10 VAR-NAME PIC X(20).
10 VAR-WAGE PIC S9(7)V99 PACKED-DECIMAL.

Languages such as C# and Java do not have variable-length arrays, because all arrays in those languages are dynamically allocated on the heap and therefore do not have automatic storage duration.

Friday, January 23, 2009

Well... That's The Way iT iS!

Array

In computer science, an array[1] is a data structure consisting of a group of elements that are accessed by indexing. In most programming languages each element has the same data type and the array occupies a contiguous area of storage.

Overview

Most programming languages have a built-in array data type, although what is called an array in the language documentation is sometimes really an associative array. Conversely, the contiguous storage kind of array discussed here may alternatively be called a vector, list,[2] or table.[3]

Some programming languages support array programming (e.g., APL, newer versions of Fortran) which generalises operations and functions to work transparently over arrays as they do with scalars, instead of requiring looping over array members.

Multi-dimensional arrays are accessed using more than one index: one for each dimension. Multidimensional indexing reduced to a lesser number of dimensions, for example, a two-dimensional array with consisting of 6 and 5 elements respectively could be represented using a one-dimensional array of 30 elements.

Arrays can be classified as fixed-sized arrays (sometimes known as static arrays) whose size cannot change once their storage has been allocated, or dynamic arrays, which can be resized.

Properties

Arrays permit constant time (O(1)) random access to individual elements, which is optimal, but moving elements requires time proportional to the number of elements moved. On actual hardware, the presence of e.g. caches can make sequential iteration over an array noticeably faster than random access — a consequence of arrays having good locality of reference because their elements occupy contiguous memory locations — but this does not change the asymptotic complexity of access. Likewise, there are often facilities (such as memcpy) which can be used to move contiguous blocks of array elements faster than one can do through individual element access, but that does not change the asymptotic complexity either.

Memory-wise, arrays are compact data structures with no per-element overhead. There may be a per-array overhead, e.g. to store index bounds, but this is language-dependent. It can also happen that elements stored in an array require less memory than the same elements stored in individual variables, because several array elements can be stored in a single word; such arrays are often called packed arrays.

Properties in comparison

Dynamic arrays have similar characteristics to arrays, but can grow. The price for this is a memory overhead, due to elements being allocated but not used. With a constant per-element bound on the memory overhead, dynamic arrays can grow in constant amortized time per element.

Associative arrays provide a mechanism for array-like functionality without huge storage overheads when the index values are sparse. Specialized associative arrays with integer keys include Patricia tries and Judy arrays.

Balanced trees require O(log n) time for index access, but also permit inserting or deleting elements in Θ(log n) time.[4] Arrays require O(n) time for insertion and deletion of elements.

Applications

Arrays are used to implement mathematical vectors and matrices, as well as other kinds of rectangular tables. In early programming languages, these were often the applications that motivated having arrays.

Because of their performance characteristics, arrays are used to implement other data structures, such as heaps, hash tables, deques, queues, stacks, strings, and VLists.

One or more large arrays are sometimes used to emulate in-program dynamic memory allocation, particularly memory pool allocation. Historically, this has sometimes been the only way to allocate "dynamic memory" portably.

Array accesses with statically predictable access patterns are a major source of data parallelism.

Some algorithms store a variable number of elements in part of a fixed-size array, which is equivalent to using dynamic array with a fixed capacity; the so-called Pascal strings are examples of this.

Indexing

The valid index values of each dimension of an array are a bounded set of integers (or values of some enumerated type). Programming environments that check indexes for validity are said to perform bounds checking.

Index of the first element

The index of the first element (sometimes called the "origin") varies by language. There are three main implementations: zero-based, one-based, and n-based arrays, for which the first element has an index of zero, one, or a programmer-specified value. The zero-based array is more natural in the root machine language and was popularized by the C programming language, where the abstraction of array is very weak, and an index n of a one-dimensional array is simply the offset of the element accessed from the address of the first (or "zeroth") element (scaled by the size of the element). One-based arrays are based on traditional mathematics notation for matrices and most, but not all, mathematical sequences. n-based is made available so the programmer is free to choose the lower bound, which may even be negative, which is most naturally suited for the problem at hand.

The Comparison of programming languages (array), indicates the base index used by various languages.

Supporters of zero-based indexing sometimes criticize one-based and n-based arrays for being slower. Often this criticism is mistaken when one-based or n-based array accesses are optimized with common subexpression elimination (for single dimensioned arrays) and/or with well-defined dope vectors (for multi-dimensioned arrays). However, in multidimensional arrays where the net offset into linear memory is computed from all of the indices, zero-based indexing is more natural, simpler, and faster. Edsger W. Dijkstra expressed an opinion in this debate: Why numbering should start at zero.

The 0-based/1-based debate is not limited to just programming languages. For example, the ground-floor of a building is elevator button "0" in France, but elevator button "1" in the USA.

Index of the last element

The relation between numbers appearing in an array declaration and the index of that array's last element also varies by language. In some languages (e.g. C) the number of elements contained in the arrays must be specified, whereas in others (e.g. Visual Basic .NET) the numeric value of the index of the last element must be specified.

Indexing methods

When an array is implemented as continuous storage, the index-based access, e.g. to element n, is simply done (for zero-based indexing) by using the address of the first element and adding n · sizeof(one element). So this is a Θ(1) operation.

Multi-dimensional arrays

Ordinary arrays are indexed by a single integer. Also useful, particularly in numerical and graphics applications, is the concept of a multi-dimensional array, in which we index into the array using an ordered list of integers, such as in a[3,1,5]. The number of integers in the list used to index into the multi-dimensional array is always the same and is referred to as the array's dimensionality, and the bounds on each of these are called the array's dimensions. An array with dimensionality k is often called k-dimensional. One-dimensional arrays correspond to the simple arrays discussed thus far; two-dimensional arrays are a particularly common representation for matrices. In practice, the dimensionality of an array rarely exceeds three. Mapping a one-dimensional array into memory is obvious, since memory is logically itself a (very large) one-dimensional array. When we reach higher-dimensional arrays, however, the problem is no longer obvious. Suppose we want to represent this simple two-dimensional array:

\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}.

It is most common to index this array using the RC-convention, where elements are referred in row, column fashion or A_{row,col}\,, such as:

A_{1,1}=1,\ A_{1,2}=2,\ \ldots,\ A_{3,2}=8,\ A_{3,3}=9.\,

Common ways to index into multi-dimensional arrays include:

  • Row-major order. Used most notably by statically-declared arrays in C. The elements of each row are stored in order.
1 2 3 4 5 6 7 8 9
1 4 7 2 5 8 3 6 9
  • Arrays of arrays. Multi-dimensional arrays are typically represented by one-dimensional arrays of references (Iliffe vectors) to other one-dimensional arrays. The subarrays can be either the rows or columns.

A two-dimensional array stored as a one-dimensional array of one-dimensional arrays.

The first two forms are more compact and have potentially better locality of reference, but are also more limiting; the arrays must be rectangular, meaning that no row can contain more elements than any other. Arrays of arrays, on the other hand, allow the creation of ragged arrays, also called jagged arrays, in which the valid range of one index depends on the value of another, or in this case, simply that different rows can be different sizes. Arrays of arrays are also of value in programming languages that only supply one-dimensional arrays as primitives.

In many applications, such as numerical applications working with matrices, we iterate over rectangular two-dimensional arrays in predictable ways. For example, computing an element of the matrix product AB involves iterating over a row of A and a column of B simultaneously. In mapping the individual array indexes into memory, we wish to exploit locality of reference as much as we can. A compiler can sometimes automatically choose the layout for an array so that sequentially accessed elements are stored sequentially in memory; in our example, it might choose row-major order for A, and column-major order for B. Even more exotic orderings can be used, for example if we iterate over the main diagonal of a matrix.