Wednesday, January 7, 2015

Arrays in Python

A list in Python is just an ordered collection of items which can be of any type. By comparison an array is an ordered collection of items of a single type - so in principle a list is more flexible than an array but it is this flexibility that makes things slightly harder when you want to work with a regular structure. A list is also a dynamic mutable type and this means you can add and delete elements from the list at any time.
To define a list you simply write a comma separated list of items in square brackets:
myList=[1,2,3,4,5,6]
This looks like an array because you can use "slicing" notation to pick out an individual element - indexes start from 0. For example
print myList[2]
will display the third element, i.e. the value 3 in this case. Similarly to change the third element you can assign directly to it:
myList[2]=100
The slicing notation looks like array indexing but it is a lot more flexible. For example
myList[2:5]
is a sublist from the third element to the fifth i.e. from myList[2] to myList[4]. notice that the final element specified i.e. [5] is not included in the slice.
Also notice that you can leave out either of the start and end indexes and they will be assumed to have their maximum possible value. For example
myList[5:]
is the list from List[5] to the end of the list and
myList[:5]
is the list up to and not including myList[5] and
myList[:]
is the entire list.
List slicing is more or less the same as string slicing except that you can modify a slice. For example:
myList[0:2]=[0,1]
has the same effect as
myList[0]=0
myList[1]=1
Finally is it worth knowing that the list you assign to a slice doesn't have to be the same size as the slice - it simply replaces it even if it is a different size.

Basic array operations

So far so good, and it looks as if using a list is as easy as using an array. The first thing that we tend to need to do is to scan through an array and examine values. For example, to find the maximum value (forgetting for a moment that there is a built-in max function)  you could use:
m=0
for e in myList:
 if m<e:
  m=e
This uses the for..in construct to scan through each item in the list. This is a very useful way to access the elements of an array but it isn't the one that most programmers will be familiar with. In most cases arrays are accessed by index and you can do this in Python:
m=0
for i in range(len(myList)):
 if m<myList[i]:
  m=myList[i]
Notice that we are now using range to generate the sequence 0,1, and so on up to the length of myList.
You have to admit that there is a lot to be said for the simplicity of the non-index version of the for loop but in many cases the index of the entry is needed. There is the option of using the index method to discover the index of an element but this has its problems.
For example if you wanted to return not the maximum element but its index position in the list you could use:
m=0
mi=0
for i in range(len(myList)):
 if m<myList[i]:
  m=myList[i]
  mi=i
or you could use the non-indexed loop and the index method:
m=0
for e in myList:
 if m<e:
  m=e
mi=myList.index(m)
print mi
The only problem is that index(m) returns the index of m in the list or an error if m isn't in the list. There is also the slight problem that behind the scenes it is clear that index is performing another scan of the entire list.

Dimension?

So far everything is going according to plan. Where things start to go wrong just a little is when you attempt to push the similarities between lists and arrays one step too far. For example, suppose you want to create an array initialized to a particular value. Following the general array idiom in most languages you might write:
myList=[]
for i in range(10):
 myList[i]=1
only to discover that this doesn't work because you can't assign to a list element that doesn't already exist.
One solution is to use the append method to add elements one by one:
myList=[]
for i in range(10):
 myList.append(1)
This works but it only works if you need to build up the list in this particular order - which most of the time you do. When the same situation arises in two- and multi-dimensioned arrays the problem often isn't as easy to solve with append, and there is a better way.
When you create an array in other languages you can usually perform operations like:
a[i]=0
without having to worry if a[i] already exists. In many cases the cost of this facility is that you have to declare the size of the array using a dimension statement of some sort. Python doesn't have such a facility because lists are dynamic and don't have to be "dimensioned". It is fairly easy to do the same job as a dimension statement, however, using a "comprehension".

Comprehensions

A comprehension is roughly speaking just an expression that specifies a sequence of values - think of it as a compact for loop. In Python a comprehension can be used to generate a list.
This means that we can use a comprehension to initialize a list so that it has a predefined size. The simplest form of a list comprehension is
[expression for variable in list]
For example, to create the list equivalent of a ten-element array you could write:
myList=[0 for i in range(10)]
Following this you have a ten-element list and you can write
myList[i]=something
without any worries - as long as i<10.
You can also use the for variable in the expression. For example:
myList=[i for i in range(10)]
sets myList to [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and
myList=[i*i for i in range(10)]
sets myList to [0, 1, 4, 9, 16, 25, 36, 49, 64, 81].
The idea is that if you want to treat a list as an array then initializing it in this way can be thought of as the Python equivalent of dimensioning the array.

Two dimensions

It has to be said that one-dimensional arrays are fairly easy - it is when we reach two or more dimensions that mistakes are easy to make. For a programmer moving to Python the problem is that there are no explicit provisions for multidimensional arrays. As a list can contain any type of data there is no need to create a special two-dimensional data structure. All you have to do is store lists within lists - after all what is a two-dimensional array but a one-dimensional array of rows.
In Python a 2x2 array is [[1,2],[3,4]] with the list [1,2] representing the first row and the list [3,4] representing the second row. You can use slicing to index the array in the usual way. For example, if
myArray=[[1,2],[3,4]]
then
myArray[0]
is the list [1,2] and
myArray[0][1]
is [1,2][1], i.e. 2.
As long as you build your arrays as nested lists in the way described then you can use slicing in the same way as you would array indexing. That is:
myArray[i][j]
is the i,jth element of the array.
For example, to do something with each element in myArray you might write:
for i in range(len(myArray)):
 for j in range(len(myArray[i])):
  print myArray[i][j]
Where len(myArray) us used to get the number of rows and len(myArray[i])) to get the number of elements in a row. Notice that there is no need for all of the rows to have the same number of elements, but if you are using a list as an array this is an assumption you need to enforce.
Notice that in the two-dimensional case the non-indexed for loop can also prove useful, but you cannot avoid a nested loop:
for row in myArray:
 for e in row:
  print e
You can also use the index method to recover the i,j type index of an element, but you have to be careful how you do it. For example, to print the row and column index of the element:
for row in myArray:
 for e in row:
  print myArray.index(row),row.index(e)
Finally, we have the problem of initializing the array so that
myArray[i][j]=value
doesn't generate an error. In other words, what is the two-dimensional equivalent of the dimension statement using comprehensions?
The answer is that we need to use a nested comprehension to create the list. For example, to create a 3x3 matrix we could use:
myArray=[[0 for j in range(3)] for i in range(3)]
To understand this comprehension you need to see that the inner comprehension is just:
[0 for j in range(3)]
which creates a row, and then the outer comprehension just creates a list of rows.
As before, you can use the index variables in the expression. For example:
myArray=[[i*j for j in range(3)] for i in range(3)]
In general, if you want to work with an m by n array use:
myArray=[[0 for j in range(n)] for i in range(m)]
and everything should work as you would expect.

Advanced comprehensions

In Python there are lots of clever ways of doing things that you generally wouldn't dream of in a lesser language. For example, a comprehension can be used to generate a list based on other lists. It is generally easy to get hold of the rows of a matrix:
for row in myArray:
  do something with row
but getting at the columns as lists is much more difficult. However, with the help of a comprehension it is easy to get column j as a list:
col=[row[j] for row in myArray]
Using the same idea, if you want a transpose a matrix then usually you need to write two explicit for loops but to do the job in Python you can simply write:
myArray= [[row[i] for row in myArray] 
          for i in range(len(myArray))]
Python has lots of, usually functional, ways of working with arrays that aren't encountered in other languages. This should be enough to get you started on using lists as arrays and in more creative ways.

No comments:

Post a Comment

python class topic video