Container data-types

In addition to the basic data types covered in the last section, Python has several built-in “containers”.

These come in two main types:

  • Lists and Tuples : store a sequence of values
  • Dictionaries : map keys to values

Lists

A list, as the name implies, is a sequence of items. For example, the numbers 1 – 10 could be arranged in a list: 1,2,3,4,5,6,7,8,9, and 10.

In Python, lists are created using square brackets, with items separated by commas, e.g.:

[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

List elements can be anything: numbers, strings, and even other lists (which creates “nested lists”).
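As a small illustration of mixed and nested lists, here is a minimal sketch. The variable names are made up for this example, and it assumes the usual square-bracket indexing (counting from 0) to pick out elements.

```python
# A list can mix different kinds of values
mixed = [1, "two", 3.0]

# A nested list: each element of the outer list is itself a list
nested = [[1, 2, 3], ["a", "b"]]

print(mixed[1])        # the second element of mixed: two
print(nested[0])       # the first inner list: [1, 2, 3]
print(nested[1][0])    # first element of the second inner list: a
```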

For now though, let’s get to grips with lists by considering flat lists of simple data.

Manipulating lists

Lists can be manipulated using member-functions:

  • grown using
    • insert (anywhere in the list)
    • append (always at the end),
    • extend to add multiple items at the end
  • shortened, using
    • pop (a specific index - without any number the last element will be removed)
    • remove (a specific value)
  • sorted using sort
  • searched using index
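The member-functions listed above can be tried out on a small list. This is only a sketch with made-up values; the comments show the state of the list after each step.

```python
# A small list of numbers to experiment with
numbers = [3, 1, 2]

numbers.append(5)        # add one item at the end       -> [3, 1, 2, 5]
numbers.insert(0, 4)     # insert the value 4 at index 0 -> [4, 3, 1, 2, 5]
numbers.extend([7, 6])   # add several items at the end  -> [4, 3, 1, 2, 5, 7, 6]

last = numbers.pop()     # no index given: removes and returns the last item (6)
first = numbers.pop(0)   # removes and returns the item at index 0 (4)
numbers.remove(5)        # removes the first occurrence of the value 5

numbers.sort()           # sorts the list in place       -> [1, 2, 3, 7]
position = numbers.index(3)   # index of the value 3     -> 2

print(numbers, last, first, position)
```

Note that sort changes the list itself rather than returning a new list.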

Lists can also be combined using the plus operator, +, e.g.

[1,2,3] + [4,5] 

results in the list [1,2,3,4,5]

The length of a list can be determined using the built-in function len.

Note

len is not a member function of a list, i.e. you cannot call AListVariable.len()

Instead, use

len(AListVariable) 

to return the length of a list
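A short sketch of both points, combining two lists with + and then measuring the result with len. The variable names are illustrative, and the try/except is only there to show the error without stopping the script.

```python
# Combine two lists with the plus operator
a_list = [1, 2, 3] + [4, 5]

# len is a built-in function that takes the list as its argument
length = len(a_list)
print(length)

# Calling len as a member function fails, because lists have no such method
try:
    a_list.len()
except AttributeError as err:
    print("Lists have no len method:", err)
```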

Anything that can be “iterated” over or cycled through, can be converted into a list using the list function.

Python also contains a built-in function called range which is a convenient way of generating a range of numbers (as an iterable that can be converted into a list):

list(range(10)) 

is the same as

[0,1,2,3,4,5,6,7,8,9]
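The following sketch shows list applied to a few different iterables. The two-argument form range(start, stop) is not described above; it is included here as an extra, and like range(10) it stops just before the stop value.

```python
# range(10) counts from 0 up to, but not including, 10
digits = list(range(10))

# range(start, stop) starts at start and stops just before stop
subset = list(range(2, 6))

# Strings are iterable too: each character becomes one list element
letters = list("abc")

print(digits)
print(subset)
print(letters)
```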

Exercise : List manipulation

Write a script (name the file exercise_lists.py) to perform the following operations (this is just to test them all out!):

  1. Create a variable that contains a list that holds the numbers 10 to 99 (i.e. 10, 11, 12, …, 99)
  2. Add the value 100 to the end of the list
  3. Remove the 20th element of the list (index 19!)
  4. Remove the value 55
  5. Add the elements 5,6,7 to the end of the list (hint use extend with [5,6,7]!)
  6. Print the length of the list.

Tuples

A tuple is similar to a list in that it is a sequence of items.

The main difference between a list and a tuple is that a tuple is immutable, which means that elements cannot be changed, added, or removed once it has been created (i.e. it is “read-only”).

Because of this, tuples can be slightly faster and more memory-efficient than lists, but they are also less versatile.

As a general rule of thumb, when in doubt, use a list.
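To make the difference concrete, here is a minimal sketch. It assumes the standard tuple syntax of round brackets with comma-separated items, which is not shown above, and the variable names are illustrative.

```python
# A tuple is written with round brackets instead of square brackets
point = (1, 2, 3)

# Reading an element works just like with a list
print(point[0])

# Changing an element is not allowed and raises a TypeError
try:
    point[0] = 10
except TypeError as err:
    print("Tuples are read-only:", err)

# If a changeable copy is needed, convert the tuple to a list first
as_list = list(point)
as_list[0] = 10
print(as_list)
```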

Dictionaries

Lists are great when we have a sequence of data, but the caveat is that in order to get a specific element we need to keep track of its index or repeatedly call the member-function index.

This can get tricky if the list grows and shrinks.

Motivation: why dictionaries?

For example, if we want to keep track of the colours (and names) of fruit, our first attempt might be to keep two lists - one for the names, and one for the colours:

names   = [ "banana", "orange", "strawberry"]
colours = [ "yellow", "orange", "red"]

Then, to find the colour of a strawberry, we first have to find the index of strawberry, and then use that to find the colour:

ind = names.index("strawberry")
print("The colour is : " + colours[ind])

Not only is this clumsy, it’s also likely to lead to issues as the lists might accidentally become unsynchronized (meaning that the order of the names and the order of the colours no longer match).

Attempt number two: as briefly mentioned, lists can hold pretty much anything, including other lists, so we could use a list of lists, e.g.

fruit_colours = [ ["banana", "yellow"], ["orange", "orange"], ["strawberry", "red"]]

Now the name will always be paired with the right colour!

The downside to this approach is that if we want to find out what colour a fruit is, we need to do a non-trivial search operation (as we’re trying to find lists in lists!) and can no longer use the index function that we used before.
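To give a feel for what such a search involves, here is one possible sketch. It uses a for loop, which may not have been introduced yet, so treat it as an illustration rather than something to memorise.

```python
fruit_colours = [["banana", "yellow"], ["orange", "orange"], ["strawberry", "red"]]

# Look at each [name, colour] pair in turn until the name matches
colour = None
for name, fruit_colour in fruit_colours:
    if name == "strawberry":
        colour = fruit_colour
        break

print("The colour is : " + colour)
```

This works, but it is a lot of code for what should be a simple lookup.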

The bottom line is that we could hack together a solution using lists, but luckily for these kinds of situations, Python offers us a much better solution in the form of a dictionary.

Dictionary creation

A dictionary is created using curly-bracket notation { }, with its items provided as a comma-separated list of key:value pairs. Continuing our fruit names and colours example:

fruit_colours = { "banana" : "yellow", "orange" : "orange", "strawberry" : "red"}

Now we can access values using keys:

print(fruit_colours["banana"])

will print yellow to the terminal.

Note about keys

Here we’ve used strings as both the keys and values, but you can also use numbers as keys and/or values.

In fact, pretty much anything can be a value (much as with lists), and anything that is hashable can be a key; hashable roughly translates as non-changing. Numbers and strings are hashable, as are tuples whose elements are all hashable (tuples are immutable, as explained above). Lists and dictionaries are NOT hashable as they can change.
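The sketch below illustrates the rule with a made-up dictionary called grid that uses tuples as keys, and then shows what happens when a list is used as a key instead.

```python
# Tuples of numbers are hashable, so they can be used as keys
grid = {(0, 0): "origin", (1, 2): "a point"}
print(grid[(1, 2)])

# Lists are not hashable, so using one as a key raises a TypeError
list_key_failed = False
try:
    bad = {[0, 0]: "origin"}
except TypeError as err:
    list_key_failed = True
    print("Cannot use a list as a key:", err)
```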

Converting a sequence to a dictionary

Dictionaries can be created from a sequence where each item in the sequence has two elements, by using the dict function e.g.

fruit_colours = dict( [ ["banana", "yellow"], ["orange", "orange"], ["strawberry", "red"]] )
fruit_colours = dict( ( ("banana", "yellow"), ("orange", "orange"), ("strawberry", "red")) )

both produce the same dictionary as in the previous section.

Note that while for this example this way of creating a dictionary might seem superfluous, there are scenarios where it is very useful. For example, if we have a function that generates a list of 2-element lists, but we want the result as a dictionary, we can simply convert the list to a dictionary as per above, without having to write a new function!
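As a sketch of that scenario, the function get_fruit_pairs below is a hypothetical stand-in for any function that returns a list of 2-element lists. The second half uses the built-in zip function, which is not covered above, to pair up the two separate lists from the earlier example.

```python
def get_fruit_pairs():
    """Hypothetical helper that returns a list of [name, colour] pairs."""
    return [["banana", "yellow"], ["orange", "orange"], ["strawberry", "red"]]

# Convert the result straight into a dictionary
from_function = dict(get_fruit_pairs())
print(from_function["strawberry"])

# zip pairs up items from two lists, giving a sequence dict can convert
names = ["banana", "orange", "strawberry"]
colours = ["yellow", "orange", "red"]
from_zip = dict(zip(names, colours))
print(from_zip["banana"])
```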

Manipulating Dictionaries

Once a dictionary has been created, it can be grown or shrunk slightly differently to lists:

  • Adding items : d[NEW_KEY] = NEW_VALUE - creates a new key-value pair, or updates one if it already exists
  • Removing items : d.pop(KEY, DEFAULT) - removes KEY and returns its value, or returns DEFAULT if KEY doesn’t exist in the dictionary (if DEFAULT is left out, a missing KEY raises a KeyError).

As you might have spotted, you can’t assign multiple values with the same key - instead if you write

d[ALREADY_EXISTING_KEY] = NEW_VALUE

the old value that ALREADY_EXISTING_KEY pointed to will be overwritten.
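These operations can be sketched on the fruit example as follows. The extra fruit names and the default values are made up for illustration.

```python
fruit_colours = {"banana": "yellow", "orange": "orange"}

# Adding a new key-value pair
fruit_colours["strawberry"] = "red"

# Assigning to an existing key overwrites the old value
fruit_colours["banana"] = "green"

# pop removes the key and returns its value
removed = fruit_colours.pop("orange", None)

# For a key that does not exist, pop returns the default instead
missing = fruit_colours.pop("kiwi", "unknown")

print(fruit_colours)
print(removed, missing)
```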

Exercise : Dictionaries - Wherefore art thou Romeo!

Write a script (name the file exercise_dicts.py) to count the occurrences of the words

  • "sword"
  • "love"
  • "wench"
  • "fool"

in Shakespeare’s Romeo & Juliet.

Use the following initial two lines (feel free to copy and paste!), which will pull the text from an online source and assign it to a variable called text:

import urllib.request
text = urllib.request.urlopen("http://www.textfiles.com/etext/AUTHORS/SHAKESPEARE/shakespeare-romeo-48.txt").read().decode('utf8')