Additional advanced Python

The following concepts are less frequently needed than those in the first section of this workshop.

However, they are nonetheless useful in certain scenarios, and are included here for those of you who might find them helpful.

Installing modules

You should already be familiar with how to use a module, and all of the modules that you will need for this workshop are included in the WinPython install.

However, for completeness we will briefly mention the simple install process for new modules using pip.

pip comes as standard with Python as of 3.4, and allows you to search the online Python Package Index (PyPI) as well as install packages from PyPI.

From the command line, we can

  • Query the (online) Python Package Index, e.g.   pip search memory
    (note that pip search has since been disabled on PyPI's side; searching is now done via the https://pypi.org website)
  • Install a package, e.g.
    pip install memory_profiler
  • Remove an installed module, e.g.
    pip uninstall memory_profiler

NOTE (Windows):

If the pip executable isn’t recognized by the terminal, the above commands can be replaced by e.g. python -m pip install memory_profiler.

Note, however, that unless the module relies only on Python, the install process may run into dependency problems (e.g. needing a C++ development environment to be set up in order to compile included C++ code).

Command-line arguments & interaction

Command line arguments : sys.argv

Python can read inputs passed to it on the command line by using the sys module’s argv attribute (module variable):

import sys

print(sys.argv)

If we place that statement in a file (called, e.g. “test_inputs.py”), and run the file with Python as usual:

python test_inputs.py

the output would be the list ["test_inputs.py"]; i.e. the first element of argv (which is a list) is always the name of the script. Subsequent command-line arguments (separated by spaces) appear as additional elements in the list.

For example if we call the script with

python test_inputs.py Hello 2 you

the output (contents of sys.argv) would be ["test_inputs.py", "Hello", "2", "you"].

This highlights that all command-line inputs are interpreted as strings; if you wish to pass in a number, you need to convert the string using either float(STRING) or int(STRING), where STRING could be "2", "3.14", or sys.argv[2] if the 2nd command-line argument (after the script name) were a number.
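For example, a minimal sketch (a hypothetical script that doubles a number given on the command line):

import sys

# Hypothetical usage: python double_it.py 3.14
number = float(sys.argv[1])  # sys.argv[1] is the first argument after the script name
print("Twice", number, "is", 2 * number)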

Advanced command-line input interfaces

More advanced command-line input interfaces can be created by using a module such as argparse (a minimal sketch follows the list below), which allows for

  • flags (like -h for help)
  • optional arguments
  • keyword arguments
  • …and many more functionalities!
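As an illustration, here is a minimal argparse sketch (the argument names are made up for the example):

import argparse

parser = argparse.ArgumentParser(description="Example command-line interface")
parser.add_argument("filename", help="a required positional argument")
parser.add_argument("-n", "--repeats", type=int, default=1,
                    help="an optional keyword argument with a default")
args = parser.parse_args()
print(args.filename, args.repeats)

Running such a script with the -h flag prints an automatically generated help message.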

Interactive input : input

As well as generating terminal output with the print function, we are also able to read input from the terminal with the input function; for example

some_stuff = input("Please answer yes or no:")

would cause the script (when run) to pause, display the prompt text (i.e. the first argument to the input function), and then wait for the user to enter text using the keyboard. Text entry finishes when the Enter key is pressed, and the entered text is returned as a string.
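For example, a small sketch that acts on the answer (remember that input always returns a string):

answer = input("Please answer yes or no: ")
if answer.strip().lower() == "yes":
    print("Great!")
else:
    print("Maybe next time...")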

Advanced interactive input

More advanced interactive terminal interfaces are also possible with Python. The cmd module, for example, allows the creation of interactive command sessions, where the developer maps functions they have written to “interactive commands” which the user can then call interactively (including features such as interactive help, tab-completion of allowed commands, and custom prompts).
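As a minimal sketch (the class and command names here are purely illustrative):

import cmd

class GreeterShell(cmd.Cmd):
    intro = "Type help or ? to list commands."
    prompt = "(greeter) "

    def do_greet(self, arg):
        """greet NAME : print a greeting"""
        print("Hello,", arg or "stranger")

    def do_quit(self, arg):
        """quit : exit the session"""
        return True  # returning True ends the command loop

if __name__ == "__main__":
    GreeterShell().cmdloop()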

Decorators : Applying functions to functions

A decorator is a way to “dynamically alter the functionality of a function”. On a practical level, it amounts to passing a function into another function, which returns a “decorated” function.

Why would we want to do such a thing?

Consider the following simple example; we have a number of functions that each return a value:

def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

def multiply(a, b):
    return a * b 

def divide(a, b):
    return a / b 

Now we want a clean and simple way of turning all of these functions into more “verbose” versions, which also print details to the terminal.

For example, when we call add(10, 2) we want the terminal to read something like:

Adding 10 and 2 results in 12

Without knowing about decorators, we might think we need to re-write all of our functions. But, as our modification is relatively straightforward to turn into an algorithm, we can instead create a decorator function:

def verbose(func):
    def wrapped_function(a, b):
        # Access the special __name__ attribute to get a function's name
        name = func.__name__
        result = func(a, b)
        print("%sing %0.2f and %0.2f results in %0.2f" % (name, a, b, result))
        return result
    return wrapped_function

which we then use to decorate our previous function definitions:

@verbose
def add(a, b):
    return a + b

@verbose
def subtract(a, b):
    return a - b

@verbose
def multiply(a, b):
    return a * b 

@verbose
def divide(a, b):
    return a / b 

Calling each of these with 10 and 2 as inputs then gives:

adding 10.00 and 2.00 results in 12.00
subtracting 10.00 and 2.00 results in 8.00
multiplying 10.00 and 2.00 results in 20.00
divideing 10.00 and 2.00 results in 5.00

(Note the slightly mangled “divideing”: the wrapper naively appends “ing” to each function’s __name__.)

Note that the decorator syntax, where we use the @ symbol followed by the decorator function name and then define the function to be decorated, is the same as

add = verbose(add)

i.e. we are calling the decorator function (here called verbose) on the input function (add), which returns a wrapped version of add, and we assign it back into add.

The reason the decorator syntax (@) was created is that the core Python developers decided that having to first define a function and then apply the decorating function by hand would be confusing, especially when dealing with large function definitions or class methods (yes, decorators can also be applied to class methods!).

Lambdas : ~ One-line function definitions

A quick but useful additional Python construct is the lambda. Lambdas are one-line function definitions, which can be extremely useful for removing the need to create a full function definition for an almost trivial function.

Let’s look at an example of when this might be useful.

There is a built-in function called filter that is used to filter iterables; the help doc for filter is:

filter(function or None, iterable) --> filter object

Return an iterator yielding those items of iterable for which function(item)
is true. If function is None, return the items that are true.

Given a list of files, we might want to use this function to select only files ending in a specific extension.

However, to do so we need to pass in a function that takes a filename as input and returns True or False to indicate whether the filename ends in the desired extension. We could go ahead and define such a function, but it really is a trivial function, and a more concise approach is to use a lambda:

# e.g. file list
file_list = ["file1.txt", "file2.py", "file3.tif", "file4.txt"]
text_files = filter(lambda f: f.endswith(".txt"), file_list)
print(list(text_files))  # filter returns a lazy iterator in Python 3

Another example of a function that can take another function as an input is non-standard sorting using sorted, in which case we can pass in a custom “key” function that is used to extract a sort key from each item:

print(sorted(["10 kg", "99 kg", "100 kg"]))
print(sorted(["10 kg", "99 kg", "100 kg"], key=lambda k: float(k.split()[0])))

outputs

['10 kg', '100 kg', '99 kg']
['10 kg', '99 kg', '100 kg']

Generators and the yield statement

A generator is something that is iterable, and yet the values that are iterated over do not all exist in memory at the same time.

In fact, those of you who attended the Introductory course will have seen a generator without knowing it, when we covered reading data from a file - and you should have just made use of this above!

There, the last method shown for iterating over a file was (roughly)

for line in open("filename.txt"):
    print(line)

However, what we didn’t mention was that this approach provides sequential access to the file being read, meaning that only one line of the file is loaded into memory per iteration.

This is in contrast to

lines = open("filename.txt").readlines()
for line in lines:
    print(line)

where we read all of the file into memory using readlines. Under the hood, a file object behaves like a generator, meaning that for each iteration of for line in <fileobject> the file object yields the next line of the file.

Similarly, the last comprehension expression we encountered, i.e. the “tuple comprehension” (more properly called a generator expression), yields the square of each list item without loading all of the new values into memory.
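For example, a small sketch of a generator expression:

squares = (x * x for x in range(5))  # round brackets: a generator, NOT a tuple
print(next(squares))   # 0 - values are produced one at a time
print(list(squares))   # [1, 4, 9, 16] - the remaining values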

This may seem like a technicality, but can be extremely useful in memory-critical situations.

Any function in which we iterate over something can be converted to a generator function; instead of building up a list/tuple etc. inside the function and returning it at the end, we use the yield statement to produce each value as it is generated.

For example consider a simple squared function that takes a list and generates a new list of squared values:

def squared(list_in):
    list_out = []
    for val in list_in:
        list_out.append(val * val)
    return list_out

To convert this into a generator function we might write

def squared(list_in):
    for val in list_in:
        yield val * val

Running the first version through a memory-profiler,

In [1]: %memit l1 = squared(range(10000000))
peak memory: 391.05 MiB, increment: 367.12 MiB

we see that we create almost 400 MB of data in memory by running the traditional function (as we’ve created 10,000,000 numbers - and each is quite a bit larger than the 8 bytes needed for the value alone, as Python objects are more than just data!)

By using the generator version of the function instead, we get

In [2]: %memit l2 = squared(range(10000000))
peak memory: 23.80 MiB, increment: 0.10 MiB

i.e. very little additional memory is used!

Can we still use the resulting generator as we would a list? The answer is: most of the time. By which I mean that many functions where we don’t need the list to be all in memory will work, e.g.

print(sum(l2))

will output 333333283333335000000, but only once, as a generator can be iterated over once and only once.

If we want to call another function on the generator, we would need to recreate it (i.e. set l2 = squared(range(10000000)) again).
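A small sketch demonstrating this one-shot behaviour, using the generator version of squared from above:

gen = squared([1, 2, 3])
print(sum(gen))  # 14
print(sum(gen))  # 0 - the generator is already exhausted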

The bottom line is that generators are useful for specific memory-critical functionalities, such as working with data that is too large to fit in memory.

Exercise : Using generators

Create a script file (“exercise_generators.py”) to evaluate the sum of the squares of the first 1,000,000,000 integers (one billion, i.e. 1e9).

As this is a large number, you will at best be barely able to store all of these values in memory at the same time (if each integer were represented using 64 bits / 8 bytes, this would require at least 8 GB of RAM, which is roughly the total amount of RAM a current standard desktop PC has!).
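If you are unsure where to start, one possible approach (just a sketch; other solutions exist) is a single generator expression:

# One possible solution sketch: the generator expression keeps memory use flat
print(sum(i * i for i in range(1000000000)))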

NOTE: This will take a couple of minutes to execute!

While you’re waiting for this to finish running, it’s worth mentioning that the equivalent functionality in C++ is about 20 times faster; but getting C++ to do this is significantly more difficult (even for such a relatively simple task), as it involves using the non-standard __int128 compiler extension, which in turn requires writing a custom output stream operator (to print the result to the screen)!

Nonetheless, for problems involving very simple operations repeated many many times, lower level languages such as C++ may be better suited.

Luckily for us, we don’t need to choose! We can have our cake and eat it…

Beyond Pure Python: When and how

Low-level language purists argue that Python is extremely slow compared with e.g. C, C++, Fortran, or to a lesser extent, Java.

While it is true that Python is slow compared with these languages (typically on the order of 10-40 times slower), pure Python isn’t designed to compete with these languages for speed.

Instead Python prioritizes readability, code design, and simplicity.

That said, there is a large community of Python developers who devote their time to optimizing Python in many ways. These include

  • Just-In-Time (JIT) compilers like PyPy and Pyston (by the Dropbox team)
  • Stackless Python, which adds “microthreading” for easy thread support
  • Multiple ways of interacting with C & C++
    • ctypes : a C “foreign function library” built into Python, often used for wrapping C libraries in pure Python (see the sketch after this list)
    • Cython : writing C/C++ code with Python syntax - cross-compiles “Cython” code to C/C++, which can then be imported into Python
    • Others, including SWIG, scipy.weave (Python 2 only), and Boost.Python
  • Support for GPU programming using e.g. PyOpenCL and CUDA Python
  • Specialist libraries like Numba
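As a small ctypes sketch (assuming a Unix-like system, where the C maths library can be located by name):

import ctypes
import ctypes.util

# Load the C standard maths library and call its cos() function directly
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.cos.argtypes = [ctypes.c_double]  # declare the C signature...
libm.cos.restype = ctypes.c_double     # ...so arguments and results convert correctly
print(libm.cos(0.0))  # 1.0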

Using a compiled module

Before you get carried away with these hybrid approaches and/or start to think you might need to write your whole project using a lower-level language, consider the small snippet below that demonstrates some simple arithmetic operations in a for-loop:

import math

def simple(amp, N):
    out = []
    for i in range(N):
        val1 = amp * math.sin(i)
        val2 = (abs(val1) - 1)**2
        out.append(val2 * val1)
    return out

simple(0.1, 10000)

I.e. here we perform some relatively basic mathematical operations (sin, abs, multiplication, exponentiation, etc.) inside a for loop, repeated 10,000 times.

For-loops are one of Python’s common speed bottlenecks compared with low-level languages, so this can be considered to be a relatively representative snippet.
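For comparison, a plausible vectorized Numpy equivalent of simple (a sketch; the exact implementation used in the benchmark below isn’t shown) replaces the loop with whole-array operations:

import numpy as np

def simple_numpy(amp, N):
    i = np.arange(N)            # all loop indices at once
    val1 = amp * np.sin(i)      # element-wise sin
    val2 = (np.abs(val1) - 1) ** 2
    return val2 * val1          # element-wise multiply

out = simple_numpy(0.1, 10000)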

Benchmark results

Running the Numpy version of this code as a baseline, we get the following timing:

Numpy:   0.000417 seconds / "loop iteration" (uses matrix operations) 

Execution time relative to numpy version (small is good!)

Cython versions
For loop                             :   10.09
List comprehension                   :   15.05
With Basic Cython type declarations  :    5.13
 + return type declarations          :    5.07
 + using cmath abs & sin             :    1.27
 + removed bounds checking           :    1.26

Normal Python
Numpy (vectorized)                   :    0.97
Basic for loop                       :   13.46
List comprehension                   :   17.23

Basic C++ 
Growing vector                       :    1.44
Pre-assigned vector                  :    1.36 

C++ with O2 (optimization) flag
Growing vector                       :    0.97 
Pre-assigned vector                  :    0.92 

C version with O2                    :    0.89

So the Numpy version and the best Cython versions are faster than the basic (unoptimized) C++ versions!

This is possible because Numpy is not a pure-Python library; all of the basic operations are implemented in C and wrapped in Python, providing a Python object interface to a collection of really fast numerical algorithms!

As this wrapping still incurs some overhead, the C++ and C versions compiled with the optimization flag remain the fastest.

Using Cython

If someone else (like the Numpy development team!) hasn’t already written the speed-critical functions you need in C/C++ for you, there is a gentler way of optimizing your Python code: using Cython to perform incremental optimization.

We mentioned Cython above as a way of interfacing with C++. Cython essentially converts Python code into C/C++ and automatically creates a Python-compatible module from the output.

This means that pure Python is valid Cython code. In fact, just running Cython on pure Python already gives a small increase in speed, often of about 10-20%.

By adding in additional Cython-specific information, such as type declarations (and removing calls to Python functions), Cython is able to further optimize your Python code for you.

For example, the optimized Cython version (which runs faster than the unoptimized pure C++ versions, and almost as fast as the highly optimized Numpy code) is nearly identical to the Python version, featuring just a few Cython-specific additions:

cimport cython
from libc.math cimport sin, abs

@cython.locals(amp=cython.double, N=cython.int, out=list,
               val1=cython.double, val2=cython.double)
@cython.returns(list)
def cython_version(amp, N):
    out = []
    for i in range(N):
        val1 = amp * sin(i)
        val2 = (abs(val1) - 1)**2
        out.append(val2 * val1)
    return out

That’s it!

The function definition itself is almost identical (except that we switched to the C maths library versions of sin and abs), and all we needed were a few additional lines:

from libc.math cimport sin, abs

to import the C library functions for sin and abs (so that Cython doesn’t have to call back into Python for these functions!), and

@cython.locals(amp=cython.double, N=cython.int, out=list,
    val1 = cython.double, val2 = cython.double)
@cython.returns(list)

i.e. two special cython module decorators, locals and returns (made available by the cimport cython line), which are used to specify the data types of the local variables and of the return value, so that Cython can convert the Python code to C code. Cython itself knows how to deal with for loops and Python list conversions. Not much to add for a more-than-10-fold speed increase!

And yet the body of the function remains valid Python code.

This highlights how we can incrementally add Cython annotations to Python code to make it run faster; so if you have a nicely written Python module with a speed bottleneck that has to be addressed, there is no need to replace the whole thing with C code - you can simply add a few Cython-specific lines and run the code through Cython to produce an optimized, compiled version of the same code!

Cython also provides a special cimport numpy directive that allows operations involving Numpy arrays to be optimized using Cython.
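For example, a minimal sketch of the typed-buffer style this enables (this is Cython, not pure Python, and the function name is illustrative):

# sum_squares.pyx - a sketch of Cython's typed Numpy buffer syntax
cimport numpy as np
import numpy as np

def sum_squares(np.ndarray[np.float64_t, ndim=1] arr):
    cdef double total = 0.0
    cdef int i
    # The typed buffer lets Cython generate a fast C-level loop
    for i in range(arr.shape[0]):
        total += arr[i] * arr[i]
    return total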