Reading and Writing Data

Python provides a single built-in function for opening files: open.

By passing in a flag for how the file should be opened, we can either read, overwrite, or append to a file.

A file that is opened for reading won’t be modified, so reading is the default mode; this prevents accidentally modifying files.

The result of the open function is a file object, which has a number of member functions that we can use to read from or write to the file.

Note: if we open the file using the default read mode, then attempting to call the write or writelines member functions results in an error (io.UnsupportedOperation).
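As a minimal sketch of the three modes described above, the following writes, appends to, and then reads back a small file. The file name modes_demo.txt and the use of a temporary directory are assumptions made only so that the example is self-contained.

```python
import io
import os
import tempfile

# Work in a temporary directory so no real files are touched.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

fd = open(path, "w")   # "w": create the file, or overwrite it if it exists
fd.write("first\n")
fd.close()

fd = open(path, "a")   # "a": keep the existing content and add to the end
fd.write("second\n")
fd.close()

fd = open(path)        # no mode given, so the default "r" (read) is used
contents = fd.read()
try:
    fd.write("third\n")        # not allowed on a file opened for reading
    write_failed = False
except io.UnsupportedOperation:
    write_failed = True
fd.close()

print(contents)
print("write in read mode failed:", write_failed)
```

The file ends up containing the two lines first and second, and the attempted write in read mode is rejected.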

Aside on relative vs absolute paths

As soon as we want to interact with a file system, which usually means reading or writing (saving) files, we need to be aware of the difference between relative and absolute paths.

This concept may be a little foreign if you’re used to graphical operating system environments like Windows or macOS, though you will probably be familiar with the process of navigating through a file system using e.g. the Windows File Explorer.

For example, when you are viewing the contents of your “My Documents” folder, the full path of that folder is something like

C:\Users\Joe\My Documents

so that when referring to a file in that folder, you would specify the full path to the file as e.g.

C:\Users\Joe\My Documents\File1.txt

However, from within File Explorer, when you are viewing the contents of My Documents, you can double click on File1.txt to open it. In that case, File Explorer’s working directory is C:\Users\Joe\My Documents, so the relative reference “File1.txt” makes sense: it means “File1.txt” in the current working directory.

Referring to files in Python code is similar!

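If you refer to "File1.txt" in Python code, Python will look for a file called "File1.txt" in the current working directory of the running script. This is the folder that the script was launched from, which is often, but not always, the same folder that the script is in.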

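If you want to refer to a file that is not in the working directory of the script, you would use its absolute path, e.g. C:\Users\Joe\My Documents\File1.txt. Note that inside a Python string the backslash starts an escape sequence, so Windows paths are best written as raw strings (r"C:\Users\Joe\My Documents\File1.txt") or with forward slashes.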

The os module has a submodule called path (i.e. os.path), which is useful for working with file paths.
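As a rough illustration of relative and absolute paths, the sketch below uses a few os.path functions together with os.getcwd. The name File1.txt is only a placeholder; the file does not need to exist for these calls to work.

```python
import os

cwd = os.getcwd()                      # the current working directory
relative = "File1.txt"                 # relative path: resolved against cwd
absolute = os.path.abspath(relative)   # the same file, as an absolute path

print("working directory:", cwd)
print("absolute path:    ", absolute)

print(os.path.isabs(relative))   # False
print(os.path.isabs(absolute))   # True

# Split the absolute path into its folder and file name.
folder = os.path.dirname(absolute)
name = os.path.basename(absolute)
print(folder)
print(name)                      # File1.txt
```

If a script needs the folder it is stored in, rather than the working directory, os.path.dirname(os.path.abspath(__file__)) is a common way to obtain it.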

Reading files

For the time being, we’re only going to be concerned with the functions readline and readlines for reading files (though read is also useful!).

These are convenient methods to read either a single line at a time into a string (readline), or the entire file into a list of strings (readlines).

For example if we have a very simple text file that contains

My
Name
is
Sam

and the file is named sample.txt (which has the full path /path/to/file/sample.txt), then we could use

fd = open("/path/to/file/sample.txt")
print(fd.readline())
fd.close()

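and the output would be the following, plus an extra blank line, because readline keeps the line’s trailing newline character and print adds its own: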

My

Similarly, we could use

fd = open("/path/to/file/sample.txt")
print(fd.readlines())
fd.close()

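and the output would be

['My\n', 'Name\n', 'is\n', 'Sam\n']

Note that each string keeps its trailing newline character, \n (the last one only if the file itself ends with a newline).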

If we want to read the file line-by-line (for example, if it is a particularly large file!), we can iterate over the file object itself:

fd = open("/path/to/file/sample.txt")
for line in fd:
    print(line, end="")  # each line already ends with its own newline
fd.close()
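Whichever way the lines are read, each string keeps the newline character at its end. A minimal sketch of removing it with the string member function rstrip is shown below; the sample file is recreated in a temporary directory (an assumption made only so that the example can be run as-is).

```python
import os
import tempfile

# Recreate the sample file in a temporary directory (hypothetical location).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
fd = open(path, "w")
fd.write("My\nName\nis\nSam\n")
fd.close()

fd = open(path)
raw_lines = fd.readlines()      # each string still ends with "\n"
fd.close()

# rstrip("\n") removes the trailing newline from each line.
clean_lines = [line.rstrip("\n") for line in raw_lines]

print(raw_lines)    # ['My\n', 'Name\n', 'is\n', 'Sam\n']
print(clean_lines)  # ['My', 'Name', 'is', 'Sam']
```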
    

Why the close?

Once we have finished with the file, we need to close it using the close member function.

This is similar to when you work on a Word document or any other file; if the file is still open when you try to access it with another program, you will sometimes receive errors, as your computer is warning you that you might still be modifying the file elsewhere and so manipulating it in the meantime is dangerous! Closing a file that was opened for writing also makes sure that everything written so far is actually flushed to the file.
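Python also offers the with statement, which closes the file automatically at the end of the indented block, even if an error occurs inside it. The following is a small sketch of that pattern; the temporary file location is an assumption used to keep the example self-contained.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# The file is closed automatically when each with block ends.
with open(path, "w") as fd:
    fd.write("My\nName\nis\nSam\n")

with open(path) as fd:
    first_line = fd.readline()

print(fd.closed)          # True: no explicit fd.close() was needed
print(repr(first_line))   # 'My\n'
```

This form is often preferred in practice because it is impossible to forget the call to close.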

Writing files

The process for writing to a file is very similar to reading from a file. First we get a file object using open, except this time we add the additional mode flag as either "w" or "a". Here "w" means write, which overwrites the original file, and "a" means append, which only adds to the end of the original file. Note: if the file doesn’t exist before calling open, these two are the same!

We then use write to write a single string, or writelines to write a list of strings, to the file.

** Important note **

An important point, however, is that neither write nor writelines inserts newline characters (just as readline and readlines do not remove them). So, slightly deceptively,

fd = open('writetest.txt', 'w')
fd.writelines(['a', 'b', 'c'])
fd.close()

would produce a file called writetest.txt that contains

abc

i.e. not three separate lines!

Instead, to actually write three separate lines, we need to add the newline character, \n, to each line:

fd = open('writetest.txt', 'w')
fd.writelines(['a\n', 'b\n', 'c\n'])
fd.close()

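A common alternative to “manually” adding a newline character to each line is to use the string member function join and write a single string (note that join only puts the newline between items, so the last line is written without a trailing newline):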

fd = open('writetest.txt', 'w')
fd.write('\n'.join(['a', 'b', 'c']))
fd.close()

Here, join is a member function of string objects (in this case the string '\n') that takes a list of strings as input and joins all of the items in the list, using the string it was called on as the separator.
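To make the behaviour of join concrete, here is a short sketch that shows the joined string, adds a final newline when writing, and converts numbers to strings first (join only accepts strings). The temporary file location is an assumption for the sake of a runnable example.

```python
import os
import tempfile

lines = ['a', 'b', 'c']
joined = '\n'.join(lines)
print(repr(joined))    # 'a\nb\nc' : newlines only between the items

# join needs strings, so numbers have to be converted first.
numbers = [1, 2, 3]
joined_numbers = '\n'.join(str(n) for n in numbers)
print(repr(joined_numbers))   # '1\n2\n3'

path = os.path.join(tempfile.mkdtemp(), "writetest.txt")  # hypothetical location
fd = open(path, 'w')
fd.write(joined + '\n')   # add the final newline ourselves
fd.close()

fd = open(path)
written = fd.read()
fd.close()
print(repr(written))   # 'a\nb\nc\n'
```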

Exercise: Reading a data file

We have seen above how to read text from a file and display it in the console.

As researchers, we usually need to do more with data and the data is often in numerical format.

Download the data file data_exercise_reading.csv and save it to your Python exercise scripts directory.

The file contains a table of comma-separated values. It begins

Time,Signal
0,100
5,101
10,98
:
9995,102

The first line contains the column headers, and the subsequent lines contain the time and signal values.
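As a hint for handling one such data line, the sketch below splits a single line on the comma and converts both parts to numbers. It assumes the values are integers, as in the excerpt above; float would be needed instead of int if the file contained decimal values.

```python
line = "5,101\n"   # one data line, as it would be read from the file

# strip() removes the trailing newline; split(",") separates the two columns.
fields = line.strip().split(",")
time = int(fields[0])
signal = int(fields[1])

print(fields)        # ['5', '101']
print(time, signal)  # 5 101
```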

Write a new Python script (exercise_reading.py) that

  • reads in the headers (but you don’t need to keep these!)
  • reads in the data values
    • and converts them into numbers
  • calculates, for the signal data, the
    • sum
    • mean
    • and population standard deviation
  • outputs those statistics to the console