Data types

At the end of the last section, we introduced the string data type as an argument to the print function.

Two of the most basic data types in Python are strings and numbers.

Numerical data

Numbers such as 10, 0.001, and 1E6 (a million in scientific notation) are all valid in Python, and they can be used together freely in calculations.

Numbers can be combined using the standard arithmetic operators:

  • Addition + : 40 + 2 (= 42)
  • Subtraction - : 102 - 1 (= 101)
  • Multiplication * : 2 * 3 (= 6)
  • Division / : 3 / 2 (= 1.5)
  • Exponentiation ** : 3 ** 2 (= 9)
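As a quick check, the minimal sketch below evaluates the same five examples and prints the results; the variable names are only illustrative.

```python
# Evaluate each arithmetic example from the list above
addition = 40 + 2        # 42
subtraction = 102 - 1    # 101
multiplication = 2 * 3   # 6
division = 3 / 2         # 1.5 (the / operator always produces a float)
power = 3 ** 2           # 9

print(addition, subtraction, multiplication, division, power)
```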

Jargon

Unlike “statically typed” languages such as C++ and Java, Python does not ask you to declare how a number is stored: this is handled automatically, and conversions are done as needed.

For example, 10 is stored internally as an int, but 1/10 results in 0.1 (a float), while 10 + 1 results in 11 (an int).
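One way to see this for yourself is the built-in type() function. The sketch below prints each value alongside the name of the type Python chose for it (the dictionary labels are just for display). Note that 1E6 also comes out as a float, even though it represents a whole number.

```python
# Map a readable label to each value, then report the type Python used
values = {"10": 10, "1/10": 1 / 10, "10+1": 10 + 1, "1E6": 1E6}

for label, value in values.items():
    # type(value).__name__ gives the short type name, e.g. 'int' or 'float'
    print(label, "=", value, type(value).__name__)
```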

Exercise : Using Python as a calculator

Write a script (name the file exercise_calculator.py) to output the result of the following operations:

  • 1 divided by 2323999
  • 2382 multiplied by 121987
  • 2 to the power of 256 (exponentiation)

Additional operations: comparison operators

In addition to the standard arithmetic operators in the previous section, you can perform comparisons on numerical data, which produce Boolean (True/False) results, such as

  • == Equal : 4==5 results in False
  • > Greater than : 2 > 3 results in False
  • < Less than : 2 < 3 results in True

as well as >= (greater than or equal), <= (less than or equal), and != (not equal).
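The sketch below evaluates one example of each comparison operator; every result is either True or False. The particular numbers are arbitrary.

```python
# Each comparison evaluates to a Boolean value
results = {
    "4 == 5": 4 == 5,   # False
    "2 > 3": 2 > 3,     # False
    "2 < 3": 2 < 3,     # True
    "3 >= 3": 3 >= 3,   # True
    "2 <= 1": 2 <= 1,   # False
    "4 != 5": 4 != 5,   # True
}

for expression, result in results.items():
    print(expression, "->", result)
```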

Booleans: True & False

Here we have introduced a new data type: the Boolean (bool in Python).

When used in arithmetic, Boolean values behave as numbers, with False treated as 0 and True as 1, e.g.

  • True + False gives 1
  • True/10 gives 0.1

Booleans can be thought of as a sub-type of numerical data in which only 0 and 1 are represented; in fact, Python's bool type is a subclass of int.
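A short sketch of the two examples above; it also uses the built-in issubclass() to confirm that, in Python, bool is a subclass of int.

```python
# In arithmetic, True behaves as 1 and False as 0
total = True + False    # 1
fraction = True / 10    # 0.1
print(total, fraction)

# bool is a subclass of int, so every Boolean is also an int
print(issubclass(bool, int))   # True
print(int(True), int(False))   # 1 0
```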

Strings

The term string is, roughly speaking, short for a string of characters, i.e. text. String data is enclosed in single or double quotes (or triple quotes, for text spanning several lines); the following are all valid Python strings:

'I am a string'

"I'm a string too" (A double-quoted string can contain single quotes and vice-versa)

"""
And python accepts multi-line strings enclosed in 
triple quotes...(more on me in a while!) 
"""

Strings are one of the most basic “sequence” data types; we’ll encounter a few more in the next section.
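The three quoting styles are interchangeable as far as Python is concerned. This minimal sketch (variable names are illustrative) stores one string of each kind and checks their type; it also counts the newline characters that the triple-quoted version keeps.

```python
single = 'I am a string'
double = "I'm a string too"   # double quotes allow a single quote inside
triple = """
And python accepts multi-line strings enclosed in
triple quotes...(more on me in a while!)
"""

# All three are the same data type: str
print(type(single), type(double), type(triple))

# The line breaks inside the triple quotes are stored as newline characters
print(triple.count("\n"))   # 3
```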

Accessing individual characters in a string

To access individual characters in a string, we use index notation, which is represented using square brackets, [ ].

For example, to access the second character of a string we can use

"abcdefg"[1]

which gives access to the character “b”. This is because Python uses zero-indexing, meaning that the first element is accessed using [0]. Most programming languages do the same; a notable exception is MATLAB, which uses 1-indexing, i.e. the first element has index 1, not 0.
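A minimal sketch of zero-indexing, using the same string as the example above:

```python
text = "abcdefg"

first = text[0]    # 'a': with zero-indexing, the first character is at index 0
second = text[1]   # 'b'
last = text[6]     # 'g': the seventh and final character is at index 6
print(first, second, last)
```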

Accessing a range of characters

If instead of accessing a single character we want to access a range of characters, for example the first five characters in the string "Hello, world", we use what is called slice indexing:

"Hello, world"[0:5]

returns "Hello".

The syntax for slice indexing is [START_INDEX : END_INDEX_PLUS_ONE]; for example, if we had instead wanted the fifth to the eighth characters (inclusive), we would use

"Hello, world"[4:8]  

which returns "o, w".

If omitted, START_INDEX defaults to 0 and END_INDEX_PLUS_ONE defaults to the length of the string, so we could have written

"Hello, world"[0:5]

as

"Hello, world"[:5]

Both return "Hello".
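The slices discussed above can be collected in one short sketch; the last one, which omits END_INDEX_PLUS_ONE, is an extra illustration of the default running to the end of the string.

```python
greeting = "Hello, world"

first_five = greeting[0:5]    # 'Hello' (indices 0 to 4)
fifth_to_eighth = greeting[4:8]  # 'o, w' (indices 4 to 7)
default_start = greeting[:5]  # 'Hello' (START_INDEX defaults to 0)
default_end = greeting[7:]    # 'world' (runs to the end of the string)

print(first_five, fifth_to_eighth, default_start, default_end, sep=" | ")
```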

Negative indexing

Lastly, a really handy indexing feature is negative indexing: the last character of a string is accessible using -1, the second-to-last using -2, and so on. Negative indices can also be used as part of a slice, e.g. to access the last 5 characters we can use

"Hello, world"[-5:]

which returns "world".
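A minimal sketch of negative indexing on the same string; the final check illustrates that index -1 refers to the same position as len(greeting) - 1.

```python
greeting = "Hello, world"

last = greeting[-1]             # 'd'
second_to_last = greeting[-2]   # 'l'
last_five = greeting[-5:]       # 'world'

# A negative index -n points at the same character as len(greeting) - n
same_position = greeting[-1] == greeting[len(greeting) - 1]
print(last, second_to_last, last_five, same_position)
```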

String operations

A range of operations can be performed on strings.

These include string counterparts of some arithmetic operators:

  • Concatenation using the plus sign : "Hello " + "world" (= "Hello world")
  • Repetition using the multiplication sign : "abc" * 3 (= "abcabcabc")

as well as functions known as member functions (or methods), which are accessed using dot notation, e.g.

  • find: E.g. "Hello, world".find("wo") (=7, the index at which the string “wo” starts)
  • replace : E.g. "Hello, world".replace("w", "W") ( results in “Hello, World” - i.e. replaced “w” with “W”)
  • endswith : E.g. "Hello, world".endswith("Dog") (results in False, because the string does not end in “Dog”)
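The operations above can be tried together in one short sketch. The last line is an extra illustration not in the list: find returns -1 when the substring does not occur.

```python
greeting = "Hello " + "world"                   # concatenation: 'Hello world'
repeated = "abc" * 3                            # repetition: 'abcabcabc'
position = "Hello, world".find("wo")            # 7
replaced = "Hello, world".replace("w", "W")     # 'Hello, World'
ends_with_dog = "Hello, world".endswith("Dog")  # False
missing = "Hello, world".find("xyz")            # -1: substring not found

print(greeting, repeated, position, replaced, ends_with_dog, missing, sep=" | ")
```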

The member functions available on strings include:

capitalize    endswith      index         isidentifier  istitle       lstrip        rindex        split         title
casefold      expandtabs    isalnum       islower       isupper       maketrans     rjust         splitlines    translate
center        find          isalpha       isnumeric     join          partition     rpartition    startswith    upper
count         format        isdecimal     isprintable   ljust         replace       rsplit        strip         zfill
encode        format_map    isdigit       isspace       lower         rfind         rstrip        swapcase
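The exact set of methods depends on the Python version (for example, removeprefix and removesuffix were added in Python 3.9), so a quick way to check your own installation is the built-in dir() function. This sketch simply filters out the internal names that start with an underscore.

```python
# dir(str) lists every attribute of the str type, sorted alphabetically;
# names starting with "_" are internal, so we keep only the public ones
methods = [name for name in dir(str) if not name.startswith("_")]

print(len(methods), "public string methods")
print(", ".join(methods))
```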

More details on all of these methods can be found here.

Exercise : Using Python to analyse text

Write a script (name the file exercise_strings.py) to count the number of occurrences of the character “A”, and also the number of occurrences of the sequence “AT”, in the following string of text (tip: carefully double-click on the string to select the whole line, then copy and paste it directly into the file).

'CGCCAATGCGGCAAGGATATGCGAAGTCTGGACTAATTCGGCTGACGTGTCCCTGCTTAGTGGTCTTCCACACTTGCGGATTCAGCCGTAAGTGGCGTATACCTCGTGAGTGCACAAGGCAGATGTGACCTACCGGGGTTTTATCATTAGACTTTTGGGGTGAGCCGGATGACCGATCGAAGCCCGAGTGCAATTGTCTCTCTCGAACGAAGAACGGAGGAGAAAACGTGTGTGGGGGCCTACCGCCATGCACAAACTAGACTGTCACTAAAACCGTGAAGCTACGCTGGCCTCCAGGCGGTATAAACCTTTCGATGTTAACAAGCAAAGAACCAATTCGCGTGAGTAGGCGGGCGTATGGCCCCACGAGCCTTGCACTTGTTTTCGAAATGAATCAGGACGCCTAATTATCAGAGGGAGGAGAAATGAGGCCAGCCAGCGACACTGGTCAAGGTACGGGCGGTCGCTAGTGCCCAACCAAAGGTAAGTTATTGCGATGGTCCAAAAGAAGGCACGTGTGGATACACTCGTTTATGAACGTTTCTACGGCAGATCAGGCCGACCTTCGATAATAACAAGCGGCGGGACGCACGACGGGACTCGCTGTCGGTCAGCTATGGCCATTCCTCGTAGGAGCCGCATCTATCTCGAACTAATTGATAGTTTGGTGTAAGTCCCCTCAGGTGTCACGCAACGAAGATGCGCTGAAGATTACTTTCGCACGGGTCACACGGAAGGAGTACTGTAGGGCGGAAGAGCACCGACTGAGGCCACAATCTCGAAGTACTGTGCTTTCGCTCTAACTCGGCTTACCCGTCTACCTGTCGCCTCCCTAGATCCAAATTGAATCCGCCCCCCGTGCTCTGTGACCCAGGACGTATACGGCGTTTAGGTTGTCCACAGCTAAAAACCAGAAAGCGACCGAGTGTATTCGAAATTTCGGTGGACCTTTCAACCTATAGGTCTTGTCGAATTCACTTGGGAGAACAACGCATGAAATTTGACGGATCGTGCACGTGATATAATGGGACTGCTTAATTGCGCCCCATTTTGGGAGCGCATTTGAACGCAAGCTCTGGGTCCCGCTATATATTAAGAAAAGTATGAAACGTTGTTACCATATCCGCACACTGGGATAGGTACGCAGATTTGTACTTGTATGCGTAACTGATTTTTCCCCTGACGGAGGGTCCGTTCCTCTGAGCCCCCGTCGTGCGATCCTGGGTGGCCACGTCTAAGCTGTCGCGAGCGAACATTATTTATGTTTATCTGCCAGACGAGCTTTGCCTACTTTCGAGGGGATGAAATTTAATTAAGCGATTTGAATATAAGGGGGTTTCATATGCCTAGATTACCTAGTGCGTTTATACAACTATGGTGAATAGAGGAGCAGTCCGAGTTAGAGGACAAACACTTTCGCAGGTGGCAAGTCGCACTAGCGAGTTGATTACGGACCACGAGGTATATTCAGGACATCAATTTTCCTGGGGGGATCATCTCCTCTTACTGTAGCAGCTTTTTTCTCTCCCTGCGGATTCAAAGCCCTTGTTCTGTCGCTGCCATTTAAAGGGAAAGGACTCGGAAGAACAGGTTCAGAGATTGGCAAAGACGGTCTTCTGTGCACTTTGATCATTGTGGCTTGAGGCGGGAGACACGAACGGCGCTAGCGACTCTCATCTACCAGCCTATTATATCCGCTCCCCTGGTTGAGTAAATACCTAATAAGGACTTTTGTCAGATTGACTTTCTGCAAGGGCAGGGATGGCATAGGAGATATTCACTAATAGGATGAACGTCGAAGGAGTAAATTGTTTGGAGTAATATTTTAATTCTCCTCCGCATAAAAACGTGCCTGACTAATGCTGACTGGAAATGACGTCATGGGGTGACATCCTGACAAGTATTCGACAGACGCAGAATGGCGACGGCGCACTCAGATTTAGTCCTCTTCTTCCGAGTAAATACTCGTACACCGCAAAGATTGAGGGCATAGGTAAGC
GTACAAAATCCGGTGTCATCGACCCAAGTAGAGACTACATGACGGGCCGTGAGGTGATCTGATCTTTGACTCTCCGTAAGGTGTCCCTAGGGGGTTCCCATGGTAACGGATTTGCGCTCAACCCGAAACTCGAACAACATCGAAATGAGTATAACGGTTAGAGGTTAGTGGGGGGTGCGAGTGCGGTGTTCCTACTGTACCCGAAGGATAGTCCTGTTTCATTCATATTGGAGATTACAGCCCCTAGAAGTGAGGGAACACGCCCGAGGCTTTCATGGCTACAGGTCGGGATGTCAGCCCCCTCTAAGGTTGGAAGCAATAGATCACCTATGTTAGATGGCAGCTGATTTCCACCTCCTGCCGAAGGTCCCATTATAGGCATCCCAAGGTGCAGTCGATACCCCAATTGTTCGCCTAGTGGTGGAGTGGCCATCTGTGGGGCATGTCATGAAGAACAGGCCACCTCGGCGACCCAACCTCCACTCAGTCGGTCCGCTGAAGTCTCGGAGCTCTAGTTGACGGAAGGCTTCGGGTTTCTCACCACCTGTCCGTAAGAGACCTGTATTGGTCGCACGCAGGAGGAAGACGGCTTACGATGTGTGGCTAATTCGCGTCCTCATGCCCAGCCATACTATGTTGTGACGCGATGACCTCAGCGGTTAATGCCTCTCCGCCAGTTGGATAGTTCGTTCTGGAAACCTGCAATACATCCTTTCGTGCTTGGCGTCTGATAAGAGTAAGGAACTTATTGAACGTTTACCCATAGCGGGCACTTCAAGTCTGGGCCCGAAGGGAACTCGTGATAGGGGGCGCAATGATATTCTGCTGTCTAAAAGCCACGACAAGGTCTCCACAAGTCAGGACGCCAATCCAACTAAATACTGCCGAAATGCGAGAATTCGTGCCCCCACGCACGTTCTAGGCGAGCGTTGGCGTCAGAAATACGTAAGACTGGTGGACTTTGAACAGGCAACGGGCAGCGACTATCGATAAAGTAAATCCCGCGATAGAAGTTACATCTCTTAGCCTCAGAGACTCATACCGGGCGTATCCGGTACGTCATCGCCATGGACCATTCCGGTAAGTCCATATCATATCGAACAGCCTTTACTACTGGAAACCCATCTTCCAGTACATGTCCGGAAATGGGACAATAGAAAACTGCGGTGCGTGAGCCTACTATAGTGTATCCCGGTATAGATTGGTGCTCAGGCAAAAGAGCTCTACGAGACAACGTCGACAGAGACAGGCGATCGTACGAGCGAGTAGGCATCACCTGCGGTGTTTGGACTATGTGAGGAGCATCAGGTCGTCTCTAAAGTATCGACTCTTCGTATTAGGCATCCACTCAAAATGAACCTTGCCCACGTCTCCTCCATCTCAGAGGATATGTCACGTCTGCCTACCTGAATGCCGACTGATTCGTCTACAACCACTAATACGGACGTAGTCTCCTCAAGAGTTACAGGTTAGATCCTTACCCATAATATCGGACAATCGTATCGGGTGGTGGTTAAGCGTCGGCGAGCTGTGGTTCAGTGCGATAGGGTTAACCCGCGTGTTCAACGCCCGGGCACAAGAAGTGAACTAGGCGTCTCGGTCCCGGAGGGTTGGATCCATTTACCATCGAGTACGAATTATGACTCCCTAAGTAATACCAAAAGGCCTAACCGGGCCAGGGCCCGTATCGCACCGACGCTCTGGGGTCCGCCTAGAGGTTGACCGCACGACAGGCCTCCTCCTATAGGCGGTTCCGCGTCGGACTACTATCGTCTGGTGTAAGACACTAAGCTCGAATCGACCACACGTAGATTATTTACGATCATGGTCGCTAGGGACCAGCTGTACAAGCTCGTAAACTTAACCTAGTCAGTATTTTGGACCTTTCAGGTGTACGCCGGAATTGAATTGTGGGCTTCAGCGAGCGATGTCCTTATTTAGCAATTCACGCACGGCGTACTCATATCGCTATAAGCGTGTCCGACCTAAGTGCGTTGGGC
ACTCCGTTCCTGAAAATGTTTTTCGCTGAATCTGGTGTAACCTGCGCGGCGGCATCTTATGAACATTAACCCGCGTCCAGGACGTAAGGATTCCGCACCCTAAGGAAACCGGGTCCGCTTATCAGTATCAGCTCATTGGAGGTTGAAACATTGCTTCCATCATGTCAAATGGTGCGGGAGCGTAGGCTCGTTCAAGGATCAAAGCCGCATGGTCGCCTGCTCTCTAGTTTCAAACTGTTAATAGGAAAACCGTGTACTATTAGAGGGTGGAATCCAAAGCCTTGTAGGGCATATAAGAGGGAAATTCTTTTTCCGGTGCTTAACCCAATGACTCCCTCCGGATAGCCTCACTAAATTCTGGCGATACAACTACTCGTTCGGGATTCTATTGCCTTCCGGATGGTTCCCTGTGCCTATAAGTTCGTTAACGGTGTACCTCGAACAGAATAAAAGTCCACCATGGAAATGGGATTCTCGGAGTGCTCCAGAATGATCTGTTAGCAGCTACGCCGCTGGTACTTCGTAATCCATTAAAGCGGTTTAGACTGCCAACTCCTCCGTGCGCAACAGATAGCCTCAACAATTTACGCCATCTGAGCGGACAGCATTTGATAAGGAATGTACATCACCGGGACTCCTTTTGTGGGAGTGCGGCACGGACGCGTTATGCCGAGTTCTCTAGCTACCCTGGCTAGAGAACCTAGGAGTGCACGTTCGTTTTGAACCCTAAACGTCCGATGCGACCCTTGAGTCGCAAACTGTGTAACATGCCGGCGGTGGGTAAAGTTATCTCTGGGATAGGTCTGAGCTCGCGAAAAAAGTCGCATCCGGGCATGGCTTGCCCAACTGTGGACCATTGCACAATAGCGAAACAGGCATGCGTTAAGTCACACCACAGACCTTGGAATTAGGGCGATGGCGTACCACACCTTATCGTGGAGCCCACCAAGAGAGCAAAAGTCATTAACGATCAATTTTGTAACAGATCTAATTGGATGGAG'