Strings and Types

Authors: Tom Dunham
Date: 2009-03-31

Strings

>>> "hello world"
'hello world'
>>> "AA" + "BB"
'AABB'
>>> "=" * 10
'=========='

String Expressions

>>> greeting = "hello"
>>> greeting
'hello'
>>> greeting + " world"
'hello world'
>>> EcoRI = "GAATTC"
>>> EcoRI
'GAATTC'

Types

http://farm1.static.flickr.com/47/110315663_112b1d05f2.jpg?v=0

Types (2)

Types (3)

We now have another way to describe size = 5

"Bind the name size to an object with the value 5 and the type int"

If we had said size = 5.0, then size would be bound to an object with the value 5.0 and the type float.

And if we had said size = "large", the value would be "large" and the type str (for string).

We have also used the boolean (bool) type in the tests we've written for if statements and while loops.

Types (4)

To get the type of a value, you can use the type function

>>> type(10)
<type 'int'>
>>> type(1.0)
<type 'float'>

>>> size = 10
>>> type(size)
<type 'int'>
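
As a small sketch of the same idea, the loop below binds the name size to three different objects in turn and reports the value and type of each. The helper name describe is made up for this illustration, and type(value).__name__ is used only so that the text reads the same under Python 2 and Python 3.

```python
def describe(value):
    # Hypothetical helper: show a value together with the name of its type.
    return "%r has type %s" % (value, type(value).__name__)

# Bind the name size to an int, then a float, then a str.
for size in [5, 5.0, "large"]:
    print(describe(size))
```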

Type Conversions

Python can convert some types in the manner you would expect

>>> 1 + 1.0
2.0

But some types cannot be converted implicitly

>>> 1 + "1"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'

Explicit conversions

You can attempt to convert a value to another type

>>> float(1)
1.0
>>> int(1.1)
1
>>> str(1)
'1'
>>> int("hello world")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'hello world'
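
When the text to convert comes from a file or from a user, the conversion may fail as it does above. One possible way to cope, sketched here, is to catch the ValueError and fall back to a default. The function name to_int and its default parameter are assumptions made for this example, and try/except has not been introduced in these notes yet.

```python
def to_int(text, default=None):
    # Hypothetical helper: attempt the explicit conversion and return
    # `default` if Python raises ValueError.
    try:
        return int(text)
    except ValueError:
        return default

print(to_int("42"))            # the string "42" becomes the int 42
print(to_int("hello world"))   # cannot be converted, so the default is returned
print(1 + to_int("1"))         # explicit conversion makes the addition legal
```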

String operations

Not all operations have symbolic names like +

>>> EcoRI
'GAATTC'
>>> EcoRI.lower()
'gaattc'

EcoRI is a name which is bound to an object with type str and value GAATTC. The string object supports an operation called lower, which returns a new object of type str with the value gaattc.

String Operations (2)

It's the string object that supports the operation

>>> "hello".upper()
'HELLO'
>>> "TACTTTATATTTTA".replace("T", "U")
'UACUUUAUAUUUUA'
>>> EcoRI = "GAATTC"
>>> seq = "AACAGAATTCTTATATTTTATTTGAATTCTCG"
>>> seq.count(EcoRI)
2

This is the point where I start giving examples with a bias toward biology. Sadly my biology education ended at the age of 16, so it's likely that in contriving examples to demonstrate the programming language I've made huge mistakes with the biology. If so, please accept my apologies and correct me.

Some terms

We have seen a number of ways to invoke operations:

Using functions
>>> square(2)
4
Using operators
>>> 2 + 2
4
Using methods
>>> "TACTTTATATTTTA".replace("T", "U")
'UACUUUAUAUUUUA'

A method is a function that is "stuck" to an object. It always acts upon the object, but can take other parameters as well.
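
One way to see what "stuck to an object" means is to call the same operation twice: once as a method on the object, and once by looking the function up on the str type and passing the object in by hand. This is only an illustration; the variable names are made up, and the first form is the one you would normally write.

```python
dna = "TACTTTATATTTTA"

# The usual method call: the object comes first, other parameters go in the brackets.
rna_1 = dna.replace("T", "U")

# The same function looked up on the type, with the object passed explicitly.
rna_2 = str.replace(dna, "T", "U")

print(rna_1)
print(rna_1 == rna_2)
```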

A small number of operations and their notations are well known from other fields (particularly mathematics). To make these operations easier to read, Python allows the programmer to write them in a familiar manner. When Python evaluates these expressions, it turns them into method calls: 2 + 2 becomes (2).__add__(2).
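
The operator examples from earlier in these notes can be spelled out in the long form to check that both notations give the same result. The variable names are made up for this sketch, and you would not normally write code this way.

```python
# Each operator expression, written as the method call behind it.
total = (2).__add__(2)          # same as 2 + 2
joined = "AA".__add__("BB")     # same as "AA" + "BB"
rule = "=".__mul__(10)          # same as "=" * 10

print(total == 2 + 2)
print(joined == "AA" + "BB")
print(rule == "=" * 10)
```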

Strings as sequences

A string is a sequence of characters, and you can process it one character at a time:

seq = "GAGCTATTACCATAC-TCTCACTGATCGAAACCTTAATACATCATTCTTCGATCCAG-AGGAGGAGGAGATCCTATTTTATATCAACATTTATTT-"
gaps = 0
for base in seq:
    if base == "-":
        gaps = gaps + 1
print "There are", gaps, "gaps in sequence"
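
The same loop can be wrapped in a function so that it works on any sequence. In this sketch the name count_gaps and the short example sequence are made up, and the result is compared against the string's own count method as a cross-check.

```python
def count_gaps(seq):
    # Walk the string one character at a time and count the "-" characters.
    gaps = 0
    for base in seq:
        if base == "-":
            gaps = gaps + 1
    return gaps

example = "GAGC-TATT-ACC"   # made-up sequence containing two gaps
print(count_gaps(example))
print(count_gaps(example) == example.count("-"))
```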

Exercise

  1. What are the types of these expressions (try to work them out first, then check using Python):

    1. 7
    2. "abc"
    3. 5.5
    4. True
    5. 2 == 3
    6. 5.0 / 2.0
    7. 5.0 / 2
    8. 5 / 2
  2. After executing the following:

    def square(x):
        return x * x
    

    What is type(square)?

Using

seq = "AACATTATATTTTATTTTCGGAATTTGGGCAGGTATAGTAGGAACCTCATTAAGATTATTAATTCGAGCTGAACTAGGAAACCCCGGATCTTTAATTGGGGATGATCAAATTTATAATACAATTGTTACAGCACATGCCTTCATTATAATTTTCTTTATAGTTATACCAATTATAATTGGAGGATTTGGAAATTGATTAGTACCATTAATATTAGGAGCTCCAGATATAGCCTTCCCTCGTATAAATAATATAAGTTTTTGACTTCTCCCCCCATCATTAACATTATTAATTTCCAGTAGAATTGTAGAAAATGGAGCAGGAACTGGATGAACTGTTTACCCCCCACTTTCATCTAATATCGCCCATAGAGGAAGATCAGTTGATTTAGCTATTTTTTCCCTTCATTTAGCAGGAATCTCTTCAATTTTAGGAGCAATTAATTTTATTACGACAATCATTAATATACGATTAAACAATTTAATATTTGATCAAATACCTCTATTTGTTTGAGCTGTAGGAATTACCGCTTTCCTTTTACTACTTTCATTGCCCGTATTAGCAGGAGCTATTACCATACTTCTCACTGATCGAAACCTTAATACATCATTCTTCGATCCAGCAGGAGGAGGAGATCCTATTTTATATCAACATTTATTT"
  1. Count the number of bases in seq

  2. Write a function to calculate the GC ratio of a sequence, and try seq

    Where GC ratio is calculated by: (A + T) / (G + C) (that's what it says on Wikipedia: http://en.wikipedia.org/wiki/GC-content)

  3. Write a function that reverse-transcribes RNA to DNA

Modules

Modules contain functions (and objects) that you can import and use in your program

>>> import string

A module is an object

>>> string
<module 'string' from 'C:\Python25\lib\string.pyc'>
>>> string.uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

Importing names

>>> from string import maketrans

maketrans is a function

>>> maketrans
<built-in function maketrans>
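
The two import styles can be compared side by side. This sketch uses the standard math module rather than string, only because math looks the same in Python 2 and Python 3.

```python
import math              # binds the name math to the module object
from math import sqrt    # binds the name sqrt directly to the function

print(type(math).__name__)   # the module is itself an object with a type
print(math.sqrt(16.0))       # reaching the function through the module
print(sqrt is math.sqrt)     # both names refer to the same function object
```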

Help

Python comes with built-in help

Try it

>>> help()

Note that the prompt changes to help>

>>> import string
>>> help(string.maketrans)

String Translations

To convert vowels into numbers:

>>> tt = maketrans("aeiou", "01234")
>>> "fantastic".translate(tt)
'f0nt0st2c'
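
A translation table is also a convenient way to complement a DNA sequence. The sketch below is one possible approach, with a made-up function name complement. The try/except around the import is an addition that is not in these notes: it is there because Python 3 no longer has string.maketrans and provides str.maketrans instead.

```python
try:
    from string import maketrans    # Python 2, as used in these notes
except ImportError:
    maketrans = str.maketrans       # Python 3 equivalent

# Map each base to its pairing partner: A<->T and C<->G.
complement_table = maketrans("ACGT", "TGCA")

def complement(dna):
    # Hypothetical helper: translate every base using the table above.
    return dna.translate(complement_table)

print(complement("GAATTC"))
```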

Strings and sequences

You can get characters from a string by index

>>> EcoRI
'GAATTC'
>>> EcoRI[1]
'A'
>>> EcoRI[0]
'G'
>>> EcoRI[-1]
'C'

Substrings

A string can be sliced, to give copies of substrings.

>>> EcoRI
'GAATTC'
>>> EcoRI[1:3]
'AA'
>>> EcoRI[1:]
'AATTC'
>>> EcoRI[1:-1]
'AATT'

Length

You can get the length of a string using len.

>>> len(EcoRI)
6
>>> EcoRI[1:3]
'AA'
>>> len(EcoRI[1:3])
2
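
Slicing and len can be combined to find where a recognition site occurs, by sliding a window along the sequence and comparing each slice with the site. This is a rough sketch rather than the best way to search a string. The function name find_sites is made up, and it uses range and a list, which have not been covered in these notes.

```python
def find_sites(seq, site):
    # Hypothetical helper: return every index at which `site` starts in `seq`.
    positions = []
    for start in range(len(seq) - len(site) + 1):
        # Slice out a window the same length as the site and compare.
        if seq[start:start + len(site)] == site:
            positions.append(start)
    return positions

EcoRI = "GAATTC"
seq = "AACAGAATTCTTATATTTTATTTGAATTCTCG"
print(find_sites(seq, EcoRI))
```

The number of positions found should agree with seq.count(EcoRI) for this sequence, because the two occurrences do not overlap.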

Strings Review