Unit 3: Strings

Strings are made up of characters enclosed in quote marks. The indexing operator selects a single character from a string:

>>> fruit = "banana"
>>> m = fruit[1]
>>> print(m)
a

The expression fruit[1] selects character number 1 from fruit, and creates a new string containing just this one character. The variable m refers to the result. When we display m, the result may come as a surprise: we get a, not b.

Computer scientists always start counting from zero! The letter at subscript position zero of "banana" is b. So at position [1] we have the letter a.

If we want to access the zero-eth letter of a string, we just place 0, or any expression that evaluates to 0, in between the brackets:

>>> m = fruit[0]
>>> print(m)
b

The expression in brackets is called an index. An index specifies a member of an ordered collection, in this case, the collection of characters in the string. The index indicates which one you want, hence the name. It can be any integer expression.
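As a small illustration of the point that an index can be any integer expression, the sketch below indexes the same string with a literal, a variable, and an arithmetic expression. The variable names (i, first, third, fourth) are chosen here for illustration only. It also shows what happens when the index falls outside the string.

```python
fruit = "banana"

# Any expression that evaluates to an integer can serve as an index.
i = 2
first = fruit[0]        # a literal index
third = fruit[i]        # a variable as the index
fourth = fruit[i + 1]   # an arithmetic expression as the index

print(first, third, fourth)   # b n a

# An index at or beyond the length of the string is out of range.
try:
    fruit[6]
except IndexError as err:
    print("IndexError:", err)
```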

Length

The len function, when applied to a string, returns the number of characters in the string:

>>> fruit = "banana"
>>> len(fruit)
6
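Because counting starts at zero, the length of a string is one more than its largest valid index. The following sketch, with the illustrative names size and last, uses len to reach the last character.

```python
fruit = "banana"
size = len(fruit)

# Valid indices run from 0 to size - 1, so the last character
# sits at index size - 1, not at index size.
last = fruit[size - 1]

print(size, last)   # 6 a
```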

Traversal using the for loop

A lot of computations involve processing a string one character at a time. Often they start at the beginning, select each character in turn, do something to it, and continue until the end.

>>> for c in fruit:
...     print(c)

Each time through the loop, the next character in the string is assigned to the variable c. The loop continues until no characters are left.
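One way to put traversal to work is to count how often a letter occurs. The function count_letter below is a hypothetical helper written for this illustration, not something built into Python. It visits each character in turn and keeps a running total.

```python
def count_letter(text, letter):
    """Count how many times letter occurs in text (illustrative helper)."""
    count = 0
    for c in text:           # c takes each character of text in turn
        if c == letter:
            count += 1
    return count


print(count_letter("banana", "a"))   # 3
```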

Slices

A substring of a string is obtained by taking a slice.

>>> s = "Pirates of the Caribbean"
>>> print(s[0:7])
Pirates
>>> print(s[11:14])
the
>>> print(s[15:24])
Caribbean

The operator [n:m] returns the part of the string from the n’th character to the m’th character, including the first but excluding the last. This behavior makes sense if you imagine the indices pointing between the characters, so that index 0 sits just before the first character and the index equal to the length of the string sits just after the last one.

If you imagine the string as a strip of paper, the slice operator [n:m] copies out the part of the paper between the n and m positions. Provided n and m are both within the bounds of the string, and n is not greater than m, your result will be of length (m-n).
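The length rule can be checked directly. This short sketch reuses the string from the earlier example; the names n, m, and piece are chosen here for illustration.

```python
s = "Pirates of the Caribbean"
n, m = 11, 14

# The slice includes index n and stops just before index m.
piece = s[n:m]

print(piece)              # the
print(len(piece), m - n)  # 3 3
```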

Slices have three further conveniences. If you omit the first index (before the colon), the slice starts at the beginning of the string (or list). If you omit the second index, the slice extends to the end of the string (or list). Similarly, if you provide a value for m that is bigger than the length of the string (or list), the slice will take all the values up to the end. (It won’t give an “out of range” error like the normal indexing operation does.) Thus:

>>> fruit = "banana"
>>> fruit[:3]
'ban'
>>> fruit[3:]
'ana'
>>> fruit[3:999]
'ana'
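To contrast the forgiving behavior of slices with ordinary indexing, here is a minimal sketch. The names head, tail, and empty are illustrative. It also shows that the two halves produced by [:3] and [3:] join back into the original string.

```python
fruit = "banana"

head = fruit[:3]       # from the beginning up to, but excluding, index 3
tail = fruit[3:999]    # an end index past the string just stops at the end
empty = fruit[999:]    # a start index past the string gives an empty string

print(head, tail, repr(empty))   # ban ana ''

# Splitting at the same position and joining gives back the original.
print(head + fruit[3:] == fruit)   # True

# Plain indexing, unlike slicing, does raise an error out of range.
try:
    fruit[999]
except IndexError:
    print("fruit[999] is out of range")
```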

The in and not in operators

The in operator tests for membership. When both of the arguments to in are strings, in checks whether the left argument is a substring of the right argument.

>>> "p" in "apple"
True
>>> "i" in "apple"
False

The not in operator returns the logical opposite of in:

>>> "x" not in "apple"
True
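Combining not in with a for loop gives a compact way to filter characters. The function remove_vowels below is a hypothetical example written for this illustration. It keeps only the characters that are not found in a string of vowels. The last two lines show that in also works with substrings longer than one character.

```python
def remove_vowels(text):
    """Return text with its vowels left out (illustrative helper)."""
    vowels = "aeiouAEIOU"
    result = ""
    for c in text:
        if c not in vowels:   # keep only the non-vowel characters
            result += c
    return result


print(remove_vowels("banana"))   # bnn

print("an" in "banana")    # True: "an" appears inside "banana"
print("nab" in "banana")   # False: the letters occur, but not in this order
```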