How Not To Iterate Something in Python

This is an article for some friends at the pyslack study group.

A common beginner mistake when iterating over the results of an API call or a set of database rows, for example, is to end up with several parallel lists of elements.

Let's build a simple "third-party API" -- a list of three people, where a person is a dict with name, height, and state.

In [9]:
john = {'name': 'John', 'height': 73, 'state': 'NC'}
jenniffer = {'name': 'Jenniffer', 'height': 65, 'state': 'NC'}
helen = {'name': 'Helen', 'height': 62, 'state': 'CA'}

people_api = [john, jenniffer, helen]

# We have a list of dicts:
people_api
Out[9]:
[{'height': 73, 'name': 'John', 'state': 'NC'},
 {'height': 65, 'name': 'Jenniffer', 'state': 'NC'},
 {'height': 62, 'name': 'Helen', 'state': 'CA'}]

Iterating it, starting out innocently:

The mistake usually starts out innocently. I want to iterate the list, but at first I only want to look at the names, to prove I can do it, so I do something like this:

In [7]:
names = []
for person in people_api:
    names.append(person["name"])
    
names
Out[7]:
['John', 'Jenniffer', 'Helen']
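When the loop body is a single append like this, Python's list comprehension builds the same single list in one expression. The sketch below re-declares people_api from above so it runs on its own.

```python
# Same data as people_api above, re-declared so this block is self-contained.
people_api = [
    {'name': 'John', 'height': 73, 'state': 'NC'},
    {'name': 'Jenniffer', 'height': 65, 'state': 'NC'},
    {'name': 'Helen', 'height': 62, 'state': 'CA'},
]

# Equivalent to: create an empty list, loop, append person['name'].
names = [person['name'] for person in people_api]
print(names)  # ['John', 'Jenniffer', 'Helen']
```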

Here comes the mistake

The mistake happens when we think, "Oh darn, I forgot I needed the heights. Oh well, if one list is good, two must be better." So we add another list.

In [11]:
names = []
heights = []
for person in people_api:
    names.append(person["name"])
    heights.append(person['height'])

print(names)
print(heights)
['John', 'Jenniffer', 'Helen']
[73, 65, 62]

Two Lists are NOT Better than One

Unfortunately, the benefits of lists are not cumulative. They're not like twenty-dollar bills, where having two of them makes you richer. Having a list is more like having a date for a particular showing of a movie -- one is a good time (maybe), but two spells trouble.

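Note that we had to call print twice. That's better than getting a soda thrown on you, but things get worse when you need to iterate the two lists together: a name and its height are linked only by their position, so every loop has to pair them back up by index, and nothing stops the two lists from drifting out of step.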
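Here is a minimal sketch of what iterating the two parallel lists together looks like, using the built-in zip() to pair them by position. The truncated heights_missing_one list is a hypothetical stand-in for a bug where one append was skipped.

```python
# Same data as people_api above, re-declared so this block is self-contained.
people_api = [
    {'name': 'John', 'height': 73, 'state': 'NC'},
    {'name': 'Jenniffer', 'height': 65, 'state': 'NC'},
    {'name': 'Helen', 'height': 62, 'state': 'CA'},
]

# The two-list version from above.
names = []
heights = []
for person in people_api:
    names.append(person['name'])
    heights.append(person['height'])

# Pairing the lists back up relies on both having the same order and length.
for name, height in zip(names, heights):
    print(name, height)

# Hypothetical bug: one height never got appended.
heights_missing_one = heights[:-1]

# zip() stops at the shorter list, so Helen is silently dropped.
pairs = list(zip(names, heights_missing_one))
print(pairs)  # [('John', 73), ('Jenniffer', 65)]
```

On Python 3.10 and later, zip(names, heights, strict=True) raises ValueError when the lengths differ, but that only catches a length mismatch, not two lists that are the same length and out of order.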

So how do you fix it?

Easy -- never build more than one list coming out of a loop.

The pattern looks like this:

create collection     (a list, for example)
loop                  (... over the api)
    build             (... an object.  Create it and add values.  May span multiple lines.)
    append            (Add the object you built to the list)
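The collection in the pattern does not have to be a list. As an illustration, here is the same four-step pattern with a dict keyed by name as the collection. This assumes names are unique, which holds for this sample data but is not guaranteed for a real API.

```python
# Same data as people_api above, re-declared so this block is self-contained.
people_api = [
    {'name': 'John', 'height': 73, 'state': 'NC'},
    {'name': 'Jenniffer', 'height': 65, 'state': 'NC'},
    {'name': 'Helen', 'height': 62, 'state': 'CA'},
]

# create collection: a dict keyed by name (assumes names are unique)
people_by_name = {}
# loop over the API results
for person in people_api:
    # build one object holding everything we need about this person
    record = {'height': person['height'], 'state': person['state']}
    # "append": for a dict, that means storing the object under its key
    people_by_name[person['name']] = record

print(people_by_name['Helen'])  # {'height': 62, 'state': 'CA'}
```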

Applying the Pattern -- Using Python dicts

In [17]:
people_with_heights = []
for person in people_api:
    person_with_height = {'name': person['name'], 'height': person['height']}
    people_with_heights.append(person_with_height)

print(people_with_heights)
[{'name': 'John', 'height': 73}, {'name': 'Jenniffer', 'height': 65}, {'name': 'Helen', 'height': 62}]
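With one list of records, every later step is a single loop, and each name stays attached to its own height. This sketch re-declares people_with_heights as printed above. The cutoff of 64 in the filter is an arbitrary value chosen for illustration.

```python
# people_with_heights as printed above, re-declared so this block is self-contained.
people_with_heights = [
    {'name': 'John', 'height': 73},
    {'name': 'Jenniffer', 'height': 65},
    {'name': 'Helen', 'height': 62},
]

# One loop, one print per person: no second list to keep in sync.
for person in people_with_heights:
    print(person['name'], person['height'])

# Sorting moves whole records, so names and heights can't get mismatched.
shortest_first = sorted(people_with_heights, key=lambda p: p['height'])
print([p['name'] for p in shortest_first])  # ['Helen', 'Jenniffer', 'John']

# Filtering works the same way (64 is an arbitrary example cutoff).
over_64 = [p['name'] for p in people_with_heights if p['height'] > 64]
print(over_64)  # ['John', 'Jenniffer']
```

Doing the same sort with two separate lists would mean reordering both of them in exactly the same way, which is precisely the kind of bookkeeping the single-list pattern avoids.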

Applying the Pattern 2.0 -- Using Named Tuples

Python's namedtuple, from the collections module, is another cool way to do this.

In [26]:
from collections import namedtuple

# A lightweight record type with named fields
Person = namedtuple('Person', ['name', 'state'])
people = []

for person in people_api:
    # Build one Person per API record, then append it to the single list
    p = Person(name=person['name'], state=person['state'])
    people.append(p)

print(people)
[Person(name='John', state='NC'), Person(name='Jenniffer', state='NC'), Person(name='Helen', state='CA')]
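A namedtuple's fields are read by attribute, and when you discover you forgot a field (like the heights earlier), the fix is one more name in the type rather than one more list. A minimal sketch, re-declaring people_api so it runs on its own:

```python
from collections import namedtuple

# Same data as people_api above, re-declared so this block is self-contained.
people_api = [
    {'name': 'John', 'height': 73, 'state': 'NC'},
    {'name': 'Jenniffer', 'height': 65, 'state': 'NC'},
    {'name': 'Helen', 'height': 62, 'state': 'CA'},
]

# Forgot the heights? Add a field to the type instead of adding a list.
Person = namedtuple('Person', ['name', 'height', 'state'])

people = []
for person in people_api:
    p = Person(name=person['name'], height=person['height'], state=person['state'])
    people.append(p)

# Fields are accessed by name, which reads better than p[0] or p['name'].
for p in people:
    print(p.name, p.height, p.state)

nc_names = [p.name for p in people if p.state == 'NC']
print(nc_names)  # ['John', 'Jenniffer']
```

Because this API's keys happen to match the field names exactly, Person(**person) would also work here. If the API ever adds a key that isn't a field, that shortcut raises TypeError, so the explicit version is the safer default.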