Friday, 26 October 2012 at 4:21 pm

Python has a set of specialized data containers that can be extremely useful as an extension to the basic data structures (lists, dictionaries, tuples, ...):

  • Default dictionaries
  • Named tuples
  • Ordered dictionaries

These structures live in the collections module. defaultdict is available from Python 2.5, namedtuple from 2.6 and OrderedDict from 2.7. Import them with:

from collections import defaultdict, namedtuple, OrderedDict

This post describes each structure and gives code examples.

Default dictionary

The defaultdict container is a dictionary that builds a default value for any missing key, so you do not have to initialize the value yourself. For example, with a normal dictionary whose values are lists, you have to create each list first:

myDict = {}
myDict['key01'] = []

Instead of initializing the value with an empty list, you can just create a default dictionary:

myDict = defaultdict(list)
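
As a minimal sketch of what happens on a missing key: the function passed to defaultdict (here list) is called with no arguments to build the default value. The name plainDict and the flag raised are made up for this illustration.

```python
from collections import defaultdict

myDict = defaultdict(list)

# Looking up a missing key calls list() and stores the result under that key,
# so the append works on the first try.
myDict['key01'].append(1)

# A normal dictionary raises KeyError in the same situation.
plainDict = {}
try:
    plainDict['key01'].append(1)
    raised = False
except KeyError:
    raised = True
```

Note that even a plain read such as myDict['other'] creates the key, so use the in operator when you only want to test for membership.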

This might not seem like a big deal. But when you want to insert a list with repeated keys into a dictionary, you sometimes have to do this:

myData = [('key01',1),('key01',2),('key02',5),('key03',6)]
myDict = {}
for key,value in myData:
    if key not in myDict:
        myDict[key] = []
    myDict[key].append(value)

With default dictionaries, you can just do this without worrying about overwriting previous values:

myData = [('key01',1),('key01',2),('key02',5),('key03',6)]
myDict = defaultdict(list)
for key,value in myData:
    myDict[key].append(value)
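
To see the result of the grouping, here is a small self-contained sketch using the same data. The second loop is an extra illustration, not something from the examples above: with int as the factory, missing keys start at 0, which gives a simple counter. The names grouped and counts are made up for this example.

```python
from collections import defaultdict

myData = [('key01', 1), ('key01', 2), ('key02', 5), ('key03', 6)]

# Group values by key; a missing key starts as an empty list.
grouped = defaultdict(list)
for key, value in myData:
    grouped[key].append(value)

# Count how often each key appears; int() returns 0 for a missing key.
counts = defaultdict(int)
for key, value in myData:
    counts[key] += 1

# dict(grouped) == {'key01': [1, 2], 'key02': [5], 'key03': [6]}
# dict(counts)  == {'key01': 2, 'key02': 1, 'key03': 1}
```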

Named tuple

The namedtuple container basically allows you to create a custom data structure in one line:

Hits = namedtuple('hits',['query','subject','bitscore'])
myHitA = Hits('geneA','wnt1',230)
myHitB = Hits(query='geneB',subject='frz',bitscore=200)
myHitC = Hits('geneC',subject='pot',bitscore=250)

The namedtuple function takes in the name of the object and a list of attributes. You can initialize this object with the attribute values in the same order as the list, or pass them by attribute name. If you mix the two, the positional values must come before the named ones.

You can access the data by attribute name or by position, for example myHitA.query or myHitA[0].
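
A minimal sketch of reading the fields, reusing the Hits structure from above. The variable names query, bitscore, q, s, b and betterHit are made up for this illustration.

```python
from collections import namedtuple

Hits = namedtuple('hits', ['query', 'subject', 'bitscore'])
myHitA = Hits('geneA', 'wnt1', 230)

# Access by attribute name.
query = myHitA.query

# Access by position, like a regular tuple.
bitscore = myHitA[2]

# Unpack all fields at once.
q, s, b = myHitA

# Named tuples are immutable; _replace returns a new object with the change.
betterHit = myHitA._replace(bitscore=250)
```

Because the object is still a tuple, it works anywhere a tuple does (sorting, unpacking, dictionary keys), while the attribute names keep the code readable.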


Ordered dictionary

OrderedDict is a dictionary that preserves the order of insertion:

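#OrderedDict expects (key, value) pairs; the keys here are placeholders
myData = [('key05',5),('key01',1),('key10',10),('key39',39),('key20',20)]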
myDict = OrderedDict(myData)
#myDict preserves the order defined in myData.
#Inserted elements will be appended after the last element of myData
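
A small sketch of how the ordering behaves in practice. The keys and values are placeholders chosen for this illustration, and the names keys and first are made up as well.

```python
from collections import OrderedDict

pairs = [('c', 5), ('a', 1), ('b', 10)]
myDict = OrderedDict(pairs)

# A new key goes to the end.
myDict['d'] = 39

# Updating an existing key changes the value but keeps its position.
myDict['c'] = 50

keys = list(myDict.keys())   # ['c', 'a', 'b', 'd']

# popitem(last=False) removes and returns the oldest entry.
first = myDict.popitem(last=False)   # ('c', 50)
```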