How to improve performance of this code?

Thanks to some help from people here, I was able to get my code for Tasmanian camels puzzle working. However, it is horribly slow (I think. I’m not sure because this is my first program in python). The example run in the bottom of the code takes a long time to be solved in my machine:

dumrat@dumrat:~/programming/python$ time python camels.py
[['F', 'F', 'F', 'G', 'B', 'B', 'B'], ['F', 'F', 'G', 'F', 'B', 'B', 'B'],
 ['F', 'F', 'B', 'F', 'G', 'B', 'B'], ['F', 'F', 'B', 'F', 'B', 'G', 'B'],
 ['F', 'F', 'B', 'G', 'B', 'F', 'B'], ['F', 'G', 'B', 'F', 'B', 'F', 'B'],
 ['G', 'F', 'B', 'F', 'B', 'F', 'B'], ['B', 'F', 'G', 'F', 'B', 'F', 'B'],
 ['B', 'F', 'B', 'F', 'G', 'F', 'B'], ['B', 'F', 'B', 'F', 'B', 'F', 'G'],
 ['B', 'F', 'B', 'F', 'B', 'G', 'F'], ['B', 'F', 'B', 'G', 'B', 'F', 'F'],
 ['B', 'G', 'B', 'F', 'B', 'F', 'F'], ['B', 'B', 'G', 'F', 'B', 'F', 'F'],
 ['B', 'B', 'B', 'F', 'G', 'F', 'F']]

real    0m20.883s
user    0m20.549s
sys 0m0.020s

Here’s the code:

import Queue

fCamel = 'F'
bCamel = 'B'
gap = 'G'

def solution(formation):
    return len([i for i in formation[formation.index(fCamel) + 1:]
                if i == bCamel]) == 0

def heuristic(formation):
    fCamels, score = 0, 0
    for i in formation:
        if i == fCamel:
            fCamels += 1;
        elif i == bCamel:
            score += fCamels;
        else:
            pass
    return score

def getneighbors (formation):
    igap = formation.index(gap)
    res = []
    # AB_CD --> A_BCD | ABC_D | B_ACD | ABD_C
    def genn(i,j):
        temp = list(formation)
        temp[i], temp[j] = temp[j], temp[i]
        res.append(temp)

    if(igap > 0):
        genn(igap, igap-1)
    if(igap > 1):
        genn(igap, igap-2)
    if igap < len(formation) - 1:
        genn(igap, igap+1)
    if igap < len(formation) - 2:
        genn(igap, igap+2)

    return res

class node:
    def __init__(self, a, g, p):
        self.arrangement = a
        self.g = g
        self.parent = p

def astar (formation, heuristicf, solutionf, genneighbors):

    openlist = Queue.PriorityQueue()
    openlist.put((heuristicf(formation), node(formation, 0, None)))
    closedlist = []

    while 1:        
        try:
            f, current = openlist.get()
        except IndexError:
            current = None

        if current is None:
            print "No solution found"
            return None;

        if solutionf(current.arrangement):
            path = []
            cp = current
            while cp != None:
                path.append(cp.arrangement)
                cp = cp.parent
            path.reverse()
            return path

        #arr = current.arrangement
        closedlist.append(current)
        neighbors = genneighbors(current.arrangement)

        for neighbor in neighbors:
            if neighbor in closedlist:
                pass
            else:
                openlist.put((current.g + heuristicf(neighbor),
                             node(neighbor, current.g + 1, current)))

        #sorted(openlist, cmp = lambda x, y : x.f > y.f)

def solve(formation):
    return astar(formation, heuristic, solution, getneighbors)

print solve([fCamel, fCamel, fCamel, gap, bCamel, bCamel, bCamel])
#print solve([fCamel, fCamel, fCamel, fCamel, gap, bCamel, bCamel, bCamel, bCamel])

That is just for 3 camels each. I wanted to do this for 4 at least. That test case is still running (It’s been about 5 minutes now :(). I’ll update this if and when it finishes.

What should I do to improve this code?(Mostly performance-wise, any other suggestions are welcome also).

Thanks.




Answer

I’ve been tripped up by this before too. The bottleneck here is actually if neighbor in closedlist .

The in statement is so easy to use, you forget that it’s linear search, and when you’re doing linear searches on lists, it can add up fast. What you can do is convert closedlist into a set object. This keeps hashes of its items so the in operator is much more efficient than for lists. However, lists aren’t hashable items, so you will have to change your configurations into tuples instead of lists.

If the order of closedlist is crucial to the algorithm, you could use a set for the in operator and keep an parallel list around for your results.

I tried a simple implementation of this including aaronasterling’s namedtuple trick and it performed in 0.2 sec for your first example and 2.1 sec for your second, but I haven’t tried verifying the results for the second longer one.