Saturday, June 13, 2009

Kmeds Algo

This is the final project for a bioinformatics class I took last semester:

http://err.bio.nyu.edu/courses/index.php/V22.0480_Final_Project


I worked on Project 1. The goal was to find a scalable algorithm for clustering large data sets (too large to fit into memory all at once) with an arbitrary number of dimensions. We wrote a SQLite adapter that grabbed n data points at a time (with n chosen based on k, the expected number of clusters), then ran a clustering algorithm on those n points, storing the results in memory (if the data set was large enough to require it, we could write them back to the SQLite DB instead). After num.iter iterations, we ran the algorithm again on the accumulated result set to get our final medoids. From there, it's relatively easy to assign each point to its nearest medoid, forming the final clusters.
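For anyone who wants to see the shape of the idea, here is a minimal sketch in Python. It is not the project's code, and everything specific in it is an assumption made for illustration: the `points` table with two numeric columns `x` and `y`, the function names, the `batch_size` parameter (standing in for n), and Euclidean distance. The per-batch clustering step is a simple alternating k-medoids loop, which is a stand-in for a full PAM implementation.

```python
import random
import sqlite3


def dist(a, b):
    """Euclidean distance between two equal-length tuples (an assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def k_medoids(points, k, rng, max_rounds=50):
    """Alternating k-medoids (a simplification, not full PAM)."""
    # Start from k distinct points so no two medoids coincide.
    medoids = rng.sample(sorted(set(points)), k)
    for _ in range(max_rounds):
        # Assignment step: each point joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for p in points:
            clusters[min(medoids, key=lambda m: dist(p, m))].append(p)
        # Update step: the new medoid is the member with the smallest
        # total distance to the rest of its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist(c, q) for q in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids


def batched_medoids(conn, k, batch_size, num_iter, seed=0):
    """Cluster num_iter batches from SQLite, then cluster the batch medoids."""
    rng = random.Random(seed)
    candidates = []
    # Hypothetical schema: a table `points` holding only coordinates.
    cur = conn.execute("SELECT x, y FROM points")
    for _ in range(num_iter):
        batch = cur.fetchmany(batch_size)  # only one batch in memory at a time
        if len(set(batch)) < k:
            break  # not enough distinct rows left to pick k medoids
        candidates.extend(k_medoids(batch, k, rng))
    # Second pass: run the same algorithm on the collected medoids.
    return k_medoids(candidates, k, rng)


def assign(conn, medoids):
    """Stream every row once, yielding (row, index of its nearest medoid)."""
    for row in conn.execute("SELECT x, y FROM points"):
        yield row, min(range(len(medoids)), key=lambda i: dist(row, medoids[i]))


# Small demo on synthetic data: two blobs around (0, 0) and (10, 10).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
data_rng = random.Random(1)
rows = [
    (data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
    for cx, cy in [(0, 0), (10, 10)]
    for _ in range(100)
]
data_rng.shuffle(rows)
conn.executemany("INSERT INTO points VALUES (?, ?)", rows)

medoids = batched_medoids(conn, k=2, batch_size=40, num_iter=5)
labels = list(assign(conn, medoids))
print(medoids)
```

Only one batch plus the running list of candidate medoids is held in memory at any time, which is the property that lets this approach scale past available RAM. If the candidate list itself grew too large, it could be written to a second table rather than kept in a Python list.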


2 Comments:

At June 19, 2009 4:41 PM , Blogger Kevin said...

hm creative thinking

 
At August 4, 2009 10:51 AM , Blogger Gene said...

Ironically, I've been doing a lot of 'pam' clustering myself, but with much smaller data sets.

 
