Kmeds Algo
This is the final project for a bioinformatics class I took last semester:
http://err.bio.nyu.edu/courses/index.php/V22.0480_Final_Project
I worked on Project 1. The goal was to find a scalable algorithm for clustering large data sets (ones too big to fit into memory all at once) with an arbitrary number of dimensions. We wrote a SQLite adapter to grab n data points at a time (with n chosen based on k, the number of expected clusters), ran a clustering algorithm on those n points, and stored the resulting medoids in memory (if the data set was large enough to require it, we could write them back to the SQLite DB instead). After num.iter iterations, we ran the algorithm again on that result set to get our final medoids. From there, it's relatively easy to assign each point to its nearest medoid, which forms the final clusters.
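Here is a minimal Python sketch of that two-stage idea. It is not the project's code, and it rests on several assumptions. The data live in a hypothetical SQLite table `points` whose columns are all numeric dimensions. Distance is Euclidean. Each chunk is clustered with a simple alternating k-medoids seeded farthest-first, which may differ from the routine the project used. Chunks are read sequentially with `fetchmany`. The chunk size n = 40 + 2k is borrowed from CLARA's default sample size rather than from the project. `num_iter` stands in for num.iter. All function names (`iter_chunks`, `k_medoids`, `chunked_k_medoids`, `assign_all`) are hypothetical.

```python
import math
import random
import sqlite3
from itertools import islice
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def iter_chunks(conn: sqlite3.Connection, n: int) -> Iterator[List[Point]]:
    """Yield rows of the hypothetical `points` table, n rows at a time."""
    cur = conn.execute("SELECT * FROM points")  # assumes every column is a numeric dimension
    while True:
        rows = cur.fetchmany(n)
        if not rows:
            return
        yield [tuple(float(v) for v in row) for row in rows]


def nearest(p: Point, medoids: Sequence[Point]) -> int:
    """Index of the medoid closest to p (Euclidean distance)."""
    return min(range(len(medoids)), key=lambda i: math.dist(p, medoids[i]))


def k_medoids(points: Sequence[Point], k: int, max_iter: int = 50, seed: int = 0) -> List[Point]:
    """Alternating k-medoids on an in-memory list: assign points, then re-pick medoids."""
    rng = random.Random(seed)
    medoids = [rng.choice(points)]
    while len(medoids) < k:
        # Farthest-first seeding (an assumption): next medoid is the point farthest from those chosen.
        medoids.append(max(points, key=lambda p: min(math.dist(p, m) for m in medoids)))
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, medoids)].append(p)
        # New medoid = member with the smallest total distance to its cluster;
        # an empty cluster keeps its old medoid.
        new = [
            min(c, key=lambda m: sum(math.dist(m, q) for q in c)) if c else medoids[i]
            for i, c in enumerate(clusters)
        ]
        if new == medoids:
            break
        medoids = new
    return medoids


def chunked_k_medoids(conn: sqlite3.Connection, k: int, num_iter: int) -> List[Point]:
    """Cluster up to num_iter chunks separately, then cluster the pooled medoids."""
    n = 40 + 2 * k  # assumed chunk size (CLARA's default sample size), not the project's
    candidates: List[Point] = []
    for i, chunk in enumerate(islice(iter_chunks(conn, n), num_iter)):
        # A short final chunk with <= k rows is kept whole rather than clustered.
        candidates.extend(chunk if len(chunk) <= k else k_medoids(chunk, k, seed=i))
    return k_medoids(candidates, k)


def assign_all(conn: sqlite3.Connection, medoids: Sequence[Point], n: int = 1000) -> Iterator[Tuple[Point, int]]:
    """Stream every point from the DB, paired with the index of its nearest medoid."""
    for chunk in iter_chunks(conn, n):
        for p in chunk:
            yield p, nearest(p, medoids)


if __name__ == "__main__":
    # Toy data: two shuffled 2-D blobs around (0, 0) and (10, 10).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (x REAL, y REAL)")
    rng = random.Random(1)
    rows = [(rng.gauss(c, 0.5), rng.gauss(c, 0.5)) for c in (0.0, 10.0) for _ in range(150)]
    rng.shuffle(rows)
    conn.executemany("INSERT INTO points VALUES (?, ?)", rows)
    medoids = chunked_k_medoids(conn, k=2, num_iter=5)
    print("final medoids:", medoids)
    sizes = [0] * len(medoids)
    for _, label in assign_all(conn, medoids):
        sizes[label] += 1
    print("cluster sizes:", sizes)
```

At any moment, the sketch holds one chunk plus at most k × num_iter candidate medoids in memory. `assign_all` yields labels as a stream, so they could be written back to the database instead of kept in a list. Because chunks are read in table order, a table sorted by cluster would bias the early chunks. Shuffling the rows, as the demo does, or sampling each chunk at random avoids that.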
