Over the past several years, new DNA sequencing technologies have greatly increased the quantity of biological sequence data that can be generated. A typical data set may contain millions or even billions of short read sequences, each a few hundred base pairs long, that are to some degree redundant: the data fall naturally into clusters of sequences that are highly similar to each other. To reduce the time required to analyze the data, it is therefore of interest to compute representatives of these clusters, based on some definition of similarity. The book examines two clustering software packages, USEARCH and DNACLUST, which seek to perform this clustering task efficiently. An overview of the techniques used by these two packages is presented. The packages are compared and evaluated from both methodological and experimental perspectives, and conclusions are drawn about their effectiveness and utility. Before the algorithms' techniques are described, preliminaries are presented for readers who are new to bioinformatics or have only a basic understanding of the field. Knowledgeable readers may skip this section.