SourceCluster 1.0 is a program that tests whether sequences from the same source are more closely related to each other than expected by chance alone. SourceCluster uses a distance matrix of the proportion of nucleotide differences for all possible pairwise comparisons between isolates in the sample set to calculate the average distance between isolates. The pairwise differences in each column are summed. Next, the column sum for each isolate representing a given source population (i.e., human, animal, and food) is divided by the number of pairwise comparisons. This value is termed the T-statistic. The program then calculates a T-statistic for each of 1,000 randomized distance matrices, which are generated by swapping the column labels of the observed distance matrix. The T-statistic obtained from the observed distance matrix is compared with the T-statistics from the randomized matrices to determine the statistical significance of same-source isolate clustering within each tree.
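
The general idea of such a label-permutation test can be sketched in Java as follows. This is an illustration only and is not the SourceCluster source code: the class and method names are hypothetical, the statistic is read here as the mean distance over same-source pairs (one possible interpretation of the description above), and the p-value uses the common (k + 1) / (n + 1) convention, which is an assumption rather than something the program is known to do.

```java
import java.util.Random;

/** Hypothetical sketch of a label-permutation test on a pairwise distance matrix. */
class SourceClusterSketch {

    /** Mean distance over all pairs of isolates that share a source label (assumed statistic). */
    static double meanWithinSourceDistance(double[][] dist, String[] source) {
        double sum = 0.0;
        int pairs = 0;
        for (int i = 0; i < dist.length; i++) {
            for (int j = i + 1; j < dist.length; j++) {
                if (source[i].equals(source[j])) {
                    sum += dist[i][j];
                    pairs++;
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("no same-source pairs");
        }
        return sum / pairs;
    }

    /** Returns a Fisher-Yates shuffled copy of the labels; the input array is not modified. */
    static String[] shuffled(String[] labels, Random rng) {
        String[] copy = labels.clone();
        for (int i = copy.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            String tmp = copy[i];
            copy[i] = copy[j];
            copy[j] = tmp;
        }
        return copy;
    }

    /** One-sided p-value: share of relabelings whose statistic is at most the observed one. */
    static double permutationPValue(double[][] dist, String[] source,
                                    int permutations, long seed) {
        double observed = meanWithinSourceDistance(dist, source);
        Random rng = new Random(seed);
        int atMostObserved = 0;
        for (int p = 0; p < permutations; p++) {
            double t = meanWithinSourceDistance(dist, shuffled(source, rng));
            if (t <= observed) {
                atMostObserved++;
            }
        }
        // Assumed convention: count the observed labeling as one of the permutations.
        return (atMostObserved + 1.0) / (permutations + 1.0);
    }

    public static void main(String[] args) {
        // Made-up toy data: two sources, small distances within a source, larger between.
        String[] source = {"human", "human", "human", "food", "food", "food"};
        double[][] dist = new double[6][6];
        for (int i = 0; i < 6; i++) {
            for (int j = 0; j < 6; j++) {
                if (i != j) {
                    dist[i][j] = source[i].equals(source[j]) ? 0.01 : 0.10;
                }
            }
        }
        System.out.println("observed statistic = " + meanWithinSourceDistance(dist, source));
        System.out.println("p-value = " + permutationPValue(dist, source, 1000, 42L));
    }
}
```

Shuffling the source labels while leaving the distances fixed is equivalent to swapping the column labels of the matrix. A small p-value in this sketch would suggest that same-source isolates are closer together than a random assignment of sources would produce.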

First, you will need to download the Java 2 SDK, Standard Edition, version 1.4.0 (also known as JDK 1.4.0) to your computer.

The pairwise nucleotide difference matrix can be generated from a NEXUS/PAUP alignment by performing the "pairdiff" procedure in the PAUP* software. The output should be saved as a matrix. This file will require minor manual modifications.

### SourceCluster Resources:

SourceCluster source code (.java file) (.class file)

Documentation

SourceCluster example

Sample infile

Sample outfile