Range Select Benchmark Instructions

Martin Nettling edited this page Dec 15, 2013 · 14 revisions

This wiki page describes how to run the range select benchmark from the paper "DRUMS: Disk Repository with Update Management, for high throughput sequencing data." Tutorials showing how to use DRUMS with SNP data and HERV data, including example files, can be found in the two tutorial packages ("herv/tutorial", "snp/tutorial").

The results of this benchmark are shown in Figure 7 of the paper "DRUMS: Disk Repository with Update Management, for high throughput sequencing data."

long OUTPUT_AFTER_SELECTS = 10000;
long returnedElements = 0;
long performedSelects = 0;

DRUMS

Open an existing DRUMS table. The stored configuration and the HashFunction used for the table are loaded automatically. Instantiate a DRUMSReader to select elements from the underlying table.

DRUMSParameterSet<HERV> globalParameters = new DRUMSParameterSet<HERV>(new File("/the/path/to/the/DRUMStable"));
DRUMS<HERV> drums = DRUMSInstantiator.openTable(AccessMode.READ_ONLY, globalParameters);
DRUMSReader<HERV> reader = drums.getReader();

Performing the range select benchmark is very similar to performing the random lookup benchmark.

HERV

...
long time = System.currentTimeMillis();
// iterate over all previously generated ranges and create key-only lower and upper HERV objects for each
{
    HERV lower = new HERV(chromosome1, lower_startPositionChromosome, 0, 0, 0, 0);
    HERV upper = new HERV(chromosome1, upper_startPositionChromosome, Integer.MAX_VALUE, 65536, 65536, 65536);
    List<HERV> result = reader.getRange(lower.getKey(), upper.getKey());
    // remove elements which have an evalue larger than the requested one
    returnedElements += result.size();
    // report progress every OUTPUT_AFTER_SELECTS range requests
    if (++performedSelects % OUTPUT_AFTER_SELECTS == 0) {
        System.out.println("Performed last " + OUTPUT_AFTER_SELECTS + " range requests in " + (System.currentTimeMillis() - time) + 
        " milliseconds. Requested " + returnedElements + " elements in total.");
        time = System.currentTimeMillis();
    }
}
System.out.println("Performed last range requests in " + (System.currentTimeMillis() - time) + 
" milliseconds. Requested " + returnedElements + " elements in total.");
...
drums.close();
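
The progress reporting in the loop above can also be isolated from DRUMS and checked on its own. The following is a minimal sketch under that assumption: `SelectProgress` and its methods are hypothetical names introduced here for illustration and are not part of the DRUMS API, and the result sizes in `main` are made-up stand-ins for `result.size()`.

```java
/** Hypothetical helper (not part of DRUMS): counts selects and returned elements. */
final class SelectProgress {
    private final long outputAfterSelects;
    private long performedSelects = 0;
    private long returnedElements = 0;
    private long intervalStart;

    SelectProgress(long outputAfterSelects) {
        if (outputAfterSelects <= 0) {
            throw new IllegalArgumentException("outputAfterSelects must be positive");
        }
        this.outputAfterSelects = outputAfterSelects;
        this.intervalStart = System.currentTimeMillis();
    }

    /** Records one range select; returns true if a progress line was printed. */
    boolean record(int resultSize) {
        returnedElements += resultSize;
        performedSelects++;
        if (performedSelects % outputAfterSelects != 0) {
            return false;
        }
        long now = System.currentTimeMillis();
        System.out.println("Performed last " + outputAfterSelects + " range requests in "
                + (now - intervalStart) + " milliseconds. Returned "
                + returnedElements + " elements in total.");
        intervalStart = now; // start timing the next interval
        return true;
    }

    long getPerformedSelects() {
        return performedSelects;
    }

    long getReturnedElements() {
        return returnedElements;
    }

    public static void main(String[] args) {
        SelectProgress progress = new SelectProgress(2);
        // Stand-in values for result.size() of five consecutive range selects.
        int[] resultSizes = {3, 0, 5, 1, 2};
        for (int size : resultSizes) {
            progress.record(size);
        }
    }
}
```

In the benchmark loop, `progress.record(result.size())` would take the place of the manual counter updates and the `if` block.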

SNP

The DRUMS benchmark for SNP data looks exactly the same, except that the SNP class is used instead of the HERV class.

...
     SNP lower = new SNP(sequenceId, lower_positionOnChromosome, 0);
     SNP upper = new SNP(sequenceId, upper_positionOnChromosome, 65536);
...

MySQL

Don't forget to configure MySQL properly.

To select records from the MySQL tables using Java, open a connection to the database with a JDBC driver (com.mysql.jdbc.Driver). The only difference from the DRUMS benchmark is the way records are requested.
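
As a minimal sketch of the connection step, the snippet below opens a JDBC connection and prepares the HERV range query. It swaps in a `PreparedStatement` with placeholders where the snippets on this page concatenate the query string. The class name `HervRangeQuery`, the JDBC URL, the credentials, and the column types (integer chromosome and positions, floating-point eValue) are assumptions made for illustration; only the table and column names are taken from this page.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper: prepares the HERV range query with bound parameters. */
final class HervRangeQuery {
    // Same conditions as the concatenated query, with placeholders for the values.
    static final String SQL =
            "SELECT * FROM herv WHERE chromosome = ? "
            + "AND startPositionChromosome BETWEEN ? AND ? "
            + "AND eValue <= ?";

    /** Opens a connection; the MySQL JDBC driver must be on the classpath. */
    static Connection open(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }

    /** Binds one range to the query. The parameter types are assumptions. */
    static PreparedStatement prepare(Connection connection, int chromosome,
                                     int lowerStart, int upperStart,
                                     double eValue) throws SQLException {
        PreparedStatement statement = connection.prepareStatement(SQL);
        statement.setInt(1, chromosome);
        statement.setInt(2, lowerStart);
        statement.setInt(3, upperStart);
        statement.setDouble(4, eValue);
        return statement;
    }

    public static void main(String[] args) {
        // Printing the query needs no database; connecting would need a real server, e.g.
        // open("jdbc:mysql://localhost:3306/benchmark", "user", "password") (placeholder values).
        System.out.println(SQL);
    }
}
```

The statement returned by `prepare` can be reused for every range by rebinding the parameters, which avoids rebuilding the query string for each request.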

HERV

...
selectQuery = "SELECT * FROM herv WHERE chromosome = " + lower.getChromosome() + 
    " AND startPositionChromosome BETWEEN " + lower.getStartPositionChromosome() + " AND " + upper.getStartPositionChromosome() +
    " AND eValue <= " + evalue;
ResultSet set = statement.executeQuery(selectQuery);
ArrayList<HERV> result = new ArrayList<HERV>();
while (set.next()) {
   HERV record = new HERV();
   // fill HERV with data from set
   result.add(record);
}
...

SNP

To perform the same benchmark for SNP data, only the query and the class used need to be adapted.

...
selectQuery = "SELECT * FROM arabreplace WHERE sequence_id = " + lower.getSequenceId() + 
    " AND position BETWEEN " + lower.getBasePosition()  + " AND " + upper.getBasePosition() ;
ResultSet set = statement.executeQuery(selectQuery);
ArrayList<SNP> result = new ArrayList<SNP>();
while (set.next()) {
   SNP record = new SNP();
   // fill SNP with data from set
   result.add(record);
}
...
