We need an OpenCL implementation of the system, so we can run experiments much faster. If we free the CPU to concentrate on higher-level tests, we start to get close to the really interesting stuff. We expand the adjacent possible! (Eric, LOL)
In a first phase, we probably want at least one function, callable from Python:
get_active_hard_locations(bitstring)
which should return both the "close" hard locations and their distances. I've just committed a very basic draft that computes the distance for every hard location, and I'm working on code that selects just those below a certain threshold to return to the CPU.
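For reference, the per-location test is just a popcount of the XOR of the two bitstrings. A minimal sketch, assuming each bitstring is packed into a single Python integer (the names here are illustrative, not the ones in the committed draft):

```python
# Sketch of the per-hard-location distance test. Assumes bitstrings are
# packed into Python integers; "hamming" is an illustrative name.

def hamming(a, b):
    """Hamming distance = number of bit positions where a and b differ."""
    return bin(a ^ b).count("1")

# Example: 1011 vs 0010 differ in bit 0 and bit 3.
assert hamming(0b1011, 0b0010) == 2
```

On the GPU the same XOR + popcount runs once per hard location per work-item; the hard part is not the distance, it's collecting the hits, as described below.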
So we have a selection problem. Given one million hard locations, we want something like 1000 back. That's the catch: the GPU can compute over the entire million with no trouble, but building a compact result array from the hits gets awkward (variable-length lists are impossible so far).
The problem is that we have to insert a hard location into array position i (say it was the i-th active hard location the system found). Now:
what is "i"?
Remember the kernels run massively in parallel, so it's hard to keep a counter visible to all work-items without collisions. Serially, we'd have something like this:
void Get_Active_Locs(bitstring b) {
    int num_found = 0;
    for (int x = 0; x < NUM_LOCS; x++) {
        int hamming = dist(location[x], b);
        if (hamming < threshold) {
            active_locs[num_found] = x;   /* slot i is just the running count */
            num_found++;
        }
    }
}
In parallel, we have concurrency problems with num_found!
__kernel void Get_Active_Locs(bitstring b, __global int *num_found) {
    int x = get_global_id(0);
    int hamming = dist(location[x], b);
    if (hamming < threshold) {
        active_locs[*num_found] = x;   /* RACE: two work-items can read the same slot */
        (*num_found)++;                /* RACE: increments get lost */
    }
}
I'm looking into pyopencl.scan.GenericScanKernel() --> http://documen.tician.de/pyopencl/algorithm.html
I think its philosophy is close to MapReduce. Let's see where this leads.
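To see why a scan answers the "what is i?" question: flag each location 0/1, take an exclusive prefix sum of the flags, and each active location's output index i is simply its prefix-sum value, with no shared counter at all. That's the idea GenericScanKernel parallelizes; a serial sketch with toy names:

```python
# Sketch of scan-based stream compaction -- the idea behind
# pyopencl.scan.GenericScanKernel. Names here are illustrative.

def compact(distances, threshold):
    """Return indices of all distances below threshold, via a prefix sum."""
    flags = [1 if d < threshold else 0 for d in distances]
    # Exclusive prefix sum: index[x] = number of hits strictly before x.
    # That is exactly "i" for location x.
    index, total = [], 0
    for f in flags:
        index.append(total)
        total += f
    out = [None] * total
    for x, f in enumerate(flags):
        if f:                      # each hit writes to its precomputed slot
            out[index[x]] = x
    return out

# distances 0 and 1 (positions 0 and 3) are below the threshold of 2
assert compact([0, 2, 4, 1], threshold=2) == [0, 3]
```

Unlike the atomic-counter version, this keeps the hits in index order and has no contention; the prefix sum itself parallelizes in O(log n) steps, which is what the scan kernel provides.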