This is a really interesting hardware architecture with a lot of out-of-the-box potential. The architecture reference manual (PDF linked below) is particularly worth a read.
Adapteva Launches $99 Parallella Open Computing Platform
@adapteva on Twitter.
From section 2.2 of the “Epiphany Architecture Reference”:
To speed up this calculation using several mesh nodes simultaneously, we first need to distribute the A, B, C matrices over P tasks. Due to the matrix nature of the architecture, the natural way to distribute large matrices is by cutting them into smaller blocks, sometimes referred to as “blocked by row and column”. We then construct a SPMD program that runs on each of the mesh nodes.
Figure 3 shows how the matrix multiplication can be divided into 16 sub-tasks and mapped onto 16 mesh nodes. Data sharing between the sub-tasks can be done by passing data between the cores using a message passing API provided in the Epiphany SDK or by explicitly writing to global shared memory.
This last part about “or writing to global shared memory”, along with the tables of memory-mapped I/O addresses, caught my attention. If you were so inclined, you could write your own lightweight executive kernel directly on top of this hardware, or build on the existing Light Weight Run Time Kernel, Epiphany Resource Library, Standard C RTL port, or some combination of the above…
I signed up to get notifications about hardware availability.