Google Tech TalksMarch, 7 2008ABSTRACTCoconut is a developing system for high-assurance, high-performancesoftware. It was used to develop a library of special functions forthe Cell BE processor, which is distributed in the Cell BE SDK 3.0 asMASS. Average performance is 4X better than the alternativehand-tuned C library, SimdMath.Coconut has been successful where patterns of efficienthardware-specific computation can be captured as higher-orderfunctions and encoded in a Domain Specific Language embedded inHaskell. Patterns include efficient control structures notexpressible in C, e.g., the MultiLoop, and efficient uses of SIMDinstructions which require significant compile-time computation forpattern specialization. Some patterns interact with a novelinstruction scheduler called Explicitly Staged Software Pipelining,based on a min-cut approach, which outperforms SWING modulo schedulingin our tests.A less developed aspect of Coconut is the parallel production ofproofs of correctness along with executables. Current work aims toprove only limited properties about programs---the ones most likely tobe broken---creative use of SIMD instructions, and parallelization.Coconut intermediate code is represented as nested code (hyper)graphs. At the lowest level, we transform acyclic loop bodies to remove theeffect of SIMDization, and produce machine and/or human readablespecifications. This has been used to verify opaque patterns ofoptimizing linear algebra for SIMD processors.Such code graphs are embedded in higher levels containing controlflow, first single-threaded control flow optimized for ILP, and thenparallel control-flow, optimized to hide communication latency. Atthis level control flow is restricted to allow peak utilization ofmulti-core hardware, but enable efficient compile-time verification ofsoundness. Soundness, in this context, means that the parallelizedprogram can be transformed into a code graph without synchronizingcontrol flow, because every execution can be shown to produce the sameresult. Think of it as reducing the parallel debugging effort to thesingle-threaded debugging problem by eliminating the non-determinisminherent in parallel code. I will give a formal language descriptionof the language, and the O(n) algorithm which proves soundness andproduces the equivalent ``single threaded'' code graph.Speaker: Christopher AnandChristopher Anand is a professor of Computing and Software at McMaster University. His main research areas are software correctness, high performance computation, and automatic code generation. He has also founded the company Optimal Computational Algorithms to provide hardware-specific libraries for scientific applications on novel architectures.