
Software



A cluster like HPU4Science requires careful software selection to get the most out of the hardware. This page describes the software choices for the HPU4Science cluster; a more detailed description is in the second Ars Technica article.


Operating System - LINUX


Linux is the current standard bearer among operating systems for high performance computing, running on almost 92 percent of the top 500 supercomputers. Most major scientific computation projects in the world, including the Large Hadron Collider, run on Linux. The key features of Linux are stability, the ability to pick from a wide variety of file systems (more on this in a moment), the large existing code base for high performance computing, and the ease of tailoring the OS to specific hardware requirements. The availability of open source projects with code that can be configured for highly parallelized processing was also an important consideration.



File System - BTRFS

For writing data at the highest possible speeds, one would normally choose a good PCI RAID card or on-motherboard RAID. However, on-motherboard RAID chips are not as fast as PCI cards, and their configuration can be both quirky and unstable. Hardware RAID also makes you dependent on a hardware vendor’s choices and whims, which may not align with the specific usage profiles of this system. Moreover, hardware RAID controllers come at a price. The HPU4Science budget is large but not infinite, and that money is better spent on more computational power (GPUs!).

These considerations led to investigating BTRFS as a file system, because it allows software RAID with performance on par with hardware RAID. If it can perform as well as hardware RAID, software RAID is ideal because it can be more easily tailored to the specific system requirements and offers substantially more flexibility. It also makes future upgrades much easier, as they do not require reconfiguring firmware or investing in new hardware. On top of that, because of the large amount of data that needs to be stored, the risk of data corruption is high, and BTRFS’s checksumming makes such corruption detectable.
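
As a rough illustration of why per-block checksums help with data reliability, here is a minimal Python sketch. It is not how BTRFS is implemented: the block size, the helper names (`write_blocks`, `find_corrupt_blocks`) and the use of `zlib.crc32` are all assumptions made for the example (BTRFS uses CRC32C by default, which is a different polynomial).

```python
import zlib

BLOCK_SIZE = 4096  # illustrative block size, not a BTRFS setting


def write_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into blocks and keep a checksum next to each block."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return [(block, zlib.crc32(block)) for block in blocks]


def find_corrupt_blocks(stored):
    """Return the indices of blocks that no longer match their checksum."""
    return [i for i, (block, checksum) in enumerate(stored)
            if zlib.crc32(block) != checksum]


stored = write_blocks(bytes(range(256)) * 64)  # 16 KiB of data, 4 blocks

# Simulate silent corruption: flip one bit in block 2, keep the old checksum.
block, checksum = stored[2]
damaged = bytes([block[0] ^ 0x01]) + block[1:]
stored[2] = (damaged, checksum)

corrupt = find_corrupt_blocks(stored)
print(corrupt)
```

The checksum only detects the damaged block. Repairing it requires a second good copy, which is where a redundant software RAID layout comes in.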

Primary Programming Language - Python


Given the choice of a highly parallelized GPU-based cluster, there are very few choices for programming languages. The freely available libraries from nVidia for interfacing with the GPUs (CUDA) are written in C. Relative to previous general purpose GPU computation approaches, CUDA makes GPU processing easier by abstracting away most of the interaction between the software and the hardware, and it opens up several hardware functions, such as shared memory, that some open source implementations leave out.


SAGE


The research performed on the HPU4Science cluster is expected to require extensive mathematical exploration. The researchers are not trying to create new theorems, but they do use high-level math, say level 3 on a log scale that ranges from 1 for Sudoku to 5 for Weinberg's Quantum Field Theory. Therefore, the system must support both numeric and symbolic math. Since the programming for the cluster is largely done in Python, the mathematical software should also interact well with Python.

Sage, a Computer Algebra System (CAS) whose development was initiated by a number theorist, offers power comparable to commercial CASs, and it is both written in Python and uses Python as its language. Sage is a combination of many open source mathematical and scientific packages, including Maxima, Octave, Numeric Python, Scilab, SymPy, Matplotlib, LaTeX, etc., bound together into a single framework that lets users work in a single language but access a wide universe of software. It can also work with commercial software, including Mathematica and MATLAB. Sage provides an interactive graphical user interface (through any web browser) that is stylistically similar to Mathematica, but also very light and ideal for server configurations.
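
To give a feel for mixing symbolic and numeric math from Python, here is a small sketch using SymPy, one of the packages listed above. It is plain Python rather than Sage-specific syntax, it assumes SymPy is installed, and the expression chosen is arbitrary.

```python
import sympy as sp

x = sp.symbols("x")
expr = sp.sin(x) * sp.exp(-x)

# Symbolic step: differentiate the expression exactly.
derivative = sp.diff(expr, x)

# Numeric step: compile the symbolic result into an ordinary Python
# function backed by the standard math module.
f_prime = sp.lambdify(x, derivative, "math")

print(derivative)
print(f_prime(0.0))
```

The same object moves from exact manipulation to fast numeric evaluation without leaving the language, which is the property the cluster's workflow relies on.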


Literate Programming and Reproducible Computational Research

Donald Knuth, master of us all, has long advocated literate programming as the way forward for technical programming, but these ideas have been superbly ignored by the large majority of people who code for science (let's not even mention people who just "code"). The core concept behind literate programming is to describe what you want to do and code it all at the same time, making sure that humans can understand what you’re doing, not just machines.
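
As a loose approximation of that idea in plain Python (this is not Knuth's WEB system, and the function is a made-up example), the explanation, the code, and a check a human can verify by hand can live side by side in a docstring that `doctest` executes:

```python
import doctest


def trapezoid(f, a, b, n=100):
    """Approximate the integral of f over [a, b] with the trapezoid rule.

    The interval is cut into n equal slices of width h = (b - a) / n.
    Each slice is treated as a trapezoid, so the two end points are
    weighted by 0.5 and every interior point by 1.

    The rule is exact for straight lines, which gives a check that can
    be verified by hand: the integral of 2x from 0 to 1 is 1.

    >>> trapezoid(lambda x: 2 * x, 0.0, 1.0, n=4)
    1.0
    """
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Run every example embedded in the docstrings of this module.
results = doctest.testmod()
print(results)
```

If the prose and the code ever drift apart, the embedded example fails, so the description stays honest about what the code does.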


