Advances in Numerical Algorithms and High Performance Computing

University College London, April 14-15, 2014

Background to the Conference

Two factors are creating major new challenges at the interface between numerical analysis (NA) and high performance computing (HPC). The use of more realistic but complicated mathematical models is leading to problems for which new algorithms need to be developed if they are to be tractable. Moreover, computer hardware is in the middle of a dramatic paradigm shift: improvements in raw computational performance no longer come from faster CPUs but from increasing parallelism and heterogeneity. Standard algorithms and associated software are unable to fully exploit this new generation of hardware. The long-standing lack of investment in the development of high quality mathematical software exacerbates the issue. The net effect is that scientists are unable to make the best use of computational resources and hence are unable to tackle problems of the size that the emerging hardware potentially allows. For all these reasons, new algorithms need to be developed that employ novel mathematical, algorithmic and coding techniques.

David Silvester & Nick Higham Coordinators of the EPSRC Network on NA-HPC.

Twitter hashtag #NAHPC14

Storify of the event.

Scope of the Meeting

The meeting will be broad in scope and is intended to facilitate collaboration between numerical analysts, computer scientists and developers and users of HPC. The aim of the meeting is to bring together practitioners in this challenging and modern research field, present advances in the area, and enable lively and fruitful exchange of ideas.

Download a full programme here: Advances in Numerical Algorithms and High Performance Computing UCL Programme

14th April: Programme

(All talks will take place in the UCL Roberts Building, Room 106.)


09:50 Introduction

Ulrich Rüde, Erlangen, Germany

Is 2.44 trillion unknowns the largest finite element system that can be solved today? (slides available here)

Supercomputers have progressed beyond the peta-scale, i.e. they are capable of performing in excess of 10^15 operations per second. I will present parallel multigrid based solvers for FE problems with more than a trillion unknowns. This is enough, for example, to discretize the whole volume of the planet with a global resolution of about 1 km. Since the compute times are around 1 minute for computing a single solution, these algorithms can still be used reasonably within an implicit time stepping procedure or a nonlinear iteration.
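The resolution claim can be checked with a rough back-of-the-envelope calculation. The sketch below is not taken from the talk: it assumes a spherical Earth with a mean radius of 6371 km and one unknown per cubic cell of edge 1 km, and the function name is illustrative only.

```python
import math

# Assumption (not from the abstract): spherical Earth with mean radius 6371 km.
EARTH_MEAN_RADIUS_KM = 6371.0


def cells_for_resolution(radius_km: float, resolution_km: float) -> float:
    """Number of cubic cells of edge `resolution_km` needed to fill a sphere."""
    volume_km3 = 4.0 / 3.0 * math.pi * radius_km ** 3
    return volume_km3 / resolution_km ** 3


cells = cells_for_resolution(EARTH_MEAN_RADIUS_KM, 1.0)
print(f"{cells:.2e} cells at 1 km resolution")
```

Under these assumptions the count comes out slightly above 10^12, which is consistent with the statement that a trillion unknowns suffice for a global resolution of about 1 km.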


Tim Warburton, Rice University, USA 

Manycore Algorithms for High-order Finite Element Methods: when time to solution matters. (slides available here)

The ultimate success of many modeling applications depends on time to solution. I will illustrate the critical nature of time to solution by describing a joint project between my group at Rice University and Dr David Fuentes at the MD Anderson Cancer Center. The project goal is to evaluate the role and viability of using finite element modeling as part of the treatment planning process for MR Guided Laser Induced Thermal Therapy. The success of this project will depend in great part on the ability to model individual treatments with calculations that take mere seconds.

Modern many-core processing units, including graphics processing units (GPU), presage a new era in on-chip massively parallel computing. The advent of processors with O(1000) floating point units (FPU) raises new issues challenging conventional measures of "optimality" of numerical methods. The ramp up in FPU counts for each new generation of GPU over the past four years has been accompanied by a slower increase in the memory capacity of the GPU. For example, a few hundred US dollars currently buys a parallel computer that is capable of performing O(4E12) floating point operations per second, but only of reading O(5E10) values from memory per second. From the point of view of numerical analysis, this means that the traditional approach of comparing optimality of alternative numerical methods based on their floating point operation count per degree of freedom has become mostly irrelevant. Claims of optimality derived from this measure therefore need to be reevaluated, and the formulation of numerical methods in general needs to be revisited given the changing computational landscape.
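To make the imbalance concrete, the two figures quoted above can be combined into a single ratio. The sketch below is an illustration rather than part of the talk: it uses a simple roofline-style bound (an assumption introduced here), and the function names and the vector-addition example are hypothetical.

```python
# Figures quoted in the abstract (orders of magnitude only).
PEAK_FLOPS = 4e12        # floating point operations per second
PEAK_VALUES_READ = 5e10  # values read from memory per second


def machine_balance(peak_flops: float, peak_values_read: float) -> float:
    """Flops the device can perform in the time it takes to read one value."""
    return peak_flops / peak_values_read


def attainable_flops(flops_per_value: float) -> float:
    """Roofline-style bound: limited by either compute or memory traffic."""
    return min(PEAK_FLOPS, flops_per_value * PEAK_VALUES_READ)


balance = machine_balance(PEAK_FLOPS, PEAK_VALUES_READ)
print(f"break-even intensity: {balance:.0f} flops per value read")

# Hypothetical kernel: vector addition c = a + b reads two values per flop
# (the write of c is ignored here for simplicity).
vector_add = attainable_flops(0.5)
print(f"vector add: {vector_add / PEAK_FLOPS:.3%} of peak")
```

With these numbers a kernel must perform roughly 80 floating point operations per value read before arithmetic, rather than memory traffic, becomes the limiting factor, which is why operation counts alone say little about time to solution.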

The presentation will touch on several important and interlinked issues that affected the development of solvers based on high-order finite element methods for rapidly evolving many-core architectures. We will discuss on-chip scalability, multi-GPU scalability, inter-generational GPU scaling, and specialization for element internal structure.

Finally, I will introduce the OCCA API that my team is developing as a thin portability layer to enable our simulation codes to be threaded using OpenMP, OpenCL, or CUDA as selected dynamically at runtime. This additional flexibility enables us to include the threading model as an additional search direction when we optimize the simulation codes for a given processor. I will give comparisons of the performance of the simulator using OCCA on multiple different vendor devices using different threading models.


Mike Dewar, NAG

Numerical Libraries for Modern Architectures (slides available here)

The NAG Library has been under development since 1970 and is now the largest commercially available collection of mathematical and statistical algorithms.  In the four decades that we have been developing software there have been many changes in computer architectures, and somehow NAG has always managed to adapt to them. This talk will look at the challenges of manycore and accelerators, and how we are addressing them.

12:30 Lunch and posters

Hans Petter Langtangen, SIMULA

Code Generation for High-Level Problem Specification and HPC (anahpc13 Langtangen presentation slides)

This talk describes two software tools, FEniCS and Mint, for writing simple, serial code, close to the mathematical description of the problem, yet with high computational performance on modern architectures. Both tools build on the idea that in a limited problem domain, here partial differential equations, it is feasible to automatically translate the high-level problem description into low-level HPC code in C or C++. This approach allows the HPC competence to be built into software and lets the application scientist concentrate on the science and the principal algorithms.

The FEniCS project enables high-level specification of finite element variational problems in Python and generates C++ code tailored to the problem at hand. The C++ code is linked to state-of-the-art libraries for linear algebra (e.g., PETSc or Trilinos) and made available to the user's Python program. The talk features many examples of the functionality and capabilities of FEniCS as a general-purpose tool for solving PDEs by finite elements.

Mint is a more specialized tool that takes serial C code with pragmas and generates highly optimized code for multicore/GPU architectures. Mint has a particular focus on finite difference stencil-type operations. Examples of applications and performance results will be presented.


Jiahao Chen, MIT

What's next in Julia?

Julia is a language for technical computing which is rapidly gaining popularity. This talk will introduce some of the unique features of the Julia language, and explain how its rich type system and multiple dispatch together allow for expressive syntax that is well suited for the implementation of numerical algorithms. Recent progress toward native Julia algorithms for scalable numerical linear algebra computations will also be presented.

15:40 Break

Jos Martin, Mathworks

MATLAB: The challenges involved in providing a high-level language on high-performance hardware. (anahpc13 Jos Martin presentation slides)

MATLAB provides a high-level programming language and environment that is widely used for scientific computing across academia and industry. GPUs & cluster environments provide fantastic computational power for certain classes of problem. Bringing these two together allows MATLAB programmers to solve bigger problems faster, but presents a number of technical challenges.

In this talk we will look at the ways in which we have tried to provide access to hardware without requiring the user to learn new programming techniques or breaking the standard MATLAB programming model. Doing this presents several challenges: in defining a simple interface, in reproducing answers and in managing and maintaining the hardware and software stack. We will discuss some of these challenges and the ways in which we have chosen to address them.

19:00  A joint workshop dinner is planned from 7pm. The venue will be announced closer to the meeting.


15th April: Programme

(All talks will take place in the UCL Roberts Building, Room 106.)



Mike Heroux, Sandia National Laboratories, USA

Toward the Next Generation of Parallel and Resilient Algorithms & Libraries

For decades parallel computing has been the focus of intense research and development in selected fields, and numerous large-scale parallel applications have been developed. SPMD via MPI has been a dominant approach to parallelism to date, but this approach alone will be insufficient going forward. Presently we are on the threshold of large-scale algorithm re-design and implementation across most application areas, but the path to developing these algorithms is uncertain. There are many competing concerns, and the number of choices is growing. Furthermore, resilience is becoming an issue and may force algorithm developers to explicitly manage system faults, beyond checkpoint-restart.

In this presentation we discuss some of the principles of parallel algorithm development that have produced today's approaches, and how we can address these principles going forward. We also discuss what must change in order to move forward and give ideas for developing parallel algorithms and libraries now that will have sustained value in the future.

10:20 Break

Didrik Pinte, Enthought Europe

Harnessing the Power of HPC with the Python Ecosystem

Numerical analysis is at a crossroads where hardware advances are outpacing the capability of software to harness its full potential, leaving data scientists constrained by the limitations of standard algorithms and generic software. To date, the lagging development of more sophisticated mathematical software has been in part due to the need for a programming language that can be learned quickly by a broad audience, yet still be powerful and flexible enough to handle the information throughput of HPC applications.

While Python has historically been overlooked in the high-performance computing world, today, with Python libraries such as NumPy, the language has transformed from a generalist programming language into a highly capable platform for developing next generation algorithms and HPC frameworks. Because Python is easy to learn but also easy to extend, professionals and researchers can build scalable, fast and robust software in a fraction of the time required by other coding languages.

In this presentation, we will illustrate how Python provides the tools and flexibility to develop software that keeps pace with the advances in HPC and how Python’s high-level programming stack is enriched by its broad ecosystem of libraries tailored to mathematical and scientific computing. We’ll provide examples of Python’s applicability for HPC ranging from how libraries such as NumPy allow for rapid manipulation and processing of large datasets to creating distributed arrays and interfacing with GPUs. With Python, data scientists now have the building blocks to create comprehensive HPC solutions to fully leverage the power of today’s hardware.


Spencer Sherwin, Imperial College

The Nektar++ framework for tensor product based spectral/hp element discretisations: Application to high Reynolds number complex geometry flows.

Spectral/hp element methods are the high order extension to classical finite element methods using piecewise polynomial expansions. When developing spectral/hp element discretisations a number of design and implementation questions naturally arise. These include: What polynomial orders are most efficient? Should one use hierarchical or nodal expansion bases or even pure spectral expansions? Should one use discontinuous or continuous projection or explicit versus implicit time stepping?

In the Nektar++ library we have designed a framework based on tensor product 1D, 2D and 3D expansions in mixed elemental shapes which can be applied to investigate the above design choices. In this presentation we will review our framework to provide the motivation behind the design of our library. In addition we will briefly discuss the challenges of maintaining this type of framework and making it easier to use for novice users. As a means of demonstrating some of the features and challenges for these discretisations we will also report on current applications to large eddy simulation of high/industrial Reynolds number, complex geometry flows which require specialised high order meshing strategies and numerical stabilisation strategies.

12:30 Closing remarks
12:40 Lunch



Registration is now closed. The administrator is Jenny Gradwell.
