### Teraflop Parallel Computing on a Budget: Applications of GPU Computing in Mechanical Engineering

August 30, 2009

Sponsored By: **National Science Foundation**

In the first part of the workshop, a sequence of four presentations will highlight the impact of GPU computing across a wide spectrum of Mechanical Engineering applications (agent-based modeling, finite element analysis, manufacturing, and computational dynamics). The second part of the workshop builds around a “Getting Started with GPU Computing” presentation and a discussion of more advanced performance tuning aspects of GPU computing.

## Schedule

#### Part 1: Introduction to GPU Computing

| Time | Speaker | Title |
| --- | --- | --- |
| 12:30 – 2:00 | Dan Negrut, University of Wisconsin, Madison | Getting Started with GPU Computing |

#### Part 2: GPU-enabled Research in Mechanical Engineering

| Time | Speaker | Title |
| --- | --- | --- |
| 2:10 – 2:30 | Krishnan Suresh, University of Wisconsin, Madison | GPU-based Algebraic Reduction of Bone Models |
| 2:30 – 2:50 | Roshan D’Souza, Michigan Tech University, Houghton | Where Video Games Meet Scientific Computations |
| 2:50 – 3:10 | Sara McMains, University of California, Berkeley | GPUs for Computer Aided Design and Manufacturability Analysis |
| 3:10 – 3:30 | Dan Negrut, University of Wisconsin, Madison | Large Scale Collision Detection on the GPU |

#### Part 3: Advanced GPU Computing Tutorial and Q&A Session

| Time | Speaker | Title |
| --- | --- | --- |
| 3:40 – 4:40 | Sara McMains, University of California, Berkeley | GPU-based Algebraic Reduction of Bone Models |
| 4:50 – 5:30 | Panel Discussion | Where Video Games Meet Scientific Computations |

## Abstracts

#### Krishnan Suresh: GPU-based Algebraic Reduction of Bone Models (Vikalp Mishra, Krishnan Suresh)

##### 1. Background

There is an increasing demand for rapid finite element analysis of bio-artifacts such as osteopathic bones, fractured hip-joints, etc. Bio-artifacts are typically represented via a grey-scale voxel image that may be obtained via various modalities such as CT-scan, MRI, etc. Consequently, a finite element analysis of such artifacts typically proceeds along one of two paths: (1) a direct hex-mesh based analysis, where each voxel, or group of voxels, is replaced by an equivalent finite-element hex endowed with hex-shape functions, or (2) a boundary-based analysis, where the artifact's boundary is inferred via various user-assisted algorithms, followed by a standard finite element analysis of the enclosed volume.

Each strategy has its strengths and weaknesses. Briefly, the hex-method directly exploits the voxel output of various modalities, but can be computationally daunting since hex-methods can result in large stiffness matrices. The boundary-based method, on the other hand, is computationally less demanding, but is a lengthier process that relies heavily on manual assistance. Further, both strategies share a common weakness in that they do not directly support lower-dimensional modeling. This is particularly important since bones often exhibit beam-like behavior.

##### 2. Algebraic Reduction

We shall present an efficient yet easily automatable method for analyzing long and slender bones. The proposed method directly exploits voxel data from CT scanners, and it achieves high computational efficiency through an algebraic reduction process. In the proposed reduction process, a beam bending stiffness matrix is constructed by extracting the appropriate strain energy from the voxel data. This is then projected onto a lower-dimensional space by appealing to standard 1-D beam theories, such as Euler-Bernoulli theory. Thus, neither a 3-D hex mesh nor the boundary of the bone needs to be constructed. The theoretical validity of the proposed methodology is substantiated through numerical experiments.

##### 3. GPU Implementation

The proposed method also lends itself to easy parallelization. We demonstrate an implementation of the proposed algebraic reduction on an NVIDIA general-purpose graphics processing unit (GPU), using the C-like CUDA language. We report the speed-up offered by the GPU relative to an implementation on a high-end CPU workstation.

##### 4. References

Jorabchi, K., Danczyk, J., and Suresh, K., Shape Optimization of Potentially Slender Structures, submitted to the Journal of Computing and Information Science in Engineering, 2008.

Wang, C. M., Reddy, J. N., and Lee, K. H., Shear Deformable Beams and Plates: Relationship to Classical Solutions, London: Elsevier Science, 2000.

#### Roshan D’Souza: When Video Games Meet Scientific Computing: Large-Scale Agent-Based Model Simulations on the GPU

Agent-based modeling is a bottom-up technique to simulate many discrete dynamic systems (swarms, networks, etc.). It is a direct computational representation where the behavior of individual entities in the dynamic system is represented using a software construct called an agent. System-level behaviors are obtained through the interaction of a large number of these agents. There are no analytical tools to predict system behaviors, and as such the only way to analyze the system is through simulation. Due to its emergent nature, ensemble properties are dependent on simulation size (agent population). Representative agent populations can range into the tens of millions for certain systems. Current state-of-the-art systems are incapable of efficiently computing such large simulations. In this presentation I describe some of the early work that we have done in enabling large-scale agent-based model simulations on graphics processing units (GPUs). I will discuss representation of agent state data, algorithms for agent communication, agent spawning, conflict resolution, and statistics. We have implemented and tested our techniques on an NVIDIA 8800 GTX GPU. Benchmarks against state-of-the-art toolkits show that performance gains of over 1000x are possible.
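To make the notion of conflict resolution concrete, here is a minimal sketch, not the presenter's method. It assumes agents live on a grid, each agent proposes one target cell per step, only currently empty cells can be claimed, and the lowest agent index wins a contested cell. The function name `resolve_moves` and this tie-breaking rule are illustrative assumptions. The two phases (claim, then move) mirror how a data-parallel version might use one thread per agent with an atomic minimum per cell.

```python
def resolve_moves(positions, proposals):
    """Resolve conflicting moves for one synchronous simulation step.

    positions: list of (x, y) cells, one per agent (all distinct).
    proposals: list of (x, y) cells each agent would like to move to.
    Returns the list of new positions. Hypothetical rule: only empty
    cells can be claimed, and the lowest agent index wins a tie.
    """
    occupied = set(positions)

    # Phase 1: every agent files a claim; keep the minimum index per cell.
    # On a GPU this would correspond to an atomic-min on a per-cell slot.
    winner = {}
    for agent, target in enumerate(proposals):
        if target in occupied:
            continue  # cell is taken (or is the agent's own cell): no claim
        winner[target] = min(winner.get(target, agent), agent)

    # Phase 2: winners move, everyone else stays where they were.
    return [
        proposals[agent] if winner.get(proposals[agent]) == agent else positions[agent]
        for agent in range(len(positions))
    ]


if __name__ == "__main__":
    start = [(0, 0), (2, 0), (5, 5)]
    wanted = [(1, 0), (1, 0), (5, 5)]  # agents 0 and 1 want the same cell
    print(resolve_moves(start, wanted))
```

Because winners only ever move into cells that were empty at the start of the step, no two agents can end up in the same cell, whatever order the claims are processed in.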

#### Sara McMains: NURBS Evaluation using the GPU (Adarsh Krishnamurthy, Sara McMains)

We present a unified method for evaluating and displaying Non-Uniform Rational B-Spline (NURBS) surfaces using the Graphics Processing Unit (GPU). NURBS surfaces, the de facto standard in commercial mechanical CAD modeling packages, are currently being tessellated into triangles before being sent to the graphics card for display, since there is no native hardware support for NURBS. Our method uses a unified GPU fragment program to evaluate the surface point coordinates of any arbitrary degree NURBS patch directly, from the control points and knot vectors stored as textures in graphics memory. The display incorporates dynamic Level of Detail (LOD) for real-time interaction at different resolutions of the NURBS surfaces. Different data representations and access patterns are compared for efficiency and the optimized evaluation method is chosen. Our GPU evaluation and rendering speeds are more than 40 times faster than evaluation using the CPU.
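As a small illustration of what "evaluating from control points and knot vectors" involves, the sketch below evaluates a point on a NURBS curve on the CPU using the standard Cox-de Boor recursion. It is not the authors' fragment program: it handles a curve rather than a surface (a surface applies the same basis functions in two parameter directions), and the function names are chosen here for illustration.

```python
import math


def bspline_basis(i, p, u, knots):
    """Cox-de Boor recursion: i-th B-spline basis function of degree p at u."""
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    value = 0.0
    left_span = knots[i + p] - knots[i]
    if left_span > 0.0:  # skip terms with zero-length knot spans (0/0 := 0)
        value += (u - knots[i]) / left_span * bspline_basis(i, p - 1, u, knots)
    right_span = knots[i + p + 1] - knots[i + 1]
    if right_span > 0.0:
        value += (knots[i + p + 1] - u) / right_span * bspline_basis(i + 1, p - 1, u, knots)
    return value


def nurbs_curve_point(u, degree, knots, control_points, weights):
    """Rational (weighted) combination of control points at parameter u.

    Valid for u in the half-open range [knots[degree], knots[-degree - 1]).
    """
    numerator = [0.0] * len(control_points[0])
    denominator = 0.0
    for i, (point, weight) in enumerate(zip(control_points, weights)):
        b = bspline_basis(i, degree, u, knots) * weight
        denominator += b
        numerator = [n + b * c for n, c in zip(numerator, point)]
    return [n / denominator for n in numerator]


if __name__ == "__main__":
    # A quarter of the unit circle as a quadratic NURBS curve.
    knots = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    control_points = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    weights = [1.0, math.sqrt(2.0) / 2.0, 1.0]
    x, y = nurbs_curve_point(0.5, 2, knots, control_points, weights)
    print(x, y, math.hypot(x, y))
```

Each parameter value is evaluated independently of every other, which is what makes this kind of computation a natural fit for one-fragment-per-sample evaluation on the GPU.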

#### Dan Negrut: Rigid Body Collision Detection (Hammad Mazhar, Dan Negrut)

This work concentrates on the issue of rigid body collision detection, a critical component of any software package employed to approximate the dynamics of multibody systems with frictional contact. This paper presents a scalable collision detection algorithm designed for massively parallel computing architectures. The proposed approach is implemented on a ubiquitous Graphics Processing Unit (GPU) card and shown to achieve a 180x speedup over state-of-the-art Central Processing Unit (CPU) implementations when handling multi-million object collision detection. GPUs are composed of many (on the order of hundreds) scalar processors that can simultaneously execute an operation; this strength is leveraged in the proposed algorithm. The approach can detect collisions between five million objects in less than two seconds; with newer GPUs, the capability of detecting collisions between eighty million objects in less than thirty seconds is expected. The proposed methodology is expected to have an impact on a wide range of granular flow dynamics and smoothed particle hydrodynamics applications, e.g., sand, gravel, and fluid simulations, where the number of contacts can reach into the hundreds of millions.
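The abstract does not spell out the algorithm, so the following is only a generic sketch of one common way to make collision detection scale: spatial binning on a uniform grid, shown here sequentially for equal-radius spheres. The function name `find_sphere_contacts` and the equal-radius assumption are illustrative choices, not details taken from the paper. Binning restricts the exact distance test to objects in neighboring cells, and the per-cell work is independent, which is the property a massively parallel implementation would exploit.

```python
import math
from collections import defaultdict
from itertools import product


def find_sphere_contacts(centers, radius):
    """Return sorted index pairs (i, j), i < j, of spheres that touch or overlap.

    centers: list of (x, y, z) sphere centers; all spheres share `radius`.
    """
    cell_size = 2.0 * radius  # touching spheres are at most one cell apart
    limit_sq = (2.0 * radius) ** 2

    # Phase 1: bin every sphere by the grid cell containing its center.
    bins = defaultdict(list)
    for index, (x, y, z) in enumerate(centers):
        key = (math.floor(x / cell_size),
               math.floor(y / cell_size),
               math.floor(z / cell_size))
        bins[key].append(index)

    # Phase 2: exact test only against the 27 cells around each occupied cell.
    contacts = set()
    for (cx, cy, cz), members in bins.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbors = bins.get((cx + dx, cy + dy, cz + dz), ())
            for i in members:
                for j in neighbors:
                    if i < j:
                        dist_sq = sum((a - b) ** 2
                                      for a, b in zip(centers[i], centers[j]))
                        if dist_sq <= limit_sq:
                            contacts.add((i, j))
    return sorted(contacts)


if __name__ == "__main__":
    spheres = [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0), (5.0, 5.0, 5.0),
               (5.0, 5.5, 5.0), (-0.95, 0.0, 0.0)]
    print(find_sphere_contacts(spheres, radius=0.5))
```

With roughly uniform object density, each sphere is tested against a bounded number of neighbors, so the total work grows about linearly with the number of objects instead of quadratically as in an all-pairs test.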

## Software

Follow Step 1 and Step 2 below in order to be able to tap into the GPU computational resources of your computer.

After these two steps, if the graphics card in your computer is CUDA compatible, you can immediately start running demos on your GPU. If your computer is not CUDA compatible, you’ll only be able to run CUDA in emulation mode (the CPU will emulate the behavior of the GPU; very slow, useful for debugging only). A list of CUDA enabled products is available here: CUDA GPUs.

#### Step 1: Install Visual Studio 2008 Express and Windows SDK

In order to compile and run the code samples in the SDK, Microsoft Visual Studio 2008 needs to be installed.

Note: For those who have purchased Visual Studio 2008 Professional Edition, the Windows SDK does NOT need to be installed.

#### Step 2: Install CUDA

A CUDA capable GPU is required. A list of CUDA enabled products is available here: CUDA GPUs

Download and install in this order: the CUDA Driver, the CUDA Toolkit, and the CUDA SDK (links provided below, according to your Windows distribution).

## Organizers

#### Sara McMains

University of California, Berkeley

#### Roshan D’Souza

Michigan Tech University, Houghton

#### Krishnan Suresh

University of Wisconsin, Madison

#### Dan Negrut

University of Wisconsin, Madison