Developing FPGA-accelerated cloud applications with SDAccel: Practice
This course is for anyone passionate about learning how to develop FPGA-accelerated applications with SDAccel!
The more general purpose you are, the more flexible you are and the more kinds of programs and algorithms you can execute on your underlying computing infrastructure. All of this is terrific, but there is no free food and this is happening, quite often, by losing in efficiency.
This course will present several scenarios where the workloads require more performance than can be obtained even by using the fastest CPUs. This scenario is turning cloud and data center architectures toward accelerated computing. Within this course, we are going to show you how to gain benefits by using Xilinx SDAccel to program Amazon EC2 F1 instances. We are going to do this through a working example of an algorithm used in computational biology.
The huge amount of data the algorithms need to process and their complexity raised the problem of increasing the amount of computational power needed to perform the computation. In this scenario, hardware accelerators revealed to be effective in achieving a speed-up in the computation while, at the same time, saving power consumption. Among the algorithms used in computational biology, the Smith-Waterman algorithm is a dynamic programming algorithm, guaranteed to find the optimal local alignment between two strings that could be nucleotides or proteins. In the following classes, we present an analysis and successive FPGA-based hardware acceleration of the Smith-Waterman algorithm used to perform pairwise alignment of DNA sequences.
Within this context, this course is focusing on distributed, heterogeneous cloud infrastructures, providing you details on how to use Xilinx SDAccel, through working examples, to bring your solutions to life by using the Amazon EC2 F1 instances.
Reconfigurable cloud infrastructure
-Distributed systems, data center and cloud architectures are facing the exponential growth in computing requirements and the impossibility for CPU-based solutions to keep pace. Within this context these complex distributed systems have to move toward accelerated computing. Accelerators complement CPU-based architectures and deliver both performance and power efficiency. Moreover, modern data center, as we know, can be used by several different users to serve different workloads and the idea of having an underlying architecture built on reconfigurable technologies seems to provide an ideal fit for these changing, demanding, workloads. This module provides a description of the main cloud computing components and technologies, as well as detailing the current technologies to accelerate cloud computing workloads.
On how to accelerate the cloud with SDAccel
-Within this module we are going to have a first taste on how to gain the best out of the combination of the F1 instances with SDAccel providing some few practical instructions on how to develop accelerated applications on Amazon F1 by using the Xilinx SDAccel development environment. Then, we are going to present what it is necessary to create FPGA kernels, assemble the FPGA program and to compile the Amazon FPGA Image, or AFI. Finally, we will describe the steps and tasks involved in developing a host application accelerated on the F1 FPGA.
Summing things up: the Smith-Waterman algorithm
-Within this module we are going to introduce you to the Smith-Waterman algorithm that we have chosen to demonstrate how to create a hardware implementation of a system based on FPGA technologies using the Xilinx SDAccel design framework. We are going to dig into the details of the algorithm from its data structures to the computation flow. Then we are going to introduce the Roofline model and we are going to use it to analyze the theoretical peak performance and the operational intensity of the Smith-Waterman algorithm.
The Smith-Waterman example in details
-Within this module we are going to dig deeper in the Smith-Waterman algorithm. We are going to implement a first version of the algorithm on a local server with the Xilinx SDAccel design framework. Then we are going to introduce some optimizations to improve performance, in particular we will add more parallelism in the implementation and we will introduce systolic arrays. Moreover, we will explore how we can perform data compression and then we will leverage multiple memory ports to improve memory access speed. Finally, we are going to port our implementation of the Smith-Waterman algorithm on the AWS F1 instances.
-We are working at the edge of the research in the area of reconfigurable computing. FPGA technologies are not used only as standalone solutions/platforms but are now included into cloud infrastructures. They are now used both to accelerate infrastructure/backend computations and exposed as-a-Service that can be used by anyone. Within this context we are facing the definition of new research opportunities and technologies improvements and the time cannot be better under this perspective. This module is concluding this course but posing interesting questions towards possible future research directions that may also point the students to other Coursera courses on FPGAs.