Design efficient vectorized Stockham and Korn-Lambiotte FFTs for large vector length machines, implement them on an abstract vector machine, and compare their performance based on memory operations and vector arithmetic operations.
The Fast Fourier Transform (FFT) is a fundamental algorithm in computer science, enabling efficient computation of the Discrete Fourier Transform (DFT). Optimizing FFT performance is crucial for high-efficiency computing in applications like signal processing and homomorphic encryption. Our study focuses on designing and implementing vectorized FFT algorithms for large vector-length machines. We derive the vectorized formulas for the Stockham and Korn-Lambiotte FFTs from the iterative Cooley-Tukey, and compare their performance against each other based on memory and vector arithmetic operations to identify the most efficient approach for modern computational architectures.
The thesis entails a deep dive into the foundational theories and practical applications of FFT algorithms, with Sultan M. Alsultan leading the project under the mentorship of Professor Jeremy Johnson. Starting with a theoretical examination of key FFT algorithms in the literature, the work progresses to proving and implementing these algorithms, notably the Cooley-Tukey, Korn-Lambiotte, and Stockham FFTs. The ultimate goal is to develop and document an optimized implementation of the Stockham FFT for large vector machines and to get a working implementation on SPIRAL (a code generation system written in GAP), addressing specific architectural needs to maximize computational efficiency and performance.
sultan.mohammed.alsultan@drexel.edu
jeremy.russell.johnson@drexel.edu