Divya Kiran Kadiyala

I am an AI Performance Architect at Hewlett Packard Enterprise (HPE), where I develop tailored system architectures for parallel, scientific, and deep learning AI/ML workloads in high-performance computing (HPC) environments. My work spans computer architecture, memory system design, distributed AI/ML systems, and CXL-based technologies, with a focus on accelerating memory bandwidth-intensive applications in resource-constrained environments.

Prior to joining HPE, I completed my PhD at Georgia Institute of Technology, Atlanta, advised by Dr. Alexandros Daglis, Associate Professor in the School of Computer Science. My doctoral research focused on designing and developing tailored memory system optimizations to enhance the performance of parallel, scientific, and deep learning AI/ML workloads, drawing on insights from computer architecture, memory system design, distributed AI/ML systems, and CXL technologies.

Earlier, I earned my Master’s in Electrical Engineering from Arizona State University, Tempe, and worked as a Sr. Applications Engineer at Cadence Design Systems, San Jose, CA. For more details, please refer to my CV.

I am always open to opportunities and collaborations where I can apply my academic expertise and industry experience to drive innovation in next-generation memory and computer system architectures. If my profile aligns with your interests, please feel free to reach out via email or LinkedIn.

Thesis Topic: Memory system optimizations for parallel and bandwidth-intensive workloads
The growing performance and bandwidth demands of modern datacenter and HPC workloads are driving innovation in memory system design. My research adopts a holistic approach to optimizing memory systems across multiple levels of the system hierarchy (chip, server, and cluster) through architectural techniques integrated with system software. By jointly considering workload-specific characteristics and underlying hardware capabilities, this work demonstrates how tailored memory system designs can significantly enhance the performance of parallel, scientific, and AI/ML workloads in resource-constrained and bandwidth-intensive environments.