Xingyuan Group (星元集团)
 
 

Accelerated Computing


Module 1: Basics of Accelerated Computing 

    1.1 Overview of Computing Architectures

            • CPU vs. GPU vs. FPGA vs. TPU
            • Parallel Computing vs. Serial Computing
            • Vectorized Computing and SIMD Instruction Set
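The serial-vs-parallel and SIMD topics above can be illustrated with a minimal Python sketch (assuming NumPy is installed): the same SAXPY computation written as a scalar loop and as a single vectorized array operation, which the underlying library can dispatch to optimized, often SIMD-accelerated, native loops.

```python
# Serial vs. vectorized computation: a minimal sketch (assumes NumPy).
import numpy as np

def saxpy_serial(a, x, y):
    # Serial: one scalar multiply-add per loop iteration, in the interpreter.
    return [a * xi + yi for xi, yi in zip(x, y)]

def saxpy_vectorized(a, x, y):
    # Vectorized: NumPy performs the whole array operation in one call,
    # inside optimized C loops that compilers can map to SIMD instructions.
    return a * x + y

x = np.arange(5, dtype=np.float64)
y = np.ones(5)
assert np.allclose(saxpy_serial(2.0, x, y), saxpy_vectorized(2.0, x, y))
```

Both functions compute the same values; the difference is where the inner loop runs (interpreter vs. native code), which is the essence of vectorized computing.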

    1.2 Applications of Accelerated Computing        

            • Deep Learning Training and Inference Optimization
            • Scientific Computing (High Energy Physics, Genome Sequencing)
            • Financial Modeling (High-Frequency Trading, Quantitative Analysis)
            • Industrial Simulation (Computational Fluid Dynamics, Finite Element Analysis)

Module 2: GPU Accelerated Computing

    2.1 Basics of GPU Computing

            • GPU Computing Architecture (CUDA / ROCm / OpenCL)
            • CUDA Core (CUDA C/C++ Programming)
            • Tensor Core, RT Core Accelerating AI Computation

    2.2 Deep Learning GPU Acceleration
            • PyTorch / TensorFlow GPU Computing Optimization
            • cuDNN, cuBLAS, TensorRT Introduction
            • Multi-GPU Training (Data Parallel / Model Parallel)
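The data-parallel idea in the last bullet can be sketched without any GPU framework (this is a NumPy model of the pattern, not the PyTorch/Horovod API): each simulated device computes the gradient on its shard of the batch, and an averaging step stands in for the all-reduce.

```python
# Data parallelism in miniature: per-device shard gradients, then an
# "all-reduce" average before the weight update. A sketch, not DDP itself.
import numpy as np

def grad_mse(w, x, y):
    # Gradient of mean((w*x - y)**2) with respect to the scalar weight w.
    return np.mean(2.0 * (w * x - y) * x)

def data_parallel_grad(w, x, y, num_devices):
    # Split the batch into equal shards, one per simulated device.
    xs, ys = np.split(x, num_devices), np.split(y, num_devices)
    shard_grads = [grad_mse(w, xi, yi) for xi, yi in zip(xs, ys)]
    return np.mean(shard_grads)  # all-reduce step: average across devices

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
y = 3.0 * x
# With equal shard sizes, the averaged gradient equals the full-batch gradient.
assert np.isclose(data_parallel_grad(0.5, x, y, 4), grad_mse(0.5, x, y))
```

The equivalence in the final assertion is why synchronous data parallelism converges like large-batch single-device training.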

    2.3 GPU Programming and Optimization
            • CUDA Thread Model (Blocks, Threads, Warps)
            • Memory Optimization (Global Memory vs. Shared Memory)
            • GPU Profiling to Identify Bottlenecks (Nsight, nvprof)
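The CUDA thread model above can be modeled in plain Python (a single-process sketch of the index arithmetic, not real CUDA code): a kernel runs once per thread, each thread computes its global index from `blockIdx * blockDim + threadIdx`, and a bounds guard handles the partially filled last block.

```python
# A single-process Python model of a 1-D CUDA grid launch. It shows the
# block/thread-to-index mapping and the bounds guard, not real parallelism.

def launch(kernel, grid_dim, block_dim, *args):
    # Emulate the launch by visiting every (block, thread) pair in the grid.
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

def add_kernel(block_idx, block_dim, thread_idx, out, a, b, n):
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < n:                               # guard: last block may overhang
        out[i] = a[i] + b[i]

n = 10
a, b, out = list(range(n)), [1] * n, [0] * n
block_dim = 4
grid_dim = (n + block_dim - 1) // block_dim  # ceil(n / block_dim) blocks
launch(add_kernel, grid_dim, block_dim, out, a, b, n)
assert out == [i + 1 for i in range(n)]
```

The ceiling division for `grid_dim` and the `if i < n` guard are the same idioms used in real CUDA C kernels when the array length is not a multiple of the block size.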

Module 3: FPGA Accelerated Computing

    3.1 Principles of FPGA Computing
       
            • FPGA vs. CPU vs. GPU
            • Reconfigurable Computing and Hardware Acceleration

    3.2 FPGA Programming Basics
      
            • Introduction to Verilog / VHDL
            • HLS (High-Level Synthesis) Acceleration Development
            • FPGA Applications in AI and High-Performance Computing

    3.3 FPGA Optimization in Specific Industries
           
            • High-Speed Signal Processing (5G, Millimeter-Wave Radar)
            • Low-Latency Financial Trading (High-Frequency Trading, HFT)
            • Edge Computing in IoT (Smart Cameras, Autonomous Driving)

Module 4: TPU / AI-Specific Accelerators

    4.1 TPU Computing Architecture
 
            • TPU vs. GPU for Deep Learning Acceleration
            • Google TPU Development Framework (JAX, TensorFlow)
            • TPU Training vs. TPU Inference

    4.2 AI-Specific Accelerators
            • Huawei Ascend (昇腾)
            • Cerebras Wafer-Scale Engine
            • Graphcore IPU

Module 5: Software Layer Optimization and Acceleration Libraries

    5.1 Deep Learning Acceleration Libraries
            
            • cuDNN / ROCm MIOpen: Deep Learning Acceleration
            • TensorRT: AI Inference Optimization
            • DeepSpeed / Horovod: Distributed AI Training

    5.2 Parallel Computing Optimization
            • OpenMP / OpenACC Parallel Optimization
            • MPI (Message Passing Interface) for Accelerating Large-Scale Computing
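The MPI bullet above rests on one core pattern: scatter the data across ranks, compute partial results locally, then reduce. The sketch below models that pattern in a single Python process (no mpi4py here); the block decomposition mirrors how an index range would be split across MPI ranks.

```python
# An MPI-style scatter/compute/reduce pattern modeled in one process.
# Each simulated "rank" owns a contiguous block of the data, computes a
# partial sum, and a final reduce combines the partials (op = SUM).

def block_range(rank, size, n):
    # Contiguous block decomposition across `size` ranks; the first
    # (n % size) ranks each receive one extra element.
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

def parallel_sum(data, size):
    partials = []
    for rank in range(size):  # each iteration models one rank's local work
        start, stop = block_range(rank, size, len(data))
        partials.append(sum(data[start:stop]))
    return sum(partials)      # the reduce step

data = list(range(100))
assert parallel_sum(data, 7) == sum(data)
```

In a real MPI program the loop body would run simultaneously on separate processes and the final line would be an `MPI_Reduce`; the decomposition arithmetic is the same.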

    5.3 Numerical Computing Acceleration
            • NumPy / SciPy Optimization (Intel MKL, OpenBLAS)
            • JIT Compilation (Numba, TensorFlow XLA)
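The value of the optimized backends in 5.3 (Intel MKL, OpenBLAS behind NumPy) shows up clearly in matrix multiplication: `np.dot` / `@` calls a tuned BLAS routine, while the naive triple loop below does the same arithmetic in interpreted Python. A sketch assuming NumPy is installed:

```python
# Naive matrix multiply vs. NumPy's BLAS-backed operator. Both compute
# the same result; at scale the BLAS call is orders of magnitude faster.
import numpy as np

def matmul_naive(a, b):
    n, k = a.shape
    _, m = b.shape
    out = np.zeros((n, m))
    for i in range(n):          # interpreted triple loop: O(n*m*k) scalar ops
        for j in range(m):
            for p in range(k):
                out[i, j] += a[i, p] * b[p, j]
    return out

rng = np.random.default_rng(1)
a, b = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
assert np.allclose(matmul_naive(a, b), a @ b)  # same values, different speed
```

This is the general pattern behind 5.3: keep the algorithm, but route the inner loops through a compiled, cache- and SIMD-aware library (or a JIT such as Numba) instead of the interpreter.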

Module 6: High-Performance Computing (HPC) Cluster Optimization

    6.1 HPC Hardware Architecture
            • InfiniBand High-Speed Interconnect
            • RDMA (Remote Direct Memory Access) and NVLink Interconnects
            • High-Speed Storage (NVMe, Lustre File System)

    6.2 Large-Scale Computational Task Management
            • SLURM / Kubernetes Task Scheduling
            • Cloud GPU Computing (AWS, Azure, GCP)
            • Spot Instance Cost Optimization

 




Contact Us

+1-8259866358 | Email: infor@novatech-alberta.com | 09:00–18:00
Copyright © 2025 NovaTech