Weekly Summary 3

Dec 31, 2023 12:30 pm UTC+8

USTC High-Tech Zone campus

1 This week's work

1.1 Theory survey:

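While digging into Nsight Systems and Nsight Compute, I realized that I have never studied GPUs and CUDA programming systematically, and there are many basic questions I cannot answer:
1. General-purpose GPU design
  1. As an accelerator, what problem was the GPU originally designed to solve? Simple parallel computation?
  2. To serve that purpose, which classic general designs emerged in hardware, analogous to the CPU's memory hierarchy, ROB, and port model?
    1. Possibly SIMT, shared memory, and a large shared register file
  3. What is the instruction execution flow of a GPU program? Is it also a multi-stage pipeline?
  4. Why are AI programs well suited to GPUs (high parallelism, heavy memory traffic?), and how do they actually run on a GPU? (For example: CNN, transformer, forward and backward propagation, gradient updates?)
2. NVIDIA GPU and CUDA programming concepts
  1. What special designs do NVIDIA GPUs have? (Ray-tracing units, 3D units? Tensor Cores?)
  2. Blocks and grids
  3. Warps and threads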
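As a starting point for the block/grid and warp/thread items above, here is a minimal Python sketch of the index arithmetic that a 1D CUDA launch implies. The function names are hypothetical helpers made up for this note, not CUDA APIs; the warp size of 32 is the value used on current NVIDIA GPUs.

```python
WARP_SIZE = 32  # warp size on current NVIDIA GPUs


def global_thread_id(block_idx, block_dim, thread_idx):
    """Mirror of the usual CUDA expression blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx


def warp_and_lane(thread_idx, warp_size=WARP_SIZE):
    """Return (warp index within the block, lane index within the warp)."""
    return divmod(thread_idx, warp_size)


def blocks_needed(n_elements, block_dim):
    """Grid size for a 1D launch: ceiling division of elements by block size."""
    return (n_elements + block_dim - 1) // block_dim


if __name__ == "__main__":
    print(global_thread_id(block_idx=2, block_dim=256, thread_idx=5))
    print(warp_and_lane(70))
    print(blocks_needed(1000, 256))
```

Threads in the same warp execute in lockstep under SIMT, which is why the warp and lane split matters when reading profiler output.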
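DSE on GPU using the interval model
1. Three papers and one formula:
  1. GPUMech: GPU Performance Modeling Technique based on Interval Analysis
  2. MDM: The GPU Memory Divergence Model
  3. GCoM: A Detailed GPU Core Model for Accurate Analytical Modeling of Modern GPUs
2. The principle is likewise based on the interval model and memory traces.
3. The modeling accuracy differs from GPUSIM by 10%.
4. Compared with GPUSIM, the run time is greatly reduced: for a real execution of 100 ms, trace collection takes 10 s, which is also roughly a 100x factor.
5. As for the Zsim simulator, it offers more than DSE functionality. TB-scale trace data is kept, but a large amount of DSE time is saved.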
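To make the interval idea and the 100x figure above concrete, here is a rough Python sketch. It uses the textbook single-stream form of interval analysis (steady-state issue time plus the stall penalty that ends each interval), which is an assumption for illustration and not the specific formula from GPUMech, MDM, or GCoM. The function names and the example numbers in the interval list are hypothetical.

```python
def interval_cycles(intervals, issue_rate=1.0):
    """Estimate total cycles from a list of (n_instructions, stall_cycles) intervals.

    Each interval issues its instructions at the steady-state rate and is
    then terminated by a stall event (for example a long memory miss).
    """
    total = 0.0
    for n_insts, stall_cycles in intervals:
        total += n_insts / issue_rate + stall_cycles
    return total


def slowdown(native_ms, tool_ms):
    """How many times slower a tool run is than native execution."""
    return tool_ms / native_ms


if __name__ == "__main__":
    # Two intervals: 100 instructions then a 200-cycle stall, 50 instructions with no stall.
    print(interval_cycles([(100, 200), (50, 0)], issue_rate=2.0))
    # 100 ms native run versus 10 s (10000 ms) of trace collection.
    print(slowdown(100, 10_000))
```

The GPU papers extend this per-warp view with multi-warp scheduling and memory contention, so the sketch only captures the base case.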
Ideas from friends:
1. CZW: DSE on PIM
2. TBX: GPU model formula based on the interval model
3. YFY: Tensor Core & shared memory model
My offer choice between Huawei and Baidu.

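1.2 Hands-on practice:

PPT: used a plugin to fix fonts and formatting.
Got started with SD inference and did a simple hotspot analysis.

Weekly health report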

2 Priorities for next week

Finish the graduation thesis title and fill in its content; reference: Cross-Architecture Automatic Critical Path Detection For In-Core Performance Analysis
AI for system 3: go back to the 3 papers on GPU analytical models to gain a deep understanding. After this I will continue my journey of analyzing LLMs and PowerInfer.
Of course, I still have a lot of ACSA lab website design and documentation work left to do.

Baidu AI for system

Please correct the grammar of my weekly report email (not the first time) in easy-to-understand English:

Weekly Summary 3

1 This week's work

1.1 Theory survey:

1.2 Hands-on practice:

2 Priorities for next week

2.1 Team

2.2 Work content

2.2.1 AI compiler