Weekly Summary 4
Jan 07, 2024 12:30 pm UTC+8
科大高新区
1 本周的工作
- Mon: 元旦假期
- Tues:研究了毕业论文的组织。了解了Device 2 Device 技术: InfiniBand, RoCE
- Wed:
- 调研了各种超算,华为和NV的AI超算互联图。并与刘波师弟讨论了LLM on Shenwei的可能性。
- 学习了相关的AI的训练并行策略
- Thurs: 阅读GPU Interval model, 但是被CPU和GPU的分析模型文章吸引去了。发现有些基础不明白。开始读教程 https://cuda-tutorial.github.io/part1_22.pdf, 准备弄明白概念再看文章。
- Fri: 阅读了一部分 cuda-tutorial。 和huawei battle 薪资
- Sat: 大致了解了GPU performance model,并与yfy和师兄讨论。但师兄并不认为有用。
- Sun:周报,回顾,细化了任务优先级的评判标准。
| metrics | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
|---|---|---|---|---|---|---|---|
| Healthy | D | B | B | A | A | B | C |
| Efficiency | D | A | B | C | C | C | C |
| Exercise | x | x |
2 下周任务优先级
- Utopia 2: 给蔚来讲Utopia,需要补充基础知识。润色逻辑和图表。
- Graduation thesis 3:大纲基本确定了,但是题目还没确定。打算先把背景知识和动机写完。再敲定题目。
- AI for system 4: 首先cuda的基础知识还有待完善,之后可以review Interval model的细节尤其是动机部分。之后可以略微调研分析模型的方法。之后是华为DaVinci架构。在这些之后再继续LLM的分析,和PowerInfer 的相关优化方法的调研。
- 异构系统workload性能评估:
- 先了解OpenCL是如何跨平台和抽象计算资源的。在进一步了解。
- literature review :CPU GPU charalization scheduler。 可以从并行度,访存需求的角度考虑。
- ACSA Lab website:许多设计,文档和信息收集工作有待处理,可以先从小部分人开始测试信息收集。
2.1 任务评估与时间分配
| 紧急性(3) | 重要性(3) | 喜好(1) | 工作量(3) | 总分 | 分配 | |
|---|---|---|---|---|---|---|
| report | 3 | 0 | 0 | 0 | 3 | 一天欠 |
| thesis | 3 | 3 | 0 | 3 | 9 | 两天多 |
| AI | 2 | 2 | 1 | 2 | 7 | 两天欠 |
| OpenCL | 2 | 1 | 1 | 1 | 5 | 一天多 |
| web | 1 | 0 | 0 | 1 | 2 | 一天欠 |
| Summary | 26 |
2.2 工作内容
- 算子底层npu异常检测,内存踩踏,越界。
- 高层模型层级,提高精度,算法强相关。
- (传统内容的创新)基于性能建模,推理和训练加速。
2.3 团队
- 百度编译器 晓光
- 10几个人,后端上海,前端北京。
- 尖酸小黄鸭:T9 - 半年
- T5 -清华的硕士:AI编译器前端,OP → 机器码→kernel。循环融合。
2.4 工作内容
- 鲁棒性(容忍输入扰动的稳定性,处理出错情况的可靠性),正确性,AI原生的思维。
- 后端 code阵列,
2.4.1 AI 编译器
- 中间理论:代数结构,形式化下来。多个op循环融合,map ADT(Abstract Data Type",即抽象数据类型?)
- TVM autotune op规则
- torch 负优化 - 形式化定义,算子以及约束。 A B op融合
Dear Qingcai,
Here is a summary of my activities and accomplishments for the past week, along with the planned priorities for the upcoming week.
Weekly Work Summary (January 1-7, 2024):
- Monday: New Year’s Day holiday.
- Tuesday: Focused on organizing my graduation thesis. Explored Device 2 Device technologies: InfiniBand, RoCE.
- Wednesday:
- Conducted research on various supercomputers, including Huawei’s and NVIDIA’s AI supercomputing interconnects. Discussed the potential of LLM on Shenwei with junior colleague Liu Bo.
- Studied AI training parallel strategies.
- Thursday: Reviewed the GPU Interval model. Got sidetracked by articles on CPU and GPU analysis models, realizing a gap in my foundational knowledge. Began studying tutorials to clarify concepts before revisiting the articles.
- Friday: Continued reading parts of the CUDA tutorial and discussed salary aspects with Huawei.
- Saturday: Gained a basic understanding of the GPU performance model. Discussed it with YFY and senior colleagues, although they seemed skeptical of its usefulness.
- Sunday: Prepared this weekly report, reviewed the past week, and refined the criteria for prioritizing tasks.
Tasks for Next Week :
- Utopia 2(1 day due): Present Utopia to NIO, requiring additional foundational knowledge. Refine logic and diagrams.
- Graduation Thesis 3(Over 2 days): Outline is almost set, but the title is yet to be finalized. Plan to complete the background and motivation sections before finalizing the title.
- AI for System 4(Almost 2 days): First, need to improve foundational knowledge of CUDA. Then, review details of the Interval model, especially the motivation. Minor research on analysis models might follow. After these, continue with LLM analysis and research optimization methods for PowerInfer.
- Heterogeneous System Workload Performance Assessment(Over 1 day):
- Start by understanding how OpenCL operates across platforms and abstracts computing resources, then delve deeper.
- Literature review: CPU and GPU characterization schedulers, considering aspects like parallelism and memory access requirements.
- ACSA Lab Website(1 day due): Handle design, documentation, and information collection tasks. Start with a small group for testing information gathering.
Please let me know if you have any feedback or need further details on any of the points.
Best regards,
Shaojie Tan
目录