
LAMP Manual

Sub-message load balancing illustration. (a) Task graph before and after splitting without load balancing; (b) Task mapping and routing in a NoC-based system; (c) Original splitting without load balancing; (d) Splitting with temporal load balancing; (e) Splitting with temporal and spatial load balancing.

Source Code link: link

3-DataSplitting

This is the algorithm to generate the routing after data splitting. Please read this paper: Chen, Hui, et al. “Parallel Multipath Transmission for Burst Traffic Optimization in Point-to-Point NoCs.” Proceedings of the 2021 on Great Lakes Symposium on VLSI. 2021.


MARCO Manual

Overview of the co-optimization framework.

Source Code link: link

2-MARCO

This is the co-optimization algorithm that generates a mapping and routing solution for a given task graph. More details about this algorithm can be found in:


Python dependencies (to be continued):

Python 3.7.4
PyTorch (conda activate msai)



ArSMART Manual

Illustration of ArSMART NoC context.

Source Code link: link

This simulator models basic NoC, SMART NoC, and ArSMART NoC. It is built on gem5, whose website is http://www.gem5.org.


To install this simulator:

    1. Download gem5: https://github.com/Dawnzju/gem5 (this is the version I work on; it is recommended because the patch applies to it cleanly)
    2. Check the Python version: Python 2.7.17 :: Anaconda, Inc. (conda activate py27)
    3. Make sure the original gem5 compiles and runs.
    4. Download and install JSONcpp: https://github.com/open-source-parsers/jsoncpp#generating-amalgamated-source-and-header
    5. Copy the configs into the gem5 folder.
    6. Download the NoC.patch file from this repo into the gem5 folder.
    7. Apply the patch with the command: git apply NoC.patch
    8. Compile again: scons build/Garnet_standalone/gem5.debug -j25 NUMBER_BITS_PER_SET=256

To use this simulator:

  • The original Garnet synthetic traffic patterns are supported:
      https://www.gem5.org/documentation/general_docs/ruby/garnet_synthetic_traffic/
    
  • Prepare the task graph file with mapping and routing information
    • The task graph is saved in a JSON file. One task with outgoing links is saved as:
      "9": {"total_needReceive": 93, "input_links": [], "start_time": 0, "out_links": [[[2, 93, [], [[9, "S"], [17, "S"], [25, "S"], [33, "E"], [34, "E"], [35, "E"]], 1, 9, 36], [2, 93, [], [[9, "E"], [10, "E"], [11, "E"], [12, "S"], [20, "S"], [28, "S"]], 2, 9, 36], [2, 93, [], [[9, "E"], [10, "E"], [11, "S"], [19, "S"], [27, "S"], [35, "E"]], 3, 9, 36], [2, 93, [], [[9, "E"], [10, "S"], [18, "S"], [26, "E"], [27, "E"], [28, "S"]], 6, 9, 36], [2, 93, [], [[9, "E"], [10, "E"], [11, "S"], [19, "S"], [27, "E"], [28, "S"]], 6, 9, 36]]], "end_time": 0, "visited": 0, "total_needSend": 93, "exe_time": 102, "mapto": 9}
    • Each entry is keyed by the task ID (str) and has these fields:
      • "total_needReceive": int, total message size to be received
      • "input_links": list (optional, reserved for future use)
      • "start_time": int (optional, reserved for future use)
      • "out_links": list of links. Each link is a list of candidate paths, and each candidate path is [destinationTaskId, messageSize, priority (optional, reserved for future use), path, candidateCount, sourceRouter, destinationRouter]
      • "end_time": int (optional, reserved for future use)
      • "visited": 0 or 1 (optional, reserved for future use)
      • "total_needSend": int, total message size to send
      • "exe_time": int, execution time of the task
      • "mapto": int, the PE the task is mapped to
    • File name format: appName_Meshxxx_applicationInjectRate_Method.json (for example, Example_Mesh8x8_AIR1_xy.json)
  • Options:
    • --num-cpus Number of PEs
    • --num-dirs Number of directories (the examples here set it equal to --num-cpus)
    • --sim-cycles Number of simulation cycles
    • --topology NoC topology
    • --debug-flags=GarnetSyntheticTraffic Show the debug information
    • --network=garnet2.0 Enable the Garnet network
    • --mesh-rows Number of rows in the mesh
    • --synthetic Synthetic traffic pattern
    • --smart_hpcmax=8 Maximum number of hops that can be traversed in one cycle
    • Example: ./build/Garnet_standalone/gem5.debug --debug-flags=GarnetSyntheticTraffic configs/example/garnet_synth_traffic.py --network=garnet2.0 --num-cpus=64 --num-dirs=64 --topology=Mesh_XY --mesh-rows=8 --sim-cycles=1000000 --single-flit --synthetic=taskgraph --filename=Example_Mesh8x8_AIR1_xy.json
    • For SMART NoC:
      • --smart (GarnetNetwork.py/Network.py) Enable SMART
      • --smart2D (GarnetNetwork.py/Network.py) Enable SMART-2D
      • Example: ./build/Garnet_standalone/gem5.debug --debug-flags=GarnetSyntheticTraffic configs/example/garnet_synth_traffic.py --network=garnet2.0 --num-cpus=64 --num-dirs=64 --topology=Mesh_XY --mesh-rows=8 --sim-cycles=1000000 --single-flit --synthetic=taskgraph --smart --smart_hpcmax=4 --filename=Example_Mesh8x8_AIR1_xy.json
    • For ArSMART NoC:
      • Please read this paper: https://ieeexplore.ieee.org/abstract/document/9464312
      • --central (GarnetNetwork.py/Network.py) Enable the bypass at routers for ArSMART
      • --filename (GarnetNetwork.py/Network.py) The routing and mapping configuration file
      • Example: ./build/Garnet_standalone/gem5.debug --debug-flags=GarnetSyntheticTraffic configs/example/garnet_synth_traffic.py --network=garnet2.0 --num-cpus=64 --num-dirs=64 --smart_hpcmax=8 --topology=Mesh_XY --mesh-rows=8 --sim-cycles=1000000 --single-flit --synthetic=taskgraph --central --filename=Example_Mesh8x8_AIR1_xy.json
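
Before passing a task graph file to the simulator, it can help to check that every candidate path in it is self-consistent. The sketch below is not part of the simulator. It assumes row-major router numbering on the mesh, with "S" adding one row (+8 on an 8x8 mesh) and "E" adding one column (+1), as the sample entry for task "9" suggests. "N" and "W" are assumed to be the inverse directions, and the function names are hypothetical.

```python
import json


def direction_offsets(cols):
    # "S" = +cols and "E" = +1 are inferred from the sample entry;
    # "N" and "W" are assumed to be their inverses.
    return {"E": 1, "W": -1, "S": cols, "N": -cols}


def check_candidate(candidate, cols=8):
    """Return True if one candidate path leads from its source router to its destination router."""
    _dest_task, _size, _priority, path, _count, src_router, dst_router = candidate
    offsets = direction_offsets(cols)
    if not path or path[0][0] != src_router:
        return False
    for i, (router, direction) in enumerate(path):
        step = offsets.get(direction)
        if step is None:
            return False  # unknown direction label
        nxt = router + step
        # An east/west hop must stay in the same mesh row.
        if direction in ("E", "W") and nxt // cols != router // cols:
            return False
        # The last hop must arrive at the destination router.
        expected = path[i + 1][0] if i + 1 < len(path) else dst_router
        if nxt != expected:
            return False
    return True


def check_task(task, cols=8):
    """Check every candidate path of every outgoing link of one task."""
    return all(check_candidate(c, cols) for link in task["out_links"] for c in link)


# Trimmed copy of the sample entry (two of its five candidate paths).
SAMPLE = json.loads("""
{"9": {"total_needReceive": 93, "input_links": [], "start_time": 0,
       "out_links": [[
         [2, 93, [], [[9, "S"], [17, "S"], [25, "S"], [33, "E"], [34, "E"], [35, "E"]], 1, 9, 36],
         [2, 93, [], [[9, "E"], [10, "E"], [11, "E"], [12, "S"], [20, "S"], [28, "S"]], 2, 9, 36]
       ]],
       "end_time": 0, "visited": 0, "total_needSend": 93, "exe_time": 102, "mapto": 9}}
""")

if __name__ == "__main__":
    for task_id, task in SAMPLE.items():
        print(task_id, "ok" if check_task(task) else "inconsistent path")
```

To check a whole file, load it with json.load and call check_task on each value, passing the mesh width as cols.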

Mapping and routing

  • Minimize contention via direct and indirect route:

    • Peng Chen, Hui Chen, Jun Zhou, Mengquan Li, Weichen Liu, Chunhua Xiao, Yaoyao Ye, Nan Guan, “Contention Minimization in Emerging SMART NoC via Direct and Indirect Routes,” in IEEE Transactions on Computers (TC), 2021.

    • Example: Example_Mesh8x8_AIR1_basic.json | Example_Mesh8x8_AIR1_improved.json

  • Mapping and routing co-optimization:

    • Hui Chen, Zihao Zhang, Peng Chen, Xiangzhong Luo, Shiqing Li, Weichen Liu, “MARCO: A High-performance Task Mapping and Routing Co-optimization Framework for Point-to-Point NoC-based Heterogeneous Computing Systems”, in ACM Transactions on Embedded Computing Systems, doi: 10.1145/3476985

    • Example: Example_Mesh8x8_AIR1_coop.json

    • https://github.com/Dawnzju/2-MARCO

  • Load-balanced multipath parallel transmission (data splitting):

    • Hui Chen, Peng Chen, Xiangzhong Luo, Shuo Huai, Weichen Liu, “LAMP: Load-balanced Multipath Parallel Transmission in Point-to-point NoCs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2022.

    • Example: Example_Mesh8x8_AIR1_split.json

    • https://github.com/Dawnzju/3-DataSplitting

VS Code Configuration and Instructions

VS Code usage notes
https://code.visualstudio.com/docs/languages/json

SSH

Extension: Remote SSH
The host entry does not need a port.

markdown

You can also right-click on the editor Tab and select Open Preview (Ctrl+Shift+V) or use the Command Palette (Ctrl+Shift+P) to run the Markdown: Open Preview to the Side command (Ctrl+K V)

Json/html

Shift+Alt+F formats the document.

Setting sync

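Step 1: Install the Settings Sync extension
Launch VS Code, then search for and install the Settings Sync extension.

Upload settings
Press Shift + Alt + U

Download settings
Press Shift + Alt + D

Tip: if the upload succeeds, a message appears in the Output tab.

On a new machine, set up the Settings Sync extension as described above, then press Shift + Alt + D to download the settings.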

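MiKTeX LaTeX

Under “Updates -> Repository”, switch to a randomly chosen package repository.

Sumatra PDF

Set the SumatraPDF path in the VS Code settings.json file.

Forward and inverse search

  1. Configure inverse search (PDF -> LaTeX source file).
  2. Inverse search is configured in SumatraPDF. Open SumatraPDF, go to the Settings -> Options dialog, and enter the following under “Set inverse search command-line” (it is a single line, not two):
  3. "C:\Users\Administrator\AppData\Local\Programs\Microsoft VS Code\Code.exe" "C:\Users\Administrator\AppData\Local\Programs\Microsoft VS Code\resources\app\out\cli.js" -r -g "%f:%l"
  4. Double-click anywhere in the PDF to jump to the corresponding source in VS Code.
  5. Inverse search: open a compiled TeX file, press Ctrl+Alt+V to open the PDF, then double-click in the body text to switch to the corresponding position in the source file.
  6. Forward search: press Ctrl+Alt+X, find “Navigate, select, and edit”, and click the first item, “SyncTeX from cursor” (shortcut Ctrl+Alt+J), to switch to the corresponding position in the PDF.

If this does not work, check the path settings and look for file-name errors.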

MARCO: A High-performance Task Mapping and Routing Co-optimization Framework for NoC-based Heterogeneous Computing Systems

Motivation examples. (a) DAG-modeled application and processing rate of different PEs; (b) Computation-aware mapping and SOTA routing; (c) Communication-aware mapping and SOTA routing; (d) Co-optimized mapping and routing.

Source

This paper is published in ACM Transactions on Embedded Computing Systems and was presented at ESWEEK-CASES ’21. To cite it:

Hui Chen, Zihao Zhang, Peng Chen, Xiangzhong Luo, Shiqing Li and Weichen Liu, “MARCO: A High-performance Task Mapping and Routing Co-optimization Framework for Point-to-Point NoC-based Heterogeneous Computing Systems”, in ACM Transactions on Embedded Computing Systems, doi: 10.1145/3476985

Hui Chen, Zihao Zhang, Peng Chen, Xiangzhong Luo, Shiqing Li and Weichen Liu, “MARCO: A High-performance Task Mapping and Routing Co-optimization Framework for Point-to-Point NoC-based Heterogeneous Computing Systems”, in Proceedings of International Conference on Compilers, Architecture, and Synthesis of Embedded Systems (ESWEEK-CASES ‘21), October 08-15, 2021, Virtual Event.

Abstract


About Me

Chen Hui


English Version

Email: hui.chen@ntu.edu.sg

Homepage: https://dawnzju.github.io/

GitHub: https://github.com/Dawnzju/

Linkedin: https://www.linkedin.com/in/chen-hui-33996aa6/

Tel: (+86) 13830650260 / (+65) 87106683


Research Interests

Network-on-Chip
Communication optimization for application-specific integrated circuits
Many-core systems


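Education

2019/01 - Now            Nanyang Technological University     PhD Candidate
Supervisor: Liu Weichen
Computer Science and Engineering
GPA: 4.83/5

  • Young Fellow of the 58th Design Automation Conference (DAC)
  • Secretary of the Student Research Forum at ASP-DAC 2021

2017/01 - 2018/07         National University of Singapore    Master
School of Computing
GPA: 4.05/5

2012/09 - 2016/06         Zhejiang University       Undergraduate
School of Computer Science and Technology

  • 2015: First prize, Mathematical Contest in Modeling (USA)
  • 2014: Minor in English (English literature)
  • 2014: Second prize, Zhejiang University Undergraduate Mathematical Contest in Modeling
  • 2013-2014: Head of the new media department of ZJU New Youth, responsible for the website and daily management
  • 2013: Champion, Zhejiang University Freshman Debate Competition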

Publications

Journal

Conference

Patents

  • Technology Disclosure:
      Ref: 2020-195-01-SG PRV
      Title: A High-Performance Application-Specific Network-on-Chip With Software Configurable Express Paths
      Inventors: 1) LIU Weichen; 2) CHEN Hui
      Date: 11 September 2020
      Singapore patent number: 10202008924S

  • License:
      Title: EDLAB: A Benchmark Tool for Edge Deep Learning Accelerators
      Inventors: 1) Liu Weichen; 2) Liu Di; 3) Kong Hao; 4) Zhang Lei; 5) Huai Shuo; 6) Li Shiqing; 7) Chen Hui; 8) Zhu Shien
      Date: 08 July 2020
      NTU Ref: TD 2020-264


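Projects

  • ArSMART

    Url: ArSMART

    One-sentence summary: A single-cycle multi-hop NoC that supports arbitrary paths.

    Keywords: Architecture design, routing algorithms.

    Contributions:

    • We developed ArSMART, a single-cycle multi-hop NoC that supports arbitrary paths and substantially reduces resource contention. Specifically, ArSMART divides the whole NoC into multiple clusters. Within each cluster, transmission paths are computed by a controller and data forwarding is performed by bufferless reconfigurable routers.
    • We proposed matching routing algorithms that let ArSMART compute routes quickly before the actual transmission while taking the real-time network state into account. The difficulty is that the network state at computation time differs from the state at transmission time. Our algorithms reduce this effect and improve NoC performance.

  • MARCO

    Url: MARCO

    One-sentence summary: A communication and computation co-optimization framework for heterogeneous systems (based on ArSMART).

    Keywords: Mapping algorithms, routing algorithms, heterogeneous systems.

    Contributions:

    • We analyzed the design space of task mapping and communication routing in NoC-based heterogeneous systems and found that, because the two are strongly coupled, optimizing them one after the other cannot reach the optimal solution.
    • We proposed MARCO, a task mapping and communication routing co-optimization framework for NoC-based heterogeneous systems. Specifically, we modified the tabu search framework to explore and evaluate the mapping and routing design space, and we use reinforcement learning to find communication paths efficiently.

  • LAMP

    Url: LAMP

    One-sentence summary: A method to increase communication parallelism in ArSMART.

    Keywords: Parallel multipath transmission, routing algorithms, network interface design.

    Contributions:

    • We modified the NoC router and network interface designs to support parallel multipath transmission and to efficiently reorder, at the destination node, the packets received on multiple ports.
    • We proposed a parallel multipath routing algorithm that answers three questions: when to transmit in parallel, how many paths to use, and how much data to send on each path. Built on reinforcement learning, the algorithm substantially improves transmission efficiency.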

About Me

Chen Hui


Chinese CV

Email: hui.chen@ntu.edu.sg   

Blog: https://dawnzju.github.io/

GitHub: https://github.com/Dawnzju/

Linkedin: https://www.linkedin.com/in/chen-hui-33996aa6/   

Tel: (+86) 13830650260 / (+65) 87106683


Research Interests

Electronic/Photonic Network-on-Chip
Application-specific Designs
Many-Core Systems


Education

2019/01 - Now          Nanyang Technological University   PhD Candidate
Supervisor: Liu Weichen
School of Computer Science and Engineering
Grade: 4.83/5

  • Young Fellow of the 58th Design Automation Conference
  • Secretary of ACM SIGDA Student Research Forum (co-located with ASP-DAC 2021)

2017/01 - 2018/07         National University of Singapore   Master
School of Computing
Grade: 4.05/5

2012/09 - 2016/06         Zhejiang University         Undergraduate
School of Computer Science and Technology

  • 2015: First prize, Mathematical Contest in Modeling (USA).
  • 2014: Minor in English, with a focus on English literature.
  • 2014: Second prize, Zhejiang University Undergraduate Mathematical Contest in Modeling.
  • 2013-2014: Minister of the New Youth Network Studio, responsible for the website and daily operations.
  • 2013: Best Debater, Zhejiang University Freshman Debate Competition.

Publication

Journal

Conference

Patents

  • Technology Disclosure (TD):
      Ref: 2020-195-01-SG PRV
      Title: A High-Performance Application-Specific Network-on-Chip With Software Configurable Express Paths
      Inventors: 1) LIU Weichen; 2) CHEN Hui
      Filed date: 11 September 2020
      Singapore provisional patent application number: 10202008924S

  • License:
      Title : EDLAB: A Benchmark Tool for Edge Deep Learning Accelerators
      Inventors: 1) Liu Weichen; 2) Liu Di; 3) Kong Hao; 4) Zhang Lei; 5) Huai Shuo; 6) Li Shiqing; 7) Chen Hui; 8) Zhu Shien
      Filed date: 08 July 2020
      NTU Ref: TD 2020-264

Main Projects

  • ArSMART           

    Url: ArSMART

    Contribution in one sentence: A NoC infrastructure that supports single-cycle multi-hop transmission with arbitrary turns.

    Related to : NoC infrastructure design, routing design.

    Specifically:

    • We develop an NoC design, ArSMART NoC, to set up single-cycle long-distance paths and support arbitrary-turn data transmission, which significantly reduces resource contentions. Specifically, ArSMART divides the whole NoC into multiple clusters where the route computation is conducted by the cluster controller and the data forwarding is performed by the bufferless reconfigurable router.
    • We present corresponding routing algorithms that enable ArSMART to manage NoC resources efficiently. Specifically, we compute routes at runtime before they are needed, taking the real-time network state into account. The challenge in designing routing algorithms for ArSMART is that the network state used for route computation differs from the state during the actual transmission. Our algorithms minimize this impact and lessen contention to improve NoC performance.
    • We implement the ArSMART design and the matching routing algorithms in gem5, and conduct a full-system simulation to show their effectiveness. Compared with the state-of-the-art SMART NoC, the experimental results demonstrate an average reduction of 40.7% in application schedule length and 29.7% in energy consumption.

 

  • MARCO           

    Url: MARCO

    Contribution in one sentence: A computation and communication optimization framework for heterogeneous many-cores (based on ArSMART).

    Related to : Mapping, Routing, Heterogeneous systems.

    Specifically:

    • We analyze the design space of task mapping and routing for emerging NoC-based HCSs. We identify that algorithms that unilaterally explore task mapping or routing cannot get the optimal solution since task mapping and routing are strongly related.
    • We propose MARCO, a task mapping and routing co-optimization framework for emerging NoC-based HCSs, to decrease the schedule length of applications. Specifically, we revise the tabu search to explore the design space and evaluate the quality of task mapping and routing. The advanced reinforcement learning algorithm, i.e., advantage actor-critic, is adopted to compute paths efficiently.
    • We perform extensive experiments on various real applications, which demonstrate that MARCO achieves a remarkable performance improvement in terms of schedule length (+44.94% ~ +50.18%) compared with the state-of-the-art mapping and routing co-optimization algorithm for homogeneous computing systems. We also compare MARCO with different combinations of state-of-the-art independent mapping and routing approaches.

 

  • LAMP           

    Url: LAMP

    Contribution in one sentence: A methodology to improve data transmission parallelism for ArSMART.

    Related to : Parallel multipath transmission, Routing, NI design.

    Specifically:

    • We revised NoC router and network interface (NI) designs to support the parallel multipath transmission competently and re-order the packets from different ports with minimal overhead.
    • We proposed a parallel multipath algorithm that uses a reinforcement learning-based approach to decide when to split a data transmission, how to split it, and which paths to take, improving NoC performance.
    • We presented temporal and spatial load balancing algorithms to adjust the size of split messages so that the NoC resources can be fully utilized, further improving the transmission efficiency.

Parallel Multipath Transmission for Burst Traffic Optimization in Point-to-Point NoCs

Illustration of hardware design. (a) Overview of data transmission; (b) Router design; (c) Input NI design.

Source

This paper is published in the proceedings of GLSVLSI ’21. To cite it:

Hui Chen, Zihao Zhang, Peng Chen, Shien Zhu, Weichen Liu. “Parallel Multipath Transmission for Burst Traffic Optimization in Point-to-Point NoCs.” In Proceedings of the Great Lakes Symposium on VLSI 2021 (GLSVLSI ’21), June 22–25, 2021, Virtual Event, USA. ACM, New York, NY

Abstract

Network-on-chip (NoC) is a promising solution to connect more than hundreds of processing elements (PEs). As the number of PEs increases, the high communication latency caused by the burst traffic hampers the speedup gained by computation acceleration. Although parallel multipath transmission is an effective method to reduce transmission latency, its advantages have not been fully exploited in previous works, especially for emerging point-to-point NoCs since:


ArSMART: An Improved SMART NoC Design Supporting Arbitrary-Turn Transmission

ArSMART NoC Design (a). Overview of ArSMART; (b). Cluster structure; (c). Router design.

Source

This paper is published in TCAD. To cite it:

Hui Chen, Peng Chen, Jun Zhou, L. H. K. Duong and Weichen Liu, “ArSMART: An Improved SMART NoC Design Supporting Arbitrary-Turn Transmission,” in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, doi: 10.1109/TCAD.2021.3091961.

Abstract

SMART NoC, which transmits unconflicted flits to distant processing elements (PEs) in one cycle through the express bypass, is a high-performance NoC design proposed recently.
However, if contention occurs, flits with low priority would not only be buffered but also could not fully utilize bypass. Although there exist several routing algorithms that decrease contentions by rounding busy routers and links, they cannot be directly applicable to SMART since it lacks the support for arbitrary-turn (i.e., the number and direction of turns are free of constraints) routing. Thus, in this article, to minimize contentions and further utilize bypass, we propose an improved SMART NoC, called ArSMART, in which arbitrary-turn transmission is enabled. Specifically, ArSMART divides the whole NoC into multiple clusters where the route computation is conducted by the cluster controller and the data forwarding is performed by the bufferless reconfigurable router.

  • Copyrights © 2020-2022 Chen Hui
