
Journal of Systems Architecture 59 (2013) 60–76

Contents lists available at SciVerse ScienceDirect

Journal of Systems Architecture

journal homepage: www.elsevier.com/locate/sysarc

A survey on application mapping strategies for Network-on-Chip design

Pradip Kumar Sahu, Santanu Chattopadhyay ⇑
Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, West Bengal, India

Article info

Article history: Received 1 February 2012; received in revised form 12 October 2012; accepted 13 October 2012; available online 2 November 2012

Keywords: Application mapping; Network-on-chip; System-on-chip; Intellectual property

1383-7621/$ - see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.sysarc.2012.10.004

⇑ Corresponding author. Mobile: +91 9434042800; fax: +91 3222 255303.
E-mail addresses: [email protected] (P.K. Sahu), [email protected] (S. Chattopadhyay).

Abstract

Application mapping is one of the most important dimensions in Network-on-Chip (NoC) research. It maps the cores of the application to the routers of the NoC topology, affecting the overall performance and power requirement of the system. This paper presents a detailed survey of the work done over the last decade in the domain of application mapping. Apart from classifying the reported techniques, it also performs a quantitative comparison among them. The comparison has also been carried out for larger test applications, by implementing some of the promising techniques.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

With the growing complexity of embedded VLSI products, System-on-Chip (SoC) based single-chip implementation, integrating numerous Intellectual Property (IP) cores that perform various functions and possibly work at different clock frequencies, is now well established. A shared-medium arbitrated bus is the commonly used communication backbone in these SoCs. However, the performance of a bus-based SoC does not scale with the number of cores attached. In the search for a communication backbone for next-generation many-core SoCs that supports new inter-core communication demands, Network-on-Chip (NoC) has emerged as a viable alternative. It proposes the design of modular and scalable communication architectures in which various IP cores are connected to a router-based network through appropriate Network Interfaces (NIs) [1–3].

Fig. 1 shows the NoC synthesis flow. The application is specified as a collection of tasks with communication between them, known as the application task graph. Each IP core in a design library can perform a subset of these tasks. Hence, the first step is to select a set of cores and allocate to them the tasks they will realize. This gives rise to the core graph, with cores as nodes and communication bandwidths as edge labels. The mapping techniques map this core graph onto a topology graph, the objective of mapping being the reduction of overall communication delay. The mapped graph then passes through routing and scheduling stages to generate the final NoC.
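
The mapping step of this flow can be made concrete with a small sketch that scores one candidate placement of a core graph on a 2-D mesh topology graph. It assumes a commonly used cost model, bandwidth-weighted hop count under minimal (e.g. XY) routing, as a proxy for communication delay and energy; the function names, the row-major tile numbering and the example graph are hypothetical and are not taken from the surveyed works.

```python
from typing import Dict, Tuple

# Core graph: (source core, destination core) -> communication bandwidth.
CoreGraph = Dict[Tuple[str, str], float]


def tile_xy(tile: int, cols: int) -> Tuple[int, int]:
    """Row-major tile index -> (x, y) position in a mesh with `cols` columns."""
    return tile % cols, tile // cols


def hops(t1: int, t2: int, cols: int) -> int:
    """Manhattan distance, i.e. the hop count of a minimal route such as XY routing."""
    (x1, y1), (x2, y2) = tile_xy(t1, cols), tile_xy(t2, cols)
    return abs(x1 - x2) + abs(y1 - y2)


def comm_cost(graph: CoreGraph, mapping: Dict[str, int], cols: int) -> float:
    """Assumed cost model: sum of bandwidth x hops over all core-graph edges."""
    return sum(bw * hops(mapping[s], mapping[d], cols)
               for (s, d), bw in graph.items())


if __name__ == "__main__":
    # Hypothetical 4-core graph placed on a 2 x 2 mesh (tiles 0..3).
    g = {("a", "b"): 100, ("b", "c"): 50, ("a", "d"): 20}
    m = {"a": 0, "b": 1, "c": 3, "d": 2}
    print(comm_cost(g, m, cols=2))  # 170
```

In this placement every edge spans a single hop, so the cost is simply 100 + 50 + 20 = 170; a mapping algorithm searches for the placement that minimizes such a score.
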

The holistic research problems in this NoC design paradigm can be broadly classified into four dimensions [4–6]. The first dimension focuses on the choice of communication infrastructure, such as network topology, router architecture, buffer optimization, link design, clocking, floor planning, and layout. The second dimension deals with the communication paradigm, including routing policies, switching techniques, congestion control, power and thermal management, fault tolerance, and reliability. The third dimension involves designing an evaluation framework for NoC to gain a good understanding of the achievable throughput, latency, and bandwidth of the network. Once the communication infrastructure and paradigm for a NoC have been finalized, a major challenge in overall system design is to associate the IP cores implementing the tasks of an application with the routers. This association plays a significant role in determining the performance of the overall system, as it directly influences communication time, required link bandwidth, and admissible router delay. This application mapping forms the fourth important dimension in NoC research.

While there are quite a number of surveys [4–6] on NoC work in the first three dimensions, the fourth one, that is, application mapping techniques, has not been surveyed well. To the best of our knowledge, the only survey on NoC mapping and scheduling techniques is [7], which is now dated. A large number of research works have been reported in recent years, and the scope of this paper lies in studying them. The objective is to classify the mapping algorithms into different categories and compare them. In this paper we have mainly focused on application mapping. In some NoCs, the cores are already attached to the routers; hence, the task mapping stage can simply be augmented to decide on the particular core to be used. In general NoCs, on the other hand, no such association pre-exists between cores and routers, and the application mapping phase performs this mapping. A third category of work synthesizes a topology graph specifically optimized for an application, commonly known as application-specific NoC synthesis. In this paper we survey mostly the techniques that map cores to routers without assuming any pre-existing core–router association; application-specific topology generation is excluded from its purview. The IP cores in a NoC topology may be homogeneous or heterogeneous in nature. Homogeneous cores can all perform the same set of tasks, while heterogeneous cores can perform different sets of tasks. This is generally taken care of at the core selection stage (Fig. 1). The mapping process takes the core graph (along with its communication requirements) as input and finds a mapping of the cores onto the topology graph.

The rest of the paper is organized as follows. Section 2 presents an overview of mapping techniques. Section 3 enumerates different dynamic mapping techniques, while Section 4 concentrates on static mapping approaches. Section 5 presents a performance comparison among the mapping techniques. Some special mapping techniques are presented in Section 6. Section 7 gives an overview of some of the application mapping tools. Section 8 draws the conclusion.

Fig. 1. Application-specific NoC design flow. (Application Task Graph → Core Selection → Core Graph → Mapping, with the Topology Graph as a second input → Mapped Topology Graph → Routing → Routes for Communication → Scheduling → Synthesized NoC.)

Fig. 2. Classification of mapping algorithms. (Mapping Algorithms: Dynamic Mapping and Static Mapping. Static Mapping: Exact Mapping, i.e. Mathematical Programming based Mapping (ILP, MILP, etc.), and Search based Mapping. Search based Mapping: Systematic or Deterministic Search, i.e. Branch and Bound (BB), and Heuristic Search. Heuristic Search: Transformative Heuristic (GA, PSO, ACO) and Constructive Heuristic, subdivided into Constructive with Iterative Improvement and Constructive without Iterative Improvement; example techniques shown: BMAP, CMAP, CHMAP, SMAP, NMAP, LMAP, SA, Onyx.)

2. Mapping techniques

The problem of application mapping is NP-hard [7]. Depending on the time at which the tasks are assigned to the IPs for processing, mapping techniques can be classified as dynamic mapping or static mapping (Fig. 2).
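
One way to see why exhaustive search quickly becomes impractical is to count the candidates: placing n cores one-to-one on T ≥ n tiles admits T!/(T − n)! distinct mappings. The short sketch below (function name hypothetical) only evaluates this count; it illustrates the growth of the search space and is not the NP-hardness argument of [7].

```python
import math


def num_mappings(cores: int, tiles: int) -> int:
    """Number of one-to-one placements of `cores` cores on `tiles` tiles."""
    if cores > tiles:
        return 0  # not enough tiles for a one-to-one mapping
    return math.perm(tiles, cores)  # tiles! / (tiles - cores)!


if __name__ == "__main__":
    for side in (2, 3, 4, 5):
        n = side * side  # a fully occupied side x side mesh
        print(f"{side}x{side} mesh: {num_mappings(n, n):,} mappings")
```

Even a 4 × 4 mesh already admits 16! (about 2.1 × 10^13) full mappings, which is why the techniques surveyed below rely on heuristics, pruning or problem decomposition.
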

In on-line or dynamic mapping, the assignment and ordering of tasks are performed during the execution of the application. Dynamic mapping tries to detect performance bottlenecks and distribute the workload among the processors. Since the mapping depends on the current load of the processors, it can adapt to run-time conditions and potentially yield a better solution. However, the computational overhead of the mapping algorithm may increase the delay and energy consumption of the application at run-time.

In static mapping, on the other hand, the mapping of tasks is generally performed off-line, before the application is run. For a given application and target communication infrastructure, static mapping tries to find the best placement of tasks at design time. As the mapping is completed before execution, the mapping algorithm is executed only once. For NoC, static mapping is generally recommended, as the extra communication overhead of dynamic mapping significantly affects system performance, increasing the overall delay of the system [7].

3. Dynamic mapping techniques

Dynamic mapping is an on-line mapping strategy. Ready tasks are mapped to the processors by observing the load of the processors at run-time, so the placement of tasks on the NoC can change during execution of the application.



In [8], the authors have proposed a compiler-based application mapping scheme, which can perform task scheduling, processor mapping, data mapping and packet routing. This mapping technique needs a very high compilation time, which may degrade system performance. Energy has been compared with and without data mapping and packet routing.

In [9,18], the authors have presented heuristics for dynamic task mapping with an initial task mapping phase, followed by a dynamic mapping phase. The dynamic mapping phase may use any one of the following techniques: First Free (FF), Nearest Neighbor (NN), Minimum Maximum Channel Load (MMC), Minimum Average Channel Load (MAC), and Path Load (PL). In the FF technique, the NoC selects the first free node that can execute the requested task, the network being searched column-wise. NN mapping is similar to FF; the only difference is that the requested task is placed at a free node neighboring the node making the request. MMC is a congestion-aware mapping heuristic that reduces the maximum load on the links. The MAC technique is similar to MMC, but distributes the communication load over the NoC to reduce the average load on the links. MMC and MAC consider all NoC links while mapping a new task; hence, mapping takes time. The PL technique overcomes this problem by considering only the links used by the task being mapped. It has been shown that the PL heuristic produces the best solution among these [9,18].

The authors in [10] have proposed a technique for run-time application mapping onto NoC platforms with multiple voltage levels. This technique consists of a region selection algorithm and a heuristic for run-time application mapping. Different regions operate at different voltage levels. Compared to random mapping, the communication energy saving is about 50%. In [11], the authors have described a run-time strategy for task allocation on a homogeneous NoC platform.
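
The FF and NN policies of [9,18] can be sketched on a mesh whose nodes are either free or busy. This is a minimal sketch under simplifying assumptions: every free node is assumed able to execute the requested task (the hardware/software type check of [9,18] is omitted), and NN is modeled as choosing the free node with the fewest hops from the requesting node, ties broken in column-wise order. The function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Node = Tuple[int, int]  # (x, y) mesh coordinates


def column_order(cols: int, rows: int) -> List[Node]:
    """Visit nodes column by column: all of x = 0 first, then x = 1, and so on."""
    return [(x, y) for x in range(cols) for y in range(rows)]


def first_free(busy: Set[Node], cols: int, rows: int) -> Optional[Node]:
    """FF: first free node found while scanning the mesh column-wise."""
    for node in column_order(cols, rows):
        if node not in busy:
            return node
    return None  # no free resource; the request has to wait


def nearest_neighbor(requester: Node, busy: Set[Node],
                     cols: int, rows: int) -> Optional[Node]:
    """NN (assumed form): free node closest in hops to the requesting node."""
    free = [n for n in column_order(cols, rows) if n not in busy]
    if not free:
        return None
    # min() keeps the first of equally distant nodes, i.e. a column-order tie-break.
    return min(free, key=lambda n: abs(n[0] - requester[0]) + abs(n[1] - requester[1]))


if __name__ == "__main__":
    busy = {(0, 0), (0, 1), (2, 2)}  # hypothetical occupied nodes in a 3 x 3 mesh
    print(first_free(busy, 3, 3))                # (0, 2)
    print(nearest_neighbor((2, 2), busy, 3, 3))  # (1, 2)
```

FF ignores where the requester sits, so communicating tasks can end up far apart; NN trades a slightly larger search for shorter paths, which is the gap the load-aware MMC, MAC and PL policies then address.
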
The strategy of [11] incorporates user behavior information in the resource allocation process, which allows the system to respond better to real-time changes and adapt to user needs dynamically. In [12], real-time applications are dynamically mapped onto embedded MPSoCs in which communication is performed via a NoC and, as in [10], the resources connected to the NoC have multiple voltage levels. A two-step algorithm has been proposed: it first selects a proper region for mapping and then applies a greedy heuristic for run-time mapping of tasks to nodes. The task allocation may follow any one of the following techniques: Best Case (BC), Worst Case (WC), Euclidean Minimum (EM), Fixed Center (FC), Random Frontier (RF) and Neighbor-aware Frontier (NF). The BC technique corresponds to the optimal solution generated by an exhaustive search; it cannot be applied to dynamic mapping due to its high run-time overhead [12]. In WC, the mapping of a task onto a tile depends on an already mapped task. The EM technique maps a task by selecting the unmapped tile on the NoC with the minimum Euclidean distance from all the mapped nodes. In FC, tasks are mapped at a minimum Manhattan Distance (MD) from the first mapped tile. The RF heuristic maps a task by selecting an unmapped tile at random from the frontier of the mapped region. Every tile has at most four neighbors, and a tile is available for mapping if it has not yet been included in the region. The NF heuristic maps a task onto a tile with the minimal number of available neighbors. It has been shown that the EM and FC heuristics produce better solutions than the others [12].

Sometimes the number of tasks running in an MPSoC may exceed the number of available resources, so dynamic task mapping is needed [13]. Tasks are mapped on the fly, according to the communication requests and the loads on the links. The target MPSoC architecture contains software and hardware processing elements, each of which can support only one task.
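
The FC and NF tile-selection rules of [12] can be sketched as follows. This is a minimal sketch under stated assumptions: tiles are (x, y) mesh coordinates, FC breaks distance ties in row-major order, and NF is assumed to choose among frontier tiles (unmapped tiles adjacent to the mapped region), as RF does. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Tile = Tuple[int, int]  # (x, y) position in the mesh


def neighbors(t: Tile, cols: int, rows: int) -> List[Tile]:
    """Mesh neighbours of tile t: four in the interior, fewer on the boundary."""
    x, y = t
    cand = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in cand if 0 <= a < cols and 0 <= b < rows]


def fixed_center(first: Tile, mapped: Set[Tile], cols: int, rows: int) -> Tile:
    """FC: unmapped tile with minimum Manhattan distance to the first mapped tile.

    Assumes at least one unmapped tile exists (min() raises otherwise).
    """
    free = [(x, y) for y in range(rows) for x in range(cols) if (x, y) not in mapped]
    return min(free, key=lambda t: abs(t[0] - first[0]) + abs(t[1] - first[1]))


def neighbor_aware_frontier(mapped: Set[Tile], cols: int, rows: int) -> Tile:
    """NF (assumed form): frontier tile with the fewest available (unmapped) neighbours."""
    frontier = sorted({n for t in mapped for n in neighbors(t, cols, rows)} - mapped)
    return min(frontier,
               key=lambda t: sum(n not in mapped for n in neighbors(t, cols, rows)))


if __name__ == "__main__":
    mapped = {(0, 0), (1, 0)}  # hypothetical region in a 3 x 3 mesh; (0, 0) mapped first
    print(fixed_center((0, 0), mapped, 3, 3))     # (0, 1)
    print(neighbor_aware_frontier(mapped, 3, 3))  # (2, 0)
```

In the example, FC grows the region around the first tile, whereas NF picks the corner tile (2, 0), which has only one free neighbour left.
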
The approach of [13] reduces NoC channel load, congestion, and packet latency using different heuristics, as in [9,18]. DSM, a dynamic spiral mapping technique, has been proposed in [14] for task mapping at run-time. The placement of a task is searched along a spiral path from the centre to the boundary of the network architecture, so that communicating tasks are placed close to each other. DSM also attempts to reduce the communication time by reducing dynamic mapping time, reconfiguration time and task migration time.

A run-time, agent-based distributed application mapping technique for NoC-based heterogeneous MPSoCs has been presented in [15]. The technique maps applications in a decentralized (distributed) manner using an agent-based approach. Compared to centralized approaches, it reduces monitoring traffic and the computational effort of the mapping process. Agents are small tasks that can be executed on any node in the NoC; they perform resource management and store state information for the resources. The agents negotiate with each other to find processing elements suitable for mapping a task. There are two types of agents: Global Agents (GA) and Cluster Agents (CA). CAs have knowledge about their own clusters; when they receive a new task request, they negotiate with the GAs, which have global information about all clusters.

The target MPSoC architecture designed in [16] contains both software and hardware processing elements that can support more than one task in parallel. In this architecture, one of the available processing nodes acts as a Manager Processor and is responsible for task binding, task mapping, task migration, resource control and reconfiguration control. The resource status is updated at run-time, the Manager Processor keeps track of resource occupancy, and mapping decisions are taken accordingly. The mapping heuristics proposed in [16,17] map the communicating tasks of an application close to each other so as to minimize communication overhead and improve performance. The heuristics examine the available resources before recommending that adjacent tasks be placed on the same processing element.
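
The spiral search used by DSM [14] can be illustrated by generating the order in which tiles are visited, starting at the centre of the mesh and moving outwards, and then placing each new task on the first free tile in that order. The traversal below is one possible square spiral (right, down, left, up, with leg lengths 1, 1, 2, 2, 3, 3, ...); the exact path and tie-breaking used in [14] may differ, and the function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Tile = Tuple[int, int]  # (x, y); y grows downwards


def spiral_order(cols: int, rows: int) -> List[Tile]:
    """Tiles of a cols x rows mesh in square-spiral order, starting at the centre."""
    x, y = (cols - 1) // 2, (rows - 1) // 2  # centre tile (upper-left one if even)
    order = [(x, y)]
    dirs = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # right, down, left, up
    leg, d = 1, 0
    while len(order) < cols * rows:
        for _ in range(2):  # every leg length is walked twice before growing
            dx, dy = dirs[d % 4]
            for _ in range(leg):
                x, y = x + dx, y + dy
                if 0 <= x < cols and 0 <= y < rows:  # skip positions off the mesh
                    order.append((x, y))
            d += 1
        leg += 1
    return order


def spiral_place(busy: Set[Tile], cols: int, rows: int) -> Optional[Tile]:
    """Place a new task on the first free tile met along the spiral (None if full)."""
    for t in spiral_order(cols, rows):
        if t not in busy:
            return t
    return None


if __name__ == "__main__":
    print(spiral_order(3, 3))
    print(spiral_place({(1, 1), (2, 1)}, 3, 3))  # (2, 2)
```

Because the square spiral visits every lattice point exactly once, the order covers each tile of the mesh exactly once, and consecutive placements stay close to the centre and to each other.
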
The algorithm of [16,17] also attempts to map communicating tasks compactly and in close proximity, so as to reduce communication overhead and time; hence, the total execution time is also reduced. It is a two-phase algorithm. In the first phase, initial task mapping is done either at the first free position found in the network that can support the tasks, or by partitioning the NoC into regions and placing the initial tasks at the centres of these regions. In the second phase, newly requesting tasks are mapped by an efficient run-time mapping algorithm for better performance. In general, the works proposed in [9,13,18] are extended in [16,17] with a packing strategy that minimizes communication overhead on the same NoC-based MPSoC platform and supports mapping multiple tasks onto the same PE.

An energy-aware heuristic for dynamic task mapping, named lower energy consumption based on dependencies-neighborhood (LEC-DN), has been presented in [19,20]. Its main cost function considers not only the distance in hops between communicating tasks but also the communication volume among them, since the number of transmitted flits determines the communication energy. When the target task has only one communicating task that has already been mapped, LEC-DN uses Nearest Neighbor (NN) search in a spiral fashion. If, on the other hand, more than one communicating task has already been mapped, it searches for a processing element inside the bounding box defined by the positions of those tasks, depending on the communication volume. In [21], a dynamic, decentralized, application-driven and resource-aware mapping has been proposed, in which tasks can be embedded incrementally by an already mapped predecessor task. This self-embedding approach is fully decentralized and autonomous.
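
The bounding-box step of LEC-DN [19,20] can be sketched as follows: when several communicating tasks are already mapped, the candidates are restricted to free tiles inside the bounding box of their positions, and the tile minimizing communication volume × hop distance is chosen. The volume-weighted cost, the row-major tie-break and the "return None" fallback when the box is full are assumptions of this sketch rather than details from [19,20]; the function name is hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Tile = Tuple[int, int]  # (x, y) position in the mesh


def lec_dn_bbox(peers: Dict[Tile, float], busy: Set[Tile]) -> Optional[Tile]:
    """Pick a free tile inside the bounding box of already-mapped communicating tasks.

    `peers` maps the tile of each mapped communicating task to the communication
    volume (e.g. in flits) it exchanges with the task being placed.
    """
    xs = [x for x, _ in peers]
    ys = [y for _, y in peers]
    box = [(x, y) for x in range(min(xs), max(xs) + 1)
           for y in range(min(ys), max(ys) + 1) if (x, y) not in busy]
    if not box:
        return None  # assumed: the caller falls back to, e.g., a spiral NN search

    def cost(t: Tile) -> float:
        # Volume-weighted hop count, used here as a proxy for communication energy.
        return sum(v * (abs(t[0] - px) + abs(t[1] - py))
                   for (px, py), v in peers.items())

    return min(box, key=cost)


if __name__ == "__main__":
    peers = {(0, 0): 10, (3, 2): 100}  # hypothetical volumes to two mapped tasks
    print(lec_dn_bbox(peers, busy={(0, 0), (3, 2)}))  # (2, 2)
```

Inside the box the hop distances to two peers always add up to the same total, so the volume weighting is what pulls the new task towards the peer with the heavier traffic.
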

4. Static mapping techniques

A mapping is called static if the resource on which a task is going to be executed is decided before its execution and is not changed thereafter. Static mapping (Fig. 2) is an off-line mapping: all the cores are mapped to routers at design time. Various techniques have been developed to find good and efficient mapping solutions. Depending on the techniques employed to reach a mapping solution, static application mapping approaches can be broadly classified as either exact mapping or search based mapping, as shown in Fig. 2.

4.1. Exact mapping

Mathematical programming based mapping produces optimal solutions. A Mixed Integer Linear Programming (MILP) based task mapping for heterogeneous multiprocessor systems has been reported in [22]. In this heterogeneous multiprocessor, some processors are programmable, while others are application-specific. The model determines the optimization trade-off among execution time, processor type (general-purpose or application-specific) and communication cost. This is a hardware/software co-design process, run iteratively until the design goal is met. An MILP formulation for mapping cores onto a NoC, considering the choice of core placements, the switch for each core, and the network interfaces for communication, has been proposed in [23]. Its energy consumption is reported to be much lower than that of other mapping techniques for some real as well as random benchmarks. An integrated approach for mapping cores onto heterogeneous processor/memory based NoC topologies together with physical planning has been presented in [24], where the positions and sizes of the cores and network components are computed. For the initial mapping, the authors follow a greedy mapping of cores onto the specified topology; in the improvement phase, the relative core positions are fixed by Tabu search. An MILP based physical planning algorithm has been formulated to improve the area and power of the final design and also to guarantee Quality-of-Service (QoS) for the application. In [25], an MILP formulation for the synthesis of custom NoC architectures has been presented. Here the optimization objective is to minimize power consumption subject to performance constraints. In Linear Programming (LP), the main bottleneck is runtime. To reduce it, the authors partition the application task graph into a number of clusters. The MILP formulation for topology design is then applied to generate partial solutions. At the end, the final mapped custom topology is generated by adding physical links between the ports of neighbouring routers of the clusters.
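
For reference, the core of such exact formulations can be written as a generic bandwidth-weighted hop-count ILP. The symbols here are our own and the formulation is a textbook-style sketch, not the model of any specific cited work; [22–25] add further variables and constraints (processor types, switches, physical planning, topology links) on top of it. Let C be the set of cores, T the set of tiles, E the core-graph edges with bandwidths b_{ij}, and d(s,t) the hop distance between tiles s and t; the binary variable x_{c,t} = 1 iff core c is placed on tile t.

$$
\begin{aligned}
\min \quad & \sum_{(i,j)\in E} b_{ij} \sum_{s\in T}\sum_{t\in T} d(s,t)\, y_{ijst} \\
\text{s.t.} \quad & \sum_{t\in T} x_{c,t} = 1 && \forall c \in C \\
& \sum_{c\in C} x_{c,t} \le 1 && \forall t \in T \\
& y_{ijst} \ge x_{i,s} + x_{j,t} - 1 && \forall (i,j)\in E,\ s,t \in T \\
& x_{c,t} \in \{0,1\}, \quad y_{ijst} \ge 0
\end{aligned}
$$

Because every coefficient b_{ij} d(s,t) is non-negative, an optimal solution can always take y_{ijst} = x_{i,s} x_{j,t}, so the linear model reproduces the quadratic (bandwidth × hops) objective. The |E||T|^2 auxiliary variables are one reason why ILP run-times grow quickly with the problem size.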

Network processors incorporate features such as symmetric multiprocessing (SMP), block multi-threading, and multiple memory elements to support high-performance networking applications. Mapping an application onto a complex multi-processor, multi-threaded network processor is a difficult task. In [26], the authors have presented a two-stage Integer Linear Programming (ILP) formulation for process allocation and data mapping on SMP and block multi-threading based network processors. Power/energy control is a very important issue in NoC-based chip multiprocessors (CMPs). The work in [27] attempts to minimize energy by shutting down certain communication links in such architectures; the formulation can be used to select the links in use and the voltage and frequency of those links. Minimizing energy consumption during application execution while satisfying the performance constraint may combine several sub-problems, such as mapping application tasks to IPs, mapping IPs to the routers of the NoC architecture, assigning operating voltages to IPs (when they can operate at multiple voltages), and routing. A unified approach to energy-efficient application mapping, using an MILP formulation that takes care of all these sub-problems (application mapping, operating voltage assignment, and routing), has been presented in [28]. In [29], the ILP of [28] has been extended to find a trade-off between computation and communication energy. In [30], the factors that produce network contention have been analyzed, and an ILP formulation has been proposed for contention-aware application mapping in tile-based NoC to minimize inter-tile network contention. In NoC-based design, global wires are replaced by a network of shared links, and the routers exchange data packets simultaneously through the links. Hence, there is traffic congestion within the links, which significantly degrades system performance. Network contention may be source-based, destination-based, or path-based. The results show a significant reduction in packet latency from reducing network contention, but the penalty in communication energy is high. In [31], the authors have presented an ILP formulation for application mapping onto mesh-based NoC to minimize energy consumption for different benchmarks. However, the formulation does not include bandwidth constraints, and the CPU time reported for the different benchmarks is quite high. To overcome the high CPU time, a clustering-based relaxation of the ILP formulation has been proposed in [32]. The tasks of the application graph are clustered suitably, as in [25]. Based on the number of clusters, the mesh architecture is divided into smaller meshes. The ILP formulation of [31] is used to map the clusters onto the corresponding sub-meshes, and finally all such sub-meshes are merged to determine the final solution. It has been noted that the CPU time improves at the expense of the communication cost of the mapping solution.
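
On very small instances, the optimum that an exact formulation returns can also be obtained by plain exhaustive enumeration, which makes the cost of exactness visible. The sketch below assumes the bandwidth-weighted hop-count cost and uses hypothetical names; it is not an ILP solver and does not reproduce [31] or the clustering relaxation of [32].

```python
import itertools
from typing import Dict, List, Optional, Tuple

Tile = Tuple[int, int]


def exhaustive_map(cores: List[str], edges: Dict[Tuple[str, str], int],
                   cols: int, rows: int) -> Tuple[Optional[int], Optional[Dict[str, Tile]]]:
    """Exact optimum of the bandwidth x hops cost by enumerating all placements."""
    tiles = [(x, y) for y in range(rows) for x in range(cols)]
    best_cost, best_map = None, None
    # permutations(tiles, n) yields every ordered choice of n distinct tiles.
    for placement in itertools.permutations(tiles, len(cores)):
        m = dict(zip(cores, placement))
        cost = sum(bw * (abs(m[a][0] - m[b][0]) + abs(m[a][1] - m[b][1]))
                   for (a, b), bw in edges.items())
        if best_cost is None or cost < best_cost:
            best_cost, best_map = cost, m
    return best_cost, best_map


if __name__ == "__main__":
    ring = {("a", "b"): 5, ("b", "c"): 3, ("c", "d"): 2, ("d", "a"): 1}
    print(exhaustive_map(["a", "b", "c", "d"], ring, 2, 2)[0])  # 11
```

Every edge of this 4-core ring can be given a single hop on a 2 × 2 mesh, so the optimum is 5 + 3 + 2 + 1 = 11. The loop evaluates T!/(T − n)! placements for n cores on T tiles, which is exactly the blow-up that ILP solvers, and decompositions such as the clustering of [32], try to avoid.
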

4.2. Search based mapping

Depending on the search type and results, there are two types of search based mapping algorithms: (i) systematic or deterministic search and (ii) heuristic search.

4.2.1. Deterministic search

Search algorithms using Branch-and-Bound (BB) belong to this category. BB is a systematic search algorithm that builds the mapping by exploring a tree of partial solutions branch by branch and bounding (pruning) the branches that cannot lead to an acceptable solution. It can be applied only to smaller problems, as the search time grows exponentially with the size of the problem.
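
A minimal branch-and-bound sketch for the same bandwidth-weighted hop-count cost: cores are placed one at a time (branching over the free tiles), and a branch is pruned as soon as the cost of the edges whose two endpoints are already placed is no smaller than that of the best complete mapping found so far. This simple bound and the hypothetical names are assumptions of the sketch; the PBB algorithm of [33–35] uses more elaborate bounds together with performance constraints.

```python
from typing import Dict, List, Set, Tuple

Tile = Tuple[int, int]


def bb_map(cores: List[str], edges: Dict[Tuple[str, str], int], cols: int, rows: int):
    """Branch-and-bound mapping; returns (best_cost, best_mapping)."""
    tiles = [(x, y) for y in range(rows) for x in range(cols)]
    best = [float("inf"), None]  # incumbent: [cost, mapping]

    def partial_cost(m: Dict[str, Tile]) -> int:
        # Cost of edges with both endpoints placed. It is a valid lower bound,
        # since the edges still to be placed can only add non-negative cost.
        return sum(bw * (abs(m[a][0] - m[b][0]) + abs(m[a][1] - m[b][1]))
                   for (a, b), bw in edges.items() if a in m and b in m)

    def branch(i: int, m: Dict[str, Tile], used: Set[Tile]) -> None:
        bound = partial_cost(m)
        if bound >= best[0]:
            return  # prune: this subtree cannot beat the incumbent
        if i == len(cores):
            best[0], best[1] = bound, dict(m)  # complete and better: new incumbent
            return
        for t in tiles:
            if t not in used:
                m[cores[i]] = t
                used.add(t)
                branch(i + 1, m, used)
                used.remove(t)
                del m[cores[i]]

    branch(0, {}, set())
    return best[0], best[1]


if __name__ == "__main__":
    ring = {("a", "b"): 5, ("b", "c"): 3, ("c", "d"): 2, ("d", "a"): 1}
    print(bb_map(["a", "b", "c", "d"], ring, 2, 2)[0])  # 11
```

Pruning becomes more effective when the heaviest-communicating cores are placed first, because expensive partial placements are then rejected near the root; the worst case nevertheless remains exponential, as noted above.
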

In [33–35], the authors have proposed an energy- and performance-aware mapping for tile-based regular NoC architectures that satisfies the specified design constraints through bandwidth reservation. In [34,35], it has been suggested that the most appropriate routing technique for NoC is deterministic, deadlock-free, minimal, and wormhole-based. In a tile-based NoC architecture, as the network wires are structured and modular, their electrical parameters can be very well controlled and optimized. The authors first formulate Energy- and Performance-Aware Mapping (EPAM or GMAP) in a topological sense, and then use an efficient Performance-aware Branch-and-Bound (PBB) algorithm to improve the solution quality. A good amount of energy saving has been reported for EPAM combined with PBB, compared to Simulated Annealing (SA) based solutions. In the above mapping, a single IP is connected to each router. An IP with a large communication volume imposes a heavy traffic load on certain routers, which may become hotspots due to high power density, affecting the reliability of the chip. A traffic-balanced IP mapping algorithm (TBMAP) for 2-D (two-dimensional) mesh-based NoC has been proposed in [36]. TBMAP balances the traffic of all the routers without sacrificing network performance. To reduce unbalanced traffic loads, the authors have proposed several network interfaces (NIs): Single-Router to Single-IP (SRSI), Single-Router to Multiple-IP (SRMI), and Double-Router to Single-IP (DRSI). Based on these new NIs, TBMAP uses a modified branch-and-bound search, as in [33–35], to map all the IPs onto a 2-D mesh-based NoC. As the traffic loads are more balanced in this case, some data paths might not be the shortest ones, so the average traffic load may be higher in some cases. In [37], the authors have taken NMAP [75] (discussed in Section 4.2.2.2.2) as their initial mapping solution. A Branch-and-Bound algorithm, as in [33–35], has been applied to the NMAP solution to arrive at a better one. The new mapping solution is reported to be better than EPAM, PBB and NMAP in terms of communication cost, power consumption and network latency. Branch-and-Bound algorithms demand a large amount of memory and suffer from long CPU times.

c a e
d b f

Fig. 3. Final core placement [39].

4.2.2. Heuristic search

A number of heuristic approaches have been reported to solve the application mapping problem. They can be broadly classified into transformative heuristics and constructive heuristics.

4.2.2.1. Transformative heuristics. Transformative heuristics transform some existing mapping solution(s) to arrive at better ones. Typical examples include evolutionary techniques such as Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), and so on.

4.2.2.1.1. GA based transformative heuristics. Genetic Algorithm (GA) is a stochastic search algorithm based on the operations of natural genetics. A fixed-size population of chromosomes evolves over a number of generations following the principle of natural selection. Each chromosome represents a potential solution and has an associated fitness measure. Using operators similar to crossover and mutation in nature, the population evolves through generations. To form a new generation, the top few percent of chromosomes are generally copied directly to the next generation; the rest of the population is created by two operators, crossover and mutation. The crossover operator selects two parent chromosomes and exchanges parts of them to create new offspring; crossover may be single-point or multi-point. The mutation operator may be implemented by selecting a parent chromosome and randomly changing some of its portions. The mutation rate can be tuned to control the rate of convergence to local or global minima. The termination criterion is often set to be "no improvement in the last few generations" or "a specified number of generations for which the GA has run".
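
The GA loop just described can be sketched for the mapping problem with a permutation encoding (position c of the chromosome holds the row-major tile of core c), elitism, tournament selection (a lower-cost chromosome is more likely to become a parent), order crossover and swap mutation. These operator choices, the parameter values and the bandwidth-weighted hop-count fitness are assumptions of this sketch; the surveyed GAs [38,39] use their own encodings and fitness functions.

```python
import random
from typing import Dict, List, Tuple


def ga_map(n: int, edges: Dict[Tuple[int, int], int], cols: int,
           pop_size: int = 30, generations: int = 200, elite: int = 3,
           p_mut: float = 0.2, seed: int = 0) -> Tuple[int, List[int]]:
    """GA over permutations: chrom[c] = row-major tile of core c (n cores, n tiles)."""
    rng = random.Random(seed)

    def cost(ch: List[int]) -> int:
        # Lower is fitter: bandwidth x Manhattan hops on a mesh with `cols` columns.
        return sum(bw * (abs(ch[a] % cols - ch[b] % cols) + abs(ch[a] // cols - ch[b] // cols))
                   for (a, b), bw in edges.items())

    def tournament(pop: List[List[int]]) -> List[int]:
        a, b = rng.sample(pop, 2)
        return a if cost(a) <= cost(b) else b

    def order_crossover(p1: List[int], p2: List[int]) -> List[int]:
        i, j = sorted(rng.sample(range(n), 2))
        child = [-1] * n
        child[i:j] = p1[i:j]  # copy a slice from the first parent
        rest = [g for g in p2 if g not in child]  # remaining genes in p2's order
        for k in range(n):
            if child[k] == -1:
                child[k] = rest.pop(0)
        return child

    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):  # termination: fixed number of generations
        pop.sort(key=cost)
        nxt = [ch[:] for ch in pop[:elite]]  # elitism: copy the best few unchanged
        while len(nxt) < pop_size:
            child = order_crossover(tournament(pop), tournament(pop))
            if rng.random() < p_mut:  # swap mutation keeps the permutation valid
                u, v = rng.sample(range(n), 2)
                child[u], child[v] = child[v], child[u]
            nxt.append(child)
        pop = nxt
    best = min(pop, key=cost)
    return cost(best), best


if __name__ == "__main__":
    chain = {(i, i + 1): 1 for i in range(8)}  # hypothetical 9-core pipeline
    print(ga_map(9, chain, cols=3))
```

Order crossover and swap mutation are used here because plain single- or multi-point crossover on a tile permutation can produce duplicate tiles; with elitism, the best cost can never get worse from one generation to the next.
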

A two-step Genetic Algorithm (GA) for mapping applications onto NoC, which reduces the overall execution time, has been proposed in [38]. In the first step, the tasks are assigned to different IPs, assuming the edge delays to be constant and equal to the average edge delay. In the second step, the IPs are mapped to the tiles of the NoC, taking into account the actual edge delay based on the network traffic model, and the total system delay is minimized. This mapping does not consider some delay factors, such as the message sending probability of the cores, the packet length and the network contention for communication.

In [39], the authors have proposed a delay model for application mapping onto NoC that considers all these factors. Their genetic algorithm based on this delay model is reported to map an application onto the NoC with minimum average delay. In this genetic algorithm, each chromosome of the population encodes the positions of the cores in the NoC topology, and the initial population is chosen randomly. The fitness function is the average waiting time. To evolve a new generation, a multi-point crossover with randomly chosen crossover points is used. A chromosome with a lower waiting time has a higher chance of participating in crossover than a chromosome with a higher waiting time. The size of a chromosome is the same as the number of cores in the core graph. To control the rate of convergence to local or global minima, mutation is performed randomly on a chromosome with a given mutation probability. These operations are repeated until the lowest average waiting time has not changed for a specified number of iterations. The best solution at the end then gives the core positions on the NoC.

For example, let cores a to f be mapped onto a (2 × 3) 2-D mesh, and let the final chromosome be (1 2 1 3 3 6). The first integer is used to map a, the second to map b, and so on, so every solution has a 1 in the first position. The second core, b, can be placed in two places: before a (value 1) or after a (value 2). Since the value for b is 2, b is placed after a. There are then three placeholders for c: before a (value 1), between a and b (value 2), or after b (value 3). The value for c is 1, so c is placed before a. The other cores are placed similarly, and the final order of the cores is (c a e d b f). Using this representation, the cores are placed in a (2 × 3) 2-D mesh connected NoC according to the positions obtained in the final solution, as shown in Fig. 3.
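The decoding step of the worked example can be written as a short function. This sketch follows the example only: each integer is treated as a 1-based insertion slot in the order built so far. How [39] then lays the decoded order onto the mesh is shown in Fig. 3 and is not reproduced here.

```python
def decode_insertion_chromosome(genes, cores):
    """Turn an insertion-slot chromosome into an ordering of cores.

    genes[k] is a 1-based slot: the k-th core is inserted before the item
    currently at that slot (slot len(order) + 1 means 'append at the end').
    """
    order = []
    for core, slot in zip(cores, genes):
        if not 1 <= slot <= len(order) + 1:
            raise ValueError(f"slot {slot} is invalid for core {core!r}")
        order.insert(slot - 1, core)
    return order


print(decode_insertion_chromosome([1, 2, 1, 3, 3, 6], "abcdef"))
# ['c', 'a', 'e', 'd', 'b', 'f']
```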

A Pareto based multi-objective evolutionary computing technique that optimizes the performance and power consumption of the mapped NoC has been proposed in [40]. The same authors used this technique for application task mapping in [41]. For dynamic evaluation, an event-driven trace-based simulator has been used to compare their results with a Pareto based Branch-and-Bound approach [41] and a Pareto based NMAP approach [41]. A multi-objective genetic algorithm based application mapping technique for NoC has been presented in [42]. It targets mapping with Network Assignment (NA) for heterogeneous distributed embedded systems, to improve performance and to reduce power consumption and area. The technique first allocates tasks to cores, and then maps the cores to different tiles of the NoC while satisfying the communication requirements. The mapping of IP cores onto NoC tiles, together with routing path allocation, is referred to as network assignment (NA). Network assignment is usually performed after task mapping to reduce the on-chip inter-communication distance.
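The Pareto based approaches above keep every mapping that no other mapping beats on all objectives at once. A minimal non-dominated filter is sketched below, assuming that every objective is minimized; the mapping names and (latency, power) values are hypothetical.

```python
def dominates(a, b):
    """a Pareto-dominates b: no worse in every objective, strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the (name, objectives) pairs that no other candidate dominates."""
    return [(name, obj) for name, obj in candidates
            if not any(dominates(other, obj) for _, other in candidates)]


# Hypothetical (latency, power) values for four candidate mappings.
cands = [("m1", (10, 5)), ("m2", (8, 7)), ("m3", (12, 6)), ("m4", (9, 9))]
print([name for name, _ in pareto_front(cands)])  # ['m1', 'm2']
```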

The Genetic Algorithm based optimization technique MGAP proposed in [43] minimizes power consumption by reducing the number of switches in the communication path between cores, and also maximizes throughput. Although a similar technique has been used in [38], here the authors have considered the dynamic effect of traffic. They have also given a set of solutions using Pareto mapping, as in [40,41]. A multi-objective Genetic Algorithm (MOGA) based application mapping technique has been proposed in [44], where one-to-one as well as many-to-many mappings between switches and tiles are considered to minimize energy consumption and the required link bandwidth. As in [43], it is used to find an optimal solution among the Pareto optimal solutions. In [43,44], a chromosome represents a mapping solution and is formed by $m \times n$ genes, where $m$ is the number of rows and $n$ is the number of columns of the mesh connected NoC. The $i$th gene corresponds to the core in the tile at row $\lceil i/n \rceil$ and column $(i \,\%\, n)$. The crossover is single-point; during crossover, the most heavily communicating cores are remapped to random tiles, resulting in a new chromosome. Mutation is performed on a chromosome by choosing highly communicating cores and placing them near each other.
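The tile-indexed chromosome of [43,44] can be decoded as below. The sketch uses 0-based gene, row and column indices; with that convention the row of gene $i$ is $\lfloor i/n \rfloor$ and the column is $i \,\%\, n$ (the indexing convention and the use of None for an empty tile are assumptions for illustration).

```python
def decode_tile_chromosome(chrom, m, n):
    """chrom has m*n genes; gene i holds the core on tile (i // n, i % n), or None if empty."""
    if len(chrom) != m * n:
        raise ValueError("chromosome length must equal m * n")
    return {core: divmod(i, n) for i, core in enumerate(chrom) if core is not None}


print(decode_tile_chromosome(["c0", "c1", None, "c2", "c3", "c4"], 2, 3))
# {'c0': (0, 0), 'c1': (0, 1), 'c2': (1, 0), 'c3': (1, 1), 'c4': (1, 2)}
```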

Fig. 4. Particle structure [60]. (Figure: core numbers in boxes, indexed by router numbers 0 to 7.)

P.K. Sahu, S. Chattopadhyay / Journal of Systems Architecture 59 (2013) 60–76 65

In [45,46], CGMAP, a Genetic Algorithm based application mapping technique, has been proposed that uses a chaotic mapping operator instead of the random processes in GA. Here, chaotic sequences are combined with the genetic algorithm to obtain an optimal mapping solution. The same authors presented a different one-dimensional chaotic mapping technique for NoC in [47], combining different chaotic operators with GA to arrive at a better solution. GBMAP, an evolutionary approach for mapping cores onto NoC architecture that reduces the energy consumption and total bandwidth requirement of the NoC, has been proposed in [48]. GAMR [49], a genetic algorithm based mapping and routing approach, performs a two-phase mapping of IP cores onto the NoC architecture and generates a deterministic, deadlock-free minimal routing path for each communication, to minimize the total communication energy and maximize the link bandwidth utilization of the NoC architecture. In the first phase, GAMR maps the IP cores onto different resource nodes of a mesh based NoC architecture. In the second phase, it generates a deterministic, deadlock-free minimal routing path for each communication trace without changing the placement of cores produced in the first phase. In [50], the authors have proposed the Architecture-Aware Analytic Mapping algorithm (A3MAP) for NoCs with homogeneous and heterogeneous cores on regular and irregular mesh or custom architectures. The task mapping problem is solved by two effective heuristics: a successive relaxation algorithm as a fast algorithm, and a genetic algorithm to find better mapping solutions.

In [51], a genetic algorithm based mapping technique has been proposed for customized NoC architectures to reduce the communication energy. The same authors proposed a GA based congestion-aware mapping technique for irregular customized NoC architectures to reduce the communication energy in [52]. A Multi-objective Adaptive Immune Algorithm (MAIA), based on an evolutionary approach, has been proposed in [53]; it maps the application tasks onto the NoC to reduce the power consumption and the overall network latency.
The adaptive immune algorithms integrate a wide set of features that improve local search while preventing premature convergence by preserving the diversity of solutions in the population. The same authors have proposed an improved version of MAIA in [54] to solve the multi-application NoC mapping problem. It produces a set of mapping alternatives by exploring the mapping space.

The main drawback of such genetic approaches is their slow rate of convergence: the GA often has to evolve a large number of generations before it converges, and the best solution at the end is taken as the solution of the mapping problem. To accelerate convergence, the mutation rate can be increased. However, the GA then mostly converges to local best solutions rather than finding the global best.

4.2.2.1.2. PSO and ACO based transformative heuristics. Particle Swarm Optimization (PSO) [55] is a population based stochastic technique developed by Eberhart and Kennedy in 1995, inspired by the social behaviour of bird flocking or fish schooling. In a PSO system, multiple candidate solutions coexist and collaborate simultaneously. Each solution, called a particle, flies (evolves) in the problem space according to its own experience as well as the experience of neighbouring particles. Each particle has a fitness value, and the quality of a particle is evaluated by its fitness. PSO has been successfully applied in many problem areas.

PLBMR, a PSO based two-phase application mapping algorithm that minimizes the NoC communication energy and allocates routing paths to balance the link load, has been proposed in [56]. In the first phase, PSO maps the IP cores onto the NoC to minimize the energy consumption; in the second phase, routing paths are allocated to every communicating pair to balance the link load. The particle structure and initial particle generation are the same as the chromosome structure of the GA based technique described in [39]. In [57], the authors have proposed a Particle Swarm Optimization (PSO) based application mapping technique for NoC. However, the merit of the scheme is not clear, as no comparison has been made with existing approaches. A mapping technique based on discrete PSO has been presented in [58]; however, it only considers improvement over a genetic algorithm based method and reports relative improvements only. In [59], a hybrid multi-objective algorithm has been proposed, where Dijkstra's shortest path algorithm is used to find the shortest paths among communicating cores that satisfy the bandwidth constraints, and a multi-objective Pareto based PSO technique is then applied to improve performance.

In [60], PSMAP, a meta-heuristic strategy using Particle Swarm Optimization (PSO), has been proposed to reduce both the static and dynamic cost of 2-D mesh based NoC application mapping. A particle corresponds to a possible mapping of cores to the routers. An example particle structure is shown in Fig. 4. The numbers shown within circles in the boxes are the core numbers in the core graph, and the numbers outside the boxes are the router numbers of the topology graph. The routers are assumed to be numbered in increasing order from the top-left to the bottom-right position. The figure shows that core 1 is attached to router 0, core 4 is attached to router 1, and so on. If the number of nodes (routers) in the topology graph is greater than the number of cores in the core graph, dummy nodes are added to the core graph to make the two numbers equal. Dummy nodes are connected to all core nodes and to each other; the edges connecting a core node to a dummy node, and the edges between dummy nodes, are assigned zero cost. Let N be the number of cores in the core graph after adding dummy nodes, if required. For these N cores, there are N node positions in the topology graph. A particle is a permutation of the numbers 1 to N, which gives the placement of the cores onto the node positions of the topology graph. The overall communication cost depends on the positions of the cores in a particle. In our formulation, the overall communication cost forms the fitness function: the fitness of a particle $p_i$ is the overall communication cost after the cores of the core graph are placed on the routers as specified by the particle.
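The fitness evaluation described for PSMAP can be sketched as follows. The sketch assumes routers numbered row by row from the top left (as stated above), a cost of bandwidth × Manhattan hop count for each core-graph edge, and dummy cores that simply have no entries in the traffic table, since their zero-cost edges add nothing to the sum. The core IDs and bandwidths are hypothetical.

```python
def particle_fitness(particle, comm, cols):
    """Overall communication cost of a particle.

    particle[router] = core (real or dummy); comm = {(src, dst): bandwidth}.
    Dummy cores never appear in comm, so their zero-cost edges are skipped.
    """
    router_of = {core: r for r, core in enumerate(particle)}
    total = 0
    for (s, d), bw in comm.items():
        rs, rd = router_of[s], router_of[d]
        total += bw * (abs(rs // cols - rd // cols) + abs(rs % cols - rd % cols))
    return total


# 2 x 2 mesh with real cores 1-3 and dummy core 4 (4 routers > 3 cores).
comm = {(1, 2): 10, (2, 3): 4}
print(particle_fitness([1, 2, 4, 3], comm, cols=2))  # 14
print(particle_fitness([1, 3, 4, 2], comm, cols=2))  # 24
```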

In the evolution process, every particle i has a corresponding local best $pbest_i$, which is the permutation of core positions that gives the minimum communication cost among all permutations the particle has seen so far. The local best partially guides the evolution of the particle. In a particular generation k, the particle with the minimum communication cost is the global best ($gbest_k$) for that generation; this parameter also guides the evolution of the particles. The particles evolve through generations to create new particles, which are expected to give results closer to the optimum. In the first generation, the initial population is created randomly and the fitness of each particle is evaluated. The local best ($pbest_i$) of each particle is initialized to the particle itself. The global best ($gbest_k$) of a generation is the particle giving the least communication cost (smallest fitness value) in that generation. Further generations are evolved through a series of operations called swap operations [61]. The local best of each particle and the global best of a generation are updated if the corresponding values in the current generation are lower than those in the previous generation.

For a particle p, the router associated with a core is identified by the position index of the core in p. The position index takes values between 0 and N − 1 (N being the number of routers), and the index corresponds to the router number, as shown in Fig. 1.


Fig. 5. Application mapping onto NoC [67,68].


Let the swap operator $SO_{j,k}$ (where $j, k = 0, 1, \ldots, N-1$) swap the jth and kth positions of the particle p to create a new particle $p_{new}$. For example, consider the particle p = {1, 4, 3, 6, 2, 8, 5, 7}, where the numbers represent the core numbers of the core graph and the positions represent the router numbers in the topology graph. The swap operator $SO_{4,6}$ swaps the cores at positions 4 and 6, which creates a new particle $p_{new}$ = {1, 4, 3, 6, 5, 8, 2, 7}.

A swap sequence SS is made up of one or more swap operators, which are applied in order to the particle p to create a new particle $p_{new}$. For example, applying the swap sequence SS = {$SO_{4,6}$, $SO_{2,5}$} to the particle p = {1, 4, 3, 6, 2, 8, 5, 7} creates the new particle $p_{new}$ = {1, 4, 8, 6, 5, 3, 2, 7}.
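The swap operator and the swap sequence reduce to a few lines of code. The sketch below uses 0-based positions, as in the text, and reproduces both worked examples.

```python
def apply_swap(particle, j, k):
    """Swap operator SO_{j,k}: exchange the cores at positions j and k (0-based)."""
    p = list(particle)
    p[j], p[k] = p[k], p[j]
    return p


def apply_swap_sequence(particle, seq):
    """Apply the swap operators of a swap sequence, in order."""
    for j, k in seq:
        particle = apply_swap(particle, j, k)
    return particle


p = [1, 4, 3, 6, 2, 8, 5, 7]
print(apply_swap(p, 4, 6))                       # [1, 4, 3, 6, 5, 8, 2, 7]
print(apply_swap_sequence(p, [(4, 6), (2, 5)]))  # [1, 4, 8, 6, 5, 3, 2, 7]
```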

To align a particle $p_i$ with its local best, the corresponding swap sequence is identified; let this be $SS_i^{lbest}$. Then another swap sequence is identified to align the particle with the global best; let this be $SS_i^{gbest}$. The swap sequence $SS_i^{lbest}$ is applied to particle $p_i$ with a probability of $\alpha$ [62]; let the modified particle be $p_i^{lbest}$. Then the swap sequence $SS_i^{gbest}$ is applied to $p_i^{lbest}$ with a probability of $\beta$ [62]. This creates a new particle $p_i^{new}$. Its fitness is evaluated, and the local best of particle i is updated if the new particle is better than the previous local best. If the best fitness in a generation is better than the global best of the previous generation, the global best is also updated.
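One way to identify the aligning swap sequences and apply them is sketched below. Two details are assumptions not stated in the text: the sequence is built greedily by fixing one position at a time, and, following the usual swap-based PSO formulation, each swap operator of a sequence is kept independently with probability α (or β) rather than the whole sequence being accepted or rejected at once. Both sequences are derived from the original particle, as described above.

```python
import random


def swap_sequence_to(particle, target):
    """Swap operators (j, k) that transform `particle` into `target`."""
    p = list(particle)
    seq = []
    for i in range(len(p)):
        if p[i] != target[i]:
            k = p.index(target[i])  # current position of the core wanted at i
            seq.append((i, k))
            p[i], p[k] = p[k], p[i]
    return seq


def pso_step(particle, pbest, gbest, alpha, beta, rng):
    """SS_lbest kept per operator with prob. alpha, then SS_gbest with prob. beta."""
    ss_l = swap_sequence_to(particle, pbest)
    ss_g = swap_sequence_to(particle, gbest)
    p = list(particle)
    for seq, prob in ((ss_l, alpha), (ss_g, beta)):
        for j, k in seq:
            if rng.random() < prob:
                p[j], p[k] = p[k], p[j]
    return p


p = [1, 4, 3, 6, 2, 8, 5, 7]
print(swap_sequence_to(p, [1, 4, 8, 6, 5, 3, 2, 7]))  # [(2, 5), (4, 6)]
print(pso_step(p, sorted(p), [8, 7, 6, 5, 4, 3, 2, 1], 0.5, 0.5, random.Random(1)))
```

Whatever swaps are kept, the result is always a valid permutation, so every new particle is still a legal mapping.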

The Ant Colony Optimization (ACO) technique [63] is a population based probabilistic technique developed by A. Colorni and M. Dorigo in 1991, inspired by the behaviour of ants finding paths from their colony to a food source. Ants deposit pheromone along the paths they travel, so when one ant finds a good path from the colony to a food source, other ants are more likely to follow that path, and this positive feedback eventually leads all the ants to follow a single path. ACO is a meta-heuristic optimization technique. An ACO based algorithm has been proposed in [64] for application task mapping onto NoC to minimize the bandwidth requirement, and its results have been compared with random mapping techniques.

Fig. 6. An example of binomial merging (N = 16) iterations [69]. (Panels: iterations 1 to 4.)

4.2.2.2. Constructive heuristics. In constructive heuristics, partial solutions are generated sequentially until the final mapping solution is obtained. A constructive heuristic may work without iterative improvement or with iterative improvement. Constructive heuristic search techniques are normally much faster than transformative heuristics.

4.2.2.2.1. Constructive heuristic without iterative improvement. A constructive heuristic without improvement maps the cores of a core graph onto the NoC topology graph one at a time, selecting the cores based on some predefined criteria. Once a core has been placed, its position does not change, and no optimization technique is applied to the initial solution to arrive at a better one.

PMAP, a two-phase mapping algorithm for placing clusters onto processors, has been presented in [65], where highly communicating clusters are placed on adjacent nodes of the processor network. Each cluster contains all tasks that are to be executed on the same processor with zero interconnection overhead, to increase parallelism. UMARS, a unified mapping, routing and slot allocation algorithm presented in [66], couples mapping, path allocation and time-slot allocation to minimize communication energy. This technique maps cores onto the NoC topology, routes the communication and allocates TDMA time-slots on network channels so that the application constraints are met. SMAP [67] is a simulation based environment that performs application mapping and task routing for 2-D mesh-based NoC to minimize execution time and communication energy. In this technique, the highest priority task is mapped at the centre, and the other tasks are mapped spirally [68] outward from the already mapped tasks towards the boundaries of the mesh based NoC, placing highly communicating cores as close as possible to each other (Fig. 5).
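A centre-out spiral ordering of the mesh tiles, of the kind used by SMAP, can be generated as below. As a simplification, the sketch assigns cores to the spiral order in decreasing order of total communication volume; this ordering rule and the traffic values are assumptions for illustration and do not reproduce SMAP's priority scheme or its routing step.

```python
from collections import defaultdict


def spiral_order(rows, cols):
    """Mesh tiles (row, col) visited in a spiral that starts at the centre."""
    r, c = (rows - 1) // 2, (cols - 1) // 2
    order = [(r, c)]
    dirs = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up
    step, d = 1, 0
    while len(order) < rows * cols:
        for _ in range(2):  # leg lengths 1, 1, 2, 2, 3, 3, ...
            dr, dc = dirs[d % 4]
            for _ in range(step):
                r, c = r + dr, c + dc
                if 0 <= r < rows and 0 <= c < cols:  # skip positions outside the mesh
                    order.append((r, c))
            d += 1
        step += 1
    return order


def spiral_placement(comm, rows, cols):
    """Place cores along the spiral, heaviest total traffic first (simplified rule)."""
    volume = defaultdict(int)
    for (s, d), bw in comm.items():
        volume[s] += bw
        volume[d] += bw
    cores = sorted(volume, key=lambda core: -volume[core])
    return dict(zip(cores, spiral_order(rows, cols)))


print(spiral_order(3, 3))
# [(1, 1), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0), (0, 0), (0, 1), (0, 2)]
```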

An efficient binomial IP mapping and optimization algorithm (BMAP) has been presented in [69] to reduce the hardware cost of the on-chip network. It is a fast and efficient algorithm with lower computational complexity than NMAP [75] (discussed in Section 4.2.2.2.2). Binomial mapping comprises three steps: IP ranking, merging IP sets, and refreshing IP sets. IP ranking depends on the communication bandwidth of each IP, which is the sum of the bandwidth from it to other IPs and from other IPs to it. Based on the IP ranking, the most heavily communicating IP sets are merged two-by-two in every iteration, as shown in Fig. 6. The requirements of the merged IP sets are then recalculated by treating each IP set as an individual IP, which refreshes the IP set.
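The IP ranking and one merging and refreshing iteration can be sketched as below. The text does not fix how merging partners are chosen, so the sketch assumes a greedy rule: IP sets are visited in decreasing order of their total bandwidth, and each is merged with the remaining set it exchanges the most bandwidth with. The refreshed requirements are simply recomputed from the original traffic table, and the traffic values are hypothetical.

```python
from collections import defaultdict


def ip_bandwidth(comm):
    """Communication bandwidth of each IP: traffic from it plus traffic to it."""
    total = defaultdict(int)
    for (s, d), bw in comm.items():
        total[s] += bw
        total[d] += bw
    return dict(total)


def set_bandwidth(a, b, comm):
    """Bandwidth exchanged between IP sets a and b (both directions)."""
    return sum(bw for (s, d), bw in comm.items()
               if (s in a and d in b) or (s in b and d in a))


def merge_iteration(sets, comm):
    """Merge IP sets two-by-two, treating every set as a single IP (refresh)."""
    def touching(ip_set):
        return sum(bw for (s, d), bw in comm.items() if s in ip_set or d in ip_set)

    remaining = sorted(sets, key=touching, reverse=True)  # rank the sets
    merged = []
    while len(remaining) > 1:
        a = remaining.pop(0)
        b = max(remaining, key=lambda other: set_bandwidth(a, other, comm))
        remaining.remove(b)
        merged.append(a | b)
    return merged + remaining  # an odd set out is carried over unchanged


comm = {(0, 1): 10, (2, 3): 8, (1, 2): 1}
level1 = merge_iteration([frozenset({i}) for i in range(4)], comm)
print(level1)                         # [frozenset({0, 1}), frozenset({2, 3})]
print(merge_iteration(level1, comm))  # [frozenset({0, 1, 2, 3})]
```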

CHMAP [70] is a chain-mapping algorithm for application mapping onto mesh-based NoC that produces chains of connected cores. CMAP [71] is a fast constructive application mapping algorithm that maps tasks onto NoC, minimizing the total communication cost and energy. It is a hybrid of two constructive mapping algorithms, link-based mapping (LBMAP) and sort-based mapping (SBMAP); the results of the two are compared and the better one is taken as the output. RMAP, a reliability-aware application mapping technique for mesh-based NoC, has been proposed in [72]. It divides the application graph into two sub-graphs so as to minimize the communication traffic between the sub-graphs and maximize the traffic within each sub-graph. One sub-graph is then mapped onto the upper triangular nodes of the NoC and the other onto the lower triangular nodes. This technique exploits the non-uniform distribution of traffic over the network channels to route the packets of redundant communications efficiently. In [73], all the nodes of a 2-D mesh-based NoC and the interconnections among them are abstracted as a tree. In this tree model, the vertex with the highest communication volume is selected as the root node, the vertices communicating with the root are its children, and so on. During mapping, the root node is placed at the centre of the mesh-based NoC, and the traversal proceeds from the centre towards the borders of the NoC. The child nodes are placed from the centre towards the borders according to the tree structure and the communication volume of their interconnections.
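The tree abstraction of [73] can be built with a breadth-first traversal, as sketched below. Treating traffic as undirected, assuming a connected core graph, and visiting the neighbours of each vertex in decreasing order of volume are assumptions made for this illustration; the placement of the tree onto the mesh is not shown.

```python
from collections import defaultdict, deque


def communication_tree(comm):
    """Root = core with the largest total volume; children are found breadth-first."""
    adj = defaultdict(dict)
    for (s, d), bw in comm.items():  # undirected view of the core graph
        adj[s][d] = adj[s].get(d, 0) + bw
        adj[d][s] = adj[d].get(s, 0) + bw
    root = max(adj, key=lambda v: sum(adj[v].values()))
    children = {root: []}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        # Visit heavier neighbours first; each vertex gets exactly one parent.
        for u in sorted(adj[v], key=lambda n: -adj[v][n]):
            if u not in children:
                children[u] = []
                children[v].append(u)
                queue.append(u)
    return root, children


root, children = communication_tree({(0, 1): 5, (0, 2): 3, (2, 3): 7, (1, 3): 1})
print(root, children)  # 2 {2: [3, 0], 3: [1], 0: [], 1: []}
```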

CastNet, an energy-aware application mapping and routing technique for 2-D NoC, has been proposed in [74]. Before mapping, a priority list of the tasks is formed based on their total communication bandwidth and average communication bandwidth. The initial task is selected according to the priority list; if there is a tie, the task is selected randomly. The next task selected for mapping is the one that communicates most with the mapped tasks; if there is again a tie, the task with the higher priority is selected. For mapping the first task, a set of initial node positions is selected (Fig. 7), and the technique generates a set of solutions, one for each initial node position of the initial task. The remaining tasks are placed on the nodes of the NoC according to the priority list, which is updated after each mapping. Finally, the best solution in the set is taken as the mapping of the application onto the NoC.

4.2.2.2.2. Constructive heuristic with iterative improvement. In this case, the cores of the core graph are mapped onto the NoC topology graph one at a time, based on some predefined criteria, to generate an initial solution. Iterative improvement is then applied to this initial solution to find better candidate solutions.

In [75], NMAP, a mapping technique with minimum path routing in the mesh architecture, has been proposed that satisfies the bandwidth constraints and minimizes the average communication delay. The heuristic has three phases. In the initialization phase, the core with the maximum communication demand is mapped to a node with the maximum number of neighbours. Then the core with the highest communication demand with the already mapped cores is selected for mapping, and it is mapped to the node that minimizes the communication cost, that is, (hop-count × bandwidth), with the mapped cores. This node is found by examining every available node in the mesh. The procedure continues until all the cores are mapped. In the next phase, Dijkstra's shortest path algorithm is applied to the quadrant graph to compute minimum paths that satisfy the bandwidth constraints. In the last phase, the initial solution is improved iteratively by invoking the second phase for each pair-wise swap of mapped cores. NMAP also proposes traffic splitting, which considers the mapping problem together with the possibility of splitting traffic among various paths. For various benchmark applications, NMAP produces better results than the mapping algorithms reported before it. A tool, SUNMAP, has been presented in [76] that automatically selects the best standard topology for a given application and produces a mapping of cores onto that topology. It minimizes the average communication delay, area and power dissipation subject to bandwidth and area constraints. MOCA, a two-phase heuristic for low energy mesh based on-chip interconnection architectures, has been proposed in [77] to reduce the communication energy under bandwidth and latency constraints. In the first phase, the cores are mapped to different routers of the mesh using a bi-partitioning based slicing tree generation technique. In the second phase, it attempts to find a minimal path from source to destination for each traffic trace. It does not give good solutions when latency constraints are considered.

Fig. 7. Candidate for initial core selection [74]. (a) Selected region for initial candidate core. (b) Initial candidate core (4 × 4 mesh). (c) Initial candidate core (5 × 5 mesh).

All the mapping techniques discussed so far use a communication weighted model (CWM) to account for the overall communication volume of each channel; this model does not consider communication timing. To capture both the timing of application communication and the communication volume, a communication dependence and computation model (CDCM) has been proposed in [78,79], which maps applications onto a regular NoC under bandwidth constraints and minimizes the average communication delay. The same authors have compared different algorithms for obtaining low energy mappings onto NoCs using a CWM in [80]. They have also proposed two heuristics, largest communication first (LCF) and greedy incremental (GI), for low energy mapping using CWM. The CWM counts dynamic energy only when there is a bit transition; however, traffic without bit transitions also consumes dynamic energy. Therefore, to overcome this problem of CWM, the same authors have proposed an extended communication weighted model (ECWM) [81], which captures both the volume of communication and the bit transition rate in each communication channel. A Simulated Annealing (SA) based application mapping technique for 2-D mesh based NoC that minimizes the area requirement and the maximum bandwidth has been proposed in [82]. It also proposes an efficient routing algorithm that selects a route among alternative paths based on the network state and the occupancy of the queues. A cluster based technique combined with simulated annealing has been proposed in [83] for application mapping onto 2-D mesh-based NoC. In this technique, mapping is done cluster-wise instead of node-wise to reduce the mapping complexity. Clustering partitions the nodes into groups according to their physical distance in the network topology, and it exploits knowledge of the network architecture and of the communication demand of the applications. In this mapping technique, a cluster-based initial mapping of cores to nodes is therefore performed first, and simulated annealing is then applied to it to find a good mapping solution.

In [84–86], the authors have analyzed different approaches to minimizing the total communication energy by inserting some permissible longer links and bypassing some routers of an application-specific NoC. In this process, network partitioning reduces the area cost by reducing both the router area and the number of links. In [86], the authors have proposed an efficient methodology for choosing the most power-efficient application-specific NoC architecture. They compared different topologies using only one application benchmark and reported the best one, but that topology may not be good for other applications. Topology design is one of the significant factors affecting the net delay and energy consumption of an application-specific NoC, and the topology must satisfy the design constraints. For mapping streaming applications with very high I/O rates, a guaranteed, high-throughput pipelined mechanism for NoC has been introduced in [87]. The authors have proposed a pipeline-based, high-throughput, low-energy mapping algorithm that performs task allocation, pipelined task scheduling and communication scheduling simultaneously on a heterogeneous NoC and minimizes the energy consumption.
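The initialization phase of NMAP [75], described earlier in this section, can be sketched as follows. Only the greedy constructive step is shown; the shortest-path routing on the quadrant graph, the bandwidth checks and the pair-wise swap improvement are omitted. Breaking ties by the lowest router number is an assumption of this sketch, and the traffic values are hypothetical.

```python
from collections import defaultdict


def nmap_initial(comm, n_cores, rows, cols):
    """Greedy initial placement in the spirit of NMAP's first phase.

    comm = {(src, dst): bandwidth}; returns {core: router}, routers numbered row by row.
    """
    vol = defaultdict(lambda: defaultdict(int))
    for (s, d), bw in comm.items():
        vol[s][d] += bw
        vol[d][s] += bw

    def hops(a, b):
        return abs(a // cols - b // cols) + abs(a % cols - b % cols)

    def degree(r):  # number of mesh neighbours of router r
        i, j = divmod(r, cols)
        return (i > 0) + (i < rows - 1) + (j > 0) + (j < cols - 1)

    free = set(range(rows * cols))
    # 1) Core with the largest demand goes to a router with the most neighbours.
    first = max(range(n_cores), key=lambda c: sum(vol[c].values()))
    place = {first: max(sorted(free), key=degree)}
    free.remove(place[first])
    while len(place) < n_cores:
        # 2) Unmapped core with the most traffic to already mapped cores ...
        core = max((c for c in range(n_cores) if c not in place),
                   key=lambda c: sum(vol[c][m] for m in place))
        # ... goes to the free router minimising sum(hops x bandwidth) to them.
        best = min(sorted(free),
                   key=lambda r: sum(vol[core][m] * hops(r, place[m]) for m in place))
        place[core] = best
        free.remove(best)
    return place


print(nmap_initial({(0, 1): 10, (0, 2): 8, (0, 3): 6, (1, 2): 1}, 4, 3, 3))
# {0: 4, 1: 1, 2: 3, 3: 5}
```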

Onyx, a new bandwidth-constrained application mapping technique, has been presented in [88] to minimize the overall communication cost of NoC. In this technique, the core with the highest communication bandwidth is mapped at the centre. The other, unmapped cores are then ranked according to their communication volume with the mapped cores. Each unmapped core is placed as close as possible to its related core by searching along the lozenge-shaped path at one-hop distance, then two-hop distance, and so on, until an empty tile is found (Fig. 8). In [89], Crinkle, a mapping algorithm that reduces the overall communication cost, has been presented. In this technique, priority lists are prepared based on the interconnection degree of the nodes and the communication bandwidth before mapping onto a mesh based NoC. Following the priority lists, the heuristic maps the tasks starting from one corner of the 2-D mesh platform and ending at another corner in a zigzag manner (Fig. 9).

Fig. 8. Concept of Lozenge-shape path selection [88].

Fig. 9. Zigzag path for core mapping [89].

In [90,91], a power-aware template-based efficient mapping (TEM) algorithm for NoC has been proposed to generate good mapping solutions with low run time under bandwidth and latency constraints. In this technique, a core with the highest connectivity is called a hot core and is mapped onto the tile with the maximum number of neighbouring tiles. An application core graph has at least one hot core. Once all the hot cores are mapped, the remaining unmapped cores are mapped in decreasing order of the weights of the edges connecting them, at minimum hop distance from an already mapped hot core. An Architecture-Aware Analytic Mapping algorithm (A3MAP) for NoCs with homogeneous and heterogeneous cores on regular and irregular mesh or custom architectures has been proposed in [50]. The task mapping problem is solved by a successive relaxation algorithm, and a genetic algorithm is applied on top of it to find better mapping solutions. Citrine, a two-step 2-D mesh mapping algorithm proposed in [92], uses the mapping technique Onyx [88] to retrieve the order of the cores, after which a Branch-and-Bound search explores different permutations using the lozenge-shaped rule of Onyx [88]. In [93], the authors have proposed a two-step multi-application mapping algorithm that maps multiple applications simultaneously onto different regions of the NoC to minimize network latency and energy consumption for a set of applications. The algorithm consists of an application mapping phase followed by a task mapping phase. The application mapping phase optimizes the layout of the multiple applications on the NoC and finds a region with minimal Nodes Average Distance (NAD) for each application. After this phase, the task mapping phase maps the tasks of each application so that the individual as well as the overall average communication distance is minimized. The task mapping of each application follows the tree-model based mapping described in [73].

Fig. 10. Partitions at all levels for core graph VOPD [94]. (Figure: Level-0 initial graph of cores C1 to C16, and Level-1 to Level-3 partitions with partition IDs 0 to 7.)

Fig. 11. Initial mapping for VOPD application [94]. (Figure: Level-1 to Level-3 partitions placed on mesh nodes U1 to U16; final core pairs {C12, C9}, {C15, C11}, {C8, C10}, {C13, C14}, {C7, C6}, {C2, C1}, {C16, C5}, {C4, C3}.)
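The lozenge-shaped search used by Onyx [88], and reused by Citrine [92], looks for the nearest empty tile at one hop, then at two hops, and so on. A minimal sketch, assuming (row, column) tile coordinates and an arbitrary but fixed visiting order within each ring:

```python
def lozenge_ring(center, d, rows, cols):
    """Tiles at Manhattan distance exactly d from `center` (a lozenge), clipped to the mesh."""
    r0, c0 = center
    ring = []
    for dr in range(-d, d + 1):
        rem = d - abs(dr)
        for dc in ((-rem, rem) if rem else (0,)):
            r, c = r0 + dr, c0 + dc
            if 0 <= r < rows and 0 <= c < cols:
                ring.append((r, c))
    return ring


def nearest_free_tile(center, occupied, rows, cols):
    """Search one-hop, two-hop, ... lozenges until an empty tile is found."""
    for d in range(1, rows + cols - 1):  # largest distance in the mesh is rows + cols - 2
        for tile in lozenge_ring(center, d, rows, cols):
            if tile not in occupied:
                return tile
    return None  # mesh is full


occupied = {(1, 1), (0, 1), (1, 0), (1, 2), (2, 1)}
print(nearest_free_tile((1, 1), occupied, 3, 3))  # (0, 0)
```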

In [94], LMAP, a mapping algorithm that reduces both the static and dynamic cost of mesh based NoC, has been proposed. It is a three-phase mapping algorithm. The first phase is partitioning, in which the Kernighan–Lin (K–L) partitioning scheme [95] is used to identify the closeness of cores by analyzing their bandwidth or communication requirements. This bi-partitioning is applied recursively until only the two closest cores are left in each final partition, as shown in Fig. 10 for the VOPD (Fig. 12b) application. The second phase is initial mapping, in which the cores of an application are placed onto a mesh connected NoC based on the final K–L partitioning result (Fig. 11). After initial mapping, an iterative improvement phase swaps and flips cores within partitions to arrive at the final mapping solution.
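The recursive Kernighan–Lin bi-partitioning used in LMAP's first phase can be sketched as below. This is the textbook K–L pass structure applied to an undirected view of the core graph (the bandwidths of both directions are added); the example graph is hypothetical, and the swapping and flipping improvement phase of [94] is not shown.

```python
def kernighan_lin(nodes, w, max_passes=10):
    """Split `nodes` into two halves, reducing the weight of the cut edges.

    w = {(u, v): weight}; both directions are summed (undirected view).
    """
    def c(u, v):
        return w.get((u, v), 0) + w.get((v, u), 0)

    nodes = list(nodes)
    A, B = set(nodes[: len(nodes) // 2]), set(nodes[len(nodes) // 2:])
    for _ in range(max_passes):
        D = {}  # D(v) = external cost - internal cost
        for v in nodes:
            own, other = (A, B) if v in A else (B, A)
            D[v] = sum(c(v, u) for u in other) - sum(c(v, u) for u in own if u != v)
        free_a, free_b = set(A), set(B)
        swaps, gains = [], []
        while free_a and free_b:
            a, b = max(((x, y) for x in free_a for y in free_b),
                       key=lambda pair: D[pair[0]] + D[pair[1]] - 2 * c(*pair))
            gains.append(D[a] + D[b] - 2 * c(a, b))
            swaps.append((a, b))
            free_a.remove(a)
            free_b.remove(b)
            for x in free_a:  # update D as if a and b had been exchanged
                D[x] += 2 * c(x, a) - 2 * c(x, b)
            for y in free_b:
                D[y] += 2 * c(y, b) - 2 * c(y, a)
        # Keep the prefix of tentative swaps with the largest positive total gain.
        best_k, best_gain, running = 0, 0, 0
        for i, g in enumerate(gains):
            running += g
            if running > best_gain:
                best_k, best_gain = i + 1, running
        if best_k == 0:
            break
        for a, b in swaps[:best_k]:
            A.remove(a)
            B.add(a)
            B.remove(b)
            A.add(b)
    return A, B


def recursive_bisect(nodes, w):
    """Bi-partition recursively until each partition holds at most two cores."""
    nodes = sorted(nodes)
    if len(nodes) <= 2:
        return [nodes]
    A, B = kernighan_lin(nodes, w)
    return recursive_bisect(A, w) + recursive_bisect(B, w)


w = {(0, 2): 10, (1, 3): 10, (0, 1): 1, (2, 3): 1}
print(recursive_bisect(range(4), w))
```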

All the NoC application mapping techniques discussed above are based on mesh-based network architectures. However, it is also essential to check the suitability of other network topologies when applications are mapped onto them. In [96], an energy-aware mapping technique has been proposed which maps the IPs onto a tree-based NoC architecture such that the total communication energy is minimized. In this technique, the energy-aware mapping problem is first formulated and then solved using a recursive bi-partitioning algorithm. An application mapping heuristic has been proposed in [97] for generating an optimal tree-based topology for multimedia applications that minimizes energy consumption while meeting the design constraints. Application mapping techniques have been proposed in [98] and [99] to map applications onto Butterfly-Fat-Tree (BFT) and Mesh-of-Tree (MoT) based NoCs, respectively. These techniques use the Kernighan–Lin (K–L) partitioning scheme, as in [94], to identify the closeness of cores by analyzing their bandwidth or communication requirements.

5. Performance comparison

An application can be represented in the form of a core graph [75], defined as follows.

Definition 1. The core graph for an application is a directed graph $G(C, E)$, with each vertex $c_i \in C$ representing a core and the directed edge $e_{i,j} \in E$ representing the communication between the cores $c_i$ and $c_j$. The weight of edge $e_{i,j}$, denoted by $\mathit{comm}_{i,j}$, represents the bandwidth requirement of the communication from $c_i$ to $c_j$.

On the other hand, the given NoC topology can be represented in the form of a topology graph [75].

Definition 2. The NoC topology graph is a directed graph $P(U, F)$, with each vertex $u_i \in U$ representing a node in the topology and the directed edge $f_{i,j} \in F$ representing a direct communication between the vertices $u_i$ and $u_j$. The weight of the edge $f_{i,j}$, denoted as $\mathit{bw}_{i,j}$, represents the bandwidth available across the edge $f_{i,j}$.

A mapping of the core graph $G(C, E)$ onto the topology graph $P(U, F)$ is defined by the function $\mathrm{map}: C \rightarrow U$, such that $\forall c_i \in C$, $\exists u_j \in U$ with $\mathrm{map}(c_i) = u_j$.

The function associates core $c_i$ with router $u_j$. Naturally, a mapping is defined only when $|C| \le |U|$. The quality of such a mapping is defined in terms of the total communication cost of the application under this mapping [75]. The communication between each pair of cores can be treated as the flow of a single commodity $d_k$, $k = 1, 2, \ldots, |E|$. The value of commodity $d_k$, corresponding to the communication between cores $c_i$ and $c_j$, is equal to $\mathit{comm}_{i,j}$, the bandwidth requirement. If $c_i$ is mapped to the router $\mathrm{map}(c_i)$ and $c_j$ is mapped to $\mathrm{map}(c_j)$, the set of all commodities $D = \{d_k\}$ is defined as follows:

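$$D = \{\, d_k \mid \mathrm{value}(d_k) = \mathit{comm}_{i,j},\ \text{for } k = 1, 2, \ldots, |E| \text{ and } e_{i,j} \in E \,\}$$

Also,

$$\mathrm{source}(d_k) = \mathrm{map}(c_i) \quad \text{and} \quad \mathrm{sink}(d_k) = \mathrm{map}(c_j)$$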

The link between two individual routers $u_i$ and $u_j$ of the topology has a maximum bandwidth of $\mathit{bw}_{i,j}$. The total commodity flowing through such a link should not exceed this bandwidth. The quantity $x^k_{i,j}$, indicating the value of commodity $d_k$ flowing through the link $(u_i, u_j)$, is given by

$$x^k_{i,j} = \begin{cases} \mathrm{value}(d_k), & \text{if } \mathrm{link}(u_i, u_j) \in \mathrm{Path}(\mathrm{source}(d_k), \mathrm{sink}(d_k)) \\ 0, & \text{otherwise} \end{cases}$$

where $\mathrm{Path}(a, b)$ indicates the deterministic routing path between the mesh nodes $a$ and $b$ in the topology. The bandwidth limitations of individual links must be satisfied; that is, all mapping solutions should satisfy the following relation.

[Fig. 12. Application core graphs with communication bandwidth (MB/s): (a) DVOPD, (b) VOPD, (c) MPEG-4, (d) PIP, (e) MWD, (f) 263enc mp3dec, (g) mp3enc mp3dec, (h) 263dec mp3dec.]


$$\sum_{k=1}^{|E|} x^k_{i,j} \le \mathit{bw}_{i,j}, \quad \text{for all } i, j \in \{1, 2, \ldots, |U|\}$$
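The link-load definition and bandwidth constraint above can be checked mechanically. The following is a minimal Python sketch, assuming XY (dimension-ordered) routing as the deterministic $\mathrm{Path}(a, b)$, mesh nodes identified by (x, y) coordinates, and a uniform capacity `bw` on every directed link; the function names are hypothetical.

```python
from collections import defaultdict


def xy_path(src, dst):
    """Directed links used by XY routing (X first, then Y) from src to dst."""
    (x, y), (tx, ty) = src, dst
    links = []
    while x != tx:
        nx = x + (1 if tx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != ty:
        ny = y + (1 if ty > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def link_loads(edges, placement):
    """For every used link (u_i, u_j), the sum over k of x^k_{i,j}."""
    load = defaultdict(float)
    for (ci, cj), comm in edges.items():
        # Commodity d_k adds value(d_k) = comm_{i,j} to each link on its path.
        for link in xy_path(placement[ci], placement[cj]):
            load[link] += comm
    return dict(load)


def satisfies_bandwidth(edges, placement, bw):
    """True if no link carries more than its capacity bw (assumed uniform)."""
    return all(v <= bw for v in link_loads(edges, placement).values())
```

With cores A, B and C placed at (0, 0), (2, 0) and (1, 0), the commodities A→B (300 MB/s) and A→C (200 MB/s) share the link (0, 0)→(1, 0), which therefore carries 500 MB/s: the mapping violates a 400 MB/s link capacity but satisfies a 500 MB/s one.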

If all bandwidth constraints are satisfied, the communication cost of a mapping solution is given by

$$T = \sum_{k=1}^{|E|} \mathrm{value}(d_k) \times \mathrm{hopcount}(\mathrm{source}(d_k), \mathrm{sink}(d_k))$$

Here, $\mathrm{hopcount}(a, b)$ is the number of hops between the topology nodes $a$ and $b$. For deterministic shortest-path routing, the hop count corresponds to the minimum number of hops between the two nodes. Since the communication cost depends strongly on the mapping solution, the overall mapping problem is to optimize the communication cost while ensuring that the bandwidth constraints of all individual links are satisfied. The communication cost affects both the performance of the overall system and its energy consumption, as both of these factors are directly proportional to the total hop count.
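The cost $T$ is straightforward to evaluate for a candidate mapping. A minimal Python sketch, assuming a 2-D mesh with minimal (e.g. XY) routing so that $\mathrm{hopcount}$ is the Manhattan distance, and assuming that no two cores share a router; the function names are hypothetical.

```python
def hopcount(a, b):
    """Minimal hop count between two mesh nodes given as (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def communication_cost(edges, placement):
    """T = sum over commodities d_k of value(d_k) * hopcount(source, sink).

    edges     -- {(c_i, c_j): comm_ij} in MB/s
    placement -- {core: (x, y)}, i.e. the function map
    """
    routers = list(placement.values())
    # Assumption: one core per router, consistent with |C| <= |U|.
    if len(set(routers)) != len(routers):
        raise ValueError("two cores are mapped onto the same router")
    return sum(
        comm * hopcount(placement[ci], placement[cj])
        for (ci, cj), comm in edges.items()
    )
```

For edges A→B (300), A→C (200) and B→C (100), placing A, B and C at (0, 0), (2, 0) and (1, 0) gives T = 600 + 200 + 100 = 900 hops × MB/s, whereas moving A to the middle, with C at (0, 0) and A at (1, 0), gives 300 + 200 + 200 = 700. This is the kind of difference a mapping algorithm searches for.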

The application mapping results are generally reported on a set of benchmarks. Fig. 12 shows a number of such benchmark applications. Most of the existing tools have reported results on three benchmarks: VOPD, MPEG-4 and PIP. Table 1 lists these absolute mapping results along with the cost normalized to NMAP and the average communication cost relative to NMAP. To compare the other benchmarks (DVOPD, MWD, 263enc mp3dec, mp3enc mp3dec, 263dec mp3dec), we have implemented the NMAP algorithm and ILP ourselves. As expected, the ILP-based exact methods achieve the best results. The evolutionary approaches also do quite well; in particular, PSMAP obtains the same results as ILP. The results of LMAP and PSMAP are available from our existing works [94,60]. Tables 2 and 3 list these results along with the cost normalized to NMAP. Here too, PSMAP produces results almost the same as ILP with much less CPU time, while the NMAP and LMAP results are very close. ILP could not be run on the 32-core DVOPD example, as its run time becomes unacceptably high.
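The normalized columns of Table 1 are each technique's absolute cost divided by the NMAP cost on the same benchmark, and the last column is their mean over the benchmarks for which the technique reports a result. A small Python check of this arithmetic, using the GMAP, PMAP and NMAP rows of Table 1 (the dictionary and function names are illustrative only):

```python
# Absolute communication costs (hops x BW) copied from Table 1.
NMAP = {"VOPD": 4265.0, "MPEG-4": 3672.0, "PIP": 640.0}
GMAP = {"VOPD": 5553.0, "MPEG-4": 7849.0, "PIP": 704.0}
PMAP = {"VOPD": 7054.0, "MPEG-4": 6128.0, "PIP": 832.0}


def normalized_cost(costs, reference=NMAP):
    """Cost on each benchmark divided by the NMAP cost on the same benchmark."""
    return {app: cost / reference[app] for app, cost in costs.items()}


def average_relative_cost(costs, reference=NMAP):
    """Mean of the normalized costs over the benchmarks that have a result."""
    ratios = normalized_cost(costs, reference)
    return sum(ratios.values()) / len(ratios)


for name, costs in (("GMAP", GMAP), ("PMAP", PMAP)):
    ratios = normalized_cost(costs)
    print(name, {app: round(r, 3) for app, r in ratios.items()},
          round(average_relative_cost(costs), 3))
```

The recomputed values agree with Table 1 to within 0.001 (averages 1.513 for GMAP and 1.541 for PMAP). Entries marked with footnotes in Table 1 instead average over Tables 1–3 or 1–4, so this simple check applies only to the unmarked rows.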

All the algorithms are run on an Intel Core i5 platform with 4 GB main memory and a 2.4 GHz clock frequency. The CPU times needed by each technique for the individual benchmarks are listed in Tables 2 and 3. The PSMAP algorithm has been run with at most 200 particles and is terminated after at most 100 generations without improvement.

Since the results reported in the literature are for applications with a small number of cores, we have used the TGFF tool [100] to generate a few task graphs with 64 and 128 cores. Different task graphs have been generated via TGFF by varying the bandwidth, the number of start nodes and the in-out degree of nodes. The bandwidths are varied from 10 to 1500 MB/s for some graphs and from 50 to 150 MB/s for others. The in-out degrees of nodes are varied from 1 to 8 to generate both low- and high-communication graphs. The number of start nodes is also varied to generate different graphs and to observe the effect of mapping solutions on them. The 64-core NoCs are implemented as 8 × 8 2-D meshes, and for the 128-core case the 2-D mesh is realized as 8 × 16. We have implemented the mapping algorithms NMAP [75], LMAP [94] and PSMAP [60] for these 64- and 128-core graphs; the resulting communication costs and the costs normalized to NMAP are listed in Table 4. PSMAP produces consistently good results, whereas the results produced by NMAP and LMAP are comparable.

Table 1. Absolute communication cost (hops × BW), cost normalized to NMAP, and average communication cost relative to NMAP.

| Mapping technique | VOPD: absolute cost | VOPD: normalized to NMAP | MPEG-4: absolute cost | MPEG-4: normalized to NMAP | PIP: absolute cost | PIP: normalized to NMAP | Average cost relative to NMAP |
|---|---|---|---|---|---|---|---|
| ILP based exact mapping techniques | | | | | | | |
| ILP [31] | 4119.0 | 0.966 | 3567.0 | 0.971 | – | – | 0.972 (a) |
| Cluster + ILP [32] | 4205.0 | 0.986 | 3567.0 | 0.971 | – | – | 0.979 |
| Deterministic search techniques | | | | | | | |
| GMAP [33–35] | 5553.0 | 1.302 | 7849.0 | 2.137 | 704.0 | 1.10 | 1.513 |
| PBB [33–35] | 4317.0 | 1.012 | 3763.0 | 1.025 | 640.0 | 1.0 | 1.012 |
| Elixir [37] | 4249.0 | 0.996 | 3640.0 | 0.991 | – | – | 0.994 |
| GA based transformative heuristic techniques | | | | | | | |
| CGMAP [45,46] | 4300.0 | 1.008 | 3600.0 | 0.980 | – | – | 0.994 |
| GBMAP [48] | 4217.0 | 0.989 | 3572.0 | 0.973 | – | – | 0.981 |
| GAMR [49] | – | – | 3772.0 | 1.027 | – | – | 1.027 |
| A3MAP-GA [50] | 4141.0 | 0.971 | – | – | – | – | 0.971 |
| PSO and ACO based transformative heuristic techniques | | | | | | | |
| PSMAP [60] | 4119.0 | 0.966 | 3567.0 | 0.971 | 640.0 | 1.0 | 0.940 (b) |
| ACO [64] | – | – | 3633.0 | 0.989 | – | – | 0.989 |
| Constructive heuristic without iterative improvement | | | | | | | |
| PMAP [65] | 7054.0 | 1.654 | 6128.0 | 1.669 | 832.0 | 1.30 | 1.541 |
| BMAP [69] | 4351.0 | 1.020 | 6280.0 | 1.710 | – | – | 1.365 |
| CHMAP [70] | 4249.0 | 0.996 | 3977.0 | 1.083 | – | – | 1.040 |
| CMAP [71] | 4281.0 | 1.004 | 3704.0 | 1.009 | – | – | 1.006 |
| CastNet [74] | 4135.0 | 0.969 | 3852.0 | 1.049 | – | – | 1.009 |
| Constructive heuristic with iterative improvement | | | | | | | |
| A3MAP-SR [50] | 4265.0 | 1.0 | – | – | – | – | 1.0 |
| NMAP [75] | 4265.0 | 1.0 | 3672.0 | 1.0 | 640.0 | 1.0 | 1.0 (b) |
| MOCA [77] | – | – | 5246.0 | 1.429 | – | – | 1.429 |
| SA [83] | 4231.0 | 0.992 | – | – | – | – | 0.992 |
| CSA [83] | 4169.0 | 0.977 | – | – | – | – | 0.977 |
| Onyx [88] | 4249.0 | 0.996 | 3612.0 | 0.984 | – | – | 0.990 |
| LMAP [94] | 4189.0 | 0.982 | 4006.0 | 1.091 | 640.0 | 1.0 | 0.983 (b) |

(a) Average communication cost relative to NMAP taking the values from Tables 1–3.
(b) Average communication cost relative to NMAP taking the values from Tables 1–4.

Table 2. Absolute communication cost (hops × BW), cost normalized to NMAP and CPU time for different applications with their corresponding algorithms.

| Mapping algorithm | DVOPD: comm. cost | DVOPD: CPU time (s) | DVOPD: cost normalized to NMAP | VOPD (a): comm. cost | VOPD (a): CPU time (s) | MPEG-4 (a): comm. cost | MPEG-4 (a): CPU time (s) | PIP (a): comm. cost | PIP (a): CPU time (s) |
|---|---|---|---|---|---|---|---|---|---|
| NMAP | 10253.0 | 0.380 | 1.0 | 4265.0 | 0.024 | 3672.0 | 0.016 | 640.0 | 0.010 |
| LMAP | 9974.0 | 1.660 | 0.973 | 4189.0 | 0.040 | 4006.0 | 0.040 | 640.0 | 0.010 |
| PSMAP | 9752.0 | 14.287 | 0.951 | 4119.0 | 0.260 | 3567.0 | 0.040 | 640.0 | 0.010 |
| ILP | – | – | – | 4119.0 | 4474.730 | 3567.0 | 21.530 | 640.0 | 1.280 |

(a) Cost normalized to NMAP shown in Table 1.

Table 3. Absolute communication cost (hops × BW), cost normalized to NMAP and CPU time for different applications with their corresponding algorithms.

| Mapping algorithm | MWD: comm. cost | MWD: CPU time (s) | MWD: normalized | 263enc mp3dec: comm. cost | 263enc mp3dec: CPU time (s) | 263enc mp3dec: normalized | mp3enc mp3dec: comm. cost | mp3enc mp3dec: CPU time (s) | mp3enc mp3dec: normalized | 263dec mp3dec: comm. cost | 263dec mp3dec: CPU time (s) | 263dec mp3dec: normalized |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NMAP | 1184.0 | 0.016 | 1.0 | 230.407 | 0.012 | 1.0 | 18.171 | 0.016 | 1.0 | 20.073 | 0.016 | 1.0 |
| LMAP | 1248.0 | 0.030 | 1.054 | 230.417 | 0.040 | 1.0 | 17.856 | 0.040 | 0.983 | 20.058 | 0.040 | 0.999 |
| PSMAP | 1120.0 | 0.020 | 0.946 | 230.407 | 0.268 | 1.0 | 17.021 | 0.320 | 0.937 | 19.823 | 0.260 | 0.987 |
| ILP | 1120.0 | 200.510 | 0.946 | 230.407 | 191.910 | 1.0 | 17.021 | 1432.430 | 0.937 | 19.823 | 4895.250 | 0.987 |

("normalized" = cost normalized to NMAP.)

Table 4. Absolute communication cost (hops × BW), cost normalized to NMAP and CPU time for different TGFF task graphs with their corresponding algorithms.

| Cores | TGFF task graph | NMAP: comm. cost | NMAP: CPU time (s) | LMAP: comm. cost | LMAP: CPU time (s) | LMAP: cost normalized to NMAP | PSMAP: comm. cost | PSMAP: CPU time (s) | PSMAP: cost normalized to NMAP |
|---|---|---|---|---|---|---|---|---|---|
| 64 | Graph 1 | 9207.49 | 8.43 | 7441.40 | 6.43 | 0.808 | 8380.79 | 13.87 | 0.910 |
| 64 | Graph 2 | 132,292.38 | 12.37 | 128,174.0 | 9.87 | 0.969 | 115,797.09 | 23.63 | 0.875 |
| 64 | Graph 3 | 116,337.81 | 13.41 | 121,835.0 | 9.90 | 1.047 | 110,077.83 | 52.14 | 0.946 |
| 64 | Graph 4 | 55,244.17 | 12.09 | 51,344.40 | 9.75 | 0.929 | 50,947.07 | 35.76 | 0.922 |
| 64 | Graph 5 | 6015.28 | 12.96 | 6381.99 | 9.95 | 1.061 | 5949.09 | 42.22 | 0.989 |
| 64 | Graph 6 | 44,902.16 | 12.93 | 44,005.10 | 9.74 | 0.980 | 42,086.60 | 36.56 | 0.937 |
| 128 | Graph 7 | 70,168.36 | 410.57 | 70,168.36 | 186.34 | 1.0 | 67,508.53 | 205.85 | 0.962 |
| 128 | Graph 8 | 503,767.47 | 548.63 | 477,572.87 | 320.56 | 0.948 | 453,078.22 | 567.74 | 0.899 |
| 128 | Graph 9 | 343,982.87 | 423.57 | 306,761.0 | 207.67 | 0.892 | 285,295.72 | 405.34 | 0.829 |
| 128 | Graph 10 | 82,744.31 | 451.50 | 80,746.20 | 137.83 | 0.976 | 73,940.90 | 584.36 | 0.894 |

6. Special mapping techniques

The NoC design flow involves several parameters. Some special mapping techniques, such as routing-aware mapping and integrated mapping and scheduling, address highly correlated design problems that must be handled carefully to optimize different performance metrics.

6.1. Routing-aware mapping

CART, a Communication-Aware Routing Technique, has been proposed in [101,102] to optimize the network performance of application-specific NoCs. It combines a topology-agnostic routing algorithm with a communication-aware mapping technique that considers the bandwidth constraints. In [103], the same authors have proposed a core mapping technique based on source routing to achieve a mapping under a path length constraint; the constraint is met by a heuristic search that satisfies the distance restrictions between sources and destinations. A multi-objective optimization strategy has been proposed in [104], which determines the Pareto-optimal NoC configurations that optimize the average network delay and routing robustness. In this technique, topological mapping and routing are considered concurrently. An application-specific routing algorithm (APSRA) has been proposed in [105,106] to maximize the communication performance of an application after it has been mapped onto the NoC. APSRA can be applied to both deterministic and adaptive routing schemes, can be used on any network topology, and supports both homogeneous and heterogeneous 2-D mesh-connected NoC systems. Taking the mapping of cores to routers as input, APSRA generates a set of routing tables that guarantees both reachability and deadlock-free communication among the cores while maximizing routing adaptability.

6.2. Integrated mapping and scheduling

The process of application mapping answers the question 'where', but answering 'when' requires scheduling. If multiple tasks of an application are mapped onto one core, task scheduling becomes necessary. Given an application task graph mapped onto a NoC architecture, scheduling is the time ordering of tasks and communications: it determines the order in which tasks, and the transactions between them, are executed such that deadlines are met and some parameters are optimized. In this light, an energy-aware communication and task scheduling technique has been proposed in [107,108], which maps tasks and statically schedules both communication transactions and computation tasks onto heterogeneous NoCs. It automatically assigns the tasks to different processing elements and schedules their execution under real-time constraints. Here the communication is non-streaming in nature; that is, tasks communicate with each other at most once. For streaming communication, in which tasks periodically and repeatedly communicate with each other, a time-constrained, resource-efficient routing and scheduling strategy for task mapping has been presented in [109]. It minimizes resource usage by exploiting all the scheduling freedom offered by the NoC. Quality of service (QoS) is an essential parameter for real-time and multimedia applications. To achieve it, a rate-based scheduling policy for NoC has been proposed in [110]. In this task mapping method, a data flow requiring QoS is admitted only if all the routers on the path from source to destination can transmit at the rate required by that flow. Each router then dynamically defines the priority of each QoS flow locally, depending on the required rate and the rate currently used by the flow. A non-preemptive, static, traffic-aware scheduling technique has been proposed in [111], which maps the application tasks onto the NoC while keeping track of the network traffic and then schedules the computation and communication of tasks. A power-aware online scheduling technique has been proposed in [112] to minimize communication energy consumption.
For online scheduling, the communication status of an application task graph is analyzed at run time to implement real-time scheduling. The general mapping idea is to place cores with higher mutual communication bandwidth closer together. However, such a mapping can cause run-time traffic congestion if it is not properly scheduled: without coordinated scheduling of both computation and communication, speculative mapping may not produce effective run-time behaviour. To handle this issue, a combined mapping and scheduling algorithm has been proposed in [113], which routes and schedules the transmissions during task mapping. In this technique, a routing-aware list-scheduling method schedules each task onto the best-fit processor so as to minimize the overall execution time. None of the above mapping and scheduling techniques considers the temperature effect during mapping, even though temperature affects the performance, power, and reliability of the system. A temperature-aware task mapping and scheduling technique has been proposed in [114], which maps tasks using a heuristic and uses a floorplanning tool to reduce the peak temperature.
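To make the idea of list scheduling with hop-dependent communication delay concrete, here is a generic earliest-finish-time list scheduler in Python. It is not the algorithm of [113]. It assumes that tasks arrive in a given topological (priority) order, that data transfer time equals volume × hops × a constant per-hop delay with no link contention, and that a processor runs one task at a time. The parameter and function names are hypothetical.

```python
def hops(a, b):
    """Manhattan hop distance between two mesh tiles."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def list_schedule(order, exec_time, deps, tiles, delay_per_hop):
    """
    Greedy earliest-finish-time list scheduling.

    order         -- tasks in a topological (priority) order
    exec_time     -- {task: execution time}
    deps          -- {(pred, succ): data volume}
    tiles         -- {processor: (x, y) mesh position}
    delay_per_hop -- transfer time per unit of data per hop (assumed constant)
    Returns {task: (processor, start, finish)}.
    """
    free_at = {p: 0.0 for p in tiles}
    schedule = {}
    for task in order:
        best = None
        for proc, pos in tiles.items():
            # Data from every predecessor must arrive before the task starts;
            # the delay grows with hop distance and is zero on the same processor.
            ready = 0.0
            for (pred, succ), volume in deps.items():
                if succ == task:
                    p_proc, _, p_finish = schedule[pred]
                    delay = volume * delay_per_hop * hops(tiles[p_proc], pos)
                    ready = max(ready, p_finish + delay)
            start = max(ready, free_at[proc])
            finish = start + exec_time[task]
            # Keep the processor giving the earliest finish (first one on ties).
            if best is None or finish < best[2]:
                best = (proc, start, finish)
        schedule[task] = best
        free_at[best[0]] = best[2]
    return schedule
```

With T1 feeding T2 and T3 (10 data units each), two processors one hop apart and a delay of 0.1 per unit per hop, T1 and T2 stay on the first processor, while T3 moves to the neighbour and starts at time 6 after a one-unit transfer, instead of waiting until time 9. A contention-aware scheduler would additionally reserve the links used by each transfer.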

[Fig. 13. Design flow of SUNMAP [76]. The flow is organized in Phases 1–3 and involves the application, topology library, area and power libraries, floorplan, routing function, mapping onto topologies, topology selection, the xpipesCompiler with the xpipes component library, SystemC files of the whole design, and HW/SW co-design and simulation.]


7. Application mapping tools

In this section we present an overview of some of the application mapping tools.

The tool SUNMAP [76] can map the cores of an application onto various network architectures and choose the most suitable one among them. It explores the available topologies (from a library) for a given application and performs synthesis around the best topology. The exploration of RTL-level NoC topologies attempts to minimize average communication delay, design area, and power dissipation subject to bandwidth and area constraints. The tool supports various routing techniques, such as dimension-ordered routing, minimum-path routing, traffic splitting across minimum paths, and traffic splitting across all paths. The design flow of SUNMAP is shown in Fig. 13. It has three phases of operation. In the first phase, mapping onto various network topologies is performed by considering the routing functions, area constraints, power constraints, and the topology library; for each mapping, the bandwidth and area constraints are evaluated. In the second phase, all the mappings produced in the first phase are evaluated against several design objectives and the best one is selected. In the third phase, SUNMAP generates a SystemC description of the network components using the xpipesCompiler [115,116] and the xpipes component library. The xpipesCompiler automatically instantiates network components, such as routers, links, and network interfaces, for a specific NoC topology using the xpipes library. The most innovative feature of xpipes is that all its components are highly parameterized, so it can be tailored at design time to the needs of a specific architecture.

SMAP [67] is an application mapping and simulation tool in the MATLAB environment. It performs application mapping and task routing in a spiral fashion to enhance the performance of the NoC. It provides a variety of algorithms for application mapping, task routing, and task scheduling for different NoC topologies, and it calculates a series of performance and cost metrics to select the best mapping onto the best NoC topology.

Some more tools have been reported in [117,119,121]. Selecting the network architecture for an application and the associated mapping of cores onto the NoC are major issues in high-performance NoC design. Considering these issues, a NoC topology-exploration-based mapping and simulation model has been presented in [117] to select the best NoC topology for an application and map the application onto it. The IP mapping is automatically computed by the SCOTCH partitioning tool [118] while respecting different design constraints. SCOTCH performs static graph partitioning, mapping, and sparse-matrix block ordering, and it allows the user to efficiently map any kind of weighted core graph onto any kind of topology graph under different design and topological constraints. xENoC [119] is an experimental NoC environment for parallel and distributed computing on NoC-based MPSoC architectures. It shares the capabilities of xpipes and SUNMAP to select the topology for an application. It is also inspired by NoCGEN [120] to customize and select different NoC parameters, choosing among different mapping, routing and switching schemes. xENoC performs a complete HW/SW co-design to build an efficient distributed NoC-based MPSoC design. HeMPS [121] targets flexible application mapping strategies, fast design space exploration and performance evaluation of the mapped application to select the best mapping for NoC-based MPSoC design. It supports both static and dynamic application mapping, and provides a SystemC simulation model for evaluating performance and cost metrics.

8. Conclusion

This paper surveys the NoC application mapping strategies reported mostly in the last decade. It classifies the reported techniques into dynamic and static mapping approaches. Static mapping techniques have further been categorized into exact methods, branch-and-bound, transformative, and constructive approaches. We have also presented a performance comparison between the static mapping techniques. Apart from the existing benchmarks, we have generated some test cases with 64 and 128 cores, and we have compared the communication cost and mapping time of some of the algorithms. The survey thus provides a fair understanding of the effort needed and the quality of solution obtained with different mapping approaches.

References

[1] L. Benini, G. De Micheli, Networks on Chips: a new SoC paradigm, IEEE Computer 35 (1) (2002) 70–78.

[2] W.J. Dally, B. Towles, Route packets, not wires: on-chip interconnection networks, in: Proceedings of the 38th Design Automation Conference (DAC), 2001, pp. 684–689.

[3] S. Kumar, A. Jantsch, J.P. Soininen, M. Forsell, M. Millberg, J. Oberg, K. Tiensyrja, A. Hemani, A network on chip architecture and design methodology, in: Proceedings of ISVLSI, 2002, pp. 117–124.

[4] U.Y. Ogras, J. Hu, R. Marculescu, Key research problems in NoC design: a holistic perspective, in: Proceedings of the IEEE International Conference on Hardware/Software Codesign and System Synthesis, 2005, pp. 69–74.

[5] R. Marculescu, U.Y. Ogras, L.S. Peh, N.E. Jerger, Y. Hoskote, Outstanding research problems in NoC design: systems, microarchitecture, and circuit perspectives, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 28 (1) (2009) 3–21.


[6] A. Agarwal, C. Iskander, R. Shankar, Survey of network on chip (NoC) architectures & contributions, Journal of Engineering, Computing and Architecture 3 (1) (2009).

[7] R. Pop, S. Kumar, A survey of techniques for mapping and scheduling applications to network on chip systems, ISSN 1404-0018, Research Report 04:4, School of Engineering, Jönköping University, 2004.

[8] G. Chen, F. Li, M. Kandemir, Compiler-directed application mapping for NoC based chip multiprocessors, in: Proceedings of LCTES, 2007, pp. 155–157.

[9] E. Carvalho, N. Calazans, F. Moraes, Heuristics for dynamic task mapping in NoC-based heterogeneous MPSoCs, in: IEEE International Workshop on Rapid System Prototyping (RSP), 2007, pp. 34–40.

[10] C.L. Chou, R. Marculescu, Incremental runtime application mapping for homogeneous NoCs with multiple voltage levels, in: ACM International Conference on Hardware/Software Codesign and System Synthesis, 2007, pp. 161–166.

[11] C.L. Chou, R. Marculescu, User-aware dynamic task allocation in Network-on-Chip, in: Proceedings of Design, Automation and Test in Europe (DATE), 2008, pp. 1232–1237.

[12] C.L. Chou, U.Y. Ogras, R. Marculescu, Energy- and performance-aware incremental mapping for NoCs with multiple voltage levels, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 27 (10) (2008) 1866–1879.

[13] E. Carvalho, F. Moraes, Congestion-aware task mapping in heterogeneous MPSoCs, in: International Symposium on SoC, 2008, pp. 1–4.

[14] A. Mehran, A. Khademzadeh, S. Saeidi, DSM: a heuristic dynamic spiral mapping algorithm for Network-on-Chip, IEICE Electronics Express 5 (13) (2008) 464–471.

[15] M.A.A. Faruque, R. Krist, J. Henkel, ADAM: run-time agent based distributed application mapping for on-chip communication, in: IEEE Design Automation Conference (DAC), 2008, pp. 760–765.

[16] A.K. Singh, W. Jigang, A. Prakash, T. Srikanthan, Mapping algorithms for NoC-based heterogeneous MPSoC platforms, in: Euromicro Conference on Digital System Design/Architecture, Methods and Tools, 2009, pp. 133–140.

[17] A.K. Singh, T. Srikanthan, A. Kumar, W. Jigang, Communication-aware heuristics for run-time task mapping on NoC-based MPSoC platforms, Journal of Systems Architecture 56 (2010) 242–255.

[18] E. Carvalho, N. Calazans, F. Moraes, Dynamic task mapping for MPSoCs, IEEE Design and Test of Computers (2010) 26–35.

[19] M. Mandelli, L. Ost, E. Carara, G. Guindani, T. Gouvea, G. Medeiros, F.G. Moraes, Energy-aware dynamic task mapping for NoC-based MPSoCs, in: Proceedings of ISCAS, 2011, pp. 1676–1679.

[20] M. Mandelli, A. Amory, L. Ost, F.G. Moraes, Multi-task dynamic mapping onto NoC-based MPSoCs, in: Proceedings of the 24th Symposium on Integrated Circuits and System Design, 2011, pp. 191–196.

[21] A. Weichslgartner, S. Wildermann, J. Teich, Dynamic decentralized mapping of tree-structured applications on NoC architectures, in: IEEE/ACM International Symposium on Network-on-Chip (NOCS), 2011, pp. 201–208.

[22] A. Bender, MILP based task mapping for heterogeneous multiprocessor systems, in: Proceedings of International Conference on Design and Automation (EURO-DAC), 1996, pp. 190–197.

[23] C. Rhee, H. Jeong, S. Ha, Many-to-many core-switch mapping in 2-D mesh NoC architectures, in: IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD), 2004, pp. 438–443.

[24] S. Murali, L. Benini, G. De Micheli, Mapping and physical planning of networks-on-chip architectures with quality-of-service guarantees, in: Asia and South Pacific Design Automation Conference (ASP-DAC), 2005, pp. 27–32.

[25] K. Srinivasan, K.S. Chatha, G. Konjevod, Linear-programming-based techniques for synthesis of Network-on-Chip architectures, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 14 (4) (2006) 407–420.

[26] C. Ostler, K.S. Chatha, An ILP formulation for system-level application mapping on network processor architecture, in: Proceedings of Design, Automation and Test in Europe (DATE), 2007, pp. 1–6.

[27] O. Ozturk, M. Kandemir, S.W. Son, An ILP based approach to reducing energy consumption in NoC based CMPs, in: IEEE International Symposium on Low Power Electronics and Design (ISLPED), 2007, pp. 411–414.

[28] P. Ghosh, A. Sen, A. Hall, Energy efficient application mapping to NoC processing elements operating at multiple voltage levels, in: IEEE International Symposium on Network-on-Chip (NoCS), 2009, pp. 80–85.

[29] J. Huang, C. Buckl, A. Raabe, A. Knoll, Energy-aware task allocation for Network-on-Chip based heterogeneous multiprocessor systems, in: Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), 2011, pp. 447–454.

[30] C.L. Chou, R. Marculescu, Contention-aware application mapping for Network-on-Chip communication architectures, in: IEEE International Conference on Computer Design (ICCD), 2008, pp. 164–169.

[31] S. Tosun, O. Ozturk, M. Ozen, An ILP formulation for application mapping ontoNetwork-on-Chips, in: International Conference on Application ofInformation and Communication Technologies (AICT), 2009, pp. 1–5.

[32] S. Tosun, Clustered-based application mapping method for Network-on-Chip,Journal of Advances in Engineering Software 42 (10) (2011) 868–874.

[33] J. Hu, R. Marculescu, Energy-aware mapping for tile-based NoC architecturesunder performance constraints, in: Asia and South Pacific Design AutomationConference (ASP-DAC), 2003, pp. 233–239.

[34] J. Hu, R. Marculescu, Exploiting the routing flexibility for energy/performanceaware mapping of regular NoC architectures, in: Proceedings of Design,Automation and Test in Europe (DATE), 2003, pp. 688–693.

[35] J. Hu, R. Marculescu, Energy- and performance-aware mapping for regularNoC architectures, IEEE Transactions on Computer-Aided Design of IntegratedCircuits and Systems 24 (4) (2005) 551–562.

[36] T.J. Lin, S.Y. Lin, A.Y. Wu, Traffic-balanced IP mapping algorithm for 2D-Meshon-chip-networks, in: IEEE Workshop on Signal Processing Systems (SiPS),2008, pp. 200–203.

[37] M. Reshadi, A. Khademzadeh, A. Reza, Elixir: a new bandwidth-constrainedmapping for networks-on-chip, IEICE Electronics Express 7 (2) (2010) 73–79.

[38] T. Lei, S. Kumar, A two-step genetic algorithm for mapping task graphs to anetwork on chip architecture, in: Proceedings of the Euromicro Symposiumon Digital System Design (DSD), 2003, pp. 180–187.

[39] W. Zhou, Y. Zhang, Z. Mao, An application specific NoC mapping for optimizeddelay, in: IEEE International Conference on Design and Test of IntegratedSystems in Nanoscale (DTIS), 2006, pp. 184–188.

[40] G. Ascia, V. Catania, M. Palesi, Multi-objective mapping for mesh-based NoCarchitectures, in: ACM International Conference on Hardware/SoftwareCodesign and System Synthesis, 2004, pp. 182–187.

[41] G. Ascia, V. Catania, M. Palesi, Multi-objective genetic approach to mappingproblem on Network-on-Chip, Journal of Universal Computer Science 12 (4)(2006) 370–394.

[42] A.H. Benyamina, P. Boulet, Multi-objective mapping for NoC architecture,Journal of Digital Information Management 5 (2007) 378–384.

[43] R.K. Jena, G.K. Sharma, A multi-objective evolutionary algorithm basedoptimization model for Network-on-Chip synthesis, in: IEEE InternationalConference on Information Technology (ITNG), 2007, pp. 977–982.

[44] K. Bhardwaj, R.K. Jena, Energy and bandwidth aware mapping of IPs ontoregular NoC architectures using multi-objective genetic algorithms, in:International Symposium on System-on-Chip (SOC), 2009, pp. 27–31.

[45] F.M. Darbari, A. Khademzadeh, G.G. Fard, Evaluating the performance of achaos genetic algorithm for solving the network on chip mapping problem,in: International Conference on Computational Science and Engineering,2009, pp 366–373.

[46] F.M. Darbari, A. Khademzadeh, G.G. Fard, CGMAP: a new approach toNetwork-on-Chip mapping problem, IEICE Electronics Express 6 (1) (2009)27–34.

[47] G.G. Fard, A. Khademzadeh, F.M. Darbari, Evaluating the performance of one-dimensional chaotic maps in Network-on-Chip mapping problem, IEICEElectronics Express 6 (12) (2009) 811–817.

[48] M. Tavanpour, A. Khademzadeh, S. Pourkiani, M. Yaghobi, GBMAP: anevolutionary approach to mapping cores onto a mesh-based NoCarchitecture, Journal of Communication and Computer 7 (3) (2010) 1–7.

[49] G. Fen, W. Ning, Genetic algorithm based mapping and routing approach fornetwork on chip architectures, Chinese Journal of Electronics 19 (1) (2010)91–96.

[50] W. Jang, D.Z. Pan, A3MAP: Architecture-aware analytic mapping for Network-on-Chip, in: Asia and South Pacific Design Automation Conference (ASP-DAC), 2010, pp. 523–528.

[51] N. Choudhary, M.S. Gaur, V. Laxmi, V. Singh, Energy aware design methodologies for application specific NoC, in: Proceedings of NORCHIP, 2010, pp. 1–4.

[52] N. Choudhary, M.S. Gaur, V. Laxmi, V. Singh, GA based congestion aware topology generation for application specific NoC, in: IEEE International Symposium on Electronics Design, Test, and Application, 2011, pp. 93–98.

[53] M.J. Sepulveda, M. Strum, W.J. Chau, A multi-objective adaptive immune algorithm for NoC mapping, in: International Conference on Very Large Scale Integration (VLSI-SOC), 2009, pp. 193–196.

[54] M.J. Sepulveda, M. Strum, W.J. Chau, G. Gogniat, A multi-objective approach for multi-application NoC mapping, in: IEEE Latin American Symposium on Circuits and Systems (LASCAS), 2011, pp. 1–4.

[55] J. Kennedy, R.C. Eberhart, Particle swarm optimization, in: Proceedings of IEEE International Conference on Neural Networks, NJ, 1995, pp. 1942–1948.

[56] W. Zhou, Y. Zhang, Z. Mao, Link-load balance aware mapping and routing for NoC, WSEAS Transactions on Circuits and Systems 6 (11) (2007) 583–591.

[57] A.R. Fekr, A. Khademzadeh, M. Janidarmian, V.S. Bokharaei, Bandwidth/fault/contention aware application-specific NoC using PSO as a mapping generator, in: Proceedings of the World Congress on Engineering (WCE), vol. 1, 2010, pp. 247–252.

[58] W. Lei, L. Xiang, Energy- and latency-aware NoC mapping based on discrete particle swarm optimization, in: Proceedings of IEEE International Conference on Communications and Mobile Computing, 2010, pp. 263–268.

[59] A.H. Benyamina, P. Boulet, A. Aroul, S. Eltar, K. Dellal, Mapping real time applications on NoC architecture with hybrid multi-objective algorithm, in: International Conference on Metaheuristics and Nature Inspired Computing, 2010, pp. 1–10.

[60] P.K. Sahu, P. Venkatesh, S. Gollapalli, S. Chattopadhyay, Application mapping onto mesh structured Network-on-Chip using particle swarm optimization, in: IEEE International Symposium on VLSI (ISVLSI), 2011, pp. 335–336.

[61] K. Wang, L. Huang, C. Zhou, W. Pang, Particle swarm optimization for traveling salesman problem, in: Proceedings of the Second International Conference on Machine Learning and Cybernetics, 2003, pp. 1583–1585.

[62] Y. Shi, R. Eberhart, Parameter selection in particle swarm optimization, Springer Berlin/Heidelberg, vol. 1447/1998, 2006, pp. 591–600.

[63] A. Colorni, M. Dorigo, V. Maniezzo, Distributed optimization by ant colonies, in: Proceedings of the First European Conference on Artificial Life, Paris, France, Elsevier Publishing, 1991, pp. 134–142.

[64] J. Wang, Y. Li, S. Chai, Q. Peng, Bandwidth-aware application mapping for NoC-based MPSoCs, Journal of Computational Information Systems 7 (1) (2011) 152–159.

[65] N. Koziris, M. Romesis, P. Tsanakas, G. Papakonstantinou, An efficient algorithm for the physical mapping of clustered task graphs onto multiprocessor architectures, in: Proceedings of 8th Euro PDP, 2000, pp. 406–413.

[66] A. Hansson, K. Goossens, A. Radulescu, A unified approach to constrained mapping and routing on Network-on-Chip architectures, in: IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2005, pp. 75–80.

[67] S. Saeidi, A. Khademzadeh, A. Mehran, SMAP: an intelligent mapping tool for network on chip, in: International Symposium on Signals, Circuits and Systems (ISSCS), 2007, pp. 1–4.

[68] R. Mehran, S. Saeidi, A. Khademzadeh, A.A. Kusha, Spiral: a heuristic mapping algorithm for network on chip, IEICE Electronics Express 4 (15) (2007) 478–484.

[69] T. Shen, C.H. Chao, Y.K. Lien, A.Y. Wu, A new binomial mapping and optimization algorithm for reduced-complexity mesh-based on-chip network, in: Proceedings of NOCS’07, 2007, pp. 317–322.

[70] M. Tavanpour, A. Khademzadeh, M. Janidarmian, Chain-mapping for mesh-based Network-on-Chip architecture, IEICE Electronics Express 6 (22) (2009) 1535–1541.

[71] Y. Chen, L. Xie, J. Li, An energy-aware heuristic constructive mapping algorithm for network on chip, in: International Conference on ASIC (ASICON), 2009, pp. 101–104.

[72] A. Patooghy, H. Tabkhi, S.G. Miremadi, RMAP: a reliability-aware application mapping for Network-on-Chips, in: International Conference on Dependability, 2010, pp. 112–117.

[73] B. Yang, T.C. Xu, T. Santti, J. Plosila, Tree-model based mapping for energy-efficient and low-latency Network-on-Chip, in: International Symposium on Design and Diagnostics of Electronic Circuits and Systems (DDECS), 2010, pp. 189–192.

[74] S. Tosun, New heuristic algorithm for energy aware application mapping and routing on mesh-based NoCs, Journal of Systems Architecture 57 (2011) 69–78.

[75] S. Murali, G. De Micheli, Bandwidth constrained mapping of cores onto NoC architectures, in: Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), vol. 2, 2004, pp. 896–901.

[76] S. Murali, G. De Micheli, SUNMAP: a tool for automatic topology selection and generation for NoCs, in: Proceedings of 41st Design Automation Conference (DAC), 2004, pp. 914–919.

[77] K. Srinivasan, K.S. Chatha, A technique for low energy mapping and routing in Network-on-Chip architecture, in: IEEE International Symposium on Low Power Electronics and Design (ISLPED), 2005, pp. 387–392.

[78] C. Marcon, N. Calazans, F. Moraes, A. Susin, I. Reis, F. Hessel, Exploring NoC mapping strategies: an energy and timing aware technique, in: Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), vol. 1, 2005, pp. 502–507.

[79] C. Marcon, A. Borin, A. Susin, L. Carro, F. Wagner, Time and energy efficient mapping of embedded applications onto NoCs, in: Proceedings of Asia and South Pacific Design Automation Conference (ASP-DAC), vol. 1, 2005, pp. 33–38.

[80] C.A.M. Marcon, E.I. Moreno, N.L.V. Calazans, F.G. Moraes, Comparison of Network-on-Chip mapping algorithms targeting low energy consumption, IET Computers & Digital Techniques 2 (6) (2008) 471–482.

[81] C.A.M. Marcon, J.C.S. Palma, A.A. Susin, R.A.L. Reis, N.L.V. Calazans, F.G. Moraes, Modeling the traffic effect for the application cores mapping problem onto NoCs, VLSI-SoC International Federation for Information Processing, vol. 240/2007, 2007, pp. 179–194.

[82] H.M. Harmanani, R. Farah, A method for efficient mapping and reliable routing for NoC architectures with minimum bandwidth and area, in: IEEE International Workshop on Circuits and Systems and TAISA Conference (NEWCAS-TAISA), 2008, pp. 29–32.

[83] Z. Lu, L. Xia, A. Jantsch, Cluster-based simulated annealing for mapping cores onto 2D mesh Networks on Chip, in: Proceedings of Design and Diagnostics of Electronic Circuits and Systems (DDECS), 2008, pp. 1–6.

[84] H. Elmiligi, A.A. Morgan, M.W.E. Kharashi, F. Gebali, Power-aware topology optimization for Network-on-Chips, in: IEEE International Symposium on Circuits and Systems, 2008, pp. 360–363.

[85] A. Morgan, H. Elmiligi, A.M.W.E. Kharashi, F. Gebali, Application-specific networks-on-chip topology customization using network partitioning, in: 1st International Forum on Next-generation Multicore/manycore Technologies, 2008.

[86] H. Elmiligi, A.A. Morgan, M.W.E. Kharashi, F. Gebali, Power optimization for application-specific networks-on-chips: a topology-based approach, Journal of Microprocessor and Microsystems 33 (2009) 343–355.

[87] M.Y. Yu, M. Li, J.J. Song, F.F. Fu, Y.X. Bai, Pipelining-based high throughput low energy mapping on Network-on-Chip, in: Euromicro International Conference on Digital System Design/Architectures, Methods and Tools, 2009, pp. 427–432.

[88] M. Janidarmian, A. Khademzadeh, M. Tavanpour, Onyx: a new heuristic bandwidth-constrained mapping of cores onto network on chip, IEICE Electronics Express 6 (1) (2009) 1–7.

[89] S. Saeidi, A. Khademzadeh, F. Vardi, Crinkle: a heuristic mapping algorithm for network on chip, IEICE Electronics Express 6 (24) (2009) 1737–1744.

[90] X. Wang, M. Yang, Y. Jiang, P. Liu, Power-aware mapping for Network-on-Chip architectures under bandwidth and latency constraints, in: International Conference on Embedded and Multimedia Computing (EM-COM), 2009, pp. 1–6.

[91] X. Wang, M. Yang, Y. Jiang, P. Liu, Power-aware mapping approach to map IP cores onto NoCs under bandwidth and latency constraints, ACM Transactions on Architecture and Code Optimization 7 (1) (2010) 1–30.

[92] M. Janidarmian, A. Khademzadeh, A.R. Fekr, V.S. Bokharaei, Citrine: a methodology for application-specific Network-on-Chips design, in: Proceedings of World Congress on Engineering and Computer Science, vol. 1, 2010, pp. 196–202.

[93] B. Yang, L. Guang, T.C. Xu, T. Santti, J. Plosila, Multi-application mapping algorithm for Network-on-Chip platforms, in: IEEE 26th Convention of Electrical and Electronics Engineers in Israel (IEEEI), 2010, pp. 540–544.

[94] P.K. Sahu, N. Shah, K. Manna, S. Chattopadhyay, A new application mapping algorithm for mesh based Network-on-Chip design, in: IEEE International Conference (INDICON), 2010, pp. 1–4.

[95] B. Kernighan, S. Lin, An efficient heuristic procedure for partitioning graphs, Bell System Technical Journal 49 (2) (1970) 291–307.

[96] Z. Chang, G. Xiong, N. Sang, Energy-aware mapping for tree-based NoC architecture by recursive bipartitioning, in: International Conference on Embedded Software and Systems (ICESS), 2008, pp. 105–109.

[97] D. Majeti, A. Pasalapudi, K. Yalamanchili, Low energy tree based network on chip architectures using homogeneous routers for bandwidth and latency constrained multimedia applications, in: International Conference on Emerging Trends in Engineering and Technology (ICETET), 2009, pp. 358–363.

[98] P.K. Sahu, N. Shah, K. Manna, S. Chattopadhyay, An application mapping technique for butterfly-fat-tree Network-on-Chip, in: IEEE International Conference on Emerging Applications and Information Technology (EAIT), 2011, pp. 383–386.

[99] P.K. Sahu, N. Shah, K. Manna, S. Chattopadhyay, A new application mapping strategy for mesh-of-tree based Network-on-Chip, in: IEEE International Conference on Emerging Trends in Electrical and Computer Technology (ICETECT), 2011, pp. 518–523.

[100] R.P. Dick, D.L. Rhodes, W. Wolf, TGFF: task graphs for free, in: Proceedings of International Workshop on Hardware/Software Codesign, 1998.

[101] R. Tornero, J.M. Orduna, A. Mejia, J. Flich, J. Duato, CART: communication-aware routing technique for application-specific NoCs, in: IEEE Euromicro Conference on Digital System Design Architecture, Methods and Tools, 2008, pp. 26–31.

[102] R. Tornero, J.M. Orduna, A. Mejia, J. Flich, J. Duato, A communication-driven routing technique for application-specific NoCs, International Journal of Parallel Programming 39 (3) (2011) 357–374.

[103] R. Tornero, S. Kumar, S. Mubeen, J.M. Orduna, Distance constrained mapping to support NoC platforms based on source routing, in: Workshop on Highly Parallel Processing on a Chip (HPPC), 2009, pp. 8–17.

[104] R. Tornero, V. Sterrantino, M. Palesi, J.M. Orduna, A multi-objective strategy for concurrent mapping and routing in Networks on Chip, in: IEEE International Symposium on Parallel and Distributed Processing (IPDPS), 2009, pp. 1–8.

[105] M. Palesi, R. Holsmark, S. Kumar, A methodology for design of application specific deadlock-free routing algorithms for NoC Systems, in: ACM International Conference on Hardware/Software Codesign and System Synthesis, 2006, pp. 142–147.

[106] M. Palesi, R. Holsmark, S. Kumar, V. Catania, Application specific routing algorithms for network on chip, IEEE Transactions on Parallel and Distributed Systems 20 (3) (2009) 316–330.

[107] J. Hu, R. Marculescu, Energy-aware communication and task scheduling for Network-on-Chip architectures under real-time constraints, in: Design Automation and Test in Europe Conference and Exhibition, vol. 1, 2004, pp. 234–239.

[108] J. Hu, R. Marculescu, Communication and task scheduling of application-specific Network-on-Chip, IEE Proc. Comput. Digit. Tech. 152 (5) (2005) 643–651.

[109] S. Stuijk, T. Basten, M. Geilen, A.H. Ghamarian, B. Theelen, Resource-efficient routing and scheduling of time-constrained Network-on-Chip communication, in: Proceedings of EUROMICRO Conference on Digital System Design, 2006, pp. 45–52.

[110] A. Mello, N. Calazans, F. Moraes, Rate-based scheduling policy for QoS flows in network on chip, in: International Conference on Very Large Scale Integration (VLSI-SoC), 2007, pp. 140–145.

[111] A. Raina, V. Muthukumar, Traffic aware scheduling algorithm for network on chip, in: International Conference on Information Technology, 2009, pp. 877–882.

[112] W. Hu, X. Tang, B. Xie, T. Chen, D. Wang, An efficient power-aware optimization for task scheduling on NoC-based many-core system, in: IEEE International Conference on Computer and Information Technology (CIT), 2010, pp. 171–178.

[113] H. Yu, Y. Ha, B. Veeravalli, Communication-aware application mapping and scheduling for NoC-based MPSoCs, in: IEEE International Symposium on Circuits and Systems (ISCAS), 2010, pp. 3232–3235.

[114] Y. Xie, W.L. Hung, Temperature-aware task allocation and scheduling for embedded multiprocessor system-on-chip (MPSoC) design, Journal of VLSI Signal Processing 45 (2006) 177–189.

[115] A. Jalabert, S. Murali, L. Benini, G.D. Micheli, xpipesCompiler: a tool for instantiating application specific network on chip, in: Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), 2004, pp. 884–889.

[116] D. Bertozzi, L. Benini, xpipes: a Network-on-Chip architecture for gigascale system-on-chip, in: IEEE Circuits and Systems Magazine, 2004, pp. 18–31.

[117] L. Bononi, N. Concer, M. Grammatikakis, NoC topology exploration based on simulation models, in: Euromicro Conference on Digital System Design Architecture, Methods and Tools (DSD), 2007, pp. 543–546.

[118] F. Pellegrini, SCOTCH and LibScotch 4.0 User’s Guide.

[119] J. Joven, O.F. Bach, D.C. Rufas, R. Martinez, L. Teres, J. Carrabina, xENoC – an experimental Network-on-Chip environment for parallel distributed computing on NoC-based MPSoC architecture, in: Euromicro Conference on Parallel, Distributed and Network-based Processing, 2008, pp. 141–148.

[120] J. Chan, S. Parameswaran, NoCGEN: a template based reuse methodology for network on chip architecture, in: International Conference on VLSI Design, 2004, pp. 717–720.

[121] E.A. Carara, R.P. de Oliveira, N.L.V. Calazans, F.G. Moraes, HeMPS – a framework for NoC based MPSoC generation, in: IEEE Symposium on Circuits and Systems (ISCAS), 2009, pp. 1345–1348.

Pradip Kumar Sahu is a PhD student in the Department of Electronics and Electrical Communication Engineering at Indian Institute of Technology, Kharagpur. His research interests include Network-on-Chip architecture design and application mapping in 2-D and 3-D environments, performance and cost evaluation, and the power-performance-reliability trade-off.

Santanu Chattopadhyay is currently Professor in the Department of Electronics and Electrical Communication Engineering at Indian Institute of Technology, Kharagpur. He received the PhD degree in Computer Science and Engineering from Indian Institute of Technology Kharagpur in 1996. Before joining IIT Kharagpur, he was with Indian Institute of Technology, Guwahati. His research interests include CAD tools for low power circuit design and test, System-on-Chip testing, and Network-on-Chip design and test. He has more than 120 publications in refereed international journals and conferences. He is the co-author of the book “Additive Cellular Automata – Theory and Applications”, published by the IEEE Computer Society Press in 1997.