With the ability to deliver 40Gbps, InfiniBand offers more bandwidth than Ethernet or Fibre Channel for connecting compute clusters or data centers, but it has often been viewed as a technology that is only for geeks. In today’s InfiniBand products, however, automated wizard-driven installation, verification and parameter-based monitoring greatly simplify proper configuration and ongoing management of the cluster.
Growing Clusters, Growing Issues
The communications bottleneck in high-performance computing clusters is getting worse as clusters grow larger. A generation ago, InfiniBand switches were limited to a maximum of 288 ports in a single chassis; today, a comparable director-class switch provides up to 864 ports. And with Moore’s Law enabling denser multi-core processors, each node attached to those ports generates much more data than ever before.
InfiniBand delivers 40Gbps with the lowest available latency. The challenge is to make it usable. As clusters grow, communication between cores grows much faster than the cluster itself: when any core may exchange data with any other core, the number of communicating pairs rises roughly with the square of the core count. Configuring and managing the network for optimum performance becomes correspondingly harder, and fabric efficiency tends to decline as nodes are added.
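The growth in communication is easy to quantify. If every core may send to every other core, the number of sender/receiver pairs is cores × (cores − 1). The short Python sketch below applies that to the 288- and 864-port switch sizes, assuming one node per port and a hypothetical 8 cores per node; both figures are illustrative, not taken from any particular cluster.

```python
def all_to_all_pairs(cores: int) -> int:
    """Ordered sender/receiver pairs when every core may send to every other core."""
    return cores * (cores - 1)


CORES_PER_NODE = 8  # hypothetical figure, chosen only for illustration

for nodes in (288, 864):  # assumes one node per switch port
    cores = nodes * CORES_PER_NODE
    print(f"{nodes} nodes, {cores} cores: {all_to_all_pairs(cores):,} pairs")
# 288 nodes, 2304 cores: 5,306,112 pairs
# 864 nodes, 6912 cores: 47,768,832 pairs
```

Tripling the node count multiplies the number of potential communication pairs by roughly nine, which is why a fabric that behaves well at one size can struggle at the next.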
Fabric Management Software on the Rise
To address these challenges, InfiniBand switch and adapter vendors have improved their fabric management software to simplify the job of optimizing and managing the fabric. Four key capabilities contribute to the solution: fabric tools, adaptive routing, dispersive routing and virtual fabrics.
Fabric tools help eliminate the geek factor in InfiniBand by automatically configuring and managing the network. They typically include a wizard-driven setup and discovery function that makes InfiniBand as easy to deploy as Ethernet. In addition, the toolkit includes management tools to identify trouble spots in the fabric, including a graphical depiction of the fabric and a real-time congestion monitor.
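At its core, a congestion monitor of this kind samples each link's traffic counters and flags links running close to capacity. The Python sketch below shows that calculation under stated assumptions: the per-link byte counters, the link names, the 40Gbps nominal capacity and the 80% threshold are all hypothetical, and the code is not the interface of any vendor's toolkit.

```python
def utilization(bytes_t0: int, bytes_t1: int, interval_s: float, link_bps: float) -> float:
    """Fraction of link capacity used between two counter samples.

    Assumes a hypothetical byte counter that does not wrap during the interval.
    """
    return (bytes_t1 - bytes_t0) * 8 / (interval_s * link_bps)


def hot_links(samples: dict[str, tuple[int, int]], interval_s: float,
              link_bps: float, threshold: float = 0.8) -> list[str]:
    """Return, sorted by name, the links whose utilization meets the threshold."""
    return sorted(
        name for name, (b0, b1) in samples.items()
        if utilization(b0, b1, interval_s, link_bps) >= threshold
    )


# Hypothetical counter samples taken one second apart on three links.
samples = {
    "sw1/p3": (0, 4_500_000_000),              # 90% of 40Gbps
    "sw1/p4": (0, 500_000_000),                # 10%
    "sw2/p1": (2_000_000_000, 6_600_000_000),  # 92%
}
print(hot_links(samples, interval_s=1.0, link_bps=40e9))  # ['sw1/p3', 'sw2/p1']
```

A graphical tool would color these links on its fabric map rather than print them, but the underlying arithmetic is the same.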
Every network manager wants to get the best performance from the fabric at all times. Adaptive routing minimizes the impact of congestion on the fabric. Most high-performance computing fabrics are designed to enable multiple paths between switches, but standard InfiniBand switches don’t necessarily take advantage of these paths to reduce congestion. Adaptive routing shifts network traffic from over-utilized links to less utilized links so that a bottleneck in one path doesn’t cripple the flow of data.
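The difference between static and adaptive path selection can be sketched in a few lines of Python. This is a simplified model, not how any particular switch implements adaptive routing: the `Port` class, its `queued_bytes` congestion metric and the modulo-based static mapping are hypothetical stand-ins (real forwarding tables are computed by the subnet manager's own routing algorithm).

```python
from dataclasses import dataclass


@dataclass
class Port:
    name: str
    queued_bytes: int  # hypothetical congestion metric: data waiting to leave this port


def static_route(ports: list[Port], dest: int) -> Port:
    """Deterministic routing: a given destination always uses the same port."""
    return ports[dest % len(ports)]


def adaptive_route(ports: list[Port]) -> Port:
    """Adaptive routing: choose the least congested of the equivalent ports."""
    return min(ports, key=lambda p: p.queued_bytes)


# Three equivalent uplinks toward the same destination; up0 is congested.
uplinks = [Port("up0", 90_000), Port("up1", 2_000), Port("up2", 15_000)]
print(static_route(uplinks, dest=3).name)  # up0: stuck behind the congestion
print(adaptive_route(uplinks).name)        # up1: traffic shifts to the idle link
```

The static choice ignores load entirely, so every packet for that destination queues behind the bottleneck; the adaptive choice re-evaluates the candidate links and moves traffic to the least utilized one.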
Dispersive routing is another technique used to optimize fabric performance. It distributes traffic over multiple paths to a destination, thereby load balancing the fabric. The most advanced implementations go beyond simple load balancing: because packets sent over different routes can arrive out of order, they minimize that reordering so packets can be reassembled in the proper order for processing at their destination.
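A minimal Python sketch of the idea, assuming sequence-numbered packets, round-robin spraying over three hypothetical paths with different fixed latencies, and a receiver-side buffer that restores order. It illustrates the spreading and the reassembly only; it does not model how vendor implementations limit reordering in the first place.

```python
import heapq
from itertools import cycle


def disperse(payloads, paths):
    """Tag each packet with a sequence number and spread packets round-robin over paths."""
    path_iter = cycle(paths)
    return [(seq, next(path_iter), data) for seq, data in enumerate(payloads)]


def arrival_order(packets, latency):
    """Simulate arrival: packet seq is sent at time seq and takes latency[path] to arrive."""
    return sorted(packets, key=lambda p: p[0] + latency[p[1]])


class Reorderer:
    """Receiver-side buffer that releases packets strictly in sequence order."""

    def __init__(self):
        self.next_seq = 0
        self.pending = []  # min-heap of (seq, data)

    def receive(self, seq, data):
        heapq.heappush(self.pending, (seq, data))
        released = []
        while self.pending and self.pending[0][0] == self.next_seq:
            released.append(heapq.heappop(self.pending)[1])
            self.next_seq += 1
        return released


packets = disperse([f"p{i}" for i in range(6)], ["A", "B", "C"])
arrivals = arrival_order(packets, latency={"A": 5, "B": 1, "C": 3})  # hypothetical latencies
rx = Reorderer()
delivered = [d for seq, _, data in arrivals for d in rx.receive(seq, data)]
print([seq for seq, _, _ in arrivals])  # [1, 0, 2, 4, 3, 5]: out of order on the wire
print(delivered)                        # ['p0', 'p1', 'p2', 'p3', 'p4', 'p5']
```

The more the paths differ in latency, the more packets the receiver must hold, which is why keeping reordering small matters as much as spreading the load.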
Virtual fabrics provide the ability to segregate traffic into different priority classes. A user may have different jobs that require different priorities, or may want to place different traffic types, such as compute, storage and management traffic, in separate priority classes. With the virtual fabrics capability, users can partition traffic flows so that traffic with high bandwidth requirements doesn’t interfere with traffic requiring low latency.
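InfiniBand hardware separates traffic onto virtual lanes, and switches arbitrate between lanes using weighted tables. The Python sketch below is a simplified software model of that effect, not the switch algorithm: the class names and weights are hypothetical, and plain weighted round-robin stands in for the hardware arbiter. It compares a single shared queue with per-class queues when a small compute message arrives behind a burst of storage traffic.

```python
from collections import deque


def single_fifo(arrivals):
    """No traffic separation: packets leave in arrival order."""
    return [pkt for _, pkt in arrivals]


def virtual_fabrics(arrivals, weights):
    """Per-class queues served by weighted round-robin.

    arrivals: list of (traffic_class, packet_id) in arrival order.
    weights: packets each class may send per round (hypothetical values, must be >= 1).
    """
    if any(w < 1 for w in weights.values()):
        raise ValueError("every class needs a positive weight")
    queues = {cls: deque() for cls in weights}
    for cls, pkt in arrivals:
        queues[cls].append(pkt)
    sent = []
    while any(queues.values()):
        for cls, weight in weights.items():
            for _ in range(weight):
                if not queues[cls]:
                    break
                sent.append(queues[cls].popleft())
    return sent


# A burst of eight storage packets arrives just ahead of one compute and one management packet.
arrivals = [("storage", f"s{i}") for i in range(8)] + [("compute", "c0"), ("mgmt", "m0")]
weights = {"compute": 1, "storage": 4, "mgmt": 1}
print(single_fifo(arrivals).index("c0"))               # 8: waits behind the whole burst
print(virtual_fabrics(arrivals, weights).index("c0"))  # 0: sent in the first round
```

Storage still receives most of the bandwidth through its larger weight, but the latency-sensitive compute packet no longer waits for the bulk transfer to drain.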
Automated fabric configuration, optimization, and management tools supplied with InfiniBand switches and adapters can make this technology a logical choice for a broader set of users.
David Smith is Senior Product Manager, InfiniBand Products, at QLogic. Contact him via firstname.lastname@example.org.