Network computing was much simpler in the ‘90s, when most computers (desktops, laptops, and servers) came with single-core processors. In general, each node, usually identified as a single physical box in the network, came with a single computing core. Now, with multi-core processors becoming the norm in both the consumer and professional markets, a single node can comprise as many as 12 computing cores.
[Figure caption: When a user feels his or her local resources are no longer enough, he or she has the option to tap into Altair’s cloud-hosted HPC environment to run the job.]
If multi-core CPUs add complexity, they also open up new opportunities. One is to strategically manage and prioritize your computing demands using job schedulers. Another is to bundle your existing multi-core machines into a virtual cluster, ready to perform rendering, animation, analysis, and other heavy-duty tasks during off hours and weekends. These introduce what global engineering firm Altair and many others call desktop cycle harvesting, a way to reap more from your hardware investment.
HPC Management for All Sizes
PBS Works, Altair’s portable batch system software suite for distributed computing, can be used for setting up and managing high-performance computing (HPC) systems with as many as 120,000 cores, and as few as four to eight cores.
PBS Works used to be one behemoth application encompassing workload management, cluster setup, and cloud-computing access. It has been refashioned into three complementary components:
- PBS Professional (commercial grade HPC workload management)
- PBS Analytics (usage monitor, reports, and planning)
- PBS Catalyst (job submission application)
The drag-and-drop job management interface, PBS Catalyst, is available as a desktop client or as a browser-based web client. This client functions as a portal to tap into the processing power available in your local machine or your company’s internal network. If the internal computing capacity is not sufficient to process your jobs, you may choose to reach into Altair’s on-demand computing environment through the client. Altair offers PBS Catalyst as a free download for managing up to four local cores.
“We’re taking the technology that was once exclusive to large organizations and bringing it to the smaller ones that don’t have all the necessary expertise,” says Robert Walsh, Altair’s director of business development for PBS GridWorks products.
PBS Works consists of a server, a scheduler, and machine-oriented mini-servers (MOMs). The server creates, monitors, and tracks job batches. The scheduler houses policies (administrative rights, credentials, access granted, and others).
The MOMs (mothers of all execution jobs) monitor the execution nodes’ native resources (CPUs, disk, etc.) and custom resources (for instance, tagging a node with Altair’s Radioss FEA program will tell the scheduler to route Radioss jobs to that node). They also monitor jobs in progress and help clean up nodes once jobs finish, so the next jobs can run on those nodes without competing with leftover job remnants.
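The routing idea described above can be illustrated with a minimal sketch. This is not PBS Works code or its API; the `Node` and `Job` classes, the tag names, and the node names are all hypothetical, chosen only to show how a scheduler might match a job against native resources (free cores) and custom resources (tags such as an installed solver).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cores: int                          # native resource
    tags: set = field(default_factory=set)   # custom resources, e.g. {"radioss"}


@dataclass
class Job:
    name: str
    cores: int
    requires: set = field(default_factory=set)  # custom resources the job needs


def eligible_nodes(job, nodes):
    """Return names of nodes with enough free cores and every tag the job requires."""
    return [
        n.name
        for n in nodes
        if n.free_cores >= job.cores and job.requires <= n.tags
    ]


# Hypothetical three-node setup; only ws-02 and ws-03 are tagged for Radioss.
nodes = [
    Node("ws-01", free_cores=4),
    Node("ws-02", free_cores=8, tags={"radioss"}),
    Node("ws-03", free_cores=2, tags={"radioss"}),
]
crash_run = Job("crash-run", cores=4, requires={"radioss"})
print(eligible_nodes(crash_run, nodes))  # ['ws-02']
```

In this made-up setup, only ws-02 qualifies: ws-01 lacks the tag and ws-03 lacks the cores. A job with no custom requirements would be eligible on any node with enough free cores.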
Drag and Drop Job Scheduler
The tabulated interface of PBS Catalyst shows you a summary of jobs generated, along with a list of local and remote computing resources (HPC nodes) at your disposal. You may configure the client to query the backend systems at regular intervals for updated lists of resources and jobs. Through the client interface, you may point to the specific HPC system to use and the priority level it gets in the queue.
While the PBS Catalyst client lets you customize and manage your own local resources, you won’t be able to do the same with the backend distributed networks (the company’s internal network and the remote cloud network provided by Altair). You can, however, submit jobs to them, and customize those jobs, within the boundaries set by IT administrators.
“The reasoning behind this approach is, we’re giving engineers the ability to manage their own local resources, but the backend resources are shared resources managed by IT staff, so users won’t have the ability to do whatever they want to those systems,” explains Walsh. These configurations are left to network administrators and IT managers with the right credentials to edit them via PBS Professional.
If you run a certain job repeatedly, you may save the job setup with preferred parameters in your local profile. The next time you need to run the job, you can simply drag and drop it—with all your saved settings—into the appropriate queue.
Insight from History
The purpose of PBS Analytics is to let users monitor and study job queues, resources accessed, time requested, and other historical patterns to help develop enterprise-wide HPC strategies. Walsh says PBS Analytics might help you better answer the following: Are you short on licenses or computing cores? How many jobs are you running? Who’s running most of the failed jobs? Do they need additional training?
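As a rough illustration of the kind of question Walsh lists, the sketch below counts failed jobs per user from a small job history. The record layout and field names are assumptions made up for this example; they do not reflect how PBS Analytics stores or reports its data.

```python
from collections import Counter

# Hypothetical job-history records; the field names are illustrative only.
history = [
    {"user": "ana", "status": "done", "cores": 8},
    {"user": "ben", "status": "failed", "cores": 4},
    {"user": "ben", "status": "failed", "cores": 4},
    {"user": "ana", "status": "failed", "cores": 8},
    {"user": "cy", "status": "done", "cores": 2},
]


def failed_jobs_by_user(records):
    """Count failed jobs per user, most failures first."""
    counts = Counter(r["user"] for r in records if r["status"] == "failed")
    return counts.most_common()


print(len(history))                  # how many jobs were run: 5
print(failed_jobs_by_user(history))  # [('ben', 2), ('ana', 1)]
```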
In the future, it may also play a role in the PBS Works R&D team’s efforts to tackle one of the most challenging problems in distributed computing: estimating the time required to complete jobs based on user history.
Altair develops and markets its own computer-aided engineering (CAE) solution, HyperWorks, but PBS Works is designed with an open architecture to allow users to manage job scheduling with third-party software. “Even though I’m at a CAE company, I work with all the solvers from others, from Abaqus (from Dassault Systèmes) and MSC (Nastran, Patran, etc.) to ANSYS. They’re all our partners,” says Walsh.
FAQs for HPC
Fielding daily inquiries from business prospects, Robert Walsh, Altair’s director of business development for the PBS GridWorks product line, has noticed that certain issues tend to be on most HPC buyers’ minds. Here are a few, with the responses he usually gives.
Question: If I want to set up my own private HPC system, do I need to acquire machines with the same type of CPUs? Or can I put together a system with a mix of CPUs? (For example, CPUs with varying numbers of cores and speeds.)
Answer: It depends on the application you want to execute on the system. With some analysis and simulation software, the floating-point calculation method changes depending on the CPU architecture it’s running on. So if you’re repeatedly running the same analysis scenario to compare results, you may need to make sure you’re running it on the same platform. If you want to run a job on eight cores, comprising four newer CPUs and four old ones, the newer CPUs will finish their share of the job much quicker than the older (and slower) ones, and will be forced to wait for the older CPUs to catch up. You can retain older machines you’ve purchased and use them as distributed computing resources alongside newer ones, but the older machines may not be the best resources. You’ll need a job scheduling program like PBS Catalyst to designate the appropriate cores to different jobs.
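The waiting effect Walsh describes can be put into numbers with a small sketch. It assumes the job's work is divided equally among cores and that the job ends only when the last core finishes; the core speeds and the amount of work are made-up figures, and real solvers may balance load differently.

```python
def even_split_time(total_work, core_speeds):
    """Wall time when work is divided equally: the slowest core sets the pace."""
    share = total_work / len(core_speeds)
    return max(share / speed for speed in core_speeds)


def ideal_time(total_work, core_speeds):
    """Lower bound if work could be divided in proportion to each core's speed."""
    return total_work / sum(core_speeds)


# Four newer cores at 2.0 work units/hour, four older cores at 1.0 (made-up numbers).
mixed = [2.0] * 4 + [1.0] * 4
print(even_split_time(24.0, mixed))      # 3.0 hours: new cores idle after 1.5 hours
print(ideal_time(24.0, mixed))           # 2.0 hours if the load were balanced by speed
print(even_split_time(24.0, [2.0] * 4))  # 3.0 hours on the four newer cores alone
```

With these particular numbers, adding the four older cores to an evenly split job brings no gain over running on the four newer cores alone, which is one way to read the remark that older machines may not be the best resources.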
Q: Can PBS Works give me an estimated amount of time a certain job will take to complete based on the number of cores designated and the volume of data involved?
A: If you run a job, say an analysis scenario, on machine X, then run it again on machine Y with additional degrees of freedom or with different constraints, the time it takes to complete will be different. It’s almost impossible to make an accurate prediction [of how long each variation will take]. In PBS Works, we display wall clock time—not CPU cycle time, but clock time—so if the user specifies that he/she wants to run a job for two or four hours, we can tell when the next job will start running. What we’re working on now is to be able to take into account your past history—how long similar kinds of jobs took when you ran them previously—to predict how long your current job will take. But that’s in the future. Today, time estimates are based on user input only.
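To show how user-requested wall clock time can yield a start time for the next job, here is a minimal sketch. It assumes a strict first-in, first-out queue with no backfilling and treats each requested walltime as the job's actual duration. The function name and the input format are hypothetical; this is not how PBS Works computes its estimates.

```python
import heapq


def estimate_start_times(total_cores, running, queued):
    """Estimate start times (hours from now) for queued jobs; strict FIFO, no backfill.

    running: list of (remaining_walltime_hours, cores)
    queued:  list of (name, requested_walltime_hours, cores) in queue order
    """
    free = total_cores - sum(cores for _, cores in running)
    ends = [(end, cores) for end, cores in running]  # (end time, cores released)
    heapq.heapify(ends)
    now = 0.0
    starts = {}
    for name, walltime, cores in queued:
        if cores > total_cores:
            raise ValueError(f"{name} requests more cores than the system has")
        # Advance the clock until enough cores have been released.
        while free < cores:
            end, released = heapq.heappop(ends)
            now = max(now, end)
            free += released
        starts[name] = now
        free -= cores
        heapq.heappush(ends, (now + walltime, cores))
    return starts


# Eight-core system, fully busy: one job ends in 2 hours, another in 4.
running = [(2.0, 4), (4.0, 4)]
queued = [("a", 1.0, 4), ("b", 2.0, 8)]
print(estimate_start_times(8, running, queued))  # {'a': 2.0, 'b': 4.0}
```

Because the estimate rests entirely on the walltimes users request, a job that finishes early or overruns its request shifts every start time behind it.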
Q: How do I get billed for using on-demand computing?
A: If you’re using PBS Catalyst to submit jobs to your local machine or private HPC clusters, you only pay for the units (licenses) of PBS Works you’re checking out.
Q: Can I use PBS Works to schedule work on GPU clusters?
A: Absolutely. Today we don’t even charge you for GPU management. So if, for example, you have a machine with a quad-core CPU and a 128-core GPU, you’re only paying for the PBS Works licenses you use on the CPU.
Kenneth Wong writes about technology, its innovative use, and its implications. One of DE’s MCAD/PLM experts, he has written for technology magazines and writes DE’s Virtual Desktop blog at deskeng.com/virtual_desktop/. You can follow him on Twitter at KennethwongSF, or send e-mail to DE-Editors@deskeng.com.