Why you need to bet on Oracle Cloud Infrastructure as THE Cloud for your HPC needs


You’d imagine that, with the growth of the public cloud, most HPC workloads and applications would have moved to the cloud; however, almost all enterprise HPC workloads still run in on-premises datacenters. This means millions of mission-critical use cases, such as engineering crash simulations, cancer research, visual effects and cutting-edge workloads like deep learning for artificial intelligence (AI), are still constrained by on-premises environments.

What’s stopping these HPC workloads from moving to the cloud?

Simply put: bad or incomplete cloud infrastructure, with inconsistent performance, no flexibility, high costs and no integration. If cloud infrastructure were as good as it needed to be, all these workloads would already be in the cloud. But they’re not. There is clearly still a lot of innovation needed to move entire HPC and AI workloads and applications to the cloud.

Enterprise HPC workloads have specialized needs, and traditional cloud providers don’t support them. If you want to run the most demanding HPC, AI or database workloads, you need clusters of servers working as a single piece of infrastructure. Most cloud providers see this as a hard problem. Oracle solved these challenges on-premises 10 years ago with Exadata. What made Exadata so good? We built a clustered network, connected high-speed compute and storage, and wrote software to optimize it end-to-end for performance and security.

Today, we’re going to solve this problem for customers!

First, we’re announcing a brand-new capability called “Clustered Networking”. Clusters may seem like an old idea, but everyone still runs them on-premises for their toughest workloads: HPC clusters, AI research GPU clusters, simulation clusters and more. With Oracle Cloud, customers no longer need expensive, specialized networking gear on-premises. Customers can now get single-digit microsecond latency and 100 Gbps bandwidth from the first and only public cloud provider with bare-metal RDMA capability. You can now migrate workloads into Oracle Cloud with better performance than on-premises or any other cloud provider. None of our competitors offer anything close: Microsoft Azure, for example, offers a more expensive and niche solution with its H-series instances. You don’t have to compromise anymore!
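To see why both of those numbers matter, here is a minimal sketch of the standard latency-bandwidth (Hockney) model, in which sending an n-byte message costs roughly a fixed latency plus n divided by the bandwidth. The 1.5 µs latency is an assumed stand-in for “single-digit microsecond”, and `transfer_time` is a hypothetical helper for illustration, not an Oracle API.

```python
"""Latency-bandwidth (Hockney) estimate of point-to-point message time.

Assumptions (illustrative, not measured): a 1.5 microsecond one-way latency,
standing in for "single-digit microsecond", and a 100 Gbit/s link.
"""

LATENCY_S = 1.5e-6       # assumed one-way latency, in seconds
BANDWIDTH_BPS = 100e9    # 100 Gbit/s link, in bits per second


def transfer_time(message_bytes: int,
                  latency_s: float = LATENCY_S,
                  bandwidth_bps: float = BANDWIDTH_BPS) -> float:
    """Estimated seconds to send one message: T(n) = latency + n / bandwidth."""
    return latency_s + (message_bytes * 8) / bandwidth_bps


if __name__ == "__main__":
    # Compare a tiny control message, a 4 KiB halo exchange and a 1 MiB block.
    for size in (8, 4 * 1024, 1024 * 1024):
        t = transfer_time(size)
        print(f"{size:>8} bytes: {t * 1e6:8.2f} us "
              f"(latency is {LATENCY_S / t:.0%} of the total)")
```

Under these assumptions, small MPI messages are dominated by latency while megabyte-sized transfers are dominated by bandwidth, which is why tightly coupled MPI codes care about both.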

As part of the Clustered Networking capability, we are announcing a new set of HPC instances, available in preview today in our London (UK) and Ashburn (US) regions, with expansion into other regions in the future.

These new HPC instances are powered by Intel® Xeon® processors with a 3.7 GHz all-core frequency. To support local data checkpointing for MPI workloads, or local file access for cutting-edge deep learning workloads, these instances also include local NVMe SSD storage for predictable, high-performance I/O.
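A common way to use fast local NVMe storage for checkpointing is to write each checkpoint to a temporary file and then atomically rename it into place, so a crash never leaves a half-written file behind. This sketch assumes a hypothetical mount point (`/mnt/localdisk/checkpoints`) and uses Python’s `pickle` for serialization; `save_checkpoint` and `load_latest_checkpoint` are illustrative names, not part of any Oracle tooling.

```python
"""Minimal sketch: atomic application checkpoints on local storage.

DEFAULT_DIR is a hypothetical mount point for the local NVMe SSD;
point it at wherever the drive is actually mounted.
"""
import os
import pickle
import tempfile
from typing import Any, Optional

DEFAULT_DIR = "/mnt/localdisk/checkpoints"  # hypothetical NVMe mount point


def save_checkpoint(state: Any, step: int, directory: str = DEFAULT_DIR) -> str:
    """Serialize `state` to <directory>/ckpt_<step>.pkl without leaving partial files."""
    os.makedirs(directory, exist_ok=True)
    final_path = os.path.join(directory, f"ckpt_{step:08d}.pkl")
    # Write to a temporary file in the same directory first, so a crash
    # mid-write never leaves a truncated file under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes down to the device
        os.replace(tmp_path, final_path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial temporary file
        raise
    return final_path


def load_latest_checkpoint(directory: str = DEFAULT_DIR) -> Optional[Any]:
    """Return the state from the highest-numbered checkpoint, or None if there is none."""
    if not os.path.isdir(directory):
        return None
    # Zero-padded step numbers make lexical order match numeric order.
    names = sorted(n for n in os.listdir(directory)
                   if n.startswith("ckpt_") and n.endswith(".pkl"))
    if not names:
        return None
    with open(os.path.join(directory, names[-1]), "rb") as f:
        return pickle.load(f)
```

In an MPI job, each rank would typically pass its own subdirectory (for example, one named after its rank) so that ranks never overwrite each other’s files.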

We’ve also worked with Mellanox to deliver 100 Gbps RDMA with ultra-low latency for MPI workloads, supporting all market-leading MPI frameworks, including Intel MPI, Open MPI and Platform MPI. This is genuinely new ground: no other cloud provider has solved this problem at this scale.

“As organizations look to ensure they stay ahead of the competition, they are looking for more efficient services to enable higher performing workloads. This requires fast data communication between CPUs, GPUs and storage, in the cloud,” said Michael Kagan, CTO, Mellanox Technologies. “Over the past 10 years we have provided advanced RDMA enabled networking solutions to Oracle for a variety of its products and are pleased to extend this to Oracle Cloud Infrastructure to help maximize performance and efficiency in the cloud.”

Finally, we’re excited to offer these new instances in the cloud at a leading on-demand price of $0.075 per core-hour. You no longer need to spend hundreds of millions of dollars on purpose-built supercomputers such as Cray systems when you can run on-demand HPC clusters in Oracle Cloud Infrastructure for a couple of dollars an hour!
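Here is a back-of-the-envelope cost estimate at the quoted on-demand price. The 36-core node size is a hypothetical figure chosen for illustration, not a published instance shape; substitute the core count of whatever you actually launch. `cluster_cost` is an illustrative helper, and `Decimal` is used to avoid floating-point rounding in currency arithmetic.

```python
"""Back-of-the-envelope on-demand cost at $0.075 per core-hour.

The 36-core node in the example is hypothetical; use the real core count
of the instance shape you launch.
"""
from decimal import Decimal
from typing import Union

PRICE_PER_CORE_HOUR = Decimal("0.075")  # USD, the on-demand price quoted above


def cluster_cost(nodes: int, cores_per_node: int,
                 hours: Union[int, Decimal]) -> Decimal:
    """Total USD for `nodes` nodes of `cores_per_node` cores running `hours` hours."""
    core_hours = Decimal(nodes) * Decimal(cores_per_node) * Decimal(hours)
    # Round to whole cents for display.
    return (core_hours * PRICE_PER_CORE_HOUR).quantize(Decimal("0.01"))


if __name__ == "__main__":
    print(f"One hypothetical 36-core node: ${cluster_cost(1, 36, 1)} per hour")
    print(f"Four such nodes for a 10-hour run: ${cluster_cost(4, 36, 10)}")
```

Under that assumption, a single node comes to $2.70 per hour, in line with the “couple of dollars an hour” figure.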

Further innovation and commitment to Artificial Intelligence

If you’re a data scientist or an AI developer, we’ve got great news. You will be able to use our RDMA Clustered Network along with new GPU instances based on the HGX-2 architecture, providing over 1 petaflop of performance! With these new instances, Oracle Cloud becomes the first cloud provider with 32 GB Tesla Volta GPUs along with the new NVSwitch-based architecture. We’ve plugged these GPUs into our Clustered Network as well, so customers can launch GPU instances with a single click and run workloads that use RDMA across thousands of GPUs!
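For data-parallel training across many GPUs, gradients are commonly combined with a ring all-reduce (a reduce-scatter followed by an all-gather). The idealized cost model below is a sketch, not Oracle’s implementation, and shows why the interconnect matters at scale: as the number of ranks p grows, the bandwidth term levels off near 2n/β while the latency term keeps growing linearly with p. The 2 µs latency and 100 Gbps bandwidth defaults are assumptions, and `ring_allreduce_time` is a hypothetical helper.

```python
"""Idealized cost model for a ring all-reduce (reduce-scatter + all-gather).

Each of p ranks performs 2(p-1) steps, each sending a chunk of n/p bytes:
    T(p, n) = 2(p-1) * alpha + 2(p-1)/p * n / beta
The latency (alpha) and bandwidth (beta) defaults are illustrative
assumptions; the model ignores congestion, topology and compute overlap.
"""


def ring_allreduce_time(ranks: int, message_bytes: float,
                        latency_s: float = 2e-6,
                        bandwidth_bytes_per_s: float = 100e9 / 8) -> float:
    """Estimated seconds to all-reduce `message_bytes` of gradients across `ranks` ranks."""
    if ranks < 2:
        return 0.0  # a single rank has nothing to exchange
    steps = 2 * (ranks - 1)          # p-1 reduce-scatter steps + p-1 all-gather steps
    chunk = message_bytes / ranks    # bytes sent per step
    return steps * (latency_s + chunk / bandwidth_bytes_per_s)


if __name__ == "__main__":
    gradient_bytes = 100e6  # hypothetical 100 MB of gradients
    for p in (8, 64, 1024):
        t = ring_allreduce_time(p, gradient_bytes)
        print(f"{p:>5} ranks: {t * 1e3:6.2f} ms per all-reduce")
```

For that hypothetical 100 MB gradient, the model gives about 14 ms on 8 ranks and about 20 ms on 1,024 ranks, with the latency term alone contributing roughly 4 ms at the larger scale.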

These new instances will be available in 2019, launching in our major regions globally.

Our HPC ISV Ecosystem

At the recent Altair Global Conference, we also announced a new collaboration with Altair to offer HyperWorks CFD Unlimited, a ground-breaking engineering simulation service on Oracle Cloud Infrastructure that delivers computational fluid dynamics (CFD) solvers as a service. Advanced CFD solvers such as Altair ultraFluidX™ and Altair nanoFluidX™ are optimized on Oracle to deliver overnight simulation results for the most complex cases on a single server. You can find more information about this service at https://www.altair.com/oracle.

“We are excited to expand our relationship with Oracle,” said Sam Mahalingam, Chief Technical Officer for Enterprise Solutions at Altair. “We find that access to GPU compute resources can be challenging for our customers. The integration with Oracle’s cloud platform addresses this challenge, and provides customers the ability to use GPU-based solvers in the cloud for accelerated performance without the need to purchase expensive hardware. Ultimately this leads to improved productivity, optimized resource utilization, and faster time to market.”

Come see us at Supercomputing Conference 2018

We’re extremely proud and excited to be showcasing these new capabilities this week at Supercomputing Conference in Dallas, with our partners and customers in full force. Come see us at booth #2806 to talk to our engineering and product teams, get free credits and see hands-on demos.

Some of the other activities you should check out:

Come talk to us at the Oracle + Altair Happy Hour on Tuesday at 5pm
HPC instance demos at Intel’s booth #3223, plus a tech talk on Thursday 15th November at 12.00pm.
AMD instance demos at the AMD booth #2824, including a presentation on Tuesday 13th November at 11am.
Altair HyperWorks CFD Unlimited Presentation at booth #2833 at 11.30am on Tuesday 13th November.
NVIDIA’s Theater (Booth #2417, Hall D) on Wednesday 14th November at 3pm

HPC and AI at your fingertips…

Most cloud infrastructure is just a set of unrelated, commodity parts. It’s the enterprise’s responsibility to figure out which parts will work, and which portions of the application need to be rebuilt for the cloud. Oracle Cloud enables customers to run new AI workloads next to HPC workloads, next to database workloads, next to traditional applications. We’ve figured out how to run the hardest pieces of your applications, so you don’t have to. We provide the performance enterprises need, with the guarantees they require. We’re enabling HPC in a way that no other cloud can match. And we’re charging less for it.

See you in Dallas!