How to be disruptive with Software Defined Storage
Current scenario of a major service provider:
- storage platform based on 2x EMC VPLEX systems, each equipped with 90 TB of traditional HDD storage;
- backed up by a VNX system with an SSD FAST Cache.
This infrastructure was ageing, its support contracts were expiring, and the provider needed more performance and a better TCO over five years.
A traditional storage vendor could not deliver that: a disruptive Software Defined Storage solution was needed, built as an all-flash system of HGST NVMe SSDs and 2U24 JBOFs.
So we proceeded: the SDS system is built around two nodes, each a 2U server with 256 GB of DRAM and 2x 7.68 TB HGST SN260 NVMe SSDs, connected to one HGST 2U24 JBOF equipped with 24x 7.68 TB SS200 SAS SSDs.
The drives are attached to a hardware RAID controller and organized as a RAID 50 of two 11-drive RAID 5 spans, with the remaining 2x SS200 SSDs acting as hot spares.
The JBOF array is then partitioned into four logical volumes providing a total net capacity of 140 TB.
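As a sanity check on the quoted net capacity, here is a minimal arithmetic sketch of the layout above. One assumption is labeled in the code: the quoted "140 TB" reads as binary terabytes (TiB), while the 7.68 TB drive size is decimal, as marketed.

```python
# Sketch: net capacity of the RAID 50 layout described above.
# Assumption: the quoted "140 TB" is binary (TiB); the 7.68 TB drive
# capacity is decimal, as drives are marketed.
drive_tb = 7.68          # decimal TB per SS200 SSD
spans = 2                # RAID 50 = striping across two RAID 5 spans
drives_per_span = 11     # 11 drives per span, 1 drive's worth of parity each
hot_spares = 2           # outside the array, so not counted

data_drives = spans * (drives_per_span - 1)     # 20 capacity-bearing drives
net_decimal_tb = data_drives * drive_tb         # 153.6 TB decimal
net_tib = net_decimal_tb * 1e12 / 2**40         # ~139.7 TiB

print(f"{net_decimal_tb:.1f} TB decimal ≈ {net_tib:.1f} TiB")
```

That works out to roughly 139.7 TiB, which matches the ~140 TB figure once the parity drives and the two hot spares are taken out.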
The SDS layer is DataCore SANsymphony 10 PSP7, which auto-tiers between the 2x 7.68 TB NVMe SSDs (not mirrored, set up as Tier 0) and Tier 1, built on the SAS SSDs in the 2U24 JBOF.
The setup is fully redundant, based on a 1+1 scheme between the two servers. Moreover, if an NVMe card fails, all of its data is flushed to the 2U24.
The two servers are connected back-to-back via 2x 16 Gb/s Fibre Channel links, while the customer's infrastructure is connected through 2x 8 Gb/s Fibre Channel links per storage node.
The storage system is then exposed to 20x VMware vSphere 6.5 hosts plus some bare-metal servers.
The solution has proven extremely resilient, sustaining:
1) disconnection of the FC front end on one server;
2) disconnection of the back-end connection;
3) disconnection of one SAS cable, demonstrating that multipathing works;
4) disconnection of both SAS cables, isolating the 2U24 completely.
In every scenario the storage kept serving I/O and recovered from the failure.
Installation of the whole setup was extremely simple: it took just one day to install the software, create the RAID, and expose the storage pool to Windows (and DataCore) and to the servers.
High-level monitoring is handled by DataCore SANsymphony, while the RAID controller's management tool performs the low-level control.
Performance testing delivered impressive results: a 10 TB rebuild completed in less than one hour, and a bandwidth of more than 1.5 GB/s, measured through a Grafana dashboard.
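To put the rebuild figure in perspective, a quick back-of-the-envelope calculation (assuming decimal terabytes and taking the full hour as an upper bound):

```python
# Back-of-the-envelope check on the rebuild figure quoted above.
rebuild_tb = 10.0           # data rebuilt (decimal TB assumed)
rebuild_seconds = 3600.0    # upper bound: "less than 1 hour"

rate_gb_s = rebuild_tb * 1000 / rebuild_seconds   # decimal GB/s
print(f"rebuild rate > {rate_gb_s:.2f} GB/s")
```

That is a sustained rebuild rate above 2.7 GB/s, a level of back-end throughput that rotating media could not approach.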
Given the 2x 8 Gb/s connections to the host servers, this result essentially shows the HGST and DataCore SANsymphony solution saturating the available front-end bandwidth.
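The saturation claim can be sketched numerically. This assumes the commonly quoted usable payload of 8 Gb/s Fibre Channel (8GFC runs at 8.5 GBaud with 8b/10b encoding, leaving roughly 800 MB/s per link per direction):

```python
# Why >1.5 GB/s amounts to front-end saturation.
# Assumption: ~800 MB/s usable payload per 8GFC link per direction
# (8.5 GBaud line rate minus 8b/10b encoding overhead).
per_link_mb_s = 800
links_per_node = 2

theoretical_gb_s = per_link_mb_s * links_per_node / 1000   # ~1.6 GB/s
measured_gb_s = 1.5                                        # lower bound from testing
utilization = measured_gb_s / theoretical_gb_s

print(f"theoretical ≈ {theoretical_gb_s:.1f} GB/s, utilization ≈ {utilization:.0%}")
```

With more than 1.5 GB/s measured against roughly 1.6 GB/s of theoretical front-end bandwidth, the links are running at better than ~94% utilization, i.e. effectively saturated.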