Wednesday, 3rd May

10:15 - 11:00
Ceph as Monash University's research data engine (45-minute session) | Breakout session
In 2015 Monash University built the southern hemisphere's largest (known) Ceph cluster, containing 5PB of raw storage across 40 Dell servers with 1100 OSDs. Today that cluster is at almost 7PB and services a wide variety of research workloads from well over 1000 users across Monash's research-intensive Clayton precinct. Prior to this large deployment we cut our teeth on Ceph on its home turf - as block storage for OpenStack Cinder within Monash's node of the Nectar Research Cloud (a federation of distributed OpenStack cells). That was in 2013, and since then we've seen Ceph's popularity and community grow, with the first APAC Ceph Day hosted at Monash in late 2015.

At Monash, Red Hat Ceph Storage replaces legacy SAN storage for research computing, servicing use cases ranging from desktop CIFS file shares to instrument data capture and management, via HPC cluster home directories and software libraries. The wide variety of use cases in the research space is what attracted us to Ceph in the first place - a storage unicorn that could service block, object, and file workloads was a bandwagon we wanted to be on! Furthermore, the inherent difficulty of predicting capacity and performance needs meant that a software-defined approach with no vendor lock-in was a key requirement for controlling costs and ensuring virtually unlimited scalability.

In this talk we'll cover the high-level business and strategic challenges that led us to this solution and discuss how, with the support first of Inktank and then of Red Hat, we have tackled the challenges of architecting, deploying, and operating a cluster of this size (spoiler: it isn't that hard). We'll dive into some of the issues and pitfalls we've faced along the way and talk about some of the key use cases and scientific data workflows this infrastructure supports - for example, data capture, management, and sharing from various microscopes across several facilities and platforms. To wrap up, we'll look at our current progress in rolling out CephFS for production use cases and detail how we managed to keep the whole thing in service whilst it was physically moved to a new data centre - this relocation is being planned now and will be complete by the time of the summit, so come along to find out whether there was a happy ending!
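
For context on the Cinder tie-in mentioned above, here is a minimal sketch of driving Ceph's block layer (RBD, the same librbd interface that Cinder's RBD driver uses) from the python-rados and python-rbd bindings. The ceph.conf path, pool name, and image name are illustrative assumptions, not Monash's actual settings.

    import rados
    import rbd

    # Connect to the cluster via a local ceph.conf (path is an assumed default).
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        # 'volumes' is a hypothetical RBD pool name, as commonly used for Cinder.
        ioctx = cluster.open_ioctx('volumes')
        try:
            # Create a 10 GiB block-device image, much as Cinder does per volume.
            rbd.RBD().create(ioctx, 'demo-volume', 10 * 1024**3)
            # List images in the pool to confirm the new volume exists.
            print(rbd.RBD().list(ioctx))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()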