19 - Backup and Archival Data Storage Service for Science: from system model to backup/archive service in PLATON

Maciej Brzezniak, Gracjan Jankowski, Michal Jankowski (PSNC)

The Backup and Archival Data Storage Service provides 15 PB of data storage space to users in education and academia through popular data access protocols such as SFTP, WebDAV, a Web GUI and GridFTP. The service is operated by members of the PIONIER network, the Polish NREN, under the PLATON project (Service Platform for e-Science). It addresses users' need to protect large amounts of valuable data (in the range of tens of TBs) and to store them safely and reliably for long periods (on the order of a dozen years or more).
The service logic and functionality are provided by the National Data Storage (NDS) software stack. The NDS concept assumes distributed data storage with automatic and transparent replication of both data and meta-data. Data replication ensures that each portion of data is stored, synchronously or asynchronously, in the desired number of replicas across so-called Storage Nodes in local and geographically distant locations. Performance optimizations, including automatic selection between the NFS and GridFTP protocols for replica access, ensure high data access throughput and good responsiveness to small I/O operations. Meta-data are protected against failures by combining DBMS-level replication with synchronous operation logs written to distributed persistent storage, which ensures both data integrity and I/O efficiency.
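The replication and protocol-selection behaviour described above can be pictured with a small sketch. It is not NDS code: the `StorageNode` type, the site-spreading placement rule, the 1 MiB `SMALL_IO_BYTES` threshold and the `write` callback are all hypothetical. The sketch only shows how a fixed number of replicas might be split into synchronous writes (finished before the client is acknowledged) and asynchronous ones (queued in the background), and how a per-request choice between NFS and GridFTP might be made.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class StorageNode:
    """Hypothetical description of one Storage Node."""
    name: str
    site: str  # geographical location hosting the node


# Hypothetical cut-off: requests below this size count as "small I/O".
SMALL_IO_BYTES = 1 << 20  # 1 MiB

# Callback that stores one replica: (node, protocol, payload) -> None.
WriteFn = Callable[[StorageNode, str, bytes], None]


def choose_protocol(size: int) -> str:
    """Illustrative heuristic: NFS for small I/O, GridFTP for bulk transfers."""
    return "nfs" if size < SMALL_IO_BYTES else "gridftp"


def place_replicas(nodes: List[StorageNode], copies: int) -> List[StorageNode]:
    """Choose `copies` nodes, spreading them over as many sites as possible."""
    if copies > len(nodes):
        raise ValueError("not enough storage nodes for the requested replica count")
    chosen: List[StorageNode] = []
    seen_sites: Set[str] = set()
    for node in nodes:  # first pass: at most one replica per site
        if len(chosen) < copies and node.site not in seen_sites:
            chosen.append(node)
            seen_sites.add(node.site)
    for node in nodes:  # second pass: reuse a site only if still short of copies
        if len(chosen) < copies and node not in chosen:
            chosen.append(node)
    return chosen


def replicate(data: bytes, nodes: List[StorageNode], copies: int,
              sync_copies: int, write: WriteFn,
              pool: ThreadPoolExecutor) -> Tuple[List[StorageNode], List[Future]]:
    """Store `copies` replicas: the first `sync_copies` before returning,
    the rest asynchronously on `pool` (the returned futures track them)."""
    targets = place_replicas(nodes, copies)
    protocol = choose_protocol(len(data))
    synchronous, deferred = targets[:sync_copies], targets[sync_copies:]
    for node in synchronous:
        write(node, protocol, data)  # caller is blocked until these finish
    pending = [pool.submit(write, node, protocol, data) for node in deferred]
    return synchronous, pending
```

For example, with hypothetical nodes `poz-1` and `poz-2` at one site and `krk-1` at another, requesting two copies with `sync_copies=1` writes `poz-1` before returning and queues `krk-1`, so the two replicas land at different sites. Spreading replicas across sites first is a common way to survive the loss of a whole location; the actual NDS placement and protocol-selection policies may differ.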
The data and meta-data handling logic of NDS is hidden from users behind an abstract virtual file system interface (based on the FUSE library), on top of which the standard data access protocols run: SFTP, WebDAV, Web GUI (HTTPS) and GridFTP. Users can therefore store and retrieve data in the remote system with their preferred data transfer methods, and the service integrates easily with standard backup/archive tools.
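Every file operation that reaches the virtual file system through SFTP, WebDAV, the Web GUI or GridFTP also changes meta-data. The sketch below illustrates the synchronous operation log idea mentioned in the previous paragraph as a simple write-ahead log: each meta-data change is appended to a log and forced to disk with `os.fsync` before it is applied, so the state can be rebuilt by replaying the log after a crash. It is a stand-in built on assumptions, not the NDS design: a Python dict replaces the replicated DBMS, a local JSON-lines file replaces the distributed persistent storage, and the `MetadataStore` class with its `create`/`unlink`/`stat` methods uses hypothetical names for calls a FUSE handler might forward.

```python
import json
import os
from typing import Dict, List


class MetadataStore:
    """Hypothetical meta-data layer with a synchronous write-ahead operation log.

    A dict stands in for the replicated DBMS and a local JSON-lines file
    stands in for the log kept on distributed persistent storage.
    """

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.entries: Dict[str, dict] = {}
        self._replay()

    def _replay(self) -> None:
        """Rebuild meta-data after a restart by re-applying every logged operation."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                if line.strip():
                    self._apply(json.loads(line))

    def _apply(self, op: dict) -> None:
        """Apply one operation to the in-memory table (the DBMS stand-in)."""
        if op["op"] == "create":
            self.entries[op["path"]] = {"size": op["size"], "replicas": op["replicas"]}
        elif op["op"] == "unlink":
            self.entries.pop(op["path"], None)

    def _commit(self, op: dict) -> None:
        """Append the operation and fsync it to disk before applying it."""
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(op) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self._apply(op)

    def create(self, path: str, size: int, replicas: List[str]) -> None:
        self._commit({"op": "create", "path": path, "size": size, "replicas": replicas})

    def unlink(self, path: str) -> None:
        self._commit({"op": "unlink", "path": path})

    def stat(self, path: str) -> dict:
        return self.entries[path]
```

Creating `/a.tar` and then opening a new `MetadataStore` on the same log file yields the same entry, because the constructor replays every logged operation. Logging before applying is what makes a crash between the two steps recoverable; a production system would also need log compaction or checkpoints, which are omitted here.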
The poster discusses the main threats to data, such as data volume growth, data volatility, limitations of storage technologies and data migration issues. It also shows the main elements of the NDS concept and how NDS is deployed in the PLATON project infrastructure and the PIONIER academic optical network. Critical components of the PLATON data storage infrastructure are presented, including servers, disk arrays and tape libraries, along with their capacity and performance parameters. Finally, the poster shows how the PIONIER network enables users to store and access data efficiently, and how its broadband links are used to perform efficient and reliable replication across multiple system sites.
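To give a sense of why broadband links matter when replicating data sets of tens of TBs, consider copying a single 10 TB data set (decimal units) between two sites. Assuming, purely for illustration, a fully utilized link and no protocol overhead, the transfer time $t$ is the data volume $V$ in bits divided by the link rate $R$:

$$
t = \frac{V}{R}, \qquad
t_{10\,\mathrm{Gbit/s}} = \frac{10 \times 10^{12}\,\mathrm{B} \times 8\,\mathrm{bit/B}}{10 \times 10^{9}\,\mathrm{bit/s}} = 8000\,\mathrm{s} \approx 2.2\,\mathrm{h},
\qquad
t_{1\,\mathrm{Gbit/s}} = 80\,000\,\mathrm{s} \approx 22\,\mathrm{h}.
$$

The link rates here are assumptions, not PIONIER figures; the estimate only shows the order of magnitude, in which a tenfold faster link turns a day-long replication into a task of a couple of hours.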
