Research

Climate Change at MIT (PDF)
The Scientific Data Flood: A Case Study of "How Much Information?"

Stuart Madnick, John Norris Maguire Professor of Information Technology, MIT Sloan School of Management & Professor of Engineering Systems, MIT School of Engineering

MacKenzie Smith, Associate Director of Technology, MIT Libraries

Kate Clopeck, Master of Science, Technology and Policy Program, MIT

June 2009

Abstract:
This case study provides an early look at data growth projections for the embryonic Earth System Initiative (ESI) at MIT. ESI is an umbrella initiative that facilitates the development of large-scale research efforts in Earth system science and engineering. The case study focuses on the first project initiated under ESI, the Darwin Project, which models the physical and biological processes in the oceans at large scale. Numerical models produce approximately 90% of the data generated by ESI scientists; the remaining 10% is observational data recorded by NASA satellites or NOAA oceanographers. ESI’s 25 researchers generated approximately 200 terabytes of data in the last year, primarily from high-resolution calculations that model the physical and biological processes occurring in a specific sector of the ocean. For example, a single high-resolution calculation occupies one to two months of machine time and produces 60 terabytes of data. Increases in the laboratory’s computing power and storage capacity have helped drive a hundredfold increase in data production over five years. At this rate, ESI data production could reach 20 petabytes annually within the next five years. The case study concludes with notes on data retention policies and on metadata creation and use. Other papers examine other labs at MIT.
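
The projection in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, it assumes decimal units (1 petabyte = 1,000 terabytes), and it assumes the hundredfold five-year growth continues at a constant compound rate, which the abstract offers only as a possibility.

```python
# Figures taken from the abstract.
CURRENT_TB_PER_YEAR = 200   # data generated by ESI researchers in the last year
GROWTH_FACTOR = 100         # observed growth over the past five years
PERIOD_YEARS = 5            # length of the observed growth period

TB_PER_PB = 1000            # assumption: decimal units, 1 PB = 1,000 TB


def annual_growth_factor(total_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_factor over `years`."""
    return total_factor ** (1 / years)


def projected_tb(current_tb: float, total_factor: float,
                 period_years: float, years_ahead: float) -> float:
    """Annual data production after `years_ahead`, assuming the same compound rate."""
    return current_tb * total_factor ** (years_ahead / period_years)


if __name__ == "__main__":
    yearly = annual_growth_factor(GROWTH_FACTOR, PERIOD_YEARS)
    print(f"Implied yearly growth factor: {yearly:.2f}x")
    for year in range(1, PERIOD_YEARS + 1):
        tb = projected_tb(CURRENT_TB_PER_YEAR, GROWTH_FACTOR, PERIOD_YEARS, year)
        print(f"Year {year}: {tb / TB_PER_PB:.2f} PB per year")
```

Under these assumptions, a hundredfold increase over five years corresponds to production growing roughly 2.5 times each year, and 200 terabytes multiplied by 100 gives the 20 petabytes cited above.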