The Scientific Data Flood: A Case Study of "How Much Information?"
Stuart Madnick, John Norris Maguire Professor of Information Technology, MIT Sloan School of
Management & Professor of Engineering Systems, MIT School of Engineering
MacKenzie Smith, Associate Director of Technology, MIT Libraries
Kate Clopeck, Masters of Science, Technology and Policy Program, MIT
June 2009
Abstract:
This case study provides an early look into the data growth projections for the embryonic Earth System
Initiative (ESI) at MIT. ESI is an umbrella initiative facilitating the development of large-scale research
efforts in Earth system science and engineering. The case study focuses on the first project initiated
under ESI, the Darwin Project. Darwin is focused on the large-scale modeling of the physical and
biological processes in the oceans. Numerical models produce approximately 90% of the data generated
by ESI scientists; the remaining 10% of data is observational data recorded by NASA satellites or NOAA
oceanographers. ESI’s 25 researchers generated approximately 200 terabytes of data in the last year,
primarily from high-resolution calculations that model the physical and biological processes occurring
in a specific sector of the ocean. For example, one high-resolution calculation occupies one to
two months of machine time and produces 60 terabytes of data. Increases in the Laboratory’s computing
power and storage capacity have helped drive a 100-fold increase in data volume over five years. At this
rate, in the next five years ESI data production could reach 20 petabytes annually. The case concludes
with notes on data retention policies and metadata creation and use. Other papers examine other labs at
MIT.
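
The projection in the abstract can be checked with a short calculation. The sketch below is illustrative only: it takes the 200-terabyte annual figure and the 100-fold five-year growth factor from the abstract, and it assumes that growth continues at a constant exponential rate and that 1 petabyte equals 1,000 terabytes. The constant and function names are hypothetical and do not come from the case study.

```python
# Figures from the abstract; the constant-growth model is an assumption.
CURRENT_TB_PER_YEAR = 200   # approximate data generated in the last year
GROWTH_FACTOR = 100         # growth observed over one five-year period
PERIOD_YEARS = 5
TB_PER_PB = 1000            # decimal convention (assumed)


def projected_tb(years_ahead: float) -> float:
    """Annual output after `years_ahead` years of constant exponential growth."""
    return CURRENT_TB_PER_YEAR * GROWTH_FACTOR ** (years_ahead / PERIOD_YEARS)


# Equivalent year-over-year multiplier implied by 100x growth in five years.
annual_factor = GROWTH_FACTOR ** (1 / PERIOD_YEARS)

# Projected annual output five years out, in petabytes.
five_year_pb = projected_tb(5) / TB_PER_PB

print(f"Implied annual growth factor: {annual_factor:.2f}")
print(f"Projected annual output in five years: {five_year_pb:.0f} PB")
```

Under these assumptions, a 100-fold increase applied to 200 terabytes gives the 20 petabytes per year cited in the abstract, which corresponds to output growing roughly 2.5 times each year.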