Research
Materials Science and Engineering at MIT (PDF)
The Scientific Data Flood: A Case Study of "How Much Information?"
Stuart Madnick, John Norris Maguire Professor of Information Technology, MIT Sloan School of
Management & Professor of Engineering Systems, MIT School of Engineering
MacKenzie Smith, Associate Director of Technology, MIT Libraries
Kate Clopeck, Masters of Science, Technology and Policy Program, MIT
June 2009
Abstract:
This case study gives examples of how data is created and stored by materials scientists and engineers
at MIT. The amount of data depends on specific research goals and the tools, experimental techniques,
and computational methods employed by the individual researcher. Both simulation and experiments
are used, with the simulations producing more data in the cases reported here. The ratio between
computation and data production varies widely. For example, a hundred-million-atom simulation might
produce only a few kilobytes of data. However, if the researcher wants to track the system at every time
step, a much “smaller” simulation (fewer atoms) could generate petabytes of data. Data retention practices
also vary widely from lab to lab. For example, in one lab, research data is stored on the students’ and
postdocs’ personal computers, with each person in charge of the data they generate. The first author
listed on the final publication is responsible for backing up the data onto a CD or portable hard drive at
the time of publication. Each year, the faculty member assigns one of her students to purge old data.
Other papers examine other labs at MIT.