Dean Wampler (Mastodon: @deanwampler) is an expert in AI/ML systems. He is the Engineering Director for watsonx Platform Engineering at IBM Research. Dean is the author of several books, including Programming Scala, Third Edition, What Is Ray?, and Fast Data Architectures for Streaming Applications. He blogs on various topics at deanwampler.medium.com. Dean contributes to several open source projects and speaks at many technology conferences and user groups. Dean has a Ph.D. in Physics from the University of Washington.
Open Source Science vs. Open Source Software: What's Different? What's the Same?
Running engineering for IBM's Accelerated Discovery Platform taught me a lot about the unique characteristics of open-source science (OSSci) vs. the more familiar open-source software (OSS). Let's explore these differences and take away some lessons for both scientists and software developers. Specifically, I will discuss the following:
The objectives of and motivations for OSSci differ from those of OSS. For example, reproducibility of scientific results is paramount, and different approaches are used to validate results.
The creators of OSSci are mostly scientists. Open source is a relatively new concept for them. They need to learn and apply modern OSS development skills and practices.
In OSSci, models and data are usually more important than the corresponding software, while in OSS the software has historically been most important. Because of the recent, explosive growth of AI, models and data are becoming more important in OSS, too!
Science had big data and massively parallel, distributed computation before the software industry needed them. Today, the core hardware and software tools are more similar than different, but moving from legacy approaches to modern ones is difficult.