How a Digital Repository Is Democratizing Science From a Duke Basement


Doug Boyer was a hit at his daughter’s kindergarten show and tell.

The associate professor of Evolutionary Anthropology came armed with a life-sized, 3D-printed vertebra belonging to the world’s largest living snake, the green anaconda (Eunectes murinus). Once the students were done oohing and aahing over the plum-sized bone replica, he pulled a second vertebra, ten times larger than the anaconda’s, out of his bag. It was a life-sized replica of a vertebra belonging to Titanoboa, a snake that went extinct around 60 million years ago.

“If the anaconda is the length of a truck, can you imagine how big this one was?” he asked. The room erupted.

Thanks to MorphoSource, an NSF-funded, Duke-hosted digital repository of 3D scans of museum specimens, he isn’t the only one able to pull this trick. And that is precisely why Boyer created MorphoSource: to democratize access to specimens previously hidden away in museum drawers.

Boyer shows the two life-sized, 3D-printed vertebrae he brought to his daughter’s kindergarten class. On the left is a replica of a green anaconda’s vertebra. On the right, a replica of the vertebra of the extinct Titanoboa. Scans of both vertebrae were made available on MorphoSource for education and research by the University of Florida, Florida Museum of Natural History. (John West/Trinity Communications)


From a basement at Duke to the world

Museum specimens are often rare, if not one of a kind. That’s why museum exhibits are such a hit. But what we see in these exhibits is a small fraction of what a museum holds, the tip of a massive collection iceberg hidden away and kept safe in drawers, vials and fire-proof cabinets.

Doug Boyer, associate professor of Evolutionary Anthropology, created MorphoSource in 2013. (John West/Trinity Communications)

To ensure the protection of these collections, museums restrict access to all except accredited researchers willing and able to jump through multiple bureaucratic hoops — and often buy a plane ticket — to visit them in person.

Though necessary, these safeguards prevent the public from seeing or learning from the vast majority of specimens in museum collections. Even among researchers with the correct credentials, geographic distance and cost of travel can be insurmountable obstacles blocking access to these resources.

Enter MorphoSource.

It houses what one would see in a typical natural history museum exhibit, such as skulls or shells (you can even find Sue the T. rex within its ranks), but also specimens like grains of pollen, battle wounds from the Civil War, live animals in their natural environment, and much, much more.

The repository currently houses scans of over 53,000 biological, paleontological and archeological specimens from over 1,000 museum collections on all six inhabited continents. Researchers can upload and download CT scans, 3D models, photos, X-rays and a variety of other file types. Data has been contributed or downloaded by over 17,000 researchers, students, teachers and artists all over the world. By the end of 2021, MorphoSource had been cited as a source of material in over 1,300 scientific publications. And it is still growing.

Originally envisioned as a way to store 3D scans produced by the lab where Boyer worked as a postdoctoral researcher, MorphoSource is now one of the world’s most important scientific data repositories. In a recent survey asking natural sciences researchers which repositories were most important for their work, it tied for first place with GenBank, the National Institutes of Health genetic sequence database holding all publicly available DNA sequences. And it reached this status in less than 10 years.

MorphoSource was kickstarted in 2013, soon after Boyer started his faculty position at Duke. With funding included in his job offer from Duke, Boyer hired a web development firm, and MorphoSource slowly went from an idea to a concrete product hosted by servers in the Biology department. While it solved the fundamental need to have a place to archive and access 3D research data, it was still relatively limited in capacity and usability.

Three years later, Boyer recruited a graduate school lab mate turned software developer, Julie Winchester, to join him in the project. Having worked extensively with museum specimens as a Ph.D. student, Winchester had become a passionate advocate for data sharing.

“The only reason I was able to work with 3D data as a grad student is because I got a grant to travel to museums in the United States and around the world to go gain access to the physical specimens and 3D scan them,” she said. “Not everyone has even the possibility of getting these grants.”

Digital curator Mackenzie Shepard keeps the digital collection organized by assisting both uploaders and downloaders. She also sometimes scans specimens herself, using the Pratt School’s Shared Materials Instrumentation Facility. (John West/Trinity Communications)

Having the means to travel to museums to scan specimens was only half of the problem. Winchester says that without a public repository, researchers who were fortunate enough to be able to collect their own 3D data had no way of sharing it.

“We knew so many people working in this field who had a stack of hard drives full of 3D data just sitting in someone's lab,” she said. “Scientific data should be shared, especially since a lot of it is taxpayer funded.”

In 2017, Boyer and Winchester obtained a grant from the National Science Foundation, and MorphoSource took a big leap forward. The grant allowed them to hire a team of two additional developers and a digital curator, all with skills complementary to their own.

Simon Choy, a computer scientist, and Jocelyn Triplett, a library scientist with a master’s in classics, worked in partnership with Duke Libraries to refactor the software underpinning MorphoSource from the initial proof-of-concept version into a more complete repository solution. They also developed more efficient methods to upload and store large amounts of data in a searchable way, drawing on widely adopted concepts and tools from other academic and industry data repositories.

Mackenzie Shepard, then an undergraduate student working in Boyer’s laboratory, joined the team to sort through troves of data and ensure that researchers and institutions upload their scans correctly.

“We basically started from ground zero,” said Winchester, who leads the development team. “It took us almost three years to rebuild, expand and improve.”

There was no shortage of motivation. “I liked working on commercial web applications,” said Choy, who used to work for TiVo, “but working on a product that is actually helping a community, helping researchers and educating kids gives me a much greater sense of accomplishment.”
 

From scientific resource to educational tool

Boyer and Winchester weren’t satisfied with giving users the ability to download data from MorphoSource’s website.

“If you download the data,” said Boyer, “then you just have a large file on your computer, which means you need to have software. You need to have a computer that has a powerful enough graphics card, processor, or what have you.”

“Internet connections aren’t always phenomenal, and teachers in schools often don't have full ability, or sometimes any ability, to install software on educational computers,” said Winchester.

The new MorphoSource platform solved this problem with an online, interactive visualizer. Thanks to work by Winchester and an open-source software developer, MorphoSource can “optimize literally gigantic files for web viewing,” Boyer said. “We’ve provided a resource and functionality that isn’t replicated by anyone. Not even commercially, or outside of science — it’s unique.” Almost anyone can open the website, enter a keyword in the search bar or browse by object type, choose a sample and visualize it with no need to download the data.