In a profile shot, Sifan Liu looks to the right of the camera
Sifan Liu is one of four faculty joining the Statistical Science department this fall. (John West/Trinity Communications)

Sifan Liu: Helping Researchers Generate Trustworthy Results

While selective inference can work, there are landmines. Sifan Liu, assistant professor in the Department of Statistical Science, wants to help researchers avoid them.

“It’s reasonable to make scientific discoveries by exploring the data you have,” she said, “but you have to be careful. If you use the same data to come up with the conjecture and then again to confirm the conjecture, that’s circular and suffers from bias. My work corrects for the bias.”

Selection bias is one of the factors that can lead to results that can’t be replicated in follow-up studies. The term also covers such scientific no-nos as p-hacking or data snooping, where researchers test as many relationships as possible in a dataset in order to find something with a low p-value, that is, something “statistically significant.”

The simplest way to avoid selection bias is to use one subset of the data to develop a hypothesis and a separate subset to confirm it. But that means reducing the pool of data that can be used to prove the hypothesis, which in turn weakens the result, aka the inference.
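To make the trade-off concrete, here is a minimal simulation sketch. It is not Liu's method; it only illustrates the problem and the simple data-splitting fix described above. The setup is an assumption made for illustration: 20 pure-noise features with known unit variance, a z-test at the 5% level, and hypothetical names such as `simulate` and `z_stat`.

```python
import random
import statistics


def z_stat(values):
    """z-statistic for H0: mean = 0, assuming the variance is known to be 1."""
    return statistics.fmean(values) * len(values) ** 0.5


def simulate(n_trials=500, n_samples=100, n_features=20, seed=0):
    """Return the false-discovery rates of naive and split-sample testing."""
    rng = random.Random(seed)
    crit = 1.96  # two-sided 5% critical value for a standard normal
    half = n_samples // 2
    naive_hits = 0
    split_hits = 0
    for _ in range(n_trials):
        # Pure noise: every feature truly has mean zero,
        # so any "significant" finding is a false discovery.
        data = [
            [rng.gauss(0, 1) for _ in range(n_samples)]
            for _ in range(n_features)
        ]

        # Naive: choose the most extreme feature, then test it on the same data.
        best = max(data, key=lambda col: abs(z_stat(col)))
        naive_hits += abs(z_stat(best)) > crit

        # Split: choose on the first half, confirm on the untouched second half.
        best_split = max(data, key=lambda col: abs(z_stat(col[:half])))
        split_hits += abs(z_stat(best_split[half:])) > crit

    return naive_hits / n_trials, split_hits / n_trials


naive_rate, split_rate = simulate()
print(f"naive false-discovery rate: {naive_rate:.2f}")
print(f"split false-discovery rate: {split_rate:.2f}")
```

Under these assumptions, the naive approach should flag a "discovery" in well over half of the trials even though nothing real is there, while the split approach should stay near the nominal 5%. The cost is visible in the code: the confirming test sees only half of the observations.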

Liu is using machine learning to improve upon a statistically sound, but computationally difficult, method that uses all the data with none of the bias. “In the methodology we are developing, you can use all the data to come up with the model and all the data to do the inference,” she said. “It’s a more efficient use of the data because you are not throwing away any information.” 

While researchers have easy access to data these days, Liu knows that not everyone has the statistical experience or expertise to employ the most appropriate statistical methods in each case. “For regular practitioners or researchers, it’s too complicated to do selective inference,” she said, “so I want to develop some efficient and practical ways to do it. I want to develop some method or package that people can use off the shelf.”

After earning an undergraduate degree in mathematics from Tsinghua University in Beijing, Liu got a Ph.D. in statistics at Stanford University. Before joining the Duke faculty, she was a research scientist at the Center for Computational Mathematics at the Flatiron Institute in New York. She co-organizes a worldwide online seminar series on Monte Carlo methods.

Liu is one of four faculty joining the Statistical Science department this fall. Read more about her new colleagues, Anya Katsevich, Lasse Vuursteen and Omar Melikechi.