Mary-Russell Roberson for Trinity Communications
Omar Melikechi took a curiously circuitous route on his way to becoming an assistant professor in the Department of Statistical Science. As an undergrad at Dartmouth, he majored in political science and took just one math class.
After graduating, he found himself collecting data as part of his job helping resettle refugees with AmeriCorps in Phoenix. “I became interested in how I could analyze the data and that led me to linear algebra,” he said. “I studied by myself at night for the next two years while working during the day. I realized I really liked math.”
After catching up on undergrad math classes at the University of Arizona, he earned a PhD at Duke, where he focused on random dynamical systems: systems that change over time in ways that aren’t easy to predict, such as the weather. For his postdoc at Harvard, he decided to return to his original interest, gleaning meaning from data. “I always wanted to work on applied problems and bridge the gap between theory and application,” he said, “so that led me to statistics.”
As a statistician, Melikechi studies variable selection: identifying which variables in a large set (genes, for example) are most strongly related to an outcome of interest, such as the survival rate of cancer patients. With large datasets, it’s challenging to untangle the possible relationships without producing a lot of false positives. In the example of genes and survival rates, a false positive would be labeling an irrelevant gene as relevant to survival.
Unfortunately, some statistical methods that prevent false positives have the unintended consequence of throwing out true positives with the bathwater. Melikechi has developed a method that hits the sweet spot. “We’ve shown theoretically and empirically that we could control false positives but attain more true positives,” he said.
He developed this method while working on a broader problem: how to illustrate relationships among the thousands, or tens of thousands, of variables and multiple outcomes that are common in studies that include genetic information.
Melikechi’s solution is to use statistics to generate networks that resemble social network graphs. They graphically illustrate relationships among many variables, and the relative importance of those relationships, in a way the human mind can understand. “You get a more holistic view,” he said.
Melikechi is also interested in using statistics to illuminate the black boxes of artificial intelligence and machine learning. “People are using AI and machine learning, but there’s not much understanding of what’s going on under the hood,” he said. “An algorithm outputs a solution, but you don’t know how it came to that solution. It’s my belief that mathematicians and computer scientists and statisticians are the ones equipped to understand these models better so we can gain insights into the scientific questions.”
As someone who wants to work at the intersection of theory and application, Melikechi is looking forward to using statistics to help Duke researchers, especially those focused on health-related questions and cancer.
“I think helping to cure or mitigate disease is one of the most important problems that people can work on, so I want to be a part of that,” he said. “What I’d really like to do is talk to people who are running experiments and collecting data and see what their statistics needs are. At Duke, there will be lots of opportunity to do that kind of thing.”
Melikechi is one of four faculty joining the Statistical Science department this fall. Read more about his new colleagues, Anya Katsevich, Lasse Vuursteen and Sifan Liu.