Discovering the power of big data

Author: Rowland, Ashley

Two seniors at the Kellogg Institute for International Studies are learning that big data can be a game changer for scholars trying to tackle some of the world’s most intractable development problems.

Attina Zhang ’21 and Timothy Burley ’21 are both Kellogg International Scholars who are taking part in ambitious, years-long research projects that integrate their nontraditional combination of interests in social science and data science.

Zhang, a double major in sociology and applied and computational mathematics and statistics, is part of an interdisciplinary team studying early childhood development in rural Mexico. Led by Faculty Fellow Nitesh Chawla, the team is working with a local non-governmental organization that identifies and helps malnourished children within the community.

Meanwhile, Burley, a computer science major, developed a machine-learning analysis that allows social science researchers to comb through masses of data in seconds. His application is being used in a study led by Faculty Fellow Ernesto Verdeja that aims to identify the warning signs of impending genocides and prevent future state-led mass atrocities against civilians.

Both undergrads said the International Scholars Program, or ISP, has given them the opportunity to explore innovative ways to use big data – massive datasets too large or complex to be analyzed through traditional means – while working alongside their faculty mentors.

“Data can be something that’s very powerful and informative to people’s health,” Zhang said. “The potential to help others by collecting and analyzing data is huge.”

Endless possibilities

ISP pairs exceptional sophomores with Kellogg faculty fellows like Chawla and Verdeja, who work closely with students in research partnerships that typically last for the duration of their undergraduate careers.

Chawla, the Frank Freimann Professor of Computer Science and Engineering, is an expert in artificial intelligence, data science, and network science whose work looks at how technology can advance the common good. Verdeja is an associate professor in the Department of Political Science and the Kroc Institute for International Peace Studies who studies political violence, transitional justice, forgiveness, and reconciliation.

Both said the possibilities for using big data – and the future career possibilities for students who know how to gather, model, and interpret it – are endless.

“Imagine social scientists marrying the intuition of a decision maker with big data and artificial intelligence to translate into more effective decision-making capability,” Chawla said. “Imagine an improved understanding of human behavior from the scale of individual to society based on data. Imagine integrating big quantitative data with ethnographic or qualitative data to inform social theory and analysis. This can unravel new understandings and discoveries about our world.”

Verdeja noted that big data is taking on increasing significance in social science because it allows researchers to understand patterns that can’t be captured without complex computational methods.

“Given its growing importance, it is crucial for students to be exposed to these methods and approaches for addressing big social and political problems,” he added.

‘Dream big’

Zhang joined ISP because of her interest in exploring the ties between cyclical poverty and educational and economic disparities in the developing world. But over time, she became more interested in how data could be applied to research projects. In 2018, she joined Chawla’s project, which partners with Un Kilo De Ayuda, a Mexican NGO that detects children whose growth is stunted and works with parents to make sure they receive the nourishment they need.

In the past year, Chawla’s team has helped the NGO sort through, or “clean,” the masses of height, weight, and other health measurements its workers have collected during community visits.

“They collected such a huge amount of data but they couldn’t put it to use,” she said. “That’s why they found us.”

The team is now in the process of creating a phone app that mothers can use to record data about their children’s growth, allowing the NGO to track development in a more systematic way and intervene more quickly in urgent cases.

Chawla is head of the Lucy Family Institute for Data & Society, which is funding the project. He said Zhang’s contributions have included surveying literature on child development, developing a survey for Un Kilo De Ayuda’s social workers, and contributing to a published research study related to the project.

“Attina has learned about interdisciplinary research and how to be a data-driven scholar,” he said. “This data acumen, combined with normative thinking about problem-solving and the experience of working with an interdisciplinary team, will serve her well in the future.”

Zhang said that before she started working on the project, the idea of merging traditional social science research with data seemed incongruous. Now, she sees that both kinds of research are complementary, fitting together like pieces of a puzzle or a Venn diagram.

“In the past I felt like the quantitative and qualitative sides were too far apart, but I realize now that they’re all overlapping,” she said.

Zhang said Chawla has encouraged the team to “dream big” and to think broadly about how the app could be used by other NGOs worldwide. He’s also told them not to be discouraged by their youth or the limitations on travel and fieldwork posed by the coronavirus.

“He constantly engages the young people on the team to jump out of the box that the world puts us in and think about what we, as young people, can contribute to the world,” Zhang added. “He tells us not to waste this awesome and unique opportunity of living in this time of COVID, and challenges us to observe, to do, to learn.”

‘There’s so much insight we can glean’

When Burley joined Verdeja’s project examining the triggers of mass killings, student workers were sorting through and reading masses of news articles on genocides. But the amount of work was overwhelming, with the team looking at atrocities worldwide over a nearly 20-year period.

Using natural language processing (NLP), Burley developed what he described as a “workflow” that allowed the team to exponentially expand its scope of research.

“We went from reading 10 or 20 documents a day to having a computer read hundreds of thousands of documents in a couple of seconds,” he said.

The project investigates how certain events – assassinations, protests, and coups, for example – can spark state-led mass killings of civilians. Under what conditions are those events triggers, and are there particular combinations or sequences of events that are more dangerous than others?

Burley worked with Notre Dame’s Center for Research Computing (CRC) to develop a system that downloads news sources about trigger events and tags them according to different features.

“From a political science perspective, it’s very important to be able to do this because there’s so much insight we can glean about triggers of mass atrocities,” he said. “This can help us inform policies on preventing these things, or bring attention to the likelihood of their happening again.”

Verdeja said Burley’s efforts have been central to the project’s success, and the computer coding he helped develop could be applied to similar research.

“This is not only highly technical work – it also requires a lot of problem solving and thinking through a number of practical and big conceptual questions that tend to come up,” Verdeja said.

“Timothy has been great in this capacity. He is not only learning how to use sophisticated, cutting-edge computational tools as part of a big research team, but he also sees how computer science and the social sciences can work together to address some of the most challenging problems of our time.”

Also involved in the project are Kellogg International Scholar Abigail Sticha ’22; Angela Chesler, a Kellogg doctoral student affiliate studying political science and peace studies; and Paul Brenner, CRC senior associate director. Burley and Sticha coauthored a peer-reviewed article about the project published this year.

Burley was drawn to ISP because he wanted to combine his computer science skills with political science. And the outcome has been gratifying.

“Data-intensive work speaks to the need for data-intensive solutions,” he said.

“If there was just a simple program that could have done this from start to finish, we probably could have employed it and we wouldn’t have had to create this whole computational arm of the project.

“I’m excited to see where this takes us.”