This month Prolific's cofounder Katia spoke with Columbia University's Sandra Matz about computational social science, do's and don'ts around data collection, and some big challenges in science today.
Katia: Sandra, you’re a Computational Social Scientist by training and Assistant Professor of Management at Columbia Business School in New York. What does a computational social scientist do? How would you describe this role?
Sandra: As a computational social scientist I typically ask questions that are psychological in nature, but I use computational methods to test these questions. To make this more specific, I am mostly interested in understanding people’s preferences, needs, and motivations by looking at their online behavior: We might look at their browsing histories, Facebook likes, or GPS data from their smartphones to infer people’s personality or predict whether they might be suffering from depression. Those predictions are made using machine learning techniques that help us turn large volumes of data into meaningful psychological constructs. Once this step is done, we can start thinking about behavioral interventions at scale. For example, we might think of ways in which we can help individuals who are at risk of depression to get better, or ways in which we can recommend products based on people’s personality.
K: How did you get into research?
S: I actually never wanted to do a PhD at all. Originally I wanted to go into leadership development. But then, during my bachelor’s studies at the University of Freiburg in Germany, I did an exchange year with Michal Kosinski at Cambridge’s Psychometrics Centre. We started exploring possible links between Facebook likes and personality. I found this research fascinating, so I applied for a PhD at the Psychometrics Centre – and was lucky enough to get accepted into the program. Ever since, we have been working on expanding this line of research and making it more accessible to other social scientists. But I certainly never thought I’d become a professor one day, which looking back now is really the best thing that could have happened to me!
K: What research questions are you most excited about right now?
S: In my field, we’ve made a great deal of progress in understanding people’s preferences, motivations, and needs by looking at their digital footprints.
My main focus right now is to better understand how we can use these insights to influence people’s behavior for the better. In one project, we’re currently collaborating with a bank in Central America to convince people to save more money. In this project, we ask a relatively simple question, but it’s one that we hope will have a big impact on how much people save: What do people care about and how can we speak to those motivations? We try to frame our communication such that it’s tailored towards people’s personalities. For example, we try to highlight specific saving goals that motivate extraverts, such as the prospect of spending an exciting and fun holiday with friends in the future.
By creating a fit between people’s personalities and saving goals through communication, we hope to promote saving behaviors that benefit people in the long term.
K: How do you typically collect your data? For example, if I were to allocate proportions to how I collected data during my PhD at the University of Sheffield, I’d say that I collected about 90% of my data online and 10% in the lab. Of course, there are further options, such as field research and using existing (i.e., secondary) data. What would be your percentages? Do you have any preferred way(s) of collecting data?
S: I collect about 5% of my data in the lab whenever I want to test for causality and I want to have control over what’s happening. Aside from that, I collect most of my data online. Online survey data (e.g., data collected via Prolific) constitutes about 30% of the data I collect.
All the remaining data is secondary, already existing data (e.g., smartphone sensing data). We often try to collaborate with companies that have already collected people’s digital footprints. It’s not always easy to get access to secondary data, but persistence pays off! Something I’d definitely recommend to anyone doing this type of research and trying to collaborate with companies: If you offer to analyse companies’ secondary data, charge them for it. It makes them a lot more committed and helps you collaborate successfully. Remember that you add a lot of value to them by providing them with research insights and scientific credibility. You’re an impartial source that can help them validate their assumptions. I know this sounds counter-intuitive, because we as researchers often think that the advantage we bring to the table is that we are mostly free labor. It’s true that this might help you get a company to talk to you, but it also makes it much more likely that they are going to drop the project halfway through.
K: You started using Prolific not too long ago. What kinds of studies have you run and what has your experience with our site been like?
S: I’ve mostly run regular online surveys via Prolific. But my most recent study on Prolific was actually one of the most complicated studies I’ve ever run, and if it wasn’t for the flexibility and support Prolific provided, it would have been impossible to see through. It involved two time points that were 30 days apart, so there was an entry survey at the beginning and an exit survey at the end. There was an experimental condition and a control condition. In the experimental condition (but not the control), we had people sign up to a betting website, where they were asked to make bets on climate change as part of a real betting market. For example, participants in the experimental condition were asked to predict how hot it would be in Florida in a week’s time and to bet money on their prediction. The goal was to see whether people would bet in line with or against their stated climate beliefs, and to test whether we could actively change people’s opinions on climate change by having them engage with and learn about the topic.
It was really amazing that Prolific allowed us to prescreen participants based on their climate beliefs. No other company would have allowed us to do such custom prescreening! And I was quite impressed by how little participant dropout we had and how good the quality of participants was across the board!
K: Is there anything we could do to make your Prolific experience even better?
S: The only thing that comes to mind is phone support. While your email support has been super reliable and fast, it would sometimes be so much easier to simply pick up the phone and ask a quick question. I know that this takes an insane amount of effort and manpower, but I think such a service would improve user experience – maybe something to add for high-value customers. Aside from that, I think Prolific is fantastic as it is!
K: This is a somewhat bigger and broader question, but I’m really curious. What do you think are some of the biggest challenges that science is facing today?
S: As amazing a workplace as academia is, there are quite a lot of challenges... A topic that I care about a lot, and that is still overlooked most of the time, is the potential friction between transparency and data protection. There is a growing demand for open science where researchers make their datasets and scripts available for others to replicate and explore. However, this gets quite tricky when it comes to Big Data and digital footprints. Even if you completely anonymise your data, participants might still be identifiable if you uniquely combine different demographic variables (e.g., their location, age, gender, ethnicity, things they Like on Facebook) and then reverse-engineer. There is a fascinating paper in Science showing that you only need three purchases to uniquely identify a person in a large dataset of credit card spending. That’s simply because there are not that many people who buy a Starbucks coffee at 2pm, pay for their parking in Stanley Street at 4pm and then have dinner at Bill’s at 7pm. A pressing question is how to get the balance between transparency and data protection right. How can we share as much as we can to allow for open science while still protecting our users?
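To make the re-identification point concrete, here is a minimal sketch in Python. It is not taken from the paper Sandra mentions; the records, the column names, and the `unique_fraction` helper are all made up for illustration. It simply counts what share of rows becomes one of a kind as more attributes are combined.

```python
from collections import Counter

# Hypothetical, made-up records with no names or IDs attached.
records = [
    {"city": "Leeds", "age": 34, "gender": "F"},
    {"city": "Leeds", "age": 34, "gender": "M"},
    {"city": "Leeds", "age": 34, "gender": "F"},
    {"city": "York", "age": 34, "gender": "F"},
    {"city": "York", "age": 51, "gender": "M"},
]


def unique_fraction(rows, columns):
    """Share of rows whose combination of values in `columns` occurs exactly once."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    singled_out = sum(1 for key in keys if counts[key] == 1)
    return singled_out / len(rows)


# Each added attribute can only keep or increase the share of unique rows.
for cols in (["age"], ["city", "age"], ["city", "age", "gender"]):
    print(cols, unique_fraction(records, cols))
```

In this toy data, age alone singles out one person in five, while city, age, and gender together single out three in five, even though no direct identifier is present.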
Another major challenge is around interdisciplinary work. While it’s valued a lot and encouraged by many, it is still really difficult to get it published in scientific journals. When I submit my work to psychology journals, it’s often deemed too technical. When I submit it to computer science journals, it’s not seen as technical enough. This leaves you wondering how to get interdisciplinary work published at all! It’s usually either the top journals or, if you don’t make it there, somewhere much less prestigious. Journals have started to recognize this and are slowly shifting to encourage more interdisciplinary work at the intersection of psychology and computer science, but there’s still a way to go.
Finally, I think that the publish-or-perish culture in academia not only creates undue stress but also hinders the scientific process itself. As a result of this culture we’re often not thinking big enough. Instead of planning long-term projects that could add huge value, we focus on short-term projects with immediate returns.
K: Thanks Sandra – it was fascinating to speak to you!