The Founders and Leaders Series

Episode 13: Samuel Cohen, Fairgen

Mike Stevens Season 1 Episode 13

Episode Overview

Fairgen founder Samuel Cohen on synthetic data, digital twins and how AI is making market research more accessible to more organisations.

Episode Highlights

  • A taxonomy of synthetic data — Samuel maps the space across two axes: methodology (directional to foundational) and stakes (low to high), showing why different approaches suit different research contexts.
  • Why "boosting" has become standard — Fairgen's augmentation technology is now used by brands including T-Mobile and L'Oréal to dramatically expand the number of segments they can reliably report on, with three years of rigorous statistical validation behind it.
  • The problem with fully synthetic panels — Samuel explains why prompting LLMs to simulate survey respondents produces poor results: large language models are "averaging machines" that flatten individual variance and produce near-identical responses.
  • How Fairgen builds category-level digital twins — Each twin is anchored to a real person and enriched with category-specific survey data, clickstream, transactional and live news data, refreshed quarterly to maintain accuracy.
  • The democratisation of research — AI-powered tools — including digital twins and AI-moderated qualitative research — have the potential to make insights accessible to startups, SMBs and underserved markets that have historically been priced out.
  • Founder lessons — Samuel shares two core pieces of advice: challenge every assumption, and invest consistently in building relationships and networks.

About the Guest

Samuel Cohen is the founder and CEO of Fairgen, a synthetic data company serving the market research industry. He studied mathematics at Oxford and completed a PhD at UCL, spending the majority of his doctoral research at Facebook AI Research Labs, where his team built foundational generative models that now underpin mainstream image and video generation tools. He founded Fairgen four years ago, applying that background to the challenges of survey research and consumer insights. Fairgen works with large insight agencies and directly with enterprise brands including T-Mobile, L'Oréal and Coty.

Learn more about the impact of technology and AI on research, insights & analytics at Insight Platforms.

Mike

Hello everyone. Welcome to another edition of the Founders and Leaders series. This is where I have conversations with the people who are building the next generation of insights, research, technology, and capability. And today I'm delighted that I'm joined by Samuel Cohen, who is the founder and CEO of Fairgen. Samuel, welcome to the podcast.

Samuel Cohen

Thank you, Mike. Thanks so much.

Mike

Great to have you here. So in a nutshell, how would you describe Fairgen? Tell us what the company does.

Samuel Cohen

Of course. So about four years ago, we had this idea that the market research industry was gonna get blasted by these new AI technologies. And my research in the past had always been about synthetic data, so it was natural for me to basically try to bring synthetic data to this industry. And fast forward four years, it's the hottest topic. And at the core of this, we're bringing synthetic data to directional research and foundational research in different ways.

Mike

Yeah.

Samuel Cohen

We work with large insight companies and also directly with enterprise brands.

Mike

Yeah. Okay. Wow. So if we just scroll back, you were working in synthetic data before the era of LLMs and Transformers. So can you tell me a bit about that background? You know, I guess machine learning approaches, what, what was your background before you decided to found Fairgen?

Samuel Cohen

Yeah. So I had a super, I would say, technical background. I studied maths at uni in the UK and went to Oxford, did a master's in ML, and that was my first data exploration. I did a thesis there on basically the generation of images with synthetic models. Then I went to do a PhD at UCL in London and spent, I would say, 90% of my PhD time at Facebook AI Research labs, where we were basically building foundational models for generating all sorts of data. And that was pre-LLMs, you know, a few years before LLMs were even a thing. At the time it was so hard to do anything. Even generating a basic 28 by 28 image was really hard. And now things have really moved forward. And one of the models we created at Facebook AI Research labs is now one of the core components of all the image and video generation models that you see out there.

Mike

Right. Wow. Okay, cool. I knew you must have studied in the UK when you called it "Maths".

Samuel Cohen

Correct, correct.

Mike

Yeah, exactly. So tell me, synthetic data, it's polarizing the industry. Can you just break it down a little bit, because I think a lot of people find the language challenging. We've got synthetic, we've got digital twins, we've got personas, we've got augmented. How would you create a kind of rough taxonomy of the space to help people understand it?

Samuel Cohen

Yeah, a hundred percent. So I think there's two axes that I like to think about. One is methodologies and the second one is stakes, right? And different methodologies are good for different stakes, and that's basically how I create a grid of what people are doing in this industry. For methodologies, you can go from pure exploratory, directional research all the way to foundational research, where you have things like segmentations, brand trackers and so on. And on the stakes level, you have research that is meant to be used for micro-stakes decisions, and you have research that is used for super high-stakes decisions, like, should I invest $10 million to expand into this market? So obviously different stakes require different methodologies. And you have, in my opinion, synthetic data solutions that are crafted for the different cells of this grid. So on the Fairgen end, we started from high-stakes, foundational research capabilities and augmented them with various synthetic approaches that literally just augment a field that's already collected. And that's called boosting. I think now it's a standard in the industry. It's not really challenged that much anymore. And we are now also entering the other end of the spectrum, pure directional research, where it's more things that are named digital twins, although there are many versions of it. So that's my taxonomy of the space.

Mike

So augmenting, you say, is relatively uncontroversial now. You think it's been accepted, adopted, and you no longer get the same level of pushback. Is that fair?

Samuel Cohen

Yeah, I think we as a company have proven at enterprise scale that this kind of technology can be used. It's now used by Coty, L'Oréal, T-Mobile and so on. We had three years of statistical validation, super rigorous. And so now it's more about finding the right use cases for it rather than iterating or testing and testing. So for us as a company, it's now 10 times easier than it was three years ago. And I think now there's a lot more questions around digital twins, personas and stuff that are more synthetic than augmented. And that's basically where we're shifting focus in terms of education and thought leadership.

Mike

Yeah. Okay. Wow. Fascinating. Without giving away any secrets from the brands you mentioned, what are some of the big territories that people are excited about that are not just about pilots? You know, the real production use cases, or other trends and themes that are emerging.

Samuel Cohen

Yeah, so there's a use case that I've presented at various conferences that has always attracted some interest, because it's simple to understand and it's very common. If you take brand trackers, for example, everyone knows about the issue that as soon as you start to look at the segment level or cut level, like a DMA level in the US, then things start to become erratic across waves. And that's really a problem of variance and low segment sizes. So for example, in a tracker, maybe you have only five or 10 DMAs where you can actually report anything. And a lot of companies are eager to be able to report things at a deeper level so they can take decisions at a local level, like local media optimization. So that's what we do with T-Mobile, for example. They used to look at 21 DMAs. Now they're able to look at 98 DMAs. So I think that's a really simple and important example. Another one is segmentation. Segmentation is also something where, obviously, if you have larger segment sizes, you can do a lot more interesting things. So L'Oréal uses Fairgen in partnership with a company called IFOP in France,

Mike

Yep.

Samuel Cohen

like globally, across many countries. So I think those are two of the foundational research examples where

Mike

Yeah.

Samuel Cohen

interest.
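
[Editor's note] The erratic segment-level readings described above follow from basic sampling arithmetic: the margin of error on a proportion grows as the segment shrinks. Below is a minimal Python sketch, assuming simple random sampling and the normal approximation. The function name and the segment sizes are illustrative only and are not taken from the conversation or from Fairgen.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    p: observed proportion (0.5 is the worst case)
    n: number of respondents in the segment
    z: critical value (1.96 corresponds to roughly 95% confidence)
    """
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # Hypothetical segment sizes, from a small local cut to a national sample.
    for n in (30, 100, 400, 1000):
        moe_points = margin_of_error(0.5, n) * 100
        print(f"n={n:>4}: +/- {moe_points:.1f} percentage points")
```

Because the margin shrinks only with the square root of n, a small local cut can move by many points from wave to wave through sampling noise alone, which is one way to read the "erratic across waves" problem.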

Mike

Okay. Interesting. I was expecting you to talk about the innovation area, you know, the early stage of innovation. Is that an area that people are deploying in, or does it tend to be more about these because they feel like harder challenges? The brand tracking and the segmentation feel almost like harder challenges than some of that early stage innovation screening and ideation stuff.

Samuel Cohen

For everything that's more exploratory and early-stage innovation, boosting is not the right technology. Boosting is really a technology that allows you to reduce margins of error at the segment level, and for

Mike

Yep.

Samuel Cohen

innovation exploration, it's not the kind of thing that you're optimizing for. You are just optimizing for taking the right directional calls. Although I have to say that we are also doing some innovation research projects, but that's for foundational innovation projects that are super high stakes and later down the stages of the innovation funnel.

Mike

Yep.

Samuel Cohen

Now on the pure innovation side of things, that's where exploratory research and tactical research is more important.

Mike

Yep.

Samuel Cohen

that's basically where often you want to do a lot of iterations of research. It's painful to have to do one round of research and then wait six weeks to have your results. Right?

Mike

Yeah.

Samuel Cohen

where things like, digital twins, I think have a lot more potential,

Mike

Okay.

Samuel Cohen

to test ideas, iterate and then maybe later do a

Mike

Sure. Yeah. Okay. Interesting. It sort of brings us on really to some of the broader trends in the industry. So, you know, we talked a bit about segmentation. Before we get there, one of the common challenges or pushbacks that I imagine you hear often, and I hear a lot, is: is there a risk that with the boosting or the generation of synthetic data, what we're actually doing is flattening the extremes too much? You know, a central tendency, reversion to the mean. So with what you described as small sample sizes in cells with data that jumps around all over the place, the boosting effectively de-noises an awful lot of that. Is there a risk that you de-noise that to the point of blandness or, you know, lack of real insight? It's a challenge that I hear a lot. How would you respond to that?

Samuel Cohen

I think we, yeah, we need to separate the LLM based solutions and digital twins from boosting when

Mike

Yep.

Samuel Cohen

this question because things are very different in these different areas. So

Mike

Yep.

Samuel Cohen

a hundred percent, you are a hundred percent right. There's basically a trade off between variance and bias.

Mike

Yep.

Samuel Cohen

So traditionally, any statistical method that basically learns from a set of data and predicts things, especially at a local level, is gonna use knowledge from everywhere else to strengthen that small segment. And so basically you are reducing variance while increasing bias. And the bias basically brings you towards similar segments. So the key thing is, for very small segments, the variance reduction that's gonna happen post boosting is way more significant than the introduction of bias, and so

Mike

Right.

Samuel Cohen

Going to improve your results significantly.

Mike

Yep.

Samuel Cohen

large segment, like usually n equals like a hundred and fifty plus, what's gonna happen is the contrary, because the introduction of bias is going to be more significant than the diminishing returns in variance reduction. And then boosting is not a good fit. So

Mike

Yeah.

Samuel Cohen

That's basically the key thing to take into account, and it's very easy to measure in various ways, with parallel tests and so on. So that's a trade-off, and that's something that we always very

Mike

Yeah.

Samuel Cohen

explain to customers.

Mike

Okay. Interesting. Feels like a, kind of an inverted U curve, you know, where you're, you're basically, you know, you're, you're optimizing and then actually above a certain sample size, it, it doesn't work so much.

Samuel Cohen

Exactly.
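
[Editor's note] The variance and bias trade-off discussed above can be illustrated with a toy shrinkage estimator. This is a sketch under stated assumptions and not Fairgen's boosting method: the segment estimate is pulled halfway towards a pooled estimate that is treated as known exactly, and the values SIGMA2, DELTA and W are hypothetical. Mean squared error is variance plus squared bias, so the raw segment mean has error sigma^2 / n, while the shrunk estimate has w^2 * sigma^2 / n + ((1 - w) * delta)^2.

```python
def mse_raw(sigma2: float, n: int) -> float:
    """MSE of the plain segment mean: unbiased, so the error is all variance."""
    return sigma2 / n


def mse_shrunk(sigma2: float, n: int, delta: float, w: float) -> float:
    """MSE of w * segment_mean + (1 - w) * pooled_mean.

    delta: true gap between the segment and the pooled population (source of bias)
    w:     weight kept on the segment's own data (1.0 means no borrowing)
    The pooled mean is assumed to be known without error, a simplification.
    """
    variance = (w ** 2) * sigma2 / n
    bias_squared = ((1 - w) * delta) ** 2
    return variance + bias_squared


if __name__ == "__main__":
    SIGMA2 = 0.25  # hypothetical: variance of a yes/no answer at p = 0.5
    DELTA = 0.05   # hypothetical: segment truly differs by 5 points
    W = 0.5        # hypothetical: borrow half the estimate from the pool

    for n in (20, 50, 150, 400, 1000):
        raw = mse_raw(SIGMA2, n)
        shrunk = mse_shrunk(SIGMA2, n, DELTA, W)
        winner = "shrunk" if shrunk < raw else "raw"
        print(f"n={n:>4}: raw={raw:.5f} shrunk={shrunk:.5f} lower error: {winner}")
```

With these made-up values, borrowing helps for the small segments and hurts for the large ones. Where the crossover falls depends entirely on the assumed gap and weight, which is why it has to be measured rather than assumed.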

Mike

Okay, great. So you mentioned digital twins and, you know, there's a lot of innovation happening in the industry, a lot of it powered by AI. You've come into the industry from the outside, so I guess you've got a perspective that combines both having worked in it for the last few years and that broader knowledge. What do you see as some of the bigger, interesting trends that are happening in the research and insight space?

Samuel Cohen

Yeah, basically, I would say that there are so many approaches and variants to bringing a digital twin solution into this industry right now, and that's creating a lot of noise. So even within digital twins, there's maybe 10 ways of doing it. And basically we at Fairgen have a pretty strong position on how we think this should be done.

Mike

Right.

Samuel Cohen

Let me maybe introduce a bit the spectrum of what's happening right now

Mike

Okay.

Samuel Cohen

from one end. I would say the worst end of all is basically what people call fully synthetic panels. Simplifying a bit, what people are doing is creating a panel of a thousand US 18 plus people with age, gender, region, income, all these things. And then for a new survey, you are basically going to prompt LLMs, GPT 5.5 or whatever, saying you are this age, this gender, this whatever, answer these questions. And that's

Mike

Okay.

Samuel Cohen

terrible for so many reasons. I can explain very simplistically why. The main thing is that LLMs, like GPT 5.5 or previous versions, or Claude, whatever, they're trained on the whole internet, right? So they're basically averaging machines,

Mike

Okay.

Samuel Cohen

They're literally averaging machines. So basically the variance in the answers of these at the individual level is basically none, and you just get an average answer to everything. And that's why the results are terrible and these kinds of solutions don't really work that much. On the other end of the spectrum is what we think is right, which is to basically anchor every single twin that we are ever going to use in data from a real person. So a twin is one-to-one with a real

Mike

Yep. Yep.

Samuel Cohen

At a category level.

Mike

Hmm.

Samuel Cohen

Because if you're just doing US 18 plus, and you collect some data on individuals, even a 30-question questionnaire, you won't have information on what bank this person is using, how much this person is spending on groceries, and whether this person prefers Pepsi or Coke. So that wouldn't work. So what we say is you have to create audiences that are category level, even subcategory level. Soft drinks, for example, is a category. And then you can basically run studies like that, collect a lot of information on these people at a subcategory level, and then you can use these audiences of twins per category. Like, I wanna test a new concept of soft drink. I'm gonna use a soft drink category audience where all the twins that I'm using are based on a real person for whom I have collected everything I need at the soft drink level.

Mike

Yep. Yep.

Samuel Cohen

doing that works really great. Now, obviously that's a lot more expensive, a lot more complex in terms of data orchestration, processing and so on. And that's why we also partner with great companies, data companies, insight data companies, that allow us to get this coverage.

Mike

Okay, so the, we've got the two extremes of that, that spectrum. The, the digital twins, the one-to-one. Is that, I mean, how frequently would you need to update that or refresh it? Does that tend to vary by category, I guess by subcategory and, and the vertical dynamics?

Samuel Cohen

Yeah, that's usually the question I get every time I say that, because it is very obvious to anyone that has collected data, and survey data especially, that it's a complicated process. Unfortunately, you have to refresh data super often. So our recommendation there is to refresh at a quarterly level. So you have to have new primary data pretty much every quarter.

Mike

What.

Samuel Cohen

another thing that helps is to have secondary data that can be refreshed more often: industry reports, state of X and so on and so forth. And you can put more in the secondary data, like interviews and so on. Another thing is we're also enriching the twins, the modeling itself, with various other streams of data through our partnerships, like clickstream data to understand what's really hot in a category at a click level, which is obviously super granular,

Mike

Mm-hmm.

Samuel Cohen

transactional data, and live news data also. That's also something that we're using. So there are many things that allow us to get a state of the world that's as refreshed as possible, but yeah, you still need primary data.

Mike

Okay. So the core of the twin is direct survey conversation, the data collected is category specific, but then it's enriched or kept dynamic and up to date with additional secondary data sources: transactional, behavioral, whatever you can get for that category. Is that correct?

Samuel Cohen

Completely right. Yes.

Mike

Yeah. Okay. Very interesting. What else is going on that you've observed in the research and insights industry? Where else do you think there's some interesting innovation happening?

Samuel Cohen

Yeah. So earlier I spoke about low stakes and super high stakes, and I think that we need to give a lot more attention to the small and medium stakes decisions that need to be made. I think that there's two paths that are complementary, actually. One path is this whole digital twin world. The second path is the AI moderated qual direction, all the Listen Labs, Outsets and Knits of the world. I think they're doing great work. They're great technological companies and they're also super needed. I think user research, for example, is something where these guys are going to be super useful in the coming years. So I think both of these categories are going to take a big chunk of the directional, exploratory market. So yeah, I think these are the two directions that I'm most excited about right

Mike

Yeah. Okay. It's definitely the two territories where we've seen the most new startups. So if we look at the volumes of companies, you know, submitting or adding to the, the Insight Platforms directory you know, AI moderated research in all its various flavors. You know, some towards more of the in depth qualitative, some more about survey enhancement. And then. In this very broad territory, like you, you know, you, you sort of mapped for us at the start, but if you include personas, digital twins, you know, synthetic, augmented an awful lot of different attempts to build, you know, using those frameworks as well. So. Yeah, there's a lot happening. What do you think will be different in, you know, let's say three years time? There's a huge amount of innovation. Now there's a lag with corporate adoption because that's how these things work. Enterprise takes time to, to absorb these changes. But let's imagine in, you know, in three years time, what might the industry look like? How might it be different?

Samuel Cohen

So let me try to not say AI everywhere because I think that's very obvious. I can maybe tell you what I'd like to see,

Mike

Yep.

Samuel Cohen

in, in three years.

Mike

Okay.

Samuel Cohen

I think that the root cause of why research has been built and is here as an industry is to make companies more customer centric. I think that's generally the reason why we're doing research, right? And customer centricity has a lot of great downstream implications that are great for businesses. Unfortunately, research is not the most accessible industry and set of tools, for many reasons, but mostly cost and time. We're shifting into a world where basically you have huge industry-level changes, with companies that go from inception to massive companies, like Cursor,

Mike

Yeah.

Samuel Cohen

so on. And basically I believe that research will be a key component in getting more companies through that kind of path. But it has to be made more accessible. And I think that's what AI opens as a possibility, to make it more accessible. So reducing the time to insights

Mike

Yeah.

Samuel Cohen

the cost of insights. And my hope is that basically startups, SMBs and companies that couldn't afford research from a time and cost perspective will be able to afford it. And it won't be something that is just reserved for large and massive enterprises.

Mike

Yeah, interesting. Democratizing

Samuel Cohen

Yeah.

Mike

to research. I think you said something earlier, which is a different dimension of that: even in large enterprises where they have substantial research budgets and are doing what looks like a lot of work, there's the potential to use data insights for lower stakes decisions, you know, medium and lower stakes decisions that would never have had justification for budget with the traditional ways of doing things. And there's maybe a third dimension I was struck by. You talked about the AI moderation opportunity. With Anthropic's research with its user base, with 80,000 participants, what they were able to do was to get robust insights into populations that often don't get a voice. So, you know, markets in Sub-Saharan Africa and reach like that. So I think I'm just furiously agreeing with you that the development of these tools is likely to have this expansive, democratizing impact in these different dimensions.

Samuel Cohen

A hundred percent. Yeah, we need to be able to increase the reach of these kinds of audiences. And I think now we have companies that basically have customer bases that go from zero to hundreds of a couple of years.

Mike

Yep.

Samuel Cohen

I think these companies are gonna be also really smaller as using these audiences, basically to do more. And that's gonna create some eagerness for streamlined research tools like, again, twins and AI moderated qual.

Mike

Yeah. Yeah. Okay, great. If we think about your journey you know, you, you had a number of different sort of interests, roles, academic and you know, working for Facebook Labs and then building Fairgen. What are some of the lessons that you've learned as an entrepreneur and a founder? What are, what are some of the big learnings you'd, you'd like to share?

Samuel Cohen

There is this super famous sentence by Ben Horowitz from a16z (Andreessen Horowitz)

Mike

Yep.

Samuel Cohen

where he says, when I was a founder, I basically used to sleep like a baby: I would wake up every hour and cry. And he's the kind of guy that sold massive companies and now has probably the most successful VC, and

Mike

Yep.

Samuel Cohen

what he was trying to say was that the successful founders are the ones that are resilient and are basically successful at managing the down news and the happy news, just taking them in. So I feel like being a founder is like being in MMA: you just have to take the heat and stay up, and stay up the longest. And that's my experience here. So that's the resilience part. I think another part is I've really learned that having a good network of people that can help you get there is super important. We are from outside this industry, and that allowed me to learn so much in understanding all the dynamics and how to do things right, and also make friends along the way. So I think I

Mike

Yeah.

Samuel Cohen

the two things that are most excited

Mike

Yeah.

Samuel Cohen

about.

Mike

Okay. Yeah. Fascinating. So be prepared to be punched in the face and keep going. Yeah. But I think the external perspective can be uniquely valuable as well if it's accompanied by the right kind of growth mindset. You know, the desire to learn and the humility for it. There is some resistance amongst people who've been in the research industry for a while. When they see new tech startups founded by people they don't know from outside the industry, they can be a bit like, well, you know, you're not from here. What do you know about this? I have to say, I work with founders right across the spectrum, who've always been in this industry and who've come to it fresh. And there are some who have come into the industry without any real background in it, and have come with that inquiring mind, with that desire to learn, without the sort of arrogance of we're gonna disrupt everything and turn it upside down and reinvent it.

Samuel Cohen

Yeah.

Mike

and have built incredible businesses. You know, I'm not gonna name them here, but there's some very, let's say, humble founders who've come in knowing what they don't know and very prepared to learn. So I think it's a good observation.

Samuel Cohen

I think sometimes you can challenge the things that are written in stone. To give an example, once I needed to run 20 fields in two days.

Mike

Yep.

Samuel Cohen

When I say field, I mean write the questionnaires, program them, then run the fields, for a super important project, in two days and in a couple of countries. And when I went to a partner, obviously that's not the kind of thing that you can usually ask in this industry, but luckily the partner is also someone that understands these kinds of things. And that's the kind of thing that I think if I was from this industry all my life, I

Mike

Yep.

Samuel Cohen

even dared to ask for some.

Mike

Exactly. Yeah. There's a lot of tension between the fresh perspective and recognizing the constraints, but hopefully inventing something new along the way. Yeah. Great. If you had to give one piece of advice, you know, if a newish founder in this industry were coming to you for advice, what would be the most important thing you could.

Samuel Cohen

Yeah. So I think there's two things. The first one is challenge every single assumption that you are getting and hearing. And I've learned that from the Elon Musk book. He's a guy that's obviously super controversial and polarizing, but I think he managed to do things that are absolutely crazy. He got into the space field knowing nothing about space and challenged everything that had been set in stone for dozens of years, right? So I would say challenge every single assumption. And that's also the kind of thing AI moderated qual has proven to us.

Mike

Yep.

Samuel Cohen

That that's the first thing, challenge every assumption. And the second thing is, network.

Mike

Yep.

Samuel Cohen

done like maybe four, four trade shows in a. The last four years,

Mike

Yep.

Samuel Cohen

while obviously having a lot of work on the side, but I always took the time to do these trade shows and build like, longstanding relationships and friendship that I

Mike

Yeah.

Samuel Cohen

brought us like so much to, to this company. So yeah, that's the two things I would say.

Mike

Yeah. Okay, great. Good advice. And obviously, keep yourself informed by subscribing to Insight Platforms and,

Samuel Cohen

sense.

Mike

good stuff. Okay, Samuel, that's been a great conversation. We've covered a lot of ground. We talked about, you know, digital twins, synthetic data, the space, the innovations that are happening, and some very useful lessons that people can apply if they're considering building their own tech startup in this space. Thank you very much for sharing those insights with us. And for those of you watching or listening, we have many other episodes in the pipeline. You can also find out much more about the topics we've discussed on Insight Platforms. There are articles, there are eBooks, there are some of the corporate researcher guides, and we have more in the pipeline on related topics. So do check those out if you wanna find out more. So Samuel, CEO and founder of Fairgen, thank you very much for your time, and we'll look forward to seeing you all again at future episodes. Thank you.

Samuel Cohen

Thank you, Mike.