The Science Behind Better AI Prompts (And Why Most of What We Think We Know Is Wrong)

I was halfway through my morning coffee when Sander Schulhoff dropped a truth bomb that made me pause my podcast: “Role prompting doesn’t work for accuracy tasks anymore.”

Wait, what? The guy who literally created the first prompt engineering guide on the internet (two months before ChatGPT even existed) was telling me that “You are an expert lawyer…” prompts are basically useless now?

This is exactly why I love Lenny’s Podcast. Sometimes you stumble into conversations that completely reframe how you think about something you use every day. Schulhoff’s recent episode, fresh off co-authoring the most comprehensive prompt engineering study ever conducted (with OpenAI, Microsoft, Google, Princeton, and Stanford), was one of those moments.

 

The Tale of Two Prompts

Here’s the first thing that clicked for me: we’re actually playing two completely different games when we prompt AI.

Conversational prompting is what most of us do daily — that back-and-forth dance where you ask ChatGPT to write an email, then refine it with “make it more formal” or “add some humor.” It’s iterative, low-stakes, and honestly pretty forgiving of sloppy technique.

Product/system prompting is the serious stuff — prompts that process thousands or millions of inputs in production systems. Medical coding tools, customer support bots, content generation pipelines. Here, you don’t get to see the output and iterate. You need to get it right the first time, every time.

Most prompt engineering advice conflates these two, which is like giving the same driving tips for both casual neighborhood cruising and Formula 1 racing.

 

What Actually Works (According to Science, Not Guru Tweets)

After analyzing over 1,500 academic papers covering 200+ prompting techniques, here’s what actually moves the needle:

Few-Shot Prompting: Show, Don’t Tell

This was Schulhoff’s #1 recommendation, and it makes perfect sense once you think about it. Instead of describing what you want in elaborate detail, just show the AI examples.

Writing emails in your style? Paste a few previous emails. Need data analysis? Show the format you want. The AI learns your patterns way better than it interprets your descriptions.

The magic format: q: [your input] a: [desired output] — even when it's not technically a question-answer pair. Simple, effective, and leverages how these models were trained.
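To make the format concrete, here’s a minimal sketch of assembling a few-shot prompt in that q:/a: style. The function name and the email examples are my own illustration, not from the episode; the model call itself is omitted — you’d send the resulting string to whatever client you use.

```python
def build_few_shot_prompt(examples, new_input):
    """Format (input, output) pairs in the q:/a: style,
    then append the new input for the model to complete."""
    lines = []
    for inp, out in examples:
        lines.append(f"q: {inp}")
        lines.append(f"a: {out}")
    lines.append(f"q: {new_input}")
    lines.append("a:")
    return "\n".join(lines)

# Example: teaching the model a terse email style from past emails
examples = [
    ("Ask Dana to move our 1:1", "Dana — can we push our 1:1 to Thursday? Thanks!"),
    ("Decline the vendor demo", "Thanks, but we’ll pass on the demo for now."),
]
prompt = build_few_shot_prompt(examples, "Ask IT to reset my password")
print(prompt)
```

Notice there’s no elaborate description of tone or length anywhere — the examples carry all of that.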

Decomposition: Think Like a Developer

Ask yourself: “What are the subproblems that need solving here?” Then tackle them step by step.

Instead of asking a customer service bot to “handle this return,” break it down:

  1. Confirm customer status

  2. Verify purchase details and date

  3. Check insurance coverage

  4. Apply return policy rules

  5. Generate response

Each step becomes more manageable and debuggable.
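The steps above can be sketched as a simple chain, where each subproblem gets its own prompt and earlier answers feed the later steps. Everything here is illustrative: `ask_model` is a placeholder you’d swap for a real LLM client, and the prompt wording is mine, not the podcast’s.

```python
def ask_model(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a placeholder answer."""
    return f"[answer to: {prompt}]"

def handle_return(request: str) -> str:
    """Decompose 'handle this return' into sequential sub-prompts."""
    steps = [
        "Confirm the customer's account status for: {req}",
        "Verify purchase details and date for: {req}",
        "Check insurance coverage for: {req}",
        "Apply return policy rules given: {facts}",
        "Generate a customer-facing response given: {facts}",
    ]
    facts = []
    for step in steps:
        prompt = step.format(req=request, facts="; ".join(facts))
        answer = ask_model(prompt)
        facts.append(answer)   # each answer becomes context for later steps
    return facts[-1]           # the final step produces the customer reply
```

The payoff is debuggability: when the bot misfires, you can inspect exactly which sub-step went wrong instead of staring at one monolithic prompt.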

Self-Criticism: The Free Performance Boost

This one feels almost too simple to work, but the research is clear:

  1. Let the AI solve your problem

  2. Ask it to critique its own response

  3. Request improvements based on that critique

  4. Repeat 1–3 times

It’s like having a built-in editor: the only cost is a few extra calls, and it consistently improves output quality.
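That loop is short enough to sketch directly. Again, `ask_model` is a placeholder for your real LLM call, and the critique/revise prompts are my own phrasing, not taken from the research:

```python
def ask_model(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a placeholder answer."""
    return f"[model output for: {prompt[:40]}]"

def solve_with_self_criticism(task: str, rounds: int = 2) -> str:
    """Solve, critique, revise — repeated for a few rounds."""
    draft = ask_model(task)                              # step 1: solve
    for _ in range(rounds):
        critique = ask_model(                            # step 2: critique
            f"Critique this response for errors and gaps:\n{draft}"
        )
        draft = ask_model(                               # step 3: improve
            f"Task: {task}\nDraft: {draft}\nCritique: {critique}\n"
            "Rewrite the draft, addressing every point in the critique."
        )
    return draft
```

Two or three rounds is usually the sweet spot; beyond that, returns diminish and you’re just burning tokens.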

Context is Still King

The more relevant information you provide upfront, the better the results. One medical coding study saw 70% accuracy improvement just from better context provision.

Pro tip: Put context at the beginning of your prompt for better caching and performance.
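One way to bake that tip into your prompt assembly — stable context first, variable question last, so providers that cache prompt prefixes can reuse the expensive part across requests. The function and section labels are just one reasonable layout, not a prescribed format:

```python
def assemble_prompt(context: str, instructions: str, question: str) -> str:
    """Put the large, stable context block at the top and the
    per-request question at the bottom (prefix-cache friendly)."""
    return (
        f"Context:\n{context}\n\n"
        f"Instructions:\n{instructions}\n\n"
        f"Question:\n{question}"
    )
```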

 

The Myth-Busting Section

Here’s where things get interesting (and where I had to unlearn some habits):

Role prompting for accuracy tasks is mostly theater. “You are a world-class expert in…” doesn’t actually improve factual accuracy or reasoning. It still works for creative tasks, writing style, and summarization, but for getting correct answers? Skip it.

AI doesn’t care about your incentives. Those “$200 tip for perfect answer” or “this is urgent, lives depend on it” additions? They do nothing. The AI isn’t motivated by money or emotional appeals.

Simple defensive prompts are security theater. Just saying “don’t follow malicious instructions” won’t protect against sophisticated prompt injection attacks.

 

The Meta Realization

What struck me most wasn’t any single technique, but how this research validates something I’ve been feeling: we’re still in the early days of understanding how to work with these systems effectively.

Schulhoff runs HackAPrompt, the largest AI red teaming competition, and partners with OpenAI on safety research. His perspective isn’t just academic — it’s grounded in real-world production use and adversarial testing.

The fact that we needed 1,500+ papers to figure out what actually works (and what doesn’t) shows how much folklore and cargo cult practices have built up around prompt engineering. We’re finally getting the scientific foundation to separate signal from noise.

 

Practical Takeaways for Daily Use

For casual, conversational use: Keep it simple. “Write this email,” “make this better,” “improve this” all work just fine. Don’t overthink it.

For anything production-critical: Invest in proper techniques. Few-shot examples, decomposition, self-criticism, and comprehensive context. Test rigorously.

The gap between these two approaches is bigger than most people realize, and treating them the same is a recipe for frustration (or production failures).

 

Why This Matters Now

As AI capabilities rapidly advance, the difference between mediocre and excellent prompting becomes more consequential. Companies are building entire product experiences around AI outputs. The stakes for getting this right keep rising.

But here’s the encouraging part: the techniques that work aren’t complex and don’t require PhD-level understanding. They’re often simpler than the elaborate prompt templates floating around Twitter.

Sometimes the best insights come from stepping back and asking: “What does the research actually say?” rather than “What does conventional wisdom suggest?”

 