AI Model Selection for Startups: Claude vs GPT vs Gemini
Andrej Karpathy uses Claude for writing and GPT-4 for code. Sam Altman says most startups pick the wrong model for their use case. Across 47 Y Combinator talks, a clear framework emerged for matching models to tasks -- and most founders are leaving performance (and money) on the table.
What Does the 2025 AI Model Landscape Look Like?
The 2025 AI model landscape features six frontier providers with converging capabilities, where the model you choose matters less than how you use it. Sam Altman put it directly: "The models are converging in capability. What matters is your application layer." That said, meaningful differences still exist.
Here is the current state:
| Model | Best For | Weakness | Context |
|---|---|---|---|
| Claude 3.5 Sonnet | Coding, extended reasoning, writing | Tool use ecosystem | 200K |
| GPT-4o | Multimodal, real-time, tool use | Reasoning depth | 128K |
| Gemini 1.5 Pro | Long context, video, Google ecosystem | Consistency | 1M+ |
| o1 | Complex reasoning, math, science | Speed, cost | 128K |
| DeepSeek R1 | Reasoning at lower cost | Ecosystem, support | 64K |
| Llama 3.1 405B | Self-hosting, privacy, customization | Infrastructure needs | 128K |
YC Partner Insight
From YC's AI talks: "Model capability is table stakes now. The winners will be those who understand their users deeply and build the right application layer on top."
What Is Karpathy's Software 3.0 Framework?
Karpathy's Software 3.0 framework describes LLMs as the third era of programming: Software 1.0 is explicit code, 2.0 is neural networks trained on data, and 3.0 is natural language programming where you describe what you want and the model executes. Every founder should internalize this framework.
Software 1.0: Explicit Code
Traditional programming. You write explicit rules. Deterministic, predictable, but limited to problems you can specify completely.
Software 2.0: Neural Networks
Machine learning. You provide data and architecture. The model learns the program. Great for pattern recognition but narrow.
Software 3.0: LLMs
Natural language programming. You describe what you want in English. The model "knows" from training on human knowledge. General purpose but probabilistic.
Karpathy's key insight: LLMs are not deterministic computers. They're "vibes-based" systems. You need to treat them like you would a new hire - give them examples, iterate on instructions, and verify their work.
The Psychology of LLMs
Karpathy emphasizes that LLMs have "psychology" - they respond to social pressure in prompts, they try to please, and they sometimes hallucinate when uncertain. Understanding this is key to using them effectively.
Key Takeaway
The most expensive model is rarely the right choice. YC founders running production AI products consistently use tiered model architectures -- cheap models for classification and routing, mid-tier models for most generation tasks, and frontier models only for complex reasoning. One founder cut their AI costs 80% by switching from GPT-4 to Haiku for simple tasks.
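The tiered architecture described above can be sketched as a simple router. The tier names, model names, prices, and routing heuristic below are illustrative assumptions, not any specific founder's production setup:

```python
from dataclasses import dataclass

@dataclass
class Tier:
    model: str
    cost_per_1m_input: float  # USD, illustrative prices

TIERS = {
    "cheap": Tier("claude-haiku", 0.25),
    "mid": Tier("claude-sonnet", 3.00),
    "frontier": Tier("o1", 15.00),
}

def route(task_type: str) -> Tier:
    """Send classification/routing to the cheap tier, complex reasoning
    to the frontier tier, and everything else to the mid tier."""
    if task_type in ("classify", "route", "extract"):
        return TIERS["cheap"]
    if task_type in ("reason", "math", "plan"):
        return TIERS["frontier"]
    return TIERS["mid"]

print(route("classify").model)  # classification lands on the cheap tier
print(route("math").model)      # hard reasoning lands on the frontier tier
```

The point is that the routing decision is one cheap function call, so the expensive model only runs when the task actually needs it.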
What Are the Practical Differences Between Claude, GPT, and Gemini?
Claude excels at coding and extended reasoning, GPT-4 leads in tool use and ecosystem maturity, and Gemini dominates long-context tasks with its 1M+ token window. Based on YC interviews with founders and AI leaders, here is where each model excels in practice.
Claude (Anthropic)
Dario Amodei's and Amanda Askell's interviews on YC reveal Claude's design philosophy: safety through understanding, not restrictions. The model is trained to be genuinely helpful while avoiding harm.
Strengths
- Extended thinking for complex reasoning
- Coding (particularly with Claude Code)
- Long, nuanced writing
- Following complex instructions
- 200K context with strong recall
Best Use Cases
- AI-assisted coding (Cursor, Claude Code)
- Document analysis
- Technical writing
- Research synthesis
GPT-4/4o (OpenAI)
Sam Altman's YC interviews emphasize OpenAI's focus on developer experience and ecosystem. GPT-4's strength is the breadth of integrations and tooling around it.
Strengths
- Best-in-class tool use and function calling
- Real-time voice (GPT-4o)
- Vision capabilities
- Massive ecosystem
- Consistent API reliability
Best Use Cases
- Production apps with complex workflows
- Voice assistants
- Multi-modal applications
- Agent systems with many tools
Gemini (Google)
Less frequently discussed in YC talks, but the 1M+ token context window makes Gemini uniquely powerful for specific use cases.
Strengths
- Massive context window (1M+ tokens)
- Native video understanding
- Google ecosystem integration
- Competitive pricing
Best Use Cases
- Entire codebases in context
- Long video analysis
- Google Workspace integration
- Long document processing
Cursor CEO's take
In his YC interview, Cursor's CEO explains their model switching: "We use Claude for heavy lifting - the actual code generation. But the model choice matters less than the context you give it. Most of the intelligence is in how you construct the prompt."
When Should You Use Reasoning Models Like o1 or DeepSeek R1?
You should use reasoning models for complex math, coding challenges, and multi-step reasoning tasks where accuracy matters more than speed. These models explicitly think through problems step-by-step before answering, with DeepSeek R1 offering comparable reasoning at lower cost. If you are building AI agents, reasoning models work best for the planning and evaluation layers.
OpenAI o1
- Proprietary chain-of-thought reasoning
- Excels at math, coding, science
- Slower but more accurate for hard problems
- Premium pricing ($15/1M input, $60/1M output)
DeepSeek R1
- Open weights (you can run it locally)
- Comparable reasoning to o1
- Much lower cost
- Chinese company - consider data residency
From YC's analysis of DeepSeek: "The engineering innovations are real." DeepSeek achieved similar results to frontier models with significantly less compute, using techniques like:
FP8 Training
8-bit floating point instead of 16-bit. 2x memory efficiency, enabling larger batches and faster training.
Mixture of Experts (MoE)
Only 37B parameters active per inference despite 671B total. Dramatically reduces inference cost.
Multi-head Latent Attention
Compresses KV cache for faster inference without quality loss.
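The MoE idea above can be shown with a toy top-k gating function. This is a pedagogical sketch in pure Python, not DeepSeek's actual implementation, and the logits are made up:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def top_k_route(gate_logits, k=2):
    """Pick the k experts with the highest gate scores and renormalize
    their weights, so only k experts run for this token."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return {i: probs[i] / total for i in top}

# 8 experts, but only 2 are active for this token:
weights = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], k=2)
print(weights)  # only experts 1 and 4 receive nonzero weight
```

Scaled up, this is how a 671B-parameter model can run with only 37B parameters active: most experts simply never execute for a given token.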
What Should Founders Know About Scaling Laws?
Founders should know three things about scaling laws: models will keep getting better with no ceiling in sight, inference costs drop 10x every 18 months, and the moat is never the model itself but your data, distribution, and user workflows. YC's dedicated scaling laws episode breaks down why AI capabilities keep improving predictably.
The Core Equation
Loss = A × (Compute)^(-0.050) × (Data)^(-0.095) × (Parameters)^(-0.076)
In plain English: Model performance improves predictably as you increase compute, data, or parameters. The relationship is a power law, not linear - each fixed improvement in loss requires roughly 10x more resources.
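You can see what the power law implies with two lines of arithmetic. The exponent below matches the compute term in the equation above; the absolute numbers are illustrative, not a prediction for any specific model:

```python
# With loss proportional to compute^(-0.05), a 10x compute increase
# multiplies loss by 10^(-0.05), roughly an 11% reduction.
def loss_ratio(resource_multiplier: float, exponent: float = -0.05) -> float:
    return resource_multiplier ** exponent

print(round(loss_ratio(10), 3))   # 10x compute  -> ~0.891x loss
print(round(loss_ratio(100), 3))  # 100x compute -> ~0.794x loss
```

This is why each generation of frontier models needs an order of magnitude more resources: the returns are real but multiplicatively small.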
Models Will Keep Getting Better
No ceiling in sight. GPT-5, Claude 4, etc. will be meaningfully more capable than current models. Build for this - don't over-engineer around current limitations.
Inference Costs Will Drop
Every 18 months, the same capability gets 10x cheaper. What costs $1 today will cost $0.10 in 18 months. Price accordingly.
The Moat Is Not the Model
If your product is just "GPT-4 + a wrapper," you have no moat. The defensibility comes from data, distribution, and user workflows - not model access.
The GPT Wrapper Myth
YC's analysis shows that "GPT wrapper" companies CAN build real businesses. The key is building something that gets better with use - whether through proprietary data, user feedback loops, or workflow integration that creates switching costs.
Build AI Products Without Deep Technical Knowledge
Our AI Product Playbook covers finding opportunities, building moats, and pricing strategies for AI-native products.
How Do You Write State-of-the-Art Prompts for AI Agents?
You write state-of-the-art prompts by being specific about output format, providing 2-3 examples (few-shot), using chain-of-thought for complex tasks, and defining escape hatches for uncertainty. From YC's prompting masterclass, these are the techniques that actually move the needle.
Be Specific About Format
Don't say "return JSON." Say "Return a JSON object with keys: name (string), score (integer 0-100), reasoning (string, 2-3 sentences)." The more specific, the more reliable.
Give Examples (Few-Shot)
Show 2-3 examples of the exact input/output format you want. This works better than any amount of explanation for most tasks.
Use Chain-of-Thought
For complex tasks, explicitly ask the model to "think step by step" or "explain your reasoning before giving the final answer." This dramatically improves accuracy on multi-step problems.
Define Escape Hatches
Tell the model what to do when it's uncertain: "If you're not sure, respond with 'UNSURE: ' followed by your best guess and why you're uncertain."
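The four techniques above can live in one prompt template. This is an illustrative sketch - the ticket-classification task, field names, and examples are made up for the demonstration:

```python
# Template combining explicit format, few-shot examples,
# chain-of-thought, and an escape hatch.
PROMPT = """Classify the support ticket below.

Return a JSON object with keys: category (string, one of "billing",
"bug", "feature"), confidence (integer 0-100), reasoning (string,
2-3 sentences). Think step by step before giving the final answer.
If you're not sure, respond with "UNSURE: " followed by your best
guess and why you're uncertain.

Example 1:
Ticket: "I was charged twice this month."
Output: {"category": "billing", "confidence": 95, "reasoning": "..."}

Example 2:
Ticket: "The export button crashes the app."
Output: {"category": "bug", "confidence": 90, "reasoning": "..."}

Ticket: "{ticket}"
Output:"""

def build_prompt(ticket: str) -> str:
    # .replace instead of .format so the literal JSON braces survive
    return PROMPT.replace("{ticket}", ticket)

prompt = build_prompt("Password reset email never arrives.")
```

Note how each technique earns its place: the format spec constrains the output, the examples anchor it, the chain-of-thought instruction improves accuracy, and the escape hatch gives uncertainty somewhere to go besides a hallucination.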
The Temperature Setting
For deterministic tasks (extraction, classification), use temperature=0. For creative tasks (writing, brainstorming), use 0.7-1.0. Most startups should default to temperature=0 and only increase when they want variety.
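The temperature rule above is easy to encode as a default-to-zero helper. The parameter shape loosely follows common chat APIs; the model name and field names are illustrative, not a specific SDK's exact schema:

```python
def request_params(task: str) -> dict:
    """Default to temperature=0; only raise it for creative tasks."""
    deterministic = task in ("extract", "classify")
    return {
        "model": "claude-haiku",  # illustrative model name
        "temperature": 0.0 if deterministic else 0.8,
        "max_tokens": 512,
    }

print(request_params("classify"))   # temperature 0.0 for deterministic work
print(request_params("brainstorm")) # temperature 0.8 for variety
```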
Should You Use Open Source or Closed AI Models?
Start with closed APIs (Claude or GPT) to validate your product, then consider open source models like Llama, Mistral, or DeepSeek for specific high-volume use cases where privacy, cost, or fine-tuning requirements justify the infrastructure investment.
Use Open Source When
- Privacy/data residency is critical
- You need to fine-tune for your domain
- High volume makes API costs prohibitive
- You need full control over the model
- Latency requirements are extreme
Use Closed APIs When
- Speed to market matters most
- You're still iterating on product-market fit
- Volume is low to moderate
- You need the latest capabilities
- You don't want to manage infrastructure
The practical advice from YC founders: Start with APIs (Claude or GPT), validate your product, then consider open source for specific high-volume use cases. Don't prematurely optimize for cost.
Which AI Model Should You Use for Your Startup?
Use Claude for coding and heavy cognitive tasks, GPT-4 for production apps with complex tool use, Gemini for long-context processing, reasoning models for math and science, and Haiku/GPT-3.5 for simple classification tasks. Based on patterns across 47 YC videos, here is the practical decision tree. For non-technical founders, vibe coding tools abstract most of these decisions away.
For Coding/Development
Use Claude via Cursor or Claude Code. Multiple YC founders cite Claude as their primary coding assistant.
Fallback: GPT-4 if you need specific integrations or tool use.
For Complex Reasoning/Math
Use o1 or DeepSeek R1. The extended thinking time is worth it for problems that require multi-step reasoning.
Cost tip: Start with DeepSeek R1 for testing, use o1 for production if quality matters.
For Production Apps with Tools
Use GPT-4. The function calling and tool use ecosystem is most mature. Reliability matters more than marginal capability differences.
Consider: Claude for heavy lifting, GPT-4 for orchestration.
For Long Context (Entire Codebases)
Use Gemini 1.5 Pro. The 1M+ token context window is unmatched for stuffing entire codebases into context.
Alternative: Claude 200K is often sufficient and more consistent.
For Voice/Real-Time
Use GPT-4o. Native voice mode is still ahead. Combine with LiveKit for production voice apps.
Cost tip: Use speech-to-text pipeline instead of real-time mode for significant savings.
For Simple Tasks (Classification, Extraction)
Use Claude Haiku or GPT-3.5. Don't overpay for capabilities you don't need. These are 10-50x cheaper than frontier models.
Rule: If Haiku works 95% of the time, use Haiku and handle edge cases separately.
Our take
The biggest mistake we see founders make is treating model selection as a permanent decision. The landscape changes every 3-4 months. The smart play is building a model-agnostic abstraction layer from day one so you can swap providers without rewriting your application. Cursor does this internally -- they switch between Claude and GPT depending on the task. Your architecture should make model switching a config change, not a rewrite.
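A minimal sketch of that abstraction layer: providers implement one interface, and the model choice lives in config rather than application code. The provider classes below are stubs standing in for real SDK calls, not actual Anthropic or OpenAI client code:

```python
from typing import Protocol

class Provider(Protocol):
    def complete(self, prompt: str) -> str: ...

class ClaudeProvider:
    def complete(self, prompt: str) -> str:
        return f"[claude] {prompt}"  # real code would call Anthropic's SDK

class GPTProvider:
    def complete(self, prompt: str) -> str:
        return f"[gpt] {prompt}"  # real code would call OpenAI's SDK

PROVIDERS: dict[str, Provider] = {
    "claude": ClaudeProvider(),
    "gpt": GPTProvider(),
}

def complete(prompt: str, config: dict) -> str:
    # Swapping providers is a config change, not a rewrite.
    return PROVIDERS[config["provider"]].complete(prompt)

print(complete("Summarize this doc", {"provider": "claude"}))
```

With this shape, switching from Claude to GPT for a given task is one line in a config file, which is exactly the flexibility a landscape that shifts every 3-4 months demands.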
Want to research AI and startup channels?
Taffy lets you analyze transcripts and comments from any YouTube channel. Find out what AI tools founders are discussing, what problems they're solving, and what's working.
Get Started Free
Free daily channel insights. No credit card required.
Frequently Asked Questions
Should I use Claude or GPT for my startup?
It depends on your use case. Claude excels at extended thinking, coding, and complex reasoning. GPT-4 has stronger tool use and real-time capabilities. For most startups, the recommendation is: use Claude for heavy cognitive tasks, GPT-4 for production apps with many integrations.
What are scaling laws and why do they matter?
Scaling laws predict model performance based on compute, data, and parameters. They matter because they show capabilities improve predictably with scale. This helps you plan which features to build now vs. wait for future models to enable.
Should startups use open source LLMs like Llama or DeepSeek?
Start with APIs (Claude, GPT) for speed. Consider open source when you have: privacy requirements, high volume making APIs expensive, or need for fine-tuning. Don't prematurely optimize - validate your product first.
What are reasoning models and when should I use them?
Reasoning models like o1 and DeepSeek R1 use chain-of-thought to solve complex problems. Use them for math, coding challenges, and multi-step reasoning. They're slower and more expensive, so reserve them for tasks where accuracy matters more than speed.
How do I reduce LLM costs without sacrificing quality?
Use model cascading (start with cheap models, escalate if needed), clean your prompts to reduce tokens, use smaller models for simple tasks (Haiku, GPT-3.5), and implement caching for repeated queries. YC founders report 78%+ cost reductions with these techniques.
Written by
Arun Agrahri
Builder of Taffy. I spend most of my time analyzing YouTube channels to find patterns others miss. These guides are the result of processing thousands of videos and comments through our data pipeline.
Get the next guide first
We publish deep-dive research guides weekly. Be the first to know when new analysis drops.
No spam. Unsubscribe anytime.