How to Run an AI Proof of Concept Before Your MVP | SpeedMVPs

Q: What is an AI proof of concept?

An AI PoC is a time-boxed experiment (1–5 days) that tests whether a specific AI approach can achieve the target quality metrics before investing in a full MVP build. It produces evidence, not a product.

Q: How long should an AI proof of concept take?

1–5 days. Anything longer is an MVP, not a PoC. The goal is to answer a specific technical question: 'Can we achieve X accuracy on Y task?' with minimum effort.

Q: What is the difference between a PoC and an MVP?

A PoC answers a technical question: 'Can this work?' An MVP answers a market question: 'Will people use and pay for this?' A PoC validates the AI approach; an MVP validates the business model.

Q: When should you skip the PoC and go straight to the MVP?

When your AI approach is proven technology with documented accuracy benchmarks for your task type (e.g., GPT-4 for text classification, established computer vision models for object detection). Skip the PoC when the main risk is market risk rather than technical risk.

The $200K Mistake Most AI Projects Make

The most expensive mistake in AI product development is building a full MVP around an AI approach that can't achieve the required accuracy. We've seen companies spend $200K+ building a product around a custom ML model that plateaued at 72% accuracy when they needed 90%+. A 2-day PoC would have revealed this before a line of production code was written.

When to Run a PoC

Run a PoC when:

You're proposing a new ML model for a task without established benchmarks in your domain
Your use case requires a specific accuracy threshold (e.g., medical imaging needs >95% sensitivity)
You're evaluating multiple AI approaches (RAG vs fine-tuning vs prompt engineering) and need evidence to choose
Your AI system will process proprietary data and you're unsure how available models will perform without fine-tuning

Skip the PoC when:

You're using an established API for a well-benchmarked task (GPT-4 for text summarisation, standard OCR for document parsing)
Your accuracy requirements are low and your AI feature is an enhancement rather than core functionality

The 5-Step AI PoC Framework

Step 1: Define the success criterion (Day 0.5)

Write down the specific, measurable outcome that would validate the approach. Examples:

"The model classifies support tickets with >88% accuracy on our test set"
"The RAG pipeline answers 90%+ of FAQ questions correctly with citations"
"The image classifier detects defects with <2% false positive rate"

If you can't state the success criterion precisely, you're not ready for a PoC.
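One way to force that precision is to write the criterion down as data before any model code exists. The sketch below is illustrative only: the `SuccessCriterion` class and its field names are hypothetical, and the two instances mirror the ticket and defect examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriterion:
    """A single pass/fail rule for the PoC (hypothetical helper)."""
    metric: str                    # e.g. "accuracy" or "false_positive_rate"
    threshold: float               # target on a 0-1 scale
    higher_is_better: bool = True  # False for error-style metrics

    def is_met(self, observed: float) -> bool:
        # Strict comparison, matching ">88%" and "<2%" in the examples.
        if self.higher_is_better:
            return observed > self.threshold
        return observed < self.threshold


ticket_criterion = SuccessCriterion("accuracy", 0.88)
defect_criterion = SuccessCriterion(
    "false_positive_rate", 0.02, higher_is_better=False
)

print(ticket_criterion.is_met(0.91))  # observed accuracy from a PoC run
print(defect_criterion.is_met(0.05))  # observed false positive rate
```

If a criterion cannot be expressed as a metric name, a threshold, and a direction, that is usually a sign it still needs tightening.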

Step 2: Prepare evaluation data (Day 1)

Collect 50–200 labelled examples of the task you're evaluating. For a classification task: at least 50 labelled samples, covering every class. For a generation task: at least 50 input-output pairs with expert-rated outputs. This is your test set, so keep it out of any training data.
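A quick way to judge whether 50 or 200 examples is enough for your threshold is to look at the uncertainty around an observed accuracy. The sketch below uses the Wilson score interval, which is our choice of method rather than anything the framework prescribes; the function name is hypothetical.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval for a proportion (Wilson score interval)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Same observed accuracy (90%) at the two ends of the suggested range.
for n in (50, 200):
    low, high = wilson_interval(round(0.9 * n), n)
    print(f"n={n}: interval {low:.3f} to {high:.3f}")
```

At 90% observed accuracy the interval is roughly 0.79 to 0.96 with 50 examples and roughly 0.85 to 0.93 with 200, so a small test set may not be able to separate a result from a threshold that sits only a few points away.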

Step 3: Build the minimum viable pipeline (Day 1–2)

Implement the simplest version of the AI approach that could conceivably work. For LLM tasks: write the prompt, call the API, parse the output. For ML tasks: train a baseline model on available data. No production code, no error handling, no UI. Just the core logic.
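For an LLM classification task, "write the prompt, call the API, parse the output" can be as small as the sketch below. Everything here is an assumption for illustration: the label set is made up, and `call_model` is a placeholder for whichever provider client you use, so the example substitutes a fake model to stay runnable.

```python
from typing import Callable

LABELS = ["billing", "bug", "other"]  # hypothetical label set


def build_prompt(ticket: str) -> str:
    """Write the prompt: instruction, allowed labels, then the ticket."""
    return (
        "Classify the support ticket into one of: "
        + ", ".join(LABELS)
        + ".\nAnswer with the label only.\n\nTicket: "
        + ticket
        + "\nLabel:"
    )


def parse_label(raw: str) -> str:
    """Parse the output: normalise the text, fall back to 'other'."""
    cleaned = raw.strip().lower().rstrip(".")
    return cleaned if cleaned in LABELS else "other"


def classify(ticket: str, call_model: Callable[[str], str]) -> str:
    """Call the model with the prompt and return a known label."""
    return parse_label(call_model(build_prompt(ticket)))


def fake_model(prompt: str) -> str:
    # Stand-in for a real API call; replace with your provider's client.
    return " Billing.\n" if "invoice" in prompt.lower() else "unsure"


print(classify("My invoice is wrong", fake_model))
```

Passing the model call in as a function keeps the PoC code independent of any one vendor, which makes it cheap to compare approaches later.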

Step 4: Evaluate against your test set (Day 2–3)

Run your test set through the pipeline. Calculate your target metric (accuracy, F1, BLEU, human rating). Compare to your success criterion. Document where it fails and why.
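The evaluation itself needs very little code. The following is a minimal sketch using only the standard library, with made-up labels and predictions; the helper names are hypothetical, and a library such as scikit-learn would do the same job.

```python
def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the expected label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over classes present in y_true."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def failures(inputs: list[str], y_true: list[str], y_pred: list[str]):
    """Collect (input, expected, predicted) for every miss, for the write-up."""
    return [(x, t, p) for x, t, p in zip(inputs, y_true, y_pred) if t != p]


# Illustrative data only.
tickets = ["refund please", "app crashes", "page is blank", "hello"]
expected = ["billing", "bug", "bug", "other"]
predicted = ["billing", "bug", "other", "other"]

acc = accuracy(expected, predicted)
print(f"accuracy={acc:.2f}, macro F1={macro_f1(expected, predicted):.2f}")
print("meets >88% criterion:", acc > 0.88)
for row in failures(tickets, expected, predicted):
    print("miss:", row)
```

Keeping the list of misses alongside the headline metric is what makes the "where it fails and why" part of this step possible.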

Step 5: Decide and document (Day 3–5)

Write a 1-page PoC report: what you tested, what the results were, why it passed/failed, and the recommendation (proceed to MVP, iterate on approach, or stop). This document is the input to your MVP scoping.

PoC Success Patterns

LLM tasks: Few-shot prompting with 5–10 examples almost always outperforms zero-shot. Try it before concluding that fine-tuning is needed.
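Comparing zero-shot and few-shot is mostly a matter of how the prompt is assembled. The sketch below shows one possible layout; the `Input:`/`Output:` format and the function name are assumptions, not a required convention.

```python
def build_few_shot_prompt(
    instruction: str,
    examples: list[tuple[str, str]],
    query: str,
) -> str:
    """Assemble instruction, worked examples, then the new input."""
    parts = [instruction.strip(), ""]
    for text, label in examples:
        parts += [f"Input: {text}", f"Output: {label}", ""]
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)


# Hypothetical examples; in a PoC these must not come from the test set.
examples = [
    ("I was charged twice", "billing"),
    ("The export button does nothing", "bug"),
]
instruction = "Classify each support ticket as billing, bug, or other."

zero_shot = build_few_shot_prompt(instruction, [], "Where is my invoice?")
few_shot = build_few_shot_prompt(instruction, examples, "Where is my invoice?")
print(few_shot)
```

Running both prompts through the same test set gives a direct comparison before any fine-tuning is considered.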

RAG tasks: Chunking strategy has more impact than model choice. Test 3 chunking approaches (fixed size, semantic, sentence) before evaluating models.
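Two of the three chunking approaches can be sketched with the standard library alone. The sizes and the sentence-splitting regex below are illustrative assumptions; semantic chunking is left out because it typically depends on an embedding model.

```python
import re


def fixed_size_chunks(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split into character windows of `size`, overlapping by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += size - overlap
    return chunks


def sentence_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks of at most `max_chars` where possible."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


doc = "One. Two three. Four five six."
print(fixed_size_chunks(doc, size=12, overlap=4))
print(sentence_chunks(doc, max_chars=16))
```

A single sentence longer than `max_chars` is kept whole in this sketch, so the limit is a target rather than a hard cap.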

ML classification: A simple logistic regression or gradient boosting baseline often lands within about 5% of a deep learning model's accuracy and can be roughly 10× faster to train. Start simple.

When PoC Results Are Disappointing

If your PoC doesn't meet the success criterion, you have three options:

Narrow the scope: Instead of classifying all 50 support ticket types, can you achieve 95% accuracy on the top 10 most common types?
Augment the data: Can you collect more labelled examples to improve model performance?
Change the approach: If RAG underperforms, try fine-tuning. If fine-tuning is too costly, reconsider whether AI is the right tool.
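For the first option, the existing PoC results can show what narrowing the scope would buy before any new work is done. This is a rough sketch with made-up labels; the function name is hypothetical, and it filters by true label, so it gives an optimistic estimate that ignores how out-of-scope items would be routed in practice.

```python
from collections import Counter


def accuracy_on_top_classes(
    y_true: list[str], y_pred: list[str], top_n: int
) -> tuple[float, float]:
    """Return (accuracy, coverage) restricted to the top_n most common classes."""
    top = {label for label, _ in Counter(y_true).most_common(top_n)}
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t in top]
    coverage = len(pairs) / len(y_true)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    return acc, coverage


# Illustrative data: class "a" is common and easy, class "c" is rare and hard.
y_true = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
y_pred = ["a"] * 5 + ["b", "b", "c"] + ["a", "b"]

for n in (3, 2, 1):
    acc, cov = accuracy_on_top_classes(y_true, y_pred, n)
    print(f"top {n} classes: accuracy={acc:.3f}, coverage={cov:.0%}")
```

The trade-off to report is accuracy against coverage: a narrower scope only helps if the remaining classes still account for most of the real volume.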

SpeedMVPs includes a PoC phase in all AI MVP engagements where the AI approach involves significant technical risk. Contact us to discuss whether your project needs a PoC.
