The $200K Mistake Most AI Projects Make
The most expensive mistake in AI product development is building a full MVP around an AI approach that can't achieve the required accuracy. We've seen companies spend $200K+ building a product around a custom ML model that plateaued at 72% accuracy when they needed 90%+. A PoC lasting a few days would have revealed this before a line of production code was written.
When to Run a PoC
Run a PoC when:
- You're proposing a new ML model for a task without established benchmarks in your domain
- Your use case requires a specific accuracy threshold (e.g., medical imaging needs >95% sensitivity)
- You're evaluating multiple AI approaches (RAG vs fine-tuning vs prompt engineering) and need evidence to choose
- Your AI system will process proprietary data and you're unsure how available models will perform without fine-tuning
Skip the PoC when:
- You're using an established API for a well-benchmarked task (GPT-4 for text summarisation, standard OCR for document parsing)
- Your accuracy requirements are low and your AI feature is enhancement, not core functionality
The 5-Step AI PoC Framework
Step 1: Define the success criterion (Day 0.5)
Write down the specific, measurable outcome that would validate the approach. Examples:
- "The model classifies support tickets with >88% accuracy on our test set"
- "The RAG pipeline answers 90%+ of FAQ questions correctly with citations"
- "The image classifier detects defects with a false positive rate below 2%"
If you can't state the success criterion precisely, you're not ready for a PoC.
Step 2: Prepare evaluation data (Day 1)
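Collect 50–200 labelled examples of the task you're evaluating. For a classification task: at least 50 labelled samples, with every class represented. For a generation task: at least 50 input-output pairs with expert-rated outputs. This is your test set: never use it for training or as a source of few-shot examples. Bear in mind that with 50 examples each one is worth 2 percentage points of accuracy, so aim for the upper end of the range when your success criterion leaves little margin.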
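As a rough illustration of setting the test set aside, the sketch below holds out a class-balanced sample from a list of labelled examples. The function name `stratified_holdout`, the `(text, label)` tuple layout, and the default size of 50 are assumptions made for this example, not part of any particular library.

```python
import random
from collections import defaultdict

def stratified_holdout(examples, test_size=50, seed=0):
    """Split (text, label) pairs into (test, rest), sampling each label
    in proportion to its frequency.

    Because of rounding, the test set may end up slightly larger or
    smaller than `test_size`.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)

    test, rest = [], []
    total = len(examples)
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        # Take at least one example per class so no class goes unmeasured.
        k = max(1, round(test_size * len(items) / total))
        k = min(k, len(items))
        test.extend(items[:k])
        rest.extend(items[k:])
    return test, rest
```

Anything in `rest` can be used for training or as few-shot examples; `test` is only ever used for scoring.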
Step 3: Build the minimum viable pipeline (Day 1–2)
Implement the simplest version of the AI approach that could conceivably work. For LLM tasks: write the prompt, call the API, parse the output. For ML tasks: train a baseline model on available data. No production code, no error handling, no UI. Just the core logic.
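For an LLM classification task, "write the prompt, call the API, parse the output" can be as small as the sketch below. The prompt wording and the `call_model` parameter are assumptions: `call_model` stands in for whatever function sends a prompt string to your chosen provider and returns the reply as a string, so no specific vendor API is implied.

```python
import json

PROMPT_TEMPLATE = (
    "Classify the support ticket into one of: {labels}.\n"
    'Reply with JSON only, for example {{"label": "billing"}}.\n\n'
    "Ticket: {ticket}"
)

def classify_ticket(ticket, labels, call_model):
    """Prompt, call, parse: the whole PoC pipeline for one input.

    `call_model` is a hypothetical stand-in: any function that takes a
    prompt string and returns the model's reply as a string.
    """
    prompt = PROMPT_TEMPLATE.format(labels=", ".join(labels), ticket=ticket)
    reply = call_model(prompt)
    try:
        label = json.loads(reply).get("label")
    except (json.JSONDecodeError, AttributeError):
        # An unparseable reply is simply scored as a miss in the PoC.
        return None
    return label if label in labels else None
```

Passing the model call in as a function keeps the sketch independent of any one API and lets you swap in a canned fake while wiring up the evaluation.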
Step 4: Evaluate against your test set (Day 2–3)
Run your test set through the pipeline. Calculate your target metric (accuracy, F1, BLEU, human rating). Compare to your success criterion. Document where it fails and why.
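A small test set gives a noisy estimate, so it can help to compare the success criterion against a confidence interval rather than a single number. The sketch below uses the Wilson score interval for accuracy; the function names and the three-way "pass / fail / inconclusive" verdict are illustrative assumptions rather than a standard procedure.

```python
import math

def accuracy_with_interval(predictions, gold, z=1.96):
    """Return (accuracy, lower, upper) using the Wilson score interval.

    z=1.96 corresponds to an approximate 95% interval.
    """
    n = len(gold)
    correct = sum(p == g for p, g in zip(predictions, gold))
    p_hat = correct / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return p_hat, centre - half, centre + half

def verdict(predictions, gold, threshold):
    """Compare the interval, not just the point estimate, to the threshold."""
    _, low, high = accuracy_with_interval(predictions, gold)
    if low >= threshold:
        return "pass"
    if high < threshold:
        return "fail"
    return "inconclusive"

def failures(inputs, predictions, gold):
    """Collect (input, predicted, expected) triples for the error write-up."""
    return [(x, p, g) for x, p, g in zip(inputs, predictions, gold) if p != g]
```

For instance, 45 correct out of 50 is 90% accuracy, but the interval runs from roughly 79% to 96%, so against an 88% criterion the honest reading is "inconclusive, collect more examples" rather than "pass".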
Step 5: Decide and document (Day 3–5)
Write a 1-page PoC report: what you tested, what the results were, why it passed/failed, and the recommendation (proceed to MVP, iterate on approach, or stop). This document is the input to your MVP scoping.
PoC Success Patterns
LLM tasks: Few-shot prompting with 5–10 examples usually outperforms zero-shot. Try it before concluding that fine-tuning is needed.
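Comparing the two is cheap if the same function builds both prompts. The sketch below is one possible layout; the `Input:` / `Output:` format and the name `build_prompt` are assumptions, and the few-shot examples should come from data outside your test set.

```python
def build_prompt(task, query, examples=()):
    """Build a zero-shot prompt (no examples) or a few-shot prompt.

    `examples` is a sequence of (input_text, expected_output) pairs.
    """
    parts = [task]
    for text, label in examples:
        parts.append(f"Input: {text}\nOutput: {label}")
    # The final block is left open for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)
```

Run the test set once with `examples=()` and once with 5–10 pairs, and compare the two scores before deciding whether fine-tuning is worth exploring.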
RAG tasks: Chunking strategy has more impact than model choice. Test 3 chunking approaches (fixed size, semantic, sentence) before evaluating models.
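Two of the three chunking approaches need nothing beyond the standard library, as the sketch below shows. The sizes, the character-based windows, and the simple sentence-splitting regex are assumptions chosen for brevity; semantic chunking is left out because it depends on an embedding model.

```python
import re

def fixed_size_chunks(text, size=200, overlap=50):
    """Split text into character windows of `size`, each overlapping
    the previous one by `overlap` characters."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if not text:
        return []
    step = size - overlap
    # Stop early enough that the last window is not a redundant tail.
    stop = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, stop, step)]

def sentence_chunks(text, max_chars=200):
    """Pack whole sentences into chunks of at most `max_chars`.

    A single sentence longer than `max_chars` becomes its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Index the same documents with each function, run the same test questions through retrieval, and compare scores while holding the model fixed.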
ML classification: A simple logistic regression or gradient boosting baseline is often within 5 percentage points of a deep learning model and 10× faster to train. Start simple.
When PoC Results Are Disappointing
If your PoC doesn't meet the success criterion, you have three options:
- Narrow the scope: Instead of classifying all 50 support ticket types, can you achieve 95% accuracy on the top 10 most common types?
- Augment the data: Can you collect more labelled examples to improve model performance?
- Change the approach: If RAG underperforms, try fine-tuning. If fine-tuning is too costly, reconsider whether AI is the right tool.
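To judge whether narrowing the scope is realistic, it helps to break the PoC results down by class. The sketch below is one way to do that; the function names and the 95% / top-10 defaults mirror the example above but are otherwise assumptions. Note that it measures per-class recall on the existing predictions, so it is only a first indication: a narrowed product would still need a way to route the excluded ticket types elsewhere.

```python
from collections import Counter

def per_class_accuracy(predictions, gold):
    """Return {label: (accuracy, support)} computed over the gold labels."""
    totals, correct = Counter(gold), Counter()
    for p, g in zip(predictions, gold):
        if p == g:
            correct[g] += 1
    return {label: (correct[label] / n, n) for label, n in totals.items()}

def classes_meeting_target(predictions, gold, target=0.95, top_n=10):
    """Among the `top_n` most common gold classes, list those whose
    per-class accuracy is at or above `target`."""
    stats = per_class_accuracy(predictions, gold)
    common = [label for label, _ in Counter(gold).most_common(top_n)]
    return [label for label in common if stats[label][0] >= target]
```

Classes with only a handful of test examples will give unstable numbers, so check the support count before drawing conclusions.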
SpeedMVPs includes a PoC phase in all AI MVP engagements where the AI approach involves significant technical risk. Contact us to discuss whether your project needs a PoC.

