AI sentiment analysis for online reviews: the 2026 framework that actually works
Most "AI for reviews" tools are just sentiment colour-coding. Here's the 5-layer framework we ship to clients — sentiment, topic, intent, urgency, and reply drafting in your voice.
Sentiment analysis on its own is table-stakes by 2026. The framework that wins is layered — extracting topic, intent, urgency and tone, then drafting replies in the brand voice. Here's exactly how to build it.
If you've shopped for review-management software in the last twelve months, every vendor's pitch deck has the same screenshot: a coloured pill that says "positive · 87%". That's sentiment analysis at its most basic — and it's about as useful, in 2026, as a smoke alarm that just says "hot". You need to know what's hot, where, and what to do.
We build review-management features into agency-grade SaaS (RepSaaS) and one-off systems for owner-operated businesses. The framework that consistently moves the needle isn't sentiment by itself — it's a layered model that strips out five separate signals from each review and feeds them into reply drafting, alerting and operations. Here it is, end to end.
Why "sentiment alone" fails
A 4-star review that says "great staff but the toilets were filthy" is positive on sentiment and a five-alarm fire on operations. A 5-star review that says "my cousin works here and you all do amazing" is positive on sentiment and total noise on signal. Sentiment, alone, can't tell those two apart from the kind reviews you'd want to put on the homepage.
What you actually need is to know:
- How does the customer feel? (sentiment)
- What are they talking about? (topic / aspect)
- What do they want to happen next? (intent)
- How time-sensitive is this? (urgency)
- How should we respond? (tone-tuned reply draft)
The 5-layer framework
Each layer feeds the next. The output is a structured object you can then alert, route, file and reply to — not just a colour-coded pill.
Layer 1 · Sentiment
Output: a triplet — positive %, neutral %, negative %. We don't reduce to a single label because mixed reviews are common. "Loved the food, hated the service" should land as 50/10/40, not "neutral".
Implementation: a single GPT-4-class call with a structured-output schema. ~£0.0008 per review at OpenAI 4.1-mini prices in April 2026, so you can run sentiment on 100,000 reviews for under £80. Don't bother training your own model unless you're doing 10M+ reviews/month.
Layer 2 · Topic extraction
Output: a list of aspects mentioned (e.g. {staff, food, ambience, value, cleanliness, wait time}). For SMBs we keep this to a closed taxonomy of 8–12 aspects per industry — open-ended topic extraction sounds nice but creates an unmanageable long-tail you can't act on.
Layer 3 · Intent
Output: one or more intents the customer signals. Common buckets: {praising, complaining, asking_question, requesting_action, comparing, warning_others, recommending}. This is the layer most platforms skip — and it's the one that drives whether you reply, escalate, or route to operations.
A review with intent={complaining, requesting_action} should auto-create an internal ticket. A review with intent={praising, recommending} should be queued for a public reply and a thank-you. Same star rating; completely different operational response.
Layer 4 · Urgency
Output: a 1–4 score where 1 is "reply this week" and 4 is "someone needs to call this customer in the next hour". Calibrated against signals like food-safety mentions, allegations of staff misconduct, threats of legal action, mentions of your competitors as alternatives, or repeat-customer disappointment.
Layer 5 · Reply drafting
Output: a draft reply, tuned to the business's tone of voice, the customer's sentiment, and the topic they mentioned. Critical: the reply isn't sent automatically. It's drafted, the owner approves (one click) or edits (one paragraph), then it goes.
How to tune the voice: feed the model 5–10 examples of the owner's previous replies as a few-shot prompt. Don't fine-tune — by April 2026 that's overkill for tone work; in-context examples beat fine-tuning for this use case 9 times out of 10 and cost a fraction.
See the AI Agents demoLive sentiment analysis + reply drafting on the AI Agents pageThe 2026 stack
| Job | Recommended | Cost per review | Why |
|---|---|---|---|
| Sentiment + topic + intent | GPT-4.1-mini or Claude 3.5 Haiku | £0.0008–£0.0015 | Cheap, fast, structured-output friendly |
| Urgency scoring | Same call, multi-task prompt | (included) | Combine to save tokens |
| Reply drafting | Claude 3.5 Sonnet or GPT-4.1 | £0.004–£0.008 | Need higher tone fidelity here |
| Vector search of similar past reviews | Pinecone / pgvector | £0.0001 | Helps with context recall |
What good looks like (the metrics that move)
Within six months of deploying the full layered framework, here's what we've seen across SMB clients (your mileage will vary, but these are typical bands):
- Average reply latency: from 4–7 days down to under 24 hours
- Reply rate: from 22% (typical SMB) to 95%+
- 1–2 star review recovery rate (customer changes their review): from 4% to 18%
- Repeat-customer rate among complainers reached out to: from 11% to 31%
- Time spent on reviews per week: from 4–6 hours down to 25–45 minutes
"We didn't realise how many great signals we were leaving on the floor. The sentiment colours were nice. The topic + urgency tagging was where the actual money was hiding."
— Pilot client, hospitality (5 venues, 8,000 reviews/year)
Frequently asked questions
Do I really need all 5 layers, or can I start with sentiment?
Start with sentiment + topic. Those two alone get you 60% of the value. Add urgency + intent in month two. Reply drafting in month three. Going all-in on day one is overkill — most teams won't use the full output until they've built the operational habit of acting on the simpler tags.
Will this work for non-English reviews?
Yes. GPT-4.1-mini and Claude 3.5 Haiku handle multilingual sentiment + topic extraction with no changes needed in 2026. Reply drafting in non-English needs more careful tone-tuning examples, but the architecture is identical.
How do I avoid the AI replying like a chatbot?
Two tricks: (1) Feed it 5–10 real previous replies as in-context examples, and (2) Bake a "do not say" list into the prompt — banned phrases like "we strive", "valued customer", and any corporate-speak the brand wouldn't use. Most AI-generated replies fail the smell test because nobody banned the smell.
Should I auto-publish replies?
No. Always draft, never auto-send. The cost of one off-tone public reply outweighs months of saved time. Have the AI draft, the owner approve. The approval click takes two seconds and is worth the human-in-the-loop safety it adds.
What about negative review removal — can AI flag fake reviews for takedown?
Yes — pattern-matching for review fraud (suspicious account age, off-topic content, mention of competitor by name) is straightforward with GPT-4.1-class models. Whether Google or TripAdvisor act on the takedown request is a different battle entirely. AI can build the dossier; the platform decides what to do with it.
Got a build like this in your head?
Free 30-minute call. Fixed quote in 48 hours. Source code yours. If we're not the right fit, I'll say so up front.
