Global content moderation at short-form scale
Building and running the content review pipeline at a global short-form video platform. AI-assisted scoring on ingest, country-specific manual review, escalation paths for sensitive content, and the operational reporting behind it.
Overview
Firework is a global short-form video platform that powers shoppable and livestream video on customer websites. Content comes in from many directions: Instagram and YouTube importers, customer CMS feeds, manual uploads from a business portal, and bulk uploads through an internal admin. All of it had to be moderated before it surfaced on customer sites, and the operation had to scale across regions and languages without burning out the review team.
I led the team that owned this end to end. The work was as much workflow design as it was day-to-day operations.
The problem
When I joined, content moderation was happening, but the pipeline was uneven: ingestion sources had different review rules, country-specific compliance needs were handled inconsistently, and the team didn't have a clear escalation path for the harder calls. The operational data was scattered, which made it hard to argue for resourcing or to spot patterns.
We needed a single moderation workflow that could handle very different content sources, run AI scoring and human review side by side, route correctly across more than a dozen regional buckets, and produce reporting the broader company could see.
Context
Multiple groups felt the impact of how moderation worked, or didn't:
- Customers were exposed to whatever cleared review. Inconsistent standards were a brand risk for them and for Firework.
- The review team was the front line. They needed clear rules, predictable routing, and a way to escalate without breaking flow.
- Customer Success needed answers fast when a customer asked why a clip was approved or denied.
- Engineering needed the workflow to be implementable inside the existing CMS, not bolted on top.
- Compliance and leadership needed visibility into volume, accuracy, and trend lines.
My role
I led the team, owned the workflow design, and partnered with engineering on the system pieces. Specifically:
- Mapped every ingest path and how content actually entered the system
- Designed the AI-assisted-then-human review pipeline, including the scoring thresholds and the country-specific routing
- Documented the moderation rules, the watermark and rating tag conventions, and the escalation logic
- Partnered with engineering on the CMS-side workflow so the moderation queue and outcomes were first-class data, not a side spreadsheet
- Set up the reporting cadence into compliance Slack channels so the operation was visible to the rest of the company
- Built the team's playbook and trained reviewers
What the pipeline looked like
From the outside, a creator uploaded a clip and either saw it go live or got a denial. Inside, it was a multi-stage pipeline:
1. Ingestion
Content came in from Instagram and YouTube importers, manual uploads via the business portal, customer CMSes through mRSS, and bulk uploads through an internal admin.
2. Automated processing
Lambda jobs transcoded the asset and produced thumbnails. A third-party visual moderation service generated a risk score for each clip.
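To make the scoring step concrete, here is a minimal sketch of what a call to a visual moderation service can look like, assuming a generic HTTP scoring API that takes thumbnail frames. The endpoint, payload shape, and field names are placeholders, not the actual vendor integration.

```python
import json
import urllib.request

MODERATION_ENDPOINT = "https://moderation.example.com/v1/score"  # placeholder, not the real vendor

def score_clip(thumbnail_urls: list[str], api_key: str) -> float:
    """Send a clip's thumbnail frames to a visual moderation service and
    return a 0.0-1.0 risk score (higher means riskier)."""
    payload = json.dumps({"images": thumbnail_urls}).encode("utf-8")
    request = urllib.request.Request(
        MODERATION_ENDPOINT,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assume the service returns one score per frame; keep the worst one.
    return max(body["scores"])
```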
3. Score-based routing
Clips the model scored as confidently clean passed through automatically, with a small sample flagged for spot checks. Clips in the middle band went into a manual review queue. The thresholds were tunable.
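A minimal sketch of that routing, with illustrative threshold values rather than the real ones:

```python
CLEAN_MAX_RISK = 0.10  # illustrative; the real bands were tuned over months

def route(risk_score: float) -> str:
    """Map a 0.0-1.0 risk score to a review outcome."""
    if risk_score <= CLEAN_MAX_RISK:
        # Auto-pass; a small sample of these still gets spot-checked.
        return "auto_pass"
    # Anything the AI is not confident about goes to a human.
    return "manual_review"
```

What mattered was that the boundary stayed a plain, tunable number rather than something buried inside the scoring service.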
4. Country-specific manual review
Manual queues were split into regional buckets (a global default plus country- and region-specific ones like BR, JP, IN, LAT, and MENA) so reviewers familiar with local context handled what they were best at.
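Conceptually, queue assignment was a lookup with a global fallback. The country-to-bucket mapping below is illustrative, not the real routing table:

```python
# Illustrative mapping from a clip's country code to a review bucket.
REGIONAL_BUCKETS = {
    "BR": "BR",
    "JP": "JP",
    "IN": "IN",
    "MX": "LAT", "CO": "LAT", "AR": "LAT",      # example LAT members
    "AE": "MENA", "SA": "MENA", "EG": "MENA",   # example MENA members
}

def review_bucket(country_code: str | None) -> str:
    """Route a clip to a country- or region-specific queue, else the global default."""
    if not country_code:
        return "GLOBAL"
    return REGIONAL_BUCKETS.get(country_code.upper(), "GLOBAL")
```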
5. Watermarking and tagging
Reviewers applied watermarks and rating tags during approval, which downstream surfaces (including syndication to Google Discover) read to decide where the clip could appear.
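One way to picture the review outcome living on the asset itself, with illustrative field names rather than the actual CMS schema:

```python
from dataclasses import dataclass

@dataclass
class ModerationResult:
    status: str                      # e.g. "approved", "denied", "escalated"
    rating_tag: str | None = None    # e.g. "general", "mature"
    watermarked: bool = False
    reviewer_id: str | None = None
    reviewed_at: str | None = None   # ISO 8601 timestamp

@dataclass
class VideoAsset:
    asset_id: str
    source: str                      # "instagram", "youtube", "mrss", "admin", ...
    moderation: ModerationResult | None = None

    def eligible_for(self, surface: str) -> bool:
        """Downstream surfaces read the asset itself instead of asking another system."""
        if self.moderation is None or self.moderation.status != "approved":
            return False
        if surface == "google_discover":
            # Illustrative rule: syndication only takes general-rated clips.
            return self.moderation.rating_tag == "general"
        return True
```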
6. Escalation and reporting
Anything reviewers flagged as unresolved or NSFW went to a supervisor queue with its own SLA. Daily activity was reported into a content compliance Slack channel.
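The daily report was deliberately simple. A sketch of what posting a summary into the compliance channel can look like, assuming a standard Slack incoming webhook; the URL and summary fields are placeholders:

```python
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."  # placeholder

def post_daily_summary(reviewed: int, approved: int, denied: int, escalated: int) -> None:
    """Post the day's moderation counts to the compliance Slack channel."""
    text = (
        "Moderation daily summary\n"
        f"Reviewed: {reviewed} | Approved: {approved} | "
        f"Denied: {denied} | Escalated: {escalated}"
    )
    payload = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)
```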
Product decisions and trade-offs
A few of the calls that ended up shaping the workflow:
- Where to set the AI thresholds. Too aggressive and the human review queue exploded. Too permissive and the brand-risk exposure grew. We tuned the bands over the first few months as we learned what the AI was good at and where it consistently missed.
- Global vs. regional review. Splitting the queues by region added some operational overhead but dramatically improved the quality of edge calls. Local reviewers caught things a global queue would have missed.
- Watermark and rating tags as first-class metadata. Treating these as part of the asset rather than as an external state meant downstream surfaces (syndication, customer feeds) didn't have to ask a separate system.
- What gets auto-approved. We deliberately kept a small percentage of high-confidence clips spot-checked even when the AI said they were clean. That gave us a sanity check on the AI itself; a sketch of the sampling follows this list.
- How to handle unresolved cases. Reviewers had a fast escalation path so they weren't stuck. Supervisors carried a daily SLA on the escalation queue.
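The spot-check sampling from the auto-approval decision above is easy to sketch. The sampling rate here is a placeholder, not the real figure:

```python
import random

SPOT_CHECK_RATE = 0.05  # placeholder fraction of auto-approved clips

def needs_spot_check(risk_score: float, clean_max_risk: float = 0.10) -> bool:
    """Even clips the AI calls clean get sampled, as a check on the AI itself."""
    return risk_score <= clean_max_risk and random.random() < SPOT_CHECK_RATE
```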
Outcomes
Without sharing internal numbers: the pipeline turned moderation from a reactive effort into a predictable operation. We had:
- A single workflow that handled every ingest path, with consistent rules across regions
- Country-specific review with reviewers who actually knew the local context
- Reporting that compliance and leadership could see, not a black box
- An escalation path that reviewers actually used, so the hard calls got triaged instead of buried
- A foundation that let us absorb new ingest sources (additional importers, customer CMS partners) without redesigning everything
The work also gave Customer Success a real answer to give customers when a clip didn't pass: the rules were documented, the workflow was traceable, and the decision could be reviewed.
What I learned
Three things stuck with me:
AI is a triage tool, not a moderator. The most useful thing the automated scoring did was sort. Decisions still happened with humans in the loop, and the workflow design was about making those humans effective.
Regional context is real. A global review queue with global rules sounds tidy and isn't. The minute a reviewer is judging something they don't understand, you start trading accuracy for throughput.
Operational visibility is the moat. The thing that made this operation defensible inside the company wasn't the workflow itself. It was that the workflow produced numbers other teams could see and use.