Blog Spotlight: How to validate data at scale: Geometry, schema, attributes, and AI

  • June 12, 2026

creeatsafe
Safer

Welcome to this week's blog spotlight: How to Validate Data at Scale: Geometry, Schema, Attributes, and AI. If you've ever had a workflow break because of bad incoming data, or worse, not break but silently pass bad data through, this one is essential reading!

Why This Matters
The blog opens with a striking real-world example: a mislabeled utility line contributed to an oil spill in Burnaby, BC, in 2007. It's a powerful reminder that data quality isn't just a technical concern; it has real consequences. And as more organizations lean on automation and AI, the stakes only get higher.

Inside the Blog
The post walks through a practical, layered approach to data validation that covers the three most common ways data goes wrong (geometry, schema, and attributes) and then goes a step further to address AI-generated outputs too. A few highlights:

  • Geometry validation should run in order: check for corrupt or null geometries first, then test for compliance. Don't waste time running complex checks on broken data.
  • Schema drift is one of the most common causes of pipeline failures, and catching it before any records are written saves a lot of pain downstream.
  • The AttributeValidator transformer is your go-to for checking data types, nulls, ranges, regex patterns, and more.
  • AI outputs (from OpenAI or Gemini models, for example) need the same validation treatment as any other data source. Structured JSON schemas are your best tool here.
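
To make the ordering in these highlights concrete, here is a minimal sketch in plain Python. It is not FME and not the AttributeValidator transformer; it only mirrors the same ideas outside of a workspace. The schema, the ID pattern, the diameter range, and every function name are assumptions made up for illustration.

```python
import json
import math
import re

# Hypothetical schema for this sketch: attribute name -> accepted Python types.
EXPECTED_SCHEMA = {"asset_id": (str,), "diameter_mm": (int, float)}
ASSET_ID_PATTERN = re.compile(r"[A-Z]{2}-\d{4}")  # illustrative ID format


def check_schema(columns):
    """Detect schema drift from column names alone, before any record is written."""
    expected, incoming = set(EXPECTED_SCHEMA), set(columns)
    errors = [f"missing column: {c}" for c in sorted(expected - incoming)]
    errors += [f"unexpected column: {c}" for c in sorted(incoming - expected)]
    return errors


def check_geometry(coords):
    """Null and corrupt checks run first; the compliance rule runs only if they pass."""
    if not coords:
        return ["geometry is null or empty"]
    for pt in coords:
        well_formed = isinstance(pt, (list, tuple)) and len(pt) == 2
        if not well_formed or not all(
            isinstance(v, (int, float)) and math.isfinite(v) for v in pt
        ):
            return ["geometry is corrupt"]
    # Compliance rule assumed for this sketch: a line has at least two distinct points.
    if len({tuple(pt) for pt in coords}) < 2:
        return ["geometry is non-compliant: fewer than two distinct points"]
    return []


def check_attributes(record):
    """Check nulls and types for every expected attribute, then pattern and range."""
    errors = []
    for name, types in EXPECTED_SCHEMA.items():
        value = record.get(name)
        if value is None:
            errors.append(f"{name}: null value")
        elif not isinstance(value, types):
            wanted = "/".join(t.__name__ for t in types)
            errors.append(f"{name}: expected {wanted}, got {type(value).__name__}")
    asset_id = record.get("asset_id")
    if isinstance(asset_id, str) and not ASSET_ID_PATTERN.fullmatch(asset_id):
        errors.append("asset_id: does not match the expected pattern")
    diameter = record.get("diameter_mm")
    if isinstance(diameter, (int, float)) and not 0 < diameter <= 3000:
        errors.append("diameter_mm: outside the range (0, 3000]")
    return errors


def check_ai_output(raw_text):
    """Treat model output like any other source: parse it, then apply the same checks."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return ["AI output is not valid JSON"]
    if not isinstance(data, dict):
        return ["AI output must be a JSON object"]
    return check_schema(data.keys()) + check_attributes(data)
```

Calling check_schema on the incoming column names before touching any records gives the "catch drift before anything is written" behavior, and check_geometry returns as soon as it sees a null or corrupt geometry so the compliance rule never runs on broken input.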

Key Takeaways
The "fail fast, report thoroughly" principle is worth taking to heart. A validation system that only surfaces one error at a time forces endless resubmit loops. Building workflows that accumulate all errors into a clear, human-readable HTML report is a game changer, especially when paired with FME Flow Apps that let data submitters validate their own files without needing to involve the GIS team at all.
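
As a rough illustration of "report thoroughly", the sketch below runs every check on every record, keeps all the failures, and renders them as one small HTML table. It is plain Python rather than an FME workspace or Flow App, and the two checks, the field names, and the report layout are all hypothetical.

```python
from html import escape


def collect_errors(records, checks):
    """Run every check on every record and keep all failures, not just the first."""
    findings = []
    for index, record in enumerate(records, start=1):
        for check in checks:
            for message in check(record):
                findings.append((index, message))
    return findings


def render_report(findings):
    """Render findings as a small HTML fragment; messages are HTML-escaped."""
    if not findings:
        return "<p>All records passed validation.</p>"
    rows = "\n".join(
        f"<tr><td>{index}</td><td>{escape(message)}</td></tr>"
        for index, message in findings
    )
    return (
        f"<p>{len(findings)} problem(s) found.</p>\n"
        "<table>\n<tr><th>Record</th><th>Problem</th></tr>\n"
        f"{rows}\n</table>"
    )


# Hypothetical checks: each returns a list of messages, empty when the record is fine.
def check_name(record):
    return [] if record.get("name") else ["name is missing"]


def check_length(record):
    value = record.get("length_m")
    ok = isinstance(value, (int, float)) and value > 0
    return [] if ok else ["length_m must be a positive number"]


if __name__ == "__main__":
    sample = [{"name": "Main St", "length_m": 120.5}, {"length_m": -1}]
    print(render_report(collect_errors(sample, [check_name, check_length])))
```

Because each check returns a list instead of raising, a submitter sees every problem with a file in a single pass rather than fixing one error per resubmission.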

Read the full blog here

Join the Conversation
Where does bad data cause the most headaches in your workflows? Is it geometry errors, schema drift, or something else entirely? Drop it below. We'd love to hear what the community is dealing with.

See you next time for another blog spotlight!