Document Intelligence Pipelines on Google Cloud: Document AI + Gemini for Enterprise Back-Offices

Document intelligence is one of those use cases that doesn’t make it into keynote demos but quietly drives a lot of enterprise GenAI spend. Insurance claims pipelines, accounts payable, contract review, regulatory submissions, KYC workflows. The volumes are large, the manual cost is high, the ROI math is unambiguous.

It’s also the use case where Google Cloud has a particularly strong offering. Document AI’s specialized extractors, plus Gemini’s multimodal and reasoning capabilities, plus BigQuery as the analytical substrate, are a stronger combination than the equivalent stack on either of the other major clouds. This isn’t marketing. It’s an observation from running comparable projects across platforms.

This is our reference build for document intelligence on GCP.

What’s in Document AI

Document AI is a family of products, not a single one. Worth being specific:

Document AI Workbench / custom processors. Train a custom extractor on your document type with as few as 50 to 100 labeled examples. Schema-driven.
Pre-trained specialized processors. Out-of-the-box extractors for invoices, receipts, W-2s, 1099s, US driver’s licenses, passports, and others. Good baselines.
Document OCR. Optical character recognition on PDFs, scanned images, mixed documents. Strong handling of tables and forms.
Layout Parser. Extracts document structure (headers, paragraphs, tables) without specific schema.
Document AI Warehouse (now part of Agentspace, itself now under the Gemini Enterprise Agent Platform). Document management with search, classification, and approval workflows.

Gemini complements these. Document AI is excellent at structured extraction; Gemini is excellent at reasoning over extracted content. The pattern that works well: Document AI extracts the fields, Gemini reasons about whether the fields are consistent, plausible, anomalous, or actionable.

Reference architecture

┌──────────────────────────────────────────────────────────────────────────┐
│  Document intake                                                          │
│  Email attachments, SFTP, S3-to-GCS transfer, UI uploads, EDI feeds       │
│  Cloud Storage landing bucket per document type                            │
└────────────────────────────┬─────────────────────────────────────────────┘
                             │ Pub/Sub trigger on object creation
                             ▼
┌──────────────────────────────────────────────────────────────────────────┐
│  Cloud Workflows orchestration                                             │
│                                                                            │
│  1. Document type classification (Gemini multimodal, fast and cheap)       │
│  2. Route to appropriate Document AI processor                             │
│  3. Run Document AI extraction                                             │
│  4. Validate extracted fields against schema                                │
│  5. Run Gemini reasoning pass over extracted content                       │
│  6. Anomaly check (deterministic rules + Gemini judgment)                  │
│  7. Confidence-based routing: auto-approve / human review / reject         │
└────────┬─────────────────────────────────────────────────────────────────┘
         │
         ▼
┌──────────────────────────────────────────────────────────────────────────┐
│  Outputs                                                                   │
│  - BigQuery: structured extraction results, joinable with business data    │
│  - Cloud Storage: processed documents, indexed by extracted metadata       │
│  - Helpdesk / workflow system: queued items for human review               │
│  - Downstream systems: ERP, accounting, claims platform via Pub/Sub        │
└──────────────────────────────────────────────────────────────────────────┘

Observability:
  Per-document trace ID propagated from intake through final disposition
  Cloud Trace spans on each pipeline stage
  Cloud Logging structured logs with confidence scores per field
  Vertex AI Evals: nightly run on golden documents
  BigQuery: full audit table of every document processed
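
The trace ID and per-field confidence logging listed above could look roughly like the following sketch. It assumes each pipeline stage writes one JSON object per line to stdout, which Cloud Run and GKE forward to Cloud Logging as structured logs. The helper names, the stage name, and the payload field names are illustrative, not part of the pipeline described here.

```python
import json
import uuid


def new_trace_id() -> str:
    """Generate a 32-hex-character trace ID at intake (hypothetical helper)."""
    return uuid.uuid4().hex


def build_log_entry(project_id, trace_id, stage, gcs_uri, field_confidences):
    """Build one structured log record for a single pipeline stage."""
    return {
        "severity": "INFO",
        "message": f"stage={stage} completed",
        # Special key Cloud Logging uses to associate a log entry with a trace.
        "logging.googleapis.com/trace": f"projects/{project_id}/traces/{trace_id}",
        "stage": stage,
        "gcs_uri": gcs_uri,
        "field_confidences": field_confidences,
        # The weakest field is usually the one that matters for routing.
        "min_confidence": min(field_confidences.values()) if field_confidences else None,
    }


# Example: one log line for the extraction stage of a single document.
example_entry = build_log_entry(
    project_id="your-project",
    trace_id=new_trace_id(),
    stage="docai_extraction",
    gcs_uri="gs://your-bucket/invoices/example.pdf",
    field_confidences={"total_amount": 0.98, "invoice_date": 0.91},
)
print(json.dumps(example_entry))
```

Passing the same trace ID to every stage is what makes a single document reconstructible from intake to final disposition.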

The Document AI plus Gemini pattern in detail

Here’s the shape of a single document going through the pipeline.

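import json

from google.cloud import documentai_v1 as documentai
from google.cloud import storage
from google import genai
from google.genai import types

PROJECT_ID = "your-project"
LOCATION = "us"  # Document AI processor location

# Placeholders: substitute your own processor ID and vendor list.
INVOICE_PROCESSOR_NAME = (
    f"projects/{PROJECT_ID}/locations/{LOCATION}/processors/your-processor-id"
)
APPROVED_VENDORS = ["Example Vendor Inc."]

storage_client = storage.Client()
docai_client = documentai.DocumentProcessorServiceClient()
genai_client = genai.Client(vertexai=True, project=PROJECT_ID, location="us-central1")
GEMINI_MODEL = "gemini-2.5-flash"

# read_gcs, parse_invoice_entities, and get_trace_id are helpers defined elsewhere.
# The client calls below are blocking, so this is a plain (non-async) function.
def process_invoice(gcs_uri: str) -> dict: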
    # 1. Document AI extraction (invoice-specific processor)
    pdf_bytes = read_gcs(gcs_uri)
    docai_request = documentai.ProcessRequest(
        name=INVOICE_PROCESSOR_NAME,
        raw_document=documentai.RawDocument(
            content=pdf_bytes, mime_type="application/pdf"
        ),
    )
    docai_result = docai_client.process_document(request=docai_request)
    extracted = parse_invoice_entities(docai_result.document)

    # 2. Gemini reasoning pass over the extraction
    reasoning_prompt = f"""
You are validating an extracted invoice. The extraction came from a specialist
OCR system, so the fields are likely accurate, but the consistency and
plausibility should be verified.

Extracted fields:
{json.dumps(extracted, indent=2)}

Check:
1. Do the line items sum to the subtotal?
2. Does subtotal plus tax equal total?
3. Is the vendor in our approved vendor list? (List: {APPROVED_VENDORS})
4. Is the invoice date plausible (not future, not over 1 year old)?
5. Any anomalies that warrant human review?

Output JSON: {{
  "math_consistent": bool,
  "vendor_approved": bool,
  "date_plausible": bool,
  "anomalies": [string],
  "recommendation": "auto_approve" | "human_review" | "reject",
  "reasoning": string
}}
"""
    reasoning = genai_client.models.generate_content(
        model=GEMINI_MODEL,
        contents=reasoning_prompt,
        config=types.GenerateContentConfig(
            temperature=0.0,
            response_mime_type="application/json",
        ),
    )
    validation = json.loads(reasoning.text)

    # 3. Compose final record
    return {
        "extraction": extracted,
        "validation": validation,
        "gcs_uri": gcs_uri,
        "trace_id": get_trace_id(),
    }
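
The parse_invoice_entities helper is not shown above. One plausible shape is sketched below, assuming the processor returns entities that expose type_, mention_text, confidence, and nested properties (the attribute names on the Document AI Python client’s entity objects). The output layout (a "fields" dict plus a "line_items" list) and the "line_item" type name are assumptions for illustration; check them against your processor’s schema.

```python
from types import SimpleNamespace


def parse_invoice_entities(document) -> dict:
    """Flatten Document AI entities into plain dicts, keeping confidences."""
    fields, line_items = {}, []
    for entity in document.entities:
        if entity.type_ == "line_item":
            # Nested properties carry per-line values, e.g. "line_item/amount".
            line_items.append({
                prop.type_.split("/")[-1]: {
                    "value": prop.mention_text,
                    "confidence": prop.confidence,
                }
                for prop in entity.properties
            })
            continue
        candidate = {"value": entity.mention_text, "confidence": entity.confidence}
        # If a field type repeats, keep the highest-confidence mention.
        current = fields.get(entity.type_)
        if current is None or candidate["confidence"] > current["confidence"]:
            fields[entity.type_] = candidate
    return {"fields": fields, "line_items": line_items}


# Stand-in for a Document AI response, for local experimentation only.
fake_document = SimpleNamespace(entities=[
    SimpleNamespace(type_="total_amount", mention_text="1347.30",
                    confidence=0.98, properties=[]),
    SimpleNamespace(type_="line_item", mention_text="Widgets 1000.00",
                    confidence=0.90, properties=[
        SimpleNamespace(type_="line_item/description", mention_text="Widgets",
                        confidence=0.95),
        SimpleNamespace(type_="line_item/amount", mention_text="1000.00",
                        confidence=0.93),
    ]),
])
parsed = parse_invoice_entities(fake_document)
```

Keeping the confidence next to each value lets the routing stage work from the same record that is sent to Gemini.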

A few things to notice in this pattern:

Document AI does the structural extraction. Gemini doesn’t try to OCR a PDF. That’s not what it’s good at, and Document AI’s specialist processors will outperform Gemini’s vision on standard document types.

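Gemini does the reasoning. Plausibility checks, anomaly detection, and a second look at the arithmetic. The judgment calls are easy to express in natural language but hard to express as deterministic rules. The arithmetic itself is also worth checking in code, which is why step 6 of the workflow pairs deterministic rules with Gemini judgment.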

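Temperature 0 and JSON-constrained output for the reasoning pass. When the output is going to drive downstream routing decisions, you don’t want creative outputs. The code above requests JSON via response_mime_type and describes the shape in the prompt; setting response_schema in the config as well enforces that shape rather than relying on the prompt.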

The recommendation is one of three values. Auto-approve, human review, reject. The pipeline acts on this. Confidence thresholds are policy decisions, not model decisions.
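
Checks 1 and 2 in the prompt are pure arithmetic, so they can also be verified in code and compared against Gemini’s math_consistent answer. A minimal sketch, assuming amounts arrive as normalized decimal strings (currency symbols and thousands separators already stripped); the function name and the one-cent tolerance are illustrative choices.

```python
from decimal import Decimal


def check_invoice_math(line_amounts, subtotal, tax, total, tolerance="0.01"):
    """Return deterministic consistency flags for an invoice's amounts."""
    tol = Decimal(tolerance)
    # Decimal avoids the rounding surprises of binary floats on currency.
    items_sum = sum((Decimal(a) for a in line_amounts), Decimal("0"))
    subtotal_d, tax_d, total_d = Decimal(subtotal), Decimal(tax), Decimal(total)
    return {
        "line_items_match_subtotal": abs(items_sum - subtotal_d) <= tol,
        "subtotal_plus_tax_matches_total": abs(subtotal_d + tax_d - total_d) <= tol,
    }


consistent = check_invoice_math(["1000.00", "247.50"], "1247.50", "99.80", "1347.30")
mismatched = check_invoice_math(["1000.00", "247.50"], "1247.05", "99.80", "1347.30")
```

A disagreement between this check and the model’s answer is itself a useful anomaly signal for the human review queue.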

Routing by confidence

Document intelligence systems live or die on the auto-approve threshold. Set it too high and humans review everything (no automation value). Set it too low and you auto-approve errors (real money out the door).

The right setting depends on the cost matrix: cost of a false auto-approve vs. cost of a false human review. We build this as an explicit policy with knobs the business can tune:

                   ┌────────────────────────┐
Document processed │  Field-level confidence │
                   │  (Document AI)          │
                   └─────────┬──────────────┘
                             │
                             ▼
                   ┌─────────────────────────────────────┐
                   │  Aggregate confidence score          │
                   │  (worst field × Gemini validation)   │
                   └─────────┬──────────────────────────┘
                             │
                  ┌──────────┼──────────┐
                  │          │          │
                  ▼          ▼          ▼
            ┌─────────┐  ┌────────┐  ┌────────┐
            │ > 0.95  │  │ 0.7-   │  │ < 0.7  │
            │ auto    │  │ 0.95   │  │ reject │
            │ approve │  │ human  │  │ or     │
            │         │  │ review │  │ retry  │
            └─────────┘  └────────┘  └────────┘

The thresholds are configuration. Different document types have different thresholds. Different downstream systems (high-trust vs. high-risk) get different thresholds.
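
The routing policy in the diagram can be sketched as a small, configuration-driven function. This assumes the Gemini validation has already been reduced to a number between 0 and 1 (for example 1.0 when every check passes); how that number is derived is not specified here. The policy names and the stricter "invoice_high_value" thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingPolicy:
    auto_approve_above: float
    reject_below: float


# Thresholds live in configuration, keyed by document type or downstream system.
POLICIES = {
    "default": RoutingPolicy(auto_approve_above=0.95, reject_below=0.70),
    "invoice_high_value": RoutingPolicy(auto_approve_above=0.98, reject_below=0.80),
}


def aggregate_confidence(field_confidences: dict, validation_score: float) -> float:
    """Worst field confidence multiplied by the validation score."""
    if not field_confidences:
        return 0.0  # Nothing extracted: treat as zero confidence.
    return min(field_confidences.values()) * validation_score


def route(field_confidences: dict, validation_score: float, doc_type: str = "default") -> str:
    """Map an aggregate score to one of the three dispositions."""
    policy = POLICIES.get(doc_type, POLICIES["default"])
    score = aggregate_confidence(field_confidences, validation_score)
    if score > policy.auto_approve_above:
        return "auto_approve"
    if score < policy.reject_below:
        return "reject_or_retry"
    return "human_review"


decision = route({"total_amount": 0.99, "invoice_date": 0.97}, validation_score=1.0)
```

Because the thresholds are data rather than code, the business can tighten or loosen them per document type without a redeploy of the pipeline logic.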

The Document AI Warehouse / human-in-the-loop pattern

For documents that need human review, the workflow needs a place for the human. Two options:

Document AI Warehouse, now consolidated into Agentspace under the Gemini Enterprise Agent Platform. Native GCP. Documents land in a managed UI with the extracted fields, the original document, and an approve/edit/reject workflow. Good if you don’t have an existing document management system.
Integration with existing systems. Most enterprises have an existing review queue (Salesforce, ServiceNow, custom internal tools). Push documents flagged for review into that system with the AI extraction pre-populated. The goal is to save the reviewer time, not to replace the workflow they already use.

Either way, capture the human’s correction. When a reviewer changes “Subtotal: $1,247.50” to “Subtotal: $1,247.05,” that correction is signal. Periodically retrain the Document AI processor on accumulated corrections. The system gets better.
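
Capturing corrections can be as simple as diffing the extracted fields against the reviewed ones and storing the result alongside the document. A minimal sketch, assuming both sides are flat dicts of field name to string value; the function name and record layout are illustrative, and turning these records into a retraining dataset is a separate step not shown here.

```python
def diff_corrections(extracted: dict, reviewed: dict) -> list[dict]:
    """List every field whose value the reviewer changed, added, or removed."""
    corrections = []
    for name in sorted(set(extracted) | set(reviewed)):
        before, after = extracted.get(name), reviewed.get(name)
        if before != after:
            corrections.append(
                {"field": name, "extracted": before, "corrected": after}
            )
    return corrections


# The subtotal example from the text: one transposed digit pair.
corrections = diff_corrections(
    extracted={"subtotal": "1247.50", "total": "1347.30"},
    reviewed={"subtotal": "1247.05", "total": "1347.30"},
)
```

Counting corrections per field over time also shows which fields the processor struggles with most.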

Verticals where this lands

Insurance claims processing. First Notice of Loss documents, medical records, repair estimates, police reports. The volumes are huge, the document types are reasonably standardized, and the regulatory environment requires audit trails (which Cloud Logging and BigQuery handle natively). High-budget category, recurring spend.

Accounts payable. Invoice ingestion, three-way match against POs and receiving documents, exception handling. Document AI’s pre-trained invoice processor is solid out of the box; customization captures the long tail.

Contract review. Different shape from invoices. Document AI Layout Parser for structure, Gemini for clause-level analysis. We’ve covered this in the worked example in the build vs. buy vs. fine-tune framework.

KYC / onboarding. ID verification, proof-of-address documents, beneficial ownership documentation. Specialized processors plus Gemini reasoning over the combination of documents (does this person’s address match across all submitted documents?).

Regulatory submissions. Drug approval filings, environmental compliance reports, financial regulatory filings. High accuracy bar, very high cost of error, but Document AI plus Gemini with a thorough human review tier produces value over the baseline of pure human review.

Cost considerations

A few patterns specific to document intelligence cost economics:

Document AI per-page pricing. Adds up at high volume. Worth measuring whether all pages need to be processed (e.g., the legal boilerplate on pages 8 to 12 of a contract often doesn’t).
Gemini token cost. Modest per document, and smaller still per page of extracted text. The reasoning pass usually fits in Flash, not Pro.
Storage. Processed documents need to live somewhere with audit retention. Cloud Storage Archive class is appropriate after the operational window.
Cloud Workflows execution cost. Negligible compared to the model and Document AI costs.

For a 100,000-document-per-month pipeline, monthly cost is typically four to low-five figures depending on complexity. The labor cost being replaced is usually six figures or more, which is why these projects fund themselves quickly.
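
A back-of-envelope estimate makes the main cost drivers explicit. The sketch below is only arithmetic: every rate in the example call is a placeholder, not real pricing, and should be replaced with current figures from the Document AI and Vertex AI pricing pages. The function and parameter names are illustrative.

```python
def estimate_monthly_cost(
    docs_per_month: int,
    pages_per_doc: float,
    docai_price_per_page: float,
    gemini_price_per_doc: float,
    skipped_page_fraction: float = 0.0,
) -> dict:
    """Estimate monthly Document AI plus Gemini spend from per-unit rates."""
    # Pages that are skipped (e.g. legal boilerplate) never reach Document AI.
    pages_processed = docs_per_month * pages_per_doc * (1 - skipped_page_fraction)
    docai_cost = pages_processed * docai_price_per_page
    gemini_cost = docs_per_month * gemini_price_per_doc
    return {
        "pages_processed": pages_processed,
        "docai_cost": round(docai_cost, 2),
        "gemini_cost": round(gemini_cost, 2),
        "total": round(docai_cost + gemini_cost, 2),
    }


# Placeholder rates for illustration only; NOT actual Google Cloud pricing.
estimate = estimate_monthly_cost(
    docs_per_month=100_000,
    pages_per_doc=3,
    docai_price_per_page=0.01,
    gemini_price_per_doc=0.002,
)
```

Raising skipped_page_fraction shows how much of the bill comes from pages that never needed extraction in the first place.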

Operational realities

A few things that break in production document pipelines:

Document type drift. The forms change. The bank updates its statement template, the supplier changes their invoice format, the regulator publishes a new version. Classification quality drifts. Schedule monthly reviews of classification confidence trends.

Garbage inputs. Customers send PDFs of photos of screens. Scanned documents come in upside down. The pipeline needs a quality gate at intake: a fast Gemini check that asks “is this actually the document type I expected?”

Edge cases in the long tail. The first 80% of documents go through cleanly. The last 20% are weird. Build the review queue with the assumption that humans will see all of that 20%. The auto-approve rate is the metric to track over time, not the per-document success rate.

Auditability requirements. Especially in regulated industries, every decision needs to be reconstructible. The full extraction, the Gemini reasoning, the confidence score, the policy version applied, the reviewer (if any), and the final disposition all need to be queryable, often for 7+ years. BigQuery with appropriate retention policies handles this; design for it from day one.
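
One way to make every decision reconstructible is to write a single audit row per document that carries each item listed above. The sketch below shows a possible record shape; the column names are assumptions rather than a prescribed schema, and the resulting dict could be loaded into BigQuery, for example with the Python client’s insert_rows_json.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AuditRecord:
    trace_id: str
    gcs_uri: str
    document_type: str
    extraction: dict
    gemini_validation: dict
    aggregate_confidence: float
    policy_version: str
    disposition: str  # auto_approve | human_review | reject
    reviewer: Optional[str] = None  # Set only when a human touched the document.
    processed_at: str = field(default_factory=_utc_now)

    def to_row(self) -> dict:
        """Serialize to a flat dict suitable for a table row."""
        row = asdict(self)
        # Nested payloads are stored as JSON strings so the table schema stays
        # stable when extraction fields change between processor versions.
        row["extraction"] = json.dumps(self.extraction, sort_keys=True)
        row["gemini_validation"] = json.dumps(self.gemini_validation, sort_keys=True)
        return row


record = AuditRecord(
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    gcs_uri="gs://your-bucket/invoices/example.pdf",
    document_type="invoice",
    extraction={"total_amount": {"value": "1347.30", "confidence": 0.98}},
    gemini_validation={"math_consistent": True, "recommendation": "auto_approve"},
    aggregate_confidence=0.98,
    policy_version="2024-01-default",
    disposition="auto_approve",
)
row = record.to_row()
```

Recording the policy version with each row matters because thresholds change over time, and an auditor needs to know which ones applied on the day of the decision.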

How Accelyze helps

Accelyze designs and builds document intelligence pipelines on Google Cloud across multiple verticals. Engagements typically include custom Document AI processor training, the Gemini reasoning layer, the human-in-the-loop integration, the BigQuery audit schema, and the cost and quality dashboards. If you have a high-volume document processing workflow that’s currently manual or partially automated, get in touch.
