- Apr 20, 2026
Building Internal Copilots: A Vertex AI + Workspace Integration Pattern
The internal copilot is the unloved sibling of customer-facing GenAI. No marketing wins. No deflection numbers to put on slides. Just employees using the system every day because it shaves an hour off their work. For most enterprises, this is where the largest aggregate productivity gain shows up first.
Google Workspace is a strong starting point for internal copilots because most employees already use Gmail, Docs, Sheets, Drive, Meet, and Chat. Adding a copilot doesn’t require deploying a new tool. It’s a layer inside tools they’re already in.
This post describes the integration pattern we deploy. Gemini in Workspace (formerly “Gemini for Workspace”) is the wedge. Custom data comes in through Vertex AI Search (now branded Agent Search; the APIs still use the Vertex AI Search / Discovery Engine endpoints), and heavier processing runs on Cloud Run plus Vertex AI when the conversation calls for it.
What “internal copilot” means in practice
Three categories of internal copilot land most often in our engagements:
Knowledge copilots. Answer questions over internal documentation, wikis, runbooks, policies. “What’s our travel policy for international clients?” “How do I configure the staging environment?” “Where’s the latest version of the security questionnaire we send to enterprise customers?”
Workflow copilots. Take action across systems. “Draft a follow-up email to the leads who didn’t respond.” “Create a Sheet summarizing last week’s sales by region.” “Schedule a 30-minute kickoff with the project team next Tuesday.”
Analytical copilots. Synthesize across data sources. “Summarize this quarter’s customer feedback themes.” “Compare our Q3 roadmap commitments with what shipped.” “What patterns are showing up in support tickets this month?”
A serious internal copilot ends up doing all three. The architecture has to handle the range.
The Workspace integration layer
Google Workspace gives you several places to host an internal copilot, with different trade-offs.
Gemini in Workspace (native). Gemini is built into Gmail, Docs, Sheets, and Slides. Out of the box, it can summarize, draft, and assist within those tools. Workspace administrators control feature availability and data scope.
To extend Gemini in Workspace with your own data, the integration point is the Gemini in Workspace API, reached via Google Agentspace, which connects to Vertex AI Search data stores. (Agentspace is now part of the Gemini Enterprise Agent Platform, the Cloud Next 2026 umbrella covering Agent Builder, ADK, Agent Engine, Agent Studio, Agent Garden, and Agentspace.) Documents you index into Vertex AI Search become grounding sources Gemini can cite when answering questions in Workspace contexts. Workspace Studio, introduced at Cloud Next 2026, is the authoring surface for the custom skills you expose to users from within Gemini in Workspace: it’s where non-developer authors define domain-specific actions and prompts.
Workspace Add-ons. For custom UI in Gmail, Calendar, Docs, Drive. Built with Apps Script or with the Workspace Add-ons framework. The add-on can call your own Cloud Run service, which calls Vertex AI.
Google Chat apps. Conversational interface as a Chat bot. Users @mention the copilot or DM it. Custom Chat apps are built with the Chat API and a webhook backend (typically Cloud Run).
Google Sites or standalone web UI. If the copilot needs a dedicated surface, host it on Cloud Run with Identity-Aware Proxy in front, then embed it in an internal Sites page or launch it from a Chrome extension.
Most production internal copilots end up using a combination: Workspace Add-ons for in-context assistance, a Chat app for conversational queries, and Gemini in Workspace’s native features for the everyday writing assistance.
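For the Chat app surface, the webhook logic can be quite small. The sketch below assumes the classic Google Chat interaction event shape (a `type` field, `message.argumentText`, `user.email`) and a plain `{"text": ...}` reply; `answer_question` is a hypothetical stand-in for the Vertex AI call, and verification of the incoming request is deliberately left out. Treat it as an illustration, not a drop-in handler.

```python
from typing import Any


def answer_question(question: str, user_email: str) -> str:
    """Hypothetical stand-in for the Vertex AI call; replace with a real backend."""
    return f"(stub) {user_email} asked: {question}"


def handle_chat_event(event: dict[str, Any]) -> dict[str, str]:
    """Turn a Google Chat interaction event into a simple text reply."""
    event_type = event.get("type")
    if event_type == "ADDED_TO_SPACE":
        return {"text": "Thanks for adding me. @mention me with a question."}
    if event_type != "MESSAGE":
        return {}  # assumption: other event types are ignored in this sketch

    message = event.get("message", {})
    # argumentText is the message with the @mention stripped; fall back to text.
    question = (message.get("argumentText") or message.get("text") or "").strip()
    user_email = event.get("user", {}).get("email", "")
    if not question:
        return {"text": "Ask me a question, for example: what is our travel policy?"}
    return {"text": answer_question(question, user_email)}
```

Keeping the handler a pure function over the event dict makes it easy to unit test before wiring it to a Cloud Run route.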
Reference architecture
┌────────────────────────────────────────────────────────────────────────────┐
│  User entry points                                                         │
│  Gmail / Docs / Sheets / Chat / Calendar (Workspace surfaces)              │
└────┬─────────────────────────┬─────────────────────────┬───────────────────┘
     │                         │                         │
     │  native Gemini          │  Workspace Add-on       │  Chat app
     │  in Workspace           │  (Apps Script)          │  webhook
     │                         │                         │
     ▼                         ▼                         ▼
┌──────────────────────┐  ┌──────────────────────┐  ┌────────────────────────┐
│ Vertex AI Search     │  │ Cloud Run            │  │ Cloud Run              │
│ (Workspace-grounded  │  │ (Add-on backend)     │  │ (Chat webhook)         │
│ responses with       │  │                      │  │                        │
│ citations)           │  │ IAP-gated, calls     │  │ Service account auth,  │
│                      │  │ Vertex AI            │  │ calls Vertex AI        │
└──────────┬───────────┘  └───────────┬──────────┘  └───────────┬────────────┘
           │                          │                         │
           └──────────────────────────┼─────────────────────────┘
                                      │
                                      ▼
┌────────────────────────────────────────────────────────────────────────────┐
│  Vertex AI (Gemini + Agent Builder)                                        │
│  Routes between simple Q&A and tool-using agent flows                      │
└───────────────────────────┬────────────────────────────────────────────────┘
                            │
       ┌────────────────────┼──────────────────────────┐
       │                    │                          │
       ▼                    ▼                          ▼
┌─────────────┐  ┌─────────────────────┐  ┌──────────────────────────┐
│ Vertex AI   │  │ Workspace APIs      │  │ Internal systems         │
│ Search      │  │ (Drive, Gmail,      │  │ (HR, ITSM, CRM, custom)  │
│ data stores │  │ Calendar, Sheets)   │  │ via OpenAPI or custom    │
│ (KB, wikis, │  │ Domain-wide         │  │ tool definitions         │
│ Drive)      │  │ delegation          │  │                          │
└─────────────┘  └─────────────────────┘  └──────────────────────────┘
The key design decisions:
Vertex AI Search over Drive content is the simplest path to “the copilot can answer questions about our company documents.” Vertex AI Search has native Drive connectors. Permissions are respected at query time (a user can only retrieve documents they have permission to see).
Domain-wide delegation lets your Cloud Run service act on behalf of users in Workspace APIs. Important: scope it tightly. Domain-wide delegation with broad scopes is a security risk if the service is compromised. Use it only for the specific APIs the copilot needs.
IAP in front of Cloud Run ensures only authenticated company users can reach the copilot backend. Combine it with VPC Service Controls (VPC-SC) to put a service perimeter around the data.
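One way to keep delegation scopes tight in code, as a complement to the scope list authorized in the Admin console, is an explicit allowlist that the backend consults before building impersonated credentials. This is a minimal sketch: `ALLOWED_SCOPES` and `scopes_for` are hypothetical names, and the three read-only scopes are only examples of what a copilot might need.

```python
# Hypothetical allowlist: the only (api, access) pairs this copilot may impersonate for.
ALLOWED_SCOPES = {
    ("calendar", "read"): "https://www.googleapis.com/auth/calendar.readonly",
    ("gmail", "read"): "https://www.googleapis.com/auth/gmail.readonly",
    ("drive", "read"): "https://www.googleapis.com/auth/drive.readonly",
}


def scopes_for(api: str, access: str = "read") -> list[str]:
    """Return the single narrow scope for a request, or refuse outright."""
    try:
        return [ALLOWED_SCOPES[(api, access)]]
    except KeyError:
        raise PermissionError(
            f"{api}/{access} is not an approved delegation scope"
        ) from None


print(scopes_for("calendar"))
```

Requesting one scope per call, rather than a shared broad list, means a compromised code path can only reach the API it was written for.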
Where the conversational latency budget goes
Workspace copilots are conversational. Users expect responses in 1 to 3 seconds for most queries. The latency budget breakdown:
| Stage | Budget |
|---|---|
| Workspace UI to Cloud Run | 100 to 300ms |
| Auth and request validation | 50 to 100ms |
| Vertex AI Search retrieval (if grounded) | 200 to 500ms |
| Gemini generation (2.5 Flash, thinking disabled) | 600ms to 1.4s p50 |
| Response formatting and return | 50 to 100ms |
| Total | ~1 to 2.4s |
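As a quick sanity check on the total, the stage budgets can be summed directly. The sketch below copies the numbers from the table; the `grounded` flag is an added assumption that shows what the budget looks like when the retrieval stage is skipped. Note that the generation row is a p50 range, so the sum is indicative rather than a strict bound.

```python
# (stage, low_ms, high_ms) copied from the latency table.
STAGES = [
    ("Workspace UI to Cloud Run", 100, 300),
    ("Auth and request validation", 50, 100),
    ("Vertex AI Search retrieval", 200, 500),
    ("Gemini generation", 600, 1400),
    ("Response formatting and return", 50, 100),
]


def total_budget_ms(stages, grounded=True):
    """Sum low and high bounds, optionally skipping the retrieval stage."""
    used = [s for s in stages if grounded or "retrieval" not in s[0]]
    return sum(s[1] for s in used), sum(s[2] for s in used)


print(total_budget_ms(STAGES))                  # (1000, 2400)
print(total_budget_ms(STAGES, grounded=False))  # (800, 1900)
```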
This budget holds for Gemini 2.5 Flash with thinking disabled. Set thinking_budget=0 in the ThinkingConfig (passed via GenerateContentConfig) on the chat path: 2.5 Flash defaults to thinking on, which pushes p50 latency into the 1.5 to 3s range and breaks the conversational budget. Reserve thinking-on Flash and Gemini 2.5 Pro for tasks where the response will take longer than a few seconds anyway (heavy analysis, document drafting).
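A sketch of how the thinking setting might be wired into a per-task routing function follows. It assumes the google-genai Python SDK, which accepts the generation config as a plain dict; the task labels (`"chat"` versus anything else) and the function names are hypothetical, and `generate` needs credentials and network access to run.

```python
def generation_settings(task: str) -> dict:
    """Pick a model and config for a task; the task labels are hypothetical."""
    if task == "chat":
        # Conversational path: Flash with thinking disabled.
        return {
            "model": "gemini-2.5-flash",
            "config": {"thinking_config": {"thinking_budget": 0}},
        }
    # Heavy analysis or drafting: latency matters less, so keep the defaults.
    return {"model": "gemini-2.5-pro", "config": None}


def generate(prompt: str, task: str = "chat") -> str:
    """Call Gemini via the google-genai SDK (imported lazily; needs credentials)."""
    from google import genai  # pip install google-genai

    client = genai.Client()  # assumption: Vertex AI project settings come from env vars
    settings = generation_settings(task)
    response = client.models.generate_content(
        model=settings["model"], contents=prompt, config=settings["config"],
    )
    return response.text
```

Separating the settings lookup from the API call keeps the latency-sensitive decision in one small function that can be tested without calling the model.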
The latency-critical bit: if the copilot has to call multiple tools (read your calendar, check your email, query your CRM), the request will exceed the conversational latency budget. Solutions:
- Show progress. Stream the response. Even “Looking up your calendar…” while the actual work is happening reduces perceived latency.
- Parallelize tool calls. If you need calendar and CRM data, fetch them concurrently.
- Background long-running tasks. “I’ll do this and message you when it’s done” is a valid response for tasks that genuinely take 30 seconds.
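The first two mitigations can be sketched with `asyncio`. Both tool functions here are hypothetical stand-ins (the sleeps simulate API latency), and `progress` is any callable that pushes an interim message to the user. With real Google API clients, which are blocking, each call would likely need wrapping in `asyncio.to_thread`.

```python
import asyncio
import time


async def fetch_calendar(user_email: str) -> list[str]:
    """Hypothetical tool call; the sleep stands in for a Calendar API request."""
    await asyncio.sleep(0.3)
    return ["2pm Acme kickoff"]


async def fetch_crm_notes(user_email: str) -> list[str]:
    """Hypothetical tool call; the sleep stands in for a CRM request."""
    await asyncio.sleep(0.2)
    return ["Acme: renewal due next quarter"]


async def gather_context(user_email: str, progress) -> dict:
    """Emit a progress message first, then run both tool calls concurrently."""
    progress("Looking up your calendar and CRM notes...")
    calendar, crm = await asyncio.gather(
        fetch_calendar(user_email), fetch_crm_notes(user_email)
    )
    return {"calendar": calendar, "crm": crm}


start = time.perf_counter()
context = asyncio.run(gather_context("user@example.com", print))
elapsed = time.perf_counter() - start
print(context, f"{elapsed:.2f}s")
```

Because the calls overlap, the wait is governed by the slowest tool rather than by the sum of all of them.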
The auth model that actually works
This is the part teams underestimate. Internal copilots need to act on behalf of the user, but the user’s auth context has to flow through cleanly to backend systems.
The pattern we use:
- The user is authenticated to Workspace (their normal session).
- The Workspace surface (Add-on, Chat app) passes the user’s identity to Cloud Run as a Google-signed ID token.
- Cloud Run verifies the token and extracts the user’s email.
- For Workspace APIs, Cloud Run uses domain-wide delegation to impersonate the user (with appropriate scopes).
- For internal systems (CRM, ITSM, custom), Cloud Run either passes the user’s identity through to the system (preferred) or looks up the user’s permissions and enforces them itself before calling the system with the service account.
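import os

from fastapi import FastAPI, Header, HTTPException, Request
from google.auth.transport import requests as google_requests
from google.oauth2 import id_token, service_account
from googleapiclient.discovery import build

app = FastAPI()

WORKSPACE_SA_KEY = "..."  # from Secret Manager; must be the parsed key dict, not a raw JSON string
SCOPES = ["https://www.googleapis.com/auth/calendar.readonly"]
# Assumption: the audience the Workspace surface mints tokens for, set per deployment.
EXPECTED_AUDIENCE = os.environ["EXPECTED_AUDIENCE"]


def get_user_identity(authorization_header: str) -> str:
    """Verify the Google-signed ID token and return the caller's email."""
    token = authorization_header.removeprefix("Bearer ").strip()
    try:
        # Without an audience check, any valid Google-issued ID token would pass.
        payload = id_token.verify_oauth2_token(
            token, google_requests.Request(), audience=EXPECTED_AUDIENCE
        )
    except ValueError as exc:
        raise HTTPException(status_code=401, detail="Invalid ID token") from exc
    return payload["email"]


def workspace_client_for_user(user_email: str, api: str, version: str):
    """Build a Workspace API client that impersonates user_email via domain-wide delegation."""
    credentials = service_account.Credentials.from_service_account_info(
        WORKSPACE_SA_KEY, scopes=SCOPES,
    ).with_subject(user_email)
    return build(api, version, credentials=credentials)


# Plain def: FastAPI runs it in a threadpool, so blocking Google client calls don't stall the event loop.
@app.post("/handle_query")
def handle_query(req: Request, authorization: str = Header(...)):
    user_email = get_user_identity(authorization)
    # Now we can call Workspace APIs as the user
    calendar = workspace_client_for_user(user_email, "calendar", "v3")
    events = calendar.events().list(calendarId="primary", maxResults=10).execute()
    # ...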
A common security failure mode: copilots that operate with broad service account permissions instead of the user’s permissions. A user asks “what’s on my calendar,” but the service account has access to all calendars in the domain. Tight scoping and user impersonation prevent this.
Building the right things first
A common pattern that derails internal copilot projects: the “Swiss army knife” mistake. The team wants one copilot that does everything. They define a vague scope (“an AI assistant for our employees”) and then never quite get to launch because the surface area is unbounded.
Better: pick one specific painful workflow first. Ship it. Measure usage. Add the next workflow only after the first is genuinely used.
Workflows that ship cleanly as a v1:
- Onboarding Q&A. New hires ask questions about benefits, IT setup, policy. The KB exists. The volume justifies it. Wins are easy to measure (HR ticket reduction).
- Meeting prep. A request like “Give me a 5-minute prep for my 2pm meeting with the Acme team” pulls recent emails, CRM notes, and calendar context. Each prep saves 5 to 10 minutes of context-loading.
- Document drafting from templates. “Draft a status update following our template, based on the Jira tickets closed this week.” Repetitive writing tasks the team already has a template for.
- Cross-system summary. A request like “What’s happening with project X this week” queries Jira, Slack, and email. Useful at every reporting cadence.
Measuring success
Internal copilot success is harder to measure than customer-facing GenAI because the value is qualitative. Useful metrics:
- Weekly active users. If the copilot has been launched to 1,000 employees and 50 are using it weekly, you’ve shipped a v1 that didn’t stick. Investigate why.
- Queries per active user. Below 3 per week per user, you have a curiosity, not a tool. Above 10, you have something embedded in their workflow.
- Self-reported time saved. A monthly pulse survey: “Roughly how much time did the copilot save you this month?” Combined with usage data, this gives you a defensible ROI number.
- Qualitative feedback. What are users trying to do that the copilot doesn’t handle? Those are the next workflows to add.
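The first two metrics are straightforward to compute from a query log. The sketch below assumes a hypothetical log format (one user email per query for a given week) and applies the thresholds from the list above; the `"in between"` label for the 3-to-10 range is our own placeholder.

```python
from collections import Counter


def weekly_usage(query_log: list[str], launched_to: int) -> dict:
    """Summarize one week of queries; query_log holds one user email per query."""
    per_user = Counter(query_log)
    active = len(per_user)
    queries_per_active = len(query_log) / active if active else 0.0
    if queries_per_active > 10:
        verdict = "embedded"
    elif queries_per_active < 3:
        verdict = "curiosity"
    else:
        verdict = "in between"
    return {
        "weekly_active_users": active,
        "adoption_rate": active / launched_to,
        "queries_per_active_user": queries_per_active,
        "verdict": verdict,
    }


log = ["a@example.com"] * 12 + ["b@example.com"] * 2
print(weekly_usage(log, launched_to=1000))
```

An average hides skew: in the sample log one user accounts for most of the queries, so it is worth looking at the per-user distribution as well.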
How Accelyze helps
Accelyze builds internal copilots on Google Workspace and Vertex AI. Engagements typically include workflow scoping (which copilot first), the integration pattern across Workspace surfaces, Vertex AI Search and Agent Builder setup, the auth and security model, and the rollout plan with measurement. If you’ve been told “deploy Gemini in Workspace and we’ll see what happens” and want a more deliberate approach, get in touch.