AI SRE Product Overview - Flashduty Docs

Private beta: AI SRE is currently in private beta and available to invited accounts only. To join, contact the Flashduty sales team and request whitelist access. Features and the UI may change during the beta.

What Is AI SRE

AI SRE is Flashduty’s autonomous SRE agent platform. You give the AI instructions through conversation, and it independently investigates incidents, diagnoses root causes, calls tools to run diagnostics, and captures the operational knowledge gained from each investigation for future reuse. It is not a simple Q&A chatbot but a hands-on troubleshooting worker: it plans its own steps, reads and writes files, queries monitoring and logs, executes commands, calls external tools (MCP), and delegates subtasks when needed, ultimately delivering conclusions backed by a full investigation trail.

AI SRE is deeply integrated with Flashduty’s incident response system. When Flashduty generates an incident or opens a war room, you can directly trigger an AI SRE session so the agent enters the investigation with full incident context. You can collaborate either in the console or by @-mentioning the agent in your IM group (Slack / Feishu / DingTalk / WeCom).

Conversation as Troubleshooting

Describe the problem in natural language. The agent plans autonomously, calls tools, and delivers an investigation trail and conclusion — no manual scripting required.

Integrated with Incident Response

Launch a session with one click from an incident or war room. The agent enters the investigation with full incident context, and the knowledge it accumulates feeds back into the next response.

Typical Use Cases

AI SRE is more than a chat box in the console — it covers multiple collaboration entry points across the full lifecycle from incident trigger to retrospective:

Conversational Troubleshooting

Ask questions proactively in the console’s Chat workspace: why a service is behaving abnormally, the root cause of an alert, or the blast radius of a change. The agent streams its planning, tool calls, and intermediate findings before delivering a final conclusion.

On-Demand in IM

No context switching needed — in Slack, Feishu, DingTalk, or WeCom group chats or direct messages, @ AI SRE to start or continue an investigation. It replies in-thread so all team members stay in the loop. See IM Integration.

Automatic War Room Diagnosis

When a war room is opened for an incident, AI SRE automatically runs an initial diagnostic and posts the conclusion back to the war room — by the time your team starts investigating, a first-pass analysis is already in the channel.

Usage Insights

Use /insight to review the past 30 days of sessions, quantify where you’re spending time, identify missing runbooks and repeatedly copy-pasted context, and receive actionable improvement suggestions.

Beta Access & Activation

AI SRE is currently in private beta. Activation requires both of the following conditions to be met:

Subscription Requirement: Pro or Above

AI SRE requires a Pro or higher subscription. Consistent with other professional capabilities such as Status Page and alert ingestion, it is unavailable on lower tiers, and the UI will prompt you to upgrade.

Whitelist Activation

During the private beta, AI SRE is available only to invited accounts: Flashduty must add your account to the whitelist. Even with a Pro subscription, accounts not on the whitelist will not see the AI SRE entry point.

To join the private beta, contact the Flashduty sales team to request whitelist access.

Core Capabilities

AI SRE is built around “conversational troubleshooting + knowledge accumulation + autonomous execution,” with every capability configurable and manageable from the console.

Conversational Troubleshooting

Collaborate with the agent session by session. A session handles your messages one at a time, with streaming output, cancellation at any time, automatic context compaction for long conversations, and one-click launch from an incident.
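As a rough illustration of "one message at a time, streamed, cancellable", the sketch below models a session as a queue plus a cancel flag. The `Session` class, its method names, and the three placeholder steps are hypothetical and are not Flashduty's API; context compaction is left out.

```python
import threading
from collections import deque
from typing import Iterator


class Session:
    """Hypothetical model of a session: a message queue and a cancel flag."""

    def __init__(self) -> None:
        self._pending: deque[str] = deque()
        self._cancel = threading.Event()

    def submit(self, message: str) -> None:
        # Messages wait in arrival order; nothing runs until run_next() is consumed.
        self._pending.append(message)

    def cancel(self) -> None:
        # Ask the message currently being processed to stop.
        self._cancel.set()

    def run_next(self) -> Iterator[str]:
        """Stream output for the oldest pending message, stopping early if cancelled."""
        if not self._pending:
            return
        message = self._pending.popleft()
        self._cancel.clear()
        # Placeholder steps standing in for planning, tool calls, and the conclusion.
        for step in ("plan", "tool call", "conclusion"):
            if self._cancel.is_set():
                yield "[cancelled]"
                return
            yield f"{step}: {message}"
```

Consuming `run_next()` to the end before calling it again keeps processing strictly serial; calling `cancel()` between chunks ends the current stream at the next step boundary.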

IM Platform

@ the agent in Slack / Feishu / DingTalk / WeCom to start or continue an investigation, and receive automatic initial diagnostics in incident war rooms.

Skills

Skill packages that the agent can invoke, encapsulating reusable troubleshooting workflows. Scope can be set to account or team; once enabled, they are loaded on demand during sessions.

Manage Knowledge

Knowledge Packs that use DUTY.md as the entry point and pull in further files through @-reference links. They carry service catalogs, runbooks, on-call routing, and other long-lived context, and are loaded in layers by account and team.
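The following sketch shows one way an entry-point file with @-references and account/team layering could be resolved. The reference syntax (`@` followed by a relative path), the in-memory pack representation, and both function names are assumptions made for illustration; the actual DUTY.md format is not specified here.

```python
import re
from typing import Optional

# Assumption: a pack is a mapping of file name -> file content.
Pack = dict[str, str]

# Assumption: a reference is "@" followed by a relative path, e.g. "@runbooks/db.md".
REF_PATTERN = re.compile(r"(?<!\S)@([\w./-]+)")


def collect_files(pack: Pack, entry: str = "DUTY.md") -> list[str]:
    """Return the files reachable from the entry point, in first-seen order."""
    seen: list[str] = []
    stack = [entry]
    while stack:
        name = stack.pop()
        if name in seen or name not in pack:
            continue  # skip repeats (including cycles) and dangling references
        seen.append(name)
        # Push in reverse so references are visited in document order.
        stack.extend(reversed(REF_PATTERN.findall(pack[name])))
    return seen


def packs_for_session(
    account_pack: Optional[Pack],
    team_packs: dict[str, Pack],
    team_id: Optional[str],
) -> list[Pack]:
    """Account-level pack first, then the team's pack if the session has a team."""
    layers: list[Pack] = []
    if account_pack is not None:
        layers.append(account_pack)
    if team_id is not None and team_id in team_packs:
        layers.append(team_packs[team_id])
    return layers
```

Here a session without a team sees only the account-level pack, while a team session sees the account-level pack followed by that team's pack.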

MCP (External Tools)

Connect external tools and data sources via the Model Context Protocol. MCP servers are not pre-connected; the agent establishes, uses, and closes connections on demand when making a call.
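A minimal sketch of the connect, use, close lifecycle described above, using a context manager so the connection is always closed after the call. `FakeMcpConnection` is a stand-in written for this example; it is not an MCP SDK class, and the server and tool names in the usage are made up.

```python
from contextlib import contextmanager
from typing import Any, Iterator


class FakeMcpConnection:
    """Hypothetical stand-in for a real MCP client connection."""

    def __init__(self, server: str) -> None:
        self.server = server
        self.is_open = True

    def call_tool(self, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
        if not self.is_open:
            raise RuntimeError("connection is closed")
        # Echo the request; a real server would return the tool's result.
        return {"server": self.server, "tool": name, "arguments": arguments}

    def close(self) -> None:
        self.is_open = False


@contextmanager
def on_demand_connection(server: str, log: list[str]) -> Iterator[FakeMcpConnection]:
    """Open a connection only for the duration of the with-block."""
    conn = FakeMcpConnection(server)
    log.append(f"connect:{server}")
    try:
        yield conn
    finally:
        # Runs even if the tool call raises, so no connection is left open.
        conn.close()
        log.append(f"close:{server}")


def call_once(server: str, tool: str, arguments: dict[str, Any], log: list[str]) -> dict[str, Any]:
    """Establish, use, and close a connection for a single tool call."""
    with on_demand_connection(server, log) as conn:
        log.append(f"call:{tool}")
        return conn.call_tool(tool, arguments)
```

Calling `call_once("metrics", "query", {"q": "up"}, log)` records a connect, a call, and a close in that order, so nothing stays connected between calls.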

Agent

Delegate tasks to external remote agents via the standard A2A protocol. AI SRE also exposes its own Agent Card so external clients can invoke it in reverse.

BYOC

The execution layer for the agent. By default it uses Flashduty-managed cloud sandboxes; alternatively, you can deploy a persistent Runner on your own machine so investigations can reach your private network.

Usage Insights

Use /insight to review the past 30 days of AI SRE sessions and receive a quantified overview, narrative summary, and actionable operational improvement suggestions (read-only; nothing is applied automatically).

Console Navigation

After entering AI SRE, the top navigation is organized into the following four areas (menu names match the console):

Area	Menu Name	Purpose
Chat	Chat	The primary workspace for collaborating with the agent on troubleshooting. The left panel shows your session list (with search, filtering, pinning, and archiving); the right panel shows the conversation and investigation trail.
Plugins	Plugins	Manage extensible resources the agent can invoke, organized into three sub-tabs: Skill (skill packages), Agents (A2A remote agents), MCP (external tools).
Knowledges	Knowledges	Manage Knowledge Packs. At most one per target: account-level (visible to all agents) plus per-team (loaded only in that team’s sessions).
Environments	Environments	Manage self-hosted Runners. A Runner is a persistent process that handles the agent’s tool, Skill, and MCP calls; if no Runner is available, sessions fall back to the cloud sandbox.

Visibility of each area is determined by your access permissions in the account: menus or sub-tabs you do not have permission for will not appear in the navigation.
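The Environments fallback described above can be sketched as a simple selection rule. The `Runner` type, the `CLOUD_SANDBOX` label, and the choice of the first online Runner are assumptions; how the product picks among several online Runners is not stated.

```python
from dataclasses import dataclass

# Hypothetical label for the Flashduty-managed default environment.
CLOUD_SANDBOX = "cloud-sandbox"


@dataclass
class Runner:
    name: str
    online: bool


def select_environment(runners: list[Runner]) -> str:
    """Prefer an online self-hosted Runner; otherwise fall back to the cloud sandbox."""
    for runner in runners:
        if runner.online:
            # Assumption: the first online Runner wins.
            return runner.name
    return CLOUD_SANDBOX
```

With no Runners registered, or with every Runner offline, the function returns the sandbox label, mirroring the fallback behavior.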

Quick Start

Activate Access

Confirm your account has a Pro or higher subscription and has been added to the AI SRE private beta whitelist (contact the sales team to request access).

Open AI SRE

In the Flashduty console sidebar, open AI SRE. You will land in the Chat workspace by default.

Create a Session

Click “New Chat” to create a session. Sessions use the app_name=ai-sre agent by default and automatically select an online environment (falling back to the cloud sandbox if none is available).

Ask in Natural Language

Describe the problem you want to investigate in the chat box — for example, why a service is misbehaving, the root cause of an alert, or the blast radius of a change.

Review the Investigation Trail and Conclusion

The agent streams its planning, tool calls, and intermediate findings, then delivers a final conclusion. You can ask follow-up questions, cancel at any time, or relaunch the session with context from an incident or war room.

Reusable knowledge distilled during an investigation can be saved as a Knowledge Pack so subsequent sessions load it automatically; frequently used troubleshooting workflows can be packaged as Skills.

Next Steps

Console

Learn about sessions, streaming output, cancellation and context compaction, and how to launch an investigation from an incident.

IM Platform

@ the agent in Slack / Feishu / DingTalk / WeCom, and learn about automatic war room diagnostics.

Usage Insights

Use /insight to review the past 30 days of sessions and surface operational friction such as repeated context and missing runbooks.
