Private beta: AI SRE is currently in private beta and available to invited accounts only. To request access, contact the Flashduty sales team. Features and the UI may change during the beta.
What Is AI SRE
AI SRE is Flashduty’s autonomous SRE agent platform. You give the AI instructions through conversation, and it independently investigates incidents, diagnoses root causes, calls tools to run diagnostics, and captures the operational knowledge gained from each investigation for future reuse. It is not a simple Q&A chatbot but a hands-on troubleshooting worker: it plans its own steps, reads and writes files, queries monitoring and logs, executes commands, calls external tools (MCP), and delegates subtasks when needed, ultimately delivering conclusions backed by a full investigation trail.
AI SRE is deeply integrated with Flashduty’s incident response system. When Flashduty generates an incident or opens a war room, you can directly trigger an AI SRE session so the agent enters the investigation with full incident context. You can collaborate either in the console or by @-mentioning it directly in your IM group (Slack / Feishu / DingTalk / WeCom).
Conversation as Troubleshooting
Describe the problem in natural language. The agent plans autonomously, calls tools, and delivers an investigation trail and conclusion — no manual scripting required.
Integrated with Incident Response
Launch a session with one click from an incident or war room. The agent enters the investigation with full incident context, and the knowledge it accumulates feeds back into the next response.
Typical Use Cases
AI SRE is more than a chat box in the console — it covers multiple collaboration entry points across the full lifecycle from incident trigger to retrospective:
Conversational Troubleshooting
Ask questions proactively in the console’s Chat workspace: why a service is behaving abnormally, the root cause of an alert, or the blast radius of a change. The agent streams its planning, tool calls, and intermediate findings before delivering a final conclusion.
On-Demand in IM
No context switching needed — in Slack, Feishu, DingTalk, or WeCom group chats or direct messages, @ AI SRE to start or continue an investigation. It replies in-thread so all team members stay in the loop. See IM Integration.
Automatic War Room Diagnosis
When a war room is opened for an incident, AI SRE automatically runs an initial diagnostic and posts the conclusion back to the war room — by the time your team starts investigating, a first-pass analysis is already in the channel.
Usage Insights
Use /insight to review the past 30 days of sessions, quantify where you’re spending time, identify missing runbooks and repeatedly copy-pasted context, and receive actionable improvement suggestions.
Beta Access & Activation
AI SRE is currently in private beta. Activation requires both of the following conditions to be met:
Subscription Requirement: Pro or Above
AI SRE requires a Pro or higher subscription. Consistent with other professional capabilities such as Status Page and alert ingestion, it is unavailable on lower tiers, and the UI will prompt you to upgrade.
Whitelist Activation
During the private beta, AI SRE is available only to invited accounts, and Flashduty must add your account to the whitelist. Even with a Pro subscription, accounts not on the whitelist will not see the AI SRE entry point.
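The two activation conditions above can be read as a simple conjunction. The following is a minimal sketch of that check, not Flashduty's implementation: the `Account` class, its field names, and the `missing_requirements` helper are all hypothetical and exist only to illustrate that both conditions must hold.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical fields standing in for state that Flashduty tracks.
    pro_or_above: bool       # subscription is Pro or higher
    beta_whitelisted: bool   # account was added to the private beta whitelist


def missing_requirements(account: Account) -> list[str]:
    """Return the activation conditions the account does not yet meet."""
    missing = []
    if not account.pro_or_above:
        missing.append("Pro or higher subscription")
    if not account.beta_whitelisted:
        missing.append("private beta whitelist")
    return missing


def can_use_ai_sre(account: Account) -> bool:
    """AI SRE is usable only when no requirement is missing."""
    return not missing_requirements(account)
```

For example, an account with `pro_or_above=True` and `beta_whitelisted=False` would still be reported as missing the whitelist condition, which matches the note above that a Pro subscription alone is not enough.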
Core Capabilities
AI SRE is built around “conversational troubleshooting + knowledge accumulation + autonomous execution,” with every capability configurable and manageable from the console.
Conversational Troubleshooting
Collaborate with the agent session by session. A session handles your messages one at a time, with streaming output, cancellation at any time, automatic context compaction for long conversations, and one-click launch from an incident.
IM Platform
@ the agent in Slack / Feishu / DingTalk / WeCom to start or continue an investigation, and receive automatic initial diagnostics in incident war rooms.
Skills
Skill packages that the agent can invoke, encapsulating reusable troubleshooting workflows. Scope can be set to account or team; once enabled, they are loaded on demand during sessions.
Manage Knowledge
Knowledge packages that use DUTY.md as the entry point and link to further content through @-references. They carry service catalogs, runbooks, on-call routing, and other long-lived context, and are loaded in layers by account and team.
MCP (External Tools)
Connect external tools and data sources via the Model Context Protocol. MCP servers are not pre-connected; the agent establishes, uses, and closes connections on demand when making a call.
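The on-demand lifecycle described above (establish, use, close per call) maps naturally onto a context manager. The sketch below illustrates only that pattern; `FakeMCPConnection`, `on_demand`, and `call_tool` are hypothetical stand-ins and are not part of any MCP SDK or of AI SRE itself.

```python
from contextlib import contextmanager


class FakeMCPConnection:
    """Hypothetical stand-in for a real MCP client session."""

    def __init__(self, server: str):
        self.server = server
        self.is_open = True

    def call(self, tool: str, **args):
        if not self.is_open:
            raise RuntimeError("connection is closed")
        # A real client would send the request to the MCP server here.
        return {"server": self.server, "tool": tool, "args": args}

    def close(self):
        self.is_open = False


@contextmanager
def on_demand(server: str):
    conn = FakeMCPConnection(server)  # establish only when a call is needed
    try:
        yield conn                    # use the connection for the call
    finally:
        conn.close()                  # close once the call is done, even on error


def call_tool(server: str, tool: str, **args):
    """Make one tool call without keeping a connection open between calls."""
    with on_demand(server) as conn:
        return conn.call(tool, **args)
```

The trade-off in this pattern is a connection setup cost on each call in exchange for holding no idle connections to servers that a given session may never use.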
Agent
Delegate tasks to external remote agents via the standard A2A protocol. AI SRE also exposes its own Agent Card so external clients can invoke it in reverse.
BYOC
The execution layer for the agent. By default it uses Flashduty-managed cloud sandboxes; alternatively, deploy a persistent Runner on your own machine so that investigations can reach your private network.
Usage Insights
Use /insight to review the past 30 days of AI SRE sessions and receive a quantified overview, narrative summary, and actionable operational improvement suggestions (read-only; nothing is applied automatically).
Console Navigation
After entering AI SRE, the top navigation is organized into the following four areas (menu names match the console):
| Area | Menu Name | Purpose |
|---|---|---|
| Chat | Chat | The primary workspace for collaborating with the agent on troubleshooting. The left panel shows your session list (with search, filtering, pinning, and archiving); the right panel shows the conversation and investigation trail. |
| Plugins | Plugins | Manage extensible resources the agent can invoke, organized into three sub-tabs: Skill (skill packages), Agents (A2A remote agents), MCP (external tools). |
| Knowledges | Knowledges | Manage Knowledge Packs. At most one per target: account-level (visible to all agents) plus per-team (loaded only in that team’s sessions). |
| Environments | Environments | Manage self-hosted Runners. The persistent process handles the agent’s tool, Skill, and MCP calls; if none is available, sessions fall back to the cloud sandbox. |
Visibility of each area is determined by your access permissions in the account: menus or sub-tabs you do not have permission for will not appear in the navigation.
Quick Start
Activate Access
Confirm your account has a Pro or higher subscription and has been added to the AI SRE private beta whitelist (contact the sales team to request access).
Open AI SRE
In the Flashduty console sidebar, open AI SRE. You will land in the Chat workspace by default.
Create a Session
Click “New Chat” to create a session. Sessions use the app_name=ai-sre agent by default and automatically select an online environment (falling back to the cloud sandbox if none is available).
Ask in Natural Language
Describe the problem you want to investigate in the chat box — for example, why a service is misbehaving, the root cause of an alert, or the blast radius of a change.
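The environment selection mentioned in the session step (use an online environment, otherwise fall back to the cloud sandbox) can be sketched as follows. This is an illustration under stated assumptions: the `Runner` class, the `CLOUD_SANDBOX` identifier, and the rule of picking the first online Runner are hypothetical, since the page does not say how AI SRE chooses among several online Runners.

```python
from dataclasses import dataclass

# Hypothetical identifier for the Flashduty-managed cloud sandbox.
CLOUD_SANDBOX = "cloud-sandbox"


@dataclass
class Runner:
    name: str
    online: bool


def select_environment(runners: list[Runner]) -> str:
    """Pick an online self-hosted Runner, else fall back to the cloud sandbox."""
    for runner in runners:
        if runner.online:
            # Assumption: the first online Runner wins; the real policy is not documented here.
            return runner.name
    return CLOUD_SANDBOX
```

With no Runners registered, or with every Runner offline, the function returns the sandbox identifier, mirroring the fallback described above.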
Next Steps
Console
Learn about sessions, streaming output, cancellation and context compaction, and how to launch an investigation from an incident.
IM Platform
@ the agent in Slack / Feishu / DingTalk / WeCom, and learn about automatic war room diagnostics.
Usage Insights
Use /insight to review the past 30 days of sessions and surface operational friction such as repeated context and missing runbooks.