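pam_deploy_graph/agent.py: a progress action is not marked completed until it actually finishes; on timeout the workflow pauses at the current action and can later resume from that point.
LLM prompts and rules: add a `progress_complete` decision field.
deploy.sh / deploy.ps1: the poll-* action entry points now run a single query.
interactive.py: chat announces progress updates.
config.txt.example / README / packaging docs / Skill docs: sync the progress-query parameters and the new workflow semantics.
Tests: add coverage for repeated progress queries, timeout pause, and chat progress announcements.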
parent 1cb1b42395
commit e572a26e6f
11
README.md
@ -83,6 +83,7 @@ packaging/
|
|||||||
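- Before executing, chat normalizes the parameters and shows the values actually written to the script config; `script_only` / `hybrid_node_mcp` check in advance that `ZIP_FILE_PATH` exists.
|
- Before executing, chat normalizes the parameters and shows the values actually written to the script config; `script_only` / `hybrid_node_mcp` check in advance that `ZIP_FILE_PATH` exists.
|
||||||
- During execution, chat announces the start, completion, or failure of each action; when an action fails, the flow stops at the current checkpoint and no longer misreports LangGraph as unavailable.
|
- During execution, chat announces the start, completion, or failure of each action; when an action fails, the flow stops at the current checkpoint and no longer misreports LangGraph as unavailable.
|
||||||
- Every action goes through one LLM/rule review after it completes; an action is recorded as completed only if the review passes. If the review recommends stopping, the flow pauses and waits for the user to `resume` and retry the current action.
|
- Every action goes through one LLM/rule review after it completes; an action is recorded as completed only if the review passes. If the review recommends stopping, the flow pauses and waits for the user to `resume` and retry the current action.
|
||||||
|
- `poll-download-progress` and `poll-upgrade-progress` are now single progress queries; the workflow calls them repeatedly according to the configuration, hands each result to the LLM/rule review to decide whether the task is complete, and announces progress through chat.
|
||||||
- `--analyze-actions` and `llm action-analysis on` now only control whether detailed review results are written to `events`; they no longer control whether the review runs.
|
- `--analyze-actions` and `llm action-analysis on` now only control whether detailed review results are written to `events`; they no longer control whether the review runs.
|
||||||
- chat announces when an action review starts, completes, or fails, so execution is not a black box.
|
- chat announces when an action review starts, completes, or fails, so execution is not a black box.
|
||||||
- chat supports interrupting execution with `Ctrl+C`; the checkpoint is saved and the run can be continued with `resume`.
|
- chat supports interrupting execution with `Ctrl+C`; the checkpoint is saved and the run can be continued with `resume`.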
|
||||||
@ -90,7 +91,7 @@ packaging/
|
|||||||
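- Custom action-review prompts are supported via `--llm-action-analysis-prompt-file`, `PAM_LLM_ACTION_ANALYSIS_PROMPT_FILE`, or `llm config action_analysis_prompt_file=...` inside chat.
|
- Custom action-review prompts are supported via `--llm-action-analysis-prompt-file`, `PAM_LLM_ACTION_ANALYSIS_PROMPT_FILE`, or `llm config action_analysis_prompt_file=...` inside chat.
|
||||||
- Added a unified runtime log, written to `logs/pam_deploy_agent.log` by default, covering the key flows: CLI/chat, LLM calls, action routing, script/MCP calls, LangGraph, and checkpoints.
|
- Added a unified runtime log, written to `logs/pam_deploy_agent.log` by default, covering the key flows: CLI/chat, LLM calls, action routing, script/MCP calls, LangGraph, and checkpoints.
|
||||||
- chat supports `llm test [text]`, which makes one lightweight call with the current LLM client to confirm that the real LLM or the rule fallback loads correctly.
|
- chat supports `llm test [text]`, which makes one lightweight call with the current LLM client to confirm that the real LLM or the rule fallback loads correctly.
|
||||||
- Added basic tests; the current local result is `62 passed, 2 skipped`.
|
- Added basic tests; the current local result is `66 passed, 3 skipped`.
|
||||||
|
|
||||||
Not yet done:
|
Not yet done: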
|
||||||
|
|
||||||
@ -299,7 +300,13 @@ PAM> resume
|
|||||||
PAM> exit
|
PAM> exit
|
||||||
```
|
```
|
||||||
|
|
||||||
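By default, `chat` still requires you to enter `run` explicitly within the session, and it executes actions only after you confirm the parameters, the target IP range, and the final execution. Greetings such as `你好` or `hello` do not trigger LLM/structured analysis; to analyze a deployment requirement, describe the deployment task directly or use `analyze <requirement>` explicitly. After each action completes, one LLM/rule review runs automatically, and its start and end are announced; an action is recorded as completed only if the review passes. If the review recommends stopping, or the review itself fails, the flow pauses, prints the recommendation, and waits for the user to decide whether to `resume` and retry the current action. A failed per-IP action also pauses the flow; after fixing the external environment, `resume` retries from the current action. If a rollback is really needed, run it explicitly with `rollback [IP]`. `llm test [text]` tests whether the current LLM client is usable. `--analyze-actions` only controls whether detailed review results are written to `events`. You can press `Ctrl+C` during execution; chat saves the current checkpoint and marks the flow as `user_interrupted`. `set KEY=VALUE` and `load params <path>` sync updates to the current run state, `config.txt`, and the checkpoint. `chat` also supports `--llm-base-url` / `--llm-api-key` / `--llm-model` / `--llm-action-analysis-prompt-file`, `--mcp-config`, and `--analyze-actions`.
|
By default, `chat` still requires you to enter `run` explicitly within the session, and it executes actions only after you confirm the parameters, the target IP range, and the final execution. Greetings such as `你好` or `hello` do not trigger LLM/structured analysis; to analyze a deployment requirement, describe the deployment task directly or use `analyze <requirement>` explicitly. After each action completes, one LLM/rule review runs automatically, and its start and end are announced; an action is recorded as completed only if the review passes. If the review recommends stopping, or the review itself fails, the flow pauses, prints the recommendation, and waits for the user to decide whether to `resume` and retry the current action. `poll-download-progress` and `poll-upgrade-progress` each query progress only once per call; the workflow calls them repeatedly according to `POLL_INTERVAL_SEC`, `DOWNLOAD_POLL_MAX_ATTEMPTS`, and `UPGRADE_POLL_MAX_ATTEMPTS`, and after each result it lets the LLM/rules decide whether the task is complete and announces the progress; while the task is incomplete, the flow does not move on to the next action. A failed per-IP action also pauses the flow; after fixing the external environment, `resume` retries from the current action. If a rollback is really needed, run it explicitly with `rollback [IP]`. `llm test [text]` tests whether the current LLM client is usable. `--analyze-actions` only controls whether detailed review results are written to `events`. You can press `Ctrl+C` during execution; chat saves the current checkpoint and marks the flow as `user_interrupted`. `set KEY=VALUE` and `load params <path>` sync updates to the current run state, `config.txt`, and the checkpoint. `chat` also supports `--llm-base-url` / `--llm-api-key` / `--llm-model` / `--llm-action-analysis-prompt-file`, `--mcp-config`, and `--analyze-actions`.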
|
||||||
|
|
||||||
|
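Progress-query parameters:
|
||||||
|
|
||||||
|
- `POLL_INTERVAL_SEC`: seconds to wait between two progress queries; default `2`.
|
||||||
|
- `DOWNLOAD_POLL_MAX_ATTEMPTS`: maximum number of cloud-download progress queries; default `60`.
|
||||||
|
- `UPGRADE_POLL_MAX_ATTEMPTS`: maximum number of per-IP push progress queries; default `600`.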
|
||||||
|
|
||||||
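How the three parameters interact can be sketched roughly as follows. This is a minimal illustration, not the code in `pam_deploy_graph/agent.py`: `poll_until_complete`, `query_once`, and `is_complete` are hypothetical names, and the `"completed"` / `"paused"` return values are assumptions standing in for the workflow's real state handling.

```python
import time
from typing import Callable, Dict


def poll_until_complete(
    query_once: Callable[[], Dict[str, str]],
    is_complete: Callable[[Dict[str, str]], bool],
    poll_interval_sec: float = 2,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Repeat a single-shot progress query until done or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = query_once()  # one poll-* call performs exactly one query
        print(f"progress ({attempt}/{max_attempts}): {result}")
        if is_complete(result):
            return "completed"
        if attempt < max_attempts:
            sleep(poll_interval_sec)  # wait before querying again
    # Attempts exhausted: do not mark the action completed, so the
    # workflow can stay on the current action and be resumed later.
    return "paused"
```

Passing `sleep` as a parameter is only a convenience so the loop can be exercised without real waiting; with the defaults above, a download would be queried at most 60 times, 2 seconds apart.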
## Logs
|
## Logs
|
||||||
|
|
||||||
|
|||||||
@ -1,6 +1,6 @@
|
|||||||
---
|
---
|
||||||
name: pam-auto-deply
|
name: pam-auto-deply
|
||||||
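description: An intelligent deployment Skill for PAM HOME/NODE. The Skill understands user requirements, collects and confirms parameters, selects the execution mode, orchestrates the main flow, and controls rollback confirmation and the final summary; the existing deploy.sh / deploy.ps1 provide the action capabilities that perform version creation, upload, publishing, node discovery, cloud download, upgrade, start/stop, verification, log download, and manual rollback. Automatically generating or modifying scripts is prohibited, as is deploying through the scripts' main flow.
|
description: An intelligent deployment Skill for PAM HOME/NODE. The Skill understands user requirements, collects and confirms parameters, selects the execution mode, orchestrates the main flow, and controls progress queries and the final summary; the existing deploy.sh / deploy.ps1 provide the action capabilities that perform version creation, upload, publishing, node discovery, cloud download, upgrade, start/stop, verification, log download, and manual rollback. Automatically generating or modifying scripts is prohibited, as is deploying through the scripts' main flow.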
|
||||||
---
|
---
|
||||||
|
|
||||||
# PAM_AUTO_DEPLY Skill
|
# PAM_AUTO_DEPLY Skill
|
||||||
@ -22,7 +22,7 @@ description: 面向 PAM HOME/NODE 的智能部署 Skill。由 Skill 负责理解
|
|||||||
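- Do not automatically generate, rebuild, overwrite, or modify `deploy.sh`, `deploy.ps1`, `deploy.bat`, `test_deploy.sh`, `test_deploy.ps1`, or `test_deploy.bat`.
|
- Do not automatically generate, rebuild, overwrite, or modify `deploy.sh`, `deploy.ps1`, `deploy.bat`, `test_deploy.sh`, `test_deploy.ps1`, or `test_deploy.bat`.
|
||||||
- Before any real call, the normalized parameters must be shown to the user and confirmed.
|
- Before any real call, the normalized parameters must be shown to the user and confirmed.
|
||||||
- During a real deployment, the current stage, the next action, and the stage results must be shown to the user continuously; long silent execution is prohibited.
|
- During a real deployment, the current stage, the next action, and the stage results must be shown to the user continuously; long silent execution is prohibited.
|
||||||
- Rollback must not run automatically. Scripts may only output `PENDING_AGENT_CONFIRMATION(...)`, and the Agent must confirm with the user first.
|
- Rollback must not run automatically; after a failure, the main workflow only pauses at the current action. When a rollback is needed, the user must explicitly enter `rollback [IP]` or call the `rollback-ip` action directly.
|
||||||
|
|
||||||
## 2. Execution mode selection
|
## 2. Execution mode selection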
|
||||||
|
|
||||||
@ -68,6 +68,9 @@ description: 面向 PAM HOME/NODE 的智能部署 Skill。由 Skill 负责理解
|
|||||||
| `actionType` | `ACTION_TYPE` | No | Upgrade type; default `FULL` |
|
| `actionType` | `ACTION_TYPE` | No | Upgrade type; default `FULL` |
|
||||||
| `timeOut` | `TIMEOUT` | No | API-level timeout parameter; default `120` |
|
| `timeOut` | `TIMEOUT` | No | API-level timeout parameter; default `120` |
|
||||||
| `logName` | `LOG_NAME` | No | Log file name; default `app.log` |
|
| `logName` | `LOG_NAME` | No | Log file name; default `app.log` |
|
||||||
|
| `pollIntervalSec` | `POLL_INTERVAL_SEC` | No | Interval between two progress queries; default `2` seconds |
|
||||||
|
| `downloadPollMaxAttempts` | `DOWNLOAD_POLL_MAX_ATTEMPTS` | No | Maximum number of cloud-download progress queries; default `60` |
|
||||||
|
| `upgradePollMaxAttempts` | `UPGRADE_POLL_MAX_ATTEMPTS` | No | Maximum number of per-IP push progress queries; default `600` |
|
||||||
|
|
||||||
### 3.2 Runtime control parameters
|
### 3.2 Runtime control parameters
|
||||||
|
|
||||||
@ -77,13 +80,12 @@ description: 面向 PAM HOME/NODE 的智能部署 Skill。由 Skill 负责理解
|
|||||||
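- `showUsageOnly`: whether to only explain how to use the existing scripts, without executing
|
- `showUsageOnly`: whether to only explain how to use the existing scripts, without executing
|
||||||
- `userSpecifiedIps`: the subset of target IPs specified by the user
|
- `userSpecifiedIps`: the subset of target IPs specified by the user
|
||||||
- `allOrNothing`: whether all-or-nothing behavior is required
|
- `allOrNothing`: whether all-or-nothing behavior is required
|
||||||
- `rollbackApproved`: whether the user has confirmed the rollback
|
- `rollbackApproved`: whether the user has explicitly requested a rollback
|
||||||
- `osTarget`: the target script entry environment
|
- `osTarget`: the target script entry environment
|
||||||
- `checkpointPath`: checkpoint file path
|
- `checkpointPath`: checkpoint file path
|
||||||
- `resumeFromCheckpoint`: whether to resume and retry from an existing checkpoint
|
- `resumeFromCheckpoint`: whether to resume and retry from an existing checkpoint
|
||||||
- `traceFilePath`: the API trace log file path shared by the whole current deployment
|
- `traceFilePath`: the API trace log file path shared by the whole current deployment
|
||||||
- `stepIntervalSec`: the interval between one global action and the next
|
- `stepIntervalSec`: the interval between one global action and the next
|
||||||
- `firstPollDelaySec`: the wait between creating the download task and the first download-progress poll
|
|
||||||
- `perIpStepIntervalSec`: the interval between steps within the same IP
|
- `perIpStepIntervalSec`: the interval between steps within the same IP
|
||||||
- `perIpIntervalSec`: the interval between finishing one IP and starting the next
|
- `perIpIntervalSec`: the interval between finishing one IP and starting the next
|
||||||
- `failurePauseSec`: the wait after a step fails before entering the next branch
|
- `failurePauseSec`: the wait after a step fails before entering the next branch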
|
||||||
@ -91,7 +93,6 @@ description: 面向 PAM HOME/NODE 的智能部署 Skill。由 Skill 负责理解
|
|||||||
Recommended defaults:
|
Recommended defaults:
|
||||||
|
|
||||||
- `stepIntervalSec = 2`
|
- `stepIntervalSec = 2`
|
||||||
- `firstPollDelaySec = 2`
|
|
||||||
- `perIpStepIntervalSec = 1`
|
- `perIpStepIntervalSec = 1`
|
||||||
- `perIpIntervalSec = 3`
|
- `perIpIntervalSec = 3`
|
||||||
- `failurePauseSec = 0`
|
- `failurePauseSec = 0`
|
||||||
@ -160,6 +161,9 @@ description: 面向 PAM HOME/NODE 的智能部署 Skill。由 Skill 负责理解
|
|||||||
- `ACTION_TYPE`
|
- `ACTION_TYPE`
|
||||||
- `TIMEOUT`
|
- `TIMEOUT`
|
||||||
- `LOG_NAME`
|
- `LOG_NAME`
|
||||||
|
- `POLL_INTERVAL_SEC`
|
||||||
|
- `DOWNLOAD_POLL_MAX_ATTEMPTS`
|
||||||
|
- `UPGRADE_POLL_MAX_ATTEMPTS`
|
||||||
- Only action-level control parameters are passed on the command line:
|
- Only action-level control parameters are passed on the command line:
|
||||||
- `--action` / `-Action`
|
- `--action` / `-Action`
|
||||||
- `--ip` / `-Ip`
|
- `--ip` / `-Ip`
|
||||||
@ -168,7 +172,8 @@ description: 面向 PAM HOME/NODE 的智能部署 Skill。由 Skill 负责理解
|
|||||||
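- Do not concatenate the full set of business parameters directly onto the command line.
|
- Do not concatenate the full set of business parameters directly onto the command line.
|
||||||
- Sensitive fields such as `client_secret` must not be passed through the command line.
|
- Sensitive fields such as `client_secret` must not be passed through the command line.
|
||||||
- If the user explicitly requires that no config file be written to disk, this Skill does not run a real deployment and only explains the limitation and the reason.
|
- If the user explicitly requires that no config file be written to disk, this Skill does not run a real deployment and only explains the limitation and the reason.
|
||||||
- `traceFilePath` and the interval control parameters are not written to `config.txt`; the Agent holds and applies them at runtime.
|
- `traceFilePath` is not written to `config.txt`; the Agent holds and applies it at runtime.
|
||||||
|
- The progress-query interval and maximum attempts are written to `config.txt` and are read by both the Agent workflow and the script debugging flow.
|
||||||
|
|
||||||
## 4. Main flow (hard constraints)
|
## 4. Main flow (hard constraints)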
|
||||||
|
|
||||||
@ -194,22 +199,22 @@ description: 面向 PAM HOME/NODE 的智能部署 Skill。由 Skill 负责理解
|
|||||||
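12. Call `get-online-ips`.
|
12. Call `get-online-ips`.
|
||||||
13. If the user specified target IPs, filter them against the online IP list.
|
13. If the user specified target IPs, filter them against the online IP list.
|
||||||
14. Call `create-download-task`.
|
14. Call `create-download-task`.
|
||||||
15. Call `poll-download-progress` until the download completes, fails, or times out.
|
15. Call `poll-download-progress` repeatedly, one progress query per call; hand each result to the LLM/rules for judgment, until the download completes, fails, or the maximum number of queries is reached.
|
||||||
16. For each IP in the online list or the filtered target list, run in order:
|
16. For each IP in the online list or the filtered target list, run in order: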
|
||||||
- `upgrade-ip`
|
- `upgrade-ip`
|
||||||
- `poll-upgrade-progress`
|
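- Call `poll-upgrade-progress` repeatedly, one progress query per call; hand each result to the LLM/rules for judgment, until the push completes, fails, or the maximum number of queries is reached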
|
||||||
- `start-ip`
|
- `start-ip`
|
||||||
- `verify-ip`
|
- `verify-ip`
|
||||||
- `download-log`
|
- `download-log`
|
||||||
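17. Summarize the result for each IP.
|
17. Summarize the result for each IP.
|
||||||
18. If `PENDING_AGENT_CONFIRMATION(...)` appears, immediately abort automatic follow-up actions and enter the rollback confirmation branch.
|
18. If an action fails, the LLM/rule review requires stopping, or a legacy `PENDING_AGENT_CONFIRMATION(...)` appears, pause at the current action and print the recommendation.
|
||||||
19. Output the final report.
|
19. Output the final report; when a rollback is needed, wait for the user to run `rollback [IP]` explicitly.
|
||||||
|
|
||||||
Additional main-flow rules:
|
Additional main-flow rules:
|
||||||
|
|
||||||
1. All action calls in one complete deployment should reuse the same `traceFilePath`; creating a separate trace file for each action is prohibited.
|
1. All action calls in one complete deployment should reuse the same `traceFilePath`; creating a separate trace file for each action is prohibited.
|
||||||
2. Wait `stepIntervalSec` between one global action and the next.
|
2. Wait `stepIntervalSec` between one global action and the next.
|
||||||
3. After `create-download-task` succeeds, wait `firstPollDelaySec` before the first `poll-download-progress`.
|
3. After `create-download-task` succeeds, go straight to `poll-download-progress`; while the download is incomplete, wait `POLL_INTERVAL_SEC` and then query the current action again.
|
||||||
4. Within the same IP:
|
4. Within the same IP: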
|
||||||
- `upgrade-ip -> poll-upgrade-progress`
|
- `upgrade-ip -> poll-upgrade-progress`
|
||||||
- `poll-upgrade-progress -> start-ip`
|
- `poll-upgrade-progress -> start-ip`
|
||||||
@ -219,13 +224,14 @@ description: 面向 PAM HOME/NODE 的智能部署 Skill。由 Skill 负责理解
|
|||||||
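5. After one IP is finished, wait `perIpIntervalSec` before starting the next IP.
|
5. After one IP is finished, wait `perIpIntervalSec` before starting the next IP.
|
||||||
6. If a failed step requires entering a prompt, confirmation, or branch flow, wait `failurePauseSec` first.
|
6. If a failed step requires entering a prompt, confirmation, or branch flow, wait `failurePauseSec` first.
|
||||||
7. An interval value of `0` means no wait at that level; proceed directly to the next action.
|
7. An interval value of `0` means no wait at that level; proceed directly to the next action.
|
||||||
|
8. The `poll-download-progress` and `poll-upgrade-progress` script actions run only one progress query; in the formal workflow, the loop, checkpoints, LLM judgment, and progress announcements are handled by the Agent Runtime.
|
||||||
|
|
||||||
### 4.2 Mandatory confirmation points in the main flow
|
### 4.2 Mandatory confirmation points in the main flow
|
||||||
|
|
||||||
The following points must wait for user confirmation and cannot be skipped automatically:
|
The following points must wait for user confirmation and cannot be skipped automatically:
|
||||||
|
|
||||||
1. Before the parameter confirmation sheet is confirmed.
|
1. Before the parameter confirmation sheet is confirmed.
|
||||||
2. When a rollback condition arises.
|
2. Before running `rollback [IP]` or `rollback-ip`.
|
||||||
3. When the user-specified IPs differ from the online-IP filtering result in a way that affects the deployment scope.
|
3. When the user-specified IPs differ from the online-IP filtering result in a way that affects the deployment scope.
|
||||||
4. When the user explicitly asks to change the default interval policy.
|
4. When the user explicitly asks to change the default interval policy.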
|
||||||
|
|
||||||
@ -238,9 +244,9 @@ description: 面向 PAM HOME/NODE 的智能部署 Skill。由 Skill 负责理解
|
|||||||
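3. After each global step succeeds, tell the user that the step is done and state the key results.
|
3. After each global step succeeds, tell the user that the step is done and state the key results.
|
||||||
4. After each global step fails, immediately tell the user the failed stage, the failure cause, and the follow-up handling.
|
4. After each global step fails, immediately tell the user the failed stage, the failure cause, and the follow-up handling.
|
||||||
5. During per-IP processing, state which IP is currently being processed.
|
5. During per-IP processing, state which IP is currently being processed.
|
||||||
6. During cloud-download progress polling, report the current progress continuously; do not wait silently for completion.
|
6. During cloud-download and per-IP push progress queries, report the current progress after every `poll-*` result; do not wait silently for completion.
|
||||||
7. If execution takes a long time, announce progress stage by stage; do not wait until everything ends to give a single summary.
|
7. If execution takes a long time, announce progress stage by stage; do not wait until everything ends to give a single summary.
|
||||||
8. On entering the rollback confirmation state, clearly tell the user:
|
8. If a rollback is recommended after a failure, clearly tell the user:
|
||||||
- which IP failed
|
- which IP failed
|
||||||
- the failed stage
|
- the failed stage
|
||||||
- whether a rollback is recommended
|
- whether a rollback is recommended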
|
||||||
@ -349,9 +355,10 @@ description: 面向 PAM HOME/NODE 的智能部署 Skill。由 Skill 负责理解
|
|||||||
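4. If some IPs have already completed successfully:
|
4. If some IPs have already completed successfully:
|
||||||
- skip the successful IPs by default
|
- skip the successful IPs by default
|
||||||
- continue only with unfinished or failed IPs
|
- continue only with unfinished or failed IPs
|
||||||
5. If `PENDING_AGENT_CONFIRMATION(...)` is present:
|
5. If there is a failure pause or a legacy `PENDING_AGENT_CONFIRMATION(...)`:
|
||||||
- the checkpoint must retain that state
|
- the checkpoint must retain the failed stage, the failure cause, and the review recommendation
|
||||||
- no follow-up action may continue automatically before the user confirms
|
- after the fix, `resume` retries from the currently failed action by default
|
||||||
|
- when a rollback is needed, the user must run `rollback [IP]` explicitly
|
||||||
6. If the user asks to "start over from scratch":
|
6. If the user asks to "start over from scratch":
|
||||||
- first state clearly that the existing checkpoint will be ignored
|
- first state clearly that the existing checkpoint will be ignored
|
||||||
- then run again from step 1
|
- then run again from step 1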
|
||||||
@ -430,14 +437,14 @@ description: 面向 PAM HOME/NODE 的智能部署 Skill。由 Skill 负责理解
|
|||||||
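| 12 | Get online IPs | `get-online-ips` | Returns `COUNT>0` with `IP=...` lines | Stop and report `GET_ONLINE_IPS` failure |
|
| 12 | Get online IPs | `get-online-ips` | Returns `COUNT>0` with `IP=...` lines | Stop and report `GET_ONLINE_IPS` failure |
|
||||||
| 13 | Filter target IPs | Filter by the intersection of user-specified IPs and online IPs | The filter result is unambiguous | Stop if the result is empty; a scope change requires confirmation |
|
| 13 | Filter target IPs | Filter by the intersection of user-specified IPs and online IPs | The filter result is unambiguous | Stop if the result is empty; a scope change requires confirmation |
|
||||||
| 14 | Create cloud download task | `create-download-task` | Returns `RESULT=TASK_CREATED` | Stop and report `CREATE_DOWNLOAD_TASK` failure |
|
| 14 | Create cloud download task | `create-download-task` | Returns `RESULT=TASK_CREATED` | Stop and report `CREATE_DOWNLOAD_TASK` failure |
|
||||||
| 15 | Poll download progress | `poll-download-progress` | `STEP=DONE`, or `MSG=success` and `RATE_OF_PROGRESS=100` | Stop and report `POLL_DOWNLOAD_PROGRESS` failure or timeout |
|
| 15 | Query download progress | Repeated calls to the single-query `poll-download-progress` | LLM/rules judge `progress_complete=true`; or `STEP=DONE` / `MSG=success` and `RATE_OF_PROGRESS=100` | Stop and report `POLL_DOWNLOAD_PROGRESS` failure or timeout |
|
||||||
| 16.1 | Create per-IP push task | `upgrade-ip --ip ...` | Returns `RESULT=TASK_CREATED` | Record the failure and mark `PENDING_AGENT_CONFIRMATION(stopFirst=false)` |
|
| 16.1 | Create per-IP push task | `upgrade-ip --ip ...` | Returns `RESULT=TASK_CREATED` | Pause at the current action and retry with `resume` after the fix; run rollback explicitly if needed |
|
||||||
| 16.2 | Poll per-IP push progress | `poll-upgrade-progress --ip ...` | `STEP=DONE`, or `FINISH=true`, or `MSG=success` and `RATE_OF_PROGRESS=100` | Record the failure and mark `PENDING_AGENT_CONFIRMATION(stopFirst=false)` |
|
| 16.2 | Query per-IP push progress | Repeated calls to the single-query `poll-upgrade-progress --ip ...` | LLM/rules judge `progress_complete=true`; or `STEP=DONE` / `FINISH=true` / `MSG=success` and `RATE_OF_PROGRESS=100` | Pause at the current action and retry with `resume` after the fix; run rollback explicitly if needed |
|
||||||
| 16.3 | Start single IP | `start-ip --ip ...` | The action returns successfully | Record the failure and mark `PENDING_AGENT_CONFIRMATION(stopFirst=true)` |
|
| 16.3 | Start single IP | `start-ip --ip ...` | The action returns successfully | Pause at the current action and retry with `resume` after the fix; run rollback explicitly if needed |
|
||||||
| 16.4 | Verify single IP | `verify-ip --ip ...` | Returns `SUCCESS=true` | Record the failure and mark `PENDING_AGENT_CONFIRMATION(stopFirst=true)` |
|
| 16.4 | Verify single IP | `verify-ip --ip ...` | Returns `SUCCESS=true` | Pause at the current action and retry with `resume` after the fix; run rollback explicitly if needed |
|
||||||
| 16.5 | Download logs | `download-log --ip ...` | Returns `LOG_FILE=...` | Record the log download failure without overwriting the original primary failure cause |
|
| 16.5 | Download logs | `download-log --ip ...` | Returns `LOG_FILE=...` | Record the log download failure without overwriting the original primary failure cause |
|
||||||
| 17 | Summarize results | Summarize each IP's stage, failure cause, rollback status, and log path | The report is complete | If summarizing fails, keep at least the raw action output |
|
| 17 | Summarize results | Summarize each IP's stage, failure cause, rollback status, and log path | The report is complete | If summarizing fails, keep at least the raw action output |
|
||||||
| 18 | Rollback confirmation branch | Enter rollback confirmation when `PENDING_AGENT_CONFIRMATION(...)` is found | The user states clearly whether to roll back | Stop if unconfirmed; never roll back automatically |
|
| 18 | Failure pause or explicit rollback | After a failure, stay at the current action by default; roll back only after the user enters `rollback [IP]` | The user explicitly requests a rollback, or runs `resume` after the fix | No automatic rollback unless it is explicitly requested |
|
||||||
| 19 | Final report | Output the final report | The report includes mode, entry point, stage results, logs, and rollback status | Do not omit failure details |
|
| 19 | Final report | Output the final report | The report includes mode, entry point, stage results, logs, and rollback status | Do not omit failure details |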
|
||||||
|
|
||||||
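The completion criteria in rows 15 and 16.2 can be illustrated with a small rule-based check. This is a hedged sketch rather than the project's actual rule engine: `parse_action_output` and `rule_progress_complete` are hypothetical names, and reading the criteria as `STEP=DONE`, or `FINISH=true`, or (`MSG=success` together with `RATE_OF_PROGRESS=100`) is an assumption about how the table groups its conditions.

```python
from typing import Dict


def parse_action_output(text: str) -> Dict[str, str]:
    """Collect KEY=VALUE result lines from script action output."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:  # skip lines that contain no '=' at all
            fields[key.strip().upper()] = value.strip()
    return fields


def rule_progress_complete(fields: Dict[str, str]) -> bool:
    """Rule fallback for the progress_complete decision (assumed grouping)."""
    if fields.get("STEP", "").upper() == "DONE":
        return True
    if fields.get("FINISH", "").lower() == "true":
        return True
    # MSG=success alone is not enough; the rate must also have reached 100.
    return (
        fields.get("MSG", "").lower() == "success"
        and fields.get("RATE_OF_PROGRESS") == "100"
    )
```

A result for which this returns `False` would leave the action incomplete, so the workflow queries again instead of moving on to the next action.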
## 5. General execution principles
|
## 5. General execution principles
|
||||||
@ -456,7 +463,7 @@ description: 面向 PAM HOME/NODE 的智能部署 Skill。由 Skill 负责理解
|
|||||||
- `[FLOW][FAIL]`
|
- `[FLOW][FAIL]`
|
||||||
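10. Only the scripts' `action` entry points may be called; calling the scripts' main flow is prohibited.
|
10. Only the scripts' `action` entry points may be called; calling the scripts' main flow is prohibited.
|
||||||
11. Script action output is mainly `key=value`; the Agent should read these result lines first.
|
11. Script action output is mainly `key=value`; the Agent should read these result lines first.
|
||||||
12. In scenarios that need a rollback, the script only returns `PENDING_AGENT_CONFIRMATION(stopFirst=...)`, and the Agent must confirm first.
|
12. In scenarios that need a rollback, the Agent may only point out risks and recommendations; it must not roll back automatically and must wait for the user to run rollback explicitly.
|
||||||
|
|
||||||
## 6. Script action capabilities
|
## 6. Script action capabilities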
|
||||||
|
|
||||||
@ -485,10 +492,10 @@ powershell -File .\deploy.ps1 -ConfigPath .\config.txt -Action <ActionName> [-Ip
|
|||||||
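| `get-node-url` | Get the target Node address | None |
|
| `get-node-url` | Get the target Node address | None |
|
||||||
| `get-online-ips` | Get the list of online workstation IPs | None |
|
| `get-online-ips` | Get the list of online workstation IPs | None |
|
||||||
| `create-download-task` | Create a cloud download task | None |
|
| `create-download-task` | Create a cloud download task | None |
|
||||||
| `poll-download-progress` | Poll download progress | None |
|
| `poll-download-progress` | Query download progress once; whether to keep querying is decided by the Agent workflow and the LLM/rules | None |
|
||||||
| `download-cloud-to-node` | Create a download task and poll until completion; for debugging only, must not be used in the formal main flow | None |
|
| `download-cloud-to-node` | Create a download task and poll until completion; for debugging only, must not be used in the formal main flow | None |
|
||||||
| `upgrade-ip` | Create a push task for the specified IP; always uses `timeOut=0` | `--ip` / `-Ip` |
|
| `upgrade-ip` | Create a push task for the specified IP; always uses `timeOut=0` | `--ip` / `-Ip` |
|
||||||
| `poll-upgrade-progress` | Poll the push progress of the specified IP | `--ip` / `-Ip` |
|
| `poll-upgrade-progress` | Query the push progress of the specified IP once; whether to keep querying is decided by the Agent workflow and the LLM/rules | `--ip` / `-Ip` |
|
||||||
| `start-ip` | Start the application on the specified IP | `--ip` / `-Ip` |
|
| `start-ip` | Start the application on the specified IP | `--ip` / `-Ip` |
|
||||||
| `stop-ip` | Stop the application on the specified IP | `--ip` / `-Ip` |
|
| `stop-ip` | Stop the application on the specified IP | `--ip` / `-Ip` |
|
||||||
| `verify-ip` | Verify the specified IP | `--ip` / `-Ip` |
|
| `verify-ip` | Verify the specified IP | `--ip` / `-Ip` |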
|
||||||
@ -559,9 +566,9 @@ Agent 读取时:
|
|||||||
- `create-download-task`
|
- `create-download-task`
|
||||||
- `upgrade-ip`
|
- `upgrade-ip`
|
||||||
|
|
||||||
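### 7.4 Manual rollback branch
|
### 7.4 Explicit rollback command
|
||||||
|
|
||||||
When the deployment result shows `PENDING_AGENT_CONFIRMATION(...)` and the user explicitly agrees to roll back:
|
When the user explicitly enters `rollback [IP]` or directly asks to roll back a specified IP:
|
||||||
|
|
||||||
1. Confirm the target IP and the `stopFirst` value with the user again.
|
1. Confirm the target IP and the `stopFirst` value with the user again.
|
||||||
2. Call the `rollback-ip` action.
|
2. Call the `rollback-ip` action.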
|
||||||
@ -613,19 +620,16 @@ Agent 读取时:
|
|||||||
|
|
||||||
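### 8.3 Rollback rules
|
### 8.3 Rollback rules
|
||||||
|
|
||||||
Rollback may run only after the Agent has confirmed with the user.
|
Rollback may run only after the user explicitly requests it.
|
||||||
|
|
||||||
There are three kinds of rollback status:
|
Rollback statuses include: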
|
||||||
|
|
||||||
- `ROLLBACK_NOT_RUN`
|
- `ROLLBACK_NOT_RUN`
|
||||||
- `PENDING_AGENT_CONFIRMATION(stopFirst=true|false)`
|
- `ROLLBACK_DONE`
|
||||||
- Results after the rollback has actually run:
|
- `ROLLBACK_FAILED`
|
||||||
- `ROLLBACK_SUCCESS`
|
- `REJECTED_BY_OPERATOR`
|
||||||
- `ROLLBACK_FAILED`
|
|
||||||
- `ROLLBACK_REQUEST_FAILED`
|
|
||||||
- `ROLLBACK_VERIFY_FAILED`
|
|
||||||
|
|
||||||
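Default confirmation logic:
|
Default recommendations:
|
||||||
|
|
||||||
- Upgrade failure: recommend rollback, `stopFirst=false`
|
- Upgrade failure: recommend rollback, `stopFirst=false`
|
||||||
- Start failure: recommend rollback, `stopFirst=true`
|
- Start failure: recommend rollback, `stopFirst=true`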
|
||||||
@ -674,7 +678,9 @@ powershell -File .\deploy.ps1 -ConfigPath .\config.txt -Action rollback-ip -Ip 1
|
|||||||
- Failed: 1
|
- Failed: 1
|
||||||
- Interval control:
|
- Interval control:
|
||||||
- stepIntervalSec: 2
|
- stepIntervalSec: 2
|
||||||
- firstPollDelaySec: 2
|
- pollIntervalSec: 2
|
||||||
|
- downloadPollMaxAttempts: 60
|
||||||
|
- upgradePollMaxAttempts: 600
|
||||||
- perIpStepIntervalSec: 1
|
- perIpStepIntervalSec: 1
|
||||||
- perIpIntervalSec: 3
|
- perIpIntervalSec: 3
|
||||||
- failurePauseSec: 0
|
- failurePauseSec: 0
|
||||||
@ -684,7 +690,7 @@ powershell -File .\deploy.ps1 -ConfigPath .\config.txt -Action rollback-ip -Ip 1
|
|||||||
| --- | --- | --- | --- | --- |
|
| --- | --- | --- | --- | --- |
|
||||||
| 192.168.1.10 | SUCCESS | - | - | logs/deploy_192.168.1.10.zip |
|
| 192.168.1.10 | SUCCESS | - | - | logs/deploy_192.168.1.10.zip |
|
||||||
| 192.168.1.11 | SUCCESS | - | - | logs/deploy_192.168.1.11.zip |
|
| 192.168.1.11 | SUCCESS | - | - | logs/deploy_192.168.1.11.zip |
|
||||||
| 192.168.1.12 | FAILED | VERIFY | PENDING_AGENT_CONFIRMATION(stopFirst=true) | logs/deploy_192.168.1.12.zip |
|
| 192.168.1.12 | FAILED | VERIFY | ROLLBACK_NOT_RUN | logs/deploy_192.168.1.12.zip |
|
||||||
```
|
```
|
||||||
|
|
||||||
A more complete final report template:
|
A more complete final report template:
|
||||||
@ -709,7 +715,7 @@ powershell -File .\deploy.ps1 -ConfigPath .\config.txt -Action rollback-ip -Ip 1
|
|||||||
| IP | Status | Failed stage | Failure cause | Rollback status | Log |
|
| IP | Status | Failed stage | Failure cause | Rollback status | Log |
|
||||||
| --- | --- | --- | --- | --- | --- |
|
| --- | --- | --- | --- | --- | --- |
|
||||||
| 192.168.1.10 | SUCCESS | - | - | - | logs/deploy_192.168.1.10.log |
|
| 192.168.1.10 | SUCCESS | - | - | - | logs/deploy_192.168.1.10.log |
|
||||||
| 192.168.1.12 | FAILED | VERIFY | Health check failed | PENDING_AGENT_CONFIRMATION(stopFirst=true) | logs/deploy_192.168.1.12.log |
|
| 192.168.1.12 | FAILED | VERIFY | Health check failed | ROLLBACK_NOT_RUN | logs/deploy_192.168.1.12.log |
|
||||||
|
|
||||||
## Checkpoint summary
|
## Checkpoint summary
|
||||||
|
|
||||||
@ -724,9 +730,10 @@ powershell -File .\deploy.ps1 -ConfigPath .\config.txt -Action rollback-ip -Ip 1
|
|||||||
- get-online-ips
|
- get-online-ips
|
||||||
- create-download-task
|
- create-download-task
|
||||||
|
|
||||||
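## Items pending confirmation
|
## Follow-up recommendations
|
||||||
|
|
||||||
- Whether to roll back 192.168.1.12
|
- 192.168.1.12 stopped at verify-ip; after the fix, resume to retry the current action
|
||||||
|
- If a rollback is confirmed to be necessary, run rollback 192.168.1.12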
|
||||||
```
|
```
|
||||||
|
|
||||||
## 10. Agent execution recommendations
|
## 10. Agent execution recommendations
|
||||||
@ -740,7 +747,7 @@ powershell -File .\deploy.ps1 -ConfigPath .\config.txt -Action rollback-ip -Ip 1
|
|||||||
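- Rollback requires confirmation
|
- Rollback requires confirmation
|
||||||
4. Before the parameters are confirmed, do not trigger any real deployment action.
|
4. Before the parameters are confirmed, do not trigger any real deployment action.
|
||||||
5. When the user only asks to "generate a script without executing it", state the limitation directly instead of producing a script file, because this Skill prohibits automatically generating or modifying scripts.
|
5. When the user only asks to "generate a script without executing it", state the limitation directly instead of producing a script file, because this Skill prohibits automatically generating or modifying scripts.
|
||||||
6. If `PENDING_AGENT_CONFIRMATION(...)` appears in action output, immediately abort automatic follow-up actions and ask for confirmation.
|
6. If a legacy `PENDING_AGENT_CONFIRMATION(...)` appears in action output, immediately pause the current workflow and print the recommendation; when a rollback is needed, wait for the user to run rollback explicitly.
|
||||||
7. If a checkpoint exists, first assess whether the run can resume from the breakpoint rather than starting over by default.
|
7. If a checkpoint exists, first assess whether the run can resume from the breakpoint rather than starting over by default.
|
||||||
8. Any long-running stage must proactively announce progress, especially:
|
8. Any long-running stage must proactively announce progress, especially: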
|
||||||
- `create-download-task`
|
- `create-download-task`
|
||||||
|
|||||||
@ -9,3 +9,6 @@ ZIP_FILE_PATH=C:\path\to\pam-2.0.5.zip
|
|||||||
ACTION_TYPE=FULL
|
ACTION_TYPE=FULL
|
||||||
TIMEOUT=120
|
TIMEOUT=120
|
||||||
LOG_NAME=app.log
|
LOG_NAME=app.log
|
||||||
|
POLL_INTERVAL_SEC=2
|
||||||
|
DOWNLOAD_POLL_MAX_ATTEMPTS=60
|
||||||
|
UPGRADE_POLL_MAX_ATTEMPTS=600
|
||||||
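A Python consumer of these three keys might read them roughly as follows. This is a sketch under stated assumptions: `load_poll_settings` is a hypothetical helper, the file is assumed to hold plain `KEY=VALUE` lines, and the fallback policy (non-numeric or out-of-range values revert to the defaults) is an illustrative choice rather than the project's documented behavior.

```python
from typing import Dict

# Defaults taken from the example config above.
DEFAULTS = {
    "POLL_INTERVAL_SEC": 2,
    "DOWNLOAD_POLL_MAX_ATTEMPTS": 60,
    "UPGRADE_POLL_MAX_ATTEMPTS": 600,
}


def load_poll_settings(config_text: str) -> Dict[str, int]:
    """Read the three polling keys, keeping defaults for invalid values."""
    settings = dict(DEFAULTS)
    for line in config_text.splitlines():
        key, sep, raw = line.strip().partition("=")
        if not sep or key not in DEFAULTS:
            continue  # not one of the polling keys
        try:
            value = int(raw.strip())
        except ValueError:
            continue  # non-numeric value: keep the default
        # The interval may be 0 (no wait); attempt counts must be >= 1.
        minimum = 0 if key == "POLL_INTERVAL_SEC" else 1
        if value >= minimum:
            settings[key] = value
    return settings
```

Other keys such as `TIMEOUT` or `LOG_NAME` are ignored by this helper, so it can be pointed at the full `config.txt` contents.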
|
|||||||
@ -23,6 +23,8 @@ Notes:
|
|||||||
- deploy.bat is only a wrapper for this script.
|
- deploy.bat is only a wrapper for this script.
|
||||||
- The wrapper avoids cmd.exe delayed-expansion issues with CLIENT_SECRET values
|
- The wrapper avoids cmd.exe delayed-expansion issues with CLIENT_SECRET values
|
||||||
containing exclamation marks.
|
containing exclamation marks.
|
||||||
|
- poll-download-progress and poll-upgrade-progress only query progress once.
|
||||||
|
The Agent workflow repeats them and asks LLM/rules to judge completion.
|
||||||
'@ | Write-Host
|
'@ | Write-Host
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -366,6 +368,9 @@ function Get-PamConfig {
|
|||||||
'ACTION_TYPE' { $config[$key] = $value }
|
'ACTION_TYPE' { $config[$key] = $value }
|
||||||
'TIMEOUT' { $config[$key] = $value }
|
'TIMEOUT' { $config[$key] = $value }
|
||||||
'LOG_NAME' { $config[$key] = $value }
|
'LOG_NAME' { $config[$key] = $value }
|
||||||
|
'POLL_INTERVAL_SEC' { $config[$key] = $value }
|
||||||
|
'DOWNLOAD_POLL_MAX_ATTEMPTS' { $config[$key] = $value }
|
||||||
|
'UPGRADE_POLL_MAX_ATTEMPTS' { $config[$key] = $value }
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
} else {
|
} else {
|
||||||
@ -384,6 +389,9 @@ function Get-PamConfig {
|
|||||||
ACTION_TYPE = 'FULL'
|
ACTION_TYPE = 'FULL'
|
||||||
TIMEOUT = '120'
|
TIMEOUT = '120'
|
||||||
LOG_NAME = 'app.log'
|
LOG_NAME = 'app.log'
|
||||||
|
POLL_INTERVAL_SEC = '2'
|
||||||
|
DOWNLOAD_POLL_MAX_ATTEMPTS = '60'
|
||||||
|
UPGRADE_POLL_MAX_ATTEMPTS = '600'
|
||||||
}
|
}
|
||||||
|
|
||||||
foreach ($name in $defaults.Keys) {
|
foreach ($name in $defaults.Keys) {
|
||||||
@ -647,8 +655,14 @@ function Wait-DownloadProgress {
|
|||||||
RateOfProgress = ''
|
RateOfProgress = ''
|
||||||
RawResponse = ''
|
RawResponse = ''
|
||||||
}
|
}
|
||||||
|
$maxAttempts = 60
|
||||||
|
[int]::TryParse([string]$Config.DOWNLOAD_POLL_MAX_ATTEMPTS, [ref]$maxAttempts) | Out-Null
|
||||||
|
if ($maxAttempts -lt 1) { $maxAttempts = 60 }
|
||||||
|
$pollIntervalSec = 2
|
||||||
|
[int]::TryParse([string]$Config.POLL_INTERVAL_SEC, [ref]$pollIntervalSec) | Out-Null
|
||||||
|
if ($pollIntervalSec -lt 0) { $pollIntervalSec = 2 }
|
||||||
|
|
||||||
for ($attempt = 0; $attempt -lt 60; $attempt++) {
|
for ($attempt = 0; $attempt -lt $maxAttempts; $attempt++) {
|
||||||
$response = Invoke-PamWebRequest -Method GET -Url $progressUrl -Token $Token -Headers @{
|
$response = Invoke-PamWebRequest -Method GET -Url $progressUrl -Token $Token -Headers @{
|
||||||
'Target-Node' = $NodeUrl
|
'Target-Node' = $NodeUrl
|
||||||
}
|
}
|
||||||
@ -681,7 +695,7 @@ function Wait-DownloadProgress {
|
|||||||
if ($progressParts.Count -gt 0) {
|
if ($progressParts.Count -gt 0) {
|
||||||
Write-Info ("Step 3.3b: async download progress -> {0}" -f ($progressParts -join ', '))
|
Write-Info ("Step 3.3b: async download progress -> {0}" -f ($progressParts -join ', '))
|
||||||
} else {
|
} else {
|
||||||
Write-Info ("Step 3.3b: async download progress polling... ({0}/60)" -f ($attempt + 1))
|
Write-Info ("Step 3.3b: async download progress polling... ({0}/{1})" -f ($attempt + 1), $maxAttempts)
|
||||||
}
|
}
|
||||||
|
|
||||||
if ($step -eq 'DONE' -or $status -eq 'completed' -or $successFlag -eq 'true' -or (($msg -eq 'success') -and ($progressValue -eq '100'))) {
|
if ($step -eq 'DONE' -or $status -eq 'completed' -or $successFlag -eq 'true' -or (($msg -eq 'success') -and ($progressValue -eq '100'))) {
|
||||||
@ -694,12 +708,64 @@ function Wait-DownloadProgress {
|
|||||||
throw "Node download failed: $message"
|
throw "Node download failed: $message"
|
||||||
}
|
}
|
||||||
|
|
||||||
Start-Sleep -Seconds 2
|
Start-Sleep -Seconds $pollIntervalSec
|
||||||
}
|
}
|
||||||
|
|
||||||
throw 'Node download timed out.'
|
throw 'Node download timed out.'
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function Read-DownloadProgress {
|
||||||
|
param($Config, [string]$Token, [string]$NodeUrl)
|
||||||
|
|
||||||
|
$query = Join-RequestPairs ([ordered]@{
|
||||||
|
applicationName = $Config.APP_NAME
|
||||||
|
moduleName = $Config.MODULE_NAME
|
||||||
|
airportCode = $Config.AIRPORT_CODE
|
||||||
|
versionNumber = $Config.VERSION_NUMBER
|
||||||
|
})
|
||||||
|
$progressUrl = "$($Config.HOME_BASE_URL)/node-proxy/$($Config.AIRPORT_CODE)/api/mcp/version/upgrade/download-cloud/progress?$query"
|
||||||
|
$response = Invoke-PamWebRequest -Method GET -Url $progressUrl -Token $Token -Headers @{
|
||||||
|
'Target-Node' = $NodeUrl
|
||||||
|
}
|
||||||
|
|
||||||
|
$status = Get-ResponseValue -Response $response -Candidates @('status')
|
||||||
|
$successFlag = Get-ResponseValue -Response $response -Candidates @('success')
|
||||||
|
$step = Get-ResponseValue -Response $response -Candidates @('step')
|
||||||
|
$msg = Get-ResponseValue -Response $response -Candidates @('msg')
|
||||||
|
$progressValue = Get-ResponseValue -Response $response -Candidates @('rateOfProgress', 'progress', 'percent', 'data.rateOfProgress', 'data.progress', 'data.percent')
|
||||||
|
$message = Get-ResponseValue -Response $response -Candidates @('message')
|
||||||
|
if (-not $message) { $message = $msg }
|
||||||
|
$script:DownloadProgressState = [ordered]@{
|
||||||
|
Status = [string]$status
|
||||||
|
Success = [string]$successFlag
|
||||||
|
Step = [string]$step
|
||||||
|
Msg = [string]$msg
|
||||||
|
Message = [string]$message
|
||||||
|
RateOfProgress = [string]$progressValue
|
||||||
|
RawResponse = [string]$response
|
||||||
|
}
|
||||||
|
|
||||||
|
$progressParts = [System.Collections.Generic.List[string]]::new()
|
||||||
|
if ($msg) { $progressParts.Add("msg=$msg") }
|
||||||
|
if ($step) { $progressParts.Add("step=$step") }
|
||||||
|
if ($progressValue) { $progressParts.Add("rateOfProgress=$progressValue") }
|
||||||
|
if ($status) { $progressParts.Add("status=$status") }
|
||||||
|
if ($successFlag) { $progressParts.Add("success=$successFlag") }
|
||||||
|
if ($message -and $message -ne $msg) { $progressParts.Add("message=$message") }
|
||||||
|
|
||||||
|
if ($progressParts.Count -gt 0) {
|
||||||
|
Write-Info ("Step 3.3b: async download progress single query -> {0}" -f ($progressParts -join ', '))
|
||||||
|
} else {
|
||||||
|
Write-Info 'Step 3.3b: async download progress single query returned no explicit progress fields.'
|
||||||
|
}
|
||||||
|
|
||||||
|
if ((@($step, $message, $msg, $status) -join ' ') -match '(?i)fail|error') {
|
||||||
|
if (-not $message) { $message = $step }
|
||||||
|
if (-not $message) { $message = $msg }
|
||||||
|
throw "Node download failed: $message"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
function Create-DownloadTask {
|
function Create-DownloadTask {
|
||||||
param($Config, [string]$Token, [string]$NodeUrl)
|
param($Config, [string]$Token, [string]$NodeUrl)
|
||||||
|
|
||||||
@ -751,8 +817,14 @@ function Wait-UpgradeProgress {
|
|||||||
LastModify = ''
|
LastModify = ''
|
||||||
RawResponse = ''
|
RawResponse = ''
|
||||||
}
|
}
|
||||||
|
$maxAttempts = 600
|
||||||
|
[int]::TryParse([string]$Config.UPGRADE_POLL_MAX_ATTEMPTS, [ref]$maxAttempts) | Out-Null
|
||||||
|
if ($maxAttempts -lt 1) { $maxAttempts = 600 }
|
||||||
|
$pollIntervalSec = 2
|
||||||
|
[int]::TryParse([string]$Config.POLL_INTERVAL_SEC, [ref]$pollIntervalSec) | Out-Null
|
||||||
|
if ($pollIntervalSec -lt 0) { $pollIntervalSec = 2 }
|
||||||
|
|
||||||
for ($attempt = 0; $attempt -lt 60; $attempt++) {
|
for ($attempt = 0; $attempt -lt $maxAttempts; $attempt++) {
|
||||||
$response = Invoke-PamWebRequest -Method GET -Url $progressUrl -Token $Token -Headers @{
|
$response = Invoke-PamWebRequest -Method GET -Url $progressUrl -Token $Token -Headers @{
|
||||||
'Target-Node' = $NodeUrl
|
'Target-Node' = $NodeUrl
|
||||||
}
|
}
|
||||||
@ -797,7 +869,7 @@ function Wait-UpgradeProgress {
|
|||||||
if ($progressParts.Count -gt 1) {
|
if ($progressParts.Count -gt 1) {
|
||||||
Write-Info ("Step 3.4a: async upgrade progress -> {0}" -f ($progressParts -join ', '))
|
Write-Info ("Step 3.4a: async upgrade progress -> {0}" -f ($progressParts -join ', '))
|
||||||
} else {
|
} else {
|
||||||
Write-Info ("Step 3.4a: async upgrade progress polling... ip={0} ({1}/60)" -f $Ip, ($attempt + 1))
|
Write-Info ("Step 3.4a: async upgrade progress polling... ip={0} ({1}/{2})" -f $Ip, ($attempt + 1), $maxAttempts)
|
||||||
}
|
}
|
||||||
|
|
||||||
if ($step -eq 'DONE' -or $finish -eq 'true' -or $status -eq 'completed' -or $successFlag -eq 'true') {
|
if ($step -eq 'DONE' -or $finish -eq 'true' -or $status -eq 'completed' -or $successFlag -eq 'true') {
|
||||||
@ -821,12 +893,88 @@ function Wait-UpgradeProgress {
|
|||||||
throw "Node upgrade failed: ip=$Ip, message=$message"
|
throw "Node upgrade failed: ip=$Ip, message=$message"
|
||||||
}
|
}
|
||||||
|
|
||||||
Start-Sleep -Seconds 2
|
Start-Sleep -Seconds $pollIntervalSec
|
||||||
}
|
}
|
||||||
|
|
||||||
throw "Node upgrade timed out: ip=$Ip"
|
throw "Node upgrade timed out: ip=$Ip"
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function Read-UpgradeProgress {
|
||||||
|
param(
|
||||||
|
$Config,
|
||||||
|
[string]$Token,
|
||||||
|
[string]$NodeUrl,
|
||||||
|
[string]$Ip
|
||||||
|
)
|
||||||
|
|
||||||
|
$query = Join-RequestPairs ([ordered]@{
|
||||||
|
applicationName = $Config.APP_NAME
|
||||||
|
moduleName = $Config.MODULE_NAME
|
||||||
|
airportCode = $Config.AIRPORT_CODE
|
||||||
|
versionNumber = $Config.VERSION_NUMBER
|
||||||
|
})
|
||||||
|
$progressUrl = "$($Config.HOME_BASE_URL)/node-proxy/$($Config.AIRPORT_CODE)/api/mcp/version/upgrade/progress?$query"
|
||||||
|
$response = Invoke-PamWebRequest -Method GET -Url $progressUrl -Token $Token -Headers @{
|
||||||
|
'Target-Node' = $NodeUrl
|
||||||
|
}
|
||||||
|
$progressResponse = Get-ScopedResponseObject -Response $response -ScopeKey $Ip
|
||||||
|
|
||||||
|
$status = Get-ResponseValue -Response $progressResponse -Candidates @('status')
|
||||||
|
$successFlag = Get-ResponseValue -Response $progressResponse -Candidates @('success')
|
||||||
|
$step = Get-ResponseValue -Response $progressResponse -Candidates @('step')
|
||||||
|
$msg = Get-ResponseValue -Response $progressResponse -Candidates @('msg')
|
||||||
|
$progressValue = Get-ResponseValue -Response $progressResponse -Candidates @('rateOfProgress', 'progress', 'percent', 'data.rateOfProgress', 'data.progress', 'data.percent')
|
||||||
|
$message = Get-ResponseValue -Response $progressResponse -Candidates @('message')
|
||||||
|
$code = Get-ResponseValue -Response $progressResponse -Candidates @('code')
|
||||||
|
$finish = Get-ResponseValue -Response $progressResponse -Candidates @('finish')
|
||||||
|
$lastModify = Get-ResponseValue -Response $progressResponse -Candidates @('lastModify')
|
||||||
|
if (-not $message) { $message = $msg }
|
||||||
|
|
||||||
|
$script:UpgradeProgressState = [ordered]@{
|
||||||
|
Status = [string]$status
|
||||||
|
Success = [string]$successFlag
|
||||||
|
Step = [string]$step
|
||||||
|
Msg = [string]$msg
|
||||||
|
Message = [string]$message
|
||||||
|
RateOfProgress = [string]$progressValue
|
||||||
|
Code = [string]$code
|
||||||
|
Finish = [string]$finish
|
||||||
|
LastModify = [string]$lastModify
|
||||||
|
RawResponse = [string]$response
|
||||||
|
}
|
||||||
|
|
||||||
|
$progressParts = [System.Collections.Generic.List[string]]::new()
|
||||||
|
$progressParts.Add("ip=$Ip")
|
||||||
|
if ($msg) { $progressParts.Add("msg=$msg") }
|
||||||
|
if ($step) { $progressParts.Add("step=$step") }
|
||||||
|
if ($progressValue) { $progressParts.Add("rateOfProgress=$progressValue") }
|
||||||
|
if ($code) { $progressParts.Add("code=$code") }
|
||||||
|
if ($finish) { $progressParts.Add("finish=$finish") }
|
||||||
|
if ($status) { $progressParts.Add("status=$status") }
|
||||||
|
if ($successFlag) { $progressParts.Add("success=$successFlag") }
|
||||||
|
if ($lastModify) { $progressParts.Add("lastModify=$lastModify") }
|
||||||
|
if ($message -and $message -ne $msg) { $progressParts.Add("message=$message") }
|
||||||
|
|
||||||
|
if ($progressParts.Count -gt 1) {
|
||||||
|
Write-Info ("Step 3.4a: async upgrade progress single query -> {0}" -f ($progressParts -join ', '))
|
||||||
|
} else {
|
||||||
|
Write-Info ("Step 3.4a: async upgrade progress single query returned no explicit progress fields: ip={0}" -f $Ip)
|
||||||
|
}
|
||||||
|
|
||||||
|
if ($code -and $code -ne '0') {
|
||||||
|
if (-not $message) { $message = $msg }
|
||||||
|
if (-not $message) { $message = $step }
|
||||||
|
if (-not $message) { $message = "code=$code" }
|
||||||
|
throw "Node upgrade failed: ip=$Ip, message=$message"
|
||||||
|
}
|
||||||
|
|
||||||
|
if ((@($step, $message, $msg, $status) -join ' ') -match '(?i)fail|error') {
|
||||||
|
if (-not $message) { $message = $step }
|
||||||
|
if (-not $message) { $message = $msg }
|
||||||
|
throw "Node upgrade failed: ip=$Ip, message=$message"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
function Invoke-UpgradeRequest {
|
function Invoke-UpgradeRequest {
|
||||||
param($Config, [string]$Token, [string]$NodeUrl, [string]$Ip)
|
param($Config, [string]$Token, [string]$NodeUrl, [string]$Ip)
|
||||||
|
|
||||||
@ -1273,7 +1421,7 @@ function Invoke-PamAction {
|
|||||||
'poll-download-progress' {
|
'poll-download-progress' {
|
||||||
$token = Invoke-FlowStep -Name 'Get-Token' -Action { Get-Token -Config $config }
|
$token = Invoke-FlowStep -Name 'Get-Token' -Action { Get-Token -Config $config }
|
||||||
$nodeUrl = Invoke-FlowStep -Name 'Get-NodeUrl' -Action { Get-NodeUrl -Config $config -Token $token }
|
$nodeUrl = Invoke-FlowStep -Name 'Get-NodeUrl' -Action { Get-NodeUrl -Config $config -Token $token }
|
||||||
Invoke-FlowStep -Name 'Wait-DownloadProgress' -Action { Wait-DownloadProgress -Config $config -Token $token -NodeUrl $nodeUrl } | Out-Null
|
Invoke-FlowStep -Name 'Read-DownloadProgress' -Action { Read-DownloadProgress -Config $config -Token $token -NodeUrl $nodeUrl } | Out-Null
|
||||||
Write-DownloadProgressResult
|
Write-DownloadProgressResult
|
||||||
}
|
}
|
||||||
'download-cloud-to-node' {
|
'download-cloud-to-node' {
|
||||||
@ -1287,7 +1435,7 @@ function Invoke-PamAction {
|
|||||||
Require-IpArgument -TargetIp $Ip
|
Require-IpArgument -TargetIp $Ip
|
||||||
$token = Invoke-FlowStep -Name 'Get-Token' -Action { Get-Token -Config $config }
|
$token = Invoke-FlowStep -Name 'Get-Token' -Action { Get-Token -Config $config }
|
||||||
$nodeUrl = Invoke-FlowStep -Name 'Get-NodeUrl' -Action { Get-NodeUrl -Config $config -Token $token }
|
$nodeUrl = Invoke-FlowStep -Name 'Get-NodeUrl' -Action { Get-NodeUrl -Config $config -Token $token }
|
||||||
Invoke-FlowStep -Name "Wait-UpgradeProgress[$Ip]" -Action { Wait-UpgradeProgress -Config $config -Token $token -NodeUrl $nodeUrl -Ip $Ip } | Out-Null
|
Invoke-FlowStep -Name "Read-UpgradeProgress[$Ip]" -Action { Read-UpgradeProgress -Config $config -Token $token -NodeUrl $nodeUrl -Ip $Ip } | Out-Null
|
||||||
Write-UpgradeProgressResult -Ip $Ip
|
Write-UpgradeProgressResult -Ip $Ip
|
||||||
}
|
}
|
||||||
'upgrade-ip' {
|
'upgrade-ip' {
|
||||||
|
|||||||
@ -57,6 +57,13 @@ usage() {
|
|||||||
ACTION_TYPE
|
ACTION_TYPE
|
||||||
TIMEOUT
|
TIMEOUT
|
||||||
LOG_NAME
|
LOG_NAME
|
||||||
|
POLL_INTERVAL_SEC
|
||||||
|
DOWNLOAD_POLL_MAX_ATTEMPTS
|
||||||
|
UPGRADE_POLL_MAX_ATTEMPTS
|
||||||
|
|
||||||
|
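Notes:
|
||||||
|
--action poll-download-progress and poll-upgrade-progress each run a single progress query.
|
||||||
|
The agent workflow calls the single query repeatedly and, after each response, hands the result to the LLM/rule review to decide whether the operation has finished.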
|
||||||
EOF
|
EOF
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -342,6 +349,9 @@ set_defaults() {
|
|||||||
: "${ACTION_TYPE:=FULL}"
|
: "${ACTION_TYPE:=FULL}"
|
||||||
: "${TIMEOUT:=120}"
|
: "${TIMEOUT:=120}"
|
||||||
: "${LOG_NAME:=app.log}"
|
: "${LOG_NAME:=app.log}"
|
||||||
|
: "${POLL_INTERVAL_SEC:=2}"
|
||||||
|
: "${DOWNLOAD_POLL_MAX_ATTEMPTS:=60}"
|
||||||
|
: "${UPGRADE_POLL_MAX_ATTEMPTS:=600}"
|
||||||
}
|
}
|
||||||
|
|
||||||
load_config() {
|
load_config() {
|
||||||
@ -366,7 +376,7 @@ load_config() {
|
|||||||
value="$(strip_inline_comment "$value")"
|
value="$(strip_inline_comment "$value")"
|
||||||
|
|
||||||
case "$key" in
|
case "$key" in
|
||||||
HOME_BASE_URL|CLIENT_ID|CLIENT_SECRET|AIRPORT_CODE|APP_NAME|MODULE_NAME|VERSION_NUMBER|ZIP_FILE_PATH|ACTION_TYPE|TIMEOUT|LOG_NAME)
|
HOME_BASE_URL|CLIENT_ID|CLIENT_SECRET|AIRPORT_CODE|APP_NAME|MODULE_NAME|VERSION_NUMBER|ZIP_FILE_PATH|ACTION_TYPE|TIMEOUT|LOG_NAME|POLL_INTERVAL_SEC|DOWNLOAD_POLL_MAX_ATTEMPTS|UPGRADE_POLL_MAX_ATTEMPTS)
|
||||||
printf -v "$key" '%s' "$value"
|
printf -v "$key" '%s' "$value"
|
||||||
;;
|
;;
|
||||||
esac
|
esac
|
||||||
@ -961,8 +971,6 @@ get_online_ips() {
|
|||||||
|
|
||||||
poll_download_progress() {
|
poll_download_progress() {
|
||||||
local progress_url="${HOME_BASE_URL}/node-proxy/${AIRPORT_CODE}/api/mcp/version/upgrade/download-cloud/progress?applicationName=${APP_NAME}&moduleName=${MODULE_NAME}&airportCode=${AIRPORT_CODE}&versionNumber=${VERSION_NUMBER}"
|
local progress_url="${HOME_BASE_URL}/node-proxy/${AIRPORT_CODE}/api/mcp/version/upgrade/download-cloud/progress?applicationName=${APP_NAME}&moduleName=${MODULE_NAME}&airportCode=${AIRPORT_CODE}&versionNumber=${VERSION_NUMBER}"
|
||||||
local attempt=0
|
|
||||||
local max_attempts=60
|
|
||||||
local error_regex='[Ff]ail|[Ee]rror'
|
local error_regex='[Ff]ail|[Ee]rror'
|
||||||
|
|
||||||
DOWNLOAD_PROGRESS_STATUS=""
|
DOWNLOAD_PROGRESS_STATUS=""
|
||||||
@ -973,7 +981,6 @@ poll_download_progress() {
|
|||||||
DOWNLOAD_PROGRESS_RATE=""
|
DOWNLOAD_PROGRESS_RATE=""
|
||||||
DOWNLOAD_PROGRESS_RESPONSE=""
|
DOWNLOAD_PROGRESS_RESPONSE=""
|
||||||
|
|
||||||
while (( attempt < max_attempts )); do
|
|
||||||
local response
|
local response
|
||||||
response=$(http_request "GET" "$progress_url" "" "" "Target-Node: ${NODE_URL}") || return 1
|
response=$(http_request "GET" "$progress_url" "" "" "Target-Node: ${NODE_URL}") || return 1
|
||||||
|
|
||||||
@ -1010,28 +1017,42 @@ poll_download_progress() {
|
|||||||
[[ -n "$status" ]] && progress_parts+=("status=${status}")
|
[[ -n "$status" ]] && progress_parts+=("status=${status}")
|
||||||
[[ -n "$success_flag" ]] && progress_parts+=("success=${success_flag}")
|
[[ -n "$success_flag" ]] && progress_parts+=("success=${success_flag}")
|
||||||
[[ -n "$message" && "$message" != "$msg_value" ]] && progress_parts+=("message=${message}")
|
[[ -n "$message" && "$message" != "$msg_value" ]] && progress_parts+=("message=${message}")
|
||||||
log_info "Step 3.3b: 异步下载进度 -> ${progress_parts[*]}"
|
log_info "Step 3.3b: 异步下载进度单次查询 -> ${progress_parts[*]}"
|
||||||
else
|
else
|
||||||
log_info "Step 3.3b: 异步下载进度轮询中... ($((attempt + 1))/${max_attempts})"
|
log_info "Step 3.3b: 异步下载进度单次查询未返回明确进度字段。"
|
||||||
fi
|
fi
|
||||||
|
|
||||||
if [[ "$step_value" == "DONE" || "$status" == "completed" || "$success_flag" == "true" ]]; then
|
if [[ "${step_value} ${message} ${msg_value} ${status}" =~ $error_regex ]]; then
|
||||||
return 0
|
|
||||||
fi
|
|
||||||
|
|
||||||
if [[ "$msg_value" == "success" && "$progress_value" == "100" ]]; then
|
|
||||||
return 0
|
|
||||||
fi
|
|
||||||
|
|
||||||
if [[ "${step_value} ${message} ${msg_value}" =~ $error_regex ]]; then
|
|
||||||
[[ -z "$message" ]] && message="$step_value"
|
[[ -z "$message" ]] && message="$step_value"
|
||||||
[[ -z "$message" ]] && message="$msg_value"
|
[[ -z "$message" ]] && message="$msg_value"
|
||||||
log_error "Node 下载失败: $message"
|
log_error "Node 下载失败: $message"
|
||||||
return 1
|
return 1
|
||||||
fi
|
fi
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
download_progress_complete() {
|
||||||
|
[[ "$DOWNLOAD_PROGRESS_STEP" == "DONE" || "$DOWNLOAD_PROGRESS_STATUS" == "completed" || "$DOWNLOAD_PROGRESS_SUCCESS" == "true" ]] && return 0
|
||||||
|
[[ "$DOWNLOAD_PROGRESS_MSG" == "success" && "$DOWNLOAD_PROGRESS_RATE" == "100" ]] && return 0
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
wait_download_progress() {
|
||||||
|
local attempt=0
|
||||||
|
local max_attempts="${DOWNLOAD_POLL_MAX_ATTEMPTS:-60}"
|
||||||
|
local interval_sec="${POLL_INTERVAL_SEC:-2}"
|
||||||
|
[[ "$max_attempts" =~ ^[0-9]+$ ]] || max_attempts=60
|
||||||
|
[[ -n "$interval_sec" ]] || interval_sec=2
|
||||||
|
|
||||||
|
while (( attempt < max_attempts )); do
|
||||||
|
poll_download_progress || return 1
|
||||||
|
if download_progress_complete; then
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
attempt=$((attempt + 1))
|
attempt=$((attempt + 1))
|
||||||
sleep 2
|
log_info "Step 3.3b: 异步下载进度未完成,等待下一次查询... (${attempt}/${max_attempts})"
|
||||||
|
sleep "$interval_sec"
|
||||||
done
|
done
|
||||||
|
|
||||||
log_error "Node 下载超时。"
|
log_error "Node 下载超时。"
|
||||||
@ -1050,14 +1071,12 @@ create_download_task() {
|
|||||||
|
|
||||||
download_cloud_to_node() {
|
download_cloud_to_node() {
|
||||||
create_download_task || return 1
|
create_download_task || return 1
|
||||||
poll_download_progress
|
wait_download_progress
|
||||||
}
|
}
|
||||||
|
|
||||||
poll_upgrade_progress() {
|
poll_upgrade_progress() {
|
||||||
local ip="$1"
|
local ip="$1"
|
||||||
local progress_url="${HOME_BASE_URL}/node-proxy/${AIRPORT_CODE}/api/mcp/version/upgrade/progress?applicationName=${APP_NAME}&moduleName=${MODULE_NAME}&airportCode=${AIRPORT_CODE}&versionNumber=${VERSION_NUMBER}"
|
local progress_url="${HOME_BASE_URL}/node-proxy/${AIRPORT_CODE}/api/mcp/version/upgrade/progress?applicationName=${APP_NAME}&moduleName=${MODULE_NAME}&airportCode=${AIRPORT_CODE}&versionNumber=${VERSION_NUMBER}"
|
||||||
local attempt=0
|
|
||||||
local max_attempts=600
|
|
||||||
local error_regex='[Ff]ail|[Ee]rror'
|
local error_regex='[Ff]ail|[Ee]rror'
|
||||||
|
|
||||||
UPGRADE_PROGRESS_STATUS=""
|
UPGRADE_PROGRESS_STATUS=""
|
||||||
@ -1071,7 +1090,6 @@ poll_upgrade_progress() {
|
|||||||
UPGRADE_PROGRESS_LAST_MODIFY=""
|
UPGRADE_PROGRESS_LAST_MODIFY=""
|
||||||
UPGRADE_PROGRESS_RESPONSE=""
|
UPGRADE_PROGRESS_RESPONSE=""
|
||||||
|
|
||||||
while (( attempt < max_attempts )); do
|
|
||||||
local response
|
local response
|
||||||
response=$(http_request "GET" "$progress_url" "" "" "Target-Node: ${NODE_URL}") || return 1
|
response=$(http_request "GET" "$progress_url" "" "" "Target-Node: ${NODE_URL}") || return 1
|
||||||
|
|
||||||
@ -1120,17 +1138,9 @@ poll_upgrade_progress() {
|
|||||||
[[ -n "$success_flag" ]] && progress_parts+=("success=${success_flag}")
|
[[ -n "$success_flag" ]] && progress_parts+=("success=${success_flag}")
|
||||||
[[ -n "$last_modify_value" ]] && progress_parts+=("lastModify=${last_modify_value}")
|
[[ -n "$last_modify_value" ]] && progress_parts+=("lastModify=${last_modify_value}")
|
||||||
[[ -n "$message" && "$message" != "$msg_value" ]] && progress_parts+=("message=${message}")
|
[[ -n "$message" && "$message" != "$msg_value" ]] && progress_parts+=("message=${message}")
|
||||||
log_info "Step 3.4a: async push progress -> ${progress_parts[*]}"
|
log_info "Step 3.4a: async push progress single query -> ${progress_parts[*]}"
|
||||||
else
|
else
|
||||||
log_info "Step 3.4a: async push progress polling... ip=${ip} ($((attempt + 1))/${max_attempts})"
|
log_info "Step 3.4a: async push progress single query returned no explicit progress fields: ip=${ip}"
|
||||||
fi
|
|
||||||
|
|
||||||
if [[ "$step_value" == "DONE" || "$finish_value" == "true" || "$status" == "completed" || "$success_flag" == "true" ]]; then
|
|
||||||
return 0
|
|
||||||
fi
|
|
||||||
|
|
||||||
if [[ "$msg_value" == "success" && "$progress_value" == "100" ]] && [[ -z "$code_value" || "$code_value" == "0" ]]; then
|
|
||||||
return 0
|
|
||||||
fi
|
fi
|
||||||
|
|
||||||
if [[ -n "$code_value" && "$code_value" != "0" ]]; then
|
if [[ -n "$code_value" && "$code_value" != "0" ]]; then
|
||||||
@ -1148,8 +1158,31 @@ poll_upgrade_progress() {
|
|||||||
return 1
|
return 1
|
||||||
fi
|
fi
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
upgrade_progress_complete() {
|
||||||
|
[[ "$UPGRADE_PROGRESS_STEP" == "DONE" || "$UPGRADE_PROGRESS_FINISH" == "true" || "$UPGRADE_PROGRESS_STATUS" == "completed" || "$UPGRADE_PROGRESS_SUCCESS" == "true" ]] && return 0
|
||||||
|
[[ "$UPGRADE_PROGRESS_MSG" == "success" && "$UPGRADE_PROGRESS_RATE" == "100" ]] && [[ -z "$UPGRADE_PROGRESS_CODE" || "$UPGRADE_PROGRESS_CODE" == "0" ]] && return 0
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
wait_upgrade_progress() {
|
||||||
|
local ip="$1"
|
||||||
|
local attempt=0
|
||||||
|
local max_attempts="${UPGRADE_POLL_MAX_ATTEMPTS:-600}"
|
||||||
|
local interval_sec="${POLL_INTERVAL_SEC:-2}"
|
||||||
|
[[ "$max_attempts" =~ ^[0-9]+$ ]] || max_attempts=600
|
||||||
|
[[ -n "$interval_sec" ]] || interval_sec=2
|
||||||
|
|
||||||
|
while (( attempt < max_attempts )); do
|
||||||
|
poll_upgrade_progress "$ip" || return 1
|
||||||
|
if upgrade_progress_complete; then
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
attempt=$((attempt + 1))
|
attempt=$((attempt + 1))
|
||||||
sleep 2
|
log_info "Step 3.4a: async push progress not complete, waiting for next query... ip=${ip} (${attempt}/${max_attempts})"
|
||||||
|
sleep "$interval_sec"
|
||||||
done
|
done
|
||||||
|
|
||||||
log_error "Node push timed out: ip=${ip}"
|
log_error "Node push timed out: ip=${ip}"
|
||||||
@ -1522,7 +1555,7 @@ deploy_one_ip() {
|
|||||||
return
|
return
|
||||||
fi
|
fi
|
||||||
|
|
||||||
if ! run_flow_step "poll_upgrade_progress[${ip}]" poll_upgrade_progress "$ip"; then
|
if ! run_flow_step "wait_upgrade_progress[${ip}]" wait_upgrade_progress "$ip"; then
|
||||||
local message
|
local message
|
||||||
message="$UPGRADE_PROGRESS_MESSAGE"
|
message="$UPGRADE_PROGRESS_MESSAGE"
|
||||||
[[ -z "$message" ]] && message="$UPGRADE_PROGRESS_MSG"
|
[[ -z "$message" ]] && message="$UPGRADE_PROGRESS_MSG"
|
||||||
|
|||||||
@ -79,13 +79,19 @@ flowchart TD
|
|||||||
G4 --> G5[get-node-url]
|
G4 --> G5[get-node-url]
|
||||||
G5 --> G6[get-online-ips]
|
G5 --> G6[get-online-ips]
|
||||||
G6 --> G7[create-download-task]
|
G6 --> G7[create-download-task]
|
||||||
G7 --> G8[poll-download-progress]
|
G7 --> G8[poll-download-progress single query]
|
||||||
G8 --> H[prepare_ip node selects the next IP action]
|
G8 --> G9{LLM/rule review decides download is complete}
|
||||||
|
G9 -- unfinished but healthy --> G8
|
||||||
|
G9 -- complete --> H[prepare_ip node selects the next IP action]
|
||||||
|
G9 -- abnormal or timed out --> R
|
||||||
|
|
||||||
H --> I[resolve_target_ips computes target IPs]
|
H --> I[resolve_target_ips computes target IPs]
|
||||||
I --> J[ip_action node runs upgrade-ip]
|
I --> J[ip_action node runs upgrade-ip]
|
||||||
J --> K[ip_action node runs poll-upgrade-progress]
|
J --> K[ip_action node runs poll-upgrade-progress single query]
|
||||||
K --> L[ip_action node runs start-ip]
|
K --> K1{LLM/rule review decides push is complete}
|
||||||
|
K1 -- unfinished but healthy --> K
|
||||||
|
K1 -- complete --> L[ip_action node runs start-ip]
|
||||||
|
K1 -- abnormal or timed out --> R
|
||||||
L --> M[ip_action node runs verify-ip]
|
L --> M[ip_action node runs verify-ip]
|
||||||
M --> N[ip_action node runs download-log]
|
M --> N[ip_action node runs download-log]
|
||||||
N --> O{Another IP remaining}
|
N --> O{Another IP remaining}
|
||||||
@ -133,6 +139,29 @@ flowchart TD
|
|||||||
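- If the review recommends stopping, or the review itself fails, the current action is not counted as completed, and `resume` retries it.
|
- If the review recommends stopping, or the review itself fails, the current action is not counted as completed, and `resume` retries it.
|
||||||
- If the review itself fails, a "stop" review result is still generated and the flow pauses, so execution never continues as a black box.
|
- If the review itself fails, a "stop" review result is still generated and the flow pauses, so execution never continues as a black box.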
|
||||||
|
|
||||||
|
## Progress query action semantics
|
||||||
|
|
||||||
|
```mermaid
|
||||||
|
flowchart TD
|
||||||
|
A[poll-download-progress / poll-upgrade-progress] --> B[Run one progress query]
|
||||||
|
B --> C[ActionResult returns structured progress fields]
|
||||||
|
C --> D[LLM/rule review of progress_complete]
|
||||||
|
D --> E{Finished}
|
||||||
|
E -- yes --> F[Record completed, move to the next action]
|
||||||
|
E -- no but healthy --> G[Append ACTION_PROGRESS, save checkpoint]
|
||||||
|
G --> H[Wait for POLL_INTERVAL_SEC]
|
||||||
|
H --> A
|
||||||
|
E -- abnormal --> I[Pause on the current progress action]
|
||||||
|
G --> J{Maximum query count reached}
|
||||||
|
J -- yes --> I
|
||||||
|
J -- no --> H
|
||||||
|
```
|
||||||
|
|
||||||
|
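- `poll-download-progress` and `poll-upgrade-progress` no longer loop for long periods inside the script; the script/MCP/fake backend returns the result of exactly one progress query per call.
|
||||||
|
- The LLM/rule review uses `progress_complete` to decide whether the operation has finished. When it is unfinished but healthy, `should_continue=true` and `progress_complete=false`, and the workflow keeps the current action and queries again.
|
||||||
|
- The query interval is controlled by `POLL_INTERVAL_SEC`, the maximum number of download queries by `DOWNLOAD_POLL_MAX_ATTEMPTS`, and the maximum number of per-IP push queries by `UPGRADE_POLL_MAX_ATTEMPTS`.
|
||||||
|
- Every progress query emits `ACTION_PROGRESS` and saves a checkpoint; after an interruption or failure, `resume` continues from the same progress action.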
|
||||||
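The semantics above can be sketched as a small loop. This is only an illustration under stated assumptions, not the agent's actual implementation: `Review`, `run_progress_action`, `query_once`, and `review` are hypothetical names, and the real workflow also saves a checkpoint and emits `ACTION_PROGRESS` between queries, which the sketch leaves out.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Review:
    """Hypothetical review verdict; field names mirror the terms used in the docs."""
    should_continue: bool
    progress_complete: bool


def run_progress_action(
    query_once: Callable[[], Dict[str, str]],
    review: Callable[[Dict[str, str]], Review],
    max_attempts: int,
    interval_sec: float,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return 'completed', or 'paused' when the review blocks or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        values = query_once()        # exactly one progress query per iteration
        verdict = review(values)     # LLM/rule review of that single result
        if not verdict.should_continue:
            return "paused"          # abnormal: stay on the current action
        if verdict.progress_complete:
            return "completed"       # safe to move on to the next action
        if attempt < max_attempts:
            sleep(interval_sec)      # wait before the next single query
    return "paused"                  # attempt budget exhausted
```

Passing `sleep` as a parameter keeps the loop testable without real delays; a caller would normally leave the default `time.sleep`.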
|
|
||||||
## Failure, explicit rollback, and resume
|
## Failure, explicit rollback, and resume
|
||||||
|
|
||||||
```mermaid
|
```mermaid
|
||||||
|
|||||||
@ -71,11 +71,13 @@ cd pam-deploy-agent-linux-x86_64
|
|||||||
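The runtime behavior of this release package is also reflected in the bundled `README.md`:
|
The runtime behavior of this release package is also reflected in the bundled `README.md`:
|
||||||
|
|
||||||
- After each action completes, an LLM/rule review runs automatically; an action is recorded as completed only if the review passes.
|
- After each action completes, an LLM/rule review runs automatically; an action is recorded as completed only if the review passes.
|
||||||
|
- `poll-download-progress` and `poll-upgrade-progress` are single-query progress actions; the agent workflow calls them repeatedly according to the configuration, and after each response the LLM/rule review decides whether the operation has finished and the progress is reported.
|
||||||
- `--analyze-actions` only controls whether detailed review results are written to `events`.
|
- `--analyze-actions` only controls whether detailed review results are written to `events`.
|
||||||
- After an action fails or is blocked by the review, a checkpoint is saved and the flow pauses; once the external environment is fixed, `resume` retries the current action.
|
- After an action fails or is blocked by the review, a checkpoint is saved and the flow pauses; once the external environment is fixed, `resume` retries the current action.
|
||||||
- Rollback is no longer an automatic branch of the main workflow; when needed, run it explicitly with `rollback [IP]` in chat or `rollback --checkpoint ...` on the CLI.
|
- Rollback is no longer an automatic branch of the main workflow; when needed, run it explicitly with `rollback [IP]` in chat or `rollback --checkpoint ...` on the CLI.
|
||||||
- chat supports interrupting a run with `Ctrl+C`, which saves a checkpoint; `resume` then retries the current action.
|
- chat supports interrupting a run with `Ctrl+C`, which saves a checkpoint; `resume` then retries the current action.
|
||||||
- chat supports `set KEY=VALUE` and `load params <path>` to hot-update the parameters of the running task.
|
- chat supports `set KEY=VALUE` and `load params <path>` to hot-update the parameters of the running task.
|
||||||
|
- The progress query interval and maximum attempt counts are configurable through `POLL_INTERVAL_SEC`, `DOWNLOAD_POLL_MAX_ATTEMPTS`, and `UPGRADE_POLL_MAX_ATTEMPTS`.
|
||||||
- The action review prompt can be customized with `--llm-action-analysis-prompt-file` or `llm config action_analysis_prompt_file=...` in chat.
|
- The action review prompt can be customized with `--llm-action-analysis-prompt-file` or `llm config action_analysis_prompt_file=...` in chat.
|
||||||
- chat supports `llm test [text]` to check whether the current LLM client loads correctly.
|
- chat supports `llm test [text]` to check whether the current LLM client loads correctly.
|
||||||
- Runtime logs are written to `logs/pam_deploy_agent.log` by default; adjust this with `PAM_AGENT_LOG_FILE` and `PAM_AGENT_LOG_LEVEL`.
|
- Runtime logs are written to `logs/pam_deploy_agent.log` by default; adjust this with `PAM_AGENT_LOG_FILE` and `PAM_AGENT_LOG_LEVEL`.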
|
||||||
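For the rule side of the completion check, a minimal sketch might mirror the conditions in `download_progress_complete` and `upgrade_progress_complete` from `deploy.sh`. The function name `progress_complete_by_rule` and the lower-case dictionary keys are assumptions made for illustration; how the agent's real rule review reads these fields is not shown in this diff.

```python
from typing import Mapping


def progress_complete_by_rule(action: str, values: Mapping[str, str]) -> bool:
    """Rule-based completion check modeled on the shell helpers in deploy.sh."""
    step = values.get("step", "")
    status = values.get("status", "")
    success = values.get("success", "")
    msg = values.get("msg", "")
    rate = values.get("rateOfProgress", "")
    code = values.get("code", "")
    finish = values.get("finish", "")

    # Conditions shared by download and upgrade progress.
    if step == "DONE" or status == "completed" or success == "true":
        return True

    if action == "poll-upgrade-progress":
        # Upgrade progress additionally honors `finish`, and requires an
        # empty or zero `code` before trusting msg=success at 100%.
        if finish == "true":
            return True
        return msg == "success" and rate == "100" and code in ("", "0")

    # Download progress: msg=success at 100% is enough.
    return msg == "success" and rate == "100"
```

A deterministic check like this could serve as a fallback when no LLM is configured, with the LLM verdict taking precedence when available; that ordering is a design assumption, not something the diff confirms.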
|
|||||||
@ -37,6 +37,7 @@ pam-deploy-agent-linux-x86_64/
|
|||||||
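By default the release package prefers `prompt_toolkit` for enhanced input, which gives more reliable backspace handling, history, and completion; if enhanced input fails to initialize, it falls back automatically to plain `input()`. Output still uses `rich`, when available, for clearer text rendering.
|
By default the release package prefers `prompt_toolkit` for enhanced input, which gives more reliable backspace handling, history, and completion; if enhanced input fails to initialize, it falls back automatically to plain `input()`. Output still uses `rich`, when available, for clearer text rendering.
|
||||||
After an action fails or is blocked by the review, a checkpoint is saved and the flow pauses; once the external environment is fixed, entering `resume` retries the current action. Rollback is no longer an automatic branch of the main workflow; when needed, enter `rollback [IP]` in chat to run it explicitly.
|
After an action fails or is blocked by the review, a checkpoint is saved and the flow pauses; once the external environment is fixed, entering `resume` retries the current action. Rollback is no longer an automatic branch of the main workflow; when needed, enter `rollback [IP]` in chat to run it explicitly.
|
||||||
Before execution, chat normalizes the parameters and shows the values actually written to the script configuration; `script_only` / `hybrid_node_mcp` first check that `ZIP_FILE_PATH` exists, so the script does not fail later on a default path. During execution each action reports its start, completion, or failure; after each action completes, an LLM/rule review runs automatically and its start and result are reported; an action is recorded as completed only if the review passes.
|
Before execution, chat normalizes the parameters and shows the values actually written to the script configuration; `script_only` / `hybrid_node_mcp` first check that `ZIP_FILE_PATH` exists, so the script does not fail later on a default path. During execution each action reports its start, completion, or failure; after each action completes, an LLM/rule review runs automatically and its start and result are reported; an action is recorded as completed only if the review passes.
|
||||||
|
`poll-download-progress` and `poll-upgrade-progress` query progress only once per call. The agent workflow calls them repeatedly according to `POLL_INTERVAL_SEC`, `DOWNLOAD_POLL_MAX_ATTEMPTS`, and `UPGRADE_POLL_MAX_ATTEMPTS`, and after each response hands the result to the LLM/rule review to decide whether the operation has finished and reports the progress to chat.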
|
||||||
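As a rough illustration of how those three settings could be resolved, the sketch below reads them from a configuration mapping and falls back to the defaults that `set_defaults` in `deploy.sh` uses (2 seconds, 60 download attempts, 600 upgrade attempts). The helper names `poll_limits`, `_positive_int`, and `_positive_float` are hypothetical; the agent's own `_poll_limits` is not shown here and may behave differently.

```python
import math
from typing import Mapping, Tuple

# Defaults taken from set_defaults() in deploy.sh.
DEFAULT_INTERVAL_SEC = 2.0
DEFAULT_MAX_ATTEMPTS = {
    "poll-download-progress": ("DOWNLOAD_POLL_MAX_ATTEMPTS", 60),
    "poll-upgrade-progress": ("UPGRADE_POLL_MAX_ATTEMPTS", 600),
}


def _positive_int(raw: object, fallback: int) -> int:
    """Parse a positive integer, returning the fallback on bad input."""
    try:
        value = int(str(raw).strip())
    except ValueError:
        return fallback
    return value if value > 0 else fallback


def _positive_float(raw: object, fallback: float) -> float:
    """Parse a positive, finite float, returning the fallback on bad input."""
    try:
        value = float(str(raw).strip())
    except ValueError:
        return fallback
    return value if math.isfinite(value) and value > 0 else fallback


def poll_limits(config: Mapping[str, str], action: str) -> Tuple[int, float]:
    """Return (max_attempts, interval_sec) for a progress action."""
    key, default_attempts = DEFAULT_MAX_ATTEMPTS[action]
    max_attempts = _positive_int(config.get(key, ""), default_attempts)
    interval_sec = _positive_float(config.get("POLL_INTERVAL_SEC", ""), DEFAULT_INTERVAL_SEC)
    return max_attempts, interval_sec
```

Falling back silently on malformed values matches the shell script's `[[ "$max_attempts" =~ ^[0-9]+$ ]] || max_attempts=60` guard; a stricter variant could raise an error instead.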
|
|
||||||
## Interactive usage
|
## Interactive usage
|
||||||
|
|
||||||
@ -243,6 +244,7 @@ MCP token 获取方式与 HOME 一致,默认按 `client_credentials` POST 到
|
|||||||
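- Before running real actions, check `HOME_BASE_URL`, `CLIENT_ID`, `CLIENT_SECRET`, `AIRPORT_CODE`, `APP_NAME`, `MODULE_NAME`, `VERSION_NUMBER`, and `ZIP_FILE_PATH` in the configuration file.
|
- Before running real actions, check `HOME_BASE_URL`, `CLIENT_ID`, `CLIENT_SECRET`, `AIRPORT_CODE`, `APP_NAME`, `MODULE_NAME`, `VERSION_NUMBER`, and `ZIP_FILE_PATH` in the configuration file.
|
||||||
- In `chat`, greetings such as `你好` or `hello` do not trigger LLM/structured analysis; to analyze a deployment request, describe the deployment task directly or use `analyze <request>` explicitly.
|
- In `chat`, greetings such as `你好` or `hello` do not trigger LLM/structured analysis; to analyze a deployment request, describe the deployment task directly or use `analyze <request>` explicitly.
|
||||||
- After each action completes, an LLM/rule review runs automatically; `--analyze-actions` and `llm action-analysis on` only control whether detailed review results are written to `events`.
|
- After each action completes, an LLM/rule review runs automatically; `--analyze-actions` and `llm action-analysis on` only control whether detailed review results are written to `events`.
|
||||||
|
- `poll-download-progress` and `poll-upgrade-progress` are single-query progress actions; while unfinished, the flow does not advance to the next action. The maximum number of queries and the interval can be hot-updated through `config.txt` or chat `set`.
|
||||||
- `llm test [text]` checks whether the current LLM client is usable.
|
- `llm test [text]` checks whether the current LLM client is usable.
|
||||||
- If the review recommends stopping, the review itself fails, or the user presses `Ctrl+C` during execution, the flow saves a checkpoint and pauses; `resume` can then retry the current action.
|
- If the review recommends stopping, the review itself fails, or the user presses `Ctrl+C` during execution, the flow saves a checkpoint and pauses; `resume` can then retry the current action.
|
||||||
- `set KEY=VALUE` and `load params <path>` hot-update the parameters of the running task and write them back to the active `config.txt` and checkpoint.
|
- `set KEY=VALUE` and `load params <path>` hot-update the parameters of the running task and write them back to the active `config.txt` and checkpoint.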
|
||||||
|
|||||||
@ -32,6 +32,8 @@ REQUIRED_ACTION_VALUES = {
|
|||||||
"get-node-url": ("NODE_URL",),
|
"get-node-url": ("NODE_URL",),
|
||||||
}
|
}
|
||||||
|
|
||||||
|
PROGRESS_ACTIONS = {"poll-download-progress", "poll-upgrade-progress"}
|
||||||
|
|
||||||
|
|
||||||
class PamDeployAgent:
|
class PamDeployAgent:
|
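||||||
"""Main PAM deployment agent; ties together the LLM, action routing, confirmation, and resume state."""
|
"""Main PAM deployment agent; ties together the LLM, action routing, confirmation, and resume state."""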
|
||||||
@ -345,24 +347,16 @@ class PamDeployAgent:
|
|||||||
error_summary=str(exc),
|
error_summary=str(exc),
|
||||||
)
|
)
|
||||||
logger.info("全局 action 返回 run_id=%s result=%s", state.run_id, _action_result_for_log(result))
|
logger.info("全局 action 返回 run_id=%s result=%s", state.run_id, _action_result_for_log(result))
|
||||||
state.events.append(
|
|
||||||
{
|
|
||||||
"type": "ACTION_DONE" if result.ok else "ACTION_FAIL",
|
|
||||||
"stage": action,
|
|
||||||
"backend": result.backend,
|
|
||||||
"message": result.error_summary or "ok",
|
|
||||||
}
|
|
||||||
)
|
|
||||||
analysis = self._append_action_analysis(state, action, result)
|
analysis = self._append_action_analysis(state, action, result)
|
||||||
if not result.ok:
|
if not result.ok:
|
||||||
self._emit_progress(
|
fail_event = {
|
||||||
{
|
|
||||||
"type": "ACTION_FAIL",
|
"type": "ACTION_FAIL",
|
||||||
"stage": action,
|
"stage": action,
|
||||||
"backend": result.backend,
|
"backend": result.backend,
|
||||||
"message": result.error_summary or "action 执行失败",
|
"message": result.error_summary or "action 执行失败",
|
||||||
}
|
}
|
||||||
)
|
state.events.append(fail_event)
|
||||||
|
self._emit_progress(fail_event)
|
||||||
state.last_failed_step = action
|
state.last_failed_step = action
|
||||||
self.pause_state(
|
self.pause_state(
|
||||||
state,
|
state,
|
||||||
@ -374,14 +368,14 @@ class PamDeployAgent:
|
|||||||
missing_values = self._missing_required_values(action, result.values)
|
missing_values = self._missing_required_values(action, result.values)
|
||||||
if missing_values:
|
if missing_values:
|
||||||
message = f"{action} 返回缺少必要字段: {', '.join(missing_values)}"
|
message = f"{action} 返回缺少必要字段: {', '.join(missing_values)}"
|
||||||
self._emit_progress(
|
fail_event = {
|
||||||
{
|
|
||||||
"type": "ACTION_FAIL",
|
"type": "ACTION_FAIL",
|
||||||
"stage": action,
|
"stage": action,
|
||||||
"backend": result.backend,
|
"backend": result.backend,
|
||||||
"message": message,
|
"message": message,
|
||||||
}
|
}
|
||||||
)
|
state.events.append(fail_event)
|
||||||
|
self._emit_progress(fail_event)
|
||||||
state.last_failed_step = action
|
state.last_failed_step = action
|
||||||
self.pause_state(
|
self.pause_state(
|
||||||
state,
|
state,
|
||||||
@ -397,6 +391,14 @@ class PamDeployAgent:
|
|||||||
raise RuntimeError(message)
|
raise RuntimeError(message)
|
||||||
if analysis is not None and not analysis.should_continue:
|
if analysis is not None and not analysis.should_continue:
|
||||||
state.last_failed_step = action
|
state.last_failed_step = action
|
||||||
|
state.events.append(
|
||||||
|
{
|
||||||
|
"type": "ACTION_BLOCKED",
|
||||||
|
"stage": action,
|
||||||
|
"backend": result.backend,
|
||||||
|
"message": analysis.suggested_action or analysis.possible_reason or "LLM 审核要求暂停",
|
||||||
|
}
|
||||||
|
)
|
||||||
self.pause_state(
|
self.pause_state(
|
||||||
state,
|
state,
|
||||||
reason="llm_review_blocked",
|
reason="llm_review_blocked",
|
||||||
@ -404,19 +406,22 @@ class PamDeployAgent:
|
|||||||
)
|
)
|
||||||
logger.info("全局 action 被 LLM 审核拦截 run_id=%s action=%s analysis=%s", state.run_id, action, json_for_log(asdict(analysis)))
|
logger.info("全局 action 被 LLM 审核拦截 run_id=%s action=%s analysis=%s", state.run_id, action, json_for_log(asdict(analysis)))
|
||||||
return state
|
return state
|
||||||
|
if self._handle_progress_action(state, action, result, analysis):
|
||||||
|
return state
|
||||||
self._apply_result(state, action, result.values)
|
self._apply_result(state, action, result.values)
|
||||||
state.completed_global_steps.append(action)
|
state.completed_global_steps.append(action)
|
||||||
state.last_success_step = action
|
state.last_success_step = action
|
||||||
if state.last_failed_step == action:
|
if state.last_failed_step == action:
|
||||||
state.last_failed_step = ""
|
state.last_failed_step = ""
|
||||||
self._emit_progress(
|
done_message = self._progress_message(action, result) if action in PROGRESS_ACTIONS else result.values.get("MESSAGE", "ok")
|
||||||
{
|
done_event = {
|
||||||
"type": "ACTION_DONE",
|
"type": "ACTION_DONE",
|
||||||
"stage": action,
|
"stage": action,
|
||||||
"backend": result.backend,
|
"backend": result.backend,
|
||||||
"message": result.values.get("MESSAGE", "ok"),
|
"message": done_message,
|
||||||
}
|
}
|
||||||
)
|
state.events.append(done_event)
|
||||||
|
self._emit_progress(done_event)
|
||||||
self._save_checkpoint(state)
|
self._save_checkpoint(state)
|
||||||
logger.info("全局 action 完成 run_id=%s action=%s completed=%s", state.run_id, action, state.completed_global_steps)
|
logger.info("全局 action 完成 run_id=%s action=%s completed=%s", state.run_id, action, state.completed_global_steps)
|
||||||
return state
|
return state
|
||||||
@ -550,27 +555,18 @@ class PamDeployAgent:
|
|||||||
failed,
|
failed,
|
||||||
_action_result_for_log(result),
|
_action_result_for_log(result),
|
||||||
)
|
)
|
||||||
state.events.append(
|
|
||||||
{
|
|
||||||
"type": "ACTION_FAIL" if failed else "ACTION_DONE",
|
|
||||||
"stage": action,
|
|
||||||
"backend": result.backend,
|
|
||||||
"ip": ip,
|
|
||||||
"message": result.error_summary or result.values.get("MESSAGE", "ok"),
|
|
||||||
}
|
|
||||||
)
|
|
||||||
analysis = self._append_action_analysis(state, action, result, ip=ip)
|
analysis = self._append_action_analysis(state, action, result, ip=ip)
|
||||||
|
|
||||||
if failed:
|
if failed:
|
||||||
self._emit_progress(
|
fail_event = {
|
||||||
{
|
|
||||||
"type": "ACTION_FAIL",
|
"type": "ACTION_FAIL",
|
||||||
"stage": action,
|
"stage": action,
|
||||||
"backend": result.backend,
|
"backend": result.backend,
|
||||||
"ip": ip,
|
"ip": ip,
|
||||||
"message": result.error_summary or result.values.get("MESSAGE", "action 执行失败"),
|
"message": result.error_summary or result.values.get("MESSAGE", "action 执行失败"),
|
||||||
}
|
}
|
||||||
)
|
state.events.append(fail_event)
|
||||||
|
self._emit_progress(fail_event)
|
||||||
self._record_ip_failure(state, ip, action, result.error_summary or str(result.values))
|
self._record_ip_failure(state, ip, action, result.error_summary or str(result.values))
|
||||||
self.pause_state(
|
self.pause_state(
|
||||||
state,
|
state,
|
||||||
@ -585,6 +581,15 @@ class PamDeployAgent:
|
|||||||
ip_state["failed_stage"] = action
|
ip_state["failed_stage"] = action
|
||||||
ip_state["failure_reason"] = analysis.possible_reason or analysis.suggested_action or "LLM 审核要求暂停"
|
ip_state["failure_reason"] = analysis.possible_reason or analysis.suggested_action or "LLM 审核要求暂停"
|
||||||
state.last_failed_step = action
|
state.last_failed_step = action
|
||||||
|
state.events.append(
|
||||||
|
{
|
||||||
|
"type": "ACTION_BLOCKED",
|
||||||
|
"stage": action,
|
||||||
|
"backend": result.backend,
|
||||||
|
"ip": ip,
|
||||||
|
"message": analysis.suggested_action or analysis.possible_reason or "LLM 审核要求暂停",
|
||||||
|
}
|
||||||
|
)
|
||||||
self.pause_state(
|
self.pause_state(
|
||||||
state,
|
state,
|
||||||
reason="llm_review_blocked",
|
reason="llm_review_blocked",
|
||||||
@ -592,6 +597,9 @@ class PamDeployAgent:
|
|||||||
)
|
)
|
||||||
logger.info("IP action 被 LLM 审核拦截 run_id=%s ip=%s action=%s analysis=%s", state.run_id, ip, action, json_for_log(asdict(analysis)))
|
logger.info("IP action 被 LLM 审核拦截 run_id=%s ip=%s action=%s analysis=%s", state.run_id, ip, action, json_for_log(asdict(analysis)))
|
||||||
return state
|
return state
|
||||||
|
if self._handle_progress_action(state, action, result, analysis, ip=ip):
|
||||||
|
ip_state["status"] = "RUNNING"
|
||||||
|
return state
|
||||||
self._apply_ip_result(ip_state, action, result.values)
|
self._apply_ip_result(ip_state, action, result.values)
|
||||||
ip_state["status"] = "RUNNING"
|
ip_state["status"] = "RUNNING"
|
||||||
ip_state["failed_stage"] = ""
|
ip_state["failed_stage"] = ""
|
||||||
@ -599,15 +607,16 @@ class PamDeployAgent:
|
|||||||
completed_steps.append(action)
|
completed_steps.append(action)
|
||||||
if state.last_failed_step == action:
|
if state.last_failed_step == action:
|
||||||
state.last_failed_step = ""
|
state.last_failed_step = ""
|
||||||
self._emit_progress(
|
done_message = self._progress_message(action, result, ip=ip) if action in PROGRESS_ACTIONS else result.values.get("MESSAGE", "ok")
|
||||||
{
|
done_event = {
|
||||||
"type": "ACTION_DONE",
|
"type": "ACTION_DONE",
|
||||||
"stage": action,
|
"stage": action,
|
||||||
"backend": result.backend,
|
"backend": result.backend,
|
||||||
"ip": ip,
|
"ip": ip,
|
||||||
"message": result.values.get("MESSAGE", "ok"),
|
"message": done_message,
|
||||||
}
|
}
|
||||||
)
|
state.events.append(done_event)
|
||||||
|
self._emit_progress(done_event)
|
||||||
self._save_checkpoint(state)
|
self._save_checkpoint(state)
|
||||||
logger.info("IP action 完成 run_id=%s ip=%s action=%s completed=%s", state.run_id, ip, action, completed_steps)
|
logger.info("IP action 完成 run_id=%s ip=%s action=%s completed=%s", state.run_id, ip, action, completed_steps)
|
||||||
return state
|
return state
|
||||||
@ -893,6 +902,161 @@ class PamDeployAgent:
|
|||||||
missing,
|
missing,
|
||||||
)
|
)
|
||||||
|
|
||||||
    def _handle_progress_action(
        self,
        state: AgentState,
        action: str,
        result: ActionResult,
        analysis: LlmActionAnalysis | None,
        *,
        ip: str | None = None,
    ) -> bool:
        """Handle a progress-query action; while it is incomplete, stay on the current action and wait for the next query.

        Returns True when the action must not be marked completed yet
        (still running, or paused after hitting the attempt limit).
        """
        if action not in PROGRESS_ACTIONS:
            return False
        if ip:
            ip_state = state.ip_states.get(ip, {})
            ip_state["progress"] = dict(result.values)

        key = self._poll_attempt_key(action, ip=ip)
        if self._progress_complete(action, result, analysis):
            # Completed: drop the attempt counter so a later run starts from zero.
            state.poll_attempts.pop(key, None)
            logger.info(
                "progress action completed run_id=%s action=%s ip=%s values=%s",
                state.run_id,
                action,
                ip or "",
                json_for_log(result.values),
            )
            return False

        max_attempts, interval_sec = self._poll_limits(state, action)
        attempt = state.poll_attempts.get(key, 0) + 1
        state.poll_attempts[key] = attempt
        message = self._progress_message(action, result, ip=ip, attempt=attempt, max_attempts=max_attempts)
        progress_event = {
            "type": "ACTION_PROGRESS",
            "stage": action,
            "backend": result.backend,
            "ip": ip or "",
            "message": message,
            "attempt": attempt,
            "max_attempts": max_attempts,
            "values": dict(result.values),
        }
        state.events.append(progress_event)
        self._emit_progress(progress_event)

        if attempt >= max_attempts:
            # Attempt budget exhausted: pause on the current action instead of marking it completed.
            timeout_message = f"{action} reached the maximum of {max_attempts} progress queries and is still incomplete. {message}"
            logger.warning(
                "progress action timed out run_id=%s action=%s ip=%s attempt=%s max=%s values=%s",
                state.run_id,
                action,
                ip or "",
                attempt,
                max_attempts,
                json_for_log(result.values),
            )
            fail_event = {
                "type": "ACTION_FAIL",
                "stage": action,
                "backend": result.backend,
                "ip": ip or "",
                "message": timeout_message,
            }
            state.events.append(fail_event)
            self._emit_progress(fail_event)
            state.last_failed_step = action
            self.pause_state(
                state,
                reason="progress_timeout",
                review_context={
                    "type": "action_review",
                    "stage": action,
                    "ip": ip or "",
                    "backend": result.backend,
                    "ok": result.ok,
                    "error_summary": timeout_message,
                    "possible_reason": "Progress queries exceeded the maximum attempts without meeting the completion condition.",
                    "suggested_action": "Check the PAM_HOME/PAM_NODE task status; if the external task is still running, raise the poll attempt limit and resume to retry the current action.",
                    "should_continue": False,
                    "progress_values": dict(result.values),
                    "attempt": attempt,
                    "max_attempts": max_attempts,
                },
            )
            return True

        self._save_checkpoint(state)
        logger.info(
            "progress action incomplete, waiting for next query run_id=%s action=%s ip=%s attempt=%s max=%s interval=%s message=%s",
            state.run_id,
            action,
            ip or "",
            attempt,
            max_attempts,
            interval_sec,
            message,
        )
        if interval_sec > 0:
            time.sleep(interval_sec)
        return True
|
|
||||||
    def _poll_attempt_key(self, action: str, *, ip: str | None = None) -> str:
        """Build the checkpoint counter key for a progress action."""
        return f"ip:{ip}:{action}" if ip else f"global:{action}"

    def _poll_limits(self, state: AgentState, action: str) -> tuple[int, float]:
        """Read the maximum poll attempts and the poll interval from the run parameters."""
        interval_sec = _safe_float(state.params.get("POLL_INTERVAL_SEC"), float(DEFAULT_PARAMS["POLL_INTERVAL_SEC"]))
        if action == "poll-upgrade-progress":
            max_attempts = _safe_int(
                state.params.get("UPGRADE_POLL_MAX_ATTEMPTS"),
                int(DEFAULT_PARAMS["UPGRADE_POLL_MAX_ATTEMPTS"]),
            )
        else:
            max_attempts = _safe_int(
                state.params.get("DOWNLOAD_POLL_MAX_ATTEMPTS"),
                int(DEFAULT_PARAMS["DOWNLOAD_POLL_MAX_ATTEMPTS"]),
            )
        # Clamp to at least one attempt and a non-negative interval.
        return max(max_attempts, 1), max(interval_sec, 0.0)

    def _progress_complete(
        self,
        action: str,
        result: ActionResult,
        analysis: LlmActionAnalysis | None,
    ) -> bool:
        """Decide whether a progress action is complete, preferring an explicit LLM verdict."""
        if analysis is not None and analysis.progress_complete is not None:
            return bool(analysis.progress_complete)
        # No explicit verdict: fall back to the field-based rules.
        return _progress_values_complete(action, result.values)

    def _progress_message(
        self,
        action: str,
        result: ActionResult,
        *,
        ip: str | None = None,
        attempt: int | None = None,
        max_attempts: int | None = None,
    ) -> str:
        """Format the progress fields into a short message readable by users and in logs."""
        values = result.values
        parts: list[str] = []
        if ip:
            parts.append(f"IP={ip}")
        if attempt is not None and max_attempts is not None:
            parts.append(f"query {attempt}/{max_attempts}")
        for key in ("RATE_OF_PROGRESS", "STEP", "MSG", "STATUS", "SUCCESS", "CODE", "FINISH", "MESSAGE"):
            value = values.get(key)
            if value not in (None, ""):
                parts.append(f"{key}={value}")
        if not parts:
            parts.append("the progress endpoint responded but returned no recognizable progress fields")
        return ", ".join(parts)
|
|
||||||
def _business_failed(self, action: str, values: dict[str, Any]) -> bool:
|
def _business_failed(self, action: str, values: dict[str, Any]) -> bool:
|
||||||
"""Detect business-level failure conditions beyond the exit code."""
|
"""Detect business-level failure conditions beyond the exit code."""
|
||||||
if action == "verify-ip":
|
if action == "verify-ip":
|
||||||
@ -1026,6 +1190,7 @@ class PamDeployAgent:
|
|||||||
"has_anomaly": analysis.has_anomaly,
|
"has_anomaly": analysis.has_anomaly,
|
||||||
"severity": analysis.severity,
|
"severity": analysis.severity,
|
||||||
"should_continue": analysis.should_continue,
|
"should_continue": analysis.should_continue,
|
||||||
|
"progress_complete": analysis.progress_complete,
|
||||||
}
|
}
|
||||||
)
|
)
|
||||||
logger.info(
|
logger.info(
|
||||||
@ -1052,6 +1217,7 @@ class PamDeployAgent:
|
|||||||
"pause_reason": state.pause_reason,
|
"pause_reason": state.pause_reason,
|
||||||
"last_success_step": state.last_success_step,
|
"last_success_step": state.last_success_step,
|
||||||
"last_failed_step": state.last_failed_step,
|
"last_failed_step": state.last_failed_step,
|
||||||
|
"poll_attempts": state.poll_attempts,
|
||||||
}
|
}
|
||||||
|
|
||||||
def _review_context(
|
def _review_context(
|
||||||
@ -1079,6 +1245,7 @@ class PamDeployAgent:
|
|||||||
"possible_reason": analysis.possible_reason,
|
"possible_reason": analysis.possible_reason,
|
||||||
"suggested_action": analysis.suggested_action,
|
"suggested_action": analysis.suggested_action,
|
||||||
"should_continue": analysis.should_continue,
|
"should_continue": analysis.should_continue,
|
||||||
|
"progress_complete": analysis.progress_complete,
|
||||||
"notes": list(analysis.notes),
|
"notes": list(analysis.notes),
|
||||||
}
|
}
|
||||||
)
|
)
|
||||||
@ -1152,3 +1319,44 @@ def _action_result_for_log(result: ActionResult) -> str:
|
|||||||
},
|
},
|
||||||
max_text_len=1000,
|
max_text_len=1000,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
def _progress_values_complete(action: str, values: dict[str, Any]) -> bool:
    """Decide from the action's returned fields whether download/push progress is complete."""
    step = _lower_value(values.get("STEP"))
    status = _lower_value(values.get("STATUS"))
    msg = _lower_value(values.get("MSG"))
    success = _lower_value(values.get("SUCCESS"))
    finish = _lower_value(values.get("FINISH"))
    code = _lower_value(values.get("CODE"))
    rate = _lower_value(values.get("RATE_OF_PROGRESS"))
    if step == "done":
        return True
    if status in ("completed", "complete", "done", "success", "succeeded"):
        return True
    if success in ("true", "1", "yes"):
        return True
    if action == "poll-upgrade-progress" and finish in ("true", "1", "yes"):
        return True
    return msg == "success" and rate == "100" and (not code or code == "0")


def _lower_value(value: Any) -> str:
    """Convert a structured field to a lowercase string for rule checks."""
    return str(value).strip().lower() if value is not None else ""


def _safe_int(value: Any, default: int) -> int:
    """Read an integer parameter, falling back to the default on bad input."""
    try:
        return int(str(value).strip())
    except (TypeError, ValueError):
        return default


def _safe_float(value: Any, default: float) -> float:
    """Read a float parameter, falling back to the default on bad input."""
    try:
        return float(str(value).strip())
    except (TypeError, ValueError):
        return default
|
|||||||
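The control flow that `_handle_progress_action` adds (one query per call, an attempt counter kept in state, pause when the budget runs out) can be sketched as a plain loop. This is a minimal illustration, not the agent's code: `poll_until_complete` and its parameters are hypothetical, and the real workflow re-enters the same action on each step rather than looping inside one function.

```python
import time
from typing import Any, Callable


def poll_until_complete(
    query: Callable[[], dict[str, Any]],
    is_complete: Callable[[dict[str, Any]], bool],
    attempts: dict[str, int],
    key: str,
    max_attempts: int,
    interval_sec: float = 0.0,
) -> str:
    """Repeat a single-shot query until it completes or the attempt budget is spent."""
    while True:
        values = query()
        if is_complete(values):
            attempts.pop(key, None)  # reset the counter once the action completes
            return "completed"
        attempts[key] = attempts.get(key, 0) + 1
        if attempts[key] >= max_attempts:
            return "paused"  # the counter is kept so the caller can inspect it
        if interval_sec > 0:
            time.sleep(interval_sec)


# Hypothetical progress sequence: two incomplete answers, then done.
rates = iter(["10", "55", "100"])
attempts: dict[str, int] = {}
status = poll_until_complete(
    query=lambda: {"RATE_OF_PROGRESS": next(rates)},
    is_complete=lambda v: v["RATE_OF_PROGRESS"] == "100",
    attempts=attempts,
    key="global:poll-download-progress",
    max_attempts=5,
)
print(status, attempts)  # completed {}
```

Keeping the counter in a dict keyed per action (and per IP) is what lets the count survive a checkpoint, so a paused run can be inspected before it is resumed.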
@ -17,6 +17,9 @@ CONFIG_KEYS = (
|
|||||||
"ACTION_TYPE",
|
"ACTION_TYPE",
|
||||||
"TIMEOUT",
|
"TIMEOUT",
|
||||||
"LOG_NAME",
|
"LOG_NAME",
|
||||||
|
"POLL_INTERVAL_SEC",
|
||||||
|
"DOWNLOAD_POLL_MAX_ATTEMPTS",
|
||||||
|
"UPGRADE_POLL_MAX_ATTEMPTS",
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@ -64,6 +64,9 @@ DEFAULT_PARAMS = {
|
|||||||
"ACTION_TYPE": "FULL",
|
"ACTION_TYPE": "FULL",
|
||||||
"TIMEOUT": 120,
|
"TIMEOUT": 120,
|
||||||
"LOG_NAME": "app.log",
|
"LOG_NAME": "app.log",
|
||||||
|
"POLL_INTERVAL_SEC": 2,
|
||||||
|
"DOWNLOAD_POLL_MAX_ATTEMPTS": 60,
|
||||||
|
"UPGRADE_POLL_MAX_ATTEMPTS": 600,
|
||||||
}
|
}
|
||||||
|
|
||||||
# Fields that must be redacted in logs, reports, and LLM input.
|
# Fields that must be redacted in logs, reports, and LLM input.
|
||||||
|
|||||||
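A rough wait budget follows from these defaults. The sketch below assumes, as `_handle_progress_action` in this commit does, that the workflow sleeps `POLL_INTERVAL_SEC` after each incomplete query except the final one, where it pauses immediately. `max_wait_seconds` is a hypothetical helper, and the figures exclude the time spent in each query and in the LLM/rule review.

```python
DEFAULTS = {
    "POLL_INTERVAL_SEC": 2,
    "DOWNLOAD_POLL_MAX_ATTEMPTS": 60,
    "UPGRADE_POLL_MAX_ATTEMPTS": 600,
}


def max_wait_seconds(interval_sec: float, max_attempts: int) -> float:
    """Total sleep time before a progress action pauses on timeout."""
    # No sleep follows the final attempt, hence max_attempts - 1 intervals.
    return max(max_attempts - 1, 0) * max(float(interval_sec), 0.0)


download_wait = max_wait_seconds(DEFAULTS["POLL_INTERVAL_SEC"], DEFAULTS["DOWNLOAD_POLL_MAX_ATTEMPTS"])
upgrade_wait = max_wait_seconds(DEFAULTS["POLL_INTERVAL_SEC"], DEFAULTS["UPGRADE_POLL_MAX_ATTEMPTS"])
print(download_wait, upgrade_wait)  # 118.0 1198.0
```

With the defaults, a download pauses after about two minutes of waiting and a per-IP upgrade after about twenty.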
@ -43,6 +43,14 @@ class FakeActionRunner:
|
|||||||
return {"ACTION": action, "NODE_URL": "https://fake-node.local"}
|
return {"ACTION": action, "NODE_URL": "https://fake-node.local"}
|
||||||
if action == "get-online-ips":
|
if action == "get-online-ips":
|
||||||
return {"ACTION": action, "COUNT": "2", "IP": ["192.168.1.10", "192.168.1.11"]}
|
return {"ACTION": action, "COUNT": "2", "IP": ["192.168.1.10", "192.168.1.11"]}
|
||||||
|
if action == "poll-download-progress":
|
||||||
|
return {
|
||||||
|
"ACTION": action,
|
||||||
|
"STEP": "DONE",
|
||||||
|
"RATE_OF_PROGRESS": "100",
|
||||||
|
"MSG": "success",
|
||||||
|
"MESSAGE": "success",
|
||||||
|
}
|
||||||
if action == "upgrade-ip":
|
if action == "upgrade-ip":
|
||||||
return {"ACTION": action, "IP": kwargs.get("ip", ""), "RESULT": "TASK_CREATED"}
|
return {"ACTION": action, "IP": kwargs.get("ip", ""), "RESULT": "TASK_CREATED"}
|
||||||
if action == "poll-upgrade-progress":
|
if action == "poll-upgrade-progress":
|
||||||
@ -51,6 +59,7 @@ class FakeActionRunner:
|
|||||||
"IP": kwargs.get("ip", ""),
|
"IP": kwargs.get("ip", ""),
|
||||||
"STEP": "DONE",
|
"STEP": "DONE",
|
||||||
"RATE_OF_PROGRESS": "100",
|
"RATE_OF_PROGRESS": "100",
|
||||||
|
"MSG": "success",
|
||||||
"MESSAGE": "success",
|
"MESSAGE": "success",
|
||||||
}
|
}
|
||||||
if action == "start-ip":
|
if action == "start-ip":
|
||||||
|
|||||||
@ -824,6 +824,8 @@ class InteractiveCliSession:
|
|||||||
ip = context.get("ip")
|
ip = context.get("ip")
|
||||||
rollback_hint = f"rollback {ip}" if ip else "rollback <IP>"
|
rollback_hint = f"rollback {ip}" if ip else "rollback <IP>"
|
||||||
self.output(f"请修复失败原因后输入 resume 重试当前 action;如需回滚,输入 {rollback_hint}。")
|
self.output(f"请修复失败原因后输入 resume 重试当前 action;如需回滚,输入 {rollback_hint}。")
|
||||||
|
elif reason == "progress_timeout":
|
||||||
|
self.output("请检查外部任务状态;如任务仍在运行,可调大最大查询次数或等待后输入 resume 重试当前进度 action。")
|
||||||
elif reason == "rollback_failed":
|
elif reason == "rollback_failed":
|
||||||
self.output("请检查回滚失败原因;修复后可再次输入 rollback 重试,或人工处理后再 resume。")
|
self.output("请检查回滚失败原因;修复后可再次输入 rollback 重试,或人工处理后再 resume。")
|
||||||
|
|
||||||
@ -848,6 +850,9 @@ class InteractiveCliSession:
|
|||||||
elif event_type == "ACTION_FAIL":
|
elif event_type == "ACTION_FAIL":
|
||||||
detail = f": {message}" if message else ""
|
detail = f": {message}" if message else ""
|
||||||
self.output(f"失败 action: {stage}{suffix}{detail}")
|
self.output(f"失败 action: {stage}{suffix}{detail}")
|
||||||
|
elif event_type == "ACTION_PROGRESS":
|
||||||
|
detail = f": {message}" if message else ""
|
||||||
|
self.output(f"进度更新: {stage}{suffix}{detail}")
|
||||||
elif event_type == "ACTION_REVIEW_START":
|
elif event_type == "ACTION_REVIEW_START":
|
||||||
self.output(f"开始分析 action 结果: {stage}{suffix}")
|
self.output(f"开始分析 action 结果: {stage}{suffix}")
|
||||||
elif event_type == "ACTION_REVIEW_DONE":
|
elif event_type == "ACTION_REVIEW_DONE":
|
||||||
|
|||||||
@ -80,7 +80,7 @@ class LangGraphDeploymentRuntime:
|
|||||||
|
|
||||||
def _config(self) -> dict[str, Any]:
|
def _config(self) -> dict[str, Any]:
|
||||||
"""Build the thread configuration used by the LangGraph checkpointer."""
|
"""Build the thread configuration used by the LangGraph checkpointer."""
|
||||||
return {"configurable": {"thread_id": self.thread_id}}
|
return {"configurable": {"thread_id": self.thread_id}, "recursion_limit": 10000}
|
||||||
|
|
||||||
def _consume(self, chunks: Any) -> LangGraphRunResult:
|
def _consume(self, chunks: Any) -> LangGraphRunResult:
|
||||||
"""Consume LangGraph stream output and extract state, report, and legacy interrupt requests."""
|
"""Consume LangGraph stream output and extract state, report, and legacy interrupt requests."""
|
||||||
|
|||||||
@ -170,6 +170,7 @@ class OpenAICompatibleLlmClient:
|
|||||||
suggested_action=_string(payload, "suggested_action", ""),
|
suggested_action=_string(payload, "suggested_action", ""),
|
||||||
requires_confirmation=bool(payload.get("requires_confirmation", False)),
|
requires_confirmation=bool(payload.get("requires_confirmation", False)),
|
||||||
should_continue=bool(payload.get("should_continue", True)),
|
should_continue=bool(payload.get("should_continue", True)),
|
||||||
|
progress_complete=_optional_bool(payload.get("progress_complete")),
|
||||||
notes=_string_list(payload.get("notes")),
|
notes=_string_list(payload.get("notes")),
|
||||||
)
|
)
|
||||||
|
|
||||||
@ -385,6 +386,23 @@ def _float(payload: dict[str, Any], key: str, default: float) -> float:
|
|||||||
return default
|
return default
|
||||||
|
|
||||||
|
|
||||||
def _optional_bool(value: Any) -> bool | None:
    """Parse an optional boolean; keep None when the field is missing."""
    if value is None:
        return None
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in ("", "null", "none"):
            return None
        if lowered in ("true", "1", "yes", "y"):
            return True
        if lowered in ("false", "0", "no", "n"):
            return False
    # Anything else (numbers, unrecognized strings) uses Python truthiness.
    return bool(value)
|
|
||||||
|
|
||||||
def _dict(value: Any) -> dict[str, Any]:
|
def _dict(value: Any) -> dict[str, Any]:
|
||||||
"""Ensure a dict is returned; invalid values degrade to an empty dict."""
|
"""Ensure a dict is returned; invalid values degrade to an empty dict."""
|
||||||
return value if isinstance(value, dict) else {}
|
return value if isinstance(value, dict) else {}
|
||||||
|
|||||||
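The tri-state parsing of `progress_complete` matters because a missing verdict (None) is what lets the agent fall back to its field-based rules. A minimal sketch of that interaction, assuming a standalone copy of the parser (`optional_bool`) and a hypothetical `resolve` helper that stands in for the agent's fallback:

```python
from typing import Any, Optional


def optional_bool(value: Any) -> Optional[bool]:
    """Standalone copy of the parsing logic, for illustration only."""
    if value is None:
        return None
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in ("", "null", "none"):
            return None
        if lowered in ("true", "1", "yes", "y"):
            return True
        if lowered in ("false", "0", "no", "n"):
            return False
    return bool(value)


def resolve(llm_verdict: Any, rule_verdict: bool) -> bool:
    """Prefer an explicit LLM verdict; otherwise use the rule-based result."""
    parsed = optional_bool(llm_verdict)
    return rule_verdict if parsed is None else parsed


print(resolve("null", True))   # True: no LLM verdict, the rules decide
print(resolve("no", True))     # False: an explicit LLM verdict wins
print(resolve(" YES ", False)) # True
```

One consequence of the logic as written: an unrecognized non-empty string such as `"maybe"` falls through to `bool(value)` and is treated as True.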
@ -38,7 +38,10 @@ PARAM_PROMPT = """从用户输入中抽取 PAM 部署参数和控制信息。
|
|||||||
"ZIP_FILE_PATH": "...",
|
"ZIP_FILE_PATH": "...",
|
||||||
"ACTION_TYPE": "...",
|
"ACTION_TYPE": "...",
|
||||||
"TIMEOUT": "...",
|
"TIMEOUT": "...",
|
||||||
"LOG_NAME": "..."
|
"LOG_NAME": "...",
|
||||||
|
"POLL_INTERVAL_SEC": "...",
|
||||||
|
"DOWNLOAD_POLL_MAX_ATTEMPTS": "...",
|
||||||
|
"UPGRADE_POLL_MAX_ATTEMPTS": "..."
|
||||||
},
|
},
|
||||||
"extracted_control": {
|
"extracted_control": {
|
||||||
"user_specified_ips": ["..."]
|
"user_specified_ips": ["..."]
|
||||||
@ -77,12 +80,17 @@ ACTION_ANALYSIS_PROMPT = """分析一次 PAM action 执行结果。
|
|||||||
"suggested_action": "...",
|
"suggested_action": "...",
|
||||||
"requires_confirmation": false,
|
"requires_confirmation": false,
|
||||||
"should_continue": true,
|
"should_continue": true,
|
||||||
|
"progress_complete": null,
|
||||||
"notes": ["..."]
|
"notes": ["..."]
|
||||||
}
|
}
|
||||||
|
|
||||||
Requirements:
- `should_continue` must be stated explicitly: true when there is no problem; false when there is a problem that needs human judgment.
- If exit_code is non-zero, ok=false, verify-ip SUCCESS=false, or a legacy pending_confirmation appears, flag an anomaly.
- For `poll-download-progress` and `poll-upgrade-progress`, `progress_complete` must be determined: true when complete; false when incomplete but healthy; it may be null for non-progress actions.
- When a progress action is incomplete but healthy, set `has_anomaly=false`, `should_continue=true`, `progress_complete=false`, and suggest continuing to query progress.
- For progress-action completion, check first for `STEP=DONE`, `STATUS=completed/done/success`, `SUCCESS=true`, `FINISH=true`, or `MSG=success` together with `RATE_OF_PROGRESS=100` and `CODE` empty or 0.
- If a progress action returns a non-zero `CODE`, or `STEP/MSG/STATUS/MESSAGE` contains fail/error, flag an anomaly and set `should_continue=false`.
- Judge mainly from the structured fields `ok`, `exit_code`, `values`, `error_summary`; treat `diagnostic_log` as anomaly-diagnosis context only when it is present in the input.
- Normal script progress logs are not evidence of an error; do not flag an anomaly just because a log came from stderr.
- Do not output secrets, tokens, Authorization values, or full raw log text.
|
|||||||
@ -192,6 +192,7 @@ class RuleBasedLlmClient:
|
|||||||
suggested_action = "继续观察。"
|
suggested_action = "继续观察。"
|
||||||
requires_confirmation = False
|
requires_confirmation = False
|
||||||
should_continue = True
|
should_continue = True
|
||||||
|
progress_complete: bool | None = None
|
||||||
|
|
||||||
if not result.ok:
|
if not result.ok:
|
||||||
severity = "medium"
|
severity = "medium"
|
||||||
@ -218,6 +219,25 @@ class RuleBasedLlmClient:
|
|||||||
notes.append("rollback-ip 失败需要人工处理。")
|
notes.append("rollback-ip 失败需要人工处理。")
|
||||||
should_continue = False
|
should_continue = False
|
||||||
|
|
||||||
        if action in ("poll-download-progress", "poll-upgrade-progress"):
            progress_complete, progress_has_anomaly, progress_reason, progress_note = _analyze_progress_values(action, result.values)
            if progress_note:
                notes.append(progress_note)
            if progress_has_anomaly:
                has_anomaly = True
                severity = "high"
                possible_reason = progress_reason or possible_reason or "The progress endpoint returned a failure status."
                suggested_action = "Stop subsequent actions; check the download/push task status, the PAM_HOME/PAM_NODE logs, and the endpoint response."
                should_continue = False
            elif progress_complete:
                has_anomaly = has_anomaly or False
                suggested_action = "Progress is complete; the next action can proceed."
                should_continue = should_continue and True
            elif result.ok:
                severity = severity if has_anomaly else "info"
                suggested_action = "Progress is incomplete; keep querying progress."
                should_continue = should_continue and True
|
|
||||||
if result.values.get("PENDING_AGENT_CONFIRMATION"):
|
if result.values.get("PENDING_AGENT_CONFIRMATION"):
|
||||||
has_anomaly = True
|
has_anomaly = True
|
||||||
severity = "high"
|
severity = "high"
|
||||||
@ -235,6 +255,7 @@ class RuleBasedLlmClient:
|
|||||||
suggested_action=suggested_action,
|
suggested_action=suggested_action,
|
||||||
requires_confirmation=requires_confirmation,
|
requires_confirmation=requires_confirmation,
|
||||||
should_continue=should_continue,
|
should_continue=should_continue,
|
||||||
|
progress_complete=progress_complete,
|
||||||
notes=notes,
|
notes=notes,
|
||||||
)
|
)
|
||||||
logger.info("规则 LLM action 审核完成 analysis=%s", json_for_log(asdict(analysis)))
|
logger.info("规则 LLM action 审核完成 analysis=%s", json_for_log(asdict(analysis)))
|
||||||
@ -265,3 +286,49 @@ class RuleBasedLlmClient:
|
|||||||
if match:
|
if match:
|
||||||
params[key] = match.group(1)
|
params[key] = match.group(1)
|
||||||
return params
|
return params
|
||||||
|
|
||||||
|
|
||||||
def _analyze_progress_values(action: str, values: dict[str, Any]) -> tuple[bool, bool, str, str]:
    """Analyze progress fields; return completion state, anomaly state, reason, and note."""
    step = _lower_value(values.get("STEP"))
    status = _lower_value(values.get("STATUS"))
    msg = _lower_value(values.get("MSG"))
    message = _lower_value(values.get("MESSAGE"))
    success = _lower_value(values.get("SUCCESS"))
    finish = _lower_value(values.get("FINISH"))
    code = _lower_value(values.get("CODE"))
    rate = _lower_value(values.get("RATE_OF_PROGRESS"))

    complete = False
    if step == "done":
        complete = True
    elif status in ("completed", "complete", "done", "success", "succeeded"):
        complete = True
    elif success in ("true", "1", "yes"):
        complete = True
    elif action == "poll-upgrade-progress" and finish in ("true", "1", "yes"):
        complete = True
    elif msg == "success" and rate == "100" and (not code or code == "0"):
        complete = True

    # A non-zero CODE or any fail/error text is an anomaly, whatever the completion state.
    if code and code != "0":
        return complete, True, f"progress endpoint returned non-zero CODE: {code}", _progress_note(values)
    combined = " ".join(item for item in (step, status, msg, message) if item)
    if re.search(r"fail|error", combined, flags=re.IGNORECASE):
        return complete, True, values.get("MESSAGE") or values.get("MSG") or values.get("STEP") or "progress endpoint returned a failure status", _progress_note(values)
    return complete, False, "", _progress_note(values)


def _progress_note(values: dict[str, Any]) -> str:
    """Collect the core progress fields into a single note."""
    parts = []
    for key in ("RATE_OF_PROGRESS", "STEP", "MSG", "STATUS", "SUCCESS", "CODE", "FINISH", "MESSAGE"):
        value = values.get(key)
        if value not in (None, ""):
            parts.append(f"{key}={value}")
    return "Current progress: " + ", ".join(parts) if parts else "The progress endpoint returned no recognizable progress fields."


def _lower_value(value: Any) -> str:
    """Convert a field value to a lowercase string."""
    return str(value).strip().lower() if value is not None else ""
|
|||||||
@ -100,6 +100,7 @@ class LlmActionAnalysis:
|
|||||||
suggested_action: str = ""
|
suggested_action: str = ""
|
||||||
requires_confirmation: bool = False
|
requires_confirmation: bool = False
|
||||||
should_continue: bool = True
|
should_continue: bool = True
|
||||||
|
progress_complete: bool | None = None
|
||||||
notes: list[str] = field(default_factory=list)
|
notes: list[str] = field(default_factory=list)
|
||||||
|
|
||||||
|
|
||||||
@ -131,3 +132,4 @@ class AgentState:
|
|||||||
pause_reason: str = ""
|
pause_reason: str = ""
|
||||||
review_context: dict[str, Any] = field(default_factory=dict)
|
review_context: dict[str, Any] = field(default_factory=dict)
|
||||||
events: list[dict[str, Any]] = field(default_factory=list)
|
events: list[dict[str, Any]] = field(default_factory=list)
|
||||||
|
poll_attempts: dict[str, int] = field(default_factory=dict)
|
||||||
|
|||||||
@ -70,7 +70,7 @@ ACTION_TOOL_SPECS: dict[str, ActionToolSpec] = {
|
|||||||
name="poll_download_progress",
|
name="poll_download_progress",
|
||||||
action="poll-download-progress",
|
action="poll-download-progress",
|
||||||
scope="global",
|
scope="global",
|
||||||
description="Poll cloud download task progress.",
|
description="Query cloud download task progress once; whether to keep querying is decided by the Agent workflow and the LLM review.",
|
||||||
risk_level="medium",
|
risk_level="medium",
|
||||||
),
|
),
|
||||||
"upgrade-ip": ActionToolSpec(
|
"upgrade-ip": ActionToolSpec(
|
||||||
@ -85,7 +85,7 @@ ACTION_TOOL_SPECS: dict[str, ActionToolSpec] = {
|
|||||||
name="poll_upgrade_progress",
|
name="poll_upgrade_progress",
|
||||||
action="poll-upgrade-progress",
|
action="poll-upgrade-progress",
|
||||||
scope="ip",
|
scope="ip",
|
||||||
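description="Poll the upgrade progress of a single workstation.",
|
description="Query the upgrade progress of a single workstation once; whether to keep querying is decided by the Agent workflow and the LLM review.",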
|
||||||
risk_level="medium",
|
risk_level="medium",
|
||||||
),
|
),
|
||||||
"start-ip": ActionToolSpec(
|
"start-ip": ActionToolSpec(
|
||||||
|
|||||||
@ -9,12 +9,17 @@
|
|||||||
"suggested_action": "...",
|
"suggested_action": "...",
|
||||||
"requires_confirmation": false,
|
"requires_confirmation": false,
|
||||||
"should_continue": true,
|
"should_continue": true,
|
||||||
|
"progress_complete": null,
|
||||||
"notes": ["..."]
|
"notes": ["..."]
|
||||||
}
|
}
|
||||||
|
|
||||||
Requirements:
- `should_continue` must be stated explicitly: true when there is no problem; false when there is a problem that needs human judgment.
- If exit_code is non-zero, ok=false, verify-ip SUCCESS=false, or a legacy pending_confirmation appears, flag an anomaly.
- For `poll-download-progress` and `poll-upgrade-progress`, `progress_complete` must be determined: true when complete; false when incomplete but healthy; it may be null for non-progress actions.
- When a progress action is incomplete but healthy, set `has_anomaly=false`, `should_continue=true`, `progress_complete=false`, and suggest continuing to query progress.
- For progress-action completion, check first for `STEP=DONE`, `STATUS=completed/done/success`, `SUCCESS=true`, `FINISH=true`, or `MSG=success` together with `RATE_OF_PROGRESS=100` and `CODE` empty or 0.
- If a progress action returns a non-zero `CODE`, or `STEP/MSG/STATUS/MESSAGE` contains fail/error, flag an anomaly and set `should_continue=false`.
- Judge mainly from the structured fields `ok`, `exit_code`, `values`, `error_summary`; treat `diagnostic_log` as anomaly-diagnosis context only when it is present in the input.
- Normal script progress logs are not evidence of an error; do not flag an anomaly just because a log came from stderr.
- Do not output secrets, tokens, Authorization values, or full raw log text.
|
|||||||
@ -60,6 +60,39 @@ class BrokenReviewLlmClient:
|
|||||||
raise RuntimeError("review transport failed")
|
raise RuntimeError("review transport failed")
|
||||||
|
|
||||||
|
|
||||||
class ProgressivePollRunner(FakeActionRunner):
    """Simulate download and push progress that completes only after several queries."""

    def __init__(self) -> None:
        super().__init__()
        self.download_progress = ["10", "55", "100"]
        self.upgrade_progress: dict[str, list[str]] = {}

    def _fixture_for(self, action, kwargs):
        if action == "poll-download-progress":
            # Serve the queued rates in order; report 100 once the queue is empty.
            rate = self.download_progress.pop(0) if self.download_progress else "100"
            return {
                "ACTION": action,
                "STEP": "DONE" if rate == "100" else "RUNNING",
                "RATE_OF_PROGRESS": rate,
                "MSG": "success" if rate == "100" else "running",
                "MESSAGE": f"download {rate}%",
            }
        if action == "poll-upgrade-progress":
            ip = kwargs.get("ip", "")
            # Each IP gets its own two-step sequence: 30, then 100.
            values = self.upgrade_progress.setdefault(str(ip), ["30", "100"])
            rate = values.pop(0) if values else "100"
            return {
                "ACTION": action,
                "IP": ip,
                "STEP": "DONE" if rate == "100" else "RUNNING",
                "RATE_OF_PROGRESS": rate,
                "MSG": "success" if rate == "100" else "running",
                "MESSAGE": f"upgrade {rate}%",
            }
        return super()._fixture_for(action, kwargs)
|
|
||||||
|
|
||||||
def test_run_deploy_flow_success(tmp_path: Path):
|
def test_run_deploy_flow_success(tmp_path: Path):
|
||||||
agent = PamDeployAgent(fake_runner=FakeActionRunner())
|
agent = PamDeployAgent(fake_runner=FakeActionRunner())
|
||||||
state = agent.create_state(
|
state = agent.create_state(
|
||||||
@ -75,6 +108,59 @@ def test_run_deploy_flow_success(tmp_path: Path):
|
|||||||
assert all(item["status"] == "SUCCESS" for item in state.ip_states.values())
|
assert all(item["status"] == "SUCCESS" for item in state.ip_states.values())
|
||||||
|
|
||||||
|
|
||||||
def test_progress_actions_repeat_until_llm_marks_complete(tmp_path: Path):
    fake = ProgressivePollRunner()
    agent = PamDeployAgent(fake_runner=fake)
    state = agent.create_state(
        params={**PARAMS, "POLL_INTERVAL_SEC": 0},
        execution_strategy="fake",
        config_path=str(tmp_path / "config.txt"),
        checkpoint_path=str(tmp_path / "checkpoint.json"),
    )

    agent.run_deploy_flow(state)

    calls = [call[0] for call in fake.calls]
    assert calls.count("poll-download-progress") == 3
    assert calls.count("poll-upgrade-progress") == 4
    assert "poll-download-progress" in state.completed_global_steps
    assert state.poll_attempts == {}
    assert all(item["status"] == "SUCCESS" for item in state.ip_states.values())
    progress_events = [event for event in state.events if event["type"] == "ACTION_PROGRESS"]
    assert any(event["stage"] == "poll-download-progress" and "RATE_OF_PROGRESS=10" in event["message"] for event in progress_events)
    assert any(event["stage"] == "poll-upgrade-progress" and event["ip"] == "192.168.1.10" for event in progress_events)


def test_progress_timeout_pauses_on_current_action(tmp_path: Path):
    fake = FakeActionRunner(
        {
            "poll-download-progress": {
                "ACTION": "poll-download-progress",
                "STEP": "RUNNING",
                "RATE_OF_PROGRESS": "20",
                "MSG": "running",
                "MESSAGE": "download 20%",
            }
        }
    )
    agent = PamDeployAgent(fake_runner=fake)
    state = agent.create_state(
        params={**PARAMS, "POLL_INTERVAL_SEC": 0, "DOWNLOAD_POLL_MAX_ATTEMPTS": 2},
        execution_strategy="fake",
        config_path=str(tmp_path / "config.txt"),
        checkpoint_path=str(tmp_path / "checkpoint.json"),
    )

    agent.run_deploy_flow(state)

    assert state.paused is True
    assert state.pause_reason == "progress_timeout"
    assert state.last_failed_step == "poll-download-progress"
    assert "poll-download-progress" not in state.completed_global_steps
    assert state.review_context["stage"] == "poll-download-progress"
    assert state.poll_attempts["global:poll-download-progress"] == 2
|
|
||||||
|
|
||||||
def test_create_state_writes_absolute_script_config_path_and_normalized_zip(tmp_path: Path):
|
def test_create_state_writes_absolute_script_config_path_and_normalized_zip(tmp_path: Path):
|
||||||
package_path = tmp_path / "pkg.zip"
|
package_path = tmp_path / "pkg.zip"
|
||||||
params = {**PARAMS, "ZIP_FILE_PATH": str(package_path)}
|
params = {**PARAMS, "ZIP_FILE_PATH": str(package_path)}
|
||||||
|
|||||||
@ -80,6 +80,26 @@ class FlakyVerifyRunner(FakeActionRunner):
|
|||||||
return super()._fixture_for(action, kwargs)
|
return super()._fixture_for(action, kwargs)
|
||||||
|
|
||||||
|
|
||||||
class ChatProgressRunner(FakeActionRunner):
    """Make the chat fake deployment produce one visible progress update."""

    def __init__(self) -> None:
        super().__init__()
        self.download_progress = ["40", "100"]

    def _fixture_for(self, action, kwargs):
        if action == "poll-download-progress":
            rate = self.download_progress.pop(0) if self.download_progress else "100"
            return {
                "ACTION": action,
                "STEP": "DONE" if rate == "100" else "RUNNING",
                "RATE_OF_PROGRESS": rate,
                "MSG": "success" if rate == "100" else "running",
                "MESSAGE": f"download {rate}%",
            }
        return super()._fixture_for(action, kwargs)
|
|
||||||
|
|
||||||
def run_session(session: InteractiveCliSession, inputs: list[str]) -> list[str]:
|
def run_session(session: InteractiveCliSession, inputs: list[str]) -> list[str]:
|
||||||
output: list[str] = []
|
output: list[str] = []
|
||||||
iterator = iter(inputs)
|
iterator = iter(inputs)
|
||||||
@@ -138,6 +158,23 @@ def test_chat_run_prints_action_progress(tmp_path: Path):
     assert any("分析完成: verify-ip" in item for item in output)
 
 
+def test_chat_run_prints_progress_poll_updates(tmp_path: Path):
+    checkpoint = tmp_path / "checkpoint.json"
+    session = InteractiveCliSession(
+        agent=PamDeployAgent(fake_runner=ChatProgressRunner()),
+        params={**PARAMS, "POLL_INTERVAL_SEC": 0},
+        strategy="fake",
+        checkpoint_path=str(checkpoint),
+    )
+
+    output = run_session(session, ["run", "yes", "yes", "yes", "exit"])
+
+    assert any("进度更新: poll-download-progress" in item for item in output)
+    assert any("RATE_OF_PROGRESS=40" in item for item in output)
+    assert session.state is not None
+    assert "poll-download-progress" in session.state.completed_global_steps
+
+
 def test_chat_greeting_does_not_trigger_structured_analysis(tmp_path: Path):
     session = InteractiveCliSession(
         agent=PamDeployAgent(),
@@ -253,7 +290,7 @@ def test_chat_params_events_and_checkpoint_commands(tmp_path: Path):
             "yes",
             "yes",
             "yes",
-            "events 2",
+            "events 20",
             "list checkpoints",
             "load checkpoint " + str(checkpoint),
             "exit",