2026-03 Terminal Stream Transport And Identity Coupling
Summary
The browser terminal stayed stuck in Connecting... or, after later fixes, reached [session ready]
with no prompt or no keyboard input, because the current terminal relay couples:
- a brittle duplex-over-HTTP transport model
- ingress-dependent buffering behavior
- and an identity model that currently depends on the ingress mTLS handoff path
Under the current design, the ingress path was reachable but buffered live duplex streaming, while the direct path restored immediate stream behavior but failed node identity authorization.
Impact
- user-facing browser terminal was unusable
- multiple partial fixes improved readiness/state reporting without restoring shell output
- terminal debugging consumed time across node-agent, API, gateway, and frontend layers
Symptoms
- UI initially stayed on Connecting...
- after gateway/frontend readiness fixes, UI showed [session ready] but no prompt
- node-agent showed:
  - terminal pty started
  - later terminal pty produced first output
  - but those key stream response logs often appeared only after disconnect
- PTY preview log clearly contained the real prompt while the UI still showed no shell output
Root Cause
The owner defect was architectural coupling between transport and identity on the node-agent terminal stream leg.
Current design:
- node-agent sends NDJSON downstream frames in the request body
- API relays NDJSON upstream frames in the response body
- both sides keep the same HTTP request open concurrently
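The framing in the current design can be sketched as follows. This is a minimal illustration, not the node-agent or API code: the frame fields (`type`, `data`) and the names `encode_frame` and `NDJSONDecoder` are assumptions made for the example. It shows why the model is sensitive to intermediaries: a frame is only usable once its trailing newline arrives, so anything that holds bytes back on either body delays every frame behind it.

```python
import json
from typing import List


def encode_frame(frame: dict) -> bytes:
    """Serialize one frame as a single NDJSON line (compact JSON plus newline)."""
    # json.dumps escapes newlines inside strings, so each frame stays on one line.
    return json.dumps(frame, separators=(",", ":")).encode("utf-8") + b"\n"


class NDJSONDecoder:
    """Incrementally decode NDJSON frames from arbitrarily chunked bytes."""

    def __init__(self) -> None:
        self._buf = b""

    def feed(self, chunk: bytes) -> List[dict]:
        """Buffer a chunk and return every frame completed by it."""
        self._buf += chunk
        frames = []
        while b"\n" in self._buf:
            line, self._buf = self._buf.split(b"\n", 1)
            if line.strip():  # tolerate blank keep-alive lines
                frames.append(json.loads(line))
        return frames
```

A reader on either side would call `feed` with whatever bytes the HTTP body yields; it returns nothing until a full line is available.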
Observed production behavior:
- ingress path via https://node-api.100-90-157-34.sslip.io was reachable
- but live duplex behavior was buffered in practice; stream responses surfaced only
  at or near disconnect/session close
- splitting upstream/downstream endpoints did not remove that buffering while still
traversing the same ingress path
- bypassing ingress to a direct node API endpoint restored immediate stream response
  timing, confirming that the buffering came from the ingress path
- but the direct path then failed 401 invalid node identity, proving the current
node identity model is still coupled to the ingress/mTLS handoff path
So the real root cause is not just "HTTP/2 buffering." It is:
- NDJSON duplex-over-HTTP is too brittle for this terminal path
- and the direct-path transport we actually want does not share the same identity enforcement model as the ingress path
Why Detection Was Weak
- readiness and shell-output failures looked like frontend or PTY startup issues
- the system lacked first-frame logs across all stream legs
- terminal had multiple coupled state machines:
  - browser UI
  - gateway websocket/session binding
  - API relay
  - node-agent PTY
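The missing first-frame logs could look something like the sketch below. It is an assumption-laden illustration rather than the logging that was eventually added: `log_first_frame`, the logger name `terminal.stream`, and the leg label are all hypothetical. The idea is to wrap each leg's frame iterator so that every leg reports how long its first frame took, which makes a buffered hop stand out when the timings are compared.

```python
import logging
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
log = logging.getLogger("terminal.stream")  # hypothetical logger name


def log_first_frame(
    leg: str,
    frames: Iterable[T],
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield frames unchanged, logging how long the first one took on this leg."""
    # The timer starts when the consumer begins iterating, not at wrap time.
    opened_at = clock()
    first = True
    for frame in frames:
        if first:
            log.info("leg=%s first frame after %.3fs", leg, clock() - opened_at)
            first = False
        yield frame
    if first:
        # The stream closed without ever delivering a frame.
        log.warning("leg=%s closed after %.3fs without any frame", leg, clock() - opened_at)
```

With the same wrapper on each leg, a first-frame delay that is small at the node-agent PTY but large at the API relay would point at the hop in between.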
Recovery
Interim recovery steps:
- added explicit session_ready control frame in terminal-gateway
- cleared stale Connecting... state in the frontend
- added gateway session transition logs
- added PTY-start and PTY-first-output logs in node-agent
- forced the node-agent terminal stream transport to HTTP/1.1
What those changes did:
- improved observability and made prompt delivery visible
- proved that browser-side readiness and prompt rendering could work
- but did not fully solve the terminal architecture
Additional investigation proved:
- split upstream/downstream HTTP endpoints still buffered when they used ingress
- direct terminal API bypass removed the 60-second stall
- but the direct path was first misaddressed to a VPN-only IP and then, when corrected
to a routable hostname/IP, failed with 401 invalid node identity
That means this incident ended with clear diagnosis, not a complete terminal redesign.
Follow-ups
- redesign terminal relay transport around typed bidirectional streaming instead of NDJSON duplex-over-POST
- evaluate gRPC bidirectional streaming for node-agent <-> control-plane terminal legs
- define a node identity model for the direct terminal path that does not depend on the current ingress mTLS header handoff
- decide explicitly whether terminal should:
  - use a direct node-reachable control-plane endpoint with bearer/service auth, or
  - use a separate mTLS-capable direct endpoint, or
  - move to a transport where auth and stream lifecycle are designed together
- add a deployed-environment terminal duplex smoke test that verifies:
  - prompt appears
  - a typed key leaves the browser
  - the shell echoes before disconnect
- add a post-reimage terminal smoke test so node rebuilds cannot silently regress terminal transport
- record the ingress-vs-direct-path findings in the runbook so future operators do not have to rediscover them
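The three conditions of the duplex smoke test can be expressed as a check over a recorded event timeline. The sketch below is only one possible shape for that check: the `Event` type, its `kind` values, and `check_duplex` are hypothetical, and how events are captured from a real browser session is left open. The point is that output arriving at or after disconnect counts as a failure, which is exactly the buffered behavior seen on the ingress path.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    t: float       # seconds since session start
    kind: str      # "output", "key_sent", or "disconnect" (hypothetical labels)
    data: str = ""


def check_duplex(events: List[Event], prompt: str, key: str) -> List[str]:
    """Return the failed smoke conditions; an empty list means all three held."""
    failures = []
    # Without a disconnect event, treat the session as still open.
    disconnect_t = next((e.t for e in events if e.kind == "disconnect"), float("inf"))

    prompt_seen = any(
        e.kind == "output" and prompt in e.data and e.t < disconnect_t for e in events
    )
    if not prompt_seen:
        failures.append("prompt did not appear before disconnect")

    key_t = next((e.t for e in events if e.kind == "key_sent" and e.data == key), None)
    if key_t is None:
        failures.append("typed key never left the browser")
    else:
        echoed = any(
            e.kind == "output" and key in e.data and key_t <= e.t < disconnect_t
            for e in events
        )
        if not echoed:
            failures.append("shell did not echo the key before disconnect")
    return failures
```

A timeline where the prompt and echo only show up at the disconnect timestamp fails the first and third checks even though the PTY produced the right bytes.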