| author | date |
|---|---|
| Onur Solmaz | 2024-03-10 |
JSON Patch+
We introduce an amendment to the JSON Patch standard, which we call JSON Patch+.
JSON Patch (RFC 6902) is a standard for describing changes to a JSON document. It defines six operations that can be applied to a document: add, remove, replace, move, copy, and test.
JSON Patch+ is a superset of JSON Patch that lets you append to JSON arrays and strings. This makes it well suited to streaming conversation data over Server-Sent Events (SSE), for example tokens as a language model generates them.
Existing implementations that stream conversation data
We review how existing providers, such as OpenAI and Anthropic, stream conversation data from language models.
OpenAI
OpenAI streams the conversation as unnamed events: each chunk arrives without an event: line, and its data: line carries the JSON payload. The payload is a JSON object that contains the message id, the creation time, the "delta" of the generation (i.e. the token), the finish reason, and so on.
data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":"!"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":" How"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":" can"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":" I"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":" assist"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":" you"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":" today"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":"?"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}
data: [DONE]
Anthropic
Anthropic has a similar format, but uses a different schema and always sends named events:
event: message_start
data: {"type":"message_start","message":{"id":"msg_01XxGpEMWjpmB6mhvDvwTyKR","type":"message","role":"assistant","content":[],"model":"claude-3-opus-20240229","stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":10,"output_tokens":1}}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: ping
data: {"type": "ping"}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"!"}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":18}}
event: message_stop
data: {"type":"message_stop"}
Many things are different in the Anthropic format, e.g.
- the generated text arrives in a delta of type text_delta, under text, instead of under content in OpenAI's delta object,
- it uses stop_reason instead of finish_reason, which also takes different values,
and so on.
A backend can process and convert all the different formats into the same standard set of objects.
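As a rough illustration of that conversion, the sketch below maps both wire formats shown above onto two common objects. The names TextDelta, Finished, parse_sse, normalize_openai, normalize_anthropic and the ANTHROPIC_REASONS mapping are hypothetical and not part of either API, and the SSE parsing is simplified to one data: line per event.

```python
import json
from dataclasses import dataclass


@dataclass
class TextDelta:
    text: str


@dataclass
class Finished:
    reason: str


# Illustrative mapping only (an assumption); the real sets of stop reasons are larger.
ANTHROPIC_REASONS = {"end_turn": "stop"}


def parse_sse(lines):
    """Yield (event_name, data) pairs; event_name is None for unnamed events.

    Simplification: one data: line per event, no multi-line data or comments.
    """
    event = None
    for line in lines:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            yield event, line[len("data:"):].strip()
            event = None


def normalize_openai(lines):
    """Convert OpenAI-style unnamed chunks into common objects."""
    for _, data in parse_sse(lines):
        if data == "[DONE]":
            return
        for choice in json.loads(data)["choices"]:
            if choice["delta"].get("content"):
                yield TextDelta(choice["delta"]["content"])
            if choice.get("finish_reason"):
                yield Finished(choice["finish_reason"])


def normalize_anthropic(lines):
    """Convert Anthropic-style named events into the same common objects."""
    for event, data in parse_sse(lines):
        payload = json.loads(data)
        if event == "content_block_delta" and payload["delta"]["type"] == "text_delta":
            yield TextDelta(payload["delta"]["text"])
        elif event == "message_delta" and payload["delta"].get("stop_reason"):
            reason = payload["delta"]["stop_reason"]
            yield Finished(ANTHROPIC_REASONS.get(reason, reason))


openai_lines = [
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
anthropic_lines = [
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}',
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"!"}}',
    "event: message_delta",
    'data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":18}}',
    "event: message_stop",
    'data: {"type":"message_stop"}',
]

print(list(normalize_openai(openai_lines)))
print(list(normalize_anthropic(anthropic_lines)))
```

Note that each provider still needs its own adapter, and each adapter encodes that provider's schema, which is the coupling the next section discusses.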
Weaknesses of existing formats
The most common formats in the wild, OpenAI's and Anthropic's, assume a particular schema for the conversation state. For example, they assume that the state has an array of messages, and that each message has a role and content.
This makes both the API and its clients inflexible. For example, if Anthropic had to send a new type of object that is not a message, it would need to introduce a new event type, and clients would need to be updated to handle it. If they wanted to introduce a new content type, they would most likely need to change the schema or introduce new fields. This is what happened when OpenAI migrated from a simple generated text string to a message object with role and content.
The weakness is this: every time these companies want to change something, they need to update both:
- the code that parses the events and reconstructs the state on the client side,
- and the code that renders that state.
The idea
If the conversation state is always going to be a JSON object (which is the case for these companies and for us), we can create a general API that assumes no schema and can reconstruct arbitrary JSON objects on the client side.
There is already a well-established standard for achieving this, outside the context of SSEs: JSON Patch. It is used to describe changes to a JSON document that can be sent in an HTTP PATCH request.
Our conversation state is a JSON object. We can therefore send JSON Patch operations in the SSE data: field and reconstruct this state on the client as the text is generated.
However, JSON Patch has a major shortcoming for our use case: it does not support appending to a string field, which is the whole point of token streaming with the delta field. With the standard operations, the only way to extend a string is to replace it with its full new value on every token.
To remedy this, we propose a new append operation.
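To make the shortcoming concrete, here is a small sketch comparing how much string data each approach puts on the wire for the OpenAI reply shown earlier. The path /messages/0/content and the helper names replace_ops, append_ops, and payload_chars are assumptions made for illustration.

```python
tokens = ["Hello", "!", " How", " can", " I", " assist", " you", " today", "?"]


def replace_ops(tokens, path="/messages/0/content"):
    """Standard JSON Patch: every event re-sends the whole string so far."""
    text = ""
    for token in tokens:
        text += token
        yield {"op": "replace", "path": path, "value": text}


def append_ops(tokens, path="/messages/0/content"):
    """Proposed append operation: every event carries only the new token."""
    for token in tokens:
        yield {"op": "append", "path": path, "value": token}


def payload_chars(ops):
    """Total number of characters sent in the value fields."""
    return sum(len(op["value"]) for op in ops)


print("replace:", payload_chars(replace_ops(tokens)))
print("append: ", payload_chars(append_ops(tokens)))
```

For this nine-token reply the replace variant sends 168 characters of value data against 34 for append; the gap grows roughly quadratically with the length of the generated text.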
Appending to a string
The append operation is used the same way as the others: value is appended to the string field located at path.
{
  "op": "append",
  "path": "/targetString",
  "value": " appended text"
}
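A client-side applier for this could look roughly like the sketch below. It is a minimal illustration, not a full JSON Patch implementation: only add, replace, remove, and the proposed append are handled, the root path "" is not supported, and the function names are hypothetical. Paths are resolved as JSON Pointers (RFC 6901), and append is restricted to string targets, which is all this section defines.

```python
import copy


def _unescape(token):
    # JSON Pointer (RFC 6901) escapes: "~1" means "/", "~0" means "~".
    return token.replace("~1", "/").replace("~0", "~")


def _resolve_parent(doc, path):
    """Return (container, key) for the location that path points at."""
    if not path.startswith("/"):
        raise ValueError(f"unsupported path: {path!r}")
    tokens = [_unescape(t) for t in path[1:].split("/")]
    node = doc
    for token in tokens[:-1]:
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node, tokens[-1]


def apply_op(doc, op):
    """Apply a single operation to doc in place."""
    parent, key = _resolve_parent(doc, op["path"])
    if isinstance(parent, list):
        # "-" refers to the position just past the end of the array.
        key = len(parent) if key == "-" else int(key)
    kind = op["op"]
    if kind == "add":
        if isinstance(parent, list):
            parent.insert(key, op["value"])
        else:
            parent[key] = op["value"]
    elif kind == "replace":
        _ = parent[key]  # raises KeyError/IndexError if the target is missing
        parent[key] = op["value"]
    elif kind == "remove":
        del parent[key]
    elif kind == "append":
        target = parent[key]
        if not isinstance(target, str) or not isinstance(op["value"], str):
            raise TypeError("append expects a string target and a string value")
        parent[key] = target + op["value"]
    else:
        raise ValueError(f"unsupported op: {kind!r}")


def apply_patch(doc, ops):
    """Return a patched copy of doc, leaving the original untouched."""
    result = copy.deepcopy(doc)
    for op in ops:
        apply_op(result, op)
    return result


state = {"messages": []}
patch = [
    {"op": "add", "path": "/messages/-", "value": {"role": "assistant", "content": ""}},
    {"op": "append", "path": "/messages/0/content", "value": "Hello"},
    {"op": "append", "path": "/messages/0/content", "value": "!"},
]
new_state = apply_patch(state, patch)
print(new_state)
```

The applier never mentions messages, role, or content; it only follows paths, so the same code would reconstruct any other JSON shape the backend decides to send.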
Pros of the proposed method
- We will never need to make changes to our SSE schemas.
- When we need to add a new object, content type, etc., we will just do that on the backend, and the client will still be able to reconstruct them without requiring an update.
- The experience team will only need to deal with a changing JSON object and how to render it in React. The code that renders the conversation needs no logic about how the data is transported, beyond knowing that a certain field of the state has changed.
- In summary, things will simplify a lot, and we will not have to maintain or change any streaming-related code on the client side ever again.
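On the backend side, the points above could translate into something like the following sketch. It assumes that each SSE event is unnamed and carries a JSON array of operations in its data: line; that framing, the function names sse_frame and stream_reply, and the /usage object are illustrative assumptions, not decisions made in this document.

```python
import json


def sse_frame(ops):
    """Serialize a list of patch operations as one unnamed SSE event."""
    return "data: " + json.dumps(ops, separators=(",", ":")) + "\n\n"


def stream_reply(tokens, message_index=0):
    """Yield SSE frames that build up one assistant message token by token."""
    base = f"/messages/{message_index}"
    # Create the empty message first so later appends have a target.
    yield sse_frame([{"op": "add", "path": base,
                      "value": {"role": "assistant", "content": ""}}])
    for token in tokens:
        yield sse_frame([{"op": "append", "path": base + "/content", "value": token}])
    # A new kind of object (hypothetical here) needs no new event type:
    # it is just another add at a new path.
    yield sse_frame([{"op": "add", "path": "/usage",
                      "value": {"output_tokens": len(tokens)}}])


for frame in stream_reply(["Hello", "!"]):
    print(frame, end="")
```

Adding the /usage object required no change to the framing, so a generic client that applies the operations would pick it up without an update.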
TBD: More examples