---
author: Onur Solmaz
date: 2024-03-10
---

# JSON Patch+

We introduce an amendment to the [JSON Patch](https://datatracker.ietf.org/doc/html/rfc6902) standard, which we call JSON Patch+.

JSON Patch is a standard for describing changes to a JSON document. It defines a set of operations that can be applied to a JSON document: `add`, `remove`, `replace`, `move`, `copy`, and `test`.

JSON Patch+ is a superset of JSON Patch, which lets you append to JSON arrays and strings. This makes it useful for streaming conversation data using [Server-Sent Events (SSE)](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events), such as tokens as they are generated by a language model.

## Existing implementations that stream conversation data

We review how existing language model providers, such as OpenAI and Anthropic, stream conversation data.

### OpenAI

OpenAI uses unnamed events to stream conversations. That means OpenAI chunks always come without the `event:` line, and the `data:` line contains the JSON payload. The payload is a JSON object which contains the message id, creation time, the "delta" of the generation (i.e. the token), the finish reason and so on.

```
data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":"!"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":" How"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":" can"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":" I"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":" assist"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":" you"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":" today"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":"?"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-91GC9u1cYNISj7gcyyitJmhuM30lD","object":"chat.completion.chunk","created":1710087609,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}

data: [DONE]
```

### Anthropic

Anthropic has a similar format, but uses a different schema and always sends named events:

```
event: message_start
data: {"type":"message_start","message":{"id":"msg_01XxGpEMWjpmB6mhvDvwTyKR","type":"message","role":"assistant","content":[],"model":"claude-3-opus-20240229","stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":10,"output_tokens":1}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: ping
data: {"type": "ping"}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"!"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":18}}

event: message_stop
data: {"type":"message_stop"}
```

Many things are different in the Anthropic format, e.g.

- the generated text arrives in a `delta` object of type `text_delta` with a `text` field, instead of a `delta` object with a `content` field,
- it uses `stop_reason` instead of `finish_reason`, which also takes different values,

and so on. A backend can process and convert all the different formats into the same standard set of objects.

### Weaknesses of existing formats

The most common formats used in the wild, OpenAI's and Anthropic's, assume a given conversation state schema. For example, they assume that the state has an array of messages, and each message can have a role and content.

This makes both the API and its clients inflexible. For example, if Anthropic had to send a new type of object that is not a message, it would need to introduce a new event type, and clients would need to be updated to handle this new event type. If they wanted to introduce a new content type, they would most likely need to change the schema or introduce new fields. This is what happened when OpenAI migrated from a simple generated `text` string to a `message` object with `role` and `content`.

The weakness is this: every time these companies want to introduce a change, they need to update:

- both the code that parses the events and reconstructs the state on the client side
- AND the code that renders that state.

## The idea

If conversation state is always going to be a JSON object (which is the case for these companies and us), we can create a general API that does not assume any schema, and can reconstruct arbitrary JSON objects on the client side.

There is already a well-established standard for achieving this, outside the context of SSEs: [JSON Patch](https://datatracker.ietf.org/doc/html/rfc6902). It is used to describe changes to a JSON document that can be sent in an HTTP PATCH request.

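As a rough illustration of what schema-free reconstruction looks like, the sketch below applies two standard JSON Patch operations, `add` and `replace`, to an arbitrary JSON document. The names `Json`, `PatchOp`, `getChild`, and `applyOp`, as well as the `messages` state shape, are hypothetical and chosen only for this example; it covers a small subset of RFC 6902, and a real client would more likely rely on an existing JSON Patch library.

```typescript
// Hypothetical, simplified subset of RFC 6902: only `add` and `replace`.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface PatchOp {
  op: "add" | "replace";
  path: string; // JSON Pointer (RFC 6901), e.g. "/messages/0/content"
  value: Json;
}

// Split a JSON Pointer into unescaped reference tokens.
function parsePointer(path: string): string[] {
  if (path === "") return [];
  if (!path.startsWith("/")) throw new Error(`Invalid JSON Pointer: ${path}`);
  return path
    .slice(1)
    .split("/")
    .map((t) => t.replace(/~1/g, "/").replace(/~0/g, "~"));
}

// Read one level of the document, failing if the key or index is missing.
function getChild(node: Json, token: string): Json {
  if (Array.isArray(node)) {
    const item = node[Number(token)];
    if (item === undefined) throw new Error(`Path not found at "${token}"`);
    return item;
  }
  if (node !== null && typeof node === "object") {
    const member = node[token];
    if (member === undefined) throw new Error(`Path not found at "${token}"`);
    return member;
  }
  throw new Error(`Cannot descend into a primitive at "${token}"`);
}

// Apply one operation in place and return the resulting document.
function applyOp(doc: Json, op: PatchOp): Json {
  const tokens = parsePointer(op.path);
  if (tokens.length === 0) return op.value; // the path "" targets the whole document
  const last = tokens[tokens.length - 1] as string;
  let parent: Json = doc;
  for (const token of tokens.slice(0, -1)) parent = getChild(parent, token);

  if (Array.isArray(parent)) {
    if (op.op === "add") {
      // "-" means "after the last element"; otherwise insert before the index.
      const index = last === "-" ? parent.length : Number(last);
      if (!Number.isInteger(index) || index < 0 || index > parent.length) {
        throw new Error(`Invalid array index "${last}"`);
      }
      parent.splice(index, 0, op.value);
    } else {
      getChild(parent, last); // `replace` requires an existing target
      parent[Number(last)] = op.value;
    }
  } else if (parent !== null && typeof parent === "object") {
    if (op.op === "replace") getChild(parent, last);
    parent[last] = op.value;
  } else {
    throw new Error(`Cannot write into a primitive at "${last}"`);
  }
  return doc;
}

// The client never needs to know what a "message" is.
let state: Json = { messages: [] };
state = applyOp(state, {
  op: "add",
  path: "/messages/-",
  value: { role: "assistant", content: "Hello" },
});
// With standard operations only, growing a string means re-sending all of it.
state = applyOp(state, { op: "replace", path: "/messages/0/content", value: "Hello!" });
console.log(JSON.stringify(state));
// {"messages":[{"role":"assistant","content":"Hello!"}]}
```

Note that the client code above contains no knowledge of roles, messages, or content types; it only follows paths and writes values.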
Our conversation state is a JSON object. We can then use the JSON Patch format in the SSE `data:` line to reconstruct this state as the text is generated.

However, JSON Patch has a major shortcoming for our use case. Namely, it does not support appending to a string field, which is the whole point of token streaming with the `delta` field. To remedy this, we propose a new `append` operation.

### Appending to a string

The `append` operation is used the same way as the others: the `value` is appended to the string field located at `path`.

```json
{ "op": "append", "path": "/targetString", "value": " appended text" }
```

## Pros of the proposed method

- We will never need to make changes to our SSE schemas.
- When we need to add a new object, content type, etc., we will just do that on the backend, and the client will still be able to reconstruct them without requiring an update.
- The experience team will only need to deal with a changing JSON object, and how to render that in React. The code that renders the conversation will not need any logic regarding the transport of the data, except that a change was made to the state at a certain field.
- In summary, things will simplify a lot, and we will not have to maintain or change any streaming-related code on the client side ever again.

TBD: More examples
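As one possible reading of the `append` operation described above, here is a minimal client-side sketch that applies `append` payloads, as they might arrive in SSE `data:` lines, to a state object. The names `AppendOp`, `Container`, and `applyAppend`, the `messages` state shape, and the sample payloads are all assumptions made for illustration. The sketch handles string targets only and treats a missing or non-string target as an error, which is a design choice this document does not specify.

```typescript
// Hypothetical client-side handling of the proposed `append` operation.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type Container = Json[] | { [key: string]: Json };

interface AppendOp {
  op: "append";
  path: string;  // JSON Pointer to an existing string field
  value: string; // text to append to that field
}

function isContainer(node: Json | undefined): node is Container {
  return typeof node === "object" && node !== null;
}

function readKey(container: Container, key: string): Json | undefined {
  return Array.isArray(container) ? container[Number(key)] : container[key];
}

function writeKey(container: Container, key: string, value: Json): void {
  if (Array.isArray(container)) container[Number(key)] = value;
  else container[key] = value;
}

// Append `op.value` to the string located at `op.path`, mutating `doc` in place.
function applyAppend(doc: Json, op: AppendOp): void {
  const tokens = op.path
    .split("/")
    .slice(1)
    .map((t) => t.replace(/~1/g, "/").replace(/~0/g, "~"));
  const key = tokens.pop();
  // Assumption of this sketch: the root document itself cannot be appended to.
  if (key === undefined) throw new Error("append needs a non-root path");

  let node: Json | undefined = doc;
  for (const token of tokens) {
    if (!isContainer(node)) throw new Error(`Path not found: ${op.path}`);
    node = readKey(node, token);
  }
  if (!isContainer(node)) throw new Error(`Path not found: ${op.path}`);

  const current = readKey(node, key);
  if (typeof current !== "string") {
    throw new Error(`Target of append is not a string: ${op.path}`);
  }
  writeKey(node, key, current + op.value);
}

// Example: payloads as they might arrive, one per SSE `data:` line.
const conversation: Json = { messages: [{ role: "assistant", content: "" }] };
const payloads = [
  '{"op":"append","path":"/messages/0/content","value":"Hello"}',
  '{"op":"append","path":"/messages/0/content","value":"!"}',
];
for (const data of payloads) {
  applyAppend(conversation, JSON.parse(data) as AppendOp);
}
console.log(JSON.stringify(conversation));
// {"messages":[{"role":"assistant","content":"Hello!"}]}
```

Each payload carries only the new token and the path it belongs to, so the per-event size stays small regardless of how long the generated text already is.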