Stream Amazon Bedrock responses for a more responsive UI
Pair Amazon Bedrock's `invoke_model_with_response_stream` with a bit of Streamlit's magic for a typewriter effect and a more responsive UI
Earlier this week, my team held a "winging it" session. A "winging it" session is essentially a mob programming session where one person (in this case, me) drives and the rest of the team helps navigate a problem. I come in relatively unprepared, with only a working dev environment and a problem I want help solving or something I want to learn more about. On this particular day, I tasked myself with digging deeper into integrating my Streamlit webapp with Amazon Bedrock.
Just want to grab the final code? Head to the last section and it's yours!
I had already spent some time with my coworker's article here to make a simple call to a model using the `invoke_model` API through a Python script run at the command line. It was pretty straightforward to migrate this `send_prompt_to_bedrock` function to my Streamlit app:
```python
import boto3, json, traceback
import streamlit as st

def send_prompt_to_bedrock(prompt):
    bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-west-2')
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ]
    }
    try:
        # Invoke the Anthropic Claude Sonnet model
        response = bedrock.invoke_model(
            modelId="anthropic.claude-3-sonnet-20240229-v1:0",
            body=json.dumps(request_body)
        )
        # Parse the response and return the whole thing at once
        response_body = json.loads(response.get("body").read())
        result = response_body.get("content")[0]['text']
        return result
    except Exception as e:
        print(traceback.format_exc())
        return f"Error: {str(e)}"

# Set the page title of the app
st.set_page_config(page_title="🪨 Amazon Bedrock demo: invoke_model")

# Add a title
st.title("Demo: invoke_model")

# Add some text
st.markdown(
    """
    This is a demo of Amazon Bedrock's invoke_model API. To learn more about this API, check out the API docs [here](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html).
    """
)

# Add a text area for the prompt
prompt = st.text_area("Prompt")

# Send the prompt to Bedrock once one is entered
if prompt:
    result = send_prompt_to_bedrock(prompt)
    st.write(result)
```
In the code above, `send_prompt_to_bedrock` uses `invoke_model` to send a user-entered prompt to the Claude Sonnet model. When the response is returned, it is simply parsed and written to the page using Streamlit's `write` function.
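For context, the body returned by `invoke_model` for a Claude model follows Anthropic's Messages API format, which is why the code digs into `content[0]['text']`. Here's a minimal sketch of what the parsed JSON roughly looks like (the values are made up for illustration):

```python
# Rough shape of the parsed response_body for a Claude model
# (illustrative values, not a real response):
response_body = {
    "id": "msg_0123",
    "type": "message",
    "role": "assistant",
    "content": [
        {"type": "text", "text": "The capital of France is Paris."}
    ],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 14, "output_tokens": 8},
}
# Hence response_body.get("content")[0]['text'] pulls out the model's answer.
```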
As I was showing my coworkers my current state, I used my trusty prompt `What is the capital of France?` and got a response. A really short one: `The capital of France is Paris.`

But what if I ask a question that warrants a much longer response? Maybe something like `Sing me a song about DevOps`. The user sits there and waits until the entire response has been returned from the API call, and only then does the UI update. My coworkers picked up on this and suggested a minor improvement: stream the response so it starts printing to the UI sooner. We knew it was possible, but none of us knew the exact call to make or how to wire it up with Streamlit.
So, we asked Amazon Q. Turns out there is a function on the Amazon Bedrock runtime client that returns the response as a stream: `invoke_model_with_response_stream`. Let's use that!
I swapped out the `invoke_model` call for this one:
```python
response = bedrock.invoke_model_with_response_stream(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(request_body)
)
```
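Note that unlike `invoke_model`, the `body` here is an event stream rather than a single payload: each event carries a small JSON chunk describing one piece of the response. Roughly, the sequence of decoded chunks from a Claude model looks like this (abbreviated sketch, with illustrative text values):

```python
# Each decoded chunk is a small JSON event. Abbreviated sketch of a
# typical sequence for a Claude model (text values are illustrative):
events = [
    {"type": "message_start"},                        # stream opens
    {"type": "content_block_start", "index": 0},      # a text block begins
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Pa"}},  # partial text
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "ris"}}, # more partial text
    {"type": "content_block_stop", "index": 0},       # text block ends
    {"type": "message_stop"},                         # stream closes
]
```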
And then I had to figure out how to parse and process the streamed response. This took a bit of debugging, as the response from Amazon Q wasn't quite what I needed. Below, we get the actual stream object, iterate over it, and yield the chunked content whenever the type is `content_block_delta`. When instead the type is `message_stop`, we know we've hit the end of the response stream and return a new line.
```python
event_stream = response.get('body', {})
for event in event_stream:
    chunk = event.get('chunk')
    if chunk:
        message = json.loads(chunk.get("bytes").decode())
        if message['type'] == "content_block_delta":
            yield message['delta']['text'] or ""
        elif message['type'] == "message_stop":
            return "\n"
```
Now that we've processed the stream, we can write that back to the page. Instead of using Streamlit's `write` function, we swap it out for `write_stream`, which does all the fancy magic to make it look like typewriter output.
```python
st.write_stream(result)
```
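One handy detail (not something we rely on here, but worth knowing): `write_stream` also returns the fully assembled text once the stream finishes, so you can keep the answer around, say for a chat history. A small sketch, where `last_answer` is a hypothetical session state key:

```python
# st.write_stream renders the stream and returns the assembled text.
full_text = st.write_stream(send_prompt_to_bedrock(prompt))
st.session_state["last_answer"] = full_text  # "last_answer" is a hypothetical key
```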
In addition to providing earlier feedback to the user by updating the UI sooner, `invoke_model_with_response_stream` also gives us other benefits, like better memory management and the potential to interrupt the response.
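That interruption point is worth a quick illustration. Here's a sketch (mine, not from the final code below) of a generator that stops reading after a character cap; since nothing consumes the rest of the event stream, the remaining chunks are simply never read:

```python
import json

def stream_with_cap(event_stream, max_chars=500):
    """Yield text chunks, but stop early after max_chars characters.
    stream_with_cap and max_chars are hypothetical, for illustration."""
    received = 0
    for event in event_stream:
        chunk = event.get('chunk')
        if not chunk:
            continue
        message = json.loads(chunk.get("bytes").decode())
        if message['type'] == "content_block_delta":
            text = message['delta']['text'] or ""
            received += len(text)
            yield text
            if received >= max_chars:
                break  # interrupt: abandon the rest of the stream
```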
Wrapping up
It wasn't too painful to get from `invoke_model` to `invoke_model_with_response_stream`, except that we had to figure out how to process that response stream. In the final code below, you can see how we pull this all together to stream a response from Amazon Bedrock and use a bit of Streamlit's magic to give it the typewriter effect.
Want more like this? Share this post on social or drop a comment 💬 below and help me figure out what to explore next!
Helpful resources
Here are a few resources that were helpful to me while figuring this out:
- Getting started with different LLMs on Amazon Bedrock
- Explore Amazon Bedrock with Python, Fast API, and Next.js
- Amazon Bedrock code examples (various languages)
- InvokeModel API docs
- InvokeModelWithResponseStream API docs
- Generative AI space on Community.aws
Final code
You can grab the full code to stream the response here:
```python
import boto3, json, traceback
import streamlit as st

def send_prompt_to_bedrock(prompt):
    bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-west-2')
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ]
    }
    try:
        # Invoke the Anthropic Claude Sonnet model with a streamed response
        response = bedrock.invoke_model_with_response_stream(
            modelId="anthropic.claude-3-sonnet-20240229-v1:0",
            body=json.dumps(request_body)
        )
        # Get the event stream and process it chunk by chunk
        event_stream = response.get('body', {})
        for event in event_stream:
            chunk = event.get('chunk')
            if chunk:
                message = json.loads(chunk.get("bytes").decode())
                if message['type'] == "content_block_delta":
                    yield message['delta']['text'] or ""
                elif message['type'] == "message_stop":
                    return "\n"
    except Exception as e:
        print(traceback.format_exc())
        # Yield (not return) the error so it reaches the UI: this function
        # is a generator, and a returned value would be dropped by write_stream
        yield f"Error: {str(e)}"

# Set the page title of the app
st.set_page_config(page_title="🪨 Amazon Bedrock demo: invoke_model_with_response_stream")

# Add a title
st.title("Demo: invoke_model_with_response_stream")

st.markdown(
    """
    This is a demo of Amazon Bedrock's invoke_model_with_response_stream API. To learn more about this API, check out the API docs [here](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html).
    """
)

# Add a text area for the prompt
prompt = st.text_area("Prompt")

# Stream the response once a prompt is entered
if prompt:
    result = send_prompt_to_bedrock(prompt)
    st.write_stream(result)
```