This endpoint opens a WebSocket connection that converts text to speech. The connection stays open until the audio has been fully generated, and the audio is streamed back to the client in real time.
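
The client side of this streaming pattern can be sketched as follows. This is a minimal illustration and not the endpoint's actual protocol: the source does not specify the URL, the request message, or the frame format, so the sketch uses no real network call. `fake_tts_stream` is a hypothetical stand-in for the open connection, and `receive_audio` is a hypothetical helper that assumes audio arrives as a sequence of binary chunks and that the end of the stream means generation is complete.

```python
import asyncio
from typing import AsyncIterator, Iterable


async def fake_tts_stream(chunks: Iterable[bytes]) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for the open connection.

    Yields audio chunks one at a time, then ends, which models the
    connection closing once the audio is fully generated.
    """
    for chunk in chunks:
        await asyncio.sleep(0)  # give up control, as a real network read would
        yield chunk


async def receive_audio(stream: AsyncIterator[bytes]) -> bytes:
    """Consume chunks as they arrive until the stream ends.

    Assumption: each item is a raw binary audio chunk. The real
    message format is not given in the source.
    """
    audio = bytearray()
    async for chunk in stream:
        # A real client could pass each chunk to an audio player here
        # instead of waiting for the full result.
        audio.extend(chunk)
    return bytes(audio)


def demo() -> bytes:
    """Run the receiver against three made-up chunks."""
    chunks = [b"RIFF", b"\x00\x01", b"\x02\x03"]
    return asyncio.run(receive_audio(fake_tts_stream(chunks)))
```

With a real WebSocket client library, the async iterator passed to `receive_audio` would be the connection object itself (or a thin wrapper around it), and playback could start as soon as the first chunk arrives.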