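Some API endpoints, such as /api/generate, stream their responses by default. Streamed responses are delivered as newline-delimited JSON (content type application/x-ndjson), one JSON object per line. For example: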
{"model":"gemma3","created_at":"2025-10-26T17:15:24.097767Z","response":"That","done":false}
{"model":"gemma3","created_at":"2025-10-26T17:15:24.109172Z","response":"'","done":false}
{"model":"gemma3","created_at":"2025-10-26T17:15:24.121485Z","response":"s","done":false}
{"model":"gemma3","created_at":"2025-10-26T17:15:24.132802Z","response":" a","done":false}
{"model":"gemma3","created_at":"2025-10-26T17:15:24.143931Z","response":" fantastic","done":false}
{"model":"gemma3","created_at":"2025-10-26T17:15:24.155176Z","response":" question","done":false}
{"model":"gemma3","created_at":"2025-10-26T17:15:24.166576Z","response":"!","done":true, "done_reason": "stop"}
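A streamed response like the one above can be consumed line by line. The following is a minimal sketch, not an official client: the helper names iter_chunks and collect_response are hypothetical, and the only assumptions about the payload are the "response" and "done" fields shown in the sample.

```python
import json
from typing import Dict, Iterable, Iterator, Union


def iter_chunks(lines: Iterable[Union[bytes, str]]) -> Iterator[Dict]:
    """Yield one parsed JSON object per non-empty NDJSON line."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if line:  # skip blank lines between objects
            yield json.loads(line)


def collect_response(lines: Iterable[Union[bytes, str]]) -> str:
    """Concatenate the "response" fragments until a chunk reports done=true."""
    parts = []
    for chunk in iter_chunks(lines):
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

Any iterable of lines works as input, for example an open HTTP response object that yields one line at a time. To show text as it arrives, print each fragment inside the loop over iter_chunks instead of collecting the fragments first.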

Disabling streaming

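For any endpoint that supports streaming, you can disable it by including {"stream": false} in the request body. The response is then returned as a single application/json object instead: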
{"model":"gemma3","created_at":"2025-10-26T17:15:24.166576Z","response":"That's a fantastic question!","done":true}
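As an illustration, a non-streaming request could be built as below. This is a sketch under stated assumptions: the base URL http://localhost:11434 and the "model" and "prompt" request fields are not given in the text above, and the function names are hypothetical. Only the {"stream": false} field and the /api/generate path come from this page.

```python
import json
import urllib.request


def build_generate_request(prompt: str,
                           model: str = "gemma3",
                           stream: bool = False,
                           base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a POST request for /api/generate; base_url and field names are assumptions."""
    body = {"model": model, "prompt": prompt, "stream": stream}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_once(prompt: str, model: str = "gemma3") -> str:
    """Send a non-streaming request and return the complete "response" text."""
    request = build_generate_request(prompt, model=model, stream=False)
    with urllib.request.urlopen(request) as resp:
        # With streaming disabled, the body is one JSON object rather than NDJSON.
        return json.load(resp)["response"]
```

Calling generate_once requires a server listening at the assumed address. build_generate_request can be inspected without sending anything.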

When to use streaming vs. non-streaming

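Streaming (default):
  • Response text arrives in real time as it is generated
  • Lower perceived latency
  • Better suited to long generations
Non-streaming:
  • Simpler to process, since the whole response is a single JSON object
  • Better suited to short responses or structured outputs
  • Easier to handle in some applications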
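To make the perceived-latency point concrete, the sketch below measures the gap between the first and last chunk of a stream from their "created_at" timestamps. It assumes the timestamp format shown in the sample (ISO 8601 with a trailing "Z" and at most six fractional digits); the function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, Sequence


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2025-10-26T17:15:24.097767Z."""
    # Older Python versions reject a trailing "Z", so spell out the UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def first_to_last_gap(chunks: Sequence[Dict]) -> float:
    """Seconds between the first and the final chunk of a streamed response."""
    first = parse_timestamp(chunks[0]["created_at"])
    last = parse_timestamp(chunks[-1]["created_at"])
    return (last - first).total_seconds()
```

For the seven-chunk sample earlier on this page the gap is about 0.069 seconds. A streaming client can display "That" that much sooner than a client waiting for the complete object, and the difference grows with the length of the generation.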