Observability
Every request in Karini AI includes a trace that outlines the steps orchestrated by the prompt, agent, recipe, or copilot. This trace allows you to follow the step-by-step process leading to the response at that point in the conversation.
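Conceptually, a trace is a tree of spans: one root span for the request, with one child span per orchestration step. The sketch below shows what that shape might look like if reproduced with the OpenTelemetry Python SDK; the service name and span names are hypothetical placeholders, not Karini AI's internal identifiers.

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# The resource's service.name is what surfaces as ServiceName in the trace.
provider = TracerProvider(resource=Resource.create({"service.name": "qna-copilot"}))
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("observability-demo")

# One request produces one trace; each pipeline step becomes a child span.
with tracer.start_as_current_span("handle_request"):
    with tracer.start_as_current_span("detect_greeting_questions"):
        pass  # greeting classification step
    with tracer.start_as_current_span("check_content_safety"):
        pass  # content safety step
    with tracer.start_as_current_span("get_qna_chain_streaming"):
        pass  # final LLM answer step
```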
Each tracing step below lists the prompt inputs and outputs for that step, along with the span attributes recorded for it.
**Detect Greeting Questions**

- Input: Greeting detection prompt with input question
- Output: Classification output

Attributes:

- `ServiceName` - Information about the application in the resource
- `SpanName` - Internal function name
- `gen_ai.prompt.0.role` - The role of the first prompt message (e.g. `system`, `user`)
- `gen_ai.completion.0.finish_reason` - The reason the model stopped generating (e.g. `stop`)
- `gen_ai.completion.0.role` - The role of the completion message (e.g. `assistant`)
- `gen_ai.openai.api_base` - The base URL of the OpenAI API endpoint
- `gen_ai.openai.system_fingerprint` - The fingerprint of the backend configuration that generated the response
- `gen_ai.request.max_tokens` - The maximum number of response tokens requested
- `gen_ai.request.model` - The model requested (e.g. `gpt-4`, `claude`, etc.)
- `gen_ai.request.temperature` - The sampling temperature requested
- `gen_ai.system` - The vendor of the LLM (e.g. OpenAI, Anthropic, etc.)
- `gen_ai.usage.completion_tokens` - The number of tokens used for the completion response
- `gen_ai.usage.prompt_tokens` - The number of tokens used for the prompt in the request
- `llm.headers` - The headers used for the request
- `llm.is_streaming` - Whether the response was streamed
- `llm.request.type` - The type of request (e.g. `completion`, `chat`, etc.)
- `llm.usage.total_tokens` - The total number of tokens used
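As an illustration of how these attributes map onto a span, here is a minimal sketch using the OpenTelemetry Python SDK; it assumes the tracer setup shown earlier, and the attribute values and token counts are placeholders, not Karini AI internals.

```python
from opentelemetry import trace

tracer = trace.get_tracer("observability-demo")  # hypothetical instrumentation name

with tracer.start_as_current_span("detect_greeting_questions") as span:
    # Request-side attributes, known before the LLM call.
    span.set_attribute("gen_ai.system", "OpenAI")
    span.set_attribute("gen_ai.request.model", "gpt-4")
    span.set_attribute("gen_ai.request.max_tokens", 256)
    span.set_attribute("gen_ai.request.temperature", 0.0)
    span.set_attribute("gen_ai.prompt.0.role", "system")
    span.set_attribute("llm.request.type", "chat")
    span.set_attribute("llm.is_streaming", False)
    # Response-side attributes, known once the completion returns.
    span.set_attribute("gen_ai.completion.0.role", "assistant")
    span.set_attribute("gen_ai.completion.0.finish_reason", "stop")
    span.set_attribute("gen_ai.usage.prompt_tokens", 42)     # placeholder count
    span.set_attribute("gen_ai.usage.completion_tokens", 5)  # placeholder count
    span.set_attribute("llm.usage.total_tokens", 47)
```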
**Check Content Safety**

- Input: User query
- Output: Content safety check output

Attributes:

- `ServiceName` - Information about the application in the resource
- `SpanName` - Internal function name
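These two attributes come from different places: `ServiceName` is taken from the tracer provider's resource, while `SpanName` mirrors the internal function being traced. A hypothetical sketch of that relationship; the function name and result shape are illustrative, not Karini AI's actual implementation:

```python
from opentelemetry import trace

tracer = trace.get_tracer("observability-demo")

def check_content_safety(query: str) -> dict:
    # The span name mirrors the internal function name.
    with tracer.start_as_current_span("check_content_safety"):
        # Placeholder result; the real safety check is internal to Karini AI.
        return {"query": query, "safe": True}

check_content_safety("What is our refund policy?")
```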
**Query Embeddings**

- Input: User query
- Output: Vector embeddings of the user query

Attributes:

- `ServiceName` - Information about the application in the resource
- `SpanName` - Internal function name
- `gen_ai.openai.api_base` - The base URL of the OpenAI API endpoint
- `gen_ai.request.model` - The model requested (e.g. `gpt-4`, `claude`, etc.)
- `gen_ai.response.model` - The model actually used (e.g. `gpt-4-0613`, etc.)
- `gen_ai.system` - The vendor of the LLM (e.g. OpenAI, Anthropic, etc.)
- `gen_ai.usage.prompt_tokens` - The number of tokens used for the prompt in the request
- `llm.headers` - The headers used for the request
- `llm.is_streaming` - Whether the response was streamed
- `llm.request.type` - The type of request (e.g. `completion`, `chat`, etc.)
- `llm.usage.total_tokens` - The total number of tokens used
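Note the distinction between `gen_ai.request.model` (what was asked for) and `gen_ai.response.model` (what the backend actually served). A minimal sketch, with hypothetical model names and token counts:

```python
from opentelemetry import trace

tracer = trace.get_tracer("observability-demo")

with tracer.start_as_current_span("query_embeddings") as span:
    span.set_attribute("gen_ai.system", "OpenAI")
    span.set_attribute("gen_ai.request.model", "text-embedding-ada-002")      # requested
    span.set_attribute("gen_ai.response.model", "text-embedding-ada-002-v2")  # actually served
    span.set_attribute("gen_ai.usage.prompt_tokens", 12)  # placeholder count
    span.set_attribute("llm.usage.total_tokens", 12)      # embeddings use no completion tokens
```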
**Get similar embeddings**

- Input: User query
- Output: Similar embeddings from the vector store

Attributes:

- `ServiceName` - Information about the application in the resource
- `SpanName` - Internal function name
**Perform reranking**

- Input: User query
- Output: Reranked similar embeddings using the Cohere reranker

Attributes:

- `ServiceName` - Information about the application in the resource
- `SpanName` - Internal function name
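Retrieval and reranking record only `ServiceName` and `SpanName`, so in a trace they appear as plain sibling spans under the request. A sketch of that shape; the `retrieval.top_k` and `reranker.provider` attributes are purely hypothetical additions for illustration:

```python
from opentelemetry import trace

tracer = trace.get_tracer("observability-demo")

with tracer.start_as_current_span("handle_request"):
    with tracer.start_as_current_span("get_similar_embeddings") as span:
        span.set_attribute("retrieval.top_k", 20)          # hypothetical attribute
    with tracer.start_as_current_span("perform_reranking") as span:
        span.set_attribute("reranker.provider", "Cohere")  # hypothetical attribute
```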
**Get Qna chain streaming**

- Input: Prompt, user query, and the reranked context
- Output: Response from the LLM

Attributes:

- `ServiceName` - Information about the application in the resource
- `SpanName` - Internal function name
- `gen_ai.completion.0.finish_reason` - The reason the model stopped generating (e.g. `stop`)
- `gen_ai.completion.0.role` - The role of the completion message (e.g. `assistant`)
- `gen_ai.openai.api_base` - The base URL of the OpenAI API endpoint
- `gen_ai.openai.api_version` - The API version used for the request
- `gen_ai.prompt.0.role` - The role of the first prompt message (e.g. `system`, `user`)
- `gen_ai.request.max_tokens` - The maximum number of response tokens requested
- `gen_ai.request.model` - The model requested (e.g. `gpt-4`, `claude`, etc.)
- `gen_ai.request.temperature` - The sampling temperature requested
- `gen_ai.response.model` - The model actually used (e.g. `gpt-4-0613`, etc.)
- `gen_ai.system` - The vendor of the LLM (e.g. OpenAI, Anthropic, etc.)
- `gen_ai.usage.completion_tokens` - The number of tokens used for the completion response
- `gen_ai.usage.prompt_tokens` - The number of tokens used for the prompt in the request
- `llm.headers` - The headers used for the request
- `llm.is_streaming` - Whether the response was streamed
- `llm.request.type` - The type of request (e.g. `completion`, `chat`, etc.)
- `llm.usage.total_tokens` - The total number of tokens used
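For a streaming step, response-side attributes such as the finish reason and token usage are only known once the stream is exhausted, so they are set at the end of the span. A hypothetical sketch with a stand-in token stream:

```python
from opentelemetry import trace

tracer = trace.get_tracer("observability-demo")

with tracer.start_as_current_span("get_qna_chain_streaming") as span:
    span.set_attribute("llm.is_streaming", True)
    span.set_attribute("llm.request.type", "chat")
    span.set_attribute("gen_ai.request.model", "gpt-4")

    chunks = ["The refund ", "window is ", "30 days."]  # stand-in for an LLM token stream
    answer = "".join(chunks)

    # Only known after the stream completes:
    span.set_attribute("gen_ai.response.model", "gpt-4-0613")
    span.set_attribute("gen_ai.completion.0.finish_reason", "stop")
    span.set_attribute("gen_ai.usage.completion_tokens", 9)  # placeholder count
```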
**Get Followup Questions**

- Input: Follow-up question generation prompt, user query, and the LLM-generated answer to the user query
- Output: Follow-up questions

Attributes:

- `ServiceName` - Information about the application in the resource
- `SpanName` - Internal function name
- `gen_ai.completion.0.finish_reason` - The reason the model stopped generating (e.g. `stop`)
- `gen_ai.completion.0.role` - The role of the completion message (e.g. `assistant`)
- `gen_ai.openai.api_base` - The base URL of the OpenAI API endpoint
- `gen_ai.openai.system_fingerprint` - The fingerprint of the backend configuration that generated the response
- `gen_ai.openai.api_version` - The API version used for the request
- `gen_ai.prompt.0.role` - The role of the first prompt message (e.g. `system`, `user`)
- `gen_ai.request.max_tokens` - The maximum number of response tokens requested
- `gen_ai.request.model` - The model requested (e.g. `gpt-4`, `claude`, etc.)
- `gen_ai.request.temperature` - The sampling temperature requested
- `gen_ai.response.model` - The model actually used (e.g. `gpt-4-0613`, etc.)
- `gen_ai.system` - The vendor of the LLM (e.g. OpenAI, Anthropic, etc.)
- `gen_ai.usage.completion_tokens` - The number of tokens used for the completion response
- `gen_ai.usage.prompt_tokens` - The number of tokens used for the prompt in the request
- `llm.headers` - The headers used for the request
- `llm.is_streaming` - Whether the response was streamed
- `llm.request.type` - The type of request (e.g. `completion`, `chat`, etc.)
- `llm.usage.total_tokens` - The total number of tokens used
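To see how these attributes look once a span is exported, you can replay the pattern with an in-memory exporter and print what a trace viewer would display. A self-contained sketch using only the OpenTelemetry SDK; span names and counts are placeholders:

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

exporter = InMemorySpanExporter()
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(exporter))
tracer = provider.get_tracer("observability-demo")

with tracer.start_as_current_span("get_followup_questions") as span:
    span.set_attribute("gen_ai.usage.prompt_tokens", 180)     # placeholder count
    span.set_attribute("gen_ai.usage.completion_tokens", 40)  # placeholder count

for finished in exporter.get_finished_spans():
    print(finished.name, dict(finished.attributes))
```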
**Agent Executor**

- Input: Prompt, user question, agent thoughts and actions
- Output: Response to the agent action

Attributes:

- `ServiceName` - Information about the application in the resource
- `SpanName` - Internal function name
- `gen_ai.prompt.0.role` - The role of the first prompt message (e.g. `system`, `user`)
- `gen_ai.request.max_tokens` - The maximum number of response tokens requested
- `gen_ai.request.model` - The model requested (e.g. `gpt-4`, `claude`, etc.)
- `gen_ai.request.temperature` - The sampling temperature requested
- `gen_ai.system` - The vendor of the LLM (e.g. OpenAI, Anthropic, etc.)
- `gen_ai.usage.completion_tokens` - The number of tokens used for the completion response
- `gen_ai.usage.prompt_tokens` - The number of tokens used for the prompt in the request
- `llm.request.type` - The type of request (e.g. `completion`, `chat`, etc.)
- `llm.usage.total_tokens` - The total number of tokens used
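Because an agent may loop through several thought/action cycles per request, its trace typically shows one LLM call per iteration under the executor span. The iteration structure and attribute values below are hypothetical:

```python
from opentelemetry import trace

tracer = trace.get_tracer("observability-demo")

with tracer.start_as_current_span("agent_executor") as root:
    root.set_attribute("gen_ai.request.model", "gpt-4")
    for step in range(2):  # two hypothetical thought/action iterations
        with tracer.start_as_current_span(f"agent_iteration_{step}") as span:
            span.set_attribute("llm.request.type", "chat")
```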
**Get LLM Chain Streaming**

- Input: Prompt with user query
- Output: Response from the LLM

Attributes:

- `ServiceName` - Information about the application in the resource
- `SpanName` - Internal function name
- `gen_ai.prompt.0.role` - The role of the first prompt message (e.g. `system`, `user`)
- `gen_ai.request.max_tokens` - The maximum number of response tokens requested
- `gen_ai.request.model` - The model requested (e.g. `gpt-4`, `claude`, etc.)
- `gen_ai.request.temperature` - The sampling temperature requested
- `gen_ai.system` - The vendor of the LLM (e.g. OpenAI, Anthropic, etc.)
- `gen_ai.usage.completion_tokens` - The number of tokens used for the completion response
- `gen_ai.usage.prompt_tokens` - The number of tokens used for the prompt in the request
- `llm.request.type` - The type of request (e.g. `completion`, `chat`, etc.)
- `llm.usage.total_tokens` - The total number of tokens used
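Since every LLM-backed step records `llm.usage.total_tokens`, summing that attribute across a trace's spans gives the total token cost of a request. A self-contained sketch of that aggregation; the span names and token counts are placeholders:

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

exporter = InMemorySpanExporter()
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(exporter))
tracer = provider.get_tracer("observability-demo")

# Two placeholder LLM-backed steps from one request.
for name, tokens in [("get_llm_chain_streaming", 350), ("get_followup_questions", 220)]:
    with tracer.start_as_current_span(name) as span:
        span.set_attribute("llm.usage.total_tokens", tokens)

total_tokens = sum(
    s.attributes.get("llm.usage.total_tokens", 0)
    for s in exporter.get_finished_spans()
)
print(f"Total tokens used by this trace: {total_tokens}")  # 570
```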