Observability
Every request in Karini AI includes a trace that outlines the steps orchestrated by the prompt, agent, recipe, or copilot. This trace allows you to follow the step-by-step process leading to the response at that point in the conversation.
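As a rough illustration of the idea, a trace can be thought of as an ordered list of steps, each carrying a ServiceName, a SpanName, and optional attributes. The sketch below is a minimal Python model of that idea, not Karini AI's actual data format: the Span class, its field names, and every value in it are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One step in a trace (hypothetical structure, for illustration only)."""
    step: str                 # step label, e.g. "Query Embeddings"
    service_name: str         # ServiceName shown for the step
    span_name: str            # SpanName (internal function name)
    attributes: dict = field(default_factory=dict)


# A hypothetical trace for one request; all names and values are made up.
trace = [
    Span("Detect Greeting Questions", "example-app", "detect_greeting",
         {"gen_ai.system": "OpenAI", "llm.usage.total_tokens": 42}),
    Span("Check Content Safety", "example-app", "check_safety"),
    Span("Query Embeddings", "example-app", "embed_query",
         {"gen_ai.usage.prompt_tokens": 7}),
]


def step_names(spans):
    """Return the step labels in the order they appear in the trace."""
    return [span.step for span in spans]


print(step_names(trace))
```

Reading the step labels in order gives the step-by-step path that led to the response.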
Detect Greeting Questions
Input : Greeting detection prompt with input question
Output: Classification output
ServiceName - Information about the application in the resource
SpanName - Internal function name
gen_ai.prompt.0.role -
gen_ai.completion.0.finish_reason -
gen_ai.completion.0.role -
gen_ai.openai.api_base -
gen_ai.openai.system_fingerprint -
gen_ai.request.max_tokens - The maximum number of response tokens requested
gen_ai.request.model - The model requested (e.g. gpt-4, claude, etc.)
gen_ai.request.temperature -
gen_ai.system - The vendor of the LLM (e.g. OpenAI, Anthropic, etc.)
gen_ai.usage.completion_tokens - The number of tokens used for the completion response
gen_ai.usage.prompt_tokens - The number of tokens used for the prompt in the request
llm.headers -
llm.is_streaming -
llm.request.type - The type of request (e.g. completion, chat, etc.)
llm.usage.total_tokens - The total number of tokens used
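The token attributes above can be read together to sanity-check a single LLM step. The following is a minimal sketch with hypothetical values. It assumes, without confirmation from this page, that the total is the sum of prompt and completion tokens; the token_summary helper is illustrative and not part of Karini AI.

```python
# Hypothetical attribute values for a single LLM span.
span_attributes = {
    "gen_ai.request.model": "gpt-4",
    "gen_ai.request.max_tokens": 256,
    "gen_ai.usage.prompt_tokens": 120,
    "gen_ai.usage.completion_tokens": 15,
    "llm.usage.total_tokens": 135,
}


def token_summary(attrs):
    """Summarize token usage for one span.

    Assumption: total tokens = prompt tokens + completion tokens.
    """
    prompt = attrs.get("gen_ai.usage.prompt_tokens", 0)
    completion = attrs.get("gen_ai.usage.completion_tokens", 0)
    total = attrs.get("llm.usage.total_tokens", prompt + completion)
    max_tokens = attrs.get("gen_ai.request.max_tokens")
    return {
        "prompt": prompt,
        "completion": completion,
        "total": total,
        # True when the reported total matches the assumed sum.
        "consistent": total == prompt + completion,
        # True when the completion stayed within the requested limit
        # (or when no limit was recorded).
        "within_limit": max_tokens is None or completion <= max_tokens,
    }


summary = token_summary(span_attributes)
print(summary)
```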
Check Content Safety
Input : User query
Output : Content Safety Check Output
ServiceName - Information about the application in the resource
SpanName - Internal function name
Query Embeddings
Input : User query
Output : Vector embeddings of the user query
ServiceName - Information about the application in the resource
SpanName - Internal function name
gen_ai.openai.api_base -
gen_ai.request.model - The model requested (e.g. gpt-4, claude, etc.)
gen_ai.response.model - The model actually used (e.g. gpt-4-0613, etc.)
gen_ai.system - The vendor of the LLM (e.g. OpenAI, Anthropic, etc.)
gen_ai.usage.prompt_tokens - The number of tokens used for the prompt in the request
llm.headers - The headers used for the request
llm.is_streaming -
llm.request.type - The type of request (e.g. completion, chat, etc.)
llm.usage.total_tokens - The total number of tokens used
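Because a span can report both the requested model and the model actually used, it can help to resolve one effective model name when reviewing a trace. The sketch below is one possible way to do that. The effective_model helper and the sample values are hypothetical.

```python
def effective_model(attrs):
    """Prefer the model actually used; fall back to the requested one."""
    return attrs.get("gen_ai.response.model") or attrs.get("gen_ai.request.model")


# Hypothetical span attributes: the request names a model family,
# while the response reports the specific version that served it.
with_response = {
    "gen_ai.request.model": "gpt-4",
    "gen_ai.response.model": "gpt-4-0613",
}
request_only = {"gen_ai.request.model": "gpt-4"}

print(effective_model(with_response))
print(effective_model(request_only))
```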
Get similar embeddings
Input : User query
Output : Similar embeddings from the vector store
ServiceName - Information about the application in the resource
SpanName - Internal function name
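To make the retrieval step more concrete, the sketch below shows a generic top-k lookup by cosine similarity over an in-memory dictionary. It is only an illustration of the general technique. The vector store and similarity metric that Karini AI uses are not described here, and the store contents, document ids, and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, store, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical in-memory "vector store" with three tiny embeddings.
store = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
query_embedding = [1.0, 0.0, 0.0]

results = top_k_similar(query_embedding, store, k=2)
print([doc_id for doc_id, _ in results])
```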
Perform reranking
Input : User query
Output : Reranked similar embeddings using Cohere reranker
ServiceName - Information about the application in the resource
SpanName - Internal function name
Get Qna chain streaming
Input : Prompt, user query and the reranked context
Output : Response from the LLM
ServiceName - Information about the application in the resource
SpanName - Internal function name
gen_ai.completion.0.finish_reason -
gen_ai.completion.0.role -
gen_ai.openai.api_base -
gen_ai.openai.api_version -
gen_ai.prompt.0.role -
gen_ai.request.max_tokens - The maximum number of response tokens requested
gen_ai.request.model - The model requested (e.g. gpt-4, claude, etc.)
gen_ai.request.temperature -
gen_ai.response.model - The model actually used (e.g. gpt-4-0613, etc.)
gen_ai.system - The vendor of the LLM (e.g. OpenAI, Anthropic, etc.)
gen_ai.usage.completion_tokens - The number of tokens used for the completion response
gen_ai.usage.prompt_tokens - The number of tokens used for the prompt in the request
llm.headers -
llm.is_streaming -
llm.request.type - The type of request (e.g. completion, chat, etc.)
llm.usage.total_tokens - The total number of tokens used
Get Followup Questions
Input : Follow-up question generation prompt, user query, and the LLM-generated answer to the user query
Output : Follow-up questions
ServiceName - Information about the application in the resource
SpanName - Internal function name
gen_ai.completion.0.finish_reason -
gen_ai.completion.0.role -
gen_ai.openai.api_base -
gen_ai.openai.system_fingerprint -
gen_ai.openai.api_version -
gen_ai.prompt.0.role -
gen_ai.request.max_tokens - The maximum number of response tokens requested
gen_ai.request.model - The model requested (e.g. gpt-4, claude, etc.)
gen_ai.request.temperature -
gen_ai.response.model - The model actually used (e.g. gpt-4-0613, etc.)
gen_ai.system - The vendor of the LLM (e.g. OpenAI, Anthropic, etc.)
gen_ai.usage.completion_tokens - The number of tokens used for the completion response
gen_ai.usage.prompt_tokens - The number of tokens used for the prompt in the request
llm.headers -
llm.is_streaming -
llm.request.type - The type of request (e.g. completion, chat, etc.)
llm.usage.total_tokens - The total number of tokens used
Agent Executor
Input : Prompt, user question, agent thoughts and actions
Output: Response to the agent action
ServiceName - Information about the application in the resource
SpanName - Internal function name
gen_ai.prompt.0.role -
gen_ai.request.max_tokens - The maximum number of response tokens requested
gen_ai.request.model - The model requested (e.g. gpt-4, claude, etc.)
gen_ai.request.temperature -
gen_ai.system - The vendor of the LLM (e.g. OpenAI, Anthropic, etc.)
gen_ai.usage.completion_tokens - The number of tokens used for the completion response
gen_ai.usage.prompt_tokens - The number of tokens used for the prompt in the request
llm.request.type - The type of request (e.g. completion, chat, etc.)
llm.usage.total_tokens - The total number of tokens used
Get LLM Chain Streaming
Input : Prompt with user query
Output: Response from LLM
ServiceName - Information about the application in the resource
SpanName - Internal function name
gen_ai.prompt.0.role -
gen_ai.request.max_tokens - The maximum number of response tokens requested
gen_ai.request.model - The model requested (e.g. gpt-4, claude, etc.)
gen_ai.request.temperature -
gen_ai.system - The vendor of the LLM (e.g. OpenAI, Anthropic, etc.)
gen_ai.usage.completion_tokens - The number of tokens used for the completion response
gen_ai.usage.prompt_tokens - The number of tokens used for the prompt in the request
llm.request.type - The type of request (e.g. completion, chat, etc.)
llm.usage.total_tokens - The total number of tokens used
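Since several steps report llm.usage.total_tokens, one practical use of a trace is to add up token usage across the steps of a request. The sketch below assumes the trace has already been exported as a list of dictionaries. That export shape, the helper name, and the numbers are hypothetical and not a documented Karini AI format.

```python
def total_tokens_by_step(spans):
    """Sum llm.usage.total_tokens per step, skipping steps without it."""
    totals = {}
    for span in spans:
        tokens = span.get("attributes", {}).get("llm.usage.total_tokens")
        if tokens is not None:
            totals[span["step"]] = totals.get(span["step"], 0) + tokens
    return totals


# Hypothetical exported trace; values are made up for illustration.
spans = [
    {"step": "Detect Greeting Questions",
     "attributes": {"llm.usage.total_tokens": 40}},
    {"step": "Check Content Safety", "attributes": {}},
    {"step": "Get Qna chain streaming",
     "attributes": {"llm.usage.total_tokens": 900}},
    {"step": "Get Followup Questions",
     "attributes": {"llm.usage.total_tokens": 210}},
]

per_step = total_tokens_by_step(spans)
request_total = sum(per_step.values())
print(per_step, request_total)
```

Steps that do not call an LLM, such as the content safety check in this example, carry no token attributes and are left out of the totals.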