Test Recipe
You can test a recipe by adding the Output element to the recipe canvas and configuring it (see Create Recipe).
Click the Test button to open a chat window where you can query the recipe. Submit your question and review the response generated by the recipe. The response includes the following:
You can see the real-time response from your recipe's RAG pipeline, which includes the answer to the question, a prompt lens icon, a trace icon, and statistics. If the model selected in the recipe's prompt supports streaming, the response is streamed.
Prompt lens lets you look behind the scenes as the request is executed. Here, you can inspect the input sent to the large language model (LLM), including system instructions, context, questions, and prompt. This lets you assess the quality of the context retrieved from the vector store and adjust the context generation strategy if needed.
Click the trace icon to view detailed step-by-step information about how the prompt request was processed and the response was generated. The trace has two sections: Prompt and Attributes.
Prompt: Shows the trace of each operation executed during processing. It includes the following:
Input
Output
Attributes: Shows various parameters and metrics associated with each request. Some of the attributes include:
Input Tokens
Completion Tokens
Model parameters such as temperature and max tokens
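To make the two trace sections easier to picture, the following is a minimal sketch of how a trace could be modeled in Python. It is not the product's actual data format: the class names, field names, step names, and all values are hypothetical and chosen only to mirror the Prompt (Input, Output) and Attributes (Input Tokens, Completion Tokens, model parameters) items listed above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TraceStep:
    """One operation in the Prompt section of a trace (hypothetical shape)."""
    name: str
    input: str
    output: str


@dataclass
class TraceAttributes:
    """Request-level values from the Attributes section (hypothetical shape)."""
    input_tokens: int
    completion_tokens: int
    model_parameters: Dict[str, Any] = field(default_factory=dict)


def exceeds_max_tokens(attrs: TraceAttributes) -> bool:
    """Return True if the completion is longer than the configured max tokens."""
    max_tokens = attrs.model_parameters.get("max_tokens")
    return max_tokens is not None and attrs.completion_tokens > max_tokens


# Illustrative values only; a real trace would come from the Test window.
steps = [
    TraceStep("retrieve_context", input="What is a recipe?", output="<retrieved chunks>"),
    TraceStep("llm_call", input="<assembled prompt>", output="<answer text>"),
]
attrs = TraceAttributes(
    input_tokens=512,
    completion_tokens=128,
    model_parameters={"temperature": 0.2, "max_tokens": 256},
)
```

Reading a trace this way helps when comparing runs: the steps show what each operation received and produced, while the attributes show how much the request cost in tokens and which model settings were in effect.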
The following statistics are displayed when a response is generated during a test:
Search Embeddings: The time taken, in milliseconds, to retrieve embeddings similar to the user query.
Question Embeddings Creation: The time taken in milliseconds to generate embeddings for a given question.
LLM Response Time: The time taken, in milliseconds, by the LLM to generate the complete response for the given prompt request.
LLM Request Timestamp: Represents the specific time a request was made to the LLM.
Time To First Token: The time taken, in milliseconds, by the LLM to produce the first token of the response after receiving the prompt. Time To First Token (TTFT) is particularly relevant for applications that use streaming, where providing immediate feedback is crucial.
Input Tokens: Total number of input tokens in the LLM request. This includes the prompt instructions, system prompt, context and user query.
Output Tokens: Total number of output tokens generated by the LLM in response to the prompt request. This number does not exceed the Max Tokens value set up during prompt testing.
Embeddings Input Tokens: Number of tokens converted into vectors by an embedding model.
Input Unsafety Score: Measures how unsafe the given input is. A higher score indicates a greater level of unsafety.
Input Toxicity Score: This score represents the likelihood that the input text could be perceived as toxic or harmful.
Doc Summarization: The time taken, in milliseconds, to summarize the retrieved embedding chunks. This number is reported when you select the Summarize chunks option in the Context Generation configuration.
Summarization Prompt Tokens: Number of tokens in the prompt or input provided for a summarization task. This number is reported when you select the Summarize chunks option in the Context Generation configuration.
Summarization Response Tokens: Number of tokens in the generated response of a summarization task. This number is reported when you select the Summarize chunks option in the Context Generation configuration.
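The difference between Time To First Token and LLM Response Time can be illustrated with a short Python sketch. This is only an illustration of how such timings could be measured around any token stream, not how the product computes its statistics. The fake_llm_stream function is a hypothetical stand-in for a streaming model, and measure_stream is a hypothetical helper; neither is part of the product.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_llm_stream(tokens: Iterable[str], delay_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM; yields one token at a time."""
    for token in tokens:
        time.sleep(delay_s)  # simulate per-token generation latency
        yield token


def measure_stream(stream: Iterator[str]) -> Tuple[str, float, float, int]:
    """Consume a stream; return (text, ttft_ms, response_time_ms, output_tokens)."""
    start = time.perf_counter()
    ttft_ms = None
    pieces = []
    for token in stream:
        if ttft_ms is None:
            # First token arrived: this is the Time To First Token.
            ttft_ms = (time.perf_counter() - start) * 1000.0
        pieces.append(token)
    # Stream exhausted: this is the time for the complete response.
    response_time_ms = (time.perf_counter() - start) * 1000.0
    if ttft_ms is None:
        ttft_ms = response_time_ms  # empty stream: no token ever arrived
    return "".join(pieces), ttft_ms, response_time_ms, len(pieces)


text, ttft_ms, response_ms, output_tokens = measure_stream(
    fake_llm_stream(["Hello", ",", " world"])
)
print(f"TTFT: {ttft_ms:.1f} ms, LLM response time: {response_ms:.1f} ms, "
      f"output tokens: {output_tokens}")
```

With a streaming model, TTFT is always less than or equal to the full response time, which is why it is the more useful number when judging how responsive the chat window feels.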