Test & Compare

Karini AI's prompt playground lets you test your prompt against different models and model parameters simultaneously, in real time.

A/B Testing

You can select up to three models to test your prompt against. Only models registered in Karini's Model Hub are available for selection. You can also adjust model parameters such as Temperature and Max Tokens to tune the performance of your prompt.

The "Test" button will trigger LLM invocation simultaneously for all the selected models for the prompt. You can review the responses generated by the models in real-time and continue to fine-tune and re-test the prompt as required by tweaking the prompt instructions or modifying the model and parameter configurations. The real-time, side-by-side comparison of prompt responses for various models and parameters gives you the ability to view and analyze the variations in responses generated by each model and guides you select the best combination of model and parameters for your prompt task.

Prompt Responses and Statistics

Upon clicking "Test", you can see real-time response from each of the selected model in the prompt test. If the selected model supports streaming, you will see a streaming response. Once the response generation is complete, you can also review the statistics displayed for each prompt test run. They include:

  • Input Tokens: Total number of input tokens in the LLM request. This includes the prompt instructions, system prompt, context, and user query.

  • Output Tokens: Total number of output tokens generated by the LLM in response to the prompt request. This number does not exceed the Max Tokens value set for the prompt test.

  • LLM Response Time: The time, in milliseconds, taken by the LLM to generate the complete response for the given prompt request.

  • Time to First Token (TTFT): The time it takes the model to produce the first token of the response after receiving the prompt. TTFT is particularly relevant for streaming applications, where immediate feedback is crucial.

These statistics provide additional guidance when evaluating the performance of the prompt and the LLM, and can help you decide whether the prompt output is satisfactory or whether the prompt needs further fine-tuning to obtain more precise results. You can save the prompt experiments by clicking the "Save prompt runs" button.
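
For intuition about what the timing statistics capture, the minimal sketch below measures LLM Response Time and Time to First Token against a streaming, OpenAI-compatible endpoint. The playground computes these values for you; the model name here is a placeholder and the code is only illustrative.

```python
# A minimal sketch of measuring TTFT and total response time from a streaming call.
# Assumes an OpenAI-compatible endpoint; the model name is a placeholder.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.perf_counter()
ttft_ms = None
chunks = []

stream = client.chat.completions.create(
    model="model-a",  # placeholder
    messages=[{"role": "user", "content": "Explain vector databases in two sentences."}],
    max_tokens=256,
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content if chunk.choices else None
    if delta:
        if ttft_ms is None:
            # First visible token: the latency users perceive in streaming UIs.
            ttft_ms = (time.perf_counter() - start) * 1000
        chunks.append(delta)

total_ms = (time.perf_counter() - start) * 1000
print(f"Time to First Token: {ttft_ms:.0f} ms")
print(f"LLM Response Time:   {total_ms:.0f} ms")
print("Response:", "".join(chunks))
```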

Selecting the Best Answer

Based on the live responses and the statistics for each model, you can select the best answer for your prompt request. You can also view the model tracing details for an in-depth understanding of how the LLM executed the request.

When you click "Select as best answer" button on a model response, the associated model and model parameters get assigned to the prompt which are also shown on "Edit prompt" tab in "Selected LLM Model" section.

Selecting the best answer also prompts you to save the prompt run.
