Batch Recipe

[[Work In Progress]]

Batch recipes let you create dynamic prompt chains and process batches of documents for your designed use case.

To create a new recipe, go to the Recipe Page, click Add New, select Karini as the runtime option, provide a user-friendly name and detailed description, and choose Batch for the recipe type.

Source

Drag and drop the Source element onto the recipe canvas. For details about provisioning the source configurations, refer to Source.

Dataset

A dataset serves as an internal collection of dataset items, which are pointers to the data source. For a recipe, you can use an existing dataset, which may reference other data sources, or create a new one, depending on your needs.

For configuring the details displayed on the dataset tile, refer to Dataset.

Prompt

When you add the Prompt tile to the canvas, you're incorporating a tool that generates contextual outputs based on your data. Place the Prompt tile in a suitable location within your workspace, then connect the Dataset node to the Prompt node, establishing a link that allows the Prompt node to access and utilize your data.

You'll select a specific prompt from the available options. This prompt serves as the initial query or input that guides the system in generating contextual responses or performing actions based on the vectorized data stored in the Vector store.

After selecting a prompt, the system displays it alongside the associated LLM and its configured parameters, along with the following parameters:

  • Source: Select transform with Lambda function.

  • Json Path OR Absolute Value: Specify a path within JSON data structures to extract specific information or directly input a fixed value or string for processing.

  • Iterate with batch size: Allows data to be processed in batches, optimizing performance and resource utilization when handling large datasets. (A short sketch of the JSON path and batch-size options follows this list.)
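
For illustration, here is a minimal Python sketch of how a JSON path and a batch size might be applied to an upstream payload. The payload shape, the field names, and the use of the jsonpath-ng library are assumptions for this example, not platform requirements.

```python
# A minimal sketch (not platform code) of applying a JSON path and a
# batch size to an upstream payload. Payload shape is hypothetical.
from jsonpath_ng import parse  # pip install jsonpath-ng

payload = {
    "documents": [
        {"id": 1, "text": "First document"},
        {"id": 2, "text": "Second document"},
        {"id": 3, "text": "Third document"},
    ]
}

# Json Path: extract just the text fields from the payload.
texts = [match.value for match in parse("$.documents[*].text").find(payload)]

# Iterate with batch size: process the extracted values in groups.
BATCH_SIZE = 2
for start in range(0, len(texts), BATCH_SIZE):
    batch = texts[start:start + BATCH_SIZE]
    print(f"Processing batch: {batch}")
```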

Transform

The Transform tile allows integration with AWS Lambda functions for data processing. Here’s how to set it up:

  • To begin, drag and drop the Transform tile onto the canvas. Enter a descriptive label and specify the type of transformation required.

  • Next, provide the Lambda ARN (Amazon Resource Name) from your AWS Lambda setup. To find the ARN:

    • Open the AWS Management Console and navigate to AWS Lambda.

    • Select your Lambda function from the list of functions available.

    • Locate the ARN in the Function overview section of the Lambda function's configuration page.

  • After configuring the Lambda ARN in the Transform tile, you can add any variables required by the Lambda function, for example summaries and evaluation.

  • To test the setup, input a test payload into the designated field for input test payload. This payload represents the data or parameters that the Lambda function will process.

  • Execute the test to observe the response generated by the Lambda function. This ensures that the integration and transformation steps are functioning correctly within your workflow. (A hypothetical handler and matching test payload are sketched after this list.)
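
For reference, the sketch below shows what such a Lambda might look like, along with a matching test payload. The event fields (summaries, evaluation) mirror the example variables above but are assumptions; your function's actual contract depends on your recipe.

```python
# Hypothetical Transform Lambda. The event carries the upstream node's
# output plus the variables configured on the Transform tile; the field
# names used here are illustrative, not a fixed platform contract.
import json

def lambda_handler(event, context):
    summaries = event.get("summaries", "")
    evaluation = event.get("evaluation", "n/a")
    # Return the transformed record for the next node in the recipe.
    return {
        "statusCode": 200,
        "body": json.dumps({
            "summary_word_count": len(summaries.split()),
            "evaluation": evaluation,
        }),
    }

# A matching value for the input test payload field might be:
# {"summaries": "The contract covers ...", "evaluation": "pass"}
```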

Aggregator

The Aggregator component offers several output options to consolidate data outputs.

  • Merge JSONs as JSON lines: Combines multiple JSON objects into a single output, where each JSON object appears on its own line in the output file.

  • Merge List of JSONs: Aggregates a list of JSON objects into a unified JSON structure. (The sketch after this list shows how these two output shapes differ.)

  • Custom Function: Allows integration with AWS Lambda for customized data processing and aggregation. To integrate AWS Lambda with the Aggregator component, follow these steps:

    • Provide the Lambda function's ARN: Obtain the Amazon Resource Name (ARN) of your AWS Lambda function from the AWS Management Console.

    • Source: Select transform with Lambda function.

    • Json Path OR Absolute Value: Specify a path within JSON data structures to extract specific information or directly input a fixed value or string for processing.

    • Iterate with batch size: Allows data to be processed in batches, optimizing performance and resource utilization when handling large datasets.

    • Input a Test Payload: Prepare a sample payload to simulate input data that your Lambda function will process during execution.

    • Test the Lambda Function's Response: Execute a test to observe how the Lambda function processes the input payload and generates the desired output.

    Integrating AWS Lambda with the Aggregator component enhances flexibility in data processing workflows, allowing for customized aggregation and transformation of data outputs based on specific business needs or processing requirements.

  • Connect the nodes: Prompts, Lambda, and Aggregator.
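
To make the two built-in merge modes concrete, the sketch below shows the difference in output shape (the records themselves are made up). A Custom Function Lambda follows the same pattern as the Transform example earlier.

```python
# Illustrative comparison of the Aggregator's two built-in merge modes.
import json

records = [{"id": 1, "score": 0.9}, {"id": 2, "score": 0.7}]

# Merge JSONs as JSON lines: one JSON object per line (JSONL), e.g.
# {"id": 1, "score": 0.9}
# {"id": 2, "score": 0.7}
jsonl_output = "\n".join(json.dumps(r) for r in records)

# Merge List of JSONs: a single unified JSON structure, e.g.
# [{"id": 1, "score": 0.9}, {"id": 2, "score": 0.7}]
merged_output = json.dumps(records)

print(jsonl_output)
print(merged_output)
```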

Sink

  • Begin by adding the Sink tile to your workflow canvas.

  • Choose Amazon S3 from the list of available output types. This selection indicates that the output data will be stored in an Amazon S3 bucket.

  • Specify the S3 path where you want to store the output data, including the bucket name and, optionally, the folder structure within the bucket.

  • You can choose to receive the output in either CSV or JSON format. (A sketch of retrieving the output from S3 follows this list.)
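
Once a run completes, the output file lands at the configured S3 path. As a sketch, assuming boto3 and a hypothetical bucket and key, you could fetch a JSON output like this:

```python
# A minimal sketch for retrieving a Sink output from Amazon S3.
# The bucket name and key below are hypothetical placeholders.
import json
import boto3

s3 = boto3.client("s3")
obj = s3.get_object(
    Bucket="my-output-bucket",
    Key="batch-recipe/run-001/output.json",
)
data = json.loads(obj["Body"].read())
print(data)
```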

Save and publish recipe

You can save the recipe at any point during the creation. Saving the recipe preserves all configurations and connections made in the workflow for future reference or deployment.

Once a recipe is created and saved, you must publish it to assign it a version number. The Run button is enabled after the recipe has been published.

Run recipe

Upon execution, the process proceeds through several stages. First, the objects or data items requiring processing are identified and listed. These items are then grouped into batches to optimize efficiency. Each batch undergoes tasks such as Optical Character Recognition (OCR) and extraction of Personally Identifiable Information (PII) as required, and is then processed by a Large Language Model (LLM) for further tasks such as analysis or content generation. Finally, the results are formatted into CSV or JSON files according to your specified preferences, with each file structured to contain data values aligned with the provided prompts.
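
Conceptually, the run behaves like the simplified sketch below; the stub functions stand in for platform-managed stages and are not real APIs.

```python
# Simplified, illustrative sketch of the run stages described above;
# these stubs stand in for platform-managed steps, not real APIs.
def run_ocr(obj):          # OCR stage (stub)
    return f"text of {obj}"

def extract_pii(text):     # PII extraction stage (stub)
    return text.replace("SSN", "[REDACTED]")

def call_llm(doc):         # LLM processing stage (stub)
    return {"input": doc, "output": f"summary of {doc}"}

def run_recipe(objects, batch_size=2):
    # 1) List the objects, 2) group them into batches, 3) run each batch
    # through OCR, PII extraction, and the LLM, 4) collect results that
    # are then formatted as CSV or JSON per the Sink configuration.
    results = []
    for start in range(0, len(objects), batch_size):
        for obj in objects[start:start + batch_size]:
            results.append(call_llm(extract_pii(run_ocr(obj))))
    return results

print(run_recipe(["doc1.pdf", "doc2.pdf", "doc3.pdf"]))
```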

After a successful run, you can access additional options via the Actions menu, located on the right side of the interface. This menu provides two key options:

  • View Runs: This option enables you to view all executed runs. It provides detailed information for each run, including status, duration, and run ID. This is useful for tracking the progress and outcomes of your processes. For more details, refer to Recipe Runs.

  • Batch History: This option displays the historical data of batch processes. It includes detailed records of previous batch runs, allowing you to review their status, performance, and any errors or issues encountered. This is helpful for auditing, troubleshooting, and understanding the performance trends over time.

  • Batch History Details: The batch history contains the following details:

    • Recipe Name: The name of the recipe used.

    • Run ID: A unique identifier for each run.

    • Run Name: The name assigned to the run.

    • Input File Name: The name of the input file used.

    • Output File Path: The location where the output is saved.

    • Output Type: The output type selected in the Sink.

    • Status: The current status of the run.

    • Status Message: A message indicating the status.

    • Start Time: The start time of the run.

    • Duration: The total time taken for the run.

    • Tokens: Tokens associated with the run.

In the Actions menu, under Batch History, you have two important options to help you analyze and understand the results of your batch processes:

  • View Trace:

    The trace has two sections: Prompt and Attributes.

    1. Prompt: You can view the traces of each operation executed during processing. It includes the following:

      • Input

      • Output

    2. Attributes: These include various parameters and metrics associated with each request (an illustrative example follows this list). Some of the attributes include:

      • Input Tokens

      • Completion Tokens

      • Model parameters, such as temperature and max tokens

  • View Output: This option displays the final output of the batch process. It shows the end results generated by the batch, such as processed data or generated reports. This is useful for verifying that the batch process produced the expected results and for further analysis of the output data. You can view the output file in either CSV or JSON format and download it.
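
For orientation, a single trace entry might resemble the following illustrative structure; the field names are representative, not the exact platform schema.

```python
# Purely illustrative shape of one trace entry; the real schema may differ.
trace_entry = {
    "prompt": {
        "input": "Summarize the attached document.",
        "output": "The document describes ...",
    },
    "attributes": {
        "input_tokens": 512,
        "completion_tokens": 128,
        "model_parameters": {"temperature": 0.2, "max_tokens": 1024},
    },
}
```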
