Karini AI Documentation

Datasets


Last updated 1 month ago

In the recipe building process, datasets play a crucial role, serving as the foundation for various data processing operations.

Datasets can be added as follows:

  1. On the recipe canvas, drag and drop the Dataset element into the recipe.

  2. Provide a user-friendly name and description.

  3. Choose the dataset type: text, image, audio, or video.

  4. Save the dataset to view it on the dashboard with the latest updates.

Default Metadata Extraction After Recipe Processing

The metadata feature displays default attributes extracted from a dataset after it has been processed through a recipe or workflow. Fields such as source_ref, checksum, file_type, and other file-level metadata are extracted automatically; these fields help identify the dataset and validate its integrity and origin.
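The kind of record this produces can be sketched as follows. Only the field names source_ref, checksum, and file_type come from the docs; the checksum algorithm, the size_bytes field, and everything else here are illustrative assumptions.

```python
import hashlib
import mimetypes
from pathlib import Path

def default_metadata(path: str) -> dict:
    """Sketch of the default file-level metadata a recipe run might record.
    Field names source_ref/checksum/file_type follow the docs; the SHA-256
    choice and size_bytes are assumptions for illustration."""
    data = Path(path).read_bytes()
    return {
        "source_ref": path,                            # where the file came from
        "checksum": hashlib.sha256(data).hexdigest(),  # integrity check
        "file_type": mimetypes.guess_type(path)[0],    # e.g. "text/plain"
        "size_bytes": len(data),
    }

# Create a small sample file so the sketch is runnable end to end.
Path("sample.txt").write_text("hello")
meta = default_metadata("sample.txt")
```

A downstream consumer can recompute the checksum against the source file to verify that the dataset has not drifted from its origin.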

Custom Metadata Extraction Using Metadata Extractor Prompt

This feature enables customized metadata extraction using a dedicated prompt. Users specify the keys they want to extract, offering flexibility for tailored extraction. This is useful for datasets with custom attributes or unique fields not covered by the default extraction.
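A minimal sketch of what such an extractor prompt could look like. Karini's actual prompt format is not shown on this page, so the wording, function name, and example keys below are all assumptions:

```python
def build_metadata_prompt(keys: list[str], document: str) -> str:
    """Illustrative metadata-extractor prompt builder: lists the
    user-specified keys and asks the model to return them as JSON.
    The phrasing is an assumption, not the platform's actual template."""
    key_list = ", ".join(keys)
    return (
        "Extract the following metadata fields from the document below "
        f"and return them as JSON: {key_list}\n\n"
        f"Document:\n{document}"
    )

prompt = build_metadata_prompt(
    ["author", "invoice_number"],        # hypothetical custom keys
    "Invoice #42 from Acme Corp",        # hypothetical document text
)
```

The point is that the set of keys is caller-defined, which is what distinguishes this path from the fixed default extraction above.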

ACL Tags

Access Control List (ACL) tags are shown when the recipe is processed with ACLs enabled. These tags define the permissions and access control for the dataset, ensuring that only authorized users or processes can interact with certain data. The ACL information is displayed to ensure proper data governance and security protocols are followed during dataset processing.
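One common policy this kind of tagging supports can be sketched as follows: a piece of data is visible only if the user belongs to at least one group in its ACL tags. The policy, group names, and data below are illustrative assumptions, not the platform's documented semantics:

```python
def authorized(chunk_acl: set[str], user_groups: set[str]) -> bool:
    """Illustrative ACL check: visible if the user shares at least one
    group with the chunk's ACL tags (assumed policy)."""
    return bool(chunk_acl & user_groups)

# Hypothetical processed chunks with ACL tags attached.
chunks = [
    {"text": "Q3 revenue...", "acl": {"finance"}},
    {"text": "Org chart...", "acl": {"hr", "admins"}},
]

# A user in the "finance" group sees only the finance-tagged chunk.
visible = [c["text"] for c in chunks if authorized(c["acl"], {"finance"})]
```

Filtering at retrieval time like this is what keeps unauthorized data out of downstream prompts and responses.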

Embedding model configuration

This section displays the technical details of the embedding model used to vectorize text in the dataset.
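The page does not enumerate the exact fields shown, but an embedding configuration typically covers the model identifier, the vector dimension, and input limits. The values below are hypothetical placeholders, not Karini defaults:

```python
# Illustrative embedding configuration; the fields actually displayed in
# the dashboard may differ. All values below are assumptions.
embedding_config = {
    "model": "amazon.titan-embed-text-v2",  # hypothetical endpoint name
    "dimension": 1024,                      # length of each embedding vector
    "max_input_tokens": 8192,               # per-request input limit
}
```

The dimension matters operationally: the vector store index must be created with the same dimension the model emits, so changing models usually means re-embedding the dataset.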

Processing task status

The chart offers a clear visual summary of the processing status for tasks. It tracks the success and failure of various stages in the data processing pipeline, providing insights into the overall performance.

Key Features:

  • Total Items: Displays the total number of items being processed.

  • Processing Tasks: Tracks the following stages:

    • OCR (Optical Character Recognition): Converts images or scanned documents into machine-readable text.

    • PII (Personally Identifiable Information): Detects and manages sensitive personal data within the dataset.

    • Chunking: Breaks down larger pieces of data into smaller, more manageable chunks.

    • Embeddings: Transforms data into numerical representations for use in machine learning models.

  • Processing Status: Indicates the success or failure of each task:

    • Success: Tasks marked with green indicate successful completion.

    • Error: Tasks marked with orange indicate errors encountered during processing.
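The tally behind such a chart can be sketched as a per-stage success/error count. The stage names follow the list above (OCR, PII, Chunking, Embeddings); the task records themselves are hypothetical:

```python
from collections import Counter

# Hypothetical task records from one processing run; stage names follow
# the pipeline stages listed above, statuses are illustrative.
tasks = [
    {"stage": "OCR", "status": "success"},
    {"stage": "PII", "status": "success"},
    {"stage": "Chunking", "status": "error"},
    {"stage": "Embeddings", "status": "success"},
    {"stage": "Embeddings", "status": "error"},
]

def status_summary(tasks: list[dict]) -> dict:
    """Tally success/error counts per stage, as the status chart does."""
    summary: dict[str, Counter] = {}
    for t in tasks:
        summary.setdefault(t["stage"], Counter())[t["status"]] += 1
    return summary

summary = status_summary(tasks)
```

Each stage's counter maps directly onto one bar group in the chart: the "success" count to the green bar and the "error" count to the orange one.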