Karini AI Documentation


Dataset Dashboard


Last updated 11 months ago

On the Datasets page, you can select an existing dataset and view its dashboard, which gives you insight into the data processing tasks run on that dataset.

Below is an example of a dataset dashboard and its associated processing tasks.

Optical Character Recognition (OCR)

OCR enables the extraction of text from images or scanned documents, making the data more accessible and searchable.
- Count: Number of dataset items processed using OCR.
- Processing Status: Indicates whether the OCR task was successful or if there were errors during processing.
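Every task on this dashboard reports the same two metrics. As a rough illustration of how per-item outcomes could roll up into a Count and a Status, here is a minimal Python sketch. The `ItemResult` record, the `summarize` function, and the sample file names are hypothetical and are not part of Karini AI's API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemResult:
    """Outcome of one processing task (e.g. OCR) on one dataset item (hypothetical record)."""
    item_id: str
    succeeded: bool
    error: Optional[str] = None


def summarize(results: list[ItemResult]) -> dict:
    """Roll per-item outcomes up into the Count and Status shown on a task chart."""
    outcome = Counter("success" if r.succeeded else "error" for r in results)
    return {
        "count": len(results),  # Count: items the task processed
        "succeeded": outcome["success"],
        "failed": outcome["error"],
        # Status: "success" only when no item failed
        "status": "success" if outcome["error"] == 0 else "errors",
        "errors": {r.item_id: r.error for r in results if not r.succeeded},
    }


ocr_results = [
    ItemResult("scan-001.png", succeeded=True),
    ItemResult("scan-002.png", succeeded=False, error="unreadable image"),
    ItemResult("invoice.pdf", succeeded=True),
]
print(summarize(ocr_results))
```

The same roll-up applies to the PII, chunking, embeddings, and batch-chain charts; only the task that produces the per-item results changes.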

Personally Identifiable Information (PII)

PII handling involves identifying and managing data that could potentially identify a specific individual, such as names, Social Security numbers, and addresses.
- Count: Number of dataset items scanned for PII.
- Status: Indicates success or errors in identifying and handling PII.
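To make PII identification concrete, the sketch below flags two common identifier formats with regular expressions and replaces them with placeholders. It is an illustration under stated assumptions, not how Karini AI detects PII: the pattern set and the `redact_pii` helper are hypothetical, and names or street addresses generally need entity recognition rather than regular expressions, which is why "Jane" survives in the example.

```python
import re

# Hypothetical, deliberately small pattern set. Real PII detection typically
# combines many more patterns with ML-based entity recognition.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_pii(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a [TYPE] placeholder and count matches per type."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        # subn returns the new string and the number of substitutions made
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


redacted, counts = redact_pii("Contact Jane at jane.doe@example.com; SSN 123-45-6789.")
print(redacted)
print(counts)
```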

Chunking

Chunking is the process of splitting documents into smaller, manageable pieces, called chunks, which can be processed independently.
- Count: Number of dataset items that underwent the chunking process.
- Status: Indicates success or errors in chunking.
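A simple way to chunk text is to slide a fixed-size window over it, letting consecutive chunks overlap so that a sentence cut at a boundary still appears whole in one of them. The minimal sketch below works on characters; the `chunk_text` function and its `chunk_size` and `overlap` parameters are hypothetical, and production pipelines often split on tokens, sentences, or document structure instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new window starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```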

Embeddings

Embeddings are vector representations of data, such as words, sentences, or images, that capture the semantic meaning and relationships within the data.
- Count: Number of dataset items that underwent the embedding generation process.
- Status: Indicates success or errors in generating embeddings.
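Because embeddings are vectors, semantic closeness is usually measured geometrically, most commonly with cosine similarity. The sketch below uses hand-picked three-dimensional toy vectors as stand-ins for real model output, which typically has hundreds or thousands of dimensions; the vector values and names are illustrative assumptions, not output from any embeddings model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Hand-picked toy "embeddings" (hypothetical values).
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
invoice = [0.1, 0.2, 0.95]

print(cosine_similarity(cat, kitten))
print(cosine_similarity(cat, invoice))
```

Here `cat` and `kitten` point in nearly the same direction, so their similarity is close to 1, while `invoice` scores much lower.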

Dataset Used in Batch Recipe

For datasets that are used in a batch recipe, the dataset dashboard shows an additional batch_chain chart.

Batch-chain

A batch-chain is a sequence of tasks that processes data in batches to improve efficiency and manageability. It groups data for processing and ensures that each step in the sequence completes successfully.
- Count: Number of dataset items processed in the batch-chain.
- Status: Indicates whether each task in the batch-chain succeeded or encountered errors.
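The following minimal sketch shows the general idea of a batch-chain: items are grouped into fixed-size batches, each batch passes through an ordered list of steps, and success and error counts are recorded per step. The `run_batch_chain` function, the three example steps, and the rule that an item failing one step skips the remaining steps are all assumptions for illustration, not Karini AI's batch recipe implementation.

```python
from typing import Callable

# A step takes one item and returns the transformed item, or raises on failure.
Step = Callable[[str], str]


def run_batch_chain(
    items: list[str], steps: list[tuple[str, Step]], batch_size: int = 2
) -> dict[str, dict[str, int]]:
    """Run each batch through every step in order, counting successes and errors per step.

    An item that fails a step is dropped and does not reach the later steps.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    status = {name: {"succeeded": 0, "failed": 0} for name, _ in steps}
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        for name, step in steps:
            survivors = []
            for item in batch:
                try:
                    survivors.append(step(item))
                    status[name]["succeeded"] += 1
                except Exception:  # broad on purpose: any step error counts as a failure
                    status[name]["failed"] += 1
            batch = survivors  # only items that passed this step continue
    return status


# Hypothetical example steps.
def normalize(text: str) -> str:
    return text.strip().lower()


def validate(text: str) -> str:
    if not text:
        raise ValueError("empty item")
    return text


def truncate(text: str) -> str:
    return text[:10]


items = ["  Hello World ", "   ", "Batch Chains"]
steps = [("normalize", normalize), ("validate", validate), ("truncate", truncate)]
result = run_batch_chain(items, steps, batch_size=2)
print(result)
```

In this run the whitespace-only item passes `normalize` but fails `validate`, so it is counted as an error for that step and never reaches `truncate`.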