Dataset Dashboard

On the Datasets page, you can select a pre-created dataset and view dashboards that give you insights into the data processing tasks.

Below is an example of a dataset dashboard and its associated processing tasks.

Optical Character Recognition (OCR)

  • OCR enables the extraction of text from images or scanned documents, making the data more accessible and searchable.

  • Count: Number of dataset items processed using OCR.

  • Processing Status: Indicates whether the OCR task was successful or if there were errors during processing.

Personally Identifiable Information (PII)

  • PII handling involves identifying and managing data that could identify a specific individual, such as names, social security numbers, and addresses.

  • Count: Number of dataset items scanned for PII.

  • Status: Indicates success or errors in identifying and handling PII.
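
To make the idea of identifying and handling PII concrete, here is a minimal Python sketch that redacts two kinds of PII with regular expressions. It is not the platform's actual method: the patterns, the redact_pii function, and the placeholder format are all assumptions made for illustration, and real PII detection covers many more data types.

```python
import re

# Hypothetical patterns for illustration only; they are not the platform's rules.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def redact_pii(text):
    """Replace each match with a [TYPE] placeholder and count matches per type."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        # subn returns the rewritten text and the number of substitutions made.
        text, n = pattern.subn(f"[{label.upper()}]", text)
        counts[label] = n
    return text, counts


redacted, found = redact_pii("Contact jane@example.com, SSN 123-45-6789.")
print(redacted)
print(found)
```

The per-type counts returned here are loosely analogous to the Count shown on the dashboard, although the dashboard counts dataset items rather than individual matches.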

Chunking

  • Chunking is the process of splitting documents into smaller, manageable pieces, called chunks, which can be processed independently.

  • Count: Number of dataset items that underwent the chunking process.

  • Status: Indicates success or errors in chunking.
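
As an illustration of splitting a document into chunks, the following Python sketch uses fixed-size character windows with a small overlap. The chunk_text function and its chunk_size and overlap parameters are hypothetical; the strategy the platform actually uses (for example, splitting on sentences or tokens) is not described here.

```python
def chunk_text(text, chunk_size=200, overlap=20):
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last chunk is never just leftover overlap.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)
```

The overlap repeats the tail of one chunk at the head of the next, so content that straddles a boundary still appears whole in at least one chunk.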

Embeddings

  • Embeddings are vector representations of data, such as words, sentences, or images, that capture the semantic meaning and relationships within the data.

  • Count: Number of dataset items that underwent the embeddings generation process.

  • Status: Indicates success or errors in generating embeddings.
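
To show how vector representations can capture semantic relationships, this Python sketch compares toy embeddings with cosine similarity. The three-dimensional vectors are invented for illustration; real embeddings come from a trained model and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up vectors: related terms are given similar directions.
toy_embeddings = {
    "invoice": [0.9, 0.1, 0.0],
    "receipt": [0.8, 0.2, 0.1],
    "giraffe": [0.0, 0.1, 0.9],
}

related = cosine_similarity(toy_embeddings["invoice"], toy_embeddings["receipt"])
unrelated = cosine_similarity(toy_embeddings["invoice"], toy_embeddings["giraffe"])
print(related > unrelated)
```

A score close to 1 means the two vectors point in nearly the same direction, which is how semantically similar items end up near each other.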

Dataset Used in Batch Recipe

For datasets that are used in a Batch recipe, the dataset dashboard shows an additional batch_chain chart.

Batch-chain

  • A batch-chain is a sequence of tasks processed in batches to improve efficiency and manageability. This includes grouping data for processing and ensuring that each step in the sequence completes successfully.

  • Count: Number of dataset items processed in the batch-chain.

  • Status: Indicates whether each task in the batch-chain was successful or if errors were encountered.
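
The batch-chain idea can be sketched in Python as a list of steps applied in order to each batch, with a success and error tally kept per step. The names batched and run_batch_chain, the batch size, and the example steps are assumptions made for illustration and do not reflect the platform's internals.

```python
from collections import Counter


def batched(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def run_batch_chain(items, steps, batch_size=2):
    """Run each step in order on every batch and tally success/error per step."""
    status = {name: Counter() for name, _ in steps}
    for batch in batched(items, batch_size):
        current = batch
        for name, step in steps:
            survivors = []
            for item in current:
                try:
                    survivors.append(step(item))
                    status[name]["success"] += 1
                except Exception:
                    # A failed item is counted and dropped from later steps.
                    status[name]["error"] += 1
            current = survivors
    return status


# Hypothetical two-step chain: trim whitespace, then parse an integer.
steps = [("strip", str.strip), ("to_int", int)]
status = run_batch_chain([" 1 ", "2", "x", "4 "], steps)
print(status)
```

In this sketch an item that fails one step is not passed to the next, so the counts for later steps can be lower than for earlier ones.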
