Karini AI Documentation

Data Storage Connectors


Last updated 6 days ago

Karini AI supports out-of-the-box integration with the following data connectors. This gives you the flexibility to access data from disparate data sources.

The access credentials for the data connectors must be set in the Organization.

Amazon S3

  • Amazon Simple Storage Service (S3) is a scalable object storage service provided by Amazon Web Services (AWS).

  • To set up access to your data source in S3, specify the path to your S3 bucket, or to a folder within the bucket, in the recipe's storage connector. You can also enable the recursive option to read data from the bucket path and all of its subfolders.

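The difference between a plain path and the recursive option can be illustrated with the following Python sketch. It is not Karini AI's implementation: the `s3://bucket/folder` path form and the helper names `parse_s3_path` and `select_keys` are assumptions for illustration, and the object keys are a made-up in-memory list rather than a real bucket listing. Against the real S3 API, the same split corresponds to calling ListObjectsV2 with `Prefix` set and `Delimiter="/"` (non-recursive) or with no delimiter (recursive).

```python
def parse_s3_path(path: str) -> tuple[str, str]:
    """Split 's3://bucket/folder' into (bucket, prefix); the prefix is treated as a folder."""
    scheme = "s3://"
    if not path.startswith(scheme):
        raise ValueError(f"expected an s3:// path, got {path!r}")
    bucket, _, prefix = path[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {path!r}")
    if prefix and not prefix.endswith("/"):
        prefix += "/"  # assumption: the path names a folder, not a single object
    return bucket, prefix


def select_keys(keys: list[str], prefix: str, recursive: bool) -> list[str]:
    """Return the object keys selected under `prefix` in this sketch.

    recursive=False: only objects directly inside the folder.
    recursive=True: objects in the folder and in all of its subfolders.
    """
    selected = []
    for key in keys:
        if not key.startswith(prefix) or key == prefix:
            continue  # outside the folder, or the folder placeholder itself
        remainder = key[len(prefix):]
        if recursive or "/" not in remainder:
            selected.append(key)
    return selected


if __name__ == "__main__":
    keys = ["docs/a.pdf", "docs/b.txt", "docs/2024/c.pdf", "other/d.pdf"]
    bucket, prefix = parse_s3_path("s3://my-bucket/docs")
    print(select_keys(keys, prefix, recursive=False))  # ['docs/a.pdf', 'docs/b.txt']
    print(select_keys(keys, prefix, recursive=True))   # ['docs/a.pdf', 'docs/b.txt', 'docs/2024/c.pdf']
```

With the recursive option off, `docs/2024/c.pdf` is skipped because it sits in a subfolder of `docs/`.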
Azure Cloud Storage

  • Azure Storage is a Microsoft-managed cloud service that provides scalable and secure storage solutions.

  • To set up access to your data source in Azure Cloud Storage, specify the Azure Cloud Storage container path in the recipe's storage connector.

Google Cloud Storage

  • Google Cloud Storage is a service provided by Google Cloud Platform that offers highly durable and available object storage.

  • To access your data source in Google Cloud Storage, specify the full Google Cloud Storage bucket path in the recipe's storage connector.

Confluence

  • Confluence is a collaboration and content management tool used by teams to create, share, and manage their work in one place. It is often used for documentation, project planning, and team collaboration.

  • In Confluence, a space is a designated area where users organize and manage related content, such as pages, documents, and discussions. To access your data from Confluence, specify the Confluence space name in the recipe's storage connector.

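Dropbox

  • Dropbox is a file hosting service that provides cloud storage, file synchronization, personal cloud, and client software. It allows users to create a special folder on their computers, which Dropbox then synchronizes so that it appears to be the same folder (with the same contents) regardless of which device is used to view it. Dropbox is often used for file sharing and collaboration.

  • To access your data from Dropbox, specify the Dropbox folder name in the recipe's storage connector.

Box

  • Box is a cloud-based file storage and collaboration service that allows users to store, access, and share files from anywhere.

  • To access your data from Box, specify the Box credentials JSON as global credentials.

Google Drive

  • Google Drive is a file storage and synchronization service developed by Google. It allows users to store files in the cloud, synchronize files across devices, and share files. Google Drive includes Google Docs, Sheets, and Slides, which enable collaborative editing of documents, spreadsheets, and presentations.

  • To access your data from Google Drive, specify the Google Drive folder ID in the recipe's storage connector. The folder ID is the unique identifier Google Drive assigns to the folder that contains the files or folders you want to access.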

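If you are starting from a browser link to the folder, the folder ID is the segment that follows `/folders/` in the URL. The Python sketch below extracts it. The helper name `drive_folder_id` is hypothetical, and the URL shapes it handles (with or without `/u/0/`, with or without a `?usp=...` query string) are the common browser forms, not an exhaustive list.

```python
import re

# Matches the ID segment in URLs such as
# https://drive.google.com/drive/folders/<FOLDER_ID>?usp=sharing
FOLDER_URL_ID = re.compile(r"/folders/([A-Za-z0-9_-]+)")
# A bare ID contains only letters, digits, underscores, and hyphens.
BARE_ID = re.compile(r"[A-Za-z0-9_-]+")


def drive_folder_id(url_or_id: str) -> str:
    """Return the folder ID from a Google Drive folder URL, or the input if it is already a bare ID."""
    match = FOLDER_URL_ID.search(url_or_id)
    if match:
        return match.group(1)
    if BARE_ID.fullmatch(url_or_id):
        return url_or_id
    raise ValueError(f"no folder ID found in {url_or_id!r}")


if __name__ == "__main__":
    print(drive_folder_id("https://drive.google.com/drive/u/0/folders/1AbC_dEf-123?usp=sharing"))
    # 1AbC_dEf-123
```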
Website

A Website connector typically allows you to extract and manage data directly from websites. This can include scraping data, integrating with APIs provided by websites, or embedding website content into other applications.

Karini AI's website connector enables you to crawl your website data source using the following options.

Source Type

  1. URLs: Add up to 10 seed/starting point URLs of the websites you want to crawl. You can also include website subdomains.

  2. Sitemap: Add up to 3 sitemap URLs of the websites you want to crawl. Sitemaps help in systematically crawling and extracting data from all pages listed in the sitemap file.

  3. Source URL Files: Add up to 100 seed/starting point URLs listed in a text file stored in Amazon S3 or available at an HTTP or HTTPS link. Each URL should be on a separate line in the text file. You can also upload the file from a local device.

  4. Source Sitemap Files: Add up to 3 sitemap XML files stored in Amazon S3 or on a local device. Upload a file containing multiple sitemap URLs to crawl and extract data from.

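The per-source limits above can be checked before a crawl is configured. The following Python sketch is a hypothetical pre-flight check, not part of Karini AI: the dictionary keys and the helper names `parse_url_file` and `validate_source` are invented for illustration, while the numeric limits are the ones listed in this section. It also parses a source URL file using the one-URL-per-line rule.

```python
from urllib.parse import urlparse

# Per-source limits from this section; the key names are hypothetical labels.
SOURCE_LIMITS = {
    "urls": 10,                  # seed/starting point URLs
    "sitemaps": 3,               # sitemap URLs
    "source_url_file": 100,      # URLs listed in one source URL file
    "source_sitemap_files": 3,   # sitemap XML files
}
# Source types whose entries are web URLs (sitemap files are file locations instead).
URL_SOURCE_TYPES = {"urls", "sitemaps", "source_url_file"}


def parse_url_file(text: str) -> list[str]:
    """Read a source URL file: one URL per line; surrounding whitespace and blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def validate_source(source_type: str, entries: list[str]) -> list[str]:
    """Raise ValueError if `entries` exceeds the limit or contains a non-HTTP(S) URL; otherwise return them."""
    limit = SOURCE_LIMITS[source_type]
    if len(entries) > limit:
        raise ValueError(f"{source_type}: {len(entries)} entries, limit is {limit}")
    if source_type in URL_SOURCE_TYPES:
        for url in entries:
            if urlparse(url).scheme not in ("http", "https"):
                raise ValueError(f"not an http(s) URL: {url!r}")
    return entries


if __name__ == "__main__":
    urls = parse_url_file("https://a.example/\n\nhttps://b.example/docs\n")
    print(validate_source("source_url_file", urls))  # ['https://a.example/', 'https://b.example/docs']
```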
Configuration Settings

  • Crawl Depth: The number of levels, counting from the seed level, to crawl. For example, the seed URL page is depth 1, and any hyperlinks on that page that are also crawled are depth 2.

  • Maximum File Size (MB): The maximum size in MB of a webpage or attachment to crawl.

  • Maximum Number of URLs Crawled per Minute per Host: Limits the rate at which the connector accesses URLs on the same host.

  • Include files in web page links: Choose whether to crawl files that the web pages link to.

  • Include URL Patterns: Add regular expression patterns to include specific URLs in the crawl, and to index any hyperlinks on those pages.

  • Exclude URL Patterns: Add regular expression patterns to exclude specific URLs from the crawl; hyperlinks on those pages are not indexed either.

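To make the interaction between crawl depth and URL patterns concrete, here is a Python sketch of a breadth-first crawl over an in-memory link graph. It assumes semantics the settings above do not spell out: exclude patterns take precedence over include patterns, patterns are applied to seed URLs as well, and links on a filtered-out page are never followed. The `links` dictionary stands in for fetching and parsing real pages, and `crawl` and `is_allowed` are hypothetical names.

```python
import re
from collections import deque


def is_allowed(url: str, include_patterns, exclude_patterns) -> bool:
    """Exclude patterns win; if include patterns are given, at least one must match."""
    if any(re.search(p, url) for p in exclude_patterns):
        return False
    if include_patterns and not any(re.search(p, url) for p in include_patterns):
        return False
    return True


def crawl(seeds, links, max_depth, include_patterns=(), exclude_patterns=()):
    """Breadth-first crawl; seed pages are depth 1, pages they link to are depth 2, and so on.

    `links` maps a page URL to the URLs found on that page.
    Returns a dict of crawled URL -> depth at which it was first reached.
    """
    crawled = {}
    queue = deque((url, 1) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if url in crawled or depth > max_depth:
            continue
        if not is_allowed(url, include_patterns, exclude_patterns):
            continue  # filtered out: the page is not crawled and its links are not followed
        crawled[url] = depth
        for link in links.get(url, []):
            if link not in crawled:
                queue.append((link, depth + 1))
    return crawled


if __name__ == "__main__":
    links = {
        "https://example.com/": ["https://example.com/a", "https://example.com/admin"],
        "https://example.com/a": ["https://example.com/b"],
        "https://example.com/b": ["https://example.com/c"],
    }
    print(crawl(["https://example.com/"], links, max_depth=2, exclude_patterns=[r"/admin"]))
    # {'https://example.com/': 1, 'https://example.com/a': 2}
```

Because the queue is processed first in, first out, each page is recorded at the smallest depth at which it is reachable.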
Manifest

You can provide an S3 manifest file as a data source in the recipe's storage connector. The manifest file is expected to be in CSV format, with each line containing a source URL.

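A manifest in this form can be read with Python's standard `csv` module. This sketch assumes a layout the section does not fully specify: no header row, the source URL in the first column, and blank lines ignored. The helper name `read_manifest` is hypothetical, and the check only confirms that each entry looks like a URL (for example `s3://...` or `https://...`), not that it is reachable.

```python
import csv
import io
from urllib.parse import urlparse


def read_manifest(text: str) -> list[str]:
    """Return the source URLs in a manifest CSV (assumed: no header, URL in the first column)."""
    urls = []
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        url = row[0].strip()
        parsed = urlparse(url)
        if not parsed.scheme or not parsed.netloc:
            raise ValueError(f"not a URL: {url!r}")
        urls.append(url)
    return urls


if __name__ == "__main__":
    manifest = "s3://my-bucket/docs/a.pdf\nhttps://example.com/page.html\n\n"
    print(read_manifest(manifest))  # ['s3://my-bucket/docs/a.pdf', 'https://example.com/page.html']
```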
SharePoint

SharePoint is a web-based collaboration and document management platform developed by Microsoft. It enables organizations to store, manage, and share documents and other content in a secure, centralized location.

To configure access to your data stored in SharePoint, specify the folder path in the recipe's storage connector. This enables the recipe to retrieve data from your SharePoint repository for use in your workflows.

For steps to obtain the credentials for SharePoint, refer to the Sharepoint Credential Setup section.