Large Language Models (LLMs)
Last updated
Karini AI supports integrations with the following Large Language Model providers and custom models. Using these models, users can create model endpoints in the Karini AI model hub.
Amazon Bedrock
In Amazon Bedrock, the "Model Serving" configuration provides two options for how your models are deployed and managed: On Demand and Provisioned Throughput.
With the On Demand option, Amazon Bedrock automatically adjusts the computational resources to match the volume of requests your model receives. This means the system scales up or down in real-time based on demand, offering flexibility without the need for manual resource management. This option is suitable for workloads with varying traffic, ensuring you only pay for the resources used during active requests.
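As a rough illustration of what an On Demand request looks like at the API level (not specific to Karini AI), the sketch below builds a Bedrock-style invocation addressed purely by a public model ID, using the Anthropic Messages request body format. The model ID and prompt are illustrative placeholders.

```python
import json

# Illustrative only: an On Demand invocation targets a public model ID
# (no pre-provisioned capacity), e.g. via bedrock-runtime's InvokeModel.
model_id = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # example ID

# Request body in the Anthropic Messages format used by Bedrock.
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "temperature": 0.7,
    "messages": [{"role": "user", "content": "Summarize our return policy."}],
}

# The pair (modelId, body) is all that identifies an On Demand request;
# capacity is allocated by the service per request.
request = {"modelId": model_id, "body": json.dumps(body)}
print(request["modelId"])
```

Because there is no pre-allocated capacity to reference, On Demand billing follows the tokens consumed by each such request.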
The table below lists all available models.
The Provisioned Throughput option allows you to specify a fixed amount of computational capacity for your model, ensuring consistent performance and response times. This is ideal for use cases where you require a stable level of throughput, regardless of fluctuating demand. Resources are pre-allocated, which guarantees predictable performance but comes with a fixed cost, regardless of actual usage.
When Provisioned Throughput is selected, the following model providers are available:
For more details on model providers, please refer to the relevant documentation.
Model ARN: Enter the ARN (Amazon Resource Name) of the selected model in this field. The ARN serves as a unique identifier for the model within the AWS ecosystem, ensuring proper configuration and secure linkage to your account.
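Since a mistyped ARN is a common configuration error, a small sanity check on its shape can help before saving it. The ARN below is a hypothetical example, not a real resource; the exact pattern is an assumption based on the general AWS ARN format.

```python
import re

# Hypothetical ARN for illustration; a real provisioned-model ARN comes
# from your AWS account after you purchase Provisioned Throughput.
provisioned_arn = (
    "arn:aws:bedrock:us-east-1:123456789012:provisioned-model/abc123example"
)

# Assumed shape: arn:aws:bedrock:<region>:<12-digit account>:provisioned-model/<id>
ARN_PATTERN = re.compile(
    r"^arn:aws:bedrock:[a-z0-9-]+:\d{12}:provisioned-model/[A-Za-z0-9]+$"
)

def looks_like_provisioned_model_arn(arn: str) -> bool:
    """Return True if the string matches the assumed provisioned-model ARN shape."""
    return bool(ARN_PATTERN.match(arn))

print(looks_like_provisioned_model_arn(provisioned_arn))
```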
OpenAI
Azure OpenAI
Databricks
Anyscale
Amazon SageMaker
To add a new model endpoint to the model hub, do the following:
On the Model Endpoints menu, select the Large Language Model Endpoints (LLM) tab and click Add New.
Select a model provider and an associated model ID from the list.
Users can optionally override default configurations such as temperature, max tokens, and pricing.
By default, organization-level credentials are used to access the model. Users can optionally override them with a new set of model credentials.
Users can test the model endpoint request and response by using the Test endpoint button.
Users can review the created model endpoints under the Large Language Model Endpoints (LLM) tab. It includes the following information:
Model provider and model id
Max tokens, Min tokens and Temperature: The default values are displayed based on model specifications from the model provider. User has the ability to override them.
Link to view the recipes and prompts in which the model endpoint is used.
Link to view the model information including the cost and usage dashboard for the model endpoint.
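The fields listed above can be pictured as a simple record per endpoint. The field names and values below are illustrative assumptions, not Karini AI's actual schema.

```python
from dataclasses import dataclass

@dataclass
class LLMEndpoint:
    """Illustrative shape of a model-endpoint entry (field names assumed)."""
    provider: str
    model_id: str
    max_tokens: int             # provider default, overridable
    min_tokens: int             # provider default, overridable
    temperature: float          # provider default, overridable
    input_price_per_1k: float   # USD per 1,000 input tokens (overridable)
    output_price_per_1k: float  # USD per 1,000 output tokens (overridable)

# Example entry with placeholder values.
endpoint = LLMEndpoint(
    provider="Amazon Bedrock",
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    max_tokens=4096,
    min_tokens=1,
    temperature=0.7,
    input_price_per_1k=0.00025,
    output_price_per_1k=0.00125,
)
print(endpoint.provider, endpoint.model_id)
```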
The following table describes LLMs that are available for integration with Karini AI model hub. It also includes links to model provider reference documentation offering detailed information on model specifications, usage instructions, and API endpoints for effective integration and utilization.
Amazon Bedrock
Anthropic Claude 3.7 Sonnet v1
Anthropic Claude 3.5 Sonnet v2
Anthropic Claude 3.5 Haiku 20241022 v1
DeepSeek R1 v1
Amazon Nova Pro v1
Amazon Nova Lite v1
Amazon Nova Micro v1
Anthropic Claude 3.5 Sonnet 20240620 v1
Anthropic Claude 3 Opus 20240229 v1
Anthropic Claude 3 Sonnet 20240229 v1
Anthropic Claude 3 Haiku 20240307 v1
Anthropic Claude v2.1
Anthropic Claude v2
Anthropic Claude Instant v1
Llama 3.3 70B Instruct
Llama 3.2 1B Instruct
Llama 3.2 3B Instruct
Llama 3.2 11B Vision Instruct
Llama 3.2 90B Vision Instruct
Meta Llama 3.1 8B Instruct
Meta Llama 3.1 70B Instruct
Meta Llama 3 8B Instruct
Meta Llama 3 70B Instruct
Mistral 7B Instruct
Mistral Mixtral 8x7B Instruct
Mistral Large (24.02)
Mistral Small (24.02)
Cohere Command R Plus
Cohere Command R
Amazon Titan Text Premier
Amazon Titan Text Express
Amazon Titan Text Lite
AI21 Jamba 1.5 Mini
AI21 Jamba 1.5 Large
AI21 Jamba Instruct
Temperature
Max Tokens
Azure OpenAI
GPT 4O 2024-11-20
GPT 4O Mini
O3 Mini
O1
GPT 4O 2024-08-06
GPT 4O
GPT 3.5 Turbo (Legacy)
GPT-4 (Legacy)
Temperature
Max Tokens
Azure OpenAI API Base
Azure OpenAI Deployment Name
OpenAI
GPT 4O 2024-11-20
GPT 4O Mini
O3 Mini
O1
GPT 4O 2024-08-06
GPT 4O
Whisper-1
TTS-1
TTS-1-HD
GPT-4-Turbo
GPT-3.5-Turbo (Legacy)
Temperature
Max Tokens
Google Gemini
Gemini 2.0 Flash
Gemini 2.0 Flash-Lite Preview
Gemini 1.5 Pro
Gemini 1.5 Flash
Temperature
Max Tokens
Vertex Gemini
Gemini 2.0 Flash
Gemini 2.0 Flash-Lite Preview
Gemini 2.0 Flash Thinking
Gemini 1.5 Pro
Gemini 1.5 Flash
Temperature
Max Tokens
Fireworks
Llama 4 Maverick Instruct (Basic)
Llama 4 Scout Instruct (Basic)
DeepSeek R1
DeepSeek V3-0324
Llama V3P1 405B Instruct
Llama V3P3 70B Instruct
Temperature
Max Tokens
Anyscale
Google Gemma 7B
Meta Llama 3 8B
Meta Llama 3 70B
Mistral 7B Instruct
Mixtral 8x7B Instruct
Mixtral 8x22B Instruct
Temperature
Max Tokens
Databricks
Foundation Models
Databricks DBRX Instruct
Meta Llama 3 70B Instruct
Mixtral 8x7B Instruct
Llama 2 70B Chat (Legacy)
Databricks External Models
Databricks Custom Models
Temperature
Max Tokens
Endpoint URL: Databricks model Endpoint URL
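Databricks model-serving endpoints are invoked at a workspace-scoped URL, so the value entered here typically follows the `serving-endpoints/<name>/invocations` shape. The workspace host and endpoint name below are placeholders for illustration.

```python
# Illustrative workspace host and endpoint name; substitute your own values.
workspace_host = "https://adb-1234567890123456.7.azuredatabricks.net"
endpoint_name = "my-llm-endpoint"

# Databricks model-serving endpoints are invoked at a URL of this shape.
endpoint_url = f"{workspace_host}/serving-endpoints/{endpoint_name}/invocations"
print(endpoint_url)
```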
Amazon SageMaker
Temperature
Max Tokens
Model Endpoint Name: SageMaker model endpoint name
Cohere
Rerank-english-v3.0
Rerank-multilingual-v3.0
Rerank-english-v2.0
Rerank-multilingual-v2.0
Model Price: The default price reflects the public pricing of model inference per 1,000 input and output tokens. Karini AI uses this price to calculate cost. Users can override it if needed, for example under a special pricing agreement with the model provider.
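The per-1,000-token pricing above implies a simple cost formula: tokens divided by 1,000, multiplied by the per-1k rate, summed over input and output. The rates below are illustrative placeholders, not any provider's actual pricing.

```python
# Illustrative per-1,000-token rates (placeholders, not real pricing).
input_price_per_1k = 0.003   # USD per 1,000 input tokens
output_price_per_1k = 0.015  # USD per 1,000 output tokens

def inference_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost = (tokens / 1000) * per-1k rate, summed over input and output."""
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# Example: 2,000 input tokens and 500 output tokens.
print(round(inference_cost(2000, 500), 6))
```

For the example above, the cost works out to 2 × 0.003 + 0.5 × 0.015 = 0.0135 USD.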