Modalities
Models supported
EXXA Batch API currently supports the following models:
- llama-3.1-8b-instruct-fp16
- llama-3.1-70b-instruct-fp16
- llama-3.1-nemotron-70b-instruct-fp16
llama-3.1-nemotron-70b-instruct-fp16 is a fine-tune by NVIDIA of the original llama-3.1-70b-instruct-fp16 model. It has the same pricing as the original model.
All models support a context length of up to 128k tokens and a maximum completion length of 16k tokens.
We are actively working to expand our model offerings to better serve your needs. If you have specific model requirements, please reach out to us on Discord or via email at founders@withexxa.com.
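For quick client-side checks, the supported models and their shared limits can be captured in a small lookup. The sketch below is an illustrative Python snippet based only on the model names and the 128k-token context / 16k-token completion limits stated above; the constant and helper names are our own, not part of any EXXA SDK, and whether completion tokens count against the context window is not specified here.

```python
# Supported models and their documented limits, taken from the list above.
# These constants and the helper are an illustrative sketch, not part of any EXXA SDK.
SUPPORTED_MODELS = {
    "llama-3.1-8b-instruct-fp16",
    "llama-3.1-70b-instruct-fp16",
    "llama-3.1-nemotron-70b-instruct-fp16",
}
MAX_CONTEXT_TOKENS = 128_000      # context length of up to 128k tokens
MAX_COMPLETION_TOKENS = 16_000    # maximum completion length of 16k tokens


def validate_request(model: str, prompt_tokens: int, max_completion_tokens: int) -> None:
    """Raise if a request would exceed the documented model limits."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported model: {model}")
    if prompt_tokens > MAX_CONTEXT_TOKENS:
        raise ValueError("Prompt exceeds the 128k-token context window")
    if max_completion_tokens > MAX_COMPLETION_TOKENS:
        raise ValueError("Requested completion exceeds the 16k-token maximum")
```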
Pricing
Pricing depends on the model used and the number of tokens. Token rates are listed in the table below.
| Base model | Input tokens | Output tokens | Cached input tokens (write) | Cached input tokens (read) |
|---|---|---|---|---|
| llama-3.1-8b-instruct-fp16 | $0.10 / M tokens | $0.15 / M tokens | $0.10 / M tokens | $0.02 / M tokens |
| llama-3.1-70b-instruct-fp16 | $0.30 / M tokens | $0.50 / M tokens | $0.30 / M tokens | $0.06 / M tokens |
| llama-3.1-nemotron-70b-instruct-fp16 | $0.30 / M tokens | $0.50 / M tokens | $0.30 / M tokens | $0.06 / M tokens |
A minimum balance of $0.20 in credits is required to process a request.
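As a rough illustration of how these rates apply, the sketch below estimates the cost of a single request from its token counts. The per-million-token rates come from the table above; the function itself is an illustrative helper, not part of the EXXA API, it treats regular, cache-write, and cache-read input tokens as separate buckets (an assumption), and actual billing may differ.

```python
# Per-million-token rates (USD) from the pricing table above.
# Illustrative estimator only, not an official billing calculation.
RATES = {
    "llama-3.1-8b-instruct-fp16": {
        "input": 0.10, "output": 0.15, "cache_write": 0.10, "cache_read": 0.02,
    },
    "llama-3.1-70b-instruct-fp16": {
        "input": 0.30, "output": 0.50, "cache_write": 0.30, "cache_read": 0.06,
    },
    "llama-3.1-nemotron-70b-instruct-fp16": {
        "input": 0.30, "output": 0.50, "cache_write": 0.30, "cache_read": 0.06,
    },
}


def estimate_cost(model, input_tokens, output_tokens,
                  cache_write_tokens=0, cache_read_tokens=0):
    """Estimate the cost in USD of one request from its token counts."""
    r = RATES[model]
    return (
        input_tokens * r["input"]
        + output_tokens * r["output"]
        + cache_write_tokens * r["cache_write"]
        + cache_read_tokens * r["cache_read"]
    ) / 1_000_000


# Example: 10k input tokens and 1k output tokens on the 70B model ≈ $0.0035.
print(estimate_cost("llama-3.1-70b-instruct-fp16", 10_000, 1_000))
```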
Rate limits
EXXA Batch API is designed with flexibility in mind, imposing no hard rate limits. You are free to send any number of requests and create batches as large as needed without restrictions.
For batches of over 2 billion requests, we cannot guarantee processing within 24 hours, but we will do our best to process them as quickly as possible and keep you informed of progress.
If you need to process massive amounts of data, feel free to contact us via email at founders@withexxa.com. We can offer custom processing and pricing depending on your needs and requirements.
Completion time
EXXA ensures that all requests are processed and outputs delivered within 24 hours of their submission. We aim to process requests faster when possible.
Batching completion time details
- Individual Request Processing: Each request, whether part of a batch or not, is processed within 24 hours of submission.
- Incremental Batch Processing: It's common for batches to be processed incrementally, with completion notifications issued only when the full batch is processed.
- Full Batch Processing: We ensure that the entire batch will be processed within 24 hours after the submission of the last included request.
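Because outputs are delivered within 24 hours rather than returned synchronously, clients typically poll for completion. The sketch below is a hypothetical polling loop: `fetch_batch_status` is a placeholder for however your client checks batch status (it is not a documented EXXA endpoint), the "completed" status string is an assumption, and the 24-hour deadline simply mirrors the guarantee described above.

```python
import time

# Hypothetical polling loop built around the 24-hour completion guarantee above.
# `fetch_batch_status` is a placeholder callable, not a documented EXXA API call.

POLL_INTERVAL_SECONDS = 5 * 60      # check every 5 minutes
DEADLINE_SECONDS = 24 * 60 * 60     # the documented 24-hour completion window


def wait_for_batch(fetch_batch_status, batch_id: str) -> str:
    """Poll until the batch reports completion or the 24-hour window elapses."""
    start = time.monotonic()
    while time.monotonic() - start < DEADLINE_SECONDS:
        # Assumed to return a status string such as "in_progress" or "completed".
        status = fetch_batch_status(batch_id)
        if status == "completed":
            return status
        time.sleep(POLL_INTERVAL_SECONDS)
    raise TimeoutError(f"Batch {batch_id} not completed within 24 hours")
```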