Skip to main content

Modalities

Models supported

EXXA Batch API currently supports the following model:

ModelDescriptionContext LengthMax Completion Length
llama-3.1-8b-instruct-fp16Meta 8B parameter model128K tokens16,384 tokens
llama-3.1-70b-instruct-fp16
llama-3.3-70b-instruct-fp16
Meta 70B parameter model128K tokens16,384 tokens
llama-3.1-nemotron-70b-instruct-fp16NVIDIA fine-tuned version of the 70B model128K tokens16,384 tokens
beta:qwen-2-vl-72b-instruct-fp16Multimodal model (beta) with vision capabilities32K tokens16,384 tokens

The maximum completion length is configured to 4k tokens by default, use the max_tokens parameter to increase it.

We are actively working to expand our model offerings to better serve your needs. If you have specific models requirements, please reach out to us on Discord or via email at founders@withexxa.com

Pricing

Pricing depends on the model use and number of tokens. Tokens rates are available in the table below.

Base modelInput tokensOutput tokensCached input tokens
llama-3.1-8b-instruct-fp16$0.10 / M tokens$0.15 / M tokens$0.10 / M write tokens
$0.02 / M read tokens
llama-3.1-70b-instruct-fp16
llama-3.3-70b-instruct-fp16
$0.30 / M tokens$0.50 / M tokens$0.30 / M write tokens
$0.06 / M read tokens
llama-3.1-nemotron-70b-instruct-fp16$0.30 / M tokens$0.50 / M tokens$0.30 / M write tokens
$0.06 / M read tokens
beta:qwen-2-vl-72b-instruct-fp16$0.30 / M tokens$0.50 / M tokens$0.30 / M write tokens
$0.06 / M read tokens

A minimum balance of $0.20 in credits is required to process a request.

Rate limits

EXXA Batch API is designed with flexibility in mind, imposing no hard rate limits. You are free to send any number of requests and create batches as large as needed without restrictions.

Over 2 billion requests, we cannot guarantee we will process under 24 hours, but we will make our best to process them as quickly as possible and provide you with visibility.

If you need to process massive amount of data, feel free to contact us via email at founders@withexxa.com We can do custom processing and pricing depending on your needs and requirements.

Completion time

EXXA ensures that all requests are processed and outputs delivered within 24 hours of their submission. We aim to process requests faster when possible.

Batching completion time details

  • Individual Request Processing: Each request in a batch or not is processed within 24 hours of submission.
  • Incremental Batch Processing: It's common for batches to be processed incrementally, with completion notifications issued only when the full batch is processed.
  • Full Batch Processing: We ensure that the entire batch will be processed within 24 hours after the submission of the last included request.