Modalities

Models supported

EXXA Batch API currently supports the following model:

  • llama-3.1-70b-instruct-fp16

We are actively working to expand our model offerings to better serve your needs. If you have specific model requirements, please reach out to us on Discord or via email at founders@withexxa.com.

Pricing

Pricing depends on the model used and the number of tokens processed. Token rates are listed in the table below.

Base model                     Input tokens        Output tokens
llama-3.1-70b-instruct-fp16    $0.30 / M tokens    $0.50 / M tokens
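
As a rough illustration of how the per-token rates add up, the sketch below estimates the cost of a job from its token counts. The function name `estimate_cost_usd` and the `RATES_USD_PER_MILLION` dictionary are hypothetical helpers written for this example, not part of the EXXA API. The rates are copied from the table above, and the calculation assumes billing is strictly proportional to token counts.

```python
from decimal import Decimal

# Rates in USD per million tokens, copied from the pricing table.
# This dictionary is an illustrative helper, not an EXXA API object.
RATES_USD_PER_MILLION = {
    "llama-3.1-70b-instruct-fp16": {
        "input": Decimal("0.30"),
        "output": Decimal("0.50"),
    },
}


def estimate_cost_usd(model, input_tokens, output_tokens):
    """Estimate the cost in USD, assuming purely per-token billing."""
    rates = RATES_USD_PER_MILLION[model]
    total = rates["input"] * input_tokens + rates["output"] * output_tokens
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    # 2 million input tokens and 500,000 output tokens.
    cost = estimate_cost_usd("llama-3.1-70b-instruct-fp16", 2_000_000, 500_000)
    print(f"Estimated cost: ${cost}")
```

With these rates, 2 million input tokens and 500,000 output tokens come to $0.60 + $0.25 = $0.85. Decimal is used instead of float so that small amounts do not pick up rounding noise.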

Rate limits

EXXA Batch API is designed with flexibility in mind, imposing no hard rate limits. You are free to send any number of requests and create batches as large as needed without restrictions.

Beyond 2 billion requests, we cannot guarantee processing within 24 hours, but we will do our best to process them as quickly as possible and give you visibility into progress.

If you need to process massive amounts of data, feel free to contact us via email at founders@withexxa.com. We can offer custom processing and pricing depending on your needs and requirements.

Completion time

EXXA ensures that all requests are processed and outputs delivered within 24 hours of their submission. We aim to process requests faster when possible.

Batching completion time details

  • Individual Request Processing: Each request, whether or not it is part of a batch, is processed within 24 hours of its submission.
  • Incremental Batch Processing: It's common for batches to be processed incrementally, with completion notifications issued only when the full batch is processed.
  • Full Batch Processing: We ensure that the entire batch will be processed within 24 hours after the submission of the last included request.
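
The timing rules above can be sketched as a small calculation: a single request is due 24 hours after its own submission, while a full batch is due 24 hours after its last included request was submitted. The helper names `request_deadline` and `batch_deadline` are hypothetical and written only for this illustration. They are not EXXA API calls, and the sketch assumes you track submission timestamps yourself.

```python
from datetime import datetime, timedelta, timezone

# Processing window stated in this section.
PROCESSING_WINDOW = timedelta(hours=24)


def request_deadline(submitted_at):
    """Latest expected completion time for a single request."""
    return submitted_at + PROCESSING_WINDOW


def batch_deadline(request_submission_times):
    """Latest expected completion time for a full batch.

    The window is counted from the most recently submitted request.
    """
    return max(request_submission_times) + PROCESSING_WINDOW


if __name__ == "__main__":
    submissions = [
        datetime(2024, 11, 1, 9, 0, tzinfo=timezone.utc),
        datetime(2024, 11, 1, 15, 30, tzinfo=timezone.utc),
    ]
    print("First request due by:", request_deadline(submissions[0]))
    print("Full batch due by:   ", batch_deadline(submissions))
```

Since completion notifications are issued only once the full batch is processed, the batch deadline is the one to plan around when polling or waiting for results.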