Modalities
Models supported
EXXA Batch API currently supports the following models:
- llama-3.1-8b-instruct-fp16
- llama-3.1-70b-instruct-fp16
- llama-3.1-nemotron-70b-instruct-fp16
llama-3.1-nemotron-70b-instruct-fp16 is a fine-tune by NVIDIA of the original llama-3.1-70b-instruct-fp16 model. It has the same pricing as the original model.
All models support a context length of up to 128k tokens and a maximum completion length of 16k tokens.
We are actively working to expand our model offerings to better serve your needs. If you have specific model requirements, please reach out to us on Discord or via email at founders@withexxa.com.
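For quick client-side checks, the supported models and their shared limits can be captured in a small lookup. The sketch below is an illustrative Python snippet based only on the model names and the 128k-token context / 16k-token completion limits stated above; the constant and helper names are our own, not part of any EXXA SDK, and whether completion tokens count against the context window is not specified here.

```python
# Supported models and their documented limits, taken from the list above.
# These constants and the helper are an illustrative sketch, not part of any EXXA SDK.
SUPPORTED_MODELS = {
    "llama-3.1-8b-instruct-fp16",
    "llama-3.1-70b-instruct-fp16",
    "llama-3.1-nemotron-70b-instruct-fp16",
}
MAX_CONTEXT_TOKENS = 128_000      # context length of up to 128k tokens
MAX_COMPLETION_TOKENS = 16_000    # maximum completion length of 16k tokens


def validate_request(model: str, prompt_tokens: int, max_completion_tokens: int) -> None:
    """Raise if a request would exceed the documented model limits."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported model: {model}")
    if prompt_tokens > MAX_CONTEXT_TOKENS:
        raise ValueError("Prompt exceeds the 128k-token context window")
    if max_completion_tokens > MAX_COMPLETION_TOKENS:
        raise ValueError("Requested completion exceeds the 16k-token maximum")
```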
Pricing
Pricing depends on the model used and the number of tokens. Token rates are listed in the table below.
| Base model | Input tokens | Output tokens | Cached input tokens (write) | Cached input tokens (read) |
|---|---|---|---|---|
| llama-3.1-8b-instruct-fp16 | $0.10 / M tokens | $0.15 / M tokens | $0.10 / M tokens | $0.02 / M tokens |
| llama-3.1-70b-instruct-fp16 | $0.30 / M tokens | $0.50 / M tokens | $0.30 / M tokens | $0.06 / M tokens |
| llama-3.1-nemotron-70b-instruct-fp16 | $0.30 / M tokens | $0.50 / M tokens | $0.30 / M tokens | $0.06 / M tokens |
A minimum balance of $0.20 in credits is required to process a request.
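As a rough illustration of how these rates apply, the sketch below estimates the cost of a single request from its token counts. The per-million-token rates come from the table above; the function itself is an illustrative helper, not part of the EXXA API, it treats regular, cache-write, and cache-read input tokens as separate buckets (an assumption), and actual billing may differ.

```python
# Per-million-token rates (USD) from the pricing table above.
# Illustrative estimator only, not an official billing calculation.
RATES = {
    "llama-3.1-8b-instruct-fp16": {
        "input": 0.10, "output": 0.15, "cache_write": 0.10, "cache_read": 0.02,
    },
    "llama-3.1-70b-instruct-fp16": {
        "input": 0.30, "output": 0.50, "cache_write": 0.30, "cache_read": 0.06,
    },
    "llama-3.1-nemotron-70b-instruct-fp16": {
        "input": 0.30, "output": 0.50, "cache_write": 0.30, "cache_read": 0.06,
    },
}


def estimate_cost(model, input_tokens, output_tokens,
                  cache_write_tokens=0, cache_read_tokens=0):
    """Estimate the cost in USD of one request from its token counts."""
    r = RATES[model]
    return (
        input_tokens * r["input"]
        + output_tokens * r["output"]
        + cache_write_tokens * r["cache_write"]
        + cache_read_tokens * r["cache_read"]
    ) / 1_000_000


# Example: 10k input tokens and 1k output tokens on the 70B model ≈ $0.0035.
print(estimate_cost("llama-3.1-70b-instruct-fp16", 10_000, 1_000))
```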
Rate limits
EXXA Batch API is designed with flexibility in mind, imposing no hard rate limits. You are free to send any number of requests and create batches as large as needed without restrictions.
For batches of over 2 billion requests, we cannot guarantee processing within 24 hours, but we will do our best to process them as quickly as possible and keep you informed of progress.
If you need to process massive amounts of data, feel free to contact us via email at founders@withexxa.com. We can offer custom processing and pricing depending on your needs and requirements.
Completion time
EXXA ensures that all requests are processed and outputs delivered within 24 hours of their submission. We aim to process requests faster when possible.
Batching completion time details
- Individual Request Processing: Each request, whether part of a batch or not, is processed within 24 hours of submission.
- Incremental Batch Processing: It's common for batches to be processed incrementally, with completion notifications issued only when the full batch is processed.
- Full Batch Processing: We ensure that the entire batch will be processed within 24 hours after the submission of the last included request.
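Because outputs are delivered within 24 hours rather than returned synchronously, clients typically poll for completion. The sketch below is a hypothetical polling loop: `fetch_batch_status` is a placeholder for however your client checks batch status (it is not a documented EXXA endpoint), the "completed" status string is an assumption, and the 24-hour deadline simply mirrors the guarantee described above.

```python
import time

# Hypothetical polling loop built around the 24-hour completion guarantee above.
# `fetch_batch_status` is a placeholder callable, not a documented EXXA API call.

POLL_INTERVAL_SECONDS = 5 * 60      # check every 5 minutes
DEADLINE_SECONDS = 24 * 60 * 60     # the documented 24-hour completion window


def wait_for_batch(fetch_batch_status, batch_id: str) -> str:
    """Poll until the batch reports completion or the 24-hour window elapses."""
    start = time.monotonic()
    while time.monotonic() - start < DEADLINE_SECONDS:
        # Assumed to return a status string such as "in_progress" or "completed".
        status = fetch_batch_status(batch_id)
        if status == "completed":
            return status
        time.sleep(POLL_INTERVAL_SECONDS)
    raise TimeoutError(f"Batch {batch_id} not completed within 24 hours")
```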