Modalities
Models supported
EXXA Batch API currently supports the following model:
Model | Description | Context Length | Max Completion Length |
---|---|---|---|
llama-3.1-8b-instruct-fp16 | Meta 8B parameter model | 128K tokens | 16,384 tokens |
llama-3.1-70b-instruct-fp16 llama-3.3-70b-instruct-fp16 | Meta 70B parameter model | 128K tokens | 16,384 tokens |
llama-3.1-nemotron-70b-instruct-fp16 | NVIDIA fine-tuned version of the 70B model | 128K tokens | 16,384 tokens |
beta:qwen-2-vl-72b-instruct-fp16 | Multimodal model (beta) with vision capabilities | 32K tokens | 16,384 tokens |
The maximum completion length is configured to 4k tokens by default, use the max_tokens
parameter to increase it.
We are actively working to expand our model offerings to better serve your needs. If you have specific models requirements, please reach out to us on Discord or via email at founders@withexxa.com
Pricing
Pricing depends on the model use and number of tokens. Tokens rates are available in the table below.
Base model | Input tokens | Output tokens | Cached input tokens |
---|---|---|---|
llama-3.1-8b-instruct-fp16 | $0.10 / M tokens | $0.15 / M tokens | $0.10 / M write tokens $0.02 / M read tokens |
llama-3.1-70b-instruct-fp16 llama-3.3-70b-instruct-fp16 | $0.30 / M tokens | $0.50 / M tokens | $0.30 / M write tokens $0.06 / M read tokens |
llama-3.1-nemotron-70b-instruct-fp16 | $0.30 / M tokens | $0.50 / M tokens | $0.30 / M write tokens $0.06 / M read tokens |
beta:qwen-2-vl-72b-instruct-fp16 | $0.30 / M tokens | $0.50 / M tokens | $0.30 / M write tokens $0.06 / M read tokens |
A minimum balance of $0.20 in credits is required to process a request.
Rate limits
EXXA Batch API is designed with flexibility in mind, imposing no hard rate limits. You are free to send any number of requests and create batches as large as needed without restrictions.
Over 2 billion requests, we cannot guarantee we will process under 24 hours, but we will make our best to process them as quickly as possible and provide you with visibility.
If you need to process massive amount of data, feel free to contact us via email at founders@withexxa.com We can do custom processing and pricing depending on your needs and requirements.
Completion time
EXXA ensures that all requests are processed and outputs delivered within 24 hours of their submission. We aim to process requests faster when possible.
Batching completion time details
- Individual Request Processing: Each request in a batch or not is processed within 24 hours of submission.
- Incremental Batch Processing: It's common for batches to be processed incrementally, with completion notifications issued only when the full batch is processed.
- Full Batch Processing: We ensure that the entire batch will be processed within 24 hours after the submission of the last included request.