
Getting Started

EXXA Off-Peak Computing is an asynchronous inference API, without hard rate limits, designed to serve generative AI models efficiently and sustainably. By aggregating requests over a set period (starting with 24 hours), it maximizes GPU efficiency and prioritizes GPUs in locations and at times where electricity has low emissions.

With the EXXA API, you can:

  • Send requests one by one (similar to using a streaming API)
  • Aggregate requests into a batch (for more convenient processing)

1. Access EXXA Batch API

  1. Visit EXXA Console and sign in
  2. Create a new API key in the API key management section
  3. Ensure your account has sufficient credits to perform operations

For detailed API documentation, refer to our API Docs.

If you have any questions, please reach out to us on Discord or send us an email at founders@withexxa.com.

2. Send Requests

The EXXA API lets developers send requests with just a few lines of code. You must activate payments on your account before your API keys are enabled.

Before running any of the following examples, make sure to set your EXXA API key as an environment variable:

export EXXA_API_KEY="your-api-key-here"

Use the following code to send a request:

import requests
import os

api_key = os.environ["EXXA_API_KEY"]
url = "https://api.withexxa.com/v1/requests"
headers = {"X-API-Key": api_key, "Content-Type": "application/json"}
payload = {
    "request_body": {
        "model": "llama-3.1-70b-instruct-fp16",
        "messages": [{"role": "user", "content": "Your query here"}],
    }
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
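To send several prompts one by one, the snippet above can be wrapped in two small helpers. This is a minimal sketch, not an official client: `build_payload` and `send_prompts` are hypothetical names, the 30-second timeout is an arbitrary choice, and the payload shape is copied from the example above. The `post` argument is any callable with the signature of `requests.post`, which keeps the helper easy to try out without network access.

```python
API_URL = "https://api.withexxa.com/v1/requests"
DEFAULT_MODEL = "llama-3.1-70b-instruct-fp16"


def build_payload(prompt, model=DEFAULT_MODEL):
    """Build the JSON body for one request, mirroring the example above."""
    return {
        "request_body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }
    }


def send_prompts(prompts, api_key, post):
    """Send each prompt as its own request and return the decoded responses.

    `post` is expected to behave like `requests.post` (hypothetical helper).
    """
    headers = {"X-API-Key": api_key, "Content-Type": "application/json"}
    results = []
    for prompt in prompts:
        response = post(API_URL, headers=headers, json=build_payload(prompt), timeout=30)
        response.raise_for_status()  # surface HTTP errors instead of ignoring them
        results.append(response.json())
    return results
```

With the `requests` library installed, a call could look like `send_prompts(["First query", "Second query"], os.environ["EXXA_API_KEY"], requests.post)`.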

3. Create a Batch

After sending requests, you can aggregate them into a batch by passing their request IDs. Assign a name to your batch for easier management:

import requests
import os

api_key = os.environ["EXXA_API_KEY"]
url = "https://api.withexxa.com/v1/batches"
headers = {"X-API-Key": api_key, "Content-Type": "application/json"}
payload = {
    "requests_ids": ["request_id1", "request_id2"],
    "metadata": {"batch_name": "MyFirstBatch"},
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())

Batching requests has the following advantages:

  • Check the status of a group of requests at once
  • Cancel a group of requests simultaneously

Creating a batch is optional: you can use the EXXA asynchronous API for each request individually and retrieve the result of each request on its own.

To check whether a request belongs to a batch, inspect the Batch ID parameter of the request. If the Batch ID is null, the request is not included in any batch.
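The Batch ID check can be sketched as a small helper that separates request records into batched and unbatched ones. The key name `batch_id` is an assumption made for illustration; check the API Docs for the actual field name in the response.

```python
def split_by_batch(request_infos):
    """Separate request records into (batched, unbatched) lists.

    Assumes each record is a dict whose "batch_id" key is None (JSON null)
    when the request belongs to no batch. The key name is hypothetical.
    """
    batched, unbatched = [], []
    for info in request_infos:
        if info.get("batch_id") is None:
            unbatched.append(info)
        else:
            batched.append(info)
    return batched, unbatched
```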

4. Check the Status

1. Batch

Check the status of your batch using the following snippet:

import requests
import os

api_key = os.environ["EXXA_API_KEY"]
batch_id = 'batch_id_here'
url = f"https://api.withexxa.com/v1/batches/{batch_id}/status"
headers = {"X-API-Key": api_key}

response = requests.get(url, headers=headers)
print(response.json())

The status of a given batch can be any of the following:

Status       Description
registered   The batch was received and is pending processing
in progress  The batch was validated; processing underway
cancelled    The batch was cancelled
completed    The batch was processed; all results are available
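Because processing is asynchronous, a client usually polls the status endpoint until the batch reaches a final state. The following is a minimal sketch of such a loop: `wait_for_status` is a hypothetical helper, the polling interval and timeout are arbitrary defaults, and `fetch_status` is a callable you supply that returns the current status string (how that string is extracted from the JSON response is left to you, since the response layout is not shown here).

```python
import time

# Final states taken from the batch status table above.
BATCH_TERMINAL_STATUSES = {"cancelled", "completed"}


def wait_for_status(fetch_status, terminal_statuses,
                    interval_s=60.0, timeout_s=86400.0, sleep=time.sleep):
    """Call fetch_status() repeatedly until it returns a terminal status.

    Raises TimeoutError if no terminal status is seen within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in terminal_statuses:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status is still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

A long interval is a reasonable default here, since requests are aggregated over a period of up to 24 hours.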

2. Request

Check the status of your request using the following snippet:

import requests
import os

api_key = os.environ["EXXA_API_KEY"]
request_id = 'request_id_here'
url = f"https://api.withexxa.com/v1/requests/{request_id}/status"
headers = {"X-API-Key": api_key}

response = requests.get(url, headers=headers)
print(response.json())

The status of a given request can be any of the following:

Status       Description
registered   The request was received and is pending processing
in progress  The request was validated; processing underway
cancelled    The request was cancelled
completed    The request was processed; result is available
failed       The request was processed; it failed and returned an error
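When you track several unbatched requests, it can help to summarize their statuses in one place. The sketch below assumes you have already collected a mapping from request ID to status string (for example by calling the status endpoint once per request); `summarize_statuses` is a hypothetical helper, and the status values come from the table above.

```python
from collections import Counter

# Final states taken from the request status table above.
REQUEST_TERMINAL_STATUSES = {"cancelled", "completed", "failed"}


def summarize_statuses(status_by_id):
    """Return (counts per status, IDs still pending, IDs that failed)."""
    counts = Counter(status_by_id.values())
    pending = sorted(
        request_id
        for request_id, status in status_by_id.items()
        if status not in REQUEST_TERMINAL_STATUSES
    )
    failed = sorted(
        request_id
        for request_id, status in status_by_id.items()
        if status == "failed"
    )
    return counts, pending, failed
```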

5. Retrieve the Results

Once a batch or an individual request has been processed, you can retrieve its output. The following snippet retrieves the results of a batch:

import requests
import os

api_key = os.environ["EXXA_API_KEY"]
batch_id = 'batch_id_here'
url = f"https://api.withexxa.com/v1/batches/{batch_id}/results"
headers = {"X-API-Key": api_key}

response = requests.get(url, headers=headers)
print(response.text)

# You could also iterate over the response to get the result of each request
# (this requires `import json` at the top of the script):
# for line in response.iter_lines():
#     result = json.loads(line)
#     print(result)
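The commented loop above suggests that batch results come back as one JSON document per line. Under that assumption, the response text can be parsed into a list of Python objects with a small helper. `parse_results` is a hypothetical name, and the structure of each result object is not specified here, so the sketch only decodes the lines.

```python
import json


def parse_results(text):
    """Decode newline-delimited JSON into a list, skipping blank lines."""
    results = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            results.append(json.loads(line))
    return results
```

With the snippet above, this would be called as `parse_results(response.text)`.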

6. Cancel a Request or a Batch

If you need to cancel a batch of requests or an individual request, you can do it as follows:

import requests
import os

api_key = os.environ["EXXA_API_KEY"]
headers = {"X-API-Key": api_key}

# Cancel a batch
batch_id = 'batch_id_here'
url = f"https://api.withexxa.com/v1/batches/{batch_id}/cancel"
response = requests.post(url, headers=headers)
print(response.json())

# Cancel an individual request
request_id = 'request_id_here'
url = f"https://api.withexxa.com/v1/requests/{request_id}/cancel"
response = requests.post(url, headers=headers)
print(response.json())

Note that when you cancel a batch, this action will cancel all requests contained within it. However, canceling a single request from a batch will affect only that request; the other requests in the batch will not be affected and will continue processing.