Getting Started

EXXA Off-Peak Computing is an asynchronous inference API, without hard rate limits, designed for serving Generative AI models efficiently and sustainably. By aggregating requests over a set period (starting with 24 hours), EXXA can maximize GPU efficiency and prioritize GPUs in locations and at times where electricity has low emissions.

With the EXXA API, you can:

Send requests one by one (similar to using a streaming API)
Aggregate requests into a batch (for more convenient processing)

1. Access EXXA Batch API

Visit EXXA Console and sign in
Create a new API key in the API key management section
Ensure your account has sufficient credits to perform operations

For detailed API documentation, refer to our API Docs.

If you have any questions, please reach out to us on Discord or send us an email at founders@withexxa.com.

2. Send Requests

The EXXA API lets you send requests with just a few lines of code. Note that your API keys are only enabled once you have activated payments on your account.

Before running any of the following examples, make sure to set your EXXA API key as an environment variable:

export EXXA_API_KEY="your-api-key-here"
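
If the variable is missing, the examples below fail with a bare `KeyError`. As a minimal sketch, a small helper can turn that into a clearer message. The function name `get_api_key` and the error text are illustrative and not part of the EXXA API:

```python
import os


def get_api_key(env=os.environ):
    """Return the EXXA API key, or raise a readable error if it is not set.

    `env` defaults to the process environment; passing a plain dict makes
    the helper easy to test. This helper is hypothetical, not part of EXXA.
    """
    key = env.get("EXXA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "EXXA_API_KEY is not set; export it before running the examples."
        )
    return key
```

The later snippets could then use `api_key = get_api_key()` in place of `os.environ["EXXA_API_KEY"]`.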

Use the following code to send a request (Python first, then the equivalent curl command):

import requests as http_client
import os

api_key = os.environ["EXXA_API_KEY"]
url = "https://api.withexxa.com/v1/requests"
headers = {"X-API-Key": api_key, "Content-Type": "application/json"}
payload = {
    "request_body": {
        "model": "llama-3.1-70b-instruct-fp16",
        "messages": [{"role": "user", "content": "Your query here"}],
    }
}

response = http_client.post(url, headers=headers, json=payload)
print(response.json())

curl -X POST https://api.withexxa.com/v1/requests \
  -H "X-API-Key: $EXXA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "request_body": {
      "model": "llama-3.1-70b-instruct-fp16",
      "messages": [{"role": "user", "content": "Your query here"}]
    }
  }'
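
To send several requests one by one, the same call can be wrapped in a loop. The sketch below reuses the endpoint, headers and payload shape shown above. The helper names (`build_payload`, `send_requests`), the injectable `post` argument and the 30-second timeout are illustrative choices, and the shape of the JSON returned by the API is not assumed:

```python
EXXA_REQUESTS_URL = "https://api.withexxa.com/v1/requests"


def build_payload(prompt, model="llama-3.1-70b-instruct-fp16"):
    """Build the request payload in the shape used by the example above."""
    return {
        "request_body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }
    }


def send_requests(prompts, api_key, post=None):
    """Send one request per prompt and return the decoded JSON responses.

    `post` defaults to `requests.post`; it is injectable (a hypothetical
    convenience) so the helper can be exercised without network access.
    """
    if post is None:
        import requests  # imported lazily so a stub can replace it in tests
        post = requests.post

    headers = {"X-API-Key": api_key, "Content-Type": "application/json"}
    responses = []
    for prompt in prompts:
        response = post(
            EXXA_REQUESTS_URL,
            headers=headers,
            json=build_payload(prompt),
            timeout=30,
        )
        response.raise_for_status()  # surface HTTP errors early
        responses.append(response.json())
    return responses
```

For example, `send_requests(["First question", "Second question"], api_key)` returns a list with one response body per prompt, in the same order.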

3. Create a Batch

After sending your requests, you can group them into a batch by passing their IDs. You can also assign a name to the batch for easier management (Python first, then the equivalent curl command):

import requests as http_client
import os

api_key = os.environ["EXXA_API_KEY"]
url = "https://api.withexxa.com/v1/batches"
headers = {"X-API-Key": api_key, "Content-Type": "application/json"}
payload = {
    "requests_ids": ["request_id1", "request_id2"],
    "metadata": {"batch_name": "MyFirstBatch"}
}

response = http_client.post(url, headers=headers, json=payload)
print(response.json())

curl -X POST https://api.withexxa.com/v1/batches \
  -H "X-API-Key: $EXXA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "requests_ids": ["request_id1", "request_id2"],
    "metadata": {"batch_name": "MyFirstBatch"}
  }'

The advantages of batching requests are the following:

Check the status of a group of requests at once
Cancel a group of requests simultaneously

Creating a batch is optional: you can use the EXXA asynchronous API for each request individually and retrieve the result of each request on its own.

To check whether a request belongs to a batch, look at the Batch ID parameter of the request. If the Batch ID is null, the request is not included in any batch.
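
The null check described above can be sketched as follows. The field name `batch_id` is an assumption about how the Batch ID parameter appears in the decoded JSON, so adjust it to match the actual response:

```python
def is_in_batch(request_info, batch_id_field="batch_id"):
    """Return True if the request carries a non-null Batch ID.

    `request_info` is the decoded JSON of a request. The key name
    "batch_id" is an assumption, not confirmed by the documentation.
    """
    return request_info.get(batch_id_field) is not None


def unbatched(request_infos, batch_id_field="batch_id"):
    """Return only the requests that are not part of any batch."""
    return [r for r in request_infos if not is_in_batch(r, batch_id_field)]
```

A JSON `null` decodes to `None` in Python, which is why the check uses `is not None` rather than truthiness.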

4. Check the Status

1. Batch

Check the status of your batch using the following snippet (Python first, then the equivalent curl command):

import requests as http_client
import os

api_key = os.environ["EXXA_API_KEY"]
batch_id = 'batch_id_here'
url = f"https://api.withexxa.com/v1/batches/{batch_id}/status"
headers = {"X-API-Key": api_key}

response = http_client.get(url, headers=headers)
print(response.json())

curl -X GET https://api.withexxa.com/v1/batches/{batch_id_here}/status \
  -H "X-API-Key: $EXXA_API_KEY"

The status of a given batch can be any of the following:

Status	Description
registered	The batch was received and is pending processing
in progress	The batch was validated; processing underway
cancelled	The batch was cancelled
completed	The batch was processed; all results are available
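
Because processing is asynchronous, a client typically polls until the batch reaches `cancelled` or `completed`. The sketch below shows one way to do that. The helper name, the 5-minute interval and the 288-poll limit (about 24 hours) are illustrative, and `fetch_status` is a caller-supplied function that returns the current status string, since the exact shape of the status response is not shown here:

```python
import time

# Statuses from the table above after which a batch no longer changes
# (treating "cancelled" and "completed" as final is an assumption).
TERMINAL_BATCH_STATUSES = {"cancelled", "completed"}


def wait_for_batch(fetch_status, poll_seconds=300.0, max_polls=288, sleep=time.sleep):
    """Poll until the batch reaches a terminal status and return that status.

    fetch_status: zero-argument callable returning the current status string.
    poll_seconds: delay between polls; 300 s x 288 polls covers about 24 hours.
    sleep: injectable for testing; defaults to time.sleep.
    """
    for attempt in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_BATCH_STATUSES:
            return status
        if attempt < max_polls - 1:
            sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"batch still not finished after {max_polls} polls")
```

Here `fetch_status` would wrap the status call shown above and extract the status value from its JSON body.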

2. Request

Check the status of your request using the following snippet (Python first, then the equivalent curl command):

import requests as http_client
import os

api_key = os.environ["EXXA_API_KEY"]
request_id = 'request_id_here'
url = f"https://api.withexxa.com/v1/requests/{request_id}/status"
headers = {"X-API-Key": api_key}

response = http_client.get(url, headers=headers)
print(response.json())

curl -X GET https://api.withexxa.com/v1/requests/{request_id_here}/status \
  -H "X-API-Key: $EXXA_API_KEY"

The status of a given request can be any of the following:

Status	Description
registered	The request was received and is pending processing
in progress	The request was validated; processing underway
cancelled	The request was cancelled
completed	The request was processed; result is available
failed	The request was processed but failed and returned an error
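
In client code, the request statuses can be modeled as an enumeration so that typos in status strings are caught early. This is a sketch that assumes the API returns the strings exactly as written in the table. The class name and the `is_final` and `has_result` properties are illustrative:

```python
from enum import Enum


class RequestStatus(Enum):
    """Request statuses from the table above (string values assumed literal)."""

    REGISTERED = "registered"
    IN_PROGRESS = "in progress"
    CANCELLED = "cancelled"
    COMPLETED = "completed"
    FAILED = "failed"

    @property
    def is_final(self):
        # Assumption: these three statuses do not change any further.
        return self in (
            RequestStatus.CANCELLED,
            RequestStatus.COMPLETED,
            RequestStatus.FAILED,
        )

    @property
    def has_result(self):
        # Only a completed request has a result available.
        return self is RequestStatus.COMPLETED
```

`RequestStatus("in progress")` converts a raw status string into the enum member and raises `ValueError` for an unknown value.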

5. Retrieve the results

1. Retrieve a Batch's results

You can retrieve the results of a batch using the following snippet (Python first, then the equivalent curl command):

import requests
import os

api_key = os.environ["EXXA_API_KEY"]
batch_id = 'batch_id_here'
url = f"https://api.withexxa.com/v1/batches/{batch_id}/results"
headers = {"X-API-Key": api_key}

response = requests.get(url, headers=headers)
print(response.text)

# You could also read the response line by line, one request result per line
# (this requires `import json` at the top of the script):
# for line in response.iter_lines():
#     result = json.loads(line)
#     print(result)

curl -X GET https://api.withexxa.com/v1/batches/{batch_id_here}/results \
  -H "X-API-Key: $EXXA_API_KEY"
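
The commented loop in the Python snippet suggests that the results body contains one JSON document per line. Under that assumption, the body can be parsed into a list in one step. The helper name `parse_batch_results` is illustrative:

```python
import json


def parse_batch_results(text):
    """Parse a batch results body, assuming one JSON object per line.

    Blank lines are skipped; a malformed line raises json.JSONDecodeError.
    """
    results = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate empty lines between results
        results.append(json.loads(line))
    return results
```

With the snippet above, this would be called as `parse_batch_results(response.text)`.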

2. Retrieve a Request result

You can retrieve the result of a request using the following snippet (Python first, then the equivalent curl command):

import requests as http_client
import os

api_key = os.environ["EXXA_API_KEY"]
request_id = 'request_id_here'
url = f"https://api.withexxa.com/v1/requests/{request_id}"
headers = {"X-API-Key": api_key}

response = http_client.get(url, headers=headers)
print(response.json())

curl -X GET https://api.withexxa.com/v1/requests/{request_id_here} \
  -H "X-API-Key: $EXXA_API_KEY"

6. Cancel a Request or a Batch

If you need to cancel a batch of requests or an individual request, you can do it as follows (Python first, then the equivalent curl commands):

import requests
import os

api_key = os.environ["EXXA_API_KEY"]
headers = {"X-API-Key": api_key}

# Cancel a batch
batch_id = 'batch_id_here'
url = f"https://api.withexxa.com/v1/batches/{batch_id}/cancel"
response = requests.post(url, headers=headers)
print(response.json())

# Cancel an individual request
request_id = 'request_id_here'
url = f"https://api.withexxa.com/v1/requests/{request_id}/cancel"
response = requests.post(url, headers=headers)
print(response.json())

# Cancel a batch
curl -X POST https://api.withexxa.com/v1/batches/{batch_id_here}/cancel \
  -H "X-API-Key: $EXXA_API_KEY"

# Cancel an individual request
curl -X POST https://api.withexxa.com/v1/requests/{request_id_here}/cancel \
  -H "X-API-Key: $EXXA_API_KEY"

Note that when you cancel a batch, this action will cancel all requests contained within it. However, canceling a single request from a batch will affect only that request; the other requests in the batch will not be affected and will continue processing.
