OpticXT exposes a single REST API endpoint for inference operations.
POST /v1/inference
This endpoint accepts multipart form data with optional image/video files and text input, and returns text responses from the vision-language model. If the robot is currently executing a task, it can provide status updates based on the current camera feed.
The endpoint accepts multipart/form-data with the following optional fields:
- text (string, optional): Text input/prompt for the model
- image (file, optional): Image file (JPEG, PNG, etc.)
- video (file, optional): Video file (currently extracts first frame)
- task_context (string, optional): Additional context about the current task
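For clients that cannot shell out to curl, the fields above can be encoded by hand. The following is a minimal sketch using only the Python standard library; the helper names `build_multipart` and `infer` are illustrative and not part of OpticXT, and the URL assumes the default port 8080.

```python
import json
import urllib.request
import uuid
from typing import Dict, Optional, Tuple


def build_multipart(
    fields: Dict[str, str],
    files: Optional[Dict[str, Tuple[str, bytes, str]]] = None,
) -> Tuple[bytes, str]:
    """Encode text fields and files as a multipart/form-data body.

    fields maps a field name to its text value.
    files maps a field name to (filename, content_bytes, mime_type).
    Returns (body, value_for_the_Content-Type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    for name, (filename, content, mime) in (files or {}).items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    # Closing delimiter marks the end of the body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def infer(url: str, fields: Dict[str, str], files=None, timeout: float = 60.0) -> dict:
    """POST the encoded form to the inference endpoint and decode the JSON reply."""
    body, content_type = build_multipart(fields, files)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


# Example call (assumes a server is running locally on the default port):
# with open("/path/to/image.jpg", "rb") as f:
#     reply = infer(
#         "http://localhost:8080/v1/inference",
#         {"text": "Describe what you see in this image"},
#         {"image": ("image.jpg", f.read(), "image/jpeg")},
#     )
```

The encoder is kept separate from the network call so the request body can be inspected or tested without a running server.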
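The endpoint returns a JSON object in the following format, where status is either completed or current_task:
{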
"id": "uuid-string",
"text": "Model response text",
"processing_time_ms": 1250,
"tokens_generated": 45,
"status": "completed|current_task",
"current_task": "optional description of ongoing task"
}

Example requests:

# Text-only prompt
curl -X POST http://localhost:8080/v1/inference \
  -F "text=What can you see in the camera feed?"

# Image with a text prompt
curl -X POST http://localhost:8080/v1/inference \
  -F "text=Describe what you see in this image" \
  -F "image=@/path/to/image.jpg"

# Text prompt with task context
curl -X POST http://localhost:8080/v1/inference \
  -F "text=Move forward and find a red object" \
  -F "task_context=Navigation task in living room"

# If a task is running, send a request without input to get its status
curl -X POST http://localhost:8080/v1/inference

To start OpticXT in API server mode:
# Start the API server on default port 8080
./opticxt --api-server
# Start on custom port
./opticxt --api-server --api-port 3000
# With custom model
./opticxt --api-server --model-path /path/to/model.gguf
# CPU-only mode (no CUDA)
cargo run --no-default-features -- --api-server

Key features:
- Multimodal Input: Supports text, images, and video files
- Task Tracking: Automatically tracks ongoing robot tasks
- Status Updates: Provides real-time status of current operations
- Vision Integration: Uses camera feed when no image is provided
- CORS Enabled: Ready for web frontend integration
Errors are returned as a JSON object in the following format:
{
"error": "Error description",
"code": "ERROR_CODE"
}

Common error codes:
- INVALID_MULTIPART: Malformed request data
- IMAGE_PARSE_ERROR: Invalid image format
- INFERENCE_ERROR: Model processing failed
- STATUS_ERROR: Failed to get robot status
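A client usually needs to tell the success payload apart from the error payload. The sketch below shows one way to do that in Python. It assumes an error reply can be recognised by the presence of the error key, since HTTP status codes are not documented here; `InferenceResult`, `OpticXTError`, and `parse_response` are illustrative names, not part of OpticXT.

```python
import json
from dataclasses import dataclass
from typing import Optional


class OpticXTError(Exception):
    """Raised when the server replies with an error payload."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


@dataclass
class InferenceResult:
    id: str
    text: str
    processing_time_ms: int
    tokens_generated: int
    status: str  # "completed" or "current_task"
    current_task: Optional[str] = None  # only present while a task is running


def parse_response(raw: str) -> InferenceResult:
    """Turn a raw JSON reply into an InferenceResult or raise OpticXTError."""
    data = json.loads(raw)
    # Assumption: error replies carry an "error" key, success replies do not.
    if "error" in data:
        raise OpticXTError(data.get("code", "UNKNOWN"), data["error"])
    return InferenceResult(
        id=data["id"],
        text=data["text"],
        processing_time_ms=data["processing_time_ms"],
        tokens_generated=data["tokens_generated"],
        status=data["status"],
        current_task=data.get("current_task"),
    )
```

Callers can then branch on the code attribute, for example retrying on INFERENCE_ERROR while treating IMAGE_PARSE_ERROR as a problem with the uploaded file.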
Integration notes:
- The API is designed for integration with web frontends, mobile apps, or other services
- Supports real-time robot control and monitoring
- Can be used alongside the regular robot control mode
- Thread-safe for concurrent requests
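Because the server accepts concurrent requests, a client can issue several prompts in parallel. The following is a small sketch of that pattern in Python. The `send` callable is a placeholder for whatever function performs a single POST to /v1/inference and returns the decoded reply; it and `run_concurrently` are illustrative names, not part of OpticXT.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_concurrently(
    send: Callable[[str], dict],
    prompts: List[str],
    max_workers: int = 4,
) -> List[dict]:
    """Send each prompt on a worker thread.

    Results are returned in the same order as `prompts`, regardless of
    which request finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, prompts))
```

How many workers are useful depends on how the server schedules inference, which is not specified here, so the default of 4 is only a starting point.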