This page explains how to set up a testbed to run gRPC OSS benchmarks. The testbed consists of a Kubernetes cluster and a custom controller deployed to the cluster. For other aspects of running tests once the testbed is set up, see tools.
The cluster running benchmark jobs must be configured with node pools
dimensioned for the number of simultaneous tests that it should support. The
controller uses pool as a node selector for the various pod types. Worker pods
have mutual anti-affinity, so one node is required per pod.
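As an illustration only (these are not the controller's actual manifests, and the label names are hypothetical), pool-based scheduling of this kind can be expressed in a pod spec with a nodeSelector on the pool label plus a podAntiAffinity rule keyed on the node hostname:

```yaml
# Hypothetical worker pod sketch: schedule onto the workers-8core pool,
# with at most one worker pod per node enforced by anti-affinity.
apiVersion: v1
kind: Pod
metadata:
  name: example-worker
  labels:
    loadtest-role: worker        # assumed label, for illustration only
spec:
  nodeSelector:
    pool: workers-8core          # matches the Kubernetes label on the node pool
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              loadtest-role: worker
          topologyKey: kubernetes.io/hostname   # one worker pod per node
  containers:
    - name: main
      image: us-docker.pkg.dev/grpc-testing/e2etest/cxx:v1.5.7
```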
For example, the node pools that are used in our continuous integration testbed are configured as follows:
| Pool name | Node count | Machine type | Kubernetes labels |
|---|---|---|---|
| system | 2 | e2-standard-8 | default-system-pool:true, pool:system |
| drivers-ci | 8 | e2-standard-2 | pool:drivers-ci |
| workers-c2-8core-ci | 8 | c2-standard-8 | pool:workers-c2-8core-ci |
| workers-c2-30core-ci | 8 | c2-standard-30 | pool:workers-c2-30core-ci |
Since each scenario in our tests requires one driver and two workers, this
configuration supports four simultaneous tests on 8-core machines and four on
30-core machines. Drivers require few resources, and do not have mutual
anti-affinity. We find it convenient to schedule them on two-core machines with
a node count set to the required number of drivers, rather than on a larger
shared machine, because that allows the driver pool to be resized together with
the worker pools. The controller itself is scheduled in the system pool.
In addition to the pools used in continuous integration, our cluster contains another set of node pools that can be used for ad hoc testing:
| Pool name | Node count | Machine type | Kubernetes labels |
|---|---|---|---|
| drivers | 8 | e2-standard-8 | default-driver-pool:true, pool:drivers |
| workers-8core | 8 | e2-standard-8 | default-worker-pool:true, pool:workers-8core |
| workers-32core | 8 | e2-standard-32 | pool:workers-32core |
Some pools are labeled with default-*-pool labels. These labels specify which
pool to use when the LoadTest configuration does not specify one explicitly.
With the configuration above, such tests (for instance, the tests specified in
the examples) will use the drivers and workers-8core pools, and will not
interfere with continuous integration jobs. The default labels are defined as
part of the controller configuration: if they are not set, the controller will
only run tests where the pool labels are specified explicitly.
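For example, a LoadTest configuration can pin each component to a pool explicitly. The snippet below is a trimmed sketch, not a complete LoadTest (required fields such as language, images, and scenarios are omitted):

```yaml
# Trimmed sketch: explicit pool selection per component.
apiVersion: e2etest.grpc.io/v1
kind: LoadTest
metadata:
  name: example-test
spec:
  driver:
    pool: drivers-ci             # omit to fall back to the default-driver-pool
  servers:
    - pool: workers-c2-8core-ci
  clients:
    - pool: workers-c2-8core-ci
```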
The following instructions explain how to build the custom LoadTest controller and how to deploy it to the cluster.
In order to build a specific version of the controller, you must check out the
desired version. The following commands clone the repo and check out version
v1.5.7:
```shell
git clone https://github.com/grpc/test-infra && cd test-infra
git checkout --detach v1.5.7
```

The following environment variables must be set before starting the build:

- TEST_INFRA_VERSION
- DRIVER_VERSION
- INIT_IMAGE_PREFIX
- BUILD_IMAGE_PREFIX
- RUN_IMAGE_PREFIX
- KILL_AFTER
GOCMD may be set to build with a specific version of Go. This
variable defaults to go.
TEST_INFRA_VERSION is used to tag the images created by the controller build,
and defaults to latest.
DRIVER_VERSION is the version of the load test driver. The driver is built
from the gRPC Core repository. This variable defaults to a recent
known good version of gRPC, so it is safe to leave it unset.
INIT_IMAGE_PREFIX, BUILD_IMAGE_PREFIX and RUN_IMAGE_PREFIX define the
repository locations where the various kinds of images will be uploaded. If
they are not set, images will be uploaded to Docker Hub.
KILL_AFTER is the time interval in seconds after which a KILL signal will be
sent to test components, if they have not terminated after timeout. Component
timeout is set in the LoadTest configuration. KILL_AFTER is set in the
controller configuration, as a safeguard for
components that may hang and consume resources after test timeout.
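For reference, the component timeout is a per-test setting, while KILL_AFTER is global to the controller. A trimmed, hypothetical LoadTest fragment:

```yaml
# Trimmed LoadTest fragment: the test is failed after timeoutSeconds,
# and KILL_AFTER (from the controller configuration) is the additional
# grace period before hanging components receive a KILL signal.
spec:
  timeoutSeconds: 900
```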
The variables used to build the v1.5.7 release are as follows:
```shell
export GOCMD=go1.21.13
export TEST_INFRA_VERSION=v1.5.7
export INIT_IMAGE_PREFIX=us-docker.pkg.dev/grpc-testing/e2etest/init-
export BUILD_IMAGE_PREFIX=us-docker.pkg.dev/grpc-testing/e2etest/init-build-
export RUN_IMAGE_PREFIX=us-docker.pkg.dev/grpc-testing/e2etest/
export KILL_AFTER=30
```

Our images are pushed to us-docker.pkg.dev. You can push to any image
repository by changing the environment variables.
You can change TEST_INFRA_VERSION to any label you would like to apply to your
images.
The controller requires a configuration file to be included in the controller image. This configuration file can be generated from a template as follows:
```shell
${GOCMD:-go} run config/cmd/configure.go \
    -version="${TEST_INFRA_VERSION}" \
    -init-image-prefix="${INIT_IMAGE_PREFIX}" \
    -build-image-prefix="${BUILD_IMAGE_PREFIX}" \
    -run-image-prefix="${RUN_IMAGE_PREFIX}" \
    -kill-after="${KILL_AFTER}" \
    -validate=true \
    config/defaults_template.yaml \
    config/defaults.yaml
```

This step must be completed before building and pushing images.
The structure of the configuration file can be seen in defaults_template.yaml.
The controller configuration contains default pool labels (see
cluster setup) and the value of KILL_AFTER, in addition to
the location of images generated when
building and pushing images.
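As a rough sketch of that structure (the field names below are assumptions for orientation only; consult defaults_template.yaml itself for the authoritative schema):

```yaml
# Hypothetical sketch of the controller configuration, not the actual schema.
defaultPools:
  driverPool: drivers            # pools carrying the default-*-pool labels
  workerPool: workers-8core
killAfter: 30                    # seconds, from KILL_AFTER
images:
  cloneImage: us-docker.pkg.dev/grpc-testing/e2etest/init-clone:v1.5.7
  readyImage: us-docker.pkg.dev/grpc-testing/e2etest/init-ready:v1.5.7
  driverImage: us-docker.pkg.dev/grpc-testing/e2etest/driver:v1.5.7
languages:
  - language: ruby
    buildImage: us-docker.pkg.dev/grpc-testing/e2etest/init-build-ruby:v1.5.7
    runImage: us-docker.pkg.dev/grpc-testing/e2etest/ruby:v1.5.7
```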
The controller binary can be built and tested with the following command:
```shell
make all test
```

Images can be built and pushed to an image repository with the following command:

```shell
make all-images push-all-images
```

The set of images includes the controller, the driver runtime, and a ready container image used by the driver. These images must be included in any build.

The set of images also includes build and runtime images for every supported language, plus a language-agnostic clone container image. These images are necessary to run any tests that do not use pre-built images, such as the examples.
The complete set of images built for v1.5.7 is as follows:
```
us-docker.pkg.dev/grpc-testing/e2etest/init-build-csharp:v1.5.7
us-docker.pkg.dev/grpc-testing/e2etest/init-build-dotnet:v1.5.7
us-docker.pkg.dev/grpc-testing/e2etest/init-build-node:v1.5.7
us-docker.pkg.dev/grpc-testing/e2etest/init-build-php7:v1.5.7
us-docker.pkg.dev/grpc-testing/e2etest/init-build-ruby:v1.5.7
us-docker.pkg.dev/grpc-testing/e2etest/init-clone:v1.5.7
us-docker.pkg.dev/grpc-testing/e2etest/init-ready:v1.5.7
us-docker.pkg.dev/grpc-testing/e2etest/controller:v1.5.7
us-docker.pkg.dev/grpc-testing/e2etest/cxx:v1.5.7
us-docker.pkg.dev/grpc-testing/e2etest/dotnet:v1.5.7
us-docker.pkg.dev/grpc-testing/e2etest/driver:v1.5.7
us-docker.pkg.dev/grpc-testing/e2etest/java:v1.5.7
us-docker.pkg.dev/grpc-testing/e2etest/node:v1.5.7
us-docker.pkg.dev/grpc-testing/e2etest/php7:v1.5.7
us-docker.pkg.dev/grpc-testing/e2etest/python:v1.5.7
us-docker.pkg.dev/grpc-testing/e2etest/ruby:v1.5.7
```

This should match the set of images included in the controller configuration.
PSM benchmarks images are not required for the controller deployment, but are required if you intend to run PSM benchmarks.
Images can be built and pushed with the following command:
```shell
make all-psm-images push-all-psm-images
```

The complete set of PSM images built for v1.5.7 is as follows:

```
us-docker.pkg.dev/grpc-testing/e2etest/sidecar:v1.5.7
us-docker.pkg.dev/grpc-testing/e2etest/xds-server:v1.5.7
```

Note: PSM images are pushed by default to the location specified by
RUN_IMAGE_PREFIX. You can change this location by setting the variable
PSM_IMAGE_PREFIX.
The following command deletes all previous deployments from the cluster:
```shell
kubectl -n test-infra-system delete deployments,prometheus --all
```

This step is optional, but may be advisable, so that you start from a clean deployment.

PSM benchmarks require a Prometheus Operator deployment, in addition to the controller. If you delete all previous deployments, you will need to deploy it again.
Assuming that you are connected to the cluster where you want to deploy, the controller can be deployed as follows:
```shell
make deploy install
```

This step depends only on TEST_INFRA_VERSION and RUN_IMAGE_PREFIX.

The command above can also be used to deploy an existing version of the controller. In this case, the environment variables should point to the location of the controller image.
PSM benchmarks require a Prometheus Operator deployment. This can be deployed as follows:
```shell
make install-prometheus-crds deploy-prometheus
```

If the CRDs have already been created, they will need to be uninstalled first:

```shell
make undeploy-prometheus uninstall-prometheus-crds
```

Alternatively, you can keep the previous CRDs and only deploy the operator.
You can verify that the deployment started by running the following command:
```shell
kubectl -n test-infra-system get deployments
```

You should eventually see 1/1 in the READY column of the command output:

```
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
controller-manager   1/1     1            1           18s
```

Verify that the deployment is running in the system node pool by running the
following command:

```shell
kubectl get pods -n test-infra-system \
    -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
```

The node names in the output should contain the word system.

It may take a while for the deployment to start. If it does not, you will need
to debug the deployment by checking the description of its pod and the logs of
its manager container. The deployment runs in namespace test-infra-system.
You can verify that Prometheus started by running the following command:
```shell
kubectl get services -n test-infra-system
```

You should eventually see the following resources in the command output, with a
CLUSTER-IP assigned to service/prometheus:

```
NAME                  TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
prometheus            ClusterIP   10.20.7.234   <none>        9090/TCP   9d
prometheus-operated   ClusterIP   None          <none>        9090/TCP   9d
prometheus-operator   ClusterIP   None          <none>        8080/TCP   9d
```

After all resources are up and running, you can forward the port with the following command:

```shell
kubectl port-forward service/prometheus 9090:9090 -n test-infra-system
```
Then you can verify that Prometheus is serving metrics by navigating to its metrics endpoint: http://localhost:9090/graph.
Verify that the deployment is able to run a test by running the example Go test.
The easiest way to run the test is with the test runner:

- Build the test runner binary:

  ```shell
  make runner
  ```

- Run the example test:

  ```shell
  ./bin/runner -i config/samples/go_example_loadtest.yaml -c :1 --delete-successful-tests -o sponge_log.xml
  ```

Alternatively, you can apply the test to the cluster and monitor for completion:

- Start the test:

  ```shell
  kubectl apply -f config/samples/go_example_loadtest.yaml
  ```

- Check the status of the test:

  ```shell
  kubectl get loadtest -l prefix=examples,language=go -o jsonpath='{range .items[*]}{.status.state}{"\n"}{end}'
  ```

  Initially, the status should show Running.

- Repeat the previous step until the status changes to Succeeded.

- Delete the test:

  ```shell
  kubectl delete loadtest -l prefix=examples,language=go
  ```