GrainServiceFactory uses non-deterministic Array.Find for interface resolution, causing "SystemTarget not active" with interface hierarchies #9904

@danielmeza

Summary

GrainServiceFactory in GrainServicesSiloBuilderExtensions.cs uses Array.Find to locate the IGrainService-derived interface that a GrainService implementation exposes. When the implementation type's interface hierarchy contains more than one interface that transitively extends IGrainService, Array.Find returns whichever one Type.GetInterfaces() lists first, and that order is not guaranteed. The factory can therefore compute a different typeCode than the one GrainServiceClient<T> expects, resulting in:

OrleansMessageRejectionException: SystemTarget sys.svc.user.XXXXXXXX/... not active on this silo.

Orleans Version

10.0.0

Reproduction

Interface hierarchy

// Base interface — extends IGrainService
public interface IDeviceMqttService : IGrainService
{
    ValueTask SendFirmwareUpdate(FirmwareUpdateCommand command);
}

// Leaf interface — inherits IGrainService through IDeviceMqttService
public interface IConcentratorMqttService : IDeviceMqttService
{
    Task SendDiscoveryRequest(DiscoverySequenceRequest request);
}

// Implementation
public class ConcentratorMqttService : GrainService, IConcentratorMqttService { /* ... */ }

// Client — generic parameter is the LEAF interface
public class ConcentratorMqttServiceClient : GrainServiceClient<IConcentratorMqttService> { /* ... */ }

Registration

siloBuilder.AddGrainService<ConcentratorMqttService>();

What happens

  1. Factory (GrainServicesSiloBuilderExtensions.cs L26-39):

    var grainServiceInterfaceType = Array.Find(
        serviceType.GetInterfaces(),
        x => x.GetInterfaces().Contains(typeof(IGrainService)));

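    Both IDeviceMqttService and IConcentratorMqttService satisfy the predicate (both have IGrainService in their interface chain). Array.Find itself is deterministic, but it returns whichever match GetInterfaces() lists first, and that order is unspecified, so either interface may be chosen.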

  2. Client (GrainServiceClient.cs L24-30):

    var grainTypeCode = GrainInterfaceUtils.GetGrainClassTypeCode(typeof(TGrainService));
    // TGrainService = IConcentratorMqttService (always deterministic)

  3. If the factory picks IDeviceMqttService (the parent), it registers the GrainService under typeCode X. The client computes typeCode Y from IConcentratorMqttService. The silo responds with "SystemTarget not active" because no GrainService exists at typeCode Y.
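
To make step 1 concrete, here is a minimal sketch that runs the same predicate against the hierarchy from this issue without referencing Orleans. The IGrainService declared here is a stand-in marker interface and PredicateDemo is a hypothetical helper, both introduced only for illustration. The names are sorted before printing so the result does not depend on reflection order.

```csharp
using System;
using System.Linq;

// Stand-in marker (assumption): in Orleans this would be the real IGrainService.
public interface IGrainService { }

// Same shape as the hierarchy in this issue, with members omitted.
public interface IDeviceMqttService : IGrainService { }
public interface IConcentratorMqttService : IDeviceMqttService { }
public class ConcentratorMqttService : IConcentratorMqttService { }

// Hypothetical helper, not part of Orleans.
public static class PredicateDemo
{
    // Returns the names of all interfaces that satisfy the factory's predicate.
    public static string[] MatchingInterfaceNames(Type serviceType) =>
        serviceType.GetInterfaces()
            .Where(x => x.GetInterfaces().Contains(typeof(IGrainService)))
            .Select(x => x.Name)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToArray();

    public static void Main() =>
        Console.WriteLine(string.Join(", ",
            MatchingInterfaceNames(typeof(ConcentratorMqttService))));
}
```

Both IConcentratorMqttService and IDeviceMqttService are reported as matches, so which one Array.Find returns depends only on the order in which GetInterfaces() yields them.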

Observed behavior

  • MQTT-initiated operations (GrainService → Grain) work because they bypass the client lookup.
  • API-initiated operations (Grain → GrainServiceClient<IConcentratorMqttService> → GrainService) fail with OrleansMessageRejectionException.
  • The behavior is non-deterministic across silo restarts since GetInterfaces() reflection ordering is not guaranteed by the CLR.

Workaround

Remove : IGrainService from intermediate interfaces. Only the leaf interface referenced by GrainServiceClient<T> should extend IGrainService:

// Base interface — plain interface, no IGrainService
public interface IDeviceMqttService
{
    ValueTask SendFirmwareUpdate(FirmwareUpdateCommand command);
}

// Only the leaf extends IGrainService
public interface IConcentratorMqttService : IDeviceMqttService, IGrainService
{
    Task SendDiscoveryRequest(DiscoverySequenceRequest request);
}

Suggested Fix

The GrainServiceFactory should resolve the ambiguity. Options:

Option 1: Pick the most-derived interface

private static IGrainService GrainServiceFactory(Type serviceType, IServiceProvider services)
{
    // Every implemented interface that transitively extends IGrainService.
    var candidates = serviceType.GetInterfaces()
        .Where(x => x.GetInterfaces().Contains(typeof(IGrainService)))
        .ToArray();

    // Keep only the most-derived candidates: those that no other candidate extends.
    var mostDerived = candidates
        .Where(c => !candidates.Any(other => other != c && other.GetInterfaces().Contains(c)))
        .ToArray();

    var grainServiceInterfaceType = mostDerived.Length switch
    {
        0 => throw new InvalidOperationException(
            $"Type {serviceType.FullName} does not implement any IGrainService-derived interface."),
        1 => mostDerived[0],
        // Several unrelated leaf interfaces: fail instead of picking one arbitrarily.
        _ => throw new InvalidOperationException(
            $"Ambiguous IGrainService interfaces on {serviceType.FullName}: " +
            $"{string.Join(", ", mostDerived.Select(c => c.FullName))}.")
    };

    var typeCode = GrainInterfaceUtils.GetGrainClassTypeCode(grainServiceInterfaceType);
    var grainId = SystemTargetGrainId.CreateGrainServiceGrainId(typeCode, null, SiloAddress.Zero);
    return (IGrainService)ActivatorUtilities.CreateInstance(services, serviceType, grainId);
}

Option 2: Throw on ambiguity

If multiple interfaces match, throw a clear error at registration time rather than silently picking the wrong one at runtime.
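
A rough sketch of what Option 2 could look like, written without Orleans references so it stands alone. GrainServiceInterfaceResolver and ResolveSingle are hypothetical names, and the marker interface is passed as a parameter; inside Orleans that argument would be typeof(IGrainService). The interfaces and classes at the bottom are stand-ins that mirror the hierarchy from this issue.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Orleans.
public static class GrainServiceInterfaceResolver
{
    // Returns the single interface of serviceType that extends markerInterface,
    // and throws if there is none or more than one.
    public static Type ResolveSingle(Type serviceType, Type markerInterface)
    {
        var candidates = serviceType.GetInterfaces()
            .Where(x => x.GetInterfaces().Contains(markerInterface))
            // Sort so the error message does not depend on reflection order.
            .OrderBy(x => x.FullName, StringComparer.Ordinal)
            .ToArray();

        return candidates.Length switch
        {
            1 => candidates[0],
            0 => throw new InvalidOperationException(
                $"Type {serviceType.FullName} does not implement any " +
                $"{markerInterface.Name}-derived interface."),
            _ => throw new InvalidOperationException(
                $"Ambiguous {markerInterface.Name} interfaces on {serviceType.FullName}: " +
                $"{string.Join(", ", candidates.Select(c => c.FullName))}.")
        };
    }
}

// Stand-in types (assumptions) mirroring the hierarchy in this issue.
public interface IGrainService { }
public interface IDeviceMqttService : IGrainService { }
public interface IConcentratorMqttService : IDeviceMqttService { }
public class ConcentratorMqttService : IConcentratorMqttService { }
public class FlatDeviceService : IDeviceMqttService { }
```

With these stand-ins, ResolveSingle(typeof(FlatDeviceService), typeof(IGrainService)) returns IDeviceMqttService, while the same call with typeof(ConcentratorMqttService) throws InvalidOperationException and names both candidate interfaces. The trade-off compared with Option 1 is that hierarchies like the one in this issue would be rejected at registration rather than resolved to the leaf interface.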

Why existing tests don't catch this

The test suite (TestGrainService.cs) uses flat interface hierarchies (ITestGrainService : IGrainService) with no intermediate interfaces, so Array.Find always has exactly one match.
