nip29: audio/video live spaces#2238

Draft
fiatjaf wants to merge 6 commits into master from voice-rooms

@fiatjaf
Member

@fiatjaf fiatjaf commented Feb 25, 2026

No description provided.

@staab
Member

staab commented Feb 25, 2026

Adding sub-types to rooms makes feature support combinatorially complex. Clients now have to handle text, audio, and text+audio rooms, which require different design decisions (and existing clients will only support text-only rooms). This wouldn't be a problem if it was only text and audio, but text+audio means that there will be missing context for clients that only do text. Still, I don't see how we can avoid this, people are going to want to pair audio and text sometimes and not others, and the only way to keep implementations clean would be to have a different kind for every combination.

One other modification would be to put the token endpoint in the livekit tag rather than implicitly locate it at the relay's well-known URL. This would allow for better decoupling of the relay implementation and the livekit integration. By default, relays can still put their own url in the tag, but other relays might want to use a third party server, or allow a third party (like a community management bot, or some authorized key) to define which url to use. I could see this same mechanism being used for blossom servers and other parallel infrastructure as well.

@mplorentz
Collaborator

I think we should say something about publishing kind 30311 live activity and 10312 room presence events from NIP-53 with appropriate h tags on those events. I can't think of a great use case where you wouldn't want that information to be published to your group. If you are concerned about people in the group knowing that a call happened or who was in it then you should probably be doing a voice call (which is a different thing) rather than joining a voice room in a community space. (I think it's good practice to have expiration tags on these kinds of events anyway)

@fiatjaf
Member Author

fiatjaf commented Feb 25, 2026

Adding sub-types to rooms makes feature support combinatorially complex. Clients now have to handle text, audio, and text+audio rooms, which require different design decisions (and existing clients will only support text-only rooms). This wouldn't be a problem if it was only text and audio, but text+audio means that there will be missing context for clients that only do text. Still, I don't see how we can avoid this, people are going to want to pair audio and text sometimes and not others, and the only way to keep implementations clean would be to have a different kind for every combination.

A different kind of what, may I ask? Everything is already a pool of infinite complexity as it is. There are already a dozen different tags in the group announcement event that are probably not fully implemented in all clients. This is just a small cosmetic change.

It is just a small cosmetic change.

@staab
Member

staab commented Feb 25, 2026

I was thinking of a different group kind. 39000, 3900x, 3900y, etc., with all the companion kinds. Madness, really.

@fiatjaf
Member Author

fiatjaf commented Feb 25, 2026

something about publishing kind 30311 live activity

These kinds are vastly underspecified, so adding them helps nothing. They don't even say what you have to do with the URL. It's a non-spec. We would have to create another event kind anyway.

Can we just publish a kind defining the existence of an AV space in the group? Yes, but then who owns that space? The person who published the event. Now we have multiple owners creating rooms inside the same group. It's a complete nightmare.

I guess it's ok to have a one-off event for an ephemeral space, but that should be a different spec from permanent group-like spaces.

I'm trying to come up with the simplest, leanest possible standard here, and that means the most specific and restrictive too.

One other modification would be to put the token endpoint in the livekit tag rather than implicitly locate it at the relay's well-known URL.

I don't see any use case that justifies having this unnecessary flexibility that just makes implementations more cumbersome. The relay already keeps track of who belongs in each group, there is nothing a "community management bot" can add to the equation.

Remember that this is for NIP-29. If we were talking about some other non-NIP-29 scenario then yes, it could have another shape.

@purrgrammer
Contributor

Adding sub-types to rooms makes feature support combinatorially complex

It's just adding one more opt-in kind (AV) that has huge demand, this should've been in NIP-29 from the start imo. Better late than never.

Clients can choose to ignore the livekit and no-text tags and keep functioning as regular groups. I think of audio as an additional layer on top of the group space.

These kinds are vastly underspecified, so adding them helps nothing. They don't even say what you have to do with the URL. It's a non-spec. We would have to create another event kind anyway.

Yeah, I prefer a self-contained spec. NIP-29 clients shouldn't have to worry about a whole other NIP just to support AV.

One other modification would be to put the token endpoint in the livekit tag rather than implicitly locate it at the relay's well-known URL. This would allow for better decoupling of the relay implementation and the livekit integration.

The relay is already the authority and audio rooms are scoped to a group, they are inherently coupled.

My two sats. I like this proposal.

@vitorpamplona
Collaborator

ACK.

This is the first time I have read a NIP-29 change and understood what I should do.

@fiatjaf
Member Author

fiatjaf commented Mar 2, 2026

Implemented this in fiatjaf/pyramid@4ac5418.

I still have to come up with a way to try it.

@purrgrammer
Contributor

I still have to come up with a way to try it.

I'll implement it in Chachi, is your pyramid instance already running this code?

@mplorentz
Collaborator

The thing I want out of the NIP-53 presence is to be able to display a list of who is currently in a space. Like this:
[Image: Screenshot 2026-03-02 at 9 16 28 AM]

But I don't care if it's NIP-53, a new event kind, or the LiveKit listParticipants API.

If we want to use the LiveKit API for presence we should say in the NIP something like: "Relays MUST set the sub property on the issued JWT to the requesting user's hexadecimal public key". This ensures that when you call listParticipants you get back the public keys of the people in the room in the identity property of each participant, which makes it much harder to impersonate another key. (This seems good to standardize on either way)
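A minimal sketch of what that rule could look like on the relay side, using only the Python standard library. It assumes LiveKit access tokens are HS256 JWTs with `iss` set to the API key and a `video` grant naming the room, and it assumes the LiveKit room name equals the NIP-29 group id. The function name `issue_livekit_token` is hypothetical, and a real relay would more likely use an official LiveKit server SDK.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def issue_livekit_token(api_key: str, api_secret: str, pubkey_hex: str,
                        group_id: str, ttl_seconds: int = 600,
                        now: Optional[int] = None) -> str:
    """Sign an HS256 JWT whose `sub` is the requester's hex pubkey."""
    if len(pubkey_hex) != 64 or any(c not in "0123456789abcdef" for c in pubkey_hex):
        raise ValueError("pubkey must be 64 lowercase hex characters")
    issued_at = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,                # assumed: LiveKit API key as issuer
        "sub": pubkey_hex,             # the rule proposed above
        "nbf": issued_at,
        "exp": issued_at + ttl_seconds,
        # Assumption: the room is named after the group id.
        "video": {"room": group_id, "roomJoin": True},
    }
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(api_secret.encode("utf-8"),
                         signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)
```

With `sub` set this way, the `identity` of each participant reported by the LiveKit server would be the pubkey that the relay authenticated, rather than a value chosen by the client.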

Alternatively, we could say something like "When joining a livekit space Clients SHOULD publish a kind 39311 event signaling their presence in the call. When leaving they should publish a NIP-09 delete request for the presence event."

The presence event would look something like this:

{
  "kind": 39311,
  "pubkey": "<32-bytes hex-encoded public key of the event creator>",
  "tags": [
    ["h", "<group-id>"],
    ["expiration", "<optional-expiration-date>"]
  ],
  "content": ""
}

The presence event seems like a much better solution than the Livekit API because clients can just subscribe to 39311 via websockets rather than polling an API. And then you don't need to do the JWT dance and leak info to the livekit server every time you open the client, only when joining a voice call.
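For illustration, a client could assemble the unsigned presence event roughly as below. This is only a sketch: the kind number and the `h` and `expiration` tags come from the proposal above, while the helper names, the one-hour default lifetime, and the freshness check are assumptions. Signing (`id`, `sig`) is left out.

```python
import time
from typing import Optional

PRESENCE_KIND = 39311  # kind number as proposed in this comment


def build_presence_event(pubkey_hex: str, group_id: str,
                         ttl_seconds: int = 3600,
                         now: Optional[int] = None) -> dict:
    """Build an unsigned presence event with an expiration timestamp tag."""
    created_at = int(time.time()) if now is None else now
    return {
        "kind": PRESENCE_KIND,
        "pubkey": pubkey_hex,
        "created_at": created_at,
        "tags": [
            ["h", group_id],
            # Expiration is a unix timestamp encoded as a string.
            ["expiration", str(created_at + ttl_seconds)],
        ],
        "content": "",
    }


def is_presence_fresh(event: dict, now: int) -> bool:
    """Return False once the event's expiration timestamp has been reached."""
    for tag in event["tags"]:
        if len(tag) >= 2 and tag[0] == "expiration":
            return int(tag[1]) > now
    return True  # assumption: no expiration tag means "still valid"


event = build_presence_event("ab" * 32, "my-group", ttl_seconds=60, now=1000)
print(event["tags"])
```

A client rendering the "who is in the call" list could then drop any presence event for which `is_presence_fresh` returns false, even if the deletion request never arrived.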

@mplorentz
Collaborator

Also I'm working on adding this to zooid + flotilla today. Keen to test compatibility when I'm done.

@fiatjaf
Member Author

fiatjaf commented Mar 2, 2026

I'll implement it in Chachi, is your pyramid instance already running this code?

Yes. I'll try to test it soon so you don't have to hit too many bugs, but I'm feeling somewhat sick today.

If we want to use the LiveKit API for presence we should say in the NIP something like: "Relays MUST set the sub property on the issued JWT to the requesting user's hexadecimal public key".

I've added this. Thank you for pointing it out.

Maybe in the future we should say something about the other JWT properties the LiveKit server understands. I haven't paid much attention to them yet.

The presence event seems like a much better solution than the Livekit API

Oops, I hadn't read this. Do you really think this? The presence event feels much more cumbersome to me, adds a lot of room for errors and flakiness, like clients failing to publish, publishing but not joining, or failing to delete these events, or adding extra requirements to relays for accepting and expiring these events (or storing the deletion requests forever unnecessarily), very error-prone. And since we're already trusting a LiveKit server entirely I think it can handle this.

On the other hand I don't like polling either. I think it could be less bad if it was a kind: 10031 with an h tag and if instead of deleting you updated it to remove the h tag. Still unclear what to do if your computer just crashes in the middle of the call.

@mplorentz
Collaborator

The presence event feels much more cumbersome to me, adds a lot of room for errors and flakiness, like clients failing to publish, publishing but not joining, or failing to delete these events, or adding extra requirements to relays for accepting and expiring these events (or storing the deletion requests forever unnecessarily), very error-prone.

The replaceable event (I think that's what you are suggesting) does seem more reliable (setting aside all the existing issues with replaceable events). Alternatively we could say that you must include an expiration tag on the presence event, and even the delete request associated with it. Or that all presence events should only be treated as valid for up to an hour and then deleted. In any case these things should not leave any trace on the relay long term.

@mplorentz
Collaborator

Another thing I think we might need is an endpoint for checking whether the relay has livekit configured or not. I'm using this to decide when to show the user voice room creation options. I tried hitting /.well-known/nip29/livekit/{groupId} with a dummy group ID but it is not a given that all relay implementations will return a 404 in that case.

I propose something like "If the relay receives a request with no groupId (GET /.well-known/nip29/livekit/) it should return HTTP 204 with an empty body if livekit is configured, and HTTP 404 if it is not."
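A client-side sketch of that probe, using only the standard library. The path and the 204/404 meanings come from the proposal above; the function names, the `wss` to `https` mapping, and treating any other outcome as "unknown" are assumptions.

```python
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def livekit_probe_url(relay_url: str) -> str:
    """Map a relay websocket URL to the proposed HTTP probe URL."""
    parts = urlsplit(relay_url)
    scheme = {"wss": "https", "ws": "http"}.get(parts.scheme, parts.scheme)
    return urlunsplit((scheme, parts.netloc, "/.well-known/nip29/livekit/", "", ""))


def interpret_probe_status(status: int) -> Optional[bool]:
    """204 means configured, 404 means not configured, anything else is unknown."""
    if status == 204:
        return True
    if status == 404:
        return False
    return None


def relay_has_livekit(relay_url: str, timeout: float = 5.0) -> Optional[bool]:
    """Perform the GET request and interpret the response status."""
    request = urllib.request.Request(livekit_probe_url(relay_url), method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return interpret_probe_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the status code is still available.
        return interpret_probe_status(err.code)
    except OSError:
        # Network failure or timeout: the answer is simply unknown.
        return None
```

A client could show the voice room creation options only when `relay_has_livekit(...)` returns `True`, and hide them for both `False` and `None`.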

@staab
Member

staab commented Mar 3, 2026

Putting the token url in the group event tags would fix detection, and probing wouldn't even be necessary

@mplorentz
Collaborator

Putting the token url in the group event tags would fix detection, and probing wouldn't even be necessary

How would that work when you are creating a new group and the 39000 doesn't exist yet? I have a selection box for the room type like this:
[Image: Screenshot 2026-03-03 at 3 59 00 PM]

@staab
Member

staab commented Mar 3, 2026

Since the group create/edit event and the 39000 are separate, the relay can inject the token url into the livekit tag, or the admin can set it manually if he prefers. As far as classifying room type, it's probably fine to do that on edit rather than create. It need not even be immutable.

@fiatjaf
Member Author

fiatjaf commented Mar 4, 2026

I propose something like "If the relay receives a request with no groupId (GET /.well-known/nip29/livekit/) it should return HTTP 204 with an empty body if livekit is configured, and HTTP 404 if it is not."

Makes sense.

@wcat7
Contributor

wcat7 commented Mar 7, 2026

ACK. I support this.

@fiatjaf
Member Author

fiatjaf commented Mar 8, 2026

@mplorentz the solution to all the problems of knowing who is online just came to my mind in a dream: let's just have the relay publish an event kind 39006 saying who is currently online in the group -- and keep that updated as the list of online participants changes.

Would that work?

@purrgrammer
Contributor

let's just have the relay publish an event kind 39006 saying who is currently online in the group -- and keep that updated as the list of online participants changes.

sounds good to me, we probably don't need granular join/leave events for the room.

@purrgrammer
Contributor

One thing I noticed while joining the same room with the same pubkey from different clients is that, as soon as I join in the second client, I am "kicked out" of the room in the first. It's not uncommon to join the same AV room from two different devices (mobile/desktop). Is this behaviour a limitation of LiveKit, or is it intentional? I suspect that, since we are identifying clients by pubkey, only one client with the same pubkey is allowed to be in the room at a time, but I might be wrong. cc @fiatjaf

@mplorentz
Collaborator

@mplorentz the solution to all the problems of knowing who is online just came to my mind in a dream: let's just have the relay publish an event kind 39006 saying who is currently online in the group -- and keep that updated as the list of online participants changes.

Would that work?

I think this sounds ok. Does the relay need to "join" the call to see the list of subscribers?

@staab
Member

staab commented Mar 10, 2026

as soon as I join in the second client, I am "kicked out" of the room in the first

I don't know the answer to the question, but we could solve this by generating a random device ID.

@staab
Member

staab commented Mar 10, 2026

We're running into some interesting design challenges with the no-text/livekit stuff. I think we should remove the no-text tag, and just allow people to do kind-based granular content policies instead (e.g., kind 39000 would include k tags for the kinds of events permitted). This means that we can't have rooms with voice but no text, but I think that's ok; clients can still be creative about how text in voice rooms is rendered. You could have the full room UI, or do something more ephemeral. Anyway, it moves it from a data model challenge to a UI challenge.

@fiatjaf
Member Author

fiatjaf commented Mar 10, 2026

as soon as I join in the second client, I am "kicked out" of the room in the first

I don't know the answer to the question, but we could solve this by generating a random device ID.

If we're going to have the relay publish the list of participants instead of clients reading that from the LiveKit server directly then we can get rid of the jwt-identity=pubkey clause, right? Then this is solved.

We're running into some interesting design challenges with the no-text/livekit stuff. I think we should remove the no-text tag, and just allow people to do kind-based granular content policies instead (e.g., kind 39000 would include k tags for the kinds of events permitted).

I think this is good, yes. The no-text is not a great approach. And having granular kinds solves other problems unrelated to audio rooms. The default should be "everything is allowed, deal with it". But if the "kinds" tag is present then it should contain all supported kinds. Does that sound ok?
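The default-open rule described here could be sketched as follows. The thread has not settled on a tag name, so this assumes one `k` tag per permitted kind on the kind 39000 event, as in the quoted suggestion; the function names are hypothetical.

```python
from typing import List, Optional, Set


def allowed_kinds(group_tags: List[List[str]]) -> Optional[Set[int]]:
    """Return the set of permitted kinds, or None when everything is allowed."""
    kinds = {
        int(tag[1])
        for tag in group_tags
        if len(tag) >= 2 and tag[0] == "k" and tag[1].isdigit()
    }
    # No usable k tags: fall back to "everything is allowed".
    return kinds or None


def is_kind_allowed(group_tags: List[List[str]], kind: int) -> bool:
    """Apply the rule: open by default, exhaustive list when k tags exist."""
    kinds = allowed_kinds(group_tags)
    return kinds is None or kind in kinds


open_group = [["d", "general"], ["name", "General"]]
restricted = [["d", "voice"], ["k", "9"], ["k", "1111"]]
print(is_kind_allowed(open_group, 1), is_kind_allowed(restricted, 1))
```

Under this reading, a voice room without text would simply be a group whose list of permitted kinds omits the chat message kinds.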

@purrgrammer
Contributor

If we're going to have the relay publish the list of participants instead of clients reading that from the LiveKit server directly then we can get rid of the jwt-identity=pubkey clause, right? Then this is solved.

Sounds good, so the p-tag in the list has the key(s) the user is connected from? We'll need that to show who is who in the call.

having granular kinds solves other problems unrelated to audio rooms. The default should be "everything is allowed, deal with it". But if the "kinds" tag is present then it should contain all supported kinds. Does that sound ok?

I like where this is going, but it's probably worth opening a separate PR/discussion to flesh it out. @staab wanna take a staab at it? 🥁

@fiatjaf
Member Author

fiatjaf commented Mar 11, 2026

Please take a look at the last two commits. Just fleshing these two things out. Very simple changes.

I'll try to implement kind:39004 now.

@mplorentz
Collaborator

I think the sub property on the JWT does need to contain the user's pubkey in some form. How about as a prefix? This allows us to map livekit identities to pubkeys reliably to do things like put a border around the face of the person who is speaking. And if a relay doesn't implement the 39004 (which is right now a MAY) we can fall back to the pubkey in sub to show the name and profile pic of people in the call after you have joined.

@mplorentz
Collaborator

mplorentz commented Mar 11, 2026

It looks like you can set arbitrary keys and values on a livekit participant at JWT time and these values are available on the Participant further down the line (like in the active speaker callback) although I haven't tested it yet. Maybe the best thing to do is set sub to a random value and put {"pubkey": "0289073190283..."} in the attributes field.

@staab staab mentioned this pull request Mar 11, 2026
@fiatjaf
Member Author

fiatjaf commented Mar 11, 2026

And if a relay doesn't implement the 39004 (which is right now a MAY)

Good point.

Let's go back to having the pubkey in the sub? I think just having it as the prefix (the first 64 characters) is simpler than doing the additional arbitrary metadata part, but it doesn't matter much, we just have to pick one.
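To make the prefix idea concrete, here is a small sketch of how an identity could be built by the relay and read back by a client. Only the "first 64 characters are the hex pubkey" rule comes from the discussion; the `:` separator, the random per-device suffix, and the function names are assumptions.

```python
import secrets
from typing import Optional

HEX_DIGITS = set("0123456789abcdef")


def is_hex_pubkey(value: str) -> bool:
    """True for exactly 64 lowercase hex characters."""
    return len(value) == 64 and all(c in HEX_DIGITS for c in value)


def make_identity(pubkey_hex: str, device_suffix: Optional[str] = None) -> str:
    """Build an identity whose first 64 characters are the pubkey."""
    if not is_hex_pubkey(pubkey_hex):
        raise ValueError("pubkey must be 64 lowercase hex characters")
    # Assumption: a random suffix lets two devices share one pubkey.
    suffix = secrets.token_hex(4) if device_suffix is None else device_suffix
    return pubkey_hex + ":" + suffix


def pubkey_from_identity(identity: str) -> Optional[str]:
    """Recover the pubkey from the prefix, or None if it is malformed."""
    prefix = identity[:64]
    return prefix if is_hex_pubkey(prefix) else None


identity = make_identity("ab" * 32, "phone")
print(pubkey_from_identity(identity) == "ab" * 32)
```

Because each device gets a distinct identity, joining from a second device should no longer collide with the first, while clients can still map an active speaker back to a profile.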

@fiatjaf
Member Author

fiatjaf commented Mar 12, 2026

@Anderson-Juhasc have you seen this?

@NielLiesmons

I don't get why you guys are not just targeting NIP-53 events at Communities with an h-tag.
Those work already and support multiple use cases. Feels like copy-catting Discord and Telegram blinds out the potential better solutions often. Just my 2 cents. Sorry if this is noise.

@purrgrammer
Contributor

you guys are not just targeting NIP-53 events at Communities with an h-tag. Those work already and support multiple use cases.

NIP-53 works fine for public broadcast type events but group av rooms have different semantics. NIP-29 has a robust, clear access control policy enforced by the relay, NIP-53 is vague about this.

These are a few reasons why I don't think NIP-53 fits here:

  • It's too generic. What is service really? It can be a lot of things: a Google Meet link? An RTMP stream? A Jitsi link? A LiveKit WebSocket? It's open to interpretation and the client must know how to deal with all of those just to support every possible scenario. Using LiveKit directly in NIP-29 makes more sense since it's a well-known, extensible, performant and robust implementation of av rooms.
  • We don't need the scheduling, statuses, or assumption of a host that NIP-53 has: we just want a permanent, always available av room.

I think this approach is pragmatic and solves what we need: opt-in av rooms with relay access control + participant lists. That's it. We don't need all the bells and whistles and generic things NIP-53 provides.

Feels like copy-catting Discord and Telegram blinds out the potential better solutions often.

What is better about using NIP-53 here? It makes the implementation way more complicated than it needs to be in my view.

@mplorentz
Collaborator

It looks like we need to include the full livekit identity string in the kind 39004. Otherwise it's overly complicated to handle the EventParticipantLeft callback from livekit in the case where a user has joined the livekit room from two different devices with the same pubkey. In this case if you just do the simple thing and remove the identity that left the room from your 39004 you will have a bug because the user is still in the room on a different device.

I think the 39004 tags should change from:

    ["participant", "<pubkey>"],

to:

    ["participant", "<livekit-identity>"],

And we can note (again) that the first 64 chars of <livekit-identity> MUST be the user's hex pubkey.

@fiatjaf
Member Author

fiatjaf commented Mar 13, 2026

It looks like we need to include the full livekit identity string in the kind 39004. Otherwise it's overly complicated to handle the EventParticipantLeft callback from livekit in the case where a user has joined the livekit room from two different devices with the same pubkey. In this case if you just do the simple thing and remove the identity that left the room from your 39004 you will have a bug because the user is still in the room on a different device.

That's an implementation detail, right? I don't think it's useful for clients, it's only useful for the relay to keep track of things, so I don't think we should standardize it for now.

I'd say you could do that on your implementation (and I'll have to fix mine too, thank you), but others might not care, and clients should only look at the pubkey, not at the identity.

(Also whenever you get a webhook you could query the LiveKit server for the full list of online participants and recreate the kind:39004 from that, that is probably a more solid approach. There could also be a no-webhooks approach that just queries the LiveKit server every minute, for example.)
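A rough sketch of that "rebuild from the full list" approach, assuming the relay has already fetched the identity strings of everyone currently in the room from the LiveKit server (that call is not shown). The `participant` tag name comes from this thread; the `d` tag carrying the group id and the function name are assumptions, and signing is omitted.

```python
from typing import Iterable, List

HEX_DIGITS = set("0123456789abcdef")


def build_participants_event(group_id: str, identities: Iterable[str],
                             now: int) -> dict:
    """Rebuild the unsigned kind 39004 event from the current identity list."""
    pubkeys: List[str] = []
    for identity in identities:
        prefix = identity[:64]
        valid = len(prefix) == 64 and all(c in HEX_DIGITS for c in prefix)
        # One tag per pubkey, even if the user joined from several devices.
        if valid and prefix not in pubkeys:
            pubkeys.append(prefix)
    return {
        "kind": 39004,
        "created_at": now,
        "tags": [["d", group_id]] + [["participant", pk] for pk in pubkeys],
        "content": "",
    }


alice = "aa" * 32
both_devices = [alice + ":phone", alice + ":laptop"]
print(build_participants_event("my-group", both_devices, 1000)["tags"])
```

Since the event is recomputed from the complete list each time, a user who leaves on one device but stays connected on another remains listed, which avoids the bug described earlier in the thread without exposing the identity string to clients.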

@mplorentz
Collaborator

Yes I suppose it is an implementation detail. And I think you are right that querying livekit for the full list on every webhook is safer. Let's do that.

arthurfranca pushed a commit to 44Billion/flotilla that referenced this pull request Apr 3, 2026
This adjusts our implementation of the Livekit presence event to match the NIP (nostr-protocol/nips#2238 (comment)). Specifically we now expect the user's Nostr pubkey in the `participant` tag instead of the livekit identity string.

I also fixed a bug I found where a malformed `participant` tag would crash the rendering of VoiceWidget, causing it to appear frozen.

There is a corresponding zooid PR [here](coracle-social/zooid#11)

Co-authored-by: mplorentz <mplorentz@noreply.gitea.coracle.social>
Reviewed-on: https://gitea.coracle.social/coracle/flotilla/pulls/101
Co-authored-by: Matt Lorentz <mplorentz@noreply.coracle.social>
Co-committed-by: Matt Lorentz <mplorentz@noreply.coracle.social>
7 participants