Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .vscode/cspell.json
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
"nonoverlapping",
"nonterminal",
"peekable",
"repr",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the name of a common rust attribute for enums. It's like specifying an underlying type in Slice. You specify what it's in-memory representation is.

"rfind",
"rsplit",
"RUSTDOCFLAGS",
Expand Down
24 changes: 23 additions & 1 deletion slice-codec/src/slice2/encoding.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,14 @@ use crate::encode_into::*;
use crate::encoder::Encoder;
use crate::{Error, InvalidDataErrorKind, Result};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The encoder already supported the 'view' types: str and [],
but I updated it to also support the 'owned types: String and Vec.

This is not strictly necessary, but makes the API more convenient, and more consistent (which let use a macro). With this change you can write
encoder.encode(&myVec) instead of encoder.encode(myVec.as_slice()).

Same for String and as_str.


// We only support `BTreeMap` if the `alloc` crate is available through the `alloc` feature flag.
// We only support 'owned' sequence/dictionary types if the `alloc` crate is available through the `alloc` feature flag.
// Note that we always support encoding views into these types (which don't require allocating memory).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence makes sense to me, but might be confusing.
It should be read "... support encoding views-into-these-types ..."
Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't make sense to me. You encode into a byte buffer, not into a "type".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it would make sense if it says encoding view of as i you can always encode a view of a sequence, isn't that the point?

#[cfg(feature = "alloc")]
use alloc::collections::BTreeMap;
#[cfg(feature = "alloc")]
use alloc::string::String;
#[cfg(feature = "alloc")]
use alloc::vec::Vec;

// We only support `HashMap` if the standard library is available through the `std` feature flag.
#[cfg(feature = "std")]
Expand Down Expand Up @@ -151,6 +156,13 @@ impl EncodeInto<Slice2> for &str {
}
}

#[cfg(feature = "alloc")]
impl EncodeInto<Slice2> for &String {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely get rid of all these Slice2.

fn encode_into(self, encoder: &mut Encoder<impl OutputTarget, Slice2>) -> Result<()> {
self.as_str().encode_into(encoder)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The encoding of &String just delegates to the encoding for &str.

This is purely to save the caller from needing to do the conversion themselves.

}
}

/// TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the PR didn't change it, but what is the this TODO about, the missing doc comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just that we should add doc-comments. I could remove these TODOs, but might as well wait until I do the full cleanup & add doc-comments I figure.

impl<'a, T> EncodeInto<Slice2> for &'a [T]
where
Expand All @@ -166,6 +178,16 @@ where
}
}

#[cfg(feature = "alloc")]
impl<'a, T> EncodeInto<Slice2> for &'a Vec<T>
where
&'a T: EncodeInto<Slice2>,
{
Comment on lines +182 to +185
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust's powerful 'blanket-impl' syntax at work here. This translates to:
"Implement the EncodeInto trait for any &Vec<T>, such that &T itself already implements EncodeInto."

And we have to tie the lifetimes of the elements to the vector they're contained within (No one else can mess with the vec while we're encoding it's elements).

fn encode_into(self, encoder: &mut Encoder<impl OutputTarget, Slice2>) -> Result<()> {
self.as_slice().encode_into(encoder)
}
}

// =============================================================================
// Dictionary type implementations
// =============================================================================
Expand Down
332 changes: 332 additions & 0 deletions slicec/src/definition_types.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,332 @@
// Copyright (c) ZeroC, Inc.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this file, the most important thing to scrutinize is that the encoding logic is correct.


//! This module contains the handwritten encoding code for the Slice-compiler definitions.
//! After rust code-gen has been implemented, this file will be deleted, and we will use generated definitions instead.

use slice_codec::slice2::Slice2;

use slice_codec::buffer::OutputTarget;
use slice_codec::encode_into::*;
use slice_codec::encoder::Encoder;
use slice_codec::Result;

/// TAG_END_MARKER must be encoded at the end of every non-compact type.
const TAG_END_MARKER: i32 = -1;

/// This macro implements `EncodeInto<Slice2>` for a Rust struct (which is mapped from a non-compact Slice struct).
/// It encodes all the struct's fields (in definition order), followed by `TAG_END_MARKER`.
///
/// It uses macro-function-syntax, and should be called like:
/// `implement_encode_into_for_struct!(struct_type_name, field1, field2, ...);`
macro_rules! implement_encode_into_for_struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we planning to use the same approach for the generated code? If so, the macro should live in the codec library.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, probably not. Haven't 100% decided yet, but I think it'd be better for the generated code to just generate the encoding normally, and not use a macro. Thoughts?

This macro is just for my own convenience. You can even see it as a kind of "psuedo code-gen".

($type_name:ty$(, $field_name:ident)*$(,)?) => {
Comment on lines +21 to +22
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 lines give the macro a name, and describes the syntax to invoke it with. It should be passed a ty (Type) token, followed by 0 or more ident tokens, each separated by a comma. It supports an optional trailing comma $(,)?.

impl EncodeInto<Slice2> for &$type_name {
fn encode_into(self, encoder: &mut Encoder<impl OutputTarget>) -> Result<()> {
$(encoder.encode(&self.$field_name)?;)*
encoder.encode_varint(TAG_END_MARKER)?;
Ok(())
}
}
}
}

// ================= //
// Hand-mapped types //
// ================= //

pub type EntityId = String;
pub type TypeId = String;
pub type Message = Vec<MessageComponent>;
Comment on lines +37 to +39
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mapping for the typealiases we use in the Slice files.

Followed by each of the mapped types. These should exactly correspond to their Slice-file counterparts, so I only added comments where something was worth mentioning.


#[derive(Clone, Debug)]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good practice in Rust to pro-actively implement these two traits where-ever possible.

Note that Clone does not mean Rust will make copies of the thing automatically, that's the Copy trait instead.

pub struct Attribute {
pub directive: String,
pub args: Vec<String>,
}
implement_encode_into_for_struct!(Attribute, directive, args);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simple structs, we just use the macro to implement the encoding function.


#[derive(Clone, Debug)]
pub struct TypeRef {
pub type_id: TypeId,
pub is_optional: bool,
pub type_attributes: Vec<Attribute>,
}
implement_encode_into_for_struct!(TypeRef, type_id, is_optional, type_attributes);

#[derive(Clone, Debug)]
pub struct EntityInfo {
pub identifier: String,
pub attributes: Vec<Attribute>,
pub comment: Option<DocComment>,
}
impl EncodeInto<Slice2> for &EntityInfo {
fn encode_into(self, encoder: &mut Encoder<impl OutputTarget>) -> Result<()> {
encoder.encode(&self.identifier)?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a conscious choice to omit types, since this is common in Rust.
If we think this is too unclear, our encode function supports being called with an explicit type:
encoder.encode::<&String>(&self.identifier)
More of a thought for whether the codegen should include types or not.

encoder.encode(&self.attributes)?;
// encoder.encode_tagged(0, &self.comment)?; TODO add doc-comments after adding tag encoding support.
encoder.encode_varint(TAG_END_MARKER)?;
Ok(())
}
}

#[derive(Clone, Debug)]
pub struct Module {
pub identifier: String,
pub attributes: Vec<Attribute>,
}
implement_encode_into_for_struct!(Module, identifier, attributes);

#[derive(Clone, Debug)]
pub struct Struct {
pub entity_info: EntityInfo,
pub is_compact: bool,
pub fields: Vec<Field>,
}
implement_encode_into_for_struct!(Struct, entity_info, is_compact, fields);

#[derive(Clone, Debug)]
pub struct Field {
pub entity_info: EntityInfo,
pub tag: Option<i32>, // TODO: varint32 isn't a real type?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is not real?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to remove this comment and open a proposal to add a varint32 (etc.) type to the Rust mapping.

I think it's cleaner to have a separate type for this, instead of needing to call special encode_varint functions. It makes the API more consistent (you always call encoder.encode, no special functions), and will let users encode Rust structs directly without needing Slice files (otherwise, there's no way to specify whether an i32 should use the int32 or the varint32 encoding). That's a future proposal though.

pub data_type: TypeRef,
}
impl EncodeInto<Slice2> for &Field {
fn encode_into(self, encoder: &mut Encoder<impl OutputTarget>) -> Result<()> {
// Encode the bit-sequence. With only one optional, this is just a bool.
encoder.encode(self.tag.is_some())?;

// Encode the actual fields.
encoder.encode(&self.entity_info)?;
if let Some(tag_value) = self.tag {
encoder.encode_varint(tag_value)?;
}
encoder.encode(&self.data_type)?;
encoder.encode_varint(TAG_END_MARKER)?;
Ok(())
}
}

#[derive(Clone, Debug)]
pub struct Interface {
pub entity_info: EntityInfo,
pub bases: Vec<EntityId>,
pub operations: Vec<Operation>,
}
implement_encode_into_for_struct!(Interface, entity_info, bases, operations);

#[derive(Clone, Debug)]
pub struct Operation {
pub entity_info: EntityInfo,
pub is_idempotent: bool,
pub parameters: Vec<Field>,
pub has_streamed_parameter: bool,
pub return_type: Vec<Field>,
pub has_streamed_return: bool,
}
implement_encode_into_for_struct!(
Operation,
entity_info,
is_idempotent,
parameters,
has_streamed_parameter,
return_type,
has_streamed_return,
);

#[derive(Clone, Debug)]
pub struct Enum {
pub entity_info: EntityInfo,
pub is_compact: bool,
pub is_unchecked: bool,
pub underlying: Option<TypeId>,
pub enumerators: Vec<Enumerator>,
}
impl EncodeInto<Slice2> for &Enum {
fn encode_into(self, encoder: &mut Encoder<impl OutputTarget>) -> Result<()> {
// Encode the bit-sequence. With only one optional, this is just a bool.
encoder.encode(self.underlying.is_some())?;

// Encode the actual fields.
encoder.encode(&self.entity_info)?;
encoder.encode(self.is_compact)?;
encoder.encode(self.is_unchecked)?;
if let Some(underlying_value) = &self.underlying {
encoder.encode(underlying_value)?;
}
encoder.encode(&self.enumerators)?;
encoder.encode_varint(TAG_END_MARKER)?;
Ok(())
}
}

#[derive(Clone, Debug)]
pub struct Enumerator {
pub entity_info: EntityInfo,
pub value: Discriminant,
pub fields: Vec<Field>,
}
implement_encode_into_for_struct!(Enumerator, entity_info, value, fields);

#[derive(Clone, Debug)]
pub struct Discriminant {
pub absolute_value: u64,
pub is_positive: bool,
}
implement_encode_into_for_struct!(Discriminant, absolute_value, is_positive);

#[derive(Clone, Debug)]
pub struct CustomType {
pub entity_info: EntityInfo,
}
implement_encode_into_for_struct!(CustomType, entity_info);

#[derive(Clone, Debug)]
pub struct TypeAlias {
pub entity_info: EntityInfo,
pub underlying_type: TypeRef, // Can never be optional.
}
implement_encode_into_for_struct!(TypeAlias, entity_info, underlying_type);

#[derive(Clone, Debug)]
pub struct SequenceType {
pub element_type: TypeRef,
}
implement_encode_into_for_struct!(SequenceType, element_type);

#[derive(Clone, Debug)]
pub struct DictionaryType {
pub key_type: TypeRef, // Can never be optional.
pub value_type: TypeRef,
}
implement_encode_into_for_struct!(DictionaryType, key_type, value_type);

#[derive(Clone, Debug)]
pub struct ResultType {
pub success_type: TypeRef,
pub failure_type: TypeRef,
}
implement_encode_into_for_struct!(ResultType, success_type, failure_type);

#[derive(Clone, Debug)]
pub struct DocComment {
pub overview: Message,
pub see_tags: Vec<EntityId>,
}
implement_encode_into_for_struct!(DocComment, overview, see_tags);

#[repr(u8)]
#[derive(Clone, Debug)]
pub enum MessageComponent {
Text(String) = 0,
Link(EntityId) = 1,
}
impl EncodeInto<Slice2> for &MessageComponent {
fn encode_into(self, encoder: &mut Encoder<impl OutputTarget>) -> Result<()> {
// Write the discriminant value.
// SAFETY: this cast is guaranteed to be safe because the enum is marked with `repr(u8)`, so it's safe to cast
// it directly to a `u8`.
unsafe {
let discriminant = *<*const _>::from(self).cast::<u8>();
encoder.encode_varint(discriminant)?;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Review — Critical: Unsafe discriminant reading is unsound

For data-carrying #[repr(u8)] enums, Rust does not guarantee the discriminant is stored at byte offset 0. This unsafe pointer cast can be UB.

Safe alternative:

let discriminant: u8 = match self {
    MessageComponent::Text(_) => 0,
    MessageComponent::Link(_) => 1,
};
encoder.encode_varint(discriminant)?;

Same issue applies to Symbol::encode_into below.


// Encode the actual value.
match self {
MessageComponent::Text(v) => encoder.encode(v)?,
MessageComponent::Link(v) => encoder.encode(v)?,
}

encoder.encode_varint(TAG_END_MARKER)?;
Ok(())
}
}

#[derive(Clone, Debug)]
pub struct SliceFile {
pub path: String,
pub module_declaration: Module,
pub attributes: Vec<Attribute>,
pub contents: Vec<Symbol>,
}
implement_encode_into_for_struct!(SliceFile, path, module_declaration, attributes, contents);

#[derive(Clone, Debug)]
pub struct GeneratedFile {
pub path: String,
pub contents: String,
}
implement_encode_into_for_struct!(GeneratedFile, path, contents);

#[repr(u8)]
#[derive(Clone, Debug)]
pub enum Symbol {
Interface(Interface) = 0,
Enum(Enum) = 1,
Struct(Struct) = 2,
CustomType(CustomType) = 3,
SequenceType(SequenceType) = 4,
DictionaryType(DictionaryType) = 5,
ResultType(ResultType) = 6, // TODO make result come before dictionary!
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to fix the ordering in the Slice files as well.
Long ago, we agreed the canonical order should be 'result', 'sequence', 'dictionary'. But I forgot this while writing the compiler definitions...

TypeAlias(TypeAlias) = 7,
}
impl EncodeInto<Slice2> for &Symbol {
fn encode_into(self, encoder: &mut Encoder<impl OutputTarget>) -> Result<()> {
// Write the discriminant value.
// SAFETY: this cast is guaranteed to be safe because the enum is marked with `repr(u8)`, which means we know
// the first 'field' of this type's data layout must be a u8. This lets us read without offsetting the pointer.
unsafe {
let discriminant = *<*const _>::from(self).cast::<u8>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Review — Critical: Same unsound unsafe discriminant read as MessageComponent

Same fix — use a safe match:

let discriminant: u8 = match self {
    Symbol::Interface(_) => 0,
    Symbol::Enum(_) => 1,
    Symbol::Struct(_) => 2,
    Symbol::CustomType(_) => 3,
    Symbol::SequenceType(_) => 4,
    Symbol::DictionaryType(_) => 5,
    Symbol::ResultType(_) => 6,
    Symbol::TypeAlias(_) => 7,
};
encoder.encode_varint(discriminant)?;

encoder.encode_varint(discriminant)?;
}

// Encode the actual value.
match self {
Symbol::Interface(v) => encoder.encode(v)?,
Symbol::Enum(v) => encoder.encode(v)?,
Symbol::Struct(v) => encoder.encode(v)?,
Symbol::CustomType(v) => encoder.encode(v)?,
Symbol::SequenceType(v) => encoder.encode(v)?,
Symbol::DictionaryType(v) => encoder.encode(v)?,
Symbol::ResultType(v) => encoder.encode(v)?,
Symbol::TypeAlias(v) => encoder.encode(v)?,
}

encoder.encode_varint(TAG_END_MARKER)?;
Ok(())
}
}

#[derive(Clone, Debug)]
pub struct Diagnostic {
pub level: DiagnosticLevel,
pub message: String,
pub source: Option<String>,
}
impl EncodeInto<Slice2> for &Diagnostic {
fn encode_into(self, encoder: &mut Encoder<impl OutputTarget>) -> Result<()> {
// Encode the bit-sequence. With only one optional, this is just a bool.
encoder.encode(self.source.is_some())?;

// Encode the actual fields.
encoder.encode(&self.level)?;
encoder.encode(&self.message)?;
if let Some(source_value) = &self.source {
encoder.encode(source_value)?;
}
encoder.encode_varint(TAG_END_MARKER)?;
Ok(())
}
}

#[repr(u8)]
#[derive(Clone, Copy, Debug)]
pub enum DiagnosticLevel {
Info,
Warning,
Error,
}
impl EncodeInto<Slice2> for &DiagnosticLevel {
fn encode_into(self, encoder: &mut Encoder<impl OutputTarget>) -> Result<()> {
encoder.encode(*self as u8)
}
}
Loading