Skip to content

Refactor the slicec Binary to Write Encoded Slice Definitions to 'stdout'#729

Open
InsertCreativityHere wants to merge 7 commits intoicerpc:mainfrom
InsertCreativityHere:slicec-big-change
Open

Refactor the slicec Binary to Write Encoded Slice Definitions to 'stdout'#729
InsertCreativityHere wants to merge 7 commits intoicerpc:mainfrom
InsertCreativityHere:slicec-big-change

Conversation

@InsertCreativityHere
Copy link
Member

@InsertCreativityHere InsertCreativityHere commented Mar 6, 2026

This PR does what the title says! I haven't implemented the process forking (i.e. where you can pass code-gen plugins on the command-line and slicec will fork them). That's next up on the list.

But you can still pipe the output of cargo run -- Test.slice to another process, or a file.
It's obviously hard to test this without a decoder. I've been using cargo run -- Test.slice > slice-output.dat,
and opening the file in a hex editor to manually check the bytes. The bytes seem correct for normal Slice files.


Structure

This PR does 3 big things, in 2 files:

  • definition_types.rs: this file is my 'hand-mapping' the Slice Compiler definitions.
    It will eventually be replaced by generated code when we add code-gen in Rust.
    This file 1) Defines the types that will be sent on-the-wire and 2) implements the encoding of these types.
    The encoding is complete, except for DocComments which require tags (which require size/position tracking which doesn't exist yet).
  • slice_file_converter.rs is the more interesting one. This file provides conversion functions for mapping between domains. It takes types from slicec and converts them to the Slice-generated types. Most of this is straightforward, but there are some differences between them which required attention.

"nonoverlapping",
"nonterminal",
"peekable",
"repr",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the name of a common rust attribute for enums. It's like specifying an underlying type in Slice. You specify what it's in-memory representation is.

@@ -6,9 +6,14 @@ use crate::encode_into::*;
use crate::encoder::Encoder;
use crate::{Error, InvalidDataErrorKind, Result};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The encoder already supported the 'view' types: str and [],
but I updated it to also support the 'owned types: String and Vec.

This is not strictly necessary, but makes the API more convenient, and more consistent (which let use a macro). With this change you can write
encoder.encode(&myVec) instead of encoder.encode(myVec.as_slice()).

Same for String and as_str.


// We only support `BTreeMap` if the `alloc` crate is available through the `alloc` feature flag.
// We only support 'owned' sequence/dictionary types if the `alloc` crate is available through the `alloc` feature flag.
// Note that we always support encoding views into these types (which don't require allocating memory).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence makes sense to me, but might be confusing.
It should be read "... support encoding views-into-these-types ..."
Thoughts?

#[cfg(feature = "alloc")]
impl EncodeInto<Slice2> for &String {
fn encode_into(self, encoder: &mut Encoder<impl OutputTarget, Slice2>) -> Result<()> {
self.as_str().encode_into(encoder)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The encoding of &String just delegates to the encoding for &str.

This is purely to save the caller from needing to do the conversion themselves.

Comment on lines +182 to +185
impl<'a, T> EncodeInto<Slice2> for &'a Vec<T>
where
&'a T: EncodeInto<Slice2>,
{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust's powerful 'blanket-impl' syntax at work here. This translates to:
"Implement the EncodeInto trait for any &Vec<T>, such that &T itself already implements EncodeInto."

And we have to tie the lifetimes of the elements to the vector they're contained within (No one else can mess with the vec while we're encoding it's elements).

Comment on lines +3 to +4
// Pull in the core compiler types using aliases to disambiguate them from the Slice-compiler definitions.
// Any type that starts with 'Compiler' is a compiler type, not a mapped Slice-compiler definition type.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if there's a better solution here... Alot of type names are identical between slicec and the Compiler Definitions; unsurprisingly.


// Pull in the core compiler types using aliases to disambiguate them from the Slice-compiler definitions.
// Any type that starts with 'Compiler' is a compiler type, not a mapped Slice-compiler definition type.
#![cfg_attr(rustfmt, rustfmt_skip)] // skip the `use ... as ...` to keep them one-per-line.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rustfmt does not handle aliased imports well apparently. It tries to format like normal imports, which makes them unreadable.
Tbf, aliases are pretty uncommon in practice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just qualify the grammar types, like in:

fn get_attributes_from(attributes: Vec<&slicec::grammar::Attribute>) -> Vec<Attribute> {

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah of course, but then it's easy to accidentally forget to qualify a type (happened multiple times), not to mention it bloats all the function signatures.

If that's what everyone prefers I'm fine with it. But that's what I tried first and didn't like it.

/// In Slice, doc-comments are not allowed on parameters. Instead, you would use a '@param' tag applied to an enclosing
/// operation. But this is an implementation detail of the language, not something code-generators should deal with.
fn get_doc_comment_for_parameter(parameter: &CompilerParameter) -> Option<DocComment> {
let operation_comment = parameter.parent().comment()?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ? is the short-circuiting operator for Option and Result types.

If you apply it an Option, and your function also returns an Option, it will immediately return for None cases (what we do here).

Same for Result and Err cases.

// =========================== //
// Direct conversion functions //
// =========================== //

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For types which are 'simple', we can directly convert between slicec and mapped types. The conventional way to do this in Rust is via the From<T> trait.

}
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For everything else, we need to keep some state. So we implement their conversion functions inside a struct which can hold that state.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the slicec binary to emit an encoded “generateCode” request to stdout, and introduces handwritten Rust types + conversion code to map the compiler AST into the Slice-compiler “on-the-wire” definitions for encoding.

Changes:

  • Add handwritten Slice-compiler definition types with EncodeInto<Slice2> implementations.
  • Add an AST → definition-types converter that linearizes top-level + anonymous types for encoding.
  • Update slicec CLI to encode parsed files and write the encoded bytes to stdout.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
slicec/src/main.rs Encodes a generateCode request and writes bytes to stdout (replaces prior debug/diagnostic output behavior).
slicec/src/slice_file_converter.rs Converts slicec AST structures into the handwritten definition types, including handling anonymous types.
slicec/src/definition_types.rs Defines handwritten “on-the-wire” types and their Slice2 encoding implementations.
slice-codec/src/slice2/encoding.rs Adds EncodeInto<Slice2> impls for &String and &Vec<T> to support encoding owned containers.
.vscode/cspell.json Adds repr to the spellchecker dictionary.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +60 to +92
// TODO: replace this by forking a code-gen plugin once they exist.
// For now, if there are any diagnostics, we emit those and NOT the encoded definitions.
// Code-generators can tell if it's okay to decode or not by the presence of the `"generateCode"` string.
if totals.0 + totals.1 > 0 {
// If there were diagnostics, print them to 'stdout' and don't encode the Slice definitions.
print!("Diagnostics: ");
println!("{totals:?}");
for diagnostic in updated_diagnostics {
println!("{diagnostic:?}");
}
} else {
// Encode the parsed Slice definitions.
let encoded_bytes = match encode_generate_code_request(&files) {
Ok(bytes) => bytes,
Err(error) => {
eprintln!("{error:?}");
std::process::exit(11);
}
};

// Obtain an exclusive handle to 'stdout', and write the encoded bytes to it.
let mut stdout = std::io::stdout().lock();
match stdout.write_all(&encoded_bytes) {
Ok(_) => {}
Err(error) => {
eprintln!("{error:?}");
std::process::exit(12);
}
}
}
println!("{ast:?}");

std::process::exit(i32::from(totals.1 != 0));
// Success.
std::process::exit(0);
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main unconditionally exits with code 0, even when totals.1 indicates compilation errors. This breaks the CLI contract and makes it hard to detect failures in scripts/pipelines. Consider preserving the previous behavior (non-zero exit when errors are present) while still skipping binary output when diagnostics exist.

Copilot uses AI. Check for mistakes.

std::process::exit(i32::from(totals.1 != 0));
// Success.
std::process::exit(0);
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The success path writes encoded bytes to stdout and then calls std::process::exit(0). process::exit bypasses destructors, so any buffered stdout data may not be flushed, potentially truncating the binary output. Prefer returning from main (no explicit exit), or explicitly flush() stdout before exiting.

Suggested change
std::process::exit(0);

Copilot uses AI. Check for mistakes.
@bernardnormier
Copy link
Member

It would be helpful to "vendor-in" the Compiler definitions you used for encoding, even if all the generated code is hand-written for now.

@InsertCreativityHere
Copy link
Member Author

It would be helpful to "vendor-in" the Compiler definitions you used for encoding, even if all the generated code is hand-written for now.

Yes, and I plan to do so. But I wanted to do the vendoring in a separate PR, since it's not directly related, and this is already a large PR. But, the hand-mapped definitions I have here should exactly match icerpc-slice/main right now, except for not encoding doc-comments, and obviously the names being in 'rust' style.

Copy link
Member

@pepone pepone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I want to do some testing from C# to ensure the decoding works!

}
}

/// TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the PR didn't change it, but what is the this TODO about, the missing doc comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just that we should add doc-comments. I could remove these TODOs, but might as well wait until I do the full cleanup & add doc-comments I figure.

///
/// It uses macro-function-syntax, and should be called like:
/// `implement_encode_into_for_struct!(struct_type_name, field1, field2, ...);`
macro_rules! implement_encode_into_for_struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we planning to use the same approach for the generated code? If so, the macro should live in the codec library.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, probably not. Haven't 100% decided yet, but I think it'd be better for the generated code to just generate the encoding normally, and not use a macro. Thoughts?

This macro is just for my own convenience. You can even see it as a kind of "psuedo code-gen".

#[derive(Clone, Debug)]
pub struct Field {
pub entity_info: EntityInfo,
pub tag: Option<i32>, // TODO: varint32 isn't a real type?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is not real?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to remove this comment and open a proposal to add a varint32 (etc.) type to the Rust mapping.

I think it's cleaner to have a separate type for this, instead of needing to call special encode_varint functions. It makes the API more consistent (you always call encoder.encode, no special functions), and will let users encode Rust structs directly without needing Slice files (otherwise, there's no way to specify whether an i32 should use the int32 or the varint32 encoding). That's a future proposal though.


// Pull in the core compiler types using aliases to disambiguate them from the Slice-compiler definitions.
// Any type that starts with 'Compiler' is a compiler type, not a mapped Slice-compiler definition type.
#![cfg_attr(rustfmt, rustfmt_skip)] // skip the `use ... as ...` to keep them one-per-line.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just qualify the grammar types, like in:

fn get_attributes_from(attributes: Vec<&slicec::grammar::Attribute>) -> Vec<Attribute> {

?

/// This struct exposes a function ([`SliceFileContentsConverter::convert`]) that converts the contents of a Slice file
/// from their AST representation, to a representation that can be encoded with the Slice encoding.
#[derive(Debug)]
pub struct SliceFileContentsConverter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of this struct, can we just have a plain function that does the conversion. I don't see the advantage here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, I tried implementing everything using simple From<T> conversions, but this doesn't work for something like myfield: Sequence<bool>. When converting this field, you have to yield 2 Symbols. One for the field, and one for the Sequence<bool> anonymous type. A From conversion can't capture this.

So, I added this struct with a dedicated field. This way, 1 conversion can yield multiple Symbols.
For example convert_field will return a Field, but under the hood, it's free to convert as many anonymous types as necessary and push them directly into the converted_contents field.

I'm open to other ideas. Not the biggest fan of this struct either, but I couldn't find a cleaner way.
It's basically the same as a Visitor, except it happens to not impl Visitor for reasons.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants