Refactor the slicec Binary to Write Encoded Slice Definitions to 'stdout'#729
Refactor the slicec Binary to Write Encoded Slice Definitions to 'stdout'#729InsertCreativityHere wants to merge 7 commits intoicerpc:mainfrom
slicec Binary to Write Encoded Slice Definitions to 'stdout'#729Conversation
| "nonoverlapping", | ||
| "nonterminal", | ||
| "peekable", | ||
| "repr", |
There was a problem hiding this comment.
This is the name of a common rust attribute for enums. It's like specifying an underlying type in Slice. You specify what it's in-memory representation is.
| @@ -6,9 +6,14 @@ use crate::encode_into::*; | |||
| use crate::encoder::Encoder; | |||
| use crate::{Error, InvalidDataErrorKind, Result}; | |||
There was a problem hiding this comment.
The encoder already supported the 'view' types: str and [],
but I updated it to also support the 'owned types: String and Vec.
This is not strictly necessary, but makes the API more convenient, and more consistent (which let use a macro). With this change you can write
encoder.encode(&myVec) instead of encoder.encode(myVec.as_slice()).
Same for String and as_str.
|
|
||
| // We only support `BTreeMap` if the `alloc` crate is available through the `alloc` feature flag. | ||
| // We only support 'owned' sequence/dictionary types if the `alloc` crate is available through the `alloc` feature flag. | ||
| // Note that we always support encoding views into these types (which don't require allocating memory). |
There was a problem hiding this comment.
This sentence makes sense to me, but might be confusing.
It should be read "... support encoding views-into-these-types ..."
Thoughts?
| #[cfg(feature = "alloc")] | ||
| impl EncodeInto<Slice2> for &String { | ||
| fn encode_into(self, encoder: &mut Encoder<impl OutputTarget, Slice2>) -> Result<()> { | ||
| self.as_str().encode_into(encoder) |
There was a problem hiding this comment.
The encoding of &String just delegates to the encoding for &str.
This is purely to save the caller from needing to do the conversion themselves.
| impl<'a, T> EncodeInto<Slice2> for &'a Vec<T> | ||
| where | ||
| &'a T: EncodeInto<Slice2>, | ||
| { |
There was a problem hiding this comment.
Rust's powerful 'blanket-impl' syntax at work here. This translates to:
"Implement the EncodeInto trait for any &Vec<T>, such that &T itself already implements EncodeInto."
And we have to tie the lifetimes of the elements to the vector they're contained within (No one else can mess with the vec while we're encoding it's elements).
| // Pull in the core compiler types using aliases to disambiguate them from the Slice-compiler definitions. | ||
| // Any type that starts with 'Compiler' is a compiler type, not a mapped Slice-compiler definition type. |
There was a problem hiding this comment.
Not sure if there's a better solution here... Alot of type names are identical between slicec and the Compiler Definitions; unsurprisingly.
|
|
||
| // Pull in the core compiler types using aliases to disambiguate them from the Slice-compiler definitions. | ||
| // Any type that starts with 'Compiler' is a compiler type, not a mapped Slice-compiler definition type. | ||
| #![cfg_attr(rustfmt, rustfmt_skip)] // skip the `use ... as ...` to keep them one-per-line. |
There was a problem hiding this comment.
rustfmt does not handle aliased imports well apparently. It tries to format like normal imports, which makes them unreadable.
Tbf, aliases are pretty uncommon in practice.
There was a problem hiding this comment.
Can't we just qualify the grammar types, like in:
fn get_attributes_from(attributes: Vec<&slicec::grammar::Attribute>) -> Vec<Attribute> {
?
There was a problem hiding this comment.
Yeah of course, but then it's easy to accidentally forget to qualify a type (happened multiple times), not to mention it bloats all the function signatures.
If that's what everyone prefers I'm fine with it. But that's what I tried first and didn't like it.
| /// In Slice, doc-comments are not allowed on parameters. Instead, you would use a '@param' tag applied to an enclosing | ||
| /// operation. But this is an implementation detail of the language, not something code-generators should deal with. | ||
| fn get_doc_comment_for_parameter(parameter: &CompilerParameter) -> Option<DocComment> { | ||
| let operation_comment = parameter.parent().comment()?; |
There was a problem hiding this comment.
The ? is the short-circuiting operator for Option and Result types.
If you apply it an Option, and your function also returns an Option, it will immediately return for None cases (what we do here).
Same for Result and Err cases.
| // =========================== // | ||
| // Direct conversion functions // | ||
| // =========================== // | ||
|
|
There was a problem hiding this comment.
For types which are 'simple', we can directly convert between slicec and mapped types. The conventional way to do this in Rust is via the From<T> trait.
| } | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
For everything else, we need to keep some state. So we implement their conversion functions inside a struct which can hold that state.
There was a problem hiding this comment.
Pull request overview
This PR refactors the slicec binary to emit an encoded “generateCode” request to stdout, and introduces handwritten Rust types + conversion code to map the compiler AST into the Slice-compiler “on-the-wire” definitions for encoding.
Changes:
- Add handwritten Slice-compiler definition types with
EncodeInto<Slice2>implementations. - Add an AST → definition-types converter that linearizes top-level + anonymous types for encoding.
- Update
slicecCLI to encode parsed files and write the encoded bytes tostdout.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
slicec/src/main.rs |
Encodes a generateCode request and writes bytes to stdout (replaces prior debug/diagnostic output behavior). |
slicec/src/slice_file_converter.rs |
Converts slicec AST structures into the handwritten definition types, including handling anonymous types. |
slicec/src/definition_types.rs |
Defines handwritten “on-the-wire” types and their Slice2 encoding implementations. |
slice-codec/src/slice2/encoding.rs |
Adds EncodeInto<Slice2> impls for &String and &Vec<T> to support encoding owned containers. |
.vscode/cspell.json |
Adds repr to the spellchecker dictionary. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // TODO: replace this by forking a code-gen plugin once they exist. | ||
| // For now, if there are any diagnostics, we emit those and NOT the encoded definitions. | ||
| // Code-generators can tell if it's okay to decode or not by the presence of the `"generateCode"` string. | ||
| if totals.0 + totals.1 > 0 { | ||
| // If there were diagnostics, print them to 'stdout' and don't encode the Slice definitions. | ||
| print!("Diagnostics: "); | ||
| println!("{totals:?}"); | ||
| for diagnostic in updated_diagnostics { | ||
| println!("{diagnostic:?}"); | ||
| } | ||
| } else { | ||
| // Encode the parsed Slice definitions. | ||
| let encoded_bytes = match encode_generate_code_request(&files) { | ||
| Ok(bytes) => bytes, | ||
| Err(error) => { | ||
| eprintln!("{error:?}"); | ||
| std::process::exit(11); | ||
| } | ||
| }; | ||
|
|
||
| // Obtain an exclusive handle to 'stdout', and write the encoded bytes to it. | ||
| let mut stdout = std::io::stdout().lock(); | ||
| match stdout.write_all(&encoded_bytes) { | ||
| Ok(_) => {} | ||
| Err(error) => { | ||
| eprintln!("{error:?}"); | ||
| std::process::exit(12); | ||
| } | ||
| } | ||
| } | ||
| println!("{ast:?}"); | ||
|
|
||
| std::process::exit(i32::from(totals.1 != 0)); | ||
| // Success. | ||
| std::process::exit(0); |
There was a problem hiding this comment.
main unconditionally exits with code 0, even when totals.1 indicates compilation errors. This breaks the CLI contract and makes it hard to detect failures in scripts/pipelines. Consider preserving the previous behavior (non-zero exit when errors are present) while still skipping binary output when diagnostics exist.
|
|
||
| std::process::exit(i32::from(totals.1 != 0)); | ||
| // Success. | ||
| std::process::exit(0); |
There was a problem hiding this comment.
The success path writes encoded bytes to stdout and then calls std::process::exit(0). process::exit bypasses destructors, so any buffered stdout data may not be flushed, potentially truncating the binary output. Prefer returning from main (no explicit exit), or explicitly flush() stdout before exiting.
| std::process::exit(0); |
|
It would be helpful to "vendor-in" the Compiler definitions you used for encoding, even if all the generated code is hand-written for now. |
Yes, and I plan to do so. But I wanted to do the vendoring in a separate PR, since it's not directly related, and this is already a large PR. But, the hand-mapped definitions I have here should exactly match |
pepone
left a comment
There was a problem hiding this comment.
Looks good, I want to do some testing from C# to ensure the decoding works!
| } | ||
| } | ||
|
|
||
| /// TODO |
There was a problem hiding this comment.
I know the PR didn't change it, but what is the this TODO about, the missing doc comment?
There was a problem hiding this comment.
Yes, just that we should add doc-comments. I could remove these TODOs, but might as well wait until I do the full cleanup & add doc-comments I figure.
| /// | ||
| /// It uses macro-function-syntax, and should be called like: | ||
| /// `implement_encode_into_for_struct!(struct_type_name, field1, field2, ...);` | ||
| macro_rules! implement_encode_into_for_struct { |
There was a problem hiding this comment.
Are we planning to use the same approach for the generated code? If so, the macro should live in the codec library.
There was a problem hiding this comment.
Honestly, probably not. Haven't 100% decided yet, but I think it'd be better for the generated code to just generate the encoding normally, and not use a macro. Thoughts?
This macro is just for my own convenience. You can even see it as a kind of "psuedo code-gen".
| #[derive(Clone, Debug)] | ||
| pub struct Field { | ||
| pub entity_info: EntityInfo, | ||
| pub tag: Option<i32>, // TODO: varint32 isn't a real type? |
There was a problem hiding this comment.
I meant to remove this comment and open a proposal to add a varint32 (etc.) type to the Rust mapping.
I think it's cleaner to have a separate type for this, instead of needing to call special encode_varint functions. It makes the API more consistent (you always call encoder.encode, no special functions), and will let users encode Rust structs directly without needing Slice files (otherwise, there's no way to specify whether an i32 should use the int32 or the varint32 encoding). That's a future proposal though.
|
|
||
| // Pull in the core compiler types using aliases to disambiguate them from the Slice-compiler definitions. | ||
| // Any type that starts with 'Compiler' is a compiler type, not a mapped Slice-compiler definition type. | ||
| #![cfg_attr(rustfmt, rustfmt_skip)] // skip the `use ... as ...` to keep them one-per-line. |
There was a problem hiding this comment.
Can't we just qualify the grammar types, like in:
fn get_attributes_from(attributes: Vec<&slicec::grammar::Attribute>) -> Vec<Attribute> {
?
| /// This struct exposes a function ([`SliceFileContentsConverter::convert`]) that converts the contents of a Slice file | ||
| /// from their AST representation, to a representation that can be encoded with the Slice encoding. | ||
| #[derive(Debug)] | ||
| pub struct SliceFileContentsConverter { |
There was a problem hiding this comment.
What is the point of this struct, can we just have a plain function that does the conversion. I don't see the advantage here.
There was a problem hiding this comment.
Originally, I tried implementing everything using simple From<T> conversions, but this doesn't work for something like myfield: Sequence<bool>. When converting this field, you have to yield 2 Symbols. One for the field, and one for the Sequence<bool> anonymous type. A From conversion can't capture this.
So, I added this struct with a dedicated field. This way, 1 conversion can yield multiple Symbols.
For example convert_field will return a Field, but under the hood, it's free to convert as many anonymous types as necessary and push them directly into the converted_contents field.
I'm open to other ideas. Not the biggest fan of this struct either, but I couldn't find a cleaner way.
It's basically the same as a Visitor, except it happens to not impl Visitor for reasons.
This PR does what the title says! I haven't implemented the process forking (i.e. where you can pass code-gen plugins on the command-line and
slicecwill fork them). That's next up on the list.But you can still pipe the output of
cargo run -- Test.sliceto another process, or a file.It's obviously hard to test this without a decoder. I've been using
cargo run -- Test.slice > slice-output.dat,and opening the file in a hex editor to manually check the bytes. The bytes seem correct for normal Slice files.
Structure
This PR does 3 big things, in 2 files:
definition_types.rs: this file is my 'hand-mapping' the Slice Compiler definitions.It will eventually be replaced by generated code when we add code-gen in Rust.
This file 1) Defines the types that will be sent on-the-wire and 2) implements the encoding of these types.
The encoding is complete, except for
DocCommentswhich require tags (which require size/position tracking which doesn't exist yet).slice_file_converter.rsis the more interesting one. This file provides conversion functions for mapping between domains. It takes types fromslicecand converts them to the Slice-generated types. Most of this is straightforward, but there are some differences between them which required attention.