Owning reference to a string with O(1) access to lines#3
Owning reference to a string with O(1) access to lines#3pacak wants to merge 2 commits intovallentin:masterfrom
Conversation
|
Any thoughts? |
|
Hey. Sorry. I completely overlooked this PR. I just randomly did some changes today, and well, your new comment pinged me. I'll just take a look at this PR :) |
|
I have a few questions and thoughts. So feel free to respond to those, before you spend time reworking the PR. (I'll review the PR now, and add the questions and thoughts as comments) |
vallentin
left a comment
There was a problem hiding this comment.
Again. Sorry for the lack or response until now. I somehow overlooked the original notification for this PR
| /// - [`with_ending`](CachedLines::with_ending) | ||
| /// | ||
| /// | ||
| #[cfg(feature = "alloc")] |
There was a problem hiding this comment.
Might be easier to avoid needing all the #[cfg(feature = "alloc")], by having a:
#[cfg(feature = "alloc")]
mod cached;and then adding this implementation to a cached.rs module.
There was a problem hiding this comment.
I'm okay either way, was mostly trying to keep things in one file, second alternative is
mod cached {
...
}Will do.
| #[cfg(feature = "alloc")] | ||
| pub struct CachedLines { | ||
| content: alloc::string::String, | ||
| splits: alloc::vec::Vec<Range<usize>>, |
There was a problem hiding this comment.
Maybe instead we could have a:
pub struct LineSpanWithoutText {
start: usize,
end: usize,
ending: usize,
}and then:
| splits: alloc::vec::Vec<Range<usize>>, | |
| splits: alloc::vec::Vec<LineSpanWithoutText>, |
That way we could replace with_ending() and without_ending() with new()
There was a problem hiding this comment.
It's been a while but as far as I remember the reasoning was that you are very unlikely to use both representations at once and having just one representation gives Index and iterator goodies.
| pub fn get(&self, index: usize) -> Option<&str> { | ||
| self.content.get(self.splits.get(index)?.clone()) |
There was a problem hiding this comment.
(in extension of the previous comment)
Then have this be something like:
| pub fn get(&self, index: usize) -> Option<&str> { | |
| self.content.get(self.splits.get(index)?.clone()) | |
| pub fn get(&self, index: usize) -> Option<LineSpan<'_>> { | |
| // i.e. from splits get LineSpanWithoutText + self.contents => LineSpan | |
| ... |
| type Output = str; | ||
|
|
||
| fn index(&self, index: usize) -> &Self::Output { | ||
| &self.content[self.splits[index].clone()] |
There was a problem hiding this comment.
Hmm. I was about to suggest calling .get() and returning the LineSpan, but this needs to return a reference
There was a problem hiding this comment.
Index needs to return a reference to so to accommodate both with and without ending - it needs to be decided during construction. Either two different constructors - like I did or using either newtype/type parameter.
|
|
||
| #[test] | ||
| fn test_cached_lines_no_endings() { | ||
| let text = "\r\nfoo\nbar\r\nbaz\nqux\r\n\r"; |
There was a problem hiding this comment.
Not that Rust 1.70 changed the behavior, so trailing \r will appear in the final line. (Just in case this test suddenly)
| pub struct CachedLines { | ||
| content: alloc::string::String, |
There was a problem hiding this comment.
Maybe this can use alloc::borrow::Cow instead to allow for borrowing a &str instead of needing to turn it into an owned String
(Not sure if it creates other issues)
| pub struct CachedLines { | |
| content: alloc::string::String, | |
| pub struct CachedLines<'a> { | |
| content: alloc::borrow::Cow<'a>, |
There was a problem hiding this comment.
I'm okay either way, but Cow adds complexity without adding significant benefits - CachedLines will have to allocate a vector of sizes regardless, adding a single allocation seems to be fine compared to the alternative of allocating and storing a whole bunch of lines.
| /// | ||
| /// | ||
| #[cfg(feature = "alloc")] | ||
| pub struct CachedLines { |
There was a problem hiding this comment.
Finally, maybe it should be called LineSpanCache/CachedLineSpans. If it's changed to collect LineSpan
| #[derive(Debug, Clone)] | ||
| /// Owning pointer to a string that provides O(1) access to line slices | ||
| /// | ||
| /// ``` | ||
| /// use line_span::CachedLines; | ||
| /// let cache = CachedLines::without_ending(String::from("hello\nworld")); | ||
| /// assert_eq!(&cache[0], "hello"); | ||
| /// assert_eq!(&cache[1], "world"); | ||
| /// ``` | ||
| /// | ||
| /// Slices might or might not include line breaks depending on a function used to construct it | ||
| /// | ||
| /// - [`without_ending`](CachedLines::without_ending) | ||
| /// - [`with_ending`](CachedLines::with_ending) | ||
| /// | ||
| /// | ||
| #[cfg(feature = "alloc")] | ||
| pub struct CachedLines { |
There was a problem hiding this comment.
Please move the attribute, to after the comment
#[derive(Clone, Debug)]
pub struct CachedLines {| #[derive(Debug, Clone)] | ||
| /// Owning pointer to a string that provides O(1) access to line slices | ||
| /// | ||
| /// ``` | ||
| /// use line_span::CachedLines; | ||
| /// let cache = CachedLines::without_ending(String::from("hello\nworld")); | ||
| /// assert_eq!(&cache[0], "hello"); | ||
| /// assert_eq!(&cache[1], "world"); | ||
| /// ``` | ||
| /// | ||
| /// Slices might or might not include line breaks depending on a function used to construct it | ||
| /// | ||
| /// - [`without_ending`](CachedLines::without_ending) | ||
| /// - [`with_ending`](CachedLines::with_ending) | ||
| /// | ||
| /// | ||
| #[cfg(feature = "alloc")] | ||
| pub struct CachedLines { |
There was a problem hiding this comment.
I also feel like this should probably be feature gated, with a cache = [] feature
pacak
left a comment
There was a problem hiding this comment.
Currently I inlined proposed changes into my crate and using Index to access strings, code is here: https://github.com/pacak/cargo-show-asm/blob/master/src/cached_lines.rs
There's no rush to get this included :)
Open questions are:
- Cow - yes/now. I decided against it since it's complexity without significant gains
- stored values -
Range<usize>orLinespan- former offers uniform interface forgetandIndex, later seems to be a bit less useful - put under separate feature or not. I decided against it, "alloc" limits what platforms you can run on, having "cached" as a separate feature saves a tiny bit of compilation time at expense of user confusion :)
What are your preferences herE?
| /// - [`with_ending`](CachedLines::with_ending) | ||
| /// | ||
| /// | ||
| #[cfg(feature = "alloc")] |
There was a problem hiding this comment.
I'm okay either way, was mostly trying to keep things in one file, second alternative is
mod cached {
...
}Will do.
| /// | ||
| /// | ||
| #[cfg(feature = "alloc")] | ||
| pub struct CachedLines { |
| pub struct CachedLines { | ||
| content: alloc::string::String, |
There was a problem hiding this comment.
I'm okay either way, but Cow adds complexity without adding significant benefits - CachedLines will have to allocate a vector of sizes regardless, adding a single allocation seems to be fine compared to the alternative of allocating and storing a whole bunch of lines.
| #[cfg(feature = "alloc")] | ||
| pub struct CachedLines { | ||
| content: alloc::string::String, | ||
| splits: alloc::vec::Vec<Range<usize>>, |
There was a problem hiding this comment.
It's been a while but as far as I remember the reasoning was that you are very unlikely to use both representations at once and having just one representation gives Index and iterator goodies.
| type Output = str; | ||
|
|
||
| fn index(&self, index: usize) -> &Self::Output { | ||
| &self.content[self.splits[index].clone()] |
There was a problem hiding this comment.
Index needs to return a reference to so to accommodate both with and without ending - it needs to be decided during construction. Either two different constructors - like I did or using either newtype/type parameter.
| pub fn get(&self, index: usize) -> Option<&str> { | ||
| self.content.get(self.splits.get(index)?.clone()) |
This can be useful if you have a string that you don't want to modify but do want to access line by line in a random order.