Conversation
Char.len_utf8 is stable since rust-lang#49698, making this a little easier to follow.
|
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @rkruppe (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
|
@bors rollup |
|
cc @SimonSapin |
|
Thanks, LGTM! @bors r+ |
|
📌 Commit 7af83dc has been approved by |
|
|
|
@bors r- (let's wait with merging until this performance question is cleared up) |
|
I don’t read assembly fluently enough to conclude anything from this: |
|
I created a sample benchmark to compare the two implementations: benchmark#![feature(test)]
extern crate test;
use std::str::Chars;
pub struct CharIndices<'a> {
front_offset: usize,
iter: Chars<'a>,
}
pub fn before(self_: &mut CharIndices) -> Option<(usize, char)> {
let pre_len = self_.iter.as_str().len();
match self_.iter.next() {
None => None,
Some(ch) => {
let index = self_.front_offset;
let len = self_.iter.as_str().len();
self_.front_offset += pre_len - len;
Some((index, ch))
}
}
}
pub fn after(self_: &mut CharIndices) -> Option<(usize, char)> {
let ch = self_.iter.next()?;
let index = self_.front_offset;
self_.front_offset += ch.len_utf8();
Some((index, ch))
}
#[cfg(test)]
mod tests {
use super::*;
use test::Bencher;
#[bench]
fn before(b: &mut Bencher) {
let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
let len = s.chars().count();
b.iter(|| {
let mut chars = CharIndices { front_offset: 0, iter: s.chars() };
let mut i = 0;
while let Some(_) = super::before(&mut chars) {
i += 1;
}
assert_eq!(i, len);
});
}
#[bench]
fn after(b: &mut Bencher) {
let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
let len = s.chars().count();
b.iter(|| {
let mut chars = CharIndices { front_offset: 0, iter: s.chars() };
let mut i = 0;
while let Some(_) = super::after(&mut chars) {
i += 1;
}
assert_eq!(i, len);
});
}
}It appears they're roughly the same speed? It might be that I'm just doing something wrong, the assembly definitely looks more complex the the after output above. Before: https://rust.godbolt.org/z/sIJTXu |
|
Nevermind, when I updated the benchmark to actually use the index, it's slower: benchmark using index#![feature(test)]
extern crate test;
use std::str::Chars;
pub struct CharIndices<'a> {
front_offset: usize,
iter: Chars<'a>,
}
pub fn before(self_: &mut CharIndices) -> Option<(usize, char)> {
let pre_len = self_.iter.as_str().len();
match self_.iter.next() {
None => None,
Some(ch) => {
let index = self_.front_offset;
let len = self_.iter.as_str().len();
self_.front_offset += pre_len - len;
Some((index, ch))
}
}
}
pub fn after(self_: &mut CharIndices) -> Option<(usize, char)> {
let ch = self_.iter.next()?;
let index = self_.front_offset;
self_.front_offset += ch.len_utf8();
Some((index, ch))
}
#[cfg(test)]
mod tests {
use super::*;
use test::Bencher;
#[bench]
fn before(b: &mut Bencher) {
let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
let len = s.len();
b.iter(|| {
let mut chars = CharIndices { front_offset: 0, iter: s.chars() };
let mut i = 0;
while let Some((index, _)) = super::before(&mut chars) {
i = index;
}
assert_eq!(i + 1, len);
});
}
#[bench]
fn after(b: &mut Bencher) {
let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
let len = s.len();
b.iter(|| {
let mut chars = CharIndices { front_offset: 0, iter: s.chars() };
let mut i = 0;
while let Some((index, _)) = super::after(&mut chars) {
i = index;
}
assert_eq!(i + 1, len);
});
}
} |
Char.len_utf8is stable since #49698, making this a little easier to follow.