Conversation
* allow replacing builder in stream reader - useful for reading large streams
| Buffer.MemoryCopy((void*)source, dest, newStr.Length * 2, used * 2); | ||
| } | ||
|
|
||
| str = newStr; |
There was a problem hiding this comment.
It's not necessary to copy the contents of the old backing string to the new one. The previously created ValueStrings will still reference the original string. The only observable change after a resize is that the contents of old ValueStrings won't become garbage when DestroyPreviousValueStrings is called (but this would stay true even if the patch is applied).
There was a problem hiding this comment.
The old ValueString indeed still contains the old content, but when looking at the MultiValueStringBuilder as a whole, wouldn't the old content be missing?
I ran into this when using the builder in conjunction with the ValueStringStreamReader.
Seems like with the original behavior, the builder resets itself when EnsureSpace is called, which is not how lists/StringBuilders behave in general. Or am I missing something?
BTW, this is a great lib. Thanks for publishing this.
I managed to improve the CPU utilization of a heavy process (parsing huge .csv files) from ~3% to 50%!
This is a good thing, as it was previously spending ~70% of time in GC, hence blocked most of the time without doing real work.
There was a problem hiding this comment.
I'm glad it has been useful:)
Anyways, MultiValueStringBuilder is intended to efficiently produce multiple ValueStrings (from char[]s or StringBuilders) with minimal GC (because ValueStrings are backed by real CLR strings). It's not intended to provide (nor does it provide) a StringBuilder-like API. There actually is a Concatenate method, but what it's doing is just a performance optimization in the case the provided ValueStrings are already in contiguous memory.
There was a problem hiding this comment.
But the builder does append to itself in some cases.
I think it'll more consistent if it does so always, especially if the name ("..StringBuilder") suggests so :)
At any rate, without the fix, the StreamReader generates wrong lines when the block-size (used in the builder) is reached.
| return hash1 + (hash2 * 1566083941); | ||
|
|
||
| } | ||
|
|
There was a problem hiding this comment.
I think this might be a too specific use case, especially considering that case-insensitive comparison is not even available in this library.
There was a problem hiding this comment.
I had to support case-insensitive equality, which is quite frequent in strings.
I can contribute the IEqualityComparer I implemented to make this more integrated if you wish.
I replaced the deprecated project.json with a new .csproj.
Fixed also various bugs discovered while working with large streams/strings, and introduced some supporting changes.