add a benchmark for ericniebler/stl2#34 (#181)

Closed
ericniebler wants to merge 1 commit into master from benchmark

Conversation

@ericniebler (Collaborator) commented Aug 6, 2018

Thus proving that @tomaszkam's "itinerary" example can be implemented efficiently in STL2 with very minor changes, leaving the CommonReference requirement intact on StrictTotallyOrderedWith.

The benchmark, compiled with gcc-8.1 on a MacBook, gives me the following results:

2018-08-06 15:03:41
Running benchmark/itinerary
Run on (8 X 2800 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 262K (x4)
  L3 Unified 6291K (x1)
-----------------------------------------------------------------------
Benchmark                                Time           CPU Iterations
-----------------------------------------------------------------------
ItineraryFixture/STL1/1024/1            30 ns         30 ns   22817059
ItineraryFixture/STL1/4096/1            37 ns         37 ns   18764693
ItineraryFixture/STL1/32768/1           59 ns         59 ns   13328002
ItineraryFixture/STL1/262144/1          79 ns         78 ns    9318052
ItineraryFixture/STL1/1048576/1         97 ns         94 ns    8003842
ItineraryFixture/STL1/1024/2            30 ns         30 ns   21461070
ItineraryFixture/STL1/4096/2            38 ns         38 ns   18467075
ItineraryFixture/STL1/32768/2           51 ns         51 ns   13753807
ItineraryFixture/STL1/262144/2          68 ns         68 ns   10493966
ItineraryFixture/STL1/1048576/2         87 ns         87 ns    7763717
ItineraryFixture/STL2/1024/1            26 ns         26 ns   27363845
ItineraryFixture/STL2/4096/1            31 ns         31 ns   23047544
ItineraryFixture/STL2/32768/1           45 ns         45 ns   15155583
ItineraryFixture/STL2/262144/1          62 ns         62 ns   11136212
ItineraryFixture/STL2/1048576/1         72 ns         71 ns    9794045
ItineraryFixture/STL2/1024/2            24 ns         24 ns   29091272
ItineraryFixture/STL2/4096/2            31 ns         31 ns   22127950
ItineraryFixture/STL2/32768/2           45 ns         45 ns   14772140
ItineraryFixture/STL2/262144/2          66 ns         66 ns   10266037
ItineraryFixture/STL2/1048576/2         76 ns         76 ns    8112086

That is, according to the benchmark, STL2 is actually slightly faster than the STL1 code. Go figure.

This relates to ericniebler/stl2#34.

@ericniebler (Collaborator, Author) commented Aug 6, 2018

Someone (@tomaszkam perhaps) should check my work.

A word about why the common_type specializations are not necessary. All the view adaptors in P0789 are implicitly convertible to Container-like things. So, a view::transform already has a common type with a compatible std::vector; namely, std::vector. (EDIT: @tomaszkam notes this already in his draft paper for San Diego, not linked here.) Nevertheless, it may be a good idea to add a common_type<Range, Range> specialization (the common type is any_view, which is yet to be proposed).

Also, this solution provides strong motivation for providing relational operators for ranges, which default to ranges::equal (for == and !=) and ranges::lexicographical_compare (for <, >, <=, and >=).

cc'ing @asutton and @sean-parent who have previously expressed the opinion that the "common type" as discussed in N3351 need only be notional, not manifest in a program. I may yet be convinced of this, but my gut says that if we are not checking it syntactically, the requirement will be ignored.

EDIT: I'll also note that once the range-based relational operators and common_type specializations are added to the proposal, the STL2 solution to this problem is much shorter and easier to reason about, in addition to being faster. And contrary to what Tomasz claims in his paper, no compiler heroics are needed to optimize this code. It's just inlining and constant propagation.

@CaseyCarter (Owner)

Also, this solution provides strong motivation for providing relational operators for ranges, which default to ranges::equal (for == and !=) and ranges::lexicographical_compare (for <, >, <=, and >=).

Egads no. WG21 just got done arguing that span shouldn't have an equality operator that compares the elements it denotes because they are subject to change by operations that don't modify the span. The element values cannot be salient to the value of a range that doesn't own those elements.

@ericniebler (Collaborator, Author) commented Aug 7, 2018

Oops, you're right about that. :-P

EDIT: OK, fixed. The relational operators were unnecessary anyway.

@ericniebler ericniebler force-pushed the benchmark branch 2 times, most recently from e88965e to 29d576f Compare August 7, 2018 04:32
@tomaszkam commented Aug 7, 2018

EDIT: I'll also note that once the range-based relational operators and common_type specializations are added to the proposal, the STL2 solution to this problem is much shorter and easier to reason about, in addition to being faster. And contrary to what Tomasz claims in his paper, no compiler heroics are needed to optimize this code. It's just inlining and constant propagation.

You have proved only that there is a single compiler (the code can currently be used with only one) and a single machine that give the same speed for the STL1 and STL2 versions. Meanwhile, as you are aware (see your own blog post), compilers have given very different results for such tests in the past. Note that due to the Concepts TS requirement, we cannot check more than one compiler.

@ericniebler (Collaborator, Author)

I can implement the same benchmark in range-v3 and show the same effect on any compiler (capable of compiling range-v3, that is). The only reason you were seeing any slow-down at all was because your STL1 code wasn't using pointers-to-members and your STL2 code was. That is, you were comparing apples to oranges. (Compilers do a poor job of optimizing pointers-to-members, possibly because they are not that common in end-user code.)

As for that old blog post, there are some views that are difficult for some compilers to optimize; view::join for instance, which is used to implement view::for_each. On the other hand, view::transform is pretty easy for compilers to optimize. I wouldn't use that post to conclude that ranges are inefficient or difficult to optimize in general. And it's certainly not the case that the CommonReference requirement on cross-type concepts has anything to do with clang's inability to optimize the Pythagorean triples example.

@tomaszkam commented Aug 7, 2018

A few things regarding the request:

  1. The code has UB: the comparison of any_view does not match the comparison used by the algorithms, i.e. the semantic requirements of StrictWeakOrdering are not satisfied. This is important because any_view is currently not proposed, so it may not be available to users of C++20.
  2. To make the code shorter (compared to STL1) you are using the fact that the algorithms are function objects (not functions), which again is not true for C++20. This may be fixed by changing the invocation to:
benchmark::DoNotOptimize(
     ranges::equal_range(
         itineraries,
         dates,
         [](auto const& r1, auto const& r2) { return ranges::lexicographical_compare(r1, r2); },
         toDates));

So please provide a solution that actually works with the current working draft for C++20, or at least with something proposed for inclusion. Any other option may not (and probably won't) be available to programmers.

@tomaszkam commented Aug 7, 2018

The fact is I am unable to reproduce your result on my system. I am using g++-7:

COLLECT_GCC=g++-7
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.3.0-16ubuntu3' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --with-as=/usr/bin/x86_64-linux-gnu-as --with-ld=/usr/bin/x86_64-linux-gnu-ld --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)

The program is then compiled as follows (I have wrapped lexicographical_compare in a lambda, to test code that could be written against the standard):
g++-7 -O2 -std=c++17 -fconcepts itinerary.cpp -Icmcstl2/include/ -lbenchmark -lpthread
Note: -O2 is the optimization level that the projects I work on are usually built with.

And I get the following result on my PC with Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz:

Running ./a.out
Run on (8 X 4200 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
-----------------------------------------------------------------------
Benchmark                                Time           CPU Iterations
-----------------------------------------------------------------------
ItineraryFixture/STL1/1024/1            26 ns         26 ns   27150714
ItineraryFixture/STL1/4096/1            33 ns         33 ns   21486153
ItineraryFixture/STL1/32768/1           52 ns         52 ns   14354169
ItineraryFixture/STL1/262144/1          65 ns         65 ns   10948499
ItineraryFixture/STL1/1048576/1         86 ns         86 ns    8321704
ItineraryFixture/STL1/1024/2            24 ns         24 ns   29284085
ItineraryFixture/STL1/4096/2            31 ns         31 ns   22932089
ItineraryFixture/STL1/32768/2           47 ns         47 ns   16559947
ItineraryFixture/STL1/262144/2          59 ns         59 ns   12459084
ItineraryFixture/STL1/1048576/2         79 ns         79 ns    9176410
ItineraryFixture/STL2/1024/1            40 ns         40 ns   17388324
ItineraryFixture/STL2/4096/1            47 ns         47 ns   14701711
ItineraryFixture/STL2/32768/1           64 ns         64 ns   10895889
ItineraryFixture/STL2/262144/1          77 ns         77 ns    9109058
ItineraryFixture/STL2/1048576/1         98 ns         98 ns    7005623
ItineraryFixture/STL2/1024/2            36 ns         36 ns   19695671
ItineraryFixture/STL2/4096/2            43 ns         43 ns   16237604
ItineraryFixture/STL2/32768/2           58 ns         58 ns   11944594
ItineraryFixture/STL2/262144/2          78 ns         78 ns    9373197
ItineraryFixture/STL2/1048576/2         98 ns         98 ns    7356964

Note that to justify the claim I made in the paper, I am only required to show that some users are unable to tap into the optimizations that give equal performance (excluding, of course, builds without any optimization enabled). However, to show that STL2 is an equally efficient abstraction, we need to show that on-par performance will be available to all C++ users, including those targeting embedded or other devices.

@ericniebler (Collaborator, Author)

The code has UB, as a comparison of any_view does not match the comparison of the algorithms

I took a shortcut. Of course a full solution would completely flesh out any_view, and there is nothing hard about its implementation.

you are using fact that algorithms are functors (not functions), which again is not true for C++20

They are (disguised) function objects, so my code will work as written in C++20. (We're a little circumspect in the current draft about whether the algorithms in std::ranges are function objects or not, but in truth there is no other way to implement them.)

And here is my result for gcc-7.3, again showing no difference. I use -O3 -DNDEBUG, so maybe that's what you're seeing?

2018-08-07 12:05:11
Running ./benchmark/itinerary
Run on (8 X 2800 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 262K (x4)
  L3 Unified 6291K (x1)
-----------------------------------------------------------------------
Benchmark                                Time           CPU Iterations
-----------------------------------------------------------------------
ItineraryFixture/STL1/1024/1            33 ns         33 ns   20874710
ItineraryFixture/STL1/4096/1            40 ns         40 ns   16970890
ItineraryFixture/STL1/32768/1           63 ns         63 ns   12080838
ItineraryFixture/STL1/262144/1          80 ns         80 ns    8872440
ItineraryFixture/STL1/1048576/1         94 ns         94 ns    7655126
ItineraryFixture/STL1/1024/2            31 ns         31 ns   22644122
ItineraryFixture/STL1/4096/2            38 ns         38 ns   18323168
ItineraryFixture/STL1/32768/2           54 ns         54 ns   12208096
ItineraryFixture/STL1/262144/2          72 ns         72 ns    9740757
ItineraryFixture/STL1/1048576/2         90 ns         90 ns    7595404
ItineraryFixture/STL2/1024/1            26 ns         26 ns   26716741
ItineraryFixture/STL2/4096/1            32 ns         32 ns   21591344
ItineraryFixture/STL2/32768/1           50 ns         50 ns   13999720
ItineraryFixture/STL2/262144/1          65 ns         65 ns   10741963
ItineraryFixture/STL2/1048576/1         74 ns         74 ns    9471363
ItineraryFixture/STL2/1024/2            23 ns         23 ns   29850619
ItineraryFixture/STL2/4096/2            31 ns         31 ns   23021999
ItineraryFixture/STL2/32768/2           46 ns         46 ns   15243637
ItineraryFixture/STL2/262144/2          65 ns         65 ns   10831386
ItineraryFixture/STL2/1048576/2         79 ns         79 ns    8998355

Again, the STL2 code is shorter, simpler, more readable, and faster, even with the CommonReference requirement. The STL1 code defines semantically meaningless relations over apples and oranges. I see no reason for it.

@tomaszkam

Yes, the difference is caused by -O2 vs -O3. However, the apps I have contact with consistently use -O2, and for them we are seeing a performance drop caused by the compiler not optimizing the code - exactly the point I made in the paper.

Again, the STL2 code is shorter, simpler, more readable, and faster, even with the CommonReference requirement. The STL1 code defines semantically meaningless relations over apples and oranges. I see no reason for it.

I totally disagree with this statement.

  1. The code is not faster for some build settings. -O2 is a reasonable and, as far as I know, widely used optimization level.
  2. Readability is totally subjective.
  3. The mathematical notion of a weak ordering allows building a relation over different types (see my point 3.2), and such a relation is as meaningful as any other weak ordering. Including the additional requirement makes the relation overconstrained and, as a consequence, less generic - see points 3.3 and 3.5 in the paper.

@CaseyCarter (Owner) commented Aug 7, 2018

I suggest you both test with the implementation on the benchmark_fixes branch, which eliminates variance between the library implementations by using the cmcstl2 algorithms for both the "STL1" and "STL2" benchmarks. (Obviously the constraints for the "STL1" code have been largely removed.)

EDIT: My numbers for g++-7 (Ubuntu 7.3.0-21ubuntu1~16.04) with -O2 -DNDEBUG (running in WSL, so there will be some noise in the timings):

2018-08-07 12:36:44
Running ./a.out
Run on (8 X 3601 MHz CPU s)
-----------------------------------------------------------------------
Benchmark                                Time           CPU Iterations
-----------------------------------------------------------------------
ItineraryFixture/STL1/1024/1            29 ns         29 ns   24888889
ItineraryFixture/STL1/4096/1            33 ns         32 ns   21333333
ItineraryFixture/STL1/32768/1           50 ns         50 ns   10000000
ItineraryFixture/STL1/262144/1          64 ns         64 ns   11200000
ItineraryFixture/STL1/1048576/1         76 ns         75 ns    8960000
ItineraryFixture/STL1/1024/2            28 ns         28 ns   24888889
ItineraryFixture/STL1/4096/2            39 ns         39 ns   17920000
ItineraryFixture/STL1/32768/2           50 ns         50 ns   11200000
ItineraryFixture/STL1/262144/2          71 ns         71 ns   11200000
ItineraryFixture/STL1/1048576/2         81 ns         82 ns    8960000
ItineraryFixture/STL2/1024/1            26 ns         26 ns   26352941
ItineraryFixture/STL2/4096/1            33 ns         32 ns   20363636
ItineraryFixture/STL2/32768/1           49 ns         49 ns   14451613
ItineraryFixture/STL2/262144/1          67 ns         68 ns    8960000
ItineraryFixture/STL2/1048576/1         71 ns         71 ns   11200000
ItineraryFixture/STL2/1024/2            24 ns         24 ns   28000000
ItineraryFixture/STL2/4096/2            33 ns         34 ns   21333333
ItineraryFixture/STL2/32768/2           52 ns         52 ns   11200000
ItineraryFixture/STL2/262144/2          66 ns         66 ns    8960000
ItineraryFixture/STL2/1048576/2         82 ns         82 ns    8960000

I see no significant differences between the STL1 and STL2 benchmarks.

@CaseyCarter (Owner)

@tomaszkam, I also suggest you compile -DNDEBUG to disable the debug checking in cmcstl2.

@tomaszkam

Adding -DNDEBUG to -O2 does not help:

> ./a.out 
2018-08-07 21:37:03
Running ./a.out
Run on (8 X 4200 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
-----------------------------------------------------------------------
Benchmark                                Time           CPU Iterations
-----------------------------------------------------------------------
ItineraryFixture/STL1/1024/1            26 ns         26 ns   27149589
ItineraryFixture/STL1/4096/1            33 ns         33 ns   21467442
ItineraryFixture/STL1/32768/1           50 ns         50 ns   15329865
ItineraryFixture/STL1/262144/1          65 ns         65 ns   10858852
ItineraryFixture/STL1/1048576/1         82 ns         82 ns    8905017
ItineraryFixture/STL1/1024/2            24 ns         24 ns   29270811
ItineraryFixture/STL1/4096/2            31 ns         31 ns   22936784
ItineraryFixture/STL1/32768/2           46 ns         46 ns   16075759
ItineraryFixture/STL1/262144/2          59 ns         59 ns   12132677
ItineraryFixture/STL1/1048576/2         78 ns         78 ns    9288086
ItineraryFixture/STL2/1024/1            37 ns         37 ns   19111440
ItineraryFixture/STL2/4096/1            46 ns         46 ns   14923145
ItineraryFixture/STL2/32768/1           64 ns         64 ns   11011531
ItineraryFixture/STL2/262144/1          79 ns         79 ns    8617140
ItineraryFixture/STL2/1048576/1        107 ns        107 ns    6865674
ItineraryFixture/STL2/1024/2            31 ns         31 ns   22421557
ItineraryFixture/STL2/4096/2            42 ns         42 ns   16700593
ItineraryFixture/STL2/32768/2           56 ns         56 ns   12402371
ItineraryFixture/STL2/262144/2          74 ns         74 ns    9649730

Bumping the optimization level up to -O3 at least gives me results consistent with Eric's:

> ./a.out 
2018-08-07 21:39:21
Running ./a.out
Run on (8 X 4200 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
-----------------------------------------------------------------------
Benchmark                                Time           CPU Iterations
-----------------------------------------------------------------------
ItineraryFixture/STL1/1024/1            26 ns         26 ns   26607868
ItineraryFixture/STL1/4096/1            33 ns         33 ns   21068874
ItineraryFixture/STL1/32768/1           49 ns         49 ns   14726723
ItineraryFixture/STL1/262144/1          65 ns         65 ns   10625095
ItineraryFixture/STL1/1048576/1         78 ns         78 ns    8931364
ItineraryFixture/STL1/1024/2            25 ns         25 ns   28320784
ItineraryFixture/STL1/4096/2            31 ns         31 ns   22362341
ItineraryFixture/STL1/32768/2           47 ns         47 ns   16273846
ItineraryFixture/STL1/262144/2          59 ns         59 ns   12040877
ItineraryFixture/STL1/1048576/2         80 ns         80 ns    9351142
ItineraryFixture/STL2/1024/1            22 ns         22 ns   31693527
ItineraryFixture/STL2/4096/1            27 ns         27 ns   25750684
ItineraryFixture/STL2/32768/1           42 ns         42 ns   17059459
ItineraryFixture/STL2/262144/1          53 ns         53 ns   12459202
ItineraryFixture/STL2/1048576/1         71 ns         70 ns   10075356
ItineraryFixture/STL2/1024/2            19 ns         19 ns   36026087
ItineraryFixture/STL2/4096/2            24 ns         24 ns   29528166
ItineraryFixture/STL2/32768/2           40 ns         40 ns   18340397
ItineraryFixture/STL2/262144/2          57 ns         57 ns   13138066
ItineraryFixture/STL2/1048576/2         70 ns         70 ns   10785384

However, a change in optimization level may not be an option for some projects.

@CaseyCarter (Owner) commented Aug 7, 2018

They are (disguised) function objects, so my code will work as written in C++20. (We're a little circumspect in the current draft about whether the algorithms in std::ranges are function objects or not, but in truth there is no other way to implement them.)

An implementation is in no way required to implement them as function objects: it's only required that they are not findable by ADL and inhibit ADL when found via unqualified name lookup. There's no other pure library implementation, but a front end developer could throw together a custom attribute to provide those properties for function templates in a day. Your code may or may not work as written in C++20.

(I really, really want to force EWG to fix functions so we can - oddly enough - actually use them in libraries.)

@CaseyCarter (Owner)

Putting -NDEBUG with -O2 does not help:

Is that with itinerary.cpp from #182? I can't explain the differences we're seeing from the same command line and almost precisely the same compiler.

@tomaszkam commented Aug 7, 2018

Is that with itinerary.cpp from #182? I can't explain the differences we're seeing from the same command line and almost precisely the same compiler.

I used itinerary.cpp from this commit (copied raw data) and applied following changes:

     ranges::equal_range(
         itineraries,
         dates,
         ranges::lexicographical_compare,
        toDates));

to something that is guaranteed to work with the standard, i.e.:

     ranges::equal_range(
         itineraries,
         dates,
         [](auto const& r1, auto const& r2) { return ranges::lexicographical_compare(r1, r2); },
        toDates));

I also have BENCHMARK_MAIN(); in the file to make it runnable, and I am running it against the current master of cmcstl2 (e02b89a).

@ericniebler (Collaborator, Author)

@CaseyCarter wrote:

An implementation is in no way required to implement them as function objects... Your code may or may not work as written in C++20.

Bah, humbug! Being able to pass functions to functions is fundamental and very useful. It galls me that implementors will write them as objects (mark my words <shakes cane>), but we're not allowed to treat them as such.

For the record, I see no performance impact after lifting ranges::lexicographical_compare into a trivial lambda.

@tomaszkam wrote:

the difference is caused by the usage of -O2 vs -O3, however, the apps I have contact with consistently use O2, for them, we are seeing performance drop caused by compiler not optimizing the code - exactly the point I made in paper.

If the argument you are bringing to San Diego is that you should be free to write worse code because it optimizes a little better at lower optimization settings, I can tell you how that argument will go over.

Don't get me wrong. You have made some good points in your paper; in particular the unfortunate integer promotion rules and their interaction with the semantic constraints of Relation are a legitimate problem. Your argument about performance is not helping your case, though. You should either find a better example that substantiates your claim about performance or drop that argument. Your paper will be better for it.

@CaseyCarter (Owner)

I changed ... to something that is guaranteed to work with standard

I've made a similar change to #182 (I stored the lambda in a named variable outside the benchmark loop). I still have results that differ by at most 5ns at both -O2 and -O1 (-O0 is a very different story: the STL2 benchmarks take ~twice the time).

@CaseyCarter (Owner)

Being able to pass functions to functions is fundamental and very useful.

Being able to pass function templates and overload sets to functions would be useful as well. I consider it a defect in C++'s support for the functional paradigm that we can't do this today.

@ericniebler (Collaborator, Author) commented Aug 7, 2018

@tomaszkam One other suggestion for your paper: rather than simply dropping the CommonReference syntactic requirement, you could make it a purely semantic requirement instead. Appeal to the existence of a common type that is outside the type system. So long as such a type can be implemented in theory, it need not exist in practice. The relaxation of Common in N3351 hints at as much in D.2, and @sean-parent and @asutton have both said as much explicitly. Without something like that, I cannot support your paper (and may still argue against it).

Given such a purely semantic common type constraint on Relation, your STL1 code is still questionable.

@tomaszkam commented Aug 8, 2018

@tomaszkam One other suggestion for your paper: rather than simply dropping the CommonReference syntactic requirement, you could make it a purely semantic requirement instead. Appeal to the existence of a common type that is outside the type system. So long as such a type can be implemented in theory, it need not exist in practice. The relaxation of Common in N3351 hints at as much in D.2, and @sean-parent and @asutton have both said as much explicitly. Without something like that, I cannot support your paper (and may still argue against it).

But please provide a justification for this requirement in the case of weak orderings, where the objects are merely equivalent. The underlying mathematical model allows building a relation over different types (the domain is just the union of the objects of the different types), and the definition in the book your work is based on (Elements of Programming) also does not mention such a requirement. From that perspective, this requirement is purely a product of your proposal.

Saying "it is meaningless" without pointing out any concrete problem (broken semantic models, inability to perform runtime checks, an algorithm not working) is not an argument; it is just an opinion, and one that has no support in the underlying mathematical model. This is made even more evident by the fact that you can break your own semantic requirement (see your any_view example) and the algorithms work without a problem - a requirement that is never used is superfluous.

There is a major difference between weak orderings (comparators) and the strong orderings (<, ==): one allows equivalent objects (and things are equivalent if they form an equivalence relation - three axioms are required for that and nothing more), while the other makes objects equal (the same, in the mathematical sense). For the latter I agree that we should be able to point to a common_type: the objects are meant to represent the same underlying entity, so they should have a common reference (a case very similar to Assignability, for example).

@tomaszkam commented Aug 9, 2018

I am willing to assume that the difference in results for the test example is actually a fault on my side. I am also planning to check it on a low-end AMD laptop, to see whether it may be architecture-related.

The example was a simplification of an actual use case we have, i.e. comparing the operating flights of itineraries. I have prepared an updated example that compares the departures of actual flights (more fields should be compared, but I want to build it up iteratively). I want to check whether I can reimplement it with STL2 without a performance loss.

However, I have miserably failed to build the STL2 equivalent of that code. My attempt may be found here or on my GitHub - I get compilation errors from view::join.

Also, if there is a better way to express such code, I would be happy to change it.

@tomaszkam commented Aug 9, 2018

The difference from the previous example is that now, instead of transforming an Itinerary into a vector<Date>, I notionally transform it into a std::vector<std::vector<Date>> and then flatten it.

The code to construct the view looks as follows:

inline constexpr auto toDate = [](const Flight& f)
{
    return f.departure_date();
};

inline constexpr auto toFlightsDates = [](const Leg& l)
{
    return l.flights() | ranges::view::transform(toDate);
};

inline constexpr auto toDates = [](Itinerary& i)
{
    return ranges::view::join(i.legs() | ranges::view::transform(toFlightsDates));
};

Now I am able to construct such a view and iterate over it, but the invocation of lexicographical_compare fails, in the instantiation of iter_common_reference.

Edit:
Removing all occurrences of common_reference helped, and the code now compiles. Also, I had forgotten that begin is lazy (non-const) for such views, so the following wrapper does not work:
[](auto const& r1, auto const& r2) { return ranges::lexicographical_compare(r1, r2); }

and should be:
[](auto&& r1, auto&& r2) { return ranges::lexicographical_compare(r1, r2); }

Edit 2:
From this test I am getting the same result as from the previous example: a drop in performance at -O2, and a slight increase at -O3.
