Evaluation of BridgeDrive on Fail2Drive #7

Dear Simon Gerstenecker,

First of all, thank you very much for creating such a challenging benchmark. I believe it will have a significant impact on the autonomous driving community.

To investigate the robustness of BridgeDrive, we evaluated it on your benchmark. The results are as follows:

<div align="center">

<table>
 <thead>
 <tr>
 <th rowspan="2">Method</th>
 <th colspan="1">Bench2Drive</th>
 <th colspan="3">Fail2Drive In-Distribution</th>
 <th colspan="3">Fail2Drive Generalization</th>
 </tr>
 <tr>
 <th>DS ↑</th>
 <th>DS ↑</th>
 <th>SR(%) ↑</th>
 <th>HM ↑</th>
 <th>DS ↑</th>
 <th>SR(%) ↑</th>
 <th>HM ↑</th>
 </tr>
 </thead>
 <tbody>
 <tr>
 <td>TCP</td>
 <td align="center">59.9</td>
 <td align="center">24.7</td>
 <td align="center">39.1</td>
 <td align="center">30.3</td>
 <td align="center">24.5 (-0.8%)</td>
 <td align="center">31.4 (-19.7%)</td>
 <td align="center">27.5 (-9.1%)</td>
 </tr>
 <tr>
 <td>UniAD</td>
 <td align="center">45.8</td>
 <td align="center">47.5</td>
 <td align="center">36.3</td>
 <td align="center">41.2</td>
 <td align="center">44.0 (-7.4%)</td>
 <td align="center">27.6 (-24.0%)</td>
 <td align="center">33.9 (-17.6%)</td>
 </tr>
 <tr>
 <td>Orion</td>
 <td align="center">77.8</td>
 <td align="center">53.0</td>
 <td align="center">52.0</td>
 <td align="center">52.5</td>
 <td align="center">51.2 (-3.4%)</td>
 <td align="center">46.0 (-11.5%)</td>
 <td align="center">48.5 (-7.7%)</td>
 </tr>
 <tr>
 <td>HiP-AD</td>
 <td align="center">86.8</td>
 <td align="center">74.1</td>
 <td align="center">70.7</td>
 <td align="center">72.4</td>
 <td align="center">67.1 (-9.4%)</td>
 <td align="center">56.7 (-19.8%)</td>
 <td align="center">61.5 (-15.1%)</td>
 </tr>
 <tr>
 <td>SimLingo</td>
 <td align="center">85.1</td>
 <td align="center">82.6</td>
 <td align="center">79.3</td>
 <td align="center">80.9</td>
 <td align="center">71.7 (-13.2%)</td>
 <td align="center">55.0 (-30.6%)</td>
 <td align="center">62.2 (-23.1%)</td>
 </tr>
 <tr>
 <td>TFv5</td>
 <td align="center">84.2</td>
 <td align="center">83.3</td>
 <td align="center">78.5</td>
 <td align="center">80.8</td>
 <td align="center">75.4 (-9.5%)</td>
 <td align="center">61.1 (-22.2%)</td>
 <td align="center">67.5 (-16.5%)</td>
 </tr>
 <tr>
 <td>TFv6</td>
 <td align="center">95.2</td>
 <td align="center">90.2</td>
 <td align="center">93.3</td>
 <td align="center">91.7</td>
 <td align="center">79.5 (-11.9%)</td>
 <td align="center">70.7 (-24.2%)</td>
 <td align="center">74.8 (-18.4%)</td>
 </tr>
 <tr>
 <td>BridgeDrive (Ours)</td>
 <td align="center">96.3</td>
 <td align="center">91.6</td>
 <td align="center">95.0</td>
 <td align="center">93.3</td>
 <td align="center">81.9 (-10.5%)</td>
 <td align="center">75.0 (-21.1%)</td>
 <td align="center">78.3 (-16.0%)</td>
 </tr>
 </tbody>
</table>

</div>
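As a sanity check, the derived columns can be recomputed from the DS and SR values. The sketch below assumes that HM is the harmonic mean of DS and SR and that the percentages in parentheses are relative changes from the in-distribution split to the generalization split; neither definition is stated in this message, and the function names are illustrative rather than taken from the benchmark code.

```python
def harmonic_mean(ds: float, sr: float) -> float:
    """Harmonic mean of driving score and success rate (assumed meaning of HM)."""
    if ds + sr == 0:
        return 0.0
    return 2 * ds * sr / (ds + sr)


def relative_change(in_dist: float, gen: float) -> float:
    """Percent change from the in-distribution value to the generalization value (assumed)."""
    return (gen - in_dist) / in_dist * 100


# BridgeDrive row from the table above.
id_ds, id_sr = 91.6, 95.0
gen_ds, gen_sr = 81.9, 75.0

id_hm = harmonic_mean(id_ds, id_sr)
gen_hm = harmonic_mean(gen_ds, gen_sr)
sr_change = relative_change(id_sr, gen_sr)

print(f"HM in-distribution: {id_hm:.1f}")   # 93.3
print(f"HM generalization:  {gen_hm:.1f}")  # 78.3
print(f"SR change:          {sr_change:.1f}%")  # -21.1%
```

Under these assumptions the recomputed HM values match the table. Some recomputed percentages differ from the reported ones by about 0.1 point, which would be expected if the reported changes were calculated from unrounded scores.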

For completeness, I can also send you the evaluation JSON files by email.

I believe including these results in your paper would make the comparison more comprehensive, since none of the compared methods is diffusion-based; BridgeDrive would help fill that gap.

I hope you find this message helpful, and I look forward to a fruitful discussion. Cheers!
