I think there's a small typo in the function read_csvs_from_path_and_reformat inside CICIDS2017_labelling_fixed_CICFlowMeter.ipynb.
Right now, the code has this line:
int64_columns = ["Total TCP Flow Time"]
But it should actually be:
int64_columns = ["Total Connection Flow Time"]
Why?
After digging into the source files, I found that the correct column name should be "Total Connection Flow Time", not "Total TCP Flow Time". Here's why:
- In FlowFeature.java (line 109), we found the following column:
cum_cnx_time("Total Connection Flow Time", "TCFT"), // 91
- The value of this column is set in BasicFlow.java (line 1419) from the following variable:
dump.append(cumulativeConnectionDuration).append(separator); //91
- That variable is updated in FlowGenerator.java (lines 97-102), which holds the logic for setting cumulativeConnectionDuration:
// Set cumulative flow time if TCP packet
if (TCP_UDP_LIST_FILTER.contains(flow.getProtocol())) {
    long currDuration = flow.getCumulativeConnectionDuration();
    currDuration += flow.getFlowDuration();
    flow.setCumulativeConnectionDuration(currDuration);
}
Note how the comment explicitly states that this value represents TCP flow duration.
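For anyone who wants to confirm the mismatch on their own CSVs, here is a minimal sketch of a check that could run before the type conversion. `missing_columns` is a hypothetical helper and is not part of the notebook, and `csv_header` is an assumed two-column excerpt rather than a real CICFlowMeter header. In the notebook you would pass `list(df.columns)` instead.

```python
def missing_columns(expected, available):
    """Return the expected column names that are absent from `available`."""
    available_set = set(available)
    return [name for name in expected if name not in available_set]


# Assumed header excerpt for illustration; a real CSV has many more columns.
csv_header = ["Flow Duration", "Total Connection Flow Time"]

# The name currently used in the notebook is reported as missing.
print(missing_columns(["Total TCP Flow Time"], csv_header))
# The name defined in FlowFeature.java is found, so nothing is reported.
print(missing_columns(["Total Connection Flow Time"], csv_header))
```

A check like this would fail loudly on a wrong name instead of leaving the mismatch to surface later in the pipeline.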