Possible Typo in Column Name in read_csvs_from_path_and_reformat Function #5

@bowl-of-porrige

I think there's a small typo in the function read_csvs_from_path_and_reformat inside CICIDS2017_labelling_fixed_CICFlowMeter.ipynb.

Right now, the code has this line:

int64_columns = ["Total TCP Flow Time"]

But it should actually be:

int64_columns = ["Total Connection Flow Time"]

Why?

After digging into the source files, I found that the correct column name should be "Total Connection Flow Time", not "Total TCP Flow Time". Here's why:

  1. In FlowFeature.java (line 109), the column is defined as:

    cum_cnx_time("Total Connection Flow Time", "TCFT"), // 91
  2. The value of this column is written in BasicFlow.java (line 1419) from the following variable:

    dump.append(cumulativeConnectionDuration).append(separator);                //91
  3. That variable, cumulativeConnectionDuration, is updated in FlowGenerator.java (lines 97-102):

    // Set cumulative flow time if TCP packet
    if (TCP_UDP_LIST_FILTER.contains(flow.getProtocol())) {
        long currDuration = flow.getCumulativeConnectionDuration();
        currDuration += flow.getFlowDuration();
        flow.setCumulativeConnectionDuration(currDuration);
    }

    Note that the comment explicitly describes this value as the TCP flow duration, while the exported column header is "Total Connection Flow Time".
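If it helps, here is a minimal sketch of a header check that would catch this kind of name mismatch before any type conversion. It uses only the standard library rather than whatever the notebook uses to load the files. The helper name `missing_columns` and the two-column sample are hypothetical, made up for illustration; only the column name "Total Connection Flow Time" comes from FlowFeature.java.

```python
import csv
import io

# Column name as defined in FlowFeature.java.
EXPECTED_INT64_COLUMNS = ["Total Connection Flow Time"]


def missing_columns(csv_file, expected=EXPECTED_INT64_COLUMNS):
    """Return the expected column names that are absent from the CSV header.

    Hypothetical helper: it reads only the first row of an open text file.
    """
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [name for name in expected if name not in present]


# Hypothetical sample, not real CICFlowMeter output.
sample = io.StringIO("Flow Duration,Total Connection Flow Time\n10,25\n")
print(missing_columns(sample))                           # []
sample.seek(0)
print(missing_columns(sample, ["Total TCP Flow Time"]))  # ['Total TCP Flow Time']
```

Running a check like this on one generated CSV should show whether the name in int64_columns matches the header that is actually written.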
