Duplicate name when reading CSV from Zenodo #176

@bart1

I was testing with the German data from the UvA Zenodo page and extracted this file: https://zenodo.org/records/14711244/files/de.tgz

# Point getRad at the folder layout of the extracted Zenodo archive
withr::with_options(
  list("getRad.vpts_local_path_format" = "{substr(radar, 1, 2)}/{radar}/{year}/{radar}_vpts_{year}{month}.csv.gz"),
  getRad::get_vpts("deboo", as.Date("2017-10-1"), source = "/media/bart/data_disk/tmp/")
)
#> Called from: getRad::get_vpts("deboo", as.Date("2017-10-1"), source = "/media/bart/data_disk/tmp/")
#> debug: fetched_vpts <- radar_to_name(switch(dplyr::case_when(source == 
#>     "rmi" ~ "rmi", source %in% eval(formals("get_vpts_aloft")$source) ~ 
#>     "aloft", dir.exists(source) ~ "local"), rmi = purrr::map(radar, 
#>     ~get_vpts_rmi(.x, rounded_interval), .purrr_error_call = cl), 
#>     aloft = purrr::map(radar, ~get_vpts_aloft(.x, rounded_interval = rounded_interval, 
#>         source = source), .purrr_error_call = cl), local = get_vpts_local(radar, 
#>         rounded_interval, directory = source)))
#> Error in `purrr::map_chr()`:
#> ℹ In index: 1.
#> ℹ With name: deboo.
#> Caused by error:
#> ! Result must be length 1, not 2.

This seems to be caused by the file containing two different radar names:

vroom::vroom("/media/bart/data_disk/tmp/de/deboo/2017/deboo_vpts_201710.csv.gz") |> dplyr::pull("radar") |> table()
#> Rows: 74200 Columns: 26
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr   (2): radar, source_file
#> dbl  (21): height, u, v, w, ff, dd, sd_vvp, eta, dens, dbz, dbz_all, n, n_db...
#> lgl   (2): gap, vcp
#> dttm  (1): datetime
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> 
#> 10132 deboo 
#> 53550 20650

The change of radar code appears to have happened in exactly this month:

vroom::vroom("/media/bart/data_disk/tmp/de/deboo/2017/deboo_vpts_201709.csv.gz", show_col_types = FALSE) |> dplyr::pull("radar") |> table()
#> 
#> 10132 
#> 71400
vroom::vroom("/media/bart/data_disk/tmp/de/deboo/2017/deboo_vpts_201710.csv.gz", show_col_types = FALSE) |> dplyr::pull("radar") |> table()
#> 
#> 10132 deboo 
#> 53550 20650
vroom::vroom("/media/bart/data_disk/tmp/de/deboo/2017/deboo_vpts_201711.csv.gz", show_col_types = FALSE) |> dplyr::pull("radar") |> table()
#> 
#> deboo 
#> 71325

It breaks in this function: with two different names in the file, `unique()` returns two values, while `purrr::map_chr()` expects exactly one per data frame:

getRad:::radar_to_name
#> function (vpts_df_list) 
#> {
#>     purrr::set_names(vpts_df_list, purrr::map_chr(vpts_df_list, 
#>         function(df) unique(dplyr::pull(df, .data$radar))))
#> }
#> <bytecode: 0x5db748176680>
#> <environment: namespace:getRad>
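A minimal reproduction that needs no downloaded data might look like the sketch below. It copies the body of `getRad:::radar_to_name` as printed above and feeds it two made-up data frames (`single_code` and `mixed_codes` are hypothetical names, and their values are invented): one with a single radar code, and one mixing the WMO code `10132` with the ODIM code `deboo`, as in the October 2017 file.

```r
library(dplyr)
library(purrr)

# Local copy of the body of getRad:::radar_to_name shown above
radar_to_name <- function(vpts_df_list) {
  purrr::set_names(
    vpts_df_list,
    purrr::map_chr(vpts_df_list, function(df) unique(dplyr::pull(df, .data$radar)))
  )
}

# Made-up data: one file with a single code, one mixing WMO and ODIM codes
single_code <- data.frame(radar = c("deboo", "deboo"), height = c(0, 200))
mixed_codes <- data.frame(radar = c("10132", "deboo"), height = c(0, 200))

# A single unique code works: the list element gets named "deboo"
named <- radar_to_name(list(single_code))
print(names(named))

# Two unique codes make unique() return length 2, so map_chr() errors
failed <- tryCatch(radar_to_name(list(deboo = mixed_codes)), error = function(e) e)
print(conditionMessage(failed))
```

Whatever the resolution, `radar_to_name()` will need a rule for which of the two codes a mixed file maps to.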

@PietrH, any suggestions for a good resolution? Which name should we pick? I guess here we have both the ODIM and the WMO code in one CSV.
