export endpoint: optionally apply a sort order to the CSVs in the output zip#558
export endpoint: optionally apply a sort order to the CSVs in the output zip#558
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request adds optional lexicographic sorting functionality to the feed version export endpoint. Users can now specify a sort order ("asc" or "desc") to have all CSV files in the exported zip sorted lexicographically before the archive is created.
Changes:
- Added
SortCSVFilesmethod toZipWriterAdapterthat reads, sorts, and rewrites CSV files - Extended
ExportTransformsstruct withLexicographicSortfield and added validation - Added comprehensive test coverage for ascending sort, descending sort, and invalid parameter handling
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tlcsv/adapter.go | Implements CSV sorting logic with SortCSVFiles method and compareRowsLexicographic helper function |
| server/rest/feed_version_export.go | Adds API parameter, validation, and integration with the sort functionality |
| server/rest/feed_version_export_test.go | Adds test cases for ascending/descending sort and invalid parameter validation |
| doc/openapi/rest.json | Updates OpenAPI schema with new lexicographic_sort parameter definition |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| file, err = os.Create(fullPath) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to create file %s: %w", filename, err) | ||
| } | ||
|
|
||
| writer := csv.NewWriter(file) | ||
| if err := writer.Write(header); err != nil { | ||
| file.Close() | ||
| return fmt.Errorf("failed to write header to %s: %w", filename, err) | ||
| } | ||
| if err := writer.WriteAll(dataRows); err != nil { | ||
| file.Close() | ||
| return fmt.Errorf("failed to write rows to %s: %w", filename, err) | ||
| } | ||
| writer.Flush() | ||
| if err := writer.Error(); err != nil { | ||
| file.Close() | ||
| return fmt.Errorf("failed to flush writer for %s: %w", filename, err) | ||
| } | ||
| file.Close() |
There was a problem hiding this comment.
File handle is not closed on all error paths. If writer.Write or writer.WriteAll returns an error, the file is closed, but if writer.Flush succeeds and then writer.Error returns an error, the file is not closed before returning. Consider using defer to ensure the file is always closed, or ensure all error paths close the file handle.
| reader := csv.NewReader(file) | ||
| allRows, err := reader.ReadAll() | ||
| file.Close() | ||
| if err != nil { | ||
| return fmt.Errorf("failed to read file %s: %w", filename, err) | ||
| } |
There was a problem hiding this comment.
Loading entire CSV files into memory using ReadAll could cause memory issues with large GTFS feeds. For example, shapes.txt files can contain millions of rows. Consider implementing a streaming sort approach using external sorting algorithms or temporary files for large datasets to avoid potential out-of-memory errors.
| // Close all open files first | ||
| for filename, f := range adapter.DirAdapter.files { | ||
| if err := f.Close(); err != nil { | ||
| return fmt.Errorf("failed to close file %s: %w", filename, err) | ||
| } | ||
| delete(adapter.DirAdapter.files, filename) | ||
| } |
There was a problem hiding this comment.
The SortCSVFiles method closes all files and deletes them from adapter.DirAdapter.files map (line 632), but then ZipWriterAdapter.Close() expects those file handles to still be open for seeking and copying to the zip archive (lines 742-758). This will result in an empty files map when Close() is called, meaning no files will be added to the zip archive. The sorting should preserve the files in a way that Close() can still access them, or Close() should be modified to handle already-closed files by reopening them.
| // Validate lexicographic sort order if provided | ||
| if req.Transforms != nil && req.Transforms.LexicographicSort != "" { | ||
| if req.Transforms.LexicographicSort != "asc" && req.Transforms.LexicographicSort != "desc" { | ||
| return util.NewBadRequestError(fmt.Sprintf("invalid lexicographic_sort value: %s (must be 'asc' or 'desc')", req.Transforms.LexicographicSort), nil) |
There was a problem hiding this comment.
The error message prefix does not match the established pattern. Other error messages in the validation use lowercase (e.g., "only 'gtfs_zip' format is currently supported" at line 369), but this message uses "invalid lexicographic_sort value". For consistency, consider using lowercase or a consistent capitalization pattern throughout validation error messages.
| return util.NewBadRequestError(fmt.Sprintf("invalid lexicographic_sort value: %s (must be 'asc' or 'desc')", req.Transforms.LexicographicSort), nil) | |
| return util.NewBadRequestError(fmt.Sprintf("lexicographic_sort must be 'asc' or 'desc' (got %s)", req.Transforms.LexicographicSort), nil) |
No description provided.