diff --git a/docs/design/supply/bike-logsums.md b/docs/design/supply/bike-logsums.md index c2d63842b..a948287da 100644 --- a/docs/design/supply/bike-logsums.md +++ b/docs/design/supply/bike-logsums.md @@ -11,23 +11,25 @@ This process generates the output\bikeMgraLogsum.csv and output\bikeTazLogsum.cs -**Inputs:** The AT network is read from input\SANDAG_Bike_Net.dbf and input\SANDAG_Bike_Node.dbf. Coefficients and other settings are read from sandag_abm.properties. +**Inputs:** The AT network is read from input\SANDAG_Bike_Net.dbf and input\SANDAG_Bike_Node.dbf. The bike model settings and utility files are found in src\asim\scripts\bike_route_choice. **Network Construction:** Node and edge attributes are processed from the network files. Traversals are derived for edge pairs, using angle to determine turn type. **Path Sampling:** - Doubly stochastic, coefficients are randomized and resulting edge cost is randomized -- Dijkstra's algorithm finds path each origin to destination with minimum cost +- Dijkstra's algorithm finds the path from each origin to each destination with minimum combined edge and traversal cost - Add paths to path alternatives list and calculate path sizes from overlap of alternatives -- Repeat until minimum number of alternatives and minimum path size is met for each origin/destination pair - -**Path Resampling:** Resample paths from alternatives list until minimum path size is met. +- Repeat for a preset number of iterations (default 10) **Path Choice Utility Calculation:** Calculate bike logsum values for each origin/destination pair from utility expression on each path alternative. -**Outputs:** Bike logsums and times are written to output\bikeMgraLogsum.csv and output\bikeTazLogsum.csv. During ActivitySim preprocessing, TAZ values are added to BIKE_LOGSUM and BIKE_TIME matrices of output\skims\traffic_skims_AM.omx and MGRA values are written to output\skims\maz_maz_bike.csv. 
+**Outputs:** Bike logsums and times are written to output\bikeMgraLogsum.csv and output\bikeTazLogsum.csv. Log and trace files are written to output\bike. During ActivitySim preprocessing, TAZ values are added to BIKE_LOGSUM and BIKE_TIME matrices of output\skims\traffic_skims_AM.omx and MGRA values are written to output\skims\maz_maz_bike.csv. ## Further reading -[Active Transportation Improvements Report (2015)](https://github.com/SANDAG/ABM/wiki/files/at.pdf) \ No newline at end of file +[Bike Route Choice README](../../../src/asim/scripts/bike_route_choice/README.md) + +[ABM3 Bike Model Report (2025)](../../pdf_reports/SANDAG_ABM3_Bike_Model_Report.pdf) + +[Active Transportation Improvements Report (2015)](https://github.com/SANDAG/ABM/wiki/files/at.pdf) (Prior bike logsum implementation in Java) \ No newline at end of file diff --git a/docs/images/design/bike_logsum_design.png b/docs/images/design/bike_logsum_design.png index ffcfc895e..48a3c4454 100644 Binary files a/docs/images/design/bike_logsum_design.png and b/docs/images/design/bike_logsum_design.png differ diff --git a/docs/pdf_reports/SANDAG_ABM3_Bike_Model_Report.pdf b/docs/pdf_reports/SANDAG_ABM3_Bike_Model_Report.pdf new file mode 100644 index 000000000..d42d37b54 Binary files /dev/null and b/docs/pdf_reports/SANDAG_ABM3_Bike_Model_Report.pdf differ diff --git a/src/asim/scripts/bike_route_choice/README.md b/src/asim/scripts/bike_route_choice/README.md new file mode 100644 index 000000000..4e70c0262 --- /dev/null +++ b/src/asim/scripts/bike_route_choice/README.md @@ -0,0 +1,136 @@ +# Active Transport Bike Model + +This directory contains a Python-based implementation of a bike route +choice model first developed in Java. The bike route choice model is a +multinomial logit discrete choice model that estimates the probabilities +of an individual’s choosing among multiple alternative routes between a given +origin and destination. 
This discrete choice model forms the basis of both +the estimation of the level of service between OD pairs used by the demand +models and the estimation of the number of cyclists assigned to network links. +The level of service is the expected maximum utility, or logsum, from the model, +and network link assignments are made with individual route probabilities. + +## Setup and Running +The UV environment used to run ActivitySim (asim_140) is also used to run the bike model. +The bike model is launched from the `bike_route_choice.py` file and accepts a +single (optional) command-line argument specifying the settings YAML file to +read. The default value for this argument is `bike_route_choice_settings.yaml`. + +### Settings +In the specified settings YAML file, several options are available for configuration: + +**Network** +- `data_dir`: path to the directory in which the model inputs are stored +- `node_file`: name of the input shapefile containing bike network nodes +- `link_file`: name of the input shapefile containing bike network links +- `zone_level`: zone level on which path choice should be performed. 
Allowed values: `taz`,`mgra` +- `zone_subset`: subset of zones to use for model testing + +**Utilities** +- `edge_util_file`: name of the file containing the utility function variable specification for link costs +- `traversal_util_file`: name of the file containing the utility function variable specification for traversal costs +- `random_scale_coef`: scaling coefficient to use for alternative utilities +- `random_scale_link`: scaling coefficient to use for randomized edge costs + +**Model Hyperparameters** +- `number_of_iterations`: maximum number of paths which should be found before terminating +- `number_of_batches`: number of batches into which origin centroids should be divided for sequential processing +- `number_of_processors`: number of processors to use for processing each batch +- `max_dijkstra_utility`: cutoff threshold utility (positive) for early termination of the shortest-paths search +- `min_iterations`: the minimum number of paths found - zone pairs with fewer paths will be discarded + +**Output and Caching** +- `output_path`: path to the directory in which model outputs should be written +- `output_file_path`: path to the final bike logsum file (bikeMgraLogsum.csv, bikeTazLogsum.csv) +- `save_bike_net`: whether to write the derived network to a set of CSVs +- `read_cached_bike_net`: whether to read the derived network from a cache instead of re-deriving from the original network +- `bike_speed`: the speed in mph to use for calculating bike travel times in output + +**Tracing** +- `trace_bike_utilities`: whether to output the chosen path sets from a specified list of origin-destination pairs +- `trace_origins`: ordered list of origins whose paths will be output in tracing +- `trace_destinations`: ordered list of destinations corresponding to the origins for tracing +- `generate_shapefile`: whether to output the trace paths as a shapefile in addition to CSV +- `crs`: network coordinate reference system to attach to the output shapefiles + +## 
Utility Calculation +The variable specifications used in the utility function for edges and traversals +are stored in `bike_edge_utils.csv` and `bike_traversal_utils.csv`, respectively, +including the expressions and their associated coefficients. The utility accounts +for the distance on different types of facilities, the gain in elevation, turns, +signal delay, and navigating un-signalized intersections with high-volume facilities. +To account for the correlation in the random utilities of overlapping routes, the +utility function in the model includes a “path size” measure, described in the +following section. + +## Path Size +The size of path alternative $n_i$ in alternative set $\textbf{n}$ is calculated using the +formula: +```math +size(n_i)=\sum_{l\in n_i}\frac{edgeLength(l)}{pathLength(n_i)*pathsUsing(\textbf{n},l)} +``` +where $pathsUsing$ is the integer number of paths in the alternative set $\textbf{n}$ +which contain link $l$. Its use derives from the theory of aggregate and elemental +alternatives, where a link is an aggregation of all paths that use the link. If multiple +routes overlap, their "size" is less than one. If two routes overlap completely, their +size will be one-half. + +## Path Sampling +Given an origin, paths are sampled to all relevant destinations in the network by repeatedly +applying Dijkstra’s algorithm to search for the paths that minimize a stochastic link +generalized cost with a mean given by the additive inverse of the path utility. For each path +sampling iteration, a random coefficient vector is first sampled from a non-negative +multivariate uniform distribution with zero covariance and mean equal to the link +generalized cost coefficients corresponding to the path choice utility function. As the path +search extends outward from the origin, the random cost coefficients do not vary over links, +but only over iterations of path sampling. 
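+As a concrete illustration of the sampling scheme just described (one independent non-negative uniform draw per coefficient, held fixed for all links within an iteration), the following sketch uses hypothetical mean coefficients and an assumed uniform half-width; the model's actual distribution parameters come from its settings and utility files:
+
+```python
+import numpy as np
+
+def sample_iteration_coefs(mean_coefs, rng, spread=0.5):
+    # One independent uniform draw per coefficient (zero covariance),
+    # centered on the mean generalized cost coefficients and clipped
+    # non-negative; 'spread' (half-width fraction) is an assumption here.
+    mean_coefs = np.asarray(mean_coefs, dtype=float)
+    low = np.maximum(mean_coefs * (1.0 - spread), 0.0)
+    high = mean_coefs * (1.0 + spread)
+    return rng.uniform(low, high)
+
+rng = np.random.default_rng(0)
+# Hypothetical link cost coefficients (positive costs, i.e. negated utilities)
+mean_coefs = [10.93, 9.012, 18.19]
+coefs = sample_iteration_coefs(mean_coefs, rng)
+# The same 'coefs' vector is then applied to every link in this iteration.
+```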
In the path search, the cost of each subsequent +link is calculated by summing the product of the random coefficients with their respective +link attributes, and then multiplying the result by a non-negative discrete random edge cost +multiplier. + +## Threshold Utility and Distance +To improve runtime, the Dijkstra's algorithm implementation allows for early termination once a +predetermined utility threshold has been reached in its search, with no paths found for zone pairs +whose utility is beyond the threshold. However, path utility is not necessarily the most +user-friendly metric - it is reasonable to want to define the cutoff by distance rather than utility. +Because the underlying implementation of Dijkstra's algorithm is only aware of utilities, +not distances, this is not directly possible: utility and distance do not correspond one-to-one. + +To address this, the `bike_threshold_calculator` script provides a mechanism +to search for a utility threshold which approximates a desired distance cutoff. Given a valid +configuration YAML file (as described above), the script iteratively performs a binary search, +using the YAML file's `max_dijkstra_utility` setting as a starting threshold utility and modifying +its value on subsequent iterations until the target distance is within the allowed margin of error +(or the maximum number of bike model iterations has been completed). 
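+A minimal sketch of this binary search follows. The `run_model` callable is a stand-in for a full bike model run that reports the chosen distance percentile, and the upper bracket of twice the starting threshold is an assumption of the sketch, not taken from the script itself:
+
+```python
+def search_threshold(run_model, target_distance, start_threshold,
+                     target_margin=0.1, max_iterations=20):
+    """Binary-search a (positive) utility cutoff until the distance
+    percentile reported by run_model is within target_margin of the
+    target. run_model stands in for a bike model run returning, e.g.,
+    the 99th percentile of sampled path distances (in miles)."""
+    lo, hi = 0.0, 2.0 * start_threshold
+    threshold = start_threshold
+    for _ in range(max_iterations):
+        distance = run_model(threshold)
+        if abs(distance - target_distance) <= target_margin * target_distance:
+            break
+        if distance < target_distance:
+            lo = threshold  # cutoff too tight; allow higher-cost paths
+        else:
+            hi = threshold  # cutoff too loose; terminate searches earlier
+        threshold = (lo + hi) / 2.0
+    return threshold
+```
+
+The bracketing step assumes distance grows monotonically with the utility threshold, which is what makes a binary search applicable here.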
The calling signature of the +script is shown below and requires a minimum of two command line arguments: the settings filepath +and the target distance – the remainder of the parameters are optional, with each optional argument +requiring those listed before it in sequence: + +~~~ +Usage: + python bike_threshold_calculator.py <settings filepath> <target distance> [target_margin [percentile [max_iterations]]] + + parameters: + settings filepath: path to YAML file containing bike model settings + target distance: the distance for which the search should aim (in miles) + target margin: the margin of error (< 1) allowed before termination (optional, default: 0.1) + percentile: the percentile of distance to compare against the target (optional, default: 0.99) + max iterations: the most bike model iterations that can be performed in the search (optional, default: 20) + + examples: + + python bike_threshold_calculator.py bike_route_choice_settings_taz.yaml 20 + # the resulting 99th %ile distance must be w/in 10% of the 20-mile target distance + # equivalent to: + python bike_threshold_calculator.py bike_route_choice_settings_taz.yaml 20 0.1 0.99 20 + + python bike_threshold_calculator.py bike_route_choice_settings_mgra.yaml 3 0.05 + # the resulting 99th %ile distance must be w/in 5% of the three-mile target distance +~~~ + +With the default parameters used, the script will search until either 20 iterations have elapsed or +the 99th percentile of distances found is within 10% of the specified target distance. At termination, +a CSV named `threshold_results.csv` will be written to the output directory (set in the `output_path` +setting in the YAML file) with the input utility thresholds, the iteration runtime, and the chosen +percentile of distance. These will also be scatter plotted on a graph and displayed, but note that +this graph is not saved automatically to the output directory. 
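+For reference, the path size measure defined earlier in this README can be computed with a short helper. This is an illustrative sketch over made-up link IDs and lengths, not the model's actual code:
+
+```python
+from collections import Counter
+
+def path_sizes(paths, edge_length):
+    """Compute size(n_i) for each path in the alternative set.
+
+    paths: list of paths, each a sequence of link IDs
+    edge_length: mapping of link ID to link length
+    """
+    # pathsUsing(n, l): number of alternatives containing each link
+    paths_using = Counter(link for path in paths for link in set(path))
+    sizes = []
+    for path in paths:
+        path_length = sum(edge_length[link] for link in path)
+        sizes.append(sum(edge_length[link] / (path_length * paths_using[link])
+                         for link in path))
+    return sizes
+
+# Two fully overlapping alternatives each get size one-half, as noted above.
+print(path_sizes([[1, 2], [1, 2]], {1: 1.0, 2: 1.0}))  # prints [0.5, 0.5]
+```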
\ No newline at end of file diff --git a/src/asim/scripts/bike_route_choice/bike_edge_utils.csv b/src/asim/scripts/bike_route_choice/bike_edge_utils.csv new file mode 100644 index 000000000..46532d275 --- /dev/null +++ b/src/asim/scripts/bike_route_choice/bike_edge_utils.csv @@ -0,0 +1,51 @@ +Label,Description,Expression,Coefficient +################################################################################## +# The below coefficients are adapted from Lukawska et al. (2023),,, +util_distance_general,Edge length (distance) in miles,distance,-10.93 +util_medium_upslope,Edge slope in range [1%-3.5%],"@df.distance * ((df.gain / (df.distance * 5280) >= 0.01) & (df.gain / (df.distance * 5280) <= 0.035)).astype(int)",-9.012 +util_high_upslope,Edge slope > 3.5%,"@df.distance * (df.gain / (df.distance * 5280) > 0.035).astype(int)",-18.19 +# per class penalties - large roadways,,, +util_lg_protected,Large roadway w/ protected lanes,"@df.distance * ( df.functionalClass.isin([3,4]) & df.cycleTrack ).astype(int)",0.1748 +util_lg_painted,Large roadway w/ painted lanes & no protected lanes,"@df.distance * ( df.functionalClass.isin([3,4]) & (~df.cycleTrack) & (df.bikeClass == 2) ).astype(int)",-3.158 +util_lg_nofacils,Large roadway w/o bike lanes,"@df.distance * ( df.functionalClass.isin([3,4]) & (~df.cycleTrack) & ~(df.bikeClass == 2) ).astype(int)",-2.513 +# per-class penalties - medium roadways,,, +# util_med_protected,Medium roadway w/ protected lanes,"@df.distance * ( df.functionalClass.isin([5,6]) & df.cycleTrack ).astype(int)",0.000 +util_med_painted,Medium roadway w/ painted lanes & no protected lanes,"@df.distance * ( df.functionalClass.isin([5,6]) & (~df.cycleTrack) & (df.bikeClass == 2) ).astype(int)",-0.5464 +util_med_nofacils,Medium roadway w/o bike lanes,"@df.distance * ( df.functionalClass.isin([5,6]) & (~df.cycleTrack) & ~(df.bikeClass == 2) ).astype(int)",-1.235 +# per class penalties - residential roadways,,, +util_res_protected,Residential roadway w/ 
protected lanes,"@df.distance * ( (df.functionalClass == 7) & df.cycleTrack ).astype(int)",-0.9835 +util_res_painted,Residential roadway w/ painted lanes & no protected lanes,"@df.distance * ( (df.functionalClass == 7) & (~df.cycleTrack) & (df.bikeClass == 2) ).astype(int)",0.9288 +util_res_nofacils,Residential roadway w/o bike lanes,"@df.distance * ( (df.functionalClass == 7) & (~df.cycleTrack) & ~(df.bikeClass == 2) ).astype(int)",-1.901 +# per class penalties - non-vehicle routes,,, +util_cycleway,Cycleway,"@df.distance * ( (~df.functionalClass.isin([3,4,5,6,7,10])) & (df.bikeClass == 2) ).astype(int)",0.4152 +util_sup,Shared-Use Path,"@df.distance * ( (~df.functionalClass.isin([3,4,5,6,7,10])) & (df.bikeClass == 1) ).astype(int)",-1.705 +util_pedzone,Pedestrian Zone,"@df.distance * ( (~df.functionalClass.isin([3,4,5,6,7,10])) & (~df.bikeClass.isin([1,2])) ).astype(int)",-4.021 +# FIXME better way to check wrong way?,,, +util_wrong_way,Wrong Way - no lanes in direction of travel and no bike lane,"@df.distance * ( (~df.cycleTrack) & (~(df.bikeClass == 2)) & (df.lanes == 0) ).astype(int)",-3.202 +# The above coefficients are adapted from Lukawska et al. 
(2023),,, + + +################################################################################## +# The below coefficients are from the ABM2 model and have dubious sourcing (see ABM2 documentation),,, +# util_class_0_bike_lane_dist,Distance without bike lanes,"@df.distance * np.where((df.bikeClass < 1) | (df.bikeClass > 3), 1, 0)",-0.858 +# util_class_I_bike_lane_dist,Distance on class I bike lanes,"@df.distance * np.where(df.bikeClass == 1, 1, 0)",-0.348 +# util_class_II_bike_lane_dist,Distance on class II bike lanes,"@df.distance * np.where((df.bikeClass == 2) & (~df.cycleTrack), 1, 0)",-0.544 +# util_class_III_bike_lane_dist,Distance on class III bike lanes,"@df.distance * np.where((df.bikeClass == 3) & (~df.bikeBlvd), 1, 0)",-0.858 +# util_art_no_bike_lane,Distance on arterial without bike lanes,"@df.distance * np.where((df.arterial) & (df.bikeClass != 2) & (df.bikeClass != 1), 1, 0)",-1.050 +# util_dist_cycle_track_class_II,Distance on cycle track class II bike lanes,"@df.distance * df.cycleTrack",-0.424 +# util_dist_cycle_track_class_III,Distance on boulevard class III bike routes,"@df.distance * df.bikeBlvd",-0.343 +# FIXME better way to check wrong way distance?,,, +# util_dist_wrong_way,Distance wrong way -- no lanes in direction of travel and no bike lane means wrong way,"@df.distance * np.where((df.bikeClass != 1) & (df.lanes == 0), 1, 0)",-3.445 +# util_elevation_gain_ft,Cumulative gain in elevation ignoring declines in feet,@df.gain,-0.015 +# util_access_of_highway,"Unallowed access onto interstate, freeway, or expressway","@(df.functionalClass == 1) | (df.functionalClass == 2)",-999.9 +# The above coefficients are from the ABM2 model and have dubious sourcing (see ABM2 documentation),,, +##################################################################################,,, +# The below coefficients are from Meister et al. 
(2024) converted from km to miles,,, +# util_distance,Total Distance,@df.distance,-0.944 +# util_bike_path,Distance on Bike Path (Class I),"@df.distance * np.where(df.bikeClass == 1, 1, 0)",1.969 +# util_bike_lane,Distance on Bike Lane (Class II),"@df.distance * np.where(df.bikeClass == 2, 1, 0)",1.616 +# util_speed_limit,Distance where speed limit is <= 25 mph,"@df.distance * np.where(df.speedLimit <= 25, 1, 0)",0.081 +# util_low_incline,Distance where slope is > 2% and <= 6%,"@df.distance * np.where((df.gain / (df.distance * 5280) > 0.02) & (df.gain / (df.distance * 5280) <= 0.06), 1, 0)",-1.727 +# util_med_incline,Distance where slope is > 6% and <= 10%,"@df.distance * np.where((df.gain / (df.distance * 5280) > 0.06) & (df.gain / (df.distance * 5280) <= 0.1), 1, 0)",-9.171 +# util_steep_incline,Distance where slope is > 10%,"@df.distance * np.where(df.gain / (df.distance * 5280) > 0.1, 1, 0)",-12.92 +# The above coefficients are from Meister et al. (2024) converted from km to mi,,, \ No newline at end of file diff --git a/src/asim/scripts/bike_route_choice/bike_net_reader.py b/src/asim/scripts/bike_route_choice/bike_net_reader.py new file mode 100644 index 000000000..d911ae791 --- /dev/null +++ b/src/asim/scripts/bike_route_choice/bike_net_reader.py @@ -0,0 +1,1006 @@ +import pandas as pd +import numpy as np +import os +import math +import logging + +# import tqdm +from bike_route_utilities import BikeRouteChoiceSettings, read_file + +# turn type encodings +TURN_NONE = 0 +TURN_LEFT = 1 +TURN_RIGHT = 2 +TURN_REVERSE = 3 + +BLUE_STEEL = TURN_RIGHT +LE_TIGRE = TURN_RIGHT +FERRARI = TURN_RIGHT +MAGNUM = TURN_LEFT + + +def create_and_attribute_edges( + settings: BikeRouteChoiceSettings, + node: pd.DataFrame, + link: pd.DataFrame, + logger: logging.Logger, +) -> pd.DataFrame: + """ + Create and attribute edges from the provided node and link dataframes. 
+ This function creates a directional edge dataframe from the links, + creates reverse direction edges, combines them, and attributes them. + It also manipulates the node dataframe to the output format and calculates + edge headings and arterial status. + + Parameters: + settings: BikeRouteChoiceSettings - Settings for the bike route choice model + node: pd.DataFrame - Node dataframe + link: pd.DataFrame - Link dataframe + + Returns: + edges: pd.DataFrame - Edge dataframe with derived attributes + nodes: pd.DataFrame - Node dataframe with derived attributes + + """ + + # create directional edge dataframe from links + logger.info("Creating directional edges from links") + abEdges = ( + link.rename( + columns={ + "A": "fromNode", + "B": "toNode", + "AB_Gain": "gain", + "ABBikeClas": "bikeClass", + "AB_Lanes": "lanes", + "Func_Class": "functionalClass", + "Bike2Sep": "cycleTrack", + "Bike3Blvd": "bikeBlvd", + "SPEED": "speedLimit", + } + ) + .copy() + .assign( + distance=link.Shape_Leng / 5280.0, # convert feet to miles + autosPermitted=link.Func_Class.isin(range(1, 8)), + centroidConnector=link.Func_Class == 10, + )[ + [ + "fromNode", + "toNode", + "bikeClass", + "lanes", + "functionalClass", + "centroidConnector", + "autosPermitted", + "cycleTrack", + "bikeBlvd", + "distance", + "gain", + "speedLimit", + "geometry", + ] + ] + ) + + # create reverse direction edges + baEdges = ( + link.rename( + columns={ + "B": "fromNode", + "A": "toNode", + "BA_Gain": "gain", + "BABikeClas": "bikeClass", + "BA_Lanes": "lanes", + "Func_Class": "functionalClass", + "Bike2Sep": "cycleTrack", + "Bike3Blvd": "bikeBlvd", + "SPEED": "speedLimit", + } + ) + .copy() + .assign( + distance=link.Shape_Leng / 5280.0, # convert feet to miles + autosPermitted=link.Func_Class.isin(range(1, 8)), + centroidConnector=link.Func_Class == 10, + geometry=link.geometry.reverse(), + )[ + [ + "fromNode", + "toNode", + "bikeClass", + "lanes", + "functionalClass", + "centroidConnector", + "autosPermitted", + 
"cycleTrack", + "bikeBlvd", + "distance", + "gain", + "speedLimit", + "geometry", + ] + ] + ) + + # combine bidirectional edges + edges = pd.concat([abEdges, baEdges]).set_index(["fromNode", "toNode"]) + + # manipulate node dataframe to output format + nodes = ( + node.assign(centroid=(node.MGRA > 0) | (node.TAZ > 0)) + .rename( + columns={ + "NodeLev_ID": "nodeID", + "XCOORD": "x", + "YCOORD": "y", + "MGRA": "mgra", + "TAZ": "taz", + "Signal": "signalized", + } + ) + .drop(columns=["ZCOORD", "geometry"]) + .set_index("nodeID") + ) + + # calculate edge headings + # Merge coordinates for fromNode and toNode + edges_with_coords = ( + edges.reset_index() + .merge( + nodes[["x", "y"]], + how="left", + left_on="fromNode", + right_index=True, + suffixes=("", "_from"), + ) + .merge( + nodes[["x", "y"]], + how="left", + left_on="toNode", + right_index=True, + suffixes=("_from", "_to"), + ) + ) + + # Calculate angle in radians + edges_with_coords["angle"] = np.arctan2( + edges_with_coords["y_to"] - edges_with_coords["y_from"], + edges_with_coords["x_to"] - edges_with_coords["x_from"], + ) + + # Set index and assign angle back to edges + edges_with_coords = edges_with_coords.set_index(["fromNode", "toNode"]).reindex( + edges.index + ) + assert edges_with_coords.index.equals( + edges.index + ), "Index mismatch between edges and coordinates" + edges["angle"] = edges_with_coords["angle"] + + # attribute (major) arterial status + edges["majorArterial"] = ( + (edges.functionalClass <= 3) + & (edges.functionalClass > 0) + & (edges.bikeClass != 1) + ) + edges["arterial"] = ( + (edges.functionalClass <= 4) + & (edges.functionalClass > 0) + & (edges.bikeClass != 1) + ) + + # keep track of how many duplicate edges are (major) arterials + dupMajArts = edges.groupby(edges.index).majorArterial.sum() + dupMajArts.index = pd.MultiIndex.from_tuples(dupMajArts.index) + edges["dupMajArts"] = dupMajArts.reindex(edges.index) + + dupArts = edges.groupby(edges.index).arterial.sum() + 
dupArts.index = pd.MultiIndex.from_tuples(dupArts.index) + edges["dupArts"] = dupArts.reindex(edges.index) + + # keep track of how many edges emanating from a node cross (major) arterials + nodes["majorArtXings"] = ( + edges.groupby(edges.index.get_level_values(0)) + .majorArterial.sum() + .reindex(nodes.index) + ) + nodes["artXings"] = ( + edges.groupby(edges.index.get_level_values(0)) + .arterial.sum() + .reindex(nodes.index) + ) + return edges, nodes + + +# def recreate_java_attributes( +# traversals: pd.DataFrame, +# ): +# """ +# Recreate Java attributes for traversals. +# WARNING: This function contains bugs that we think exist in the Java implementation. +# This is merely an optional function used for potential backwards compatibility.\ + +# Loop and vectorized outputs should be the same. Loop code closely mimic the Java implementation, +# while vectorized code is more efficient for python implementation. + +# Parameters: +# traversals: pd.DataFrame - Traversal dataframe with derived attributes + +# Returns: +# pd.DataFrame - Traversal dataframe with recreated Java attributes + +# """ +# def isThruJunc(trav, turn_type, last_all): +# # this method is buggy because it only considers the last traversal +# # instead of checking for any right turn +# if trav.centroid_start or trav.centroid_thru or trav.centroid_end: +# return False + +# if trav.turnType != TURN_NONE: +# return False + +# if ( +# # if there are no sibling traversals +# len( +# traversals[ +# (traversals.start == trav.start) +# & (traversals.thru == trav.thru) +# & (traversals.end != trav.start) +# & (traversals.end != trav.end) +# ] +# ) +# == 0 +# ): +# return True + + +# if last_all == "last": +# # if the last sibling traversal is of the target turn type +# if ( +# traversals[ +# (traversals.start == trav.start) +# & (traversals.thru == trav.thru) +# & (traversals.end != trav.start) +# & (traversals.end != trav.end) +# ] +# .iloc[-1] +# .turnType +# == turn_type +# ): +# return False +# else: +# 
return True + +# elif last_all == "any": +# # if any sibling traversal is of the target turn type +# return not ( +# traversals[ +# (traversals.start == trav.start) +# & (traversals.thru == trav.thru) +# & (traversals.end != trav.start) +# & (traversals.end != trav.end) +# ].turnType +# == turn_type +# ).any() + +# # keep track of whether traversal is "thru junction" +# traversals["ThruJunction_anynonevec"] = ( +# ~( +# traversals.centroid_start +# | traversals.centroid_thru +# | traversals.centroid_end +# ) +# & (traversals.turnType == TURN_NONE) +# & (traversals.none_turns - traversals.dupNoneTurns_toEdge == 0) +# ) + +# # create columns to populate in the slow loop +# traversals["ThruJunction_lastnoneloop"] = False +# traversals["ThruJunction_lastrtloop"] = False +# traversals["ThruJunction_anynoneloop"] = False +# traversals["ThruJunction_anyrtloop"] = False + +# logger.info("Beginning slow loop recreating Java attributes") + +# # the slow loop through all the traversals (could this be parallelized?) 
+# for idx, trav in tqdm.tqdm(traversals.iterrows(), total=len(traversals)): + +# # check all four possible parameter combos +# traversals.loc[idx, "ThruJunction_lastnoneloop"] = isThruJunc( +# trav, TURN_NONE, "last" +# ) +# traversals.loc[idx, "ThruJunction_lastrtloop"] = isThruJunc( +# trav, TURN_RIGHT, "last" +# ) +# traversals.loc[idx, "ThruJunction_anynoneloop"] = isThruJunc( +# trav, TURN_NONE, "any" +# ) +# traversals.loc[idx, "ThruJunction_anyrtloop"] = isThruJunc( +# trav, TURN_RIGHT, "any" +# ) + +# # the below is buggy because it counts none-turns instead of right turns + +# # index: all last traversals of all input groups +# # values: whether the index traversal is a none turn +# is_none = ( +# traversals.groupby(["start", "thru"]) +# .last() +# .reset_index() +# .set_index(["start", "thru", "end"]) +# .turnType +# == TURN_NONE +# ) + +# last_travs = is_none.index + +# # index: all last and penultimate traversals +# # values: whether the index traversal is a none turn +# is_none = pd.concat( +# [ +# is_none, +# traversals[ +# # don't allow index of penultimates to match last - we want two different candidates +# ~traversals.set_index(["start", "thru", "end"]).index.isin(last_travs) +# ] +# .groupby(["start", "thru"]) +# .last() +# .reset_index() +# .set_index(["start", "thru", "end"]) +# .turnType +# == TURN_NONE, +# ] +# ) + +# # create an index of all penultimate sibling turn movements +# penult_travs = is_none.index[~is_none.index.isin(last_travs)] + +# # get the penultimate sibling turn movments whose turn type is none +# penult_is_none = is_none[penult_travs].reset_index(2).turnType.rename("penultimate_is_none") + +# # get the last sibling turn movements whose turn type is none +# last_is_none = is_none[last_travs].reset_index(2).turnType.rename("last_is_none") + +# # tack on the two new columns +# traversals = traversals.merge( +# last_is_none, left_on=["start", "thru"], right_index=True, how="left" +# ).merge(penult_is_none, left_on=["start", 
"thru"], right_index=True, how="left") + +# # for all traversals that match the last traversal, use the penultimate value +# traversals.loc[ +# traversals.index.isin(last_is_none.index), "buggyRTE_none" +# ] = traversals.loc[traversals.index.isin(last_is_none.index), "penultimate_is_none"] +# # for all other traversals, use the last value +# traversals.loc[ +# ~traversals.index.isin(last_is_none.index), "buggyRTE_none" +# ] = traversals.loc[~traversals.index.isin(last_is_none.index), "last_is_none"] + +# # index: all last traversals of all input groups +# # values: whether the index traversal is a right turn +# last_is_rt = ( +# traversals.groupby(["start", "thru"]) +# .last() +# .reset_index() +# .set_index(["start", "thru", "end"]) +# .turnType +# == TURN_RIGHT +# ) +# # index: all penultimate traversals of input groups w/ >1 out link +# # values: whether the index traversal is a right turn +# penultimate_is_rt = ( +# traversals[ +# ~traversals.set_index(["start", "thru", "end"]).index.isin(last_is_rt.index) +# ] +# .groupby(["start", "thru"]) +# .last() +# .reset_index() +# .set_index(["start", "thru", "end"]) +# .turnType +# == TURN_RIGHT +# ) + +# # drop the end column +# last_is_rt = ( +# last_is_rt.reset_index() +# .set_index(["start", "thru"]) +# .turnType.rename("last_is_rt") +# ) +# # drop the end column and assume false for input groups w/ 1 out link +# penultimate_is_rt = ( +# penultimate_is_rt.reset_index() +# .set_index(["start", "thru"]) +# .turnType.reindex(last_is_rt.index, fill_value=False) +# .rename("penultimate_is_rt") +# ) + +# # tack on two new columns (last_is_rt and penultimate_is_rt) +# traversals = traversals.merge( +# last_is_rt, left_on=["start", "thru"], right_index=True, how="left" +# ).merge(penultimate_is_rt, left_on=["start", "thru"], right_index=True, how="left") + +# # define the right turn exists parameter for all positions, +# # starting with those which are last-siblings +# traversals.loc[ +# 
traversals.index.isin(last_is_rt.index), "buggyRTE_rt" +# ] = traversals.loc[traversals.index.isin(last_is_rt.index), "penultimate_is_rt"] + +# # then moving on to all others +# traversals.loc[ +# ~traversals.index.isin(last_is_rt.index), "buggyRTE_rt" +# ] = traversals.loc[~traversals.index.isin(last_is_rt.index), "last_is_rt"] + +# # define whether each entry is a thru-junction based on its underlying +# # right-turn-exists variables +# traversals["ThruJunction_lastnonevec"] = ( +# ~( +# traversals.centroid_start +# | traversals.centroid_thru +# | traversals.centroid_end +# ) +# & (traversals.turnType == TURN_NONE) +# & ~(traversals.buggyRTE_none) +# ) +# traversals["ThruJunction_lastrtvec"] = ( +# ~( +# traversals.centroid_start +# | traversals.centroid_thru +# | traversals.centroid_end +# ) +# & (traversals.turnType == TURN_NONE) +# & ~(traversals.buggyRTE_rt) +# ) + +# logger.info("Finished calculating java attributes") + +# java_cols = [ +# "ThruJunction_anynonevec", +# "ThruJunction_lastnoneloop", +# "ThruJunction_lastrtloop", +# "ThruJunction_anynoneloop", +# "ThruJunction_anyrtloop", +# "ThruJunction_anyrtvec", +# "ThruJunction_lastnonevec", +# "ThruJunction_lastrtvec", +# ] + +# return traversals, java_cols + + +def calculate_signalExclRight_alternatives( + traversals: pd.DataFrame, logger: logging.Logger +) -> pd.DataFrame: + """ + Calculate alternative signalized right turn exclusion columns. + This was originally used to test backwards compatibility with the Java implementation. + It is not used in the current implementation, but is left here for reference. 
+ + "signalExclRight_lastnoneloop", # current Java implementation + "signalExclRight_lastrtloop", # fixes rt-vs-none + "signalExclRight_lastnonevec", # vector implementation of current Java + "signalExclRight_lastrtvec", # same as above but also fixes rt-vs-none + "signalExclRight_anynoneloop", # fixes any-vs-last check + "signalExclRight_anyrtloop", # fixes any-vs-last and rt-vs-none (allegedly correct) + "signalExclRight_anynonevec", # vector implementation of any-vs-last fix + "signalExclRight_anyrtvec", # allegedly correct vector implementation + + Parameters: + traversals: pd.DataFrame - Traversal dataframe with attributes + + Returns: + pd.DataFrame - Traversal dataframe with alternative signalized right turn exclusion columns + """ + # the rest can be ditched (ALLEGEDLY) + traversals = traversals.assign( + signalExclRight_anynonevec=( + traversals.signalized + & (traversals.turnType != TURN_RIGHT) + & (~traversals.ThruJunction_anynonevec) + ), + signalExclRight_anyrtloop=( + traversals.signalized + & (traversals.turnType != TURN_RIGHT) + & (~traversals.ThruJunction_anyrtloop) + ), + signalExclRight_anynoneloop=( + traversals.signalized + & (traversals.turnType != TURN_RIGHT) + & (~traversals.ThruJunction_anynoneloop) + ), + signalExclRight_lastnonevec=( + traversals.signalized + & (traversals.turnType != TURN_RIGHT) + & (~traversals.ThruJunction_lastnonevec) + ), + signalExclRight_lastrtvec=( + traversals.signalized + & (traversals.turnType != TURN_RIGHT) + & (~traversals.ThruJunction_lastrtvec) + ), + signalExclRight_lastnoneloop=( + traversals.signalized + & (traversals.turnType != TURN_RIGHT) + & (~traversals.ThruJunction_lastnoneloop) + ), + signalExclRight_lastrtloop=( + traversals.signalized + & (traversals.turnType != TURN_RIGHT) + & (~traversals.ThruJunction_lastrtloop) + ), + ) + + java_attributes = [ + "signalExclRight_lastnoneloop", + "signalExclRight_lastrtloop", + "signalExclRight_lastnonevec", + "signalExclRight_lastrtvec", + 
"signalExclRight_anynoneloop", + "signalExclRight_anyrtloop", + "signalExclRight_anynonevec", + # "signalExclRight_anyrtvec", + ] + + return traversals, java_attributes + + +def create_and_attribute_traversals( + settings: BikeRouteChoiceSettings, + edges: pd.DataFrame, + nodes: pd.DataFrame, + logger: logging.Logger, +) -> pd.DataFrame: + """ + Create and attribute traversals from edges and nodes. + + This function creates a traversal table from the edges, calculates traversal attributes, + and merges node information to provide a comprehensive traversal dataset. + It calculates angles, turn types, and various derived attributes for each traversal. + + Parameters: + edges: pd.DataFrame - Edge dataframe with attributes + nodes: pd.DataFrame - Node dataframe with attributes + + Returns: + pd.DataFrame - Traversal dataframe with derived attributes + """ + + # create initial traversal table from edges + logger.info("Creating traversal table from edges") + traversals = ( + edges.reset_index() + .merge( + edges.reset_index(), + left_on="toNode", + right_on="fromNode", + suffixes=["_fromEdge", "_toEdge"], + how="outer", + ) + .rename( + columns={ + "fromNode_fromEdge": "start", + "toNode_fromEdge": "thru", + "toNode_toEdge": "end", + } + ) + .merge(nodes, left_on="thru", right_index=True, how="outer") + ) + + # drop U-turns + traversals = traversals[traversals.start != traversals.end] + + # calculate traversal angles + traversals["angle"] = traversals.angle_toEdge - traversals.angle_fromEdge + traversals.loc[traversals.angle > math.pi, "angle"] = ( + traversals.loc[traversals.angle > math.pi, "angle"] - 2 * math.pi + ) + traversals.loc[traversals.angle < -math.pi, "angle"] = ( + traversals.loc[traversals.angle < -math.pi, "angle"] + 2 * math.pi + ) + traversals.set_index(["start", "thru", "end"]) + + # keep track of the absolute value of the traversal angle + traversals["absAngle"] = traversals.angle.abs() + + # attach component nodes' centroid statuses + traversals = ( 
+ traversals.merge( + nodes.centroid, + left_on="start", + right_index=True, + suffixes=("_thru", "_start"), + ) + .merge( + nodes.centroid, + left_on="end", + right_index=True, + ) + .rename(columns={"centroid": "centroid_end"}) + ) + + logger.info("Calculating derived traversal attributes") + + # calculate traversal angle attributes for thru node + # maximum turning angle + max_angles = ( + traversals[ + traversals.autosPermitted_toEdge & (traversals.start != traversals.end) + ] + .groupby(["start", "thru"]) + .angle.max() + .fillna(-math.pi) + .rename("max_angle") + ) + + # minimum turning angle (most negative) + min_angles = ( + traversals[ + traversals.autosPermitted_toEdge & (traversals.start != traversals.end) + ] + .groupby(["start", "thru"]) + .angle.min() + .fillna(math.pi) + .rename("min_angle") + ) + + # minimum absolute turning angle + min_abs_angles = ( + traversals[ + traversals.autosPermitted_toEdge & (traversals.start != traversals.end) + ] + .groupby(["start", "thru"]) + .absAngle.min() + .fillna(math.pi) + .rename("min_abs_angle") + ) + + # number of outbound legs + leg_count = ( + traversals[ + traversals.autosPermitted_toEdge & (traversals.start != traversals.end) + ] + .groupby(["start", "thru"]) + .size() + .rename("leg_count") + + 1 + ) + + # consolidate into a dataframe for merging and reindex to full traversal table + turn_atts = ( + pd.concat( + [ + max_angles, + min_angles, + min_abs_angles, + leg_count, + ], + axis=1, + ) + .reindex([traversals.start, traversals.thru]) + .fillna( + { + "max_angle": -math.pi, + "min_angle": math.pi, + "min_abs_angle": math.pi, + "leg_count": 1, + } + ) + ) + + # attach to full table + traversals = pd.concat( + [traversals.set_index(["start", "thru"]), turn_atts], axis=1 + ).reset_index() + + # calculate turn type + # turn type is determined by the angle and leg count + # the options are no turns, left, right, and reverse + traversals.loc[~traversals.leg_count.isna(), "turnType"] = TURN_NONE + + # label 
right turns + traversals.loc[ + (traversals.leg_count == 3) + & (traversals.angle <= traversals.min_angle) + & (traversals.angle.abs() > math.pi / 6), + "turnType", + ] = TURN_RIGHT + + # label left turns + traversals.loc[ + (traversals.leg_count == 3) + & (traversals.angle >= traversals.max_angle) + & (traversals.angle.abs() > math.pi / 6), + "turnType", + ] = TURN_LEFT + + # more right turns + traversals.loc[ + (traversals.leg_count > 3) + & (traversals.angle.abs() > traversals.min_abs_angle) + & ( + (traversals.angle.abs() >= math.pi / 6) + | (traversals.angle <= traversals.min_angle) + | (traversals.angle >= traversals.max_angle) + ) + & (traversals.angle < 0), + "turnType", + ] = TURN_RIGHT + + # more left turns + traversals.loc[ + (traversals.leg_count > 3) + & (traversals.angle.abs() > traversals.min_abs_angle) + & ( + (traversals.angle.abs() >= math.pi / 6) + | (traversals.angle <= traversals.min_angle) + | (traversals.angle >= traversals.max_angle) + ) + & (traversals.angle >= 0), + "turnType", + ] = TURN_LEFT + + # reversals by angle + traversals.loc[ + (traversals.angle < -math.pi * 5 / 6) | (traversals.angle > math.pi * 5 / 6), + "turnType", + ] = TURN_REVERSE + + # reversals by start/end node + traversals.loc[traversals.start == traversals.end, "turnType"] = TURN_REVERSE + + # all centroid connector traversals are NONE type + traversals.loc[ + traversals.centroid_start | traversals.centroid_thru | traversals.centroid_end, + "turnType", + ] = TURN_NONE + + traversals.turnType = traversals.turnType.astype(int) + + # keep track of the number of outgoing turns w/ turn type == none + # FIXME this should almost certainly get removed + traversals = traversals.merge( + (traversals.set_index(["start", "thru"]).turnType == TURN_NONE) + .reset_index() + .groupby(["start", "thru"]) + .sum() + .rename(columns={"turnType": "none_turns"}), + left_on=["start", "thru"], + right_index=True, + how="left", + ) + + # do the same but with right turns + # FIXME this should 
be the actual usage, not the above, but we're + # copying from the java implementation + traversals = traversals.merge( + (traversals.set_index(["start", "thru"]).turnType == TURN_RIGHT) + .reset_index() + .groupby(["start", "thru"]) + .sum() + .rename(columns={"turnType": "rt_turns"}), + left_on=["start", "thru"], + right_index=True, + how="left", + ) + + # keep track of how many duplicate traversals have turn type == none + traversals = traversals.merge( + (traversals.set_index(["start", "thru", "end"]).turnType == TURN_NONE) + .reset_index() + .groupby(["start", "thru", "end"]) + .sum() + .rename(columns={"turnType": "dupNoneTurns_toEdge"}), + left_on=["start", "thru", "end"], + right_index=True, + how="left", + ) + + # keep track of how many duplicate traversals have turn type == right + traversals = traversals.merge( + (traversals.set_index(["start", "thru", "end"]).turnType == TURN_RIGHT) + .reset_index() + .groupby(["start", "thru", "end"]) + .sum() + .rename(columns={"turnType": "dupRtTurns_toEdge"}), + left_on=["start", "thru", "end"], + right_index=True, + how="left", + ) + + # for replicating the buggy Java implementation + # if settings.recreate_java_attributes: + # traversals, java_cols = recreate_java_attributes(traversals) + # traversals, java_attributes = calculate_signalExclRight_alternatives(traversals) + # java_cols += java_attributes + + # this is the correct implementation + traversals["ThruJunction_anyrtvec"] = ( + ~( + traversals.centroid_start + | traversals.centroid_thru + | traversals.centroid_end + ) + & (traversals.turnType == TURN_NONE) + & (traversals.rt_turns - traversals.dupRtTurns_toEdge == 0) + ) + + # populate derived traversal attributes + + traversals = traversals.assign( + thruCentroid=( + traversals.centroidConnector_fromEdge & traversals.centroidConnector_toEdge + ), + # this one is allegedly the one to keep + # taken from signalExclRight_anyrtvec + signalExclRight=( + traversals.signalized + & (traversals.turnType != 
TURN_RIGHT) + & (~traversals.ThruJunction_anyrtvec) + ), + # unlfrma: unsignalized left from major arterial + unlfrma=( + (~traversals.signalized) + & (traversals.functionalClass_fromEdge <= 3) + & (traversals.functionalClass_fromEdge > 0) + & (traversals.bikeClass_fromEdge != 1) + & (traversals.turnType == TURN_LEFT) + ), + # unlfrmi: unsignalized left from minor arterial + unlfrmi=( + (~traversals.signalized) + & (traversals.functionalClass_fromEdge == 4) + & (traversals.bikeClass_fromEdge != 1) + & (traversals.turnType == TURN_LEFT) + ), + # unxma: unsignalized cross major arterial + unxma=( + ~( + traversals.centroid_start + | traversals.centroid_thru + | traversals.centroid_end + ) + & ( + ( + (traversals.turnType == TURN_NONE) + & ( + traversals.majorArtXings + - traversals.dupMajArts_fromEdge + - traversals.dupMajArts_toEdge + ) + >= 2 + ) + | ( + (traversals.turnType == TURN_LEFT) + & (traversals.functionalClass_toEdge <= 3) + & (traversals.functionalClass_toEdge > 0) + & (traversals.bikeClass_toEdge != 1) + ) + ) + ), + # unxmi: unsignalized cross minor arterial + unxmi=( + ~( + traversals.centroid_start + | traversals.centroid_thru + | traversals.centroid_end + ) + & ( + ( + (traversals.turnType == TURN_NONE) + & ( + traversals.artXings + - traversals.dupArts_fromEdge + - traversals.dupArts_toEdge + ) + >= 2 + ) + | ( + (traversals.turnType == TURN_LEFT) + & (traversals.functionalClass_toEdge == 4) + & (traversals.functionalClass_toEdge > 0) + & (traversals.bikeClass_toEdge != 1) + ) + ) + ), + roadDowngrade=( + traversals.functionalClass_fromEdge.isin([3, 4, 5, 6]) + & traversals.functionalClass_toEdge.isin([4, 5, 6, 7]) + & (traversals.functionalClass_fromEdge < traversals.functionalClass_toEdge) + ), + ) + + output_cols = [ + "start", + "thru", + "end", + "turnType", + "thruCentroid", + "signalExclRight", + "unlfrma", + "unlfrmi", + "unxma", + "unxmi", + "roadDowngrade", + ] + # if settings.recreate_java_attributes: + # # include the java attributes if 
they were recreated
+    #     output_cols += java_attributes
+
+    # keep only the relevant columns
+    traversals = traversals.set_index(["edgeID_fromEdge", "edgeID_toEdge"])[output_cols]
+
+    logger.info("Finished calculating derived traversal attributes")
+
+    return traversals
+
+
+def create_bike_net(settings: BikeRouteChoiceSettings, logger: logging.Logger) -> tuple[
+    pd.DataFrame,  # node dataframe
+    pd.DataFrame,  # edge dataframe
+    pd.DataFrame,  # traversal dataframe
+]:
+    """
+    Read bike network from supplied shapefiles and derive attributes.
+
+    This method reads in two shapefiles detailing the nodes and edges
+    of a bike network, manipulates the data tables to match the expected
+    output format, and derives additional attributes, some of which are
+    solely used internally while others are appended to the edge and node
+    tables. Additionally, a new table is built of all traversals, that
+    is, all possible combinations of edges leading to and from a node
+    which are not reversals.
+
+    Parameters:
+        settings: BikeRouteChoiceSettings - model settings, including the node and link file paths
+        logger: logging.Logger - logger instance
+
+    Returns:
+        nodes: pd.DataFrame - node dataframe with derived attributes in expected format
+        edges: pd.DataFrame - edge dataframe with derived attributes in expected format
+        traversals: pd.DataFrame - traversal dataframe with derived attributes in expected format
+    """
+    if settings.read_cached_bike_net:
+        logger.info("Reading cached bike network from CSV files")
+        edges = pd.read_csv(
+            os.path.join(os.path.expanduser(settings.output_path), "edges.csv"),
+            index_col=[0],
+        )
+        nodes = pd.read_csv(
+            os.path.join(os.path.expanduser(settings.output_path), "nodes.csv"),
+            index_col=0,
+        )
+        traversals = pd.read_csv(
+            os.path.join(os.path.expanduser(settings.output_path), "traversals.csv"),
+            index_col=[0, 1],
+        )
+        return nodes, edges, traversals
+
+    # read shapefiles
+    logger.info("Reading link and node shapefiles")
+    node = read_file(settings,
settings.node_file)
+    link = read_file(settings, settings.link_file)
+
+    # create and attribute edges
+    edges, nodes = create_and_attribute_edges(settings, node, link, logger)
+
+    # generate authoritative positional index
+    edges = edges.reset_index()
+    edges.index.name = "edgeID"
+
+    traversals = create_and_attribute_traversals(settings, edges, nodes, logger)
+
+    # save edges, nodes, and traversals to csv files if specified
+    if settings.save_bike_net:
+        logger.info("Saving bike network to CSV files")
+        edges.to_csv(
+            os.path.join(os.path.expanduser(settings.output_path), "edges.csv")
+        )
+        nodes.to_csv(
+            os.path.join(os.path.expanduser(settings.output_path), "nodes.csv")
+        )
+        traversals.to_csv(
+            os.path.join(os.path.expanduser(settings.output_path), "traversals.csv")
+        )
+
+    return nodes, edges, traversals
diff --git a/src/asim/scripts/bike_route_choice/bike_route_calculator.py b/src/asim/scripts/bike_route_choice/bike_route_calculator.py
new file mode 100644
index 000000000..b8e7455cd
--- /dev/null
+++ b/src/asim/scripts/bike_route_choice/bike_route_calculator.py
@@ -0,0 +1,161 @@
+import os
+import logging
+import pandas as pd
+import numpy as np
+import warnings
+
+from bike_route_utilities import BikeRouteChoiceSettings, load_settings, read_file
+
+def calculate_utilities(
+    settings: BikeRouteChoiceSettings,
+    choosers: pd.DataFrame,
+    spec: pd.Series,
+    trace_label: str,
+    logger: logging.Logger,
+) -> pd.DataFrame:
+    """
+    Calculate utilities for choosers using the provided specifications.
+    Modeled after ActivitySim's core.simulate.eval_utilities.
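+
+    For intuition (illustrative description, not part of the spec file
+    format itself): each row of the spec is an expression/coefficient
+    pair, and the resulting utility is the dot product
+    U = sum_k( coef_k * expr_k(choosers) ). Expressions prefixed with
+    "@" are evaluated as raw Python against the choosers table; all
+    others are passed to pandas DataFrame.eval.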
+
+    Parameters:
+        settings: BikeRouteChoiceSettings - settings for the bike route choice model
+        choosers: pd.DataFrame - DataFrame of choosers (edges or traversals)
+        spec: pd.Series - Series with utility expressions as index and coefficients as values
+        trace_label: str - label for tracing
+        logger: logging.Logger - logger instance
+
+    Returns:
+        pd.DataFrame - DataFrame of calculated utilities with same index as choosers
+    """
+
+    assert isinstance(
+        spec, pd.Series
+    ), "Spec must be a pandas Series with utility expressions as index and coefficients as values"
+
+    globals_dict = {}
+    locals_dict = {
+        "np": np,
+        "pd": pd,
+        "df": choosers,
+    }
+
+    expression_values = np.empty((spec.shape[0], choosers.shape[0]))
+
+    for i, expr in enumerate(spec.index):
+        try:
+            with warnings.catch_warnings(record=True) as w:
+                # Cause all warnings to always be triggered.
+                warnings.simplefilter("always")
+                if expr.startswith("@"):
+                    expression_value = eval(expr[1:], globals_dict, locals_dict)
+                else:
+                    expression_value = choosers.eval(expr)
+
+                if len(w) > 0:
+                    for wrn in w:
+                        logger.warning(
+                            f"{trace_label} - {wrn.category.__name__} ({wrn.message}) evaluating: {str(expr)}"
+                        )
+
+        except Exception as err:
+            logger.exception(
+                f"{trace_label} - {type(err).__name__} ({str(err)}) evaluating: {str(expr)}"
+            )
+            raise err
+
+        expression_values[i] = expression_value
+
+    # - compute_utilities
+    utilities = np.dot(expression_values.transpose(), spec.astype(np.float64).values)
+    utilities = pd.DataFrame(utilities, index=choosers.index, columns=["utility"])
+
+    if settings.trace_bike_utilities:
+        # trace entire utility calculation
+        logger.info(f"Saving utility calculations to {trace_label}.csv")
+        expressions = pd.DataFrame(
+            data=expression_values.transpose(), index=choosers.index, columns=spec.index
+        )
+        expressions = pd.concat([choosers, expressions], axis=1)
+        expressions["utility"] = utilities["utility"]
+        expressions.to_csv(
+            os.path.join(os.path.expanduser(settings.output_path), f"{trace_label}.csv")
+        )
+
+    assert
utilities.index.equals(
+        choosers.index
+    ), "Index mismatch between utilities and choosers"
+
+    return utilities
+
+
+def calculate_utilities_from_spec(
+    settings: BikeRouteChoiceSettings,
+    choosers: pd.DataFrame,
+    spec_file: str,
+    trace_label: str,
+    logger: logging.Logger,
+    randomize: bool = False,
+) -> pd.DataFrame:
+    """
+    Calculate utilities from a specification file (edge or traversal).
+
+    Args:
+        settings: BikeRouteChoiceSettings
+        choosers: pd.DataFrame (edges or traversals)
+        spec_file: str (path to the specification file)
+        trace_label: str (label for tracing/logging)
+        logger: logging.Logger (logger instance)
+        randomize: bool (whether to randomize coefficients)
+
+    Returns:
+        pd.DataFrame with a 'utility' column
+    """
+    # read specification file
+    spec = read_file(settings, spec_file)
+
+    # Optionally randomize coefficients
+    if randomize:
+        # scale each coefficient by a factor drawn uniformly from
+        # 1 +/- settings.random_scale_coef, except for unavailability (-999)
+        spec["Coefficient"] = spec["Coefficient"].apply(
+            lambda x: x
+            * (
+                1
+                + np.random.uniform(
+                    0 - settings.random_scale_coef, settings.random_scale_coef
+                )
+            )
+            if x > -990
+            else x
+        )
+
+    # calculate utilities
+    utilities = calculate_utilities(
+        settings=settings,
+        choosers=choosers,
+        spec=spec.set_index("Expression")["Coefficient"],
+        trace_label=trace_label,
+        logger=logger,
+    )
+
+    # Optionally also add a random component across all utilities
+    if randomize:
+        utilities.loc[utilities.utility > -999, "utility"] = utilities.loc[
+            utilities.utility > -999, "utility"
+        ] * np.random.choice(
+            [(1 - settings.random_scale_link), (1 + settings.random_scale_link)],
+            size=len(utilities.loc[utilities.utility > -999]),
+        )
+
+    # check that all utilities are less than or equal to zero
+    if (utilities["utility"] > 0).any():
+        logger.warning(
+            f"{trace_label} utilities contain positive values, which may indicate an issue with the utility calculation."
+ ) + logger.info(f"Utilities: {utilities['utility'].describe()}") + logger.info( + f"Percentage of positive utilities: {(utilities['utility'] > 0).mean() * 100:.2f}%" + ) + logger.info( + "Capping the positive utilities to zero to avoid issues in Dijkstra's algorithm." + ) + utilities.loc[utilities["utility"] > 0, "utility"] = 0 + assert ( + ~utilities.utility.isna() + ).all(), f"{trace_label} utilities should not contain any NaN values" + + return utilities diff --git a/src/asim/scripts/bike_route_choice/bike_route_choice.py b/src/asim/scripts/bike_route_choice/bike_route_choice.py new file mode 100644 index 000000000..a7d2512d4 --- /dev/null +++ b/src/asim/scripts/bike_route_choice/bike_route_choice.py @@ -0,0 +1,937 @@ +import sys, os +import numpy as np +import pandas as pd +import geopandas as gpd +import logging +import warnings +from scipy.sparse import csr_matrix +from scipy.sparse import csr_array, coo_array +from scipy.sparse.csgraph import dijkstra +from multiprocessing import Pool + +import bike_net_reader +import bike_route_calculator +from bike_route_utilities import BikeRouteChoiceSettings, load_settings, read_file + +import time + + +def process_paths_new(centroids, predecessors, logger): + """ + Converts the predecessors array from Dijkstra's algorithm into paths. + + Args: + centroids (list): List of centroid indices. + predecessors (np.ndarray): Predecessors array from Dijkstra's algorithm. + + Returns: + tuple: A tuple containing: + - paths_orig (np.ndarray): Origin indices for paths. + - paths_dest (np.ndarray): Destination indices for paths. + - paths_from_node (np.ndarray): From-node indices for paths. + - paths_to_node (np.ndarray): To-node indices for paths. 
+ """ + logger.debug("Processing paths without numba...") + + # Add self-referential column to predecessor table to indicate end of path + predecessors_null = np.hstack( + (predecessors, np.full((predecessors.shape[0], 1), -1)) + ) + predecessors_null[predecessors_null == -9999] = -1 + # Get starting indices for OD pairs with path found + notnull = (predecessors_null >= 0).nonzero() + notnull = tuple( + i.astype(np.int32) for i in notnull + ) # force int32 to save memory (defaults to int64) + notnull_dest = np.isin(notnull[1], centroids).nonzero() + origin_indices, dest_indices = (notnull[0][notnull_dest], notnull[1][notnull_dest]) + + # Iterate through predecessors + node_indices = dest_indices + paths = [node_indices] + while np.any(node_indices >= 0): + node_indices = predecessors_null[origin_indices, node_indices] + paths.append(node_indices) + + stack_paths = np.vstack(paths).T + stack_paths = stack_paths[:, ::-1] # Reverse order to get origin -> destination + stack_paths_from = stack_paths[:, :-1] + stack_paths_to = stack_paths[:, 1:] # Offset by 1 to get to-node + + # Remove null edges + od_index, path_num = (stack_paths_from >= 0).nonzero() + # path_num_actual = path_num - np.argmax(stack_paths_from >= 0, axis=1)[od_index] # 0-index + paths_from_node = stack_paths_from[od_index, path_num] + paths_to_node = stack_paths_to[od_index, path_num] + + paths_orig = origin_indices[od_index] # centroids index + paths_dest = dest_indices[od_index] # mapped node id + + return (paths_orig, paths_dest, paths_from_node, paths_to_node) + + +def calculate_final_logsums_batch_traversals( + settings, + nodes, + edges, + traversals, + origin_centroids, + dest_centroids, + all_paths_orig, + all_paths_dest, + all_paths_from_edge, + all_paths_to_edge, + all_paths_iteration, + logger, + trace_origins=[], + trace_dests=[], +): + """ + Calculate the final logsums for the bike network using pre-computed paths and traversals. + Includes path size calculation. 
+ + Args: + settings: BikeRouteChoiceSettings + nodes (pd.DataFrame): DataFrame containing node information. + edges (pd.DataFrame): DataFrame containing edge information. + traversals (pd.DataFrame): DataFrame containing traversal information. + origin_centroids (list): List of origin centroids. + dest_centroids (list): List of destination centroids. + all_paths_orig (np.ndarray): Array of origin indices for all paths. + all_paths_dest (np.ndarray): Array of destination indices for all paths. + all_paths_from_edge (np.ndarray): Array of from-edge indices for all paths. + all_paths_to_edge (np.ndarray): Array of to-edge indices for all paths. + all_paths_iteration (np.ndarray): Array of iteration indices for all paths. + trace_origins (list, optional): List of origins to trace. Defaults to []. + trace_dests (list, optional): List of destinations to trace. Defaults to []. + + Returns: + tuple: A tuple containing: + - paths_od_orig_mapped (np.ndarray): Mapped origin indices for paths. + - paths_od_dest_mapped (np.ndarray): Mapped destination indices for paths. + - paths_od_logsum (np.ndarray): Logsum values for each OD pair. + - trace_paths_orig_mapped (np.ndarray): Mapped origins for traced paths. + - trace_paths_dest_mapped (np.ndarray): Mapped destinations for traced paths. + - trace_paths_iteration (np.ndarray): Iteration indices for traced paths. + - trace_paths_prev_node (np.ndarray): Previous node indices for traced paths. + - trace_paths_from_node (np.ndarray): From-node indices for traced paths. + - trace_paths_to_node (np.ndarray): To-node indices for traced paths. + - trace_paths_path_size (np.ndarray): Path size for traced paths. + - trace_paths_edge_cost (np.ndarray): Edge cost for traced paths. + - trace_paths_trav_cost (np.ndarray): Traversal cost for traced paths. 
+ """ + logger.info("Calculating logsums...") + + if all_paths_orig.size == 0: + logger.info("No paths found in this batch") + return ( + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + ) + + # Mapped node id to centroids index + dest_centroids_rev_map = np.zeros(int(max(dest_centroids)) + 1, dtype=np.int32) + dest_centroids_rev_map[dest_centroids] = range(len(dest_centroids)) + all_paths_dest_rev = dest_centroids_rev_map[all_paths_dest] + + node_mapping = {node_id: idx for idx, node_id in enumerate(nodes.index)} + + edges = edges.reset_index() + edge_from_node = edges.fromNode.map(node_mapping).to_numpy() + edge_to_node = edges.toNode.map(node_mapping).to_numpy() + edge_cost = edges.edge_utility.to_numpy() + edge_length = edges.distance.to_numpy() + edge_ids = edges.edgeID.to_numpy() + + all_paths_from_node = edge_from_node[all_paths_to_edge] + all_paths_to_node = edge_to_node[all_paths_to_edge] + all_paths_edge_cost = edge_cost[all_paths_to_edge] + all_paths_edge_length = edge_length[all_paths_to_edge] + all_paths_edge_id = edge_ids[all_paths_to_edge] + + if len(trace_origins) > 0: + all_paths_prev_node = edge_from_node[all_paths_from_edge] + + num_edges = len(edges) + + # node mapping needs to start at 0 in order to create adjacency matrix + # using edges instead of nodes since the edges are treated as nodes in the Dijkstra's algorithm + # constructing edge_mapping with columns [index, fromNode, toNode] + edge_mapping = edges[["fromNode", "toNode"]].reset_index() + + traversals_mapped = ( + traversals.reset_index() + .merge( + edge_mapping, + how="left", + left_on=["start", "thru"], + right_on=["fromNode", "toNode"], + ) + .merge( + edge_mapping, + how="left", + left_on=["thru", "end"], + right_on=["fromNode", "toNode"], + 
suffixes=("FromEdge", "ToEdge"),
+        )
+    )
+
+    row = traversals_mapped.indexFromEdge.to_numpy()
+    col = traversals_mapped.indexToEdge.to_numpy()
+    data = traversals_mapped.traversal_utility.to_numpy()
+    trav_costs = csr_array((data, (row, col)), shape=(num_edges, num_edges))
+
+    all_paths_trav_cost = trav_costs[all_paths_from_edge, all_paths_to_edge]
+
+    # Add origin connectors
+    orig_connectors_indices = np.isin(all_paths_from_edge, origin_centroids).nonzero()[
+        0
+    ]
+    all_paths_from_node = np.concatenate(
+        (
+            all_paths_from_node,
+            edge_from_node[all_paths_from_edge][orig_connectors_indices],
+        )
+    )
+    all_paths_to_node = np.concatenate(
+        (all_paths_to_node, edge_to_node[all_paths_from_edge][orig_connectors_indices])
+    )
+    all_paths_edge_cost = np.concatenate(
+        (all_paths_edge_cost, edge_cost[all_paths_from_edge][orig_connectors_indices])
+    )
+    all_paths_edge_id = np.concatenate(
+        (all_paths_edge_id, edge_ids[all_paths_from_edge][orig_connectors_indices])
+    )
+    all_paths_edge_length = np.concatenate(
+        (
+            all_paths_edge_length,
+            edge_length[all_paths_from_edge][orig_connectors_indices],
+        )
+    )
+    all_paths_trav_cost = np.concatenate(
+        (all_paths_trav_cost, np.zeros_like(orig_connectors_indices))
+    )
+    all_paths_orig_new = np.concatenate(
+        (all_paths_orig, all_paths_orig[orig_connectors_indices])
+    )
+    all_paths_dest_rev_new = np.concatenate(
+        (all_paths_dest_rev, all_paths_dest_rev[orig_connectors_indices])
+    )
+    all_paths_iteration_new = np.concatenate(
+        (all_paths_iteration, all_paths_iteration[orig_connectors_indices])
+    )
+    all_paths_cost = all_paths_edge_cost + all_paths_trav_cost
+
+    # SciPy sparse arrays only support 2-D arrays, so flatten OD and link indices
+    paths_od_ravel = np.ravel_multi_index(
+        (all_paths_orig_new, all_paths_dest_rev_new),
+        (len(origin_centroids), len(dest_centroids)),
+    )
+    paths_link_ravel = np.ravel_multi_index(
+        (all_paths_from_node, all_paths_to_node), (len(nodes), len(nodes))
+    )
+
+    # extract paths for OD pairs
that are being traced + if len(trace_origins) > 0: + # trace_indices = (np.isin(all_paths_orig_new, trace_origins) & np.isin(all_paths_dest_rev_new, trace_dests)).nonzero()[0] + all_paths_prev_node = np.concatenate( + (all_paths_prev_node, np.full_like(orig_connectors_indices, -1)) + ) + trace_od_ravel = np.ravel_multi_index( + (trace_origins, trace_dests), (len(origin_centroids), len(dest_centroids)) + ) + trace_indices = (np.isin(paths_od_ravel, trace_od_ravel)).nonzero()[0] + trace_paths_orig = all_paths_orig_new[trace_indices] + trace_paths_dest_rev = all_paths_dest_rev_new[trace_indices] + trace_paths_iteration = all_paths_iteration_new[trace_indices] + trace_paths_prev_node = all_paths_prev_node[trace_indices] + trace_paths_from_node = all_paths_from_node[trace_indices] + trace_paths_to_node = all_paths_to_node[trace_indices] + trace_paths_edge_cost = all_paths_edge_cost[trace_indices] + trace_paths_trav_cost = all_paths_trav_cost[trace_indices] + trace_paths_edge_id = all_paths_edge_id[trace_indices] + + # SciPy COO array will add duplicates together upon conversion to CSR array + # Insert ones for each path link to count number of paths for each OD/link + # Likely not an optimal solution, but np.unique took far longer, should consider other alternatives + ones_arr = np.ones(len(paths_od_ravel), dtype=np.uint8) + link_num_paths = coo_array( + (ones_arr, (paths_od_ravel, paths_link_ravel)), + shape=(len(origin_centroids) * len(dest_centroids), len(nodes) ** 2), + ) + link_num_paths = csr_array(link_num_paths) + + paths_num_paths = link_num_paths[paths_od_ravel, paths_link_ravel] + + # path size = sum( ( la / Li ) * ( 1 / Man ) ) = sum( la / Man ) / Li + all_paths_size_component = all_paths_edge_length / paths_num_paths # la / Man + # Flatten OD and iteration indices to sum cost, length, path size + # Should check if COO array is faster than bincount + paths_od_iter_ravel = np.ravel_multi_index( + (all_paths_orig_new, all_paths_dest_rev_new, 
all_paths_iteration_new), + (len(origin_centroids), len(dest_centroids), settings.number_of_iterations), + ) + od_iter_length_total = np.bincount(paths_od_iter_ravel, all_paths_edge_length) # Li + + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + od_iter_path_size = ( + np.bincount(paths_od_iter_ravel, all_paths_size_component) + / od_iter_length_total + ) # sum( la / Man ) / Li + + od_iter_cost = np.bincount(paths_od_iter_ravel, all_paths_cost) + + # extract path sizes for OD pairs that are being traced + if len(trace_origins) > 0: + trace_paths_od_iter_ravel = np.ravel_multi_index( + (trace_paths_orig, trace_paths_dest_rev, trace_paths_iteration), + (len(origin_centroids), len(dest_centroids), settings.number_of_iterations), + ) + trace_paths_path_size = od_iter_path_size[trace_paths_od_iter_ravel] + + # Unflatten OD and iteration indices, no longer need individual links + od_iter_indices = (od_iter_length_total > 0).nonzero()[0] + paths_od_iter_orig, paths_od_iter_dest, paths_od_iter_iter = np.unravel_index( + od_iter_indices, + (len(origin_centroids), len(dest_centroids), settings.number_of_iterations), + ) + paths_od_iter_cost = od_iter_cost[od_iter_indices] + paths_od_iter_path_size = od_iter_path_size[od_iter_indices] + paths_od_iter_length_total = od_iter_length_total[od_iter_indices] + + # Normalize path size, need to sum path size by OD + paths_od_ravel = np.ravel_multi_index( + (paths_od_iter_orig, paths_od_iter_dest), + (len(origin_centroids), len(dest_centroids)), + ) + od_count_iter = np.bincount(paths_od_ravel) + od_path_size_sum = np.bincount(paths_od_ravel, paths_od_iter_path_size) + paths_od_iter_path_size_sum = od_path_size_sum[paths_od_ravel] + paths_od_iter_path_size_normalized = ( + paths_od_iter_path_size / paths_od_iter_path_size_sum + ) + + # Add path cost to utility function + paths_od_iter_utility = paths_od_iter_cost + np.log( + paths_od_iter_path_size_normalized + ) + paths_od_iter_utility_exp = 
np.exp(paths_od_iter_utility) + + od_utility_exp_sum = np.bincount(paths_od_ravel, paths_od_iter_utility_exp) + paths_od_iter_utility_exp_sum = od_utility_exp_sum[paths_od_ravel] + paths_od_iter_prob = paths_od_iter_utility_exp / paths_od_iter_utility_exp_sum + + paths_od_iter_length_weighted = paths_od_iter_length_total * paths_od_iter_prob + + # Unflatten OD indices, no longer need iterations + od_indices = (od_path_size_sum > 0).nonzero()[0] + paths_od_orig, paths_od_dest = np.unravel_index( + od_indices, (len(origin_centroids), len(dest_centroids)) + ) + + paths_od_count_iter = od_count_iter[od_indices] + paths_od_path_size_sum = od_path_size_sum[od_indices] + + # Average path length + od_dist = np.bincount(paths_od_ravel, paths_od_iter_length_weighted) + paths_od_dist = od_dist[od_indices] + + # Logsum calculation + od_logsum = np.bincount(paths_od_ravel, paths_od_iter_utility_exp) + paths_od_logsum = np.log(od_logsum[od_indices]) + + # Centroids index to mapped node id + origin_centroids_np = np.array(origin_centroids) + paths_od_orig_mapped = origin_centroids_np[paths_od_orig] + dest_centroids_np = np.array(dest_centroids) + paths_od_dest_mapped = dest_centroids_np[paths_od_dest] + + # edge mapping is used to map origins and destinations back to their original node ids + # this awkward mapping is necessary because the edges are treated as nodes in the dijkstra's algorithm + fromNode_map = edge_mapping.set_index("index")["fromNode"].to_dict() + toNode_map = edge_mapping.set_index("index")["toNode"].to_dict() + + paths_od_orig_mapped = np.array([fromNode_map[i] for i in paths_od_orig_mapped]) + paths_od_dest_mapped = np.array([toNode_map[i] for i in paths_od_dest_mapped]) + + if len(trace_origins) > 0: + # remapping traced origins and destinations from 0 index to edge index, then edge index to node id + trace_paths_orig_mapped = origin_centroids_np[trace_paths_orig] + trace_paths_dest_mapped = dest_centroids_np[trace_paths_dest_rev] + trace_paths_orig_mapped = 
np.array( + [fromNode_map[i] for i in trace_paths_orig_mapped] + ) + trace_paths_dest_mapped = np.array( + [toNode_map[i] for i in trace_paths_dest_mapped] + ) + + # paths themselves just need to be mapped back from 0-index to node id + rev_node_mapping = {v: k for k, v in node_mapping.items()} + rev_node_mapping[-1] = -1 # map no path -1 to itself + trace_paths_prev_node = np.array( + [rev_node_mapping[i] for i in trace_paths_prev_node] + ) + trace_paths_from_node = np.array( + [rev_node_mapping[i] for i in trace_paths_from_node] + ) + trace_paths_to_node = np.array( + [rev_node_mapping[i] for i in trace_paths_to_node] + ) + + return ( + paths_od_orig_mapped, + paths_od_dest_mapped, + paths_od_dist, + paths_od_logsum, + paths_od_path_size_sum, + paths_od_count_iter, + trace_paths_orig_mapped, + trace_paths_dest_mapped, + trace_paths_iteration, + trace_paths_prev_node, + trace_paths_from_node, + trace_paths_to_node, + trace_paths_path_size, + trace_paths_edge_cost, + trace_paths_trav_cost, + trace_paths_edge_id, + ) + else: + return ( + paths_od_orig_mapped, + paths_od_dest_mapped, + paths_od_dist, + paths_od_logsum, + paths_od_path_size_sum, + paths_od_count_iter, + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + np.empty((0)), + ) + + +def _perform_dijkstra(centroids, adjacency_matrix, logger, limit=3): + """Perform Dijkstra's algorithm for a batch of centroids.""" + logger.info( + f"Processing Dijkstra's on {len(centroids)} centroids with limit={limit}..." 
+ ) + distances, predecessors = dijkstra( + adjacency_matrix, + directed=True, + indices=centroids, + return_predecessors=True, + limit=limit, + ) + + return (distances, predecessors) + + +def perform_dijkstras_algorithm_batch_traversals( + edges, traversals, origin_centroids, limit, logger +): + """Perform Dijkstra's algorithm for centroids using SciPy's sparse graph solver with batched parallel processing.""" + num_edges = len(edges) + + # node mapping needs to start at 0 in order to create adjacency matrix + # fromNode and toNode index is saved as columns in edges + # then reindex again to get index of the edge + edge_mapping = edges["edge_utility"] + + traversals_mapped = ( + traversals.reset_index() + .merge( + edge_mapping, + how="left", + left_on="edgeID_fromEdge", + right_index=True + ) + .merge( + edge_mapping, + how="left", + left_on="edgeID_toEdge", + right_index=True, + suffixes=("FromEdge", "ToEdge"), + ) + ) + # Total bike cost is edge cost (after traversal) plus traversal cost + # Note origin zone connector edge cost is not included with this approach, but this has no impact on shortest path selection + traversals_mapped["total_utility"] = ( + traversals_mapped.edge_utilityToEdge + traversals_mapped.traversal_utility + ) + traversals_mapped.loc[ + traversals_mapped.edgeID_fromEdge.isin(origin_centroids), "total_utility" + ] += traversals_mapped.edge_utilityFromEdge + + # Convert from negative utility to positive cost for Dijkstra's + traversals_mapped.total_utility *= -1 + + # Create a sparse adjacency matrix + row = traversals_mapped.edgeID_fromEdge.to_numpy() + col = traversals_mapped.edgeID_toEdge.to_numpy() + data = traversals_mapped.total_utility.to_numpy() + adjacency_matrix = csr_matrix((data, (row, col)), shape=(num_edges, num_edges)) + + logger.debug(f"Need to calculate Dijkstra's on {len(origin_centroids)} centroids") + + # Perform Dijkstra's algorithm for all centroids + shortest_paths = _perform_dijkstra(origin_centroids, 
adjacency_matrix, logger, limit) + return shortest_paths + + + def run_iterations_batch_traversals( + settings: BikeRouteChoiceSettings, + edges: pd.DataFrame, + traversals: pd.DataFrame, + origin_centroids: list, + dest_centroids: list, + logger: logging.Logger, + ): + """ + Run multiple iterations of Dijkstra's algorithm using traversals as "links" and edges as "vertices". + For each iteration, it calculates utilities for edges and traversals, then performs Dijkstra's algorithm to find paths. + + Args: + settings: BikeRouteChoiceSettings + edges: pd.DataFrame + traversals: pd.DataFrame + origin_centroids: list + dest_centroids: list + + Returns: + tuple: A tuple containing arrays of path information. + """ + + all_paths = [] + + for i in range(settings.number_of_iterations): + logger.info(f"Running iteration {i + 1} of {settings.number_of_iterations}") + # calculate utilities with randomness + edges["edge_utility"] = bike_route_calculator.calculate_utilities_from_spec( + settings, + choosers=edges, + spec_file=settings.edge_util_file, + trace_label=f"bike_edge_utilities_iteration_{i}", + randomize=True, + logger=logger, + ) + + logger.debug("Overriding utility of centroid connectors") + edges.loc[edges.centroidConnector, "edge_utility"] = -np.finfo(edges.edge_utility.dtype).eps + + traversals[ + "traversal_utility" + ] = bike_route_calculator.calculate_utilities_from_spec( + settings, + choosers=traversals, + spec_file=settings.traversal_util_file, + trace_label=f"bike_traversal_utilities_iteration_{i}", + randomize=False, + logger=logger, + ) + + # run Dijkstra's algorithm + distances, predecessors = perform_dijkstras_algorithm_batch_traversals( + edges, traversals, origin_centroids, limit=settings.max_dijkstra_utility, logger=logger, + ) + + # process paths + paths = process_paths_new(dest_centroids, predecessors, logger) + all_paths.append(paths + (np.full_like(paths[0], i, dtype=np.uint8),)) + + all_paths_concat = map(np.concatenate, zip(*all_paths)) + ( + all_paths_orig,
+ all_paths_dest, + all_paths_from_edge, + all_paths_to_edge, + all_paths_iteration, + ) = all_paths_concat + return ( + all_paths_orig, + all_paths_dest, + all_paths_from_edge, + all_paths_to_edge, + all_paths_iteration, + ) + + +def run_batch_traversals( + settings, + nodes, + edges, + traversals, + origin_centroids, + dest_centroids, + trace_origin_edgepos, + trace_dest_edgepos, + logger, + proc_id +): + """ + Run batch traversals for the bike route choice model. + + """ + if logger is None: + logger = logging.getLogger(f"subprocess_{proc_id}") + log_file_location = os.path.join(os.path.expanduser(settings.output_path), "bike_model.log") + logging.basicConfig( + level=logging.INFO, + format="%(asctime)s - %(name)s %(levelname)s - %(message)s", + handlers=[logging.FileHandler(log_file_location), logging.StreamHandler()], + ) + ( + all_paths_orig, + all_paths_dest, + all_paths_from_edge, + all_paths_to_edge, + all_paths_iteration, + ) = run_iterations_batch_traversals( + settings, + edges, + traversals, + origin_centroids, + dest_centroids, + logger, + ) + trace_origins_rev = [] + trace_dests_rev = [] + + if len(settings.trace_origins) > 0: + + filtered_trace_origs = trace_origin_edgepos[np.isin(trace_origin_edgepos,origin_centroids)] + + trace_origins_rev = pd.DataFrame( + enumerate(origin_centroids), + columns=['subset_pos','edge_pos'] + ).set_index( + 'edge_pos' + ).loc[ + filtered_trace_origs + ].subset_pos.values + + filtered_trace_dests = trace_dest_edgepos[np.isin(trace_dest_edgepos,dest_centroids)] + + trace_dests_rev = pd.DataFrame( + enumerate(dest_centroids), + columns=['subset_pos','edge_pos'] + ).set_index( + 'edge_pos' + ).loc[ + filtered_trace_dests + ].subset_pos.values + + # calculate non-randomized utilities for edges and traversals to use in final logsum calculation + edges["edge_utility"] = bike_route_calculator.calculate_utilities_from_spec( + settings, + choosers=edges, + spec_file=settings.edge_util_file, + 
trace_label="bike_edge_utilities_final", + randomize=False, + logger=logger, + ) + + logger.debug("Overriding utility of centroid connectors") + edges.loc[edges.centroidConnector, "edge_utility"] = -np.finfo(edges.edge_utility.dtype).eps + + traversals[ + "traversal_utility" + ] = bike_route_calculator.calculate_utilities_from_spec( + settings, + choosers=traversals, + spec_file=settings.traversal_util_file, + trace_label="bike_traversal_utilities_final", + randomize=False, + logger=logger, + ) + + final_paths = calculate_final_logsums_batch_traversals( + settings, + nodes, + edges, + traversals, + origin_centroids, + dest_centroids, + all_paths_orig, + all_paths_dest, + all_paths_from_edge, + all_paths_to_edge, + all_paths_iteration, + logger, + trace_origins_rev, + trace_dests_rev, + ) + return final_paths + + + def get_centroid_connectors( + settings: BikeRouteChoiceSettings, nodes: pd.DataFrame, edges: pd.DataFrame, logger: logging.Logger, + ): + """ + Identify centroid connector edges for the bike route choice model. + This function is a placeholder and should be replaced with a more intelligent centroid selection method.
+ + Returns: + Index of edge IDs of corresponding connectors + """ + # node mapping needs to start at 0 in order to create adjacency matrix + # constructing edge_mapping with columns [index, fromNode, toNode] + edge_mapping = edges[["fromNode", "toNode"]] + + # Get zone centroids (nodes with 'centroid' flag True and a zone id greater than zero) + # Centroid connectors + origin_centroid_connectors = nodes[ + nodes["centroid"] & (nodes[settings.zone_level] > 0) + ].merge(edge_mapping, how="left", left_index=True, right_on="fromNode") + + dest_centroid_connectors = nodes[nodes["centroid"] & (nodes[settings.zone_level] > 0)].merge( + edge_mapping, how="left", left_index=True, right_on="toNode" + ) + + if isinstance(settings.zone_subset, list): + # filter centroids based on zone_subset if it is a list + origin_centroid_connectors = origin_centroid_connectors[ + origin_centroid_connectors[settings.zone_level].isin(settings.zone_subset) + ] + dest_centroid_connectors = dest_centroid_connectors[dest_centroid_connectors[settings.zone_level].isin(settings.zone_subset)] + elif isinstance(settings.zone_subset, int): + # take the first N centroids if zone_subset is an integer + origin_centroid_connectors = origin_centroid_connectors[: settings.zone_subset] + dest_centroid_connectors = dest_centroid_connectors[: settings.zone_subset] + + def _clean_centroid_connectors(df, label, edge_mapping): + null_rows = df[df.isnull().any(axis=1)] + if not null_rows.empty: + logger.warning( + f"Null values found in {label} centroid connectors dataframe!
Dropping:\n {null_rows}" + ) + df = df.dropna() + df.index = df.index.astype(np.int32) + return df + + origin_centroid_connectors = _clean_centroid_connectors(origin_centroid_connectors, "origin", edge_mapping) + dest_centroid_connectors = _clean_centroid_connectors(dest_centroid_connectors, "destination", edge_mapping) + + origin_centroid_connectors = origin_centroid_connectors[settings.zone_level] + dest_centroid_connectors = dest_centroid_connectors[settings.zone_level] + + return origin_centroid_connectors, dest_centroid_connectors + + + def run_bike_route_choice(settings, logger): + """Main function to run the bike route choice model.""" + + # create bike network + nodes, edges, traversals = bike_net_reader.create_bike_net(settings, logger) + + # Define centroids + origin_centroid_connectors, dest_centroid_connectors = get_centroid_connectors(settings, nodes, edges, logger) + + trace_origins_edgepos = np.array(origin_centroid_connectors[origin_centroid_connectors.isin(settings.trace_origins)].index) + trace_dests_edgepos = np.array(dest_centroid_connectors[dest_centroid_connectors.isin(settings.trace_destinations)].index) + + # drop zone IDs + origin_centroid_connectors = origin_centroid_connectors.index + dest_centroid_connectors = dest_centroid_connectors.index + + + logger.info( + f"Splitting {len(origin_centroid_connectors)} origins into {settings.number_of_batches} batches" + ) + origin_centroid_batches = np.array_split( + origin_centroid_connectors, settings.number_of_batches + ) + + # run the bike route choice model in either single or multi-process mode + if settings.number_of_processors > 1: + # Split each batch of origin centroids into sub-batches, one per process + final_paths = [] + for i, origin_centroid_batch in enumerate(origin_centroid_batches): + logger.info( + f"Splitting batch {i} of {len(origin_centroid_batch)} origins into {settings.number_of_processors} processes" + ) + origin_centroid_sub_batches = np.array_split( + origin_centroid_batch, settings.number_of_processors + ) +
with Pool(processes=settings.number_of_processors) as pool: + results = pool.starmap( + run_batch_traversals, + [ + ( + settings, + nodes, + edges.drop(columns='geometry'), + traversals, + origin_centroid_sub_batch, + dest_centroid_connectors, + trace_origins_edgepos, + trace_dests_edgepos, + None, + proc_id + ) + for proc_id, origin_centroid_sub_batch in enumerate(origin_centroid_sub_batches) + ], + ) + final_paths.extend(results) + + else: + final_paths = [] + for i, origin_centroid_batch in enumerate(origin_centroid_batches): + logger.info( + f"Processing batch {i+1} of {len(origin_centroid_batch)} origins" + ) + results = run_batch_traversals( + settings=settings, + nodes=nodes, + edges=edges.drop(columns='geometry'), + traversals=traversals, + origin_centroids=origin_centroid_batch, + dest_centroids=dest_centroid_connectors, + trace_origin_edgepos=trace_origins_edgepos, + trace_dest_edgepos=trace_dests_edgepos, + logger=logger, + proc_id=0 + ) + final_paths.append(results) + + final_paths_concat = tuple(map(np.concatenate,zip(*final_paths))) + + logsums = pd.DataFrame( + { + "origin": final_paths_concat[0], + "destination": final_paths_concat[1], + "distance": final_paths_concat[2], + "logsum": final_paths_concat[3], + "path_size_sum": final_paths_concat[4], + "iterations": final_paths_concat[5] + } + ) + + + # calculate replacement intrazonal distance values + diag_dists = logsums[logsums.origin!=logsums.destination].groupby('origin').distance.min() * 0.5 + diags = diag_dists.to_frame() + + # calculate replacement intrazonal logsum values + + # FIXME this is repetitive - find a way to retrieve spec only once (already read in run_batch_traversals) + spec = read_file(settings, settings.edge_util_file).set_index('Label') + + intrazonal_coeff = settings.intrazonal_coefficient + + assert intrazonal_coeff in spec.index, f"Intrazonal logsum coefficient {intrazonal_coeff} missing from edge spec" + + diags['logsum'] = diags.distance * 
spec.loc[intrazonal_coeff,"Coefficient"] + + # indexing work + diags = diags.reset_index() + diags['destination'] = diags.origin + diags = diags.set_index(['origin','destination']) + + diags['path_size_sum'] = 0 + diags['iterations'] = 0 + + # replace the values in the logsums table + logsums = logsums.set_index(['origin','destination']) + logsums = diags.combine_first(logsums) + # logsums.loc[diags.index,'distance'] = diags.distance + # logsums.loc[diags.index,'logsum'] = diags.logsum + # logsums.loc[diags.index,'path_size_sum'] = np.nan + + logsums.to_csv(f"{settings.output_path}/bike_route_choice_logsums.csv") + + logsums['time'] = (logsums['distance'] / settings.bike_speed) * 60 + + logsums = logsums.merge(nodes[[settings.zone_level]], how='left', left_on='origin', right_index=True) + logsums = logsums.rename(columns={settings.zone_level: 'i'}) + logsums = logsums.merge(nodes[[settings.zone_level]], how='left', left_on='destination', right_index=True) + logsums = logsums.rename(columns={settings.zone_level: 'j'}) + + if settings.zone_level == 'mgra': + logsums = logsums[(logsums.iterations == 0) | (logsums.iterations >= settings.min_iterations)] + + logsums = logsums[['i','j','logsum','time']] + + logsums.to_csv(settings.output_file_path,index=False) + + # Save the final paths to a CSV file + if len(settings.trace_origins) > 0: + trace_paths = pd.DataFrame( + { + "origin": final_paths_concat[6], + "destination": final_paths_concat[7], + "iteration": final_paths_concat[8], + "prev_node": final_paths_concat[9], + "from_node": final_paths_concat[10], + "to_node": final_paths_concat[11], + "path_size": final_paths_concat[12], + "edge_cost": final_paths_concat[13], + "traversal_cost": final_paths_concat[14], + "edgeID":final_paths_concat[15], + } + ) + trace_paths.to_csv( + f"{settings.output_path}/bike_route_choice_trace.csv", index=False + ) + + if settings.generate_shapefile: + logger.info("Generating shapefile...") + + + # # attach edges to the paths + 
trace_paths = gpd.GeoDataFrame( + trace_paths.merge( + edges['geometry'], + left_on=['edgeID'], + right_index=True, + how='left'), + crs=settings.crs + ) + + logger.info("Writing shapefile...") + trace_paths.to_file(f"{settings.output_path}/bike_route_choice_trace.shp") + + + logger.info("Bike route choice model completed.") + return logsums + + +if __name__ == "__main__": + + start_time = time.time() + + logger = logging.getLogger(__name__) + # can pass settings file as command line argument + if len(sys.argv) > 1: + settings_file = sys.argv[1] + else: + settings_file = "bike_route_choice_settings.yaml" + # load settings + settings = load_settings(logger,settings_file) + + run_bike_route_choice(settings, logger) + + end_time = time.time() + elapsed_time = end_time - start_time + logger.info(f"Total time taken: {elapsed_time:.2f} seconds") diff --git a/src/asim/scripts/bike_route_choice/bike_route_choice_settings_mgra.yaml b/src/asim/scripts/bike_route_choice/bike_route_choice_settings_mgra.yaml new file mode 100644 index 000000000..9dbc2c1de --- /dev/null +++ b/src/asim/scripts/bike_route_choice/bike_route_choice_settings_mgra.yaml @@ -0,0 +1,54 @@ +# Path to bike network shapefiles +node_file: SANDAG_Bike_Node.shp +link_file: SANDAG_Bike_Net.shp + +# Data directory, optional additional place to look for data +data_dir: ${path}\input + +# Path to bike route choice model output +output_path: ${path}\output\bike\mgra +output_file_path: ${path}\output\bikeMgraLogsum.csv + +# Edge utility specification file +edge_util_file: bike_edge_utils.csv + +# Traversal utility specification file +traversal_util_file: bike_traversal_utils.csv + +# Bike speed to determine bike times +bike_speed: {mode-nonmotorized-bike-speed:} + +# can define a subset of zones to use for the model +# this is useful for testing or if you only want to run the model for a specific area +# zone_subset: [90, 5310, 5873, 8647, 5647, 3032] + +# whether to treat "mgra" or "taz" as the centroid zones 
+zone_level: mgra +max_dijkstra_utility: 25 + +# how many different paths to build for each origin-destination pair +# this is the number of times dijkstra's algorithm will be run +number_of_iterations: 10 + +# minimum number of iterations/paths. Zone pairs with fewer paths will be discarded +min_iterations: 7 + +# runtime settings +number_of_batches: 16 +number_of_processors: 47 + +generate_shapefile: True +crs: epsg:2230 + +# randomization settings +random_scale_coef: 0.1 +random_scale_link: 0.2 + +# whether to trace the edge utilities +trace_bike_utilities: False + +# trace_origins: [90, 5310, 5873] +# trace_destinations: [8647, 5647, 3032] + +read_cached_bike_net: False +save_bike_net: False \ No newline at end of file diff --git a/src/asim/scripts/bike_route_choice/bike_route_choice_settings_taz.yaml b/src/asim/scripts/bike_route_choice/bike_route_choice_settings_taz.yaml new file mode 100644 index 000000000..1cf20ed88 --- /dev/null +++ b/src/asim/scripts/bike_route_choice/bike_route_choice_settings_taz.yaml @@ -0,0 +1,54 @@ +# Path to bike network shapefiles +node_file: SANDAG_Bike_Node.shp +link_file: SANDAG_Bike_Net.shp + +# Data directory, optional additional place to look for data +data_dir: ${path}\input + +# Path to bike route choice model output +output_path: ${path}\output\bike\taz +output_file_path: ${path}\output\bikeTazLogsum.csv + +# Edge utility specification file +edge_util_file: bike_edge_utils.csv + +# Traversal utility specification file +traversal_util_file: bike_traversal_utils.csv + +# Bike speed to determine bike times +bike_speed: {mode-nonmotorized-bike-speed:} + +# can define a subset of zones to use for the model +# this is useful for testing or if you only want to run the model for a specific area +# zone_subset: [90, 5310, 5873, 8647, 5647, 3032] + +# whether to treat "mgra" or "taz" as the centroid zones +zone_level: taz +max_dijkstra_utility: 120 + +# how many different paths to build for each origin-destination pair +# this is 
the number of times dijkstra's algorithm will be run +number_of_iterations: 10 + +# minimum number of iterations/paths. Zone pairs with fewer paths will be discarded +min_iterations: 0 + +# runtime settings +number_of_batches: 16 +number_of_processors: 47 + +generate_shapefile: True +crs: epsg:2230 + +# randomization settings +random_scale_coef: 0.1 +random_scale_link: 0.2 + +# whether to trace the edge utilities +trace_bike_utilities: False + +# trace_origins: [90, 5310, 5873] +# trace_destinations: [8647, 5647, 3032] + +read_cached_bike_net: False +save_bike_net: False \ No newline at end of file diff --git a/src/asim/scripts/bike_route_choice/bike_route_utilities.py b/src/asim/scripts/bike_route_choice/bike_route_utilities.py new file mode 100644 index 000000000..140775165 --- /dev/null +++ b/src/asim/scripts/bike_route_choice/bike_route_utilities.py @@ -0,0 +1,286 @@ +import os +import logging +import yaml +from pydantic import BaseModel, ValidationError +import geopandas as gpd +import pandas as pd +import numpy as np +import networkx as nx +import matplotlib.pyplot as plt +from typing import Literal + + +class BikeRouteChoiceSettings(BaseModel): + """ + Bike route choice settings + """ + + # path to bike network shapefiles + node_file: str = "SANDAG_Bike_Node.shp" + link_file: str = "SANDAG_Bike_Net.shp" + + # data directory, optional additional place to look for data + data_dir: str = "" + + # edge utility specification file + edge_util_file: str = "bike_edge_utils.csv" + + # traversal utility specification file + traversal_util_file: str = "bike_traversal_utils.csv" + + # coefficient to multiply intrazonal distance by to get logsum + intrazonal_coefficient: str = "util_res_nofacils" + + bike_speed: float = 0 + + # path to bike route choice model output + output_path: str = "output" + output_file_path: str = r"output\bikeTazLogsum.csv" + + # whether to trace edge and traversal utility calculations + trace_bike_utilities: bool = False + + # runtime settings +
number_of_batches: int = 1 + number_of_processors: int = 1 + + # randomization settings + random_seed: int = 42 + random_scale_coef: float = 0.5 + random_scale_link: float = 0.7 + + # trace origin and destination pairs + trace_origins: list = [] + trace_destinations: list = [] + + # whether to recreate Java attributes -- not needed, but here for backwards compatibility tests + recreate_java_attributes: bool = False + + # maximum generalized cost (disutility) for Dijkstra's algorithm to search for shortest paths + max_dijkstra_utility: float = 10 + + # caching options for bike network creation to save time + read_cached_bike_net: bool = False # will crash if network does not exist + save_bike_net: bool = True + + # shapefile generation properties + generate_shapefile: bool = False + crs: str | None = None + + # can define a subset of zones to use for the model + # this is useful for testing or if you only want to run the model for a specific area + zone_subset: int | list | None = None + + # whether to treat mgras or tazs as the centroid zones + zone_level: Literal["taz", "mgra"] = "taz" + + # how many different paths to build for each origin-destination pair + # this is the number of times dijkstra's algorithm will be run + number_of_iterations: int = 10 + + # minimum number of iterations/paths.
Zone pairs with fewer paths will be discarded + min_iterations: int = 0 + + +def load_settings( + logger: logging.Logger, yaml_path: str = "bike_route_choice_settings.yaml", +) -> BikeRouteChoiceSettings: + with open(yaml_path, "r") as f: + data = yaml.safe_load(f) + try: + settings = BikeRouteChoiceSettings(**data) + except ValidationError as e: + print("Settings validation error:", e) + raise + + # ensure output path exists + if not os.path.exists(os.path.expanduser(settings.output_path)): + os.makedirs(os.path.expanduser(settings.output_path)) + logger.info(f"Created output directory: {settings.output_path}") + + # setup logger + log_file_location = os.path.join(os.path.expanduser(settings.output_path), "bike_model.log") + logging.basicConfig( + level=logging.INFO, + format="%(asctime)s - %(levelname)s - %(message)s", + handlers=[logging.FileHandler(log_file_location), logging.StreamHandler()], + ) + + # set random seed for reproducibility + np.random.seed(settings.random_seed) + + return settings + + +def read_file(settings, file_path: str) -> gpd.GeoDataFrame | pd.DataFrame: + """ + Read a shapefile or CSV file and return a GeoDataFrame or DataFrame + + Looks for the file in a few places: + 1. The current working directory + 2. The directory of the script + 3. The data directory specified in the settings file + """ + + def return_file(path: str) -> gpd.GeoDataFrame | pd.DataFrame: + if path.endswith(".shp"): + return gpd.read_file(path) + elif path.endswith(".csv"): + return pd.read_csv(path, comment="#") + else: + raise ValueError(f"Unsupported file type: {path}") + + # 1. Try current working directory + if os.path.exists(file_path): + return return_file(file_path) + + # 2. Try directory of the script + script_dir = os.path.dirname(os.path.abspath(__file__)) + script_path = os.path.join(script_dir, file_path) + if os.path.exists(script_path): + return return_file(script_path) + + # 3.
Try data directory + data_path = os.path.join(os.path.expanduser(settings.data_dir), file_path) + if os.path.exists(data_path): + return return_file(data_path) + + raise FileNotFoundError( + f"File '{file_path}' not found in current directory, script directory, or data directory." + ) + + +def plot_shortest_path_with_results_buffered( + nodes, edges, shortest_path_df, origin, destination, iteration, buffer_size=None +): + """Plot the shortest path between two nodes with additional path information within a square buffer around the origin node.""" + print("Plotting the shortest path...") + path_data = shortest_path_df[ + (shortest_path_df.origin == origin) + & (shortest_path_df.destination == destination) + & (shortest_path_df.iteration == iteration) + ] + if path_data.empty: + print( + f"No path found between {origin} and {destination} for iteration {iteration}" + ) + return + + # Get the coordinates of the origin node + origin_node = nodes[nodes["id"] == origin].iloc[0] + origin_x, origin_y = origin_node["x"], origin_node["y"] + + if buffer_size: + # Define the buffer boundaries + min_x, max_x = origin_x - buffer_size, origin_x + buffer_size + min_y, max_y = origin_y - buffer_size, origin_y + buffer_size + + # Filter nodes within the buffer + filtered_nodes = nodes[ + (nodes["x"] >= min_x) + & (nodes["x"] <= max_x) + & (nodes["y"] >= min_y) + & (nodes["y"] <= max_y) + ] + + # Filter edges to include only those with both nodes within the buffer + filtered_edges = edges[ + edges["fromNode"].isin(filtered_nodes["id"]) + & edges["toNode"].isin(filtered_nodes["id"]) + ] + + # check to make sure destination node is also in the buffer + if destination not in filtered_nodes["id"].values: + print( + f"Destination node {destination} is not in the buffer size of {buffer_size}" + ) + + else: + filtered_nodes = nodes + filtered_edges = edges + + # Create a graph from the filtered nodes and edges + G = nx.Graph() + G.add_nodes_from( + [ + (node["id"], {"pos": (node["x"],
node["y"])}) + for _, node in filtered_nodes.iterrows() + ] + ) + G.add_edges_from( + [ + (edge.fromNode, edge.toNode, {"weight": edge.distance}) + for _, edge in filtered_edges.iterrows() + ] + ) + + # Extract positions for plotting + pos = nx.get_node_attributes(G, "pos") + + # Plot the network + plt.figure(figsize=(10, 10)) + nx.draw( + G, pos, node_size=10, with_labels=False, edge_color="gray", alpha=0.5, width=0.5 + ) + + # Highlight the shortest path + path_edges = [ + (path_data.from_node.iloc[i], path_data.to_node.iloc[i]) + for i in range(len(path_data)) + ] + path_nodes = set(path_data.from_node).union(set(path_data.to_node)) + nx.draw_networkx_nodes( + G, pos, nodelist=path_nodes, node_color="red", node_size=50, label="Path Nodes" + ) + nx.draw_networkx_edges( + G, pos, edgelist=path_edges, edge_color="blue", width=2, label="Shortest Path" + ) + + # Highlight the origin and destination nodes + nx.draw_networkx_nodes( + G, pos, nodelist=[origin], node_color="green", node_size=100, label="Origin" + ) + nx.draw_networkx_nodes( + G, + pos, + nodelist=[destination], + node_color="purple", + node_size=100, + label="Destination", + ) + + # Calculate path information + num_edges = len(path_edges) + total_distance = path_data["distance"].sum() + + # total_cost = path_data["cost_total"].sum() + # turns = path_data[path_data.turnType > 0]["turnType"].count() + path_size = path_data.iloc[0]["path_size"] + # Add path information to the plot + info_text = ( + f"Origin: {origin}\n" + f"Destination: {destination}\n" + f"Iteration: {iteration}\n" + f"Number of Edges: {num_edges}\n" + f"Total Distance: {total_distance:.2f} units\n" + # f"Turns: {turns}\n" + # f"Total Cost: {total_cost:.2f}\n" + f"Path Size: {path_size:.2f}" + ) + + plt.text( + 0.05, + 0.95, + info_text, + transform=plt.gca().transAxes, + fontsize=12, + verticalalignment="top", + bbox=dict(boxstyle="round,pad=0.3", edgecolor="black", facecolor="white"), + ) + + # Add a legend + plt.legend(loc="upper 
right") + plt.title( + f"Shortest Path from Node {origin} to Node {destination} for iteration {iteration}" + ) + plt.show(block=True) diff --git a/src/asim/scripts/bike_route_choice/bike_threshold_calculator.py b/src/asim/scripts/bike_route_choice/bike_threshold_calculator.py new file mode 100644 index 000000000..c958b08e6 --- /dev/null +++ b/src/asim/scripts/bike_route_choice/bike_threshold_calculator.py @@ -0,0 +1,148 @@ +import sys, time, logging +import pandas as pd +import matplotlib.pyplot as plt +from bike_route_choice import load_settings, run_bike_route_choice + +def plot_results(results): + fig, ax1 = plt.subplots() + ax2 = ax1.twinx() + ax1.grid(True, which='both', axis='x', linestyle='-', color='#48484a', alpha=1,zorder=0) + ax1.grid(True, which='both', axis='y', linestyle='-', color='#006fa1', alpha=1,zorder=0) + ax2.grid(True, which='both', axis='y', linestyle='--', color='#f68b1f', alpha=0.8,zorder=0) + ax1.set_facecolor('#dcddde') + ax1.set_ylabel('99th %ile Distance (mi)',color='#006fa1') + ax1.tick_params(axis='y',labelcolor='#006fa1') + + ax2.set_ylabel('Runtime (sec)',color='#f68b1f',rotation=-90,labelpad=15) + ax2.tick_params(axis='y',labelcolor='#f68b1f') + + ax1.set_xlabel('Utility Threshold (utiles)') + dist_plot = ax1.scatter(results.index,results.dist,color='#006fa1',label='99th %ile Distance',zorder=3) + runtime_plot = ax2.scatter(results.index,results.runtime,color='#f68b1f',label='Runtime',zorder=3,marker='s',alpha=0.8) + ymin, ymax= ax2.get_ylim() + padding = 0.1*(ymax-ymin) + ax2.set_ylim(ymin-padding,ymax+padding) + legend = ax2.legend(handles=[dist_plot,runtime_plot],labels=['Distance','Runtime'],loc='upper left') + legend.get_frame().set_facecolor('white') + legend.get_frame().set_alpha(1) + legend.get_frame().set_edgecolor('#48484a') + + plt.show() + +if __name__=="__main__": + assert len(sys.argv) > 2,"""Usage: + python bike_threshold_calculator.py settings_filepath target_distance [target_margin [percentile [max_iterations]]] + + parameters: + settings
filepath: path to YAML file containing bike model settings + target distance: the distance for which the search should aim (in miles) + target margin: the margin of error (< 1) allowed before termination (optional, default: 0.1) + percentile: the percentile of distance to compare against the target (optional, default: 0.99) + max iterations: the most bike model iterations that can be performed in the search (optional, default: 20) + + examples: + + python bike_threshold_calculator.py bike_route_choice_settings_taz.yaml 20 + # the resulting 99th %ile distance must be within 10% of the 20-mile target distance + # equivalent to: + python bike_threshold_calculator.py bike_route_choice_settings_taz.yaml 20 0.1 0.99 20 + + python bike_threshold_calculator.py bike_route_choice_settings_mgra.yaml 3 0.05 + # the resulting 99th %ile distance must be within 5% of the three-mile target distance + """ + + # pass settings file as command line argument + + settings_file = sys.argv[1] + + target_distance = float(sys.argv[2]) + + if len(sys.argv) > 3: + target_margin = float(sys.argv[3]) + else: + target_margin = 10/100 # 10% + + if len(sys.argv) > 4: + pctile = float(sys.argv[4]) + else: + pctile = 0.99 + + if len(sys.argv) > 5: + max_iterations = int(sys.argv[5]) + else: + max_iterations = 20 + + logger = logging.getLogger(__name__) + + # load settings + settings = load_settings(logger, settings_file) + + # first x data point + cur_threshold = settings.max_dijkstra_utility + upper_bound = None + lower_bound = None + + distances = {} + times = {} + + iteration = 0 + while True: + logger.info(f"Running w/ threshold {cur_threshold}") + start_time = time.time() + output = run_bike_route_choice(settings, logger) + end_time = time.time() + elapsed_time = end_time - start_time + + # record runtime + times[cur_threshold] = elapsed_time + + # calculate the 99th %ile of distances + cur_distance = output.distance.quantile(pctile) + logger.info(f"99th %ile distance: {cur_distance}") + +
distances[cur_threshold] = cur_distance + + # termination condition + if abs(cur_distance - target_distance)/target_distance < target_margin or iteration >= max_iterations: + break + + else: + if cur_distance > target_distance: + # distance is too high, so reduce threshold + + upper_bound = cur_threshold + + # if we don't have a lower bound yet + if lower_bound is None: + # just halve the current threshold + cur_threshold = cur_threshold / 2 + + else: + # move halfway to the lower bound + cur_threshold = cur_threshold - (cur_threshold - lower_bound)/2 + + + else: # cur_distance < target_distance + # distance is too low, so increase threshold + + lower_bound = cur_threshold + + # if we don't have an upper bound yet + if upper_bound is None: + # just double the current threshold + cur_threshold = cur_threshold * 2 + + else: + # move halfway to the upper bound + cur_threshold = cur_threshold + (upper_bound - cur_threshold)/2 + + settings.max_dijkstra_utility = cur_threshold + + iteration +=1 + + results = pd.DataFrame({'dist':distances,'runtime':times}) + results.index.name = 'threshold' + + plot_results(results) + + results.to_csv(settings.output_path+"/threshold_results.csv") \ No newline at end of file diff --git a/src/asim/scripts/bike_route_choice/bike_traversal_utils.csv b/src/asim/scripts/bike_route_choice/bike_traversal_utils.csv new file mode 100644 index 000000000..d625e8160 --- /dev/null +++ b/src/asim/scripts/bike_route_choice/bike_traversal_utils.csv @@ -0,0 +1,23 @@ +Label,Description,Expression,Coefficient +util_thru_centroid,Prevent taking shortcuts via centroid connectors,thruCentroid,-999.9 +# The below coefficients are adapted from Lukawska et al. 
(2023),,, +util_yield,Biker must yield at junction,"@(df.signalExclRight | df.unlfrma | df.unlfrmi | df.unxma | df.unxmi)",-0.301 +util_downgrade,Biker has ROW but downgraded functional class,"@df.roadDowngrade & (~(df.signalExclRight | df.unlfrma | df.unlfrmi | df.unxma | df.unxmi))",-0.164 +# FIXME: do we really want to incentivize ALL nodes like this? +# The above coefficients are adapted from Lukawska et al. (2023),,, +################################################################################## +# The below coefficients are from the ABM2 model and have dubious sourcing (see ABM2 documentation),,, +# util_turns,Total turns,(turnType != 0),-0.083 +# util_traffic_signal,Traffic signals excluding right turns and T junctions,signalExclRight,-0.040 +# util_unsig_left_from_principal,Unsignalized left turns from principal arterials,unlfrma,-0.360 +# util_unsig_left_from_minor,Unsignalized left turns from minor arterials,unlfrmi,-0.150 +# util_unsig_cross_or_left_onto_principal,Unsignalized cross or left onto principal arterials,unxma,-0.48 +# util_unsig_cross_or_left_onto_minor,Unsignalized cross or left onto minor arterials,unxmi,-0.1 +# can't calculate path size here since we don't know the path yet,, +# path size calculation is done in the code when calculating the logsum,, +# util_log_path_size,log of path size,logPathSize,1.0 +# The above coefficients are from the ABM2 model and have dubious sourcing (see ABM2 documentation),,, +################################################################################## +# The below traffic signal coefficient is from Meister et al. (2024) +# util_traffic_signal,Traffic signals excluding right turns and T junctions,signalExclRight,-0.03 +# The above traffic signal coefficient is from Meister et al. 
(2024)
\ No newline at end of file
diff --git a/src/main/emme/toolbox/master_run.py b/src/main/emme/toolbox/master_run.py
index 5b6071a66..64bdea0c3 100644
--- a/src/main/emme/toolbox/master_run.py
+++ b/src/main/emme/toolbox/master_run.py
@@ -251,6 +251,7 @@ def __call__(self, main_directory, scenario_id, scenario_title, emmebank_title,
         manage_settings(_join(main_directory, "conf", "sandag_abm.properties"))
         manage_settings(_join(main_directory, "src", "asim", "configs"))
         manage_settings(_join(main_directory, "src", "asim-cvm", "configs"))
+        manage_settings(_join(main_directory, "src", "asim", "scripts", "bike_route_choice"))
 
         props = load_properties(_join(main_directory, "conf", "sandag_abm.properties"))
         props.set_year_specific_properties(_join(main_directory, "input", "parametersByYears.csv"))
@@ -528,8 +529,14 @@ def __call__(self, main_directory, scenario_id, scenario_title, emmebank_title,
                     import_demand(omx_file, "TRUCK", period, base_scenario)
 
             if not skipBikeLogsums:
+                for file_name in ['bike_route_choice_settings_mgra.yaml', 'bike_route_choice_settings_taz.yaml']:
+                    with open(_join(self._path, r'src\asim\scripts\bike_route_choice', file_name), 'r') as file:
+                        settings = file.read()
+                    settings = settings.replace(r"${path}", self._path)
+                    with open(_join(self._path, r'src\asim\scripts\bike_route_choice', file_name), 'w') as file:
+                        file.write(settings)
                 self.run_proc("runSandagBikeLogsums.cmd", [drive, path_forward_slash],
-                              "Bike - create AT logsums and impedances")
+                              "Bike - create AT logsums and impedances", capture_output=True)
                 # Copy updated logsums to scenario input to avoid overwriting
                 self.copy_files(["bikeMgraLogsum.csv", "bikeTazLogsum.csv"], output_dir, input_dir)
             elif not os.path.exists(_join(output_dir, "bikeMgraLogsum.csv")):
diff --git a/src/main/resources/runSandagBikeLogsums.cmd b/src/main/resources/runSandagBikeLogsums.cmd
index da7e82dc3..282e4d66f 100644
--- a/src/main/resources/runSandagBikeLogsums.cmd
+++ 
b/src/main/resources/runSandagBikeLogsums.cmd @@ -1,45 +1,13 @@ -rem @echo off +ECHO OFF set PROJECT_DRIVE=%1 set PROJECT_DIRECTORY=%2 %PROJECT_DRIVE% -cd %PROJECT_DRIVE%%PROJECT_DIRECTORY% -call %PROJECT_DIRECTORY%\bin\CTRampEnv.bat +cd /d %PROJECT_DIRECTORY% -rem ### First save the JAVA_PATH environment variable so it s value can be restored at the end. -set OLDJAVAPATH=%JAVA_PATH% - -rem ### Set the directory of the jdk version desired for this model run -set JAVA_PATH=%JAVA_64_PATH% - -rem ### Name the project directory. This directory will hava data and runtime subdirectories -set RUNTIME=%PROJECT_DIRECTORY% -set CONFIG=%RUNTIME%/conf - -set JAR_LOCATION=%PROJECT_DIRECTORY%/application -set LIB_JAR_PATH=%JAR_LOCATION%\* - -rem ### Define the CLASSPATH environment variable for the classpath needed in this model run. -set CLASSPATH=%CONFIG%;%RUNTIME%;%LIB_JAR_PATH%; - -rem ### Save the name of the PATH environment variable, so it can be restored at the end of the model run. -set OLDPATH=%PATH% - -rem ### Change the PATH environment variable so that JAVA_HOME is listed first in the PATH. -rem ### Doing this ensures that the JAVA_HOME path we defined above is the on that gets used in case other java paths are in PATH. 
-set PATH=%JAVA_PATH%\bin;%OLDPATH% - -%PROJECT_DRIVE% -cd %PROJECT_DRIVE%%PROJECT_DIRECTORY% - -rem rem build bike logsums -%JAVA_64_PATH%\bin\java -showversion -server -Xmx%MEMORY_BIKELOGSUM_MAX% -cp "%CLASSPATH%" -Dlog4j.configuration=log4j.xml -Dproject.folder=%PROJECT_DIRECTORY% org.sandag.abm.active.sandag.SandagBikePathChoiceLogsumMatrixApplication %PROPERTIES_NAME% -if ERRORLEVEL 1 goto DONE - -:done -rem ### restore saved environment variable values, and change back to original current directory -set JAVA_PATH=%OLDJAVAPATH% -set PATH=%OLDPATH% +CALL %activate_uv_asim% +python src/asim/scripts/bike_route_choice/bike_route_choice.py src/asim/scripts/bike_route_choice/bike_route_choice_settings_mgra.yaml || exit /b 2 +python src/asim/scripts/bike_route_choice/bike_route_choice.py src/asim/scripts/bike_route_choice/bike_route_choice_settings_taz.yaml || exit /b 2 \ No newline at end of file