Skip to content

Adding 'problems' module#269

Open
j-atkins wants to merge 75 commits intomainfrom
problems
Open

Adding 'problems' module#269
j-atkins wants to merge 75 commits intomainfrom
problems

Conversation

@j-atkins
Copy link
Collaborator

@j-atkins j-atkins commented Jan 15, 2026

Summary

This PR adds a 'problems' module to VirtualShip, whereby as expeditions are run users are faced with authentic problems which may happen on a real-life research vessel. These can be "general" problems (e.g. delayed food/fuel deliveries, unplanned safety drills), or "instrument" problems (e.g. CTD winch breaks down).

Features & implementation

All problems are associated with a delay duration. There are checks of whether there is already enough contingency time built into the schedule to still reach the next waypoint in time. If there is then the simulation is allowed to continue. However, if the next waypoint would be missed, users are prompted to account for the delay in their schedules and the expedition cannot proceed until the scheduling is resolved.

The problems module interacts with Checkpoint objects to test that the scheduling issues have been resolved. A unique record of the problems is also cached to keep track of and determine whether they've been resolved etc. It may be that dealing with a problem at one waypoint causes more scheduling issues way down the chain of waypoints. This is by design (as similar headaches would be experienced in real life!) and the schedule/checkpoint verification methods should capture this and prompt further changes.

The selection of problems and their execution in the main virtualship run workflow is handled by a new ProblemSimulator class. Specific problem scenarios are housed in scenarios.py as problem classes, which are themselves sub-classes of GeneralProblem and InstrumentProblem base classes (to establish a structure for building complexity in further PRs).

When propagated through virtualship run, a selection of all the problems for the expedition is first created but the individual problems are only raised at the right time. For example, a pre-departure problem will occur before any instrument simulations and a CTD related problem only when the CTD instrument is being simulated.

A new 'prob-level' argument is now added to the virtualship run CLI command. There are three options to choose from:

  • Level 0 = No problems encountered during the expedition (i.e. turn off problems module).
  • Level 1 = 1-2 problems encountered across the expedition [DEFAULT].
  • Level 2 = 1 or more problems encountered, depending on expedition length and complexity, where longer and more complex expeditions will encounter more problems.

N.B. I have set the default to prob_level = 1 for now, as this is best for upcoming in-class VirtualShip use.

Additional notes

I have set up the logic so that if waypoints are added/removed from the expedition, if a waypoint location changes or the instruments employed changes, this will prompt a new set of problems for the expedition. I see this as the expedition having materially changed and it being a 'new' expedition. This also prevents just re-running the same expedition until you get a 'nicer' set of problems (without manually removing the problems cache!).


  • Tests

Closes #120

@j-atkins j-atkins marked this pull request as draft January 15, 2026 13:31
@j-atkins j-atkins changed the title Problems Adding 'problems' module Jan 15, 2026
Copy link
Member

@erikvansebille erikvansebille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR! I quickly went through the code, but mostly focussed on the problem descriptions and have a few suggestions on these


message: str = (
"A pod of dolphins is observed swimming directly beneath the planned deployment area. "
"To avoid risk to wildlife and comply with environmental protocols, all in-water operations "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what if there are no "in-water" operations at this station. Is it not safer to say "all operations"?

message: str = (
"A miscommunication with the ship's captain results in the sudden initiation of a mandatory safety drill. "
"The emergency vessel must be lowered and tested while the ship remains stationary, pausing all scientific "
"operations for the duration of the exercise. The drill introduces a delay of approximately 2 hours."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, and on all later problems, I'm a bit confused about the word "approximately". In the schedule, it will be exactly 5 hours. So better to remove that word, as it could only confuse students (i.e. they may decide to build 10 hour contingency)?

message: str = (
"The drifter scheduled for deployment fails to establish a satellite connection during "
"pre-launch checks. To improve signal acquisition, the float must be moved to a higher location on deck "
"with fewer obstructions. The team waits for the satellite fix to come through, resulting in a delay "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"fix" is perhaps a confusing word here?

message: str = (
"The Argo float scheduled for deployment fails to establish a satellite connection during "
"pre-launch checks. To improve signal acquisition, the float must be moved to a higher location on deck "
"with fewer obstructions. The team waits for the satellite fix to come through, resulting in a delay "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here, replace fix with e.g. "waits for the satellite connection to be established"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great set of problems! Would be fun to collect a few more at e.g. Ocean Sciences Meeting?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, nice idea, I'll ask for ideas at my poster!

* display problems output in tabular format

* tidy up and fix simulation exiting procedure
j-atkins and others added 2 commits February 6, 2026 15:26
* update quickstart

* update sail the ship
* add post expedition report generator

* update tests and add new for post expedition report

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@VeckoTheGecko
Copy link
Collaborator

VeckoTheGecko commented Feb 6, 2026

A new 'prob-level' argument is now added to the virtualship run CLI command

(nit) We could be a bit more descriptive and playful with this naming by calling it 'difficulty-level' and using "easy", "medium", and "hard" - almost gamifying the naming, which is nice even if this is a more "advanced user" or "developer" API

I have set up the logic so that if waypoints are added/removed from the expedition, if a waypoint location changes [...] this will prompt a new set of problems for the expedition

I don't think that waypoint location or adding/removal should be used to seed the problems. This would mean that if users change a waypoint location to deal with a problem they may accidentally get a new problem set in return.

@j-atkins
Copy link
Collaborator Author

j-atkins commented Feb 6, 2026

I don't think that waypoint location or adding/removal should be used to seed the problems. This would mean that if users change a waypoint location to deal with a problem they may accidentally get a new problem set in return.

Yes, very true. I will fix this!

@j-atkins
Copy link
Collaborator Author

j-atkins commented Feb 9, 2026

I don't think that waypoint location or adding/removal should be used to seed the problems. This would mean that if users change a waypoint location to deal with a problem they may accidentally get a new problem set in return.

Yes, very true. I will fix this!

Now fixed 👍

Copy link
Collaborator

@VeckoTheGecko VeckoTheGecko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the user interaction (showing problems, displaying the table left to right). Very readable and nice to show the users that gradually.

There are lots of folders created under problems_*. Perhaps it would be nice to have a cache dir in the expedition dir so that these can be hidden away?

Its not clear what the responsibility of the checkpoint.yaml files are. IIRC these were introduced before problems. What was the responsibility of this file, and does its responsibility change now we have problems? (also, is there overlap in responsibility between that and the problems_*/log/schedule_original.yaml file?)

Comment on lines 79 to 95
expedition_identifier = expedition.get_unique_identifier()

# dedicated problems directory for this expedition
problems_dir = expedition_dir / PROBLEMS_ENCOUNTERED_DIR.format(
expedition_identifier=expedition_identifier
)

# Verify instruments_config file is consistent with schedule
# verify instruments_config file is consistent with schedule
expedition.instruments_config.verify(expedition)

# load last checkpoint
checkpoint = _load_checkpoint(expedition_dir)
if checkpoint is None:
checkpoint = Checkpoint(past_schedule=Schedule(waypoints=[]))

# verify that schedule and checkpoint match
checkpoint.verify(expedition.schedule)
# verify that schedule and checkpoint match, and that problems have been resolved
checkpoint.verify(expedition, problems_dir.joinpath(LOG_DIR))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this logic re. the problems cache folder (+dir structure) should be refactored out so it can be more easily maintained

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logs associated with the problems module are now saved to a cache dir

Comment on lines +82 to +181
# 2) check that problems have been resolved in the new schedule
hash_fpaths = [
str(path.resolve()) for path in problems_dir.glob("problem_*.json")
]

if len(hash_fpaths) > 0:
for file in hash_fpaths:
with open(file, encoding="utf-8") as f:
problem = json.load(f)
if problem["resolved"]:
continue
elif not problem["resolved"]:
# check if delay has been accounted for in the new schedule (at waypoint immediately after problem waypoint; or first waypoint if pre-departure problem)
delay_duration = timedelta(
hours=float(problem["delay_duration_hours"])
)

problem_waypoint = (
new_schedule.waypoints[0]
if problem["problem_waypoint_i"] is None
else new_schedule.waypoints[problem["problem_waypoint_i"]]
)

# pre-departure problem: check that whole delay duration has been added to first waypoint time (by testing against past schedule)
if problem["problem_waypoint_i"] is None:
time_diff = (
problem_waypoint.time - self.past_schedule.waypoints[0].time
)
resolved = time_diff >= delay_duration

# problem at a later waypoint: check new scheduled time exceeds sail time + delay duration + instrument deployment time (rather whole delay duration add-on, as there may be _some_ contingency time already scheduled)
else:
failed_waypoint = new_schedule.waypoints[self.failed_waypoint_i]

scheduled_time = failed_waypoint.time - problem_waypoint.time

stationkeeping_time = _calc_wp_stationkeeping_time(
problem_waypoint.instrument,
expedition,
) # total time required to deploy instruments at problem waypoint

sail_time = _calc_sail_time(
problem_waypoint.location,
failed_waypoint.location,
ship_speed_knots=expedition.ship_config.ship_speed_knots,
projection=PROJECTION,
)[0]

min_time_required = (
sail_time + delay_duration + stationkeeping_time
)

resolved = scheduled_time >= min_time_required

if resolved:
print(
"\n\n🎉 Previous problem has been resolved in the schedule.\n"
)

# save back to json file changing the resolved status to True
problem["resolved"] = True
with open(file, "w", encoding="utf-8") as f_out:
json.dump(problem, f_out, indent=4)

# only handle the first unresolved problem found; others will be handled in subsequent runs but are not yet known to the user
break

else:
problem_wp_str = (
"in-port"
if problem["problem_waypoint_i"] is None
else f"at waypoint {problem['problem_waypoint_i'] + 1}"
)
affected_wp_str = (
"1"
if problem["problem_waypoint_i"] is None
else f"{problem['problem_waypoint_i'] + 2}"
)
time_elapsed = (
(sail_time + delay_duration + stationkeeping_time)
if problem["problem_waypoint_i"] is not None
else delay_duration
)
failed_waypoint_time = (
failed_waypoint.time
if problem["problem_waypoint_i"] is not None
else new_schedule.waypoints[0].time
)
current_time = problem_waypoint.time + time_elapsed

raise CheckpointError(
f"The problem encountered in previous simulation has not been resolved in the schedule! Please adjust the schedule to account for delays caused by the problem (by using `virtualship plan` or directly editing the {EXPEDITION} file).\n\n"
f"The problem was associated with a delay duration of {problem['delay_duration_hours']} hours {problem_wp_str} (meaning waypoint {affected_wp_str} could not be reached in time). "
f"Currently, the ship would reach waypoint {affected_wp_str} at {current_time}, but the scheduled time is {failed_waypoint_time}."
+ (
f"\n\nHint: don't forget to factor in the time required to deploy the instruments {problem_wp_str} when rescheduling waypoint {affected_wp_str}."
if problem["problem_waypoint_i"] is not None
else ""
)
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that a lot of the logic can be refactored out. Theres a lot of info needed about the structure of the problems file, and triggering the writing of said file. This can become very difficult if we start reading/writing this file in a different part of the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I see what you mean, new issue opened in #283 for a later PR

Comment on lines 74 to 88
@pytest.mark.parametrize(
"delay_duration_hours, new_wp2_time, should_resolve",
[
(1.0, datetime(2024, 2, 1, 15, 0, 0), True), # problem resolved
(5.0, datetime(2024, 2, 1, 12, 0, 0), False), # problem unresolved
],
)
@patch(
"virtualship.models.checkpoint._calc_wp_stationkeeping_time",
return_value=timedelta(hours=1),
)
@patch(
"virtualship.models.checkpoint._calc_sail_time",
return_value=(timedelta(hours=2), None),
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for the patching here? In general in testing I think we should avoid reaching for monkeypatching and mocking as much as possible - its a testing technique with lots of footguns.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patching now removed

assert is_dataclass(instance)

# required attributes and types
assert hasattr(instance, "message")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to check if the attr is there before checking it further on the next line. You can remove the hasattr checks in both these functions

Comment on lines 299 to 311
@dataclass
@register_instrument_problem
class ArgoSatelliteConnectionDelay(InstrumentProblem):
"""Problem: Argo float fails to establish satellite connection before deployment."""

message: str = (
"The Argo float scheduled for deployment fails to establish a satellite connection during "
"pre-launch checks. To improve signal acquisition, the float must be moved to a higher location on deck "
"with fewer obstructions. The team waits for the satellite connection to be established, resulting in a delay "
"of 2 hours."
)
delay_duration: timedelta = timedelta(hours=2.0)
instrument_type: InstrumentType = InstrumentType.ARGO_FLOAT
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that these many classes are more complicated than they need to be (due to confusion about the roles of data classes).

@dataclass
class GeneralProblem:...

uses dataclasses correctly (i.e., it defines a container structure that houses data and can be used in the rest of the application) however the definition of the data itself should not be done via subclassing but by creating these instances.

Also @register_general_problem etc. isn't needed.

I think the following would be much simpler?

GENERAL_PROBLEMS = [
    GeneralProblem(# Problem: Sudden initiation of a mandatory safety drill.
        message = (
            "A miscommunication with the ship's captain results in the sudden initiation of a mandatory safety drill. "
            "The emergency vessel must be lowered and tested while the ship remains stationary, pausing all scientific "
            "operations for the duration of the exercise. The drill introduces a delay of 2 hours."
        ),
        delay_duration = timedelta(hours=2.0),
        pre_departure = False,
    ),
    GeneralProblem( # Problem: Fuel delivery tanker delayed, causing a postponement of departure.
        message = (
            "The fuel tanker expected to deliver fuel has not arrived. Until the tanker reaches the pier, "
            "we cannot leave. Once it arrives, securing the fuel lines in the ship's tanks and fueling operations "
            "will also take additional time. These combined delays postpone departure by 5 hours."
        ),
        delay_duration = timedelta(hours=5.0),
        pre_departure = True,
    ),
    GeneralProblem( #Problem: Marine mammals observed in deployment area, causing delay.
        message = (
        "A pod of dolphins is observed swimming directly beneath the planned deployment area. "
        "To avoid risk to wildlife and comply with environmental protocols, all operations "
        "must pause until the animals move away from the vicinity. This results in a delay of about 2 hours."
    ),
    delay_duration = timedelta(hours=2),
    pre_departure= False,
    ),
]

I see the message # TODO: could add a (abstract) method to check if problem is valid for given waypoint, e.g. location (tropical waters etc.) but I think that should be scoped out and handled later (agreeing on what context is available to the problem)

I think the best way to handle that would probably be via a

@dataclass
class GeneralProblem:
    """Base class for general problems. Can occur pre-depature or during expedition."""

    message: str
    delay_duration: timedelta
    pre_departure: bool  # True if problem occurs before expedition departure, False if during expedition

+    should_run: types.FunctionType # e.g, has the waypoint information
def in_tropical_waters(waypoint):...
GeneralProblem(
    message=...,
    delay_duration=...,
   pre_departure=...,
   should_run=in_tropical_waters,

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the following would be much simpler?

GENERAL_PROBLEMS = [
    GeneralProblem(# Problem: Sudden initiation of a mandatory safety drill.
        message = (
            "A miscommunication with the ship's captain results in the sudden initiation of a mandatory safety drill. "
...

Now implemented as suggested

@j-atkins
Copy link
Collaborator Author

j-atkins commented Feb 9, 2026

Thanks @VeckoTheGecko! PR now updated to reflect your suggestions.

I've also updated the logic for determining an expedition identifier (id). Before it was determining a new expedition (and therefore new set of problems) based on a change in instrument deployment at any waypoint. However, that could inadvertently lead to a new set of problems being returned if a user deletes a waypoint to deal with scheduling issues.

Now, instead, a new id is only returned when there is an addition to the set of instruments deployed across whole expedition (but not subtraction because this could be triggered, again, by deleting waypoints if the waypoint is the only one with that instrument type).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Adding problems during expedition

4 participants