Conversation
ee6b375 to
2f42d8c
Compare
75b0bad to
a145408
Compare
db7d73b to
d3589c2
Compare
|
@masokol , @emolsson , @itskarlsson , @tommystendahl , @etedpet , @jwaeab , @VictorCavichioli , @SajidRiaz138 , @ch1bbe , @ArcturusMengsk , @DanielwEriksson , @manmagic3 as contributors to https://github.com/Ericsson/ecchronos we would very much appreciate any last-minute-pre-CEP-vote review to this PR for Cassandra's new repair solution. (more technical/code review will continue post-CEP vote.) |
|
Hi, I think this looks promising! I have a few points:
|
|
Thanks, @masokol, for the review! Please find my response here:
This is a great suggestion, and the framework can be enhanced easily - I just filed a new sub-ticket CASSANDRA-20013 to track this
There is already a new nodetool command that would print the current status. Please take a look at it here.
Currently, it will repair the tables randomly, but it can be enhanced to add a priority as a CQL table property that an end user can configure, which can also be enhanced easily. Just added this enhancement to the ticket mentioned above. |
d3589c2 to
3610701
Compare
5730aa2 to
126684a
Compare
ec99162 to
f9f6971
Compare
95d0a5e to
e093b34
Compare
tolbertam
left a comment
There was a problem hiding this comment.
still in process of reviewing but thought I'd leave comments as I go since I still have a bit left.
test/unit/org/apache/cassandra/service/AutoRepairServiceSetterTest.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/tcm/sequences/BootstrapAndJoin.java
Outdated
Show resolved
Hide resolved
test/unit/org/apache/cassandra/config/YamlConfigurationLoaderTest.java
Outdated
Show resolved
Hide resolved
| .readRepair(params.readRepair); | ||
| .readRepair(params.readRepair) | ||
| .automatedRepair(params.autoRepair) | ||
| ; |
There was a problem hiding this comment.
nit: move ";" on the line above
| */ | ||
|
|
||
| package org.apache.cassandra.repair.autorepair; | ||
|
|
There was a problem hiding this comment.
nit: remove extra line
| } | ||
| }); | ||
|
|
||
|
|
There was a problem hiding this comment.
nit: extra line
| autoRepairProxy.setAutoRepairEnabled(repairType, enabled); | ||
| } | ||
|
|
||
| public void setRepairThreads(String repairType, int repairThreads) |
There was a problem hiding this comment.
s/setRepairThreads/setAutoRepairThreads/ here and elsewhere in this class
| public class AutoRepairMetrics | ||
| { | ||
| public static final String TYPE_NAME = "autorepair"; | ||
| public Gauge<Integer> repairsInProgress; |
There was a problem hiding this comment.
This and the other Gauge in this class can be final
| opts.table_max_repair_time = new DurationSpec.IntSecondsBound("6h"); | ||
| opts.materialized_view_repair_enabled = false; | ||
| opts.token_range_splitter = new ParameterizedClass(DEFAULT_SPLITTER.getName(), Collections.emptyMap()); | ||
| opts.initial_scheduler_delay = new DurationSpec.IntSecondsBound("5m"); // 5 minutes |
There was a problem hiding this comment.
The comment is not needed here
| COL_REPAIR_TYPE, COL_HOST_ID); | ||
|
|
||
| final static String RECORD_FINISH_REPAIR_HISTORY = String.format( | ||
|
|
There was a problem hiding this comment.
nit: remove empty line
| } | ||
| } | ||
|
|
||
| private ImmutableMap<String, String> options; |
There was a problem hiding this comment.
Make this a final.
| return StringUtils.isNumeric(value); | ||
| } | ||
|
|
||
|
|
There was a problem hiding this comment.
nit: remove extra line
| .add("extensions", params.extensions); | ||
| .add("extensions", params.extensions) | ||
| .add("auto_repair", params.autoRepair.asMap()); | ||
|
|
There was a problem hiding this comment.
nit: remove the new line
| .readRepair(getReadRepairStrategy(row)); | ||
| .readRepair(getReadRepairStrategy(row)) | ||
| .automatedRepair(AutoRepairParams.fromMap(row.getFrozenTextMap("auto_repair"))) | ||
| ; |
There was a problem hiding this comment.
move it to the above line
| private static final TableMetadata AutoRepairPriorityTable = | ||
| parse(AUTO_REPAIR_PRIORITY, "Auto repair priority for each group", AUTO_REPAIR_PRIORITY_CQL).build(); | ||
|
|
||
|
|
There was a problem hiding this comment.
nit: remove extra line
| READ_REPAIR; | ||
| READ_REPAIR, | ||
| AUTO_REPAIR, | ||
| ; |
There was a problem hiding this comment.
move it above
| public interface AutoRepairServiceMBean | ||
| { | ||
| /** | ||
| * Enable or disable auto-repair for a given repair type |
There was a problem hiding this comment.
this comment is not needed as APIs are self explanatory
…geSplitter.java Co-authored-by: Francisco Guerrero <frank.guerrero@gmail.com>
…geSplitter.java Co-authored-by: Francisco Guerrero <frank.guerrero@gmail.com>
Previously AutoRepairUtils.myTurnToRunRepair would return MY_TURN if there is capacity to select a new node for repair and the current node has the oldest lastRepairFinishTime. This change adjusts this method to make use of a newly added method getMostEligibleHostToRepair which will identify the node with the oldest lastRepairFinishTime of which none of the node's replicas are actively repairing. This should prevent replicas from actively repairing concurrently, which would increase load on replica sets and also create anticompaction conflicts for incremental repair. Behavior is configurable by allow_parallel_replica_repair and allow_parallel_replica_repair_across_schedules configuration. patch by Andy Tolbert, reviewed by Jaydeepkumar Chovatia and Francisco Guerrero for CASSANDRA-20180
9feb931 to
89b4ab6
Compare
|
Rebase on top of the latest trunk has been completed; all the unit tests are passing on CircleCI Many dtests are failing on CircleCI due to resource limitations. For that, I have run them internally inside our infrastructure and most of them are passing as expected. |
frankgh
left a comment
There was a problem hiding this comment.
Great work! I left a few minor comments. Also I have a few NITs , so feel free to ignore those
|
|
||
| public void setIncrementalRepairDiskHeadroomRejectRatio(double value) | ||
| { | ||
| DatabaseDescriptor.setIncrementalRepairDiskHeadroomRejectRatio(value); |
There was a problem hiding this comment.
can we validate for only positive numbers to be set here between the values of 0-1?
| * // TODO: TCM - how do we evolve these tables? | ||
| */ | ||
| public static final long GENERATION = 6; | ||
| public static final long GENERATION = 7; |
There was a problem hiding this comment.
can we add a description for gen 7 above?
| * <p> | ||
| * While this splitter has a lot of tuning parameters, the expectation is that the established default configuration | ||
| * shall be sensible for all {@link org.apache.cassandra.repair.autorepair.AutoRepairConfig.RepairType}'s. The following | ||
| * configuration parameters are offered. |
There was a problem hiding this comment.
did we intend to list the configuration parameters below in the javadocs?
There was a problem hiding this comment.
Incorporated
src/java/org/apache/cassandra/repair/autorepair/RepairTokenRangeSplitter.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Francisco Guerrero <frank.guerrero@gmail.com>
…geSplitter.java Co-authored-by: Francisco Guerrero <frank.guerrero@gmail.com>
Co-authored-by: Francisco Guerrero <frank.guerrero@gmail.com>
Co-authored-by: Francisco Guerrero <frank.guerrero@gmail.com>
Co-authored-by: Francisco Guerrero <frank.guerrero@gmail.com>
…0572 Validate disk headroom for IR and a couple of comment updates
Co-authored-by: Francisco Guerrero <frank.guerrero@gmail.com>
df3eb40 to
54e39a9
Compare
CEP:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution
Design doc:
https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit?tab=t.0#heading=h.r112r46toau0
The Cassandra JIRA: https://issues.apache.org/jira/browse/CASSANDRA-19918
Code review comments