Start with the MethodDescriptor.
It is the contract between your method and the rest of the platform.
Create api/methods/your_method.py first, then implement the method behind it.
```python
from api.methods.base import MethodDescriptor, SubprocessEngineConfig
from api.prediction_engines.your_method import your_method_predictions  # only for Path 2

descriptor = MethodDescriptor(
    key="YourMethod",                    # unique ID used in API/UI
    display_name="Your Method",          # human-readable name
    authors="Author A, Author B",
    publication_title="Paper title",
    citation_url="https://doi.org/...",
    repo_url="https://github.com/...",
    supports=["kcat"],                   # e.g. ["kcat"], ["Km"], ["kcat/Km"], or combinations
    input_format="single",               # backend contract: "single" or "multi"
    output_cols={"kcat": "kcat (1/s)"},
    max_seq_len=1024,
    col_to_kwarg={"Substrate": "substrates"},
    target_kwargs={"kcat": {}},
    # Engine selection rule:
    # - Use subprocess=SubprocessEngineConfig(...) by default.
    # - Use pred_func=your_method_predictions only when custom orchestration is required.
    embeddings_used=[],
)
```

- `supports`: which targets your method predicts.
- `input_format`: backend CSV column contract expected by the method. Descriptors use `single` for the `Substrate` column contract and `multi` for the full-reaction `Substrates` + `Products` contract. User-facing docs should describe the three CSV formats: `single`, `multi` (dot-joined co-substrates in `Substrate`), and full reaction (`Substrates` + `Products`).
- `col_to_kwarg`: maps CSV columns to kwargs passed into your method runtime.
- `target_kwargs`: per-target switches (for shared kcat/Km scripts).
- `subprocess` or `pred_func`: set exactly one. Use `subprocess` by default. Use `pred_func` only when the shared subprocess engine cannot support your runtime flow.
Use this decision rule:
- Use the shared subprocess engine by default.
- Use a custom engine only when required by method-specific behaviour.
Add your method's source code under `models/YourMethod/` (this can be a Git submodule).
General batching best practice:
- Batching is fine, but keep batch sizes realistic to avoid RAM spikes (generally no more than 32-64 rows/sequences per batch).
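The rule above can be sketched as a small chunking helper (illustrative; `predict_batch` in the usage comment stands in for your model call):

```python
from typing import Iterator

def chunked(items: list, batch_size: int = 32) -> Iterator[list]:
    """Yield successive fixed-size batches so peak memory stays bounded."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Usage sketch: score rows batch by batch instead of all at once.
# predictions = []
# for batch in chunked(sequences, batch_size=32):
#     predictions.extend(predict_batch(batch))  # predict_batch: your model call
```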
Use this if your model can run as one subprocess call.
You write:
- One prediction script
- `subprocess=SubprocessEngineConfig(...)` in the descriptor
The shared engine handles:
- Row validation (sequence + substrate/product chemistry)
- Temporary input/output files
- Subprocess execution
- Progress parsing (`Progress: x/y`)
- Output parsing and row mapping
Your script must support:
```shell
python your_script.py --input <input.json> --output <output.json>
```

Input JSON:
```json
{
  "method": "YourMethod",
  "target": "kcat",
  "public_id": "abc1234",
  "rows": [
    {"sequence": "MKT...", "substrates": "CC(=O)O"}
  ],
  "params": {
    "kinetics_type": "KCAT"
  }
}
```

Output JSON:
```json
{
  "predictions": [12.3],
  "invalid_indices": []
}
```

Rules:
- `predictions` length must equal `rows` length.
- `invalid_indices` is optional and is relative to `rows`.
- Use `null` for missing predictions.
- If your script uses PyTorch, handle both GPU and CPU runtimes: use CUDA only when `torch.cuda.is_available()` is `True`, and keep a CPU fallback.
Path config example:
```python
subprocess=SubprocessEngineConfig(
    python_path_key="YourMethod",
    script_key="YourMethod",
    data_path_env={"YOUR_METHOD_DATA": "YourMethod"},
)
```

Use this if you need custom behavior not covered by the shared engine.
Examples:
- Special validation rules
- Non-standard file contracts
- Multi-stage orchestration
- Extra Python-side preprocessing/caching
You write:
- `api/prediction_engines/your_method.py`
- `pred_func=your_method_predictions` in the descriptor
Expected engine signature:
```python
def your_method_predictions(
    sequences: list[str],
    public_id: str,
    **kwargs,
) -> tuple[list, list[int] | dict[int, str]]:
    ...
```

Return:
- `predictions`: one value per input row
- `invalid_indices`: one of:
  - `list[int]` of failed row indices relative to the input list
  - `dict[int, str]` mapping failed row indices to clear reasons
Recommendation:
- Return `dict[int, str]` for richer user feedback in job output and progress views.
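A skeleton matching that signature and returning `dict[int, str]` might look like this (the validation check and the length-based "model" are placeholders, not the real logic):

```python
def your_method_predictions(
    sequences: list[str],
    public_id: str,
    **kwargs,
) -> tuple[list, dict[int, str]]:
    predictions: list[float | None] = []
    invalid: dict[int, str] = {}
    for i, seq in enumerate(sequences):
        if not seq:  # placeholder validation; apply your real checks here
            invalid[i] = "empty sequence"
            predictions.append(None)  # keep one entry per input row
            continue
        predictions.append(float(len(seq)))  # placeholder for the real model call
    return predictions, invalid
```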
If your method needs a new Python environment, you must update the worker image's `Dockerfile.envs`.
- Add a requirements file: `docker-requirements/your_method_requirements.txt`
- Add a parallel env stage in `Dockerfile.envs`.
The Dockerfile uses multi-stage builds so all envs are built in parallel by BuildKit. Add two things:
a) A new `FROM base AS env-your_method` stage (alongside the other `env-*` stages):
```dockerfile
# ── YourMethod ────────────────────────────────────────────────────────────────
FROM base AS env-your_method
COPY docker-requirements/your_method_requirements.txt ./docker-requirements/
RUN --mount=type=cache,target=/opt/conda/pkgs,sharing=locked \
    --mount=type=cache,id=webkinpred-pip-py310,target=/root/.cache/pip,sharing=locked \
    mamba create -n your_method_env python=3.10 -c conda-forge -y \
    && conda run -n your_method_env pip install -r docker-requirements/your_method_requirements.txt
```

If your method needs extra conda packages (e.g. RDKit, XGBoost), install them before the `pip install` step (see the `env-dlkcat` and `env-turnup` stages for examples).
b) A `COPY --from` line in the final stage (alongside the other env copies):
```dockerfile
COPY --from=env-your_method /opt/conda/envs/your_method_env /opt/conda/envs/your_method_env
```

- Add runtime keys in `webKinPred/config_docker.py` and `webKinPred/config_local.py` (for local development). Both inherit the common path shape from `webKinPred/config_base.py`.
```python
PYTHON_PATHS["YourMethod"] = "/opt/conda/envs/your_method_env/bin/python"
PREDICTION_SCRIPTS["YourMethod"] = "/app/models/YourMethod/predict.py"
DATA_PATHS["YourMethod"] = "/app/models/YourMethod/data"
```

If your method can reuse an existing env, skip steps 1-2 and only add the config keys.
The embeddings cache stores reusable PLM outputs under media/sequence_info, keyed by seq_id.
We use this to avoid repeated PLM inference for the same sequence across jobs and methods.
GPU offload runs missing embedding work on a remote GPU before prediction starts. We use this to reduce CPU load and improve throughput. If the remote GPU path fails or is unavailable, prediction falls back to local compute.
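To picture the lookup, here is a hedged sketch; the `seq_id` derivation (a content hash of the sequence) and the `embedding.npy` file name are assumptions for illustration, not the platform's actual scheme:

```python
import hashlib
from pathlib import Path

CACHE_ROOT = Path("media/sequence_info")  # cache root described above

def seq_cache_path(sequence: str) -> Path:
    # Illustrative seq_id: a short content hash of the sequence.
    seq_id = hashlib.sha256(sequence.encode()).hexdigest()[:16]
    return CACHE_ROOT / seq_id / "embedding.npy"  # file name is an assumption

def needs_embedding(sequence: str) -> bool:
    """True when no cached PLM output exists and embedding work is required."""
    return not seq_cache_path(sequence).exists()
```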
Read the full guide:
If you want to include your method's training data in the sequence-similarity validation, read:
This includes:
- reusing an existing dataset by extending its label (for example `DLKcat/UniKP/YourMethod`)
- adding a new FASTA + DB dataset
- setting `method_keys` in each dataset entry so backend method mapping works
Setup:
```shell
pip install -r requirements.txt
python manage.py migrate
```

Run:

```shell
python tools/test_method_integration.py --method YourMethod
```

What it tests:
- method registry discovery
- descriptor validity (runnable config checks)
- direct prediction execution through backend task helpers
- output CSV generation and output-shape checks
- all targets your method supports (`kcat`, `Km`, and/or `kcat/Km`)
- optional DLKcat sanity check first
If you use Path 1 (`subprocess=SubprocessEngineConfig(...)`), do this before testing:
- create/install your method environment
- set `PYTHON_PATHS["YourMethod"]` in `webKinPred/config_local.py` to that environment's Python executable