EPIC-IDP: A Python tool for calculating effective interactions between intrinsically disordered proteins
EPIC-IDP (Effective Protein Interaction Calculator for Intrinsically Disordered Proteins) is a Python package for calculating interaction strengths between intrinsically disordered proteins (IDPs), as quantified by a matrix of effective Flory-Huggins $\chi$ parameters.
To use the package, you need to copy the epic_idp folder to your working directory or add the path of the epic_idp folder to your Python path. You can then import the package using
import epic_idp

Note that the package requires numpy to be installed.
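If you prefer to extend the Python path from inside a script rather than copy the folder, a minimal sketch could look like the following. The helper name add_to_python_path and the directory /opt/software/EPIC-IDP are placeholders invented for this illustration. Which directory has to be on the path (the epic_idp folder itself, as described above, or the directory that contains it) depends on how the package is laid out on your system.

```python
import sys
from pathlib import Path


def add_to_python_path(directory):
    """Prepend `directory` to sys.path if it is not already there.

    Returns the resolved path string that was added (or already present).
    This helper is hypothetical and not part of EPIC-IDP.
    """
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Placeholder location; replace with the real directory on your machine.
epic_idp_location = add_to_python_path("/opt/software/EPIC-IDP")

# After this, `import epic_idp` should succeed if the directory is correct.
```

Calling the helper twice with the same directory leaves only one entry in sys.path, so it is safe to keep at the top of a script.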
The main object of the package is the chi_effective_calculator class, which is used to calculate the effective $\chi$ parameters as follows:
- Create an instance of the chi_effective_calculator class, providing interaction parameters (e.g. the Bjerrum length or short-range interaction parameter set) as input.
- Add the sequences of all IDPs of interest using the add_IDP method.
- Calculate the effective $\chi$ parameters using the calc_chi_eff and calc_all_chi_eff methods.
The following example, based on the example_1.py script, demonstrates how to use the package to calculate the effective $\chi$ parameters.
First, we import the chi_effective_calculator.
from epic_idp import chi_effective_calculator

We next create the chi_effective_calculator instance as follows:

cec = chi_effective_calculator(rho0=5., lB=0.8, kappa=0.2, a=0.4)

The arguments, given in units of the residue-residue bond length $b$, are:
- rho0: A reference density $\rho_0 b^3$ that only provides an overall multiplicative factor to the $\chi$ parameters.
- lB: The Bjerrum length $l_{\rm B} / b$, which sets the strength of the electrostatic interactions.
- kappa: The inverse screening length (inverse Debye length) $\kappa b$, which sets the range of the electrostatic interactions.
- a: A Gaussian smearing length $a/b$ that smoothly suppresses the electrostatic interactions at short distances.
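Because lB and kappa are entered in units of the bond length, physical values have to be divided or multiplied by $b$ first. The sketch below shows one way to do that conversion. The numbers used (a bond length of 0.38 nm, a Bjerrum length of 0.7 nm for water near room temperature, and 0.1 M monovalent salt) are illustrative assumptions and are not meant to reproduce the values in the example above. The screening length is taken from the standard Debye-Hückel expression $\kappa^2 = 8 \pi l_{\rm B} c$ for a 1:1 salt at number density $c$. The function names are hypothetical and not part of EPIC-IDP.

```python
import math

AVOGADRO = 6.02214076e23  # particles per mole
NM3_PER_LITRE = 1.0e24    # 1 L = 1e-3 m^3 = 1e24 nm^3


def reduced_bjerrum_length(lB_nm, b_nm):
    """Bjerrum length in units of the bond length, l_B / b."""
    return lB_nm / b_nm


def reduced_kappa(lB_nm, b_nm, salt_molar):
    """Inverse Debye length times the bond length, kappa * b.

    Assumes a monovalent (1:1) salt, so kappa^2 = 8 * pi * l_B * c,
    with c the salt number density.
    """
    salt_per_nm3 = salt_molar * AVOGADRO / NM3_PER_LITRE
    kappa_per_nm = math.sqrt(8.0 * math.pi * lB_nm * salt_per_nm3)
    return kappa_per_nm * b_nm


# Illustrative (assumed) physical inputs.
b_nm = 0.38    # assumed residue-residue bond length
lB_nm = 0.7    # assumed Bjerrum length in water near room temperature
salt = 0.1     # assumed monovalent salt concentration in mol/L

lB_reduced = reduced_bjerrum_length(lB_nm, b_nm)
kappa_reduced = reduced_kappa(lB_nm, b_nm, salt)
```

With these assumed inputs, lB_reduced and kappa_reduced are the dimensionless values that would be passed as lB and kappa.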
The sequences of interest are:
seqs = {}
seqs['sv1'] = 'EKEKEKEKEKEKEKEKEKEKEKEKEKEKEKEKEKEKEKEKEKEKEKEKEK'
seqs['sv2'] = 'EEEKKKEEEKKKEEEKKKEEEKKKEEEKKKEEEKKKEEEKKKEEEKKKEK'
seqs['sv3'] = 'KEKKKEKKEEKKEEKEKEKEKEEKKKEEKEKEKEKKKEEKEKEEKKEEEE'
seqs['sv4'] = 'KEKEKKEEKEKKEEEKKEKEKEKKKEEKKKEEKEEKKEEKKKEEKEEEKE'
seqs['sv5'] = 'KEKEEKEKKKEEEEKEKKKKEEKEKEKEKEEKKEEKKKKEEKEEKEKEKE'
seqs['sv6'] = 'EEEKKEKKEEKEEKKEKKEKEEEKKKEKEEKKEEEKKKEKEEEEKKKKEK'
seqs['sv7'] = 'EEEEKKKKEEEEKKKKEEEEKKKKEEEEKKKKEEEEKKKKEEEEKKKKEK'
seqs['sv8'] = 'KKKKEEEEKKKKEEEEKKKKEEEEKKKKEEEEKKKKEEEEKKKKEEEEKE'
seqs['sv9'] = 'EEKKEEEKEKEKEEEEEKKEKKEKKEKKKEEKEKEKKKEKKKKEKEEEKE'
seqs['sv10'] = 'EKKKKKKEEKKKEEEEEKKKEEEKKKEKKEEKEKEEKEKKEKKEEKEEEE'
seqs['sv11'] = 'EKEKKKKKEEEKKEKEEEEKEEEEKKKKKEKEEEKEEKKEEKEKKKEEKK'
seqs['sv12'] = 'EKKEEEEEEKEKKEEEEKEKEKKEKEEKEKKEKKKEKKEEEKEKKKKEKK'
seqs['sv13'] = 'KEKKKEKEKKEKKKEEEKKKEEEKEKKKEEKKEKKEKKEEEEEEEKEEKE'
seqs['sv14'] = 'EKKEKEEKEEEEKKKKKEEKEKKEKKKKEKKKKKEEEEEEKEEKEKEKEE'
seqs['sv15'] = 'KKEKKEKKKEKKEKKEEEKEKEKKEKKKKEKEKKEEEEEEEEKEEKKEEE'
seqs['sv16'] = 'EKEKEEKKKEEKKKKEKKEKEEKKEKEKEKKEEEEEEEEEKEKKEKKKKE'
seqs['sv17'] = 'EKEKKKKKKEKEKKKKEKEKKEKKEKEEEKEEKEKEKKEEKKEEEEEEEE'
seqs['sv18'] = 'KEEKKEEEEEEEKEEKKKKKEKKKEKKEEEKKKEEKKKEEEEEEKKKKEK'
seqs['sv19'] = 'EEEEEKKKKKEEEEEKKKKKEEEEEKKKKKEEEEEKKKKKEEEEEKKKKK'
seqs['sv20'] = 'EEKEEEEEEKEEEKEEKKEEEKEKKEKKEKEEKKEKKKKKKKKKKKKEEE'
seqs['sv21'] = 'EEEEEEEEEKEKKKKKEKEEKKKKKKEKKEKKKKEKKEEEEEEKEEEKKK'
seqs['sv22'] = 'KEEEEKEEKEEKKKKEKEEKEKKKKKKKKKKKKEKKEEEEEEEEKEKEEE'
seqs['sv23'] = 'EEEEEKEEEEEEEEEEEKEEKEKKKKKKEKKKKKKKEKEKKKKEKKEEKK'
seqs['sv24'] = 'EEEEKEEEEEKEEEEEEEEEEEEKKKEEKKKKKEKKKKKKKEKKKKKKKK'
seqs['sv25'] = 'EEEEEEEEEEEKEEEEKEEKEEKEKKKKKKKKKKKKKKKKKKEEKKEEKE'
seqs['sv26'] = 'KEEEEEEEKEEKEEEEEEEEEKEEEEKEEKKKKKKKKKKKKKKKKKKKKE'
seqs['sv27'] = 'KKEKKKEKKEEEEEEEEEEEEEEEEEEEEKEEKKKKKKKKKKKKKKKEKK'
seqs['sv28'] = 'EKKKKKKKKKKKKKKKKKKKKKEEEEEEEEEEEEEEEEEEKKEEEEEKEK'
seqs['sv29'] = 'KEEEEKEEEEEEEEEEEEEEEEEEEEEKKKKKKKKKKKKKKKKKKKKKKK'
seqs['sv30'] = 'EEEEEEEEEEEEEEEEEEEEEEEEEKKKKKKKKKKKKKKKKKKKKKKKKK'
seq_names = list(seqs.keys())  # All sequence names

We add these sequences to the chi_effective_calculator instance as follows:

for seq_name in seq_names:
    cec.add_IDP(seq_name, seqs[seq_name])

The 30-by-30 matrix of effective $\chi$ parameters is then computed with the calc_all_chi_eff method:

chi_eff_matrix = cec.calc_all_chi_eff()

If we only want the $\chi$ parameter between two specific sequences, we can instead call cec.calc_chi_eff('sv10', 'sv25').
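Individual entries of the full matrix can also be looked up by sequence name. The sketch below assumes that rows and columns follow the order in which the sequences were added, which the text above does not state explicitly, so check this against calc_chi_eff before relying on it. The 3-by-3 matrix uses made-up numbers in place of real output, and the helper functions are hypothetical, not part of EPIC-IDP.

```python
def lookup(matrix, seq_names, name_a, name_b):
    """Return the matrix entry for two named sequences.

    Assumes row/column i corresponds to seq_names[i]. Works for nested
    lists and for 2D numpy arrays, since both support matrix[i][j].
    """
    index = {name: i for i, name in enumerate(seq_names)}
    return matrix[index[name_a]][index[name_b]]


def largest_off_diagonal(matrix, seq_names):
    """Return (value, name_a, name_b) for the largest entry with a != b."""
    best = None
    for i in range(len(seq_names)):
        for j in range(i + 1, len(seq_names)):
            value = matrix[i][j]
            if best is None or value > best[0]:
                best = (value, seq_names[i], seq_names[j])
    return best


# Stand-in data with made-up numbers, NOT real EPIC-IDP output.
demo_names = ['sv1', 'sv10', 'sv25']
demo_matrix = [
    [0.1, 0.2, 0.3],
    [0.2, 0.5, 0.9],
    [0.3, 0.9, 1.4],
]

pair_value = lookup(demo_matrix, demo_names, 'sv10', 'sv25')
top_pair = largest_off_diagonal(demo_matrix, demo_names)
```

Only the upper triangle is scanned in largest_off_diagonal, which presumes a symmetric matrix.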
The resulting chi_eff_matrix is visualized here as a heatmap:
In this example, we consider variants of the low-complexity domain (LCD) of the heterogeneous nuclear ribonucleoprotein A1 (hnRNPA1), referred to as A1-LCD. These sequences have been studied experimentally (Bremer et al., Nat Chem 2022, https://doi.org/10.1038/s41557-021-00840-w) and form part of the basis for the Mpipi force field (Joseph et al., Nat Comp Sci 2021, https://doi.org/10.1038/s43588-021-00155-3). The code for this example is given in the example_2.py script.
The sequences are:
seqs = {}
seqs['WT'] = 'MASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF'
seqs['-3R+3K'] = 'MASASSSQRGKSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSKGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRKF'
seqs['-4F-2Y'] = 'MASASSSQRGRSGSGNSGGGRGGGFGGNDNFGRGGNSSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNSGGGGSSNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYSAKPRNQGGYGGSSSSSSSGSGRRF'
seqs['-6R+6K'] = 'MASASSSQKGKSGSGNFGGGRGGGFGGNDNFGKGGNFSGRGGFGGSKGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGKSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRKF'
seqs['+7F-7Y'] = 'MASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGFGGSGDGFNGFGNDGSNFGGGGSFNDFGNFNNQSSNFGPMKGGNFGGRSSGGSGGGGQFFAKPRNQGGFGGSSSSSSFGSGRRF'
seqs['+7K+12D'] = 'MASADSSQRDRDDKGNFGDGRGGGFGGNDNFGRGGNFSDRGGFGGSRGDGKYGGDGDKYNGFGNDGKNFGGGGSYNDFGNYNNQSSNFDPKMGGNFKDRSSGPYDKGGQYFAKPRNQGGYGGSSSSKSYGSDRRF'
seqs['+7R+12D'] = 'MASADSSQRDRDDRGNFGDGRGGGFGGNDNFGRGGNFSDRGGFGGSRGDGRYGGDGDRYNGFGNDGRNFGGGGSYNDFGNYNNQSSNFDPKMGGNFRDRSSGPYDRGGQYFAKPRNQGGYGGSSSSRSYGSDRRF'
seqs['-9F+3Y'] = 'MASASSSQRGRSGSGNFGGGRGGGYGGNDNGGRGGNYSGRGGFGGSRGGGGYGGSGDGYNGGGNDGSNYGGGGSYNDSGNGNNQSSNFGPMKGGNYGGRSSGGSGGGGQYGAKPRNQGGYGGSSSSSSYGSGRRS'
seqs['-12F+12Y']= 'MASASSSQRGRSGSGNYGGGRGGGYGGNDNYGRGGNYSGRGGYGGSRGGGGYGGSGDGYNGYGNDGSNYGGGGSYNDYGNYNNQSSNYGPMKGGNYGGRSSGGSGGGGQYYAKPRNQGGYGGSSSSSSYGSGRRY'
seq_names = list(seqs.keys())  # All sequence names

The chi_effective_calculator instance is now created with the interaction_matrix argument set to 'Mpipi', so that short-range interactions use the Mpipi force field interaction matrix.

cec = chi_effective_calculator(rho0=1., lB=1.7, kappa=0.75, a=0.1, Vh0=3.0, interaction_matrix='Mpipi')

The sequences are added to the chi_effective_calculator instance as before:

for seq_name in seq_names:
    cec.add_IDP(seq_name, seqs[seq_name])

The 9-by-9 matrix of effective $\chi$ parameters is computed with the calc_all_chi_eff method:

chi_eff_matrix = cec.calc_all_chi_eff()

The results are shown here:
The left panel shows the diagonal elements
A common approach to study the phase behaviour of IDPs is the Flory-Huggins model, defined by the free energy density
Here,
The
In Wessén et al., J. Phys. Chem. B, 2022 (https://pubs.acs.org/doi/10.1021/acs.jpcb.2c06181), a field-theoretic model for IDP phase separation is formulated based on a microscopic interaction Hamiltonian that includes pair-wise amino-acid interactions through long-range electrostatic (Coulomb) forces and short-range non-electrostatic (e.g., hydrophobic or cation-$\pi$) interactions.
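The EPIC-IDP package implements the calculation of the effective $\chi$ parameters, decomposed as

$$ \chi_{ij} = \left( \chi_{\rm e}^{(0)} \right)_{ij} + \left( \chi_{\rm e}^{(1)} \right)_{ij} + \left( \chi_{\rm h}^{(0)} \right)_{ij} $$
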
where the three contributions correspond to:
- $\left( \chi_{\rm e}^{(0)} \right)_{ij}$: Effective $\chi_{ij}$ parameter following from a mean-field treatment of long-range electrostatic interactions. This only depends on the net charge per chain of the two proteins.
- $\left( \chi_{\rm e}^{(1)} \right)_{ij}$: The first-order correction from electrostatic interactions that follows from RPA theory. This term accounts for charge sequence patterns in the amino-acid sequences, and can thus distinguish between IDPs with the same composition but different sequences.
- $\left( \chi_{\rm h}^{(0)} \right)_{ij}$: Effective $\chi_{ij}$ parameter following from a mean-field treatment of short-range non-electrostatic interactions (e.g., hydrophobic interactions or cation-$\pi$ interactions). This only depends on the amino-acid content (composition), but not the residue order (sequence) in the involved proteins.
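The distinction between composition-dependent and sequence-dependent terms can be made concrete with two 50-residue polyampholytes built from the same residues. The sketch below is only an illustration of the inputs each term is sensitive to, not the calculation performed by EPIC-IDP. The charge assignment (E and D as -1, K and R as +1, everything else neutral) is an assumption, and the count of like-charged neighbours is a crude stand-in for charge patterning rather than the RPA quantity.

```python
from collections import Counter

# Assumed residue charges; all other residues are treated as neutral.
CHARGE = {'E': -1, 'D': -1, 'K': 1, 'R': 1}


def charge_pattern(seq):
    """Per-residue charges along the chain."""
    return [CHARGE.get(residue, 0) for residue in seq]


def net_charge(seq):
    """Net charge per chain, the only input to the mean-field electrostatic term."""
    return sum(charge_pattern(seq))


def like_charge_neighbours(seq):
    """Number of adjacent residue pairs carrying charges of the same sign."""
    q = charge_pattern(seq)
    return sum(1 for left, right in zip(q, q[1:]) if left * right > 0)


# Two illustrative sequences with identical composition (25 E, 25 K).
alternating = 'EK' * 25
diblock = 'E' * 25 + 'K' * 25

same_net_charge = net_charge(alternating) == net_charge(diblock)
same_composition = Counter(alternating) == Counter(diblock)
blockiness = (like_charge_neighbours(alternating), like_charge_neighbours(diblock))
```

Both sequences have the same net charge and the same composition, so the two mean-field terms cannot tell them apart. Only a sequence-sensitive quantity such as the charge pattern distinguishes the strictly alternating chain from the diblock.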
The first term is given by
where
The second term is given by
where
is a type of form-factor for the charge density of a single IDP species of type
The final term is given by
Here,
i.e., as a residue-universal interaction potential combined with a residue-pair interaction matrix. The matrix is selected through the interaction_matrix argument of the chi_effective_calculator class, and can be one of the following:
- 'KH-D': Table S3 data in Dignon et al., 2018 (https://doi.org/10.1371/journal.pcbi.1005941)
- 'Mpipi': The 20-by-20 matrix for amino-acid pairs in the Mpipi force field, Joseph et al., 2021 (https://doi.org/10.1038/s43588-021-00155-3)
- 'Mpipi_RNA': The 24-by-24 matrix including RNA bases in the Mpipi force field. RNA bases are denoted by lower-case letters (a, c, g and u) to distinguish them from amino acids.
- 'CALVADOS1': The original CALVADOS hydrophobicity scale in Tesei et al., 2021 (https://doi.org/10.1073/pnas.2111696118)
- 'CALVADOS2': The updated CALVADOS hydrophobicity scale in Tesei et al., 2022 (https://doi.org/10.12688/openreseurope.14967.2)
- 'HPS': Table S1 in Dignon et al., 2018 (https://doi.org/10.1371/journal.pcbi.1005941)
- 'URRY': Table S2 in Regy et al., 2021 (https://doi.org/10.1002/pro.4094)
- 'FB': Table S7 in Dannenhoffer-Lafage et al., 2021 (https://doi.org/10.1021/acs.jpcb.0c11479)
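When the interaction matrix name comes from user input or a configuration file, it can be useful to check it against the options listed above before constructing the calculator. The helper below is a hypothetical convenience function, not part of EPIC-IDP, and it assumes that names must match exactly, including case. How the package itself reacts to an unknown name is not covered here.

```python
# Names copied from the list of supported options above.
SUPPORTED_INTERACTION_MATRICES = (
    'KH-D', 'Mpipi', 'Mpipi_RNA', 'CALVADOS1',
    'CALVADOS2', 'HPS', 'URRY', 'FB',
)


def check_interaction_matrix(name):
    """Return `name` unchanged if it is a listed option, else raise ValueError.

    Exact, case-sensitive matching is an assumption of this sketch.
    """
    if name not in SUPPORTED_INTERACTION_MATRICES:
        options = ', '.join(SUPPORTED_INTERACTION_MATRICES)
        raise ValueError(
            f"Unknown interaction matrix {name!r}; expected one of: {options}"
        )
    return name


matrix_name = check_interaction_matrix('Mpipi')
```

The validated name could then be passed on as interaction_matrix=matrix_name when creating the chi_effective_calculator instance.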

