The TemplateProcessor class enables template-based file generation with parameter sampling for uncertainty quantification and sensitivity analysis.
TemplateProcessor allows you to:
- Define variables with statistical distributions
- Generate multiple realizations for Monte Carlo simulations
- Import variable values from tables
- Create parametric studies with sampling
- Support various probability distributions
This is particularly useful for:
- Uncertainty quantification workflows
- Sensitivity analysis
- Parameter optimization setups
- Ensemble simulation generation
from rsimpy.common.template import TemplateProcessor

processor = TemplateProcessor(
    template_path="template.dat",
    variables_table=None,
    output_file_path="output.dat",
    all_uniform=False,
    n_samples=100,
    verbose=False
)

Parameters:
- template_path (`str` or `Path`): Path to template file with variable definitions
- variables_table (`str`, `Path`, or `DataFrame`, optional): CSV file or DataFrame with variable values
- output_file_path (`str` or `Path`, optional): Output file path for generated files
- all_uniform (`bool`, optional): Force all distributions to uniform (default: False)
- n_samples (`int`, optional): Number of samples to generate (default: 0, no generation)
- verbose (`bool`, optional): Print progress messages (default: False)
Attributes:
- variables: Dictionary of parsed variables with their specifications
- experiments_table: DataFrame containing the generated samples
Variables are defined in the template file using the following syntax:
<\var>variable_name[type,default_value,(distribution,param1,param2,...)]<var>
Components:
- variable_name: Unique identifier for the variable
- type (optional): `int`, `float`, or `str` (inferred if omitted)
- default_value: Default value used if no sampling occurs
- distribution (optional): Statistical distribution specification
Fixed value, no variation:
<\var>var1[float,10.5,(constant,10.5)]<var>
Uniformly distributed between min and max:
<\var>var2[int,50,(uniform,10,100)]<var> # Discrete: 10, 11, ..., 100
<\var>var3[float,0.5,(uniform,0,1)]<var> # Continuous: [0, 1]
Parameters: (uniform, min, max)
Normally distributed with mean and standard deviation:
<\var>var4[float,100,(normal,100,15)]<var>
Parameters: (normal, mean, std_dev)
Note: Unbounded distribution
Normal distribution bounded by limits:
<\var>var5[float,0.25,(truncnormal,0.25,0.05,0.1,0.4)]<var>
Parameters: (truncnormal, mean, std_dev, min, max)
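Outside the template, this (mean, std_dev, min, max) parameterization can be reproduced with scipy.stats.truncnorm. Note that scipy expects the truncation bounds in standard-deviation units, a common pitfall; the mapping below is an illustration, not necessarily how TemplateProcessor implements it:

```python
import numpy as np
from scipy.stats import truncnorm

# var5 spec: (truncnormal, 0.25, 0.05, 0.1, 0.4) -> mean, std, min, max
mean, std, lo, hi = 0.25, 0.05, 0.1, 0.4

# scipy's truncnorm takes its bounds as (x - loc) / scale, i.e. in
# standard-deviation units, not in the variable's own units
a, b = (lo - mean) / std, (hi - mean) / std
samples = truncnorm.rvs(a, b, loc=mean, scale=std, size=10_000, random_state=0)

print(samples.min() >= lo, samples.max() <= hi)  # True True: bounds respected
```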
Log-normally distributed:
<\var>var6[float,1000,(lognormal,7,0.5)]<var>
Parameters: (lognormal, log_mean, log_std_dev)
Note: Always positive values
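A useful sanity check on the (log_mean, log_std_dev) parameters: the median of a lognormal is exp(log_mean), so log_mean = 7 gives a median near exp(7) ≈ 1096, consistent with the default of 1000 in the example. Assuming the spec matches NumPy's lognormal parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
# var6 spec: (lognormal, 7, 0.5) -> log_mean, log_std_dev
samples = rng.lognormal(mean=7.0, sigma=0.5, size=100_000)

print(samples.min() > 0)  # True: lognormal values are always positive
# The sample median should land close to exp(log_mean) = exp(7) ~ 1096
print(abs(np.median(samples) / np.exp(7) - 1) < 0.05)  # True
```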
Triangular distribution with mode:
<\var>var7[float,150,(triangular,100,200,150)]<var>
Parameters: (triangular, min, max, mode)
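If you reproduce the triangular spec with NumPy, watch the argument order: the template uses (min, max, mode) while numpy.random uses (left, mode, right). An illustrative check:

```python
import numpy as np

rng = np.random.default_rng(0)
# var7 spec: (triangular, 100, 200, 150) -> (min, max, mode)
lo, hi, mode = 100, 200, 150

# NumPy's argument order differs: (left, mode, right)
samples = rng.triangular(lo, mode, hi, size=10_000)
print(samples.min() >= lo, samples.max() <= hi)  # True True
```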
Discrete values with specified probabilities:
<\var>var8[int,2,(categorical,{1,2,3,4},{0.1,0.2,0.3,0.4})]<var>
<\var>var9[str,'type1',(categorical,{type1,type2,type3},{0.5,0.3,0.2})]<var>
Parameters: (categorical, {values}, {probabilities})
Note: Probabilities must sum to 1.0
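Categorical sampling amounts to weighted discrete sampling; NumPy's Generator.choice reproduces the var8 example (illustrative only, not the library's internals):

```python
import numpy as np

rng = np.random.default_rng(42)
# var8 spec: values {1,2,3,4} with probabilities {0.1,0.2,0.3,0.4}
values = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]

samples = rng.choice(values, size=10_000, p=probs)
# Empirical frequencies should sit close to the specified probabilities
freqs = {v: float(np.mean(samples == v)) for v in values}
print(freqs)
```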
Values imported from external table:
<\var>var10[float,100,(table)]<var>
Note: Requires variables_table parameter in constructor
You can omit optional components:
<\var>var1<var> # Inferred type, requires table
<\var>var2[150]<var> # Type inferred, constant value
<\var>var3[(uniform,10,100)]<var> # Type inferred from distribution
<\var>var4[float,0.25,(normal,0.25,0.05)]<var> # Full specification
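A minimal sketch of how such definitions could be extracted from a template line — the regex and helper below are illustrative, not TemplateProcessor's actual parser:

```python
import re

# Matches <\var>name<var> or <\var>name[spec]<var>; illustrative only
VAR_RE = re.compile(r"<\\var>(\w+)(?:\[(.*?)\])?<var>")

def find_vars(line):
    """Return (name, spec) pairs for each variable definition in a line."""
    return VAR_RE.findall(line)

print(find_vars(r"PERMEABILITY <\var>perm[float,100,(uniform,50,500)]<var> md"))
# [('perm', 'float,100,(uniform,50,500)')]
print(find_vars(r"WELL <\var>well_name<var>"))
# [('well_name', '')]
```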
Generate sample realizations based on variable distributions:
processor.generate_experiments(n_samples=100)

Parameters:
- n_samples (`int`): Number of realizations to generate

Returns: None (stores samples in the experiments_table attribute)
# Access as DataFrame
samples = processor.experiments_table
# Iterate through samples
for idx, row in samples.iterrows():
    perm = row['permeability']
    poro = row['porosity']
    # ... use values

If output_file_path and n_samples are provided during initialization, files are automatically generated:
processor = TemplateProcessor(
    template_path="template.dat",
    output_file_path="simulation.dat",
    n_samples=50
)
# Creates: simulation_0.dat, simulation_1.dat, ..., simulation_49.dat

Template file (template.dat):
** Reservoir Properties
PERMEABILITY <\var>perm[float,100,(uniform,50,500)]<var> md
POROSITY <\var>por[float,0.25,(uniform,0.15,0.35)]<var>
THICKNESS <\var>h[float,50,(normal,50,10)]<var> m
Python code:
from rsimpy.common.template import TemplateProcessor
processor = TemplateProcessor(
    template_path="template.dat",
    output_file_path="reservoir.dat",
    n_samples=100,
    verbose=True
)
# Access generated samples
samples = processor.experiments_table
print(f"Generated {len(samples)} realizations")
print(samples.describe())

Create variable table (variables.csv):
well_name,rate,pressure
PROD-01,5000,3000
PROD-02,4500,3200
PROD-03,5500,2800
PROD-04,4800,3100

Template file:
WELL <\var>well_name<var>
PRODUCER <\var>well_name<var>
OPERATE MAX STG <\var>rate<var>
OPERATE BHP <\var>pressure<var>
END
Python code:
from rsimpy.common.template import TemplateProcessor
import pandas as pd
# Load variable table
variables = pd.read_csv("variables.csv")
processor = TemplateProcessor(
    template_path="well_template.dat",
    variables_table=variables,
    output_file_path="wells.dat",
    n_samples=len(variables)
)
# Creates wells_0.dat, wells_1.dat, wells_2.dat, wells_3.dat

template_text = """
** Uncertainty Analysis Template
** Permeability - Lognormal (typically log-distributed)
PERMI <\var>kx[float,100,(lognormal,4.6,0.5)]<var>
PERMJ <\var>ky[float,100,(lognormal,4.6,0.5)]<var>
PERMK <\var>kz[float,10,(lognormal,2.3,0.5)]<var>
** Porosity - Truncated Normal (physical bounds)
PORO <\var>phi[float,0.25,(truncnormal,0.25,0.05,0.1,0.4)]<var>
** Rock type - Categorical
RTYPE <\var>rock_type[int,1,(categorical,{1,2,3},{0.5,0.3,0.2})]<var>
** Well locations - Uniform integer
WELL_I <\var>well_i[int,50,(uniform,30,70)]<var>
WELL_J <\var>well_j[int,50,(uniform,30,70)]<var>
** Aquifer strength - Triangular (expert judgment)
AQUIFER_STRENGTH <\var>aq_str[float,1e6,(triangular,5e5,2e6,1e6)]<var>
"""
# Write template
with open("uncertainty_template.dat", "w") as f:
    f.write(template_text)
# Generate samples
processor = TemplateProcessor(
    template_path="uncertainty_template.dat",
    output_file_path="case.dat",
    n_samples=500
)
# Analyze samples
import matplotlib.pyplot as plt
samples = processor.experiments_table
fig, axes = plt.subplots(2, 4, figsize=(16, 8))
axes = axes.flatten()
for i, col in enumerate(samples.columns):
    if i < len(axes):
        axes[i].hist(samples[col], bins=30, alpha=0.7, edgecolor='black')
        axes[i].set_title(col)
        axes[i].set_xlabel('Value')
        axes[i].set_ylabel('Frequency')
plt.tight_layout()
plt.savefig("sample_distributions.png")

# Create template for one-at-a-time sensitivity
template = """
PERMEABILITY <\var>perm[float,100]<var>
POROSITY <\var>poro[float,0.25]<var>
THICKNESS <\var>thick[float,50]<var>
"""
# Base case values
base = {'perm': 100, 'poro': 0.25, 'thick': 50}
variations = [-20, -10, 0, 10, 20] # Percent variations
import pandas as pd
# Generate sensitivity cases
cases = []
for param in base.keys():
    for var in variations:
        case = base.copy()
        case[param] = base[param] * (1 + var/100)
        case['varied_param'] = param
        case['variation_pct'] = var
        cases.append(case)
sensitivity_df = pd.DataFrame(cases)
# Generate files
processor = TemplateProcessor(
    template_path="template.dat",
    variables_table=sensitivity_df,
    output_file_path="sensitivity.dat",
    n_samples=len(sensitivity_df)
)

For more efficient sampling with better coverage, use Latin Hypercube Sampling:
from scipy.stats import qmc # SciPy Quasi-Monte Carlo
import numpy as np
# Define ranges
n_samples = 100
n_vars = 3
# Generate LHS samples [0,1]
sampler = qmc.LatinHypercube(d=n_vars, seed=42)
lhs_samples = sampler.random(n=n_samples)
# Transform to desired distributions
import pandas as pd
# Permeability: lognormal (qmc.scale expects the full 2D sample, so scale
# each column manually: ln(kx) uniform on [3, 6], then exponentiate)
kx = np.exp(3 + lhs_samples[:, 0] * (6 - 3))
# Porosity: uniform on [0.15, 0.35]
phi = 0.15 + lhs_samples[:, 1] * (0.35 - 0.15)
# Thickness: normal (using quantile function)
from scipy.stats import norm
h = norm.ppf(lhs_samples[:, 2], loc=50, scale=10)
# Create DataFrame
lhs_df = pd.DataFrame({
    'permeability': kx,
    'porosity': phi,
    'thickness': h
})
# Use with template
processor = TemplateProcessor(
    template_path="template.dat",
    variables_table=lhs_df,
    output_file_path="lhs_case.dat",
    n_samples=n_samples
)

The generated samples maintain the statistical properties of the specified distributions:
import numpy as np
samples = processor.experiments_table
# Verify mean and std dev
for col in samples.columns:
    mean = samples[col].mean()
    std = samples[col].std()
    print(f"{col}: mean={mean:.3f}, std={std:.3f}")
# Check correlations
correlation_matrix = samples.corr()
print(correlation_matrix)
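A goodness-of-fit check can be made concrete with a Kolmogorov-Smirnov test. The sketch below draws synthetic truncated-normal values (standing in for a sampled column such as phi) and tests them against the specified distribution; the column name and sample size are illustrative:

```python
from scipy.stats import kstest, truncnorm

# phi spec: (truncnormal, 0.25, 0.05, 0.1, 0.4)
mean, std, lo, hi = 0.25, 0.05, 0.1, 0.4
a, b = (lo - mean) / std, (hi - mean) / std
dist = truncnorm(a, b, loc=mean, scale=std)

draws = dist.rvs(size=2_000, random_state=1)  # stand-in for samples['phi']
stat, pvalue = kstest(draws, dist.cdf)
# A small KS statistic / large p-value indicates consistency with the spec
print(f"KS statistic={stat:.3f}, p-value={pvalue:.3f}")
```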
# Goodness-of-fit tests can be applied as needed

Common errors and their solutions:
try:
    processor = TemplateProcessor("template.dat")
except FileNotFoundError:
    print("Template file not found")

# An invalid distribution specification, e.g. <\var>var[(invalid,0,1)]<var>,
# raises ValueError when the template is parsed:
try:
    processor = TemplateProcessor("bad_distribution_template.dat")
except ValueError as e:
    print(f"Invalid distribution: {e}")

# A type mismatch, e.g. <\var>var[str,1.5,(normal,0,1)]<var>,
# also raises ValueError:
try:
    processor = TemplateProcessor("type_mismatch_template.dat")
except ValueError as e:
    print(f"Type inconsistency: {e}")

- Variable Naming: Use descriptive names that match your simulation inputs
- Distribution Choice:
  - Use lognormal for permeability (always positive, typically log-distributed)
  - Use truncated normal for porosity (physically bounded)
  - Use uniform when you have no prior information
  - Use categorical for discrete choices
- Sample Size:
  - Monte Carlo: 100-1000+ samples for good statistics
  - Sensitivity: 5-10 points per variable
  - Latin Hypercube: good coverage with fewer samples
- Validation: Always verify generated samples have expected distributions
- Documentation: Include comments in templates explaining variable choices
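The Monte Carlo sample-size guidance follows from the 1/sqrt(n) decay of the standard error of a sample mean; a quick demonstration (the distribution here is an arbitrary stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
se = {}
for n in (100, 1000, 10000):
    # Estimate the mean 200 times with n samples each; the spread of the
    # estimates is the standard error, which shrinks roughly as 1/sqrt(n)
    estimates = [rng.normal(0.25, 0.05, n).mean() for _ in range(200)]
    se[n] = float(np.std(estimates))
print(se)  # each 10x increase in n cuts the error by ~sqrt(10)
```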
- Template parsing is fast even for large templates
- File generation scales linearly with number of samples
- Memory usage depends on number of variables and samples
- Consider batch processing for very large ensembles (>10,000 cases)