Comparison of EM Algorithm to Hierarchical Estimator

EM Algorithm

The EM Algorithm is an iterative method to calculate the MLE of observed data when there is a latent variable. This algorithm is implemented in Python in emAlgorithm.py.

Hierarchical Estimator

The Hierarchical Estimator uses Markov Chain Composite Likelihood (MCCL) to approximate the MLE by assuming the observed data follow a Markov Chain. This algorithm is implemented in R. See DOCUMENTATION.txt for details.

How To Use

Set-up

When you first clone the repo, run the following commands in your terminal from this directory.

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

To run 1 data set

Create a file of the binarized DNA sequences, with one observed sequence per line.

To use the EM algorithm: in emAlgorithm.py, run main() with the following parameters:

L is an integer with the length of the DNA sequences
q is a float representing the probability of recombination
file_name is a path to the file containing the data

To use the Hierarchical estimator: In simulator.R, run get_one_estimate with the following parameters:

L is an integer with the length of the DNA sequences
q_val is a float representing the probability of recombination
m is an integer representing the margin to use, which must be between 1 and min(3, L - 2).
n is the number of DNA sequences in the data set
file_name is a path to the file containing the data

To run a simulation

In main.py, set the parameters test_l, test_q, test_pi, and test_m. Then, call run_full_simulation() to do the simulation.

test_l is an array of integers representing the lengths you want to test
test_q is an array of floats representing the recombination probabilities you want to test
test_pi is a dictionary with a key for each element, L of test_l. The value associated with each key is an array of length 2^L representing the true ancestral distribution.
test_m is a dictionary with a key for each element, L of test_l. The value associated with each key is an array with integers ranging from 1 to L - 2, with a maximum of 3.
generate_new_data is a Boolean. It is False by default, but can be True if you don't have existing data or want to generate new data.

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
DOCUMENTATION.txt		DOCUMENTATION.txt
README.md		README.md
ancestor_bitwise.R		ancestor_bitwise.R
ancestor_estimation.R		ancestor_estimation.R
ancestor_general.R		ancestor_general.R
dataInterpreter.py		dataInterpreter.py
descendant_sequences.R		descendant_sequences.R
emAlgorithm.py		emAlgorithm.py
fourwise_marginal_corrected.R		fourwise_marginal_corrected.R
main.py		main.py
markovChain.py		markovChain.py
onewise_marginal.R		onewise_marginal.R
pairwise_marginal.R		pairwise_marginal.R
r_interface.py		r_interface.py
requirements.txt		requirements.txt
run_analysis_MCCL.R		run_analysis_MCCL.R
seqGenerator.py		seqGenerator.py
sim_interface.R		sim_interface.R
simulator.R		simulator.R
simulator.py		simulator.py
threewise_marginal.R		threewise_marginal.R
utilities.py		utilities.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Comparison of EM Algorithm to Hierarchical Estimator

EM Algorithm

Hierarchical Estimator

How To Use

Set-up

To run 1 data set

To run a simulation

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Comparison of EM Algorithm to Hierarchical Estimator

EM Algorithm

Hierarchical Estimator

How To Use

Set-up

To run 1 data set

To run a simulation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages