Getting Started with the Comparative Segmentation Network (CompSegNet)

Dajana Müller edited this page Feb 23, 2022 · 3 revisions

The comparative segmentation network (CompSegNet) is a weakly supervised neural network that is trained on binary sample labels only, yet yields a U-Net [Ronneberger et al., 2015] whose output layer serves as an activation map and thereby facilitates segmentation. The activation map is inferred with the help of a pooling neuron whose activation is maximized for, e.g., cancer samples and minimized for control samples during training.

Device Distribution

By default, CompSegNet prioritizes GPU devices and runs on all available GPUs. Status information is printed while train_CSN is running.

Data Preprocessing

CompSegNet can be applied to either histopathologically stained images or native tissue from an infrared microspectroscopy approach to solve a binary segmentation problem. To this end, the whole-slide images have to be tiled into smaller patches of the same size (e.g. (64, 64, z) with z = channel depth, np.ndarray, np.float32) with a script of your choice and appended to a list. For every patch a binary background mask is needed (e.g. (64, 64), np.ndarray, np.float32): every pixel covered by sample or tissue receives the value 1, and 0 otherwise (background). The mask arrays have to be appended to a list in the same order as the input patches. For every patch a binary sample label is also needed and has to be appended to a list (e.g. [[1], [0], ...], np.float32). To keep track of every patch (e.g. patient information, health status or tile number), a list of str should be created (e.g. ['pat1_tile0', 'pat1_tile1', ..., 'patN_tileN']).
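Before training, it can be worth verifying that the four lists are aligned and well-typed. The following sketch is not part of openvibspec; the check_inputs helper is hypothetical and only illustrates the shape and dtype conventions described above for the example patch size (64, 64, 427):

```python
import numpy as np

def check_inputs(x, masks, labels, fnames):
    """Hypothetical sanity check for the four input lists (not part of openvibspec)."""
    assert len(x) == len(masks) == len(labels) == len(fnames), "lists must be aligned"
    for patch, mask, label in zip(x, masks, labels):
        assert patch.ndim == 3 and patch.dtype == np.float32      # (64, 64, z) patch
        assert mask.shape == patch.shape[:2] and mask.dtype == np.float32
        assert set(np.unique(mask)) <= {0.0, 1.0}                 # binary background mask
        assert label[0] in (0, 1)                                 # binary sample label
    return True

# one example patch with z = 427 channels
x      = [np.random.rand(64, 64, 427).astype(np.float32)]
masks  = [np.ones((64, 64), dtype=np.float32)]
labels = [[1]]
fnames = ['pat1_tile0']
check_inputs(x, masks, labels, fnames)
```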

A more detailed explanation of the data can be found here:

from openvibspec.compsegnet import helper, train_CSN, predict_CSN
helper()

#_____HOW TO TRAIN COMPSEGNET: EXAMPLE DATA______
#
#Training/Validation data:  <class 'list'> 
#Example of elements inside list: (64, 64, 427) 
#Binary mask of data: <class 'list'> with elements of shape (64, 64) 
#Labels: <class 'list'> example [[1], [0], [0], [0]] 
#File Names:  <class 'list'> example ['NXHC', 'POSX', 'KCHR', 'GKKP']
#________________________________________________

Sample data can be randomly created by:

import string, random
import numpy as np

def create_random_data(a, b, c, d, e):
    # a: patches, b: binary masks, c: binary labels, d: file names, e: number of tiles
    for i in range(e):
        a.append(np.random.rand(64, 64, 427))                                         # patch (64, 64, z)
        b.append(np.reshape(np.float32(np.random.randint(0, 2, 64 * 64)), (64, 64)))  # binary mask
        c.append([np.random.randint(2)])                                              # binary label
        d.append(''.join(random.choice(string.ascii_uppercase) for _ in range(4)))    # file name
    return a, b, c, d

n_t = 12
n_v = 4
train, mask_t, y_train, fnames_t = create_random_data([], [], [], [],n_t)
vali, mask_v, y_vali, fnames_v = create_random_data([], [], [], [],n_v)

Training

To train CompSegNet with only the minimal set of parameters and input arguments, the following is required:

train_CSN(
          x_train: list,
          mask_train: list,
          y_train: list,
          fnames_train: list,
          x_vali: list,
          mask_vali: list,
          y_vali: list,
          fnames_vali: list,
          out_dir: str
)

To adjust training, further variables can be defined: the number of epochs, and the initial_epoch if training is continued (in that case restore must be set to True and a path path_model_restore to the model that shall be restored is required). The batch size and the learning rate can be set as well. The parameter alpha specifies the minimum fraction of, e.g., tumor in the cancer samples, so it acts as a lower boundary on the pixels that should be present in class 1 tiles. The parameter beta is part of the upper boundary (alpha + beta) and shall not exceed 0.9. Furthermore, the momentum of the RMSprop optimizer can be adjusted, as well as the number of epochs of the learning rate scheduler and its factor for decreasing the learning rate, and the dropout rate in the U-Net architecture.

train_CSN(......
          epochs: int = 300, 
          initial_epoch: int = 0,
          batch_size: int = 20,
          learning_rate: float = 0.005,
          alpha: float = 0.05,
          beta: float = 0.8,
          momentum_RMSprop: float = 0,
          lr_scheduler_epochs: int = 30,
          lr_scheduler_factor: float = 0.9,
          dropout_rate_Unet: float = 0.2,
          path_model_restore: str = None,
          restore: bool = False,
          save_vali_images: bool = True
)
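The effect of the defaults above can be illustrated with a little arithmetic. Note that the step-decay formula below is an assumption about how lr_scheduler_epochs and lr_scheduler_factor interact, not taken from the CompSegNet source:

```python
alpha, beta = 0.05, 0.8

# For a 64 x 64 tile, the class 1 boundaries translate into pixel counts:
n_pixels = 64 * 64
lower = alpha * n_pixels            # at least ~5% of the 4096 pixels
upper = (alpha + beta) * n_pixels   # at most ~85% of the 4096 pixels

# Assumed step decay: the learning rate is multiplied by lr_scheduler_factor
# every lr_scheduler_epochs epochs (an assumption about the scheduler's form).
def lr_at(epoch, lr0=0.005, step=30, factor=0.9):
    return lr0 * factor ** (epoch // step)
```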

Validation of an independent dataset

To validate an independent dataset, the test data has to be prepared in the same way as the training data described in "Data Preprocessing". The required variables are the list of test data, the list of masks of the test data, the list of binary labels and the list of file names of the test tiles. The output directory, batch size, alpha, beta and dropout rate have to be set to the same values as during training. More than one model can be evaluated by adding the corresponding epoch numbers to the list epochs_to_test.

predict_CSN(test,
            mask_e,
            y_test,
            fnames_e,
            out_dir = "/bph/puredata1/bioinfdata/user/dajmue/PhD/Github_Projects/test/",
            epochs_to_test = [3],  #Model of epoch 3 is tested
            batch_size = 2,
            alpha = 0.05,
            beta = 0.8,
            dropout_rate_Unet= 0.2)
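Once an activation map has been predicted, it can be turned into a binary segmentation. The sketch below is hypothetical post-processing, not part of openvibspec: it thresholds a stand-in activation map and restricts the result to tissue pixels via the background mask; the threshold value is an assumption.

```python
import numpy as np

def binarize(activation_map, background_mask, threshold=0.5):
    """Hypothetical helper: threshold an activation map and mask out background."""
    segmentation = (activation_map >= threshold).astype(np.float32)
    return segmentation * background_mask   # background pixels stay 0

act  = np.random.rand(64, 64).astype(np.float32)   # stand-in for a predicted activation map
mask = np.ones((64, 64), dtype=np.float32)
seg  = binarize(act, mask)
```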

Got any trouble?

Is there a topic missing?

Reach out to us via an issue:   Issues
