diff --git a/Bayes Guards SMS/Predict/task.md b/Bayes Guards SMS/Predict/task.md index c660f3d..ab8ef56 100644 --- a/Bayes Guards SMS/Predict/task.md +++ b/Bayes Guards SMS/Predict/task.md @@ -33,32 +33,86 @@ delete the `pass` operator. help of the `split_by_words()` function. - In each message, find a set of unique words and create a vector of zeros of the same size. + +
+Create a zero-initialized array to store the index of each unique word. + +```python +index_array = np.zeros(unique.shape, dtype=np.int64) +``` +
+ - For each unique word from the list, find a correspondence in the dictionary; if you find it, write its index to the vector created in the previous step, if not – write an index equal to the dictionary length. All such words have the same probability. + +
+ +For each word, assign its dictionary index to `word_index`; if the word is not found, set `word_index` to `self.dict_size`. + +```python + word_index = self.dictionary[word] if word in self.dictionary else self.dict_size +``` + +**Why do this?** + +Because `self.likelihood` stores word probabilities by column index, not by the word string itself. + +
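Here is a minimal sketch of the lookup. It assumes a plain Python `dict` that maps words to column indices. The toy `dictionary` and the `message_indices` helper are hypothetical and not part of the course code. All unknown words share the extra index `dict_size`, so they point at one common column of `likelihood`:

```python
import numpy as np

# Hypothetical toy vocabulary: word -> column index in the likelihood array
dictionary = {"free": 0, "prize": 1, "meeting": 2}
dict_size = len(dictionary)

def message_indices(words):
    """Map each unique word to its dictionary index; unknown words get dict_size."""
    unique = np.unique(words)  # sorted unique words
    index_array = np.zeros(unique.shape, dtype=np.int64)
    for i, word in enumerate(unique):
        index_array[i] = dictionary[word] if word in dictionary else dict_size
    return index_array

print(message_indices(["free", "prize", "free", "lunch"]))  # prints [0 3 1]
```

`np.unique` sorts the words (`free`, `lunch`, `prize`), so the unknown word `lunch` lands in the middle and maps to index 3.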
+ - Calculate `log_likelihood` by applying the `np.log()` function to the slice of the `likelihood` array obtained with the help of `index_array`; thus, the array will contain the probabilities of only those words that occur in our sentence. -- Use the above formula to calculate the most probable class for this message. -- Return the list of most probable classes of all messages in the input array. +
-Then, implement the `score` method, which passes the testing sample through the algorithm, compares the received -class labels with the real ons and returns the proportion of correctly classified objects. +`self.likelihood[c, w]` represents the probability of word `w` for class `c`. +Select the probabilities for the message words and calculate their logs: -
-Posterior probabilities for each class are calculated as the sum of the prior probability logarithm and the summarized logarithms of probabilities for the words from -log_likelihood -positioned along one axis, i.e., separately for each class. +```python +log_likelihood = np.log(self.likelihood[:, index_array]) +```
-
+- Compute the posterior score for each class. + +
+The posterior score is the unnormalized log-probability of each class given the message. +Compute it by summing the log-probabilities of the words along axis=1 and adding the log prior of each class. +
+ +
+ +```python +posterior = np.log(self.classes_prior) + np.sum(log_likelihood, axis=1) +``` +
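A small sketch with a hypothetical two-class model makes the shapes concrete. `likelihood` has one row per class and one column per dictionary word, plus an extra last column for unknown words. All numbers and labels are made up for illustration:

```python
import numpy as np

unique_classes = np.array(["ham", "spam"])  # hypothetical class labels
classes_prior = np.array([0.7, 0.3])        # P(class)
# likelihood[c, w] = P(word w | class c); the last column is for unknown words
likelihood = np.array([
    [0.10, 0.05, 0.60, 0.25],  # ham
    [0.45, 0.40, 0.05, 0.10],  # spam
])

index_array = np.array([0, 3, 1])  # column indices of the message's unique words

# Shape (n_classes, n_message_words): log-probabilities of the message words
log_likelihood = np.log(likelihood[:, index_array])
# Sum over the words (axis=1), then add the log prior of each class
posterior = np.log(classes_prior) + np.sum(log_likelihood, axis=1)
predicted = unique_classes[np.argmax(posterior)]

print(log_likelihood.shape)  # (2, 3)
print(predicted)             # spam
```

Working in log space turns the product of many small probabilities into a sum. This avoids floating-point underflow on long messages, and because the logarithm is monotonic, `np.argmax` still picks the same class.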
+ +- Identify the class with the highest score for each message. + +
After finding the posterior probabilities for classes, you need to determine which one is the largest among them and choose a class corresponding to it from unique_classes. Here, you may use the numpy.argmax function.
+
+ +```python +predicted = self.unique_classes[np.argmax(posterior)] +``` +
+ +Then, implement the `score` method, which passes the test samples through the algorithm, compares the predicted +class labels with the true labels, and returns the proportion of correctly classified objects. + +
+Use `predict` to generate labels, compare them with the true labels `y`, and return the fraction of matches: + +
return np.sum(self.predict(X) == y) / len(y)
+
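The `score` computation is plain accuracy. Below is a standalone sketch with hypothetical label arrays; in the task, the predicted labels would come from `self.predict(X)`, and the helper name `accuracy` is illustrative only:

```python
import numpy as np

def accuracy(predicted, y):
    """Fraction of positions where the predicted label equals the true label."""
    predicted = np.asarray(predicted)
    y = np.asarray(y)
    return np.sum(predicted == y) / len(y)

y_true = np.array(["spam", "ham", "ham", "spam"])
y_pred = np.array(["spam", "ham", "spam", "spam"])
print(accuracy(y_pred, y_true))  # 0.75
```

Summing a boolean array counts the `True` entries, so `np.mean(predicted == y)` gives the same result.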
+ To see the results of your code, you can add the following lines to the `main` block in `task.py` and then run it: diff --git a/Comic-Con and K-means/Reading an image/task-info.yaml b/Comic-Con and K-means/Reading an image/task-info.yaml index c04b55c..46cbf97 100644 --- a/Comic-Con and K-means/Reading an image/task-info.yaml +++ b/Comic-Con and K-means/Reading an image/task-info.yaml @@ -3,12 +3,12 @@ files: - name: task.py visible: true placeholders: - - offset: 156 + - offset: 153 length: 16 placeholder_text: "# TODO" - - offset: 243 + - offset: 231 length: 30 - placeholder_text: "# Reshape the image" + placeholder_text: "# TODO" - name: tests/test_task.py visible: false propagatable: false diff --git a/Comic-Con and K-means/Reading an image/task.py b/Comic-Con and K-means/Reading an image/task.py index b78f885..c37412f 100644 --- a/Comic-Con and K-means/Reading an image/task.py +++ b/Comic-Con and K-means/Reading an image/task.py @@ -3,10 +3,9 @@ def read_image(path='superman-batman.png'): - # Here, we load the image using PIL's open function. + # Here, load the image using PIL's open function. image = Image.open(path) - # We reshape the image into an (M x N, 3) - # array. + # Reshape the image into an (M x N, 3) array. return np.array(image).reshape(-1, 3) diff --git a/Horror Trees/Conclusion/task.md b/Horror Trees/Conclusion/task.md index 9fb9ba8..52c651d 100644 --- a/Horror Trees/Conclusion/task.md +++ b/Horror Trees/Conclusion/task.md @@ -40,5 +40,5 @@ To see the results of the algorithm's work, run the code in the `halloween.py` f see an image `tree.jpg` in the list of files in Course View – it is our decision tree! #### Additional information links: -1) An [example](https://iq.opengenus.org/id3-algorithm/#:~:text=ID3%20algorithm%2C%20stands%20for%20Iterative,or%20minimum%20Entropy%20(H)) of using the ID3 algorithm in weather forecasts. 
+1) An [example](https://discourse.opengenus.org/t/using-id3-algorithm-to-build-a-decision-tree-to-predict-the-weather/3343) of using the ID3 algorithm in weather forecasts. 2) An [article](https://en.wikipedia.org/wiki/ID3_algorithm#cite_ref-2) in Wikipedia on ID3. diff --git a/Horror Trees/Information Gain/task.md b/Horror Trees/Information Gain/task.md index 40063ab..5cea6a3 100644 --- a/Horror Trees/Information Gain/task.md +++ b/Horror Trees/Information Gain/task.md @@ -15,18 +15,29 @@ We subtract entropy `Y` for the condition `X` from entropy `Y` to calculate the `Y`, provided that there is some additional knowledge `X` about `Y`. - ### Task Implement the `information_gain` method, which takes a sample, divides it into two independent sub-samples, and calculates the information gain. To divide the sample, use the `divide` method written -in the previous step. +in the previous step. + +
+Use the `divide` method from the `Predicate` class to split the sample into two subsets: +`X1, y1, X2, y2 = self.divide(X, y)` + +
+
+Each subset contributes to the total entropy proportionally to its size. Compute the fraction of samples in the first subset as follows: + +`p = float(len(X1)) / len(X)` + +
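The following self-contained sketch shows information gain for a binary split. It assumes Shannon entropy with a base-2 logarithm, and for brevity it takes the two label subsets directly instead of calling `self.divide`. The function names and signatures are hypothetical, so check them against the entropy helper defined earlier in the course before reusing anything:

```python
import numpy as np

def entropy(y):
    """Shannon entropy (base 2) of a label array."""
    _, counts = np.unique(y, return_counts=True)
    probs = counts / len(y)
    return -np.sum(probs * np.log2(probs))

def information_gain(y, y1, y2):
    """Parent entropy minus the size-weighted entropies of the two subsets."""
    p = float(len(y1)) / len(y)
    return entropy(y) - p * entropy(y1) - (1 - p) * entropy(y2)

y = np.array([0, 0, 1, 1])
print(information_gain(y, y[:2], y[2:]))          # 1.0: a perfect split
print(information_gain(y, y[[0, 2]], y[[1, 3]]))  # 0.0: the split tells us nothing
```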
-
+
To calculate information gain, you can use the above formula in the following way: diff --git a/Horror Trees/Node/node.py b/Horror Trees/Node/node.py index 0db1120..05056c9 100644 --- a/Horror Trees/Node/node.py +++ b/Horror Trees/Node/node.py @@ -4,6 +4,8 @@ # the true and false branches. class Node: def __init__(self, column=-1, value=None, true_branch=None, false_branch=None): + # Implement the four attributes of the Node class: + # column, value, true_branch, false_branch self.column = column self.value = value self.true_branch = true_branch diff --git a/Horror Trees/Node/task-info.yaml b/Horror Trees/Node/task-info.yaml index d84a40b..c3c975a 100644 --- a/Horror Trees/Node/task-info.yaml +++ b/Horror Trees/Node/task-info.yaml @@ -3,9 +3,9 @@ files: - name: node.py visible: true placeholders: - - offset: 333 + - offset: 443 length: 127 - placeholder_text: "# TODO: Implement the four attributes of the Node class" + placeholder_text: "# TODO" - name: task.py visible: true - name: tests/test_task.py diff --git a/Horror Trees/Predict/task.md b/Horror Trees/Predict/task.md index bdb85d0..5190e91 100644 --- a/Horror Trees/Predict/task.md +++ b/Horror Trees/Predict/task.md @@ -13,10 +13,73 @@ In `classify_subtree`, you need to: 1. Check whether `sub_tree` is an instance of the `Node` class and if yes, return the current value of `sub_tree`, as in such a case, it is a class label. + +
+ +If `sub_tree` is not a `Node`, return it; it already represents the class label. + +```python +if not isinstance(sub_tree, Node): + return sub_tree +``` +
+ 2. Compare the characteristic value from the column according to which a condition is set in the given node with the threshold value. + +
+ +Each node evaluates a specific feature of the object being classified. The index of the feature to check is stored in `sub_tree.column`. +You need to extract the corresponding value from `x`. + +```python +v = x[sub_tree.column] +``` + +
+ 3. Depending on the result, choose the tree branch along which you will proceed (`true_branch` or `false_branch`). + +
+ +For numeric features, the node checks a threshold condition (e.g., `age >= 30`). +Compare the feature value against this threshold to determine which branch to follow. + +```python +if isinstance(v, (int, float)): + if v >= sub_tree.value: + branch = sub_tree.true_branch + else: + branch = sub_tree.false_branch +``` + +
+ +
+ +For categorical features, the node evaluates an equality condition (e.g., `color == "red"`). +Determine the next branch based on whether the feature value matches this criterion. + +```python +else: + if v == sub_tree.value: + branch = sub_tree.true_branch + else: + branch = sub_tree.false_branch +``` + +
+ 4. Repeat these actions recursively until the result will be a class label (a leaf node). +
+ +Choosing a branch is only the first step – you may encounter another node. +Apply the same logic again by calling the function on the selected branch until you reach a leaf (the final class label). + +```python +return self.classify_subtree(x, branch) +``` +
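The four steps above combine into one short recursive function. This standalone sketch uses a minimal stand-in for the course's `Node` class (with the same four attributes) and a hypothetical two-level tree. The feature indices, thresholds, and labels are illustrative only:

```python
class Node:
    """Minimal stand-in for the course's Node: a test on one column plus two branches."""
    def __init__(self, column=-1, value=None, true_branch=None, false_branch=None):
        self.column = column
        self.value = value
        self.true_branch = true_branch
        self.false_branch = false_branch

def classify_subtree(x, sub_tree):
    # Anything that is not a Node is a leaf, i.e. a class label
    if not isinstance(sub_tree, Node):
        return sub_tree
    v = x[sub_tree.column]
    if isinstance(v, (int, float)):
        # Numeric feature: threshold test
        branch = sub_tree.true_branch if v >= sub_tree.value else sub_tree.false_branch
    else:
        # Categorical feature: equality test
        branch = sub_tree.true_branch if v == sub_tree.value else sub_tree.false_branch
    return classify_subtree(x, branch)

# Hypothetical tree: "age >= 30?" and then "color == 'red'?"
tree = Node(column=0, value=30,
            true_branch=Node(column=1, value="red",
                             true_branch="vampire", false_branch="ghost"),
            false_branch="zombie")

print(classify_subtree([42, "red"], tree))   # vampire
print(classify_subtree([42, "blue"], tree))  # ghost
print(classify_subtree([12, "red"], tree))   # zombie
```

If rows come from a NumPy integer array, note that `np.int64` is not a subclass of Python's `int`. In that case, checking against `numbers.Number` would recognize it as numeric.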
To see the results of your code, add the following lines to the `main` block in `task.py` and run it: diff --git a/Iris Network/Backpropagation/task.md b/Iris Network/Backpropagation/task.md index 21080af..4423ca0 100644 --- a/Iris Network/Backpropagation/task.md +++ b/Iris Network/Backpropagation/task.md @@ -70,9 +70,42 @@ In the `network.py` file, implement only the `backward` method of the `NN` class
  • Calculate the error for the output layer (delta_l2) as the difference between the network results (output) and the real class labels (y) multiplied elementwise by the derivative of the activation function for output ($\delta_{o}$ formula).
  • -
  • Calculate the error for the hidden layer (delta_l1) as the product of input layer error matrices and the weights w2 multiplied elementwise by the derivative of the activation function wrt the output data of the hidden layer (layer1) ($\delta_{h}$ formula).
  • + +
    + +```python +delta_l2 = (y - output) * sigmoid_derivative(output) +``` +
    + +
  • Calculate the error for the hidden layer (delta_l1) by taking the product of the output layer error +and the transpose of the weight matrix w2, then multiplying element-wise by the derivative +of the activation function with respect to the hidden layer's output (layer1) ($\delta_{h}$ formula).
  • + +
    + +```python +delta_l1 = np.dot(delta_l2, self.w2.T) * sigmoid_derivative(self.layer1) +``` +
    +
  • Adjust the weight coefficients of the output layer (w2) by calculating the vector product of the hidden layer (layer1) and the output layer error (delta_l2) multiplied elementwise by the learning rate (formula 3).
  • + +
    + +```python +self.w2 += (np.dot(self.layer1.T, delta_l2) * learning_rate) +``` +
    +
  • Adjust the weight coefficients of the hidden layer (w1) by calculating the vector product of the input layer (X) and the hidden layer error (delta_l1), multiplied elementwise by the learning rate (formula 3).
  • + +
    + +```python +self.w1 += (np.dot(X.T, delta_l1) * learning_rate) +``` +
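Below is a standalone sketch of one backward pass. It assumes a two-layer network without biases, sigmoid activations on both layers, and a `sigmoid_derivative` that receives the already-activated values (so it returns `s * (1 - s)`). The array sizes and learning rate are hypothetical. The sketch shows how the shapes line up and that both deltas are computed before either weight matrix changes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(s):
    # s is already sigmoid(z), so d sigmoid / dz = s * (1 - s)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
X = rng.random((5, 4))   # 5 samples, 4 features
y = rng.random((5, 3))   # 5 samples, 3 outputs
w1 = rng.random((4, 6))  # input -> hidden (6 units)
w2 = rng.random((6, 3))  # hidden -> output
learning_rate = 0.1

# Forward pass
layer1 = sigmoid(X @ w1)       # (5, 6)
output = sigmoid(layer1 @ w2)  # (5, 3)

# Backward pass: both deltas use the current, not yet updated, weights
delta_l2 = (y - output) * sigmoid_derivative(output)            # (5, 3)
delta_l1 = np.dot(delta_l2, w2.T) * sigmoid_derivative(layer1)  # (5, 6)

w2 += np.dot(layer1.T, delta_l2) * learning_rate  # (6, 3)
w1 += np.dot(X.T, delta_l1) * learning_rate       # (4, 6)

print(delta_l2.shape, delta_l1.shape)  # (5, 3) (5, 6)
```

`y - output` is the negative of the derivative of the squared error `0.5 * sum((y - output) ** 2)` with respect to `output`. As a result, the `+=` updates move the weights in the direction that reduces that error.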
Before you start, delete the `pass` operator and uncomment all lines that are not task commentaries. diff --git a/Iris Network/Train and Predict/task.md b/Iris Network/Train and Predict/task.md index 9e0cc19..c1d2b92 100644 --- a/Iris Network/Train and Predict/task.md +++ b/Iris Network/Train and Predict/task.md @@ -5,13 +5,29 @@ The process of setting up a neural network involves successive implementation of In the `network.py` file, implement the `train` method of the `NN` class. Besides data, it takes the `n_iter` parameter, which sets the necessary number of iterations. The method should call two other (previously implemented) methods in the right order. It does not return anything. +
+On each iteration, generate predictions with `feedforward` and then update the model's weights with `backward` (backpropagation). + +```python + for _ in range(n_iter): + l2 = self.feedforward(X) + self.backward(X, y, l2) +```
+ Augment the implementation by the `predict` method, which passes all objects from the `X` matrix through the trained neural network. -Before you start, delete the `pass` operator and uncomment all lines that are not task commentaries. +
+The predict method is a required part of the neural network's interface. +We will implement it here, even though it simply acts as a wrapper for the feedforward method. + +```python +return self.feedforward(X) +``` +While this case is straightforward, other scenarios may require a more complex implementation. +
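Here is a compact, self-contained version of the `train`/`predict` pair. It assumes a bias-free network with sigmoid activations. The class `TinyNN`, its layer sizes, the learning rate, and the toy data are all hypothetical. The sketch only illustrates the call order inside `train`: `feedforward` first, then `backward` with its result:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyNN:
    """Hypothetical two-layer sigmoid network without biases."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.5, size=(n_in, n_hidden))
        self.w2 = rng.normal(scale=0.5, size=(n_hidden, n_out))

    def feedforward(self, X):
        self.layer1 = sigmoid(X @ self.w1)  # kept for use in backward
        return sigmoid(self.layer1 @ self.w2)

    def backward(self, X, y, output, learning_rate=0.5):
        delta_l2 = (y - output) * output * (1 - output)
        delta_l1 = (delta_l2 @ self.w2.T) * self.layer1 * (1 - self.layer1)
        self.w2 += self.layer1.T @ delta_l2 * learning_rate
        self.w1 += X.T @ delta_l1 * learning_rate

    def train(self, X, y, n_iter=1000):
        for _ in range(n_iter):
            output = self.feedforward(X)  # forward pass with the current weights
            self.backward(X, y, output)   # update the weights from that output

    def predict(self, X):
        return self.feedforward(X)

# Toy task: the target equals the first input feature
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = X[:, :1]

nn = TinyNN(n_in=2, n_hidden=4, n_out=1)
loss_before = np.mean((y - nn.predict(X)) ** 2)
nn.train(X, y, n_iter=2000)
loss_after = np.mean((y - nn.predict(X)) ** 2)
print(loss_before, loss_after)
```

In this sketch, `feedforward` stores `layer1` on the instance, so `backward` can reuse the hidden activations without recomputing them.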
-
The predict method is a part of the interface of a program the neural network is expected to include, so we will implement it -despite the fact that it just calls the feedforward method. It's a lucky coincidence – in other cases, there might be -something else.
+Before you begin, delete the `pass` statement and uncomment all lines that are not task-related comments. To see the results of your code in this step, add the following lines to the `main` block in `task.py`: diff --git a/Pima Indians Diabetes and Linear Classifier/Gradient Descent/task-info.yaml b/Pima Indians Diabetes and Linear Classifier/Gradient Descent/task-info.yaml index bc46119..aade20a 100644 --- a/Pima Indians Diabetes and Linear Classifier/Gradient Descent/task-info.yaml +++ b/Pima Indians Diabetes and Linear Classifier/Gradient Descent/task-info.yaml @@ -21,7 +21,7 @@ files: placeholder_text: "# TODO: Set it to the new ones" - offset: 2210 length: 28 - placeholder_text: "# Return the predicted classes" + placeholder_text: "# TODO" - name: loss_functions.py visible: true - name: task.py diff --git a/Pima Indians Diabetes and Linear Classifier/Read data/task-info.yaml b/Pima Indians Diabetes and Linear Classifier/Read data/task-info.yaml index d0d71a1..ca00b97 100644 --- a/Pima Indians Diabetes and Linear Classifier/Read data/task-info.yaml +++ b/Pima Indians Diabetes and Linear Classifier/Read data/task-info.yaml @@ -3,18 +3,18 @@ files: - name: task.py visible: true placeholders: - - offset: 314 + - offset: 256 length: 35 placeholder_text: "# TODO" - - offset: 802 + - offset: 532 length: 36 - placeholder_text: "# Standardize the dataset" - - offset: 1000 + placeholder_text: "# TODO" + - offset: 703 length: 60 - placeholder_text: "# Add a column of -1 to the left of X" - - offset: 1145 + placeholder_text: "# TODO" + - offset: 812 length: 12 - placeholder_text: "# {0, 1} -> {1, -1}" + placeholder_text: "# TODO" - name: tests/test_task.py visible: false propagatable: false diff --git a/Pima Indians Diabetes and Linear Classifier/Read data/task.py b/Pima Indians Diabetes and Linear Classifier/Read data/task.py index 78df4cc..7cd1146 100644 --- a/Pima Indians Diabetes and Linear Classifier/Read data/task.py +++ b/Pima Indians Diabetes and Linear 
Classifier/Read data/task.py @@ -6,23 +6,19 @@ # and returns it as a pair of arrays: features # and diabetes presence. def read_data(fname): - # The genfromtxt method loads data from a text file and splits columns - # based on the provided delimiter. + # Load data from a CSV file using numpy.genfromtxt. data = np.genfromtxt(fname, delimiter=',') # The data is split into X (all columns but the last) and # y (the last column). X, y = data[:, :-1], data[:, -1] - # The features are rescaled: - # X is standardized by centering features around the mean - # with a unit standard deviation. This means that the mean - # and standard deviation of the standard scores are 0 and 1, respectively. - # This procedure is recommended for data that follows a normal distribution. + # Standardize features: subtract the mean + # and divide by the standard deviation for each column. X = (X - X.mean(axis=0)) / X.std(axis=0) - # A column of -1s is prepended to the left of the X array. + # Prepend a column of -1s to X. # It acts as a pseudo-feature that simplifies our vector # calculations later on. X = np.concatenate((-np.ones(len(X)).reshape(-1, 1), X), axis=1) - # y is standardized: centered around 0 with a standard deviation of 1. + # Map labels from {0,1} to {1,-1}. 
y = -(y * 2 - 1) return X, y diff --git a/Pima Indians Diabetes and Linear Classifier/Stochastic Gradient Descent/task-info.yaml b/Pima Indians Diabetes and Linear Classifier/Stochastic Gradient Descent/task-info.yaml index f31373a..f847ae2 100644 --- a/Pima Indians Diabetes and Linear Classifier/Stochastic Gradient Descent/task-info.yaml +++ b/Pima Indians Diabetes and Linear Classifier/Stochastic Gradient Descent/task-info.yaml @@ -5,14 +5,14 @@ files: placeholders: - offset: 1114 length: 32 - placeholder_text: "# TODO: Generate the batch" + placeholder_text: "# TODO" - offset: 1172 length: 60 placeholder_text: "# TODO: Calculate the gradient using the current weights, X\ \ and y batches" - offset: 2183 length: 25 - placeholder_text: "# TODO: Initialize it here" + placeholder_text: "# TODO" - name: gradient_descent.py visible: true - name: loss_functions.py