72 changes: 63 additions & 9 deletions Bayes Guards SMS/Predict/task.md
Expand Up @@ -33,32 +33,86 @@ delete the `pass` operator.
help of the `split_by_words()` function.
- In each message, find a set of unique words and create a vector of zeros of
the same size.

<div class="hint">
Create a zero-initialized array to store the index of each unique word.

```python
index_array = np.zeros(unique.shape, dtype=np.int64)
```
</div>

- For each unique word from the list, look it up in the dictionary: if it is found,
write its index to the vector created in the previous step; if not, write an index
equal to the dictionary length. All such unknown words share the same
probability.

<div class="hint">

For each word, assign its dictionary index to `word_index`; if the word is not found, set `word_index` to `dict_size`.

```python
word_index = self.dictionary[word] if word in self.dictionary else self.dict_size
```

**Why do this?**

Because `self.likelihood` stores word probabilities by index rather than the string itself.

</div>

- Calculate `log_likelihood` by applying the `np.log()` function to the slice of the `likelihood` array
obtained with the help of `index_array`; thus, the array will contain the probabilities of only
those words that occur in our sentence.
- Use the above formula to calculate the most probable class for this message.
- Return the list of most probable classes of all messages in the input array.

<div class="hint">

Then, implement the `score` method, which passes the testing sample through the algorithm, compares the received
class labels with the real ones and returns the proportion of correctly classified objects.
`self.likelihood[c, w]` represents the probability of word `w` for class `c`.
Select the probabilities for the message words and calculate their logs:

<div class="hint">
Posterior probabilities for each class are calculated as the sum of the prior probability logarithm and the summarized logarithms of probabilities for the words from
<code>log_likelihood</code>
positioned along one axis, i.e., separately for each class.
```python
log_likelihood = np.log(self.likelihood[:, index_array])
```
</div>

<div class="hint">
- Compute the posterior score for each class.

<div class="hint" title="Posterior meaning">
The posterior score indicates the probability of each class given the message.
Compute it by summing the log-probabilities of the words (across <code>axis=1</code>) and adding the log prior.
</div>

<div class="hint" title="Posterior formula">

```python
posterior = np.log(self.classes_prior) + np.sum(log_likelihood, axis=1)
```
</div>

- Identify the class with the highest score for each message.

<div class="hint" title="Find the best class">
After finding the posterior probabilities for classes, you need to determine which
one is the largest among them and choose a class corresponding to it from <code>unique_classes</code>. Here,
you may use the <a href="https://numpy.org/doc/stable/reference/generated/numpy.argmax.html">numpy.argmax</a> function.
</div>

<div class="hint" title="Prediction formula">

```python
predicted = self.unique_classes[np.argmax(posterior)]
```
</div>
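
The hints above can be combined into one small end-to-end sketch for a single message. Everything here is a toy assumption made for illustration: the three-word `dictionary`, the numbers in `likelihood` and `classes_prior`, and the free function `predict_message` (the task itself implements this logic inside a class method and obtains the words via `split_by_words()`).

```python
import numpy as np

# Hypothetical toy model: 2 classes, 3 known words plus one shared "unknown" slot.
dictionary = {"free": 0, "prize": 1, "meeting": 2}
dict_size = len(dictionary)
unique_classes = np.array(["ham", "spam"])
classes_prior = np.array([0.6, 0.4])

# likelihood[c, w] is the probability of word w in class c.
# The last column (index dict_size) is shared by all out-of-dictionary words.
likelihood = np.array([
    [0.10, 0.05, 0.70, 0.15],  # ham
    [0.50, 0.35, 0.05, 0.10],  # spam
])


def predict_message(words):
    """Return the most probable class for one already-tokenized message."""
    unique = np.unique(words)
    index_array = np.zeros(unique.shape, dtype=np.int64)
    for i, word in enumerate(unique):
        # Unknown words fall back to the extra index dict_size.
        index_array[i] = dictionary.get(word, dict_size)

    # Rows are classes, columns are the words of this message.
    log_likelihood = np.log(likelihood[:, index_array])
    posterior = np.log(classes_prior) + np.sum(log_likelihood, axis=1)
    return unique_classes[np.argmax(posterior)]


print(predict_message(["free", "prize", "free", "now"]))  # spam
print(predict_message(["meeting", "tomorrow"]))           # ham
```

Working in log space turns the product of word probabilities into a sum, which avoids numeric underflow on long messages.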

Then, implement the `score` method, which passes the test samples through the algorithm, compares the predicted
class labels with the true labels, and returns the proportion of correctly classified objects.

<div class="hint">
Use <code>predict</code> to generate labels, compare them with the true labels <code>y</code>, and return the fraction of matches:

<pre><code>return np.sum(self.predict(X) == y) / len(y)</code></pre>
</div>
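
As a quick sanity check of the accuracy expression, here is a minimal sketch on hand-written labels. The arrays `y_true` and `y_pred` are made up for illustration and stand in for `y` and the result of `self.predict(X)`.

```python
import numpy as np

# Hypothetical labels: 4 of the 5 predictions match the true labels.
y_true = np.array(["spam", "ham", "ham", "spam", "ham"])
y_pred = np.array(["spam", "ham", "spam", "spam", "ham"])

# The element-wise comparison gives a boolean array; summing it counts the matches.
accuracy = np.sum(y_pred == y_true) / len(y_true)
print(accuracy)  # 0.8
```

`np.mean(y_pred == y_true)` gives the same value and may read more directly as "the fraction of matches".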

To see the results of your code, you can add the following
lines to the `main` block in `task.py` and then run it:

Expand Down
6 changes: 3 additions & 3 deletions Comic-Con and K-means/Reading an image/task-info.yaml
Expand Up @@ -3,12 +3,12 @@ files:
- name: task.py
visible: true
placeholders:
- offset: 156
- offset: 153
length: 16
placeholder_text: "# TODO"
- offset: 243
- offset: 231
length: 30
placeholder_text: "# Reshape the image"
placeholder_text: "# TODO"
- name: tests/test_task.py
visible: false
propagatable: false
Expand Down
5 changes: 2 additions & 3 deletions Comic-Con and K-means/Reading an image/task.py
Expand Up @@ -3,10 +3,9 @@


def read_image(path='superman-batman.png'):
    # Here, we load the image using PIL's open function.
    # Here, load the image using PIL's open function.
    image = Image.open(path)
    # We reshape the image into an (M x N, 3)
    # array.
    # Reshape the image into an (M x N, 3) array.
    return np.array(image).reshape(-1, 3)


Expand Down
2 changes: 1 addition & 1 deletion Horror Trees/Conclusion/task.md
Expand Up @@ -40,5 +40,5 @@ To see the results of the algorithm's work, run the code in the `halloween.py` f
see an image `tree.jpg` in the list of files in Course View – it is our decision tree!

#### Additional information links:
1) An [example](https://iq.opengenus.org/id3-algorithm/#:~:text=ID3%20algorithm%2C%20stands%20for%20Iterative,or%20minimum%20Entropy%20(H)) of using the ID3 algorithm in weather forecasts.
1) An [example](https://discourse.opengenus.org/t/using-id3-algorithm-to-build-a-decision-tree-to-predict-the-weather/3343) of using the ID3 algorithm in weather forecasts.
2) An [article](https://en.wikipedia.org/wiki/ID3_algorithm#cite_ref-2) in Wikipedia on ID3.
17 changes: 14 additions & 3 deletions Horror Trees/Information Gain/task.md
Expand Up @@ -15,18 +15,29 @@ We subtract entropy `Y` for the condition `X` from entropy `Y` to calculate the
`Y`, provided that there is some additional knowledge `X` about `Y`.



### Task

Implement the `information_gain` method, which
takes a sample, divides it into two independent sub-samples, and calculates the information gain.
To divide the sample, use the `divide` method written
in the previous step.
in the previous step.

<div class="hint" title="Division into two subsets">

Use the `divide` method from the `Predicate` class to split the sample into two subsets:

`X1, y1, X2, y2 = self.divide(X, y)`

</div>

<div class="hint" title="p value">
Each subset contributes to the total entropy proportionally to its size. Compute the fraction of samples in the first subset as follows:

`p = float(len(X1)) / len(X)`

</div>
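
To make the weighting by `p` concrete, here is a minimal standalone sketch. It assumes Shannon entropy with a base-2 logarithm and works on label arrays only; the helper names `entropy` and `information_gain` and their signatures are illustrative and may differ from the methods in the course code.

```python
import numpy as np


def entropy(y):
    """Shannon entropy (base 2) of a label array. An assumed helper."""
    _, counts = np.unique(y, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))


def information_gain(y, y1, y2):
    """Entropy of y minus the size-weighted entropies of the two subsets."""
    p = float(len(y1)) / len(y)
    return entropy(y) - p * entropy(y1) - (1 - p) * entropy(y2)


y = np.array([0, 0, 1, 1])

# A split that separates the classes perfectly removes all uncertainty.
gain_perfect = information_gain(y, y[:2], y[2:])

# A split that leaves both subsets as mixed as the parent gains nothing.
gain_useless = information_gain(y, y[[0, 2]], y[[1, 3]])

print(gain_perfect, gain_useless)
```

The first split yields a gain of 1 bit and the second a gain of 0, which matches the intuition that a good predicate produces purer subsets.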

<div class="hint">
<div class="hint" title="Information gain formula">

To calculate information gain, you can use the above formula in the following way:

Expand Down
2 changes: 2 additions & 0 deletions Horror Trees/Node/node.py
Expand Up @@ -4,6 +4,8 @@
# the true and false branches.
class Node:
    def __init__(self, column=-1, value=None, true_branch=None, false_branch=None):
        # Implement the four attributes of the Node class:
        # column, value, true_branch, false_branch
        self.column = column
        self.value = value
        self.true_branch = true_branch
Expand Down
4 changes: 2 additions & 2 deletions Horror Trees/Node/task-info.yaml
Expand Up @@ -3,9 +3,9 @@ files:
- name: node.py
visible: true
placeholders:
- offset: 333
- offset: 443
length: 127
placeholder_text: "# TODO: Implement the four attributes of the Node class"
placeholder_text: "# TODO"
- name: task.py
visible: true
- name: tests/test_task.py
Expand Down
63 changes: 63 additions & 0 deletions Horror Trees/Predict/task.md
Expand Up @@ -13,10 +13,73 @@ In `classify_subtree`, you need to:

1. Check whether `sub_tree` is an instance of the `Node` class and, if it is not, return the
current value of `sub_tree`, since in that case it is a class label.

<div class="hint">

If `sub_tree` is not a `Node`, return it; it already represents the class label.

```python
if not isinstance(sub_tree, Node):
    return sub_tree
```
</div>

2. Compare the characteristic value from the column according to which a condition is set in the given node with the threshold value.

<div class="hint">

Each node evaluates a specific feature of the object being classified. The index of the feature to check is stored in `sub_tree.column`.
You need to extract the corresponding value from `x`.

```python
v = x[sub_tree.column]
```

</div>

3. Depending on the result, choose the tree branch along which you will proceed (`true_branch` or `false_branch`).

<div class="hint" title="Compare numeric features">

For numeric features, the node evaluates a threshold (e.g., `age >= 30`).
Compare the feature value against this threshold to determine which branch to follow.

```python
if isinstance(v, int) or isinstance(v, float):
    if v >= sub_tree.value:
        branch = sub_tree.true_branch
    else:
        branch = sub_tree.false_branch
```

</div>

<div class="hint" title="Compare categorical features">

For categorical features, the node evaluates an equality condition (e.g., `color == "red"`).
Determine the next branch based on whether the feature value matches this criterion.

```python
else:
    if v == sub_tree.value:
        branch = sub_tree.true_branch
    else:
        branch = sub_tree.false_branch
```

</div>

4. Repeat these actions recursively until the result is a class label (a leaf node).

<div class="hint">

Choosing a branch is only the first step – you may encounter another node.
Apply the same logic again by calling the function on the selected branch until you reach a leaf (the final class label).

```python
return self.classify_subtree(x, branch)
```
</div>
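
The four steps can be put together in one runnable sketch. It is written as a free function rather than a method, and the toy tree with its labels (`"witch"`, `"vampire"`, `"ghost"`) is invented for illustration; only the `Node` attributes come from the course code.

```python
class Node:
    def __init__(self, column=-1, value=None, true_branch=None, false_branch=None):
        self.column = column
        self.value = value
        self.true_branch = true_branch
        self.false_branch = false_branch


def classify_subtree(x, sub_tree):
    # Step 1: anything that is not a Node is a leaf, i.e. a class label.
    if not isinstance(sub_tree, Node):
        return sub_tree

    # Step 2: read the feature this node tests.
    v = x[sub_tree.column]

    # Step 3: threshold test for numbers, equality test for everything else.
    if isinstance(v, (int, float)):
        matches = v >= sub_tree.value
    else:
        matches = v == sub_tree.value
    branch = sub_tree.true_branch if matches else sub_tree.false_branch

    # Step 4: keep descending until a leaf is reached.
    return classify_subtree(x, branch)


# Hypothetical tree: first test age >= 30, then color == "red".
tree = Node(
    column=0,
    value=30,
    true_branch=Node(column=1, value="red", true_branch="witch", false_branch="vampire"),
    false_branch="ghost",
)

print(classify_subtree([35, "red"], tree))   # witch
print(classify_subtree([35, "blue"], tree))  # vampire
print(classify_subtree([12, "red"], tree))   # ghost
```

The recursion depth equals the depth of the tree, so for the small trees built in this course Python's default recursion limit should not be a concern.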

To see the results of your code, add the following lines
to the `main` block in `task.py` and run it:
Expand Down
35 changes: 34 additions & 1 deletion Iris Network/Backpropagation/task.md
Expand Up @@ -70,9 +70,42 @@ In the `network.py` file, implement only the `backward` method of the `NN` class

<ul>
<li>Calculate the error for the output layer (<code>delta_l2</code>) as the difference between the real class labels (<code>y</code>) and the network results (<code>output</code>), multiplied elementwise by the derivative of the activation function for the output ($\delta_{o}$ formula).</li>
<li>Calculate the error for the hidden layer (<code>delta_l1</code>) as the product of input layer error matrices and the weights <code>w2</code> multiplied elementwise by the derivative of the activation function wrt the output data of the hidden layer (<code>layer1</code>) ($\delta_{h}$ formula).</li>

<div class="hint" title="Result">

```python
delta_l2 = (y - output) * sigmoid_derivative(output)
```
</div>

<li>Calculate the error for the hidden layer (<code>delta_l1</code>) by taking the product of the output layer error
and the transpose of the weight matrix <code>w2</code>, then multiplying element-wise by the derivative
of the activation function with respect to the hidden layer's output (<code>layer1</code>) ($\delta_{h}$ formula).</li>

<div class="hint" title="Result">

```python
delta_l1 = np.dot(delta_l2, self.w2.T) * sigmoid_derivative(self.layer1)
```
</div>

<li>Adjust the weight coefficients of the output layer (<code>w2</code>) by calculating the matrix product of the transposed hidden layer output (<code>layer1</code>) and the output layer error (<code>delta_l2</code>), scaled by the learning rate (formula 3).</li>

<div class="hint" title="Result">

```python
self.w2 += (np.dot(self.layer1.T, delta_l2) * learning_rate)
```
</div>

<li>Adjust the weight coefficients of the hidden layer (<code>w1</code>) by calculating the matrix product of the transposed input layer (<code>X</code>) and the hidden layer error (<code>delta_l1</code>), scaled by the learning rate (formula 3).</li>

<div class="hint" title="Result">

```python
self.w1 += (np.dot(X.T, delta_l1) * learning_rate)
```
</div>
</ul>
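
To see how the array shapes line up across the four steps, here is a standalone sketch of a single backward pass. The layer sizes, the random data, and the learning rate are arbitrary choices for illustration. It also assumes that `sigmoid_derivative` receives an already-activated value, which is how the hints call it; the course's own helper may be defined differently.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_derivative(activated):
    # Assumption: the argument is already a sigmoid output s, so s' = s * (1 - s).
    return activated * (1.0 - activated)


rng = np.random.default_rng(0)
X = rng.random((5, 4))                            # 5 samples, 4 features
y = rng.integers(0, 2, size=(5, 3)).astype(float)  # 3 output units
w1 = rng.normal(size=(4, 6))                      # input -> hidden (6 units)
w2 = rng.normal(size=(6, 3))                      # hidden -> output
learning_rate = 0.1

# Forward pass, needed to obtain layer1 and output.
layer1 = sigmoid(X @ w1)       # shape (5, 6)
output = sigmoid(layer1 @ w2)  # shape (5, 3)

# Backward pass: the same four steps as in the list above.
delta_l2 = (y - output) * sigmoid_derivative(output)            # (5, 3)
delta_l1 = np.dot(delta_l2, w2.T) * sigmoid_derivative(layer1)  # (5, 6)
w2_new = w2 + np.dot(layer1.T, delta_l2) * learning_rate        # (6, 3)
w1_new = w1 + np.dot(X.T, delta_l1) * learning_rate             # (4, 6)

print(delta_l2.shape, delta_l1.shape, w2_new.shape, w1_new.shape)
```

Note that `delta_l1` is computed from the weights `w2` as they were before the update, which is why both deltas are calculated first and the weights are adjusted afterwards.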

Before you start, delete the `pass` statement and uncomment all lines that are not task comments.
Expand Down
24 changes: 20 additions & 4 deletions Iris Network/Train and Predict/task.md
Expand Up @@ -5,13 +5,29 @@ The process of setting up a neural network involves successive implementation of
In the `network.py` file, implement the `train` method of the `NN` class. Besides data, it takes the `n_iter` parameter, which sets
the necessary number of iterations. The method should call two other (previously implemented) methods in the right order. It does not return anything.

<div class="hint">
On each iteration, generate predictions via <code>feedforward</code> and update the model's parameters using <code>backward</code> propagation.

```python
for itr in range(n_iter):
    l2 = self.feedforward(X)
    self.backward(X, y, l2)
```
</div>

Extend the implementation with the `predict` method, which passes all objects from the `X` matrix through the trained neural network.

Before you start, delete the `pass` operator and uncomment all lines that are not task commentaries.
<div class="hint">
The <code>predict</code> method is a required part of the neural network's interface.
We will implement it here, even though it simply acts as a wrapper for the <code>feedforward</code> method.

```python
return self.feedforward(X)
```
While this case is straightforward, other scenarios may require a more complex implementation.
</div>
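
The call order in `train` and the wrapper role of `predict` can be checked with a stub that only records which methods were invoked. `RecordingNN` is a hypothetical stand-in invented for this sketch, not the course's `NN` class; its `feedforward` and `backward` do no real computation.

```python
class RecordingNN:
    """A stub network that logs method calls instead of learning."""

    def __init__(self):
        self.calls = []

    def feedforward(self, X):
        self.calls.append("feedforward")
        return X  # placeholder for the real network output

    def backward(self, X, y, output):
        self.calls.append("backward")

    def train(self, X, y, n_iter=3):
        # Each iteration: forward pass first, then the weight update.
        for _ in range(n_iter):
            output = self.feedforward(X)
            self.backward(X, y, output)

    def predict(self, X):
        # Prediction is just a forward pass through the trained network.
        return self.feedforward(X)


net = RecordingNN()
net.train([[0.0]], [[1.0]], n_iter=2)
train_calls = list(net.calls)
print(train_calls)  # ['feedforward', 'backward', 'feedforward', 'backward']

prediction = net.predict([[0.5]])
print(prediction)  # [[0.5]]
```

Keeping `predict` as a separate method means callers do not depend on `feedforward` directly, so the prediction step can change later without affecting them.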

<div class="hint"> The <code>predict</code> method is a part of the interface of a program the neural network is expected to include, so we will implement it
despite the fact that it just calls the <code>feedforward</code> method. It's a lucky coincidence – in other cases, there might be
something else.</div>
Before you begin, delete the `pass` statement and uncomment all lines that are not task-related comments.

To see the results of your code in this step, add the following lines to the `main` block in `task.py`:

Expand Down
Expand Up @@ -21,7 +21,7 @@ files:
placeholder_text: "# TODO: Set it to the new ones"
- offset: 2210
length: 28
placeholder_text: "# Return the predicted classes"
placeholder_text: "# TODO"
- name: loss_functions.py
visible: true
- name: task.py
Expand Down
Expand Up @@ -3,18 +3,18 @@ files:
- name: task.py
visible: true
placeholders:
- offset: 314
- offset: 256
length: 35
placeholder_text: "# TODO"
- offset: 802
- offset: 532
length: 36
placeholder_text: "# Standardize the dataset"
- offset: 1000
placeholder_text: "# TODO"
- offset: 703
length: 60
placeholder_text: "# Add a column of -1 to the left of X"
- offset: 1145
placeholder_text: "# TODO"
- offset: 812
length: 12
placeholder_text: "# {0, 1} -> {1, -1}"
placeholder_text: "# TODO"
- name: tests/test_task.py
visible: false
propagatable: false
Expand Down