# Mean Squared Error (MSE)
## Understand the code
The codebase uses a script, `score.py`, to define metric functions. The accuracy metric has already been implemented for you. Run the script to see the result:
```bash
./score.py predict.txt solution.txt
```
Install the necessary packages:
```bash
pip install numpy
```
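The sketch below illustrates how the `read_array` helper in `score.py` turns its input into a 2D array. It assumes the prediction and solution files are plain whitespace-separated numeric text, which is what `np.genfromtxt` reads by default; the in-memory sample data is made up for illustration and stands in for real files.

```python
import io

import numpy as np


def read_array(source):
    """Same logic as read_array in score.py: always return a 2D array."""
    array = np.genfromtxt(source, dtype=float)
    if len(array.shape) == 1:
        array = array.reshape(-1, 1)
    return array


# Hypothetical file contents, passed as in-memory text instead of file names
single_column = read_array(io.StringIO("1\n0\n1\n"))
multi_column = read_array(io.StringIO("1 0 1\n0 1 0\n"))

print(single_column.shape)  # one value per line becomes a column vector
print(multi_column.shape)   # several values per line stay as rows x columns
```

Because every input ends up 2D, the metric functions can reduce along `axis=1` regardless of how many columns the files have.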
## Implement MSE metric
Now your task as a developer is to implement the MSE metric. Recall that the formula for MSE is:
$$ \text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (\text{predicted}_i - \text{actual}_i)^2 $$
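As a small worked instance of the formula, take three made-up samples chosen purely for illustration: predictions 2, 4, 6 against actual values 1, 4, 8, so N = 3.

$$ \text{MSE} = \frac{1}{3}\left[(2-1)^2 + (4-4)^2 + (6-8)^2\right] = \frac{1 + 0 + 4}{3} = \frac{5}{3} \approx 1.67 $$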
<details>
<summary>Solution</summary>
<p>
- Create a feature branch named `a_MSE`:
```bash
git pull origin develop
git checkout -b a_MSE
```
</p>
<p>
- Create a `mse_metric` function and compute its score:
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
from sys import argv

import numpy as np


# ========= Useful functions ==============
def read_array(filename):
    ''' Read array and convert to 2d np arrays '''
    array = np.genfromtxt(filename, dtype=float)
    if len(array.shape) == 1:
        array = array.reshape(-1, 1)
    return array


def accuracy_metric(solution, prediction):
    correct_samples = np.all(solution == prediction, axis=1)
    return correct_samples.mean()


def mse_metric(solution, prediction):
    '''Mean-square error.
    Works even if the target matrix has more than one column'''
    mse = np.sum((solution - prediction)**2, axis=1)
    return np.mean(mse)


def _HERE(*args):
    h = os.path.dirname(os.path.realpath(__file__))
    return os.path.join(h, *args)


# =============================== MAIN ========================================
if __name__ == "__main__":
    #### INPUT/OUTPUT: Get input and output file names
    prediction_file = argv[1]
    solution_file = argv[2]
    score_file = open(_HERE('scores.txt'), 'w')

    # Extract the dataset name from the file name
    prediction_name = os.path.basename(prediction_file)

    # Read the solution and prediction values into numpy arrays
    solution = read_array(solution_file)
    prediction = read_array(prediction_file)

    # Compute the score prescribed by each metric
    accuracy_score = accuracy_metric(solution, prediction)
    mse_score = mse_metric(solution, prediction)
    print(
        "======= (" + prediction_name + "): score(accuracy_metric)=%0.2f =======" % accuracy_score)
    print(
        "======= (" + prediction_name + "): score(mse_metric)=%0.2f =======" % mse_score)

    # Write the score for each metric to the output file
    score_file.write("accuracy_metric: %0.2f\n" % accuracy_score)
    score_file.write("mse_metric: %0.2f\n" % mse_score)
    score_file.close()
```
</p>
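A quick check of how `mse_metric` behaves may help here. The sketch below copies the function from `score.py` and runs it on made-up arrays (the data is hypothetical). It suggests that with a single column the result matches the formula directly, while with several columns the squared errors are summed per row before averaging, which is not the same as averaging over every element.

```python
import numpy as np


def mse_metric(solution, prediction):
    # Same computation as in score.py: sum squared errors per row, then average over rows
    mse = np.sum((solution - prediction) ** 2, axis=1)
    return np.mean(mse)


# Hypothetical single-column data: squared errors are 1, 0 and 4
solution = np.array([[1.0], [4.0], [8.0]])
prediction = np.array([[2.0], [4.0], [6.0]])
single = mse_metric(solution, prediction)

# Hypothetical two-column data: per-row sums of squared errors are 1 and 2
solution_2d = np.array([[1.0, 0.0], [0.0, 1.0]])
prediction_2d = np.array([[0.0, 0.0], [1.0, 0.0]])
row_summed = mse_metric(solution_2d, prediction_2d)
elementwise = np.mean((solution_2d - prediction_2d) ** 2)

print(single)       # mean of 1, 0, 4
print(row_summed)   # mean of the per-row sums 1 and 2
print(elementwise)  # mean over all four elements, for comparison
```

Which of the two multi-column conventions is appropriate depends on the task; the version in `score.py` is the one the unit tests in the next section are written against.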
<p>
- Commit and push changes:
```bash
git add score.py
git commit -m "implement MSE metric"
git push origin a_MSE
```
</p>
</details>
## Implement unit tests for MSE
The tester has to implement unit tests for the `accuracy_metric` and `mse_metric` functions in `score.py`. This involves creating a test class, setting up test data, and writing test cases for each function. Make sure the tests cover a variety of scenarios, such as correct and incorrect predictions, and edge cases like empty arrays.
<details>
<summary>Solution</summary>
<p>
- Install pytest:
```bash
pip install pytest
```
</p>
<p>
- Create a file `tests/test_metrics.py`:
```python
import unittest
import numpy as np
import os, sys

# Make score.py (one directory up) importable from the tests folder
sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
from score import accuracy_metric, mse_metric  # Import your metric functions


class TestMetrics(unittest.TestCase):

    def setUp(self):
        # Set up test data
        self.solution = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 1]])
        self.correct_prediction = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 1]])
        self.incorrect_prediction = np.array([[0, 1, 0], [1, 0, 1], [0, 0, 1]])
        self.empty_array = np.array([])

    def test_accuracy_metric(self):
        # Test accuracy with correct predictions
        self.assertEqual(accuracy_metric(self.solution, self.correct_prediction), 1.0)
        # Test accuracy with incorrect predictions
        self.assertEqual(accuracy_metric(self.solution, self.incorrect_prediction), 0.0)
        # Test accuracy with empty arrays
        self.assertEqual(accuracy_metric(self.empty_array, self.empty_array), 0.0)

    def test_mse_metric(self):
        # Test MSE with correct predictions
        self.assertEqual(mse_metric(self.solution, self.correct_prediction), 0.0)
        # Test MSE with incorrect predictions
        self.assertEqual(mse_metric(self.solution, self.incorrect_prediction), 8/3)
        # Test MSE with empty arrays
        self.assertEqual(mse_metric(self.empty_array, self.empty_array), 0.0)


if __name__ == '__main__':
    unittest.main()
```
</p>
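The tests above compare floating-point results with `assertEqual`, which works for these values but can become brittle once the data contains fractions. As an optional variation, the sketch below uses `assertAlmostEqual` instead. The class name `TestMseTolerance` and the data are hypothetical, and `mse_metric` is copied in so the example runs on its own.

```python
import unittest

import numpy as np


def mse_metric(solution, prediction):
    # Copy of the function from score.py so this sketch is self-contained
    mse = np.sum((solution - prediction) ** 2, axis=1)
    return np.mean(mse)


class TestMseTolerance(unittest.TestCase):  # hypothetical test class

    def test_fractional_errors(self):
        solution = np.array([[0.1], [0.2], [0.3]])
        prediction = np.zeros((3, 1))
        # Expected value is (0.01 + 0.04 + 0.09) / 3, compared to 7 decimal places
        self.assertAlmostEqual(mse_metric(solution, prediction), 0.14 / 3, places=7)


# Run the test case programmatically and keep the result object
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMseTolerance)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real test file the last two lines would not be needed, since `pytest` discovers `unittest.TestCase` classes by itself.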
<p>
- Run the tests:
```bash
pytest
```
</p>
<p>
- Commit and push changes:
```bash
git add tests/test_metrics.py
git commit -m "implement unit tests for metrics"
git push origin a_MSE
```
</p>
</details>
## Fix MSE metric
The tests failed. The tester tells the developer that the two metric functions can't handle empty arrays. The developer now needs to fix this by adding checks that return a default value or raise an error when the input arrays are empty. The appropriate action depends on the expected behavior of your application.
<details>
<summary>Solution</summary>
<p>
```python
def accuracy_metric(solution, prediction):
    # Empty input: return a default score instead of failing
    if len(solution) == 0 or len(prediction) == 0:
        return 0
    correct_samples = np.all(solution == prediction, axis=1)
    return np.mean(correct_samples)


def mse_metric(solution, prediction):
    '''Mean-square error.
    Works even if the target matrix has more than one column'''
    # Empty input: return a default score instead of failing
    if len(solution) == 0 or len(prediction) == 0:
        return 0
    mse = np.sum((solution - prediction)**2, axis=1)
    return np.mean(mse)
```
</p>
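The fix above takes the "return a default value" route, which is what the unit tests expect. For comparison, here is a minimal sketch of the other option, raising an error. The name `mse_metric_strict` is hypothetical and not part of `score.py`; adopting it would also mean changing the empty-array test to expect an exception.

```python
import numpy as np


def mse_metric_strict(solution, prediction):
    """Hypothetical variant of mse_metric that rejects empty input."""
    solution = np.asarray(solution, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    if solution.size == 0 or prediction.size == 0:
        raise ValueError("mse_metric_strict: input arrays must not be empty")
    mse = np.sum((solution - prediction) ** 2, axis=1)
    return np.mean(mse)


# Empty input now fails loudly instead of silently scoring 0
try:
    mse_metric_strict(np.array([]), np.array([]))
    raised = False
except ValueError:
    raised = True

print(raised)
```

Raising makes a missing or unreadable prediction file hard to overlook, whereas returning 0 for MSE could be mistaken for a perfect score.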
<p>
- Run the tests:
```bash
pytest
```
The tests should now pass.
</p>
<p>
- Commit and push changes:
```bash
git add score.py
git commit -m "fix metric functions to handle empty arrays"
git push origin a_MSE
```
</p>
</details>
## Create a merge request
Now that the tests pass, the developer creates a merge request from `a_MSE` to `develop`.
1. On the GitLab project page, select Code > Merge requests.
2. In the upper-right corner, select New merge request.
3. Select the source branch (`a_MSE`) and the target branch (`develop`), then select Compare branches and continue.
4. Complete the fields on the New merge request page:
   * Description: implement MSE metric and unit tests for the two functions.
   * Assignee: `Project leader`
   * Reviewer: `Code reviewer`
5. Select Create merge request.
## Approve a merge request
The code reviewer runs all the tests again and checks that the MSE metric and the unit tests have been implemented correctly. If something is wrong, the reviewer reports the findings to the developer and the tester so they can fix them. If everything is OK, the reviewer clicks `Approve`.
## Merge `a_MSE` into `develop`
The project leader is responsible for finalizing the integration of changes into the `develop` branch. This means merging the request created by the developer and approved by the reviewer, simply by clicking the `Merge` button.