diff --git a/0-Setup.md b/0-Setup.md
new file mode 100644
index 0000000000000000000000000000000000000000..5bcc1fc241ff4f33ef970b23e0f67c6d963114d7
--- /dev/null
+++ b/0-Setup.md
@@ -0,0 +1,81 @@
# Environment setup

## SSH key setup

To clone the project code, you'll need Git. Before that, you need to set up an SSH key pair and import the public key into your GitLab account. Follow the steps below.

If you do not have an existing SSH key pair, generate a new one:
1. Open a terminal and run:
   ```bash
   ssh-keygen -t ed25519 -C "<comment>"
   ```
2. Press `Enter`. Output similar to the following is displayed:
   ```bash
   Generating public/private ed25519 key pair.
   Enter file in which to save the key (/home/user/.ssh/id_ed25519):
   ```
3. Accept the suggested filename and directory.
4. Specify a passphrase (or leave it empty):
   ```bash
   Enter passphrase (empty for no passphrase):
   Enter same passphrase again:
   ```

Next, add the SSH key to your GitLab account:
1. Copy the contents of your public key file:
   ```bash
   cat ~/.ssh/id_ed25519.pub
   ```
2. Sign in to GitLab.
3. On the left sidebar, select your avatar.
4. Select Edit profile.
5. On the left sidebar, select SSH Keys.
6. Select Add new key.
7. In the Key box, paste the contents of your public key.
8. In the Title box, type a description, like `GitLab workshop`.
9. Select Add key.

## Setting up working groups

To simulate a team, we invite you to form groups of 4 people, trying to mix people from different laboratories. One member of the group acts as "Project Leader" and creates a fork in their personal namespace:
1. On the project's homepage (https://gitlab.lisn.upsaclay.fr/asard/scoring), in the upper-right corner, select `Fork`.
2. For Project URL, select the personal namespace your fork should belong to.
3. Select Fork project.

GitLab creates your fork and redirects you to the new fork's page.

## Setting up collaboration workflows

The `main` branch stores the official release history, and the `develop` branch serves as an integration branch for features. The project leader creates a `develop` branch locally and pushes it to the server:

```bash
git clone git@gitlab.com:[namespace]/scoring.git
cd scoring
git branch develop
git push -u origin develop
```

This branch will contain the complete history of the project, whereas `main` will contain an abridged version. Other developers should now clone the central repository and create a tracking branch for `develop`.

Now, add your collaborators to the project:
1. Select Manage > Members.
2. Select Invite members.
3. Add the three other people in your group to the project with the following roles:
   * A developer: `Developer` role
   * A tester: `Developer` role
   * A code reviewer: `Maintainer` role

## Cloning the repository for collaborators

1. Clone the repository:
```bash
git clone git@gitlab.com:[namespace]/scoring.git
cd scoring
```

The developer should create feature branches from the latest `develop` branch.

## Resources and useful links
* SSH keys: https://docs.gitlab.com/ee/user/ssh.html
* Gitflow workflow: https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow

diff --git a/a-MSE.md b/a-MSE.md
new file mode 100644
index 0000000000000000000000000000000000000000..b00aaec898ca804dcc2a8077294b26c153916b67
--- /dev/null
+++ b/a-MSE.md
@@ -0,0 +1,274 @@
# Mean Squared Error (MSE)

## Understand the code

The codebase uses a script `score.py` to define metric functions.
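
For reference, `predict.txt` and `solution.txt` each hold one sample per line with three space-separated 0/1 labels; this is the format that the input-validation feature introduced later in the workshop enforces, so the exact files in your fork may differ slightly. A small sample could look like:

```
1 0 1
0 1 0
1 1 1
```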

The accuracy metric has already been implemented for you. Run the script to see the result:

```bash
./score.py predict.txt solution.txt
```

Install the necessary packages:
```bash
pip install numpy
```

## Implement MSE metric

Your task as a developer is now to implement the MSE metric. Recall that the formula for MSE is:

$$ MSE = \frac{1}{N} \sum_{i=1}^{N} (predicted_i - actual_i)^2 $$

<details>
<summary>Solution</summary>
<p>

- Create a feature branch `a_MSE`:

```bash
git checkout develop
git pull origin develop
git checkout -b a_MSE
```

</p>

<p>

- Create an `mse_metric` function and compute its score:

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
from sys import argv
import numpy as np


# ========= Useful functions ==============
def read_array(filename):
    ''' Read array and convert to 2d np arrays '''
    array = np.genfromtxt(filename, dtype=float)
    if len(array.shape) == 1:
        array = array.reshape(-1, 1)
    return array


def accuracy_metric(solution, prediction):
    correct_samples = np.all(solution == prediction, axis=1)
    return correct_samples.mean()


def mse_metric(solution, prediction):
    '''Mean-square error.
    Works even if the target matrix has more than one column'''
    mse = np.sum((solution - prediction)**2, axis=1)
    return np.mean(mse)


def _HERE(*args):
    h = os.path.dirname(os.path.realpath(__file__))
    return os.path.join(h, *args)


# =============================== MAIN ========================================
if __name__ == "__main__":

    #### INPUT/OUTPUT: Get input and output directory names
    prediction_file = argv[1]
    solution_file = argv[2]
    score_file = open(_HERE('scores.txt'), 'w')

    # Extract the dataset name from the file name
    prediction_name = os.path.basename(prediction_file)

    # Read the solution and prediction values into numpy arrays
    solution = read_array(solution_file)
    prediction = read_array(prediction_file)

    # Compute the score prescribed by the metric file
    accuracy_score = accuracy_metric(solution, prediction)
    mse_score = mse_metric(solution, prediction)
    print(
        "======= (" + prediction_name + "): score(accuracy_metric)=%0.2f =======" % accuracy_score)
    print(
        "======= (" + prediction_name + "): score(mse_metric)=%0.2f =======" % mse_score)
    # Write score corresponding to selected task and metric to the output file
    score_file.write("accuracy_metric: %0.2f\n" % accuracy_score)
    score_file.write("mse_metric: %0.2f\n" % mse_score)
    score_file.close()
```

</p>

<p>

- Commit and push the changes:

```bash
git add score.py
git commit -m "implement MSE metric"
git push origin a_MSE
```

</p>
</details>

## Implement unit tests for MSE

The tester has to implement unit tests for the `accuracy_metric` and `mse_metric` functions in `score.py`. The process involves creating a test class, setting up test data, and then writing test cases for each function. Make sure the tests cover a variety of scenarios, such as correct and incorrect predictions, and edge cases like empty arrays.
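
Before opening the full solution below, here is a minimal skeleton of what such a test module could look like; class and test names are only suggestions, and it checks a single case:

```python
# Sketch of a possible tests/test_metrics.py starting point (names are illustrative)
import os
import sys
import unittest

import numpy as np

# Make score.py importable when the tests live in a tests/ subdirectory
sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
from score import mse_metric


class TestMetrics(unittest.TestCase):
    def test_perfect_prediction_gives_zero_mse(self):
        # A prediction identical to the solution should have zero error
        solution = np.array([[1, 0, 1]])
        self.assertEqual(mse_metric(solution, solution), 0.0)


if __name__ == "__main__":
    unittest.main()
```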

<details>
<summary>Solution</summary>
<p>

- Install pytest:

```bash
pip install pytest
```
</p>

<p>

- Create a file `tests/test_metrics.py`:

```python
import unittest
import numpy as np
import os, sys

sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
from score import accuracy_metric, mse_metric  # Import your metric functions

class TestMetrics(unittest.TestCase):

    def setUp(self):
        # Set up test data
        self.solution = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 1]])
        self.correct_prediction = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 1]])
        self.incorrect_prediction = np.array([[0, 1, 0], [1, 0, 1], [0, 0, 1]])
        self.empty_array = np.array([])

    def test_accuracy_metric(self):
        # Test accuracy with correct predictions
        self.assertEqual(accuracy_metric(self.solution, self.correct_prediction), 1.0)

        # Test accuracy with incorrect predictions
        self.assertEqual(accuracy_metric(self.solution, self.incorrect_prediction), 0.0)

        # Test accuracy with empty arrays
        self.assertEqual(accuracy_metric(self.empty_array, self.empty_array), 0.0)

    def test_mse_metric(self):
        # Test MSE with correct predictions
        self.assertEqual(mse_metric(self.solution, self.correct_prediction), 0.0)

        # Test MSE with incorrect predictions
        self.assertEqual(mse_metric(self.solution, self.incorrect_prediction), 8/3)

        # Test MSE with empty arrays
        self.assertEqual(mse_metric(self.empty_array, self.empty_array), 0.0)

if __name__ == '__main__':
    unittest.main()
```

</p>

<p>

- Run the tests:

```bash
pytest
```

</p>

<p>

- Commit and push the changes:

```bash
git add tests/test_metrics.py
git commit -m "implement unit tests for metrics"
git push origin a_MSE
```

</p>

</details>

## Fix MSE metric

The tests fail: the tester tells the developer that the two metric functions can't handle empty arrays. The developer now needs to fix this by adding checks that return a default value or raise an error when the input arrays are empty. The appropriate action depends on the expected behavior of your application.

<details>
<summary>Solution</summary>
<p>

```python
def accuracy_metric(solution, prediction):
    if len(solution) == 0 or len(prediction) == 0:
        return 0
    correct_samples = np.all(solution == prediction, axis=1)
    return np.mean(correct_samples)


def mse_metric(solution, prediction):
    '''Mean-square error.
    Works even if the target matrix has more than one column'''
    if len(solution) == 0 or len(prediction) == 0:
        return 0
    mse = np.sum((solution - prediction)**2, axis=1)
    return np.mean(mse)
```

</p>

<p>

- Run the tests:

```bash
pytest
```

The tests should now pass.
</p>

<p>

- Commit and push the changes:

```bash
git add score.py
git commit -m "fix metric functions to handle empty arrays"
git push origin a_MSE
```

</p>
</details>

## Create a merge request

Since the tests pass, the developer creates a merge request from `a_MSE` to `develop`:
1. On the GitLab project page, select Code > Merge requests.
2. In the upper-right corner, select New merge request.
3. Select a source and a target branch, then select Compare branches and continue.
4. Complete the fields on the New merge request page:
   * Description: implement the MSE metric and unit tests for the two functions.
   * Assignee: `Project leader`
   * Reviewer: `Code reviewer`
5. Select Create merge request.
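
Before or after opening the merge request, anyone on the team can double-check the branch locally. One possible sequence, assuming the fork is configured as the `origin` remote, is:

```bash
# Fetch the feature branch from the fork, run the test suite and the script locally
git fetch origin a_MSE
git checkout a_MSE
pytest
./score.py predict.txt solution.txt
```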

## Approve a merge request

The code reviewer runs all the tests again and makes sure that the `MSE` and `unittest` features have been implemented correctly. If something is wrong, the reviewer reports these findings to the developer and tester for rectification. If everything is OK, the reviewer clicks `Approve`.

## Merge `a_MSE` into `develop`

It is the responsibility of the project leader to finalize the integration of changes into `develop`. This involves merging the requests that were created by the developer and approved by the reviewer, simply by clicking the `Merge` button.

diff --git a/b-INPUT-VALIDATION.md b/b-INPUT-VALIDATION.md
new file mode 100644
index 0000000000000000000000000000000000000000..32147e0b08cf037a168002126449b3da1ee83bec
--- /dev/null
+++ b/b-INPUT-VALIDATION.md
@@ -0,0 +1,184 @@
# Input validation

For the forthcoming **input validation** feature, you need to ensure that the input files (`predict.txt` and `solution.txt`) are correctly formatted and that possible errors are handled gracefully. Within your team, you can switch roles to diversify the experience. The steps are essentially the same as for the `MSE` feature.

## Implement input validation

<details>
<summary>Solution</summary>
<p>

- Create a feature branch `b_input_validation`:

```bash
git checkout develop
git pull origin develop
git checkout -b b_input_validation
```

</p>

<p>

- `score.py`:

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import sys
from sys import argv
import numpy as np


# ========= Useful functions ==============
def read_array(filename):
    ''' Read array and convert to 2d np arrays '''
    if not os.path.exists(filename):
        raise FileNotFoundError(f"The file {filename} does not exist.")

    formatted_array = []

    with open(filename, 'r') as file:
        for line_num, line in enumerate(file, start=1):
            # Split the line into elements and strip whitespace
            elements = line.strip().split()

            # Check if there are exactly three elements
            if len(elements) != 3:
                raise ValueError(f"Error in {filename}, line {line_num}: Expected 3 elements, found {len(elements)}")

            # Check if all elements are either '0' or '1'
            if not all(elem in ['0', '1'] for elem in elements):
                raise ValueError(f"Error in {filename}, line {line_num}: Elements must be '0' or '1'")

            # Convert elements to integers and add to the array
            formatted_array.append([int(elem) for elem in elements])

    # Convert the list to a numpy array
    return np.array(formatted_array)


def accuracy_metric(solution, prediction):
    if len(solution) == 0 or len(prediction) == 0:
        return 0
    correct_samples = np.all(solution == prediction, axis=1)
    return np.mean(correct_samples)


def mse_metric(solution, prediction):
    '''Mean-square error.
    Works even if the target matrix has more than one column'''
    if len(solution) == 0 or len(prediction) == 0:
        return 0
    mse = np.sum((solution - prediction)**2, axis=1)
    return np.mean(mse)


def _HERE(*args):
    h = os.path.dirname(os.path.realpath(__file__))
    return os.path.join(h, *args)


# =============================== MAIN ========================================
if __name__ == "__main__":

    #### INPUT/OUTPUT: Get input and output directory names
    try:
        prediction_file = argv[1]
        solution_file = argv[2]
        # Read the solution and prediction values into numpy arrays
        solution = read_array(solution_file)
        prediction = read_array(prediction_file)
    except IndexError:
        print("Usage: script.py predict.txt solution.txt")
        sys.exit(1)
    except (FileNotFoundError, IOError) as e:
        print(e)
        sys.exit(1)

    score_file = open(_HERE('scores.txt'), 'w')
    # Extract the dataset name from the file name
    prediction_name = os.path.basename(prediction_file)

    # Check if the shapes of the arrays are compatible
    if prediction.shape != solution.shape:
        print("Error: Prediction and solution arrays have different shapes.")
        sys.exit(1)

    # Compute the score prescribed by the metric file
    accuracy_score = accuracy_metric(solution, prediction)
    mse_score = mse_metric(solution, prediction)
    print(
        "======= (" + prediction_name + "): score(accuracy_metric)=%0.2f =======" % accuracy_score)
    print(
        "======= (" + prediction_name + "): score(mse_metric)=%0.2f =======" % mse_score)
    # Write score corresponding to selected task and metric to the output file
    score_file.write("accuracy_metric: %0.2f\n" % accuracy_score)
    score_file.write("mse_metric: %0.2f\n" % mse_score)
    score_file.close()
```

</p>
</details>

## Implement unit tests for input validation

<details>
<summary>Solution</summary>
<p>

Create a file `tests/test_fileformat.py`:

```python
import unittest
import numpy as np
import os, sys

sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
from score import read_array

class TestFileFormat(unittest.TestCase):

    def setUp(self):
        """Create sample files for testing."""
        self.correct_format_file = 'correct_format.txt'
        self.wrong_format_file = 'wrong_format.txt'

        with open(self.correct_format_file, 'w') as f:
            f.write('0 1 0\n1 0 1\n0 0 0')

        with open(self.wrong_format_file, 'w') as f:
            f.write('0 1\n1 1 0 1\na b c')

    def tearDown(self):
        """Remove the created test files."""
        os.remove(self.correct_format_file)
        os.remove(self.wrong_format_file)

    def test_correct_format(self):
        """Test that correctly formatted files are read without errors."""
        result = read_array(self.correct_format_file)
        expected = np.array([[0, 1, 0], [1, 0, 1], [0, 0, 0]])
        np.testing.assert_array_equal(result, expected)

    def test_wrong_format(self):
        """Test that incorrectly formatted files raise an error."""
        with self.assertRaises(ValueError):
            read_array(self.wrong_format_file)


if __name__ == '__main__':
    unittest.main()
```

</p>
</details>

## Create a merge request

## Approve a merge request

## Merge `b_input_validation` into `develop`

diff --git a/c-LOGGING.md b/c-LOGGING.md
new file mode 100644
index 0000000000000000000000000000000000000000..f2db9dd956c8ae0da8bcbc903527a7b3ac968900
--- /dev/null
+++ b/c-LOGGING.md
@@ -0,0 +1,163 @@
# Logging

Implement logging to capture the process and any potential errors. This will help in debugging and maintenance.
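
The standard library's `logging` module is sufficient here. A minimal sketch of the pattern used in the solution below (the file name, format, and level are just one possible choice):

```python
import logging

# Write all messages of level DEBUG and above to a log file, overwriting it on each run
logging.basicConfig(filename='score.log', filemode='w',
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                    level=logging.DEBUG)

logging.debug("Score execution started.")
try:
    raise ValueError("example error")
except ValueError as e:
    logging.error(e)  # The exception message ends up in score.log
```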

## Implement logging

<details>
<summary>Solution</summary>
<p>

- Create a feature branch `c_logging`:

```bash
git checkout develop
git pull origin develop
git checkout -b c_logging
```

</p>

<p>

- `score.py`:

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import sys
from sys import argv
import numpy as np
import logging

# Configure logging
logging.basicConfig(filename='score.log', filemode='w',
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                    level=logging.DEBUG)


# ========= Useful functions ==============
def read_array(filename):
    ''' Read array and convert to 2d np arrays '''
    logging.debug(f"Attempting to read file: {filename}")

    if not os.path.exists(filename):
        logging.error(f"The file {filename} does not exist.")
        raise FileNotFoundError(f"The file {filename} does not exist.")

    formatted_array = []

    with open(filename, 'r') as file:
        for line_num, line in enumerate(file, start=1):
            # Split the line into elements and strip whitespace
            elements = line.strip().split()

            # Check if there are exactly three elements
            if len(elements) != 3:
                logging.error(f"Error in {filename}, line {line_num}: Expected 3 elements, found {len(elements)}")
                raise ValueError(f"Error in {filename}, line {line_num}: Expected 3 elements, found {len(elements)}")

            # Check if all elements are either '0' or '1'
            if not all(elem in ['0', '1'] for elem in elements):
                logging.error(f"Error in {filename}, line {line_num}: Elements must be '0' or '1'")
                raise ValueError(f"Error in {filename}, line {line_num}: Elements must be '0' or '1'")

            # Convert elements to integers and add to the array
            formatted_array.append([int(elem) for elem in elements])

    logging.info(f"File {filename} read successfully.")
    # Convert the list to a numpy array
    return np.array(formatted_array)


def accuracy_metric(solution, prediction):
    logging.debug("Calculating accuracy metric")
    if len(solution) == 0 or len(prediction) == 0:
        logging.warning("Received empty array(s) in accuracy_metric")
        return 0
    correct_samples = np.all(solution == prediction, axis=1)
    accuracy = np.mean(correct_samples)
    logging.info(f"Accuracy metric calculated successfully: {accuracy}")
    return accuracy


def mse_metric(solution, prediction):
    '''Mean-square error.
    Works even if the target matrix has more than one column'''
    logging.debug("Calculating MSE metric")
    if len(solution) == 0 or len(prediction) == 0:
        logging.warning("Received empty array(s) in mse_metric")
        return 0
    mse = np.sum((solution - prediction)**2, axis=1)
    mse = np.mean(mse)
    logging.info(f"MSE metric calculated successfully: {mse}")
    return mse


def _HERE(*args):
    h = os.path.dirname(os.path.realpath(__file__))
    return os.path.join(h, *args)


# =============================== MAIN ========================================
if __name__ == "__main__":

    #### INPUT/OUTPUT: Get input and output directory names
    try:
        logging.debug("Score execution started.")
        prediction_file = argv[1]
        solution_file = argv[2]
        # Read the solution and prediction values into numpy arrays
        solution = read_array(solution_file)
        prediction = read_array(prediction_file)
    except IndexError:
        logging.error("Incorrect usage: script requires two arguments for prediction and solution files.")
        print("Usage: script.py predict.txt solution.txt")
        sys.exit(1)
    except (FileNotFoundError, IOError) as e:
        logging.error(e)
        sys.exit(1)

    score_file = open(_HERE('scores.txt'), 'w')
    # Extract the dataset name from the file name
    prediction_name = os.path.basename(prediction_file)

    # Check if the shapes of the arrays are compatible
    if prediction.shape != solution.shape:
        logging.error("Error: Prediction and solution arrays have different shapes.")
        sys.exit(1)

    # Compute the score prescribed by the metric file
    accuracy_score = accuracy_metric(solution, prediction)
    mse_score = mse_metric(solution, prediction)
    print(
        "======= (" + prediction_name + "): score(accuracy_metric)=%0.2f =======" % accuracy_score)
    print(
        "======= (" + prediction_name + "): score(mse_metric)=%0.2f =======" % mse_score)
    # Write score corresponding to selected task and metric to the output file
    score_file.write("accuracy_metric: %0.2f\n" % accuracy_score)
    score_file.write("mse_metric: %0.2f\n" % mse_score)
    score_file.close()
    logging.info("Score completed successfully")
```

</p>

<p>

- See the logs:

```bash
./score.py predict.txt solution.txt
cat score.log
```

</p>
</details>

## Create a merge request

## Approve a merge request

## Merge `c_logging` into `develop`

diff --git a/i-RELEASE.md b/i-RELEASE.md
new file mode 100644
index 0000000000000000000000000000000000000000..deaa566cde02c91fcade4be01c834b842dab23dc
--- /dev/null
+++ b/i-RELEASE.md
@@ -0,0 +1,29 @@
# Release

You have successfully implemented three features: Mean Squared Error (MSE), input validation, and logging. This accomplishment fulfills the requirements for proceeding with a release. As the project leader, you:
1. Merge `develop` into `main`:
```bash
git pull origin develop
git checkout main
git merge develop
git push origin main
```

2. Create a tag:
```bash
git tag -a v2.0 -m "Version 2.0"
```

3. Push your tags:
```bash
git push origin --tags
```

4. Create a release:
   * On the GitLab project page, select Deploy > Releases, then select New release.
   * Select the existing Git tag `v2.0`.
   * Enter additional information about the release, including:
     * Release title: leave blank
     * Milestones: leave blank
     * Release notes: "Add three new features: Mean Squared Error (MSE), input validation and logging"
   * Select Create release.
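
To double-check the release from the command line, you can, for example, list the tags and confirm that `main` now points at the release commit:

```bash
# Show existing tags, the latest commits on main, and the tag reachable from HEAD
git tag -l
git log --oneline --decorate -3 main
git describe --tags
```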