Implement `logging` to capture the program's execution flow and any potential errors; this will help with debugging and maintenance. We will then construct a scenario in which a merge conflict arises and resolve it: the developer and the tester concurrently work on implementing the `logging` feature on the same branch.

## Implement logging

<details>
<summary>Scenario</summary>

<p>

- The developer and the tester each create the feature branch `c_logging` from `develop`:

```bash
git checkout develop
git pull origin develop
git checkout -b c_logging
```

</p>

<p>

- The developer pushes a new `score.py` with logging:

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import sys
from sys import argv
import numpy as np
import logging

# Configure logging
logging.basicConfig(filename='score.log', filemode='w', 
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', 
                    level=logging.DEBUG)
    

# ========= Useful functions ==============
def read_array(filename):
    ''' Read array and convert to 2d np arrays '''
    logging.debug(f"Attempting to read file: {filename}")

    if not os.path.exists(filename):
        logging.error(f"The file {filename} does not exist.")
        raise FileNotFoundError(f"The file {filename} does not exist.")
    
    formatted_array = []

    with open(filename, 'r') as file:
        for line_num, line in enumerate(file, start=1):
            # Split the line into elements and strip whitespace
            elements = line.strip().split()

            # Check if there are exactly three elements
            if len(elements) != 3:
                logging.error(f"Error in {filename}, line {line_num}: Expected 3 elements, found {len(elements)}")
                raise ValueError(f"Error in {filename}, line {line_num}: Expected 3 elements, found {len(elements)}")

            # Check if all elements are either '0' or '1'
            if not all(elem in ['0', '1'] for elem in elements):
                logging.error(f"Error in {filename}, line {line_num}: Elements must be '0' or '1'")
                raise ValueError(f"Error in {filename}, line {line_num}: Elements must be '0' or '1'")

            # Convert elements to integers and add to the array
            formatted_array.append([int(elem) for elem in elements])

    logging.info(f"File {filename} read successfully.")
    # Convert the list to a numpy array
    return np.array(formatted_array)


def accuracy_metric(solution, prediction):
    logging.debug("Calculating accuracy metric")
    if len(solution) == 0 or len(prediction) == 0:
        logging.warning("Received empty array(s) in accuracy_metric")
        return 0
    correct_samples = np.all(solution == prediction, axis=1)
    accuracy = np.mean(correct_samples)
    logging.info(f"Accuracy metric calculated successfully: {accuracy}")
    return accuracy


def mse_metric(solution, prediction):
    '''Mean-square error.
    Works even if the target matrix has more than one column'''
    logging.debug("Calculating MSE metric")
    if len(solution) == 0 or len(prediction) == 0:
        logging.warning("Received empty array(s) in mse_metric")
        return 0
    mse = np.sum((solution - prediction)**2, axis=1)
    mse = np.mean(mse)
    logging.info(f"MSE metric calculated successfully: {mse}")
    return mse


def _HERE(*args):
    h = os.path.dirname(os.path.realpath(__file__))
    return os.path.join(h, *args)
    

# =============================== MAIN ========================================
if __name__ == "__main__":

    #### INPUT/OUTPUT: Get input and output directory names
    try:
        logging.debug("Score execution started.")
        prediction_file = argv[1]
        solution_file = argv[2]
        # Read the solution and prediction values into numpy arrays
        solution = read_array(solution_file)
        prediction = read_array(prediction_file)
    except IndexError:
        logging.error("Incorrect usage: script requires two arguments for prediction and solution files.")
        print("Usage: script.py predict.txt solution.txt")
        sys.exit(1)
    except (FileNotFoundError, IOError) as e:
        logging.error(e)
        sys.exit(1)
    
    score_file = open(_HERE('scores.txt'), 'w')
    # Extract the dataset name from the file name
    prediction_name = os.path.basename(prediction_file)
    
    # Check if the shapes of the arrays are compatible
    if prediction.shape != solution.shape:
        logging.error("Error: Prediction and solution arrays have different shapes.")
        sys.exit(1)
    
    # Compute the score prescribed by the metric file 
    accuracy_score = accuracy_metric(solution, prediction)
    mse_score = mse_metric(solution, prediction)
    print(
        "======= (" + prediction_name + "): score(accuracy_metric)=%0.2f =======" % accuracy_score)
    print(
        "======= (" + prediction_name + "): score(mse_metric)=%0.2f =======" % mse_score)
    # Write score corresponding to selected task and metric to the output file
    score_file.write("accuracy_metric: %0.2f\n" % accuracy_score)
    score_file.write("mse_metric: %0.2f\n" % mse_score)
    score_file.close()
    logging.info("Score completed successfully")
```

</p>

<p>

- The tester pushes a new `score.py` with logging. Note that the tester logs the file-read attempt at `INFO` level, whereas the developer used `DEBUG`; this one-line difference is what will conflict:

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import sys
from sys import argv
import numpy as np
import logging

# Configure logging
logging.basicConfig(filename='score.log', filemode='w', 
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', 
                    level=logging.DEBUG)
    

# ========= Useful functions ==============
def read_array(filename):
    ''' Read array and convert to 2d np arrays '''
    logging.info(f"Attempting to read file: {filename}")

    if not os.path.exists(filename):
        logging.error(f"The file {filename} does not exist.")
        raise FileNotFoundError(f"The file {filename} does not exist.")
    
    formatted_array = []

    with open(filename, 'r') as file:
        for line_num, line in enumerate(file, start=1):
            # Split the line into elements and strip whitespace
            elements = line.strip().split()

            # Check if there are exactly three elements
            if len(elements) != 3:
                logging.error(f"Error in {filename}, line {line_num}: Expected 3 elements, found {len(elements)}")
                raise ValueError(f"Error in {filename}, line {line_num}: Expected 3 elements, found {len(elements)}")

            # Check if all elements are either '0' or '1'
            if not all(elem in ['0', '1'] for elem in elements):
                logging.error(f"Error in {filename}, line {line_num}: Elements must be '0' or '1'")
                raise ValueError(f"Error in {filename}, line {line_num}: Elements must be '0' or '1'")

            # Convert elements to integers and add to the array
            formatted_array.append([int(elem) for elem in elements])

    logging.info(f"File {filename} read successfully.")
    # Convert the list to a numpy array
    return np.array(formatted_array)


def accuracy_metric(solution, prediction):
    logging.debug("Calculating accuracy metric")
    if len(solution) == 0 or len(prediction) == 0:
        logging.warning("Received empty array(s) in accuracy_metric")
        return 0
    correct_samples = np.all(solution == prediction, axis=1)
    accuracy = np.mean(correct_samples)
    logging.info(f"Accuracy metric calculated successfully: {accuracy}")
    return accuracy


def mse_metric(solution, prediction):
    '''Mean-square error.
    Works even if the target matrix has more than one column'''
    logging.debug("Calculating MSE metric")
    if len(solution) == 0 or len(prediction) == 0:
        logging.warning("Received empty array(s) in mse_metric")
        return 0
    mse = np.sum((solution - prediction)**2, axis=1)
    mse = np.mean(mse)
    logging.info(f"MSE metric calculated successfully: {mse}")
    return mse


def _HERE(*args):
    h = os.path.dirname(os.path.realpath(__file__))
    return os.path.join(h, *args)
    

# =============================== MAIN ========================================
if __name__ == "__main__":

    #### INPUT/OUTPUT: Get input and output directory names
    try:
        logging.debug("Score execution started.")
        prediction_file = argv[1]
        solution_file = argv[2]
        # Read the solution and prediction values into numpy arrays
        solution = read_array(solution_file)
        prediction = read_array(prediction_file)
    except IndexError:
        logging.error("Incorrect usage: script requires two arguments for prediction and solution files.")
        print("Usage: script.py predict.txt solution.txt")
        sys.exit(1)
    except (FileNotFoundError, IOError) as e:
        logging.error(e)
        sys.exit(1)
    
    score_file = open(_HERE('scores.txt'), 'w')
    # Extract the dataset name from the file name
    prediction_name = os.path.basename(prediction_file)
    
    # Check if the shapes of the arrays are compatible
    if prediction.shape != solution.shape:
        logging.error("Error: Prediction and solution arrays have different shapes.")
        sys.exit(1)
    
    # Compute the score prescribed by the metric file 
    accuracy_score = accuracy_metric(solution, prediction)
    mse_score = mse_metric(solution, prediction)
    print(
        "======= (" + prediction_name + "): score(accuracy_metric)=%0.2f =======" % accuracy_score)
    print(
        "======= (" + prediction_name + "): score(mse_metric)=%0.2f =======" % mse_score)
    # Write score corresponding to selected task and metric to the output file
    score_file.write("accuracy_metric: %0.2f\n" % accuracy_score)
    score_file.write("mse_metric: %0.2f\n" % mse_score)
    score_file.close()
    logging.info("Score completed successfully")
```

</p>

<p>

- Whoever pushes second will have their `git push` rejected with an error:

```bash
To https://gitlab.lisn.upsaclay.fr/tuanbui/scoring.git
 ! [rejected]        c_logging -> c_logging (fetch first)
error: failed to push some refs to 'https://gitlab.lisn.upsaclay.fr/tuanbui/scoring.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
```

This means someone else has pushed changes to the remote repository that you don't have in your local branch. This situation often occurs in collaborative environments where multiple people push to the same branch. Your push was rejected to prevent overwriting those changes. Before you can push, you need to fetch the latest changes from the remote repository and merge them into your local branch.
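
Note that `git pull` combines two operations. As a minimal sketch, the equivalent two-step form lets you inspect the incoming work before merging:

```bash
# Download the remote state without touching your working tree
git fetch origin

# List the commits the remote branch has that you don't
git log HEAD..origin/c_logging --oneline

# Merge the remote work into your local branch
# (equivalent to the single `git pull origin c_logging` used below)
git merge origin/c_logging
```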

</p>

<p>

- Merge the remote changes into your local branch with `git pull origin c_logging`:

```bash
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), 2.08 KiB | 2.08 MiB/s, done.
From https://gitlab.lisn.upsaclay.fr/tuanbui/scoring
 * branch            c_logging  -> FETCH_HEAD
 * [new branch]      c_logging  -> origin/c_logging
Auto-merging score.py
CONFLICT (content): Merge conflict in score.py
Automatic merge failed; fix conflicts and then commit the result.
```

Git has inserted conflict markers into `score.py`; the conflicts now need to be resolved manually.

</p>

<p>

- Generate a list of the files affected by the merge conflict with `git status`:

```bash
On branch c_logging
You have unmerged paths.
  (fix conflicts and run "git commit")
  (use "git merge --abort" to abort the merge)

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   score.py

no changes added to commit (use "git add" and/or "git commit -a")
```

Open your favorite text editor, such as [Visual Studio Code](https://code.visualstudio.com/), and navigate to the file that has merge conflicts. To find the beginning of a merge conflict, search the file for the conflict marker `<<<<<<<`. The changes from the HEAD or base branch appear after the line `<<<<<<< HEAD`. Next comes `=======`, which divides your changes from the changes in the other branch, followed by `>>>>>>> BRANCH-NAME/COMMIT`. In this example, one person wrote `logging.info(f"Attempting to read file: {filename}")` in the base or HEAD branch, and another person wrote `logging.debug(f"Attempting to read file: {filename}")` in the compare branch or commit.

```python
# ========= Useful functions ==============
def read_array(filename):
    ''' Read array and convert to 2d np arrays '''
<<<<<<< HEAD
    logging.info(f"Attempting to read file: {filename}")
=======
    logging.debug(f"Attempting to read file: {filename}")
>>>>>>> 7890dd5cff2eef84a2c70174e5a8846beaa8bf78

    if not os.path.exists(filename):
```

Decide whether you want to keep only your changes, keep only the other changes, or make a brand new change that incorporates both. Delete the conflict markers `<<<<<<<`, `=======`, and `>>>>>>>`, and make the changes you want in the final merge. In this example, keep only `logging.debug(f"Attempting to read file: {filename}")`, since a file-read attempt is diagnostic detail rather than information worth reporting by default. Then stage the file, commit the resolution, and push the final version, as shown below.
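
A minimal sketch of those commands (the commit message is only an example):

```bash
# Make sure no conflict markers remain at the start of any line
grep -nE '^(<<<<<<<|=======|>>>>>>>)' score.py || echo "no markers left"

# Stage the resolved file and conclude the merge
git add score.py
git commit -m "Resolve merge conflict by keeping the debug message"

# Push the merged result to the shared feature branch
git push origin c_logging
```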
</p>

<p>

- Run the script and inspect the logs:

```bash
./score.py predict.txt solution.txt
cat score.log
```
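
With the resolved `DEBUG` call in place, `score.log` should look roughly like this (an illustrative excerpt; the timestamps and scores are made up):

```bash
2024-01-15 10:30:00,123 - root - DEBUG - Score execution started.
2024-01-15 10:30:00,124 - root - DEBUG - Attempting to read file: solution.txt
2024-01-15 10:30:00,125 - root - INFO - File solution.txt read successfully.
2024-01-15 10:30:00,126 - root - DEBUG - Attempting to read file: predict.txt
2024-01-15 10:30:00,127 - root - INFO - File predict.txt read successfully.
2024-01-15 10:30:00,128 - root - DEBUG - Calculating accuracy metric
2024-01-15 10:30:00,129 - root - INFO - Accuracy metric calculated successfully: 0.75
2024-01-15 10:30:00,130 - root - DEBUG - Calculating MSE metric
2024-01-15 10:30:00,131 - root - INFO - MSE metric calculated successfully: 0.25
2024-01-15 10:30:00,132 - root - INFO - Score completed successfully
```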

</p>
</details>

## Create a merge request

## Approve a merge request

## Merge `c_logging` into `develop`