• Datablock syntax:

    • Recall that DataBlock is only a blueprint; it does not load any data until dataloaders is called
    • Use item_tfms to apply per-item transforms such as Resize, so every image ends up the same size
    • When creating the DataLoaders, pass the path that contains the specific training/validation portions (see the sketch below)
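    A minimal sketch of the syntax; the path, splitter, and label function here are assumptions for illustration:

    ```python
    from fastai.vision.all import *

    path = Path('data')  # assumed layout: data/train/<class>/..., data/valid/<class>/...

    # A blueprint only: nothing is loaded until .dataloaders() is called
    dblock = DataBlock(
        blocks=(ImageBlock, CategoryBlock),   # inputs are images, targets are categories
        get_items=get_image_files,            # collect image file paths
        splitter=GrandparentSplitter(train_name='train', valid_name='valid'),
        get_y=parent_label,                   # label = name of the parent folder
        item_tfms=Resize(224),                # per-item transform: resize every image
    )

    dls = dblock.dataloaders(path)            # pass the path holding the train/valid portions
    ```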
  • Show sample batches:

    • Call show_batch on the train or valid DataLoader: dls.train.show_batch(max_n=..., nrows=...), and likewise dls.valid.show_batch
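    Continuing the sketch above:

    ```python
    # Show 9 training samples in a 3x3 grid
    dls.train.show_batch(max_n=9, nrows=3)

    # Show 4 validation samples in a single row
    dls.valid.show_batch(max_n=4, nrows=1)
    ```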
  • To define custom loss functions (with a hand-written gradient), subclass torch.autograd.Function

    • ctx → a context object where you can store information needed for the backward computation
      • We store tensors via ctx.save_for_backward and access them later via ctx.saved_tensors
    • Then, implement a forward and a backward method → both of these are static methods in Python
      • Use the @staticmethod decorator to declare that both methods are static
    • Call the function via its .apply method instead of instantiating it (see the sketch below)
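    A minimal sketch; MySquaredError is a made-up example loss, not anything from the course:

    ```python
    import torch

    class MySquaredError(torch.autograd.Function):
        @staticmethod
        def forward(ctx, pred, target):
            # Stash tensors needed by backward on the context object
            ctx.save_for_backward(pred, target)
            return ((pred - target) ** 2).mean()

        @staticmethod
        def backward(ctx, grad_output):
            # Retrieve what forward stored
            pred, target = ctx.saved_tensors
            grad_pred = grad_output * 2 * (pred - target) / pred.numel()
            # Return one gradient per forward input (None: no grad for target)
            return grad_pred, None

    pred = torch.randn(4, requires_grad=True)
    target = torch.rand(4)
    loss = MySquaredError.apply(pred, target)  # .apply, not MySquaredError()
    loss.backward()
    ```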
  • Loading a pretrained model from fastai:

    • Use the appropriate learner (cnn_learner, renamed vision_learner in newer fastai) and specify the pretrained architecture
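    For example, assuming the dls from earlier and resnet18 as the architecture:

    ```python
    from fastai.vision.all import *

    # vision_learner downloads pretrained weights for the given architecture by default
    learn = vision_learner(dls, resnet18, metrics=accuracy)
    learn.fine_tune(1)
    ```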
  • torch.max(t, dim=...)

    • dim=0 → reduces column-wise (selects the max from each COLUMN)
    • dim=1 → reduces row-wise (selects the max from each ROW); it also returns the argmax indices
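    Concretely:

    ```python
    import torch

    t = torch.tensor([[1, 5, 2],
                      [4, 0, 3]])

    vals, idxs = torch.max(t, dim=0)  # max of each column: tensor([4, 5, 3])
    vals, idxs = torch.max(t, dim=1)  # max of each row:    tensor([5, 4])
    ```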
  • PyTorch’s implementation of NLLLoss only accepts comparing a distribution to class-index targets

    • You can’t compute cross entropy in PyTorch between two distributions using just this
    • CROSS ENTROPY AND THE LIKE ONLY ACCEPT ONE-DIMENSIONAL TENSORS OF CLASS INDICES AS TARGETS
    • IF YOU WANT TO COMPARE TWO DISTRIBUTIONS WITH ONE ANOTHER, USE KULLBACK-LEIBLER (KL) DIVERGENCE LOSS instead of cross entropy to measure how far apart the two tensors are
    • KL divergence measures the difference between cross entropy and entropy: KL(p‖q) = H(p, q) - H(p)
    • Or, implement it from scratch
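    A minimal sketch of both cases (the tensor shapes here are just for illustration):

    ```python
    import torch
    import torch.nn.functional as F

    student_logits = torch.randn(8, 10)
    teacher_logits = torch.randn(8, 10)

    # KL divergence between two distributions:
    # F.kl_div expects log-probabilities as input and probabilities as target
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=1),
        F.softmax(teacher_logits, dim=1),
        reduction='batchmean',
    )

    # Cross entropy, by contrast, wants a 1-D tensor of class indices
    targets = torch.randint(0, 10, (8,))
    ce = F.cross_entropy(student_logits, targets)
    ```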
  • Make sure that we are using THE CORRECT VALUES FOR ALL CALCULATIONS!

    • Namely → understand the difference between predicted probabilities (values between 0 and 1) and labels (class indices, i.e. the index of the highest probability)
  • NOTE → FOR CWTM, WE ARE SUBTRACTING THE STUDENT’S PREDICTED PROBABILITIES FROM THE TRUE PROBABILITIES (THE ONE-HOT LABELS)

    • This is not the same as subtracting indexes. We don’t care about the predicted class but rather the probability
    • And, the probabilities for the true labels can only be zero or one
    • So, subtract the probabilities at the CORRECT index.
      • torch.index_select lets you select indices along a dimension
        • CRITICAL NOTE → IT APPLIES THE SAME INDEX LIST TO EVERY SLICE ALONG THAT DIMENSION, NOT ONE INDEX PER ROW!
        • So, use TORCH.GATHER for per-row selection instead (see the sketch after this list)
      • Use torch.rand to generate random tensors for testing, and torch.randint for a tensor of random integers
    • IN FASTAI, DO NOT PASS THE SAME MODEL INSTANCE TO BOTH LEARNERS, AS IT MEANS THAT THE UNDERLYING WEIGHTS WILL BE SHARED!
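    A minimal sketch of the index_select vs. gather distinction, using random tensors for testing:

    ```python
    import torch

    probs = torch.rand(4, 3)            # fake student probabilities
    labels = torch.randint(0, 3, (4,))  # fake class indices

    # index_select applies the SAME column indices to every row
    cols = torch.index_select(probs, dim=1, index=torch.tensor([0, 2]))  # shape (4, 2)

    # gather picks one entry per row: the probability at each row's correct class
    picked = probs.gather(1, labels.unsqueeze(1)).squeeze(1)             # shape (4,)

    # The true probability at the correct index is always 1, so the difference is:
    diff = 1.0 - picked
    ```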
  • The loss function IS JUST STANDARD CROSS ENTROPY → WE ARE NOT MEASURING A DIFFERENCE BETWEEN DISTRIBUTIONS

    • We instead just want cross entropy between the student’s predictions and the labels
    • But, the GRADIENT will carry the CWTM weighting (see the sketch below).
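    Putting the pieces together: a sketch of a custom autograd Function whose forward is plain cross entropy and whose backward re-weights each sample’s gradient. The exact weighting used here (teacher’s max probability, normalized over the batch) is an assumption for illustration, not a verified reference implementation:

    ```python
    import torch
    import torch.nn.functional as F

    class CWTMLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, student_logits, labels, teacher_probs):
            ctx.save_for_backward(student_logits, labels, teacher_probs)
            return F.cross_entropy(student_logits, labels)

        @staticmethod
        def backward(ctx, grad_output):
            student_logits, labels, teacher_probs = ctx.saved_tensors
            probs = F.softmax(student_logits, dim=1)
            onehot = F.one_hot(labels, probs.shape[1]).float()
            # Per-sample weight: teacher's max probability, normalized over the batch
            w = teacher_probs.max(dim=1).values
            w = w / w.sum()
            # Standard cross-entropy gradient (probs - onehot), scaled per sample
            grad = (probs - onehot) * w.unsqueeze(1)
            return grad_output * grad, None, None

    # Random tensors for testing
    logits = torch.randn(4, 3, requires_grad=True)
    labels = torch.randint(0, 3, (4,))
    teacher = F.softmax(torch.randn(4, 3), dim=1)
    loss = CWTMLoss.apply(logits, labels, teacher)
    loss.backward()
    ```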
  • Use torch.where for conditional elementwise operations
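    For example:

    ```python
    import torch

    x = torch.tensor([-2.0, 3.0, -1.0, 4.0])
    # Elementwise: keep x where the condition holds, otherwise use 0
    relu_like = torch.where(x > 0, x, torch.zeros_like(x))  # tensor([0., 3., 0., 4.])
    ```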