ocbyram committed (verified) · Commit 962257c · 1 Parent(s): 8de6b37

Update README.md

Files changed (1): README.md (+18 -10)
README.md CHANGED
@@ -66,26 +66,34 @@ method for shuffling. The proportions are as follows: Training: 3,200 examples (
 with Random seed: 42.


- ## Methodology

- Finetuning Tasks: As you did in the fourth project check in,
- describe which method you chose for training and why you chose
- that method. Make note of any hyperparameter values you used so
- that others can reproduce your results.

- ## Evaluation

- ## Usage and Intended Use

- ## Prompt Format

- ## Expected Output Format

 This section should
 briefly describe the expected output format for your model and include a
 general code chunk showing an example model response.

- ## Limitations

 This section should summarize the main
 limitations of your model. Limitations could be based on benchmark task
 
 with Random seed: 42.


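+ A small sketch of how a shuffled split with this seed could be reproduced with the `datasets` library (the dataset name and split recipe here are assumptions, not the card's exact code):
+ 
+ ```python
+ from datasets import load_dataset
+ 
+ # Dataset name is an assumption; the seed (42) and the 3,200-example
+ # training split come from the Data section above.
+ ds = load_dataset("openai/gsm8k", "main", split="train").shuffle(seed=42)
+ train = ds.select(range(3200))
+ rest = ds.select(range(3200, len(ds)))
+ ```
+ 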
+ # Methodology

+ The training method I implemented for this task was finetuning, specifically the parameter-efficient finetuning (PEFT) method LoRA. In class we learned about several model interventions, ranging from few-shot prompting to full finetuning.
+ For this project I chose PEFT. PEFT methods update a small part of the model to improve task performance while keeping catastrophic forgetting to a minimum,
+ since many of them freeze parameters/layers so that weights irrelevant to the task are not updated. PEFT is a strong alternative to full finetuning because it uses fewer resources
+ yet can still produce an effective trained model. Deciding on PEFT was only the first step: I also needed to choose which PEFT method to use, what values to set the hyperparameters to,
+ and how to select the best model. As we learned in class, two basic PEFT methods are prompt tuning and LoRA. In past projects I found that prompt tuning caused catastrophic forgetting and gave no accuracy
+ gain on the task I was training.
+ The task, gsm8k_cot, had a flexible-match accuracy of only 0.02 both before and after prompt tuning, while accuracy on the SST-2 benchmark decreased from 0.72 to 0.60.
+ I did not want to repeat that outcome in this project, since I would prefer that training improve my task. In another assignment I found that LoRA improved that same task
+ from 0.0 accuracy to 0.10 (a 10-point gain), while decreasing SST-2 from 0.72 to 0.63 after training. While there was still evidence of
+ catastrophic forgetting, a 10-point performance gain is hard to ignore. For that reason, I chose LoRA as the PEFT method to implement in my training. LoRA injects
+ low-rank adapters into specific modules, which I hoped would teach the model to perform my task well. I ran training with three sets of hyperparameters,
+ collected the validation loss for each, and chose the model/hyperparameter combination with the lowest loss. That combination was rank: 64, alpha: 128, and dropout: 0.15.

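+ A minimal sketch of the winning adapter configuration with the Hugging Face `peft` library (the base checkpoint and target modules below are placeholders, not the card's exact settings):
+ 
+ ```python
+ from transformers import AutoModelForCausalLM
+ from peft import LoraConfig, get_peft_model
+ 
+ # Placeholder base checkpoint; substitute the actual model.
+ base = AutoModelForCausalLM.from_pretrained("base-model-name")
+ 
+ # Best of the three hyperparameter sets tried (lowest validation loss):
+ # rank 64, alpha 128, dropout 0.15.
+ config = LoraConfig(
+     r=64,
+     lora_alpha=128,
+     lora_dropout=0.15,
+     target_modules=["q_proj", "v_proj"],  # assumed attention projections
+     task_type="CAUSAL_LM",
+ )
+ model = get_peft_model(base, config)
+ model.print_trainable_parameters()  # only the low-rank adapters train
+ ```
+ 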
+ # Evaluation
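
+ A sketch of how flexible-match accuracy on gsm8k_cot (and SST-2 accuracy, to check forgetting) could be measured with EleutherAI's lm-evaluation-harness, assuming that harness is the source of the scores above; the model path is a placeholder:
+ 
+ ```python
+ import lm_eval
+ 
+ # Placeholder path to the finetuned (adapter-merged) checkpoint.
+ results = lm_eval.simple_evaluate(
+     model="hf",
+     model_args="pretrained=path/to/finetuned-model",
+     tasks=["gsm8k_cot", "sst2"],
+ )
+ print(results["results"])
+ ```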

+ # Usage and Intended Use

+ # Prompt Format

+ # Expected Output Format

 This section should
 briefly describe the expected output format for your model and include a
 general code chunk showing an example model response.

+ # Limitations

 This section should summarize the main
 limitations of your model. Limitations could be based on benchmark task