Update README.md
README.md
@@ -66,26 +66,34 @@ method for shuffling. The proportions are as follows: Training: 3,200 examples (
with Random seed: 42.

# Methodology

The training method I implemented for this task was finetuning, specifically the parameter-efficient finetuning (PEFT) method LoRA. In class we learned about several model interventions, ranging from few-shot prompting to full finetuning. For this project, I chose PEFT. PEFT methods update a small subset of the model to improve task performance while keeping catastrophic forgetting to a minimum: many of them freeze parameters or entire layers so that weights irrelevant to the task are never updated. PEFT is an attractive alternative to full finetuning because it uses far fewer resources while still producing an effective trained model. Choosing PEFT alone did not fully define my training approach; I also needed to decide which PEFT method to use, how to set its hyperparameters, and how to select the best model. Two basic PEFT methods we covered in class are prompt tuning and LoRA. In past projects, I found that prompt tuning caused catastrophic forgetting and produced no accuracy gain on the task I was training.
On the task gsm8k_cot, flexible-match accuracy was only 0.02 both before and after prompt tuning, while accuracy on the SST-2 benchmark dropped from 0.72 to 0.60.
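For context, "flexible match" refers to the lenient answer extraction that evaluation harnesses use for gsm8k-style tasks: pull the last number out of the model's response and compare it to the gold answer. A minimal sketch of that idea (my own illustration, not the harness's exact extraction rules):

```python
import re

def flexible_match(response: str, gold: str) -> bool:
    """Lenient gsm8k-style scoring: compare the last number in the
    response (with thousands separators stripped) to the gold answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response.replace(",", ""))
    return bool(numbers) and numbers[-1] == gold

# Illustrative examples only; the responses are made up.
print(flexible_match("She bakes 16 + 26 = 42 cookies, so the answer is 42.", "42"))  # True
print(flexible_match("I am not sure of the answer.", "42"))                          # False
```

Because only the final number is compared, this metric gives the model credit for a correct answer even when the surrounding chain-of-thought text varies.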
That was not something I wanted to repeat in this project; I wanted training to actually improve my task. In another assignment, I found that LoRA raised accuracy on that same task from 0.0 to 0.10 (a 10-point gain), while SST-2 benchmark accuracy fell from 0.72 to 0.63 after training. Although that still shows some catastrophic forgetting, a 10-point improvement is hard to ignore, so I chose LoRA as the PEFT method for my training. LoRA freezes the base weights and injects trainable low-rank adapter matrices into specific modules, which I hoped would teach the model to perform my task well. I trained with three sets of hyperparameters, recorded the validation loss for each, and chose the combination with the lowest loss: rank 64, alpha 128, and dropout 0.15.
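The two ideas above, LoRA's low-rank update and picking the hyperparameter set by validation loss, can be sketched in a few lines. The hidden size and the validation-loss values below are hypothetical placeholders; only the winning configuration (rank 64, alpha 128, dropout 0.15) comes from the actual runs:

```python
# LoRA trains two small matrices A (r x d_in) and B (d_out x r) per target
# module and adds scaling * (B @ A) to the frozen weight W (d_out x d_in).
rank, alpha = 64, 128
d_in = d_out = 4096            # hypothetical hidden size of a target module
scaling = alpha / rank         # LoRA scales the low-rank update by alpha / rank

full_params = d_in * d_out               # trainable params if W itself were tuned
lora_params = rank * (d_in + d_out)      # trainable params in the adapter pair
print(f"adapter is {lora_params / full_params:.1%} of the full matrix")

# Model selection: of the three hyperparameter sets tried, keep the one
# with the lowest validation loss (the loss values here are placeholders).
runs = [
    {"rank": 16, "alpha": 32,  "dropout": 0.05, "val_loss": 1.48},
    {"rank": 32, "alpha": 64,  "dropout": 0.10, "val_loss": 1.39},
    {"rank": 64, "alpha": 128, "dropout": 0.15, "val_loss": 1.31},
]
best = min(runs, key=lambda r: r["val_loss"])
print(best["rank"], best["alpha"], best["dropout"])  # 64 128 0.15
```

The parameter count is the main draw of LoRA: at rank 64 the adapter pair is only about 3% of the size of the 4096x4096 matrix it modifies, which is why it trains with far fewer resources than full finetuning.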

# Evaluation

# Usage and Intended Use

# Prompt Format

# Expected Output Format

This section should briefly describe the expected output format for your model and include a general code chunk showing an example model response.

# Limitations

This section should summarize the main limitations of your model. Limitations could be based on benchmark task