Update README.md
README.md
# Introduction

According to the [August 2025 jobs report](https://www.bls.gov/), overall unemployment has risen, with the unemployment rate for workers aged 16-24 rising to 10.5%. The primary demographic of this age range is recent college graduates, many of whom carry student loan debt and are unable to find stable, long-term employment. While this could be attributed to any of the various economic challenges facing the US today, there is speculation that it may be due to insufficient job-hunting and interview skills. There are many resources that seek to fill this gap, including interview-prep LLMs such as [Interview Copilot](https://interviewcopilot.io/). However, there is not an LLM that

If my model performs poorly, I know that my synthetic data overfit the model and it cannot perform things like basic sentence composition and reasoning. I chose the comparison model [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) because it is a well-known model of similar size and structure to mine: it is 8B while mine is 7B, and it is an Instruct model like mine. Additionally, it performs well when generating text, which is an essential baseline capability of my model. I chose the other comparison model [deepseek-ai/DeepSeek-R1-0528-Qwen3-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B) for similar reasons: its size is approximately the same, and it is built off of the base model Qwen, just like mine is. This will allow me to see how well my finetuning performed compared to other models that use Qwen as a baseline. Overall, my model does not perform better than the baseline model for the testing split, but the high BERTScore values for the testing split of the training data still indicate that my model generates accurate text and performs well with my dataset. My model did perform better than the Llama model when it came to HumanEval and the E2E NLG Challenge, and it also performed better than DeepSeek's Qwen3 model on the E2E NLG Challenge and the testing split. In general, my model has mixed results in its evaluation, but it performs closely to the comparison models.
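
For reference, here is a minimal sketch of how a BERTScore like the ones above could be computed with the Hugging Face `evaluate` library. The library choice and the example strings are assumptions for illustration; this is not the exact evaluation script used for this model.

```
# Hedged sketch: computing BERTScore with the `evaluate` library
# (assumed tooling; not necessarily what was used for this model's evaluation).
import evaluate

bertscore = evaluate.load("bertscore")

# Hypothetical prediction/reference pair standing in for the testing split.
predictions = ["I impute missing values with mean, median, or KNN imputation."]
references = ["I use mean, median, and KNN imputation to fill in missing values."]

scores = bertscore.compute(predictions=predictions, references=references, lang="en")
print(scores["precision"], scores["recall"], scores["f1"])  # per-example lists
```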

```
pipe = pipeline(
    # ...imports, model name, and earlier generation arguments are elided in this excerpt...
    do_sample=False,
)

formatted_prompt = f"""
Job Description: Data Scientist. Education must include Bachelor’s or Master’s degree in Data Science, Computer Science, or Statistics and must have 1-2 years of experience in data analytics, machine learning, or AI model deployment.

User Profile: I got my bachelor's in computer science from the University of Delaware in 2020 and a master's in Data Science from the University of Virginia in 2021. I have been working as a data scientist at Google for three years. My skills include Python, data visualization, SQL, Microsoft Office, and Tableau.

Interview Question and Optimal Answer: """
```
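
A hedged usage sketch follows; the exact invocation is not shown in this excerpt, so the standard `transformers` text-generation call below is an assumption:

```
# Assumption: `pipe` is a standard transformers text-generation pipeline,
# which returns a list of dicts with a "generated_text" key.
result = pipe(formatted_prompt)
print(result[0]["generated_text"])
```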

The expected output format for this model is a generated interview question followed by an optimal answer. You can see an example of the expected output format below.

```
How do you handle missing data in your datasets?

I use various imputation techniques such as mean imputation, median imputation, and KNN imputation to fill in missing values. I also use techniques like forward and backward filling to handle missing data in time series data.
```

# Limitations

The main limitation of this model is that it does not perform well on benchmarks outside of the chosen task, indicating that the model suffered catastrophic forgetting during the training process. The trained model's benchmark performance on HumanEval and the E2E NLG Challenge was lower than the baseline model's. This means that using the model for anything outside of the interview-preparation use case is unlikely to work well. Additionally, some of the model responses were not as expected, as they included multiple questions and answers instead of the single one that I asked for. While this technically works as long as the questions/answers are coherent and relevant, it is still a limitation because I did not want the model to generate more than one question/answer. Generating multiple increases the risk of inaccurate outputs.
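
One way to work around the multiple question/answer behavior is to post-process the generated text. The sketch below keeps only the first question/answer pair; the blank-line delimiter is an assumption based on the expected output format shown above, not a guarantee about the model's output:

```
# Hedged post-processing sketch: truncate a generation to its first
# question/answer pair. Assumes the question and answer arrive as
# blank-line-separated blocks, as in the expected-output example above
# (an assumption, not a guarantee).
def first_question_answer(generated: str) -> str:
    blocks = [b.strip() for b in generated.split("\n\n") if b.strip()]
    # blocks[0] is the question, blocks[1] the answer; anything after is extra.
    return "\n\n".join(blocks[:2])
```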