Transformers
Safetensors
English
ocbyram committed · Commit 16b416a · verified · 1 Parent(s): 26b6571

Update README.md

Files changed (1)
  1. README.md +15 -7
README.md CHANGED
@@ -9,7 +9,7 @@ base_model:
 
  # Introduction
 
- According to the According to the [August 2025 jobs report](https://www.bls.gov/), overall unemployment has risen, with the unemployment rate for workers aged 16-24 rising to 10.5%. The primary demographic
+ According to the [August 2025 jobs report](https://www.bls.gov/), overall unemployment has risen, with the unemployment rate for workers aged 16-24 rising to 10.5%. The primary demographic
  of this age range is recent college graduates, many of whom carry student loan debt and are unable to find stable, long-term employment. While this could be
  attributed to any of the various economic challenges facing the US today, there is speculation that it may
  be due to insufficient skills regarding job-hunting and interviews. There are many resources that seek to fill this gap, including interview-prep LLMs such as [Interview Copilot](https://interviewcopilot.io/). However, there is not an LLM that
@@ -99,7 +99,7 @@ due to its ability to test whether my model is able to retain general comprehens
  If my model performs poorly, I know that my synthetic data overfit the model and it cannot perform things like basic sentence composition and reasoning. I chose the comparison model [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
  because it is a well-known model of similar size and structure to mine. It is 8B while mine is 7B, and it is also Instruct-tuned like mine. Additionally, it performs well when generating text, which is an essential baseline
  capability of my model. I chose the other comparison model [deepseek-ai/DeepSeek-R1-0528-Qwen3-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B) for similar reasons: the size is approximately the same and it is a version built off of the base model Qwen, just like mine is.
- This will allow me to see how well my finetuning performed as compared to other models that use Qwen as a baseline. Overall, my model does not perform better than the baseline model, but the high BERTScore results for
+ This will allow me to see how well my finetuning performed as compared to other models that use Qwen as a baseline. Overall, my model does not perform better than the baseline model for the testing split, but the high BERTScore results for
  the testing split of training data still indicate that my model generates accurate text and performs well with my dataset. My model did perform better than the Llama model when it came to HumanEval and E2E NLG Challenge;
  it also performed better against DeepSeek's Qwen3 model when it came to E2E NLG Challenge and the testing split. In general, my model has mixed results in its evaluation, but it performs closely to the comparison models.
 
@@ -137,10 +137,16 @@ pipe = pipeline(
      do_sample=False,
  )
 
- formatted_prompt = f"Job Description: Data Scientist. Education must include Bachelor’s or Master’s degree in Data Science, Computer Science, or Statistics and
- must have 1-2 years of experience in data analytics, machine learning, or AI model deployment. User Profile: I got my bachelor's in computer science from the
+ formatted_prompt = f"""
+
+ Job Description: Data Scientist. Education must include Bachelor’s or Master’s degree in Data Science, Computer Science, or Statistics and
+ must have 1-2 years of experience in data analytics, machine learning, or AI model deployment.
+
+ User Profile: I got my bachelor's in computer science from the
  University of Delaware in 2020 and a master's in Data Science from the University of Virginia in 2021. I have been working as a data scientist at Google for three
- years. My skills include Python, data visualization, SQL, Microsoft Office, and Tableau. Interview Question and Optimal Answer: "
+ years. My skills include Python, data visualization, SQL, Microsoft Office, and Tableau.
+
+ Interview Question and Optimal Answer: """
 
  ```
@@ -150,14 +156,16 @@ The expected output format for this model is a generated interview question foll
  the expected output format below.
 
  ```
- How do you handle missing data in your datasets? I use various imputation techniques such as mean imputation, median imputation, and KNN imputation to fill in
+ How do you handle missing data in your datasets?
+
+ I use various imputation techniques such as mean imputation, median imputation, and KNN imputation to fill in
  missing values. I also use techniques like forward and backward filling to handle missing data in time series data.
  ```
 
  # Limitations
 
  The main limitation of this model is that it does not perform well on benchmarks outside of the chosen task, indicating that the model suffered
- catastrophic forgetting during the training process. The benchmark task performance of the trained model on HumanEval, Squadv2, and E2E NLG Challenge were all lower than the baseline model.
+ catastrophic forgetting during the training process. The benchmark task performance of the trained model on HumanEval and E2E NLG Challenge was lower than the baseline model's.
  This means that using the model for anything outside of the interview preparation use-case is unlikely to work well. Additionally, some of the model responses were not as expected,
  as they included multiple questions and answers instead of the one that I asked for. While this technically works as long as the questions/answers are coherent and relevant,
  it is still a limitation because I did not want the model to generate more than one question/answer. Generating multiple has a higher risk of inaccurate generated outputs.
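
The hunks above show only fragments of the README's inference code. As a hedged, self-contained sketch of how the reformatted prompt would be passed to a `transformers` pipeline: only `do_sample=False` and the prompt text come from this diff; the repo id below is a hypothetical placeholder, and the task string and `max_new_tokens` value are assumptions.

```python
# Sketch only: the model id is a hypothetical placeholder, and the task string
# and max_new_tokens are assumed; do_sample=False and the prompt text are the
# pieces shown in the README diff above.
from transformers import pipeline

pipe = pipeline(
    "text-generation",                       # assumed task for this model
    model="ocbyram/interview-prep-qwen-7b",  # hypothetical placeholder repo id
    do_sample=False,                         # greedy decoding, as in the README
)

formatted_prompt = """Job Description: Data Scientist. Education must include Bachelor's or Master's degree in Data Science, Computer Science, or Statistics and must have 1-2 years of experience in data analytics, machine learning, or AI model deployment.

User Profile: I got my bachelor's in computer science from the University of Delaware in 2020 and a master's in Data Science from the University of Virginia in 2021. I have been working as a data scientist at Google for three years. My skills include Python, data visualization, SQL, Microsoft Office, and Tableau.

Interview Question and Optimal Answer: """

result = pipe(formatted_prompt, max_new_tokens=256)
print(result[0]["generated_text"])
```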
 
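The evaluation discussion above mentions BERT scores on the testing split, but the commit does not include the scoring code. A minimal sketch of how such numbers could be computed with the `evaluate` library; the example strings are illustrative, not the author's actual data:

```python
# Illustrative only: this commit does not ship the evaluation script.
# predictions/references stand in for model outputs and the held-out
# reference answers from the testing split of the training data.
import evaluate

bertscore = evaluate.load("bertscore")

predictions = ["I use mean, median, and KNN imputation to fill in missing values."]
references = ["I use various imputation techniques such as mean imputation and KNN imputation."]

results = bertscore.compute(predictions=predictions, references=references, lang="en")
print(results["f1"])  # one F1 value per prediction/reference pair
```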
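Finally, for the limitation that responses sometimes contain several question/answer pairs: one lightweight mitigation, not part of this commit and assuming the expected output format documented above (a question, a blank line, then the answer), is to truncate the generation to the first pair:

```python
# Hypothetical post-processing helper, assuming the documented output format:
# a question, a blank line, then its answer; any extra pairs are dropped.
def first_question_answer(generated: str) -> str:
    blocks = [b.strip() for b in generated.split("\n\n") if b.strip()]
    return "\n\n".join(blocks[:2])  # keep the first question + answer only

# e.g. first_question_answer(result[0]["generated_text"])
```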