as well as technical answers, which was necessary for the interview questions. I used the job postings dataset with few-shot prompting to create the user profile dataset. For each job posting in the dataset, I had the model create a 'great', a 'mediocre', and a 'bad' user profile. An example of the few-shot prompting for this was:

```python
Job_Title: Software Engineer
Profile_Type: Great
Applicant_Profile:
Education: Master's in Computer Science
Experience: 5 years building backend systems
Skills: Python, SQL, Git, CI/CD
Certifications: AWS Developer
Training: Agile methodology
Additional Qualifications: Mentorship experience
```
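
The generation script itself isn't included in the README. A minimal sketch of how this few-shot prompting could be driven with the `transformers` library is below; the model ID matches the one named above, but the helper name, prompt assembly, and decoding settings are illustrative assumptions:

```python
from transformers import pipeline

# Load the instruct model named in the README.
generator = pipeline("text-generation", model="meta-llama/Llama-3.2-1B-Instruct")

# One few-shot example in the profile format shown above.
FEW_SHOT = (
    "Job_Title: Software Engineer\n"
    "Profile_Type: Great\n"
    "Applicant_Profile:\n"
    "Education: Master's in Computer Science\n"
    "Experience: 5 years building backend systems\n"
    "Skills: Python, SQL, Git, CI/CD\n"
    "Certifications: AWS Developer\n"
    "Training: Agile methodology\n"
    "Additional Qualifications: Mentorship experience\n\n"
)

def make_profile(job_title: str, profile_type: str) -> str:
    # Append the target job and desired quality; the model completes the profile.
    prompt = FEW_SHOT + f"Job_Title: {job_title}\nProfile_Type: {profile_type}\nApplicant_Profile:\n"
    out = generator(prompt, max_new_tokens=200, do_sample=True)
    # The pipeline output contains the prompt plus the completion; keep the completion.
    return out[0]["generated_text"][len(prompt):]
```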

Due to the long user profiles that were generated, the CSV this created was over 2 GB, which is too large for Excel to handle. I used a Python script to randomly select 5000 rows.
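
That selection code isn't shown in the README; a minimal pandas sketch of the subsampling step (the file names and the seed are assumptions) could look like:

```python
import pandas as pd

# Hypothetical file names; the full generated CSV is roughly 2 GB.
full = pd.read_csv("user_profiles_full.csv")

# Randomly keep 5000 rows; fixing the seed makes the subset reproducible.
subset = full.sample(n=5000, random_state=42)
subset.to_csv("user_profiles_subset.csv", index=False)
```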

With my new subset dataset, I used the Llama-3.2-1B-Instruct model again to create the interview/answer data. For each job posting/user profile pair, I had the model create an interview question based on the job description, then an optimal answer to that question based on the user profile. An example of a few-shot prompt I used is below.

```python
Job Title: Data Scientist
Applicant Profile: Experienced in Python, R, and ML models.
Interview Question: Tell me about a machine learning project you are proud of.
Optimal Answer: I developed a predictive model using Python and scikit-learn to forecast customer churn, achieving 85% accuracy by carefully preprocessing the data and tuning hyperparameters.
```

After creating this dataset, I uploaded it to my project notebook. Then I reformatted the dataset to make it easier to train on: I created an 'Instruct' column containing each row's job title, description, and applicant profile, followed by the prompt 'Generate a relevant interview question and provide an optimal answer using the information from this applicant's profile. Interview Question and Optimal Answer:'. Finally, I combined the interview question and optimal answer into a single column labeled 'Answer'.
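
The reformatting code isn't reproduced in the README; a minimal pandas sketch (the raw column names here are assumptions) might be:

```python
PROMPT = ("Generate a relevant interview question and provide an optimal answer "
          "using the information from this applicant's profile. "
          "Interview Question and Optimal Answer:")

# Column names are assumptions; adjust them to the actual CSV headers.
career["Instruct"] = (
    "Job Title: " + career["Job_Title"] + "\n"
    "Job Description: " + career["Job_Description"] + "\n"
    "Applicant Profile: " + career["Applicant_Profile"] + "\n"
    + PROMPT
)

# Combine the two generated fields into a single training target.
career["Answer"] = (
    "Interview Question: " + career["Interview_Question"] + "\n"
    "Optimal Answer: " + career["Optimal_Answer"]
)
```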

I established a training, validation, and testing split of the data with the following lines:

```python
from sklearn.model_selection import train_test_split
from datasets import Dataset

# Hold out 1000 rows for testing.
train_career, test_career = train_test_split(career, test_size=1000, random_state=42)

# Shuffle the remaining rows before splitting them further.
career = train_career.sample(frac=1, random_state=42)

# 80/20 train/validation split of what's left.
train_size = int(len(career) * 0.8)
train = career[:train_size]
val = career[train_size:]
train = Dataset.from_pandas(train)
val = Dataset.from_pandas(val)

# Tokenize the 'Instruct' column for both splits ('tokenizer' is loaded earlier in the notebook).
train = train.map(lambda samples: tokenizer(samples['Instruct']), batched=True)
val = val.map(lambda samples: tokenizer(samples['Instruct']), batched=True)
```
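
With the 5000-row subset, this holds out 1000 rows for testing and splits the remaining 4000 rows into 3200 for training and 800 for validation (an 80/20 split), all with random seed 42.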
## Methodology