Update README.md

README.md

I was able to find a training dataset of job postings on Kaggle (Arshkon, 2023), under a project labeled ‘LinkedIn Job Postings 2023 Data Analysis’. The dataset contains roughly 15,000 LinkedIn job postings, each with the company, the job title, and a description of the required skills. This variety of jobs and descriptions allowed my LLM to be trained on the wide range of job descriptions that users may input.
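
For illustration, a minimal sketch of loading and inspecting this dataset with pandas; the `postings.csv` file name and the column names are assumptions, not details from the project:

```
import pandas as pd

# Assumed file and column names for the Kaggle LinkedIn postings dataset
postings = pd.read_csv('postings.csv')

print(len(postings))  # roughly 15,000 job postings
print(postings[['company_name', 'title', 'description']].head())
```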

I synthetically generated the other two datasets, due to the lack of available datasets. For both generated datasets, I used the Llama-3.2-1B-Instruct model, due to its ability to efficiently produce accurate natural-language answers as well as technical answers, which was necessary for the interview questions. I used the job postings dataset with few-shot prompting to create the user profile dataset. For each job posting, I had the model create a 'great', a 'mediocre', and a 'bad' user profile. An example of the few-shot prompting was:

```
...
Applicant Profile: Experienced in Python, R, and ML models.
Interview Question: Tell me about a machine learning project you are proud of.
Optimal Answer: I developed a predictive model using Python and scikit-learn to forecast customer churn, achieving 85% accuracy by carefully preprocessing the data and tuning hyperparameters.
```
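
As a rough sketch of how this few-shot generation could be driven with the transformers library (the pipeline setup, helper name, prompt wording, and sampling settings below are illustrative assumptions, not the exact code used):

```
from transformers import pipeline

# Assumed setup: Llama-3.2-1B-Instruct via the text-generation pipeline
generator = pipeline('text-generation', model='meta-llama/Llama-3.2-1B-Instruct')

# One illustrative few-shot example in the same format as above
FEW_SHOT = (
    "Job Title: Data Scientist\n"
    "Description: Build and deploy machine learning models in Python and R.\n"
    "Applicant Profile: Experienced in Python, R, and ML models.\n\n"
)

def generate_profile(job_title, description, quality):
    # quality is one of 'great', 'mediocre', or 'bad'
    prompt = (
        FEW_SHOT
        + f"Job Title: {job_title}\n"
        + f"Description: {description}\n"
        + f"Write a {quality} applicant profile for this job.\n"
        + "Applicant Profile:"
    )
    output = generator(prompt, max_new_tokens=120, do_sample=True)
    # The pipeline returns the prompt plus the completion; keep only the completion
    return output[0]['generated_text'][len(prompt):].strip()
```

Looping a helper like this over every posting with the three quality labels would yield the tiered profile dataset described above.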

After creating this dataset, I uploaded it to my project notebook and reformatted it to make it easier to train on. I created an 'Instruct' column with each row's job title, description, applicant profile, and the prompt 'Generate a relevant interview question and provide an optimal answer using the information from this applicant's profile. Interview Question and Optimal Answer:'. I then combined the interview question and optimal answer into one column labeled 'Answer'.
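
A minimal sketch of that reformatting step with pandas, assuming hypothetical source column names ('title', 'description', 'profile', 'question', 'optimal_answer'):

```
import pandas as pd

PROMPT = ("Generate a relevant interview question and provide an optimal answer "
          "using the information from this applicant's profile. "
          "Interview Question and Optimal Answer:")

def reformat_for_training(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Fold the job title, description, and applicant profile into one instruction string
    df['Instruct'] = ('Job Title: ' + df['title']
                      + '\nDescription: ' + df['description']
                      + '\nApplicant Profile: ' + df['profile']
                      + '\n' + PROMPT)
    # Merge the generated question and optimal answer into a single target column
    df['Answer'] = ('Interview Question: ' + df['question']
                    + '\nOptimal Answer: ' + df['optimal_answer'])
    return df[['Instruct', 'Answer']]
```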

I established a training, validation, and testing split using scikit-learn's train_test_split function and pandas' .sample() method for shuffling. The proportions are as follows:

- Training: 3,200 examples (64% of total)
- Validation: 800 examples (16% of total)
- Testing: 1,000 examples (20% of total)
- Random seed: 42
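
The initial 20% test holdout is not shown in the excerpt below; a minimal sketch with scikit-learn's train_test_split that matches the counts above (1,000 of 5,000 examples held out), where the `career_df` name is an assumption, would be:

```
from sklearn.model_selection import train_test_split

# Hold out 20% of the data for testing; seed 42 for reproducibility
train_career, test_career = train_test_split(career_df, test_size=0.2, random_state=42)
```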
The remaining examples were then shuffled, split 80/20 into training and validation sets (64%/16% of the total), converted to Hugging Face Datasets, and tokenized on the 'Instruct' column:

```
from datasets import Dataset

# Shuffle the remaining data with the same seed for reproducibility
career = train_career.sample(frac=1, random_state=42)

# 80/20 split of the remainder into training and validation sets
train_size = int(len(career) * 0.8)
train = career[:train_size]
val = career[train_size:]

# Convert the pandas DataFrames to Hugging Face Datasets
train = Dataset.from_pandas(train)
val = Dataset.from_pandas(val)

# Tokenize the 'Instruct' column (tokenizer is the model's tokenizer, loaded earlier)
train = train.map(lambda samples: tokenizer(samples['Instruct']), batched=True)
val = val.map(lambda samples: tokenizer(samples['Instruct']), batched=True)
```
## Methodology