ocbyram committed · Commit 74b3866 · verified · 1 Parent(s): 55ecb83

Update README.md

Files changed (1)
  1. README.md +36 -7
README.md CHANGED
@@ -34,6 +34,22 @@ For both generated datasets, I used the Llama-3.2-1B-Instruct model, due to its
 as well as technical answers, which was necessary for the interview questions. I used the job postings dataset with few-shot prompting to create the user profile
 dataset. For each job posting in the dataset, I had the model create a 'great', 'mediocre', and 'bad' user profile. An example of the few-shot prompting for this was:

 ```python
 Job Title: Data Scientist
@@ -42,15 +58,28 @@ Applicant Profile: Experienced in Python, R, and ML models.
 Interview Question: Tell me about a machine learning project you are proud of.
 Optimal Answer: I developed a predictive model using Python and scikit-learn to forecast customer churn, achieving 85% accuracy by carefully preprocessing the data and tuning hyperparameters.
 ```

- Finetuning Tasks: As you did for the second project check in,
- clearly define the training data you used, making sure to note any
- modifications you made to existing datasets in terms of combining
- or reformatting them. Make sure to also describe how you
- established a training, validation, and testing split of your data
- (e.g., report the proportion and random seed you used and/or if
- your dataset had a built-in testing split)

 ## Methodology
 
 as well as technical answers, which was necessary for the interview questions. I used the job postings dataset with few-shot prompting to create the user profile
 dataset. For each job posting in the dataset, I had the model create a 'great', 'mediocre', and 'bad' user profile. An example of the few-shot prompting for this was:

+ ```python
+ Job_Title: Software Engineer
+ Profile_Type: Great
+ Applicant_Profile:
+ Education: Master's in Computer Science
+ Experience: 5 years building backend systems
+ Skills: Python, SQL, Git, CI/CD
+ Certifications: AWS Developer
+ Training: Agile methodology
+ Additional Qualifications: Mentorship experience
+ ```
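A few-shot prompt in this shape can be assembled programmatically before being sent to the model; a minimal sketch, where `build_profile_prompt` and the single hard-coded example are illustrative rather than the author's actual code:

```python
# Few-shot example mirroring the profile format shown in the README.
FEW_SHOT_EXAMPLE = """\
Job_Title: Software Engineer
Profile_Type: Great
Applicant_Profile:
Education: Master's in Computer Science
Experience: 5 years building backend systems
Skills: Python, SQL, Git, CI/CD
Certifications: AWS Developer
Training: Agile methodology
Additional Qualifications: Mentorship experience"""

def build_profile_prompt(job_title: str, profile_type: str) -> str:
    """Append a new job title and target quality after the worked example,
    leaving 'Applicant_Profile:' open for the model to complete."""
    return (
        f"{FEW_SHOT_EXAMPLE}\n\n"
        f"Job_Title: {job_title}\n"
        f"Profile_Type: {profile_type}\n"
        f"Applicant_Profile:"
    )

prompt = build_profile_prompt("Data Scientist", "Mediocre")
```

The resulting string would then be passed to Llama-3.2-1B-Instruct, once per job posting for each of the three profile qualities.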
+
+ Due to the long user profiles that were generated, the CSV this created was over 2 GB, which is too large for Excel to handle.
+ I used a Python script to randomly select 5,000 rows. With this subset, I used the Llama-3.2-1B-Instruct model again to create the interview/answer
+ data. For each job posting/user profile pair, I had the model create an interview question based on the job description, then an optimal answer to the question based on the
+ user profile. An example of a few-shot prompt I used is below.

 ```python
 Job Title: Data Scientist
 Applicant Profile: Experienced in Python, R, and ML models.
 Interview Question: Tell me about a machine learning project you are proud of.
 Optimal Answer: I developed a predictive model using Python and scikit-learn to forecast customer churn, achieving 85% accuracy by carefully preprocessing the data and tuning hyperparameters.
 ```
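The 5,000-row subsample described above could be produced along these lines; a minimal sketch with pandas, where the helper name and the commented-out CSV filename are illustrative (the real run would start from the 2 GB+ generated file):

```python
import pandas as pd

def sample_subset(df: pd.DataFrame, n: int = 5000, seed: int = 42) -> pd.DataFrame:
    """Randomly select n rows (or all rows if the frame is smaller)."""
    return df.sample(n=min(n, len(df)), random_state=seed).reset_index(drop=True)

# In the project this would read the large generated CSV, e.g.:
# profiles = pd.read_csv("generated_profiles.csv")
# subset = sample_subset(profiles)

# Tiny synthetic frame so the sketch runs end to end:
demo = pd.DataFrame({"Job_Title": [f"role_{i}" for i in range(10)]})
subset = sample_subset(demo, n=5)
```

Fixing the random seed keeps the subset reproducible across reruns.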
+ After creating this dataset, I uploaded it to my project notebook. Then I reformatted the dataset to make it easier to train on. I created an 'Instruct' column
+ containing each row's job title, description, and applicant profile, followed by the prompt 'Generate a relevant interview question and
+ provide an optimal answer using the information from this applicant's profile. Interview Question and Optimal Answer:'. Then I combined the interview question/optimal answer
+ into one column labeled 'Answer'.
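The reformatting step above could look roughly like this; a sketch that assumes the generated columns carry the names shown (the actual column names in the project may differ):

```python
import pandas as pd

PROMPT = ("Generate a relevant interview question and provide an optimal answer "
          "using the information from this applicant's profile. "
          "Interview Question and Optimal Answer:")

def add_training_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Build the 'Instruct' input column and the combined 'Answer' target column."""
    out = df.copy()
    out["Instruct"] = (
        "Job Title: " + out["Job_Title"] + "\n"
        + "Job Description: " + out["Job_Description"] + "\n"
        + "Applicant Profile: " + out["Applicant_Profile"] + "\n"
        + PROMPT
    )
    out["Answer"] = out["Interview_Question"] + "\n" + out["Optimal_Answer"]
    return out

# One-row demonstration using the example from the README:
career = add_training_columns(pd.DataFrame({
    "Job_Title": ["Data Scientist"],
    "Job_Description": ["Build ML models"],
    "Applicant_Profile": ["Experienced in Python, R, and ML models."],
    "Interview_Question": ["Tell me about a machine learning project you are proud of."],
    "Optimal_Answer": ["I developed a predictive churn model."],
}))
```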

+ I established a training, validation, and testing split of the data with the following lines:
+
+ ```python
+ from sklearn.model_selection import train_test_split
+ from datasets import Dataset
+
+ # Hold out a fixed 1,000-row test set
+ train_career, test_career = train_test_split(career, test_size=1000, random_state=42)
+
+ # Shuffle the remaining rows before the train/validation split
+ career = train_career.sample(frac=1, random_state=42)
+
+ # 80/20 train/validation split
+ train_size = int(len(career) * 0.8)
+ train = career[:train_size]
+ val = career[train_size:]
+ train = Dataset.from_pandas(train)
+ val = Dataset.from_pandas(val)
+
+ # Tokenize the 'Instruct' column (tokenizer is loaded earlier in the notebook)
+ train = train.map(lambda samples: tokenizer(samples['Instruct']), batched=True)
+ val = val.map(lambda samples: tokenizer(samples['Instruct']), batched=True)
+ ```
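With the 5,000-row subset, the split above implies 1,000 test rows and an 80/20 division of the remaining 4,000; a quick size check, using a synthetic frame as a stand-in for the real data:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 5,000-row dataset
career = pd.DataFrame({"Instruct": [f"row {i}" for i in range(5000)]})

# Same split logic as in the README
train_career, test_career = train_test_split(career, test_size=1000, random_state=42)
career = train_career.sample(frac=1, random_state=42)

train_size = int(len(career) * 0.8)  # 80% of the remaining 4,000 rows
train = career[:train_size]
val = career[train_size:]
# Yields 3,200 train, 800 validation, and 1,000 test rows
```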

 ## Methodology