MOTES Children's Word Embeddings (Ages 10–12, 100k vocabulary)
Word embeddings trained on children's creative writing from the Measurement of Thinking and Engagement in School (MOTES) study. These embeddings are used for semantic distance scoring of divergent thinking responses in creativity research.
Model Description
- Vocabulary: ~100,000 words
- Source corpus: Written responses from children ages 10–12 in the MOTES study
- Format: Gensim KeyedVectors (.kv + .kv.vectors.npy)
- Use case: Measuring originality of creative responses via cosine distance
How semantic distance scoring works
Given a prompt (e.g., "brick") and a creative response (e.g., "modern art sculpture"), originality is scored as the cosine distance between the embedding of the prompt word and the average embedding of the response words. Higher distance = more original.
These embeddings capture children's semantic associations, making them particularly suited for scoring children's creative responses, where adult-derived embeddings (like GloVe 840B) may not accurately reflect the semantic landscape.
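The scoring rule described above can be sketched as follows. This is a minimal illustration, not the OCS implementation: the function name `originality`, the whitespace tokenization, and the choice to skip out-of-vocabulary words are all assumptions, and the toy vectors are invented so the example runs without downloading the model. A loaded gensim `KeyedVectors` object could be passed in place of the dictionary, since it also supports `word in vectors` and `vectors[word]`.

```python
import numpy as np

def originality(prompt, response, vectors):
    """Cosine distance between the prompt vector and the mean response vector."""
    # Assumption: split on whitespace and skip words without an embedding.
    # The source does not specify tokenization or out-of-vocabulary handling.
    words = [w for w in response.lower().split() if w in vectors]
    if prompt not in vectors or not words:
        return None  # nothing to score
    p = np.asarray(vectors[prompt], dtype=float)
    r = np.mean([np.asarray(vectors[w], dtype=float) for w in words], axis=0)
    cos_sim = np.dot(p, r) / (np.linalg.norm(p) * np.linalg.norm(r))
    return 1.0 - float(cos_sim)

# Toy 3-dimensional vectors, invented for illustration only.
toy_vectors = {
    "brick": np.array([1.0, 0.0, 0.0]),
    "wall": np.array([0.9, 0.1, 0.0]),
    "modern": np.array([0.0, 1.0, 0.0]),
    "art": np.array([0.0, 0.8, 0.6]),
    "sculpture": np.array([0.2, 0.5, 0.8]),
}

common = originality("brick", "wall", toy_vectors)
creative = originality("brick", "modern art sculpture", toy_vectors)
print(common, creative)
```

With these toy vectors the response "wall" lies close to "brick" and receives a small distance, while "modern art sculpture" lies far from it and receives a larger one.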
Normalization
When used with the OCS Semantic Scoring system, raw cosine distances can be scaled to a 1–7 range. For the 200k variant, the calibration values are:
- min: 0.5033
- max: 0.8955
(The 100k variant uses similar but not identical scaling.)
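One plausible way to apply these calibration values is a linear min-max rescaling, sketched below. The exact formula used by OCS Semantic Scoring is not stated here, so the linear mapping, the clipping of out-of-range distances, and the function name `scale_to_1_7` are assumptions. The defaults are the 200k-variant values listed above, not values for the 100k variant.

```python
def scale_to_1_7(distance, d_min=0.5033, d_max=0.8955):
    """Map a raw cosine distance onto a 1-7 scale (assumed linear mapping)."""
    # Assumption: distances outside [d_min, d_max] are clipped to the range.
    clipped = min(max(distance, d_min), d_max)
    return 1.0 + 6.0 * (clipped - d_min) / (d_max - d_min)

for raw in (0.40, 0.5033, 0.6994, 0.8955, 0.95):
    print(raw, round(scale_to_1_7(raw), 2))
```

Under this assumed mapping, a distance at or below the calibration minimum scores 1, one at or above the maximum scores 7, and the midpoint of the two scores 4.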
Usage
from gensim.models import KeyedVectors
from huggingface_hub import hf_hub_download
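# Download model files. The .npy file is not passed to load() directly, but
# gensim looks for it next to the .kv file, so both must be downloaded.
kv_path = hf_hub_download("massivetexts/motes-embeddings-100k", "all_weighted_10-12_100k.kv")
npy_path = hf_hub_download("massivetexts/motes-embeddings-100k", "all_weighted_10-12_100k.kv.vectors.npy")
# Load model (memory-mapped, read-only)
model = KeyedVectors.load(kv_path, mmap='r')
# Check cosine similarity between two words
print(model.similarity("brick", "house"))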
Associated Research
This model was developed as part of the MOTES (Measurement of Thinking and Engagement in School) project by Peter Organisciak and colleagues. The MOTES study investigated creativity assessment in children, particularly automated scoring of divergent thinking tasks.
Related publications
- Organisciak, P., Acar, S., Dumas, D., & Berthiaume, K. (2023). Beyond semantic distance: Automated scoring of divergent thinking greatly improves with large language models. Thinking Skills and Creativity, 49, 101356. https://doi.org/10.1016/j.tsc.2023.101356
- Dumas, D., Organisciak, P., & Doherty, M. (2021). Measuring divergent thinking originality with human raters and text-mining models: A psychometric comparison of methods. Psychology of Aesthetics, Creativity, and the Arts, 15(4), 645–663. https://doi.org/10.1037/aca0000319
Part of OCS (Open Creativity Scoring)
This model is part of the Open Creativity Scoring project. For LLM-based scoring (recommended for new research), see the ocsai Python package. For semantic distance scoring using this model, see the OCS Semantic Scoring HF Space.