MOTES Children's Word Embeddings (Ages 10–12, 100k vocabulary)

Word embeddings trained on children's creative writing from the Measurement of Thinking and Engagement in School (MOTES) study. These embeddings are used for semantic distance scoring of divergent thinking responses in creativity research.

Model Description

Vocabulary: ~100,000 words
Source corpus: Written responses from children ages 10–12 in the MOTES study
Format: Gensim KeyedVectors (.kv + .kv.vectors.npy)
Use case: Measuring originality of creative responses via cosine distance

How semantic distance scoring works

Given a prompt (e.g., "brick") and a creative response (e.g., "modern art sculpture"), originality is scored as the cosine distance between the embedding of the prompt word and the average embedding of the response words. Higher distance = more original.
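The calculation described above can be sketched in a few lines. This is a minimal illustration, not the OCS implementation: the helper names (`cosine_distance`, `originality`), the lowercase whitespace tokenization, and the choice to skip out-of-vocabulary words are all assumptions, and the toy 3-dimensional vectors are made up so the example runs without downloading the model.

```python
import numpy as np

def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def originality(vectors, prompt, response):
    """Cosine distance between the prompt vector and the mean response vector.

    `vectors` is any mapping that supports `word in vectors` and `vectors[word]`.
    Assumption: words missing from the vocabulary are skipped; None is returned
    when the prompt or every response word is missing.
    """
    words = [w for w in response.lower().split() if w in vectors]
    if prompt not in vectors or not words:
        return None
    response_vec = np.mean([vectors[w] for w in words], axis=0)
    return cosine_distance(vectors[prompt], response_vec)

# Hypothetical toy vectors, for illustration only (not taken from the model)
toy_vectors = {
    "brick": np.array([1.0, 0.0, 0.0]),
    "wall": np.array([0.9, 0.1, 0.0]),
    "modern": np.array([0.0, 1.0, 0.0]),
    "art": np.array([0.0, 0.8, 0.6]),
    "sculpture": np.array([0.2, 0.5, 0.8]),
}

common = originality(toy_vectors, "brick", "wall")
unusual = originality(toy_vectors, "brick", "modern art sculpture")
print(common, unusual)  # the second value is the larger of the two
```

Because Gensim's KeyedVectors also supports `word in model` and `model[word]`, the `model` object loaded in the Usage section could be passed in place of `toy_vectors`. The file name suggests the released vectors involve some form of weighting, which this unweighted average does not attempt to reproduce.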

These embeddings capture children's semantic associations, which makes them well suited to scoring children's creative responses. Embeddings derived from adult text (such as GloVe 840B) may not accurately reflect how children relate words.

Normalization

When used with the OCS Semantic Scoring system, raw cosine distances can be scaled to a 1–7 range. For the 200k variant, the calibration values are:

min: 0.5033
max: 0.8955

(The 100k variant uses similar but not identical scaling.)
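One way to read the calibration values is as the endpoints of a linear min-max rescaling. The sketch below assumes exactly that, plus clipping of distances that fall outside the calibration range; the function name `scale_to_range` is hypothetical, and the OCS Semantic Scoring system may compute its scaled scores differently. The defaults are the 200k values quoted above, used here only for illustration.

```python
def scale_to_range(distance, dist_min=0.5033, dist_max=0.8955, low=1.0, high=7.0):
    """Linearly map a raw cosine distance onto [low, high].

    Assumption: distances below dist_min or above dist_max are clipped
    to the ends of the range rather than extrapolated.
    """
    fraction = (distance - dist_min) / (dist_max - dist_min)
    fraction = min(max(fraction, 0.0), 1.0)
    return low + fraction * (high - low)

# A few raw distances and their scaled counterparts
for raw in (0.40, 0.5033, 0.70, 0.8955, 0.95):
    print(raw, round(scale_to_range(raw), 2))
```

For the 100k model, the matching calibration values would need to be supplied through `dist_min` and `dist_max`, since this card does not list them.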

Usage

from gensim.models import KeyedVectors
from huggingface_hub import hf_hub_download

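# Download both model files. The .npy file is not passed to load() directly,
# but Gensim expects to find it next to the .kv file, so it must be downloaded too.
kv_path = hf_hub_download("massivetexts/motes-embeddings-100k", "all_weighted_10-12_100k.kv")
npy_path = hf_hub_download("massivetexts/motes-embeddings-100k", "all_weighted_10-12_100k.kv.vectors.npy")

# Load the vectors, memory-mapping the array read-only
model = KeyedVectors.load(kv_path, mmap='r')

# Check the cosine similarity between two words
print(model.similarity("brick", "house"))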

Associated Research

This model was developed as part of the MOTES (Measurement of Thinking and Engagement in School) project by Peter Organisciak and colleagues. The MOTES study investigated creativity assessment in children, particularly automated scoring of divergent thinking tasks.

Related publications

Organisciak, P., Acar, S., Dumas, D., & Berthiaume, K. (2023). Beyond semantic distance: Automated scoring of divergent thinking greatly improves with large language models. Thinking Skills and Creativity, 49, 101356. https://doi.org/10.1016/j.tsc.2023.101356
Dumas, D., Organisciak, P., & Doherty, M. (2021). Measuring divergent thinking originality with human raters and text-mining models: A psychometric comparison of methods. Psychology of Aesthetics, Creativity, and the Arts, 15(4), 645–663. https://doi.org/10.1037/aca0000319

Part of OCS (Open Creativity Scoring)

This model is part of the Open Creativity Scoring project. For LLM-based scoring (recommended for new research), see the ocsai Python package. For semantic distance scoring using this model, see the OCS Semantic Scoring HF Space.
