this post was submitted on 11 Jan 2025
204 points (98.6% liked)
Data is Beautiful
I wonder if something like the semantic tokenization method would benefit from using etymological data like this, particularly for a multilingual LLM.
I know that my NN internally uses a semantic tokenization method.
I literally often seek out word roots when talking to somebody. It helps me focus.
Interesting paper, thanks for sharing.