this post was submitted on 11 Jan 2025
204 points (98.6% liked)

Data is Beautiful

Cross posted from: Latin@lemm.ee

The Latin language is the father of half the languages 😎 (lingua latina pater linguarum dimidium est)

I hope it's okay for me to crosspost here.

Hackworth@lemmy.world 4 points 18 hours ago

I wonder if something like the semantic tokenization method would benefit from using etymological data like this, particularly for a multilingual LLM.

gandalf_der_12te@discuss.tchncs.de 2 points 11 hours ago (last edited 11 hours ago)

I know that my own NN internally uses the semantic tokenization method.

I often literally look for word roots when talking to somebody. It helps me focus.

fxomt@lemm.ee 2 points 18 hours ago

Interesting paper, thanks for sharing