Barack_Embalmer

joined 1 year ago
[–] Barack_Embalmer@lemmy.world 2 points 1 month ago

"Crust" makes it sound like superfluous detritus. It's cornicione! Pizza is mostly bread, so if the bread is bad then it's not worth eating.

Neapolitan pizza has a high-hydration dough cooked at very high temperature, resulting in a delightfully light cornicione filled with large air pockets. The bread is delicious enough to enjoy on its own, which is why it only needs simple toppings like uncooked San Marzano tomatoes and a few shreds of mozzarella. IMO Italian cuisine excels at letting high-quality produce speak for itself through simplicity and elegance. What they're shitting out at Papa Johns and whatever is an abomination.

[–] Barack_Embalmer@lemmy.world 3 points 1 month ago

I thought she was a patron at a male strip club, stuffing dollar bills into an old man's pants.

[–] Barack_Embalmer@lemmy.world 2 points 2 months ago

Everything about the exact timbre of your voice is captured in the waveform that represents it. Provided the sampling rate and bit depth are high enough to reproduce your actual voice without introducing digital artefacts (something analogous to a pixelated image), that's all it takes to reproduce any sound with arbitrary precision.
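Quick toy demo of the "pixelation" idea (made-up sample rate and bit depths, nothing audiophile about it) — fewer bits per sample means each sample gets rounded to a coarser grid, so the stored waveform deviates more from the real one:

```python
import numpy as np

# A 1 kHz sine sampled at 8 kHz (toy values, not CD quality)
fs = 8000                              # sampling rate in Hz
t = np.arange(fs) / fs                 # one second of sample times
x = np.sin(2 * np.pi * 1000 * t)       # the "analog" signal, values in [-1, 1]

def quantize(signal, bits):
    """Round each sample to the nearest level of a `bits`-deep grid."""
    levels = 2 ** (bits - 1)
    return np.round(signal * levels) / levels

# Worst-case error at 4-bit vs 16-bit depth
err4 = np.max(np.abs(x - quantize(x, 4)))
err16 = np.max(np.abs(x - quantize(x, 16)))
# More bits -> finer grid -> smaller quantization error
```

With 4 bits the worst sample is off by a few percent; with 16 bits the error is down around one part in sixty-five thousand.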

Timbre is the result of a specific set of frequencies sounding simultaneously, a set that is characteristic of the particular shape and material properties of the vibrating object (be it a guitar string, drum skin, or vocal cords).

As for how multiple frequencies can "exist" simultaneously at a single instant in time, you might want to read up on Fourier's theorem and watch 3Blue1Brown's brilliant series on differential equations that explores Fourier series https://www.youtube.com/watch?v=spUNpyF58BY
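You can see the "multiple frequencies at once" thing directly in a few lines of numpy (toy frequencies of my choosing): add two sine waves into one combined waveform, and the Fourier transform pulls both frequencies back out of that single list of numbers.

```python
import numpy as np

# Two tones "existing" at once: 440 Hz and 660 Hz, sampled at 8 kHz
fs = 8000
t = np.arange(fs) / fs                          # one second of samples
wave = np.sin(2*np.pi*440*t) + 0.5*np.sin(2*np.pi*660*t)

# The FFT recovers both frequencies from the one combined waveform
spectrum = np.abs(np.fft.rfft(wave))
freqs = np.fft.rfftfreq(len(wave), d=1/fs)
peaks = freqs[spectrum > len(wave) / 8]         # bins with substantial energy
# peaks contains exactly 440 Hz and 660 Hz
```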

[–] Barack_Embalmer@lemmy.world 5 points 2 months ago

Yes digital media, and computers in general, are miracles of science and engineering. Is there some reason digital audio in particular inspires you in this way, as opposed to digital images?

[–] Barack_Embalmer@lemmy.world 8 points 2 months ago (9 children)

Long list of numbers in sequence. Each represents how far away from equilibrium the speaker cone should be, at each point in time, as it vibrates back and forth.
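In miniature (16 samples of one sine cycle, just to illustrate — real audio uses tens of thousands of samples per second):

```python
import math

# Toy PCM: each number is the speaker cone's displacement from rest
# at one instant in time, as it vibrates back and forth
samples = [round(math.sin(2 * math.pi * n / 16), 3) for n in range(16)]
# samples[0] is the rest position, samples[4] the maximum excursion,
# samples[12] the maximum excursion in the opposite direction
```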

[–] Barack_Embalmer@lemmy.world 8 points 2 months ago

It's Joseph Redmon, creator of the YOLO object detection neural net architecture, which is very widely used.

[–] Barack_Embalmer@lemmy.world 1 points 7 months ago

I'm reminded of the original #RIPBOZO guy.

[–] Barack_Embalmer@lemmy.world 1 points 7 months ago (1 children)

That's pretty disingenuous - it's one of many reasons that together form a pattern of behavior whereby Microsoft makes Windows worse with each iteration. More bloat, more spying, more locked down for user "security". And for what? The dubious benefit of being "compatible" with other shitheel software providers like Adobe, who use their monopoly power to stranglehold the corporate and professional media sectors? To each their own, but IDK how anyone can use Windows by choice. The small amount I have to use it at work is torture enough.

[–] Barack_Embalmer@lemmy.world 5 points 8 months ago

"Wait a minute... these aren't the feet pics I ordered!"

[–] Barack_Embalmer@lemmy.world 17 points 8 months ago

He must have meant the first 5 digits of the theorem expressed as some kind of Gödel numbering. I mean there's no way he's a complete moronic cunt, right?

[–] Barack_Embalmer@lemmy.world 15 points 8 months ago (6 children)

I have used Ubuntu as my daily driver for the last 10 years, because support and tools are widespread and easy, and I don't need any extra pain in my life. Drivers are mostly present and working on a clean install, and in the one case where the touchpad wasn't recognized, it was super easy to find an Ubuntu forum post containing a 1-line command to fix it. But everybody says I should hate it and use Mint instead.

I'm open to give it a go, but in general, will most of the tutorials and fixes you find for Ubuntu also work with Mint?

[–] Barack_Embalmer@lemmy.world 1 points 8 months ago

We tend to think of these models as agents or persons with a right to information. They “learn like we do” after all.

This is again a similar philosophical tangent that's not germane to the issue at hand (albeit an interesting one).

I think you’ll see that if you only feed an LLM art or text from only one artist you will find that most of the output of the LLM is clearly copyright infringement if you tried to use it commercially.

This is not a feasible proposition in any practical sense. LLMs are necessarily trained on VAST datasets that comprise all kinds of text. The only type of network that could be trained on only one artist's corpus is a tiny pedagogical tool like Karpathy's minGPT https://github.com/karpathy/minGPT, trained solely on the works of Shakespeare. But this is not a "Large" language model, it's a teaching exercise for ML students. One artist's work could never practically train a network that could be considered "Large" in the sense of LLMs. So it's pointless to speculate about a contrived scenario like that.

In more practical terms, it's not controversial to state that deep networks with lots of degrees of freedom are capable of overfitting and memorizing training data. However, if they have additional capabilities besides memorization then this may be considered an acceptable price to pay for those capabilities. It's trivial to demonstrate that chatbots can perform novel tasks, like writing a rap song about Spongebob going to the moon on a rocket powered by ice cream - which is surely not present in any training data, yet any contemporary chatbot can produce.
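The memorization-vs-generalization trade-off shows up even in the simplest models — here's a toy sketch (my own contrived numbers) where a polynomial with as many free parameters as data points "memorizes" the training set perfectly, while a simple line captures the underlying trend with some residual error:

```python
import numpy as np

# 10 noisy points whose underlying trend is linear
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = 2 * x + rng.normal(0, 0.1, 10)

# Degree 9: enough degrees of freedom to pass through every point
overfit = np.polyfit(x, y, 9)
# Degree 1: forced to summarize, not memorize
simple = np.polyfit(x, y, 1)

train_err_overfit = np.max(np.abs(np.polyval(overfit, x) - y))
train_err_simple = np.max(np.abs(np.polyval(simple, x) - y))
# The degree-9 fit has (near-)zero training error - pure memorization -
# but wiggles wildly between the points it memorized
```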

As far as science and progress, I don’t think that’s hampered by the view that these companies are clearly infringing on copyright.

As an example, one open research question concerns the scaling relationships of network performance as dataset size increases. In this sense, any attempt to restrict the pool of available training data hampers our ability to probe this question. You may decide that this is worth it to prioritize the sanctity of copyright law, but you can't pretend that it's not impeding that particular research question.

As far as “it’s on the internet, it’s fair game”. I don’t agree. In Western countries your works are still protected by copyright. Most of us do give away those rights when we post on most platforms, but only to one entity, not anyone/ any company who can read or has internet access.

I wasn't making a claim about law, but about ethics. I believe it should be fair game, perhaps not for private profiteering, but for research. Also this says nothing of adversary nations that don't respect our copyright principles, but that's a whole can of worms.

We can’t just give up all our works and all our ideas to a handful of companies to copy for profit just because they can read and view them and feed them en masse into their expensive emulating machines.

As already stated, that's where I was in agreement with you - it SHOULDN'T be given up to a handful of companies. But it SHOULD be given up to public research institutes for the furtherance of science. And whatever you don't want included, you should refrain from posting. (Or perhaps, if this research were undertaken according to transparent FOSS principles, the curated datasets would be public and open, and you could submit the relevant GDPR requests to get your personal information expunged if you wanted.)

Your whole response is framed in terms of LLMs being purely a product for commercial entities, who shadily exaggerate the learning capabilities of their systems, and couches the topic as a "people vs. corpos" battle. But web-scraped datasets (such as ImageNet) have been powering deep learning research for over a decade, long before AI captured the public imagination the way it has currently, and long before it became a big money spinner. This view neglects that language modelling, image recognition, speech transcription, etc. are also ongoing fields of academic research. Instead of vainly trying to cram the cat back into the bag, and throttling research, we should be embracing the use of publicly available data, with legislation that ensures it's used for public benefit.
