We have Unicode these days: blåhaj
Unicode in filenames? Are you crazy?!
Okay, that was /s to some extent, but I gotta rant: I'm totally convinced that there's still new software today that completely trips over itself when files or paths have non-ASCII characters, or sometimes even a space. Incompetence didn't go anywhere.
I still use underscores for filenames, basically muscle memory at this point
Spaces in file names will always be fiddly though. It'll work, but it'll still be wrong, because arguments are space separated, and having spaced file names totally messes with that.
I try to just always put file names or paths into quotes on the CLI, or tie them to a variable in programming. That way spaces are accepted and the name stays separate from the other arguments.
Yeah. It's a good idea to guard against it, but I would still never put spaces in filenames that I myself choose.
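To illustrate the quoting point, here's a minimal Python sketch. The file name is made up for the example. `shlex` only imitates POSIX shell word splitting, and passing a list to `subprocess.run` skips the shell entirely, so the name arrives as one argument.

```python
import shlex
import subprocess
import sys

name = "my shark notes.txt"  # hypothetical file name containing spaces

# Unquoted, a shell-style parser splits the name into three separate words.
naive = shlex.split("cat " + name)

# Quoted, the same parser keeps it together as a single argument.
quoted = shlex.split("cat " + shlex.quote(name))


def count_args(*args):
    """Start a child Python process and return how many arguments it received."""
    child_code = "import sys; print(len(sys.argv) - 1)"
    result = subprocess.run(
        [sys.executable, "-c", child_code, *args],  # list form: no shell involved
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


print(naive)
print(quoted)
print(count_args(name))
```

The list form is usually the safer habit in scripts, since there's no quoting step to forget.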
Unicode in filenames can be a bad idea, since there is more than one way to encode what looks like the same character. So pattern matching can fail if you assume one representation but the name actually uses another.
Good point. Do filesystems use a normal form to at least prevent having two files with effectively the same name?
I should point out the flip side though, that there's no avoiding Unicode in filenames. Users in languages that don't use the Latin alphabet (such as Japanese, Chinese, Korean, Hebrew, Arabic, Greek and Russian, and the list could go on) can reasonably expect to be able to give a file a name they can read and understand with no extra effort. All the software woes that come with it - too bad, software needs to deal with it.
I'm not sure. A few years ago I remember that OpenBSD expected ASCII for files, but I think Linux expects UTF-8. I could be wrong though.
I'm assuming Unicode anyway, and UTF-8 is by far the most natural because most files will be in ASCII. A "normal form" (see link above), you might think of it as a canonical form, is a way to check if two strings are equivalent, even if they encoded the text differently. Like the example mentioned on Wikipedia:
For example, the distinct Unicode strings "U+212B" (the angstrom sign "Å") and "U+00C5" (the Swedish letter "Å") are both expanded by NFD (or NFKD) into the sequence "U+0041 U+030A" (Latin letter "A" and combining ring above "°") which is then reduced by NFC (or NFKC) to "U+00C5" (the Swedish letter "Å").
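The Wikipedia example can be reproduced with Python's standard `unicodedata` module. This is only a sketch of comparing names under a normal form; the `same_name` helper and the file names are made up for illustration, and it says nothing about what any particular filesystem does.

```python
import unicodedata

angstrom = "\u212b"     # ANGSTROM SIGN
a_ring = "\u00c5"       # LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = "A\u030a"  # LATIN CAPITAL LETTER A + COMBINING RING ABOVE

# All three render the same, but compare as different strings.
print(angstrom == a_ring, a_ring == decomposed)

# NFD expands both precomposed forms to the two-code-point sequence.
print(unicodedata.normalize("NFD", angstrom) == decomposed)

# NFC then recomposes everything to U+00C5.
print(unicodedata.normalize("NFC", angstrom) == a_ring)


def same_name(a, b):
    """Hypothetical helper: compare two names after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed_file = "bl\u00e5haj.txt"  # å as a single code point
combining_file = "bla\u030ahaj.txt"   # a followed by a combining ring

print(precomposed_file == combining_file)
print(same_name(precomposed_file, combining_file))
```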
Incompetence didn't go anywhere.
Now that's certainly true, but the beauty of open source software is that we can fix bugs when we encounter them.
I'm too lazy to memorize alt codes
Use a compose key
Why you torture blahaj?
Why are we still kink shaming?
Blahaj cannot speak, therefore Blahaj cannot give consent.
You don't necessarily need speech for consent since non-verbal/mute people exist.
blahaj.exe.tar.gz
blahaj.elf.tar.gz.part
blåhaj.squashfs
I feel like Unicode in the filename is heavily against the spirit of using squashfs, or at least the ways I've seen it used.
Ok, what kind of monster names their executables .elf
?
Well, a.out doesn't make much sense these days.
Gotta move to .elf
Pi Pico SDK does. Well, the version for debugging symbols, anyway. Regular executable is .uf2.
I reserve .elf for executables for other platforms, like microcontroller firmware.
mv blahaj.elf.tar.gz.part ./rivendell
Speaking of which, it blew my mind when I discovered that .EXEs are just ~~zip files~~ compressed archives. Same goes for .DLLs, and a lot of other common Windows file extensions as well. (.DOC too, for example IIRC). They all open in your favorite archiver software (I like NanaZip; which is a fork of 7-Zip with a modern UI).
I don't think that's true for .exe or .dll files, but it's definitely true for .docx files and other Office files ending with x. Some .exe's are self-extracting archives or have other files embedded in them, so maybe that's what you've been seeing.
You are actually correct. They can contain archived files or resources that can be unpacked with an archive program (including on Linux btw), but they aren't just a zip file. That's why my Linux archive manager (Ark, I think) offers to open one but won't execute it. It can see the extra content even if it can't execute the file as intended.
Thanks for the backup :)
Mate I saw the blind leading the blind and had to step in. You could have actually opened some exes on Linux as the other guy suggests. In fact I am surprised you never noticed your system presenting that option. It just isn't actual proof of what they said, even if it appears like it. In fact I am a bit lost how neither of you realized something weird was going on. On what planet would an executable format being a zip file make any sense? Exes actually can include several executable formats.
There are things like self extracting archives that make this all more confusing. They are basically an archive with an extraction program in the same file. Installer exes work in a similar way too. Not all exes can be extracted since not all of them contain secret hidden archives or extra resources.
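A rough Python sketch of why an archiver can open this kind of file: zip readers locate the archive's directory from the end of the file, so extra bytes in front don't stop them. The "stub" here is just placeholder bytes starting with `MZ`, not a real extraction program, and `build_zip` is a helper made up for the example.

```python
import io
import zipfile


def build_zip(files):
    """Hypothetical helper: pack a {name: bytes} mapping into zip bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Placeholder for the extractor program: NOT a real executable.
stub = b"MZ" + b"\x00" * 62
payload = build_zip({"readme.txt": b"hello from inside"})

# "Program first, archive appended": the layout described above.
fake_sfx = stub + payload

# The file starts like an executable, not like a zip ("PK")...
print(fake_sfx[:2])

# ...yet a zip reader still finds and reads the appended archive.
with zipfile.ZipFile(io.BytesIO(fake_sfx)) as zf:
    names = zf.namelist()
    content = zf.read("readme.txt")

print(names, content)
```

An executable with nothing appended has no such directory at the end, which fits the point that not every exe can be opened this way.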
There actually are tools to show you the contents of an executable file, and you could probably learn a lot by using one. They contain more than just a blob of machine code like one might assume. Often they contain data as well, and instructions and information on how to load the executable like what memory layout to use.
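As a taste of what such tools read, here's a minimal Python sketch that pulls two fields out of a Windows PE header. It assumes the standard layout: the DOS header stores the PE header offset at 0x3C, followed by the `PE\0\0` signature and a COFF header beginning with the machine type and section count. The bytes parsed below are a synthetic header built for the example, not a real program, and `read_pe_summary` is a made-up name.

```python
import struct

# A few well-known COFF machine type values.
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x86-64", 0xAA64: "ARM64"}


def read_pe_summary(data):
    """Return the machine type and section count from PE header bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # The DOS header stores the offset of the PE header at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The COFF header follows the signature: machine, then number of sections.
    machine, sections = struct.unpack_from("<HH", data, pe_offset + 4)
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "sections": sections,
    }


# Synthetic header for the demo: just enough bytes to parse.
header = bytearray(0x80)
header[0:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0x40)          # PE header lives at 0x40
header[0x40:0x44] = b"PE\x00\x00"
struct.pack_into("<HH", header, 0x44, 0x8664, 3)    # x86-64, three sections

print(read_pe_summary(bytes(header)))
```

Real inspection tools go much further (section tables, imports, resources), but the starting point is the same.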
I am annoyed that people upvoted the other guy without double checking as well. Now we have more people walking around spreading misinformation just because of some guy on Lemmy. This is why things like climate change become contentious issues. People come to their own conclusions based on partial information, and since it appears to make sense without proper investigation it gets spread around like wildfire. It's only when you actually know what's going on at a deeper level that it becomes possible to spot the flaws in the reasoning.
Aren’t the x-suffixed files just an xml format?
It's a zip file that includes a bunch of things, including embedded images and a bunch of other junk, but yes - the most important and central files in the zip are XML-based.
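A small Python sketch of that structure, using only the standard library. The archive is a toy built in memory: the part names `[Content_Types].xml` and `word/document.xml` mirror what a real .docx contains, but the XML here is simplified placeholder markup, not the real WordprocessingML schema.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Build a toy "docx-like" archive in memory (placeholder contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", "<document><body><p>bl\u00e5haj</p></body></document>")
    zf.writestr("word/media/image1.png", b"\x89PNG placeholder bytes")

# Reopen it the way an archiver would and look inside.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    xml_parts = [n for n in zf.namelist() if n.endswith(".xml")]
    other_parts = [n for n in zf.namelist() if not n.endswith(".xml")]
    root = ET.fromstring(zf.read("word/document.xml"))

text = root.find("./body/p").text

print(xml_parts)
print(other_parts)
print(text)
```

Pointing `zipfile.ZipFile` at an actual .docx should list its parts the same way.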
Just because they open in 7-Zip or whatever doesn't mean they are just a zip file. There are several kinds of archives. EXEs are a special case as well. They aren't archives at all. Rather they can contain archives or extra content along with being an executable. One reason is self extracting archives. Here an archive is packaged with an extraction program as an exe all in one. The other case is exes that have extra resources like images, videos, graphics textures, etc. Either way it's an executable plus some extra stuff, not a zip archive. DLLs I am not sure about, but I suspect something similar is happening here.
Next time you should research stuff before posting it on Lemmy. Things are sometimes more complicated than they appear.
docx you are correct about though. Specifically it's a zip file that contains XML files and resources.
Edit: I actually found an article on self extracting archives, it's quite an interesting technology to be fair even if it causes confusion: https://en.m.wikipedia.org/wiki/Executable_compression
blahaj.zone
free him
I feel so compressed.
You don't need to tape archive it, it's one thing
Yeah but you can