this post was submitted on 07 Oct 2023
104 points (98.1% liked)

Programming


Alexstarfire@lemmy.world 9 points 1 year ago

Am I the only one shocked to learn that to find something at the end of a string it starts at the beginning? Perhaps it's because of the simplicity of the example but I expected it to start at the end.
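At least for a simple backtracking matcher, starting at the beginning is exactly what happens: an unanchored search tries start positions from left to right, and the `$` anchor is only checked once a candidate match reaches it. A minimal sketch, assuming a TypeScript port of Rob Pike's well-known tiny matcher (literals, `.`, `c*`, `^`, `$` only) plus a hypothetical `tried` array that records every start position attempted. Production engines such as V8's may apply optimizations that this toy does not model.

```typescript
// Tiny backtracking matcher after Rob Pike's classic example.
// Supports literal characters, '.', 'c*', a leading '^' and a trailing '$'.
// `tried` (hypothetical, for illustration) records each start index attempted.
function search(re: string, text: string, tried: number[]): boolean {
  if (re.charAt(0) === "^") {
    tried.push(0);
    return matchHere(re.slice(1), text, 0);
  }
  // Unanchored: try every start position left to right,
  // including the empty suffix at text.length.
  for (let start = 0; start <= text.length; start++) {
    tried.push(start);
    if (matchHere(re, text, start)) return true;
  }
  return false;
}

// Does `re` match the text beginning at index i?
function matchHere(re: string, text: string, i: number): boolean {
  if (re.length === 0) return true;
  if (re.length >= 2 && re[1] === "*") return matchStar(re[0], re.slice(2), text, i);
  if (re === "$") return i === text.length; // '$' only matches at the very end
  if (i < text.length && (re[0] === "." || re[0] === text[i])) {
    return matchHere(re.slice(1), text, i + 1);
  }
  return false;
}

// 'c*': try zero repetitions first, then one more each time (shortest first).
function matchStar(c: string, rest: string, text: string, i: number): boolean {
  do {
    if (matchHere(rest, text, i)) return true;
    // i++ is evaluated before the '.' check so the index always advances.
  } while (i < text.length && (text[i++] === c || c === "."));
  return false;
}
```

For `search("ab$", "xxab", tried)`, `tried` ends up as `[0, 1, 2]`: the match is only found after failing at indices 0 and 1, even though the pattern is anchored at the end.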

cgtjsiwy@programming.dev 6 points 1 year ago

Regular expressions are great and can always be matched in linear time with respect to the input string length.
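The linear-time guarantee comes from simulating an automaton on a *set* of states instead of backtracking down one path at a time. A minimal sketch, assuming a hand-built NFA with no regex parser (`NFA`, `Edge`, `nfaMatches` and the other names are hypothetical): each input character advances every live state at once, so the work is proportional to the input length times the size of the NFA. The example automaton is for `/^(a+)+$/`, a pattern that backtracking engines are known to struggle with.

```typescript
// An NFA edge consumes `ch`, or is an epsilon edge when ch is null.
type Edge = { ch: string | null; to: number };
type NFA = { start: number; accept: number; edges: Edge[][] };

function emptySet(m: number): boolean[] {
  const a: boolean[] = [];
  for (let i = 0; i < m; i++) a.push(false);
  return a;
}

// Mark every state reachable from the already-marked ones via epsilon edges.
function closeOverEpsilon(nfa: NFA, active: boolean[]): void {
  const stack: number[] = [];
  for (let s = 0; s < active.length; s++) if (active[s]) stack.push(s);
  while (stack.length > 0) {
    const s = stack.pop() as number;
    for (const e of nfa.edges[s]) {
      if (e.ch === null && !active[e.to]) {
        active[e.to] = true;
        stack.push(e.to);
      }
    }
  }
}

// Full-string match. Each character does O(states + edges) work,
// so total time is linear in the input length: nothing is ever retried.
function nfaMatches(nfa: NFA, input: string): boolean {
  const m = nfa.edges.length;
  let active = emptySet(m);
  active[nfa.start] = true;
  closeOverEpsilon(nfa, active);
  for (let i = 0; i < input.length; i++) {
    const next = emptySet(m);
    for (let s = 0; s < m; s++) {
      if (!active[s]) continue;
      for (const e of nfa.edges[s]) {
        if (e.ch === input[i]) next[e.to] = true;
      }
    }
    closeOverEpsilon(nfa, next);
    active = next;
  }
  return active[nfa.accept] === true;
}

// Hand-built NFA for /^(a+)+$/:
// 0 --a--> 1; 1 --eps--> 0 (inner +); 1 --eps--> 2; 2 --eps--> 0 (outer +); 2 accepts.
const aPlusPlus: NFA = {
  start: 0,
  accept: 2,
  edges: [
    [{ ch: "a", to: 1 }],
    [{ ch: null, to: 0 }, { ch: null, to: 2 }],
    [{ ch: null, to: 0 }],
  ],
};
```

Rejecting 10,000 `a`s followed by `!` takes a single pass over the input here, because the nested loops only ever revisit the same three states.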

The problem is that JS standard library RegExps aren't actually regular expressions, but a much broader language that cannot be matched efficiently in general (matching with backreferences is NP-hard). If RegExp switched to proper regular expressions, matching would be much faster, but supporting backreferences like /(.*)x\1/ would be impossible.
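A short illustration of what backreferences add, using the JS `RegExp` from the comment (anchored here with `^...$` so it must match the whole string): the group captures some text `w`, and `\1` demands exactly the same text again after the `x`. The set of strings of the form `w + "x" + w` is not a regular language, so an automaton-based engine cannot express this pattern at all. The wrapper name `isRepeatedAroundX` is hypothetical.

```typescript
// Accepts exactly the strings w + "x" + w for some (possibly empty) w.
// (.*) captures w; \1 requires the identical text to follow the "x".
// This language is not regular, so no finite automaton (and no
// automaton-based engine such as RE2) can recognize it.
const repeatedAroundX = /^(.*)x\1$/;

function isRepeatedAroundX(s: string): boolean {
  return repeatedAroundX.test(s);
}
```

For example, `"abxab"` and `"x"` are accepted, while `"abxba"` is rejected because the text after the `x` differs from the captured `"ab"`.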

Turun@feddit.de 14 points 1 year ago

Only if you insist on the definition from formal language theory.

In practice regex is widely used to mean the pattern matching thing that also supports back references.

Wikipedia suggests using the term "regular expressions" for the formal-language-theory concept and "regex" for the pattern-matching feature in programming languages (PCRE and the like). I agree, and would go even further: any time one wants to refer to the concept as it is used in formal language theory, one should say explicitly that it is the theoretical concept, not the regex implementation found in most programming languages.

Turun@feddit.de 5 points 1 year ago

The visualization was great! The double loops jump out immediately and make it easy to recognize problematic expressions.
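The "double loop" is a quantifier nested inside another quantifier, as in `/^(a+)+$/`. A rough model of why that hurts, assuming a toy backtracking matcher hard-coded for that one pattern (`matchAPlusPlusBacktracking` is hypothetical and not how any real engine is written): when a run of `a`s ends in a character that makes the match fail, every way of splitting the run between the inner and outer loop is tried before the matcher gives up.

```typescript
// Toy backtracking matcher for /^(a+)+$/ only. `steps` counts how many
// (iteration, length) choices are tried before the result is known.
function matchAPlusPlusBacktracking(s: string): { matched: boolean; steps: number } {
  let steps = 0;
  // At index i, match one more iteration of the group (a+).
  const iterate = (i: number): boolean => {
    let end = i;
    while (end < s.length && s[end] === "a") end++;
    // Greedy inner a+: take the longest run first, then give back one "a" at a time.
    for (let j = end; j > i; j--) {
      steps++;
      if (j === s.length) return true; // '$' matches: success
      if (iterate(j)) return true;     // otherwise start another outer iteration at j
    }
    return false;
  };
  const matched = iterate(0);
  return { matched, steps };
}
```

With 10 `a`s followed by `!`, this counts 1,023 steps (2^10 - 1), and each extra `a` doubles it: 20 `a`s already cost 1,048,575 steps. A string of only `a`s, by contrast, matches on the first step.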

recursive_recursion@programming.dev 4 points 1 year ago (last edited 1 year ago)

Although I haven't fully read this article, feel free to crosspost it in:

Programming.dev - Regex

philnash@programming.dev 4 points 1 year ago

Ah, I didn’t realise there was a regex channel here. Thanks!

jeffhykin@lemm.ee 4 points 1 year ago
philnash@programming.dev 2 points 1 year ago

That’s brilliant!

MonkderZweite@feddit.ch -2 points 1 year ago (last edited 1 year ago)

So this is a guide to the dangers of JavaScript, no?

philnash@programming.dev 6 points 1 year ago

While this article is about JavaScript specifically, these issues certainly exist in other regex engines too.

sebsch@discuss.tchncs.de -4 points 1 year ago

Is there anything in this language that isn't screwed up? I mean, it's regex; there are so many good implementations of it.

philnash@programming.dev 5 points 1 year ago

JavaScript's regex engine isn't the only one with these problems. There are other implementations, like RE2 and Rust's regex crate, that don't have this issue, but they lack some of the features of the JS implementation.

sebsch@discuss.tchncs.de -2 points 1 year ago

OK, thanks for the clarification.

I would argue that the gold standard of regex is perlre, or even Python's re. I've never heard anyone discourage using them. Do you know something I don't?

burntsushi@programming.dev 3 points 1 year ago

Both Perl and Python use backtracking regex engines and are thus susceptible to similar problems as discussed in the OP.