Problem with regex again



Let's say that I have these two URLs:

For the first URL I have this regex:


But URL 2 is also valid according to the regex. How do I tell the regex that only the first URL should be valid, and not the second?

  • There are infinitely many solutions. What is the regex logic you want to build?

    Eran Betzalel, 08 September 2009, 12:59
  • Yup, a trivial solution would be the regex ^site\.com/hello-world/test\.html$. It matches the first URL but not the second.

    MSalters, 08 September 2009, 13:39
  • And to be pedantic: without a scheme such as http:// or https:, it's not a URL.

    MSalters, 08 September 2009, 13:40
  • To answer this question, you need to tell us why the first URL is valid and the second is not.

    Adam Bellaire, 08 September 2009, 12:58
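To see MSalters' literal pattern in action, here is a small Python sketch. The question's actual URLs were lost, so the two strings below are assumptions: the first is read off MSalters' regex, and the second (site.com/hello-world/test/test.html) is reconstructed from later comments about a "hello-world/test" folder or a "test/test" page. The names LITERAL, FIRST, SECOND and is_valid are illustrative only.

```python
import re

# MSalters' literal pattern: every character is fixed and the dots are escaped.
LITERAL = re.compile(r"^site\.com/hello-world/test\.html$")

# Assumed example strings (the question's originals were not preserved).
FIRST = "site.com/hello-world/test.html"
SECOND = "site.com/hello-world/test/test.html"


def is_valid(url: str) -> bool:
    """Return True only if the whole string matches the literal pattern."""
    return LITERAL.match(url) is not None


if __name__ == "__main__":
    print(is_valid(FIRST))   # True
    print(is_valid(SECOND))  # False
```

Since the pattern accepts exactly one string, it only helps if that one URL is all you need; anything more general requires the rule the other commenters are asking about.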

6 answers


Of course the second string is also valid against your regex:

sub-expression        result
^.*                   matches:   ""
/                     backtrack: ""
([a-z0-9,-]+)         matches:   "" 
/                     backtrack: ""
([a-z0-9,-]+).html$   matches:   ""


sub-expression        result
^[^/]+                matches:   ""
/                     matches:   ""
([a-z0-9,-]+)         matches:   "" 
/                     matches:   ""
([a-z0-9,-]+)\.html$  fails (which is the expected result)
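Both traces can be reproduced with a short Python check. The two patterns are assembled from the sub-expressions listed in the tables; the example URLs are assumptions reconstructed from the comments, since the originals were not preserved, and LOOSE/STRICT are illustrative names.

```python
import re

# Built from the first trace: the leading .* may consume slashes,
# so it can swallow an extra folder.
LOOSE = re.compile(r"^.*/([a-z0-9,-]+)/([a-z0-9,-]+).html$")

# Built from the second trace: [^/]+ cannot cross a slash, so exactly one
# folder and one file name are allowed after the domain.
STRICT = re.compile(r"^[^/]+/([a-z0-9,-]+)/([a-z0-9,-]+)\.html$")

# Assumed example URLs.
FIRST = "site.com/hello-world/test.html"
SECOND = "site.com/hello-world/test/test.html"

for name, pattern in (("LOOSE", LOOSE), ("STRICT", STRICT)):
    for url in (FIRST, SECOND):
        m = pattern.match(url)
        print(name, url, m.groups() if m else None)
```

With LOOSE, the second string still matches because .* absorbs "site.com/hello-world", leaving "test" and "test" for the two groups. STRICT accepts only the first string.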

So you should use:

  • That is what you seem to want: the second string should not match, which in regex terms means "the regex fails for this string".

    Tomalak, 08 September 2009, 13:11
  • ^[^/]./([a-z0-9,-]+)/([a-z0-9,-]+).html$ fails?

    08 September 2009, 13:09

I think the problem is the use of the greedy match-all .* at the beginning of your expression.

Cheat a little:


For the first URL the .* part of the pattern matches "", but for the second URL it matches "".

If you don't want to allow more than one folder, you can disallow slash characters in the part of the pattern that matches the domain name:


(Note that I put a backslash before the period before the html extension. A period matches any character, while \. matches only a period.)
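The difference between . and \. can be checked directly in Python. The domain-restricted pattern shape follows the description above. The string ODD is a hypothetical input with an "x" where the period should be, and the pattern names are illustrative.

```python
import re

# Dot before "html" left unescaped: "." matches any single character.
UNESCAPED = re.compile(r"^[^/]+/([a-z0-9,-]+)/([a-z0-9,-]+).html$")
# Escaped version: "\." matches only a literal period.
ESCAPED = re.compile(r"^[^/]+/([a-z0-9,-]+)/([a-z0-9,-]+)\.html$")

# Hypothetical string with "x" in place of the period.
ODD = "site.com/hello-world/testxhtml"

print(bool(UNESCAPED.match(ODD)))  # True: "." matched the "x"
print(bool(ESCAPED.match(ODD)))    # False
```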

If you want to allow both URLs and use "hello-world/test" as folder for the second one, allow slashes in the folder part:
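One possible way to allow slashes in the folder part is to add "/" to the folder group's character class. This is a sketch, not necessarily the exact regex from this answer. The URLs are assumed reconstructions and FOLDER_WITH_SLASHES is an illustrative name. Backtracking still leaves the last path segment for the file-name group.

```python
import re

# Hypothetical variant: "/" is allowed in the folder class, so the folder
# group may span several path segments.
FOLDER_WITH_SLASHES = re.compile(r"^[^/]+/([a-z0-9,/-]+)/([a-z0-9,-]+)\.html$")

for url in ("site.com/hello-world/test.html",
            "site.com/hello-world/test/test.html"):
    m = FOLDER_WITH_SLASHES.match(url)
    print(m.groups() if m else None)
# ('hello-world', 'test')
# ('hello-world/test', 'test')
```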


If you want to use "hello-world" as folder and "test/test" as page, allow slashes in the file name part:
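The mirror-image sketch moves "/" into the file-name class instead. The folder class still excludes it, so the folder stops after one segment. As before, the URLs are assumed reconstructions and PAGE_WITH_SLASHES is an illustrative name, not the answer's original regex.

```python
import re

# Hypothetical variant: "/" is allowed in the file-name class only.
PAGE_WITH_SLASHES = re.compile(r"^[^/]+/([a-z0-9,-]+)/([a-z0-9,/-]+)\.html$")

for url in ("site.com/hello-world/test.html",
            "site.com/hello-world/test/test.html"):
    m = PAGE_WITH_SLASHES.match(url)
    print(m.groups() if m else None)
# ('hello-world', 'test')
# ('hello-world', 'test/test')
```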

  • @Frozzare: You specifically asked that the second URL should not be valid… I added some alternatives in the answer.

    Guffa, 08 September 2009, 13:11
  • I want to allow both URLs,

    but they are two different pages.

    08 September 2009, 13:06
  • @Frozzare: I don’t understand what you want, you seem to contradict yourself over and over… I have given you alternatives both for matching only the first URL and for matching both URLs, something should match your requirements…

    Guffa, 08 September 2009, 18:03

Not a solution, just a suggestion: there are plenty of great tools that let you experiment with regular expressions and actually help you write them.
I particularly like Expresso; apparently Regulator is also very good.


In the second case, .* is matching more than you would expect.

Perhaps replace it with the non-greedy quantifier:
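Before relying on the lazy quantifier, it is worth testing it. Because the pattern is anchored at both ends, .*? only changes the order in which the engine tries splits. It keeps expanding until the rest of the pattern fits. This Python check uses an assumed second URL (the original was not preserved) and illustrative pattern names. It suggests the lazy version still accepts that string:

```python
import re

GREEDY = re.compile(r"^.*/([a-z0-9,-]+)/([a-z0-9,-]+)\.html$")
LAZY = re.compile(r"^.*?/([a-z0-9,-]+)/([a-z0-9,-]+)\.html$")

SECOND = "site.com/hello-world/test/test.html"  # assumed second URL

print(GREEDY.match(SECOND).groups())  # ('test', 'test')
print(LAZY.match(SECOND).groups())    # ('test', 'test')
```

Rejecting the second URL therefore seems to require restricting what the prefix may contain (for example, no slashes), rather than only changing greediness.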


.* matches "" in the second case. You have to be more specific for the domain part.