Tuesday, April 5, 2011

Regular expression to match the text after the last / and before .html

With this string

http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html

I need to get sdf-as

And with this string

hellow-1/yo-sdf.html

I need yo-sdf

From stackoverflow
  • This should get you what you need:

    Regex re = new Regex(@"/([^/]*)\.html$");
    Match match = re.Match("http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html");
    Console.WriteLine(match.Groups[1].Value); //Or do whatever you want with the value
    

    This needs using System.Text.RegularExpressions; at the top of the file to work.

    robert : hey Matt S this work perfect many thanks :)
  • using System.Text.RegularExpressions;
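    // Verbatim string (@"...") so that \- and \. reach the regex engine; "\/" in a regular string does not compile.
    Regex pattern = new Regex(@".*/([a-z\-]+)\.html");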
    Match match = pattern.Match("http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html");
    if (match.Success)
    {
        // Group 1 is the file name; match.Value would print the whole matched URL.
        Console.WriteLine(match.Groups[1].Value);
    }
    else
    {
        Console.WriteLine("Not found :(");
    }
    
  • Try this:

    string url = "http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html";
    Match match = Regex.Match(url, @"/([^/]+)\.html$");
    if (match.Success)
    {
        string result = match.Groups[1].Value;
        Console.WriteLine(result);
    }
    

    Result:

    sdf-as
    

    However, it would be a better idea to use the System.Uri class to parse the string, so that you correctly handle things like http://example.com/foo.html?redirect=bar.html.
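
    A minimal sketch of that suggestion, assuming the input is an absolute URL. The class and method names (UriFileNameDemo, FileNameFromUrl) are made up for illustration; Uri.AbsolutePath and Path.GetFileNameWithoutExtension are standard .NET members:

    ```csharp
    using System;
    using System.IO;

    public static class UriFileNameDemo
    {
        // Hypothetical helper: returns the last path segment without its extension.
        public static string FileNameFromUrl(string url)
        {
            // AbsolutePath excludes the query string and fragment,
            // so "?redirect=bar.html" never reaches the file-name logic.
            var uri = new Uri(url);
            return Path.GetFileNameWithoutExtension(uri.AbsolutePath);
        }

        public static void Main()
        {
            Console.WriteLine(FileNameFromUrl("http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html"));    // sdf-as
            Console.WriteLine(FileNameFromUrl("http://example.com/foo.html?redirect=bar.html")); // foo
        }
    }
    ```

    Note that new Uri(string) throws a UriFormatException for a relative input such as hellow-1/yo-sdf.html, so this sketch only covers the absolute case.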

  • This one makes the slash and dot parts optional, and allows the file to have any extension:

    new Regex(@"^(.*/)?(?<fileName>[^/]*?)(\.[^/.]*)?$", RegexOptions.ExplicitCapture);

    But I still prefer Substring(LastIndexOf(...)) because it is far more readable.
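
    For illustration, here is one way the pattern above might be used to read the named group. The wrapper class and the FileNameOf helper are hypothetical; only the pattern itself comes from this answer:

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class NamedGroupDemo
    {
        // Pattern copied from the answer. With RegexOptions.ExplicitCapture the
        // unnamed groups do not capture, so fileName is the only capture group.
        private static readonly Regex FilePattern =
            new Regex(@"^(.*/)?(?<fileName>[^/]*?)(\.[^/.]*)?$", RegexOptions.ExplicitCapture);

        // Hypothetical helper: returns the file name without its last extension.
        public static string FileNameOf(string path)
        {
            return FilePattern.Match(path).Groups["fileName"].Value;
        }

        public static void Main()
        {
            Console.WriteLine(FileNameOf("http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html")); // sdf-as
            Console.WriteLine(FileNameOf("hellow-1/yo-sdf.html"));                       // yo-sdf
            Console.WriteLine(FileNameOf("noslash"));                                    // noslash
        }
    }
    ```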

  • There are many ways to do this. The following uses lookarounds to match only the filename portion. It also works when the input contains no / at all:

    string[] urls = {
       @"http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html",
       @"hellow-1/yo-sdf.html",
       @"noslash.html",
       @"what-is/this.lol",
    };
    
    foreach (string url in urls) {
       Console.WriteLine("[" + Regex.Match(url, @"(?<=/|^)[^/]*(?=\.html$)") + "]");
    }
    

    This prints:

    [sdf-as]
    [yo-sdf]
    [noslash]
    []
    

    How the pattern works

    There are 3 parts:

    • (?<=/|^) : a positive lookbehind to assert that we're preceded by a slash /, or we're at the beginning of the string
    • [^/]* : match anything but slashes
    • (?=\.html$) : a positive lookahead to assert that we're followed by ".html" (with a literal dot) at the end of the string

    A non-regex alternative

    Knowing regex is good, and it can do wonderful things, but you should always know how to do basic string manipulations without it. Here's a non-regex solution:

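    static String getFilename(String url, String ext) {
       // Only inputs that end with the given extension yield a name.
       if (url.EndsWith(ext)) {
         // k is -1 when there is no slash, so the name then starts at index 0.
         int k = url.LastIndexOf("/");
         return url.Substring(k + 1, url.Length - ext.Length - k - 1);
       } else {
         return "";
       }
    }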
    

    Then you'd call it as:

    getFilename(url, ".html")
    
