Regex: match open tags except XHTML self-contained tags
A nice regex challenge!
To match opening tags in HTML (e.g., <p> or <a href="...">) while skipping self-contained XHTML tags (e.g., <br /> or <img src="..." />), you can start with the following regex pattern:
<[^/][^>]+>
Here's a breakdown of the pattern:
- <: Match the opening angle bracket.
- [^/]: Match any single character that is not a forward slash (/). This rejects closing tags such as </p>. On its own it does not reject self-contained tags, whose slash comes at the end of the tag.
- [^>]+: Match one or more characters that are not a greater-than sign (>). This consumes the rest of the tag name and any attributes.
- >: Match the closing angle bracket.
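The same breakdown can be written as a commented pattern. This is a minimal Python sketch using re.VERBOSE; the name TAG and the sample string are illustrative and not part of the original answer.

```python
import re

# The pattern from above, spelled out piece by piece. re.VERBOSE ignores
# whitespace in the pattern and allows trailing comments.
TAG = re.compile(r"""
    <        # opening angle bracket
    [^/]     # one character that is not a forward slash
    [^>]+    # one or more characters that are not a greater-than sign
    >        # closing angle bracket
""", re.VERBOSE)

# The opening tag is found; the closing tag is skipped because of [^/].
print(TAG.findall('<div class="box">text</div>'))
```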
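This pattern matches opening tags like <div> and <span>, and rejects closing tags like </div>. It has two gaps. Because [^/] and [^>]+ together require at least two characters between the brackets, one-letter tags like <p> are missed. And because [^>]+ accepts a slash, self-contained tags like <br /> and <img src="..." /> are still matched.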
Here are some examples to demonstrate the pattern's behavior:
- Matches: <div>, <span>, <a rel="nofollow" target="_blank" href="...">, and also <br>, <hr>, <img src="...">, and <br />
- Doesn't match: </div>, </span>, <p>
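To see what the pattern actually accepts, it can be run against a few sample tags. This is a minimal Python sketch; the sample list and the helper name matches are illustrative.

```python
import re

PATTERN = re.compile(r"<[^/][^>]+>")

def matches(tag: str) -> bool:
    """Return True if the whole string is matched by PATTERN."""
    return PATTERN.fullmatch(tag) is not None

samples = [
    "<div>", "<span>", '<a href="https://example.com">',
    "<p>", "</div>", "<br>", "<br />", '<img src="...">',
]

# Print each sample next to the result of the full-string match.
for tag in samples:
    print(f"{tag!r:40} {matches(tag)}")
```

Of these samples, only <p> (a single character between the brackets) and </div> (a slash right after <) are rejected. Everything else is accepted, including <br /> and <img src="...">.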
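Note that forward slashes inside attribute values (e.g., <a rel="nofollow" target="_blank" href="https://example.com">) are not a problem, since [^>]+ accepts any character except >. What the pattern does not handle is the slash that closes a self-contained tag such as <br />, or a one-letter tag name such as <p>. If your regex engine supports negative lookbehind, you can modify the pattern to:
<[A-Za-z][^>]*(?<!/)>
This pattern requires a letter right after <, which rejects closing tags. It uses [^>]* instead of [^>]+ so that one-letter tag names match, and the negative lookbehind (?<!/) rejects any tag whose last character before > is a forward slash. It cannot tell that <br> or <img src="..."> written without the slash are void elements; excluding those requires listing the tag names explicitly.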
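One way to skip self-contained tags is to check the character right before the closing >. This is a minimal Python sketch, assuming an engine with negative lookbehind support. The pattern used here, <[A-Za-z][^>]*(?<!/)>, is a suggested variant rather than a definitive solution, and the names OPEN_TAG and find_open_tags are hypothetical.

```python
import re

# Assumed variant: a letter must follow '<' (rejects closing tags),
# [^>]* allows one-letter tag names, and (?<!/) rejects tags ending in '/>'.
OPEN_TAG = re.compile(r"<[A-Za-z][^>]*(?<!/)>")

def find_open_tags(html: str) -> list:
    """Return every opening tag that is not written as a self-contained tag."""
    return OPEN_TAG.findall(html)

html = '<p>Hi<br /><a href="https://example.com">link</a><hr class="x"/></p>'
print(find_open_tags(html))
```

On this sample, only <p> and the <a ...> tag are returned. The sketch still treats void elements written without the slash, such as <br>, as opening tags, and a > inside an attribute value will end the match early.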