
Regex Challenge

A friend asked for a regex that matches a paragraph whose text is entirely upper-case, where that text may be wrapped in a nested hierarchy of tags. Some examples:

Matches:


<p class="abcdefg"><a href="1.htm"><span>HELLO THERE</span></a></p>
<p class="c8"><span class="c7">BY ERIC D. JAMES, MD</span></p>
<p style="border:1px solid red">HELLO DARLING</p>

Fail:


<p class="c8"><span class="c7">BY Eric James, MD</span></p>
<p style="border:1px solid red">Hello Darling</p>
<p class="abcdefg"><a href="1.htm"><span>HELLO THeRE</span></a></p>

I came up with the following expression:

/<p[^>]*>(<[^>]*>)*[^a-z<]+(<\/[^p][^>]*>)*<\/p[^>]*>/

It doesn’t handle tags interspersed with text or nested paragraph tags.
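As a quick check, here is a minimal Ruby sketch that runs the expression against the examples above plus one paragraph with tags interspersed with text. The helper name `uppercase_paragraph?` and the interspersed sample are made up for illustration; only the pattern and the six listed examples come from the post.

```ruby
# Hypothetical helper (not from the original post): true when the
# expression matches somewhere in the given HTML string.
def uppercase_paragraph?(html)
  pattern = /<p[^>]*>(<[^>]*>)*[^a-z<]+(<\/[^p][^>]*>)*<\/p[^>]*>/
  pattern.match?(html)
end

should_match = [
  '<p class="abcdefg"><a href="1.htm"><span>HELLO THERE</span></a></p>',
  '<p class="c8"><span class="c7">BY ERIC D. JAMES, MD</span></p>',
  '<p style="border:1px solid red">HELLO DARLING</p>'
]

should_fail = [
  '<p class="c8"><span class="c7">BY Eric James, MD</span></p>',
  '<p style="border:1px solid red">Hello Darling</p>',
  '<p class="abcdefg"><a href="1.htm"><span>HELLO THeRE</span></a></p>'
]

# Made-up sample: all upper-case, but a tag sits in the middle of the text.
interspersed = '<p>HELLO <b>THERE</b> FRIEND</p>'

(should_match + should_fail + [interspersed]).each do |html|
  label = uppercase_paragraph?(html) ? 'match' : 'fail '
  puts "#{label}  #{html}"
end
```

The six examples behave as listed. The interspersed sample is reported as a fail even though all of its text is upper-case: after the opening tags the pattern allows only a single run of text before the closing tags begin.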

Here’s a sample on Rubular.