A friend asked for a regex that matches a paragraph that contains only upper-case text inside a nested hierarchy of tags. Some examples:
Matches:
    <p class="abcdefg"><a href="1.htm"><span>HELLO THERE</span></a></p>
    <p class="c8"><span class="c7">BY ERIC D. JAMES, MD</span></p>
    <p style="border:1px solid red">HELLO DARLING</p>
Fail:
    <p class="c8"><span class="c7">BY Eric James, MD</span></p>
    <p style="border:1px solid red">Hello Darling</p>
    <p class="abcdefg"><a href="1.htm"><span>HELLO THeRE</span></a></p>
I came up with the following expression:
    /<p[^>]*>(<[^>]*>)*[^a-z<]+(<\/[^p][^>]*>)*<\/p[^>]*>/
It doesn’t handle tags interspersed with text or nested paragraph tags.
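
To check the expression against the examples, here is a minimal Python sketch. It assumes the pattern is read with a `*` quantifier after each character class and tag group, and it drops the `/.../` delimiters, so the slashes are no longer escaped. The names `UPPERCASE_PARAGRAPH`, `is_uppercase_paragraph`, `SHOULD_MATCH` and `SHOULD_FAIL` are made up for this illustration, and the last test string is my own example of text interspersed with tags.

```python
import re

# The expression from the post, split into commented pieces.
UPPERCASE_PARAGRAPH = re.compile(
    r"<p[^>]*>"          # opening <p ...> tag
    r"(<[^>]*>)*"        # any run of tags directly after it
    r"[^a-z<]+"          # text with no lower-case ASCII letters and no '<'
    r"(</[^p][^>]*>)*"   # closing tags whose name does not start with 'p'
    r"</p[^>]*>"         # closing </p>
)


def is_uppercase_paragraph(html: str) -> bool:
    """Return True if the pattern is found anywhere in the string."""
    return UPPERCASE_PARAGRAPH.search(html) is not None


SHOULD_MATCH = [
    '<p class="abcdefg"><a href="1.htm"><span>HELLO THERE</span></a></p>',
    '<p class="c8"><span class="c7">BY ERIC D. JAMES, MD</span></p>',
    '<p style="border:1px solid red">HELLO DARLING</p>',
]

SHOULD_FAIL = [
    '<p class="c8"><span class="c7">BY Eric James, MD</span></p>',
    '<p style="border:1px solid red">Hello Darling</p>',
    '<p class="abcdefg"><a href="1.htm"><span>HELLO THeRE</span></a></p>',
]

for sample in SHOULD_MATCH + SHOULD_FAIL:
    print(is_uppercase_paragraph(sample), sample)

# The limitation: text interspersed with tags is rejected even though
# every letter is upper-case, because the pattern allows only one text run.
print(is_uppercase_paragraph("<p>HELLO <b>THERE</b></p>"))
```

The pattern is left unanchored, as in the original, so `search` is used rather than `fullmatch`. For anything beyond this narrow case, a real HTML parser is likely a better fit than a regex.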