regex - How do I match this pattern in R


I have to match the first country name in the pattern below. The country names are given in upper case letters. I used the following code, but it matches all the countries.

'\\b[A-Z]{2,}.\\b'

E.g., in the pattern below, I want UNITED KINGDOM.

x = "~ London, Greater London ~ UNITED KINGDOM;~ Ottawa, Ontario ~ CANADA;~,~ AUSTRALIA;~,~ POLAND;~,~ USA"

This seems to work:

regmatches(x, regexpr('\\b[A-Z ]{2,}\\b', x)) # [1] "UNITED KINGDOM"

I added a space to make the character set [A-Z ]. Note that regexpr gets only the first match, while gregexpr gets all of them (similar to sub vs. gsub).

For more info, I recommend the official docs at ?regexpr.
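
To make the regexpr vs. gregexpr difference concrete, here is a minimal sketch. It assumes the country names are in upper case, as the question states. The variable names (pattern, first_match, all_matches) are just illustrative.

```r
# Example string with the country names in upper case (an assumption
# taken from the question's description)
x <- "~ London, Greater London ~ UNITED KINGDOM;~ Ottawa, Ontario ~ CANADA;~,~ AUSTRALIA;~,~ POLAND;~,~ USA"

# Two or more upper case letters or spaces, delimited by word boundaries
pattern <- "\\b[A-Z ]{2,}\\b"

# regexpr() locates only the first match in each string
first_match <- regmatches(x, regexpr(pattern, x))

# gregexpr() locates every match; regmatches() then returns a list with
# one character vector per input string, so take the first element
all_matches <- regmatches(x, gregexpr(pattern, x))[[1]]

print(first_match)
print(all_matches)
```

If x were a vector of several strings, regmatches with gregexpr would return one list element per string, so sapply(..., `[`, 1) would be one way to pull the first country from each.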

