XML, Violence, Nokogiri and Xpath

I love Xpath.

It makes XML easy to use and easy to query.  Gone are the days of parsing things with a SAX parser, unless you’re really hard up for control of your text.

Also, I love the Ruby Nokogiri Gem.  

XML is like violence - if it doesn’t solve your problems, you are not using enough of it.

- Nokogiri docs

But I do have to say that there is a lack of good examples and documentation for anything particularly advanced.  I found a working solution to my issue, but thought I’d paste here what I wanted to do versus what I ended up doing.

Given the following XML, 

<Container>
  <Set>
    <RecommendedCoverSong>Hurt by NiN - Johnny Cash</RecommendedCoverSong>
    <RecommendedOriginalSong>She Like Electric by Smoosh</RecommendedOriginalSong>
    <RecommendedDuetSong>Portland by Jack White and Loretta Lynn</RecommendedDuetSong>
    <RecommendedGroupSong>SoS by Abba</RecommendedGroupSong>
    <CoverSong>Kangaroo by Big Star - This Mortal Coil</CoverSong>
    <OriginalSong>Pick up the Change by Wilco</OriginalSong>
    <DuetSong>I am the Cosmos by Pete Yorn and Scarlett Johansen</DuetSong>
    <GroupSong>Kitties Never Rest by Rex or Regina</GroupSong>
  </Set>
</Container>

I’d like to grab the two elements whose tag names include “Cover”, and then operate on each of them.

Nokogiri’s Xpath support makes the first query easy:

node_xml = doc.xpath('Container/Set/*[contains(name(), "Cover")]')

I’ve selected all the child elements of Set (using *), and then used the Xpath function contains() to specify that “Cover” must be in the tag name.  This returns a NodeSet containing two Nokogiri XML nodes.

What I wanted to do was then select one of these elements based on a pattern in the tag name, using my favorite tool, Xpath.

But I just couldn’t get Nokogiri to give it to me, and several attempts ended up selecting way more than the one element I wanted (because the nodes in the NodeSet still keep their relationships with their parents).

songtypes = ['Cover', 'Original', 'Duet', 'Group']
songtypes.each do |song|

  # Select both elements whose tag name contains the current song type
  node_xml = doc.xpath("Container/Set/*[contains(name(), '#{song}')]")

  # I wanted to be able to do the following:
  #
  # favorite_song = node_xml.xpath('./*[contains(name(), "Recommended")]')
  # regular_song  = node_xml.xpath('./*[not(contains(name(), "Recommended"))]')
  #
  # or:
  #
  # favorite_song = node_xml.xpath('*[contains(name(), "Recommended")]')
  # regular_song  = node_xml.xpath('*[not(contains(name(), "Recommended"))]')

  # But instead I had to resort to a plain Ruby (Enumerable) solution
  regular_song  = node_xml.find { |node| node.name !~ /Recommended/ }
  favorite_song = node_xml.find { |node| node.name =~ /Recommended/ }

  # Do something with the songs here

end

 

I’m cross-posting this on Stack Overflow as a question, just in case any Nokogiri Xpath enthusiasts want to recommend a solution that doesn’t resort to find().

 

posted : Sunday, January 8th, 2012
