XML, Violence, Nokogiri and XPath
I love XPath.
It makes XML easy to use and easy to query. Gone are the days of parsing things with a SAX parser, unless you’re really hard up for control of your text.
Also, I love the Ruby Nokogiri gem.
XML is like violence - if it doesn’t solve your problems, you are not using enough of it.
- Nokogiri docs
But I do have to say that there is a lack of good examples and documentation for anything particularly advanced. I found a working solution to my issue, but thought I’d paste here what I wanted to do versus what I ended up doing.
Given the following XML,
```xml
<Container>
  <Set>
    <RecommendedCoverSong>Hurt by NiN - Johnny Cash</RecommendedCoverSong>
    <RecommendedOriginalSong>She Like Electric by Smoosh</RecommendedOriginalSong>
    <RecommendedDuetSong>Portland by Jack White and Loretta Lynn</RecommendedDuetSong>
    <RecommendedGroupSong>SoS by Abba</RecommendedGroupSong>
    <CoverSong>Kangaroo by Big Star - This Mortal Coil</CoverSong>
    <OriginalSong>Pick up the Change by Wilco</OriginalSong>
    <DuetSong>I am the Cosmos by Pete Yorn and Scarlett Johansen</DuetSong>
    <GroupSong>Kitties Never Rest by Rex or Regina</GroupSong>
  </Set>
</Container>
```
I’d like to grab the two elements that include “Cover” in their tag names, and then operate on each of them.
Nokogiri’s XPath support makes the first query easy:
```ruby
node_xml = doc.xpath('Container/Set/*[contains(name(), "Cover")]')
```
I’ve selected all the child elements of Set (using *), and then used the XPath function contains() to require that “Cover” appear in the element name. This returns a Nokogiri::XML::NodeSet holding two element nodes.
What I wanted to do next was select one of these elements based on a pattern in its tag name, using my favorite tool, XPath.
But I just couldn’t get Nokogiri to give it to me, and several attempts ended up selecting far more than the one element I wanted (because the nodes in the NodeSet still keep their relationships with their parents).
```ruby
require 'nokogiri'

# doc is the parsed sample document, e.g. doc = Nokogiri::XML(xml_string)
songtypes = ['Cover', 'Original', 'Duet', 'Group']
songtypes.each do |song|
  # Interpolate the current song type instead of hard-coding "Cover"
  node_xml = doc.xpath("Container/Set/*[contains(name(), '#{song}')]")

  # I wanted to be able to do the following:
  # favorite_song = node_xml.xpath('./*[contains(name(), "Recommended")]')
  # regular_song  = node_xml.xpath('./*[not(contains(name(), "Recommended"))]')
  # or
  # favorite_song = node_xml.xpath('*[contains(name(), "Recommended")]')
  # regular_song  = node_xml.xpath('*[not(contains(name(), "Recommended"))]')

  # But instead I had to resort to plain Ruby: NodeSet is Enumerable, so find works
  regular_song  = node_xml.find { |node| node.name !~ /Recommended/ }
  favorite_song = node_xml.find { |node| node.name =~ /Recommended/ }

  # Do something with the songs here
end
```
I’m cross-posting this on Stack Overflow as a question, just in case any Nokogiri XPath enthusiasts want to recommend a solution that doesn’t resort to find().
