Comments:"martinblech/xmltodict · GitHub"
URL:https://github.com/martinblech/xmltodict
xmltodict
xmltodict is a Python module that makes working with XML feel like you are working with JSON, as in this "spec":
>>> doc = xmltodict.parse("""
... <mydocument has="an attribute">
...   <and>
...     <many>elements</many>
...     <many>more elements</many>
...   </and>
...   <plus a="complex">
...     element as well
...   </plus>
... </mydocument>
... """)
>>>
>>> doc['mydocument']['@has']
u'an attribute'
>>> doc['mydocument']['and']['many']
[u'elements', u'more elements']
>>> doc['mydocument']['plus']['@a']
u'complex'
>>> doc['mydocument']['plus']['#text']
u'element as well'
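The same "spec" can be written as a plain script. This is a minimal Python 3 sketch, assuming a current xmltodict release is installed; the variable names (`root`, `attribute`, `repeated`, `text`) are illustrative only. It shows the three conventions at work: attributes get an `@` prefix, repeated child elements become a list, and text next to attributes lands under `#text`.

```python
import xmltodict

doc = xmltodict.parse("""
<mydocument has="an attribute">
  <and>
    <many>elements</many>
    <many>more elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
</mydocument>
""")

root = doc["mydocument"]

# Attributes are stored under keys prefixed with "@".
attribute = root["@has"]

# Sibling elements with the same tag are collected into a list.
repeated = root["and"]["many"]

# An element with both attributes and text keeps its text under "#text".
text = root["plus"]["#text"]

print(attribute, repeated, text)
```

On Python 3 the values are ordinary `str` objects, so the `u'...'` prefixes from the session above do not appear.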
It's very fast (Expat-based) and has a streaming mode with a small memory footprint, suitable for big XML dumps like Discogs or Wikipedia:
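>>> def handle_artist(_, artist):
...     print artist['name']
...     return True  # keep parsing; newer releases stop when the callback returns a false value
>>>
>>> xmltodict.parse(GzipFile('discogs_artists.xml.gz'),
...     item_depth=2, item_callback=handle_artist)
A Perfect Circle
Fantômas
King Crimson
Chris Potter
...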
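Here is a self-contained Python 3 sketch of the streaming mode, assuming a current xmltodict release. The tiny in-memory document is a hypothetical stand-in for a real dump such as `discogs_artists.xml.gz`. With `item_depth=2`, the callback fires once for every element directly below the root, so only one item needs to be held in memory at a time.

```python
import gzip
import io

import xmltodict

# Hypothetical stand-in for a large gzipped dump.
xml_bytes = (
    "<artists>"
    "<artist><name>A Perfect Circle</name></artist>"
    "<artist><name>Fantômas</name></artist>"
    "<artist><name>King Crimson</name></artist>"
    "</artists>"
).encode("utf-8")
compressed = gzip.compress(xml_bytes)

names = []


def handle_artist(path, artist):
    # path is the list of (tag, attributes) pairs leading to this item;
    # artist is the parsed content of one <artist> element.
    names.append(artist["name"])
    # Returning True tells the parser to continue with the next item.
    return True


# Any file-like object with a read() method can be parsed incrementally.
with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as stream:
    xmltodict.parse(stream, item_depth=2, item_callback=handle_artist)

print(names)
```

Collecting names in a list is only for demonstration; for a real dump you would process each item inside the callback and let it go.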
It can also be used from the command line to pipe objects to a script like this:
import sys, marshal
while True:
    _, article = marshal.load(sys.stdin)
    print article['title']
$ cat enwiki-pages-articles.xml.bz2 | bunzip2 | xmltodict.py 2 | myscript.py
AccessibleComputing
Anarchism
AfghanistanHistory
AfghanistanGeography
AfghanistanPeople
AfghanistanCommunications
Autism
...
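The consumer script above loops until `marshal.load` fails at the end of the input. A slightly more careful Python 3 sketch could stop cleanly on `EOFError`. The helper name `iter_records` is hypothetical, and the `(path, item)` record shape is an assumption inferred from the `_, article = marshal.load(...)` unpacking in that script. An in-memory buffer stands in for the pipe so the example runs on its own.

```python
import io
import marshal


def iter_records(stream):
    """Yield marshalled records from a binary stream until it is exhausted."""
    while True:
        try:
            yield marshal.load(stream)
        except EOFError:
            # marshal.load raises EOFError once no further record can be read.
            return


# Simulated pipe contents; in a real script pass sys.stdin.buffer instead.
pipe = io.BytesIO()
for title in ("AccessibleComputing", "Anarchism", "Autism"):
    record = ([("mediawiki", None), ("page", None)], {"title": title})
    marshal.dump(record, pipe)
pipe.seek(0)

titles = [article["title"] for _, article in iter_records(pipe)]
print(titles)
```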
Or just cache the dicts so you don't have to parse that big XML file again. You do this only once:
$ cat enwiki-pages-articles.xml.bz2 | bunzip2 | xmltodict.py 2 | gzip > enwiki.dicts.gz
And you reuse the dicts with every script that needs them:
$ cat enwiki.dicts.gz | gunzip | script1.py
$ cat enwiki.dicts.gz | gunzip | script2.py
...
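If you would rather skip the `gunzip` step, a script can open the cache itself. The sketch below is Python 3 and standard library only. It first writes a small hypothetical cache to a temporary directory (standing in for `enwiki.dicts.gz`, and again assuming `(path, item)` records), then reads it back one record at a time.

```python
import gzip
import marshal
import os
import tempfile

# Hypothetical records standing in for the real cached dump.
records = [
    ([("mediawiki", None), ("page", None)], {"title": "Anarchism"}),
    ([("mediawiki", None), ("page", None)], {"title": "Autism"}),
]

cache_path = os.path.join(tempfile.mkdtemp(), "enwiki.dicts.gz")

# Write once: a gzip-compressed sequence of marshalled records.
with gzip.open(cache_path, "wb") as cache:
    for record in records:
        marshal.dump(record, cache)

# Reuse many times: decompress and unmarshal record by record.
cached_titles = []
with gzip.open(cache_path, "rb") as cache:
    while True:
        try:
            _, page = marshal.load(cache)
        except EOFError:
            break  # end of the cache file
        cached_titles.append(page["title"])

print(cached_titles)
```

Keep in mind that the marshal format is specific to the Python version that wrote it, so a cache like this is best treated as disposable.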
You can also convert in the other direction, using the unparse() function:
>>> mydict = {
...     'page': {
...         'title': 'King Crimson',
...         'ns': 0,
...         'revision': {
...             'id': 547909091,
...         }
...     }
... }
>>> print unparse(mydict)
<?xml version="1.0" encoding="utf-8"?>
<page><ns>0</ns><revision><id>547909091</id></revision><title>King Crimson</title></page>
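A quick way to see what unparse() produces is to round-trip a dict. This is a Python 3 sketch, assuming a current xmltodict release; the names `xml` and `roundtrip` are illustrative. Note that XML carries no type information, so the integers come back as strings.

```python
import xmltodict

mydict = {
    "page": {
        "title": "King Crimson",
        "ns": 0,
        "revision": {
            "id": 547909091,
        },
    }
}

# Serialize the dict to an XML string.
xml = xmltodict.unparse(mydict)
print(xml)

# Parse it back; element text is always returned as a string.
roundtrip = xmltodict.parse(xml)
print(roundtrip["page"]["revision"]["id"])
```

On Python 3 the elements appear in the dict's insertion order, so the output order can differ from the session above.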
Ok, how do I get it?
You just need to install it with pip:
$ pip install xmltodict
There is an official Fedora package for xmltodict. If you are on Fedora or RHEL, you can do:
$ sudo yum install python-xmltodict
Donate
If you love xmltodict, consider supporting the author on Gittip.