martinblech/xmltodict · GitHub

URL:https://github.com/martinblech/xmltodict


xmltodict

xmltodict is a Python module that makes working with XML feel like you are working with JSON, as in this "spec":

>>> doc = xmltodict.parse("""
... <mydocument has="an attribute">
...   <and>
...     <many>elements</many>
...     <many>more elements</many>
...   </and>
...   <plus a="complex">
...     element as well
...   </plus>
... </mydocument>
... """)
>>>
>>> doc['mydocument']['@has']
u'an attribute'
>>> doc['mydocument']['and']['many']
[u'elements', u'more elements']
>>> doc['mydocument']['plus']['@a']
u'complex'
>>> doc['mydocument']['plus']['#text']
u'element as well'
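To make the mapping in the "spec" concrete, here is a minimal sketch of the same conventions (attributes under '@name', text under '#text', repeated elements collected into a list) written against the standard library only. It is not xmltodict's implementation: the names element_to_value and parse_sketch are hypothetical, it uses xml.etree.ElementTree rather than a hand-written Expat handler, and it ignores text that follows a child element.

```python
import xml.etree.ElementTree as ET


def element_to_value(elem):
    """Turn an element into a str, a dict, or None (hypothetical helper)."""
    result = {}
    # Attributes are stored under '@name' so they cannot clash with child tags.
    for name, value in elem.attrib.items():
        result['@' + name] = value
    for child in elem:
        value = element_to_value(child)
        if child.tag in result:
            existing = result[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                # Second occurrence of a tag: promote the entry to a list.
                result[child.tag] = [existing, value]
        else:
            result[child.tag] = value
    text = (elem.text or '').strip()
    if not result:
        # No attributes and no children: the element is just its text.
        return text or None
    if text:
        result['#text'] = text
    return result


def parse_sketch(xml_string):
    """Parse a whole document into nested dicts keyed by the root tag."""
    root = ET.fromstring(xml_string)
    return {root.tag: element_to_value(root)}


doc = parse_sketch("""
<mydocument has="an attribute">
  <and>
    <many>elements</many>
    <many>more elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
</mydocument>
""")
print(doc['mydocument']['and']['many'])
```

Because this sketch builds the whole tree before converting it, it only illustrates the output shape, not the speed or memory behaviour described for xmltodict.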

It's very fast (Expat-based) and has a streaming mode with a small memory footprint, suitable for big XML dumps like Discogs or Wikipedia:

>>> def handle_artist(_, artist):
...     print artist['name']
>>>
>>> xmltodict.parse(GzipFile('discogs_artists.xml.gz'),
...     item_depth=2, item_callback=handle_artist)
A Perfect Circle
Fantômas
King Crimson
Chris Potter
...
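The idea behind streaming mode can be sketched with the standard library: hand each element at a chosen depth to a callback as soon as it is complete, then drop it so memory use stays roughly constant. This is an illustration under stated assumptions, not how xmltodict does it. The function stream_items is hypothetical, it passes ElementTree elements rather than dicts, its callback arguments are simplified, and an in-memory buffer stands in for the gzipped dump.

```python
import io
import xml.etree.ElementTree as ET


def stream_items(source, item_depth, item_callback):
    """Call item_callback(path_tags, elem) for each element at item_depth."""
    stack = []  # currently open elements, root first
    for event, elem in ET.iterparse(source, events=('start', 'end')):
        if event == 'start':
            stack.append(elem)
            continue
        # 'end' event: elem is fully parsed and is the top of the stack.
        if len(stack) == item_depth:
            item_callback([e.tag for e in stack[:-1]], elem)
            if len(stack) > 1:
                # Detach the finished item from its parent to free memory.
                stack[-2].remove(elem)
        stack.pop()


names = []


def handle_artist(_, artist):
    names.append(artist.findtext('name'))


data = io.BytesIO(
    b"<artists>"
    b"<artist><name>King Crimson</name></artist>"
    b"<artist><name>Chris Potter</name></artist>"
    b"</artists>"
)
stream_items(data, item_depth=2, item_callback=handle_artist)
print(names)
```

With item_depth=2 the callback fires for each direct child of the root, which matches the one-record-per-artist layout in the example above.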

It can also be used from the command line to pipe objects to a script like this:

import sys, marshal
while True:
    _, article = marshal.load(sys.stdin)
    print article['title']
$ cat enwiki-pages-articles.xml.bz2 | bunzip2 | xmltodict.py 2 | myscript.py
AccessibleComputing
Anarchism
AfghanistanHistory
AfghanistanGeography
AfghanistanPeople
AfghanistanCommunications
Autism
...
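A reader for this kind of pipe can be sketched as a generator that keeps calling marshal.load until the stream runs out. The sketch below is Python 3 and makes a few assumptions: iter_marshaled is a hypothetical helper name, each record is taken to be a (path, item) pair as in the script above, and an in-memory buffer stands in for standard input. In a real Python 3 script you would pass sys.stdin.buffer, since marshal needs a binary stream.

```python
import io
import marshal


def iter_marshaled(stream):
    """Yield marshaled objects from a binary stream until it is exhausted."""
    while True:
        try:
            yield marshal.load(stream)
        except EOFError:
            # marshal signals the end of the stream with EOFError.
            return


# Build a small stand-in for the piped data: one (path, item) pair per record.
buf = io.BytesIO()
for title in ('Anarchism', 'Autism'):
    marshal.dump(([], {'title': title}), buf)
buf.seek(0)

titles = [article['title'] for _, article in iter_marshaled(buf)]
print(titles)
```

Catching EOFError lets the script exit cleanly at the end of the dump instead of ending with a traceback.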

Or just cache the dicts so you don't have to parse that big XML file again. You do this only once:

$ cat enwiki-pages-articles.xml.bz2 | bunzip2 | xmltodict.py 2 | gzip > enwiki.dicts.gz

And you reuse the dicts with every script that needs them:

$ cat enwiki.dicts.gz | gunzip | script1.py
$ cat enwiki.dicts.gz | gunzip | script2.py
...
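The same cache-once, reuse-many pattern can be sketched inside Python with gzip and marshal. This is only an illustration: write_cache, read_cache and the file name pages.dicts.gz are made up for the example, and the records are plain dicts rather than real xmltodict output.

```python
import gzip
import marshal
import os
import tempfile


def write_cache(path, items):
    """Write each item as one marshaled record into a gzip file."""
    with gzip.open(path, 'wb') as out:
        for item in items:
            marshal.dump(item, out)


def read_cache(path):
    """Yield the records back one at a time, without loading them all."""
    with gzip.open(path, 'rb') as src:
        while True:
            try:
                yield marshal.load(src)
            except EOFError:
                return


# Hypothetical cache location in a fresh temporary directory.
cache_path = os.path.join(tempfile.mkdtemp(), 'pages.dicts.gz')
write_cache(cache_path, [{'title': 'Anarchism'}, {'title': 'Autism'}])

cached_titles = [page['title'] for page in read_cache(cache_path)]
print(cached_titles)
```

Note that the marshal format is specific to the Python version that wrote it, so a cache like this is best read back by the same interpreter.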

You can also convert in the other direction, using the unparse() function:

>>> mydict = {
...     'page': {
...         'title': 'King Crimson',
...         'ns': 0,
...         'revision': {
...             'id': 547909091,
...         }
...     }
... }
>>> print unparse(mydict)
<?xml version="1.0" encoding="utf-8"?>
<page><ns>0</ns><revision><id>547909091</id></revision><title>King Crimson</title></page>
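For a feel of what the reverse direction involves, here is a rough standard-library sketch that walks a nested dict and builds elements from it, using the same '@attribute' and '#text' conventions. It is not xmltodict's unparse(): build_element and unparse_sketch are hypothetical names, no XML declaration is emitted, and child order simply follows the dict's insertion order.

```python
import xml.etree.ElementTree as ET


def build_element(tag, value):
    """Build one element from a str/number, a dict, or None."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            if key.startswith('@'):
                elem.set(key[1:], str(child))       # '@name' becomes an attribute
            elif key == '#text':
                elem.text = str(child)              # '#text' becomes element text
            elif isinstance(child, list):
                for item in child:                  # a list becomes repeated elements
                    elem.append(build_element(key, item))
            else:
                elem.append(build_element(key, child))
    elif value is not None:
        elem.text = str(value)
    return elem


def unparse_sketch(mapping):
    """Serialize a single-root dict to an XML string."""
    if len(mapping) != 1:
        raise ValueError('document must have exactly one root')
    (tag, value), = mapping.items()
    return ET.tostring(build_element(tag, value), encoding='unicode')


mydict = {
    'page': {
        'title': 'King Crimson',
        'ns': 0,
        'revision': {
            'id': 547909091,
        }
    }
}
xml_text = unparse_sketch(mydict)
print(xml_text)
```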

Ok, how do I get it?

You just need to

There is an official Fedora package for xmltodict. If you are on Fedora or RHEL, you can do:

$ sudo yum install python-xmltodict

Donate

If you love xmltodict, consider supporting the author on Gittip.

