
How can I use Python to transform MongoDB’s bsondump into JSON?



I have a very large number of .bson files from a MongoDB dump. I am running bsondump on the command line and piping its output to Python on stdin. This converts the BSON to 'JSON', but what arrives in Python is a plain string, and apparently not valid JSON.

For example an incoming line looks like this:

{ "_id" : ObjectId( "4d9b642b832a4c4fb2000000" ),
  "acted_at" : Date( 1302014955933 ),
  "created_at" : Date( 1302014955933 ),
  "updated_at" : Date( 1302014955933 ),
  "_platform_id" : 3,
  "guid" : 72106535190265857 }

Which I believe is Mongo Extended JSON.

When I read in such a line and do:

json_line = json.dumps(line)

I get:

"{ \"_id\" : ObjectId( \"4d9b642b832a4c4fb2000000\" ),
\"acted_at\" : Date( 1302014955933 ),
\"created_at\" : Date( 1302014955933 ),
\"updated_at\" : Date( 1302014955933 ),
\"_platform_id\" : 3,
\"guid\" : 72106535190265857 }\n"

Which is still <type 'str'>.
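
As a side illustration of why the result is still a string: json.dumps serializes a Python object into JSON text, so passing it a str yields a quoted, escaped JSON string literal, while json.loads is the call that parses text into objects. A minimal sketch follows, using a shortened sample line (an assumption for illustration) that contains only the two fields that are already plain JSON:

```python
import json

# Shortened sample: only the fields from the example that are already valid JSON.
line = '{ "_platform_id" : 3, "guid" : 72106535190265857 }\n'

# dumps() serializes: a str goes in, a quoted and escaped JSON string comes out.
dumped = json.dumps(line)
print(type(dumped))

# loads() parses: valid JSON text goes in, a dict comes out.
parsed = json.loads(line)
print(type(parsed), parsed["guid"])
```

The full line fails in loads only because ObjectId( ... ) and Date( ... ) are not JSON syntax.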

I have also tried

json_line = json.dumps(line, default=json_util.default)

(json_util is from pymongo; spam detection prevents me from posting a third link.)
This seems to produce the same output as the plain dumps call above. Using loads instead gives an error:

json_line = json.loads(line, object_hook=json_util.object_hook)
ValueError: No JSON object could be decoded

So, how can I transform this string of 10gen-style JSON into parseable JSON?
(the end goal is to stream tab separated data to another database)
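
One possible approach, sketched under assumptions rather than offered as a definitive answer: strip the ObjectId( ... ) and Date( ... ) wrappers with regular expressions so the line becomes strict JSON, then parse it with json.loads and join the chosen fields with tabs. The helper names to_strict_json and to_tsv and the field list are hypothetical, the patterns cover only the two wrappers shown in the sample line, and the code assumes bsondump emits one document per line.

```python
import json
import re

# Assumed patterns: they match only the two wrappers seen in the sample line.
OBJECT_ID = re.compile(r'ObjectId\(\s*("[0-9a-fA-F]{24}")\s*\)')
DATE = re.compile(r'Date\(\s*(-?\d+)\s*\)')


def to_strict_json(line):
    """Rewrite ObjectId( "hex" ) as "hex" and Date( N ) as the bare number N."""
    line = OBJECT_ID.sub(r'\1', line)
    return DATE.sub(r'\1', line)


def to_tsv(line, fields):
    """Parse one dumped document and return the chosen fields, tab separated."""
    doc = json.loads(to_strict_json(line))
    # Missing fields become empty cells instead of raising KeyError.
    return "\t".join(str(doc.get(name, "")) for name in fields)


# One document per line, taken from the example in the question.
sample = ('{ "_id" : ObjectId( "4d9b642b832a4c4fb2000000" ), '
          '"acted_at" : Date( 1302014955933 ), '
          '"_platform_id" : 3, "guid" : 72106535190265857 }\n')

row = to_tsv(sample, ["_id", "acted_at", "_platform_id", "guid"])
print(row)
```

To use this on the pipe, loop over sys.stdin and print to_tsv(line, fields) for each non-empty line. Two caveats: the substitution is purely textual, so a string value that happens to contain something like Date( 1 ) would also be rewritten, and dates come through as the raw millisecond numbers from the dump rather than as datetime objects.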


