[ccan] help with gracefully dealing with alloc failure in a recursive function

Daniel Burke dan.p.burke at gmail.com
Fri Oct 7 11:45:16 EST 2011


You are correct, single pass only. The code is the TTXML library in CCAN.

I like your ideas.

> Does anyone want to contemplate the relationship between the size of an
> XML document and the equivalent representation in data structures + data
> that they point to?

It would be easy enough to craft an XML file that could swing that either
way. Most XML files I endure have obscene amounts of whitespace indenting
the nodes, so in general usage I'd say the structures + data approach would
win by a considerable margin. In an indentation-free XML file it would be
close, with tag length (specifically the closing tag being wasted memory)
and attributes (with their optional values) nudging it either way. Lots of
small nodes would push the string representation of the XML document to win
out.

I like to think of "XML" and "efficient" as antipodes, otherwise I have
trouble getting out of bed in the morning :-)

-dan

On Fri, Oct 7, 2011 at 11:10 AM, Bart Grantham
<bart-ccan at bartgrantham.com> wrote:

> [resent to list]
>
>
> I have a crazy-talk idea inspired by your question.
>
> If your parser can take ownership of the lifetime of the XML data in memory
> then it might be possible to hack up the XML (by replacing the quotes at the
> end of strings with \0's, as an example) and then use it as a
> "pre-allocated" hunk of memory that already contains all your data; you just
> need to pull out the locations of this data (the attribs and tag data) and
> put them into pointers into the data structures that you're using to
> represent the XML tree.  It doesn't solve unwinding a failed alloc through a
> recursive call, but it reduces the likelihood of an alloc failing, as you'll
> only be alloc'ing the space to hold your XML node data structures, which will
> only contain pointers and other scalars to describe the pointed-to data.
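
A minimal sketch of the in-place idea above, for the attributes of a single start tag. The names struct attr and parse_attrs_inplace are hypothetical and are not taken from TTXML. The sketch assumes the caller owns a writable, NUL-terminated buffer, that every attribute has a quoted value with no whitespace around the '=', and it does not decode entities.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute record: both fields point into the caller's buffer. */
struct attr {
    char *name;
    char *value;
};

/*
 * Scan the attribute text of one start tag (everything after the tag name),
 * terminate each name and value in place, and record pointers to them.
 * Returns the number of attributes found, or -1 on input this sketch does
 * not handle or when more than max attributes are present.
 * No memory is allocated.
 */
static int parse_attrs_inplace(char *p, struct attr *out, int max)
{
    int n = 0;

    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0' || *p == '>' || *p == '/')
            return n;               /* end of the tag */
        if (n == max)
            return -1;

        out[n].name = p;
        while (*p != '\0' && *p != '=' && !isspace((unsigned char)*p))
            p++;
        if (*p != '=')
            return -1;              /* value-less attributes not handled */
        *p++ = '\0';                /* overwrite '=' to terminate the name */

        if (*p != '"' && *p != '\'')
            return -1;
        char quote = *p++;
        out[n].value = p;
        p = strchr(p, quote);
        if (p == NULL)
            return -1;              /* unterminated value */
        *p++ = '\0';                /* overwrite the closing quote */
        n++;
    }
}
```

Called on a writable copy of ` href="x.html" id='top'>`, this would leave out[0] pointing at "href" and "x.html" and out[1] at "id" and "top", all inside the original buffer. The only storage that still has to be allocated is the array of struct attr itself.
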
>
> Does anyone want to contemplate the relationship between the size of an XML
> document and the equivalent representation in data structures + data that
> they point to?  I suspect that with 4/8 byte pointers pointing to everything
> vs. the verbosity of XML syntax it probably comes out close to even.  I ask
> because I wonder if one could reason about the maximum amount of memory
> required to store an arbitrary XML document of a given size.
>
> BG
>
> On Thu, Oct 6, 2011 at 2:54 PM, Daniel Burke <dan.p.burke at gmail.com> wrote:
>
>> I'm wondering what a commonly acceptable method of handling this failure
>> would be, my Google-Fu's not giving me answers I like, so I'm turning to
>> the collective wisdom of this list. I suspect my knowledge of other
>> languages is poisoning my thought process.
>>
>> So, I'm parsing XML in a recursive function, with a structure that contains
>> the relevant state of the task. My initial plan is to add a variable to the
>> structure named "failed", and if an alloc fails I set it, and then test this
>> after every function call that can fail, trying to bail out to the head
>> function ASAP, where I call the free function on the partial tree I've
>> created so far.
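
A minimal sketch of the "failed" flag plan described above, under some assumptions: struct parser, xalloc, build and parse are hypothetical names, build is only a stand-in for the real recursive parse, and allocs_left exists purely to simulate an allocation failure. The point it tries to show is that if every node is linked into the tree before recursing, the recursive function needs one check per loop and only the head function has to free anything.

```c
#include <stdlib.h>

struct node {
    struct node *child;
    struct node *next;
};

/* Hypothetical parser state; allocs_left simulates running out of memory. */
struct parser {
    int failed;
    int allocs_left;
};

/* Allocation wrapper: sets the sticky flag once and refuses work afterwards. */
static void *xalloc(struct parser *ps, size_t size)
{
    void *p = NULL;

    if (!ps->failed && ps->allocs_left > 0) {
        ps->allocs_left--;
        p = calloc(1, size);
    }
    if (p == NULL)
        ps->failed = 1;
    return p;
}

static void free_tree(struct node *n)
{
    while (n != NULL) {
        struct node *next = n->next;
        free_tree(n->child);
        free(n);
        n = next;
    }
}

/* Stand-in for the recursive parse: `width` siblings per level, `depth` levels. */
static struct node *build(struct parser *ps, int depth, int width)
{
    struct node *head = NULL;

    if (depth == 0)
        return NULL;
    for (int i = 0; i < width && !ps->failed; i++) {
        struct node *n = xalloc(ps, sizeof *n);
        if (n == NULL)
            break;
        n->next = head;             /* link first, so nothing is orphaned */
        head = n;
        n->child = build(ps, depth - 1, width);
    }
    return head;                    /* possibly partial; the caller decides */
}

/* Head function: the only place that tests the flag and frees the tree. */
static struct node *parse(struct parser *ps, int depth, int width)
{
    struct node *root = build(ps, depth, width);

    if (ps->failed) {
        free_tree(root);
        return NULL;
    }
    return root;
}
```

The trade-off is that the recursion still unwinds one frame at a time, but because a partial subtree is always returned and attached, there is no per-call cleanup code to write.
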
>>
>> This puts a lot of ugly checking code in what is presently on the clean
>> side of what I typically write. In most other languages I'd raise an
>> exception and deal with the failure once.
>>
>> I've a few existing Linux Kernel style Goto-Exceptions to keep all the
>> error code together, and not spread throughout the meat of the functions,
>> however my understanding is that it's a Bad Thing (tm) to goto across
>> functions, as depending on compiler/flags there's going to have to be some
>> stack twiddling, and while my inner assembly programmer says just store SI
>> in the data structure, every other bone in my body is telling me this is a
>> capital offense.
>>
>> Should I bite the bullet and turn my pretty 1 page function into a 3 page
>> function with lots of checking, or is there a clever/easy way to quickly
>> bail?
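
Standard C does provide a sanctioned way to jump across functions: setjmp/longjmp from <setjmp.h>. The following is only a sketch under assumptions, not a description of how TTXML does it: struct parser, new_node, build and parse are hypothetical names, build stands in for the real recursive parse, and allocs_left simulates an allocation failure. Because longjmp abandons the intermediate stack frames, every node has to be reachable from the state structure before the next allocation is attempted.

```c
#include <setjmp.h>
#include <stdlib.h>

struct node {
    struct node *child;
    struct node *next;
};

/* Hypothetical parser state; allocs_left simulates running out of memory. */
struct parser {
    jmp_buf on_error;               /* where to land when an alloc fails */
    struct node *root;              /* everything built so far hangs off this */
    int allocs_left;
};

static void free_tree(struct node *n)
{
    while (n != NULL) {
        struct node *next = n->next;
        free_tree(n->child);
        free(n);
        n = next;
    }
}

/* Allocate a node and link it at *link, or jump straight back to parse(). */
static struct node *new_node(struct parser *ps, struct node **link)
{
    struct node *n = NULL;

    if (ps->allocs_left > 0) {
        ps->allocs_left--;
        n = calloc(1, sizeof *n);
    }
    if (n == NULL)
        longjmp(ps->on_error, 1);   /* does not return */
    n->next = *link;
    *link = n;
    return n;
}

/* Stand-in for the recursive parse; note there are no error checks here. */
static void build(struct parser *ps, struct node **link, int depth, int width)
{
    if (depth == 0)
        return;
    for (int i = 0; i < width; i++) {
        struct node *n = new_node(ps, link);
        build(ps, &n->child, depth - 1, width);
    }
}

/* Head function: the single place where failure is handled. */
static struct node *parse(struct parser *ps, int depth, int width)
{
    ps->root = NULL;
    if (setjmp(ps->on_error) != 0) {
        free_tree(ps->root);        /* reached via longjmp from new_node */
        ps->root = NULL;
        return NULL;
    }
    build(ps, &ps->root, depth, width);
    return ps->root;
}
```

Two caveats apply: local variables in parse that are modified after setjmp would need to be volatile to have reliable values after the jump, and longjmp is only valid while the function that called setjmp is still active.
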
>>
>>
>> regards,
>>
>> dan
>> --
>> "Within C++, there is a much smaller and cleaner language struggling to
>> get out"
>> --Bjarne Stroustrup
>>
>>
>> _______________________________________________
>> ccan mailing list
>> ccan at lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/ccan
>>
>>