[PATCH 4/4] DTC: Begin the path to sane literals and expressions.

Fri Oct 26 23:07:49 EST 2007

So, like, the other day David Gibson mumbled:
> 
> Ah... I think I see the source of our misunderstanding.  Sorry if I
> was unclear.  I'm not saying that the version token would be
> invisible to the parser, just that it would be recognized by the lexer
> first.

Ah! Right.  OK, I see what you are saying now.

> The nice thing about having a token, is that if necessary we can
> completely change the grammar for each version, without having to have
> tangled rules that have to generate yyerror()s in some circumstances
> depending on the version variable.  The alternate grammars can be
> encoded directly into the yacc rules:
> 	startsymbol : version0_file
> 		    | V1_TOKEN version1_file
> 		    | V2_TOKEN version2_file
> 		    ;

Hmmm...  Now that I see that your symbol is still in the grammar,
I can see this part as well.  OK.  I'll buy it.

> > > I'm also inclined to leave the syntax for bytestrings as it is, in
> > 
> > Why?  Why not be allowed to form up a series of expressions
> > that make up a byte string? Am I missing something obvious here?
> 
> Because part of the point of bytestrings is to provide representation
> for binary data.  For a MAC address, say
> 	[0x00 0x0a 0xe4 0x2c 0x23 0x1f]
> is way bulkier than
> 	[000ae42c231f]

No, I think you misuderstand what I was after.  I'm not after the
the latter [000ae4...].  In that case, there would be multiple
expressions, each no bigger than 8 bits wide:

    [ expr expr expr    expr  expr      expr ]
    [ 0x00   10 0x4  0x20+12 '0'+3  0x20 - 1 ]

or whatever seemed appropriate.  It would not be one giant value.

> And in bytestring context, I suspect having every expression result be
> truncated to bytesize will be way more of a gotcha than in cell
> context.

Which is why we run a semantic checking as well and warn on
values not fitting in container sizes.

> I suspect we can get the expression flexibility we want here by
> providing the right operators to act *on* bytestrings, rather than
> within bytestrings.

That too.  No problem.  I suspect some may be functional, though.
Haven't thought about that a bunch yet.  I just want to get
basis stuff in first.

> Hrm.  I think just exprval or intval would be better.  Actually
> probably intval, since last we spoke I though we were planning on
> having expressions of string and bytestring types as well.

Except I think we want more generalized than that.

> Incidentally, there's another problem here: we haven't solved the
> problem about having to allow property names with initial digits.

I know.

> That's a particular problem here, because although we can make
> literals scanned in preference to propnames of the same length, in
> this case
> 	0x1234..0xabcd
> Will be scanned as one huge propname.

I know.  White space is mandatory right now.

> This might work for you at the moment, if you've still got all the
> lexer states, but I was really hoping we could ditch most of them with
> the new literals.

Which is really why they are all still there.  Longer term,
I want to _quit_ supporting "version 0" and remove the cruft...

> But you haven't actually addressed my concern about this.  Actually
> it's worse that I said then, because
> 	<0x10000000 -999>
> is ambiguous.  Is it a single subtraction expression, or one literal
> cell followed by an expression cell with a unary '-'?

Gah.

Paren'ed expressions may be the thing to do.
How do you feel about comma separation?

Anyone else care to chime in?

> > > > +unsigned int dts_version = 0;

> Yeah, I figured this out after.  Youch, an even tighter and harder to
> follow coupling between lexer and parser execution order.  I can think
> of at least two better ways to do this.

I'm listening... :-)

> 1) handle d# b# etc. at the lexer lexel, with a regex like
> (d#{WS}*[0-9]+).  Strictly speaking that changes the language, but I
> don't think anyone's been insane enough to do something like "d#
> /*stupid comment*/ 999".  That would remove the whole ugly
> opt_cell_base tangle from the grammar.

That seems like it could work...

> 2) Have the lexer just pass up literals as strings, and let the parser
> do the conversion to integer, based on the grammatical context.  I
> think this is preferable because it has other advantages: we can do
> the distinction between 64-bit values for memreserve and 32-bit values
> for cell at the grammatical level.  It can also be used to handle the
> propname/literal ambiguity without lexer states (I had a patch a while
> back which removed the MEMRESERVE and CELLDATA lex states using this
> technique).

I'm not so keen on that approach, I don't think.

> > The same call to set_dts_version() as any other case.
> 
> Erm... which same call to set_dts_version()?  Surely not the one in
> the parser..

I'm clearly not understanding your point, I'm afraid.  There are
static default values here:

    /*
     * DTS sourcefile version.
     */
    unsigned int dts_version = 0;
    unsigned int expr_default_base = 10;

And there is a call to set_dts_version() made when any DTS file
is parsed, which happens before any -O option is even handled.

What am I missing?

jdl