[PATCH v2 2/3] dtc: Support character literals in cell lists

Fri Sep 9 07:20:16 EST 2011

On Wed, Sep 7, 2011 at 8:47 PM, David Gibson
<david at gibson.dropbear.id.au> wrote:
> On Wed, Sep 07, 2011 at 04:15:39PM -0700, Anton Staaf wrote:
>> With this patch the following property assignment:
>>
>>     property = <0x12345678 'a' '\r' 100>;
>>
>> is equivalent to:
>>
>>     property = <0x12345678 0x00000061 0x0000000D 0x00000064>
>>
>> Signed-off-by: Anton Staaf <robotboy at chromium.org>
>> Cc: Jon Loeliger <jdl at jdl.com>
>> Cc: David Gibson <david at gibson.dropbear.id.au>
>> ---
>>  Documentation/dts-format.txt |    2 +-
>>  Documentation/manual.txt     |    7 ++++---
>>  dtc-lexer.l                  |   14 ++++++++++++++
>>  dtc-parser.y                 |    4 ++++
>>  4 files changed, 23 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/dts-format.txt b/Documentation/dts-format.txt
>> index a655b87..eae8b76 100644
>> --- a/Documentation/dts-format.txt
>> +++ b/Documentation/dts-format.txt
>> @@ -33,7 +33,7 @@ Property values may be defined as an array of 32-bit integer cells, as
>>  NUL-terminated strings, as bytestrings or a combination of these.
>>
>>  * Arrays of cells are represented by angle brackets surrounding a
>> -  space separated list of C-style integers
>> +  space separated list of C-style integers or character literals.
>>
>>       e.g. interrupts = <17 0xc>;
>>
>> diff --git a/Documentation/manual.txt b/Documentation/manual.txt
>> index f8a8a7b..940fd3d 100644
>> --- a/Documentation/manual.txt
>> +++ b/Documentation/manual.txt
>> @@ -213,10 +213,11 @@ For example:
>>
>>  By default, all numeric values are hexadecimal.  Alternate bases
>>  may be specified using a prefix "d#" for decimal, "b#" for binary,
>> -and "o#" for octal.
>> +and "o#" for octal.  Character literals are supported using the C
>> +language character literal syntax of 'a'.
>
> Hrm, this paragraph is clearly referring to the old-style dts-v0
> syntax (hex is not the default, these days) - which may well mean the
> whole document is hopelessly out of date.  And I think the char
> literals should only be processed in dts-v1 mode.

Fair enough, would it make sense for me to just remove this
modification to manual.txt and leave the changes in place in
dts-format.txt?

>> -Strings support common escape sequences from C: "\n", "\t", "\r",
>> -"\(octal value)", "\x(hex value)".
>> +Strings and character literals support common escape sequences from C:
>> +"\n", "\t", "\r", "\(octal value)", "\x(hex value)".
>>
>>
>>  4.3) Labels and References
>> diff --git a/dtc-lexer.l b/dtc-lexer.l
>> index e866ea5..d4f9eaa 100644
>> --- a/dtc-lexer.l
>> +++ b/dtc-lexer.l
>> @@ -29,6 +29,8 @@ PROPNODECHAR        [a-zA-Z0-9,._+*#?@-]
>>  PATHCHAR     ({PROPNODECHAR}|[/])
>>  LABEL                [a-zA-Z_][a-zA-Z0-9_]*
>>  STRING               \"([^\\"]|\\.)*\"
>> +CHAR_LITERAL '[^\\']'
>> +CHAR_ESCAPED '\\([^']+|')'
>
> I'd prefer a single regex here, of '[^']+', and then check that it is
> indeed a single character when you process it.  An aside as to why...

Done, though I made it a slightly more complex regex that can also match '\''.

> So, when first working with lexer/parser generators, it is very
> tempting to make the token regexes as tight as possible, so that as
> soon as lex has recognized the token, you know it is 100%
> syntactically valid.  The problem with this is that when the user
> enters something that looks kinda like the token in question, but
> isn't quite right, then flex will re-interpret those characters as
> some other tokens.  That usually results in a deeply incomprehensible
> syntax error from the parser.

Good point.

> I think the better approach, therefore, is to make the token regexes
> as wide as they reasonably can be without ambiguity, then do extra
> validation once the token is recognized by flex.  That way you can
> produce a much more helpful "Bad format for token X" type error
> message.  Currently dtc is a bit mixed in the extent it does this, but
> that's the direction I'd prefer to move.

Done.

>>  WS           [[:space:]]
>>  COMMENT              "/*"([^*]|\*+[^*/])*\*+"/"
>>  LINECOMMENT  "//".*\n
>> @@ -109,6 +111,18 @@ static int pop_input_file(void);
>>                       return DT_LITERAL;
>>               }
>>
>> +<V1>{CHAR_LITERAL}   {
>> +                     yylval.byte = yytext[1];
>> +                     DPRINT("Character literal: %s\n", yytext);
>> +                     return DT_BYTE;
>> +             }
>> +
>> +<V1>{CHAR_ESCAPED}   {
>> +                     yylval.byte = get_escape_char_exact(yytext+1, yyleng-2);
>> +                     DPRINT("Character literal escaped: %s\n", yytext);
>
> So, I'd prefer the error handling to be done here - that is if there's
> a bad escape sequence or more than one character, report it here.  For
> now, a die() will do, although longer term I'd prefer a not
> immediately fatal message (allowing the rest of the file to be checked
> for errors) with an error triggered after the parse is complete.

I ended up (as you'll see shortly) doing this in dtc-parser.y and
using the print_error function there so I didn't have to just die out.
 So longer term can be now, instead of...

Thanks,
    Anton

> --
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                | _way_ _around_!
> http://www.ozlabs.org/~dgibson
>