C++ Feature Whitelist/Blacklist

Brendan Higgins brendanhiggins at google.com
Thu Nov 3 11:02:27 AEDT 2016


Summary
=======

To sum up the discussion below so far for everyone who is not Patrick and
me:
    * it sounds like there is some discussion to be had on how strong a
prohibition should be.
    * it seems that Patrick and I have general agreement on (just not
necessarily exact
      strength of rule):
        - Variables of class type with static storage duration (including
          constants) are restricted; plain old data (POD) types are allowed
        - Implicit conversion constructors and operators
        - Multiple inheritance (I made a typo that I think was construed to
          mean I did not want any multiple inheritance, including interfaces,
          when I wanted to allow only multiple inheritance of interfaces)
    * issues still under discussion
        - Non-const-non-r-value references
        - Exceptions

This is not to be construed to mean that any of the above items are closed or
that discussion has concluded; it is only intended as a convenience for others
following the conversation, noting where I think Patrick and I have come to
agreement on something or are continuing to discuss it. I certainly have no
formal authority in the project and only wish to make it as easy as possible
for people to follow an important discussion. I would encourage anyone who
feels strongly about anything discussed so far, or not discussed so far, to
share his or her viewpoints on the matter.

Intro
------

With regard to what you said about the philosophy of style guides, and the
Google style guide in particular: I do not disagree.

> I strongly feel that if we simply leap to a style guide targeting a
> monolithic N-MLOC codebase, we are taking on solutions that are not
> relevant to our own problems.


I agree in principle; we should not blindly reuse a style guide that was
intended for a completely different type of project. However, I do think
that some sort of style guide is essential. The intention of the project is
to have as many people as possible contributing; thus it should be easy for
them to contribute without getting disheartened. I think a style guide is
essential to creating this environment because it sets expectations for
what code should look like. Code review is inevitably the place where a
newcomer learns a great deal about how a project works and how to
contribute, but having a set of basic expectations reduces the pain on all
sides for the first few code reviews; this is important for not burning out
first-time contributors.

> My expectation is that OpenBMC is a
> collection of small, self-contained repositories on the order of 5 -
> 50kLOC each.  The dependencies between repositories (interfaces) are
> purposefully segmented via the DBus interfaces, which represent a very
> different problem from a monolitic codebase where a new dependency is
> simply an #include away.  With small repositories, the code itself can
> "fit in the head" of a maintainer or active contributor fairly easily[1].


I do not want to go too far into this because I do not want to have an
extended conversation about microservices, as it would be off topic;
however, I will say that a microservice architecture does not protect
against bad coding practices, nor is it intended to. Being able to come up
with a good mental model for a project comes solely from good abstractions
and thus clean code. The goal should not be to make it possible for a small
number of people to specialize in and understand a handful of repos, but to
allow anyone to jump in and understand any part of the project with only a
modicum of effort; consistency is required to achieve this, and consistency
is built on rules.

> I do suspect that the majority of our "rules" will evolve around the
> DBus interfaces, which are fairly unique to our project.


Again, I agree in principle, but I would caution against relying too
heavily on concepts particular to a library or framework. I see nothing
wrong with a microservice architecture, but I do not really see ours as
unique, and I would hope to avoid making it overly unique, in the sense
that these ideas are not new and we should try our best to keep it that
way. Also, yes, I fully expect the rules by which we govern the project to
change over time, but that does not mean we should not write a style guide.

> My typical philosophy regarding coding practices tends to be fairly
> pragmatic.  I don't tend to see practices as "black" and "white" but
> more along a scale of how much justification you need to have to use a
> practice (more on this in the specifics, maybe).


I agree entirely. I made a response to this effect to the email sent by
Brad (I forwarded the contents of his email to the list in my email this
morning). I used the terms whitelist and blacklist purely as a convention
to refer to what we are currently discussing: the stronger rules that will
require more discussion and likely solicit stronger opinions. Although
these rules are stronger than the others, I do not expect them to be
without exception (except maybe for exceptions ;-), but I will get to that
later).

Re: General Code Philosophy
---------------------------------------

>     1) Conciseness:
>         - Did you use far more lines of code to solve the problem than
>           necessary?  Those are lines we have to maintain.


I tend to disagree that this should be the most important thing. I think
"easy to read" is far more important than conciseness; it is true that the
longer solution creates more lines of code to maintain, but if they are
easy to understand, they will take less effort to maintain. Still,
conciseness does have value; I would agree that if two solutions are
equally easy to read, then the shorter one is probably better, assuming
there are no other detractors, but I would also argue that in that case it
is actually the easier one to read.

>     2) Modern, widely-accepted practices:
>         - This is already spelled out in the 'contributing' doc, having
>           priority over any specific style statements. [2]


I certainly cannot argue with that. Keeping concepts and practices
generally consistent with the greater programming community has a lot of
value.

>     3) Performance:
>         - Favoring in order { Code size, CPU utilization, memory size }
>           due to the unique concerns of the OpenBMC project and current
>           technologies used.
>         - Macro-level optimization that does not violate conciseness and
>           clarity:
>             - Are the algorithms used of the appropriate algorithmic
>               complexity?
>             - Are you avoiding practices that would contribute to large
>               generated code sizes?


This is a pretty tough one; no one can dispute that performance at some
level is always critically important, especially when resources are scarce,
but it is also true that writing good code in any context is always a
balance of performance and development cost.

> Quantifying "code smell" into a set of non-ambiguous rules is a hard
> problem


Absolutely, and that is not my aim. Style guides are tools to make it
easier to have conversations about code. They set the base expectations and
provide a structure for people to discuss code in the context of. If one
disagrees with a practice, she has a document that she can make arguments
against (it is hard to argue with a loose set of practices and thus hard to
change them).

> I would
> prefer we start with industry recognized experts opinion[3] and then
> document deviations specific to OpenBMC as they come up.


I agree with trying to follow generally accepted practices, and the
CppCoreGuidelines seem to encode them pretty well, but such guidelines are
not really the same as a style guide, and I see no reason why they preclude
us from writing one. I would, however, caution against citing the opinions
of "industry recognized experts" as sole justification for something. If
the opinion itself is backed by good reasoning or well-supported evidence,
that is fine, but it is important not to discount someone just because he
disagrees with someone important. I am not saying you did that anywhere;
all of the arguments you made below either stand on their own or cite
well-thought-out, self-contained arguments. It is just something to keep in
mind.

Re: Variables of class type with static storage duration (including
constants)
-----------------------------------------------------------------------------------------------------

> This proposal is a key example on why it is hard to quantify these types
> of rules.  Everyone agrees with a general principle of "reducing
> globals", but as spelled out this proposal would prohibit
> initializer-list construction, std::string, and the pattern within our
> IPMI plugins of "register on load".  These are all reasonable exceptions
> that require little justification in my mind.


It sounds like we agree in principle. My proposal would prohibit
std::string constants, but it would not prevent the general use of
std::string. Initializer-list construction would be prohibited only for
variables of class type with static storage duration (which the rule
prohibits outright anyway), not for C-style arrays or for general use. As
currently written, yes, the ipmid plugin registration code would not
conform, but it would be possible to write it in a way that does conform;
this is actually something I am working on in my changes to ipmid, but that
is a discussion for elsewhere. (I would like to be clear that this is not
meant to encourage global pointers; in general, those would also require a
good justification.)
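To sketch what a conforming registration might look like (all names here
are hypothetical, not the actual ipmid code): the registry can live behind
an accessor function so that it is constructed on first use rather than
during static initialization, and a plugin can trigger registration by
initializing a POD global with the result of a registration call.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch: the registry is a function-local static, so it is
// constructed on first use instead of during static initialization (which
// also sidesteps cross-translation-unit initialization-order issues).
std::map<std::string, std::function<void()>>& handlerRegistry()
{
    static std::map<std::string, std::function<void()>> registry;
    return registry;
}

// A plugin calls this at load time; the bool return value lets it bind the
// result to a POD global as a registration hook.
bool registerHandler(const std::string& name, std::function<void()> handler)
{
    handlerRegistry()[name] = std::move(handler);
    return true;
}
```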

Re: Implicit conversion constructors and operators
-------------------------------------------------------------------

> This would prohibit 'unique_ptr<T>::operator bool()' and
> 'string::string(const char*)', which are part of the STL and exist
> for concise syntax.  I do agree with this in most cases but there are
> cases where they both obvious and crisper.  A non-obvious conversion
> should be prohibited.


Maybe I should have called out more clearly that I do not plan on
prohibiting widely used parts of the standard library that do these types
of things. In any case, the smart pointers are special in that they
maintain the API that exists for raw pointers, and it is common practice to
check for nullptr by simply treating the pointer as a boolean value.
Nevertheless, the vast majority of custom APIs should not use this feature,
and you seem to agree with this point in general anyway.
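To illustrate what the rule would mean for a custom API (the Sensor type
is made up for the example): marking a single-argument constructor explicit
keeps the conversion visible at the call site.

```cpp
class Sensor
{
  public:
    // Without explicit, `readSensor(42)` would silently construct a
    // temporary Sensor from an int. Marking the constructor explicit
    // forces the caller to spell the conversion out.
    explicit Sensor(int id) : id_(id) {}

    int id() const { return id_; }

  private:
    int id_;
};

int readSensor(const Sensor& s)
{
    return s.id();
}

// readSensor(42);          // error: no implicit int -> Sensor conversion
// readSensor(Sensor{42});  // ok: the conversion is explicit at the call site
```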

Re: Multiple Inheritance
--------------------------------

> Composition is fairly common practice in C++ which uses multiple
> inheritance.  The sdbusplus bindings, as an example, have a specific
> template to aid in the composition of N dbus interfaces into a single
> C++ class. [4]


Your argument seems to be for allowing multiple inheritance *of
interfaces*, which I meant to allow. I said "unless at most one of the
parent classes is an interface, i.e. only has pure virtual methods"; it
should have read "unless at most one of the parent classes is *not* an
interface, i.e. only has pure virtual methods". You can inherit from as
many interfaces as you want, but only one (or no) non-interface.
Technically, your sdbusplus bindings are not interfaces, but I could see an
exception for that, because they are fairly special constructs: from the
standpoint of an implementor, *it quacks like an interface*. Also, the CCG
sections you cited agree with this standpoint. C.135 specifically calls out
interfaces, and C.136 seems to discourage the behaviour in its reasoning
section.
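To make the restated rule concrete (class names are illustrative): any
number of pure-virtual interface bases is fine, plus at most one
non-interface base.

```cpp
// Pure-virtual interfaces: any number of these may be inherited.
class Readable
{
  public:
    virtual ~Readable() = default;
    virtual int read() = 0;
};

class Writable
{
  public:
    virtual ~Writable() = default;
    virtual void write(int value) = 0;
};

// A non-interface base: the rule allows at most one of these.
class DeviceBase
{
  public:
    int value = 0;
};

// Conforming: one non-interface base plus two interfaces.
class Device : public DeviceBase, public Readable, public Writable
{
  public:
    int read() override { return value; }
    void write(int v) override { value = v; }
};
```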

Re: Non-const-non-r-value references
---------------------------------------------------

> I find this especially unnecessary, not recommended anywhere outside of
> the GSG, and having a crippling effect to the language.  C++ is not a
> procedural / OO programming language; it is also a functional
> programming language.  The bulk of the STL is to facilitate a
> functional programming style and many of the C++11 features (lambdas,
> constexpr functions, etc.) take large inspiration from FP.  Eliminating
> l-value reference makes FP techniques exceptionally harder and
> syntactically clumsier.


Again, I explicitly allowed their use where necessary to maintain
consistency with, or to utilize, the STL. But in general, const references,
r-value references, and pointers together fulfill the same purposes and
make it easier to understand what the code is doing.

A const reference implies nothing will happen to the data: the function
will not (outside of exceptional circumstances) store the reference and
will not modify it; basically, the data stays in the function. No special
semantics are required when calling the function because there is nothing
to consider. An r-value reference implies that the data will be consumed:
you do not own it anymore, and it does this in a beautifully explicit way,
std::move, which means you know the data no longer belongs to you when you
call the function. Providing a pointer requires you either to already be
working with a pointer, which is clear because pointers have different
semantics, or to take the address explicitly; either way, you know that the
function is free to edit what you passed in. It should not store it,
because it does not own it (otherwise use a smart pointer), but it is free
to edit it in the context of the function call.

I do not see this standing in the way of FP at all. In FP you are usually
dealing with immutable data types, which you should use as much as
possible, i.e. const references; in the case that copying is expensive, you
have r-value references and pointers. Although the CCG does say non-const
references are allowed, it admits that they can lead to bugs; see the note
under F.17.
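The three conventions I describe above look like this in practice (the
function names are just for illustration):

```cpp
#include <string>
#include <utility>
#include <vector>

// const reference: the function only observes the argument; nothing to
// consider at the call site.
std::size_t count(const std::vector<std::string>& items)
{
    return items.size();
}

// r-value reference: the function consumes the argument; the caller must
// write std::move, making the transfer of ownership visible.
std::vector<std::string> consume(std::vector<std::string>&& items)
{
    return std::move(items);
}

// pointer: the function may modify the argument in place, and the caller
// sees &items at the call site.
void append(std::vector<std::string>* items, std::string value)
{
    items->push_back(std::move(value));
}
```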

> A very common recommendation in "Exceptional Modern C++" is to define a
> function like:
>     template <typename T> foo(T&&)
> This function operates on both l-value and r-value references
> ("universal references" as Meyers calls them).


I think I am missing something here. l-value and r-value references are two
different things and are treated differently by the compiler. I tried the
following:

#include <iostream>

int add(int&& l, int&& r) {
  return l + r;
}

int main() {
  int l = 1;
  int r = 2;
  std::cout << "add(1, 2) = " << add(l, r) << std::endl;
  return 0;
}

and got the following compile error:

test.cc:10:42: error: cannot bind ‘int’ lvalue to ‘int&&’
   std::cout << "add(1, 2) = " << add(l, r) << std::endl;

which is precisely what I expected to happen. I think I am missing part of
your example.
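If the intent was a deduced template parameter, then I believe the piece I
was missing is this: `int&&` is an ordinary r-value reference, while `T&&`
with deduced T is a "universal" (forwarding) reference, which reference
collapsing lets bind to both l-values and r-values. A sketch of the version
that does compile for both:

```cpp
#include <utility>

// With a deduced template parameter, T&& binds to both l-values and
// r-values via reference collapsing.
template <typename T>
int add(T&& l, T&& r)
{
    return std::forward<T>(l) + std::forward<T>(r);
}

int demo()
{
    int l = 1;
    int r = 2;
    // add(l, r): T deduces to int&, so T&& collapses to int&.
    // add(1, 2): T deduces to int, so T&& is int&&.
    return add(l, r) + add(1, 2);
}
```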

Re: Exceptions
--------------------

> This was the "old way" in C++ and the standards body specifically
> deprecated what you are suggesting.  I'm sure you could read their
> rationale.


Sure, I would be happy to. I tried googling around and the only recent
update I could find relating to how the compiler treats exceptions was the
addition of noexcept. Could you share this?

> In GCC, at least, function attributes can only be defined as part of
> declaration and not definition.  That means you would have to forward
> declare every function that returns your error type.


That's true. Clang allows you to put the attribute on the type definition;
I forgot GCC does not allow that. Still, if the function were part of a
public interface, you would have to forward declare it anyway. In any case,
my point was that there are alternatives to exceptions where the compiler
will actually bug you if you ignore the error.
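A sketch of what I mean (function names are made up): GCC and Clang both
honor the attribute on a forward declaration, so ignoring the error return
produces a warning.

```cpp
// Forward declaration carries the attribute, as GCC requires; C++17 later
// standardized [[nodiscard]] for the same purpose.
__attribute__((warn_unused_result)) int applyConfig();

int applyConfig()
{
    return 0; // 0 == success in this sketch
}

int caller()
{
    // applyConfig();       // would warn: ignoring return value
    int rc = applyConfig(); // checking the result satisfies the compiler
    return rc;
}
```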

> In every large project I have been involved with, an integer quickly
> becomes insufficient to express "an error".  The projects all make their
> own "error object".  According to Titus' talk, Google has one of these
> internally as well.  So that means we're now talking about passing
> around a heavy object by value or a pointer to one (which is always at
> risk of leaking)?  What does this error look like?


Also true; hypothetically we would use some sort of error type, probably an
error enum and an optional string message. Google has an open source one we
could use, or we could write our own. In any case, it does not need to be
heavyweight; as long as the field list is small, the object will be small.
For user-defined types that fit inside a few machine words, gcc and clang
will frequently pass them via registers instead of the stack. In any case,
I would expect the error object to be returned, so it would be subject to
return value optimization.

> Having a error return type likely eliminates any RVO and again plays a
> huge diminishing role in using functional programming methods.  The two
> approaches for output variables become:
>     1) Fill-in out parameter reference.  [anti-FP]
>     2) Return a tuple<error, real_return>.
>

The approach we use in this case is called StatusOr<T>; it wraps the type
being returned and the associated Status object (what we call the error
object). This is actually a really common pattern outside of C++ and
parallels the error monad type found in a lot of functional programming
languages (we do not actually support the fmap operation, but the usage
pattern is semantically equivalent).
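A minimal sketch of the shape of the pattern (the real StatusOr has a much
richer API; everything here is simplified for illustration):

```cpp
#include <string>
#include <utility>

enum class ErrorCode { kOk, kInvalidArgument };

// The "error object": an enum plus an optional message.
struct Status
{
    ErrorCode code = ErrorCode::kOk;
    std::string message;
    bool ok() const { return code == ErrorCode::kOk; }
};

// Wraps either a value of type T or the Status explaining its absence.
template <typename T>
class StatusOr
{
  public:
    StatusOr(Status s) : status_(std::move(s)) {}
    StatusOr(T value) : value_(std::move(value)) {}

    bool ok() const { return status_.ok(); }
    const T& value() const { return value_; }
    const Status& status() const { return status_; }

  private:
    Status status_;
    T value_{};
};

StatusOr<int> parsePositive(int raw)
{
    if (raw < 0)
        return Status{ErrorCode::kInvalidArgument, "negative input"};
    return raw;
}
```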


> >         - Exceptions are not typed
>
> Exceptions are strongly typed in the exact same type system as any
> reference.  I don't know what this statement means.


Sorry; what I meant by this is that exceptions are unchecked in C++, so you
can throw any type. Hence, there is no way to enforce what kinds of
exceptions are thrown, or that those exceptions are caught in functions
that do not forward them, as in Java.
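A small illustration of what "unchecked" means in practice: nothing in a
function's signature constrains what it may throw.

```cpp
#include <stdexcept>
#include <string>

// The signature gives no hint of what may come out of this function.
void mayThrow(int mode)
{
    if (mode == 0)
        throw std::runtime_error("a typed exception object");
    if (mode == 1)
        throw 42;                        // a plain int is a legal exception
    throw std::string("so is a string"); // and so is any other type
}

std::string classify(int mode)
{
    try
    {
        mayThrow(mode);
    }
    catch (const std::exception&)
    {
        return "std::exception";
    }
    catch (int)
    {
        return "int";
    }
    catch (...)
    {
        return "something else";
    }
    return "no throw";
}
```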


> Again this is the "lowest common denominator" argument, which I do not
> accept.  We could make the same argument with our legacy codebase, which
> I can assure you in this problem domain is much bigger than yours.  We
> are making a concerted effort to not directly port any of our code we
> are contributing to OpenBMC but to make sure it is modernized and
> follows the practices with the rest of the codebase.


> Any code you are planning to port would either need to go though the
> same vetting process as a clean contribution (for acceptance into an
> openbmc tree) or would be maintained in your own repositories.  If it is
> something that is maintained in your own repositories, the only place
> exceptions could be introduced is when you call base OpenBMC utilities,
> like sdbusplus.  It would not be unreasonable to "catch" all exceptions
> at the API level into base utilities and turn them into a return code.
> (And in fact, I highly suspect you already have these abstracted out due
> to your Google Mock test framework).


> Since the majority of the repositories have self-contained programs that
> expose DBus interfaces, what those programs do internally has no impact
> on your ported programs that are interacting with those DBus interfaces.
> DBus itself already has a DBus-Error type, which is distinct from method
> return values, that you are free to turn into a return code instead of
> an exception.


Fair enough, this is admittedly not a great argument, but in the case of an
even split, it is of minor benefit.

> CppCoreGuidelines C.44 and C.50 cover this sufficiently?  Some of your
> proposal seems to be a fallout of not allowing exceptions, in my
> opinion.  We should always use RAII (E.6) in any case.


Fair enough.

Re: Virtual Functions
----------------------------

> I was not proposing an elimination or prohibition on virtual functions,
> but a "justification required".  As in, you need to be prepared to
> answer how this is better and why compile-time polymorphism would be
> insufficient.
>
> CppCoreGuidelines C.10 suggests effectively the same thing: "You need a
> reason for using a hierarchy."


I would agree that class hierarchies should be well designed and thought
out, but I could say the same thing about every bit of code that is
written, especially when it is part of the core structure or the externally
exposed API. I would, however, argue that "justification required" does
imply that the behavior is discouraged; at least, my definition of
discouraged in a style guide is that a justification is required to use it.
Interfaces, and thus virtual functions, are required for good unit testing
and mocking, so as long as that is not being restricted I do not have any
complaints; I am just not sure where that falls on the spectrum of
"requires justification".

> Again, this was not meant to be a total prohibition, unless it causes a
> systemic concern.  Using streams for a one-off serial / deserial
> function might be entirely reasonable (though we should use a
> serialization library instead).  The concern was that logging is a
> widely-used operation and the stream-style operators cause a big
> number of function call locations which result in much larger code than
> a direct call to a printf-style function.


Fair enough; I tried to avoid characterizing your argument because I felt
doing so would be unfair and that you should be allowed to present it
yourself.

Cheers