[PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering

Thu May 26 19:15:18 EST 2011

* Thomas Gleixner <tglx at linutronix.de> wrote:

> > If anything then that should tell you something that events and 
> > seccomp are not just casually related ...
> 
> They happen to have the hook at the same point in the source and 
> for pure coincidence it works because the problem to solve is 
> extremely simplistic. And that's why the diffstat is minimalistic, 
> but that does not prove anything.

Here are the diffstats of the various versions of this proposed 
security feature:

       bitmask (2009):  6 files changed,  194 insertions(+), 22 deletions(-)
 filter engine (2010): 18 files changed, 1100 insertions(+), 21 deletions(-)
 event filters (2011):  5 files changed,   82 insertions(+), 16 deletions(-)

The third variant, 'event filters', is actually the most 
sophisticated one of all and it is not simplistic at all.

The main reason the diffstat is small is that it intelligently reuses 
over ten thousand lines of pre-existing kernel code. Are you 
interpreting that as some sort of failure of the patch? I think it's 
a very good thing.

To demonstrate the non-simplicity of the feature:

 - These security rules/filters can be sophisticated, for example:

   sys_close() rule protecting against the closing of 
   stdin/stdout/stderr:

                  "fd == 0 || fd == 1 || fd == 2"

   sys_ioperm() rule allowing port 0x80 access but nothing else:

                  "from != 128 || num != 1"

   sys_listen() rule limiting the max accept() backlog to 16 entries:

                  "backlog > 16"

   sys_mprotect(), sys_mmap[2](), sys_munmap() and sys_mremap() rule
   protecting the first 1 MB NULL pointer guard range:

                  "addr < 0x00100000"

   sys_sched_setscheduler() rule protecting against the switch to 
   non-SCHED_OTHER scheduler policies:

                  "policy != 0"

   Most of these examples are fine-grained access restrictions that 
   AFAIK are not possible with any of the LSM-based security measures 
   that Linux offers today.

 - These security rules/filters can be safely used and installed by 
   unprivileged userspace, allowing arbitrary end user apps to define 
   their own, flexible security policies.

 - These security rules/filters get automatically inherited into child 
   tasks and child tasks cannot mess with them - they cannot even 
   query/observe that these filters *exist*.

 - These security rules/filters nest within each other to basically 
   arbitrary depth, giving us a working, implemented, stackable LSM
   concept.

 - These security rules/filters can be extended to arbitrarily many 
   more object lifetime events in the future, without changing the ABI.

 - These security rules/filters, unlike most LSM rules, can execute
   not just within hardirqs but also within deeply atomic contexts
   such as NMI contexts, putting far fewer restrictions on what can
   be security/access checked.

 - Access permission violations can be set up to log an event for
   each violation into a scalable ring-buffer, providing unprivileged
   security-auditing functionality to the managing task(s).
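
To make the example rules in the first item concrete, here is a minimal 
userspace sketch that writes each quoted expression as a plain C 
predicate. It is not the kernel's filter engine and not code from the 
patch: the function names are hypothetical, and it assumes each 
expression is read as the condition under which the call counts as a 
violation, which is how the examples above are phrased.

```c
#include <stdbool.h>

/*
 * Hypothetical helpers, one per example rule. Each returns true when
 * the system call arguments match the quoted expression, i.e. when
 * the call would be treated as a violation (an assumed reading).
 */

/* close rule: "fd == 0 || fd == 1 || fd == 2" */
static bool close_rule_matches(long fd)
{
	return fd == 0 || fd == 1 || fd == 2;
}

/* ioperm rule: "from != 128 || num != 1" (128 is port 0x80) */
static bool ioperm_rule_matches(unsigned long from, unsigned long num)
{
	return from != 128 || num != 1;
}

/* listen rule: "backlog > 16" */
static bool listen_rule_matches(int backlog)
{
	return backlog > 16;
}

/* memory-mapping rule: "addr < 0x00100000" (first 1 MB) */
static bool low_addr_rule_matches(unsigned long addr)
{
	return addr < 0x00100000UL;
}

/* scheduler policy rule: "policy != 0" (0 is SCHED_OTHER on Linux) */
static bool sched_policy_rule_matches(int policy)
{
	return policy != 0;
}
```

For instance, ioperm_rule_matches(0x80, 1) is false, so a request for 
exactly port 0x80 passes, while ioperm_rule_matches(0x80, 2) is true 
because it asks for more than that one port.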

I'd call that anything but 'simplistic'.

Thanks,

	Ingo