<div class="socmaildefaultfont" dir="ltr" style="font-family:Arial, Helvetica, sans-serif;font-size:10.5pt" ><div dir="ltr" ><div>Design goals:<br> • Enable debug data collection for user-level application failures and user-initiated dumps. The design should be flexible enough to accommodate:<br> ⁃ Any future dump use cases<br> ⁃ Open source reporting tools</div>
<div> • Assumptions:<br> ▪ FFDC partition size is 1MB.<br> ▪ Maximum size of a Dump entry is 500K.</div>
<div> • Overview<br> ▪ Dump Management consists of two processes and a bash-script-based infrastructure (dReport) for reporting.<br> ⁃ phosphor-dump-manager: A daemon that serves the key D-Bus calls related to dumps. It includes:<br> • Manager class, which is responsible for overall dump management.<br> • Entry class, which is responsible for management of an individual dump entry.<br> • D-Bus interfaces (REST use cases) provided by this process:<br> • Create(): Start creating a new user-initiated dump report and save it to the FFDC partition.<br> ⁃ Member of the Manager class.<br> • Delete(): Delete all dump reports in the FFDC partition.<br> ⁃ Member of the Manager class.<br> • GetDump(): Return the dump data file for a given dump ID.<br> ⁃ Member of the Entry class.<br> • DeleteDump(): Delete the dump report for a given dump ID (use the existing D-Bus delete function).<br> ⁃ Member of the Entry class.<br> • ReportDump(Type, list of files): Start creating a new dump report with the dReport script, based on the type, and save it to the FFDC partition.<br> • Member of the Manager class and an internal D-Bus interface.</div>
<div> • The list of files includes core file information in the case of the core dump type.<br> • Initiates the dReport script to create the dump report.</div>
<div> • Supported attributes:<br> • Size.<br> ⁃ Member of the Entry class.<br> • TimeStamp().<br> ⁃ Member of the Entry class.<br> ⁃ phosphor-dump-inotify:</div>
<div> - A daemon that monitors the selected core file directory for new core files.</div>
<div> - For any new core file, it reads the list of available core files from the inotify events and calls the ReportDump(ApplicationCrash, list of core files) D-Bus interface to generate a new report.</div>
<div><br> ⁃ dReport: A bash-script-based reporting application that generates a compressed tarball of debugging information based on the input type and</div>
<div> the settings in the bmcdump.conf file. Details are available in the dReport section.</div>
<div> </div>
<div> • Execution flow for user-initiated dump<br> ▪ The user requests a dump via a REST call.<br> ▪ The Dump Manager receives this request.<br> ⁃ Call the internal function ReportDump(Type = userInitiated) to initiate the dump reporting process.<br> ⁃ ReportDump(Type, ErrorlogID)<br> ⁃ Increment DumpID.<br> ⁃ Generate a UUID (use the existing systemd D-Bus API).<br> ⁃ Create an Entry object with dumpID, errorlogID and Type.<br> ⁃ Call the dReport script to report the dump.<br> ⁃ Input parameters: Type, dumpID, errorlogID, UUID.<br> ⁃ Update the size member of the Entry object with the file size.</div>
<div> </div>
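<div>The ReportDump steps above can be sketched in Python as a rough illustration. This is not the phosphor-dump-manager implementation: no D-Bus is involved, the class layout and field names are assumptions, uuid.uuid4() stands in for the systemd API, and run_dreport is a hypothetical callable that replaces the dReport script and returns the dump file size.</div>
<pre>
```python
import uuid
from dataclasses import dataclass


@dataclass
class Entry:
    """One dump entry; the field names are illustrative."""
    dump_id: int
    errorlog_id: int
    dump_type: str
    dump_uuid: str
    size: int = 0


class Manager:
    """In-memory stand-in for the Manager class (no D-Bus involved)."""

    def __init__(self, run_dreport):
        # run_dreport is a hypothetical callable that replaces the dReport
        # script and returns the size of the generated dump file in bytes.
        self._run_dreport = run_dreport
        self._last_id = 0
        self.entries = {}

    def report_dump(self, dump_type, errorlog_id):
        self._last_id += 1                     # increment DumpID
        dump_uuid = str(uuid.uuid4())          # stand-in for the systemd API
        entry = Entry(self._last_id, errorlog_id, dump_type, dump_uuid)
        self.entries[entry.dump_id] = entry    # create the Entry object
        # Call dReport with Type, dumpID, errorlogID and UUID, then record
        # the resulting file size in the Entry object.
        entry.size = self._run_dreport(
            dump_type, entry.dump_id, errorlog_id, dump_uuid)
        return entry

    def delete_dump(self, dump_id):
        del self.entries[dump_id]

    def delete_all(self):
        self.entries.clear()


def fake_dreport(dump_type, dump_id, errorlog_id, dump_uuid):
    """Pretend the dump was written and occupies 4096 bytes."""
    return 4096


manager = Manager(fake_dreport)
first = manager.report_dump("userInitiated", errorlog_id=0)
second = manager.report_dump("ApplicationCrash", errorlog_id=17)
print(first.dump_id, second.dump_id, first.size)
```
</pre>
<div>Passing the reporting step in as a callable keeps the ID and Entry bookkeeping testable without running any script.</div>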
<div> • Execution flow for application crash<br> • The phosphor-dump-inotify daemon gets a notification for the new core file.</div>
<div> • The phosphor-dump-inotify daemon reads the list of core files from inotify.<br> • phosphor-dump-inotify calls the internal D-Bus interface ReportDump(Type = ApplicationCrash, list of core files).<br> ⁃ Create an error log and commit it.<br> ⁃ Follow the same steps described above for ReportDump(Type, ErrorlogID).</div>
<div> </div>
<div> • For future use cases<br> • Create a service file and use the internal D-Bus interface ReportDump(Type, errorlogid) to trigger a dump.</div>
<div> </div>
<div> • Git Repo - phosphor-debug-collector.</div>
<div> </div>
<div> • Dump File Name: bmcdump[TIMESTAMP][Serial Number][UUID][TYPE].tar.bz2</div>
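<div>As a small illustration of the naming scheme, the sketch below assembles the fields in the stated order. The underscore separator, the timestamp format and the sample serial number are assumptions; the design only fixes the field order and the .tar.bz2 extension.</div>
<pre>
```python
import time
import uuid


def dump_file_name(timestamp, serial_number, dump_uuid, dump_type, sep="_"):
    """Build bmcdump[TIMESTAMP][Serial Number][UUID][TYPE].tar.bz2.

    The separator and the timestamp format are assumptions made for
    this sketch.
    """
    stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime(timestamp))
    fields = ["bmcdump", stamp, serial_number, dump_uuid, dump_type]
    return sep.join(fields) + ".tar.bz2"


# A fixed UUID and timestamp keep the example reproducible.
example_uuid = str(uuid.UUID(int=1))
name = dump_file_name(0, "SN0001", example_uuid, "userinitiated")
print(name)
```
</pre>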
<div> </div>
<div> • Other potential Dump use cases, not considered here<br> ▪ Kernel panic<br> ▪ Host quiesce state<br> ▪ Boot Failure<br> ▪ BMC Reset</div>
<div> </div>
<div> </div>
<div><strong>dReport:</strong><br> • This design approach is adopted from init.d.<br> • “dReport” is a set of bash scripts that gather information about system hardware, configuration and debug logs. This information can be used for diagnostic purposes</div>
<div> and debugging. The tool is mainly aimed at developers and lab bring-up. Key features:<br> ⁃ Generates a compressed tarball of debugging information based on user inputs and the bmcdump.conf file settings.</div>
<div>Design Flow: <br> • /etc/bmcdump.conf holds all dump-specific configuration and settings.</div>
<div> • All installed dump collection command scripts should be stored in /usr/share/dump/bmc-base.<br> ⁃ A script command file follows a simple format and a set of consistent actions, which includes:<br> ⁃ CONFIG [TYPE ..] [priority], where TYPE is 0..6 and priority is 0..99 (e.g. CONFIG 234 93: include this script in TYPE 2, 3 and 4 with priority 93).<br> ⁃ Methods<br> ⁃ execute(): execute the command and save the output to a temporary file.<br> ⁃ error(): handle errors.<br> ⁃ summary(): update the summary log file with the execution status.</div>
<div> • Create symbolic links in /usr/share/dump/bmc-xx (use-case-specific path) during initialization, based on the control comments defined at the beginning of the script.<br> ⁃ File name example: /usr/share/dump/bmc-rc1/b93date, where b = BMC dump, 93 = sequence number and date = command name.<br> ⁃ The sequence number gives the order in which the programs should be started. It can also be used to trim the dump in case the size limit is exceeded.</div>
<div> • Depending on the type setting, the report bash script executes the programs from one of the following directories.<br> ⁃ 0 : Disable dump /usr/share/dump/bmc-disable <br> ⁃ 1 : User initiated /usr/share/dump/bmc-userinitiated<br> ⁃ 2 : Application crash /usr/share/dump/bmc-appcrash</div>
<div> </div>
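<div>The CONFIG control line and the link naming described above could be handled roughly as follows. This sketch assumes the line is passed in as the bare text "CONFIG 234 93" (how it is embedded in the script comments is not specified here) and that the sequence number is always written with two digits; the function names are hypothetical.</div>
<pre>
```python
def parse_config(line):
    """Parse a control line such as 'CONFIG 234 93'.

    Returns (types, priority): each digit of the first field is one dump
    TYPE (0..6) and the second field is the priority (0..99).
    """
    keyword, type_digits, priority_text = line.split()
    if keyword != "CONFIG":
        raise ValueError("not a CONFIG line: " + line)
    types = [int(digit) for digit in type_digits]
    priority = int(priority_text)
    if any(t not in range(7) for t in types) or priority not in range(100):
        raise ValueError("TYPE or priority out of range: " + line)
    return types, priority


def link_name(priority, command):
    """Name of the symbolic link, e.g. b93date for priority 93 and 'date'."""
    return "b{:02d}{}".format(priority, command)


types, priority = parse_config("CONFIG 234 93")
print(types, priority, link_name(priority, "date"))
```
</pre>
<div>Zero-padding the sequence number keeps a plain alphabetical listing of the link directory in execution order, the same way the K00/K01 links in an init.d runlevel directory sort.</div>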
<div> Example: Subsystems installed in /etc/init.d<br> -bash-4.1$ pwd<br> /etc/init.d<br> -bash-4.1$ ls -l<br> -rwxr-xr-x. 1 root root 1288 Jan 23 2013 abrt-ccpp<br> -rwxr-xr-x. 1 root root 1628 Jan 23 2013 abrtd<br> -rwxr-xr-x. 1 root root 1642 Jan 23 2013 abrt-oops<br> -rwxr-xr-x. 1 root root 2062 Oct 6 2014 atd<br> -rwxr-xr-x. 1 root root 1041 Jan 16 2014 atop<br> -rwxr-xr-x. 1 root root 3378 Mar 14 2012 auditd</div>
<div> /etc/rc.d/rcN.d (N is the runlevel indicator)<br> These directories must contain only special symbolic links to the scripts in /etc/init.d. This is how it looks:<br> Example: /etc/rc3.d listing<br> -bash-4.1$ pwd<br> /etc/rc3.d<br> -bash-4.1$ ls -l<br> lrwxrwxrwx. 1 root root 17 May 2 2015 K00ipmievd -> ../init.d/ipmievd<br> lrwxrwxrwx. 1 root root 22 May 2 2015 K01bmc-watchdog -> ../init.d/bmc-watchdog<br> lrwxrwxrwx. 1 root root 18 May 2 2015 K01collectl -> ../init.d/collectl<br> lrwxrwxrwx. 1 root root 16 May 2 2015 K01smartd -> ../init.d/smartd</div>
<div><br> • The Dump Manager or the command line can execute the /usr/share/dump/dReport bash script, which performs the following steps:<br> ⁃ Check the space available in the temporary location (/tmp).<br> ⁃ Exit the script with an error code in case no space is available.<br> ⁃ For any space issue during dump data collection, dReport should save the collected information to the specified location in flash.<br> ⁃ Depending on the “type”, execute the programs from the specified directories. The collected data includes:<br> ⁃ Dump summary<br> ⁃ Copy of /etc/bmcdump.conf<br> ⁃ Dump log<br> ⁃ Outputs of the various commands in /usr/share/dump/bmc-rcx<br> ⁃ Create a log file to track the dump collection progress.<br> ⁃ Store the data in /tmp/Dump/filename temporarily.<br> ⁃ Use the file move option for core-specific debug files in case the file delete option is enabled for those files.<br> ⁃ Store the data in a file format similar to sosreport.<br> ⁃ Check the space available in the permanent location.<br> ⁃ Delete the oldest dump in case the available space is less than the maximum dump size and the file delete policy is enabled.<br> ⁃ Generate a compressed tarball and save it to the permanent location.<br> ⁃ Delete lower-priority files in case the compressed tar file is greater than the allowed size.<br> ⁃ Delete the temporary files after successful compression.<br> ⁃ Delete the core debug file from the core location in case it is included in this dump.<br> ⁃ For any general failure, the dump collection script exits with an error code.</div>
<div> </div>
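<div>The step that drops lower-priority files when the compressed tarball exceeds the allowed size could look roughly like the Python sketch below. dReport itself is a bash script, so this only illustrates the logic. It assumes that a higher sequence number means a lower priority, and it works on in-memory data instead of files under /tmp; the file names and the 2000-byte limit are made up for the example.</div>
<pre>
```python
import hashlib
import io
import tarfile


def build_tarball(files, max_size):
    """Compress files into a tar.bz2 that fits within max_size bytes.

    files maps a link name such as 'b93journal' to the collected bytes.
    Assumption: a higher sequence number means a lower priority, so the
    entry that sorts last is dropped first.
    """
    names = sorted(files)                 # b01... sorts before b93...
    while names:
        buffer = io.BytesIO()
        with tarfile.open(fileobj=buffer, mode="w:bz2") as archive:
            for name in names:
                info = tarfile.TarInfo(name)
                info.size = len(files[name])
                archive.addfile(info, io.BytesIO(files[name]))
        data = buffer.getvalue()
        if max_size >= len(data):         # the archive fits, keep it
            return data, names
        names.pop()                       # drop the lowest-priority entry
    raise RuntimeError("no file fits within the allowed size")


# 64000 bytes of hash output: deterministic and practically incompressible.
noise = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(2000))
files = {"b01summary": b"dump ok\n", "b93journal": noise}
data, kept = build_tarball(files, max_size=2000)
print(kept, len(data))
```
</pre>
<div>Rebuilding the archive after each removal is simple but repeats the compression work; a real script might instead estimate sizes before compressing.</div>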
<div>BMC Dump configuration - /etc/bmcdump.conf.<br> • This needs discussion; keep only the system configuration for now.<br> • Options are set using 'ini'-style name = value pairs.<br> • Name info<br> ⁃ filelocationtemp // Temporary file location in RAM<br> ⁃ filelocation // File path in flash<br> ⁃ compression // Compression scheme<br> ⁃ corefile // Core file path<br> ⁃ kerneldebug // Kernel debug data path<br> ⁃ maxsize // Maximum file size<br> ⁃ debug // Enable the debug option to print more traces</div>
<div> - deletedump // Delete old dumps in case there is not enough storage space</div>
<div> </div>
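<div>For illustration, the option names listed above can be read as 'ini'-style pairs with Python's configparser. Every value in the sample is a made-up placeholder except /tmp/Dump, which is the temporary location mentioned in the dReport steps. The [dump] section header is also an assumption, added only because configparser requires one; the unit of maxsize is not defined here.</div>
<pre>
```python
import configparser

# Placeholder values; only the option names come from the design notes.
SAMPLE = """
filelocationtemp = /tmp/Dump
filelocation = /var/lib/dumps
compression = bz2
corefile = /var/lib/cores
kerneldebug = /sys/kernel/debug
maxsize = 500
debug = no
deletedump = yes
"""


def load_config(text):
    """Parse 'name = value' pairs that have no section header."""
    parser = configparser.ConfigParser()
    # configparser needs a section, so a hypothetical [dump] header is added.
    parser.read_string("[dump]\n" + text)
    return parser["dump"]


config = load_config(SAMPLE)
print(config["filelocationtemp"],
      config.getint("maxsize"),
      config.getboolean("deletedump"))
```
</pre>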
<div> </div>
<div><strong>Dump Contents:</strong></div>
<div> </div>
<div>0-9: Must-have for all types of dumps</div>
<div>1. summary.log <br> • DumpID<br> • UUID<br> • File Name<br> • The Date<br> • System uptime<br> • Software release information (Kernel,BMC,Host).<br> ⁃ cat /etc/version<br> ⁃ cat /etc/os-release<br> ⁃ uname -a<br> • State information ( BMC,Chassis,Host )<br> • Boot progress<br> • Dump Status summary.</div>
<div>2. dump Log file.</div>
<div>3. Configuration/State Basic:<br> • CPUs info (cat /proc/cpuinfo)<br> • Memory info (cat /proc/meminfo)<br> • Disk usage (df -hT) <br> • network basic info ( ip link and ip addr)<br> • LED.<br> • systemctl status - runtime status information<br> • systemctl --all --state=failed<br> • OS commands<br> • top<br> • free<br> • ps<br> • mount<br> • ulimit -a</div>
<div>4. Dump Config file.<br> <br>10-99: Use case specific failures</div>
<div> </div>
<div><strong>• User initiated:</strong><br> ⁃ Journal traces at “warning” level and above (journalctl -p 4).<br> ⁃ Error logs.<br> ⁃ systemctl<br> • list-jobs To list running jobs<br> • list-units -t service --all To list all available services and their current status<br> • list-units -t target --all To show all available targets<br> • list-timers<br> ⁃ Inventory dump.<br> ⁃ Watchdog<br> ⁃ Settings<br> ⁃ Host sensor info<br> ⁃ /var/log<br> • Console: /var/log/obmc-console.log<br> • systemd: /var/log/messages<br> ⁃ /proc<br> • /proc/net Network advanced info<br> • device tree (find /proc/device-tree/ -type f -exec head {} + | less)<br> • dmesg</div>
<div> </div>
<div><strong>• Application Crash</strong><br> ⁃ Core File.</div></div>
<div dir="ltr" > </div>
<div dir="ltr" >Thanks & Regards<br><br>Jayanth Othayoth<br> </div></div><BR>