<meta charset="utf-8" emacsmode="-*- markdown -*-">
   <link rel="shortcut icon" href="/image/favicon.png">
   <link rel="apple-touch-icon" type="image/png" href="/image/favicon.png">
   <link rel="icon" type="image/png" sizes="144x144" href="/image/favicon.png">
   <link rel="stylesheet" href="/dependencies/markdeep/latest/journal.css?">

   <div align="right">
    <span id="clock"></span>
    <script>
    window.onload     = () => { displayTime() };
    var displayTime   = () => {
        var elt = document.getElementById("clock");
        var now = new Date();
        elt.innerHTML = "\ \ " + now.toLocaleTimeString() + "\ \ ";
        setTimeout(displayTime, 999);
    };
    </script>
    <style>
    #clock {
        font: bold 14pt sans;
        color: #FFFFFF;
        background: #74A5BD;
        padding: 3px;
        border: solid black 0px;
        border-radius: 8px;
    }
    </style>
</div>


   <header>
    <a class="site-title" href="/">GhaSShee</a>
    <nav class="site-nav">
        <div class="trigger">
            <a class="page-link" href="/about/">About</a>
            <a class="page-link" href="/build/">Build</a>
            <a class="page-link" href="/category/">Cat</a>
            <a class="page-link" href="/diary/">Diary</a>
            <a class="page-link" href="/ethereum/">Eth</a>
            <a class="page-link" href="/functional/">Fun</a>
            <a class="page-link" href="/ghasshee/">GhaSShee</a>
            <a class="page-link" href="/html/">Html</a>
        </div>
    </nav>
</header>
<br><br>
   <center><font size="6">Linux Kernel</font></center><br><br>
   reading "Linux Kernel 2.6 decode"


<br>
<br>

## Kernel Src

| directory | description |
| --- | --- |
| mm | memory management |
| fs | VFS and filesystem implementations (one subdirectory per filesystem) |
| net | network protocols |
| ipc | System V IPC (IPC: inter-process communication) |
| init | kernel initialization code |
| crypto | cryptographic functions |
| block | block device layer |
| drivers | device drivers |
| sound | sound drivers |
| arch | architecture-specific code (one subdirectory per CPU architecture) |
| include | header files used when compiling the kernel |


<br>
<br>

## CPU flags

~~~
How can I tell whether my processor has a particular feature? (64-bit instruction set, hardware-assisted virtualization, cryptographic accelerators, etc.) I know that the file /proc/cpuinfo contains this information, in the flags line, but what do all these cryptic abbreviations mean?

For example, given the following extract from /proc/cpuinfo, do I have a 64-bit CPU? Do I have hardware virtualization?

model name      : Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
…
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm tpr_shadow vnmi flexpriority
~~~
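Before interpreting individual flags, it helps to get the flags line into a form that can be queried. The following is a minimal Python sketch, assuming the `/proc/cpuinfo` layout shown in the question (`key : value` lines, flags separated by spaces). `parse_cpu_flags` and `SAMPLE` are illustrative names, not part of any library.

```python
# Sketch: extract the "flags" line from /proc/cpuinfo-style text.
# parse_cpu_flags and SAMPLE are hypothetical names used only for illustration.

SAMPLE = """\
model name      : Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm tpr_shadow vnmi flexpriority
"""


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


flags = parse_cpu_flags(SAMPLE)
print(sorted(flags))
```

On a real x86 Linux machine the same function should work on the contents of `/proc/cpuinfo`, for example `parse_cpu_flags(open("/proc/cpuinfo").read())`. A set is used because the only question asked of the flags is membership.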

~~~
x86

(32-bit a.k.a. i386–i686 and 64-bit a.k.a. amd64. In other words, your workstation, laptop or server.)

FAQ: Do I have…

* 64-bit (x86_64/AMD64/Intel64)? lm
* Hardware virtualization (VMX/AMD-V)? vmx (Intel), svm (AMD)
* Accelerated AES (AES-NI)? aes
* TXT (TPM)? smx
* a hypervisor (announced as such)? hypervisor

Most of the other features are only of interest to compiler or kernel authors.

All the flags

The full listing is in the kernel source, in the file arch/x86/include/asm/cpufeatures.h.


Intel-defined CPU features, CPUID level 0x00000001 (edx)

See also Wikipedia and table 2-27 in Intel Advanced Vector Extensions Programming Reference

fpu: Onboard FPU (floating point support)
vme: Virtual 8086 mode enhancements
de: Debugging Extensions (CR4.DE)
pse: Page Size Extensions (4MB memory pages)
tsc: Time Stamp Counter (RDTSC)
msr: Model-Specific Registers (RDMSR, WRMSR)
pae: Physical Address Extensions (support for more than 4GB of RAM)
mce: Machine Check Exception
cx8: CMPXCHG8B instruction (64-bit compare-and-swap)
apic: Onboard APIC
sep: SYSENTER/SYSEXIT
mtrr: Memory Type Range Registers
pge: Page Global Enable (global bit in PDEs and PTEs)
mca: Machine Check Architecture
cmov: CMOV instructions (conditional move) (also FCMOV)
pat: Page Attribute Table
pse36: 36-bit PSEs (huge pages)
pn: Processor serial number
clflush: Cache Line Flush instruction
dts: Debug Store (buffer for debugging and profiling instructions)
acpi: ACPI via MSR (temperature monitoring and clock speed modulation)
mmx: Multimedia Extensions
fxsr: FXSAVE/FXRSTOR, CR4.OSFXSR
sse: Intel SSE vector instructions
sse2: SSE2
ss: CPU self snoop
ht: Hyper-Threading
tm: Automatic clock control (Thermal Monitor)
ia64: Intel Itanium Architecture 64-bit (not to be confused with Intel's 64-bit x86 architecture with flag x86-64 or "AMD64" bit indicated by flag lm)
pbe: Pending Break Enable (PBE# pin) wakeup support


AMD-defined CPU features, CPUID level 0x80000001

See also Wikipedia and table 2-23 in Intel Advanced Vector Extensions Programming Reference

syscall: SYSCALL (Fast System Call) and SYSRET (Return From Fast System Call)
mp: Multiprocessing Capable.
nx: Execute Disable
mmxext: AMD MMX extensions
fxsr_opt: FXSAVE/FXRSTOR optimizations
pdpe1gb: One GB pages (allows hugepagesz=1G)
rdtscp: Read Time-Stamp Counter and Processor ID
lm: Long Mode (x86-64: amd64, also known as Intel 64, i.e. 64-bit capable)
3dnowext: AMD 3DNow! extensions
3dnow: 3DNow! (AMD vector instructions, competing with Intel's SSE1)

Transmeta-defined CPU features, CPUID level 0x80860001

recovery: CPU in recovery mode
longrun: Longrun power control
lrti: LongRun table interface


Other features, Linux-defined mapping

cxmmx: Cyrix MMX extensions
k6_mtrr: AMD K6 nonstandard MTRRs
cyrix_arr: Cyrix ARRs (= MTRRs)
centaur_mcr: Centaur MCRs (= MTRRs)
constant_tsc: TSC ticks at a constant rate
up: SMP kernel running on UP
art: Always-Running Timer
arch_perfmon: Intel Architectural PerfMon
pebs: Precise-Event Based Sampling
bts: Branch Trace Store
rep_good: rep microcode works well
acc_power: AMD accumulated power mechanism
nopl: The NOPL (0F 1F) instructions
xtopology: cpu topology enum extensions
tsc_reliable: TSC is known to be reliable
nonstop_tsc: TSC does not stop in C states
extd_apicid: has extended APICID (8 bits)
amd_dcm: multi-node processor
aperfmperf: APERFMPERF
eagerfpu: Non lazy FPU restore
nonstop_tsc_s3: TSC doesn't stop in S3 state
mce_recovery: CPU has recoverable machine checks


Intel-defined CPU features, CPUID level 0x00000001 (ecx)

See also Wikipedia and table 2-26 in Intel Advanced Vector Extensions Programming Reference

pni: SSE-3 (“Prescott New Instructions”)
pclmulqdq: Perform a Carry-Less Multiplication of Quadword instruction (accelerator for GCM)
dtes64: 64-bit Debug Store
monitor: Monitor/Mwait support (Intel SSE3 supplements)
ds_cpl: CPL Qual. Debug Store
vmx: Hardware virtualization: Intel VMX
smx: Safer mode: TXT (TPM support)
est: Enhanced SpeedStep
tm2: Thermal Monitor 2
ssse3: Supplemental SSE-3
cid: Context ID
sdbg: silicon debug
fma: Fused multiply-add
cx16: CMPXCHG16B
xtpr: Send Task Priority Messages
pdcm: Performance Capabilities
pcid: Process Context Identifiers
dca: Direct Cache Access
sse4_1: SSE-4.1
sse4_2: SSE-4.2
x2apic: x2APIC
movbe: Move Data After Swapping Bytes instruction
popcnt: Return the Count of Number of Bits Set to 1 instruction (Hamming weight, i.e. bit count)
tsc_deadline_timer: Tsc deadline timer
aes/aes-ni: Advanced Encryption Standard (New Instructions)
xsave: Save Processor Extended States: also provides XGETBV, XRSTOR, XSETBV
avx: Advanced Vector Extensions
f16c: 16-bit fp conversions (CVT16)
rdrand: Read Random Number from hardware random number generator instruction
hypervisor: Running on a hypervisor


VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001

rng: Random Number Generator present (xstore)
rng_en: Random Number Generator enabled
ace: on-CPU crypto (xcrypt)
ace_en: on-CPU crypto enabled
ace2: Advanced Cryptography Engine v2
ace2_en: ACE v2 enabled
phe: PadLock Hash Engine
phe_en: PHE enabled
pmm: PadLock Montgomery Multiplier
pmm_en: PMM enabled


More extended AMD flags: CPUID level 0x80000001, ecx

lahf_lm: Load AH from Flags (LAHF) and Store AH into Flags (SAHF) in long mode
cmp_legacy: If yes HyperThreading not valid
svm: “Secure virtual machine”: AMD-V
extapic: Extended APIC space
cr8_legacy: CR8 in 32-bit mode
abm: Advanced Bit Manipulation
sse4a: SSE-4A
misalignsse: indicates if a general-protection exception (#GP) is generated when some legacy SSE instructions operate on unaligned data. Also depends on CR0 and Alignment Checking bit
3dnowprefetch: 3DNow prefetch instructions
osvw: indicates OS Visible Workaround, which allows the OS to work around processor errata.
ibs: Instruction Based Sampling
xop: extended AVX instructions
skinit: SKINIT/STGI instructions
wdt: Watchdog timer
lwp: Light Weight Profiling
fma4: 4 operands MAC instructions
tce: translation cache extension
nodeid_msr: NodeId MSR
tbm: Trailing Bit Manipulation
topoext: Topology Extensions CPUID leafs
perfctr_core: Core Performance Counter Extensions
perfctr_nb: NB Performance Counter Extensions
bpext: data breakpoint extension
ptsc: performance time-stamp counter
perfctr_l2: L2 Performance Counter Extensions
mwaitx: MWAIT extension (MONITORX/MWAITX)


Auxiliary flags: Linux defined - For features scattered in various CPUID levels

ring3mwait: Ring 3 MONITOR/MWAIT
cpuid_fault: Intel CPUID faulting
cpb: AMD Core Performance Boost
epb: IA32_ENERGY_PERF_BIAS support
cat_l3: Cache Allocation Technology L3
cat_l2: Cache Allocation Technology L2
cdp_l3: Code and Data Prioritization L3
invpcid_single: effectively invpcid and CR4.PCIDE=1
hw_pstate: AMD HW-PState
proc_feedback: AMD ProcFeedbackInterface
sme: AMD Secure Memory Encryption
intel_ppin: Intel Processor Inventory Number
intel_pt: Intel Processor Tracing
avx512_4vnniw: AVX-512 Neural Network Instructions
avx512_4fmaps: AVX-512 Multiply Accumulation Single precision
mba: Memory Bandwidth Allocation


Virtualization flags: Linux defined

tpr_shadow: Intel TPR Shadow
vnmi: Intel Virtual NMI
flexpriority: Intel FlexPriority
ept: Intel Extended Page Table
vpid: Intel Virtual Processor ID
vmmcall: prefer VMMCALL to VMCALL


Intel-defined CPU features, CPUID level 0x00000007:0 (ebx)

fsgsbase: {RD/WR}{FS/GS}BASE instructions
tsc_adjust: TSC adjustment MSR
bmi1: 1st group bit manipulation extensions
hle: Hardware Lock Elision
avx2: AVX2 instructions
smep: Supervisor Mode Execution Protection
bmi2: 2nd group bit manipulation extensions
erms: Enhanced REP MOVSB/STOSB
invpcid: Invalidate Processor Context ID
rtm: Restricted Transactional Memory
cqm: Cache QoS Monitoring
mpx: Memory Protection Extension
rdt_a: Resource Director Technology Allocation
avx512f: AVX-512 foundation
avx512dq: AVX-512 Double/Quad instructions
rdseed: The RDSEED instruction
adx: The ADCX and ADOX instructions
smap: Supervisor Mode Access Prevention
clflushopt: CLFLUSHOPT instruction
clwb: CLWB instruction
avx512pf: AVX-512 Prefetch
avx512er: AVX-512 Exponential and Reciprocal
avx512cd: AVX-512 Conflict Detection
sha_ni: SHA1/SHA256 Instruction Extensions
avx512bw: AVX-512 Byte/Word instructions
avx512vl: AVX-512 128/256 Vector Length extensions


Extended state features, CPUID level 0x0000000d:1 (eax)

xsaveopt: Optimized XSAVE
xsavec: XSAVEC
xgetbv1: XGETBV with ECX = 1
xsaves: XSAVES/XRSTORS


Intel-defined CPU QoS sub-leaf, CPUID level 0x0000000F:0 (edx)

cqm_llc: LLC QoS


Intel-defined CPU QoS sub-leaf, CPUID level 0x0000000F:1 (edx)

cqm_occup_llc: LLC occupancy monitoring
cqm_mbm_total: LLC total MBM monitoring
cqm_mbm_local: LLC local MBM monitoring


AMD-defined CPU features, CPUID level 0x80000008 (ebx)

clzero: CLZERO instruction
irperf: instructions retired performance counter
xsaveerptr: Always save/restore FP error pointers


Thermal and Power Management leaf, CPUID level 0x00000006 (eax)

dtherm (formerly dts): digital thermal sensor
ida: Intel Dynamic Acceleration
arat: Always Running APIC Timer
pln: Intel Power Limit Notification
pts: Intel Package Thermal Status
hwp: Intel Hardware P-states
hwp_notify: HWP notification
hwp_act_window: HWP Activity Window
hwp_epp: HWP Energy Performance Preference
hwp_pkg_req: HWP package-level request


AMD SVM Feature Identification, CPUID level 0x8000000a (edx)

npt: AMD Nested Page Table support
lbrv: AMD LBR Virtualization support
svm_lock: AMD SVM locking MSR
nrip_save: AMD SVM next_rip save
tsc_scale: AMD TSC scaling support
vmcb_clean: AMD VMCB clean bits support
flushbyasid: AMD flush-by-ASID support
decodeassists: AMD Decode Assists support
pausefilter: AMD filtered pause intercept
pfthreshold: AMD pause filter threshold
avic: Virtual Interrupt Controller
vmsave_vmload: Virtual VMSAVE VMLOAD
vgif: Virtual GIF


Intel-defined CPU features, CPUID level 0x00000007:0 (ecx)

avx512vbmi: AVX512 Vector Bit Manipulation instructions
umip: User Mode Instruction Protection
pku: Protection Keys for Userspace
ospke: OS Protection Keys Enable
avx512_vbmi2: Additional AVX512 Vector Bit Manipulation instructions
gfni: Galois Field New Instructions
vaes: Vector AES
vpclmulqdq: Carry-Less Multiplication Double Quadword
avx512_vnni: Vector Neural Network Instructions
avx512_bitalg: VPOPCNT[B,W] and VPSHUF-BITQMB instructions
avx512_vpopcntdq: POPCNT for vectors of DW/QW
la57: 5-level page tables
rdpid: RDPID instruction


AMD-defined CPU features, CPUID level 0x80000007 (ebx)

overflow_recov: MCA overflow recovery support
succor: uncorrectable error containment and recovery
smca: Scalable MCA


Detected CPU bugs (Linux-defined)

f00f: Intel F00F
fdiv: CPU FDIV
coma: Cyrix 6x86 coma
amd_tlb_mmatch: tlb_mmatch AMD Erratum 383
amd_apic_c1e: apic_c1e AMD Erratum 400
11ap: Bad local APIC aka 11AP
fxsave_leak: FXSAVE leaks FOP/FIP/FDP
clflush_monitor: AAI65, CLFLUSH required before MONITOR
sysret_ss_attrs: SYSRET doesn't fix up SS attrs
espfix: IRET to 16-bit SS corrupts ESP/RSP high bits
null_seg: Nulling a selector preserves the base
swapgs_fence: SWAPGS without input dep on GS
monitor: IPI required to wake up remote CPU
amd_e400: CPU is among the affected by Erratum 400
cpu_meltdown: CPU is affected by meltdown attack and needs kernel page table isolation


P.S. This listing was derived from arch/x86/include/asm/cpufeatures.h in the kernel source. The flags are listed in the same order as in the source code. Please help by adding links to descriptions of features when they're missing, by writing a short description of features that have unexpressive names, and by updating the list for new kernel versions. The current list is from Linux 4.15-rc7.
~~~
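The "Do I have…" FAQ in the listing above boils down to a lookup from a question to one or more flag names. Here is a small Python sketch of that lookup, applied to the flags of the Core 2 Duo from the question. The names `FAQ_FLAGS` and `answer_faq` are made up for this example; the flag names themselves come from the FAQ.

```python
# Sketch: answer the "Do I have...?" FAQ from a set of /proc/cpuinfo flags.
# FAQ_FLAGS and answer_faq are hypothetical names; the flag names come from the FAQ.

FAQ_FLAGS = {
    "64-bit (x86_64/AMD64/Intel64)": {"lm"},
    "hardware virtualization (VMX/AMD-V)": {"vmx", "svm"},
    "accelerated AES (AES-NI)": {"aes"},
    "TXT": {"smx"},
    "running under a hypervisor": {"hypervisor"},
}


def answer_faq(flags):
    """Map each FAQ question to True if any of its flags is present."""
    return {question: bool(names & flags) for question, names in FAQ_FLAGS.items()}


# Flags of the Core 2 Duo E8400 quoted in the question.
core2_flags = set(
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc "
    "arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx smx "
    "est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm tpr_shadow vnmi flexpriority".split()
)

answers = answer_faq(core2_flags)
for question, present in answers.items():
    print(f"{question}: {'yes' if present else 'no'}")
```

For that CPU the sketch reports 64-bit and hardware virtualization support (`lm` and `vmx` are present) but no AES acceleration, since `aes` is missing from its flags line.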


~~~
ARM

On ARM processors, a few features are mentioned in the features: line. Only features directly related to the ARM architecture are mentioned there, not features specific to a silicon manufacturer or system-on-chip.

The features are obtained from looking up the CPU id with read_cpuid() and looking it up in the processor type definitions known at compile time where the features are expressed as a mask of HWCAP_xxx flags. The corresponding strings are in hwcap_str etc. in setup.c.

In the list below, ARMv6 introduced SIMD instructions and datatypes. ARMv7 provided Advanced SIMD instructions and datatypes. On 32-bit ARM machines, neon signals Advanced SIMD, while asimd signals Advanced SIMD on 64-bit ARM machines.

swp: SWP instruction (atomic read-modify-write)
half: Half-word loads and stores
thumb: Thumb (16-bit instruction set)
26bit: "26 Bit" Model (Processor status register folded into program counter)
fastmult: 32×32→64-bit multiplication
fpa: Floating point accelerator
vfp: VFP (early SIMD vector floating point instructions)
edsp: DSP extensions (the 'e' variant of the ARM9 CPUs, and all others above)
java: Jazelle (Java bytecode accelerator)
iwmmxt: SIMD instructions similar to Intel MMX
crunch: MaverickCrunch coprocessor (if kernel support enabled)
thumbee: ThumbEE
neon: Advanced SIMD/NEON (asimd on AArch64 older kernels)
vfpv3: VFP version 3
vfpv3d16: VFP version 3 with 16 D-registers
tls: TLS register
vfpv4: VFP version 4 with fast context switching
idiva: SDIV and UDIV hardware division in ARM mode
idivt: SDIV and UDIV hardware division in Thumb mode
vfpd32: VFP with 32 D-registers
lpae: Large Physical Address Extension (>4GB physical memory on 32-bit architecture)
evtstrm: kernel event stream using generic architected timer
aes: hardware-accelerated AES (secret-key cryptography)
pmull{2}: 64×64→128-bit F2m multiplication — acceleration for the GCM mode of authenticated encryption
sha1: hardware-accelerated SHA-1
sha2: hardware-accelerated SHA-256
crc32: hardware-accelerated CRC-32

Beyond that, the Hardware: line indicates the processor model. Depending on the model, there may be other information in other files under /proc or /sys, or in boot-time kernel log messages. Unfortunately each ARM CPU manufacturer has its own method for reporting processor features, if any.

(community wiki; 12 revisions by 5 users, 78% by Gilles; last edited Nov 14 '17 at 16:32)

Alternatively, you can use the cpuid program, which should be in the Debian repository. It dumps all the available information about your CPU with some explanations, so you don't have to decode those obscure flags.

(answered May 22 '14 at 13:19 by hurufu)

cpuid expands the abbreviations. I wouldn't really call its output explanations. Knowing that ht means “Hyper Threading” explains it to some extent, but knowing that mmx means “MMX instruction set”, not so much, and that mca means “Machine Check Architecture”, hardly. – Gilles May 22 '14 at 18:10
@Gilles ...and yet, "Machine Check Architecture" is certainly a better Google query than "mca" ;) – Alois Mahdal Jun 6 '14 at 10:07

x86

Find it yourself in the Linux 4.1.3 x86 source and the Intel manual

arch/x86/include/asm/cpufeature.h contains the full list.

The define values are of the form:

    X*32 + Y

E.g.:

    #define X86_FEATURE_FPU     ( 0*32+ 0) /* Onboard FPU */

The feature flags, extracted from CPUID, are stored in the:

- __u32 x86_capability[NCAPINTS + NBUGINTS]; field
- of struct cpuinfo_x86 boot_cpu_data
- defined at x86/kernel/setup.c

which is initialized through __init functions.

Where each x86_capability array element comes from:

| index | eax      | ecx | output | file        |
|-------|----------|-----|--------|-------------|
|     0 |        1 |   0 | edx    | common.c    |
|     1 | 80000001 |     | edx    | common.c    |
|     2 | 80860001 |     | edx    | transmeta.c |
|     3 |          |     |        |             |
|     4 |        1 |   0 | ecx    | common.c    |
|     5 | C0000001 |     | edx    | centaur.c   |
|     6 | 80000001 |     | ecx    | common.c    |
|     7 |          |     |        | scattered.c |
|     8 |          |     |        |             |
|     9 |        7 |   0 | ebx    | common.c    |
|    10 |        D |   1 | eax    | common.c    |
|    11 |        F |   0 | edx    | common.c    |
|    12 |        F |   1 | edx    | common.c    |

Notes:

- empty entries mean: "from various places" or "not available"
- index: the index into x86_capability, e.g. x86_capability[0]
- eax and ecx: the input values for CPUID, in hex. The few inputs that use ecx call it the subleaf (of a 2-level tree with eax at the root).
- output: the register from which the CPUID output is taken
- file: the file where those fields are defined. Paths are relative to arch/x86/kernel/cpu/.
- transmeta: was the name of a CPU vendor https://en.wikipedia.org/wiki/Transmeta that was acquired by Novafora https://www.crunchbase.com/organization/novafora
- centaur: was the name of a CPU vendor https://en.wikipedia.org/wiki/Centaur_Technology that was acquired by VIA https://en.wikipedia.org/wiki/VIA_Technologies. Cyrix is another one.

Conclusions:

- most entries come directly from CPUID output registers and are set in common.c by something like:

      c->x86_capability[0] = edx;

  Those are easy to find in batch in the Intel manual for CPUID.

- the others are scattered throughout the source, and are set bit by bit with set_cpu_cap.

  To find them, use git grep X86_FEATURE_XXX inside arch/x86.

  You can usually deduce what CPUID bit they correspond to from the surrounding code.

Other fun facts

The flags are actually printed at arch/x86/kernel/cpu/proc.c with the code:

seq_puts(m, "flags\t\t:");
for (i = 0; i < 32*NCAPINTS; i++)
    if (cpu_has(c, i) && x86_cap_flags[i] != NULL)
        seq_printf(m, " %s", x86_cap_flags[i]);

Where:

- cpu_has does the main check for the feature.
- x86_cap_flags[i] contains strings that correspond to each flag.

This gets passed as a callback to the proc system setup. The entry point is at fs/proc/cpuinfo.c.

x86_cap_flags strings are generated by arch/x86/kernel/cpu/mkcapflags.sh directly from arch/x86/include/asm/cpufeature.h by "parsing" it with sed...

The output goes to arch/x86/kernel/cpu/capflags.c of the build directory, and resulting array looks like:

const char * const x86_cap_flags[NCAPINTS*32] = {
    [X86_FEATURE_FPU]        = "fpu",
    [X86_FEATURE_VME]        = "vme",

so for example X86_FEATURE_FPU corresponds to the string "fpu" and so on.

cpu_has breaks down into two cases with code:

#define cpu_has(c, bit)                         \
    (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 :  \
    test_cpu_cap(c, bit))

They are:

- __builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit): the flag is required for the kernel to run.

  This is determined by data inside required-features.h, which comments:

      Define minimum CPUID feature set for kernel These bits are checked
      really early to actually display a visible error message before the
      kernel dies.  Make sure to assign features to the proper mask!

  Since those are known at compile time (kernel requirements) and have already been checked at startup, the check can be resolved at compile time if bit is known at compile time.

  Thus the __builtin_constant_p(bit), which checks if bit is a compile-time constant.

- test_cpu_cap: this looks up the CPUID data in the struct cpuinfo_x86 boot_cpu_data global
~~~


### Power management


https://www.kernel.org/doc/Documentation/usb/power-management.txt
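The kernel document quoted in this section describes two per-device sysfs attributes, `power/control` and `power/autosuspend_delay_ms`, under `/sys/bus/usb/devices/.../power/`. As a rough illustration of how they combine, here is a Python sketch that reads both files for every USB device and reports whether autosuspend is permitted. The function names are invented for this example, and the rule in `autosuspend_allowed` is only a reading of the document's statement that writing "on" to `power/control` or a negative delay each prevent autosuspend.

```python
# Sketch: report the autosuspend settings of USB devices from sysfs.
# Function names are hypothetical; attribute names come from the kernel document.
from pathlib import Path


def read_attr(power_dir, name):
    """Return the stripped contents of a power/ attribute, or None if unreadable."""
    try:
        return (Path(power_dir) / name).read_text().strip()
    except OSError:
        return None


def autosuspend_allowed(control, delay_ms):
    """'auto' permits autosuspend; 'on' or a negative delay prevents it."""
    return control == "auto" and delay_ms is not None and delay_ms >= 0


def usb_power_report(root="/sys/bus/usb/devices"):
    """Map device ID -> (control, delay in ms, autosuspend allowed)."""
    report = {}
    root = Path(root)
    if not root.is_dir():
        return report  # not Linux, or no USB bus exposed
    for dev in sorted(root.iterdir()):
        power = dev / "power"
        control = read_attr(power, "control")
        if control is None:
            continue
        raw = read_attr(power, "autosuspend_delay_ms")
        try:
            delay = int(raw) if raw is not None else None
        except ValueError:
            delay = None
        report[dev.name] = (control, delay, autosuspend_allowed(control, delay))
    return report


for name, (control, delay, allowed) in usb_power_report().items():
    print(name, control, delay, "autosuspend allowed" if allowed else "autosuspend off")
```

The sketch only reads the attributes. Changing them means writing the same words or numbers back to the files, which normally requires root.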


~~~
Power Management for USB

Alan Stern <stern@rowland.harvard.edu>

   Last-updated: February 2014


Contents:
---------
* What is Power Management?
* What is Remote Wakeup?
* When is a USB device idle?
* Forms of dynamic PM
* The user interface for dynamic PM
* Changing the default idle-delay time
* Warnings
* The driver interface for Power Management
* The driver interface for autosuspend and autoresume
* Other parts of the driver interface
* Mutual exclusion
* Interaction between dynamic PM and system PM
* xHCI hardware link PM
* USB Port Power Control
* User Interface for Port Power Control
* Suggested Userspace Port Power Policy


What is Power Management?
-------------------------

Power Management (PM) is the practice of saving energy by suspending
parts of a computer system when they aren't being used.  While a
component is "suspended" it is in a nonfunctional low-power state; it
might even be turned off completely.  A suspended component can be
"resumed" (returned to a functional full-power state) when the kernel
needs to use it.  (There also are forms of PM in which components are
placed in a less functional but still usable state instead of being
suspended; an example would be reducing the CPU's clock rate.  This
document will not discuss those other forms.)

When the parts being suspended include the CPU and most of the rest of
the system, we speak of it as a "system suspend".  When a particular
device is turned off while the system as a whole remains running, we
call it a "dynamic suspend" (also known as a "runtime suspend" or
"selective suspend").  This document concentrates mostly on how
dynamic PM is implemented in the USB subsystem, although system PM is
covered to some extent (see Documentation/power/*.txt for more
information about system PM).

System PM support is present only if the kernel was built with CONFIG_SUSPEND
or CONFIG_HIBERNATION enabled.  Dynamic PM support for USB is present whenever
the kernel was built with CONFIG_PM enabled.

[Historically, dynamic PM support for USB was present only if the
kernel had been built with CONFIG_USB_SUSPEND enabled (which depended on
CONFIG_PM_RUNTIME).  Starting with the 3.10 kernel release, dynamic PM support
for USB was present whenever the kernel was built with CONFIG_PM_RUNTIME
enabled.  The CONFIG_USB_SUSPEND option had been eliminated.]


What is Remote Wakeup?
----------------------

When a device has been suspended, it generally doesn't resume until
the computer tells it to.  Likewise, if the entire computer has been
suspended, it generally doesn't resume until the user tells it to, say
by pressing a power button or opening the cover.

However some devices have the capability of resuming by themselves, or
asking the kernel to resume them, or even telling the entire computer
to resume.  This capability goes by several names such as "Wake On
LAN"; we will refer to it generically as "remote wakeup".  When a
device is enabled for remote wakeup and it is suspended, it may resume
itself (or send a request to be resumed) in response to some external
event.  Examples include a suspended keyboard resuming when a key is
pressed, or a suspended USB hub resuming when a device is plugged in.


When is a USB device idle?
--------------------------

A device is idle whenever the kernel thinks it's not busy doing
anything important and thus is a candidate for being suspended.  The
exact definition depends on the device's driver; drivers are allowed
to declare that a device isn't idle even when there's no actual
communication taking place.  (For example, a hub isn't considered idle
unless all the devices plugged into that hub are already suspended.)
In addition, a device isn't considered idle so long as a program keeps
its usbfs file open, whether or not any I/O is going on.

If a USB device has no driver, its usbfs file isn't open, and it isn't
being accessed through sysfs, then it definitely is idle.


Forms of dynamic PM
-------------------

Dynamic suspends occur when the kernel decides to suspend an idle
device.  This is called "autosuspend" for short.  In general, a device
won't be autosuspended unless it has been idle for some minimum period
of time, the so-called idle-delay time.

Of course, nothing the kernel does on its own initiative should
prevent the computer or its devices from working properly.  If a
device has been autosuspended and a program tries to use it, the
kernel will automatically resume the device (autoresume).  For the
same reason, an autosuspended device will usually have remote wakeup
enabled, if the device supports remote wakeup.

It is worth mentioning that many USB drivers don't support
autosuspend.  In fact, at the time of this writing (Linux 2.6.23) the
only drivers which do support it are the hub driver, kaweth, asix,
usblp, usblcd, and usb-skeleton (which doesn't count).  If a
non-supporting driver is bound to a device, the device won't be
autosuspended.  In effect, the kernel pretends the device is never
idle.

We can categorize power management events in two broad classes:
external and internal.  External events are those triggered by some
agent outside the USB stack: system suspend/resume (triggered by
userspace), manual dynamic resume (also triggered by userspace), and
remote wakeup (triggered by the device).  Internal events are those
triggered within the USB stack: autosuspend and autoresume.  Note that
all dynamic suspend events are internal; external agents are not
allowed to issue dynamic suspends.


The user interface for dynamic PM
---------------------------------

The user interface for controlling dynamic PM is located in the power/
subdirectory of each USB device's sysfs directory, that is, in
/sys/bus/usb/devices/.../power/ where "..." is the device's ID.  The
relevant attribute files are: wakeup, control, and
autosuspend_delay_ms.  (There may also be a file named "level"; this
file was deprecated as of the 2.6.35 kernel and replaced by the
"control" file.  In 2.6.38 the "autosuspend" file will be deprecated
and replaced by the "autosuspend_delay_ms" file.  The only difference
is that the newer file expresses the delay in milliseconds whereas the
older file uses seconds.  Confusingly, both files are present in 2.6.37
but only "autosuspend" works.)

    power/wakeup

        This file is empty if the device does not support
        remote wakeup.  Otherwise the file contains either the
        word "enabled" or the word "disabled", and you can
        write those words to the file.  The setting determines
        whether or not remote wakeup will be enabled when the
        device is next suspended.  (If the setting is changed
        while the device is suspended, the change won't take
        effect until the following suspend.)

    power/control

        This file contains one of two words: "on" or "auto".
        You can write those words to the file to change the
        device's setting.

        "on" means that the device should be resumed and
        autosuspend is not allowed.  (Of course, system
        suspends are still allowed.)

        "auto" is the normal state in which the kernel is
        allowed to autosuspend and autoresume the device.

        (In kernels up to 2.6.32, you could also specify
        "suspend", meaning that the device should remain
        suspended and autoresume was not allowed.  This
        setting is no longer supported.)

    power/autosuspend_delay_ms

        This file contains an integer value, which is the
        number of milliseconds the device should remain idle
        before the kernel will autosuspend it (the idle-delay
        time).  The default is 2000.  0 means to autosuspend
        as soon as the device becomes idle, and negative
        values mean never to autosuspend.  You can write a
        number to the file to change the autosuspend
        idle-delay time.

Writing "-1" to power/autosuspend_delay_ms and writing "on" to
power/control do essentially the same thing -- they both prevent the
device from being autosuspended.  Yes, this is a redundancy in the
API.

(In 2.6.21 writing "0" to power/autosuspend would prevent the device
from being autosuspended; the behavior was changed in 2.6.22.  The
power/autosuspend attribute did not exist prior to 2.6.21, and the
power/level attribute did not exist prior to 2.6.22.  power/control
was added in 2.6.34, and power/autosuspend_delay_ms was added in
2.6.37 but did not become functional until 2.6.38.)


Changing the default idle-delay time
------------------------------------

The default autosuspend idle-delay time (in seconds) is controlled by
a module parameter in usbcore.  You can specify the value when usbcore
is loaded.  For example, to set it to 5 seconds instead of 2 you would
do:

    modprobe usbcore autosuspend=5

Equivalently, you could add to a configuration file in /etc/modprobe.d
a line saying:

    options usbcore autosuspend=5

Some distributions load the usbcore module very early during the boot
process, by means of a program or script running from an initramfs
image.  To alter the parameter value you would have to rebuild that
image.

If usbcore is compiled into the kernel rather than built as a loadable
module, you can add

    usbcore.autosuspend=5

to the kernel's boot command line.

Finally, the parameter value can be changed while the system is
running.  If you do:

    echo 5 >/sys/module/usbcore/parameters/autosuspend

then each new USB device will have its autosuspend idle-delay
initialized to 5.  (The idle-delay values for already existing devices
will not be affected.)

Setting the initial default idle-delay to -1 will prevent any
autosuspend of any USB device.  This has the benefit of allowing you
then to enable autosuspend for selected devices.


Warnings
--------

The USB specification states that all USB devices must support power
management.  Nevertheless, the sad fact is that many devices do not
support it very well.  You can suspend them all right, but when you
try to resume them they disconnect themselves from the USB bus or
they stop working entirely.  This seems to be especially prevalent
among printers and scanners, but plenty of other types of device have
the same deficiency.

For this reason, by default the kernel disables autosuspend (the
power/control attribute is initialized to "on") for all devices other
than hubs.  Hubs, at least, appear to be reasonably well-behaved in
this regard.

(In 2.6.21 and 2.6.22 this wasn't the case.  Autosuspend was enabled
by default for almost all USB devices.  A number of people experienced
problems as a result.)

This means that non-hub devices won't be autosuspended unless the user
or a program explicitly enables it.  As of this writing there aren't
any widespread programs which will do this; we hope that in the near
future device managers such as HAL will take on this added
responsibility.  In the meantime you can always carry out the
necessary operations by hand or add them to a udev script.  You can
also change the idle-delay time; 2 seconds is not the best choice for
every device.

If a driver knows that its device has proper suspend/resume support,
it can enable autosuspend all by itself.  For example, the video
driver for a laptop's webcam might do this (in recent kernels they
do), since these devices are rarely used and so should normally be
autosuspended.

Sometimes it turns out that even when a device does work okay with
autosuspend there are still problems.  For example, the usbhid driver,
which manages keyboards and mice, has autosuspend support.  Tests with
a number of keyboards show that typing on a suspended keyboard, while
causing the keyboard to do a remote wakeup all right, will nonetheless
frequently result in lost keystrokes.  Tests with mice show that some
of them will issue a remote-wakeup request in response to button
presses but not to motion, and some in response to neither.

The kernel will not prevent you from enabling autosuspend on devices
that can't handle it.  It is even possible in theory to damage a
device by suspending it at the wrong time.  (Highly unlikely, but
possible.)  Take care.


The driver interface for Power Management
-----------------------------------------

The requirements for a USB driver to support external power management
are pretty modest; the driver need only define

.suspend
.resume
.reset_resume

methods in its usb_driver structure, and the reset_resume method is
optional.  The methods' jobs are quite simple:

The suspend method is called to warn the driver that the
device is going to be suspended.  If the driver returns a
negative error code, the suspend will be aborted.  Normally
the driver will return 0, in which case it must cancel all
outstanding URBs (usb_kill_urb()) and not submit any more.

The resume method is called to tell the driver that the
device has been resumed and the driver can return to normal
operation.  URBs may once more be submitted.

The reset_resume method is called to tell the driver that
the device has been resumed and it also has been reset.
The driver should redo any necessary device initialization,
since the device has probably lost most or all of its state
(although the interfaces will be in the same altsettings as
before the suspend).

If the device is disconnected or powered down while it is suspended,
the disconnect method will be called instead of the resume or
reset_resume method.  This is also quite likely to happen when
waking up from hibernation, as many systems do not maintain suspend
current to the USB host controllers during hibernation.  (It's
possible to work around the hibernation-forces-disconnect problem by
using the USB Persist facility.)

The reset_resume method is used by the USB Persist facility (see
Documentation/usb/persist.txt) and it can also be used under certain
circumstances when CONFIG_USB_PERSIST is not enabled.  Currently, if a
device is reset during a resume and the driver does not have a
reset_resume method, the driver won't receive any notification about
the resume.  Later kernels will call the driver's disconnect method;
2.6.23 doesn't do this.

USB drivers are bound to interfaces, so their suspend and resume
methods get called when the interfaces are suspended or resumed.  In
principle one might want to suspend some interfaces on a device (i.e.,
force the drivers for those interfaces to stop all activity) without
suspending the other interfaces.  The USB core doesn't allow this; all
interfaces are suspended when the device itself is suspended and all
interfaces are resumed when the device is resumed.  It isn't possible
to suspend or resume some but not all of a device's interfaces.  The
closest you can come is to unbind the interfaces' drivers.


The driver interface for autosuspend and autoresume
---------------------------------------------------

To support autosuspend and autoresume, a driver should implement all
three of the methods listed above.  In addition, a driver indicates
that it supports autosuspend by setting the .supports_autosuspend flag
in its usb_driver structure.  It is then responsible for informing the
USB core whenever one of its interfaces becomes busy or idle.  The
driver does so by calling these six functions:

int  usb_autopm_get_interface(struct usb_interface *intf);
void usb_autopm_put_interface(struct usb_interface *intf);
int  usb_autopm_get_interface_async(struct usb_interface *intf);
void usb_autopm_put_interface_async(struct usb_interface *intf);
void usb_autopm_get_interface_no_resume(struct usb_interface *intf);
void usb_autopm_put_interface_no_suspend(struct usb_interface *intf);

The functions work by maintaining a usage counter in the
usb_interface's embedded device structure.  When the counter is > 0
then the interface is deemed to be busy, and the kernel will not
autosuspend the interface's device.  When the usage counter is = 0
then the interface is considered to be idle, and the kernel may
autosuspend the device.

Drivers need not be concerned about balancing changes to the usage
counter; the USB core will undo any remaining "get"s when a driver
is unbound from its interface.  As a corollary, drivers must not call
any of the usb_autopm_* functions after their disconnect() routine has
returned.

Drivers using the async routines are responsible for their own
synchronization and mutual exclusion.

usb_autopm_get_interface() increments the usage counter and
does an autoresume if the device is suspended.  If the
autoresume fails, the counter is decremented back.

usb_autopm_put_interface() decrements the usage counter and
attempts an autosuspend if the new value is = 0.

usb_autopm_get_interface_async() and
usb_autopm_put_interface_async() do almost the same things as
their non-async counterparts.  The big difference is that they
use a workqueue to do the resume or suspend part of their
jobs.  As a result they can be called in an atomic context,
such as an URB's completion handler, but when they return the
device will generally not yet be in the desired state.

usb_autopm_get_interface_no_resume() and
usb_autopm_put_interface_no_suspend() merely increment or
decrement the usage counter; they do not attempt to carry out
an autoresume or an autosuspend.  Hence they can be called in
an atomic context.

The simplest usage pattern is that a driver calls
usb_autopm_get_interface() in its open routine and
usb_autopm_put_interface() in its close or release routine.  But other
patterns are possible.

The autosuspend attempts mentioned above will often fail for one
reason or another.  For example, the power/control attribute might be
set to "on", or another interface in the same device might not be
idle.  This is perfectly normal.  If the reason for failure was that
the device hasn't been idle for long enough, a timer is scheduled to
carry out the operation automatically when the autosuspend idle-delay
has expired.

Autoresume attempts also can fail, although failure would mean that
the device is no longer present or operating properly.  Unlike
autosuspend, there's no idle-delay for an autoresume.


Other parts of the driver interface
-----------------------------------

Drivers can enable autosuspend for their devices by calling

usb_enable_autosuspend(struct usb_device *udev);

in their probe() routine, if they know that the device is capable of
suspending and resuming correctly.  This is exactly equivalent to
writing "auto" to the device's power/control attribute.  Likewise,
drivers can disable autosuspend by calling

usb_disable_autosuspend(struct usb_device *udev);

This is exactly the same as writing "on" to the power/control attribute.

Sometimes a driver needs to make sure that remote wakeup is enabled
during autosuspend.  For example, there's not much point
autosuspending a keyboard if the user can't cause the keyboard to do a
remote wakeup by typing on it.  If the driver sets
intf->needs_remote_wakeup to 1, the kernel won't autosuspend the
device if remote wakeup isn't available.  (If the device is already
autosuspended, though, setting this flag won't cause the kernel to
autoresume it.  Normally a driver would set this flag in its probe
method, at which time the device is guaranteed not to be
autosuspended.)

If a driver does its I/O asynchronously in interrupt context, it
should call usb_autopm_get_interface_async() before starting output and
usb_autopm_put_interface_async() when the output queue drains.  When
it receives an input event, it should call

usb_mark_last_busy(struct usb_device *udev);

in the event handler.  This tells the PM core that the device was just
busy and therefore the next autosuspend idle-delay expiration should
be pushed back.  Many of the usb_autopm_* routines also make this call,
so drivers need to worry only when interrupt-driven input arrives.

Asynchronous operation is always subject to races.  For example, a
driver may call the usb_autopm_get_interface_async() routine at a time
when the core has just finished deciding the device has been idle for
long enough but not yet gotten around to calling the driver's suspend
method.  The suspend method must be responsible for synchronizing with
the I/O request routine and the URB completion handler; it should
cause autosuspends to fail with -EBUSY if the driver needs to use the
device.

External suspend calls should never be allowed to fail in this way,
only autosuspend calls.  The driver can tell them apart by applying
the PMSG_IS_AUTO() macro to the message argument to the suspend
method; it will return True for internal PM events (autosuspend) and
False for external PM events.


Mutual exclusion
----------------

For external events -- but not necessarily for autosuspend or
autoresume -- the device semaphore (udev->dev.sem) will be held when a
suspend or resume method is called.  This implies that external
suspend/resume events are mutually exclusive with calls to probe,
disconnect, pre_reset, and post_reset; the USB core guarantees that
this is true of autosuspend/autoresume events as well.

If a driver wants to block all suspend/resume calls during some
critical section, the best way is to lock the device and call
usb_autopm_get_interface() (and do the reverse at the end of the
critical section).  Holding the device semaphore will block all
external PM calls, and the usb_autopm_get_interface() will prevent any
internal PM calls, even if it fails.  (Exercise: Why?)


Interaction between dynamic PM and system PM
--------------------------------------------

Dynamic power management and system power management can interact in
a couple of ways.

Firstly, a device may already be autosuspended when a system suspend
occurs.  Since system suspends are supposed to be as transparent as
possible, the device should remain suspended following the system
resume.  But this theory may not work out well in practice; over time
the kernel's behavior in this regard has changed.  As of 2.6.37 the
policy is to resume all devices during a system resume and let them
handle their own runtime suspends afterward.

Secondly, a dynamic power-management event may occur as a system
suspend is underway.  The window for this is short, since system
suspends don't take long (a few seconds usually), but it can happen.
For example, a suspended device may send a remote-wakeup signal while
the system is suspending.  The remote wakeup may succeed, which would
cause the system suspend to abort.  If the remote wakeup doesn't
succeed, it may still remain active and thus cause the system to
resume as soon as the system suspend is complete.  Or the remote
wakeup may fail and get lost.  Which outcome occurs depends on timing
and on the hardware and firmware design.


xHCI hardware link PM
---------------------

An xHCI host controller provides hardware link power management to
USB 2.0 (an xHCI 1.0 feature) and USB 3.0 devices which support link PM.
With hardware LPM enabled, the host can automatically put the device
into a lower power state (L1 for USB 2.0 devices, or U1/U2 for USB 3.0
devices), which the device can enter and leave very quickly.

The user interface for controlling hardware LPM is located in the
power/ subdirectory of each USB device's sysfs directory, that is, in
/sys/bus/usb/devices/.../power/ where "..." is the device's ID. The
relevant attribute files are usb2_hardware_lpm, usb3_hardware_lpm_u1
and usb3_hardware_lpm_u2.

power/usb2_hardware_lpm

When a USB2 device which supports LPM is plugged into an
xHCI host root hub which supports software LPM, the
host will run a software LPM test for it; if the device
enters L1 state and resumes successfully and the host
supports USB2 hardware LPM, this file will show up and
the driver will enable hardware LPM for the device. You
can write y/Y/1 or n/N/0 to the file to enable/disable
USB2 hardware LPM manually. This is mainly for testing.

power/usb3_hardware_lpm_u1
power/usb3_hardware_lpm_u2

When a USB 3.0 LPM-capable device is plugged in to an
xHCI host which supports link PM, the host will check if U1
and U2 exit latencies have been set in the BOS
descriptor; if the check is passed and the host
supports USB3 hardware LPM, USB3 hardware LPM will be
enabled for the device and these files will be created.
The files hold a string value (enable or disable)
indicating whether or not USB3 hardware LPM U1 or U2
is enabled for the device.

USB Port Power Control
----------------------

In addition to suspending endpoint devices and enabling hardware
controlled link power management, the USB subsystem also has the
capability to disable power to ports under some conditions.  Power is
controlled through Set/ClearPortFeature(PORT_POWER) requests to a hub.
In the case of a root or platform-internal hub the host controller
driver translates PORT_POWER requests into platform firmware (ACPI)
method calls to set the port power state. For more background see the
Linux Plumbers Conference 2012 slides [1] and video [2].

Upon receiving a ClearPortFeature(PORT_POWER) request a USB port is
logically off, and may trigger the actual loss of VBUS to the port [3].
VBUS may be maintained in the case where a hub gangs multiple ports into
a shared power well causing power to remain until all ports in the gang
are turned off.  VBUS may also be maintained by hub ports configured for
a charging application.  In any event a logically off port will lose
connection with its device, not respond to hotplug events, and not
respond to remote wakeup events*.

WARNING: turning off a port may result in the inability to hot add a device.
Please see "User Interface for Port Power Control" for details.

The effect on the device itself is similar to what a device
goes through during system suspend, i.e. the power session is lost.  Any
USB device or driver that misbehaves with system suspend will be
similarly affected by a port power cycle event.  For this reason the
implementation shares the same device recovery path (and honors the same
quirks) as the system resume path for the hub.

[1]: http://dl.dropbox.com/u/96820575/sarah-sharp-lpt-port-power-off2-mini.pdf
[2]: http://linuxplumbers.ubicast.tv/videos/usb-port-power-off-kerneluserspace-api/
[3]: USB 3.1 Section 10.12
* wakeup note: if a device is configured to send wakeup events the port
power control implementation will block poweroff attempts on that
port.


User Interface for Port Power Control
-------------------------------------

The port power control mechanism uses the PM runtime system.  Poweroff is
requested by clearing the power/pm_qos_no_power_off flag of the port device
(defaults to 1).  If the port is disconnected it will immediately receive a
ClearPortFeature(PORT_POWER) request.  Otherwise, it will honor the pm runtime
rules and require the attached child device and all descendants to be suspended.
This mechanism is dependent on the hub advertising port power switching in its
hub descriptor (wHubCharacteristics logical power switching mode field).

Note, some interface devices/drivers do not support autosuspend.  Userspace may
need to unbind the interface drivers before the usb_device will suspend.  An
unbound interface device is suspended by default.  When unbinding, be careful
to unbind interface drivers, not the driver of the parent usb device.  Also,
leave hub interface drivers bound.  If the driver for the usb device (not
interface) is unbound the kernel is no longer able to resume the device.  If a
hub interface driver is unbound, control of its child ports is lost and all
attached child-devices will disconnect.  A good rule of thumb is that if the
'driver/module' link for a device points to /sys/module/usbcore then unbinding
it will interfere with port power control.

Example of the relevant files for port power control.  Note, in this example
these files are relative to a usb hub device (prefix).

prefix=/sys/devices/pci0000:00/0000:00:14.0/usb3/3-1

                 attached child device +
             hub port device +         |
hub interface device +       |         |
                     v       v         v
             $prefix/3-1:1.0/3-1-port1/device

$prefix/3-1:1.0/3-1-port1/power/pm_qos_no_power_off
$prefix/3-1:1.0/3-1-port1/device/power/control
$prefix/3-1:1.0/3-1-port1/device/3-1.1:<intf0>/driver/unbind
$prefix/3-1:1.0/3-1-port1/device/3-1.1:<intf1>/driver/unbind
...
$prefix/3-1:1.0/3-1-port1/device/3-1.1:<intfN>/driver/unbind

In addition to these files some ports may have a 'peer' link to a port on
another hub.  The expectation is that all superspeed ports have a
hi-speed peer.

$prefix/3-1:1.0/3-1-port1/peer -> ../../../../usb2/2-1/2-1:1.0/2-1-port1
../../../../usb2/2-1/2-1:1.0/2-1-port1/peer -> ../../../../usb3/3-1/3-1:1.0/3-1-port1

Distinct from 'companion ports', or 'ehci/xhci shared switchover ports'
peer ports are simply the hi-speed and superspeed interface pins that
are combined into a single usb3 connector.  Peer ports share the same
ancestor XHCI device.

While a superspeed port is powered off a device may downgrade its
connection and attempt to connect to the hi-speed pins.  The
implementation takes steps to prevent this:

1/ Port suspend is sequenced to guarantee that hi-speed ports are powered-off
before their superspeed peer is permitted to power-off.  The implication is
that setting pm_qos_no_power_off to zero on a superspeed port may not cause
the port to power-off until its hi-speed peer has gone to its runtime suspend
state.  Userspace must take care to order the suspensions if it wants to
guarantee that a superspeed port will power-off.

2/ Port resume is sequenced to force a superspeed port to power-on prior to its
hi-speed peer.

3/ Port resume always triggers an attached child device to resume.  After a
power session is lost the device may have been removed, or need reset.
Resuming the child device when the parent port regains power resolves those
states and clamps the maximum port power cycle frequency at the rate the child
device can suspend (autosuspend-delay) and resume (reset-resume latency).

Sysfs files relevant for port power control:
<hubdev-portX>/power/pm_qos_no_power_off:
This writable flag controls the state of an idle port.
Once all children and descendants have suspended the
port may suspend/poweroff provided that
pm_qos_no_power_off is '0'.  If pm_qos_no_power_off is
'1' the port will remain active/powered regardless of
the state of descendants.  Defaults to 1.

<hubdev-portX>/power/runtime_status:
This file reflects whether the port is 'active' (power is on)
or 'suspended' (logically off).  There is no indication to
userspace whether VBUS is still supplied.

<hubdev-portX>/connect_type:
An advisory read-only flag to userspace indicating the
location and connection type of the port.  It returns
one of four values 'hotplug', 'hardwired', 'not used',
and 'unknown'.  All values, besides unknown, are set by
platform firmware.

"hotplug" indicates an externally connectable/visible
port on the platform.  Typically userspace would choose
to keep such a port powered to handle new device
connection events.

"hardwired" refers to a port that is not visible but
connectable. Examples are internal ports for USB
bluetooth that can be disconnected via an external
switch or a port with a hardwired USB camera.  It is
expected to be safe to allow these ports to suspend
provided pm_qos_no_power_off is coordinated with any
switch that gates connections.  Userspace must arrange
for the device to be connected prior to the port
powering off, or to activate the port prior to enabling
connection via a switch.

"not used" refers to an internal port that is expected
to never have a device connected to it.  These may be
empty internal ports, or ports that are not physically
exposed on a platform.  Considered safe to be
powered-off at all times.

"unknown" means platform firmware does not provide
information for this port.  Most commonly refers to
external hub ports which should be considered 'hotplug'
for policy decisions.

NOTE1: since we are relying on the BIOS to get this ACPI
information correct, the USB port descriptions may be
missing or wrong.

NOTE2: Take care in clearing pm_qos_no_power_off.  Once
power is off this port will not respond to new connect
events.

Once a child device is attached additional constraints are
applied before the port is allowed to poweroff.

<child>/power/control:
Must be 'auto', and the port will not
power down until <child>/power/runtime_status
reflects the 'suspended' state.  Default
value is controlled by child device driver.

<child>/power/persist:
This defaults to '1' for most devices and indicates if
kernel can persist the device's configuration across a
power session loss (suspend / port-power event).  When
this value is '0' (quirky devices), port poweroff is
disabled.

<child>/driver/unbind:
Wakeup capable devices will block port poweroff.  At
this time the only mechanism to clear the usb-internal
wakeup-capability for an interface device is to unbind
its driver.

Summary of poweroff pre-requisite settings relative to a port device:

echo 0 > power/pm_qos_no_power_off
echo 0 > peer/power/pm_qos_no_power_off # if it exists
echo auto > power/control # this is the default value
echo auto > <child>/power/control
echo 1 > <child>/power/persist # this is the default value

Suggested Userspace Port Power Policy
-------------------------------------

As noted above userspace needs to be careful and deliberate about what
ports are enabled for poweroff.

The default configuration is that all ports start with
power/pm_qos_no_power_off set to '1' causing ports to always remain
active.

Given confidence in the platform firmware's description of the ports
(ACPI _PLD record for a port populates 'connect_type') userspace can
clear pm_qos_no_power_off for all 'not used' ports.  The same can be
done for 'hardwired' ports provided poweroff is coordinated with any
connection switch for the port.

A more aggressive userspace policy is to enable USB port power off for
all ports (set <hubdev-portX>/power/pm_qos_no_power_off to '0') when
some external factor indicates the user has stopped interacting with the
system.  For example, a distro may want to power off all USB
ports when the screen blanks, and re-power them when the screen becomes
active.  Smart phones and tablets may want to power off USB ports when
the user pushes the power button.
~~~
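
The "Suggested Userspace Port Power Policy" at the end of the quoted document can be sketched as a small decision function. This is only an illustrative Python sketch under stated assumptions, not something the kernel documentation provides: the names `may_power_off`, `apply_policy`, `switch_coordinated` and `user_idle` are hypothetical, while the sysfs file names (`connect_type`, `power/pm_qos_no_power_off`) and the four `connect_type` values are taken from the text above. Writing to the real sysfs files would need root and a hub that advertises port power switching.

```python
from pathlib import Path

# connect_type values as listed in the quoted document.
HOTPLUG, HARDWIRED, NOT_USED, UNKNOWN = "hotplug", "hardwired", "not used", "unknown"


def may_power_off(connect_type: str, switch_coordinated: bool = False,
                  user_idle: bool = False) -> bool:
    """Return True if the policy allows clearing pm_qos_no_power_off.

    switch_coordinated and user_idle are hypothetical inputs that the
    caller has to determine by its own means.
    """
    if user_idle:
        # Aggressive policy: every port may power off while the user is away.
        return True
    if connect_type == NOT_USED:
        # Never expected to have a device, so always safe to power off.
        return True
    if connect_type == HARDWIRED:
        # Only safe when poweroff is coordinated with any connection switch.
        return switch_coordinated
    # 'hotplug' and 'unknown' (treated like hotplug) stay powered so that
    # new connect events are still noticed.
    return False


def apply_policy(port_dir: Path, **kwargs) -> bool:
    """Read <port>/connect_type and write <port>/power/pm_qos_no_power_off."""
    connect_type = (port_dir / "connect_type").read_text().strip()
    allow = may_power_off(connect_type, **kwargs)
    # '0' permits poweroff, '1' (the default) keeps the port active.
    (port_dir / "power" / "pm_qos_no_power_off").write_text("0" if allow else "1")
    return allow
```

Calling `apply_policy(port, user_idle=True)` when the screen blanks and `apply_policy(port)` when it becomes active again would mirror the screen-blank example in the text. Note that the port still powers off only once its child devices are suspended and the other prerequisites listed above are met.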

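The usage counter behind the `usb_autopm_*` functions in the quoted document may be easier to follow as a toy model. The sketch below is plain Python, not kernel code: the class `InterfaceModel` and its method names are hypothetical, it treats one interface as the whole device, and it ignores the idle-delay timer, so a "put" that reaches zero suspends immediately when `power/control` is "auto".

```python
class InterfaceModel:
    """Toy model of the autosuspend usage counter (not kernel code)."""

    def __init__(self, control="auto", resume_ok=True):
        self.usage = 0              # > 0 means busy, 0 means idle
        self.suspended = False
        self.control = control      # models power/control: "auto" or "on"
        self.resume_ok = resume_ok  # whether a simulated autoresume succeeds

    def get(self):
        """Models usb_autopm_get_interface(): 0 on success, negative on error."""
        self.usage += 1
        if self.suspended:
            if not self.resume_ok:
                self.usage -= 1     # failed autoresume: counter goes back
                return -1
            self.suspended = False
        return 0

    def put(self):
        """Models usb_autopm_put_interface(): decrement, then try to suspend."""
        self.usage -= 1
        self._try_autosuspend()

    def get_no_resume(self):
        """Models usb_autopm_get_interface_no_resume(): counter only."""
        self.usage += 1

    def put_no_suspend(self):
        """Models usb_autopm_put_interface_no_suspend(): counter only."""
        self.usage -= 1

    def _try_autosuspend(self):
        # The attempt fails silently when the interface is busy or when
        # power/control is "on"; the text calls this perfectly normal.
        if self.usage == 0 and self.control == "auto":
            self.suspended = True
```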
## Interrupts

The kernel is always busy with computation, such as loop processing.
While it works, input signals arrive from external sources such as keyboards and networks.
The mechanism by which the kernel handles these signals is called an "interrupt".


# Differences between Linux and Unix

Linux :
"soft interrupts"
reduce the load on the system as a whole

Unix :
uses the concept of "interrupt levels"
to ensure responsiveness


-p21
<br><br>
   <script>
	(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
	(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
	m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
	})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
	ga('create', '', 'auto');
	ga('send', 'pageview', {
	  'page': '/linux/Linux-Kernel.md/',
	  'title': 'Linux Kernel'
	});
</script>

   
<script id="dsq-count-scr" src="//ghasshee.disqus.com/count.js" async></script>


<!-- Markdeep: --><style class="fallback">body{visibility:hidden}</style>
<script>markdeepOptions={tocStyle:'long'};</script>
<script src="/dependencies/markdeep/latest/markdeep.min.js?" charset="utf-8"></script>