unshare system call:
--------------------
This document describes the new system call, unshare. The document
provides an overview of the feature, why it is needed, how it can
be used, its interface specification, design, implementation and
how it can be tested.

Change Log:
-----------
version 0.1  Initial document, Janak Desai (janak@us.ibm.com), Jan 11, 2006

Contents:
---------
    1) Overview
    2) Benefits
    3) Cost
    4) Requirements
    5) Functional Specification
    6) High Level Design
    7) Low Level Design
    8) Test Specification
    9) Future Work

1) Overview
-----------
Most legacy operating system kernels support an abstraction of threads
as multiple execution contexts within a process. These kernels provide
special resources and mechanisms to maintain these "threads". The Linux
kernel, in a clever and simple manner, does not make a distinction
between processes and "threads". The kernel allows processes to share
resources and thus they can achieve legacy "threads" behavior without
requiring additional data structures and mechanisms in the kernel. The
power of implementing threads in this manner comes not only from
its simplicity but also from allowing application programmers to work
outside the confinement of all-or-nothing shared resources of legacy
threads. On Linux, at the time of thread creation using the clone system
call, applications can selectively choose which resources to share
between threads.

The unshare system call adds a primitive to the Linux thread model that
allows threads to selectively 'unshare' any resources that were being
shared at the time of their creation. unshare was conceptualized by
Al Viro in August of 2000, on the Linux-Kernel mailing list, as part
of the discussion on POSIX threads on Linux. unshare augments the
usefulness of Linux threads for applications that would like to control
shared resources without creating a new process.
unshare is a natural addition to the set of available primitives on
Linux that implement the concept of process/thread as a virtual machine.

2) Benefits
-----------
unshare would be useful to large application frameworks such as PAM
where creating a new process to control sharing/unsharing of process
resources is not possible. Since namespaces are shared by default
when creating a new process using fork or clone, unshare can benefit
even non-threaded applications if they have a need to disassociate
from the default shared namespace. The following are two use cases
where unshare can be used.

2.1 Per-security context namespaces
-----------------------------------
unshare can be used to implement polyinstantiated directories using
the kernel's per-process namespace mechanism. Polyinstantiated directories,
such as per-user and/or per-security context instances of /tmp, /var/tmp or
per-security context instances of a user's home directory, isolate user
processes when working with these directories. Using unshare, a PAM
module can easily set up a private namespace for a user at login.
Polyinstantiated directories are required for Common Criteria certification
with Labeled System Protection Profile; however, with the availability
of the shared-tree feature in the Linux kernel, even regular Linux systems
can benefit from setting up private namespaces at login and
polyinstantiating /tmp, /var/tmp and other directories deemed
appropriate by system administrators.

2.2 unsharing of virtual memory and/or open files
-------------------------------------------------
Consider a client/server application where the server is processing
client requests by creating processes that share resources such as
virtual memory and open files. Without unshare, the server has to
decide what needs to be shared at the time of creating the process
which services the request. unshare gives the server the ability to
disassociate parts of the context during the servicing of the
request.
For large and complex middleware application frameworks, this
ability to unshare after the process was created can be very
useful.

3) Cost
-------
In order to not duplicate code and to handle the fact that unshare
works on an active task (as opposed to clone/fork working on a newly
allocated inactive task), unshare had to make minor reorganizational
changes to the copy_* functions utilized by the clone/fork system calls.
There is a cost associated with altering existing, well tested and
stable code to implement a new feature that may not get exercised
extensively in the beginning. However, with proper design and code
review of the changes and creation of an unshare test for the LTP,
the benefits of this new feature can exceed its cost.

4) Requirements
---------------
unshare reverses sharing that was done using the clone(2) system call,
so unshare should have an interface similar to that of clone(2). That is,
since flags in clone(int flags, void *stack) specify what should
be shared, similar flags in unshare(int flags) should specify
what should be unshared. Unfortunately, this may appear to invert
the meaning of the flags from the way they are used in clone(2).
However, there was no easy solution that was less confusing and that
allowed incremental context unsharing in the future without an ABI change.

The unshare interface should accommodate possible future addition of
new context flags without requiring a rebuild of old applications.
If and when new context flags are added, the unshare design should allow
incremental unsharing of those resources on an as-needed basis.

5) Functional Specification
---------------------------
NAME
    unshare - disassociate parts of the process execution context

SYNOPSIS
    #include <sched.h>

    int unshare(int flags);

DESCRIPTION
    unshare allows a process to disassociate parts of its execution
    context that are currently being shared with other processes.
    Part of the execution context, such as the namespace, is shared by
    default when a new process is created using fork(2), while other
    parts, such as the virtual memory, open file descriptors, etc, may
    be shared by explicit request to share them when creating a process
    using clone(2).

    The main use of unshare is to allow a process to control its
    shared execution context without creating a new process.

    The flags argument specifies one, or the bitwise OR of several, of
    the following constants.

    CLONE_FS
        If CLONE_FS is set, file system information of the caller
        is disassociated from the shared file system information.

    CLONE_FILES
        If CLONE_FILES is set, the file descriptor table of the
        caller is disassociated from the shared file descriptor
        table.

    CLONE_NEWNS
        If CLONE_NEWNS is set, the namespace of the caller is
        disassociated from the shared namespace.

    CLONE_VM
        If CLONE_VM is set, the virtual memory of the caller is
        disassociated from the shared virtual memory.

RETURN VALUE
    On success, zero is returned. On failure, -1 is returned and errno
    is set to indicate the error.

ERRORS
    EPERM   CLONE_NEWNS was specified by a non-root process (process
            without CAP_SYS_ADMIN).

    ENOMEM  Cannot allocate sufficient memory to copy parts of caller's
            context that need to be unshared.

    EINVAL  Invalid flag was specified as an argument.

CONFORMING TO
    The unshare() call is Linux-specific and should not be used
    in programs intended to be portable.

SEE ALSO
    clone(2), fork(2)

6) High Level Design
--------------------
Depending on the flags argument, the unshare system call allocates
appropriate process context structures, populates them with values from
the current shared version, associates newly duplicated structures
with the current task structure and releases corresponding shared
versions. Helper functions of clone (copy_*) could not be used
directly by unshare for the following two reasons.
  1) clone operates on a newly allocated not-yet-active task
     structure, whereas unshare operates on the current active
     task.
     Therefore unshare has to take the appropriate task_lock()
     before associating newly duplicated context structures.
  2) unshare has to allocate and duplicate all context structures
     that are being unshared, before associating them with the
     current task and releasing older shared structures. Failure
     to do so will create race conditions and/or oops when trying
     to back out due to an error. Consider the case of unsharing
     both virtual memory and namespace. After successfully unsharing
     vm, if the system call encounters an error while allocating
     the new namespace structure, the error return path will have to
     reverse the unsharing of vm. As part of the reversal the
     system call will have to go back to the older, shared, vm
     structure, which may not exist anymore.

Therefore code from copy_* functions that allocated and duplicated
the current context structure was moved into new dup_* functions. Now,
copy_* functions call dup_* functions to allocate and duplicate
appropriate context structures and then associate them with the
task structure that is being constructed.
The unshare system call, on the other hand, performs the following:
  1) Check flags to force missing, but implied, flags
  2) For each context structure, call the corresponding unshare
     helper function to allocate and duplicate a new context
     structure, if the appropriate bit is set in the flags argument.
  3) If there is no error in allocation and duplication and there
     are new context structures then lock the current task structure,
     associate new context structures with the current task structure,
     and release the lock on the current task structure.
  4) Appropriately release older, shared, context structures.

7) Low Level Design
-------------------
Implementation of unshare can be grouped in the following 4 different
items:
  a) Reorganization of existing copy_* functions
  b) unshare system call service function
  c) unshare helper functions for each different process context
  d) Registration of system call number for different architectures

  7.1) Reorganization of copy_* functions
       Each copy function such as copy_mm, copy_namespace, copy_files,
       etc, had roughly two components. The first component allocated
       and duplicated the appropriate structure and the second component
       linked it to the task structure passed in as an argument to the copy
       function. The first component was split into its own function.
       These dup_* functions allocated and duplicated the appropriate
       context structure. The reorganized copy_* functions invoked
       their corresponding dup_* functions and then linked the newly
       duplicated structures to the task structure with which the
       copy function was called.

  7.2) unshare system call service function
       * Check flags
         Force implied flags. If CLONE_THREAD is set force CLONE_VM.
         If CLONE_VM is set, force CLONE_SIGHAND. If CLONE_SIGHAND is
         set and signals are also being shared, force CLONE_THREAD.
         If CLONE_NEWNS is set, force CLONE_FS.
       * For each context flag, invoke the corresponding unshare_*
         helper routine with the flags passed into the system call and a
         reference to a pointer that will point to the new unshared
         structure.
       * If any new structures are created by unshare_* helper
         functions, take the task_lock() on the current task,
         modify appropriate context pointers, and release the
         task lock.
       * For all newly unshared structures, release the corresponding
         older, shared, structures.

  7.3) unshare_* helper functions
       For unshare_* helpers corresponding to CLONE_SYSVSEM, CLONE_SIGHAND,
       and CLONE_THREAD, return -EINVAL since they are not implemented yet.
       For others, check the flag value to see if the unsharing is
       required for that structure. If it is, invoke the corresponding
       dup_* function to allocate and duplicate the structure and return
       a pointer to it.

  7.4) Appropriately modify architecture specific code to register the
       new system call.

8) Test Specification
---------------------
The test for unshare should test the following:
  1) Valid flags: Test to check that clone flags for signals and
     signal handlers, for which unsharing is not implemented
     yet, return -EINVAL.
  2) Missing/implied flags: Test to make sure that unsharing the
     namespace without specifying unsharing of the filesystem correctly
     unshares both namespace and filesystem information.
  3) For each of the four (namespace, filesystem, files and vm)
     supported unsharing, verify that the system call correctly
     unshares the appropriate structure. Verify that unsharing
     them individually as well as in combination with each
     other works as expected.
  4) Concurrent execution: Use shared memory segments and futex on
     an address in the shm segment to synchronize execution of
     about 10 threads. Have a couple of threads execute execve,
     a couple _exit and the rest unshare with different combinations
     of flags.
     Verify that unsharing is performed as expected and
     that there are no oops or hangs.

9) Future Work
--------------
The current implementation of unshare does not allow unsharing of
signals and signal handlers. Signals are complex to begin with and
to unshare signals and/or signal handlers of a currently running
process is even more complex. If in the future there is a specific
need to allow unsharing of signals and/or signal handlers, it can
be incrementally added to unshare without affecting legacy
applications using unshare.
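
Illustration for section 6 (High Level Design)
----------------------------------------------
The ordering argued for in section 6 (duplicate every requested
structure first, then associate, then release) can be sketched in plain
user-space C. This is only a toy model under stated assumptions, not the
kernel code: struct demo_ctx, struct demo_task, dup_ctx, put_ctx,
demo_unshare, the DEMO_UNSHARE_* flags and the fail_on_dup switch are all
hypothetical names invented for this sketch, and the place where the
kernel would hold task_lock() is marked only by a comment.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for kernel context structures (not kernel code). */
struct demo_ctx {
    int refcount;   /* number of tasks sharing this structure */
    int payload;    /* placeholder for the real contents */
};

struct demo_task {
    struct demo_ctx *fs;   /* stand-in for file system information */
    struct demo_ctx *ns;   /* stand-in for the namespace */
};

/* Illustrative flag values for this sketch only. */
#define DEMO_UNSHARE_FS 0x1
#define DEMO_UNSHARE_NS 0x2

/* Test hook: if set to N > 0, the Nth dup_ctx() call from now fails. */
int fail_on_dup = 0;

/* Allocate and duplicate a structure without touching any task. */
struct demo_ctx *dup_ctx(const struct demo_ctx *old)
{
    struct demo_ctx *copy;

    if (fail_on_dup > 0 && --fail_on_dup == 0)
        return NULL;            /* simulated allocation failure */
    copy = malloc(sizeof(*copy));
    if (copy == NULL)
        return NULL;
    copy->refcount = 1;
    copy->payload = old->payload;
    return copy;
}

/* Drop one reference; free the structure when the last one goes away. */
void put_ctx(struct demo_ctx *ctx)
{
    if (ctx == NULL)
        return;
    assert(ctx->refcount > 0);
    if (--ctx->refcount == 0)
        free(ctx);
}

/* Returns 0 on success, -1 if any duplication failed. */
int demo_unshare(struct demo_task *task, int flags)
{
    struct demo_ctx *new_fs = NULL, *new_ns = NULL;
    struct demo_ctx *old_fs = NULL, *old_ns = NULL;

    /* Step 1: duplicate everything that was requested. The task is
     * not modified yet, so a failure needs no reversal. */
    if (flags & DEMO_UNSHARE_FS) {
        new_fs = dup_ctx(task->fs);
        if (new_fs == NULL)
            goto fail;
    }
    if (flags & DEMO_UNSHARE_NS) {
        new_ns = dup_ctx(task->ns);
        if (new_ns == NULL)
            goto fail;
    }

    /* Step 2: associate the new structures with the task. The kernel
     * would hold task_lock() around these pointer updates. */
    if (new_fs) {
        old_fs = task->fs;
        task->fs = new_fs;
    }
    if (new_ns) {
        old_ns = task->ns;
        task->ns = new_ns;
    }

    /* Step 3: release the older, shared structures. */
    put_ctx(old_fs);
    put_ctx(old_ns);
    return 0;

fail:
    /* Only the private copies are discarded; shared ones are untouched. */
    put_ctx(new_fs);
    put_ctx(new_ns);
    return -1;
}
```

Setting fail_on_dup to 2 before calling demo_unshare with both flags
reproduces the scenario from section 6: the first duplication succeeds,
the second fails, and the task still points at the original shared
structures because nothing was associated before all duplications
completed.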

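
Illustration for section 7.2 (flag check)
-----------------------------------------
The "Check flags" step of section 7.2 can be written as a small pure
function. The sketch below follows the order of the rules as listed
there; it is an assumption-laden illustration, not the kernel's
implementation. The DEMO_CLONE_* constants are local to this sketch (real
programs use the CLONE_* constants from <sched.h>), and the
signals_shared parameter is a hypothetical stand-in for the kernel's own
test of whether signals are currently being shared.

```c
/* Illustrative flag bits for this sketch only; real code should use
 * the CLONE_* constants provided by <sched.h>. */
#define DEMO_CLONE_VM       0x00000100UL
#define DEMO_CLONE_FS       0x00000200UL
#define DEMO_CLONE_FILES    0x00000400UL
#define DEMO_CLONE_SIGHAND  0x00000800UL
#define DEMO_CLONE_THREAD   0x00010000UL
#define DEMO_CLONE_NEWNS    0x00020000UL

/*
 * Return flags with the implied bits forced on, applying the rules in
 * the order given in section 7.2. signals_shared is nonzero when the
 * caller's signals are shared with another task (hypothetical input).
 */
unsigned long demo_force_implied_flags(unsigned long flags,
                                       int signals_shared)
{
    /* Unsharing the thread group implies unsharing virtual memory. */
    if (flags & DEMO_CLONE_THREAD)
        flags |= DEMO_CLONE_VM;

    /* Unsharing virtual memory implies unsharing signal handlers. */
    if (flags & DEMO_CLONE_VM)
        flags |= DEMO_CLONE_SIGHAND;

    /* Unsharing signal handlers while signals are shared implies
     * unsharing the thread group. */
    if ((flags & DEMO_CLONE_SIGHAND) && signals_shared)
        flags |= DEMO_CLONE_THREAD;

    /* Unsharing the namespace implies unsharing file system info. */
    if (flags & DEMO_CLONE_NEWNS)
        flags |= DEMO_CLONE_FS;

    return flags;
}
```

For example, demo_force_implied_flags(DEMO_CLONE_NEWNS, 0) yields
DEMO_CLONE_NEWNS | DEMO_CLONE_FS, which is the behavior that item 2 of
the test specification in section 8 asks to be verified.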