New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SIGCHLD handler to pause container #36853
Conversation
Can a kubernetes member verify that this patch is reasonable to test? If so, please reply with "@k8s-bot ok to test" on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands will still work. Regular contributors should join the org to skip this step.
If you have questions or suggestions related to this bot's behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nits. Don't block approval of this on me, I just translated pause to C.
int main() { | ||
pid_t pid; | ||
pid = fork(); | ||
if ( pid == 0 ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: style mismatch with pause.c. I don't know if kubernetes has C style conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could run clang formatter on the C code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Ran clang-format on pause.c and orphan.c.
int main() { | ||
if (signal(SIGINT, sigdown) == SIG_ERR) | ||
if (getpid() != 1) { | ||
fprintf(stderr, "Warning: pause should be the first process in a pod\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make this an error? I'm not sure what value this has as a warning.
Also nit: capitalization (other messages are lowercase).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the capitalization on the other messages. I'm inclined to keep it a warning rather than taking down the infra container when presented with an unusual situation. Plus it was helpful to be able to run it outside of a container.
Added to Agenda for the 2016-11-22 SIG Node meetup |
@mrunalp - PTAL |
Looks fine to me. |
LGTM , I will let @dchen1107 make an additional review. |
fprintf(stderr, "error: infinite loop terminated\n"); | ||
return 42; | ||
if (getpid() != 1) | ||
fprintf(stderr, "Warning: pause should be the first process in a pod\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: We use pause images for testing as well. So this is necessarily not an error condition.
return 1; | ||
if (sigaction(SIGTERM, &(struct sigaction){.sa_handler = sigdown}, NULL) < 0) | ||
return 2; | ||
if (sigaction(SIGCHLD, &(struct sigaction){.sa_handler = sigreap, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not place this under a if getpid() == 1 {..}
block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no downside to installing a signal handler that will never get called so my preference would be to avoid special cases. If there's going to be a failure installing the handler it should happen whether or not it's run in a container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned because pause
image is used for arbitrary testing too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not possible to make pause create a child, so it will never automatically receive a SIGCHLD outside of a shared PID namespace. If some other process sends it a SIGCHLD (which would be strange) then waitpid() will immediately return 0 (because it has no children) and go back to the infinite loop.
Its use in arbitrary testing is an argument for keeping its behavior consistent no matter how it's executed, but I might have missed your point. Could you explain what you want to protect against?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is to not make much assumptions about how pause is expected to be used. Instead, we make it behave like an init process whenever its pid is 1
. Otherwise, its old behavior can continue to exist.
.sa_flags = SA_NOCLDSTOP}, | ||
NULL) < 0) | ||
return 3; | ||
sigaction(SIGKILL, &(struct sigaction){.sa_handler = sigdown}, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to trap sigkill?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't add that part, but I believe PID 1 has special handling and will ignore signals by default, including SIGKILL. I could look into it more but perhaps @uluyol just knows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just copied it over from the original Go port. I didn't think to remove it because as @verb said above it shouldn't harm anything. I don't know whether it's possible to trap it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch @vishh. I think this should be removed as process can't catch SIGKILL sent to the process itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this on a single host and PID 1 (in the container) ignored SIGKILL even with a handler installed. Doing a little research, sysvinit installs the handler for SIGKILL while upstart & systemd appear not to install one. I have no preference here and will defer to the reviewer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer not installing one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
Once this PR lands, there might be some changes required to how pod restarts are handled. As of now, infra container death does not bring down other containers. If a pod needs to be greacefully restarted, all its user containers will have to be stopped (or killed) gracefully first, and then the infra container can be restarted, followed by restarting of user containers. |
@vishh I agree that those are concerns, but they're not concerns for this PR which only modifies the pause container and does not enable the shared PID namespace. Let's address that in #1615. I'd really like to avoid forwarding signals and other init features which would be better handled by the kubelet. @dchen1107 has longer term plans for init-like functionality that will work across run times. |
This allows pause to reap zombies in the upcoming Shared PID namespace (kubernetes#1615). Uses the better defined sigaction() instead of signal() for all signals both for consistency (SIGCHLD handler avoids SA_RESTART) and to avoid the implicit signal()->sigaction() translation of various libc versions. Also makes warnings errors and includes a tool to make orphaned zombies for manual testing.
Run clang-format on the C files and capitalize error messages.
fprintf(stderr, "error: infinite loop terminated\n"); | ||
return 42; | ||
if (getpid() != 1) | ||
fprintf(stderr, "Warning: pause should be the first process in a pod\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning? shouldn't this be a distinct fatal condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(oh, I guess maybe you intend this for testing - though in the production case I think it should fail hard)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the pause container is used in some circumstances where this wouldn't be an error (#36853 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for missing that previous comment!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I guess a code comment would help :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! Added a comment.
Pause container can function across runtimes AFAIK. How do you expect kubelet to forward signals? It is only aware of the init pid in a container. Kubelet needs to rely on this custom init process for all init functionalities. In any case, I'm ok with punting on handling init issues until we turn on shared pidns. It might help if you can capture these comments in #1615 just so that we don't forget them in the future. |
Whoops, didn't realize this was blocked on me. I don't expect the kubelet to forward signals. I meant that init features other than signal handling are better handled by the kubelet. For example, users should rely on the kubelet to start processes in a pod rather than running an init system in the container. Regarding signals, I expect the runtime (docker) to deliver signals to the proper process in containers that share a namespace. For example:
(signalsquealer just prints out what signal it receives and ignores it.) I expect signals sent to c2 to be delivered to the process for c2 and not c1, and they are:
It's the case that docker 1.12 currently doesn't reparent PIDs correctly, so this isn't an entirely accurate test, but I still expect However if you're referring to delivering signals to children of processes in other containers, I still think that shouldn't be handled by the infra container, even if it's a process that double forks and is inherited by pause. Containers shipping their own in-container init will continue to function correctly as long as they're not doing anything too exotic, but in-container init isn't necessary for Kubernetes and we shouldn't add complexity to support it. I don't think we should establish the convention that a signal delivered to the pause container must be forwarded to every process in other containers in the pod. Caveats:
|
@vishh @dchen1107 @derekwaynecarr (assigned reviewers) This PR has been open for 2 months. What else do I need to do for it? |
runtimes are expected to track processes using cgroups and the scenario above is also expected to be covered. We should ideally have a test case for it.
Yeah. The infra container is essentially the "pod sandbox" in CRI. It's should be managed accordingly. |
@verb This comment thread is still open - #36853 (comment) Other than that comment, this PR LGTM. Happy to tag once you address that. |
@vishh Thanks, I totally missed that, too. My github-fu is weak. :-/ Added a commit to remove the SIGKILL handler. |
PR workflow in github can be frustrating for long PRs :)
…On Thu, Jan 19, 2017 at 2:59 PM, Lee Verberne ***@***.***> wrote:
@vishh <https://github.com/vishh> Thanks, I totally missed that, too. My
github-fu is weak. :-/
Added a commit to remove the SIGKILL handler.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#36853 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AGvIKCXy-Rag_vPGLskxRqSPxqeRfrJEks5rT-q7gaJpZM4KzK4r>
.
|
/lgtm
|
Automatic merge from submit-queue (batch tested with PRs 39446, 40023, 36853) |
What this PR does / why we need it: This allows pause to reap orphaned zombies in a shared PID namespace. (#1615)
Special notes for your reviewer: I plan to discuss this with SIG Node to ensure compatibility with future runtimes.
Release note: This will have no effect until shared PID namespace is enabled, so recommend release-note-none.
This allows pause to reap zombies in the upcoming Shared PID namespace
(#1615). Uses the better defined sigaction() instead of signal() for all
signals both for consistency (SIGCHLD handler avoids SA_RESTART) and to
avoid the implicit signal()->sigaction() translation of various libc
versions.
Also makes warnings errors and includes a tool to make orphaned zombies
for manual testing.
This change is