Bugzilla

Updated

•

6 years ago

Keywords: sec-high

Daniel Veditz [:dveditz]

Updated

•

6 years ago

Group: core-security → dom-core-security

[:philipp]

Updated

•

6 years ago

tracking-firefox60: --- → ?

Nika Layzell [:nika] (ni? for response)

Comment 4

•

6 years ago

:mystor can you take another look based on comment 3, since it seems this isn't actually null much of the time?

tracking-firefox60: ? → +

tracking-firefox61: --- → +

Flags: needinfo?(nika)

Comment 5

•

6 years ago

Sorry I don't have much time to look into this right now, I tried to think through the way this could happen, and I don't have any ideas.

Flags: needinfo?(nika)

Andrew McCreight [:mccr8]

Updated

•

6 years ago

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → INCOMPLETE

Andrew McCreight [:mccr8]

Comment 6

•

6 years ago

Why was this closed?

Flags: needinfo?(continuation)

Comment 7

•

6 years ago

Unactionable sec-high bugs make the list of open sec-high bug less useful.

Flags: needinfo?(continuation)

Comment 8

•

6 years ago

(In reply to Julien Cristau [:jcristau] from comment #6)
> Why was this closed?

The Dev says he has no ideas of how to fix this so this bug is just going to sit open until... In that situation, critsmash triage resolves them as incomplete so the list of sec-high (or critical) bugs is a list of bugs that are actually being worked on to a resolution.

Sylvestre Ledru [:Sylvestre]

Comment 9

•

6 years ago

No sense tracking a bug that isn't going anywhere. That said, Fx60 betas are still showing a pretty regular crash rate here...

status-firefox-esr60: --- → affected

tracking-firefox60: + → -

tracking-firefox61: + → -

Updated

•

6 years ago

status-firefox60: affected → wontfix

status-firefox62: --- → ?

status-firefox63: --- → affected

Andrew Overholt [:overholt]

Comment 10

•

6 years ago

Given the high crash rate here, it might be more appropriate to have this marked as stalled but to keep it open. However, given the below, I'm not yet sure we're stalled.

Peter Van Der Beken and I looked into this a bit and with Jeff Muizelaar's help, we were able to get to https://bit.ly/2LJRfGm vs https://bit.ly/2osbsHy (we weren't sure if the template parameter in the signature was catching too many unrelated crashes). It looks like the crashes with mozilla::net::ExtensionJARFileOpener::SendBackFD() on the stack are the most common. Haik touched that code last with reviews by Jim so I'll needinfo them here to see if they have ideas.

Status: RESOLVED → REOPENED

Flags: needinfo?(jmathies)

Flags: needinfo?(haftandilian)

Resolution: INCOMPLETE → ---

Assignee

Comment 12

•

6 years ago

I don't know what's causing this yet, but given what :nika and :philipp said above, I suspect this is due to inadequate synchronization in the NeckoParent handling of the async GetExtensionFD message. We handle that message by offloading work to another thread then jumping to the main thread to send back the result.

The GetExtensionFD message is sent from a child process when it needs to get the FD for an extension JAR file so it can open and read the JAR without having filesystem access to it directly. This async message is handled in the Necko parent process. The JAR file is opened in the parent process and the FD is sent back to the child. The JAR file is opened off the main thread in ExtensionJARFileOpener::OpenFile() and then the FD is sent back from the main thread in ExtensionJARFileOpener::SendBackFD(). Tear down of the NeckoParent needs to wait for any ExtensionJARFileOpener instances to complete, but that is not being done. As a fix, we could keep a list of outstanding ExtensionJARFileOpener instances and then wait for them to all complete duing teardown.

I need to do some more debugging to confirm that this is the problem. Taking this bug.

Assignee: nobody → haftandilian

Flags: needinfo?(haftandilian)

Priority: -- → P1

Comment 13

•

6 years ago

(In reply to Haik Aftandilian [:haik] from comment #12)
> Tear down of the
> NeckoParent needs to wait for any ExtensionJARFileOpener instances to
> complete, but that is not being done. As a fix, we could keep a list of
> outstanding ExtensionJARFileOpener instances and then wait for them to all
> complete duing teardown.

What I don't understand here is: if the actor is destroyed before the resolve function is called, the captured WeakPtr self__ should be null here: https://crash-stats.mozilla.com/sources/highlight/?url=https://gecko-generated-sources.s3.amazonaws.com/ccc17ebc155a410d6b3a59b3247a330b6e1af1ea1217cdeb22906fbcd899a89c5b1711fcfa6472957bff18c5957fe964a8d521b445c95db3e7e534fe63987c49/ipc/ipdl/PNeckoParent.cpp#L-2321

And in that case we shouldn't be able to reach the point where the crash is reported.  I was wondering if maybe something was happening on the wrong thread, but I didn't see any obvious way that would happen from looking at the code.

Comment 14

•

6 years ago

(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #13)
> What I don't understand here is:

See also comment #2.

Comment 15

•

6 years ago

A thing that's useful to know: on amd64, accesses to non-canonical addresses (where it's not a sign-extended 48-bit value) raise #GP, not #PF, and the CPU doesn't supply the address in question to the OS.  Linux and Mac report the address as 0, Windows as -1.

The 32-bit versions of this crash are almost all 0xe5e5e5e5 (or the 64-bit sign extension of that value).  Many of the 64-bit versions are the address we'd get if it were a non-canonical address, but some of them aren't.  This suggests comparing two crashes from the same Firefox build:

bp-677abd07-4382-4a24-94c1-1dad60180828 has crash address 0xe2d82c1351c; in the Raw Dump tab, rsi and rdi have the same value.

bp-5e96cbb7-72a6-4933-8bab-3765c0180825 reports crash address 0x0, but in the raw dump, rsi and rdi are 0xe5e5e5e5e5e5e5e5

So maybe the value we're getting from self__ is the result of loading from a dangling pointer, which would explain why it's not null.  Specifically:

Are we copy-constructing or destroying a copy of the resolver function off-main-thread?  It contains a non-thread-safe WeakPtr, so that could cause a lost update to the WeakReference refcount and explain what we're seeing.  WeakPtr has thread-safety assertions, but they're debug-only, so if this happens only in a timing-sensitive edge case they might not catch it.

Comment 16

•

6 years ago

bp-ccf7bb2a-d6b7-4ca1-bc27-ce8dc0180907 and bp-c8630a42-07c4-4c92-b6fb-ff4bd0180909 are from one of our builds, so I can do some diassembly.  The bad address really is the aActor argument to IPDLParamTraits<FileDescriptor>::Write, and it does appear to be result of converting self__ to a raw pointer (and is the same value checked against 0 at the start of the function).

Assignee

Comment 17

•

6 years ago

Working with Jed, we've got an explanation for the crash. In short, the ExtensionJARFileOpener (and therefore the WeakPtr within the resolve closure) can be deleted off the main thread. WeakPtr isn't thread safe and in this case needs to be deleted on the main thread to avoid racing with other instances of WeakPtr<PNeckoParent> referring to the same PNeckoParent. This could cause the WeakReference's reference counting to be off which can cause it to be prematurely deleted.

ExtensionJARFileOpener gets deleted on the OpenFile thread if the SendBackFD Runnable it creates to send back the FD on the main thread completes before ExtensionJARFileOpener is released by the OpenFile thread.

Comment 18

•

6 years ago

For the record, I was wrong about the assertions.  Destroying a WeakPtr doesn't check thread safety; other operations (creation, assignment, dereference) do.

Assignee

Comment 19

•

6 years ago

Attached patch Patch to avoid the crash be ensuring the PNeckoParent resolver is deleted on the main thread (obsolete) — Details — Splinter Review

This patch clears the resolver on the main thread after it is used in SendBackFD. If the final Release of ExtensionJARFileOpener ends up being called after SendBackFD on the worker thread, the resolver will be already deleted and we'll avoid this crash because we won't be deleting a WeakPtr<PNeckoParent> off the main thread. I intentionally did not add comments because this is a security bug.

I haven't reproduced the reported crash, but I have reproduced the WeakPtr being deleted off the main thread which could lead to the crash. I'm still testing the patch.

Attachment #9009664 - Flags: review?(jld)

Assignee

Updated

•

6 years ago

Flags: needinfo?(jmathies)

Comment 20

•

6 years ago

Comment on attachment 9009664 [details] [diff] [review]
Patch to avoid the crash be ensuring the PNeckoParent resolver is deleted on the main thread

Review of attachment 9009664 [details] [diff] [review]:
-----------------------------------------------------------------

I like the simplicity of this approach.

::: netwerk/protocol/res/ExtensionProtocolHandler.cpp
@@ +193,5 @@
>  private:
> +  virtual ~ExtensionJARFileOpener()
> +  {
> +    if (!NS_IsMainThread()) {
> +      MOZ_RELEASE_ASSERT(mResolve == nullptr);

This part of the patch seems like a relatively big hint about what the bug is, but whoever does the sec-approval would have a better idea of how much of a concern that is.

Specifically, this should work without the main thread check — as far as I can see, once we create one of these objects, every path either eventually calls SendBackFD or leaks it (if dispatch fails during shutdown).

Attachment #9009664 - Flags: review?(jld) → review+

Assignee

Comment 21

•

6 years ago

(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #20)
> Comment on attachment 9009664 [details] [diff] [review]
> ::: netwerk/protocol/res/ExtensionProtocolHandler.cpp
> @@ +193,5 @@
> >  private:
> > +  virtual ~ExtensionJARFileOpener()
> > +  {
> > +    if (!NS_IsMainThread()) {
> > +      MOZ_RELEASE_ASSERT(mResolve == nullptr);
> 
> This part of the patch seems like a relatively big hint about what the bug
> is, but whoever does the sec-approval would have a better idea of how much
> of a concern that is.

Thanks for the review. I'll err on the side of obfuscation and remove the new ASSERT. I'll use another bug to add more comments and add the ASSERT back once this lands.

> Specifically, this should work without the main thread check — as far as I
> can see, once we create one of these objects, every path either eventually
> calls SendBackFD or leaks it (if dispatch fails during shutdown).

I agree.

Assignee

Comment 22

•

6 years ago

Attached patch Patch to avoid the crash by ensuring the PNeckoParent resolver is deleted on the main thread — Details — Splinter Review

Removed the destructor ASSERT. Carrying forward r+.

Attachment #9009664 - Attachment is obsolete: true

Attachment #9010110 - Flags: review+

Assignee

Updated

•

6 years ago

Attachment #9010110 - Flags: review+

Assignee

Comment 23

•

6 years ago

Comment on attachment 9010110 [details] [diff] [review]
Patch to avoid the crash by ensuring the PNeckoParent resolver is deleted on the main thread

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
Not easily. The ramifications of the bug aren't clear from the patch. The UAF occurs in the parent process, but the code is triggered from the child process via an IPC message. A compromised child process could repeatedly trigger the code to run in the parent and eventually hit the race condition causing the UAF. The compromised child would also have to control allocations in order to leverage the UAD.

I don't think web content or an extension could repeatedly trigger the code to run in the parent, hitting the UAF.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
No, but please review the check-in comment.

Which older supported branches are affected by this flaw?
All (ESR60, Release, Beta, Nightly)

If not all supported branches, which bug introduced the flaw?

Do you have backports for the affected branches?
No

If not, how different, hard to create, and risky will they be?
The patches should all be the same 1-line change and should not be hard to create. Given the small size of the patch, I think the change is well understood and low risk.

How likely is this patch to cause regressions; how much testing does it need?
Low probability of causing regressions. Standard bug fix testing. Existing automated tests will exercise the changed code.

Attachment #9010110 - Flags: sec-approval?

Updated

•

6 years ago

status-firefox61: affected → wontfix

status-firefox62: ? → wontfix

status-firefox64: --- → affected

tracking-firefox63: --- → +

tracking-firefox64: --- → +

tracking-firefox-esr60: --- → 63+

Comment 24

•

6 years ago

Comment on attachment 9010110 [details] [diff] [review]
Patch to avoid the crash by ensuring the PNeckoParent resolver is deleted on the main thread

sec-approval+ for trunk. Let's get Beta and ESR60 patches nominated as well.

Attachment #9010110 - Flags: sec-approval? → sec-approval+

Assignee

Comment 25

•

6 years ago

This problem was introduced by my fix for bug 1334550 "Proxy moz-extension protocol requests to the parent process".

Working on getting the other patches ready.

Assignee

Comment 26

•

6 years ago

Attached patch Beta patch — Details — Splinter Review

Identical patch for Beta.

Assignee

Comment 27

•

6 years ago

Attached patch Release patch — Details — Splinter Review

Identical patch for Release.

Assignee

Comment 28

•

6 years ago

Attached patch ESR60 patch — Details — Splinter Review

Identical patch for ESR60.

Assignee

Comment 29

•

6 years ago

Comment on attachment 9010773 [details] [diff] [review]
Beta patch

[Security approval request comment]
The code changes are the same as on Nightly.
See comment 23.

Attachment #9010773 - Flags: sec-approval?

Assignee

Comment 30

•

6 years ago

Comment on attachment 9010774 [details] [diff] [review]
Release patch

[Security approval request comment]
The code changes are the same as on Nightly.
See comment 23.

Attachment #9010774 - Flags: sec-approval?

Assignee

Comment 31

•

6 years ago

Comment on attachment 9010775 [details] [diff] [review]
ESR60 patch

[Security approval request comment]
The code changes are the same as on Nightly.
See comment 23.

Attachment #9010775 - Flags: sec-approval?

Assignee

Comment 32

•

6 years ago

:RyanVM informed me sec-approval is only needed for the Nightly patch. I'll update patches to request standard approvals.

Assignee

Comment 33

•

6 years ago

Comment on attachment 9010773 [details] [diff] [review]
Beta patch

Approval Request Comment
[Feature/Bug causing the regression]:
Bug 1334550

[User impact if declined]:
User is exposed to a UAF crash in the parent process.

[Is this code covered by automated tests?]:
Yes

[Has the fix been verified in Nightly?]:
No, not yet on Nightly due to it being a security bug.

[Needs manual test from QE? If yes, steps to reproduce]:
No specific tests that reproduce the crash. Try builds have been shown to exercise the changed code, but don't cause the crash because that requires a race condition that has not been reproduced.
 
[List of other uplifts needed for the feature/fix]:
None

[Is the change risky?]:
No

[Why is the change risky/not risky?]:
The change is small and well understood.

[String changes made/needed]:
None

Attachment #9010773 - Flags: sec-approval? → approval-mozilla-beta?

Assignee

Comment 34

•

6 years ago

Comment on attachment 9010774 [details] [diff] [review]
Release patch

Approval Request Comment
[Feature/Bug causing the regression]:
Bug 1334550

[User impact if declined]:
User is exposed to a UAF crash in the parent process.

[Is this code covered by automated tests?]:
Yes

[Has the fix been verified in Nightly?]:
No, not yet on Nightly due to it being a security bug.

[Needs manual test from QE? If yes, steps to reproduce]:
No specific tests that reproduce the crash. Try builds have been shown to exercise the changed code, but don't cause the crash because that requires a race condition that has not been reproduced.
 
[List of other uplifts needed for the feature/fix]:
None

[Is the change risky?]:
No

[Why is the change risky/not risky?]:
The change is small and well understood.

[String changes made/needed]:
None

Attachment #9010774 - Flags: approval-mozilla-release?

Assignee

Updated

•

6 years ago

Attachment #9010774 - Flags: sec-approval?

Assignee

Comment 35

•

6 years ago

Comment on attachment 9010775 [details] [diff] [review]
ESR60 patch

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined:
sec-high
 
Fix Landed on Version:
Not yet landed.

Risk to taking this patch (and alternatives if risky):
Low risk due to the size of the change. It's in code used for loading JAR files which is used by extensions so it's frequently used. 

String or UUID changes made by this patch: 
None

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Attachment #9010775 - Flags: sec-approval? → approval-mozilla-esr60?

Comment 36

•

6 years ago

I filed bug 1493045 to assert thread-safety on WeakPtr destruction.  In this case, that would have caught the wrong-thread destruction even if it wouldn't have caused a lost update to the refcount, which is probably the much more common case, and hopefully (1) failed often enough that we'd have caught it in CI, and (2) pointed much more directly at the problem.

(I'm not linking the bugs with a See Also because I'd like to not put up a big sign saying the other bug is inspired by a live security issue.)

Pascal Chevrel:pascalc (PTO until April 26)

Comment 37

•

6 years ago

Comment on attachment 9010774 [details] [diff] [review]
Release patch

Unless this is being actively exploited it should go in 63.

Attachment #9010774 - Flags: approval-mozilla-release? → approval-mozilla-release-

Comment 38

•

6 years ago

Comment on attachment 9010773 [details] [diff] [review]
Beta patch

Uplift approved for 63 beta 9, thanks.

Attachment #9010773 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 39

•

6 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/bad3f36435488b47c23c46ab8e8ee7aa74d62464

Comment 40

•

6 years ago

https://hg.mozilla.org/mozilla-central/rev/bad3f3643548

Group: dom-core-security → core-security-release

Status: REOPENED → RESOLVED

Closed: 6 years ago → 6 years ago

status-firefox64: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla64

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 41

•

6 years ago

I also filed bug 1493342 about most of the 64-bit crashes having the wrong address.

Comment 42

•

6 years ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/817efa50e613e4f8fc704fb106b13654ab0d6688

status-firefox63: affected → fixed

Andrei Vaida [:avaida]

Updated

•

6 years ago

Flags: qe-verify-

Whiteboard: [post-critsmash-triage]

Comment 43

•

6 years ago

Comment on attachment 9010775 [details] [diff] [review]
ESR60 patch

Fixes a sec-high crash. Approved for ESR 60.3.

Attachment #9010775 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+

Comment 44

•

6 years ago

uplift

https://hg.mozilla.org/releases/mozilla-esr60/rev/b80f94262165

status-firefox-esr60: affected → fixed

Updated

•

6 years ago

Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main63+][adv-esr60.3+]