Closed Bug 1499198 Opened 6 years ago Closed 6 years ago

Crash in js::wasm::Instance::callExport

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla64

Tracking Flags:

Tracking

Status

firefox-esr60

63+

verified

firefox62

---

wontfix

firefox63

---

wontfix

firefox64

---

verified

People

(Reporter: philipp, Assigned: bbouvier)

References

Details

(Keywords: crash, csectype-wildptr, sec-low, Whiteboard: [adv-esr60.3+][adv-main64+])

Crash Data

Attachments

(1 file)

fix.patch 6 years ago Benjamin Bouvier [:bbouvier] (inactive) 5.81 KB, patch	luke : review+ RyanVM : approval-mozilla-esr60+	Details \| Diff \| Splinter Review

[:philipp]

Reporter

Description

•

6 years ago

This bug was filed from the Socorro interface and is
report bp-47e3e173-b5d4-4df0-b108-8cee80181014.
=============================================================

crash reports with this signature a re recently spiking up in volume on release to ~250 recorded reports a day - most of the crashes happen in the content process.

a lot of user comments are referring to the https://apps.facebook.com/bubbles_iq/ facebook game as the source of the crash.

Nicolas B. Pierron [:nbp]

Updated

•

6 years ago

Component: JavaScript Engine: JIT → Javascript: Web Assembly

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 1

•

6 years ago

Unfortunately, this is the equivalent of the most generic entry into the JIT: wasm::Instance::callExport is the C++ function that's called when entering wasm code. So there could be one or several different causes to this bug. In the worst case, there could be a call from wasm to the JIT, and the JIT function is the one which crashes, so this could also be an issue in JIT code (baseline or ion).

From a quick look at the first 5 reports, the crashes reasons seem to ~all consist of EXCEPTION_ACCESS_VIOLATION_READ, on Windows, on x86 (not sure if Socorro displays something different for x64, so this might be 32 or 64 bits; the code addresses look like 32 bits though). The stacks always look a bit weird:

- one raw address
- N times the same address
- then wasm::Instance::callExport on top of this
- then IonBuilder::makeCall??? Probably we can't trust this.

(One remark: if the raw address that's shown is the value of FP, then the address that's repeated N times is marked with the low bit, meaning it's a recursive call in a wasm function; the top stack address isn't marked, so this would then be a JIT function or exit. But I see a weird pattern in the repeated address (e.g. 0xea7afff), which resemble a JS::Value layout. So this may be a red herring.)

I'll try to see if there's anything else useful in the crash reports. Playing with the Facebook app might be a start.

Luke Wagner [:luke]

Comment 2

•

6 years ago

It'd be useful to know if the crash dumps contain the surrounding assembly code at the point of the crash.  With wasm, this can sometimes be especially indicative of (baseline vs. ion) and which operation.

Luke Wagner [:luke]

Comment 3

•

6 years ago

Wow, yeah, with 99% x86 (compared to ~28% of overall population), that suggests something 32-bit-specific; maybe regarding the use of guard pages and bounds checks.  It'd be useful to run an instrumented build that print()ed on traps (esp. out-of-bounds) to see if that's happening in normal runs.

Ryan VanderMeulen [:RyanVM]

Comment 4

•

6 years ago

If there's something we can land to get better diagnostics from the crashes, I could take it in 60.3esr if I had it today still. We might be able to get it in 63rc2 as well.

Flags: needinfo?(luke)

Flags: needinfo?(bbouvier)

Luke Wagner [:luke]

Comment 5

•

6 years ago

Ok, with help from Jan, I think I know what the bug is.  So Jan extracted this assembly for the crash dump:
  https://www.irccloud.com/pastebin/UIhT5xEF/
and other crash dumps crash at the same punpckldq instruction.

Notice right before we're moving a constant address into a register and loading it; that's a red flag in wasm code because we serialize module code and so process-relative addresses are a no-no, so this would have to be in a wasm non-module stub (or JS JIT code).

Jan found a stub that seems to emit this pattern:
  https://searchfox.org/mozilla-central/source/js/src/jit/x86/MacroAssembler-x86.cpp#1190
and we can see it baking in a static variable address!  (If ImmPtr() was used instead of ImmWord(), this would have asserted because IsCompilingWasm().)

Lars, seems like there could be a quick fix?

It'd be great to have a fix in the next releases.  Alternatively, if it's too risky, we can just disable asm.js caching.

Flags: needinfo?(luke)

Flags: needinfo?(lhansen)

Flags: needinfo?(bbouvier)

Lars T Hansen [:lth]

Comment 6

•

6 years ago

Will look at this first thing in the morning.

Luke Wagner [:luke]

Comment 7

•

6 years ago

Actually, Lars: is punpckldq only emitted for wasm?

If so, then the fact that the crashes are only for FF62 and wasm serialization was removed in FF63 (bug 1469395) and not yet re-added (bug 1487113) might mean that this crash is implicitly fixed in FF63 without any patches.

Ryan VanderMeulen [:RyanVM]

Comment 8

•

6 years ago

Note that ESR60 is also affected.

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 9

•

6 years ago

(In reply to Luke Wagner [:luke] from comment #7)
> Actually, Lars: is punpckldq only emitted for wasm?

If you meant to ask "only wasm and not asm.js": yes. Otherwise, it's also used in regular JS for the MRandom node.

> If so, then the fact that the crashes are only for FF62 and wasm
> serialization was removed in FF63 (bug 1469395) and not yet re-added (bug
> 1487113) might mean that this crash is implicitly fixed in FF63 without any
> patches.

Agreed; we'd need a stop gap for ESR60 (and maybe others) in this case.

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 10

•

6 years ago

Attached patch fix.patch — Details — Splinter Review

This is silly, we have masm features to embed the constant values within the code for 128-bit values (because SIMD!). This is equivalent and just a bit slower (by adding one load per conversion), but it doesn't use an absolute reference.

Assignee: nobody → bbouvier

Status: NEW → ASSIGNED

Attachment #9017918 - Flags: review?(lhansen)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 11

•

6 years ago

Regarding the sec-rating:
- in wasm:
  - in the best case, the address we're trying to read from is incorrect and we'll segfault
  - in the worst case, we read from an incorrect address, and the computation will be incorrect. Then the double will flow through wasm operators, and at the JS boundaries, be converted either into a regular double if it's valid, or a canonical NaN; so no risk for wrong type interpretation.
- under js (since this function is used by MRandom code gen): the problem is non-existent, because jit code isn't cached.

Luke agreed with me. So I think this is sec-low.

I'll be away until tomorrow (end of day here), so feel free to tweak/land the patch. For what it's worth, I've tried it locally and it passes all the wasm testing in --tbpl mode.

Flags: needinfo?(lhansen)

Luke Wagner [:luke]

Comment 12

•

6 years ago

Comment on attachment 9017918 [details] [diff] [review]
fix.patch

Review of attachment 9017918 [details] [diff] [review]:
-----------------------------------------------------------------

(Stealing, so we can get a fix in ESR) Looks great to me!

Attachment #9017918 - Flags: review?(lhansen) → review+

Ryan VanderMeulen [:RyanVM]

Updated

•

6 years ago

status-firefox62: affected → wontfix

status-firefox63: ? → affected

status-firefox64: ? → affected

tracking-firefox-esr60: --- → 63+

Keywords: sec-low

Luke Wagner [:luke]

Comment 13

•

6 years ago

Comment on attachment 9017918 [details] [diff] [review]
fix.patch

[ESR Uplift Approval Request]

If this is not a sec:{high,crit} bug, please state case for ESR consideration: It prevents a crash with decent volume

User impact if declined: Continued crashes (for a long time)

Fix Landed on Version: Nightly

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): Simple patch.

String or UUID changes made by this patch:

Attachment #9017918 - Flags: approval-mozilla-esr60?

Ryan VanderMeulen [:RyanVM]

Comment 14

•

6 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/01368a04b48aeca172e2153cd0f8401e5162226a

Per IRC discussion with Luke and Al, I'm going to take this fix for 60.3esr because it's a topcrash there, but we're going to let it ride the 64 train to release since 63 isn't hitting the crash reported here and it'd be needless late-cycle churn.

status-firefox63: affected → wontfix

Ryan VanderMeulen [:RyanVM]

Comment 15

•

6 years ago

Comment on attachment 9017918 [details] [diff] [review]
fix.patch

Fixes a topcrash, approved for 60.3esr.

Attachment #9017918 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+

Ryan VanderMeulen [:RyanVM]

Comment 16

•

6 years ago

uplift

https://hg.mozilla.org/releases/mozilla-esr60/rev/ea9412b18ca8

status-firefox-esr60: affected → fixed

Al Billings [:abillings - ex-MoCo]

Updated

•

6 years ago

Whiteboard: [adv-esr60.3+]

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Updated

•

6 years ago

Comment 17

•

6 years ago

https://hg.mozilla.org/mozilla-central/rev/01368a04b48a

Group: javascript-core-security → core-security-release

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

status-firefox64: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla64

Andrei Vaida [:avaida]

Updated

•

6 years ago

Flags: qe-verify+

Bogdan Maris, Desktop QA

Comment 18

•

6 years ago

Was able to reproduce the crash using the url from comment 0 after multiple refreshes on the page: https://crash-stats.mozilla.com/report/index/ef1234e4-bb1b-4204-b079-84f310181101 (Fx 62.0.3, Windows 7 32bit).

Did not encounter the crash on new builds after the fix 64.0b5 and 60.3.0esr under Windows 7 32bit. Checking Socorro I do still see 3 crashes in the last 7 days affecting 64.0b4. Is it safe we call this verified fixed based on these results?

Flags: needinfo?(bbouvier)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 19

•

6 years ago

Yes, I think so (and that's in line with comment 14).

Flags: needinfo?(bbouvier)

Bogdan Maris, Desktop QA

Comment 20

•

6 years ago

(In reply to Benjamin Bouvier [:bbouvier] from comment #19)
> Yes, I think so (and that's in line with comment 14).

Thanks, marking the bug accordingly.

Status: RESOLVED → VERIFIED

status-firefox64: fixed → verified

status-firefox-esr60: fixed → verified

Flags: qe-verify+

Al Billings [:abillings - ex-MoCo]

Updated

•

6 years ago

Whiteboard: [adv-esr60.3+] → [adv-esr60.3+][adv-main64+]

Daniel Veditz [:dveditz]

Updated

•

4 years ago

Group: core-security-release

You need to log in before you can comment on or make changes to this bug.