Merge ContDone+ContExit into ContRetC with added support of passing results. This variable passing mechanism is not exposed to the PHP as the ReturnStatements in generators do not contain result expression. However, this is exposed by restored hphp_continuation_done() built-in to allow experimentation.
The idea is that once we introduce ContYield opcode (merge of all opcodes used by YieldExpression), we could change ContRetC and ContYield to leave result and done-status on the stack and leave it up to the caller (ContNext/ContSend/ContRaise) to fill in Continuation fields. This will make these opcodes more generic and useful for other things, while allowing us to move some properties to the VM and kill opcodes like ContCurrent.
Split up ConvToDbl into ConvArrToDbl, ConvBoolToDbl, ConvIntToDbl,
ConvObjToDbl, ConvStrToDbl and ConvGenToDbl, so that different flags
can be set for each instruction.
Continuation::send() uses m_received field to transfer the value inside
the continuation. This field remains set until the continuation is
iterated the next time.
Unset this field early by ContReceive so that the memory can be
reclaimed.
VectorTranslator is now good enough that all of this code can be
removed without causing a perf regression. Instructions, loads, and stores are
all slightly up, but CPU time looks like noise.
Of the four horsemen of the SmartAllocator, ArrayData was the only virtual
call. This meant an extra layer of indirection when coming from the TC
to allow the c++ compiler to emit its virtual call, and slightly larger
callsites when using non-generic paths.
While we're moving in this direction, consolidate ArrayData introspection
on its type enum. isSharedMap() was previously implemented with a vtable
slot, and we had no way of asking if an ArrayData was a NameValueTable.
Tx64 only translates CGetS when you do it with the context
class the same as the class being looked up, to avoid the need for
accessibility checks. The IR can translate it more often, but was
using its fast path even when it was not safe to do so: if a previous
(safe) access had allocated a targetcache entry, later accesses would
be able to get to the property without an access check. For now just
limit the optimiation to safe cases.
Many destructors were going through a C++ trampoline that did
nothing but turn a C call into a method call. Get rid of these where
possible. ArrayData still uses a virtual release, which is unfortunate
but cannot be helped at the moment.
I got to do a few optimizations:
* burn in the number of use vars
* not write out uninitialized use vars
* use the staticness of the method as a hint for the incRef
There is not a clear path forward to safely using MMX registers as
scratch storage. Doing so spoils the state of the legacy x87 FPU, and the
x64 ABI uses the x87 FPU to implement long double. Our options are to
either:
1. Prohibit use of long double in source code (or x87 instructions in
machine code). Given that we have a dynamically linked binary that
includes system libraries outside our control, open source that needs
to be patched, fbcode, etc., this could prove difficult.
2. Always execute an FPU resetting instruction before transitioning from
TC code to C++. For all I know, these instructions are cheap, but it
still seems like an unfortunate overhead to impose everywhere; since
we don't want to mark every helper with a "reset fpu" call, we'd
probably end up bloating callsites.
3. Admit this doesn't work.
In the absence of evidence this matters for perf, I lean towards 3.
They both returned the late static bound class, not the context
class. This meant that eg "constant('self::FOO')" was actually
returning what "constant('static::FOO')" should have done.
In addition, we often want the Class*, not its name, so
change them to return Class*. The remaining places that then
read the name from the Class* should be fixed to use the Class*
directly (in a later diff).
Finally, noticed that while "defined()" was recently fixed to
support "static::", "constant()" was not. Pulled out a common
function to find the correct Class*.
In the case where m_this was already in a register,
and known to be non-zero, we would fail to zero the ActRec's
m_this field, which could result in a dangling reference
to $this being captured by debug_backtrace.
Put the store on the presumably cold path. Fix DecRefThis to
do the same.
Avoid punning TypedValue* to String&/Array&/Object& in FCallBuiltin
(all three implementations). Our native function calling conventions
require passing pointers into a TypedValue for these types, and
pointers-to-scratch for return values.
In the HHIR case, I removed the optional "return pointer" argument
from the IR CallBuiltin instruction. The C++ value-passing ABI details
are now handled in cgCallBuiltin and are no longer exposed in the IR.
The argument types are still PtrTo*, but we handle the address fixups
in CodeGenerator.
ConvToArr has at least two variants: one that holds on to the object being converted and the other that does not. Having separate opcodes allow this distinction to be made. Making separate opcodes for each type of operand makes for a more consistent IR.
The interpreter fix was different than the jit/ir fix because ##translateFPushObjMethodD()##'s ##i.inputs[0]->rtt.valueClass()## is null (with a TODO in the code to make it not null). It then goes into the slowpath.
Another one like this and I think I should have a different attribute for static closures :(
The only places where ReturnStatement is constructed are:
- onReturn(check_yield=true) -> not allowed in generator
- onReturn(check_yield=false) -> coming from transform_yield_break, right after creating hphp_continuation_done()
- MethodStatement, end of function call -> hphp_continuation_done() is created at end of generator in prepare_generator()
Emitter is emitting ContExit in ReturnStatements used in generators. As
can be seen from the analysis above, it's always preceded by emitting
ContDone from hphp_continuation_done(). Let's emit ContDone inside the
ReturnStatement directly and kill usage of hphp_continuation_done().
transform_yield_break() becomes a simple onReturn(check_yield=false), so
let's inline it into onYield and create ReturnStatement directly. After
this change, check_yield flag is always true and can be killed.
ContExit was also used after emitting a generator method in case the end
of method is still reachable. ContDone is added so that the generator is
properly closed. I believe this is never actually used, as MethodStatement
creates ReturnStatement at the end of method anyway.
I directly copied continuations for this, but they never have params. Closures sometimes have params, and the closure itself will be the first local AFTER those.
Added a new IRInstruction, DecRefNZOrBranch, that decrefs but
branches out of the trace if the reference count is about to go to
zero. This guarantees that no destructor will run, and thus no memory
side effects on trace. Tracked the last value available in memory in
TraceBuilder and used it to convert DecRef instructions to
DecRefNZ. These 2 changes allow us to eliminate more IncRef-DecRef
pairs, in particular cases due to SetS, SetG & SetM; for example:
85: SetS
(20) t12:Cls = LdStack<Cls> t10:StkPtr, 1
(21) t13:Str = LdStack<Str> t10:StkPtr, 2
(23) t15:PtrToGen = LdClsPropAddr t12:Cls, t13:Str, Cls(0)
(24) DecRef t13:Str
(25) t16:PtrToCell = UnboxPtr t15:PtrToGen
(26) t17:Cell = LdMem<Cell> t16:PtrToCell, 0
(27) t18:Obj = IncRef t11:Obj
(28) StMem [t16:PtrToCell]:Obj, t11:Obj
(36) DecRefNZOrBranch t17:Cell -> L4
L5:
(38) DefLabel
86: PopC
(39) DecRefNZ t18:Obj
In the above example, memory tracking and DecRefNZOrBranch allowed us
to change instruction (39) from a DecRef to a DecRefNZ. Subsequent
dead-code elimination will remove the IncRef instruction (27) and
DecRefNZ instruction (39).
This diff does not handle SetM.
Unhack the parser and introduce YieldExpression that emits the
equivalent set of opcodes that were emitted by bunch of
expressions/statements generated by parser before.
YieldExpression expects evaluation stack to contain just the value
being yielded, so {,List}AssignmentExpression need to evaluate RHS
first. The previous code had the same behavior.
This will let us consolidate continuation-related opcodes and make
them less tied with continuation objects.
Most of this is pretty boring and mechanical. I added
VectorProp and VectorElem flags to help deal with the increasing
number of vector-related opcodes.
Slight change to assign operations that affect the inner type.
This gets rid of an unnecessary load in some cases. I have little evidence
that this is practically important, but when debugging on 7pack I found the
dead load confusing.
This set of changes fixes problems pointed out in earlier diffs by Ed and Brett, mostly centered on redundant ref counting. It will be followed up by another diff that does a mechanical refactor to introduce variants of convToArr, specialized on the types of it operand.
Pretty simple. This makes me slightly nervous because I've only
confirmed it works in my stupid OpenEmbedded ARM SDK; a different Linux
might call this something else. But we'll cross that bridge when we get
to it, and this works for now.
I'm also sneaking in a change to remove x29 from the list of
callee-saved regs; I put it in there by accident last time.
I added the check for this in the interpreter but ##f_array_map## re-enteres the VM via a different path than FCall. Here is the equivilent check for the VM.
PHP added this in 5.4 so that you can say your closure shouldn't capture ##$this##. https://wiki.php.net/rfc/closures
Should we add it? Many of their unit tests use it.
Instead of putting a boolean somewhere I used the same attr framework and set the static bit there. Thoughts? It only needs one change in the ##FPushFunc##.
In Zend 5.3 they decided that closures should inherit the ##$this## from the containing scope. This brings us close to paraity with them. The remaining thing is to make
function() use ($this) {}
fatal after we purge it from WWW.
This is a "mechanical" refactoring that is not supposed to introduce any new behavior or to fix anything. It sets the stage for further changes that will reduce the number of places where code generation punts.
Sometimes SpillStack and Call end up storing onto the stack a value
that was just loaded from the same stack location. There was already
code in cgSpillStack to avoid the extra stores in some cases, but that
still kept the LdStack alive. This diff detects some of those cases
with SpillStack in the Simplifier, which allows the LdStack to be
eliminated by DCE. This diff also adds similar support for Call. The
operands that don't need to be store onto the stack are replaced with
None.
Note that the optimization in cgSpillStack is not subsumed by this
diff. In the Simplifier, we cannot chase through IncRefs as done in
cgSpillStack -- that would result in an IncRef that is not consumed.
The simplifier was not promoting the boolean to integer in some
cases. I hit this for the add-zero case, and code inspection revealed
similar bugs in sub-zero and multiply-one cases.
This diff adds a relativeOffsets option to Disasm, which will print
code addresses as relative offsets from the beginning of the tracelet instead
of absolute addresses. This format makes it much easier to compare the relative
sizes of the instructions selected by the two jits. I also changed the runtime
option to be a double instead of a bool, which is used as the cutoff ratio for
which tracelets to print. Setting it to 1 will print tracelets where the ir
made larger code than tx64, setting it to 2 will print tracelets where the ir's
code is at least twice as big as tx64's, etc...
Any vector instructions that take a pointer to a base and might modify
it are now flagged with MayModifyStack. Any that actually do modify the stack
(this can be determined by looking at the input types) will produe two dests:
their original result and a new StkPtr. getStackValue calls into VectorEffects
to extract the new type and/or value. The instructions currently assume that
the stack cell they might be modifying is already synced to memory. This may
change in the future when we don't have to do a full SpillStack before the
helpers.
This mimics what TranslatorX64 does in translateSetMArray,
but it does it with fewer helpers and (often) fewer instructions in
translated code. I also found a bug in both jits and the interpreter
when dealing with arrays that hold refs to themselves. The new test
case exercises the fix, which involved a bit of refactoring of the
refcounting logic.
Enabling VectorTranslator while punting to tx64 is no longer a
regression so I removed the punt in emit().