This diff removes the Marker opcode, replacing it with a BCMarker
struct in each IRInstruction. This gives us fewer redundant lines in IRTrace
dumps and allows for more straightforward control of which IRInstructions are
associated with which bytecodes. I took this opportunity to do some more
cleanup of IR dumps as well, and it's now possible to interpOne every codegen
punt.
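A minimal sketch of the idea, not the real HHVM definitions: the fields of BCMarker (bcOff, spOff) and the dumpTrace helper are assumptions made for illustration. It shows how a per-instruction marker lets a dump print one bytecode header per group of instructions instead of carrying separate Marker instructions in the trace.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; field names are assumptions.
struct BCMarker {
  uint32_t bcOff;  // bytecode offset this instruction belongs to
  int32_t spOff;   // eval stack depth at that bytecode
  bool operator==(const BCMarker& o) const {
    return bcOff == o.bcOff && spOff == o.spOff;
  }
};

struct IRInstruction {
  std::string opcode;
  BCMarker marker;  // carried by every instruction; no separate Marker op
};

// Dump a trace, emitting a bytecode header only when the marker changes.
std::string dumpTrace(const std::vector<IRInstruction>& trace) {
  std::string out;
  const BCMarker* prev = nullptr;
  for (const auto& inst : trace) {
    if (prev == nullptr || !(*prev == inst.marker)) {
      out += "-- bc " + std::to_string(inst.marker.bcOff) + "\n";
    }
    out += "  " + inst.opcode + "\n";
    prev = &inst.marker;
  }
  return out;
}
```

Because the marker lives on the instruction, moving or deleting an instruction cannot leave it attached to the wrong bytecode, which is the control the message refers to.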
While debugging a flib test that times out due to bad
compile-time behavior, I fixed several small sources of bad perf
before giving up on trying to just make things fast enough. Most of
these came from profiling while the flib test repeatedly compiled a
tracelet with close to 100k SSATmps (it keeps side-exiting and
recompiling).
Details:
- Waiting until codegen to punt on DefCns made some tracelets
take a really long time to compile (when they just consist of a bunch of
DefCnses).
- In LinearScan::collectInfo, m_jmps.reset() was at the top of the
profile (since we do it for each exit trace), and from talking
with @swtaarrs it seems like it can simply be omitted.
- dce.cpp consumeIncRef was trying to memoize in a way that involved
creating hphp_hash_sets for each SSATmp; removed that. (Someone
should double-check I didn't break the algorithm if possible
because I didn't quite spend the time to 100% understand it.)
- dce creates a new StateVector<SSATmp,SSATmp*> for each exit trace
when sinking. Since the tracelet in question had a side exit for
about a third of the HHBC ops, this was kinda bad. It's also pretty
sparse, so I just changed it to a smart::flat_map.
- Convert WorkList from std::list to smart::list. (Should maybe be
smart::deque but I didn't want to test fixing the remove() call.)
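To illustrate why a sparse map helps in the StateVector case above, here is a small sketch. SparseTmpMap is a hypothetical stand-in written for this example (a sorted vector of pairs, which is the usual layout of a flat map); it is not smart::flat_map and makes no claim about its interface. The point is that storage scales with the number of entries actually written rather than with the total number of SSATmps in the tracelet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical flat map keyed by SSATmp id (assumed to be a small integer).
template <class V>
class SparseTmpMap {
  using Entry = std::pair<uint32_t, V>;

 public:
  // Return the slot for `id`, value-initializing it if absent.
  V& operator[](uint32_t id) {
    auto it = std::lower_bound(m_entries.begin(), m_entries.end(), id, keyLess);
    if (it == m_entries.end() || it->first != id) {
      it = m_entries.insert(it, Entry(id, V{}));
    }
    return it->second;
  }

  // Return a pointer to the stored value, or nullptr if `id` has no entry.
  const V* find(uint32_t id) const {
    auto it = std::lower_bound(m_entries.begin(), m_entries.end(), id, keyLess);
    return (it != m_entries.end() && it->first == id) ? &it->second : nullptr;
  }

  std::size_t size() const { return m_entries.size(); }

 private:
  static bool keyLess(const Entry& e, uint32_t key) { return e.first < key; }

  std::vector<Entry> m_entries;  // kept sorted by id
};
```

A dense vector indexed by id would allocate and initialize one slot per SSATmp for every exit trace; with close to 100k SSATmps and only a handful of entries per exit, the sorted-vector layout avoids most of that work at the cost of a binary search per lookup.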
This converter enables much more thorough testing of the
region translator. It currently passes all tests, though it does punt
on one or more Tracelets in roughly 1/6th of them. The two big
unimplemented features for the whole pipeline are interpOne (should be
pretty easy) and parameter reffiness checks (probably
nontrivial). I'll attack those in separate diffs next.
Remove an unnecessary optimization that can be achieved in a more general
way, and simplify the continuation creation code.
VMExecutionContext::createContinuationHelper was renamed to
VMExecutionContext::createCont{Func,Meth}. createContFunc no longer
takes this/class arguments; createContMeth passes them as a single
Type::Ctx pointer that is used natively by ActRec's m_this. Since the
whole purpose of this function is to set that single field, the logic is
now simpler.
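A rough sketch of how one context word can stand in for separate this/class arguments. The low-bit tagging scheme, the Ctx class, and its method names are assumptions made for this example; the message only says that a single Type::Ctx pointer is stored in m_this.

```cpp
#include <cassert>
#include <cstdint>

// Minimal placeholder types; both are at least 2-byte aligned, so the low
// bit of a pointer to either is free to use as a tag.
struct ObjectData { int id; };
struct Class { int id; };

// Hypothetical context word: an object pointer, or a class pointer with the
// low bit set. The encoding is an assumption, not taken from the diff.
class Ctx {
 public:
  static Ctx fromThis(ObjectData* obj) {
    return Ctx(reinterpret_cast<uintptr_t>(obj));
  }
  static Ctx fromClass(Class* cls) {
    return Ctx(reinterpret_cast<uintptr_t>(cls) | 1);
  }

  bool hasThis() const { return m_bits != 0 && (m_bits & 1) == 0; }
  bool hasClass() const { return (m_bits & 1) != 0; }

  ObjectData* getThis() const {
    assert(hasThis());
    return reinterpret_cast<ObjectData*>(m_bits);
  }
  Class* getClass() const {
    assert(hasClass());
    return reinterpret_cast<Class*>(m_bits & ~uintptr_t{1});
  }

 private:
  explicit Ctx(uintptr_t bits) : m_bits(bits) {}
  uintptr_t m_bits;
};
```

With an encoding like this, a helper that creates a continuation for a method only has to copy one word from the parent frame, regardless of whether the parent was called with an object or statically.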
Interpreter: iopCreateCont() just passes the m_this field of the parent
ActRec.
Translator: the CreateCont opcode loads the m_this field using the LdCtx
opcode. LdCtx optimizes into LdThis, which optimizes into
DefInlineFP->SpillFrame->object and allows the frame to be eliminated, a
case previously covered by InlineCreateCont.
This diff uncovered a bug in the trace builder, where LdCtx in static
methods could be optimized into LdThis if the method was called through
an object. Fixed.
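The shape of the guard can be sketched as follows. This is only an illustration of the condition described above; the Func struct, the frameHasKnownThis flag, and simplifyLdCtx are hypothetical names, and the actual fix in the trace builder may be structured differently.

```cpp
#include <cassert>

enum class Opcode { LdCtx, LdThis };

// Hypothetical, minimal function descriptor.
struct Func {
  bool isStatic;
};

// Decide whether a context load may be rewritten as an object load.
// frameHasKnownThis: the caller is known to have pushed an object for
// this frame (for example, a static method invoked through an object).
Opcode simplifyLdCtx(const Func& func, bool frameHasKnownThis) {
  // A static method's context is a class even if the call site used an
  // object, so the rewrite is only valid for non-static methods.
  if (frameHasKnownThis && !func.isStatic) {
    return Opcode::LdThis;
  }
  return Opcode::LdCtx;
}
```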
Save 8 bytes of m_args and its initialization for Continuations without a
func_get_args() call (does not save real memory due to 16-byte alignment).
Store variable arguments in an optional local.
For converting Transl -> JIT. We either do this or convert all the other `Trace` calls to `HPHP::Trace`, which seemed worse. This also nicely mirrors `IRInstruction`.