If a default value is optimized from a class constant,
to the value of the class constant, then reflection needs to
be able to get the text of the original class constant.
Fortunately hphp tags that onto the replacement expression for
just this reason. Lets use it in the emitter.
eg
(new X) == (new X);
Was converted to something like:
(new X);
(new X);
Which calls the destructor for the first X before constructing the second.
I tried a fix where we used an ExpressionList with ListKindLeft, which
would preserve both expressions until after both objects are created;
but that ends up calling the second object's destructor first. Thats
much better; its not clear to me that there's any guarantee about which
object's destructor is called first when they both go out of scope at
the same point; but currently hhvm and zend seem to aggree, so Im
going with a solution that preserves the left-to-right order.
This diff fixes a bug in shuffleArgs when there
are three register arguments in a cycle, and one
of them needs a zero extend.
Assume the shuffle that needs to be performed is
rdi -> rsi, rsi -> rdx, rdx -> rdi. doRegMoves()
determines the sequence of moves to be:
xchg rdx, rsi
xchg rsi, rdi
Assume also that the second dest reg (rsi) needs a zero extend.
The current implementatin will spit out the code
sequence:
xchg %rdx, %rsi
movzbl %sil, %esi
xchg %rsi, %rdi
movzbl %sil, $esi
Basically, the problem is that if a move sequence uses
two xchg's for a cycle of three registers, we should not perform
the zero extending (and address-lea) till both the exchanges are done.
My recent change to now/heredoc processing resulted in variables in heredocs mistakenly being tacked onto the end of the previous line in the output. This tends not to be a problem when emitting things that get parsed by something else, like emitting PHP. But it's noticable when emitting raw text.
I think the only case where we currently don't spillstack as
part of a vector translation is CGetM of a defined property. However,
this can still raise exceptions in the unlikely case of an unset
property. For now, just spill before any vector translation---a
follow up diff relaxes this to not do it in cases where we know the
property can't be undefined. Presumably later we'll push the
spillstacks into the unlikely paths for these translations (or put
them in unwind handlers)---currently we can't handle control flow
where the join point has a different stack.
If the global was not defined, it would try to return
a pointer into the MInstrCtx, which was not passed in.
We dont actually need it though, because in that case we can
just return a pointer to init_null_variant.
Fixed a few issues in the lexer for heredocs and nowdocs. The source file is read in 64k chunks, and any time a doc was split across a buffer boundary the lexer would fail to consume the doc properly. Modified the lexer to refill the file buffer when it is used, or when more data is needed in a variety of cases. Also fixed a number of other corner cases where we'd fail to recognize the doc end label or other special characters. The old code was also a bit over- and under-flowy.
I thought I'd have to teach memelim about the PtrToBoxed*
srcs for vector instructions but it turns out the refactoring I did in
the UnsetElem diff was enough. It already clears out the local map for
instructions that may modify refs.
We need to make sure that we execute the code
corresponding to the actual Func*. This re-uses the
prolog array to hold the entry points for the cloned
Func*.
The code was assuming that the result of the assignment
would be the value assigned, but its actually a new string
containing the first character of the value assigned. The result
was that the SetM had already decRef'd the rhs, and then the
jitted code decRef'd it again. Since that last decRef was a decRefNZ,
we would often get away with simply leaking both the original rhs and
the value of the SetM. But the new testcase crashes in a DEBUG build
without the fix.
The vector translator produced a type that was
boxed-array-or-string, which we couldnt guard on.
Since applying a SetM to a string will almost always produce
a string, change it to report string instead (which will still
be guarded on).
posix_getpwuid()
posix_getpwnam()
posix_getgrgid()
posix_getgrnam()
posix_ttyname()
were using non-threadsafe posix calls.
Most using AttachLiteral for shared space as well.
ZendPack::unpack() always returns a signed int32_t,
regardless of actual storage type being unpacked. For 32bit
unsigned types ('L', 'N', 'V', and 'I'(on I32 systems)) this
means overflowing in the helper and we have to explicitly
recast it to a uint64_t to get the data back out.
For LNV, this was already handled, it was missed for I.
They both returned the late static bound class, not the context
class. This meant that eg "constant('self::FOO')" was actually
returning what "constant('static::FOO')" should have done.
In addition, we often want the Class*, not its name, so
change them to return Class*. The remaining places that then
read the name from the Class* should be fixed to use the Class*
directly (in a later diff).
Finally, noticed that while "defined()" was recently fixed to
support "static::", "constant()" was not. Pulled out a common
function to find the correct Class*.
Alias manager does not know whether generator parameters are passed by
reference. This didn't matter, because every generator had at least one
function call (hphp_continuation_done()) that pretty much disabled unused
variable elimination.
This diff fixes that, lets us get rid of artificial function calls in
generators and will allow later improvements in alias manager.
There is a runtime option to filter out notices and warnings,
but strict_warnings were left out. Bundle them with notices.
We raise a lot of strict_warnings; and when we fix hphpiCompat
(to match zend better) we will raise a lot more, so this could
matter.
Alias manager does not know that generator parameters are populated and
assumes they are uninit. The current code works because control flow
algorithm gives up while trying to deal with the continuation switch
statement full of gotos.
This diff fixes it by setting isGeneratorParameter flag in symbols
representing parameters of enclosing generator wrapper and use variables
of enclosing closure.
Pretty simple. This makes me slightly nervous because I've only
confirmed it works in my stupid OpenEmbedded ARM SDK; a different Linux
might call this something else. But we'll cross that bridge when we get
to it, and this works for now.
I'm also sneaking in a change to remove x29 from the list of
callee-saved regs; I put it in there by accident last time.
I added the check for this in the interpreter but ##f_array_map## re-enteres the VM via a different path than FCall. Here is the equivilent check for the VM.
When we generate a binary with an embedded repo,
we add various args (-vRepo.Authoritative etc) to the end
of the command line. But the argument "--" is taken to mean
"pass the rest of the args to the script". So if you use
"--" on such a binary, it gets rather badly broken.
Reorder the args so that the inserted ones come first.
raise_notice() wasn't declared in this file, so include runtime_error.h.
This caused further problems with ATTRIBUTE_PRINTF not being defined
yet, so include some more stuff.
Bad memory allocation, the buffer needs to be large enough to fit all
the data we've decompressed so far plus the extra storage we're
incrementally allocationg, not just the incremental part.
The hoistConditionalJumps pass was not handling traces ending with
JmpZero and JmpNZero. This was resulting in spurious jumps to
'astubs' instead of patching the jcc+jump pair in 'a'.
In HHBC mode, traits are flattened into their classes.
When that happens, closures need to know about all the
classes that contain them so that when we find a ##$this##
inside the closure, it can tell EVERY containing scope to
please propogate ##$this## down to me.
Summary:
Specifying Signed 16-bit Int in HDF parsing
prevents use of any port greater than 32767.
Test Plan: Run a server on port 40000
Reviewers: andrewparoski, jdelong, smith, kma, mwilliams
Reviewed By: jdelong
CC: hphp-diffs@lists, ps, ottoni, aalexandre
Differential Revision: https://phabricator.fb.com/D735202
Remove HPHPc-only inconsistencies from the inconsistencies doc. Also,
clarify the wording to make it clear we are comparing HipHop VM with the
Zend PHP engine.
This diff changes DceFlags to be an opaque struct, and uses
the new decRefNZed flag to ensure that we don't kill any IncRefs
unless they're consumed by a DecRefNZ in at least one path. This
doesn't catch all cases of malformed IR, but it catches common
mistakes, including one in the vector translator that I just spent 2
days hunting down.
A couple things came out of this:
- dce now removes unreachable code before performing the real dead
code analysis, since any unreachable IncRefs should be killed
regardless of what consumes them.
- I found a bug in memelim where it wouldn't properly account for
Uninit -> InitNull promotion when boxing a local. The fix for this
was to never emit "Box Uninit".
gzinflate tries to do some kind of exponential growth that
hits the max string limit immediately when you try to deflate a string
larger than 128 megs. The logic is a little convoluted with a few
ways to bail, but rather than try to really fix it I just made it
behave close to the same on small strings, but use a smaller starting
factor on long ones and be willing to try to deflate it into a
max-sized string.
This diff removes LVariableTable, RVariableTable, and getHphpBinaryType().
It also consolidates some of the include logic that was spread across
multiple files.
Access to TypedValue.m_aux must now be via TypedValueAux. For now,
TypedValueAux is an empty subclass with accessors to m_aux, which is
now private. Once RefData.m_tv is moved out from under RefData._count,
we can move TypedValue.m_aux to TypedValueAux.
Removed unnecessary initialization of m_aux.u_hash from c_Vector.
For a debug build:
cmake -DCMAKE_BUILD_TYPE=Debug .
For a release build:
cmake -DCMAKE_BUILD_TYPE=Release .
If you never set CMAKE_BUILD_TYPE, the default will remain "Release"
If the argument was in the expected register, we failed to do the
zero-extension. This showed up as a heisenbug when experimenting
with different TypedValue layouts, but could be causing other
stability problems in HHIR as well.
I got bit by a really nasty bug where I cloned a ##Func## and didn't set the ID. Then the JIT happily used it and I got really weird segfaults.
Making the ##m_funcId## private turned out to be really hard (~30 callsites) so @andrewparoski said the ##translate()## and ##retranslate()## were the most important places. We just have to remember to do it. If there are other important places I'll just bite the bullet and make it private with an accessor.
This should save days of many future engineers.
DecRef needs to be the last use of its source, since it might
destruct it. In the new test case, the second cast expression was
reusing the result of the first (through CSE) after it had been
DecReffed and destroyed.
This diff removes initializing stores to TypedValue._count, renames
_count to m_aux, and makes m_aux a union with members typed
and named according to their specialized uses. The few remaining
uses of that field for random tweaks are more obvious and easy to
grep for.
TypedValue no longer extends Value, (allowing m_data to move to a
different offset in the future), and Variant now extends TypedValue,
so we only have to maintain one definition.
HphpArray now explicitly uses TypedValue.m_pad instead of overlapping
TypedValue with an anonymous struct, again so we don't have to maintain
another structure to match TypedValue's layout.
The JIT's were using offsetof(TypedValue, _count) all over the place
for access to String/Array/Object/RefData::_count. Instead, use
FAST_REFCOUNT_OFFSET.
Blocks consisting of just a Jmp_ instruction could result in
emitting jmp instructions to the immediately following
instruction. This was fairly simple to avoid.
Previous deff introduced mechanism to avoid calling constructor of abstract class. In some cases, subclassing an abstract class on the PHP side makes sense.
Let's remove this limitation by introducing IsCppAbstract flag, so that only constructor of classes that are abstract in the C++ sense is not called.
Extend cgConv with support for converting to an array. Except for the trival cases of converting null to an empty array and converting arrays to arrays, which are handled in Simplifier, the conversion is delegated to a helper that calls tvCastToArrayInPlace on its input parameter and then returns the embedded array that results from the call. The helper also increases the ref count of operand to convert since tvCastToArrayInPlace assumes that updating the typed value passed to it will result in one fewer referenced to the argument.
In HPHPc it didn't allow closures to be cloned but in HHVM it did. When I migrated to a C++ closure then i left the HPHPc code. Zend 5.4 allows them to be cloned so lets go with this.
For closures, I'm going to clone the ##__invoke## method and set the runtime ##m_cls## to be whatever the containing class was (or nullptr if there isn't one). Instead of finding all the callsites that decRef the ##ActRec##'s ##m_this## based on ##isMethod##, I think it is clenaer to make ##isMethod## reflect the state of the ##Func*##
closures will need to know at emission time whether there is a
static $foo;
statement inside them, because if there is, the closure has to stick around to hold the static locals.
There are a few code quality issues that affect the IR in
general but are magnified in the vector translator. I'm going to work
on fixing those, but in the meantime we're faster with it disabled
(while punting to tx64).
This is a very simple optimization to avoid some unnecessary
stores. In codegen for SpillStack, if the value we're going to spill
comes from a LdStack instruction corresponding to the spill
destination, don't spill it.
This actually was pretty broken already. If you defined a new function in the ##create_function## string it would return that function but it wouldn't move it to the right unit so it can't execute. A big mess. For example:
$ cat a.php
<?php
$a = create_function('', 'function b() { return 3; }');
$a();
var_dump(b());
$ php a.php
HipHop Fatal error: Undefined function: b in /data/users/ptarjan/hphp/hphp/a.php on line 4
$ hhvm a.php
int(3)
Xbox threads can now release memory after each request. Also
gives cleaner semantics as Xbox requests dodn't have to be careful
about not modifying global state.
Expose the following data to the PHP:
- asio_get_current_context_idx(): get current context index
- asio_get_running_in_context(): get running wait handle in a given context
- asio_get_running(): get running wait handle in a current context (renamed asio_get_current())
- WaitableWaitHandle::getContextIdx(): get context the wait handle operates in
- WaitableWaitHandle::getCreator(): get continuation wait handle that constructed this wait handle
Move the responsibility of entering/exiting contexts from PHP to the
implementation of $wait_handle->join().
This eliminates possibility of weird situations, like contexts without
any running wait handle. This guarantees that asio_get_current() returns
null only if called completely out of asio framework and simplifies some
logic, such as getCurrentWaitHandleDepth().
Noticed this in perf, maybe I'm missing something obvious
but it seemed like there was a pretty easy way to avoid an extra
hashtable lookup during get_class_constants
This shows up because the Enum class call this method once per enum
the first time isValid is call. Might be worth doing something to
make this even more efficient, but seems like an easy win.
If an internal HPHP exception is thrown in a continuation executed by
ext_asio, m_current pointer was not reset and resources were not cleaned
up. This doesn't matter that much in prod, but when used in debug mode,
an assertion was hit.
Take advantage of previous diff that won't try to construct abstract classes.
Abstract methods now don't need to be implemented, so remove their
dummy implementation.
Abstract classes can't be constructed. Don't generate helper for
constructing new instances of such classes. Set null to m_InstanceCtor
field of HhbcExtClassInfo that eventually gets passed to m_InstanceCtor
of Class. The Instance::newInstance() then ends up raising correct
exception.
m_unboxPtrs looks like a relic from the early days of the
ir. emitCGetProp has been cleaned up to never punt (it checks every
needed condition before deciding to not using the vector translator).
This was causing the ir to generate much larger code than
tx64 for simple SetMs. tx64 does have a specialized helper for setting
array elements that we may want to replicate in the IR, but that can
come in a separate diff. The code generated by VectorTranslator is now
just a few instructions larger than tx64. Without the SpillStack it
would be 1 instruction smaller.
This is the last step to being able to get rid of the c++ code
gen in hphp. "make -Chphp/system" is now a no-op.
I'll rip out the actual c++ generating code as a separate diff.
I found some byte-wide padding in Unit and UnitEmitter that we can
use to store the mergeOnly flag, instead of m_returnValue's unused
_count field. This is one step towards eliminating TV._count.
Improved the size of the code emitted by HHIR. Checks for
this pointer should use a byte comparison. Added a linearscan
precoloring hint for global address helper. Modified argument
shuffling code to use xor rather than move of an immediate zero. For
StaticMethodCache access, moved the test next to the branch to make it
more x86 friendly.
This diff updates the parser and runtime to support using collection
literals in initializer expressions for instance properties, static
properties, parameters, and static locals.
The runtime as-is was able to correctly handle collection literals in
initializers for static properties, parameters, and static locals. However,
for instance properties I needed to way to make it so that each instance
got a fresh copy of the collection literal.
To achieve this, I added an attribute to indicate that a property requires
'deep' intiialization. When this attribute is set, the Class machinery
will not call setEvalScalar() on the initial value (the value produced by
86pinit), and it will make a deep copy of the initial value when a new
instance of the Class is allocated.
Support == and != operators for collections. Also fix some bugs the <, <=,
>, and >= operators when comparing two objects or when comparing an array
with an object.
This fixes an old bug with autoloading interfaces and traits where the
runtime erroneously continued to call autoload handlers even though the
requested interface/trait had already been loaded.
This also rips out some dead autoload helpers and removes the 'declared'
parameter from invokeHandler() since it is null for all callsites.
A prior diff added property error-reporting code that was looking up the
property in the PreClass. However, a PreClass doesn't contain the
inherited properties, so HHVM was crashing when trying to raise an
accessibility error on an inherited property.
This diff changes the code to lookup the property in the actual Class
instead.
IDL says Object, but C++ code says Variant. They always return Object,
but the ext_hhvm adapters don't crash because of the way Object and
Variant happen to be layed out. When the layout changes, stuff breaks.
Adds debug symbols on calls into the runtime so we can better
understand/compare the helpers being called when we use DumpIR or
JitCompareHHIR.
I used the GNU backtrace_symbols() facility to do the translation of
address->symbol, since it seemed like the easiest route to this
functionality.
The variable declared in an if condition cant be redeclared in the
(outermost scope of the) else.
g++4.7.1 treats FOO as a literal suffix operator in "x"FOO, even if FOO
is a macro whose expansion wouldnt qualify (not sure if this is a bug
in g++4.7.1)
We only used it to get the values of certain class constants,
and to define the ObjectStaticCallbacks for every class.
We can put the class constants directly into the class_map
(we should have done that before for perf reasons), and then
the only remaining use of ObjectStaticCallbacks is to proxy
the Class* for each builtin class. So just use the Class*
directly.
Once this is in, Im just a small step away from eliminating
make -C hphp/system - so Im leaving a lot of dead code here.
Its going to be easier to delete it en masse, rather than
try to pick and chose now.
Completed implementing LateBoundCls in HHIR by handling the
case where we need to dynamically distinguish between a 'this' pointer
and a class in the ActRec. Introduced new instructions to extract
the context class from the m_cls/m_this field of the ActRec. Re-used
them for FPushClsMethodF.
However, it triggered a bug in memelim. When chasing the value of a
Ref, if it came from a load other than LdRef (eg LdLoc), the value
tracked was the Ref itself instead of the inner value. This diff
separates RefInfo and LocInfo, and it adds findRefValue() to get the
value of a Ref.
This exposes the core async mysql async APIs we added to our
mysql client.
These APIs are not meant to really be directly used (though they can
be); instead they're the building block upon which higher level APIs can
be implemented.
This turns the output into three columns: tx64 asm, hhir asm,
then pretty-printed hhir with asm inline. This should help us figure
out what the ir is doing wrong when it makes bigger code. I also
compacted the output of Trace::print a bit and changed it to print
unlikely blocks at the very end.
TestServer finds a free port to use as the server, but it
used the standard admin and rpc ports. If you had a
webserver running, it would own the admin port, causing
TestServer's tests to time out (because it uses the admin
port to stop them). Similarly rpc tests would fail because
they would be talking to the wrong server.
Find free ports for the admin and rpc servers, and use them.
chars_len holds the capacity of the buffer on input,
and is filled in with the number of chars written.
It was not being set, causing random behavior (including
potential buffer overrun). The testcase actually relied
on it being set to a too-small value.
I found 4 separate problems, each of which broke this:
- A hoistable class didnt generate bytecode, so the line number
information was associated with whatever line the /next/
bytecode corresponded to. Often there was no next bytecode,
so we got the line number of the end of the file.
- If there /was/ a bytecode for the defClass, the line number
recorded for it was that of the pseudoMain, not the class
itself; so again, it would be reported as the end of the file.
- The parser records two sets of line numbers for a class; its
start, and end. The end line gets associated with the bytecode.
- There were optimized paths where we didnt setup an ActRec
at all, resulting in either reporting the file/line of the
require (or often a require much higher up the stack), or
not reporting a file/line at all.
So now, we generate a nop if we would have skipped the defCls.
We override the normal line info, and set both the start and end
lines to the start. And we wrap all errors from defClass in a
temporary frame to ensure the line info is seen.
This changed the output for several of our tests. Spot checking
a couple showed they now report the same line number as zend.
It was only used to fold SimpleFunctionCall nodes. Ive setup enough
of the runtime that we can call invoke (which goes through hhvm's
normal Func dispatch), and then removed it.
This basically targetted symbols.php, and Globals, but ended up
killing a lot more. I could keep adding more and more, but
this seems like a good point to stop and continue with
another diff.
There were a few bytecodes in irtranslator that were
HHIR_UNIMPLEMENTED. These are bytecodes that tx64 handles but the IR
does not. Changed these to use hhbctranslator emit routines and used
interponeOrPunt to implement them for now. This should increase IR's
coverage a little when IRPuntDontInterp is false.
This reduces CPU instruction count by about 0.2%, has negligible impact on other metrics. Implementation has been changed a but upon discussion with @mwilliams to account for circular destruction. The question remains open whether this should be in before or after rooting out hphpc, but on the other hand we gotta do what we gotta do to move forward.
SharedMap was the last dependency on ZendArray. For its localCache,
use a TypedValue[] array indexed by SharedVariant.getIndex(), and
for escalate(mutableIteration), escalate to an HphpArray instead of
a ZendArray.
I'm using OpenEmbedded for the ARMv8 build right now, and it seems to be
BSD-like, based on the preprocessor flags that are defined, including
__USE_BSD. OpenEmbedded has the same problem with the "isset" macro, and
this change includes it in the code that undefs isset.
I guess this problem is common to BSD-like systems (if you consider OS X
to be BSD-like for this purpose), but __USE_BSD by itself isn't enough;
OS X doesn't define it.
This diff suppresses the output of C++ for the "pure" classes defined in
system/classes, and it rips out all the uses of MethodCallPackage (except
for the i_* and ifa_* helpers, which we can go after separately).
Also cleans up a bunch of "if (hhvm)" and "#ifdef HHVM" checks in builtin_functions.cpp,
systemlib.cpp, object_data.cpp, and class_info.cpp (and the corresponding .h files).
Note that this does not completely remove the generated C++ files. We
still generate code for the PHP files in "system/globals" and we still
generate the g_class_map (because the VM needs g_class_map at startup
when it creates Class's for the extensions). We also still have the
dynamic_func_table/dynamic_class_table stuff, MethodCallPackage, and
the i_* and ifa_* helpers to support invoke_builtin() (which is still
used by the compiler).
This is in preparation for saving a Func* on the class to fix the perf.
It turns out Properties were never used on a single class in the IDL so HHVM never implemented them. Now it does.
This diff was a perfect exercise in "change very few lines but finding which lines to change takes hours".
On the back burner I've been working on a diff to pull out of some of
the old VM-incompatible machinery for invoking functions/methods and for
creating object. Along the way I noticed some bugs, so I figured I'd fix
them first in a separate diff to make review easier.
When allocating registers and inserting Reloads, be less pessimistic
about reusing earlier reloads. Only "forget" reloads that don't
dominate current block. e.g. right after an if/then/else block, it
will clear any reloads from if/else arms so they can't be used in
the join block.
Moved numberInstructions() to linearScan.cpp, and don't re-sort the
control flow graph every time; register allocation doesn't change
the CFG shape so its enough to just iterate the existing block list
each time.
Un-pessimize rematerialization by snapshotting state at branches,
merging state when necessary, then restoring state at block starts.
Here, "state" is the previously seen tmps for sp, fp, and each
local variable.
removeUnusedSpills() can just iterate over the spill slots, erasing
spills with no uses; we don't need to visit every instruction.
enterContext() throws an exception when cross-context cycle is found.
The problem is that it modifies state before the exception is thrown,
assuming that the call will succeed.
When an exception is thrown, a dependency is left in invalid state, with
parent being in more specific context. This breaks exitContext()
algorithm and results in either internal invariant violations as seen
in #2091939, or memory corruptions and crashes as seen in #2125762.
Let's fix it by modifying state after returning back from recursive call
instead of before doing such call. This was previously unsafe in case we
tried to import dependency loop. Once D720506 is committed, dependency
loops will not exist anymore.
Currently, we detect dependency loops by waiting until there is nothing
else to execute. If the wait handle we are waiting for did not finish,
it means it is in a cycle. We find the cycle by simply following the
dependency chain. Once the cycle is found, one edge is eliminated and an
exception is injected.
There are multiple problems with this approach:
1. Unability to exit contet safely
We are unable to exit context safely. When a context is exited, all wait
handles in that context must be kicked out. But we maintain only
references to the SCHEDULED wait handles + BLOCKED wait handles that
recursively depend on them.
If we do not kick out all unfinished wait handles, we end up in
corrupted state.
2. Unability to break edge that caused the cycle
Once the cycle is detected, we don't know which edge caused the cycle to
be formed. We can only use heuristics to eliminate the edge that likely
formed the cycle, we cannot be sure. This may make it very hard to fix
the PHP code that caused the cycle.
Solution:
This diff implements online cycle detection with a naive approach of
visiting the dependency chain from child at a time new edge between
parent and child is being added. If a parent is visited, a cycle is
found. Otherwise we eventually reach non-BLOCKED wait handle as it is
guaranteed the rest of the graph is cycle-free.
Currently, wait handles store pointer to the context they are in. This
pointer is not protected with reference counting, as it is expected that
whenever a context is exited, references to it are cleaned thru
exitContext() mechanism.
If a bug is present that violates this assumption, it is impossible to
guard against invalid pointer access and a hard to debug memory
corruption occurs.
Since the structure of contexts is a simple stack, let's reference them
by index instead of by pointer.
As a bonus, one pointer worth of memory is saved for every non-trivial wait handle.
The actual bugs will be fixed by the next 2 diffs that do:
1. implement online cycle detection
2. do enterContext() atomically and properly handle failure
Just treating some more x64-specific assembly. The tx64 helpers are all
written to trap on ARM right now -- we're not going to be running the
jit for a little while, and we don't need real implementations till
then.
Fortunately, the system we're targeting has thread-local storage
support, and it's pretty easy to get at it, so I just did it.
Fixed a few places where we were calling exitTrace when we
should have been calling exitSlowTrace in
hhbctranslator. ExitSlowTrace will exit and not return to the IR,
going to either code generated by tx64 or the interpreter (depending
on the IRPuntDontInterp runtime flag). ExitTrace will simply exit and
retranslate using the IR if possible. In some cases we were using
exitTrace for conditions that the IR could not handle, causing us to
just try the same doomed operation again in the IR.
Used control flow to implement IssetS/G and EmptyS/G instead of
exiting the trace on an undefined static property or global.
Implemented code gen for Unbox and cleaned up its
simplification.
Cleaned up some DefConsts in hhbctranslator.
We only have recursion if what we're currently looking at is the same as
one of its parents in the nested arrays. We don't need to keep track of
everything that's been seen, only the elements seen in a path down to that
element.
Seems pointless (but maybe Im missing something). Long term its
not important - this only affects things when the ir is off, but
its going to give us a false baseline wrt startup translation
time of the ir.
Now we can rely the Native flag to tell whether a call is going to
happen or not. This should improve precoloring hints, and allows
re-enabling spilling to MMX registers.
Instead of custom asm we generate in a few places to generate asserts
that check a TypedValue, use an opcode instead. This also moves more
of the assert-generation logic to a dedicated pass.
Moved the CSEHash for constants from TraceBuilder to IRFactory, to
simplify generating DefConst instructions later than the TraceBuilder
pass. IRFactory now owns the constant table.
Cleaned up type names in ir.specification. We always pretty-print
pointer types as PtrToWhatever, so do it that way in the spec too.
Added spec for Call.
Snapshot Tracebuilder state at branch points, and merge it at join points.
Remove the no-control-flow restriction from TraceBuilder::optimizeTrace,
since we can deal with control flow now.
In optimizeTrace, When we enter a block that has a snapshot state,
remove entries from the cse table that don't dominate the current block.
Add support for boxed ints, and boxed/unboxed doubles, and bools.
Emit inc/dec x86 instructions in some cases.
Got rid of some "mov rX, rX" found along the way.
We could overflow the size of an int, return a
"legitimate" looking value, and then crash in
fb_serialize_into_buffer when we go past the allocated
size.
If a PopR was marked as having a predicted type, and it was
not in the same tracelet as its producer, we would generate
code that didnt verify the predicted type. Note that normally
we would translate them together, unless we failed to get the
write lease. In that case, we could spend a long time inside
the call, and get the write lease by the time we return.
There's widespread dislike of the isStaticallyKnown
predicate, and I think the main problem is the "statically" part in
the name. Some of the basic types (str, null, array) are actually
unions of a static and counted type, so sometimes we don't precisely
know the type statically, but it doesn't matter.
To remedy the situation, I'm renaming isStaticallyKnown to be simply
"isKnown", which I think is a semantic improvement (@smith suggested
"hasKnownDataType" -- I figured that since the object is a type that
I'd go with something shorter). I'm also providing "needsReg", which
is now just the inverse of "isKnown", but having this available makes
the code clearer in places that care about the type only to figure out
whether to allocate storage for it.
There might be places where more precise tests could be used, but
they're not totally obvious to me. So, I want to push out the simpler
naming scheme first.
Got rid of lots of boilerplate by using ifThen helper. In nearby
areas, updated asm generation code to use the new style.
Factored two-register-assignment out of cgCallHelper(), use it
in cgIncRef().
Generalized cgIncRefWork slightly (since it was cleaner this way)
so it handles the case of no-static-check but with a type check,
and use shuffle2() instead of more adhoc copy logic.
cgGuardRefs should generate fewer unnecessary jumps now, by crafting
the code we generate based on compile-time-known cases (fixes TODO).
The logic for unwinding generator frames was assuming that the previous
frame was always a VM frame. This used to be a correct assumption, but now
we have invokeContFunc which allows native code to call generator functions.
This diff fixes the unwinder to correctly computer the bottom of the eval
stack in the case where the previous frame is a native frame. It also
updates invokeContFunc to properly decref the return value.
This flag isn't being used in any of our deployments. I sure hope not,
anyway, because if you turn it off, things are mega-busted; I can't even
run a sandbox without crashing pretty early on.
This breaks the OSS build's dependency on xhp. We're still depending on
it internally in a Facebook-specific extension (exposing the XHP
preprocessor to PHP code as ##xhp_preprocess_code()##). There might be
some way to replicate this functionality using HPHP's native XHP parser,
but that seems low-pri.
While I'm looking at this stuff during debugging, get rid of some of the
stuff that's only compiled for the now-nonexistent hphpc build.
I'm aiming for this change to result in identical code after
preprocessing. There is some other stuff in these files that could go
(any method with ##const_assert(!hhvm)##) but getting rid of those is a
bit hairier; I tried, and it resulted in some weird, hard-to-repro
instability, and it doesn't seem worth sinking much time into at the
moment.
Instead of having the body of the closure be in the ##__invoke()## on the ##Closure## class, instead we make an anonymous function on the real class and put the body there. The signature for this function is:
function methodForClosure$1234($arg1, $arg2, ..., $use1, $use2, ...)
and then ##__invoke## now just takes all the params that were passed to it, puts them as the first args to the anonymous function, then takes all the use variables it had saved up and passed them in as the next params.
I tried to not have an ##__invoke## at all, but I ended up basically doing the same parameter and use var repacking in iopFCall (and would have had to do it in x86 code too). I opted for doing the rejiggering in bytecode. If I did it in raw PHP I think it would have been much slower with many ##func_get_args()## and array operations.
We don't display the stack frame for ##isNoInjection()## functions, but the backtrace does set the previous call lines wrong.
This came up when I made the Closure::__invoke to be hidden, but the line numbers were wrong.
I often want to make sure we are close to parity with zend so
importing their tests seems like a good idea. I used this for an
upcoming diff and it worked out well.
I just tried this on all the closure tests and it needs a bit
of hardening. It worked well though :)
IRInstruction and Block have asmRange fields to record where
their assembly code got generated, but it's only used for
pretty-printing. Do it with a side structure instead, indexed
by block (for block ranges) or instruction.
Gave each instruction its own range, to simplify pretty-printing.
Modified the pretty-printer to also print code following the last
instruction in a block but which is still in the block's range
(e.g. a jump to the next block)
On the decompression+deserialization path one useless copy will be created in case the data is not compressed, but serialized. Uncompress always returns a platform native string, but if the data was serialized this platform string is not needed, since it will be unserialized into platform variant anyway.
This can be improved by unserializing the data directly from mc_msg_t in case it was not compressed.
The Zend versions of idn_to_ascii, idn_to_unicode and idn_to_utf8 take parameters whose valid values are defined by predefined constants and whose error codes correspond to
predefined constants. This change adds those predefined constants to hiphop. It also ports two of the applicable unit tests from the Zend code base to hiphop.
Fixed a few instructions in the HHIR instruction properties
table related to instructions that should not be moved or eliminated
by dead code elimination. Also made sure codegen can handle essential
instructions whose result are not used (i.e., instructions whose
destination registers are InvalidReg).
Fixes HHIR bugs that were causing perflab crashes: guards on
class types and tvBox.
We were ignoring stack guards on class, which was causing the IR to
lose track of types for classes on the stack and causing assertion
failures. Fixed by making class stack type guards into stack type
asserts.
tvBox was not returning the right value, causing VGetS to crash in the
IR.
Changed 1 test and added another to cover VGetS in verify_quick_hhir.
A test was hanging on to the result of
std::string::c_str() past the lifetime of the string and then throwing
an uncaught exception.
This diff adds a top-level catch block to RUN_TEST to prevent things like
this from killing the test binary in the future.
Each Block has a new Likely/Unlikely/Neither hint that can be set
any time. At codegen time, unlikely blocks are put in astubs.
Codegen takes care of inserting branches to/from them if necessary.
Updated the pretty-printing code to show AStubs code after each
block instead of after each trace.
Refactored the instructions that emit code into both a and astubs
so they work when a == astubs, by using a new unlikelyIfBlock
helper.
In order to find the srcs that reach a given DefLabel destination,
we need to know the Jmps that reach the label. This diff tracks them
in a singly linked list which stays up-to-date when setTaken()
is called.
Just to exersize the ability, I changed the prettyprint output
to print "phi" pseudio-instructions. These don't exist in the
actual IR but its helpful to see them in the dump output..
Instead of Traces holding a non-intrusive list of instructions,
have Blocks own instructions with an efficient intrusive list
so each instruction can only be in one block at a time.
Traces hold a list of blocks (which hold instructions); the overall
instruction order isn't changed. However, use the Block api to
insert/remove instructions.
IRInstructions and Blocks point to their "parent" but the accessors
are named specifically (IRInstruction::getBlock and Block::getTrace).
Branches refer to a Block instead of a LabelInstruction. Fields
of LabelInstruction were moved to Block, and LabelInstruction is
removed. IRInstruction is now nonvirtual.
Several loops over trace->getInstructionList() were modified to
be nested loops over blocks, then instructions.
Removed the union of SSATmp[*|**] for destinations; instead, use
SSATmp* m_dst, which points to 0 or more destinations. The
opcode still dictates whether the count must be 0, 1, or N.
Looping over multiple destinations now typically uses SSATmp&.
Renamed typedef SSARange to SrcRange and added DstRange.
Branch-patching state was moved out of LabelInstruction (or Block)
and is private to CodeGen now; each block has an associated patch
list.
This diff creates a Counted type union by splitting Arr into
CountedArr and StaticArr. This cleans up the *Counted() predicates
and makes needsStaticBitCheck more precise for arrays. Please check
me (esp on canRunDtor), because I am n00b with respect to PHP
semantics.
Include locals information in gdb via the preexisting dwarf output.
As discussed I have a top level DIE for 'HPHP::TypedValue' with the same name
which gdb takes care of while doing the opaque type resolution.
And use them in AsioContext, which was doing a lot of memory
allocation via malloc/free, was itself allocated by malloc, and
needed to be sweepable to deal with the fact that it contained
standard containers.
If the return value is a register pair (e.g. TypedValue by value),
and the register assignments required a swap or assigned rdx to
dest-register-0, then we'd clobber rdx before saving it.
Setting an array element with a key that's an array or object
raises a warning and evaluates to null. VectorEffects wasn't taking
this into account and was also getting a bunch of other things wrong,
so I rewrote it.
Pretty self-explanatory. I took out the 32-bit compatibility stuff
because we're pretty far away from being able to support 32-bit
platforms (the open source build explicitly checks and fails on 32-bit).
On non-x64 platforms, we'll fall back to calling uname(). As far as I
can tell, ARM doesn't have a "processor name" facility like x86 does,
but if we find one, we can add support for it here.
Don't compile this on other platforms. (Really, this should be something
like HAVE_EMMINTRIN_H but we don't have that kind of infrastructure
right now.)
To get the non-hphpc portions of buliding for HPHPc,
perform an HHVM build and execute it with the --hphp flag
To checkout the commit immediately prior to this:
git checkout use-hphpc
Removes the output errorCode parameter (that is not currently used anywhere, except in testing code that gets but ignores it). Adds the options, variant and
idna_info parameters that are present in the Zend version of idn_to_ascii. Implement the new functionality required by these parameters by using the UTS #46 API, if
present.
I was debugging this method and jumping into a macro that is 20 lines is much harder than a method. Shouldn't the compiler inline it if that is the right thing to do anyways?
There also was a hidden variable ##numArgs## which is what was messing me up.
This diff adds support for StringSlice and MutableSlice to the
String::operator+=() method. It also uses operator+= to replace some
manual string manipulation in throwStrOOB inside ext_collection.cpp.
These rely on the hilarious BSR and BSF instructions on x64. ARMv8 has
an instruction that does something similar: count leading zeros.
Unfortunately, their semantics differ in a few important ways:
- BSR and BSF, when given a zero input, set the zero flag and have
undefined output. CLZ outputs 64.
- BSR and BSF return the index (starting from 0 at the LSB) of the most- or
least-significant 1 bit. CLZ does what it says: count leading zeros.
Bottom line, there has to be some bit trickery somewhere to fit the two
instructions into the same interface. I kept the x64 implementations the
same and put all the nastiness in the ARM implementations to make them
match, since obviously the perf of the ARM implementations doesn't
matter yet.
PAGE_SIZE isn't portable -- e.g. my ARMv8 sysroot doesn't define it. I'm
replacing it in the way the Linux manpage recommends. This should work
on a wide variety of platforms. For example, it works on OS X.
Allow extensions to hook into hphp_process_init and hphp_process_exit
in the same way they could already hook into thread_init/thread_fini.
Also fixed a bug where we didnt actually call the fini funcs (nobody
is using it though).
The interpreter apparently does so, but the JIT doesn't.
I suspect this is what is causing the problem reported in the
task. However, I can't reproduce (or properly test) it.
I believe we need a scenario with a least two requests. The first
request has seen the callee function by the time it translates
FPushCufSafe, so that it doesn't interpOne this instruction. The
second request hasn't loaded the callee by the time it runs this
translation, so it misses in the TargetCache (at which point it should
try the autloader).
Broke LdClsPropAddr into 2 instructions by introducing a
ldClsPropAddrCached. LdClsPropAddrCached loads the static property via
the target cache so requires the class and property name to be
compile-time constants. LdClsPropAddr does not use the target cache
and works even if the property and class names are not constant.
Similarly broke LdCls into 2 by introducing LdClsCached. LdClsCached
uses the target cache to handle the case where the class name is a
compile time constant. Added code to detect cases where the class name
can be folded to a Class* constant.
Moved the code that decides between LdCls (LdClsPropAddr) and
LdClsCached (LdClsPropAddrCached) to the simplifier, allowing us to
optimize these instructions as optimization passes discover more
constants.
Changed HhbcTranslator code generation for AGet* and getClsPropAddr to
handle more cases using the above new instructions. Also improved
HhbcTranslator's translations for self and parent bytecodes.
Added a label to LdClsPropAddr to allow control flow when property is
not accessible. This enables implementation of IssetS (which was
incorrect before this diff) and EmptyS.
Implemented BindS, VGetS, EmptyS, and LateBoundCls.
Opportunitistically introduces calls to SSATmp::isA(...) to replace
explicit comparisons against Type::Tag values.
Moved AddElem punt from code generator to hhbctranslator so that we
can interpone.
One of the places that sets m_lastHHIRPunt was lost in
D705347. This will still capture punts that causes partial tracelet
interp because that translation ends up being a full-tracelet punt
when the IR tries to translate it again.
It's not ok for the unwinder to use a reference to elements
living in the m_faults array, since the unwinder can re-enter the VM
when calling destructors (or the FunctionExit hook). If one of those
re-entries does exception handling, it can modify m_faults.
Additionally, gets rid of VMPrepareThrow and instead just throw Object
and use the same case as we do when exceptions came from an extension.
I had to fix an assembler test catch handler to actually catch to keep
the assertion about m_faults on re-entry correct.
This diff adds a table containing information about native
helpers called by translated code. It is used by LinearScan and
CodeGenerator to replace the manual and somewhat error-prone
precoloring hints and calls to cgCallHelper. I discovered in the
process that many of these helpers didn't have any precoloring hints,
so register allocation for some things should be improved slightly.
Code generation should be otherwise unchanged.
- Implement SetProp and SetElem
- SpillStack before each helper call since they can throw
- Stub out the remaining unimplemented vector operations for more
fine-grained punt counts
- Kill TraceBuilder::gen* wrappers for vector stuff
- Rearrange some of the fake DataTypes to not confuse gdb. Values are
unaffected.
- LdLocAddr now returns a properly typed pointer
- Audit and clean up every use of Type::Null
- Add Type::debugString for use in gdb (calling Type::isString often
doesn't work)
- PtrToStr and PtrToNull are statically known
When interp'ing an N-instruction tracelet, we ask the interpreter
to step through N instructions. The interpreter treats FCall as one of
those instructions, and happily starts interp'ing into the callee. This can
cause us to form weird tracelets in the callee that have no business
existing. Instead, break on any control-flow we encounter.
The central repo is always writable, so use that for writing,
and disable the local repo, so that hphp doesnt try to open
the default central repo, which can cause contention if there
are a lot of processes.
It was testing the low bit of the destination register before
moving the context to be tested into it, and if the low bit of that
register happened to be set already, nothing was ever put into the
destination reg. In certain situations this was causing fatals from a
null $this in instance methods.
If you allocate a StringData on the stack, and it escapes,
you're in trouble. Make the destructor assert by default,
and add a StackStringData which does the appropriate
refCounting and checking.
Some StringData* helpers were forwarding to CStrRef
methods - which now forward to StringData* methods. Just
call the StringData* methods directly
Other methods were using stack allocated StringData's.
Thats pointlessly risky for slow-path code. Use a String
instead.
Although support for FP arithmetic was added to HHIR codegen, it was
not plumbed through. So it was still punting to tx64 or interpreting
FP Add, Sub, Mul. This diff enables these instructions, which had to
be changed to have a TypeParam.
This diff also implements some checking for IR instructions' dest
operands in debug mode.
Finally, this diff fixes the 'reduce' script, which got broken when an
HHIR banner generated by DumpIR was changed.
Try to scoop up some more instructions in the IR. In a
sequence like:
BoringInstrA
BoringInstrB
PuntInstr
BoringInstrC
we used to punt the entire tracelet to interp. Now we do:
hhir:
BoringInstrA
BoringInstrB
interp:
PuntInstr
BoringInstrC
Rescuing BoringInstrC from PuntInstr is future work.
This enables gdb to print something more useful than just the
raw integer when inspecting a Type. It also made the union macros
self-documenting and cleaned up the initialization in ir.cpp.
Removes all uses of Type::Home in the IR. Rather than moving
the local ids into SSATmps via DefConst, beefs up IRExtraData (adding
support for CSE-able extra data and pretty printing them), so we can
use a new LocalId extra data type for all these instructions. This
should save us a decent number of of ConstInstructions and reduce the
size of the IR, while also making accesses to local ids a little less
untyped.
In the case of a race to create a new class, we did
that, corrupting jemalloc's data structures in strange ways
(it didnt notice, even with a debug build, but bad things
happened later)
Avoid dynamic_cast<> on fast paths, use o_instanceof(StaticString)
instead. Saves about ~0.2-0.3% of CPU time.
Once HPHPc is gone, we will convert these calls to HHVM-specific API.
Thanks @bmaurer for discovery and @mwilliams for suggestion how to fix
it.
The old version read the first element of the array into $top,
checked some values on it, and then either shifted the first
element into $frame, or shifted it off and discarded it.
So its cheaper to just shift it directly into $top, and then
use $top rather than $frame.
If the profiler hook throws (and it used to not be able to),
we can decref a local twice. The generic return case is safe, but the
inline return case wasn't zeroing the type.
ThreadLocalSingleton had an s_key static property, and called
pthread_setspecific using it. In the USE_GCC_FAST_TLS case. Both
are unneeded, so remove them.
g++-4.7.1 treats "FOO"bar as a c++-11 literal operator, even
if bar is a macro with an expansion such as "BAR" - so add a space
after the quote (this seems like a bug, and I fixed a bunch of these
a while ago, but we just added a slew of PRI*64 macros which break
under 4.7.1).
Also, it warned that "explicit by-copy capture of 'this' redundant"
for a lambda declared [=, this] - so I removed the this.
We also needed more than the 60 levels of template expansion that was
allowed by the makefile.
Implements the SSwitch opcode.
This involved adding an instruction that needed a whole switch table
of compile-time data, so rather than encode this in SSATmps or adding
a new internal type this adds support for every instruction to have an
associated C++ structure of extra data. We should be able to move
Marker Label and ConstInstruction to use this, so this extra pointer
will be free since the vtable will be able to go away; we'll also be
able to save down this route by allocating fewer ConstInstructions for
compile-time data.
Array is already a SmartPtr, wrapping it in another is pointless.
Except it turns out that there are include dependencies preventing
us from using Array here. Use boost::intrusive_ptr to wrap an
ArrayData instead.
The parser depends on zend_html, which means it's impossible
to write parser unit tests without compiling all of hphp. This
separates the pieces that use runtime structures into ext_string, so
the rest can be moved to util/zend. There was a raise_warning call
indirectly from string_html_decode (in determine_charset), but the
runtime use of string_html_decode already checks that it is a
supported charset and otherwise throws NotImplementedException, so I
removed this. Another diff will do the actual move to util.
This diff refactors some of the VM's logic for iterators (with a focus on
mutable iteration), delivering several improvements:
1) MIterCtx was renamed to MArrayIter, and the m_key and m_val fields
were eliminated.
2) Eliminated the need for MArrayIter to dynamically allocate a
MutableArrayIter object, and removed other layers of indirection as
well.
3) Reduced the size of HPHP::VM::Iter from 64 bytes down to 32 bytes.
4) Removed the "if (siPastEnd())" check when adding a new element to an
HphpArray or a ZendArray.
5) Moved all of the iterator logic into a single .cpp file.
This diff reworks FullPos's to point to current element instead of pointing
to the next element. It also splits up the IterFree instruction into two
instructions (IterFree and MIterFree). These changes allowed various logic
to be simplified and data structures to be reduced in size. There is
definitely more opportunity for refactoring, but I know the JIT helpers for
iteration have been carefully tuned and so I'll leave further refactoring
for future diffs.
Finally, I spent a little time cleaning up the bytecode spec a bit, mostly
with respect to iteration.
Implements AKExists in HHIR. I've added a new IR opcode for it which
basically generates a call to helper method (which is very similar to
the tx64 helper). The major difference is putting the decrefs in the
IR rather than the helper.
We were calling Exception::__init__() from Instance's constructor.
If an exception is thrown (eg timeout), then we capture a backtrace
including $this pointers, unwind, and then pass the backtrace
to the error handler. But as c++ unwinds through Instance's
constructor, it calls ObjectData's destructor - so we end up
with a lame instance that doesnt work too well.
In addition, we were failing to correctly refCount the object
so we would leak its memory (and properties).
Every non-union type has been assigned a bit between 0 and
47. This includes Boxed*, PtrTo*, and PtrToBoxed* for all PHP-visible
types. There are static const members of Type for all of these, along
with all the union types that used to be in Type::Tag (Cell, Gen,
Uncounted, etc...). Arbitrary union types can now be easily composed
using Type::operator|. Switching on types and comparing using any
operator other than == are no longer allowed. Also gone is
Type::isRefCounted (since the exact needs of code using it can be
unclear), replaced by maybeCounted and notCounted.
Type::Tag still exists as a typedef for Type itself; I'm going to fix
up all of the useage sites in a separate diff to keep this more
manageable. I'm also planning on introducing operator< and operator<=
on Type as synonyms for structSubtypeOf and subtypeOf,
respectively. That will happen after this diff goes in, ensuring that
all old-style uses of those comparisons are dead.
When a callback throws an exception, we catch it, and then handle
the exception after the libcurl function that invoked the callback
returns.
But we weren't checking the exception after some libcurl calls that
could invoke callbacks, which leaked the exception object in RELEASE
builds, and asserted in DEBUG builds.
The memelim pass was being overly conservative by assuming that all
boxed locals can be modified by any instruction marked as
MayModifyRefs. However, we know that this cannot happen for locals
that have not escaped. This diff uses this fact to be less
conservative for functions other than pseudo-mains.
If a "surprise" happened during the prolog for a continuation,
the fixed-up $rbx was too low by the number of locals. This
resulted in the unwinder trying to unwind through garbage,
with unpredictable results
For "$obj->{__FUNCTION__}" and "$obj->{__CLASS__}", we currently don't
recognize __FUNCTION__ and __CLASS__ as "magic constants" and we just use
the literal strings "__FUNCTION__" and "__CLASS__" for the property names.
This is wrong. Same goes for object method calls (ex. "$obj->{..}(..)").
This diff fixes HipHop to recognize magic constants for properties names in
"$obj->{..}" expressions and method names in "$obj->{..}(..)" expressions.
There was an assert that the statically inferred type was at least
as good as the runtime type. The new test case shows clearly why
that cannot be the case (and we were tripping over this assert
in production).
When we skip EHEnts for catch blocks that we aren't going to
enter, we still need to count them in the handledCount so that we run
the appropriate fault funclet (see the unit test). Before this
change, it would fail to account for the catch handler and end up
running the fault funclet twice, leading to an assertion about
IterFree trying to free an already-freed iterator.
Add support for in-trace control flow to HHIR. To support
merging SSATmps, add support for 1+ sources to Jmp_ and
1+ destinations to DefLabel. (One must use Jmp_ to pass
values to a label).
Every pass is affected by this. The linearscan passes were
modified to visit blocks in reverse postorder (linear scan
should work even early blocks are not dominators of later
blocks).
Other passes had to be pessimized a bit because the dominator
assumption no longer holds. Either they need to do full
dataflow analysis, be changed to do preorder traversals
down the dominator tree, changed to work only on a block
at a time, or fully disabled if control flow is detected.
... Until we have an assert(m_curInst->isNative()) in cgCallHelper
and fix our flags. Without accurate flags, we can choose to spill
to an MMX register, then do a call anyway.
Per @mwilliams' suggestion, this is the first stage in a staggered approach to replacing int64 with int64_t. More precisely I inserted "typedef ::int64_t int64;" in util/base.h and dealt with the consequences.
Improved the way HHIR translates AGetC by generating a
defConst<Class> for the case where the loaded class is the same as or
a superclass of the context class. This eliminates an unnecessary
runtime class lookup (LdCls). Documented LdClsPropAddr.
were marked S(Gen) but can be anything, e.g. Gen*. Also, Mov needed to be unrestricted. Added an assert and todo in case Mov is used with types that use both registers.
The way type.cpp was handling return types wasn't quite
sufficient for multiple destinations and unfortunately added a step
that's easy to forget when you add new opcodes with the HasDest flag.
Also, assertions about operand types are too distributed through the
code. Pull everything into a central table, similar to hhbc.h. This
still leaves several opcodes without type signatures, since figuring
it out got a bit tedious for now. It did uncover a few documentation
bugs in ir.specification, though.
Since we now use the original throw location to figure out
which EHEnt applies when propagating from a fault funclet, we need to
check whether we've already unwound the ActRec before doing it again.
Doing it again could cause issues with nested FPI regions (hitting an
assert in sandboxes).
... is that once you put them into the codebase, the odds of them
showing up where you dont want them go up dramatically.
With a gcc 4.7.1 -O0 build I hit a consistent crash in production
where debugBacktrace had stepped too far, thinking that the
first non-vm frame was in fact a vm frame. It did so because
it looked at the word above the frame, and saw that it contained
c_Continuation::kMagic.
It turns out that fixupWork's isVMFrame left kMagic in $rdx if the
frame really was a generator ($rdx was dead, but still). We then
return from the tc via a serviceReq that doesnt need $rdx (so doesnt
set it), and then (with a following wind, so that $rdx hasnt been
smashed yet) enterTCHelper stores $rdx in info.args[1].
By another staggering coincidence, info.args[1] is at exactly the
right address to make the call into enterTCHelper /look/ like a
a continuation (based on kMagic). So if we then have a catch
which re-enters the TC there's a good chance kMagic is still there
and the next debugBacktrace (or uncaught exception) will crash.
This diff rewrites everything in terms of the C++ stack; we
now say that its a VM frame if its not on the C++ stack.
This kind of business obviously doesn't work on non-x64 platforms, so
let's abstract it out into platform-specific macros. I've also changed
some of the associated variable names, to try to break people of the
habit of only thinking about x64.
There are a few instances of this trick remaining, referencing rbx and
r15. These are a little more of a pain to abstract out because their
roles are defined by our internal ABI rather than an architecture's ABI,
so their ARM equivalents will have to be tied to our ARM register
allocation convention. These will have to wait a bit.
Summary: If we allow null characters in a regex, then a user could construct a string that ends with "/e\0" and surreptitiously execute an arbitrary piece of code. With this patch we fatal if a null is found anywhere before the end of the regex (determined by length()).
shorter
more consistent with other types (e.g. Obj is a pointer too)
makes some of them more consistent, e.g. FuncClassPtr should
actually be FuncPtrClassPtr in the old scheme
While tracking down gcc 4.7.1 issues I tried a -O1 build,
and got uninitialized variable warnings. None of them is
real, and at -O3 the compiler can figure out that the variables
are never actually used uninitialized.
Implemented Issets in HHIR. Added 2 new IR opcodes,
IsTypeMem<T> and IsNTypeMem<T>, that test the type of a given Cell* or
Gen*. We should be able to use them for IssetM as well. Fixed a bug in
simplifyIsType: We were not correctly folding the isType query when
the source was a static string. Removed a few unnecessary instructions
that cleared the top bits of a boolean register; booleans are
represented only in the lower byte of a register so its unnecessary to
clear the top bits.
In the interpreter, neither ContEnter nor ContExit checked the
surprise flags, resulting in the time being attributed to the
callers. In the jit, ContEnter checked the surprise flags, but
ContExit did not, resulting in what appeared to be highly
recursive profiles.
This change is mostly for FB internal organizational reasons.
Building is not effected beyond the fact that the target now
lands in hphp/hhvm/hhvm rather than src/hhvm/hhvm.
Instead of checking at every single callsite if a function is abstract, instead lets write out a method that just immediately fatals. I added an immediate to the Fatal opcode to say if the stack should be popped or not.
The hardest part of this diff was threading the `skipFrame` through everything. I ran into an ambiguous method issue and had to solve it with an Enum. I chose to create a new method instead of adding a boolean to `raise_errror` because I don't like boolean params.
Currently, IRInstruction has no id field that is stable between
passes, and the existing id field is used for transient analyzer
state. This adds a new IId (instruction id) id field and changes
memelim to use a side bitset for liveness.
When we resume after executing a fault handler via the Unwind
opcode, we were rethrowing as if the exception came from the PC
immediately following the protected region for that handler. If that
offset was protected by a non-overlapping user catch handler, user
catch handlers (or other fault handlers) could be invoked even though
the original exception didn't come from their region. This changes it
to track the number of nested handlers we've hit and rethrow from the
same PC that originally raised it.
The text returned for T_XHP_TAG_LT tokens was incorrect. This diff fixes
the lexer appropriately. I also touched up a few parts of the lexer where
"STEPPOS" was missing, and removed the hack for the T_FINALLY token as it
is no longer needed.
This is meant to help compare code quality between hhir and
tx64. If hhir successfully translates a tracelet, it will be
retranslated with tx64. If the code from hhir is larger than the code
from tx64, the bytecode and disassembly for both translations will be
logged. Only the hhir tracelet is put in the SrcDB, and the cursors of
a and astubs are rewound to reuse the space from the tx64 translation.
We want to produce a warning when someone tries to access a scalar as if it were an array. So, for example:
$x = 5;
$x[0]; // Bad.
This warning should be suppressed for list assignments where the RHS is false, though, because it's common to return FALSE in place of a list when there are no more values (e.g., mysql_fetch_array). So:
$x = FALSE;
$foo = $x[0]; // warning
list($foo) = $x; // no warning.
To support this behavior I've introduced a new member operation to the bytecode, EQ, which is exactly like EI but is quiet about warnings. List assignment is then translated using EQ.
Prohibit pcntl_exec() in multi-threaded mode, because otherwise file
handles related to repos can survive across exec with file locks held,
resulting in effective deadlock.
Cleaned up tracebuilder and simplifier. Moved simplifications
from tracebuilder to simplifier. Collected all the state tracking code
in one place in tracebuilder. Added runtime options flags to control
CSE and simplification during parsing, to disable parse-time
optimizations and to enable running them as a separate pass. Cleaned
up irfactory.
Implement InstanceOfD for HHIR, with most of the tx64
optimizations to it. Currently leaves branch fusion disabled since it
appears to be too aggressive in some grepping of DumpIR on a repo
sandbox.
This diff includes the basic outline for an IR vector
translator, as well as some minor changes to the rest of the system to
support it. Only a limited number of CGetMs are actually supported
right now; more complete support will come in diffs to follow.
We're creating a corrupt IRInstruction by saving a pointer to an
argument that's only valid for the lifetime of the ConstInstruction
constructor, rather than the lifetime of the ConstInstruction itself.
Relatively easy, and I didn't come across any suspicious activity while I was at
it.
There are two forms of CAS in the library: weak and strong. The difference is
that the weak version is allowed to spuriously fail; i.e. not do the CAS even
though the atomic variable has the expected value.
On x64 there is no difference in the underlying implementation, but on ARM there
is, because ARM has no atomic-CAS instruction. So the distinction now matters to
us. I've followed the guidance from the docs, which is: "When a
compare-and-exchange is in a loop, the weak version will yield better
performance on some platforms. When a weak compare-and-exchange would require a
loop and a strong one would not, the strong one is preferable." (The ARM
implementation of strong-CAS actually has a loop in it, is why.)
Instead of rolling our own atomic operations, we should use the C++11 ones. I'm
planning to get the other ones too (inc, dec, cas), but I'm doing them in
separate diffs to keep the diffs manageable-sized, and bisectable in case
something breaks.
The C++11 atomic ops let you specify the memory ordering semantics you want. Our
implementation of atomic_add was equivalent to memory_order_relaxed (i.e. no
memory barriers of any kind), so I stayed with that, for the most part, when
converting. I think this is OK: a lot of the usages are manipulating counters,
and aren't used to synchronize threads. The one exception is
PhpFile::m_refCount, for which I added acquire/release barriers as appropriate.
Also, ARM has a different definition of what's atomically accessible.
Once a PlainFile hit eof, we would never clear it - even though
we could be writing to it through a different file handle (or
another process could be writing to it).
This clears the eof flag on every read (it will immediately get
set again if the file didnt grow).
Before each write we would seek to the current position,
as maintained by the File itself. But another File could
be open on the same stream, and could have moved the
current position. Make the seeks relative to the stream's
own pointer.
This adds ext_asio extension that reimplements Preparer in C++. The
semantics are preserved as much as possible.
Known changes in semantics:
- cycle detection was improved: once cycle is detected, C++ preparer
does not fail the whole prep() call, but removes one edge of the cycle
and injects an exception
- context tracking was improved: C++ preparer maintains stack of entered
contexts and once context is exited, it is actively eliminated; as a
result, invariant is maintained that if wait handle A is blocked by
wait handle B, the context of B must be at least as specific as the
context of A (i.e. if you prep(A), all dependencies are guaranteed to
be imported into the context; if that context dies, all remaining wait
handles are moved to the parent context); this was not always the case
in a rare situations in PHP preparer
Implementation improvements:
- PHP preparer had only ResultToken (now StaticResultWaitHandle) and
AsyncContinuation (now ContinuationWaitHandle); a new types of wait
handles were introduced:
- GenArrayWaitHandle abstracts away the functionality of waiting for an
array of dependencies previously implemented as part of
AsyncContinuation
- SetResultToRefWaitHandle separates functionality of multiple result
destinations, previously part of AsyncContinuation
- StaticExceptionWaitHandle was introduced
- wait handles are now structured in a tree that implements various
abstract concepts (WaitableWaitHandle, BlockableWaitHandle, etc.),
useful for later introduction of wait handles that can wait for I/O
operations
Not implemented:
- Teak profiling is currently incompatible
- we plan to introduce better profiling support that will combine sync
and async stack traces
- before improved profiling is available, we plan to collect Teak
samples using PHP preparer
hphp used to parse the php files under src/system/classes, and
src/system/global.
- use systemlib.php instead of src/system/classes (when it exists).
- explicitly generate the contents of s_variables, since we already
had assertions about exactly what was supposed to be there
- build s_constants from g_class_map (this is ok, because when we're
building the system files, we bypass this code altogether).
Smith has been pointing out that most places where we do
exact type equality comparisons should probably go away. We'd like to
be able to have types like Obj(Foo*) as a subtype of Obj (if we have
traces with known instance types), and think of ConstInt(1) as a
subtype of Int. The intention is we'll start using isA() or
Type::subtypeOf() in most places that currently are using
Tag::operator==. Putting this up as its own diff (pieces taken from
pending diffs by @smith and @bsimmers) since I want it in InstanceOfD.
I think this isn't necessary now because we use GuardStk and
AssertStk to get type information into the stack. Unless I'm missing
something, I think we shouldn't be generating any cases where we spill
something back to right where we loaded it from anymore. (If it turns
out we are, we implement this then.)
Caching pointers to HPHP::VM::Func improved overall CPU time by about
2%, simplifying invokefunc() by another 1.5%.
Once asynchronous functions are natively implemented, we will be able to
eliminate >95% of these VM reenters.
It seems like it should exist for completness.
I needed this for adding a trait to the fb extension but I'm
breaking it out because I'm going a different direction with
that diff but it seems generally useful.
Used range-for in more places. A few places were accidentally
making copies of the list before iterating over them (value assignment).
Created an enum for DEAD, LIVE, etc.
build successfully on x64. Building hhvm on ARMv8 no longer trips
over these things. (It still doesn't complete.) I've manuall tested the inline
asm to get the stack pointer, and it works.
Move hphp/main.cpp to compiler/compiler.cpp, and rename
a few things to make it possible to link with hhvm.
Modify hhvm startup to run as hphp if
the first argument is --hphp.
While comparisons aren't very dynamically frequent, they're an
easy place to get started on implementing FP support. Like tx64, I haven't
bothered to teach the register allocator anything about the xmm regs; I
just move operands from gprs to xmm0 and xmm1, and evaluate the expression.
I was worried about interfering with spilling to these same registers, but
we spill to mmx registers, which alias the legacy floating point stack.
The XMM regs are physically distinct.
The new runtime option (Eval.EnableInstructionCounts)
combined with the new pseudo-counters instr_hhir and instr_tx64 will
let us track how many executed hhbc instructions were translated by
tx64 vs. hhir.
Modify hphpc so that if format contains "exe" it uses objcopy
to put the generated repo into the hhvm binary, in a section
named "repo".
Write an sqlite vfs wrapper that allows us to open a range
of bytes in a file as a read-only sqlite database (using the
filename to encode the range: file:<start>:<length>).
Use libelf at startup to see if there is a repo section, and
if so enable the vfs wrapper, and add appropriate
repo-authoritative options to the command line to use the
repo section as the repo.
Sometimes we have PhysRegs that are InvalidReg and try to use
them anyway. Right now, there's some cases where this ends up passing
reg::noreg to an emitC* function that treats that as a special
meaning, and you can emit machine code that didn't have the type of
memory operand you wanted. Assert here because PhysReg stuff is
constexpr (cannot assert there). Ideally it wouldn't implicitly
convert to Reg64 and this would all be impossible, but oh well.
It looks like these were implemented before simplifyCmp, so
it duplicated some of the abiltiy to simplify comparisons with certain
types of constant arguments. This code can be simpler and in the
simplifier.
This fixes a few problems when DCE is disabled, however some
tests still fail. Fixed:
* register allocator assigns registers to unused destination tmps
* fixed LiveOutRegs calculation with unused results
* removed assert from cgJmp_
* support immediate constants in cgMov
When using hphp to compile a repo, a syntax error in an input
file would cause an assertion to fail (DEBUG only).
When I tried to add a testcase, I found that we didnt quite support
this. We have a mode that ignores errors; but that also ignored crashes,
so I fixed test.mk to differentiate.
That revealed that hphp didnt pass through the result of invoking
hhvm, and that hhvm returned -1 for normal exits where the
php code had fataled.
When running the tests I found that it took over an hour to run
TestCodeRunRepoJit-Compilation, vs 4 minutes for the whole of
TestCodeRunRepoJit a couple of weeks ago (and my machines was
unusable for most of that hour). I dropped the parallelism
to "number of cores" from "number of cores * 2" and the time
dropped to a couple of seconds.
Replace ands with movzbl. Calling C++ for print_boolean was
wrong because it took an int64 but we don't know that the bool was
zero-extended yet. Added two boolean instructions to the assembler
(with unit tests) that I didn't ever use.
VirtualHost.*.overwrites.AllowedDirectories would replace the
default AllowedDirectories list, rather than of adding to it,
while Tiers.*.overwrites.AllowedDirectories would append to the
default AllowedDirectories list. This caused confusion, and
errors.
This makes both cases append, and also sorts and uniquifies the
list to make checking it O(log N) rather than O(N).
Introduce a runtime option to allow falling back to the
interpreter, rather than to tx64. Once this is faster, we know we're
ready to ditch tx64.
Cleanups along the way:
- Start macro-izing the runtime options. For tractability, I only did
the Eval* runtime options, but the same pattern should work for other
parts of the hdf namespace.
- const SrcKey& -> SrcKey in a lot of places.
- slightly iron out the handshaking between translate and
translateTracelet.
Added a label to LdClsCns so that we generate better code for
it if guardStackType refines its type.
Cleaned up some of the load instructions: Removed the NR suffix from
LdMemNR, LdPropNR, and LdRefNR. Removed unused offset parameter from
LdMem. Improved docmentation of load instructions and described
rationale for having labels on some of them.
Renamed CheckUninit to CheckInit to make it consistent with CheckType
(i.e., branch to label if type is not init).
Honor the MaxArrayChain runtime option by raising an error when
HphpArray probes get too long. Since HphpArray uses open hashing
with quadratic probing, we use double the configured threshold,
which is arbitrary, but based on testing, to reduce false positives.
Command line config options were applied after the Tiers
overwrites were applied - allowing the command line options
to take precedence over the Tiers overwrites. But they're applied
too late to affect which machines a tier overwrite applies to,
or to modify the values of Tiers.*.overwrites before they're
applied. So instead we apply them both before and after, allowing
you to set eg -vTier.devweb.machine=// to make your dev server
get the same settings as devweb, but still ensuring that command
line options take precedence over everything else.
The '--parse' option has never been supported for HHVM, and we silently
fail when this option is used. Let's at least give a more friendly 'not
supported' error message.
While looking at a task I discovered that Hdf::exists(const char*)
was broken in some cases, because it created the node, and then
tested to see if it existed.
I rewrote it to lookup the node without creating it.
Changes guard instructions to modify the stack by returning a
new StkPtr, so we don't have to load all guarded locations with
extendStack. This means SpillStackAllocAR doesn't ever have to do
anything differently from SpillStack (it only differs in a case where
we are spilling locations above an act rec that we loaded only as a
side-effect of guarding). After that, move the the responsibility of
logically popping the ActRec to the Call instruction (and account for
pushing ActRecs for NewObj in getStackValue).
In the immediate term, this lets us consolidate some
adjustments to rbx, but the motivation here is to make it easier to
omit spilling ActRecs for inlined functions. When we begin the
inlined function, we'll only have done a DefActRec in the FPush*,
instead of an AllocActRec, so if no SpillStack was needed before the
inlined function returns we can just pop it from m_evalStack. If an
exit trace in the inlined code needs to spill, its SpillStack will
spill the ActRec.
While trying to debug a g++4.7.1 crash I came across
some valgrind warnings.
One issue was that AsyncFunc was trying to construct
an Exception, which tried to build a backtrace before
things were properly setup. Its also pretty inefficient
to generate a backtrace for every AsyncFunc which is
never used (if an exception is thrown, the backtrace
is overwritten from the thrown exception anyway).
This changes it to use an Exception* instead.
A slow exit requests the code to be executed without HHIR (currently
via Tx64, and possibly the interpreter). These exits should be used
when a runtime condition determines that HHIR is currently incapable
of producing valid code for that tracelet.
Inspecting the code, I found out that a number of places were using
slow traces incorrectly, i.e. for cases where HHIR would produce a new
translation that correctly handles the runtime condition. So convert
these over to be "normal" exits.
This diff also does some cleanups, including removing unused LdThisNc
instruction, removing label of LdLoc, and reworking genLdLoc.
If we actually try to rematerialize a LdClsMethodCache, right
now you'll get a crash because cgLdClsMethodCache assumes the label is
non-null, but rematerialization nulls out labels. It seems to me that
we probably don't want to be in situations where rematerializing these
happens (since they usually should just be stored to an ActRec
location), so just remove the flag rather than making it support a
null label.
VarExport will output the double 1.0 as "1", but for
reflection we want the fact that it's a double to be preserved in the
output. Now the double 1.0 will be output as "1.0" in PHPOutput mode.
If you want tx64 tracing at the same time as DumpIR, right
now the output interleaves in a bit of mess. Send the dumps to the
trace facility if it's enabled.
Trying to debug sandboxes and I hit this. I think this fix
is actually correct, although it'd probably be better to teach the
register allocator to require that GuardType is the last use of its
source, and just always map the src value register as the destination.
rpc requests preserve state across requests, so we cant drop
the TargetCache just because the thread has been idle for a
few seconds. The result was that all classes, functions and
certain items of global state became "unloaded" while most
global state was preserved. The autoloader then did its best
to set things straight, but some things were beyond repair.
If the StringData ever gets put into a String or Variant
it will cause the (stack) memory to be put on the GarbageList,
leading to crashes later. This was exacerbated by not incReffing
the StringData (so the String/Variant didnt even have to outlive
the StringData for this to be a problem).
We probably couldnt hit this case prior to function autoloading,
but now its easy. Making an rpc request to a non-existent function
hits it reliably.
It turns out that invoke is normally entered via the version that takes
a CStrRef, so this should actually be an improvement on the old code
(which threw away the incoming StringData and created a new one on the stack).
This expands ArgDesc and cgCallHelper to fully support
passing TypedValues by value to helper functions, and changes all
relevant helpers to take TypedValues by value, instead of taking value
and type as two separate arguments. I expect that we might need to
teach cgCallHelper how to emit >6 arguments at some point, but that's
not needed yet.
Modify hphp to batch units together when writing to a repo, in order to
reduce transaction commit overhead. Do all writes on the main thread,
in order to avoid database lock contention.
Short-circuit FunctionScope::RecordFunctionInfo() unless in WholeProgram
mode. This removes a major point of lock contention.
DwarfInfo takes up a lot of memory in production,
and is only needed if we want to see PHP symbols in core
files. Disentangle it from perf-pid.map generation, which is useful
for profiling.
If you line up a tracelet *just* right, you can have a live
register for the stack slot that FCallBuiltin outputs. Since FCallBuiltin
works in-memory, this register will contain a stale value. This can
cause arbitrary badness; it was first sighted by Emir as a branch that
goes the wrong way in PHP.
Inside Unit::mergeImpl, a ReqDoc would create a new local
VarEnv if the current function didnt already have one. That was
wrong because the unit must have been required via either ReqSrc,
ReqMod or fb_autoload_map. In addition, it meant that we were
attaching a VarEnv to a function that might not be marked
AttrMayUseVV - with the result that we would leak the varenv.
We approximate the dataflow for try/catch by adding
edges from the start and end of the try block to the start
of every catch. But a top level return in the try block will
kill the edge from the end.
This diff explicitly adds edges from a return in a try block
to every catch. This is sub-optimal, and still misses a
hopefully unlikely case; I'll follow up with a bigger fix to
control flow.
ImmutableMap was doing 3 mallocs (one for the ImmutableMap
itself, one for the hash table, and one for the buckets). In
addition, it was pointlessly rounding up the number of
buckets to the next power of two. Reduce it to one malloc,
and only allocate the required number of buckets.
Similarly SharedVariant::VectorData was doing two mallocs. Reduce
to 1 by inlining the data.
Unit::defClass sometimes needs to set up a top-level ActRec
before calling Class::newClass, because newClass can
throw. Previously, the old ActRec would remain in place and the
VMExecutionContext would eventually get confused and try to execute
memory that didn't have valid bytecode in it.
Finishes moving value types from IRInstruction into SSATmp
(changing the other m_type uses to use m_typeParam), and allows any
instruction to have any number of srcs. Also consolidates creation
overloads for out different IRInstruction shapes into one helper, so
we need fewer constructors/wrappers. Adds an outputType() routine
that gives the return type of an instruction given its src types and
type parameter. (We may want to change outputType() and
assertOperandTypes() to be table-driven---I was initially expecting
more instructions would have logic there, but they are all really
simple.)
ContEnter is pretty straightforward after my change to use call/ret in tx64. The
change to ContExit is to generate slightly simpler machine code (now that I know
what DefSP actually does).
It looks like irtranslator copied translator-x64's headers. Its
concerns are a lot more cleanly separated, so let's exploit that. The
vestigial "log.{cpp,h}" were also sitting around taking up precious git
repo space.
traceInstruction was hardcoded to use stdout, via printf. Abstract that out.
Everything's still going to stdout, but we can now change that to any other
std::ostream if we want.
It's looking like having a stack of TraceBuilders in
HhbcTranslator might be nice for stuff I'm trying with inlining, but
rather than do that intertwined with those changes I separated it out.
Most of the lines changed here are m_tb becoming a pointer instead of
a reference. Also removed a little bit of cut&pasted things and
unneeded includes in irtranslator.cpp while waiting for things to
compile.
When comparing objects with strings, we can re-enter, and may
throw or run arbitrary user code. We shouldn't CSE or DCE any of
these cases, and we need to pessimize memelim and do a spillStack for
the exception case.
Problem occurs since there isn't a late run of simplification
to identify this, change it to a direct jump, and eliminate
any following IR.
This is a patch to materialize the condition in the flags reg
and then jcc. After the simplifier's extra pass is implemented,
this if stanza can be removed or become an assert(false);
If you LdClsCns and don't use the output, it doesn't get
assigned a register which breaks the check for uninit. To solve this,
split the uninit check into a separate, Essential instruction that
consumes the output of the LdClsCns. (Presumably we could leave out
this instruction in cases where we know the class is loaded, and the
LdClsCns could rightfully be optimized out in that case.)
When redeclaring functions, there was some line noise from an
assertion failure in the emitter and a crash in debug builds. Approximate
our behavior for other kinds of redeclaration. According to Zend, this
really is an include-time failure.
Register supported URI schemes mapping to internal
handlers for File::open(). If no scheme is specifid,
the file:// wrapper is used by default.
This diff also adds the file:// and zlib: wrappers which were
previously only invoked via default and compress.zlib://
respectively.
Adds a Type field to SSATmp, so we can store the type of a variable
separate from the type hint attached to an instruction, *and* so that
we can have instructions with more than one destination, each with
a distinct type.
For the time being, this field just copies IRInstruction::m_type and
tries to keep them both up-to-date. Future diffs need to decouple the
use cases for the two fields.
Gets rid of TypeInstruction, instead putting a new (possibly
Type::None) Type in the base class for the IsType instructions. This
should go away after things are changed so SSATmps have types instead
of instructions having types (needed for multiple-dsts). I think it'd
also be nice to remove the ExtendedInstruction vs IRInstruction
distinction (allow any instruction to have any number of srcs (and
eventually dests)); this is a small step toward that.
Modify the test_code_run program to specify a distinct repo for every
test, so that no repo contention occurs.
Remove the default limit of 20 parallel tests, and for hhvm increase the
default to four times the number of CPUs to improve throughput. hphp
uses a lot of memory when building, so leave the hphp default at the
number of CPUs.
If -v WholeProgram=false and --optimize-level=0 for hhbc compilation,
use the same file-level optimization configuration as is used by hhvm.
Update repo documentation.
ArrayData::hasInternalReference had an incorrect
short-circuit when ds was false; if it found a
Serializable object, it immediately returned false,
even though later events might have forced it to
return true.
The result is that arrays with internal references
that contain Serializable objects may not deserialize
correctly (the internal references can get split, for
example).
When there was a non-const null and a const bool, the
simplifier would continuously flip the arguments and recurse
infinitely. For some reason there was a check for const bools before
the check for null, although the optimizations we can do knowing the
rhs is null seem just as good as the optimizations knowing the rhs is
a const bool. (In fact after a couple more recursive calls I think it
ends up there.)
liboniguruma claims to initialize itself correctly, but
its init routing isnt thread safe (despite trying).
onig_init looks like:
if (inited) return;
pthread_mutex_init(...);
pthread_mutex_lock(...);
inited = true;
// perform initialization
pthread_mutex_unlock(...);
So
- if a second thread comes along after inited is set
to true, but before the initialization is complete, it will
read from tables that are being initialized.
- if a second thread comes along after the first one has called
pthread_mutex_init, but before it has set inited, it will
try to call pthread_mutex_init again, which is undefined
behavior
We can avoid all the issues by calling onig_init() at startup time.
addLineEntries modifies a std::vector which another thread
could be reading. The reader has a lock, but the caller of
addLineEntries was explicitly dropping the lock right before
calling it. Since the lock is needed again, just hold it to
the end of the function.
AddNewElemC is a new helper method for HphpArray that directly
implements the NewElemC opcode and avoids the need for a wrapper
when invoked from JIT compiled code.
The value is passed by value with no ref counting; semantically
the same as CVarRef or TypedValue*, but without the added indirection.
Its also worth noting that its semanticly the same as using VarNR for
the parameter type, but since we rely on simple structs being passed
in registers on x64, I wanted to avoid VarNR with its constructors, etc.
This modifies AttachString mode for string creation to immediately
copy the string to smart-malloc'd memory and free the input block.
Since we always copy malloc'd strings to smart-malloc'd memory,
and we require strings created with CopyMalloc to run their destructors,
the only kind of sweepable string is the IsShared (SharedVariant) kind;
this slightly simplifies sweeping strings. (fewer need sweeping, at
least).
My hacking around with non-deterministic finalization found
this segv lurking in our current build; if you destroy an array, that
destroys an object, that has a destructor, we didn't have a reentry
record for the call to arr_to_bool.
This fixes some bugs in our code that were exposed by
gcc-4.7.1 and fixes/disables some warnings.
In addition, it works around a bug in gcc-4.7.1
(apparently fixed in 4.7.2) that ends a temprary's
lifetime too soon.
Some minor comment tweaks to make them closer to truth; add
some missing header dependencies; return unique_ptr<Tracelet> and take
SrcKey by value; move stackFrameOffset to not be a member variable;
and remove some uses of Transl::curFunc from the IR.
Use helpers where we can. Introduced new helpers to aid fast-path/cold-path code that checks to see if a destructor will be called before doing a decref.
Remove some boilerplate. Also most functions (top-level IR
instructions) seem to be intended to have the semantics that emitting
no code returns null, while many helpers just returned the frontier.
This lead to some cases (e.g. cgLdStack, cgStMem, etc) having
different semantics from other instructions.
Indents the IR and assembler output more than the respective
previous representations (seems easier to understand than outdenting
things relative to the HHBC). Also fixes support for ostreams other
than std::cout and get rid of the php-source line numbers.
A previous diff made a couple changes to the text returned from
ReflectionFunction::getDefaultValueText() (it now returns
double-quoted strings and uppercase NULL constants). I forgot to
update TestHint which tests this.
The escaping used by outputPHP produces valid PHP, but it is
sometimes a string concatenation expression which can't be directly
substituted in as the default value of a function. This diff adds a
new mode to VariableSerializer that will always output a single string
literal for string values.
The LVariableTable is ignored by invoke_file in hhvm.
(There's an assert that it is null in our hhvm invoke stub now, but it
shouldn't be causing any issue in release.)
Earlier when trying to make libutil compile separately via
rules.mk I ran into issues where cassert was including our header
instead of the system assert.h. Might as well fix it: probably it's a
bad idea to name our own headers after stdlib headers.
We want to let people write code that looks like this:
function foo(array<int> $x) { ... }
or
function foo(array<string, string> $y) { ... }
There is of course no guarantee that $x contains only ints, or that $y contains string=>string. This is
no different from containers like Vectors, and the guarantees will come from the type checker.
The parser will let you do crazy things like foo(array<Foo, string> $x). This is somewthing we'll
catch in the type checker.
libcurl isnt exception safe. In a few places it saves a heap
pointer, replaces it with a pointer to a local, calls something
involving user callbacks, and then restores the original heap
pointer. If an exception is thrown, the original value doesnt get
restored, and when the curl handle is freed, it attempts to
free a pointer into the stack.
So we catch exceptions around the callbacks, return normally,
and then raise the exception before returning from the original
call.
Remove an obsolete StaticString initialization assertion related to
integer string constants. Integer string constants are now allocated as
StringData rather than StaticString, so they have no impact on
StaticString.
Replace the precomputed integer strings with an array of pointers to
static strings that are allocated via the generic
StringData::GetStaticString() mechanism. Reduce the range of
precomputed integer strings from [-128..65535] to [-128..3969]. Lazily
populate the pointer array unless running in server mode.
In a certain configuration assert(false) isn't enough to
avoid -Wreturn-type in release builds. (I think because we're not
really defining NDEBUG in release for the old Makefiles.)
Remove icu_get_checks(), and convert the related thread-locals from
NO_CHECK to standard checking implementations. This reduces startup
overhead for command line hhvm.
- Changed input & return types of f_strlen() to be CVarRef & Variant instead of CStrRef & int64.
* strlen($array) now prints a warning and returns null.
* strlen($obj) checks if $obj->__toString exists, if yes uses it, otherwise prints a warning and returns null.
- Updated x_strlen(), fni_strlen(), iopStrlen(), emitStrlen(), analyzeStrlen(), translateStrlen() accordingly
- Updated files idl/string.idl, system/string.inc and fbmysqllexer.cpp
- Added test cases
So says the doc: http://php.net/manual/en/function.hex2bin.php.
Currently, it fatals on malformed input.
Also add a new field for return type hints in idl files. This
can be used for type prediction for builtin functions that
return Variant.
Since we enable FCallBuiltin based on whether or not
JitEnableRenameFunction is enabled. We don't want a repo
built with JitEnableRenameFunction=false to be run with
JitEnableRenameFunction=true.
A test breaks without this. Also this adds an
hhvm specific implementation, to avoid going through
the pointless and expensive ClassInfo mechanism. It
also cleans up the implementation of defined.
Before, generator bodies were entered without prologues, meaning they didn't get
stack-size checks or surprise flag checks. This fixes that.
This lets us use an entry path that's much more similar to regular function
entry, including the use of the call/ret instructions, which should offset some
of the perf regression caused by the extra checking on entry.
Define wrappers for accessing size/contents on older versions
of libxml2 which match the function protos in 2.9.0 and later.
Thanks to github/nareshv for finding the issue and providing a fix.
The PHP master that calls into pagelet server currently needs to wait
indefinitely on the worker. This adds an optional timeout parameter so that the
PHP master can timeout after e.g. 10 seconds and log a descriptive message
rather than dieing with the generic message of "the request timed out after 30
seconds".
There was also a bug with the return code for partial flushed responses.
Sometimes the PHP master would receive a bogus return status like 65353. I
traced this down to the fact that the return code was never set for partial
responses, so it would be the initial value of `int rcode` on the stack.
Separates hhvm from hphpc. (This will mean we don't have to
compile hphpc when iterating on VM changes, and will help for eventual
hphpc-deprecation.) Details:
- Stubs for hphpc-externals symbols that can't yet be removed.
- src/system now includes g_system_class_map, which is essentially
what the hhvm class map contained (only system stuff, plus things
from constants.php).
- VM::ProcessInit was the only part of the runtime that depended on
libext_hhvm---breaks that dependency using a function pointer for
now so you can link hphp_runtime without linking ext_hhvm.
- Remove dlsym usage to access compiler symbols, using function
pointers for now too.
LdLoc and LdStack, when provided a label, also do the type check, and
that's how guards were being implemented. Although that has the nice
effect of hoisting the loads early in the trace, it's bad because the
loads effectively end up as part of the guards and they execute
redundantly when guards fail.
This diff splits the checks into their own instructions, but also
loads all the dependencies right after the guards to preserve the
hoisting effect.
Values produced by the tracelet which had their values inferred by
static analysis were not being passed to the IR. This was not only
bad for performance, but it was only causing correctness issues (see
new test, which used to fail in repo-authoritative mode).
fb_autoload_map($map, $root) specifies a mapping
from classes, functions and constants to the files
that define them. The map has the form:
array('class' => array('cls' => 'cls_file.php', ...),
'function' => array('fun' => 'fun_file.php', ...),
'constant' => array('con' => 'con_file.php', ...),
'failure' => callable);
If $root is non empty, it is prepended to every filename
(so will typically need to end with '/').
If the 'failure' element exists, it will be called if the
lookup in the map fails, or the file cant be included. It
takes a kind ('class', 'function' or 'constant') and the
name of the entity we're trying to autoload.
You can supply the autoload map to hphpc at compile time.
Currently, this is just used to augment hphpc's parseOnDemand
feature. It will use the map it to find more files to add to
the build based on class/function and constant names used.
This becomes necessary when all the requires are removed.
For RPCRequestHandler threads, the ExecutionContext can stay alive
across requests, and hold references to the VM stack. Make
sure we don't free the stack from under the ExecutionContext.
$a = 'foo';
$a = array($a => $x[$a = 'bar']);
should be equivalent to
$a = array('bar' => $x['bar']);
but hhvm was treating it as
$a = array('foo' => $x['bar']);
In production mode, some code transformations resulted in code similar to
the above example, which hhvm then mis-interpreted.
ir.h contained an outdated and unused declaration for
eleminateDeadCode. It has been removed.
opt.cpp has a bug whereby an instruction is inserted into a trace, but
the backedge (IRInstruction's m_parent) is not being correctly set to
the trace. This has been fixed.
Gets rid of a lot of warnings in the cmake build,
and allows us to set -Werror=format-security to catch
potentially broken calls. Mostly codemod changes.
Now that the admin server isn't enabled by default, we have
to turn it on in the config for TestServer. Also fixed up
test_server.cpp so individual tests can be run.
Change the CG_PUNT macro in codegen.cpp so that it prints its argument
instead of its function. This allows us to get much more precise reasons
for why codegen punted.
Add support for NewTuple. The new_tuple helper function needs
the count and values passed to it on the stack, which is like
a call instruction but unlike most other IR instructions.
This diff enables the dormant CGetM type prediction code.
This affects tracelet geometry, sometimes making the trace guards explode.
To compensate, I appear to need two new heuristics:
1. give up on type prediction if we're in a polymorphic trace
2. don't inline returns in polymorphic traces.
These two heuristics use different tracelet depths to kick in;
an exhaustive search of an 8x8 matrix of parameters suggests (2,6)
is the right setting for now.
Any comparisons that are simple enough to be done inline end
up getting converted to Eq. This makes for much nicer register
allocation around the call.
This saves 2 bytes for the first local and 4 bytes for each
local after that in the inline case. Converting the loop code to the
new style saves a byte on the storel (it used to store 8 bytes to get
both _count and m_type, now it stores 4 bytes to m_type).
Enhance IncRef/DecRef elimination by emitting a DecRefNZ when
dec-refing a value that is also stored in a local -- since a local
still contains a reference to the value, such DecRef can't get to zero.
Pleasantly straightforward. This implements translations for all the Cont*
instructions that appear inside actual generator bodies. I have to still do
CreateCont and ContEnter, but in a separate diff.
Array access of the form [int][str] seems like a narrow
case to optimize for; the helpers are tricky and we're
i-cache limited. This removes the special case.
We don't have many violations of this, so it seemed easy
enough to fix them and turn it on. It looks like it caught one bug in
Func::mustBeRef(), but everything else looked right.
Move Local objects from being arena-allocated to values
in-place in the ConstInstruction. Some minor const-correctness stuff.
Also, the number of locals was being incorrectly counted and passed to
linearscan (which didn't really use the count for anything except an
initial vector size).
These don't need to be virtual. While doing that, changed to
a shared implementation (may make adding/removing bits from the
instruction table marginally easier). Also moved documentation from
the .cpp to the header.
Moves SSATmp spill information into a separate API from
allocated registers, with a bit indicating whether we're spilled.
This lets us drop the encoding of spilled vs. mmx vs. reg as integer
ranges, so we can move away from RegNumber everywhere (followup diff).
Also separates the concept of getNumRegs into numAllocatedRegs and
numNeededRegs, with some documentation on it and removes all the
LinearScan:: constants that duplicate abi-x64.h.
It feels like premature space optimization to give this a
generic name. If we need to chunk space down later, we can use unions
internally while still preserving a public interface to SSATmp that
has real identifiers.
Right now, assigned registers and spill locations are stored
on SSATmp with some ranges of integers indicating what means what.
I'm going to separate those, and the name getReg() will make more
sense than getAssignedLoc after this. (The only time getAssignedLoc
can really return spill locations instead of regs in codegen appears
to be cgSpill and cgRestore.)
Puts RegSet into physreg.h so we don't really need to include
regalloc. Also uses PhysRegSaver to save regs instead of new code for
it. (RegSet/PhysRegSaver/etc should probably move to a util/x64 soon
after this ...) Ideally PhysReg can go away some day (after old-style
assembler apis are unused, we should probably just use Reg64 or a sum
type of Reg64 + RegXMM).
I think we probably ideally want the only real public part of
this class to be the non-member function assignRegsForTrace. We'll
need a PhysReg-like abstraction hanging off of SSATmp to encapsulate
all the rest.
FrameInjectObjectMethod can free an object without
first setting m_object to NULL. If the object destructor
eventually calls debug_backtrace(), it would return the
freed object as part of the backtrace.
This diff fixes it so that FrameInjection NULLs out an object
before releasing it.
Setting HHVM_ATT_DISAS before running hhvm or tc-print will
result in any disassembled instructions being printed with AT&T
syntax. The default is still Intel.
These are the only instructions that take two HA immediates
and the emitter is not set up to handle that. In the IterInit* case,
it was causing failed ASSERTs in the translator due to misnumbered
metainfo.
The dedicated function for HphpArray doesn't pay for itself
so replace it with a call to ArrayData::appendWithRef instead.
If this path were hot enough, a dedicated HphpArray constructor
would be best.
When calling a stub that is also inside a/astubs, we
shouldn't need to load the address into r10 first. Also modifies
Stats::emitInc to preserve rScratch. This should fix TRACE=stats:1.
We don't need this anymore because there are no longer
extensions that add smart allocator types. This also makes it
possible to make -C src/compiler again.
The parser only needed string_md5 from zend, which also
requires a few other things. I pulled a few related zend functions to
util/zend (ideally we'd pull everything that doesn't use runtime/, but
no need to get carried away). After this (and the other diffs), the
only dependency from util to runtime is compiler_id.h.
StackTrace (and Logger via it) only depended on the world
because of the seg fault handler code importing execution_context.
Move this to runtime/base (it seems more logical to not have it part
of the StackTrace lib anyway).
It would probably be cleaner if most of the drop-stack
machinery was moved into a policy, but for now this just moves the
part that is VM-specific out so we don't need to depend on bytecode.h.
Only hooks up the servery ones to do a drop on the VM stack (I don't
think the workers in the compiler should need this?).
It looks like the main pain if we want a libutil.a is Logger,
which depends on StackTrace, which depends on execution context, which
depends on everything. JobQueue also needs to call VM::Stack::flush
right now (separate diff), and the parser probably would need to be
pulled out of util/. This diff just does the remaining easier things:
- put TypedValue::pretty outside of trace.cpp
- ringbuffer/pathtrack didn't really need to include asm-x64.h
- move stat_cache (uses VM::Transl and some other things) to runtime/base
- move hardware_counter (VM::Transl and complex_types.h) to runtime/base
HphpArray::nvAppend is semanticly equivalent to ArrayData::append.
All call sites but one were passing copy=false with a guaranteed HphpArray,
except one, which is cleaner as a call to ArrayData::append anyway.
Firstly, the ArrayData add() method is only legal when the key
does not exist, so we can use the fast findForNewInsert method
when adding strings to an array. (We already assert on dups).
Second, if we assume most find() operations on strings (for
get access) find the string on the first probe, then it makes
sense to do the string-equal check first, before the empty-pos
check.
- We know the type of local 0 for all Cont* instructions inside generator
bodies: it's an object. We don't need to guard on it. I removed the typehint
(normally added by the parser) because it results in a useless VerifyParamType
(which results in a tracelet guard). Theoretically we should be able to derive
the information I added in MetaInfo from the typehint, but that's for another
day.
- PackCont and UnpackCont don't touch locals anymore.
When Translator::analyze runs with the IR enabled, it skips
most optimizations. This is what we want to keep the instruction
stream clean for the IR, but if it has to pass the tracelet back to
TranslatorX64 for translation, the resulting code will be missing all
the optimizations that analyze would've done. Now we re-analyze the
tracelet with m_useHHIR == false if the IR translation fails.
Use guard relaxation for CGetL, RetC, and RetV.
This diff also changes the DataType numbers to give one bit for each
type (with the exception of KindOfUninit, which is still 0). This
enables a more efficient check in some cases, such as "uncounted but
not uninit", which is used for CGetL.
Changed LdRaw/StRaw to take an enum specifying the field accessed
by the Ld/St instead of an offset. Introduced a new class, RawMemSlot that
specifies the type, size, and offset of each field accessed by LdRaw/StRaw.
Fixed a missing evaluation stack bug in Strlen. Fixed bug in encoding of int3
in asm-x64.h. Fixed bug in rematerialization of LdLoc.
A number of builtins needs to access the ActRec of the calling
function in order to get context information (this, context
class, etc). Currently they do so by explicitly skipping over
the builtin's ActRec. However, when calling builtins with
FCallBuiltin, there is no builtin ActRec, and these builtins
must be marked with the "NeedsActRec" flag.
This diff eliminates the explicit skip step, and changes it so
that the innermost ActRec is automatically skipped if it is
the builtin's ActRec. With this change, fewer builtin's need to
have the NeedsActRec flag.
The sweep lists for Sweepable and StringData are circularly linked
lists, with the head pointing to an arbitrary node if the list is
non-empty, or null if its empty. By changing the head to a node
instead of a pointer, we can remove the branches from the insert/remove
code since they don't have to check for NULL.
While tracking down other issues, I hit a bogus
ASSERT in sendImpl.
I also noticed some fragile code, where it would have
been possible to write "HphpArray*arr; ArrayIter(arr);"
expecting to hit the ArrayIter(const ArrayData*) constructor,
but actually hitting the const HphpArray* constructor.
The latter is a special one that skips refcounting, and skips
null checks - so force the user choose it explicitly.
Also make another special constructor more self documenting.
Aravind noticed that the logic here was broken; it didn't
check enough locals when counting how many were ref-counted. (We'll
probably want to revisit the value for kMaxInlineReturnDecRefs after
this; I'll start perflabs for a few values.)
If 86sinit or 86pinit failed (eg the reference unknown class
constants), and there's a user error handler, local
NameValueTableWrappers could be leaked into the backtrace,
resulting in use-after-free errors and crashes.
This fixes the issue by skipping these functions in the
backtrace (which is desirable anyway - they're just an
artifact of our current implementation).
We were generating calls to sanity-checking helpers at the beginning
of each tracelet, and also at the end of tracelets ending with a Ret*.
That's not good for performance... so only do it for debug builds. :-)
The debugger keeps a map of DebuggerProxy's in Debugger::s_debugger.
When a DebuggerProxy is destroyed, it has to wait for the
corresponding thread to finish its work, and then exit. As things
stood, this didnt happen until s_debugger's destructor was called.
But that was way too late - lots of things (such as empty_array)
have been destroyed, and the thread would typically crash.
Clearing the map on Debugger::Stop shuts them down at the right
time, and things go much more smoothly.
This is similar to what I did for integer switches: when
every case in a switch is a literal string, the new SSWitch
instruction is used instead of a series of if/elses. At translation
time, if the input is a string and none of the cases are numeric
strings, we can turn it into a hashtable lookup. Otherwise we simulate
the code that would've been emitted by comparing the subject against
each case in the order they appeared in the source.
Adds some new syntax for using the assembler, to try to make
it look a little more like an x64 DSEL. Backward compatible with the
existing APIs, and stops short of any sort of massive conversions (we
can do this if it proves to be nice to use in new translations).
Converts a few small pieces of translator code to make sure it worked
and to experiment with using it.
We'd like to use operator* on registers to indicate register
indirect addressing in the assembler, so it's a little confusing if it
is also used to dereference scratch registers. This uses r(reg) as
the syntax to get at the underlying PhysReg.
Check the beginning of traces for a contiguous sequence of
Marker, DefFP, DefSP, and either LdLoc or LdStack with a label.
Record in the LdLoc and LdStack instructions that these
should branch directly to the anchor translation.
This is almost exactly what is needed to enable the
same optimization for slow exits inside the trace.
Code cleanup to reduce use of the TV_DUP* macros, plus a few
other minor cleanups to use available helper functions.
This also fixes comments about not modifying _count.
GlobalArrayWrapper and NameValueTableWrapper convert numeric
keys to Strings. But due to changes in the interface, the
cast to Variant now results in a recursive call back to the
int version. Cast directly to String instead.
Using a malloc'd StringBuffer leads to memory corruption because
of StringBuffer's internal smart-alloc'd StringData. So use std::string
instead and copy to a String on demand.
Implement a new PHP function called icu_match that is loosely
modeled on the semantics of preg_match that wraps ICU library's
regex compiler and matcher. Also steal some pattern parseing
code from f_preg_match(). Function returns 0 (no match),
1 (match) or false (error).
The prediction stress code was producing some impossible traces:
C* instructions that left a Ref on the stack, and KindOfUninit right
hand sides. This could cause rare downstream failures.
If fetching an apc primed key fails, raise a notice, return
false, and dont store the returned value (so that we'll
continue to raise the notice on every fetch).
If the function entry had not been translated,
and we couldnt translate it (because the write lease
was held by another thread) we would incorrectly
setup the pc to point to the function body, which
would skip the default initializers.
Replace perfectly reasonable macros with some templated
stuff. (Makes it so we can easily add support for more return types
and supports double arguments (but that doesn't matter since the JIT
doesn't know how still).)
Edwin and the Internet both agree: cvtsi2sd is considered
harmful without doing something to break the dependency on the target
register's high-order bits.
The array was being modified in place, while calling the
user sort functions, with the result that the user sort
function saw garbage. As a temporary fix, always duplicate
the input array, sort that, and then update the original
array reference.
A better fix would be to only do this for
the user sort cases. Even better would be to sort an array
of indices, and then update the array itself after the
user functions have been called. (task #1910931)
In addition, the new test case I wrote revealed a bug in
the translator's type inference. It failed to take account
of OpBindM's effect on its base. Also fixed here.
There was code that assumed it could call foreach
more than once on a continuation (as long as the
continuation wasnt "done"). This was broken by
recent changes to Continuation.
Optimize builtin calls so that they can be invoked directly
without going through fg_wrappers. Adds a new opcode FCallBuiltin
that does not require an ActRec for builtin calls.
Add a new flag "NeedsActRec" to mark idl functions that require
an ActRec.
The cleanup code assumed that the number of parameters
reported in the actrec was the number of parameters
pushed on the stack. But some of them could be in
extraArgs.
PREP_VAL tries to allocate a specific register (since it
will be used as a call argument). But there's no guarantee
the register is free. Clean/smash is not the right thing to
do, because it could be that one of the /other/ call arguments
is using that register.
So if the desired target register is not free, just use
any register, and let the usual argument shuffling take
care of getting it into the right place
1. Added Transport::getRequestSize() to expose the http request size. For LibEventTransport,
the size is computed based on the headers because libevent hides the data structure needed
to directly get the size (in evhttp_connection, not exposed).
2. Exposed the previous new API as a php function.
"yield break" is setting the current() value of generator to null, but
end of generator execution, or thrown exception do not. Let's be
consistent and set the value to null everytime a generator is closed.
This also makes generators more consistent with Zend's behavior:
https://wiki.php.net/rfc/generators
"current: Returns whatever was passed to yield or null if nothing was
passed or the generator is already closed."
IIRC we had this behavior for end of generator execution a few months
ago.
A few callsites of nvGetKey were not recounting correctly,
which causes string keys to be leaked.
Added a few new helpers that we can start using elsewhere too.
We've been running with StrictCollections=true in our configs for a while
without problems. Let's make it true by default, and then in the near
future we can remove this runtime option.
TimeoutThread was never notified when a worker thread
exited. This was causing a crash during shutdown when it tried to set
the timed out flag of a deleted thread (in hhvm the surprise flags
live in the targetcache, which is unmapped during thread shutdown).
We were not enforcing that traits from base classes are imported
before importing traits in their derived classes. This could result
in a derived class with an abstract trait method not "seeing" a
concrete definition of this method if the parent method got that
definition from a trait as well. This would result in a "pure
virtual" fatal being thrown.
Thread building blocks issues -- concurrent map only uses spin locks
and calls sched_yield to pause, which just doesn't work well. If there
is any contention, it causes the system to spin like crazy.
This type of contention seems to be happening at startup when many
people are trying to use the same variables from APC. Based on profiling
I think that people are trying to read data structures that are very
array heavy, and in the handleUpdate spend a lot of time freeing it, all
while holding a writer lock.
This diff introduces a lock when you first read the piece of data from
disk. Because you hold a reader lock with the const_accessor, there's
no risk of the datastructure itself being freed. One thread is allowed
to read the object.
This also introduces a SmallLock datastructure which implements a mutex
in the space of a single int. As noted in the comments, this could be
made smaller. It could also be extended to other small locking structures
(eg, a pointer that could also serve as a lock).
A few minor tweaks. Changes things so the compiler will
inline the fast path of getTranslation, only stores service request
args when there is a non-REQ_EXIT request, and removes the resume()
virtual since we're using TranslatorX64* in bytecode.cpp anyway.
I noticed this in je's command-line script and in histograms
of interp'ed instructions from www. A lot of the time (double) is just
to make sure on something that was already a double.
Removes more branches from common cases in the
frameFreeLocals helper by calling to different offsets in a
generic-decref slide. Also inlines the decref and destructor calls
instead of calling out to tvDecRefHelper. To support this it also
always emits the varenv check (for AttrMayUseVV) and decrefs
ActRec::m_this in translateRetC translation. For now this removes the
limit on total number of locals to do a specialized return (it leaves
the limit on number of reference counted locals).
By allowing avoiding frames and using a custom calling
convention, combines the two generic dtor stubs and fits them on one
cache line. Also inlines the call-table lookup for release helpers
instead of jumping to a C++ stub that does it.
"yield" and "yield break" are fundamentally different statements. Separate
their parsing paths and don't generate unused goto labels for "yield
break".
Parser's m_generators logic was updated to cope with zero labels case
that is now possible. A test case was added to cover this case.
Our attempts to handle reffy left-hand-sides for *= and friends
have always been broken. I noticed this while debugging an assertion
failure caused by misparenthesizing a check in analyzeSetOpL.
These operators produce unpredictable results. However, in the
vast majority of correct programs they produce a numeric type.
There's some uncertainty for /, so favor Double. We could
ultimately profile-drive this if we want more precision.
As part of the migration to making HHVM the default, all
builds must now explicitly state which version of HipHop
they would like to use.
For HHVM: export USE_HHVM=1
For HPHPc: export USE_HPHPC=1
Setting both (or neither) will result in an error message
from cmake.
As a macro, this one has teeth; it can mutate the 'fr' parameter,
so I rewrote it as an inline function. Also cleaned up a hack in
instance.cpp that was committed before we removed the _count = 0
from TV_DUP_FLATTEN_VARS.
There was one place where we didnt check
canUseDummyPseudoMain(), resulting in a
possibility of compilation errors. It was
pretty unlikely, because usually, such
requires have already been removed.
Use the hardware FPU for double operations. Instead of teaching
the register allocator about FPU regs, we just commandeer xmm0 and xmm1
temporarily to evaluate a single term. Doubles continue to live in GPRs
at instruction boundaries. E.g., to subtract two temporaries in $rax
and $rcx:
mov %rax, %xmm0 ;; inout
mov %rcx, %xmm1 ;; in
subsd %xmm0, %xmm1
mov %xmm1, %rax ;; out
If a block ended exactly at the end of a page, and then was
smart_realloc'd to a larger block, and the next page wasn't
mapped, you get a segfault. the extra 8 bytes we copy aren't
initialized anyway, so if we don't meet these three conditions
then the overrun is harmless.
If realloc's nbytes paramter was smaller than the existing block,
the bug doesn't happen because we clamp the copied size to the
passed-in nbytes parameter. So no write-overrun can occur.
Makes two flags that determine when we inline returns. We
must have more than kFewLocals to use a generic return, and may not
have more than kMaxInlineReturnDecRefs.
array_getm_s/s0/s_fast/s0_fast all just pass some constant flags to
array_getm_impl(), so bypass them and call array_getm_impl directly
from JIT code. Also, rename to array_getm_s for symmetry with
array_getm_i.
Fix Unit::defClass() ActRec initalization to correctly set m_soff. This
fixes a crashing bug in the unwinder in the case where a fatal is thrown
during class validation.
If the key is a small integer, try to avoid pulling the m_hash into cache. Guess that we're a vector-like array, and try skipping straight to m_data[key].
As part of this, I've knee-capped the (always dubious, imho) inlined array getm code. Since we're coming out ahead, I claim this worthwhile. Edwin, if this completely overlaps with your vector work I'll happily abandon.
The stats build was broken again because of a flag smash
in stats. This has been a recurrent problem; let's just save/restore
flags around the add unconditionally.
It was matching zend behavior, where any attempt to
redeclare a class is a fatal. But under hhvm we allow
the same PreClass to be declared multiple times. Now
hphpc matches hhvm.
Also, properly escape the class name in the error
message.
Added cgOpCmpHelper function, which amalgamates the logic from
all of the eight different comparison operators. It dispatches control
for specific comparisons via function arguments, calling into
runtime/base/comparisons.h.
This diff also brought to light some bugs and needs-for-improvement
inside of simplifier.cpp. These are included in the diff.
Sometimes we have m_savedRip set to something other than
enterTCHelper. In these cases, we're unbalancing the return stack.
Change the rets from the stubs to use indirect jumps so we only
mispredict once instead of all the way out the stack.
Split iter_value_cell_array into hot and cold paths. The hot path is
now a leaf function that handles the common HphpArray case and tail calls the
cold path as a separate function.
Stack depth computation had a number of issues. This diff introduces a new
way to compute the stack depth:
- Added a new structure, StackDepth. This structure is linked to a block of
instructions (usually starting at a label), and tracks the current stack depth
in this block. This tracking can take two forms:
* absolute depth: the depth of the stack is exactly known for this block
* relative depth: the depth of the stack is unknown for now. We keep track
of an offset, relative to the depth of the stack at the first instruction
of the block
- Each Label structure contains a StackDepth structure
- An additional StackDepth structure is created at the beginning of each
function, with an absolute depth of 0.
During the parsing process, when a Jmp instruction is encountered, the
StackDepth structure for this jump becomes linked to the StackDepth structure
of the label. The absolute depth at the label can then be inferred from the
absolute depth at the jump.
Besides being able to track precisely the stack depth, these changes also
give us the following information:
- detection of the unreachable parts of the code
- detection of some bugs (for example when two Jmps with different stack
depths target the same label)
In hphpc, we just didnt bother to check; as with many
fatals, the idea was that its faster not to check, and
the code should have been tested in hhvm anyway.
In hhvm, we tried to get the rules right. But the rules
for hoisted classes are very complex with lots of edge
cases; typically you have to try to define each class twice,
ignoring fatals the first time. We were bitten by a case where
the first pass correctly didnt fatal, and the second pass
was omitted as an optimization.
To make matters worse, in the cases where hhvm didnt
fatal, it ignored later definitions of a class, while
hphpc always retained the last definition.
enterTC in a release build was still making function calls
related to gremlins and tls accesses for DepthGuard. This separates
the main code path from the handling of translation-time service
requests, and makes some of the debug-only code actually get compiled
out.
Optimized iter_next_array by splitting it into a hot part and
a cold part. gcc now generates really nice code for it. The hot part is a
leaf function that uses only caller-saved registers, so there are no frame
creation or register save overheads. The cold part is a tail function call
invoked via a jump.
Replaces frame_free_locals with an asm stub. This involves a
change to the order that we destroy locals, and destroys m_this before
destroying locals. Also changes it so we still inline returns for
functions with AttrMayUseVV, so all calls to this helper have more
than kFewLocals to free, allowing a little less branching.
This diff shifts DataType values to enable a faster check for
"KindOfString or KindOfStaticString". The old sequence was a
"mov + and + cmp", which is replaced by a single "test".
We need a better way to do instanceof checks from C++ code,
without doing hacky testing of the vtable pointer value.
This patch also moves a few other fields so ZendArray,
HphpArray, and VectorArray stay packed as well as before.
When a closure is stored in a property (ex. "$o->p = function(){..};"),
there is no way to call the closure directly. Currently, if the user wants
to invoke a closure stored in a property, an intermediate local variable
must be used like so: "$cl = $o->p; $cl();".
This diff tweaks HipHop's grammar to be a little more flexible so that it
is possible to directly invoke a closure stored in an object property. If
a closure stored inside "$o->p", the closure can be invoked like so:
"($o->p)();". The parentheses wrapped around "$o->p" lets the parser know
that we are not calling a method named "p", but rather we are invoking a
closure stored in a property named "p".
This diff also introduces similar syntax for invoking a closure stored in
a static property ("(C::$x)()") or in an array inside a static property
("(C::$x[0])()").
They're causing problems with some www revisions. Unfortunately I
couldn't get the problems to go away with a more precise hammer than
this, so hopefully we'll have a real fix soon.
Right now we jump to a register-specialized astubs block that
just jumps back to another stub that pushes a ton of regs and then
calls a destructor. Since we're already specializing, we might as
well specialize the push code to be specific to the register allocator
state. Still calls the unary stubs in the case that emitDecRef is
happening on astubs.
Put the stackoverflow helper at the end of the astubs helpers
(it should be rare), align callToExit, remove some dead typechecks,
and remove an unused tv_helper.h function.
Adds support for loads and stores. Reg-to-reg moves should
be easy on top of this but aren't added yet. Also fixes bugs when you
use rbp or r13 with cmp_imm8_disp_reg8, and adds assertions about rsp
and r12 (which it also doesn't handle correctly).
My big diff changed the way getOrigFuncName behaves for non-static
method continuations. It should return the classname of the continuation's
$this, not the classname where the method is actually defined.
When we construct an object of a persistent class, all sorts
of things are known at translation time. This diff takes advantage of
most of those. I'm sure there are a few more tweaks I can do but this
is a good stopping point for one reasonably sized diff.
Currently if HHVM is built outside of git, the COMPILER_ID
and REPO_SCHEMA values will effectively be constants which
means that schema changes won't be picked up properly.
This diff works around the absence of git by using the system
time (nanosecond scale) to make a "random enough" number.
This returns the current number of active requests, the
number of queued requests, and (for hhvm) the size of the translation
cache and target cache, as json.
I just broke this with my foreach-reset fix. This change may seem hacky but it
makes semantic sense to me: it's OK to "rewind" a continuation to put it at the
beginning (i.e. run it till the first yield if it's not been started) but not at
other times.
foreach is supposed to reset the iterator at the beginning, and the way we
transformed foreach inside generators wasn't doing that.
This actually generates an "impossible" parse tree but the rest of the system
doesn't seem to mind:
($__foreach__1 = hphp_get_iterator($f))->rewind()
PHP maps hyphens to underscores when populating the SERVER array,
which allows a malicious user to spoof headers which might otherwise
be stopped by a proxy. This patch adds detection, and allows the user
to specify a rate limit, Log.HeaderMangle.
After an optimization to deal with known-to-be-constant
conditionals, the emitters virtual stack could become
unbalanced due to emitting code for unreachable expressions.
This fixes it, by skipping the unreachable code.
Use the hardware's call/ret instructions for FCall and RetC.
Since we don't want the C++ stack unbalanced for long stretches,
we call, then pop the return value and store it into the ActRec
in the call destination. RetC loads the return address, pushes,
and ret's.
Dont pad one-time service requests
Often only the immediate needs to be padded in a Jcc, so dont force
the entire Jcc to be smashable.
For a jcc+jmp, we only need the two to be smashable individually.
Except for a TypedValue->CVarRef cast, ArrayData::nvSet() and set()
are identical, so do the cast inline and eliminate two more virtual methods.
All of nvGetCell's callers except one passed in error=true, and
converted a NULL result to KindOfNull. So refactor the outlier
to use nvGet(), then streamline nvGetCell by returning init_null_variant
if the key does not exist.
visitIfCondition visits a conditional expression in
such a way as to avoid creating unnecessary temporaries,
and collapsing logical nots as it goes.
But we weren't using it for the conditions in loops.
This fixes that, and also expands it to recognize constant
conditions and emit constant branches for them.
This doesn't seem to have been kept up to date as new output types were added.
Don't use default in switch statement so that it won't compile if people
forget to update this again.
During warmup requests, the interpreter keeps track of the most
commonly used names for InstanceOfD and VerifyParamType. The first time one of
these instructions is translated, this information is used to assign each of
the top 127 of these names a bit in Class::m_instanceBits. This allows for
very efficient instance checking for classes and interfaces: if the
target name is in this new map, we can skip all the usual drama and just check
a bit in the object's Class struct. Since the bits are keyed on names and not
Class*s, they can be used in sandbox mode and for non-unique classes.
Negative field ids in compact protocol were decoded as unsigned
shorts, then converted to int64. This stopped them from correctly
sign-extending.
This changes decodes them as signed shorts so they sign extend on
conversion to int64.
Various attempts to inline a fast path for the MethodCache
into the TC looked like they cut instructions but always lost on
icache misses. This change makes the fast path in the C++ helper
execute fewer instructions, and seems to not hurt icache.
I profiled our JccBlocks to see which ones are taken often
(see the attached task). If a JccBlock is taken the majority of the
time it's probably going to be causing a bunch of branch
mispredictions, so I've reorganized the lopsided ones to make the
forward jcc the uncommon case.
AutoloadHandler::getSignature() uses the callback
name, which for closures is just Closure::__invoke
(or Classname::__invoke for other invokable classes).
Follow the array($object,$method) pattern of appending
the address pointer to the signature to differentiate it.
Create inline wrappers for the litstr, CStrRef, and CVarRef
ArrayData apis, and remove them from the implementation classes.
Now each method only needs int64 and StringData* overloads.
Added a few StaticString definitions to avoid using litstr methods,
in operators.cpp, ext_error.cpp.
Fixed a bug in the compiler to convert literal bool keys to int,
rather than relying on HphpArray doing it internally.
Later patches will clean up callsites that already have StringData*
so they don't have to use StrNR or other such wrappers.
It's the second most-called builtin in production (after idx()) and the common
case of a string argument is incredibly simple to implement (pull the m_len
field out of the arg, decref) so let's jit it.
Product uses v1 to store hitdata and docdata in indexes,
but ##thrift_protocol_read_compact## currently supports only v2.
This diff adds reading and writing support for
compact protocol v1.
We would like function foo(@int $x) to be equivalent to not having
any type information (i.e. equivalent to function foo($x)).
For now we are going to silently drop the type. At some point in the future,
we'll keep the type information around and log mismatches (but unlike a normal
type hint, we don't want to fatal in case of mismatch).
This kind of silent type is going to be useful to automatically migrate code
from php to hack.
Although the code was already using 32 bit moves when the
TCA's/Func's and Class's were in low memory, it can instead
do store immediates for those cases.
Also, for directly bound builtins, we were incrementing rbx
twice. Combine the two increments.
HPHPc can often infer function return types, although those types might
not be 100% accurate because values can also be Null. This diff stores
those types in the bytecode meta-data and uses them in the JIT to
generate better code.
In the general case where values may also be Null, the information is
passes as a prediction. In some cases, HPHPc guarantees that the type
is Null, so these cases are passes in as inferred and not predicted to
save the check altogether.
Also, if HPHPc's predicted type is not ref-counted, we know for sure that
the values is not ref-counted (because it's either that type or Null).
So use that fact to avoid generating any code for PopR in those cases.
I'm trying to do a cleanup of some of our logging code, and as a first
step, I'd like to clear out some unused logging functionality. This
removes Log Aggregator (AFAIK, this has not been used at least recently
we have other infra that does the same stuff) and the ability to log
to syslog (never going to be used)
Looking at the contents of icache ways at evict time, little
fragments of __memcpy_ssse3_back are all over the place. In the version
of glibc we're using, this routine is staggeringly general, and contains
carefully unrolled and tuned loops for every x86-64 civilization has ever
seen. Its final weight is around 11KB.
This is a tuned memcpy for our hardware. It assumes the presence of
movdqu instructions. Since it is statically linked into our binary, we
don't consume a TLB entry for its PLT entry.
- The generator body's ActRec, locals, and iters live in the Continuation
object. The eval stack itself is still on the main stack. Moving the actual
eval stack turned out to be really hard. It's also not necessary until we
decide to support yield-as-expression.
- The Switch at the top, and the parse-time transform of yield-inside-foreach,
are still there. Those can be gotten rid of separately.
- The FPushContFunc-This-FPassC-FCall sequence is gone, replaced by ContEnter.
RetC is no longer emitted in generator bodies, replaced by ContExit. The new
instructions are very simple; all they do is frame linkage and a jump into, or
out of, the generator body.
- PackCont and UnpackCont are still around, but they're also greatly simplified.
- Unwinding, stack printing, and translator fixups needed a bunch of tweaking,
and a bunch of asserts needed to be weakened or removed. These are the
messiest parts of the diff.
- The one really gross part of this is the magic number just after the
Continuation's AR. I use this when tracing the stack. The trouble is that an
FP pointing outside the stack may now be pointing at a Continuation's AR, or
into the C++ stack. This is the most "reliable" way I could find to
distinguish the two cases. Is there a better way? I figured there must be some
sort of API to tell where the C++ stack is mapped in, but I didn't find one.
- The changes are mostly unimplemented in the IR. The tests pass, because
everything punts.
- Continuation is now a final class. The {Function,Method}Continuation
subclasses are gone.
The prolog guards were optimized for the case of an
8 byte Func*. Now that we expect them to be 4 bytes,
optimize for that case, reducing the guard size from 23
to 14 bytes, and from 4 instructions to 2.
In addition, store the unguarded address in the Func's
prolog table; in the truly polymorphic case, when the
guard complains, we dont need to check again after
looking up the correct function in the prolog table.
HipHop's current implementation for sorting PHP arrays is a bit unweildy,
with a lot of plumbing and layers of indirection that make it slow. Also,
the sorting algorithm used (zend_qsort) is relatively slow.
This diff delivers a more streamlined implementation for sorting PHP
arrays. Sort APIs were added to HphpArray and ZendArray, which allowed for
data-structure specific semantics for sorting a PHP array in place.
ZendArray works by creating an auxillary buffer of Bucket*'s, sorting the
buffer, and then rewiring the array's linked list. HphpArray works by
directly sorting the element store in place.
Both ZendArray and HphpArray do an initial pass over the array to do some
preparatory work before the sort algorithm runs. For sorts that use builtin
comparators (sort, asort, ksort, natsort, natcasesort), the types of values
are also observed during this first pass. Running the home page with some
simple instrumentation, I found that ~94% of these sorts invovle all
strings, and ~5% of these sorts involve all integers. By observing the
types during this first pass, we can use a specialized comparator for these
common cases and avoid performing type checks during the actual sort.
For now I've only updated sort, asort, ksort, usort, uasort, uksort,
natsort, and natcasesort. It might be nice to go after array_multisort and
collator sorts at some point in the future, though I'm not sure how much
that will matter for overall performance.
"this" is useful in code like this:
class Foo {
public function f1(): this {
...
return $this;
}
public function f2(this $x): void {
...
do something with $x
...
}
}
The first case (f1) does not require a grammar change, since the return type is dropped. The second case
however requires this diff, so that the runtime doesn't fatal:
HipHop Fatal error: Argument 1 passed to Foo::f2() must be an instance of this, Foo given
Mid-tracelet variant guards account for some nontrivial
amount of our branches to astubs. If we need to take a side exit
because of one of these and there are no dirty registers, we can emit
a very compact inline side exit. And if the current instruction
doesn't have a stack offset we can just emit a single jnz straight to
the destination.
If the target of an FCallArray was intercepted, and the
intercept handler decided to skip the original function,
we didnt update $rbx, leaving an actrec and the parameter
array on the stack.
The whole system was surprisingly robust against this. If an
exception was thrown, and the register state was dirty (the
usual case) we would recover. But if, eg, we re-entered and
threw an exception we would double free the array, and attempt
to free the ActRec as if it were Cells.
Tx64 has always been a purely greedy register allocator, hoarding
as much program state as it can in registers. This makes our occasional
trips off the main trace more expensive; they must save all dirty state,
and restore all live state.
So, let's finally cave and implement live and dirty range tracking. At
instruction boundaries, shoot down any dead registers.
tvScratch often contains garbage, so it's not supposed to be
decreffed or treated as a Variant. IssetEmptyElem was using
tvAsVariant on it, causing a bunch of crashes.
This case can happen when a function takes a function as input:
function f1((function (string): int) $f): int {
return $f("hello");
}
I talked to pad about "fixing" the lexer and having grammar rules for the casts. At the
end, it's going to be more work and just as ugly since we have methods called
"string" or "int".
A possible solution to these ambiguities would be to force constants to be capitalized, variable names
to start with a lower case letter, etc. I don't think such rules would play well with our goals to
be php-compatible.
VerifyParamTypeSlow was one of the most frequently called
helpers. Its fast path is mostly the same as InstanceOfD's fast path,
so let's use that code here as well.
- Add some more functions to print information about the current state
of things
- Add some more assertions about types
- Move the translation for VGetG back to translator-x64.cpp (it's not
a vector instruction)
This new version uses two fewer bytes and should be more
branch predictor friendly. I don't expect it to affect production
performance but I started an experiment to approximate its effect when
intercepts are turned on.
Adding implementations of the nv* virtual methods to ArrayData
lets us remove the IsHphpArray branches along the hot path in
various helpers. They're implemented genericly in ArrayData.
Of the various ArrayData subclasses, only SharedMap and
NameValueTableWrapper are accessed frequently enough to deserve
specialized implementations (especially SharedMap).
This diff also factors out the calls to raise_notice() into
helpers in ArrayData that the various subclasses can all share.
IsHphpArray() is now only used in a few ASSERTs, so I reimplemented
it in terms of dynamic_cast. I think we can rip it out in a future
diff, although the JIT still needs a fast is-hphp-array check to
guard inlined nvGet operations.
Put some more test and jcc instructions next to each other.
I discovered this oddness with emitCondJmp: it emits a jcc immediately followed
by a jmp, and it forces them to both be on the same cacheline. This could result
in more alignment gap than we need, maybe. Is this necessary for correctness, or
is it OK if the jcc and jmp have a cacheline boundary between them? We already
overwrite them separately and non-atomically anyway.
- Don't use UnlikelyIfBlock since the thing it was checking is
actually pretty likely.
- Call different slow path helpers based on whether or not the
candidate class is probably an interface.
- If the candidate class is unique and not an interface/trait, use
Class::m_classVec in an inlined fast path.
- Fuse all the branches when applicable.
This was showing up in production icache load miss profiles.
We only had tv_release_X because we were too lazy to implement C++
linkage, so get one extra cache line out of our working set by skipping
them.
Some vtune output showed that we were piling up on targetCache's
lock when running wordpress. Since this failure mode could plausibly occur
during warmup of the translator, and if we were falling back to the
interpreter for defines, avoid the lock when doing successful named
lookups.
This diff modifies the vector translator to pass integer and
string keys to the helper functions in registers, instead of as
pointers to TypedValues. It causes a bit of code bloat with a whole
boatload of templated funcitons but appears to be a net win. The
primary benefits should be slightly smaller translations and fewer
branches in the helper calls (since the key time is determined at
compile time)
I surveyed all of our cmp and test instructions to make sure they come right
before the jcc or setcc instructions that they drive, because processors
apparently like it that way. This is the only one I found that has non-nops in
between. There are several examples of cmp-nop-jcc, which is due to alignment
gaps and can be fixed separately.
Initially I thought the culprit was the ref-vs-nonref argument semantics
inside debug_zval_dump(), but after some digging I think the reason is
that m_refCount and m_referenced are not always initialized when
VariableSerializer::write(CVarRef v)
is called.
Func prologues are bulky entry points. We want them to reside on as
few cache lines as possible. Also, rename breaksBB to breaksTracelet,
which is epsilon more accurate.
Capture the number of entries in the SrcRec's m_fallbackJump
vector and if IR trace generation fails, throw away the
entries added at the beginning of trace generation.
TRACE=unlikely:2 will key the hits on (function, opcode)
instead of just (function), and unlikely:3 will spit out the same data
as 2 but in csv format.
Streamline fast-paths and out-line uncommon paths in HphpArray,
to improve icache locality. Several functions called allocElm()
after resizeIfNeeded(), causing a redundant full-check. A few
other functions were inlined too much causing code bloat; even
though they're hot, the inlined copies were big and not any more
specialized from the inlining.
* made find() and findForInsert() non-inlined functions
* inline the fast path of findForNewInsert but outline the
search loop.
* factored the allocElm()/nullcheck/resize pattern into a
newElm() function, with the uncommon (full) case out-of-line.
* split initElm into initElmStr and initElmInt, for clarity.
* made copyImpl() out-of-line
* made nvBind() inline since it's just a tail-call to a helper.
hphp_array.o is really large, and seems to be blowing out
icache. Some of the biggest helpers were these glue layers for the
translator; we had multiple copies of a 1.8KB routine in array_getm_is*,
for instance.
When "$this" is used outside ObjectContext, we would create
a local to hold it. This meant an incRef and decRef for every
invocation of such a method. In addition, "return $this" would
push the local onto the stack, incrementing the refCount, and
then the RetC would decRef the ActRec's $this.
This diff adds a BareThis opcode, which is used in preference
to the local except when the local is passed by reference, or
there are dynamic variables present.
It also special-cases isset($this), is_null($this),
return $this and $this instanceof Cls, to avoid the
refcounting in those common cases.
TranslatorX64::hhirTraceStart was being called incorrectly when
EvalMaxHHIRTrans was being reached. This had a side effect of emiting code
into astubs.
I'm hoping that most comparisions are either against a static string
or one that has already proven not numeric and that we can avoid a
bunch of expensive calls.
Previous diff missed that RuntimeOption::ExecutionMode
was being set *after* the config file was loaded, so checking
ExecutionMode always yielded an empty string (and thus not srv).
Also removed from unnecessary duplicate code.
The various debugger classes with worker threads need to wait
for the worker thread to finish before destroying the AsyncFunc
object. I also added a bunch of logging to make figuring out exactly
what went wrong easier.
For string-typed locals, we can't skip storing the type even when the
old and new types are the same at compile time. That's because the
guards don't distinguish between static and non-static strings, so
translations should be general to handle both cases.
Also added checks for non-generic cells in SpillStack.
When a conditional jump is eliminated during codegen, turn off
the direct jump optimization for the associated exits by setting
the jump's data field to kIRDirectJumpInactive
Ben Maurer pointed out an opportunity this morning while looking at perf top
output: eq_str_str was a little too quick to resort to memcmp. It was relying
on the trinary StringData::compare(), which falls back to
memcmp(a, b, min(len1, len2)) once the easy cases have been exhausted.
If we don't care about which direction the inequality goes (as for ==, !=,
etc.), we can first test length. We apparently test unequal strings frequently
enough for this to be profitable. I'll also try comparing hashes.
This is a little rough as-is; for instance, the short-circuit test for
identical pointers in eq_str_str might not really be helping.
Add low_{malloc,free}(), which use jemalloc functionality to allocate
objects in low memory (via sbrk(2)). Use these functions to allocate
Unit, Class, Func, and static StringData objects, so that the TC can use
more compact addressing modes when burning in references to them.
Fixed a bug in static property accesses where we passed the
wrong parameter to SPropCache::lookup. Fixed some code gen problems in
type conversions. Added a check stack that checks top of stack contents
for bad ref counted values. This is called on a function return.
Fixed a refcounting bug in int-to-string casting and CSE'ing of instructions
that produce or consume reference count.
A very common pattern for array-creation is array(<expr>,...) where
no keys are given. Capitalize on it with an opcode dedicated to that
pattern. Limit the arity to avoid excessively big operand stacks.
"foo()->bar += <expr>;" can produce an instruction where the
bottommost stack input is not the first stack input in
ni.inputs. useTvResult needs to be tolerant of this. I also fixed a
typo in emitPropGeneric that I found while working on literal support.
In eliminateDeadCode(), identify conditional branches at trace ends
that have a side-effect free trace exit. Connect the trace exits to the
Jcc instruction by appending its result SSATmp* to the exit's list
of inputs and set the value to IR_DIRECT_JCC_JMP_ACTIVE.
Option -- While the SSATmp linkage works and makes it easy to visualize
with Eval.DumpIR=1, I'm considering removing it. Connecting the instrs
using the m_tca in each instruction is simpler.
In addition, linking via the result SSATmp interferes with converting
guard checks to direct branches since these operations are currently
combined with the loads. May be better to split these into a guard
check with a label followed by a separate load.
Combine cgJmpZero and cgJmpNZero into cgJmpZeroHelper(.., cc);
Use LabelInstruction::m_patchAddr rather than m_asmAddr to anchor
the list of addresses that need to be patched to point to the label.
var_dumping the parameters includes a field
indicating what function they come from.
This is different between hphpc and hhvm for
closures. Now the test only shows the name
of each parameter.
Many vector instructions don't need an MInstrState struct on
the stack to be executed. The most straightforward of these are CGetMs
with all of their offsets known at compile time, which is what this
diff adds support for. Now the generic vector translator is fast
enough that we can permanently kill off CGetMProp (I've manually
verified that the assembly output for the fast case is exactly the
same). isSupportedCGetMProp is still around for the IR.
A few of the places that used Location::This sidestepped the
normal register allocation paths. I've cleaned those up, which should
cut down on the number of times we load from m_fp->m_this. I've also
changed the manually emitted closure code to use StackSym::H instead
of emitting This.
The Pack and Unpack Continuations helpers were incorrectly
not recording synch points. The DecRef for conditional jumps was being
pushed off-trace, which caused an assertion failure.
Add an array-capcity hint to the NewArray instructions, so
compiled expressions like array(1,2) or array(a=>1, b=>2)
have a shot at right-sizing the array before initialization.
In a really hard to reproduce edge case, fcallHelper would call
doFcall, which could then throw from FunctionEnter via
CheckSurprise.
In this case, the stack wasnt setup correctly to unwind
through fcallHelper, and the c++ exception mechanism called
std::terminate.
To fix it, I added a try catch to fcallHelper, and smashed
the return address to be the call's true return address.
This allows our unwinder to be called, and everything's happy.
While there, realized that we were swallowing exceptions from
the user profiler (basically to avoid issues like this); but
doing so was pointless, given that CheckSurprise could still
throw.
This loop seems to be a significant source of LLC load misses. To some extent
this is unavoidable, since its working set (all the classes in the universe) is
way too big for cache.
However, we can avoid doing a bunch of the same work repeatedly. The series of
pointer chases (*pre->namedEntity()->clsList()) was the major culprit, and if
all the Unit's classes are unique and defined without failure, we "cache" the
results of those pointer chases and identify them with a marked low-order bit.
We already do something very similar for the non-hoistable classes.
Also, comment a few subtle things that I realized as I worked, including a
presumably deliberate (but safe) race condition.
Most of this diff was pretty mechanical, just expanding
tables of helper functions. The generic translator now supports
getting properties off of a base object that's in a register, avoiding
a bunch of unnecessary stores in the <H ...> case. This also adds most
of the infrastructure necessary to support keeping intermediate values
in registers if we ever want to support that.
I used funcd on an FPushClsMethodF as a flag to indicate
that the destination of the call was known at compile time.
But if the FPushClsMethodF is in the fpi region of another
call that was fully bound, funcd will be set on it anyway.
FPushClsMethodF is almost always directly bound, so this
wasnt causing too much trouble. Pretty much the only time
its not directly bound is when the method doesnt exist
(including the case where it doesnt exist, but is handled
by __call or __callstatic).
Some endpoints like WebGraphQLIPhoneFeedController and
/api/perflab_graphql:feed_before do a lot of string appends, for which
we end up allocating O(n^2) memory.
When we append to a string, it is usually a good indication that
we will do more appends. This diff allocates memory for newly
appended strings in a multiplicative manner.
In repoAuth mode, for unique classes, the material type of a
class constant is very strong evidence of its type. In other situations
it's just a prediction.
This diff adds a new location code 'H', representing $this in
the current frame. A new instruction, CheckThis, is used to preserve
the time at which fatals from a null $this occur. While the primary
purpose of the new location code is to further complicate the emitter
and runtime/translator, it also has the nice side effect of avoiding
about half of the increfs/decrefs we used to perform on $this.
Some of the .inc files were out of date after KindOfInt32 was removed. Some
C++ functions use "int" for one or more parameters (we left in support for
Int32 parameters), and this diff updates the .inc files to reflect that.
This diff also updates a few .inc files that were out of date after the
__destruct method was removed.
We can keep probing, rather than bail, if we happen to encounter a
string in our probe sequence while looking for an int. This is probably
very rare, but saving the 5 bytes of code is valuable regardless.
TypeConstraint's constructor is not thread-safe on the
first call. This was causing infrequent hangs for repo-
authoritative bulids.
There may be a similar issue in non-repo-authoritative
runs - but Ive never hit it. I suspect that loading
systemlib.php prevents issues. But Ive added an explicit
beginning of time initialization anyway.
This was causing an awesome register allocation bug wherein a scratch register
was getting allocated on top of one already being used in the same instruction,
because the LRU order didn't reflect that it was still being used.
Since we include size for the header before rounding up, we don't need
the padding in MemoryManager::newSlab(). Also clean up a few comments
while I'm in there.
We were letting diamondGuard refill rBase after calling the
helper, but the whole point of the helper was to mutate the memory backing
rBase. We push/pop rBase around the helper call, just to be certain.
Avoid dereferencing a source variant when we know the
inner type is null, since the register isnt going to
be used.
Use a LazyScratchReg instead of a ScratchReg.
SetM would dereference the register belonging to a loc
and overwrite the same register, causing issues if it was
used later. Use a scratch reg instead.
Passed type inference & profiling information from the
translator to the IR via the HhbcTranslator::emitCGet*
methods.Propagated type information through move and IncRef
instructions.
Translate FCallArray. This adds a pointer to the Func* which
holds an address that will handle entering the appropriate
default-value funclet (or just the function body) after the
rest of the prolog has been setup.
Also cleaned up the dv-funclets, so they know the locals
are uninit-null (and dont have to guard), and to avoid
doint a surprise-check on the backward jump to the function
body.
Scratch regs need to be freed, but some were being manually
allocated - and often not freed.
Switch to RAAI, and add an assert that all scratch
registers are free on instruction boundaries.
Also removes an unneeded clean/smash.
I wasn't able to find any inc or decrefs that were doing
unnecessary runtime work, but these were doing some unnecessary
compile-time work so I cleaned them up. The code emitted should be
unchanged.
We've seen a lot of LLC misses touching the bitmaps in
the target cache. I'm working on a more elaborate change
to type-segregate the targetcache, but meanwhile this
simple diff seems to capture some of the opportunity.
We know we'll be allocating a lot of these bits,
so allocate them in great, throbbing chunks.
Get more information about statically known object types from
hphpc to the JIT. Uses this information on intermediate prop
operations for all M-vectors, and final operations for CGetM and SetM.
Removing the old SetMProp specialization now is a win. I also tried
implementations of the final operations for IssetEmptyProp, VGetProp,
BindProp, and UnsetProp, but none of them seemed to pay (and VGetProp
didn't work) so I'm leaving them out of this.
Rework SmartAllocators so each allocator allocates fresh objects from
the single frontier in MemoryManager, but still has its own
size-specific GarbageList.
Now with smart_malloc and SmartAllocator using the same slabs,
iterating over the heap is no longer possible; sweeping and the
backup cycle gc were effected.
For sweeping, NeedSweep SmartAllocator flag was eliminated.
Strings that need sweeping are placed on a string-specific sweep list,
and SharedMaps are swept using the Sweepable mixin class. Other objects
that needed sweeping were already using the Sweepable mixin.
The cycle collector is now disabled. It still compiles but it doesn't
visit any objects. Since it was not finding many cycles before, and
caused SharedMaps to "inflate", this is okay for now.
At one point we had a system invariant that the _count field
of root-like TypedValue's was 0. This invariant hasn't been
buying us much, and it costs us a lot of stores.
Stay in translator output for some simple array accesses. For
now, this just further specializes the already-specialized case of
array_getm_i. We'll build on this in subsequent diffs.
The translation inlines the *entire* array access, quadratic probe and
all. This appears necessary; only doing one probe, and then falling back
to a helper did not seem to win.
If the class exists at translation time, and is marked
AttrUnique we would burn in the address and assume its
always defined - but it may not be. So add the check.
hphpc does some static analysis to determine that the
class really is available, so use that to skip the check
when possible.
In addition, do some cleanup on Self, Parent, AGetC
and FPushClsMethodF.
Smart-allocate MIterCtx::m_mArray. Directly embedding m_mArray bloats
the VM stack enough to cause an overall performance drop, so
smart-allocation is the next best alternative to straight new/delete.
None of these were needed, either because the function they
were calling had its own VMRegAnchor or they just needed an fp which
was being passed in anyway.
Using StringData's ability to reserve space, we can steer more
string allocation traffic to small strings and smart_malloc'd strings,
at clean up code at the same time.
This also fixes a missing request-memory-limit check in smart_realloc.
This diff adds and uses IR instructions to support all
continuation bytecode instructions (except ContHandle, which is still
interpOned like in Tx64). Because the ir doesn't support Switch yet
and supporting it might end up being a lot of work, I've added a
runtime option to allow emitting Switch that defaults to true,
unless Eval.JitUseIR=true.
I've also fixed a couple existing continuation bugs:
- The exception handler region inside Continuation::next and
Continuation::send (under hhvm) was wrong: it's intended to cover
the body of the generator so it should only cover the FCall, not the
ContNext/ContSend. ContNext/ContSend were changed to not push the
continuation on the stack so they can be pulled out of the eh
region.
- Continuation::next and Continuation::send were setting the received
field before performing their sanity checks. This could lead to
wonky behavior if the current value in the received field messed
with the Continuation object in its destructor. hhvm (jit, interp,
ir) and hphpc now all perform the sanity checks before setting the
value.
If we add space for the header before rounding up, then the usable
pointer will be 8-aligned. However, if we throw away 8 bytes at the
start of every slab, all usable pointers will then be 16-aligned.
To reduce waste, add more size-classes (one every 16 bytes up to 2KiB).
ConcurrentTableSharedStore logs a lot of its actions. It
passes a literal string to log_apc, which expects a std::string,
causing it to create/destroy a std::string every time (resulting
in a malloc/free).
Create a global std::string for each literal string thats used.
Fold Continuation and GenericContinuation into a single class and remove
unused fields, reducing the size of Continuation by 8 bytes for hphpc and
24 bytes for hhvm.
Also get rid of the gross MethodCallPackage/CallInfo hack for generators
as it is no longer needed.
This is a code cleanup patch to make it easier to spot
bugs and refactor code.
Numerous functions in HphpArray check to see if they must
copy the array first, then contain two whole copies of
the function body, one if we copied and one if we didn't.
There's already a coding style in some functions that avoids
the duplicated code; this patch uses that style in the rest
of the code.
Early results show this is about 0.1% increase in CPU
Instructions but about 1% decrease in CPU Time, although
some endpoints have 1% noise
The side exit for an incorrect type prediction should
got to the next instruction; not re-execute the instruction
that just produced an unexpected result.
- Fail early if anything other than TestCodeRun* is run with
an hhvm build
- Set the m_url field of CurlResource in the copy constructor
- Don't fail TestExtCurl if test_curl_copy_handle fails. I've spent a
day trying to figure out why this is failing on Jenkins without much
luck so far and I want to cut down on the test noise.
The function could be a method, so trying to look it
up using a passed in name could fail (and is pointlessly
expensive, since we've already determined the correct
func anyway).
When a side exit from an HHIR-generated translation was taken, a
BIND_JMP service request was being used, which would make it create a
new HHIR translation at the exiting bytecode. This translation would
then generate a REQ_RETRANSLATE_NO_IR to produce a tx64 translation.
This diff avoids the useless HHIR translation generated for the
exiting bytecode in case no translation for this bytecode exists when
the service request is made. This is done through a new
BIND_JMP_NO_IR request that directly generates a translation for the
exiting bytecode without HHIR if no translation for the SrcKey exists.
This is a somewhat weak optimization that pushes some type
dispatch on keys into translation-time. It's easy to do, and looks like
it might be good for 1% or so; it's in the noise, though, so take it with
a grain of salt.
StringBuffer only has a few use cases which require it to
either attach to malloc'd memory or produce malloc'd memory
that the client later free's. This diff refactors those clients
so all StringBuffer-created strings can be smart-malloc'd.
string_cplus_escape() used StringBuffer internally but all callers
work fine (ne better) with std::string. Changed MemFile and
UrlFile to use a plain String (instead of char* ptr) to avoid having
StringBuffer return a plain char* buffer.
Created CstrBuffer to handle the use case of creating a plain
malloc'd char* buffer, for dynamic_content_cache and
static_content_cache; this seemed like the least-tricky solution
and those were the only use cases for the StringBuffer(char*,int)
and StringBuffer(char* filename) constructors.
With those changes, we only need the StringBuffer(int size)
constructor and it only produces ordinary Strings. so we have
a privat StringData instance which holds the string memory,
then we finalize it in detach().
I also renamed StringBuffer's fields to match terminology we
already use in StringData. (size or len = valid bytes of string,
cap or capacity = usable bytes of underlying memory buffer).
This diff simply increases the string size limit, and cleans
up a handful of places we used int when we should have used
uint32_t anyway. This clears the way for a followup, which makes
limits StringBuffer to the same length limit as StringData.
We use 2^31-2 because the real limit is 2^31-1 (biggest unsigned
value for a signed-int length), but we want to safeguard against
the likelihood of client code computing size()+1 into a signed-int32
and wrapping.
A subclass of GenericContinuation is created for each
generator function in code we emit. This was originally intended to be
an optimization to allow burning in Func*s in the translation of
FPushContFunc, but that was not a noticeable win in perflab and tricky
to get right in sandbox mode (see the attached task). This diff
changes the emitter and runtime to not generate all these subclasses,
instead just using MethodContinuation and FunctionContinuation to
allow some specialization at translation time.
Type prediction was disabed with HHIR. Enable it by passing the
predicted (or inferred) types to HHIR.
This diff also passes extra static analysis knowledge to HHIR:
- whether this pointer is available
- whether UnboxR is a no-op
Remove unused IR function HhbcTranslator::emitUnaryArith
Add option to have the IR punt this trace back to the jit rather than
doing an InterpOne for a bytecode to get more sensible perf numbers.
I noticed that doing an array cast ("(array)$x") on a collection will
produce an empty array (because collections currently use the default
object->array conversion and collections don't have properties).
This diff makes array casts for collections produce the same result as the
toArray() method. Array casts for collections will also raise a warning
(depending on runtime options).
For MetaHandle: a member class's member function counts as a
member of the containing class, so it has access to privates already.
The friend for the other direction appears unused.
Some elem/prop helpers know that their keys must be cells, but
those that use locals could get passed Refs. We know the reffiness of
the input keys at translation-time; use this type, rather than the input
flavor, to choose which helper to call. This should strictly reduce the
number of KindOfRef checks in vector instructions.
To gain confidence this was working as intended, I also factored out the
ref checks into a helper that asserts that it has stripped any refs.
For a call_user_func to a simple function which
didnt exist at the time of the call, we gave the
wrong info to recordReentrantStubCall, resulting
in a corrupt stack.
fb_call_user_func_safe_return didnt always return the
correct result in jitted code if the function existed
when the translation was generated, but doesnt exist
at a later call, and the default value was non-null.
fb_rename_function simply didnt work on any jitted
*call_user_func* code.
The warning for call_user_func* given an unknown function
was different between jitted and interpreted code, resulting
in verify_quick failures.
This was extracted from my working branch on returning values
in registers, without anything related to that part of it. It is just
a bit of code cleanup and allows sharing a few of the register
constants with the ir, and a little bit of additional documentation on
a few things. (There are probably more things that should still be
pulled to shared locations between the two translators, and I'm sure
there's still a fair bit of distributed knowledge about which
registers are what.)
This is almost all code motion, though there's a slight
tweak to CGetM to avoid checking reffiness where we can eliminate
it at compile-time. Also make the VM-only helpers in translator-x64
HOT_FUNC_VM rather than HOT_FUNC.
Add smart-allocation mode for strings; the CopyString constructors and
StringData(int) constructor now imply smart-allocation, and the append
function will smart-allocate if a small, literal, or SharedVariant
string is appended too.
CopyMalloc mode is now used in the few places when a client allocates a
long-lived StringData instance (with new). These include static strings
and APC strings, plus a few odds and ends.
StringData.reserve(int capacity) ensures there's at least that much room
in the string, similar to std::string.reserve(). Not called yet, but
will be needed later, so adding now.
We can use the fast path for class methods by giving the
cloned Func*s new funcIds (and separate translations). The remaining
cases that the inline fastpath can't handle are easily dealt with by
just calling the slowpath helper. Also fixed up an ASSERT in
Translator::applyInputMetaData.
Add DateInterval class
Add methods to DateTime class
add()
sub()
createFromFormat()
diff()
getTimestamp()
setTimestamp()
getLastErrors()
Add method to DateTimeZone class
getLocation()
Add matching function aliases for new methods
date_add()
date_create_from_format()
date_diff()
date_get_last_errors()
date_interval_create_from_date_string()
date_interval_format()
date_sub()
date_timestamp_get()
date_timestamp_set()
timezone_location_get()
Add timezone database version info
timezone_version_get()
A recent correctness fix disabled a lot of uninit detection.
While investigating the correct fix, I noticed that many
type predictions for the base of a Vector instruction seemed
to be ineffective (we still generated a guard), and that
"lvalue" vector instructions never generated the DataType
meta-data.
We seem to benefit from using huge pages to map the TC and
targetcache. This diff checks that the kernel version is greater than or
equal to 3.2.28-72-fbk12, and if so asks the kernel for large pages.
If the called function expects a reference,
but doesnt get one, zend raises a warning,
and returns false, rather than executing the
body of the function.
A previous diff emulated that behavior, but we have
a lot of tests that rely on the function being
called anyway. This version still raises the
warning (so we'll hopefully get the bugs fixed)
but continues on and executes the function.
ExtraArgs doesn't actually need to know its size, and also
doesn't need to have an additional pointer hop. This seemed *maybe*
like a small win in perflab, but changing to smart_malloc on top of
this looks like it actually cuts instructions and even CPU noticably.
Parsing XML documents from external resources is dangerous at the moment
since they can pass us:
<!DOCTYPE scan [<!ENTITY test SYSTEM
"file:///etc/passwd">]><scan><foo>&test;</foo></scan>
to read /etc/passwd
bit of an issue is that this is an XML global, so leave it loading by
default and a function to disable it. Will test with adding it to our
standard www init stack and see if we can make it off by default.
Test Plan:
Ran unit tests and:
<?php
$string = '<!DOCTYPE scan [<!ENTITY test SYSTEM
"file:///etc/passwd">]><scan><foo>&test;</foo></scan>';
libxml_disable_entity_loader(true);
$a = simplexml_load_string($string);
var_dump($a->foo);
Eval.ProfileHWEnable (defaults to true) can be used to keep hhvm's
grubby mits off the perf counters. This is useful for some profiling work
we would like to do with Intel.
We were passing a StringData::data() to a method taking a
CStrRef, which causes a new StringData/String to be created.
Instead, wrap the StringData in a CStrNR, and pass that.
For HphpArray lookups, we compare StringData::m_data for
pointer equality, and if that fails, we do a memcmp for
content equality. This diff changes it a StringData*
comparison for pointer identity, which eliminates the
StrinData::m_data dereference.
Calling the getDocComment() method on instances of ReflectionFunction or
ReflectionMethod was always returning 'false' if the function/method was
a generator.
This diff adds the missing plumbing in the parser to make this work
correctly and adds some test coverage.
HipHop Fatal error: Unexpected object type stdClass. when copy() is called
with a bad http:// URL source, for example:
copy('http://does.not.exist', '/tmp/foo');
The impl for the LateBoundCls instruction was missing logic to handle the
case where there is no late bound class (ex. inside a regular function).
Fix LateBoundCls to raise a fatal error for such cases instead of crashing.
Some programs that used static properties with traits would cause the VM
to crash while building the Class. Here is an example:
<?php
final class Foo {
use Bar;
private static $a = array();
}
trait Bar {
private static $a = array();
}
The crash was happening because was the m_staticProperties map of the
current class was being accessed before it was initialized. This fixes the
logic to directly read m_val from the SProp structure for static properties
declared by the current class to avoid this problem.
Fixed a code generation bug in LdThis where we did not
consider the case that the LdThis target is dead but LdThis itself is
not because of its checks. This was causing the destination register
to be reg::noreg. Fixed a bug in ActRec allocation where we needed to
mask in a 1 into the lower bit of ActRec::m_invName for magic
calls. Added a check to TranslatorX64::translate to check if we have reached
the translation limit for a SrcRec. Fixed an assert in SrcRec::newTranslation
that was off by one: When newTranslation is called when a SrcRec has reached
its limit it will cause the number of translations to become
kMaxTranslations+1, so the assert must check that we are one less than that.
Forward declarations were missing in the generated headers if type
contained a class name instead of just Object. Add them.
Class typehints in extensions still doesn't work in HHVM, but at least
it's possible to test impact of improved knowledge of types in HPHPc.
The get() and at() methods for collections were missing a cast from
TypedValue* to CVarRef, causing them to return true or false instead of
returning the value. Fix that.
In WholeProgram mode, we would convert FPass*
instructions to their known-at-compile-time
equivalents, and then drop the actual FPass*
instruction.
This broke verifiability. Instead, we emit the
equivalent CGet/VGet, and emit a NopOut FPass*
bytecode.
While I was there, NopOut some hand generated FPassC
instructions too.
Expressions like "$x[0][1] = 123;" were fataling under hphpc if $x was a
collection, regardless of whether key 0 was present and regardless of the
type of value at $x[0]. Fix that.
Bisecting in h2e showed this diff is the rev that causes the elevated
stacktrace rate. I imagine we'll want to debug and fix it but to unblock
getting an rc out for now we'll revert temporarily.
Summary:
pthread_exit can lead to confusing crashes for us: it raises
a "forced unwind", which it seems (strangely) will enter into C++
catch (...) blocks: but if you don't rethrow from one these blocks,
glibc will call abort() from a function called unwind_cleanup. Our
LibEventWorker has a catch block of this sort that doesn't rethrow, so
calling pthread_exit below there will abort the program. (The point
of this function is to just wait while another thread writes a
stacktrace file, so there's no reason to exit the thread.)
Summary:
Similar to D563995, this fixes more code that assigns a PHP
value so the new value is stored to memory before decreffing the old
value. It's more complicated but unfortunately necessary for
correctness.
Summary:
We're still spending a lot of time in Unit::merge.
This optimizes defClass and defFunc for the case
where the class/func is known to be unique. It also
fixes various issues in the emitter so that builtin
classes and functions, closures and continuations
are all marked as unique.
New smart_malloc/free api for variable-size smart-allocation. Memory
allocated with smart_malloc can be freed with smart_free, but otherwise
is swept at request-end. Allocations underl0 1K are bump-allocated from
slabs with an 8-byte header. Over 1K are routed to malloc with a
16-byte header (doubly-linked list).
ZendArray, VectorArray, HphpArray refactored to use smart_malloc instead
of a bunch of storage-segregated size classes.
Removes the need to patch libcurl and build a copy
explicitly for HipHop.
If libcurl >= 7.28.0 we get the new curl_multi_wait() function.
If not, we either get the patched in curl_multi_select(), or a
fallback which works for FDs < 1024, but fails (with an error
message) for sites which use more concurrent FDs.
DebuggerClient attempted to avoid problems with destructing
Socket objects by calling incPersistent so they would not be swept.
However, the semantics of incPersistent apparently require that the
object only is persistent in some sort of thread-local cache (it is
re-added to the sweepable list for the same thread immediately). When
we disconnect the debugger, the DebuggerClient is deleted and the
socket object is also destructed---the Sweepable destructor tries to
remove it from the current thread's list, so if this is a different
thread, the next sweep loop on the original creating thread will make
function calls on a deleted object.
Each subclass of ArrayData needs a flag to track whether there's
a strong iterator past the end of the array; use a spare bit in
m_strongIterators to store the flag, and consolidate more strong
iterator code in class ArrayData.
This will also aid in packing fields in subclasses.
Moving forward, these compiler features will be required for
IR and other features in the pipeline. Compilers which do not
support C++11 features will be incompatable.
This was injected when I converted the strong iterator
list to a linked list. Only affects HphpArray when
COW occurs on the array being mutated in a mutable foreach
loop. Starting a strong iteration causes a private copy
to be made anyway, but the bug should be fixed either way.
The DebuggerClient destructor could try to invoke ~Array
after the smart allocator shut down from a static dtor. In the case
of the hphpd_get_client API, it also could try to assign null_array to
an Array at sweep time---I think this could be wrong if these Array's
have references to other Object's with custom sweeps.
Non-smart-allocated VectorArray wasn't initializing this field,
which surely can't be good. This would most likely impact HPHPC
but only when creating static arrays; and its not clear when/if
we create static VectorArrays. But it's still a bug.
The assertion assumes we're dealing with a
utf8_string - in which case it will have been
checked for correctness. But preg also works with
non-utf8 strings.
The getNoCheck function will assert that it's not null, which
this code is apparently trying to check. (Somehow hit this loading a
sandbox page, but not sure how to repro.)
This could cause a local with a type assertion to be
inferred non-null at places where thats not the case.
In the new test, the $a in "return $a" was inferred to
to be an object, resulting in a crash when $a was
pushed onto the stack, and its refcount incremented.
I combed through HipHop's grammar and noticed a few bugs. HipHop's grammar
does not allow for traits to be declared anywhere except toplevel scope
(whereas Zend allows for this), and the grammar allows for bogus singleton
XHP tags such as '<foo a="1" /foo>' (which are not allowed by the original
XHP grammar given by xhpast).
This diff fixes these bugs, and it also removes some superfluous rules,
duplicated rules, and dead rules from the grammar.
I also took a look at 14 shift/reduce conflicts in the grammar to see if I
could eliminate most of them. 4 of the 14 conflicts went away when I fixed
the issue with the xhp_tag_body rules and when I removed the dead type_decl
rules. Of the 10 remaining conflicts, 2 were due to the if/else rules and
the other 8 were due to the "expr -> expr [ dim_offset ]" rule. Conflicts
due to "if/else" are well understood and conventional wisdom says we're
better off just accepting these conflicts instead of contorting the grammar
to try to get rid of them. Conflicts due to "expr -> expr [ dim_offset ]",
however, are not really well understood and there are a lot of them, so it
seemed worth fixing.
"expr -> expr [ dim_offset ]" was originally added to the grammar to
support using function calls as the base for array element expressions
(ex. "$x = f()[0]"). The rules for variable, calls, array elements, and
object properties were written in a very weird way that made them hard to
update, so I took the opportunity to rewrite these rules in a more natural
form and I updated the rules to support using function calls as the base
for array element expressions. Finally, I ran into cases where developers
were using array literals, class constants, and parenthesized expressions
as the base of array element expressions, so I introduced 'dim_expr' rules
so that HipHop's grammar will continue to support these cases.
pcre_exec checks that its subject is valid utf8 every time
it's called (unless told not to). Our preg_split implementation can
end up calling pcre_exec once for every character in the subject for
certain inputs, so rechecking the utf8 validity of the subject is a
waste of cycles.
This diff implements support for collections with serialize, unserialize,
var_dump, print_r, json_encode, var_export, and debugger serialization
formats.
try/catch()/finally and try/finally constuctions added to hphp. Code is translated into C++.
There still a lot to do:
1. Throw an exception when there's an unhandled exception in finally scope;
2. Write test units;
3. Support finally in virtual machine;
4. Return an error if there's a yield statement into try/finally scope.
Converting to ThreadLocalSingleton means each MemoryManager instance is
stored directly in __thread memory instead of malloc, and allows clients
to directly access it via the x64 FS register. This avoids needing
SmartAllocatorImpl::m_stats to update stats.
Moved The GarbageList static_assert to memory_manager.cpp.
Failing to call this when there are live RequestLocal objects
will can cause them to access deleted objects during pthread's
shutdown (calling OnThreadExit). This fixes and adds assertions about
ref counts that makes this easier to hit this in a debug build.
Some flib tests were failing, and the lines I looked at were
for SetOpM. The bytecode.cpp version of these two uses MoreWarnings
to decide whether warn is true for setHelper, so do something similar
for the translator.
This is a wrapper around boost::numeric_cast, to give a
helpful error message if we attempt to use an out of range immediate
or jump in the assembler. I'm sure there are other places in the
codebase it can be useful but this was the first that came to mind. In
RELEASE=1 builds it's just a static_cast so there should be no runtime
overhead.
The translations for the M* instructions call getMInstrCtx to
set MInstrState::ctx if the context is not fixed, or if there
are property accesses in the vector. But MInstrState::ctx is
only read when the context is not fixed AND there are property
accesses in the vector.
Also found an issue where the BaseN* helpers would access
the current frame without a VMRegAnchor. Fixed by passing in
the correct fp as a parameter.
Allowing for dynamic properties on collections is kind of silly, and it
would complicate adding var_dump and serialization support for collections.
This diff disallows property access for collections.
This checks to see whether a file could be included.
For hhvm in sandbox mode, this just checks the existence
of the file.
For hhvm in RepoAuthoritative mode, it ignores the filesystem,
and looks to see whether the file is in the repo.
For hphpc it checks to see if the file was compiled into the
binary.
Implement generic translation capability for all *M instructions.
Generic translations are emitted as series of helper function calls.
Add the Eval.JitMGeneric runtime option to allow generic *M instruction
translation to be disabled, for regression testing/reversion purposes.
Fix callUnaryStubImpl() to account for pushing rdi when argument is
rsp+disp.
Fix the interpreter to warn for BaseLW* as needed.
Fix the interpreter to initialize IncDecL's stack output before
operating on it.
Using comparison operators (==, !=, <>, <, <=, >, >=) with arrays and
objects can be expensive and produce strange results in some cases.
Rather than having collections match this behavior, I think it's better to
disallow using these operators to compare a collection with an integer,
string. This diff adds checks to either raise a warning or throw an
exception (depending on a runtime option) when any of these operators is
used to compare collection with an integer, string, array, or object.
Note that collections are still allowed to be compared with null, true, and
false using the comparison operators. Also, collections can still be
compared for reference equality with === and !==.
Update idx() to support collections. Also, for some collections the
contains(), remove(), and discard() methods did not throw exceptions if the
key type was invalid, fix that.
For the cost of one more pointer in struct FullPos, we can simplify
the tracking of an array's currently active mutable iterators.
The current array (vector of FullPos*) requires a secondary malloc()
if there are >1 iterators. Using a linked list never requires malloc,
and keeps the common case fast (most recently added iterator is the first
one in the list).
It was generating a RetC on an UnboxR (on purpose) and a try
region that started with a non-empty stack. Also adds support for an
HHVM_VERIFY_VERBOSE mode.
The static property accessor bytecode translations were not
checking whether the context is fixed, which is incorrect when translating
pseudomains. This diff adds a isContextFixed check to analyze[CGetS, SetS] to
make sure we are not in pseudo-main. It also replaces uses of arGetContextClass
with curFunc()->cls() where its clear that we don't want to be in pseudomain.
This diff removes the GeneratorClosure cppext class (which was only used
by hphpi), reenables a disabled trait test, and cleans up some other
miscellaneous things.
Use SmartAllocator to allocate VectorArray and ZendArray buffers up to 64
entries. The size of inline allocation space (4 slots for VectorArray,
8 Bucket*'s for ZendArray) is unchanged, since experiments showed any
change in either direction was a regression.
Also use helper functions to round up to powers of 2 rather than a
while loop; the helpers use __builtin_clz.
SetM (jit only) and UnsetM (everywhere) were decreffing the
destination before storing the new value. This allows user destructors
to get access to dangling pointers, potentially leading to objects
with invalid refcounts and other badness. Both instructions now store
the new value before decreffing the old value.
Eventually we might want a mode that throws an exception if
it fails verification, etc. This also fixes some inconsistency in
what the error message prefix looked like.
Verifier chokes hard on systemlib right now (I think due to
the OpSwitch changes). This lets you use it just on a small file.
Looking at updating for OpSwitch next ...
Lots of properties were straddling cacheline and page boundaries,
so that reading or writing one property required two entries in important
hardware caches. First, allocate ObjectData's at 16-byte boundaries;
second, make ObjectData itself 16-byte sized, so that its trailing
properties will also be aligned. Along the way drain a code-copying
swamp.
There are still alignment issues with objects that extend an extension
base class; some of them seem to have ragged sizes. We should pad them too,
but baby steps.
GenericContinuation does custom deallocation of memory, but did
so assuming it would not be inherited from. While inheriting is useless for
end-users, we implement some real continuations by inheriting from
GenericContinuation, so simply making it final is not an option.
(This appeared to work before because we weren't testing with enough fields
in the child class.)
I noticed that a lot of cppext classes define __destruct methods that do
nothing and this causes the VM to do an unnecessary re-entry to run these
__destruct methods. This diff removes __destruct methods that do nothing
from all cppext classes.
I think I messed these up when removing "using namespace
boost" from this file. For the various graph concepts, we're supposed
to use unqualified name lookup and ADL, which avoids this header
ordering thing.
Zend-PHP compat fix.
Currently, if you try to extend DOMNode, you're told:
DOMNode is not derived from DOMNode which is true,
but inappropriate for the intent of the method.
PHP 5.3.9 adds "allow_string" parameter to both methods.
Each method defaults to the behavior which they had before the
additional parameter.
- is_a() allow_string defaults to false
- is_subclass_of() allow_string defaults to true
The call to free() in ~c_GenericContinuation was a nontrivial
source of dTLB misses. The locals previously stored in that malloced
buffer are now stored in extra space allocated at the end of the
object, just like builtin properties at the end of an Instance. This
also makes translated code touching these locals slightly faster by
replacing a mov with an lea.
There were a few places (including Cache::alloc and
allocStatic) that allocated space in the targetcache
without taking the handleLock. This was mostly ok,
because they were only called when we held the
write lease; but there are paths in the interpreter,
the file cache and Unit::merge that can allocate
target cache entries without holding the write lease.
This resulted in hard to track inconsistencies in
targetcache state.
Now that we have inline-string space in StringData, we can use it
to store the SharedVariant* m_shared field for shared strings,
shrink m_hash to 32 bits, shuffle fields around so they still
pack nicely, then increase the inline space in StringData from
40 bytes to 44 bytes.
Doing this requires reducing string hashes to 32 bits everywhere,
which means 31 bits in practice because we use the high bit as a
flag in various places. (as before... we're really going from
63 bits to 31 bits).
The change also ripples into HPHPC tables which contain a string
hashcode since we can shrink that field to 32 bits.
There's no reason to sync out the APC data to be persistent on
disk since it's recreated every start. Data will lazily be flushed back
and this will reduce the start time of the process by 20-30 seconds.
There was a place where these were being leaked in the
translator, but also the idea of leaking the top-level ones until the
end of the request breaks long running scripts.
Make sure the components of a list expression know when
their result is unused. This came up when:
$x = expr1 + expr2
was optimized to
(expr1, expr2)
But then expr2 generated a temporary variable to hold its result,
which was never used.
It's possible to store the same key over and over again with a
ttl and it will grow the expiration queue unbounded. Use a concurrent
hashtable to store items which are already in the queue to prevent
adding the same key over and over. We only honor the first TTL and
ignore the rest.
Parameters were not automatically fully evaluated during
the preOutput phase if they couldnt interfere with each
other. But this didnt take account of the ordering wrt
late static binding.
ThreadLocalSingleton stores a __thread T* and lazy-initializes the
object by calling T::Create, which typically allocates a new object.
Access to the thread local T is indirect through this pointer.
We can do better by just inline-allocating __thread T. This patch
does so while preserving the timing of when T is initialized or
destructed.
I accidentally made a test of a function's parent's attributes
into a test of the func's attributes. Whoopsy. While I'm at it, use the
appropriate types on the scalars in Class.
Some investigation of DTLB misses showed that a tremendous amount were happening from free()'s called in the VM reentry path. This turns out to be due to the hash_map/vector pair we were using to record entry points.
Use a FB specific option to shutdown(2) which disallows all new
TCP connections, but does not drop any connections which have completed
the three-way handshake and will allow the application to accept(2) any
remaining connections.
OSS Note: This feature is gated on RuntimeOption::ServerShutdownListenWait
However OSS builds should never enable this option as the shutdown() call
will fail due to receiving an invalid argument.
This patch is included on github for consistency.
You will need to reapply libevent-1.4.14.fb-changes.diff
and build your custom libevent after applying this patch,
but before rebuilding hiphop.
A lot of hot class paths involve looking up the preClass's
attributes. perf runs for dTLB and l1-dcache load misses highlighted these
as a possible problem.
Alignment means we have a spare word in Class. Cache the preClass's
attributes there.
It should be cheaper to just calculate freelist length when we need it
than keeping track of it on the fly. Frequent logging of freelist
length thwarts this, so lets not frequently log that stat.
In HHVM, a fatal error is now thrown when an invalid object property
(one with an empty name) is accessed while creating references. This
behavior matches Zend.
assignRefHelper() takes care to increment the refcount before
copying a value in case destroying the LHS destroys the owner
of the ref; but then it reads from said owner after destruction,
which is a free-after-use bug, for example:
$x = array(0);
$x &= $x[0];
It never showed up before because SmartAllocatorImpl::dealloc().
did not garbage-fill freed memory. We just need to save the
newly created pointer before destroying the LHS value.
This patch also enables garabge-filling for
SmartAllocatorImpl::dealloc().
Explicitly pack in6_addr octets into 32bit BE words
rather than relying on non-cross-platform s6_addr32 linuxism.
Less efficient than autoboxing, but this is just a unit test.
Per the mmap man page: MAP_ANON
The fd and offset arguments are ignored; however, some implementations
require fd to be -1 if MAP_ANON is specified, and portable applications
should ensure this.
FreeBSD fails on fd == 0
GCC 4.4 doesn't recognize sizeof(Class::InstanceVar)
Declare the instance var with a typedef
and use sizeof(Class::TypeDef) instead for successful parsing.
Summary:
This diff adds the following builtins related to circular reference
collection:
gc_collect_cycles()
gc_enabled()
gc_enable()
gc_disable()
The current implementations for these functions just raise a warning and
return. In the near future, we can hook up these builtin functions to
HipHop's cycle collector. This will allow programs written for Zend PHP 5.3
to transparently use HipHop's cycle collector without requiring changes to
the program.
TestCodeRunVM ends up spending >85% of CPU time in _raw_spin_lock
on the certain machines with more than 21 jobs. Cap it at 20 to be safe.
Also let HPHP_SLOW_TESTS_JOBS apply to all TestCodeRun suites.
We need to enforce a minimum allocation size a) for GarbageList,
and b) for RefCountTombstoneValue. sizeof(void*) is not actually
big enough, but we're getting lucky since no real-world size is
less than 16.
I noticed that we weren't annotating the type of
an FPassL even if the variable was known to be UninitNull.
Fixing that exposed a bug in the metadata argument numbering,
and existing tests in TestCodeRun revealed another bug in type
inference, in scopes where $this is altered via a dynamic
variable ($t='this'; $$t = 'surprise').
The 86ctor method is only there for the benefit of FPushCtor*, it
should not be called via parent::__construct.
This removes it from the method table altogether, since we never
need to look it up dynamically; which also fixes:
$ctor = '86ctor';
$x->$ctor();
While I was there, I also removed 86pinit and 86sinit. For now, Im
leaving 86cinit, since it would mean increasing the size of Class
(by one pointer).
This also revealed a bug - 86ctor was not supposed to be inherited,
but it could be if the emitter "forgot" to create one when needed -
or if the user failed to create one using hhas. One of the hhas
examples violated this. I changed the base class constructor to be
__construct instead of 86ctor.
A recent change set the default limit to INT64_MAX, which
broke scripts which tested for -1 to see if a limit had been
set. (eg they would set a 1G limit if no limit had been set).
When the freelist is empty, allocating from the frontier can be
done with just two pointers, but the current code must access
m_blocks, m_row, m_col, and m_colMax. Instead, maintain just
a pair of pointers (next, limit).
This diff also cleans up the way SmartAllocatorImpl.m_stats
is set up.
This was disabled because it requires a special php version
of the gd library. But the existing gd library has similar
functionality via a different api. Since there are only
a few functions that are affected by antialiasing it was
fairly easy to intercept each of the affected routines and
do the right thing.
We want to spend less memory on these, so we'll put them in a
tbb::concurrent_hash_map and share across threads. Also fixes various
memory leaks in some early-return and exception cases in various preg_
functions. Also removes the HAVE_SETLOCALE stuff, since it wasn't
working anyway (and zend53 doesn't handle it in preg_match after
setlocale either in a quick test).
When a query timed out fetching the result, we would return true,
rather than false. Note that queries that return data should only
return false, or a valid MySQLResult resource. The queries that
are allowed to return true have already been weeded out by the
"mysql->status != MYSQL_STATUS_GET_RESULT" above.
to_usec needs to know if we're dealing with cpu-time, and only
do the conversion to usec from nsec when using the fast clock_gettime
for cpu time only.
res_ninit will read /etc/resolv.conf using stdio, which will
mmap 4k worth of data each time res_ninit is called, causing contention
in the kernel. Cache this per-thread.
In rare circumstances, it could attempt to read an invalid iterator.
The problematic case is where a function or class has a name that
puts it at the "end" of the NamedEntityTable (the order of which is
undefined to start with) - and if its a class, its definition must
have been seen, but it must not have been declared yet.
glibc's stdio routines use mmap directly to allocate BUFSIZ
bytes to manage internal buffering of data, but this causes contention
on the shared address space in the kernel. Instead, use setbuffer to
provide an already malloc'd buffer for the buffered-io stdio routines.
Adds support for frames to ArenaImpl<>, which allows it to act
like a stack, and use them for the VarEnv, ExtraArgs, and associated
VarEnv allocations.
There are some cases involving foreach by reference loops where COW is not
triggered when it should be. Here is an example:
<?php
function foo() {
$arr = array(0,10,20,30);
foreach ($arr as $k => &$v) {
$v += 100;
if ($k == 1 && !isset($arr2)) { $arr2 = $arr; }
}
unset($v);
var_dump($arr2);
}
foo();
=== Current output under HipHop ===
array(4) {
[0]=>
int(100)
[1]=>
int(110)
[2]=>
int(120)
[3]=>
int(130)
}
In the example above, the 3rd and 4th elements of $arr2 were incorrectly
modified by the foreach by reference loop. The problem is that the strong
iterator only checks if the array's refcount is greater than 1 before the
first iteration of the loop, but it doesn't check the array's refcount for
subsequent iterations.
I fixed this by making the strong iterator check the array's refcount each
time the iterator is advanced. If the refcount is greater than 1, a copy is
made and the strong iterators are migrated from the original array to the
copy.
In the course of implementing this fix, I made VectorArray escalate to
ZendArray whenever it is used with foreach by reference. This helped avoid
complications in cases where a VectorArray would need to escalate to
ZendArray while a foreach by reference loop is iterating over it. I also
took the opportunity to clean up some of the logic that deals with
escalating an array when a foreach by reference loop begins.
Zend PHP 5.4 raises a warning when code such as "$x = null; $x->foo = 1;"
causes a null, false, or empty string value to be promoted to a stdClass.
This diff updates HipHop to raise this warning as well.
Parts of the runtime call TranslatorX64::Get(), which allocates
a TranslatorX64, but doesnt fully initialize it. Bad things happen
when we then delete it later.
This gets rid of the processInit call (which is kind of a misnomer
now anyway), and moves the code into the constructor.
Given something like:
if ($this instanceof foo) {
}
We can infer the type of this inside the if block to be foo, assuming
that foo is stronger than the known type of $this.
But if foo is a completely unrelated class, we shouldnt do that, because
we could generate incorrect code at compile time (of course, we could also
optimize out the if entirely - but currently dont).
The code to reconcile an actual type with an asserted type assumed the
actual type was passed first, but in one place, we passed the asserted
type first. This was only a problem if the assertion was impossible.
Registers information with gcc so we can throw exceptions
through TC frames. Changes EMIT_CALL to only clean caller-saved
regs---the unwinder will let us clean dirty callee-saved regs during
an exception so the interpreter-based stack unwinding will work.
mmap on Linux requires a write lock in the kernel which will
block page-faults as well as madvise. gethostbyname will call mmap via
glibc when opening and doing IO on /etc/hosts, so keep /etc/hosts open
permanently by calling sethostent(1)
During a recent experiment I copied disassembly of some code generated by
HHVM into a .S file and compiled it, and I noticed that we were using a
6-byte encoding for "mov imm32,reg32" when a 5-byte encoding was possible.
This diff fixes the x64 assembler to take advantage of the better encoding
for "mov imm32,reg32".
I also noticed that the jmp() and call() methods in X64Assembler don't take
advantage of emitImmReg goodness, so I updated them appropriately.
Both HPHP and the system's versions of these headers
were using the same guard defines which means that
whichever version got included first, won.
Rename ours to _HPHP_DWARF_H_ and _HPHP_ELFWRITER_H_
Linux makes it relatively easy these days to hint that 2MB pages
might be a good idea. Doing this seems to be good for about 5% CPU.
iTLB misses are down by about 30%.
is_subclass_of('X', 'X') was returning true under hhvm.
is_subclass_of, method_exists, and get_class_methods were failing
to call __autoload appropriately.
That way, idle threads automatically have their targetcaches
purged. This matters, because typically we have a large number
of threads early on while the server warms up, and then considerably
fewer once everything settles down. With this change, we only pay
for the memory of the active threads, rather than all threads that
have ever been active.
The underscore version of IdPrefix can currently never
work due to uses of $$ being baked into some extensions.
Remove this override for now pending macroization of label
mangling and a proper cmake detection script so that the OSS
build only falls back on other delimited when it has to.
Summary:
This script is slightly too eager to add
INSTANCE_METHOD_INJECTION hooks as it matches sections
like:
/*
void HPHP::c_XMLWriter::t___construct()
*/
TypedValue* tg_9XMLWriter___construct(HPHP::VM::ActRec *ar) {
as being the c_XMLWriter::t___construct() method from the
original file.
A proper fix would be to make the pattern more restrictive,
but since this over-eager matching is only problematic for
the ext_hhvm generated files, I'm inclined to solve this by
excluding those files, rather than making the regex more
complex and harder to maintain.
Explicitly include ucontext.h and define sighandler
Wrap access to the instruction pointer register in a macro
since mcontext_t is different on FreeBSD.
libmemcached doesn't actually mutate this value,
it seems to just be an oversight by the library maintainers.
Regardless, it causes build failures on some platforms.
Targetcache memory is allocated on thread startup, but was
never freed. http threads never die, so its not an issue there,
but eg ParallelQuery threads are created a freed on demand.
g_persistentObjects->get() may (likely) return NULL,
skip trying to restore a non-existant connection
(and testing it for liveness) in that case.
If the connection is dead, don't mark it for saving in rshutdown.
In Linux 3.2, the vdso implementation of clock_gettime/clock_gettime_ns is
extremely fast (faster than vtsc), so implement the infrastructure to
support using it.
This mutex shouldn't be recursive, and it should play nicely with our
ranking system. While trolling through the mutexes, pack them a bit
more nicely and expose primitives for asserting ownership.
A simple, slow, but hopefully functional backup cycle collector
and cyclic garbage detector. Adds an entry point to collect cyclic
garbage, and one that will dump the cyclic garbage as GML without
collecting it.
libiconv sometimes defines the second parameter of its
main function as (char**), and sometimes as (const char**) but
provides no means to detect this. For FB builds, we want (char**)
and explicitly cast as such already, for some OSS builds, this is
backwards.
builtin_functions.h can't compile on
platforms where sys/param.h gets included later
in the #include stack than builtin_functions.h
By explicitly including it here and clearing the
macro, we ensure it doesn't get reset.
Using SmartAllocator instead of malloc for these auxilary buffers
reduces fragmentation of malloc's small-size arenas, and avoids
the need to free at request-end.
This diff factors allocData() out of reallocData(); allocData
is only for first-time allocations and reallocData is only for
growing the array buffer. Each smart-allocated size class has
its own SmartAllocator, and allocating and freeing these requires
some branchy logic; I tried to keep the boilerplate to a minimum.
It was possible for C++ exceptions to propagate when setting
up a nested VM, but the nested VM didn't handle it and just let it fly
out to the containing VM. The previously nested VM would end unwinding
after this first nested ActRec, and then popVMState as if it were the
nested VM. The net result is we would unwind for all but the first VM
entry, because the m_nestedVMs state was off by one.
This fixes a test that crashed the JIT: in that case, we were
re-entering to try to generate the 500 page, attaching the global VarEnv
to some random ActRec that should've been unwound, and then throwing again
(so this was noticed by seeing the VarEnv depth refcounts going wrong).
We burned in STDERR, which changes from request to request. We
also, amazingly, fail to correctly handle the first access to
(some?) system constants in a request when they happen from the JIT.
Jenkins slow_tests runs keep having distcc problems and we
think lowering the parallelism of the test runs might help lessen the
load on the machines and keep things going more smoothly.
This commit changes the behavior of next in HHVM to be line aware.
1. Changes to the next command to skip lines
2. Changes to the breakpoint system to prevent re-execution of the
breakpoint when it returns from a function call.
This drops the check from two dependent loads and a test to
a single load/test. Code is visibly smaller, faster, better, etc. Also
fix BaseExecutionContext::setRequestMemoryBytes to not fall over when
given -1 from user code.
We had a flurry of failures trying to create implausibly large
strings. We had a user-provided 64-bit int used as an index, but a derived
32-bit int for its size.
HphpArray and VectorArray benefit from inline-allocating
so lets do it in ZendArray. This inline-allocates just
the bucket pointer array, up to four elements.
It also drops the m_nTableSize field since it's always
equal to m_nTableMask + 1, and shuffles fields in ZendArray
to put hot fields near the beginning of the object.
Due to a bug, in certain cases involving redeclared classes, type
inference was suboptimal unless a method required more than one
pass. For this case, the extra pass only happened very infrequently
(as a result of lock conflicts in the parallel inference algorithm).
Once the bug was fixed, the failure was consistent - because code gen
wasnt taking account of another rare occurrance relating to redeclared
classes. Fixed the code gen bug too.
Unreachable code could cause an assertion in the emitter.
Variables that get used in unreachable blocks got tagged as
uninited (why not?). But eg $this could also get tagged as
having a particular class, resulting in an assertion.
Also fix an HphpArray bug when jemalloc is disabled.
1) Modify allocInputReg to allow specifying a preferred register,
and add a helper to target inputs to call-argument registers.
Use the helper for CGetM and SetM and Same to get better register usage.
2) VerifyParamType doesnt read its input unless its an Object
3) Splitting a tracelet after a literal instruction causes the
result to be spilled, and then tested/reloaded from the stack
in the next tracelet. Avoid doing so where possible.
This gives the init document a clean way to prevent any further
action (an existing but less satisfactory solution was to
throw an exception - which would be ignored).
The comments didnt match the behavior. In one case,
we weren't decRefing a key where we should have,
causing a (small) leak.
In addition, by adding a few more helpers, we can
merge the SetM with a following Pop, and often with
a preceding CGetL.
HphpArray uses a minimum table size of 8 elements and always allocates using
malloc. Instrumentation shows that 85% of arrays are <= 4 elements,
and if we make room for a 4-entry table (up to 3 elements) inline,
we can save a bunch of cycles and memory for small arrays.
Initially we use the inline table. We also union the inline table with
a medium sized hashtable, but without slots, so for medium sized arrays,
we only need to malloc memory for the slots, not the hashtable.
This change also reduces MinLgTableSize from 3 to 2, to reflect the
smaller initial sizes we support, and fixes a small memory accounting
but where HphpArray::sweep() didn't call adjustUsageStats().
- Make the dataflow pass we were using more general;
it was too specialized towards detecting "guarded"
expressions such as X::foo(); ...; X::bar() where the
X is known to exist in the call to bar, and didnt
handle propagation of information about variables
properly.
- Mark type-hints as guaranteed non-null so we can
avoid guarding on them
- Mark definitely-not-inited variables
- Mark definitely-not-reference-typed variables
- Fix the emitter to handle setting meta-data on
the inputs of SetL correctly.
In SmartAllocator.alloc() we execute if (hhvm) .. if (m_stats.maxBytes > 0)
on every smart allocation, which is a very hot path. We can eliminte this
check by just ensuring maxBytes is always valid (use MAX_INT64 instead of
0 or -1 to suppress the check).
Perflab says this saves about 1% CPU time, just above the noise
margin, but CPU instructions are measurably down and it's obviously
good to streamline the allocator hot path.
Currently, gcc 4.6 is required to build hphp
due to earlier versions being unable to resolve the
function pointer from the template directly into a void*.
By explicitly typedefing these functions, gcc 4.4
(and possibly some earlier versions) can resolve the
function and allow builds to complete.
gdb tends to get confused if we emit dwarf info
for a and astubs together in the same ELF file.
gdb assumes that all ELF files emitted contain
code addresses in contiguous range.
If a single ELF file contains dwarf file for both a and
astubs, gdb assumes all intermediate addresses must
also belong to this ELF file, and this causes
symbol name lookup in gdb to fail for intermediate
addresses that may be present in other ELF files.
set_exception_handler() currently can only return a string
which is fine if the prior exception handler was a normal
function. However the following cases yield incorrect values:
* No prior exception handler set.
* Prior exception handler in array form:
* array($class, $staticmethod)
* array($object, $instancemethod)
* Prior exception handler invokable object instance (e.g. lambda)
Many code paths in VectorArray invoke ArrayData interface
methods when we know the target is VectorArray. Qualify
these calls to avoid the indirect call.
Added an inline helper inRange(index, size) function and
use it instead of VectorArray::exists() in several places.
The type of the result of SetM isnt always the same as
the type of the rhs. This was leading to incorrect
decrefs, and use-after-free issues.
If we cant prove its the same, mark it as predicted so
we insert a runtime check/side exit.
- array_setm treated all keys as strings if the array
was not an HphpArray
- iter_value_cell_local, and iter_key_cell_local assumed
the local was not a reference
Static analysis computes a "guarded" flag for various constructs
that can fatal (such as $this->) to indicate that such a construct
has already been seen.
We can use this flag in the translation of OpThis to avoid repeatedly
testing for null, and in the translation of OpRetC to unconditionally
decRef ar->m_this.
Also fixed some compiler generated byte code for closures that was
emitting strings for vector operations (rather than using literals
in the vectors).
Include malloc.h in util/alloc.h when jemalloc unavailable.
Fix bug when extra arguments are passed to a function that creates a Var
Use intrusive list instead of std::list to store VarEnvs chain
Use a freelist embedded inside the object in smart allocator
Fix a potential crash when toString doesnt exist
Garbage-collect the TC.
Fix some use-after-free issues
Fix a couple of fast_tests
Fix verification of static type inference
Dont generate unnecessary guards for type predictions
Enable syslog output.
[tc-print] Clean up types in tc-print
Re-enable the "MayUseVV" optimization
Blacklist a given function/method when hphpd evals inside it
Add missing check for NULL in VariableTable::getVariablePrefix()
[tc-print] Show bc-stats-event_name.txt as a generated file in tc-prod-p
[tc-print] Cosmetic changes to how we print translation summaries
Perform actual liveness check on pfsockopen()
Add an API to control whether exception backtraces include $this objects
Minor typo fix: Annote -> Annotate
[servicerouter] bump fbcode version
[tc-print] Use PEBS-supported events
[tc-print] Collect inclusive stats
Clean up method lookup
Streamline Unit::merge, and reduce size of Unit
Fix casting resources to strings in the translator
Add request_alloc(); use it for non-ObjectData instead of ALLOCOBJSZ
Take two at reverting short array notation
Make more requires "mergeable"
Fix TestCodeRun-Exit
Change VectorArray to allocate values contiguously
Force file removal in merge-me-to-rc and merge-rc-to-release.
Run all C++ tests on each hphp diff sent out
Fix uncompilable code gen
Move HPHP::VM::Verifier::Arena to HPHP::Arena
Add a TinyVector<> class that can replace some uses of vector; use it some
ServiceRouter: bump version in hphp after single host fix
HPHP: get building with gcc-4.7.1
HPHP extension for textGetHyperlinks
Export translation counters through the hardware counters interface
Fix Makefile
HphpArray methods don't need to be virtual, clean up inline decls.
Remove a few things we don't follow from coding_guideline
Fix bugs with attributes for closures and methods imported from traits
Remove some new using directives that crept in after earlier removal
Add a check for lowercase extension function names in gen_ext_hhvm.php
[tc-print] More detailed summary of events
[tc-print] Changes to tc-prod-collect
Allow COMPILER_ID to be overridden on the command line
Remove boolean args from HphpArray::addVal[WithRef]
Fix NameValueTable load factor check.
Reduce StringBuffer memory usage
Fix compilation errors
Update login for megabench test user
make zend array bucket smaller
Clean up HphpArray code
Fix parser to allow xhp classes as attribute type hints
Enable the use of static analysis types in RepoAuthoritative mode
Fix compilation errors in HphpArray without USE_JEMALLOC
Remove extra element alignment and casting in HphpArray
Revert "HPHP extension for textGetHyperlinks"
bump fbcode revision to include D507348
HPHP extension for textGetHyperlinks
Remove KindOfIndirect support from HphpArray, add new VarEnv implementation
Teach the hhas assembler about string and int vector immediates
Allow ':' to be followed by T_XHP_LABEL
[tc-print] Add top translations summary
Fix string buffer overflows and other issues
Make type-profiling name-driven, and fix bugs.
Summary: Added plumbing to the cmake scripts to enable hhvm builds.
To build the VM, export USE_HHVM=1, and follow the wiki instructions
to do a clean build.
HipHop is a source code transformer which transforms PHP source code into highly optimized C++ and then compiles it using g++. Currently supported platforms are Linux and FreeBSD. There is no OS X support.
HipHop is a high performance PHP toolchain. Currently supported platforms are Linux and FreeBSD. There is no OS X support.
Alguns arquivos não foram exibidos porque demasiados arquivos foram alterados neste diff
Mostrar Mais
Referência em uma Nova Issue
Bloquear um usuário
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.