I made this change in my arrays working branch and liked it
enough to split it out and test/submit it early. treating
the range of valid Elm slots as a vanilla open-ended range
[0..m_used) simplifies a bunch of code.
They aren't used anymore, except to implement isVectorData(),
which we can make pure-virtual and implement more efficiently
in each subclass of ArrayData.
nvSet() only casts the value from TypedValue* to const Variant&; do it
at callsites. Inlined array_setm_ik1_v0() and array_setm_s0k1_v0() into
their only remaining callsites in translator-runtime.cpp.
The in-situ allocation case can be easily detected by checking m_data == m_inline_data.slots. Therefore there's no need for two distinct flags to track current and future allocation strategy.
emitAddNewElemC didn't properly implement the specified
behavior for this opcode. It assumed topC(1) was always an non-static
array with refcount 1 or 0. This changes to interp the case when it's
not an array, and make the helper support static arrays. (If it
matters for perf, we should engineer the bytecode spec to require this
as well, since the point of this opcode is initializing array literals
that can't be done as static arrays.) In practice I think this is
only working because it always comes after a NewArray with a capacity
hint != 1, which means it's a newly allocated non-static array.
Ignoring the capacity hint would've broken it.
ArrayData gets on a diet, down to 32 bytes
devirtualized destructor call in HphpArray::release()
aggressively inlined constructors for ArrayData (do away with defaulted arguments)
eliminate a dead memset() for the hashtable in HphpArray(uint size, const TypedValue* values)
I've done this refactoring without efficiency in mind - it's one of the steps toward more aggressive in-situ allocation. Yet to my surprise perflab rewarded this simple refactoring with a whopping decrease in I-TLB misses. I myself am wondering what has caused this. Anyhow let's run perflab once more and get this baby on the road.
I've run a bunch of experiments on HphpArray speed and am still yet to have a major breakthrough. However, there are a few clear small winners that I'm submitting with this diff. CPU instructions, CPU loads, and CPU stores are all as green as the Eternal Hunting Grounds. Speed is drowned in noise but I sususpect will be measurable on these changes combined.
I'm committing my work on HphpArray in bite-sized pieces for easier review. This piece replaces calls to NEW with a factory method. There are many problems with operator new, starting with the fact that the allocator cannot communicate properly with the constructor.
The newly introduced factory method should return ArrayData but that causes many issues right now, so I left that step to a future diff.
I was learning from @jdelong and he said that you should use
double quotes for local includes and angle brackets for library
includes. I asked why our code was the way it was, and he said he wanted
to clean it up. I beat him to it :)
Conflicts:
hphp/runtime/base/server/admin_request_handler.cpp
hphp/runtime/vm/named_entity.h
PolicyArray splits the ArrayData implementation in two parts. ArrayShell implements the baroque ArrayData implementation in terms of a much smaller statically-bound core that concerns itself exclusively with the storage strategy for the array. This way the two aspects can be worked on separately, and different stores can be easily plugged into the given ArrayShell.
To give a better perspective, the featured SimpleArrayStore has about 340 lines all told (including solid documentation), whereas ArrayShell has some 1100 lines. The shell can be reused with other stores of unbounded sophistication. The store needs to implement only 22 primitives, some of which are trivial. Basically a new store implementation saves 1100 of difficult-to-get-right lines of code right off the bat, and only needs to focus on implementing 22 well-defined primitives.
Things to watch for when reviewing:
- Is the store API small/expressive enough? What should be added/removed?
- Are various forms of duplication present? If so, how could code be factored better?
- Any subtle change in semantics from HphpArray and friends that should be minded?
Known issues:
* Currently ArrayShell is defined as:
class ArrayShell : public ArrayData, private SimpleArrayStore { ... };
when in fact it should be:
template <class StorePolicy>
class ArrayShell : public ArrayData, private StorePolicy { ... };
Extenuating circumstances related to allocator design prevent use of templates at this point. I'll work on these in parallel with the review.
* growNoResize is almost always prefaced by a test-and-grow. This sequence should be encoded in a function.
* The growth policy is spread all over the place in a subtle form of duplication.
* The current store design makes almost no effort to be particularly efficient.
This is a partial step towards merging the HPHP::VM namespace
up into its parent. To keep it reviewable/mergeable I'm not doing
everything at once here, but most of the code I've touched seems
improved. I've drawn an invisible line around the jit, Unit and
its cohort (Class, Func, PreClass, etc.); we'll get back to them
soon.
Streamline the array access methods by always returning an array
pointer instead of a new pointer or null. Callsites compare
(new != old) to detect escalation, rather than (new != null).
Replaces the awkward getFullPos() and setFullPos() methods with a more
intuitive advanceFullPos() method. This refactoring also reduces the number
of virtual calls made when doing mutable iteration on an array.
Fixing various bugs all over the VM that make assumptions about RefData
and TypedValue layout. Here are the assumptions fixed by this diff:
offsetof(RefData, m_tv) == 0. Both JIT's assumed this in many subtle
ways, by punning RefData* as TypedValue* without adding an offset.
This assumption also causes RefData._count to overlap TypedValue.m_aux,
which constraints TypedValue layout.
offsetof(TypedValue, m_data) == 0. gen_ext_hhvm.php assumes you
can cast TypedValue* to Value*; the JITs often weren't using
offsetof(TypedValue, m_data) in their addressing calculations. HHIR
assumed return-by-value TV's have m_data/m_type in rax/rdx, which
can change when TV layout changes.
offsetof(TypedValue, m_type) > 8 is an assumption baked into the
pass-by-value register assignment logic in HHIR's codegen.cpp; if
the type is in the low word, register assignment is swapped.
sizeof(TypedValue::m_type) == 4. We used dword-sized operations
in both JIT's when accessing m_type. Now, we use helper functions
that are sensitive to sizeof(DataType)
Configuration:
DEBUG=: (opt) same layouts as trunk for RefData & TypedValue
DEBUG=1: (dbg) new RefData layout (m_tv doesn't overlap RefData::_count)
PACKED_TV=1, DEBUG=*: new RefData and TypedValue layout.
Of the four horsemen of the SmartAllocator, ArrayData was the only virtual
call. This meant an extra layer of indirection when coming from the TC
to allow the c++ compiler to emit its virtual call, and slightly larger
callsites when using non-generic paths.
While we're moving in this direction, consolidate ArrayData introspection
on its type enum. isSharedMap() was previously implemented with a vtable
slot, and we had no way of asking if an ArrayData was a NameValueTable.
This mimics what TranslatorX64 does in translateSetMArray,
but it does it with fewer helpers and (often) fewer instructions in
translated code. I also found a bug in both jits and the interpreter
when dealing with arrays that hold refs to themselves. The new test
case exercises the fix, which involved a bit of refactoring of the
refcounting logic.
Enabling VectorTranslator while punting to tx64 is no longer a
regression so I removed the punt in emit().
Access to TypedValue.m_aux must now be via TypedValueAux. For now,
TypedValueAux is an empty subclass with accessors to m_aux, which is
now private. Once RefData.m_tv is moved out from under RefData._count,
we can move TypedValue.m_aux to TypedValueAux.
Removed unnecessary initialization of m_aux.u_hash from c_Vector.
This diff removes initializing stores to TypedValue._count, renames
_count to m_aux, and makes m_aux a union with members typed
and named according to their specialized uses. The few remaining
uses of that field for random tweaks are more obvious and easy to
grep for.
TypedValue no longer extends Value, (allowing m_data to move to a
different offset in the future), and Variant now extends TypedValue,
so we only have to maintain one definition.
HphpArray now explicitly uses TypedValue.m_pad instead of overlapping
TypedValue with an anonymous struct, again so we don't have to maintain
another structure to match TypedValue's layout.
The JIT's were using offsetof(TypedValue, _count) all over the place
for access to String/Array/Object/RefData::_count. Instead, use
FAST_REFCOUNT_OFFSET.
g++-4.7.1 treats "FOO"bar as a c++-11 literal operator, even
if bar is a macro with an expansion such as "BAR" - so add a space
after the quote (this seems like a bug, and I fixed a bunch of these
a while ago, but we just added a slew of PRI*64 macros which break
under 4.7.1).
Also, it warned that "explicit by-copy capture of 'this' redundant"
for a lambda declared [=, this] - so I removed the this.
We also needed more than the 60 levels of template expansion that was
allowed by the makefile.
This diff refactors some of the VM's logic for iterators (with a focus on
mutable iteration), delivering several improvements:
1) MIterCtx was renamed to MArrayIter, and the m_key and m_val fields
were eliminated.
2) Eliminated the need for MArrayIter to dynamically allocate a
MutableArrayIter object, and removed other layers of indirection as
well.
3) Reduced the size of HPHP::VM::Iter from 64 bytes down to 32 bytes.
4) Removed the "if (siPastEnd())" check when adding a new element to an
HphpArray or a ZendArray.
5) Moved all of the iterator logic into a single .cpp file.
This diff reworks FullPos's to point to current element instead of pointing
to the next element. It also splits up the IterFree instruction into two
instructions (IterFree and MIterFree). These changes allowed various logic
to be simplified and data structures to be reduced in size. There is
definitely more opportunity for refactoring, but I know the JIT helpers for
iteration have been carefully tuned and so I'll leave further refactoring
for future diffs.
Finally, I spent a little time cleaning up the bytecode spec a bit, mostly
with respect to iteration.
Per @mwilliams' suggestion, this is the first stage in a staggered approach to replacing int64 with int64_t. More precisely I inserted "typedef ::int64_t int64;" in util/base.h and dealt with the consequences.
This change is mostly for FB internal organizational reasons.
Building is not effected beyond the fact that the target now
lands in hphp/hhvm/hhvm rather than src/hhvm/hhvm.