Having everybody and their uncle reading and writing fields out
of a.code and stubs.code was making assembler work hard. This replaces most
reads with accessors, and all writes with structured friends of
X64SAssembler.
We can get rid of all the lock contention by using
tbb::concurrent_hash_map. Unfortunately tbb::concurrent_hash_map doesn't let us
iterate over it while we concurrently do inserts. Work around this limitation by
keeping a copy of all the counter keys in a concurrent_vector which does let us
concurrently insert and iterate.
On lookup / insert, we first look into concurrent_hash_map for the key. If key
already exists, return the value. Otherwise, we insert into both the map and the
list.
On export, we iterate over the list and look up each key from the map.
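A minimal sketch of the lookup / insert / export protocol described above. It deliberately swaps in a mutex-guarded std::unordered_map and std::vector as stand-ins for tbb::concurrent_hash_map and concurrent_vector so it builds without TBB. The class and method names (CounterRegistry, getOrCreate, exportAll) are hypothetical and not taken from the change itself.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical registry: a map for lookups plus a key list for iteration.
// The single mutex stands in for the concurrency that the TBB containers
// would provide; it is NOT how the real change avoids lock contention.
class CounterRegistry {
 public:
  // Lookup / insert: return the existing counter, or insert the key into
  // both the map and the key list.
  std::atomic<int64_t>* getOrCreate(const std::string& key) {
    std::lock_guard<std::mutex> guard(m_lock);
    auto it = m_map.find(key);
    if (it != m_map.end()) return it->second.get();
    auto res = m_map.emplace(key, std::make_unique<std::atomic<int64_t>>(0));
    m_keys.push_back(key);
    return res.first->second.get();
  }

  // Export: iterate over the key list and look each key up in the map.
  std::map<std::string, int64_t> exportAll() {
    std::lock_guard<std::mutex> guard(m_lock);
    std::map<std::string, int64_t> out;
    for (const auto& key : m_keys) {
      out[key] = m_map.at(key)->load();
    }
    return out;
  }

 private:
  std::mutex m_lock;
  std::unordered_map<std::string, std::unique_ptr<std::atomic<int64_t>>> m_map;
  std::vector<std::string> m_keys;
};
```

Counters are held by pointer so the address handed back to a caller stays valid when the map rehashes.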
Add an option to allow hphp's request queue to dynamically switch between FIFO
and LIFO. By default everything is still FIFO like today. Setting
ServerThreadJobLIFO to true turns everything to LIFO like today. If
ServerThreadJobLIFOThreshold is also set, then we do FIFO up until the queue
length hits the threshold. Once the threshold is crossed, workers will take
requests from the end of the queue. Once the queue size shrinks below the
threshold, we resume FIFO order.
This behavior helps us prioritize newer requests when the server is loaded.
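The switching rule can be sketched as below. This is an illustration only: AdaptiveJobQueue and its members are hypothetical names, the sketch is single-threaded, and it assumes the LIFO side kicks in when the length is at or above the threshold (the text does not say which side "equal" falls on). A threshold of 0 then behaves as plain LIFO.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>

// Hypothetical, non-thread-safe model of the dequeue policy.
template <typename Job>
class AdaptiveJobQueue {
 public:
  // lifo mirrors ServerThreadJobLIFO; lifoThreshold mirrors
  // ServerThreadJobLIFOThreshold (0 is assumed to mean "not set").
  AdaptiveJobQueue(bool lifo, std::size_t lifoThreshold)
    : m_lifo(lifo), m_threshold(lifoThreshold) {}

  void enqueue(Job job) { m_jobs.push_back(std::move(job)); }

  bool empty() const { return m_jobs.empty(); }

  // Precondition: !empty().
  Job dequeue() {
    // Take the newest job only when LIFO is enabled and the queue is long
    // enough; otherwise serve the oldest job.
    if (m_lifo && m_jobs.size() >= m_threshold) {
      Job job = std::move(m_jobs.back());
      m_jobs.pop_back();
      return job;
    }
    Job job = std::move(m_jobs.front());
    m_jobs.pop_front();
    return job;
  }

 private:
  bool m_lifo;
  std::size_t m_threshold;
  std::deque<Job> m_jobs;
};
```

With a threshold of 3 and four queued jobs, the two newest jobs are served first, then the queue drops below the threshold and the remaining jobs are served oldest first.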
This is a simpler / cleaner version of fbcode's ServiceData for hphp. Currently we
support only flat counters, MultiLevelTimeSeries and Histograms. We can add more
stats types later on as needed.
ServiceData is a global entry point for all this stuff. The current idea is to
completely decouple data input and export. ServiceData internally has three
separate maps tracking flat counters, timeseries and histograms. These maps are
wrapped by spin locks and protected by folly::Synchronized.
ServiceData provides three functions to create/retrieve counter objects. The
counter objects are thread safe (protected again by spin locks and
folly::Synchronized).
Retry of D848605 which doesn't segfault the open-source build.
Formerly, this change caused the assembler to emit an opsize
prefix before ret instructions taking a 16-bit immediate, which
worked on our machines but caused problems for Travis CI.
There was a confusing comment left in the assembler
that said that we can't encode a decl r32 instruction, because
the opcode is the same as a REX byte. This is untrue in our
assembler, because we use the FF /1 encoding, rather than the
48 one (which is a REX byte and would be a problem).
Some tests were added to show that this is fine.
Formerly, trying to emit a 16-bit instruction, such as
with test_imm16_disp_reg64(...), would actually emit a testl
instead of a testw. This would emit an 8-byte instruction
(ex. 41 f7 45 0e 01 01 00 00) and possibly be testing the wrong
thing. Now, when you emit a word-size instruction, it will prefix
with 66 and use the correct operand size, saving us a byte
(ex. 66 41 f7 45 0e 01 01).
I also converted the instruction emission in ContPreNext to
use the new style assembly emission.
SimpleLableTest used rbx, r10, r11, r15 without saving them,
causing lots of potential issues.
RandomJunk relied on white space at the end of a line - but our
master.emacs is set up to strip trailing space on save. I rewrote
the test to avoid having to remember to override every time.
I was learning from @jdelong and he said that you should use
double quotes for local includes and angle brackets for library
includes. I asked why our code was the way it was, and he said he wanted
to clean it up. I beat him to it :)
Conflicts:
hphp/runtime/base/server/admin_request_handler.cpp
hphp/runtime/vm/named_entity.h
Only in RepoAuthoritative mode, where units can't be
deallocated. I have a semi-reproducible bug where a unit's m_bc
region is getting corrupted (only occasionally in perflab).
Presumably moving it out of the malloc'd heap will make the bug
corrupt something else, so this probably doesn't really help much, but
it seemed like having a separate region for some cold, read-only,
process-lifetime metadata might make sense.
The belief that m_type is an int32_t (and in at least one
place, an int64_t) was burned into many places. This makes any kind of
re-encoding of TypedValues nearly impossible.
Redirect all such accesses via some helpers, so e.g.
a.cmpl(KindOfUninit, base[TVOFF(m_type)]);
becomes
emitCmpTVType(a, KindOfUninit, base[TVOFF(m_type)]);
which may do byte or dword access, depending on m_type's actual size. While
this is motivated by 7pack, I'm planning to route it through trunk to
prevent any more of the old style accesses from cropping up.
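A rough sketch of the idea behind such a helper, assuming the access width is picked from the size of the m_type field at compile time. Everything here is hypothetical scaffolding: RecordingAsm only logs mnemonics and is not the real assembler, WideTV and PackedTV are made-up layouts, and this emitCmpTVType takes a plain offset instead of the real memory operand.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical recorder standing in for the assembler: it logs text only.
struct RecordingAsm {
  std::vector<std::string> log;
  void cmpb(int imm, int off) { record("cmpb", imm, off); }
  void cmpl(int imm, int off) { record("cmpl", imm, off); }
 private:
  void record(const char* op, int imm, int off) {
    log.push_back(std::string(op) + " $" + std::to_string(imm) + ", " +
                  std::to_string(off) + "(base)");
  }
};

// Two made-up TypedValue layouts that differ only in the width of m_type.
struct WideTV   { int64_t m_data; int32_t m_type; };
struct PackedTV { int64_t m_data; int8_t  m_type; };

// Pick a byte or dword compare from the declared size of m_type, so call
// sites no longer hard-code the width.
template <typename TV>
void emitCmpTVType(RecordingAsm& a, int kind, int off) {
  if (sizeof(TV::m_type) == 1) {
    a.cmpb(kind, off);
  } else {
    a.cmpl(kind, off);
  }
}
```

Call sites written against the helper stay unchanged if the field later shrinks; only the helper needs to know the width.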
These rely on the hilarious BSR and BSF instructions on x64. ARMv8 has
an instruction that does something similar: count leading zeros.
Unfortunately, their semantics differ in a few important ways:
- BSR and BSF, when given a zero input, set the zero flag and have
undefined output. CLZ outputs 64.
- BSR and BSF return the index (starting from 0 at the LSB) of the most- or
least-significant 1 bit. CLZ does what it says: count leading zeros.
Bottom line, there has to be some bit trickery somewhere to fit the two
instructions into the same interface. I kept the x64 implementations the
same and put all the nastiness in the ARM implementations to make them
match, since obviously the perf of the ARM implementations doesn't
matter yet.
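For illustration, the bit trickery needed to get BSR/BSF-style results out of a count-leading-zeros primitive might look like the sketch below. It is portable C++ rather than ARM code: clz64 is a software stand-in that returns 64 for a zero input, and the names bsr64 / bsf64 and the bool-plus-out-parameter interface are assumptions, not the interface from this change.

```cpp
#include <cassert>
#include <cstdint>

// Software stand-in for a 64-bit count-leading-zeros instruction.
// Returns 64 when x is zero.
inline unsigned clz64(uint64_t x) {
  unsigned n = 0;
  for (uint64_t mask = 1ULL << 63; mask != 0 && (x & mask) == 0; mask >>= 1) {
    ++n;
  }
  return n;
}

// BSR-like: index (from 0 at the LSB) of the most-significant 1 bit.
// Returns false for a zero input, where BSR's output is undefined.
inline bool bsr64(uint64_t x, unsigned& out) {
  if (x == 0) return false;
  out = 63 - clz64(x);
  return true;
}

// BSF-like: index of the least-significant 1 bit.
// x & (~x + 1) isolates the lowest set bit, so the same CLZ trick applies.
inline bool bsf64(uint64_t x, unsigned& out) {
  if (x == 0) return false;
  out = 63 - clz64(x & (~x + 1));
  return true;
}
```

The explicit zero check plays the role of the zero flag that BSR and BSF set on x64.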
This change is mostly for FB internal organizational reasons.
Building is not affected beyond the fact that the target now
lands in hphp/hhvm/hhvm rather than src/hhvm/hhvm.