1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that the particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) doesn't get created until the
33// main function body, after the prologue is run. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | callee-saved gpr registers | <--.
48// | | | On Darwin platforms these
49// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
50// | prev_lr | | (frame record first)
51// | prev_fp | <--'
52// | async context if needed |
53// | (a.k.a. "frame record") |
54// |-----------------------------------| <- fp(=x29)
55// | <hazard padding> |
56// |-----------------------------------|
57// | |
58// | callee-saved fp/simd/SVE regs |
59// | |
60// |-----------------------------------|
61// | |
62// | SVE stack objects |
63// | |
64// |-----------------------------------|
65// |.empty.space.to.make.part.below....|
66// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
67// |.the.standard.16-byte.alignment....| compile time; if present)
68// |-----------------------------------|
69// | local variables of fixed size |
70// | including spill slots |
71// | <FPR> |
72// | <hazard padding> |
73// | <GPR> |
74// |-----------------------------------| <- bp(not defined by ABI,
75// |.variable-sized.local.variables....| LLVM chooses X19)
76// |.(VLAs)............................| (size of this area is unknown at
77// |...................................| compile time)
78// |-----------------------------------| <- sp
79// | | Lower address
80//
81//
82// To access data in a frame, a constant offset from one of the pointers
83// (fp, bp, sp) must be computable at compile time. The sizes of the areas
84// with a dotted background cannot be computed at compile time if they are
85// present, so all three of fp, bp and sp must be set up in order to access
86// all contents of the frame areas, assuming all of the frame areas are
87// non-empty.
88//
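// As an illustrative sketch (register numbers and offsets here are
// hypothetical, not taken from generated code): a fixed-size local can be
// addressed with a constant offset from fp, e.g.
//   ldr x0, [x29, #-24]
// an outgoing argument below the VLA area is addressed from sp, e.g.
//   str x1, [sp, #8]
// and an over-aligned local in the presence of VLAs is addressed from the
// base pointer, e.g.
//   ldr x2, [x19, #16]
//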
89// For most functions, some of the frame areas are empty. For those functions,
90// it may not be necessary to set up fp or bp:
91// * A base pointer is definitely needed when there are both VLAs and local
92// variables with more-than-default alignment requirements.
93// * A frame pointer is definitely needed when there are local variables with
94// more-than-default alignment requirements.
95//
96// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
97// callee-saved area, since the unwind encoding does not allow for encoding
98// this dynamically and existing tools depend on this layout. For other
99// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
100// area to allow SVE stack objects (allocated directly below the callee-saves,
101// if available) to be accessed directly from the framepointer.
102// The SVE spill/fill instructions have VL-scaled addressing modes such
103// as:
104// ldr z8, [fp, #-7 mul vl]
105// For SVE the vector length (VL) is not known at compile-time, so
106// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
107// layout, we don't need to add an unscaled offset to the framepointer before
108// accessing the SVE object in the frame.
109//
110// In some cases when a base pointer is not strictly needed, it is generated
111// anyway when offsets from the frame pointer to access local variables become
112// so large that the offset can't be encoded in the immediate fields of loads
113// or stores.
114//
115// Outgoing function arguments must be at the bottom of the stack frame when
116// calling another function. If we do not have variable-sized stack objects, we
117// can allocate a "reserved call frame" area at the bottom of the local
118// variable area, large enough for all outgoing calls. If we do have VLAs, then
119// the stack pointer must be decremented and incremented around each call to
120// make space for the arguments below the VLAs.
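//
// As a rough sketch (hypothetical sizes, not generated code), with VLAs
// present the sequence around a call might look like:
//   sub sp, sp, #32        // make space for the outgoing stack arguments
//   str x0, [sp]           // pass an argument on the stack
//   bl  callee
//   add sp, sp, #32        // release the argument area again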
121//
122// FIXME: also explain the redzone concept.
123//
124// About stack hazards: Under some SME contexts, a coprocessor with its own
125// separate cache can be used for FP operations. This can create hazards if the CPU
126// and the SME unit try to access the same area of memory, including if the
127// access is to an area of the stack. To try to alleviate this we attempt to
128// introduce extra padding into the stack frame between FP and GPR accesses,
129// controlled by the aarch64-stack-hazard-size option. Without changing the
130// layout of the stack frame in the diagram above, a stack object of size
131// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
132// to the stack objects section, and stack objects are sorted so that FPR >
133// Hazard padding slot > GPRs (where possible). Unfortunately some things are
134// not handled well (VLA area, arguments on the stack, objects with both GPR and
135// FPR accesses), but if those are controlled by the user then the entire stack
136// frame becomes GPR at the start/end with FPR in the middle, surrounded by
137// Hazard padding.
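//
// For example (an illustrative invocation; the exact pipeline is up to the
// user), the padding size can be set when running the backend directly:
//   llc -mtriple=aarch64 -aarch64-stack-hazard-size=1024 foo.ll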
138//
139// An example of the prologue:
140//
141// .globl __foo
142// .align 2
143// __foo:
144// Ltmp0:
145// .cfi_startproc
146// .cfi_personality 155, ___gxx_personality_v0
147// Leh_func_begin:
148// .cfi_lsda 16, Lexception33
149//
150//     stp  xa, xb, [sp, #-offset]!
151// ...
152// stp x28, x27, [sp, #offset-32]
153// stp fp, lr, [sp, #offset-16]
154// add fp, sp, #offset - 16
155// sub sp, sp, #1360
156//
157// The Stack:
158// +-------------------------------------------+
159// 10000 | ........ | ........ | ........ | ........ |
160// 10004 | ........ | ........ | ........ | ........ |
161// +-------------------------------------------+
162// 10008 | ........ | ........ | ........ | ........ |
163// 1000c | ........ | ........ | ........ | ........ |
164// +===========================================+
165// 10010 | X28 Register |
166// 10014 | X28 Register |
167// +-------------------------------------------+
168// 10018 | X27 Register |
169// 1001c | X27 Register |
170// +===========================================+
171// 10020 | Frame Pointer |
172// 10024 | Frame Pointer |
173// +-------------------------------------------+
174// 10028 | Link Register |
175// 1002c | Link Register |
176// +===========================================+
177// 10030 | ........ | ........ | ........ | ........ |
178// 10034 | ........ | ........ | ........ | ........ |
179// +-------------------------------------------+
180// 10038 | ........ | ........ | ........ | ........ |
181// 1003c | ........ | ........ | ........ | ........ |
182// +-------------------------------------------+
183//
184// [sp] = 10030 :: >>initial value<<
185// sp = 10020 :: stp fp, lr, [sp, #-16]!
186// fp = sp == 10020 :: mov fp, sp
187// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
188// sp == 10010 :: >>final value<<
189//
190// The frame pointer (w29) points to address 10020. If we use an offset of
191// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
192// for w27, and -32 for w28:
193//
194// Ltmp1:
195// .cfi_def_cfa w29, 16
196// Ltmp2:
197// .cfi_offset w30, -8
198// Ltmp3:
199// .cfi_offset w29, -16
200// Ltmp4:
201// .cfi_offset w27, -24
202// Ltmp5:
203// .cfi_offset w28, -32
204//
205//===----------------------------------------------------------------------===//
206
207#include "AArch64FrameLowering.h"
208#include "AArch64InstrInfo.h"
210#include "AArch64RegisterInfo.h"
211#include "AArch64Subtarget.h"
215#include "llvm/ADT/ScopeExit.h"
216#include "llvm/ADT/SmallVector.h"
217#include "llvm/ADT/Statistic.h"
234#include "llvm/IR/Attributes.h"
235#include "llvm/IR/CallingConv.h"
236#include "llvm/IR/DataLayout.h"
237#include "llvm/IR/DebugLoc.h"
238#include "llvm/IR/Function.h"
239#include "llvm/MC/MCAsmInfo.h"
240#include "llvm/MC/MCDwarf.h"
242#include "llvm/Support/Debug.h"
249#include <cassert>
250#include <cstdint>
251#include <iterator>
252#include <optional>
253#include <vector>
254
255using namespace llvm;
256
257#define DEBUG_TYPE "frame-info"
258
259static cl::opt<bool> EnableRedZone("aarch64-redzone",
260 cl::desc("enable use of redzone on AArch64"),
261 cl::init(false), cl::Hidden);
262
263static cl::opt<bool> StackTaggingMergeSetTag(
264    "stack-tagging-merge-settag",
265 cl::desc("merge settag instruction in function epilog"), cl::init(true),
266 cl::Hidden);
267
268static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
269 cl::desc("sort stack allocations"),
270 cl::init(true), cl::Hidden);
271
272static cl::opt<bool> EnableHomogeneousPrologEpilog(
273    "homogeneous-prolog-epilog", cl::Hidden,
274 cl::desc("Emit homogeneous prologue and epilogue for the size "
275 "optimization (default = off)"));
276
277// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
278static cl::opt<unsigned>
279    StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
280 cl::Hidden);
281// Whether to insert padding into non-streaming functions (for testing).
282static cl::opt<bool>
283 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
284 cl::init(false), cl::Hidden);
285
286static cl::opt<bool> DisableMultiVectorSpillFill(
287    "aarch64-disable-multivector-spill-fill",
288 cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
289 cl::Hidden);
290
291STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");
292
293/// Returns how much of the incoming argument stack area (in bytes) we should
294/// clean up in an epilogue. For the C calling convention this will be 0, for
295/// guaranteed tail call conventions it can be positive (a normal return or a
296/// tail call to a function that uses less stack space for arguments) or
297/// negative (for a tail call to a function that needs more stack space than us
298/// for arguments).
303 bool IsTailCallReturn = (MBB.end() != MBBI)
305 : false;
306
307 int64_t ArgumentPopSize = 0;
308 if (IsTailCallReturn) {
309 MachineOperand &StackAdjust = MBBI->getOperand(1);
310
311 // For a tail-call in a callee-pops-arguments environment, some or all of
312 // the stack may actually be in use for the call's arguments, this is
313 // calculated during LowerCall and consumed here...
314 ArgumentPopSize = StackAdjust.getImm();
315 } else {
316 // ... otherwise the amount to pop is *all* of the argument space,
317 // conveniently stored in the MachineFunctionInfo by
318 // LowerFormalArguments. This will, of course, be zero for the C calling
319 // convention.
320 ArgumentPopSize = AFI->getArgumentStackToRestore();
321 }
322
323 return ArgumentPopSize;
324}
325
327static bool needsWinCFI(const MachineFunction &MF);
330
331/// Returns true if homogeneous prolog or epilog code can be emitted
332/// for the size optimization. If possible, a frame helper call is injected.
333/// When an Exit block is given, this check is for the epilog.
334bool AArch64FrameLowering::homogeneousPrologEpilog(
335 MachineFunction &MF, MachineBasicBlock *Exit) const {
336 if (!MF.getFunction().hasMinSize())
337 return false;
339 return false;
340 if (EnableRedZone)
341 return false;
342
343 // TODO: Windows is not supported yet.
344 if (needsWinCFI(MF))
345 return false;
346 // TODO: SVE is not supported yet.
347 if (getSVEStackSize(MF))
348 return false;
349
350 // Bail on stack adjustment needed on return for simplicity.
351 const MachineFrameInfo &MFI = MF.getFrameInfo();
353 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
354 return false;
355 if (Exit && getArgumentStackToRestore(MF, *Exit))
356 return false;
357
358 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
359 if (AFI->hasSwiftAsyncContext() || AFI->hasStreamingModeChanges())
360 return false;
361
362 // If there are an odd number of GPRs before LR and FP in the CSRs list,
363 // they will not be paired into one RegPairInfo, which is incompatible with
364 // the assumption made by the homogeneous prolog epilog pass.
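  // For example (illustrative only): with a callee-saved list of x19, x20,
  // x21, LR, FP, there are three GPRs before LR, so x21 would be left
  // unpaired and we conservatively return false below.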
365 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
366 unsigned NumGPRs = 0;
367 for (unsigned I = 0; CSRegs[I]; ++I) {
368 Register Reg = CSRegs[I];
369 if (Reg == AArch64::LR) {
370 assert(CSRegs[I + 1] == AArch64::FP);
371 if (NumGPRs % 2 != 0)
372 return false;
373 break;
374 }
375 if (AArch64::GPR64RegClass.contains(Reg))
376 ++NumGPRs;
377 }
378
379 return true;
380}
381
382/// Returns true if CSRs should be paired.
383bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
384 return produceCompactUnwindFrame(MF) || homogeneousPrologEpilog(MF);
385}
386
387/// This is the biggest offset to the stack pointer we can encode in aarch64
388/// instructions (without using a separate calculation and a temp register).
389/// Note that the exceptions here are vector stores/loads, which cannot encode any
390/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
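/// For example, an unscaled access such as "ldur x0, [sp, #255]" is right at
/// this limit; a larger offset may need a separate calculation into a scratch
/// register first, as noted above.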
391static const unsigned DefaultSafeSPDisplacement = 255;
392
393/// Look at each instruction that references stack frames and return the stack
394/// size limit beyond which some of these instructions will require a scratch
395/// register during their expansion later.
397 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
398 // range. We'll end up allocating an unnecessary spill slot a lot, but
399 // realistically that's not a big deal at this stage of the game.
400 for (MachineBasicBlock &MBB : MF) {
401 for (MachineInstr &MI : MBB) {
402 if (MI.isDebugInstr() || MI.isPseudo() ||
403 MI.getOpcode() == AArch64::ADDXri ||
404 MI.getOpcode() == AArch64::ADDSXri)
405 continue;
406
407 for (const MachineOperand &MO : MI.operands()) {
408 if (!MO.isFI())
409 continue;
410
412 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
414 return 0;
415 }
416 }
417 }
419}
420
424}
425
426/// Returns the size of the fixed object area (allocated next to sp on entry).
427/// On Win64 this may include a var args area and an UnwindHelp object for EH.
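/// For example (hypothetical values): on Win64, a non-funclet function with a
/// 24-byte vararg GPR save area and EH funclets present would reserve its
/// tail-call reserved stack plus alignTo(24 + 8, 16) = 32 bytes here.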
428static unsigned getFixedObjectSize(const MachineFunction &MF,
429 const AArch64FunctionInfo *AFI, bool IsWin64,
430 bool IsFunclet) {
431 if (!IsWin64 || IsFunclet) {
432 return AFI->getTailCallReservedStack();
433 } else {
434 if (AFI->getTailCallReservedStack() != 0 &&
436 Attribute::SwiftAsync))
437 report_fatal_error("cannot generate ABI-changing tail call for Win64");
438 // Var args are stored here in the primary function.
439 const unsigned VarArgsArea = AFI->getVarArgsGPRSize();
440 // To support EH funclets we allocate an UnwindHelp object
441 const unsigned UnwindHelpObject = (MF.hasEHFunclets() ? 8 : 0);
442 return AFI->getTailCallReservedStack() +
443 alignTo(VarArgsArea + UnwindHelpObject, 16);
444 }
445}
446
447/// Returns the size of the entire SVE stackframe (calleesaves + spills).
450 return StackOffset::getScalable((int64_t)AFI->getStackSizeSVE());
451}
452
454 if (!EnableRedZone)
455 return false;
456
457 // Don't use the red zone if the function explicitly asks us not to.
458 // This is typically used for kernel code.
459 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
460 const unsigned RedZoneSize =
462 if (!RedZoneSize)
463 return false;
464
465 const MachineFrameInfo &MFI = MF.getFrameInfo();
467 uint64_t NumBytes = AFI->getLocalStackSize();
468
469 // If neither NEON nor SVE is available, a COPY from one Q-reg to
470 // another requires a spill -> reload sequence. We can do that
471 // using a pre-decrementing store/post-decrementing load, but
472 // if we do so, we can't use the Red Zone.
473 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
474 !Subtarget.isNeonAvailable() &&
475 !Subtarget.hasSVE();
476
477 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
478 getSVEStackSize(MF) || LowerQRegCopyThroughMem);
479}
480
481/// hasFPImpl - Return true if the specified function should have a dedicated
482/// frame pointer register.
484 const MachineFrameInfo &MFI = MF.getFrameInfo();
485 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
486
487 // Win64 EH requires a frame pointer if funclets are present, as the locals
488 // are accessed off the frame pointer in both the parent function and the
489 // funclets.
490 if (MF.hasEHFunclets())
491 return true;
492 // Retain behavior of always omitting the FP for leaf functions when possible.
494 return true;
495 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
496 MFI.hasStackMap() || MFI.hasPatchPoint() ||
497 RegInfo->hasStackRealignment(MF))
498 return true;
499 // With large callframes around we may need to use FP to access the scavenging
500 // emergency spillslot.
501 //
502 // Unfortunately some calls to hasFP() like machine verifier ->
503 // getReservedReg() -> hasFP in the middle of global isel are too early
504 // to know the max call frame size. Hopefully conservatively returning "true"
505 // in those cases is fine.
506 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
507 if (!MFI.isMaxCallFrameSizeComputed() ||
509 return true;
510
511 return false;
512}
513
514/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
515/// not required, we reserve argument space for call sites in the function
516/// immediately on entry to the current function. This eliminates the need for
517/// add/sub sp brackets around call sites. Returns true if the call frame is
518/// included as part of the stack frame.
520 const MachineFunction &MF) const {
521 // The stack probing code for the dynamically allocated outgoing arguments
522 // area assumes that the stack is probed at the top - either by the prologue
523 // code, which issues a probe if `hasVarSizedObjects` returns true, or by the
524 // most recent variable-sized object allocation. Changing the condition here
525 // may need to be followed up by changes to the probe issuing logic.
526 return !MF.getFrameInfo().hasVarSizedObjects();
527}
528
532 const AArch64InstrInfo *TII =
533 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
534 const AArch64TargetLowering *TLI =
535 MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
536 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
537 DebugLoc DL = I->getDebugLoc();
538 unsigned Opc = I->getOpcode();
539 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
540 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
541
542 if (!hasReservedCallFrame(MF)) {
543 int64_t Amount = I->getOperand(0).getImm();
544 Amount = alignTo(Amount, getStackAlign());
545 if (!IsDestroy)
546 Amount = -Amount;
547
548 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
549 // doesn't have to pop anything), then the first operand will be zero too so
550 // this adjustment is a no-op.
551 if (CalleePopAmount == 0) {
552 // FIXME: in-function stack adjustment for calls is limited to 24-bits
553 // because there's no guaranteed temporary register available.
554 //
555 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
556 // 1) For offset <= 12-bit, we use LSL #0
557 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
558 // LSL #0, and the other uses LSL #12.
559 //
560 // Most call frames will be allocated at the start of a function so
561 // this is OK, but it is a limitation that needs dealing with.
562 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
563
564 if (TLI->hasInlineStackProbe(MF) &&
566 // When stack probing is enabled, the decrement of SP may need to be
567 // probed. We only need to do this if the call site needs 1024 bytes of
568 // space or more, because a region smaller than that is allowed to be
569 // unprobed at an ABI boundary. We rely on the fact that SP has been
570 // probed exactly at this point, either by the prologue or most recent
571 // dynamic allocation.
573 "non-reserved call frame without var sized objects?");
574 Register ScratchReg =
575 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
576 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
577 } else {
578 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
579 StackOffset::getFixed(Amount), TII);
580 }
581 }
582 } else if (CalleePopAmount != 0) {
583 // If the calling convention demands that the callee pops arguments from the
584 // stack, we want to add it back if we have a reserved call frame.
585 assert(CalleePopAmount < 0xffffff && "call frame too large");
586 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
587 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
588 }
589 return MBB.erase(I);
590}
591
592void AArch64FrameLowering::emitCalleeSavedGPRLocations(
595 MachineFrameInfo &MFI = MF.getFrameInfo();
597 SMEAttrs Attrs(MF.getFunction());
598 bool LocallyStreaming =
599 Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface();
600
601 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
602 if (CSI.empty())
603 return;
604
605 const TargetSubtargetInfo &STI = MF.getSubtarget();
606 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
607 const TargetInstrInfo &TII = *STI.getInstrInfo();
609
610 for (const auto &Info : CSI) {
611 unsigned FrameIdx = Info.getFrameIdx();
612 if (MFI.getStackID(FrameIdx) == TargetStackID::ScalableVector)
613 continue;
614
615 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
616 int64_t DwarfReg = TRI.getDwarfRegNum(Info.getReg(), true);
617 int64_t Offset = MFI.getObjectOffset(FrameIdx) - getOffsetOfLocalArea();
618
619 // The location of VG will be emitted before each streaming-mode change in
620 // the function. Only locally-streaming functions require emitting the
621 // non-streaming VG location here.
622 if ((LocallyStreaming && FrameIdx == AFI->getStreamingVGIdx()) ||
623 (!LocallyStreaming &&
624 DwarfReg == TRI.getDwarfRegNum(AArch64::VG, true)))
625 continue;
626
627 unsigned CFIIndex = MF.addFrameInst(
628 MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
629 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
630 .addCFIIndex(CFIIndex)
632 }
633}
634
635void AArch64FrameLowering::emitCalleeSavedSVELocations(
638 MachineFrameInfo &MFI = MF.getFrameInfo();
639
640 // Add callee saved registers to move list.
641 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
642 if (CSI.empty())
643 return;
644
645 const TargetSubtargetInfo &STI = MF.getSubtarget();
646 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
647 const TargetInstrInfo &TII = *STI.getInstrInfo();
650
651 for (const auto &Info : CSI) {
652 if (!(MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
653 continue;
654
655 // Not all unwinders may know about SVE registers, so assume the lowest
656 // common denominator.
657 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
658 unsigned Reg = Info.getReg();
659 if (!static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
660 continue;
661
663 StackOffset::getScalable(MFI.getObjectOffset(Info.getFrameIdx())) -
665
666 unsigned CFIIndex = MF.addFrameInst(createCFAOffset(TRI, Reg, Offset));
667 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
668 .addCFIIndex(CFIIndex)
670 }
671}
672
676 unsigned DwarfReg) {
677 unsigned CFIIndex =
678 MF.addFrameInst(MCCFIInstruction::createSameValue(nullptr, DwarfReg));
679 BuildMI(MBB, InsertPt, DebugLoc(), Desc).addCFIIndex(CFIIndex);
680}
681
683 MachineBasicBlock &MBB) const {
684
686 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
687 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
688 const auto &TRI =
689 static_cast<const AArch64RegisterInfo &>(*Subtarget.getRegisterInfo());
690 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
691
692 const MCInstrDesc &CFIDesc = TII.get(TargetOpcode::CFI_INSTRUCTION);
693 DebugLoc DL;
694
695 // Reset the CFA to `SP + 0`.
697 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
698 nullptr, TRI.getDwarfRegNum(AArch64::SP, true), 0));
699 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
700
701 // Flip the RA sign state.
702 if (MFI.shouldSignReturnAddress(MF)) {
703 auto CFIInst = MFI.branchProtectionPAuthLR()
706 CFIIndex = MF.addFrameInst(CFIInst);
707 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
708 }
709
710 // Shadow call stack uses X18, reset it.
711 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
712 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
713 TRI.getDwarfRegNum(AArch64::X18, true));
714
715 // Emit .cfi_same_value for callee-saved registers.
716 const std::vector<CalleeSavedInfo> &CSI =
718 for (const auto &Info : CSI) {
719 unsigned Reg = Info.getReg();
720 if (!TRI.regNeedsCFI(Reg, Reg))
721 continue;
722 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
723 TRI.getDwarfRegNum(Reg, true));
724 }
725}
726
729 bool SVE) {
731 MachineFrameInfo &MFI = MF.getFrameInfo();
732
733 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
734 if (CSI.empty())
735 return;
736
737 const TargetSubtargetInfo &STI = MF.getSubtarget();
738 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
739 const TargetInstrInfo &TII = *STI.getInstrInfo();
741
742 for (const auto &Info : CSI) {
743 if (SVE !=
744 (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
745 continue;
746
747 unsigned Reg = Info.getReg();
748 if (SVE &&
749 !static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
750 continue;
751
752 if (!Info.isRestored())
753 continue;
754
755 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createRestore(
756 nullptr, TRI.getDwarfRegNum(Info.getReg(), true)));
757 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
758 .addCFIIndex(CFIIndex)
760 }
761}
762
763void AArch64FrameLowering::emitCalleeSavedGPRRestores(
766}
767
768void AArch64FrameLowering::emitCalleeSavedSVERestores(
771}
772
773// Return the maximum possible number of bytes for `Size` due to the
774// architectural limit on the size of an SVE register.
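// For example (illustrative numbers only): a StackOffset of 16 fixed bytes
// plus 32 scalable bytes is bounded by 32 * 16 + 16 = 528 bytes, since the
// scalable part can grow by at most a factor of 16 (the maximum SVE vector
// length of 2048 bits is 16x the 128-bit minimum).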
775static int64_t upperBound(StackOffset Size) {
776 static const int64_t MAX_BYTES_PER_SCALABLE_BYTE = 16;
777 return Size.getScalable() * MAX_BYTES_PER_SCALABLE_BYTE + Size.getFixed();
778}
779
780void AArch64FrameLowering::allocateStackSpace(
782 int64_t RealignmentPadding, StackOffset AllocSize, bool NeedsWinCFI,
783 bool *HasWinCFI, bool EmitCFI, StackOffset InitialOffset,
784 bool FollowupAllocs) const {
785
786 if (!AllocSize)
787 return;
788
789 DebugLoc DL;
791 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
792 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
794 const MachineFrameInfo &MFI = MF.getFrameInfo();
795
796 const int64_t MaxAlign = MFI.getMaxAlign().value();
797 const uint64_t AndMask = ~(MaxAlign - 1);
798
799 if (!Subtarget.getTargetLowering()->hasInlineStackProbe(MF)) {
800 Register TargetReg = RealignmentPadding
802 : AArch64::SP;
803 // SUB Xd/SP, SP, AllocSize
804 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
805 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
806 EmitCFI, InitialOffset);
807
808 if (RealignmentPadding) {
809 // AND SP, X9, 0b11111...0000
810 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
811 .addReg(TargetReg, RegState::Kill)
814 AFI.setStackRealigned(true);
815
816 // No need for SEH instructions here; if we're realigning the stack,
817 // we've set a frame pointer and already finished the SEH prologue.
818 assert(!NeedsWinCFI);
819 }
820 return;
821 }
822
823 //
824 // Stack probing allocation.
825 //
826
827 // Fixed length allocation. If we don't need to re-align the stack and don't
828 // have SVE objects, we can use a more efficient sequence for stack probing.
829 if (AllocSize.getScalable() == 0 && RealignmentPadding == 0) {
831 assert(ScratchReg != AArch64::NoRegister);
832 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC))
833 .addDef(ScratchReg)
834 .addImm(AllocSize.getFixed())
835 .addImm(InitialOffset.getFixed())
836 .addImm(InitialOffset.getScalable());
837 // The fixed allocation may leave unprobed bytes at the top of the
838 // stack. If we have subsequent allocations (e.g. if we have variable-sized
839 // objects), we need to issue an extra probe, so these allocations start in
840 // a known state.
841 if (FollowupAllocs) {
842 // STR XZR, [SP]
843 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
844 .addReg(AArch64::XZR)
845 .addReg(AArch64::SP)
846 .addImm(0)
848 }
849
850 return;
851 }
852
853 // Variable length allocation.
854
855 // If the (unknown) allocation size cannot exceed the probe size, decrement
856 // the stack pointer right away.
857 int64_t ProbeSize = AFI.getStackProbeSize();
858 if (upperBound(AllocSize) + RealignmentPadding <= ProbeSize) {
859 Register ScratchReg = RealignmentPadding
861 : AArch64::SP;
862 assert(ScratchReg != AArch64::NoRegister);
863 // SUB Xd, SP, AllocSize
864 emitFrameOffset(MBB, MBBI, DL, ScratchReg, AArch64::SP, -AllocSize, &TII,
865 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
866 EmitCFI, InitialOffset);
867 if (RealignmentPadding) {
868 // AND SP, Xn, 0b11111...0000
869 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
870 .addReg(ScratchReg, RegState::Kill)
873 AFI.setStackRealigned(true);
874 }
875 if (FollowupAllocs || upperBound(AllocSize) + RealignmentPadding >
877 // STR XZR, [SP]
878 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
879 .addReg(AArch64::XZR)
880 .addReg(AArch64::SP)
881 .addImm(0)
883 }
884 return;
885 }
886
887 // Emit a variable-length allocation probing loop.
888 // TODO: As an optimisation, the loop can be "unrolled" into a few parts,
889 // each of them guaranteed to adjust the stack by less than the probe size.
891 assert(TargetReg != AArch64::NoRegister);
892 // SUB Xd, SP, AllocSize
893 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
894 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
895 EmitCFI, InitialOffset);
896 if (RealignmentPadding) {
897 // AND Xn, Xn, 0b11111...0000
898 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), TargetReg)
899 .addReg(TargetReg, RegState::Kill)
902 }
903
904 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC_VAR))
905 .addReg(TargetReg);
906 if (EmitCFI) {
907 // Set the CFA register back to SP.
908 unsigned Reg =
909 Subtarget.getRegisterInfo()->getDwarfRegNum(AArch64::SP, true);
910 unsigned CFIIndex =
912 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
913 .addCFIIndex(CFIIndex)
915 }
916 if (RealignmentPadding)
917 AFI.setStackRealigned(true);
918}
919
920static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE) {
921 switch (Reg.id()) {
922 default:
923 // The called routine is expected to preserve r19-r28
924 // r29 and r30 are used as frame pointer and link register resp.
925 return 0;
926
927 // GPRs
928#define CASE(n) \
929 case AArch64::W##n: \
930 case AArch64::X##n: \
931 return AArch64::X##n
932 CASE(0);
933 CASE(1);
934 CASE(2);
935 CASE(3);
936 CASE(4);
937 CASE(5);
938 CASE(6);
939 CASE(7);
940 CASE(8);
941 CASE(9);
942 CASE(10);
943 CASE(11);
944 CASE(12);
945 CASE(13);
946 CASE(14);
947 CASE(15);
948 CASE(16);
949 CASE(17);
950 CASE(18);
951#undef CASE
952
953 // FPRs
954#define CASE(n) \
955 case AArch64::B##n: \
956 case AArch64::H##n: \
957 case AArch64::S##n: \
958 case AArch64::D##n: \
959 case AArch64::Q##n: \
960 return HasSVE ? AArch64::Z##n : AArch64::Q##n
961 CASE(0);
962 CASE(1);
963 CASE(2);
964 CASE(3);
965 CASE(4);
966 CASE(5);
967 CASE(6);
968 CASE(7);
969 CASE(8);
970 CASE(9);
971 CASE(10);
972 CASE(11);
973 CASE(12);
974 CASE(13);
975 CASE(14);
976 CASE(15);
977 CASE(16);
978 CASE(17);
979 CASE(18);
980 CASE(19);
981 CASE(20);
982 CASE(21);
983 CASE(22);
984 CASE(23);
985 CASE(24);
986 CASE(25);
987 CASE(26);
988 CASE(27);
989 CASE(28);
990 CASE(29);
991 CASE(30);
992 CASE(31);
993#undef CASE
994 }
995}
996
997void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
998 MachineBasicBlock &MBB) const {
999 // Insertion point.
1001
1002 // Fake a debug loc.
1003 DebugLoc DL;
1004 if (MBBI != MBB.end())
1005 DL = MBBI->getDebugLoc();
1006
1007 const MachineFunction &MF = *MBB.getParent();
1009 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
1010
1011 BitVector GPRsToZero(TRI.getNumRegs());
1012 BitVector FPRsToZero(TRI.getNumRegs());
1013 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
1014 for (MCRegister Reg : RegsToZero.set_bits()) {
1015 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
1016 // For GPRs, we only care to clear out the 64-bit register.
1017 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
1018 GPRsToZero.set(XReg);
1019 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
1020 // For FPRs,
1021 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
1022 FPRsToZero.set(XReg);
1023 }
1024 }
1025
1026 const AArch64InstrInfo &TII = *STI.getInstrInfo();
1027
1028 // Zero out GPRs.
1029 for (MCRegister Reg : GPRsToZero.set_bits())
1030 TII.buildClearRegister(Reg, MBB, MBBI, DL);
1031
1032 // Zero out FP/vector registers.
1033 for (MCRegister Reg : FPRsToZero.set_bits())
1034 TII.buildClearRegister(Reg, MBB, MBBI, DL);
1035
1036 if (HasSVE) {
1037 for (MCRegister PReg :
1038 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
1039 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
1040 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
1041 AArch64::P15}) {
1042 if (RegsToZero[PReg])
1043 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
1044 }
1045 }
1046}
1047
1049 const MachineBasicBlock &MBB) {
1050 const MachineFunction *MF = MBB.getParent();
1051 LiveRegs.addLiveIns(MBB);
1052 // Mark callee saved registers as used so we will not choose them.
1053 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
1054 for (unsigned i = 0; CSRegs[i]; ++i)
1055 LiveRegs.addReg(CSRegs[i]);
1056}
1057
1058// Find a scratch register that we can use at the start of the prologue to
1059// re-align the stack pointer. We avoid using callee-save registers since they
1060// may appear to be free when this is called from canUseAsPrologue (during
1061// shrink wrapping), but then no longer be free when this is called from
1062// emitPrologue.
1063//
1064// FIXME: This is a bit conservative, since in the above case we could use one
1065// of the callee-save registers as a scratch temp to re-align the stack pointer,
1066// but we would then have to make sure that we were in fact saving at least one
1067// callee-save register in the prologue, which is additional complexity that
1068// doesn't seem worth the benefit.
1070 MachineFunction *MF = MBB->getParent();
1071
1072 // If MBB is an entry block, use X9 as the scratch register;
1073 // preserve_none functions may be using X9 to pass arguments,
1074 // so prefer to pick an available register below.
1075 if (&MF->front() == MBB &&
1077 return AArch64::X9;
1078
1079 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1080 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1081 LivePhysRegs LiveRegs(TRI);
1082 getLiveRegsForEntryMBB(LiveRegs, *MBB);
1083
1084 // Prefer X9 since it was historically used for the prologue scratch reg.
1085 const MachineRegisterInfo &MRI = MF->getRegInfo();
1086 if (LiveRegs.available(MRI, AArch64::X9))
1087 return AArch64::X9;
1088
1089 for (unsigned Reg : AArch64::GPR64RegClass) {
1090 if (LiveRegs.available(MRI, Reg))
1091 return Reg;
1092 }
1093 return AArch64::NoRegister;
1094}
1095
1097 const MachineBasicBlock &MBB) const {
1098 const MachineFunction *MF = MBB.getParent();
1099 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
1100 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1101 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1102 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
1104
1105 if (AFI->hasSwiftAsyncContext()) {
1106 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1107 const MachineRegisterInfo &MRI = MF->getRegInfo();
1108 LivePhysRegs LiveRegs(TRI);
1109 getLiveRegsForEntryMBB(LiveRegs, MBB);
1110 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
1111 // available.
1112 if (!LiveRegs.available(MRI, AArch64::X16) ||
1113 !LiveRegs.available(MRI, AArch64::X17))
1114 return false;
1115 }
1116
1117 // Certain stack probing sequences might clobber flags, so we can't use
1118 // the block as a prologue if the flags register is a live-in.
1120 MBB.isLiveIn(AArch64::NZCV))
1121 return false;
1122
1123 // Don't need a scratch register if we're not going to re-align the stack or
1124 // emit stack probes.
1125 if (!RegInfo->hasStackRealignment(*MF) && !TLI->hasInlineStackProbe(*MF))
1126 return true;
1127 // Otherwise, we can use any block as long as it has a scratch register
1128 // available.
1129 return findScratchNonCalleeSaveRegister(TmpMBB) != AArch64::NoRegister;
1130}
1131
1133 uint64_t StackSizeInBytes) {
1134 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1136 // TODO: When implementing stack protectors, take that into account
1137 // for the probe threshold.
1138 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
1139 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
1140}
1141
1142static bool needsWinCFI(const MachineFunction &MF) {
1143 const Function &F = MF.getFunction();
1144 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
1145 F.needsUnwindTableEntry();
1146}
1147
1148bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
1149 MachineFunction &MF, uint64_t StackBumpBytes) const {
1151 const MachineFrameInfo &MFI = MF.getFrameInfo();
1152 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1153 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1154 if (homogeneousPrologEpilog(MF))
1155 return false;
1156
1157 if (AFI->getLocalStackSize() == 0)
1158 return false;
1159
1160 // For WinCFI, if optimizing for size, prefer to not combine the stack bump
1161 // (to force a stp with predecrement) to match the packed unwind format,
1162 // provided that there actually are any callee saved registers to merge the
1163 // decrement with.
1164 // This is potentially marginally slower, but allows using the packed
1165 // unwind format for functions that both have a local area and callee saved
1166 // registers. Using the packed unwind format notably reduces the size of
1167 // the unwind info.
1168 if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
1169 MF.getFunction().hasOptSize())
1170 return false;
1171
1172 // 512 is the maximum immediate for stp/ldp that will be used for
1173 // callee-save save/restores
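  // (For X-register stp/ldp, the immediate is a signed 7-bit value scaled by
  // 8, giving roughly a +/-512 byte range.)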
1174 if (StackBumpBytes >= 512 || windowsRequiresStackProbe(MF, StackBumpBytes))
1175 return false;
1176
1177 if (MFI.hasVarSizedObjects())
1178 return false;
1179
1180 if (RegInfo->hasStackRealignment(MF))
1181 return false;
1182
1183 // This isn't strictly necessary, but it simplifies things a bit since the
1184 // current RedZone handling code assumes the SP is adjusted by the
1185 // callee-save save/restore code.
1186 if (canUseRedZone(MF))
1187 return false;
1188
1189 // When there is an SVE area on the stack, always allocate the
1190 // callee-saves and spills/locals separately.
1191 if (getSVEStackSize(MF))
1192 return false;
1193
1194 return true;
1195}
1196
1197bool AArch64FrameLowering::shouldCombineCSRLocalStackBumpInEpilogue(
1198 MachineBasicBlock &MBB, uint64_t StackBumpBytes) const {
1199 if (!shouldCombineCSRLocalStackBump(*MBB.getParent(), StackBumpBytes))
1200 return false;
1201 if (MBB.empty())
1202 return true;
1203
1204 // Disable combined SP bump if the last instruction is an MTE tag store. It
1205 // is almost always better to merge SP adjustment into those instructions.
1208 while (LastI != Begin) {
1209 --LastI;
1210 if (LastI->isTransient())
1211 continue;
1212 if (!LastI->getFlag(MachineInstr::FrameDestroy))
1213 break;
1214 }
1215 switch (LastI->getOpcode()) {
1216 case AArch64::STGloop:
1217 case AArch64::STZGloop:
1218 case AArch64::STGi:
1219 case AArch64::STZGi:
1220 case AArch64::ST2Gi:
1221 case AArch64::STZ2Gi:
1222 return false;
1223 default:
1224 return true;
1225 }
1226 llvm_unreachable("unreachable");
1227}
1228
1229// Given a load or a store instruction, generate an appropriate unwinding SEH
1230// code on Windows.
1232 const TargetInstrInfo &TII,
1233 MachineInstr::MIFlag Flag) {
1234 unsigned Opc = MBBI->getOpcode();
1236 MachineFunction &MF = *MBB->getParent();
1237 DebugLoc DL = MBBI->getDebugLoc();
1238 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1239 int Imm = MBBI->getOperand(ImmIdx).getImm();
1241 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1242 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1243
1244 switch (Opc) {
1245 default:
1246 llvm_unreachable("No SEH Opcode for this instruction");
1247 case AArch64::LDPDpost:
1248 Imm = -Imm;
1249 [[fallthrough]];
1250 case AArch64::STPDpre: {
1251 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1252 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1253 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1254 .addImm(Reg0)
1255 .addImm(Reg1)
1256 .addImm(Imm * 8)
1257 .setMIFlag(Flag);
1258 break;
1259 }
1260 case AArch64::LDPXpost:
1261 Imm = -Imm;
1262 [[fallthrough]];
1263 case AArch64::STPXpre: {
1264 Register Reg0 = MBBI->getOperand(1).getReg();
1265 Register Reg1 = MBBI->getOperand(2).getReg();
1266 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1267 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1268 .addImm(Imm * 8)
1269 .setMIFlag(Flag);
1270 else
1271 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1272 .addImm(RegInfo->getSEHRegNum(Reg0))
1273 .addImm(RegInfo->getSEHRegNum(Reg1))
1274 .addImm(Imm * 8)
1275 .setMIFlag(Flag);
1276 break;
1277 }
1278 case AArch64::LDRDpost:
1279 Imm = -Imm;
1280 [[fallthrough]];
1281 case AArch64::STRDpre: {
1282 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1283 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1284 .addImm(Reg)
1285 .addImm(Imm)
1286 .setMIFlag(Flag);
1287 break;
1288 }
1289 case AArch64::LDRXpost:
1290 Imm = -Imm;
1291 [[fallthrough]];
1292 case AArch64::STRXpre: {
1293 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1294 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1295 .addImm(Reg)
1296 .addImm(Imm)
1297 .setMIFlag(Flag);
1298 break;
1299 }
1300 case AArch64::STPDi:
1301 case AArch64::LDPDi: {
1302 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1303 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1304 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1305 .addImm(Reg0)
1306 .addImm(Reg1)
1307 .addImm(Imm * 8)
1308 .setMIFlag(Flag);
1309 break;
1310 }
1311 case AArch64::STPXi:
1312 case AArch64::LDPXi: {
1313 Register Reg0 = MBBI->getOperand(0).getReg();
1314 Register Reg1 = MBBI->getOperand(1).getReg();
1315 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1316 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1317 .addImm(Imm * 8)
1318 .setMIFlag(Flag);
1319 else
1320 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1321 .addImm(RegInfo->getSEHRegNum(Reg0))
1322 .addImm(RegInfo->getSEHRegNum(Reg1))
1323 .addImm(Imm * 8)
1324 .setMIFlag(Flag);
1325 break;
1326 }
1327 case AArch64::STRXui:
1328 case AArch64::LDRXui: {
1329 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1330 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1331 .addImm(Reg)
1332 .addImm(Imm * 8)
1333 .setMIFlag(Flag);
1334 break;
1335 }
1336 case AArch64::STRDui:
1337 case AArch64::LDRDui: {
1338 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1339 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1340 .addImm(Reg)
1341 .addImm(Imm * 8)
1342 .setMIFlag(Flag);
1343 break;
1344 }
1345 case AArch64::STPQi:
1346 case AArch64::LDPQi: {
1347 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1348 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1349 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1350 .addImm(Reg0)
1351 .addImm(Reg1)
1352 .addImm(Imm * 16)
1353 .setMIFlag(Flag);
1354 break;
1355 }
1356 case AArch64::LDPQpost:
1357 Imm = -Imm;
1358 [[fallthrough]];
1359 case AArch64::STPQpre: {
1360 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1361 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1362 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1363 .addImm(Reg0)
1364 .addImm(Reg1)
1365 .addImm(Imm * 16)
1366 .setMIFlag(Flag);
1367 break;
1368 }
1369 }
1370 auto I = MBB->insertAfter(MBBI, MIB);
1371 return I;
1372}
1373
1374// Fix up the SEH opcode associated with the save/restore instruction.
1376 unsigned LocalStackSize) {
1377 MachineOperand *ImmOpnd = nullptr;
1378 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1379 switch (MBBI->getOpcode()) {
1380 default:
1381 llvm_unreachable("Fix the offset in the SEH instruction");
1382 case AArch64::SEH_SaveFPLR:
1383 case AArch64::SEH_SaveRegP:
1384 case AArch64::SEH_SaveReg:
1385 case AArch64::SEH_SaveFRegP:
1386 case AArch64::SEH_SaveFReg:
1387 case AArch64::SEH_SaveAnyRegQP:
1388 case AArch64::SEH_SaveAnyRegQPX:
1389 ImmOpnd = &MBBI->getOperand(ImmIdx);
1390 break;
1391 }
1392 if (ImmOpnd)
1393 ImmOpnd->setImm(ImmOpnd->getImm() + LocalStackSize);
1394}
1395
1398 return AFI->hasStreamingModeChanges() &&
1399 !MF.getSubtarget<AArch64Subtarget>().hasSVE();
1400}
1401
1404 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1405 // is enabled with streaming mode changes.
1406 if (!AFI->hasStreamingModeChanges())
1407 return false;
1408 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1409 if (ST.isTargetDarwin())
1410 return ST.hasSVE();
1411 return true;
1412}
1413
1415 unsigned Opc = MBBI->getOpcode();
1416 if (Opc == AArch64::CNTD_XPiI || Opc == AArch64::RDSVLI_XI ||
1417 Opc == AArch64::UBFMXri)
1418 return true;
1419
1420 if (requiresGetVGCall(*MBBI->getMF())) {
1421 if (Opc == AArch64::ORRXrr)
1422 return true;
1423
1424 if (Opc == AArch64::BL) {
1425 auto Op1 = MBBI->getOperand(0);
1426 return Op1.isSymbol() &&
1427 (StringRef(Op1.getSymbolName()) == "__arm_get_current_vg");
1428 }
1429 }
1430
1431 return false;
1432}
1433
1434// Convert a callee-save register save/restore instruction to also decrement/
1435// increment the stack pointer to allocate/deallocate the callee-save stack
1436// area, by rewriting the store/load to its pre/post increment form.
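// For example (illustrative only), a prologue save such as
//   stp x29, x30, [sp, #0]
// becomes a save that also allocates a 16-byte callee-save area:
//   stp x29, x30, [sp, #-16]!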
1439 const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc,
1440 bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI,
1442 int CFAOffset = 0) {
1443 unsigned NewOpc;
1444
1445 // If the function contains streaming mode changes, we expect instructions
1446 // to calculate the value of VG before spilling. For locally-streaming
1447 // functions, we need to do this for both the streaming and non-streaming
1448 // vector length. Move past these instructions if necessary.
1449 MachineFunction &MF = *MBB.getParent();
1450 if (requiresSaveVG(MF))
1451 while (isVGInstruction(MBBI))
1452 ++MBBI;
1453
1454 switch (MBBI->getOpcode()) {
1455 default:
1456 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1457 case AArch64::STPXi:
1458 NewOpc = AArch64::STPXpre;
1459 break;
1460 case AArch64::STPDi:
1461 NewOpc = AArch64::STPDpre;
1462 break;
1463 case AArch64::STPQi:
1464 NewOpc = AArch64::STPQpre;
1465 break;
1466 case AArch64::STRXui:
1467 NewOpc = AArch64::STRXpre;
1468 break;
1469 case AArch64::STRDui:
1470 NewOpc = AArch64::STRDpre;
1471 break;
1472 case AArch64::STRQui:
1473 NewOpc = AArch64::STRQpre;
1474 break;
1475 case AArch64::LDPXi:
1476 NewOpc = AArch64::LDPXpost;
1477 break;
1478 case AArch64::LDPDi:
1479 NewOpc = AArch64::LDPDpost;
1480 break;
1481 case AArch64::LDPQi:
1482 NewOpc = AArch64::LDPQpost;
1483 break;
1484 case AArch64::LDRXui:
1485 NewOpc = AArch64::LDRXpost;
1486 break;
1487 case AArch64::LDRDui:
1488 NewOpc = AArch64::LDRDpost;
1489 break;
1490 case AArch64::LDRQui:
1491 NewOpc = AArch64::LDRQpost;
1492 break;
1493 }
1494 // Get rid of the SEH code associated with the old instruction.
1495 if (NeedsWinCFI) {
1496 auto SEH = std::next(MBBI);
1498 SEH->eraseFromParent();
1499 }
1500
1501 TypeSize Scale = TypeSize::getFixed(1), Width = TypeSize::getFixed(0);
1502 int64_t MinOffset, MaxOffset;
1503 bool Success = static_cast<const AArch64InstrInfo *>(TII)->getMemOpInfo(
1504 NewOpc, Scale, Width, MinOffset, MaxOffset);
1505 (void)Success;
1506 assert(Success && "unknown load/store opcode");
1507
1508 // If the first store isn't right where we want SP then we can't fold the
1509 // update in, so create a normal arithmetic instruction instead.
1510 if (MBBI->getOperand(MBBI->getNumOperands() - 1).getImm() != 0 ||
1511 CSStackSizeInc < MinOffset * (int64_t)Scale.getFixedValue() ||
1512 CSStackSizeInc > MaxOffset * (int64_t)Scale.getFixedValue()) {
1513 // If we are destroying the frame, make sure we add the increment after the
1514 // last frame operation.
1515 if (FrameFlag == MachineInstr::FrameDestroy)
1516 ++MBBI;
1517 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1518 StackOffset::getFixed(CSStackSizeInc), TII, FrameFlag,
1519 false, false, nullptr, EmitCFI,
1520 StackOffset::getFixed(CFAOffset));
1521
1522 return std::prev(MBBI);
1523 }
1524
1525 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(NewOpc));
1526 MIB.addReg(AArch64::SP, RegState::Define);
1527
1528 // Copy all operands other than the immediate offset.
1529 unsigned OpndIdx = 0;
1530 for (unsigned OpndEnd = MBBI->getNumOperands() - 1; OpndIdx < OpndEnd;
1531 ++OpndIdx)
1532 MIB.add(MBBI->getOperand(OpndIdx));
1533
1534 assert(MBBI->getOperand(OpndIdx).getImm() == 0 &&
1535 "Unexpected immediate offset in first/last callee-save save/restore "
1536 "instruction!");
1537 assert(MBBI->getOperand(OpndIdx - 1).getReg() == AArch64::SP &&
1538 "Unexpected base register in callee-save save/restore instruction!");
1539 assert(CSStackSizeInc % Scale == 0);
1540 MIB.addImm(CSStackSizeInc / (int)Scale);
1541
1542 MIB.setMIFlags(MBBI->getFlags());
1543 MIB.setMemRefs(MBBI->memoperands());
1544
1545 // Generate a new SEH code that corresponds to the new instruction.
1546 if (NeedsWinCFI) {
1547 *HasWinCFI = true;
1548 InsertSEH(*MIB, *TII, FrameFlag);
1549 }
1550
1551 if (EmitCFI) {
1552 unsigned CFIIndex = MF.addFrameInst(
1553 MCCFIInstruction::cfiDefCfaOffset(nullptr, CFAOffset - CSStackSizeInc));
1554 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1555 .addCFIIndex(CFIIndex)
1556 .setMIFlags(FrameFlag);
1557 }
1558
1559 return std::prev(MBB.erase(MBBI));
1560}
1561
1562// Fix up callee-save register save/restore instructions to take into account
1563// the combined SP bump by adding the local stack size to the stack offsets.
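// For example (illustrative only): with a 32-byte local area, a callee-save
// store of "stp x29, x30, [sp, #16]" (scaled immediate 2) is rewritten to
// "stp x29, x30, [sp, #48]" (scaled immediate 2 + 32/8 = 6).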
1565 uint64_t LocalStackSize,
1566 bool NeedsWinCFI,
1567 bool *HasWinCFI) {
1569 return;
1570
1571 unsigned Opc = MI.getOpcode();
1572 unsigned Scale;
1573 switch (Opc) {
1574 case AArch64::STPXi:
1575 case AArch64::STRXui:
1576 case AArch64::STPDi:
1577 case AArch64::STRDui:
1578 case AArch64::LDPXi:
1579 case AArch64::LDRXui:
1580 case AArch64::LDPDi:
1581 case AArch64::LDRDui:
1582 Scale = 8;
1583 break;
1584 case AArch64::STPQi:
1585 case AArch64::STRQui:
1586 case AArch64::LDPQi:
1587 case AArch64::LDRQui:
1588 Scale = 16;
1589 break;
1590 default:
1591 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1592 }
1593
1594 unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;
1595 assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&
1596 "Unexpected base register in callee-save save/restore instruction!");
1597 // Last operand is immediate offset that needs fixing.
1598 MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);
1599 // All generated opcodes have scaled offsets.
1600 assert(LocalStackSize % Scale == 0);
1601 OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / Scale);
1602
1603 if (NeedsWinCFI) {
1604 *HasWinCFI = true;
1605 auto MBBI = std::next(MachineBasicBlock::iterator(MI));
1606 assert(MBBI != MI.getParent()->end() && "Expecting a valid instruction");
1608 "Expecting a SEH instruction");
1609 fixupSEHOpcode(MBBI, LocalStackSize);
1610 }
1611}
1612
1613static bool isTargetWindows(const MachineFunction &MF) {
1615}
1616
1617static unsigned getStackHazardSize(const MachineFunction &MF) {
1618 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
1619}
1620
1621// Convenience function to determine whether I is an SVE callee save.
1623 switch (I->getOpcode()) {
1624 default:
1625 return false;
1626 case AArch64::PTRUE_C_B:
1627 case AArch64::LD1B_2Z_IMM:
1628 case AArch64::ST1B_2Z_IMM:
1629 case AArch64::STR_ZXI:
1630 case AArch64::STR_PXI:
1631 case AArch64::LDR_ZXI:
1632 case AArch64::LDR_PXI:
1633 return I->getFlag(MachineInstr::FrameSetup) ||
1634 I->getFlag(MachineInstr::FrameDestroy);
1635 }
1636}
1637
1639 MachineFunction &MF,
1642 const DebugLoc &DL, bool NeedsWinCFI,
1643 bool NeedsUnwindInfo) {
1644 // Shadow call stack prolog: str x30, [x18], #8
1645 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXpost))
1646 .addReg(AArch64::X18, RegState::Define)
1647 .addReg(AArch64::LR)
1648 .addReg(AArch64::X18)
1649 .addImm(8)
1651
1652 // This instruction also makes x18 live-in to the entry block.
1653 MBB.addLiveIn(AArch64::X18);
1654
1655 if (NeedsWinCFI)
1656 BuildMI(MBB, MBBI, DL, TII.get(AArch64::SEH_Nop))
1658
1659 if (NeedsUnwindInfo) {
1660 // Emit a CFI instruction that causes 8 to be subtracted from the value of
1661 // x18 when unwinding past this frame.
1662 static const char CFIInst[] = {
1663 dwarf::DW_CFA_val_expression,
1664 18, // register
1665 2, // length
1666 static_cast<char>(unsigned(dwarf::DW_OP_breg18)),
1667 static_cast<char>(-8) & 0x7f, // addend (sleb128)
1668 };
1669 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createEscape(
1670 nullptr, StringRef(CFIInst, sizeof(CFIInst))));
1671 BuildMI(MBB, MBBI, DL, TII.get(AArch64::CFI_INSTRUCTION))
1672 .addCFIIndex(CFIIndex)
1674 }
1675}
1676
1678 MachineFunction &MF,
1681 const DebugLoc &DL) {
1682 // Shadow call stack epilog: ldr x30, [x18, #-8]!
1683 BuildMI(MBB, MBBI, DL, TII.get(AArch64::LDRXpre))
1684 .addReg(AArch64::X18, RegState::Define)
1685 .addReg(AArch64::LR, RegState::Define)
1686 .addReg(AArch64::X18)
1687 .addImm(-8)
1689
1691 unsigned CFIIndex =
1693 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
1694 .addCFIIndex(CFIIndex)
1696 }
1697}
1698
1699// Define the current CFA rule to use the provided FP.
1702 const DebugLoc &DL, unsigned FixedObject) {
1705 const TargetInstrInfo *TII = STI.getInstrInfo();
1707
1708 const int OffsetToFirstCalleeSaveFromFP =
1711 Register FramePtr = TRI->getFrameRegister(MF);
1712 unsigned Reg = TRI->getDwarfRegNum(FramePtr, true);
1713 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
1714 nullptr, Reg, FixedObject - OffsetToFirstCalleeSaveFromFP));
1715 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1716 .addCFIIndex(CFIIndex)
1718}
1719
1720#ifndef NDEBUG
1721/// Collect live registers from the end of \p MI's parent up to (including) \p
1722/// MI in \p LiveRegs.
1724 LivePhysRegs &LiveRegs) {
1725
1726 MachineBasicBlock &MBB = *MI.getParent();
1727 LiveRegs.addLiveOuts(MBB);
1728 for (const MachineInstr &MI :
1729 reverse(make_range(MI.getIterator(), MBB.instr_end())))
1730 LiveRegs.stepBackward(MI);
1731}
1732#endif
1733
1734 void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
1735 MachineBasicBlock &MBB) const {
1736 MachineBasicBlock::iterator MBBI = MBB.begin();
1737 const MachineFrameInfo &MFI = MF.getFrameInfo();
1738 const Function &F = MF.getFunction();
1739 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1740 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1741 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1742
1743 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1744 bool EmitCFI = AFI->needsDwarfUnwindInfo(MF);
1745 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
1746 bool HasFP = hasFP(MF);
1747 bool NeedsWinCFI = needsWinCFI(MF);
1748 bool HasWinCFI = false;
1749 auto Cleanup = make_scope_exit([&]() { MF.setHasWinCFI(HasWinCFI); });
1750
1751 MachineBasicBlock::iterator End = MBB.end();
1752#ifndef NDEBUG
1753 const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();
1754 // Collect live registers from the end of MBB up to the start of the existing
1755 // frame setup instructions.
1756 MachineBasicBlock::iterator NonFrameStart = MBB.begin();
1757 while (NonFrameStart != End &&
1758 NonFrameStart->getFlag(MachineInstr::FrameSetup))
1759 ++NonFrameStart;
1760
1761 LivePhysRegs LiveRegs(*TRI);
1762 if (NonFrameStart != MBB.end()) {
1763 getLivePhysRegsUpTo(*NonFrameStart, *TRI, LiveRegs);
1764 // Ignore registers used for stack management for now.
1765 LiveRegs.removeReg(AArch64::SP);
1766 LiveRegs.removeReg(AArch64::X19);
1767 LiveRegs.removeReg(AArch64::FP);
1768 LiveRegs.removeReg(AArch64::LR);
1769
1770 // X0 will be clobbered by a call to __arm_get_current_vg in the prologue.
1771 // This is necessary to spill VG if required where SVE is unavailable, but
1772 // X0 is preserved around this call.
1773 if (requiresGetVGCall(MF))
1774 LiveRegs.removeReg(AArch64::X0);
1775 }
1776
1777 auto VerifyClobberOnExit = make_scope_exit([&]() {
1778 if (NonFrameStart == MBB.end())
1779 return;
1780 // Check if any of the newly inserted instructions clobber any live registers.
1781 for (MachineInstr &MI :
1782 make_range(MBB.instr_begin(), NonFrameStart->getIterator())) {
1783 for (auto &Op : MI.operands())
1784 if (Op.isReg() && Op.isDef())
1785 assert(!LiveRegs.contains(Op.getReg()) &&
1786 "live register clobbered by inserted prologue instructions");
1787 }
1788 });
1789#endif
1790
1791 bool IsFunclet = MBB.isEHFuncletEntry();
1792
1793 // At this point, we're going to decide whether or not the function uses a
1794 // redzone. In most cases, the function doesn't have a redzone so let's
1795 // assume that's false and set it to true in the case that there's a redzone.
1796 AFI->setHasRedZone(false);
1797
1798 // Debug location must be unknown since the first debug location is used
1799 // to determine the end of the prologue.
1800 DebugLoc DL;
1801
1802 const auto &MFnI = *MF.getInfo<AArch64FunctionInfo>();
1803 if (MFnI.needsShadowCallStackPrologueEpilogue(MF))
1804 emitShadowCallStackPrologue(*TII, MF, MBB, MBBI, DL, NeedsWinCFI,
1805 MFnI.needsDwarfUnwindInfo(MF));
1806
1807 if (MFnI.shouldSignReturnAddress(MF)) {
1808 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1809 .setMIFlag(MachineInstr::FrameSetup);
1810 if (NeedsWinCFI)
1811 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
1812 }
1813
1814 if (EmitCFI && MFnI.isMTETagged()) {
1815 BuildMI(MBB, MBBI, DL, TII->get(AArch64::EMITMTETAGGED))
1817 }
1818
1819 // We signal the presence of a Swift extended frame to external tools by
1820 // storing FP with 0b0001 in bits 63:60. In normal userland operation a simple
1821 // ORR is sufficient; a Swift kernel is assumed to initialize the TBI bits so
1822 // that this remains true.
1823 if (HasFP && AFI->hasSwiftAsyncContext()) {
1824 switch (MF.getTarget().Options.SwiftAsyncFramePointer) {
1825 case SwiftAsyncFramePointerMode::DeploymentBased:
1826 if (Subtarget.swiftAsyncContextIsDynamicallySet()) {
1827 // The special symbol below is absolute and has a *value* that can be
1828 // combined with the frame pointer to signal an extended frame.
1829 BuildMI(MBB, MBBI, DL, TII->get(AArch64::LOADgot), AArch64::X16)
1830 .addExternalSymbol("swift_async_extendedFramePointerFlags",
1832 if (NeedsWinCFI) {
1833 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1835 HasWinCFI = true;
1836 }
1837 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXrs), AArch64::FP)
1838 .addUse(AArch64::FP)
1839 .addUse(AArch64::X16)
1840 .addImm(Subtarget.isTargetILP32() ? 32 : 0);
1841 if (NeedsWinCFI) {
1842 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1844 HasWinCFI = true;
1845 }
1846 break;
1847 }
1848 [[fallthrough]];
1849
1850 case SwiftAsyncFramePointerMode::Always:
1851 // ORR x29, x29, #0x1000_0000_0000_0000
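 // (0x1100 below is the AArch64 logical-immediate encoding N:immr:imms =
 // 1:000100:000000 of 0x1000_0000_0000_0000, i.e. a single set bit at
 // position 60.)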
1852 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXri), AArch64::FP)
1853 .addUse(AArch64::FP)
1854 .addImm(0x1100)
1856 if (NeedsWinCFI) {
1857 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1859 HasWinCFI = true;
1860 }
1861 break;
1862
1863 case SwiftAsyncFramePointerMode::Never:
1864 break;
1865 }
1866 }
1867
1868 // All calls are tail calls in GHC calling conv, and functions have no
1869 // prologue/epilogue.
1870 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
1871 return;
1872
1873 // Set tagged base pointer to the requested stack slot.
1874 // Ideally it should match SP value after prologue.
1875 std::optional<int> TBPI = AFI->getTaggedBasePointerIndex();
1876 if (TBPI)
1877 AFI->setTaggedBasePointerOffset(-MFI.getObjectOffset(*TBPI));
1878 else
1879 AFI->setTaggedBasePointerOffset(MFI.getStackSize());
1880
1881 const StackOffset &SVEStackSize = getSVEStackSize(MF);
1882
1883 // getStackSize() includes all the locals in its size calculation. We don't
1884 // include these locals when computing the stack size of a funclet, as they
1885 // are allocated in the parent's stack frame and accessed via the frame
1886 // pointer from the funclet. We only save the callee saved registers in the
1887 // funclet, which are really the callee saved registers of the parent
1888 // function, including the funclet.
1889 int64_t NumBytes =
1890 IsFunclet ? getWinEHFuncletFrameSize(MF) : MFI.getStackSize();
1891 if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {
1892 assert(!HasFP && "unexpected function without stack frame but with FP");
1893 assert(!SVEStackSize &&
1894 "unexpected function without stack frame but with SVE objects");
1895 // All of the stack allocation is for locals.
1896 AFI->setLocalStackSize(NumBytes);
1897 if (!NumBytes)
1898 return;
1899 // REDZONE: If the stack size is less than 128 bytes, we don't need
1900 // to actually allocate.
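 // For example (illustrative), a leaf function with 96 bytes of locals can
 // leave SP untouched and address them at [sp, #-96] .. [sp, #-8] within the
 // 128-byte red zone.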
1901 if (canUseRedZone(MF)) {
1902 AFI->setHasRedZone(true);
1903 ++NumRedZoneFunctions;
1904 } else {
1905 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1906 StackOffset::getFixed(-NumBytes), TII,
1907 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1908 if (EmitCFI) {
1909 // Label used to tie together the PROLOG_LABEL and the MachineMoves.
1910 MCSymbol *FrameLabel = MF.getContext().createTempSymbol();
1911 // Encode the stack size of the leaf function.
1912 unsigned CFIIndex = MF.addFrameInst(
1913 MCCFIInstruction::cfiDefCfaOffset(FrameLabel, NumBytes));
1914 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1915 .addCFIIndex(CFIIndex)
1917 }
1918 }
1919
1920 if (NeedsWinCFI) {
1921 HasWinCFI = true;
1922 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1924 }
1925
1926 return;
1927 }
1928
1929 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1930 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
1931
1932 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
1933 // All of the remaining stack allocations are for locals.
1934 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
1935 bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
1936 bool HomPrologEpilog = homogeneousPrologEpilog(MF);
1937 if (CombineSPBump) {
1938 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
1939 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1940 StackOffset::getFixed(-NumBytes), TII,
1941 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI,
1942 EmitAsyncCFI);
1943 NumBytes = 0;
1944 } else if (HomPrologEpilog) {
1945 // Stack has been already adjusted.
1946 NumBytes -= PrologueSaveSize;
1947 } else if (PrologueSaveSize != 0) {
1948 MBBI = convertCalleeSaveRestoreToSPPrePostIncDec(
1949 MBB, MBBI, DL, TII, -PrologueSaveSize, NeedsWinCFI, &HasWinCFI,
1950 EmitAsyncCFI);
1951 NumBytes -= PrologueSaveSize;
1952 }
1953 assert(NumBytes >= 0 && "Negative stack allocation size!?");
1954
1955 // Move past the saves of the callee-saved registers, fixing up the offsets
1956 // and pre-inc if we decided to combine the callee-save and local stack
1957 // pointer bump above.
1958 while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup) &&
1959 !IsSVECalleeSave(MBBI)) {
1960 if (CombineSPBump &&
1961 // Only fix-up frame-setup load/store instructions.
1962 (!requiresSaveVG(MF) || !isVGInstruction(MBBI)))
1963 fixupCalleeSaveRestoreStackOffset(*MBBI, AFI->getLocalStackSize(),
1964 NeedsWinCFI, &HasWinCFI);
1965 ++MBBI;
1966 }
1967
1968 // For funclets the FP belongs to the containing function.
1969 if (!IsFunclet && HasFP) {
1970 // Only set up FP if we actually need to.
1971 int64_t FPOffset = AFI->getCalleeSaveBaseToFrameRecordOffset();
1972
1973 if (CombineSPBump)
1974 FPOffset += AFI->getLocalStackSize();
1975
1976 if (AFI->hasSwiftAsyncContext()) {
1977 // Before we update the live FP we have to ensure there's a valid (or
1978 // null) asynchronous context in its slot just before FP in the frame
1979 // record, so store it now.
1980 const auto &Attrs = MF.getFunction().getAttributes();
1981 bool HaveInitialContext = Attrs.hasAttrSomewhere(Attribute::SwiftAsync);
1982 if (HaveInitialContext)
1983 MBB.addLiveIn(AArch64::X22);
1984 Register Reg = HaveInitialContext ? AArch64::X22 : AArch64::XZR;
1985 BuildMI(MBB, MBBI, DL, TII->get(AArch64::StoreSwiftAsyncContext))
1986 .addUse(Reg)
1987 .addUse(AArch64::SP)
1988 .addImm(FPOffset - 8)
1990 if (NeedsWinCFI) {
1991 // WinCFI and arm64e, where StoreSwiftAsyncContext is expanded
1992 // to multiple instructions, should be mutually-exclusive.
1993 assert(Subtarget.getTargetTriple().getArchName() != "arm64e");
1994 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1996 HasWinCFI = true;
1997 }
1998 }
1999
2000 if (HomPrologEpilog) {
2001 auto Prolog = MBBI;
2002 --Prolog;
2003 assert(Prolog->getOpcode() == AArch64::HOM_Prolog);
2004 Prolog->addOperand(MachineOperand::CreateImm(FPOffset));
2005 } else {
2006 // Issue sub fp, sp, FPOffset or
2007 // mov fp,sp when FPOffset is zero.
2008 // Note: All stores of callee-saved registers are marked as "FrameSetup".
2009 // This code marks the instruction(s) that set the FP also.
2010 emitFrameOffset(MBB, MBBI, DL, AArch64::FP, AArch64::SP,
2011 StackOffset::getFixed(FPOffset), TII,
2012 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
2013 if (NeedsWinCFI && HasWinCFI) {
2014 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
2016 // After setting up the FP, the rest of the prolog doesn't need to be
2017 // included in the SEH unwind info.
2018 NeedsWinCFI = false;
2019 }
2020 }
2021 if (EmitAsyncCFI)
2022 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
2023 }
2024
2025 // Now emit the moves for whatever callee saved regs we have (including FP,
2026 // LR if those are saved). Frame instructions for SVE registers are emitted
2027 // later, after the instructions which actually save the SVE regs.
2028 if (EmitAsyncCFI)
2029 emitCalleeSavedGPRLocations(MBB, MBBI);
2030
2031 // Alignment is required for the parent frame, not the funclet
2032 const bool NeedsRealignment =
2033 NumBytes && !IsFunclet && RegInfo->hasStackRealignment(MF);
2034 const int64_t RealignmentPadding =
2035 (NeedsRealignment && MFI.getMaxAlign() > Align(16))
2036 ? MFI.getMaxAlign().value() - 16
2037 : 0;
2038
2039 if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
2040 uint64_t NumWords = (NumBytes + RealignmentPadding) >> 4;
2041 if (NeedsWinCFI) {
2042 HasWinCFI = true;
2043 // alloc_l can hold at most 256MB, so assume that NumBytes doesn't
2044 // exceed this amount. We need to move at most 2^24 - 1 into x15.
2045 // This is at most two instructions, MOVZ followed by MOVK.
2046 // TODO: Fix to use multiple stack alloc unwind codes for stacks
2047 // exceeding 256MB in size.
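 // For example (values chosen for exposition): NumBytes = 0x240000 gives
 // NumWords = 0x24000, emitted as "movz x15, #0x4000" followed by
 // "movk x15, #0x2, lsl #16" before the __chkstk call below.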
2048 if (NumBytes >= (1 << 28))
2049 report_fatal_error("Stack size cannot exceed 256MB for stack "
2050 "unwinding purposes");
2051
2052 uint32_t LowNumWords = NumWords & 0xFFFF;
2053 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVZXi), AArch64::X15)
2054 .addImm(LowNumWords)
2057 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2059 if ((NumWords & 0xFFFF0000) != 0) {
2060 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVKXi), AArch64::X15)
2061 .addReg(AArch64::X15)
2062 .addImm((NumWords & 0xFFFF0000) >> 16) // High half
2065 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2067 }
2068 } else {
2069 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm), AArch64::X15)
2070 .addImm(NumWords)
2072 }
2073
2074 const char *ChkStk = Subtarget.getChkStkName();
2075 switch (MF.getTarget().getCodeModel()) {
2076 case CodeModel::Tiny:
2077 case CodeModel::Small:
2078 case CodeModel::Medium:
2079 case CodeModel::Kernel:
2080 BuildMI(MBB, MBBI, DL, TII->get(AArch64::BL))
2081 .addExternalSymbol(ChkStk)
2082 .addReg(AArch64::X15, RegState::Implicit)
2087 if (NeedsWinCFI) {
2088 HasWinCFI = true;
2089 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2091 }
2092 break;
2093 case CodeModel::Large:
2094 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVaddrEXT))
2095 .addReg(AArch64::X16, RegState::Define)
2096 .addExternalSymbol(ChkStk)
2097 .addExternalSymbol(ChkStk)
2099 if (NeedsWinCFI) {
2100 HasWinCFI = true;
2101 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2103 }
2104
2105 BuildMI(MBB, MBBI, DL, TII->get(getBLRCallOpcode(MF)))
2106 .addReg(AArch64::X16, RegState::Kill)
2112 if (NeedsWinCFI) {
2113 HasWinCFI = true;
2114 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2116 }
2117 break;
2118 }
2119
2120 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SUBXrx64), AArch64::SP)
2121 .addReg(AArch64::SP, RegState::Kill)
2122 .addReg(AArch64::X15, RegState::Kill)
2125 if (NeedsWinCFI) {
2126 HasWinCFI = true;
2127 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))
2128 .addImm(NumBytes)
2130 }
2131 NumBytes = 0;
2132
2133 if (RealignmentPadding > 0) {
2134 if (RealignmentPadding >= 4096) {
2135 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm))
2136 .addReg(AArch64::X16, RegState::Define)
2137 .addImm(RealignmentPadding)
2139 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXrx64), AArch64::X15)
2140 .addReg(AArch64::SP)
2141 .addReg(AArch64::X16, RegState::Kill)
2144 } else {
2145 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXri), AArch64::X15)
2146 .addReg(AArch64::SP)
2147 .addImm(RealignmentPadding)
2148 .addImm(0)
2150 }
2151
2152 uint64_t AndMask = ~(MFI.getMaxAlign().value() - 1);
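 // For example, MaxAlign = 64 gives AndMask = 0xffffffffffffffc0, so the AND
 // below rounds the new SP down to a 64-byte boundary.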
2153 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
2154 .addReg(AArch64::X15, RegState::Kill)
2155 .addImm(AArch64_AM::encodeLogicalImmediate(AndMask, 64));
2156 AFI->setStackRealigned(true);
2157
2158 // No need for SEH instructions here; if we're realigning the stack,
2159 // we've set a frame pointer and already finished the SEH prologue.
2160 assert(!NeedsWinCFI);
2161 }
2162 }
2163
2164 StackOffset SVECalleeSavesSize = {}, SVELocalsSize = SVEStackSize;
2165 MachineBasicBlock::iterator CalleeSavesBegin = MBBI, CalleeSavesEnd = MBBI;
2166
2167 // Process the SVE callee-saves to determine what space needs to be
2168 // allocated.
2169 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2170 LLVM_DEBUG(dbgs() << "SVECalleeSavedStackSize = " << CalleeSavedSize
2171 << "\n");
2172 // Find callee save instructions in frame.
2173 CalleeSavesBegin = MBBI;
2174 assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
2175 while (IsSVECalleeSave(MBBI) && MBBI != MBB.getFirstTerminator())
2176 ++MBBI;
2177 CalleeSavesEnd = MBBI;
2178
2179 SVECalleeSavesSize = StackOffset::getScalable(CalleeSavedSize);
2180 SVELocalsSize = SVEStackSize - SVECalleeSavesSize;
2181 }
2182
2183 // Allocate space for the callee saves (if any).
2184 StackOffset CFAOffset =
2185 StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes);
2186 StackOffset LocalsSize = SVELocalsSize + StackOffset::getFixed(NumBytes);
2187 allocateStackSpace(MBB, CalleeSavesBegin, 0, SVECalleeSavesSize, false,
2188 nullptr, EmitAsyncCFI && !HasFP, CFAOffset,
2189 MFI.hasVarSizedObjects() || LocalsSize);
2190 CFAOffset += SVECalleeSavesSize;
2191
2192 if (EmitAsyncCFI)
2193 emitCalleeSavedSVELocations(MBB, CalleeSavesEnd);
2194
2195 // Allocate space for the rest of the frame including SVE locals. Align the
2196 // stack as necessary.
2197 assert(!(canUseRedZone(MF) && NeedsRealignment) &&
2198 "Cannot use redzone with stack realignment");
2199 if (!canUseRedZone(MF)) {
2200 // FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
2201 // the correct value here, as NumBytes also includes padding bytes,
2202 // which shouldn't be counted here.
2203 allocateStackSpace(MBB, CalleeSavesEnd, RealignmentPadding,
2204 SVELocalsSize + StackOffset::getFixed(NumBytes),
2205 NeedsWinCFI, &HasWinCFI, EmitAsyncCFI && !HasFP,
2206 CFAOffset, MFI.hasVarSizedObjects());
2207 }
2208
2209 // If we need a base pointer, set it up here. It's whatever the value of the
2210 // stack pointer is at this point. Any variable size objects will be allocated
2211 // after this, so we can still use the base pointer to reference locals.
2212 //
2213 // FIXME: Clarify FrameSetup flags here.
2214 // Note: Use emitFrameOffset() like above for FP if the FrameSetup flag is
2215 // needed.
2216 // For funclets the BP belongs to the containing function.
2217 if (!IsFunclet && RegInfo->hasBasePointer(MF)) {
2218 TII->copyPhysReg(MBB, MBBI, DL, RegInfo->getBaseRegister(), AArch64::SP,
2219 false);
2220 if (NeedsWinCFI) {
2221 HasWinCFI = true;
2222 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2224 }
2225 }
2226
2227 // The very last FrameSetup instruction indicates the end of prologue. Emit a
2228 // SEH opcode indicating the prologue end.
2229 if (NeedsWinCFI && HasWinCFI) {
2230 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
2232 }
2233
2234 // SEH funclets are passed the frame pointer in X1. If the parent
2235 // function uses the base register, then the base register is used
2236 // directly, and is not retrieved from X1.
2237 if (IsFunclet && F.hasPersonalityFn()) {
2238 EHPersonality Per = classifyEHPersonality(F.getPersonalityFn());
2239 if (isAsynchronousEHPersonality(Per)) {
2240 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::COPY), AArch64::FP)
2241 .addReg(AArch64::X1)
2243 MBB.addLiveIn(AArch64::X1);
2244 }
2245 }
2246
2247 if (EmitCFI && !EmitAsyncCFI) {
2248 if (HasFP) {
2249 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
2250 } else {
2251 StackOffset TotalSize =
2252 SVEStackSize + StackOffset::getFixed((int64_t)MFI.getStackSize());
2253 unsigned CFIIndex = MF.addFrameInst(createDefCFA(
2254 *RegInfo, /*FrameReg=*/AArch64::SP, /*Reg=*/AArch64::SP, TotalSize,
2255 /*LastAdjustmentWasScalable=*/false));
2256 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2257 .addCFIIndex(CFIIndex)
2259 }
2260 emitCalleeSavedGPRLocations(MBB, MBBI);
2261 emitCalleeSavedSVELocations(MBB, MBBI);
2262 }
2263}
2264
2266 switch (MI.getOpcode()) {
2267 default:
2268 return false;
2269 case AArch64::CATCHRET:
2270 case AArch64::CLEANUPRET:
2271 return true;
2272 }
2273}
2274
2275 void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
2276 MachineBasicBlock &MBB) const {
2277 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
2278 MachineFrameInfo &MFI = MF.getFrameInfo();
2279 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2280 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2281 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
2282 DebugLoc DL;
2283 bool NeedsWinCFI = needsWinCFI(MF);
2284 bool EmitCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
2285 bool HasWinCFI = false;
2286 bool IsFunclet = false;
2287
2288 if (MBB.end() != MBBI) {
2289 DL = MBBI->getDebugLoc();
2290 IsFunclet = isFuncletReturnInstr(*MBBI);
2291 }
2292
2293 MachineBasicBlock::iterator EpilogStartI = MBB.end();
2294
2295 auto FinishingTouches = make_scope_exit([&]() {
2296 if (AFI->shouldSignReturnAddress(MF)) {
2297 BuildMI(MBB, MBB.getFirstTerminator(), DL,
2298 TII->get(AArch64::PAUTH_EPILOGUE))
2299 .setMIFlag(MachineInstr::FrameDestroy);
2300 if (NeedsWinCFI)
2301 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
2302 }
2305 if (EmitCFI)
2306 emitCalleeSavedGPRRestores(MBB, MBB.getFirstTerminator());
2307 if (HasWinCFI) {
2308 BuildMI(MBB, MBB.getFirstTerminator(), DL,
2309 TII->get(AArch64::SEH_EpilogEnd))
2310 .setMIFlag(MachineInstr::FrameDestroy);
2311 if (!MF.hasWinCFI())
2312 MF.setHasWinCFI(true);
2313 }
2314 if (NeedsWinCFI) {
2315 assert(EpilogStartI != MBB.end());
2316 if (!HasWinCFI)
2317 MBB.erase(EpilogStartI);
2318 }
2319 });
2320
2321 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
2322 : MFI.getStackSize();
2323
2324 // All calls are tail calls in GHC calling conv, and functions have no
2325 // prologue/epilogue.
2326 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2327 return;
2328
2329 // How much of the stack used by incoming arguments this function is expected
2330 // to restore in this particular epilogue.
2331 int64_t ArgumentStackToRestore = getArgumentStackToRestore(MF, MBB);
2332 bool IsWin64 = Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
2333 MF.getFunction().isVarArg());
2334 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
2335
2336 int64_t AfterCSRPopSize = ArgumentStackToRestore;
2337 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
2338 // We cannot rely on the local stack size set in emitPrologue if the function
2339 // has funclets, as funclets have different local stack size requirements, and
2340 // the current value set in emitPrologue may be that of the containing
2341 // function.
2342 if (MF.hasEHFunclets())
2343 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
2344 if (homogeneousPrologEpilog(MF, &MBB)) {
2345 assert(!NeedsWinCFI);
2346 auto LastPopI = MBB.getFirstTerminator();
2347 if (LastPopI != MBB.begin()) {
2348 auto HomogeneousEpilog = std::prev(LastPopI);
2349 if (HomogeneousEpilog->getOpcode() == AArch64::HOM_Epilog)
2350 LastPopI = HomogeneousEpilog;
2351 }
2352
2353 // Adjust local stack
2354 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2355 StackOffset::getFixed(AFI->getLocalStackSize()), TII,
2356 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2357
2358 // SP has been already adjusted while restoring callee save regs.
2359 // We have already bailed out of the case that adjusts SP for arguments.
2360 assert(AfterCSRPopSize == 0);
2361 return;
2362 }
2363 bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
2364 // Assume we can't combine the last pop with the sp restore.
2365 bool CombineAfterCSRBump = false;
2366 if (!CombineSPBump && PrologueSaveSize != 0) {
2367 MachineBasicBlock::iterator Pop = std::prev(MBB.getFirstTerminator());
2368 while (Pop->getOpcode() == TargetOpcode::CFI_INSTRUCTION ||
2369 AArch64InstrInfo::isSEHInstruction(*Pop))
2370 Pop = std::prev(Pop);
2371 // Converting the last ldp to a post-index ldp is valid only if the last
2372 // ldp's offset is 0.
2373 const MachineOperand &OffsetOp = Pop->getOperand(Pop->getNumOperands() - 1);
2374 // If the offset is 0 and the AfterCSR pop is not actually trying to
2375 // allocate more stack for arguments (in space that an untimely interrupt
2376 // may clobber), convert it to a post-index ldp.
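 // For example (registers illustrative): "ldp x21, x22, [sp]" followed by
 // "add sp, sp, #16" becomes the single post-indexed "ldp x21, x22, [sp], #16".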
2377 if (OffsetOp.getImm() == 0 && AfterCSRPopSize >= 0) {
2378 convertCalleeSaveRestoreToSPPrePostIncDec(
2379 MBB, Pop, DL, TII, PrologueSaveSize, NeedsWinCFI, &HasWinCFI, EmitCFI,
2380 MachineInstr::FrameDestroy, PrologueSaveSize);
2381 } else {
2382 // If not, make sure to emit an add after the last ldp.
2383 // We're doing this by transferring the size to be restored from the
2384 // adjustment *before* the CSR pops to the adjustment *after* the CSR
2385 // pops.
2386 AfterCSRPopSize += PrologueSaveSize;
2387 CombineAfterCSRBump = true;
2388 }
2389 }
2390
2391 // Move past the restores of the callee-saved registers.
2392 // If we plan on combining the sp bump of the local stack size and the callee
2393 // save stack size, we might need to adjust the CSR save and restore offsets.
2394 MachineBasicBlock::iterator LastPopI = MBB.getFirstTerminator();
2395 MachineBasicBlock::iterator Begin = MBB.begin();
2396 while (LastPopI != Begin) {
2397 --LastPopI;
2398 if (!LastPopI->getFlag(MachineInstr::FrameDestroy) ||
2399 IsSVECalleeSave(LastPopI)) {
2400 ++LastPopI;
2401 break;
2402 } else if (CombineSPBump)
2403 fixupCalleeSaveRestoreStackOffset(*LastPopI, AFI->getLocalStackSize(),
2404 NeedsWinCFI, &HasWinCFI);
2405 }
2406
2407 if (NeedsWinCFI) {
2408 // Note that there are cases where we insert SEH opcodes in the
2409 // epilogue when we had no SEH opcodes in the prologue. For
2410 // example, when there is no stack frame but there are stack
2411 // arguments. Insert the SEH_EpilogStart and remove it later if
2412 // we didn't emit any SEH opcodes to avoid generating WinCFI for
2413 // functions that don't need it.
2414 BuildMI(MBB, LastPopI, DL, TII->get(AArch64::SEH_EpilogStart))
2415 .setMIFlag(MachineInstr::FrameDestroy);
2416 EpilogStartI = LastPopI;
2417 --EpilogStartI;
2418 }
2419
2420 if (hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2421 switch (MF.getTarget().Options.SwiftAsyncFramePointer) {
2422 case SwiftAsyncFramePointerMode::DeploymentBased:
2423 // Avoid the reload as it is GOT relative, and instead fall back to the
2424 // hardcoded value below. This allows a mismatch between the OS and
2425 // application without immediately terminating on the difference.
2426 [[fallthrough]];
2427 case SwiftAsyncFramePointerMode::Always:
2428 // We need to reset FP to its untagged state on return. Bit 60 is
2429 // currently used to show the presence of an extended frame.
2430
2431 // BIC x29, x29, #0x1000_0000_0000_0000
2432 BuildMI(MBB, MBB.getFirstTerminator(), DL, TII->get(AArch64::ANDXri),
2433 AArch64::FP)
2434 .addUse(AArch64::FP)
2435 .addImm(0x10fe)
2437 if (NeedsWinCFI) {
2438 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2440 HasWinCFI = true;
2441 }
2442 break;
2443
2444 case SwiftAsyncFramePointerMode::Never:
2445 break;
2446 }
2447 }
2448
2449 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2450
2451 // If there is a single SP update, insert it before the ret and we're done.
2452 if (CombineSPBump) {
2453 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
2454
2455 // When we are about to restore the CSRs, the CFA register is SP again.
2456 if (EmitCFI && hasFP(MF)) {
2457 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2458 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2459 unsigned CFIIndex =
2460 MF.addFrameInst(MCCFIInstruction::cfiDefCfa(nullptr, Reg, NumBytes));
2461 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2462 .addCFIIndex(CFIIndex)
2464 }
2465
2466 emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2467 StackOffset::getFixed(NumBytes + (int64_t)AfterCSRPopSize),
2468 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI,
2469 &HasWinCFI, EmitCFI, StackOffset::getFixed(NumBytes));
2470 return;
2471 }
2472
2473 NumBytes -= PrologueSaveSize;
2474 assert(NumBytes >= 0 && "Negative stack allocation size!?");
2475
2476 // Process the SVE callee-saves to determine what space needs to be
2477 // deallocated.
2478 StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
2479 MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
2480 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2481 RestoreBegin = std::prev(RestoreEnd);
2482 while (RestoreBegin != MBB.begin() &&
2483 IsSVECalleeSave(std::prev(RestoreBegin)))
2484 --RestoreBegin;
2485
2486 assert(IsSVECalleeSave(RestoreBegin) &&
2487 IsSVECalleeSave(std::prev(RestoreEnd)) && "Unexpected instruction");
2488
2489 StackOffset CalleeSavedSizeAsOffset =
2490 StackOffset::getScalable(CalleeSavedSize);
2491 DeallocateBefore = SVEStackSize - CalleeSavedSizeAsOffset;
2492 DeallocateAfter = CalleeSavedSizeAsOffset;
2493 }
2494
2495 // Deallocate the SVE area.
2496 if (SVEStackSize) {
2497 // If we have stack realignment or variable sized objects on the stack,
2498 // restore the stack pointer from the frame pointer prior to SVE CSR
2499 // restoration.
2500 if (AFI->isStackRealigned() || MFI.hasVarSizedObjects()) {
2501 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2502 // Set SP to start of SVE callee-save area from which they can
2503 // be reloaded. The code below will deallocate the stack space
2504 // by moving FP -> SP.
2505 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::FP,
2506 StackOffset::getScalable(-CalleeSavedSize), TII,
2508 }
2509 } else {
2510 if (AFI->getSVECalleeSavedStackSize()) {
2511 // Deallocate the non-SVE locals first before we can deallocate (and
2512 // restore callee saves) from the SVE area.
2514 MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2516 false, false, nullptr, EmitCFI && !hasFP(MF),
2517 SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize));
2518 NumBytes = 0;
2519 }
2520
2521 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2522 DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
2523 false, nullptr, EmitCFI && !hasFP(MF),
2524 SVEStackSize +
2525 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2526
2527 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
2528 DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
2529 false, nullptr, EmitCFI && !hasFP(MF),
2530 DeallocateAfter +
2531 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2532 }
2533 if (EmitCFI)
2534 emitCalleeSavedSVERestores(MBB, RestoreEnd);
2535 }
2536
2537 if (!hasFP(MF)) {
2538 bool RedZone = canUseRedZone(MF);
2539 // If this was a redzone leaf function, we don't need to restore the
2540 // stack pointer (but we may need to pop stack args for fastcc).
2541 if (RedZone && AfterCSRPopSize == 0)
2542 return;
2543
2544 // Pop the local variables off the stack. If there are no callee-saved
2545 // registers, it means we are actually positioned at the terminator and can
2546 // combine stack increment for the locals and the stack increment for
2547 // callee-popped arguments into (possibly) a single instruction and be done.
2548 bool NoCalleeSaveRestore = PrologueSaveSize == 0;
2549 int64_t StackRestoreBytes = RedZone ? 0 : NumBytes;
2550 if (NoCalleeSaveRestore)
2551 StackRestoreBytes += AfterCSRPopSize;
2552
2554 MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2555 StackOffset::getFixed(StackRestoreBytes), TII,
2556 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2557 StackOffset::getFixed((RedZone ? 0 : NumBytes) + PrologueSaveSize));
2558
2559 // If we were able to combine the local stack pop with the argument pop,
2560 // then we're done.
2561 if (NoCalleeSaveRestore || AfterCSRPopSize == 0) {
2562 return;
2563 }
2564
2565 NumBytes = 0;
2566 }
2567
2568 // Restore the original stack pointer.
2569 // FIXME: Rather than doing the math here, we should instead just use
2570 // non-post-indexed loads for the restores if we aren't actually going to
2571 // be able to save any instructions.
2572 if (!IsFunclet && (MFI.hasVarSizedObjects() || AFI->isStackRealigned())) {
2574 MBB, LastPopI, DL, AArch64::SP, AArch64::FP,
2576 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2577 } else if (NumBytes)
2578 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2579 StackOffset::getFixed(NumBytes), TII,
2580 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2581
2582 // When we are about to restore the CSRs, the CFA register is SP again.
2583 if (EmitCFI && hasFP(MF)) {
2584 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2585 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2586 unsigned CFIIndex = MF.addFrameInst(
2587 MCCFIInstruction::cfiDefCfa(nullptr, Reg, PrologueSaveSize));
2588 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2589 .addCFIIndex(CFIIndex)
2591 }
2592
2593 // This must be placed after the callee-save restore code because that code
2594 // assumes the SP is at the same location as it was after the callee-save save
2595 // code in the prologue.
2596 if (AfterCSRPopSize) {
2597 assert(AfterCSRPopSize > 0 && "attempting to reallocate arg stack that an "
2598 "interrupt may have clobbered");
2599
2601 MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2603 false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2604 StackOffset::getFixed(CombineAfterCSRBump ? PrologueSaveSize : 0));
2605 }
2606}
2607
2610 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
2611}
2612
2613/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
2614/// debug info. It's the same as what we use for resolving the code-gen
2615/// references for now. FIXME: This can go wrong when references are
2616/// SP-relative and simple call frames aren't used.
2617 StackOffset
2618 AArch64FrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI,
2619 Register &FrameReg) const {
2620 return resolveFrameIndexReference(
2621 MF, FI, FrameReg,
2622 /*PreferFP=*/
2623 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
2624 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
2625 /*ForSimm=*/false);
2626}
2627
2628 StackOffset
2629 AArch64FrameLowering::getFrameIndexReferenceFromSP(const MachineFunction &MF,
2630 int FI) const {
2631 // This function serves to provide a comparable offset from a single reference
2632 // point (the value of SP at function entry) that can be used for analysis,
2633 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
2634 // correct for all objects in the presence of VLA-area objects or dynamic
2635 // stack re-alignment.
2636
2637 const auto &MFI = MF.getFrameInfo();
2638
2639 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2640 StackOffset SVEStackSize = getSVEStackSize(MF);
2641
2642 // For VLA-area objects, just emit an offset at the end of the stack frame.
2643 // Whilst not quite correct, these objects do live at the end of the frame and
2644 // so it is more useful for analysis for the offset to reflect this.
2645 if (MFI.isVariableSizedObjectIndex(FI)) {
2646 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
2647 }
2648
2649 // This is correct in the absence of any SVE stack objects.
2650 if (!SVEStackSize)
2651 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
2652
2653 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2654 if (MFI.getStackID(FI) == TargetStackID::ScalableVector) {
2655 return StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
2656 ObjectOffset);
2657 }
2658
2659 bool IsFixed = MFI.isFixedObjectIndex(FI);
2660 bool IsCSR =
2661 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2662
2663 StackOffset ScalableOffset = {};
2664 if (!IsFixed && !IsCSR)
2665 ScalableOffset = -SVEStackSize;
2666
2667 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
2668}
2669
2672 int FI) const {
2674}
2675
2676 static StackOffset getFPOffset(const MachineFunction &MF,
2677 int64_t ObjectOffset) {
2678 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2679 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2680 const Function &F = MF.getFunction();
2681 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
2682 unsigned FixedObject =
2683 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
2684 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
2685 int64_t FPAdjust =
2686 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
2687 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
2688}
2689
2690 static StackOffset getStackOffset(const MachineFunction &MF,
2691 int64_t ObjectOffset) {
2692 const auto &MFI = MF.getFrameInfo();
2693 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
2694}
2695
2696// TODO: This function currently does not work for scalable vectors.
2697 int AArch64FrameLowering::getSEHFrameIndexOffset(const MachineFunction &MF,
2698 int FI) const {
2699 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2700 MF.getSubtarget().getRegisterInfo());
2701 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
2702 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
2703 ? getFPOffset(MF, ObjectOffset).getFixed()
2704 : getStackOffset(MF, ObjectOffset).getFixed();
2705}
2706
2707 StackOffset AArch64FrameLowering::resolveFrameIndexReference(
2708 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
2709 bool ForSimm) const {
2710 const auto &MFI = MF.getFrameInfo();
2711 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2712 bool isFixed = MFI.isFixedObjectIndex(FI);
2713 bool isSVE = MFI.getStackID(FI) == TargetStackID::ScalableVector;
2714 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
2715 PreferFP, ForSimm);
2716}
2717
2718 StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
2719 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
2720 Register &FrameReg, bool PreferFP, bool ForSimm) const {
2721 const auto &MFI = MF.getFrameInfo();
2722 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2723 MF.getSubtarget().getRegisterInfo());
2724 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2725 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2726
2727 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
2728 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
2729 bool isCSR =
2730 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2731
2732 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2733
2734 // Use frame pointer to reference fixed objects. Use it for locals if
2735 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
2736 // reliable as a base). Make sure useFPForScavengingIndex() does the
2737 // right thing for the emergency spill slot.
2738 bool UseFP = false;
2739 if (AFI->hasStackFrame() && !isSVE) {
2740 // We shouldn't prefer using the FP to access fixed-sized stack objects when
2741 // there are scalable (SVE) objects in between the FP and the fixed-sized
2742 // objects.
2743 PreferFP &= !SVEStackSize;
2744
2745 // Note: Keeping the following as multiple 'if' statements rather than
2746 // merging to a single expression for readability.
2747 //
2748 // Argument access should always use the FP.
2749 if (isFixed) {
2750 UseFP = hasFP(MF);
2751 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
2752 // References to the CSR area must use FP if we're re-aligning the stack
2753 // since the dynamically-sized alignment padding is between the SP/BP and
2754 // the CSR area.
2755 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
2756 UseFP = true;
2757 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
2758 // If the FPOffset is negative and we're producing a signed immediate, we
2759 // have to keep in mind that the available offset range for negative
2760 // offsets is smaller than for positive ones. If an offset is available
2761 // via the FP and the SP, use whichever is closest.
2762 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
2763 PreferFP |= Offset > -FPOffset && !SVEStackSize;
2764
2765 if (FPOffset >= 0) {
2766 // If the FPOffset is positive, that'll always be best, as the SP/BP
2767 // will be even further away.
2768 UseFP = true;
2769 } else if (MFI.hasVarSizedObjects()) {
2770 // If we have variable sized objects, we can use either FP or BP, as the
2771 // SP offset is unknown. We can use the base pointer if we have one and
2772 // FP is not preferred. If not, we're stuck with using FP.
2773 bool CanUseBP = RegInfo->hasBasePointer(MF);
2774 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
2775 UseFP = PreferFP;
2776 else if (!CanUseBP) // Can't use BP. Forced to use FP.
2777 UseFP = true;
2778 // else we can use BP and FP, but the offset from FP won't fit.
2779 // That will make us scavenge registers which we can probably avoid by
2780 // using BP. If it won't fit for BP either, we'll scavenge anyway.
2781 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
2782 // Funclets access the locals contained in the parent's stack frame
2783 // via the frame pointer, so we have to use the FP in the parent
2784 // function.
2785 (void) Subtarget;
2786 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
2787 MF.getFunction().isVarArg()) &&
2788 "Funclets should only be present on Win64");
2789 UseFP = true;
2790 } else {
2791 // We have the choice between FP and (SP or BP).
2792 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
2793 UseFP = true;
2794 }
2795 }
2796 }
2797
2798 assert(
2799 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
2800 "In the presence of dynamic stack pointer realignment, "
2801 "non-argument/CSR objects cannot be accessed through the frame pointer");
2802
2803 if (isSVE) {
2804 StackOffset FPOffset =
2806 StackOffset SPOffset =
2807 SVEStackSize +
2808 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
2809 ObjectOffset);
2810 // Always use the FP for SVE spills if available and beneficial.
2811 if (hasFP(MF) && (SPOffset.getFixed() ||
2812 FPOffset.getScalable() < SPOffset.getScalable() ||
2813 RegInfo->hasStackRealignment(MF))) {
2814 FrameReg = RegInfo->getFrameRegister(MF);
2815 return FPOffset;
2816 }
2817
2818 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
2819 : (unsigned)AArch64::SP;
2820 return SPOffset;
2821 }
2822
2823 StackOffset ScalableOffset = {};
2824 if (UseFP && !(isFixed || isCSR))
2825 ScalableOffset = -SVEStackSize;
2826 if (!UseFP && (isFixed || isCSR))
2827 ScalableOffset = SVEStackSize;
2828
2829 if (UseFP) {
2830 FrameReg = RegInfo->getFrameRegister(MF);
2831 return StackOffset::getFixed(FPOffset) + ScalableOffset;
2832 }
2833
2834 // Use the base pointer if we have one.
2835 if (RegInfo->hasBasePointer(MF))
2836 FrameReg = RegInfo->getBaseRegister();
2837 else {
2838 assert(!MFI.hasVarSizedObjects() &&
2839 "Can't use SP when we have var sized objects.");
2840 FrameReg = AArch64::SP;
2841 // If we're using the red zone for this function, the SP won't actually
2842 // be adjusted, so the offsets will be negative. They're also all
2843 // within range of the signed 9-bit immediate instructions.
2844 if (canUseRedZone(MF))
2845 Offset -= AFI->getLocalStackSize();
2846 }
2847
2848 return StackOffset::getFixed(Offset) + ScalableOffset;
2849}
2850
2851static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
2852 // Do not set a kill flag on values that are also marked as live-in. This
2853 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
2854 // callee saved registers.
2855 // Omitting the kill flags is conservatively correct even if the live-in
2856 // is not used after all.
2857 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
2858 return getKillRegState(!IsLiveIn);
2859}
2860
2861 static bool produceCompactUnwindFrame(MachineFunction &MF) {
2862 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2863 AttributeList Attrs = MF.getFunction().getAttributes();
2864 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2865 return Subtarget.isTargetMachO() &&
2866 !(Subtarget.getTargetLowering()->supportSwiftError() &&
2867 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
2868 MF.getFunction().getCallingConv() != CallingConv::SwiftTail &&
2869 !requiresSaveVG(MF) && AFI->getSVECalleeSavedStackSize() == 0;
2870}
2871
2872static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
2873 bool NeedsWinCFI, bool IsFirst,
2874 const TargetRegisterInfo *TRI) {
2875 // If we are generating register pairs for a Windows function that requires
2876 // EH support, then pair consecutive registers only. There are no unwind
2877 // opcodes for saves/restores of non-consecutive register pairs.
2878 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
2879 // save_lrpair.
2880 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
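 // For example, x19/x20 can be described with save_regp, but a pair such as
 // x19/x21 cannot, so non-consecutive pairings are rejected below.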
2881
2882 if (Reg2 == AArch64::FP)
2883 return true;
2884 if (!NeedsWinCFI)
2885 return false;
2886 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
2887 return false;
2888 // If pairing a GPR with LR, the pair can be described by the save_lrpair
2889 // opcode. If this is the first register pair, it would end up with a
2890 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
2891 // if LR is paired with something else than the first register.
2892 // The save_lrpair opcode requires the first register to be an odd one.
2893 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
2894 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
2895 return false;
2896 return true;
2897}
2898
2899/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
2900/// WindowsCFI requires that only consecutive registers can be paired.
2901/// LR and FP need to be allocated together when the frame needs to save
2902/// the frame-record. This means any other register pairing with LR is invalid.
2903static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
2904 bool UsesWinAAPCS, bool NeedsWinCFI,
2905 bool NeedsFrameRecord, bool IsFirst,
2906 const TargetRegisterInfo *TRI) {
2907 if (UsesWinAAPCS)
2908 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
2909 TRI);
2910
2911 // If we need to store the frame record, don't pair any register
2912 // with LR other than FP.
2913 if (NeedsFrameRecord)
2914 return Reg2 == AArch64::LR;
2915
2916 return false;
2917}
2918
2919namespace {
2920
2921struct RegPairInfo {
2922 unsigned Reg1 = AArch64::NoRegister;
2923 unsigned Reg2 = AArch64::NoRegister;
2924 int FrameIdx;
2925 int Offset;
2926 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
2927 const TargetRegisterClass *RC;
2928
2929 RegPairInfo() = default;
2930
2931 bool isPaired() const { return Reg2 != AArch64::NoRegister; }
2932
2933 bool isScalable() const { return Type == PPR || Type == ZPR; }
2934};
2935
2936} // end anonymous namespace
2937
2938unsigned findFreePredicateReg(BitVector &SavedRegs) {
2939 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
2940 if (SavedRegs.test(PReg)) {
2941 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
2942 return PNReg;
2943 }
2944 }
2945 return AArch64::NoRegister;
2946}
2947
2948 // The multi-vector LD/ST instructions are only available on SME or SVE2p1 targets.
2950 MachineFunction &MF) {
2952 return false;
2953
2954 SMEAttrs FuncAttrs(MF.getFunction());
2955 bool IsLocallyStreaming =
2956 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
2957
2958 // SME2 instructions can only be used safely while actually in streaming mode;
2959 // it is not safe to use them in streaming-compatible or locally-streaming
2960 // functions.
2961 return Subtarget.hasSVE2p1() ||
2962 (Subtarget.hasSME2() &&
2963 (!IsLocallyStreaming && Subtarget.isStreaming()));
2964}
2965
2966 static void computeCalleeSaveRegisterPairs(
2967 MachineFunction &MF, ArrayRef<CalleeSavedInfo> CSI,
2968 const TargetRegisterInfo *TRI, SmallVectorImpl<RegPairInfo> &RegPairs,
2969 bool NeedsFrameRecord) {
2970
2971 if (CSI.empty())
2972 return;
2973
2974 bool IsWindows = isTargetWindows(MF);
2975 bool NeedsWinCFI = needsWinCFI(MF);
2977 unsigned StackHazardSize = getStackHazardSize(MF);
2978 MachineFrameInfo &MFI = MF.getFrameInfo();
2980 unsigned Count = CSI.size();
2981 (void)CC;
2982 // MachO's compact unwind format relies on all registers being stored in
2983 // pairs.
2986 CC == CallingConv::Win64 || (Count & 1) == 0) &&
2987 "Odd number of callee-saved regs to spill!");
2988 int ByteOffset = AFI->getCalleeSavedStackSize();
2989 int StackFillDir = -1;
2990 int RegInc = 1;
2991 unsigned FirstReg = 0;
2992 if (NeedsWinCFI) {
2993 // For WinCFI, fill the stack from the bottom up.
2994 ByteOffset = 0;
2995 StackFillDir = 1;
2996 // As the CSI array is reversed to match PrologEpilogInserter, iterate
2997 // backwards, to pair up registers starting from lower numbered registers.
2998 RegInc = -1;
2999 FirstReg = Count - 1;
3000 }
3001 int ScalableByteOffset = AFI->getSVECalleeSavedStackSize();
3002 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
3003 Register LastReg = 0;
3004
3005 // When iterating backwards, the loop condition relies on unsigned wraparound.
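 // (With RegInc == -1, decrementing i past zero wraps to UINT_MAX, which is
 // >= Count and therefore terminates the loop.)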
3006 for (unsigned i = FirstReg; i < Count; i += RegInc) {
3007 RegPairInfo RPI;
3008 RPI.Reg1 = CSI[i].getReg();
3009
3010 if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
3011 RPI.Type = RegPairInfo::GPR;
3012 RPI.RC = &AArch64::GPR64RegClass;
3013 } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
3014 RPI.Type = RegPairInfo::FPR64;
3015 RPI.RC = &AArch64::FPR64RegClass;
3016 } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
3017 RPI.Type = RegPairInfo::FPR128;
3018 RPI.RC = &AArch64::FPR128RegClass;
3019 } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
3020 RPI.Type = RegPairInfo::ZPR;
3021 RPI.RC = &AArch64::ZPRRegClass;
3022 } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
3023 RPI.Type = RegPairInfo::PPR;
3024 RPI.RC = &AArch64::PPRRegClass;
3025 } else if (RPI.Reg1 == AArch64::VG) {
3026 RPI.Type = RegPairInfo::VG;
3027 RPI.RC = &AArch64::FIXED_REGSRegClass;
3028 } else {
3029 llvm_unreachable("Unsupported register class.");
3030 }
3031
3032 // Add the stack hazard size as we transition from GPR->FPR CSRs.
3033 if (AFI->hasStackHazardSlotIndex() &&
3034 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
3036 ByteOffset += StackFillDir * StackHazardSize;
3037 LastReg = RPI.Reg1;
3038
3039 int Scale = TRI->getSpillSize(*RPI.RC);
3040 // Add the next reg to the pair if it is in the same register class.
3041 if (unsigned(i + RegInc) < Count && !AFI->hasStackHazardSlotIndex()) {
3042 Register NextReg = CSI[i + RegInc].getReg();
3043 bool IsFirst = i == FirstReg;
3044 switch (RPI.Type) {
3045 case RegPairInfo::GPR:
3046 if (AArch64::GPR64RegClass.contains(NextReg) &&
3047 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
3048 NeedsWinCFI, NeedsFrameRecord, IsFirst,
3049 TRI))
3050 RPI.Reg2 = NextReg;
3051 break;
3052 case RegPairInfo::FPR64:
3053 if (AArch64::FPR64RegClass.contains(NextReg) &&
3054 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
3055 IsFirst, TRI))
3056 RPI.Reg2 = NextReg;
3057 break;
3058 case RegPairInfo::FPR128:
3059 if (AArch64::FPR128RegClass.contains(NextReg))
3060 RPI.Reg2 = NextReg;
3061 break;
3062 case RegPairInfo::PPR:
3063 break;
3064 case RegPairInfo::ZPR:
3065 if (AFI->getPredicateRegForFillSpill() != 0 &&
3066 ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
3067 // Calculate offset of register pair to see if pair instruction can be
3068 // used.
3069 int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
3070 if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
3071 RPI.Reg2 = NextReg;
3072 }
3073 break;
3074 case RegPairInfo::VG:
3075 break;
3076 }
3077 }
3078
3079 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
3080 // list to come in sorted by frame index so that we can issue the store
3081 // pair instructions directly. Assert if we see anything otherwise.
3082 //
3083 // The order of the registers in the list is controlled by
3084 // getCalleeSavedRegs(), so they will always be in-order, as well.
3085 assert((!RPI.isPaired() ||
3086 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
3087 "Out of order callee saved regs!");
3088
3089 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
3090 RPI.Reg1 == AArch64::LR) &&
3091 "FrameRecord must be allocated together with LR");
3092
3093 // Windows AAPCS has FP and LR reversed.
3094 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
3095 RPI.Reg2 == AArch64::LR) &&
3096 "FrameRecord must be allocated together with LR");
3097
3098 // MachO's compact unwind format relies on all registers being stored in
3099 // adjacent register pairs.
3103 (RPI.isPaired() &&
3104 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
3105 RPI.Reg1 + 1 == RPI.Reg2))) &&
3106 "Callee-save registers not saved as adjacent register pair!");
3107
3108 RPI.FrameIdx = CSI[i].getFrameIdx();
3109 if (NeedsWinCFI &&
3110 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
3111 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
3112
3113 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
3114 assert(OffsetPre % Scale == 0);
3115
3116 if (RPI.isScalable())
3117 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
3118 else
3119 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
3120
3121 // Swift's async context is directly before FP, so allocate an extra
3122 // 8 bytes for it.
3123 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
3124 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
3125 (IsWindows && RPI.Reg2 == AArch64::LR)))
3126 ByteOffset += StackFillDir * 8;
3127
3128 // Round up size of non-pair to pair size if we need to pad the
3129 // callee-save area to ensure 16-byte alignment.
3130 if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
3131 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
3132 ByteOffset % 16 != 0) {
3133 ByteOffset += 8 * StackFillDir;
3134 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
3135 // A stack frame with a gap looks like this, bottom up:
3136 // d9, d8. x21, gap, x20, x19.
3137 // Set extra alignment on the x21 object to create the gap above it.
3138 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
3139 NeedGapToAlignStack = false;
3140 }
3141
3142 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
3143 assert(OffsetPost % Scale == 0);
3144 // If filling top down (default), we want the offset after incrementing it.
3145 // If filling bottom up (WinCFI) we need the original offset.
3146 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
3147
3148 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
3149 // Swift context can directly precede FP.
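 // Illustrative layout of the expanded 24-byte slot, bottom up:
 //   [ Swift async context ][ FP ][ LR ]
 // so FP/LR sit 8 bytes above the slot base and the context ends up at [fp, #-8].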
3150 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
3151 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
3152 (IsWindows && RPI.Reg2 == AArch64::LR)))
3153 Offset += 8;
3154 RPI.Offset = Offset / Scale;
3155
3156 assert((!RPI.isPaired() ||
3157 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
3158 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
3159 "Offset out of bounds for LDP/STP immediate");
3160
3161 auto isFrameRecord = [&] {
3162 if (RPI.isPaired())
3163 return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
3164 : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
3165 // Otherwise, look for the frame record as two unpaired registers. This is
3166 // needed for -aarch64-stack-hazard-size=<val>, which disables register
3167 // pairing (as the padding may be too large for the LDP/STP offset). Note:
3168 // On Windows, this check works out as current reg == FP, next reg == LR,
3169 // and on other platforms current reg == FP, previous reg == LR. This
3170 // works out as the correct pre-increment or post-increment offsets
3171 // respectively.
3172 return i > 0 && RPI.Reg1 == AArch64::FP &&
3173 CSI[i - 1].getReg() == AArch64::LR;
3174 };
3175
3176 // Save the offset to frame record so that the FP register can point to the
3177 // innermost frame record (spilled FP and LR registers).
3178 if (NeedsFrameRecord && isFrameRecord())
3180
3181 RegPairs.push_back(RPI);
3182 if (RPI.isPaired())
3183 i += RegInc;
3184 }
3185 if (NeedsWinCFI) {
3186 // If we need an alignment gap in the stack, align the topmost stack
3187 // object. A stack frame with a gap looks like this, bottom up:
3188 // x19, d8. d9, gap.
3189 // Set extra alignment on the topmost stack object (the first element in
3190 // CSI, which goes top down), to create the gap above it.
3191 if (AFI->hasCalleeSaveStackFreeSpace())
3192 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
3193 // We iterated bottom up over the registers; flip RegPairs back to top
3194 // down order.
3195 std::reverse(RegPairs.begin(), RegPairs.end());
3196 }
3197}
3198
3199 bool AArch64FrameLowering::spillCalleeSavedRegisters(
3200 MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
3201 ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
3202 MachineFunction &MF = *MBB.getParent();
3203 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
3204 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3205 bool NeedsWinCFI = needsWinCFI(MF);
3206 DebugLoc DL;
3207 SmallVector<RegPairInfo, 8> RegPairs;
3208
3209 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
3210
3212 // Refresh the reserved regs in case there are any potential changes since the
3213 // last freeze.
3214 MRI.freezeReservedRegs();
3215
3216 if (homogeneousPrologEpilog(MF)) {
3217 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
3219
3220 for (auto &RPI : RegPairs) {
3221 MIB.addReg(RPI.Reg1);
3222 MIB.addReg(RPI.Reg2);
3223
3224 // Update register live in.
3225 if (!MRI.isReserved(RPI.Reg1))
3226 MBB.addLiveIn(RPI.Reg1);
3227 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
3228 MBB.addLiveIn(RPI.Reg2);
3229 }
3230 return true;
3231 }
3232 bool PTrueCreated = false;
3233 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
3234 unsigned Reg1 = RPI.Reg1;
3235 unsigned Reg2 = RPI.Reg2;
3236 unsigned StrOpc;
3237
3238 // Issue sequence of spills for cs regs. The first spill may be converted
3239 // to a pre-decrement store later by emitPrologue if the callee-save stack
3240 // area allocation can't be combined with the local stack area allocation.
3241 // For example:
3242 // stp x22, x21, [sp, #0] // addImm(+0)
3243 // stp x20, x19, [sp, #16] // addImm(+2)
3244 // stp fp, lr, [sp, #32] // addImm(+4)
3245 // Rationale: This sequence saves uop updates compared to a sequence of
3246 // pre-increment spills like stp xi,xj,[sp,#-16]!
3247 // Note: Similar rationale and sequence for restores in epilog.
3248 unsigned Size = TRI->getSpillSize(*RPI.RC);
3249 Align Alignment = TRI->getSpillAlign(*RPI.RC);
3250 switch (RPI.Type) {
3251 case RegPairInfo::GPR:
3252 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
3253 break;
3254 case RegPairInfo::FPR64:
3255 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
3256 break;
3257 case RegPairInfo::FPR128:
3258 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
3259 break;
3260 case RegPairInfo::ZPR:
3261 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
3262 break;
3263 case RegPairInfo::PPR:
3264 StrOpc = AArch64::STR_PXI;
3265 break;
3266 case RegPairInfo::VG:
3267 StrOpc = AArch64::STRXui;
3268 break;
3269 }
3270
3271 unsigned X0Scratch = AArch64::NoRegister;
3272 if (Reg1 == AArch64::VG) {
3273 // Find an available register to store value of VG to.
3275 assert(Reg1 != AArch64::NoRegister);
3276 SMEAttrs Attrs(MF.getFunction());
3277
3278 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface() &&
3279 AFI->getStreamingVGIdx() == std::numeric_limits<int>::max()) {
3280 // For locally-streaming functions, we need to store both the streaming
3281 // & non-streaming VG. Spill the streaming value first.
3282 BuildMI(MBB, MI, DL, TII.get(AArch64::RDSVLI_XI), Reg1)
3283 .addImm(1)
3285 BuildMI(MBB, MI, DL, TII.get(AArch64::UBFMXri), Reg1)
3286 .addReg(Reg1)
3287 .addImm(3)
3288 .addImm(63)
3290
3291 AFI->setStreamingVGIdx(RPI.FrameIdx);
3292 } else if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
3293 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
3294 .addImm(31)
3295 .addImm(1)
3297 AFI->setVGIdx(RPI.FrameIdx);
3298 } else {
3300 if (llvm::any_of(
3301 MBB.liveins(),
3302 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
3303 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
3304 AArch64::X0, LiveIn.PhysReg);
3305 }))
3306 X0Scratch = Reg1;
3307
3308 if (X0Scratch != AArch64::NoRegister)
3309 BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), Reg1)
3310 .addReg(AArch64::XZR)
3311 .addReg(AArch64::X0, RegState::Undef)
3312 .addReg(AArch64::X0, RegState::Implicit)
3314
3315 const uint32_t *RegMask = TRI->getCallPreservedMask(
3316 MF,
3318 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
3319 .addExternalSymbol("__arm_get_current_vg")
3320 .addRegMask(RegMask)
3321 .addReg(AArch64::X0, RegState::ImplicitDefine)
3323 Reg1 = AArch64::X0;
3324 AFI->setVGIdx(RPI.FrameIdx);
3325 }
3326 }
3327
3328 LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
3329 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3330 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3331 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3332 dbgs() << ")\n");
3333
3334 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
3335 "Windows unwdinding requires a consecutive (FP,LR) pair");
3336 // Windows unwind codes require consecutive registers if registers are
3337 // paired. Make the switch here, so that the code below will save (x,x+1)
3338 // and not (x+1,x).
3339 unsigned FrameIdxReg1 = RPI.FrameIdx;
3340 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3341 if (NeedsWinCFI && RPI.isPaired()) {
3342 std::swap(Reg1, Reg2);
3343 std::swap(FrameIdxReg1, FrameIdxReg2);
3344 }
3345
3346 if (RPI.isPaired() && RPI.isScalable()) {
3347 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3350 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3351 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
3352 "Expects SVE2.1 or SME2 target and a predicate register");
3353#ifdef EXPENSIVE_CHECKS
3354 auto IsPPR = [](const RegPairInfo &c) {
3355 return c.Type == RegPairInfo::PPR;
3356 };
3357 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
3358 auto IsZPR = [](const RegPairInfo &c) {
3359 return c.Type == RegPairInfo::ZPR;
3360 };
3361 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
3362 assert(!(PPRBegin < ZPRBegin) &&
3363 "Expected callee save predicate to be handled first");
3364#endif
3365 if (!PTrueCreated) {
3366 PTrueCreated = true;
3367 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3369 }
3370 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
3371 if (!MRI.isReserved(Reg1))
3372 MBB.addLiveIn(Reg1);
3373 if (!MRI.isReserved(Reg2))
3374 MBB.addLiveIn(Reg2);
3375 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
3377 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3378 MachineMemOperand::MOStore, Size, Alignment));
3379 MIB.addReg(PnReg);
3380 MIB.addReg(AArch64::SP)
3381 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
3382 // where 2*vscale is implicit
3385 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3386 MachineMemOperand::MOStore, Size, Alignment));
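      // A sketch of the addressing above: RPI.Offset counts single scalable
      // vector slots, so the two-register ST1B form takes RPI.Offset / 2
      // because each immediate step of that instruction spans a pair of
      // vectors (the 2*vscale scaling noted above).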
3387 if (NeedsWinCFI)
3389 } else { // The case where a paired ZReg store is not used.
3390 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
3391 if (!MRI.isReserved(Reg1))
3392 MBB.addLiveIn(Reg1);
3393 if (RPI.isPaired()) {
3394 if (!MRI.isReserved(Reg2))
3395 MBB.addLiveIn(Reg2);
3396 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
3398 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3399 MachineMemOperand::MOStore, Size, Alignment));
3400 }
3401 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
3402 .addReg(AArch64::SP)
3403 .addImm(RPI.Offset) // [sp, #offset*vscale],
3404 // where factor*vscale is implicit
3407 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3408 MachineMemOperand::MOStore, Size, Alignment));
3409 if (NeedsWinCFI)
3411 }
3412 // Update the StackIDs of the SVE stack slots.
3413 MachineFrameInfo &MFI = MF.getFrameInfo();
3414 if (RPI.Type == RegPairInfo::ZPR || RPI.Type == RegPairInfo::PPR) {
3415 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
3416 if (RPI.isPaired())
3417 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
3418 }
3419
3420 if (X0Scratch != AArch64::NoRegister)
3421 BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), AArch64::X0)
3422 .addReg(AArch64::XZR)
3423 .addReg(X0Scratch, RegState::Undef)
3424 .addReg(X0Scratch, RegState::Implicit)
3426 }
3427 return true;
3428}
3429
3433 MachineFunction &MF = *MBB.getParent();
3435 DebugLoc DL;
3437 bool NeedsWinCFI = needsWinCFI(MF);
3438
3439 if (MBBI != MBB.end())
3440 DL = MBBI->getDebugLoc();
3441
3442 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
3443 if (homogeneousPrologEpilog(MF, &MBB)) {
3444 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
3446 for (auto &RPI : RegPairs) {
3447 MIB.addReg(RPI.Reg1, RegState::Define);
3448 MIB.addReg(RPI.Reg2, RegState::Define);
3449 }
3450 return true;
3451 }
3452
3453 // For performance reasons, restore the SVE registers in increasing order.
3454 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
3455 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
3456 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
3457 std::reverse(PPRBegin, PPREnd);
3458 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
3459 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
3460 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
3461 std::reverse(ZPRBegin, ZPREnd);
3462
3463 bool PTrueCreated = false;
3464 for (const RegPairInfo &RPI : RegPairs) {
3465 unsigned Reg1 = RPI.Reg1;
3466 unsigned Reg2 = RPI.Reg2;
3467
3468 // Issue sequence of restores for cs regs. The last restore may be converted
3469 // to a post-increment load later by emitEpilogue if the callee-save stack
3470 // area allocation can't be combined with the local stack area allocation.
3471 // For example:
3472 // ldp fp, lr, [sp, #32] // addImm(+4)
3473 // ldp x20, x19, [sp, #16] // addImm(+2)
3474 // ldp x22, x21, [sp, #0] // addImm(+0)
3475 // Note: see comment in spillCalleeSavedRegisters()
3476 unsigned LdrOpc;
3477 unsigned Size = TRI->getSpillSize(*RPI.RC);
3478 Align Alignment = TRI->getSpillAlign(*RPI.RC);
3479 switch (RPI.Type) {
3480 case RegPairInfo::GPR:
3481 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
3482 break;
3483 case RegPairInfo::FPR64:
3484 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
3485 break;
3486 case RegPairInfo::FPR128:
3487 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
3488 break;
3489 case RegPairInfo::ZPR:
3490 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
3491 break;
3492 case RegPairInfo::PPR:
3493 LdrOpc = AArch64::LDR_PXI;
3494 break;
3495 case RegPairInfo::VG:
3496 continue;
3497 }
3498 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
3499 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3500 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3501 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3502 dbgs() << ")\n");
3503
3504 // Windows unwind codes require consecutive registers if registers are
3505 // paired. Make the switch here, so that the code below will restore (x,x+1)
3506 // and not (x+1,x).
3507 unsigned FrameIdxReg1 = RPI.FrameIdx;
3508 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3509 if (NeedsWinCFI && RPI.isPaired()) {
3510 std::swap(Reg1, Reg2);
3511 std::swap(FrameIdxReg1, FrameIdxReg2);
3512 }
3513
3515 if (RPI.isPaired() && RPI.isScalable()) {
3516 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3518 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3519 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
3520 "Expects SVE2.1 or SME2 target and a predicate register");
3521#ifdef EXPENSIVE_CHECKS
3522 assert(!(PPRBegin < ZPRBegin) &&
3523 "Expected callee save predicate to be handled first");
3524#endif
3525 if (!PTrueCreated) {
3526 PTrueCreated = true;
3527 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3529 }
3530 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3531 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
3532 getDefRegState(true));
3534 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3535 MachineMemOperand::MOLoad, Size, Alignment));
3536 MIB.addReg(PnReg);
3537 MIB.addReg(AArch64::SP)
3538 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
3539 // where 2*vscale is implicit
3542 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3543 MachineMemOperand::MOLoad, Size, Alignment));
3544 if (NeedsWinCFI)
3546 } else {
3547 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3548 if (RPI.isPaired()) {
3549 MIB.addReg(Reg2, getDefRegState(true));
3551 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3552 MachineMemOperand::MOLoad, Size, Alignment));
3553 }
3554 MIB.addReg(Reg1, getDefRegState(true));
3555 MIB.addReg(AArch64::SP)
3556 .addImm(RPI.Offset) // [sp, #offset*vscale]
3557 // where factor*vscale is implicit
3560 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3561 MachineMemOperand::MOLoad, Size, Alignment));
3562 if (NeedsWinCFI)
3564 }
3565 }
3566 return true;
3567}
3568
3569 // Return the FrameID for an MMO.
3570static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
3571 const MachineFrameInfo &MFI) {
3572 auto *PSV =
3573 dyn_cast_or_null<FixedStackPseudoSourceValue>(MMO->getPseudoValue());
3574 if (PSV)
3575 return std::optional<int>(PSV->getFrameIndex());
3576
3577 if (MMO->getValue()) {
3578 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
3579 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
3580 FI++)
3581 if (MFI.getObjectAllocation(FI) == Al)
3582 return FI;
3583 }
3584 }
3585
3586 return std::nullopt;
3587}
3588
3589// Return the FrameID for a Load/Store instruction by looking at the first MMO.
3590static std::optional<int> getLdStFrameID(const MachineInstr &MI,
3591 const MachineFrameInfo &MFI) {
3592 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
3593 return std::nullopt;
3594
3595 return getMMOFrameID(*MI.memoperands_begin(), MFI);
3596}
3597
3598// Check if a Hazard slot is needed for the current function, and if so create
3599// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
3600// which can be used to determine if any hazard padding is needed.
3601void AArch64FrameLowering::determineStackHazardSlot(
3602 MachineFunction &MF, BitVector &SavedRegs) const {
3603 unsigned StackHazardSize = getStackHazardSize(MF);
3604 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
3606 return;
3607
3608 // Stack hazards are only needed in streaming functions.
3610 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
3611 return;
3612
3613 MachineFrameInfo &MFI = MF.getFrameInfo();
3614
3615 // Add a hazard slot if there are any CSR FPR registers, or there are any
3616 // FP-only stack objects.
3617 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
3618 return AArch64::FPR64RegClass.contains(Reg) ||
3619 AArch64::FPR128RegClass.contains(Reg) ||
3620 AArch64::ZPRRegClass.contains(Reg) ||
3621 AArch64::PPRRegClass.contains(Reg);
3622 });
3623 bool HasFPRStackObjects = false;
3624 if (!HasFPRCSRs) {
3625 std::vector<unsigned> FrameObjects(MFI.getObjectIndexEnd());
3626 for (auto &MBB : MF) {
3627 for (auto &MI : MBB) {
3628 std::optional<int> FI = getLdStFrameID(MI, MFI);
3629 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3630 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3632 FrameObjects[*FI] |= 2;
3633 else
3634 FrameObjects[*FI] |= 1;
3635 }
3636 }
3637 }
3638 HasFPRStackObjects =
3639 any_of(FrameObjects, [](unsigned B) { return (B & 3) == 2; });
3640 }
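  // Sketch of the intended encoding above: bit 1 marks an FP/SVE access to a
  // slot and bit 0 a GPR access, so (B & 3) == 2 selects objects touched only
  // by FP/SVE instructions, which are the ones that motivate hazard padding.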
3641
3642 if (HasFPRCSRs || HasFPRStackObjects) {
3643 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
3644 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
3645 << StackHazardSize << "\n");
3646 MF.getInfo<AArch64FunctionInfo>()->setStackHazardSlotIndex(ID);
3647 }
3648}
3649
3651 BitVector &SavedRegs,
3652 RegScavenger *RS) const {
3653 // All calls are tail calls in GHC calling conv, and functions have no
3654 // prologue/epilogue.
3656 return;
3657
3659 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
3661 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
3663 unsigned UnspilledCSGPR = AArch64::NoRegister;
3664 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
3665
3666 MachineFrameInfo &MFI = MF.getFrameInfo();
3667 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
3668
3669 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
3670 ? RegInfo->getBaseRegister()
3671 : (unsigned)AArch64::NoRegister;
3672
3673 unsigned ExtraCSSpill = 0;
3674 bool HasUnpairedGPR64 = false;
3675 bool HasPairZReg = false;
3676 // Figure out which callee-saved registers to save/restore.
3677 for (unsigned i = 0; CSRegs[i]; ++i) {
3678 const unsigned Reg = CSRegs[i];
3679
3680 // Add the base pointer register to SavedRegs if it is callee-save.
3681 if (Reg == BasePointerReg)
3682 SavedRegs.set(Reg);
3683
3684 bool RegUsed = SavedRegs.test(Reg);
3685 unsigned PairedReg = AArch64::NoRegister;
3686 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
3687 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
3688 AArch64::FPR128RegClass.contains(Reg)) {
3689 // Compensate for odd numbers of GP CSRs.
3690 // For now, all the known cases of odd number of CSRs are of GPRs.
3691 if (HasUnpairedGPR64)
3692 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
3693 else
3694 PairedReg = CSRegs[i ^ 1];
3695 }
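    // Pairing sketch: CSRegs lists callee saves in pair order, so i ^ 1 pairs
    // even/odd neighbours (0<->1, 2<->3, ...). Once an unpaired GPR64 has
    // shifted the parity, the i % 2 form above pairs each register with its
    // new neighbour instead.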
3696
3697 // If the function requires saving all of the GP registers (SavedRegs),
3698 // and there is an odd number of GP CSRs at the same time (CSRegs),
3699 // PairedReg could be in a different register class from Reg, which would
3700 // lead to an FPR (usually D8) accidentally being marked saved.
3701 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
3702 PairedReg = AArch64::NoRegister;
3703 HasUnpairedGPR64 = true;
3704 }
3705 assert(PairedReg == AArch64::NoRegister ||
3706 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
3707 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
3708 AArch64::FPR128RegClass.contains(Reg, PairedReg));
3709
3710 if (!RegUsed) {
3711 if (AArch64::GPR64RegClass.contains(Reg) &&
3712 !RegInfo->isReservedReg(MF, Reg)) {
3713 UnspilledCSGPR = Reg;
3714 UnspilledCSGPRPaired = PairedReg;
3715 }
3716 continue;
3717 }
3718
3719 // MachO's compact unwind format relies on all registers being stored in
3720 // pairs.
3721 // FIXME: the usual format is actually better if unwinding isn't needed.
3722 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
3723 !SavedRegs.test(PairedReg)) {
3724 SavedRegs.set(PairedReg);
3725 if (AArch64::GPR64RegClass.contains(PairedReg) &&
3726 !RegInfo->isReservedReg(MF, PairedReg))
3727 ExtraCSSpill = PairedReg;
3728 }
3729 // Check if there is a pair of ZRegs, so a PReg can be selected for spill/fill.
3730 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
3731 SavedRegs.test(CSRegs[i ^ 1]));
3732 }
3733
3734 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
3736 // Find a suitable predicate register for the multi-vector spill/fill
3737 // instructions.
3738 unsigned PnReg = findFreePredicateReg(SavedRegs);
3739 if (PnReg != AArch64::NoRegister)
3740 AFI->setPredicateRegForFillSpill(PnReg);
3741 // If no free callee-saved register has been found, assign one.
3742 if (!AFI->getPredicateRegForFillSpill() &&
3743 MF.getFunction().getCallingConv() ==
3745 SavedRegs.set(AArch64::P8);
3746 AFI->setPredicateRegForFillSpill(AArch64::PN8);
3747 }
3748
3749 assert(!RegInfo->isReservedReg(MF, AFI->getPredicateRegForFillSpill()) &&
3750 "Predicate cannot be a reserved register");
3751 }
3752
3754 !Subtarget.isTargetWindows()) {
3755 // For Windows calling convention on a non-windows OS, where X18 is treated
3756 // as reserved, back up X18 when entering non-windows code (marked with the
3757 // Windows calling convention) and restore when returning regardless of
3758 // whether the individual function uses it - it might call other functions
3759 // that clobber it.
3760 SavedRegs.set(AArch64::X18);
3761 }
3762
3763 // Calculate the callee-saved stack size.
3764 unsigned CSStackSize = 0;
3765 unsigned SVECSStackSize = 0;
3767 for (unsigned Reg : SavedRegs.set_bits()) {
3768 auto *RC = TRI->getMinimalPhysRegClass(Reg);
3769 assert(RC && "expected register class!");
3770 auto SpillSize = TRI->getSpillSize(*RC);
3771 if (AArch64::PPRRegClass.contains(Reg) ||
3772 AArch64::ZPRRegClass.contains(Reg))
3773 SVECSStackSize += SpillSize;
3774 else
3775 CSStackSize += SpillSize;
3776 }
3777
3778 // Increase the callee-saved stack size if the function has streaming mode
3779 // changes, as we will need to spill the value of the VG register.
3780 // For locally streaming functions, we spill both the streaming and
3781 // non-streaming VG value.
3782 const Function &F = MF.getFunction();
3783 SMEAttrs Attrs(F);
3784 if (requiresSaveVG(MF)) {
3785 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
3786 CSStackSize += 16;
3787 else
3788 CSStackSize += 8;
3789 }
3790
3791 // Determine if a Hazard slot should be used, and increase the CSStackSize by
3792 // StackHazardSize if so.
3793 determineStackHazardSlot(MF, SavedRegs);
3794 if (AFI->hasStackHazardSlotIndex())
3795 CSStackSize += getStackHazardSize(MF);
3796
3797 // Save number of saved regs, so we can easily update CSStackSize later.
3798 unsigned NumSavedRegs = SavedRegs.count();
3799
3800 // The frame record needs to be created by saving the appropriate registers
3801 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
3802 if (hasFP(MF) ||
3803 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
3804 SavedRegs.set(AArch64::FP);
3805 SavedRegs.set(AArch64::LR);
3806 }
3807
3808 LLVM_DEBUG({
3809 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
3810 for (unsigned Reg : SavedRegs.set_bits())
3811 dbgs() << ' ' << printReg(Reg, RegInfo);
3812 dbgs() << "\n";
3813 });
3814
3815 // If any callee-saved registers are used, the frame cannot be eliminated.
3816 int64_t SVEStackSize =
3817 alignTo(SVECSStackSize + estimateSVEStackObjectOffsets(MFI), 16);
3818 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
3819
3820 // The CSR spill slots have not been allocated yet, so estimateStackSize
3821 // won't include them.
3822 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
3823
3824 // We may address some of the stack above the canonical frame address, either
3825 // for our own arguments or during a call. Include that in calculating whether
3826 // we have complicated addressing concerns.
3827 int64_t CalleeStackUsed = 0;
3828 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
3829 int64_t FixedOff = MFI.getObjectOffset(I);
3830 if (FixedOff > CalleeStackUsed)
3831 CalleeStackUsed = FixedOff;
3832 }
3833
3834 // Conservatively always assume BigStack when there are SVE spills.
3835 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
3836 CalleeStackUsed) > EstimatedStackSizeLimit;
3837 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
3838 AFI->setHasStackFrame(true);
3839
3840 // Estimate if we might need to scavenge a register at some point in order
3841 // to materialize a stack offset. If so, either spill one additional
3842 // callee-saved register or reserve a special spill slot to facilitate
3843 // register scavenging. If we already spilled an extra callee-saved register
3844 // above to keep the number of spills even, we don't need to do anything else
3845 // here.
3846 if (BigStack) {
3847 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
3848 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
3849 << " to get a scratch register.\n");
3850 SavedRegs.set(UnspilledCSGPR);
3851 ExtraCSSpill = UnspilledCSGPR;
3852
3853 // MachO's compact unwind format relies on all registers being stored in
3854 // pairs, so if we need to spill one extra for BigStack, then we need to
3855 // store the pair.
3856 if (producePairRegisters(MF)) {
3857 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
3858 // Failed to make a pair for compact unwind format, revert spilling.
3859 if (produceCompactUnwindFrame(MF)) {
3860 SavedRegs.reset(UnspilledCSGPR);
3861 ExtraCSSpill = AArch64::NoRegister;
3862 }
3863 } else
3864 SavedRegs.set(UnspilledCSGPRPaired);
3865 }
3866 }
3867
3868 // If we didn't find an extra callee-saved register to spill, create
3869 // an emergency spill slot.
3870 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
3872 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
3873 unsigned Size = TRI->getSpillSize(RC);
3874 Align Alignment = TRI->getSpillAlign(RC);
3875 int FI = MFI.CreateSpillStackObject(Size, Alignment);
3877 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
3878 << " as the emergency spill slot.\n");
3879 }
3880 }
3881
3882 // Add the size of the additional 64-bit GPR saves.
3883 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
3884
3885 // A Swift asynchronous context extends the frame record with a pointer
3886 // directly before FP.
3887 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
3888 CSStackSize += 8;
3889
3890 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
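  // For example, 11 saved GPRs give CSStackSize == 88, which rounds up to
  // AlignedCSStackSize == 96; the 8 spare bytes are what
  // setCalleeSaveStackHasFreeSpace() records below.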
3891 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
3892 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
3893
3895 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
3896 "Should not invalidate callee saved info");
3897
3898 // Round up to register pair alignment to avoid additional SP adjustment
3899 // instructions.
3900 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
3901 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
3902 AFI->setSVECalleeSavedStackSize(alignTo(SVECSStackSize, 16));
3903}
3904
3906 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
3907 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
3908 unsigned &MaxCSFrameIndex) const {
3909 bool NeedsWinCFI = needsWinCFI(MF);
3910 unsigned StackHazardSize = getStackHazardSize(MF);
3911 // To match the canonical windows frame layout, reverse the list of
3912 // callee saved registers to get them laid out by PrologEpilogInserter
3913 // in the right order. (PrologEpilogInserter allocates stack objects top
3914 // down. Windows canonical prologs store higher numbered registers at
3915 // the top, thus have the CSI array start from the highest registers.)
3916 if (NeedsWinCFI)
3917 std::reverse(CSI.begin(), CSI.end());
3918
3919 if (CSI.empty())
3920 return true; // Early exit if no callee saved registers are modified!
3921
3922 // Now that we know which registers need to be saved and restored, allocate
3923 // stack slots for them.
3924 MachineFrameInfo &MFI = MF.getFrameInfo();
3925 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3926
3927 bool UsesWinAAPCS = isTargetWindows(MF);
3928 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
3929 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
3930 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3931 if ((unsigned)FrameIdx < MinCSFrameIndex)
3932 MinCSFrameIndex = FrameIdx;
3933 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3934 MaxCSFrameIndex = FrameIdx;
3935 }
3936
3937 // Insert VG into the list of CSRs, immediately before LR if saved.
3938 if (requiresSaveVG(MF)) {
3939 std::vector<CalleeSavedInfo> VGSaves;
3940 SMEAttrs Attrs(MF.getFunction());
3941
3942 auto VGInfo = CalleeSavedInfo(AArch64::VG);
3943 VGInfo.setRestored(false);
3944 VGSaves.push_back(VGInfo);
3945
3946 // Add VG again if the function is locally-streaming, as we will spill two
3947 // values.
3948 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
3949 VGSaves.push_back(VGInfo);
3950
3951 bool InsertBeforeLR = false;
3952
3953 for (unsigned I = 0; I < CSI.size(); I++)
3954 if (CSI[I].getReg() == AArch64::LR) {
3955 InsertBeforeLR = true;
3956 CSI.insert(CSI.begin() + I, VGSaves.begin(), VGSaves.end());
3957 break;
3958 }
3959
3960 if (!InsertBeforeLR)
3961 CSI.insert(CSI.end(), VGSaves.begin(), VGSaves.end());
3962 }
3963
3964 Register LastReg = 0;
3965 int HazardSlotIndex = std::numeric_limits<int>::max();
3966 for (auto &CS : CSI) {
3967 Register Reg = CS.getReg();
3968 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
3969
3970 // Create a hazard slot as we switch between GPR and FPR CSRs.
3971 if (AFI->hasStackHazardSlotIndex() &&
3972 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
3974 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
3975 "Unexpected register order for hazard slot");
3976 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
3977 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
3978 << "\n");
3979 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
3980 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
3981 MinCSFrameIndex = HazardSlotIndex;
3982 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
3983 MaxCSFrameIndex = HazardSlotIndex;
3984 }
3985
3986 unsigned Size = RegInfo->getSpillSize(*RC);
3987 Align Alignment(RegInfo->getSpillAlign(*RC));
3988 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
3989 CS.setFrameIdx(FrameIdx);
3990
3991 if ((unsigned)FrameIdx < MinCSFrameIndex)
3992 MinCSFrameIndex = FrameIdx;
3993 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3994 MaxCSFrameIndex = FrameIdx;
3995
3996 // Grab 8 bytes below FP for the extended asynchronous frame info.
3997 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
3998 Reg == AArch64::FP) {
3999 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
4000 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
4001 if ((unsigned)FrameIdx < MinCSFrameIndex)
4002 MinCSFrameIndex = FrameIdx;
4003 if ((unsigned)FrameIdx > MaxCSFrameIndex)
4004 MaxCSFrameIndex = FrameIdx;
4005 }
4006 LastReg = Reg;
4007 }
4008
4009 // Add hazard slot in the case where no FPR CSRs are present.
4010 if (AFI->hasStackHazardSlotIndex() &&
4011 HazardSlotIndex == std::numeric_limits<int>::max()) {
4012 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
4013 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
4014 << "\n");
4015 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
4016 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
4017 MinCSFrameIndex = HazardSlotIndex;
4018 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
4019 MaxCSFrameIndex = HazardSlotIndex;
4020 }
4021
4022 return true;
4023}
4024
4026 const MachineFunction &MF) const {
4028 // If the function has streaming-mode changes, don't scavenge a
4029 // spill slot in the callee-save area, as that might require an
4030 // 'addvl' in the streaming-mode-changing call sequence when the
4031 // function doesn't use an FP.
4032 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
4033 return false;
4034 // Don't allow register scavenging with hazard slots, in case it moves objects
4035 // into the wrong place.
4036 if (AFI->hasStackHazardSlotIndex())
4037 return false;
4038 return AFI->hasCalleeSaveStackFreeSpace();
4039}
4040
4041 /// Returns true if there are any SVE callee saves.
4043 int &Min, int &Max) {
4044 Min = std::numeric_limits<int>::max();
4045 Max = std::numeric_limits<int>::min();
4046
4047 if (!MFI.isCalleeSavedInfoValid())
4048 return false;
4049
4050 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
4051 for (auto &CS : CSI) {
4052 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
4053 AArch64::PPRRegClass.contains(CS.getReg())) {
4054 assert((Max == std::numeric_limits<int>::min() ||
4055 Max + 1 == CS.getFrameIdx()) &&
4056 "SVE CalleeSaves are not consecutive");
4057
4058 Min = std::min(Min, CS.getFrameIdx());
4059 Max = std::max(Max, CS.getFrameIdx());
4060 }
4061 }
4062 return Min != std::numeric_limits<int>::max();
4063}
4064
4065// Process all the SVE stack objects and determine offsets for each
4066// object. If AssignOffsets is true, the offsets get assigned.
4067// Fills in the first and last callee-saved frame indices into
4068// Min/MaxCSFrameIndex, respectively.
4069 // Returns the size of the SVE stack area.
4071 int &MinCSFrameIndex,
4072 int &MaxCSFrameIndex,
4073 bool AssignOffsets) {
4074#ifndef NDEBUG
4075 // First process all fixed stack objects.
4076 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
4078 "SVE vectors should never be passed on the stack by value, only by "
4079 "reference.");
4080#endif
4081
4082 auto Assign = [&MFI](int FI, int64_t Offset) {
4083 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
4084 MFI.setObjectOffset(FI, Offset);
4085 };
4086
4087 int64_t Offset = 0;
4088
4089 // Then process all callee saved slots.
4090 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
4091 // Assign offsets to the callee save slots.
4092 for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) {
4093 Offset += MFI.getObjectSize(I);
4095 if (AssignOffsets)
4096 Assign(I, -Offset);
4097 }
4098 }
4099
4100 // Ensure that the callee-save area is aligned to 16 bytes.
4101 Offset = alignTo(Offset, Align(16U));
4102
4103 // Create a buffer of SVE objects to allocate and sort it.
4104 SmallVector<int, 8> ObjectsToAllocate;
4105 // If we have a stack protector, and we've previously decided that we have SVE
4106 // objects on the stack and thus need it to go in the SVE stack area, then it
4107 // needs to go first.
4108 int StackProtectorFI = -1;
4109 if (MFI.hasStackProtectorIndex()) {
4110 StackProtectorFI = MFI.getStackProtectorIndex();
4111 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
4112 ObjectsToAllocate.push_back(StackProtectorFI);
4113 }
4114 for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {
4115 unsigned StackID = MFI.getStackID(I);
4116 if (StackID != TargetStackID::ScalableVector)
4117 continue;
4118 if (I == StackProtectorFI)
4119 continue;
4120 if (MaxCSFrameIndex >= I && I >= MinCSFrameIndex)
4121 continue;
4122 if (MFI.isDeadObjectIndex(I))
4123 continue;
4124
4125 ObjectsToAllocate.push_back(I);
4126 }
4127
4128 // Allocate all SVE locals and spills
4129 for (unsigned FI : ObjectsToAllocate) {
4130 Align Alignment = MFI.getObjectAlign(FI);
4131 // FIXME: Given that the length of SVE vectors is not necessarily a power of
4132 // two, we'd need to align every object dynamically at runtime if the
4133 // alignment is larger than 16. This is not yet supported.
4134 if (Alignment > Align(16))
4136 "Alignment of scalable vectors > 16 bytes is not yet supported");
4137
4138 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
4139 if (AssignOffsets)
4140 Assign(FI, -Offset);
4141 }
4142
4143 return Offset;
4144}
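// For illustration: two 16-byte SVE callee-save slots followed by a 32-byte,
// 16-byte-aligned SVE local get offsets -16, -32 and -64 respectively, all of
// which are interpreted as scalable (vscale-scaled) offsets from the base of
// the SVE area.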
4145
4146int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets(
4147 MachineFrameInfo &MFI) const {
4148 int MinCSFrameIndex, MaxCSFrameIndex;
4149 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false);
4150}
4151
4152int64_t AArch64FrameLowering::assignSVEStackObjectOffsets(
4153 MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex) const {
4154 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex,
4155 true);
4156}
4157
4159 MachineFunction &MF, RegScavenger *RS) const {
4160 MachineFrameInfo &MFI = MF.getFrameInfo();
4161
4163 "Upwards growing stack unsupported");
4164
4165 int MinCSFrameIndex, MaxCSFrameIndex;
4166 int64_t SVEStackSize =
4167 assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
4168
4170 AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
4171 AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
4172
4173 // If this function isn't doing Win64-style C++ EH, we don't need to do
4174 // anything.
4175 if (!MF.hasEHFunclets())
4176 return;
4178 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
4179
4180 MachineBasicBlock &MBB = MF.front();
4181 auto MBBI = MBB.begin();
4182 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
4183 ++MBBI;
4184
4185 // Create an UnwindHelp object.
4186 // The UnwindHelp object is allocated at the start of the fixed object area
4187 int64_t FixedObject =
4188 getFixedObjectSize(MF, AFI, /*IsWin64*/ true, /*IsFunclet*/ false);
4189 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8,
4190 /*SPOffset*/ -FixedObject,
4191 /*IsImmutable=*/false);
4192 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
4193
4194 // We need to store -2 into the UnwindHelp object at the start of the
4195 // function.
4196 DebugLoc DL;
4198 RS->backward(MBBI);
4199 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
4200 assert(DstReg && "There must be a free register after frame setup");
4201 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
4202 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
4203 .addReg(DstReg, getKillRegState(true))
4204 .addFrameIndex(UnwindHelpFI)
4205 .addImm(0);
4206}
4207
4208namespace {
4209struct TagStoreInstr {
4211 int64_t Offset, Size;
4212 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
4213 : MI(MI), Offset(Offset), Size(Size) {}
4214};
4215
4216class TagStoreEdit {
4217 MachineFunction *MF;
4220 // Tag store instructions that are being replaced.
4222 // Combined memref arguments of the above instructions.
4224
4225 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
4226 // FrameRegOffset + Size) with the address tag of SP.
4227 Register FrameReg;
4228 StackOffset FrameRegOffset;
4229 int64_t Size;
4230 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
4231 // end.
4232 std::optional<int64_t> FrameRegUpdate;
4233 // MIFlags for any FrameReg updating instructions.
4234 unsigned FrameRegUpdateFlags;
4235
4236 // Use zeroing instruction variants.
4237 bool ZeroData;
4238 DebugLoc DL;
4239
4240 void emitUnrolled(MachineBasicBlock::iterator InsertI);
4241 void emitLoop(MachineBasicBlock::iterator InsertI);
4242
4243public:
4244 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
4245 : MBB(MBB), ZeroData(ZeroData) {
4246 MF = MBB->getParent();
4247 MRI = &MF->getRegInfo();
4248 }
4249 // Add an instruction to be replaced. Instructions must be added in
4250 // ascending order of Offset and have to be adjacent.
4251 void addInstruction(TagStoreInstr I) {
4252 assert((TagStores.empty() ||
4253 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
4254 "Non-adjacent tag store instructions.");
4255 TagStores.push_back(I);
4256 }
4257 void clear() { TagStores.clear(); }
4258 // Emit equivalent code at the given location, and erase the current set of
4259 // instructions. May skip if the replacement is not profitable. May invalidate
4260 // the input iterator and replace it with a valid one.
4261 void emitCode(MachineBasicBlock::iterator &InsertI,
4262 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
4263};
4264
4265void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
4266 const AArch64InstrInfo *TII =
4267 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4268
4269 const int64_t kMinOffset = -256 * 16;
4270 const int64_t kMaxOffset = 255 * 16;
4271
4272 Register BaseReg = FrameReg;
4273 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
4274 if (BaseRegOffsetBytes < kMinOffset ||
4275 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
4276 // BaseReg can be FP, which is not necessarily aligned to 16 bytes. In
4277 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
4278 // is required for the offset of ST2G.
4279 BaseRegOffsetBytes % 16 != 0) {
4280 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4281 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
4282 StackOffset::getFixed(BaseRegOffsetBytes), TII);
4283 BaseReg = ScratchReg;
4284 BaseRegOffsetBytes = 0;
4285 }
4286
4287 MachineInstr *LastI = nullptr;
4288 while (Size) {
4289 int64_t InstrSize = (Size > 16) ? 32 : 16;
4290 unsigned Opcode =
4291 InstrSize == 16
4292 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
4293 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
4294 assert(BaseRegOffsetBytes % 16 == 0);
4295 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
4296 .addReg(AArch64::SP)
4297 .addReg(BaseReg)
4298 .addImm(BaseRegOffsetBytes / 16)
4299 .setMemRefs(CombinedMemRefs);
4300 // A store to [BaseReg, #0] should go last for an opportunity to fold the
4301 // final SP adjustment in the epilogue.
4302 if (BaseRegOffsetBytes == 0)
4303 LastI = I;
4304 BaseRegOffsetBytes += InstrSize;
4305 Size -= InstrSize;
4306 }
4307
4308 if (LastI)
4309 MBB->splice(InsertI, MBB, LastI);
4310}
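// For example, with Size == 48 and an in-range, 16-byte-aligned base offset,
// the loop above emits one ST2G covering 32 bytes followed by one STG for the
// remaining 16 bytes (or the STZ variants when ZeroData is set); any store at
// [BaseReg, #0] is then spliced to the end so the epilogue SP adjustment can
// be folded into it.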
4311
4312void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
4313 const AArch64InstrInfo *TII =
4314 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4315
4316 Register BaseReg = FrameRegUpdate
4317 ? FrameReg
4318 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4319 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4320
4321 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
4322
4323 int64_t LoopSize = Size;
4324 // If the loop size is not a multiple of 32, split off one 16-byte store at
4325 // the end to fold the BaseReg update into.
4326 if (FrameRegUpdate && *FrameRegUpdate)
4327 LoopSize -= LoopSize % 32;
4328 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
4329 TII->get(ZeroData ? AArch64::STZGloop_wback
4330 : AArch64::STGloop_wback))
4331 .addDef(SizeReg)
4332 .addDef(BaseReg)
4333 .addImm(LoopSize)
4334 .addReg(BaseReg)
4335 .setMemRefs(CombinedMemRefs);
4336 if (FrameRegUpdate)
4337 LoopI->setFlags(FrameRegUpdateFlags);
4338
4339 int64_t ExtraBaseRegUpdate =
4340 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
4341 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
4342 << ", Size=" << Size
4343 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
4344 << ", FrameRegUpdate=" << FrameRegUpdate
4345 << ", FrameRegOffset.getFixed()="
4346 << FrameRegOffset.getFixed() << "\n");
4347 if (LoopSize < Size) {
4348 assert(FrameRegUpdate);
4349 assert(Size - LoopSize == 16);
4350 // Tag 16 more bytes at BaseReg and update BaseReg.
4351 int64_t STGOffset = ExtraBaseRegUpdate + 16;
4352 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
4353 "STG immediate out of range");
4354 BuildMI(*MBB, InsertI, DL,
4355 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
4356 .addDef(BaseReg)
4357 .addReg(BaseReg)
4358 .addReg(BaseReg)
4359 .addImm(STGOffset / 16)
4360 .setMemRefs(CombinedMemRefs)
4361 .setMIFlags(FrameRegUpdateFlags);
4362 } else if (ExtraBaseRegUpdate) {
4363 // Update BaseReg.
4364 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
4365 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
4366 BuildMI(
4367 *MBB, InsertI, DL,
4368 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
4369 .addDef(BaseReg)
4370 .addReg(BaseReg)
4371 .addImm(AddSubOffset)
4372 .addImm(0)
4373 .setMIFlags(FrameRegUpdateFlags);
4374 }
4375}
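// For example, tagging Size == 208 bytes with a pending base-register update:
// LoopSize becomes 192, STGloop_wback tags those bytes, and the remaining 16
// bytes are tagged by an STGPostIndex whose post-increment also applies the
// extra base-register adjustment computed above.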
4376
4377 // Check if *II is a register update that can be merged into an STGloop that
4378 // ends at (Reg + Size). *TotalOffset is set to the required adjustment to Reg
4379 // after the end of the loop.
4380bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
4381 int64_t Size, int64_t *TotalOffset) {
4382 MachineInstr &MI = *II;
4383 if ((MI.getOpcode() == AArch64::ADDXri ||
4384 MI.getOpcode() == AArch64::SUBXri) &&
4385 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
4386 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
4387 int64_t Offset = MI.getOperand(2).getImm() << Shift;
4388 if (MI.getOpcode() == AArch64::SUBXri)
4389 Offset = -Offset;
4390 int64_t PostOffset = Offset - Size;
4391 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
4392 // an STGPostIndex which does the last 16 bytes of tag write. Which one is
4393 // chosen depends on the alignment of the loop size, but the difference
4394 // between the valid ranges for the two instructions is small, so we
4395 // conservatively assume that it could be either case here.
4396 //
4397 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
4398 // instruction.
4399 const int64_t kMaxOffset = 4080 - 16;
4400 // Max offset of SUBXri.
4401 const int64_t kMinOffset = -4095;
4402 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
4403 PostOffset % 16 == 0) {
4404 *TotalOffset = Offset;
4405 return true;
4406 }
4407 }
4408 return false;
4409}
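// For illustration: an STGloop tagging 256 bytes followed by
// "add Reg, Reg, #272" gives Offset == 272 and PostOffset == 16, which is
// within range for both forms, so the update is merged and *TotalOffset is
// set to 272.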
4410
4411void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
4413 MemRefs.clear();
4414 for (auto &TS : TSE) {
4415 MachineInstr *MI = TS.MI;
4416 // An instruction without memory operands may access anything. Be
4417 // conservative and return an empty list.
4418 if (MI->memoperands_empty()) {
4419 MemRefs.clear();
4420 return;
4421 }
4422 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
4423 }
4424}
4425
4426void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
4427 const AArch64FrameLowering *TFI,
4428 bool TryMergeSPUpdate) {
4429 if (TagStores.empty())
4430 return;
4431 TagStoreInstr &FirstTagStore = TagStores[0];
4432 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
4433 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
4434 DL = TagStores[0].MI->getDebugLoc();
4435
4436 Register Reg;
4437 FrameRegOffset = TFI->resolveFrameOffsetReference(
4438 *MF, FirstTagStore.Offset, false /*isFixed*/, false /*isSVE*/, Reg,
4439 /*PreferFP=*/false, /*ForSimm=*/true);
4440 FrameReg = Reg;
4441 FrameRegUpdate = std::nullopt;
4442
4443 mergeMemRefs(TagStores, CombinedMemRefs);
4444
4445 LLVM_DEBUG({
4446 dbgs() << "Replacing adjacent STG instructions:\n";
4447 for (const auto &Instr : TagStores) {
4448 dbgs() << " " << *Instr.MI;
4449 }
4450 });
4451
4452 // Size threshold where a loop becomes shorter than a linear sequence of
4453 // tagging instructions.
4454 const int kSetTagLoopThreshold = 176;
4455 if (Size < kSetTagLoopThreshold) {
4456 if (TagStores.size() < 2)
4457 return;
4458 emitUnrolled(InsertI);
4459 } else {
4460 MachineInstr *UpdateInstr = nullptr;
4461 int64_t TotalOffset = 0;
4462 if (TryMergeSPUpdate) {
4463 // See if we can merge the base register update into the STGloop.
4464 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
4465 // but STGloop is way too unusual for that, and also it only
4466 // realistically happens in the function epilogue. Also, STGloop is expanded
4467 // before that pass.
4468 if (InsertI != MBB->end() &&
4469 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
4470 &TotalOffset)) {
4471 UpdateInstr = &*InsertI++;
4472 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
4473 << *UpdateInstr);
4474 }
4475 }
4476
4477 if (!UpdateInstr && TagStores.size() < 2)
4478 return;
4479
4480 if (UpdateInstr) {
4481 FrameRegUpdate = TotalOffset;
4482 FrameRegUpdateFlags = UpdateInstr->getFlags();
4483 }
4484 emitLoop(InsertI);
4485 if (UpdateInstr)
4486 UpdateInstr->eraseFromParent();
4487 }
4488
4489 for (auto &TS : TagStores)
4490 TS.MI->eraseFromParent();
4491}
4492
4493bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
4494 int64_t &Size, bool &ZeroData) {
4495 MachineFunction &MF = *MI.getParent()->getParent();
4496 const MachineFrameInfo &MFI = MF.getFrameInfo();
4497
4498 unsigned Opcode = MI.getOpcode();
4499 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
4500 Opcode == AArch64::STZ2Gi);
4501
4502 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
4503 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
4504 return false;
4505 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
4506 return false;
4507 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
4508 Size = MI.getOperand(2).getImm();
4509 return true;
4510 }
4511
4512 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
4513 Size = 16;
4514 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
4515 Size = 32;
4516 else
4517 return false;
4518
4519 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
4520 return false;
4521
4522 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
4523 16 * MI.getOperand(2).getImm();
4524 return true;
4525}
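// For example, an STGi storing to SP at frame index FI with immediate 2
// reports Size == 16 and Offset == getObjectOffset(FI) + 32, whereas an
// STGloop/STZGloop reports the full byte count of the loop as its Size.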
4526
4527// Detect a run of memory tagging instructions for adjacent stack frame slots,
4528// and replace them with a shorter instruction sequence:
4529// * replace STG + STG with ST2G
4530// * replace STGloop + STGloop with STGloop
4531// This code needs to run when stack slot offsets are already known, but before
4532// FrameIndex operands in STG instructions are eliminated.
4534 const AArch64FrameLowering *TFI,
4535 RegScavenger *RS) {
4536 bool FirstZeroData;
4537 int64_t Size, Offset;
4538 MachineInstr &MI = *II;
4539 MachineBasicBlock *MBB = MI.getParent();
4541 if (&MI == &MBB->instr_back())
4542 return II;
4543 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
4544 return II;
4545
4547 Instrs.emplace_back(&MI, Offset, Size);
4548
4549 constexpr int kScanLimit = 10;
4550 int Count = 0;
4552 NextI != E && Count < kScanLimit; ++NextI) {
4553 MachineInstr &MI = *NextI;
4554 bool ZeroData;
4555 int64_t Size, Offset;
4556 // Collect instructions that update memory tags with a FrameIndex operand
4557 // and (when applicable) constant size, and whose output registers are dead
4558 // (the latter is almost always the case in practice). Since these
4559 // instructions effectively have no inputs or outputs, we are free to skip
4560 // any non-aliasing instructions in between without tracking used registers.
4561 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
4562 if (ZeroData != FirstZeroData)
4563 break;
4564 Instrs.emplace_back(&MI, Offset, Size);
4565 continue;
4566 }
4567
4568 // Only count non-transient, non-tagging instructions toward the scan
4569 // limit.
4570 if (!MI.isTransient())
4571 ++Count;
4572
4573 // Just in case, stop before the epilogue code starts.
4574 if (MI.getFlag(MachineInstr::FrameSetup) ||
4576 break;
4577
4578 // Reject anything that may alias the collected instructions.
4579 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
4580 break;
4581 }
4582
4583 // New code will be inserted after the last tagging instruction we've found.
4584 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
4585
4586 // All the gathered stack tag instructions are merged and placed after the
4587 // last tag store in the list. Before inserting, check whether the NZCV
4588 // flag is live at that point; otherwise it might get clobbered if any STG
4589 // loops are present.
4590
4591 // FIXME: This bail-out from merging is conservative: the liveness check is
4592 // performed even when no STG loops would be present after the merge, in
4593 // which case it is not needed.
4595 LiveRegs.addLiveOuts(*MBB);
4596 for (auto I = MBB->rbegin();; ++I) {
4597 MachineInstr &MI = *I;
4598 if (MI == InsertI)
4599 break;
4600 LiveRegs.stepBackward(*I);
4601 }
4602 InsertI++;
4603 if (LiveRegs.contains(AArch64::NZCV))
4604 return InsertI;
4605
4606 llvm::stable_sort(Instrs,
4607 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
4608 return Left.Offset < Right.Offset;
4609 });
4610
4611 // Make sure that we don't have any overlapping stores.
4612 int64_t CurOffset = Instrs[0].Offset;
4613 for (auto &Instr : Instrs) {
4614 if (CurOffset > Instr.Offset)
4615 return NextI;
4616 CurOffset = Instr.Offset + Instr.Size;
4617 }
4618
4619 // Find contiguous runs of tagged memory and emit shorter instruction
4620 // sequences for them when possible.
4621 TagStoreEdit TSE(MBB, FirstZeroData);
4622 std::optional<int64_t> EndOffset;
4623 for (auto &Instr : Instrs) {
4624 if (EndOffset && *EndOffset != Instr.Offset) {
4625 // Found a gap.
4626 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
4627 TSE.clear();
4628 }
4629
4630 TSE.addInstruction(Instr);
4631 EndOffset = Instr.Offset + Instr.Size;
4632 }
4633
4634 const MachineFunction *MF = MBB->getParent();
4635 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
4636 TSE.emitCode(
4637 InsertI, TFI, /*TryMergeSPUpdate = */
4639
4640 return InsertI;
4641}
4642} // namespace
4643
4645 const AArch64FrameLowering *TFI) {
4646 MachineInstr &MI = *II;
4647 MachineBasicBlock *MBB = MI.getParent();
4648 MachineFunction *MF = MBB->getParent();
4649
4650 if (MI.getOpcode() != AArch64::VGSavePseudo &&
4651 MI.getOpcode() != AArch64::VGRestorePseudo)
4652 return II;
4653
4654 SMEAttrs FuncAttrs(MF->getFunction());
4655 bool LocallyStreaming =
4656 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
4659 const AArch64InstrInfo *TII =
4660 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4661
4662 int64_t VGFrameIdx =
4663 LocallyStreaming ? AFI->getStreamingVGIdx() : AFI->getVGIdx();
4664 assert(VGFrameIdx != std::numeric_limits<int>::max() &&
4665 "Expected FrameIdx for VG");
4666
4667 unsigned CFIIndex;
4668 if (MI.getOpcode() == AArch64::VGSavePseudo) {
4669 const MachineFrameInfo &MFI = MF->getFrameInfo();
4670 int64_t Offset =
4671 MFI.getObjectOffset(VGFrameIdx) - TFI->getOffsetOfLocalArea();
4673 nullptr, TRI->getDwarfRegNum(AArch64::VG, true), Offset));
4674 } else
4676 nullptr, TRI->getDwarfRegNum(AArch64::VG, true)));
4677
4678 MachineInstr *UnwindInst = BuildMI(*MBB, II, II->getDebugLoc(),
4679 TII->get(TargetOpcode::CFI_INSTRUCTION))
4680 .addCFIIndex(CFIIndex);
4681
4682 MI.eraseFromParent();
4683 return UnwindInst->getIterator();
4684}
4685
4687 MachineFunction &MF, RegScavenger *RS = nullptr) const {
4688 for (auto &BB : MF)
4689 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
4690 if (requiresSaveVG(MF))
4691 II = emitVGSaveRestore(II, this);
4693 II = tryMergeAdjacentSTG(II, this, RS);
4694 }
4695}
4696
4697/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
4698/// before the update. This is easily retrieved as it is exactly the offset
4699/// that is set in processFunctionBeforeFrameFinalized.
4701 const MachineFunction &MF, int FI, Register &FrameReg,
4702 bool IgnoreSPUpdates) const {
4703 const MachineFrameInfo &MFI = MF.getFrameInfo();
4704 if (IgnoreSPUpdates) {
4705 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
4706 << MFI.getObjectOffset(FI) << "\n");
4707 FrameReg = AArch64::SP;
4708 return StackOffset::getFixed(MFI.getObjectOffset(FI));
4709 }
4710
4711 // Go to common code if we cannot provide sp + offset.
4712 if (MFI.hasVarSizedObjects() ||
4715 return getFrameIndexReference(MF, FI, FrameReg);
4716
4717 FrameReg = AArch64::SP;
4718 return getStackOffset(MF, MFI.getObjectOffset(FI));
4719}
4720
4721/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
4722/// the parent's frame pointer
4724 const MachineFunction &MF) const {
4725 return 0;
4726}
4727
4728/// Funclets only need to account for space for the callee saved registers,
4729/// as the locals are accounted for in the parent's stack frame.
4731 const MachineFunction &MF) const {
4732 // This is the size of the pushed CSRs.
4733 unsigned CSSize =
4734 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
4735 // This is the amount of stack a funclet needs to allocate.
4736 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
4737 getStackAlign());
4738}
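// For example (assuming the usual 16-byte stack alignment), 80 bytes of
// pushed CSRs plus a 24-byte maximum call frame gives
// alignTo(104, 16) == 112 bytes for the funclet.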
4739
4740namespace {
4741struct FrameObject {
4742 bool IsValid = false;
4743 // Index of the object in MFI.
4744 int ObjectIndex = 0;
4745 // Group ID this object belongs to.
4746 int GroupIndex = -1;
4747 // This object should be placed first (closest to SP).
4748 bool ObjectFirst = false;
4749 // This object's group (which always contains the object with
4750 // ObjectFirst==true) should be placed first.
4751 bool GroupFirst = false;
4752
4753 // Used to distinguish between FP and GPR accesses. The values are decided so
4754 // that they sort FPR < Hazard < GPR and they can be or'd together.
4755 unsigned Accesses = 0;
4756 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
4757};
4758
4759class GroupBuilder {
4760 SmallVector<int, 8> CurrentMembers;
4761 int NextGroupIndex = 0;
4762 std::vector<FrameObject> &Objects;
4763
4764public:
4765 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
4766 void AddMember(int Index) { CurrentMembers.push_back(Index); }
4767 void EndCurrentGroup() {
4768 if (CurrentMembers.size() > 1) {
4769 // Create a new group with the current member list. This might remove them
4770 // from their pre-existing groups. That's OK, dealing with overlapping
4771 // groups is too hard and unlikely to make a difference.
4772 LLVM_DEBUG(dbgs() << "group:");
4773 for (int Index : CurrentMembers) {
4774 Objects[Index].GroupIndex = NextGroupIndex;
4775 LLVM_DEBUG(dbgs() << " " << Index);
4776 }
4777 LLVM_DEBUG(dbgs() << "\n");
4778 NextGroupIndex++;
4779 }
4780 CurrentMembers.clear();
4781 }
4782};
4783
4784bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
4785 // Objects at a lower index are closer to FP; objects at a higher index are
4786 // closer to SP.
4787 //
4788 // For consistency in our comparison, all invalid objects are placed
4789 // at the end. This also allows us to stop walking when we hit the
4790 // first invalid item after it's all sorted.
4791 //
4792 // If we want to include a stack hazard region, order FPR accesses < the
4793 // hazard object < GPR accesses in order to create a separation between the
4794 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
4795 //
4796 // Otherwise the "first" object goes first (closest to SP), followed by the
4797 // members of the "first" group.
4798 //
4799 // The rest are sorted by the group index to keep the groups together.
4800 // Higher numbered groups are more likely to be around longer (i.e. untagged
4801 // in the function epilogue and not at some earlier point). Place them closer
4802 // to SP.
4803 //
4804 // If all else equal, sort by the object index to keep the objects in the
4805 // original order.
4806 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
4807 A.GroupIndex, A.ObjectIndex) <
4808 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
4809 B.GroupIndex, B.ObjectIndex);
4810}
4811} // namespace
4812
4814 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
4815 if (!OrderFrameObjects || ObjectsToAllocate.empty())
4816 return;
4817
4819 const MachineFrameInfo &MFI = MF.getFrameInfo();
4820 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
4821 for (auto &Obj : ObjectsToAllocate) {
4822 FrameObjects[Obj].IsValid = true;
4823 FrameObjects[Obj].ObjectIndex = Obj;
4824 }
4825
4826 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
4827 // the same time.
4828 GroupBuilder GB(FrameObjects);
4829 for (auto &MBB : MF) {
4830 for (auto &MI : MBB) {
4831 if (MI.isDebugInstr())
4832 continue;
4833
4834 if (AFI.hasStackHazardSlotIndex()) {
4835 std::optional<int> FI = getLdStFrameID(MI, MFI);
4836 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
4837 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
4839 FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
4840 else
4841 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
4842 }
4843 }
4844
4845 int OpIndex;
4846 switch (MI.getOpcode()) {
4847 case AArch64::STGloop:
4848 case AArch64::STZGloop:
4849 OpIndex = 3;
4850 break;
4851 case AArch64::STGi:
4852 case AArch64::STZGi:
4853 case AArch64::ST2Gi:
4854 case AArch64::STZ2Gi:
4855 OpIndex = 1;
4856 break;
4857 default:
4858 OpIndex = -1;
4859 }
4860
4861 int TaggedFI = -1;
4862 if (OpIndex >= 0) {
4863 const MachineOperand &MO = MI.getOperand(OpIndex);
4864 if (MO.isFI()) {
4865 int FI = MO.getIndex();
4866 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
4867 FrameObjects[FI].IsValid)
4868 TaggedFI = FI;
4869 }
4870 }
4871
4872 // If this is a stack tagging instruction for a slot that is not part of a
4873 // group yet, either start a new group or add it to the current one.
4874 if (TaggedFI >= 0)
4875 GB.AddMember(TaggedFI);
4876 else
4877 GB.EndCurrentGroup();
4878 }
4879 // Groups should never span multiple basic blocks.
4880 GB.EndCurrentGroup();
4881 }
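// Illustrative example (hypothetical frame indices): if a basic block tags
// slot #3 with an STGi and then immediately tags slot #4 (with only debug
// instructions in between), both slots receive the same GroupIndex and the
// sort below keeps them adjacent; any other intervening instruction would
// have ended the group first.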
4882
4883 if (AFI.hasStackHazardSlotIndex()) {
4884 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
4885 FrameObject::AccessHazard;
4886 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
4887 for (auto &Obj : FrameObjects)
4888 if (!Obj.Accesses ||
4889 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
4890 Obj.Accesses = FrameObject::AccessGPR;
4891 }
4892
4893 // If the function's tagged base pointer is pinned to a stack slot, we want to
4894 // put that slot first when possible. This will likely place it at SP + 0,
4895 // and save one instruction when generating the base pointer because IRG does
4896 // not allow an immediate offset.
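// For example, if the tagged slot lands at SP + 0 the base pointer can be
// produced with a single "IRG Xd, SP"; had it landed at, say, SP + 16, an
// extra address computation would be needed first, since IRG accepts no
// immediate offset.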
4897 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
4898 if (TBPI) {
4899 FrameObjects[*TBPI].ObjectFirst = true;
4900 FrameObjects[*TBPI].GroupFirst = true;
4901 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
4902 if (FirstGroupIndex >= 0)
4903 for (FrameObject &Object : FrameObjects)
4904 if (Object.GroupIndex == FirstGroupIndex)
4905 Object.GroupFirst = true;
4906 }
4907
4908 llvm::stable_sort(FrameObjects, FrameObjectCompare);
4909
4910 int i = 0;
4911 for (auto &Obj : FrameObjects) {
4912 // All invalid items are sorted at the end, so it's safe to stop.
4913 if (!Obj.IsValid)
4914 break;
4915 ObjectsToAllocate[i++] = Obj.ObjectIndex;
4916 }
4917
4918 LLVM_DEBUG({
4919 dbgs() << "Final frame order:\n";
4920 for (auto &Obj : FrameObjects) {
4921 if (!Obj.IsValid)
4922 break;
4923 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
4924 if (Obj.ObjectFirst)
4925 dbgs() << ", first";
4926 if (Obj.GroupFirst)
4927 dbgs() << ", group-first";
4928 dbgs() << "\n";
4929 }
4930 });
4931}
4932
4933/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
4934/// least every ProbeSize bytes. Returns an iterator to the first instruction
4935/// after the loop. The difference between SP and TargetReg must be an exact
4936/// multiple of ProbeSize.
4937MachineBasicBlock::iterator
4938AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
4939 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
4940 Register TargetReg) const {
4941 MachineBasicBlock &MBB = *MBBI->getParent();
4942 MachineFunction &MF = *MBB.getParent();
4943 const AArch64InstrInfo *TII =
4944 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4945 DebugLoc DL = MBB.findDebugLoc(MBBI);
4946
4947 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
4948 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
4949 MF.insert(MBBInsertPoint, LoopMBB);
4950 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
4951 MF.insert(MBBInsertPoint, ExitMBB);
4952
4953 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
4954 // in SUB).
4955 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
4956 StackOffset::getFixed(-ProbeSize), TII,
4957 MachineInstr::FrameSetup);
4958 // STR XZR, [SP]
4959 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
4960 .addReg(AArch64::XZR)
4961 .addReg(AArch64::SP)
4962 .addImm(0)
4963 .setMIFlags(MachineInstr::FrameSetup);
4964 // CMP SP, TargetReg
4965 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
4966 AArch64::XZR)
4967 .addReg(AArch64::SP)
4968 .addReg(TargetReg)
4969 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
4970 .setMIFlags(MachineInstr::FrameSetup);
4971 // B.CC Loop
4972 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
4973 .addImm(AArch64CC::NE)
4974 .addMBB(LoopMBB)
4975 .setMIFlags(MachineInstr::FrameSetup);
4976
4977 LoopMBB->addSuccessor(ExitMBB);
4978 LoopMBB->addSuccessor(LoopMBB);
4979 // Synthesize the exit MBB.
4980 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
4981 ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
4982 MBB.addSuccessor(LoopMBB);
4983 // Update liveins.
4984 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
4985
4986 return ExitMBB->begin();
4987}
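// Roughly, the synthesized loop looks like this (illustrative sketch; the
// branch condition assumes the loop repeats until SP reaches TargetReg):
//
//   Loop:
//     SUB  SP, SP, #ProbeSize
//     STR  XZR, [SP]
//     CMP  SP, TargetReg
//     B.NE Loop
//   Exit:
//     <rest of the original block>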
4988
4989void AArch64FrameLowering::inlineStackProbeFixed(
4990 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
4991 StackOffset CFAOffset) const {
4992 MachineBasicBlock *MBB = MBBI->getParent();
4993 MachineFunction &MF = *MBB->getParent();
4994 const AArch64InstrInfo *TII =
4995 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4996 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
4997 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
4998 bool HasFP = hasFP(MF);
4999
5000 DebugLoc DL;
5001 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
5002 int64_t NumBlocks = FrameSize / ProbeSize;
5003 int64_t ResidualSize = FrameSize % ProbeSize;
5004
5005 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
5006 << NumBlocks << " blocks of " << ProbeSize
5007 << " bytes, plus " << ResidualSize << " bytes\n");
5008
5009 // Decrement SP by NumBlocks * ProbeSize bytes, either with an unrolled
5010 // sequence or an ordinary loop.
5011 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
5012 for (int i = 0; i < NumBlocks; ++i) {
5013 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
5014 // encodable in a SUB).
5015 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
5016 StackOffset::getFixed(-ProbeSize), TII,
5017 MachineInstr::FrameSetup, false, false, nullptr,
5018 EmitAsyncCFI && !HasFP, CFAOffset);
5019 CFAOffset += StackOffset::getFixed(ProbeSize);
5020 // STR XZR, [SP]
5021 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
5022 .addReg(AArch64::XZR)
5023 .addReg(AArch64::SP)
5024 .addImm(0)
5025 .setMIFlags(MachineInstr::FrameSetup);
5026 }
5027 } else if (NumBlocks != 0) {
5028 // SUB ScratchReg, SP, #FrameSize (or equivalent if FrameSize is not
5029 // encodable in a SUB). ScratchReg may temporarily become the CFA register.
5030 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
5031 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
5032 MachineInstr::FrameSetup, false, false, nullptr,
5033 EmitAsyncCFI && !HasFP, CFAOffset);
5034 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
5035 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
5036 MBB = MBBI->getParent();
5037 if (EmitAsyncCFI && !HasFP) {
5038 // Set the CFA register back to SP.
5039 const AArch64RegisterInfo &RegInfo =
5040 *MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
5041 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
5042 unsigned CFIIndex =
5043 MF.addFrameInst(MCCFIInstruction::createDefCfaRegister(nullptr, Reg));
5044 BuildMI(*MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
5045 .addCFIIndex(CFIIndex)
5046 .setMIFlags(MachineInstr::FrameSetup);
5047 }
5048 }
5049
5050 if (ResidualSize != 0) {
5051 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
5052 // in SUB).
5053 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
5054 StackOffset::getFixed(-ResidualSize), TII,
5055 MachineInstr::FrameSetup, false, false, nullptr,
5056 EmitAsyncCFI && !HasFP, CFAOffset);
5057 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
5058 // STR XZR, [SP]
5059 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
5060 .addReg(AArch64::XZR)
5061 .addReg(AArch64::SP)
5062 .addImm(0)
5063 .setMIFlags(MachineInstr::FrameSetup);
5064 }
5065 }
5066}
5067
5068void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
5069 MachineBasicBlock &MBB) const {
5070 // Get the instructions that need to be replaced. We emit at most two of
5071 // these. Remember them in order to avoid complications coming from the need
5072 // to traverse the block while potentially creating more blocks.
5073 SmallVector<MachineInstr *, 4> ToReplace;
5074 for (MachineInstr &MI : MBB)
5075 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
5076 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
5077 ToReplace.push_back(&MI);
5078
5079 for (MachineInstr *MI : ToReplace) {
5080 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
5081 Register ScratchReg = MI->getOperand(0).getReg();
5082 int64_t FrameSize = MI->getOperand(1).getImm();
5083 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
5084 MI->getOperand(3).getImm());
5085 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
5086 CFAOffset);
5087 } else {
5088 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
5089 "Stack probe pseudo-instruction expected");
5090 const AArch64InstrInfo *TII =
5091 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
5092 Register TargetReg = MI->getOperand(0).getReg();
5093 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
5094 }
5095 MI->eraseFromParent();
5096 }
5097}
5098
5099struct StackAccess {
5100 enum AccessType {
5101 NotAccessed = 0, // Stack object not accessed by load/store instructions.
5102 GPR = 1 << 0, // A general purpose register.
5103 PPR = 1 << 1, // A predicate register.
5104 FPR = 1 << 2, // A floating point/Neon/SVE register.
5105 };
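// Example: an object touched by both integer and FP/SIMD loads/stores ends up
// with AccessTypes == (GPR | FPR) == 5, which isMixed() recognizes and
// getTypeString() reports as "Mixed".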
5106
5107 int Idx;
5108 StackOffset Offset;
5109 int64_t Size;
5110 unsigned AccessTypes;
5111
5112 StackAccess() : Idx(0), Offset(), Size(0), AccessTypes(NotAccessed) {}
5113
5114 bool operator<(const StackAccess &Rhs) const {
5115 return std::make_tuple(start(), Idx) <
5116 std::make_tuple(Rhs.start(), Rhs.Idx);
5117 }
5118
5119 bool isCPU() const {
5120 // Predicate register load and store instructions execute on the CPU.
5121 return AccessTypes & (AccessType::GPR | AccessType::PPR);
5122 }
5123 bool isSME() const { return AccessTypes & AccessType::FPR; }
5124 bool isMixed() const { return isCPU() && isSME(); }
5125
5126 int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
5127 int64_t end() const { return start() + Size; }
5128
5129 std::string getTypeString() const {
5130 switch (AccessTypes) {
5131 case AccessType::FPR:
5132 return "FPR";
5133 case AccessType::PPR:
5134 return "PPR";
5135 case AccessType::GPR:
5136 return "GPR";
5137 case AccessType::NotAccessed:
5138 return "NA";
5139 default:
5140 return "Mixed";
5141 }
5142 }
5143
5144 void print(raw_ostream &OS) const {
5145 OS << getTypeString() << " stack object at [SP"
5146 << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
5147 if (Offset.getScalable())
5148 OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
5149 << " * vscale";
5150 OS << "]";
5151 }
5152};
5153
5154static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
5155 SA.print(OS);
5156 return OS;
5157}
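// Illustrative output: a StackAccess with AccessTypes == FPR and
// Offset == {Fixed: -16, Scalable: -32} prints as
//   "FPR stack object at [SP-16-32 * vscale]"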
5158
5159void AArch64FrameLowering::emitRemarks(
5160 const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {
5161
5162 SMEAttrs Attrs(MF.getFunction());
5163 if (Attrs.hasNonStreamingInterfaceAndBody())
5164 return;
5165
5166 unsigned StackHazardSize = getStackHazardSize(MF);
5167 const uint64_t HazardSize =
5168 (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;
5169
5170 if (HazardSize == 0)
5171 return;
5172
5173 const MachineFrameInfo &MFI = MF.getFrameInfo();
5174 // Bail if function has no stack objects.
5175 if (!MFI.hasStackObjects())
5176 return;
5177
5178 std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
5179
5180 size_t NumFPLdSt = 0;
5181 size_t NumNonFPLdSt = 0;
5182
5183 // Collect stack accesses via Load/Store instructions.
5184 for (const MachineBasicBlock &MBB : MF) {
5185 for (const MachineInstr &MI : MBB) {
5186 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
5187 continue;
5188 for (MachineMemOperand *MMO : MI.memoperands()) {
5189 std::optional<int> FI = getMMOFrameID(MMO, MFI);
5190 if (FI && !MFI.isDeadObjectIndex(*FI)) {
5191 int FrameIdx = *FI;
5192
5193 size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
5194 if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
5195 StackAccesses[ArrIdx].Idx = FrameIdx;
5196 StackAccesses[ArrIdx].Offset =
5197 getFrameIndexReferenceFromSP(MF, FrameIdx);
5198 StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
5199 }
5200
5201 unsigned RegTy = StackAccess::AccessType::GPR;
5202 if (MFI.getStackID(FrameIdx) == TargetStackID::ScalableVector) {
5203 if (AArch64::PPRRegClass.contains(MI.getOperand(0).getReg()))
5204 RegTy = StackAccess::PPR;
5205 else
5206 RegTy = StackAccess::FPR;
5207 } else if (AArch64InstrInfo::isFpOrNEON(MI)) {
5208 RegTy = StackAccess::FPR;
5209 }
5210
5211 StackAccesses[ArrIdx].AccessTypes |= RegTy;
5212
5213 if (RegTy == StackAccess::FPR)
5214 ++NumFPLdSt;
5215 else
5216 ++NumNonFPLdSt;
5217 }
5218 }
5219 }
5220 }
5221
5222 if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
5223 return;
5224
5225 llvm::sort(StackAccesses);
5226 StackAccesses.erase(llvm::remove_if(StackAccesses,
5227 [](const StackAccess &S) {
5228 return S.AccessTypes ==
5229 StackAccess::NotAccessed;
5230 }),
5231 StackAccesses.end());
5232
5233 SmallVector<const StackAccess *> MixedObjects;
5234 SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;
5235
5236 if (StackAccesses.front().isMixed())
5237 MixedObjects.push_back(&StackAccesses.front());
5238
5239 for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
5240 It != End; ++It) {
5241 const auto &First = *It;
5242 const auto &Second = *(It + 1);
5243
5244 if (Second.isMixed())
5245 MixedObjects.push_back(&Second);
5246
5247 if ((First.isSME() && Second.isCPU()) ||
5248 (First.isCPU() && Second.isSME())) {
5249 uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
5250 if (Distance < HazardSize)
5251 HazardPairs.emplace_back(&First, &Second);
5252 }
5253 }
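// Illustrative example (hypothetical threshold of 1024 bytes): an FPR-accessed
// object ending at SP+64 followed by a GPR-accessed object starting at SP+80
// gives Distance == 16 < 1024, so the pair is recorded in HazardPairs and
// reported as a potential stack hazard below.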
5254
5255 auto EmitRemark = [&](llvm::StringRef Str) {
5256 ORE->emit([&]() {
5257 auto R = MachineOptimizationRemarkAnalysis(
5258 "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
5259 return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
5260 });
5261 };
5262
5263 for (const auto &P : HazardPairs)
5264 EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());
5265
5266 for (const auto *Obj : MixedObjects)
5267 EmitRemark(
5268 formatv("{0} accessed by both GP and FP instructions", *Obj).str());
5269}