D3D12 Texture Mipmap Generation

Published on 13th Mar, 2017 by Administrator

Introduction

If you start writing a 3D graphics engine, the basics usually consist of loading mesh, texture and shader data and getting it to the GPU in a way that enables the GPU to run the shader with the mesh and texture data as input. To load the texture data in a platform-independent way, easy formats to get started with are TGA, BMP and PNG (with a lot of help from libpng...). Much better for a real game are usually compressed formats such as S3TC, which the GPU can decompress while rendering at no performance cost. But I have a lot of PNG files lying around that I want to test with, and I don't really feel like converting everything until I have a real asset pipeline going.

However, just loading a PNG file and then using it for rendering results in massive aliasing with a moving camera on all surfaces that are not directly in front of the camera: [image]

The solution is of course mipmapping. By using lower resolution versions of the same texture, chosen depending on the size of a screen pixel on the textured surface, the aliasing can be eliminated: [image]
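
To make the "lower resolution versions" concrete, here is a minimal sketch of how the sizes of a full mip chain can be computed. MipExtent and BuildMipChain are names made up for this illustration and are not part of any API. Each level halves both dimensions, rounded down and clamped to 1, until 1x1 is reached.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

//Size of one level in a mip chain (hypothetical helper type for this sketch)
struct MipExtent
{
    uint32_t width;
    uint32_t height;
};

//Returns the extents of every level of a full mip chain, starting with level 0
//(the original image) and ending at 1x1. An empty vector is returned for empty images.
std::vector<MipExtent> BuildMipChain(uint32_t width, uint32_t height)
{
    std::vector<MipExtent> chain;
    if(width == 0 || height == 0)
        return chain;

    chain.push_back({width, height});
    while(width > 1 || height > 1)
    {
        //Halve each dimension, but never go below one pixel
        width = std::max<uint32_t>(width / 2, 1);
        height = std::max<uint32_t>(height / 2, 1);
        chain.push_back({width, height});
    }
    return chain;
}
```

A 256x256 texture ends up with 9 levels this way, and a 256x64 texture also has 9, with the height staying at 1 for the last few levels.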

In OpenGL there is a function called glGenerateMipmap, which automatically generates those mipmaps for a given texture. DirectX up to and including DirectX 11 has a method called GenerateMips that does the same thing. Both use the GPU and are reasonably fast as a result. It turns out that the two new rendering APIs, Vulkan and Direct3D 12, got rid of this functionality. The easy solution would be to just generate the mipmaps on the CPU, or well, have an offline process generate them and ideally combine that with a compression format such as S3TC. If you don't want that, Vulkan has a helpful vkCmdBlitImage function that copies a region of a source image to a destination image, scaling and filtering it along the way, which can be used for downsampling. It is very easy to find this when researching the topic, and while it took me a while to get it running, it kinda just works.
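
For reference, the CPU route mentioned above could look roughly like the following sketch. It is not what the rest of this post uses. DownsampleRGBA8 is a hypothetical helper that assumes a tightly packed 8 bit RGBA image with at least one pixel, and it averages 2x2 blocks directly on the stored values (so without any gamma handling).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

//Produces the next smaller mip level of a tightly packed RGBA8 image by averaging 2x2 pixel blocks.
//Assumes srcWidth >= 1, srcHeight >= 1 and src.size() == srcWidth * srcHeight * 4.
std::vector<uint8_t> DownsampleRGBA8(const std::vector<uint8_t> &src, uint32_t srcWidth, uint32_t srcHeight)
{
    const uint32_t dstWidth = std::max<uint32_t>(srcWidth / 2, 1);
    const uint32_t dstHeight = std::max<uint32_t>(srcHeight / 2, 1);
    std::vector<uint8_t> dst(static_cast<size_t>(dstWidth) * dstHeight * 4);

    //Reads one channel of one source pixel
    auto texel = [&](uint32_t x, uint32_t y, uint32_t c) -> uint32_t {
        return src[(static_cast<size_t>(y) * srcWidth + x) * 4 + c];
    };

    for(uint32_t y = 0; y < dstHeight; y++)
    {
        //Clamp so that sources that are only one pixel high read the same row twice
        const uint32_t y0 = std::min(2 * y, srcHeight - 1);
        const uint32_t y1 = std::min(2 * y + 1, srcHeight - 1);
        for(uint32_t x = 0; x < dstWidth; x++)
        {
            const uint32_t x0 = std::min(2 * x, srcWidth - 1);
            const uint32_t x1 = std::min(2 * x + 1, srcWidth - 1);
            for(uint32_t c = 0; c < 4; c++)
            {
                //Average of the four source values, rounded to nearest
                const uint32_t sum = texel(x0, y0, c) + texel(x1, y0, c) + texel(x0, y1, c) + texel(x1, y1, c);
                dst[(static_cast<size_t>(y) * dstWidth + x) * 4 + c] = static_cast<uint8_t>((sum + 2) / 4);
            }
        }
    }
    return dst;
}
```

Calling this repeatedly on its own output until the result is 1x1 would give the full chain, at the cost of doing all the work on the CPU at load time.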

The Code

For Direct3D 12, on the other hand, there is no such blit function, and all I found were some people talking about Microsoft's samples. And yes, after some digging it turned out that the MiniEngine has functionality to generate mipmaps using a compute shader. But since I am not building on top of the MiniEngine, and because it has quite a few layers of abstraction, it isn't very plug and play... I copy-pasted it all together and somehow made it work. It is quite far away from the MiniEngine code and absolutely not the way to do it, but it is something to get started with that should mostly just work, as everything needed is in one place:

//_mipMapTextures is an array containing texture objects that need mipmaps to be generated. Each needs a texture resource with mipmaps allocated, in D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE state.
//Textures are expected to be POT (power of two) and in a format supporting unordered access, and to have D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS set during creation.
//_device is the ID3D12Device
//GetNewCommandList() is supposed to return a new command list in recording state
//SubmitCommandList(commandList) is supposed to submit the command list to the command queue
//_mipMapComputeShader is an ID3DBlob of the compiled mipmap compute shader
void D3D12Renderer::CreateMipMaps()
{
    //Union used for shader constants
    struct DWParam
    {
        DWParam(FLOAT f) : Float(f) {}
        DWParam(UINT u) : Uint(u) {}

        void operator= (FLOAT f) { Float = f; }
        void operator= (UINT u) { Uint = u; }

        union
        {
            FLOAT Float;
            UINT Uint;
        };
    };

    //Calculate heap size
    uint32 requiredHeapSize = 0;
    _mipMapTextures->Enumerate<D3D12Texture>([&](D3D12Texture *texture, size_t index, bool &stop) {
        if(texture->mipMaps > 1)
            requiredHeapSize += texture->mipMaps - 1;
    });

    //No heap size, means that there was either no texture or none that requires any mipmaps
    if(requiredHeapSize == 0)
    {
        _mipMapTextures->RemoveAllObjects();
        return;
    }

    //The compute shader expects 2 floats, the source texture and the destination texture
    CD3DX12_DESCRIPTOR_RANGE srvCbvRanges[2];
    CD3DX12_ROOT_PARAMETER rootParameters[3];
    srvCbvRanges[0].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 1, 0, 0);
    srvCbvRanges[1].Init(D3D12_DESCRIPTOR_RANGE_TYPE_UAV, 1, 0, 0);
    rootParameters[0].InitAsConstants(2, 0);
    rootParameters[1].InitAsDescriptorTable(1, &srvCbvRanges[0]);
    rootParameters[2].InitAsDescriptorTable(1, &srvCbvRanges[1]);

    //Static sampler used to get the linearly interpolated color for the mipmaps
    D3D12_STATIC_SAMPLER_DESC samplerDesc = {};
    samplerDesc.Filter = D3D12_FILTER_MIN_MAG_LINEAR_MIP_POINT;
    samplerDesc.AddressU = D3D12_TEXTURE_ADDRESS_MODE_CLAMP;
    samplerDesc.AddressV = D3D12_TEXTURE_ADDRESS_MODE_CLAMP;
    samplerDesc.AddressW = D3D12_TEXTURE_ADDRESS_MODE_CLAMP;
    samplerDesc.MipLODBias = 0.0f;
    samplerDesc.ComparisonFunc = D3D12_COMPARISON_FUNC_NEVER;
    samplerDesc.MinLOD = 0.0f;
    samplerDesc.MaxLOD = D3D12_FLOAT32_MAX;
    samplerDesc.MaxAnisotropy = 0;
    samplerDesc.BorderColor = D3D12_STATIC_BORDER_COLOR_OPAQUE_BLACK;
    samplerDesc.ShaderRegister = 0;
    samplerDesc.RegisterSpace = 0;
    samplerDesc.ShaderVisibility = D3D12_SHADER_VISIBILITY_ALL;

    //Create the root signature for the mipmap compute shader from the parameters and sampler above
    ID3DBlob *signature;
    ID3DBlob *error;
    CD3DX12_ROOT_SIGNATURE_DESC rootSignatureDesc;
    //A root signature that is only used for compute does not need input assembler access
    rootSignatureDesc.Init(_countof(rootParameters), rootParameters, 1, &samplerDesc, D3D12_ROOT_SIGNATURE_FLAG_NONE);
    D3D12SerializeRootSignature(&rootSignatureDesc, D3D_ROOT_SIGNATURE_VERSION_1, &signature, &error);
    ID3D12RootSignature *mipMapRootSignature;
    _device->CreateRootSignature(0, signature->GetBufferPointer(), signature->GetBufferSize(), IID_PPV_ARGS(&mipMapRootSignature));

    //Create the descriptor heap with layout: source texture - destination texture
    D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
    heapDesc.NumDescriptors = 2*requiredHeapSize;
    heapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
    heapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
    ID3D12DescriptorHeap *descriptorHeap;
    _device->CreateDescriptorHeap(&heapDesc, IID_PPV_ARGS(&descriptorHeap));
    UINT descriptorSize = _device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

    //Create pipeline state object for the compute shader using the root signature.
    D3D12_COMPUTE_PIPELINE_STATE_DESC psoDesc = {};
    psoDesc.pRootSignature = mipMapRootSignature;
    psoDesc.CS = { reinterpret_cast<UINT8*>(_mipMapComputeShader->GetBufferPointer()), _mipMapComputeShader->GetBufferSize() };
    ID3D12PipelineState *psoMipMaps;
    _device->CreateComputePipelineState(&psoDesc, IID_PPV_ARGS(&psoMipMaps));

    //Prepare the shader resource view description for the source texture
    D3D12_SHADER_RESOURCE_VIEW_DESC srcTextureSRVDesc = {};
    srcTextureSRVDesc.Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
    srcTextureSRVDesc.ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2D;

    //Prepare the unordered access view description for the destination texture
    D3D12_UNORDERED_ACCESS_VIEW_DESC destTextureUAVDesc = {};
    destTextureUAVDesc.ViewDimension = D3D12_UAV_DIMENSION_TEXTURE2D;

    //Get a new empty command list in recording state
    ID3D12GraphicsCommandList *commandList = GetNewCommandList();

    //Set root signature, pso and descriptor heap
    commandList->SetComputeRootSignature(mipMapRootSignature);
    commandList->SetPipelineState(psoMipMaps);
    commandList->SetDescriptorHeaps(1, &descriptorHeap);

    //CPU handle for the first descriptor on the descriptor heap, used to fill the heap
    CD3DX12_CPU_DESCRIPTOR_HANDLE currentCPUHandle(descriptorHeap->GetCPUDescriptorHandleForHeapStart(), 0, descriptorSize);

    //GPU handle for the first descriptor on the descriptor heap, used to initialize the descriptor tables
    CD3DX12_GPU_DESCRIPTOR_HANDLE currentGPUHandle(descriptorHeap->GetGPUDescriptorHandleForHeapStart(), 0, descriptorSize);

    _mipMapTextures->Enumerate<D3D12Texture>([&](D3D12Texture *texture, size_t index, bool &stop) {
        //Skip textures without mipmaps
        if(texture->mipMaps <= 1)
            return;

        //Transition from pixel shader resource to unordered access
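        //(stored in a local variable, as taking the address of a temporary is not standard C++)
        CD3DX12_RESOURCE_BARRIER toUnorderedAccess = CD3DX12_RESOURCE_BARRIER::Transition(texture->_resource, D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE, D3D12_RESOURCE_STATE_UNORDERED_ACCESS);
        commandList->ResourceBarrier(1, &toUnorderedAccess);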

        //Loop through the mipmaps copying from the bigger mipmap to the smaller one with downsampling in a compute shader
        for(uint32_t TopMip = 0; TopMip < texture->mipMaps-1; TopMip++)
        {
            //Get mipmap dimensions
            uint32_t dstWidth = std::max<uint32_t>(texture->width >> (TopMip+1), 1);
            uint32_t dstHeight = std::max<uint32_t>(texture->height >> (TopMip+1), 1);

            //Create shader resource view for the source texture in the descriptor heap
            srcTextureSRVDesc.Format = texture->_format;
            srcTextureSRVDesc.Texture2D.MipLevels = 1;
            srcTextureSRVDesc.Texture2D.MostDetailedMip = TopMip;
            _device->CreateShaderResourceView(texture->_resource, &srcTextureSRVDesc, currentCPUHandle);
            currentCPUHandle.Offset(1, descriptorSize);

            //Create unordered access view for the destination texture in the descriptor heap
            destTextureUAVDesc.Format = texture->_format;
            destTextureUAVDesc.Texture2D.MipSlice = TopMip+1;
            _device->CreateUnorderedAccessView(texture->_resource, nullptr, &destTextureUAVDesc, currentCPUHandle);
            currentCPUHandle.Offset(1, descriptorSize);

            //Pass the destination texture pixel size to the shader as constants
            commandList->SetComputeRoot32BitConstant(0, DWParam(1.0f/dstWidth).Uint, 0);
            commandList->SetComputeRoot32BitConstant(0, DWParam(1.0f/dstHeight).Uint, 1);

            //Pass the source and destination texture views to the shader via descriptor tables
            commandList->SetComputeRootDescriptorTable(1, currentGPUHandle);
            currentGPUHandle.Offset(1, descriptorSize);
            commandList->SetComputeRootDescriptorTable(2, currentGPUHandle);
            currentGPUHandle.Offset(1, descriptorSize);

            //Dispatch the compute shader with one thread group per 8x8 block of destination pixels (one thread per pixel)
            commandList->Dispatch(std::max(dstWidth / 8, 1u), std::max(dstHeight / 8, 1u), 1);

            //Wait for all accesses to the destination texture UAV to be finished before generating the next mipmap, as it will be the source texture for the next mipmap
            CD3DX12_RESOURCE_BARRIER uavBarrier = CD3DX12_RESOURCE_BARRIER::UAV(texture->_resource);
            commandList->ResourceBarrier(1, &uavBarrier);
        }

        //When done with the texture, transition its state back to be a pixel shader resource
        CD3DX12_RESOURCE_BARRIER toShaderResource = CD3DX12_RESOURCE_BARRIER::Transition(texture->_resource, D3D12_RESOURCE_STATE_UNORDERED_ACCESS, D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE);
        commandList->ResourceBarrier(1, &toShaderResource);
    });

    //Close and submit the command list
    commandList->Close();
    SubmitCommandList(commandList);

    _mipMapTextures->RemoveAllObjects();
}

This is the compute shader code:

Texture2D<float4> SrcTexture : register(t0);
RWTexture2D<float4> DstTexture : register(u0);
SamplerState BilinearClamp : register(s0);

cbuffer CB : register(b0)
{
    float2 TexelSize;   // 1.0 / destination dimension
}

[numthreads( 8, 8, 1 )]
void GenerateMipMaps(uint3 DTid : SV_DispatchThreadID)
{
    //DTid is the thread group ID * the values from numthreads above + the thread's ID within its group, and in this case corresponds to the destination pixel's location in number of pixels.
    //Adding 0.5 and scaling by TexelSize makes texcoords (in 0-1 range) point at the center between the 4 source pixels used for the mipmap.
    float2 texcoords = TexelSize * (DTid.xy + 0.5);

    //The sampler's linear interpolation will mix the four pixel values into the new pixel's color
    float4 color = SrcTexture.SampleLevel(BilinearClamp, texcoords, 0);

    //Write the final color into the destination texture.
    DstTexture[DTid.xy] = color;
}
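
To convince myself that the texture coordinates really land between four source pixels, the shader math can be replayed on the CPU. The following is only a sketch with made-up helper names (Float2, DestinationPixelToTexcoords, TexcoordsToSourceTexels), assuming the source level is exactly twice as big as the destination level.

```cpp
#include <cstdint>

//Minimal stand-in for HLSL's float2 (hypothetical helper type)
struct Float2
{
    float x;
    float y;
};

//Mirrors the shader: texcoords = TexelSize * (DTid.xy + 0.5), with TexelSize = 1.0 / destination size
Float2 DestinationPixelToTexcoords(uint32_t x, uint32_t y, uint32_t dstWidth, uint32_t dstHeight)
{
    const float texelSizeX = 1.0f / dstWidth;
    const float texelSizeY = 1.0f / dstHeight;
    return { texelSizeX * (x + 0.5f), texelSizeY * (y + 0.5f) };
}

//Converts 0-1 texcoords into a position measured in source texels.
//Texel N covers the range [N, N+1), so whole numbers lie on the border between two texels.
Float2 TexcoordsToSourceTexels(Float2 texcoords, uint32_t srcWidth, uint32_t srcHeight)
{
    return { texcoords.x * srcWidth, texcoords.y * srcHeight };
}
```

For destination pixel (1, 2) of a 4x4 level sampled from an 8x8 level, this gives the source position (3, 5), which is the corner shared by source texels 2 and 3 horizontally and 4 and 5 vertically. The bilinear sampler therefore weights all four of them equally.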

Essential things to understand when getting started with D3D12

I had a very hard time understanding descriptors, descriptor heaps and the root signature, by the way. It turns out that the root signature has to match the shader and describes the shader's uniform, texture and sampler data. Descriptors are also called "views", such as "shader resource views" (SRV), "constant buffer views" (CBV) and "unordered access views" (UAV). They don't have much to do with "descriptions" such as D3D12_RESOURCE_DESC, but instead tell the shader, through the slots declared in the root signature, where to find the data to use. They happen to be allocated on a descriptor heap, and since it is not recommended to switch the heap all the time, they should all be known before the command list is recorded. Then, instead of switching the heap, the handle pointing at the heap elements to use can be changed as often as needed. That last part is done with SetComputeRootDescriptorTable, but it is also possible to set a limited number of constants directly (SetComputeRoot32BitConstant in my code). For a bit more context on what I am saying, just look at my code above, as it needs all of this.
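
What helped me was thinking of a descriptor heap as a plain array with a fixed element size. The sketch below models only the address arithmetic and is not D3D12 code. DescriptorHeapModel and the helper functions are invented for this illustration, with start standing in for the handle returned for the heap start and incrementSize for the value from GetDescriptorHandleIncrementSize. It follows the layout of CreateMipMaps above, which stores one SRV/UAV pair per mipmap pass.

```cpp
#include <cstdint>

//Hypothetical stand-in for a descriptor heap: a start address plus a fixed size per descriptor
struct DescriptorHeapModel
{
    uint64_t start;         //models the handle for the heap start
    uint32_t incrementSize; //models the descriptor handle increment size of the device
};

//Address of the descriptor at the given index, which is what offsetting a handle by index descriptors amounts to
uint64_t DescriptorAddress(const DescriptorHeapModel &heap, uint32_t index)
{
    return heap.start + static_cast<uint64_t>(index) * heap.incrementSize;
}

//Layout used by the mipmap code above: the source SRV of a pass is followed by its destination UAV
uint64_t SourceSRVAddress(const DescriptorHeapModel &heap, uint32_t pass)
{
    return DescriptorAddress(heap, 2 * pass);
}

uint64_t DestinationUAVAddress(const DescriptorHeapModel &heap, uint32_t pass)
{
    return DescriptorAddress(heap, 2 * pass + 1);
}
```

Binding a descriptor table then just means handing one of these addresses to the command list, while the heap itself stays the same for the whole command list.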

Edit: Since gamma correct rendering is the right thing to do, I found out that the above code does not work for sRGB texture formats, as unordered access is not available for those. I solved it by copying my sRGB resource into a non-sRGB resource (rgba_8888_srgb can for example be copied into an rgba_8888 resource) using commandList->CopyResource(dest, src). One issue with this approach is that the mipmap generation then happens in gamma space. I solved that by sampling the four pixels individually, converting each sample to linear space with pow(color, 2.2), averaging them and transforming the result back into gamma space using the inverse, pow(color, 1.0/2.2).

Also, the above code does not clean up the resources it created. The tricky part is releasing them once the GPU no longer needs them, which is not at the end of the function but some time later. An easy solution would be to submit the command list to the queue and wait for it to finish. The solution I am using is a global fence that is checked every frame; if its value is bigger than the one for the frame the mipmap command list was generated for, a callback on the command list is called. That callback can then safely release the resources.
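
The gamma handling from the edit above, written out for a single color channel on the CPU. This is a sketch of the math only, not the shader I am using. The function names are made up, values are expected in the 0-1 range, and the plain 2.2 power curve is an approximation of the real sRGB transfer function.

```cpp
#include <cmath>

//Approximate conversion between gamma space and linear space using a plain 2.2 power curve
float GammaToLinear(float value)
{
    return std::pow(value, 2.2f);
}

float LinearToGamma(float value)
{
    return std::pow(value, 1.0f / 2.2f);
}

//Averages four gamma space samples of one channel in linear space and returns the result in gamma space
float AverageGammaCorrect(float a, float b, float c, float d)
{
    const float linearAverage = 0.25f * (GammaToLinear(a) + GammaToLinear(b) + GammaToLinear(c) + GammaToLinear(d));
    return LinearToGamma(linearAverage);
}
```

For two black and two white pixels this returns roughly 0.73 instead of the 0.5 a plain average would give, which is why mipmaps averaged in gamma space tend to come out too dark.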
