A hidden feature in solr-init


On a recent project, I had to add a custom Solr index to an existing Dockerized XP solution. Because the solr-init container normally exits if it finds any existing Sitecore indexes, in the past I've handled this by customizing the container to add my new index, deleting all data from the solr-data folder, and restarting my environment with docker compose up -d. Since solr-init would then find no indexes, it would be forced to create all of them, including the one I added. Although this works, it is inefficient: it forces you to rebuild every index, even those unaffected by the change. This prompted me to look at the internals of solr-init to see whether there's a way around this, and indeed there is.

Let's suppose we've customized solr-init by adding a layer that copies the following file to C:\data\cores-custom.json:

{
  "sitecore": [
    "_my_new_index"
  ]
}

If solr-init ran with its default settings, the following block in Start.ps1 would make the script exit as soon as it found an existing collection for any core listed in any cores*.json file:

$solrCollections = (Invoke-RestMethod -Uri "$SolrEndpoint/admin/collections?action=LIST&omitHeader=true" -Method Get -Credential (Get-SolrCredential)).collections
foreach ($solrCoreName in ($solrSitecoreCoreNames + $solrXdbCoreNames)) {
    if ($solrCollections -contains ('{0}{1}' -f $SolrCorePrefix, $solrCoreName)) {
        Write-Information -MessageData "Sitecore collections are already exist. Use collection name prefix different from '$SolrCorePrefix'." -InformationAction:Continue
        exit
    }
}

In this snippet, the script calls the Solr Collections API to get the current list of collections, then iterates through the Sitecore and xDB core names. If a collection named with the core prefix plus any of those core names already exists, it writes a diagnostic message and exits.
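To make the exit condition concrete, here is a minimal Python sketch of the same check. It is not the script itself, and the collection names in the usage lines are sample data. The sketch shows why the core names in the JSON files start with an underscore: the script builds each collection name as prefix + core name, so "sitecore" + "_master_index" becomes "sitecore_master_index".

```python
def should_exit(existing_collections, core_prefix, core_names):
    """Return True if any '<prefix><core name>' collection already exists.

    A Python sketch of the foreach/-contains guard in Start.ps1.
    PowerShell's -contains compares strings case-insensitively,
    so both sides are casefolded here to match that behavior.
    """
    existing = {name.casefold() for name in existing_collections}
    return any(f"{core_prefix}{core}".casefold() in existing for core in core_names)


# Sample data: an environment where the standard indexes already exist.
existing = ["sitecore_core_index", "sitecore_master_index", "sitecore_web_index"]
print(should_exit(existing, "sitecore", ["_master_index", "_my_new_index"]))  # True: exits
print(should_exit(existing, "sitecore", ["_my_new_index"]))                   # False: proceeds
```

The second call is the situation this post is aiming for. If only the new core name is checked, the guard finds nothing and the script goes on to create the collection.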

The feature that lets us get past this is in the function GetCoreNames, which populates $solrSitecoreCoreNames and $solrXdbCoreNames:

$solrSitecoreCoreNames = GetCoreNames -CoreType "sitecore" -SolrCollectionsToDeploy $SolrCollectionsToDeploy
$solrXdbCoreNames = GetCoreNames -CoreType "xdb" -SolrCollectionsToDeploy $SolrCollectionsToDeploy

The function takes the value of $SolrCollectionsToDeploy, splits it on commas, and uses each resulting value to filter the cores*.json files in C:\data with the pattern cores*<value>.json.

function GetCoreNames {
    param (
        [ValidateSet("sitecore", "xdb")]
        [string]$CoreType,
        
        [string]$SolrCollectionsToDeploy
    )

    $resultCoreNames = @()
    $SolrCollectionsToDeploy.Split(',') | ForEach-Object {
        $solrCollectionToDeploy = $_
        Get-ChildItem C:\data -Filter "cores*$solrCollectionToDeploy.json" | ForEach-Object {
            $coreNames = (Get-Content $_.FullName | Out-String | ConvertFrom-Json).$CoreType
            if ($coreNames) {
                $resultCoreNames += $coreNames
            }
        }
    }

    return $resultCoreNames
}
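The filtering is easiest to see by trying both values side by side. The following Python sketch mirrors GetCoreNames. It is an illustration, not the original. The in-memory DATA_FILES dictionary stands in for C:\data, and its file names and core lists are hypothetical. fnmatch.fnmatchcase approximates Get-ChildItem -Filter wildcards (the real filter is case-insensitive on Windows, so all names here are kept lowercase). The key detail is the empty default: splitting an empty string still yields one empty element, so the pattern becomes cores*.json, which matches every core file.

```python
import fnmatch
import json

# Hypothetical contents of C:\data (file name -> JSON text); illustrative only.
DATA_FILES = {
    "cores.json": json.dumps({
        "sitecore": ["_core_index", "_master_index", "_web_index"],
        "xdb": ["_xdb", "_xdb_rebuild"],
    }),
    "cores-custom.json": json.dumps({"sitecore": ["_my_new_index"]}),
}


def get_core_names(core_type, collections_to_deploy, data_files=DATA_FILES):
    """Python sketch of GetCoreNames from Start.ps1."""
    result = []
    # "".split(",") returns [""], just as .Split(',') does in .NET, so the
    # default (empty) value produces the pattern "cores*.json".
    for suffix in collections_to_deploy.split(","):
        pattern = f"cores*{suffix}.json"
        # sorted() only makes the iteration order deterministic for the sketch.
        for file_name in sorted(data_files):
            if fnmatch.fnmatchcase(file_name, pattern):
                core_names = json.loads(data_files[file_name]).get(core_type)
                if core_names:
                    result.extend(core_names)
    return result


print(get_core_names("sitecore", ""))        # every Sitecore core, custom one included
print(get_core_names("sitecore", "custom"))  # ['_my_new_index']
print(get_core_names("xdb", "custom"))       # []
```

With "custom", the pattern cores*custom.json no longer matches cores.json, so the standard cores never reach the exit check.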

So if we could pass "custom" to the $SolrCollectionsToDeploy parameter, the filter would become cores*custom.json, and only the indexes in our custom JSON file would be checked to determine whether the container should exit. This parameter is passed into Start.ps1, and if we inspect the entry point configuration with docker image inspect, we see that its value comes from an environment variable, SOLR_COLLECTIONS_TO_DEPLOY:

"Entrypoint": [
    "powershell.exe",
    ".\\Start.ps1",
    "-SitecoreSolrConnectionString $env:SITECORE_SOLR_CONNECTION_STRING",
    "-SolrCorePrefix $env:SOLR_CORE_PREFIX_NAME",
    "-SolrSitecoreConfigsetSuffixName $env:SOLR_SITECORE_CONFIGSET_SUFFIX_NAME",
    "-SolrReplicationFactor $env:SOLR_REPLICATION_FACTOR",
    "-SolrNumberOfShards $env:SOLR_NUMBER_OF_SHARDS",
    "-SolrMaxShardsPerNodes $env:SOLR_MAX_SHARDS_NUMBER_PER_NODES",
    "-SolrXdbSchemaFile .\\data\\schema.json",
    "-SolrCollectionsToDeploy $env:SOLR_COLLECTIONS_TO_DEPLOY"
],

This variable is empty by default:

"Env": [
    "SOLR_SITECORE_CONFIGSET_SUFFIX_NAME=_config",
    "SOLR_REPLICATION_FACTOR=1",
    "SOLR_NUMBER_OF_SHARDS=1",
    "SOLR_MAX_SHARDS_NUMBER_PER_NODES=1",
    "SOLR_CORE_PREFIX_NAME=sitecore",
    "SOLR_XDB_SCHEMA_FILE=/Content/Website/App_Data/solrcommands/schema.json",
    "TOPOLOGY=xp1",
    "SOLR_COLLECTIONS_TO_DEPLOY="
],

However, we can pass a value to this variable from the command line, using the docker compose run command:

docker compose run -e SOLR_COLLECTIONS_TO_DEPLOY=custom solr-init

This starts a one-off solr-init container that checks only our custom JSON file to decide whether to exit. Since the new collection doesn't exist yet, it is created, and the existing indexes are left untouched. Mission accomplished!
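To confirm the run worked, you can query the same Collections API endpoint that Start.ps1 uses (action=LIST) and look for the new collection. Below is a minimal Python sketch. The endpoint URL and credentials in the usage line are assumptions about a typical local setup, and the expected name assumes the default SOLR_CORE_PREFIX_NAME of "sitecore". Substitute the values from your own SITECORE_SOLR_CONNECTION_STRING.

```python
import base64
import json
import urllib.request


def parse_collections(body):
    """Extract the 'collections' array from a Collections API LIST response."""
    return json.loads(body).get("collections", [])


def list_collections(solr_endpoint, user=None, password=None):
    """Call <solr_endpoint>/admin/collections?action=LIST and return collection names."""
    url = f"{solr_endpoint}/admin/collections?action=LIST&omitHeader=true&wt=json"
    request = urllib.request.Request(url)
    if user is not None:
        # Basic auth, for Solr instances that require credentials.
        token = base64.b64encode(f"{user}:{password or ''}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return parse_collections(response.read().decode("utf-8"))


# Assumed local endpoint; adjust host, port, and credentials to your environment:
# print("sitecore_my_new_index" in list_collections("http://localhost:8984/solr"))
```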

Note that it would be a bad idea to set this environment variable in the Docker Compose configuration for solr-init, as this would short-circuit creation of the standard Sitecore indexes for a developer building a new environment. The one-off docker compose run call avoids this, and can easily be shared with other developers who need to work with this index.

One final wrinkle with this parameter: if it is set, the script assumes the base collections already exist, so it skips creating the base Solr configset. This is logical, since the base configset would have been created during the initial run of solr-init.

$SkipBaseConfigCreate = $false
if ($SolrCollectionsToDeploy) {
    Write-Host "SolrCollectionsToDeploy exists, skipping base configset creation."
    $SkipBaseConfigCreate = $true
}

This parameter is particularly handy if you have a large development team using Docker. If the solution requires the presence of a new index, rather than asking your colleagues to delete all of their indexes, you can simply ask them to run the docker compose run command above, possibly saving considerable time.

References

  • Documentation on adding Solr collections to solr-init is found in the Developer Workstation Deployment with Docker guide, section 2.2.2. The treatment is minimal, stating only that a new layer should be added on top of solr-init and that a new JSON file should be placed in C:\data.
  • A deeper dive into customizing Sitecore containers is found here: https://doc.sitecore.com/xp/en/developers/latest/developer-tools/add-sitecore-modules.html
  • Koen Haye has written up a tutorial on adding custom Solr indexes and an overview of the Start.ps1 script, with tips on how to read container contents.
  • The approach in this article assumes the new index will share the standard Solr configuration used by sitecore_master_index and the other standard indexes. If you need a custom configuration, solr-init requires more extensive customization. The old Sitecore Commerce solr-init customization provides an example of how to do this. It exists for 10.1 through 10.3, since no XC 10.4 was released. Run docker run --rm -it scr.sitecore.com/sxc/sitecore-xc1-assets:10.3-ltsc2022 and look at C:\module\tools\scripts\Start-Commerce.ps1 for an example script that pushes custom schemas for each index.