Notes on the contents of some of the header files

Edit these files to suit your needs.

blocklist.h

These words aren't indexed.

custom.h

This file may be created and then used to customise the output from the search form:

#ifndef CGS_CUSTOM_H
#define CGS_CUSTOM_H 1

#include <stdio.h>

#define CGS_STYLE_SHEET "Path/File.css"
#define CGS_HTML_FOOT   "\nSome_String\n\n"

#endif	/*  custom.h   */

'custom.h' is included in 'cgi-search.c':

#if __has_include ( "custom.h" )
 #include "custom.h"
#endif

'__has_include' does not work on older C compilers. In that case, you may need to change this bit of code to an unconditional include:

#include "custom.h"

Or remove this bit altogether. It's just used to modify the appearance of the results a bit.
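
A more defensive variant can be sketched as follows. It only evaluates '__has_include' when the preprocessor defines it, and it supplies fallback values when 'custom.h' is absent. The fallback values and the helper 'cgs_copy_foot' are hypothetical and are not taken from 'cgi-search.c':

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Use __has_include only if the preprocessor knows it. */
#if defined(__has_include)
# if __has_include("custom.h")
#  include "custom.h"
# endif
#endif

/* Hypothetical fallbacks, used when custom.h is absent or incomplete. */
#ifndef CGS_STYLE_SHEET
# define CGS_STYLE_SHEET ""
#endif
#ifndef CGS_HTML_FOOT
# define CGS_HTML_FOOT "\n"
#endif

/* Hypothetical helper: copies the footer into buf.
 * Returns the footer length, or -1 if buf is too small. */
int cgs_copy_foot(char *buf, size_t size)
{
    size_t len = strlen(CGS_HTML_FOOT);

    assert(buf != NULL);
    if (len + 1 > size)
        return -1;
    memcpy(buf, CGS_HTML_FOOT, len + 1);
    return (int)len;
}
```

The nested '#if' matters: on a compiler without '__has_include', a combined test on a single line would be a syntax error.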

defltlist.h

Default word-list.

lat-diac.h

Latin combining non-spacing marks. When combined with ASCII, these form accented letters.
Based on UnicodeData.txt.
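
To illustrate how an ASCII letter and a combining mark form an accented letter, here is a minimal sketch that writes the pair as UTF-8. The helper 'cgs_combine' is hypothetical, and the range U+0300 to U+036F (the Combining Diacritical Marks block) is an assumption about which marks are involved, not a statement about what the header contains:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes an ASCII base letter followed by a
 * combining mark (assumed range U+0300..U+036F) as UTF-8.
 * out must hold at least 4 bytes. Returns the number of bytes
 * written, not counting the terminating NUL. */
size_t cgs_combine(char base, unsigned mark, char *out)
{
    assert(out != NULL);
    assert(mark >= 0x0300 && mark <= 0x036F);

    out[0] = base;
    /* Code points below U+0800 take two bytes in UTF-8. */
    out[1] = (char)(0xC0 | (mark >> 6));
    out[2] = (char)(0x80 | (mark & 0x3F));
    out[3] = '\0';
    return 3;
}
```

For example, 'e' followed by U+0301 (COMBINING ACUTE ACCENT) gives the bytes 0x65 0xCC 0x81, which is displayed as an 'e' with an acute accent.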

no-alnum.h

Characters which aren't considered alpha-numeric.

As far as ASCII is concerned, the indexer considers everything a word delimiter except ASCII alnum (0 to 9 and a to z) and the forms alnum.alnum, alnum:alnum, '::', alnum::, ::alnum and alnum::alnum.
With the '-u' option, it has to make decisions about non-ASCII as well.
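
The ASCII rule can be sketched as a per-character test. This is one possible reading of the rule, not the indexer's actual code: the function name 'cgs_is_word_char' is hypothetical, uppercase letters are accepted as well (an assumption), and runs of three or more colons are not treated specially:

```c
#include <assert.h>
#include <stddef.h>

/* ASCII letters and digits only; uppercase is accepted by assumption. */
static int cgs_ascii_alnum(char c)
{
    return (c >= '0' && c <= '9') ||
           (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z');
}

/* Hypothetical helper: returns 1 if s[i] belongs to a word,
 * 0 if it is a word delimiter. */
int cgs_is_word_char(const char *s, size_t len, size_t i)
{
    int prev, next;
    char c;

    assert(s != NULL && i < len);
    c = s[i];
    if (cgs_ascii_alnum(c))
        return 1;

    prev = i > 0 && cgs_ascii_alnum(s[i - 1]);
    next = i + 1 < len && cgs_ascii_alnum(s[i + 1]);

    if (c == '.')
        return prev && next;            /* alnum.alnum */
    if (c == ':') {
        if (prev && next)
            return 1;                   /* alnum:alnum */
        if ((i > 0 && s[i - 1] == ':') ||
            (i + 1 < len && s[i + 1] == ':'))
            return 1;                   /* half of a '::' pair */
    }
    return 0;
}
```

With this reading, 'a.b' and 'x::y' each stay one word, while the dot in 'a. b' and the colon in 'a:' act as delimiters.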

This header file is based on UnicodeData.txt.
Every general category except the following is in this header file:

Lowercase letter
Other letter
Titlecase letter
Uppercase letter
Spacing combining mark
Nonspacing mark
Decimal number
Letter number
Surrogate

So all of the above are considered alpha-numeric and therefore not word delimiters.
I hope these criteria are correct. I know very little about non-Latin scripts.
Note: Without the '-u' option, the indexer will consider all non-ASCII to be a word delimiter.
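
The category rule above can be sketched as a test on the two-letter general category abbreviation found in the third field of each UnicodeData.txt line. The function name 'cgs_gc_is_delimiter' is hypothetical; the header itself lists code points and may well have been generated differently:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* General categories treated as alpha-numeric: Lowercase, Other,
 * Titlecase and Uppercase letter, Spacing combining mark,
 * Nonspacing mark, Decimal number, Letter number, Surrogate. */
static const char *const cgs_alnum_gc[] = {
    "Ll", "Lo", "Lt", "Lu", "Mc", "Mn", "Nd", "Nl", "Cs"
};

/* Hypothetical helper: returns 1 if a code point with general
 * category gc would be a word delimiter, 0 otherwise. */
int cgs_gc_is_delimiter(const char *gc)
{
    size_t i;

    assert(gc != NULL);
    for (i = 0; i < sizeof cgs_alnum_gc / sizeof cgs_alnum_gc[0]; i++)
        if (strcmp(gc, cgs_alnum_gc[i]) == 0)
            return 0;
    return 1;
}
```

Note that under these criteria some categories that might look alpha-numeric, such as 'Lm' (Modifier letter) and 'No' (Other number), end up as delimiters.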

sgent2utf.h

Converts HTML-, SGML- and XML entities into UTF-8. Based on W3C's https://www.w3.org/2003/entities/2007xml/unicode.xml.zip. It runs from 'AElig' ('Æ') to 'zwnj' (zero width non-joiner) and contains 2408 entities.
A complete list here.
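
A lookup of this kind can be sketched with a sorted table and 'bsearch'. The struct, the function 'cgs_entity_to_utf8' and the four-entry excerpt are hypothetical; the real header holds all 2408 entities and may be laid out differently:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct cgs_entity {
    const char *name;   /* entity name without '&' and ';' */
    const char *utf8;   /* replacement as UTF-8 bytes */
};

/* Tiny excerpt, sorted in strcmp() order (uppercase sorts first). */
static const struct cgs_entity cgs_entities[] = {
    { "AElig",  "\xC3\x86" },       /* U+00C6 */
    { "amp",    "&" },              /* U+0026 */
    { "eacute", "\xC3\xA9" },       /* U+00E9 */
    { "zwnj",   "\xE2\x80\x8C" }    /* U+200C */
};

static int cgs_entity_cmp(const void *key, const void *elem)
{
    return strcmp((const char *)key,
                  ((const struct cgs_entity *)elem)->name);
}

/* Hypothetical helper: returns the UTF-8 replacement for an
 * entity name, or NULL if the name is unknown. */
const char *cgs_entity_to_utf8(const char *name)
{
    const struct cgs_entity *e;

    assert(name != NULL);
    e = bsearch(name, cgs_entities,
                sizeof cgs_entities / sizeof cgs_entities[0],
                sizeof cgs_entities[0], cgs_entity_cmp);
    return e ? e->utf8 : NULL;
}
```

Entity names are case-sensitive, so the comparison uses plain 'strcmp' and the table must be kept in that order for the binary search to work.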

wc2str.h

ASCII equivalents for non-ASCII (before 2021-11-29 this used to be wc2asc.h). Based on field 5 in UnicodeData.txt.
Added are:

Letters which are described as ligatures and can be represented by ASCII.
Parsed output from Recode.

Manually added are:

MICRO SIGN, which is represented by 'u', e.g. uF.
OHM SIGN, which is represented by 'ohm', e.g. kohm.

So, if you look for 'uF', the search will find 'µF' as well.
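
The folding can be sketched as follows, working on code points that have already been decoded. The table is a three-entry excerpt and the names 'cgs_wc2str' and 'cgs_fold' are hypothetical; skipping unmapped non-ASCII is also an assumption made for this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct cgs_wc2str {
    unsigned long cp;       /* non-ASCII code point */
    const char *ascii;      /* ASCII equivalent */
};

static const struct cgs_wc2str cgs_map[] = {
    { 0x00B5UL, "u"   },    /* MICRO SIGN */
    { 0x2126UL, "ohm" },    /* OHM SIGN */
    { 0xFB01UL, "fi"  }     /* LATIN SMALL LIGATURE FI */
};

/* Hypothetical helper: folds n code points into a NUL-terminated
 * ASCII string. Unmapped non-ASCII is skipped. Returns the string
 * length, or (size_t)-1 if out is too small. */
size_t cgs_fold(const unsigned long *cps, size_t n, char *out, size_t size)
{
    size_t i, j, len, used = 0;

    assert(cps != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        char one[2] = { 0, 0 };
        const char *rep = NULL;

        if (cps[i] < 0x80) {
            one[0] = (char)cps[i];
            rep = one;
        } else {
            for (j = 0; j < sizeof cgs_map / sizeof cgs_map[0]; j++)
                if (cgs_map[j].cp == cps[i])
                    rep = cgs_map[j].ascii;
        }
        if (rep == NULL)
            continue;
        len = strlen(rep);
        if (used + len + 1 > size)
            return (size_t)-1;
        memcpy(out + used, rep, len);
        used += len;
    }
    if (used + 1 > size)
        return (size_t)-1;
    out[used] = '\0';
    return used;
}
```

Folding both the indexed text and the query this way is what would let a search for 'uF' match 'µF': the sequence U+00B5, 'F' becomes 'uF', and 'k', U+2126 becomes 'kohm'.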

wc2num.h

Numeric equivalents for non-ASCII.
Based on UnicodeData.txt.
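
As an illustration of what a numeric equivalent means, the sketch below maps a few runs of non-ASCII decimal digits to their values. The three runs and the function name 'cgs_wc2num' are picked for this example only; which characters the header actually covers is not stated here:

```c
#include <assert.h>
#include <stddef.h>

/* First code point of three runs of ten decimal digits. */
static const unsigned long cgs_digit_zero[] = {
    0x0660UL,   /* ARABIC-INDIC DIGIT ZERO */
    0x0966UL,   /* DEVANAGARI DIGIT ZERO */
    0xFF10UL    /* FULLWIDTH DIGIT ZERO */
};

/* Hypothetical helper: returns the digit value 0..9 of cp,
 * or -1 if cp is not in one of the runs above. */
int cgs_wc2num(unsigned long cp)
{
    size_t i;

    for (i = 0; i < sizeof cgs_digit_zero / sizeof cgs_digit_zero[0]; i++) {
        unsigned long zero = cgs_digit_zero[i];

        if (cp >= zero && cp <= zero + 9) {
            int digit = (int)(cp - zero);

            assert(digit >= 0 && digit <= 9);
            return digit;
        }
    }
    return -1;
}
```

For example, U+0663 (ARABIC-INDIC DIGIT THREE) yields 3 and U+FF19 (FULLWIDTH DIGIT NINE) yields 9.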