
Some notes on the contents of some of the header files

Edit these files to suit your needs.


These words aren't indexed.


This file may be created and then used to customise the output from the search form;

#ifndef CGS_CUSTOM_H
#define CGS_CUSTOM_H 1

#include <stdio.h>

#define CGS_STYLE_SHEET "Path/File.css"
#define CGS_HTML_FOOT   "\nSome_String\n\n"

#endif	/*  custom.h   */

'custom.h' is included in 'cgi-search.c';

#if __has_include ( "custom.h" )
 #include "custom.h"
#endif

'__has_include' is not supported by older C compilers. In that case, you may need to change this bit of code to an unconditional include;

#include "custom.h"

Or remove this bit altogether. It's just used to modify the appearance of the results a bit.
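
A minimal sketch of a more defensive variant, in case it is useful. It first checks whether the compiler knows '__has_include' at all, so that older compilers simply skip 'custom.h'. The fallback values and the helper 'print_foot' are hypothetical and are not taken from 'cgi-search.c';

```c
#include <stdio.h>

/* Nested on purpose: a compiler without __has_include cannot even
 * parse "__has_include(...)" inside an #if expression. */
#if defined(__has_include)
# if __has_include("custom.h")
#  include "custom.h"
# endif
#endif

/* Hypothetical fallbacks, used when custom.h is absent or incomplete. */
#ifndef CGS_STYLE_SHEET
# define CGS_STYLE_SHEET ""
#endif
#ifndef CGS_HTML_FOOT
# define CGS_HTML_FOOT "\n"
#endif

/* Hypothetical helper: write the footer string to 'out'.
 * Returns 0 on success and -1 on a write error. */
int print_foot(FILE *out)
{
    return fputs(CGS_HTML_FOOT, out) < 0 ? -1 : 0;
}
```

With this layout the rest of the code can use the two macros without caring whether 'custom.h' exists.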


Default word-list.


Latin combining non-spacing marks. When combined with ASCII letters, these form accented letters.
Based on UnicodeData.txt.


Characters which aren't considered alpha-numeric.

As far as ASCII is concerned, the indexer treats everything as a word delimiter except ASCII alnum (0 to 9 and a to z), alnum.alnum, alnum:alnum, '::', alnum::, ::alnum and alnum::alnum.
With the '-u' option it has to make decisions about non-ASCII as well;

This header file is based on UnicodeData.txt.
All but the following is in this header file;

So all of the above are considered alpha-numeric and therefore not word delimiters.
I hope these criteria are correct. I know very little about non-Latin scripts.
Note: Without the '-u' option, the indexer will consider all non-ASCII to be a word delimiter.
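
The ASCII rules can be sketched roughly as follows. This is only an illustration of the criteria listed above, not the indexer's actual code. The names 'is_ascii_alnum', 'in_word' and 'next_word' are hypothetical, upper-case letters are assumed to count as alnum, and every non-ASCII byte is treated as a delimiter (the behaviour without '-u');

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true for ASCII digits and letters only. */
static int is_ascii_alnum(unsigned char c)
{
    return c < 0x80 && isalnum(c);
}

/* True when byte s[i] (of n bytes) is part of a word, false when it
 * acts as a word delimiter. */
static int in_word(const char *s, size_t i, size_t n)
{
    unsigned char c = (unsigned char)s[i];
    int prev_alnum = i > 0 && is_ascii_alnum((unsigned char)s[i - 1]);
    int next_alnum = i + 1 < n && is_ascii_alnum((unsigned char)s[i + 1]);

    if (is_ascii_alnum(c))
        return 1;
    if (c == '.')
        return prev_alnum && next_alnum;      /* alnum.alnum */
    if (c == ':') {
        int prev_colon = i > 0 && s[i - 1] == ':';
        int next_colon = i + 1 < n && s[i + 1] == ':';
        if (prev_colon || next_colon)
            return 1;                         /* '::', alnum::, ::alnum, alnum::alnum */
        return prev_alnum && next_alnum;      /* alnum:alnum */
    }
    return 0;
}

/* Copy the next word at or after *pos into 'out' (outsz must be >= 1,
 * longer words are truncated). Returns 1 if a word was found, else 0. */
static int next_word(const char *s, size_t *pos, char *out, size_t outsz)
{
    size_t n = strlen(s), i = *pos, len = 0;

    while (i < n && !in_word(s, i, n))        /* skip delimiters */
        i++;
    if (i >= n) {
        *pos = n;
        return 0;
    }
    while (i < n && in_word(s, i, n)) {       /* collect the word */
        if (len + 1 < outsz)
            out[len++] = s[i];
        i++;
    }
    out[len] = '\0';
    *pos = i;
    return 1;
}
```

Called repeatedly on "std::cout, v1.2 end.", this sketch yields 'std::cout', 'v1.2' and 'end'; the comma, the spaces and the trailing full stop act as delimiters.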


Converts HTML-, SGML- and XML entities into UTF-8. Based on W3C's It runs from 'AElig' ('Æ') to 'zwnj' (zero width non-joiner) and contains 2408 entities.
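
A lookup of this kind might look roughly like the sketch below. It assumes a table sorted in byte order by entity name, which would be consistent with the range from 'AElig' to 'zwnj', and searches it with 'bsearch'. The struct, the four sample entries and the function 'entity_to_utf8' are hypothetical; the real header's layout may differ;

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical table layout: entity name and its UTF-8 encoding. */
struct entity {
    const char *name;
    const char *utf8;
};

/* A few sample entries only; must stay sorted by strcmp() order. */
static const struct entity entities[] = {
    { "AElig",  "\xC3\x86"     },   /* U+00C6 */
    { "amp",    "&"            },   /* U+0026 */
    { "eacute", "\xC3\xA9"     },   /* U+00E9 */
    { "zwnj",   "\xE2\x80\x8C" },   /* U+200C zero width non-joiner */
};

/* bsearch() comparator: key is the entity name being looked up. */
static int cmp_entity(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct entity *)elem)->name);
}

/* Return the UTF-8 string for 'name' (given without '&' and ';'),
 * or NULL when the name is not in the table. */
const char *entity_to_utf8(const char *name)
{
    const struct entity *e = bsearch(name, entities,
                                     sizeof entities / sizeof entities[0],
                                     sizeof entities[0], cmp_entity);
    return e != NULL ? e->utf8 : NULL;
}
```

Entity names are case-sensitive, so 'AElig' and 'aelig' are different keys here.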


ASCII equivalents for non-ASCII (before 2021-11-29 this used to be wc2asc.h). Based on field 5 in UnicodeData.txt.
Added are;

Manually added are;

So, if you look for 'uF', the search will find 'µF' as well.
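
To illustrate the idea, here is a small sketch that folds UTF-8 text to its ASCII equivalent before comparing. It is not the header's actual contents: the table 'folds', its two entries and the function 'fold_to_ascii' are hypothetical, and a real table would be far larger;

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table layout: UTF-8 sequence and its ASCII equivalent. */
struct fold {
    const char *utf8;
    const char *ascii;
};

static const struct fold folds[] = {
    { "\xC2\xB5", "u" },   /* U+00B5 micro sign, the example in the text */
    { "\xC3\xA9", "e" },   /* U+00E9 e with acute */
};

/* Copy 'in' to 'out', replacing known non-ASCII sequences by their
 * ASCII equivalents. Output is truncated to fit outsz bytes. */
void fold_to_ascii(const char *in, char *out, size_t outsz)
{
    size_t len = 0;

    if (outsz == 0)
        return;
    while (*in != '\0') {
        const char *rep = NULL;
        size_t adv = 1, i;

        for (i = 0; i < sizeof folds / sizeof folds[0]; i++) {
            size_t n = strlen(folds[i].utf8);
            if (strncmp(in, folds[i].utf8, n) == 0) {
                rep = folds[i].ascii;
                adv = n;               /* skip the whole UTF-8 sequence */
                break;
            }
        }
        if (rep != NULL) {
            while (*rep != '\0' && len + 1 < outsz)
                out[len++] = *rep++;
        } else if (len + 1 < outsz) {
            out[len++] = *in;          /* unknown bytes pass through */
        }
        in += adv;
    }
    out[len] = '\0';
}
```

If both the indexed text and the search term pass through such a function, 'µF' and 'uF' end up as the same string.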


Numeric equivalents for non-ASCII.
Based on UnicodeData.txt.