Config-bing.xml Guest on 7th May 2020 10:13:05 AM
<!--

  sitemap_gen.py example configuration script

  This file specifies a set of sample input parameters for the
  sitemap_gen.py client.

  You should copy this file into "config.xml" and modify it for
  your server.


  *********************************************************
-->
<!--
 ** MODIFY **
  The "site" node describes your basic web site.

  Required attributes:
    base_url   - the top-level URL of the site being mapped
    store_into - the webserver path to the desired output file.
                 This should end in '.xml' or '.xml.gz'
                 (the script will create this file)

  Optional attributes:
    verbose    - an integer from 0 (quiet) to 3 (noisy) for
                 how much diagnostic output the script gives
    suppress_search_engine_notify="1"
               - disables notifying search engines about the new map
                 (same as the "testing" command-line argument.)
    default_encoding
               - names a character encoding to use for URLs and
                 file paths.  (Example: "UTF-8")
-->
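<!--
  A sketch of a "site" node that sets every attribute listed above.
  The URL and output path are placeholder assumptions, not values
  taken from this configuration:

  <site
     base_url="http://www.example.com/"
     store_into="/var/www/docroot/sitemap.xml.gz"
     verbose="1"
     suppress_search_engine_notify="1"
     default_encoding="UTF-8"
  >
    ...
  </site>
-->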
<site base_url="http://wiki.wiki/" store_into="sitemap-wiki-bing.xml.gz" verbose="1">
<!--
 ********************************************************
          INPUTS

  All the various nodes in this section control where the script
  looks to find URLs.

  MODIFY or DELETE these entries as appropriate for your server.
  *********************************************************
-->
<!--
 ** MODIFY or DELETE **
    "url" nodes specify individual URLs to include in the map.

    Required attributes:
      href       - the URL

    Optional attributes:
      lastmod    - timestamp of last modification (ISO8601 format)
      changefreq - how often content at this URL is usually updated
      priority   - value 0.0 to 1.0 of relative importance in your site

-->
<!--

  <url  href="http://www.example.com/stats?q=name"  />
  <url
     href="http://www.example.com/stats?q=age"
     lastmod="2004-11-14T01:00:00-07:00"
     changefreq="yearly"
     priority="0.3"
  />

-->
<!--
 ** MODIFY or DELETE **
    "urllist" nodes name text files with lists of URLs.
    An example file "example_urllist.txt" is provided.

    Required attributes:
      path       - path to the file

    Optional attributes:
      encoding   - encoding of the file if not US-ASCII

-->
<urllist path="urllist-wiki.txt" encoding="UTF-8"/>
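<!--
  A sketch of what a URL list file such as "urllist-wiki.txt" might
  contain.  It assumes the space-delimited layout of the
  "example_urllist.txt" shipped with the script: one URL per line,
  optionally followed by key=value attributes (lastmod, changefreq,
  priority), with lines starting with # treated as comments.  Check
  this against your copy of the example file.  The page names below
  are hypothetical; every URL is expected to begin with base_url.

    # one URL per line, optional attributes after the URL
    http://wiki.wiki/Main_Page
    http://wiki.wiki/Help lastmod=2020-01-01T00:00:00+00:00
    http://wiki.wiki/About changefreq=weekly priority=0.3
-->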
<!--
 ** MODIFY or DELETE **
    "directory" nodes tell the script to walk the file system
    and include all files and directories in the Sitemap.

    Required attributes:
      path       - path to begin walking from
      url        - URL equivalent of that path

    Optional attributes:
      default_file - name of the index or default file for directory URLs

-->
<!--

  <directory  path="/var/www/icons"    url="http://www.example.com/images/" />
  <directory
     path="/var/www/docroot"
     url="http://www.example.com/"
     default_file="index.html"
  />

-->
<!--
 ** MODIFY or DELETE **
    "accesslog" nodes tell the script to scan webserver log files to
    extract URLs on your site.  Both Common Logfile Format (Apache's default
    logfile) and Extended Logfile Format (IIS's default logfile) can be read.

    Required attributes:
      path       - path to the file

    Optional attributes:
      encoding   - encoding of the file if not US-ASCII

-->
<!--

  <accesslog  path="/etc/httpd/logs/access.log"       encoding="UTF-8"  />
  <accesslog  path="/etc/httpd/logs/access.log.0"     encoding="UTF-8"  />
  <accesslog  path="/etc/httpd/logs/access.log.1.gz"  encoding="UTF-8"  />

-->
<!--
 ** MODIFY or DELETE **
    "sitemap" nodes tell the script to scan other Sitemap files.  This can
    be useful to aggregate the results of multiple runs of this script into
    a single Sitemap.

    Required attributes:
      path       - path to the file

-->
<!--

  <sitemap    path="/var/www/docroot/subpath/sitemap.xml" />

-->
<!--
 ********************************************************
          FILTERS

  Filters specify wild-card patterns that the script compares
  against all URLs it finds.  Filters can be used to exclude
  certain URLs from your Sitemap, for instance if you have
  hidden content that you hope the search engines don't find.

  Filters can be either type="wildcard", which means standard
  path wildcards (* and ?) are used to compare against URLs,
  or type="regexp", which means regular expressions are used
  to compare.

  Filters are applied in the order specified in this file.

  An action="drop" filter causes exclusion of matching URLs.
  An action="pass" filter causes inclusion of matching URLs,
  shortcutting any other later filters that might also match.
  If no filter at all matches a URL, the URL will be included.
  Together you can build up fairly complex rules.

  The default action is "drop".
  The default type is "wildcard".

  You can MODIFY or DELETE these entries as appropriate for
  your site.  However, unlike above, the example entries in
  this section are not contrived and may be useful to you as
  they are.
  *********************************************************
-->
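<!--
  A sketch of how "pass" and "drop" combine, using hypothetical paths.
  Because filters run in file order and a "pass" match stops further
  filtering, the first entry keeps the archive index page while the
  second drops everything else under /archive/:

  <filter  action="pass"  type="wildcard"  pattern="*/archive/index.html" />
  <filter  action="drop"  type="wildcard"  pattern="*/archive/*"          />

  Reversing the two lines would drop the index page as well, since the
  "drop" filter would then match it first.
-->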
<!--
 Exclude URLs that end with a '~'   (e.g. emacs backup files)
-->
<filter action="drop" type="wildcard" pattern="*~"/>
<!--
 Exclude URLs within UNIX-style hidden files or directories
-->
<filter action="drop" type="regexp" pattern="/\.[^/]*"/>
<!--
 Exclude URLs that contain "title="   (e.g. wiki query-string pages)
-->
<filter action="drop" type="regexp" pattern="title="/>
</site>
