bash - Extract unique domain names and the document root from Apache vhosts


I have an Apache vhost that looks a little bit like this:

<virtualhost *:80>
    servername aaaaaa.com
    serveralias www.aaaaaa.com upload.aaaaaa.com publisher.aaaaaa.com
    serveralias aaaaaa.no upload.aaaaaa.no www.aaaaaa.no publisher.aaaaaa.no
    serveralias bbbbbb.no www.bbbbbb.no upload.bbbbbb.no publisher.bbbbbb.no
    serveralias cccccc.live upload.cccccc.live www.cccccc.live publisher.cccccc.live
    serveralias dddddd.com *.dddddd.com
    serveralias aaaaaa.cccccc.live *.aaaaaa.cccccc.live
    serveralias eeeeee.com *.eeeeee.com
    serveralias ffffff.com *.ffffff.com
    serveralias aaaaaa-com.bbbbbb.no

    serveradmin webmaster@bbbbbb.no

    errorlog /var/log/apache2/aaaaaa.com-error.log
    customlog /var/log/apache2/aaaaaa.com-access.log combined
    serversignature off

    <ifmodule mod_php5.c>
        addtype application/x-httpd-php .php .phtml .php3
        addtype application/x-httpd-php-source .phps
    </ifmodule>

    documentroot /var/www/html/aaaaaa.com/current/www/
    <directory /var/www/html/aaaaaa.com/current>
        allowoverride
    </directory>

    alias  /oc-publisher-framework /var/www/releases/oc-publisher-framework-www/

    # fake https become on cloudflare special headers.
    setenvif x-forwarded-proto https https=on
    setenvif cf-visitor {"scheme":"https"} https=on
</virtualhost>

<ifmodule mod_ssl.c>
<virtualhost _default_:443>

    servername aaaaaa.com
    serveralias aaaaaa.no
    serveralias www.aaaaaa.no
    serveralias bbbbbb.no
    serveralias www.bbbbbb.no
    serveralias cccccc.live
    serveralias www.cccccc.live publisher.cccccc.live
    serveralias dddddd.com
    serveralias www.dddddd.com
    serveralias www.aaaaaa.com
    serveralias gc-aaaaaa.aaaaaa.com
    serveralias www.gc-aaaaaa.aaaaaa.com
    serveralias aaaaaa.cccccc.live
    serveralias www.aaaaaa.cccccc.live
    serveralias eeeeee.com www.eeeeee.com

    serveradmin webmaster@bbbbbb.no

    errorlog /var/log/apache2/aaaaaa.com-error.log
    customlog /var/log/apache2/aaaaaa.com-access.log combined
    serversignature off

    <ifmodule mod_php5.c>
      addtype application/x-httpd-php .php .phtml .php3
      addtype application/x-httpd-php-source .phps
    </ifmodule>

    documentroot /var/www/html/aaaaaa.com/current/www/
    <directory /var/www/html/aaaaaa.com/current>
      allowoverride
    </directory>

    alias  /oc-publisher-framework /var/www/html/releases/oc-publisher-framework-www/

    sslengine on
    sslcertificatefile /etc/apache2/ssl/bbbbbb.no.crt
    sslcertificatekeyfile /etc/apache2/ssl/bbbbbb.no.key.txt
    <filesmatch "\.(cgi|shtml|phtml|php)$">
            ssloptions +stdenvvars
    </filesmatch>
    <directory /usr/lib/cgi-bin>
            ssloptions +stdenvvars
    </directory>
    browsermatch "msie [2-6]" \
            nokeepalive ssl-unclean-shutdown \
            downgrade-1.0 force-response-1.0
    browsermatch "msie [17-9]" ssl-unclean-shutdown
</virtualhost>
</ifmodule>

I want to extract the unique domain names and the document root, so that the output looks like this:

/var/www/html/aaaaaa.com/current/www/ - aaaaaa.com www.aaaaaa.com upload.aaaaaa.com publisher.aaaaaa.com aaaaaa.no upload.aaaaaa.no www.aaaaaa.no publisher.aaaaaa.no bbbbbb.no www.bbbbbb.no upload.bbbbbb.no publisher.bbbbbb.no ... 

How can I accomplish this with awk, bash, or Python?

You can use awk:

awk '
# Apache directive names are case-insensitive, so compare in lower case.
tolower($1) ~ /^(servername|serveralias)$/ {
   # Array keys are unique, so repeated names collapse into one entry.
   for (i=2; i<=NF; i++)
      hosts[$i]
}
tolower($1) == "documentroot" {
   dr = $2
}
END {
   printf "%s -", dr
   # for-in visits the keys in an unspecified order.
   for (i in hosts)
      printf " %s", i
   print ""
}' httpd.conf

This prints:

/var/www/html/aaaaaa.com/current/www/ - www.aaaaaa.cccccc.live dddddd.com www.cccccc.live www.eeeeee.com *.ffffff.com upload.cccccc.live www.bbbbbb.no bbbbbb.no ffffff.com eeeeee.com upload.aaaaaa.no gc-aaaaaa.aaaaaa.com publisher.aaaaaa.no publisher.cccccc.live upload.aaaaaa.com www.aaaaaa.com aaaaaa-com.bbbbbb.no www.gc-aaaaaa.aaaaaa.com aaaaaa.cccccc.live www.aaaaaa.no aaaaaa.no publisher.aaaaaa.com aaaaaa.com www.dddddd.com *.eeeeee.com *.aaaaaa.cccccc.live *.dddddd.com cccccc.live upload.bbbbbb.no publisher.bbbbbb.no
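Since the question also mentions Python, here is a rough equivalent. It is only a sketch and not part of the answer above: the function name `hosts_by_docroot` and the `SAMPLE` text are made up for illustration. It assumes one directive per line and that each vhost block has its own `DocumentRoot`. Unlike the awk version, it groups names per document root and keeps them in first-seen order.

```python
def hosts_by_docroot(lines):
    """Map each document root to its unique host names, in first-seen order."""
    result = {}                # docroot -> dict used as an ordered set
    names, docroot = [], None  # state for the vhost block being read
    for raw in lines:
        parts = raw.split()
        if not parts or parts[0].startswith("#"):
            continue           # skip blank lines and comment lines
        directive = parts[0].lower()  # directive names are case-insensitive
        if directive.startswith("<virtualhost"):
            names, docroot = [], None
        elif directive in ("servername", "serveralias"):
            names.extend(parts[1:])
        elif directive == "documentroot" and len(parts) > 1:
            docroot = parts[1].strip('"')
        elif directive.startswith("</virtualhost"):
            if docroot is not None:
                seen = result.setdefault(docroot, {})
                for name in names:
                    seen[name] = None  # dict keys drop duplicates
            names, docroot = [], None
    return {root: list(seen) for root, seen in result.items()}


# Hypothetical, shortened sample in the same shape as the vhosts above.
SAMPLE = """\
<virtualhost *:80>
    servername aaaaaa.com
    serveralias www.aaaaaa.com upload.aaaaaa.com
    documentroot /var/www/html/aaaaaa.com/current/www/
</virtualhost>
<virtualhost _default_:443>
    servername aaaaaa.com
    serveralias aaaaaa.no www.aaaaaa.com
    documentroot /var/www/html/aaaaaa.com/current/www/
</virtualhost>
"""

for root, hosts in hosts_by_docroot(SAMPLE.splitlines()).items():
    print(root, "-", " ".join(hosts))
```

To run it against a real file, pass an open file object instead of the sample, for example `hosts_by_docroot(open("httpd.conf"))`.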
