-[ BFi - English version ]----------------------------------------------------
        BFi is an e-zine written by the Italian hacker community.
        Full source code and original Italian version are available at:
	        http://bfi.s0ftpj.org/dev/BFi12-dev-10.tar.gz
       English version translated by Tanith <lorettaharlowe@yahoo.it>, nail,
       Raist_XOL and Zen.
------------------------------------------------------------------------------


==============================================================================
--------------------[ BFi12-dev - file 10 - 14/03/2004 ]----------------------
==============================================================================

-[ DiSCLAiMER ]---------------------------------------------------------------
        Everything contained in BFi is for informative and educational
        purposes only. In no event can the authors be held liable
        for damages caused to people or things by the use of code,
        programs, pieces of information or techniques published in the e-zine.
        BFi is a free and autonomous means of expression; we, the authors,
        are as free to write BFi as you are free to go on reading or to stop
        doing it right now. Therefore, if you think you could be harmed by
        the topics covered and/or by the way they are covered, * stop reading
        immediately and remove these files from your computer * .
        By going on, you, the reader, take upon yourself all responsibility
        for the use you make of the information published in BFi.
        You are not allowed to post BFi to newsgroups or to spread
        *parts* of the magazine: please distribute BFi in its original and
        complete form.
------------------------------------------------------------------------------

-[ HACKiNG ]------------------------------------------------------------------
---[ STEGAN0GRAPHY APPLIED 0N NETW0RK SESSi0NS AND NEiGHB0URH00D 
-----[ vecna <vecna@s0ftpj.org> 

Ah, what a nice thing steganography is. If you want to brush up on the basics,
you can read http://citeseer.nj.nec.com/fridrich02practical.html

Unfortunately, my experience as a teacher doesn't help me when I write an
article; maybe I take too many things for granted. So, to help you
understand, I suggest you read the links wherever they appear: reading the
article will then keep you busy for several days... but in the end it will be
worth it. :)

1) STATE OF THE ART

Usually steganography is thought of as something applied to image files.
However, anyone a bit more curious knows it applies to file systems too, and
anyone who has understood its essence can apply it anywhere. Anyway, I will
leave the tastiest madness for the third part of the article.

Applying steganography to network sessions is not very different from building
covert channels; the only difference is that covert channels are usually
thought of as hidden communication systems for getting around a network
protection system. If a firewall only lets me out through port 80, then with
an http tunnel
(http://www.nocrew.org/software/httptunnel.html, or even http://openvpn.sf.net)
and by relying on a machine that interprets my tunnel, I can encapsulate any
kind of traffic within that channel and do things I otherwise couldn't,
including reaching machines inside the protected network.
About covert channels: in 2000 the SANS staff mentioned a work by fusys and one
of mine in this analysis http://www.s0ftpj.org/docs/covert_shells.htm .

As said above, network traffic steganography can be considered a specialized
type of "covert communication channel". Network session steganography means
"putting data within packets that appear to be empty"; covert channels, by
contrast, usually mean "putting arbitrary data within packets that normally
carry another kind of data, so that they are accepted because, at a cursory
look, they are taken for what they appear to be". The first project applying
steganography to networking is this one:
http://public.lanl.gov/cdi/networkstenganography.htm . There you'll find a
document on steganography applied to networking, which also mentions the
technique tested below. Later on you'll understand why I bring it up.

To apply steganography to TCP sessions or IP packets in general, we need to
find a place to put the data. Usually the header is what gets abused: knowing
the meaning of each field because you've already read rfc791
(http://www.faqs.org/rfcs/rfc791.html), you look for fields which can host
data, taking care that the packet stays valid and so reaches its
destination.

IP Header :

   0           4              8          16     19         24            32
   ------------------------------------------------------------------------
   |  VERS  |   HLEN |    Service Type   |          Total Length          |
   ------------------------------------------------------------------------
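   |         Identification              | Flags |       Fragment Offset  |
   ------------------------------------------------------------------------
   |   Time to Live  |     Protocol      |        Header Checksum         |
   ------------------------------------------------------------------------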
   |                             Source IP Address                        |
   ------------------------------------------------------------------------
   |                         Destination IP Address                       |
   ------------------------------------------------------------------------
   |                                 IP Options             |  Padding    |
   ------------------------------------------------------------------------
   |                                    Data                              |
   ------------------------------------------------------------------------
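
To make the layout concrete: the Identification field occupies bytes 4 and 5
of the header, in network byte order. A minimal sketch of reading and rewriting
it on a raw header buffer follows; the function names are made up for
illustration and are not part of any tool mentioned here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of the Identification field in an IPv4 header (RFC 791) */
#define IPV4_ID_OFFSET 4

/* Read the 16-bit ID, stored in network (big-endian) byte order */
uint16_t ipv4_get_id(const uint8_t *hdr)
{
	assert(hdr != NULL);
	return (uint16_t)((hdr[IPV4_ID_OFFSET] << 8) | hdr[IPV4_ID_OFFSET + 1]);
}

/* Overwrite the ID; the header checksum must be recomputed afterwards */
void ipv4_set_id(uint8_t *hdr, uint16_t id)
{
	assert(hdr != NULL);
	hdr[IPV4_ID_OFFSET] = (uint8_t)(id >> 8);
	hdr[IPV4_ID_OFFSET + 1] = (uint8_t)(id & 0xff);
}
```

These helpers only touch the two ID bytes; every other field is left alone.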

All of these fields have a purpose and we can't hope to find an empty space
(except for padding, which should be filled with zeroes according to the
RFC...): we have to find out how these fields are filled by our TCP/IP stack
and, based on that, see whether we can manipulate them to insert data
without subverting the normal operation of the protocols.
However, while IP leaves very little room to act, TCP offers more, putting at
your disposal many more fields tied to session control. They can be used
steganographically if there's no session to be controlled:

   0           4              8           16     19    24                 32
   -------------------------------------------------------------------------
   |              Source Port             |         Destination Port       |
   -------------------------------------------------------------------------
   |                               Sequence Number                         |
   -------------------------------------------------------------------------
   |                             Acknowledgment Number                     |
   -------------------------------------------------------------------------
   | HLEN  |  Reserved |   Code Bits     |         Window                  |
   -------------------------------------------------------------------------
   |              Checksum               |         Urgent Pointer          |
   -------------------------------------------------------------------------
   |                                   Options    |        Padding         |
   -------------------------------------------------------------------------
   |                                    Data                               |
   -------------------------------------------------------------------------

To clarify the meaning of the TCP header fields, I refer you to
http://www.faqs.org/rfcs/rfc793.html .

The few programs implemented so far, however, had an obvious problem: since
they hid data in unlikely fields or in unusual ways, they were not at all
resistant to steganalysis. Steganalysis is the process by which you work out
that information has been exchanged through a non-conventional channel. It
isn't an analysis technique that can be described the way cryptanalysis can;
it's rather a matter of spotting what is too evidently "out of standard" and,
once the deviation has been identified, checking whether it's a false positive
due to errors or whether the anomaly, because of its frequency, is really due
to a user controlling it to hide something.

A tool introduced in "Covert Channels in the TCP/IP Protocol Suite"
generates traffic such as this:

18:50:13.551117 nemesis.psionic.com.7180 > blast.psionic.com.www: 
		S 537657344:537657344(0) win 512 (ttl 64, id 18432)
18:50:14.551117 nemesis.psionic.com.51727 > blast.psionic.com.www: 
		S 1393295360:1393295360(0) win 512 (ttl 64, id 17664)
18:50:15.551117 nemesis.psionic.com.9473 > blast.psionic.com.www: 
		S 3994419200:3994419200(0) win 512 (ttl 64, id 19456)
18:50:16.551117 nemesis.psionic.com.56855 > blast.psionic.com.www: 
		S 3676635136:3676635136(0) win 512 (ttl 64, id 19456)
18:50:17.551117 nemesis.psionic.com.1280 > blast.psionic.com.www: 
		S 774242304:774242304(0) win 512 (ttl 64, id 20224)
18:50:18.551117 nemesis.psionic.com.21004 > blast.psionic.com.www: 
		S 3843751936:3843751936(0) win 512 (ttl 64, id 2560)

which serves as an example of steganography within the IP ID. In this case, to
make the packet plausible, the SYN flag is set in the TCP header, pretending
that the source host is connecting to the remote server. Looking closer,
however, you can see that no TCP/IP stack ever generates, second after second,
packets towards the same port from a randomly changing source port.
Even supposing these elements were corrected, so that the traffic looked more
like a connection attempt to a port 80 that doesn't respond, the traffic
needed to transfer even just 1500 bytes would still be unusual (1500 SYN
packets spaced a second apart, towards the same port, make no sense; towards
different ports they could pass for a scan, if faster they could pass for a
flood, but this is too strange to be ignored twice or more).
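
A closer look at the trace above shows a further telltale. All six IDs are
multiples of 256, which suggests (an assumption drawn from the values, not
from the tool's source) that one byte travels in the high octet while the low
octet stays at zero. Under that assumption a few lines of C recover the
payload; decode_high_octet is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Recover one byte per packet from the high octet of each IP ID.
 * out must have room for n + 1 bytes; the result is NUL-terminated. */
void decode_high_octet(const uint16_t *ids, size_t n, char *out)
{
	assert(ids != NULL && out != NULL);
	for (size_t i = 0; i < n; i++)
		out[i] = (char)(ids[i] >> 8);
	out[n] = '\0';
}
```

Fed with the six IDs above (18432, 17664, 19456, 19456, 20224, 2560), it
yields the bytes 72 69 76 76 79 10, i.e. "HELLO" followed by a newline.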

It is quite easy to decide to sacrifice some fields, even several fields
(ports, sequence number, acknowledgment number, urgent pointer, IP identifier)
and so transmit many more bytes in a single packet, but this would completely
subvert the normal operation of the protocols, making it a covert
channel rather than a steganographic system.

What I have just shown cannot be the right way. The right way, if you want to
apply steganography to network or transport layers, is to look at a common
connection and to try to understand how data can be inserted within it, while
making sure that an external observer will not be able to detect anything
strange.

This is a real session handled by a TCP/IP stack and the lynx web browser,
and this is how our steganographic session will look in the end: APPARENTLY
the same.

13:28:07.500468 192.168.1.69.58067 > 66.102.11.104.80: 
		SWE [tcp sum ok] 158029937:158029937(0) 
		win 5840 <mss 1460,sackOK,timestamp 83959185 0,nop,wscale 0> 
		(DF) (ttl 64, id 17888, len 60)
13:28:07.598985 66.102.11.104.80 > 192.168.1.69.58067: 
		S [tcp sum ok] 2710819308:2710819308(0) 
		ack 158029938 win 8190 <mss 1412> (ttl 244, id 1970, len 44)
13:28:07.599064 192.168.1.69.58067 > 66.102.11.104.80: 
		. [tcp sum ok] 1:1(0) 
		ack 1 win 5840 (DF) (ttl 64, id 17889, len 40)
13:28:07.603015 192.168.1.69.58067 > 66.102.11.104.80: 
		. 1:1413(1412) 
		ack 1 win 5840 (DF) (ttl 64, id 17890, len 1452)
13:28:07.603042 192.168.1.69.58067 > 66.102.11.104.80: P 1413:2312(899) 
		ack 1 win 5840 (DF) (ttl 64, id 17891, len 939)
13:28:07.863177 66.102.11.104.80 > 192.168.1.69.58067: . 1:1413(1412) 
		ack 2312 win 32476 [tos 0x10]  (ttl 53, id 2145, len 1452)
13:28:07.863268 192.168.1.69.58067 > 66.102.11.104.80: 
		. [tcp sum ok] 2312:2312(0) 
		ack 1413 win 8472 (DF) (ttl 64, id 17892, len 40)
13:28:07.864275 66.102.11.104.80 > 192.168.1.69.58067: P 1413:1573(160) 
		ack 2312 win 32476 [tos 0x10]  (ttl 53, id 2146, len 200)
13:28:07.864321 192.168.1.69.58067 > 66.102.11.104.80: 
		. [tcp sum ok] 2312:2312(0) 
		ack 1573 win 11296 (DF) (ttl 64, id 17893, len 40)
13:28:07.877845 66.102.11.104.80 > 192.168.1.69.58067: P 1573:2621(1048) 
		ack 2312 win 32476 [tos 0x10]  (ttl 53, id 2159, len 1088)
13:28:07.877911 192.168.1.69.58067 > 66.102.11.104.80: 
		. [tcp sum ok] 2312:2312(0) 
		ack 2621 win 14120 (DF) (ttl 64, id 17894, len 40)
13:28:07.887977 66.102.11.104.80 > 192.168.1.69.58067: FP 2621:3417(796) 
		ack 2312 win 32476 [tos 0x10]  (ttl 53, id 27843, len 836)
13:28:07.920812 192.168.1.69.58067 > 66.102.11.104.80: 
		. [tcp sum ok] 2312:2312(0) 
		ack 3418 win 16944 (DF) (ttl 64, id 17895, len 40)
13:28:11.954544 192.168.1.69.58067 > 66.102.11.104.80: 
		F [tcp sum ok] 2312:2312(0) 
		ack 3418 win 16944 (DF) (ttl 64, id 17896, len 40)
13:28:12.050846 66.102.11.104.80 > 192.168.1.69.58067: 
		. [tcp sum ok] 3418:3418(0) 
		ack 2313 win 32476 (ttl 244, id 232, len 40)

Timing between packets, CWND and option management, actual data transfer
and checksums are part of every connection, so the space we can use is
really small: every field we could have used to put data in has an RFC-imposed
value, which the TCP/IP stacks will enforce.

Anyway, you have to seek before you find, and if we analyze everything that
goes by in those short exchanges, we see which fields are involved...

The things that must not be touched for a session to run properly are:
 IP header: addresses, fragmentation, flags, checksum, lengths, version and IHL.
 TCP header: sequences and acks, flags, destination port, checksum.
 DATA: it must be the same for all connections; we can't serve different
       pages depending on the source IP.
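
One practical consequence of the list above: the IP header checksum covers the
ID field, so anything that rewrites the ID has to recompute the checksum or the
packet will be dropped. What follows is a sketch of the standard
one's-complement sum (RFC 1071), with a hypothetical function name; it assumes
the caller has zeroed the checksum bytes first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over an IPv4 header (RFC 791 / RFC 1071).
 * len is the header length in bytes (IHL * 4), always even.
 * The caller zeroes bytes 10-11 (the checksum field) before computing. */
uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;

	assert(hdr != NULL && len % 2 == 0);
	for (size_t i = 0; i < len; i += 2)
		sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
	while (sum >> 16)		/* fold the carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Running the same function over a header whose checksum field is already
filled in correctly returns 0, which is how a receiver validates it.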

At the same time, over the years it has been observed that operating systems
implement their TCP/IP stacks in different ways, so different that by
analyzing certain combinations of fields an operating system can be
fingerprinted remotely, by either active or passive scanners. These
differences are worth studying, though understanding how a port or a sequence
number is generated is unlikely to reveal space to hide data in.

The most useful thing I saw concerns OpenBSD and GRsecurity: they randomize
some normally incremental values for security purposes. A process pid,
for example, is random by default on OpenBSD and selectable through sysctl on
FreeBSD, while it is incremental on Linux.
Some networking elements are caught up in this randomization craze too: the
initial sequence number (already randomized to prevent IP spoofing, after the
statistical analyses presented in
http://razor.bindview.com/publish/papers/tcpseq.html ; the generation
algorithms have since been revised, as you can see in
http://lcamtuf.coredump.cx/newtcp/), the source port used to start a
connection (or hidden by NAT), and the IP ID (it identifies the packet and is
used when a packet is fragmented: all fragments carry the same identifier).
The last two values are usually incremental, and both are 16-bit fields. Once
the source port for a session has been chosen it won't change, while the IP ID
always will. In my opinion, this is the spot to use in order to apply
steganography to a session while passing it off as a trivial standard session,
so that it cannot be noticed at all.

Since in theory steganography cannot be detected and the carrier of a message
cannot be told apart from other similar carriers, we often have to design a
stegosystem relying on only the few elements that let us transmit securely.
2 bytes per packet are really few, so compression is necessary
(though encryption is better, and it's free). By compressing or encrypting
data, we reduce the chance of generating sequences of repeating identifiers,
and prevent an analyst from detecting a nonrandom sequence in what is supposed
to be a random stream.
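
The arithmetic is worth spelling out. What follows is a sketch, with a
hypothetical helper, of how many packets a message costs at a given number of
hidden bytes per packet:

```c
#include <assert.h>
#include <stddef.h>

/* Packets needed to carry nbytes when each packet hides bytes_per_packet */
size_t packets_needed(size_t nbytes, size_t bytes_per_packet)
{
	assert(bytes_per_packet > 0);
	return (nbytes + bytes_per_packet - 1) / bytes_per_packet;
}
```

At 2 bytes per packet, a 1299-byte file (the size of the gzip archive used in
the example further on) takes 650 packets; at one byte per packet, as in the
SYN trace shown earlier, 1500 bytes take 1500 packets.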

Usually the ID is incremental, i.e. each packet generated has its own ID and
the following packet has ID+1. So identifiers are spread across a number of
different sessions, and each individual session will not have sequential IDs,
but seemingly random ones (albeit always increasing).

OpenBSD and GRsecurity introduced random IDs (OpenBSD by default, while GRsec
and FreeBSD need it to be enabled) thus, the appearance of random IDs could be
explained by the use of these operating systems and patches, and not be a
telltale of steganographic activity going on.
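
From an observer's point of view, the two behaviours can be told apart by
looking at the steps between consecutive IDs coming from one host. The sketch
below is only an illustration: the function names and the max_step threshold
are hypothetical, and a real classifier would need more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step from one ID to the next, modulo 2^16 so a wraparound counts as small */
uint16_t id_step(uint16_t prev, uint16_t cur)
{
	return (uint16_t)(cur - prev);
}

/* Return 1 if every step is between 1 and max_step, 0 otherwise */
int looks_incremental(const uint16_t *ids, size_t n, uint16_t max_step)
{
	assert(ids != NULL);
	for (size_t i = 1; i < n; i++) {
		uint16_t step = id_step(ids[i - 1], ids[i]);

		if (step == 0 || step > max_step)
			return 0;
	}
	return 1;
}
```

The client IDs in the lynx session above (17888 to 17896) pass the test with
any small threshold; a randomized or steganographic series almost certainly
will not.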

We developed a test framework, named "innova"
(http://www.s0ftpj.org/projects/innova/), which allows us to manipulate or
analyze packets of our own sessions in a way which is transparent to kernel and
userspace applications.

As is well known, when the application asks for a socket, the kernel is what
actually creates it. Afterwards, when the application asks to connect that
socket, the kernel performs the three-way handshake on behalf of the
application.

From this point on, the application sends data through the socket and leaves
packet construction, transmission and acknowledgments to the kernel.

Innova intercepts outgoing packets after the kernel has finished going through
them, and can thus manipulate and study what happens to TCP/IP stack in a
transparent and independent way.

It works in userspace, and not in kernel mode (although a minimal kernel-mode
version will soon be implemented), and analysis/manipulation options are
managed by plugins to make the framework more flexible.

Since a likely application is transparent steganography, we implemented a demo
plugin that does just that.

<-| stego/ip_steganography.c |->
/*
 * innova plugin, coded by Claudio Agosti vecna@s0ftpj.org
 *
 * Mon Oct 13 22:08:31 2003, finished Thu Jan 29 18:35:13 2004
 * (I like lost my time)
 *
 * ip_steganography work to implement transparent steganography 
 * on ip packets, using ip id field.
 *
 * some operating system and other with special patch implement
 * random ip id generation. this can be a nice way to 
 * hide data inside connection, using innova plugin this can
 * be applied transparently with your common client sessions.
 *
 * http://www.gnupg.org
 * http://www.gzip.org
 * http://mcrypt.sf.net
 *
 * these software should be used to crypt and compress file before
 * sending with this plugin. the options passed to this plugin must
 * be "/dump_directory /input_directory"
 * the dump_directory could be filled with ip.ip.ip.ip-port dump
 * of incoming session, on the input_directory it looks for the
 * file with ip.ip.ip.ip try to connect, if founded, is used as
 * input to take the couple of byte to put on ip->id field.
 *
 * (http://www.s0ftpj.org :) for better information you should
 * search information on http://claudio.itapac.net
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/param.h>	/* MAXPATHLEN */
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>

#include "innova.h"

static char *plug_desc="ip steganography";

static char *dump_path, *source_path;

#define MAXSEQ		4
struct stgcouple_track
{
	unsigned int addr, counter;
	unsigned short port;
	FILE *dump;
	char *fname;
	unsigned int last_seq[MAXSEQ];
};
/* maximum number of connections tracked in each direction */
#define MAXTRAKSEX	20

static struct stgcouple_track incoming_track[MAXTRAKSEX];
static struct stgcouple_track outgoing_track[MAXTRAKSEX];

#define INCOMING	1
#define OUTGOING	2

char *get_io_desc(int *plugin_version)
{
	*plugin_version =PLUGIN_FORMAT;

	return plug_desc;
}

int mangle_init(struct innova_struct *is)
{
	/*
	 * return < 0 is error, and innova is break,
	 * return 0 is for repeat mangle_init,
	 * return > 0 is for init success, innova continue happy
	 */
	return 1;
}

int mangle_cleanup(struct innova_struct *is, int error)
{
	int i;

	printf("forced closing...\n");

	for(i =0; i < MAXTRAKSEX; i++)
	{
		if(incoming_track[i].dump !=NULL)
		{
			printf("incoming file %s, %d packet\n", 
				incoming_track[i].fname, 
				incoming_track[i].counter
			);
			fclose(incoming_track[i].dump);
			incoming_track[i].dump =NULL;
		}
	}

	for(i =0; i < MAXTRAKSEX; i++)
	{
		if(outgoing_track[i].dump && !feof(outgoing_track[i].dump))
		{
			printf("outgoing opened session %s, "
			       "ending without be finish (%d byte sent)\n",
				outgoing_track[i].fname, 
				outgoing_track[i].counter
			);
			fclose(outgoing_track[i].dump);
			outgoing_track[i].dump =NULL;
		}
	}

	return 1;
}

static inline struct stgcouple_track *
get_session(int who, struct innova_packet *pkt)
{
	unsigned int i;
	struct stgcouple_track *list;

	if(who ==INCOMING)
		list =incoming_track;
	else
		list =outgoing_track;

	for(i =0; i < MAXTRAKSEX; i++)
	{
		if(who ==INCOMING)
			if(pkt->ip->saddr ==list[i].addr && 
			   pkt->tcp->source ==list[i].port)
				return &list[i];

		if(who ==OUTGOING)
			if(pkt->ip->daddr ==list[i].addr && 
			   pkt->tcp->dest ==list[i].port)
				return &list[i];
	}
	return NULL;
}

static inline struct stgcouple_track *
get_next_free(int who)
{
	unsigned int i;
	struct stgcouple_track *list;

	if(who ==INCOMING)
		list =incoming_track;
	else
		list =outgoing_track;

	for(i =0; i < MAXTRAKSEX; i++)
		if(list[i].dump ==NULL)
			return &list[i];

	return NULL;
}

#define	DUPSEQ	0
#define NEWSEQ	1

/*
 * seq and ack_seq are XORed into a single key: a retransmitted packet
 * repeats both values, so it produces a key we have already seen
 */
static inline int check_seq(unsigned int *seqlist, unsigned int last_xor)
{
	unsigned int i;

	for(i =0; i < MAXSEQ; i++)
	{
		if(seqlist[i] ==last_xor)
			return DUPSEQ;
	}

	for(i =(MAXSEQ -1); i > 0; i--)
		seqlist[i] =seqlist[i -1];

	seqlist[0] =last_xor;

	return NEWSEQ;
}

int local_mangle(struct innova_struct *is, struct innova_packet *pkt)
{
	struct stgcouple_track *tracking =get_session(OUTGOING, pkt);

	if(tracking ==NULL && pkt->tcp->syn)
	{
		char fname[MAXPATHLEN], i;

		if((tracking =get_next_free(OUTGOING)) ==NULL)
		{
			printf("outgoing session overcoming limit of %d\n", 
				MAXTRAKSEX);
			return PACKET_OK;
		}

		/* bounded write: source_path comes from the command line */
		snprintf(fname, sizeof(fname), "%s/%s", source_path, 
			inet_ntoa(*(struct in_addr *)&pkt->ip->daddr));

		if((tracking->dump =fopen(fname, "r")) ==NULL)
			return PACKET_OK;

		printf("opened [%s] file for be sent\n", fname);

		tracking->fname =strdup(fname);

		tracking->addr =pkt->ip->daddr;
		tracking->port =pkt->tcp->dest;

		for(i =0; i < MAXSEQ; i++)
			tracking->last_seq[(int)i] =0x00;
		/* a reused slot must not inherit the previous packet count */
		tracking->counter =0;
	}

	if(tracking ==NULL)
		return PACKET_OK;

	if(pkt->tcp->fin || pkt->tcp->rst)
	{
		printf("closing session [%s] packets sent %d\n", 
			tracking->fname, tracking->counter);

		if(!feof(tracking->dump))
		{
			printf("ERROR! file %s not totally sent, "
			       "%d byte only\n", 
				tracking->fname, tracking->counter *2
			);
		}

		fclose(tracking->dump);
		tracking->dump =NULL;
		tracking->addr = tracking->port =0;

		return PACKET_OK;
	}

	/* to avoid duplicated packet */
	if((check_seq(tracking->last_seq, 
			pkt->tcp->ack_seq ^ pkt->tcp->seq)) ==DUPSEQ)
		return PACKET_OK;

	if(feof(tracking->dump))
	{
		printf("session %s sent after %d byte\n", 
			tracking->fname,
			tracking->counter
		);
		fclose(tracking->dump);
		tracking->dump =NULL;
		tracking->addr = tracking->port =0;
	}
	else
	{
		/* count the packet only if fread() really put data in the ID */
		if(fread(&pkt->ip->id, 1, sizeof(unsigned short), 
				tracking->dump) > 0)
			tracking->counter++;
	}

	return PACKET_OK;
}

int remote_mangle(struct innova_struct *is, struct innova_packet *pkt)
{
	struct stgcouple_track *tracking =get_session(INCOMING, pkt);

	if(tracking==NULL && pkt->tcp->syn)
	{
		char fname[MAXPATHLEN], i;

		if((tracking =get_next_free(INCOMING)) ==NULL)
		{
			printf("ingoing session overcoming limit of %d\n", 
				MAXTRAKSEX);
			return PACKET_OK;
		}

		/* 
		 * snprintf() bounds the write, so a long dump_path can no
		 * longer overflow fname. ntohs() converts the port from
		 * network to host order.
		 */
		snprintf(fname, sizeof(fname), "%s/%s-%d", dump_path, 
			inet_ntoa(*(struct in_addr *)&pkt->ip->saddr),
			ntohs(pkt->tcp->source)
		);
		
		if((tracking->dump =fopen(fname, "w+")) ==NULL)
		{
			printf("unable to open file %s!!\n", fname);

			return PACKET_OK;
		}
		tracking->fname =strdup(fname);

		tracking->addr =pkt->ip->saddr;
		tracking->port =pkt->tcp->source;

		for(i =0; i < MAXSEQ; i++)
			tracking->last_seq[(int)i] =0x00;
		tracking->counter =0;
	}

	/* untracked session */
	if(tracking ==NULL)
		return PACKET_OK;

	/* to avoid duplicated packet */
	if((check_seq(tracking->last_seq, 
			pkt->tcp->ack_seq ^ pkt->tcp->seq)) ==DUPSEQ)
		return PACKET_OK;
	/* 
	 * sorry for stressing your scheduler calling a lot of system call,
	 * but this is only a demonstration... static connection table,
	 * check only the last sequence number, and other non-performantic
	 * things :)
	 */
	if(pkt->tcp->fin || pkt->tcp->rst)
	{
		printf("closing session [%s] packets received %d\n", 
			tracking->fname, tracking->counter);

		fclose(tracking->dump);
		tracking->dump =NULL;
		tracking->addr = tracking->port =0;
	}
	else
	{
		fwrite(&pkt->ip->id, 1, sizeof(short), tracking->dump);
		tracking->counter++;
	}


	return PACKET_OK;
}

int io_timeexceed(struct innova_struct *is, int *timeout)
{
	return 0x00;
}

void stegano_print_help(void)
{
	fprintf(stderr,
		"steganography plugin simple required two path, one for "
		"incoming sessions dump\nthe other for search data to put"
		" inside the packets\nfirst: dump, eg /tmp\nsecond: input,"
		" eg /encrypted/ (with.any.ip.file) like "
		"/encrypted/192.168.0.1\n"
	);
}

static int check_path(char *path)
{
	struct stat st;

	if(path ==NULL)
		return 0;

	if((stat(path, &st)) ==-1)
		return 0;

	return S_ISDIR(st.st_mode);
}

int option_parser(struct innova_struct *is, struct innova_options *iopt)
{
	if(iopt->plug_opt ==NULL || !strcmp(iopt->plug_opt, "help"))
	{
		stegano_print_help();
		return 0xff;
	}

	dump_path =iopt->argv[0];
	source_path =iopt->argv[1];

	if(!check_path(dump_path) || !check_path(source_path))
	{
		stegano_print_help();
		innova_p(FATAL, "invalid path passed as option");
	}

	return 0x00;
}
<-X->

How does it work?
For the generic documentation of innova, we refer you to the complete source
code and documentation which can be downloaded as a package from
http://www.s0ftpj.org/projects/innova/

The framework is in its early release phase, so expect problems and faults.
Bug reports and the like will be gratefully accepted.

Let's look at a simple possible use of the steganography plugin in a case
study.

Server:
  Let us suppose that we want to use a web server to communicate data covertly
  to a client, and let us suppose, just for simplicity, that it already knows
  the client IP (the plugin is just an example and needs to know the IP in
  advance, though it can be extended quite simply; as you can see from the
  plugin, the functions the framework looks for are the most intuitive ones I
  could find for a traffic manipulation system). Our file will be:

gw@/tmp/innova-0.0.1# man strtoul > secret

gw@/tmp/innova-0.0.1# md5sum secret
ad7b9c8997544c7f4188869457c42118  secret
gw@/tmp/innova-0.0.1# gzip -9 secret
gw@/tmp/innova-0.0.1# ls -l secret.gz
-rw-r--r--    1 root     root         1299 Feb  4 00:20 secret.gz
gw@/tmp/innova-0.0.1# mv secret.gz /tmp/192.168.1.69
gw@/tmp/innova-0.0.1# ./innova -p tcp -l 80 -o "/tmp /tmp" \
-m plugins/ip_steganography -i eth1 192.168.1.69
innova start init of plugin /tmp/innova-0.0.1/plugins/ip_steganography.so: 
ip steganography

Client:

  The client wants to download a file from a web server: to do that, it runs
  wget with the URL and downloads it. Actually, it isn't interested in the
  file itself; it wants to establish a session with the remote server and send
  it a steganographic message. The message is:

schlafen@/tmp$ cat secret 
biscotti con frolla al parmiggiano:

500 grammi di farina, 350 di burro, un tuorlo e un uovo intero, 190 grammi
di zucchero (frolla normale) o 190 di PARMIGGIANO (frolla al parmiggiano)
cuocere a 180 gradi per 20 minuti circa.

sembra strano, sembra fusion, ma sono buoni. antani sia con te
e benedica la tua via steganografica, fratello cuoco.

amen!

schlafen@/tmp$ md5sum secret 
9eaf187b268f8ab67888178ab381534d  secret
schlafen@/tmp$ gzip -9 secret -c > 192.168.1.1 

schlafen@/home/vecna/wrk/innova-0.0.1# ./innova -p tcp -r 80 \
-m plugins/ip_steganography -o "/tmp /tmp/" -i eth0 192.168.1.69
innova start init of plugin /root/wrk/innova-0.0.1/plugins/ip_steganography.so:
ip steganography

  Once innova is running, the client starts too and performs its network
  session, which is the cover and looks like a normal download session:

schlafen@~$ wget http://192.168.1.1/film/southpark-matrix.mpg
--01:26:57--  http://192.168.1.1/film/southpark-matrix.mpg
           => `southpark-matrix.mpg'
Connecting to 192.168.1.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 25,050,957 [video/mpeg]
	    
100%[====================================>] 25,050,957     3.88M/s    ETA 00:00
	     
01:27:02 (4.33 MB/s) - `southpark-matrix.mpg' saved [25050957/25050957]

innova intercepts the outgoing connection and checks whether /tmp holds a file
named after the IP being contacted; since the secret.gz file has been renamed
192.168.1.1 , innova opens it and uses it as source. All packets belonging to
a session towards 192.168.1.1 will carry pieces of the associated file in
ip->id .

The application performs connect, the kernel performs the three-way handshake.
Then there are also the packets the client receives. While innova is running,
each packet matching innova's rules (in this case, -r 80 denotes the remote
port and 192.168.1.1 the host) is handled by the plugin. For each session, the
plugin opens in the dump directory a file named /directory/ip-port, in which
the IDs of the received packets are written.
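
Stripped of the session tracking, what the two sides do boils down to the
following sketch. It works on in-memory arrays rather than real packets, the
function names are hypothetical, and it mirrors the plugin in one respect
only: the sender copies raw bytes into the ID, the receiver dumps both ID
bytes of every packet it sees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sender side: copy up to two message bytes into each packet's raw ID.
 * Packets beyond the end of the message keep their original ID.
 * Returns the number of packets that carried message bytes. */
size_t embed_ids(const uint8_t *msg, size_t len, uint8_t (*ids)[2], size_t npkts)
{
	size_t used = 0, off = 0;

	assert(msg != NULL && ids != NULL);
	while (used < npkts && off < len) {
		size_t n = (len - off >= 2) ? 2 : 1;

		memcpy(ids[used], msg + off, n);
		off += n;
		used++;
	}
	return used;
}

/* Receiver side: append both ID bytes of every packet to the dump.
 * out must have room for 2 * npkts bytes. Returns the bytes written. */
size_t extract_ids(uint8_t (*ids)[2], size_t npkts, uint8_t *out)
{
	assert(ids != NULL && out != NULL);
	for (size_t i = 0; i < npkts; i++)
		memcpy(out + 2 * i, ids[i], 2);
	return 2 * npkts;
}
```

Since the receiver keeps dumping IDs after the message has ended, the output
carries extra bytes after the payload, which is presumably why gzip reports
trailing garbage in the transcripts.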

session /tmp/192.168.1.1 sent after 130 byte
closing session [/tmp/192.168.1.1-80] packets received 17302
forced quit for signal: 2
forced closing...
Terminated

schlafen@/tmp# file 192.168.1.1-80
192.168.1.1-80: gzip compressed data, was "secret", from Unix, max compression
schlafen@/tmp# mv 192.168.1.1-80 /tmp/secret.gz
schlafen@/tmp# gzip -d secret.gz
 
gzip: secret.gz: decompression OK, trailing garbage ignored
schlafen@/tmp# file secret
secret: ASCII English text, with overstriking
schlafen@/tmp# md5sum secret
ad7b9c8997544c7f4188869457c42118  secret

Server:

  The server has a running http service; when it receives a connection it
  dumps the ip->id series, knowing that one of them is likely to contain a
  steganographic session (at first they're all dumped, later on each dump is
  analyzed, since in real conditions they should be encrypted, not gzipped).
  What is more, when it sees an outgoing session from the monitored service,
  it checks whether there's a file named after the IP of the client that
  contacted it. If there is, it's used as source file:

opened [/tmp//192.168.1.69] file for be sent
session /tmp//192.168.1.69 sent after 650 byte
closing session [/tmp//192.168.1.69-36227] packets received 5254
Terminated
gw@/tmp/innova-0.0.1# file /tmp/192.168.1.69-36227
gw@/tmp/192.168.1.69-36227: gzip compressed data, deflated, original filename, 
last modified: Wed Feb  4 00:29:38 2004, max compression, os: Unix
gw@/tmp/innova-0.0.1# cd ..
gw@/tmp# mv 192.168.1.69-36227 received.gz
gw@/tmp# gzip -d received.gz

gzip: received.gz: decompression OK, trailing garbage ignored
gw@/tmp# md5sum received
9eaf187b268f8ab67888178ab381534d  received

Thus, an mpeg download has turned into a bidirectional channel for exchanging
hidden data, while downloading the same file everybody downloads, unmodified.


2) STEGANALYSIS APPLIED TO REAL TRAFFIC

The analysis is then narrowed down to understanding whether "a session driven
by a random ID series is so uncommon as to raise suspicion, or whether it's
covered by average traffic". So, asking here and there, I got enough traffic
to analyze it decently: for such an analysis, it is enough to study the IP ID
increase for each host.

I used several throwaway scripts and code fragments, which I'm not going to
list. However, from tcpdump lines such as:

74.19.61.75 > 131.192.48.6: [|tcp] (DF) (ttl 46, id 21959, len 92)
74.19.61.79 > 131.192.48.6: [|tcp] (DF) (ttl 45, id 45841, len 65)

I went to:

74.19.61.75 131.192.48.6 21959
74.19.61.79 131.192.48.6 45841

to end up with an array of IDs for each source IP, keeping track of every ID
increase greater than 1 (the standard increase). When acquisition is complete,
I divide the sum of the increases by the number of packets, which gives the
average increase per packet.
The average increase (between 1 and 40000) is then rounded and scaled onto a
100 items array, to see its distribution and to understand whether the
average increase is mostly high or mostly low. Surely this is not the best way
to describe it. Still, if a session shows an extremely high increase it will
land above the last value this analysis finds for "normal" sessions; and the
wider the range covered by normal sessions, the harder it is to recognize an
ip id session used as a steganographic container.
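
A minimal sketch of this bookkeeping in C. It is not the original script: the
function names, the average taken over the gaps between packets and the
linear scaling of 1-40000 onto 100 slots are assumptions made for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_AVG    40000u   /* upper bound of the average, as in the text */
#define HIST_SLOTS 100u     /* size of the distribution array */

/* Average forward increase between consecutive IP IDs of one host.
 * The uint16_t cast makes the subtraction wrap like the real counter. */
unsigned int
avg_id_increase (const uint16_t *ids, size_t n)
{
  unsigned long sum = 0;
  size_t i;

  if (n < 2)
    return 0;

  for (i = 1; i < n; i++)
    sum += (uint16_t) (ids[i] - ids[i - 1]);

  return (unsigned int) (sum / (n - 1));
}

/* Scale an average onto the 100-slot array (assumed to be linear). */
unsigned int
hist_slot (unsigned int avg)
{
  unsigned int slot = avg / (MAX_AVG / HIST_SLOTS);

  return slot >= HIST_SLOTS ? HIST_SLOTS - 1 : slot;
}
```

A host with a plain incremental counter ends in slot 0, whatever the point
where its 16 bit counter wraps.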

This is the dump of the array, with the ratios scaled to 100 items:

0) 955  10) 38  20) 24  30) 11  40) 6   50) 1   60) 0   70) 0   80) 0   90) 0
1) 114  11) 47  21) 22  31) 18  41) 1   51) 2   61) 0   71) 0   81) 0   91) 0
2) 99   12) 18  22) 28  32) 6   42) 9   52) 3   62) 0   72) 1   82) 0   92) 0
3) 60   13) 39  23) 28  33) 13  43) 2   53) 1   63) 0   73) 0   83) 0   93) 0
4) 62   14) 33  24) 18  34) 4   44) 5   54) 2   64) 0   74) 0   84) 0   94) 0
5) 71   15) 23  25) 12  35) 9   45) 7   55) 2   65) 0   75) 0   85) 0   95) 0
6) 53   16) 21  26) 27  36) 6   46) 2   56) 1   66) 0   76) 0   86) 0   96) 0
7) 47   17) 29  27) 12  37) 4   47) 5   57) 0   67) 0   77) 0   87) 0   97) 0
8) 42   18) 25  28) 15  38) 6   48) 6   58) 0   68) 0   78) 0   88) 0   98) 0
9) 31   19) 22  29) 16  39) 5   49) 4   59) 0   69) 0   79) 0   89) 0   99) 1

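I analyzed the exchange session the same way (for the comparison I considered
only the data going from 192.168.1.1 to 192.168.1.69, not the opposite
direction, in order to study the steganographic session against a normal one)
and I found an average increase of about 19000 units. If the scale is linear
that is around slot 47, a region the dump above shows to be thinly populated
by normal hosts, but populated.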

This analysis seems sound. Still, thinking again about steganography running
on the IP ID, and knowing that the transmission is highly random only when
the session begins and then falls off, I tried dividing the sessions into 4
groups, depending on the number of packets they contain, and plotting a graph
with the packet number on X and the ip id value on Y. This way we can see
which sessions have an increasing trend and which ones are random: the random
ones are scattered all over the cartesian plane, while the incremental ones
climb until they reach a maximum, when the unsigned short holding the id
overflows and the count starts again.
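
What the graphs show to the eye can also be sketched as a number: the share
of steps in an ID series that look like small counter increments. This is only
an illustration under stated assumptions: the SMALL_STEP threshold and the
function name are hypothetical, they are not taken from the scripts used for
the analysis.

```c
#include <stddef.h>
#include <stdint.h>

#define SMALL_STEP 256u   /* hypothetical: largest step still "incremental" */

/* Percentage (0..100) of steps between consecutive IDs that are small and
 * positive modulo 2^16, i.e. compatible with a shared incremental counter. */
unsigned int
incremental_percent (const uint16_t *ids, size_t n)
{
  size_t i, small = 0;

  if (n < 2)
    return 0;

  for (i = 1; i < n; i++)
    {
      uint16_t step = (uint16_t) (ids[i] - ids[i - 1]);

      if (step != 0 && step < SMALL_STEP)
        small++;
    }

  return (unsigned int) (small * 100 / (n - 1));
}
```

An incremental host scores close to 100 even across the overflow, a random
series scores close to 0.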

Finally, some .png graphs are attached to this article (./graph/): there are
the traffic graphs[1-7].png, generated by gnuplot, showing some examples of
normal traffic, and steganotraffic.png, showing the traffic generated by an
innova session. The difference is easy to see, but I also noticed that some
machines included in the analysis (are they OpenBSD or linux + GRsec?) have a
random ip id. So, if you want to use this technique, I still suggest using a
system that already generates random ids, so that a statistical analysis
can't spot any difference.


3) LET'S START MENTAL ILLNESS

Thinking back to my last talk at e-privacy 2003, which dealt with
steganography, I remembered that during the Q&A session naif asked me a
question. I don't remember how I answered at the time, yet now I think I have
the final (I hope) answer.
Question: "Is it possible to create steganographic techniques that resist
steganalysis even when their specifications are public? For example, if
steganography is implemented within a mail client, and the client is
widespread, the feature is built-in and the specifications are public, is it
really possible to steganograph e-mail exchanges or whatever while avoiding
steganalysis?"
(http://www.s0ftpj.org/docs/ep2003_steganografia_vecna.ogg)

When does steganalysis work? It works when it can tell that a datum, whatever
it is, has been modified in order to carry information. With a bit of
imagination it is easy to find ways to hide data: almost every container can
do it, with better or worse performance.
Steganography applied to images is the easiest to picture: if we replace with
arbitrary data those details the human eye can't notice, the image doesn't
appear to have changed, yet it hides data which can be retrieved by anyone
who knows how to decode them.
Images are well suited: in a photo of a green meadow nobody is likely to
notice that the least significant bytes of the grass aren't the real ones.
Similarly, steganographing an mp3 by a heavy metal band rather than one by a
solo pianist will probably hide our changes, no matter how invasive they are.

It comes natural to think about other multimedia containers: they are
suitable because they usually offer a definition human senses can't fully
appreciate. The parts considered "too accurate" or "unnecessary", which may
hold any kind of value since they are acquired from external devices prone to
errors and noise, can be replaced with arbitrary data.
Anyway, multimedia objects aren't the only ones that suit us: ALL KINDS OF
DATA, one way or another, can be steganographic containers. What
differentiates them is mostly capacity (if an image made of X bytes holds Y
data, while another X bytes file holds Y/10... we'll choose the image), since
we are used to thinking about steganography as applied to the least
significant bits.
Each datum we can generate can be used as a steganographic container:
-EACH FILE-, -EACH DATUM- can be used, even though the final contents stay the
same. We just have to look at it another way, in terms of information
encoding. The only limit is imagination; here are some examples to point it
in the right direction :)

Steganography can be applied to any transmission system. Let's imagine we
have an HTML file: we take each character and we either make it bold or we
don't. We end up with a text file that looks like it was edited by someone
wasting his time with tags. For those who know how to interpret the file,
however, a bold char means 1 and a normal one means 0: reading the sequence 8
characters at a time (bold/non-bold), one hidden byte at a time is extracted
from the file/message.
This stegosystem is rather odd, easy to find out and of little practical use,
yet it is an example that says "if we can generate files and we can move
within the options we have as common users then, depending on whether these
options are used or not, we can hide data without modifying the final
contents".
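
A toy version of the bold/non-bold scheme, just to fix the idea. The names
bold_encode and bold_decode are made up for this sketch, bits are taken most
significant first, and the cover text is assumed to be plain text with no
tags of its own and at least 8 characters for each hidden byte.

```c
#include <stddef.h>
#include <string.h>

/* Wrap in <b></b> every cover char that must carry a 1 bit.
 * Returns the length of the HTML written, or 0 if cover or out is too small. */
size_t
bold_encode (const char *cover, const unsigned char *msg, size_t msglen,
             char *out, size_t outsz)
{
  size_t nbits = msglen * 8, i, o = 0;

  if (strlen (cover) < nbits)
    return 0;

  for (i = 0; cover[i] != '\0'; i++)
    {
      int bit = i < nbits && ((msg[i / 8] >> (7 - i % 8)) & 1);

      if (o + 8 >= outsz)   /* worst case: "<b>c</b>" plus the final NUL */
        return 0;

      if (bit)
        {
          memcpy (out + o, "<b>", 3);
          out[o + 3] = cover[i];
          memcpy (out + o + 4, "</b>", 4);
          o += 8;
        }
      else
        out[o++] = cover[i];
    }

  out[o] = '\0';
  return o;
}

/* Read msglen bytes back: a bold char is 1, a normal char is 0.
 * Returns the number of complete bytes recovered. */
size_t
bold_decode (const char *html, unsigned char *msg, size_t msglen)
{
  size_t nbits = msglen * 8, bitno = 0;
  const char *p = html;

  memset (msg, 0, msglen);

  while (*p != '\0' && bitno < nbits)
    {
      if (strncmp (p, "<b>", 3) == 0 && p[3] != '\0'
          && strncmp (p + 4, "</b>", 4) == 0)
        {
          msg[bitno / 8] |= (unsigned char) (1u << (7 - bitno % 8));
          p += 8;
        }
      else
        p++;

      bitno++;
    }

  return bitno / 8;
}
```

Hiding the single byte 'A' in "steganography!!!" puts the second and the
eighth character in bold; the rendered text reads exactly the same.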

HTML also suits the next example, which can actually be applied to every word
processing format...

Let's imagine creating a document containing some tabs. The tab key just
moves the cursor to the next position divisible by 8. If we are at the top of
the document, position 0, and we hit tab... we land on 8.
If we are at position 1... we land on 8 as well. :)
Now, at position 0 we set "times new roman" as font and we hit the spacebar;
at position 1 we set "courier" and we hit space again; then we tab and we set
the font we are writing the document with. We are at position 8 with the font
we would have had anyway. However, there are 2 more bytes: they can't be
seen, but they are there. Why not use them to hide data?

Each tab gives us up to 7 bytes to play with, which we can fill with spaces.
If our word processor has 80 different fonts, we have 80^7 possible
combinations (just as a byte has 2^8, that is 8 positions with 2 states
each): 2.097152 * 10^13, a little more than what 44 bits can express. 5
complete bytes encoded just by changing fonts is quite good :)
If we use an automatic system to insert/extract information into/from such
documents, no matter whether html, pdf or ps, apparently we didn't change
anything, yet we achieved our purpose.
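
The capacity of one tab can be checked with integer arithmetic only. A small
sketch follows; whole_bits is a name invented here, and it simply counts how
many complete bits fit in fonts^slots combinations.

```c
#include <stdint.h>

/* Whole bits that can be encoded by choosing one of `fonts` values for each
 * of `slots` positions: floor(log2(fonts^slots)).
 * Returns 0 if the number of combinations overflows 64 bits. */
unsigned int
whole_bits (unsigned int fonts, unsigned int slots)
{
  uint64_t combos = 1;
  unsigned int bits = 0, i;

  for (i = 0; i < slots; i++)
    {
      if (fonts != 0 && combos > UINT64_MAX / fonts)
        return 0;
      combos *= fonts;
    }

  while (combos > 1)
    {
      combos >>= 1;
      bits++;
    }

  return bits;
}
```

With 80 fonts and 7 spaces, whole_bits (80, 7) gives 44, that is 5 complete
bytes plus 4 spare bits for each tab.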

Very nice, BUT:

If every stegosystem is studied and attacked, and a system aiming to detect
it is built, then almost all stegosystems turn out to be vulnerable. Why?

If we statistically analyze every page, studying how often <B> tags are used
and the distance between one and the next, and we take a large number of pages
from Google to build a "model" out of them, a value or a series of values
describing average use... then we can use this reference to analyze any page
and find out if something is wrong: some results will be false positives, but
others will not.

The same goes for the document with odd fonts. Under statistical analysis we
would fail: our document would be the only one with that peculiarity,
invisible to the eye but relevant to an automatic analysis because it's
really uncommon.

Of course, the security of a stegosystem must be tested by attacking it, and
to understand how attackable a system is it must be studied and understood:
that's why we deal with simple things. :)
jpeg DCT steganography is more complex, though not necessarily the most
secure or the best performing one, and since it is not immediately obvious
how to attack it, attacking it is harder than with other systems.

SNOW (steganographic nature of white-space: http://www.darkside.com.au/snow)
is a program that steganographs by adding spaces (" ":) at the end of lines.
At first sight this technique looks easy to criticize, since there's usually
no reason to deliberately insert spaces at the end of the lines of a
document, so a document showing.......................


The nicest thing about human randomness is that it can't be predicted.

~/txt/zine$ find . -name 'BF*' -exec file {} \; | grep ASCII | \
	    awk {'print $1'} | sed -es/:// > /tmp/bfitxtfiles

~/txt/zine$ for i in `cat /tmp/bfitxtfiles`; \
	    do x=`grep -c " $" $i` && y=`wc -l $i | awk {'print $1'}` \
	    && echo "$(($y / $x))" ; done | sort -g | column 
 1       7       11      15      20      22      25      32      38      158
 2       9       11      16      20      22      27      33      45      167
 2       10      11      16      20      22      27      33      49
 3       10      12      16      21      23      28      33      51
 5       10      12      16      21      23      28      33      59
 5       10      12      16      21      23      28      34      62
 6       10      13      17      21      24      29      34      84
 7       10      14      18      22      25      29      36      117
 7       11      15      18      22      25      30      37      118

Let's do the same for phrack :)

~/txt/zine$ for i in `cat /tmp/phracktxtfiles`; \
 	    do x=`grep -c " $" $i` && y=`wc -l $i | awk {'print $1'}` \
	    && echo "$(($y / $x))" ; done | sort -g | column   
 1       4       6       9       15      22      32      53      85      220
 1       4       7       12      16      22      32      54      90      235
 1       4       7       12      17      24      34      54      90      237
 1       4       7       13      18      26      36      66      95      342
 1       4       7       13      18      26      37      73      126     350
 2       4       8       13      19      26      38      75      127     466
 2       5       8       13      19      28      40      76      128     494
 2       5       8       14      19      28      43      80      136     603
 3       5       8       15      20      30      43      83      141     660
 3       5       8       15      21      30      44      84      179     665
 3       5       9       15      21      31      45      84      214     1261
 3       6       9       15      22      31      48      84      219

(Unix is really powerful)

The numbers listed above are, for each file, the ratio
total_file_line_number / lines_ending_in_space
(files without any such line are skipped by the loop, since grep -c fails on
them). I expected to find much higher numbers, while it turns out that space
characters at the end of the lines of a text file are not unusual at all.
Given this, SNOW can be considered a fairly strong stegosystem, as what it
does is done by the user, too.
We have no way to tell whether a space was added during text formatting or
during the steganographic process.
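
The same two counts as the grep/wc pair, sketched in C on a text already
loaded in memory. The struct and the function name are made up for the
example; like wc -l it counts only the lines closed by a newline.

```c
#include <stddef.h>

struct space_stats
{
  unsigned long lines;      /* newline-terminated lines, as wc -l */
  unsigned long trailing;   /* lines whose last char is a space */
};

/* Count the lines of a NUL-terminated text and how many end with ' '. */
struct space_stats
count_trailing_spaces (const char *text)
{
  struct space_stats st = { 0, 0 };
  const char *p;
  char prev = '\0';

  for (p = text; *p != '\0'; p++)
    {
      if (*p == '\n')
        {
          st.lines++;
          if (prev == ' ')
            st.trailing++;
        }
      prev = *p;
    }

  return st;
}
```

The ratio listed above is then st.lines / st.trailing, to be computed only
when st.trailing is not zero.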

~/steganografia/snow$ man bash > contenitore
Reformatting bash(1), please wait...
~/steganografia/snow$ man ./snow.1 > secret
Reformatting snow.1, please wait...
~/steganografia/snow$ wc -l contenitore secret
  4517 contenitore
   113 secret
  4630 total
~/steganografia/snow$ grep -c " $" contenitore secret
contenitore:0
secret:0
~/steganografia/snow$ ./snow -C -p "antani" -f secret contenitore stegobj
Compressed by 35.47%
Message used approximately 85.15% of available space.
~/steganografia/snow$ grep -c " $" stegobj
1544
~/steganografia/snow$ ls -l secret contenitore stegobj
-rw-r--r--    1 vecna    vecna      300633 Feb  2 23:53 contenitore
-rw-r--r--    1 vecna    vecna        5330 Feb  2 23:54 secret
-rw-r--r--    1 vecna    vecna      340474 Feb  2 23:56 stegobj

Here we get a ratio of 2.9 (4517 lines / 1544 ending with a space): trailing
spaces are frequent here, but there are non-steganographic files with even
lower ratios. If we were to consider this value suspect we'd get too many
false positives: the system is strong enough because it's based on something
the user may or may not have done, so the presence of space characters is not
conclusive evidence of SNOW use. :)

This is basically the same concept image and music steganography is built on.
Using peripherals that capture a higher definition than human senses (sight,
hearing or both) can perceive implies that a small part (the least
significant bits) can be replaced with arbitrary data; the container can then
help hiding it or not (a heavy metal band's .mp3 file will be more suitable
than a classical piano performance, just as a grass field picture is more
suitable than a portrait).

In practical terms this steganographic system would be safe but, as we have
already seen, these techniques are vulnerable to statistical steganalysis
(http://www.outguess.org).

Filling 90% of a container instead of 10% surely affects the chances of a
successful analysis.
How can we find a good steganographic container then?

- if we can hardly find an injection system that resists statistical
  analysis (using too many manipulation options creates unique cases that
  stand out as singularities -- as in the case of text fonts, spaces...)
- if the risk of a container being analyzed and found is proportional to how
  often it is used (jpeg and more, outguess)
- if the action points are many, that is, each point that lets the user
  modify something can be used (html, CAD?)

then a potential solution is to write software working at layer 6 that finds
ALL the formats/protocols (html, jpeg, gif, html sources with snow, color
options, presence and order of particular elements within a page, LSB inside
images) available within the medium we are using to expose information (e.g.
mail or web), then sorts all these potential containers and applies
mass-steganography to them, splitting the content so that each one is used as
little as possible.
This way each element will hold too little information to be pinpointed as a
container on its own, but all the pieces together will give back the hidden
data once decoded and decompressed.
Using different techniques raises the false positive count for each of them,
bringing us closer to a system that is strong (given the limited use of each
container) and difficult to attack (with 4 data injection systems the analyst
has to run 4 analyses, and will probably get many more false positives).
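
How the content could be split is left open above. One possible policy,
sketched here under stated assumptions, is to fill every container by the
same fraction of its capacity, so that none of them stands out. The function
name and the idea of measuring capacity in bytes are hypothetical.

```c
#include <stddef.h>

/* Split `payload` bytes over n containers in proportion to their capacity.
 * share[i] receives the bytes assigned to container i.
 * Returns 0 on success, -1 if the payload doesn't fit at all. */
int
spread_payload (const size_t *capacity, size_t n, size_t payload,
                size_t *share)
{
  size_t total = 0, assigned = 0, i;

  for (i = 0; i < n; i++)
    total += capacity[i];

  if (payload > total)
    return -1;

  /* proportional share, rounded down */
  for (i = 0; i < n; i++)
    {
      share[i] = total ? (size_t) ((unsigned long long) capacity[i]
                                   * payload / total) : 0;
      assigned += share[i];
    }

  /* hand out what rounding left over, one byte at a time, where room remains */
  for (i = 0; assigned < payload; i = (i + 1) % n)
    if (share[i] < capacity[i])
      {
        share[i]++;
        assigned++;
      }

  return 0;
}
```

With capacities of 1000, 100 and 400 bytes, a 150 bytes payload is split as
100 + 10 + 40: every container is used for 10% only.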

There's no doubt that writing media for these formats gets harder...


4) THIS EXAMPLE REALLY ROCKS: SIMULATING COMMON TRAFFIC

While writing this article I was slightly unsatisfied, as I was offering just
a single piece of code on such a wide topic. Then I happened to run tcpdump
and collect some traffic, and I remembered that for months I had been
receiving packets like these:

13:51:37.806506 127.0.0.1.80 > 62.211.136.80.1575: R [tcp sum ok] 0:0(0) 
		ack 1 win 0 (ttl 121, id 46859, len 40)

I had seen them before but, having classified them as "worm traffic", I had
ignored them.

http://www.securityfocus.com/archive/75/335132/2003-08-21/2003-08-27/0
http://cert.uni-stuttgart.de/archive/incidents/2003/10/msg00143.html

Such a thing can't be ignored! This can be a fantastic method of network
steganography, where we can use the fields I denied myself in the previous
example, as these packets are not linked to any session and are explained by
a common phenomenon (they shouldn't raise eyebrows, everybody gets them :)).

I received 4908 packets matching this rule (tcp[13] & 5 selects the segments
with RST or FIN set):
'ip and tcp port 80' and 'tcp[13] & 5 != 0' and 'src 127.0.0.1'
from 13:48 to 10:23 of the following day, on average one packet every 15
seconds or so.

3060./tmp$ grep -c "ack 1" VIRUS.traffic.dump
4139
3061./tmp$ grep -vc "ack 1" VIRUS.traffic.dump
769
3062./tmp$

Many of these packets have the acknowledgment number set not to 1 but to a
different value, they have different destination ports, and the IP ID field
is always free for use.
This worm traffic will do.

We can build a one-way system to send information which 'looks like' worm
traffic. Since our data will be mixed with all the other (real) worm packets,
we have to find a way to tell them apart: it seems we need a pre-shared key
between the two peers (after all, this code is only an example, this
afternoon I'll watch Lord of the Rings and it's time to end this article...),
so that we can mock up a sort of authentication towards the server that will
receive our packets.

After a quick analysis of the captured traffic, the usable fields are:

ack_seq (4 bytes): completely free
id (2 bytes): completely free
destination port (2 bytes): between 1000 and 2000

Once we have agreed on a rough form of client authentication, we'll be
recognized by the destination port (always the same within a session, though
that alone is not enough), while the remaining fields will store our data.

The client works on a raw socket and sends packets using IP_HDRINCL, while
the server reads at the datalink layer (linux, with its reverse path
protections, drops packets that arrive on one interface carrying the address
of another, such as 127.0.0.1, before a normal socket could see them) and
deciphers them this way:

- both server and client have a key, which is expanded as needed into a byte
  sequence used to authenticate packets

- the first packet uses the IP ID to store the length of the file we are
  going to transmit, tcp->ack_seq to store the first 4 bytes of the stream,
  and a destination port chosen at random within the common range.
  For each packet the server checks whether ack_seq matches the beginning of
  one of its streams (the server can be waiting for more than one session at
  a time, so it holds more than one expanded stream); if it does, it saves
  the destination port, records the file length and opens a file whose
  suffix marks the session as incomplete.

- each packet with a recorded destination port, and whose first IP ID byte
  matches the next byte of the stream, is considered valid: the second byte
  of the IP ID and the 4 bytes of ack_seq are appended to the server's dump
  file. When the file reaches the expected length (learned from the first
  packet) the session is closed and saved; until then it keeps its
  'incomplete' suffix.
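
The layout of a data packet can be sketched on its own, apart from sockets
and headers. The struct below is hypothetical: it only stands for the two
header fields as they sit in memory, and pack_data/unpack_data are names
invented for this example, not functions of the sources that follow.

```c
#include <stdint.h>
#include <string.h>

/* the two covert header fields, kept as raw memory (no byte swapping) */
struct covert_fields
{
  uint16_t ip_id;     /* byte 0: key stream byte, byte 1: data */
  uint32_t ack_seq;   /* 4 bytes of data */
};

/* Store 5 data bytes plus the current key stream byte. */
void
pack_data (struct covert_fields *f, uint8_t auth, const uint8_t data[5])
{
  uint8_t id[2];

  id[0] = auth;
  id[1] = data[0];
  memcpy (&f->ip_id, id, 2);
  memcpy (&f->ack_seq, data + 1, 4);
}

/* Return 1 and fill data[] if the packet follows the stream, 0 otherwise. */
int
unpack_data (const struct covert_fields *f, uint8_t expected_auth,
             uint8_t data[5])
{
  uint8_t id[2];

  memcpy (id, &f->ip_id, 2);
  if (id[0] != expected_auth)
    return 0;

  data[0] = id[1];
  memcpy (data + 1, &f->ack_seq, 4);
  return 1;
}
```

Copying bytes with memcpy, without any htons/ntohs, gives the same result on
both sides whatever the endianness of the two hosts.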
  
<-| stego/blaster/Makefile |->
# blaster noise traffic steganography
# http://www.s0ftpj.org

CC=			gcc -O2 -Wall
all:			blaststegd blaststegsender
			@echo "remember to read comment and article!"

blaststegd:		blaststegd.c blaststeg.h
			$(CC) blaststegd.c -o blaststegd

blaststegsender:	blaststegsender.c blaststeg.h
			$(CC) blaststegsender.c -o blaststegsender
clean:			
			rm -f blaststegd blaststegsender
<-X->

<-| stego/blaster/blaststeg.h |->
#define STREAMSIZE	1024
#define MAXKEYLEN	32
#define EXPANDROUND	6
#define MAXRSTSIZE	40
#define ACTVSTR		"(INCOMPLETE)"
#define BASEDELAY	14
#define MAXDELAY	15

/* expand the key on the STREAMSIZE with good distribution */
void
compute_stream (unsigned char *stream, unsigned char *key)
{
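  int j, i, k = 0, klen = strlen ((char *) key);

  memset (stream, 0x00, STREAMSIZE);

  /* an empty key would turn (j % klen) below into a division by zero */
  if (klen == 0)
    return;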

  for (j = 0; j < EXPANDROUND; j++)
    {
      if (j)
	k = (j % klen);

      for (i = 0; i < STREAMSIZE; i++)
	{
	  stream[i] = (stream[i] << 4) ^ (key[k] + i);

	  if (key[++k] == 0x00)
	    k = 0;
	}
    }
}
<-X->

<-| stego/blaster/blaststegsender.c |->
/*
 * Sat Jan 31 10:55:55 CET 2004
 * vecna@s0ftpj.org
 *
 * example of steganography simulating common (worm) traffic,
 *
 * http://www.securityfocus.com/archive/75/335132/2003-08-21/2003-08-27/0
 * http://cert.uni-stuttgart.de/archive/incidents/2003/10/msg00143.html
 *
 * this code is part of a BFi article, go to www.s0ftpj.org and 
 * get a lot of information about it.
 *
 * this is blaststegsender and it has to work with blaststegd
 *
 * this client sends anonymous hidden data over the tcp reset packets
 * commonly generated by the blaster workaround
 *
 * blaststegd can listen for many sessions and makes a file for each one
 * with the session dump
 *
 * the session is told apart from the chaos by a pre-shared key
 *
 * the protocol must respect a few rules:
 * - the first packet of a session must have tcp->ack_seq matching the first
 *   4 bytes of the key stream, and ip->id holding the length of the data
 * - the destination port is the tracking system: when a packet with that
 *   port is seen, the daemon checks whether the key stream is followed and,
 *   if it is, the ack field and one byte of the id are kept as incoming data.
 * each packet carries 5 bytes of data and 3 bytes for session tracking.
 */
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/param.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>

#include "blaststeg.h"

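/* return base plus a random offset in [0, max - min) */
unsigned int
getrand (unsigned int base, unsigned int min, unsigned int max)
{
  static int seeded = 0;
  unsigned int ret, diff;

  diff = (max - min);

  /* seed only once: reseeding with time() at every call gives the same
     value to all the calls made within the same second */
  if (!seeded)
    {
      srandom (time (NULL));
      seeded = 1;
    }

  /* an empty range would be a division by zero */
  ret = diff ? random () % diff : 0;

  return (base + ret);
}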

static inline unsigned int
half_cksum (const unsigned short *data, int len)
{
  unsigned int sum = 0x00;
  unsigned short carry = 0x00;

  while (len > 1)
    {
      sum += *data++;
      len -= 2;
    }

  if (len == 1)
    {
      *((unsigned short *) &carry) = *(unsigned char *) data;
      sum += carry;
    }

  return sum;
}

static inline unsigned short
compute_sum (unsigned int sum)
{
  sum = (sum >> 16) + (sum & 0xffff);
  sum += (sum >> 16);

  return (unsigned short) ~sum;
}

void
send_packet (int fd, struct iphdr *ip, struct tcphdr *tcp)
{
  unsigned int sum;
  struct sockaddr_in sa;

  memset (&sa, 0x00, sizeof (sa));
  sa.sin_family = AF_INET;
  sa.sin_addr.s_addr = ip->daddr;
  sa.sin_port = tcp->dest;

  /* ip check */
  ip->check = 0;
  sum = half_cksum ((unsigned short *) ip, sizeof (struct iphdr));
  ip->check = compute_sum (sum);

  /* tcp check */
  tcp->check = 0;
  sum = half_cksum ((unsigned short *) &ip->saddr, 8);
  sum += htons (IPPROTO_TCP + sizeof(struct tcphdr)); 
  sum += half_cksum ((unsigned short *) tcp, sizeof (struct tcphdr));
  tcp->check = compute_sum (sum);

  if ((sendto
       (fd, (void *) ip, MAXRSTSIZE, 0, (struct sockaddr *) &sa,
	sizeof (sa))) == -1)
    {
      printf ("unable to send sock raw packet!\n");
      exit (1);
    }
}

int
main (int argc, char **argv)
{
  unsigned char stream[STREAMSIZE], packet[MAXRSTSIZE];
  unsigned int counter = 0, delay, fd, hdrincl = 1;
  int filelen;
  struct iphdr *ip = (struct iphdr *) packet;
  struct tcphdr *tcp = (struct tcphdr *) (packet + sizeof (struct iphdr));
  FILE *source;

  if (argc != 4)
    {
      printf ("%s data_file session_key dest_host\n", *argv);
      exit (1);
    }

  printf ("PRIVACY PROTECTION SOFTWARE - example of hiding data on\n"
	  "apparently common worm traffic error - www.s0ftpj.org\n"
	  "check about other information, it could be useful to understand\n"
	  "the limits, the working system and the motivation before run this\n"
	  "steganographic software. coded by vecna@s0ftpj.org\n");

  if ((source = fopen (argv[1], "r")) == NULL)
    {
      printf ("unable to open file %s\n", argv[1]);
      exit (1);
    }

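  fseek (source, 0, SEEK_END);
  filelen = ftell (source);
  rewind (source);

  /* the length travels in the 16 bit ip->id field of the first packet */
  if (filelen < 0 || filelen > 0xffff)
    {
      printf ("file %s too big: at most 65535 bytes\n", argv[1]);
      exit (1);
    }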

  compute_stream (stream, argv[2]);

  if ((fd = socket (PF_INET, SOCK_RAW, IPPROTO_TCP)) == -1)
    {
      printf ("unable to open socket raw (are you root ?)\n");
      exit (1);
    }

  setsockopt (fd, SOL_IP, IP_HDRINCL, &hdrincl, sizeof (int));

  /* default ip hdr */
  memset ((void *) ip, 0x00, sizeof (struct iphdr));
  ip->saddr = inet_addr ("127.0.0.1");
  ip->daddr = inet_addr (argv[3]);
  ip->ihl = 5;
  ip->version = 4;
  ip->protocol = IPPROTO_TCP;
  ip->tot_len = htons (MAXRSTSIZE);
  ip->ttl = (unsigned char) getrand (124, 0, 6);

  /* default tcp hdr */
  memset ((void *) tcp, 0x00, sizeof (struct tcphdr));
  tcp->doff = 5;
  tcp->rst = 1;
  tcp->ack = 1;
  tcp->source = htons (80);
  tcp->dest = htons ((unsigned short) getrand (1024, 0, 920));

  /* initialization ip settings */
  ip->id = filelen;

  /* initialization tcp settings */
  memcpy ((unsigned char *) &tcp->ack_seq, &stream[counter], 4);
  counter += 4;

  send_packet (fd, ip, tcp);

  while (1)
    {
      delay = getrand (BASEDELAY, 0, MAXDELAY);
      sleep (delay);

      ip->ttl = (unsigned char) getrand (124, 0, 6);

      memcpy (&ip->id, &stream[counter], 1);

      fread ((unsigned char *) &ip->id + 1, 1, 1, source);

      if (filelen < 4)
	tcp->ack_seq = getrand (0, 123, 4000000);

      fread ((void *) &tcp->ack_seq, 4, 1, source);

      send_packet (fd, ip, tcp);

      if (++counter == STREAMSIZE)
	counter = 0;

      filelen -= 5;
      /* <= 0: don't send one more empty packet when the length is an
         exact multiple of 5 */
      if (filelen <= 0)
	break;
    }

  printf ("file sent\n");
  return 0;
}
<-X->

<-| stego/blaster/blaststegd.c |->
/*
 * Sat Jan 31 10:55:55 CET 2004
 * vecna@s0ftpj.org
 *
 * example of steganography simulating common (worm) traffic,
 *
 * http://www.securityfocus.com/archive/75/335132/2003-08-21/2003-08-27/0
 * http://cert.uni-stuttgart.de/archive/incidents/2003/10/msg00143.html
 *
 * this code is part of a BFi article, go to www.s0ftpj.org and
 * get a lot of information about this.
 *
 * this is blaststegd and it has to work with blaststegsender,
 *
 * blaststegd can listen for many sessions and makes a file for each one
 * with the session dump
 *
 * the client sends anonymous hidden data over the tcp reset packets
 * commonly generated by the blaster workaround
 *
 * the session is told apart from the chaos by a pre-shared key;
 * the file required by blaststegd is a simple list of pre-shared keys,
 * one per line.
 *
 * compiled under linux, gcc file.c -o output
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/param.h>
#include <sys/socket.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>

#include <linux/if_ether.h>
#include <linux/if_packet.h>

#include "blaststeg.h"

struct session
{
  char *key;
  unsigned char stream[STREAMSIZE];
  unsigned short port;
  unsigned int counter, length, readbyte;
  FILE *dump;
};
struct session *tracked;

int
main (int argc, char **argv)
{
  FILE *key;
  char line[MAXKEYLEN], packet[MAXRSTSIZE], *dump_directory;
  int sockfd, i = 0, list_sess = 0;

  if (argc < 2)
    {
      printf ("%s pre-shared-password-file <optional-dump-dir>\n", *argv);
      exit (1);
    }

  if (argc == 3)
    dump_directory = argv[2];
  else
    dump_directory = strdup ((const char *)get_current_dir_name ());

  printf (			/* BANNER! :) SPAM && INFO */
   "KEEP YOUR PRIVACY! this is a free software for communication hiding\n"
   "blaststeg daemon, anonymous steganographed packets receiver\n"
   "coded on Sat Jan 31 2004 vecna@s0ftpj.org, http://www.s0ftpj.org\n"
   "READ ALL ABOUT THIS SOFTWARE, I'm acting like worm-traffic emulation...\n"
   "if a lot of time is past since the time i coded this it could be obsolete\n"
   "and insecure!\n"
   "this software could make some complete/incomplete dump on %s\n"
   "don't forget about that\n\n", dump_directory);

  if ((key = fopen (argv[1], "r")) == NULL)
    {
      printf ("unable to open file %s\n", argv[1]);
      exit (1);
    }

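  /* first pass: count the keys, one per line (testing feof after fgets
     would count one line too many) */
  while (fgets (line, MAXKEYLEN, key) != NULL)
    list_sess++;

  if ((tracked = (void *) calloc (list_sess, sizeof (*tracked))) == NULL)
    {
      printf ("unable to alloc memory\n");
      exit (1);
    }

  rewind (key);

  /* second pass: store each key with its expanded stream */
  while (i < list_sess && fgets (line, MAXKEYLEN, key) != NULL)
    {
      /* strip '\n', if any */
      line[strcspn (line, "\n")] = 0x00;

      /* skip empty lines: they can't be expanded into a stream */
      if (line[0] == 0x00)
	continue;

      tracked[i].key = strdup (line);
      compute_stream (tracked[i].stream, (unsigned char *) line);
      i++;
    }

  /* only the entries actually filled are searched later */
  list_sess = i;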

  if ((sockfd = socket (PF_PACKET, SOCK_DGRAM, htons(ETH_P_IP))) == -1)
    {
      printf ("unable to open datalink layer socket\n");
      exit (1);
    }

  while (read (sockfd, packet, MAXRSTSIZE) != -1)
    {
      static char fname[MAXPATHLEN], newname[MAXPATHLEN];
      struct iphdr *ip = (struct iphdr *) packet;
      struct tcphdr *tcp = (struct tcphdr *) (packet + sizeof (*ip));

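      /* the packet socket hands us every IP packet, not only tcp */
      if (ip->protocol != IPPROTO_TCP)
	continue;

      if (!tcp->rst || !tcp->ack)
	continue;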

      if (ip->ihl != 5 || tcp->doff != 5)
	continue;

/* search if is a new session */
      for (i = 0; i < list_sess; i++)
	{
	  if (!memcmp (tracked[i].stream, &tcp->ack_seq, 4))
	    {

	      printf ("new [%s] session detected\n", tracked[i].key);

	      sprintf (fname, "%s/%s-%u-%s",
	         dump_directory, tracked[i].key, tcp->dest, ACTVSTR
	      );

	      if (tracked[i].dump != NULL)
		{
		  printf ("truncation of incomplete session %s-%u-%s\n",
			  tracked[i].key, tracked[i].port, ACTVSTR);

		  fclose (tracked[i].dump);
		}

	      if ((tracked[i].dump = fopen (fname, "w+")) == NULL)
		{
		  printf ("unable to open dump %s\n", fname);
		  exit (1);
		}

	      /* port and length are kept as they are on the wire */
	      tracked[i].port = tcp->dest;
	      tracked[i].length = ip->id;
	      tracked[i].counter = 4;
	      /* restart the count if an incomplete session was dropped */
	      tracked[i].readbyte = 0;

	      break;

	    } /* new session check */

	  /* match session continuation */
	  if (tracked[i].port == tcp->dest)
	    {
	      unsigned char check;

	      memcpy (&check, &ip->id, 1);

	      if (check == tracked[i].stream[tracked[i].counter])
		{
		  fwrite ((char *) (&ip->id) + 1, 1, 1, tracked[i].dump);
		  fwrite ((char *) &tcp->ack_seq, 1, 4, tracked[i].dump);

		  if (++(tracked[i].counter) == STREAMSIZE)
		    tracked[i].counter = 0;

		  tracked[i].readbyte += 5;

		  if (tracked[i].readbyte < tracked[i].length)
		    break;
	       /* else: session is finished */

		  sprintf (fname, "%s/%s-%u-%s",
			   dump_directory, tracked[i].key, tracked[i].port,
			   ACTVSTR);
		  sprintf (newname, "%s/%s-%u", dump_directory, tracked[i].key,
			   tracked[i].port);

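		  if (tracked[i].readbyte > tracked[i].length)
		    fseek (tracked[i].dump, tracked[i].length, SEEK_SET);
	       /* else: readbyte == length and no fseek is required.
		  note: fseek moves the position but doesn't cut the file, so
		  the padding bytes stay in the dump (gzip -d reports them
		  as trailing garbage) */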

		  fclose (tracked[i].dump);
		  rename (fname, newname);

		  printf ("session closed and saved on %s (%d byte)\n",
			  newname, tracked[i].length);

		  tracked[i].port = 0;
		  tracked[i].dump = NULL;
		  tracked[i].counter = tracked[i].length =
		    tracked[i].readbyte = 0;
		}
	    }
	}	/* for rolling over tracked[] */
    }	/* while read */

/* never reached unless read fails */
  printf ("error reading at raw sock layer\n");
  exit (1);
}
<-X->

This is how it works on the client side:

schlafen:blaststeg# man ls > secret 
Reformatting ls(1), please wait...
schlafen:blaststeg# md5sum secret 
f306648d7e04892e23ed31526e55161d  secret
schlafen:blaststeg# gzip -9 secret 
schlafen:blaststeg# ./blaststegsender secret.gz "antanisuperlativo" 192.168.1.1
PRIVACY PROTECTION SOFTWARE - example of hiding data on
apparently common worm traffic error - www.s0ftpj.org
check about other information, it could be useful to understand
the limits, the working system and the motivation before run this
steganographic software. coded by vecna@s0ftpj.org
file sent
schlafen:blaststeg#

And this is how it works on the server side:

511./tmp# ./blaststegd keyfile 
KEEP YOUR PRIVACY! this is a free software for communication hiding
blaststeg daemon, anonymous steganographed packets receiver
coded on Sat Jan 31 2004 vecna@s0ftpj.org, http://www.s0ftpj.org
READ ALL ABOUT THIS SOFTWARE, I'm acting like worm-traffic emulation...
if a lot of time is past since the time i coded this it could be obsolete
and insecure!
this software could make some complete/incomplete dump on /tmp
don't forget about that

new [antanisuperlativo] session detected
session closed and saved on /tmp/antanisuperlativo-5127 (3141 byte)

[1]+  Stopped                 ./blaststegd keyfile
512./tmp# file antanisuperlativo-5127 
antanisuperlativo-5127: gzip compressed data, deflated, original filename, 
last modified: Tue Feb  3 02:03:17 2004, max compression, os: Unix
513./tmp# mv antanisuperlativo-5127 tmp.gz
514./tmp# gzip -d tmp.gz 

gzip: tmp.gz: decompression OK, trailing garbage ignored
515./tmp# md5sum tmp 
f306648d7e04892e23ed31526e55161d  tmp
516./tmp# fg
./blaststegd keyfile
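
The "trailing garbage ignored" warning printed by gzip is consistent with the
closing code of the daemon: fseek() followed by fclose() moves the stream
position but does not shorten the file. Assuming the dump grows in 5-byte
steps (suggested by `readbyte += 5`, not confirmed by the part of the source
shown here), up to 4 bytes past the declared length may stay in the file.
A minimal sketch of how the dump could be cut to the declared length with the
POSIX call ftruncate(); close_dump_truncated() is a hypothetical helper, not
part of blaststegd:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not in blaststegd): flush the dump, cut it to
 * the declared session length and close it.
 * Returns 0 on success, -1 on error. */
int close_dump_truncated(FILE *dump, long length)
{
    int rc = 0;

    assert(dump != NULL && length >= 0);

    /* push buffered data to the file before changing its size */
    if (fflush(dump) != 0)
        rc = -1;
    /* drop every byte past the declared length */
    else if (ftruncate(fileno(dump), (off_t) length) != 0)
        rc = -1;

    /* the stream is closed in any case */
    if (fclose(dump) != 0)
        rc = -1;
    return rc;
}
```

With a helper of this kind the saved file would have exactly the size reported
in the "session closed" message, and gzip would have nothing to complain about.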

This is just an example of how important it is to look like something
common while information is encoded inside it. The encoding method is
public (a stegosystem, like a cryptosystem, cannot be considered safe if
it relies on a secret algorithm), but it is driven by a key, so that the
data can be extracted only by whoever holds that key.
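
The idea of a public method driven by a key can be sketched as follows. This
is NOT how blaststeg encodes its data: it is a toy example, under my own
assumptions, in which the key decides which bytes of a cover buffer carry the
hidden bits in their lowest bit. All the names (key_seed, key_permutation,
stego_embed, stego_extract) are hypothetical, and the FNV-1a/xorshift pair is
chosen only for brevity: a real tool would need a cryptographically strong
generator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* FNV-1a hash of the key: a toy way to turn a passphrase into a seed */
static uint32_t key_seed(const char *key)
{
    uint32_t h = 2166136261u;
    while (*key) {
        h ^= (unsigned char) *key++;
        h *= 16777619u;
    }
    return h ? h : 1u;              /* xorshift must not start from 0 */
}

/* xorshift32: small PRNG, NOT cryptographically secure */
static uint32_t next_rand(uint32_t *s)
{
    *s ^= *s << 13;
    *s ^= *s >> 17;
    *s ^= *s << 5;
    return *s;
}

/* Fill perm[0..n-1] with a key-dependent permutation of 0..n-1 */
static void key_permutation(const char *key, size_t *perm, size_t n)
{
    uint32_t s = key_seed(key);
    size_t i;

    for (i = 0; i < n; i++)
        perm[i] = i;
    for (i = n; i > 1; i--) {       /* Fisher-Yates shuffle */
        size_t j = next_rand(&s) % i;
        size_t t = perm[i - 1];
        perm[i - 1] = perm[j];
        perm[j] = t;
    }
}

/* Hide len bytes of msg in the low bits of cover[]; needs n >= 8 * len.
 * Returns 0 on success, -1 on error. */
int stego_embed(unsigned char *cover, size_t n, const char *key,
                const unsigned char *msg, size_t len)
{
    size_t *perm, b;

    assert(cover != NULL && key != NULL && msg != NULL);
    if (len > n / 8 || (perm = malloc(n * sizeof *perm)) == NULL)
        return -1;
    key_permutation(key, perm, n);
    for (b = 0; b < 8 * len; b++) {
        unsigned bit = (msg[b / 8] >> (b % 8)) & 1u;
        cover[perm[b]] = (unsigned char) ((cover[perm[b]] & ~1u) | bit);
    }
    free(perm);
    return 0;
}

/* Read len bytes back from the positions selected by the same key */
int stego_extract(const unsigned char *cover, size_t n, const char *key,
                  unsigned char *out, size_t len)
{
    size_t *perm, b;

    assert(cover != NULL && key != NULL && out != NULL);
    if (len > n / 8 || (perm = malloc(n * sizeof *perm)) == NULL)
        return -1;
    key_permutation(key, perm, n);
    memset(out, 0, len);
    for (b = 0; b < 8 * len; b++)
        out[b / 8] |= (unsigned char) ((cover[perm[b]] & 1u) << (b % 8));
    free(perm);
    return 0;
}
```

Calling stego_extract() with the key used by stego_embed() returns the
message; a different key generally selects different positions, so what comes
out is unrelated to the message even though the algorithm is known.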

5) OUTRO
                                                                                
Steganography is applicable to anything; the only requirement is an
implementation. Instead of writing entire standalone programs, one way of
bringing steganography 'to the masses' could be to patch existing software so
that it looks for a centralized steganographic engine on the host machine: the
patch would extract the steganographic stream from the lower-level protocol
(e.g. HTTP) and the centralized engine would do the rest.
Thus, the engine could decode any type of file from any type of stream,
possibly reassembling the original object when it is made up of different
formats or comes from different streams, or throw the data away.
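
The split between patch and engine could look like the sketch below. The
article does not define any interface, so everything here is an assumption
made for illustration: the names (stego_object, engine_begin, engine_feed),
the fixed OBJ_MAX limit and the idea that every fragment arrives with its
offset inside the hidden object. The patched applications only hand over the
bytes they pulled out of their own carrier; the engine puts the object back
together, whatever stream each fragment came from.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OBJ_MAX 256   /* hypothetical size limit for one hidden object */

/* One object being rebuilt by the engine (hypothetical layout) */
struct stego_object {
    unsigned char data[OBJ_MAX];
    unsigned char have[OBJ_MAX];   /* 1 where a byte has arrived */
    size_t total;                  /* declared object size */
    size_t received;               /* distinct bytes received so far */
};

/* Start rebuilding an object of `total` bytes */
void engine_begin(struct stego_object *o, size_t total)
{
    assert(o != NULL && total <= OBJ_MAX);
    memset(o, 0, sizeof *o);
    o->total = total;
}

/* Called by a patched application (browser, mail client, ...) with the
 * bytes it extracted from its own carrier. Returns 1 when the object is
 * complete, 0 if more data is needed, -1 if the fragment does not fit
 * (the caller then simply throws it away). */
int engine_feed(struct stego_object *o, size_t offset,
                const unsigned char *frag, size_t len)
{
    size_t i;

    assert(o != NULL && frag != NULL);
    if (offset > o->total || len > o->total - offset)
        return -1;
    for (i = 0; i < len; i++) {
        if (!o->have[offset + i]) {
            o->have[offset + i] = 1;
            o->received++;
        }
        o->data[offset + i] = frag[i];
    }
    return o->received == o->total;
}
```

Fragments may arrive in any order and from different patched programs: the
object is ready as soon as engine_feed() returns 1.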

In this way, by avoiding independent programs such as "insert raw data, insert
data in format XXXX" and "extract data from format XXXX", less work is
required. If someone wants to give it a try, send me an e-mail :)

Finally, I would like to add a few points that came out of the comments I
received on the pre-release of this article:

- innova could be considered stuck at version 0.0.1, instead of 0.0.2

- when planning a new steganographic protocol, it is important to choose
  where to hide data in the original stream, but it is even more important
  to ask "how different is my communication from a standard one?"
  Possible answers to this question are:
  "a lot" - which is VERY BAD
  "not so much, but some statistical analysis can easily reveal the
   difference" - meaning that the software is BARELY USABLE
  "it is not possible to distinguish them in any way" - which is VERY GOOD.

- this article mainly covers two things: the first is the use of
  steganography in network communications, the other is a general
  excursus on using the options a user normally plays with in everyday
  activity as parameters for a steganographic channel.
  The use of bold and font variations is a borderline example, but it is not
  to be underestimated. Any user-definable element can be used as a
  steganographic transport container.
  If overusing one of those elements would introduce a weakness in the
  system, many of them could be used simultaneously, thus spreading the load
  and the risk of statistical analysis.

- Any steganographic system can be applied to extract data from any underlying
  stream. If I combine bold, font variations and the low bits of JPG images
  in a web page, and if I protect myself by using only 10% of each element,
  then I can apply the reverse procedure to ALL web pages, and what I obtain
  MUST look uniformly random, because the user's entropy is the best place
  to hide in.
  Raw data to be hidden should be preprocessed so that it is as similar as
  possible to the data that can be extracted from non-steganographic pages,
  especially with regard to its statistical distribution (plain text, for
  instance, has a recognizable one). The simplest and maybe most effective
  manipulation is compression and, if desired, encryption. (Obviously,
  format-specific headers such as those of gzip and ACE must be removed.)
  In this way, bruteforce attacks aimed at finding the steganographed data
  are very likely to fail.

- In my opinion, steganography can't be stopped. In any way.
  Even in a scenario with packet filtering, trusted systems and software,
  level 5 content filtering and so on, it would still be possible to
  communicate securely.
  Just imagine using the macros of well-known word processing software, which
  may be implemented in crazy programming languages: even in this terrifying
  hypothetical scenario it would be possible to hack, create or generate
  documents that, thanks to steganography, carry more content than their
  appearance suggests.
  Then, to get out of this prohibitive situation, sending them by e-mail or
  publishing them on public forums would make hidden communication possible.
  
- Using higher-layer protocols as a transmission medium offers a larger number
  of places where data can be hidden, so any type of analysis becomes harder:

  * a lot of formats
  * a lot of possible options, each requiring dedicated analysis
  * evolution of protocols and formats over time

  This raises the number of false positives even more.
  Basically, the more complex the underlying system is, the simpler it will
  be to find places where to hide safely.

  The l-user will never, ever behave predictably.
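
The question "how different is my communication from a standard one?" and the
requirement that extracted data "MUST look uniformly random" can be turned
into a rough measurement. The sketch below is one possible check among many,
not something used by the tools in this article: it computes Pearson's
chi-square statistic of a byte buffer against a flat distribution over the 256
byte values. The function name chi_square_bytes is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Pearson chi-square statistic of a byte buffer against the uniform
 * distribution over 256 values (255 degrees of freedom). */
double chi_square_bytes(const unsigned char *buf, size_t n)
{
    size_t count[256] = { 0 };
    double expected, chi = 0.0;
    size_t i;

    assert(buf != NULL && n > 0);

    /* histogram of the byte values */
    for (i = 0; i < n; i++)
        count[buf[i]]++;

    /* sum of (observed - expected)^2 / expected over all values */
    expected = (double) n / 256.0;
    for (i = 0; i < 256; i++) {
        double d = (double) count[i] - expected;
        chi += d * d / expected;
    }
    return chi;
}
```

For data that really is uniformly random the result stays around 255, and
values above roughly 293 are expected only about 5% of the time; plain text
gives far larger numbers. The test is meaningful only on buffers of at least a
few kilobytes, and passing it does not prove that a channel is undetectable:
it only rules out the most obvious difference.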

6) THANKS
   
CCP, Guccini, Cardigans, Latte e i suoi derivati, Scisma, Vivaldi and Radio
Cybernet.
Acaso and MD for letting me escape from shell's labyrinths

Metro Olografix crypto meeting (http://www.olografix.org) and e-privacy
(http://e-privacy.firenze.linux.it) for their contribution in keeping my
interest in these topics alive.
Smaster for the logs suitable for the analysis, which would otherwise have
remained just an idea.

If you are looking for the transcendental illumination in malabyte:
http://3564020356.org/

Zeist! The first person who read the pre-release and even understood it :)
odo, who at the last minute reminded me of one of the best pieces of
steganography software, which is transparent, simple and written in Italian :)
(mod_stego, http://www.autistici.org/bakunin/): it preserves the way the
browser renders HTML pages while injecting data into them.


and even if I _do_ support freedom of choice:

                    _ALWAYS_ ENCRYPT YOUR E-MAIL!
                         http://www.gnupg.org/


-[ WEB ]----------------------------------------------------------------------

        http://bfi.s0ftpj.org      [main site - IT]
        http://bfi.cx              [mirror - IT]
        http://bfi.freaknet.org    [mirror - AT]
        http://bfi.anomalistic.org [mirror - SG]
        http://bfi.slackit.org     [mirror - DE]


-[ E-MAiL ]-------------------------------------------------------------------

        bfi@s0ftpj.org


-[ PGP ]----------------------------------------------------------------------

-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: 2.6.3i
mQENAzZsSu8AAAEIAM5FrActPz32W1AbxJ/LDG7bB371rhB1aG7/AzDEkXH67nni
DrMRyP+0u4tCTGizOGof0s/YDm2hH4jh+aGO9djJBzIEU8p1dvY677uw6oVCM374
nkjbyDjvBeuJVooKo+J6yGZuUq7jVgBKsR0uklfe5/0TUXsVva9b1pBfxqynK5OO
lQGJuq7g79jTSTqsa0mbFFxAlFq5GZmL+fnZdjWGI0c2pZrz+Tdj2+Ic3dl9dWax
iuy9Bp4Bq+H0mpCmnvwTMVdS2c+99s9unfnbzGvO6KqiwZzIWU9pQeK+v7W6vPa3
TbGHwwH4iaAWQH0mm7v+KdpMzqUPucgvfugfx+kABRO0FUJmSTk4IDxiZmk5OEB1
c2EubmV0PokBFQMFEDZsSu+5yC9+6B/H6QEBb6EIAMRP40T7m4Y1arNkj5enWC/b
a6M4oog42xr9UHOd8X2cOBBNB8qTe+dhBIhPX0fDJnnCr0WuEQ+eiw0YHJKyk5ql
GB/UkRH/hR4IpA0alUUjEYjTqL5HZmW9phMA9xiTAqoNhmXaIh7MVaYmcxhXwoOo
WYOaYoklxxA5qZxOwIXRxlmaN48SKsQuPrSrHwTdKxd+qB7QDU83h8nQ7dB4MAse
gDvMUdspekxAX8XBikXLvVuT0ai4xd8o8owWNR5fQAsNkbrdjOUWrOs0dbFx2K9J
l3XqeKl3XEgLvVG8JyhloKl65h9rUyw6Ek5hvb5ROuyS/lAGGWvxv2YJrN8ABLo=
=o7CG
-----END PGP PUBLIC KEY BLOCK-----


==============================================================================
-----------------------------------[ EOF ]------------------------------------
==============================================================================